All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-15 13:51 ` Michal Hocko
  0 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-15 13:51 UTC (permalink / raw)
  To: linux-mm; +Cc: Andrew Morton, Hugh Dickins, LKML

Hi,
the following patch is just a cleanup for better readability without any
functional changes. What do you think about it?
---
>From 71de71aaa725ee87459b3a256e8bb0af7de4abeb Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Fri, 15 Apr 2011 14:56:26 +0200
Subject: [PATCH] mm: make expand_downwards symmetrical to expand_upwards

Currently we have expand_upwards exported while expand_downwards is
accessible only via expand_stack.

check_stack_guard_page is a nice example of the asymmetry. It uses
expand_stack for VM_GROWSDOWN while expand_upwards is called for
VM_GROWSUP case. Let's make this consistent and export expand_downwards
same way we do with expand_upwards.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 include/linux/mm.h |    2 ++
 mm/memory.c        |    2 +-
 mm/mmap.c          |    2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 692dbae..765cf4e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1498,8 +1498,10 @@ unsigned long ra_submit(struct file_ra_state *ra,
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
 #if VM_GROWSUP
 extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
+  #define expand_downwards(vma, address) do { } while (0)
 #else
   #define expand_upwards(vma, address) do { } while (0)
+extern int expand_downwards(struct vm_area_struct *vma, unsigned long address);
 #endif
 extern int expand_stack_downwards(struct vm_area_struct *vma,
 				  unsigned long address);
diff --git a/mm/memory.c b/mm/memory.c
index ce22a25..f404fb6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2969,7 +2969,7 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
 		if (prev && prev->vm_end == address)
 			return prev->vm_flags & VM_GROWSDOWN ? 0 : -ENOMEM;
 
-		expand_stack(vma, address - PAGE_SIZE);
+		expand_downwards(vma, address - PAGE_SIZE);
 	}
 	if ((vma->vm_flags & VM_GROWSUP) && address + PAGE_SIZE == vma->vm_end) {
 		struct vm_area_struct *next = vma->vm_next;
diff --git a/mm/mmap.c b/mm/mmap.c
index e27e0cf..6b2a817 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1782,7 +1782,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 /*
  * vma is the first one with address < vma->vm_start.  Have to extend vma.
  */
-static int expand_downwards(struct vm_area_struct *vma,
+int expand_downwards(struct vm_area_struct *vma,
 				   unsigned long address)
 {
 	int error;
-- 
1.7.4.1

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* [PATCH] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-15 13:51 ` Michal Hocko
  0 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-15 13:51 UTC (permalink / raw)
  To: linux-mm; +Cc: Andrew Morton, Hugh Dickins, LKML

Hi,
the following patch is just a cleanup for better readability without any
functional changes. What do you think about it?
---

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-15 13:51 ` Michal Hocko
@ 2011-04-18  3:00   ` Hugh Dickins
  -1 siblings, 0 replies; 210+ messages in thread
From: Hugh Dickins @ 2011-04-18  3:00 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Andrew Morton, LKML

On Fri, 14 Apr 2011, Michal Hocko wrote:

> Hi,
> the following patch is just a cleanup for better readability without any
> functional changes. What do you think about it?
> ---
> From 71de71aaa725ee87459b3a256e8bb0af7de4abeb Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Fri, 15 Apr 2011 14:56:26 +0200
> Subject: [PATCH] mm: make expand_downwards symmetrical to expand_upwards
> 
> Currently we have expand_upwards exported while expand_downwards is
> accessible only via expand_stack.
> 
> check_stack_guard_page is a nice example of the asymmetry. It uses
> expand_stack for VM_GROWSDOWN while expand_upwards is called for
> VM_GROWSUP case. Let's make this consistent and export expand_downwards
> same way we do with expand_upwards.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Yes, I've just been looking around here, and I like your symmetry.
But two points:

> ---
>  include/linux/mm.h |    2 ++
>  mm/memory.c        |    2 +-
>  mm/mmap.c          |    2 +-
>  3 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 692dbae..765cf4e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1498,8 +1498,10 @@ unsigned long ra_submit(struct file_ra_state *ra,
>  extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
>  #if VM_GROWSUP
>  extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
> +  #define expand_downwards(vma, address) do { } while (0)

I think this is wrong: doesn't the VM_GROWSUP case actually want
a real expand_downwards() in addition to expand_upwards()?

>  #else
>    #define expand_upwards(vma, address) do { } while (0)
> +extern int expand_downwards(struct vm_area_struct *vma, unsigned long address);
>  #endif
>  extern int expand_stack_downwards(struct vm_area_struct *vma,
>  				  unsigned long address);

And if you're going for symmetry, wouldn't it be nice to add fs/exec.c
to the patch and remove this silly expand_stack_downwards() wrapper?

Hugh


> diff --git a/mm/memory.c b/mm/memory.c
> index ce22a25..f404fb6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2969,7 +2969,7 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
>  		if (prev && prev->vm_end == address)
>  			return prev->vm_flags & VM_GROWSDOWN ? 0 : -ENOMEM;
>  
> -		expand_stack(vma, address - PAGE_SIZE);
> +		expand_downwards(vma, address - PAGE_SIZE);
>  	}
>  	if ((vma->vm_flags & VM_GROWSUP) && address + PAGE_SIZE == vma->vm_end) {
>  		struct vm_area_struct *next = vma->vm_next;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index e27e0cf..6b2a817 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1782,7 +1782,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  /*
>   * vma is the first one with address < vma->vm_start.  Have to extend vma.
>   */
> -static int expand_downwards(struct vm_area_struct *vma,
> +int expand_downwards(struct vm_area_struct *vma,
>  				   unsigned long address)
>  {
>  	int error;
> -- 
> 1.7.4.1

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-18  3:00   ` Hugh Dickins
  0 siblings, 0 replies; 210+ messages in thread
From: Hugh Dickins @ 2011-04-18  3:00 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Andrew Morton, LKML

On Fri, 14 Apr 2011, Michal Hocko wrote:

> Hi,
> the following patch is just a cleanup for better readability without any
> functional changes. What do you think about it?
> ---
> From 71de71aaa725ee87459b3a256e8bb0af7de4abeb Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Fri, 15 Apr 2011 14:56:26 +0200
> Subject: [PATCH] mm: make expand_downwards symmetrical to expand_upwards
> 
> Currently we have expand_upwards exported while expand_downwards is
> accessible only via expand_stack.
> 
> check_stack_guard_page is a nice example of the asymmetry. It uses
> expand_stack for VM_GROWSDOWN while expand_upwards is called for
> VM_GROWSUP case. Let's make this consistent and export expand_downwards
> same way we do with expand_upwards.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Yes, I've just been looking around here, and I like your symmetry.
But two points:

> ---
>  include/linux/mm.h |    2 ++
>  mm/memory.c        |    2 +-
>  mm/mmap.c          |    2 +-
>  3 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 692dbae..765cf4e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1498,8 +1498,10 @@ unsigned long ra_submit(struct file_ra_state *ra,
>  extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
>  #if VM_GROWSUP
>  extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
> +  #define expand_downwards(vma, address) do { } while (0)

I think this is wrong: doesn't the VM_GROWSUP case actually want
a real expand_downwards() in addition to expand_upwards()?

>  #else
>    #define expand_upwards(vma, address) do { } while (0)
> +extern int expand_downwards(struct vm_area_struct *vma, unsigned long address);
>  #endif
>  extern int expand_stack_downwards(struct vm_area_struct *vma,
>  				  unsigned long address);

And if you're going for symmetry, wouldn't it be nice to add fs/exec.c
to the patch and remove this silly expand_stack_downwards() wrapper?

Hugh


> diff --git a/mm/memory.c b/mm/memory.c
> index ce22a25..f404fb6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2969,7 +2969,7 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
>  		if (prev && prev->vm_end == address)
>  			return prev->vm_flags & VM_GROWSDOWN ? 0 : -ENOMEM;
>  
> -		expand_stack(vma, address - PAGE_SIZE);
> +		expand_downwards(vma, address - PAGE_SIZE);
>  	}
>  	if ((vma->vm_flags & VM_GROWSUP) && address + PAGE_SIZE == vma->vm_end) {
>  		struct vm_area_struct *next = vma->vm_next;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index e27e0cf..6b2a817 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1782,7 +1782,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  /*
>   * vma is the first one with address < vma->vm_start.  Have to extend vma.
>   */
> -static int expand_downwards(struct vm_area_struct *vma,
> +int expand_downwards(struct vm_area_struct *vma,
>  				   unsigned long address)
>  {
>  	int error;
> -- 
> 1.7.4.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [PATCH v2] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-18  3:00   ` Hugh Dickins
@ 2011-04-18 10:01     ` Michal Hocko
  -1 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-18 10:01 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, Andrew Morton, LKML

On Sun 17-04-11 20:00:17, Hugh Dickins wrote:
> On Fri, 14 Apr 2011, Michal Hocko wrote:
[...]
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 692dbae..765cf4e 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1498,8 +1498,10 @@ unsigned long ra_submit(struct file_ra_state *ra,
> >  extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
> >  #if VM_GROWSUP
> >  extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
> > +  #define expand_downwards(vma, address) do { } while (0)
> 
> I think this is wrong: doesn't the VM_GROWSUP case actually want
> a real expand_downwards() in addition to expand_upwards()?

Ahh, right you are. I haven't noticed that we have
expand_stack_downwards as well and this one is called from get_arg_page
if CONFIG_STACK_GROWSUP. 

I am wondering why we do this just for CONFIG_STACK_GROWSUP. Do we
expand stack for GROWSDOWN automatically?

> 
> >  #else
> >    #define expand_upwards(vma, address) do { } while (0)
> > +extern int expand_downwards(struct vm_area_struct *vma, unsigned long address);
> >  #endif
> >  extern int expand_stack_downwards(struct vm_area_struct *vma,
> >  				  unsigned long address);
> 
> And if you're going for symmetry, wouldn't it be nice to add fs/exec.c
> to the patch and remove this silly expand_stack_downwards() wrapper?

Sounds reasonable. Thanks for the review, Hugh. I am also thinking
whether expand_stack_{downwards,upwards} is more suitable name for those
functions as we are more explicit that this is stack related.

What about the updated patch bellow?

Changes since v1
 - fixed expand_downwards case for CONFIG_STACK_GROWSUP in get_arg_page.
 - rename expand_{downwards,upwards} -> expand_stack_{downwards,upwards}
---

>From f3adcf40518eaf7c4ee1cb10abd16b59785a66ba Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Fri, 15 Apr 2011 14:56:26 +0200
Subject: [PATCH] mm: make expand_downwards symmetrical to expand_upwards

Currently we have expand_upwards exported while expand_downwards is
accessible only via expand_stack or expand_stack_downwards.

check_stack_guard_page is a nice example of the asymmetry. It uses
expand_stack for VM_GROWSDOWN while expand_upwards is called for
VM_GROWSUP case.

Let's clean this up by exporting both functions and make those name
consistent. Let's use expand_stack_{upwards,downwards} so that we are
explicit about stack manipulation in the name. expand_stack_downwards
has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
get_arg_page calls the downwards version in the early process
initialization phase for growsup configuration.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 include/linux/mm.h |   13 ++++++++-----
 mm/memory.c        |    4 ++--
 mm/mmap.c          |   13 ++++---------
 3 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 692dbae..17f9b86 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1494,15 +1494,18 @@ unsigned long ra_submit(struct file_ra_state *ra,
 			struct address_space *mapping,
 			struct file *filp);
 
-/* Do stack extension */
+/* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
+
+/* CONFIG_STACK_GROWSUP still needs to to grow downwards at some places */
+extern int expand_stack_downwards(struct vm_area_struct *vma,
+		unsigned long address);
 #if VM_GROWSUP
-extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
+extern int expand_stack_upwards(struct vm_area_struct *vma,
+		unsigned long address);
 #else
-  #define expand_upwards(vma, address) do { } while (0)
+  #define expand_stack_upwards(vma, address) do { } while (0)
 #endif
-extern int expand_stack_downwards(struct vm_area_struct *vma,
-				  unsigned long address);
 
 /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
 extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr);
diff --git a/mm/memory.c b/mm/memory.c
index ce22a25..ba5b4d8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2969,7 +2969,7 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
 		if (prev && prev->vm_end == address)
 			return prev->vm_flags & VM_GROWSDOWN ? 0 : -ENOMEM;
 
-		expand_stack(vma, address - PAGE_SIZE);
+		expand_stack_downwards(vma, address - PAGE_SIZE);
 	}
 	if ((vma->vm_flags & VM_GROWSUP) && address + PAGE_SIZE == vma->vm_end) {
 		struct vm_area_struct *next = vma->vm_next;
@@ -2978,7 +2978,7 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
 		if (next && next->vm_start == address + PAGE_SIZE)
 			return next->vm_flags & VM_GROWSUP ? 0 : -ENOMEM;
 
-		expand_upwards(vma, address + PAGE_SIZE);
+		expand_stack_upwards(vma, address + PAGE_SIZE);
 	}
 	return 0;
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index e27e0cf..29c68b0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1731,7 +1731,7 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
  * PA-RISC uses this for its stack; IA64 for its Register Backing Store.
  * vma is the last one with address > vma->vm_end.  Have to extend vma.
  */
-int expand_upwards(struct vm_area_struct *vma, unsigned long address)
+int expand_stack_upwards(struct vm_area_struct *vma, unsigned long address)
 {
 	int error;
 
@@ -1782,7 +1782,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 /*
  * vma is the first one with address < vma->vm_start.  Have to extend vma.
  */
-static int expand_downwards(struct vm_area_struct *vma,
+int expand_stack_downwards(struct vm_area_struct *vma,
 				   unsigned long address)
 {
 	int error;
@@ -1829,15 +1829,10 @@ static int expand_downwards(struct vm_area_struct *vma,
 	return error;
 }
 
-int expand_stack_downwards(struct vm_area_struct *vma, unsigned long address)
-{
-	return expand_downwards(vma, address);
-}
-
 #ifdef CONFIG_STACK_GROWSUP
 int expand_stack(struct vm_area_struct *vma, unsigned long address)
 {
-	return expand_upwards(vma, address);
+	return expand_stack_upwards(vma, address);
 }
 
 struct vm_area_struct *
@@ -1859,7 +1854,7 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
 #else
 int expand_stack(struct vm_area_struct *vma, unsigned long address)
 {
-	return expand_downwards(vma, address);
+	return expand_stack_downwards(vma, address);
 }
 
 struct vm_area_struct *
-- 
1.7.4.1

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* [PATCH v2] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-18 10:01     ` Michal Hocko
  0 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-18 10:01 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, Andrew Morton, LKML

On Sun 17-04-11 20:00:17, Hugh Dickins wrote:
> On Fri, 14 Apr 2011, Michal Hocko wrote:
[...]
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 692dbae..765cf4e 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1498,8 +1498,10 @@ unsigned long ra_submit(struct file_ra_state *ra,
> >  extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
> >  #if VM_GROWSUP
> >  extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
> > +  #define expand_downwards(vma, address) do { } while (0)
> 
> I think this is wrong: doesn't the VM_GROWSUP case actually want
> a real expand_downwards() in addition to expand_upwards()?

Ahh, right you are. I haven't noticed that we have
expand_stack_downwards as well and this one is called from get_arg_page
if CONFIG_STACK_GROWSUP. 

I am wondering why we do this just for CONFIG_STACK_GROWSUP. Do we
expand stack for GROWSDOWN automatically?

> 
> >  #else
> >    #define expand_upwards(vma, address) do { } while (0)
> > +extern int expand_downwards(struct vm_area_struct *vma, unsigned long address);
> >  #endif
> >  extern int expand_stack_downwards(struct vm_area_struct *vma,
> >  				  unsigned long address);
> 
> And if you're going for symmetry, wouldn't it be nice to add fs/exec.c
> to the patch and remove this silly expand_stack_downwards() wrapper?

Sounds reasonable. Thanks for the review, Hugh. I am also thinking
whether expand_stack_{downwards,upwards} is more suitable name for those
functions as we are more explicit that this is stack related.

What about the updated patch bellow?

Changes since v1
 - fixed expand_downwards case for CONFIG_STACK_GROWSUP in get_arg_page.
 - rename expand_{downwards,upwards} -> expand_stack_{downwards,upwards}
---

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v2] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-18 10:01     ` Michal Hocko
@ 2011-04-18 20:56       ` Andrew Morton
  -1 siblings, 0 replies; 210+ messages in thread
From: Andrew Morton @ 2011-04-18 20:56 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Hugh Dickins, linux-mm, LKML

On Mon, 18 Apr 2011 12:01:31 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> Currently we have expand_upwards exported while expand_downwards is
> accessible only via expand_stack or expand_stack_downwards.
> 
> check_stack_guard_page is a nice example of the asymmetry. It uses
> expand_stack for VM_GROWSDOWN while expand_upwards is called for
> VM_GROWSUP case.
> 
> Let's clean this up by exporting both functions and make those name
> consistent. Let's use expand_stack_{upwards,downwards} so that we are
> explicit about stack manipulation in the name. expand_stack_downwards
> has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
> get_arg_page calls the downwards version in the early process
> initialization phase for growsup configuration.

Has this patch been tested on any stack-grows-upwards architecture?

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v2] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-18 20:56       ` Andrew Morton
  0 siblings, 0 replies; 210+ messages in thread
From: Andrew Morton @ 2011-04-18 20:56 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Hugh Dickins, linux-mm, LKML

On Mon, 18 Apr 2011 12:01:31 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> Currently we have expand_upwards exported while expand_downwards is
> accessible only via expand_stack or expand_stack_downwards.
> 
> check_stack_guard_page is a nice example of the asymmetry. It uses
> expand_stack for VM_GROWSDOWN while expand_upwards is called for
> VM_GROWSUP case.
> 
> Let's clean this up by exporting both functions and make those name
> consistent. Let's use expand_stack_{upwards,downwards} so that we are
> explicit about stack manipulation in the name. expand_stack_downwards
> has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
> get_arg_page calls the downwards version in the early process
> initialization phase for growsup configuration.

Has this patch been tested on any stack-grows-upwards architecture?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
       [not found]       ` <20110419091022.GA21689@tiehlicka.suse.cz>
@ 2011-04-19 11:09           ` Michal Hocko
  0 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-19 11:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML

While I am in the cleanup mode. We should use VM_GROWSUP rather than
tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.

What do you think?
--- 
>From fd832dd46b4918718901f2ebe994f4662f167999 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Tue, 19 Apr 2011 11:11:41 +0200
Subject: [PATCH] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64

IA64 needs some trickery for Register Backing Store so we have to
export expand_stack_upwards for it even though the architecture expands
its stack downwards normally. To avoid
we have defined VM_GROWSUP which is defined only for the above
configuration.

We still have places which use the original ifdefs so let's get rid of
them finally.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/mmap.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 29c68b0..3ff9edf 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1726,7 +1726,7 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
 	return 0;
 }
 
-#if defined(CONFIG_STACK_GROWSUP) || defined(CONFIG_IA64)
+#if VM_GROWSUP
 /*
  * PA-RISC uses this for its stack; IA64 for its Register Backing Store.
  * vma is the last one with address > vma->vm_end.  Have to extend vma.
@@ -1777,7 +1777,7 @@ int expand_stack_upwards(struct vm_area_struct *vma, unsigned long address)
 	khugepaged_enter_vma_merge(vma);
 	return error;
 }
-#endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
+#endif /* VM_GROWSUP */
 
 /*
  * vma is the first one with address < vma->vm_start.  Have to extend vma.
-- 
1.7.4.1

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
@ 2011-04-19 11:09           ` Michal Hocko
  0 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-19 11:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML

While I am in the cleanup mode. We should use VM_GROWSUP rather than
tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.

What do you think?
--- 

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-18 20:56       ` Andrew Morton
  (?)
@ 2011-04-19 11:10         ` Michal Hocko
  -1 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-19 11:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML, linux-parisc

On Mon 18-04-11 13:56:37, Andrew Morton wrote:
> On Mon, 18 Apr 2011 12:01:31 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Currently we have expand_upwards exported while expand_downwards is
> > accessible only via expand_stack or expand_stack_downwards.
> > 
> > check_stack_guard_page is a nice example of the asymmetry. It uses
> > expand_stack for VM_GROWSDOWN while expand_upwards is called for
> > VM_GROWSUP case.
> > 
> > Let's clean this up by exporting both functions and make those name
> > consistent. Let's use expand_stack_{upwards,downwards} so that we are
> > explicit about stack manipulation in the name. expand_stack_downwards
> > has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
> > get_arg_page calls the downwards version in the early process
> > initialization phase for growsup configuration.
> 
> Has this patch been tested on any stack-grows-upwards architecture?

The only one I can find in the tree is parisc and I do not have access
to any such machine. Maybe someone on the list (CCed) can help with
testing the patch bellow? Nevertheless, the patch doesn't change growsup
case. It just renames functions and exports growsdown.

IA64 which grows upwards only for registers still needs a fix because of
the rename, though. I'm sorry, I must have missed it in the grep output
before. No other arch specific code uses expand_{down,up}wards directly.

Changes since v2
 - fix compilation error on ia64
Changes since v1
 - fixed expand_downwards case for CONFIG_STACK_GROWSUP in get_arg_page.
 - rename expand_{downwards,upwards} -> expand_stack_{downwards,upwards}
--- 
>From 091cf27fe9fad875bc7f6cdbb8206c617b06fc7b Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Fri, 15 Apr 2011 14:56:26 +0200
Subject: [PATCH] mm: make expand_downwards symmetrical to expand_upwards

Currently we have expand_upwards exported while expand_downwards is
accessible only via expand_stack or expand_stack_downwards.

check_stack_guard_page is a nice example of the asymmetry. It uses
expand_stack for VM_GROWSDOWN while expand_upwards is called for
VM_GROWSUP case.

Let's clean this up by exporting both functions and make those name
consistent. Let's use expand_stack_{upwards,downwards} so that we are
explicit about stack manipulation in the name. expand_stack_downwards
has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
get_arg_page calls the downwards version in the early process
initialization phase for growsup configuration.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 arch/ia64/mm/fault.c |    2 +-
 include/linux/mm.h   |   13 ++++++++-----
 mm/memory.c          |    4 ++--
 mm/mmap.c            |   13 ++++---------
 4 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index 0799fea..aebff8a 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -197,7 +197,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 		 */
 		if (address > vma->vm_end + PAGE_SIZE - sizeof(long))
 			goto bad_area;
-		if (expand_upwards(vma, address))
+		if (expand_stack_upwards(vma, address))
 			goto bad_area;
 	}
 	goto good_area;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 692dbae..17f9b86 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1494,15 +1494,18 @@ unsigned long ra_submit(struct file_ra_state *ra,
 			struct address_space *mapping,
 			struct file *filp);
 
-/* Do stack extension */
+/* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
+
+/* CONFIG_STACK_GROWSUP still needs to to grow downwards at some places */
+extern int expand_stack_downwards(struct vm_area_struct *vma,
+		unsigned long address);
 #if VM_GROWSUP
-extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
+extern int expand_stack_upwards(struct vm_area_struct *vma,
+		unsigned long address);
 #else
-  #define expand_upwards(vma, address) do { } while (0)
+  #define expand_stack_upwards(vma, address) do { } while (0)
 #endif
-extern int expand_stack_downwards(struct vm_area_struct *vma,
-				  unsigned long address);
 
 /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
 extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr);
diff --git a/mm/memory.c b/mm/memory.c
index ce22a25..ba5b4d8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2969,7 +2969,7 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
 		if (prev && prev->vm_end == address)
 			return prev->vm_flags & VM_GROWSDOWN ? 0 : -ENOMEM;
 
-		expand_stack(vma, address - PAGE_SIZE);
+		expand_stack_downwards(vma, address - PAGE_SIZE);
 	}
 	if ((vma->vm_flags & VM_GROWSUP) && address + PAGE_SIZE == vma->vm_end) {
 		struct vm_area_struct *next = vma->vm_next;
@@ -2978,7 +2978,7 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
 		if (next && next->vm_start == address + PAGE_SIZE)
 			return next->vm_flags & VM_GROWSUP ? 0 : -ENOMEM;
 
-		expand_upwards(vma, address + PAGE_SIZE);
+		expand_stack_upwards(vma, address + PAGE_SIZE);
 	}
 	return 0;
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index e27e0cf..29c68b0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1731,7 +1731,7 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
  * PA-RISC uses this for its stack; IA64 for its Register Backing Store.
  * vma is the last one with address > vma->vm_end.  Have to extend vma.
  */
-int expand_upwards(struct vm_area_struct *vma, unsigned long address)
+int expand_stack_upwards(struct vm_area_struct *vma, unsigned long address)
 {
 	int error;
 
@@ -1782,7 +1782,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 /*
  * vma is the first one with address < vma->vm_start.  Have to extend vma.
  */
-static int expand_downwards(struct vm_area_struct *vma,
+int expand_stack_downwards(struct vm_area_struct *vma,
 				   unsigned long address)
 {
 	int error;
@@ -1829,15 +1829,10 @@ static int expand_downwards(struct vm_area_struct *vma,
 	return error;
 }
 
-int expand_stack_downwards(struct vm_area_struct *vma, unsigned long address)
-{
-	return expand_downwards(vma, address);
-}
-
 #ifdef CONFIG_STACK_GROWSUP
 int expand_stack(struct vm_area_struct *vma, unsigned long address)
 {
-	return expand_upwards(vma, address);
+	return expand_stack_upwards(vma, address);
 }
 
 struct vm_area_struct *
@@ -1859,7 +1854,7 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
 #else
 int expand_stack(struct vm_area_struct *vma, unsigned long address)
 {
-	return expand_downwards(vma, address);
+	return expand_stack_downwards(vma, address);
 }
 
 struct vm_area_struct *
-- 
1.7.4.1

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 11:10         ` Michal Hocko
  0 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-19 11:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML, linux-parisc

On Mon 18-04-11 13:56:37, Andrew Morton wrote:
> On Mon, 18 Apr 2011 12:01:31 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Currently we have expand_upwards exported while expand_downwards is
> > accessible only via expand_stack or expand_stack_downwards.
> > 
> > check_stack_guard_page is a nice example of the asymmetry. It uses
> > expand_stack for VM_GROWSDOWN while expand_upwards is called for
> > VM_GROWSUP case.
> > 
> > Let's clean this up by exporting both functions and make those name
> > consistent. Let's use expand_stack_{upwards,downwards} so that we are
> > explicit about stack manipulation in the name. expand_stack_downwards
> > has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
> > get_arg_page calls the downwards version in the early process
> > initialization phase for growsup configuration.
> 
> Has this patch been tested on any stack-grows-upwards architecture?

The only one I can find in the tree is parisc and I do not have access
to any such machine. Maybe someone on the list (CCed) can help with
testing the patch bellow? Nevertheless, the patch doesn't change growsup
case. It just renames functions and exports growsdown.

IA64 which grows upwards only for registers still needs a fix because of
the rename, though. I'm sorry, I must have missed it in the grep output
before. No other arch specific code uses expand_{down,up}wards directly.

Changes since v2
 - fix compilation error on ia64
Changes since v1
 - fixed expand_downwards case for CONFIG_STACK_GROWSUP in get_arg_page.
 - rename expand_{downwards,upwards} -> expand_stack_{downwards,upwards}
--- 
>From 091cf27fe9fad875bc7f6cdbb8206c617b06fc7b Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Fri, 15 Apr 2011 14:56:26 +0200
Subject: [PATCH] mm: make expand_downwards symmetrical to expand_upwards

Currently we have expand_upwards exported while expand_downwards is
accessible only via expand_stack or expand_stack_downwards.

check_stack_guard_page is a nice example of the asymmetry. It uses
expand_stack for VM_GROWSDOWN while expand_upwards is called for
VM_GROWSUP case.

Let's clean this up by exporting both functions and make those name
consistent. Let's use expand_stack_{upwards,downwards} so that we are
explicit about stack manipulation in the name. expand_stack_downwards
has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
get_arg_page calls the downwards version in the early process
initialization phase for growsup configuration.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 arch/ia64/mm/fault.c |    2 +-
 include/linux/mm.h   |   13 ++++++++-----
 mm/memory.c          |    4 ++--
 mm/mmap.c            |   13 ++++---------
 4 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index 0799fea..aebff8a 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -197,7 +197,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 		 */
 		if (address > vma->vm_end + PAGE_SIZE - sizeof(long))
 			goto bad_area;
-		if (expand_upwards(vma, address))
+		if (expand_stack_upwards(vma, address))
 			goto bad_area;
 	}
 	goto good_area;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 692dbae..17f9b86 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1494,15 +1494,18 @@ unsigned long ra_submit(struct file_ra_state *ra,
 			struct address_space *mapping,
 			struct file *filp);
 
-/* Do stack extension */
+/* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
+
+/* CONFIG_STACK_GROWSUP still needs to to grow downwards at some places */
+extern int expand_stack_downwards(struct vm_area_struct *vma,
+		unsigned long address);
 #if VM_GROWSUP
-extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
+extern int expand_stack_upwards(struct vm_area_struct *vma,
+		unsigned long address);
 #else
-  #define expand_upwards(vma, address) do { } while (0)
+  #define expand_stack_upwards(vma, address) do { } while (0)
 #endif
-extern int expand_stack_downwards(struct vm_area_struct *vma,
-				  unsigned long address);
 
 /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
 extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr);
diff --git a/mm/memory.c b/mm/memory.c
index ce22a25..ba5b4d8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2969,7 +2969,7 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
 		if (prev && prev->vm_end == address)
 			return prev->vm_flags & VM_GROWSDOWN ? 0 : -ENOMEM;
 
-		expand_stack(vma, address - PAGE_SIZE);
+		expand_stack_downwards(vma, address - PAGE_SIZE);
 	}
 	if ((vma->vm_flags & VM_GROWSUP) && address + PAGE_SIZE == vma->vm_end) {
 		struct vm_area_struct *next = vma->vm_next;
@@ -2978,7 +2978,7 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
 		if (next && next->vm_start == address + PAGE_SIZE)
 			return next->vm_flags & VM_GROWSUP ? 0 : -ENOMEM;
 
-		expand_upwards(vma, address + PAGE_SIZE);
+		expand_stack_upwards(vma, address + PAGE_SIZE);
 	}
 	return 0;
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index e27e0cf..29c68b0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1731,7 +1731,7 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
  * PA-RISC uses this for its stack; IA64 for its Register Backing Store.
  * vma is the last one with address > vma->vm_end.  Have to extend vma.
  */
-int expand_upwards(struct vm_area_struct *vma, unsigned long address)
+int expand_stack_upwards(struct vm_area_struct *vma, unsigned long address)
 {
 	int error;
 
@@ -1782,7 +1782,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 /*
  * vma is the first one with address < vma->vm_start.  Have to extend vma.
  */
-static int expand_downwards(struct vm_area_struct *vma,
+int expand_stack_downwards(struct vm_area_struct *vma,
 				   unsigned long address)
 {
 	int error;
@@ -1829,15 +1829,10 @@ static int expand_downwards(struct vm_area_struct *vma,
 	return error;
 }
 
-int expand_stack_downwards(struct vm_area_struct *vma, unsigned long address)
-{
-	return expand_downwards(vma, address);
-}
-
 #ifdef CONFIG_STACK_GROWSUP
 int expand_stack(struct vm_area_struct *vma, unsigned long address)
 {
-	return expand_upwards(vma, address);
+	return expand_stack_upwards(vma, address);
 }
 
 struct vm_area_struct *
@@ -1859,7 +1854,7 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
 #else
 int expand_stack(struct vm_area_struct *vma, unsigned long address)
 {
-	return expand_downwards(vma, address);
+	return expand_stack_downwards(vma, address);
 }
 
 struct vm_area_struct *
-- 
1.7.4.1

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 11:10         ` Michal Hocko
  0 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-19 11:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML, linux-parisc

On Mon 18-04-11 13:56:37, Andrew Morton wrote:
> On Mon, 18 Apr 2011 12:01:31 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Currently we have expand_upwards exported while expand_downwards is
> > accessible only via expand_stack or expand_stack_downwards.
> > 
> > check_stack_guard_page is a nice example of the asymmetry. It uses
> > expand_stack for VM_GROWSDOWN while expand_upwards is called for
> > VM_GROWSUP case.
> > 
> > Let's clean this up by exporting both functions and make those name
> > consistent. Let's use expand_stack_{upwards,downwards} so that we are
> > explicit about stack manipulation in the name. expand_stack_downwards
> > has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
> > get_arg_page calls the downwards version in the early process
> > initialization phase for growsup configuration.
> 
> Has this patch been tested on any stack-grows-upwards architecture?

The only one I can find in the tree is parisc and I do not have access
to any such machine. Maybe someone on the list (CCed) can help with
testing the patch bellow? Nevertheless, the patch doesn't change growsup
case. It just renames functions and exports growsdown.

IA64 which grows upwards only for registers still needs a fix because of
the rename, though. I'm sorry, I must have missed it in the grep output
before. No other arch specific code uses expand_{down,up}wards directly.

Changes since v2
 - fix compilation error on ia64
Changes since v1
 - fixed expand_downwards case for CONFIG_STACK_GROWSUP in get_arg_page.
 - rename expand_{downwards,upwards} -> expand_stack_{downwards,upwards}
--- 

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 11:10         ` Michal Hocko
@ 2011-04-19 15:46           ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 15:46 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc

On Tue, 2011-04-19 at 13:10 +0200, Michal Hocko wrote:
> On Mon 18-04-11 13:56:37, Andrew Morton wrote:
> > On Mon, 18 Apr 2011 12:01:31 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > Currently we have expand_upwards exported while expand_downwards is
> > > accessible only via expand_stack or expand_stack_downwards.
> > > 
> > > check_stack_guard_page is a nice example of the asymmetry. It uses
> > > expand_stack for VM_GROWSDOWN while expand_upwards is called for
> > > VM_GROWSUP case.
> > > 
> > > Let's clean this up by exporting both functions and make those name
> > > consistent. Let's use expand_stack_{upwards,downwards} so that we are
> > > explicit about stack manipulation in the name. expand_stack_downwards
> > > has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
> > > get_arg_page calls the downwards version in the early process
> > > initialization phase for growsup configuration.
> > 
> > Has this patch been tested on any stack-grows-upwards architecture?
> 
> The only one I can find in the tree is parisc and I do not have access
> to any such machine. Maybe someone on the list (CCed) can help with
> testing the patch bellow? Nevertheless, the patch doesn't change growsup
> case. It just renames functions and exports growsdown.

It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
but looks to be a slab problem (it's a null deref on a spinlock in
add_partial(), which seems unrelated to this patch).

[   15.628000] sd 1:0:2:0: [sdc] Attached SCSI disk
done.
[   16.632000] EXT3-fs: barriers not enabled
[   16.640000] kjournald starting.  Commit interval 5 seconds
[   16.640000] EXT3-fs (sda3): mounted filesystem with ordered data mode
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.
INIT: version 2.88 booting
Setting hostname to 'ion'...done.
Starting the hotplug events dispatcher: udevd[   22.008000] udev[211]: starting version 164
.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...done.
Activating swap:swapon on /dev/sda2
swapon: /dev/sda2: found swap signature: version 1, page-size 4, same byte order
swapon: /dev/sda2: pagesize=4096, swapsize=1028157440, devsize=1028160000
[   28.780000] Adding 1004056k swap on /dev/sda2.  Priority:-1 extents:1 across:1004056k 
.
Will now check root file system:fsck from util-linux-ng 2.17.2
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda3 
/dev/sda3 has been mounted 37 times without being checked, check forced.
[  257.192000] Backtrace:===========                                \ 42.8%   
[  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
[  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
[  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
[  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
[  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
[  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
[  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
[  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
[  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
[  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
[  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
[  257.192000] 
[  257.192000] 
[  257.192000] Kernel Fault: Code=26 regs=00000040bf1807d0 (Addr=0000000000000000)
[  257.192000] 
[  257.192000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[  257.192000] PSW: 00001000000001001111000000001110 Not tainted
[  257.192000] r00-03  000000ff0804f00e 0000000040769e40 0000000040214f78 0000000000000000
[  257.192000] r04-07  0000000040746e40 0000000000000001 0000004080ded370 0000000000000001
[  257.192000] r08-11  0000000040654150 0000000000000000 0000000000000001 0000000000000001
[  257.192000] r12-15  0000000000000000 00000000ffffffff 0000000000000024 0000000000000000
[  257.192000] r16-19  00000000fb4ead9c 00000000fb4eac54 0000000000000000 0000000000000000
[  257.192000] r20-23  000000000800000e 0000000000000001 000000007bbb7180 00000000401fd1a4
[  257.192000] r24-27  0000000000000001 0000004080ded370 0000000000000000 0000000040746e40
[  257.192000] r28-31  000000007ec0a908 00000040bf1807a0 00000040bf1807d0 0000000000000016
[  257.192000] sr00-03  00000000002d9000 0000000000000000 0000000000000000 00000000002d9000
[  257.192000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  257.192000] 
[  257.192000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011bbc0 000000004011bbc4
[  257.192000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
[  257.192000]  CPU:        0   CR30: 00000040bf180000 CR31: fffffff0f0e098e0
[  257.192000]  ORIG_R28: 0000000040769e40
[  257.192000]  IAOQ[0]: _raw_spin_lock+0x0/0x20
[  257.192000]  IAOQ[1]: _raw_spin_lock+0x4/0x20
[  257.192000]  RP(r2): add_partial+0x28/0x98
[  257.192000] Backtrace:
[  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
[  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
[  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
[  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
[  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
[  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
[  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
[  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
[  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
[  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
[  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
[  257.192000] 
[  257.192000] Kernel panic - not syncing: Kernel Fault
[  257.192000] Backtrace:
[  257.192000]  [<000000004011f984>] show_stack+0x14/0x20
[  257.192000]  [<000000004011f9a8>] dump_stack+0x18/0x28
[  257.192000]  [<000000004015946c>] panic+0xd4/0x368
[  257.192000]  [<0000000040120024>] parisc_terminate+0x14c/0x170
[  257.192000]  [<000000004012059c>] handle_interruption+0x2ac/0x8f8
[  257.192000]  [<000000004011bbc0>] _raw_spin_lock+0x0/0x20
[  257.192000] 
[  257.192000] Rebooting in 5 seconds..

It seems to be a random intermittent mm crash because the next reboot
crashed with the same trace but after the fsck had completed and the
third came up to the login prompt.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 15:46           ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 15:46 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc

On Tue, 2011-04-19 at 13:10 +0200, Michal Hocko wrote:
> On Mon 18-04-11 13:56:37, Andrew Morton wrote:
> > On Mon, 18 Apr 2011 12:01:31 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > Currently we have expand_upwards exported while expand_downwards is
> > > accessible only via expand_stack or expand_stack_downwards.
> > > 
> > > check_stack_guard_page is a nice example of the asymmetry. It uses
> > > expand_stack for VM_GROWSDOWN while expand_upwards is called for
> > > VM_GROWSUP case.
> > > 
> > > Let's clean this up by exporting both functions and make those name
> > > consistent. Let's use expand_stack_{upwards,downwards} so that we are
> > > explicit about stack manipulation in the name. expand_stack_downwards
> > > has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
> > > get_arg_page calls the downwards version in the early process
> > > initialization phase for growsup configuration.
> > 
> > Has this patch been tested on any stack-grows-upwards architecture?
> 
> The only one I can find in the tree is parisc and I do not have access
> to any such machine. Maybe someone on the list (CCed) can help with
> testing the patch bellow? Nevertheless, the patch doesn't change growsup
> case. It just renames functions and exports growsdown.

It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
but looks to be a slab problem (it's a null deref on a spinlock in
add_partial(), which seems unrelated to this patch).

[   15.628000] sd 1:0:2:0: [sdc] Attached SCSI disk
done.
[   16.632000] EXT3-fs: barriers not enabled
[   16.640000] kjournald starting.  Commit interval 5 seconds
[   16.640000] EXT3-fs (sda3): mounted filesystem with ordered data mode
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.
INIT: version 2.88 booting
Setting hostname to 'ion'...done.
Starting the hotplug events dispatcher: udevd[   22.008000] udev[211]: starting version 164
.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...done.
Activating swap:swapon on /dev/sda2
swapon: /dev/sda2: found swap signature: version 1, page-size 4, same byte order
swapon: /dev/sda2: pagesize=4096, swapsize=1028157440, devsize=1028160000
[   28.780000] Adding 1004056k swap on /dev/sda2.  Priority:-1 extents:1 across:1004056k 
.
Will now check root file system:fsck from util-linux-ng 2.17.2
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda3 
/dev/sda3 has been mounted 37 times without being checked, check forced.
[  257.192000] Backtrace:===========                                \ 42.8%   
[  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
[  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
[  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
[  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
[  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
[  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
[  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
[  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
[  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
[  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
[  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
[  257.192000] 
[  257.192000] 
[  257.192000] Kernel Fault: Code=26 regs=00000040bf1807d0 (Addr=0000000000000000)
[  257.192000] 
[  257.192000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[  257.192000] PSW: 00001000000001001111000000001110 Not tainted
[  257.192000] r00-03  000000ff0804f00e 0000000040769e40 0000000040214f78 0000000000000000
[  257.192000] r04-07  0000000040746e40 0000000000000001 0000004080ded370 0000000000000001
[  257.192000] r08-11  0000000040654150 0000000000000000 0000000000000001 0000000000000001
[  257.192000] r12-15  0000000000000000 00000000ffffffff 0000000000000024 0000000000000000
[  257.192000] r16-19  00000000fb4ead9c 00000000fb4eac54 0000000000000000 0000000000000000
[  257.192000] r20-23  000000000800000e 0000000000000001 000000007bbb7180 00000000401fd1a4
[  257.192000] r24-27  0000000000000001 0000004080ded370 0000000000000000 0000000040746e40
[  257.192000] r28-31  000000007ec0a908 00000040bf1807a0 00000040bf1807d0 0000000000000016
[  257.192000] sr00-03  00000000002d9000 0000000000000000 0000000000000000 00000000002d9000
[  257.192000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  257.192000] 
[  257.192000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011bbc0 000000004011bbc4
[  257.192000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
[  257.192000]  CPU:        0   CR30: 00000040bf180000 CR31: fffffff0f0e098e0
[  257.192000]  ORIG_R28: 0000000040769e40
[  257.192000]  IAOQ[0]: _raw_spin_lock+0x0/0x20
[  257.192000]  IAOQ[1]: _raw_spin_lock+0x4/0x20
[  257.192000]  RP(r2): add_partial+0x28/0x98
[  257.192000] Backtrace:
[  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
[  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
[  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
[  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
[  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
[  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
[  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
[  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
[  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
[  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
[  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
[  257.192000] 
[  257.192000] Kernel panic - not syncing: Kernel Fault
[  257.192000] Backtrace:
[  257.192000]  [<000000004011f984>] show_stack+0x14/0x20
[  257.192000]  [<000000004011f9a8>] dump_stack+0x18/0x28
[  257.192000]  [<000000004015946c>] panic+0xd4/0x368
[  257.192000]  [<0000000040120024>] parisc_terminate+0x14c/0x170
[  257.192000]  [<000000004012059c>] handle_interruption+0x2ac/0x8f8
[  257.192000]  [<000000004011bbc0>] _raw_spin_lock+0x0/0x20
[  257.192000] 
[  257.192000] Rebooting in 5 seconds..

It seems to be a random intermittent mm crash because the next reboot
crashed with the same trace but after the fsck had completed and the
third came up to the login prompt.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to
  2011-04-19 15:46           ` James Bottomley
@ 2011-04-19 16:06             ` John David Anglin
  -1 siblings, 0 replies; 210+ messages in thread
From: John David Anglin @ 2011-04-19 16:06 UTC (permalink / raw)
  To: James Bottomley; +Cc: mhocko, akpm, hughd, linux-mm, linux-kernel, linux-parisc

> It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
> but looks to be a slab problem (it's a null deref on a spinlock in
> add_partial(), which seems unrelated to this patch).

I had a somewhat similar crash Sunday with the "debian" config building GCC.
This is with 2.6.39-rc3+  without the mm patch:

mx3210 login: [12244.664000] Backtrace:
[12244.664000]  [<000000004020c9a0>] __slab_free+0x100/0x200
[12244.664000]  [<000000004020d23c>] kmem_cache_free+0xf4/0x108
[12244.668000]  [<000000001c7e9efc>] __journal_remove_journal_head+0x214/0x248 [
jbd]
[12244.668000]  [<000000001c7edd48>] journal_put_journal_head+0xc8/0x168 [jbd]
[12244.668000]  [<000000001c7e158c>] journal_invalidatepage+0x45c/0x710 [jbd]
[12244.672000]  [<000000001c8652d8>] ext3_invalidatepage+0x88/0xe8 [ext3]
[12244.672000]  [<00000000401d76f4>] do_invalidatepage+0x34/0x40
[12244.672000]  [<00000000401d7770>] truncate_inode_page+0x70/0x178
[12244.676000]  [<00000000401d798c>] truncate_inode_pages_range+0x114/0x518
[12244.676000]  [<00000000401d7da4>] truncate_inode_pages+0x14/0x20
[12244.680000]  [<000000001c86ac1c>] ext3_evict_inode+0x64/0x2a8 [ext3]
[12244.680000]  [<0000000040234424>] evict+0xac/0x1c8

[12244.692000] PSW: 00001000000001101111001100001110 Not tainted
[12244.692000] r00-03  000000ff0806f30e 0000000040745e50 000000004020c9a0 000000
01453d6000
[12244.692000] r04-07  0000000040722e50 0000000000000000 00000002bf400000 000000
001c7dc000
[12244.696000] r08-11  0000000000000002 0000000040630298 0000000000000001 000000
007bba4940
[12244.696000] r12-15  00000002bf400000 0000000000000001 0000000000000000 000000
0000000001
[12244.700000] r16-19  000000007e71d888 000000007e71d888 0000000000001000 200000
0000000081
[12244.700000] r20-23  0000000000000000 0000000040630290 0000000000000001 000000
001c7e9efc
[12244.704000] r24-27  000000000800000e 00000001453d6000 0000000000000000 000000
0040722e50
[12244.704000] r28-31  0000000000000001 000000007bba4b70 000000007bba4ba0 000000
0000000023
[12244.708000] sr00-03  000000000556c000 000000000556c000 0000000000000000 000000000556c000
[12244.708000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

[12244.712000]
[12244.712000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011b240 000000004011b244
[12244.712000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
[12244.716000]  CPU:        3   CR30: 000000007bba4000 CR31: ffffffffffffffff
[12244.716000]  ORIG_R28: 0000000000000001
[12244.720000]  IAOQ[0]: _raw_spin_lock+0x10/0x20
[12244.720000]  IAOQ[1]: _raw_spin_lock+0x14/0x20
[12244.720000]  RP(r2): __slab_free+0x100/0x200

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to
@ 2011-04-19 16:06             ` John David Anglin
  0 siblings, 0 replies; 210+ messages in thread
From: John David Anglin @ 2011-04-19 16:06 UTC (permalink / raw)
  To: James Bottomley; +Cc: mhocko, akpm, hughd, linux-mm, linux-kernel, linux-parisc

> It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
> but looks to be a slab problem (it's a null deref on a spinlock in
> add_partial(), which seems unrelated to this patch).

I had a somewhat similar crash Sunday with the "debian" config building GCC.
This is with 2.6.39-rc3+  without the mm patch:

mx3210 login: [12244.664000] Backtrace:
[12244.664000]  [<000000004020c9a0>] __slab_free+0x100/0x200
[12244.664000]  [<000000004020d23c>] kmem_cache_free+0xf4/0x108
[12244.668000]  [<000000001c7e9efc>] __journal_remove_journal_head+0x214/0x248 [
jbd]
[12244.668000]  [<000000001c7edd48>] journal_put_journal_head+0xc8/0x168 [jbd]
[12244.668000]  [<000000001c7e158c>] journal_invalidatepage+0x45c/0x710 [jbd]
[12244.672000]  [<000000001c8652d8>] ext3_invalidatepage+0x88/0xe8 [ext3]
[12244.672000]  [<00000000401d76f4>] do_invalidatepage+0x34/0x40
[12244.672000]  [<00000000401d7770>] truncate_inode_page+0x70/0x178
[12244.676000]  [<00000000401d798c>] truncate_inode_pages_range+0x114/0x518
[12244.676000]  [<00000000401d7da4>] truncate_inode_pages+0x14/0x20
[12244.680000]  [<000000001c86ac1c>] ext3_evict_inode+0x64/0x2a8 [ext3]
[12244.680000]  [<0000000040234424>] evict+0xac/0x1c8

[12244.692000] PSW: 00001000000001101111001100001110 Not tainted
[12244.692000] r00-03  000000ff0806f30e 0000000040745e50 000000004020c9a0 000000
01453d6000
[12244.692000] r04-07  0000000040722e50 0000000000000000 00000002bf400000 000000
001c7dc000
[12244.696000] r08-11  0000000000000002 0000000040630298 0000000000000001 000000
007bba4940
[12244.696000] r12-15  00000002bf400000 0000000000000001 0000000000000000 000000
0000000001
[12244.700000] r16-19  000000007e71d888 000000007e71d888 0000000000001000 200000
0000000081
[12244.700000] r20-23  0000000000000000 0000000040630290 0000000000000001 000000
001c7e9efc
[12244.704000] r24-27  000000000800000e 00000001453d6000 0000000000000000 000000
0040722e50
[12244.704000] r28-31  0000000000000001 000000007bba4b70 000000007bba4ba0 000000
0000000023
[12244.708000] sr00-03  000000000556c000 000000000556c000 0000000000000000 000000000556c000
[12244.708000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

[12244.712000]
[12244.712000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011b240 000000004011b244
[12244.712000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
[12244.716000]  CPU:        3   CR30: 000000007bba4000 CR31: ffffffffffffffff
[12244.716000]  ORIG_R28: 0000000000000001
[12244.720000]  IAOQ[0]: _raw_spin_lock+0x10/0x20
[12244.720000]  IAOQ[1]: _raw_spin_lock+0x14/0x20
[12244.720000]  RP(r2): __slab_free+0x100/0x200

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 15:46           ` James Bottomley
@ 2011-04-19 16:07             ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 16:07 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc

On Tue, 2011-04-19 at 10:46 -0500, James Bottomley wrote:
> On Tue, 2011-04-19 at 13:10 +0200, Michal Hocko wrote:
> > On Mon 18-04-11 13:56:37, Andrew Morton wrote:
> > > On Mon, 18 Apr 2011 12:01:31 +0200
> > > Michal Hocko <mhocko@suse.cz> wrote:
> > > 
> > > > Currently we have expand_upwards exported while expand_downwards is
> > > > accessible only via expand_stack or expand_stack_downwards.
> > > > 
> > > > check_stack_guard_page is a nice example of the asymmetry. It uses
> > > > expand_stack for VM_GROWSDOWN while expand_upwards is called for
> > > > VM_GROWSUP case.
> > > > 
> > > > Let's clean this up by exporting both functions and make those name
> > > > consistent. Let's use expand_stack_{upwards,downwards} so that we are
> > > > explicit about stack manipulation in the name. expand_stack_downwards
> > > > has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
> > > > get_arg_page calls the downwards version in the early process
> > > > initialization phase for growsup configuration.
> > > 
> > > Has this patch been tested on any stack-grows-upwards architecture?
> > 
> > The only one I can find in the tree is parisc and I do not have access
> > to any such machine. Maybe someone on the list (CCed) can help with
> > testing the patch bellow? Nevertheless, the patch doesn't change growsup
> > case. It just renames functions and exports growsdown.
> 
> It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
> but looks to be a slab problem (it's a null deref on a spinlock in
> add_partial(), which seems unrelated to this patch).
> 
> [   15.628000] sd 1:0:2:0: [sdc] Attached SCSI disk
> done.
> [   16.632000] EXT3-fs: barriers not enabled
> [   16.640000] kjournald starting.  Commit interval 5 seconds
> [   16.640000] EXT3-fs (sda3): mounted filesystem with ordered data mode
> Begin: Running /scripts/local-bottom ... done.
> done.
> Begin: Running /scripts/init-bottom ... done.
> INIT: version 2.88 booting
> Setting hostname to 'ion'...done.
> Starting the hotplug events dispatcher: udevd[   22.008000] udev[211]: starting version 164
> .
> Synthesizing the initial hotplug events...done.
> Waiting for /dev to be fully populated...done.
> Activating swap:swapon on /dev/sda2
> swapon: /dev/sda2: found swap signature: version 1, page-size 4, same byte order
> swapon: /dev/sda2: pagesize=4096, swapsize=1028157440, devsize=1028160000
> [   28.780000] Adding 1004056k swap on /dev/sda2.  Priority:-1 extents:1 across:1004056k 
> .
> Will now check root file system:fsck from util-linux-ng 2.17.2
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda3 
> /dev/sda3 has been mounted 37 times without being checked, check forced.
> [  257.192000] Backtrace:===========                                \ 42.8%   
> [  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
> [  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
> [  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
> [  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
> [  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
> [  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
> [  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
> [  257.192000] 
> [  257.192000] 
> [  257.192000] Kernel Fault: Code=26 regs=00000040bf1807d0 (Addr=0000000000000000)
> [  257.192000] 
> [  257.192000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [  257.192000] PSW: 00001000000001001111000000001110 Not tainted
> [  257.192000] r00-03  000000ff0804f00e 0000000040769e40 0000000040214f78 0000000000000000
> [  257.192000] r04-07  0000000040746e40 0000000000000001 0000004080ded370 0000000000000001
> [  257.192000] r08-11  0000000040654150 0000000000000000 0000000000000001 0000000000000001
> [  257.192000] r12-15  0000000000000000 00000000ffffffff 0000000000000024 0000000000000000
> [  257.192000] r16-19  00000000fb4ead9c 00000000fb4eac54 0000000000000000 0000000000000000
> [  257.192000] r20-23  000000000800000e 0000000000000001 000000007bbb7180 00000000401fd1a4
> [  257.192000] r24-27  0000000000000001 0000004080ded370 0000000000000000 0000000040746e40
> [  257.192000] r28-31  000000007ec0a908 00000040bf1807a0 00000040bf1807d0 0000000000000016
> [  257.192000] sr00-03  00000000002d9000 0000000000000000 0000000000000000 00000000002d9000
> [  257.192000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [  257.192000] 
> [  257.192000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011bbc0 000000004011bbc4
> [  257.192000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
> [  257.192000]  CPU:        0   CR30: 00000040bf180000 CR31: fffffff0f0e098e0
> [  257.192000]  ORIG_R28: 0000000040769e40
> [  257.192000]  IAOQ[0]: _raw_spin_lock+0x0/0x20
> [  257.192000]  IAOQ[1]: _raw_spin_lock+0x4/0x20
> [  257.192000]  RP(r2): add_partial+0x28/0x98
> [  257.192000] Backtrace:
> [  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
> [  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
> [  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
> [  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
> [  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
> [  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
> [  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
> [  257.192000] 
> [  257.192000] Kernel panic - not syncing: Kernel Fault
> [  257.192000] Backtrace:
> [  257.192000]  [<000000004011f984>] show_stack+0x14/0x20
> [  257.192000]  [<000000004011f9a8>] dump_stack+0x18/0x28
> [  257.192000]  [<000000004015946c>] panic+0xd4/0x368
> [  257.192000]  [<0000000040120024>] parisc_terminate+0x14c/0x170
> [  257.192000]  [<000000004012059c>] handle_interruption+0x2ac/0x8f8
> [  257.192000]  [<000000004011bbc0>] _raw_spin_lock+0x0/0x20
> [  257.192000] 
> [  257.192000] Rebooting in 5 seconds..
> 
> It seems to be a random intermittent mm crash because the next reboot
> crashed with the same trace but after the fsck had completed and the
> third came up to the login prompt.

I should add that this crash is with CONFIG_SLUB ... do you want me to
retry with CONFIG_SLAB?

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 16:07             ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 16:07 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc

On Tue, 2011-04-19 at 10:46 -0500, James Bottomley wrote:
> On Tue, 2011-04-19 at 13:10 +0200, Michal Hocko wrote:
> > On Mon 18-04-11 13:56:37, Andrew Morton wrote:
> > > On Mon, 18 Apr 2011 12:01:31 +0200
> > > Michal Hocko <mhocko@suse.cz> wrote:
> > > 
> > > > Currently we have expand_upwards exported while expand_downwards is
> > > > accessible only via expand_stack or expand_stack_downwards.
> > > > 
> > > > check_stack_guard_page is a nice example of the asymmetry. It uses
> > > > expand_stack for VM_GROWSDOWN while expand_upwards is called for
> > > > VM_GROWSUP case.
> > > > 
> > > > Let's clean this up by exporting both functions and make those name
> > > > consistent. Let's use expand_stack_{upwards,downwards} so that we are
> > > > explicit about stack manipulation in the name. expand_stack_downwards
> > > > has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because
> > > > get_arg_page calls the downwards version in the early process
> > > > initialization phase for growsup configuration.
> > > 
> > > Has this patch been tested on any stack-grows-upwards architecture?
> > 
> > The only one I can find in the tree is parisc and I do not have access
> > to any such machine. Maybe someone on the list (CCed) can help with
> > testing the patch bellow? Nevertheless, the patch doesn't change growsup
> > case. It just renames functions and exports growsdown.
> 
> It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
> but looks to be a slab problem (it's a null deref on a spinlock in
> add_partial(), which seems unrelated to this patch).
> 
> [   15.628000] sd 1:0:2:0: [sdc] Attached SCSI disk
> done.
> [   16.632000] EXT3-fs: barriers not enabled
> [   16.640000] kjournald starting.  Commit interval 5 seconds
> [   16.640000] EXT3-fs (sda3): mounted filesystem with ordered data mode
> Begin: Running /scripts/local-bottom ... done.
> done.
> Begin: Running /scripts/init-bottom ... done.
> INIT: version 2.88 booting
> Setting hostname to 'ion'...done.
> Starting the hotplug events dispatcher: udevd[   22.008000] udev[211]: starting version 164
> .
> Synthesizing the initial hotplug events...done.
> Waiting for /dev to be fully populated...done.
> Activating swap:swapon on /dev/sda2
> swapon: /dev/sda2: found swap signature: version 1, page-size 4, same byte order
> swapon: /dev/sda2: pagesize=4096, swapsize=1028157440, devsize=1028160000
> [   28.780000] Adding 1004056k swap on /dev/sda2.  Priority:-1 extents:1 across:1004056k 
> .
> Will now check root file system:fsck from util-linux-ng 2.17.2
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda3 
> /dev/sda3 has been mounted 37 times without being checked, check forced.
> [  257.192000] Backtrace:===========                                \ 42.8%   
> [  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
> [  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
> [  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
> [  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
> [  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
> [  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
> [  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
> [  257.192000] 
> [  257.192000] 
> [  257.192000] Kernel Fault: Code=26 regs=00000040bf1807d0 (Addr=0000000000000000)
> [  257.192000] 
> [  257.192000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [  257.192000] PSW: 00001000000001001111000000001110 Not tainted
> [  257.192000] r00-03  000000ff0804f00e 0000000040769e40 0000000040214f78 0000000000000000
> [  257.192000] r04-07  0000000040746e40 0000000000000001 0000004080ded370 0000000000000001
> [  257.192000] r08-11  0000000040654150 0000000000000000 0000000000000001 0000000000000001
> [  257.192000] r12-15  0000000000000000 00000000ffffffff 0000000000000024 0000000000000000
> [  257.192000] r16-19  00000000fb4ead9c 00000000fb4eac54 0000000000000000 0000000000000000
> [  257.192000] r20-23  000000000800000e 0000000000000001 000000007bbb7180 00000000401fd1a4
> [  257.192000] r24-27  0000000000000001 0000004080ded370 0000000000000000 0000000040746e40
> [  257.192000] r28-31  000000007ec0a908 00000040bf1807a0 00000040bf1807d0 0000000000000016
> [  257.192000] sr00-03  00000000002d9000 0000000000000000 0000000000000000 00000000002d9000
> [  257.192000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [  257.192000] 
> [  257.192000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011bbc0 000000004011bbc4
> [  257.192000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
> [  257.192000]  CPU:        0   CR30: 00000040bf180000 CR31: fffffff0f0e098e0
> [  257.192000]  ORIG_R28: 0000000040769e40
> [  257.192000]  IAOQ[0]: _raw_spin_lock+0x0/0x20
> [  257.192000]  IAOQ[1]: _raw_spin_lock+0x4/0x20
> [  257.192000]  RP(r2): add_partial+0x28/0x98
> [  257.192000] Backtrace:
> [  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
> [  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
> [  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
> [  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
> [  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
> [  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
> [  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
> [  257.192000] 
> [  257.192000] Kernel panic - not syncing: Kernel Fault
> [  257.192000] Backtrace:
> [  257.192000]  [<000000004011f984>] show_stack+0x14/0x20
> [  257.192000]  [<000000004011f9a8>] dump_stack+0x18/0x28
> [  257.192000]  [<000000004015946c>] panic+0xd4/0x368
> [  257.192000]  [<0000000040120024>] parisc_terminate+0x14c/0x170
> [  257.192000]  [<000000004012059c>] handle_interruption+0x2ac/0x8f8
> [  257.192000]  [<000000004011bbc0>] _raw_spin_lock+0x0/0x20
> [  257.192000] 
> [  257.192000] Rebooting in 5 seconds..
> 
> It seems to be a random intermittent mm crash because the next reboot
> crashed with the same trace but after the fsck had completed and the
> third came up to the login prompt.

I should add that this crash is with CONFIG_SLUB ... do you want me to
retry with CONFIG_SLAB?

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to
  2011-04-19 16:06             ` John David Anglin
@ 2011-04-19 16:59               ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 16:59 UTC (permalink / raw)
  To: John David Anglin
  Cc: mhocko, akpm, hughd, linux-mm, linux-kernel, linux-parisc

On Tue, 2011-04-19 at 12:06 -0400, John David Anglin wrote:
> > It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
> > but looks to be a slab problem (it's a null deref on a spinlock in
> > add_partial(), which seems unrelated to this patch).
> 
> I had a somewhat similar crash Sunday with the "debian" config building GCC.
> This is with 2.6.39-rc3+  without the mm patch:
> 
> mx3210 login: [12244.664000] Backtrace:
> [12244.664000]  [<000000004020c9a0>] __slab_free+0x100/0x200
> [12244.664000]  [<000000004020d23c>] kmem_cache_free+0xf4/0x108
> [12244.668000]  [<000000001c7e9efc>] __journal_remove_journal_head+0x214/0x248 [
> jbd]
> [12244.668000]  [<000000001c7edd48>] journal_put_journal_head+0xc8/0x168 [jbd]
> [12244.668000]  [<000000001c7e158c>] journal_invalidatepage+0x45c/0x710 [jbd]
> [12244.672000]  [<000000001c8652d8>] ext3_invalidatepage+0x88/0xe8 [ext3]
> [12244.672000]  [<00000000401d76f4>] do_invalidatepage+0x34/0x40
> [12244.672000]  [<00000000401d7770>] truncate_inode_page+0x70/0x178
> [12244.676000]  [<00000000401d798c>] truncate_inode_pages_range+0x114/0x518
> [12244.676000]  [<00000000401d7da4>] truncate_inode_pages+0x14/0x20
> [12244.680000]  [<000000001c86ac1c>] ext3_evict_inode+0x64/0x2a8 [ext3]
> [12244.680000]  [<0000000040234424>] evict+0xac/0x1c8
> 
> [12244.692000] PSW: 00001000000001101111001100001110 Not tainted
> [12244.692000] r00-03  000000ff0806f30e 0000000040745e50 000000004020c9a0 000000
> 01453d6000
> [12244.692000] r04-07  0000000040722e50 0000000000000000 00000002bf400000 000000
> 001c7dc000
> [12244.696000] r08-11  0000000000000002 0000000040630298 0000000000000001 000000
> 007bba4940
> [12244.696000] r12-15  00000002bf400000 0000000000000001 0000000000000000 000000
> 0000000001
> [12244.700000] r16-19  000000007e71d888 000000007e71d888 0000000000001000 200000
> 0000000081
> [12244.700000] r20-23  0000000000000000 0000000040630290 0000000000000001 000000
> 001c7e9efc
> [12244.704000] r24-27  000000000800000e 00000001453d6000 0000000000000000 000000
> 0040722e50
> [12244.704000] r28-31  0000000000000001 000000007bba4b70 000000007bba4ba0 000000
> 0000000023
> [12244.708000] sr00-03  000000000556c000 000000000556c000 0000000000000000 000000000556c000
> [12244.708000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 
> [12244.712000]
> [12244.712000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011b240 000000004011b244
> [12244.712000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
> [12244.716000]  CPU:        3   CR30: 000000007bba4000 CR31: ffffffffffffffff
> [12244.716000]  ORIG_R28: 0000000000000001
> [12244.720000]  IAOQ[0]: _raw_spin_lock+0x10/0x20
> [12244.720000]  IAOQ[1]: _raw_spin_lock+0x14/0x20
> [12244.720000]  RP(r2): __slab_free+0x100/0x200

Yes, it's the same crash.  Apparently get_node() is returning NULL for
some reason.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to
@ 2011-04-19 16:59               ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 16:59 UTC (permalink / raw)
  To: John David Anglin
  Cc: mhocko, akpm, hughd, linux-mm, linux-kernel, linux-parisc

On Tue, 2011-04-19 at 12:06 -0400, John David Anglin wrote:
> > It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
> > but looks to be a slab problem (it's a null deref on a spinlock in
> > add_partial(), which seems unrelated to this patch).
> 
> I had a somewhat similar crash Sunday with the "debian" config building GCC.
> This is with 2.6.39-rc3+  without the mm patch:
> 
> mx3210 login: [12244.664000] Backtrace:
> [12244.664000]  [<000000004020c9a0>] __slab_free+0x100/0x200
> [12244.664000]  [<000000004020d23c>] kmem_cache_free+0xf4/0x108
> [12244.668000]  [<000000001c7e9efc>] __journal_remove_journal_head+0x214/0x248 [
> jbd]
> [12244.668000]  [<000000001c7edd48>] journal_put_journal_head+0xc8/0x168 [jbd]
> [12244.668000]  [<000000001c7e158c>] journal_invalidatepage+0x45c/0x710 [jbd]
> [12244.672000]  [<000000001c8652d8>] ext3_invalidatepage+0x88/0xe8 [ext3]
> [12244.672000]  [<00000000401d76f4>] do_invalidatepage+0x34/0x40
> [12244.672000]  [<00000000401d7770>] truncate_inode_page+0x70/0x178
> [12244.676000]  [<00000000401d798c>] truncate_inode_pages_range+0x114/0x518
> [12244.676000]  [<00000000401d7da4>] truncate_inode_pages+0x14/0x20
> [12244.680000]  [<000000001c86ac1c>] ext3_evict_inode+0x64/0x2a8 [ext3]
> [12244.680000]  [<0000000040234424>] evict+0xac/0x1c8
> 
> [12244.692000] PSW: 00001000000001101111001100001110 Not tainted
> [12244.692000] r00-03  000000ff0806f30e 0000000040745e50 000000004020c9a0 000000
> 01453d6000
> [12244.692000] r04-07  0000000040722e50 0000000000000000 00000002bf400000 000000
> 001c7dc000
> [12244.696000] r08-11  0000000000000002 0000000040630298 0000000000000001 000000
> 007bba4940
> [12244.696000] r12-15  00000002bf400000 0000000000000001 0000000000000000 000000
> 0000000001
> [12244.700000] r16-19  000000007e71d888 000000007e71d888 0000000000001000 200000
> 0000000081
> [12244.700000] r20-23  0000000000000000 0000000040630290 0000000000000001 000000
> 001c7e9efc
> [12244.704000] r24-27  000000000800000e 00000001453d6000 0000000000000000 000000
> 0040722e50
> [12244.704000] r28-31  0000000000000001 000000007bba4b70 000000007bba4ba0 000000
> 0000000023
> [12244.708000] sr00-03  000000000556c000 000000000556c000 0000000000000000 000000000556c000
> [12244.708000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 
> [12244.712000]
> [12244.712000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011b240 000000004011b244
> [12244.712000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
> [12244.716000]  CPU:        3   CR30: 000000007bba4000 CR31: ffffffffffffffff
> [12244.716000]  ORIG_R28: 0000000000000001
> [12244.720000]  IAOQ[0]: _raw_spin_lock+0x10/0x20
> [12244.720000]  IAOQ[1]: _raw_spin_lock+0x14/0x20
> [12244.720000]  RP(r2): __slab_free+0x100/0x200

Yes, it's the same crash.  Apparently get_node() is returning NULL for
some reason.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 15:46           ` James Bottomley
  (?)
@ 2011-04-19 17:05             ` Pekka Enberg
  -1 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-19 17:05 UTC (permalink / raw)
  To: James Bottomley
  Cc: Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Christoph Lameter, David Rientjes

On Tue, Apr 19, 2011 at 6:46 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> It compiles OK, but crashes on boot in fsck. =A0The crash is definite=
ly mm
> but looks to be a slab problem (it's a null deref on a spinlock in
> add_partial(), which seems unrelated to this patch).
>
> [ =A0 15.628000] sd 1:0:2:0: [sdc] Attached SCSI disk
> done.
> [ =A0 16.632000] EXT3-fs: barriers not enabled
> [ =A0 16.640000] kjournald starting. =A0Commit interval 5 seconds
> [ =A0 16.640000] EXT3-fs (sda3): mounted filesystem with ordered data=
 mode
> Begin: Running /scripts/local-bottom ... done.
> done.
> Begin: Running /scripts/init-bottom ... done.
> INIT: version 2.88 booting
> Setting hostname to 'ion'...done.
> Starting the hotplug events dispatcher: udevd[ =A0 22.008000] udev[21=
1]: starting version 164
> .
> Synthesizing the initial hotplug events...done.
> Waiting for /dev to be fully populated...done.
> Activating swap:swapon on /dev/sda2
> swapon: /dev/sda2: found swap signature: version 1, page-size 4, same=
 byte order
> swapon: /dev/sda2: pagesize=3D4096, swapsize=3D1028157440, devsize=3D=
1028160000
> [ =A0 28.780000] Adding 1004056k swap on /dev/sda2. =A0Priority:-1 ex=
tents:1 across:1004056k
> .
> Will now check root file system:fsck from util-linux-ng 2.17.2
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda3
> /dev/sda3 has been mounted 37 times without being checked, check forc=
ed.
> [ =A0257.192000] Backtrace:=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\ 42.8%
> [ =A0257.192000] =A0[<0000000040214f78>] add_partial+0x28/0x98
> [ =A0257.192000] =A0[<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [ =A0257.192000] =A0[<000000004021825c>] kmem_cache_free+0xc4/0x128
> [ =A0257.192000] =A0[<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [ =A0257.192000] =A0[<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [ =A0257.192000] =A0[<0000000040156514>] mmput+0xd4/0x200
> [ =A0257.192000] =A0[<000000004015c7b8>] exit_mm+0x100/0x2c0
> [ =A0257.192000] =A0[<000000004015ef40>] do_exit+0x778/0x9d8
> [ =A0257.192000] =A0[<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [ =A0257.192000] =A0[<000000004015f298>] sys_exit_group+0x18/0x28
> [ =A0257.192000] =A0[<0000000040106034>] syscall_exit+0x0/0x14
> [ =A0257.192000]
> [ =A0257.192000]
> [ =A0257.192000] Kernel Fault: Code=3D26 regs=3D00000040bf1807d0 (Add=
r=3D0000000000000000)
> [ =A0257.192000]
> [ =A0257.192000] =A0 =A0 =A0YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [ =A0257.192000] PSW: 00001000000001001111000000001110 Not tainted
> [ =A0257.192000] r00-03 =A0000000ff0804f00e 0000000040769e40 00000000=
40214f78 0000000000000000
> [ =A0257.192000] r04-07 =A00000000040746e40 0000000000000001 00000040=
80ded370 0000000000000001
> [ =A0257.192000] r08-11 =A00000000040654150 0000000000000000 00000000=
00000001 0000000000000001
> [ =A0257.192000] r12-15 =A00000000000000000 00000000ffffffff 00000000=
00000024 0000000000000000
> [ =A0257.192000] r16-19 =A000000000fb4ead9c 00000000fb4eac54 00000000=
00000000 0000000000000000
> [ =A0257.192000] r20-23 =A0000000000800000e 0000000000000001 00000000=
7bbb7180 00000000401fd1a4
> [ =A0257.192000] r24-27 =A00000000000000001 0000004080ded370 00000000=
00000000 0000000040746e40
> [ =A0257.192000] r28-31 =A0000000007ec0a908 00000040bf1807a0 00000040=
bf1807d0 0000000000000016
> [ =A0257.192000] sr00-03 =A000000000002d9000 0000000000000000 0000000=
000000000 00000000002d9000
> [ =A0257.192000] sr04-07 =A00000000000000000 0000000000000000 0000000=
000000000 0000000000000000
> [ =A0257.192000]
> [ =A0257.192000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000=
004011bbc0 000000004011bbc4
> [ =A0257.192000] =A0IIR: 0f4015dc =A0 =A0ISR: 0000000000000000 =A0IOR=
: 0000000000000000
> [ =A0257.192000] =A0CPU: =A0 =A0 =A0 =A00 =A0 CR30: 00000040bf180000 =
CR31: fffffff0f0e098e0
> [ =A0257.192000] =A0ORIG_R28: 0000000040769e40
> [ =A0257.192000] =A0IAOQ[0]: _raw_spin_lock+0x0/0x20
> [ =A0257.192000] =A0IAOQ[1]: _raw_spin_lock+0x4/0x20
> [ =A0257.192000] =A0RP(r2): add_partial+0x28/0x98
> [ =A0257.192000] Backtrace:
> [ =A0257.192000] =A0[<0000000040214f78>] add_partial+0x28/0x98
> [ =A0257.192000] =A0[<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [ =A0257.192000] =A0[<000000004021825c>] kmem_cache_free+0xc4/0x128
> [ =A0257.192000] =A0[<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [ =A0257.192000] =A0[<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [ =A0257.192000] =A0[<0000000040156514>] mmput+0xd4/0x200
> [ =A0257.192000] =A0[<000000004015c7b8>] exit_mm+0x100/0x2c0
> [ =A0257.192000] =A0[<000000004015ef40>] do_exit+0x778/0x9d8
> [ =A0257.192000] =A0[<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [ =A0257.192000] =A0[<000000004015f298>] sys_exit_group+0x18/0x28
> [ =A0257.192000] =A0[<0000000040106034>] syscall_exit+0x0/0x14
> [ =A0257.192000]
> [ =A0257.192000] Kernel panic - not syncing: Kernel Fault
> [ =A0257.192000] Backtrace:
> [ =A0257.192000] =A0[<000000004011f984>] show_stack+0x14/0x20
> [ =A0257.192000] =A0[<000000004011f9a8>] dump_stack+0x18/0x28
> [ =A0257.192000] =A0[<000000004015946c>] panic+0xd4/0x368
> [ =A0257.192000] =A0[<0000000040120024>] parisc_terminate+0x14c/0x170
> [ =A0257.192000] =A0[<000000004012059c>] handle_interruption+0x2ac/0x=
8f8
> [ =A0257.192000] =A0[<000000004011bbc0>] _raw_spin_lock+0x0/0x20
> [ =A0257.192000]
> [ =A0257.192000] Rebooting in 5 seconds..
>
> It seems to be a random intermittent mm crash because the next reboot
> crashed with the same trace but after the fsck had completed and the
> third came up to the login prompt.

Looks like a genuine SLUB problem on parisc. Christoph?
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 17:05             ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-19 17:05 UTC (permalink / raw)
  To: James Bottomley
  Cc: Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Christoph Lameter, David Rientjes

On Tue, Apr 19, 2011 at 6:46 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
> but looks to be a slab problem (it's a null deref on a spinlock in
> add_partial(), which seems unrelated to this patch).
>
> [   15.628000] sd 1:0:2:0: [sdc] Attached SCSI disk
> done.
> [   16.632000] EXT3-fs: barriers not enabled
> [   16.640000] kjournald starting.  Commit interval 5 seconds
> [   16.640000] EXT3-fs (sda3): mounted filesystem with ordered data mode
> Begin: Running /scripts/local-bottom ... done.
> done.
> Begin: Running /scripts/init-bottom ... done.
> INIT: version 2.88 booting
> Setting hostname to 'ion'...done.
> Starting the hotplug events dispatcher: udevd[   22.008000] udev[211]: starting version 164
> .
> Synthesizing the initial hotplug events...done.
> Waiting for /dev to be fully populated...done.
> Activating swap:swapon on /dev/sda2
> swapon: /dev/sda2: found swap signature: version 1, page-size 4, same byte order
> swapon: /dev/sda2: pagesize=4096, swapsize=1028157440, devsize=1028160000
> [   28.780000] Adding 1004056k swap on /dev/sda2.  Priority:-1 extents:1 across:1004056k
> .
> Will now check root file system:fsck from util-linux-ng 2.17.2
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda3
> /dev/sda3 has been mounted 37 times without being checked, check forced.
> [  257.192000] Backtrace:===========                                \ 42.8%
> [  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
> [  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
> [  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
> [  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
> [  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
> [  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
> [  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
> [  257.192000]
> [  257.192000]
> [  257.192000] Kernel Fault: Code=26 regs=00000040bf1807d0 (Addr=0000000000000000)
> [  257.192000]
> [  257.192000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [  257.192000] PSW: 00001000000001001111000000001110 Not tainted
> [  257.192000] r00-03  000000ff0804f00e 0000000040769e40 0000000040214f78 0000000000000000
> [  257.192000] r04-07  0000000040746e40 0000000000000001 0000004080ded370 0000000000000001
> [  257.192000] r08-11  0000000040654150 0000000000000000 0000000000000001 0000000000000001
> [  257.192000] r12-15  0000000000000000 00000000ffffffff 0000000000000024 0000000000000000
> [  257.192000] r16-19  00000000fb4ead9c 00000000fb4eac54 0000000000000000 0000000000000000
> [  257.192000] r20-23  000000000800000e 0000000000000001 000000007bbb7180 00000000401fd1a4
> [  257.192000] r24-27  0000000000000001 0000004080ded370 0000000000000000 0000000040746e40
> [  257.192000] r28-31  000000007ec0a908 00000040bf1807a0 00000040bf1807d0 0000000000000016
> [  257.192000] sr00-03  00000000002d9000 0000000000000000 0000000000000000 00000000002d9000
> [  257.192000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [  257.192000]
> [  257.192000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011bbc0 000000004011bbc4
> [  257.192000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
> [  257.192000]  CPU:        0   CR30: 00000040bf180000 CR31: fffffff0f0e098e0
> [  257.192000]  ORIG_R28: 0000000040769e40
> [  257.192000]  IAOQ[0]: _raw_spin_lock+0x0/0x20
> [  257.192000]  IAOQ[1]: _raw_spin_lock+0x4/0x20
> [  257.192000]  RP(r2): add_partial+0x28/0x98
> [  257.192000] Backtrace:
> [  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
> [  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
> [  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
> [  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
> [  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
> [  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
> [  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
> [  257.192000]
> [  257.192000] Kernel panic - not syncing: Kernel Fault
> [  257.192000] Backtrace:
> [  257.192000]  [<000000004011f984>] show_stack+0x14/0x20
> [  257.192000]  [<000000004011f9a8>] dump_stack+0x18/0x28
> [  257.192000]  [<000000004015946c>] panic+0xd4/0x368
> [  257.192000]  [<0000000040120024>] parisc_terminate+0x14c/0x170
> [  257.192000]  [<000000004012059c>] handle_interruption+0x2ac/0x8f8
> [  257.192000]  [<000000004011bbc0>] _raw_spin_lock+0x0/0x20
> [  257.192000]
> [  257.192000] Rebooting in 5 seconds..
>
> It seems to be a random intermittent mm crash because the next reboot
> crashed with the same trace but after the fsck had completed and the
> third came up to the login prompt.

Looks like a genuine SLUB problem on parisc. Christoph?

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 17:05             ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-19 17:05 UTC (permalink / raw)
  To: James Bottomley
  Cc: Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Christoph Lameter, David Rientjes

On Tue, Apr 19, 2011 at 6:46 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
> but looks to be a slab problem (it's a null deref on a spinlock in
> add_partial(), which seems unrelated to this patch).
>
> [   15.628000] sd 1:0:2:0: [sdc] Attached SCSI disk
> done.
> [   16.632000] EXT3-fs: barriers not enabled
> [   16.640000] kjournald starting.  Commit interval 5 seconds
> [   16.640000] EXT3-fs (sda3): mounted filesystem with ordered data mode
> Begin: Running /scripts/local-bottom ... done.
> done.
> Begin: Running /scripts/init-bottom ... done.
> INIT: version 2.88 booting
> Setting hostname to 'ion'...done.
> Starting the hotplug events dispatcher: udevd[   22.008000] udev[211]: starting version 164
> .
> Synthesizing the initial hotplug events...done.
> Waiting for /dev to be fully populated...done.
> Activating swap:swapon on /dev/sda2
> swapon: /dev/sda2: found swap signature: version 1, page-size 4, same byte order
> swapon: /dev/sda2: pagesize=4096, swapsize=1028157440, devsize=1028160000
> [   28.780000] Adding 1004056k swap on /dev/sda2.  Priority:-1 extents:1 across:1004056k
> .
> Will now check root file system:fsck from util-linux-ng 2.17.2
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda3
> /dev/sda3 has been mounted 37 times without being checked, check forced.
> [  257.192000] Backtrace:===========                                \ 42.8%
> [  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
> [  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
> [  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
> [  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
> [  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
> [  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
> [  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
> [  257.192000]
> [  257.192000]
> [  257.192000] Kernel Fault: Code=26 regs=00000040bf1807d0 (Addr=0000000000000000)
> [  257.192000]
> [  257.192000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [  257.192000] PSW: 00001000000001001111000000001110 Not tainted
> [  257.192000] r00-03  000000ff0804f00e 0000000040769e40 0000000040214f78 0000000000000000
> [  257.192000] r04-07  0000000040746e40 0000000000000001 0000004080ded370 0000000000000001
> [  257.192000] r08-11  0000000040654150 0000000000000000 0000000000000001 0000000000000001
> [  257.192000] r12-15  0000000000000000 00000000ffffffff 0000000000000024 0000000000000000
> [  257.192000] r16-19  00000000fb4ead9c 00000000fb4eac54 0000000000000000 0000000000000000
> [  257.192000] r20-23  000000000800000e 0000000000000001 000000007bbb7180 00000000401fd1a4
> [  257.192000] r24-27  0000000000000001 0000004080ded370 0000000000000000 0000000040746e40
> [  257.192000] r28-31  000000007ec0a908 00000040bf1807a0 00000040bf1807d0 0000000000000016
> [  257.192000] sr00-03  00000000002d9000 0000000000000000 0000000000000000 00000000002d9000
> [  257.192000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [  257.192000]
> [  257.192000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011bbc0 000000004011bbc4
> [  257.192000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
> [  257.192000]  CPU:        0   CR30: 00000040bf180000 CR31: fffffff0f0e098e0
> [  257.192000]  ORIG_R28: 0000000040769e40
> [  257.192000]  IAOQ[0]: _raw_spin_lock+0x0/0x20
> [  257.192000]  IAOQ[1]: _raw_spin_lock+0x4/0x20
> [  257.192000]  RP(r2): add_partial+0x28/0x98
> [  257.192000] Backtrace:
> [  257.192000]  [<0000000040214f78>] add_partial+0x28/0x98
> [  257.192000]  [<0000000040217ff8>] __slab_free+0x1d0/0x1d8
> [  257.192000]  [<000000004021825c>] kmem_cache_free+0xc4/0x128
> [  257.192000]  [<00000000401fd1a4>] remove_vma+0x8c/0xc0
> [  257.192000]  [<00000000401fd3a8>] exit_mmap+0x1d0/0x220
> [  257.192000]  [<0000000040156514>] mmput+0xd4/0x200
> [  257.192000]  [<000000004015c7b8>] exit_mm+0x100/0x2c0
> [  257.192000]  [<000000004015ef40>] do_exit+0x778/0x9d8
> [  257.192000]  [<000000004015f1ec>] do_group_exit+0x4c/0xe0
> [  257.192000]  [<000000004015f298>] sys_exit_group+0x18/0x28
> [  257.192000]  [<0000000040106034>] syscall_exit+0x0/0x14
> [  257.192000]
> [  257.192000] Kernel panic - not syncing: Kernel Fault
> [  257.192000] Backtrace:
> [  257.192000]  [<000000004011f984>] show_stack+0x14/0x20
> [  257.192000]  [<000000004011f9a8>] dump_stack+0x18/0x28
> [  257.192000]  [<000000004015946c>] panic+0xd4/0x368
> [  257.192000]  [<0000000040120024>] parisc_terminate+0x14c/0x170
> [  257.192000]  [<000000004012059c>] handle_interruption+0x2ac/0x8f8
> [  257.192000]  [<000000004011bbc0>] _raw_spin_lock+0x0/0x20
> [  257.192000]
> [  257.192000] Rebooting in 5 seconds..
>
> It seems to be a random intermittent mm crash because the next reboot
> crashed with the same trace but after the fsck had completed and the
> third came up to the login prompt.

Looks like a genuine SLUB problem on parisc. Christoph?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 17:05             ` Pekka Enberg
@ 2011-04-19 17:11               ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 17:11 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Christoph Lameter, David Rientjes

On Tue, 2011-04-19 at 20:05 +0300, Pekka Enberg wrote:
> > It seems to be a random intermittent mm crash because the next reboot
> > crashed with the same trace but after the fsck had completed and the
> > third came up to the login prompt.
> 
> Looks like a genuine SLUB problem on parisc. Christoph?

Looking through the slub code, it seems to be making invalid
assumptions.  All of the node stuff is dependent on CONFIG_NUMA.
However, we're CONFIG_DISCONTIGMEM (with CONFIG_NUMA not set): on the
machines I and Dave Anglin have, our physical memory ranges are 0-1GB
and 64-65GB, so I think slub crashes when we get a page from the high
memory range ... because it's not expecting a non-zero node number.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 17:11               ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 17:11 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Christoph Lameter, David Rientjes

On Tue, 2011-04-19 at 20:05 +0300, Pekka Enberg wrote:
> > It seems to be a random intermittent mm crash because the next reboot
> > crashed with the same trace but after the fsck had completed and the
> > third came up to the login prompt.
> 
> Looks like a genuine SLUB problem on parisc. Christoph?

Looking through the slub code, it seems to be making invalid
assumptions.  All of the node stuff is dependent on CONFIG_NUMA.
However, we're CONFIG_DISCONTIGMEM (with CONFIG_NUMA not set): on the
machines I and Dave Anglin have, our physical memory ranges are 0-1GB
and 64-65GB, so I think slub crashes when we get a page from the high
memory range ... because it's not expecting a non-zero node number.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 17:05             ` Pekka Enberg
                               ` (2 preceding siblings ...)
  (?)
@ 2011-04-19 17:12             ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 17:12 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: James Bottomley, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

[-- Attachment #1: Type: TEXT/PLAIN, Size: 765 bytes --]

On Tue, 19 Apr 2011, Pekka Enberg wrote:

> On Tue, Apr 19, 2011 at 6:46 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > It compiles OK, but crashes on boot in fsck.  The crash is definitely mm
> > but looks to be a slab problem (it's a null deref on a spinlock in
> > add_partial(), which seems unrelated to this patch).

That means that the per node structures have not been setup yet. Node
hotplug not working?

> > It seems to be a random intermittent mm crash because the next reboot
> > crashed with the same trace but after the fsck had completed and the
> > third came up to the login prompt.
>
> Looks like a genuine SLUB problem on parisc. Christoph?

Race between node hotplug and use of the slab on that node?

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 17:11               ` James Bottomley
@ 2011-04-19 17:15                 ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 17:15 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> On Tue, 2011-04-19 at 20:05 +0300, Pekka Enberg wrote:
> > > It seems to be a random intermittent mm crash because the next reboot
> > > crashed with the same trace but after the fsck had completed and the
> > > third came up to the login prompt.
> >
> > Looks like a genuine SLUB problem on parisc. Christoph?
>
> Looking through the slub code, it seems to be making invalid
> assumptions.  All of the node stuff is dependent on CONFIG_NUMA.
> However, we're CONFIG_DISCONTIGMEM (with CONFIG_NUMA not set): on the
> machines I and Dave Anglin have, our physical memory ranges are 0-1GB
> and 64-65GB, so I think slub crashes when we get a page from the high
> memory range ... because it's not expecting a non-zero node number.

Right !NUMA systems only have node 0.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 17:15                 ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 17:15 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> On Tue, 2011-04-19 at 20:05 +0300, Pekka Enberg wrote:
> > > It seems to be a random intermittent mm crash because the next reboot
> > > crashed with the same trace but after the fsck had completed and the
> > > third came up to the login prompt.
> >
> > Looks like a genuine SLUB problem on parisc. Christoph?
>
> Looking through the slub code, it seems to be making invalid
> assumptions.  All of the node stuff is dependent on CONFIG_NUMA.
> However, we're CONFIG_DISCONTIGMEM (with CONFIG_NUMA not set): on the
> machines I and Dave Anglin have, our physical memory ranges are 0-1GB
> and 64-65GB, so I think slub crashes when we get a page from the high
> memory range ... because it's not expecting a non-zero node number.

Right !NUMA systems only have node 0.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 17:15                 ` Christoph Lameter
@ 2011-04-19 17:48                   ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 17:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 12:15 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > On Tue, 2011-04-19 at 20:05 +0300, Pekka Enberg wrote:
> > > > It seems to be a random intermittent mm crash because the next reboot
> > > > crashed with the same trace but after the fsck had completed and the
> > > > third came up to the login prompt.
> > >
> > > Looks like a genuine SLUB problem on parisc. Christoph?
> >
> > Looking through the slub code, it seems to be making invalid
> > assumptions.  All of the node stuff is dependent on CONFIG_NUMA.
> > However, we're CONFIG_DISCONTIGMEM (with CONFIG_NUMA not set): on the
> > machines I and Dave Anglin have, our physical memory ranges are 0-1GB
> > and 64-65GB, so I think slub crashes when we get a page from the high
> > memory range ... because it's not expecting a non-zero node number.
> 
> Right !NUMA systems only have node 0.

That's rubbish.  Discontigmem uses the nodes field to identify the
discontiguous region.  page_to_nid() returns this value.  Your code
wrongly assumes this is zero for non NUMA.

This simple program triggers the problem instantly because it forces
allocation up into the upper discontigmem range:

#include <stdlib.h>

void main(void)
{
  const long size = 1024*1024*1024;
  char *a = malloc(size);
  int i;

  for (i=0; i < size; i += 4096)
    a[i] = '\0';
}

I can fix the panic by hard coding get_nodes() to return the zero node
for the non-numa case ... however, presumably it's more than just this
that's broken in slub?

James

---

diff --git a/mm/slub.c b/mm/slub.c
index 94d2a33..243bd9c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -235,7 +235,11 @@ int slab_is_available(void)
 
 static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 {
+#ifdef CONFIG_NUMA
 	return s->node[node];
+#else
+	return s->node[0];
+#endif
 }
 
 /* Verify that a pointer has an address that is valid within a slab page */


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 17:48                   ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 17:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 12:15 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > On Tue, 2011-04-19 at 20:05 +0300, Pekka Enberg wrote:
> > > > It seems to be a random intermittent mm crash because the next reboot
> > > > crashed with the same trace but after the fsck had completed and the
> > > > third came up to the login prompt.
> > >
> > > Looks like a genuine SLUB problem on parisc. Christoph?
> >
> > Looking through the slub code, it seems to be making invalid
> > assumptions.  All of the node stuff is dependent on CONFIG_NUMA.
> > However, we're CONFIG_DISCONTIGMEM (with CONFIG_NUMA not set): on the
> > machines I and Dave Anglin have, our physical memory ranges are 0-1GB
> > and 64-65GB, so I think slub crashes when we get a page from the high
> > memory range ... because it's not expecting a non-zero node number.
> 
> Right !NUMA systems only have node 0.

That's rubbish.  Discontigmem uses the nodes field to identify the
discontiguous region.  page_to_nid() returns this value.  Your code
wrongly assumes this is zero for non NUMA.

This simple program triggers the problem instantly because it forces
allocation up into the upper discontigmem range:

#include <stdlib.h>

void main(void)
{
  const long size = 1024*1024*1024;
  char *a = malloc(size);
  int i;

  for (i=0; i < size; i += 4096)
    a[i] = '\0';
}

I can fix the panic by hard coding get_nodes() to return the zero node
for the non-numa case ... however, presumably it's more than just this
that's broken in slub?

James

---

diff --git a/mm/slub.c b/mm/slub.c
index 94d2a33..243bd9c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -235,7 +235,11 @@ int slab_is_available(void)
 
 static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 {
+#ifdef CONFIG_NUMA
 	return s->node[node];
+#else
+	return s->node[0];
+#endif
 }
 
 /* Verify that a pointer has an address that is valid within a slab page */



^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 17:48                   ` James Bottomley
@ 2011-04-19 18:10                     ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 18:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > Right !NUMA systems only have node 0.
>
> That's rubbish.  Discontigmem uses the nodes field to identify the
> discontiguous region.  page_to_nid() returns this value.  Your code
> wrongly assumes this is zero for non NUMA.

Sorry the kernel has no node awareness if you do not set CONFIG_NUMA

F.e. zone node lookups work the following way

static inline int
zone_to_nid(struct zone *zone)
{
#ifdef CONFIG_NUMA
        return zone->node;
#else
        return 0;
#endif
}

How in the world did you get a zone setup in node 1 with a !NUMA config?


The problem seems to be that the kernel seems to allow a
definition of a page_to_nid() function that returns non zero in the !NUMA
case. And slub relies on page_to_nid returning zero in the !NUMA case.
Because NODES_WIDTH should be 0 in the !NUMA case and therefore
page_to_nid must return 0.

> I can fix the panic by hard coding get_nodes() to return the zero node
> for the non-numa case ... however, presumably it's more than just this
> that's broken in slub?

If you think that is broken then we have brokenness all over the kernel
whenever we determine the node from a page and use that to do a lookup.


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 18:10                     ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 18:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > Right !NUMA systems only have node 0.
>
> That's rubbish.  Discontigmem uses the nodes field to identify the
> discontiguous region.  page_to_nid() returns this value.  Your code
> wrongly assumes this is zero for non NUMA.

Sorry the kernel has no node awareness if you do not set CONFIG_NUMA

F.e. zone node lookups work the following way

static inline int
zone_to_nid(struct zone *zone)
{
#ifdef CONFIG_NUMA
        return zone->node;
#else
        return 0;
#endif
}

How in the world did you get a zone setup in node 1 with a !NUMA config?


The problem seems to be that the kernel seems to allow a
definition of a page_to_nid() function that returns non zero in the !NUMA
case. And slub relies on page_to_nid returning zero in the !NUMA case.
Because NODES_WIDTH should be 0 in the !NUMA case and therefore
page_to_nid must return 0.

> I can fix the panic by hard coding get_nodes() to return the zero node
> for the non-numa case ... however, presumably it's more than just this
> that's broken in slub?

If you think that is broken then we have brokenness all over the kernel
whenever we determine the node from a page and use that to do a lookup.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 18:10                     ` Christoph Lameter
@ 2011-04-19 18:20                       ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 18:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 13:10 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > Right !NUMA systems only have node 0.
> >
> > That's rubbish.  Discontigmem uses the nodes field to identify the
> > discontiguous region.  page_to_nid() returns this value.  Your code
> > wrongly assumes this is zero for non NUMA.
> 
> Sorry the kernel has no node awareness if you do not set CONFIG_NUMA
> 
> F.e. zone node lookups work the following way
> 
> static inline int
> zone_to_nid(struct zone *zone)
> {
> #ifdef CONFIG_NUMA
>         return zone->node;
> #else
>         return 0;
> #endif
> }
> 
> How in the world did you get a zone setup in node 1 with a !NUMA config?

I told you ... I forced an allocation into the first discontiguous
region.  That will return 1 for page_to_nid().

> The problem seems to be that the kernel seems to allow a
> definition of a page_to_nid() function that returns non zero in the !NUMA
> case.

This is called reality, yes.

>  And slub relies on page_to_nid returning zero in the !NUMA case.
> Because NODES_WIDTH should be 0 in the !NUMA case and therefore
> page_to_nid must return 0.

right, that's what I told you: slub is broken because it's making a
wrong assumption.  Look in asm-generic/memory_model.h it shows how the
page_to_nid() is used in finding the pfn array.  DISCONTIGMEM uses some
of the numa properties (including assigning zones to the discontiguous
regions).

> > I can fix the panic by hard coding get_nodes() to return the zero node
> > for the non-numa case ... however, presumably it's more than just this
> > that's broken in slub?
> 
> If you think that is broken then we have brokenness all over the kernel
> whenever we determine the node from a page and use that to do a lookup.

Not really.  The rest of the kernel uses the proper macros.  in
DISCONTIGMEM but !NUMA configs, the numa macros expand correctly.
You've cut across that with all the CONFIG_NUMA checks in slub.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 18:20                       ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 18:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 13:10 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > Right !NUMA systems only have node 0.
> >
> > That's rubbish.  Discontigmem uses the nodes field to identify the
> > discontiguous region.  page_to_nid() returns this value.  Your code
> > wrongly assumes this is zero for non NUMA.
> 
> Sorry the kernel has no node awareness if you do not set CONFIG_NUMA
> 
> F.e. zone node lookups work the following way
> 
> static inline int
> zone_to_nid(struct zone *zone)
> {
> #ifdef CONFIG_NUMA
>         return zone->node;
> #else
>         return 0;
> #endif
> }
> 
> How in the world did you get a zone setup in node 1 with a !NUMA config?

I told you ... I forced an allocation into the first discontiguous
region.  That will return 1 for page_to_nid().

> The problem seems to be that the kernel seems to allow a
> definition of a page_to_nid() function that returns non zero in the !NUMA
> case.

This is called reality, yes.

>  And slub relies on page_to_nid returning zero in the !NUMA case.
> Because NODES_WIDTH should be 0 in the !NUMA case and therefore
> page_to_nid must return 0.

right, that's what I told you: slub is broken because it's making a
wrong assumption.  Look in asm-generic/memory_model.h it shows how the
page_to_nid() is used in finding the pfn array.  DISCONTIGMEM uses some
of the numa properties (including assigning zones to the discontiguous
regions).

> > I can fix the panic by hard coding get_nodes() to return the zero node
> > for the non-numa case ... however, presumably it's more than just this
> > that's broken in slub?
> 
> If you think that is broken then we have brokenness all over the kernel
> whenever we determine the node from a page and use that to do a lookup.

Not really.  The rest of the kernel uses the proper macros.  in
DISCONTIGMEM but !NUMA configs, the numa macros expand correctly.
You've cut across that with all the CONFIG_NUMA checks in slub.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 18:20                       ` James Bottomley
@ 2011-04-19 18:35                         ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 18:35 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > }
> >
> > How in the world did you get a zone setup in node 1 with a !NUMA config?
>
> I told you ... I forced an allocation into the first discontiguous
> region.  That will return 1 for page_to_nid().

How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you
cannot tell the page allocator to allocate from node 1.

zone_to_nid is used as a fallback mechanism for page_to_nid() and as shown
will always return 0 for !NUMA configs.

page_to_nid(x) == zone_to_nid(page_zone(x)) must hold true. It is not
here.

> > The problem seems to be that the kernel seems to allow a
> > definition of a page_to_nid() function that returns non zero in the !NUMA
> > case.
>
> This is called reality, yes.

There you have the bug. Fix that and things will work fine.

> right, that's what I told you: slub is broken because it's making a
> wrong assumption.  Look in asm-generic/memory_model.h it shows how the
> page_to_nid() is used in finding the pfn array.  DISCONTIGMEM uses some
> of the numa properties (including assigning zones to the discontiguous
> regions).

Bitrotted code? If it uses numa properties then it must use a zone field
in struct zone. So DISCONTIGMEM seems to require CONFIG_NUMA.

> > If you think that is broken then we have brokenness all over the kernel
> > whenever we determine the node from a page and use that to do a lookup.
>
> Not really.  The rest of the kernel uses the proper macros.  in
> DISCONTIGMEM but !NUMA configs, the numa macros expand correctly.
> You've cut across that with all the CONFIG_NUMA checks in slub.

What are "the proper macros"? AFAICT page_to_nid() is the proper way to
access the node of a page. If page_to_nid() returns 1 then you have a zone
that the kernel knows of as being in node 0 having a page on a different
node.

We can likely force page_to_nid to ignore the node information that have
been erroneously placed there but this looks like something deeper is
wrong here. The node field in struct page is not only used for the Linux
support of a NUMA node but also for blocks of memory. Those should be
separate things.

---
 include/linux/mm.h |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2011-04-19 13:20:20.092521248 -0500
+++ linux-2.6/include/linux/mm.h	2011-04-19 13:21:05.962521196 -0500
@@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon
 #endif
 }

+#ifdef CONFIG_NUMA
 #ifdef NODE_NOT_IN_PAGE_FLAGS
 extern int page_to_nid(struct page *page);
 #else
@@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag
 	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
 }
 #endif
+#else
+#define page_to_nid(x) 0
+#endif

 static inline struct zone *page_zone(struct page *page)
 {

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 18:35                         ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 18:35 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > }
> >
> > How in the world did you get a zone setup in node 1 with a !NUMA config?
>
> I told you ... I forced an allocation into the first discontiguous
> region.  That will return 1 for page_to_nid().

How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you
cannot tell the page allocator to allocate from node 1.

zone_to_nid is used as a fallback mechanism for page_to_nid() and as shown
will always return 0 for !NUMA configs.

page_to_nid(x) == zone_to_nid(page_zone(x)) must hold true. It is not
here.

> > The problem seems to be that the kernel seems to allow a
> > definition of a page_to_nid() function that returns non zero in the !NUMA
> > case.
>
> This is called reality, yes.

There you have the bug. Fix that and things will work fine.

> right, that's what I told you: slub is broken because it's making a
> wrong assumption.  Look in asm-generic/memory_model.h it shows how the
> page_to_nid() is used in finding the pfn array.  DISCONTIGMEM uses some
> of the numa properties (including assigning zones to the discontiguous
> regions).

Bitrotted code? If it uses numa properties then it must use a zone field
in struct zone. So DISCONTIGMEM seems to require CONFIG_NUMA.

> > If you think that is broken then we have brokenness all over the kernel
> > whenever we determine the node from a page and use that to do a lookup.
>
> Not really.  The rest of the kernel uses the proper macros.  in
> DISCONTIGMEM but !NUMA configs, the numa macros expand correctly.
> You've cut across that with all the CONFIG_NUMA checks in slub.

What are "the proper macros"? AFAICT page_to_nid() is the proper way to
access the node of a page. If page_to_nid() returns 1 then you have a zone
that the kernel knows of as being in node 0 having a page on a different
node.

We can likely force page_to_nid to ignore the node information that have
been erroneously placed there but this looks like something deeper is
wrong here. The node field in struct page is not only used for the Linux
support of a NUMA node but also for blocks of memory. Those should be
separate things.

---
 include/linux/mm.h |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2011-04-19 13:20:20.092521248 -0500
+++ linux-2.6/include/linux/mm.h	2011-04-19 13:21:05.962521196 -0500
@@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon
 #endif
 }

+#ifdef CONFIG_NUMA
 #ifdef NODE_NOT_IN_PAGE_FLAGS
 extern int page_to_nid(struct page *page);
 #else
@@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag
 	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
 }
 #endif
+#else
+#define page_to_nid(x) 0
+#endif

 static inline struct zone *page_zone(struct page *page)
 {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 18:35                         ` Christoph Lameter
@ 2011-04-19 19:49                           ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 19:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 13:35 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > }
> > >
> > > How in the world did you get a zone setup in node 1 with a !NUMA config?
> >
> > I told you ... I forced an allocation into the first discontiguous
> > region.  That will return 1 for page_to_nid().
> 
> How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you
> cannot tell the page allocator to allocate from node 1.

Yes, it does, as I explained in the email.

> zone_to_nid is used as a fallback mechanism for page_to_nid() and as shown
> will always return 0 for !NUMA configs.
> 
> page_to_nid(x) == zone_to_nid(page_zone(x)) must hold true. It is not
> here.
> 
> > > The problem seems to be that the kernel seems to allow a
> > > definition of a page_to_nid() function that returns non zero in the !NUMA
> > > case.
> >
> > This is called reality, yes.
> 
> There you have the bug. Fix that and things will work fine.

Why don't yout file the bug against reality? I'm not sure I have enough
credibility ...

> > right, that's what I told you: slub is broken because it's making a
> > wrong assumption.  Look in asm-generic/memory_model.h it shows how the
> > page_to_nid() is used in finding the pfn array.  DISCONTIGMEM uses some
> > of the numa properties (including assigning zones to the discontiguous
> > regions).
> 
> Bitrotted code?

Don't be silly: alpha, ia64, m32r, m68k, mips, parisc, tile and even x86
all use the discontigmem memory model in some configurations.

>  If it uses numa properties then it must use a zone field
> in struct zone. So DISCONTIGMEM seems to require CONFIG_NUMA.

No ... you're giving me back your assumptions.  They're not based on
what the kernel does.  CONFIG_NUMA may or may not be defined with
CONFIG_DISCONTIGMEM.

Of all the above, only x86 always had NUMA with DISCONTIGMEM.

> > > If you think that is broken then we have brokenness all over the kernel
> > > whenever we determine the node from a page and use that to do a lookup.
> >
> > Not really.  The rest of the kernel uses the proper macros.  in
> > DISCONTIGMEM but !NUMA configs, the numa macros expand correctly.
> > You've cut across that with all the CONFIG_NUMA checks in slub.
> 
> What are "the proper macros"? AFAICT page_to_nid() is the proper way to
> access the node of a page. If page_to_nid() returns 1 then you have a zone
> that the kernel knows of as being in node 0 having a page on a different
> node.

Well it depends what you want.  If you only want the actual NUMA node,
then pfn_to_nid() probably isn't what you want, because in a
DISCONTIGMEM model, there may be multiple nids per actual numa node.

> We can likely force page_to_nid to ignore the node information that have
> been erroneously placed there but this looks like something deeper is
> wrong here. The node field in struct page is not only used for the Linux
> support of a NUMA node but also for blocks of memory. Those should be
> separate things.

Look, it's not wrong, it's by design.  The assumption that non-numa
systems don't use nodes is the wrong one.

> ---
>  include/linux/mm.h |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> Index: linux-2.6/include/linux/mm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm.h	2011-04-19 13:20:20.092521248 -0500
> +++ linux-2.6/include/linux/mm.h	2011-04-19 13:21:05.962521196 -0500
> @@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon
>  #endif
>  }
> 
> +#ifdef CONFIG_NUMA
>  #ifdef NODE_NOT_IN_PAGE_FLAGS
>  extern int page_to_nid(struct page *page);
>  #else
> @@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag
>  	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
>  }
>  #endif
> +#else
> +#define page_to_nid(x) 0
> +#endif

Don't be silly ... that breaks asm-generic/memory_model.h

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 19:49                           ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 19:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 13:35 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > }
> > >
> > > How in the world did you get a zone setup in node 1 with a !NUMA config?
> >
> > I told you ... I forced an allocation into the first discontiguous
> > region.  That will return 1 for page_to_nid().
> 
> How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you
> cannot tell the page allocator to allocate from node 1.

Yes, it does, as I explained in the email.

> zone_to_nid is used as a fallback mechanism for page_to_nid() and as shown
> will always return 0 for !NUMA configs.
> 
> page_to_nid(x) == zone_to_nid(page_zone(x)) must hold true. It is not
> here.
> 
> > > The problem seems to be that the kernel seems to allow a
> > > definition of a page_to_nid() function that returns non zero in the !NUMA
> > > case.
> >
> > This is called reality, yes.
> 
> There you have the bug. Fix that and things will work fine.

Why don't yout file the bug against reality? I'm not sure I have enough
credibility ...

> > right, that's what I told you: slub is broken because it's making a
> > wrong assumption.  Look in asm-generic/memory_model.h it shows how the
> > page_to_nid() is used in finding the pfn array.  DISCONTIGMEM uses some
> > of the numa properties (including assigning zones to the discontiguous
> > regions).
> 
> Bitrotted code?

Don't be silly: alpha, ia64, m32r, m68k, mips, parisc, tile and even x86
all use the discontigmem memory model in some configurations.

>  If it uses numa properties then it must use a zone field
> in struct zone. So DISCONTIGMEM seems to require CONFIG_NUMA.

No ... you're giving me back your assumptions.  They're not based on
what the kernel does.  CONFIG_NUMA may or may not be defined with
CONFIG_DISCONTIGMEM.

Of all the above, only x86 always had NUMA with DISCONTIGMEM.

> > > If you think that is broken then we have brokenness all over the kernel
> > > whenever we determine the node from a page and use that to do a lookup.
> >
> > Not really.  The rest of the kernel uses the proper macros.  in
> > DISCONTIGMEM but !NUMA configs, the numa macros expand correctly.
> > You've cut across that with all the CONFIG_NUMA checks in slub.
> 
> What are "the proper macros"? AFAICT page_to_nid() is the proper way to
> access the node of a page. If page_to_nid() returns 1 then you have a zone
> that the kernel knows of as being in node 0 having a page on a different
> node.

Well it depends what you want.  If you only want the actual NUMA node,
then pfn_to_nid() probably isn't what you want, because in a
DISCONTIGMEM model, there may be multiple nids per actual numa node.

> We can likely force page_to_nid to ignore the node information that have
> been erroneously placed there but this looks like something deeper is
> wrong here. The node field in struct page is not only used for the Linux
> support of a NUMA node but also for blocks of memory. Those should be
> separate things.

Look, it's not wrong, it's by design.  The assumption that non-numa
systems don't use nodes is the wrong one.

> ---
>  include/linux/mm.h |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> Index: linux-2.6/include/linux/mm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm.h	2011-04-19 13:20:20.092521248 -0500
> +++ linux-2.6/include/linux/mm.h	2011-04-19 13:21:05.962521196 -0500
> @@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon
>  #endif
>  }
> 
> +#ifdef CONFIG_NUMA
>  #ifdef NODE_NOT_IN_PAGE_FLAGS
>  extern int page_to_nid(struct page *page);
>  #else
> @@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag
>  	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
>  }
>  #endif
> +#else
> +#define page_to_nid(x) 0
> +#endif

Don't be silly ... that breaks asm-generic/memory_model.h

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 19:49                           ` James Bottomley
@ 2011-04-19 20:56                             ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 20:56 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > > I told you ... I forced an allocation into the first discontiguous
> > > region.  That will return 1 for page_to_nid().
> >
> > How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you
> > cannot tell the page allocator to allocate from node 1.
>
> Yes, it does, as I explained in the email.

Looked through it and canot find it. How would that be possible to do
with core kernel calls since the page allocator calls do not allow you to
specify a node under !NUMA.

> Don't be silly: alpha, ia64, m32r, m68k, mips, parisc, tile and even x86
> all use the discontigmem memory model in some configurations.

I guess DISCONTIGMEM is typically used together with NUMA. Otherwise we
would have run into this before.

> > > Not really.  The rest of the kernel uses the proper macros.  in
> > > DISCONTIGMEM but !NUMA configs, the numa macros expand correctly.
> > > You've cut across that with all the CONFIG_NUMA checks in slub.
> >
> > What are "the proper macros"? AFAICT page_to_nid() is the proper way to
> > access the node of a page. If page_to_nid() returns 1 then you have a zone
> > that the kernel knows of as being in node 0 having a page on a different
> > node.
>
> Well it depends what you want.  If you only want the actual NUMA node,
> then pfn_to_nid() probably isn't what you want, because in a
> DISCONTIGMEM model, there may be multiple nids per actual numa node.

Right yes you got it. The notion of a node is different(!!!!!). What
matters to the core kernel is the notion of a NUMA node. If DISCONTIGMEM
runs with !NUMA then the way that "node" is used in DISCONTIGMEM is
different from the core code and refers only to the memory blocks managed
by DISCONTIGMEM. The node should be irrelevant to the core then.

> > We can likely force page_to_nid to ignore the node information that have
> > been erroneously placed there but this looks like something deeper is
> > wrong here. The node field in struct page is not only used for the Linux
> > support of a NUMA node but also for blocks of memory. Those should be
> > separate things.
>
> Look, it's not wrong, it's by design.  The assumption that non-numa
> systems don't use nodes is the wrong one.

Depends on how you define the notion of a node. The way the core kernel
uses the term "node" means that there will be only one node and that is
node 0 if CONFIG_NUMA is off. Thus page_to_nid() must return 0 for !NUMA.

All sort of things in the core code will break in weird ways if you do
allow page_to_nid to return 1 under !NUMA. Just look at the usage of
page_to_nid(). Tried to use huge pages yet? And how will your version
of reality deal with the following checks in the page allocator? F.e.

              VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone));

Enabled CONFIG_DEBUG_VM yet?


> > Index: linux-2.6/include/linux/mm.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/mm.h	2011-04-19 13:20:20.092521248 -0500
> > +++ linux-2.6/include/linux/mm.h	2011-04-19 13:21:05.962521196 -0500
> > @@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon
> >  #endif
> >  }
> >
> > +#ifdef CONFIG_NUMA
> >  #ifdef NODE_NOT_IN_PAGE_FLAGS
> >  extern int page_to_nid(struct page *page);
> >  #else
> > @@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag
> >  	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
> >  }
> >  #endif
> > +#else
> > +#define page_to_nid(x) 0
> > +#endif
>
> Don't be silly ... that breaks asm-generic/memory_model.h

Well yeah looks like in order to be clean in the !NUMA case we would then
need a page_to_discontig_node_id() there that is different from the
page_to_nid() used for the core.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 20:56                             ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 20:56 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > > I told you ... I forced an allocation into the first discontiguous
> > > region.  That will return 1 for page_to_nid().
> >
> > How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you
> > cannot tell the page allocator to allocate from node 1.
>
> Yes, it does, as I explained in the email.

Looked through it and canot find it. How would that be possible to do
with core kernel calls since the page allocator calls do not allow you to
specify a node under !NUMA.

> Don't be silly: alpha, ia64, m32r, m68k, mips, parisc, tile and even x86
> all use the discontigmem memory model in some configurations.

I guess DISCONTIGMEM is typically used together with NUMA. Otherwise we
would have run into this before.

> > > Not really.  The rest of the kernel uses the proper macros.  in
> > > DISCONTIGMEM but !NUMA configs, the numa macros expand correctly.
> > > You've cut across that with all the CONFIG_NUMA checks in slub.
> >
> > What are "the proper macros"? AFAICT page_to_nid() is the proper way to
> > access the node of a page. If page_to_nid() returns 1 then you have a zone
> > that the kernel knows of as being in node 0 having a page on a different
> > node.
>
> Well it depends what you want.  If you only want the actual NUMA node,
> then pfn_to_nid() probably isn't what you want, because in a
> DISCONTIGMEM model, there may be multiple nids per actual numa node.

Right yes you got it. The notion of a node is different(!!!!!). What
matters to the core kernel is the notion of a NUMA node. If DISCONTIGMEM
runs with !NUMA then the way that "node" is used in DISCONTIGMEM is
different from the core code and refers only to the memory blocks managed
by DISCONTIGMEM. The node should be irrelevant to the core then.

> > We can likely force page_to_nid to ignore the node information that have
> > been erroneously placed there but this looks like something deeper is
> > wrong here. The node field in struct page is not only used for the Linux
> > support of a NUMA node but also for blocks of memory. Those should be
> > separate things.
>
> Look, it's not wrong, it's by design.  The assumption that non-numa
> systems don't use nodes is the wrong one.

Depends on how you define the notion of a node. The way the core kernel
uses the term "node" means that there will be only one node and that is
node 0 if CONFIG_NUMA is off. Thus page_to_nid() must return 0 for !NUMA.

All sort of things in the core code will break in weird ways if you do
allow page_to_nid to return 1 under !NUMA. Just look at the usage of
page_to_nid(). Tried to use huge pages yet? And how will your version
of reality deal with the following checks in the page allocator? F.e.

              VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone));

Enabled CONFIG_DEBUG_VM yet?


> > Index: linux-2.6/include/linux/mm.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/mm.h	2011-04-19 13:20:20.092521248 -0500
> > +++ linux-2.6/include/linux/mm.h	2011-04-19 13:21:05.962521196 -0500
> > @@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon
> >  #endif
> >  }
> >
> > +#ifdef CONFIG_NUMA
> >  #ifdef NODE_NOT_IN_PAGE_FLAGS
> >  extern int page_to_nid(struct page *page);
> >  #else
> > @@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag
> >  	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
> >  }
> >  #endif
> > +#else
> > +#define page_to_nid(x) 0
> > +#endif
>
> Don't be silly ... that breaks asm-generic/memory_model.h

Well yeah looks like in order to be clean in the !NUMA case we would then
need a page_to_discontig_node_id() there that is different from the
page_to_nid() used for the core.


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 20:56                             ` Christoph Lameter
@ 2011-04-19 21:21                               ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 21:21 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 15:56 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > > I told you ... I forced an allocation into the first discontiguous
> > > > region.  That will return 1 for page_to_nid().
> > >
> > > How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you
> > > cannot tell the page allocator to allocate from node 1.
> >
> > Yes, it does, as I explained in the email.
> 
> Looked through it and canot find it. How would that be possible to do
> with core kernel calls since the page allocator calls do not allow you to
> specify a node under !NUMA.

it's used under DISCONTIGMEM to identify the pfn array.

> > Don't be silly: alpha, ia64, m32r, m68k, mips, parisc, tile and even x86
> > all use the discontigmem memory model in some configurations.
> 
> I guess DISCONTIGMEM is typically used together with NUMA. Otherwise we
> would have run into this before.

Which bit of my telling you that six architectures already use it this
way did you not get?  I'm not really interested in reconciling your
theories with how we currently operate.  If you want to require NUMA
with DISCONTIGMEM, fine, we'll just define SLUB as broken if that's not
true ... that will fix my boot panic reports.

James

---

diff --git a/init/Kconfig b/init/Kconfig
index 56240e7..a7ad8fb 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1226,6 +1226,7 @@ config SLAB
 	  per cpu and per node queues.
 
 config SLUB
+	depends on BROKEN || NUMA || !DISCONTIGMEM
 	bool "SLUB (Unqueued Allocator)"
 	help
 	   SLUB is a slab allocator that minimizes cache line usage

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 21:21                               ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 21:21 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 15:56 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > > I told you ... I forced an allocation into the first discontiguous
> > > > region.  That will return 1 for page_to_nid().
> > >
> > > How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you
> > > cannot tell the page allocator to allocate from node 1.
> >
> > Yes, it does, as I explained in the email.
> 
> Looked through it and canot find it. How would that be possible to do
> with core kernel calls since the page allocator calls do not allow you to
> specify a node under !NUMA.

it's used under DISCONTIGMEM to identify the pfn array.

> > Don't be silly: alpha, ia64, m32r, m68k, mips, parisc, tile and even x86
> > all use the discontigmem memory model in some configurations.
> 
> I guess DISCONTIGMEM is typically used together with NUMA. Otherwise we
> would have run into this before.

Which bit of my telling you that six architectures already use it this
way did you not get?  I'm not really interested in reconciling your
theories with how we currently operate.  If you want to require NUMA
with DISCONTIGMEM, fine, we'll just define SLUB as broken if that's not
true ... that will fix my boot panic reports.

James

---

diff --git a/init/Kconfig b/init/Kconfig
index 56240e7..a7ad8fb 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1226,6 +1226,7 @@ config SLAB
 	  per cpu and per node queues.
 
 config SLUB
+	depends on BROKEN || NUMA || !DISCONTIGMEM
 	bool "SLUB (Unqueued Allocator)"
 	help
 	   SLUB is a slab allocator that minimizes cache line usage


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 21:21                               ` James Bottomley
@ 2011-04-19 21:39                                 ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 21:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > I guess DISCONTIGMEM is typically used together with NUMA. Otherwise we
> > would have run into this before.
>
> Which bit of my telling you that six architectures already use it this
> way did you not get?  I'm not really interested in reconciling your
> theories with how we currently operate.  If you want to require NUMA
> with DISCONTIGMEM, fine, we'll just define SLUB as broken if that's not
> true ... that will fix my boot panic reports.

Which part of me telling you that you will break lots of other things in
the core kernel dont you get? If you were able to get to a command prompt
with SLAB then lets all be happy for as long as it lasts.



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 21:39                                 ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 21:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > I guess DISCONTIGMEM is typically used together with NUMA. Otherwise we
> > would have run into this before.
>
> Which bit of my telling you that six architectures already use it this
> way did you not get?  I'm not really interested in reconciling your
> theories with how we currently operate.  If you want to require NUMA
> with DISCONTIGMEM, fine, we'll just define SLUB as broken if that's not
> true ... that will fix my boot panic reports.

Which part of me telling you that you will break lots of other things in
the core kernel dont you get? If you were able to get to a command prompt
with SLAB then lets all be happy for as long as it lasts.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 21:39                                 ` Christoph Lameter
@ 2011-04-19 21:48                                   ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 21:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 16:39 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > I guess DISCONTIGMEM is typically used together with NUMA. Otherwise we
> > > would have run into this before.
> >
> > Which bit of my telling you that six architectures already use it this
> > way did you not get?  I'm not really interested in reconciling your
> > theories with how we currently operate.  If you want to require NUMA
> > with DISCONTIGMEM, fine, we'll just define SLUB as broken if that's not
> > true ... that will fix my boot panic reports.
> 
> Which part of me telling you that you will break lots of other things in
> the core kernel dont you get?

I get that you tell me this ... however, the systems that, according to
you, should be failing to get to boot prompt do, in fact, manage it.

>  If you were able to get to a command prompt
> with SLAB then lets all be happy for as long as it lasts.

We can't re-engineer DISCONTIGMEM as a bug fix, so something like this
has to be done for stable regardless.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 21:48                                   ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-19 21:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 16:39 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > I guess DISCONTIGMEM is typically used together with NUMA. Otherwise we
> > > would have run into this before.
> >
> > Which bit of my telling you that six architectures already use it this
> > way did you not get?  I'm not really interested in reconciling your
> > theories with how we currently operate.  If you want to require NUMA
> > with DISCONTIGMEM, fine, we'll just define SLUB as broken if that's not
> > true ... that will fix my boot panic reports.
> 
> Which part of me telling you that you will break lots of other things in
> the core kernel dont you get?

I get that you tell me this ... however, the systems that, according to
you, should be failing to get to boot prompt do, in fact, manage it.

>  If you were able to get to a command prompt
> with SLAB then lets all be happy for as long as it lasts.

We can't re-engineer DISCONTIGMEM as a bug fix, so something like this
has to be done for stable regardless.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 21:48                                   ` James Bottomley
@ 2011-04-19 21:58                                     ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 21:58 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > Which part of me telling you that you will break lots of other things in
> > the core kernel dont you get?
>
> I get that you tell me this ... however, the systems that, according to
> you, should be failing to get to boot prompt do, in fact, manage it.

If you dont use certain subsystems then it may work. Also do you run with
debuggin on.

The following patch is I think what would be needed to fix it.



Subject: [PATCH] Fix discontig support for !NUMA

Under NUMA discontig nodes map directly to the kernel NUMA nodes.

However, when DISCONTIG is used without NUMA then the kernel has only
one NUMA mode (==0) but within the node there may be multiple discontig pages
on various "nodes" for page struct vector management purposes.

Define a function __page_to_nid() that always extracts the node from
the page struct. This can be used in places where we need the discontig
node. Define page_to_nid() under !NUMA to always return 0. This ensures
that the various subsystems relying on page_to_nid(page) == 0 on !NUMA
function properly.

<Untested since I do not have a PARISC system. There could be
additional occurrences that need __page_to_nid>

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 include/asm-generic/memory_model.h |    2 +-
 include/linux/mm.h                 |   10 ++++++++--
 mm/sparse.c                        |    2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2011-04-19 16:43:53.822507013 -0500
+++ linux-2.6/include/linux/mm.h	2011-04-19 16:44:52.082506944 -0500
@@ -666,14 +666,20 @@ static inline int zone_to_nid(struct zon
 }

 #ifdef NODE_NOT_IN_PAGE_FLAGS
-extern int page_to_nid(struct page *page);
+extern int __page_to_nid(struct page *page);
 #else
-static inline int page_to_nid(struct page *page)
+static inline int __page_to_nid(struct page *page)
 {
 	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
 }
 #endif

+#ifdef CONFIG_NUMA
+#define page_to_nid __page_to_nid
+#else
+#define page_to_nid(x) 0
+#endif
+
 static inline struct zone *page_zone(struct page *page)
 {
 	return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
Index: linux-2.6/include/asm-generic/memory_model.h
===================================================================
--- linux-2.6.orig/include/asm-generic/memory_model.h	2011-04-19 16:45:26.772506904 -0500
+++ linux-2.6/include/asm-generic/memory_model.h	2011-04-19 16:46:02.602506861 -0500
@@ -40,7 +40,7 @@

 #define __page_to_pfn(pg)						\
 ({	struct page *__pg = (pg);					\
-	struct pglist_data *__pgdat = NODE_DATA(page_to_nid(__pg));	\
+	struct pglist_data *__pgdat = NODE_DATA(__page_to_nid(__pg));	\
 	(unsigned long)(__pg - __pgdat->node_mem_map) +			\
 	 __pgdat->node_start_pfn;					\
 })
Index: linux-2.6/mm/sparse.c
===================================================================
--- linux-2.6.orig/mm/sparse.c	2011-04-19 16:44:58.432506937 -0500
+++ linux-2.6/mm/sparse.c	2011-04-19 16:45:07.332506926 -0500
@@ -40,7 +40,7 @@ static u8 section_to_node_table[NR_MEM_S
 static u16 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
 #endif

-int page_to_nid(struct page *page)
+int __page_to_nid(struct page *page)
 {
 	return section_to_node_table[page_to_section(page)];
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-19 21:58                                     ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-19 21:58 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 19 Apr 2011, James Bottomley wrote:

> > Which part of me telling you that you will break lots of other things in
> > the core kernel dont you get?
>
> I get that you tell me this ... however, the systems that, according to
> you, should be failing to get to boot prompt do, in fact, manage it.

If you dont use certain subsystems then it may work. Also do you run with
debuggin on.

The following patch is I think what would be needed to fix it.



Subject: [PATCH] Fix discontig support for !NUMA

Under NUMA discontig nodes map directly to the kernel NUMA nodes.

However, when DISCONTIG is used without NUMA then the kernel has only
one NUMA mode (==0) but within the node there may be multiple discontig pages
on various "nodes" for page struct vector management purposes.

Define a function __page_to_nid() that always extracts the node from
the page struct. This can be used in places where we need the discontig
node. Define page_to_nid() under !NUMA to always return 0. This ensures
that the various subsystems relying on page_to_nid(page) == 0 on !NUMA
function properly.

<Untested since I do not have a PARISC system. There could be
additional occurrences that need __page_to_nid>

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 include/asm-generic/memory_model.h |    2 +-
 include/linux/mm.h                 |   10 ++++++++--
 mm/sparse.c                        |    2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2011-04-19 16:43:53.822507013 -0500
+++ linux-2.6/include/linux/mm.h	2011-04-19 16:44:52.082506944 -0500
@@ -666,14 +666,20 @@ static inline int zone_to_nid(struct zon
 }

 #ifdef NODE_NOT_IN_PAGE_FLAGS
-extern int page_to_nid(struct page *page);
+extern int __page_to_nid(struct page *page);
 #else
-static inline int page_to_nid(struct page *page)
+static inline int __page_to_nid(struct page *page)
 {
 	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
 }
 #endif

+#ifdef CONFIG_NUMA
+#define page_to_nid __page_to_nid
+#else
+#define page_to_nid(x) 0
+#endif
+
 static inline struct zone *page_zone(struct page *page)
 {
 	return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
Index: linux-2.6/include/asm-generic/memory_model.h
===================================================================
--- linux-2.6.orig/include/asm-generic/memory_model.h	2011-04-19 16:45:26.772506904 -0500
+++ linux-2.6/include/asm-generic/memory_model.h	2011-04-19 16:46:02.602506861 -0500
@@ -40,7 +40,7 @@

 #define __page_to_pfn(pg)						\
 ({	struct page *__pg = (pg);					\
-	struct pglist_data *__pgdat = NODE_DATA(page_to_nid(__pg));	\
+	struct pglist_data *__pgdat = NODE_DATA(__page_to_nid(__pg));	\
 	(unsigned long)(__pg - __pgdat->node_mem_map) +			\
 	 __pgdat->node_start_pfn;					\
 })
Index: linux-2.6/mm/sparse.c
===================================================================
--- linux-2.6.orig/mm/sparse.c	2011-04-19 16:44:58.432506937 -0500
+++ linux-2.6/mm/sparse.c	2011-04-19 16:45:07.332506926 -0500
@@ -40,7 +40,7 @@ static u8 section_to_node_table[NR_MEM_S
 static u16 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
 #endif

-int page_to_nid(struct page *page)
+int __page_to_nid(struct page *page)
 {
 	return section_to_node_table[page_to_section(page)];
 }

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
  2011-04-19 11:09           ` Michal Hocko
@ 2011-04-20  0:33             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  0:33 UTC (permalink / raw)
  To: Michal Hocko; +Cc: kosaki.motohiro, Andrew Morton, Hugh Dickins, linux-mm, LKML

> While I am in the cleanup mode. We should use VM_GROWSUP rather than
> tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.
> 
> What do you think?

Now, VM_GROWSUP share the same value with VM_NOHUGEPAGE.
(this trick use the fact that thp don't support any stack growup architecture)

So, No.
Sorry for that.





^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
@ 2011-04-20  0:33             ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  0:33 UTC (permalink / raw)
  To: Michal Hocko; +Cc: kosaki.motohiro, Andrew Morton, Hugh Dickins, linux-mm, LKML

> While I am in the cleanup mode. We should use VM_GROWSUP rather than
> tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.
> 
> What do you think?

Now, VM_GROWSUP share the same value with VM_NOHUGEPAGE.
(this trick use the fact that thp don't support any stack growup architecture)

So, No.
Sorry for that.




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 21:58                                     ` Christoph Lameter
@ 2011-04-20  1:23                                       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  1:23 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, James Bottomley, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes

> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > Which part of me telling you that you will break lots of other things in
> > > the core kernel dont you get?
> >
> > I get that you tell me this ... however, the systems that, according to
> > you, should be failing to get to boot prompt do, in fact, manage it.
> 
> If you dont use certain subsystems then it may work. Also do you run with
> debuggin on.
> 
> The following patch is I think what would be needed to fix it.

I'm worry about this patch. A lot of mm code assume !NUMA systems 
only have node 0. Not only SLUB.

I'm not sure why this unfortunate mismatch occur. but I think DISCONTIG
hacks makes less sense. Can we consider parisc turn NUMA on instead?




^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  1:23                                       ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  1:23 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, James Bottomley, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes

> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > Which part of me telling you that you will break lots of other things in
> > > the core kernel dont you get?
> >
> > I get that you tell me this ... however, the systems that, according to
> > you, should be failing to get to boot prompt do, in fact, manage it.
> 
> If you dont use certain subsystems then it may work. Also do you run with
> debuggin on.
> 
> The following patch is I think what would be needed to fix it.

I'm worry about this patch. A lot of mm code assume !NUMA systems 
only have node 0. Not only SLUB.

I'm not sure why this unfortunate mismatch occur. but I think DISCONTIG
hacks makes less sense. Can we consider parisc turn NUMA on instead?



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-19 21:58                                     ` Christoph Lameter
@ 2011-04-20  2:33                                       ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20  2:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 16:58 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > Which part of me telling you that you will break lots of other things in
> > > the core kernel dont you get?
> >
> > I get that you tell me this ... however, the systems that, according to
> > you, should be failing to get to boot prompt do, in fact, manage it.
> 
> If you dont use certain subsystems then it may work. Also do you run with
> debuggin on.
> 
> The following patch is I think what would be needed to fix it.

Not really: crashes immediately on boot

[    0.000000] FP[0] enabled: Rev 1 Model 20
[    0.000000] The 64-bit Kernel has started...
[    0.000000] bootconsole [ttyB0] enabled
[    0.000000] Initialized PDC Console for debugging.
[    0.000000] Determining PDC firmware type: 64 bit PAT.
[    0.000000] model 00008870 00000491 00000000 00000002 3e0505e7352af710 100000f0 00000008 000000b2 000000b2
[    0.000000] vers  00000301
[    0.000000] CPUID vers 20 rev 4 (0x00000284)
[    0.000000] capabilities 0x35
[    0.000000] model 9000/800/rp3440  
[    0.000000] parisc_cache_init: Only equivalent aliasing supported!
[    0.000000] Memory Ranges:
[    0.000000]  0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
[    0.000000]  1) Start 0x0000004040000000 End 0x000000407fdfffff Size   1022 MB
[    0.000000] Total Memory: 2046 MB
[    0.000000] initrd: 7f390000-7ffedf6d
[    0.000000] initrd: reserving 3f390000-3ffedf6d (mem_max 7fe00000)
[    0.000000] ------------[ cut here ]------------
[    0.000000] kernel BUG at mm/mm_init.c:127!
[    0.000000] 
[    0.000000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[    0.000000] PSW: 00001000000001001111111100001110 Not tainted
[    0.000000] r00-03  000000ff0804ff0e 000000004076a640 0000000040798c50 0000004080000000
[    0.000000] r04-07  0000000040746e40 0000000004040000 0000000000000001 0000000040654150
[    0.000000] r08-11  00000000405bd540 000000000407fe00 0000000000000001 0000000000000000
[    0.000000] r12-15  00000000405bc740 000f000000000000 00000000000001ff 000000004076a640
[    0.000000] r16-19  00000000000000ff 0000000000000000 2000000000000000 0000000000000000
[    0.000000] r20-23  0000004080000000 00000000405bd908 0000000000000000 0000000004040000
[    0.000000] r24-27  0000000000000001 0000000000000000 0000004080000000 0000000040746e40
[    0.000000] r28-31  2000000000000000 00000000405b0610 00000000405b0640 0000000000000000
[    0.000000] sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.000000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.000000] 
[    0.000000] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040798fac 0000000040798fb0
[    0.000000]  IIR: 03ffe01f    ISR: 0000000010350000  IOR: 0000010000000000
[    0.000000]  CPU:        0   CR30: 00000000405b0000 CR31: fffffff0f0e098e0
[    0.000000]  ORIG_R28: 000000000003fe00
[    0.000000]  IAOQ[0]: mminit_verify_page_links+0x84/0xa0
[    0.000000]  IAOQ[1]: mminit_verify_page_links+0x88/0xa0
[    0.000000]  RP(r2): memmap_init_zone+0x148/0x2a0
[    0.000000] Backtrace:
[    0.000000]  [<0000000040798c50>] memmap_init_zone+0x148/0x2a0
[    0.000000]  [<0000000040777ca8>] free_area_init_node+0x3c8/0x518
[    0.000000]  [<000000004076fde0>] paging_init+0x928/0xb20
[    0.000000]  [<0000000040770a48>] setup_arch+0xe8/0x120
[    0.000000]  [<000000004076c9a0>] start_kernel+0xf0/0x830
[    0.000000]  [<000000004011f4fc>] start_parisc+0xa4/0xb8
[    0.000000]  [<00000000404b0f0c>] packet_ioctl+0x1e4/0x208
[    0.000000]  [<00000000404a79d0>] unix_ioctl+0x70/0x168
[    0.000000]  [<0000000040482d24>] ip_mc_gsfget+0x14c/0x200
[    0.000000]  [<000000004046cc20>] raw_ioctl+0xe8/0x118
[    0.000000]  [<000000004044f524>] do_tcp_getsockopt+0x5c4/0x5d0
[    0.000000]  [<0000000040432d64>] netlink_getsockopt+0x15c/0x178
[    0.000000] 
[    0.000000] Backtrace:
[    0.000000]  [<000000004011f984>] show_stack+0x14/0x20
[    0.000000]  [<000000004011f9a8>] dump_stack+0x18/0x28
[    0.000000]  [<000000004012022c>] die_if_kernel+0x194/0x258
[    0.000000]  [<0000000040120b30>] handle_interruption+0x840/0x8f8
[    0.000000]  [<0000000040798fac>] mminit_verify_page_links+0x84/0xa0
[    0.000000] 
[    0.000000] ---[ end trace 139ce121c98e96c9 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Backtrace:
[    0.000000]  [<000000004011f984>] show_stack+0x14/0x20
[    0.000000]  [<000000004011f9a8>] dump_stack+0x18/0x28
[    0.000000]  [<000000004015945c>] panic+0xd4/0x368
[    0.000000]  [<000000004015f054>] do_exit+0x89c/0x9d8
[    0.000000]  [<00000000401202d4>] die_if_kernel+0x23c/0x258
[    0.000000]  [<0000000040120b30>] handle_interruption+0x840/0x8f8
[    0.000000]  [<0000000040798fac>] mminit_verify_page_links+0x84/0xa0
[    0.000000] 

There's a lot more to discontigmem than just page_to_nid ... there's the
whole pfn_to_nid() thing as well

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  2:33                                       ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20  2:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, David Rientjes

On Tue, 2011-04-19 at 16:58 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
> 
> > > Which part of me telling you that you will break lots of other things in
> > > the core kernel dont you get?
> >
> > I get that you tell me this ... however, the systems that, according to
> > you, should be failing to get to boot prompt do, in fact, manage it.
> 
> If you dont use certain subsystems then it may work. Also do you run with
> debuggin on.
> 
> The following patch is I think what would be needed to fix it.

Not really: crashes immediately on boot

[    0.000000] FP[0] enabled: Rev 1 Model 20
[    0.000000] The 64-bit Kernel has started...
[    0.000000] bootconsole [ttyB0] enabled
[    0.000000] Initialized PDC Console for debugging.
[    0.000000] Determining PDC firmware type: 64 bit PAT.
[    0.000000] model 00008870 00000491 00000000 00000002 3e0505e7352af710 100000f0 00000008 000000b2 000000b2
[    0.000000] vers  00000301
[    0.000000] CPUID vers 20 rev 4 (0x00000284)
[    0.000000] capabilities 0x35
[    0.000000] model 9000/800/rp3440  
[    0.000000] parisc_cache_init: Only equivalent aliasing supported!
[    0.000000] Memory Ranges:
[    0.000000]  0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
[    0.000000]  1) Start 0x0000004040000000 End 0x000000407fdfffff Size   1022 MB
[    0.000000] Total Memory: 2046 MB
[    0.000000] initrd: 7f390000-7ffedf6d
[    0.000000] initrd: reserving 3f390000-3ffedf6d (mem_max 7fe00000)
[    0.000000] ------------[ cut here ]------------
[    0.000000] kernel BUG at mm/mm_init.c:127!
[    0.000000] 
[    0.000000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[    0.000000] PSW: 00001000000001001111111100001110 Not tainted
[    0.000000] r00-03  000000ff0804ff0e 000000004076a640 0000000040798c50 0000004080000000
[    0.000000] r04-07  0000000040746e40 0000000004040000 0000000000000001 0000000040654150
[    0.000000] r08-11  00000000405bd540 000000000407fe00 0000000000000001 0000000000000000
[    0.000000] r12-15  00000000405bc740 000f000000000000 00000000000001ff 000000004076a640
[    0.000000] r16-19  00000000000000ff 0000000000000000 2000000000000000 0000000000000000
[    0.000000] r20-23  0000004080000000 00000000405bd908 0000000000000000 0000000004040000
[    0.000000] r24-27  0000000000000001 0000000000000000 0000004080000000 0000000040746e40
[    0.000000] r28-31  2000000000000000 00000000405b0610 00000000405b0640 0000000000000000
[    0.000000] sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.000000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.000000] 
[    0.000000] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040798fac 0000000040798fb0
[    0.000000]  IIR: 03ffe01f    ISR: 0000000010350000  IOR: 0000010000000000
[    0.000000]  CPU:        0   CR30: 00000000405b0000 CR31: fffffff0f0e098e0
[    0.000000]  ORIG_R28: 000000000003fe00
[    0.000000]  IAOQ[0]: mminit_verify_page_links+0x84/0xa0
[    0.000000]  IAOQ[1]: mminit_verify_page_links+0x88/0xa0
[    0.000000]  RP(r2): memmap_init_zone+0x148/0x2a0
[    0.000000] Backtrace:
[    0.000000]  [<0000000040798c50>] memmap_init_zone+0x148/0x2a0
[    0.000000]  [<0000000040777ca8>] free_area_init_node+0x3c8/0x518
[    0.000000]  [<000000004076fde0>] paging_init+0x928/0xb20
[    0.000000]  [<0000000040770a48>] setup_arch+0xe8/0x120
[    0.000000]  [<000000004076c9a0>] start_kernel+0xf0/0x830
[    0.000000]  [<000000004011f4fc>] start_parisc+0xa4/0xb8
[    0.000000]  [<00000000404b0f0c>] packet_ioctl+0x1e4/0x208
[    0.000000]  [<00000000404a79d0>] unix_ioctl+0x70/0x168
[    0.000000]  [<0000000040482d24>] ip_mc_gsfget+0x14c/0x200
[    0.000000]  [<000000004046cc20>] raw_ioctl+0xe8/0x118
[    0.000000]  [<000000004044f524>] do_tcp_getsockopt+0x5c4/0x5d0
[    0.000000]  [<0000000040432d64>] netlink_getsockopt+0x15c/0x178
[    0.000000] 
[    0.000000] Backtrace:
[    0.000000]  [<000000004011f984>] show_stack+0x14/0x20
[    0.000000]  [<000000004011f9a8>] dump_stack+0x18/0x28
[    0.000000]  [<000000004012022c>] die_if_kernel+0x194/0x258
[    0.000000]  [<0000000040120b30>] handle_interruption+0x840/0x8f8
[    0.000000]  [<0000000040798fac>] mminit_verify_page_links+0x84/0xa0
[    0.000000] 
[    0.000000] ---[ end trace 139ce121c98e96c9 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Backtrace:
[    0.000000]  [<000000004011f984>] show_stack+0x14/0x20
[    0.000000]  [<000000004011f9a8>] dump_stack+0x18/0x28
[    0.000000]  [<000000004015945c>] panic+0xd4/0x368
[    0.000000]  [<000000004015f054>] do_exit+0x89c/0x9d8
[    0.000000]  [<00000000401202d4>] die_if_kernel+0x23c/0x258
[    0.000000]  [<0000000040120b30>] handle_interruption+0x840/0x8f8
[    0.000000]  [<0000000040798fac>] mminit_verify_page_links+0x84/0xa0
[    0.000000] 

There's a lot more to discontigmem than just page_to_nid ... there's the
whole pfn_to_nid() thing as well

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  1:23                                       ` KOSAKI Motohiro
@ 2011-04-20  2:48                                         ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20  2:48 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Pekka Enberg, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes

On Wed, 2011-04-20 at 10:23 +0900, KOSAKI Motohiro wrote:
> > On Tue, 19 Apr 2011, James Bottomley wrote:
> > 
> > > > Which part of me telling you that you will break lots of other things in
> > > > the core kernel dont you get?
> > >
> > > I get that you tell me this ... however, the systems that, according to
> > > you, should be failing to get to boot prompt do, in fact, manage it.
> > 
> > If you dont use certain subsystems then it may work. Also do you run with
> > debuggin on.
> > 
> > The following patch is I think what would be needed to fix it.
> 
> I'm worry about this patch. A lot of mm code assume !NUMA systems 
> only have node 0. Not only SLUB.
> 
> I'm not sure why this unfortunate mismatch occur. but I think DISCONTIG
> hacks makes less sense. Can we consider parisc turn NUMA on instead?

Well, you mean a patch like this?  It won't build ... obviously we need
some more machinery

  CC      arch/parisc/kernel/asm-offsets.s
In file included from include/linux/sched.h:78,
                 from arch/parisc/kernel/asm-offsets.c:31:
include/linux/topology.h:212:2: error: #error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
In file included from include/linux/sched.h:78,
                 from arch/parisc/kernel/asm-offsets.c:31:
include/linux/topology.h: In function 'numa_node_id':
include/linux/topology.h:255: error: implicit declaration of function 'cpu_to_node'

James

---

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 69ff049..ffe4058 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -261,6 +261,9 @@ config HPUX
 	bool "Support for HP-UX binaries"
 	depends on !64BIT
 
+config NUMA
+       def_bool n
+
 config NR_CPUS
 	int "Maximum number of CPUs (2-32)"
 	range 2 32
diff --git a/mm/Kconfig b/mm/Kconfig
index e9c0c61..17a1474 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -64,6 +64,7 @@ endchoice
 config DISCONTIGMEM
 	def_bool y
 	depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) || DISCONTIGMEM_MANUAL
+	select NUMA
 
 config SPARSEMEM
 	def_bool y




^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  2:48                                         ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20  2:48 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Pekka Enberg, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes

On Wed, 2011-04-20 at 10:23 +0900, KOSAKI Motohiro wrote:
> > On Tue, 19 Apr 2011, James Bottomley wrote:
> > 
> > > > Which part of me telling you that you will break lots of other things in
> > > > the core kernel dont you get?
> > >
> > > I get that you tell me this ... however, the systems that, according to
> > > you, should be failing to get to boot prompt do, in fact, manage it.
> > 
> > If you dont use certain subsystems then it may work. Also do you run with
> > debuggin on.
> > 
> > The following patch is I think what would be needed to fix it.
> 
> I'm worry about this patch. A lot of mm code assume !NUMA systems 
> only have node 0. Not only SLUB.
> 
> I'm not sure why this unfortunate mismatch occur. but I think DISCONTIG
> hacks makes less sense. Can we consider parisc turn NUMA on instead?

Well, you mean a patch like this?  It won't build ... obviously we need
some more machinery

  CC      arch/parisc/kernel/asm-offsets.s
In file included from include/linux/sched.h:78,
                 from arch/parisc/kernel/asm-offsets.c:31:
include/linux/topology.h:212:2: error: #error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
In file included from include/linux/sched.h:78,
                 from arch/parisc/kernel/asm-offsets.c:31:
include/linux/topology.h: In function 'numa_node_id':
include/linux/topology.h:255: error: implicit declaration of function 'cpu_to_node'

James

---

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 69ff049..ffe4058 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -261,6 +261,9 @@ config HPUX
 	bool "Support for HP-UX binaries"
 	depends on !64BIT
 
+config NUMA
+       def_bool n
+
 config NR_CPUS
 	int "Maximum number of CPUs (2-32)"
 	range 2 32
diff --git a/mm/Kconfig b/mm/Kconfig
index e9c0c61..17a1474 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -64,6 +64,7 @@ endchoice
 config DISCONTIGMEM
 	def_bool y
 	depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) || DISCONTIGMEM_MANUAL
+	select NUMA
 
 config SPARSEMEM
 	def_bool y



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  2:48                                         ` James Bottomley
  (?)
@ 2011-04-20  2:57                                           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  2:57 UTC (permalink / raw)
  To: James Bottomley
  Cc: kosaki.motohiro, Christoph Lameter, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes

> On Wed, 2011-04-20 at 10:23 +0900, KOSAKI Motohiro wrote:
> > > On Tue, 19 Apr 2011, James Bottomley wrote:
> > >=20
> > > > > Which part of me telling you that you will break lots of other =
things in
> > > > > the core kernel dont you get?
> > > >
> > > > I get that you tell me this ... however, the systems that, accord=
ing to
> > > > you, should be failing to get to boot prompt do, in fact, manage =
it.
> > >=20
> > > If you dont use certain subsystems then it may work. Also do you ru=
n with
> > > debuggin on.
> > >=20
> > > The following patch is I think what would be needed to fix it.
> >=20
> > I'm worry about this patch. A lot of mm code assume !NUMA systems=20
> > only have node 0. Not only SLUB.
> >=20
> > I'm not sure why this unfortunate mismatch occur. but I think DISCONT=
IG
> > hacks makes less sense. Can we consider parisc turn NUMA on instead?
>=20
> Well, you mean a patch like this?  It won't build ... obviously we need
> some more machinery
>=20
>   CC      arch/parisc/kernel/asm-offsets.s
> In file included from include/linux/sched.h:78,
>                  from arch/parisc/kernel/asm-offsets.c:31:
> include/linux/topology.h:212:2: error: #error Please define an appropri=
ate SD_NODE_INIT in include/asm/topology.h!!!
> In file included from include/linux/sched.h:78,
>                  from arch/parisc/kernel/asm-offsets.c:31:
> include/linux/topology.h: In function 'numa_node_id':
> include/linux/topology.h:255: error: implicit declaration of function '=
cpu_to_node'

Sorry about that. I'll see more carefully the code later. Probably long
time discontig-mem uninterest made multiple level breakage. Grr. ;-)



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter=
.ca/
Don't email: <a href=3Dmailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  2:57                                           ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  2:57 UTC (permalink / raw)
  To: James Bottomley
  Cc: kosaki.motohiro, Christoph Lameter, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes

> On Wed, 2011-04-20 at 10:23 +0900, KOSAKI Motohiro wrote:
> > > On Tue, 19 Apr 2011, James Bottomley wrote:
> > > 
> > > > > Which part of me telling you that you will break lots of other things in
> > > > > the core kernel dont you get?
> > > >
> > > > I get that you tell me this ... however, the systems that, according to
> > > > you, should be failing to get to boot prompt do, in fact, manage it.
> > > 
> > > If you dont use certain subsystems then it may work. Also do you run with
> > > debuggin on.
> > > 
> > > The following patch is I think what would be needed to fix it.
> > 
> > I'm worry about this patch. A lot of mm code assume !NUMA systems 
> > only have node 0. Not only SLUB.
> > 
> > I'm not sure why this unfortunate mismatch occur. but I think DISCONTIG
> > hacks makes less sense. Can we consider parisc turn NUMA on instead?
> 
> Well, you mean a patch like this?  It won't build ... obviously we need
> some more machinery
> 
>   CC      arch/parisc/kernel/asm-offsets.s
> In file included from include/linux/sched.h:78,
>                  from arch/parisc/kernel/asm-offsets.c:31:
> include/linux/topology.h:212:2: error: #error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
> In file included from include/linux/sched.h:78,
>                  from arch/parisc/kernel/asm-offsets.c:31:
> include/linux/topology.h: In function 'numa_node_id':
> include/linux/topology.h:255: error: implicit declaration of function 'cpu_to_node'

Sorry about that. I'll see more carefully the code later. Probably long
time discontig-mem uninterest made multiple level breakage. Grr. ;-)




^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  2:57                                           ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  2:57 UTC (permalink / raw)
  To: James Bottomley
  Cc: kosaki.motohiro, Christoph Lameter, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes

> On Wed, 2011-04-20 at 10:23 +0900, KOSAKI Motohiro wrote:
> > > On Tue, 19 Apr 2011, James Bottomley wrote:
> > > 
> > > > > Which part of me telling you that you will break lots of other things in
> > > > > the core kernel dont you get?
> > > >
> > > > I get that you tell me this ... however, the systems that, according to
> > > > you, should be failing to get to boot prompt do, in fact, manage it.
> > > 
> > > If you dont use certain subsystems then it may work. Also do you run with
> > > debuggin on.
> > > 
> > > The following patch is I think what would be needed to fix it.
> > 
> > I'm worry about this patch. A lot of mm code assume !NUMA systems 
> > only have node 0. Not only SLUB.
> > 
> > I'm not sure why this unfortunate mismatch occur. but I think DISCONTIG
> > hacks makes less sense. Can we consider parisc turn NUMA on instead?
> 
> Well, you mean a patch like this?  It won't build ... obviously we need
> some more machinery
> 
>   CC      arch/parisc/kernel/asm-offsets.s
> In file included from include/linux/sched.h:78,
>                  from arch/parisc/kernel/asm-offsets.c:31:
> include/linux/topology.h:212:2: error: #error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
> In file included from include/linux/sched.h:78,
>                  from arch/parisc/kernel/asm-offsets.c:31:
> include/linux/topology.h: In function 'numa_node_id':
> include/linux/topology.h:255: error: implicit declaration of function 'cpu_to_node'

Sorry about that. I'll see more carefully the code later. Probably long
time discontig-mem uninterest made multiple level breakage. Grr. ;-)



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  1:23                                       ` KOSAKI Motohiro
@ 2011-04-20  5:53                                         ` Pekka Enberg
  -1 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-20  5:53 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, James Bottomley, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes

On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> I'm worry about this patch. A lot of mm code assume !NUMA systems
> only have node 0. Not only SLUB.

So is that a valid assumption or not? Christoph seems to think it is
and James seems to think it's not. Which way should we aim to fix it?
Would be nice if other people chimed in as we already know what James
and Christoph think.

                        Pekka

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  5:53                                         ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-20  5:53 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, James Bottomley, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes

On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> I'm worry about this patch. A lot of mm code assume !NUMA systems
> only have node 0. Not only SLUB.

So is that a valid assumption or not? Christoph seems to think it is
and James seems to think it's not. Which way should we aim to fix it?
Would be nice if other people chimed in as we already know what James
and Christoph think.

                        Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
  2011-04-20  0:33             ` KOSAKI Motohiro
@ 2011-04-20  6:59               ` Michal Hocko
  -1 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-20  6:59 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrew Morton, Hugh Dickins, linux-mm, LKML

Hi Kosaki,

On Wed 20-04-11 09:33:26, KOSAKI Motohiro wrote:
> > While I am in the cleanup mode. We should use VM_GROWSUP rather than
> > tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.
> > 
> > What do you think?
> 
> Now, VM_GROWSUP share the same value with VM_NOHUGEPAGE.
> (this trick use the fact that thp don't support any stack growup architecture)

I am not sure I understand you. AFAICS, VM_GROWSUP is defined to non 0
only if CONFIG_STACK_GROWSUP||CONFIG_IA64 (include/linux/mm.h).
And we use it to determine whether expand_stack_growsup[*] should be
defined (in include/linux/mm.h).

The patch basically unifies the way how we export expand_stack_growsup
function and how define it (in mm/mmap.c).

So either we should use CONFIG_STACK_GROWSUP||CONFIG_IA64 at both places
or we should use VM_GROWSUP trick. I am for the later one.

Am I missing something?

--- 
[*] the previous patch renamed expand_growsup to expand_stack_growsup.
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
@ 2011-04-20  6:59               ` Michal Hocko
  0 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-20  6:59 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrew Morton, Hugh Dickins, linux-mm, LKML

Hi Kosaki,

On Wed 20-04-11 09:33:26, KOSAKI Motohiro wrote:
> > While I am in the cleanup mode. We should use VM_GROWSUP rather than
> > tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.
> > 
> > What do you think?
> 
> Now, VM_GROWSUP share the same value with VM_NOHUGEPAGE.
> (this trick use the fact that thp don't support any stack growup architecture)

I am not sure I understand you. AFAICS, VM_GROWSUP is defined to non 0
only if CONFIG_STACK_GROWSUP||CONFIG_IA64 (include/linux/mm.h).
And we use it to determine whether expand_stack_growsup[*] should be
defined (in include/linux/mm.h).

The patch basically unifies the way how we export expand_stack_growsup
function and how define it (in mm/mmap.c).

So either we should use CONFIG_STACK_GROWSUP||CONFIG_IA64 at both places
or we should use VM_GROWSUP trick. I am for the later one.

Am I missing something?

--- 
[*] the previous patch renamed expand_growsup to expand_stack_growsup.
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
  2011-04-20  6:59               ` Michal Hocko
@ 2011-04-20  7:08                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  7:08 UTC (permalink / raw)
  To: Michal Hocko; +Cc: kosaki.motohiro, Andrew Morton, Hugh Dickins, linux-mm, LKML

> Hi Kosaki,
> 
> On Wed 20-04-11 09:33:26, KOSAKI Motohiro wrote:
> > > While I am in the cleanup mode. We should use VM_GROWSUP rather than
> > > tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.
> > > 
> > > What do you think?
> > 
> > Now, VM_GROWSUP share the same value with VM_NOHUGEPAGE.
> > (this trick use the fact that thp don't support any stack growup architecture)
> 
> I am not sure I understand you. AFAICS, VM_GROWSUP is defined to non 0
> only if CONFIG_STACK_GROWSUP||CONFIG_IA64 (include/linux/mm.h).
> And we use it to determine whether expand_stack_growsup[*] should be
> defined (in include/linux/mm.h).
> 
> The patch basically unifies the way how we export expand_stack_growsup
> function and how define it (in mm/mmap.c).
> 
> So either we should use CONFIG_STACK_GROWSUP||CONFIG_IA64 at both places
> or we should use VM_GROWSUP trick. I am for the later one.
> 
> Am I missing something?
> 
> --- 
> [*] the previous patch renamed expand_growsup to expand_stack_growsup.

Right you are. sorry.




^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
@ 2011-04-20  7:08                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  7:08 UTC (permalink / raw)
  To: Michal Hocko; +Cc: kosaki.motohiro, Andrew Morton, Hugh Dickins, linux-mm, LKML

> Hi Kosaki,
> 
> On Wed 20-04-11 09:33:26, KOSAKI Motohiro wrote:
> > > While I am in the cleanup mode. We should use VM_GROWSUP rather than
> > > tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.
> > > 
> > > What do you think?
> > 
> > Now, VM_GROWSUP share the same value with VM_NOHUGEPAGE.
> > (this trick use the fact that thp don't support any stack growup architecture)
> 
> I am not sure I understand you. AFAICS, VM_GROWSUP is defined to non 0
> only if CONFIG_STACK_GROWSUP||CONFIG_IA64 (include/linux/mm.h).
> And we use it to determine whether expand_stack_growsup[*] should be
> defined (in include/linux/mm.h).
> 
> The patch basically unifies the way how we export expand_stack_growsup
> function and how define it (in mm/mmap.c).
> 
> So either we should use CONFIG_STACK_GROWSUP||CONFIG_IA64 at both places
> or we should use VM_GROWSUP trick. I am for the later one.
> 
> Am I missing something?
> 
> --- 
> [*] the previous patch renamed expand_growsup to expand_stack_growsup.

Right you are. sorry.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  5:53                                         ` Pekka Enberg
@ 2011-04-20  7:15                                           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  7:15 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: kosaki.motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes

> On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
> > I'm worry about this patch. A lot of mm code assume !NUMA systems
> > only have node 0. Not only SLUB.
> 
> So is that a valid assumption or not? Christoph seems to think it is
> and James seems to think it's not. Which way should we aim to fix it?
> Would be nice if other people chimed in as we already know what James
> and Christoph think.

I'm sorry. I don't know it really. The fact was gone into historical myst. ;-)

Now, CONFIG_NUMA has mainly five meanings.

1) system may has !0 node id.
2) compile mm/mempolicy.c (ie enable mempolicy APIs)
3) Allocator (kmalloc, vmalloc, alloc_page, et al) awake NUMA topology.
4) enable zone-reclaim feature
5) scheduler makes per-node load balancing scheduler domain

Anyway, we have to fix this issue.  I'm digging which fixing way has least risk.


btw, x86 don't have an issue. Probably it's a reason why this issue was neglected
long time.

arch/x86/Kconfig
-------------------------------------
config ARCH_DISCONTIGMEM_ENABLE
        def_bool y
        depends on NUMA && X86_32




^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  7:15                                           ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  7:15 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: kosaki.motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes

> On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
> > I'm worry about this patch. A lot of mm code assume !NUMA systems
> > only have node 0. Not only SLUB.
> 
> So is that a valid assumption or not? Christoph seems to think it is
> and James seems to think it's not. Which way should we aim to fix it?
> Would be nice if other people chimed in as we already know what James
> and Christoph think.

I'm sorry. I don't know it really. The fact was gone into historical myst. ;-)

Now, CONFIG_NUMA has mainly five meanings.

1) system may has !0 node id.
2) compile mm/mempolicy.c (ie enable mempolicy APIs)
3) Allocator (kmalloc, vmalloc, alloc_page, et al) awake NUMA topology.
4) enable zone-reclaim feature
5) scheduler makes per-node load balancing scheduler domain

Anyway, we have to fix this issue.  I'm digging which fixing way has least risk.


btw, x86 don't have an issue. Probably it's a reason why this issue was neglected
long time.

arch/x86/Kconfig
-------------------------------------
config ARCH_DISCONTIGMEM_ENABLE
        def_bool y
        depends on NUMA && X86_32



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  7:15                                           ` KOSAKI Motohiro
  (?)
@ 2011-04-20  7:34                                             ` Pekka Enberg
  -1 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-20  7:34 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, James Bottomley, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

Hi!

On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> wrote:
>> > I'm worry about this patch. A lot of mm code assume !NUMA systems
>> > only have node 0. Not only SLUB.
>>
>> So is that a valid assumption or not? Christoph seems to think it is
>> and James seems to think it's not. Which way should we aim to fix it=
?
>> Would be nice if other people chimed in as we already know what Jame=
s
>> and Christoph think.

On Wed, Apr 20, 2011 at 10:15 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> I'm sorry. I don't know it really. The fact was gone into historical =
myst. ;-)
>
> Now, CONFIG_NUMA has mainly five meanings.
>
> 1) system may has !0 node id.
> 2) compile mm/mempolicy.c (ie enable mempolicy APIs)
> 3) Allocator (kmalloc, vmalloc, alloc_page, et al) awake NUMA topolog=
y.
> 4) enable zone-reclaim feature
> 5) scheduler makes per-node load balancing scheduler domain
>
> Anyway, we have to fix this issue. =A0I'm digging which fixing way ha=
s least risk.
>
>
> btw, x86 don't have an issue. Probably it's a reason why this issue w=
as neglected
> long time.
>
> arch/x86/Kconfig
> -------------------------------------
> config ARCH_DISCONTIGMEM_ENABLE
> =A0 =A0 =A0 =A0def_bool y
> =A0 =A0 =A0 =A0depends on NUMA && X86_32

That part makes me think the best option is to make parisc do
CONFIG_NUMA as well regardless of the historical intent was.

                        Pekka
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  7:34                                             ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-20  7:34 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, James Bottomley, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

Hi!

On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> wrote:
>> > I'm worry about this patch. A lot of mm code assume !NUMA systems
>> > only have node 0. Not only SLUB.
>>
>> So is that a valid assumption or not? Christoph seems to think it is
>> and James seems to think it's not. Which way should we aim to fix it?
>> Would be nice if other people chimed in as we already know what James
>> and Christoph think.

On Wed, Apr 20, 2011 at 10:15 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> I'm sorry. I don't know it really. The fact was gone into historical myst. ;-)
>
> Now, CONFIG_NUMA has mainly five meanings.
>
> 1) system may has !0 node id.
> 2) compile mm/mempolicy.c (ie enable mempolicy APIs)
> 3) Allocator (kmalloc, vmalloc, alloc_page, et al) awake NUMA topology.
> 4) enable zone-reclaim feature
> 5) scheduler makes per-node load balancing scheduler domain
>
> Anyway, we have to fix this issue.  I'm digging which fixing way has least risk.
>
>
> btw, x86 don't have an issue. Probably it's a reason why this issue was neglected
> long time.
>
> arch/x86/Kconfig
> -------------------------------------
> config ARCH_DISCONTIGMEM_ENABLE
>        def_bool y
>        depends on NUMA && X86_32

That part makes me think the best option is to make parisc do
CONFIG_NUMA as well regardless of the historical intent was.

                        Pekka

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  7:34                                             ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-20  7:34 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, James Bottomley, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

Hi!

On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> wrote:
>> > I'm worry about this patch. A lot of mm code assume !NUMA systems
>> > only have node 0. Not only SLUB.
>>
>> So is that a valid assumption or not? Christoph seems to think it is
>> and James seems to think it's not. Which way should we aim to fix it?
>> Would be nice if other people chimed in as we already know what James
>> and Christoph think.

On Wed, Apr 20, 2011 at 10:15 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> I'm sorry. I don't know it really. The fact was gone into historical myst. ;-)
>
> Now, CONFIG_NUMA has mainly five meanings.
>
> 1) system may has !0 node id.
> 2) compile mm/mempolicy.c (ie enable mempolicy APIs)
> 3) Allocator (kmalloc, vmalloc, alloc_page, et al) awake NUMA topology.
> 4) enable zone-reclaim feature
> 5) scheduler makes per-node load balancing scheduler domain
>
> Anyway, we have to fix this issue.  I'm digging which fixing way has least risk.
>
>
> btw, x86 don't have an issue. Probably it's a reason why this issue was neglected
> long time.
>
> arch/x86/Kconfig
> -------------------------------------
> config ARCH_DISCONTIGMEM_ENABLE
>        def_bool y
>        depends on NUMA && X86_32

That part makes me think the best option is to make parisc do
CONFIG_NUMA as well regardless of the historical intent was.

                        Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  7:34                                             ` Pekka Enberg
  (?)
@ 2011-04-20  8:40                                               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  8:40 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: kosaki.motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes, Ingo Molnar, x86 maintainers

> > btw, x86 don't have an issue. Probably it's a reason why this issue=
 was neglected
> > long time.
> >
> > arch/x86/Kconfig
> > -------------------------------------
> > config ARCH_DISCONTIGMEM_ENABLE
> > =A0 =A0 =A0 =A0def_bool y
> > =A0 =A0 =A0 =A0depends on NUMA && X86_32
>=20
> That part makes me think the best option is to make parisc do
> CONFIG_NUMA as well regardless of the historical intent was.
>=20
>                         Pekka

This?

compile test only.

---
 arch/parisc/Kconfig            |    7 +++++++
 include/asm-generic/topology.h |    4 ----
 include/linux/topology.h       |    2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 69ff049..0bf9ae8 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -229,6 +229,12 @@ config HOTPLUG_CPU
 	default y if SMP
 	select HOTPLUG
=20
+config NUMA
+	bool "NUMA support"
+	help
+	  Say Y to compile the kernel to support NUMA (Non-Uniform Memory
+	  Access).
+
 config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
 	depends on 64BIT
@@ -236,6 +242,7 @@ config ARCH_SELECT_MEMORY_MODEL
 config ARCH_DISCONTIGMEM_ENABLE
 	def_bool y
 	depends on 64BIT
+	depends on NUMA
=20
 config ARCH_FLATMEM_ENABLE
 	def_bool y
diff --git a/include/asm-generic/topology.h b/include/asm-generic/topol=
ogy.h
index fc824e2..932567b 100644
--- a/include/asm-generic/topology.h
+++ b/include/asm-generic/topology.h
@@ -27,8 +27,6 @@
 #ifndef _ASM_GENERIC_TOPOLOGY_H
 #define _ASM_GENERIC_TOPOLOGY_H
=20
-#ifndef	CONFIG_NUMA
-
 /* Other architectures wishing to use this simple topology API should =
fill
    in the below functions as appropriate in their own <asm/topology.h>=
 file. */
 #ifndef cpu_to_node
@@ -60,8 +58,6 @@
 				 cpumask_of_node(pcibus_to_node(bus)))
 #endif
=20
-#endif	/* CONFIG_NUMA */

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  8:40                                               ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  8:40 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: kosaki.motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes, Ingo Molnar, x86 maintainers

> > btw, x86 don't have an issue. Probably it's a reason why this issue was neglected
> > long time.
> >
> > arch/x86/Kconfig
> > -------------------------------------
> > config ARCH_DISCONTIGMEM_ENABLE
> >        def_bool y
> >        depends on NUMA && X86_32
> 
> That part makes me think the best option is to make parisc do
> CONFIG_NUMA as well regardless of the historical intent was.
> 
>                         Pekka

This?

compile test only.

---
 arch/parisc/Kconfig            |    7 +++++++
 include/asm-generic/topology.h |    4 ----
 include/linux/topology.h       |    2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 69ff049..0bf9ae8 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -229,6 +229,12 @@ config HOTPLUG_CPU
 	default y if SMP
 	select HOTPLUG
 
+config NUMA
+	bool "NUMA support"
+	help
+	  Say Y to compile the kernel to support NUMA (Non-Uniform Memory
+	  Access).
+
 config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
 	depends on 64BIT
@@ -236,6 +242,7 @@ config ARCH_SELECT_MEMORY_MODEL
 config ARCH_DISCONTIGMEM_ENABLE
 	def_bool y
 	depends on 64BIT
+	depends on NUMA
 
 config ARCH_FLATMEM_ENABLE
 	def_bool y
diff --git a/include/asm-generic/topology.h b/include/asm-generic/topology.h
index fc824e2..932567b 100644
--- a/include/asm-generic/topology.h
+++ b/include/asm-generic/topology.h
@@ -27,8 +27,6 @@
 #ifndef _ASM_GENERIC_TOPOLOGY_H
 #define _ASM_GENERIC_TOPOLOGY_H
 
-#ifndef	CONFIG_NUMA
-
 /* Other architectures wishing to use this simple topology API should fill
    in the below functions as appropriate in their own <asm/topology.h> file. */
 #ifndef cpu_to_node
@@ -60,8 +58,6 @@
 				 cpumask_of_node(pcibus_to_node(bus)))
 #endif
 
-#endif	/* CONFIG_NUMA */
-
 #if !defined(CONFIG_NUMA) || !defined(CONFIG_HAVE_MEMORYLESS_NODES)
 
 #ifndef set_numa_mem
diff --git a/include/linux/topology.h b/include/linux/topology.h
index b91a40e..e1e535b 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -209,7 +209,7 @@ int arch_update_cpu_topology(void);
 
 #ifdef CONFIG_NUMA
 #ifndef SD_NODE_INIT
-#error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
+#define SD_NODE_INIT SD_ALLNODES_INIT
 #endif
 
 #endif /* CONFIG_NUMA */
-- 
1.7.3.1





^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20  8:40                                               ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-20  8:40 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: kosaki.motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes, Ingo Molnar, x86 maintainers

> > btw, x86 don't have an issue. Probably it's a reason why this issue was neglected
> > long time.
> >
> > arch/x86/Kconfig
> > -------------------------------------
> > config ARCH_DISCONTIGMEM_ENABLE
> >        def_bool y
> >        depends on NUMA && X86_32
> 
> That part makes me think the best option is to make parisc do
> CONFIG_NUMA as well regardless of the historical intent was.
> 
>                         Pekka

This?

compile test only.

---
 arch/parisc/Kconfig            |    7 +++++++
 include/asm-generic/topology.h |    4 ----
 include/linux/topology.h       |    2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 69ff049..0bf9ae8 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -229,6 +229,12 @@ config HOTPLUG_CPU
 	default y if SMP
 	select HOTPLUG
 
+config NUMA
+	bool "NUMA support"
+	help
+	  Say Y to compile the kernel to support NUMA (Non-Uniform Memory
+	  Access).
+
 config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
 	depends on 64BIT
@@ -236,6 +242,7 @@ config ARCH_SELECT_MEMORY_MODEL
 config ARCH_DISCONTIGMEM_ENABLE
 	def_bool y
 	depends on 64BIT
+	depends on NUMA
 
 config ARCH_FLATMEM_ENABLE
 	def_bool y
diff --git a/include/asm-generic/topology.h b/include/asm-generic/topology.h
index fc824e2..932567b 100644
--- a/include/asm-generic/topology.h
+++ b/include/asm-generic/topology.h
@@ -27,8 +27,6 @@
 #ifndef _ASM_GENERIC_TOPOLOGY_H
 #define _ASM_GENERIC_TOPOLOGY_H
 
-#ifndef	CONFIG_NUMA
-
 /* Other architectures wishing to use this simple topology API should fill
    in the below functions as appropriate in their own <asm/topology.h> file. */
 #ifndef cpu_to_node
@@ -60,8 +58,6 @@
 				 cpumask_of_node(pcibus_to_node(bus)))
 #endif
 
-#endif	/* CONFIG_NUMA */
-
 #if !defined(CONFIG_NUMA) || !defined(CONFIG_HAVE_MEMORYLESS_NODES)
 
 #ifndef set_numa_mem
diff --git a/include/linux/topology.h b/include/linux/topology.h
index b91a40e..e1e535b 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -209,7 +209,7 @@ int arch_update_cpu_topology(void);
 
 #ifdef CONFIG_NUMA
 #ifndef SD_NODE_INIT
-#error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
+#define SD_NODE_INIT SD_ALLNODES_INIT
 #endif
 
 #endif /* CONFIG_NUMA */
-- 
1.7.3.1




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  7:34                                             ` Pekka Enberg
@ 2011-04-20 11:20                                               ` Matthew Wilcox
  -1 siblings, 0 replies; 210+ messages in thread
From: Matthew Wilcox @ 2011-04-20 11:20 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: KOSAKI Motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes, Ingo Molnar, x86 maintainers

On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote:
> That part makes me think the best option is to make parisc do
> CONFIG_NUMA as well regardless of the historical intent was.

But it's not just parisc.  It's six other architectures as well, some
of which aren't even SMP.  Does !SMP && NUMA make any kind of sense?

I think really, this is just a giant horrible misunderstanding on the part
of the MM people.  There's no reason why an ARM chip with 16MB of memory
at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gunk.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 11:20                                               ` Matthew Wilcox
  0 siblings, 0 replies; 210+ messages in thread
From: Matthew Wilcox @ 2011-04-20 11:20 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: KOSAKI Motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes, Ingo Molnar, x86 maintainers

On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote:
> That part makes me think the best option is to make parisc do
> CONFIG_NUMA as well regardless of the historical intent was.

But it's not just parisc.  It's six other architectures as well, some
of which aren't even SMP.  Does !SMP && NUMA make any kind of sense?

I think really, this is just a giant horrible misunderstanding on the part
of the MM people.  There's no reason why an ARM chip with 16MB of memory
at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gunk.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 11:20                                               ` Matthew Wilcox
  (?)
@ 2011-04-20 11:28                                                 ` Pekka Enberg
  -1 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-20 11:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: KOSAKI Motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes, Ingo Molnar, x86 maintainers

Hi Matthew,

On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote:
>> That part makes me think the best option is to make parisc do
>> CONFIG_NUMA as well regardless of the historical intent was.

On Wed, Apr 20, 2011 at 2:20 PM, Matthew Wilcox <matthew@wil.cx> wrote:
> But it's not just parisc. =A0It's six other architectures as well, so=
me
> of which aren't even SMP. =A0Does !SMP && NUMA make any kind of sense=
?

IIRC, we actually fixed SLAB or SLUB to work on such configs in the pas=
t.

On Wed, Apr 20, 2011 at 2:20 PM, Matthew Wilcox <matthew@wil.cx> wrote:
> I think really, this is just a giant horrible misunderstanding on the=
 part
> of the MM people. =A0There's no reason why an ARM chip with 16MB of m=
emory
> at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gu=
nk.

Right. My point was simply that since x86 doesn't support DISCONTIGMEM
without NUMA, the misunderstanding is likely very wide-spread.

                       Pekka
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 11:28                                                 ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-20 11:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: KOSAKI Motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes, Ingo Molnar, x86 maintainers

Hi Matthew,

On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote:
>> That part makes me think the best option is to make parisc do
>> CONFIG_NUMA as well regardless of the historical intent was.

On Wed, Apr 20, 2011 at 2:20 PM, Matthew Wilcox <matthew@wil.cx> wrote:
> But it's not just parisc.  It's six other architectures as well, some
> of which aren't even SMP.  Does !SMP && NUMA make any kind of sense?

IIRC, we actually fixed SLAB or SLUB to work on such configs in the past.

On Wed, Apr 20, 2011 at 2:20 PM, Matthew Wilcox <matthew@wil.cx> wrote:
> I think really, this is just a giant horrible misunderstanding on the part
> of the MM people.  There's no reason why an ARM chip with 16MB of memory
> at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gunk.

Right. My point was simply that since x86 doesn't support DISCONTIGMEM
without NUMA, the misunderstanding is likely very wide-spread.

                       Pekka

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 11:28                                                 ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-20 11:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: KOSAKI Motohiro, Christoph Lameter, James Bottomley,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, David Rientjes, Ingo Molnar, x86 maintainers

Hi Matthew,

On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote:
>> That part makes me think the best option is to make parisc do
>> CONFIG_NUMA as well regardless of the historical intent was.

On Wed, Apr 20, 2011 at 2:20 PM, Matthew Wilcox <matthew@wil.cx> wrote:
> But it's not just parisc.  It's six other architectures as well, some
> of which aren't even SMP.  Does !SMP && NUMA make any kind of sense?

IIRC, we actually fixed SLAB or SLUB to work on such configs in the past.

On Wed, Apr 20, 2011 at 2:20 PM, Matthew Wilcox <matthew@wil.cx> wrote:
> I think really, this is just a giant horrible misunderstanding on the part
> of the MM people.  There's no reason why an ARM chip with 16MB of memory
> at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gunk.

Right. My point was simply that since x86 doesn't support DISCONTIGMEM
without NUMA, the misunderstanding is likely very wide-spread.

                       Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  2:57                                           ` KOSAKI Motohiro
@ 2011-04-20 13:50                                             ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 13:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: James Bottomley, Pekka Enberg, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Tejun Heo

On Wed, 20 Apr 2011, KOSAKI Motohiro wrote:

> >                  from arch/parisc/kernel/asm-offsets.c:31:
> > include/linux/topology.h: In function 'numa_node_id':
> > include/linux/topology.h:255: error: implicit declaration of function 'cpu_to_node'
>
> Sorry about that. I'll see more carefully the code later. Probably long
> time discontig-mem uninterest made multiple level breakage. Grr. ;-)

True. Someone needs to go through discontig and make it work right with a
!NUMA configuration. Many pieces of the core code assume that there will
be no node on a !NUMA config today. I guess that was different in ages
past.

Maybe we should make DISCONTIG broken under !NUMA until that time?

Tejon was working on getting rid of DISCONTIG. SPARSEMEM is the favored
alternative today. So we could potentially change the arches to use SPARSE
configs in the !NUMA case.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 13:50                                             ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 13:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: James Bottomley, Pekka Enberg, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Tejun Heo

On Wed, 20 Apr 2011, KOSAKI Motohiro wrote:

> >                  from arch/parisc/kernel/asm-offsets.c:31:
> > include/linux/topology.h: In function 'numa_node_id':
> > include/linux/topology.h:255: error: implicit declaration of function 'cpu_to_node'
>
> Sorry about that. I'll see more carefully the code later. Probably long
> time discontig-mem uninterest made multiple level breakage. Grr. ;-)

True. Someone needs to go through discontig and make it work right with a
!NUMA configuration. Many pieces of the core code assume that there will
be no node on a !NUMA config today. I guess that was different in ages
past.

Maybe we should make DISCONTIG broken under !NUMA until that time?

Tejon was working on getting rid of DISCONTIG. SPARSEMEM is the favored
alternative today. So we could potentially change the arches to use SPARSE
configs in the !NUMA case.



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 11:20                                               ` Matthew Wilcox
@ 2011-04-20 13:58                                                 ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 13:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Pekka Enberg, KOSAKI Motohiro, James Bottomley, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, Mel Gorman

On Wed, 20 Apr 2011, Matthew Wilcox wrote:

> On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote:
> > That part makes me think the best option is to make parisc do
> > CONFIG_NUMA as well regardless of the historical intent was.
>
> But it's not just parisc.  It's six other architectures as well, some
> of which aren't even SMP.  Does !SMP && NUMA make any kind of sense?

Of course not.

> I think really, this is just a giant horrible misunderstanding on the part
> of the MM people.  There's no reason why an ARM chip with 16MB of memory
> at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gunk.

DISCONTIG has fallen out of favor in the last years. SPARSEMEM has largely
replaced it. ARM uses that and does not suffer from these issue.

No one considered the issues of having a !NUMA configuration with
nodes (which DISCONTIG seems to create) when developing core code in the
last years. The implicit assumption has always been that page_to_nid(x)
etc is always zero on a !NUMA configuration.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 13:58                                                 ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 13:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Pekka Enberg, KOSAKI Motohiro, James Bottomley, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, Mel Gorman

On Wed, 20 Apr 2011, Matthew Wilcox wrote:

> On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote:
> > That part makes me think the best option is to make parisc do
> > CONFIG_NUMA as well regardless of the historical intent was.
>
> But it's not just parisc.  It's six other architectures as well, some
> of which aren't even SMP.  Does !SMP && NUMA make any kind of sense?

Of course not.

> I think really, this is just a giant horrible misunderstanding on the part
> of the MM people.  There's no reason why an ARM chip with 16MB of memory
> at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gunk.

DISCONTIG has fallen out of favor in the last years. SPARSEMEM has largely
replaced it. ARM uses that and does not suffer from these issue.

No one considered the issues of having a !NUMA configuration with
nodes (which DISCONTIG seems to create) when developing core code in the
last years. The implicit assumption has always been that page_to_nid(x)
etc is always zero on a !NUMA configuration.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  7:34                                             ` Pekka Enberg
@ 2011-04-20 14:07                                               ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 14:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: KOSAKI Motohiro, James Bottomley, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers, Tejun Heo

On Wed, 20 Apr 2011, Pekka Enberg wrote:

> That part makes me think the best option is to make parisc do
> CONFIG_NUMA as well regardless of the historical intent was.

Another possilibity is to use SPARSEMEM instead? We can do the same for
the other arches that we have done to x86.



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 14:07                                               ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 14:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: KOSAKI Motohiro, James Bottomley, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers, Tejun Heo

On Wed, 20 Apr 2011, Pekka Enberg wrote:

> That part makes me think the best option is to make parisc do
> CONFIG_NUMA as well regardless of the historical intent was.

Another possilibity is to use SPARSEMEM instead? We can do the same for
the other arches that we have done to x86.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 11:28                                                 ` Pekka Enberg
@ 2011-04-20 14:15                                                   ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 14:15 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Matthew Wilcox, KOSAKI Motohiro, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch

[added linux-arch to cc since we're going to be affecting them]
On Wed, 2011-04-20 at 14:28 +0300, Pekka Enberg wrote:
> Right. My point was simply that since x86 doesn't support DISCONTIGMEM
> without NUMA, the misunderstanding is likely very wide-spread.

Why don't we approach the problem in a few separate ways then. 

     1. We can look at what imposing NUMA on the DISCONTIGMEM archs
        would do ... the embedded ones are going to be hardest hit, but
        if it's not too much extra code, it might be palatable.
     2. The other is that we can audit mm to look at all the node
        assumptions in the non-numa case.  My suspicion is that
        accidentally or otherwise, it mostly works for the normal case,
        so there might not be much needed to pull it back to working
        properly for DISCONTIGMEM.
     3. Finally we could look at deprecating DISCONTIGMEM in favour of
        SPARSEMEM, but we'd still need to fix -stable for that case.
        Especially as it will take time to convert all the architectures

I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
framework which allows machines with split physical memory ranges to
work.  That's a very common case nowadays.  Numa is supposed to be a
heavyweight framework to preserve node locality for non-uniform memory
access boxes (which none of the DISCONTIGMEM && !NUMA systems are).

James

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 14:15                                                   ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 14:15 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Matthew Wilcox, KOSAKI Motohiro, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch

[added linux-arch to cc since we're going to be affecting them]
On Wed, 2011-04-20 at 14:28 +0300, Pekka Enberg wrote:
> Right. My point was simply that since x86 doesn't support DISCONTIGMEM
> without NUMA, the misunderstanding is likely very wide-spread.

Why don't we approach the problem in a few separate ways then. 

     1. We can look at what imposing NUMA on the DISCONTIGMEM archs
        would do ... the embedded ones are going to be hardest hit, but
        if it's not too much extra code, it might be palatable.
     2. The other is that we can audit mm to look at all the node
        assumptions in the non-numa case.  My suspicion is that
        accidentally or otherwise, it mostly works for the normal case,
        so there might not be much needed to pull it back to working
        properly for DISCONTIGMEM.
     3. Finally we could look at deprecating DISCONTIGMEM in favour of
        SPARSEMEM, but we'd still need to fix -stable for that case.
        Especially as it will take time to convert all the architectures

I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
framework which allows machines with split physical memory ranges to
work.  That's a very common case nowadays.  Numa is supposed to be a
heavyweight framework to preserve node locality for non-uniform memory
access boxes (which none of the DISCONTIGMEM && !NUMA systems are).

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 14:15                                                   ` James Bottomley
@ 2011-04-20 14:50                                                     ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 14:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 20 Apr 2011, James Bottomley wrote:

>      1. We can look at what imposing NUMA on the DISCONTIGMEM archs
>         would do ... the embedded ones are going to be hardest hit, but
>         if it's not too much extra code, it might be palatable.
>      2. The other is that we can audit mm to look at all the node
>         assumptions in the non-numa case.  My suspicion is that
>         accidentally or otherwise, it mostly works for the normal case,
>         so there might not be much needed to pull it back to working
>         properly for DISCONTIGMEM.

The older code may work. SLAB f.e. does not call page_to_nid() in the
!NUMA case but keeps special metadata structures around in each slab page
that records the node used for allocation. The problem is with new code
added/revised in the last 5 years or so that uses page_to_nid() and
allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
the page allocator that should trigger if page_to_nid() returns strange
values. I wonder why that never occurred.

>      3. Finally we could look at deprecating DISCONTIGMEM in favour
of >         SPARSEMEM, but we'd still need to fix -stable for that case.
>         Especially as it will take time to convert all the architectures

The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
need an audit of the core VM before removing that or making it contingent
on the configurations of various VM subsystems.

> I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
> framework which allows machines with split physical memory ranges to
> work.  That's a very common case nowadays.  Numa is supposed to be a
> heavyweight framework to preserve node locality for non-uniform memory
> access boxes (which none of the DISCONTIGMEM && !NUMA systems are).

Well yes but we have SPARSE for that today. DISCONTIG with multiple per
pgdat structures in a !NUMA case is just weird and unexpected for many who
have done VM coding in the last years.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 14:50                                                     ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 14:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 20 Apr 2011, James Bottomley wrote:

>      1. We can look at what imposing NUMA on the DISCONTIGMEM archs
>         would do ... the embedded ones are going to be hardest hit, but
>         if it's not too much extra code, it might be palatable.
>      2. The other is that we can audit mm to look at all the node
>         assumptions in the non-numa case.  My suspicion is that
>         accidentally or otherwise, it mostly works for the normal case,
>         so there might not be much needed to pull it back to working
>         properly for DISCONTIGMEM.

The older code may work. SLAB f.e. does not call page_to_nid() in the
!NUMA case but keeps special metadata structures around in each slab page
that records the node used for allocation. The problem is with new code
added/revised in the last 5 years or so that uses page_to_nid() and
allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
the page allocator that should trigger if page_to_nid() returns strange
values. I wonder why that never occurred.

>      3. Finally we could look at deprecating DISCONTIGMEM in favour
of >         SPARSEMEM, but we'd still need to fix -stable for that case.
>         Especially as it will take time to convert all the architectures

The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
need an audit of the core VM before removing that or making it contingent
on the configurations of various VM subsystems.

> I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
> framework which allows machines with split physical memory ranges to
> work.  That's a very common case nowadays.  Numa is supposed to be a
> heavyweight framework to preserve node locality for non-uniform memory
> access boxes (which none of the DISCONTIGMEM && !NUMA systems are).

Well yes but we have SPARSE for that today. DISCONTIG with multiple per
pgdat structures in a !NUMA case is just weird and unexpected for many who
have done VM coding in the last years.


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 14:50                                                     ` Christoph Lameter
@ 2011-04-20 15:02                                                       ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 15:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 2011-04-20 at 09:50 -0500, Christoph Lameter wrote:
> On Wed, 20 Apr 2011, James Bottomley wrote:
> 
> >      1. We can look at what imposing NUMA on the DISCONTIGMEM archs
> >         would do ... the embedded ones are going to be hardest hit, but
> >         if it's not too much extra code, it might be palatable.
> >      2. The other is that we can audit mm to look at all the node
> >         assumptions in the non-numa case.  My suspicion is that
> >         accidentally or otherwise, it mostly works for the normal case,
> >         so there might not be much needed to pull it back to working
> >         properly for DISCONTIGMEM.
> 
> The older code may work. SLAB f.e. does not call page_to_nid() in the
> !NUMA case but keeps special metadata structures around in each slab page
> that records the node used for allocation. The problem is with new code
> added/revised in the last 5 years or so that uses page_to_nid() and
> allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
> the page allocator that should trigger if page_to_nid() returns strange
> values. I wonder why that never occurred.

Actually, I think slab got changed when discontigmem was added ...
that's why it all works OK.

> >      3. Finally we could look at deprecating DISCONTIGMEM in favour
> of >         SPARSEMEM, but we'd still need to fix -stable for that case.
> >         Especially as it will take time to convert all the architectures
> 
> The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
> need an audit of the core VM before removing that or making it contingent
> on the configurations of various VM subsystems.

Don't be stupid ... that would cause six architectures to get marked
broken.

> > I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
> > framework which allows machines with split physical memory ranges to
> > work.  That's a very common case nowadays.  Numa is supposed to be a
> > heavyweight framework to preserve node locality for non-uniform memory
> > access boxes (which none of the DISCONTIGMEM && !NUMA systems are).
> 
> Well yes but we have SPARSE for that today. DISCONTIG with multiple per
> pgdat structures in a !NUMA case is just weird and unexpected for many who
> have done VM coding in the last years.

Look, I'm not really interested in who understands what.  The fact is we
have six architectures with the possibility for DISCONTIGMEM && !NUMA,
so that's the case we need to fix in -stable.

They oops with SLUB, as far as I can tell, there are still no oops
reports with SLAB.  The simplest -stable fix seems to be to mark SLUB
broken on DISCONTIGMEM && !NUMA.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 15:02                                                       ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 15:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 2011-04-20 at 09:50 -0500, Christoph Lameter wrote:
> On Wed, 20 Apr 2011, James Bottomley wrote:
> 
> >      1. We can look at what imposing NUMA on the DISCONTIGMEM archs
> >         would do ... the embedded ones are going to be hardest hit, but
> >         if it's not too much extra code, it might be palatable.
> >      2. The other is that we can audit mm to look at all the node
> >         assumptions in the non-numa case.  My suspicion is that
> >         accidentally or otherwise, it mostly works for the normal case,
> >         so there might not be much needed to pull it back to working
> >         properly for DISCONTIGMEM.
> 
> The older code may work. SLAB f.e. does not call page_to_nid() in the
> !NUMA case but keeps special metadata structures around in each slab page
> that records the node used for allocation. The problem is with new code
> added/revised in the last 5 years or so that uses page_to_nid() and
> allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
> the page allocator that should trigger if page_to_nid() returns strange
> values. I wonder why that never occurred.

Actually, I think slab got changed when discontigmem was added ...
that's why it all works OK.

> >      3. Finally we could look at deprecating DISCONTIGMEM in favour
> of >         SPARSEMEM, but we'd still need to fix -stable for that case.
> >         Especially as it will take time to convert all the architectures
> 
> The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
> need an audit of the core VM before removing that or making it contingent
> on the configurations of various VM subsystems.

Don't be stupid ... that would cause six architectures to get marked
broken.

> > I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
> > framework which allows machines with split physical memory ranges to
> > work.  That's a very common case nowadays.  Numa is supposed to be a
> > heavyweight framework to preserve node locality for non-uniform memory
> > access boxes (which none of the DISCONTIGMEM && !NUMA systems are).
> 
> Well yes but we have SPARSE for that today. DISCONTIG with multiple per
> pgdat structures in a !NUMA case is just weird and unexpected for many who
> have done VM coding in the last years.

Look, I'm not really interested in who understands what.  The fact is we
have six architectures with the possibility for DISCONTIGMEM && !NUMA,
so that's the case we need to fix in -stable.

They oops with SLUB, as far as I can tell, there are still no oops
reports with SLAB.  The simplest -stable fix seems to be to mark SLUB
broken on DISCONTIGMEM && !NUMA.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 15:02                                                       ` James Bottomley
@ 2011-04-20 15:22                                                         ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 15:22 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 20 Apr 2011, James Bottomley wrote:

> > The older code may work. SLAB f.e. does not call page_to_nid() in the
> > !NUMA case but keeps special metadata structures around in each slab page
> > that records the node used for allocation. The problem is with new code
> > added/revised in the last 5 years or so that uses page_to_nid() and
> > allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
> > the page allocator that should trigger if page_to_nid() returns strange
> > values. I wonder why that never occurred.
>
> Actually, I think slab got changed when discontigmem was added ...
> that's why it all works OK.

Could be. I was not around at the time.

> > >      3. Finally we could look at deprecating DISCONTIGMEM in favour
> > of >         SPARSEMEM, but we'd still need to fix -stable for that case.
> > >         Especially as it will take time to convert all the architectures
> >
> > The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
> > need an audit of the core VM before removing that or making it contingent
> > on the configurations of various VM subsystems.
>
> Don't be stupid ... that would cause six architectures to get marked
> broken.

Yes they are broken right now. Marking just means showing the user that we
are aware of the situation.

> Look, I'm not really interested in who understands what.  The fact is we
> have six architectures with the possibility for DISCONTIGMEM && !NUMA,
> so that's the case we need to fix in -stable.
>
> They oops with SLUB, as far as I can tell, there are still no oops
> reports with SLAB.  The simplest -stable fix seems to be to mark SLUB
> broken on DISCONTIGMEM && !NUMA.

There is barely any testing going on at all of this since we have had this
issue for more than 5 years and have not noticed it. The absence of bug
reports therefore proves nothing. Code inspection of the VM shows
that this is an issue that arises in multiple subsystems and that we have
VM_BUG_ONs in the page allocator that should trigger for these situations.

Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 15:22                                                         ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 15:22 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 20 Apr 2011, James Bottomley wrote:

> > The older code may work. SLAB f.e. does not call page_to_nid() in the
> > !NUMA case but keeps special metadata structures around in each slab page
> > that records the node used for allocation. The problem is with new code
> > added/revised in the last 5 years or so that uses page_to_nid() and
> > allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
> > the page allocator that should trigger if page_to_nid() returns strange
> > values. I wonder why that never occurred.
>
> Actually, I think slab got changed when discontigmem was added ...
> that's why it all works OK.

Could be. I was not around at the time.

> > >      3. Finally we could look at deprecating DISCONTIGMEM in favour
> > of >         SPARSEMEM, but we'd still need to fix -stable for that case.
> > >         Especially as it will take time to convert all the architectures
> >
> > The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
> > need an audit of the core VM before removing that or making it contingent
> > on the configurations of various VM subsystems.
>
> Don't be stupid ... that would cause six architectures to get marked
> broken.

Yes they are broken right now. Marking just means showing the user that we
are aware of the situation.

> Look, I'm not really interested in who understands what.  The fact is we
> have six architectures with the possibility for DISCONTIGMEM && !NUMA,
> so that's the case we need to fix in -stable.
>
> They oops with SLUB, as far as I can tell, there are still no oops
> reports with SLAB.  The simplest -stable fix seems to be to mark SLUB
> broken on DISCONTIGMEM && !NUMA.

There is barely any testing going on at all of this since we have had this
issue for more than 5 years and have not noticed it. The absence of bug
reports therefore proves nothing. Code inspection of the VM shows
that this is an issue that arises in multiple subsystems and that we have
VM_BUG_ONs in the page allocator that should trigger for these situations.

Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  8:40                                               ` KOSAKI Motohiro
  (?)
@ 2011-04-20 16:32                                                 ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 16:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Pekka Enberg, Christoph Lameter, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

On Wed, 2011-04-20 at 17:40 +0900, KOSAKI Motohiro wrote:
> > > btw, x86 don't have an issue. Probably it's a reason why this issue was neglected
> > > long time.
> > >
> > > arch/x86/Kconfig
> > > -------------------------------------
> > > config ARCH_DISCONTIGMEM_ENABLE
> > >        def_bool y
> > >        depends on NUMA && X86_32
> > 
> > That part makes me think the best option is to make parisc do
> > CONFIG_NUMA as well regardless of the historical intent was.
> > 
> >                         Pekka
> 
> This?

I'm afraid it doesn't boot (it's another slub crash):

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.39-rc3+ (jejb@ion) (gcc version 4.2.4 (Debian 4.2.4-6)) #30 SMP Wed Apr 20 08:52:23 PDT 2011
[    0.000000] unwind_init: start = 0x4057a000, end = 0x405b0e80, entries = 14056
[    0.000000] WARNING: Out of order unwind entry! 000000004057b470 and 000000004057b480
[    0.000000] WARNING: Out of order unwind entry! 000000004057b480 and 000000004057b490
[    0.000000] WARNING: Out of order unwind entry! 000000004057c160 and 000000004057c170
[    0.000000] WARNING: Out of order unwind entry! 000000004057c170 and 000000004057c180
[    0.000000] FP[0] enabled: Rev 1 Model 20
[    0.000000] The 64-bit Kernel has started...
[    0.000000] bootconsole [ttyB0] enabled
[    0.000000] Initialized PDC Console for debugging.
[    0.000000] Determining PDC firmware type: 64 bit PAT.
[    0.000000] model 00008870 00000491 00000000 00000002 3e0505e7352af710 100000f0 00000008 000000b2 000000b2
[    0.000000] vers  00000301
[    0.000000] CPUID vers 20 rev 4 (0x00000284)
[    0.000000] capabilities 0x35
[    0.000000] model 9000/800/rp3440  
[    0.000000] parisc_cache_init: Only equivalent aliasing supported!
[    0.000000] Memory Ranges:
[    0.000000]  0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
[    0.000000]  1) Start 0x0000004040000000 End 0x000000407fdfffff Size   1022 MB
[    0.000000] Total Memory: 2046 MB
[    0.000000] initrd: 7f390000-7ffedaa1
[    0.000000] initrd: reserving 3f390000-3ffedaa1 (mem_max 7fe00000)
[    0.000000] PERCPU: Embedded 10 pages/cpu @00000000418f5000 s12288 r8192 d20480 u40960
[    0.000000] SMP: bootstrap CPU ID is 0
[    0.000000] Built 2 zonelists in Node order, mobility grouping on.  Total pages: 258560
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line:  root=/dev/sda3 panic=5 console=ttyS1 palo_kernel=1/vmlinux-test
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 2042228k/2095104k available (3849k kernel code, 52876k reserved, 1661k data, 324k init)
[    0.000000] virtual kernel memory layout:
[    0.000000]     vmalloc : 0x0000000000008000 - 0x000000003f000000   (1007 MB)
[    0.000000]     memory  : 0x0000000040000000 - 0x00000040bfe00000   (264190 MB)
[    0.000000]       .init : 0x000000004077c000 - 0x00000000407cd000   ( 324 kB)
[    0.000000]       .data : 0x00000000404c2518 - 0x0000000040661920   (1661 kB)
[    0.000000]       .text : 0x0000000040100000 - 0x00000000404c2518   (3849 kB)
[    0.000000] SLUB: Genslabs=11, HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=8
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  CONFIG_RCU_FANOUT set to non-default value of 32
[    0.000000]  RCU-based detection of stalled CPUs is disabled.
[    0.000000] NR_IRQS:128
[    0.000000] Console: colour dummy device 160x64
[    0.000000] numa_policy_init: interleaving failed
[    0.000000] Calibrating delay loop... 1594.36 BogoMIPS (lpj=3188736)
[    0.048000] pid_max: default: 32768 minimum: 301
[    0.048000] Security Framework initialized
[    0.048000] SELinux:  Disabled at boot.
[    0.060000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.072000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.076000] Mount-cache hash table entries: 256
[    0.076000] Initializing cgroup subsys ns
[    0.076000] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
[    0.080000] Initializing cgroup subsys cpuacct
[    0.092000] Initializing cgroup subsys devices
[    0.092000] Initializing cgroup subsys freezer
[    0.100000] Initializing cgroup subsys net_cls
[    0.100000] Initializing cgroup subsys blkio
[    0.200000] Backtrace:
[    0.200000]  [<000000004021c938>] add_partial+0x28/0x98
[    0.200000]  [<000000004021faa0>] __slab_free+0x1d0/0x1d8
[    0.200000]  [<000000004021fd04>] kmem_cache_free+0xc4/0x128
[    0.200000]  [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
[    0.200000]  [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
[    0.200000]  [<00000000402a974c>] create_dir+0x5c/0x168
[    0.200000]  [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
[    0.200000]  [<000000004033d6c4>] kobject_add_internal+0x114/0x258
[    0.200000]  [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
[    0.200000]  [<000000004033df20>] kobject_add+0x50/0x90
[    0.200000]  [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
[    0.200000]  [<00000000407862a0>] cgroup_init+0x138/0x1f0
[    0.200000]  [<000000004077ce50>] start_kernel+0x5a0/0x840
[    0.200000]  [<000000004011fa3c>] start_parisc+0xa4/0xb8
[    0.200000]  [<00000000404bb034>] packet_ioctl+0x16c/0x208
[    0.200000]  [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20
[    0.200000] 
[    0.200000] 
[    0.200000] Kernel Fault: Code=26 regs=00000000405bca80 (Addr=0000000000000000)
[    0.200000] 
[    0.200000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[    0.200000] PSW: 00001000000001001111001000001110 Not tainted
[    0.200000] r00-03  000000ff0804f20e 0000000040778360 000000004021c938 0000000000000000
[    0.200000] r04-07  0000000040754b60 0000000000000001 00000000418b1440 000000007ec01280
[    0.200000] r08-11  0000000000000000 00000000405f7380 00000000000003c0 000000007fffffff
[    0.200000] r12-15  00000000405bc7d8 00000000405f7398 00000000405bc6e8 00000000000041ed
[    0.200000] r16-19  00000000f0d00b0c 0000000000000000 0000000000000000 0000000000000000
[    0.200000] r20-23  000000000800000e 0000000000000001 00000000f0000000 000000004033bf9c
[    0.200000] r24-27  0000000000000001 00000000418b1440 0000000000000000 0000000040754b60
[    0.200000] r28-31  000000007ec08000 00000000405bca50 00000000405bca80 000000000000001d
[    0.200000] sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.200000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.200000] 
[    0.200000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011c0f0 000000004011c0f4
[    0.200000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
[    0.200000]  CPU:        0   CR30: 00000000405bc000 CR31: fffffff0f0e098e0
[    0.200000]  ORIG_R28: 0000000000000100
[    0.200000]  IAOQ[0]: _raw_spin_lock+0x0/0x20
[    0.200000]  IAOQ[1]: _raw_spin_lock+0x4/0x20
[    0.200000]  RP(r2): add_partial+0x28/0x98
[    0.200000] Backtrace:
[    0.200000]  [<000000004021c938>] add_partial+0x28/0x98
[    0.200000]  [<000000004021faa0>] __slab_free+0x1d0/0x1d8
[    0.200000]  [<000000004021fd04>] kmem_cache_free+0xc4/0x128
[    0.200000]  [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
[    0.200000]  [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
[    0.200000]  [<00000000402a974c>] create_dir+0x5c/0x168
[    0.200000]  [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
[    0.200000]  [<000000004033d6c4>] kobject_add_internal+0x114/0x258
[    0.200000]  [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
[    0.200000]  [<000000004033df20>] kobject_add+0x50/0x90
[    0.200000]  [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
[    0.200000]  [<00000000407862a0>] cgroup_init+0x138/0x1f0
[    0.200000]  [<000000004077ce50>] start_kernel+0x5a0/0x840
[    0.200000]  [<000000004011fa3c>] start_parisc+0xa4/0xb8
[    0.200000]  [<00000000404bb034>] packet_ioctl+0x16c/0x208
[    0.200000]  [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20
[    0.200000] 
[    0.200000] Kernel panic - not syncing: Kernel Fault
[    0.200000] Backtrace:
[    0.200000]  [<000000004011fec4>] show_stack+0x14/0x20
[    0.200000]  [<000000004011fee8>] dump_stack+0x18/0x28
[    0.200000]  [<000000004015a9a4>] panic+0xd4/0x368
[    0.200000]  [<0000000040120564>] parisc_terminate+0x14c/0x170
[    0.200000]  [<0000000040120adc>] handle_interruption+0x2ac/0x8f8
[    0.200000]  [<000000004011c0f0>] _raw_spin_lock+0x0/0x20
[    0.200000] 
[    0.200000] Rebooting in 5 seconds..

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 16:32                                                 ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 16:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Pekka Enberg, Christoph Lameter, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

On Wed, 2011-04-20 at 17:40 +0900, KOSAKI Motohiro wrote:
> > > btw, x86 don't have an issue. Probably it's a reason why this issue was neglected
> > > long time.
> > >
> > > arch/x86/Kconfig
> > > -------------------------------------
> > > config ARCH_DISCONTIGMEM_ENABLE
> > >        def_bool y
> > >        depends on NUMA && X86_32
> > 
> > That part makes me think the best option is to make parisc do
> > CONFIG_NUMA as well regardless of the historical intent was.
> > 
> >                         Pekka
> 
> This?

I'm afraid it doesn't boot (it's another slub crash):

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.39-rc3+ (jejb@ion) (gcc version 4.2.4 (Debian 4.2.4-6)) #30 SMP Wed Apr 20 08:52:23 PDT 2011
[    0.000000] unwind_init: start = 0x4057a000, end = 0x405b0e80, entries = 14056
[    0.000000] WARNING: Out of order unwind entry! 000000004057b470 and 000000004057b480
[    0.000000] WARNING: Out of order unwind entry! 000000004057b480 and 000000004057b490
[    0.000000] WARNING: Out of order unwind entry! 000000004057c160 and 000000004057c170
[    0.000000] WARNING: Out of order unwind entry! 000000004057c170 and 000000004057c180
[    0.000000] FP[0] enabled: Rev 1 Model 20
[    0.000000] The 64-bit Kernel has started...
[    0.000000] bootconsole [ttyB0] enabled
[    0.000000] Initialized PDC Console for debugging.
[    0.000000] Determining PDC firmware type: 64 bit PAT.
[    0.000000] model 00008870 00000491 00000000 00000002 3e0505e7352af710 100000f0 00000008 000000b2 000000b2
[    0.000000] vers  00000301
[    0.000000] CPUID vers 20 rev 4 (0x00000284)
[    0.000000] capabilities 0x35
[    0.000000] model 9000/800/rp3440  
[    0.000000] parisc_cache_init: Only equivalent aliasing supported!
[    0.000000] Memory Ranges:
[    0.000000]  0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
[    0.000000]  1) Start 0x0000004040000000 End 0x000000407fdfffff Size   1022 MB
[    0.000000] Total Memory: 2046 MB
[    0.000000] initrd: 7f390000-7ffedaa1
[    0.000000] initrd: reserving 3f390000-3ffedaa1 (mem_max 7fe00000)
[    0.000000] PERCPU: Embedded 10 pages/cpu @00000000418f5000 s12288 r8192 d20480 u40960
[    0.000000] SMP: bootstrap CPU ID is 0
[    0.000000] Built 2 zonelists in Node order, mobility grouping on.  Total pages: 258560
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line:  root=/dev/sda3 panic=5 console=ttyS1 palo_kernel=1/vmlinux-test
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 2042228k/2095104k available (3849k kernel code, 52876k reserved, 1661k data, 324k init)
[    0.000000] virtual kernel memory layout:
[    0.000000]     vmalloc : 0x0000000000008000 - 0x000000003f000000   (1007 MB)
[    0.000000]     memory  : 0x0000000040000000 - 0x00000040bfe00000   (264190 MB)
[    0.000000]       .init : 0x000000004077c000 - 0x00000000407cd000   ( 324 kB)
[    0.000000]       .data : 0x00000000404c2518 - 0x0000000040661920   (1661 kB)
[    0.000000]       .text : 0x0000000040100000 - 0x00000000404c2518   (3849 kB)
[    0.000000] SLUB: Genslabs=11, HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=8
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  CONFIG_RCU_FANOUT set to non-default value of 32
[    0.000000]  RCU-based detection of stalled CPUs is disabled.
[    0.000000] NR_IRQS:128
[    0.000000] Console: colour dummy device 160x64
[    0.000000] numa_policy_init: interleaving failed
[    0.000000] Calibrating delay loop... 1594.36 BogoMIPS (lpj=3188736)
[    0.048000] pid_max: default: 32768 minimum: 301
[    0.048000] Security Framework initialized
[    0.048000] SELinux:  Disabled at boot.
[    0.060000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.072000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.076000] Mount-cache hash table entries: 256
[    0.076000] Initializing cgroup subsys ns
[    0.076000] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
[    0.080000] Initializing cgroup subsys cpuacct
[    0.092000] Initializing cgroup subsys devices
[    0.092000] Initializing cgroup subsys freezer
[    0.100000] Initializing cgroup subsys net_cls
[    0.100000] Initializing cgroup subsys blkio
[    0.200000] Backtrace:
[    0.200000]  [<000000004021c938>] add_partial+0x28/0x98
[    0.200000]  [<000000004021faa0>] __slab_free+0x1d0/0x1d8
[    0.200000]  [<000000004021fd04>] kmem_cache_free+0xc4/0x128
[    0.200000]  [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
[    0.200000]  [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
[    0.200000]  [<00000000402a974c>] create_dir+0x5c/0x168
[    0.200000]  [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
[    0.200000]  [<000000004033d6c4>] kobject_add_internal+0x114/0x258
[    0.200000]  [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
[    0.200000]  [<000000004033df20>] kobject_add+0x50/0x90
[    0.200000]  [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
[    0.200000]  [<00000000407862a0>] cgroup_init+0x138/0x1f0
[    0.200000]  [<000000004077ce50>] start_kernel+0x5a0/0x840
[    0.200000]  [<000000004011fa3c>] start_parisc+0xa4/0xb8
[    0.200000]  [<00000000404bb034>] packet_ioctl+0x16c/0x208
[    0.200000]  [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20
[    0.200000] 
[    0.200000] 
[    0.200000] Kernel Fault: Code=26 regs=00000000405bca80 (Addr=0000000000000000)
[    0.200000] 
[    0.200000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[    0.200000] PSW: 00001000000001001111001000001110 Not tainted
[    0.200000] r00-03  000000ff0804f20e 0000000040778360 000000004021c938 0000000000000000
[    0.200000] r04-07  0000000040754b60 0000000000000001 00000000418b1440 000000007ec01280
[    0.200000] r08-11  0000000000000000 00000000405f7380 00000000000003c0 000000007fffffff
[    0.200000] r12-15  00000000405bc7d8 00000000405f7398 00000000405bc6e8 00000000000041ed
[    0.200000] r16-19  00000000f0d00b0c 0000000000000000 0000000000000000 0000000000000000
[    0.200000] r20-23  000000000800000e 0000000000000001 00000000f0000000 000000004033bf9c
[    0.200000] r24-27  0000000000000001 00000000418b1440 0000000000000000 0000000040754b60
[    0.200000] r28-31  000000007ec08000 00000000405bca50 00000000405bca80 000000000000001d
[    0.200000] sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.200000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.200000] 
[    0.200000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011c0f0 000000004011c0f4
[    0.200000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
[    0.200000]  CPU:        0   CR30: 00000000405bc000 CR31: fffffff0f0e098e0
[    0.200000]  ORIG_R28: 0000000000000100
[    0.200000]  IAOQ[0]: _raw_spin_lock+0x0/0x20
[    0.200000]  IAOQ[1]: _raw_spin_lock+0x4/0x20
[    0.200000]  RP(r2): add_partial+0x28/0x98
[    0.200000] Backtrace:
[    0.200000]  [<000000004021c938>] add_partial+0x28/0x98
[    0.200000]  [<000000004021faa0>] __slab_free+0x1d0/0x1d8
[    0.200000]  [<000000004021fd04>] kmem_cache_free+0xc4/0x128
[    0.200000]  [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
[    0.200000]  [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
[    0.200000]  [<00000000402a974c>] create_dir+0x5c/0x168
[    0.200000]  [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
[    0.200000]  [<000000004033d6c4>] kobject_add_internal+0x114/0x258
[    0.200000]  [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
[    0.200000]  [<000000004033df20>] kobject_add+0x50/0x90
[    0.200000]  [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
[    0.200000]  [<00000000407862a0>] cgroup_init+0x138/0x1f0
[    0.200000]  [<000000004077ce50>] start_kernel+0x5a0/0x840
[    0.200000]  [<000000004011fa3c>] start_parisc+0xa4/0xb8
[    0.200000]  [<00000000404bb034>] packet_ioctl+0x16c/0x208
[    0.200000]  [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20
[    0.200000] 
[    0.200000] Kernel panic - not syncing: Kernel Fault
[    0.200000] Backtrace:
[    0.200000]  [<000000004011fec4>] show_stack+0x14/0x20
[    0.200000]  [<000000004011fee8>] dump_stack+0x18/0x28
[    0.200000]  [<000000004015a9a4>] panic+0xd4/0x368
[    0.200000]  [<0000000040120564>] parisc_terminate+0x14c/0x170
[    0.200000]  [<0000000040120adc>] handle_interruption+0x2ac/0x8f8
[    0.200000]  [<000000004011c0f0>] _raw_spin_lock+0x0/0x20
[    0.200000] 
[    0.200000] Rebooting in 5 seconds..

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 16:32                                                 ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 16:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Pekka Enberg, Christoph Lameter, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

On Wed, 2011-04-20 at 17:40 +0900, KOSAKI Motohiro wrote:
> > > btw, x86 don't have an issue. Probably it's a reason why this issue was neglected
> > > long time.
> > >
> > > arch/x86/Kconfig
> > > -------------------------------------
> > > config ARCH_DISCONTIGMEM_ENABLE
> > >        def_bool y
> > >        depends on NUMA && X86_32
> > 
> > That part makes me think the best option is to make parisc do
> > CONFIG_NUMA as well regardless of the historical intent was.
> > 
> >                         Pekka
> 
> This?

I'm afraid it doesn't boot (it's another slub crash):

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.39-rc3+ (jejb@ion) (gcc version 4.2.4 (Debian 4.2.4-6)) #30 SMP Wed Apr 20 08:52:23 PDT 2011
[    0.000000] unwind_init: start = 0x4057a000, end = 0x405b0e80, entries = 14056
[    0.000000] WARNING: Out of order unwind entry! 000000004057b470 and 000000004057b480
[    0.000000] WARNING: Out of order unwind entry! 000000004057b480 and 000000004057b490
[    0.000000] WARNING: Out of order unwind entry! 000000004057c160 and 000000004057c170
[    0.000000] WARNING: Out of order unwind entry! 000000004057c170 and 000000004057c180
[    0.000000] FP[0] enabled: Rev 1 Model 20
[    0.000000] The 64-bit Kernel has started...
[    0.000000] bootconsole [ttyB0] enabled
[    0.000000] Initialized PDC Console for debugging.
[    0.000000] Determining PDC firmware type: 64 bit PAT.
[    0.000000] model 00008870 00000491 00000000 00000002 3e0505e7352af710 100000f0 00000008 000000b2 000000b2
[    0.000000] vers  00000301
[    0.000000] CPUID vers 20 rev 4 (0x00000284)
[    0.000000] capabilities 0x35
[    0.000000] model 9000/800/rp3440  
[    0.000000] parisc_cache_init: Only equivalent aliasing supported!
[    0.000000] Memory Ranges:
[    0.000000]  0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
[    0.000000]  1) Start 0x0000004040000000 End 0x000000407fdfffff Size   1022 MB
[    0.000000] Total Memory: 2046 MB
[    0.000000] initrd: 7f390000-7ffedaa1
[    0.000000] initrd: reserving 3f390000-3ffedaa1 (mem_max 7fe00000)
[    0.000000] PERCPU: Embedded 10 pages/cpu @00000000418f5000 s12288 r8192 d20480 u40960
[    0.000000] SMP: bootstrap CPU ID is 0
[    0.000000] Built 2 zonelists in Node order, mobility grouping on.  Total pages: 258560
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line:  root=/dev/sda3 panic=5 console=ttyS1 palo_kernel=1/vmlinux-test
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 2042228k/2095104k available (3849k kernel code, 52876k reserved, 1661k data, 324k init)
[    0.000000] virtual kernel memory layout:
[    0.000000]     vmalloc : 0x0000000000008000 - 0x000000003f000000   (1007 MB)
[    0.000000]     memory  : 0x0000000040000000 - 0x00000040bfe00000   (264190 MB)
[    0.000000]       .init : 0x000000004077c000 - 0x00000000407cd000   ( 324 kB)
[    0.000000]       .data : 0x00000000404c2518 - 0x0000000040661920   (1661 kB)
[    0.000000]       .text : 0x0000000040100000 - 0x00000000404c2518   (3849 kB)
[    0.000000] SLUB: Genslabs=11, HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=8
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  CONFIG_RCU_FANOUT set to non-default value of 32
[    0.000000]  RCU-based detection of stalled CPUs is disabled.
[    0.000000] NR_IRQS:128
[    0.000000] Console: colour dummy device 160x64
[    0.000000] numa_policy_init: interleaving failed
[    0.000000] Calibrating delay loop... 1594.36 BogoMIPS (lpj=3188736)
[    0.048000] pid_max: default: 32768 minimum: 301
[    0.048000] Security Framework initialized
[    0.048000] SELinux:  Disabled at boot.
[    0.060000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.072000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.076000] Mount-cache hash table entries: 256
[    0.076000] Initializing cgroup subsys ns
[    0.076000] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
[    0.080000] Initializing cgroup subsys cpuacct
[    0.092000] Initializing cgroup subsys devices
[    0.092000] Initializing cgroup subsys freezer
[    0.100000] Initializing cgroup subsys net_cls
[    0.100000] Initializing cgroup subsys blkio
[    0.200000] Backtrace:
[    0.200000]  [<000000004021c938>] add_partial+0x28/0x98
[    0.200000]  [<000000004021faa0>] __slab_free+0x1d0/0x1d8
[    0.200000]  [<000000004021fd04>] kmem_cache_free+0xc4/0x128
[    0.200000]  [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
[    0.200000]  [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
[    0.200000]  [<00000000402a974c>] create_dir+0x5c/0x168
[    0.200000]  [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
[    0.200000]  [<000000004033d6c4>] kobject_add_internal+0x114/0x258
[    0.200000]  [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
[    0.200000]  [<000000004033df20>] kobject_add+0x50/0x90
[    0.200000]  [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
[    0.200000]  [<00000000407862a0>] cgroup_init+0x138/0x1f0
[    0.200000]  [<000000004077ce50>] start_kernel+0x5a0/0x840
[    0.200000]  [<000000004011fa3c>] start_parisc+0xa4/0xb8
[    0.200000]  [<00000000404bb034>] packet_ioctl+0x16c/0x208
[    0.200000]  [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20
[    0.200000] 
[    0.200000] 
[    0.200000] Kernel Fault: Code=26 regs=00000000405bca80 (Addr=0000000000000000)
[    0.200000] 
[    0.200000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[    0.200000] PSW: 00001000000001001111001000001110 Not tainted
[    0.200000] r00-03  000000ff0804f20e 0000000040778360 000000004021c938 0000000000000000
[    0.200000] r04-07  0000000040754b60 0000000000000001 00000000418b1440 000000007ec01280
[    0.200000] r08-11  0000000000000000 00000000405f7380 00000000000003c0 000000007fffffff
[    0.200000] r12-15  00000000405bc7d8 00000000405f7398 00000000405bc6e8 00000000000041ed
[    0.200000] r16-19  00000000f0d00b0c 0000000000000000 0000000000000000 0000000000000000
[    0.200000] r20-23  000000000800000e 0000000000000001 00000000f0000000 000000004033bf9c
[    0.200000] r24-27  0000000000000001 00000000418b1440 0000000000000000 0000000040754b60
[    0.200000] r28-31  000000007ec08000 00000000405bca50 00000000405bca80 000000000000001d
[    0.200000] sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.200000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.200000] 
[    0.200000] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004011c0f0 000000004011c0f4
[    0.200000]  IIR: 0f4015dc    ISR: 0000000000000000  IOR: 0000000000000000
[    0.200000]  CPU:        0   CR30: 00000000405bc000 CR31: fffffff0f0e098e0
[    0.200000]  ORIG_R28: 0000000000000100
[    0.200000]  IAOQ[0]: _raw_spin_lock+0x0/0x20
[    0.200000]  IAOQ[1]: _raw_spin_lock+0x4/0x20
[    0.200000]  RP(r2): add_partial+0x28/0x98
[    0.200000] Backtrace:
[    0.200000]  [<000000004021c938>] add_partial+0x28/0x98
[    0.200000]  [<000000004021faa0>] __slab_free+0x1d0/0x1d8
[    0.200000]  [<000000004021fd04>] kmem_cache_free+0xc4/0x128
[    0.200000]  [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
[    0.200000]  [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
[    0.200000]  [<00000000402a974c>] create_dir+0x5c/0x168
[    0.200000]  [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
[    0.200000]  [<000000004033d6c4>] kobject_add_internal+0x114/0x258
[    0.200000]  [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
[    0.200000]  [<000000004033df20>] kobject_add+0x50/0x90
[    0.200000]  [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
[    0.200000]  [<00000000407862a0>] cgroup_init+0x138/0x1f0
[    0.200000]  [<000000004077ce50>] start_kernel+0x5a0/0x840
[    0.200000]  [<000000004011fa3c>] start_parisc+0xa4/0xb8
[    0.200000]  [<00000000404bb034>] packet_ioctl+0x16c/0x208
[    0.200000]  [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20
[    0.200000] 
[    0.200000] Kernel panic - not syncing: Kernel Fault
[    0.200000] Backtrace:
[    0.200000]  [<000000004011fec4>] show_stack+0x14/0x20
[    0.200000]  [<000000004011fee8>] dump_stack+0x18/0x28
[    0.200000]  [<000000004015a9a4>] panic+0xd4/0x368
[    0.200000]  [<0000000040120564>] parisc_terminate+0x14c/0x170
[    0.200000]  [<0000000040120adc>] handle_interruption+0x2ac/0x8f8
[    0.200000]  [<000000004011c0f0>] _raw_spin_lock+0x0/0x20
[    0.200000] 
[    0.200000] Rebooting in 5 seconds..

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 16:32                                                 ` James Bottomley
@ 2011-04-20 16:50                                                   ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 16:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, Pekka Enberg, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, James Bottomley wrote:

> > > That part makes me think the best option is to make parisc do
> > > CONFIG_NUMA as well regardless of the historical intent was.

Well if it never supported NUMA then this is going to be problematic.
> > >
> > >                         Pekka
> >
> > This?
>
> I'm afraid it doesn't boot (it's another slub crash):

Is there any simulator available that we can use to run a parisc boot?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 16:50                                                   ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-20 16:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, Pekka Enberg, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, James Bottomley wrote:

> > > That part makes me think the best option is to make parisc do
> > > CONFIG_NUMA as well regardless of the historical intent was.

Well if it never supported NUMA then this is going to be problematic.
> > >
> > >                         Pekka
> >
> > This?
>
> I'm afraid it doesn't boot (it's another slub crash):

Is there any simulator available that we can use to run a parisc boot?


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 16:50                                                   ` Christoph Lameter
@ 2011-04-20 18:09                                                     ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 18:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, Pekka Enberg, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

On Wed, 2011-04-20 at 11:50 -0500, Christoph Lameter wrote:
> > I'm afraid it doesn't boot (it's another slub crash):
> 
> Is there any simulator available that we can use to run a parisc boot?

I don't think we have a simulator.  However, if you send a ssh key to

T-Bone@parisc-linux.org

He can loan you remote access to one of the systems that ESIEE in France
hosts for us.  (he's expecting you).

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 18:09                                                     ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 18:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, Pekka Enberg, Michal Hocko, Andrew Morton,
	Hugh Dickins, linux-mm, LKML, linux-parisc, David Rientjes,
	Ingo Molnar, x86 maintainers

On Wed, 2011-04-20 at 11:50 -0500, Christoph Lameter wrote:
> > I'm afraid it doesn't boot (it's another slub crash):
> 
> Is there any simulator available that we can use to run a parisc boot?

I don't think we have a simulator.  However, if you send a ssh key to

T-Bone@parisc-linux.org

He can loan you remote access to one of the systems that ESIEE in France
hosts for us.  (he's expecting you).

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 15:22                                                         ` Christoph Lameter
@ 2011-04-20 19:25                                                           ` Matthew Wilcox
  -1 siblings, 0 replies; 210+ messages in thread
From: Matthew Wilcox @ 2011-04-20 19:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: James Bottomley, Pekka Enberg, KOSAKI Motohiro, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, Apr 20, 2011 at 10:22:04AM -0500, Christoph Lameter wrote:
> There is barely any testing going on at all of this since we have had this
> issue for more than 5 years and have not noticed it. The absence of bug
> reports therefore proves nothing. Code inspection of the VM shows
> that this is an issue that arises in multiple subsystems and that we have
> VM_BUG_ONs in the page allocator that should trigger for these situations.

So ... we've proven that people using these architectures use SLAB
instead of SLUB, don't enable CONFIG_DEBUG_VM and don't use hugepages
(not really a surprise ... nobody's running Oracle on these arches :-)

I don't think that qualifies as "barely any testing".  I think that's
"nobody developing the Linux MM uses one of these architectures".

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 19:25                                                           ` Matthew Wilcox
  0 siblings, 0 replies; 210+ messages in thread
From: Matthew Wilcox @ 2011-04-20 19:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: James Bottomley, Pekka Enberg, KOSAKI Motohiro, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, Apr 20, 2011 at 10:22:04AM -0500, Christoph Lameter wrote:
> There is barely any testing going on at all of this since we have had this
> issue for more than 5 years and have not noticed it. The absence of bug
> reports therefore proves nothing. Code inspection of the VM shows
> that this is an issue that arises in multiple subsystems and that we have
> VM_BUG_ONs in the page allocator that should trigger for these situations.

So ... we've proven that people using these architectures use SLAB
instead of SLUB, don't enable CONFIG_DEBUG_VM and don't use hugepages
(not really a surprise ... nobody's running Oracle on these arches :-)

I don't think that qualifies as "barely any testing".  I think that's
"nobody developing the Linux MM uses one of these architectures".

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20  8:40                                               ` KOSAKI Motohiro
@ 2011-04-20 21:05                                                 ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 21:05 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Pekka Enberg, Christoph Lameter, James Bottomley, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, KOSAKI Motohiro wrote:

> diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
> index 69ff049..0bf9ae8 100644
> --- a/arch/parisc/Kconfig
> +++ b/arch/parisc/Kconfig
> @@ -229,6 +229,12 @@ config HOTPLUG_CPU
>  	default y if SMP
>  	select HOTPLUG
>  
> +config NUMA
> +	bool "NUMA support"
> +	help
> +	  Say Y to compile the kernel to support NUMA (Non-Uniform Memory
> +	  Access).
> +
>  config ARCH_SELECT_MEMORY_MODEL
>  	def_bool y
>  	depends on 64BIT
> @@ -236,6 +242,7 @@ config ARCH_SELECT_MEMORY_MODEL
>  config ARCH_DISCONTIGMEM_ENABLE
>  	def_bool y
>  	depends on 64BIT
> +	depends on NUMA
>  
>  config ARCH_FLATMEM_ENABLE
>  	def_bool y

I think this should probably be

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -244,6 +244,9 @@ config ARCH_DISCONTIGMEM_DEFAULT
 	def_bool y
 	depends on ARCH_DISCONTIGMEM_ENABLE
 
+config NUMA
+	def_bool ARCH_DISCONTIGMEM_ENABLE
+
 config NODES_SHIFT
 	int
 	default "3"

instead since we don't need CONFIG_NUMA for anything other than 
CONFIG_PA8X00 and 64-bit enabled.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 21:05                                                 ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 21:05 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Pekka Enberg, Christoph Lameter, James Bottomley, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, KOSAKI Motohiro wrote:

> diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
> index 69ff049..0bf9ae8 100644
> --- a/arch/parisc/Kconfig
> +++ b/arch/parisc/Kconfig
> @@ -229,6 +229,12 @@ config HOTPLUG_CPU
>  	default y if SMP
>  	select HOTPLUG
>  
> +config NUMA
> +	bool "NUMA support"
> +	help
> +	  Say Y to compile the kernel to support NUMA (Non-Uniform Memory
> +	  Access).
> +
>  config ARCH_SELECT_MEMORY_MODEL
>  	def_bool y
>  	depends on 64BIT
> @@ -236,6 +242,7 @@ config ARCH_SELECT_MEMORY_MODEL
>  config ARCH_DISCONTIGMEM_ENABLE
>  	def_bool y
>  	depends on 64BIT
> +	depends on NUMA
>  
>  config ARCH_FLATMEM_ENABLE
>  	def_bool y

I think this should probably be

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -244,6 +244,9 @@ config ARCH_DISCONTIGMEM_DEFAULT
 	def_bool y
 	depends on ARCH_DISCONTIGMEM_ENABLE
 
+config NUMA
+	def_bool ARCH_DISCONTIGMEM_ENABLE
+
 config NODES_SHIFT
 	int
 	default "3"

instead since we don't need CONFIG_NUMA for anything other than 
CONFIG_PA8X00 and 64-bit enabled.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 16:32                                                 ` James Bottomley
@ 2011-04-20 21:18                                                   ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 21:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, James Bottomley wrote:

> [    0.200000] Backtrace:
> [    0.200000]  [<000000004021c938>] add_partial+0x28/0x98
> [    0.200000]  [<000000004021faa0>] __slab_free+0x1d0/0x1d8
> [    0.200000]  [<000000004021fd04>] kmem_cache_free+0xc4/0x128
> [    0.200000]  [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
> [    0.200000]  [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
> [    0.200000]  [<00000000402a974c>] create_dir+0x5c/0x168
> [    0.200000]  [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
> [    0.200000]  [<000000004033d6c4>] kobject_add_internal+0x114/0x258
> [    0.200000]  [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
> [    0.200000]  [<000000004033df20>] kobject_add+0x50/0x90
> [    0.200000]  [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
> [    0.200000]  [<00000000407862a0>] cgroup_init+0x138/0x1f0
> [    0.200000]  [<000000004077ce50>] start_kernel+0x5a0/0x840
> [    0.200000]  [<000000004011fa3c>] start_parisc+0xa4/0xb8
> [    0.200000]  [<00000000404bb034>] packet_ioctl+0x16c/0x208
> [    0.200000]  [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20
> [    0.200000] 

This is probably because the parisc's DISCONTIGMEM memory ranges don't 
have bits set in N_NORMAL_MEMORY.

diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
 	}
 	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
 
-	for (i = 0; i < npmem_ranges; i++)
+	for (i = 0; i < npmem_ranges; i++) {
+		node_set_state(i, N_NORMAL_MEMORY);
 		node_set_online(i);
+	}
 #endif
 
 	/*

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 21:18                                                   ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 21:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, James Bottomley wrote:

> [    0.200000] Backtrace:
> [    0.200000]  [<000000004021c938>] add_partial+0x28/0x98
> [    0.200000]  [<000000004021faa0>] __slab_free+0x1d0/0x1d8
> [    0.200000]  [<000000004021fd04>] kmem_cache_free+0xc4/0x128
> [    0.200000]  [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
> [    0.200000]  [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
> [    0.200000]  [<00000000402a974c>] create_dir+0x5c/0x168
> [    0.200000]  [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
> [    0.200000]  [<000000004033d6c4>] kobject_add_internal+0x114/0x258
> [    0.200000]  [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
> [    0.200000]  [<000000004033df20>] kobject_add+0x50/0x90
> [    0.200000]  [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
> [    0.200000]  [<00000000407862a0>] cgroup_init+0x138/0x1f0
> [    0.200000]  [<000000004077ce50>] start_kernel+0x5a0/0x840
> [    0.200000]  [<000000004011fa3c>] start_parisc+0xa4/0xb8
> [    0.200000]  [<00000000404bb034>] packet_ioctl+0x16c/0x208
> [    0.200000]  [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20
> [    0.200000] 

This is probably because the parisc's DISCONTIGMEM memory ranges don't 
have bits set in N_NORMAL_MEMORY.

diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
 	}
 	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
 
-	for (i = 0; i < npmem_ranges; i++)
+	for (i = 0; i < npmem_ranges; i++) {
+		node_set_state(i, N_NORMAL_MEMORY);
 		node_set_online(i);
+	}
 #endif
 
 	/*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 11:20                                               ` Matthew Wilcox
@ 2011-04-20 21:34                                                 ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 21:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Pekka Enberg, KOSAKI Motohiro, Christoph Lameter,
	James Bottomley, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, Matthew Wilcox wrote:

> > That part makes me think the best option is to make parisc do
> > CONFIG_NUMA as well regardless of the historical intent was.
> 
> But it's not just parisc.  It's six other architectures as well, some
> of which aren't even SMP.  Does !SMP && NUMA make any kind of sense?
> 

It does as long as DISCONTIGMEM is hijacking NUMA abstractions throughout 
the code; for example, look at the .config that James is probably using 
for testing here:

	CONFIG_PA8X00=y
	CONFIG_64BIT=y
	CONFIG_DISCONTIGMEM=y
	CONFIG_NEED_MULTIPLE_NODES=y
	CONFIG_NODES_SHIFT=3

and CONFIG_NUMA is not enabled.  So we want CONFIG_NODES_SHIFT of 3 
(because MAX_PHYSMEM_RANGES is 8) and CONFIG_NEED_MULTIPLE_NODES is 
enabled because of DISCONTIGMEM:

	#
	# Both the NUMA code and DISCONTIGMEM use arrays of pg_data_t's
	# to represent different areas of memory.  This variable allows
	# those dependencies to exist individually.
	#
	config NEED_MULTIPLE_NODES
		def_bool y
		depends on DISCONTIGMEM || NUMA

when in reality we should do away with CONFIG_NEED_MULTIPLE_NODES and just 
force DISCONTIGMEM to enable CONFIG_NUMA at least for -stable and as a 
quick fix for James.

In the long run, we'll probably want to define a lighterweight CONFIG_NUMA 
as a layer that CONFIG_DISCONTIGMEM can use for memory range abstractions 
and then CONFIG_NUMA is built on top of it to define proximity between 
those ranges.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 21:34                                                 ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 21:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Pekka Enberg, KOSAKI Motohiro, Christoph Lameter,
	James Bottomley, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, Matthew Wilcox wrote:

> > That part makes me think the best option is to make parisc do
> > CONFIG_NUMA as well regardless of the historical intent was.
> 
> But it's not just parisc.  It's six other architectures as well, some
> of which aren't even SMP.  Does !SMP && NUMA make any kind of sense?
> 

It does as long as DISCONTIGMEM is hijacking NUMA abstractions throughout 
the code; for example, look at the .config that James is probably using 
for testing here:

	CONFIG_PA8X00=y
	CONFIG_64BIT=y
	CONFIG_DISCONTIGMEM=y
	CONFIG_NEED_MULTIPLE_NODES=y
	CONFIG_NODES_SHIFT=3

and CONFIG_NUMA is not enabled.  So we want CONFIG_NODES_SHIFT of 3 
(because MAX_PHYSMEM_RANGES is 8) and CONFIG_NEED_MULTIPLE_NODES is 
enabled because of DISCONTIGMEM:

	#
	# Both the NUMA code and DISCONTIGMEM use arrays of pg_data_t's
	# to represent different areas of memory.  This variable allows
	# those dependencies to exist individually.
	#
	config NEED_MULTIPLE_NODES
		def_bool y
		depends on DISCONTIGMEM || NUMA

when in reality we should do away with CONFIG_NEED_MULTIPLE_NODES and just 
force DISCONTIGMEM to enable CONFIG_NUMA at least for -stable and as a 
quick fix for James.

In the long run, we'll probably want to define a lighterweight CONFIG_NUMA 
as a layer that CONFIG_DISCONTIGMEM can use for memory range abstractions 
and then CONFIG_NUMA is built on top of it to define proximity between 
those ranges.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 15:22                                                         ` Christoph Lameter
@ 2011-04-20 21:42                                                           ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 21:42 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: James Bottomley, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 20 Apr 2011, Christoph Lameter wrote:

> There is barely any testing going on at all of this since we have had this
> issue for more than 5 years and have not noticed it. The absence of bug
> reports therefore proves nothing. Code inspection of the VM shows
> that this is an issue that arises in multiple subsystems and that we have
> VM_BUG_ONs in the page allocator that should trigger for these situations.
> 
> Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such.
> 

We don't actually have any bug reports in front of us that show anything 
else in the VM other than slub has issues with this configuration, so 
marking them as broken is probably premature.  The parisc config that 
triggered this debugging enables CONFIG_SLAB by default, so it probably 
has gone unnoticed just because nobody other than James has actually tried 
it on hppa64.

Let's see if KOSAKI-san's fixes to Kconfig (even though I'd prefer the 
simpler and implicit "config NUMA def_bool ARCH_DISCONTIGMEM_ENABLE" over 
his config NUMA) and my fix to parisc to set the bit in N_NORMAL_MEMORY 
so that CONFIG_SLUB initializes kmem_cache_node correctly works and then 
address issues in the core VM as they arise.  Presumably someone has been 
running DISCONTIGMEM on hppa64 in the past five years without issues with 
defconfig, so the issue here may just be slub.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 21:42                                                           ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 21:42 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: James Bottomley, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 20 Apr 2011, Christoph Lameter wrote:

> There is barely any testing going on at all of this since we have had this
> issue for more than 5 years and have not noticed it. The absence of bug
> reports therefore proves nothing. Code inspection of the VM shows
> that this is an issue that arises in multiple subsystems and that we have
> VM_BUG_ONs in the page allocator that should trigger for these situations.
> 
> Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such.
> 

We don't actually have any bug reports in front of us that show anything 
else in the VM other than slub has issues with this configuration, so 
marking them as broken is probably premature.  The parisc config that 
triggered this debugging enables CONFIG_SLAB by default, so it probably 
has gone unnoticed just because nobody other than James has actually tried 
it on hppa64.

Let's see if KOSAKI-san's fixes to Kconfig (even though I'd prefer the 
simpler and implicit "config NUMA def_bool ARCH_DISCONTIGMEM_ENABLE" over 
his config NUMA) and my fix to parisc to set the bit in N_NORMAL_MEMORY 
so that CONFIG_SLUB initializes kmem_cache_node correctly works and then 
address issues in the core VM as they arise.  Presumably someone has been 
running DISCONTIGMEM on hppa64 in the past five years without issues with 
defconfig, so the issue here may just be slub.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 21:18                                                   ` David Rientjes
@ 2011-04-20 22:15                                                     ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 22:15 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Wed, 2011-04-20 at 14:18 -0700, David Rientjes wrote:
> This is probably because the parisc's DISCONTIGMEM memory ranges don't 
> have bits set in N_NORMAL_MEMORY.
> 
> diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> --- a/arch/parisc/mm/init.c
> +++ b/arch/parisc/mm/init.c
> @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
>  	}
>  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
>  
> -	for (i = 0; i < npmem_ranges; i++)
> +	for (i = 0; i < npmem_ranges; i++) {
> +		node_set_state(i, N_NORMAL_MEMORY);
>  		node_set_online(i);
> +	}
>  #endif

Yes, this seems to be the missing piece that gets it to boot.  We really
need this in generic code, unless someone wants to run through all the
other arch's doing it ...

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 22:15                                                     ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-20 22:15 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Wed, 2011-04-20 at 14:18 -0700, David Rientjes wrote:
> This is probably because the parisc's DISCONTIGMEM memory ranges don't 
> have bits set in N_NORMAL_MEMORY.
> 
> diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> --- a/arch/parisc/mm/init.c
> +++ b/arch/parisc/mm/init.c
> @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
>  	}
>  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
>  
> -	for (i = 0; i < npmem_ranges; i++)
> +	for (i = 0; i < npmem_ranges; i++) {
> +		node_set_state(i, N_NORMAL_MEMORY);
>  		node_set_online(i);
> +	}
>  #endif

Yes, this seems to be the missing piece that gets it to boot.  We really
need this in generic code, unless someone wants to run through all the
other arch's doing it ...

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 22:15                                                     ` James Bottomley
@ 2011-04-20 23:12                                                       ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 23:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, James Bottomley wrote:

> > This is probably because the parisc's DISCONTIGMEM memory ranges don't 
> > have bits set in N_NORMAL_MEMORY.
> > 
> > diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> > --- a/arch/parisc/mm/init.c
> > +++ b/arch/parisc/mm/init.c
> > @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
> >  	}
> >  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
> >  
> > -	for (i = 0; i < npmem_ranges; i++)
> > +	for (i = 0; i < npmem_ranges; i++) {
> > +		node_set_state(i, N_NORMAL_MEMORY);
> >  		node_set_online(i);
> > +	}
> >  #endif
> 
> Yes, this seems to be the missing piece that gets it to boot.  We really
> need this in generic code, unless someone wants to run through all the
> other arch's doing it ...
> 

Looking at all other architectures that allow ARCH_DISCONTIGMEM_ENABLE, we 
already know x86 is fine, avr32 disables ARCH_DISCONTIGMEM_ENABLE entirely 
because its code only brings online node 0, and tile already sets the bit 
in N_NORMAL_MEMORY correctly when bringing a node online, probably because 
it was introduced after the various node state masks were added in 
7ea1530ab3fd back in October 2007.

So we're really only talking about alpha, ia64, m32r, m68k, and mips and 
it only seems to matter when using CONFIG_SLUB, which isn't surprising 
when greping for it:

	$ grep -r N_NORMAL_MEMORY mm/*
	mm/memcontrol.c:	if (!node_state(node, N_NORMAL_MEMORY))
	mm/memcontrol.c:		if (!node_state(node, N_NORMAL_MEMORY))
	mm/page_alloc.c:	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
	mm/page_alloc.c:			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY)

Those memory controller occurrences only result in it passing a node id of 
-1 to kmalloc_node() which means no specific node target, and that's fine 
for DISCONTIGMEM since we don't care about any proximity between memory 
ranges.

This should fix the remaining architectures so they can use CONFIG_SLUB, 
but I hope it can be tested by the individual arch maintainers like you 
did for parisc.

diff --git a/arch/alpha/mm/numa.c b/arch/alpha/mm/numa.c
--- a/arch/alpha/mm/numa.c
+++ b/arch/alpha/mm/numa.c
@@ -245,6 +245,7 @@ setup_memory_node(int nid, void *kernel_end)
 			bootmap_size, BOOTMEM_DEFAULT);
 	printk(" reserving pages %ld:%ld\n", bootmap_start, bootmap_start+PFN_UP(bootmap_size));
 
+	node_set_state(nid, N_NORMAL_MEMORY);
 	node_set_online(nid);
 }
 
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -573,6 +573,8 @@ void __init find_memory(void)
 				  map>>PAGE_SHIFT,
 				  bdp->node_min_pfn,
 				  bdp->node_low_pfn);
+		if (node_present_pages(node))
+			node_set_state(node, N_NORMAL_MEMORY);
 	}
 
 	efi_memmap_walk(filter_rsvd_memory, free_node_bootmem);
diff --git a/arch/m32r/kernel/setup.c b/arch/m32r/kernel/setup.c
--- a/arch/m32r/kernel/setup.c
+++ b/arch/m32r/kernel/setup.c
@@ -247,7 +247,9 @@ void __init setup_arch(char **cmdline_p)
 
 #ifdef CONFIG_DISCONTIGMEM
 	nodes_clear(node_online_map);
+	node_set_state(0, N_NORMAL_MEMORY);	/* always has memory */
 	node_set_online(0);
+	node_set_state(1, N_NORMAL_MEMORY);	/* always has memory */
 	node_set_online(1);
 #endif	/* CONFIG_DISCONTIGMEM */
 
diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
--- a/arch/m68k/mm/init_mm.c
+++ b/arch/m68k/mm/init_mm.c
@@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
 	}
 #endif
 	pg_data_map[node].bdata = bootmem_node_data + node;
+	if (node_present_pages(node))
+		node_set_state(node, N_NORMAL_MEMORY);
 	node_set_online(node);
 }
 
diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c
--- a/arch/mips/sgi-ip27/ip27-memory.c
+++ b/arch/mips/sgi-ip27/ip27-memory.c
@@ -471,6 +471,8 @@ void __init paging_init(void)
 
 		if (end_pfn > max_low_pfn)
 			max_low_pfn = end_pfn;
+		if (end_pfn > start_pfn)
+			node_set_state(node, N_NORMAL_MEMORY);
 	}
 	zones_size[ZONE_NORMAL] = max_low_pfn;
 	free_area_init_nodes(zones_size);

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-20 23:12                                                       ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-20 23:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Wed, 20 Apr 2011, James Bottomley wrote:

> > This is probably because the parisc's DISCONTIGMEM memory ranges don't 
> > have bits set in N_NORMAL_MEMORY.
> > 
> > diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> > --- a/arch/parisc/mm/init.c
> > +++ b/arch/parisc/mm/init.c
> > @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
> >  	}
> >  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
> >  
> > -	for (i = 0; i < npmem_ranges; i++)
> > +	for (i = 0; i < npmem_ranges; i++) {
> > +		node_set_state(i, N_NORMAL_MEMORY);
> >  		node_set_online(i);
> > +	}
> >  #endif
> 
> Yes, this seems to be the missing piece that gets it to boot.  We really
> need this in generic code, unless someone wants to run through all the
> other arch's doing it ...
> 

Looking at all other architectures that allow ARCH_DISCONTIGMEM_ENABLE, we 
already know x86 is fine, avr32 disables ARCH_DISCONTIGMEM_ENABLE entirely 
because its code only brings online node 0, and tile already sets the bit 
in N_NORMAL_MEMORY correctly when bringing a node online, probably because 
it was introduced after the various node state masks were added in 
7ea1530ab3fd back in October 2007.

So we're really only talking about alpha, ia64, m32r, m68k, and mips and 
it only seems to matter when using CONFIG_SLUB, which isn't surprising 
when greping for it:

	$ grep -r N_NORMAL_MEMORY mm/*
	mm/memcontrol.c:	if (!node_state(node, N_NORMAL_MEMORY))
	mm/memcontrol.c:		if (!node_state(node, N_NORMAL_MEMORY))
	mm/page_alloc.c:	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
	mm/page_alloc.c:			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY)

Those memory controller occurrences only result in it passing a node id of 
-1 to kmalloc_node() which means no specific node target, and that's fine 
for DISCONTIGMEM since we don't care about any proximity between memory 
ranges.

This should fix the remaining architectures so they can use CONFIG_SLUB, 
but I hope it can be tested by the individual arch maintainers like you 
did for parisc.

diff --git a/arch/alpha/mm/numa.c b/arch/alpha/mm/numa.c
--- a/arch/alpha/mm/numa.c
+++ b/arch/alpha/mm/numa.c
@@ -245,6 +245,7 @@ setup_memory_node(int nid, void *kernel_end)
 			bootmap_size, BOOTMEM_DEFAULT);
 	printk(" reserving pages %ld:%ld\n", bootmap_start, bootmap_start+PFN_UP(bootmap_size));
 
+	node_set_state(nid, N_NORMAL_MEMORY);
 	node_set_online(nid);
 }
 
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -573,6 +573,8 @@ void __init find_memory(void)
 				  map>>PAGE_SHIFT,
 				  bdp->node_min_pfn,
 				  bdp->node_low_pfn);
+		if (node_present_pages(node))
+			node_set_state(node, N_NORMAL_MEMORY);
 	}
 
 	efi_memmap_walk(filter_rsvd_memory, free_node_bootmem);
diff --git a/arch/m32r/kernel/setup.c b/arch/m32r/kernel/setup.c
--- a/arch/m32r/kernel/setup.c
+++ b/arch/m32r/kernel/setup.c
@@ -247,7 +247,9 @@ void __init setup_arch(char **cmdline_p)
 
 #ifdef CONFIG_DISCONTIGMEM
 	nodes_clear(node_online_map);
+	node_set_state(0, N_NORMAL_MEMORY);	/* always has memory */
 	node_set_online(0);
+	node_set_state(1, N_NORMAL_MEMORY);	/* always has memory */
 	node_set_online(1);
 #endif	/* CONFIG_DISCONTIGMEM */
 
diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
--- a/arch/m68k/mm/init_mm.c
+++ b/arch/m68k/mm/init_mm.c
@@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
 	}
 #endif
 	pg_data_map[node].bdata = bootmem_node_data + node;
+	if (node_present_pages(node))
+		node_set_state(node, N_NORMAL_MEMORY);
 	node_set_online(node);
 }
 
diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c
--- a/arch/mips/sgi-ip27/ip27-memory.c
+++ b/arch/mips/sgi-ip27/ip27-memory.c
@@ -471,6 +471,8 @@ void __init paging_init(void)
 
 		if (end_pfn > max_low_pfn)
 			max_low_pfn = end_pfn;
+		if (end_pfn > start_pfn)
+			node_set_state(node, N_NORMAL_MEMORY);
 	}
 	zones_size[ZONE_NORMAL] = max_low_pfn;
 	free_area_init_nodes(zones_size);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [patch] parisc: set memory ranges in N_NORMAL_MEMORY when onlined
  2011-04-20 21:18                                                   ` David Rientjes
  (?)
  (?)
@ 2011-04-21  2:27                                                   ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21  2:27 UTC (permalink / raw)
  To: Kyle McMartin, Helge Deller, James Bottomley
  Cc: Andrew Morton, linux-kernel, linux-parisc

When a DISCONTIGMEM memory range is brought online as a NUMA node, it 
also needs to have its bet set in N_NORMAL_MEMORY.  This is necessary for 
generic kernel code that utilizes N_NORMAL_MEMORY as a subset of N_ONLINE 
for memory savings.

These types of hacks can hopefully be removed once DISCONTIGMEM is either 
removed or abstracted away from CONFIG_NUMA.

Fixes a panic in the slub code which only initializes structures for 
N_NORMAL_MEMORY to save memory:

	Backtrace:
	 [<000000004021c938>] add_partial+0x28/0x98
	 [<000000004021faa0>] __slab_free+0x1d0/0x1d8
	 [<000000004021fd04>] kmem_cache_free+0xc4/0x128
	 [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
	 [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
	 [<00000000402a974c>] create_dir+0x5c/0x168
	 [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
	 [<000000004033d6c4>] kobject_add_internal+0x114/0x258
	 [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
	 [<000000004033df20>] kobject_add+0x50/0x90
	 [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
	 [<00000000407862a0>] cgroup_init+0x138/0x1f0
	 [<000000004077ce50>] start_kernel+0x5a0/0x840
	 [<000000004011fa3c>] start_parisc+0xa4/0xb8
	 [<00000000404bb034>] packet_ioctl+0x16c/0x208
	 [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20

Tested-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/parisc/mm/init.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
 	}
 	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
 
-	for (i = 0; i < npmem_ranges; i++)
+	for (i = 0; i < npmem_ranges; i++) {
+		node_set_state(i, N_NORMAL_MEMORY);
 		node_set_online(i);
+	}
 #endif
 
 	/*

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 21:18                                                   ` David Rientjes
@ 2011-04-21 13:03                                                     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-21 13:03 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, James Bottomley, Pekka Enberg,
	Christoph Lameter, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

> diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> --- a/arch/parisc/mm/init.c
> +++ b/arch/parisc/mm/init.c
> @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
>  	}
>  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
>  
> -	for (i = 0; i < npmem_ranges; i++)
> +	for (i = 0; i < npmem_ranges; i++) {
> +		node_set_state(i, N_NORMAL_MEMORY);
>  		node_set_online(i);
> +	}
>  #endif


I'm surprised this one. If arch code doesn't initialized N_NORMAL_MEMORY,
(or N_HIGH_MEMORY. N_HIGH_MEMORY == N_NORMAL_MEMORY if CONFIG_HIGHMEM=n)
kswapd is created only at node0. wow.

The initialization must be necessary even if !NUMA, I think. ;-)
Probably we should have revisit all arch code when commit 9422ffba4a 
(Memoryless nodes: No need for kswapd) was introduced, at least.

Thank you David. and I'm sad this multi level unforunate mismatch....



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 13:03                                                     ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-21 13:03 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, James Bottomley, Pekka Enberg,
	Christoph Lameter, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

> diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> --- a/arch/parisc/mm/init.c
> +++ b/arch/parisc/mm/init.c
> @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
>  	}
>  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
>  
> -	for (i = 0; i < npmem_ranges; i++)
> +	for (i = 0; i < npmem_ranges; i++) {
> +		node_set_state(i, N_NORMAL_MEMORY);
>  		node_set_online(i);
> +	}
>  #endif


I'm surprised this one. If arch code doesn't initialized N_NORMAL_MEMORY,
(or N_HIGH_MEMORY. N_HIGH_MEMORY == N_NORMAL_MEMORY if CONFIG_HIGHMEM=n)
kswapd is created only at node0. wow.

The initialization must be necessary even if !NUMA, I think. ;-)
Probably we should have revisit all arch code when commit 9422ffba4a 
(Memoryless nodes: No need for kswapd) was introduced, at least.

Thank you David. and I'm sad this multi level unforunate mismatch....


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 23:12                                                       ` David Rientjes
@ 2011-04-21 13:16                                                         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-21 13:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, James Bottomley, Pekka Enberg,
	Christoph Lameter, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

> On Wed, 20 Apr 2011, James Bottomley wrote:
> 
> > > This is probably because the parisc's DISCONTIGMEM memory ranges don't 
> > > have bits set in N_NORMAL_MEMORY.
> > > 
> > > diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> > > --- a/arch/parisc/mm/init.c
> > > +++ b/arch/parisc/mm/init.c
> > > @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
> > >  	}
> > >  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
> > >  
> > > -	for (i = 0; i < npmem_ranges; i++)
> > > +	for (i = 0; i < npmem_ranges; i++) {
> > > +		node_set_state(i, N_NORMAL_MEMORY);
> > >  		node_set_online(i);
> > > +	}
> > >  #endif
> > 
> > Yes, this seems to be the missing piece that gets it to boot.  We really
> > need this in generic code, unless someone wants to run through all the
> > other arch's doing it ...
> > 
> 
> Looking at all other architectures that allow ARCH_DISCONTIGMEM_ENABLE, we 
> already know x86 is fine, avr32 disables ARCH_DISCONTIGMEM_ENABLE entirely 
> because its code only brings online node 0, and tile already sets the bit 
> in N_NORMAL_MEMORY correctly when bringing a node online, probably because 
> it was introduced after the various node state masks were added in 
> 7ea1530ab3fd back in October 2007.
> 
> So we're really only talking about alpha, ia64, m32r, m68k, and mips and 
> it only seems to matter when using CONFIG_SLUB, which isn't surprising 
> when greping for it:
> 
> 	$ grep -r N_NORMAL_MEMORY mm/*
> 	mm/memcontrol.c:	if (!node_state(node, N_NORMAL_MEMORY))
> 	mm/memcontrol.c:		if (!node_state(node, N_NORMAL_MEMORY))
> 	mm/page_alloc.c:	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
> 	mm/page_alloc.c:			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY)
> 
> Those memory controller occurrences only result in it passing a node id of 
> -1 to kmalloc_node() which means no specific node target, and that's fine 
> for DISCONTIGMEM since we don't care about any proximity between memory 
> ranges.
> 
> This should fix the remaining architectures so they can use CONFIG_SLUB, 
> but I hope it can be tested by the individual arch maintainers like you 
> did for parisc.

ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
N_NORMAL_MEMORY automatically if my understand is correct.
(plz see free_area_init_nodes)

I guess alpha and m32r have no active developrs. only m68k seems to be need
fix and we have a chance to get a review... 




^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 13:16                                                         ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-21 13:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, James Bottomley, Pekka Enberg,
	Christoph Lameter, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

> On Wed, 20 Apr 2011, James Bottomley wrote:
> 
> > > This is probably because the parisc's DISCONTIGMEM memory ranges don't 
> > > have bits set in N_NORMAL_MEMORY.
> > > 
> > > diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> > > --- a/arch/parisc/mm/init.c
> > > +++ b/arch/parisc/mm/init.c
> > > @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
> > >  	}
> > >  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
> > >  
> > > -	for (i = 0; i < npmem_ranges; i++)
> > > +	for (i = 0; i < npmem_ranges; i++) {
> > > +		node_set_state(i, N_NORMAL_MEMORY);
> > >  		node_set_online(i);
> > > +	}
> > >  #endif
> > 
> > Yes, this seems to be the missing piece that gets it to boot.  We really
> > need this in generic code, unless someone wants to run through all the
> > other arch's doing it ...
> > 
> 
> Looking at all other architectures that allow ARCH_DISCONTIGMEM_ENABLE, we 
> already know x86 is fine, avr32 disables ARCH_DISCONTIGMEM_ENABLE entirely 
> because its code only brings online node 0, and tile already sets the bit 
> in N_NORMAL_MEMORY correctly when bringing a node online, probably because 
> it was introduced after the various node state masks were added in 
> 7ea1530ab3fd back in October 2007.
> 
> So we're really only talking about alpha, ia64, m32r, m68k, and mips and 
> it only seems to matter when using CONFIG_SLUB, which isn't surprising 
> when greping for it:
> 
> 	$ grep -r N_NORMAL_MEMORY mm/*
> 	mm/memcontrol.c:	if (!node_state(node, N_NORMAL_MEMORY))
> 	mm/memcontrol.c:		if (!node_state(node, N_NORMAL_MEMORY))
> 	mm/page_alloc.c:	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
> 	mm/page_alloc.c:			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
> 	mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY)
> 
> Those memory controller occurrences only result in it passing a node id of 
> -1 to kmalloc_node() which means no specific node target, and that's fine 
> for DISCONTIGMEM since we don't care about any proximity between memory 
> ranges.
> 
> This should fix the remaining architectures so they can use CONFIG_SLUB, 
> but I hope it can be tested by the individual arch maintainers like you 
> did for parisc.

ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
N_NORMAL_MEMORY automatically if my understand is correct.
(plz see free_area_init_nodes)

I guess alpha and m32r have no active developrs. only m68k seems to be need
fix and we have a chance to get a review... 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 13:50                                             ` Christoph Lameter
@ 2011-04-21 13:32                                               ` Tejun Heo
  -1 siblings, 0 replies; 210+ messages in thread
From: Tejun Heo @ 2011-04-21 13:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, James Bottomley, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes

Hey,

On Wed, Apr 20, 2011 at 08:50:15AM -0500, Christoph Lameter wrote:
> Tejon was working on getting rid of DISCONTIG. SPARSEMEM is the favored
> alternative today. So we could potentially change the arches to use SPARSE
> configs in the !NUMA case.

Well, the thing is that sparsemem w/ vmemmap is definitely better than
discontigmem on x86-64; however, on x86-32, vmemmap can't be used due
to address space shortage and there are some minor disadvantages to
sparsemem compared to discontigmem.

IIRC, the biggest was losing a bit of granuality in memsections and
possibly wasting slightly more memory on the page array.  Both didn't
seem critical to me but given that the actual amount of code needed
for discontigmem in arch code was fairly small (although the amount of
added complexity for auditing/testing can be much higher) I didn't
feel sure about dropping discontigmem and thus the patchset to drop
discontigmem was posted as RFC, to which nobody commented.

  http://thread.gmane.org/gmane.linux.kernel/1121321

What do you guys think?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 13:32                                               ` Tejun Heo
  0 siblings, 0 replies; 210+ messages in thread
From: Tejun Heo @ 2011-04-21 13:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, James Bottomley, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	David Rientjes

Hey,

On Wed, Apr 20, 2011 at 08:50:15AM -0500, Christoph Lameter wrote:
> Tejon was working on getting rid of DISCONTIG. SPARSEMEM is the favored
> alternative today. So we could potentially change the arches to use SPARSE
> configs in the !NUMA case.

Well, the thing is that sparsemem w/ vmemmap is definitely better than
discontigmem on x86-64; however, on x86-32, vmemmap can't be used due
to address space shortage and there are some minor disadvantages to
sparsemem compared to discontigmem.

IIRC, the biggest was losing a bit of granuality in memsections and
possibly wasting slightly more memory on the page array.  Both didn't
seem critical to me but given that the actual amount of code needed
for discontigmem in arch code was fairly small (although the amount of
added complexity for auditing/testing can be much higher) I didn't
feel sure about dropping discontigmem and thus the patchset to drop
discontigmem was posted as RFC, to which nobody commented.

  http://thread.gmane.org/gmane.linux.kernel/1121321

What do you guys think?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-20 21:42                                                           ` David Rientjes
@ 2011-04-21 16:06                                                             ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 16:06 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 2011-04-20 at 14:42 -0700, David Rientjes wrote:
> On Wed, 20 Apr 2011, Christoph Lameter wrote:
> 
> > There is barely any testing going on at all of this since we have had this
> > issue for more than 5 years and have not noticed it. The absence of bug
> > reports therefore proves nothing. Code inspection of the VM shows
> > that this is an issue that arises in multiple subsystems and that we have
> > VM_BUG_ONs in the page allocator that should trigger for these situations.
> > 
> > Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such.
> > 
> 
> We don't actually have any bug reports in front of us that show anything 
> else in the VM other than slub has issues with this configuration, so 
> marking them as broken is probably premature.  The parisc config that 
> triggered this debugging enables CONFIG_SLAB by default, so it probably 
> has gone unnoticed just because nobody other than James has actually tried 
> it on hppa64.
> 
> Let's see if KOSAKI-san's fixes to Kconfig (even though I'd prefer the 
> simpler and implicit "config NUMA def_bool ARCH_DISCONTIGMEM_ENABLE" over 
> his config NUMA) and my fix to parisc to set the bit in N_NORMAL_MEMORY 
> so that CONFIG_SLUB initializes kmem_cache_node correctly works and then 
> address issues in the core VM as they arise.  Presumably someone has been 
> running DISCONTIGMEM on hppa64 in the past five years without issues with 
> defconfig, so the issue here may just be slub.

Actually, we can fix slub.  As far as all my memory hammer tests go, the
one liner below is the actual fix (it just forces slub get_node() to
return the zero node always on !NUMA).  That, as far as a code
inspection goes, seems to make SLUB as good as SLAB ... as long as
no-one uses hugepages or VM DEBUG, which, I think we've demonstrated, is
the case for all the current DISCONTIGMEM users.

I think either the above or just marking slub broken in DISCONTIGMEM & !
NUMA is sufficient for stable.  The fix is getting urgent, because
debian (which is what most of our users are running) has made SLUB the
default allocator, which is why we're now starting to run into these
panic reports.

The set memory range fix looks good for a backport too ... at least the
page cache is now no-longer reluctant to use my upper 1GB ...

I worry a bit more about backporting the selection of NUMA as a -stable
fix because it's a larger change (and requires changes to all the
architectures, since NUMA is an arch local Kconfig variable)

James

----

diff --git a/mm/slub.c b/mm/slub.c
index 94d2a33..243bd9c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -235,7 +235,11 @@ int slab_is_available(void)
 
 static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 {
+#ifdef CONFIG_NUMA
 	return s->node[node];
+#else
+	return s->node[0];
+#endif
 }
 
 /* Verify that a pointer has an address that is valid within a slab page */

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 16:06                                                             ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 16:06 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Wed, 2011-04-20 at 14:42 -0700, David Rientjes wrote:
> On Wed, 20 Apr 2011, Christoph Lameter wrote:
> 
> > There is barely any testing going on at all of this since we have had this
> > issue for more than 5 years and have not noticed it. The absence of bug
> > reports therefore proves nothing. Code inspection of the VM shows
> > that this is an issue that arises in multiple subsystems and that we have
> > VM_BUG_ONs in the page allocator that should trigger for these situations.
> > 
> > Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such.
> > 
> 
> We don't actually have any bug reports in front of us that show anything 
> else in the VM other than slub has issues with this configuration, so 
> marking them as broken is probably premature.  The parisc config that 
> triggered this debugging enables CONFIG_SLAB by default, so it probably 
> has gone unnoticed just because nobody other than James has actually tried 
> it on hppa64.
> 
> Let's see if KOSAKI-san's fixes to Kconfig (even though I'd prefer the 
> simpler and implicit "config NUMA def_bool ARCH_DISCONTIGMEM_ENABLE" over 
> his config NUMA) and my fix to parisc to set the bit in N_NORMAL_MEMORY 
> so that CONFIG_SLUB initializes kmem_cache_node correctly works and then 
> address issues in the core VM as they arise.  Presumably someone has been 
> running DISCONTIGMEM on hppa64 in the past five years without issues with 
> defconfig, so the issue here may just be slub.

Actually, we can fix slub.  As far as all my memory hammer tests go, the
one liner below is the actual fix (it just forces slub get_node() to
return the zero node always on !NUMA).  That, as far as a code
inspection goes, seems to make SLUB as good as SLAB ... as long as
no-one uses hugepages or VM DEBUG, which, I think we've demonstrated, is
the case for all the current DISCONTIGMEM users.

I think either the above or just marking slub broken in DISCONTIGMEM & !
NUMA is sufficient for stable.  The fix is getting urgent, because
debian (which is what most of our users are running) has made SLUB the
default allocator, which is why we're now starting to run into these
panic reports.

The set memory range fix looks good for a backport too ... at least the
page cache is now no-longer reluctant to use my upper 1GB ...

I worry a bit more about backporting the selection of NUMA as a -stable
fix because it's a larger change (and requires changes to all the
architectures, since NUMA is an arch local Kconfig variable)

James

----

diff --git a/mm/slub.c b/mm/slub.c
index 94d2a33..243bd9c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -235,7 +235,11 @@ int slab_is_available(void)
 
 static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 {
+#ifdef CONFIG_NUMA
 	return s->node[node];
+#else
+	return s->node[0];
+#endif
 }
 
 /* Verify that a pointer has an address that is valid within a slab page */


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 13:16                                                         ` KOSAKI Motohiro
@ 2011-04-21 16:37                                                           ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 16:37 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: David Rientjes, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 2011-04-21 at 22:16 +0900, KOSAKI Motohiro wrote:
> > This should fix the remaining architectures so they can use CONFIG_SLUB, 
> > but I hope it can be tested by the individual arch maintainers like you 
> > did for parisc.
> 
> ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> N_NORMAL_MEMORY automatically if my understand is correct.
> (plz see free_area_init_nodes)
> 
> I guess alpha and m32r have no active developrs. only m68k seems to be need
> fix and we have a chance to get a review... 

Actually, it's not quite a fix yet, I'm afraid.  I've just been
investigating why my main 4 way box got slower with kernel builds:
Apparently userspace processes are now all stuck on CPU0, so we're
obviously tripping over some NUMA scheduling stuff that's missing.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 16:37                                                           ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 16:37 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: David Rientjes, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 2011-04-21 at 22:16 +0900, KOSAKI Motohiro wrote:
> > This should fix the remaining architectures so they can use CONFIG_SLUB, 
> > but I hope it can be tested by the individual arch maintainers like you 
> > did for parisc.
> 
> ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> N_NORMAL_MEMORY automatically if my understand is correct.
> (plz see free_area_init_nodes)
> 
> I guess alpha and m32r have no active developrs. only m68k seems to be need
> fix and we have a chance to get a review... 

Actually, it's not quite a fix yet, I'm afraid.  I've just been
investigating why my main 4 way box got slower with kernel builds:
Apparently userspace processes are now all stuck on CPU0, so we're
obviously tripping over some NUMA scheduling stuff that's missing.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 16:37                                                           ` James Bottomley
@ 2011-04-21 18:33                                                             ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-21 18:33 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, David Rientjes, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers, Tejun Heo, Dave Hansen, Mel Gorman

On Thu, 21 Apr 2011, James Bottomley wrote:

> On Thu, 2011-04-21 at 22:16 +0900, KOSAKI Motohiro wrote:
> > > This should fix the remaining architectures so they can use CONFIG_SLUB,
> > > but I hope it can be tested by the individual arch maintainers like you
> > > did for parisc.
> >
> > ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> > N_NORMAL_MEMORY automatically if my understand is correct.
> > (plz see free_area_init_nodes)
> >
> > I guess alpha and m32r have no active developrs. only m68k seems to be need
> > fix and we have a chance to get a review...
>
> Actually, it's not quite a fix yet, I'm afraid.  I've just been
> investigating why my main 4 way box got slower with kernel builds:
> Apparently userspace processes are now all stuck on CPU0, so we're
> obviously tripping over some NUMA scheduling stuff that's missing.

The simplest solution may be to move these arches to use SPARSE instead.
AFAICT this was relatively easy for the arm guys.

Here is short guide on how to do that from the mips people:

http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html

http://mytechkorner.blogspot.com/2010/12/sparsemem.html

Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
the europeans may be off for awhile)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 18:33                                                             ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-21 18:33 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, David Rientjes, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers, Tejun Heo, Dave Hansen, Mel Gorman

On Thu, 21 Apr 2011, James Bottomley wrote:

> On Thu, 2011-04-21 at 22:16 +0900, KOSAKI Motohiro wrote:
> > > This should fix the remaining architectures so they can use CONFIG_SLUB,
> > > but I hope it can be tested by the individual arch maintainers like you
> > > did for parisc.
> >
> > ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> > N_NORMAL_MEMORY automatically if my understand is correct.
> > (plz see free_area_init_nodes)
> >
> > I guess alpha and m32r have no active developrs. only m68k seems to be need
> > fix and we have a chance to get a review...
>
> Actually, it's not quite a fix yet, I'm afraid.  I've just been
> investigating why my main 4 way box got slower with kernel builds:
> Apparently userspace processes are now all stuck on CPU0, so we're
> obviously tripping over some NUMA scheduling stuff that's missing.

The simplest solution may be to move these arches to use SPARSE instead.
AFAICT this was relatively easy for the arm guys.

Here is short guide on how to do that from the mips people:

http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html

http://mytechkorner.blogspot.com/2010/12/sparsemem.html

Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
the europeans may be off for awhile)

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 18:33                                                             ` Christoph Lameter
@ 2011-04-21 18:45                                                               ` Dave Hansen
  -1 siblings, 0 replies; 210+ messages in thread
From: Dave Hansen @ 2011-04-21 18:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: James Bottomley, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Thu, 2011-04-21 at 13:33 -0500, Christoph Lameter wrote:
> http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html
> 
> http://mytechkorner.blogspot.com/2010/12/sparsemem.html
> 
> Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> the europeans may be off for awhile) 

Yup, for sure.  It's also interesting how much code ppc64 removed when
they did this:

http://lists.ozlabs.org/pipermail/linuxppc64-dev/2005-November/006646.html

Please cc me on patches.  Or, if nobody else was planning on doing it, I
can take a stab at doing SPARSEMEM on one of the arches.  I won't be
able to _run_ it outside of qemu, but it might be quicker than someone
starting from scratch.

Was it really just m68k and parisc that need immediate attention?

-- Dave


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 18:45                                                               ` Dave Hansen
  0 siblings, 0 replies; 210+ messages in thread
From: Dave Hansen @ 2011-04-21 18:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: James Bottomley, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Thu, 2011-04-21 at 13:33 -0500, Christoph Lameter wrote:
> http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html
> 
> http://mytechkorner.blogspot.com/2010/12/sparsemem.html
> 
> Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> the europeans may be off for awhile) 

Yup, for sure.  It's also interesting how much code ppc64 removed when
they did this:

http://lists.ozlabs.org/pipermail/linuxppc64-dev/2005-November/006646.html

Please cc me on patches.  Or, if nobody else was planning on doing it, I
can take a stab at doing SPARSEMEM on one of the arches.  I won't be
able to _run_ it outside of qemu, but it might be quicker than someone
starting from scratch.

Was it really just m68k and parisc that need immediate attention?

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 13:16                                                         ` KOSAKI Motohiro
@ 2011-04-21 19:33                                                           ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 19:33 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: James Bottomley, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, KOSAKI Motohiro wrote:

> ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> N_NORMAL_MEMORY automatically if my understand is correct.
> (plz see free_area_init_nodes)
> 

ia64 doesn't enable CONFIG_HIGHMEM, so it never gets set via this generic 
code; mips also doesn't enable it for all configs even for 32-bit.

So we'll either want to take check_for_regular_memory() out from under 
CONFIG_HIGHMEM and do it for all configs or teach slub to use 
N_HIGH_MEMORY rather than N_NORMAL_MEMORY.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 19:33                                                           ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 19:33 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: James Bottomley, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, KOSAKI Motohiro wrote:

> ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> N_NORMAL_MEMORY automatically if my understand is correct.
> (plz see free_area_init_nodes)
> 

ia64 doesn't enable CONFIG_HIGHMEM, so it never gets set via this generic 
code; mips also doesn't enable it for all configs even for 32-bit.

So we'll either want to take check_for_regular_memory() out from under 
CONFIG_HIGHMEM and do it for all configs or teach slub to use 
N_HIGH_MEMORY rather than N_NORMAL_MEMORY.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 13:03                                                     ` KOSAKI Motohiro
@ 2011-04-21 19:38                                                       ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 19:38 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: James Bottomley, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, KOSAKI Motohiro wrote:

> > diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> > --- a/arch/parisc/mm/init.c
> > +++ b/arch/parisc/mm/init.c
> > @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
> >  	}
> >  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
> >  
> > -	for (i = 0; i < npmem_ranges; i++)
> > +	for (i = 0; i < npmem_ranges; i++) {
> > +		node_set_state(i, N_NORMAL_MEMORY);
> >  		node_set_online(i);
> > +	}
> >  #endif
> 
> 
> I'm surprised this one. If arch code doesn't initialized N_NORMAL_MEMORY,
> (or N_HIGH_MEMORY. N_HIGH_MEMORY == N_NORMAL_MEMORY if CONFIG_HIGHMEM=n)
> kswapd is created only at node0. wow.
> 
> The initialization must be necessary even if !NUMA, I think. ;-)
> Probably we should have revisit all arch code when commit 9422ffba4a 
> (Memoryless nodes: No need for kswapd) was introduced, at least.
> 

I think we may want to just convert slub (and the memory controller) to 
use N_HIGH_MEMORY rather than N_NORMAL_MEMORY since nothing else uses it 
and the generic code seems to handle N_HIGH_MEMORY for all configs 
appropriately.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 19:38                                                       ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 19:38 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: James Bottomley, Pekka Enberg, Christoph Lameter, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, KOSAKI Motohiro wrote:

> > diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> > --- a/arch/parisc/mm/init.c
> > +++ b/arch/parisc/mm/init.c
> > @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
> >  	}
> >  	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
> >  
> > -	for (i = 0; i < npmem_ranges; i++)
> > +	for (i = 0; i < npmem_ranges; i++) {
> > +		node_set_state(i, N_NORMAL_MEMORY);
> >  		node_set_online(i);
> > +	}
> >  #endif
> 
> 
> I'm surprised this one. If arch code doesn't initialized N_NORMAL_MEMORY,
> (or N_HIGH_MEMORY. N_HIGH_MEMORY == N_NORMAL_MEMORY if CONFIG_HIGHMEM=n)
> kswapd is created only at node0. wow.
> 
> The initialization must be necessary even if !NUMA, I think. ;-)
> Probably we should have revisit all arch code when commit 9422ffba4a 
> (Memoryless nodes: No need for kswapd) was introduced, at least.
> 

I think we may want to just convert slub (and the memory controller) to 
use N_HIGH_MEMORY rather than N_NORMAL_MEMORY since nothing else uses it 
and the generic code seems to handle N_HIGH_MEMORY for all configs 
appropriately.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 19:38                                                       ` David Rientjes
@ 2011-04-21 20:02                                                         ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-21 20:02 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, James Bottomley, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, David Rientjes wrote:

> I think we may want to just convert slub (and the memory controller) to
> use N_HIGH_MEMORY rather than N_NORMAL_MEMORY since nothing else uses it
> and the generic code seems to handle N_HIGH_MEMORY for all configs
> appropriately.

In 32 bit configurations some architectures (like x86) provide nodes
that have only high memory. Slab allocators only handle normal memory.
SLAB operates in a kind of degraded mode in that case by falling back for
each allocation to the nodes that have normal memory.


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 20:02                                                         ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-21 20:02 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, James Bottomley, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, David Rientjes wrote:

> I think we may want to just convert slub (and the memory controller) to
> use N_HIGH_MEMORY rather than N_NORMAL_MEMORY since nothing else uses it
> and the generic code seems to handle N_HIGH_MEMORY for all configs
> appropriately.

In 32 bit configurations some architectures (like x86) provide nodes
that have only high memory. Slab allocators only handle normal memory.
SLAB operates in a kind of degraded mode in that case by falling back for
each allocation to the nodes that have normal memory.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 18:33                                                             ` Christoph Lameter
  (?)
@ 2011-04-21 20:05                                                               ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 20:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, David Rientjes, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers, Tejun Heo, Dave Hansen, Mel Gorman

On Thu, 2011-04-21 at 13:33 -0500, Christoph Lameter wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
> 
> > On Thu, 2011-04-21 at 22:16 +0900, KOSAKI Motohiro wrote:
> > > > This should fix the remaining architectures so they can use CONFIG_SLUB,
> > > > but I hope it can be tested by the individual arch maintainers like you
> > > > did for parisc.
> > >
> > > ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> > > N_NORMAL_MEMORY automatically if my understand is correct.
> > > (plz see free_area_init_nodes)
> > >
> > > I guess alpha and m32r have no active developrs. only m68k seems to be need
> > > fix and we have a chance to get a review...
> >
> > Actually, it's not quite a fix yet, I'm afraid.  I've just been
> > investigating why my main 4 way box got slower with kernel builds:
> > Apparently userspace processes are now all stuck on CPU0, so we're
> > obviously tripping over some NUMA scheduling stuff that's missing.
> 
> The simplest solution may be to move these arches to use SPARSE instead.
> AFAICT this was relatively easy for the arm guys.
> 
> Here is short guide on how to do that from the mips people:
> 
> http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html
> 
> http://mytechkorner.blogspot.com/2010/12/sparsemem.html
> 
> Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> the europeans may be off for awhile)

It sort of depends on your definition of easy.  The problem going from
DISCONTIGMEM to SPARSEMEM is sorting out the section size (the minimum
indivisible size for a sectional_mem_map array) and also deciding on
whether you need SPARSEMEM_EXTREME (discontigmem allows arbitrarily
different sizes for each contiguous region) or
ARCH_HAS_HOLES_MEMORYMODEL (allows empty mem_map regions as well).  I
suspect most architectures will want SPARSEMEM_EXTREME (it means that
the section array isn't fully populated) because the gaps can be huge
(we've got a 64GB gap on parisc).

However, even though I think we can do this going forwards ... I don't
think we can backport it as a bug fix for the slub panic.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 20:05                                                               ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 20:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, David Rientjes, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers, Tejun Heo, Dave Hansen, Mel Gorman

On Thu, 2011-04-21 at 13:33 -0500, Christoph Lameter wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
> 
> > On Thu, 2011-04-21 at 22:16 +0900, KOSAKI Motohiro wrote:
> > > > This should fix the remaining architectures so they can use CONFIG_SLUB,
> > > > but I hope it can be tested by the individual arch maintainers like you
> > > > did for parisc.
> > >
> > > ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> > > N_NORMAL_MEMORY automatically if my understand is correct.
> > > (plz see free_area_init_nodes)
> > >
> > > I guess alpha and m32r have no active developrs. only m68k seems to be need
> > > fix and we have a chance to get a review...
> >
> > Actually, it's not quite a fix yet, I'm afraid.  I've just been
> > investigating why my main 4 way box got slower with kernel builds:
> > Apparently userspace processes are now all stuck on CPU0, so we're
> > obviously tripping over some NUMA scheduling stuff that's missing.
> 
> The simplest solution may be to move these arches to use SPARSE instead.
> AFAICT this was relatively easy for the arm guys.
> 
> Here is short guide on how to do that from the mips people:
> 
> http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html
> 
> http://mytechkorner.blogspot.com/2010/12/sparsemem.html
> 
> Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> the europeans may be off for awhile)

It sort of depends on your definition of easy.  The problem going from
DISCONTIGMEM to SPARSEMEM is sorting out the section size (the minimum
indivisible size for a sectional_mem_map array) and also deciding on
whether you need SPARSEMEM_EXTREME (discontigmem allows arbitrarily
different sizes for each contiguous region) or
ARCH_HAS_HOLES_MEMORYMODEL (allows empty mem_map regions as well).  I
suspect most architectures will want SPARSEMEM_EXTREME (it means that
the section array isn't fully populated) because the gaps can be huge
(we've got a 64GB gap on parisc).

However, even though I think we can do this going forwards ... I don't
think we can backport it as a bug fix for the slub panic.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 20:05                                                               ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 20:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, David Rientjes, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers, Tejun Heo, Dave Hansen, Mel Gorman

On Thu, 2011-04-21 at 13:33 -0500, Christoph Lameter wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
> 
> > On Thu, 2011-04-21 at 22:16 +0900, KOSAKI Motohiro wrote:
> > > > This should fix the remaining architectures so they can use CONFIG_SLUB,
> > > > but I hope it can be tested by the individual arch maintainers like you
> > > > did for parisc.
> > >
> > > ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> > > N_NORMAL_MEMORY automatically if my understand is correct.
> > > (plz see free_area_init_nodes)
> > >
> > > I guess alpha and m32r have no active developrs. only m68k seems to be need
> > > fix and we have a chance to get a review...
> >
> > Actually, it's not quite a fix yet, I'm afraid.  I've just been
> > investigating why my main 4 way box got slower with kernel builds:
> > Apparently userspace processes are now all stuck on CPU0, so we're
> > obviously tripping over some NUMA scheduling stuff that's missing.
> 
> The simplest solution may be to move these arches to use SPARSE instead.
> AFAICT this was relatively easy for the arm guys.
> 
> Here is short guide on how to do that from the mips people:
> 
> http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html
> 
> http://mytechkorner.blogspot.com/2010/12/sparsemem.html
> 
> Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> the europeans may be off for awhile)

It sort of depends on your definition of easy.  The problem going from
DISCONTIGMEM to SPARSEMEM is sorting out the section size (the minimum
indivisible size for a sectional_mem_map array) and also deciding on
whether you need SPARSEMEM_EXTREME (discontigmem allows arbitrarily
different sizes for each contiguous region) or
ARCH_HAS_HOLES_MEMORYMODEL (allows empty mem_map regions as well).  I
suspect most architectures will want SPARSEMEM_EXTREME (it means that
the section array isn't fully populated) because the gaps can be huge
(we've got a 64GB gap on parisc).

However, even though I think we can do this going forwards ... I don't
think we can backport it as a bug fix for the slub panic.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 20:05                                                               ` James Bottomley
@ 2011-04-21 21:07                                                                 ` Christoph Lameter
  -1 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-21 21:07 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, David Rientjes, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers, Tejun Heo, Dave Hansen, Mel Gorman

On Thu, 21 Apr 2011, James Bottomley wrote:

> > Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> > the europeans may be off for awhile)
>
> It sort of depends on your definition of easy.  The problem going from
> DISCONTIGMEM to SPARSEMEM is sorting out the section size (the minimum
> indivisible size for a sectional_mem_map array) and also deciding on
> whether you need SPARSEMEM_EXTREME (discontigmem allows arbitrarily
> different sizes for each contiguous region) or
> ARCH_HAS_HOLES_MEMORYMODEL (allows empty mem_map regions as well).  I
> suspect most architectures will want SPARSEMEM_EXTREME (it means that
> the section array isn't fully populated) because the gaps can be huge
> (we've got a 64GB gap on parisc).

Well my favorite is SPARSEMEM_VMEMMAP because it allows page level holes
and uses the TLB (via page tables) to avoid lookups in the SPARSE maps but
that is likely not going to be in an initial fix.

> However, even though I think we can do this going forwards ... I don't
> think we can backport it as a bug fix for the slub panic.

So far there seems to be no other solution that will fix the issues
cleanly since we have a clash of the notions of a node in !NUMA between
core and discontig. Which is a pretty basic thing to get wrong.

If we can avoid all the fancy stuff and Dave can just get a minimal SPARSE
config going then this may be the best solution for stable as well.

But then these configs have been broken for years and no one noticed. This
means the users of these arches likely have been running a subset of
kernel functionality. I suspect they have never freed memory from
DISCONTIG node 1 and higher without CONFIG_DEBUG_VM on. Otherwise I
cannot explain why the VM_BUG_ONs did not trigger in
mm/page_alloc.c:move_freepages() that should have been brought to the MM
developers attention.

This set of circumstances leads to the suspicion that there were only
tests run that showed that the kernel booted. Higher node memory was never
touched and the MM code was never truly exercised.

So I am not sure that there is any urgency in this matter. No one has
cared for years after all.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 21:07                                                                 ` Christoph Lameter
  0 siblings, 0 replies; 210+ messages in thread
From: Christoph Lameter @ 2011-04-21 21:07 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, David Rientjes, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers, Tejun Heo, Dave Hansen, Mel Gorman

On Thu, 21 Apr 2011, James Bottomley wrote:

> > Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> > the europeans may be off for awhile)
>
> It sort of depends on your definition of easy.  The problem going from
> DISCONTIGMEM to SPARSEMEM is sorting out the section size (the minimum
> indivisible size for a sectional_mem_map array) and also deciding on
> whether you need SPARSEMEM_EXTREME (discontigmem allows arbitrarily
> different sizes for each contiguous region) or
> ARCH_HAS_HOLES_MEMORYMODEL (allows empty mem_map regions as well).  I
> suspect most architectures will want SPARSEMEM_EXTREME (it means that
> the section array isn't fully populated) because the gaps can be huge
> (we've got a 64GB gap on parisc).

Well my favorite is SPARSEMEM_VMEMMAP because it allows page level holes
and uses the TLB (via page tables) to avoid lookups in the SPARSE maps but
that is likely not going to be in an initial fix.

> However, even though I think we can do this going forwards ... I don't
> think we can backport it as a bug fix for the slub panic.

So far there seems to be no other solution that will fix the issues
cleanly since we have a clash of the notions of a node in !NUMA between
core and discontig. Which is a pretty basic thing to get wrong.

If we can avoid all the fancy stuff and Dave can just get a minimal SPARSE
config going then this may be the best solution for stable as well.

But then these configs have been broken for years and no one noticed. This
means the users of these arches likely have been running a subset of
kernel functionality. I suspect they have never freed memory from
DISCONTIG node 1 and higher without CONFIG_DEBUG_VM on. Otherwise I
cannot explain why the VM_BUG_ONs did not trigger in
mm/page_alloc.c:move_freepages() that should have been brought to the MM
developers attention.

This set of circumstances leads to the suspicion that there were only
tests run that showed that the kernel booted. Higher node memory was never
touched and the MM code was never truly exercised.

So I am not sure that there is any urgency in this matter. No one has
cared for years after all.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 20:02                                                         ` Christoph Lameter
@ 2011-04-21 21:19                                                           ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 21:19 UTC (permalink / raw)
  To: Christoph Lameter, James Bottomley, Andrew Morton
  Cc: KOSAKI Motohiro, Pekka Enberg, Michal Hocko, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, Christoph Lameter wrote:

> In 32 bit configurations some architectures (like x86) provide nodes
> that have only high memory. Slab allocators only handle normal memory.
> SLAB operates in a kind of degraded mode in that case by falling back for
> each allocation to the nodes that have normal memory.
> 

Let's do this:

 - parisc: James has already queued "parisc: set memory ranges in 
   N_NORMAL_MEMORY when onlined" for 2.6.39, so all he needs now is 
   to merge a hybrid of the Kconfig changes requiring CONFIG_NUMA for 
   CONFIG_DISCONTIGMEM from KOSAKI-san and myself which also fix the 
   compile issues,

 - generic code: we pull check_for_regular_memory() out from under
   CONFIG_HIGHMEM so that N_NORMAL_MEMORY gets set appropriately for 
   all callers of free_area_init_nodes() from paging_init(); this fixes 
   ia64 and mips,

 - alpha, m32r, m68k: push the changes to those individual architectures 
   that I proposed earlier that set N_NORMAL_MEMORY for DISCONTINGMEM
   when memory regions have memory; KOSAKI-san says a couple of these
   architectures may be orphaned so hopefully Andrew can pick them up
   in -mm.

I'll reply to this email with the parisc Kconfig changes for James, the 
generic change to check_for_regular_memory() for Andrew, and the 
arch-specific changes to the appropriate maintainers and email lists (but 
may need to go through -mm if they aren't picked up).

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 21:19                                                           ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 21:19 UTC (permalink / raw)
  To: Christoph Lameter, James Bottomley, Andrew Morton
  Cc: KOSAKI Motohiro, Pekka Enberg, Michal Hocko, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, Christoph Lameter wrote:

> In 32 bit configurations some architectures (like x86) provide nodes
> that have only high memory. Slab allocators only handle normal memory.
> SLAB operates in a kind of degraded mode in that case by falling back for
> each allocation to the nodes that have normal memory.
> 

Let's do this:

 - parisc: James has already queued "parisc: set memory ranges in 
   N_NORMAL_MEMORY when onlined" for 2.6.39, so all he needs now is 
   to merge a hybrid of the Kconfig changes requiring CONFIG_NUMA for 
   CONFIG_DISCONTIGMEM from KOSAKI-san and myself which also fix the 
   compile issues,

 - generic code: we pull check_for_regular_memory() out from under
   CONFIG_HIGHMEM so that N_NORMAL_MEMORY gets set appropriately for 
   all callers of free_area_init_nodes() from paging_init(); this fixes 
   ia64 and mips,

 - alpha, m32r, m68k: push the changes to those individual architectures 
   that I proposed earlier that set N_NORMAL_MEMORY for DISCONTINGMEM
   when memory regions have memory; KOSAKI-san says a couple of these
   architectures may be orphaned so hopefully Andrew can pick them up
   in -mm.

I'll reply to this email with the parisc Kconfig changes for James, the 
generic change to check_for_regular_memory() for Andrew, and the 
arch-specific changes to the appropriate maintainers and email lists (but 
may need to go through -mm if they aren't picked up).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 21:07                                                                 ` Christoph Lameter
@ 2011-04-21 21:22                                                                   ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 21:22 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, David Rientjes, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers, Tejun Heo, Dave Hansen, Mel Gorman

On Thu, 2011-04-21 at 16:07 -0500, Christoph Lameter wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
> 
> > > Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> > > the europeans may be off for awhile)
> >
> > It sort of depends on your definition of easy.  The problem going from
> > DISCONTIGMEM to SPARSEMEM is sorting out the section size (the minimum
> > indivisible size for a sectional_mem_map array) and also deciding on
> > whether you need SPARSEMEM_EXTREME (discontigmem allows arbitrarily
> > different sizes for each contiguous region) or
> > ARCH_HAS_HOLES_MEMORYMODEL (allows empty mem_map regions as well).  I
> > suspect most architectures will want SPARSEMEM_EXTREME (it means that
> > the section array isn't fully populated) because the gaps can be huge
> > (we've got a 64GB gap on parisc).
> 
> Well my favorite is SPARSEMEM_VMEMMAP because it allows page level holes
> and uses the TLB (via page tables) to avoid lookups in the SPARSE maps but
> that is likely not going to be in an initial fix.

Really, no ... that requires additional pte insertion logic and some
other stuff that's nasty to craft and requires significant testing.

> > However, even though I think we can do this going forwards ... I don't
> > think we can backport it as a bug fix for the slub panic.
> 
> So far there seems to be no other solution that will fix the issues
> cleanly since we have a clash of the notions of a node in !NUMA between
> core and discontig. Which is a pretty basic thing to get wrong.

Yes there is ... there's the slub patch or the marking as broken.
Either are much simpler.

> If we can avoid all the fancy stuff and Dave can just get a minimal SPARSE
> config going then this may be the best solution for stable as well.
> 
> But then these configs have been broken for years and no one noticed. This
> means the users of these arches likely have been running a subset of
> kernel functionality. I suspect they have never freed memory from
> DISCONTIG node 1 and higher without CONFIG_DEBUG_VM on. Otherwise I
> cannot explain why the VM_BUG_ONs did not trigger in
> mm/page_alloc.c:move_freepages() that should have been brought to the MM
> developers attention.

Yes they have.  As willy said, they've just never been run with DEBUG_VM
or HUGEPAGES or, until recently, SLUB.  The test boxes (at least for
parisc) get hammered quite a lot to flush out coherency issues.  That's
why I'm confident this panic only triggers for slub.  I found the panic
within about two days of turning SLUB on.

> This set of circumstances leads to the suspicion that there were only
> tests run that showed that the kernel booted. Higher node memory was never
> touched and the MM code was never truly exercised.

Look, try to stay on point with logic: they have been extensively
tested, just not in the slub configuration, which is the only one that
crashes.  As I explained (several times) we're just now picking up slub
because debian now enables it by default.

> So I am not sure that there is any urgency in this matter. No one has
> cared for years after all.

If we didn't care, we wouldn't be making all this fuss.  It's only a
couple of days since the bug was reported, which should indicate the
high importance attached to it (well, by everyone except you,
apparently).

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 21:22                                                                   ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 21:22 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, David Rientjes, Pekka Enberg, Michal Hocko,
	Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers, Tejun Heo, Dave Hansen, Mel Gorman

On Thu, 2011-04-21 at 16:07 -0500, Christoph Lameter wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
> 
> > > Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> > > the europeans may be off for awhile)
> >
> > It sort of depends on your definition of easy.  The problem going from
> > DISCONTIGMEM to SPARSEMEM is sorting out the section size (the minimum
> > indivisible size for a sectional_mem_map array) and also deciding on
> > whether you need SPARSEMEM_EXTREME (discontigmem allows arbitrarily
> > different sizes for each contiguous region) or
> > ARCH_HAS_HOLES_MEMORYMODEL (allows empty mem_map regions as well).  I
> > suspect most architectures will want SPARSEMEM_EXTREME (it means that
> > the section array isn't fully populated) because the gaps can be huge
> > (we've got a 64GB gap on parisc).
> 
> Well my favorite is SPARSEMEM_VMEMMAP because it allows page level holes
> and uses the TLB (via page tables) to avoid lookups in the SPARSE maps but
> that is likely not going to be in an initial fix.

Really, no ... that requires additional pte insertion logic and some
other stuff that's nasty to craft and requires significant testing.

> > However, even though I think we can do this going forwards ... I don't
> > think we can backport it as a bug fix for the slub panic.
> 
> So far there seems to be no other solution that will fix the issues
> cleanly since we have a clash of the notions of a node in !NUMA between
> core and discontig. Which is a pretty basic thing to get wrong.

Yes there is ... there's the slub patch or the marking as broken.
Either are much simpler.

> If we can avoid all the fancy stuff and Dave can just get a minimal SPARSE
> config going then this may be the best solution for stable as well.
> 
> But then these configs have been broken for years and no one noticed. This
> means the users of these arches likely have been running a subset of
> kernel functionality. I suspect they have never freed memory from
> DISCONTIG node 1 and higher without CONFIG_DEBUG_VM on. Otherwise I
> cannot explain why the VM_BUG_ONs did not trigger in
> mm/page_alloc.c:move_freepages() that should have been brought to the MM
> developers attention.

Yes they have.  As willy said, they've just never been run with DEBUG_VM
or HUGEPAGES or, until recently, SLUB.  The test boxes (at least for
parisc) get hammered quite a lot to flush out coherency issues.  That's
why I'm confident this panic only triggers for slub.  I found the panic
within about two days of turning SLUB on.

> This set of circumstances leads to the suspicion that there were only
> tests run that showed that the kernel booted. Higher node memory was never
> touched and the MM code was never truly exercised.

Look, try to stay on point with logic: they have been extensively
tested, just not in the slub configuration, which is the only one that
crashes.  As I explained (several times) we're just now picking up slub
because debian now enables it by default.

> So I am not sure that there is any urgency in this matter. No one has
> cared for years after all.

If we didn't care, we wouldn't be making all this fuss.  It's only a
couple of days since the bug was reported, which should indicate the
high importance attached to it (well, by everyone except you,
apparently).

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 21:19                                                           ` David Rientjes
@ 2011-04-21 21:24                                                             ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 21:24 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Pekka Enberg,
	Michal Hocko, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 2011-04-21 at 14:19 -0700, David Rientjes wrote:
> On Thu, 21 Apr 2011, Christoph Lameter wrote:
> 
> > In 32 bit configurations some architectures (like x86) provide nodes
> > that have only high memory. Slab allocators only handle normal memory.
> > SLAB operates in a kind of degraded mode in that case by falling back for
> > each allocation to the nodes that have normal memory.
> > 
> 
> Let's do this:
> 
>  - parisc: James has already queued "parisc: set memory ranges in 
>    N_NORMAL_MEMORY when onlined" for 2.6.39, so all he needs now is 
>    to merge a hybrid of the Kconfig changes requiring CONFIG_NUMA for 
>    CONFIG_DISCONTIGMEM from KOSAKI-san and myself which also fix the 
>    compile issues,

Not quite: if we go this route, we need to sort out our CPU scheduling
problem as well ... as I said, I don't think we've got all the necessary
numa machinery in place yet.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 21:24                                                             ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 21:24 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Pekka Enberg,
	Michal Hocko, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 2011-04-21 at 14:19 -0700, David Rientjes wrote:
> On Thu, 21 Apr 2011, Christoph Lameter wrote:
> 
> > In 32 bit configurations some architectures (like x86) provide nodes
> > that have only high memory. Slab allocators only handle normal memory.
> > SLAB operates in a kind of degraded mode in that case by falling back for
> > each allocation to the nodes that have normal memory.
> > 
> 
> Let's do this:
> 
>  - parisc: James has already queued "parisc: set memory ranges in 
>    N_NORMAL_MEMORY when onlined" for 2.6.39, so all he needs now is 
>    to merge a hybrid of the Kconfig changes requiring CONFIG_NUMA for 
>    CONFIG_DISCONTIGMEM from KOSAKI-san and myself which also fix the 
>    compile issues,

Not quite: if we go this route, we need to sort out our CPU scheduling
problem as well ... as I said, I don't think we've got all the necessary
numa machinery in place yet.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [patch] parisc: enable CONFIG_NUMA for DISCONTIGMEM and fix build errors
  2011-04-21 21:19                                                           ` David Rientjes
  (?)
  (?)
@ 2011-04-21 21:31                                                           ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 21:31 UTC (permalink / raw)
  To: Kyle McMartin, Helge Deller, James Bottomley
  Cc: KOSAKI Motohiro, Andrew Morton, linux-kernel, linux-parisc

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

CONFIG_DISCONTIGMEM requires CONFIG_NUMA because it represents its memory 
regions as individual nodes that lack remote proximity, hence it already 
has a default CONFIG_NODES_SHIFT of 3.

This configuration causes build errors, however, so fix them as well.

This was first realized when CONFIG_DISCONTIGMEM was used with 
CONFIG_SLUB which depends heavily on CONFIG_NUMA being enabled or 
disabled to support these memory regions.

A long-term fix is to separate the memory region abstraction required for 
DISCONTIGMEM away from NUMA and then build NUMA on top of it with 
proximity domains.

[rientjes@google.com: implicitly enable NUMA for DISCONTIGMEM, changelog]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/parisc/Kconfig            |    3 +++
 include/asm-generic/topology.h |    4 ----
 include/linux/topology.h       |    2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -244,6 +244,9 @@ config ARCH_DISCONTIGMEM_DEFAULT
 	def_bool y
 	depends on ARCH_DISCONTIGMEM_ENABLE
 
+config NUMA
+	def_bool ARCH_DISCONTIGMEM_ENABLE
+
 config NODES_SHIFT
 	int
 	default "3"
diff --git a/include/asm-generic/topology.h b/include/asm-generic/topology.h
--- a/include/asm-generic/topology.h
+++ b/include/asm-generic/topology.h
@@ -27,8 +27,6 @@
 #ifndef _ASM_GENERIC_TOPOLOGY_H
 #define _ASM_GENERIC_TOPOLOGY_H
 
-#ifndef	CONFIG_NUMA
-
 /* Other architectures wishing to use this simple topology API should fill
    in the below functions as appropriate in their own <asm/topology.h> file. */
 #ifndef cpu_to_node
@@ -60,8 +58,6 @@
 				 cpumask_of_node(pcibus_to_node(bus)))
 #endif
 
-#endif	/* CONFIG_NUMA */
-
 #if !defined(CONFIG_NUMA) || !defined(CONFIG_HAVE_MEMORYLESS_NODES)
 
 #ifndef set_numa_mem
diff --git a/include/linux/topology.h b/include/linux/topology.h
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -209,7 +209,7 @@ int arch_update_cpu_topology(void);
 
 #ifdef CONFIG_NUMA
 #ifndef SD_NODE_INIT
-#error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
+#define SD_NODE_INIT SD_ALLNODES_INIT
 #endif
 
 #endif /* CONFIG_NUMA */

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 21:24                                                             ` James Bottomley
@ 2011-04-21 21:34                                                               ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 21:34 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Pekka Enberg,
	Michal Hocko, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, James Bottomley wrote:

> >  - parisc: James has already queued "parisc: set memory ranges in 
> >    N_NORMAL_MEMORY when onlined" for 2.6.39, so all he needs now is 
> >    to merge a hybrid of the Kconfig changes requiring CONFIG_NUMA for 
> >    CONFIG_DISCONTIGMEM from KOSAKI-san and myself which also fix the 
> >    compile issues,
> 
> Not quite: if we go this route, we need to sort out our CPU scheduling
> problem as well ... as I said, I don't think we've got all the necessary
> numa machinery in place yet.
> 

Ok, it seems like there're two options for this release cycle:

 (1) merge the patch that enables CONFIG_NUMA for DISCONTIGMEM but only 
     do so if CONFIG_SLUB is enabled to avoid the build error, or

 (2) disallow CONFIG_SLUB for parisc with DISCONTIGMEM.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 21:34                                                               ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 21:34 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Pekka Enberg,
	Michal Hocko, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, James Bottomley wrote:

> >  - parisc: James has already queued "parisc: set memory ranges in 
> >    N_NORMAL_MEMORY when onlined" for 2.6.39, so all he needs now is 
> >    to merge a hybrid of the Kconfig changes requiring CONFIG_NUMA for 
> >    CONFIG_DISCONTIGMEM from KOSAKI-san and myself which also fix the 
> >    compile issues,
> 
> Not quite: if we go this route, we need to sort out our CPU scheduling
> problem as well ... as I said, I don't think we've got all the necessary
> numa machinery in place yet.
> 

Ok, it seems like there're two options for this release cycle:

 (1) merge the patch that enables CONFIG_NUMA for DISCONTIGMEM but only 
     do so if CONFIG_SLUB is enabled to avoid the build error, or

 (2) disallow CONFIG_SLUB for parisc with DISCONTIGMEM.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [patch] mm: always set nodes with regular memory in N_NORMAL_MEMORY
  2011-04-21 21:19                                                           ` David Rientjes
@ 2011-04-21 21:41                                                             ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 21:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Lee Schermerhorn, KOSAKI Motohiro, linux-mm, linux-kernel

N_NORMAL_MEMORY is intended to include all nodes that have present memory 
in regular zones, that is, zones below ZONE_HIGHMEM.  This should be done 
regardless of whether CONFIG_HIGHMEM is set or not.

This fixes ia64 so that the nodes get set appropriately in the nodemask 
for DISCONTIGMEM and mips if it does not enable CONFIG_HIGHMEM even for 
32-bit kernels.

If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it 
relies on this nodemask to setup kmem_cache_node data structures for each 
cache.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/page_alloc.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4727,7 +4727,6 @@ out:
 /* Any regular memory on that node ? */
 static void check_for_regular_memory(pg_data_t *pgdat)
 {
-#ifdef CONFIG_HIGHMEM
 	enum zone_type zone_type;
 
 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
@@ -4735,7 +4734,6 @@ static void check_for_regular_memory(pg_data_t *pgdat)
 		if (zone->present_pages)
 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
 	}
-#endif
 }
 
 /**

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [patch] mm: always set nodes with regular memory in N_NORMAL_MEMORY
@ 2011-04-21 21:41                                                             ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 21:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Lee Schermerhorn, KOSAKI Motohiro, linux-mm, linux-kernel

N_NORMAL_MEMORY is intended to include all nodes that have present memory 
in regular zones, that is, zones below ZONE_HIGHMEM.  This should be done 
regardless of whether CONFIG_HIGHMEM is set or not.

This fixes ia64 so that the nodes get set appropriately in the nodemask 
for DISCONTIGMEM and mips if it does not enable CONFIG_HIGHMEM even for 
32-bit kernels.

If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it 
relies on this nodemask to setup kmem_cache_node data structures for each 
cache.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/page_alloc.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4727,7 +4727,6 @@ out:
 /* Any regular memory on that node ? */
 static void check_for_regular_memory(pg_data_t *pgdat)
 {
-#ifdef CONFIG_HIGHMEM
 	enum zone_type zone_type;
 
 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
@@ -4735,7 +4734,6 @@ static void check_for_regular_memory(pg_data_t *pgdat)
 		if (zone->present_pages)
 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
 	}
-#endif
 }
 
 /**

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 21:34                                                               ` David Rientjes
@ 2011-04-21 21:49                                                                 ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 21:49 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Pekka Enberg,
	Michal Hocko, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 2011-04-21 at 14:34 -0700, David Rientjes wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
> 
> > >  - parisc: James has already queued "parisc: set memory ranges in 
> > >    N_NORMAL_MEMORY when onlined" for 2.6.39, so all he needs now is 
> > >    to merge a hybrid of the Kconfig changes requiring CONFIG_NUMA for 
> > >    CONFIG_DISCONTIGMEM from KOSAKI-san and myself which also fix the 
> > >    compile issues,
> > 
> > Not quite: if we go this route, we need to sort out our CPU scheduling
> > problem as well ... as I said, I don't think we've got all the necessary
> > numa machinery in place yet.
> > 
> 
> Ok, it seems like there're two options for this release cycle:
> 
>  (1) merge the patch that enables CONFIG_NUMA for DISCONTIGMEM but only 
>      do so if CONFIG_SLUB is enabled to avoid the build error, or

That's not an option without coming up with the rest of the numa
fixes ... we can't basically force all SMP systems to become UP.

What build error, by the way?  There's only a runtime panic caused by
slub.

>  (2) disallow CONFIG_SLUB for parisc with DISCONTIGMEM.

Well, that's this patch ... it will actually fix every architecture, not
just parisc.


> diff --git a/init/Kconfig b/init/Kconfig
> index 56240e7..a7ad8fb 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1226,6 +1226,7 @@ config SLAB
>           per cpu and per node queues.
>  
>  config SLUB
> +       depends on BROKEN || NUMA || !DISCONTIGMEM
>         bool "SLUB (Unqueued Allocator)"
>         help
>            SLUB is a slab allocator that minimizes cache line usage


I already sent it to linux-arch and there's been no dissent; there have
been a few "will that fix my slub bug?" type of responses.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 21:49                                                                 ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 21:49 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Pekka Enberg,
	Michal Hocko, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 2011-04-21 at 14:34 -0700, David Rientjes wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
> 
> > >  - parisc: James has already queued "parisc: set memory ranges in 
> > >    N_NORMAL_MEMORY when onlined" for 2.6.39, so all he needs now is 
> > >    to merge a hybrid of the Kconfig changes requiring CONFIG_NUMA for 
> > >    CONFIG_DISCONTIGMEM from KOSAKI-san and myself which also fix the 
> > >    compile issues,
> > 
> > Not quite: if we go this route, we need to sort out our CPU scheduling
> > problem as well ... as I said, I don't think we've got all the necessary
> > numa machinery in place yet.
> > 
> 
> Ok, it seems like there're two options for this release cycle:
> 
>  (1) merge the patch that enables CONFIG_NUMA for DISCONTIGMEM but only 
>      do so if CONFIG_SLUB is enabled to avoid the build error, or

That's not an option without coming up with the rest of the numa
fixes ... we can't basically force all SMP systems to become UP.

What build error, by the way?  There's only a runtime panic caused by
slub.

>  (2) disallow CONFIG_SLUB for parisc with DISCONTIGMEM.

Well, that's this patch ... it will actually fix every architecture, not
just parisc.


> diff --git a/init/Kconfig b/init/Kconfig
> index 56240e7..a7ad8fb 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1226,6 +1226,7 @@ config SLAB
>           per cpu and per node queues.
>  
>  config SLUB
> +       depends on BROKEN || NUMA || !DISCONTIGMEM
>         bool "SLUB (Unqueued Allocator)"
>         help
>            SLUB is a slab allocator that minimizes cache line usage


I already sent it to linux-arch and there's been no dissent; there have
been a few "will that fix my slub bug?" type of responses.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [patch] alpha, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-21 21:19                                                           ` David Rientjes
                                                                             ` (3 preceding siblings ...)
  (?)
@ 2011-04-21 22:00                                                           ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 22:00 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Andrew Morton, Matt Turner
  Cc: linux-alpha, linux-kernel

For alpha, N_NORMAL_MEMORY represents all nodes that have present memory 
since it does not support HIGHMEM.  This patch sets the bit at the time 
the node is initialized.

If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it 
uses this nodemask to setup per-cache kmem_cache_node data structures.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/alpha/mm/numa.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/alpha/mm/numa.c b/arch/alpha/mm/numa.c
--- a/arch/alpha/mm/numa.c
+++ b/arch/alpha/mm/numa.c
@@ -313,6 +313,7 @@ void __init paging_init(void)
 			zones_size[ZONE_DMA] = dma_local_pfn;
 			zones_size[ZONE_NORMAL] = (end_pfn - start_pfn) - dma_local_pfn;
 		}
+		node_set_state(nid, N_NORMAL_MEMORY);
 		free_area_init_node(nid, zones_size, start_pfn, NULL);
 	}
 

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [patch] m32r, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-21 21:19                                                           ` David Rientjes
                                                                             ` (4 preceding siblings ...)
  (?)
@ 2011-04-21 22:01                                                           ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 22:01 UTC (permalink / raw)
  To: Andrew Morton, Hirokazu Takata; +Cc: linux-m32r, linux-kernel

For m32r, N_NORMAL_MEMORY represents all nodes that have present memory 
since it does not support HIGHMEM.  This patch sets the bit at the time 
the node is initialized.

If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it 
uses this nodemask to setup per-cache kmem_cache_node data structures.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/m32r/mm/discontig.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/m32r/mm/discontig.c b/arch/m32r/mm/discontig.c
--- a/arch/m32r/mm/discontig.c
+++ b/arch/m32r/mm/discontig.c
@@ -149,6 +149,7 @@ unsigned long __init zone_sizes_init(void)
 		zholes_size[ZONE_DMA] = mp->holes;
 		holes += zholes_size[ZONE_DMA];
 
+		node_set_state(nid, N_NORMAL_MEMORY);
 		free_area_init_node(nid, zones_size, start_pfn, zholes_size);
 	}
 

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-21 21:19                                                           ` David Rientjes
                                                                             ` (5 preceding siblings ...)
  (?)
@ 2011-04-21 22:02                                                           ` David Rientjes
  2011-04-24  1:59                                                             ` Michael Schmitz
  -1 siblings, 1 reply; 210+ messages in thread
From: David Rientjes @ 2011-04-21 22:02 UTC (permalink / raw)
  To: Geert Uytterhoeven, Andrew Morton, Roman Zippel; +Cc: linux-m68k, linux-kernel

For m68k, N_NORMAL_MEMORY represents all nodes that have present memory 
since it does not support HIGHMEM.  This patch sets the bit at the time 
the node is brought online.

If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it 
uses this nodemask to setup per-cache kmem_cache_node data structures.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/m68k/mm/init_mm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
--- a/arch/m68k/mm/init_mm.c
+++ b/arch/m68k/mm/init_mm.c
@@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
 	}
 #endif
 	pg_data_map[node].bdata = bootmem_node_data + node;
+	if (node_present_pages(node))
+		node_set_state(node, N_NORMAL_MEMORY);
 	node_set_online(node);
 }
 

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 21:49                                                                 ` James Bottomley
@ 2011-04-21 22:12                                                                   ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 22:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Pekka Enberg,
	Michal Hocko, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, James Bottomley wrote:

> > Ok, it seems like there're two options for this release cycle:
> > 
> >  (1) merge the patch that enables CONFIG_NUMA for DISCONTIGMEM but only 
> >      do so if CONFIG_SLUB is enabled to avoid the build error, or
> 
> That's not an option without coming up with the rest of the numa
> fixes ... we can't basically force all SMP systems to become UP.
> 
> What build error, by the way?  There's only a runtime panic caused by
> slub.
> 

If you enable CONFIG_NUMA for ARCH_DISCONTIGMEM_ENABLE on parisc, it 
results in the same build error that you identified in

	http://marc.info/?l=linux-parisc&m=130326773918005

at least on my hppa64 compiler.

> >  (2) disallow CONFIG_SLUB for parisc with DISCONTIGMEM.
> 
> Well, that's this patch ... it will actually fix every architecture, not
> just parisc.
> 
> 
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 56240e7..a7ad8fb 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1226,6 +1226,7 @@ config SLAB
> >           per cpu and per node queues.
> >  
> >  config SLUB
> > +       depends on BROKEN || NUMA || !DISCONTIGMEM
> >         bool "SLUB (Unqueued Allocator)"
> >         help
> >            SLUB is a slab allocator that minimizes cache line usage
> 
> 
> I already sent it to linux-arch and there's been no dissent; there have
> been a few "will that fix my slub bug?" type of responses.
> 

I was concerned about tile because it actually got all this right by using 
N_NORMAL_MEMORY appropriately and it uses slub by default, but it always 
enables NUMA at the moment so this won't impact it.

Acked-by: David Rientjes <rientjes@google.com>

I agree we can now defer "parisc: enable CONFIG_NUMA for DISCONTIGMEM and 
fix build errors" until parisc moves away from DISCONTIGMEM, its extracted 
away from CONFIG_NUMA, or the scheduler issues are debugged.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 22:12                                                                   ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 22:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, Andrew Morton, KOSAKI Motohiro, Pekka Enberg,
	Michal Hocko, Hugh Dickins, linux-mm, LKML, linux-parisc,
	Ingo Molnar, x86 maintainers

On Thu, 21 Apr 2011, James Bottomley wrote:

> > Ok, it seems like there're two options for this release cycle:
> > 
> >  (1) merge the patch that enables CONFIG_NUMA for DISCONTIGMEM but only 
> >      do so if CONFIG_SLUB is enabled to avoid the build error, or
> 
> That's not an option without coming up with the rest of the numa
> fixes ... we can't basically force all SMP systems to become UP.
> 
> What build error, by the way?  There's only a runtime panic caused by
> slub.
> 

If you enable CONFIG_NUMA for ARCH_DISCONTIGMEM_ENABLE on parisc, it 
results in the same build error that you identified in

	http://marc.info/?l=linux-parisc&m=130326773918005

at least on my hppa64 compiler.

> >  (2) disallow CONFIG_SLUB for parisc with DISCONTIGMEM.
> 
> Well, that's this patch ... it will actually fix every architecture, not
> just parisc.
> 
> 
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 56240e7..a7ad8fb 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1226,6 +1226,7 @@ config SLAB
> >           per cpu and per node queues.
> >  
> >  config SLUB
> > +       depends on BROKEN || NUMA || !DISCONTIGMEM
> >         bool "SLUB (Unqueued Allocator)"
> >         help
> >            SLUB is a slab allocator that minimizes cache line usage
> 
> 
> I already sent it to linux-arch and there's been no dissent; there have
> been a few "will that fix my slub bug?" type of responses.
> 

I was concerned about tile because it actually got all this right by using 
N_NORMAL_MEMORY appropriately and it uses slub by default, but it always 
enables NUMA at the moment so this won't impact it.

Acked-by: David Rientjes <rientjes@google.com>

I agree we can now defer "parisc: enable CONFIG_NUMA for DISCONTIGMEM and 
fix build errors" until parisc moves away from DISCONTIGMEM, its extracted 
away from CONFIG_NUMA, or the scheduler issues are debugged.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 16:06                                                             ` James Bottomley
@ 2011-04-21 22:19                                                               ` David Rientjes
  -1 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 22:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Thu, 21 Apr 2011, James Bottomley wrote:

> diff --git a/mm/slub.c b/mm/slub.c
> index 94d2a33..243bd9c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -235,7 +235,11 @@ int slab_is_available(void)
>  
>  static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
>  {
> +#ifdef CONFIG_NUMA
>  	return s->node[node];
> +#else
> +	return s->node[0];
> +#endif
>  }
>  
>  /* Verify that a pointer has an address that is valid within a slab page */

Looks like parisc may have been just fine before 7340cc84141d (slub: 
reduce differences between SMP and NUMA), which was merged into 2.6.37?

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 22:19                                                               ` David Rientjes
  0 siblings, 0 replies; 210+ messages in thread
From: David Rientjes @ 2011-04-21 22:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Thu, 21 Apr 2011, James Bottomley wrote:

> diff --git a/mm/slub.c b/mm/slub.c
> index 94d2a33..243bd9c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -235,7 +235,11 @@ int slab_is_available(void)
>  
>  static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
>  {
> +#ifdef CONFIG_NUMA
>  	return s->node[node];
> +#else
> +	return s->node[0];
> +#endif
>  }
>  
>  /* Verify that a pointer has an address that is valid within a slab page */

Looks like parisc may have been just fine before 7340cc84141d (slub: 
reduce differences between SMP and NUMA), which was merged into 2.6.37?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 22:19                                                               ` David Rientjes
@ 2011-04-21 22:31                                                                 ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 22:31 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Thu, 2011-04-21 at 15:19 -0700, David Rientjes wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
> 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 94d2a33..243bd9c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -235,7 +235,11 @@ int slab_is_available(void)
> >  
> >  static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
> >  {
> > +#ifdef CONFIG_NUMA
> >  	return s->node[node];
> > +#else
> > +	return s->node[0];
> > +#endif
> >  }
> >  
> >  /* Verify that a pointer has an address that is valid within a slab page */
> 
> Looks like parisc may have been just fine before 7340cc84141d (slub: 
> reduce differences between SMP and NUMA), which was merged into 2.6.37?

That's possible.  I've had no bug reports from the debian 2.6.32 kernel,
which is the only other one that has SLUB by default.  The m68k guys
seem to think this is the cause of their problems too.

But the basic fact is that all our testing has been done on SLAB.  It
wasn't until debian asked us to looks at a 2.6.38 kernel that I
accidentally picked up SLUB by importing their config into my build
environment.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-21 22:31                                                                 ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-21 22:31 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
	Mel Gorman

On Thu, 2011-04-21 at 15:19 -0700, David Rientjes wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
> 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 94d2a33..243bd9c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -235,7 +235,11 @@ int slab_is_available(void)
> >  
> >  static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
> >  {
> > +#ifdef CONFIG_NUMA
> >  	return s->node[node];
> > +#else
> > +	return s->node[0];
> > +#endif
> >  }
> >  
> >  /* Verify that a pointer has an address that is valid within a slab page */
> 
> Looks like parisc may have been just fine before 7340cc84141d (slub: 
> reduce differences between SMP and NUMA), which was merged into 2.6.37?

That's possible.  I've had no bug reports from the debian 2.6.32 kernel,
which is the only other one that has SLUB by default.  The m68k guys
seem to think this is the cause of their problems too.

But the basic fact is that all our testing has been done on SLAB.  It
wasn't until debian asked us to looks at a 2.6.38 kernel that I
accidentally picked up SLUB by importing their config into my build
environment.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 19:33                                                           ` David Rientjes
@ 2011-04-22  0:34                                                             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-22  0:34 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, James Bottomley, Pekka Enberg,
	Christoph Lameter, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

> On Thu, 21 Apr 2011, KOSAKI Motohiro wrote:
> 
> > ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> > N_NORMAL_MEMORY automatically if my understand is correct.
> > (plz see free_area_init_nodes)
> > 
> 
> ia64 doesn't enable CONFIG_HIGHMEM, so it never gets set via this generic 
> code; mips also doesn't enable it for all configs even for 32-bit.
> 
> So we'll either want to take check_for_regular_memory() out from under 
> CONFIG_HIGHMEM and do it for all configs or teach slub to use 
> N_HIGH_MEMORY rather than N_NORMAL_MEMORY.

Hey, I already told this thing.

If CONFIG_HIGHMEM=n, N_HIGH_MEMORY and N_NORMAL_MEMORY are share the
same value. then, 
	node_set_state(nid, N_HIGH_MEMORY) in free_area_init_nodes()

mean set both N_HIGH_MEMORY and N_NORMAL_MEMORY.



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22  0:34                                                             ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-22  0:34 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, James Bottomley, Pekka Enberg,
	Christoph Lameter, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers

> On Thu, 21 Apr 2011, KOSAKI Motohiro wrote:
> 
> > ia64 and mips have CONFIG_ARCH_POPULATES_NODE_MAP and it initialize
> > N_NORMAL_MEMORY automatically if my understand is correct.
> > (plz see free_area_init_nodes)
> > 
> 
> ia64 doesn't enable CONFIG_HIGHMEM, so it never gets set via this generic 
> code; mips also doesn't enable it for all configs even for 32-bit.
> 
> So we'll either want to take check_for_regular_memory() out from under 
> CONFIG_HIGHMEM and do it for all configs or teach slub to use 
> N_HIGH_MEMORY rather than N_NORMAL_MEMORY.

Hey, I already told this thing.

If CONFIG_HIGHMEM=n, N_HIGH_MEMORY and N_NORMAL_MEMORY are share the
same value. then, 
	node_set_state(nid, N_HIGH_MEMORY) in free_area_init_nodes()

mean set both N_HIGH_MEMORY and N_NORMAL_MEMORY.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] mm: always set nodes with regular memory in N_NORMAL_MEMORY
  2011-04-21 21:41                                                             ` David Rientjes
@ 2011-04-22  0:36                                                               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-22  0:36 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Andrew Morton, Mel Gorman, Lee Schermerhorn,
	linux-mm, linux-kernel

> N_NORMAL_MEMORY is intended to include all nodes that have present memory 
> in regular zones, that is, zones below ZONE_HIGHMEM.  This should be done 
> regardless of whether CONFIG_HIGHMEM is set or not.
> 
> This fixes ia64 so that the nodes get set appropriately in the nodemask 
> for DISCONTIGMEM and mips if it does not enable CONFIG_HIGHMEM even for 
> 32-bit kernels.
> 
> If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it 
> relies on this nodemask to setup kmem_cache_node data structures for each 
> cache.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  mm/page_alloc.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4727,7 +4727,6 @@ out:
>  /* Any regular memory on that node ? */
>  static void check_for_regular_memory(pg_data_t *pgdat)
>  {
> -#ifdef CONFIG_HIGHMEM
>  	enum zone_type zone_type;
>  
>  	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
> @@ -4735,7 +4734,6 @@ static void check_for_regular_memory(pg_data_t *pgdat)
>  		if (zone->present_pages)
>  			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
>  	}
> -#endif

enum node_states {
        N_POSSIBLE,             /* The node could become online at some point */
        N_ONLINE,               /* The node is online */
        N_NORMAL_MEMORY,        /* The node has regular memory */
#ifdef CONFIG_HIGHMEM
        N_HIGH_MEMORY,          /* The node has regular or high memory */
#else
        N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
        N_CPU,          /* The node has one or more cpus */
        NR_NODE_STATES
};

Then, only node_set_state(nid, N_HIGH_MEMORY) is enough initialization, IIUC.
Can you please explain when do we need this patch?



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] mm: always set nodes with regular memory in N_NORMAL_MEMORY
@ 2011-04-22  0:36                                                               ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-22  0:36 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Andrew Morton, Mel Gorman, Lee Schermerhorn,
	linux-mm, linux-kernel

> N_NORMAL_MEMORY is intended to include all nodes that have present memory 
> in regular zones, that is, zones below ZONE_HIGHMEM.  This should be done 
> regardless of whether CONFIG_HIGHMEM is set or not.
> 
> This fixes ia64 so that the nodes get set appropriately in the nodemask 
> for DISCONTIGMEM and mips if it does not enable CONFIG_HIGHMEM even for 
> 32-bit kernels.
> 
> If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it 
> relies on this nodemask to setup kmem_cache_node data structures for each 
> cache.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  mm/page_alloc.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4727,7 +4727,6 @@ out:
>  /* Any regular memory on that node ? */
>  static void check_for_regular_memory(pg_data_t *pgdat)
>  {
> -#ifdef CONFIG_HIGHMEM
>  	enum zone_type zone_type;
>  
>  	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
> @@ -4735,7 +4734,6 @@ static void check_for_regular_memory(pg_data_t *pgdat)
>  		if (zone->present_pages)
>  			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
>  	}
> -#endif

enum node_states {
        N_POSSIBLE,             /* The node could become online at some point */
        N_ONLINE,               /* The node is online */
        N_NORMAL_MEMORY,        /* The node has regular memory */
#ifdef CONFIG_HIGHMEM
        N_HIGH_MEMORY,          /* The node has regular or high memory */
#else
        N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
        N_CPU,          /* The node has one or more cpus */
        NR_NODE_STATES
};

Then, only node_set_state(nid, N_HIGH_MEMORY) is enough initialization, IIUC.
Can you please explain when do we need this patch?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 22:12                                                                   ` David Rientjes
  (?)
@ 2011-04-22  8:02                                                                     ` Pekka Enberg
  -1 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-22  8:02 UTC (permalink / raw)
  To: David Rientjes
  Cc: James Bottomley, Christoph Lameter, Andrew Morton,
	KOSAKI Motohiro, Michal Hocko, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers

On Fri, Apr 22, 2011 at 1:12 AM, David Rientjes <rientjes@google.com> w=
rote:
>> > diff --git a/init/Kconfig b/init/Kconfig
>> > index 56240e7..a7ad8fb 100644
>> > --- a/init/Kconfig
>> > +++ b/init/Kconfig
>> > @@ -1226,6 +1226,7 @@ config SLAB
>> > =A0 =A0 =A0 =A0 =A0 per cpu and per node queues.
>> >
>> > =A0config SLUB
>> > + =A0 =A0 =A0 depends on BROKEN || NUMA || !DISCONTIGMEM
>> > =A0 =A0 =A0 =A0 bool "SLUB (Unqueued Allocator)"
>> > =A0 =A0 =A0 =A0 help
>> > =A0 =A0 =A0 =A0 =A0 =A0SLUB is a slab allocator that minimizes cac=
he line usage
>>
>>
>> I already sent it to linux-arch and there's been no dissent; there h=
ave
>> been a few "will that fix my slub bug?" type of responses.
>
> I was concerned about tile because it actually got all this right by =
using
> N_NORMAL_MEMORY appropriately and it uses slub by default, but it alw=
ays
> enables NUMA at the moment so this won't impact it.
>
> Acked-by: David Rientjes <rientjes@google.com>

I'm OK with this Kconfig patch. Can someone send a proper patch with
signoffs and such? Do we want to tag this for -stable too?

                         Pekka
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22  8:02                                                                     ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-22  8:02 UTC (permalink / raw)
  To: David Rientjes
  Cc: James Bottomley, Christoph Lameter, Andrew Morton,
	KOSAKI Motohiro, Michal Hocko, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers

On Fri, Apr 22, 2011 at 1:12 AM, David Rientjes <rientjes@google.com> wrote:
>> > diff --git a/init/Kconfig b/init/Kconfig
>> > index 56240e7..a7ad8fb 100644
>> > --- a/init/Kconfig
>> > +++ b/init/Kconfig
>> > @@ -1226,6 +1226,7 @@ config SLAB
>> >           per cpu and per node queues.
>> >
>> >  config SLUB
>> > +       depends on BROKEN || NUMA || !DISCONTIGMEM
>> >         bool "SLUB (Unqueued Allocator)"
>> >         help
>> >            SLUB is a slab allocator that minimizes cache line usage
>>
>>
>> I already sent it to linux-arch and there's been no dissent; there have
>> been a few "will that fix my slub bug?" type of responses.
>
> I was concerned about tile because it actually got all this right by using
> N_NORMAL_MEMORY appropriately and it uses slub by default, but it always
> enables NUMA at the moment so this won't impact it.
>
> Acked-by: David Rientjes <rientjes@google.com>

I'm OK with this Kconfig patch. Can someone send a proper patch with
signoffs and such? Do we want to tag this for -stable too?

                         Pekka

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22  8:02                                                                     ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-22  8:02 UTC (permalink / raw)
  To: David Rientjes
  Cc: James Bottomley, Christoph Lameter, Andrew Morton,
	KOSAKI Motohiro, Michal Hocko, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers

On Fri, Apr 22, 2011 at 1:12 AM, David Rientjes <rientjes@google.com> wrote:
>> > diff --git a/init/Kconfig b/init/Kconfig
>> > index 56240e7..a7ad8fb 100644
>> > --- a/init/Kconfig
>> > +++ b/init/Kconfig
>> > @@ -1226,6 +1226,7 @@ config SLAB
>> >           per cpu and per node queues.
>> >
>> >  config SLUB
>> > +       depends on BROKEN || NUMA || !DISCONTIGMEM
>> >         bool "SLUB (Unqueued Allocator)"
>> >         help
>> >            SLUB is a slab allocator that minimizes cache line usage
>>
>>
>> I already sent it to linux-arch and there's been no dissent; there have
>> been a few "will that fix my slub bug?" type of responses.
>
> I was concerned about tile because it actually got all this right by using
> N_NORMAL_MEMORY appropriately and it uses slub by default, but it always
> enables NUMA at the moment so this won't impact it.
>
> Acked-by: David Rientjes <rientjes@google.com>

I'm OK with this Kconfig patch. Can someone send a proper patch with
signoffs and such? Do we want to tag this for -stable too?

                         Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-22  8:02                                                                     ` Pekka Enberg
@ 2011-04-22 13:49                                                                       ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 13:49 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Christoph Lameter, Andrew Morton,
	KOSAKI Motohiro, Michal Hocko, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers

On Fri, 2011-04-22 at 11:02 +0300, Pekka Enberg wrote:
> On Fri, Apr 22, 2011 at 1:12 AM, David Rientjes <rientjes@google.com> wrote:
> >> > diff --git a/init/Kconfig b/init/Kconfig
> >> > index 56240e7..a7ad8fb 100644
> >> > --- a/init/Kconfig
> >> > +++ b/init/Kconfig
> >> > @@ -1226,6 +1226,7 @@ config SLAB
> >> >           per cpu and per node queues.
> >> >
> >> >  config SLUB
> >> > +       depends on BROKEN || NUMA || !DISCONTIGMEM
> >> >         bool "SLUB (Unqueued Allocator)"
> >> >         help
> >> >            SLUB is a slab allocator that minimizes cache line usage
> >>
> >>
> >> I already sent it to linux-arch and there's been no dissent; there have
> >> been a few "will that fix my slub bug?" type of responses.
> >
> > I was concerned about tile because it actually got all this right by using
> > N_NORMAL_MEMORY appropriately and it uses slub by default, but it always
> > enables NUMA at the moment so this won't impact it.
> >
> > Acked-by: David Rientjes <rientjes@google.com>
> 
> I'm OK with this Kconfig patch. Can someone send a proper patch with
> signoffs and such? Do we want to tag this for -stable too?

I already did here:

http://marc.info/?l=linux-arch&m=130324857801976

I got the wrong linux-mm email address, though (I thought you were on
vger).

I've got a parisc specific patch already for this (also for stable), so
I can just queue this alongside if everyone's OK with that?

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22 13:49                                                                       ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 13:49 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Christoph Lameter, Andrew Morton,
	KOSAKI Motohiro, Michal Hocko, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers

On Fri, 2011-04-22 at 11:02 +0300, Pekka Enberg wrote:
> On Fri, Apr 22, 2011 at 1:12 AM, David Rientjes <rientjes@google.com> wrote:
> >> > diff --git a/init/Kconfig b/init/Kconfig
> >> > index 56240e7..a7ad8fb 100644
> >> > --- a/init/Kconfig
> >> > +++ b/init/Kconfig
> >> > @@ -1226,6 +1226,7 @@ config SLAB
> >> >           per cpu and per node queues.
> >> >
> >> >  config SLUB
> >> > +       depends on BROKEN || NUMA || !DISCONTIGMEM
> >> >         bool "SLUB (Unqueued Allocator)"
> >> >         help
> >> >            SLUB is a slab allocator that minimizes cache line usage
> >>
> >>
> >> I already sent it to linux-arch and there's been no dissent; there have
> >> been a few "will that fix my slub bug?" type of responses.
> >
> > I was concerned about tile because it actually got all this right by using
> > N_NORMAL_MEMORY appropriately and it uses slub by default, but it always
> > enables NUMA at the moment so this won't impact it.
> >
> > Acked-by: David Rientjes <rientjes@google.com>
> 
> I'm OK with this Kconfig patch. Can someone send a proper patch with
> signoffs and such? Do we want to tag this for -stable too?

I already did here:

http://marc.info/?l=linux-arch&m=130324857801976

I got the wrong linux-mm email address, though (I thought you were on
vger).

I've got a parisc specific patch already for this (also for stable), so
I can just queue this alongside if everyone's OK with that?

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-22 13:49                                                                       ` James Bottomley
@ 2011-04-22 17:00                                                                         ` Pekka Enberg
  -1 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-22 17:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: David Rientjes, Christoph Lameter, Andrew Morton,
	KOSAKI Motohiro, Michal Hocko, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers

Hi James,

On Fri, 2011-04-22 at 08:49 -0500, James Bottomley wrote:
> > I'm OK with this Kconfig patch. Can someone send a proper patch with
> > signoffs and such? Do we want to tag this for -stable too?
> 
> I already did here:
> 
> http://marc.info/?l=linux-arch&m=130324857801976
> 
> I got the wrong linux-mm email address, though (I thought you were on
> vger).

Grr, it's a SLUB patch and you didn't CC any of the maintainers! If that
was an attempt to sneak it past me, that's not cool. And if you left it
out by mistake, that's not cool either!

> I've got a parisc specific patch already for this (also for stable), so
> I can just queue this alongside if everyone's OK with that?

Feel free, I'm not subscribed to linux-arch so I don't have the patch in
my inbox at all:

Acked-by: Pekka Enberg <penberg@kernel.org>

			Pekka


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22 17:00                                                                         ` Pekka Enberg
  0 siblings, 0 replies; 210+ messages in thread
From: Pekka Enberg @ 2011-04-22 17:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: David Rientjes, Christoph Lameter, Andrew Morton,
	KOSAKI Motohiro, Michal Hocko, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers

Hi James,

On Fri, 2011-04-22 at 08:49 -0500, James Bottomley wrote:
> > I'm OK with this Kconfig patch. Can someone send a proper patch with
> > signoffs and such? Do we want to tag this for -stable too?
> 
> I already did here:
> 
> http://marc.info/?l=linux-arch&m=130324857801976
> 
> I got the wrong linux-mm email address, though (I thought you were on
> vger).

Grr, it's a SLUB patch and you didn't CC any of the maintainers! If that
was an attempt to sneak it past me, that's not cool. And if you left it
out by mistake, that's not cool either!

> I've got a parisc specific patch already for this (also for stable), so
> I can just queue this alongside if everyone's OK with that?

Feel free, I'm not subscribed to linux-arch so I don't have the patch in
my inbox at all:

Acked-by: Pekka Enberg <penberg@kernel.org>

			Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-22 17:00                                                                         ` Pekka Enberg
@ 2011-04-22 17:03                                                                           ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 17:03 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Christoph Lameter, Andrew Morton,
	KOSAKI Motohiro, Michal Hocko, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers

On Fri, 2011-04-22 at 20:00 +0300, Pekka Enberg wrote:
> Hi James,
> 
> On Fri, 2011-04-22 at 08:49 -0500, James Bottomley wrote:
> > > I'm OK with this Kconfig patch. Can someone send a proper patch with
> > > signoffs and such? Do we want to tag this for -stable too?
> > 
> > I already did here:
> > 
> > http://marc.info/?l=linux-arch&m=130324857801976
> > 
> > I got the wrong linux-mm email address, though (I thought you were on
> > vger).
> 
> Grr, it's a SLUB patch and you didn't CC any of the maintainers! If that
> was an attempt to sneak it past me, that's not cool. And if you left it
> out by mistake, that's not cool either!

No, christoph was directly on the cc ... although I didn't realise there
were more slub maintainers.

> > I've got a parisc specific patch already for this (also for stable), so
> > I can just queue this alongside if everyone's OK with that?
> 
> Feel free, I'm not subscribed to linux-arch so I don't have the patch in
> my inbox at all:
> 
> Acked-by: Pekka Enberg <penberg@kernel.org>

Will do, thanks.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22 17:03                                                                           ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 17:03 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Christoph Lameter, Andrew Morton,
	KOSAKI Motohiro, Michal Hocko, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers

On Fri, 2011-04-22 at 20:00 +0300, Pekka Enberg wrote:
> Hi James,
> 
> On Fri, 2011-04-22 at 08:49 -0500, James Bottomley wrote:
> > > I'm OK with this Kconfig patch. Can someone send a proper patch with
> > > signoffs and such? Do we want to tag this for -stable too?
> > 
> > I already did here:
> > 
> > http://marc.info/?l=linux-arch&m=130324857801976
> > 
> > I got the wrong linux-mm email address, though (I thought you were on
> > vger).
> 
> Grr, it's a SLUB patch and you didn't CC any of the maintainers! If that
> was an attempt to sneak it past me, that's not cool. And if you left it
> out by mistake, that's not cool either!

No, christoph was directly on the cc ... although I didn't realise there
were more slub maintainers.

> > I've got a parisc specific patch already for this (also for stable), so
> > I can just queue this alongside if everyone's OK with that?
> 
> Feel free, I'm not subscribed to linux-arch so I don't have the patch in
> my inbox at all:
> 
> Acked-by: Pekka Enberg <penberg@kernel.org>

Will do, thanks.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 18:45                                                               ` Dave Hansen
@ 2011-04-22 18:19                                                                 ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 18:19 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Thu, 2011-04-21 at 11:45 -0700, Dave Hansen wrote:
> On Thu, 2011-04-21 at 13:33 -0500, Christoph Lameter wrote:
> > http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html
> > 
> > http://mytechkorner.blogspot.com/2010/12/sparsemem.html
> > 
> > Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> > the europeans may be off for awhile) 
> 
> Yup, for sure.  It's also interesting how much code ppc64 removed when
> they did this:
> 
> http://lists.ozlabs.org/pipermail/linuxppc64-dev/2005-November/006646.html

I looked at converting parisc to sparsemem and there's one problem that
none of these cover.  How do you set up bootmem?  If I look at the
examples, they all seem to have enough memory in the first range to
allocate from, so there's no problem.  On parisc, with discontigmem, we
set up all of our ranges as bootmem (we can do this because we
effectively have one node per range).  Obviously, since sparsemem has a
single bitmap for all of the bootmem, we can no longer allocate all of
our memory to it (well, without exploding because some of our gaps are
gigabytes big).  How does everyone cope with this (do you search for
your largest range and use that as bootmem or something)?

James


If 


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22 18:19                                                                 ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 18:19 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Thu, 2011-04-21 at 11:45 -0700, Dave Hansen wrote:
> On Thu, 2011-04-21 at 13:33 -0500, Christoph Lameter wrote:
> > http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html
> > 
> > http://mytechkorner.blogspot.com/2010/12/sparsemem.html
> > 
> > Dave Hansen, Mel: Can you provide us with some help? (Its Easter and so
> > the europeans may be off for awhile) 
> 
> Yup, for sure.  It's also interesting how much code ppc64 removed when
> they did this:
> 
> http://lists.ozlabs.org/pipermail/linuxppc64-dev/2005-November/006646.html

I looked at converting parisc to sparsemem and there's one problem that
none of these cover.  How do you set up bootmem?  If I look at the
examples, they all seem to have enough memory in the first range to
allocate from, so there's no problem.  On parisc, with discontigmem, we
set up all of our ranges as bootmem (we can do this because we
effectively have one node per range).  Obviously, since sparsemem has a
single bitmap for all of the bootmem, we can no longer allocate all of
our memory to it (well, without exploding because some of our gaps are
gigabytes big).  How does everyone cope with this (do you search for
your largest range and use that as bootmem or something)?

James


If 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-22 18:19                                                                 ` James Bottomley
@ 2011-04-22 20:24                                                                   ` Dave Hansen
  -1 siblings, 0 replies; 210+ messages in thread
From: Dave Hansen @ 2011-04-22 20:24 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Fri, 2011-04-22 at 13:19 -0500, James Bottomley wrote:
> I looked at converting parisc to sparsemem and there's one problem that
> none of these cover.  How do you set up bootmem?  If I look at the
> examples, they all seem to have enough memory in the first range to
> allocate from, so there's no problem.  On parisc, with discontigmem, we
> set up all of our ranges as bootmem (we can do this because we
> effectively have one node per range).  Obviously, since sparsemem has a
> single bitmap for all of the bootmem, we can no longer allocate all of
> our memory to it (well, without exploding because some of our gaps are
> gigabytes big).  How does everyone cope with this (do you search for
> your largest range and use that as bootmem or something)? 

Sparsemem is purely post-bootmem.  It doesn't deal with sparse
bootmem. :(

That said, I'm not sure you're in trouble.  One bit of bitmap covers 4k
(with 4k pages of course) of memory, one byte covers 32k, and A 32MB
bitmap can cover 1TB of address space.  It explodes, but I think it's
manageable.  It hasn't been a problem enough up to this point to go fix
it.

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22 20:24                                                                   ` Dave Hansen
  0 siblings, 0 replies; 210+ messages in thread
From: Dave Hansen @ 2011-04-22 20:24 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Fri, 2011-04-22 at 13:19 -0500, James Bottomley wrote:
> I looked at converting parisc to sparsemem and there's one problem that
> none of these cover.  How do you set up bootmem?  If I look at the
> examples, they all seem to have enough memory in the first range to
> allocate from, so there's no problem.  On parisc, with discontigmem, we
> set up all of our ranges as bootmem (we can do this because we
> effectively have one node per range).  Obviously, since sparsemem has a
> single bitmap for all of the bootmem, we can no longer allocate all of
> our memory to it (well, without exploding because some of our gaps are
> gigabytes big).  How does everyone cope with this (do you search for
> your largest range and use that as bootmem or something)? 

Sparsemem is purely post-bootmem.  It doesn't deal with sparse
bootmem. :(

That said, I'm not sure you're in trouble.  One bit of bitmap covers 4k
(with 4k pages of course) of memory, one byte covers 32k, and A 32MB
bitmap can cover 1TB of address space.  It explodes, but I think it's
manageable.  It hasn't been a problem enough up to this point to go fix
it.

-- Dave


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-22 20:24                                                                   ` Dave Hansen
@ 2011-04-22 20:35                                                                     ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 20:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Fri, 2011-04-22 at 13:24 -0700, Dave Hansen wrote:
> On Fri, 2011-04-22 at 13:19 -0500, James Bottomley wrote:
> > I looked at converting parisc to sparsemem and there's one problem that
> > none of these cover.  How do you set up bootmem?  If I look at the
> > examples, they all seem to have enough memory in the first range to
> > allocate from, so there's no problem.  On parisc, with discontigmem, we
> > set up all of our ranges as bootmem (we can do this because we
> > effectively have one node per range).  Obviously, since sparsemem has a
> > single bitmap for all of the bootmem, we can no longer allocate all of
> > our memory to it (well, without exploding because some of our gaps are
> > gigabytes big).  How does everyone cope with this (do you search for
> > your largest range and use that as bootmem or something)? 
> 
> Sparsemem is purely post-bootmem.  It doesn't deal with sparse
> bootmem. :(

Well, this is enabled in discontigmem, sigh.

> That said, I'm not sure you're in trouble.  One bit of bitmap covers 4k
> (with 4k pages of course) of memory, one byte covers 32k, and A 32MB
> bitmap can cover 1TB of address space.  It explodes, but I think it's
> manageable.  It hasn't been a problem enough up to this point to go fix
> it.

I think the platform limited physical address range is 42 bits, so I
suppose that's 128MB ... hopefully we should have that as a contiguous
range from the end of the loaded kernel.  We're lucky they didn't enable
the full ZX1 address range; that would have been 48 bits (or a whole
gigabyte just for the bitmap).

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22 20:35                                                                     ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 20:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Fri, 2011-04-22 at 13:24 -0700, Dave Hansen wrote:
> On Fri, 2011-04-22 at 13:19 -0500, James Bottomley wrote:
> > I looked at converting parisc to sparsemem and there's one problem that
> > none of these cover.  How do you set up bootmem?  If I look at the
> > examples, they all seem to have enough memory in the first range to
> > allocate from, so there's no problem.  On parisc, with discontigmem, we
> > set up all of our ranges as bootmem (we can do this because we
> > effectively have one node per range).  Obviously, since sparsemem has a
> > single bitmap for all of the bootmem, we can no longer allocate all of
> > our memory to it (well, without exploding because some of our gaps are
> > gigabytes big).  How does everyone cope with this (do you search for
> > your largest range and use that as bootmem or something)? 
> 
> Sparsemem is purely post-bootmem.  It doesn't deal with sparse
> bootmem. :(

Well, this is enabled in discontigmem, sigh.

> That said, I'm not sure you're in trouble.  One bit of bitmap covers 4k
> (with 4k pages of course) of memory, one byte covers 32k, and A 32MB
> bitmap can cover 1TB of address space.  It explodes, but I think it's
> manageable.  It hasn't been a problem enough up to this point to go fix
> it.

I think the platform limited physical address range is 42 bits, so I
suppose that's 128MB ... hopefully we should have that as a contiguous
range from the end of the loaded kernel.  We're lucky they didn't enable
the full ZX1 address range; that would have been 48 bits (or a whole
gigabyte just for the bitmap).

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
  2011-04-21 18:45                                                               ` Dave Hansen
@ 2011-04-22 21:33                                                                 ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 21:33 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Thu, 2011-04-21 at 11:45 -0700, Dave Hansen wrote:
> On Thu, 2011-04-21 at 13:33 -0500, Christoph Lameter wrote:
> > http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html
> > 

By the way, this reference is actively wrong for parisc (having just
debugged the problem).  The basic issue is that until we start paging,
we have the kernel and some memory beyond it barely covered with the pg0
page table set up in head.S  On our systems, that extends out to 16MB.
SPARSEMEM is much more bootmem resource greedy than DISCONTIGMEM, so if
we actually call sparse_init() before we have the page tables set up, we
fall off the end of our 16MB mapping and go boom.  For us, therefore, we
can't call sparse_init() until we have our proper page tables in place.

James



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
@ 2011-04-22 21:33                                                                 ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-22 21:33 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

On Thu, 2011-04-21 at 11:45 -0700, Dave Hansen wrote:
> On Thu, 2011-04-21 at 13:33 -0500, Christoph Lameter wrote:
> > http://www.linux-mips.org/archives/linux-mips/2008-08/msg00154.html
> > 

By the way, this reference is actively wrong for parisc (having just
debugged the problem).  The basic issue is that until we start paging,
we have the kernel and some memory beyond it barely covered with the pg0
page table set up in head.S  On our systems, that extends out to 16MB.
SPARSEMEM is much more bootmem resource greedy than DISCONTIGMEM, so if
we actually call sparse_init() before we have the page tables set up, we
fall off the end of our 16MB mapping and go boom.  For us, therefore, we
can't call sparse_init() until we have our proper page tables in place.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* [PATCH] convert parisc to sparsemem (was Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards)
  2011-04-22 21:33                                                                 ` James Bottomley
@ 2011-04-23 18:34                                                                   ` James Bottomley
  -1 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-23 18:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

This is the preliminary conversion.  It's very nasty on parisc because
the memory allocation isn't symmetric anymore: under DISCONTIGMEM, we
push all memory into bootmem and then let free_all_bootmem() do the
magic for us; now we have to do separate initialisations for ranges
because SPARSEMEM can't do multi-range boot memory. It's also got the
horrible hack that I only use the first found range for bootmem.  I'm
not sure if this is correct (it won't be if the first found range can be
under about 50MB because we'll run out of bootmem during boot) ... we
might have to sort the ranges and use the larges, but that will involve
us in even more hackery around the bootmem reservations code.

The boot sequence got a few seconds slower because now all of the loops
over our pfn ranges actually have to skip through the holes (which takes
time for 64GB).

All in all, I've not been very impressed with SPARSEMEM over
DISCONTIGMEM.  It seems to have a lot of rough edges (necessitating
exception code) which DISCONTIGMEM just copes with.

And before you say the code is smaller, that's because I converted us to
generic show_mem().

James

---

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 69ff049..b416641 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -233,22 +233,17 @@ config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
 	depends on 64BIT
 
-config ARCH_DISCONTIGMEM_ENABLE
+config ARCH_SPARSEMEM_ENABLE
 	def_bool y
 	depends on 64BIT
 
 config ARCH_FLATMEM_ENABLE
 	def_bool y
 
-config ARCH_DISCONTIGMEM_DEFAULT
+config ARCH_SPARSEMEM_DEFAULT
 	def_bool y
 	depends on ARCH_DISCONTIGMEM_ENABLE
 
-config NODES_SHIFT
-	int
-	default "3"
-	depends on NEED_MULTIPLE_NODES
-
 source "kernel/Kconfig.preempt"
 source "kernel/Kconfig.hz"
 source "mm/Kconfig"
diff --git a/arch/parisc/include/asm/mmzone.h b/arch/parisc/include/asm/mmzone.h
index 9608d2c..8344bcb 100644
--- a/arch/parisc/include/asm/mmzone.h
+++ b/arch/parisc/include/asm/mmzone.h
@@ -1,73 +1,11 @@
 #ifndef _PARISC_MMZONE_H
 #define _PARISC_MMZONE_H
 
-#ifdef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_SPARSEMEM
 
-#define MAX_PHYSMEM_RANGES 8 /* Fix the size for now (current known max is 3) */
-extern int npmem_ranges;
-
-struct node_map_data {
-    pg_data_t pg_data;
-};
-
-extern struct node_map_data node_data[];
-
-#define NODE_DATA(nid)          (&node_data[nid].pg_data)
-
-#define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
-#define node_end_pfn(nid)						\
-({									\
-	pg_data_t *__pgdat = NODE_DATA(nid);				\
-	__pgdat->node_start_pfn + __pgdat->node_spanned_pages;		\
-})
-
-/* We have these possible memory map layouts:
- * Astro: 0-3.75, 67.75-68, 4-64
- * zx1: 0-1, 257-260, 4-256
- * Stretch (N-class): 0-2, 4-32, 34-xxx
- */
-
-/* Since each 1GB can only belong to one region (node), we can create
- * an index table for pfn to nid lookup; each entry in pfnnid_map 
- * represents 1GB, and contains the node that the memory belongs to. */
-
-#define PFNNID_SHIFT (30 - PAGE_SHIFT)
-#define PFNNID_MAP_MAX  512     /* support 512GB */
-extern unsigned char pfnnid_map[PFNNID_MAP_MAX];
-
-#ifndef CONFIG_64BIT
-#define pfn_is_io(pfn) ((pfn & (0xf0000000UL >> PAGE_SHIFT)) == (0xf0000000UL >> PAGE_SHIFT))
+#define MAX_PHYSMEM_RANGES 	8 /* current max is 3 but future proof this */
 #else
-/* io can be 0xf0f0f0f0f0xxxxxx or 0xfffffffff0000000 */
-#define pfn_is_io(pfn) ((pfn & (0xf000000000000000UL >> PAGE_SHIFT)) == (0xf000000000000000UL >> PAGE_SHIFT))
-#endif
-
-static inline int pfn_to_nid(unsigned long pfn)
-{
-	unsigned int i;
-	unsigned char r;
-
-	if (unlikely(pfn_is_io(pfn)))
-		return 0;
-
-	i = pfn >> PFNNID_SHIFT;
-	BUG_ON(i >= sizeof(pfnnid_map) / sizeof(pfnnid_map[0]));
-	r = pfnnid_map[i];
-	BUG_ON(r == 0xff);
-
-	return (int)r;
-}
-
-static inline int pfn_valid(int pfn)
-{
-	int nid = pfn_to_nid(pfn);
-
-	if (nid >= 0)
-		return (pfn < node_end_pfn(nid));
-	return 0;
-}
-
-#else /* !CONFIG_DISCONTIGMEM */
 #define MAX_PHYSMEM_RANGES 	1 
 #endif
+
 #endif /* _PARISC_MMZONE_H */
diff --git a/arch/parisc/include/asm/page.h b/arch/parisc/include/asm/page.h
index a84cc1f..654285a 100644
--- a/arch/parisc/include/asm/page.h
+++ b/arch/parisc/include/asm/page.h
@@ -139,9 +139,9 @@ extern int npmem_ranges;
 #define __pa(x)			((unsigned long)(x)-PAGE_OFFSET)
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 #define pfn_valid(pfn)		((pfn) < max_mapnr)
-#endif /* CONFIG_DISCONTIGMEM */
+#endif
 
 #ifdef CONFIG_HUGETLB_PAGE
 #define HPAGE_SHIFT		22	/* 4MB (is this fixed?) */
diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index df65366..526122c 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -147,12 +147,6 @@ extern void $$dyncall(void);
 EXPORT_SYMBOL($$dyncall);
 #endif
 
-#ifdef CONFIG_DISCONTIGMEM
-#include <asm/mmzone.h>
-EXPORT_SYMBOL(node_data);
-EXPORT_SYMBOL(pfnnid_map);
-#endif
-
 #ifdef CONFIG_FUNCTION_TRACER
 extern void _mcount(void);
 EXPORT_SYMBOL(_mcount);
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 5fa1e27..69c547c 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -21,7 +21,6 @@
 #include <linux/initrd.h>
 #include <linux/swap.h>
 #include <linux/unistd.h>
-#include <linux/nodemask.h>	/* for node_online_map */
 #include <linux/pagemap.h>	/* for release_pages and page_cache_release */
 
 #include <asm/pgalloc.h>
@@ -35,11 +34,6 @@ DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
 extern int  data_start;
 
-#ifdef CONFIG_DISCONTIGMEM
-struct node_map_data node_data[MAX_NUMNODES] __read_mostly;
-unsigned char pfnnid_map[PFNNID_MAP_MAX] __read_mostly;
-#endif
-
 static struct resource data_resource = {
 	.name	= "Kernel data",
 	.flags	= IORESOURCE_BUSY | IORESOURCE_MEM,
@@ -110,7 +104,7 @@ static void __init setup_bootmem(void)
 	unsigned long bootmap_pages;
 	unsigned long bootmap_start_pfn;
 	unsigned long bootmap_pfn;
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 	physmem_range_t pmem_holes[MAX_PHYSMEM_RANGES - 1];
 	int npmem_holes;
 #endif
@@ -144,7 +138,7 @@ static void __init setup_bootmem(void)
 		}
 	}
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 	/*
 	 * Throw out ranges that are too far apart (controlled by
 	 * MAX_GAP).
@@ -156,7 +150,7 @@ static void __init setup_bootmem(void)
 			 pmem_ranges[i-1].pages) > MAX_GAP) {
 			npmem_ranges = i;
 			printk("Large gap in memory detected (%ld pages). "
-			       "Consider turning on CONFIG_DISCONTIGMEM\n",
+			       "Consider turning on CONFIG_SPARSEMEM\n",
 			       pmem_ranges[i].start_pfn -
 			       (pmem_ranges[i-1].start_pfn +
 			        pmem_ranges[i-1].pages));
@@ -228,7 +222,7 @@ static void __init setup_bootmem(void)
 
 	printk(KERN_INFO "Total Memory: %ld MB\n",mem_max >> 20);
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 	/* Merge the ranges, keeping track of the holes */
 
 	{
@@ -253,48 +247,29 @@ static void __init setup_bootmem(void)
 	}
 #endif
 
-	bootmap_pages = 0;
-	for (i = 0; i < npmem_ranges; i++)
-		bootmap_pages += bootmem_bootmap_pages(pmem_ranges[i].pages);
+	bootmap_pages = bootmem_bootmap_pages(pmem_ranges[0].pages);
 
 	bootmap_start_pfn = PAGE_ALIGN(__pa((unsigned long) &_end)) >> PAGE_SHIFT;
 
-#ifdef CONFIG_DISCONTIGMEM
-	for (i = 0; i < MAX_PHYSMEM_RANGES; i++) {
-		memset(NODE_DATA(i), 0, sizeof(pg_data_t));
-		NODE_DATA(i)->bdata = &bootmem_node_data[i];
-	}
-	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
-
-	for (i = 0; i < npmem_ranges; i++) {
-		node_set_state(i, N_NORMAL_MEMORY);
-		node_set_online(i);
-	}
-#endif
-
 	/*
-	 * Initialize and free the full range of memory in each range.
-	 * Note that the only writing these routines do are to the bootmap,
-	 * and we've made sure to locate the bootmap properly so that they
-	 * won't be writing over anything important.
+	 * Only initialise the first memory range to bootmem (the bootmem
+	 * allocation map can't cope with large holes)
 	 */
 
 	bootmap_pfn = bootmap_start_pfn;
 	max_pfn = 0;
-	for (i = 0; i < npmem_ranges; i++) {
+	{
 		unsigned long start_pfn;
 		unsigned long npages;
 
-		start_pfn = pmem_ranges[i].start_pfn;
-		npages = pmem_ranges[i].pages;
+		start_pfn = pmem_ranges[0].start_pfn;
+		npages = pmem_ranges[0].pages;
 
-		bootmap_size = init_bootmem_node(NODE_DATA(i),
+		bootmap_size = init_bootmem_node(NODE_DATA(0),
 						bootmap_pfn,
 						start_pfn,
 						(start_pfn + npages) );
-		free_bootmem_node(NODE_DATA(i),
-				  (start_pfn << PAGE_SHIFT),
-				  (npages << PAGE_SHIFT) );
+		free_bootmem(start_pfn << PAGE_SHIFT, npages << PAGE_SHIFT);
 		bootmap_pfn += (bootmap_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 		if ((start_pfn + npages) > max_pfn)
 			max_pfn = start_pfn + npages;
@@ -323,7 +298,7 @@ static void __init setup_bootmem(void)
 			((bootmap_pfn - bootmap_start_pfn) << PAGE_SHIFT),
 			BOOTMEM_DEFAULT);
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 
 	/* reserve the holes */
 
@@ -369,6 +344,13 @@ static void __init setup_bootmem(void)
 		request_resource(res, &data_resource);
 	}
 	request_resource(&sysram_resources[0], &pdcdata_resource);
+
+#ifdef CONFIG_SPARSEMEM
+	for (i = 0; i < npmem_ranges; i++) {
+		memory_present(0, pmem_ranges[i].start_pfn,
+			       pmem_ranges[i].start_pfn + pmem_ranges[i].pages);
+	}
+#endif
 }
 
 static void __init map_pages(unsigned long start_vaddr,
@@ -580,7 +562,7 @@ unsigned long pcxl_dma_start __read_mostly;
 
 void __init mem_init(void)
 {
-	int codesize, reservedpages, datasize, initsize;
+	int codesize, reservedpages, datasize, initsize, i;
 
 	/* Do sanity checks on page table constants */
 	BUILD_BUG_ON(PTE_ENTRY_SIZE != sizeof(pte_t));
@@ -589,19 +571,27 @@ void __init mem_init(void)
 	BUILD_BUG_ON(PAGE_SHIFT + BITS_PER_PTE + BITS_PER_PMD + BITS_PER_PGD
 			> BITS_PER_LONG);
 
-	high_memory = __va((max_pfn << PAGE_SHIFT));
-
-#ifndef CONFIG_DISCONTIGMEM
-	max_mapnr = page_to_pfn(virt_to_page(high_memory - 1)) + 1;
 	totalram_pages += free_all_bootmem();
-#else
-	{
-		int i;
-
-		for (i = 0; i < npmem_ranges; i++)
-			totalram_pages += free_all_bootmem_node(NODE_DATA(i));
+	/* free all the ranges not in bootmem */
+	for (i = 1; i < npmem_ranges; i++) {
+		unsigned long pfn = pmem_ranges[i].start_pfn;
+		unsigned long end = pfn + pmem_ranges[i].pages;
+
+		if (end > max_pfn)
+			max_pfn = end;
+
+		for (; pfn < end; pfn++) {
+			struct page *page = pfn_to_page(pfn);
+			ClearPageReserved(page);
+			init_page_count(page);
+			__free_page(page);
+			totalram_pages++;
+		}
 	}
-#endif
+
+	max_low_pfn = max_pfn;
+	high_memory = __va((max_pfn << PAGE_SHIFT));
+	max_mapnr = page_to_pfn(virt_to_page(high_memory - 1)) + 1;
 
 	codesize = (unsigned long)_etext - (unsigned long)_text;
 	datasize = (unsigned long)_edata - (unsigned long)_etext;
@@ -610,24 +600,15 @@ void __init mem_init(void)
 	reservedpages = 0;
 {
 	unsigned long pfn;
-#ifdef CONFIG_DISCONTIGMEM
-	int i;
-
-	for (i = 0; i < npmem_ranges; i++) {
-		for (pfn = node_start_pfn(i); pfn < node_end_pfn(i); pfn++) {
-			if (PageReserved(pfn_to_page(pfn)))
-				reservedpages++;
-		}
-	}
-#else /* !CONFIG_DISCONTIGMEM */
 	for (pfn = 0; pfn < max_pfn; pfn++) {
 		/*
 		 * Only count reserved RAM pages
 		 */
+		if (!pfn_valid(pfn))
+			continue;
 		if (PageReserved(pfn_to_page(pfn)))
 			reservedpages++;
 	}
-#endif
 }
 
 #ifdef CONFIG_PA11
@@ -680,78 +661,6 @@ void __init mem_init(void)
 unsigned long *empty_zero_page __read_mostly;
 EXPORT_SYMBOL(empty_zero_page);
 
-void show_mem(unsigned int filter)
-{
-	int i,free = 0,total = 0,reserved = 0;
-	int shared = 0, cached = 0;
-
-	printk(KERN_INFO "Mem-info:\n");
-	show_free_areas();
-#ifndef CONFIG_DISCONTIGMEM
-	i = max_mapnr;
-	while (i-- > 0) {
-		total++;
-		if (PageReserved(mem_map+i))
-			reserved++;
-		else if (PageSwapCache(mem_map+i))
-			cached++;
-		else if (!page_count(&mem_map[i]))
-			free++;
-		else
-			shared += page_count(&mem_map[i]) - 1;
-	}
-#else
-	for (i = 0; i < npmem_ranges; i++) {
-		int j;
-
-		for (j = node_start_pfn(i); j < node_end_pfn(i); j++) {
-			struct page *p;
-			unsigned long flags;
-
-			pgdat_resize_lock(NODE_DATA(i), &flags);
-			p = nid_page_nr(i, j) - node_start_pfn(i);
-
-			total++;
-			if (PageReserved(p))
-				reserved++;
-			else if (PageSwapCache(p))
-				cached++;
-			else if (!page_count(p))
-				free++;
-			else
-				shared += page_count(p) - 1;
-			pgdat_resize_unlock(NODE_DATA(i), &flags);
-        	}
-	}
-#endif
-	printk(KERN_INFO "%d pages of RAM\n", total);
-	printk(KERN_INFO "%d reserved pages\n", reserved);
-	printk(KERN_INFO "%d pages shared\n", shared);
-	printk(KERN_INFO "%d pages swap cached\n", cached);
-
-
-#ifdef CONFIG_DISCONTIGMEM
-	{
-		struct zonelist *zl;
-		int i, j;
-
-		for (i = 0; i < npmem_ranges; i++) {
-			zl = node_zonelist(i, 0);
-			for (j = 0; j < MAX_NR_ZONES; j++) {
-				struct zoneref *z;
-				struct zone *zone;
-
-				printk("Zone list for zone %d on node %d: ", j, i);
-				for_each_zone_zonelist(zone, z, zl, j)
-					printk("[%d/%s] ", zone_to_nid(zone),
-								zone->name);
-				printk("\n");
-			}
-		}
-	}
-#endif
-}
-
 /*
  * pagetable_init() sets up the page tables
  *
@@ -886,6 +795,9 @@ EXPORT_SYMBOL(map_hpux_gateway_page);
 void __init paging_init(void)
 {
 	int i;
+	unsigned long zones_size[MAX_NR_ZONES] = { 0, };
+	unsigned long holes_size[MAX_NR_ZONES] = { 0, };
+	unsigned long mem_start_pfn = ~0UL, mem_end_pfn = 0, mem_size_pfn = 0;
 
 	setup_bootmem();
 	pagetable_init();
@@ -893,27 +805,31 @@ void __init paging_init(void)
 	flush_cache_all_local(); /* start with known state */
 	flush_tlb_all_local(NULL);
 
-	for (i = 0; i < npmem_ranges; i++) {
-		unsigned long zones_size[MAX_NR_ZONES] = { 0, };
-
-		zones_size[ZONE_NORMAL] = pmem_ranges[i].pages;
-
-#ifdef CONFIG_DISCONTIGMEM
-		/* Need to initialize the pfnnid_map before we can initialize
-		   the zone */
-		{
-		    int j;
-		    for (j = (pmem_ranges[i].start_pfn >> PFNNID_SHIFT);
-			 j <= ((pmem_ranges[i].start_pfn + pmem_ranges[i].pages) >> PFNNID_SHIFT);
-			 j++) {
-			pfnnid_map[j] = i;
-		    }
-		}
-#endif
+	/*
+	 *  from here, the kernel and all of the physical memory is
+	 *  fully covered with page table entries.  This is required
+	 *  because sparse_init() is very memory greedy and will fall
+	 *  off the end of the kernel initial page mapping.
+	 */
+
+	sparse_init();
 
-		free_area_init_node(i, zones_size,
-				pmem_ranges[i].start_pfn, NULL);
+	for (i = 0; i < npmem_ranges; i++) {
+		unsigned long start = pmem_ranges[i].start_pfn;
+		unsigned long size = pmem_ranges[i].pages;
+		unsigned long end = start + size;
+
+		if (mem_start_pfn > start)
+			mem_start_pfn = start;
+		if (mem_end_pfn < end)
+			mem_end_pfn = end;
+		mem_size_pfn += size;
 	}
+
+	zones_size[ZONE_NORMAL] = mem_end_pfn - mem_start_pfn;
+	holes_size[ZONE_NORMAL] = zones_size[ZONE_NORMAL] - mem_size_pfn;
+
+	free_area_init_node(0, zones_size, mem_start_pfn, holes_size);
 }
 
 #ifdef CONFIG_PA20



^ permalink raw reply related	[flat|nested] 210+ messages in thread

* [PATCH] convert parisc to sparsemem (was Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards)
@ 2011-04-23 18:34                                                                   ` James Bottomley
  0 siblings, 0 replies; 210+ messages in thread
From: James Bottomley @ 2011-04-23 18:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman

This is the preliminary conversion.  It's very nasty on parisc because
the memory allocation isn't symmetric anymore: under DISCONTIGMEM, we
push all memory into bootmem and then let free_all_bootmem() do the
magic for us; now we have to do separate initialisations for ranges
because SPARSEMEM can't do multi-range boot memory. It's also got the
horrible hack that I only use the first found range for bootmem.  I'm
not sure if this is correct (it won't be if the first found range can be
under about 50MB because we'll run out of bootmem during boot) ... we
might have to sort the ranges and use the larges, but that will involve
us in even more hackery around the bootmem reservations code.

The boot sequence got a few seconds slower because now all of the loops
over our pfn ranges actually have to skip through the holes (which takes
time for 64GB).

All in all, I've not been very impressed with SPARSEMEM over
DISCONTIGMEM.  It seems to have a lot of rough edges (necessitating
exception code) which DISCONTIGMEM just copes with.

And before you say the code is smaller, that's because I converted us to
generic show_mem().

James

---

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 69ff049..b416641 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -233,22 +233,17 @@ config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
 	depends on 64BIT
 
-config ARCH_DISCONTIGMEM_ENABLE
+config ARCH_SPARSEMEM_ENABLE
 	def_bool y
 	depends on 64BIT
 
 config ARCH_FLATMEM_ENABLE
 	def_bool y
 
-config ARCH_DISCONTIGMEM_DEFAULT
+config ARCH_SPARSEMEM_DEFAULT
 	def_bool y
 	depends on ARCH_DISCONTIGMEM_ENABLE
 
-config NODES_SHIFT
-	int
-	default "3"
-	depends on NEED_MULTIPLE_NODES
-
 source "kernel/Kconfig.preempt"
 source "kernel/Kconfig.hz"
 source "mm/Kconfig"
diff --git a/arch/parisc/include/asm/mmzone.h b/arch/parisc/include/asm/mmzone.h
index 9608d2c..8344bcb 100644
--- a/arch/parisc/include/asm/mmzone.h
+++ b/arch/parisc/include/asm/mmzone.h
@@ -1,73 +1,11 @@
 #ifndef _PARISC_MMZONE_H
 #define _PARISC_MMZONE_H
 
-#ifdef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_SPARSEMEM
 
-#define MAX_PHYSMEM_RANGES 8 /* Fix the size for now (current known max is 3) */
-extern int npmem_ranges;
-
-struct node_map_data {
-    pg_data_t pg_data;
-};
-
-extern struct node_map_data node_data[];
-
-#define NODE_DATA(nid)          (&node_data[nid].pg_data)
-
-#define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
-#define node_end_pfn(nid)						\
-({									\
-	pg_data_t *__pgdat = NODE_DATA(nid);				\
-	__pgdat->node_start_pfn + __pgdat->node_spanned_pages;		\
-})
-
-/* We have these possible memory map layouts:
- * Astro: 0-3.75, 67.75-68, 4-64
- * zx1: 0-1, 257-260, 4-256
- * Stretch (N-class): 0-2, 4-32, 34-xxx
- */
-
-/* Since each 1GB can only belong to one region (node), we can create
- * an index table for pfn to nid lookup; each entry in pfnnid_map 
- * represents 1GB, and contains the node that the memory belongs to. */
-
-#define PFNNID_SHIFT (30 - PAGE_SHIFT)
-#define PFNNID_MAP_MAX  512     /* support 512GB */
-extern unsigned char pfnnid_map[PFNNID_MAP_MAX];
-
-#ifndef CONFIG_64BIT
-#define pfn_is_io(pfn) ((pfn & (0xf0000000UL >> PAGE_SHIFT)) == (0xf0000000UL >> PAGE_SHIFT))
+#define MAX_PHYSMEM_RANGES 	8 /* current max is 3 but future proof this */
 #else
-/* io can be 0xf0f0f0f0f0xxxxxx or 0xfffffffff0000000 */
-#define pfn_is_io(pfn) ((pfn & (0xf000000000000000UL >> PAGE_SHIFT)) == (0xf000000000000000UL >> PAGE_SHIFT))
-#endif
-
-static inline int pfn_to_nid(unsigned long pfn)
-{
-	unsigned int i;
-	unsigned char r;
-
-	if (unlikely(pfn_is_io(pfn)))
-		return 0;
-
-	i = pfn >> PFNNID_SHIFT;
-	BUG_ON(i >= sizeof(pfnnid_map) / sizeof(pfnnid_map[0]));
-	r = pfnnid_map[i];
-	BUG_ON(r == 0xff);
-
-	return (int)r;
-}
-
-static inline int pfn_valid(int pfn)
-{
-	int nid = pfn_to_nid(pfn);
-
-	if (nid >= 0)
-		return (pfn < node_end_pfn(nid));
-	return 0;
-}
-
-#else /* !CONFIG_DISCONTIGMEM */
 #define MAX_PHYSMEM_RANGES 	1 
 #endif
+
 #endif /* _PARISC_MMZONE_H */
diff --git a/arch/parisc/include/asm/page.h b/arch/parisc/include/asm/page.h
index a84cc1f..654285a 100644
--- a/arch/parisc/include/asm/page.h
+++ b/arch/parisc/include/asm/page.h
@@ -139,9 +139,9 @@ extern int npmem_ranges;
 #define __pa(x)			((unsigned long)(x)-PAGE_OFFSET)
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 #define pfn_valid(pfn)		((pfn) < max_mapnr)
-#endif /* CONFIG_DISCONTIGMEM */
+#endif
 
 #ifdef CONFIG_HUGETLB_PAGE
 #define HPAGE_SHIFT		22	/* 4MB (is this fixed?) */
diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index df65366..526122c 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -147,12 +147,6 @@ extern void $$dyncall(void);
 EXPORT_SYMBOL($$dyncall);
 #endif
 
-#ifdef CONFIG_DISCONTIGMEM
-#include <asm/mmzone.h>
-EXPORT_SYMBOL(node_data);
-EXPORT_SYMBOL(pfnnid_map);
-#endif
-
 #ifdef CONFIG_FUNCTION_TRACER
 extern void _mcount(void);
 EXPORT_SYMBOL(_mcount);
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 5fa1e27..69c547c 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -21,7 +21,6 @@
 #include <linux/initrd.h>
 #include <linux/swap.h>
 #include <linux/unistd.h>
-#include <linux/nodemask.h>	/* for node_online_map */
 #include <linux/pagemap.h>	/* for release_pages and page_cache_release */
 
 #include <asm/pgalloc.h>
@@ -35,11 +34,6 @@ DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
 extern int  data_start;
 
-#ifdef CONFIG_DISCONTIGMEM
-struct node_map_data node_data[MAX_NUMNODES] __read_mostly;
-unsigned char pfnnid_map[PFNNID_MAP_MAX] __read_mostly;
-#endif
-
 static struct resource data_resource = {
 	.name	= "Kernel data",
 	.flags	= IORESOURCE_BUSY | IORESOURCE_MEM,
@@ -110,7 +104,7 @@ static void __init setup_bootmem(void)
 	unsigned long bootmap_pages;
 	unsigned long bootmap_start_pfn;
 	unsigned long bootmap_pfn;
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 	physmem_range_t pmem_holes[MAX_PHYSMEM_RANGES - 1];
 	int npmem_holes;
 #endif
@@ -144,7 +138,7 @@ static void __init setup_bootmem(void)
 		}
 	}
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 	/*
 	 * Throw out ranges that are too far apart (controlled by
 	 * MAX_GAP).
@@ -156,7 +150,7 @@ static void __init setup_bootmem(void)
 			 pmem_ranges[i-1].pages) > MAX_GAP) {
 			npmem_ranges = i;
 			printk("Large gap in memory detected (%ld pages). "
-			       "Consider turning on CONFIG_DISCONTIGMEM\n",
+			       "Consider turning on CONFIG_SPARSEMEM\n",
 			       pmem_ranges[i].start_pfn -
 			       (pmem_ranges[i-1].start_pfn +
 			        pmem_ranges[i-1].pages));
@@ -228,7 +222,7 @@ static void __init setup_bootmem(void)
 
 	printk(KERN_INFO "Total Memory: %ld MB\n",mem_max >> 20);
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 	/* Merge the ranges, keeping track of the holes */
 
 	{
@@ -253,48 +247,29 @@ static void __init setup_bootmem(void)
 	}
 #endif
 
-	bootmap_pages = 0;
-	for (i = 0; i < npmem_ranges; i++)
-		bootmap_pages += bootmem_bootmap_pages(pmem_ranges[i].pages);
+	bootmap_pages = bootmem_bootmap_pages(pmem_ranges[0].pages);
 
 	bootmap_start_pfn = PAGE_ALIGN(__pa((unsigned long) &_end)) >> PAGE_SHIFT;
 
-#ifdef CONFIG_DISCONTIGMEM
-	for (i = 0; i < MAX_PHYSMEM_RANGES; i++) {
-		memset(NODE_DATA(i), 0, sizeof(pg_data_t));
-		NODE_DATA(i)->bdata = &bootmem_node_data[i];
-	}
-	memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
-
-	for (i = 0; i < npmem_ranges; i++) {
-		node_set_state(i, N_NORMAL_MEMORY);
-		node_set_online(i);
-	}
-#endif
-
 	/*
-	 * Initialize and free the full range of memory in each range.
-	 * Note that the only writing these routines do are to the bootmap,
-	 * and we've made sure to locate the bootmap properly so that they
-	 * won't be writing over anything important.
+	 * Only initialise the first memory range to bootmem (the bootmem
+	 * allocation map can't cope with large holes)
 	 */
 
 	bootmap_pfn = bootmap_start_pfn;
 	max_pfn = 0;
-	for (i = 0; i < npmem_ranges; i++) {
+	{
 		unsigned long start_pfn;
 		unsigned long npages;
 
-		start_pfn = pmem_ranges[i].start_pfn;
-		npages = pmem_ranges[i].pages;
+		start_pfn = pmem_ranges[0].start_pfn;
+		npages = pmem_ranges[0].pages;
 
-		bootmap_size = init_bootmem_node(NODE_DATA(i),
+		bootmap_size = init_bootmem_node(NODE_DATA(0),
 						bootmap_pfn,
 						start_pfn,
 						(start_pfn + npages) );
-		free_bootmem_node(NODE_DATA(i),
-				  (start_pfn << PAGE_SHIFT),
-				  (npages << PAGE_SHIFT) );
+		free_bootmem(start_pfn << PAGE_SHIFT, npages << PAGE_SHIFT);
 		bootmap_pfn += (bootmap_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 		if ((start_pfn + npages) > max_pfn)
 			max_pfn = start_pfn + npages;
@@ -323,7 +298,7 @@ static void __init setup_bootmem(void)
 			((bootmap_pfn - bootmap_start_pfn) << PAGE_SHIFT),
 			BOOTMEM_DEFAULT);
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
 
 	/* reserve the holes */
 
@@ -369,6 +344,13 @@ static void __init setup_bootmem(void)
 		request_resource(res, &data_resource);
 	}
 	request_resource(&sysram_resources[0], &pdcdata_resource);
+
+#ifdef CONFIG_SPARSEMEM
+	for (i = 0; i < npmem_ranges; i++) {
+		memory_present(0, pmem_ranges[i].start_pfn,
+			       pmem_ranges[i].start_pfn + pmem_ranges[i].pages);
+	}
+#endif
 }
 
 static void __init map_pages(unsigned long start_vaddr,
@@ -580,7 +562,7 @@ unsigned long pcxl_dma_start __read_mostly;
 
 void __init mem_init(void)
 {
-	int codesize, reservedpages, datasize, initsize;
+	int codesize, reservedpages, datasize, initsize, i;
 
 	/* Do sanity checks on page table constants */
 	BUILD_BUG_ON(PTE_ENTRY_SIZE != sizeof(pte_t));
@@ -589,19 +571,27 @@ void __init mem_init(void)
 	BUILD_BUG_ON(PAGE_SHIFT + BITS_PER_PTE + BITS_PER_PMD + BITS_PER_PGD
 			> BITS_PER_LONG);
 
-	high_memory = __va((max_pfn << PAGE_SHIFT));
-
-#ifndef CONFIG_DISCONTIGMEM
-	max_mapnr = page_to_pfn(virt_to_page(high_memory - 1)) + 1;
 	totalram_pages += free_all_bootmem();
-#else
-	{
-		int i;
-
-		for (i = 0; i < npmem_ranges; i++)
-			totalram_pages += free_all_bootmem_node(NODE_DATA(i));
+	/* free all the ranges not in bootmem */
+	for (i = 1; i < npmem_ranges; i++) {
+		unsigned long pfn = pmem_ranges[i].start_pfn;
+		unsigned long end = pfn + pmem_ranges[i].pages;
+
+		if (end > max_pfn)
+			max_pfn = end;
+
+		for (; pfn < end; pfn++) {
+			struct page *page = pfn_to_page(pfn);
+			ClearPageReserved(page);
+			init_page_count(page);
+			__free_page(page);
+			totalram_pages++;
+		}
 	}
-#endif
+
+	max_low_pfn = max_pfn;
+	high_memory = __va((max_pfn << PAGE_SHIFT));
+	max_mapnr = page_to_pfn(virt_to_page(high_memory - 1)) + 1;
 
 	codesize = (unsigned long)_etext - (unsigned long)_text;
 	datasize = (unsigned long)_edata - (unsigned long)_etext;
@@ -610,24 +600,15 @@ void __init mem_init(void)
 	reservedpages = 0;
 {
 	unsigned long pfn;
-#ifdef CONFIG_DISCONTIGMEM
-	int i;
-
-	for (i = 0; i < npmem_ranges; i++) {
-		for (pfn = node_start_pfn(i); pfn < node_end_pfn(i); pfn++) {
-			if (PageReserved(pfn_to_page(pfn)))
-				reservedpages++;
-		}
-	}
-#else /* !CONFIG_DISCONTIGMEM */
 	for (pfn = 0; pfn < max_pfn; pfn++) {
 		/*
 		 * Only count reserved RAM pages
 		 */
+		if (!pfn_valid(pfn))
+			continue;
 		if (PageReserved(pfn_to_page(pfn)))
 			reservedpages++;
 	}
-#endif
 }
 
 #ifdef CONFIG_PA11
@@ -680,78 +661,6 @@ void __init mem_init(void)
 unsigned long *empty_zero_page __read_mostly;
 EXPORT_SYMBOL(empty_zero_page);
 
-void show_mem(unsigned int filter)
-{
-	int i,free = 0,total = 0,reserved = 0;
-	int shared = 0, cached = 0;
-
-	printk(KERN_INFO "Mem-info:\n");
-	show_free_areas();
-#ifndef CONFIG_DISCONTIGMEM
-	i = max_mapnr;
-	while (i-- > 0) {
-		total++;
-		if (PageReserved(mem_map+i))
-			reserved++;
-		else if (PageSwapCache(mem_map+i))
-			cached++;
-		else if (!page_count(&mem_map[i]))
-			free++;
-		else
-			shared += page_count(&mem_map[i]) - 1;
-	}
-#else
-	for (i = 0; i < npmem_ranges; i++) {
-		int j;
-
-		for (j = node_start_pfn(i); j < node_end_pfn(i); j++) {
-			struct page *p;
-			unsigned long flags;
-
-			pgdat_resize_lock(NODE_DATA(i), &flags);
-			p = nid_page_nr(i, j) - node_start_pfn(i);
-
-			total++;
-			if (PageReserved(p))
-				reserved++;
-			else if (PageSwapCache(p))
-				cached++;
-			else if (!page_count(p))
-				free++;
-			else
-				shared += page_count(p) - 1;
-			pgdat_resize_unlock(NODE_DATA(i), &flags);
-        	}
-	}
-#endif
-	printk(KERN_INFO "%d pages of RAM\n", total);
-	printk(KERN_INFO "%d reserved pages\n", reserved);
-	printk(KERN_INFO "%d pages shared\n", shared);
-	printk(KERN_INFO "%d pages swap cached\n", cached);
-
-
-#ifdef CONFIG_DISCONTIGMEM
-	{
-		struct zonelist *zl;
-		int i, j;
-
-		for (i = 0; i < npmem_ranges; i++) {
-			zl = node_zonelist(i, 0);
-			for (j = 0; j < MAX_NR_ZONES; j++) {
-				struct zoneref *z;
-				struct zone *zone;
-
-				printk("Zone list for zone %d on node %d: ", j, i);
-				for_each_zone_zonelist(zone, z, zl, j)
-					printk("[%d/%s] ", zone_to_nid(zone),
-								zone->name);
-				printk("\n");
-			}
-		}
-	}
-#endif
-}
-
 /*
  * pagetable_init() sets up the page tables
  *
@@ -886,6 +795,9 @@ EXPORT_SYMBOL(map_hpux_gateway_page);
 void __init paging_init(void)
 {
 	int i;
+	unsigned long zones_size[MAX_NR_ZONES] = { 0, };
+	unsigned long holes_size[MAX_NR_ZONES] = { 0, };
+	unsigned long mem_start_pfn = ~0UL, mem_end_pfn = 0, mem_size_pfn = 0;
 
 	setup_bootmem();
 	pagetable_init();
@@ -893,27 +805,31 @@ void __init paging_init(void)
 	flush_cache_all_local(); /* start with known state */
 	flush_tlb_all_local(NULL);
 
-	for (i = 0; i < npmem_ranges; i++) {
-		unsigned long zones_size[MAX_NR_ZONES] = { 0, };
-
-		zones_size[ZONE_NORMAL] = pmem_ranges[i].pages;
-
-#ifdef CONFIG_DISCONTIGMEM
-		/* Need to initialize the pfnnid_map before we can initialize
-		   the zone */
-		{
-		    int j;
-		    for (j = (pmem_ranges[i].start_pfn >> PFNNID_SHIFT);
-			 j <= ((pmem_ranges[i].start_pfn + pmem_ranges[i].pages) >> PFNNID_SHIFT);
-			 j++) {
-			pfnnid_map[j] = i;
-		    }
-		}
-#endif
+	/*
+	 *  from here, the kernel and all of the physical memory is
+	 *  fully covered with page table entries.  This is required
+	 *  because sparse_init() is very memory greedy and will fall
+	 *  off the end of the kernel initial page mapping.
+	 */
+
+	sparse_init();
 
-		free_area_init_node(i, zones_size,
-				pmem_ranges[i].start_pfn, NULL);
+	for (i = 0; i < npmem_ranges; i++) {
+		unsigned long start = pmem_ranges[i].start_pfn;
+		unsigned long size = pmem_ranges[i].pages;
+		unsigned long end = start + size;
+
+		if (mem_start_pfn > start)
+			mem_start_pfn = start;
+		if (mem_end_pfn < end)
+			mem_end_pfn = end;
+		mem_size_pfn += size;
 	}
+
+	zones_size[ZONE_NORMAL] = mem_end_pfn - mem_start_pfn;
+	holes_size[ZONE_NORMAL] = zones_size[ZONE_NORMAL] - mem_size_pfn;
+
+	free_area_init_node(0, zones_size, mem_start_pfn, holes_size);
 }
 
 #ifdef CONFIG_PA20


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-21 22:02                                                           ` [patch] m68k, " David Rientjes
@ 2011-04-24  1:59                                                             ` Michael Schmitz
  2011-04-24  3:14                                                               ` KOSAKI Motohiro
                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 210+ messages in thread
From: Michael Schmitz @ 2011-04-24  1:59 UTC (permalink / raw)
  To: David Rientjes
  Cc: Geert Uytterhoeven, Andrew Morton, Roman Zippel, linux-m68k,
	linux-kernel, schwab, tg@mirbsd.de >> Thorsten Glaser

David Rientjes wrote:
> For m68k, N_NORMAL_MEMORY represents all nodes that have present memory 
> since it does not support HIGHMEM.  This patch sets the bit at the time 
> the node is brought online.
>
> If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it 
> uses this nodemask to setup per-cache kmem_cache_node data structures.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  arch/m68k/mm/init_mm.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
> --- a/arch/m68k/mm/init_mm.c
> +++ b/arch/m68k/mm/init_mm.c
> @@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
>  	}
>  #endif
>  	pg_data_map[node].bdata = bootmem_node_data + node;
> +	if (node_present_pages(node))
> +		node_set_state(node, N_NORMAL_MEMORY);
>  	node_set_online(node);
>  }
>  
>   
As Andreas pointed out, node_present_pages is set in free_area_init_node 
which only gets called at the very end of m68k mm paging_init.

The correct patch would be something like this - the need for the 
conditional is perhaps debatable, seeing as we set the pages present 
just before node_set_state.

Tested on my ARAnyM test setup so far. I'd like to wait for an 
independent kernel image built by Thorsten before I test on the actual 
hardware. Sorry but you'll have to restart your build Thorsten :-)

Signed-off-by: Michael Schmitz <schmitz@debian.org>
--
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 02b7a03..b806c19 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -300,6 +300,8 @@ void __init paging_init(void)
                zones_size[ZONE_DMA] = m68k_memory[i].size >> PAGE_SHIFT;
                free_area_init_node(i, zones_size,
                                    m68k_memory[i].addr >> PAGE_SHIFT, 
NULL);
+                if (node_present_pages(i))
+                        node_set_state(i, N_NORMAL_MEMORY);
        }
 }



^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-24  1:59                                                             ` Michael Schmitz
@ 2011-04-24  3:14                                                               ` KOSAKI Motohiro
  2011-04-24  4:01                                                                 ` Michael Schmitz
  2011-04-24 11:32                                                                 ` Thorsten Glaser
  2011-04-24 12:18                                                               ` Thorsten Glaser
  2011-04-25 20:56                                                               ` David Rientjes
  2 siblings, 2 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-24  3:14 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: David Rientjes, Geert Uytterhoeven, Andrew Morton, Roman Zippel,
	linux-m68k, linux-kernel, schwab,
	tg@mirbsd.de >> Thorsten Glaser

2011/4/24 Michael Schmitz <schmitzmic@googlemail.com>:
> David Rientjes wrote:
>>
>> For m68k, N_NORMAL_MEMORY represents all nodes that have present memory
>> since it does not support HIGHMEM.  This patch sets the bit at the time the
>> node is brought online.
>>
>> If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
>> uses this nodemask to setup per-cache kmem_cache_node data structures.
>>
>> Signed-off-by: David Rientjes <rientjes@google.com>
>> ---
>>  arch/m68k/mm/init_mm.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
>> --- a/arch/m68k/mm/init_mm.c
>> +++ b/arch/m68k/mm/init_mm.c
>> @@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
>>        }
>>  #endif
>>        pg_data_map[node].bdata = bootmem_node_data + node;
>> +       if (node_present_pages(node))
>> +               node_set_state(node, N_NORMAL_MEMORY);
>>        node_set_online(node);
>>  }
>>
>
> As Andreas pointed out, node_present_pages is set in free_area_init_node
> which only gets called at the very end of m68k mm paging_init.
>
> The correct patch would be something like this - the need for the
> conditional is perhaps debatable, seeing as we set the pages present just
> before node_set_state.
>
> Tested on my ARAnyM test setup so far. I'd like to wait for an independent
> kernel image built by Thorsten before I test on the actual hardware. Sorry
> but you'll have to restart your build Thorsten :-)
>
> Signed-off-by: Michael Schmitz <schmitz@debian.org>
> --
> diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
> index 02b7a03..b806c19 100644
> --- a/arch/m68k/mm/motorola.c
> +++ b/arch/m68k/mm/motorola.c
> @@ -300,6 +300,8 @@ void __init paging_init(void)
>               zones_size[ZONE_DMA] = m68k_memory[i].size >> PAGE_SHIFT;
>               free_area_init_node(i, zones_size,
>                                   m68k_memory[i].addr >> PAGE_SHIFT, NULL);
> +                if (node_present_pages(i))
> +                        node_set_state(i, N_NORMAL_MEMORY);
>       }
> }

I think you are right. however I doubt m68k need to care memoryless node check.
probably following patch just work.

------------------------------------
 diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
 --- a/arch/m68k/mm/init_mm.c
 +++ b/arch/m68k/mm/init_mm.c
 @@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
        }
  #endif
        pg_data_map[node].bdata = bootmem_node_data + node;
        node_set_online(node);
+      node_set_state(node, N_NORMAL_MEMORY);


Maybe, the correct solution is to use CONFIG_ARCH_POPULATES_NODE_MAP feature.
the usage is simple, adding add_active_range() and replacing
free_area_init_node() with free_area_init_nodes().

Thanks.

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-24  3:14                                                               ` KOSAKI Motohiro
@ 2011-04-24  4:01                                                                 ` Michael Schmitz
  2011-04-24  5:05                                                                   ` Michael Schmitz
  2011-04-24 11:32                                                                 ` Thorsten Glaser
  1 sibling, 1 reply; 210+ messages in thread
From: Michael Schmitz @ 2011-04-24  4:01 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Michael Schmitz, David Rientjes, Geert Uytterhoeven,
	Andrew Morton, Roman Zippel, linux-m68k, linux-kernel, schwab,
	tg@mirbsd.de >> Thorsten Glaser

KOSAKI Motohiro wrote:
> 2011/4/24 Michael Schmitz <schmitzmic@googlemail.com>:
>   
>> David Rientjes wrote:
>>     
>>> For m68k, N_NORMAL_MEMORY represents all nodes that have present memory
>>> since it does not support HIGHMEM.  This patch sets the bit at the time the
>>> node is brought online.
>>>
>>> If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
>>> uses this nodemask to setup per-cache kmem_cache_node data structures.
>>>
>>> Signed-off-by: David Rientjes <rientjes@google.com>
>>> ---
>>>  arch/m68k/mm/init_mm.c |    2 ++
>>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
>>> --- a/arch/m68k/mm/init_mm.c
>>> +++ b/arch/m68k/mm/init_mm.c
>>> @@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
>>>        }
>>>  #endif
>>>        pg_data_map[node].bdata = bootmem_node_data + node;
>>> +       if (node_present_pages(node))
>>> +               node_set_state(node, N_NORMAL_MEMORY);
>>>        node_set_online(node);
>>>  }
>>>
>>>       
>> As Andreas pointed out, node_present_pages is set in free_area_init_node
>> which only gets called at the very end of m68k mm paging_init.
>>
>> The correct patch would be something like this - the need for the
>> conditional is perhaps debatable, seeing as we set the pages present just
>> before node_set_state.
>>
>> Tested on my ARAnyM test setup so far. I'd like to wait for an independent
>> kernel image built by Thorsten before I test on the actual hardware. Sorry
>> but you'll have to restart your build Thorsten :-)
>>
>> Signed-off-by: Michael Schmitz <schmitz@debian.org>
>> --
>> diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
>> index 02b7a03..b806c19 100644
>> --- a/arch/m68k/mm/motorola.c
>> +++ b/arch/m68k/mm/motorola.c
>> @@ -300,6 +300,8 @@ void __init paging_init(void)
>>               zones_size[ZONE_DMA] = m68k_memory[i].size >> PAGE_SHIFT;
>>               free_area_init_node(i, zones_size,
>>                                   m68k_memory[i].addr >> PAGE_SHIFT, NULL);
>> +                if (node_present_pages(i))
>> +                        node_set_state(i, N_NORMAL_MEMORY);
>>       }
>> }
>>     
>
> I think you are right. however I doubt m68k need to care memoryless node check.
> probably following patch just work.
>
> ------------------------------------
>  diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
>  --- a/arch/m68k/mm/init_mm.c
>  +++ b/arch/m68k/mm/init_mm.c
>  @@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
>         }
>   #endif
>         pg_data_map[node].bdata = bootmem_node_data + node;
>         node_set_online(node);
> +      node_set_state(node, N_NORMAL_MEMORY);
>   
It does in fact - that's what I had as a workaround before.

Memory node information is provided by the respective m68k bootstrap so 
memoryless nodes won't be possible indeed.
> Maybe, the correct solution is to use CONFIG_ARCH_POPULATES_NODE_MAP feature.
> the usage is simple, adding add_active_range() and replacing
> free_area_init_node() with free_area_init_nodes().
>   
Either of the patches should work as long as there is not a potential 
problem with setting the node state before setting node_present_pages. 
Your suggestion may work as well but I cannot judge whether there may be 
side effects in relation to special case handling of node zero on m68k. 
Do we need to call add_active_range at the time init_bootmem_node is 
called? That would be a bit earlier in paging_init, and achieve much the 
same thing as the proposed patch or the earlier workaround.

Besides, your solution would set N_HIGH_MEMORY not N_NORMAL_MEMORY if I 
read free_area_init_nodes() right. Does that matter?

Cheers,

  Michael


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-24  4:01                                                                 ` Michael Schmitz
@ 2011-04-24  5:05                                                                   ` Michael Schmitz
  2011-04-24  6:04                                                                     ` KOSAKI Motohiro
  0 siblings, 1 reply; 210+ messages in thread
From: Michael Schmitz @ 2011-04-24  5:05 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: KOSAKI Motohiro, David Rientjes, Geert Uytterhoeven,
	Andrew Morton, Roman Zippel, linux-m68k, linux-kernel, schwab,
	tg@mirbsd.de >> Thorsten Glaser

Michael Schmitz wrote:
> Besides, your solution would set N_HIGH_MEMORY not N_NORMAL_MEMORY if 
> I read free_area_init_nodes() right. Does that matter?
check_for_regular_memory() does take care of that - so it would be all 
the same in the end. Whether it's worth it for a handful of nodes at 
best I'm unsure. I'll leave that for Geert to decide.

Cheers,

  Michael


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-24  5:05                                                                   ` Michael Schmitz
@ 2011-04-24  6:04                                                                     ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-24  6:04 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: David Rientjes, Geert Uytterhoeven, Andrew Morton, Roman Zippel,
	linux-m68k, linux-kernel, schwab,
	tg@mirbsd.de >> Thorsten Glaser

2011/4/24 Michael Schmitz <schmitzmic@googlemail.com>:
> Michael Schmitz wrote:
>>
>> Besides, your solution would set N_HIGH_MEMORY not N_NORMAL_MEMORY if I
>> read free_area_init_nodes() right. Does that matter?
>
> check_for_regular_memory() does take care of that - so it would be all the
> same in the end. Whether it's worth it for a handful of nodes at best I'm
> unsure. I'll leave that for Geert to decide.

check_for_regular_memory() doesn't. It is CONFIG_HIGHMEM specific.
The trick is in node_status definition. N_NORMAL_MEMORY and
N_HIGHMEMORY share the same value. then
node_set_state(nid, N_HIGH_MEMORY) turn on both normal and
highmem bitmask.

--------------------------------------------------------------------
enum node_states {
        N_POSSIBLE,             /* The node could become online at some point */
        N_ONLINE,               /* The node is online */
        N_NORMAL_MEMORY,        /* The node has regular memory */
#ifdef CONFIG_HIGHMEM
        N_HIGH_MEMORY,          /* The node has regular or high memory */
#else
        N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
        N_CPU,          /* The node has one or more cpus */
        NR_NODE_STATES
};

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-24  3:14                                                               ` KOSAKI Motohiro
  2011-04-24  4:01                                                                 ` Michael Schmitz
@ 2011-04-24 11:32                                                                 ` Thorsten Glaser
  2011-04-25  2:41                                                                   ` Michael Schmitz
  1 sibling, 1 reply; 210+ messages in thread
From: Thorsten Glaser @ 2011-04-24 11:32 UTC (permalink / raw)
  To: Michael Schmitz, KOSAKI Motohiro
  Cc: David Rientjes, Geert Uytterhoeven, Andrew Morton, Roman Zippel,
	linux-m68k, linux-kernel, schwab

Michael Schmitz dixit:

> Tested on my ARAnyM test setup so far. I'd like to wait for an independent
> kernel image built by Thorsten before I test on the actual hardware. Sorry but
> you'll have to restart your build Thorsten :-)

Heh okay, if you want. But only cross-compiling :D


KOSAKI Motohiro dixit:

>I think you are right. however I doubt m68k need to care memoryless node check.
>probably following patch just work.

That’s what we have now. Compiles, boots, works fine.


So, which of these do you guys want?

bye,
//mirabilos
-- 
13:47⎜<tobiasu> if i were omnipotent, i would divide by zero
                all day long ;)
(thinking about http://lobacevski.tumblr.com/post/3260866481 by waga)

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-24  1:59                                                             ` Michael Schmitz
  2011-04-24  3:14                                                               ` KOSAKI Motohiro
@ 2011-04-24 12:18                                                               ` Thorsten Glaser
  2011-04-25 20:56                                                               ` David Rientjes
  2 siblings, 0 replies; 210+ messages in thread
From: Thorsten Glaser @ 2011-04-24 12:18 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: David Rientjes, Geert Uytterhoeven, Andrew Morton, Roman Zippel,
	linux-m68k, linux-kernel, schwab

Michael Schmitz dixit:

> Tested on my ARAnyM test setup so far. I'd like to wait for an independent
> kernel image built by Thorsten before I test on the actual hardware. Sorry but

ARAnyM too, but both this patch and the earlier one compile and boot
(on atari) and I’m building packages under both of them (two ARAnyM
instances) right now. Toolchain used:

binutils-m68k-linux-gnu         2.21.51.20110419-2
gcc-4.4-m68k-linux-gnu          4.4.6-2+m68k.1

(i.e. sid latest; gcc with only a patch to keep building libgcc2,
since we don’t have gcc-4.5 on m68k yet)

bye,
//mirabilos
-- 
13:47⎜<tobiasu> if i were omnipotent, i would divide by zero
                all day long ;)
(thinking about http://lobacevski.tumblr.com/post/3260866481 by waga)

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH] convert parisc to sparsemem (was Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards)
  2011-04-23 18:34                                                                   ` James Bottomley
@ 2011-04-24 16:27                                                                     ` John David Anglin
  -1 siblings, 0 replies; 210+ messages in thread
From: John David Anglin @ 2011-04-24 16:27 UTC (permalink / raw)
  To: James Bottomley
  Cc: Dave Hansen, Christoph Lameter, KOSAKI Motohiro, David Rientjes,
	Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers,
	Tejun Heo, Mel Gorman

On Sat, 23 Apr 2011, James Bottomley wrote:

> The boot sequence got a few seconds slower because now all of the loops
> over our pfn ranges actually have to skip through the holes (which takes
> time for 64GB).

On my rp3440, the biggest gap seems to be 265GB:

dave@mx3210:~$ cat /proc/iomem
00000000-3fffffff : System RAM
00000000-000009ff : PDC data (Page Zero)
00100000-004acfff : Kernel code
004ad000-00661fff : Kernel data
40000000-4fffffff : IOVA Space
100000000-27fdfffff : System RAM
4040000000-40ffffffff : System RAM

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH] convert parisc to sparsemem (was Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards)
@ 2011-04-24 16:27                                                                     ` John David Anglin
  0 siblings, 0 replies; 210+ messages in thread
From: John David Anglin @ 2011-04-24 16:27 UTC (permalink / raw)
  To: James Bottomley
  Cc: Dave Hansen, Christoph Lameter, KOSAKI Motohiro, David Rientjes,
	Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers,
	Tejun Heo, Mel Gorman

On Sat, 23 Apr 2011, James Bottomley wrote:

> The boot sequence got a few seconds slower because now all of the loops
> over our pfn ranges actually have to skip through the holes (which takes
> time for 64GB).

On my rp3440, the biggest gap seems to be 265GB:

dave@mx3210:~$ cat /proc/iomem
00000000-3fffffff : System RAM
00000000-000009ff : PDC data (Page Zero)
00100000-004acfff : Kernel code
004ad000-00661fff : Kernel data
40000000-4fffffff : IOVA Space
100000000-27fdfffff : System RAM
4040000000-40ffffffff : System RAM

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-24 11:32                                                                 ` Thorsten Glaser
@ 2011-04-25  2:41                                                                   ` Michael Schmitz
  2011-04-25 14:22                                                                     ` Thorsten Glaser
  0 siblings, 1 reply; 210+ messages in thread
From: Michael Schmitz @ 2011-04-25  2:41 UTC (permalink / raw)
  To: Thorsten Glaser
  Cc: Michael Schmitz, KOSAKI Motohiro, David Rientjes,
	Geert Uytterhoeven, Andrew Morton, Roman Zippel, linux-m68k,
	linux-kernel, schwab

Hello Thorsten,
>> Tested on my ARAnyM test setup so far. I'd like to wait for an independent
>> kernel image built by Thorsten before I test on the actual hardware. Sorry but
>> you'll have to restart your build Thorsten :-)
>>     
>
> Heh okay, if you want. But only cross-compiling :D
>   
Thought so. That's fine actually. Your cross toolchain will be a lot 
more recent than mine.
>
> KOSAKI Motohiro dixit:
>
>   
>> I think you are right. however I doubt m68k need to care memoryless node check.
>> probably following patch just work.
>>     
>
> That’s what we have now. Compiles, boots, works fine.
>
>
> So, which of these do you guys want?
>   
Whatever is cleaner and easier to understand. I'm a poor judge of code 
elegance.

Cheers,

  Michael


^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-25  2:41                                                                   ` Michael Schmitz
@ 2011-04-25 14:22                                                                     ` Thorsten Glaser
  0 siblings, 0 replies; 210+ messages in thread
From: Thorsten Glaser @ 2011-04-25 14:22 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: KOSAKI Motohiro, David Rientjes, Geert Uytterhoeven,
	Andrew Morton, Roman Zippel, linux-m68k, linux-kernel, schwab

Michael Schmitz dixit:

> Thought so. That's fine actually. Your cross toolchain will be a lot more
> recent than mine.

Just these tools in the exact versions I’d use natively, too:

binutils-m68k-linux-gnu 2.21.51.20110419-2
gcc-4.4-m68k-linux-gnu 4.4.6-2+m68k.1

>> So, which of these do you guys want?
>>  
> Whatever is cleaner and easier to understand. I'm a poor judge of code
> elegance.

I’d say the “correcter” one, if m68k would have to care about
memoryless nodes in the future (not sure whether it’s possible
for that to happen). But I’ve added your two-liner to Debian now.

bye,
//mirabilos
-- 
13:47⎜<tobiasu> if i were omnipotent, i would divide by zero
                all day long ;)
(thinking about http://lobacevski.tumblr.com/post/3260866481 by waga)

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-24  1:59                                                             ` Michael Schmitz
  2011-04-24  3:14                                                               ` KOSAKI Motohiro
  2011-04-24 12:18                                                               ` Thorsten Glaser
@ 2011-04-25 20:56                                                               ` David Rientjes
  2011-04-26  2:51                                                                 ` Michael Schmitz
  2 siblings, 1 reply; 210+ messages in thread
From: David Rientjes @ 2011-04-25 20:56 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Geert Uytterhoeven, Andrew Morton, Roman Zippel, linux-m68k,
	linux-kernel, schwab, tg@mirbsd.de >> Thorsten Glaser

On Sun, 24 Apr 2011, Michael Schmitz wrote:

> diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
> index 02b7a03..b806c19 100644
> --- a/arch/m68k/mm/motorola.c
> +++ b/arch/m68k/mm/motorola.c
> @@ -300,6 +300,8 @@ void __init paging_init(void)
>                zones_size[ZONE_DMA] = m68k_memory[i].size >> PAGE_SHIFT;
>                free_area_init_node(i, zones_size,
>                                    m68k_memory[i].addr >> PAGE_SHIFT, NULL);
> +                if (node_present_pages(i))
> +                        node_set_state(i, N_NORMAL_MEMORY);
>        }
> }
> 

Ok, would you like to write a changelog for this similar to mine and then 
propose it as an alternative?

Thanks!

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH] convert parisc to sparsemem (was Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards)
  2011-04-23 18:34                                                                   ` James Bottomley
@ 2011-04-26  0:32                                                                     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-26  0:32 UTC (permalink / raw)
  To: James Bottomley
  Cc: kosaki.motohiro, Dave Hansen, Christoph Lameter, David Rientjes,
	Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers,
	Tejun Heo, Mel Gorman

Hi James,

 % make CROSS_COMPILE=hppa64-linux- ARCH=parisc
  CHK     include/linux/version.h
  CHK     include/generated/utsrelease.h
  CC      arch/parisc/kernel/asm-offsets.s
In file included from include/linux/topology.h:32:0,
                 from include/linux/sched.h:78,
                 from arch/parisc/kernel/asm-offsets.c:31:
include/linux/mmzone.h:916:27: fatal error: asm/sparsemem.h: No such file or directory

Parhaps, you forgot to quilt add?


> This is the preliminary conversion.  It's very nasty on parisc because
> the memory allocation isn't symmetric anymore: under DISCONTIGMEM, we
> push all memory into bootmem and then let free_all_bootmem() do the
> magic for us; now we have to do separate initialisations for ranges
> because SPARSEMEM can't do multi-range boot memory. It's also got the
> horrible hack that I only use the first found range for bootmem.  I'm
> not sure if this is correct (it won't be if the first found range can be
> under about 50MB because we'll run out of bootmem during boot) ... we
> might have to sort the ranges and use the larges, but that will involve
> us in even more hackery around the bootmem reservations code.
> 
> The boot sequence got a few seconds slower because now all of the loops
> over our pfn ranges actually have to skip through the holes (which takes
> time for 64GB).
> 
> All in all, I've not been very impressed with SPARSEMEM over
> DISCONTIGMEM.  It seems to have a lot of rough edges (necessitating
> exception code) which DISCONTIGMEM just copes with.
> 
> And before you say the code is smaller, that's because I converted us to
> generic show_mem().

Cool! I hoped to remove arch specific show_mem() long time.


And, nitpick comment.

Could you please use #ifdef CONFIG_FLAGMEM instead #ifndef CONFIG_SPARSEMEM?
MM gyes parse '#ifndef CONFIG_SPARSEMEM' as valid-both-flatmem-and-discontigmem.
but this code isn't.

If my quick grep is correct, all of your #ifndef SPARSEMEM can be converted
#ifdef FALTMEM.





^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH] convert parisc to sparsemem (was Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards)
@ 2011-04-26  0:32                                                                     ` KOSAKI Motohiro
  0 siblings, 0 replies; 210+ messages in thread
From: KOSAKI Motohiro @ 2011-04-26  0:32 UTC (permalink / raw)
  To: James Bottomley
  Cc: kosaki.motohiro, Dave Hansen, Christoph Lameter, David Rientjes,
	Pekka Enberg, Michal Hocko, Andrew Morton, Hugh Dickins,
	linux-mm, LKML, linux-parisc, Ingo Molnar, x86 maintainers,
	Tejun Heo, Mel Gorman

Hi James,

 % make CROSS_COMPILE=hppa64-linux- ARCH=parisc
  CHK     include/linux/version.h
  CHK     include/generated/utsrelease.h
  CC      arch/parisc/kernel/asm-offsets.s
In file included from include/linux/topology.h:32:0,
                 from include/linux/sched.h:78,
                 from arch/parisc/kernel/asm-offsets.c:31:
include/linux/mmzone.h:916:27: fatal error: asm/sparsemem.h: No such file or directory

Parhaps, you forgot to quilt add?


> This is the preliminary conversion.  It's very nasty on parisc because
> the memory allocation isn't symmetric anymore: under DISCONTIGMEM, we
> push all memory into bootmem and then let free_all_bootmem() do the
> magic for us; now we have to do separate initialisations for ranges
> because SPARSEMEM can't do multi-range boot memory. It's also got the
> horrible hack that I only use the first found range for bootmem.  I'm
> not sure if this is correct (it won't be if the first found range can be
> under about 50MB because we'll run out of bootmem during boot) ... we
> might have to sort the ranges and use the larges, but that will involve
> us in even more hackery around the bootmem reservations code.
> 
> The boot sequence got a few seconds slower because now all of the loops
> over our pfn ranges actually have to skip through the holes (which takes
> time for 64GB).
> 
> All in all, I've not been very impressed with SPARSEMEM over
> DISCONTIGMEM.  It seems to have a lot of rough edges (necessitating
> exception code) which DISCONTIGMEM just copes with.
> 
> And before you say the code is smaller, that's because I converted us to
> generic show_mem().

Cool! I hoped to remove arch specific show_mem() long time.


And, nitpick comment.

Could you please use #ifdef CONFIG_FLAGMEM instead #ifndef CONFIG_SPARSEMEM?
MM gyes parse '#ifndef CONFIG_SPARSEMEM' as valid-both-flatmem-and-discontigmem.
but this code isn't.

If my quick grep is correct, all of your #ifndef SPARSEMEM can be converted
#ifdef FALTMEM.




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-25 20:56                                                               ` David Rientjes
@ 2011-04-26  2:51                                                                 ` Michael Schmitz
  2011-04-26 10:50                                                                   ` Thorsten Glaser
  0 siblings, 1 reply; 210+ messages in thread
From: Michael Schmitz @ 2011-04-26  2:51 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michael Schmitz, Geert Uytterhoeven, Andrew Morton, Roman Zippel,
	linux-m68k, linux-kernel, schwab,
	tg@mirbsd.de >> Thorsten Glaser

David Rientjes wrote:
> Ok, would you like to write a changelog for this similar to mine and then 
> propose it as an alternative?
>
> Thanks!
>   
Will this one do?

Cheers,

  Michael


commit 99b9c43cfb18a8e2222e9ef80b04a5c3e1dad254
Author: Michael Schmitz <schmitz@debian.org>
Date:   Tue Apr 26 14:51:54 2011 +1200

[m68k]      For m68k, N_NORMAL_MEMORY represents all nodes that have 
present memory
        since it does not support HIGHMEM.  This patch sets the bit at 
the time
        node_present_pages has been set by free_area_init_node.
        At the time the node is brought online, the the node state would 
have to
        be done unconditionally since information about present memory 
has not yet
        been recorded.
              
        If N_NORMAL_MEMORY is not accurate, slub may encounter errors 
since it
        uses this nodemask to setup per-cache kmem_cache_node data 
structures.

         This pach is an alternative to the one proposed by David 
Rientjes <rientjes@google.com>
         attempting to set node state immediately when bringing the node 
online.

Signed-off-by: Michael Schmitz <schmitz@debian.org>
---
 arch/m68k/mm/motorola.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 02b7a03..b806c19 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -300,6 +300,8 @@ void __init paging_init(void)
                zones_size[ZONE_DMA] = m68k_memory[i].size >> PAGE_SHIFT;
                free_area_init_node(i, zones_size,
                                    m68k_memory[i].addr >> PAGE_SHIFT, 
NULL);
+                if (node_present_pages(i))
+                        node_set_state(i, N_NORMAL_MEMORY);
        }
 }


^ permalink raw reply related	[flat|nested] 210+ messages in thread

* Re: [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
  2011-04-19 11:09           ` Michal Hocko
@ 2011-04-26  7:59             ` Michal Hocko
  -1 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-26  7:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML

Hi Andrew,
this one probably got lost in the follow up "parisc doesn't boot" email
storm.

On Tue 19-04-11 13:09:56, Michal Hocko wrote:
> While I am in the cleanup mode. We should use VM_GROWSUP rather than
> tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.
> 
> What do you think?
> --- 
> From fd832dd46b4918718901f2ebe994f4662f167999 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Tue, 19 Apr 2011 11:11:41 +0200
> Subject: [PATCH] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
> 
> IA64 needs some trickery for Register Backing Store so we have to
> export expand_stack_upwards for it even though the architecture expands
> its stack downwards normally. To avoid
> we have defined VM_GROWSUP which is defined only for the above
> configuration.
> 
> We still have places which use the original ifdefs so let's get rid of
> them finally.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  mm/mmap.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 29c68b0..3ff9edf 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1726,7 +1726,7 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
>  	return 0;
>  }
>  
> -#if defined(CONFIG_STACK_GROWSUP) || defined(CONFIG_IA64)
> +#if VM_GROWSUP
>  /*
>   * PA-RISC uses this for its stack; IA64 for its Register Backing Store.
>   * vma is the last one with address > vma->vm_end.  Have to extend vma.
> @@ -1777,7 +1777,7 @@ int expand_stack_upwards(struct vm_area_struct *vma, unsigned long address)
>  	khugepaged_enter_vma_merge(vma);
>  	return error;
>  }
> -#endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
> +#endif /* VM_GROWSUP */
>  
>  /*
>   * vma is the first one with address < vma->vm_start.  Have to extend vma.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
@ 2011-04-26  7:59             ` Michal Hocko
  0 siblings, 0 replies; 210+ messages in thread
From: Michal Hocko @ 2011-04-26  7:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML

Hi Andrew,
this one probably got lost in the follow up "parisc doesn't boot" email
storm.

On Tue 19-04-11 13:09:56, Michal Hocko wrote:
> While I am in the cleanup mode. We should use VM_GROWSUP rather than
> tricky CONFIG_STACK_GROWSUP||CONFIG_IA64.
> 
> What do you think?
> --- 
> From fd832dd46b4918718901f2ebe994f4662f167999 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Tue, 19 Apr 2011 11:11:41 +0200
> Subject: [PATCH] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64
> 
> IA64 needs some trickery for Register Backing Store so we have to
> export expand_stack_upwards for it even though the architecture expands
> its stack downwards normally. To avoid
> we have defined VM_GROWSUP which is defined only for the above
> configuration.
> 
> We still have places which use the original ifdefs so let's get rid of
> them finally.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  mm/mmap.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 29c68b0..3ff9edf 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1726,7 +1726,7 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
>  	return 0;
>  }
>  
> -#if defined(CONFIG_STACK_GROWSUP) || defined(CONFIG_IA64)
> +#if VM_GROWSUP
>  /*
>   * PA-RISC uses this for its stack; IA64 for its Register Backing Store.
>   * vma is the last one with address > vma->vm_end.  Have to extend vma.
> @@ -1777,7 +1777,7 @@ int expand_stack_upwards(struct vm_area_struct *vma, unsigned long address)
>  	khugepaged_enter_vma_merge(vma);
>  	return error;
>  }
> -#endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
> +#endif /* VM_GROWSUP */
>  
>  /*
>   * vma is the first one with address < vma->vm_start.  Have to extend vma.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-26  2:51                                                                 ` Michael Schmitz
@ 2011-04-26 10:50                                                                   ` Thorsten Glaser
  2011-04-27 13:19                                                                     ` Geert Uytterhoeven
  0 siblings, 1 reply; 210+ messages in thread
From: Thorsten Glaser @ 2011-04-26 10:50 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: David Rientjes, Geert Uytterhoeven, Andrew Morton, Roman Zippel,
	linux-m68k, linux-kernel, schwab

Michael Schmitz dixit:

>       be done unconditionally since information about present memory has not
> yet
>       been recorded.
>                    If N_NORMAL_MEMORY is not accurate, slub may encounter
> errors since it

Hrm…

> @@ -300,6 +300,8 @@ void __init paging_init(void)
>               zones_size[ZONE_DMA] = m68k_memory[i].size >> PAGE_SHIFT;
>               free_area_init_node(i, zones_size,
>                                   m68k_memory[i].addr >> PAGE_SHIFT, NULL);
> +                if (node_present_pages(i))
> +                        node_set_state(i, N_NORMAL_MEMORY);
>       }
> }

No, this has whitespace problems (tabs are expanded to spaces).

bye,
//mirabilos
-- 
13:47⎜<tobiasu> if i were omnipotent, i would divide by zero
                all day long ;)
(thinking about http://lobacevski.tumblr.com/post/3260866481 by waga)

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-26 10:50                                                                   ` Thorsten Glaser
@ 2011-04-27 13:19                                                                     ` Geert Uytterhoeven
  2011-04-27 15:09                                                                       ` Thorsten Glaser
                                                                                         ` (2 more replies)
  0 siblings, 3 replies; 210+ messages in thread
From: Geert Uytterhoeven @ 2011-04-27 13:19 UTC (permalink / raw)
  To: Thorsten Glaser
  Cc: Michael Schmitz, David Rientjes, Andrew Morton, Roman Zippel,
	linux-m68k, linux-kernel, schwab

On Tue, Apr 26, 2011 at 12:50, Thorsten Glaser <tg@mirbsd.de> wrote:
> Michael Schmitz dixit:
>
>>       be done unconditionally since information about present memory has not
>> yet
>>       been recorded.
>>                    If N_NORMAL_MEMORY is not accurate, slub may encounter
>> errors since it
>
> Hrm…
>
>> @@ -300,6 +300,8 @@ void __init paging_init(void)
>>               zones_size[ZONE_DMA] = m68k_memory[i].size >> PAGE_SHIFT;
>>               free_area_init_node(i, zones_size,
>>                                   m68k_memory[i].addr >> PAGE_SHIFT, NULL);
>> +                if (node_present_pages(i))
>> +                        node_set_state(i, N_NORMAL_MEMORY);
>>       }
>> }
>
> No, this has whitespace problems (tabs are expanded to spaces).

Fixed those up, applied, and will send to Linus for 2.6.39-final.

BTW, if I enable CONFIG_SLUB on my ARAnyM setup, I don't get a crash, but it
hangs after:

| INIT: version 2.86 booting

With Michael's patch, it continues fine.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-27 13:19                                                                     ` Geert Uytterhoeven
@ 2011-04-27 15:09                                                                       ` Thorsten Glaser
  2011-04-27 15:11                                                                       ` Thorsten Glaser
  2011-04-28  3:05                                                                       ` Michael Schmitz
  2 siblings, 0 replies; 210+ messages in thread
From: Thorsten Glaser @ 2011-04-27 15:09 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Michael Schmitz, David Rientjes, Andrew Morton, Roman Zippel,
	linux-m68k, linux-kernel, schwab

Geert Uytterhoeven dixit:

>Fixed those up, applied, and will send to Linus for 2.6.39-final.

Tested-by: Thorsten Glaser <tg@debian.org>

>BTW, if I enable CONFIG_SLUB on my ARAnyM setup, I don't get a crash, but it
>hangs after:

I think that’s toolchain, environment, etc. dependent…

>With Michael's patch, it continues fine.

Yeah, same here.

bye,
//mirabilos
-- 
FWIW, I'm quite impressed with mksh interactively. I thought it was much
*much* more bare bones. But it turns out it beats the living hell out of
ksh93 in that respect. I'd even consider it for my daily use if I hadn't
wasted half my life on my zsh setup. :-) -- Frank Terbeck in #!/bin/mksh

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-27 13:19                                                                     ` Geert Uytterhoeven
  2011-04-27 15:09                                                                       ` Thorsten Glaser
@ 2011-04-27 15:11                                                                       ` Thorsten Glaser
  2011-04-28  2:35                                                                         ` Ben Hutchings
  2011-04-28  3:05                                                                       ` Michael Schmitz
  2 siblings, 1 reply; 210+ messages in thread
From: Thorsten Glaser @ 2011-04-27 15:11 UTC (permalink / raw)
  To: Debian Kernel Team; +Cc: linux-m68k

[-- Attachment #1: Type: TEXT/PLAIN, Size: 780 bytes --]

Geert Uytterhoeven dixit:

>Fixed those up, applied, and will send to Linus for 2.6.39-final.

Attached patch
• synchronises the 0009-* patch against what will be in 2.6.39
  (no changes in the patch area, only in the patch metadata)
• adds the series/5 file which seemingly was forgotten

I think you’ll still need the following patch for 2.6.39:
0008-m68k-atari-Reserve-some-ST-RAM-early-on-for-device-b.patch
We’re working on getting that functionality in 2.6.40 though.

bye,
//mirabilos
-- 
I believe no one can invent an algorithm. One just happens to hit upon it
when God enlightens him. Or only God invents algorithms, we merely copy them.
If you don't believe in God, just consider God as Nature if you won't deny
existence.		-- Coywolf Qi Hunt

[-- Attachment #2: Type: TEXT/PLAIN, Size: 4400 bytes --]

Index: dists/sid/linux-2.6/debian/patches/bugfix/m68k/0009-m68k-mm-set-all-online-nodes-in-N_NORMAL_MEMORY.patch
===================================================================
--- dists/sid/linux-2.6/debian/patches/bugfix/m68k/0009-m68k-mm-set-all-online-nodes-in-N_NORMAL_MEMORY.patch	(revision 17261)
+++ dists/sid/linux-2.6/debian/patches/bugfix/m68k/0009-m68k-mm-set-all-online-nodes-in-N_NORMAL_MEMORY.patch	(working copy)
@@ -1,47 +1,25 @@
-From 678cf05cb8031c762644a393b7c943e7a9d2c288 Mon Sep 17 00:00:00 2001
+From fb5f44b4fd12bfd16019956d0d0795278a2f561b Mon Sep 17 00:00:00 2001
 From: Michael Schmitz <schmitzmic@googlemail.com>
-Date: Sun, 24 Apr 2011 13:59:43 +1200
-Subject: [PATCH 9/9] m68k, mm: set all online nodes in N_NORMAL_MEMORY
+Date: Tue, 26 Apr 2011 14:51:53 +1200
+Subject: [PATCH] m68k/mm: Set all online nodes in N_NORMAL_MEMORY
 
-David Rientjes wrote:
-> For m68k, N_NORMAL_MEMORY represents all nodes that have present memory
-> since it does not support HIGHMEM.  This patch sets the bit at the time
-> the node is brought online.
->
-> If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
-> uses this nodemask to setup per-cache kmem_cache_node data structures.
->
-> Signed-off-by: David Rientjes <rientjes@google.com>
-> ---
->  arch/m68k/mm/init_mm.c |    2 ++
->  1 files changed, 2 insertions(+), 0 deletions(-)
->
-> diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
-> --- a/arch/m68k/mm/init_mm.c
-> +++ b/arch/m68k/mm/init_mm.c
-> @@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
->  	}
->  #endif
->  	pg_data_map[node].bdata = bootmem_node_data + node;
-> +	if (node_present_pages(node))
-> +		node_set_state(node, N_NORMAL_MEMORY);
->  	node_set_online(node);
->  }
->
->
-As Andreas pointed out, node_present_pages is set in free_area_init_node
-which only gets called at the very end of m68k mm paging_init.
+For m68k, N_NORMAL_MEMORY represents all nodes that have present memory
+since it does not support HIGHMEM.  This patch sets the bit at the time
+node_present_pages has been set by free_area_init_node.
+At the time the node is brought online, the node state would have to be
+done unconditionally since information about present memory has not yet
+been recorded.
 
-The correct patch would be something like this - the need for the
-conditional is perhaps debatable, seeing as we set the pages present
-just before node_set_state.
+If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
+uses this nodemask to setup per-cache kmem_cache_node data structures.
 
-Tested on my ARAnyM test setup so far. I'd like to wait for an
-independent kernel image built by Thorsten before I test on the actual
-hardware. Sorry but you'll have to restart your build Thorsten :-)
+This pach is an alternative to the one proposed by David Rientjes
+<rientjes@google.com> attempting to set node state immediately when
+bringing the node online.
 
 Signed-off-by: Michael Schmitz <schmitz@debian.org>
-Tested-by: Thorsten Glaser <tg@debian.org>
+Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
+CC: stable@kernel.org
 ---
  arch/m68k/mm/motorola.c |    2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)
@@ -60,5 +38,5 @@
  }
  
 -- 
-1.7.4.4
+1.7.5
 
Index: dists/sid/linux-2.6/debian/patches/series/5
===================================================================
--- dists/sid/linux-2.6/debian/patches/series/5	(revision 0)
+++ dists/sid/linux-2.6/debian/patches/series/5	(revision 0)
@@ -0,0 +1,12 @@
+# will be in 2.6.39 (pulled from linus tree)
++ bugfix/m68k/0001-m68k-Add-helper-function-handle_kernel_fault.patch
++ bugfix/m68k/0002-m68k-Use-base_trap_init-to-initialize-vectors.patch
++ bugfix/m68k/0003-m68k-Allow-all-kernel-traps-to-be-handled-via-except.patch
++ bugfix/m68k/0004-m68k-atari-Initial-ARAnyM-support.patch
++ bugfix/m68k/0005-m68k-atari-ARAnyM-Add-support-for-block-access.patch
++ bugfix/m68k/0006-m68k-atari-ARAnyM-Add-support-for-console-access.patch
++ bugfix/m68k/0007-m68k-atari-ARAnyM-Add-support-for-network-access.patch
+# accepted by m68k subsystem maintainer; probably in 2.6.40
++ bugfix/m68k/0008-m68k-atari-Reserve-some-ST-RAM-early-on-for-device-b.patch
+# will be in 2.6.39
++ bugfix/m68k/0009-m68k-mm-set-all-online-nodes-in-N_NORMAL_MEMORY.patch

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH] convert parisc to sparsemem (was Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards)
  2011-04-23 18:34                                                                   ` James Bottomley
@ 2011-04-27 16:36                                                                     ` Dave Hansen
  -1 siblings, 0 replies; 210+ messages in thread
From: Dave Hansen @ 2011-04-27 16:36 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman, Andy Whitcroft

On Sat, 2011-04-23 at 13:34 -0500, James Bottomley wrote: 
> This is the preliminary conversion.  It's very nasty on parisc because
> the memory allocation isn't symmetric anymore: under DISCONTIGMEM, we
> push all memory into bootmem and then let free_all_bootmem() do the
> magic for us;

Urg, that's unfortunate.  I bet we could fairly easily teach the bootmem
allocator to allow a couple of bootmem_data's to hang off of an
individual pgdat.  Put each pmem_ranges in one of those instead of a
pgdat.  That would at least help with the bitmap size explosion and
extra loops.

> now we have to do separate initialisations for ranges
> because SPARSEMEM can't do multi-range boot memory. It's also got the
> horrible hack that I only use the first found range for bootmem.  I'm
> not sure if this is correct (it won't be if the first found range can be
> under about 50MB because we'll run out of bootmem during boot) ... we
> might have to sort the ranges and use the larges, but that will involve
> us in even more hackery around the bootmem reservations code.
> 
> The boot sequence got a few seconds slower because now all of the loops
> over our pfn ranges actually have to skip through the holes (which takes
> time for 64GB).

Which iterations were these, btw?  All of the ones I saw the patch touch
seemed to be running over just a single pmem_range.

> All in all, I've not been very impressed with SPARSEMEM over
> DISCONTIGMEM.  It seems to have a lot of rough edges (necessitating
> exception code) which DISCONTIGMEM just copes with.

We definitely need to look at extending it to cover bootmem-time a bit.
Is that even worth it these days with the no-bootmem bits around?

-- Dave



^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [PATCH] convert parisc to sparsemem (was Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards)
@ 2011-04-27 16:36                                                                     ` Dave Hansen
  0 siblings, 0 replies; 210+ messages in thread
From: Dave Hansen @ 2011-04-27 16:36 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, KOSAKI Motohiro, David Rientjes, Pekka Enberg,
	Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
	linux-parisc, Ingo Molnar, x86 maintainers, Tejun Heo,
	Mel Gorman, Andy Whitcroft

On Sat, 2011-04-23 at 13:34 -0500, James Bottomley wrote: 
> This is the preliminary conversion.  It's very nasty on parisc because
> the memory allocation isn't symmetric anymore: under DISCONTIGMEM, we
> push all memory into bootmem and then let free_all_bootmem() do the
> magic for us;

Urg, that's unfortunate.  I bet we could fairly easily teach the bootmem
allocator to allow a couple of bootmem_data's to hang off of an
individual pgdat.  Put each pmem_ranges in one of those instead of a
pgdat.  That would at least help with the bitmap size explosion and
extra loops.

> now we have to do separate initialisations for ranges
> because SPARSEMEM can't do multi-range boot memory. It's also got the
> horrible hack that I only use the first found range for bootmem.  I'm
> not sure if this is correct (it won't be if the first found range can be
> under about 50MB because we'll run out of bootmem during boot) ... we
> might have to sort the ranges and use the larges, but that will involve
> us in even more hackery around the bootmem reservations code.
> 
> The boot sequence got a few seconds slower because now all of the loops
> over our pfn ranges actually have to skip through the holes (which takes
> time for 64GB).

Which iterations were these, btw?  All of the ones I saw the patch touch
seemed to be running over just a single pmem_range.

> All in all, I've not been very impressed with SPARSEMEM over
> DISCONTIGMEM.  It seems to have a lot of rough edges (necessitating
> exception code) which DISCONTIGMEM just copes with.

We definitely need to look at extending it to cover bootmem-time a bit.
Is that even worth it these days with the no-bootmem bits around?

-- Dave


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-27 15:11                                                                       ` Thorsten Glaser
@ 2011-04-28  2:35                                                                         ` Ben Hutchings
  0 siblings, 0 replies; 210+ messages in thread
From: Ben Hutchings @ 2011-04-28  2:35 UTC (permalink / raw)
  To: Thorsten Glaser; +Cc: Debian Kernel Team, linux-m68k

[-- Attachment #1: Type: text/plain, Size: 621 bytes --]

On Wed, 2011-04-27 at 15:11 +0000, Thorsten Glaser wrote:
> Geert Uytterhoeven dixit:
> 
> >Fixed those up, applied, and will send to Linus for 2.6.39-final.
> 
> Attached patch
> • synchronises the 0009-* patch against what will be in 2.6.39
>   (no changes in the patch area, only in the patch metadata)

OK.

> • adds the series/5 file which seemingly was forgotten
[...]

Duh, I have it here but didn't check it in.  I'm normally so diligent
about running 'svn st' to check for untracked files...

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 210+ messages in thread

* Re: [patch] m68k, mm: set all online nodes in N_NORMAL_MEMORY
  2011-04-27 13:19                                                                     ` Geert Uytterhoeven
  2011-04-27 15:09                                                                       ` Thorsten Glaser
  2011-04-27 15:11                                                                       ` Thorsten Glaser
@ 2011-04-28  3:05                                                                       ` Michael Schmitz
  2 siblings, 0 replies; 210+ messages in thread
From: Michael Schmitz @ 2011-04-28  3:05 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Thorsten Glaser, David Rientjes, Andrew Morton, Roman Zippel,
	linux-m68k, linux-kernel, schwab

Hi Geert,

>> No, this has whitespace problems (tabs are expanded to spaces).
>
> Fixed those up, applied, and will send to Linus for 2.6.39-final.

Thanks, I have no idea how that happened.
>
> BTW, if I enable CONFIG_SLUB on my ARAnyM setup, I don't get a crash, but it
> hangs after:
>
> | INIT: version 2.86 booting
>
> With Michael's patch, it continues fine.

It may just need a long time to crash - forcing e2fsck is a good way
to speed up the crash usually.

May depend on how much RAM you give to ARAnyM, on the kernel size and
any number of other parameters.

Cheers,

  Michael

^ permalink raw reply	[flat|nested] 210+ messages in thread

end of thread, other threads:[~2011-04-28  3:05 UTC | newest]

Thread overview: 210+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-15 13:51 [PATCH] mm: make expand_downwards symmetrical to expand_upwards Michal Hocko
2011-04-15 13:51 ` Michal Hocko
2011-04-18  3:00 ` Hugh Dickins
2011-04-18  3:00   ` Hugh Dickins
2011-04-18 10:01   ` [PATCH v2] " Michal Hocko
2011-04-18 10:01     ` Michal Hocko
2011-04-18 20:56     ` Andrew Morton
2011-04-18 20:56       ` Andrew Morton
     [not found]       ` <20110419091022.GA21689@tiehlicka.suse.cz>
2011-04-19 11:09         ` [PATCH followup] mm: get rid of CONFIG_STACK_GROWSUP || CONFIG_IA64 Michal Hocko
2011-04-19 11:09           ` Michal Hocko
2011-04-20  0:33           ` KOSAKI Motohiro
2011-04-20  0:33             ` KOSAKI Motohiro
2011-04-20  6:59             ` Michal Hocko
2011-04-20  6:59               ` Michal Hocko
2011-04-20  7:08               ` KOSAKI Motohiro
2011-04-20  7:08                 ` KOSAKI Motohiro
2011-04-26  7:59           ` Michal Hocko
2011-04-26  7:59             ` Michal Hocko
2011-04-19 11:10       ` [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards Michal Hocko
2011-04-19 11:10         ` Michal Hocko
2011-04-19 11:10         ` Michal Hocko
2011-04-19 15:46         ` James Bottomley
2011-04-19 15:46           ` James Bottomley
2011-04-19 16:06           ` [PATCH v3] mm: make expand_downwards symmetrical to John David Anglin
2011-04-19 16:06             ` John David Anglin
2011-04-19 16:59             ` James Bottomley
2011-04-19 16:59               ` James Bottomley
2011-04-19 16:07           ` [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards James Bottomley
2011-04-19 16:07             ` James Bottomley
2011-04-19 17:05           ` Pekka Enberg
2011-04-19 17:05             ` Pekka Enberg
2011-04-19 17:05             ` Pekka Enberg
2011-04-19 17:11             ` James Bottomley
2011-04-19 17:11               ` James Bottomley
2011-04-19 17:15               ` Christoph Lameter
2011-04-19 17:15                 ` Christoph Lameter
2011-04-19 17:48                 ` James Bottomley
2011-04-19 17:48                   ` James Bottomley
2011-04-19 18:10                   ` Christoph Lameter
2011-04-19 18:10                     ` Christoph Lameter
2011-04-19 18:20                     ` James Bottomley
2011-04-19 18:20                       ` James Bottomley
2011-04-19 18:35                       ` Christoph Lameter
2011-04-19 18:35                         ` Christoph Lameter
2011-04-19 19:49                         ` James Bottomley
2011-04-19 19:49                           ` James Bottomley
2011-04-19 20:56                           ` Christoph Lameter
2011-04-19 20:56                             ` Christoph Lameter
2011-04-19 21:21                             ` James Bottomley
2011-04-19 21:21                               ` James Bottomley
2011-04-19 21:39                               ` Christoph Lameter
2011-04-19 21:39                                 ` Christoph Lameter
2011-04-19 21:48                                 ` James Bottomley
2011-04-19 21:48                                   ` James Bottomley
2011-04-19 21:58                                   ` Christoph Lameter
2011-04-19 21:58                                     ` Christoph Lameter
2011-04-20  1:23                                     ` KOSAKI Motohiro
2011-04-20  1:23                                       ` KOSAKI Motohiro
2011-04-20  2:48                                       ` James Bottomley
2011-04-20  2:48                                         ` James Bottomley
2011-04-20  2:57                                         ` KOSAKI Motohiro
2011-04-20  2:57                                           ` KOSAKI Motohiro
2011-04-20  2:57                                           ` KOSAKI Motohiro
2011-04-20 13:50                                           ` Christoph Lameter
2011-04-20 13:50                                             ` Christoph Lameter
2011-04-21 13:32                                             ` Tejun Heo
2011-04-21 13:32                                               ` Tejun Heo
2011-04-20  5:53                                       ` Pekka Enberg
2011-04-20  5:53                                         ` Pekka Enberg
2011-04-20  7:15                                         ` KOSAKI Motohiro
2011-04-20  7:15                                           ` KOSAKI Motohiro
2011-04-20  7:34                                           ` Pekka Enberg
2011-04-20  7:34                                             ` Pekka Enberg
2011-04-20  7:34                                             ` Pekka Enberg
2011-04-20  8:40                                             ` KOSAKI Motohiro
2011-04-20  8:40                                               ` KOSAKI Motohiro
2011-04-20  8:40                                               ` KOSAKI Motohiro
2011-04-20 16:32                                               ` James Bottomley
2011-04-20 16:32                                                 ` James Bottomley
2011-04-20 16:32                                                 ` James Bottomley
2011-04-20 16:50                                                 ` Christoph Lameter
2011-04-20 16:50                                                   ` Christoph Lameter
2011-04-20 18:09                                                   ` James Bottomley
2011-04-20 18:09                                                     ` James Bottomley
2011-04-20 21:18                                                 ` David Rientjes
2011-04-20 21:18                                                   ` David Rientjes
2011-04-20 22:15                                                   ` James Bottomley
2011-04-20 22:15                                                     ` James Bottomley
2011-04-20 23:12                                                     ` David Rientjes
2011-04-20 23:12                                                       ` David Rientjes
2011-04-21 13:16                                                       ` KOSAKI Motohiro
2011-04-21 13:16                                                         ` KOSAKI Motohiro
2011-04-21 16:37                                                         ` James Bottomley
2011-04-21 16:37                                                           ` James Bottomley
2011-04-21 18:33                                                           ` Christoph Lameter
2011-04-21 18:33                                                             ` Christoph Lameter
2011-04-21 18:45                                                             ` Dave Hansen
2011-04-21 18:45                                                               ` Dave Hansen
2011-04-22 18:19                                                               ` James Bottomley
2011-04-22 18:19                                                                 ` James Bottomley
2011-04-22 20:24                                                                 ` Dave Hansen
2011-04-22 20:24                                                                   ` Dave Hansen
2011-04-22 20:35                                                                   ` James Bottomley
2011-04-22 20:35                                                                     ` James Bottomley
2011-04-22 21:33                                                               ` James Bottomley
2011-04-22 21:33                                                                 ` James Bottomley
2011-04-23 18:34                                                                 ` [PATCH] convert parisc to sparsemem (was Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards) James Bottomley
2011-04-23 18:34                                                                   ` James Bottomley
2011-04-24 16:27                                                                   ` John David Anglin
2011-04-24 16:27                                                                     ` John David Anglin
2011-04-26  0:32                                                                   ` KOSAKI Motohiro
2011-04-26  0:32                                                                     ` KOSAKI Motohiro
2011-04-27 16:36                                                                   ` Dave Hansen
2011-04-27 16:36                                                                     ` Dave Hansen
2011-04-21 20:05                                                             ` [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards James Bottomley
2011-04-21 20:05                                                               ` James Bottomley
2011-04-21 20:05                                                               ` James Bottomley
2011-04-21 21:07                                                               ` Christoph Lameter
2011-04-21 21:07                                                                 ` Christoph Lameter
2011-04-21 21:22                                                                 ` James Bottomley
2011-04-21 21:22                                                                   ` James Bottomley
2011-04-21 19:33                                                         ` David Rientjes
2011-04-21 19:33                                                           ` David Rientjes
2011-04-22  0:34                                                           ` KOSAKI Motohiro
2011-04-22  0:34                                                             ` KOSAKI Motohiro
2011-04-21  2:27                                                   ` [patch] parisc: set memory ranges in N_NORMAL_MEMORY when onlined David Rientjes
2011-04-21 13:03                                                   ` [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards KOSAKI Motohiro
2011-04-21 13:03                                                     ` KOSAKI Motohiro
2011-04-21 19:38                                                     ` David Rientjes
2011-04-21 19:38                                                       ` David Rientjes
2011-04-21 20:02                                                       ` Christoph Lameter
2011-04-21 20:02                                                         ` Christoph Lameter
2011-04-21 21:19                                                         ` David Rientjes
2011-04-21 21:19                                                           ` David Rientjes
2011-04-21 21:24                                                           ` James Bottomley
2011-04-21 21:24                                                             ` James Bottomley
2011-04-21 21:34                                                             ` David Rientjes
2011-04-21 21:34                                                               ` David Rientjes
2011-04-21 21:49                                                               ` James Bottomley
2011-04-21 21:49                                                                 ` James Bottomley
2011-04-21 22:12                                                                 ` David Rientjes
2011-04-21 22:12                                                                   ` David Rientjes
2011-04-22  8:02                                                                   ` Pekka Enberg
2011-04-22  8:02                                                                     ` Pekka Enberg
2011-04-22  8:02                                                                     ` Pekka Enberg
2011-04-22 13:49                                                                     ` James Bottomley
2011-04-22 13:49                                                                       ` James Bottomley
2011-04-22 17:00                                                                       ` Pekka Enberg
2011-04-22 17:00                                                                         ` Pekka Enberg
2011-04-22 17:03                                                                         ` James Bottomley
2011-04-22 17:03                                                                           ` James Bottomley
2011-04-21 21:31                                                           ` [patch] parisc: enable CONFIG_NUMA for DISCONTIGMEM and fix build errors David Rientjes
2011-04-21 21:41                                                           ` [patch] mm: always set nodes with regular memory in N_NORMAL_MEMORY David Rientjes
2011-04-21 21:41                                                             ` David Rientjes
2011-04-22  0:36                                                             ` KOSAKI Motohiro
2011-04-22  0:36                                                               ` KOSAKI Motohiro
2011-04-21 22:00                                                           ` [patch] alpha, mm: set all online nodes " David Rientjes
2011-04-21 22:01                                                           ` [patch] m32r, " David Rientjes
2011-04-21 22:02                                                           ` [patch] m68k, " David Rientjes
2011-04-24  1:59                                                             ` Michael Schmitz
2011-04-24  3:14                                                               ` KOSAKI Motohiro
2011-04-24  4:01                                                                 ` Michael Schmitz
2011-04-24  5:05                                                                   ` Michael Schmitz
2011-04-24  6:04                                                                     ` KOSAKI Motohiro
2011-04-24 11:32                                                                 ` Thorsten Glaser
2011-04-25  2:41                                                                   ` Michael Schmitz
2011-04-25 14:22                                                                     ` Thorsten Glaser
2011-04-24 12:18                                                               ` Thorsten Glaser
2011-04-25 20:56                                                               ` David Rientjes
2011-04-26  2:51                                                                 ` Michael Schmitz
2011-04-26 10:50                                                                   ` Thorsten Glaser
2011-04-27 13:19                                                                     ` Geert Uytterhoeven
2011-04-27 15:09                                                                       ` Thorsten Glaser
2011-04-27 15:11                                                                       ` Thorsten Glaser
2011-04-28  2:35                                                                         ` Ben Hutchings
2011-04-28  3:05                                                                       ` Michael Schmitz
2011-04-20 21:05                                               ` [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards David Rientjes
2011-04-20 21:05                                                 ` David Rientjes
2011-04-20 11:20                                             ` Matthew Wilcox
2011-04-20 11:20                                               ` Matthew Wilcox
2011-04-20 11:28                                               ` Pekka Enberg
2011-04-20 11:28                                                 ` Pekka Enberg
2011-04-20 11:28                                                 ` Pekka Enberg
2011-04-20 14:15                                                 ` James Bottomley
2011-04-20 14:15                                                   ` James Bottomley
2011-04-20 14:50                                                   ` Christoph Lameter
2011-04-20 14:50                                                     ` Christoph Lameter
2011-04-20 15:02                                                     ` James Bottomley
2011-04-20 15:02                                                       ` James Bottomley
2011-04-20 15:22                                                       ` Christoph Lameter
2011-04-20 15:22                                                         ` Christoph Lameter
2011-04-20 19:25                                                         ` Matthew Wilcox
2011-04-20 19:25                                                           ` Matthew Wilcox
2011-04-20 21:42                                                         ` David Rientjes
2011-04-20 21:42                                                           ` David Rientjes
2011-04-21 16:06                                                           ` James Bottomley
2011-04-21 16:06                                                             ` James Bottomley
2011-04-21 22:19                                                             ` David Rientjes
2011-04-21 22:19                                                               ` David Rientjes
2011-04-21 22:31                                                               ` James Bottomley
2011-04-21 22:31                                                                 ` James Bottomley
2011-04-20 13:58                                               ` Christoph Lameter
2011-04-20 13:58                                                 ` Christoph Lameter
2011-04-20 21:34                                               ` David Rientjes
2011-04-20 21:34                                                 ` David Rientjes
2011-04-20 14:07                                             ` Christoph Lameter
2011-04-20 14:07                                               ` Christoph Lameter
2011-04-20  2:33                                     ` James Bottomley
2011-04-20  2:33                                       ` James Bottomley
2011-04-19 17:12             ` Christoph Lameter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.