linux-kernel.vger.kernel.org archive mirror
* [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
@ 2014-01-23  5:49 Tang Chen
  2014-01-23  6:01 ` Dave Jones
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Tang Chen @ 2014-01-23  5:49 UTC (permalink / raw)
  To: davej, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst; +Cc: x86, linux-kernel

Dave found that the kernel sometimes hangs during boot. This is because
the nodemask_t stack variable numa_kernel_nodes is large enough
to overflow the stack.

It does not always happen: according to Dave, it occurred about once
in five boots. The backtrace looks like this:

dump_stack
panic
? numa_clear_kernel_node_hotplug
__stack_chk_fail
numa_clear_kernel_node_hotplug
? memblock_search_pfn_nid
? __early_pfn_to_nid
numa_init
x86_numa_init
initmem_init
setup_arch
start_kernel

This patch fixes the problem by defining numa_kernel_nodes as a
static global variable in the __initdata area.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 81b2750..ebefeb7 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -562,10 +562,10 @@ static void __init numa_init_array(void)
 	}
 }
 
+static nodemask_t numa_kernel_nodes __initdata;
 static void __init numa_clear_kernel_node_hotplug(void)
 {
 	int i, nid;
-	nodemask_t numa_kernel_nodes;
 	unsigned long start, end;
 	struct memblock_type *type = &memblock.reserved;
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23  5:49 [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable Tang Chen
@ 2014-01-23  6:01 ` Dave Jones
  2014-01-23  6:05 ` Andrew Morton
  2014-01-23  6:06 ` David Rientjes
  2 siblings, 0 replies; 23+ messages in thread
From: Dave Jones @ 2014-01-23  6:01 UTC (permalink / raw)
  To: Tang Chen
  Cc: tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst, x86, linux-kernel

On Thu, Jan 23, 2014 at 01:49:28PM +0800, Tang Chen wrote:
 
 > This doesn't always happen. According to Dave, this happened once
 > in about five boots. The backtrace is like the following:
 > 
 > dump_stack
 > panic
 > ? numa_clear_kernel_node_hotplug
 > __stack_chk_fail
 > numa_clear_kernel_node_hotplug
 > ? memblock_search_pfn_nid
 > ? __early_pfn_to_nid
 > numa_init
 > x86_numa_init
 > initmem_init
 > setup_arch
 > start_kernel
 > 
 > This patch fix this problem by defining numa_kernel_nodes as a
 > static global variable in __initdata area.
 > 
 > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
 > index 81b2750..ebefeb7 100644
 > --- a/arch/x86/mm/numa.c
 > +++ b/arch/x86/mm/numa.c
 > @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
 >  	}
 >  }
 >  
 > +static nodemask_t numa_kernel_nodes __initdata;
 >  static void __init numa_clear_kernel_node_hotplug(void)
 >  {
 >  	int i, nid;
 > -	nodemask_t numa_kernel_nodes;
 >  	unsigned long start, end;
 >  	struct memblock_type *type = &memblock.reserved;

I'm surprised that this worked for anyone.
By my math, nodemask_t is 1024 longs, which should fill the whole stack.

Any idea why it only broke sometimes ?

There are other on-stack nodemask_t's in the tree too, why are they safe ?

	Dave



* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23  5:49 [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable Tang Chen
  2014-01-23  6:01 ` Dave Jones
@ 2014-01-23  6:05 ` Andrew Morton
  2014-01-23  6:06 ` David Rientjes
  2 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2014-01-23  6:05 UTC (permalink / raw)
  To: Tang Chen
  Cc: davej, tglx, mingo, hpa, zhangyanfei, guz.fnst, x86, linux-kernel

On Thu, 23 Jan 2014 13:49:28 +0800 Tang Chen <tangchen@cn.fujitsu.com> wrote:

> Dave found that the kernel will hang during boot. This is because
> the nodemask_t type stack variable numa_kernel_nodes is large enough
> to overflow the stack.
> 
> This doesn't always happen. According to Dave, this happened once
> in about five boots. The backtrace is like the following:
> 
> dump_stack
> panic
> ? numa_clear_kernel_node_hotplug
> __stack_chk_fail
> numa_clear_kernel_node_hotplug
> ? memblock_search_pfn_nid
> ? __early_pfn_to_nid
> numa_init
> x86_numa_init
> initmem_init
> setup_arch
> start_kernel
> 
> This patch fix this problem by defining numa_kernel_nodes as a
> static global variable in __initdata area.
> 
> ...
>
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
>  	}
>  }
>  
> +static nodemask_t numa_kernel_nodes __initdata;
>  static void __init numa_clear_kernel_node_hotplug(void)
>  {
>  	int i, nid;
> -	nodemask_t numa_kernel_nodes;
>  	unsigned long start, end;
>  	struct memblock_type *type = &memblock.reserved;

Seems odd.  The maximum size of a nodemask_t is 128 bytes, isn't it? 
If so, what the heck have we done in there to consume so much stack?



* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23  5:49 [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable Tang Chen
  2014-01-23  6:01 ` Dave Jones
  2014-01-23  6:05 ` Andrew Morton
@ 2014-01-23  6:06 ` David Rientjes
  2014-01-23  6:13   ` Dave Jones
  2014-01-28  0:32   ` David Rientjes
  2 siblings, 2 replies; 23+ messages in thread
From: David Rientjes @ 2014-01-23  6:06 UTC (permalink / raw)
  To: Tang Chen
  Cc: davej, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst, x86, linux-kernel

On Thu, 23 Jan 2014, Tang Chen wrote:

> Dave found that the kernel will hang during boot. This is because
> the nodemask_t type stack variable numa_kernel_nodes is large enough
> to overflow the stack.
> 
> This doesn't always happen. According to Dave, this happened once
> in about five boots. The backtrace is like the following:
> 
> dump_stack
> panic
> ? numa_clear_kernel_node_hotplug
> __stack_chk_fail
> numa_clear_kernel_node_hotplug
> ? memblock_search_pfn_nid
> ? __early_pfn_to_nid
> numa_init
> x86_numa_init
> initmem_init
> setup_arch
> start_kernel
> 
> This patch fix this problem by defining numa_kernel_nodes as a
> static global variable in __initdata area.
> 
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>

I guess it depends on what Dave's CONFIG_NODES_SHIFT is?

> ---
>  arch/x86/mm/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 81b2750..ebefeb7 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
>  	}
>  }
>  
> +static nodemask_t numa_kernel_nodes __initdata;
>  static void __init numa_clear_kernel_node_hotplug(void)
>  {
>  	int i, nid;
> -	nodemask_t numa_kernel_nodes;
>  	unsigned long start, end;
>  	struct memblock_type *type = &memblock.reserved;
>  

Isn't this also a bugfix since you never initialize numa_kernel_nodes when 
it's allocated on the stack with NODE_MASK_NONE?
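
The initialization point can be illustrated with a minimal userspace sketch. It is not the kernel code: mask_sketch_t, SKETCH_MASK_LONGS, and both helpers are hypothetical stand-ins, and memset() plays the role that NODE_MASK_NONE or nodes_clear() would play in the kernel. The only thing it relies on is the C rule that objects with static storage duration start out zeroed, while automatic (stack) objects hold indeterminate contents until written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size: 1024 bits expressed as 16 longs on a 64-bit build. */
#define SKETCH_MASK_LONGS 16

/* Stand-in for a node bitmap; not the kernel's nodemask_t. */
typedef struct {
	unsigned long bits[SKETCH_MASK_LONGS];
} mask_sketch_t;

/* Static storage duration: the C standard guarantees zero initialization. */
static mask_sketch_t static_mask;

/* Returns true when no bit is set in the mask. */
static bool mask_is_empty(const mask_sketch_t *m)
{
	for (size_t i = 0; i < SKETCH_MASK_LONGS; i++)
		if (m->bits[i] != 0)
			return false;
	return true;
}

/*
 * Automatic storage: the contents are indeterminate until cleared,
 * so the mask is zeroed explicitly before any bit is tested or set.
 */
static bool local_mask_empty_after_clear(void)
{
	mask_sketch_t local_mask;

	memset(&local_mask, 0, sizeof(local_mask));
	return mask_is_empty(&local_mask);
}
```

Calling mask_is_empty(&static_mask) before anything writes to static_mask returns true, whereas reading local_mask without the memset() would be reading indeterminate values. That is consistent with the suggestion that moving the variable off the stack also hides a missing initialization.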


* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23  6:06 ` David Rientjes
@ 2014-01-23  6:13   ` Dave Jones
  2014-01-23  6:15     ` David Rientjes
  2014-01-23  6:36     ` Tang Chen
  2014-01-28  0:32   ` David Rientjes
  1 sibling, 2 replies; 23+ messages in thread
From: Dave Jones @ 2014-01-23  6:13 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tang Chen, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst, x86,
	linux-kernel

On Wed, Jan 22, 2014 at 10:06:14PM -0800, David Rientjes wrote:
 > On Thu, 23 Jan 2014, Tang Chen wrote:
 > 
 > > Dave found that the kernel will hang during boot. This is because
 > > the nodemask_t type stack variable numa_kernel_nodes is large enough
 > > to overflow the stack.
 > > 
 > > This doesn't always happen. According to Dave, this happened once
 > > in about five boots. The backtrace is like the following:
 > > 
 > > dump_stack
 > > panic
 > > ? numa_clear_kernel_node_hotplug
 > > __stack_chk_fail
 > > numa_clear_kernel_node_hotplug
 > > ? memblock_search_pfn_nid
 > > ? __early_pfn_to_nid
 > > numa_init
 > > x86_numa_init
 > > initmem_init
 > > setup_arch
 > > start_kernel
 > > 
 > > This patch fix this problem by defining numa_kernel_nodes as a
 > > static global variable in __initdata area.
 > > 
 > > Reported-by: Dave Jones <davej@redhat.com>
 > > Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
 > > Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
 > 
 > I guess it depends on what Dave's CONFIG_NODES_SHIFT is?

It's 10, because I had MAXSMP set.

So, MAX_NUMNODES = 1 << 10

And the bitmask is made of longs. 1024 of them.

How does this work ?

	Dave



* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23  6:13   ` Dave Jones
@ 2014-01-23  6:15     ` David Rientjes
  2014-01-23  6:58       ` Dave Jones
  2014-01-23  6:36     ` Tang Chen
  1 sibling, 1 reply; 23+ messages in thread
From: David Rientjes @ 2014-01-23  6:15 UTC (permalink / raw)
  To: Dave Jones, Tang Chen, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel

On Thu, 23 Jan 2014, Dave Jones wrote:

> It's 10, because I had MAXSMP set.
> 
> So, MAX_NUMNODES = 1 << 10
> 
> And the bitmask is made of longs. 1024 of them.
> 
> How does this work ?
> 

It's 1024 bits.
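
The arithmetic behind this correction can be checked with a small userspace sketch. It assumes CONFIG_NODES_SHIFT=10 as reported above and uses a hypothetical stand-in type, nodemask_sketch_t, laid out as a bitmap of MAX_NUMNODES bits packed into unsigned longs. The macros are local definitions for this sketch and the helper names are illustrative, not kernel APIs.

```c
#include <limits.h>
#include <stddef.h>

/* Assumed value from the thread: CONFIG_NODES_SHIFT=10 with MAXSMP set. */
#define NODES_SHIFT	10
#define MAX_NUMNODES	(1 << NODES_SHIFT)

/* Local definitions for this sketch only. */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Stand-in bitmap: one bit per possible node, packed into longs. */
typedef struct {
	unsigned long bits[BITS_TO_LONGS(MAX_NUMNODES)];
} nodemask_sketch_t;

/* Size of the whole mask in bytes. */
static size_t nodemask_bytes(void)
{
	return sizeof(nodemask_sketch_t);
}

/* Number of longs needed to hold MAX_NUMNODES bits. */
static size_t nodemask_longs(void)
{
	return BITS_TO_LONGS(MAX_NUMNODES);
}
```

Under these assumptions nodemask_bytes() is 128 on both 32-bit and 64-bit builds (16 longs on a 64-bit build), which matches the 128-byte figure mentioned earlier in the thread and is far smaller than 1024 longs.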


* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23  6:13   ` Dave Jones
  2014-01-23  6:15     ` David Rientjes
@ 2014-01-23  6:36     ` Tang Chen
  1 sibling, 0 replies; 23+ messages in thread
From: Tang Chen @ 2014-01-23  6:36 UTC (permalink / raw)
  To: Dave Jones, David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel

On 01/23/2014 02:13 PM, Dave Jones wrote:
> On Wed, Jan 22, 2014 at 10:06:14PM -0800, David Rientjes wrote:
>   >  On Thu, 23 Jan 2014, Tang Chen wrote:
>   >
......
>   >
>   >  I guess it depends on what Dave's CONFIG_NODES_SHIFT is?
>
> It's 10, because I had MAXSMP set.
>
> So, MAX_NUMNODES = 1 << 10
>
> And the bitmask is made of longs. 1024 of them.
>
> How does this work ?

I have the same config as you.

Would you please try it for me ?  Does it work on your box ?

I cannot reproduce this problem on the latest kernel.
But I can reproduce it on 3.10.

Thanks

>
> 	Dave
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23  6:15     ` David Rientjes
@ 2014-01-23  6:58       ` Dave Jones
  2014-01-23 22:31         ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-01-23  6:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tang Chen, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst, x86,
	linux-kernel

On Wed, Jan 22, 2014 at 10:15:51PM -0800, David Rientjes wrote:
 > On Thu, 23 Jan 2014, Dave Jones wrote:
 > 
 > > It's 10, because I had MAXSMP set.
 > > 
 > > So, MAX_NUMNODES = 1 << 10
 > > 
 > > And the bitmask is made of longs. 1024 of them.
 > > 
 > > How does this work ?
 > > 
 > 
 > It's 1024 bits.

ok, I got lost in the maze of macros.

128 bytes is a pretty small amount of stack though, so I'm just as confused
as to what the actual bug here is.


After trying the proposed fix, I got another oops in the early init code..

<trace>
nr_free_zone_pages
nr_free_pagecache_pages
build_all_zonelists
start_kernel
<rip> ffffffffbc164b1e next_zones_zonelist
<rsp> ffffffffbcc01f00

I'll poke at it more in the morning. Too sleepy.

	Dave



* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23  6:58       ` Dave Jones
@ 2014-01-23 22:31         ` Dave Jones
  2014-01-27  7:29           ` Tang Chen
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-01-23 22:31 UTC (permalink / raw)
  To: David Rientjes, Tang Chen, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel

On Thu, Jan 23, 2014 at 01:58:24AM -0500, Dave Jones wrote:

 > 128 bytes is a pretty small amount of stack though, so I'm just as confused
 > as to what the actual bug here is.
 > 
 > After trying the proposed fix, I got another oops in the early init code..
 > 
 > <trace>
 > nr_free_zone_pages
 > nr_free_pagecache_pages
 > build_all_zonelists
 > start_kernel
 > <rip> ffffffffbc164b1e next_zones_zonelist
 > <rsp> ffffffffbcc01f00

Ok, this is crashing here in next_zones_zonelist...

                while (zonelist_zone_idx(z) > highest_zoneidx)
  de:   3b 77 08                cmp    0x8(%rdi),%esi


I stuck this at the top of the function..

printk(KERN_ERR "z:%p nodes:%p highest:%d\n", z, nodes, highest_zoneidx);

and got

z: 1d08   nodes: (null)  highest:3


Some build tests show..

MAXSMP ( NODESHIFT=10 ) : Bug
NRCPUS=4 & NODESHIFT=10 : Bug
NRCPUS=4 & NODESHIFT=1 : no bug


The middle config test was accidental, I hadn't realised disabling MAXSMP
wouldn't reset NODESHIFT to something sane.

I'll start bisecting, as MAXSMP worked fine until a few days ago.

	Dave



* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23 22:31         ` Dave Jones
@ 2014-01-27  7:29           ` Tang Chen
  2014-01-27 14:52             ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Tang Chen @ 2014-01-27  7:29 UTC (permalink / raw)
  To: Dave Jones, David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel




On 01/24/2014 06:31 AM, Dave Jones wrote:
> On Thu, Jan 23, 2014 at 01:58:24AM -0500, Dave Jones wrote:
>
>   >  128 bytes is a pretty small amount of stack though, so I'm just as confused
>   >  as to what the actual bug here is.
>   >
>   >  After trying the proposed fix, I got another oops in the early init code..
>   >
>   >  <trace>
>   >  nr_free_zone_pages
>   >  nr_free_pagecache_pages
>   >  build_all_zonelists
>   >  start_kernel
>   >  <rip>  ffffffffbc164b1e next_zones_zonelist
>   >  <rsp>  ffffffffbcc01f00
>
> Ok, this is crashing here in next_zones_zonelist...
>
>                  while (zonelist_zone_idx(z) > highest_zoneidx)
>    de:   3b 77 08                cmp    0x8(%rdi),%esi
>
>
> I stuck this at the top of the function..
>
> printk(KERN_ERR "z:%p nodes:%p highest:%d\n", z, nodes, highest_zoneidx);
>
> and got
>
> z: 1d08   nodes: (null)  highest:3
>
>
> Some build tests show..
>
> MAXSMP ( NODESHIFT=10 ) : Bug
> NRCPUS=4 & NODESHIFT=10 : Bug
> NRCPUS=4 & NODESHIFT=1 : no bug
>
>
> The middle config test was accidental, I hadn't realised disabling MAXSMP
> wouldn't reset NODESHIFT to something sane.
>
> I'll start bisecting, as MAXSMP worked fine until a few days ago.

Hi Dave,

I didn't reproduce this bug. Would you please share the bisect result ?

Thanks.

>
> 	Dave
>
>


* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-27  7:29           ` Tang Chen
@ 2014-01-27 14:52             ` Dave Jones
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Jones @ 2014-01-27 14:52 UTC (permalink / raw)
  To: Tang Chen
  Cc: David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst,
	x86, linux-kernel

On Mon, Jan 27, 2014 at 03:29:03PM +0800, Tang Chen wrote:

 > > Some build tests show..
 > >
 > > MAXSMP ( NODESHIFT=10 ) : Bug
 > > NRCPUS=4 & NODESHIFT=10 : Bug
 > > NRCPUS=4 & NODESHIFT=1 : no bug
 > >
 > >
 > > The middle config test was accidental, I hadn't realised disabling MAXSMP
 > > wouldn't reset NODESHIFT to something sane.
 > >
 > > I'll start bisecting, as MAXSMP worked fine until a few days ago.
 > 
 > Hi Dave,
 > 
 > I didn't reproduce this bug. Would you please share the bisect result ?

The bisect pointed at something completely unrelated, and then when I tried 
again on git HEAD, I couldn't reproduce it.. 
If I manage to get it happening again I'll look into it more, but for now
it seems to have gone into hiding.

	Dave


* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-23  6:06 ` David Rientjes
  2014-01-23  6:13   ` Dave Jones
@ 2014-01-28  0:32   ` David Rientjes
  2014-01-28  1:01     ` Tang Chen
  1 sibling, 1 reply; 23+ messages in thread
From: David Rientjes @ 2014-01-28  0:32 UTC (permalink / raw)
  To: Tang Chen
  Cc: davej, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst, x86, linux-kernel

On Wed, 22 Jan 2014, David Rientjes wrote:

> >  arch/x86/mm/numa.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > index 81b2750..ebefeb7 100644
> > --- a/arch/x86/mm/numa.c
> > +++ b/arch/x86/mm/numa.c
> > @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
> >  	}
> >  }
> >  
> > +static nodemask_t numa_kernel_nodes __initdata;
> >  static void __init numa_clear_kernel_node_hotplug(void)
> >  {
> >  	int i, nid;
> > -	nodemask_t numa_kernel_nodes;
> >  	unsigned long start, end;
> >  	struct memblock_type *type = &memblock.reserved;
> >  
> 
> Isn't this also a bugfix since you never initialize numa_kernel_nodes when 
> it's allocated on the stack with NODE_MASK_NONE?
> 

This hasn't been answered, and the patch still isn't in linux-kernel even
though Dave tested it as good.  I'm suspicious of the changelog, which claims
that this nodemask causes a stack overflow that only manages to reproduce
in the init path slightly more than 50% of the time.
How is that possible?

I think the changelog should indicate this also fixes an uninitialized 
nodemask issue.


* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  0:32   ` David Rientjes
@ 2014-01-28  1:01     ` Tang Chen
  2014-01-28  2:55       ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Tang Chen @ 2014-01-28  1:01 UTC (permalink / raw)
  To: David Rientjes
  Cc: davej, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst, x86, linux-kernel

On 01/28/2014 08:32 AM, David Rientjes wrote:
> On Wed, 22 Jan 2014, David Rientjes wrote:
>
>>>   arch/x86/mm/numa.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
>>> index 81b2750..ebefeb7 100644
>>> --- a/arch/x86/mm/numa.c
>>> +++ b/arch/x86/mm/numa.c
>>> @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
>>>   	}
>>>   }
>>>
>>> +static nodemask_t numa_kernel_nodes __initdata;
>>>   static void __init numa_clear_kernel_node_hotplug(void)
>>>   {
>>>   	int i, nid;
>>> -	nodemask_t numa_kernel_nodes;
>>>   	unsigned long start, end;
>>>   	struct memblock_type *type = &memblock.reserved;
>>>
>>
>> Isn't this also a bugfix since you never initialize numa_kernel_nodes when
>> it's allocated on the stack with NODE_MASK_NONE?
>>
>
> This hasn't been answered and the patch still isn't in linux-kernel yet
> Dave tested it as good.  I'm suspicious of the changelog that indicates
> this nodemask is the result of a stack overflow itself which only manages
> to reproduce itself in the init patch slightly more than 50% of the time.
> How is that possible?
>
> I think the changelog should indicate this also fixes an uninitialized
> nodemask issue.

Hi David,

I'm still working on this problem, but unfortunately there is nothing new
for now. Testing so far shows no further problems here.

I'm digging into it, but need more time.

I'll resend a new patch and modify the changelog soon. Before we find the
root cause, I think we can use this patch as a temporary solution.

Thanks.




* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  1:01     ` Tang Chen
@ 2014-01-28  2:55       ` Dave Jones
  2014-01-28  3:14         ` Tang Chen
  2014-01-28  3:24         ` Tang Chen
  0 siblings, 2 replies; 23+ messages in thread
From: Dave Jones @ 2014-01-28  2:55 UTC (permalink / raw)
  To: Tang Chen
  Cc: David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst,
	x86, linux-kernel

On Tue, Jan 28, 2014 at 09:01:25AM +0800, Tang Chen wrote:
 > On 01/28/2014 08:32 AM, David Rientjes wrote:
 > > On Wed, 22 Jan 2014, David Rientjes wrote:
 > >
 > >>>   arch/x86/mm/numa.c | 2 +-
 > >>>   1 file changed, 1 insertion(+), 1 deletion(-)
 > >>>
 > >>> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
 > >>> index 81b2750..ebefeb7 100644
 > >>> --- a/arch/x86/mm/numa.c
 > >>> +++ b/arch/x86/mm/numa.c
 > >>> @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
 > >>>   	}
 > >>>   }
 > >>>
 > >>> +static nodemask_t numa_kernel_nodes __initdata;
 > >>>   static void __init numa_clear_kernel_node_hotplug(void)
 > >>>   {
 > >>>   	int i, nid;
 > >>> -	nodemask_t numa_kernel_nodes;
 > >>>   	unsigned long start, end;
 > >>>   	struct memblock_type *type = &memblock.reserved;
 > >>>
 > >>
 > >> Isn't this also a bugfix since you never initialize numa_kernel_nodes when
 > >> it's allocated on the stack with NODE_MASK_NONE?
 > >>
 > >
 > > This hasn't been answered and the patch still isn't in linux-kernel yet
 > > Dave tested it as good.  I'm suspicious of the changelog that indicates
 > > this nodemask is the result of a stack overflow itself which only manages
 > > to reproduce itself in the init patch slightly more than 50% of the time.
 > > How is that possible?
 > >
 > > I think the changelog should indicate this also fixes an uninitialized
 > > nodemask issue.
 > 
 > Hi David,
 > 
 > I'm still working on this problem, but unfortunately nothing new for now.
 > And the test till now shows no more problem here.
 > 
 > I'm digging into it, but need more time.
 > 
 > I'll resend a new patch and modify the changelog soon. Before we find the
 > root cause, I think we can use this patch as a temporary solution.

Ok, I hit the 2nd bug again (oops in next_zones_zonelist...)

I did a bisect with the patch above applied each step of the way.
This time I got a plausible looking result....


a0acda917284183f9b71e2d08b0aa0aea722b321 is the first bad commit
commit a0acda917284183f9b71e2d08b0aa0aea722b321
Author: Tang Chen <tangchen@cn.fujitsu.com>
Date:   Tue Jan 21 15:49:32 2014 -0800

    acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable
    

Reverting this commit of course removes the whole function from above,
so we haven't really learned anything new, other than that commit is broken,
even after the above fix-up.

	Dave



* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  2:55       ` Dave Jones
@ 2014-01-28  3:14         ` Tang Chen
  2014-01-28  3:24         ` Tang Chen
  1 sibling, 0 replies; 23+ messages in thread
From: Tang Chen @ 2014-01-28  3:14 UTC (permalink / raw)
  To: Dave Jones, David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel

On 01/28/2014 10:55 AM, Dave Jones wrote:
> On Tue, Jan 28, 2014 at 09:01:25AM +0800, Tang Chen wrote:
>   >  On 01/28/2014 08:32 AM, David Rientjes wrote:
>   >  >  On Wed, 22 Jan 2014, David Rientjes wrote:
>   >  >
>   >  >>>    arch/x86/mm/numa.c | 2 +-
>   >  >>>    1 file changed, 1 insertion(+), 1 deletion(-)
>   >  >>>
>   >  >>>  diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
>   >  >>>  index 81b2750..ebefeb7 100644
>   >  >>>  --- a/arch/x86/mm/numa.c
>   >  >>>  +++ b/arch/x86/mm/numa.c
>   >  >>>  @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
>   >  >>>    	}
>   >  >>>    }
>   >  >>>
>   >  >>>  +static nodemask_t numa_kernel_nodes __initdata;
>   >  >>>    static void __init numa_clear_kernel_node_hotplug(void)
>   >  >>>    {
>   >  >>>    	int i, nid;
>   >  >>>  -	nodemask_t numa_kernel_nodes;
>   >  >>>    	unsigned long start, end;
>   >  >>>    	struct memblock_type *type = &memblock.reserved;
>   >  >>>
>   >  >>
>   >  >>  Isn't this also a bugfix since you never initialize numa_kernel_nodes when
>   >  >>  it's allocated on the stack with NODE_MASK_NONE?
>   >  >>
>   >  >
>   >  >  This hasn't been answered and the patch still isn't in linux-kernel yet
>   >  >  Dave tested it as good.  I'm suspicious of the changelog that indicates
>   >  >  this nodemask is the result of a stack overflow itself which only manages
>   >  >  to reproduce itself in the init patch slightly more than 50% of the time.
>   >  >  How is that possible?
>   >  >
>   >  >  I think the changelog should indicate this also fixes an uninitialized
>   >  >  nodemask issue.
>   >
>   >  Hi David,
>   >
>   >  I'm still working on this problem, but unfortunately nothing new for now.
>   >  And the test till now shows no more problem here.
>   >
>   >  I'm digging into it, but need more time.
>   >
>   >  I'll resend a new patch and modify the changelog soon. Before we find the
>   >  root cause, I think we can use this patch as a temporary solution.
>
> Ok, I hit the 2nd bug again (oops in next_zones_zonelist...)
>
> I did a bisect with the patch above applied each step of the way.
> This time I got a plausible looking result....
>
>
> a0acda917284183f9b71e2d08b0aa0aea722b321 is the first bad commit
> commit a0acda917284183f9b71e2d08b0aa0aea722b321
> Author: Tang Chen <tangchen@cn.fujitsu.com>
> Date:   Tue Jan 21 15:49:32 2014 -0800
>
>      acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable
>
>
> Reverting this commit of course removes the whole function from above,
> so we haven't really learned anything new, other than that commit is broken,
> even after the above fix-up.

If we revert this commit, memory hot-remove will not work.
Let's try to fix it before the merge window closes.

>
> 	Dave
>
>


* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  2:55       ` Dave Jones
  2014-01-28  3:14         ` Tang Chen
@ 2014-01-28  3:24         ` Tang Chen
  2014-01-28  3:55           ` Dave Jones
  1 sibling, 1 reply; 23+ messages in thread
From: Tang Chen @ 2014-01-28  3:24 UTC (permalink / raw)
  To: Dave Jones, David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel

On 01/28/2014 10:55 AM, Dave Jones wrote:
> On Tue, Jan 28, 2014 at 09:01:25AM +0800, Tang Chen wrote:
>   >  On 01/28/2014 08:32 AM, David Rientjes wrote:
>   >  >  On Wed, 22 Jan 2014, David Rientjes wrote:
>   >  >
>   >  >>>    arch/x86/mm/numa.c | 2 +-
>   >  >>>    1 file changed, 1 insertion(+), 1 deletion(-)
>   >  >>>
>   >  >>>  diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
>   >  >>>  index 81b2750..ebefeb7 100644
>   >  >>>  --- a/arch/x86/mm/numa.c
>   >  >>>  +++ b/arch/x86/mm/numa.c
>   >  >>>  @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
>   >  >>>    	}
>   >  >>>    }
>   >  >>>
>   >  >>>  +static nodemask_t numa_kernel_nodes __initdata;
>   >  >>>    static void __init numa_clear_kernel_node_hotplug(void)
>   >  >>>    {
>   >  >>>    	int i, nid;
>   >  >>>  -	nodemask_t numa_kernel_nodes;
>   >  >>>    	unsigned long start, end;
>   >  >>>    	struct memblock_type *type = &memblock.reserved;
>   >  >>>
>   >  >>
>   >  >>  Isn't this also a bugfix since you never initialize numa_kernel_nodes when
>   >  >>  it's allocated on the stack with NODE_MASK_NONE?
>   >  >>
>   >  >
>   >  >  This hasn't been answered and the patch still isn't in linux-kernel yet
>   >  >  Dave tested it as good.  I'm suspicious of the changelog that indicates
>   >  >  this nodemask is the result of a stack overflow itself which only manages
>   >  >  to reproduce itself in the init patch slightly more than 50% of the time.
>   >  >  How is that possible?
>   >  >
>   >  >  I think the changelog should indicate this also fixes an uninitialized
>   >  >  nodemask issue.
>   >
>   >  Hi David,
>   >
>   >  I'm still working on this problem, but unfortunately nothing new for now.
>   >  And the test till now shows no more problem here.
>   >
>   >  I'm digging into it, but need more time.
>   >
>   >  I'll resend a new patch and modify the changelog soon. Before we find the
>   >  root cause, I think we can use this patch as a temporary solution.
>
> Ok, I hit the 2nd bug again (oops in next_zones_zonelist...)
>
> I did a bisect with the patch above applied each step of the way.
> This time I got a plausible looking result....

I cannot reproduce this. Would you please share how to reproduce it?
Or does it just happen during boot?

>
>
> a0acda917284183f9b71e2d08b0aa0aea722b321 is the first bad commit
> commit a0acda917284183f9b71e2d08b0aa0aea722b321
> Author: Tang Chen <tangchen@cn.fujitsu.com>
> Date:   Tue Jan 21 15:49:32 2014 -0800
>
>      acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable
>
>
> Reverting this commit of course removes the whole function from above,
> so we haven't really learned anything new, other than that commit is broken,
> even after the above fix-up.
>
> 	Dave
>


* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  3:24         ` Tang Chen
@ 2014-01-28  3:55           ` Dave Jones
  2014-01-28  4:47             ` Tang Chen
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-01-28  3:55 UTC (permalink / raw)
  To: Tang Chen
  Cc: David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst,
	x86, linux-kernel

On Tue, Jan 28, 2014 at 11:24:37AM +0800, Tang Chen wrote:

 > > I did a bisect with the patch above applied each step of the way.
 > > This time I got a plausible looking result....
 > 
 > I cannot reproduce this. Would you please share how to reproduce it ?
 > Or does it just happen during the booting ?

Just during boot. Very early. So early in fact, I have no logging facilities
like usb-serial, just what is on vga console.

If you want me to add some printk's, I can add a while (1); before
the part that oopses so we can diagnose further..

	Dave


* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  3:55           ` Dave Jones
@ 2014-01-28  4:47             ` Tang Chen
  2014-01-28  4:47               ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Tang Chen @ 2014-01-28  4:47 UTC (permalink / raw)
  To: Dave Jones, David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel

On 01/28/2014 11:55 AM, Dave Jones wrote:
> On Tue, Jan 28, 2014 at 11:24:37AM +0800, Tang Chen wrote:
>
>   >  >  I did a bisect with the patch above applied each step of the way.
>   >  >  This time I got a plausible looking result....
>   >
>   >  I cannot reproduce this. Would you please share how to reproduce it ?
>   >  Or does it just happen during the booting ?
>
> Just during boot. Very early. So early in fact, I have no logging facilities
> like usb-serial, just what is on vga console.
>
> If you want me to add some printk's, I can add a while (1); before
> the part that oopses so we can diagnose further..

Sure. Would you please do that for me ? Maybe we can find something in 
the early log.

Thanks.

>
> 	Dave

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  4:47             ` Tang Chen
@ 2014-01-28  4:47               ` Dave Jones
  2014-01-28  5:17                 ` Tang Chen
  2014-01-28  5:31                 ` Tang Chen
  0 siblings, 2 replies; 23+ messages in thread
From: Dave Jones @ 2014-01-28  4:47 UTC (permalink / raw)
  To: Tang Chen
  Cc: David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst,
	x86, linux-kernel

On Tue, Jan 28, 2014 at 12:47:11PM +0800, Tang Chen wrote:
 > On 01/28/2014 11:55 AM, Dave Jones wrote:
 > > On Tue, Jan 28, 2014 at 11:24:37AM +0800, Tang Chen wrote:
 > >
 > >   >  >  I did a bisect with the patch above applied each step of the way.
 > >   >  >  This time I got a plausible looking result....
 > >   >
 > >   >  I cannot reproduce this. Would you please share how to reproduce it ?
 > >   >  Or does it just happen during the booting ?
 > >
 > > Just during boot. Very early. So early in fact, I have no logging facilities
 > > like usb-serial, just what is on vga console.
 > >
 > > If you want me to add some printk's, I can add a while (1); before
 > > the part that oopses so we can diagnose further..
 > 
 > Sure. Would you please do that for me ? Maybe we can find something in 
 > the early log.

I was hoping you'd have suggestions what you'd like me to dump ;-)

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  4:47               ` Dave Jones
@ 2014-01-28  5:17                 ` Tang Chen
  2014-01-28  6:53                   ` Dave Jones
  2014-01-28  5:31                 ` Tang Chen
  1 sibling, 1 reply; 23+ messages in thread
From: Tang Chen @ 2014-01-28  5:17 UTC (permalink / raw)
  To: Dave Jones, David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel

On 01/28/2014 12:47 PM, Dave Jones wrote:
> On Tue, Jan 28, 2014 at 12:47:11PM +0800, Tang Chen wrote:
>   >  On 01/28/2014 11:55 AM, Dave Jones wrote:
>   >  >  On Tue, Jan 28, 2014 at 11:24:37AM +0800, Tang Chen wrote:
>   >  >
>   >  >    >   >   I did a bisect with the patch above applied each step of the way.
>   >  >    >   >   This time I got a plausible looking result....
>   >  >    >
>   >  >    >   I cannot reproduce this. Would you please share how to reproduce it ?
>   >  >    >   Or does it just happen during the booting ?
>   >  >
>   >  >  Just during boot. Very early. So early in fact, I have no logging facilities
>   >  >  like usb-serial, just what is on vga console.
>   >  >
>   >  >  If you want me to add some printk's, I can add a while (1); before
>   >  >  the part that oopses so we can diagnose further..
>   >
>   >  Sure. Would you please do that for me ? Maybe we can find something in
>   >  the early log.
>
> I was hoping you'd have suggestions what you'd like me to dump ;-)

Sorry. I didn't say it clearly. :)

Judging from your earlier mail, it crashed at:

                 while (zonelist_zone_idx(z) > highest_zoneidx)
   de:   3b 77 08                cmp    0x8(%rdi),%esi


	I stuck this at the top of the function..

	printk(KERN_ERR "z:%p nodes:%p highest:%d\n", z, nodes, highest_zoneidx);

	and got

	z: 1d08   nodes: (null)  highest:3


Both nodes=null and highest=3 are correct. Looking at next_zones_zonelist(),
I cannot see why it crashed. So, can you print the zone id in the
for_each_zone_zonelist() loop in nr_free_zone_pages() ?

I want to know why it crashed. A NULL pointer ?  Which one ?

Thanks.

>
> 	Dave
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  4:47               ` Dave Jones
  2014-01-28  5:17                 ` Tang Chen
@ 2014-01-28  5:31                 ` Tang Chen
  2014-01-28  7:10                   ` Tang Chen
  1 sibling, 1 reply; 23+ messages in thread
From: Tang Chen @ 2014-01-28  5:31 UTC (permalink / raw)
  To: Dave Jones, David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel

On 01/28/2014 12:47 PM, Dave Jones wrote:
> On Tue, Jan 28, 2014 at 12:47:11PM +0800, Tang Chen wrote:
>   >  On 01/28/2014 11:55 AM, Dave Jones wrote:
>   >  >  On Tue, Jan 28, 2014 at 11:24:37AM +0800, Tang Chen wrote:
>   >  >
>   >  >    >   >   I did a bisect with the patch above applied each step of the way.
>   >  >    >   >   This time I got a plausible looking result....
>   >  >    >
>   >  >    >   I cannot reproduce this. Would you please share how to reproduce it ?
>   >  >    >   Or does it just happen during the booting ?
>   >  >
>   >  >  Just during boot. Very early. So early in fact, I have no logging facilities
>   >  >  like usb-serial, just what is on vga console.
>   >  >
>   >  >  If you want me to add some printk's, I can add a while (1); before
>   >  >  the part that oopses so we can diagnose further..
>   >
>   >  Sure. Would you please do that for me ? Maybe we can find something in
>   >  the early log.
>
> I was hoping you'd have suggestions what you'd like me to dump ;-)


I think I found something.

Since I can reproduce the first problem on 3.10, I looked there and found that
some memory ranges in memblock have nid = 1024. When we call node_set() with
that nid, it will crash.

I'll see if we have the same problem on the latest kernel.

[    0.000000] NUMA: Initialized distance table, cnt=2
[    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
[    0.000000] NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x100000000-0x47fffffff] -> [mem 0x00000000-0x47fffffff]
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x47fffffff]
[    0.000000]   NODE_DATA [mem 0x47ffd9000-0x47fffffff]
[    0.000000] Initmem setup node 1 [mem 0x480000000-0x87fffffff]
[    0.000000]   NODE_DATA [mem 0x87ffbb000-0x87ffe1fff]
[    0.000000] AAAA: i = 0, nid = 0
[    0.000000] AAAA: i = 1, nid = 0
[    0.000000] AAAA: i = 2, nid = 0
[    0.000000] AAAA: i = 3, nid = 0
[    0.000000] AAAA: i = 4, nid = 1024
[    0.000000] AAAA: i = 5, nid = 1024
[    0.000000] AAAA: i = 6, nid = 1
[    0.000000] AAAA: i = 7, nid = 1
[    0.000000] Reserving 128MB of memory at 704MB for crashkernel (System RAM: 32406MB)
[    0.000000]  [ffffea0000000000-ffffea0011ffffff] PMD -> [ffff880470200000-ffff88047fdfffff] on node 0
[    0.000000]  [ffffea0012000000-ffffea0021ffffff] PMD -> [ffff88086f600000-ffff88087f5fffff] on node 1
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x87fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x00098fff]
[    0.000000]   node   0: [mem 0x00100000-0x696f7fff]
[    0.000000]   node   0: [mem 0x100000000-0x47fffffff]
[    0.000000]   node   1: [mem 0x480000000-0x87fffffff]

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  5:17                 ` Tang Chen
@ 2014-01-28  6:53                   ` Dave Jones
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Jones @ 2014-01-28  6:53 UTC (permalink / raw)
  To: Tang Chen
  Cc: David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei, guz.fnst,
	x86, linux-kernel

On Tue, Jan 28, 2014 at 01:17:21PM +0800, Tang Chen wrote:

 > Seeing from your earlier mail, it crashed at:
 > 
 >                  while (zonelist_zone_idx(z) > highest_zoneidx)
 >    de:   3b 77 08                cmp    0x8(%rdi),%esi
 > 
 > 
 > 	I stuck this at the top of the function..
 > 
 > 	printk(KERN_ERR "z:%p nodes:%p highest:%d\n", z, nodes, highest_zoneidx);
 > 
 > 	and got
 > 
 > 	z: 1d08   nodes: (null)  highest:3
 > 
 > 
 > nodes=null and highest=3, they are correct. When looking into 
 > next_zones_zonelist(),
 > I cannot see why it crashed. So, can you print the zone id in the
 > for_each_zone_zonelist() loop in nr_free_zone_pages() ?
 > I want to know why it crashed. A NULL pointer ?  Which one ?

It's not so easy further in the function, because the oops scrolls off
any useful printks, there's no scrollback, and no logging..
I even tried adding some udelays to slow things down (and using boot_delay)
but that just makes things hang, seemingly indefinitely.

What about that 'z' ptr though ? 0x1d08 seems like a strange address
for us to have a structure at, though I'm not too familiar with the early
boot code, so maybe we do have something down there ?

	Dave 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
  2014-01-28  5:31                 ` Tang Chen
@ 2014-01-28  7:10                   ` Tang Chen
  0 siblings, 0 replies; 23+ messages in thread
From: Tang Chen @ 2014-01-28  7:10 UTC (permalink / raw)
  To: Dave Jones, David Rientjes, tglx, mingo, hpa, akpm, zhangyanfei,
	guz.fnst, x86, linux-kernel


Hi Dave,

I think I have found the overflow problem. It is not a stack overflow,
but an array index overflow.

Please have a look at the following path:

numa_init()
  |---> numa_register_memblks()
  |      |---> memblock_set_node(memory)      set correct nid in memblock.memory
  |      |---> memblock_set_node(reserved)    set correct nid in memblock.reserved
  |      |......
  |      |---> setup_node_data()
  |             |---> memblock_alloc_nid()    here, nid is set to MAX_NUMNODES (1024)
  |......
  |---> numa_clear_kernel_node_hotplug()
         |---> node_set()                     here, we have an index 1024, and overflowed

For now, I think this is the first problem you mentioned.

Will send a new patch to fix it and do more tests.

Thanks.

On 01/28/2014 01:31 PM, Tang Chen wrote:
> On 01/28/2014 12:47 PM, Dave Jones wrote:
>> On Tue, Jan 28, 2014 at 12:47:11PM +0800, Tang Chen wrote:
>> > On 01/28/2014 11:55 AM, Dave Jones wrote:
>> > > On Tue, Jan 28, 2014 at 11:24:37AM +0800, Tang Chen wrote:
>> > >
>> > > > > I did a bisect with the patch above applied each step of the way.
>> > > > > This time I got a plausible looking result....
>> > > >
>> > > > I cannot reproduce this. Would you please share how to reproduce
>> it ?
>> > > > Or does it just happen during the booting ?
>> > >
>> > > Just during boot. Very early. So early in fact, I have no logging
>> facilities
>> > > like usb-serial, just what is on vga console.
>> > >
>> > > If you want me to add some printk's, I can add a while (1); before
>> > > the part that oopses so we can diagnose further..
>> >
>> > Sure. Would you please do that for me ? Maybe we can find something in
>> > the early log.
>>
>> I was hoping you'd have suggestions what you'd like me to dump ;-)
>
>
> I think I found something.
>
> Since I can reproduce the first problem on 3.10, I found some memory
> ranges in memblock
> have nid = 1024. When we use node_set(), it will crash.
>
> I'll see if we have the same problem on the latest kernel.
>
> [ 0.000000] NUMA: Initialized distance table, cnt=2
> [ 0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1
> distance=10
> [ 0.000000] NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem
> 0x100000000-0x47fffffff] -> [mem 0x00000000-0x47fffffff]
> [ 0.000000] Initmem setup node 0 [mem 0x00000000-0x47fffffff]
> [ 0.000000] NODE_DATA [mem 0x47ffd9000-0x47fffffff]
> [ 0.000000] Initmem setup node 1 [mem 0x480000000-0x87fffffff]
> [ 0.000000] NODE_DATA [mem 0x87ffbb000-0x87ffe1fff]
> [ 0.000000] AAAA: i = 0, nid = 0
> [ 0.000000] AAAA: i = 1, nid = 0
> [ 0.000000] AAAA: i = 2, nid = 0
> [ 0.000000] AAAA: i = 3, nid = 0
> [ 0.000000] AAAA: i = 4, nid = 1024
> [ 0.000000] AAAA: i = 5, nid = 1024
> [ 0.000000] AAAA: i = 6, nid = 1
> [ 0.000000] AAAA: i = 7, nid = 1
> [ 0.000000] Reserving 128MB of memory at 704MB for crashkernel (System
> RAM: 32406MB)
> [ 0.000000] [ffffea0000000000-ffffea0011ffffff] PMD ->
> [ffff880470200000-ffff88047fdfffff] on node 0
> [ 0.000000] [ffffea0012000000-ffffea0021ffffff] PMD ->
> [ffff88086f600000-ffff88087f5fffff] on node 1
> [ 0.000000] Zone ranges:
> [ 0.000000] DMA [mem 0x00001000-0x00ffffff]
> [ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
> [ 0.000000] Normal [mem 0x100000000-0x87fffffff]
> [ 0.000000] Movable zone start for each node
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x00001000-0x00098fff]
> [ 0.000000] node 0: [mem 0x00100000-0x696f7fff]
> [ 0.000000] node 0: [mem 0x100000000-0x47fffffff]
> [ 0.000000] node 1: [mem 0x480000000-0x87fffffff]
>
> Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-01-28  7:07 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-23  5:49 [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable Tang Chen
2014-01-23  6:01 ` Dave Jones
2014-01-23  6:05 ` Andrew Morton
2014-01-23  6:06 ` David Rientjes
2014-01-23  6:13   ` Dave Jones
2014-01-23  6:15     ` David Rientjes
2014-01-23  6:58       ` Dave Jones
2014-01-23 22:31         ` Dave Jones
2014-01-27  7:29           ` Tang Chen
2014-01-27 14:52             ` Dave Jones
2014-01-23  6:36     ` Tang Chen
2014-01-28  0:32   ` David Rientjes
2014-01-28  1:01     ` Tang Chen
2014-01-28  2:55       ` Dave Jones
2014-01-28  3:14         ` Tang Chen
2014-01-28  3:24         ` Tang Chen
2014-01-28  3:55           ` Dave Jones
2014-01-28  4:47             ` Tang Chen
2014-01-28  4:47               ` Dave Jones
2014-01-28  5:17                 ` Tang Chen
2014-01-28  6:53                   ` Dave Jones
2014-01-28  5:31                 ` Tang Chen
2014-01-28  7:10                   ` Tang Chen
