linux-next.vger.kernel.org archive mirror
* linux-next: PowerPC boot failures in next-20120521
@ 2012-05-22  1:40 Stephen Rothwell
  2012-05-22  1:53 ` David Rientjes
  2012-05-22  2:12 ` linux-next: PowerPC boot failures in next-20120521 Michael Neuling
  0 siblings, 2 replies; 13+ messages in thread
From: Stephen Rothwell @ 2012-05-22  1:40 UTC (permalink / raw)
  To: LKML
  Cc: linux-next, ppc-dev, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn, Linus


Hi all,

Last night's boot tests on various PowerPC systems failed like this:

calling  .numa_group_init+0x0/0x3c @ 1
initcall .numa_group_init+0x0/0x3c returned 0 after 0 usecs
calling  .numa_init+0x0/0x1dc @ 1
Unable to handle kernel paging request for data at address 0x00001688
Faulting instruction address: 0xc00000000016e154
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c00000000016e154 LR: c0000000001b9140 CTR: 0000000000000000
REGS: c0000003fc8c76d0 TRAP: 0300   Not tainted  (3.4.0-autokern1)
MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24044022  XER: 00000003
SOFTE: 1
CFAR: 000000000000562c
DAR: 0000000000001688, DSISR: 40000000
TASK = c0000003fc8c8000[1] 'swapper/0' THREAD: c0000003fc8c4000 CPU: 0
GPR00: 0000000000000000 c0000003fc8c7950 c000000000d05b30 00000000000012d0 
GPR04: 0000000000000000 0000000000001680 0000000000000000 c0000003fe032f60 
GPR08: 0004005400000001 0000000000000000 ffffffffffffc980 c000000000d24fe0 
GPR12: 0000000024044024 c00000000f33b000 0000000001a3fa78 00000000009bac00 
GPR16: 0000000000e1f338 0000000002d513f0 0000000000001680 0000000000000000 
GPR20: 0000000000000001 c0000003fc8c7c00 0000000000000000 0000000000000001 
GPR24: 0000000000000001 c000000000d1b490 0000000000000000 0000000000001680 
GPR28: 0000000000000000 0000000000000000 c000000000c7ce58 c0000003fe009200 
NIP [c00000000016e154] .__alloc_pages_nodemask+0xc4/0x8f0
LR [c0000000001b9140] .new_slab+0xd0/0x3c0
Call Trace:
[c0000003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
[c0000003fc8c7ae0] [c0000000001b9140] .new_slab+0xd0/0x3c0
[c0000003fc8c7b90] [c0000000001b9844] .__slab_alloc+0x254/0x5b0
[c0000003fc8c7cd0] [c0000000001bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
[c0000003fc8c7d80] [c000000000ba36d0] .numa_init+0x98/0x1dc
[c0000003fc8c7e10] [c00000000000ace4] .do_one_initcall+0x1a4/0x1e0
[c0000003fc8c7ed0] [c000000000b7b354] .kernel_init+0x124/0x2e0
[c0000003fc8c7f90] [c0000000000211c8] .kernel_thread+0x54/0x70
Instruction dump:
5400d97e 7b170020 0b000000 eb3e8000 3b800000 80190088 2f800000 40de0014 
7860efe2 787c6fe2 78000fa4 7f9c0378 <e81b0008> 83f90000 2fa00000 7fff1838 
---[ end trace 31fd0ba7d8756001 ]---

swapper/0 (1) used greatest stack depth: 10864 bytes left
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

I may be completely wrong, but I guess the obvious target would be the
sched/numa branch that came in via the tip tree.

Config file attached.  I haven't had a chance to try to bisect this yet.

Anyone have any ideas?
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #1.2: dotconfig.bz2 --]
[-- Type: application/octet-stream, Size: 15419 bytes --]


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  1:40 linux-next: PowerPC boot failures in next-20120521 Stephen Rothwell
@ 2012-05-22  1:53 ` David Rientjes
  2012-05-22  3:03   ` Stephen Rothwell
  2012-05-22  2:12 ` linux-next: PowerPC boot failures in next-20120521 Michael Neuling
  1 sibling, 1 reply; 13+ messages in thread
From: David Rientjes @ 2012-05-22  1:53 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: LKML, linux-next, ppc-dev, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn, Linus

On Tue, 22 May 2012, Stephen Rothwell wrote:

> Unable to handle kernel paging request for data at address 0x00001688
> Faulting instruction address: 0xc00000000016e154
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in:
> NIP: c00000000016e154 LR: c0000000001b9140 CTR: 0000000000000000
> REGS: c0000003fc8c76d0 TRAP: 0300   Not tainted  (3.4.0-autokern1)
> MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24044022  XER: 00000003
> SOFTE: 1
> CFAR: 000000000000562c
> DAR: 0000000000001688, DSISR: 40000000
> TASK = c0000003fc8c8000[1] 'swapper/0' THREAD: c0000003fc8c4000 CPU: 0
> GPR00: 0000000000000000 c0000003fc8c7950 c000000000d05b30 00000000000012d0 
> GPR04: 0000000000000000 0000000000001680 0000000000000000 c0000003fe032f60 
> GPR08: 0004005400000001 0000000000000000 ffffffffffffc980 c000000000d24fe0 
> GPR12: 0000000024044024 c00000000f33b000 0000000001a3fa78 00000000009bac00 
> GPR16: 0000000000e1f338 0000000002d513f0 0000000000001680 0000000000000000 
> GPR20: 0000000000000001 c0000003fc8c7c00 0000000000000000 0000000000000001 
> GPR24: 0000000000000001 c000000000d1b490 0000000000000000 0000000000001680 
> GPR28: 0000000000000000 0000000000000000 c000000000c7ce58 c0000003fe009200 
> NIP [c00000000016e154] .__alloc_pages_nodemask+0xc4/0x8f0
> LR [c0000000001b9140] .new_slab+0xd0/0x3c0
> Call Trace:
> [c0000003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
> [c0000003fc8c7ae0] [c0000000001b9140] .new_slab+0xd0/0x3c0
> [c0000003fc8c7b90] [c0000000001b9844] .__slab_alloc+0x254/0x5b0
> [c0000003fc8c7cd0] [c0000000001bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
> [c0000003fc8c7d80] [c000000000ba36d0] .numa_init+0x98/0x1dc
> [c0000003fc8c7e10] [c00000000000ace4] .do_one_initcall+0x1a4/0x1e0
> [c0000003fc8c7ed0] [c000000000b7b354] .kernel_init+0x124/0x2e0
> [c0000003fc8c7f90] [c0000000000211c8] .kernel_thread+0x54/0x70
> Instruction dump:
> 5400d97e 7b170020 0b000000 eb3e8000 3b800000 80190088 2f800000 40de0014 
> 7860efe2 787c6fe2 78000fa4 7f9c0378 <e81b0008> 83f90000 2fa00000 7fff1838 
> ---[ end trace 31fd0ba7d8756001 ]---
> 
> swapper/0 (1) used greatest stack depth: 10864 bytes left
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> 
> I may be completely wrong, but I guess the obvious target would be the
> sched/numa branch that came in via the tip tree.
> 
> Config file attached.  I haven't had a chance to try to bisect this yet.
> 
> Anyone have any ideas?

Yeah, it's sched/numa since that's what introduced numa_init().  It does 
for_each_node() for each node and does a kmalloc_node() even though that 
node may not be online.  Slub ends up passing this node to the page 
allocator through alloc_pages_exact_node().  CONFIG_DEBUG_VM would have 
caught this and your config confirms it's not enabled.

sched/numa either needs a memory hotplug notifier or it needs to pass 
NUMA_NO_NODE for nodes that aren't online.  Until we get the former, the 
following should fix it.


sched, numa: Allocate node_queue on any node for offline nodes

struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
not (yet) online, otherwise the page allocator has a bad zonelist.

Signed-off-by: David Rientjes <rientjes@google.com>
---
diff --git a/kernel/sched/numa.c b/kernel/sched/numa.c
--- a/kernel/sched/numa.c
+++ b/kernel/sched/numa.c
@@ -885,7 +885,8 @@ static __init int numa_init(void)
 
 	for_each_node(node) {
 		struct node_queue *nq = kmalloc_node(sizeof(*nq),
-				GFP_KERNEL | __GFP_ZERO, node);
+				GFP_KERNEL | __GFP_ZERO,
+				node_online(node) ? node : NUMA_NO_NODE);
 		BUG_ON(!nq);
 
 		spin_lock_init(&nq->lock);
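
The node choice made by the patch above can be modelled in user space. This is only a sketch, not kernel code: the `model_*` names, the four-node limit, and the online map are hypothetical, and only the value of `NUMA_NO_NODE` and the `node_online(node) ? node : NUMA_NO_NODE` selection come from the patch and the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)   /* "no node preference", as in the kernel headers */
#define MODEL_MAX_NODES 4   /* hypothetical: four possible nodes */

/* Hypothetical online map: nodes 0 and 1 are online, 2 and 3 are not. */
static const bool model_node_online[MODEL_MAX_NODES] = {
	true, true, false, false
};

/* Stand-in for node_online(): out-of-range ids count as offline. */
static bool model_is_online(int node)
{
	return node >= 0 && node < MODEL_MAX_NODES && model_node_online[node];
}

/*
 * Node hint to pass to the allocator for one iteration of a
 * for_each_node()-style loop over all possible nodes: the node itself
 * when it is online, otherwise NUMA_NO_NODE so any node may be used.
 */
static int model_alloc_node(int node)
{
	return model_is_online(node) ? node : NUMA_NO_NODE;
}
```

With this map, `model_alloc_node(1)` yields 1 while `model_alloc_node(2)` yields `NUMA_NO_NODE`, so an offline node id is never handed on to the allocator.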


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  1:40 linux-next: PowerPC boot failures in next-20120521 Stephen Rothwell
  2012-05-22  1:53 ` David Rientjes
@ 2012-05-22  2:12 ` Michael Neuling
  2012-05-22  2:25   ` David Rientjes
  1 sibling, 1 reply; 13+ messages in thread
From: Michael Neuling @ 2012-05-22  2:12 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: LKML, linux-next, ppc-dev, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn, Linus

> Hi all,
> 
> Last night's boot tests on various PowerPC systems failed like this:
> 
> calling  .numa_group_init+0x0/0x3c @ 1
> initcall .numa_group_init+0x0/0x3c returned 0 after 0 usecs
> calling  .numa_init+0x0/0x1dc @ 1
> Unable to handle kernel paging request for data at address 0x00001688
> Faulting instruction address: 0xc00000000016e154
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in:
> NIP: c00000000016e154 LR: c0000000001b9140 CTR: 0000000000000000
> REGS: c0000003fc8c76d0 TRAP: 0300   Not tainted  (3.4.0-autokern1)
> MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24044022  XER: 00000003
> SOFTE: 1
> CFAR: 000000000000562c
> DAR: 0000000000001688, DSISR: 40000000
> TASK = c0000003fc8c8000[1] 'swapper/0' THREAD: c0000003fc8c4000 CPU: 0
> GPR00: 0000000000000000 c0000003fc8c7950 c000000000d05b30 00000000000012d0 
> GPR04: 0000000000000000 0000000000001680 0000000000000000 c0000003fe032f60 
> GPR08: 0004005400000001 0000000000000000 ffffffffffffc980 c000000000d24fe0 
> GPR12: 0000000024044024 c00000000f33b000 0000000001a3fa78 00000000009bac00 
> GPR16: 0000000000e1f338 0000000002d513f0 0000000000001680 0000000000000000 
> GPR20: 0000000000000001 c0000003fc8c7c00 0000000000000000 0000000000000001 
> GPR24: 0000000000000001 c000000000d1b490 0000000000000000 0000000000001680 
> GPR28: 0000000000000000 0000000000000000 c000000000c7ce58 c0000003fe009200 
> NIP [c00000000016e154] .__alloc_pages_nodemask+0xc4/0x8f0
> LR [c0000000001b9140] .new_slab+0xd0/0x3c0
> Call Trace:
> [c0000003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
> [c0000003fc8c7ae0] [c0000000001b9140] .new_slab+0xd0/0x3c0
> [c0000003fc8c7b90] [c0000000001b9844] .__slab_alloc+0x254/0x5b0
> [c0000003fc8c7cd0] [c0000000001bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
> [c0000003fc8c7d80] [c000000000ba36d0] .numa_init+0x98/0x1dc
> [c0000003fc8c7e10] [c00000000000ace4] .do_one_initcall+0x1a4/0x1e0
> [c0000003fc8c7ed0] [c000000000b7b354] .kernel_init+0x124/0x2e0
> [c0000003fc8c7f90] [c0000000000211c8] .kernel_thread+0x54/0x70
> Instruction dump:
> 5400d97e 7b170020 0b000000 eb3e8000 3b800000 80190088 2f800000 40de0014 
> 7860efe2 787c6fe2 78000fa4 7f9c0378 <e81b0008> 83f90000 2fa00000 7fff1838 
> ---[ end trace 31fd0ba7d8756001 ]---
> 
> swapper/0 (1) used greatest stack depth: 10864 bytes left
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> 
> I may be completely wrong, but I guess the obvious target would be the
> sched/numa branch that came in via the tip tree.
> 
> Config file attached.  I haven't had a chance to try to bisect this yet.
> 
> Anyone have any ideas?

I'm getting similar here:


console [tty0] enabled
console [hvc0] enabled
pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
Unable to handle kernel paging request for data at address 0x00001388
Faulting instruction address: 0xc00000000014a070
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c00000000014a070 LR: c0000000001978cc CTR: c0000000000b6870
REGS: c00000007e5836b0 TRAP: 0300   Tainted: G        W     (3.4.0-rc6-mikey)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004022  XER: 02000000
SOFTE: 1
CFAR: 00000000000050fc
DAR: 0000000000001388, DSISR: 40000000
TASK = c00000007e560000[1] 'swapper/0' THREAD: c00000007e580000 CPU: 0
GPR00: 0000000000000000 c00000007e583930 c000000000c034d8 00000000000012d0 
GPR04: 0000000000000000 0000000000001380 0000000000000000 0000000000000001 
GPR08: c00000007e0dff60 0000000000000000 c000000000ca05a0 0000000000000000 
GPR12: 0000000028004024 c00000000ff20000 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000001380 
GPR20: 0000000000000001 c000000000e14900 c000000000e148f0 0000000000000001 
GPR24: c000000000c6f378 0000000000000000 0000000000001380 00000000000002aa 
GPR28: 0000000000000000 0000000000000000 c000000000b576b0 c00000007e021200 
NIP [c00000000014a070] .__alloc_pages_nodemask+0xd0/0x910
LR [c0000000001978cc] .new_slab+0xcc/0x3d0
Call Trace:
[c00000007e583930] [c00000007e5839c0] 0xc00000007e5839c0 (unreliable)
[c00000007e583ac0] [c0000000001978cc] .new_slab+0xcc/0x3d0
[c00000007e583b70] [c00000000072ae98] .__slab_alloc+0x38c/0x4f8
[c00000007e583cb0] [c000000000198190] .kmem_cache_alloc_node_trace+0x90/0x260
[c00000007e583d60] [c000000000a5a404] .numa_init+0x9c/0x188
[c00000007e583e00] [c00000000000aa30] .do_one_initcall+0x60/0x1e0
[c00000007e583ec0] [c000000000a40b60] .kernel_init+0x128/0x294
[c00000007e583f90] [c000000000020788] .kernel_thread+0x54/0x70
Instruction dump:
0b000000 eb1e8000 3b800000 801800a8 2f800000 409e001c 7860efe3 38000000 
41820008 38000002 787c6fe2 7f9c0378 <e93a0008> 801800a4 3b600000 2fa90000 
---[ end trace 31fd0ba7d8756002 ]---

Which seems to be this code in __alloc_pages_nodemask
---
        /*
         * Check the zones suitable for the gfp_mask contain at least one
         * valid zone. It's possible to have an empty zonelist as a result
         * of GFP_THISNODE and a memoryless node
         */
        if (unlikely(!zonelist->_zonerefs->zone))
c00000000014a070:       e9 3a 00 08     ld      r9,8(r26)
---

r26 is coming from r5 which is the struct zonelist *zonelist parameter
to __alloc_pages_nodemask.  Having 0000000000001380 in there is clearly
a bogus pointer.
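
The arithmetic behind that bogus pointer can be sketched in user-space C. Everything here is an assumption made for illustration: the struct layouts are simplified stand-ins, the `0x1380` padding is sized to match r5/r26 in this oops, and the 8-byte member ahead of `_zonerefs` is inferred from the `ld r9,8(r26)` displacement. The likely reading is that the per-node data pointer for an offline node is NULL, so the zonelist "pointer" is just a field offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct zoneref {
	void *zone;             /* field tested by !zonelist->_zonerefs->zone */
	int zone_idx;
};

struct zonelist {
	uint64_t zlcache_ptr;   /* assumed 8-byte member ahead of _zonerefs */
	struct zoneref _zonerefs[1];
};

struct pgdat_model {
	char node_zones[0x1380];            /* sized to match r5/r26 above */
	struct zonelist node_zonelists[1];
};

/* The zonelist "pointer" derived from per-node data located at base. */
static uint64_t zonelist_address(uint64_t base)
{
	return base + offsetof(struct pgdat_model, node_zonelists);
}

/* The address the faulting load touches for per-node data at base. */
static uint64_t zone_field_address(uint64_t base)
{
	return zonelist_address(base)
		+ offsetof(struct zonelist, _zonerefs)
		+ offsetof(struct zoneref, zone);
}
```

With a NULL base, `zonelist_address(0)` gives 0x1380 (the value in r5 and r26) and `zone_field_address(0)` gives 0x1388 (the DAR in the oops). The same 8-byte step separates 0x1680 and 0x1688 in Stephen's report.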

Bisecting it points to b4cdf91668c27a5a6a5a3ed4234756c042dd8288
  b4cdf91 sched/numa: Implement numa balancer

Trying David's patch just posted doesn't fix it.

Mikey


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  2:12 ` linux-next: PowerPC boot failures in next-20120521 Michael Neuling
@ 2012-05-22  2:25   ` David Rientjes
  2012-05-22  2:39     ` Michael Neuling
  0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2012-05-22  2:25 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Stephen Rothwell, LKML, linux-next, ppc-dev, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn,
	Linus

On Tue, 22 May 2012, Michael Neuling wrote:

> console [tty0] enabled
> console [hvc0] enabled
> pid_max: default: 32768 minimum: 301
> Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> POWER7 performance monitor hardware support registered
> Unable to handle kernel paging request for data at address 0x00001388
> Faulting instruction address: 0xc00000000014a070
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> Modules linked in:
> NIP: c00000000014a070 LR: c0000000001978cc CTR: c0000000000b6870
> REGS: c00000007e5836b0 TRAP: 0300   Tainted: G        W     (3.4.0-rc6-mikey)
> MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004022  XER: 02000000
> SOFTE: 1
> CFAR: 00000000000050fc
> DAR: 0000000000001388, DSISR: 40000000
> TASK = c00000007e560000[1] 'swapper/0' THREAD: c00000007e580000 CPU: 0
> GPR00: 0000000000000000 c00000007e583930 c000000000c034d8 00000000000012d0 
> GPR04: 0000000000000000 0000000000001380 0000000000000000 0000000000000001 
> GPR08: c00000007e0dff60 0000000000000000 c000000000ca05a0 0000000000000000 
> GPR12: 0000000028004024 c00000000ff20000 0000000000000000 0000000000000000 
> GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000001380 
> GPR20: 0000000000000001 c000000000e14900 c000000000e148f0 0000000000000001 
> GPR24: c000000000c6f378 0000000000000000 0000000000001380 00000000000002aa 
> GPR28: 0000000000000000 0000000000000000 c000000000b576b0 c00000007e021200 
> NIP [c00000000014a070] .__alloc_pages_nodemask+0xd0/0x910
> LR [c0000000001978cc] .new_slab+0xcc/0x3d0
> Call Trace:
> [c00000007e583930] [c00000007e5839c0] 0xc00000007e5839c0 (unreliable)
> [c00000007e583ac0] [c0000000001978cc] .new_slab+0xcc/0x3d0
> [c00000007e583b70] [c00000000072ae98] .__slab_alloc+0x38c/0x4f8
> [c00000007e583cb0] [c000000000198190] .kmem_cache_alloc_node_trace+0x90/0x260
> [c00000007e583d60] [c000000000a5a404] .numa_init+0x9c/0x188
> [c00000007e583e00] [c00000000000aa30] .do_one_initcall+0x60/0x1e0
> [c00000007e583ec0] [c000000000a40b60] .kernel_init+0x128/0x294
> [c00000007e583f90] [c000000000020788] .kernel_thread+0x54/0x70
> Instruction dump:
> 0b000000 eb1e8000 3b800000 801800a8 2f800000 409e001c 7860efe3 38000000 
> 41820008 38000002 787c6fe2 7f9c0378 <e93a0008> 801800a4 3b600000 2fa90000 
> ---[ end trace 31fd0ba7d8756002 ]---
> 
> Which seems to be this code in __alloc_pages_nodemask
> ---
>         /*
>          * Check the zones suitable for the gfp_mask contain at least one
>          * valid zone. It's possible to have an empty zonelist as a result
>          * of GFP_THISNODE and a memoryless node
>          */
>         if (unlikely(!zonelist->_zonerefs->zone))
> c00000000014a070:       e9 3a 00 08     ld      r9,8(r26)
> ---
> 
> r26 is coming from r5 which is the struct zonelist *zonelist parameter
> to __alloc_pages_nodemask.  Having 0000000000001380 in there is clearly
> a bogus pointer.
> 
> Bisecting it points to b4cdf91668c27a5a6a5a3ed4234756c042dd8288
>   b4cdf91 sched/numa: Implement numa balancer
> 
> Trying David's patch just posted doesn't fix it.
> 

Hmm, what does CONFIG_DEBUG_VM say?


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  2:25   ` David Rientjes
@ 2012-05-22  2:39     ` Michael Neuling
  2012-05-22  2:40       ` Michael Neuling
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Neuling @ 2012-05-22  2:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: Stephen Rothwell, LKML, linux-next, ppc-dev, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn,
	Linus

> > Trying David's patch just posted doesn't fix it.
> > 
> 
> Hmm, what does CONFIG_DEBUG_VM say?

No set.

Mikey


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  2:39     ` Michael Neuling
@ 2012-05-22  2:40       ` Michael Neuling
  2012-05-22  2:44         ` David Rientjes
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Neuling @ 2012-05-22  2:40 UTC (permalink / raw)
  To: David Rientjes
  Cc: Stephen Rothwell, LKML, linux-next, ppc-dev, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn,
	Linus

Michael Neuling <mikey@neuling.org> wrote:

> > > Trying David's patch just posted doesn't fix it.
> > > 
> > 
> > Hmm, what does CONFIG_DEBUG_VM say?
> 
> No set.

Sorry, should have read "Not set"

mikey


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  2:40       ` Michael Neuling
@ 2012-05-22  2:44         ` David Rientjes
  2012-05-22  2:51           ` Michael Neuling
  0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2012-05-22  2:44 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Stephen Rothwell, LKML, linux-next, ppc-dev, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn,
	Linus

On Tue, 22 May 2012, Michael Neuling wrote:

> > > > Trying David's patch just posted doesn't fix it.
> > > > 
> > > 
> > > Hmm, what does CONFIG_DEBUG_VM say?
> > 
> > No set.
> 
> Sorry, should have read "Not set"
> 

I mean if it's set, what does it emit to the kernel log with my patch 
applied?

I made CONFIG_DEBUG_VM catch !node_online(node) about six months ago, so I 
was thinking it would have caught this if either you or Stephen had enabled it.


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  2:44         ` David Rientjes
@ 2012-05-22  2:51           ` Michael Neuling
  2012-05-22  2:58             ` David Rientjes
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Neuling @ 2012-05-22  2:51 UTC (permalink / raw)
  To: David Rientjes
  Cc: Stephen Rothwell, LKML, linux-next, ppc-dev, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn,
	Linus

David Rientjes <rientjes@google.com> wrote:

> On Tue, 22 May 2012, Michael Neuling wrote:
> 
> > > > > Trying David's patch just posted doesn't fix it.
> > > > > 
> > > > 
> > > > Hmm, what does CONFIG_DEBUG_VM say?
> > > 
> > > No set.
> > 
> > Sorry, should have read "Not set"
> > 
> 
> I mean if it's set, what does it emit to the kernel log with my patch 
> applied?
> 
> I made CONFIG_DEBUG_VM catch !node_online(node) about six months ago, so I 
> was thinking it would have caught this if either you or Stephen had enabled it.

Sorry, got it... CONFIG_DEBUG_VM enabled below...

pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
------------[ cut here ]------------
kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c000000000199164 LR: c0000000001993e0 CTR: c0000000000b6b70
REGS: c00000007e583830 TRAP: 0700   Tainted: G        W     (3.4.0-rc6-mikey)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004028  XER: 02000000
SOFTE: 1
CFAR: c0000000001993c4
TASK = c00000007e560000[1] 'swapper/0' THREAD: c00000007e580000 CPU: 0
GPR00: 0000000000000001 c00000007e583ab0 c000000000c035a0 00000000000012d0 
GPR04: 0000000000000000 0000000000000001 c000000000e14900 0005055500000001 
GPR08: 0000000000000001 00000000000012d0 c000000000c6f398 0000000000000001 
GPR12: 0000000028004022 c00000000ff20000 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000001380 0000000000000000 
GPR20: 0000000000000001 c000000000e14900 c000000000e148f0 0000000000210d00 
GPR24: 0000000000000001 00000000000000d0 00000000000002aa 0000000000000000 
GPR28: 00000000000000d0 0000000000000001 c000000000b58fc8 c00000007e021200 
NIP [c000000000199164] .new_slab+0xb4/0x440
LR [c0000000001993e0] .new_slab+0x330/0x440
Call Trace:
[c00000007e583ab0] [c0000000001993e0] .new_slab+0x330/0x440 (unreliable)
[c00000007e583b60] [c00000000072ce84] .__slab_alloc+0x3bc/0x52c
[c00000007e583ca0] [c000000000199b08] .kmem_cache_alloc_node_trace+0x98/0x280
[c00000007e583d60] [c000000000a5a440] .numa_init+0x9c/0x188
[c00000007e583e00] [c00000000000aa30] .do_one_initcall+0x60/0x1e0
[c00000007e583ec0] [c000000000a40b60] .kernel_init+0x128/0x294
[c00000007e583f90] [c000000000020788] .kernel_thread+0x54/0x70
Instruction dump:
7b5b8402 7f6407b4 7c1ce378 7d29e038 7b990020 61291200 79230020 419202b8 
2b9d00ff 78840020 38000001 409d0240 <0b000000> e95e8140 792977e2 7bab1f24 
---[ end trace 31fd0ba7d8756002 ]---


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  2:51           ` Michael Neuling
@ 2012-05-22  2:58             ` David Rientjes
  2012-05-22  3:12               ` Michael Neuling
  0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2012-05-22  2:58 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Stephen Rothwell, LKML, linux-next, ppc-dev, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn,
	Linus

On Tue, 22 May 2012, Michael Neuling wrote:

> Sorry, got it... CONFIG_DEBUG_VM enabled below...
> 
> pid_max: default: 32768 minimum: 301
> Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> POWER7 performance monitor hardware support registered
> ------------[ cut here ]------------
> kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!

Yeah, this is what I was expecting, it's tripping on

	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));

and slub won't pass nid < 0.  You're sure my patch is applied? :)
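
A user-space sketch of that check, for illustration only. The `model_*` names, the 16-node limit, and the online map are hypothetical. The assumption that slub sends `NUMA_NO_NODE` down a separate path that never reaches the exact-node check is inferred from "slub won't pass nid < 0" and is not verified here.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
#define MODEL_MAX_NUMNODES 16   /* hypothetical node limit */

/* Hypothetical online map: only nodes 0 and 1 are online. */
static const bool model_online[MODEL_MAX_NUMNODES] = {
	[0] = true, [1] = true
};

/* True when the quoted VM_BUG_ON condition would fire for nid. */
static bool model_check_trips(int nid)
{
	return nid < 0 || nid >= MODEL_MAX_NUMNODES || !model_online[nid];
}

enum model_path {
	MODEL_ANY_NODE,     /* no node preference, check never reached */
	MODEL_EXACT_OK,     /* exact-node allocation, check passes */
	MODEL_EXACT_BUG     /* exact-node allocation, check fires */
};

/* Assumed slab behaviour: NUMA_NO_NODE bypasses the exact-node path. */
static enum model_path model_slab_page_path(int node)
{
	if (node == NUMA_NO_NODE)
		return MODEL_ANY_NODE;
	return model_check_trips(node) ? MODEL_EXACT_BUG : MODEL_EXACT_OK;
}
```

Under these assumptions an offline node id such as 5 lands on `MODEL_EXACT_BUG`, matching the BUG at gfp.h:318, while `NUMA_NO_NODE` avoids the check entirely.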


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  1:53 ` David Rientjes
@ 2012-05-22  3:03   ` Stephen Rothwell
  2012-05-22  3:25     ` Stephen Rothwell
  2012-05-23  4:17     ` [patch] sched, numa: Allocate node_queue on any node for offline nodes David Rientjes
  0 siblings, 2 replies; 13+ messages in thread
From: Stephen Rothwell @ 2012-05-22  3:03 UTC (permalink / raw)
  To: David Rientjes
  Cc: LKML, linux-next, ppc-dev, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn, Linus

Hi David,

On Mon, 21 May 2012 18:53:37 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
>
> Yeah, it's sched/numa since that's what introduced numa_init().  It does 
> for_each_node() for each node and does a kmalloc_node() even though that 
> node may not be online.  Slub ends up passing this node to the page 
> allocator through alloc_pages_exact_node().  CONFIG_DEBUG_VM would have 
> caught this and your config confirms it's not enabled.
> 
> sched/numa either needs a memory hotplug notifier or it needs to pass 
> NUMA_NO_NODE for nodes that aren't online.  Until we get the former, the 
> following should fix it.
> 
> 
> sched, numa: Allocate node_queue on any node for offline nodes
> 
> struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
> not (yet) online, otherwise the page allocator has a bad zonelist.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

Thanks, that fixes it.

Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  2:58             ` David Rientjes
@ 2012-05-22  3:12               ` Michael Neuling
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Neuling @ 2012-05-22  3:12 UTC (permalink / raw)
  To: David Rientjes
  Cc: Stephen Rothwell, LKML, linux-next, ppc-dev, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn,
	Linus

David Rientjes <rientjes@google.com> wrote:

> On Tue, 22 May 2012, Michael Neuling wrote:
> 
> > Sorry, got it... CONFIG_DEBUG_VM enabled below...
> > 
> > pid_max: default: 32768 minimum: 301
> > Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> > Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> > Mount-cache hash table entries: 4096
> > Initializing cgroup subsys cpuacct
> > Initializing cgroup subsys devices
> > Initializing cgroup subsys freezer
> > POWER7 performance monitor hardware support registered
> > ------------[ cut here ]------------
> > kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!
> 
> Yeah, this is what I was expecting, it's tripping on
> 
> 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
> 
> and slub won't pass nid < 0.  You're sure my patch is applied? :)

I did have your patch applied but at "b4cdf91 sched/numa: Implement numa
balancer" (where git bisect spotted the fail).  

If I apply your patch on the full next-20120521 it does fix the problem.

Sorry for the confusion.

Thanks!
Mikey


* Re: linux-next: PowerPC boot failures in next-20120521
  2012-05-22  3:03   ` Stephen Rothwell
@ 2012-05-22  3:25     ` Stephen Rothwell
  2012-05-23  4:17     ` [patch] sched, numa: Allocate node_queue on any node for offline nodes David Rientjes
  1 sibling, 0 replies; 13+ messages in thread
From: Stephen Rothwell @ 2012-05-22  3:25 UTC (permalink / raw)
  To: David Rientjes
  Cc: LKML, linux-next, ppc-dev, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Lee Schermerhorn, Linus

On Tue, 22 May 2012 13:03:54 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Mon, 21 May 2012 18:53:37 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> >
> > Yeah, it's sched/numa since that's what introduced numa_init().  It does 
> > for_each_node() for each node and does a kmalloc_node() even though that 
> > node may not be online.  Slub ends up passing this node to the page 
> > allocator through alloc_pages_exact_node().  CONFIG_DEBUG_VM would have 
> > caught this and your config confirms it's not enabled.
> > 
> > sched/numa either needs a memory hotplug notifier or it needs to pass 
> > NUMA_NO_NODE for nodes that aren't online.  Until we get the former, the 
> > following should fix it.
> > 
> > 
> > sched, numa: Allocate node_queue on any node for offline nodes
> > 
> > struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
> > not (yet) online, otherwise the page allocator has a bad zonelist.
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> 
> Thanks, that fixes it.
> 
> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>

And I will put that patch in linux-next until it (or something better)
appears.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au


* [patch] sched, numa: Allocate node_queue on any node for offline nodes
  2012-05-22  3:03   ` Stephen Rothwell
  2012-05-22  3:25     ` Stephen Rothwell
@ 2012-05-23  4:17     ` David Rientjes
  1 sibling, 0 replies; 13+ messages in thread
From: David Rientjes @ 2012-05-23  4:17 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Peter Zijlstra
  Cc: Stephen Rothwell, linux-kernel, linux-next, linuxppc-dev,
	Lee Schermerhorn, Linus Torvalds

struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
not (yet) online, otherwise the page allocator has a bad zonelist and 
results in an early crash.

Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 kernel/sched/numa.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/numa.c b/kernel/sched/numa.c
--- a/kernel/sched/numa.c
+++ b/kernel/sched/numa.c
@@ -885,7 +885,8 @@ static __init int numa_init(void)
 
 	for_each_node(node) {
 		struct node_queue *nq = kmalloc_node(sizeof(*nq),
-				GFP_KERNEL | __GFP_ZERO, node);
+				GFP_KERNEL | __GFP_ZERO,
+				node_online(node) ? node : NUMA_NO_NODE);
 		BUG_ON(!nq);
 
 		spin_lock_init(&nq->lock);


end of thread, other threads:[~2012-05-23  4:18 UTC | newest]

Thread overview: 13+ messages
2012-05-22  1:40 linux-next: PowerPC boot failures in next-20120521 Stephen Rothwell
2012-05-22  1:53 ` David Rientjes
2012-05-22  3:03   ` Stephen Rothwell
2012-05-22  3:25     ` Stephen Rothwell
2012-05-23  4:17     ` [patch] sched, numa: Allocate node_queue on any node for offline nodes David Rientjes
2012-05-22  2:12 ` linux-next: PowerPC boot failures in next-20120521 Michael Neuling
2012-05-22  2:25   ` David Rientjes
2012-05-22  2:39     ` Michael Neuling
2012-05-22  2:40       ` Michael Neuling
2012-05-22  2:44         ` David Rientjes
2012-05-22  2:51           ` Michael Neuling
2012-05-22  2:58             ` David Rientjes
2012-05-22  3:12               ` Michael Neuling
