* Problems booting 3.0.3 kernel on Octeon CN58XX board
@ 2011-08-19  0:10 ` Jason Kwon
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Kwon @ 2011-08-19  0:10 UTC (permalink / raw)
  To: linux-mips

Attempting to boot a 3.0.3 kernel on a CN58XX board produced the 
following oops:

CPU 4 Unable to handle kernel paging request at virtual address 0000000001c00000, epc == ffffffff811aa9f4, ra == ffffffff811aaa98
Oops[#1]:
Cpu 4
$ 0   : 0000000000000000 0000000010008ce0 ffffffff821d2b80 0000000001c00000
$ 4   : 0000000001c00038 000000000000017c 0000000000080000 0000000000080072
$ 8   : 0000000000000008 0000000000000002 0000000000000003 a800000002284520
$12   : 0000000000000002 ffffffff8186ee80 ffffffffffffff80 0000000000000030
$16   : 0000000000080072 0000000000000001 0000000001bfa8f0 0000000001bfa928
$20   : a800000003aff8f0 00000000000f0000 ffffffff8186ee80 ffffffff821d2a80
$24   : 0000000000000001 0000000000000038
$28   : a80000041fc48000 a80000041fc4bd90 fffffffffffffffc ffffffff811aaa98
Hi    : 0000000000000000
Lo    : 0000000000000000
epc   : ffffffff811aa9f4 setup_per_zone_wmarks+0x19c/0x2d8
     Not tainted
ra    : ffffffff811aaa98 setup_per_zone_wmarks+0x240/0x2d8
Status: 10008ce2    KX SX UX KERNEL EXL
Cause : 40808408
BadVA : 0000000001c00000
PrId  : 000d0301 (Cavium Octeon+)
Modules linked in:
Process swapper (pid: 1, threadinfo=a80000041fc48000, 
task=a80000041fc44038, tls=0000000000000000)
Stack : 0000000000000000 000000000006f75d ffffffff8186eec0 0000000000000001
         0000000000000547 ffffffff81825598 ffffffff81a80000 ffffffff818b3e68
         ffffffff818a40ac 0000000000000000 ffffffff81a80000 0000000000000000
         0000000000000000 0000000000000000 0000000000000000 ffffffff818a40f0
         ffffffff81a80000 ffffffff81100438 ffffffff818b4198 ffffffff818b3e68
         ffffffff818b46c8 0000000000000000 0000000000000000 ffffffff818721d0
         0000000000000000 0000000000000000 0000000000000000 ffffffff81109bb0
         0000000000000000 0000000000000000 0000000000000000 0000000000000000
         0000000000000000 0000000000000000 0000000000000000 ffffffff818720f8
         0000000000000000 0000000000000000 0000000000000000 0000000000000000
         ...
Call Trace:
[<ffffffff811aa9f4>] setup_per_zone_wmarks+0x19c/0x2d8
[<ffffffff818a40f0>] init_per_zone_wmark_min+0x44/0xe0
[<ffffffff81100438>] do_one_initcall+0x38/0x160
[<ffffffff818721d0>] kernel_init+0xd8/0x178
[<ffffffff81109bb0>] kernel_thread_helper+0x10/0x18


Code: 007e1824  0064182d  64840038 <dc620000> c84afff5  64c60001  66020200  66737000  1220000c

All code
========
    0:    24 18                    and    $0x18,%al
    2:    7e 00                    jle    0x4
    4:    2d 18 64 00 38           sub    $0x38006418,%eax
    9:    00 84 64 00 00 62 dc     add    %al,-0x239e0000(%rsp,%riz,2)
   10:    f5                       cmc
   11:    ff 4a c8                 decl   -0x38(%rdx)
   14:    01 00                    add    %eax,(%rax)
   16:    c6                       (bad)
   17:    64 00 02                 add    %al,%fs:(%rdx)
   1a:    02 66 00                 add    0x0(%rsi),%ah
   1d:    70 73                    jo     0x92
   1f:    66                       data16
   20:    0c 00                    or     $0x0,%al
   22:    20 12                    and    %dl,(%rdx)

Code starting with the faulting instruction
===========================================
    0:    00 00                    add    %al,(%rax)
    2:    62                       (bad)
    3:    dc f5                    fdiv   %st,%st(5)
    5:    ff 4a c8                 decl   -0x38(%rdx)
    8:    01 00                    add    %eax,(%rax)
    a:    c6                       (bad)
    b:    64 00 02                 add    %al,%fs:(%rdx)
    e:    02 66 00                 add    0x0(%rsi),%ah
   11:    70 73                    jo     0x86
   13:    66                       data16
   14:    0c 00                    or     $0x0,%al
   16:    20 12                    and    %dl,(%rdx)

(gdb) list *setup_per_zone_wmarks+0x19c
0x811aa9f4 is in setup_per_zone_wmarks (include/asm-generic/bitops/non-atomic.h:105).
100     * @nr: bit number to test
101     * @addr: Address to start counting from
102     */
103    static inline int test_bit(int nr, const volatile unsigned long *addr)
104    {
105        return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
106    }
107
108    #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
(gdb)
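
For readers less familiar with the helper that gdb points at, the indexing in test_bit() can be modelled outside the kernel. This is only a rough Python sketch, assuming BITS_PER_LONG is 64 (a 64-bit kernel); the names bit_word and model_test_bit are made up for illustration and are not kernel symbols.

```python
# Model of the generic test_bit() arithmetic shown in the gdb listing.
# Assumption: BITS_PER_LONG is 64, as on a 64-bit kernel.
BITS_PER_LONG = 64


def bit_word(nr):
    """Index of the long that holds bit number nr (mirrors BIT_WORD)."""
    return nr // BITS_PER_LONG


def model_test_bit(nr, words):
    """Return 1 if bit nr is set in the list of 64-bit words, else 0."""
    return 1 & (words[bit_word(nr)] >> (nr & (BITS_PER_LONG - 1)))


# Bit 65 lives in word 1, at position 1 within that word.
bitmap = [0x0, 0x2]
print(bit_word(65), model_test_bit(65, bitmap), model_test_bit(64, bitmap))
```

The only memory access in the helper is the read of addr[BIT_WORD(nr)], so a fault on this line means the pointer passed in was bad rather than the bit arithmetic.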

I am able to boot 2.6.39.3 on this board.  Is this a known issue?  Thanks,

Jason

* Re: Problems booting 3.0.3 kernel on Octeon CN58XX board
  2011-08-19  0:10 ` Jason Kwon
  (?)
@ 2011-08-19  1:05 ` David Daney
  -1 siblings, 0 replies; 9+ messages in thread
From: David Daney @ 2011-08-19  1:05 UTC (permalink / raw)
  To: Jason Kwon; +Cc: linux-mips

On 08/18/2011 05:10 PM, Jason Kwon wrote:
> Attempting to boot a 3.0.3 kernel on a CN58XX board produced the
> following oops:
>
> CPU 4 Unable to handle kernel paging request at virtual address
> 0000000001c00000, epc == ffffffff811aa9f4, ra == ffffffff811aaa98
> Oops[#1]:
> Cpu 4
> $ 0 : 0000000000000000 0000000010008ce0 ffffffff821d2b80 0000000001c00000
> $ 4 : 0000000001c00038 000000000000017c 0000000000080000 0000000000080072
> $ 8 : 0000000000000008 0000000000000002 0000000000000003 a800000002284520
> $12 : 0000000000000002 ffffffff8186ee80 ffffffffffffff80 0000000000000030
> $16 : 0000000000080072 0000000000000001 0000000001bfa8f0 0000000001bfa928
> $20 : a800000003aff8f0 00000000000f0000 ffffffff8186ee80 ffffffff821d2a80
> $24 : 0000000000000001 0000000000000038
> $28 : a80000041fc48000 a80000041fc4bd90 fffffffffffffffc ffffffff811aaa98
> Hi : 0000000000000000
> Lo : 0000000000000000
> epc : ffffffff811aa9f4 setup_per_zone_wmarks+0x19c/0x2d8
> Not tainted
> ra : ffffffff811aaa98 setup_per_zone_wmarks+0x240/0x2d8
> Status: 10008ce2 KX SX UX KERNEL EXL
> Cause : 40808408
> BadVA : 0000000001c00000
> PrId : 000d0301 (Cavium Octeon+)
> Modules linked in:
> Process swapper (pid: 1, threadinfo=a80000041fc48000,
> task=a80000041fc44038, tls=0000000000000000)
> Stack : 0000000000000000 000000000006f75d ffffffff8186eec0 0000000000000001
> 0000000000000547 ffffffff81825598 ffffffff81a80000 ffffffff818b3e68
> ffffffff818a40ac 0000000000000000 ffffffff81a80000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000 ffffffff818a40f0
> ffffffff81a80000 ffffffff81100438 ffffffff818b4198 ffffffff818b3e68
> ffffffff818b46c8 0000000000000000 0000000000000000 ffffffff818721d0
> 0000000000000000 0000000000000000 0000000000000000 ffffffff81109bb0
> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000 ffffffff818720f8
> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> ...
> Call Trace:
> [<ffffffff811aa9f4>] setup_per_zone_wmarks+0x19c/0x2d8
> [<ffffffff818a40f0>] init_per_zone_wmark_min+0x44/0xe0
> [<ffffffff81100438>] do_one_initcall+0x38/0x160
> [<ffffffff818721d0>] kernel_init+0xd8/0x178
> [<ffffffff81109bb0>] kernel_thread_helper+0x10/0x18
[...]

Weird, I get the same thing on cn5860 and cn3860.

cn5750 and cn5020 are fine.

Normally I test on my ebh5610 board (cn5750) so I didn't notice this.

It may be caused by holes in the memory map.  Earlier I posted patches 
to set the memory as present:

http://patchwork.linux-mips.org/patch/1988/
http://patchwork.linux-mips.org/patch/1989/
http://patchwork.linux-mips.org/patch/1990/

One or more of those might help, but it is just a guess at this point.

I might take a look next week.

David Daney

* Re: Problems booting 3.0.3 kernel on Octeon CN58XX board
  2011-08-19  0:10 ` Jason Kwon
  (?)
  (?)
@ 2011-08-19 10:59 ` Sergei Shtylyov
  -1 siblings, 0 replies; 9+ messages in thread
From: Sergei Shtylyov @ 2011-08-19 10:59 UTC (permalink / raw)
  To: Jason Kwon; +Cc: linux-mips

Hello.

On 19-08-2011 4:10, Jason Kwon wrote:

> Attempting to boot a 3.0.3 kernel on a CN58XX board produced the following oops:

[...]

> Code: 007e1824 0064182d 64840038 <dc620000> c84afff5 64c60001 66020200
> 66737000 1220000c

> All code
> ========
> 0: 24 18 and $0x18,%al
> 2: 7e 00 jle 0x4
> 4: 2d 18 64 00 38 sub $0x38006418,%eax
> 9: 00 84 64 00 00 62 dc add %al,-0x239e0000(%rsp,%riz,2)
> 10: f5 cmc
> 11: ff 4a c8 decl -0x38(%rdx)
> 14: 01 00 add %eax,(%rax)
> 16: c6 (bad)
> 17: 64 00 02 add %al,%fs:(%rdx)
> 1a: 02 66 00 add 0x0(%rsi),%ah
> 1d: 70 73 jo 0x92
> 1f: 66 data16
> 20: 0c 00 or $0x0,%al
> 22: 20 12 and %dl,(%rdx)

> Code starting with the faulting instruction
> ===========================================
> 0: 00 00 add %al,(%rax)
> 2: 62 (bad)
> 3: dc f5 fdiv %st,%st(5)
> 5: ff 4a c8 decl -0x38(%rdx)
> 8: 01 00 add %eax,(%rax)
> a: c6 (bad)
> b: 64 00 02 add %al,%fs:(%rdx)
> e: 02 66 00 add 0x0(%rsi),%ah
> 11: 70 73 jo 0x86
> 13: 66 data16
> 14: 0c 00 or $0x0,%al
> 16: 20 12 and %dl,(%rdx)

    This is x86 disassembly -- you should have used MIPS cross-tools.
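
For reference, the words on the Code: line can also be decoded by hand as MIPS64 instructions. The following is only a minimal Python sketch, not a substitute for a MIPS objdump: it handles just the four encodings found in the first four words (AND, DADDU, DADDIU, LD) and leaves everything else as a raw word. The function name decode is made up for illustration.

```python
# Minimal MIPS64 decoder sketch. Only the encodings needed for the first
# four words of the "Code:" line are handled; anything else stays raw.

def decode(word):
    """Return assembly text for one 32-bit MIPS instruction word."""
    opcode = (word >> 26) & 0x3f
    rs = (word >> 21) & 0x1f
    rt = (word >> 16) & 0x1f
    rd = (word >> 11) & 0x1f
    funct = word & 0x3f
    imm = word & 0xffff
    if imm & 0x8000:                      # sign-extend the 16-bit immediate
        imm -= 0x10000

    if opcode == 0x00 and funct == 0x24:  # SPECIAL / AND
        return "and $%d,$%d,$%d" % (rd, rs, rt)
    if opcode == 0x00 and funct == 0x2d:  # SPECIAL / DADDU
        return "daddu $%d,$%d,$%d" % (rd, rs, rt)
    if opcode == 0x19:                    # DADDIU
        return "daddiu $%d,$%d,%d" % (rt, rs, imm)
    if opcode == 0x37:                    # LD
        return "ld $%d,%d($%d)" % (rt, imm, rs)
    return ".word 0x%08x" % word          # not handled by this sketch


# Words taken from the oops; <dc620000> is the faulting instruction.
for w in (0x007e1824, 0x0064182d, 0x64840038, 0xdc620000):
    print("%08x  %s" % (w, decode(w)))
```

The faulting word dc620000 decodes to a doubleword load through $3, and $3 in the register dump holds 0000000001c00000, the same value reported as BadVA.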

WBR, Sergei

* Re: Problems booting 3.0.3 kernel on Octeon CN58XX board
  2011-08-19  0:10 ` Jason Kwon
                   ` (2 preceding siblings ...)
  (?)
@ 2011-08-19 16:32 ` David Daney
  2011-08-19 17:00   ` Guenter Roeck
  -1 siblings, 1 reply; 9+ messages in thread
From: David Daney @ 2011-08-19 16:32 UTC (permalink / raw)
  To: Jason Kwon; +Cc: linux-mips

On 08/18/2011 05:10 PM, Jason Kwon wrote:
> Attempting to boot a 3.0.3 kernel on a CN58XX board produced the
> following oops:
>
> CPU 4 Unable to handle kernel paging request at virtual address
> 0000000001c00000, epc == ffffffff811aa9f4, ra == ffffffff811aaa98
> Oops[#1]:
> Cpu 4
> $ 0 : 0000000000000000 0000000010008ce0 ffffffff821d2b80 0000000001c00000
> $ 4 : 0000000001c00038 000000000000017c 0000000000080000 0000000000080072
> $ 8 : 0000000000000008 0000000000000002 0000000000000003 a800000002284520
> $12 : 0000000000000002 ffffffff8186ee80 ffffffffffffff80 0000000000000030
> $16 : 0000000000080072 0000000000000001 0000000001bfa8f0 0000000001bfa928
> $20 : a800000003aff8f0 00000000000f0000 ffffffff8186ee80 ffffffff821d2a80
> $24 : 0000000000000001 0000000000000038
> $28 : a80000041fc48000 a80000041fc4bd90 fffffffffffffffc ffffffff811aaa98
> Hi : 0000000000000000
> Lo : 0000000000000000
> epc : ffffffff811aa9f4 setup_per_zone_wmarks+0x19c/0x2d8
> Not tainted
> ra : ffffffff811aaa98 setup_per_zone_wmarks+0x240/0x2d8
> Status: 10008ce2 KX SX UX KERNEL EXL
> Cause : 40808408
> BadVA : 0000000001c00000
> PrId : 000d0301 (Cavium Octeon+)
> Modules linked in:
> Process swapper (pid: 1, threadinfo=a80000041fc48000,
> task=a80000041fc44038, tls=0000000000000000)
> Stack : 0000000000000000 000000000006f75d ffffffff8186eec0 0000000000000001
> 0000000000000547 ffffffff81825598 ffffffff81a80000 ffffffff818b3e68
> ffffffff818a40ac 0000000000000000 ffffffff81a80000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000 ffffffff818a40f0
> ffffffff81a80000 ffffffff81100438 ffffffff818b4198 ffffffff818b3e68
> ffffffff818b46c8 0000000000000000 0000000000000000 ffffffff818721d0
> 0000000000000000 0000000000000000 0000000000000000 ffffffff81109bb0
> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000 ffffffff818720f8
> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> ...
> Call Trace:
> [<ffffffff811aa9f4>] setup_per_zone_wmarks+0x19c/0x2d8
> [<ffffffff818a40f0>] init_per_zone_wmark_min+0x44/0xe0
> [<ffffffff81100438>] do_one_initcall+0x38/0x160
> [<ffffffff818721d0>] kernel_init+0xd8/0x178
> [<ffffffff81109bb0>] kernel_thread_helper+0x10/0x18
>

It appears to be related to use of physical memory above the 16GB 
barrier.  You could try reducing the amount of memory allocated to the 
kernel by passing 'mem=1700M' on the kernel command line.

David Daney

* Re: Problems booting 3.0.3 kernel on Octeon CN58XX board
  2011-08-19 16:32 ` David Daney
@ 2011-08-19 17:00   ` Guenter Roeck
  2011-08-19 17:10     ` David Daney
  2011-08-19 17:14     ` Jason Kwon
  0 siblings, 2 replies; 9+ messages in thread
From: Guenter Roeck @ 2011-08-19 17:00 UTC (permalink / raw)
  To: David Daney; +Cc: Jason Kwon, linux-mips

On Fri, 2011-08-19 at 12:32 -0400, David Daney wrote:
> On 08/18/2011 05:10 PM, Jason Kwon wrote:
> > Attempting to boot a 3.0.3 kernel on a CN58XX board produced the
> > following oops:
> >
> > CPU 4 Unable to handle kernel paging request at virtual address
> > 0000000001c00000, epc == ffffffff811aa9f4, ra == ffffffff811aaa98
> > Oops[#1]:
> > Cpu 4
> > $ 0 : 0000000000000000 0000000010008ce0 ffffffff821d2b80 0000000001c00000
> > $ 4 : 0000000001c00038 000000000000017c 0000000000080000 0000000000080072
> > $ 8 : 0000000000000008 0000000000000002 0000000000000003 a800000002284520
> > $12 : 0000000000000002 ffffffff8186ee80 ffffffffffffff80 0000000000000030
> > $16 : 0000000000080072 0000000000000001 0000000001bfa8f0 0000000001bfa928
> > $20 : a800000003aff8f0 00000000000f0000 ffffffff8186ee80 ffffffff821d2a80
> > $24 : 0000000000000001 0000000000000038
> > $28 : a80000041fc48000 a80000041fc4bd90 fffffffffffffffc ffffffff811aaa98
> > Hi : 0000000000000000
> > Lo : 0000000000000000
> > epc : ffffffff811aa9f4 setup_per_zone_wmarks+0x19c/0x2d8
> > Not tainted
> > ra : ffffffff811aaa98 setup_per_zone_wmarks+0x240/0x2d8
> > Status: 10008ce2 KX SX UX KERNEL EXL
> > Cause : 40808408
> > BadVA : 0000000001c00000
> > PrId : 000d0301 (Cavium Octeon+)
> > Modules linked in:
> > Process swapper (pid: 1, threadinfo=a80000041fc48000,
> > task=a80000041fc44038, tls=0000000000000000)
> > Stack : 0000000000000000 000000000006f75d ffffffff8186eec0 0000000000000001
> > 0000000000000547 ffffffff81825598 ffffffff81a80000 ffffffff818b3e68
> > ffffffff818a40ac 0000000000000000 ffffffff81a80000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 ffffffff818a40f0
> > ffffffff81a80000 ffffffff81100438 ffffffff818b4198 ffffffff818b3e68
> > ffffffff818b46c8 0000000000000000 0000000000000000 ffffffff818721d0
> > 0000000000000000 0000000000000000 0000000000000000 ffffffff81109bb0
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000 0000000000000000 ffffffff818720f8
> > 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > ...
> > Call Trace:
> > [<ffffffff811aa9f4>] setup_per_zone_wmarks+0x19c/0x2d8
> > [<ffffffff818a40f0>] init_per_zone_wmark_min+0x44/0xe0
> > [<ffffffff81100438>] do_one_initcall+0x38/0x160
> > [<ffffffff818721d0>] kernel_init+0xd8/0x178
> > [<ffffffff81109bb0>] kernel_thread_helper+0x10/0x18
> >
> 
> It appears to be related to use of physical memory above the 16GB 
> barrier.  You could try reducing the amount of memory allocated to the 
> kernel by passing 'mem=1700M' on the kernel command line.
> 

Hi David,

are you sure ?

This is what I see with our own boards (not the reference design board):

Works:

Linux version 3.0.3-423-gfa07d39 (groeck@rbos-pc-13) (gcc version 4.4.1
(Debian 4.4.1-1) ) #2 SMP PREEMPT Thu Aug 18 14:09:53 PDT 2011
[ ... ]
CPU revision is: 000d030b (Cavium Octeon+)
Checking for the multiply/shift bug... no.
Checking for the daddiu bug... no.
Determined physical RAM map:
 memory: 00000000001fa000 @ 000000000160b000 (usable)
 memory: 000000000e400000 @ 0000000001900000 (usable)
 memory: 00000000d0000000 @ 0000000020000000 (usable)
 memory: 000000000ffff000 @ 00000000f0001000 (usable)
 memory: 0000000010000000 @ 0000000410000000 (usable)

Crashes:

Linux version 3.0.3-423-gfa07d39 (groeck@rbos-pc-13) (gcc version 4.4.1
(Debian 4.4.1-1) ) #2 SMP PREEMPT Thu Aug 18 14:09:53 PDT 2011
[ ... ]
CPU revision is: 000d0003 (Cavium Octeon)
Checking for the multiply/shift bug... no.
Checking for the daddiu bug... no.
Determined physical RAM map:
 memory: 00000000001fa000 @ 000000000160b000 (usable)
 memory: 000000000e400000 @ 0000000001900000 (usable)
 memory: 0000000060000000 @ 0000000020000000 (usable)
 memory: 0000000010000000 @ 0000000410000000 (usable)

The memory at 0000000410000000 is there for both CPUs, yet the crash is
only seen on the board with CN38xx. From a SW perspective, only
difference besides the CPU type is that the working board has more
memory.
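
To make the comparison concrete, the two RAM maps can be totalled and checked for holes. This is a rough Python sketch working only from the boot lines pasted above; the helper names (parse_ram_map, total_bytes, holes) are made up for illustration.

```python
import re

# "Determined physical RAM map" lines copied from the two boot logs.
WORKS = """
 memory: 00000000001fa000 @ 000000000160b000 (usable)
 memory: 000000000e400000 @ 0000000001900000 (usable)
 memory: 00000000d0000000 @ 0000000020000000 (usable)
 memory: 000000000ffff000 @ 00000000f0001000 (usable)
 memory: 0000000010000000 @ 0000000410000000 (usable)
"""

CRASHES = """
 memory: 00000000001fa000 @ 000000000160b000 (usable)
 memory: 000000000e400000 @ 0000000001900000 (usable)
 memory: 0000000060000000 @ 0000000020000000 (usable)
 memory: 0000000010000000 @ 0000000410000000 (usable)
"""


def parse_ram_map(text):
    """Return a sorted list of (start, size) tuples, both in bytes."""
    regions = []
    for m in re.finditer(r"memory:\s+([0-9a-f]+)\s+@\s+([0-9a-f]+)", text):
        size, start = int(m.group(1), 16), int(m.group(2), 16)
        regions.append((start, size))
    return sorted(regions)


def total_bytes(regions):
    """Sum of all region sizes."""
    return sum(size for _, size in regions)


def holes(regions):
    """Return (hole_start, hole_size) for each gap between regions."""
    gaps = []
    for (start0, size0), (start1, _) in zip(regions, regions[1:]):
        end0 = start0 + size0
        if start1 > end0:
            gaps.append((end0, start1 - end0))
    return gaps


for name, text in (("works", WORKS), ("crashes", CRASHES)):
    regions = parse_ram_map(text)
    print(name, "total MiB: %.1f" % (total_bytes(regions) / 2**20))
    for hole_start, hole_size in holes(regions):
        print("  hole at 0x%x, size 0x%x" % (hole_start, hole_size))
```

Both maps end at the same physical address (0x420000000); what differs is the total (roughly 4 GB versus roughly 2 GB) and the size of the hole below 0x410000000.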

Thanks,
Guenter

* Re: Problems booting 3.0.3 kernel on Octeon CN58XX board
  2011-08-19 17:00   ` Guenter Roeck
@ 2011-08-19 17:10     ` David Daney
  2011-08-19 17:14     ` Jason Kwon
  1 sibling, 0 replies; 9+ messages in thread
From: David Daney @ 2011-08-19 17:10 UTC (permalink / raw)
  To: guenter.roeck; +Cc: Jason Kwon, linux-mips

On 08/19/2011 10:00 AM, Guenter Roeck wrote:
> On Fri, 2011-08-19 at 12:32 -0400, David Daney wrote:
>> On 08/18/2011 05:10 PM, Jason Kwon wrote:
>>> Attempting to boot a 3.0.3 kernel on a CN58XX board produced the
>>> following oops:
>>>
>>> CPU 4 Unable to handle kernel paging request at virtual address
>>> 0000000001c00000, epc == ffffffff811aa9f4, ra == ffffffff811aaa98
>>> Oops[#1]:
>>> Cpu 4
>>> $ 0 : 0000000000000000 0000000010008ce0 ffffffff821d2b80 0000000001c00000
>>> $ 4 : 0000000001c00038 000000000000017c 0000000000080000 0000000000080072
>>> $ 8 : 0000000000000008 0000000000000002 0000000000000003 a800000002284520
>>> $12 : 0000000000000002 ffffffff8186ee80 ffffffffffffff80 0000000000000030
>>> $16 : 0000000000080072 0000000000000001 0000000001bfa8f0 0000000001bfa928
>>> $20 : a800000003aff8f0 00000000000f0000 ffffffff8186ee80 ffffffff821d2a80
>>> $24 : 0000000000000001 0000000000000038
>>> $28 : a80000041fc48000 a80000041fc4bd90 fffffffffffffffc ffffffff811aaa98
>>> Hi : 0000000000000000
>>> Lo : 0000000000000000
>>> epc : ffffffff811aa9f4 setup_per_zone_wmarks+0x19c/0x2d8
>>> Not tainted
>>> ra : ffffffff811aaa98 setup_per_zone_wmarks+0x240/0x2d8
>>> Status: 10008ce2 KX SX UX KERNEL EXL
>>> Cause : 40808408
>>> BadVA : 0000000001c00000
>>> PrId : 000d0301 (Cavium Octeon+)
>>> Modules linked in:
>>> Process swapper (pid: 1, threadinfo=a80000041fc48000,
>>> task=a80000041fc44038, tls=0000000000000000)
>>> Stack : 0000000000000000 000000000006f75d ffffffff8186eec0 0000000000000001
>>> 0000000000000547 ffffffff81825598 ffffffff81a80000 ffffffff818b3e68
>>> ffffffff818a40ac 0000000000000000 ffffffff81a80000 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 ffffffff818a40f0
>>> ffffffff81a80000 ffffffff81100438 ffffffff818b4198 ffffffff818b3e68
>>> ffffffff818b46c8 0000000000000000 0000000000000000 ffffffff818721d0
>>> 0000000000000000 0000000000000000 0000000000000000 ffffffff81109bb0
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 ffffffff818720f8
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> ...
>>> Call Trace:
>>> [<ffffffff811aa9f4>] setup_per_zone_wmarks+0x19c/0x2d8
>>> [<ffffffff818a40f0>] init_per_zone_wmark_min+0x44/0xe0
>>> [<ffffffff81100438>] do_one_initcall+0x38/0x160
>>> [<ffffffff818721d0>] kernel_init+0xd8/0x178
>>> [<ffffffff81109bb0>] kernel_thread_helper+0x10/0x18
>>>
>>
>> It appears to be related to use of physical memory above the 16GB
>> barrier.  You could try reducing the amount of memory allocated to the
>> kernel by passing 'mem=1700M' on the kernel command line.
>>
>
> Hi David,
>
> are you sure ?
>
> This is what I see with our own boards (not the reference design board):
>
> Works:
>
> Linux version 3.0.3-423-gfa07d39 (groeck@rbos-pc-13) (gcc version 4.4.1
> (Debian 4.4.1-1) ) #2 SMP PREEMPT Thu Aug 18 14:09:53 PDT 2011
> [ ... ]
> CPU revision is: 000d030b (Cavium Octeon+)
> Checking for the multiply/shift bug... no.
> Checking for the daddiu bug... no.
> Determined physical RAM map:
>   memory: 00000000001fa000 @ 000000000160b000 (usable)
>   memory: 000000000e400000 @ 0000000001900000 (usable)
>   memory: 00000000d0000000 @ 0000000020000000 (usable)
>   memory: 000000000ffff000 @ 00000000f0001000 (usable)
>   memory: 0000000010000000 @ 0000000410000000 (usable)
>
> Crashes:
>
> Linux version 3.0.3-423-gfa07d39 (groeck@rbos-pc-13) (gcc version 4.4.1
> (Debian 4.4.1-1) ) #2 SMP PREEMPT Thu Aug 18 14:09:53 PDT 2011
> [ ... ]
> CPU revision is: 000d0003 (Cavium Octeon)
> Checking for the multiply/shift bug... no.
> Checking for the daddiu bug... no.
> Determined physical RAM map:
>   memory: 00000000001fa000 @ 000000000160b000 (usable)
>   memory: 000000000e400000 @ 0000000001900000 (usable)
>   memory: 0000000060000000 @ 0000000020000000 (usable)
>   memory: 0000000010000000 @ 0000000410000000 (usable)
>
> The memory at 0000000410000000 is there for both CPUs, yet the crash is
> only seen on the board with CN38xx. From a SW perspective, only
> difference besides the CPU type is that the working board has more
> memory.
>

That's right, I normally run on boards with 4GB of memory so I was not 
seeing it.  When I reduce the memory to 2GB, I can see it on most boards.

Sometimes (but not always) I see this:
.
.
.
Movable zone start PFN for each node
early_node_map[5] active PFN ranges
     0: 0x0000056a -> 0x00000578
     0: 0x000005c0 -> 0x00001fc0
     0: 0x00002080 -> 0x00003f80
     0: 0x00008000 -> 0x00020000
     0: 0x00104000 -> 0x00104900
   Normal zone: 2808 pages exceeds realsize 2304

^^^^^^^^^ This is a warning message indicating that something is not right in 
the memory initialization, which is exactly where things are going wrong.
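
As a quick cross-check of the numbers in that output, the size of each active PFN range can be computed directly. This is only a small Python sketch over the five ranges printed above; range_pages is a made-up helper name, and the link to the warning is an observation, not an established cause.

```python
# Active PFN ranges copied from the early_node_map[5] output (start, end).
ACTIVE_PFN_RANGES = [
    (0x0000056a, 0x00000578),
    (0x000005c0, 0x00001fc0),
    (0x00002080, 0x00003f80),
    (0x00008000, 0x00020000),
    (0x00104000, 0x00104900),
]


def range_pages(ranges):
    """Number of pages in each (start_pfn, end_pfn) range; end is exclusive."""
    return [end - start for start, end in ranges]


sizes = range_pages(ACTIVE_PFN_RANGES)
for (start, end), pages in zip(ACTIVE_PFN_RANGES, sizes):
    print("0x%08x -> 0x%08x: %d pages" % (start, end, pages))
print("total present pages:", sum(sizes))
```

The last range, the one far above the others, spans 0x900 = 2304 pages, which is the same number the warning reports as realsize.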

I am retesting on the HEAD and trying to figure out where it is going wrong.

David Daney

* Re: Problems booting 3.0.3 kernel on Octeon CN58XX board
  2011-08-19 17:00   ` Guenter Roeck
  2011-08-19 17:10     ` David Daney
@ 2011-08-19 17:14     ` Jason Kwon
  2011-08-19 17:35       ` Guenter Roeck
  1 sibling, 1 reply; 9+ messages in thread
From: Jason Kwon @ 2011-08-19 17:14 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: David Daney, linux-mips

On 08/19/2011 10:00 AM, Guenter Roeck wrote:
> On Fri, 2011-08-19 at 12:32 -0400, David Daney wrote:
>> On 08/18/2011 05:10 PM, Jason Kwon wrote:
>>> Attempting to boot a 3.0.3 kernel on a CN58XX board produced the
>>> following oops:
>>>
>>> CPU 4 Unable to handle kernel paging request at virtual address
>>> 0000000001c00000, epc == ffffffff811aa9f4, ra == ffffffff811aaa98
>>> Oops[#1]:
>>> Cpu 4
>>> $ 0 : 0000000000000000 0000000010008ce0 ffffffff821d2b80 0000000001c00000
>>> $ 4 : 0000000001c00038 000000000000017c 0000000000080000 0000000000080072
>>> $ 8 : 0000000000000008 0000000000000002 0000000000000003 a800000002284520
>>> $12 : 0000000000000002 ffffffff8186ee80 ffffffffffffff80 0000000000000030
>>> $16 : 0000000000080072 0000000000000001 0000000001bfa8f0 0000000001bfa928
>>> $20 : a800000003aff8f0 00000000000f0000 ffffffff8186ee80 ffffffff821d2a80
>>> $24 : 0000000000000001 0000000000000038
>>> $28 : a80000041fc48000 a80000041fc4bd90 fffffffffffffffc ffffffff811aaa98
>>> Hi : 0000000000000000
>>> Lo : 0000000000000000
>>> epc : ffffffff811aa9f4 setup_per_zone_wmarks+0x19c/0x2d8
>>> Not tainted
>>> ra : ffffffff811aaa98 setup_per_zone_wmarks+0x240/0x2d8
>>> Status: 10008ce2 KX SX UX KERNEL EXL
>>> Cause : 40808408
>>> BadVA : 0000000001c00000
>>> PrId : 000d0301 (Cavium Octeon+)
>>> Modules linked in:
>>> Process swapper (pid: 1, threadinfo=a80000041fc48000,
>>> task=a80000041fc44038, tls=0000000000000000)
>>> Stack : 0000000000000000 000000000006f75d ffffffff8186eec0 0000000000000001
>>> 0000000000000547 ffffffff81825598 ffffffff81a80000 ffffffff818b3e68
>>> ffffffff818a40ac 0000000000000000 ffffffff81a80000 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 ffffffff818a40f0
>>> ffffffff81a80000 ffffffff81100438 ffffffff818b4198 ffffffff818b3e68
>>> ffffffff818b46c8 0000000000000000 0000000000000000 ffffffff818721d0
>>> 0000000000000000 0000000000000000 0000000000000000 ffffffff81109bb0
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 ffffffff818720f8
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> ...
>>> Call Trace:
>>> [<ffffffff811aa9f4>] setup_per_zone_wmarks+0x19c/0x2d8
>>> [<ffffffff818a40f0>] init_per_zone_wmark_min+0x44/0xe0
>>> [<ffffffff81100438>] do_one_initcall+0x38/0x160
>>> [<ffffffff818721d0>] kernel_init+0xd8/0x178
>>> [<ffffffff81109bb0>] kernel_thread_helper+0x10/0x18
>>>
>> It appears to be related to use of physical memory above the 16GB
>> barrier.  You could try reducing the amount of memory allocated to the
>> kernel by passing 'mem=1700M' on the kernel command line.
>>
> Hi David,
>
> are you sure ?
>
> This is what I see with our own boards (not the reference design board):
>
> Works:
>
> Linux version 3.0.3-423-gfa07d39 (groeck@rbos-pc-13) (gcc version 4.4.1
> (Debian 4.4.1-1) ) #2 SMP PREEMPT Thu Aug 18 14:09:53 PDT 2011
> [ ... ]
> CPU revision is: 000d030b (Cavium Octeon+)
> Checking for the multiply/shift bug... no.
> Checking for the daddiu bug... no.
> Determined physical RAM map:
>   memory: 00000000001fa000 @ 000000000160b000 (usable)
>   memory: 000000000e400000 @ 0000000001900000 (usable)
>   memory: 00000000d0000000 @ 0000000020000000 (usable)
>   memory: 000000000ffff000 @ 00000000f0001000 (usable)
>   memory: 0000000010000000 @ 0000000410000000 (usable)
>
> Crashes:
>
> Linux version 3.0.3-423-gfa07d39 (groeck@rbos-pc-13) (gcc version 4.4.1
> (Debian 4.4.1-1) ) #2 SMP PREEMPT Thu Aug 18 14:09:53 PDT 2011
> [ ... ]
> CPU revision is: 000d0003 (Cavium Octeon)
> Checking for the multiply/shift bug... no.
> Checking for the daddiu bug... no.
> Determined physical RAM map:
>   memory: 00000000001fa000 @ 000000000160b000 (usable)
>   memory: 000000000e400000 @ 0000000001900000 (usable)
>   memory: 0000000060000000 @ 0000000020000000 (usable)
>   memory: 0000000010000000 @ 0000000410000000 (usable)
>
> The memory at 0000000410000000 is there for both CPUs, yet the crash is
> only seen on the board with CN38xx. From a SW perspective, only
> difference besides the CPU type is that the working board has more
> memory.
>
> Thanks,
> Guenter
>
>
Well, I can confirm that setting mem=1700m on my CN5860 board allowed it 
to boot, at least:

Linux version 3.0.3-Cavium-Octeon+ (jkwon@xc5-pc2) (gcc version 4.3.3 
(Cavium Networks Version: 2_0_0 build 95) ) #2 SMP Thu Aug 18 15:16:58 
PDT 2011
[ ... ]
CPU revision is: 000d0301 (Cavium Octeon+)
Checking for the multiply/shift bug... no.
Checking for the daddiu bug... no.
Determined physical RAM map:
  memory: 0000000000208000 @ 0000000001872000 (usable)
  memory: 000000000dc00000 @ 0000000002200000 (usable)
  memory: 000000005c800000 @ 0000000020000000 (usable)

I also tried the memory restriction on a CN3860 board that was hitting 
the same oops. It got past that point and then hit a different problem, 
so the restriction did seem to work on both boards.

Jason

* Re: Problems booting 3.0.3 kernel on Octeon CN58XX board
  2011-08-19 17:14     ` Jason Kwon
@ 2011-08-19 17:35       ` Guenter Roeck
  0 siblings, 0 replies; 9+ messages in thread
From: Guenter Roeck @ 2011-08-19 17:35 UTC (permalink / raw)
  To: Jason Kwon; +Cc: David Daney, linux-mips

On Fri, 2011-08-19 at 13:14 -0400, Jason Kwon wrote:
> On 08/19/2011 10:00 AM, Guenter Roeck wrote:
> > On Fri, 2011-08-19 at 12:32 -0400, David Daney wrote:
> >> On 08/18/2011 05:10 PM, Jason Kwon wrote:
> >>> Attempting to boot a 3.0.3 kernel on a CN58XX board produced the
> >>> following oops:
> >>>
> >>> CPU 4 Unable to handle kernel paging request at virtual address
> >>> 0000000001c00000, epc == ffffffff811aa9f4, ra == ffffffff811aaa98
> >>> Oops[#1]:
> >>> Cpu 4
> >>> $ 0 : 0000000000000000 0000000010008ce0 ffffffff821d2b80 0000000001c00000
> >>> $ 4 : 0000000001c00038 000000000000017c 0000000000080000 0000000000080072
> >>> $ 8 : 0000000000000008 0000000000000002 0000000000000003 a800000002284520
> >>> $12 : 0000000000000002 ffffffff8186ee80 ffffffffffffff80 0000000000000030
> >>> $16 : 0000000000080072 0000000000000001 0000000001bfa8f0 0000000001bfa928
> >>> $20 : a800000003aff8f0 00000000000f0000 ffffffff8186ee80 ffffffff821d2a80
> >>> $24 : 0000000000000001 0000000000000038
> >>> $28 : a80000041fc48000 a80000041fc4bd90 fffffffffffffffc ffffffff811aaa98
> >>> Hi : 0000000000000000
> >>> Lo : 0000000000000000
> >>> epc : ffffffff811aa9f4 setup_per_zone_wmarks+0x19c/0x2d8
> >>> Not tainted
> >>> ra : ffffffff811aaa98 setup_per_zone_wmarks+0x240/0x2d8
> >>> Status: 10008ce2 KX SX UX KERNEL EXL
> >>> Cause : 40808408
> >>> BadVA : 0000000001c00000
> >>> PrId : 000d0301 (Cavium Octeon+)
> >>> Modules linked in:
> >>> Process swapper (pid: 1, threadinfo=a80000041fc48000,
> >>> task=a80000041fc44038, tls=0000000000000000)
> >>> Stack : 0000000000000000 000000000006f75d ffffffff8186eec0 0000000000000001
> >>> 0000000000000547 ffffffff81825598 ffffffff81a80000 ffffffff818b3e68
> >>> ffffffff818a40ac 0000000000000000 ffffffff81a80000 0000000000000000
> >>> 0000000000000000 0000000000000000 0000000000000000 ffffffff818a40f0
> >>> ffffffff81a80000 ffffffff81100438 ffffffff818b4198 ffffffff818b3e68
> >>> ffffffff818b46c8 0000000000000000 0000000000000000 ffffffff818721d0
> >>> 0000000000000000 0000000000000000 0000000000000000 ffffffff81109bb0
> >>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> >>> 0000000000000000 0000000000000000 0000000000000000 ffffffff818720f8
> >>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> >>> ...
> >>> Call Trace:
> >>> [<ffffffff811aa9f4>] setup_per_zone_wmarks+0x19c/0x2d8
> >>> [<ffffffff818a40f0>] init_per_zone_wmark_min+0x44/0xe0
> >>> [<ffffffff81100438>] do_one_initcall+0x38/0x160
> >>> [<ffffffff818721d0>] kernel_init+0xd8/0x178
> >>> [<ffffffff81109bb0>] kernel_thread_helper+0x10/0x18
> >>>
> >> It appears to be related to use of physical memory above the 16GB
> >> barrier.  You could try reducing the amount of memory allocated to the
> >> kernel by passing 'mem=1700M' on the kernel command line.
> >>
> > Hi David,
> >
> > are you sure ?
> >
> > This is what I see with our own boards (not the reference design board):
> >
> > Works:
> >
> > Linux version 3.0.3-423-gfa07d39 (groeck@rbos-pc-13) (gcc version 4.4.1
> > (Debian 4.4.1-1) ) #2 SMP PREEMPT Thu Aug 18 14:09:53 PDT 2011
> > [ ... ]
> > CPU revision is: 000d030b (Cavium Octeon+)
> > Checking for the multiply/shift bug... no.
> > Checking for the daddiu bug... no.
> > Determined physical RAM map:
> >   memory: 00000000001fa000 @ 000000000160b000 (usable)
> >   memory: 000000000e400000 @ 0000000001900000 (usable)
> >   memory: 00000000d0000000 @ 0000000020000000 (usable)
> >   memory: 000000000ffff000 @ 00000000f0001000 (usable)
> >   memory: 0000000010000000 @ 0000000410000000 (usable)
> >
> > Crashes:
> >
> > Linux version 3.0.3-423-gfa07d39 (groeck@rbos-pc-13) (gcc version 4.4.1
> > (Debian 4.4.1-1) ) #2 SMP PREEMPT Thu Aug 18 14:09:53 PDT 2011
> > [ ... ]
> > CPU revision is: 000d0003 (Cavium Octeon)
> > Checking for the multiply/shift bug... no.
> > Checking for the daddiu bug... no.
> > Determined physical RAM map:
> >   memory: 00000000001fa000 @ 000000000160b000 (usable)
> >   memory: 000000000e400000 @ 0000000001900000 (usable)
> >   memory: 0000000060000000 @ 0000000020000000 (usable)
> >   memory: 0000000010000000 @ 0000000410000000 (usable)
> >
> > The memory at 0000000410000000 is there for both CPUs, yet the crash is
> > only seen on the board with CN38xx. From a SW perspective, only
> > difference besides the CPU type is that the working board has more
> > memory.
> >
> > Thanks,
> > Guenter
> >
> >
> Well, I can confirm that setting mem=1700m on my CN5860 board allowed it 
> to boot, at least:
> 
> Linux version 3.0.3-Cavium-Octeon+ (jkwon@xc5-pc2) (gcc version 4.3.3 
> (Cavium Networks Version: 2_0_0 build 95) ) #2 SMP Thu Aug 18 15:16:58 
> PDT 2011
> [ ... ]
> CPU revision is: 000d0301 (Cavium Octeon+)
> Checking for the multiply/shift bug... no.
> Checking for the daddiu bug... no.
> Determined physical RAM map:
>   memory: 0000000000208000 @ 0000000001872000 (usable)
>   memory: 000000000dc00000 @ 0000000002200000 (usable)
>   memory: 000000005c800000 @ 0000000020000000 (usable)
> 
> I also tried the memory restriction on a CN3860 board that was hitting 
> the same oops. It got past that point and then hit a different problem, 
> so the restriction seems to avoid the oops on both boards.
> 
Another data point: The board with CN38xx boots with mem=1700m:

Linux version 3.0.3-422-gcdb65d6 (groeck@rbos-pc-13) (gcc version 4.4.1
(Debian 4.4.1-1) ) #2 SMP PREEMPT Fri Aug 19 10:14:27 PDT 2011
[ ... ]
CPU revision is: 000d0003 (Cavium Octeon)
Checking for the multiply/shift bug... no.
Checking for the daddiu bug... no.
Determined physical RAM map:
 memory: 00000000001fa000 @ 000000000160a000 (usable)
 memory: 000000000e400000 @ 0000000001900000 (usable)
 memory: 0000000060000000 @ 0000000020000000 (usable)
 memory: 0000000010000000 @ 0000000410000000 (usable)
User-defined physical RAM map:
 memory: 000000006a400000 @ 0000000000000000 (usable)
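
The size in the user-defined map lines up with the command-line value: 0x6a400000 bytes is exactly 1700 MiB. A quick check, using a simplified stand-in for size-suffix parsing (parse_mem_arg is a hypothetical helper written for this note, not the kernel's own parser):

```python
MIB = 1 << 20


def parse_mem_arg(arg):
    """Convert a size such as '1700m' or '1700M' to bytes (k/m/g suffixes only)."""
    shifts = {"k": 10, "m": 20, "g": 30}
    arg = arg.strip().lower()
    if arg and arg[-1] in shifts:
        return int(arg[:-1], 0) << shifts[arg[-1]]
    return int(arg, 0)


user_map_size = 0x6a400000  # size from the "User-defined physical RAM map" line

print(parse_mem_arg("1700m") == user_map_size)  # True
print(user_map_size // MIB)                     # 1700
```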

Guenter

Thread overview: 9+ messages
2011-08-19  0:10 Problems booting 3.0.3 kernel on Octeon CN58XX board Jason Kwon
2011-08-19  0:10 ` Jason Kwon
2011-08-19  1:05 ` David Daney
2011-08-19 10:59 ` Sergei Shtylyov
2011-08-19 16:32 ` David Daney
2011-08-19 17:00   ` Guenter Roeck
2011-08-19 17:10     ` David Daney
2011-08-19 17:14     ` Jason Kwon
2011-08-19 17:35       ` Guenter Roeck