* oops in futex_init()
From: Manuel Lauss @ 2009-04-28 12:46 UTC
  To: Linux-MIPS

Hello!

From time to time, my test systems don't boot correctly and spew the
following oops in futex_init():

calling  init_timer_list_procfs+0x0/0x40 @ 1
initcall init_timer_list_procfs+0x0/0x40 returned 0 after 29 usecs
calling  futex_init+0x0/0xac @ 1
Reserved instruction in kernel code[#1]:
Cpu 0
$ 0   : 00000000 10003c00 00000000 00000001
$ 4   : fffffff2 00000000 32e02014 00000000
$ 8   : 00000000 00000000 c4653600 000000cd
$12   : 3b9aca00 000186a0 870ce3f0 0000000d
$16   : 32e02014 00000000 00000000 8042f0dc
$20   : 00000000 00000000 00000000 00000000
$24   : 00000005 80243a3c
$28   : 87020000 87021f30 00000000 80100460
Hi    : 00000000
Lo    : 00000000
epc   : 8042f0f8 futex_init+0x1c/0xac
    Not tainted
ra    : 80100460 _stext+0x60/0x1c8
Status: 10003c03    KERNEL EXL IE
Cause : 00808028
PrId  : 04030202 (Au1250)
Modules linked in:
Process swapper (pid: 1, threadinfo=87020000, task=87018000, tls=00000000)
Stack : 00000000 8042f0dc 00000001 00002543 0000001d 00000000 87021f00 8014f014
        0000000e 00000000 8702a900 87002000 00003137 00000000 00000000 801ba18c
        8041e7a0 000000e0 80410000 00000000 00000000 8014f09c 32e02014 00000000
        80448360 804484f4 00000000 00000000 00000000 80428304 00000000 00000000
        00000000 00000000 87020000 00000000 00000000 80106ea4 10003c03 00000000
        ...
Call Trace:
[<8042f0f8>] futex_init+0x1c/0xac
[<80100460>] _stext+0x60/0x1c8
[<80428304>] kernel_init+0x98/0x104
[<80106ea4>] kernel_thread_helper+0x10/0x18


Code: 30420004  14400008  2404fff2 <c0440000> 14800005  00000000  00000821  e0410000  1020fffa
Disabling lock debugging due to kernel taint
note: swapper[1] exited with preempt_count 1
Kernel panic - not syncing: Attempted to kill init!


Disassembly of futex_init():

(gdb) disass 0x8042f0f8
Dump of assembler code for function futex_init:
0x8042f0dc <futex_init+0>:      lw      v1,20(gp)
0x8042f0e0 <futex_init+4>:      addiu   v1,v1,1
0x8042f0e4 <futex_init+8>:      sw      v1,20(gp)
0x8042f0e8 <futex_init+12>:     lw      v0,24(gp)
0x8042f0ec <futex_init+16>:     andi    v0,v0,0x4
0x8042f0f0 <futex_init+20>:     bnez    v0,0x8042f114 <futex_init+56>
0x8042f0f4 <futex_init+24>:     li      a0,-14
0x8042f0f8 <futex_init+28>:     ll      a0,0(v0)
0x8042f0fc <futex_init+32>:     bnez    a0,0x8042f114 <futex_init+56>
0x8042f100 <futex_init+36>:     nop
0x8042f104 <futex_init+40>:     move    at,zero
0x8042f108 <futex_init+44>:     sc      at,0(v0)
0x8042f10c <futex_init+48>:     beqz    at,0x8042f0f8 <futex_init+28>
0x8042f110 <futex_init+52>:     nop
0x8042f114 <futex_init+56>:     lw      v1,20(gp)
0x8042f118 <futex_init+60>:     addiu   v1,v1,-1
0x8042f11c <futex_init+64>:     sw      v1,20(gp)
0x8042f120 <futex_init+68>:     li      v0,-14
0x8042f124 <futex_init+72>:     beq     a0,v0,0x8042f144 <futex_init+104>
0x8042f128 <futex_init+76>:     li      v1,1
0x8042f12c <futex_init+80>:     lui     v0,0x8049
0x8042f130 <futex_init+84>:     addiu   a0,v0,26536
0x8042f134 <futex_init+88>:     move    a1,zero
0x8042f138 <futex_init+92>:     move    a2,a0
0x8042f13c <futex_init+96>:     j       0x8042f150 <futex_init+116>
0x8042f140 <futex_init+100>:    li      a3,256
0x8042f144 <futex_init+104>:    lui     v0,0x8049
0x8042f148 <futex_init+108>:    j       0x8042f12c <futex_init+80>
0x8042f14c <futex_init+112>:    sw      v1,26528(v0)
0x8042f150 <futex_init+116>:    sll     v1,a1,0x4
0x8042f154 <futex_init+120>:    sll     v0,a1,0x4
0x8042f158 <futex_init+124>:    addiu   v1,v1,8
0x8042f15c <futex_init+128>:    addu    v0,a2,v0
0x8042f160 <futex_init+132>:    addu    v1,a2,v1
0x8042f164 <futex_init+136>:    addiu   a1,a1,1
0x8042f168 <futex_init+140>:    sw      v0,4(a0)
0x8042f16c <futex_init+144>:    sw      v1,12(a0)
0x8042f170 <futex_init+148>:    sw      v0,0(a0)
0x8042f174 <futex_init+152>:    sw      v1,8(a0)
0x8042f178 <futex_init+156>:    bne     a1,a3,0x8042f150 <futex_init+116>
0x8042f17c <futex_init+160>:    addiu   a0,a0,16
0x8042f180 <futex_init+164>:    jr      ra
0x8042f184 <futex_init+168>:    move    v0,zero
End of assembler dump.



Could this be a compiler/toolchain issue?
(gcc-4.3.3, binutils-2.19.1)

Thanks!
	Manuel Lauss

* Re: oops in futex_init()
From: Ralf Baechle @ 2009-04-29  6:03 UTC
  To: Manuel Lauss; +Cc: Linux-MIPS

On Tue, Apr 28, 2009 at 02:46:45PM +0200, Manuel Lauss wrote:

> From time to time, my test systems don't boot correctly and spew the
> following oops in futex_init():
> 
> calling  init_timer_list_procfs+0x0/0x40 @ 1
> initcall init_timer_list_procfs+0x0/0x40 returned 0 after 29 usecs
> calling  futex_init+0x0/0xac @ 1
> Reserved instruction in kernel code[#1]:
> Cpu 0
> $ 0   : 00000000 10003c00 00000000 00000001
> $ 4   : fffffff2 00000000 32e02014 00000000
> $ 8   : 00000000 00000000 c4653600 000000cd
> $12   : 3b9aca00 000186a0 870ce3f0 0000000d
> $16   : 32e02014 00000000 00000000 8042f0dc
> $20   : 00000000 00000000 00000000 00000000
> $24   : 00000005 80243a3c
> $28   : 87020000 87021f30 00000000 80100460
> Hi    : 00000000
> Lo    : 00000000
> epc   : 8042f0f8 futex_init+0x1c/0xac
>     Not tainted
> ra    : 80100460 _stext+0x60/0x1c8
> Status: 10003c03    KERNEL EXL IE
> Cause : 00808028
> PrId  : 04030202 (Au1250)
> Modules linked in:
> Process swapper (pid: 1, threadinfo=87020000, task=87018000, tls=00000000)
> Stack : 00000000 8042f0dc 00000001 00002543 0000001d 00000000 87021f00 8014f014
>         0000000e 00000000 8702a900 87002000 00003137 00000000 00000000 801ba18c
>         8041e7a0 000000e0 80410000 00000000 00000000 8014f09c 32e02014 00000000
>         80448360 804484f4 00000000 00000000 00000000 80428304 00000000 00000000
>         00000000 00000000 87020000 00000000 00000000 80106ea4 10003c03 00000000
>         ...
> Call Trace:
> [<8042f0f8>] futex_init+0x1c/0xac
> [<80100460>] _stext+0x60/0x1c8
> [<80428304>] kernel_init+0x98/0x104
> [<80106ea4>] kernel_thread_helper+0x10/0x18
> 
> 
> Code: 30420004  14400008  2404fff2 <c0440000> 14800005  00000000  00000821  e0410000  1020fffa
> Disabling lock debugging due to kernel taint
> note: swapper[1] exited with preempt_count 1
> Kernel panic - not syncing: Attempted to kill init!
> 
> 
> Disassembly of futex_init():
> 
> (gdb) disass 0x8042f0f8
> Dump of assembler code for function futex_init:
> 0x8042f0dc <futex_init+0>:      lw      v1,20(gp)
> 0x8042f0e0 <futex_init+4>:      addiu   v1,v1,1
> 0x8042f0e4 <futex_init+8>:      sw      v1,20(gp)
> 0x8042f0e8 <futex_init+12>:     lw      v0,24(gp)
> 0x8042f0ec <futex_init+16>:     andi    v0,v0,0x4
> 0x8042f0f0 <futex_init+20>:     bnez    v0,0x8042f114 <futex_init+56>
> 0x8042f0f4 <futex_init+24>:     li      a0,-14
> 0x8042f0f8 <futex_init+28>:     ll      a0,0(v0)

So this is in futex_atomic_cmpxchg_inatomic, which has been inlined into
futex_init.  The epc is pointing to this LL instruction, which is a
legitimate MIPS32 instruction, so a reserved instruction exception does
not make sense.  However, a NULL pointer has intentionally been passed
as the argument here, so this LL instruction will take a TLB exception and
do_page_fault() will change the EPC to point to the fixup handler, which
in the sources is these lines:

                "       .section .fixup,\"ax\"                          \n"
                "4:     li      %0, %5                                  \n"
                "       j       3b                                      \n"
                "       .previous                                       \n"
                "       .section __ex_table,\"a\"                       \n"
                "       "__UA_ADDR "\t1b, 4b                            \n"
                "       "__UA_ADDR "\t2b, 4b                            \n"
                "       .previous                                       \n"

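As a rough user-space sketch of what that table is for (not the kernel's
actual code): each __ex_table entry pairs the address of an instruction
that may fault with the address to resume at.  The struct layout, the
find_fixup() name and the fixup address 0x80440000 are assumptions made up
for illustration; only the ll and sc addresses come from the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one __ex_table entry: the address of an
 * instruction that may fault and the address to resume at if it does. */
struct ex_entry {
    uint32_t insn;
    uint32_t fixup;
};

/* The ll and sc addresses are taken from the disassembly of futex_init();
 * the fixup address is invented for this example. */
static const struct ex_entry ex_table[] = {
    { 0x8042f0f8u, 0x80440000u },   /* 1b: ll a0,0(v0) */
    { 0x8042f108u, 0x80440000u },   /* 2b: sc at,0(v0) */
};

/* Return the fixup address for a faulting EPC, or 0 if no entry matches
 * (in which case the fault would be reported as an oops). */
static uint32_t find_fixup(uint32_t epc)
{
    size_t i;

    for (i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++) {
        if (ex_table[i].insn == epc)
            return ex_table[i].fixup;
    }
    return 0;
}
```

In this sketch find_fixup(0x8042f0f8) returns the fixup address, so a fault
on the ll would resume at label 4 instead of producing an oops.
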
That's how it normally should function.  If, however, something goes wrong
in the exception handler while c0_status.exl is still set, the c0_epc
register won't be updated for the 2nd exception, which is that reserved
instruction exception.  This sort of bug can be ugly to chase, I'm afraid.
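
The register dump in the oops can be read against this.  A minimal sketch,
assuming the standard MIPS32 layout of c0_status (EXL is bit 1) and c0_cause
(ExcCode is bits 6..2); the macro and function names are hypothetical and
not taken from the kernel headers:

```c
#include <stdint.h>

#define STATUS_EXL          0x00000002u  /* exception level bit */
#define CAUSE_EXCCODE_SHIFT 2
#define CAUSE_EXCCODE_MASK  0x1fu

/* Extract the 5-bit exception code from a c0_cause value. */
static unsigned cause_exccode(uint32_t cause)
{
    return (cause >> CAUSE_EXCCODE_SHIFT) & CAUSE_EXCCODE_MASK;
}

/* Return 1 if the EXL bit is set in a c0_status value, else 0. */
static int status_exl(uint32_t status)
{
    return (status & STATUS_EXL) != 0;
}

/* Names for the exception codes relevant to this thread only. */
static const char *exccode_name(unsigned code)
{
    switch (code) {
    case 2:  return "TLBL";  /* TLB miss on load or instruction fetch */
    case 3:  return "TLBS";  /* TLB miss on store */
    case 4:  return "AdEL";  /* address error on load */
    case 5:  return "AdES";  /* address error on store */
    case 10: return "RI";    /* reserved instruction */
    default: return "other";
    }
}
```

With the values from the oops, cause_exccode(0x00808028) gives 10 (RI) and
status_exl(0x10003c03) gives 1, which matches the "KERNEL EXL IE" line.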

  Ralf

* Re: oops in futex_init()
From: Manuel Lauss @ 2009-04-29  8:25 UTC
  To: Ralf Baechle; +Cc: Linux-MIPS

> > (gdb) disass 0x8042f0f8
> > Dump of assembler code for function futex_init:
> > 0x8042f0dc <futex_init+0>:      lw      v1,20(gp)
> > 0x8042f0e0 <futex_init+4>:      addiu   v1,v1,1
> > 0x8042f0e4 <futex_init+8>:      sw      v1,20(gp)
> > 0x8042f0e8 <futex_init+12>:     lw      v0,24(gp)
> > 0x8042f0ec <futex_init+16>:     andi    v0,v0,0x4
> > 0x8042f0f0 <futex_init+20>:     bnez    v0,0x8042f114 <futex_init+56>
> > 0x8042f0f4 <futex_init+24>:     li      a0,-14
> > 0x8042f0f8 <futex_init+28>:     ll      a0,0(v0)
> 
> So this is in futex_atomic_cmpxchg_inatomic, which has been inlined into
> futex_init.  The epc is pointing to this LL instruction, which is a
> legitimate MIPS32 instruction, so a reserved instruction exception does
> not make sense.  However, a NULL pointer has intentionally been passed
> as the argument here, so this LL instruction will take a TLB exception and
> do_page_fault() will change the EPC to point to the fixup handler, which
> in the sources is these lines:
> 
>                 "       .section .fixup,\"ax\"                          \n"
>                 "4:     li      %0, %5                                  \n"
>                 "       j       3b                                      \n"
>                 "       .previous                                       \n"
>                 "       .section __ex_table,\"a\"                       \n"
>                 "       "__UA_ADDR "\t1b, 4b                            \n"
>                 "       "__UA_ADDR "\t2b, 4b                            \n"
>                 "       .previous                                       \n"
> 
> That's how it normally should function.  If, however, something goes wrong
> in the exception handler while c0_status.exl is still set, the c0_epc
> register won't be updated for the 2nd exception, which is that reserved
> instruction exception.  This sort of bug can be ugly to chase, I'm afraid.

Thanks for this info! In other words, this oops is actually the result of
another earlier problem, which trashes something used by the tlb fault
handler? (I've also seen this oops as a "kernel unaligned access" with epc
at the 'll'.  Also, isn't it a problem that a0 is -14 instead of zero?).

Thanks!
	Manuel Lauss

* Re: oops in futex_init()
From: Ralf Baechle @ 2009-04-29  8:33 UTC
  To: Manuel Lauss; +Cc: Linux-MIPS

On Wed, Apr 29, 2009 at 10:25:56AM +0200, Manuel Lauss wrote:

> > > (gdb) disass 0x8042f0f8
> > > Dump of assembler code for function futex_init:
> > > 0x8042f0dc <futex_init+0>:      lw      v1,20(gp)
> > > 0x8042f0e0 <futex_init+4>:      addiu   v1,v1,1
> > > 0x8042f0e4 <futex_init+8>:      sw      v1,20(gp)
> > > 0x8042f0e8 <futex_init+12>:     lw      v0,24(gp)
> > > 0x8042f0ec <futex_init+16>:     andi    v0,v0,0x4
> > > 0x8042f0f0 <futex_init+20>:     bnez    v0,0x8042f114 <futex_init+56>
> > > 0x8042f0f4 <futex_init+24>:     li      a0,-14
> > > 0x8042f0f8 <futex_init+28>:     ll      a0,0(v0)
> > 
> > So this is in futex_atomic_cmpxchg_inatomic, which has been inlined into
> > futex_init.  The epc is pointing to this LL instruction, which is a
> > legitimate MIPS32 instruction, so a reserved instruction exception does
> > not make sense.  However, a NULL pointer has intentionally been passed
> > as the argument here, so this LL instruction will take a TLB exception and
> > do_page_fault() will change the EPC to point to the fixup handler, which
> > in the sources is these lines:
> > 
> >                 "       .section .fixup,\"ax\"                          \n"
> >                 "4:     li      %0, %5                                  \n"
> >                 "       j       3b                                      \n"
> >                 "       .previous                                       \n"
> >                 "       .section __ex_table,\"a\"                       \n"
> >                 "       "__UA_ADDR "\t1b, 4b                            \n"
> >                 "       "__UA_ADDR "\t2b, 4b                            \n"
> >                 "       .previous                                       \n"
> > 
> > That's how it normally should function.  If, however, something goes wrong
> > in the exception handler while c0_status.exl is still set, the c0_epc
> > register won't be updated for the 2nd exception, which is that reserved
> > instruction exception.  This sort of bug can be ugly to chase, I'm afraid.
> 
> Thanks for this info! In other words, this oops is actually the result of
> another earlier problem, which trashes something used by the tlb fault
> handler? (I've also seen this oops as a "kernel unaligned access" with epc
> at the 'll'.  Also, isn't it a problem that a0 is -14 instead of zero?).

No - it will be overwritten either after the load succeeds or in the
fixup handler.  The load of the -14 value comes from __access_ok() and
happens to be in the branch delay slot of a branch which will never be
taken - but that's as far as gcc knows how to optimize the access_ok()
invocation away.

When did this issue start?  I wonder if it was when you removed the Alchemy
hazard barriers?

  Ralf

* Re: oops in futex_init()
From: Manuel Lauss @ 2009-04-29 11:40 UTC
  To: Ralf Baechle; +Cc: Linux-MIPS

On Wed, Apr 29, 2009 at 10:33:49AM +0200, Ralf Baechle wrote:
> On Wed, Apr 29, 2009 at 10:25:56AM +0200, Manuel Lauss wrote:
> 
> > > > (gdb) disass 0x8042f0f8
> > > > Dump of assembler code for function futex_init:
> > > > 0x8042f0dc <futex_init+0>:      lw      v1,20(gp)
> > > > 0x8042f0e0 <futex_init+4>:      addiu   v1,v1,1
> > > > 0x8042f0e4 <futex_init+8>:      sw      v1,20(gp)
> > > > 0x8042f0e8 <futex_init+12>:     lw      v0,24(gp)
> > > > 0x8042f0ec <futex_init+16>:     andi    v0,v0,0x4
> > > > 0x8042f0f0 <futex_init+20>:     bnez    v0,0x8042f114 <futex_init+56>
> > > > 0x8042f0f4 <futex_init+24>:     li      a0,-14
> > > > 0x8042f0f8 <futex_init+28>:     ll      a0,0(v0)
> > > 
> > > So this is in futex_atomic_cmpxchg_inatomic, which has been inlined into
> > > futex_init.  The epc is pointing to this LL instruction, which is a
> > > legitimate MIPS32 instruction, so a reserved instruction exception does
> > > not make sense.  However, a NULL pointer has intentionally been passed
> > > as the argument here, so this LL instruction will take a TLB exception and
> > > do_page_fault() will change the EPC to point to the fixup handler, which
> > > in the sources is these lines:
> > > 
> > >                 "       .section .fixup,\"ax\"                          \n"
> > >                 "4:     li      %0, %5                                  \n"
> > >                 "       j       3b                                      \n"
> > >                 "       .previous                                       \n"
> > >                 "       .section __ex_table,\"a\"                       \n"
> > >                 "       "__UA_ADDR "\t1b, 4b                            \n"
> > >                 "       "__UA_ADDR "\t2b, 4b                            \n"
> > >                 "       .previous                                       \n"
> > > 
> > > That's how it normally should function.  If, however, something goes wrong
> > > in the exception handler while c0_status.exl is still set, the c0_epc
> > > register won't be updated for the 2nd exception, which is that reserved
> > > instruction exception.  This sort of bug can be ugly to chase, I'm afraid.
> > 
> > Thanks for this info! In other words, this oops is actually the result of
> > another earlier problem, which trashes something used by the tlb fault
> > handler? (I've also seen this oops as a "kernel unaligned access" with epc
> > at the 'll'.  Also, isn't it a problem that a0 is -14 instead of zero?).
> 
> No - it will be overwritten either after the load succeeds or in the
> fixup handler.  The load of the -14 value comes from __access_ok() and
> happens to be in the branch delay slot of a branch which will never be
> taken - but that's as far as gcc knows how to optimize the access_ok()
> invocation away.
> 
> When did this issue start?  I wonder if it was when you removed the Alchemy
> hazard barriers?

No; it started shortly after 2.6.30 was opened and I added TSC2007 support
to my board.  I don't see it on the Db1200, only on systems at work.
I suspect it's parts of the board code which trigger it; I just can't figure
out which (i.e. it's just like any other board code, with lots of platform
device and resource structs spread over a few files).

I've been running kernels with the hazard barriers removed since before
2.6.28 came out and never ran into problems.  I don't think they're
responsible. But I'll revert the removal and keep looking for the real
reason ;-)

Thanks!
	Manuel Lauss

* Re: oops in futex_init()
From: Manuel Lauss @ 2009-04-29 14:14 UTC
  To: Ralf Baechle; +Cc: Linux-MIPS


FWIW, I think I fixed it: I have a small area (< 4kB) with a lot of UARTs
and 3 interrupt controllers in it.  An ioremap() was done for each uart and
irq ctl area.  Now there's one ioremap of the whole area and the oops is
gone.  I don't know why, but it seems fixed. (The oops appeared after one
of the remapped areas was touched).

Thanks!
	Manuel Lauss

* Re: oops in futex_init()
From: Kevin D. Kissell @ 2009-04-29 14:20 UTC
  To: Manuel Lauss; +Cc: Ralf Baechle, Linux-MIPS

Manuel Lauss wrote:
> FWIW, I think I fixed it: I have a small area (< 4kB) with a lot of UARTs
> and 3 interrupt controllers in it.  An ioremap() was done for each uart and
> irq ctl area.  Now there's one ioremap of the whole area and the oops is
> gone.  I don't know why, but it seems fixed. (The oops appeared after one
> of the remapped areas was touched).
By any chance would it be possible for you to revert to the failing
configuration and dump the contents of the TLB at the oops?  Your
description makes it sound like the multiple ioremaps are generating
duplicate or otherwise conflicting TLB entries.  If that's so, there's a
bug in the TLB management code to be hunted down and killed while we
have a test case...

          Regards,

          Kevin K.

* Re: oops in futex_init()
From: Ralf Baechle @ 2009-04-29 14:35 UTC
  To: Manuel Lauss; +Cc: Linux-MIPS

On Wed, Apr 29, 2009 at 04:14:11PM +0200, Manuel Lauss wrote:

> FWIW, I think I fixed it: I have a small area (< 4kB) with a lot of UARTs
> and 3 interrupt controllers in it.  An ioremap() was done for each uart and
> irq ctl area.  Now there's one ioremap of the whole area and the oops is
> gone.  I don't know why, but it seems fixed. (The oops appeared after one
> of the remapped areas was touched).

That should be benign - especially if the mappings are for physical
addresses < 512MB which would become mapped as KSEG1 addresses.  The
dangerous cases are where multiple mappings alias (can't happen on
Alchemy caches) or where different machines use different cache modes
which also shouldn't hit you because I/O addresses should be mapped
uncacheable.  So you may want to try out what Kevin suggested to get to
the root of this.

  Ralf

* Re: oops in futex_init()
From: Manuel Lauss @ 2009-04-29 15:32 UTC
  To: Ralf Baechle, Kevin D. Kissell; +Cc: Linux-MIPS

Hi Ralf, Kevin,

On Wed, Apr 29, 2009 at 04:35:23PM +0200, Ralf Baechle wrote:
> On Wed, Apr 29, 2009 at 04:14:11PM +0200, Manuel Lauss wrote:
> 
> > FWIW, I think I fixed it: I have a small area (< 4kB) with a lot of UARTs
> > and 3 interrupt controllers in it.  An ioremap() was done for each uart and
> > irq ctl area.  Now there's one ioremap of the whole area and the oops is
> > gone.  I don't know why, but it seems fixed. (The oops appeared after one
> > of the remapped areas was touched).
> 
> That should be benign - especially if the mappings are for physical
> addresses < 512MB which would become mapped as KSEG1 addresses.  The
> dangerous cases are where multiple mappings alias (can't happen on
> Alchemy caches) or where different machines use different cache modes
> which also shouldn't hit you because I/O addresses should be mapped
> uncacheable.  So you may want to try out what Kevin suggested to get to
> the root of this.

This CS is outside the KSEG0/1 areas.  The code in question did an ioremap
on 3 adjacent 8-byte areas (at offset 0x8, 0x10 and 0x14 from the CS base)
for the irq controllers and then registered new irqs in a device_initcall.
I replaced this with an ioremap of a 16 MB area and moved irq registration to
a subsys_initcall.

As I said, the oops appeared with 2.6.30 when I added support for the
TSC2007; for this I introduced yet another ioremap() for offset 0x10 (in
code unrelated to the irq stuff) for the pen-down callback.

I'm going to try Kevin's suggestion if it turns out that this whole ordeal
is not completely my own fault (which I assume it is).

Thank you,
	Manuel Lauss

* Re: oops in futex_init()
From: Manuel Lauss @ 2009-04-30 10:41 UTC
  To: Ralf Baechle, Kevin D. Kissell; +Cc: Linux-MIPS

Hi,

This is really embarrassing:  The oops is what happens when you
ioremap(0x10) and write 0xffffffff at the resulting address (0xa0000010).

Thanks for your time!
	Manuel Lauss

* Re: oops in futex_init()
From: Kevin D. Kissell @ 2009-04-30 11:14 UTC
  To: Manuel Lauss; +Cc: Ralf Baechle, Linux-MIPS

Manuel Lauss wrote:
> Hi,
>
> This is really embarrassing:  The oops is what happens when you
> ioremap(0x10) and write 0xffffffff at the resulting address (0xa0000010).
>   
This pretty much implies that _CACHE_UNCACHED was passed in the flags
value to ioremap, so it simply returned the kseg1 address.  By any
chance is physical page zero also referenced somewhere as cacheable?
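
The address arithmetic behind this can be sketched as follows, assuming the
usual 32-bit MIPS segment layout (KSEG0 at 0x80000000 cached, KSEG1 at
0xa0000000 uncached, both windows onto the low 512MB of physical memory).
The helper names are hypothetical and are not the kernel's own macros:

```c
#include <stdint.h>

#define KSEG1_BASE     0xa0000000u  /* unmapped, uncached segment */
#define UNMAPPED_LIMIT 0x20000000u  /* 512MB reachable via KSEG0/KSEG1 */

/* Return the KSEG1 (uncached) alias of a physical address below 512MB,
 * or 0 if the address would need a real TLB mapping instead. */
static uint32_t kseg1_alias(uint32_t phys)
{
    if (phys < UNMAPPED_LIMIT)
        return KSEG1_BASE | phys;
    return 0;
}

/* Return the physical address behind a KSEG0 or KSEG1 virtual address
 * by stripping the three segment bits. */
static uint32_t unmapped_to_phys(uint32_t virt)
{
    return virt & 0x1fffffffu;
}
```

Under these assumptions kseg1_alias(0x10) is 0xa0000010, and both
unmapped_to_phys(0xa0000010) and unmapped_to_phys(0x80000010) are 0x10, so
the uncached write lands in the same RAM that KSEG0 address 0x80000010
refers to.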

          Kevin K.

* Re: oops in futex_init()
From: Manuel Lauss @ 2009-04-30 11:22 UTC
  To: Kevin D. Kissell; +Cc: Ralf Baechle, Linux-MIPS

On Thu, Apr 30, 2009 at 01:14:46PM +0200, Kevin D. Kissell wrote:
> Manuel Lauss wrote:
> > Hi,
> >
> > This is really embarrassing:  The oops is what happens when you
> > ioremap(0x10) and write 0xffffffff at the resulting address (0xa0000010).
> >   
> This pretty much implies that _CACHE_UNCACHED was passed in the flags
> value to ioremap, so it simply returned the kseg1 address.  By any

uncached is default for plain ioremap(), says io.h:

#define ioremap(offset, size)                                           \
        __ioremap_mode((offset), (size), _CACHE_UNCACHED)

Apparently one can ioremap everything on MIPS except RAM in use at
0x8xxxxxxx.

	Manuel Lauss

* Re: oops in futex_init()
From: Ralf Baechle @ 2009-04-30 11:23 UTC
  To: Manuel Lauss; +Cc: Kevin D. Kissell, Linux-MIPS

On Thu, Apr 30, 2009 at 12:41:30PM +0200, Manuel Lauss wrote:

> This is really embarrassing:  The oops is what happens when you
> ioremap(0x10) and write 0xffffffff at the resulting address (0xa0000010).

That address is the TLB reload handler.  You just overwrote the TLB
exception handler.  0xffffffff is sd $31, -1($31), which is a MIPS III
(that is, 64-bit) instruction which the Alchemy core duly honors with a
reserved instruction exception.  The handler is running with
c0_status.exl=1, so the EPC will point to the LL instruction but the
exception printed in the panic will be the 2nd exception.  Case closed,
I'd say.
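
The decoding can be checked with a small sketch that splits a MIPS I-type
instruction word into its fields.  The struct and function names are
hypothetical; the opcode values 0x30 (LL) and 0x3f (SD) are the standard
MIPS encodings:

```c
#include <stdint.h>

#define OP_LL 0x30u  /* load linked word */
#define OP_SD 0x3fu  /* store doubleword (MIPS III) */

/* Fields of a MIPS I-type instruction word. */
struct itype {
    unsigned opcode;  /* bits 31..26 */
    unsigned rs;      /* bits 25..21, base register */
    unsigned rt;      /* bits 20..16, source/target register */
    int imm;          /* bits 15..0, sign-extended offset */
};

/* Split a 32-bit instruction word into its I-type fields. */
static struct itype decode_itype(uint32_t word)
{
    struct itype f;

    f.opcode = (unsigned)(word >> 26);
    f.rs = (unsigned)((word >> 21) & 0x1f);
    f.rt = (unsigned)((word >> 16) & 0x1f);
    f.imm = (int)(word & 0xffff);
    if (f.imm & 0x8000)
        f.imm -= 0x10000;  /* sign-extend the 16-bit offset */
    return f;
}
```

decode_itype(0xffffffff) yields opcode 0x3f (SD) with rs = rt = 31 and
offset -1.  The marked word from the Code: line, 0xc0440000, yields opcode
0x30 (LL) with rs = 2 (v0), rt = 4 (a0) and offset 0.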

  Ralf

* Re: oops in futex_init()
From: Ralf Baechle @ 2009-04-30 11:51 UTC
  To: Manuel Lauss; +Cc: Kevin D. Kissell, Linux-MIPS

On Thu, Apr 30, 2009 at 01:22:43PM +0200, Manuel Lauss wrote:

> uncached is default for plain ioremap(), says io.h:
> 
> #define ioremap(offset, size)                                           \
>         __ioremap_mode((offset), (size), _CACHE_UNCACHED)
> 
> Apparently one can ioremap everything on MIPS except RAM in use at
> 0x8xxxxxxx.

Yes, it's GIGO - garbage in, garbage out.

  Ralf
