* Tracers+cyclictest causing kernel oops
@ 2013-04-26  8:04 Tom Cook
  2013-04-29  7:16 ` Tom Cook
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Cook @ 2013-04-26  8:04 UTC (permalink / raw)
  To: RT

I'm running 3.6.11-rt31 on a Raspberry Pi.  Running cyclictest like this:

  sudo ./cyclictest -p95 -m -W -b 1000
  # /dev/cpu_dma_latency set to 0us
  policy: fifo: loadavg: 0.72 0.28 0.10 2/137 2054

causes this:

Entering kdb (current=0xca78e0a0, pid 2058) Oops: (null)
due to oops @ 0xc009f654

Pid: 2058, comm:           cyclictest
CPU: 0    Not tainted  (3.6.11-rt31+ #10)
PC is at ring_buffer_lock_reserve+0x7c/0x144
LR is at ftrace_test_stop_func+0x28/0x34
pc : [<c009f654>]    lr : [<c0097684>]    psr: 600001d1
sp : ca7bfa78  ip : ca7bfaf0  fp : ca7bfaec
r10: 00000000  r9 : c088ece0  r8 : ca7be000
r7 : 200001d1  r6 : c030dd28  r5 : c00ab928  r4 : c0786000
r3 : 200001d1  r2 : 00000018  r1 : 00000000  r0 : cb8013a0
Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
Control: 00c5387d  Table: 0a7b8008  DAC: 00000015
[<c0015188>] (unwind_backtrace+0x0/0xfc) from [<c041953c>] (dump_stack+0x20/0x24)
[<c041953c>] (dump_stack+0x20/0x24) from [<c000f720>] (show_regs+0x4c/0x58)
[<c000f720>] (show_regs+0x4c/0x58) from [<c0086c48>] (kdb_dumpregs+0x38/0x60)
[<c0086c48>] (kdb_dumpregs+0x38/0x60) from [<c00897d8>] (kdb_main_loop+0x348/0x77c)
[<c00897d8>] (kdb_main_loop+0x348/0x77c) from [<c008c3d8>] (kdb_stub+0x2f0/0x3f8)
[<c008c3d8>] (kdb_stub+0x2f0/0x3f8) from [<c0082b24>] (kgdb_handle_exception+0x164/0x6fc)
[<c0082b24>] (kgdb_handle_exception+0x164/0x6fc) from [<c00147e0>] (kgdb_notify+0x3c/0x74)
[<c00147e0>] (kgdb_notify+0x3c/0x74) from [<c0421e84>] (notifier_call_chain+0x54/0x94)
[<c0421e84>] (notifier_call_chain+0x54/0x94) from [<c0421f0c>] (__atomic_notifier_call_chain+0x48/0x5c)
[<c0421f0c>] (__atomic_notifier_call_chain+0x48/0x5c) from [<c0421f48>] (atomic_notifier_call_chain+0x28/0x30)
[<c0421f48>] (atomic_notifier_call_chain+0x28/0x30) from [<c0421f90>] (notify_die+0x40/0x4c)
[<c0421f90>] (notify_die+0x40/0x4c) from [<c0011de8>] (die+0xd8/0x3c0)
[<c0011de8>] (die+0xd8/0x3c0) from [<c0012480>] (bad_mode+0x4c/0x70)
[<c0012480>] (bad_mode+0x4c/0x70) from [<c0097684>] (ftrace_test_stop_func+0x28/0x34)

AFAICT, this is actually kgdb encountering a problem while trying to
unwind the stack trace from the real problem - is that right?

The problem seems to be caused by cyclictest setting up the tracer,
not triggering it - it doesn't matter what I set the latency threshold
to, the same thing happens.

Is there a known cause for this?  Any tips on how to start debugging it?

Thanks,
Tom

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Tracers+cyclictest causing kernel oops
  2013-04-26  8:04 Tracers+cyclictest causing kernel oops Tom Cook
@ 2013-04-29  7:16 ` Tom Cook
  2013-05-03 16:05   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Cook @ 2013-04-29  7:16 UTC (permalink / raw)
  To: RT

No ideas?

On Fri, Apr 26, 2013 at 9:04 AM, Tom Cook <tom.k.cook@gmail.com> wrote:
[snip]

* Re: Tracers+cyclictest causing kernel oops
  2013-04-29  7:16 ` Tom Cook
@ 2013-05-03 16:05   ` Sebastian Andrzej Siewior
  2013-05-12  5:14     ` Tom Cook
       [not found]     ` <CAFSh4UwD5yyJrtKwUaOX-uiUuzVf7YaDbM4vbZ1EFZBf6F1JHg@mail.gmail.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-05-03 16:05 UTC (permalink / raw)
  To: Tom Cook; +Cc: RT

* Tom Cook | 2013-04-29 08:16:35 [+0100]:

>No ideas?

what about trying without kgdb?

Sebastian

* Re: Tracers+cyclictest causing kernel oops
  2013-05-03 16:05   ` Sebastian Andrzej Siewior
@ 2013-05-12  5:14     ` Tom Cook
       [not found]     ` <CAFSh4UwD5yyJrtKwUaOX-uiUuzVf7YaDbM4vbZ1EFZBf6F1JHg@mail.gmail.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Tom Cook @ 2013-05-12  5:14 UTC (permalink / raw)
  To: RT

Okay, I've built the same kernel config but without kgdb enabled.
Running cyclictest with a tracer enabled (-f in this case) causes the
crash below.

I'm not big on debugging these things yet.  Since cyclictest works
without the tracers enabled and causes a crash with them enabled, the
problem must be in the tracers themselves, right?  Actually, 'echo
function > current_tracer' has the same effect.

Is that trace really saying that the processor is in mode 0?  AFAICT,
ARM does not have a mode 0 unless you truncate the mode number to four
bits - in which case mode '0' is user mode.  The process name seems to
be a red herring - repeating this can turn up almost any process
running on the system, from interrupt handler threads to bash.

Is this a known problem?  Any tips on figuring it out?

Sebastian, sorry for the repeat - hit the wrong reply button the first time.

Thanks,
Tom

pi@raspberrypi:~/rt-tests$ sudo ./cyclictest -p95 -m -f -b 2000
# /dev/cpu_dma_latency set to 0us
[  199.186167] Bad mode in data abort handler detected
[  199.186194] Internal error: Oops - bad mode: 0 [#1] PREEMPT ARM
[  199.186246] Modules linked in: snd_bcm2835 snd_pcm snd_seq snd_timer snd_seq_device snd snd_page_alloc
[  199.186260] CPU: 0    Not tainted  (3.6.11-rt31+ #1)
[  199.186297] PC is at ring_buffer_lock_reserve+0x7c/0x144
[  199.186326] LR is at vfs_write+0x140/0x188
[  199.186341] pc : [<c009486c>]    lr : [<c00efda4>]    psr: 600001d1
[  199.186341] sp : da645ef8  ip : da645e28  fp : da645f6c
[  199.186348] r10: 00000000  r9 : 00000000  r8 : 00000008
[  199.186357] r7 : 200001d1  r6 : c030226c  r5 : c012cd64  r4 : c0732000
[  199.186366] r3 : 200001d1  r2 : 00000018  r1 : 00000000  r0 : db801420
[  199.186375] Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
[  199.186384] Control: 00c5387d  Table: 1a7f8008  DAC: 00000015
[  199.186396] Process cyclictest (pid: 2035, stack limit = 0xda644268)
[  199.186406] Stack: (0xda645ef8 to 0xda646000)
[  199.186417] 5ee0:                                                       db801420 00000000
[  199.186434] 5f00: 00000018 200001d1 c0732000 c012cd64 c030226c 200001d1 00000008 00000000
[  199.186451] 5f20: 00000000 da645f6c da645e28 da645ef8 c00efda4 c009486c 600001d1 ffffffff
[  199.186467] 5f40: db408988 da6e52c8 da6e52c0 0000fc10 00000008 00000004 00000000 00000000
[  199.186483] 5f60: da645fa4 da645f70 c00f002c c00efc70 00000008 00000000 00000001 00000000
[  199.186499] 5f80: 00000004 00000004 00000001 0000001a c000e248 da644000 00000000 da645fa8
[  199.186515] 5fa0: c000dfc0 c00effec 00000004 00000001 00000004 0000fc10 00000008 00000063
[  199.186531] 5fc0: 00000004 00000001 0000001a 00000004 00019798 0000fc10 00000008 00000001
[  199.186547] 5fe0: 0000000e beca42c0 0000b7bc b6f7a6fc 60000010 00000004 00000000 00000000
[  199.186595] [<c009486c>] (ring_buffer_lock_reserve+0x7c/0x144) from [<c00f002c>] (sys_write+0x4c/0x78)
[  199.186639] [<c00f002c>] (sys_write+0x4c/0x78) from [<c000dfc0>] (ret_fast_syscall+0x0/0x48)
[  199.186659] Code: eb0df889 e3a00000 e89da830 e5941010 (e591cd4c)
[  199.497152] ---[ end trace 0000000000000002 ]---
[  199.497169] Kernel panic - not syncing: Fatal exception
[  199.515555] Bad mode in prefetch abort handler detected
[  199.515584] Internal error: Oops - bad mode: 0 [#2] PREEMPT ARM
[  199.515632] Modules linked in: snd_bcm2835 snd_pcm snd_seq snd_timer snd_seq_device snd snd_page_alloc
[  199.515644] CPU: 0    Tainted: G      D       (3.6.11-rt31+ #1)
[  199.515660] PC is at 0xbe6f00b0
[  199.515671] LR is at 0xb6ee67d4
[  199.515684] pc : [<be6f00b0>]    lr : [<b6ee67d4>]    psr: 800001d1
[  199.515684] sp : da46ffb0  ip : 00000000  fp : b5ca8fac
[  199.515691] r10: b5ca96a4  r9 : b5ca8fb0  r8 : 00000000
[  199.515700] r7 : 00000003  r6 : b5ca9684  r5 : b6cf62dc  r4 : b6cf42e4
[  199.515708] r3 : 00000000  r2 : 00000fff  r1 : b6cf42e4  r0 : 000008fe
[  199.515717] Flags: Nzcv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
[  199.515725] Control: 00c5387d  Table: 1a434008  DAC: 00000015
[  199.515735] Process rsyslogd (pid: 1700, stack limit = 0xda46e268)
[  199.515744] Stack: (0xda46ffb0 to 0xda470000)
[  199.515759] ffa0:                                     000008fe b6cf42e4 00000fff 00000000
[  199.515776] ffc0: b6cf42e4 b6cf62dc b5ca9684 00000003 00000000 b5ca8fb0 b5ca96a4 b5ca8fac
[  199.515793] ffe0: 00000000 da46ffb0 b6ee67d4 be6f00b0 800001d1 ffffffff 1bffe821 1bffec21
[  199.515900] Code: 00000000 00000000 00000000 00000000 (00000000)


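Tom's question about "mode 0" can be checked against the 'psr' values in the two oopses above. The sketch below is a minimal POSIX shell helper; 'decode_psr' is a hypothetical name, not an existing tool, and it assumes the classic 32-bit ARM CPSR layout (N, Z, C, V in bits 31-28; the I, F and T bits at 7, 6 and 5; the mode in bits 4-0). Under that assumption, psr 600001d1 decodes to mode 0x11 (FIQ) with IRQs and FIQs masked, which matches the 'IRQs off  FIQs off  Mode FIQ_32' fields. As far as I can tell from arch/arm/kernel/traps.c in kernels of this era, the 0 in 'Oops - bad mode: 0' is the error code that bad_mode() passes to die(), not the CPU mode.

```shell
#!/bin/sh
# decode_psr: hypothetical helper that decodes an ARM CPSR value as printed
# in an oops ("psr: 600001d1"). Assumes the classic 32-bit ARM layout:
#   bits 31..28 = N Z C V, bit 7 = I, bit 6 = F, bit 5 = T, bits 4..0 = mode.
decode_psr() {
    psr=$(( 0x$1 ))    # the oops prints the value as bare hex

    # Condition flags, printed like the oops: uppercase = set, lowercase = clear.
    flags=""
    for spec in "31 N n" "30 Z z" "29 C c" "28 V v"; do
        set -- $spec   # $1 = bit number, $2 = "set" letter, $3 = "clear" letter
        if [ $(( (psr >> $1) & 1 )) -eq 1 ]; then
            flags="$flags$2"
        else
            flags="$flags$3"
        fi
    done

    irq_masked=$(( (psr >> 7) & 1 ))   # I bit: 1 = IRQs off
    fiq_masked=$(( (psr >> 6) & 1 ))   # F bit: 1 = FIQs off
    thumb=$(( (psr >> 5) & 1 ))        # T bit: 1 = Thumb state
    mode=$(( psr & 0x1f ))             # full five-bit mode field

    case $mode in
        16) name=usr ;;   # 0x10
        17) name=fiq ;;   # 0x11
        18) name=irq ;;   # 0x12
        19) name=svc ;;   # 0x13
        23) name=abt ;;   # 0x17
        27) name=und ;;   # 0x1b
        31) name=sys ;;   # 0x1f
        *)  name=invalid ;;
    esac

    printf 'flags=%s I=%d F=%d T=%d mode=0x%02x (%s)\n' \
        "$flags" "$irq_masked" "$fiq_masked" "$thumb" "$mode" "$name"
}
```

For the two oopses above, 'decode_psr 600001d1' prints 'flags=nZCv I=1 F=1 T=0 mode=0x11 (fiq)' and 'decode_psr 800001d1' prints 'flags=Nzcv I=1 F=1 T=0 mode=0x11 (fiq)'. Reading the whole five-bit field, rather than the low four bits, is what distinguishes FIQ (0x11) from user mode (0x10).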
* Re: Tracers+cyclictest causing kernel oops
       [not found]     ` <CAFSh4UwD5yyJrtKwUaOX-uiUuzVf7YaDbM4vbZ1EFZBf6F1JHg@mail.gmail.com>
@ 2013-05-21 11:16       ` Sebastian Andrzej Siewior
  2013-05-21 15:20         ` Tom Cook
  0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-05-21 11:16 UTC (permalink / raw)
  To: Tom Cook, linux-rt-users

please CC the list.

On 05/11/2013 06:43 PM, Tom Cook wrote:
> Okay, I've built the same kernel config but without kgdb enabled.
> Running cyclictest with a tracer enabled (-f in this case) causes the
> crash below.
> 
> I'm not big on debugging these things yet.  Since cyclictest works
> without the tracers enabled and causes a crash with them enabled, the
> problem must be in the tracers themselves, right?  Actually, 'echo
> function > current_tracer' has the same effect.
> 
> Is that trace really saying that the processor is in mode 0?  AFAICT,
> ARM does not have a mode 0 unless you truncate the mode number to four
> bits - in which case mode '0' is user mode.  The process name seems to
> be a red herring - repeating this can turn up almost any process
> running on the system, from interrupt handler threads to bash.
> 
> Is this a known problem?  Any tips on figuring it out?

Not known until you brought that up. I just booted x86 with kgdb=y and
did "echo function > current_tracer" with no side effects. I don't have
ARM at hand right now. Do you do anything kgdb related besides enabling
it in the kernel?

> 
> Thanks,
> Tom

Sebastian

* Re: Tracers+cyclictest causing kernel oops
  2013-05-21 11:16       ` Sebastian Andrzej Siewior
@ 2013-05-21 15:20         ` Tom Cook
  2013-05-27  8:02           ` Uwe Kleine-König
  2013-05-31  9:23           ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 16+ messages in thread
From: Tom Cook @ 2013-05-21 15:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On Tue, May 21, 2013 at 12:16 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> please CC the list.

Sorry.

>
> On 05/11/2013 06:43 PM, Tom Cook wrote:
[snip]
>
> Not known until you brought that up. I just booted x86 with kgdb=y and
> did "echo function > current_tracer" with no side effects. I don't have
> ARM at hand right now. Do you do anything kgdb related besides enabling
> it in the kernel?

No.  I've also tried compiling without kgdb and enabling tracers still
causes a crash (see below).

Again, I'm far from expert here, but as near as I can tell, a fast
interrupt exception handler is causing a data abort exception.  Do the
tracers use fast interrupts to wake up?  Is there some tracer-related
memory that's getting swapped out?

[snip: same cyclictest run and oops output as in the 2013-05-12 message]

* Re: Tracers+cyclictest causing kernel oops
  2013-05-21 15:20         ` Tom Cook
@ 2013-05-27  8:02           ` Uwe Kleine-König
  2013-05-27 21:23             ` Tom Cook
  2013-05-31  9:23           ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 16+ messages in thread
From: Uwe Kleine-König @ 2013-05-27  8:02 UTC (permalink / raw)
  To: Tom Cook; +Cc: Sebastian Andrzej Siewior, linux-rt-users

Hello,

On Tue, May 21, 2013 at 04:20:20PM +0100, Tom Cook wrote:
[snip]
> [  199.186375] Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM
What are you using fiqs for? AFAIK they are not well supported in
mainline Linux even.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Tracers+cyclictest causing kernel oops
  2013-05-27  8:02           ` Uwe Kleine-König
@ 2013-05-27 21:23             ` Tom Cook
  2013-05-28  6:53               ` Uwe Kleine-König
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Cook @ 2013-05-27 21:23 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Sebastian Andrzej Siewior, linux-rt-users

On Mon, May 27, 2013 at 9:02 AM, Uwe Kleine-König
<u.kleine-koenig@pengutronix.de> wrote:
[snip]
> What are you using fiqs for? AFAIK they are not well supported in
> mainline Linux even.

You're well on the way to plumbing the depths of my ignorance.  This
is just me applying the RT patches to the raspberrypi linux tree.  The
nearest I can figure out an answer to your question is this, in
arch/arm/kernel/fiqasm.S, in __set_fiq_regs:

@@@@@@@@@@@@@@@ hack: enable the fiq here to keep usb driver happy
        and     r1, #~PSR_F_BIT
@@@@@@@@@@@@@@@ endhack: (need to find better place for this to happen)

It uses the dwc_otg driver, but reports have it that the match of
hardware to driver is not wonderful.

Regards,
Tom

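For reference, the quoted hack clears PSR_F_BIT in r1. On 32-bit ARM that is bit 6 (0x40) of the status register, and a set F bit masks FIQs, so clearing it unmasks them. Presumably r1 holds the CPSR saved earlier in __set_fiq_regs and is written back when the function returns to SVC mode, which would leave FIQs enabled afterwards; the excerpt above does not show that part, so treat it as an assumption. A small worked example on the psr value from the oops; 'clear_f_bit' and 'fiq_masked' are hypothetical helper names:

```shell
#!/bin/sh
# PSR_F_BIT on 32-bit ARM is bit 6; when it is set, FIQs are masked.
PSR_F_BIT=$(( 0x40 ))

# clear_f_bit: hypothetical helper mirroring 'and r1, #~PSR_F_BIT' on a
# hex CPSR value, i.e. unmasking FIQs while leaving every other bit alone.
clear_f_bit() {
    printf '%08x\n' $(( 0x$1 & ~PSR_F_BIT ))
}

# fiq_masked: prints 1 if the F bit is set (FIQs masked), 0 otherwise.
fiq_masked() {
    echo $(( (0x$1 & PSR_F_BIT) != 0 ))
}
```

With the value from the oops, 'clear_f_bit 600001d1' prints '60000191'; 'fiq_masked' reports 1 for the original value and 0 for the cleared one.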
* Re: Tracers+cyclictest causing kernel oops
  2013-05-27 21:23             ` Tom Cook
@ 2013-05-28  6:53               ` Uwe Kleine-König
  2013-06-21 11:01                 ` Tom Cook
  0 siblings, 1 reply; 16+ messages in thread
From: Uwe Kleine-König @ 2013-05-28  6:53 UTC (permalink / raw)
  To: Tom Cook; +Cc: Sebastian Andrzej Siewior, linux-rt-users

On Mon, May 27, 2013 at 10:23:53PM +0100, Tom Cook wrote:
> On Mon, May 27, 2013 at 9:02 AM, Uwe Kleine-König
> <u.kleine-koenig@pengutronix.de> wrote:
> [snip]
> > What are you using fiqs for? AFAIK they are not well supported in
> > mainline Linux even.
> 
> You're well on the way to plumbing the depths of my ignorance.  This
> is just me applying the RT patches to the raspberrypi linux tree.  The
> nearest I can figure out an answer to your question is this, in
> arch/arm/kernel/fiqasm.S, in __set_fiq_regs:
> 
> @@@@@@@@@@@@@@@ hack: enable the fiq here to keep usb driver happy
>         and     r1, #~PSR_F_BIT
> @@@@@@@@@@@@@@@ endhack: (need to find better place for this to happen)
IIRC the USB driver is vital for the Raspberry Pi, but can you try to
unbind the driver before enabling whatever breaks for you?

Something like:

	cd /sys/bus/platform/drivers/dwc_otg

This directory probably contains one or more symlinks to devices. Do

	echo $devicename > unbind

for each of them, where devicename is the name of the symlink.

Best regards
Uwe


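Uwe's steps can be written as a loop. A minimal sketch, assuming the usual sysfs layout in which each bound device appears as a symlink in the driver directory next to the 'bind', 'unbind' and 'uevent' files (and possibly a 'module' link, which is not a device). 'unbind_all' and the DRIVER_DIR override are hypothetical. If the root filesystem lives on a USB drive, this removes the root device, so it only makes sense when booting from other media.

```shell
#!/bin/sh
# Every device bound to the driver shows up as a symlink in the driver's
# sysfs directory; writing its name to 'unbind' detaches it.
# DRIVER_DIR can be overridden, e.g. to try the loop on a scratch directory.
DRIVER_DIR=${DRIVER_DIR:-/sys/bus/platform/drivers/dwc_otg}

unbind_all() {
    for dev in "$DRIVER_DIR"/*; do
        [ -L "$dev" ] || continue          # bind, unbind, uevent are plain files
        name=${dev##*/}                    # symlink name = device name
        case $name in
            module) continue ;;            # link to the owning module, not a device
        esac
        echo "unbinding $name" >&2
        echo "$name" > "$DRIVER_DIR/unbind"
    done
}
```

Usage, as root: 'unbind_all', then start the tracer test.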
* Re: Tracers+cyclictest causing kernel oops
  2013-05-21 15:20         ` Tom Cook
  2013-05-27  8:02           ` Uwe Kleine-König
@ 2013-05-31  9:23           ` Sebastian Andrzej Siewior
  2013-05-31  9:31             ` Tom Cook
  2013-06-21 11:21             ` Tom Cook
  1 sibling, 2 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-05-31  9:23 UTC (permalink / raw)
  To: Tom Cook; +Cc: linux-rt-users

On 05/21/2013 05:20 PM, Tom Cook wrote:
> No.  I've also tried compiling without kgdb and enabling tracers still
> causes a crash (see below).
> 
> Again, I'm far from expert here, but as near as I can tell, a fast
> interrupt exception handler is causing a data abort exception.  Do the
> tracers use fast interrupts to wake up?  Is there some tracer-related
> memory that's getting swapped out?

I just tried the same thing on my AM33xx and nothing bad happened here.
One thing still: you might want to use "-n" for nanosleep.

Could you try [0] to check that you are not using more memory than is
available? If the OOM killer kills the program, then it is okay; if the
data-abort exception comes or the kernel crashes in a strange way, then
it is HW.

The tracers do not use any special interrupts on purpose.
Now that I see this is a Raspberry Pi, let me ask this: do you have any
non-mainline patches on top? And if that is the case, could you try to
get rid of them?
Could you also try the same test without the RT patches?

[0] http://download.breakpoint.cc/malloc.c

Sebastian

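Sebastian's rule of thumb (an OOM kill is fine, an oops or data abort points at the hardware) can be applied to the kernel log after running the test program. A minimal sketch; 'classify_log' is a hypothetical helper, the 'Internal error:' and 'Bad mode in' strings are taken from the oopses in this thread, and 'Out of memory' is what I would expect the OOM killer to print, so check it against your own kernel's messages:

```shell
#!/bin/sh
# classify_log: hypothetical helper that reads kernel log text on stdin and
# prints "crash", "oom" or "none". A crash takes precedence over an OOM kill.
classify_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q -e 'Internal error:' -e 'Bad mode in'; then
        echo crash    # oops / bad-mode report: per the rule above, suspect HW
    elif printf '%s\n' "$log" | grep -q 'Out of memory'; then
        echo oom      # OOM killer fired: the expected, harmless outcome
    else
        echo none
    fi
}
```

For example, 'dmesg | classify_log' after the test program exits. The crash check comes first because a crash following an OOM kill should still count as a crash.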
* Re: Tracers+cyclictest causing kernel oops
  2013-05-31  9:23           ` Sebastian Andrzej Siewior
@ 2013-05-31  9:31             ` Tom Cook
  2013-06-21 11:21             ` Tom Cook
  1 sibling, 0 replies; 16+ messages in thread
From: Tom Cook @ 2013-05-31  9:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On Fri, May 31, 2013 at 10:23 AM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
[snip]
> I just tried the same thing on my AM33xx and nothing bad happened here.
> One thing still: you might want to use "-n" for nanosleep.
>
> Could you try [0] to check that you are not using more memory than is
> available? If the OOM killer kills the program, then it is okay; if the
> data-abort exception comes or the kernel crashes in a strange way, then
> it is HW.
>
> The tracers do not use any special interrupts on purpose.
> Now that I see this is a Raspberry Pi, let me ask this: do you have any
> non-mainline patches on top? And if that is the case, could you try to
> get rid of them?

It is the linux-3.6.y branch of raspberrypi/linux on github with RT
patches added.  Nothing else custom.

> Could you also try the same test without the RT patches?

Okay.  Hopefully the flurry of spring weddings will break up enough to
give me a chance to try this over the weekend.

Tom

* Re: Tracers+cyclictest causing kernel oops
  2013-05-28  6:53               ` Uwe Kleine-König
@ 2013-06-21 11:01                 ` Tom Cook
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Cook @ 2013-06-21 11:01 UTC (permalink / raw)
  To: RT

On Tue, May 28, 2013 at 7:53 AM, Uwe Kleine-König
<u.kleine-koenig@pengutronix.de> wrote:
[snip]
> IIRC the USB driver is vital for the Raspberry Pi, but can you try to
> unbind the driver before enabling whatever breaks for you?
>
> Something like:
>
>         cd /sys/bus/platform/drivers/dwc_otg
>
> This directory probably contains one or more symlinks to devices. Do
>
>         echo $devicename > unbind
>
> for each of them, where devicename is the name of the symlink.

Unfortunately my root FS is on a USB thumb drive.  There aren't many
options to change this, since the RPi MMC driver is broken under RT
(another thing to look into...)

Tom

* Re: Tracers+cyclictest causing kernel oops
  2013-05-31  9:23           ` Sebastian Andrzej Siewior
  2013-05-31  9:31             ` Tom Cook
@ 2013-06-21 11:21             ` Tom Cook
  2013-06-21 14:17               ` Tom Cook
  2013-06-21 15:00               ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 16+ messages in thread
From: Tom Cook @ 2013-06-21 11:21 UTC (permalink / raw)
  To: RT

On Fri, May 31, 2013 at 10:23 AM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
[snip]
> I just tried the same thing on my AM33xx and nothing bad happened here.
> One thing still: you might want to use "-n" for nanosleep.

Same result.

> Could you try [0] to check that you are not using more memory than is
> available? If the OOM killer kills the program, then it is okay; if the
> data-abort exception comes or the kernel crashes in a strange way, then
> it is HW.

Seems to be killed by the OOM-killer - see output below.

> The tracers do not use any special interrupts on purpose.
> Now that I see this is a Raspberry Pi, let me ask this: do you have any
> non-mainline patches on top? And if that is the case, could you try to
> get rid of them?

By 'non-mainline' do you mean in addition to the kernel.org source, or
in addition to the raspberry pi git tree?  Yes, to the former, no to
the latter.

> Could you also try the same test without the RT patches?

Quite right - the same oops happens with "Linux raspberrypi 3.6.11+ #4 PREEMPT".

I think I'm nearly starting to get my head around what's going on
here.  The USB driver uses FIQs, which normally isn't a problem
because nothing would interrupt the FIQ handler (or if it did, it
wouldn't generate a page fault).  But cyclictest runs at a higher
priority than the USB handler and generates page faults (at least when
it is initialising).  Eventually it interrupts a USB FIQ handler and
the memory manager doesn't know what to do with a page fault in a FIQ
handler, so it oopses.  Does that sound about right?

Tom

* Re: Tracers+cyclictest causing kernel oops
  2013-06-21 11:21             ` Tom Cook
@ 2013-06-21 14:17               ` Tom Cook
  2013-06-21 14:58                 ` Tom Cook
  2013-06-21 15:00               ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 16+ messages in thread
From: Tom Cook @ 2013-06-21 14:17 UTC (permalink / raw)
  To: RT

On Fri, Jun 21, 2013 at 12:21 PM, Tom Cook <tom.k.cook@gmail.com> wrote:
[snip]
> I think I'm nearly starting to get my head around what's going on
> here.  The USB driver uses FIQs, which normally isn't a problem
> because nothing would interrupt the FIQ handler (or if it did, it
> wouldn't generate a page fault).  But cyclictest runs at a higher
> priority than the USB handler and generates page faults (at least when
> it is initialising).  Eventually it interrupts a USB FIQ handler and
> the memory manager doesn't know what to do with a page fault in a FIQ
> handler, so it oopses.  Does that sound about right?

Or, of course, perhaps the tracer is instrumenting the USB FIQ handler
with a hardware watchpoint.  These turn up as data aborts, which are
unhandled in FIQ mode.  Some evidence for this is that if I use
set_ftrace_filter to restrict the tracing to only a few system
functions then the crash doesn't happen.

Regards,
Tom

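Restricting the function tracer, as Tom describes, goes through the ftrace control files. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug (the usual location on 3.x kernels); 'trace_only', 'trace_stop' and the TRACING override are hypothetical names. Which functions Tom actually kept is not stated, so the usage below simply picks two names from the backtrace.

```shell
#!/bin/sh
# TRACING can be overridden, e.g. to point the helpers at a scratch directory.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# trace_only FUNC...: trace only the named functions, then enable the tracer.
trace_only() {
    echo "$*" > "$TRACING/set_ftrace_filter"   # '>' replaces the filter list
    echo function > "$TRACING/current_tracer"
}

# trace_stop: switch back to the no-op tracer and clear the filter.
trace_stop() {
    echo nop > "$TRACING/current_tracer"
    echo > "$TRACING/set_ftrace_filter"        # an empty write clears the filter
}
```

Usage, as root: 'trace_only sys_write vfs_write', run cyclictest, then 'trace_stop'. Setting the filter before selecting the 'function' tracer should mean the tracer is never hooked into every function at once.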
* Re: Tracers+cyclictest causing kernel oops
  2013-06-21 14:17               ` Tom Cook
@ 2013-06-21 14:58                 ` Tom Cook
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Cook @ 2013-06-21 14:58 UTC (permalink / raw)
  To: RT

On Fri, Jun 21, 2013 at 3:17 PM, Tom Cook <tom.k.cook@gmail.com> wrote:
> On Fri, Jun 21, 2013 at 12:21 PM, Tom Cook <tom.k.cook@gmail.com> wrote:
> [snip]
>> I think I'm nearly starting to get my head around what's going on
>> here.  The USB driver uses FIQs, which normally isn't a problem
>> because nothing would interrupt the FIQ handler (or if it did, it
>> wouldn't generate a page fault).  But cyclictest runs at a higher
>> priority than the USB handler and generates page faults (at least when
>> it is initialising).  Eventually it interrupts a USB FIQ handler and
>> the memory manager doesn't know what to do with a page fault in a FIQ
>> handler, so it oopses.  Does that sound about right?
>
> Or, of course, perhaps the tracer is instrumenting the USB FIQ handler
> with a hardware watchpoint.  These turn up as data aborts, which are
> unhandled in FIQ mode.  Some evidence for this is that if I use
> set_ftrace_filter to restrict the tracing to only a few system
> functions then the crash doesn't happen.

On that note, is there a way to disable ftrace for fiq handlers in the
source code?  Something must be done about hard irqs, since data
aborts are not handled there, either.  Does trace_hardirq_enter deal
with this in some way?

Tom

* Re: Tracers+cyclictest causing kernel oops
  2013-06-21 11:21             ` Tom Cook
  2013-06-21 14:17               ` Tom Cook
@ 2013-06-21 15:00               ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-06-21 15:00 UTC (permalink / raw)
  To: Tom Cook; +Cc: RT

* Tom Cook | 2013-06-21 12:21:40 [+0100]:

>> The tracers do not use any special interrupts on purpose.
>> Now that I see this is a Raspberry Pi, let me ask this: do you have any
>> non-mainline patches on top? And if that is the case, could you try to
>> get rid of them?
>
>By 'non-mainline' do you mean in addition to the kernel.org source, or
>in addition to the raspberry pi git tree?  Yes, to the former, no to
>the latter.

You could boot into a ramdisk with a tiny busybox + cyclictest and disable
the USB driver in question.

>> Could you also try the same test without the RT patches?
>
>Quite right - the same oops happens with "Linux raspberrypi 3.6.11+ #4 PREEMPT".
>
>I think I'm nearly starting to get my head around what's going on
>here.  The USB driver uses FIQs, which normally isn't a problem
>because nothing would interrupt the FIQ handler (or if it did, it
>wouldn't generate a page fault).  But cyclictest runs at a higher
>priority than the USB handler and generates page faults (at least when
>it is initialising).  Eventually it interrupts a USB FIQ handler and
>the memory manager doesn't know what to do with a page fault in a FIQ
>handler, so it oopses.  Does that sound about right?

You said that it also oopses on the non-RT kernel, where cyclictest is not
running on top of the USB thread. In general, with threaded interrupt
support enabled, the irq core disables the interrupt and schedules the
interrupt thread.
But here it seems that the FIQ is not fully implemented / supported.
So either you have to fix this or replace the USB part :)

>Tom

Sebastian

Thread overview: 16+ messages
2013-04-26  8:04 Tracers+cyclictest causing kernel oops Tom Cook
2013-04-29  7:16 ` Tom Cook
2013-05-03 16:05   ` Sebastian Andrzej Siewior
2013-05-12  5:14     ` Tom Cook
     [not found]     ` <CAFSh4UwD5yyJrtKwUaOX-uiUuzVf7YaDbM4vbZ1EFZBf6F1JHg@mail.gmail.com>
2013-05-21 11:16       ` Sebastian Andrzej Siewior
2013-05-21 15:20         ` Tom Cook
2013-05-27  8:02           ` Uwe Kleine-König
2013-05-27 21:23             ` Tom Cook
2013-05-28  6:53               ` Uwe Kleine-König
2013-06-21 11:01                 ` Tom Cook
2013-05-31  9:23           ` Sebastian Andrzej Siewior
2013-05-31  9:31             ` Tom Cook
2013-06-21 11:21             ` Tom Cook
2013-06-21 14:17               ` Tom Cook
2013-06-21 14:58                 ` Tom Cook
2013-06-21 15:00               ` Sebastian Andrzej Siewior
