All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-14 19:24 Stanislav Meduna
  2014-04-14 23:45   ` Stanislav Meduna
  0 siblings, 1 reply; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-14 19:24 UTC (permalink / raw)
  To: linux-rt-users

Hi,

hunting another problem with AUART acting weirdly I got

BUG: spinlock trylock failure on UP on CPU#0, ksoftirqd/0/3
 lock: boot_tvec_bases+0x0/0x10a0, .magic: dead4ead, .owner: ksoftirqd/0/3, .owner_cpu: 0
CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.12.15-rt25-00371-gfec62f5-dirty #26
[<c00149ec>] (unwind_backtrace+0x0/0xf4) from [<c0012ab0>] (show_stack+0x10/0x14)
[<c0012ab0>] (show_stack+0x10/0x14) from [<c01a4b80>] (do_raw_spin_trylock+0x4c/0x58)
[<c01a4b80>] (do_raw_spin_trylock+0x4c/0x58) from [<c02ca0b4>] (_raw_spin_trylock+0x20/0x98)
[<c02ca0b4>] (_raw_spin_trylock+0x20/0x98) from [<c02c9654>] (rt_spin_trylock+0x14/0xd0)
[<c02c9654>] (rt_spin_trylock+0x14/0xd0) from [<c00285cc>] (run_local_timers+0x24/0x78)
[<c00285cc>] (run_local_timers+0x24/0x78) from [<c0028654>] (update_process_times+0x34/0x68)
[<c0028654>] (update_process_times+0x34/0x68) from [<c005f0ac>] (tick_sched_timer+0x58/0x22c)
[<c005f0ac>] (tick_sched_timer+0x58/0x22c) from [<c003f2f8>] (__run_hrtimer+0x7c/0x2a8)
[<c003f2f8>] (__run_hrtimer+0x7c/0x2a8) from [<c003f684>] (hrtimer_interrupt+0x104/0x30c)
[<c003f684>] (hrtimer_interrupt+0x104/0x30c) from [<c0230a84>] (mxs_timer_interrupt+0x20/0x2c)
[<c0230a84>] (mxs_timer_interrupt+0x20/0x2c) from [<c0051d1c>] (handle_irq_event_percpu+0x80/0x2f8)
[<c0051d1c>] (handle_irq_event_percpu+0x80/0x2f8) from [<c0051fd0>] (handle_irq_event+0x3c/0x5c)
[<c0051fd0>] (handle_irq_event+0x3c/0x5c) from [<c0054778>] (handle_level_irq+0x8c/0x118)
[<c0054778>] (handle_level_irq+0x8c/0x118) from [<c0051c8c>] (generic_handle_irq+0x28/0x30)
[<c0051c8c>] (generic_handle_irq+0x28/0x30) from [<c001009c>] (handle_IRQ+0x30/0x84)
[<c001009c>] (handle_IRQ+0x30/0x84) from [<c0013324>] (__irq_svc+0x44/0x88)
[<c0013324>] (__irq_svc+0x44/0x88) from [<c006a664>] (__try_to_take_rt_mutex+0x4/0x144)
[<c006a664>] (__try_to_take_rt_mutex+0x4/0x144) from [<00000000>] (  (null))


Linux nxt 3.12.15-rt25-00371-gfec62f5-dirty #26 PREEMPT RT Mon Apr 14 20:40:56 CEST 2014 armv5tejl GNU/Linux

This is on a Freescale i.MX28, 3.12.15-rt25 with patches that do not touch
anything related.


On a side note: is it possible that the newer ARM kernels have some
requirements regarding the compiler (I am using gcc 4.4.5)? I don't
understand what I am seeing in the AUART problem and it can be well
something regarding the optimization. However, I had no problems
running 3.4 built using the same compiler.

Thanks
-- 
                                                   Stano

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-14 19:24 BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25 Stanislav Meduna
@ 2014-04-14 23:45   ` Stanislav Meduna
  0 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-14 23:45 UTC (permalink / raw)
  To: linux-rt-users, Linux ARM Kernel, linux-kernel

On 14.04.2014 21:24, Stanislav Meduna wrote:

> hunting another problem with AUART acting weirdly I got

okay, the AUART thing was some device tree snafu, but I have seen
the BUG a few times (but not readily reproducible). This thread
looks remotely similar https://lkml.org/lkml/2013/9/27/534 - looks
like the real bug happened somewhere else :( Any hints on how to
get more info?

Adding Cc: to linux-arm and linux-kernel

> BUG: spinlock trylock failure on UP on CPU#0, ksoftirqd/0/3
>  lock: boot_tvec_bases+0x0/0x10a0, .magic: dead4ead, .owner: ksoftirqd/0/3, .owner_cpu: 0
> CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.12.15-rt25-00371-gfec62f5-dirty #26
> [<c00149ec>] (unwind_backtrace+0x0/0xf4) from [<c0012ab0>] (show_stack+0x10/0x14)
> [<c0012ab0>] (show_stack+0x10/0x14) from [<c01a4b80>] (do_raw_spin_trylock+0x4c/0x58)
> [<c01a4b80>] (do_raw_spin_trylock+0x4c/0x58) from [<c02ca0b4>] (_raw_spin_trylock+0x20/0x98)
> [<c02ca0b4>] (_raw_spin_trylock+0x20/0x98) from [<c02c9654>] (rt_spin_trylock+0x14/0xd0)
> [<c02c9654>] (rt_spin_trylock+0x14/0xd0) from [<c00285cc>] (run_local_timers+0x24/0x78)
> [<c00285cc>] (run_local_timers+0x24/0x78) from [<c0028654>] (update_process_times+0x34/0x68)
> [<c0028654>] (update_process_times+0x34/0x68) from [<c005f0ac>] (tick_sched_timer+0x58/0x22c)
> [<c005f0ac>] (tick_sched_timer+0x58/0x22c) from [<c003f2f8>] (__run_hrtimer+0x7c/0x2a8)
> [<c003f2f8>] (__run_hrtimer+0x7c/0x2a8) from [<c003f684>] (hrtimer_interrupt+0x104/0x30c)
> [<c003f684>] (hrtimer_interrupt+0x104/0x30c) from [<c0230a84>] (mxs_timer_interrupt+0x20/0x2c)
> [<c0230a84>] (mxs_timer_interrupt+0x20/0x2c) from [<c0051d1c>] (handle_irq_event_percpu+0x80/0x2f8)
> [<c0051d1c>] (handle_irq_event_percpu+0x80/0x2f8) from [<c0051fd0>] (handle_irq_event+0x3c/0x5c)
> [<c0051fd0>] (handle_irq_event+0x3c/0x5c) from [<c0054778>] (handle_level_irq+0x8c/0x118)
> [<c0054778>] (handle_level_irq+0x8c/0x118) from [<c0051c8c>] (generic_handle_irq+0x28/0x30)
> [<c0051c8c>] (generic_handle_irq+0x28/0x30) from [<c001009c>] (handle_IRQ+0x30/0x84)
> [<c001009c>] (handle_IRQ+0x30/0x84) from [<c0013324>] (__irq_svc+0x44/0x88)
> [<c0013324>] (__irq_svc+0x44/0x88) from [<c006a664>] (__try_to_take_rt_mutex+0x4/0x144)
> [<c006a664>] (__try_to_take_rt_mutex+0x4/0x144) from [<00000000>] (  (null))
> 
> 
> Linux nxt 3.12.15-rt25-00371-gfec62f5-dirty #26 PREEMPT RT Mon Apr 14 20:40:56 CEST 2014 armv5tejl GNU/Linux
> 
> This is on a Freescale i.MX28, 3.12.15-rt25 with patches that do not touch
> anything related.

-- 
                                                 Stano


^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-14 23:45   ` Stanislav Meduna
  0 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-14 23:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 14.04.2014 21:24, Stanislav Meduna wrote:

> hunting another problem with AUART acting weirdly I got

okay, the AUART thing was some device tree snafu, but I have seen
the BUG a few times (but not readily reproducible). This thread
looks remotely similar https://lkml.org/lkml/2013/9/27/534 - looks
like the real bug happened somewhere else :( Any hints on how to
get more info?

Adding Cc: to linux-arm and linux-kernel

> BUG: spinlock trylock failure on UP on CPU#0, ksoftirqd/0/3
>  lock: boot_tvec_bases+0x0/0x10a0, .magic: dead4ead, .owner: ksoftirqd/0/3, .owner_cpu: 0
> CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.12.15-rt25-00371-gfec62f5-dirty #26
> [<c00149ec>] (unwind_backtrace+0x0/0xf4) from [<c0012ab0>] (show_stack+0x10/0x14)
> [<c0012ab0>] (show_stack+0x10/0x14) from [<c01a4b80>] (do_raw_spin_trylock+0x4c/0x58)
> [<c01a4b80>] (do_raw_spin_trylock+0x4c/0x58) from [<c02ca0b4>] (_raw_spin_trylock+0x20/0x98)
> [<c02ca0b4>] (_raw_spin_trylock+0x20/0x98) from [<c02c9654>] (rt_spin_trylock+0x14/0xd0)
> [<c02c9654>] (rt_spin_trylock+0x14/0xd0) from [<c00285cc>] (run_local_timers+0x24/0x78)
> [<c00285cc>] (run_local_timers+0x24/0x78) from [<c0028654>] (update_process_times+0x34/0x68)
> [<c0028654>] (update_process_times+0x34/0x68) from [<c005f0ac>] (tick_sched_timer+0x58/0x22c)
> [<c005f0ac>] (tick_sched_timer+0x58/0x22c) from [<c003f2f8>] (__run_hrtimer+0x7c/0x2a8)
> [<c003f2f8>] (__run_hrtimer+0x7c/0x2a8) from [<c003f684>] (hrtimer_interrupt+0x104/0x30c)
> [<c003f684>] (hrtimer_interrupt+0x104/0x30c) from [<c0230a84>] (mxs_timer_interrupt+0x20/0x2c)
> [<c0230a84>] (mxs_timer_interrupt+0x20/0x2c) from [<c0051d1c>] (handle_irq_event_percpu+0x80/0x2f8)
> [<c0051d1c>] (handle_irq_event_percpu+0x80/0x2f8) from [<c0051fd0>] (handle_irq_event+0x3c/0x5c)
> [<c0051fd0>] (handle_irq_event+0x3c/0x5c) from [<c0054778>] (handle_level_irq+0x8c/0x118)
> [<c0054778>] (handle_level_irq+0x8c/0x118) from [<c0051c8c>] (generic_handle_irq+0x28/0x30)
> [<c0051c8c>] (generic_handle_irq+0x28/0x30) from [<c001009c>] (handle_IRQ+0x30/0x84)
> [<c001009c>] (handle_IRQ+0x30/0x84) from [<c0013324>] (__irq_svc+0x44/0x88)
> [<c0013324>] (__irq_svc+0x44/0x88) from [<c006a664>] (__try_to_take_rt_mutex+0x4/0x144)
> [<c006a664>] (__try_to_take_rt_mutex+0x4/0x144) from [<00000000>] (  (null))
> 
> 
> Linux nxt 3.12.15-rt25-00371-gfec62f5-dirty #26 PREEMPT RT Mon Apr 14 20:40:56 CEST 2014 armv5tejl GNU/Linux
> 
> This is on a Freescale i.MX28, 3.12.15-rt25 with patches that do not touch
> anything related.

-- 
                                                 Stano

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-14 23:45   ` Stanislav Meduna
@ 2014-04-15 22:08     ` Stanislav Meduna
  -1 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-15 22:08 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Linux ARM Kernel, linux-kernel, bigeasy, Steven Rostedt, Thomas Gleixner

On 15.04.2014 01:45, Stanislav Meduna wrote:

>> BUG: spinlock trylock failure on UP on CPU#0, ksoftirqd/0/3

I am now getting this quite reproducibly a few seconds into
the boot and the path is always similar. Depending on what modules
I am loading the source changes, but it is nearly always
a schedule_timeout with a following timer interrupt.

Disabling highres timers just changes the bug path but it happens
also in that case. I am using CONFIG_HZ_PERIODIC. I tried to disable
the serial console and several drivers to rule out some interference
but it did not change anything.

Freescale i.MX28, 3.12.15-rt25 + patches enabling the platform,
none of them touches anything in kernel/* or the MXS timer.
Up to now no freeze or other BUGs, it was always only this one.

I see that the relevant code was touched a few times in the last
few months, maybe there is still something lurking.

Hmm... how is it in the rt-case guaranteed that the timer interrupt
does not preempt someone trying to modify the timer? The run_local_timers
looks to have arrived via hardirq context. The spinlock in the tvec_base
is a normal one and spin_lock_irqsave does not disable interrupts
on rt, right?

[   11.797460] BUG: spinlock trylock failure on UP on CPU#0, rcu_preempt/11
[   11.797522]  lock: boot_tvec_bases+0x0/0x10c0, .magic: dead4ead, .owner: rcu_preempt/11, .owner_cpu: 0
[   11.797550] CPU: 0 PID: 11 Comm: rcu_preempt Not tainted 3.12.15-rt25+ #52
[   11.797630] [<c00151bc>] (unwind_backtrace+0x0/0xf4) from [<c0012c00>] (show_stack+0x10/0x14)
[   11.797691] [<c0012c00>] (show_stack+0x10/0x14) from [<c01b2758>] (do_raw_spin_trylock+0x4c/0x58)
[   11.797748] [<c01b2758>] (do_raw_spin_trylock+0x4c/0x58) from [<c02e0194>] (_raw_spin_trylock+0x20/0x98)
[   11.797792] [<c02e0194>] (_raw_spin_trylock+0x20/0x98) from [<c02df734>] (rt_spin_trylock+0x14/0xd0)
[   11.797851] [<c02df734>] (rt_spin_trylock+0x14/0xd0) from [<c0028e7c>] (run_local_timers+0x24/0x78)
[   11.797892] [<c0028e7c>] (run_local_timers+0x24/0x78) from [<c0028f04>] (update_process_times+0x34/0x68)
[   11.797940] [<c0028f04>] (update_process_times+0x34/0x68) from [<c0060920>] (tick_sched_timer+0x58/0x22c)
[   11.797990] [<c0060920>] (tick_sched_timer+0x58/0x22c) from [<c0040820>] (__run_hrtimer+0x88/0x2b8)
[   11.798029] [<c0040820>] (__run_hrtimer+0x88/0x2b8) from [<c0040bb0>] (hrtimer_interrupt+0x104/0x30c)
[   11.798076] [<c0040bb0>] (hrtimer_interrupt+0x104/0x30c) from [<c0246c50>] (mxs_timer_interrupt+0x20/0x2c)
[   11.798123] [<c0246c50>] (mxs_timer_interrupt+0x20/0x2c) from [<c00534d8>] (handle_irq_event_percpu+0x80/0x2f8)
[   11.798161] [<c00534d8>] (handle_irq_event_percpu+0x80/0x2f8) from [<c005378c>] (handle_irq_event+0x3c/0x5c)
[   11.798201] [<c005378c>] (handle_irq_event+0x3c/0x5c) from [<c0055f68>] (handle_level_irq+0x8c/0x118)
[   11.798239] [<c0055f68>] (handle_level_irq+0x8c/0x118) from [<c0053448>] (generic_handle_irq+0x28/0x30)
[   11.798281] [<c0053448>] (generic_handle_irq+0x28/0x30) from [<c00101dc>] (handle_IRQ+0x30/0x84)
[   11.798322] [<c00101dc>] (handle_IRQ+0x30/0x84) from [<c0013484>] (__irq_svc+0x44/0x88)
[   11.798364] [<c0013484>] (__irq_svc+0x44/0x88) from [<c02deb18>] (rt_spin_lock_slowlock+0x60/0x204)
[   11.798402] [<c02deb18>] (rt_spin_lock_slowlock+0x60/0x204) from [<c02df4d0>] (rt_spin_lock+0x28/0x60)
[   11.798451] [<c02df4d0>] (rt_spin_lock+0x28/0x60) from [<c0028874>] (lock_timer_base+0x28/0x48)
[   11.798494] [<c0028874>] (lock_timer_base+0x28/0x48) from [<c02dcb28>] (schedule_timeout+0x78/0x254)
[   11.798531] [<c02dcb28>] (schedule_timeout+0x78/0x254) from [<c00763a4>] (rcu_gp_kthread+0x2d4/0x5f0)
[   11.798578] [<c00763a4>] (rcu_gp_kthread+0x2d4/0x5f0) from [<c003cf24>] (kthread+0xa0/0xa8)
[   11.798621] [<c003cf24>] (kthread+0xa0/0xa8) from [<c000f3e0>] (ret_from_fork+0x14/0x34)

Thanks
-- 
                                            Stano


^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-15 22:08     ` Stanislav Meduna
  0 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-15 22:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 15.04.2014 01:45, Stanislav Meduna wrote:

>> BUG: spinlock trylock failure on UP on CPU#0, ksoftirqd/0/3

I am now getting this quite reproducibly a few seconds into
the boot and the path is always similar. Depending on what modules
I am loading the source changes, but it is nearly always
a schedule_timeout with a following timer interrupt.

Disabling highres timers just changes the bug path but it happens
also in that case. I am using CONFIG_HZ_PERIODIC. I tried to disable
the serial console and several drivers to rule out some interference
but it did not change anything.

Freescale i.MX28, 3.12.15-rt25 + patches enabling the platform,
none of them touches anything in kernel/* or the MXS timer.
Up to now no freeze or other BUGs, it was always only this one.

I see that the relevant code was touched a few times in the last
few months, maybe there is still something lurking.

Hmm... how is it in the rt-case guaranteed that the timer interrupt
does not preempt someone trying to modify the timer? The run_local_timers
looks to have arrived via hardirq context. The spinlock in the tvec_base
is a normal one and spin_lock_irqsave does not disable interrupts
on rt, right?

[   11.797460] BUG: spinlock trylock failure on UP on CPU#0, rcu_preempt/11
[   11.797522]  lock: boot_tvec_bases+0x0/0x10c0, .magic: dead4ead, .owner: rcu_preempt/11, .owner_cpu: 0
[   11.797550] CPU: 0 PID: 11 Comm: rcu_preempt Not tainted 3.12.15-rt25+ #52
[   11.797630] [<c00151bc>] (unwind_backtrace+0x0/0xf4) from [<c0012c00>] (show_stack+0x10/0x14)
[   11.797691] [<c0012c00>] (show_stack+0x10/0x14) from [<c01b2758>] (do_raw_spin_trylock+0x4c/0x58)
[   11.797748] [<c01b2758>] (do_raw_spin_trylock+0x4c/0x58) from [<c02e0194>] (_raw_spin_trylock+0x20/0x98)
[   11.797792] [<c02e0194>] (_raw_spin_trylock+0x20/0x98) from [<c02df734>] (rt_spin_trylock+0x14/0xd0)
[   11.797851] [<c02df734>] (rt_spin_trylock+0x14/0xd0) from [<c0028e7c>] (run_local_timers+0x24/0x78)
[   11.797892] [<c0028e7c>] (run_local_timers+0x24/0x78) from [<c0028f04>] (update_process_times+0x34/0x68)
[   11.797940] [<c0028f04>] (update_process_times+0x34/0x68) from [<c0060920>] (tick_sched_timer+0x58/0x22c)
[   11.797990] [<c0060920>] (tick_sched_timer+0x58/0x22c) from [<c0040820>] (__run_hrtimer+0x88/0x2b8)
[   11.798029] [<c0040820>] (__run_hrtimer+0x88/0x2b8) from [<c0040bb0>] (hrtimer_interrupt+0x104/0x30c)
[   11.798076] [<c0040bb0>] (hrtimer_interrupt+0x104/0x30c) from [<c0246c50>] (mxs_timer_interrupt+0x20/0x2c)
[   11.798123] [<c0246c50>] (mxs_timer_interrupt+0x20/0x2c) from [<c00534d8>] (handle_irq_event_percpu+0x80/0x2f8)
[   11.798161] [<c00534d8>] (handle_irq_event_percpu+0x80/0x2f8) from [<c005378c>] (handle_irq_event+0x3c/0x5c)
[   11.798201] [<c005378c>] (handle_irq_event+0x3c/0x5c) from [<c0055f68>] (handle_level_irq+0x8c/0x118)
[   11.798239] [<c0055f68>] (handle_level_irq+0x8c/0x118) from [<c0053448>] (generic_handle_irq+0x28/0x30)
[   11.798281] [<c0053448>] (generic_handle_irq+0x28/0x30) from [<c00101dc>] (handle_IRQ+0x30/0x84)
[   11.798322] [<c00101dc>] (handle_IRQ+0x30/0x84) from [<c0013484>] (__irq_svc+0x44/0x88)
[   11.798364] [<c0013484>] (__irq_svc+0x44/0x88) from [<c02deb18>] (rt_spin_lock_slowlock+0x60/0x204)
[   11.798402] [<c02deb18>] (rt_spin_lock_slowlock+0x60/0x204) from [<c02df4d0>] (rt_spin_lock+0x28/0x60)
[   11.798451] [<c02df4d0>] (rt_spin_lock+0x28/0x60) from [<c0028874>] (lock_timer_base+0x28/0x48)
[   11.798494] [<c0028874>] (lock_timer_base+0x28/0x48) from [<c02dcb28>] (schedule_timeout+0x78/0x254)
[   11.798531] [<c02dcb28>] (schedule_timeout+0x78/0x254) from [<c00763a4>] (rcu_gp_kthread+0x2d4/0x5f0)
[   11.798578] [<c00763a4>] (rcu_gp_kthread+0x2d4/0x5f0) from [<c003cf24>] (kthread+0xa0/0xa8)
[   11.798621] [<c003cf24>] (kthread+0xa0/0xa8) from [<c000f3e0>] (ret_from_fork+0x14/0x34)

Thanks
-- 
                                            Stano

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP - reverting timer patches helps
  2014-04-15 22:08     ` Stanislav Meduna
@ 2014-04-17 16:29       ` Stanislav Meduna
  -1 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-17 16:29 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Linux ARM Kernel, linux-kernel, bigeasy, Steven Rostedt,
	Thomas Gleixner, triplesquarednine

On 16.04.2014 00:08, Stanislav Meduna wrote:

>>> BUG: spinlock trylock failure on UP on CPU#0, ksoftirqd/0/3
> 
> I am now getting this quite reproducibly a few seconds into
> the boot and the path is always similar. Depending on what modules
> I am loading the source changes, but it is nearly always
> a schedule_timeout with a following timer interrupt.

Note: the bug goes away reverting the very same patches that are
helping the Jordan's problms. I did follow

http://www.spinics.net/lists/linux-rt-users/msg11639.html

and no BUG while booting anymore. I'll try to let it run some
more time.

> [   11.797460] BUG: spinlock trylock failure on UP on CPU#0, rcu_preempt/11
> [   11.797522]  lock: boot_tvec_bases+0x0/0x10c0, .magic: dead4ead, .owner: rcu_preempt/11, .owner_cpu: 0
> [   11.797550] CPU: 0 PID: 11 Comm: rcu_preempt Not tainted 3.12.15-rt25+ #52
> [   11.797630] [<c00151bc>] (unwind_backtrace+0x0/0xf4) from [<c0012c00>] (show_stack+0x10/0x14)
> [   11.797691] [<c0012c00>] (show_stack+0x10/0x14) from [<c01b2758>] (do_raw_spin_trylock+0x4c/0x58)
> [   11.797748] [<c01b2758>] (do_raw_spin_trylock+0x4c/0x58) from [<c02e0194>] (_raw_spin_trylock+0x20/0x98)
> [   11.797792] [<c02e0194>] (_raw_spin_trylock+0x20/0x98) from [<c02df734>] (rt_spin_trylock+0x14/0xd0)
> [   11.797851] [<c02df734>] (rt_spin_trylock+0x14/0xd0) from [<c0028e7c>] (run_local_timers+0x24/0x78)
> [   11.797892] [<c0028e7c>] (run_local_timers+0x24/0x78) from [<c0028f04>] (update_process_times+0x34/0x68)
> [   11.797940] [<c0028f04>] (update_process_times+0x34/0x68) from [<c0060920>] (tick_sched_timer+0x58/0x22c)
> [   11.797990] [<c0060920>] (tick_sched_timer+0x58/0x22c) from [<c0040820>] (__run_hrtimer+0x88/0x2b8)
> [   11.798029] [<c0040820>] (__run_hrtimer+0x88/0x2b8) from [<c0040bb0>] (hrtimer_interrupt+0x104/0x30c)
> [   11.798076] [<c0040bb0>] (hrtimer_interrupt+0x104/0x30c) from [<c0246c50>] (mxs_timer_interrupt+0x20/0x2c)
> [   11.798123] [<c0246c50>] (mxs_timer_interrupt+0x20/0x2c) from [<c00534d8>] (handle_irq_event_percpu+0x80/0x2f8)
> [   11.798161] [<c00534d8>] (handle_irq_event_percpu+0x80/0x2f8) from [<c005378c>] (handle_irq_event+0x3c/0x5c)
> [   11.798201] [<c005378c>] (handle_irq_event+0x3c/0x5c) from [<c0055f68>] (handle_level_irq+0x8c/0x118)
> [   11.798239] [<c0055f68>] (handle_level_irq+0x8c/0x118) from [<c0053448>] (generic_handle_irq+0x28/0x30)
> [   11.798281] [<c0053448>] (generic_handle_irq+0x28/0x30) from [<c00101dc>] (handle_IRQ+0x30/0x84)
> [   11.798322] [<c00101dc>] (handle_IRQ+0x30/0x84) from [<c0013484>] (__irq_svc+0x44/0x88)
> [   11.798364] [<c0013484>] (__irq_svc+0x44/0x88) from [<c02deb18>] (rt_spin_lock_slowlock+0x60/0x204)
> [   11.798402] [<c02deb18>] (rt_spin_lock_slowlock+0x60/0x204) from [<c02df4d0>] (rt_spin_lock+0x28/0x60)
> [   11.798451] [<c02df4d0>] (rt_spin_lock+0x28/0x60) from [<c0028874>] (lock_timer_base+0x28/0x48)
> [   11.798494] [<c0028874>] (lock_timer_base+0x28/0x48) from [<c02dcb28>] (schedule_timeout+0x78/0x254)
> [   11.798531] [<c02dcb28>] (schedule_timeout+0x78/0x254) from [<c00763a4>] (rcu_gp_kthread+0x2d4/0x5f0)
> [   11.798578] [<c00763a4>] (rcu_gp_kthread+0x2d4/0x5f0) from [<c003cf24>] (kthread+0xa0/0xa8)
> [   11.798621] [<c003cf24>] (kthread+0xa0/0xa8) from [<c000f3e0>] (ret_from_fork+0x14/0x34)

Thanks
-- 
                                                      Stano


^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP - reverting timer patches helps
@ 2014-04-17 16:29       ` Stanislav Meduna
  0 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-17 16:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 16.04.2014 00:08, Stanislav Meduna wrote:

>>> BUG: spinlock trylock failure on UP on CPU#0, ksoftirqd/0/3
> 
> I am now getting this quite reproducibly a few seconds into
> the boot and the path is always similar. Depending on what modules
> I am loading the source changes, but it is nearly always
> a schedule_timeout with a following timer interrupt.

Note: the bug goes away reverting the very same patches that are
helping the Jordan's problms. I did follow

http://www.spinics.net/lists/linux-rt-users/msg11639.html

and no BUG while booting anymore. I'll try to let it run some
more time.

> [   11.797460] BUG: spinlock trylock failure on UP on CPU#0, rcu_preempt/11
> [   11.797522]  lock: boot_tvec_bases+0x0/0x10c0, .magic: dead4ead, .owner: rcu_preempt/11, .owner_cpu: 0
> [   11.797550] CPU: 0 PID: 11 Comm: rcu_preempt Not tainted 3.12.15-rt25+ #52
> [   11.797630] [<c00151bc>] (unwind_backtrace+0x0/0xf4) from [<c0012c00>] (show_stack+0x10/0x14)
> [   11.797691] [<c0012c00>] (show_stack+0x10/0x14) from [<c01b2758>] (do_raw_spin_trylock+0x4c/0x58)
> [   11.797748] [<c01b2758>] (do_raw_spin_trylock+0x4c/0x58) from [<c02e0194>] (_raw_spin_trylock+0x20/0x98)
> [   11.797792] [<c02e0194>] (_raw_spin_trylock+0x20/0x98) from [<c02df734>] (rt_spin_trylock+0x14/0xd0)
> [   11.797851] [<c02df734>] (rt_spin_trylock+0x14/0xd0) from [<c0028e7c>] (run_local_timers+0x24/0x78)
> [   11.797892] [<c0028e7c>] (run_local_timers+0x24/0x78) from [<c0028f04>] (update_process_times+0x34/0x68)
> [   11.797940] [<c0028f04>] (update_process_times+0x34/0x68) from [<c0060920>] (tick_sched_timer+0x58/0x22c)
> [   11.797990] [<c0060920>] (tick_sched_timer+0x58/0x22c) from [<c0040820>] (__run_hrtimer+0x88/0x2b8)
> [   11.798029] [<c0040820>] (__run_hrtimer+0x88/0x2b8) from [<c0040bb0>] (hrtimer_interrupt+0x104/0x30c)
> [   11.798076] [<c0040bb0>] (hrtimer_interrupt+0x104/0x30c) from [<c0246c50>] (mxs_timer_interrupt+0x20/0x2c)
> [   11.798123] [<c0246c50>] (mxs_timer_interrupt+0x20/0x2c) from [<c00534d8>] (handle_irq_event_percpu+0x80/0x2f8)
> [   11.798161] [<c00534d8>] (handle_irq_event_percpu+0x80/0x2f8) from [<c005378c>] (handle_irq_event+0x3c/0x5c)
> [   11.798201] [<c005378c>] (handle_irq_event+0x3c/0x5c) from [<c0055f68>] (handle_level_irq+0x8c/0x118)
> [   11.798239] [<c0055f68>] (handle_level_irq+0x8c/0x118) from [<c0053448>] (generic_handle_irq+0x28/0x30)
> [   11.798281] [<c0053448>] (generic_handle_irq+0x28/0x30) from [<c00101dc>] (handle_IRQ+0x30/0x84)
> [   11.798322] [<c00101dc>] (handle_IRQ+0x30/0x84) from [<c0013484>] (__irq_svc+0x44/0x88)
> [   11.798364] [<c0013484>] (__irq_svc+0x44/0x88) from [<c02deb18>] (rt_spin_lock_slowlock+0x60/0x204)
> [   11.798402] [<c02deb18>] (rt_spin_lock_slowlock+0x60/0x204) from [<c02df4d0>] (rt_spin_lock+0x28/0x60)
> [   11.798451] [<c02df4d0>] (rt_spin_lock+0x28/0x60) from [<c0028874>] (lock_timer_base+0x28/0x48)
> [   11.798494] [<c0028874>] (lock_timer_base+0x28/0x48) from [<c02dcb28>] (schedule_timeout+0x78/0x254)
> [   11.798531] [<c02dcb28>] (schedule_timeout+0x78/0x254) from [<c00763a4>] (rcu_gp_kthread+0x2d4/0x5f0)
> [   11.798578] [<c00763a4>] (rcu_gp_kthread+0x2d4/0x5f0) from [<c003cf24>] (kthread+0xa0/0xa8)
> [   11.798621] [<c003cf24>] (kthread+0xa0/0xa8) from [<c000f3e0>] (ret_from_fork+0x14/0x34)

Thanks
-- 
                                                      Stano

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP - reverting timer patches helps
  2014-04-17 16:29       ` Stanislav Meduna
@ 2014-04-17 22:55         ` Stanislav Meduna
  -1 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-17 22:55 UTC (permalink / raw)
  To: linux-rt-users, triplesquarednine
  Cc: Linux ARM Kernel, linux-kernel, bigeasy, Steven Rostedt, Thomas Gleixner

On 17.04.2014 18:29, Stanislav Meduna wrote:

> Note: the bug goes away reverting the very same patches that are
> helping the Jordan's problms. I did follow
> 
> http://www.spinics.net/lists/linux-rt-users/msg11639.html
> 
> and no BUG while booting anymore. I'll try to let it run some
> more time.

Erm.. nope

# (cat /proc/meminfo ; cat /proc/meminfo) |  grep KernelStack
KernelStack:        5920 kB
KernelStack:        5928 kB

# (cat /proc/meminfo ; cat /proc/meminfo) |  grep KernelStack
KernelStack:        5952 kB
KernelStack:        5960 kB

maybe interrupt nesting ad nauseum? How does one debug
such issues?

It is possible that my previous posting was also caused by this -
disregard, I'll post again if the problem persists.

Jordan, could you check whether your kernel stack remains
stable with only the 3 patches reverted?

Thanks
-- 
                                     Stano


^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP - reverting timer patches helps
@ 2014-04-17 22:55         ` Stanislav Meduna
  0 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-17 22:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 17.04.2014 18:29, Stanislav Meduna wrote:

> Note: the bug goes away reverting the very same patches that are
> helping the Jordan's problms. I did follow
> 
> http://www.spinics.net/lists/linux-rt-users/msg11639.html
> 
> and no BUG while booting anymore. I'll try to let it run some
> more time.

Erm.. nope

# (cat /proc/meminfo ; cat /proc/meminfo) |  grep KernelStack
KernelStack:        5920 kB
KernelStack:        5928 kB

# (cat /proc/meminfo ; cat /proc/meminfo) |  grep KernelStack
KernelStack:        5952 kB
KernelStack:        5960 kB

maybe interrupt nesting ad nauseum? How does one debug
such issues?

It is possible that my previous posting was also caused by this -
disregard, I'll post again if the problem persists.

Jordan, could you check whether your kernel stack remains
stable with only the 3 patches reverted?

Thanks
-- 
                                     Stano

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP - reverting timer patches helps
  2014-04-17 22:55         ` Stanislav Meduna
@ 2014-04-17 23:10           ` jordan
  -1 siblings, 0 replies; 48+ messages in thread
From: jordan @ 2014-04-17 23:10 UTC (permalink / raw)
  To: Stanislav Meduna
  Cc: linux-rt-users, Linux ARM Kernel, linux-kernel,
	Sebastian Andrzej Siewior, Steven Rostedt, Thomas Gleixner

Hey Stano,

On Thu, Apr 17, 2014 at 6:55 PM, Stanislav Meduna <stano@meduna.org> wrote:
> On 17.04.2014 18:29, Stanislav Meduna wrote:
>
>> Note: the bug goes away reverting the very same patches that are
>> helping the Jordan's problms. I did follow
>>
>> http://www.spinics.net/lists/linux-rt-users/msg11639.html
>>
>> and no BUG while booting anymore. I'll try to let it run some
>> more time.

Yeah, i know a handful of people [amd users] that have now reported
success booting into 3.14-rt1 reverting those patches.  Personally, I
have disabled NO_HZ_FULL and have switched back to 'old tick' method
in kconfig. I don't think the latest no_hz stuff is stable enough...

> Erm.. nope
>
> # (cat /proc/meminfo ; cat /proc/meminfo) |  grep KernelStack
> KernelStack:        5920 kB
> KernelStack:        5928 kB
>
> # (cat /proc/meminfo ; cat /proc/meminfo) |  grep KernelStack
> KernelStack:        5952 kB
> KernelStack:        5960 kB
>
> maybe interrupt nesting ad nauseum? How does one debug
> such issues?
>
> It is possible that my previous posting was also caused by this -
> disregard, I'll post again if the problem persists.
>
> Jordan, could you check whether your kernel stack remains
> stable with only the 3 patches reverted?

I'll have to re-install that kernel and get back to you [ give me an
hour or so. i desperately need to eat some food ;) ]. My current
kernel differs in a couple of ways [ but yes, the KernelStack is
stable]. My running/installed rt-kernel is using the 'threadsirq'
patch for linux-rt and isn't vanilla linux-rt anyway, as i patch in
UKSM [in-kernel memory deduplication and BFQ [is scheduler]... but
yes, I'm still reverting those 3 timer[s] patches, since my machine
[and other users i know] won't boot without doing that. Here's my
running kernel's kstack;

[ninez@localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
KernelStack
KernelStack:        3728 kB
KernelStack:        3728 kB
[ninez@localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
KernelStack
KernelStack:        3696 kB
KernelStack:        3696 kB

I'll post from my old kernel in a bit.

> Thanks

No problem

Jordan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP - reverting timer patches helps
@ 2014-04-17 23:10           ` jordan
  0 siblings, 0 replies; 48+ messages in thread
From: jordan @ 2014-04-17 23:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hey Stano,

On Thu, Apr 17, 2014 at 6:55 PM, Stanislav Meduna <stano@meduna.org> wrote:
> On 17.04.2014 18:29, Stanislav Meduna wrote:
>
>> Note: the bug goes away reverting the very same patches that are
>> helping the Jordan's problms. I did follow
>>
>> http://www.spinics.net/lists/linux-rt-users/msg11639.html
>>
>> and no BUG while booting anymore. I'll try to let it run some
>> more time.

Yeah, i know a handful of people [amd users] that have now reported
success booting into 3.14-rt1 reverting those patches.  Personally, I
have disabled NO_HZ_FULL and have switched back to 'old tick' method
in kconfig. I don't think the latest no_hz stuff is stable enough...

> Erm.. nope
>
> # (cat /proc/meminfo ; cat /proc/meminfo) |  grep KernelStack
> KernelStack:        5920 kB
> KernelStack:        5928 kB
>
> # (cat /proc/meminfo ; cat /proc/meminfo) |  grep KernelStack
> KernelStack:        5952 kB
> KernelStack:        5960 kB
>
> maybe interrupt nesting ad nauseum? How does one debug
> such issues?
>
> It is possible that my previous posting was also caused by this -
> disregard, I'll post again if the problem persists.
>
> Jordan, could you check whether your kernel stack remains
> stable with only the 3 patches reverted?

I'll have to re-install that kernel and get back to you [ give me an
hour or so. i desperately need to eat some food ;) ]. My current
kernel differs in a couple of ways [ but yes, the KernelStack is
stable]. My running/installed rt-kernel is using the 'threadsirq'
patch for linux-rt and isn't vanilla linux-rt anyway, as i patch in
UKSM [in-kernel memory deduplication and BFQ [is scheduler]... but
yes, I'm still reverting those 3 timer[s] patches, since my machine
[and other users i know] won't boot without doing that. Here's my
running kernel's kstack;

[ninez at localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
KernelStack
KernelStack:        3728 kB
KernelStack:        3728 kB
[ninez at localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
KernelStack
KernelStack:        3696 kB
KernelStack:        3696 kB

I'll post from my old kernel in a bit.

> Thanks

No problem

Jordan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP - reverting timer patches helps
  2014-04-17 23:10           ` jordan
  (?)
@ 2014-04-18  1:15             ` jordan
  -1 siblings, 0 replies; 48+ messages in thread
From: jordan @ 2014-04-18  1:15 UTC (permalink / raw)
  To: Stanislav Meduna
  Cc: linux-rt-users, Linux ARM Kernel, linux-kernel,
	Sebastian Andrzej Siewior, Steven Rostedt, Thomas Gleixner

Hey again,

> [ninez@localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
> KernelStack
> KernelStack:        3728 kB
> KernelStack:        3728 kB
> [ninez@localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
> KernelStack
> KernelStack:        3696 kB
> KernelStack:        3696 kB
>
> I'll post from my old kernel in a bit.

my other machines all report kstack as being stable/consistent - I
also had an Archlinux user of my kernel check his box - his was
consistent too. [all of those boxes reverted those timer patches].

cheerz

Jordan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP - reverting timer patches helps
@ 2014-04-18  1:15             ` jordan
  0 siblings, 0 replies; 48+ messages in thread
From: jordan @ 2014-04-18  1:15 UTC (permalink / raw)
  To: Stanislav Meduna
  Cc: linux-rt-users, Sebastian Andrzej Siewior, linux-kernel,
	Steven Rostedt, Thomas Gleixner, Linux ARM Kernel

Hey again,

> [ninez@localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
> KernelStack
> KernelStack:        3728 kB
> KernelStack:        3728 kB
> [ninez@localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
> KernelStack
> KernelStack:        3696 kB
> KernelStack:        3696 kB
>
> I'll post from my old kernel in a bit.

my other machines all report kstack as being stable/consistent - I
also had an Archlinux user of my kernel check his box - his was
consistent too. [all of those boxes reverted those timer patches].

cheerz

Jordan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP - reverting timer patches helps
@ 2014-04-18  1:15             ` jordan
  0 siblings, 0 replies; 48+ messages in thread
From: jordan @ 2014-04-18  1:15 UTC (permalink / raw)
  To: linux-arm-kernel

Hey again,

> [ninez at localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
> KernelStack
> KernelStack:        3728 kB
> KernelStack:        3728 kB
> [ninez at localhost ~]$  (cat /proc/meminfo ; cat /proc/meminfo) |  grep
> KernelStack
> KernelStack:        3696 kB
> KernelStack:        3696 kB
>
> I'll post from my old kernel in a bit.

my other machines all report kstack as being stable/consistent - I
also had an Archlinux user of my kernel check his box - his was
consistent too. [all of those boxes reverted those timer patches].

cheerz

Jordan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP - reverting timer patches helps
  2014-04-17 23:10           ` jordan
@ 2014-04-18 14:45             ` Stanislav Meduna
  -1 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-18 14:45 UTC (permalink / raw)
  To: jordan
  Cc: linux-rt-users, Linux ARM Kernel, linux-kernel,
	Sebastian Andrzej Siewior, Steven Rostedt, Thomas Gleixner

On 18.04.2014 01:10, jordan wrote:

Hi Jordan,

many thanks for the tests. I probably botched something during
the reverting - I now tried some more experiments and the system
now runs without BUGs and without kernel leak - of course
it needs some more uptime to be really sure.

My combined revert patch from 3.12.15-rt25 is at
  http://pastebin.com/MYLqbmZw
That was all that was needed.

> Yeah, i know a handful of people [amd users] that have now reported
> success booting into 3.14-rt1 reverting those patches.  Personally, I
> have disabled NO_HZ_FULL and have switched back to 'old tick' method
> in kconfig. I don't think the latest no_hz stuff is stable enough...

My problems were with periodic timers (I am on an embedded system
that runs things periodically anyway and I have also my doubts
regarding the stability), so the latest code did not break
only NO_HZ_FULL.

Thanks
-- 
                                       Stano


^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP - reverting timer patches helps
@ 2014-04-18 14:45             ` Stanislav Meduna
  0 siblings, 0 replies; 48+ messages in thread
From: Stanislav Meduna @ 2014-04-18 14:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 18.04.2014 01:10, jordan wrote:

Hi Jordan,

many thanks for the tests. I probably botched something during
the reverting - I now tried some more experiments and the system
now runs without BUGs and without kernel leak - of course
it needs some more uptime to be really sure.

My combined revert patch from 3.12.15-rt25 is at
  http://pastebin.com/MYLqbmZw
That was all that was needed.

> Yeah, i know a handful of people [amd users] that have now reported
> success booting into 3.14-rt1 reverting those patches.  Personally, I
> have disabled NO_HZ_FULL and have switched back to 'old tick' method
> in kconfig. I don't think the latest no_hz stuff is stable enough...

My problems were with periodic timers (I am on an embedded system
that runs things periodically anyway and I have also my doubts
regarding the stability), so the latest code did not break
only NO_HZ_FULL.

Thanks
-- 
                                       Stano

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP - reverting timer patches helps
  2014-04-18 14:45             ` Stanislav Meduna
@ 2014-04-18 15:09               ` jordan
  -1 siblings, 0 replies; 48+ messages in thread
From: jordan @ 2014-04-18 15:09 UTC (permalink / raw)
  To: Stanislav Meduna
  Cc: linux-rt-users, Linux ARM Kernel, linux-kernel,
	Sebastian Andrzej Siewior, Steven Rostedt, Thomas Gleixner

Hey slavo

> many thanks for the tests. I probably botched something during
> the reverting - I now tried some more experiments and the system
> now runs without BUGs and without kernel leak - of course
> it needs some more uptime to be really sure.

No problem. It's good to hear that is working. My AMD Phenom II 965
has 1day, 10hours uptime. Hopefully that foreshadows your system
running well.

> My combined revert patch from 3.12.15-rt25 is at
>   http://pastebin.com/MYLqbmZw
> That was all that was needed.

Yup. I suggest anyone from the list who can't get their machines to
boot with those commits applied, test that patch.

>> Yeah, i know a handful of people [amd users] that have now reported
>> success booting into 3.14-rt1 reverting those patches.  Personally, I
>> have disabled NO_HZ_FULL and have switched back to 'old tick' method
>> in kconfig. I don't think the latest no_hz stuff is stable enough...
>
> My problems were with periodic timers (I am on an embedded system
> that runs things periodically anyway and I have also my doubts
> regarding the stability), so the latest code did not break
> only NO_HZ_FULL.

ah, thanks for the clarification [ i must have missed that bit - i was
just glad someone else had verified on the list that those commits
were problematic, as i hadn't gotten any response]. NO_HZ_FULL had
been alright for me, for a while - so i didn't really notice the
breakage with periodic timers, but that is good to know.

I think periodic timers + reverting those patches + resurrecting sirq
threads on seems to be the best on 3.14-rt, afaict.

Jordan

> Thanks
> --
>                                        Stano
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP - reverting timer patches helps
@ 2014-04-18 15:09               ` jordan
  0 siblings, 0 replies; 48+ messages in thread
From: jordan @ 2014-04-18 15:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hey slavo

> many thanks for the tests. I probably botched something during
> the reverting - I now tried some more experiments and the system
> now runs without BUGs and without kernel leak - of course
> it needs some more uptime to be really sure.

No problem. It's good to hear that is working. My AMD Phenom II 965
has 1day, 10hours uptime. Hopefully that foreshadows your system
running well.

> My combined revert patch from 3.12.15-rt25 is at
>   http://pastebin.com/MYLqbmZw
> That was all that was needed.

Yup. I suggest anyone from the list who can't get their machines to
boot with those commits applied, test that patch.

>> Yeah, i know a handful of people [amd users] that have now reported
>> success booting into 3.14-rt1 reverting those patches.  Personally, I
>> have disabled NO_HZ_FULL and have switched back to 'old tick' method
>> in kconfig. I don't think the latest no_hz stuff is stable enough...
>
> My problems were with periodic timers (I am on an embedded system
> that runs things periodically anyway and I have also my doubts
> regarding the stability), so the latest code did not break
> only NO_HZ_FULL.

ah, thanks for the clarification [ i must have missed that bit - i was
just glad someone else had verified on the list that those commits
were problematic, as i hadn't gotten any response]. NO_HZ_FULL had
been alright for me, for a while - so i didn't really notice the
breakage with periodic timers, but that is good to know.

I think periodic timers + reverting those patches + resurrecting sirq
threads on seems to be the best on 3.14-rt, afaict.

Jordan

> Thanks
> --
>                                        Stano
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-15 22:08     ` Stanislav Meduna
@ 2014-04-22 11:54       ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-22 11:54 UTC (permalink / raw)
  To: Stanislav Meduna
  Cc: linux-rt-users, Linux ARM Kernel, linux-kernel, Steven Rostedt,
	Thomas Gleixner

* Stanislav Meduna | 2014-04-16 00:08:49 [+0200]:

>Hmm... how is it in the rt-case guaranteed that the timer interrupt
>does not preempt someone trying to modify the timer? The run_local_timers
>looks to have arrived via hardirq context. The spinlock in the tvec_base
>is a normal one and spin_lock_irqsave does not disable interrupts
>on rt, right?

this is, erm, harmless. We grab the timer lock via trylock in hardirq
context. If the lock is already taken then we fail to get it we go for
plan B. According to lockdep a trylock should not fail on UP. This is
true in general except for this timer case. I was thinking abour
disabling this lockdep check…

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-22 11:54       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-22 11:54 UTC (permalink / raw)
  To: linux-arm-kernel

* Stanislav Meduna | 2014-04-16 00:08:49 [+0200]:

>Hmm... how is it in the rt-case guaranteed that the timer interrupt
>does not preempt someone trying to modify the timer? The run_local_timers
>looks to have arrived via hardirq context. The spinlock in the tvec_base
>is a normal one and spin_lock_irqsave does not disable interrupts
>on rt, right?

this is, erm, harmless. We grab the timer lock via trylock in hardirq
context. If the lock is already taken then we fail to get it we go for
plan B. According to lockdep a trylock should not fail on UP. This is
true in general except for this timer case. I was thinking abour
disabling this lockdep check?

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-22 11:54       ` Sebastian Andrzej Siewior
@ 2014-04-22 13:46         ` Steven Rostedt
  -1 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2014-04-22 13:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Stanislav Meduna, linux-rt-users, Linux ARM Kernel, linux-kernel,
	Thomas Gleixner, Peter Zijlstra

[ added Peter ]

On Tue, 22 Apr 2014 13:54:39 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
 
> this is, erm, harmless. We grab the timer lock via trylock in hardirq
> context. If the lock is already taken then we fail to get it we go for
> plan B. According to lockdep a trylock should not fail on UP. This is
> true in general except for this timer case. I was thinking abour
> disabling this lockdep check…

trylock not failing on UP, can that be an issue? I mean, if a hardirq
does a trylock to see if it can grab a lock that is not protected by
disabling irqs, and will go to plan B if it fails, on UP, it will
always get it.  But the issue is still there. That would mean that a
hardirq could have preempted a critical section and doing a trylock
here would succeed when it really should have failed.

-- Steve

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-22 13:46         ` Steven Rostedt
  0 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2014-04-22 13:46 UTC (permalink / raw)
  To: linux-arm-kernel

[ added Peter ]

On Tue, 22 Apr 2014 13:54:39 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
 
> this is, erm, harmless. We grab the timer lock via trylock in hardirq
> context. If the lock is already taken then we fail to get it we go for
> plan B. According to lockdep a trylock should not fail on UP. This is
> true in general except for this timer case. I was thinking abour
> disabling this lockdep check?

trylock not failing on UP, can that be an issue? I mean, if a hardirq
does a trylock to see if it can grab a lock that is not protected by
disabling irqs, and will go to plan B if it fails, on UP, it will
always get it.  But the issue is still there. That would mean that a
hardirq could have preempted a critical section and doing a trylock
here would succeed when it really should have failed.

-- Steve

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-22 13:46         ` Steven Rostedt
@ 2014-04-22 14:09           ` Peter Zijlstra
  -1 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2014-04-22 14:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, Stanislav Meduna, linux-rt-users,
	Linux ARM Kernel, linux-kernel, Thomas Gleixner

On Tue, Apr 22, 2014 at 09:46:57AM -0400, Steven Rostedt wrote:
> >  According to lockdep a trylock should not fail on UP.

Oh!? Where does it say that? A trylock can fail at all times.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-22 14:09           ` Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2014-04-22 14:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 22, 2014 at 09:46:57AM -0400, Steven Rostedt wrote:
> >  According to lockdep a trylock should not fail on UP.

Oh!? Where does it say that? A trylock can fail at all times.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-22 13:46         ` Steven Rostedt
@ 2014-04-22 16:51           ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-22 16:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Stanislav Meduna, linux-rt-users, Linux ARM Kernel, linux-kernel,
	Thomas Gleixner, Peter Zijlstra

On 04/22/2014 03:46 PM, Steven Rostedt wrote:
> [ added Peter ]
> 
> On Tue, 22 Apr 2014 13:54:39 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>  
>> this is, erm, harmless. We grab the timer lock via trylock in hardirq
>> context. If the lock is already taken then we fail to get it we go for
>> plan B. According to lockdep a trylock should not fail on UP. This is
>> true in general except for this timer case. I was thinking abour
>> disabling this lockdep check…
> 
> trylock not failing on UP, can that be an issue? I mean, if a hardirq
> does a trylock to see if it can grab a lock that is not protected by
> disabling irqs, and will go to plan B if it fails, on UP, it will
> always get it.  But the issue is still there. That would mean that a
> hardirq could have preempted a critical section and doing a trylock
> here would succeed when it really should have failed.

If you take a lock with irqs enabled and disabled then lockdep should
complain about it.

This is the ->wait_lock of the timer base lock. This (sleeping) lock is
usually taken with interrupts enabled. Except here, in the timer
callback, we check if the lock is available or not. And this lock may
be a) taken (and the ->wait_lock unlocked) or b) in process to be taken
but the caller only succeeded to acquire the ->wait_lock before the
interrupt occurred. This is the case here and we can't acquire the
->wait_lock a second time the check if the lock is really taken. But
since the wait_lock is occupied it is likely that the lock itself is
occupied as well.

> 
> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-22 16:51           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-22 16:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/22/2014 03:46 PM, Steven Rostedt wrote:
> [ added Peter ]
> 
> On Tue, 22 Apr 2014 13:54:39 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>  
>> this is, erm, harmless. We grab the timer lock via trylock in hardirq
>> context. If the lock is already taken then we fail to get it we go for
>> plan B. According to lockdep a trylock should not fail on UP. This is
>> true in general except for this timer case. I was thinking abour
>> disabling this lockdep check?
> 
> trylock not failing on UP, can that be an issue? I mean, if a hardirq
> does a trylock to see if it can grab a lock that is not protected by
> disabling irqs, and will go to plan B if it fails, on UP, it will
> always get it.  But the issue is still there. That would mean that a
> hardirq could have preempted a critical section and doing a trylock
> here would succeed when it really should have failed.

If you take a lock with irqs enabled and disabled then lockdep should
complain about it.

This is the ->wait_lock of the timer base lock. This (sleeping) lock is
usually taken with interrupts enabled. Except here, in the timer
callback, we check if the lock is available or not. And this lock may
be a) taken (and the ->wait_lock unlocked) or b) in process to be taken
but the caller only succeeded to acquire the ->wait_lock before the
interrupt occurred. This is the case here and we can't acquire the
->wait_lock a second time the check if the lock is really taken. But
since the wait_lock is occupied it is likely that the lock itself is
occupied as well.

> 
> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-22 16:51           ` Sebastian Andrzej Siewior
@ 2014-04-22 17:48             ` Steven Rostedt
  -1 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2014-04-22 17:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Stanislav Meduna, linux-rt-users, Linux ARM Kernel, linux-kernel,
	Thomas Gleixner, Peter Zijlstra

On Tue, 22 Apr 2014 18:51:17 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 04/22/2014 03:46 PM, Steven Rostedt wrote:
> > [ added Peter ]
> > 
> > On Tue, 22 Apr 2014 13:54:39 +0200
> > Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >  
> >> this is, erm, harmless. We grab the timer lock via trylock in hardirq
> >> context. If the lock is already taken then we fail to get it we go for
> >> plan B. According to lockdep a trylock should not fail on UP. This is
> >> true in general except for this timer case. I was thinking abour
> >> disabling this lockdep check…
> > 
> > trylock not failing on UP, can that be an issue? I mean, if a hardirq
> > does a trylock to see if it can grab a lock that is not protected by
> > disabling irqs, and will go to plan B if it fails, on UP, it will
> > always get it.  But the issue is still there. That would mean that a
> > hardirq could have preempted a critical section and doing a trylock
> > here would succeed when it really should have failed.
> 
> If you take a lock with irqs enabled and disabled then lockdep should
> complain about it.

There's nothing wrong with taking locks with irqs enabled and disabled.
It's only wrong if that lock (or a lock that is held when the lock is
taken) is also taking in interrupt *context*.

> 
> This is the ->wait_lock of the timer base lock. This (sleeping) lock is
> usually taken with interrupts enabled. Except here, in the timer
> callback, we check if the lock is available or not. And this lock may
> be a) taken (and the ->wait_lock unlocked) or b) in process to be taken
> but the caller only succeeded to acquire the ->wait_lock before the
> interrupt occurred. This is the case here and we can't acquire the
> ->wait_lock a second time the check if the lock is really taken. But
> since the wait_lock is occupied it is likely that the lock itself is
> occupied as well.

I need to take a deeper look into the actual code. But as trylocks on
UP are nops (always succeed), and if it expects to be able to do
something in a critical section that is protected by spinlocks (again
nops on UP), this would be broken for UP.

-- Steve


^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-22 17:48             ` Steven Rostedt
  0 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2014-04-22 17:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 22 Apr 2014 18:51:17 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 04/22/2014 03:46 PM, Steven Rostedt wrote:
> > [ added Peter ]
> > 
> > On Tue, 22 Apr 2014 13:54:39 +0200
> > Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >  
> >> this is, erm, harmless. We grab the timer lock via trylock in hardirq
> >> context. If the lock is already taken then we fail to get it we go for
> >> plan B. According to lockdep a trylock should not fail on UP. This is
> >> true in general except for this timer case. I was thinking abour
> >> disabling this lockdep check?
> > 
> > trylock not failing on UP, can that be an issue? I mean, if a hardirq
> > does a trylock to see if it can grab a lock that is not protected by
> > disabling irqs, and will go to plan B if it fails, on UP, it will
> > always get it.  But the issue is still there. That would mean that a
> > hardirq could have preempted a critical section and doing a trylock
> > here would succeed when it really should have failed.
> 
> If you take a lock with irqs enabled and disabled then lockdep should
> complain about it.

There's nothing wrong with taking locks with irqs enabled and disabled.
It's only wrong if that lock (or a lock that is held when the lock is
taken) is also taking in interrupt *context*.

> 
> This is the ->wait_lock of the timer base lock. This (sleeping) lock is
> usually taken with interrupts enabled. Except here, in the timer
> callback, we check if the lock is available or not. And this lock may
> be a) taken (and the ->wait_lock unlocked) or b) in process to be taken
> but the caller only succeeded to acquire the ->wait_lock before the
> interrupt occurred. This is the case here and we can't acquire the
> ->wait_lock a second time the check if the lock is really taken. But
> since the wait_lock is occupied it is likely that the lock itself is
> occupied as well.

I need to take a deeper look into the actual code. But as trylocks on
UP are nops (always succeed), and if it expects to be able to do
something in a critical section that is protected by spinlocks (again
nops on UP), this would be broken for UP.

-- Steve

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-22 17:48             ` Steven Rostedt
@ 2014-04-22 18:16               ` Steven Rostedt
  -1 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2014-04-22 18:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, Stanislav Meduna, linux-rt-users,
	Linux ARM Kernel, linux-kernel, Thomas Gleixner, Peter Zijlstra

On Tue, 22 Apr 2014 13:48:02 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I need to take a deeper look into the actual code. But as trylocks on
> UP are nops (always succeed), and if it expects to be able to do
> something in a critical section that is protected by spinlocks (again
> nops on UP), this would be broken for UP.

Reading the code, I see it's broken. We should add something like this:

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
diff --git a/kernel/timer.c b/kernel/timer.c
index cc34e42..a03164a 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
 		__run_timers(base);
 }
 
+#ifdef CONFIG_SMP
+#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
+#else
+#define timer_should_raise_softirq(lock)	1
+#endif
+
 /*
  * Called by the local, per-CPU timer interrupt on SMP.
  */
@@ -1467,7 +1473,7 @@ void run_local_timers(void)
 		return;
 	}
 
-	if (!spin_do_trylock(&base->lock)) {
+	if (timer_should_raise_softirq(&base->lock)) {
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
 	}

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-22 18:16               ` Steven Rostedt
  0 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2014-04-22 18:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 22 Apr 2014 13:48:02 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I need to take a deeper look into the actual code. But as trylocks on
> UP are nops (always succeed), and if it expects to be able to do
> something in a critical section that is protected by spinlocks (again
> nops on UP), this would be broken for UP.

Reading the code, I see it's broken. We should add something like this:

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
diff --git a/kernel/timer.c b/kernel/timer.c
index cc34e42..a03164a 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
 		__run_timers(base);
 }
 
+#ifdef CONFIG_SMP
+#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
+#else
+#define timer_should_raise_softirq(lock)	1
+#endif
+
 /*
  * Called by the local, per-CPU timer interrupt on SMP.
  */
@@ -1467,7 +1473,7 @@ void run_local_timers(void)
 		return;
 	}
 
-	if (!spin_do_trylock(&base->lock)) {
+	if (timer_should_raise_softirq(&base->lock)) {
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
 	}

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-22 18:16               ` Steven Rostedt
@ 2014-04-23  7:14                 ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-23  7:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Stanislav Meduna, linux-rt-users, Linux ARM Kernel, linux-kernel,
	Thomas Gleixner, Peter Zijlstra

On 04/22/2014 08:16 PM, Steven Rostedt wrote:
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
>  		__run_timers(base);
>  }
>  
> +#ifdef CONFIG_SMP
> +#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
> +#else
> +#define timer_should_raise_softirq(lock)	1
> +#endif
> +

No. The lock may be taken but it also may be available no matter if UP
or not. With this patch applied the lockdep splat will go away but the
FULL_NO_HZ people will come back because the timer softirq is scheduled
even if no timer has expired.

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-23  7:14                 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-23  7:14 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/22/2014 08:16 PM, Steven Rostedt wrote:
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
>  		__run_timers(base);
>  }
>  
> +#ifdef CONFIG_SMP
> +#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
> +#else
> +#define timer_should_raise_softirq(lock)	1
> +#endif
> +

No. The lock may be taken but it also may be available no matter if UP
or not. With this patch applied the lockdep splat will go away but the
FULL_NO_HZ people will come back because the timer softirq is scheduled
even if no timer has expired.

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-22 14:09           ` Peter Zijlstra
@ 2014-04-23  8:10             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-23  8:10 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: Stanislav Meduna, linux-rt-users, Linux ARM Kernel, linux-kernel,
	Thomas Gleixner

On 04/22/2014 04:09 PM, Peter Zijlstra wrote:
> On Tue, Apr 22, 2014 at 09:46:57AM -0400, Steven Rostedt wrote:
>>>  According to lockdep a trylock should not fail on UP.
> 
> Oh!? Where does it say that? A trylock can fail at all times.

kernel/locking/spinlock_debug.c:

int do_raw_spin_trylock(raw_spinlock_t *lock)
 {
         int ret = arch_spin_trylock(&lock->raw_lock);

         if (ret)
                 debug_spin_lock_after(lock);
 #ifndef CONFIG_SMP
         /*
          * Must not happen on UP:
          */
         SPIN_BUG_ON(!ret, lock, "trylock failure on UP");
 #endif
         return ret;
 }

How can a trylock (spinlock, not mutex) fail on UP? That would mean the
lock is not interrupt safe.
Unless, you attempt to take the lock from interrupt context via trylock
while in general you take the spinlock in process context with
interrupts enabled.

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-23  8:10             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-23  8:10 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/22/2014 04:09 PM, Peter Zijlstra wrote:
> On Tue, Apr 22, 2014 at 09:46:57AM -0400, Steven Rostedt wrote:
>>>  According to lockdep a trylock should not fail on UP.
> 
> Oh!? Where does it say that? A trylock can fail at all times.

kernel/locking/spinlock_debug.c:

int do_raw_spin_trylock(raw_spinlock_t *lock)
 {
         int ret = arch_spin_trylock(&lock->raw_lock);

         if (ret)
                 debug_spin_lock_after(lock);
 #ifndef CONFIG_SMP
         /*
          * Must not happen on UP:
          */
         SPIN_BUG_ON(!ret, lock, "trylock failure on UP");
 #endif
         return ret;
 }

How can a trylock (spinlock, not mutex) fail on UP? That would mean the
lock is not interrupt safe.
Unless, you attempt to take the lock from interrupt context via trylock
while in general you take the spinlock in process context with
interrupts enabled.

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-23  7:14                 ` Sebastian Andrzej Siewior
@ 2014-04-23  8:49                   ` Peter Zijlstra
  -1 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2014-04-23  8:49 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, Stanislav Meduna, linux-rt-users,
	Linux ARM Kernel, linux-kernel, Thomas Gleixner

On Wed, Apr 23, 2014 at 09:14:33AM +0200, Sebastian Andrzej Siewior wrote:
> On 04/22/2014 08:16 PM, Steven Rostedt wrote:
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
> >  		__run_timers(base);
> >  }
> >  
> > +#ifdef CONFIG_SMP
> > +#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
> > +#else
> > +#define timer_should_raise_softirq(lock)	1
> > +#endif
> > +
> 
> No. The lock may be taken but it also may be available no matter if UP
> or not. With this patch applied the lockdep splat will go away but the
> FULL_NO_HZ people will come back because the timer softirq is scheduled
> even if no timer has expired.

You can do FULL_NO_HZ on UP ? That seems rather pointless, as there
isn't a 'spare' CPU to do all the timekeeping on.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-23  8:49                   ` Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2014-04-23  8:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 23, 2014 at 09:14:33AM +0200, Sebastian Andrzej Siewior wrote:
> On 04/22/2014 08:16 PM, Steven Rostedt wrote:
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
> >  		__run_timers(base);
> >  }
> >  
> > +#ifdef CONFIG_SMP
> > +#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
> > +#else
> > +#define timer_should_raise_softirq(lock)	1
> > +#endif
> > +
> 
> No. The lock may be taken but it also may be available no matter if UP
> or not. With this patch applied the lockdep splat will go away but the
> FULL_NO_HZ people will come back because the timer softirq is scheduled
> even if no timer has expired.

You can do FULL_NO_HZ on UP ? That seems rather pointless, as there
isn't a 'spare' CPU to do all the timekeeping on.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-23  8:10             ` Sebastian Andrzej Siewior
@ 2014-04-23  8:50               ` Peter Zijlstra
  -1 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2014-04-23  8:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, Stanislav Meduna, linux-rt-users,
	Linux ARM Kernel, linux-kernel, Thomas Gleixner

On Wed, Apr 23, 2014 at 10:10:01AM +0200, Sebastian Andrzej Siewior wrote:
> On 04/22/2014 04:09 PM, Peter Zijlstra wrote:
> > On Tue, Apr 22, 2014 at 09:46:57AM -0400, Steven Rostedt wrote:
> >>>  According to lockdep a trylock should not fail on UP.
> > 
> > Oh!? Where does it say that? A trylock can fail at all times.
> 
> kernel/locking/spinlock_debug.c:
> 
> int do_raw_spin_trylock(raw_spinlock_t *lock)
>  {
>          int ret = arch_spin_trylock(&lock->raw_lock);
> 
>          if (ret)
>                  debug_spin_lock_after(lock);
>  #ifndef CONFIG_SMP
>          /*
>           * Must not happen on UP:
>           */
>          SPIN_BUG_ON(!ret, lock, "trylock failure on UP");
>  #endif
>          return ret;
>  }
> 
> How can a trylock (spinlock, not mutex) fail on UP? That would mean the
> lock is not interrupt safe.
> Unless, you attempt to take the lock from interrupt context via trylock
> while in general you take the spinlock in process context with
> interrupts enabled.

But that's not lockdep. That's the spinlock debugging code, entirely
different beasts.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-23  8:50               ` Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2014-04-23  8:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 23, 2014 at 10:10:01AM +0200, Sebastian Andrzej Siewior wrote:
> On 04/22/2014 04:09 PM, Peter Zijlstra wrote:
> > On Tue, Apr 22, 2014 at 09:46:57AM -0400, Steven Rostedt wrote:
> >>>  According to lockdep a trylock should not fail on UP.
> > 
> > Oh!? Where does it say that? A trylock can fail at all times.
> 
> kernel/locking/spinlock_debug.c:
> 
> int do_raw_spin_trylock(raw_spinlock_t *lock)
>  {
>          int ret = arch_spin_trylock(&lock->raw_lock);
> 
>          if (ret)
>                  debug_spin_lock_after(lock);
>  #ifndef CONFIG_SMP
>          /*
>           * Must not happen on UP:
>           */
>          SPIN_BUG_ON(!ret, lock, "trylock failure on UP");
>  #endif
>          return ret;
>  }
> 
> How can a trylock (spinlock, not mutex) fail on UP? That would mean the
> lock is not interrupt safe.
> Unless, you attempt to take the lock from interrupt context via trylock
> while in general you take the spinlock in process context with
> interrupts enabled.

But that's not lockdep. That's the spinlock debugging code, entirely
different beasts.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-23  7:14                 ` Sebastian Andrzej Siewior
@ 2014-04-23 12:15                   ` Steven Rostedt
  -1 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2014-04-23 12:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Stanislav Meduna, linux-rt-users, Linux ARM Kernel, linux-kernel,
	Thomas Gleixner, Peter Zijlstra

On Wed, 23 Apr 2014 09:14:33 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 04/22/2014 08:16 PM, Steven Rostedt wrote:
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
> >  		__run_timers(base);
> >  }
> >  
> > +#ifdef CONFIG_SMP
> > +#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
> > +#else
> > +#define timer_should_raise_softirq(lock)	1
> > +#endif
> > +
> 
> No. The lock may be taken but it also may be available no matter if UP
> or not. With this patch applied the lockdep splat will go away but the
> FULL_NO_HZ people will come back because the timer softirq is scheduled
> even if no timer has expired.

Although, as Peter said FULL_NO_HZ is pretty pointless on UP, but I've
been thinking this from a non PREEMPT_RT viewpoint. In non PREEMPT_RT,
spin_locks() are nops, where this is an issue, but in PREEMPT_RT, they
are mutexes, BUT!

The rt_mutex use wait_lock to protect its internal state, which is a raw
spinlock, and in UP it's a nop. That means there's nothing preventing
an interrupt here from corrupting the rtmutex's internal state. I see
this is still an issue, and the warning is still valid.

-- Steve

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-04-23 12:15                   ` Steven Rostedt
  0 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2014-04-23 12:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 23 Apr 2014 09:14:33 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 04/22/2014 08:16 PM, Steven Rostedt wrote:
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
> >  		__run_timers(base);
> >  }
> >  
> > +#ifdef CONFIG_SMP
> > +#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
> > +#else
> > +#define timer_should_raise_softirq(lock)	1
> > +#endif
> > +
> 
> No. The lock may be taken but it also may be available no matter if UP
> or not. With this patch applied the lockdep splat will go away but the
> FULL_NO_HZ people will come back because the timer softirq is scheduled
> even if no timer has expired.

Although, as Peter said FULL_NO_HZ is pretty pointless on UP, but I've
been thinking this from a non PREEMPT_RT viewpoint. In non PREEMPT_RT,
spin_locks() are nops, where this is an issue, but in PREEMPT_RT, they
are mutexes, BUT!

The rt_mutex use wait_lock to protect its internal state, which is a raw
spinlock, and in UP it's a nop. That means there's nothing preventing
an interrupt here from corrupting the rtmutex's internal state. I see
this is still an issue, and the warning is still valid.

-- Steve

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-04-22 18:16               ` Steven Rostedt
@ 2014-05-02 18:38                 ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-05-02 18:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Stanislav Meduna, linux-rt-users, Linux ARM Kernel, linux-kernel,
	Thomas Gleixner, Peter Zijlstra

* Steven Rostedt | 2014-04-22 14:16:50 [-0400]:

>On Tue, 22 Apr 2014 13:48:02 -0400
>Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> I need to take a deeper look into the actual code. But as trylocks on
>> UP are nops (always succeed), and if it expects to be able to do
>> something in a critical section that is protected by spinlocks (again
>> nops on UP), this would be broken for UP.
>
>Reading the code, I see it's broken. We should add something like this:
>
>Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>---
>diff --git a/kernel/timer.c b/kernel/timer.c
>index cc34e42..a03164a 100644
>--- a/kernel/timer.c
>+++ b/kernel/timer.c
>@@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
> 		__run_timers(base);
> }
> 
>+#ifdef CONFIG_SMP
>+#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
>+#else
>+#define timer_should_raise_softirq(lock)	1
>+#endif
>+
> /*
>  * Called by the local, per-CPU timer interrupt on SMP.
>  */
>@@ -1467,7 +1473,7 @@ void run_local_timers(void)
> 		return;
> 	}
> 
>-	if (!spin_do_trylock(&base->lock)) {
>+	if (timer_should_raise_softirq(&base->lock)) {
> 		raise_softirq(TIMER_SOFTIRQ);
> 		return;
> 	}

Okay. So Peter said that it is okay to apply this since FULL_NO_HZ users
wouldn't complain on UP. I still wouldn't say it is broken but that is a
different story.
We have two users of this trylock. run_local_timers() which pops up
quite often (and you patched here) and the other is
get_next_timer_interrupt(). What do you suggest we do here? It is
basically the same thing.

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-05-02 18:38                 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-05-02 18:38 UTC (permalink / raw)
  To: linux-arm-kernel

* Steven Rostedt | 2014-04-22 14:16:50 [-0400]:

>On Tue, 22 Apr 2014 13:48:02 -0400
>Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> I need to take a deeper look into the actual code. But as trylocks on
>> UP are nops (always succeed), and if it expects to be able to do
>> something in a critical section that is protected by spinlocks (again
>> nops on UP), this would be broken for UP.
>
>Reading the code, I see it's broken. We should add something like this:
>
>Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>---
>diff --git a/kernel/timer.c b/kernel/timer.c
>index cc34e42..a03164a 100644
>--- a/kernel/timer.c
>+++ b/kernel/timer.c
>@@ -1447,6 +1447,12 @@ static void run_timer_softirq(struct softirq_action *h)
> 		__run_timers(base);
> }
> 
>+#ifdef CONFIG_SMP
>+#define timer_should_raise_softirq(lock)	!spin_do_trylock(lock)
>+#else
>+#define timer_should_raise_softirq(lock)	1
>+#endif
>+
> /*
>  * Called by the local, per-CPU timer interrupt on SMP.
>  */
>@@ -1467,7 +1473,7 @@ void run_local_timers(void)
> 		return;
> 	}
> 
>-	if (!spin_do_trylock(&base->lock)) {
>+	if (timer_should_raise_softirq(&base->lock)) {
> 		raise_softirq(TIMER_SOFTIRQ);
> 		return;
> 	}

Okay. So Peter said that it is okay to apply this since FULL_NO_HZ users
wouldn't complain on UP. I still wouldn't say it is broken but that is a
different story.
We have two users of this trylock. run_local_timers() which pops up
quite often (and you patched here) and the other is
get_next_timer_interrupt(). What do you suggest we do here? It is
basically the same thing.

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-05-02 18:38                 ` Sebastian Andrzej Siewior
@ 2014-05-02 19:01                   ` Thomas Gleixner
  -1 siblings, 0 replies; 48+ messages in thread
From: Thomas Gleixner @ 2014-05-02 19:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, Stanislav Meduna, linux-rt-users,
	Linux ARM Kernel, linux-kernel, Peter Zijlstra

On Fri, 2 May 2014, Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2014-04-22 14:16:50 [-0400]:
> > /*
> >  * Called by the local, per-CPU timer interrupt on SMP.
> >  */
> >@@ -1467,7 +1473,7 @@ void run_local_timers(void)
> > 		return;
> > 	}
> > 
> >-	if (!spin_do_trylock(&base->lock)) {
> >+	if (timer_should_raise_softirq(&base->lock)) {
> > 		raise_softirq(TIMER_SOFTIRQ);
> > 		return;
> > 	}
> 
> Okay. So Peter said that it is okay to apply this since FULL_NO_HZ users
> wouldn't complain on UP. I still wouldn't say it is broken but that is a
> different story.
> We have two users of this trylock. run_local_timers() which pops up
> quite often (and you patched here) and the other is
> get_next_timer_interrupt(). What do you suggest we do here? It is
> basically the same thing.

It's different as it CANNOT fail on UP. That's called from the idle
code and there is no way that anything holds that lock on UP when idle
runs.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-05-02 19:01                   ` Thomas Gleixner
  0 siblings, 0 replies; 48+ messages in thread
From: Thomas Gleixner @ 2014-05-02 19:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2 May 2014, Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2014-04-22 14:16:50 [-0400]:
> > /*
> >  * Called by the local, per-CPU timer interrupt on SMP.
> >  */
> >@@ -1467,7 +1473,7 @@ void run_local_timers(void)
> > 		return;
> > 	}
> > 
> >-	if (!spin_do_trylock(&base->lock)) {
> >+	if (timer_should_raise_softirq(&base->lock)) {
> > 		raise_softirq(TIMER_SOFTIRQ);
> > 		return;
> > 	}
> 
> Okay. So Peter said that it is okay to apply this since FULL_NO_HZ users
> wouldn't complain on UP. I still wouldn't say it is broken but that is a
> different story.
> We have two users of this trylock. run_local_timers() which pops up
> quite often (and you patched here) and the other is
> get_next_timer_interrupt(). What do you suggest we do here? It is
> basically the same thing.

It's different as it CANNOT fail on UP. That's called from the idle
code and there is no way that anything holds that lock on UP when idle
runs.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-05-02 19:01                   ` Thomas Gleixner
@ 2014-05-02 19:36                     ` Thomas Gleixner
  -1 siblings, 0 replies; 48+ messages in thread
From: Thomas Gleixner @ 2014-05-02 19:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, Stanislav Meduna, linux-rt-users,
	Linux ARM Kernel, linux-kernel, Peter Zijlstra

On Fri, 2 May 2014, Thomas Gleixner wrote:
> On Fri, 2 May 2014, Sebastian Andrzej Siewior wrote:
> > * Steven Rostedt | 2014-04-22 14:16:50 [-0400]:
> > > /*
> > >  * Called by the local, per-CPU timer interrupt on SMP.
> > >  */
> > >@@ -1467,7 +1473,7 @@ void run_local_timers(void)
> > > 		return;
> > > 	}
> > > 
> > >-	if (!spin_do_trylock(&base->lock)) {
> > >+	if (timer_should_raise_softirq(&base->lock)) {
> > > 		raise_softirq(TIMER_SOFTIRQ);
> > > 		return;
> > > 	}
> > 
> > Okay. So Peter said that it is okay to apply this since FULL_NO_HZ users
> > wouldn't complain on UP. I still wouldn't say it is broken but that is a
> > different story.
> > We have two users of this trylock. run_local_timers() which pops up
> > quite often (and you patched here) and the other is
> > get_next_timer_interrupt(). What do you suggest we do here? It is
> > basically the same thing.
> 
> It's different as it CANNOT fail on UP. That's called from the idle
> code and there is no way that anything holds that lock on UP when idle
> runs.

So yeah, you are right, that it's called from irq_exit() so it needs
an annotation at least. Maybe it's really cleaner to make it #if SMP
as well just for clarity raisins.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-05-02 19:36                     ` Thomas Gleixner
  0 siblings, 0 replies; 48+ messages in thread
From: Thomas Gleixner @ 2014-05-02 19:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2 May 2014, Thomas Gleixner wrote:
> On Fri, 2 May 2014, Sebastian Andrzej Siewior wrote:
> > * Steven Rostedt | 2014-04-22 14:16:50 [-0400]:
> > > /*
> > >  * Called by the local, per-CPU timer interrupt on SMP.
> > >  */
> > >@@ -1467,7 +1473,7 @@ void run_local_timers(void)
> > > 		return;
> > > 	}
> > > 
> > >-	if (!spin_do_trylock(&base->lock)) {
> > >+	if (timer_should_raise_softirq(&base->lock)) {
> > > 		raise_softirq(TIMER_SOFTIRQ);
> > > 		return;
> > > 	}
> > 
> > Okay. So Peter said that it is okay to apply this since FULL_NO_HZ users
> > wouldn't complain on UP. I still wouldn't say it is broken but that is a
> > different story.
> > We have two users of this trylock. run_local_timers() which pops up
> > quite often (and you patched here) and the other is
> > get_next_timer_interrupt(). What do you suggest we do here? It is
> > basically the same thing.
> 
> It's different as it CANNOT fail on UP. That's called from the idle
> code and there is no way that anything holds that lock on UP when idle
> runs.

So yeah, you are right, that it's called from irq_exit() so it needs
an annotation at least. Maybe it's really cleaner to make it #if SMP
as well just for clarity raisins.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
  2014-05-02 19:01                   ` Thomas Gleixner
@ 2014-05-02 19:37                     ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-05-02 19:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, Stanislav Meduna, linux-rt-users,
	Linux ARM Kernel, linux-kernel, Peter Zijlstra

* Thomas Gleixner | 2014-05-02 21:01:50 [+0200]:

>It's different as it CANNOT fail on UP. That's called from the idle
>code and there is no way that anything holds that lock on UP when idle
>runs.

Okay, so I will add the patch here. The same thing (mostly) but it will
also skip the irq_work_needs_cpu() check since we will run the timer
softirq anyway.

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Fri, 2 May 2014 21:31:50 +0200
Subject: [PATCH] timer: do not spin_trylock() on UP

This will void a warning comming from the spin-lock debugging code. The
lock avoiding idea is from Steven Rostedt.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/timer.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/timer.c b/kernel/timer.c
index 54596b5..8750875 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1455,18 +1455,31 @@ void run_local_timers(void)
 	struct tvec_base *base = __this_cpu_read(tvec_bases);
 
 	hrtimer_run_queues();
 	/*
 	 * We can access this lockless as we are in the timer
 	 * interrupt. If there are no timers queued, nothing to do in
 	 * the timer softirq.
 	 */
 #ifdef CONFIG_PREEMPT_RT_FULL
+
+#ifndef CONFIG_SMP
+	/*
+	 * The spin_do_trylock() later may fail as the lock may be hold before
+	 * the interrupt arrived. The spin-lock debugging code will raise a
+	 * warning if the try_lock fails on UP. Since this is only an
+	 * optimization for the FULL_NO_HZ case (not to run the timer softirq on
+	 * an nohz_full CPU) we don't really care and shedule the softirq.
+	 */
+	raise_softirq(TIMER_SOFTIRQ);
+	return;
+#endif
+
 	/* On RT, irq work runs from softirq */
 	if (irq_work_needs_cpu()) {
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
 	}
 
 	if (!spin_do_trylock(&base->lock)) {
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
-- 
2.0.0.rc0

>Thanks,
>
>	tglx

Sebastian

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25
@ 2014-05-02 19:37                     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-05-02 19:37 UTC (permalink / raw)
  To: linux-arm-kernel

* Thomas Gleixner | 2014-05-02 21:01:50 [+0200]:

>It's different as it CANNOT fail on UP. That's called from the idle
>code and there is no way that anything holds that lock on UP when idle
>runs.

Okay, so I will add the patch here. The same thing (mostly) but it will
also skip the irq_work_needs_cpu() check since we will run the timer
softirq anyway.

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Fri, 2 May 2014 21:31:50 +0200
Subject: [PATCH] timer: do not spin_trylock() on UP

This will void a warning comming from the spin-lock debugging code. The
lock avoiding idea is from Steven Rostedt.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/timer.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/timer.c b/kernel/timer.c
index 54596b5..8750875 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1455,18 +1455,31 @@ void run_local_timers(void)
 	struct tvec_base *base = __this_cpu_read(tvec_bases);
 
 	hrtimer_run_queues();
 	/*
 	 * We can access this lockless as we are in the timer
 	 * interrupt. If there are no timers queued, nothing to do in
 	 * the timer softirq.
 	 */
 #ifdef CONFIG_PREEMPT_RT_FULL
+
+#ifndef CONFIG_SMP
+	/*
+	 * The spin_do_trylock() later may fail as the lock may be hold before
+	 * the interrupt arrived. The spin-lock debugging code will raise a
+	 * warning if the try_lock fails on UP. Since this is only an
+	 * optimization for the FULL_NO_HZ case (not to run the timer softirq on
+	 * an nohz_full CPU) we don't really care and shedule the softirq.
+	 */
+	raise_softirq(TIMER_SOFTIRQ);
+	return;
+#endif
+
 	/* On RT, irq work runs from softirq */
 	if (irq_work_needs_cpu()) {
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
 	}
 
 	if (!spin_do_trylock(&base->lock)) {
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
-- 
2.0.0.rc0

>Thanks,
>
>	tglx

Sebastian

^ permalink raw reply related	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2014-05-02 19:37 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-14 19:24 BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25 Stanislav Meduna
2014-04-14 23:45 ` Stanislav Meduna
2014-04-14 23:45   ` Stanislav Meduna
2014-04-15 22:08   ` Stanislav Meduna
2014-04-15 22:08     ` Stanislav Meduna
2014-04-17 16:29     ` BUG: spinlock trylock failure on UP - reverting timer patches helps Stanislav Meduna
2014-04-17 16:29       ` Stanislav Meduna
2014-04-17 22:55       ` Stanislav Meduna
2014-04-17 22:55         ` Stanislav Meduna
2014-04-17 23:10         ` jordan
2014-04-17 23:10           ` jordan
2014-04-18  1:15           ` jordan
2014-04-18  1:15             ` jordan
2014-04-18  1:15             ` jordan
2014-04-18 14:45           ` Stanislav Meduna
2014-04-18 14:45             ` Stanislav Meduna
2014-04-18 15:09             ` jordan
2014-04-18 15:09               ` jordan
2014-04-22 11:54     ` BUG: spinlock trylock failure on UP, i.MX28 3.12.15-rt25 Sebastian Andrzej Siewior
2014-04-22 11:54       ` Sebastian Andrzej Siewior
2014-04-22 13:46       ` Steven Rostedt
2014-04-22 13:46         ` Steven Rostedt
2014-04-22 14:09         ` Peter Zijlstra
2014-04-22 14:09           ` Peter Zijlstra
2014-04-23  8:10           ` Sebastian Andrzej Siewior
2014-04-23  8:10             ` Sebastian Andrzej Siewior
2014-04-23  8:50             ` Peter Zijlstra
2014-04-23  8:50               ` Peter Zijlstra
2014-04-22 16:51         ` Sebastian Andrzej Siewior
2014-04-22 16:51           ` Sebastian Andrzej Siewior
2014-04-22 17:48           ` Steven Rostedt
2014-04-22 17:48             ` Steven Rostedt
2014-04-22 18:16             ` Steven Rostedt
2014-04-22 18:16               ` Steven Rostedt
2014-04-23  7:14               ` Sebastian Andrzej Siewior
2014-04-23  7:14                 ` Sebastian Andrzej Siewior
2014-04-23  8:49                 ` Peter Zijlstra
2014-04-23  8:49                   ` Peter Zijlstra
2014-04-23 12:15                 ` Steven Rostedt
2014-04-23 12:15                   ` Steven Rostedt
2014-05-02 18:38               ` Sebastian Andrzej Siewior
2014-05-02 18:38                 ` Sebastian Andrzej Siewior
2014-05-02 19:01                 ` Thomas Gleixner
2014-05-02 19:01                   ` Thomas Gleixner
2014-05-02 19:36                   ` Thomas Gleixner
2014-05-02 19:36                     ` Thomas Gleixner
2014-05-02 19:37                   ` Sebastian Andrzej Siewior
2014-05-02 19:37                     ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.