linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* mx6: suspicious RCU usage
@ 2015-07-14 15:54 Fabio Estevam
  2015-07-14 16:00 ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Fabio Estevam @ 2015-07-14 15:54 UTC (permalink / raw)
  To: Russell King
  Cc: linux-arm-kernel, Peter Zijlstra, Paul McKenney, rostedt, linux-kernel

Hi,

I am running 4.2-rc2 on a mx6q board and I see the following warning
when doing a suspend/resume cycle:

$ echo mem > /sys/power/state

PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.003 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.
Suspending console(s) (use no_console_suspend to debug)

(Press GPIO button to wake up the system)

PM: suspend of devices complete after 101.667 msecs
PM: suspend devices took 0.110 seconds
PM: late suspend of devices complete after 11.073 msecs
PM: noirq suspend of devices complete after 9.432 msecs
Disabling non-boot CPUs ...

===============================
[ INFO: suspicious RCU usage. ]
4.2.0-rc2 #247 Not tainted
-------------------------------
kernel/sched/fair.c:5032 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 0
3 locks held by swapper/1/0:
 #0:  ((cpu_died).wait.lock){......}, at: [<800643e0>] complete+0x1c/0x4c
 #1:  (&p->pi_lock){-.-.-.}, at: [<8004f7e8>] try_to_wake_up+0x34/0x3c8
 #2:  (rcu_read_lock){......}, at: [<80057b68>] select_task_rq_fair+0x64/0x9e8

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2 #247
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<80012ee8>] (dump_backtrace) from [<80013084>] (show_stack+0x18/0x1c)
 r6:000013a8 r5:00000000 r4:00000000 r3:00000000
[<8001306c>] (show_stack) from [<8078d364>] (dump_stack+0x88/0xa4)
[<8078d2dc>] (dump_stack) from [<800676d8>] (lockdep_rcu_suspicious+0xbc/0x11c)
 r5:809497c4 r4:be076780
[<8006761c>] (lockdep_rcu_suspicious) from [<80058200>]
(select_task_rq_fair+0x6fc/0x9e8)
 r7:80a94fc0 r6:00000001 r5:00000000 r4:00000000
[<80057b04>] (select_task_rq_fair) from [<8004f8cc>]
(try_to_wake_up+0x118/0x3c8)
 r10:80a94fc0 r9:00000000 r8:00000000 r7:80000093 r6:80a98b2c r5:bd748f8c
 r4:bd748b80
[<8004f7b4>] (try_to_wake_up) from [<8004fb90>]
(default_wake_function+0x14/0x18)
 r10:00000003 r9:8004fb7c r8:00000000 r7:00000000 r6:80a9dc34 r5:00000001
 r4:80a9dc28
[<8004fb7c>] (default_wake_function) from [<80063b18>]
(__wake_up_common+0x58/0x98)
[<80063ac0>] (__wake_up_common) from [<80063b74>] (__wake_up_locked+0x1c/0x24)
 r10:80a98b2c r9:807993ac r8:80a98a10 r7:80afc552 r6:60000093 r5:80a9dc10
 r4:80a9dc14
[<80063b58>] (__wake_up_locked) from [<80064400>] (complete+0x3c/0x4c)
[<800643c4>] (complete) from [<807870e0>] (cpu_die+0x3c/0xa4)
 r6:80a923e4 r5:00000001 r4:80a98968 r3:00000002
[<807870a4>] (cpu_die) from [<80010798>] (arch_cpu_idle_dead+0x10/0x14)
 r5:80a989c4 r4:00000000
[<80010788>] (arch_cpu_idle_dead) from [<80064798>]
(cpu_startup_entry+0x1d8/0x200)
[<800645c0>] (cpu_startup_entry) from [<80015e34>]
(secondary_start_kernel+0x120/0x13c)
 r7:80aff360
[<80015d14>] (secondary_start_kernel) from [<1000962c>] (0x1000962c)
 r5:00000015 r4:4e08806a
CPU1: shutdown
CPU2: shutdown
CPU3: shutdown
Enabling non-boot CPUs ...
CPU1 is up
CPU2 is up
CPU3 is up
PM: noirq resume of devices complete after 2.185 msecs
PM: early resume of devices complete after 2.786 msecs
PM: resume of devices complete after 204.223 msecs
PM: resume devices took 0.200 seconds
ata1: SATA link down (SStatus 0 SControl 300)
PM: Finishing wakeup.
Restarting tasks ... done.

I haven't started bisecting it yet, but if someone has some ideas,
please let me know.

Regards,

Fabio Estevam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 15:54 mx6: suspicious RCU usage Fabio Estevam
@ 2015-07-14 16:00 ` Paul E. McKenney
  2015-07-14 16:06   ` Fabio Estevam
  2015-07-14 16:08   ` Paul E. McKenney
  0 siblings, 2 replies; 11+ messages in thread
From: Paul E. McKenney @ 2015-07-14 16:00 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Russell King, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

On Tue, Jul 14, 2015 at 12:54:15PM -0300, Fabio Estevam wrote:
> Hi,
> 
> I am running 4.2-rc2 on a mx6q board and I see the following warning
> when doing a suspend/resume cycle:
> 
> $ echo mem > /sys/power/state
> 
> PM: Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.003 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.
> Suspending console(s) (use no_console_suspend to debug)
> 
> (Press GPIO button to wake up the system)
> 
> PM: suspend of devices complete after 101.667 msecs
> PM: suspend devices took 0.110 seconds
> PM: late suspend of devices complete after 11.073 msecs
> PM: noirq suspend of devices complete after 9.432 msecs
> Disabling non-boot CPUs ...

Does this patch help?

	https://lkml.org/lkml/2015/5/12/885

							Thanx, Paul

> ===============================
> [ INFO: suspicious RCU usage. ]
> 4.2.0-rc2 #247 Not tainted
> -------------------------------
> kernel/sched/fair.c:5032 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> 
> RCU used illegally from offline CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> 3 locks held by swapper/1/0:
>  #0:  ((cpu_died).wait.lock){......}, at: [<800643e0>] complete+0x1c/0x4c
>  #1:  (&p->pi_lock){-.-.-.}, at: [<8004f7e8>] try_to_wake_up+0x34/0x3c8
>  #2:  (rcu_read_lock){......}, at: [<80057b68>] select_task_rq_fair+0x64/0x9e8
> 
> stack backtrace:
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2 #247
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Backtrace:
> [<80012ee8>] (dump_backtrace) from [<80013084>] (show_stack+0x18/0x1c)
>  r6:000013a8 r5:00000000 r4:00000000 r3:00000000
> [<8001306c>] (show_stack) from [<8078d364>] (dump_stack+0x88/0xa4)
> [<8078d2dc>] (dump_stack) from [<800676d8>] (lockdep_rcu_suspicious+0xbc/0x11c)
>  r5:809497c4 r4:be076780
> [<8006761c>] (lockdep_rcu_suspicious) from [<80058200>]
> (select_task_rq_fair+0x6fc/0x9e8)
>  r7:80a94fc0 r6:00000001 r5:00000000 r4:00000000
> [<80057b04>] (select_task_rq_fair) from [<8004f8cc>]
> (try_to_wake_up+0x118/0x3c8)
>  r10:80a94fc0 r9:00000000 r8:00000000 r7:80000093 r6:80a98b2c r5:bd748f8c
>  r4:bd748b80
> [<8004f7b4>] (try_to_wake_up) from [<8004fb90>]
> (default_wake_function+0x14/0x18)
>  r10:00000003 r9:8004fb7c r8:00000000 r7:00000000 r6:80a9dc34 r5:00000001
>  r4:80a9dc28
> [<8004fb7c>] (default_wake_function) from [<80063b18>]
> (__wake_up_common+0x58/0x98)
> [<80063ac0>] (__wake_up_common) from [<80063b74>] (__wake_up_locked+0x1c/0x24)
>  r10:80a98b2c r9:807993ac r8:80a98a10 r7:80afc552 r6:60000093 r5:80a9dc10
>  r4:80a9dc14
> [<80063b58>] (__wake_up_locked) from [<80064400>] (complete+0x3c/0x4c)
> [<800643c4>] (complete) from [<807870e0>] (cpu_die+0x3c/0xa4)
>  r6:80a923e4 r5:00000001 r4:80a98968 r3:00000002
> [<807870a4>] (cpu_die) from [<80010798>] (arch_cpu_idle_dead+0x10/0x14)
>  r5:80a989c4 r4:00000000
> [<80010788>] (arch_cpu_idle_dead) from [<80064798>]
> (cpu_startup_entry+0x1d8/0x200)
> [<800645c0>] (cpu_startup_entry) from [<80015e34>]
> (secondary_start_kernel+0x120/0x13c)
>  r7:80aff360
> [<80015d14>] (secondary_start_kernel) from [<1000962c>] (0x1000962c)
>  r5:00000015 r4:4e08806a
> CPU1: shutdown
> CPU2: shutdown
> CPU3: shutdown
> Enabling non-boot CPUs ...
> CPU1 is up
> CPU2 is up
> CPU3 is up
> PM: noirq resume of devices complete after 2.185 msecs
> PM: early resume of devices complete after 2.786 msecs
> PM: resume of devices complete after 204.223 msecs
> PM: resume devices took 0.200 seconds
> ata1: SATA link down (SStatus 0 SControl 300)
> PM: Finishing wakeup.
> Restarting tasks ... done.
> 
> I haven't started bisecting it yet, but if someone has some ideas,
> please let me know.
> 
> Regards,
> 
> Fabio Estevam
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 16:00 ` Paul E. McKenney
@ 2015-07-14 16:06   ` Fabio Estevam
  2015-07-14 16:26     ` Paul E. McKenney
  2015-07-14 16:08   ` Paul E. McKenney
  1 sibling, 1 reply; 11+ messages in thread
From: Fabio Estevam @ 2015-07-14 16:06 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Russell King, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

Hi Paul,

On Tue, Jul 14, 2015 at 1:00 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:

> Does this patch help?
>
>         https://lkml.org/lkml/2015/5/12/885

I am using an ARM 32-bit machine, so I used this one instead:
https://lkml.org/lkml/2015/5/12/892

, and it fixes the problem. Thanks!

Feel free to add:

Tested-by: Fabio Estevam <fabio.estevam@freescale.com>

Thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 16:00 ` Paul E. McKenney
  2015-07-14 16:06   ` Fabio Estevam
@ 2015-07-14 16:08   ` Paul E. McKenney
  1 sibling, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2015-07-14 16:08 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Russell King, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

On Tue, Jul 14, 2015 at 09:00:06AM -0700, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 12:54:15PM -0300, Fabio Estevam wrote:
> > Hi,
> > 
> > I am running 4.2-rc2 on a mx6q board and I see the following warning
> > when doing a suspend/resume cycle:
> > 
> > $ echo mem > /sys/power/state
> > 
> > PM: Syncing filesystems ... done.
> > Freezing user space processes ... (elapsed 0.003 seconds) done.
> > Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.
> > Suspending console(s) (use no_console_suspend to debug)
> > 
> > (Press GPIO button to wake up the system)
> > 
> > PM: suspend of devices complete after 101.667 msecs
> > PM: suspend devices took 0.110 seconds
> > PM: late suspend of devices complete after 11.073 msecs
> > PM: noirq suspend of devices complete after 9.432 msecs
> > Disabling non-boot CPUs ...
> 
> Does this patch help?
> 
> 	https://lkml.org/lkml/2015/5/12/885

Or this one, if this is ARM rather than ARM64:

	https://lkml.org/lkml/2015/5/12/892

							Thanx, Paul

> > ===============================
> > [ INFO: suspicious RCU usage. ]
> > 4.2.0-rc2 #247 Not tainted
> > -------------------------------
> > kernel/sched/fair.c:5032 suspicious rcu_dereference_check() usage!
> > 
> > other info that might help us debug this:
> > 
> > 
> > RCU used illegally from offline CPU!
> > rcu_scheduler_active = 1, debug_locks = 0
> > 3 locks held by swapper/1/0:
> >  #0:  ((cpu_died).wait.lock){......}, at: [<800643e0>] complete+0x1c/0x4c
> >  #1:  (&p->pi_lock){-.-.-.}, at: [<8004f7e8>] try_to_wake_up+0x34/0x3c8
> >  #2:  (rcu_read_lock){......}, at: [<80057b68>] select_task_rq_fair+0x64/0x9e8
> > 
> > stack backtrace:
> > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2 #247
> > Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > Backtrace:
> > [<80012ee8>] (dump_backtrace) from [<80013084>] (show_stack+0x18/0x1c)
> >  r6:000013a8 r5:00000000 r4:00000000 r3:00000000
> > [<8001306c>] (show_stack) from [<8078d364>] (dump_stack+0x88/0xa4)
> > [<8078d2dc>] (dump_stack) from [<800676d8>] (lockdep_rcu_suspicious+0xbc/0x11c)
> >  r5:809497c4 r4:be076780
> > [<8006761c>] (lockdep_rcu_suspicious) from [<80058200>]
> > (select_task_rq_fair+0x6fc/0x9e8)
> >  r7:80a94fc0 r6:00000001 r5:00000000 r4:00000000
> > [<80057b04>] (select_task_rq_fair) from [<8004f8cc>]
> > (try_to_wake_up+0x118/0x3c8)
> >  r10:80a94fc0 r9:00000000 r8:00000000 r7:80000093 r6:80a98b2c r5:bd748f8c
> >  r4:bd748b80
> > [<8004f7b4>] (try_to_wake_up) from [<8004fb90>]
> > (default_wake_function+0x14/0x18)
> >  r10:00000003 r9:8004fb7c r8:00000000 r7:00000000 r6:80a9dc34 r5:00000001
> >  r4:80a9dc28
> > [<8004fb7c>] (default_wake_function) from [<80063b18>]
> > (__wake_up_common+0x58/0x98)
> > [<80063ac0>] (__wake_up_common) from [<80063b74>] (__wake_up_locked+0x1c/0x24)
> >  r10:80a98b2c r9:807993ac r8:80a98a10 r7:80afc552 r6:60000093 r5:80a9dc10
> >  r4:80a9dc14
> > [<80063b58>] (__wake_up_locked) from [<80064400>] (complete+0x3c/0x4c)
> > [<800643c4>] (complete) from [<807870e0>] (cpu_die+0x3c/0xa4)
> >  r6:80a923e4 r5:00000001 r4:80a98968 r3:00000002
> > [<807870a4>] (cpu_die) from [<80010798>] (arch_cpu_idle_dead+0x10/0x14)
> >  r5:80a989c4 r4:00000000
> > [<80010788>] (arch_cpu_idle_dead) from [<80064798>]
> > (cpu_startup_entry+0x1d8/0x200)
> > [<800645c0>] (cpu_startup_entry) from [<80015e34>]
> > (secondary_start_kernel+0x120/0x13c)
> >  r7:80aff360
> > [<80015d14>] (secondary_start_kernel) from [<1000962c>] (0x1000962c)
> >  r5:00000015 r4:4e08806a
> > CPU1: shutdown
> > CPU2: shutdown
> > CPU3: shutdown
> > Enabling non-boot CPUs ...
> > CPU1 is up
> > CPU2 is up
> > CPU3 is up
> > PM: noirq resume of devices complete after 2.185 msecs
> > PM: early resume of devices complete after 2.786 msecs
> > PM: resume of devices complete after 204.223 msecs
> > PM: resume devices took 0.200 seconds
> > ata1: SATA link down (SStatus 0 SControl 300)
> > PM: Finishing wakeup.
> > Restarting tasks ... done.
> > 
> > I haven't started bisecting it yet, but if someone has some ideas,
> > please let me know.
> > 
> > Regards,
> > 
> > Fabio Estevam
> > 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 16:06   ` Fabio Estevam
@ 2015-07-14 16:26     ` Paul E. McKenney
  2015-07-14 16:44       ` Fabio Estevam
  2015-07-14 20:49       ` Russell King - ARM Linux
  0 siblings, 2 replies; 11+ messages in thread
From: Paul E. McKenney @ 2015-07-14 16:26 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Russell King, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

On Tue, Jul 14, 2015 at 01:06:32PM -0300, Fabio Estevam wrote:
> Hi Paul,
> 
> On Tue, Jul 14, 2015 at 1:00 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Does this patch help?
> >
> >         https://lkml.org/lkml/2015/5/12/885
> 
> I am using an ARM 32-bit machine, so I used this one instead:
> https://lkml.org/lkml/2015/5/12/892
> 
> , and it fixes the problem. Thanks!
> 
> Feel free to add:
> 
> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>

Glad it helped!

Russell, did you want me to push this, or would you rather take it?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 16:26     ` Paul E. McKenney
@ 2015-07-14 16:44       ` Fabio Estevam
  2015-07-14 16:54         ` Paul E. McKenney
  2015-07-14 16:59         ` Fabio Estevam
  2015-07-14 20:49       ` Russell King - ARM Linux
  1 sibling, 2 replies; 11+ messages in thread
From: Fabio Estevam @ 2015-07-14 16:44 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Russell King, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

Hi Paul,

On Tue, Jul 14, 2015 at 1:26 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:

> Glad it helped!
>
> Russell, did you want me to push this, or would you rather take it?

With your patch applied I am running several suspend/resume cycles and
I am seeing different problems now from time to time (like 1 out 10
times). The kernel dump does not look like the same every time, but
here is one example:

root@freescale /$ echo mem > /sys/power/state
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.001 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
PM: suspend of devices complete after 36.922 msecs
PM: suspend devices took 0.050 seconds
PM: late suspend of devices complete after 8.133 msecs
PM: noirq suspend of devices complete after 8.036 msecs
Disabling non-boot CPUs ...
CPU1: shutdown
CPU2: shutdown
CPU3: shutdown
Enabling non-boot CPUs ...
CPU1 is up
Unable to handle kernel paging request at virtual address 663bf420
pgd = bd5cc000
[663bf420] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 968 Comm: sh Not tainted 4.2.0-rc2-dirty #249
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
task: bdb78000 ti: bd614000 task.ti: bd614000
PC is at rcu_is_watching+0x28/0x3c
LR is at pick_next_task_fair+0x50c/0x774
pc : [<8007e9b0>]    lr : [<8005d554>]    psr: 600001d3
sp : bd615bd8  ip : bd615be8  fp : bd615be4
r10: be79cfc0  r9 : be79cfd0  r8 : ffff953a
r7 : 00000000  r6 : 0007a11f  r5 : 00000000  r4 : 80ad7cfe
r3 : 80a6e40c  r2 : e5951014  r1 : 00000000  r0 : 00000001
Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 4d5cc04a  DAC: 00000015
Process sh (pid: 968, stack limit = 0xbd614210)
Stack: (0xbd615bd8 to 0xbd616000)
5bc0:                                                       bd615c54 bd615be8
5be0: 8005d554 8007e994 00000000 00000000 8005d240 00002627 bd615c24 bdb78000
5c00: 00000001 bdb78000 80a74b2c 80a70fc0 be79cfc0 00000000 bd615c44 bd615c28
5c20: 8004e324 80059bf8 ffffffec bdb78000 be79cfd0 bdb78290 80a74b2c 00000000
5c40: be79cfc0 80a70fc0 bd615c9c bd615c58 80774b40 8005d054 bd615c84 bd615c68
5c60: 8006b864 8006b6d8 600001d3 807751cc be79a440 bd615cb4 be79a440 be79a440
5c80: ffff953a 80a74100 00000010 00000000 bd615cac bd615ca0 807751cc 80774a14
5ca0: bd615d04 bd615cb0 80779408 807751a0 40000153 00000000 be79a564 ffff953a
5cc0: 80084624 bdb78000 00000000 ffffffff 812cbdcc 00000000 00000000 8092ee88
5ce0: bd615d04 00000064 80a79be8 80a79be4 00000002 00000000 bd615d44 bd615d08
5d00: 80775b5c 807792dc c0808000 00000001 bdb78000 8004fb7c 80a79c08 80a79c08
5d20: 00000000 00000002 be077300 80adb360 00000000 be077300 bd615d54 bd615d48
5d40: 80775c20 80775aa0 bd615d74 bd615d58 80015af0 80775c18 00000002 00000000
5d60: 00000002 80a749c4 bd615dac bd615d78 8002b728 80015a48 bd615d9c 0000002a
5d80: 8076f64c 00000002 80adbad0 80a74a7c 80a74a18 80a75429 00000000 00000000
5da0: bd615dcc bd615db0 8076c090 8002b668 20000153 00000000 8128869c 00000003
5dc0: bd615e14 bd615dd0 8007172c 8076c02c 80075040 80074a84 8092d4f0 bd615e0c
5de0: bd615e04 bd615df0 8076f64c 00000000 00000003 812886a8 00000003 bd86dec0
5e00: 00000000 00000004 bd615e34 bd615e18 80071b08 80071420 0000006d 00000003
5e20: 80929290 812886ac bd615e5c bd615e38 80070420 800718a4 bd6f3500 00000004
5e40: bd86dec0 00000004 00000000 00a0f408 bd615e6c bd615e60 802db104 800703bc
5e60: bd615e8c bd615e70 8016a4a0 802db0f4 8016a44c bd6f3500 bd86dec0 bd6f350c
5e80: bd615ed4 bd615e90 801697f0 8016a458 00000000 00000000 80a74ca0 800fe580
5ea0: bd615ee4 bd615f78 8006ae9c 80783d34 00a0f408 bdb1e780 bd615f78 00000004
5ec0: bd614000 00a0f408 bd615f44 bd615ed8 800fd4b0 80169734 bd615f44 bd615ee8
5ee0: 800ff8a8 802fdcf4 00000020 00000000 800fe580 bd615f00 00000000 bdb802e4
5f00: bdb1e780 00000000 bd615f34 bd615f18 800669c4 8006665c bdb78000 bdb1e780
5f20: 00a0f408 bdb1e780 00a0f408 00000004 bd615f78 00000004 bd615f74 bd615f48
5f40: 800fe4e0 800fd488 8011a848 8011a7ac 00000000 00000000 bdb1e780 bdb1e780
5f60: 00000004 00a0f408 bd615fa4 bd615f78 800fe778 800fe454 00000000 00000000
5f80: 00000004 76f155c0 00000004 00000004 8000fd44 00000000 00000000 bd615fa8
5fa0: 8000fb60 800fe740 00000004 76f155c0 00000001 00a0f408 00000004 00000000
5fc0: 00000004 76f155c0 00000004 00000004 00a0f408 00a0d318 7ee5aae8 00000000
5fe0: 00000000 7ee5aa68 76e51690 76ea40bc 60000010 00000001 00000000 00000000
Backtrace:
[<8007e988>] (rcu_is_watching) from [<8005d554>]
(pick_next_task_fair+0x50c/0x774)
[<8005d048>] (pick_next_task_fair) from [<80774b40>] (__schedule+0x138/0x6c4)
 r10:80a70fc0 r9:be79cfc0 r8:00000000 r7:80a74b2c r6:bdb78290 r5:be79cfd0
 r4:bdb78000
[<80774a08>] (__schedule) from [<807751cc>] (schedule+0x38/0x9c)
 r10:00000000 r9:00000010 r8:80a74100 r7:ffff953a r6:be79a440 r5:be79a440
 r4:bd615cb4
[<80775194>] (schedule) from [<80779408>] (schedule_timeout+0x138/0x1cc)
[<807792d0>] (schedule_timeout) from [<80775b5c>] (wait_for_common+0xc8/0x15c)
 r8:00000000 r7:00000002 r6:80a79be4 r5:80a79be8 r4:00000064
[<80775a94>] (wait_for_common) from [<80775c20>]
(wait_for_completion_timeout+0x14/0x18)
 r10:be077300 r8:00000000 r7:80adb360 r6:be077300 r5:00000002 r4:00000000
[<80775c0c>] (wait_for_completion_timeout) from [<80015af0>]
(__cpu_up+0xb4/0x14c)
[<80015a3c>] (__cpu_up) from [<8002b728>] (_cpu_up+0xcc/0x148)
 r7:80a749c4 r6:00000002 r5:00000000 r4:00000002
[<8002b65c>] (_cpu_up) from [<8076c090>] (enable_nonboot_cpus+0x70/0xb8)
 r10:00000000 r9:00000000 r8:80a75429 r7:80a74a18 r6:80a74a7c r5:80adbad0
 r4:00000002
[<8076c020>] (enable_nonboot_cpus) from [<8007172c>]
(suspend_devices_and_enter+0x318/0x484)
 r6:00000003 r5:8128869c r4:00000000 r3:20000153
[<80071414>] (suspend_devices_and_enter) from [<80071b08>]
(pm_suspend+0x270/0x2a8)
 r10:00000004 r9:00000000 r8:bd86dec0 r7:00000003 r6:812886a8 r5:00000003
 r4:00000000
[<80071898>] (pm_suspend) from [<80070420>] (state_store+0x70/0xc0)
 r6:812886ac r5:80929290 r4:00000003 r3:0000006d
[<800703b0>] (state_store) from [<802db104>] (kobj_attr_store+0x1c/0x28)
 r10:00a0f408 r8:00000000 r7:00000004 r6:bd86dec0 r5:00000004 r4:bd6f3500
[<802db0e8>] (kobj_attr_store) from [<8016a4a0>] (sysfs_kf_write+0x54/0x58)
[<8016a44c>] (sysfs_kf_write) from [<801697f0>] (kernfs_fop_write+0xc8/0x1a8)
 r6:bd6f350c r5:bd86dec0 r4:bd6f3500 r3:8016a44c
[<80169728>] (kernfs_fop_write) from [<800fd4b0>] (__vfs_write+0x34/0xe0)
 r10:00a0f408 r9:bd614000 r8:00000004 r7:bd615f78 r6:bdb1e780 r5:00a0f408
 r4:80783d34
[<800fd47c>] (__vfs_write) from [<800fe4e0>] (vfs_write+0x98/0x154)
 r8:00000004 r7:bd615f78 r6:00000004 r5:00a0f408 r4:bdb1e780
[<800fe448>] (vfs_write) from [<800fe778>] (SyS_write+0x44/0x90)
 r10:00a0f408 r8:00000004 r7:bdb1e780 r6:bdb1e780 r5:00000000 r4:00000000
[<800fe734>] (SyS_write) from [<8000fb60>] (ret_fast_syscall+0x0/0x54)
 r10:00000000 r8:8000fd44 r7:00000004 r6:00000004 r5:76f155c0 r4:00000004
Code: e5d72014 e5811010 e55f2010 47922101 (f6130414)
---[ end trace 2d8bb88539a73fb8 ]---

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 16:44       ` Fabio Estevam
@ 2015-07-14 16:54         ` Paul E. McKenney
  2015-07-14 16:59         ` Fabio Estevam
  1 sibling, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2015-07-14 16:54 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Russell King, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

On Tue, Jul 14, 2015 at 01:44:15PM -0300, Fabio Estevam wrote:
> Hi Paul,
> 
> On Tue, Jul 14, 2015 at 1:26 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Glad it helped!
> >
> > Russell, did you want me to push this, or would you rather take it?
> 
> With your patch applied I am running several suspend/resume cycles and
> I am seeing different problems now from time to time (like 1 out 10
> times). The kernel dump does not look like the same every time, but
> here is one example:
> 
> root@freescale /$ echo mem > /sys/power/state
> PM: Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.001 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> PM: suspend of devices complete after 36.922 msecs
> PM: suspend devices took 0.050 seconds
> PM: late suspend of devices complete after 8.133 msecs
> PM: noirq suspend of devices complete after 8.036 msecs
> Disabling non-boot CPUs ...
> CPU1: shutdown
> CPU2: shutdown
> CPU3: shutdown
> Enabling non-boot CPUs ...
> CPU1 is up
> Unable to handle kernel paging request at virtual address 663bf420
> pgd = bd5cc000
> [663bf420] *pgd=00000000
> Internal error: Oops: 5 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 968 Comm: sh Not tainted 4.2.0-rc2-dirty #249
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> task: bdb78000 ti: bd614000 task.ti: bd614000
> PC is at rcu_is_watching+0x28/0x3c

Odd...  All rcu_is_watching does is to disable preemption, access a
statically allocated per-CPU variable, then re-enable preemption.

Is there a window of time when the incoming CPU doesn't have its
per-CPU variables mapped or something?

							Thanx, Paul

> LR is at pick_next_task_fair+0x50c/0x774
> pc : [<8007e9b0>]    lr : [<8005d554>]    psr: 600001d3
> sp : bd615bd8  ip : bd615be8  fp : bd615be4
> r10: be79cfc0  r9 : be79cfd0  r8 : ffff953a
> r7 : 00000000  r6 : 0007a11f  r5 : 00000000  r4 : 80ad7cfe
> r3 : 80a6e40c  r2 : e5951014  r1 : 00000000  r0 : 00000001
> Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment user
> Control: 10c5387d  Table: 4d5cc04a  DAC: 00000015
> Process sh (pid: 968, stack limit = 0xbd614210)
> Stack: (0xbd615bd8 to 0xbd616000)
> 5bc0:                                                       bd615c54 bd615be8
> 5be0: 8005d554 8007e994 00000000 00000000 8005d240 00002627 bd615c24 bdb78000
> 5c00: 00000001 bdb78000 80a74b2c 80a70fc0 be79cfc0 00000000 bd615c44 bd615c28
> 5c20: 8004e324 80059bf8 ffffffec bdb78000 be79cfd0 bdb78290 80a74b2c 00000000
> 5c40: be79cfc0 80a70fc0 bd615c9c bd615c58 80774b40 8005d054 bd615c84 bd615c68
> 5c60: 8006b864 8006b6d8 600001d3 807751cc be79a440 bd615cb4 be79a440 be79a440
> 5c80: ffff953a 80a74100 00000010 00000000 bd615cac bd615ca0 807751cc 80774a14
> 5ca0: bd615d04 bd615cb0 80779408 807751a0 40000153 00000000 be79a564 ffff953a
> 5cc0: 80084624 bdb78000 00000000 ffffffff 812cbdcc 00000000 00000000 8092ee88
> 5ce0: bd615d04 00000064 80a79be8 80a79be4 00000002 00000000 bd615d44 bd615d08
> 5d00: 80775b5c 807792dc c0808000 00000001 bdb78000 8004fb7c 80a79c08 80a79c08
> 5d20: 00000000 00000002 be077300 80adb360 00000000 be077300 bd615d54 bd615d48
> 5d40: 80775c20 80775aa0 bd615d74 bd615d58 80015af0 80775c18 00000002 00000000
> 5d60: 00000002 80a749c4 bd615dac bd615d78 8002b728 80015a48 bd615d9c 0000002a
> 5d80: 8076f64c 00000002 80adbad0 80a74a7c 80a74a18 80a75429 00000000 00000000
> 5da0: bd615dcc bd615db0 8076c090 8002b668 20000153 00000000 8128869c 00000003
> 5dc0: bd615e14 bd615dd0 8007172c 8076c02c 80075040 80074a84 8092d4f0 bd615e0c
> 5de0: bd615e04 bd615df0 8076f64c 00000000 00000003 812886a8 00000003 bd86dec0
> 5e00: 00000000 00000004 bd615e34 bd615e18 80071b08 80071420 0000006d 00000003
> 5e20: 80929290 812886ac bd615e5c bd615e38 80070420 800718a4 bd6f3500 00000004
> 5e40: bd86dec0 00000004 00000000 00a0f408 bd615e6c bd615e60 802db104 800703bc
> 5e60: bd615e8c bd615e70 8016a4a0 802db0f4 8016a44c bd6f3500 bd86dec0 bd6f350c
> 5e80: bd615ed4 bd615e90 801697f0 8016a458 00000000 00000000 80a74ca0 800fe580
> 5ea0: bd615ee4 bd615f78 8006ae9c 80783d34 00a0f408 bdb1e780 bd615f78 00000004
> 5ec0: bd614000 00a0f408 bd615f44 bd615ed8 800fd4b0 80169734 bd615f44 bd615ee8
> 5ee0: 800ff8a8 802fdcf4 00000020 00000000 800fe580 bd615f00 00000000 bdb802e4
> 5f00: bdb1e780 00000000 bd615f34 bd615f18 800669c4 8006665c bdb78000 bdb1e780
> 5f20: 00a0f408 bdb1e780 00a0f408 00000004 bd615f78 00000004 bd615f74 bd615f48
> 5f40: 800fe4e0 800fd488 8011a848 8011a7ac 00000000 00000000 bdb1e780 bdb1e780
> 5f60: 00000004 00a0f408 bd615fa4 bd615f78 800fe778 800fe454 00000000 00000000
> 5f80: 00000004 76f155c0 00000004 00000004 8000fd44 00000000 00000000 bd615fa8
> 5fa0: 8000fb60 800fe740 00000004 76f155c0 00000001 00a0f408 00000004 00000000
> 5fc0: 00000004 76f155c0 00000004 00000004 00a0f408 00a0d318 7ee5aae8 00000000
> 5fe0: 00000000 7ee5aa68 76e51690 76ea40bc 60000010 00000001 00000000 00000000
> Backtrace:
> [<8007e988>] (rcu_is_watching) from [<8005d554>]
> (pick_next_task_fair+0x50c/0x774)
> [<8005d048>] (pick_next_task_fair) from [<80774b40>] (__schedule+0x138/0x6c4)
>  r10:80a70fc0 r9:be79cfc0 r8:00000000 r7:80a74b2c r6:bdb78290 r5:be79cfd0
>  r4:bdb78000
> [<80774a08>] (__schedule) from [<807751cc>] (schedule+0x38/0x9c)
>  r10:00000000 r9:00000010 r8:80a74100 r7:ffff953a r6:be79a440 r5:be79a440
>  r4:bd615cb4
> [<80775194>] (schedule) from [<80779408>] (schedule_timeout+0x138/0x1cc)
> [<807792d0>] (schedule_timeout) from [<80775b5c>] (wait_for_common+0xc8/0x15c)
>  r8:00000000 r7:00000002 r6:80a79be4 r5:80a79be8 r4:00000064
> [<80775a94>] (wait_for_common) from [<80775c20>]
> (wait_for_completion_timeout+0x14/0x18)
>  r10:be077300 r8:00000000 r7:80adb360 r6:be077300 r5:00000002 r4:00000000
> [<80775c0c>] (wait_for_completion_timeout) from [<80015af0>]
> (__cpu_up+0xb4/0x14c)
> [<80015a3c>] (__cpu_up) from [<8002b728>] (_cpu_up+0xcc/0x148)
>  r7:80a749c4 r6:00000002 r5:00000000 r4:00000002
> [<8002b65c>] (_cpu_up) from [<8076c090>] (enable_nonboot_cpus+0x70/0xb8)
>  r10:00000000 r9:00000000 r8:80a75429 r7:80a74a18 r6:80a74a7c r5:80adbad0
>  r4:00000002
> [<8076c020>] (enable_nonboot_cpus) from [<8007172c>]
> (suspend_devices_and_enter+0x318/0x484)
>  r6:00000003 r5:8128869c r4:00000000 r3:20000153
> [<80071414>] (suspend_devices_and_enter) from [<80071b08>]
> (pm_suspend+0x270/0x2a8)
>  r10:00000004 r9:00000000 r8:bd86dec0 r7:00000003 r6:812886a8 r5:00000003
>  r4:00000000
> [<80071898>] (pm_suspend) from [<80070420>] (state_store+0x70/0xc0)
>  r6:812886ac r5:80929290 r4:00000003 r3:0000006d
> [<800703b0>] (state_store) from [<802db104>] (kobj_attr_store+0x1c/0x28)
>  r10:00a0f408 r8:00000000 r7:00000004 r6:bd86dec0 r5:00000004 r4:bd6f3500
> [<802db0e8>] (kobj_attr_store) from [<8016a4a0>] (sysfs_kf_write+0x54/0x58)
> [<8016a44c>] (sysfs_kf_write) from [<801697f0>] (kernfs_fop_write+0xc8/0x1a8)
>  r6:bd6f350c r5:bd86dec0 r4:bd6f3500 r3:8016a44c
> [<80169728>] (kernfs_fop_write) from [<800fd4b0>] (__vfs_write+0x34/0xe0)
>  r10:00a0f408 r9:bd614000 r8:00000004 r7:bd615f78 r6:bdb1e780 r5:00a0f408
>  r4:80783d34
> [<800fd47c>] (__vfs_write) from [<800fe4e0>] (vfs_write+0x98/0x154)
>  r8:00000004 r7:bd615f78 r6:00000004 r5:00a0f408 r4:bdb1e780
> [<800fe448>] (vfs_write) from [<800fe778>] (SyS_write+0x44/0x90)
>  r10:00a0f408 r8:00000004 r7:bdb1e780 r6:bdb1e780 r5:00000000 r4:00000000
> [<800fe734>] (SyS_write) from [<8000fb60>] (ret_fast_syscall+0x0/0x54)
>  r10:00000000 r8:8000fd44 r7:00000004 r6:00000004 r5:76f155c0 r4:00000004
> Code: e5d72014 e5811010 e55f2010 47922101 (f6130414)
> ---[ end trace 2d8bb88539a73fb8 ]---
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 16:44       ` Fabio Estevam
  2015-07-14 16:54         ` Paul E. McKenney
@ 2015-07-14 16:59         ` Fabio Estevam
  2015-07-14 19:26           ` Paul E. McKenney
  1 sibling, 1 reply; 11+ messages in thread
From: Fabio Estevam @ 2015-07-14 16:59 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Russell King, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

On Tue, Jul 14, 2015 at 1:44 PM, Fabio Estevam <festevam@gmail.com> wrote:

> With your patch applied I am running several suspend/resume cycles and
> I am seeing different problems now from time to time (like 1 out 10
> times). The kernel dump does not look like the same every time, but
> here is one example:

Here is another one I just got, and it seems to be more informative:

http://pastebin.com/T4YxuLCg

Thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 16:59         ` Fabio Estevam
@ 2015-07-14 19:26           ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2015-07-14 19:26 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Russell King, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

On Tue, Jul 14, 2015 at 01:59:10PM -0300, Fabio Estevam wrote:
> On Tue, Jul 14, 2015 at 1:44 PM, Fabio Estevam <festevam@gmail.com> wrote:
> 
> > With your patch applied I am running several suspend/resume cycles and
> > I am seeing different problems now from time to time (like 1 out 10
> > times). The kernel dump does not look like the same every time, but
> > here is one example:
> 
> Here is another one I just got, and it seems to be more informative:
> 
> http://pastebin.com/T4YxuLCg

This might simply be a symptom of the earlier "Unable to handle
kernel paging request at virtual address 708ebbfa".  However, the
usual cause of the later "rcu_read_lock() used illegally while idle!"
is just that.  In this case, might the interrupt have occurred
without irq_enter() being invoked?

(But there are probably two interleaved stack traces, and I am not
certain that I am getting them untangled properly.)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 16:26     ` Paul E. McKenney
  2015-07-14 16:44       ` Fabio Estevam
@ 2015-07-14 20:49       ` Russell King - ARM Linux
  2015-07-14 21:31         ` Paul E. McKenney
  1 sibling, 1 reply; 11+ messages in thread
From: Russell King - ARM Linux @ 2015-07-14 20:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Fabio Estevam, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

On Tue, Jul 14, 2015 at 09:26:00AM -0700, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 01:06:32PM -0300, Fabio Estevam wrote:
> > Hi Paul,
> > 
> > On Tue, Jul 14, 2015 at 1:00 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > Does this patch help?
> > >
> > >         https://lkml.org/lkml/2015/5/12/885
> > 
> > I am using an ARM 32-bit machine, so I used this one instead:
> > https://lkml.org/lkml/2015/5/12/892
> > 
> > , and it fixes the problem. Thanks!
> > 
> > Feel free to add:
> > 
> > Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
> 
> Glad it helped!
> 
> Russell, did you want me to push this, or would you rather take it?

If it's the one I'm thinking of (using the generic code) because it
doesn't actually solve the problem we have.  It may shut up the RCU
warning, but it doesn't solve the underlying problem - one which is
caused by the use of atomics (which use the exclusive instructions)
vs cache line migration between CPUs vs speculative prefetching...

It's possible right now that _dirty_ cache lines can be migrated to
the dying CPU, which are then lost on power down - and if we disable
the caches on the dying CPU, we then can't use exclusives, so atomics
(and all of the other normal kernel synchronisation mechanisms) are
out of the question.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: mx6: suspicious RCU usage
  2015-07-14 20:49       ` Russell King - ARM Linux
@ 2015-07-14 21:31         ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2015-07-14 21:31 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Fabio Estevam, linux-arm-kernel, Peter Zijlstra, rostedt, linux-kernel

On Tue, Jul 14, 2015 at 09:49:09PM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 14, 2015 at 09:26:00AM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 14, 2015 at 01:06:32PM -0300, Fabio Estevam wrote:
> > > Hi Paul,
> > > 
> > > On Tue, Jul 14, 2015 at 1:00 PM, Paul E. McKenney
> > > <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > > Does this patch help?
> > > >
> > > >         https://lkml.org/lkml/2015/5/12/885
> > > 
> > > I am using an ARM 32-bit machine, so I used this one instead:
> > > https://lkml.org/lkml/2015/5/12/892
> > > 
> > > , and it fixes the problem. Thanks!
> > > 
> > > Feel free to add:
> > > 
> > > Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
> > 
> > Glad it helped!
> > 
> > Russell, did you want me to push this, or would you rather take it?
> 
> If it's the one I'm thinking of (using the generic code) because it
> doesn't actually solve the problem we have.  It may shut up the RCU
> warning, but it doesn't solve the underlying problem - one which is
> caused by the use of atomics (which use the exclusive instructions)
> vs cache line migration between CPUs vs speculative prefetching...
> 
> It's possible right now that _dirty_ cache lines can be migrated to
> the dying CPU, which are then lost on power down - and if we disable
> the caches on the dying CPU, we then can't use exclusives, so atomics
> (and all of the other normal kernel synchronisation mechanisms) are
> out of the question.

OK, that could explain any number of failures.

I will leave this issue in your hands, then.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-07-14 21:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-14 15:54 mx6: suspicious RCU usage Fabio Estevam
2015-07-14 16:00 ` Paul E. McKenney
2015-07-14 16:06   ` Fabio Estevam
2015-07-14 16:26     ` Paul E. McKenney
2015-07-14 16:44       ` Fabio Estevam
2015-07-14 16:54         ` Paul E. McKenney
2015-07-14 16:59         ` Fabio Estevam
2015-07-14 19:26           ` Paul E. McKenney
2015-07-14 20:49       ` Russell King - ARM Linux
2015-07-14 21:31         ` Paul E. McKenney
2015-07-14 16:08   ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).