* (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization
@ 2014-11-06 20:38 Geert Uytterhoeven
  2014-11-06 21:02 ` Daniel Lezcano
  0 siblings, 1 reply; 7+ messages in thread
From: Geert Uytterhoeven @ 2014-11-06 20:38 UTC (permalink / raw)
  To: Daniel Lezcano, Ingo Molnar, Nicolas Pitre
  Cc: Paul McKenney, Jiri Kosina, Rafael J. Wysocki, Linux PM list,
      Linux-sh list, linux-arm-kernel, linux-kernel

When CONFIG_CPU_IDLE=y, the kernel locks up during cpuidle initialization
on Renesas sh73a0/kzm9g-reference, which has a dual-core Cortex-A9.

Last message is:

    DMA: preallocated 256 KiB pool for atomic coherent allocations

After this it's supposed to print:

    cpuidle: using governor ladder
    cpuidle: using governor menu

I've bisected this to commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc
("sched: Let the scheduler see CPU idle states").

Reverting that commit, and commit 83a0a96a5f26d974580fd7251043ff70c8f1823d
("sched/fair: Leverage the idle state info when choosing the "idlest" cpu")
which depends on it, fixes the problem.

I saw the discussion "lockdep splat in CPU hotplug", so I enabled lockdep
debugging, but didn't see a lockdep splat.

I'm using CONFIG_TREE_RCU=y, as this is SMP without PREEMPT.

Anyone with a clue?

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization
  2014-11-06 20:38 (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization Geert Uytterhoeven
@ 2014-11-06 21:02 ` Daniel Lezcano
  2014-11-07  7:59   ` Geert Uytterhoeven
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Lezcano @ 2014-11-06 21:02 UTC (permalink / raw)
  To: Geert Uytterhoeven, Ingo Molnar, Nicolas Pitre
  Cc: Paul McKenney, Jiri Kosina, Rafael J. Wysocki, Linux PM list,
      Linux-sh list, linux-arm-kernel, linux-kernel

On 11/06/2014 09:38 PM, Geert Uytterhoeven wrote:
> When CONFIG_CPU_IDLE=y, the kernel locks up during cpuidle initialization
> on Renesas sh73a0/kzm9g-reference, which has a dual-core Cortex-A9.
>
> Last message is:
>
>     DMA: preallocated 256 KiB pool for atomic coherent allocations
>
> After this it's supposed to print:
>
>     cpuidle: using governor ladder
>     cpuidle: using governor menu
>
> I've bisected this to commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc
> ("sched: Let the scheduler see CPU idle states").
>
> Reverting that commit, and commit 83a0a96a5f26d974580fd7251043ff70c8f1823d
> ("sched/fair: Leverage the idle state info when choosing the "idlest"
> cpu") which
> depends on it, fixes the problem.
>
> I saw the discussion "lockdep splat in CPU hotplug", so I enabled lockdep
> debugging, but didn't see a lockdep splat.

Did you try the fix attached ?

https://lkml.org/lkml/2014/10/22/722

> I'm using CONFIG_TREE_RCU=y, as this is SMP without PREEMPT.
>
> Anyone with a clue?
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
>

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization
  2014-11-06 21:02 ` Daniel Lezcano
@ 2014-11-07  7:59   ` Geert Uytterhoeven
  2014-11-25 17:49     ` Geert Uytterhoeven
  0 siblings, 1 reply; 7+ messages in thread
From: Geert Uytterhoeven @ 2014-11-07 7:59 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Ingo Molnar, Nicolas Pitre, Paul McKenney, Jiri Kosina,
      Rafael J. Wysocki, Linux PM list, Linux-sh list, linux-arm-kernel,
      linux-kernel

Hi Daniel,

On Thu, Nov 6, 2014 at 10:02 PM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 11/06/2014 09:38 PM, Geert Uytterhoeven wrote:
>> When CONFIG_CPU_IDLE=y, the kernel locks up during cpuidle initialization
>> on Renesas sh73a0/kzm9g-reference, which has a dual-core Cortex-A9.
>>
>> Last message is:
>>
>>     DMA: preallocated 256 KiB pool for atomic coherent allocations
>>
>> After this it's supposed to print:
>>
>>     cpuidle: using governor ladder
>>     cpuidle: using governor menu
>>
>> I've bisected this to commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc
>> ("sched: Let the scheduler see CPU idle states").
>>
>> Reverting that commit, and commit 83a0a96a5f26d974580fd7251043ff70c8f1823d
>> ("sched/fair: Leverage the idle state info when choosing the "idlest"
>> cpu") which
>> depends on it, fixes the problem.
>>
>> I saw the discussion "lockdep splat in CPU hotplug", so I enabled lockdep
>> debugging, but didn't see a lockdep splat.
>
> Did you try the fix attached ?
>
> https://lkml.org/lkml/2014/10/22/722

Thanks, I didn't try that.

However, this patch seems to be in v3.18-rc3, so I'm already using it.
Hence it doesn't fix the problem for me.

On another board, with a dual Cortex-A15, the problem doesn't show up.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization
  2014-11-07  7:59   ` Geert Uytterhoeven
@ 2014-11-25 17:49     ` Geert Uytterhoeven
  2014-11-25 18:01       ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Geert Uytterhoeven @ 2014-11-25 17:49 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Ingo Molnar, Nicolas Pitre, Paul McKenney, Jiri Kosina,
      Rafael J. Wysocki, Linux PM list, Linux-sh list, linux-arm-kernel,
      linux-kernel

On Fri, Nov 7, 2014 at 8:59 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Thu, Nov 6, 2014 at 10:02 PM, Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>> On 11/06/2014 09:38 PM, Geert Uytterhoeven wrote:
>>> When CONFIG_CPU_IDLE=y, the kernel locks up during cpuidle initialization
>>> on Renesas sh73a0/kzm9g-reference, which has a dual-core Cortex-A9.
>>>
>>> Last message is:
>>>
>>>     DMA: preallocated 256 KiB pool for atomic coherent allocations
>>>
>>> After this it's supposed to print:
>>>
>>>     cpuidle: using governor ladder
>>>     cpuidle: using governor menu
>>>
>>> I've bisected this to commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc
>>> ("sched: Let the scheduler see CPU idle states").
>>>
>>> Reverting that commit, and commit 83a0a96a5f26d974580fd7251043ff70c8f1823d
>>> ("sched/fair: Leverage the idle state info when choosing the "idlest"
>>> cpu") which
>>> depends on it, fixes the problem.
>>>
>>> I saw the discussion "lockdep splat in CPU hotplug", so I enabled lockdep
>>> debugging, but didn't see a lockdep splat.
>>
>> Did you try the fix attached ?
>>
>> https://lkml.org/lkml/2014/10/22/722
>
> Thanks, I didn't try that.
>
> However, this patch seems to be in v3.18-rc3, so I'm already using it.
> Hence it doesn't fix the problem for me.
>
> On another board, with a dual Cortex-A15, the problem doesn't show up.

This problem (regression introduced in v3.18-rc1) is still present in v3.18-rc6.

I did some more investigations, and it's hanging in the call to
synchronize_rcu() in cpuidle_uninstall_idle_handler(), which was added in
commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc.
More specifically, it's blocked on the wait_for_completion(&rcu.completion)
in kernel/rcu/update.c:void wait_rcu_gp(call_rcu_func_t crf).

Anyone with a clue?

Thanks again!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization
  2014-11-25 17:49     ` Geert Uytterhoeven
@ 2014-11-25 18:01       ` Paul E. McKenney
  2014-11-25 21:27         ` Geert Uytterhoeven
  0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2014-11-25 18:01 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Daniel Lezcano, Ingo Molnar, Nicolas Pitre, Jiri Kosina,
      Rafael J. Wysocki, Linux PM list, Linux-sh list, linux-arm-kernel,
      linux-kernel

On Tue, Nov 25, 2014 at 06:49:16PM +0100, Geert Uytterhoeven wrote:
> On Fri, Nov 7, 2014 at 8:59 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Thu, Nov 6, 2014 at 10:02 PM, Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >> On 11/06/2014 09:38 PM, Geert Uytterhoeven wrote:
> >>> When CONFIG_CPU_IDLE=y, the kernel locks up during cpuidle initialization
> >>> on Renesas sh73a0/kzm9g-reference, which has a dual-core Cortex-A9.
> >>>
> >>> Last message is:
> >>>
> >>>     DMA: preallocated 256 KiB pool for atomic coherent allocations
> >>>
> >>> After this it's supposed to print:
> >>>
> >>>     cpuidle: using governor ladder
> >>>     cpuidle: using governor menu
> >>>
> >>> I've bisected this to commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc
> >>> ("sched: Let the scheduler see CPU idle states").
> >>>
> >>> Reverting that commit, and commit 83a0a96a5f26d974580fd7251043ff70c8f1823d
> >>> ("sched/fair: Leverage the idle state info when choosing the "idlest"
> >>> cpu") which
> >>> depends on it, fixes the problem.
> >>>
> >>> I saw the discussion "lockdep splat in CPU hotplug", so I enabled lockdep
> >>> debugging, but didn't see a lockdep splat.
> >>
> >> Did you try the fix attached ?
> >>
> >> https://lkml.org/lkml/2014/10/22/722
> >
> > Thanks, I didn't try that.
> >
> > However, this patch seems to be in v3.18-rc3, so I'm already using it.
> > Hence it doesn't fix the problem for me.
> >
> > On another board, with a dual Cortex-A15, the problem doesn't show up.
>
> This problem (regression introduced in v3.18-rc1) is still present in v3.18-rc6.
>
> I did some more investigations, and it's hanging in the call to
> synchronize_rcu() in cpuidle_uninstall_idle_handler(), which was added in
> commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc.
> More specifically, it's blocked on the wait_for_completion(&rcu.completion)
> in kernel/rcu/update.c:void wait_rcu_gp(call_rcu_func_t crf).

You didn't disable RCU CPU stall warnings, did you? If you did, please
re-enable them, as the stall warning messages will likely help to debug
this. The soft-lockup checks can also be quite valuable.

If you haven't run with CONFIG_PROVE_RCU=y, please try that. For example,
if you have CONFIG_PREEMPT=y and you do synchronize_rcu() from within
an RCU read-side critical section (don't do that, it will hang!!!),
then you will get a lockdep splat.

Does any sort of system activity (keyboard, network, etc.) unstick the
system?

If you have tried all those things without good effect, could you please
send along your .config and an alt-sysrq-t dump of all tasks' stacks?

                                                        Thanx, Paul

> Anyone with a clue?
>
> Thanks again!
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
>

^ permalink raw reply [flat|nested] 7+ messages in thread
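[Editorial sketch, not part of the thread] Several of the checks requested above come down to how the kernel was configured. As a minimal illustration, the Python helper below reports the state of a few options in a .config file. The option names are the ones mentioned in this thread; the helper names (parse_kconfig, report) and the sample .config text are made up for illustration, and the sample is not Geert's actual configuration.

```python
import re


def parse_kconfig(text):
    """Map option name -> value for a kernel .config given as text.

    Options written as '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
            continue
        assigned = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2)
    return options


def report(text, wanted):
    """Return the state of each wanted option, or 'absent' if never mentioned."""
    options = parse_kconfig(text)
    return {name: options.get(name, "absent") for name in wanted}


# Hypothetical sample, loosely modelled on what the thread describes
# (cpuidle and tree RCU enabled, no PREEMPT). The PROVE_RCU state is a guess.
sample = """\
CONFIG_CPU_IDLE=y
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT is not set
# CONFIG_PROVE_RCU is not set
"""

wanted = ["CONFIG_CPU_IDLE", "CONFIG_TREE_RCU", "CONFIG_PREEMPT", "CONFIG_PROVE_RCU"]
print(report(sample, wanted))
```

In practice the text would come from reading the build's .config file rather than from an inline string.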
* Re: (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization
  2014-11-25 18:01       ` Paul E. McKenney
@ 2014-11-25 21:27         ` Geert Uytterhoeven
  2014-11-25 22:23           ` Daniel Lezcano
  0 siblings, 1 reply; 7+ messages in thread
From: Geert Uytterhoeven @ 2014-11-25 21:27 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Daniel Lezcano, Ingo Molnar, Nicolas Pitre, Jiri Kosina,
      Rafael J. Wysocki, Linux PM list, Linux-sh list, linux-arm-kernel,
      linux-kernel, Magnus Damm

Hi Paul,

On Tue, Nov 25, 2014 at 7:01 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Nov 25, 2014 at 06:49:16PM +0100, Geert Uytterhoeven wrote:
>> On Fri, Nov 7, 2014 at 8:59 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> > On Thu, Nov 6, 2014 at 10:02 PM, Daniel Lezcano
>> > <daniel.lezcano@linaro.org> wrote:
>> >> On 11/06/2014 09:38 PM, Geert Uytterhoeven wrote:
>> >>> When CONFIG_CPU_IDLE=y, the kernel locks up during cpuidle initialization
>> >>> on Renesas sh73a0/kzm9g-reference, which has a dual-core Cortex-A9.
>> >>>
>> >>> Last message is:
>> >>>
>> >>>     DMA: preallocated 256 KiB pool for atomic coherent allocations
>> >>>
>> >>> After this it's supposed to print:
>> >>>
>> >>>     cpuidle: using governor ladder
>> >>>     cpuidle: using governor menu
>> >>>
>> >>> I've bisected this to commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc
>> >>> ("sched: Let the scheduler see CPU idle states").
>> >>>
>> >>> Reverting that commit, and commit 83a0a96a5f26d974580fd7251043ff70c8f1823d
>> >>> ("sched/fair: Leverage the idle state info when choosing the "idlest"
>> >>> cpu") which
>> >>> depends on it, fixes the problem.
>> >>>
>> >>> I saw the discussion "lockdep splat in CPU hotplug", so I enabled lockdep
>> >>> debugging, but didn't see a lockdep splat.
>> >>
>> >> Did you try the fix attached ?
>> >>
>> >> https://lkml.org/lkml/2014/10/22/722
>> >
>> > Thanks, I didn't try that.
>> >
>> > However, this patch seems to be in v3.18-rc3, so I'm already using it.
>> > Hence it doesn't fix the problem for me.
>> >
>> > On another board, with a dual Cortex-A15, the problem doesn't show up.
>>
>> This problem (regression introduced in v3.18-rc1) is still present in v3.18-rc6.
>>
>> I did some more investigations, and it's hanging in the call to
>> synchronize_rcu() in cpuidle_uninstall_idle_handler(), which was added in
>> commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc.
>> More specifically, it's blocked on the wait_for_completion(&rcu.completion)
>> in kernel/rcu/update.c:void wait_rcu_gp(call_rcu_func_t crf).
>
> You didn't disable RCU CPU stall warnings, did you? If you did, please
> re-enable them, as the stall warning messages will likely help to debug
> this. The soft-lockup checks can also be quite valuable.
>
> If you haven't run with CONFIG_PROVE_RCU=y, please try that. For example,
> if you have CONFIG_PREEMPT=y and you do synchronize_rcu() from within
> an RCU read-side critical section (don't do that, it will hang!!!),
> then you will get a lockdep splat.
>
> Does any sort of system activity (keyboard, network, etc.) unstick the
> system?

Thanks! Unfortunately none of the above helped.

However, I found the culprit. It turned out to be a platform issue, not an
issue in the generic cpu idle or RCU code. Read on below if you're
interested in the gory details. Else just skip, and sleep well again tonight ;-)

> If you have tried all those things without good effect, could you please
> send along your .config and an alt-sysrq-t dump of all tasks' stacks?

As I didn't manage to trigger a sysrq dump over the serial console,
I just called __handle_sysrq() right before the wait_for_completion(), after
a small delay. The dump didn't show anything suspicious. Everything
looked the same as on the dual-core Cortex-A15, where the problem
doesn't manifest.

Then I noticed the sched debug output on the A15, which was missing
on the A9 build. Enabling it on the A9 gave:

    Sched Debug Version: v0.11, 3.18.0-rc6-kzm9g-reference-04913-gedc89a2a2059c7ff-dirty #101
    ktime                                   : 0.000000
    sched_clk                               : 0.000000
    cpu_clk                                 : 0.000000
    jiffies                                 : 4294928896

Oops, time is not advancing?

Dmesg also shows (early):

    clocksource_of_init: no matching clocksources found

and the timer is only initialized much later, after cpu idle initialization:

    sh_cmt e6138000.timer: ch0: used for periodic clock events

Hacking up a timer node for "arm,cortex-a9-twd-timer" in sh73a0.dtsi
(with some "guessed" values) made it work.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply [flat|nested] 7+ messages in thread
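[Editorial sketch, not part of the thread] A side note on the jiffies value in the dump above. The kernel does not start the counter at zero: include/linux/jiffies.h defines INITIAL_JIFFIES as (unsigned int)(-300*HZ), so that the 32-bit counter wraps a few minutes after boot. The Python sketch below does the arithmetic for the reported reading. It assumes a 32-bit counter, as on this Cortex-A9, and the helper names are made up for illustration. The HZ value it derives is only an inference, since the thread never states HZ, but the reading being an exact multiple of 300 below 2^32 is at least consistent with no tick having been counted yet.

```python
U32 = 1 << 32  # jiffies is a 32-bit quantity on 32-bit ARM


def initial_jiffies(hz):
    """INITIAL_JIFFIES as a 32-bit value: (unsigned int)(-300 * HZ)."""
    return (-300 * hz) % U32


def ticks_since_boot(jiffies, hz):
    """Ticks counted since boot, allowing for 32-bit wraparound."""
    return (jiffies - initial_jiffies(hz)) % U32


def implied_hz(jiffies):
    """Return the HZ for which jiffies equals INITIAL_JIFFIES, else None.

    A match only shows the reading is consistent with an untouched counter
    for that HZ; it does not prove which HZ the kernel was built with.
    """
    distance = (U32 - jiffies) % U32
    if distance > 0 and distance % 300 == 0:
        return distance // 300
    return None


reading = 4294928896  # jiffies value from the sched debug dump in the message above
hz = implied_hz(reading)
print(hz, ticks_since_boot(reading, hz))  # prints "128 0"
```

That is, 2^32 - 4294928896 = 38400 = 300 * 128, so with HZ=128 the counter would still sit at its starting value, matching the zero ktime, sched_clk and cpu_clk readings.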
* Re: (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization
  2014-11-25 21:27         ` Geert Uytterhoeven
@ 2014-11-25 22:23           ` Daniel Lezcano
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel Lezcano @ 2014-11-25 22:23 UTC (permalink / raw)
  To: Geert Uytterhoeven, Paul McKenney
  Cc: Ingo Molnar, Nicolas Pitre, Jiri Kosina, Rafael J. Wysocki,
      Linux PM list, Linux-sh list, linux-arm-kernel, linux-kernel,
      Magnus Damm

On 11/25/2014 10:27 PM, Geert Uytterhoeven wrote:

[ ... ]

>> Does any sort of system activity (keyboard, network, etc.) unstick the
>> system?
>
> Thanks! Unfortunately none of the above helped.
>
> However, I found the culprit. It turned out to be a platform issue, not an
> issue in the generic cpu idle or RCU code. Read on below if you're
> interested in the gory details. Else just skip, and sleep well again tonight ;-)
>
>> If you have tried all those things without good effect, could you please
>> send along your .config and an alt-sysrq-t dump of all tasks' stacks?
>
> As I didn't manage to trigger a sysrq dump over the serial console,
> I just called __handle_sysrq() right before the wait_for_completion(), after
> a small delay. The dump didn't show anything suspicious. Everything
> looked the same as on the dual-core Cortex-A15, where the problem
> doesn't manifest.
>
> Then I noticed the sched debug output on the A15, which was missing
> on the A9 build. Enabling it on the A9 gave:
>
>     Sched Debug Version: v0.11,
>     3.18.0-rc6-kzm9g-reference-04913-gedc89a2a2059c7ff-dirty #101
>     ktime                                   : 0.000000
>     sched_clk                               : 0.000000
>     cpu_clk                                 : 0.000000
>     jiffies                                 : 4294928896
>
> Oops, time is not advancing?
>
> Dmesg also shows (early):
>
>     clocksource_of_init: no matching clocksources found
>
> and the timer is only initialized much later, after cpu idle initialization:
>
>     sh_cmt e6138000.timer: ch0: used for periodic clock events
>
> Hacking up a timer node for "arm,cortex-a9-twd-timer" in sh73a0.dtsi
> (with some "guessed" values) made it work.
>
> Thanks!
>
> Gr{oetje,eeting}s,

Hi Geert,

thanks for sharing this information.

  -- Daniel

^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-11-25 22:23 UTC | newest]

Thread overview: 7+ messages
2014-11-06 20:38 (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization Geert Uytterhoeven
2014-11-06 21:02 ` Daniel Lezcano
2014-11-07  7:59   ` Geert Uytterhoeven
2014-11-25 17:49     ` Geert Uytterhoeven
2014-11-25 18:01       ` Paul E. McKenney
2014-11-25 21:27         ` Geert Uytterhoeven
2014-11-25 22:23           ` Daniel Lezcano