linux-kernel.vger.kernel.org archive mirror
* T20 Cpuidle Freeze
From: Marcel Ziswiler @ 2017-11-03 13:07 UTC
  To: linux-kernel, linux-tegra, rafael.j.wysocki
  Cc: digetx, thierry.reding, mirq-linux

Hi Rafael, dear community

One of our customers reported freezes when running the LTS Linux
kernel 4.9.x on our Toradex Colibri T20 modules [1]. I was able to
reproduce a complete SoC lock-up within a few minutes, also on the
latest 4.14-rc7, while LTS 4.4.x seems to run stably.

A multi-level bisection points to the following first bad commit:

9c4b2867ed7c8c8784dd417ffd16e705e81eb145

cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0

Unfortunately, drivers/cpuidle/governors/menu.c has seen further
changes since then, so simply reverting the commit is not trivial.

However, I found that the freeze is indeed related to CPU idle: with
the CONFIG_CPU_IDLE kernel configuration option disabled, both LTS
4.9.59 and the latest 4.14-rc7 now run stably overnight.
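
As a lighter-weight alternative to rebuilding without CONFIG_CPU_IDLE,
individual idle states can be switched off at runtime through the
per-state "disable" file in sysfs, which may help narrow down whether
only the deeper state triggers the lock-up. The sketch below is only an
illustration: the function name and its arguments are hypothetical, and
it assumes state0 is the shallow WFI-like state on this platform.

```shell
#!/bin/sh
# Write VALUE to the "disable" file of every cpuidle state whose index is
# >= MIN_STATE, on all CPUs. Hypothetical helper, not part of any tool.
#   $1: value to write (1 = disable the state, 0 = re-enable it)
#   $2: lowest state index to touch
#   $3: sysfs CPU root (optional, defaults to the real one)
set_idle_states() {
    value=$1
    min_state=$2
    cpu_root=${3:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpuidle/state[0-9]*/disable; do
        [ -e "$f" ] || continue      # glob did not match: no cpuidle here
        state_dir=${f%/disable}      # .../cpuN/cpuidle/stateM
        idx=${state_dir##*/state}    # M
        if [ "$idx" -ge "$min_state" ]; then
            echo "$value" > "$f"
        fi
    done
}

# Example (as root): keep state0, disable everything deeper.
#   set_idle_states 1 1
```

If the system stays up with only state0 in use, that would point at the
deeper state (or the code path entering it) rather than at the governor
change itself. Booting with cpuidle.off=1 on the kernel command line is
another way to take cpuidle out of the picture without a rebuild.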

Does anybody have any clue what exactly may be happening and/or why
cpuidle may not run stable on T20? Or is everybody always just
disabling cpuidle on T20 anyway?

Thanks!

[1] https://www.toradex.com/community/questions/16838/actual-lts-kernel-49-on-colibri-t20.html

Best regards - Mit freundlichen Grüssen - Meilleures salutations

Marcel Ziswiler
Platform Manager Embedded Linux

Toradex AG
Altsagenstrasse 5 | 6048 Horw/Luzern | Switzerland | T: +41 41 500 48 00
(main line) | Direct: +41 41 500 48 10


* Re: T20 Cpuidle Freeze
From: Dmitry Osipenko @ 2017-11-03 18:52 UTC
  To: Marcel Ziswiler, linux-kernel, linux-tegra, rafael.j.wysocki
  Cc: thierry.reding, mirq-linux

On 03.11.2017 16:07, Marcel Ziswiler wrote:
> Hi Rafael, dear community
> 
> One of our customers reported seeing freezes when running the LTS Linux
> kernel 4.9.x on our Toradex Colibri T20 modules [1]. I was able to
> reproduce a complete SoC lock-up after a few minutes also running the
> latest 4.14-rc7 while LTS 4.4.x seemed to run stable.
> 
> Having attempted a multi-level bisection points towards the following
> first bad commit:
> 
> 9c4b2867ed7c8c8784dd417ffd16e705e81eb145
> 
> cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0
> 
> Unfortunately as drivers/cpuidle/governors/menu.c has gotten further
> edits since it seems not trivial to just revert it.
> 
> However I found out that it indeed has to do with CPU idle as when I
> did disable the CONFIG_CPU_IDLE Linux kernel configuration option also
> LTS 4.9.59 as well as latest 4.14-rc7 run now stable overnight.
> 
> Does anybody have any clue what exactly may be happening and/or why
> cpuidle may not run stable on T20? Or is everybody always just
> disabling cpuidle on T20 anyway?
> 
> Thanks!
> 
> [1] https://www.toradex.com/community/questions/16838/actual-lts-kernel-49-on-colibri-t20.html

I haven't seen any problems with cpuidle on linux-next, and 4.14-rc7 works fine.

# cat /sys/devices/system/cpu/cpu[0-1]/cpuidle/state[0-1]/usage
162283
32905
254669
32905

# cat /sys/devices/system/cpu/cpu[0-1]/cpuidle/state[0-1]/time
436981763
2110484666
458260707
2121781516

# uptime
 18:50:24 up 44 min,  1 user,  load average: 0.15, 0.08, 0.07

It could be that cpuidle unmasks some other issue on the Colibri.
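
For comparing boards, the raw numbers above are easier to read when each
value is labelled with its CPU, state, and state name. The following is a
minimal sketch; the function name is hypothetical, and it relies only on
the standard "name", "usage" and "time" files (time is reported in
microseconds).

```shell
#!/bin/sh
# Print one line per cpuidle state: cpu, state, name, entry count and
# total residency in microseconds. Hypothetical helper for illustration.
#   $1: sysfs CPU root (optional, defaults to the real one)
idle_summary() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for d in "$cpu_root"/cpu[0-9]*/cpuidle/state[0-9]*; do
        [ -d "$d" ] || continue       # glob did not match: no cpuidle here
        cpu=${d%/cpuidle/*}           # .../cpuN
        cpu=${cpu##*/}                # cpuN
        state=${d##*/}                # stateM
        name=$(cat "$d/name" 2>/dev/null || echo "?")
        usage=$(cat "$d/usage")
        time_us=$(cat "$d/time")
        printf '%s %s %s usage=%s time_us=%s\n' \
            "$cpu" "$state" "$name" "$usage" "$time_us"
    done
}

# Example: idle_summary
```

Running it on both a working and a freezing board shortly after boot would
show whether the deeper state is entered at all on the Colibri before the
lock-up.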


* Re: T20 Cpuidle Freeze
From: Marcel Ziswiler @ 2017-11-04 20:49 UTC
  To: linux-kernel, digetx, linux-tegra, rafael.j.wysocki
  Cc: thierry.reding, mirq-linux

On Fri, 2017-11-03 at 21:52 +0300, Dmitry Osipenko wrote:
> I haven't seen any problems with the cpuidle on next and 4.14-rc7
> works fine.
> 
> # cat /sys/devices/system/cpu/cpu[0-1]/cpuidle/state[0-1]/usage
> 162283
> 32905
> 254669
> 32905
> 
> # cat /sys/devices/system/cpu/cpu[0-1]/cpuidle/state[0-1]/time
> 436981763
> 2110484666
> 458260707
> 2121781516
> 
> # uptime
>  18:50:24 up 44 min,  1 user,  load average: 0.15, 0.08, 0.07

OK, thanks. Good to know.

> It could be that cpuidle unmasks some other issue on the Colibri.

Yes, that's also my thinking.

What hardware did you run this on?

Does your kernel configuration differ from the stock tegra_defconfig I used?

What exact device tree source are you using (or just stock)?

Maybe the compiler version you are using could also have some influence?

If you have any additional suggestions on what else could be relevant please let me know.


* Re: T20 Cpuidle Freeze
From: Dmitry Osipenko @ 2017-11-05 13:16 UTC
  To: Marcel Ziswiler, linux-kernel, linux-tegra, rafael.j.wysocki
  Cc: thierry.reding, mirq-linux

On 04.11.2017 23:49, Marcel Ziswiler wrote:
> On Fri, 2017-11-03 at 21:52 +0300, Dmitry Osipenko wrote:
>> I haven't seen any problems with the cpuidle on next and 4.14-rc7
>> works fine.
>>
>> # cat /sys/devices/system/cpu/cpu[0-1]/cpuidle/state[0-1]/usage
>> 162283
>> 32905
>> 254669
>> 32905
>>
>> # cat /sys/devices/system/cpu/cpu[0-1]/cpuidle/state[0-1]/time
>> 436981763
>> 2110484666
>> 458260707
>> 2121781516
>>
>> # uptime
>>  18:50:24 up 44 min,  1 user,  load average: 0.15, 0.08, 0.07
> 
> OK, thanks. Good to know.
> 
>> It could be that cpuidle unmasks some other issue on the Colibri.
> 
> Yes, that's also my thinking.
> 
> What hardware did you run this on?
> 

I ran it on an Acer A500 tablet, which is Tegra20-based.

> Does your kernel configuration differ from the stock tegra_defconfig I used?
> 

It doesn't differ; I used the stock tegra_defconfig as well.

> What exact device tree source are you using (or just stock)?
> 

https://gist.github.com/digetx/2f624a0df4caff657ef28863b5354d5b

> Maybe the compiler version you are using could also have some influence?
> 

Compiler bugs aren't rare. I'm using armv7a-hardfloat-linux-gnueabi-gcc
(Gentoo 6.4.0 p1.0) 6.4.0

> If you have any additional suggestions on what else could be relevant please let me know.
> 

1) I suppose you could attach JTAG and see where the hang happens.

2) Enable all kernel debug Kconfig options.

3) Disable all non-critical device drivers in Kconfig, keeping only what you
need to boot, and see if it makes a difference.

4) Adding some debug printk's would probably be good enough to localize the
offending place in the code.
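
Suggestion 2) can be scripted with the scripts/config helper from the
kernel tree. This is a rough sketch only: the function name is
hypothetical, and the selection of options is an assumption about what is
likely to be useful for a hard lock-up, not an exhaustive or recommended
list.

```shell
#!/bin/sh
# Enable a handful of debug options in an existing kernel .config.
# Hypothetical helper; the option list is an illustrative assumption.
#   $1: path to the kernel's scripts/config tool (optional)
#   $2: path to the .config file to modify (optional)
enable_debug_options() {
    config_tool=${1:-./scripts/config}
    config_file=${2:-.config}
    for opt in DEBUG_KERNEL DEBUG_INFO MAGIC_SYSRQ PROVE_LOCKING \
               DEBUG_ATOMIC_SLEEP; do
        "$config_tool" --file "$config_file" --enable "$opt" || return 1
    done
}

# Example, run from the top of the kernel tree after
# "make ARCH=arm tegra_defconfig":
#   enable_debug_options
#   make ARCH=arm olddefconfig
```

Running "make olddefconfig" afterwards lets Kconfig resolve the
dependencies of the newly enabled options before the build.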

