linux-arm-kernel.lists.infradead.org archive mirror
From: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
To: Tony Lindgren <tony@atomide.com>
Cc: linux-arm-kernel@lists.infradead.org, Nishanth Menon <nm@ti.com>,
	"Bajjuri, Praneeth" <praneeth@ti.com>,
	linux-omap@vger.kernel.org
Subject: Re: Random stack corruption on v5.13 with dra76
Date: Fri, 21 May 2021 13:30:58 +0300
Message-ID: <9e2e544d-4e3c-4171-9a37-fb582861e368@ideasonboard.com>
In-Reply-To: <YKd56/KAnIUIm7K5@atomide.com>

On 21/05/2021 12:14, Tony Lindgren wrote:
> * Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> [210521 08:45]:
>> On 21/05/2021 10:39, Tony Lindgren wrote:
>>> * Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> [210521 07:05]:
>>>> On 21/05/2021 08:36, Tony Lindgren wrote:
>>>>> * Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> [210520 08:27]:
>>>>>> Hi,
>>>>>>
>>>>>> I've noticed that the v5.13 rcs crash randomly (but quite often) on dra76 evm
>>>>>> (I haven't tested other boards). Anyone else seen this problem?
>>>>>
>>>>> I have not seen this so far and beagle-x15 is behaving for me.
>>>>>
>>>>> Does it always happen on boot?
>>>>
>>>> No, but quite often. I can't really say how often, as it's annoyingly random.
>>>> I tried to bisect, but that proved to be difficult as sometimes I get multiple (5+)
>>>> successful boots before the crash.
>>>>
>>>> I tested with x15, same issue (below). So... Something in my kernel config? Or compiler?
>>>> Looks like the crash happens always very soon after (or during) probing palmas.
>>>
>>> After about 10 reboots with your .config I'm seeing it now too on
>>> beagle-x15. So far no luck reproducing it with omap2plus_defconfig.
>>
>> I think I have an easy way to see if a kernel is good or bad, by printing
>> stack_not_used(current) in the first call to omap_i2c_xfer_irq(). There's a
>> huge drop between v5.12 and v5.13-rc1.
>>
>> And interestingly, sometimes a simple printk seems to use hundreds of bytes
>> of stack (i.e. compare stack usage before and after the print). But not
>> always. So maybe the issue is somehow related to printk.
>>
>> I'm bisecting.
> 
> OK sounds good to me.

Well, I found the bad commit, but unfortunately it doesn't point to
exactly where the issue is.

f483a3e123410bd1c78af295bf65feffb6769a98 is the first bad commit
commit f483a3e123410bd1c78af295bf65feffb6769a98
Author: Tony Lindgren <tony@atomide.com>
Date:   Wed Mar 10 14:03:48 2021 +0200

     ARM: dts: Configure simple-pm-bus for dra7 l4_per1

     We can now probe interconnects with device tree only configuration
     using simple-pm-bus and genpd.

     Tested-by: Kishon Vijay Abraham I <kishon@ti.com>
     Signed-off-by: Tony Lindgren <tony@atomide.com>

  arch/arm/boot/dts/dra7-l4.dtsi | 9 ++++++---
  1 file changed, 6 insertions(+), 3 deletions(-)


The difference is clear, though. With
9a75368b6426739e8b798592f084cb682d760568, which is the last good commit,
when I print the free stack with stack_not_used() in three different
places in omap_i2c_xfer_irq(), I always get prints roughly like:

STACK FREE omap_i2c_xfer_irq: 2972, 2972, 2972

And these repeat exactly the same for each call to omap_i2c_xfer_irq (at 
least during palmas probe).

With the bad commit the situation is different. The first call to 
omap_i2c_xfer_irq prints:

STACK FREE omap_i2c_xfer_irq: 2024, 2024, 2024

so we're already using about 1k more. But then, instead of the stack
usage staying the same, consecutive calls show increased stack usage. It
doesn't increase on every xfer call, but after about 10 calls I'm
getting ~1800, after ten more calls I see ~800, and it goes down to ~500.

However, with this bad commit, I don't see the free stack going below
~500, so I don't get crashes. But going to a more recent commit, like
01d7136894410a71932096e0fb9f1d301b6ccf07, the situation is much worse. 
The first print shows:

STACK FREE omap_i2c_xfer_irq: 1164, 1164, 1164

and it quickly goes to stack overflow.
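
For reference, a rough userspace sketch of what the stack_not_used()
numbers above mean. As far as I understand it, with
CONFIG_DEBUG_STACK_USAGE the kernel zero-fills the stack at allocation
and stack_not_used() counts the still-zero words from the far end of the
stack, so it reports a low-water mark, not the current stack depth. The
toy_* names and the STACK_WORDS size below are made up for illustration,
and this is a simplified model, not the kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stack size in words, chosen only for this sketch. */
#define STACK_WORDS 1024

/*
 * Toy model of a downward-growing stack: index 0 is the far end
 * (lowest address), index STACK_WORDS - 1 is where execution starts.
 */
struct toy_stack {
	unsigned long words[STACK_WORDS];
};

/* Zero-fill the stack, as is assumed to happen at allocation time. */
static void toy_stack_init(struct toy_stack *s)
{
	memset(s->words, 0, sizeof(s->words));
}

/* Pretend a call chain dirtied the top `used_words` words of the stack. */
static void toy_stack_touch(struct toy_stack *s, size_t used_words)
{
	assert(used_words <= STACK_WORDS);
	for (size_t i = STACK_WORDS - used_words; i < STACK_WORDS; i++)
		s->words[i] = 0xdeadbeefUL;
}

/*
 * Count the bytes that have never been written: scan from the far end
 * until the first non-zero word. Words that were dirtied once stay
 * dirty, so the result can only shrink over the life of the stack.
 */
static size_t toy_stack_not_used(const struct toy_stack *s)
{
	size_t i = 0;

	while (i < STACK_WORDS && s->words[i] == 0)
		i++;
	return i * sizeof(unsigned long);
}
```

In this model, a deep call followed by a shallow one leaves the reported
value at the deep call's level. That matches the pattern above, where
the free value only ever steps down: each drop means some call chain
went deeper than any before it, not necessarily that omap_i2c_xfer_irq()
itself is deeper at that moment.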

  Tomi

