From: Tomi Valkeinen
To: Tony Lindgren
Cc: linux-arm-kernel@lists.infradead.org, Nishanth Menon, "Bajjuri, Praneeth", linux-omap@vger.kernel.org
Subject: Re: Random stack corruption on v5.13 with dra76
Date: Fri, 21 May 2021 13:30:58 +0300
Message-ID: <9e2e544d-4e3c-4171-9a37-fb582861e368@ideasonboard.com>
References: <0f48c7e5-6acd-1143-35ef-3dea2255bec6@ideasonboard.com> <064a9324-cfcf-47b9-6ae3-a29085a52683@ideasonboard.com>

On 21/05/2021 12:14, Tony Lindgren wrote:
> * Tomi Valkeinen [210521 08:45]:
>> On 21/05/2021 10:39, Tony Lindgren wrote:
>>> * Tomi Valkeinen [210521 07:05]:
>>>> On 21/05/2021 08:36, Tony Lindgren wrote:
>>>>> * Tomi Valkeinen [210520 08:27]:
>>>>>> Hi,
>>>>>>
>>>>>> I've noticed that the v5.13 rcs crash randomly (but quite often) on dra76 evm
>>>>>> (I haven't tested other boards). Anyone else seen this problem?
>>>>>
>>>>> I have not seen this so far and beagle-x15 is behaving for me.
>>>>>
>>>>> Does it always happen on boot?
>>>>
>>>> No, but quite often. I can't really say how often, as it's annoyingly random.
>>>> I tried to bisect, but that proved to be difficult as sometimes I get multiple (5+)
>>>> successful boots before the crash.
>>>>
>>>> I tested with x15, same issue (below). So... Something in my kernel config? Or compiler?
>>>> Looks like the crash happens always very soon after (or during) probing palmas.
>>>
>>> After about 10 reboots with your .config I'm seeing it now too on
>>> beagle-x15. So far no luck reproducing it with omap2plus_defconfig.
>>
>> I think I have an easy way to see if a kernel is good or bad, by printing
>> stack_not_used(current) in the first call to omap_i2c_xfer_irq(). There's a
>> huge drop between v5.12 and v5.13-rc1.
>>
>> And interestingly, sometimes a simple printk seems to use hundreds of bytes
>> of stack (i.e. compare stack usage before and after the print). But not
>> always. So maybe the issue is somehow related to printk.
>>
>> I'm bisecting.
>
> OK sounds good to me.
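
For reference, stack_not_used() is only available with CONFIG_DEBUG_STACK_USAGE, under which task stacks start out zero-filled. It walks up from the end of the stack (past the canary) and counts the words that are still zero, so it reports how much stack has never been touched since the task started: a high-water mark, not the current depth. The reading only drops when the task goes deeper than ever before and never recovers when calls return, which may be why a printk shows a large drop only some of the time. Below is a userspace sketch of that scan, assuming a downward-growing stack as on ARM; the simulated stack, its size, the canary value and all helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_STACK_WORDS 1024     /* hypothetical stack size, in words */
#define SIM_CANARY 0xC0FFEEUL    /* arbitrary stand-in for the kernel's STACK_END_MAGIC */

/* Index 0 is the lowest address ("end of stack"); the stack grows down
 * from index SIM_STACK_WORDS - 1 toward it. */
static unsigned long sim_stack[SIM_STACK_WORDS];

/* Like a fresh task stack with CONFIG_DEBUG_STACK_USAGE: all zero except
 * for the canary at the end. */
static void sim_reset(void)
{
	memset(sim_stack, 0, sizeof(sim_stack));
	sim_stack[0] = SIM_CANARY;
}

/* Simulate a call chain reaching `depth` words below the top of the stack.
 * Real frames store return addresses, saved registers and locals; any
 * non-zero pattern will do here. Returning does not clear these words,
 * which is what makes the measurement a high-water mark. */
static void touch_stack(size_t depth)
{
	assert(depth < SIM_STACK_WORDS); /* never overwrite the canary */
	for (size_t i = 0; i < depth; i++)
		sim_stack[SIM_STACK_WORDS - 1 - i] = 0xdeadbeefUL;
}

/* Same structure as stack_not_used() for a downward-growing stack: step
 * past the canary, keep going while the word is still zero, and return the
 * distance from the end of the stack in bytes. */
static unsigned long sim_stack_not_used(void)
{
	size_t n = 0;

	do {
		n++;
	} while (n < SIM_STACK_WORDS && sim_stack[n] == 0);

	return (unsigned long)(n * sizeof(unsigned long));
}
```

After sim_reset(), touching 100 words, then 600, then 100 again reports 924, 424 and 424 words' worth of bytes: the second shallow call does not give the space back.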

Well, I found the bad commit, but unfortunately it doesn't point exactly to where the issue is.

f483a3e123410bd1c78af295bf65feffb6769a98 is the first bad commit
commit f483a3e123410bd1c78af295bf65feffb6769a98
Author: Tony Lindgren
Date:   Wed Mar 10 14:03:48 2021 +0200

    ARM: dts: Configure simple-pm-bus for dra7 l4_per1

    We can now probe interconnects with device tree only configuration using
    simple-pm-bus and genpd.

    Tested-by: Kishon Vijay Abraham I
    Signed-off-by: Tony Lindgren

 arch/arm/boot/dts/dra7-l4.dtsi | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

The difference is clear, though. With 9a75368b6426739e8b798592f084cb682d760568, the last good commit, when I print the free stack with stack_not_used() at three different places in omap_i2c_xfer_irq(), I always get roughly:

STACK FREE omap_i2c_xfer_irq: 2972, 2972, 2972

and these repeat exactly for each call to omap_i2c_xfer_irq() (at least during the palmas probe).

With the bad commit the situation is different. The first call to omap_i2c_xfer_irq() prints:

STACK FREE omap_i2c_xfer_irq: 2024, 2024, 2024

so we're already using about 1k more. But then, instead of staying the same, the stack usage grows over consecutive calls. It doesn't grow on every xfer call, but after about 10 calls I get ~1800 free, after ten more calls ~800, and eventually it goes down to ~500. However, with this bad commit I don't see the free stack going below ~500, so I don't get crashes.

With a more recent commit, like 01d7136894410a71932096e0fb9f1d301b6ccf07, the situation is much worse. The first print shows:

STACK FREE omap_i2c_xfer_irq: 1164, 1164, 1164

and it quickly overflows the stack.
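
Because the first-call reading appears to be reproducible from boot to boot, unlike the crash itself, it can stand in for "boot it N times and see whether it crashes" when bisecting. A minimal sketch of that idea: GOOD_BASELINE_FREE is the 2972 reported for the last good commit, while HYPOTHETICAL_MARGIN and the linear oldest-to-newest array of readings are assumptions for illustration. Real git bisect walks the commit graph rather than an array, and each reading would come from booting that commit once.

```c
#include <assert.h>
#include <stddef.h>

/* First-call free stack reported for the last good commit (9a75368b...). */
#define GOOD_BASELINE_FREE 2972UL
/* Hypothetical tolerance for build-to-build noise; not from the report. */
#define HYPOTHETICAL_MARGIN 256UL

/* Extra stack consumed at the first omap_i2c_xfer_irq() call vs. the baseline. */
static long extra_stack_bytes(unsigned long first_call_free)
{
	return (long)GOOD_BASELINE_FREE - (long)first_call_free;
}

/* Deterministic replacement for "did it crash within N boots?". */
static int is_bad(unsigned long first_call_free)
{
	return first_call_free + HYPOTHETICAL_MARGIN < GOOD_BASELINE_FREE;
}

/*
 * Binary search for the first bad entry in an oldest-to-newest list of
 * readings, assuming every commit after the first bad one is also bad (the
 * same assumption git bisect makes). Returns n if no entry is bad.
 */
static size_t first_bad(const unsigned long *free_bytes, size_t n)
{
	size_t lo = 0, hi = n; /* the answer lies in [lo, hi] */

	assert(n == 0 || free_bytes != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (is_bad(free_bytes[mid]))
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}
```

With the readings reported for 9a75368b, f483a3e1 and 01d71368, in that order, first_bad() returns index 1, and extra_stack_bytes() gives 948 and 1808 bytes for the two bad commits, in line with the "about 1k more" observed at f483a3e1.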

Tomi