From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64D3BC433F5 for ; Wed, 1 Jun 2022 02:00:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349074AbiFACAG (ORCPT ); Tue, 31 May 2022 22:00:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349076AbiFACAE (ORCPT ); Tue, 31 May 2022 22:00:04 -0400 Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 313DD994E6; Tue, 31 May 2022 19:00:03 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R201e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=guoheyi@linux.alibaba.com;NM=1;PH=DS;RN=10;SR=0;TI=SMTPD_---0VExkm4L_1654048798; Received: from 30.225.140.38(mailfrom:guoheyi@linux.alibaba.com fp:SMTPD_---0VExkm4L_1654048798) by smtp.aliyun-inc.com(127.0.0.1); Wed, 01 Jun 2022 10:00:00 +0800 Message-ID: <374237cb-1cda-df12-eb9f-7422cab51fc4@linux.alibaba.com> Date: Wed, 1 Jun 2022 09:59:58 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.9.0 Subject: Re: [PATCH] drivers/i2c-aspeed: avoid invalid memory reference after timeout Content-Language: en-US To: Lei Yu Cc: Joel Stanley , linux-aspeed , Andrew Jeffery , OpenBMC Maillist , Brendan Higgins , Linux Kernel Mailing List , "open list:I2C SUBSYSTEM HOST DRIVERS" , Philipp Zabel , Linux ARM References: <20220109132613.122912-1-guoheyi@linux.alibaba.com> <0f5cd773-2d0a-b782-b967-ecbcec3de7b1@linux.alibaba.com> From: Heyi Guo In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Thanks for your feedback :) 在 2022/5/31 下午5:35, Lei Yu 写道: > I hit a similar problem that has a slightly different backtrace on a > malfunctioning device. > https://pastebin.com/TiWdkdrG > > With this patch, the kernel panic is gone and it gets below logs instead: > > aspeed-i2c-bus 1e78a180.i2c-bus: bus in unknown state. irq_status: 0x1 > aspeed-i2c-bus 1e78a180.i2c-bus: irq handled != irq. expected > 0x00000001, but was 0x00000000 > aspeed-i2c-bus 1e78a180.i2c-bus: bus in unknown state. irq_status: 0x10 > aspeed-i2c-bus 1e78a180.i2c-bus: irq handled != irq. expected > 0x00000010, but was 0x00000000 > > So I think this patch is good in that it prevents the kernel panic. > > On Wed, Jan 19, 2022 at 11:00 AM Heyi Guo wrote: >> >> 在 2022/1/17 下午2:38, Joel Stanley 写道: >>> On Fri, 14 Jan 2022 at 14:01, Heyi Guo wrote: >>>> Hi Joel, >>>> >>>> >>>> 在 2022/1/11 下午6:51, Joel Stanley 写道: >>>>> On Tue, 11 Jan 2022 at 07:52, Heyi Guo wrote: >>>>>> Hi all, >>>>>> >>>>>> Any comments? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Heyi >>>>>> >>>>>> 在 2022/1/9 下午9:26, Heyi Guo 写道: >>>>>>> The memory will be freed by the caller if transfer timeout occurs, >>>>>>> then it would trigger kernel panic if the peer device responds with >>>>>>> something after timeout and triggers the interrupt handler of aspeed >>>>>>> i2c driver. >>>>>>> >>>>>>> Set the msgs pointer to NULL to avoid invalid memory reference after >>>>>>> timeout to fix this potential kernel panic. >>>>> Thanks for the patch. How did you discover this issue? Do you have a >>>>> test I can run to reproduce the crash? >>>> We are using one i2c channel to communicate with another MCU by >>>> implementing user space SSIF protocol, and the MCU may not respond in >>>> time if it is busy. If it responds after timeout occurs, it will trigger >>>> below kernel panic: >>>> >>> Thanks for the details. It looks like you've done some testing of >>> this, which is good. >>> >>>> After applying this patch, we'll get below warning instead: >>>> >>>> "bus in unknown state. irq_status: 0x%x\n" >>> Given we get to here in the irq handler, we've done these two tests: >>> >>> - aspeed_i2c_is_irq_error() >>> - the state is not INACTIVE or PENDING >>> >>> but there's no buffer ready for us to use. So what has triggered the >>> IRQ in this case? Do you have a record of the irq status bits? >>> >>> I am wondering if the driver should know that the transaction has >>> timed out, instead of detecting this unknown state. >> Yes, some drivers will try to abort the transaction before returning to >> the caller, if timeout happens. >> >> The irq status bits are not always the same; searching from the history >> logs, I found some messages like below: >> >> [ 495.289499] aspeed-i2c-bus 1e78a680.i2c-bus: bus in unknown state. >> irq_status: 0x2 >> [ 495.298003] aspeed-i2c-bus 1e78a680.i2c-bus: bus in unknown state. >> irq_status: 0x10 >> >> [ 65.176761] aspeed-i2c-bus 1e78a680.i2c-bus: bus in unknown state. >> irq_status: 0x15 >> >> Thanks, >> >> Heyi >> >>> >>>>> Can you provide a Fixes tag? >>>> I think the bug was introduced by the first commit of this file :( >>>> >>>> f327c686d3ba44eda79a2d9e02a6a242e0b75787 >>>> >>>> >>>>> Do other i2c master drivers do this? I took a quick look at the meson >>>>> driver and it doesn't appear to clear it's pointer to msgs. >>>> It is hard to say. It seems other drivers have some recover scheme like >>>> aborting the transfer, or loop each messages in process context and >>>> don't do much in IRQ handler, which may disable interrupts or not retain >>>> the buffer pointer before returning timeout. >>> I think your change is okay to go in as it fixes the crash, but first >>> I want to work out if there's some missing handling of a timeout >>> condition that we should add as well. >>> >>> >>>> Thanks, >>>> >>>> Heyi >>>> >>>> >>>>>>> Signed-off-by: Heyi Guo >>>>>>> >>>>>>> ------- >>>>>>> >>>>>>> Cc: Brendan Higgins >>>>>>> Cc: Benjamin Herrenschmidt >>>>>>> Cc: Joel Stanley >>>>>>> Cc: Andrew Jeffery >>>>>>> Cc: Philipp Zabel >>>>>>> Cc: linux-i2c@vger.kernel.org >>>>>>> Cc: openbmc@lists.ozlabs.org >>>>>>> Cc: linux-arm-kernel@lists.infradead.org >>>>>>> Cc: linux-aspeed@lists.ozlabs.org >>>>>>> --- >>>>>>> drivers/i2c/busses/i2c-aspeed.c | 5 +++++ >>>>>>> 1 file changed, 5 insertions(+) >>>>>>> >>>>>>> diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c >>>>>>> index 67e8b97c0c950..3ab0396168680 100644 >>>>>>> --- a/drivers/i2c/busses/i2c-aspeed.c >>>>>>> +++ b/drivers/i2c/busses/i2c-aspeed.c >>>>>>> @@ -708,6 +708,11 @@ static int aspeed_i2c_master_xfer(struct i2c_adapter *adap, >>>>>>> spin_lock_irqsave(&bus->lock, flags); >>>>>>> if (bus->master_state == ASPEED_I2C_MASTER_PENDING) >>>>>>> bus->master_state = ASPEED_I2C_MASTER_INACTIVE; >>>>>>> + /* >>>>>>> + * All the buffers may be freed after returning to caller, so >>>>>>> + * set msgs to NULL to avoid memory reference after freeing. >>>>>>> + */ >>>>>>> + bus->msgs = NULL; >>>>>>> spin_unlock_irqrestore(&bus->lock, flags); >>>>>>> >>>>>>> return -ETIMEDOUT; > >