From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B661AC433F5 for ; Wed, 13 Oct 2021 13:10:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9A16C610FE for ; Wed, 13 Oct 2021 13:10:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234087AbhJMNMJ (ORCPT ); Wed, 13 Oct 2021 09:12:09 -0400 Received: from m12-16.163.com ([220.181.12.16]:36389 "EHLO m12-16.163.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233552AbhJMNMG (ORCPT ); Wed, 13 Oct 2021 09:12:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:Subject:Date:Message-Id; bh=sUtf0XVHSTX2ol/mZr WRCxgBkBPfPmN/2f2tz+1TqbI=; b=poQndgCxhffIodp3KZIUC6RR3Vl3WsaUSz 6o5/FFb60Pnb451TEO6aeKE/sxe0aNRYtFsVh9dxnaGdym/TW9lIzY3pAWUlwLPP 1OCJf8DQ2eUZCsICnMxt+A0L1pQ7Oh0dvacNI0jhP4C3z4AsPfJYbb58T+yXrTz2 OATwSdsQI= Received: from localhost.localdomain (unknown [171.221.151.0]) by smtp12 (Coremail) with SMTP id EMCowABHXi1y2mZheO5fEA--.28928S2; Wed, 13 Oct 2021 21:09:19 +0800 (CST) From: Chen Lin To: will@kernel.org Cc: catalin.marinas@arm.com, mark.rutland@arm.com, joey.gouly@arm.com, maz@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, chen.lin5@zte.com.cn, Chen Lin Subject: Re:Re: [PATCH] arm64: traps: add dump instr before BUG in kernel Date: Wed, 13 Oct 2021 21:08:43 +0800 Message-Id: <1634130523-2634-1-git-send-email-chen45464546@163.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <20211011100649.GB3681@willie-the-truck> References: <20211011100649.GB3681@willie-the-truck> X-CM-TRANSID: EMCowABHXi1y2mZheO5fEA--.28928S2 X-Coremail-Antispam: 1Uf129KBjvJXoWxXF48KFyfWF4xJF1xXrW5ZFb_yoW5uw47pF W7u3WYyFWDWw4UCw1Utw4rK3W3tws8Jr47WFn8ta4Sywn0qF1IqF4vqr13CayqvrWxuw42 vFyjqF1293W7ZFDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07UOjjkUUUUU= X-Originating-IP: [171.221.151.0] X-CM-SenderInfo: hfkh0kqvuwkkiuw6il2tof0z/xtbBqhsrnl75b4w9DAAAsx Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org At 2021-10-11 17:06:50, "Will Deacon" wrote: >On Thu, Sep 30, 2021 at 10:41:30PM +0800, Chen Lin wrote: >> At 2021-09-30 15:42:47, "Will Deacon" wrote: >> >> >On Wed, Sep 29, 2021 at 09:29:46PM +0800, Chen Lin wrote: >> >> From: Chen Lin >> >> >> >> we should dump the real instructions before BUG in kernel mode, and >> >> compare this to the instructions from objdump. >> >> >> >> Signed-off-by: Chen Lin >> >> --- >> >> arch/arm64/kernel/traps.c | 7 ++++++- >> >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c >> >> index b03e383..621a9dd 100644 >> >> --- a/arch/arm64/kernel/traps.c >> >> +++ b/arch/arm64/kernel/traps.c >> >> @@ -495,7 +495,12 @@ void do_undefinstr(struct pt_regs *regs) >> >> if (call_undef_hook(regs) == 0) >> >> return; >> >> >> >> - BUG_ON(!user_mode(regs)); >> >> + if (!user_mode(regs)) { >> >> + pr_emerg("Undef instruction in kernel, dump instr:"); >> >> + dump_kernel_instr(KERN_EMERG, regs); >> >> + BUG(); >> >> + } >> > >> >Hmm, I'm not completely convinced about this as the instruction in the >> >i-cache could be completely different. I think the PC value (for addr2line) >> >is a lot more useful, and we should be printing that already. >> > >> >Maybe you can elaborate on a situation where this information was helpful? >> > >> >Thanks, >> > >> >Will >> >> Undef instruction occurs in some cases >> >> 1. CPU do not have the permission to execute the instruction or the current CPU >> version does not support the instruction. For example, execute >> 'mrs x0, tcr_el3' under el1. > >This really shouldn't happen, but if it did, the PC would surely be enough >to debug the problem? > yes, PC is enough in this situation. >> 2. The instruction is a normal instruction, but it is changed during board >> running in some abnormal situation. eg: DDR bit flip, the normal instruction >> will become an undefined one. By printing the instruction, we can see the >> accurate instruction code and compare it with the instruction code from objdump >> to determine that it is a DDR issue. > >Is this really something we should be designing our exception handlers for? >If we're getting DDR bit flips for kernel .text, then it sounds like we need >ECC and/or RAS features to deal with them. > >So I'm not really sold on this change. 1. About the DDR bit flip, YES, the instruction code information is really useless in the ideal state. The ideal state includes the following conditions like: the DDR controller do supports ECC, and also is configured correctly, such as ECC enabled and ECC fixing enabled. There may exist various old boards or abnormal DDR configurations in embedded systems. 2. DDR flip is just one example. Other examples include text segment being overwritten by DMA and other accidental writes, we want more info about what it writes. Another example, some instructions may change at runtime by ALTERNATIVE(oldinstr, newinstr, feature) or live patch, we want both the pc and the instruction code to double check what happened if illegal instruction exception happen at such points. Personally, I think adding more information can make it easier to locate the above problems. Anyway, Thank you for your patience in reading and replying. > >Will From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17461C433EF for ; Wed, 13 Oct 2021 13:31:37 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C9668610E7 for ; Wed, 13 Oct 2021 13:31:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org C9668610E7 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=163.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:MIME-Version:List-Subscribe:List-Help: List-Post:List-Archive:List-Unsubscribe:List-Id:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=FPWO5ryEWff0/MXHu371Yjw0J6RgxTu0IFq2X0yF9yE=; b=BKfnT12+Gw+buZ LmMxi9luzoywLe0JuKpagsdfkGaj8s1xJWlDnFMibS/quFTic3Ggq3OKlFwOuNSVBzV9guXQGyUCP NDMObHS/7Yxty/bbWcN6dd1B4Pi4XyYFfS+oE4NgeMFuVNj20FOtbspuYhsqXol6oDytnK2KsOw4C YdfuZQAbppadzf1gH69MU3VGmVCq3iGSqiL9P5As6FKgVGiyiAXV7c8carB7i9z2HimpDc7dzlVhu 9CNMN0Ut4XFVOMhDGguqVLy8m5nGK7JezImEWWHkWXcO/VHuQpkNQr5PDGs+kp6Va69d068kyBOqz 8GuVwOrJILzxOgI2YgcA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1maeJJ-00Gtfu-VC; Wed, 13 Oct 2021 13:28:46 +0000 Received: from m12-16.163.com ([220.181.12.16]) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mae16-00Gmco-CI for linux-arm-kernel@lists.infradead.org; Wed, 13 Oct 2021 13:10:00 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:Subject:Date:Message-Id; bh=sUtf0XVHSTX2ol/mZr WRCxgBkBPfPmN/2f2tz+1TqbI=; b=poQndgCxhffIodp3KZIUC6RR3Vl3WsaUSz 6o5/FFb60Pnb451TEO6aeKE/sxe0aNRYtFsVh9dxnaGdym/TW9lIzY3pAWUlwLPP 1OCJf8DQ2eUZCsICnMxt+A0L1pQ7Oh0dvacNI0jhP4C3z4AsPfJYbb58T+yXrTz2 OATwSdsQI= Received: from localhost.localdomain (unknown [171.221.151.0]) by smtp12 (Coremail) with SMTP id EMCowABHXi1y2mZheO5fEA--.28928S2; Wed, 13 Oct 2021 21:09:19 +0800 (CST) From: Chen Lin To: will@kernel.org Cc: catalin.marinas@arm.com, mark.rutland@arm.com, joey.gouly@arm.com, maz@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, chen.lin5@zte.com.cn, Chen Lin Subject: Re:Re: [PATCH] arm64: traps: add dump instr before BUG in kernel Date: Wed, 13 Oct 2021 21:08:43 +0800 Message-Id: <1634130523-2634-1-git-send-email-chen45464546@163.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <20211011100649.GB3681@willie-the-truck> References: <20211011100649.GB3681@willie-the-truck> X-CM-TRANSID: EMCowABHXi1y2mZheO5fEA--.28928S2 X-Coremail-Antispam: 1Uf129KBjvJXoWxXF48KFyfWF4xJF1xXrW5ZFb_yoW5uw47pF W7u3WYyFWDWw4UCw1Utw4rK3W3tws8Jr47WFn8ta4Sywn0qF1IqF4vqr13CayqvrWxuw42 vFyjqF1293W7ZFDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07UOjjkUUUUU= X-Originating-IP: [171.221.151.0] X-CM-SenderInfo: hfkh0kqvuwkkiuw6il2tof0z/xtbBqhsrnl75b4w9DAAAsx X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20211013_060956_861910_4CB63691 X-CRM114-Status: GOOD ( 24.43 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org At 2021-10-11 17:06:50, "Will Deacon" wrote: >On Thu, Sep 30, 2021 at 10:41:30PM +0800, Chen Lin wrote: >> At 2021-09-30 15:42:47, "Will Deacon" wrote: >> >> >On Wed, Sep 29, 2021 at 09:29:46PM +0800, Chen Lin wrote: >> >> From: Chen Lin >> >> >> >> we should dump the real instructions before BUG in kernel mode, and >> >> compare this to the instructions from objdump. >> >> >> >> Signed-off-by: Chen Lin >> >> --- >> >> arch/arm64/kernel/traps.c | 7 ++++++- >> >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c >> >> index b03e383..621a9dd 100644 >> >> --- a/arch/arm64/kernel/traps.c >> >> +++ b/arch/arm64/kernel/traps.c >> >> @@ -495,7 +495,12 @@ void do_undefinstr(struct pt_regs *regs) >> >> if (call_undef_hook(regs) == 0) >> >> return; >> >> >> >> - BUG_ON(!user_mode(regs)); >> >> + if (!user_mode(regs)) { >> >> + pr_emerg("Undef instruction in kernel, dump instr:"); >> >> + dump_kernel_instr(KERN_EMERG, regs); >> >> + BUG(); >> >> + } >> > >> >Hmm, I'm not completely convinced about this as the instruction in the >> >i-cache could be completely different. I think the PC value (for addr2line) >> >is a lot more useful, and we should be printing that already. >> > >> >Maybe you can elaborate on a situation where this information was helpful? >> > >> >Thanks, >> > >> >Will >> >> Undef instruction occurs in some cases >> >> 1. CPU do not have the permission to execute the instruction or the current CPU >> version does not support the instruction. For example, execute >> 'mrs x0, tcr_el3' under el1. > >This really shouldn't happen, but if it did, the PC would surely be enough >to debug the problem? > yes, PC is enough in this situation. >> 2. The instruction is a normal instruction, but it is changed during board >> running in some abnormal situation. eg: DDR bit flip, the normal instruction >> will become an undefined one. By printing the instruction, we can see the >> accurate instruction code and compare it with the instruction code from objdump >> to determine that it is a DDR issue. > >Is this really something we should be designing our exception handlers for? >If we're getting DDR bit flips for kernel .text, then it sounds like we need >ECC and/or RAS features to deal with them. > >So I'm not really sold on this change. 1. About the DDR bit flip, YES, the instruction code information is really useless in the ideal state. The ideal state includes the following conditions like: the DDR controller do supports ECC, and also is configured correctly, such as ECC enabled and ECC fixing enabled. There may exist various old boards or abnormal DDR configurations in embedded systems. 2. DDR flip is just one example. Other examples include text segment being overwritten by DMA and other accidental writes, we want more info about what it writes. Another example, some instructions may change at runtime by ALTERNATIVE(oldinstr, newinstr, feature) or live patch, we want both the pc and the instruction code to double check what happened if illegal instruction exception happen at such points. Personally, I think adding more information can make it easier to locate the above problems. Anyway, Thank you for your patience in reading and replying. > >Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel