Subject: Re: [PATCH hulk-4.19-next] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
To: He Ying
CC: Marc Zyngier
References: <20210426023929.89400-1-heying24@huawei.com>
In-Reply-To: <20210426023929.89400-1-heying24@huawei.com>
From: Hanjun Guo
Date: Mon, 26 Apr 2021 11:42:11 +0800
X-Mailing-List: stable@vger.kernel.org

On 2021/4/26 10:39, He Ying wrote:
> hulk inclusion
> category: bugfix
> bugzilla: NA
> DTS: NA
> CVE: NA
>
> --------------------------------
>
> We triggered the following error while running our 4.19 kernel
> with the pseudo-NMI patches backported to it:
>
> [ 14.816231] ------------[ cut here ]------------
> [ 14.816231] kernel BUG at irq.c:99!
> [ 14.816232] Internal error: Oops - BUG: 0 [#1] SMP
> [ 14.816232] Process swapper/0 (pid: 0, stack limit = 0x(____ptrval____))
> [ 14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.19.95.aarch64 #14
> [ 14.816233] Hardware name: evb (DT)
> [ 14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO)
> [ 14.816234] pc : asm_nmi_enter+0x94/0x98
> [ 14.816235] lr : asm_nmi_enter+0x18/0x98
> [ 14.816235] sp : ffff000008003c50
> [ 14.816235] pmr_save: 00000070
> [ 14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0
> [ 14.816238] x27: 0000000000000000 x26: ffff000008004000
> [ 14.816239] x25: 00000000015e0000 x24: ffff8008fb916000
> [ 14.816240] x23: 0000000020400005 x22: ffff0000080817cc
> [ 14.816241] x21: ffff000008003da0 x20: 0000000000000060
> [ 14.816242] x19: 00000000000003ff x18: ffffffffffffffff
> [ 14.816243] x17: 0000000000000008 x16: 003d090000000000
> [ 14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40
> [ 14.816244] x13: ffff8008fff58b9d x12: 0000000000000000
> [ 14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5
> [ 14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f
> [ 14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e
> [ 14.816248] x5 : 0000000000000000 x4 : 0000000080000000
> [ 14.816249] x3 : 0000000000000000 x2 : 0000000080000000
> [ 14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0
> [ 14.816251] Call trace:
> [ 14.816251] asm_nmi_enter+0x94/0x98
> [ 14.816251] el1_irq+0x8c/0x180 (IRQ C)
> [ 14.816252] gic_handle_irq+0xbc/0x2e4
> [ 14.816252] el1_irq+0xcc/0x180 (IRQ B)
> [ 14.816253] arch_timer_handler_virt+0x38/0x58
> [ 14.816253] handle_percpu_devid_irq+0x90/0x240
> [ 14.816253] generic_handle_irq+0x34/0x50
> [ 14.816254] __handle_domain_irq+0x68/0xc0
> [ 14.816254] gic_handle_irq+0xf8/0x2e4
> [ 14.816255] el1_irq+0xcc/0x180 (IRQ A)
> [ 14.816255] arch_cpu_idle+0x34/0x1c8
> [ 14.816255] default_idle_call+0x24/0x44
> [ 14.816256] do_idle+0x1d0/0x2c8
> [ 14.816256] cpu_startup_entry+0x28/0x30
> [ 14.816256] rest_init+0xb8/0xc8
> [ 14.816257] start_kernel+0x4c8/0x4f4
> [ 14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000)
> [ 14.816258] Modules linked in: start_dp(O) smeth(O)
> [ 15.103092] ---[ end trace 701753956cb14aa8 ]---
> [ 15.103093] Kernel panic - not syncing: Fatal exception in interrupt
> [ 15.103099] SMP: stopping secondary CPUs
> [ 15.103100] Kernel Offset: disabled
> [ 15.103100] CPU features: 0x36,a2400218
> [ 15.103100] Memory Limit: none
>
> which is caused by a 'BUG_ON(in_nmi())' in nmi_enter().
>
> From the call trace, we can find three interrupts (noted A, B, C above):
> interrupt (A) is preempted by (B), which is further interrupted by (C).
>
> Subsequent investigations show that (B) results in nmi_enter() being
> called, but that it actually is a spurious interrupt. Furthermore,
> interrupts are reenabled in the context of (B), and (C) fires with
> NMI priority. We end up with a nested NMI situation, something
> we definitely do not want to (and cannot) handle.
>
> The bug here is that spurious interrupts should never result in any
> state change, and we should just return to the interrupted context.
> Moving the handling of spurious interrupts as early as possible in
> the GICv3 handler fixes this issue.
>
> Fixes: 3f1f3234bc2d ("irqchip/gic-v3: Switch to PMR masking before calling IRQ handler")
> Acked-by: Mark Rutland
> Signed-off-by: He Ying
> [maz: rewrote commit message, corrected Fixes: tag]
> Signed-off-by: Marc Zyngier
> Link: https://lore.kernel.org/r/20210423083516.170111-1-heying24@huawei.com
> Cc: stable@vger.kernel.org
>
> Signed-off-by: He Ying
> ---
>  drivers/irqchip/irq-gic-v3.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 14e9c1a5627b..acf0ae4ef612 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -518,6 +518,10 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
>  
>  	irqnr = gic_read_iar();
>  
> +	/* Check for special IDs first */
> +	if ((irqnr >= 1020 && irqnr <= 1023))
> +		return;
> +
>  	if (gic_supports_nmi() &&
>  	    unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
>  		gic_handle_nmi(irqnr, regs);
>

Reviewed-by: Hanjun Guo
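
For anyone following the thread, the ordering that the patch establishes can be modelled in user space roughly as below. This is only a sketch and not kernel code: `struct cpu_model`, `is_special_intid()` and `model_gic_handle_irq()` are hypothetical names made up for illustration, and the two fields merely stand in for the state changes (nmi_enter() and the re-enabling of interrupts) described in the commit message. Only the 1020-1023 range and the "check special IDs first" ordering are taken from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-CPU state; this is not a kernel structure. */
struct cpu_model {
	int nmi_depth;       /* number of NMI contexts currently entered */
	bool irqs_unmasked;  /* set once the handler has re-enabled interrupts */
};

/* INTIDs 1020-1023 are the GIC special IDs; 1023 reports a spurious interrupt. */
static bool is_special_intid(uint32_t irqnr)
{
	return irqnr >= 1020 && irqnr <= 1023;
}

/*
 * Sketch of the handler entry with the fix applied: a special INTID
 * returns before any state is touched. Returns true if an interrupt
 * was actually handled.
 */
static bool model_gic_handle_irq(struct cpu_model *cpu, uint32_t irqnr,
				 bool running_pri_is_nmi)
{
	if (is_special_intid(irqnr))
		return false;           /* no nmi_enter(), no unmasking */

	if (running_pri_is_nmi) {
		cpu->nmi_depth++;       /* stands in for nmi_enter() */
		/* ... the NMI handler would run here ... */
		cpu->nmi_depth--;       /* stands in for nmi_exit() */
		return true;
	}

	cpu->irqs_unmasked = true;      /* stands in for re-enabling interrupts */
	/* ... the regular IRQ handler would run here ... */
	return true;
}
```

Calling `model_gic_handle_irq(&cpu, 1023, true)` on a zeroed `struct cpu_model` leaves both fields untouched, which is the "no state change on a spurious interrupt" property the commit message asks for.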