Date: Fri, 23 Apr 2021 09:08:13 +0100
Message-ID: <87tunxo236.wl-maz@kernel.org>
From: Marc Zyngier
To: He Ying
Cc: Mark Rutland
Subject: Re: [RFC PATCH] irqchip/gic-v3: Do
 not enable irqs when handling spurious interrups
In-Reply-To: <47abc8a6-0f73-d1c0-789f-e979d4191ab2@huawei.com>
References: <20210416062217.25157-1-heying24@huawei.com>
 <87y2dis4d7.wl-maz@kernel.org>
 <875z0eijxh.wl-maz@kernel.org>
 <47abc8a6-0f73-d1c0-789f-e979d4191ab2@huawei.com>

On Fri, 23 Apr 2021 04:29:44 +0100, He Ying wrote:

[...]

> >>>> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
> >>>> in nmi_enter(). From the call trace, we find two 'el1_irqs' which
> >>>> means an interrupt preempts the other one and the new one is an NMI.
> >>>> Furthermore, by adding some prints, we find the first irq also calls
> >>>> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number
> >>>> is 1023. It enables irq by calling gic_arch_enable_irqs() in
> >>>> gic_handle_irq().
> >>>> At this moment, the second irq preempts the first irq
> >>>> and it's an NMI but current context is already in nmi. So that may be
> >>>> the problem.
> >>> I'm not sure I get it. From the stack trace, I see this:
> >>>
> >>> [ 14.816251] asm_nmi_enter+0x94/0x98
> >>> [ 14.816251] el1_irq+0x8c/0x180 (C)
> >>> [ 14.816252] gic_handle_irq+0xbc/0x2e4
> >>> [ 14.816252] el1_irq+0xcc/0x180 (B)
> >>> [ 14.816253] arch_timer_handler_virt+0x38/0x58
> >>> [ 14.816253] handle_percpu_devid_irq+0x90/0x240
> >>> [ 14.816253] generic_handle_irq+0x34/0x50
> >>> [ 14.816254] __handle_domain_irq+0x68/0xc0
> >>> [ 14.816254] gic_handle_irq+0xf8/0x2e4
> >>> [ 14.816255] el1_irq+0xcc/0x180 (A)
> >>>
> >>> which indicates that we preempted a timer interrupt (A) with another
> >>> IRQ (B), itself immediately preempted by another IRQ (C)? That's
> >>> indeed at least one too many.
> >>>
> >>> Can you please describe for each of (A), (B) and (C) whether they are
> >>> spurious or not, what their priorities are if they aren't spurious?
> >> Yes. I ignored interrupt (A). (B) is spurious and its priority is
> >> 0xa0 and PMR is 0x70. (C) is an NMI and its priority is 0x20. Note
> >> that GIC_PRIO_IRQON is 0xe0, GIC_PRIO_IRQOFF is 0x60,
> >> GICD_INT_DEF_PRI is 0xa0 and GICD_INT_NMI_PRI is 0x20 in our kernel.
> > If (B) is spurious (aka ICC_IAR1_EL1 returns 1023), then its
> > "priority" doesn't really exist, and I don't really get what you mean
> > by "its priority is 0xa0". ICC_RPR_EL1 shouldn't change when Ack-ing
> > a spurious interrupt, because there is no change in GIC state at all.
> OK. By saying "its priority is 0xa0", I just mean ICC_RPR_EL1 is read
> as 0xa0.

Right. That's very different from saying that an interrupt has
priority 0xA0.

> >
> > And if PMR is 0x70 at the point where you get (B), then I really can't
> > see how you can get an interrupt of priority 0xa0 anyway.
>
> Yes, it also confuses me. Perhaps ICC_RPR_EL1 changes when read, I
> think.

No, it just means that we were already in an interrupt context when we
handled the spurious interrupt: I think interrupt (A) above was being
handled with priority 0xa0, preempted by a spurious interrupt (B)
which did the wrong thing by messing with the NMI state.

Probably (B) was an NMI that got retired early and presented again to
the cpuif as (C). If that's the case, the GIC implementation may be a
bit flaky. But the driver definitely has a bug.

[...]

> > I believe the above patch would fix the spurious interrupt issue you
> > have experienced. Please let me know, and post a v2 if this works for
> > you.
>
> Fortunately, it works. I'll post a v2. Should I cc stable mail list?

Yes please.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
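
For readers following the numbers in this thread, the priority comparison
being discussed can be sketched as below. This is an illustrative model
only, not kernel code and not the patch under discussion: the function and
constant names are hypothetical, the hex values are the ones quoted in the
thread for the reporter's kernel, and the model deliberately ignores
binary-point grouping and the Secure/Non-secure priority views. It assumes
the usual GIC rule that a lower numeric value means a higher priority, and
that a pending interrupt is only signalled when it beats both the priority
mask (PMR) and the running priority.

```python
# Illustrative model only; all names here are hypothetical, not kernel symbols.

SPURIOUS_INTID = 1023   # INTID returned on acknowledge when nothing is pending
IDLE_PRIORITY = 0xFF    # running priority when no interrupt is active

# Values quoted in the thread for the reporter's kernel.
PMR_AT_B = 0x70         # PMR observed when (B) was taken
RPR_AT_B = 0xA0         # ICC_RPR_EL1 read while handling (B)
DEF_PRI = 0xA0          # GICD_INT_DEF_PRI
NMI_PRI = 0x20          # GICD_INT_NMI_PRI


def is_spurious(intid: int) -> bool:
    """An acknowledge that returns 1023 did not activate any interrupt."""
    return intid == SPURIOUS_INTID


def can_signal(priority: int, pmr: int, running_priority: int) -> bool:
    """Lower value means higher priority.

    Simplified rule: the interrupt must be strictly higher priority than
    both the mask and whatever is currently running.
    """
    return priority < pmr and priority < running_priority


if __name__ == "__main__":
    for name, prio in (("default-priority IRQ", DEF_PRI), ("NMI", NMI_PRI)):
        verdict = can_signal(prio, PMR_AT_B, RPR_AT_B)
        print(f"{name} (0x{prio:02x}) signalled with PMR=0x{PMR_AT_B:02x}, "
              f"RPR=0x{RPR_AT_B:02x}: {verdict}")
```

Under these assumptions a 0xa0 interrupt cannot be delivered while PMR is
0x70, which is consistent with reading 0xa0 in ICC_RPR_EL1 as the priority
of the already-active interrupt (A) rather than of (B), while a 0x20 NMI
can still preempt.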