From: "Song Bao Hua (Barry Song)"
To: "tglx@linutronix.de", "gregkh@linuxfoundation.org", "arnd@arndb.de",
    "geert@linux-m68k.org", "funaho@jurai.org", "philb@gnu.org",
    "corbet@lwn.net", "mingo@redhat.com"
CC: "linux-m68k@lists.linux-m68k.org", "fthain@telegraphics.com.au",
    "linux-kernel@vger.kernel.org"
Subject: [RFC] IRQ handlers run with some high-priority interrupts(not NMI) enabled on some platform
Date: Fri, 12 Feb 2021 01:18:52 +0000

Hi,

I have been having a very long debate with Finn in this thread:
https://lore.kernel.org/lkml/1612697823-8073-1-git-send-email-tanxiaofei@huawei.com/

In short, the debate is about whether high-priority IRQs (*not NMI*) are
allowed to preempt a running IRQ handler in hardIRQ context.

In my understanding, IRQ handlers have been running with *all* interrupts
disabled since the commits below, after which IRQF_DISABLED was dropped:

e58aa3d2d0cc genirq: Run irq handlers with interrupts disabled
b738a50a2026 genirq: Warn when handler enables interrupts

    We run all handlers with interrupts disabled and expect them not to
    enable them. Warn when we catch one who does.

While this seems to be true on almost all platforms, it seems to be false
on m68k. According to Finn, high-priority interrupts can still come in on
m68k while an IRQ handler is running. One driver that deals with this is
drivers/net/ethernet/natsemi/sonic.c. You can read the comment:

static irqreturn_t sonic_interrupt(int irq, void *dev_id)
{
	struct net_device *dev = dev_id;
	struct sonic_local *lp = netdev_priv(dev);
	int status;
	unsigned long flags;

	/* The lock has two purposes. Firstly, it synchronizes sonic_interrupt()
	 * with sonic_send_packet() so that the two functions can share state.
	 * Secondly, it makes sonic_interrupt() re-entrant, as that is required
	 * by macsonic which must use two IRQs with different priority levels.
	 */
	spin_lock_irqsave(&lp->lock, flags);

	status = SONIC_READ(SONIC_ISR) & SONIC_IMR_DEFAULT;
	if (!status) {
		spin_unlock_irqrestore(&lp->lock, flags);

		return IRQ_NONE;
	}
}

So m68k does allow a high-priority interrupt to preempt a hardIRQ, and the
code needs to call irqsave to protect against this risk. That is to say,
some interrupts are not disabled during hardIRQ on m68k.

But m68k doesn't trigger any warning for !irqs_disabled() in genirq:

irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags)
{
	...
	trace_irq_handler_entry(irq, action);
	res = action->handler(irq, action->dev_id);
	trace_irq_handler_exit(irq, action, res);

	if (WARN_ONCE(!irqs_disabled(),"irq %u handler %pS enabled interrupts\n",
		      irq, action->handler))
		local_irq_disable();
}

The reason is:
* arch_irqs_disabled() returns true while low-priority interrupts are
  disabled, even though high-priority interrupts are still open;
* local_irq_disable(), spin_lock_irqsave() etc. will disable high-priority
  interrupts (IPL 7);
* arch_irqs_disabled() also returns true while both low- and high-priority
  interrupts are disabled.

Note that m68k has several interrupt levels. In the description above, I
simply abstract them as high and low to make this easier to follow.

I think m68k lets arch_irqs_disabled() return true under a relatively
weaker condition, pretending that all IRQs are disabled while
high-priority IRQs are still open, and thus passes all the sanity checks
in genirq and the kernel core. But Finn strongly disagreed.

I am not saying I am right and Finn is wrong. But I think we need
somewhere to clarify this problem.
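The point above can be illustrated with a small user-space model. This is
only a sketch under stated assumptions, not the kernel's m68k code: it
assumes the interrupt priority mask sits in bits 8-10 of the status
register, that a handler for level N runs with the mask set to N, and that
level 7 cannot be masked. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the interrupt priority mask occupies bits 8-10 of the
 * status register, so it holds a level from 0 to 7. */
#define SR_IPL_MASK  0x0700u
#define SR_IPL_SHIFT 8

/* Hypothetical helper: build a status-register value with the given mask. */
static inline uint16_t model_sr_with_ipl(unsigned int level)
{
	assert(level <= 7);
	return (uint16_t)(level << SR_IPL_SHIFT);
}

/* Models the "weaker" check described above: any non-zero mask is
 * reported as "interrupts disabled". */
static inline bool model_irqs_disabled(uint16_t sr)
{
	return (sr & SR_IPL_MASK) != 0;
}

/* Models delivery: a request is taken only if its level is above the
 * current mask; level 7 is treated as non-maskable (assumption). */
static inline bool model_irq_can_preempt(uint16_t sr, unsigned int level)
{
	unsigned int mask = (sr & SR_IPL_MASK) >> SR_IPL_SHIFT;

	assert(level >= 1 && level <= 7);
	return level == 7 || level > mask;
}
```

In this model, a handler running at level 2 sees model_irqs_disabled()
return true, yet model_irq_can_preempt() still reports that a level 6
request would be taken. Only a mask of 7 blocks levels 1 to 6.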

Personally, I would prefer "interrupts disabled" to mean "all except NMI",
so I'd like to guard this by:

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 7c9d6a2d7e90..b8ca27555c76 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,6 +32,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
  */
 #define __irq_enter()					\
 	do {						\
+		WARN_ONCE(in_hardirq() && irqs_disabled(), "nested interrupts\n"); \
 		preempt_count_add(HARDIRQ_OFFSET);	\
 		lockdep_hardirq_enter();		\
 		account_hardirq_enter(current);		\
@@ -44,6 +45,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
  */
 #define __irq_enter_raw()				\
 	do {						\
+		WARN_ONCE(in_hardirq() && irqs_disabled(), " nested interrupts\n"); \
 		preempt_count_add(HARDIRQ_OFFSET);	\
 		lockdep_hardirq_enter();		\
 	} while (0)

Finn, though, thought it lacks any justification.

So I am requesting comments on:
1. Are we expecting all interrupts except NMI to be disabled in an IRQ
   handler, or do we actually allow some high-priority interrupts between
   low and NMI to come in on some platforms?

2. Whichever is true, I think we need to document it somewhere, as there
   is always confusion about this.

Personally, I would expect all interrupts to be disabled, and I like the
ARM64 approach of only using a high-priority interrupt as a pseudo NMI:
https://lwn.net/Articles/755906/
Finn, though, argued that this would mean losing a hardware feature of
m68k.

Thanks
Barry