From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B23CFC433EF
	for ; Tue, 28 Sep 2021 08:35:57 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 8D13C611CA
	for ; Tue, 28 Sep 2021 08:35:57 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S239641AbhI1Ihf (ORCPT ); Tue, 28 Sep 2021 04:37:35 -0400
Received: from foss.arm.com ([217.140.110.172]:41448 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S239625AbhI1Ihc (ORCPT ); Tue, 28 Sep 2021 04:37:32 -0400
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6B7466D;
	Tue, 28 Sep 2021 01:35:53 -0700 (PDT)
Received: from C02TD0UTHF1T.local (unknown [10.57.23.93])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 1DCDC3F7B4;
	Tue, 28 Sep 2021 01:35:49 -0700 (PDT)
Date: Tue, 28 Sep 2021 09:35:46 +0100
From: Mark Rutland
To: "Paul E. McKenney"
Cc: Pingfan Liu , Thomas Gleixner ,
	linux-arm-kernel@lists.infradead.org, Catalin Marinas ,
	Will Deacon , Marc Zyngier , Joey Gouly , Sami Tolvanen ,
	Julien Thierry , Yuichi Ito , linux-kernel@vger.kernel.org,
	Sven Schnelle , Vasily Gorbik
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of
Message-ID: <20210928083546.GB1924@C02TD0UTHF1T.local>
References: <20210924132837.45994-1-kernelfans@gmail.com>
	<20210924173615.GA42068@C02TD0UTHF1T.local>
	<20210924225954.GN880162@paulmck-ThinkPad-P17-Gen-1>
	<20210927092303.GC1131@C02TD0UTHF1T.local>
	<20210928000922.GY880162@paulmck-ThinkPad-P17-Gen-1>
	<20210928083222.GA1924@C02TD0UTHF1T.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20210928083222.GA1924@C02TD0UTHF1T.local>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 28, 2021 at 09:32:22AM +0100, Mark Rutland wrote:
> On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote:
> > > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
> > > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics]
> > > > >
> > > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
> > > > > > After introducing arm64/kernel/entry_common.c which is akin to
> > > > > > kernel/entry/common.c, the housekeeping of rcu/trace are done twice as
> > > > > > the following:
> > > > > >     enter_from_kernel_mode()->rcu_irq_enter().
> > > > > > And
> > > > > >     gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
> > > > > >
> > > > > > Besides redundance, based on code analysis, the redundance also raise
> > > > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes
> > > > > > rcu_is_cpu_rrupt_from_idle() unexpected.
> > > > >
> > > > > Hmmm...
> > > > >
> > > > > The fundamental questions are:
> > > > >
> > > > > 1) Who is supposed to be responsible for doing the rcu entry/exit?
> > > > >
> > > > > 2) Is it supposed to matter if this happens multiple times?
> > > > >
> > > > > For (1), I'd generally expect that this is supposed to happen in the
> > > > > arch/common entry code, since that itself (or the irqchip driver) could
> > > > > depend on RCU, and if that's the case then handle_domain_irq()
> > > > > shouldn't need to call rcu_irq_enter(). That would be consistent with
> > > > > the way we handle all other exceptions.
> > > > >
> > > > > For (2) I don't know whether the level of nesting is supposed to
> > > > > matter. I was under the impression it wasn't meant to matter in general,
> > > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
> > > > > specific level of nesting.
> > > > >
> > > > > From a glance it looks like this would cause rcu_sched_clock_irq() to
> > > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
> > > > > doesn't sound right, at least...
> > > > >
> > > > > Thomas, Paul, thoughts?
> > > >
> > > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls
> > > > be balanced. Normally, this is taken care of by the fact that irq_enter()
> > > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly,
> > > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit().
> > >
> > > Sure; I didn't mean to suggest those weren't balanced! The problem here
> > > is *nesting*. Due to the structure of our entry code and the core IRQ
> > > code, when handling an IRQ we have a sequence:
> > >
> > > 	irq_enter() // arch code
> > > 	irq_enter() // irq code
> > >
> > > 	< irq handler here >
> > >
> > > 	irq_exit() // irq code
> > > 	irq_exit() // arch code
> > >
> > > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the
> > > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the
> > > expected result because of the additional nesting, since
> > > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting
> > > is only incremented once per exception entry, when it does:
> > >
> > > 	/* Are we at first interrupt nesting level? */
> > > 	nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> > > 	if (nesting > 1)
> > > 		return false;
> > >
> > > What I'm trying to figure out is whether that expectation is legitimate,
> > > and assuming so, where the entry/exit should happen.
> >
> > Oooh...
> >
> > The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will
> > be unable to detect a userspace quiescent state for a non-nohz_full
> > CPU. That could result in RCU CPU stall warnings if a user task runs
> > continuously on a given CPU for more than 21 seconds (60 seconds in
> > some distros). And this can easily happen if the user has a CPU-bound
> > thread that is the only runnable task on that CPU.
> >
> > So, yes, this does need some sort of resolution.
> >
> > The traditional approach is (as you surmise) to have only a single call
> > to irq_enter() on exception entry and only a single call to irq_exit()
> > on exception exit. If this is feasible, it is highly recommended.
>
> Cool; that's roughly what I was expecting / hoping to hear!
>
> > In theory, we could have that "1" in "nesting > 1" be a constant supplied
> > by the architecture (you would want "3" if I remember correctly) but
> > in practice could we please avoid this? For one thing, if there is
> > some other path into the kernel for your architecture that does only a
> > single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand
> > a chance. It would need to compare against a different value depending
> > on what exception showed up. Even if that cannot happen, it would be
> > better if your architecture could remain in blissful ignorance of the
> > colorful details of ->dynticks_nmi_nesting manipulations.
>
> I completely agree. I think it's much harder to keep that in check than
> to enforce a "once per architectural exception" policy in the arch code.
>
> > Another approach would be for the arch code to supply RCU a function that
> > it calls. If there is such a function (or perhaps better, if some new
> > Kconfig option is enabled), RCU invokes it. Otherwise, it compares to
> > "1" as it does now. But you break it, you buy it! ;-)
>
> I guess we could look at the exception regs and inspect the original
> context, but it sounds like overkill...
>
> I think the cleanest thing is to leave this to arch code, and have the
> common IRQ code stay well clear. Unfortunately most architectures
> (including arch/arm) still need the common IRQ code to handle this, so
> we'll have to make that conditional on Kconfig, something like the below
> (build+boot tested only).
>
> If there are no objections, I'll go check who else needs the same
> treatment (IIUC at least s390 will), and spin that as a real
> patch/series.

Ah, looking again this is basically Pingfan's patch 2, so ignore the
below, and I'll review Pingfan's patch instead.

Thanks,
Mark.
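
For readers following the nesting argument, here is a minimal user-space sketch of why a second irq_enter() defeats the "nesting > 1" check quoted above. It is only a model, not kernel code: struct model_rcu_data, its in_eqs flag and every model_*() function are hypothetical names invented for illustration. The increments (1 on the first entry from an idle/user context, 2 on each nested entry) are an assumption chosen to agree with Paul's remark that the doubled entry would need a comparison against "3".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU RCU state; not the kernel's struct. */
struct model_rcu_data {
	long nmi_nesting;	/* models rcu_data.dynticks_nmi_nesting */
	bool in_eqs;		/* true while RCU regards the CPU as idle/user */
};

/* Models one irq_enter(): assumed +1 when leaving idle/user, +2 when nested. */
static void model_irq_enter(struct model_rcu_data *rdp)
{
	long incby = 2;

	if (rdp->in_eqs) {
		rdp->in_eqs = false;
		incby = 1;
	}
	rdp->nmi_nesting += incby;
}

/* Models one irq_exit(): undoes exactly one model_irq_enter(). */
static void model_irq_exit(struct model_rcu_data *rdp)
{
	assert(rdp->nmi_nesting > 0);	/* enter/exit calls must be balanced */

	if (rdp->nmi_nesting != 1) {
		rdp->nmi_nesting -= 2;	/* leaving a nested level */
		return;
	}
	rdp->nmi_nesting = 0;		/* leaving the outermost level */
	rdp->in_eqs = true;
}

/* Models the quoted "are we at first interrupt nesting level?" test. */
static bool model_first_level(const struct model_rcu_data *rdp)
{
	return !(rdp->nmi_nesting > 1);
}

/*
 * Handle one interrupt with `entries` stacked irq_enter() calls
 * (1 = arch code only, 2 = arch code plus common IRQ code) and return
 * what the check reports from inside the handler.
 */
static bool model_handle_irq(struct model_rcu_data *rdp, int entries)
{
	bool first;
	int i;

	for (i = 0; i < entries; i++)
		model_irq_enter(rdp);

	first = model_first_level(rdp);	/* e.g. from the tick handler */

	for (i = 0; i < entries; i++)
		model_irq_exit(rdp);

	return first;
}
```

Starting from a state with nmi_nesting = 0 and in_eqs = true, model_handle_irq(&rdp, 1) reports a first-level interrupt, while model_handle_irq(&rdp, 2) does not, because the counter reaches 3 inside the handler. In both cases the counter returns to 0 afterwards, which matches the point made in the thread: the calls are balanced, and the problem is purely the extra nesting.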