From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 29B18C433F5
	for <linux-kernel@archiver.kernel.org>; Fri,  8 Oct 2021 17:25:28 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 0377A60F6B
	for <linux-kernel@archiver.kernel.org>; Fri,  8 Oct 2021 17:25:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232682AbhJHR1W (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Fri, 8 Oct 2021 13:27:22 -0400
Received: from foss.arm.com ([217.140.110.172]:37642 "EHLO foss.arm.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S229606AbhJHR1U (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 8 Oct 2021 13:27:20 -0400
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
        by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0151C1063;
        Fri,  8 Oct 2021 10:25:25 -0700 (PDT)
Received: from C02TD0UTHF1T.local (unknown [10.57.27.111])
        by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 80B7F3F766;
        Fri,  8 Oct 2021 10:25:20 -0700 (PDT)
Date:   Fri, 8 Oct 2021 18:25:13 +0100
From:   Mark Rutland <mark.rutland@arm.com>
To:     Pingfan Liu <kernelfans@gmail.com>
Cc:     "Paul E. McKenney" <paulmck@kernel.org>,
        linux-arm-kernel@lists.infradead.org,
        Catalin Marinas <catalin.marinas@arm.com>,
        Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
        Joey Gouly <joey.gouly@arm.com>,
        Sami Tolvanen <samitolvanen@google.com>,
        Julien Thierry <julien.thierry@arm.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Yuichi Ito <ito-yuichi@fujitsu.com>,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead
Message-ID: <20211008172513.GD976@C02TD0UTHF1T.local>
References: <20210924132837.45994-1-kernelfans@gmail.com>
 <20210924132837.45994-2-kernelfans@gmail.com>
 <20210924175306.GB42068@C02TD0UTHF1T.local>
 <YU9Cy9kTew4ySeGZ@piliu.users.ipa.redhat.com>
 <20210930133257.GB18258@lakrids.cambridge.arm.com>
 <YV/ClUNWvMga3qud@piliu.users.ipa.redhat.com>
 <YWBbyPJPpt5zgj+b@piliu.users.ipa.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <YWBbyPJPpt5zgj+b@piliu.users.ipa.redhat.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 08, 2021 at 10:55:04PM +0800, Pingfan Liu wrote:
> On Fri, Oct 08, 2021 at 12:01:25PM +0800, Pingfan Liu wrote:
> > Sorry that I missed this message and I am just back from a long
> > festival.
> > 
> > Adding Paul for RCU guidance.
> > 
> > On Thu, Sep 30, 2021 at 02:32:57PM +0100, Mark Rutland wrote:
> > > On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> > > > On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > > > > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > > > > In enter_el1_irq_or_nmi(), it can be the case which NMI interrupts an
> > > > > > irq, which makes the condition !interrupts_enabled(regs) fail to detect
> > > > > > the NMI. This will cause a mistaken account for irq.
> > > > > 
> > > > Sorry about the confusing word "account", it should be "lockdep/rcu/.."
> > > > 
> > > > > Can you please explain this in more detail? It's not clear which
> > > > > specific case you mean when you say "NMI interrupts an irq", as that
> > > > > could mean a number of distinct scenarios.
> > > > > 
> > > > > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > > > > causes a new exception we'll do the right thing. So either I'm missing a
> > > > > subtlety or you're describing a different scenario..
> > > > > 
> > > > > Note that the entry code is only trying to distinguish between:
> > > > > 
> > > > > a) This exception is *definitely* an NMI (because regular interrupts
> > > > >    were masked).
> > > > > 
> > > > > > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > > > >    distinguished until we acknowledge the interrupt), so we treat it as
> > > > >    an IRQ for now.
> > > > > 
> > > > b) is the aim.
> > > > 
> > > > At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode()->rcu_irq_enter()/rcu_irq_enter_check_tick() etc.
> > > > While at irqchip level, gic_handle_irq()->gic_handle_nmi()->nmi_enter(),
> > > > which does not call rcu_irq_enter_check_tick(). So it is not proper to
> > > > "treat it as an IRQ for now"
> > > 
> > > I'm struggling to understand the problem here. What is "not proper", and
> > > why?
> > > 
> > > Do you think there's a correctness problem, or that we're doing more
> > > work than necessary? 
> > > 
> > I had thought it just did redundant accounting. But after revisiting RCU
> > code, I think it confronts a real bug.
> > 
> > > If you could give a specific example of a problem, it would really help.
> > > 
> > Refer to rcu_nmi_enter(), which can be called by
> > enter_from_kernel_mode():
> > 
> > ||noinstr void rcu_nmi_enter(void)
> > ||{
> > ||        ...
> > ||        if (rcu_dynticks_curr_cpu_in_eqs()) {
> > ||
> > ||                if (!in_nmi())
> > ||                        rcu_dynticks_task_exit();
> > ||
> > ||                // RCU is not watching here ...
> > ||                rcu_dynticks_eqs_exit();
> > ||                // ... but is watching here.
> > ||
> > ||                if (!in_nmi()) {
> > ||                        instrumentation_begin();
> > ||                        rcu_cleanup_after_idle();
> > ||                        instrumentation_end();
> > ||                }
> > ||
> > ||                instrumentation_begin();
> > ||                // instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
> > ||                instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
> > ||                // instrumentation for the noinstr rcu_dynticks_eqs_exit()
> > ||                instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
> > ||
> > ||                incby = 1;
> > ||        } else if (!in_nmi()) {
> > ||                instrumentation_begin();
> > ||                rcu_irq_enter_check_tick();
> > ||        } else  {
> > ||                instrumentation_begin();
> > ||        }
> > ||        ...
> > ||}
> > 
> 
> Forgot to supply the context for understanding the case:
>   On arm64, at present, a pNMI (akin to NMI) may call rcu_nmi_enter()
>   without calling "__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);".
>   As a result it can be mistaken for a normal interrupt in
>   rcu_nmi_enter().

I appreciate that there's a window where we treat the pNMI like an IRQ,
but that's by design, and we account for this in gic_handle_irq() and
gic_handle_nmi() where we "upgrade" to NMI context with
nmi_enter()..nmi_exit().

The idea is that we have two cases: 

1) If we take a pNMI from a context where IRQs were masked, we know it
   must be a pNMI, and perform the NMI entry immediately to avoid
   reentrancy problems. 

   I think we're all happy with this case.

2) If we take a pNMI from a context where IRQs were unmasked, we don't know
   whether the trigger was a pNMI/IRQ until we read from the GIC, and
   since we *could* have taken an IRQ, this is equivalent to taking a
   spurious IRQ, and while handling that, taking the NMI, e.g.
   
   < run with IRQs unmasked >
     ~~~ take IRQ ~~~
     < enter IRQ >
       ~~~ take NMI exception ~~~
       < enter NMI >
       < handle NMI >
       < exit NMI > 
       ~~~ return from NMI exception ~~~
     < handle IRQ / spurious / do-nothing >
     < exit IRQ >
     ~~~ return from IRQ exception ~~~
   < continue running with IRQs unmasked >

   ... except that we don't do the HW NMI exception entry/exit, just all
   the necessary SW accounting.


Note that case (2) can *never* nest within itself or within case (1).
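
As a rough illustration of the two cases, here is a user-space toy model
(not kernel code). Every model_* name and both offset values are
hypothetical stand-ins chosen for this sketch; only the ordering of the
SW accounting is meant to mirror the description above.

```c
/* Toy user-space model of the two entry cases; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NMI/hardirq bits in preempt_count. */
#define MODEL_HARDIRQ_OFFSET 0x1u
#define MODEL_NMI_OFFSET     0x100u

static unsigned int model_preempt_count;
static int model_irq_path_checks; /* times the !in_nmi() branch ran */

static bool model_in_nmi(void)
{
	return (model_preempt_count & MODEL_NMI_OFFSET) != 0;
}

static void model_nmi_enter(void)
{
	model_preempt_count += MODEL_NMI_OFFSET + MODEL_HARDIRQ_OFFSET;
}

static void model_nmi_exit(void)
{
	model_preempt_count -= MODEL_NMI_OFFSET + MODEL_HARDIRQ_OFFSET;
}

/* Stand-in for the RCU entry hook: IRQ-only work runs when !in_nmi(). */
static void model_rcu_enter(void)
{
	if (!model_in_nmi())
		model_irq_path_checks++;
}

/*
 * Returns true when the exception was treated as an NMI from the start
 * (case 1), false when it was first treated as an IRQ (case 2).
 */
static bool model_entry(bool irqs_were_masked, bool gic_reports_nmi)
{
	if (irqs_were_masked) {
		/* Case 1: must be a pNMI, so do the NMI entry immediately. */
		model_nmi_enter();
		model_rcu_enter();
		model_nmi_exit();
		return true;
	}

	/* Case 2: could be an IRQ, so do the IRQ accounting first ... */
	model_rcu_enter();
	if (gic_reports_nmi) {
		/* ... and "upgrade" once the GIC says it was a pNMI. */
		model_nmi_enter();
		model_rcu_enter();
		model_nmi_exit();
	}
	return false;
}
```

In this model the IRQ-only branch runs exactly once per case (2) entry
and never in case (1), and the count is balanced again on return.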

Do you have a specific example of something that goes wrong with the
above? e.g. something that's inconsistent with that rationale?

> And this may cause the following issue:
> > There are 3 pieces of code put under the
> > protection of if (!in_nmi()). At least the last one
> > "rcu_irq_enter_check_tick()" can trigger a hard lock up bug. Because it
> > is supposed to hold a spin lock with irqoff by
> > "raw_spin_lock_rcu_node(rdp->mynode)", but pNMI can breach it. The same
> > scenario in rcu_nmi_exit()->rcu_prepare_for_idle().
> > 
> > As for the first two "if (!in_nmi())", I have no idea of why, except
> > breaching spin_lock_irq() by NMI. Hope Paul can give some guide.

That code (in enter_from_kernel_mode()) only runs in case 2, where it
cannot be nested within a pNMI, so I struggle to see how this can
deadlock. If it can, then I would expect the general case of a pNMI
nesting within an IRQ would be broken?

Can you give a concrete example of a sequence that would lock up?
Currently I can't see how that's possible.
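
To make the reasoning concrete, here is a small user-space sketch (not
kernel code). The toy_* names are hypothetical, and the model assumes,
per the description quoted above, that the node lock is only ever held
with IRQs masked.

```c
/* Toy model of the locking argument; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state of the interrupted context. */
struct toy_cpu {
	bool irqs_masked;
	bool node_lock_held;
};

/* Assumption: the lock is always taken with IRQs masked. */
static void toy_lock_irqsave(struct toy_cpu *cpu)
{
	cpu->irqs_masked = true;
	cpu->node_lock_held = true;
}

static void toy_unlock_irqrestore(struct toy_cpu *cpu)
{
	cpu->node_lock_held = false;
	cpu->irqs_masked = false;
}

/*
 * Returns true if an exception taken in this state would try to take
 * a lock that the interrupted context already holds.
 */
static bool toy_entry_would_deadlock(const struct toy_cpu *cpu)
{
	/* Case 1: IRQs were masked, so this is the NMI path, where
	 * in_nmi() is set and the lock-taking branch is skipped. */
	if (cpu->irqs_masked)
		return false;

	/* Case 2: the IRQ path may take the lock, so it would only
	 * deadlock if the interrupted context held it. */
	return cpu->node_lock_held;
}
```

Under that assumption, no state reachable via toy_lock_irqsave() and
toy_unlock_irqrestore() makes toy_entry_would_deadlock() return true.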

Thanks,
Mark.
