From: Pingfan Liu <kernelfans@gmail.com>
To: Mark Rutland <mark.rutland@arm.com>,
	"Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Joey Gouly <joey.gouly@arm.com>,
	Sami Tolvanen <samitolvanen@google.com>,
	Julien Thierry <julien.thierry@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Yuichi Ito <ito-yuichi@fujitsu.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead
Date: Fri, 8 Oct 2021 22:55:04 +0800
Message-ID: <YWBbyPJPpt5zgj+b@piliu.users.ipa.redhat.com>
In-Reply-To: <YV/ClUNWvMga3qud@piliu.users.ipa.redhat.com>

On Fri, Oct 08, 2021 at 12:01:25PM +0800, Pingfan Liu wrote:
> Sorry that I missed this message and I am just back from a long
> festival.
>
> Adding Paul for RCU guidance.
>
> On Thu, Sep 30, 2021 at 02:32:57PM +0100, Mark Rutland wrote:
> > On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > > > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > > > In enter_el1_irq_or_nmi(), it can be the case which NMI interrupts an
> > > > > irq, which makes the condition !interrupts_enabled(regs) fail to detect
> > > > > the NMI. This will cause a mistaken account for irq.
> > > >
> > > Sorry about the confusing word "account", it should be "lockdep/rcu/.."
> > >
> > > > Can you please explain this in more detail? It's not clear which
> > > > specific case you mean when you say "NMI interrupts an irq", as that
> > > > could mean a number of distinct scenarios.
> > > >
> > > > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > > > causes a new exception we'll do the right thing. So either I'm missing a
> > > > subtlety or you're describing a different scenario..
> > > >
> > > > Note that the entry code is only trying to distinguish between:
> > > >
> > > > a) This exception is *definitely* an NMI (because regular interrupts
> > > >    were masked).
> > > >
> > > > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > > >    distinguished until we acknowledge the interrupt), so we treat it as
> > > >    an IRQ for now.
> > > >
> > > b) is the aim.
> > >
> > > At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode()->rcu_irq_enter()/rcu_irq_enter_check_tick() etc.
> > > While at irqchip level, gic_handle_irq()->gic_handle_nmi()->nmi_enter(),
> > > which does not call rcu_irq_enter_check_tick(). So it is not proper to
> > > "treat it as an IRQ for now"
> >
> > I'm struggling to understand the problem here. What is "not proper", and
> > why?
> >
> > Do you think there's a correctness problem, or that we're doing more
> > work than necessary?
> >
> I had thought it just did redundant accounting. But after revisiting RCU
> code, I think it confronts a real bug.
>
> > If you could give a specific example of a problem, it would really help.
> >
> Refer to rcu_nmi_enter(), which can be called by
> enter_from_kernel_mode():
>
> ||noinstr void rcu_nmi_enter(void)
> ||{
> ||	...
> ||	if (rcu_dynticks_curr_cpu_in_eqs()) {
> ||
> ||		if (!in_nmi())
> ||			rcu_dynticks_task_exit();
> ||
> ||		// RCU is not watching here ...
> ||		rcu_dynticks_eqs_exit();
> ||		// ... but is watching here.
> ||
> ||		if (!in_nmi()) {
> ||			instrumentation_begin();
> ||			rcu_cleanup_after_idle();
> ||			instrumentation_end();
> ||		}
> ||
> ||		instrumentation_begin();
> ||		// instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
> ||		instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
> ||		// instrumentation for the noinstr rcu_dynticks_eqs_exit()
> ||		instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
> ||
> ||		incby = 1;
> ||	} else if (!in_nmi()) {
> ||		instrumentation_begin();
> ||		rcu_irq_enter_check_tick();
> ||	} else {
> ||		instrumentation_begin();
> ||	}
> ||	...
> ||}
>

I forgot to supply the context needed to understand the case: On arm64,
at present, a pNMI (akin to NMI) may call rcu_nmi_enter() without
calling "__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);". As a
result, it can be mistaken for a normal interrupt in rcu_nmi_enter(),
which may cause the following issue:

> There are 3 pieces of code put under the
> protection of if (!in_nmi()). At least the last one
> "rcu_irq_enter_check_tick()" can trigger a hard lock up bug. Because it
> is supposed to hold a spin lock with irqoff by
> "raw_spin_lock_rcu_node(rdp->mynode)", but pNMI can breach it. The same
> scenario in rcu_nmi_exit()->rcu_prepare_for_idle().
>
> As for the first two "if (!in_nmi())", I have no idea of why, except
> breaching spin_lock_irq() by NMI. Hope Paul can give some guide.
>
>
> Thanks,
>
> 	Pingfan
>
>
> > I'm aware that we do more work than strictly necessary when we take a
> > pNMI from a context with IRQs enabled, but that's how we'd intended this
> > to work, as it's vastly simpler to manage the state that way. Unless
> > there's a real problem with that approach I'd prefer to leave it as-is.
> >
> > Thanks,
> > Mark.
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
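
The scenario described in the message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every name prefixed with model_ or MODEL_ is hypothetical, the bit positions are chosen for illustration (the real layout is defined in include/linux/preempt.h), and the "node lock" is reduced to a boolean. The model also ignores the other early-exit conditions that the real rcu_irq_enter_check_tick() appears to have. It only shows why the "!in_nmi()" test in the quoted rcu_nmi_enter() takes the lock-acquiring branch when NMI_OFFSET has not yet been added to the preempt count.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. The bit positions below are assumptions made for
 * illustration; the kernel's real values live in include/linux/preempt.h.
 */
#define MODEL_HARDIRQ_OFFSET (1u << 16)
#define MODEL_NMI_OFFSET     (1u << 20)
#define MODEL_NMI_MASK       (0xfu << 20)

/* Which branch the modelled entry code ends up in (hypothetical names). */
enum model_path {
	MODEL_PATH_NMI_SAFE,   /* in_nmi() was true: lock-taking branch skipped */
	MODEL_PATH_TAKES_LOCK, /* treated as an IRQ, node lock was free */
	MODEL_PATH_DEADLOCK,   /* treated as an IRQ, lock already held on this CPU */
};

/* Model of in_nmi(): true only once the NMI bits of the count are set. */
static bool model_in_nmi(unsigned int preempt_count)
{
	return (preempt_count & MODEL_NMI_MASK) != 0;
}

/* Model of the accounting step the message says is missing for a pNMI. */
static unsigned int model_nmi_enter(unsigned int preempt_count)
{
	return preempt_count + MODEL_NMI_OFFSET + MODEL_HARDIRQ_OFFSET;
}

/*
 * Model of the non-EQS branch of the quoted rcu_nmi_enter().
 * node_lock_held says whether the interrupted context on this CPU
 * already holds the lock that the IRQ-only branch would acquire.
 */
static enum model_path model_rcu_nmi_enter(unsigned int preempt_count,
					   bool node_lock_held)
{
	if (model_in_nmi(preempt_count))
		return MODEL_PATH_NMI_SAFE;
	/* Stands in for rcu_irq_enter_check_tick() taking the node lock. */
	if (node_lock_held)
		return MODEL_PATH_DEADLOCK;
	return MODEL_PATH_TAKES_LOCK;
}

/* Checks the two cases discussed in the message. */
static void model_self_check(void)
{
	/* pNMI arrives before NMI_OFFSET is added, lock held: would spin forever. */
	assert(model_rcu_nmi_enter(0, true) == MODEL_PATH_DEADLOCK);
	/* Same interrupt, but accounted as an NMI first: branch is skipped. */
	assert(model_rcu_nmi_enter(model_nmi_enter(0), true) == MODEL_PATH_NMI_SAFE);
}
```

Calling model_self_check() from a main() exercises both cases. In the model, the only difference between the two outcomes is whether model_nmi_enter() ran before model_rcu_nmi_enter(), which mirrors the ordering concern raised about enter_from_kernel_mode() versus gic_handle_nmi()->nmi_enter().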