All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Pingfan Liu <kernelfans@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Joey Gouly <joey.gouly@arm.com>,
	Sami Tolvanen <samitolvanen@google.com>,
	Julien Thierry <julien.thierry@arm.com>,
	Yuichi Ito <ito-yuichi@fujitsu.com>,
	rcu@vger.kernel.org
Subject: Re: [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
Date: Fri, 19 Nov 2021 14:04:04 +0000	[thread overview]
Message-ID: <20211119140404.GC25912@lakrids.cambridge.arm.com> (raw)
In-Reply-To: <YZcFXJmrbY+oac0f@piliu.users.ipa.redhat.com>

On Fri, Nov 19, 2021 at 10:01:00AM +0800, Pingfan Liu wrote:
> On Wed, Nov 17, 2021 at 11:38:54AM +0000, Mark Rutland wrote:
> > On Tue, Nov 16, 2021 at 04:24:47PM +0800, Pingfan Liu wrote:
> > > Linux kernel places strict semantics between NMI and maskable interrupt.
> > > So does the RCU component, else deadlock may happen. But the current
> > > arm64 entry code can partially breach this rule through calling
> > > rcu_nmi_enter().
> > > 
> > > ***  how a deadlock can happen if NMI mistaken as IRQ  ***
> > > 
> > >     rcu_nmi_enter()
> > >     {
> > > 
> > >         if (rcu_dynticks_curr_cpu_in_eqs()) {
> > > 
> > >         	if (!in_nmi())
> > >         		rcu_dynticks_task_exit();
> > >                 ...
> > >         	if (!in_nmi()) {
> > >         		instrumentation_begin();
> > >         		rcu_cleanup_after_idle();
> > >         		instrumentation_end();
> > >         	}
> > >                 ...
> > >         } else if (!in_nmi()) {
> > >         	instrumentation_begin();
> > >         	rcu_irq_enter_check_tick();
> > >         }
> > > 
> > >     }
> > > 
> > > If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
> > > can hit a deadlock, which is demonstrated by the following scenario:
> > > 
> > >     note_gp_changes() runs in a task context
> > >     {
> > >        local_irq_save(flags); // this protects against irq, but not NMI
> > 
> > Note: speecifically, this masks IRQs via ICC_PMR_EL1.
> > 
> > >        rnp = rdp->mynode;
> > >        ...
> > >        raw_spin_trylock_rcu_node(rnp)
> > >                                   -------> broken in by (p)NMI, without taking __nmi_enter()
> > 
> > AFAICT the broken case described here *cannot* happen.
> > 
> > If we take a pNMI here, we'll go:
> > 
> >   el1h_64_irq()
> >   -> el1h64_irq_handler()
> >   -> el1_interrupt()
> > 
> > ... where el1_interrupt is:
> > 
> > | static void noinstr el1_interrupt(struct pt_regs *regs,
> > |                                   void (*handler)(struct pt_regs *)) 
> > | {
> > |         write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
> > | 
> > |         if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> > |                 __el1_pnmi(regs, handler);
> > |         else
> > |                 __el1_irq(regs, handler);
> > | } 
> > 
> > ... and interrupts_enabled() is:
> > 
> > | #define interrupts_enabled(regs)                        \
> > |        (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
> > 
> > ... and irqs_priority_unmasked is:
> > 
> > | #define irqs_priority_unmasked(regs)                                    \
> > |         (system_uses_irq_prio_masking() ?                               \
> > |                 (regs)->pmr_save == GIC_PRIO_IRQON :                    \
> > |                 true)
> > 
> > ... irqs_priority_unmasked(regs) *must* return false for this case,
> > since the local_irq_save() above must have set the PMR to
> > GIC_PRIO_IRQOFF in the interrupted context.
> > 
> > Thus, we *must* call __el1_pnmi() in this case, where __el1_pnmi() is:
> > 
> 
> Yes, you are right. And I made a mistake to think it will call
> __el1_irq()

> As above, I made a mistake about the condition hence the code path.
> There is no problem in this case through calling __el1_pnmi()
> 
> Sorry for the false alarm and thank you again for your detailed
> explaination.

No worries; thanks for confirming this was a false alarm. Glad we're now
on the same page. :)

Likewise, thanks for reporting what you thought was an issue; having
more eyes on this is always good!

Thanks,
Mark.

WARNING: multiple messages have this Message-ID (diff)
From: Mark Rutland <mark.rutland@arm.com>
To: Pingfan Liu <kernelfans@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Joey Gouly <joey.gouly@arm.com>,
	Sami Tolvanen <samitolvanen@google.com>,
	Julien Thierry <julien.thierry@arm.com>,
	Yuichi Ito <ito-yuichi@fujitsu.com>,
	rcu@vger.kernel.org
Subject: Re: [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
Date: Fri, 19 Nov 2021 14:04:04 +0000	[thread overview]
Message-ID: <20211119140404.GC25912@lakrids.cambridge.arm.com> (raw)
In-Reply-To: <YZcFXJmrbY+oac0f@piliu.users.ipa.redhat.com>

On Fri, Nov 19, 2021 at 10:01:00AM +0800, Pingfan Liu wrote:
> On Wed, Nov 17, 2021 at 11:38:54AM +0000, Mark Rutland wrote:
> > On Tue, Nov 16, 2021 at 04:24:47PM +0800, Pingfan Liu wrote:
> > > Linux kernel places strict semantics between NMI and maskable interrupt.
> > > So does the RCU component, else deadlock may happen. But the current
> > > arm64 entry code can partially breach this rule through calling
> > > rcu_nmi_enter().
> > > 
> > > ***  how a deadlock can happen if NMI mistaken as IRQ  ***
> > > 
> > >     rcu_nmi_enter()
> > >     {
> > > 
> > >         if (rcu_dynticks_curr_cpu_in_eqs()) {
> > > 
> > >         	if (!in_nmi())
> > >         		rcu_dynticks_task_exit();
> > >                 ...
> > >         	if (!in_nmi()) {
> > >         		instrumentation_begin();
> > >         		rcu_cleanup_after_idle();
> > >         		instrumentation_end();
> > >         	}
> > >                 ...
> > >         } else if (!in_nmi()) {
> > >         	instrumentation_begin();
> > >         	rcu_irq_enter_check_tick();
> > >         }
> > > 
> > >     }
> > > 
> > > If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
> > > can hit a deadlock, which is demonstrated by the following scenario:
> > > 
> > >     note_gp_changes() runs in a task context
> > >     {
> > >        local_irq_save(flags); // this protects against irq, but not NMI
> > 
> > Note: speecifically, this masks IRQs via ICC_PMR_EL1.
> > 
> > >        rnp = rdp->mynode;
> > >        ...
> > >        raw_spin_trylock_rcu_node(rnp)
> > >                                   -------> broken in by (p)NMI, without taking __nmi_enter()
> > 
> > AFAICT the broken case described here *cannot* happen.
> > 
> > If we take a pNMI here, we'll go:
> > 
> >   el1h_64_irq()
> >   -> el1h64_irq_handler()
> >   -> el1_interrupt()
> > 
> > ... where el1_interrupt is:
> > 
> > | static void noinstr el1_interrupt(struct pt_regs *regs,
> > |                                   void (*handler)(struct pt_regs *)) 
> > | {
> > |         write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
> > | 
> > |         if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> > |                 __el1_pnmi(regs, handler);
> > |         else
> > |                 __el1_irq(regs, handler);
> > | } 
> > 
> > ... and interrupts_enabled() is:
> > 
> > | #define interrupts_enabled(regs)                        \
> > |        (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
> > 
> > ... and irqs_priority_unmasked is:
> > 
> > | #define irqs_priority_unmasked(regs)                                    \
> > |         (system_uses_irq_prio_masking() ?                               \
> > |                 (regs)->pmr_save == GIC_PRIO_IRQON :                    \
> > |                 true)
> > 
> > ... irqs_priority_unmasked(regs) *must* return false for this case,
> > since the local_irq_save() above must have set the PMR to
> > GIC_PRIO_IRQOFF in the interrupted context.
> > 
> > Thus, we *must* call __el1_pnmi() in this case, where __el1_pnmi() is:
> > 
> 
> Yes, you are right. And I made a mistake to think it will call
> __el1_irq()

> As above, I made a mistake about the condition hence the code path.
> There is no problem in this case through calling __el1_pnmi()
> 
> Sorry for the false alarm and thank you again for your detailed
> explaination.

No worries; thanks for confirming this was a false alarm. Glad we're now
on the same page. :)

Likewise, thanks for reporting what you thought was an issue; having
more eyes on this is always good!

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-11-19 14:04 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-16  8:24 [PATCHv3 0/4] arm64: Fixes RCU deadlock due to a mistaken Pingfan Liu
2021-11-16  8:24 ` Pingfan Liu
2021-11-16  8:24 ` [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU Pingfan Liu
2021-11-16  8:24   ` Pingfan Liu
2021-11-17 11:38   ` Mark Rutland
2021-11-17 11:38     ` Mark Rutland
2021-11-19  2:01     ` Pingfan Liu
2021-11-19  2:01       ` Pingfan Liu
2021-11-19 14:04       ` Mark Rutland [this message]
2021-11-19 14:04         ` Mark Rutland
2021-11-16  8:24 ` [PATCHv3 2/4] arm64: entry: distinguish pNMI earlier in el0 interrupt Pingfan Liu
2021-11-16  8:24   ` Pingfan Liu
2021-11-16  8:24 ` [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator Pingfan Liu
2021-11-16  8:24   ` Pingfan Liu
2021-11-16  9:53   ` Marc Zyngier
2021-11-16  9:53     ` Marc Zyngier
2021-11-17 10:16     ` Pingfan Liu
2021-11-17 10:16       ` Pingfan Liu
2021-11-17 11:01       ` Marc Zyngier
2021-11-17 11:01         ` Marc Zyngier
2021-11-19  2:38         ` Pingfan Liu
2021-11-19  2:38           ` Pingfan Liu
2021-11-16  8:24 ` [PATCHv3 4/4] arm64: entry: remove pNMI judgement in __el1_interrupt() path Pingfan Liu
2021-11-16  8:24   ` Pingfan Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211119140404.GC25912@lakrids.cambridge.arm.com \
    --to=mark.rutland@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=ito-yuichi@fujitsu.com \
    --cc=joey.gouly@arm.com \
    --cc=julien.thierry@arm.com \
    --cc=kernelfans@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=maz@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=samitolvanen@google.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.