All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv3 0/4] arm64: Fixes RCU deadlock due to a mistaken
@ 2021-11-16  8:24 ` Pingfan Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Mark Rutland, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

It was [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead
[1], but as discussion, I found it hit a deadlock, so change subject in
v3.

Linux kernel places strict semantics between NMI and maskable interrupt.
So does the RCU component, else deadlock may happen.

The current arm64 entry code can partially breach this rule through
calling rcu_nmi_enter() in pNMI conext without calling __nmi_enter().
Finally, it calls __rcu_irq_enter_check_tick() trying to hold
raw_spin_lock_rcu_node(rdp->mynode), but this spinlock can already been
held by a task running note_gp_changes()->raw_spin_trylock_rcu_node(rnp).
Then deadlock happens.

The organization of this series:
  [1/4] aims to fix this bug
  [2-3/4] ease the implementation of [1/4]
  [4/4] is a cleanup

To do:
  Does arm need to handle pNMI as arm64?

v2 -> v3:
  drop handle_arch_nmi interface

[1]: https://lore.kernel.org/linux-arm-kernel/20210924132837.45994-2-kernelfans@gmail.com/

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org

Pingfan Liu (4):
  arm64: entry: judge nmi ealier to avoid deadlock in RCU
  arm64: entry: distinguish pNMI earlier in el0 interrupt
  irqchip: GICv3: expose pNMI discriminator
  arm64: entry: remove pNMI judgement in __el1_interrupt() path

 arch/arm64/kernel/entry-common.c | 55 +++++++++++++++++++++-----------
 arch/arm64/kernel/irq.c          | 15 +++++++++
 drivers/irqchip/irq-gic-v3.c     | 16 ++++++----
 3 files changed, 62 insertions(+), 24 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv3 0/4] arm64: Fixes RCU deadlock due to a mistaken
@ 2021-11-16  8:24 ` Pingfan Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Mark Rutland, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

It was [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead
[1], but as discussion, I found it hit a deadlock, so change subject in
v3.

Linux kernel places strict semantics between NMI and maskable interrupt.
So does the RCU component, else deadlock may happen.

The current arm64 entry code can partially breach this rule through
calling rcu_nmi_enter() in pNMI conext without calling __nmi_enter().
Finally, it calls __rcu_irq_enter_check_tick() trying to hold
raw_spin_lock_rcu_node(rdp->mynode), but this spinlock can already been
held by a task running note_gp_changes()->raw_spin_trylock_rcu_node(rnp).
Then deadlock happens.

The organization of this series:
  [1/4] aims to fix this bug
  [2-3/4] ease the implementation of [1/4]
  [4/4] is a cleanup

To do:
  Does arm need to handle pNMI as arm64?

v2 -> v3:
  drop handle_arch_nmi interface

[1]: https://lore.kernel.org/linux-arm-kernel/20210924132837.45994-2-kernelfans@gmail.com/

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org

Pingfan Liu (4):
  arm64: entry: judge nmi ealier to avoid deadlock in RCU
  arm64: entry: distinguish pNMI earlier in el0 interrupt
  irqchip: GICv3: expose pNMI discriminator
  arm64: entry: remove pNMI judgement in __el1_interrupt() path

 arch/arm64/kernel/entry-common.c | 55 +++++++++++++++++++++-----------
 arch/arm64/kernel/irq.c          | 15 +++++++++
 drivers/irqchip/irq-gic-v3.c     | 16 ++++++----
 3 files changed, 62 insertions(+), 24 deletions(-)

-- 
2.31.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
  2021-11-16  8:24 ` Pingfan Liu
@ 2021-11-16  8:24   ` Pingfan Liu
  -1 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Mark Rutland, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

Linux kernel places strict semantics between NMI and maskable interrupt.
So does the RCU component, else deadlock may happen. But the current
arm64 entry code can partially breach this rule through calling
rcu_nmi_enter().

***  how a deadlock can happen if NMI mistaken as IRQ  ***

    rcu_nmi_enter()
    {

        if (rcu_dynticks_curr_cpu_in_eqs()) {

        	if (!in_nmi())
        		rcu_dynticks_task_exit();
                ...
        	if (!in_nmi()) {
        		instrumentation_begin();
        		rcu_cleanup_after_idle();
        		instrumentation_end();
        	}
                ...
        } else if (!in_nmi()) {
        	instrumentation_begin();
        	rcu_irq_enter_check_tick();
        }

    }

If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
can hit a deadlock, which is demonstrated by the following scenario:

    note_gp_changes() runs in a task context
    {
       local_irq_save(flags); // this protects against irq, but not NMI
       rnp = rdp->mynode;
       ...
       raw_spin_trylock_rcu_node(rnp)
                                  -------> broken in by (p)NMI, without taking __nmi_enter()
                                           rcu_nmi_enter()
                                            ->__rcu_irq_enter_check_tick()
                                               ->raw_spin_lock_rcu_node(rdp->mynode);
                                                   deadlock happens!!!

    }

*** On arm64, how pNMI mistaken as IRQ ***

On arm64, pNMI is an analogue to NMI. In essence, it is a higher
priority interrupt but not disabled by local_irq_disable().

In current implementation
1) If a pNMI from a context where IRQs were masked, it can be recognized
   as nmi, and calls __nmi_enter() immediately. This is no problem.

2) But it causes trouble if a pNMI from a context where IRQs were
   unmasked, and temporarily regarded as maskable interrupt.  It is not
   treated as NMI, i.e. calling nmi_enter() until reading from GIC.

    __el1_irq()
    {
      irq_enter_rcu()   ----> hit the deadlock bug
        gic_handle_nmi()
          nmi_enter()
          nmi_exit()
      irq_exit_rcu()
    }

*** Remedy ***

If the irqchip level exposes an interface for detecting pNMI to arch level
code, it can meet the requirement at this early stage. That is the
interface (*interrupt_is_nmi)() in this patch.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/include/asm/irq.h     |  1 +
 arch/arm64/kernel/entry-common.c | 10 +++++++++-
 arch/arm64/kernel/irq.c          | 18 ++++++++++++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index fac08e18bcd5..f3eb13bfa65e 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -11,6 +11,7 @@ struct pt_regs;
 int set_handle_irq(void (*handle_irq)(struct pt_regs *));
 #define set_handle_irq	set_handle_irq
 int set_handle_fiq(void (*handle_fiq)(struct pt_regs *));
+int set_nmi_discriminator(bool (*discriminator)(void));
 
 static inline int nr_legacy_irqs(void)
 {
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index f7408edf8571..5a1a5dd66d04 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -260,6 +260,14 @@ static void do_interrupt_handler(struct pt_regs *regs,
 
 extern void (*handle_arch_irq)(struct pt_regs *);
 extern void (*handle_arch_fiq)(struct pt_regs *);
+extern bool (*interrupt_is_nmi)(void);
+
+static inline bool is_in_pnmi(struct pt_regs *regs)
+{
+	if (!interrupts_enabled(regs) || (*interrupt_is_nmi)())
+		return true;
+	return false;
+}
 
 static void noinstr __panic_unhandled(struct pt_regs *regs, const char *vector,
 				      unsigned int esr)
@@ -454,7 +462,7 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
 {
 	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
 
-	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
+	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && is_in_pnmi(regs))
 		__el1_pnmi(regs, handler);
 	else
 		__el1_irq(regs, handler);
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index bda49430c9ea..fabed09ed966 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -81,8 +81,14 @@ static void default_handle_fiq(struct pt_regs *regs)
 	panic("FIQ taken without a root FIQ handler\n");
 }
 
+static bool default_nmi_discriminator(void)
+{
+	return false;
+}
+
 void (*handle_arch_irq)(struct pt_regs *) __ro_after_init = default_handle_irq;
 void (*handle_arch_fiq)(struct pt_regs *) __ro_after_init = default_handle_fiq;
+bool (*interrupt_is_nmi)(void) __ro_after_init = default_nmi_discriminator;
 
 int __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
 {
@@ -104,6 +110,18 @@ int __init set_handle_fiq(void (*handle_fiq)(struct pt_regs *))
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI)
+
+int __init set_nmi_discriminator(bool (*discriminator)(void))
+{
+	if (interrupt_is_nmi != default_nmi_discriminator)
+		return -EBUSY;
+
+	interrupt_is_nmi = discriminator;
+	return 0;
+}
+#endif
+
 void __init init_IRQ(void)
 {
 	init_irq_stacks();
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
@ 2021-11-16  8:24   ` Pingfan Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Mark Rutland, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

Linux kernel places strict semantics between NMI and maskable interrupt.
So does the RCU component, else deadlock may happen. But the current
arm64 entry code can partially breach this rule through calling
rcu_nmi_enter().

***  how a deadlock can happen if NMI mistaken as IRQ  ***

    rcu_nmi_enter()
    {

        if (rcu_dynticks_curr_cpu_in_eqs()) {

        	if (!in_nmi())
        		rcu_dynticks_task_exit();
                ...
        	if (!in_nmi()) {
        		instrumentation_begin();
        		rcu_cleanup_after_idle();
        		instrumentation_end();
        	}
                ...
        } else if (!in_nmi()) {
        	instrumentation_begin();
        	rcu_irq_enter_check_tick();
        }

    }

If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
can hit a deadlock, which is demonstrated by the following scenario:

    note_gp_changes() runs in a task context
    {
       local_irq_save(flags); // this protects against irq, but not NMI
       rnp = rdp->mynode;
       ...
       raw_spin_trylock_rcu_node(rnp)
                                  -------> broken in by (p)NMI, without taking __nmi_enter()
                                           rcu_nmi_enter()
                                            ->__rcu_irq_enter_check_tick()
                                               ->raw_spin_lock_rcu_node(rdp->mynode);
                                                   deadlock happens!!!

    }

*** On arm64, how pNMI mistaken as IRQ ***

On arm64, pNMI is an analogue to NMI. In essence, it is a higher
priority interrupt but not disabled by local_irq_disable().

In current implementation
1) If a pNMI from a context where IRQs were masked, it can be recognized
   as nmi, and calls __nmi_enter() immediately. This is no problem.

2) But it causes trouble if a pNMI from a context where IRQs were
   unmasked, and temporarily regarded as maskable interrupt.  It is not
   treated as NMI, i.e. calling nmi_enter() until reading from GIC.

    __el1_irq()
    {
      irq_enter_rcu()   ----> hit the deadlock bug
        gic_handle_nmi()
          nmi_enter()
          nmi_exit()
      irq_exit_rcu()
    }

*** Remedy ***

If the irqchip level exposes an interface for detecting pNMI to arch level
code, it can meet the requirement at this early stage. That is the
interface (*interrupt_is_nmi)() in this patch.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/include/asm/irq.h     |  1 +
 arch/arm64/kernel/entry-common.c | 10 +++++++++-
 arch/arm64/kernel/irq.c          | 18 ++++++++++++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index fac08e18bcd5..f3eb13bfa65e 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -11,6 +11,7 @@ struct pt_regs;
 int set_handle_irq(void (*handle_irq)(struct pt_regs *));
 #define set_handle_irq	set_handle_irq
 int set_handle_fiq(void (*handle_fiq)(struct pt_regs *));
+int set_nmi_discriminator(bool (*discriminator)(void));
 
 static inline int nr_legacy_irqs(void)
 {
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index f7408edf8571..5a1a5dd66d04 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -260,6 +260,14 @@ static void do_interrupt_handler(struct pt_regs *regs,
 
 extern void (*handle_arch_irq)(struct pt_regs *);
 extern void (*handle_arch_fiq)(struct pt_regs *);
+extern bool (*interrupt_is_nmi)(void);
+
+static inline bool is_in_pnmi(struct pt_regs *regs)
+{
+	if (!interrupts_enabled(regs) || (*interrupt_is_nmi)())
+		return true;
+	return false;
+}
 
 static void noinstr __panic_unhandled(struct pt_regs *regs, const char *vector,
 				      unsigned int esr)
@@ -454,7 +462,7 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
 {
 	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
 
-	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
+	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && is_in_pnmi(regs))
 		__el1_pnmi(regs, handler);
 	else
 		__el1_irq(regs, handler);
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index bda49430c9ea..fabed09ed966 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -81,8 +81,14 @@ static void default_handle_fiq(struct pt_regs *regs)
 	panic("FIQ taken without a root FIQ handler\n");
 }
 
+static bool default_nmi_discriminator(void)
+{
+	return false;
+}
+
 void (*handle_arch_irq)(struct pt_regs *) __ro_after_init = default_handle_irq;
 void (*handle_arch_fiq)(struct pt_regs *) __ro_after_init = default_handle_fiq;
+bool (*interrupt_is_nmi)(void) __ro_after_init = default_nmi_discriminator;
 
 int __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
 {
@@ -104,6 +110,18 @@ int __init set_handle_fiq(void (*handle_fiq)(struct pt_regs *))
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI)
+
+int __init set_nmi_discriminator(bool (*discriminator)(void))
+{
+	if (interrupt_is_nmi != default_nmi_discriminator)
+		return -EBUSY;
+
+	interrupt_is_nmi = discriminator;
+	return 0;
+}
+#endif
+
 void __init init_IRQ(void)
 {
 	init_irq_stacks();
-- 
2.31.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv3 2/4] arm64: entry: distinguish pNMI earlier in el0 interrupt
  2021-11-16  8:24 ` Pingfan Liu
@ 2021-11-16  8:24   ` Pingfan Liu
  -1 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Mark Rutland, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

For ease of unifying code, it is helpful to lift nmi_{enter,exit}()
housekeeping from gic_handle_nmi() to el0_interrupt(). Because
gic_handle_nmi() is called by either el1 interrupt or el0, and the
housekeeping has already been done in arch level code when el1
interrupt.

Note about the original code, which calls enter_from_user_mode() in pNMI
context. Although it is weird to call rcu_eqs_exit() in the pseudo NMI
context, it has no problem. This is due to the essentiality of pNMI, a
higher priority interrupt but not akin to NMI, which allows a break-in
at any time.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/kernel/entry-common.c | 35 ++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 5a1a5dd66d04..afcde43f1b73 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -429,7 +429,7 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
 	}
 }
 
-static __always_inline void __el1_pnmi(struct pt_regs *regs,
+static __always_inline void __pnmi_handler_common(struct pt_regs *regs,
 				       void (*handler)(struct pt_regs *))
 {
 	arm64_enter_nmi(regs);
@@ -437,6 +437,12 @@ static __always_inline void __el1_pnmi(struct pt_regs *regs,
 	arm64_exit_nmi(regs);
 }
 
+static __always_inline void __el1_pnmi(struct pt_regs *regs,
+				       void (*handler)(struct pt_regs *))
+{
+	__pnmi_handler_common(regs, handler);
+}
+
 static __always_inline void __el1_irq(struct pt_regs *regs,
 				      void (*handler)(struct pt_regs *))
 {
@@ -673,21 +679,34 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
 	}
 }
 
-static void noinstr el0_interrupt(struct pt_regs *regs,
-				  void (*handler)(struct pt_regs *))
+static __always_inline void __el0_pnmi(struct pt_regs *regs,
+				       void (*handler)(struct pt_regs *))
+{
+	__pnmi_handler_common(regs, handler);
+}
+
+static __always_inline void __el0_irq(struct pt_regs *regs,
+				       void (*handler)(struct pt_regs *))
 {
 	enter_from_user_mode(regs);
+	irq_enter_rcu();
+	do_interrupt_handler(regs, handler);
+	irq_exit_rcu();
+	exit_to_user_mode(regs);
+}
 
+static void noinstr el0_interrupt(struct pt_regs *regs,
+				  void (*handler)(struct pt_regs *))
+{
 	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
 
 	if (regs->pc & BIT(55))
 		arm64_apply_bp_hardening();
 
-	irq_enter_rcu();
-	do_interrupt_handler(regs, handler);
-	irq_exit_rcu();
-
-	exit_to_user_mode(regs);
+	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && is_in_pnmi(regs))
+		__el0_pnmi(regs, handler);
+	else
+		__el0_irq(regs, handler);
 }
 
 static void noinstr __el0_irq_handler_common(struct pt_regs *regs)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv3 2/4] arm64: entry: distinguish pNMI earlier in el0 interrupt
@ 2021-11-16  8:24   ` Pingfan Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Mark Rutland, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

For ease of unifying code, it is helpful to lift nmi_{enter,exit}()
housekeeping from gic_handle_nmi() to el0_interrupt(). Because
gic_handle_nmi() is called by either el1 interrupt or el0, and the
housekeeping has already been done in arch level code when el1
interrupt.

Note about the original code, which calls enter_from_user_mode() in pNMI
context. Although it is weird to call rcu_eqs_exit() in the pseudo NMI
context, it has no problem. This is due to the essentiality of pNMI, a
higher priority interrupt but not akin to NMI, which allows a break-in
at any time.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/kernel/entry-common.c | 35 ++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 5a1a5dd66d04..afcde43f1b73 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -429,7 +429,7 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
 	}
 }
 
-static __always_inline void __el1_pnmi(struct pt_regs *regs,
+static __always_inline void __pnmi_handler_common(struct pt_regs *regs,
 				       void (*handler)(struct pt_regs *))
 {
 	arm64_enter_nmi(regs);
@@ -437,6 +437,12 @@ static __always_inline void __el1_pnmi(struct pt_regs *regs,
 	arm64_exit_nmi(regs);
 }
 
+static __always_inline void __el1_pnmi(struct pt_regs *regs,
+				       void (*handler)(struct pt_regs *))
+{
+	__pnmi_handler_common(regs, handler);
+}
+
 static __always_inline void __el1_irq(struct pt_regs *regs,
 				      void (*handler)(struct pt_regs *))
 {
@@ -673,21 +679,34 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
 	}
 }
 
-static void noinstr el0_interrupt(struct pt_regs *regs,
-				  void (*handler)(struct pt_regs *))
+static __always_inline void __el0_pnmi(struct pt_regs *regs,
+				       void (*handler)(struct pt_regs *))
+{
+	__pnmi_handler_common(regs, handler);
+}
+
+static __always_inline void __el0_irq(struct pt_regs *regs,
+				       void (*handler)(struct pt_regs *))
 {
 	enter_from_user_mode(regs);
+	irq_enter_rcu();
+	do_interrupt_handler(regs, handler);
+	irq_exit_rcu();
+	exit_to_user_mode(regs);
+}
 
+static void noinstr el0_interrupt(struct pt_regs *regs,
+				  void (*handler)(struct pt_regs *))
+{
 	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
 
 	if (regs->pc & BIT(55))
 		arm64_apply_bp_hardening();
 
-	irq_enter_rcu();
-	do_interrupt_handler(regs, handler);
-	irq_exit_rcu();
-
-	exit_to_user_mode(regs);
+	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && is_in_pnmi(regs))
+		__el0_pnmi(regs, handler);
+	else
+		__el0_irq(regs, handler);
 }
 
 static void noinstr __el0_irq_handler_common(struct pt_regs *regs)
-- 
2.31.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
  2021-11-16  8:24 ` Pingfan Liu
@ 2021-11-16  8:24   ` Pingfan Liu
  -1 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Joey Gouly, Sami Tolvanen, Julien Thierry,
	Yuichi Ito, rcu

Arch level code is ready to take over the nmi_enter()/nmi_exit()
housekeeping.

GICv3 can expose the pNMI discriminator, then simply remove the
housekeeping.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org
---
 drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index daec3309b014..aa2bcb47b47e 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
 
 static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
 {
-	bool irqs_enabled = interrupts_enabled(regs);
 	int err;
 
-	if (irqs_enabled)
-		nmi_enter();
-
 	if (static_branch_likely(&supports_deactivate_key))
 		gic_write_eoir(irqnr);
 	/*
@@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
 	if (err)
 		gic_deactivate_unhandled(irqnr);
 
-	if (irqs_enabled)
-		nmi_exit();
 }
 
 static u32 do_read_iar(struct pt_regs *regs)
@@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
 	return iar;
 }
 
+static bool gic_is_in_nmi(void)
+{
+	if (gic_supports_nmi() &&
+	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
+		return true;
+
+	return false;
+}
+
 static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 {
 	u32 irqnr;
@@ -1791,6 +1794,9 @@ static int __init gic_init_bases(void __iomem *dist_base,
 	}
 
 	set_handle_irq(gic_handle_irq);
+#if IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI)
+	set_nmi_discriminator(gic_is_in_nmi);
+#endif
 
 	gic_update_rdist_properties();
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
@ 2021-11-16  8:24   ` Pingfan Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Joey Gouly, Sami Tolvanen, Julien Thierry,
	Yuichi Ito, rcu

Arch level code is ready to take over the nmi_enter()/nmi_exit()
housekeeping.

GICv3 can expose the pNMI discriminator, then simply remove the
housekeeping.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org
---
 drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index daec3309b014..aa2bcb47b47e 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
 
 static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
 {
-	bool irqs_enabled = interrupts_enabled(regs);
 	int err;
 
-	if (irqs_enabled)
-		nmi_enter();
-
 	if (static_branch_likely(&supports_deactivate_key))
 		gic_write_eoir(irqnr);
 	/*
@@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
 	if (err)
 		gic_deactivate_unhandled(irqnr);
 
-	if (irqs_enabled)
-		nmi_exit();
 }
 
 static u32 do_read_iar(struct pt_regs *regs)
@@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
 	return iar;
 }
 
+static bool gic_is_in_nmi(void)
+{
+	if (gic_supports_nmi() &&
+	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
+		return true;
+
+	return false;
+}
+
 static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 {
 	u32 irqnr;
@@ -1791,6 +1794,9 @@ static int __init gic_init_bases(void __iomem *dist_base,
 	}
 
 	set_handle_irq(gic_handle_irq);
+#if IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI)
+	set_nmi_discriminator(gic_is_in_nmi);
+#endif
 
 	gic_update_rdist_properties();
 
-- 
2.31.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv3 4/4] arm64: entry: remove pNMI judgement in __el1_interrupt() path
  2021-11-16  8:24 ` Pingfan Liu
@ 2021-11-16  8:24   ` Pingfan Liu
  -1 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Joey Gouly, Sami Tolvanen, Julien Thierry,
	Yuichi Ito, rcu

Now, all pNMI can be detected before the dispatching of __el1_pnmi() or
__el1_interrupt(), and __el1_interrupt() will never be called in pNMI
context. As a result, the judgement of pNMI in
arm64_preempt_schedule_irq() becomes unnecessary.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/kernel/entry-common.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index afcde43f1b73..57d654b915a5 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -224,15 +224,6 @@ static void __sched arm64_preempt_schedule_irq(void)
 {
 	lockdep_assert_irqs_disabled();
 
-	/*
-	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-	 * DAIF we must have handled an NMI, so skip preemption.
-	 */
-	if (system_uses_irq_prio_masking() && read_sysreg(daif))
-		return;
-
 	/*
 	 * Preempting a task from an IRQ means we leave copies of PSTATE
 	 * on the stack. cpufeature's enable calls may modify PSTATE, but
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv3 4/4] arm64: entry: remove pNMI judgement in __el1_interrupt() path
@ 2021-11-16  8:24   ` Pingfan Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-16  8:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Joey Gouly, Sami Tolvanen, Julien Thierry,
	Yuichi Ito, rcu

Now, all pNMI can be detected before the dispatching of __el1_pnmi() or
__el1_interrupt(), and __el1_interrupt() will never be called in pNMI
context. As a result, the judgement of pNMI in
arm64_preempt_schedule_irq() becomes unnecessary.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
Cc: rcu@vger.kernel.org
To: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/kernel/entry-common.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index afcde43f1b73..57d654b915a5 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -224,15 +224,6 @@ static void __sched arm64_preempt_schedule_irq(void)
 {
 	lockdep_assert_irqs_disabled();
 
-	/*
-	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-	 * DAIF we must have handled an NMI, so skip preemption.
-	 */
-	if (system_uses_irq_prio_masking() && read_sysreg(daif))
-		return;
-
 	/*
 	 * Preempting a task from an IRQ means we leave copies of PSTATE
 	 * on the stack. cpufeature's enable calls may modify PSTATE, but
-- 
2.31.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
  2021-11-16  8:24   ` Pingfan Liu
@ 2021-11-16  9:53     ` Marc Zyngier
  -1 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2021-11-16  9:53 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Joey Gouly, Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

Hi Pingfan,

On Tue, 16 Nov 2021 08:24:49 +0000,
Pingfan Liu <kernelfans@gmail.com> wrote:
> 
> Arch level code is ready to take over the nmi_enter()/nmi_exit()
> housekeeping.
> 
> GICv3 can expose the pNMI discriminator, then simply remove the
> housekeeping.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Joey Gouly <joey.gouly@arm.com>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Julien Thierry <julien.thierry@arm.com>
> Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> Cc: rcu@vger.kernel.org
> To: linux-arm-kernel@lists.infradead.org
> ---
>  drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index daec3309b014..aa2bcb47b47e 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
>  
>  static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
>  {
> -	bool irqs_enabled = interrupts_enabled(regs);
>  	int err;
>  
> -	if (irqs_enabled)
> -		nmi_enter();
> -
>  	if (static_branch_likely(&supports_deactivate_key))
>  		gic_write_eoir(irqnr);
>  	/*
> @@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
>  	if (err)
>  		gic_deactivate_unhandled(irqnr);
>  
> -	if (irqs_enabled)
> -		nmi_exit();
>  }
>  
>  static u32 do_read_iar(struct pt_regs *regs)
> @@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
>  	return iar;
>  }
>  
> +static bool gic_is_in_nmi(void)
> +{
> +	if (gic_supports_nmi() &&
> +	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
> +		return true;

I don't think this fixes anything.

RPR stands for 'Running Priority Register', which in GIC speak reports
the priority of the most recently Ack'ed interrupt.

You cannot use this to find out whether the interrupt that you /will/
ack is a NMI or not. Actually, you cannot find out about *any*
priority until you actually ack the interrupt. What you are asking for
is the equivalent of a crystal ball, and we're in short supply... ;-)

The only case where ICC_RPR_EL1 will return something that is equal to
GICD_INT_NMI_PRI is when you are *already* in an NMI context. So
unless I have completely misunderstood your approach (which is always
possible), I don't see how this can work.

If you want to distinguish between NMI and IRQ early on (before
acknowledging the interrupt), the only solution is to turn the NMI
into a Group-0 interrupt so that it is presented to the CPU as a
FIQ. At which point, you have the information by construction.

Unfortunately, this will only work in VMs, as Group-0 interrupts are
usually routed to EL3 on bare metal systems.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
@ 2021-11-16  9:53     ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2021-11-16  9:53 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Joey Gouly, Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

Hi Pingfan,

On Tue, 16 Nov 2021 08:24:49 +0000,
Pingfan Liu <kernelfans@gmail.com> wrote:
> 
> Arch level code is ready to take over the nmi_enter()/nmi_exit()
> housekeeping.
> 
> GICv3 can expose the pNMI discriminator, then simply remove the
> housekeeping.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Joey Gouly <joey.gouly@arm.com>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Julien Thierry <julien.thierry@arm.com>
> Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> Cc: rcu@vger.kernel.org
> To: linux-arm-kernel@lists.infradead.org
> ---
>  drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index daec3309b014..aa2bcb47b47e 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
>  
>  static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
>  {
> -	bool irqs_enabled = interrupts_enabled(regs);
>  	int err;
>  
> -	if (irqs_enabled)
> -		nmi_enter();
> -
>  	if (static_branch_likely(&supports_deactivate_key))
>  		gic_write_eoir(irqnr);
>  	/*
> @@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
>  	if (err)
>  		gic_deactivate_unhandled(irqnr);
>  
> -	if (irqs_enabled)
> -		nmi_exit();
>  }
>  
>  static u32 do_read_iar(struct pt_regs *regs)
> @@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
>  	return iar;
>  }
>  
> +static bool gic_is_in_nmi(void)
> +{
> +	if (gic_supports_nmi() &&
> +	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
> +		return true;

I don't think this fixes anything.

RPR stands for 'Running Priority Register', which in GIC speak reports
the priority of the most recently Ack'ed interrupt.

You cannot use this to find out whether the interrupt that you /will/
ack is a NMI or not. Actually, you cannot find out about *any*
priority until you actually ack the interrupt. What you are asking for
is the equivalent of a crystal ball, and we're in short supply... ;-)

The only case where ICC_RPR_EL1 will return something that is equal to
GICD_INT_NMI_PRI is when you are *already* in an NMI context. So
unless I have completely misunderstood your approach (which is always
possible), I don't see how this can work.

If you want to distinguish between NMI and IRQ early on (before
acknowledging the interrupt), the only solution is to turn the NMI
into a Group-0 interrupt so that it is presented to the CPU as a
FIQ. At which point, you have the information by construction.

Unfortunately, this will only work in VMs, as Group-0 interrupts are
usually routed to EL3 on bare metal systems.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
  2021-11-16  9:53     ` Marc Zyngier
@ 2021-11-17 10:16       ` Pingfan Liu
  -1 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-17 10:16 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Joey Gouly, Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Tue, Nov 16, 2021 at 09:53:25AM +0000, Marc Zyngier wrote:
> Hi Pingfan,
> 
> On Tue, 16 Nov 2021 08:24:49 +0000,
> Pingfan Liu <kernelfans@gmail.com> wrote:
> > 
> > Arch level code is ready to take over the nmi_enter()/nmi_exit()
> > housekeeping.
> > 
> > GICv3 can expose the pNMI discriminator, then simply remove the
> > housekeeping.
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Joey Gouly <joey.gouly@arm.com>
> > Cc: Sami Tolvanen <samitolvanen@google.com>
> > Cc: Julien Thierry <julien.thierry@arm.com>
> > Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> > Cc: rcu@vger.kernel.org
> > To: linux-arm-kernel@lists.infradead.org
> > ---
> >  drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
> >  1 file changed, 12 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index daec3309b014..aa2bcb47b47e 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
> >  
> >  static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> >  {
> > -	bool irqs_enabled = interrupts_enabled(regs);
> >  	int err;
> >  
> > -	if (irqs_enabled)
> > -		nmi_enter();
> > -
> >  	if (static_branch_likely(&supports_deactivate_key))
> >  		gic_write_eoir(irqnr);
> >  	/*
> > @@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> >  	if (err)
> >  		gic_deactivate_unhandled(irqnr);
> >  
> > -	if (irqs_enabled)
> > -		nmi_exit();
> >  }
> >  
> >  static u32 do_read_iar(struct pt_regs *regs)
> > @@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
> >  	return iar;
> >  }
> >  
> > +static bool gic_is_in_nmi(void)
> > +{
> > +	if (gic_supports_nmi() &&
> > +	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
> > +		return true;
> 
> I don't think this fixes anything.
> 
> RPR stands for 'Running Priority Register', which in GIC speak reports
> the priority of the most recently Ack'ed interrupt.
> 
> You cannot use this to find out whether the interrupt that you /will/
> ack is a NMI or not. Actually, you cannot find out about *any*
> priority until you actually ack the interrupt. What you are asking for
> is the equivalent of a crystal ball, and we're in short supply... ;-)
> 
> The only case where ICC_RPR_EL1 will return something that is equal to
> GICD_INT_NMI_PRI is when you are *already* in an NMI context. So
> unless I have completely misunderstood your approach (which is always
> possible), I don't see how this can work.
> 

Thank you for the clear explanation. Also I revist this part in "GIC v3
and v4 overview" and have a deeper understanding. (Need to spare time to
go through all later)

You totally got my idea, and I need to find a bail-out.

As all kinds of PIC at least have two parts of functions: active (Ack) and
deactive(EOI), is it possible to split handle_arch_irq into two parts?
I.e let irqchip expose two interfaces:
  u32 (*read_irqinfo*)(struct pt_regs *regs, bool *is_nmi)
  void (*handle_arch_irq)(struct pt_regs *regs, u32 irqnr)
to replace the current interface:
  void (*handle_arch_irq)(struct pt_regs *regs)
  
I have thought about such stuff for some days. And the benefits include:
  -1. For this bugfix (by the parameter 'is_nmi')
  -2. IPI_RESCHEDULE performance drop issue can be resolved at arch
      code level. (by irqnr - ipi_irq_base == IPI_RESCHEDULE ?)
  -3. The arch level can provide a similar loop as aic_handle_irq() in irq-apple-aic.c,
      which can save cpu by avoiding heavy context sync when irq is
      intensive.

Do you think it is doable?


Thanks,

	Pingfan
> If you want to distinguish between NMI and IRQ early on (before
> acknowledging the interrupt), the only solution is to turn the NMI
> into a Group-0 interrupt so that it is presented to the CPU as a
> FIQ. At which point, you have the information by construction.
> 
> Unfortunately, this will only work in VMs, as Group-0 interrupts are
> usually routed to EL3 on bare metal systems.
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
@ 2021-11-17 10:16       ` Pingfan Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-17 10:16 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Joey Gouly, Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Tue, Nov 16, 2021 at 09:53:25AM +0000, Marc Zyngier wrote:
> Hi Pingfan,
> 
> On Tue, 16 Nov 2021 08:24:49 +0000,
> Pingfan Liu <kernelfans@gmail.com> wrote:
> > 
> > Arch level code is ready to take over the nmi_enter()/nmi_exit()
> > housekeeping.
> > 
> > GICv3 can expose the pNMI discriminator, then simply remove the
> > housekeeping.
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Joey Gouly <joey.gouly@arm.com>
> > Cc: Sami Tolvanen <samitolvanen@google.com>
> > Cc: Julien Thierry <julien.thierry@arm.com>
> > Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> > Cc: rcu@vger.kernel.org
> > To: linux-arm-kernel@lists.infradead.org
> > ---
> >  drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
> >  1 file changed, 12 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index daec3309b014..aa2bcb47b47e 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
> >  
> >  static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> >  {
> > -	bool irqs_enabled = interrupts_enabled(regs);
> >  	int err;
> >  
> > -	if (irqs_enabled)
> > -		nmi_enter();
> > -
> >  	if (static_branch_likely(&supports_deactivate_key))
> >  		gic_write_eoir(irqnr);
> >  	/*
> > @@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> >  	if (err)
> >  		gic_deactivate_unhandled(irqnr);
> >  
> > -	if (irqs_enabled)
> > -		nmi_exit();
> >  }
> >  
> >  static u32 do_read_iar(struct pt_regs *regs)
> > @@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
> >  	return iar;
> >  }
> >  
> > +static bool gic_is_in_nmi(void)
> > +{
> > +	if (gic_supports_nmi() &&
> > +	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
> > +		return true;
> 
> I don't think this fixes anything.
> 
> RPR stands for 'Running Priority Register', which in GIC speak reports
> the priority of the most recently Ack'ed interrupt.
> 
> You cannot use this to find out whether the interrupt that you /will/
> ack is a NMI or not. Actually, you cannot find out about *any*
> priority until you actually ack the interrupt. What you are asking for
> is the equivalent of a crystal ball, and we're in short supply... ;-)
> 
> The only case where ICC_RPR_EL1 will return something that is equal to
> GICD_INT_NMI_PRI is when you are *already* in an NMI context. So
> unless I have completely misunderstood your approach (which is always
> possible), I don't see how this can work.
> 

Thank you for the clear explanation. Also I revist this part in "GIC v3
and v4 overview" and have a deeper understanding. (Need to spare time to
go through all later)

You totally got my idea, and I need to find a bail-out.

As all kinds of PIC at least have two parts of functions: active (Ack) and
deactive(EOI), is it possible to split handle_arch_irq into two parts?
I.e let irqchip expose two interfaces:
  u32 (*read_irqinfo*)(struct pt_regs *regs, bool *is_nmi)
  void (*handle_arch_irq)(struct pt_regs *regs, u32 irqnr)
to replace the current interface:
  void (*handle_arch_irq)(struct pt_regs *regs)
  
I have thought about such stuff for some days. And the benefits include:
  -1. For this bugfix (by the parameter 'is_nmi')
  -2. IPI_RESCHEDULE performance drop issue can be resolved at arch
      code level. (by irqnr - ipi_irq_base == IPI_RESCHEDULE ?)
  -3. The arch level can provide a similar loop as aic_handle_irq() in irq-apple-aic.c,
      which can save cpu by avoiding heavy context sync when irq is
      intensive.

Do you think it is doable?


Thanks,

	Pingfan
> If you want to distinguish between NMI and IRQ early on (before
> acknowledging the interrupt), the only solution is to turn the NMI
> into a Group-0 interrupt so that it is presented to the CPU as a
> FIQ. At which point, you have the information by construction.
> 
> Unfortunately, this will only work in VMs, as Group-0 interrupts are
> usually routed to EL3 on bare metal systems.
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
  2021-11-17 10:16       ` Pingfan Liu
@ 2021-11-17 11:01         ` Marc Zyngier
  -1 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2021-11-17 11:01 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Joey Gouly, Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Wed, 17 Nov 2021 10:16:53 +0000,
Pingfan Liu <kernelfans@gmail.com> wrote:
> 
> On Tue, Nov 16, 2021 at 09:53:25AM +0000, Marc Zyngier wrote:
> > Hi Pingfan,
> > 
> > On Tue, 16 Nov 2021 08:24:49 +0000,
> > Pingfan Liu <kernelfans@gmail.com> wrote:
> > > 
> > > Arch level code is ready to take over the nmi_enter()/nmi_exit()
> > > housekeeping.
> > > 
> > > GICv3 can expose the pNMI discriminator, then simply remove the
> > > housekeeping.
> > > 
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Marc Zyngier <maz@kernel.org>
> > > Cc: Joey Gouly <joey.gouly@arm.com>
> > > Cc: Sami Tolvanen <samitolvanen@google.com>
> > > Cc: Julien Thierry <julien.thierry@arm.com>
> > > Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> > > Cc: rcu@vger.kernel.org
> > > To: linux-arm-kernel@lists.infradead.org
> > > ---
> > >  drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
> > >  1 file changed, 12 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > > index daec3309b014..aa2bcb47b47e 100644
> > > --- a/drivers/irqchip/irq-gic-v3.c
> > > +++ b/drivers/irqchip/irq-gic-v3.c
> > > @@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
> > >  
> > >  static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> > >  {
> > > -	bool irqs_enabled = interrupts_enabled(regs);
> > >  	int err;
> > >  
> > > -	if (irqs_enabled)
> > > -		nmi_enter();
> > > -
> > >  	if (static_branch_likely(&supports_deactivate_key))
> > >  		gic_write_eoir(irqnr);
> > >  	/*
> > > @@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> > >  	if (err)
> > >  		gic_deactivate_unhandled(irqnr);
> > >  
> > > -	if (irqs_enabled)
> > > -		nmi_exit();
> > >  }
> > >  
> > >  static u32 do_read_iar(struct pt_regs *regs)
> > > @@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
> > >  	return iar;
> > >  }
> > >  
> > > +static bool gic_is_in_nmi(void)
> > > +{
> > > +	if (gic_supports_nmi() &&
> > > +	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
> > > +		return true;
> > 
> > I don't think this fixes anything.
> > 
> > RPR stands for 'Running Priority Register', which in GIC speak reports
> > the priority of the most recently Ack'ed interrupt.
> > 
> > You cannot use this to find out whether the interrupt that you /will/
> > ack is a NMI or not. Actually, you cannot find out about *any*
> > priority until you actually ack the interrupt. What you are asking for
> > is the equivalent of a crystal ball, and we're in short supply... ;-)
> > 
> > The only case where ICC_RPR_EL1 will return something that is equal to
> > GICD_INT_NMI_PRI is when you are *already* in an NMI context. So
> > unless I have completely misunderstood your approach (which is always
> > possible), I don't see how this can work.
> > 
> 
> Thank you for the clear explanation. Also I revist this part in "GIC v3
> and v4 overview" and have a deeper understanding. (Need to spare time to
> go through all later)
> 
> You totally got my idea, and I need to find a bail-out.
> 
> As all kinds of PIC at least have two parts of functions: active (Ack) and
> deactive(EOI), is it possible to split handle_arch_irq into two parts?
> I.e let irqchip expose two interfaces:
>   u32 (*read_irqinfo*)(struct pt_regs *regs, bool *is_nmi)
>   void (*handle_arch_irq)(struct pt_regs *regs, u32 irqnr)
> to replace the current interface:

No. There is no way we will move to such a scheme. We want to isolate
the irqchip stuff in its own corner, and not propagate it into the
arch code.

If the pseudo-NMI is such a problem, I'm all for removing it *now*,
never to come back again.

>   void (*handle_arch_irq)(struct pt_regs *regs)
>   
> I have thought about such stuff for some days. And the benefits include:
>   -1. For this bugfix (by the parameter 'is_nmi')
>   -2. IPI_RESCHEDULE performance drop issue can be resolved at arch
>       code level. (by irqnr - ipi_irq_base == IPI_RESCHEDULE ?)
>   -3. The arch level can provide a similar loop as aic_handle_irq() in irq-apple-aic.c,
>       which can save cpu by avoiding heavy context sync when irq is
>       intensive.
> 
> Do you think it is doable?

I don't think this is even needed, because I don't believe that the
whole thing is a real problem.

In patch #1, you are claiming that handling a NMI as an IRQ first, and
then upgrading to NMI once we know it really is an NMI is a problem.
How different is this from an IRQ being preempted by a NMI?

Your own conclusion is that the this later case isn't a problem. So
why is the former an issue?

I'm not saying that there is no issue at all, and it could well be
that you have spotted something that I cannot see yet. But if that's
the case, it means that the core code is broken as well.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
@ 2021-11-17 11:01         ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2021-11-17 11:01 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Joey Gouly, Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Wed, 17 Nov 2021 10:16:53 +0000,
Pingfan Liu <kernelfans@gmail.com> wrote:
> 
> On Tue, Nov 16, 2021 at 09:53:25AM +0000, Marc Zyngier wrote:
> > Hi Pingfan,
> > 
> > On Tue, 16 Nov 2021 08:24:49 +0000,
> > Pingfan Liu <kernelfans@gmail.com> wrote:
> > > 
> > > Arch level code is ready to take over the nmi_enter()/nmi_exit()
> > > housekeeping.
> > > 
> > > GICv3 can expose the pNMI discriminator, then simply remove the
> > > housekeeping.
> > > 
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Marc Zyngier <maz@kernel.org>
> > > Cc: Joey Gouly <joey.gouly@arm.com>
> > > Cc: Sami Tolvanen <samitolvanen@google.com>
> > > Cc: Julien Thierry <julien.thierry@arm.com>
> > > Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> > > Cc: rcu@vger.kernel.org
> > > To: linux-arm-kernel@lists.infradead.org
> > > ---
> > >  drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
> > >  1 file changed, 12 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > > index daec3309b014..aa2bcb47b47e 100644
> > > --- a/drivers/irqchip/irq-gic-v3.c
> > > +++ b/drivers/irqchip/irq-gic-v3.c
> > > @@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
> > >  
> > >  static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> > >  {
> > > -	bool irqs_enabled = interrupts_enabled(regs);
> > >  	int err;
> > >  
> > > -	if (irqs_enabled)
> > > -		nmi_enter();
> > > -
> > >  	if (static_branch_likely(&supports_deactivate_key))
> > >  		gic_write_eoir(irqnr);
> > >  	/*
> > > @@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> > >  	if (err)
> > >  		gic_deactivate_unhandled(irqnr);
> > >  
> > > -	if (irqs_enabled)
> > > -		nmi_exit();
> > >  }
> > >  
> > >  static u32 do_read_iar(struct pt_regs *regs)
> > > @@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
> > >  	return iar;
> > >  }
> > >  
> > > +static bool gic_is_in_nmi(void)
> > > +{
> > > +	if (gic_supports_nmi() &&
> > > +	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
> > > +		return true;
> > 
> > I don't think this fixes anything.
> > 
> > RPR stands for 'Running Priority Register', which in GIC speak reports
> > the priority of the most recently Ack'ed interrupt.
> > 
> > You cannot use this to find out whether the interrupt that you /will/
> > ack is a NMI or not. Actually, you cannot find out about *any*
> > priority until you actually ack the interrupt. What you are asking for
> > is the equivalent of a crystal ball, and we're in short supply... ;-)
> > 
> > The only case where ICC_RPR_EL1 will return something that is equal to
> > GICD_INT_NMI_PRI is when you are *already* in an NMI context. So
> > unless I have completely misunderstood your approach (which is always
> > possible), I don't see how this can work.
> > 
> 
> Thank you for the clear explanation. Also I revist this part in "GIC v3
> and v4 overview" and have a deeper understanding. (Need to spare time to
> go through all later)
> 
> You totally got my idea, and I need to find a bail-out.
> 
> As all kinds of PIC at least have two parts of functions: active (Ack) and
> deactive(EOI), is it possible to split handle_arch_irq into two parts?
> I.e let irqchip expose two interfaces:
>   u32 (*read_irqinfo*)(struct pt_regs *regs, bool *is_nmi)
>   void (*handle_arch_irq)(struct pt_regs *regs, u32 irqnr)
> to replace the current interface:

No. There is no way we will move to such a scheme. We want to isolate
the irqchip stuff in its own corner, and not propagate it into the
arch code.

If the pseudo-NMI is such a problem, I'm all for removing it *now*,
never to come back again.

>   void (*handle_arch_irq)(struct pt_regs *regs)
>   
> I have thought about such stuff for some days. And the benefits include:
>   -1. For this bugfix (by the parameter 'is_nmi')
>   -2. IPI_RESCHEDULE performance drop issue can be resolved at arch
>       code level. (by irqnr - ipi_irq_base == IPI_RESCHEDULE ?)
>   -3. The arch level can provide a similar loop as aic_handle_irq() in irq-apple-aic.c,
>       which can save cpu by avoiding heavy context sync when irq is
>       intensive.
> 
> Do you think it is doable?

I don't think this is even needed, because I don't believe that the
whole thing is a real problem.

In patch #1, you are claiming that handling a NMI as an IRQ first, and
then upgrading to NMI once we know it really is an NMI is a problem.
How different is this from an IRQ being preempted by a NMI?

Your own conclusion is that the this later case isn't a problem. So
why is the former an issue?

I'm not saying that there is no issue at all, and it could well be
that you have spotted something that I cannot see yet. But if that's
the case, it means that the core code is broken as well.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
  2021-11-16  8:24   ` Pingfan Liu
@ 2021-11-17 11:38     ` Mark Rutland
  -1 siblings, 0 replies; 24+ messages in thread
From: Mark Rutland @ 2021-11-17 11:38 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-arm-kernel, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Tue, Nov 16, 2021 at 04:24:47PM +0800, Pingfan Liu wrote:
> Linux kernel places strict semantics between NMI and maskable interrupt.
> So does the RCU component, else deadlock may happen. But the current
> arm64 entry code can partially breach this rule through calling
> rcu_nmi_enter().
> 
> ***  how a deadlock can happen if NMI mistaken as IRQ  ***
> 
>     rcu_nmi_enter()
>     {
> 
>         if (rcu_dynticks_curr_cpu_in_eqs()) {
> 
>         	if (!in_nmi())
>         		rcu_dynticks_task_exit();
>                 ...
>         	if (!in_nmi()) {
>         		instrumentation_begin();
>         		rcu_cleanup_after_idle();
>         		instrumentation_end();
>         	}
>                 ...
>         } else if (!in_nmi()) {
>         	instrumentation_begin();
>         	rcu_irq_enter_check_tick();
>         }
> 
>     }
> 
> If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
> can hit a deadlock, which is demonstrated by the following scenario:
> 
>     note_gp_changes() runs in a task context
>     {
>        local_irq_save(flags); // this protects against irq, but not NMI

Note: speecifically, this masks IRQs via ICC_PMR_EL1.

>        rnp = rdp->mynode;
>        ...
>        raw_spin_trylock_rcu_node(rnp)
>                                   -------> broken in by (p)NMI, without taking __nmi_enter()

AFAICT the broken case described here *cannot* happen.

If we take a pNMI here, we'll go:

  el1h_64_irq()
  -> el1h64_irq_handler()
  -> el1_interrupt()

... where el1_interrupt is:

| static void noinstr el1_interrupt(struct pt_regs *regs,
|                                   void (*handler)(struct pt_regs *)) 
| {
|         write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
| 
|         if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
|                 __el1_pnmi(regs, handler);
|         else
|                 __el1_irq(regs, handler);
| } 

... and interrupts_enabled() is:

| #define interrupts_enabled(regs)                        \
|        (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))

... and irqs_priority_unmasked is:

| #define irqs_priority_unmasked(regs)                                    \
|         (system_uses_irq_prio_masking() ?                               \
|                 (regs)->pmr_save == GIC_PRIO_IRQON :                    \
|                 true)

... irqs_priority_unmasked(regs) *must* return false for this case,
since the local_irq_save() above must have set the PMR to
GIC_PRIO_IRQOFF in the interrupted context.

Thus, we *must* call __el1_pnmi() in this case, where __el1_pnmi() is:

| static __always_inline void __el1_pnmi(struct pt_regs *regs,
|                                        void (*handler)(struct pt_regs *)) 
| {
|         arm64_enter_nmi(regs);
|         do_interrupt_handler(regs, handler);
|         arm64_exit_nmi(regs);
| }

Note that arm64_{enter,exit}_nmi() handle the __nmi_{enter,exit}() calls around
do_interupt_handler().

>                                            rcu_nmi_enter()
>                                             ->__rcu_irq_enter_check_tick()
>                                                ->raw_spin_lock_rcu_node(rdp->mynode);
>                                                    deadlock happens!!!
> 
>     }

As above, I don't think this can go wrong as you describe, and don't believe
that this can deadlock.

> *** On arm64, how pNMI mistaken as IRQ ***
> 
> On arm64, pNMI is an analogue to NMI. In essence, it is a higher
> priority interrupt but not disabled by local_irq_disable().
> 
> In current implementation
> 1) If a pNMI from a context where IRQs were masked, it can be recognized
>    as nmi, and calls __nmi_enter() immediately. This is no problem.

Agreed, this case works correctly.

> 2) But it causes trouble if a pNMI from a context where IRQs were
>    unmasked, and temporarily regarded as maskable interrupt.  It is not
>    treated as NMI, i.e. calling nmi_enter() until reading from GIC.
> 
>     __el1_irq()
>     {
>       irq_enter_rcu()   ----> hit the deadlock bug

What is "the deadlock bug"? The example you had above was for a context where
IRQs were *masked*, which is a different case.

Consider that if this can happen for a context where IRQs are unmasked
it means that we would also deadlock *when taking a regular IRQ*.

I do not beleive that this can deadlock as described. I don't see this
as any different from a sequence such as:

  < run in a context with IRQs unmasked>
    < take a regular IRQ >
      < take a pNMI within the IRQ handling flow >

... e.g. taking a pNMI when handling a spurious regular IRQ.

>         gic_handle_nmi()
>           nmi_enter()
>           nmi_exit()
>       irq_exit_rcu()
>     }
> 
> *** Remedy ***
> 
> If the irqchip level exposes an interface for detecting pNMI to arch level
> code, it can meet the requirement at this early stage. That is the
> interface (*interrupt_is_nmi)() in this patch.

As above, I do not believe that this is necessary for the current pseudo-NMI
scheme.

Are you seeing a deadlock in practice, or is this something you've found by
inspection?

Thanks,
Mark.

> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Joey Gouly <joey.gouly@arm.com>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Julien Thierry <julien.thierry@arm.com>
> Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> Cc: rcu@vger.kernel.org
> To: linux-arm-kernel@lists.infradead.org
> ---
>  arch/arm64/include/asm/irq.h     |  1 +
>  arch/arm64/kernel/entry-common.c | 10 +++++++++-
>  arch/arm64/kernel/irq.c          | 18 ++++++++++++++++++
>  3 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
> index fac08e18bcd5..f3eb13bfa65e 100644
> --- a/arch/arm64/include/asm/irq.h
> +++ b/arch/arm64/include/asm/irq.h
> @@ -11,6 +11,7 @@ struct pt_regs;
>  int set_handle_irq(void (*handle_irq)(struct pt_regs *));
>  #define set_handle_irq	set_handle_irq
>  int set_handle_fiq(void (*handle_fiq)(struct pt_regs *));
> +int set_nmi_discriminator(bool (*discriminator)(void));
>  
>  static inline int nr_legacy_irqs(void)
>  {
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index f7408edf8571..5a1a5dd66d04 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -260,6 +260,14 @@ static void do_interrupt_handler(struct pt_regs *regs,
>  
>  extern void (*handle_arch_irq)(struct pt_regs *);
>  extern void (*handle_arch_fiq)(struct pt_regs *);
> +extern bool (*interrupt_is_nmi)(void);
> +
> +static inline bool is_in_pnmi(struct pt_regs *regs)
> +{
> +	if (!interrupts_enabled(regs) || (*interrupt_is_nmi)())
> +		return true;
> +	return false;
> +}
>  
>  static void noinstr __panic_unhandled(struct pt_regs *regs, const char *vector,
>  				      unsigned int esr)
> @@ -454,7 +462,7 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
>  {
>  	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
>  
> -	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> +	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && is_in_pnmi(regs))
>  		__el1_pnmi(regs, handler);
>  	else
>  		__el1_irq(regs, handler);
> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
> index bda49430c9ea..fabed09ed966 100644
> --- a/arch/arm64/kernel/irq.c
> +++ b/arch/arm64/kernel/irq.c
> @@ -81,8 +81,14 @@ static void default_handle_fiq(struct pt_regs *regs)
>  	panic("FIQ taken without a root FIQ handler\n");
>  }
>  
> +static bool default_nmi_discriminator(void)
> +{
> +	return false;
> +}
> +
>  void (*handle_arch_irq)(struct pt_regs *) __ro_after_init = default_handle_irq;
>  void (*handle_arch_fiq)(struct pt_regs *) __ro_after_init = default_handle_fiq;
> +bool (*interrupt_is_nmi)(void) __ro_after_init = default_nmi_discriminator;
>  
>  int __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
>  {
> @@ -104,6 +110,18 @@ int __init set_handle_fiq(void (*handle_fiq)(struct pt_regs *))
>  	return 0;
>  }
>  
> +#if IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI)
> +
> +int __init set_nmi_discriminator(bool (*discriminator)(void))
> +{
> +	if (interrupt_is_nmi != default_nmi_discriminator)
> +		return -EBUSY;
> +
> +	interrupt_is_nmi = discriminator;
> +	return 0;
> +}
> +#endif
> +
>  void __init init_IRQ(void)
>  {
>  	init_irq_stacks();
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
@ 2021-11-17 11:38     ` Mark Rutland
  0 siblings, 0 replies; 24+ messages in thread
From: Mark Rutland @ 2021-11-17 11:38 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-arm-kernel, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Tue, Nov 16, 2021 at 04:24:47PM +0800, Pingfan Liu wrote:
> Linux kernel places strict semantics between NMI and maskable interrupt.
> So does the RCU component, else deadlock may happen. But the current
> arm64 entry code can partially breach this rule through calling
> rcu_nmi_enter().
> 
> ***  how a deadlock can happen if NMI mistaken as IRQ  ***
> 
>     rcu_nmi_enter()
>     {
> 
>         if (rcu_dynticks_curr_cpu_in_eqs()) {
> 
>         	if (!in_nmi())
>         		rcu_dynticks_task_exit();
>                 ...
>         	if (!in_nmi()) {
>         		instrumentation_begin();
>         		rcu_cleanup_after_idle();
>         		instrumentation_end();
>         	}
>                 ...
>         } else if (!in_nmi()) {
>         	instrumentation_begin();
>         	rcu_irq_enter_check_tick();
>         }
> 
>     }
> 
> If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
> can hit a deadlock, which is demonstrated by the following scenario:
> 
>     note_gp_changes() runs in a task context
>     {
>        local_irq_save(flags); // this protects against irq, but not NMI

Note: speecifically, this masks IRQs via ICC_PMR_EL1.

>        rnp = rdp->mynode;
>        ...
>        raw_spin_trylock_rcu_node(rnp)
>                                   -------> broken in by (p)NMI, without taking __nmi_enter()

AFAICT the broken case described here *cannot* happen.

If we take a pNMI here, we'll go:

  el1h_64_irq()
  -> el1h64_irq_handler()
  -> el1_interrupt()

... where el1_interrupt is:

| static void noinstr el1_interrupt(struct pt_regs *regs,
|                                   void (*handler)(struct pt_regs *)) 
| {
|         write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
| 
|         if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
|                 __el1_pnmi(regs, handler);
|         else
|                 __el1_irq(regs, handler);
| } 

... and interrupts_enabled() is:

| #define interrupts_enabled(regs)                        \
|        (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))

... and irqs_priority_unmasked is:

| #define irqs_priority_unmasked(regs)                                    \
|         (system_uses_irq_prio_masking() ?                               \
|                 (regs)->pmr_save == GIC_PRIO_IRQON :                    \
|                 true)

... irqs_priority_unmasked(regs) *must* return false for this case,
since the local_irq_save() above must have set the PMR to
GIC_PRIO_IRQOFF in the interrupted context.

Thus, we *must* call __el1_pnmi() in this case, where __el1_pnmi() is:

| static __always_inline void __el1_pnmi(struct pt_regs *regs,
|                                        void (*handler)(struct pt_regs *)) 
| {
|         arm64_enter_nmi(regs);
|         do_interrupt_handler(regs, handler);
|         arm64_exit_nmi(regs);
| }

Note that arm64_{enter,exit}_nmi() handle the __nmi_{enter,exit}() calls around
do_interupt_handler().

>                                            rcu_nmi_enter()
>                                             ->__rcu_irq_enter_check_tick()
>                                                ->raw_spin_lock_rcu_node(rdp->mynode);
>                                                    deadlock happens!!!
> 
>     }

As above, I don't think this can go wrong as you describe, and don't believe
that this can deadlock.

> *** On arm64, how pNMI mistaken as IRQ ***
> 
> On arm64, pNMI is an analogue to NMI. In essence, it is a higher
> priority interrupt but not disabled by local_irq_disable().
> 
> In current implementation
> 1) If a pNMI from a context where IRQs were masked, it can be recognized
>    as nmi, and calls __nmi_enter() immediately. This is no problem.

Agreed, this case works correctly.

> 2) But it causes trouble if a pNMI from a context where IRQs were
>    unmasked, and temporarily regarded as maskable interrupt.  It is not
>    treated as NMI, i.e. calling nmi_enter() until reading from GIC.
> 
>     __el1_irq()
>     {
>       irq_enter_rcu()   ----> hit the deadlock bug

What is "the deadlock bug"? The example you had above was for a context where
IRQs were *masked*, which is a different case.

Consider that if this can happen for a context where IRQs are unmasked
it means that we would also deadlock *when taking a regular IRQ*.

I do not beleive that this can deadlock as described. I don't see this
as any different from a sequence such as:

  < run in a context with IRQs unmasked>
    < take a regular IRQ >
      < take a pNMI within the IRQ handling flow >

... e.g. taking a pNMI when handling a spurious regular IRQ.

>         gic_handle_nmi()
>           nmi_enter()
>           nmi_exit()
>       irq_exit_rcu()
>     }
> 
> *** Remedy ***
> 
> If the irqchip level exposes an interface for detecting pNMI to arch level
> code, it can meet the requirement at this early stage. That is the
> interface (*interrupt_is_nmi)() in this patch.

As above, I do not believe that this is necessary for the current pseudo-NMI
scheme.

Are you seeing a deadlock in practice, or is this something you've found by
inspection?

Thanks,
Mark.

> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Joey Gouly <joey.gouly@arm.com>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Julien Thierry <julien.thierry@arm.com>
> Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> Cc: rcu@vger.kernel.org
> To: linux-arm-kernel@lists.infradead.org
> ---
>  arch/arm64/include/asm/irq.h     |  1 +
>  arch/arm64/kernel/entry-common.c | 10 +++++++++-
>  arch/arm64/kernel/irq.c          | 18 ++++++++++++++++++
>  3 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
> index fac08e18bcd5..f3eb13bfa65e 100644
> --- a/arch/arm64/include/asm/irq.h
> +++ b/arch/arm64/include/asm/irq.h
> @@ -11,6 +11,7 @@ struct pt_regs;
>  int set_handle_irq(void (*handle_irq)(struct pt_regs *));
>  #define set_handle_irq	set_handle_irq
>  int set_handle_fiq(void (*handle_fiq)(struct pt_regs *));
> +int set_nmi_discriminator(bool (*discriminator)(void));
>  
>  static inline int nr_legacy_irqs(void)
>  {
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index f7408edf8571..5a1a5dd66d04 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -260,6 +260,14 @@ static void do_interrupt_handler(struct pt_regs *regs,
>  
>  extern void (*handle_arch_irq)(struct pt_regs *);
>  extern void (*handle_arch_fiq)(struct pt_regs *);
> +extern bool (*interrupt_is_nmi)(void);
> +
> +static inline bool is_in_pnmi(struct pt_regs *regs)
> +{
> +	if (!interrupts_enabled(regs) || (*interrupt_is_nmi)())
> +		return true;
> +	return false;
> +}
>  
>  static void noinstr __panic_unhandled(struct pt_regs *regs, const char *vector,
>  				      unsigned int esr)
> @@ -454,7 +462,7 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
>  {
>  	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
>  
> -	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> +	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && is_in_pnmi(regs))
>  		__el1_pnmi(regs, handler);
>  	else
>  		__el1_irq(regs, handler);
> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
> index bda49430c9ea..fabed09ed966 100644
> --- a/arch/arm64/kernel/irq.c
> +++ b/arch/arm64/kernel/irq.c
> @@ -81,8 +81,14 @@ static void default_handle_fiq(struct pt_regs *regs)
>  	panic("FIQ taken without a root FIQ handler\n");
>  }
>  
> +static bool default_nmi_discriminator(void)
> +{
> +	return false;
> +}
> +
>  void (*handle_arch_irq)(struct pt_regs *) __ro_after_init = default_handle_irq;
>  void (*handle_arch_fiq)(struct pt_regs *) __ro_after_init = default_handle_fiq;
> +bool (*interrupt_is_nmi)(void) __ro_after_init = default_nmi_discriminator;
>  
>  int __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
>  {
> @@ -104,6 +110,18 @@ int __init set_handle_fiq(void (*handle_fiq)(struct pt_regs *))
>  	return 0;
>  }
>  
> +#if IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI)
> +
> +int __init set_nmi_discriminator(bool (*discriminator)(void))
> +{
> +	if (interrupt_is_nmi != default_nmi_discriminator)
> +		return -EBUSY;
> +
> +	interrupt_is_nmi = discriminator;
> +	return 0;
> +}
> +#endif
> +
>  void __init init_IRQ(void)
>  {
>  	init_irq_stacks();
> -- 
> 2.31.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
  2021-11-17 11:38     ` Mark Rutland
@ 2021-11-19  2:01       ` Pingfan Liu
  -1 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-19  2:01 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Wed, Nov 17, 2021 at 11:38:54AM +0000, Mark Rutland wrote:
> On Tue, Nov 16, 2021 at 04:24:47PM +0800, Pingfan Liu wrote:
> > Linux kernel places strict semantics between NMI and maskable interrupt.
> > So does the RCU component, else deadlock may happen. But the current
> > arm64 entry code can partially breach this rule through calling
> > rcu_nmi_enter().
> > 
> > ***  how a deadlock can happen if NMI mistaken as IRQ  ***
> > 
> >     rcu_nmi_enter()
> >     {
> > 
> >         if (rcu_dynticks_curr_cpu_in_eqs()) {
> > 
> >         	if (!in_nmi())
> >         		rcu_dynticks_task_exit();
> >                 ...
> >         	if (!in_nmi()) {
> >         		instrumentation_begin();
> >         		rcu_cleanup_after_idle();
> >         		instrumentation_end();
> >         	}
> >                 ...
> >         } else if (!in_nmi()) {
> >         	instrumentation_begin();
> >         	rcu_irq_enter_check_tick();
> >         }
> > 
> >     }
> > 
> > If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
> > can hit a deadlock, which is demonstrated by the following scenario:
> > 
> >     note_gp_changes() runs in a task context
> >     {
> >        local_irq_save(flags); // this protects against irq, but not NMI
> 
> Note: speecifically, this masks IRQs via ICC_PMR_EL1.
> 
> >        rnp = rdp->mynode;
> >        ...
> >        raw_spin_trylock_rcu_node(rnp)
> >                                   -------> broken in by (p)NMI, without taking __nmi_enter()
> 
> AFAICT the broken case described here *cannot* happen.
> 
> If we take a pNMI here, we'll go:
> 
>   el1h_64_irq()
>   -> el1h64_irq_handler()
>   -> el1_interrupt()
> 
> ... where el1_interrupt is:
> 
> | static void noinstr el1_interrupt(struct pt_regs *regs,
> |                                   void (*handler)(struct pt_regs *)) 
> | {
> |         write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
> | 
> |         if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> |                 __el1_pnmi(regs, handler);
> |         else
> |                 __el1_irq(regs, handler);
> | } 
> 
> ... and interrupts_enabled() is:
> 
> | #define interrupts_enabled(regs)                        \
> |        (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
> 
> ... and irqs_priority_unmasked is:
> 
> | #define irqs_priority_unmasked(regs)                                    \
> |         (system_uses_irq_prio_masking() ?                               \
> |                 (regs)->pmr_save == GIC_PRIO_IRQON :                    \
> |                 true)
> 
> ... irqs_priority_unmasked(regs) *must* return false for this case,
> since the local_irq_save() above must have set the PMR to
> GIC_PRIO_IRQOFF in the interrupted context.
> 
> Thus, we *must* call __el1_pnmi() in this case, where __el1_pnmi() is:
> 

Yes, you are right. And I made a mistake to think it will call
__el1_irq()
> | static __always_inline void __el1_pnmi(struct pt_regs *regs,
> |                                        void (*handler)(struct pt_regs *)) 
> | {
> |         arm64_enter_nmi(regs);
> |         do_interrupt_handler(regs, handler);
> |         arm64_exit_nmi(regs);
> | }
> 
> Note that arm64_{enter,exit}_nmi() handle the __nmi_{enter,exit}() calls around
> do_interupt_handler().
> 
> >                                            rcu_nmi_enter()
> >                                             ->__rcu_irq_enter_check_tick()
> >                                                ->raw_spin_lock_rcu_node(rdp->mynode);
> >                                                    deadlock happens!!!
> > 
> >     }
> 
> As above, I don't think this can go wrong as you describe, and don't believe
> that this can deadlock.
> 
> > *** On arm64, how pNMI mistaken as IRQ ***
> > 
> > On arm64, pNMI is an analogue to NMI. In essence, it is a higher
> > priority interrupt but not disabled by local_irq_disable().
> > 
> > In current implementation
> > 1) If a pNMI from a context where IRQs were masked, it can be recognized
> >    as nmi, and calls __nmi_enter() immediately. This is no problem.
> 
> Agreed, this case works correctly.
> 
> > 2) But it causes trouble if a pNMI from a context where IRQs were
> >    unmasked, and temporarily regarded as maskable interrupt.  It is not
> >    treated as NMI, i.e. calling nmi_enter() until reading from GIC.
> > 
> >     __el1_irq()
> >     {
> >       irq_enter_rcu()   ----> hit the deadlock bug
> 
> What is "the deadlock bug"? The example you had above was for a context where
> IRQs were *masked*, which is a different case.
> 

As above, I made a mistake about the condition hence the code path.
There is no problem in this case through calling __el1_pnmi()

Sorry for the false alarm and thank you again for your detailed
explaination.


Regards,

	Pingfan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
@ 2021-11-19  2:01       ` Pingfan Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-19  2:01 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Wed, Nov 17, 2021 at 11:38:54AM +0000, Mark Rutland wrote:
> On Tue, Nov 16, 2021 at 04:24:47PM +0800, Pingfan Liu wrote:
> > Linux kernel places strict semantics between NMI and maskable interrupt.
> > So does the RCU component, else deadlock may happen. But the current
> > arm64 entry code can partially breach this rule through calling
> > rcu_nmi_enter().
> > 
> > ***  how a deadlock can happen if NMI mistaken as IRQ  ***
> > 
> >     rcu_nmi_enter()
> >     {
> > 
> >         if (rcu_dynticks_curr_cpu_in_eqs()) {
> > 
> >         	if (!in_nmi())
> >         		rcu_dynticks_task_exit();
> >                 ...
> >         	if (!in_nmi()) {
> >         		instrumentation_begin();
> >         		rcu_cleanup_after_idle();
> >         		instrumentation_end();
> >         	}
> >                 ...
> >         } else if (!in_nmi()) {
> >         	instrumentation_begin();
> >         	rcu_irq_enter_check_tick();
> >         }
> > 
> >     }
> > 
> > If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
> > can hit a deadlock, which is demonstrated by the following scenario:
> > 
> >     note_gp_changes() runs in a task context
> >     {
> >        local_irq_save(flags); // this protects against irq, but not NMI
> 
> Note: speecifically, this masks IRQs via ICC_PMR_EL1.
> 
> >        rnp = rdp->mynode;
> >        ...
> >        raw_spin_trylock_rcu_node(rnp)
> >                                   -------> broken in by (p)NMI, without taking __nmi_enter()
> 
> AFAICT the broken case described here *cannot* happen.
> 
> If we take a pNMI here, we'll go:
> 
>   el1h_64_irq()
>   -> el1h64_irq_handler()
>   -> el1_interrupt()
> 
> ... where el1_interrupt is:
> 
> | static void noinstr el1_interrupt(struct pt_regs *regs,
> |                                   void (*handler)(struct pt_regs *)) 
> | {
> |         write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
> | 
> |         if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> |                 __el1_pnmi(regs, handler);
> |         else
> |                 __el1_irq(regs, handler);
> | } 
> 
> ... and interrupts_enabled() is:
> 
> | #define interrupts_enabled(regs)                        \
> |        (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
> 
> ... and irqs_priority_unmasked is:
> 
> | #define irqs_priority_unmasked(regs)                                    \
> |         (system_uses_irq_prio_masking() ?                               \
> |                 (regs)->pmr_save == GIC_PRIO_IRQON :                    \
> |                 true)
> 
> ... irqs_priority_unmasked(regs) *must* return false for this case,
> since the local_irq_save() above must have set the PMR to
> GIC_PRIO_IRQOFF in the interrupted context.
> 
> Thus, we *must* call __el1_pnmi() in this case, where __el1_pnmi() is:
> 

Yes, you are right. And I made a mistake to think it will call
__el1_irq()
> | static __always_inline void __el1_pnmi(struct pt_regs *regs,
> |                                        void (*handler)(struct pt_regs *)) 
> | {
> |         arm64_enter_nmi(regs);
> |         do_interrupt_handler(regs, handler);
> |         arm64_exit_nmi(regs);
> | }
> 
> Note that arm64_{enter,exit}_nmi() handle the __nmi_{enter,exit}() calls around
> do_interupt_handler().
> 
> >                                            rcu_nmi_enter()
> >                                             ->__rcu_irq_enter_check_tick()
> >                                                ->raw_spin_lock_rcu_node(rdp->mynode);
> >                                                    deadlock happens!!!
> > 
> >     }
> 
> As above, I don't think this can go wrong as you describe, and don't believe
> that this can deadlock.
> 
> > *** On arm64, how pNMI mistaken as IRQ ***
> > 
> > On arm64, pNMI is an analogue to NMI. In essence, it is a higher
> > priority interrupt but not disabled by local_irq_disable().
> > 
> > In current implementation
> > 1) If a pNMI from a context where IRQs were masked, it can be recognized
> >    as nmi, and calls __nmi_enter() immediately. This is no problem.
> 
> Agreed, this case works correctly.
> 
> > 2) But it causes trouble if a pNMI from a context where IRQs were
> >    unmasked, and temporarily regarded as maskable interrupt.  It is not
> >    treated as NMI, i.e. calling nmi_enter() until reading from GIC.
> > 
> >     __el1_irq()
> >     {
> >       irq_enter_rcu()   ----> hit the deadlock bug
> 
> What is "the deadlock bug"? The example you had above was for a context where
> IRQs were *masked*, which is a different case.
> 

As above, I made a mistake about the condition hence the code path.
There is no problem in this case through calling __el1_pnmi()

Sorry for the false alarm and thank you again for your detailed
explaination.


Regards,

	Pingfan

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
  2021-11-17 11:01         ` Marc Zyngier
@ 2021-11-19  2:38           ` Pingfan Liu
  -1 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-19  2:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Joey Gouly, Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Wed, Nov 17, 2021 at 11:01:23AM +0000, Marc Zyngier wrote:
> On Wed, 17 Nov 2021 10:16:53 +0000,
> Pingfan Liu <kernelfans@gmail.com> wrote:
> > 
> > On Tue, Nov 16, 2021 at 09:53:25AM +0000, Marc Zyngier wrote:
> > > Hi Pingfan,
> > > 
> > > On Tue, 16 Nov 2021 08:24:49 +0000,
> > > Pingfan Liu <kernelfans@gmail.com> wrote:
> > > > 
> > > > Arch level code is ready to take over the nmi_enter()/nmi_exit()
> > > > housekeeping.
> > > > 
> > > > GICv3 can expose the pNMI discriminator, then simply remove the
> > > > housekeeping.
> > > > 
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > Cc: Joey Gouly <joey.gouly@arm.com>
> > > > Cc: Sami Tolvanen <samitolvanen@google.com>
> > > > Cc: Julien Thierry <julien.thierry@arm.com>
> > > > Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> > > > Cc: rcu@vger.kernel.org
> > > > To: linux-arm-kernel@lists.infradead.org
> > > > ---
> > > >  drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
> > > >  1 file changed, 12 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > > > index daec3309b014..aa2bcb47b47e 100644
> > > > --- a/drivers/irqchip/irq-gic-v3.c
> > > > +++ b/drivers/irqchip/irq-gic-v3.c
> > > > @@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
> > > >  
> > > >  static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> > > >  {
> > > > -	bool irqs_enabled = interrupts_enabled(regs);
> > > >  	int err;
> > > >  
> > > > -	if (irqs_enabled)
> > > > -		nmi_enter();
> > > > -
> > > >  	if (static_branch_likely(&supports_deactivate_key))
> > > >  		gic_write_eoir(irqnr);
> > > >  	/*
> > > > @@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> > > >  	if (err)
> > > >  		gic_deactivate_unhandled(irqnr);
> > > >  
> > > > -	if (irqs_enabled)
> > > > -		nmi_exit();
> > > >  }
> > > >  
> > > >  static u32 do_read_iar(struct pt_regs *regs)
> > > > @@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
> > > >  	return iar;
> > > >  }
> > > >  
> > > > +static bool gic_is_in_nmi(void)
> > > > +{
> > > > +	if (gic_supports_nmi() &&
> > > > +	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
> > > > +		return true;
> > > 
> > > I don't think this fixes anything.
> > > 
> > > RPR stands for 'Running Priority Register', which in GIC speak reports
> > > the priority of the most recently Ack'ed interrupt.
> > > 
> > > You cannot use this to find out whether the interrupt that you /will/
> > > ack is a NMI or not. Actually, you cannot find out about *any*
> > > priority until you actually ack the interrupt. What you are asking for
> > > is the equivalent of a crystal ball, and we're in short supply... ;-)
> > > 
> > > The only case where ICC_RPR_EL1 will return something that is equal to
> > > GICD_INT_NMI_PRI is when you are *already* in an NMI context. So
> > > unless I have completely misunderstood your approach (which is always
> > > possible), I don't see how this can work.
> > > 
> > 
> > Thank you for the clear explanation. Also I revist this part in "GIC v3
> > and v4 overview" and have a deeper understanding. (Need to spare time to
> > go through all later)
> > 
> > You totally got my idea, and I need to find a bail-out.
> > 
> > As all kinds of PIC at least have two parts of functions: active (Ack) and
> > deactive(EOI), is it possible to split handle_arch_irq into two parts?
> > I.e let irqchip expose two interfaces:
> >   u32 (*read_irqinfo*)(struct pt_regs *regs, bool *is_nmi)
> >   void (*handle_arch_irq)(struct pt_regs *regs, u32 irqnr)
> > to replace the current interface:
> 
> No. There is no way we will move to such a scheme. We want to isolate
> the irqchip stuff in its own corner, and not propagate it into the
> arch code.

I understood this beautiful design of isolation. But since the x86 and
powerpc have integrated the PIC as part of arch at hardware design,
they can tell the source of the interrupt at the arch level code.

That is why their IPI_RESCHEDULE is cheaper than arm64. But yeah, the
benifit may not persuasive enough to breach the design goal.

> 
> If the pseudo-NMI is such a problem, I'm all for removing it *now*,
> never to come back again.
> 

It is a pity that pNMI has not fulfilled the roles like other Arches yet.
Two benefits come with NMI on x86 and powerpc:
-1. hardlockup detector uses NMI to ease the detection and analysis of
unpaired irq enable/disable. It can accurately report the stack trace
-2. In kdump case, all cpus can be forced into a known code piece.

But on arm64, it does not play such a role yet. Once I hit a boot hang
issue on a customed kdump kernel, there is no tick irq after adding
some printk, and if there is hardlockup detector, it will be easy to
debug.

> >   void (*handle_arch_irq)(struct pt_regs *regs)
> >   
> > I have thought about such stuff for some days. And the benefits include:
> >   -1. For this bugfix (by the parameter 'is_nmi')
> >   -2. IPI_RESCHEDULE performance drop issue can be resolved at arch
> >       code level. (by irqnr - ipi_irq_base == IPI_RESCHEDULE ?)
> >   -3. The arch level can provide a similar loop as aic_handle_irq() in irq-apple-aic.c,
> >       which can save cpu by avoiding heavy context sync when irq is
> >       intensive.
> > 
> > Do you think it is doable?
> 
> I don't think this is even needed, because I don't believe that the
> whole thing is a real problem.
> 
> In patch #1, you are claiming that handling a NMI as an IRQ first, and
> then upgrading to NMI once we know it really is an NMI is a problem.
> How different is this from an IRQ being preempted by a NMI?
> 

It turned out a false alarm as my reply to Mark. In that case, the code
should take __el1_pnmi(), but I mistake it as __el1_irq().

Thank again for your time and patient reply.


Regards,

	Pingfan

> Your own conclusion is that the this later case isn't a problem. So
> why is the former an issue?
> 
> I'm not saying that there is no issue at all, and it could well be
> that you have spotted something that I cannot see yet. But if that's
> the case, it means that the core code is broken as well.
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator
@ 2021-11-19  2:38           ` Pingfan Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Pingfan Liu @ 2021-11-19  2:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Joey Gouly, Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Wed, Nov 17, 2021 at 11:01:23AM +0000, Marc Zyngier wrote:
> On Wed, 17 Nov 2021 10:16:53 +0000,
> Pingfan Liu <kernelfans@gmail.com> wrote:
> > 
> > On Tue, Nov 16, 2021 at 09:53:25AM +0000, Marc Zyngier wrote:
> > > Hi Pingfan,
> > > 
> > > On Tue, 16 Nov 2021 08:24:49 +0000,
> > > Pingfan Liu <kernelfans@gmail.com> wrote:
> > > > 
> > > > Arch level code is ready to take over the nmi_enter()/nmi_exit()
> > > > housekeeping.
> > > > 
> > > > GICv3 can expose the pNMI discriminator, then simply remove the
> > > > housekeeping.
> > > > 
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > Cc: Marc Zyngier <maz@kernel.org>
> > > > Cc: Joey Gouly <joey.gouly@arm.com>
> > > > Cc: Sami Tolvanen <samitolvanen@google.com>
> > > > Cc: Julien Thierry <julien.thierry@arm.com>
> > > > Cc: Yuichi Ito <ito-yuichi@fujitsu.com>
> > > > Cc: rcu@vger.kernel.org
> > > > To: linux-arm-kernel@lists.infradead.org
> > > > ---
> > > >  drivers/irqchip/irq-gic-v3.c | 18 ++++++++++++------
> > > >  1 file changed, 12 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > > > index daec3309b014..aa2bcb47b47e 100644
> > > > --- a/drivers/irqchip/irq-gic-v3.c
> > > > +++ b/drivers/irqchip/irq-gic-v3.c
> > > > @@ -646,12 +646,8 @@ static void gic_deactivate_unhandled(u32 irqnr)
> > > >  
> > > >  static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> > > >  {
> > > > -	bool irqs_enabled = interrupts_enabled(regs);
> > > >  	int err;
> > > >  
> > > > -	if (irqs_enabled)
> > > > -		nmi_enter();
> > > > -
> > > >  	if (static_branch_likely(&supports_deactivate_key))
> > > >  		gic_write_eoir(irqnr);
> > > >  	/*
> > > > @@ -664,8 +660,6 @@ static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> > > >  	if (err)
> > > >  		gic_deactivate_unhandled(irqnr);
> > > >  
> > > > -	if (irqs_enabled)
> > > > -		nmi_exit();
> > > >  }
> > > >  
> > > >  static u32 do_read_iar(struct pt_regs *regs)
> > > > @@ -702,6 +696,15 @@ static u32 do_read_iar(struct pt_regs *regs)
> > > >  	return iar;
> > > >  }
> > > >  
> > > > +static bool gic_is_in_nmi(void)
> > > > +{
> > > > +	if (gic_supports_nmi() &&
> > > > +	    unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI)))
> > > > +		return true;
> > > 
> > > I don't think this fixes anything.
> > > 
> > > RPR stands for 'Running Priority Register', which in GIC speak reports
> > > the priority of the most recently Ack'ed interrupt.
> > > 
> > > You cannot use this to find out whether the interrupt that you /will/
> > > ack is a NMI or not. Actually, you cannot find out about *any*
> > > priority until you actually ack the interrupt. What you are asking for
> > > is the equivalent of a crystal ball, and we're in short supply... ;-)
> > > 
> > > The only case where ICC_RPR_EL1 will return something that is equal to
> > > GICD_INT_NMI_PRI is when you are *already* in an NMI context. So
> > > unless I have completely misunderstood your approach (which is always
> > > possible), I don't see how this can work.
> > > 
> > 
> > Thank you for the clear explanation. Also I revist this part in "GIC v3
> > and v4 overview" and have a deeper understanding. (Need to spare time to
> > go through all later)
> > 
> > You totally got my idea, and I need to find a bail-out.
> > 
> > As all kinds of PIC at least have two parts of functions: active (Ack) and
> > deactive(EOI), is it possible to split handle_arch_irq into two parts?
> > I.e let irqchip expose two interfaces:
> >   u32 (*read_irqinfo*)(struct pt_regs *regs, bool *is_nmi)
> >   void (*handle_arch_irq)(struct pt_regs *regs, u32 irqnr)
> > to replace the current interface:
> 
> No. There is no way we will move to such a scheme. We want to isolate
> the irqchip stuff in its own corner, and not propagate it into the
> arch code.

I understood this beautiful design of isolation. But since the x86 and
powerpc have integrated the PIC as part of arch at hardware design,
they can tell the source of the interrupt at the arch level code.

That is why their IPI_RESCHEDULE is cheaper than arm64. But yeah, the
benifit may not persuasive enough to breach the design goal.

> 
> If the pseudo-NMI is such a problem, I'm all for removing it *now*,
> never to come back again.
> 

It is a pity that pNMI has not fulfilled the roles like other Arches yet.
Two benefits come with NMI on x86 and powerpc:
-1. hardlockup detector uses NMI to ease the detection and analysis of
unpaired irq enable/disable. It can accurately report the stack trace
-2. In kdump case, all cpus can be forced into a known code piece.

But on arm64, it does not play such a role yet. Once I hit a boot hang
issue on a customed kdump kernel, there is no tick irq after adding
some printk, and if there is hardlockup detector, it will be easy to
debug.

> >   void (*handle_arch_irq)(struct pt_regs *regs)
> >   
> > I have thought about such stuff for some days. And the benefits include:
> >   -1. For this bugfix (by the parameter 'is_nmi')
> >   -2. IPI_RESCHEDULE performance drop issue can be resolved at arch
> >       code level. (by irqnr - ipi_irq_base == IPI_RESCHEDULE ?)
> >   -3. The arch level can provide a similar loop as aic_handle_irq() in irq-apple-aic.c,
> >       which can save cpu by avoiding heavy context sync when irq is
> >       intensive.
> > 
> > Do you think it is doable?
> 
> I don't think this is even needed, because I don't believe that the
> whole thing is a real problem.
> 
> In patch #1, you are claiming that handling a NMI as an IRQ first, and
> then upgrading to NMI once we know it really is an NMI is a problem.
> How different is this from an IRQ being preempted by a NMI?
> 

It turned out a false alarm as my reply to Mark. In that case, the code
should take __el1_pnmi(), but I mistake it as __el1_irq().

Thank again for your time and patient reply.


Regards,

	Pingfan

> Your own conclusion is that the this later case isn't a problem. So
> why is the former an issue?
> 
> I'm not saying that there is no issue at all, and it could well be
> that you have spotted something that I cannot see yet. But if that's
> the case, it means that the core code is broken as well.
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
  2021-11-19  2:01       ` Pingfan Liu
@ 2021-11-19 14:04         ` Mark Rutland
  -1 siblings, 0 replies; 24+ messages in thread
From: Mark Rutland @ 2021-11-19 14:04 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-arm-kernel, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Fri, Nov 19, 2021 at 10:01:00AM +0800, Pingfan Liu wrote:
> On Wed, Nov 17, 2021 at 11:38:54AM +0000, Mark Rutland wrote:
> > On Tue, Nov 16, 2021 at 04:24:47PM +0800, Pingfan Liu wrote:
> > > Linux kernel places strict semantics between NMI and maskable interrupt.
> > > So does the RCU component, else deadlock may happen. But the current
> > > arm64 entry code can partially breach this rule through calling
> > > rcu_nmi_enter().
> > > 
> > > ***  how a deadlock can happen if NMI mistaken as IRQ  ***
> > > 
> > >     rcu_nmi_enter()
> > >     {
> > > 
> > >         if (rcu_dynticks_curr_cpu_in_eqs()) {
> > > 
> > >         	if (!in_nmi())
> > >         		rcu_dynticks_task_exit();
> > >                 ...
> > >         	if (!in_nmi()) {
> > >         		instrumentation_begin();
> > >         		rcu_cleanup_after_idle();
> > >         		instrumentation_end();
> > >         	}
> > >                 ...
> > >         } else if (!in_nmi()) {
> > >         	instrumentation_begin();
> > >         	rcu_irq_enter_check_tick();
> > >         }
> > > 
> > >     }
> > > 
> > > If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
> > > can hit a deadlock, which is demonstrated by the following scenario:
> > > 
> > >     note_gp_changes() runs in a task context
> > >     {
> > >        local_irq_save(flags); // this protects against irq, but not NMI
> > 
> > Note: speecifically, this masks IRQs via ICC_PMR_EL1.
> > 
> > >        rnp = rdp->mynode;
> > >        ...
> > >        raw_spin_trylock_rcu_node(rnp)
> > >                                   -------> broken in by (p)NMI, without taking __nmi_enter()
> > 
> > AFAICT the broken case described here *cannot* happen.
> > 
> > If we take a pNMI here, we'll go:
> > 
> >   el1h_64_irq()
> >   -> el1h64_irq_handler()
> >   -> el1_interrupt()
> > 
> > ... where el1_interrupt is:
> > 
> > | static void noinstr el1_interrupt(struct pt_regs *regs,
> > |                                   void (*handler)(struct pt_regs *)) 
> > | {
> > |         write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
> > | 
> > |         if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> > |                 __el1_pnmi(regs, handler);
> > |         else
> > |                 __el1_irq(regs, handler);
> > | } 
> > 
> > ... and interrupts_enabled() is:
> > 
> > | #define interrupts_enabled(regs)                        \
> > |        (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
> > 
> > ... and irqs_priority_unmasked is:
> > 
> > | #define irqs_priority_unmasked(regs)                                    \
> > |         (system_uses_irq_prio_masking() ?                               \
> > |                 (regs)->pmr_save == GIC_PRIO_IRQON :                    \
> > |                 true)
> > 
> > ... irqs_priority_unmasked(regs) *must* return false for this case,
> > since the local_irq_save() above must have set the PMR to
> > GIC_PRIO_IRQOFF in the interrupted context.
> > 
> > Thus, we *must* call __el1_pnmi() in this case, where __el1_pnmi() is:
> > 
> 
> Yes, you are right. And I made a mistake to think it will call
> __el1_irq()

> As above, I made a mistake about the condition hence the code path.
> There is no problem in this case through calling __el1_pnmi()
> 
> Sorry for the false alarm and thank you again for your detailed
> explaination.

No worries; thanks for confirming this was a false alarm. Glad we're now
on the same page. :)

Likewise, thanks for reporting what you thought was an issue; having
more eyes on this is always good!

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU
@ 2021-11-19 14:04         ` Mark Rutland
  0 siblings, 0 replies; 24+ messages in thread
From: Mark Rutland @ 2021-11-19 14:04 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-arm-kernel, Paul E . McKenney, Catalin Marinas,
	Will Deacon, Marc Zyngier, Thomas Gleixner, Joey Gouly,
	Sami Tolvanen, Julien Thierry, Yuichi Ito, rcu

On Fri, Nov 19, 2021 at 10:01:00AM +0800, Pingfan Liu wrote:
> On Wed, Nov 17, 2021 at 11:38:54AM +0000, Mark Rutland wrote:
> > On Tue, Nov 16, 2021 at 04:24:47PM +0800, Pingfan Liu wrote:
> > > Linux kernel places strict semantics between NMI and maskable interrupt.
> > > So does the RCU component, else deadlock may happen. But the current
> > > arm64 entry code can partially breach this rule through calling
> > > rcu_nmi_enter().
> > > 
> > > ***  how a deadlock can happen if NMI mistaken as IRQ  ***
> > > 
> > >     rcu_nmi_enter()
> > >     {
> > > 
> > >         if (rcu_dynticks_curr_cpu_in_eqs()) {
> > > 
> > >         	if (!in_nmi())
> > >         		rcu_dynticks_task_exit();
> > >                 ...
> > >         	if (!in_nmi()) {
> > >         		instrumentation_begin();
> > >         		rcu_cleanup_after_idle();
> > >         		instrumentation_end();
> > >         	}
> > >                 ...
> > >         } else if (!in_nmi()) {
> > >         	instrumentation_begin();
> > >         	rcu_irq_enter_check_tick();
> > >         }
> > > 
> > >     }
> > > 
> > > If a NMI is mistaken as a maskable interrupt, rcu_irq_enter_check_tick()
> > > can hit a deadlock, which is demonstrated by the following scenario:
> > > 
> > >     note_gp_changes() runs in a task context
> > >     {
> > >        local_irq_save(flags); // this protects against irq, but not NMI
> > 
> > Note: speecifically, this masks IRQs via ICC_PMR_EL1.
> > 
> > >        rnp = rdp->mynode;
> > >        ...
> > >        raw_spin_trylock_rcu_node(rnp)
> > >                                   -------> broken in by (p)NMI, without taking __nmi_enter()
> > 
> > AFAICT the broken case described here *cannot* happen.
> > 
> > If we take a pNMI here, we'll go:
> > 
> >   el1h_64_irq()
> >   -> el1h64_irq_handler()
> >   -> el1_interrupt()
> > 
> > ... where el1_interrupt is:
> > 
> > | static void noinstr el1_interrupt(struct pt_regs *regs,
> > |                                   void (*handler)(struct pt_regs *)) 
> > | {
> > |         write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
> > | 
> > |         if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> > |                 __el1_pnmi(regs, handler);
> > |         else
> > |                 __el1_irq(regs, handler);
> > | } 
> > 
> > ... and interrupts_enabled() is:
> > 
> > | #define interrupts_enabled(regs)                        \
> > |        (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
> > 
> > ... and irqs_priority_unmasked is:
> > 
> > | #define irqs_priority_unmasked(regs)                                    \
> > |         (system_uses_irq_prio_masking() ?                               \
> > |                 (regs)->pmr_save == GIC_PRIO_IRQON :                    \
> > |                 true)
> > 
> > ... irqs_priority_unmasked(regs) *must* return false for this case,
> > since the local_irq_save() above must have set the PMR to
> > GIC_PRIO_IRQOFF in the interrupted context.
> > 
> > Thus, we *must* call __el1_pnmi() in this case, where __el1_pnmi() is:
> > 
> 
> Yes, you are right. And I made a mistake to think it will call
> __el1_irq()

> As above, I made a mistake about the condition hence the code path.
> There is no problem in this case through calling __el1_pnmi()
> 
> Sorry for the false alarm and thank you again for your detailed
> explaination.

No worries; thanks for confirming this was a false alarm. Glad we're now
on the same page. :)

Likewise, thanks for reporting what you thought was an issue; having
more eyes on this is always good!

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-11-19 14:05 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-16  8:24 [PATCHv3 0/4] arm64: Fixes RCU deadlock due to a mistaken Pingfan Liu
2021-11-16  8:24 ` Pingfan Liu
2021-11-16  8:24 ` [PATCHv3 1/4] arm64: entry: judge nmi ealier to avoid deadlock in RCU Pingfan Liu
2021-11-16  8:24   ` Pingfan Liu
2021-11-17 11:38   ` Mark Rutland
2021-11-17 11:38     ` Mark Rutland
2021-11-19  2:01     ` Pingfan Liu
2021-11-19  2:01       ` Pingfan Liu
2021-11-19 14:04       ` Mark Rutland
2021-11-19 14:04         ` Mark Rutland
2021-11-16  8:24 ` [PATCHv3 2/4] arm64: entry: distinguish pNMI earlier in el0 interrupt Pingfan Liu
2021-11-16  8:24   ` Pingfan Liu
2021-11-16  8:24 ` [PATCHv3 3/4] irqchip: GICv3: expose pNMI discriminator Pingfan Liu
2021-11-16  8:24   ` Pingfan Liu
2021-11-16  9:53   ` Marc Zyngier
2021-11-16  9:53     ` Marc Zyngier
2021-11-17 10:16     ` Pingfan Liu
2021-11-17 10:16       ` Pingfan Liu
2021-11-17 11:01       ` Marc Zyngier
2021-11-17 11:01         ` Marc Zyngier
2021-11-19  2:38         ` Pingfan Liu
2021-11-19  2:38           ` Pingfan Liu
2021-11-16  8:24 ` [PATCHv3 4/4] arm64: entry: remove pNMI judgement in __el1_interrupt() path Pingfan Liu
2021-11-16  8:24   ` Pingfan Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.