linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
@ 2021-04-16  6:22 He Ying
  2021-04-16 14:15 ` Marc Zyngier
  0 siblings, 1 reply; 7+ messages in thread
From: He Ying @ 2021-04-16  6:22 UTC (permalink / raw)
  To: tglx, maz, julien.thierry.kdev, catalin.marinas
  Cc: linux-kernel, heying24, linux-arm-kernel

We found this problem in our kernel src tree:

[   14.816231] ------------[ cut here ]------------
[   14.816231] kernel BUG at irq.c:99!
[   14.816232] Internal error: Oops - BUG: 0 [#1] SMP
[   14.816232] Process swapper/0 (pid: 0, stack limit = 0x(____ptrval____))
[   14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      4.19.95-1.h1.AOS2.0.aarch64 #14
[   14.816233] Hardware name: evb (DT)
[   14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[   14.816234] pc : asm_nmi_enter+0x94/0x98
[   14.816235] lr : asm_nmi_enter+0x18/0x98
[   14.816235] sp : ffff000008003c50
[   14.816235] pmr_save: 00000070
[   14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0
[   14.816238] x27: 0000000000000000 x26: ffff000008004000
[   14.816239] x25: 00000000015e0000 x24: ffff8008fb916000
[   14.816240] x23: 0000000020400005 x22: ffff0000080817cc
[   14.816241] x21: ffff000008003da0 x20: 0000000000000060
[   14.816242] x19: 00000000000003ff x18: ffffffffffffffff
[   14.816243] x17: 0000000000000008 x16: 003d090000000000
[   14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40
[   14.816244] x13: ffff8008fff58b9d x12: 0000000000000000
[   14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5
[   14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f
[   14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e
[   14.816248] x5 : 0000000000000000 x4 : 0000000080000000
[   14.816249] x3 : 0000000000000000 x2 : 0000000080000000
[   14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0
[   14.816251] Call trace:
[   14.816251]  asm_nmi_enter+0x94/0x98
[   14.816251]  el1_irq+0x8c/0x180
[   14.816252]  gic_handle_irq+0xbc/0x2e4
[   14.816252]  el1_irq+0xcc/0x180
[   14.816253]  arch_timer_handler_virt+0x38/0x58
[   14.816253]  handle_percpu_devid_irq+0x90/0x240
[   14.816253]  generic_handle_irq+0x34/0x50
[   14.816254]  __handle_domain_irq+0x68/0xc0
[   14.816254]  gic_handle_irq+0xf8/0x2e4
[   14.816255]  el1_irq+0xcc/0x180
[   14.816255]  arch_cpu_idle+0x34/0x1c8
[   14.816255]  default_idle_call+0x24/0x44
[   14.816256]  do_idle+0x1d0/0x2c8
[   14.816256]  cpu_startup_entry+0x28/0x30
[   14.816256]  rest_init+0xb8/0xc8
[   14.816257]  start_kernel+0x4c8/0x4f4
[   14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000)
[   14.816258] Modules linked in: start_dp(O) smeth(O)
[   15.103092] ---[ end trace 701753956cb14aa8 ]---
[   15.103093] Kernel panic - not syncing: Fatal exception in interrupt
[   15.103099] SMP: stopping secondary CPUs
[   15.103100] Kernel Offset: disabled
[   15.103100] CPU features: 0x36,a2400218
[   15.103100] Memory Limit: none

Our kernel src tree is based on 4.19.95 and backports arm64 pseudo-NMI
patches but doesn't support nested NMI. Its top relative commit is
commit 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs").

I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
in nmi_enter(). From the call trace, we find two 'el1_irqs' which
means an interrupt preempts the other one and the new one is an NMI.
Furthermore, by adding some prints, we find the first irq also calls
nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number
is 1023. It enables irq by calling gic_arch_enable_irqs() in
gic_handle_irq(). At this moment, the second irq preempts the first irq
and it's an NMI but current context is already in nmi. So that may be
the problem.

In my opinion, when handling spurious interrupts, we shouldn't enable irqs.
My reason is that for spurious interrupts we may enter nmi context in
el1_irq() because current PMR may be GIC_PRIO_IRQOFF. If we enable irqs
at this time, another NMI may happen and preempt this spurious interrupt
but the context is already in nmi. That causes a bug on if nested NMI is
not supported. Even for nested nmi, I think it's not a normal scenario.

Fixes: 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs")
Signed-off-by: He Ying <heying24@huawei.com>
---
 drivers/irqchip/irq-gic-v3.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 94b89258d045..d3b52734a2c5 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -654,15 +654,15 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
 		return;
 	}
 
+	/* Check for special IDs first */
+	if ((irqnr >= 1020 && irqnr <= 1023))
+		return;
+
 	if (gic_prio_masking_enabled()) {
 		gic_pmr_mask_irqs();
 		gic_arch_enable_irqs();
 	}
 
-	/* Check for special IDs first */
-	if ((irqnr >= 1020 && irqnr <= 1023))
-		return;
-
 	if (static_branch_likely(&supports_deactivate_key))
 		gic_write_eoir(irqnr);
 	else
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
  2021-04-16  6:22 [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups He Ying
@ 2021-04-16 14:15 ` Marc Zyngier
  2021-04-17  2:01   ` He Ying
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Zyngier @ 2021-04-16 14:15 UTC (permalink / raw)
  To: He Ying
  Cc: tglx, julien.thierry.kdev, catalin.marinas, linux-kernel,
	linux-arm-kernel, Mark Rutland

[+ Mark]

On Fri, 16 Apr 2021 07:22:17 +0100,
He Ying <heying24@huawei.com> wrote:
> 
> We found this problem in our kernel src tree:
> 
> [   14.816231] ------------[ cut here ]------------
> [   14.816231] kernel BUG at irq.c:99!
> [   14.816232] Internal error: Oops - BUG: 0 [#1] SMP
> [   14.816232] Process swapper/0 (pid: 0, stack limit = 0x(____ptrval____))
> [   14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      4.19.95-1.h1.AOS2.0.aarch64 #14
> [   14.816233] Hardware name: evb (DT)
> [   14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO)
> [   14.816234] pc : asm_nmi_enter+0x94/0x98
> [   14.816235] lr : asm_nmi_enter+0x18/0x98
> [   14.816235] sp : ffff000008003c50
> [   14.816235] pmr_save: 00000070
> [   14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0
> [   14.816238] x27: 0000000000000000 x26: ffff000008004000
> [   14.816239] x25: 00000000015e0000 x24: ffff8008fb916000
> [   14.816240] x23: 0000000020400005 x22: ffff0000080817cc
> [   14.816241] x21: ffff000008003da0 x20: 0000000000000060
> [   14.816242] x19: 00000000000003ff x18: ffffffffffffffff
> [   14.816243] x17: 0000000000000008 x16: 003d090000000000
> [   14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40
> [   14.816244] x13: ffff8008fff58b9d x12: 0000000000000000
> [   14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5
> [   14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f
> [   14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e
> [   14.816248] x5 : 0000000000000000 x4 : 0000000080000000
> [   14.816249] x3 : 0000000000000000 x2 : 0000000080000000
> [   14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0
> [   14.816251] Call trace:
> [   14.816251]  asm_nmi_enter+0x94/0x98
> [   14.816251]  el1_irq+0x8c/0x180
> [   14.816252]  gic_handle_irq+0xbc/0x2e4
> [   14.816252]  el1_irq+0xcc/0x180
> [   14.816253]  arch_timer_handler_virt+0x38/0x58
> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
> [   14.816253]  generic_handle_irq+0x34/0x50
> [   14.816254]  __handle_domain_irq+0x68/0xc0
> [   14.816254]  gic_handle_irq+0xf8/0x2e4
> [   14.816255]  el1_irq+0xcc/0x180
> [   14.816255]  arch_cpu_idle+0x34/0x1c8
> [   14.816255]  default_idle_call+0x24/0x44
> [   14.816256]  do_idle+0x1d0/0x2c8
> [   14.816256]  cpu_startup_entry+0x28/0x30
> [   14.816256]  rest_init+0xb8/0xc8
> [   14.816257]  start_kernel+0x4c8/0x4f4
> [   14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000)
> [   14.816258] Modules linked in: start_dp(O) smeth(O)
> [   15.103092] ---[ end trace 701753956cb14aa8 ]---
> [   15.103093] Kernel panic - not syncing: Fatal exception in interrupt
> [   15.103099] SMP: stopping secondary CPUs
> [   15.103100] Kernel Offset: disabled
> [   15.103100] CPU features: 0x36,a2400218
> [   15.103100] Memory Limit: none

Urgh...

> Our kernel src tree is based on 4.19.95 and backports arm64 pseudo-NMI
> patches but doesn't support nested NMI. Its top relative commit is
> commit 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs").

Can you please reproduce it with mainline and without any backport?
It is hard to reason about something that isn't a vanilla kernel.

> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
> in nmi_enter(). From the call trace, we find two 'el1_irqs' which
> means an interrupt preempts the other one and the new one is an NMI.
> Furthermore, by adding some prints, we find the first irq also calls
> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number
> is 1023. It enables irq by calling gic_arch_enable_irqs() in
> gic_handle_irq(). At this moment, the second irq preempts the first irq
> and it's an NMI but current context is already in nmi. So that may be
> the problem.

I'm not sure I get it. From the stack trace, I see this:

[   14.816251]  asm_nmi_enter+0x94/0x98
[   14.816251]  el1_irq+0x8c/0x180			(C)
[   14.816252]  gic_handle_irq+0xbc/0x2e4
[   14.816252]  el1_irq+0xcc/0x180			(B)
[   14.816253]  arch_timer_handler_virt+0x38/0x58
[   14.816253]  handle_percpu_devid_irq+0x90/0x240
[   14.816253]  generic_handle_irq+0x34/0x50
[   14.816254]  __handle_domain_irq+0x68/0xc0
[   14.816254]  gic_handle_irq+0xf8/0x2e4
[   14.816255]  el1_irq+0xcc/0x180			(A)

which indicates that we preempted a timer interrupt (A) with another
IRQ (B), itself immediately preempted by another IRQ (C)? That's
indeed at least one too many.

Can you please describe for each of (A), (B) and (C) whether they are
spurious or not, what their priorities are if they aren't spurious?

> In my opinion, when handling spurious interrupts, we shouldn't enable irqs.
> My reason is that for spurious interrupts we may enter nmi context in
> el1_irq() because current PMR may be GIC_PRIO_IRQOFF. If we enable irqs
> at this time, another NMI may happen and preempt this spurious interrupt
> but the context is already in nmi. That causes a bug on if nested NMI is
> not supported. Even for nested nmi, I think it's not a normal scenario.

I would tend to agree that this isn't great. Actually, I'd probably
move the check for a spurious interrupt right after the read of
ICC_IAR1_EL1, because there is no real need to do anything else at
that point.

However, upstream is quite different from 4.19 in that respect, and
I'm not sure if what I am looking at is what you are seeing with your
older kernel.

Thanks,

	M.

> Fixes: 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs")
> Signed-off-by: He Ying <heying24@huawei.com>
> ---
>  drivers/irqchip/irq-gic-v3.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 94b89258d045..d3b52734a2c5 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -654,15 +654,15 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
>  		return;
>  	}
>  
> +	/* Check for special IDs first */
> +	if ((irqnr >= 1020 && irqnr <= 1023))
> +		return;
> +
>  	if (gic_prio_masking_enabled()) {
>  		gic_pmr_mask_irqs();
>  		gic_arch_enable_irqs();
>  	}
>  
> -	/* Check for special IDs first */
> -	if ((irqnr >= 1020 && irqnr <= 1023))
> -		return;
> -
>  	if (static_branch_likely(&supports_deactivate_key))
>  		gic_write_eoir(irqnr);
>  	else
> -- 
> 2.17.1
> 
> 

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
  2021-04-16 14:15 ` Marc Zyngier
@ 2021-04-17  2:01   ` He Ying
  2021-04-21  9:13     ` He Ying
  2021-04-22 12:27     ` Marc Zyngier
  0 siblings, 2 replies; 7+ messages in thread
From: He Ying @ 2021-04-17  2:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: tglx, julien.thierry.kdev, catalin.marinas, linux-kernel,
	linux-arm-kernel, Mark Rutland

Hello Marc,


在 2021/4/16 22:15, Marc Zyngier 写道:
> [+ Mark]
>
> On Fri, 16 Apr 2021 07:22:17 +0100,
> He Ying <heying24@huawei.com> wrote:
>> We found this problem in our kernel src tree:
>>
>> [   14.816231] ------------[ cut here ]------------
>> [   14.816231] kernel BUG at irq.c:99!
>> [   14.816232] Internal error: Oops - BUG: 0 [#1] SMP
>> [   14.816232] Process swapper/0 (pid: 0, stack limit = 0x(____ptrval____))
>> [   14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      4.19.95-1.h1.AOS2.0.aarch64 #14
>> [   14.816233] Hardware name: evb (DT)
>> [   14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO)
>> [   14.816234] pc : asm_nmi_enter+0x94/0x98
>> [   14.816235] lr : asm_nmi_enter+0x18/0x98
>> [   14.816235] sp : ffff000008003c50
>> [   14.816235] pmr_save: 00000070
>> [   14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0
>> [   14.816238] x27: 0000000000000000 x26: ffff000008004000
>> [   14.816239] x25: 00000000015e0000 x24: ffff8008fb916000
>> [   14.816240] x23: 0000000020400005 x22: ffff0000080817cc
>> [   14.816241] x21: ffff000008003da0 x20: 0000000000000060
>> [   14.816242] x19: 00000000000003ff x18: ffffffffffffffff
>> [   14.816243] x17: 0000000000000008 x16: 003d090000000000
>> [   14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40
>> [   14.816244] x13: ffff8008fff58b9d x12: 0000000000000000
>> [   14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5
>> [   14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f
>> [   14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e
>> [   14.816248] x5 : 0000000000000000 x4 : 0000000080000000
>> [   14.816249] x3 : 0000000000000000 x2 : 0000000080000000
>> [   14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0
>> [   14.816251] Call trace:
>> [   14.816251]  asm_nmi_enter+0x94/0x98
>> [   14.816251]  el1_irq+0x8c/0x180
>> [   14.816252]  gic_handle_irq+0xbc/0x2e4
>> [   14.816252]  el1_irq+0xcc/0x180
>> [   14.816253]  arch_timer_handler_virt+0x38/0x58
>> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
>> [   14.816253]  generic_handle_irq+0x34/0x50
>> [   14.816254]  __handle_domain_irq+0x68/0xc0
>> [   14.816254]  gic_handle_irq+0xf8/0x2e4
>> [   14.816255]  el1_irq+0xcc/0x180
>> [   14.816255]  arch_cpu_idle+0x34/0x1c8
>> [   14.816255]  default_idle_call+0x24/0x44
>> [   14.816256]  do_idle+0x1d0/0x2c8
>> [   14.816256]  cpu_startup_entry+0x28/0x30
>> [   14.816256]  rest_init+0xb8/0xc8
>> [   14.816257]  start_kernel+0x4c8/0x4f4
>> [   14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000)
>> [   14.816258] Modules linked in: start_dp(O) smeth(O)
>> [   15.103092] ---[ end trace 701753956cb14aa8 ]---
>> [   15.103093] Kernel panic - not syncing: Fatal exception in interrupt
>> [   15.103099] SMP: stopping secondary CPUs
>> [   15.103100] Kernel Offset: disabled
>> [   15.103100] CPU features: 0x36,a2400218
>> [   15.103100] Memory Limit: none
> Urgh...
>
>> Our kernel src tree is based on 4.19.95 and backports arm64 pseudo-NMI
>> patches but doesn't support nested NMI. Its top relative commit is
>> commit 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs").
> Can you please reproduce it with mainline and without any backport?
> It is hard to reason about something that isn't a vanilla kernel.

I think our kernel is quite like v5.3 mainline. Reproducing it in v5.3 
mainline may

be a little difficult for us because our product needs some more self 
developed

patches to work.

>
>> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
>> in nmi_enter(). From the call trace, we find two 'el1_irqs' which
>> means an interrupt preempts the other one and the new one is an NMI.
>> Furthermore, by adding some prints, we find the first irq also calls
>> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number
>> is 1023. It enables irq by calling gic_arch_enable_irqs() in
>> gic_handle_irq(). At this moment, the second irq preempts the first irq
>> and it's an NMI but current context is already in nmi. So that may be
>> the problem.
> I'm not sure I get it. From the stack trace, I see this:
>
> [   14.816251]  asm_nmi_enter+0x94/0x98
> [   14.816251]  el1_irq+0x8c/0x180			(C)
> [   14.816252]  gic_handle_irq+0xbc/0x2e4
> [   14.816252]  el1_irq+0xcc/0x180			(B)
> [   14.816253]  arch_timer_handler_virt+0x38/0x58
> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
> [   14.816253]  generic_handle_irq+0x34/0x50
> [   14.816254]  __handle_domain_irq+0x68/0xc0
> [   14.816254]  gic_handle_irq+0xf8/0x2e4
> [   14.816255]  el1_irq+0xcc/0x180			(A)
>
> which indicates that we preempted a timer interrupt (A) with another
> IRQ (B), itself immediately preempted by another IRQ (C)? That's
> indeed at least one too many.
>
> Can you please describe for each of (A), (B) and (C) whether they are
> spurious or not, what their priorities are if they aren't spurious?

Yes. I ignored interrupt (A). (B) is spurious and its priority is 0xa0 
and PMR is 0x70.

(C) is an NMI and its priority is 0x20. Note that GIC_PRIO_IRQON is 0xe0,

GIC_PRIO_IRQOFF is 0x60, GICD_INT_DEF_PRI is 0xa0 and GICD_INT_NMI_PRI is

0x20 in our kernel.

>> In my opinion, when handling spurious interrupts, we shouldn't enable irqs.
>> My reason is that for spurious interrupts we may enter nmi context in
>> el1_irq() because current PMR may be GIC_PRIO_IRQOFF. If we enable irqs
>> at this time, another NMI may happen and preempt this spurious interrupt
>> but the context is already in nmi. That causes a bug on if nested NMI is
>> not supported. Even for nested nmi, I think it's not a normal scenario.
> I would tend to agree that this isn't great. Actually, I'd probably
> move the check for a spurious interrupt right after the read of
> ICC_IAR1_EL1, because there is no real need to do anything else at
> that point.

So, we don't need to check NMI for spurious interrupts? Do you mean that 
spurious

interrupts' can't be NMIs? Or even spurious interrups are NMIs, we 
shouldn't do

anything for them? If so, I will move the check after the read of 
ICC_IAR1_EL1 and

send a V2.

>
> However, upstream is quite different from 4.19 in that respect, and
> I'm not sure if what I am looking at is what you are seeing with your
> older kernel.

I know that. And I look into all patches about arm64 pseudo NMIs. As I 
said before,

our kernel is very quite like v5.3 mainline. I think we are talking 
about the same thing.


In my opinion, since commit 17ce302f3117 ("arm64: Fix interrupt tracing 
in the presence of NMIs"),

spurious interrups can enter nmi context in interrupt entry because PMR 
can be GIC_PRIO_IRQOFF

for spurious interrupts. That means test_irqs_unmasked is not 0 and 
asm_nmi_enter is called.

    (some el1_irq entry code from v5.3)

    test_irqs_unmasked  res=x0, pmr=x20

    cbz x0, 1f

    bl asm_nmi_enter


And it then calls gic_handle_irq(). It doesn't call gic_handle_nmi() 
because its priority is not GICD_INT_NMI_PRI.

Then it enables irqs. If at that point another NMI comes and preempts 
it, which means NMI occurs in nmi

context. That may cause a bug on if nested NMI is not supported.

    (some gic_handle_irq code from v5.3)

    irqnr = gic_read_iar();

    if (gic_supports_nmi() &&
        unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
            gic_handle_nmi(irqnr, 
regs);                                   (C)
            return;
    }

    if (gic_prio_masking_enabled()) {
          gic_pmr_mask_irqs();
gic_arch_enable_irqs(); (D)
    }

>
> Thanks,
>
> 	M.
>
>> Fixes: 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs")
>> Signed-off-by: He Ying <heying24@huawei.com>
>> ---
>>   drivers/irqchip/irq-gic-v3.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>> index 94b89258d045..d3b52734a2c5 100644
>> --- a/drivers/irqchip/irq-gic-v3.c
>> +++ b/drivers/irqchip/irq-gic-v3.c
>> @@ -654,15 +654,15 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
>>   		return;
>>   	}
>>   
>> +	/* Check for special IDs first */
>> +	if ((irqnr >= 1020 && irqnr <= 1023))
>> +		return;
>> +
>>   	if (gic_prio_masking_enabled()) {
>>   		gic_pmr_mask_irqs();
>>   		gic_arch_enable_irqs();
>>   	}
>>   
>> -	/* Check for special IDs first */
>> -	if ((irqnr >= 1020 && irqnr <= 1023))
>> -		return;
>> -
>>   	if (static_branch_likely(&supports_deactivate_key))
>>   		gic_write_eoir(irqnr);
>>   	else
>> -- 
>> 2.17.1
>>
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
  2021-04-17  2:01   ` He Ying
@ 2021-04-21  9:13     ` He Ying
  2021-04-22 12:27     ` Marc Zyngier
  1 sibling, 0 replies; 7+ messages in thread
From: He Ying @ 2021-04-21  9:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: tglx, julien.thierry.kdev, catalin.marinas, linux-kernel,
	linux-arm-kernel, Mark Rutland

Hello Marc,


Any ideas?


Thanks.


在 2021/4/17 10:01, He Ying 写道:
> Hello Marc,
>
>
> 在 2021/4/16 22:15, Marc Zyngier 写道:
>> [+ Mark]
>>
>> On Fri, 16 Apr 2021 07:22:17 +0100,
>> He Ying <heying24@huawei.com> wrote:
>>> We found this problem in our kernel src tree:
>>>
>>> [   14.816231] ------------[ cut here ]------------
>>> [   14.816231] kernel BUG at irq.c:99!
>>> [   14.816232] Internal error: Oops - BUG: 0 [#1] SMP
>>> [   14.816232] Process swapper/0 (pid: 0, stack limit = 
>>> 0x(____ptrval____))
>>> [   14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           
>>> O      4.19.95-1.h1.AOS2.0.aarch64 #14
>>> [   14.816233] Hardware name: evb (DT)
>>> [   14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO)
>>> [   14.816234] pc : asm_nmi_enter+0x94/0x98
>>> [   14.816235] lr : asm_nmi_enter+0x18/0x98
>>> [   14.816235] sp : ffff000008003c50
>>> [   14.816235] pmr_save: 00000070
>>> [   14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0
>>> [   14.816238] x27: 0000000000000000 x26: ffff000008004000
>>> [   14.816239] x25: 00000000015e0000 x24: ffff8008fb916000
>>> [   14.816240] x23: 0000000020400005 x22: ffff0000080817cc
>>> [   14.816241] x21: ffff000008003da0 x20: 0000000000000060
>>> [   14.816242] x19: 00000000000003ff x18: ffffffffffffffff
>>> [   14.816243] x17: 0000000000000008 x16: 003d090000000000
>>> [   14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40
>>> [   14.816244] x13: ffff8008fff58b9d x12: 0000000000000000
>>> [   14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5
>>> [   14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f
>>> [   14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e
>>> [   14.816248] x5 : 0000000000000000 x4 : 0000000080000000
>>> [   14.816249] x3 : 0000000000000000 x2 : 0000000080000000
>>> [   14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0
>>> [   14.816251] Call trace:
>>> [   14.816251]  asm_nmi_enter+0x94/0x98
>>> [   14.816251]  el1_irq+0x8c/0x180
>>> [   14.816252]  gic_handle_irq+0xbc/0x2e4
>>> [   14.816252]  el1_irq+0xcc/0x180
>>> [   14.816253]  arch_timer_handler_virt+0x38/0x58
>>> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
>>> [   14.816253]  generic_handle_irq+0x34/0x50
>>> [   14.816254]  __handle_domain_irq+0x68/0xc0
>>> [   14.816254]  gic_handle_irq+0xf8/0x2e4
>>> [   14.816255]  el1_irq+0xcc/0x180
>>> [   14.816255]  arch_cpu_idle+0x34/0x1c8
>>> [   14.816255]  default_idle_call+0x24/0x44
>>> [   14.816256]  do_idle+0x1d0/0x2c8
>>> [   14.816256]  cpu_startup_entry+0x28/0x30
>>> [   14.816256]  rest_init+0xb8/0xc8
>>> [   14.816257]  start_kernel+0x4c8/0x4f4
>>> [   14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000)
>>> [   14.816258] Modules linked in: start_dp(O) smeth(O)
>>> [   15.103092] ---[ end trace 701753956cb14aa8 ]---
>>> [   15.103093] Kernel panic - not syncing: Fatal exception in interrupt
>>> [   15.103099] SMP: stopping secondary CPUs
>>> [   15.103100] Kernel Offset: disabled
>>> [   15.103100] CPU features: 0x36,a2400218
>>> [   15.103100] Memory Limit: none
>> Urgh...
>>
>>> Our kernel src tree is based on 4.19.95 and backports arm64 pseudo-NMI
>>> patches but doesn't support nested NMI. Its top relative commit is
>>> commit 17ce302f3117 ("arm64: Fix interrupt tracing in the presence 
>>> of NMIs").
>> Can you please reproduce it with mainline and without any backport?
>> It is hard to reason about something that isn't a vanilla kernel.
>
> I think our kernel is quite like v5.3 mainline. Reproducing it in v5.3 
> mainline may
>
> be a little difficult for us because our product needs some more self 
> developed
>
> patches to work.
>
>>
>>> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
>>> in nmi_enter(). From the call trace, we find two 'el1_irqs' which
>>> means an interrupt preempts the other one and the new one is an NMI.
>>> Furthermore, by adding some prints, we find the first irq also calls
>>> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq 
>>> number
>>> is 1023. It enables irq by calling gic_arch_enable_irqs() in
>>> gic_handle_irq(). At this moment, the second irq preempts the first irq
>>> and it's an NMI but current context is already in nmi. So that may be
>>> the problem.
>> I'm not sure I get it. From the stack trace, I see this:
>>
>> [   14.816251]  asm_nmi_enter+0x94/0x98
>> [   14.816251]  el1_irq+0x8c/0x180            (C)
>> [   14.816252]  gic_handle_irq+0xbc/0x2e4
>> [   14.816252]  el1_irq+0xcc/0x180            (B)
>> [   14.816253]  arch_timer_handler_virt+0x38/0x58
>> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
>> [   14.816253]  generic_handle_irq+0x34/0x50
>> [   14.816254]  __handle_domain_irq+0x68/0xc0
>> [   14.816254]  gic_handle_irq+0xf8/0x2e4
>> [   14.816255]  el1_irq+0xcc/0x180            (A)
>>
>> which indicates that we preempted a timer interrupt (A) with another
>> IRQ (B), itself immediately preempted by another IRQ (C)? That's
>> indeed at least one too many.
>>
>> Can you please describe for each of (A), (B) and (C) whether they are
>> spurious or not, what their priorities are if they aren't spurious?
>
> Yes. I ignored interrupt (A). (B) is spurious and its priority is 0xa0 
> and PMR is 0x70.
>
> (C) is an NMI and its priority is 0x20. Note that GIC_PRIO_IRQON is 0xe0,
>
> GIC_PRIO_IRQOFF is 0x60, GICD_INT_DEF_PRI is 0xa0 and GICD_INT_NMI_PRI is
>
> 0x20 in our kernel.
>
>>> In my opinion, when handling spurious interrupts, we shouldn't 
>>> enable irqs.
>>> My reason is that for spurious interrupts we may enter nmi context in
>>> el1_irq() because current PMR may be GIC_PRIO_IRQOFF. If we enable irqs
>>> at this time, another NMI may happen and preempt this spurious 
>>> interrupt
>>> but the context is already in nmi. That causes a bug on if nested 
>>> NMI is
>>> not supported. Even for nested nmi, I think it's not a normal scenario.
>> I would tend to agree that this isn't great. Actually, I'd probably
>> move the check for a spurious interrupt right after the read of
>> ICC_IAR1_EL1, because there is no real need to do anything else at
>> that point.
>
> So, we don't need to check NMI for spurious interrupts? Do you mean 
> that spurious
>
> interrupts' can't be NMIs? Or even spurious interrups are NMIs, we 
> shouldn't do
>
> anything for them? If so, I will move the check after the read of 
> ICC_IAR1_EL1 and
>
> send a V2.
>
>>
>> However, upstream is quite different from 4.19 in that respect, and
>> I'm not sure if what I am looking at is what you are seeing with your
>> older kernel.
>
> I know that. And I look into all patches about arm64 pseudo NMIs. As I 
> said before,
>
> our kernel is very quite like v5.3 mainline. I think we are talking 
> about the same thing.
>
>
> In my opinion, since commit 17ce302f3117 ("arm64: Fix interrupt 
> tracing in the presence of NMIs"),
>
> spurious interrups can enter nmi context in interrupt entry because 
> PMR can be GIC_PRIO_IRQOFF
>
> for spurious interrupts. That means test_irqs_unmasked is not 0 and 
> asm_nmi_enter is called.
>
>    (some el1_irq entry code from v5.3)
>
>    test_irqs_unmasked  res=x0, pmr=x20
>
>    cbz x0, 1f
>
>    bl asm_nmi_enter
>
>
> And it then calls gic_handle_irq(). It doesn't call gic_handle_nmi() 
> because its priority is not GICD_INT_NMI_PRI.
>
> Then it enables irqs. If at that point another NMI comes and preempts 
> it, which means NMI occurs in nmi
>
> context. That may cause a bug on if nested NMI is not supported.
>
>    (some gic_handle_irq code from v5.3)
>
>    irqnr = gic_read_iar();
>
>    if (gic_supports_nmi() &&
>        unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
>            gic_handle_nmi(irqnr, 
> regs);                                   (C)
>            return;
>    }
>
>    if (gic_prio_masking_enabled()) {
>          gic_pmr_mask_irqs();
> gic_arch_enable_irqs(); (D)
>    }
>
>>
>> Thanks,
>>
>>     M.
>>
>>> Fixes: 17ce302f3117 ("arm64: Fix interrupt tracing in the presence 
>>> of NMIs")
>>> Signed-off-by: He Ying <heying24@huawei.com>
>>> ---
>>>   drivers/irqchip/irq-gic-v3.c | 8 ++++----
>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/irqchip/irq-gic-v3.c 
>>> b/drivers/irqchip/irq-gic-v3.c
>>> index 94b89258d045..d3b52734a2c5 100644
>>> --- a/drivers/irqchip/irq-gic-v3.c
>>> +++ b/drivers/irqchip/irq-gic-v3.c
>>> @@ -654,15 +654,15 @@ static asmlinkage void __exception_irq_entry 
>>> gic_handle_irq(struct pt_regs *regs
>>>           return;
>>>       }
>>>   +    /* Check for special IDs first */
>>> +    if ((irqnr >= 1020 && irqnr <= 1023))
>>> +        return;
>>> +
>>>       if (gic_prio_masking_enabled()) {
>>>           gic_pmr_mask_irqs();
>>>           gic_arch_enable_irqs();
>>>       }
>>>   -    /* Check for special IDs first */
>>> -    if ((irqnr >= 1020 && irqnr <= 1023))
>>> -        return;
>>> -
>>>       if (static_branch_likely(&supports_deactivate_key))
>>>           gic_write_eoir(irqnr);
>>>       else
>>> -- 
>>> 2.17.1
>>>
>>>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
  2021-04-17  2:01   ` He Ying
  2021-04-21  9:13     ` He Ying
@ 2021-04-22 12:27     ` Marc Zyngier
  2021-04-23  3:29       ` He Ying
  1 sibling, 1 reply; 7+ messages in thread
From: Marc Zyngier @ 2021-04-22 12:27 UTC (permalink / raw)
  To: He Ying
  Cc: tglx, julien.thierry.kdev, catalin.marinas, linux-kernel,
	linux-arm-kernel, Mark Rutland

On Sat, 17 Apr 2021 03:01:54 +0100,
He Ying <heying24@huawei.com> wrote:
> 
> Hello Marc,
> 
> 
> 在 2021/4/16 22:15, Marc Zyngier 写道:
> > [+ Mark]
> > 
> > On Fri, 16 Apr 2021 07:22:17 +0100,
> > He Ying <heying24@huawei.com> wrote:
> >> We found this problem in our kernel src tree:
> >> 
> >> [   14.816231] ------------[ cut here ]------------
> >> [   14.816231] kernel BUG at irq.c:99!
> >> [   14.816232] Internal error: Oops - BUG: 0 [#1] SMP
> >> [   14.816232] Process swapper/0 (pid: 0, stack limit = 0x(____ptrval____))
> >> [   14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      4.19.95-1.h1.AOS2.0.aarch64 #14
> >> [   14.816233] Hardware name: evb (DT)
> >> [   14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO)
> >> [   14.816234] pc : asm_nmi_enter+0x94/0x98
> >> [   14.816235] lr : asm_nmi_enter+0x18/0x98
> >> [   14.816235] sp : ffff000008003c50
> >> [   14.816235] pmr_save: 00000070
> >> [   14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0
> >> [   14.816238] x27: 0000000000000000 x26: ffff000008004000
> >> [   14.816239] x25: 00000000015e0000 x24: ffff8008fb916000
> >> [   14.816240] x23: 0000000020400005 x22: ffff0000080817cc
> >> [   14.816241] x21: ffff000008003da0 x20: 0000000000000060
> >> [   14.816242] x19: 00000000000003ff x18: ffffffffffffffff
> >> [   14.816243] x17: 0000000000000008 x16: 003d090000000000
> >> [   14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40
> >> [   14.816244] x13: ffff8008fff58b9d x12: 0000000000000000
> >> [   14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5
> >> [   14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f
> >> [   14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e
> >> [   14.816248] x5 : 0000000000000000 x4 : 0000000080000000
> >> [   14.816249] x3 : 0000000000000000 x2 : 0000000080000000
> >> [   14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0
> >> [   14.816251] Call trace:
> >> [   14.816251]  asm_nmi_enter+0x94/0x98
> >> [   14.816251]  el1_irq+0x8c/0x180
> >> [   14.816252]  gic_handle_irq+0xbc/0x2e4
> >> [   14.816252]  el1_irq+0xcc/0x180
> >> [   14.816253]  arch_timer_handler_virt+0x38/0x58
> >> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
> >> [   14.816253]  generic_handle_irq+0x34/0x50
> >> [   14.816254]  __handle_domain_irq+0x68/0xc0
> >> [   14.816254]  gic_handle_irq+0xf8/0x2e4
> >> [   14.816255]  el1_irq+0xcc/0x180
> >> [   14.816255]  arch_cpu_idle+0x34/0x1c8
> >> [   14.816255]  default_idle_call+0x24/0x44
> >> [   14.816256]  do_idle+0x1d0/0x2c8
> >> [   14.816256]  cpu_startup_entry+0x28/0x30
> >> [   14.816256]  rest_init+0xb8/0xc8
> >> [   14.816257]  start_kernel+0x4c8/0x4f4
> >> [   14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000)
> >> [   14.816258] Modules linked in: start_dp(O) smeth(O)
> >> [   15.103092] ---[ end trace 701753956cb14aa8 ]---
> >> [   15.103093] Kernel panic - not syncing: Fatal exception in interrupt
> >> [   15.103099] SMP: stopping secondary CPUs
> >> [   15.103100] Kernel Offset: disabled
> >> [   15.103100] CPU features: 0x36,a2400218
> >> [   15.103100] Memory Limit: none
> > Urgh...
> > 
> >> Our kernel src tree is based on 4.19.95 and backports arm64 pseudo-NMI
> >> patches but doesn't support nested NMI. Its top relative commit is
> >> commit 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs").
> > Can you please reproduce it with mainline and without any backport?
> > It is hard to reason about something that isn't a vanilla kernel.
> 
> I think our kernel is quite like v5.3 mainline. Reproducing it in
> v5.3 mainline may be a little difficult for us because our product
> needs some more self developed patches to work.

I don't really care about 5.3. What I care about is the tip of the
tree, and anything we fix there can trickle down to the previous
stable releases.

> >> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
> >> in nmi_enter(). From the call trace, we find two 'el1_irqs' which
> >> means an interrupt preempts the other one and the new one is an NMI.
> >> Furthermore, by adding some prints, we find the first irq also calls
> >> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number
> >> is 1023. It enables irq by calling gic_arch_enable_irqs() in
> >> gic_handle_irq(). At this moment, the second irq preempts the first irq
> >> and it's an NMI but current context is already in nmi. So that may be
> >> the problem.
> > I'm not sure I get it. From the stack trace, I see this:
> > 
> > [   14.816251]  asm_nmi_enter+0x94/0x98
> > [   14.816251]  el1_irq+0x8c/0x180			(C)
> > [   14.816252]  gic_handle_irq+0xbc/0x2e4
> > [   14.816252]  el1_irq+0xcc/0x180			(B)
> > [   14.816253]  arch_timer_handler_virt+0x38/0x58
> > [   14.816253]  handle_percpu_devid_irq+0x90/0x240
> > [   14.816253]  generic_handle_irq+0x34/0x50
> > [   14.816254]  __handle_domain_irq+0x68/0xc0
> > [   14.816254]  gic_handle_irq+0xf8/0x2e4
> > [   14.816255]  el1_irq+0xcc/0x180			(A)
> > 
> > which indicates that we preempted a timer interrupt (A) with another
> > IRQ (B), itself immediately preempted by another IRQ (C)? That's
> > indeed at least one too many.
> > 
> > Can you please describe for each of (A), (B) and (C) whether they are
> > spurious or not, what their priorities are if they aren't spurious?
> 
> Yes. I ignored interrupt (A). (B) is spurious and its priority is
> 0xa0 and PMR is 0x70. (C) is an NMI and its priority is 0x20. Note
> that GIC_PRIO_IRQON is 0xe0, GIC_PRIO_IRQOFF is 0x60,
> GICD_INT_DEF_PRI is 0xa0 and GICD_INT_NMI_PRI is 0x20 in our kernel.

If (B) is spurious (aka ICC_IAR1R_EL1 return 1023), then its
"priority" doesn't really exist, and I don't really get what you mean
by "its priority is 0xa0".  ICC_RPR_EL1 shouldn't change when Ack-ing
a spurious interrupt, because there is no change in GIC state at all.

And if PMR is 0x70 at the point where you get (B), then I really can't
see how you can get an interrupt of priority 0xa0 anyway.



> >> In my opinion, when handling spurious interrupts, we shouldn't enable irqs.
> >> My reason is that for spurious interrupts we may enter nmi context in
> >> el1_irq() because current PMR may be GIC_PRIO_IRQOFF. If we enable irqs
> >> at this time, another NMI may happen and preempt this spurious interrupt
> >> but the context is already in nmi. That causes a bug on if nested NMI is
> >> not supported. Even for nested nmi, I think it's not a normal scenario.
> > I would tend to agree that this isn't great. Actually, I'd probably
> > move the check for a spurious interrupt right after the read of
> > ICC_IAR1_EL1, because there is no real need to do anything else at
> > that point.
> 
> So, we don't need to check NMI for spurious interrupts? Do you mean
> that spurious interrupts' can't be NMIs? Or even spurious interrups
> are NMIs, we shouldn't do anything for them? If so, I will move the
> check after the read of ICC_IAR1_EL1 and send a V2.

Spurious interrupts are not interrupts at all. It is either a level
interrupt that has been retired before being handled, or some other
transient effect (like an interrupt being moved from one CPU to
another), or even flaky HW. So doing anything based on a spurious
interrupt is definitely a potential bug, and I suggest this:

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index eb0ee356a629..00404024d7cd 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -648,6 +648,10 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
 
 	irqnr = gic_read_iar();
 
+	/* Check for special IDs first */
+	if ((irqnr >= 1020 && irqnr <= 1023))
+		return;
+
 	if (gic_supports_nmi() &&
 	    unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
 		gic_handle_nmi(irqnr, regs);
@@ -659,10 +663,6 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
 		gic_arch_enable_irqs();
 	}
 
-	/* Check for special IDs first */
-	if ((irqnr >= 1020 && irqnr <= 1023))
-		return;
-
 	if (static_branch_likely(&supports_deactivate_key))
 		gic_write_eoir(irqnr);
 	else

[...]

> In my opinion, since commit 17ce302f3117 ("arm64: Fix interrupt
> tracing in the presence of NMIs"), spurious interrups can enter nmi
> context in interrupt entry because PMR can be GIC_PRIO_IRQOFF for
> spurious interrupts. That means test_irqs_unmasked is not 0 and
> asm_nmi_enter is called.
> 
>    (some el1_irq entry code from v5.3)
>    test_irqs_unmasked  res=x0, pmr=x20
>    cbz x0, 1f
>    bl asm_nmi_enter

That code has significantly changed upstream, and is now in C. I think
it still do the same thing though.

> 
> And it then calls gic_handle_irq(). It doesn't call gic_handle_nmi()
> because its priority is not GICD_INT_NMI_PRI.
> 
> Then it enables irqs. If at that point another NMI comes and
> preempts it, which means NMI occurs in nmi context. That may cause a
> bug on if nested NMI is not supported.
> 
>    (some gic_handle_irq code from v5.3)
> 
>    irqnr = gic_read_iar();
> 
>    if (gic_supports_nmi() &&
>        unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
>            gic_handle_nmi(irqnr,
> regs);                                   (C)
>            return;
>    }
> 
>    if (gic_prio_masking_enabled()) {
>          gic_pmr_mask_irqs();
> gic_arch_enable_irqs(); (D)
>    }

I believe the above patch would fix the spurious interrupt issue you
have experienced. Please let me know, and post a v2 if this works for
you.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
  2021-04-22 12:27     ` Marc Zyngier
@ 2021-04-23  3:29       ` He Ying
  2021-04-23  8:08         ` Marc Zyngier
  0 siblings, 1 reply; 7+ messages in thread
From: He Ying @ 2021-04-23  3:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: tglx, julien.thierry.kdev, catalin.marinas, linux-kernel,
	linux-arm-kernel, Mark Rutland


在 2021/4/22 20:27, Marc Zyngier 写道:
> On Sat, 17 Apr 2021 03:01:54 +0100,
> He Ying <heying24@huawei.com> wrote:
>> Hello Marc,
>>
>>
>> 在 2021/4/16 22:15, Marc Zyngier 写道:
>>> [+ Mark]
>>>
>>> On Fri, 16 Apr 2021 07:22:17 +0100,
>>> He Ying <heying24@huawei.com> wrote:
>>>> We found this problem in our kernel src tree:
>>>>
>>>> [   14.816231] ------------[ cut here ]------------
>>>> [   14.816231] kernel BUG at irq.c:99!
>>>> [   14.816232] Internal error: Oops - BUG: 0 [#1] SMP
>>>> [   14.816232] Process swapper/0 (pid: 0, stack limit = 0x(____ptrval____))
>>>> [   14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      4.19.95-1.h1.AOS2.0.aarch64 #14
>>>> [   14.816233] Hardware name: evb (DT)
>>>> [   14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO)
>>>> [   14.816234] pc : asm_nmi_enter+0x94/0x98
>>>> [   14.816235] lr : asm_nmi_enter+0x18/0x98
>>>> [   14.816235] sp : ffff000008003c50
>>>> [   14.816235] pmr_save: 00000070
>>>> [   14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0
>>>> [   14.816238] x27: 0000000000000000 x26: ffff000008004000
>>>> [   14.816239] x25: 00000000015e0000 x24: ffff8008fb916000
>>>> [   14.816240] x23: 0000000020400005 x22: ffff0000080817cc
>>>> [   14.816241] x21: ffff000008003da0 x20: 0000000000000060
>>>> [   14.816242] x19: 00000000000003ff x18: ffffffffffffffff
>>>> [   14.816243] x17: 0000000000000008 x16: 003d090000000000
>>>> [   14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40
>>>> [   14.816244] x13: ffff8008fff58b9d x12: 0000000000000000
>>>> [   14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5
>>>> [   14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f
>>>> [   14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e
>>>> [   14.816248] x5 : 0000000000000000 x4 : 0000000080000000
>>>> [   14.816249] x3 : 0000000000000000 x2 : 0000000080000000
>>>> [   14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0
>>>> [   14.816251] Call trace:
>>>> [   14.816251]  asm_nmi_enter+0x94/0x98
>>>> [   14.816251]  el1_irq+0x8c/0x180
>>>> [   14.816252]  gic_handle_irq+0xbc/0x2e4
>>>> [   14.816252]  el1_irq+0xcc/0x180
>>>> [   14.816253]  arch_timer_handler_virt+0x38/0x58
>>>> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
>>>> [   14.816253]  generic_handle_irq+0x34/0x50
>>>> [   14.816254]  __handle_domain_irq+0x68/0xc0
>>>> [   14.816254]  gic_handle_irq+0xf8/0x2e4
>>>> [   14.816255]  el1_irq+0xcc/0x180
>>>> [   14.816255]  arch_cpu_idle+0x34/0x1c8
>>>> [   14.816255]  default_idle_call+0x24/0x44
>>>> [   14.816256]  do_idle+0x1d0/0x2c8
>>>> [   14.816256]  cpu_startup_entry+0x28/0x30
>>>> [   14.816256]  rest_init+0xb8/0xc8
>>>> [   14.816257]  start_kernel+0x4c8/0x4f4
>>>> [   14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000)
>>>> [   14.816258] Modules linked in: start_dp(O) smeth(O)
>>>> [   15.103092] ---[ end trace 701753956cb14aa8 ]---
>>>> [   15.103093] Kernel panic - not syncing: Fatal exception in interrupt
>>>> [   15.103099] SMP: stopping secondary CPUs
>>>> [   15.103100] Kernel Offset: disabled
>>>> [   15.103100] CPU features: 0x36,a2400218
>>>> [   15.103100] Memory Limit: none
>>> Urgh...
>>>
>>>> Our kernel src tree is based on 4.19.95 and backports arm64 pseudo-NMI
>>>> patches but doesn't support nested NMI. Its top relative commit is
>>>> commit 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs").
>>> Can you please reproduce it with mainline and without any backport?
>>> It is hard to reason about something that isn't a vanilla kernel.
>> I think our kernel is quite like v5.3 mainline. Reproducing it in
>> v5.3 mainline may be a little difficult for us because our product
>> needs some more self developed patches to work.
> I don't really care about 5.3. What I care about is the tip of the
> tree, and anything we fix there can trickle down to the previous
> stable releases.

I don't know if I give the tip of tree. I provide the Fixes tag

"Fixes: 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of 
NMIs")".

Is that sufficient to help?

>
>>>> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
>>>> in nmi_enter(). From the call trace, we find two 'el1_irqs' which
>>>> means an interrupt preempts the other one and the new one is an NMI.
>>>> Furthermore, by adding some prints, we find the first irq also calls
>>>> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number
>>>> is 1023. It enables irq by calling gic_arch_enable_irqs() in
>>>> gic_handle_irq(). At this moment, the second irq preempts the first irq
>>>> and it's an NMI but current context is already in nmi. So that may be
>>>> the problem.
>>> I'm not sure I get it. From the stack trace, I see this:
>>>
>>> [   14.816251]  asm_nmi_enter+0x94/0x98
>>> [   14.816251]  el1_irq+0x8c/0x180			(C)
>>> [   14.816252]  gic_handle_irq+0xbc/0x2e4
>>> [   14.816252]  el1_irq+0xcc/0x180			(B)
>>> [   14.816253]  arch_timer_handler_virt+0x38/0x58
>>> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
>>> [   14.816253]  generic_handle_irq+0x34/0x50
>>> [   14.816254]  __handle_domain_irq+0x68/0xc0
>>> [   14.816254]  gic_handle_irq+0xf8/0x2e4
>>> [   14.816255]  el1_irq+0xcc/0x180			(A)
>>>
>>> which indicates that we preempted a timer interrupt (A) with another
>>> IRQ (B), itself immediately preempted by another IRQ (C)? That's
>>> indeed at least one too many.
>>>
>>> Can you please describe for each of (A), (B) and (C) whether they are
>>> spurious or not, what their priorities are if they aren't spurious?
>> Yes. I ignored interrupt (A). (B) is spurious and its priority is
>> 0xa0 and PMR is 0x70. (C) is an NMI and its priority is 0x20. Note
>> that GIC_PRIO_IRQON is 0xe0, GIC_PRIO_IRQOFF is 0x60,
>> GICD_INT_DEF_PRI is 0xa0 and GICD_INT_NMI_PRI is 0x20 in our kernel.
> If (B) is spurious (aka ICC_IAR1R_EL1 return 1023), then its
> "priority" doesn't really exist, and I don't really get what you mean
> by "its priority is 0xa0".  ICC_RPR_EL1 shouldn't change when Ack-ing
> a spurious interrupt, because there is no change in GIC state at all.
OK. By saying "its priority is 0xa0", I just mean ICC_RPR_EL1 is read as 
0xa0.
>
> And if PMR is 0x70 at the point where you get (B), then I really can't
> see how you can get an interrupt of priority 0xa0 anyway.

Yes, it also confuses me. Perhaps ICC_RPR_EL1 changes when read, I think.

>
>
>
>>>> In my opinion, when handling spurious interrupts, we shouldn't enable irqs.
>>>> My reason is that for spurious interrupts we may enter nmi context in
>>>> el1_irq() because current PMR may be GIC_PRIO_IRQOFF. If we enable irqs
>>>> at this time, another NMI may happen and preempt this spurious interrupt
>>>> but the context is already in nmi. That causes a bug on if nested NMI is
>>>> not supported. Even for nested nmi, I think it's not a normal scenario.
>>> I would tend to agree that this isn't great. Actually, I'd probably
>>> move the check for a spurious interrupt right after the read of
>>> ICC_IAR1_EL1, because there is no real need to do anything else at
>>> that point.
>> So, we don't need to check NMI for spurious interrupts? Do you mean
>> that spurious interrupts' can't be NMIs? Or even spurious interrups
>> are NMIs, we shouldn't do anything for them? If so, I will move the
>> check after the read of ICC_IAR1_EL1 and send a V2.
> Spurious interrupts are not interrupts at all. It is either a level
> interrupt that has been retired before being handled, or some other
> transient effect (like an interrupt being moved from one CPU to
> another), or even flaky HW. So doing anything based on a spurious
> interrupt is definitely a potential bug, and I suggest this:
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index eb0ee356a629..00404024d7cd 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -648,6 +648,10 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
>   
>   	irqnr = gic_read_iar();
>   
> +	/* Check for special IDs first */
> +	if ((irqnr >= 1020 && irqnr <= 1023))
> +		return;
> +
>   	if (gic_supports_nmi() &&
>   	    unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
>   		gic_handle_nmi(irqnr, regs);
> @@ -659,10 +663,6 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
>   		gic_arch_enable_irqs();
>   	}
>   
> -	/* Check for special IDs first */
> -	if ((irqnr >= 1020 && irqnr <= 1023))
> -		return;
> -
>   	if (static_branch_likely(&supports_deactivate_key))
>   		gic_write_eoir(irqnr);
>   	else
>
> [...]
Thanks for explaining it. I quite agree with your suggestion.
>> In my opinion, since commit 17ce302f3117 ("arm64: Fix interrupt
>> tracing in the presence of NMIs"), spurious interrups can enter nmi
>> context in interrupt entry because PMR can be GIC_PRIO_IRQOFF for
>> spurious interrupts. That means test_irqs_unmasked is not 0 and
>> asm_nmi_enter is called.
>>
>>     (some el1_irq entry code from v5.3)
>>     test_irqs_unmasked  res=x0, pmr=x20
>>     cbz x0, 1f
>>     bl asm_nmi_enter
> That code has significantly changed upstream, and is now in C. I think
> it still do the same thing though.
Yes, you're right.
>
>> And it then calls gic_handle_irq(). It doesn't call gic_handle_nmi()
>> because its priority is not GICD_INT_NMI_PRI.
>>
>> Then it enables irqs. If at that point another NMI comes and
>> preempts it, which means NMI occurs in nmi context. That may cause a
>> bug on if nested NMI is not supported.
>>
>>     (some gic_handle_irq code from v5.3)
>>
>>     irqnr = gic_read_iar();
>>
>>     if (gic_supports_nmi() &&
>>         unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
>>             gic_handle_nmi(irqnr,
>> regs);                                   (C)
>>             return;
>>     }
>>
>>     if (gic_prio_masking_enabled()) {
>>           gic_pmr_mask_irqs();
>> gic_arch_enable_irqs(); (D)
>>     }
> I believe the above patch would fix the spurious interrupt issue you
> have experienced. Please let me know, and post a v2 if this works for
> you.

Fortunately, it works. I'll post a v2. Should I cc stable mail list?


Thanks.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
  2021-04-23  3:29       ` He Ying
@ 2021-04-23  8:08         ` Marc Zyngier
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Zyngier @ 2021-04-23  8:08 UTC (permalink / raw)
  To: He Ying
  Cc: tglx, julien.thierry.kdev, catalin.marinas, linux-kernel,
	linux-arm-kernel, Mark Rutland

On Fri, 23 Apr 2021 04:29:44 +0100,
He Ying <heying24@huawei.com> wrote:

[...]

> >>>> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
> >>>> in nmi_enter(). From the call trace, we find two 'el1_irqs' which
> >>>> means an interrupt preempts the other one and the new one is an NMI.
> >>>> Furthermore, by adding some prints, we find the first irq also calls
> >>>> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number
> >>>> is 1023. It enables irq by calling gic_arch_enable_irqs() in
> >>>> gic_handle_irq(). At this moment, the second irq preempts the first irq
> >>>> and it's an NMI but current context is already in nmi. So that may be
> >>>> the problem.
> >>> I'm not sure I get it. From the stack trace, I see this:
> >>> 
> >>> [   14.816251]  asm_nmi_enter+0x94/0x98
> >>> [   14.816251]  el1_irq+0x8c/0x180			(C)
> >>> [   14.816252]  gic_handle_irq+0xbc/0x2e4
> >>> [   14.816252]  el1_irq+0xcc/0x180			(B)
> >>> [   14.816253]  arch_timer_handler_virt+0x38/0x58
> >>> [   14.816253]  handle_percpu_devid_irq+0x90/0x240
> >>> [   14.816253]  generic_handle_irq+0x34/0x50
> >>> [   14.816254]  __handle_domain_irq+0x68/0xc0
> >>> [   14.816254]  gic_handle_irq+0xf8/0x2e4
> >>> [   14.816255]  el1_irq+0xcc/0x180			(A)
> >>> 
> >>> which indicates that we preempted a timer interrupt (A) with another
> >>> IRQ (B), itself immediately preempted by another IRQ (C)? That's
> >>> indeed at least one too many.
> >>> 
> >>> Can you please describe for each of (A), (B) and (C) whether they are
> >>> spurious or not, what their priorities are if they aren't spurious?
> >> Yes. I ignored interrupt (A). (B) is spurious and its priority is
> >> 0xa0 and PMR is 0x70. (C) is an NMI and its priority is 0x20. Note
> >> that GIC_PRIO_IRQON is 0xe0, GIC_PRIO_IRQOFF is 0x60,
> >> GICD_INT_DEF_PRI is 0xa0 and GICD_INT_NMI_PRI is 0x20 in our kernel.
> > If (B) is spurious (aka ICC_IAR1R_EL1 return 1023), then its
> > "priority" doesn't really exist, and I don't really get what you mean
> > by "its priority is 0xa0".  ICC_RPR_EL1 shouldn't change when Ack-ing
> > a spurious interrupt, because there is no change in GIC state at all.
> OK. By saying "its priority is 0xa0", I just mean ICC_RPR_EL1 is read
> as 0xa0.

Right. That's very different from saying that an interrupt has
priority 0xA0.

> > 
> > And if PMR is 0x70 at the point where you get (B), then I really can't
> > see how you can get an interrupt of priority 0xa0 anyway.
> 
> Yes, it also confuses me. Perhaps ICC_RPR_EL1 changes when read, I
> think.

No, it just means that we were already in an interrupt context when we
handled the spurious interrupt: I think interrupt (A) above was being
handled with priority 0xa0, preempted by a spurious interrupt (B) which
did the wrong thing by messing with the NMI state.

Probably (B) was an NMI that got retired early and presented again to
the cpuif as (C). If that's the case, the GIC implementation may be a
bit flaky. But the driver definitely has a bug.

[...]

> > I believe the above patch would fix the spurious interrupt issue you
> > have experienced. Please let me know, and post a v2 if this works for
> > you.
> 
> Fortunately, it works. I'll post a v2. Should I cc stable mail list?

Yes please.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-04-23  8:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-16  6:22 [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups He Ying
2021-04-16 14:15 ` Marc Zyngier
2021-04-17  2:01   ` He Ying
2021-04-21  9:13     ` He Ying
2021-04-22 12:27     ` Marc Zyngier
2021-04-23  3:29       ` He Ying
2021-04-23  8:08         ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).