* [PATCH hulk-4.19-next] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
From: He Ying @ 2021-04-26  2:39 UTC
  To: patchwork
  Cc: huawei.libin, yangerkun, xiexiuqi, guohanjun, He Ying,
	Marc Zyngier, stable

hulk inclusion
category: bugfix
bugzilla: NA
DTS: NA
CVE: NA

--------------------------------

We triggered the following error while running our 4.19 kernel
with the pseudo-NMI patches backported to it:

[   14.816231] ------------[ cut here ]------------
[   14.816231] kernel BUG at irq.c:99!
[   14.816232] Internal error: Oops - BUG: 0 [#1] SMP
[   14.816232] Process swapper/0 (pid: 0, stack limit = 0x(____ptrval____))
[   14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      4.19.95.aarch64 #14
[   14.816233] Hardware name: evb (DT)
[   14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[   14.816234] pc : asm_nmi_enter+0x94/0x98
[   14.816235] lr : asm_nmi_enter+0x18/0x98
[   14.816235] sp : ffff000008003c50
[   14.816235] pmr_save: 00000070
[   14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0
[   14.816238] x27: 0000000000000000 x26: ffff000008004000
[   14.816239] x25: 00000000015e0000 x24: ffff8008fb916000
[   14.816240] x23: 0000000020400005 x22: ffff0000080817cc
[   14.816241] x21: ffff000008003da0 x20: 0000000000000060
[   14.816242] x19: 00000000000003ff x18: ffffffffffffffff
[   14.816243] x17: 0000000000000008 x16: 003d090000000000
[   14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40
[   14.816244] x13: ffff8008fff58b9d x12: 0000000000000000
[   14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5
[   14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f
[   14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e
[   14.816248] x5 : 0000000000000000 x4 : 0000000080000000
[   14.816249] x3 : 0000000000000000 x2 : 0000000080000000
[   14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0
[   14.816251] Call trace:
[   14.816251]  asm_nmi_enter+0x94/0x98
[   14.816251]  el1_irq+0x8c/0x180                    (IRQ C)
[   14.816252]  gic_handle_irq+0xbc/0x2e4
[   14.816252]  el1_irq+0xcc/0x180                    (IRQ B)
[   14.816253]  arch_timer_handler_virt+0x38/0x58
[   14.816253]  handle_percpu_devid_irq+0x90/0x240
[   14.816253]  generic_handle_irq+0x34/0x50
[   14.816254]  __handle_domain_irq+0x68/0xc0
[   14.816254]  gic_handle_irq+0xf8/0x2e4
[   14.816255]  el1_irq+0xcc/0x180                    (IRQ A)
[   14.816255]  arch_cpu_idle+0x34/0x1c8
[   14.816255]  default_idle_call+0x24/0x44
[   14.816256]  do_idle+0x1d0/0x2c8
[   14.816256]  cpu_startup_entry+0x28/0x30
[   14.816256]  rest_init+0xb8/0xc8
[   14.816257]  start_kernel+0x4c8/0x4f4
[   14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000)
[   14.816258] Modules linked in: start_dp(O) smeth(O)
[   15.103092] ---[ end trace 701753956cb14aa8 ]---
[   15.103093] Kernel panic - not syncing: Fatal exception in interrupt
[   15.103099] SMP: stopping secondary CPUs
[   15.103100] Kernel Offset: disabled
[   15.103100] CPU features: 0x36,a2400218
[   15.103100] Memory Limit: none

which is caused by a 'BUG_ON(in_nmi())' in nmi_enter().

From the call trace, we can find three interrupts (noted A, B, C above):
interrupt (A) is preempted by (B), which is further interrupted by (C).

Subsequent investigations show that (B) results in nmi_enter() being
called, but that it actually is a spurious interrupt. Furthermore,
interrupts are reenabled in the context of (B), and (C) fires with
NMI priority. We end up with a nested NMI situation, something
we definitely do not want to (and cannot) handle.
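
As an illustration of the nesting described above, here is a minimal
user-space sketch in C. It is not the kernel's implementation: the names
model_in_nmi, model_nmi_enter() and model_nmi_exit() are hypothetical, and
where the real nmi_enter() would hit BUG_ON(in_nmi()), this model merely
returns false.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU "currently in NMI" state. */
static bool model_in_nmi;

/*
 * Model of entering NMI context. Returns false at the point where the
 * real nmi_enter() would trigger BUG_ON(in_nmi()), i.e. on a nested NMI.
 */
static bool model_nmi_enter(void)
{
	if (model_in_nmi)
		return false;	/* nested NMI: not supported */
	model_in_nmi = true;
	return true;
}

/* Model of leaving NMI context. */
static void model_nmi_exit(void)
{
	model_in_nmi = false;
}
```

Calling model_nmi_enter() once for (B) and again for (C), with no
model_nmi_exit() in between, makes the second call fail. That corresponds
to the BUG reported in the trace.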

The bug here is that spurious interrupts should never result in any
state change, and we should just return to the interrupted context.
Moving the handling of spurious interrupts as early as possible in
the GICv3 handler fixes this issue.

Fixes: 3f1f3234bc2d ("irqchip/gic-v3: Switch to PMR masking before calling IRQ handler")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: He Ying <heying24@huawei.com>
[maz: rewrote commit message, corrected Fixes: tag]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210423083516.170111-1-heying24@huawei.com
Cc: stable@vger.kernel.org

Signed-off-by: He Ying <heying24@huawei.com>
---
 drivers/irqchip/irq-gic-v3.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 14e9c1a5627b..acf0ae4ef612 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -518,6 +518,10 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
 
 	irqnr = gic_read_iar();
 
+	/* Check for special IDs first */
+	if ((irqnr >= 1020 && irqnr <= 1023))
+		return;
+
 	if (gic_supports_nmi() &&
 	    unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
 		gic_handle_nmi(irqnr, regs);
-- 
2.17.1
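
The ordering introduced by the hunk above can be sketched in isolation as
follows. This is a simplified user-space model, not the driver code: the
model_* names and the enum are hypothetical, and the boolean parameter
stands in for the "gic_supports_nmi() && gic_read_rpr() == GICD_INT_NMI_PRI"
condition. The only architectural fact assumed is that the GIC reserves
INTIDs 1020-1023 as special values, with 1023 meaning that no interrupt is
pending (spurious).

```c
#include <stdbool.h>
#include <stdint.h>

/* What the modelled handler decides to do with an acknowledged INTID. */
enum model_action {
	MODEL_RETURN_EARLY,	/* special INTID: touch no state, just return */
	MODEL_HANDLE_NMI,	/* take the NMI path */
	MODEL_HANDLE_IRQ,	/* take the normal IRQ path */
};

/* INTIDs 1020-1023 are reserved; 1023 is the spurious interrupt ID. */
static bool model_is_special_intid(uint32_t irqnr)
{
	return irqnr >= 1020 && irqnr <= 1023;
}

/*
 * Model of the dispatch order after the patch: the special-ID check comes
 * first, so a spurious interrupt can never reach the NMI path.
 */
static enum model_action model_gic_dispatch(uint32_t irqnr,
					    bool running_at_nmi_prio)
{
	if (model_is_special_intid(irqnr))
		return MODEL_RETURN_EARLY;
	if (running_at_nmi_prio)
		return MODEL_HANDLE_NMI;
	return MODEL_HANDLE_IRQ;
}
```

With this ordering, model_gic_dispatch(1023, true) returns
MODEL_RETURN_EARLY, whereas checking the priority first would have sent the
same spurious ID down the NMI path.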


* Re: [PATCH hulk-4.19-next] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
From: Hanjun Guo @ 2021-04-26  3:42 UTC
  To: He Ying, patchwork
  Cc: huawei.libin, yangerkun, xiexiuqi, Marc Zyngier, stable

On 2021/4/26 10:39, He Ying wrote:
> [...]

Reviewed-by: Hanjun Guo <guohanjun@huawei.com>

* Re: [PATCH hulk-4.19-next] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
From: Greg KH @ 2021-04-26  5:39 UTC
  To: He Ying
  Cc: patchwork, huawei.libin, yangerkun, xiexiuqi, guohanjun,
	Marc Zyngier, stable

On Sun, Apr 25, 2021 at 10:39:29PM -0400, He Ying wrote:
> [...]
> Fixes: 3f1f3234bc2d ("irqchip/gic-v3: Switch to PMR masking before calling IRQ handler")

This commit is in 5.1, not 4.19.

How are you hitting this?

What are we supposed to do with this patch?

confused,

greg k-h

* Re: [PATCH hulk-4.19-next] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
From: He Ying @ 2021-04-26  6:28 UTC
  To: Greg KH
  Cc: patchwork, huawei.libin, yangerkun, xiexiuqi, guohanjun,
	Marc Zyngier, stable

Hi Greg,

Sorry to bother you. This email was meant to be sent internally. I 
forgot to suppress cc.


On 2021/4/26 13:39, Greg KH wrote:
> On Sun, Apr 25, 2021 at 10:39:29PM -0400, He Ying wrote:
>> [...]
>> Fixes: 3f1f3234bc2d ("irqchip/gic-v3: Switch to PMR masking before calling IRQ handler")
> This commit is in 5.1, not 4.19.
>
> How are you hitting this?
>
> What are we supposed to do with this patch?
>
> confused,

Our kernel is 4.19, but it has the pseudo-NMI patches backported from a
later version, including this commit of course. This is noted in the
commit message.

Marc Zyngier and Mark Rutland acked and accepted the original patch.
Maybe you can review the original one, which is titled
"[PATCH v2] irqchip/gic-v3: Do not enable irqs when handling spurious
interrups".


Sorry again to bother you.


Thanks.


* Re: [PATCH hulk-4.19-next] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
From: Hanjun Guo @ 2021-04-26  8:44 UTC
  To: Greg KH, He Ying
  Cc: patchwork, huawei.libin, yangerkun, xiexiuqi, Marc Zyngier, stable

On 2021/4/26 13:39, Greg KH wrote:
> On Sun, Apr 25, 2021 at 10:39:29PM -0400, He Ying wrote:
>> [...]
>> Fixes: 3f1f3234bc2d ("irqchip/gic-v3: Switch to PMR masking before calling IRQ handler")
> 
> This commit is in 5.1, not 4.19.
> 
> How are you hitting this?
> 
> What are we supposed to do with this patch?
> 
> confused,

Sorry for the noise. We have an internally used kernel based on the
4.19 LTS kernel, to which we backported some mainline features,
including pseudo-NMI for arm64; that's why we are backporting the
follow-up bugfixes as well.

I didn't notice that stable and Marc were also CCed when I added my
review tag (for our internal review process), sorry.

Thanks
Hanjun

* Re: [PATCH hulk-4.19-next] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
From: Greg KH @ 2021-04-26  9:05 UTC
  To: Hanjun Guo
  Cc: He Ying, patchwork, huawei.libin, yangerkun, xiexiuqi,
	Marc Zyngier, stable

On Mon, Apr 26, 2021 at 04:44:45PM +0800, Hanjun Guo wrote:
> On 2021/4/26 13:39, Greg KH wrote:
> > On Sun, Apr 25, 2021 at 10:39:29PM -0400, He Ying wrote:
> > > [...]
> > > Fixes: 3f1f3234bc2d ("irqchip/gic-v3: Switch to PMR masking before calling IRQ handler")
> > 
> > This commit is in 5.1, not 4.19.
> > 
> > How are you hitting this?
> > 
> > What are we supposed to do with this patch?
> > 
> > confused,
> 
> Sorry for the noise. We have an internally used kernel based on the
> 4.19 LTS kernel, to which we backported some mainline features,
> including pseudo-NMI for arm64; that's why we are backporting the
> follow-up bugfixes as well.
> 
> I didn't notice that stable and Marc were also CCed when I added my
> review tag (for our internal review process), sorry.

Not a problem, thanks for letting us know.

greg k-h
