* [PATCH] arm64: fix kernel stack overflow in kdump capture kernel
@ 2019-04-18 14:02 Wei Li
  2019-04-18 14:26 ` Julien Thierry
  0 siblings, 1 reply; 5+ messages in thread
From: Wei Li @ 2019-04-18 14:02 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, julien.thierry
  Cc: mark.rutland, daniel.thompson, marc.zyngier, christoffer.dall,
	james.morse, joel, linux-arm-kernel

When the ARM64_PSEUDO_NMI feature is enabled in the kdump capture kernel,
the kernel reports a stack overflow exception:

[    0.000000] CPU features: detected: IRQ priority masking
[    0.000000] alternatives: patching kernel code
[    0.000000] Insufficient stack space to handle exception!
[    0.000000] ESR: 0x96000044 -- DABT (current EL)
[    0.000000] FAR: 0x0000000000000040
[    0.000000] Task stack:     [0xffff0000097f0000..0xffff0000097f4000]
[    0.000000] IRQ stack:      [0x0000000000000000..0x0000000000004000]
[    0.000000] Overflow stack: [0xffff80002b7cf290..0xffff80002b7d0290]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.34-lw+ #3
[    0.000000] pstate: 400003c5 (nZcv DAIF -PAN -UAO)
[    0.000000] pc : el1_sync+0x0/0xb8
[    0.000000] lr : el1_irq+0xb8/0x140
[    0.000000] sp : 0000000000000040
[    0.000000] pmr_save: 00000070
[    0.000000] x29: ffff0000097f3f60 x28: ffff000009806240 
[    0.000000] x27: 0000000080000000 x26: 0000000000004000 
[    0.000000] x25: 0000000000000000 x24: ffff000009329028 
[    0.000000] x23: 0000000040000005 x22: ffff000008095c6c 
[    0.000000] x21: ffff0000097f3f70 x20: 0000000000000070 
[    0.000000] x19: ffff0000097f3e30 x18: ffffffffffffffff 
[    0.000000] x17: 0000000000000000 x16: 0000000000000000 
[    0.000000] x15: ffff0000097f9708 x14: ffff000089a382ef 
[    0.000000] x13: ffff000009a382fd x12: ffff000009824000 
[    0.000000] x11: ffff0000097fb7b0 x10: ffff000008730028 
[    0.000000] x9 : ffff000009440018 x8 : 000000000000000d 
[    0.000000] x7 : 6b20676e69686374 x6 : 000000000000003b 
[    0.000000] x5 : 0000000000000000 x4 : ffff000008093600 
[    0.000000] x3 : 0000000400000008 x2 : 7db2e689fc2b8e00 
[    0.000000] x1 : 0000000000000000 x0 : ffff0000097f3e30 
[    0.000000] Kernel panic - not syncing: kernel stack overflow
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.34-lw+ #3
[    0.000000] Call trace:
[    0.000000]  dump_backtrace+0x0/0x1b8
[    0.000000]  show_stack+0x24/0x30
[    0.000000]  dump_stack+0xa8/0xcc
[    0.000000]  panic+0x134/0x30c
[    0.000000]  __stack_chk_fail+0x0/0x28
[    0.000000]  handle_bad_stack+0xfc/0x108
[    0.000000]  __bad_stack+0x90/0x94
[    0.000000]  el1_sync+0x0/0xb8
[    0.000000]  init_gic_priority_masking+0x4c/0x70
[    0.000000]  smp_prepare_boot_cpu+0x60/0x68
[    0.000000]  start_kernel+0x1e8/0x53c
[    0.000000] ---[ end Kernel panic - not syncing: kernel stack overflow ]---

The reason is that init_gic_priority_masking() may unmask PSR.I before
the IRQ stacks are initialized. If a pseudo-NMI is raised in that window,
the kernel takes this exception.

This patch only writes the PMR in smp_prepare_boot_cpu(), and delays
unmasking PSR.I until init_IRQ(), after the IRQ stacks are initialized.

Fixes: e79321883842 ("arm64: Switch to PMR masking when starting CPUs")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 arch/arm64/include/asm/smp.h | 2 ++
 arch/arm64/kernel/irq.c      | 2 ++
 arch/arm64/kernel/smp.c      | 8 ++++----
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 18553f399e08..4d8eb35f4cdd 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -158,6 +158,8 @@ bool cpus_are_stuck_in_kernel(void);
 extern void crash_smp_send_stop(void);
 extern bool smp_crash_stop_failed(void);
 
+extern void init_gic_priority_masking(bool enable);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 92fa81798fb9..7e65280f2ebd 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -75,4 +75,6 @@ void __init init_IRQ(void)
 	irqchip_init();
 	if (!handle_arch_irq)
 		panic("No interrupt controller found.");
+	if (system_uses_irq_prio_masking())
+		init_gic_priority_masking(true);
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 824de7038967..07b6e9df151a 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -181,7 +181,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 	return ret;
 }
 
-static void init_gic_priority_masking(void)
+void init_gic_priority_masking(bool enable)
 {
 	u32 cpuflags;
 
@@ -195,7 +195,7 @@ static void init_gic_priority_masking(void)
 	gic_write_pmr(GIC_PRIO_IRQOFF);
 
 	/* We can only unmask PSR.I if we can take aborts */
-	if (!(cpuflags & PSR_A_BIT))
+	if (enable && !(cpuflags & PSR_A_BIT))
 		write_sysreg(cpuflags & ~PSR_I_BIT, daif);
 }
 
@@ -226,7 +226,7 @@ asmlinkage notrace void secondary_start_kernel(void)
 	cpu_uninstall_idmap();
 
 	if (system_uses_irq_prio_masking())
-		init_gic_priority_masking();
+		init_gic_priority_masking(true);
 
 	preempt_disable();
 	trace_hardirqs_off();
@@ -451,7 +451,7 @@ void __init smp_prepare_boot_cpu(void)
 
 	/* Conditionally switch to GIC PMR for interrupt masking */
 	if (system_uses_irq_prio_masking())
-		init_gic_priority_masking();
+		init_gic_priority_masking(false);
 }
 
 static u64 __init of_get_cpu_mpidr(struct device_node *dn)
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH] arm64: fix kernel stack overflow in kdump capture kernel
  2019-04-18 14:02 [PATCH] arm64: fix kernel stack overflow in kdump capture kernel Wei Li
@ 2019-04-18 14:26 ` Julien Thierry
  2019-04-18 14:59   ` Marc Zyngier
  2019-04-19  8:56   ` liwei (GF)
  0 siblings, 2 replies; 5+ messages in thread
From: Julien Thierry @ 2019-04-18 14:26 UTC (permalink / raw)
  To: Wei Li, catalin.marinas, will.deacon
  Cc: mark.rutland, daniel.thompson, marc.zyngier, christoffer.dall,
	james.morse, joel, linux-arm-kernel

Hi Wei,

On 18/04/2019 15:02, Wei Li wrote:
> When enabling ARM64_PSEUDO_NMI feature in kdump capture kernel, it will
> report a kernel stack overflow exception:
> 
> [    0.000000] CPU features: detected: IRQ priority masking
> [    0.000000] alternatives: patching kernel code
> [    0.000000] Insufficient stack space to handle exception!
> [    0.000000] ESR: 0x96000044 -- DABT (current EL)
> [    0.000000] FAR: 0x0000000000000040
> [    0.000000] Task stack:     [0xffff0000097f0000..0xffff0000097f4000]
> [    0.000000] IRQ stack:      [0x0000000000000000..0x0000000000004000]
> [    0.000000] Overflow stack: [0xffff80002b7cf290..0xffff80002b7d0290]
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.34-lw+ #3

Can you reproduce this on a recent kernel?

> [    0.000000] pstate: 400003c5 (nZcv DAIF -PAN -UAO)
> [    0.000000] pc : el1_sync+0x0/0xb8
> [    0.000000] lr : el1_irq+0xb8/0x140
> [    0.000000] sp : 0000000000000040
> [    0.000000] pmr_save: 00000070
> [    0.000000] x29: ffff0000097f3f60 x28: ffff000009806240 
> [    0.000000] x27: 0000000080000000 x26: 0000000000004000 
> [    0.000000] x25: 0000000000000000 x24: ffff000009329028 
> [    0.000000] x23: 0000000040000005 x22: ffff000008095c6c 
> [    0.000000] x21: ffff0000097f3f70 x20: 0000000000000070 
> [    0.000000] x19: ffff0000097f3e30 x18: ffffffffffffffff 
> [    0.000000] x17: 0000000000000000 x16: 0000000000000000 
> [    0.000000] x15: ffff0000097f9708 x14: ffff000089a382ef 
> [    0.000000] x13: ffff000009a382fd x12: ffff000009824000 
> [    0.000000] x11: ffff0000097fb7b0 x10: ffff000008730028 
> [    0.000000] x9 : ffff000009440018 x8 : 000000000000000d 
> [    0.000000] x7 : 6b20676e69686374 x6 : 000000000000003b 
> [    0.000000] x5 : 0000000000000000 x4 : ffff000008093600 
> [    0.000000] x3 : 0000000400000008 x2 : 7db2e689fc2b8e00 
> [    0.000000] x1 : 0000000000000000 x0 : ffff0000097f3e30 
> [    0.000000] Kernel panic - not syncing: kernel stack overflow
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.34-lw+ #3
> [    0.000000] Call trace:
> [    0.000000]  dump_backtrace+0x0/0x1b8
> [    0.000000]  show_stack+0x24/0x30
> [    0.000000]  dump_stack+0xa8/0xcc
> [    0.000000]  panic+0x134/0x30c
> [    0.000000]  __stack_chk_fail+0x0/0x28
> [    0.000000]  handle_bad_stack+0xfc/0x108
> [    0.000000]  __bad_stack+0x90/0x94
> [    0.000000]  el1_sync+0x0/0xb8
> [    0.000000]  init_gic_priority_masking+0x4c/0x70
> [    0.000000]  smp_prepare_boot_cpu+0x60/0x68
> [    0.000000]  start_kernel+0x1e8/0x53c
> [    0.000000] ---[ end Kernel panic - not syncing: kernel stack overflow ]---
> 
> The reason is init_gic_priority_masking() may unmask PSR.I while the
> irq stacks are not inited yet. Some "NMI" could be raised unfortunately
> and it will just go into this exception.
> 
> In this patch, we just write the PMR in smp_prepare_boot_cpu(), and delay
> unmasking PSR.I after irq stacks inited in init_IRQ().
> 
> Fixes: e79321883842 ("arm64: Switch to PMR masking when starting CPUs")
> Signed-off-by: Wei Li <liwei391@huawei.com>
> ---
>  arch/arm64/include/asm/smp.h | 2 ++
>  arch/arm64/kernel/irq.c      | 2 ++
>  arch/arm64/kernel/smp.c      | 8 ++++----
>  3 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index 18553f399e08..4d8eb35f4cdd 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -158,6 +158,8 @@ bool cpus_are_stuck_in_kernel(void);
>  extern void crash_smp_send_stop(void);
>  extern bool smp_crash_stop_failed(void);
>  
> +extern void init_gic_priority_masking(bool enable);
> +
>  #endif /* ifndef __ASSEMBLY__ */
>  
>  #endif /* ifndef __ASM_SMP_H */
> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
> index 92fa81798fb9..7e65280f2ebd 100644
> --- a/arch/arm64/kernel/irq.c
> +++ b/arch/arm64/kernel/irq.c
> @@ -75,4 +75,6 @@ void __init init_IRQ(void)
>  	irqchip_init();
>  	if (!handle_arch_irq)
>  		panic("No interrupt controller found.");
> +	if (system_uses_irq_prio_masking())
> +		init_gic_priority_masking(true);
>  }
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 824de7038967..07b6e9df151a 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -181,7 +181,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  	return ret;
>  }
>  
> -static void init_gic_priority_masking(void)
> +void init_gic_priority_masking(bool enable)
>  {
>  	u32 cpuflags;
>  
> @@ -195,7 +195,7 @@ static void init_gic_priority_masking(void)
>  	gic_write_pmr(GIC_PRIO_IRQOFF);
>  
>  	/* We can only unmask PSR.I if we can take aborts */
> -	if (!(cpuflags & PSR_A_BIT))
> +	if (enable && !(cpuflags & PSR_A_BIT))
>  		write_sysreg(cpuflags & ~PSR_I_BIT, daif);
>  }
>  
> @@ -226,7 +226,7 @@ asmlinkage notrace void secondary_start_kernel(void)
>  	cpu_uninstall_idmap();
>  
>  	if (system_uses_irq_prio_masking())
> -		init_gic_priority_masking();
> +		init_gic_priority_masking(true);
>  
>  	preempt_disable();
>  	trace_hardirqs_off();
> @@ -451,7 +451,7 @@ void __init smp_prepare_boot_cpu(void)
>  
>  	/* Conditionally switch to GIC PMR for interrupt masking */
>  	if (system_uses_irq_prio_masking())
> -		init_gic_priority_masking();
> +		init_gic_priority_masking(false);

The assumption is that PSR.A should be set at this point (the kernel is
no more ready to take async exceptions than it is to take interrupts),
so we shouldn't be unmasking PSR.I.

PSR.A should be set when booting Linux (requirement in
Documentation/arm64/booting.txt), do you know what is clearing it and why?
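
The mask state in question can be read from the register dump. What follows
is a minimal sketch, not kernel code: the helper names are hypothetical, the
bit values are the architectural DAIF positions in PSTATE/SPSR, and treating
x23 (0x40000005) as the saved SPSR of the interrupted context is an
assumption that the report itself does not confirm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural DAIF mask bits in PSTATE/SPSR on AArch64. */
#define PSR_F_BIT 0x00000040u
#define PSR_I_BIT 0x00000080u
#define PSR_A_BIT 0x00000100u
#define PSR_D_BIT 0x00000200u

/* Hypothetical helper: true when the exception class `bit` is masked. */
static bool psr_masked(uint32_t psr, uint32_t bit)
{
	return (psr & bit) != 0;
}

/*
 * Hypothetical helper: render the DAIF field the way the "pstate:" line
 * in the log does, upper case for a set (masked) bit and lower case for
 * a clear (unmasked) bit.
 */
static void psr_daif_string(uint32_t psr, char out[5])
{
	out[0] = psr_masked(psr, PSR_D_BIT) ? 'D' : 'd';
	out[1] = psr_masked(psr, PSR_A_BIT) ? 'A' : 'a';
	out[2] = psr_masked(psr, PSR_I_BIT) ? 'I' : 'i';
	out[3] = psr_masked(psr, PSR_F_BIT) ? 'F' : 'f';
	out[4] = '\0';
}
```

With these helpers, the reported pstate 400003c5 decodes to "DAIF" (everything
masked on exception entry), while 40000005 decodes to "daif". If x23 really is
the saved SPSR, both PSR.A and PSR.I were clear when the interrupt was taken.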

Thanks,

-- 
Julien Thierry


* Re: [PATCH] arm64: fix kernel stack overflow in kdump capture kernel
  2019-04-18 14:26 ` Julien Thierry
@ 2019-04-18 14:59   ` Marc Zyngier
  2019-04-19  8:56   ` liwei (GF)
  1 sibling, 0 replies; 5+ messages in thread
From: Marc Zyngier @ 2019-04-18 14:59 UTC (permalink / raw)
  To: Julien Thierry, Wei Li, catalin.marinas, will.deacon
  Cc: mark.rutland, daniel.thompson, christoffer.dall, james.morse,
	joel, linux-arm-kernel

Wei,

On 18/04/2019 15:26, Julien Thierry wrote:
> Hi Wei,
> 
> On 18/04/2019 15:02, Wei Li wrote:
>> When enabling ARM64_PSEUDO_NMI feature in kdump capture kernel, it will
>> report a kernel stack overflow exception:
>>
>> [    0.000000] CPU features: detected: IRQ priority masking
>> [    0.000000] alternatives: patching kernel code
>> [    0.000000] Insufficient stack space to handle exception!
>> [    0.000000] ESR: 0x96000044 -- DABT (current EL)
>> [    0.000000] FAR: 0x0000000000000040
>> [    0.000000] Task stack:     [0xffff0000097f0000..0xffff0000097f4000]
>> [    0.000000] IRQ stack:      [0x0000000000000000..0x0000000000004000]
>> [    0.000000] Overflow stack: [0xffff80002b7cf290..0xffff80002b7d0290]
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.34-lw+ #3
> 
> Can you reproduce this on a recent kernel?

In general, please do not bother reporting an issue on a non-mainline
kernel with backported stuff. This is not helpful at all. Sticking to
mainline allows us to actually reproduce your problem instead of trying
to figure out how you have backported the patches.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH] arm64: fix kernel stack overflow in kdump capture kernel
  2019-04-18 14:26 ` Julien Thierry
  2019-04-18 14:59   ` Marc Zyngier
@ 2019-04-19  8:56   ` liwei (GF)
  2019-04-23  7:55     ` Julien Thierry
  1 sibling, 1 reply; 5+ messages in thread
From: liwei (GF) @ 2019-04-19  8:56 UTC (permalink / raw)
  To: Julien Thierry, catalin.marinas, will.deacon
  Cc: mark.rutland, daniel.thompson, marc.zyngier, christoffer.dall,
	james.morse, joel, linux-arm-kernel

Hi Julien,

On 2019/4/18 22:26, Julien Thierry wrote:
> Hi Wei,
> 
> On 18/04/2019 15:02, Wei Li wrote:
>> When enabling ARM64_PSEUDO_NMI feature in kdump capture kernel, it will
>> report a kernel stack overflow exception:
>>
>> [    0.000000] CPU features: detected: IRQ priority masking
>> [    0.000000] alternatives: patching kernel code
>> [    0.000000] Insufficient stack space to handle exception!
>> [    0.000000] ESR: 0x96000044 -- DABT (current EL)
>> [    0.000000] FAR: 0x0000000000000040
>> [    0.000000] Task stack:     [0xffff0000097f0000..0xffff0000097f4000]
>> [    0.000000] IRQ stack:      [0x0000000000000000..0x0000000000004000]
>> [    0.000000] Overflow stack: [0xffff80002b7cf290..0xffff80002b7d0290]
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.34-lw+ #3
> 
> Can you reproduce this on a recent kernel?
> 
OK, I will try to reproduce this on 5.1.0-rc*.

>>  	preempt_disable();
>>  	trace_hardirqs_off();
>> @@ -451,7 +451,7 @@ void __init smp_prepare_boot_cpu(void)
>>  
>>  	/* Conditionally switch to GIC PMR for interrupt masking */
>>  	if (system_uses_irq_prio_masking())
>> -		init_gic_priority_masking();
>> +		init_gic_priority_masking(false);
> 
> The assumption is that PSR.A should be set at this point (the kernel is
> not more ready to take async exceptions than it is ready to take
> interrupts) and we shouldn't be unmasking PSR.I.

I think the assumption here is not right, since PSR.A has been unmasked
in setup_arch() before smp_prepare_boot_cpu().
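
The effect of that ordering on the pre-patch check can be sketched as a small
model. It is illustrative only: daif_after_pmr_init() is a hypothetical
stand-in for the tail of init_gic_priority_masking(), and the two starting
DAIF values are assumptions about the boot and secondary CPUs, not values
read from hardware.

```c
#include <stdint.h>

/* Architectural DAIF mask bits in PSTATE on AArch64. */
#define PSR_F_BIT 0x00000040u
#define PSR_I_BIT 0x00000080u
#define PSR_A_BIT 0x00000100u
#define PSR_D_BIT 0x00000200u

/* Assumed state of the boot CPU after setup_arch(): only IRQs masked. */
#define BOOT_CPU_DAIF      (PSR_I_BIT)
/* Assumed state of a secondary CPU on entry: everything masked. */
#define SECONDARY_CPU_DAIF (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

/*
 * Model of the pre-patch logic: PSR.I is cleared whenever PSR.A is
 * already clear. Returns the DAIF value the CPU would end up with.
 */
static uint32_t daif_after_pmr_init(uint32_t cpuflags)
{
	/* We can only unmask PSR.I if we can take aborts */
	if (!(cpuflags & PSR_A_BIT))
		return cpuflags & ~PSR_I_BIT;
	return cpuflags;
}
```

Under these assumptions daif_after_pmr_init(BOOT_CPU_DAIF) returns a value
with PSR.I clear, while daif_after_pmr_init(SECONDARY_CPU_DAIF) returns its
input unchanged, so only the boot CPU path unmasks interrupts early.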

> PSR.A should be set when booting Linux (requirement in
> Documentation/arm64/booting.txt), do you know what is clearing it and why?

Thanks,
Wei



* Re: [PATCH] arm64: fix kernel stack overflow in kdump capture kernel
  2019-04-19  8:56   ` liwei (GF)
@ 2019-04-23  7:55     ` Julien Thierry
  0 siblings, 0 replies; 5+ messages in thread
From: Julien Thierry @ 2019-04-23  7:55 UTC (permalink / raw)
  To: liwei (GF), catalin.marinas, will.deacon
  Cc: mark.rutland, daniel.thompson, marc.zyngier, christoffer.dall,
	james.morse, joel, linux-arm-kernel



On 19/04/2019 09:56, liwei (GF) wrote:
> Hi Julien,
> 
> On 2019/4/18 22:26, Julien Thierry wrote:
>> Hi Wei,
>>
>> On 18/04/2019 15:02, Wei Li wrote:
>>> When enabling ARM64_PSEUDO_NMI feature in kdump capture kernel, it will
>>> report a kernel stack overflow exception:
>>>
>>> [    0.000000] CPU features: detected: IRQ priority masking
>>> [    0.000000] alternatives: patching kernel code
>>> [    0.000000] Insufficient stack space to handle exception!
>>> [    0.000000] ESR: 0x96000044 -- DABT (current EL)
>>> [    0.000000] FAR: 0x0000000000000040
>>> [    0.000000] Task stack:     [0xffff0000097f0000..0xffff0000097f4000]
>>> [    0.000000] IRQ stack:      [0x0000000000000000..0x0000000000004000]
>>> [    0.000000] Overflow stack: [0xffff80002b7cf290..0xffff80002b7d0290]
>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.34-lw+ #3
>>
>> Can you reproduce this on a recent kernel?
>>
> OK, i will try to reproduce this on 5.1.0-rc*.
> 
>>>  	preempt_disable();
>>>  	trace_hardirqs_off();
>>> @@ -451,7 +451,7 @@ void __init smp_prepare_boot_cpu(void)
>>>  
>>>  	/* Conditionally switch to GIC PMR for interrupt masking */
>>>  	if (system_uses_irq_prio_masking())
>>> -		init_gic_priority_masking();
>>> +		init_gic_priority_masking(false);
>>
>> The assumption is that PSR.A should be set at this point (the kernel is
>> not more ready to take async exceptions than it is ready to take
>> interrupts) and we shouldn't be unmasking PSR.I.
> 
> I think the assumption here is not right, since PSR.A has been unmasked
> in setup_arch() before smp_prepare_boot_cpu().
> 

Yes, you're right. Sorry, I got confused: the secondary CPUs call
init_gic_priority_masking() with PSR.A set, and I had not realized how
problematic it was for the boot CPU not to have it set.

So, I think your patch is doing the right thing, but to keep it simple,
I'd suggest not adding that boolean to init_gic_priority_masking():

- init_gic_priority_masking() only sets PMR to GIC_PRIO_IRQOFF, we don't
alter the value of DAIF bits (we can keep the check that the I bit is
set as we expect this on all CPUs).
- In init_IRQ(), we can unmask PSR.I when system_uses_irq_prio_masking()
returns true (maybe we can check this is only done on the boot CPU).
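
The split described in the two points above could look roughly like the
following user-space model. It is a sketch under stated assumptions, not the
patch that will be sent: struct cpu_model, the model_* functions and
MODEL_PRIO_IRQOFF are hypothetical names, and the 0x70 priority value is
taken from the pmr_save line in the log for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_I_BIT         0x00000080u
#define MODEL_PRIO_IRQOFF 0x70u	/* illustrative, matches pmr_save above */

/* Hypothetical per-CPU state tracked by the model. */
struct cpu_model {
	uint32_t daif;		/* current DAIF mask bits */
	uint32_t pmr;		/* current priority mask */
	bool irq_stack_ready;	/* has the IRQ stack been set up? */
};

/* First point: only program the PMR and leave DAIF untouched. */
static void model_init_gic_priority_masking(struct cpu_model *cpu)
{
	cpu->pmr = MODEL_PRIO_IRQOFF;
}

/* Second point: unmask PSR.I only once the IRQ stack exists. */
static void model_init_irq(struct cpu_model *cpu, bool uses_prio_masking)
{
	cpu->irq_stack_ready = true;	/* stands in for the stack setup */
	if (uses_prio_masking)
		cpu->daif &= ~PSR_I_BIT;
}

/* True if an interrupt could be taken before its stack is usable. */
static bool model_unsafe(const struct cpu_model *cpu)
{
	return !(cpu->daif & PSR_I_BIT) && !cpu->irq_stack_ready;
}
```

In this model the boot CPU keeps PSR.I set after
model_init_gic_priority_masking() and clears it only in model_init_irq(), so
model_unsafe() stays false at every step.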

I can prepare the patch and send it with the other fixes for PSEUDO_NMI
I'm preparing.

Thanks,

-- 
Julien Thierry

