* [PATCH 0/5] powerpc: various interrupt handling fixes
From: Nicholas Piggin @ 2021-10-04 14:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Ganesh Goudar, Nicholas Piggin
This fixes a number of bugs found mostly while looking at an MCE handler
issue, which should be fixed by patch 5 of the series. The previous
attempt, which Ganesh found to be wrong, is here:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210922020247.209409-1-npiggin@gmail.com/
I didn't bump this to v2 because it takes a different approach (hence
the different title).
Thanks,
Nick
Nicholas Piggin (5):
powerpc/64s: fix program check interrupt emergency stack path
powerpc/traps: do not enable irqs in _exception
powerpc/64: warn if local irqs are enabled in NMI or hardirq context
powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
arch/powerpc/include/asm/interrupt.h | 18 ++++++------
arch/powerpc/kernel/exceptions-64s.S | 25 ++++++++++------
arch/powerpc/kernel/irq.c | 6 ++++
arch/powerpc/kernel/traps.c | 43 +++++++++++++++++-----------
4 files changed, 59 insertions(+), 33 deletions(-)
--
2.23.0
* [PATCH 1/5] powerpc/64s: fix program check interrupt emergency stack path
From: Nicholas Piggin @ 2021-10-04 14:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Ganesh Goudar, Nicholas Piggin
The emergency stack path was jumping into a 3: label inside the
__GEN_COMMON_BODY macro for the normal path after it had finished,
rather than jumping over it: "b 3f" branches to the first "3:" label
that follows it, and the normal path's expansion of __GEN_COMMON_BODY
defines its own 3: label before the intended one. By a small miracle
this is the correct place to build up a new interrupt frame with the
existing stack pointer, so things basically worked okay, apart from an
added, weird-looking 700 trap frame on top (which had the wrong ->nip,
so it didn't decode BUG messages either).
Fix this by not using numeric labels when jumping over non-trivial
macros; use named .L local labels instead.
Before:
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 88 Comm: sh Not tainted 5.15.0-rc2-00034-ge057cdade6e5 #2637
NIP: 7265677368657265 LR: c00000000006c0c8 CTR: c0000000000097f0
REGS: c0000000fffb3a50 TRAP: 0700 Not tainted
MSR: 9000000000021031 <SF,HV,ME,IR,DR,LE> CR: 00000700 XER: 20040000
CFAR: c0000000000098b0 IRQMASK: 0
GPR00: c00000000006c964 c0000000fffb3cf0 c000000001513800 0000000000000000
GPR04: 0000000048ab0778 0000000042000000 0000000000000000 0000000000001299
GPR08: 000001e447c718ec 0000000022424282 0000000000002710 c00000000006bee8
GPR12: 9000000000009033 c0000000016b0000 00000000000000b0 0000000000000001
GPR16: 0000000000000000 0000000000000002 0000000000000000 0000000000000ff8
GPR20: 0000000000001fff 0000000000000007 0000000000000080 00007fff89d90158
GPR24: 0000000002000000 0000000002000000 0000000000000255 0000000000000300
GPR28: c000000001270000 0000000042000000 0000000048ab0778 c000000080647e80
NIP [7265677368657265] 0x7265677368657265
LR [c00000000006c0c8] ___do_page_fault+0x3f8/0xb10
Call Trace:
[c0000000fffb3cf0] [c00000000000bdac] soft_nmi_common+0x13c/0x1d0 (unreliable)
--- interrupt: 700 at decrementer_common_virt+0xb8/0x230
NIP: c0000000000098b8 LR: c00000000006c0c8 CTR: c0000000000097f0
REGS: c0000000fffb3d60 TRAP: 0700 Not tainted
MSR: 9000000000021031 <SF,HV,ME,IR,DR,LE> CR: 22424282 XER: 20040000
CFAR: c0000000000098b0 IRQMASK: 0
GPR00: c00000000006c964 0000000000002400 c000000001513800 0000000000000000
GPR04: 0000000048ab0778 0000000042000000 0000000000000000 0000000000001299
GPR08: 000001e447c718ec 0000000022424282 0000000000002710 c00000000006bee8
GPR12: 9000000000009033 c0000000016b0000 00000000000000b0 0000000000000001
GPR16: 0000000000000000 0000000000000002 0000000000000000 0000000000000ff8
GPR20: 0000000000001fff 0000000000000007 0000000000000080 00007fff89d90158
GPR24: 0000000002000000 0000000002000000 0000000000000255 0000000000000300
GPR28: c000000001270000 0000000042000000 0000000048ab0778 c000000080647e80
NIP [c0000000000098b8] decrementer_common_virt+0xb8/0x230
LR [c00000000006c0c8] ___do_page_fault+0x3f8/0xb10
--- interrupt: 700
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 6d28218e0cc3c949 ]---
After:
------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/exceptions-64s.S:491!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 88 Comm: login Not tainted 5.15.0-rc2-00034-ge057cdade6e5-dirty #2638
NIP: c0000000000098b8 LR: c00000000006bf04 CTR: c0000000000097f0
REGS: c0000000fffb3d60 TRAP: 0700 Not tainted
MSR: 9000000000021031 <SF,HV,ME,IR,DR,LE> CR: 24482227 XER: 00040000
CFAR: c0000000000098b0 IRQMASK: 0
GPR00: c00000000006bf04 0000000000002400 c000000001513800 c000000001271868
GPR04: 00000000100f0d29 0000000042000000 0000000000000007 0000000000000009
GPR08: 00000000100f0d29 0000000024482227 0000000000002710 c000000000181b3c
GPR12: 9000000000009033 c0000000016b0000 00000000100f0d29 c000000005b22f00
GPR16: 00000000ffff0000 0000000000000001 0000000000000009 00000000100eed90
GPR20: 00000000100eed90 0000000010000000 000000001000a49c 00000000100f1430
GPR24: c000000001271868 0000000002000000 0000000000000215 0000000000000300
GPR28: c000000001271800 0000000042000000 00000000100f0d29 c000000080647860
NIP [c0000000000098b8] decrementer_common_virt+0xb8/0x230
LR [c00000000006bf04] ___do_page_fault+0x234/0xb10
Call Trace:
Instruction dump:
4182000c 39400001 48000008 894d0932 714a0001 39400008 408225fc 718a4000
7c2a0b78 3821fcf0 41c20008 e82d0910 <0981fcf0> f92101a0 f9610170 f9810178
---[ end trace a5dbd1f5ea4ccc51 ]---
Fixes: 0a882e28468f4 ("powerpc/64s/exception: remove bad stack branch")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/exceptions-64s.S | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 37859e62a8dc..024d9231f88c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1665,27 +1665,30 @@ EXC_COMMON_BEGIN(program_check_common)
*/
andi. r10,r12,MSR_PR
- bne 2f /* If userspace, go normal path */
+ bne .Lnormal_stack /* If userspace, go normal path */
andis. r10,r12,(SRR1_PROGTM)@h
- bne 1f /* If TM, emergency */
+ bne .Lemergency_stack /* If TM, emergency */
cmpdi r1,-INT_FRAME_SIZE /* check if r1 is in userspace */
- blt 2f /* normal path if not */
+ blt .Lnormal_stack /* normal path if not */
/* Use the emergency stack */
-1: andi. r10,r12,MSR_PR /* Set CR0 correctly for label */
+.Lemergency_stack:
+ andi. r10,r12,MSR_PR /* Set CR0 correctly for label */
/* 3 in EXCEPTION_PROLOG_COMMON */
mr r10,r1 /* Save r1 */
ld r1,PACAEMERGSP(r13) /* Use emergency stack */
subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */
__ISTACK(program_check)=0
__GEN_COMMON_BODY program_check
- b 3f
-2:
+ b .Ldo_program_check
+
+.Lnormal_stack:
__ISTACK(program_check)=1
__GEN_COMMON_BODY program_check
-3:
+
+.Ldo_program_check:
addi r3,r1,STACK_FRAME_OVERHEAD
bl program_check_exception
REST_NVGPRS(r1) /* instruction emulation may change GPRs */
--
2.23.0
* [PATCH 2/5] powerpc/traps: do not enable irqs in _exception
From: Nicholas Piggin @ 2021-10-04 14:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Ganesh Goudar, Nicholas Piggin
_exception can be called by machine check handlers when the MCE hits
user code (e.g., pseries and powernv). This will enable local irqs,
which is a dicey thing to do in NMI or hard irq context.
This seemed to work out okay because a userspace MCE can basically be
treated like a synchronous interrupt (after async / imprecise MCEs are
filtered out). Since NMI and hard irq handlers have started growing
nmi_enter / irq_enter, and more irq state sanity checks, this has
started to cause problems (or at least trigger warnings).
The Fixes tag points to the commit which introduced this, rather than
trying to work out exactly which commit was the first that could
possibly cause a problem, because that may be difficult to prove.
Fixes: 9f2f79e3a3c1 ("powerpc: Disable interrupts in 64-bit kernel FP and vector faults")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/traps.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index aac8c0412ff9..e453b666613b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -340,10 +340,16 @@ static bool exception_common(int signr, struct pt_regs *regs, int code,
return false;
}
- show_signal_msg(signr, regs, code, addr);
+ /*
+ * Must not enable interrupts even for user-mode exception, because
+ * this can be called from machine check, which may be a NMI or IRQ
+ * which don't like interrupts being enabled. Could check for
+ * in_hardirq || in_nmi perhaps, but there doesn't seem to be a good
+ * reason why _exception() should enable irqs for an exception handler,
+ * the handlers themselves do that directly.
+ */
- if (arch_irqs_disabled())
- interrupt_cond_local_irq_enable(regs);
+ show_signal_msg(signr, regs, code, addr);
current->thread.trap_nr = code;
--
2.23.0
* [PATCH 3/5] powerpc/64: warn if local irqs are enabled in NMI or hardirq context
From: Nicholas Piggin @ 2021-10-04 14:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Ganesh Goudar, Nicholas Piggin
This can help catch bugs such as the one fixed by the previous change
to prevent _exception() from enabling irqs.
ppc32 could have a similar warning, but it has no good config option for
debugging this (and the test may be overkill to add to production
kernels).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/irq.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 551b653228c4..c4f1d6b7d992 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -229,6 +229,9 @@ notrace void arch_local_irq_restore(unsigned long mask)
return;
}
+ if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+ WARN_ON_ONCE(in_nmi() || in_hardirq());
+
/*
* After the stb, interrupts are unmasked and there are no interrupts
* pending replay. The restart sequence makes this atomic with
@@ -321,6 +324,9 @@ notrace void arch_local_irq_restore(unsigned long mask)
if (mask)
return;
+ if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+ WARN_ON_ONCE(in_nmi() || in_hardirq());
+
/*
* From this point onward, we can take interrupts, preempt,
* etc... unless we got hard-disabled. We check if an event
--
2.23.0
* [PATCH 4/5] powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
From: Nicholas Piggin @ 2021-10-04 14:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Ganesh Goudar, Nicholas Piggin
If an NMI hits early in an interrupt handler, before the irq soft-mask
state is reconciled, it can trigger a false-positive BUG from a
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG assertion.
Remove that assertion and instead, if regs->msr has EE clear, mark
regs->softe as disabled so the irq state looks correct to NMI handlers,
the same way it is fixed up when the interrupted code was implicitly
soft-masked.
This doesn't fix a known problem: the change that was fixed by commit
4ec5feec1ad02 ("powerpc/64s: Make NMI record implicitly soft-masked code
as irqs disabled") was the addition of a warning in the soft-nmi
watchdog interrupt, which can never actually fire when MSR[EE]=0.
However, it may be important if NMI handlers grow more code, and it is
less surprising to anything using 'regs' (I tripped over this when
working in the area).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/interrupt.h | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 6b800d3e2681..b894b7169706 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -265,13 +265,16 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte
local_paca->irq_soft_mask = IRQS_ALL_DISABLED;
local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
- if (is_implicit_soft_masked(regs)) {
- // Adjust regs->softe soft implicit soft-mask, so
- // arch_irq_disabled_regs(regs) behaves as expected.
+ if (!(regs->msr & MSR_EE) || is_implicit_soft_masked(regs)) {
+ /*
+ * Adjust regs->softe to be soft-masked if it had not been
+ * reconcied (e.g., interrupt entry with MSR[EE]=0 but softe
+ * not yet set disabled), or if it was in an implicit soft
+ * masked state. This makes arch_irq_disabled_regs(regs)
+ * behave as expected.
+ */
regs->softe = IRQS_ALL_DISABLED;
}
- if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
- BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));
/* Don't do any per-CPU operations until interrupt state is fixed */
--
2.23.0
* [PATCH 5/5] powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
From: Nicholas Piggin @ 2021-10-04 14:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Ganesh Goudar, Nicholas Piggin
The machine check handler is not considered an NMI on 64s. The early
handler is the true NMI handler, and it then schedules the
machine_check_exception handler to run when interrupts are enabled.
This works fine except in the case of an unrecoverable MCE: if the true
NMI is taken when MSR[RI] is clear, it cannot recover, so it calls
machine_check_exception directly so that something might be done about
it.
Calling an async handler from NMI context can result in irq state and
other things getting corrupted. This can also trigger the BUG at
arch/powerpc/include/asm/interrupt.h:168
BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));
Fix this by making an _async version of the handler, which is called
in the normal case, and an NMI version that is called for unrecoverable
interrupts.
Fixes: 2b43dd7653cc ("powerpc/64: enable MSR[EE] in irq replay pt_regs")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/interrupt.h | 5 ++---
arch/powerpc/kernel/exceptions-64s.S | 8 +++++--
arch/powerpc/kernel/traps.c | 31 ++++++++++++++++------------
3 files changed, 26 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index b894b7169706..a1d238255f07 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -528,10 +528,9 @@ static __always_inline long ____##func(struct pt_regs *regs)
/* kernel/traps.c */
DECLARE_INTERRUPT_HANDLER_NMI(system_reset_exception);
#ifdef CONFIG_PPC_BOOK3S_64
-DECLARE_INTERRUPT_HANDLER_ASYNC(machine_check_exception);
-#else
-DECLARE_INTERRUPT_HANDLER_NMI(machine_check_exception);
+DECLARE_INTERRUPT_HANDLER_ASYNC(machine_check_exception_async);
#endif
+DECLARE_INTERRUPT_HANDLER_NMI(machine_check_exception);
DECLARE_INTERRUPT_HANDLER(SMIException);
DECLARE_INTERRUPT_HANDLER(handle_hmi_exception);
DECLARE_INTERRUPT_HANDLER(unknown_exception);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 024d9231f88c..eaf1f72131a1 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1243,7 +1243,7 @@ EXC_COMMON_BEGIN(machine_check_common)
li r10,MSR_RI
mtmsrd r10,1
addi r3,r1,STACK_FRAME_OVERHEAD
- bl machine_check_exception
+ bl machine_check_exception_async
b interrupt_return_srr
@@ -1303,7 +1303,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
subi r12,r12,1
sth r12,PACA_IN_MCE(r13)
- /* Invoke machine_check_exception to print MCE event and panic. */
+ /*
+ * Invoke machine_check_exception to print MCE event and panic.
+ * This is the NMI version of the handler because we are called from
+ * the early handler which is a true NMI.
+ */
addi r3,r1,STACK_FRAME_OVERHEAD
bl machine_check_exception
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e453b666613b..11741703d26e 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -796,24 +796,22 @@ void die_mce(const char *str, struct pt_regs *regs, long err)
* do_exit() checks for in_interrupt() and panics in that case, so
* exit the irq/nmi before calling die.
*/
- if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
- irq_exit();
- else
+ if (in_nmi())
nmi_exit();
+ else
+ irq_exit();
die(str, regs, err);
}
/*
- * BOOK3S_64 does not call this handler as a non-maskable interrupt
+ * BOOK3S_64 does not usually call this handler as a non-maskable interrupt
* (it uses its own early real-mode handler to handle the MCE proper
* and then raises irq_work to call this handler when interrupts are
- * enabled).
+ * enabled). The only time when this is not true is if the early handler
+ * is unrecoverable, then it does call this directly to try to get a
+ * message out.
*/
-#ifdef CONFIG_PPC_BOOK3S_64
-DEFINE_INTERRUPT_HANDLER_ASYNC(machine_check_exception)
-#else
-DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
-#endif
+static void __machine_check_exception(struct pt_regs *regs)
{
int recover = 0;
@@ -847,12 +845,19 @@ DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
/* Must die if the interrupt is not recoverable */
if (regs_is_unrecoverable(regs))
die_mce("Unrecoverable Machine check", regs, SIGBUS);
+}
#ifdef CONFIG_PPC_BOOK3S_64
- return;
-#else
- return 0;
+DEFINE_INTERRUPT_HANDLER_ASYNC(machine_check_exception_async)
+{
+ __machine_check_exception(regs);
+}
#endif
+DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
+{
+ __machine_check_exception(regs);
+
+ return 0;
}
DEFINE_INTERRUPT_HANDLER(SMIException) /* async? */
--
2.23.0
* Re: [PATCH 5/5] powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
From: Cédric Le Goater @ 2021-10-04 15:28 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Ganesh Goudar
On 10/4/21 16:56, Nicholas Piggin wrote:
> The machine check handler is not considered NMI on 64s. The early
> handler is the true NMI handler, and then it schedules the
> machine_check_exception handler to run when interrupts are enabled.
>
> This works fine except the case of an unrecoverable MCE, where the true
> NMI is taken when MSR[RI] is clear, it can not recover, so it calls
> machine_check_exception directly so something might be done about it.
>
> Calling an async handler from NMI context can result in irq state and
> other things getting corrupted. This can also trigger the BUG at
> arch/powerpc/include/asm/interrupt.h:168
> BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));
I was hitting this problem when I rebooted a P8 tuleta system and
this series fixes it.
Tested-by: Cédric Le Goater <clg@kaod.org>
Thanks,
C.
* Re: [PATCH 0/5] powerpc: various interrupt handling fixes
From: Michael Ellerman @ 2021-10-08 13:23 UTC (permalink / raw)
To: linuxppc-dev, Nicholas Piggin; +Cc: Ganesh Goudar
On Tue, 5 Oct 2021 00:56:37 +1000, Nicholas Piggin wrote:
> This fixes a number of bugs found mostly looking at a MCE handler issue,
> which should be fixed in patch 5 of the series, previous attempt here
> which Ganesh found to be wrong.
>
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210922020247.209409-1-npiggin@gmail.com/
>
> I didn't increment to patch v2 because it's a different approach (so I
> gave it a different title).
>
> [...]
Applied to powerpc/fixes.
[1/5] powerpc/64s: fix program check interrupt emergency stack path
https://git.kernel.org/powerpc/c/3e607dc4df180b72a38e75030cb0f94d12808712
[2/5] powerpc/traps: do not enable irqs in _exception
https://git.kernel.org/powerpc/c/d0afd44c05f8f4e4c91487c02d43c87a31552462
[3/5] powerpc/64: warn if local irqs are enabled in NMI or hardirq context
https://git.kernel.org/powerpc/c/ff058a8ada5df0d84e5537cfaf89d06d71501580
[4/5] powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
https://git.kernel.org/powerpc/c/768c47010392ece9766a56479b4e0cf04a536916
[5/5] powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
https://git.kernel.org/powerpc/c/f08fb25bc66986b0952724530a640d9970fa52c1
cheers