* [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs
@ 2019-02-26  6:08 Nicholas Piggin
  2019-02-26  6:08 ` [PATCH v3 1/4] powerpc/64s: Fix HV NMI vs HV interrupt recoverability test Nicholas Piggin
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Nicholas Piggin @ 2019-02-26  6:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This series fixes several similar but unrelated bugs with NMIs
clobbering live registers without noticing it, because MSR[RI] is set.
Pretty rare bugs, but serious silent corruption consequences.

For the most part these can be observed and tested quite easily
with the mambo simulator, except that it does not seem to follow
the architecture wrt leaving MSR[RI] unchanged for HV interrupts.
Mambo clears MSR[RI], so you have to account for that manually.

Since v1:
- Fixed several build bugs.

Since v2:
- Improved changelog and comments.
- Fixed the NIA test for virt mode interrupts.

Nicholas Piggin (4):
  powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
  powerpc/64s: system reset interrupt preserve HSRRs
  powerpc/64s: Prepare to handle data interrupts vs d-side MCE
    reentrancy
  powerpc/64s: Fix data interrupts vs d-side MCE reentrancy

 arch/powerpc/include/asm/asm-prototypes.h |  8 ++
 arch/powerpc/include/asm/nmi.h            |  2 +
 arch/powerpc/kernel/exceptions-64s.S      | 92 +++++++++++++++++++----
 arch/powerpc/kernel/mce.c                 |  3 +
 arch/powerpc/kernel/traps.c               | 91 +++++++++++++++++++++-
 5 files changed, 179 insertions(+), 17 deletions(-)

-- 
2.18.0



* [PATCH v3 1/4] powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
  2019-02-26  6:08 [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs Nicholas Piggin
@ 2019-02-26  6:08 ` Nicholas Piggin
  2019-02-26  6:08 ` [PATCH v3 2/4] powerpc/64s: system reset interrupt preserve HSRRs Nicholas Piggin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2019-02-26  6:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

HV interrupts that use HSRR registers do not enter with MSR[RI] clear,
but their entry code is not recoverable vs NMI, due to shared use of
HSPRG1 as a scratch register to save r13.

This means that a system reset or machine check that hits in HSRR
interrupt entry can cause r13 to be silently corrupted.

Fix this by marking NMIs non-recoverable if they land in HV interrupt
ranges.
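
The range test described above can be sketched in user-space C. This is
only a model under stated assumptions, not the kernel code: struct
model_regs, the MODEL_MSR_* constants and the function names are
hypothetical, and the trampoline ranges are left out because they depend
on link-time symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Modelled MSR bits (hypothetical names; values follow the 64-bit MSR layout). */
#define MODEL_MSR_RI (1ULL << 1)   /* recoverable interrupt */
#define MODEL_MSR_PR (1ULL << 14)  /* problem (user) state */
#define MODEL_MSR_HV (1ULL << 60)  /* hypervisor state */

/* Minimal stand-in for the interrupted context. */
struct model_regs {
	uint64_t nip;
	uint64_t msr;
};

/* Fixed-vector ranges whose entry code uses HSPRG1 with RI still set. */
static const uint64_t model_hsrr_ranges[][2] = {
	{ 0x500, 0x600 }, { 0x980, 0xa00 }, { 0xe00, 0xec0 }, { 0xf80, 0xfa0 },
};

static bool model_in_range(uint64_t v, uint64_t lo, uint64_t hi)
{
	return v >= lo && v < hi;
}

/* True if nip lies in a real-mode range or its +0x4000 virt-mode twin. */
static bool model_nip_in_hsrr_entry(uint64_t nip)
{
	size_t i;

	nip &= ~0xc000000000000000ULL; /* drop the top two address bits */
	for (i = 0; i < sizeof(model_hsrr_ranges) / sizeof(model_hsrr_ranges[0]); i++) {
		uint64_t lo = model_hsrr_ranges[i][0];
		uint64_t hi = model_hsrr_ranges[i][1];

		if (model_in_range(nip, lo, hi) ||
		    model_in_range(nip, lo + 0x4000, hi + 0x4000))
			return true;
	}
	return false;
}

/* Clear RI in the saved MSR if the NMI hit HV interrupt entry code. */
static void model_hv_nmi_check(struct model_regs *regs)
{
	assert(regs);
	if (!(regs->msr & MODEL_MSR_RI))
		return; /* already marked non-recoverable */
	if (!(regs->msr & MODEL_MSR_HV))
		return; /* not interrupted in hypervisor state */
	if (regs->msr & MODEL_MSR_PR)
		return; /* interrupted in user state */
	if (model_nip_in_hsrr_entry(regs->nip))
		regs->msr &= ~MODEL_MSR_RI;
}
```

In this model, an NMI taken at 0x504 with HV=1, PR=0 and RI=1 has RI
cleared, while the same MSR with an ordinary kernel text address is left
untouched.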

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/asm-prototypes.h |  8 +++
 arch/powerpc/include/asm/nmi.h            |  2 +
 arch/powerpc/kernel/exceptions-64s.S      |  8 +++
 arch/powerpc/kernel/mce.c                 |  3 ++
 arch/powerpc/kernel/traps.c               | 66 +++++++++++++++++++++++
 5 files changed, 87 insertions(+)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 1d911f68a23b..0f8326644fa4 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -51,6 +51,14 @@ int exit_vmx_usercopy(void);
 int enter_vmx_ops(void);
 void *exit_vmx_ops(void *dest);
 
+/* Exceptions */
+#ifdef CONFIG_PPC_POWERNV
+extern unsigned long real_trampolines_start;
+extern unsigned long real_trampolines_end;
+extern unsigned long virt_trampolines_start;
+extern unsigned long virt_trampolines_end;
+#endif
+
 /* Traps */
 long machine_check_early(struct pt_regs *regs);
 long hmi_exception_realmode(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index bd9ba8defd72..84b4cfe73edd 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -14,4 +14,6 @@ extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask,
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 #endif
 
+extern void hv_nmi_check_nonrecoverable(struct pt_regs *regs);
+
 #endif /* _ASM_NMI_H */
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 9e253ce27e08..d2e9fc968655 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -68,6 +68,14 @@ OPEN_FIXED_SECTION(real_vectors,        0x0100, 0x1900)
 OPEN_FIXED_SECTION(real_trampolines,    0x1900, 0x4000)
 OPEN_FIXED_SECTION(virt_vectors,        0x4000, 0x5900)
 OPEN_FIXED_SECTION(virt_trampolines,    0x5900, 0x7000)
+
+#ifdef CONFIG_PPC_POWERNV
+	.globl real_trampolines_start
+	.globl real_trampolines_end
+	.globl virt_trampolines_start
+	.globl virt_trampolines_end
+#endif
+
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
  * Data area reserved for FWNMI option.
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index bd933a75f0bc..d653b5de4537 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -31,6 +31,7 @@
 
 #include <asm/machdep.h>
 #include <asm/mce.h>
+#include <asm/nmi.h>
 
 static DEFINE_PER_CPU(int, mce_nest_count);
 static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event);
@@ -488,6 +489,8 @@ long machine_check_early(struct pt_regs *regs)
 {
 	long handled = 0;
 
+	hv_nmi_check_nonrecoverable(regs);
+
 	/*
 	 * See if platform is capable of handling machine check.
 	 */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 64936b60d521..12b54908c15d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -376,6 +376,70 @@ void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
 	force_sig_fault(signr, code, (void __user *)addr, current);
 }
 
+/*
+ * The interrupt architecture has a quirk in that the HV interrupts excluding
+ * the NMIs (0x100 and 0x200) do not clear MSR[RI] at entry. The first thing
+ * that an interrupt handler must do is save off a GPR into a scratch register,
+ * and all interrupts on POWERNV (HV=1) use the HSPRG1 register as scratch.
+ * Therefore an NMI can clobber an HV interrupt's live HSPRG1 without noticing
+ * that it is non-reentrant, which leads to random data corruption.
+ *
+ * The solution is for NMI interrupts in HV mode to check if they originated
+ * from these critical HV interrupt regions. If so, then mark them not
+ * recoverable.
+ *
+ * An alternative would be for HV NMIs to use SPRG for scratch to avoid the
+ * HSPRG1 clobber, however this would cause guest SPRG to be clobbered. Linux
+ * guests should always have MSR[RI]=0 when their scratch SPRG is in use, so
+ * that would work. However any other guest OS that may have the SPRG live
+ * and MSR[RI]=1 could encounter silent corruption.
+ *
+ * Builds that do not support KVM could take this second option to increase
+ * the recoverability of NMIs.
+ */
+void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
+{
+#ifdef CONFIG_PPC_POWERNV
+	unsigned long kbase = (unsigned long)_stext;
+	unsigned long nip = regs->nip;
+
+	if (!(regs->msr & MSR_RI))
+		return;
+	if (!(regs->msr & MSR_HV))
+		return;
+	if (regs->msr & MSR_PR)
+		return;
+
+	/*
+	 * Now test if the interrupt has hit a range that may be using
+	 * HSPRG1 without having RI=0 (i.e., an HSRR interrupt). The
+	 * problem ranges all run un-relocated. Test real and virt modes
+	 * at the same time by dropping the top two bits of the nip (virt mode
+	 * entry points still have the +0x4000 offset).
+	 */
+	nip &= ~0xc000000000000000ULL;
+	if ((nip >= 0x500 && nip < 0x600) || (nip >= 0x4500 && nip < 0x4600))
+		goto nonrecoverable;
+	if ((nip >= 0x980 && nip < 0xa00) || (nip >= 0x4980 && nip < 0x4a00))
+		goto nonrecoverable;
+	if ((nip >= 0xe00 && nip < 0xec0) || (nip >= 0x4e00 && nip < 0x4ec0))
+		goto nonrecoverable;
+	if ((nip >= 0xf80 && nip < 0xfa0) || (nip >= 0x4f80 && nip < 0x4fa0))
+		goto nonrecoverable;
+	/* Trampoline code runs un-relocated so subtract kbase. */
+	if (nip >= real_trampolines_start - kbase &&
+			nip < real_trampolines_end - kbase)
+		goto nonrecoverable;
+	if (nip >= virt_trampolines_start - kbase &&
+			nip < virt_trampolines_end - kbase)
+		goto nonrecoverable;
+	return;
+
+nonrecoverable:
+	regs->msr &= ~MSR_RI;
+#endif
+}
+
 void system_reset_exception(struct pt_regs *regs)
 {
 	/*
@@ -386,6 +450,8 @@ void system_reset_exception(struct pt_regs *regs)
 	if (!nested)
 		nmi_enter();
 
+	hv_nmi_check_nonrecoverable(regs);
+
 	__this_cpu_inc(irq_stat.sreset_irqs);
 
 	/* See if any machine dependent calls */
-- 
2.18.0



* [PATCH v3 2/4] powerpc/64s: system reset interrupt preserve HSRRs
  2019-02-26  6:08 [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs Nicholas Piggin
  2019-02-26  6:08 ` [PATCH v3 1/4] powerpc/64s: Fix HV NMI vs HV interrupt recoverability test Nicholas Piggin
@ 2019-02-26  6:08 ` Nicholas Piggin
  2019-02-26  6:09 ` [PATCH v3 3/4] powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy Nicholas Piggin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2019-02-26  6:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

By convention, code that uses HSRR registers is not required to clear
MSR[RI]. However, the system reset NMI itself may use HSRR registers
(e.g., to call OPAL) and clobber them.

Rather than introduce the requirement to clear RI in order to use
HSRRs, have system reset interrupt save and restore HSRRs.
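
The save/restore pattern can be sketched as a small user-space C model.
Everything here is hypothetical and for illustration only: struct
model_cpu stands in for the SPR state, and model_firmware_call() stands
in for anything in the handler (such as an OPAL call) that may overwrite
HSRR0/HSRR1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU state that matters here. */
struct model_cpu {
	uint64_t hsrr0;
	uint64_t hsrr1;
	bool hv_mode; /* models cpu_has_feature(CPU_FTR_HVMODE) */
};

/* Models a call made by the handler that clobbers the HSRRs. */
static void model_firmware_call(struct model_cpu *cpu)
{
	cpu->hsrr0 = 0xdeadULL;
	cpu->hsrr1 = 0xbeefULL;
}

/* Models the handler: save HSRRs on entry, restore them before returning. */
static void model_system_reset(struct model_cpu *cpu)
{
	uint64_t hsrr0 = 0, hsrr1 = 0;
	bool saved_hsrrs = false;

	assert(cpu);
	if (cpu->hv_mode) {
		hsrr0 = cpu->hsrr0;
		hsrr1 = cpu->hsrr1;
		saved_hsrrs = true;
	}

	model_firmware_call(cpu); /* may overwrite HSRR0/HSRR1 */

	if (saved_hsrrs) {
		cpu->hsrr0 = hsrr0;
		cpu->hsrr1 = hsrr1;
	}
}
```

With hv_mode set, the interrupted code's HSRR values survive the handler
in this model, so that code does not need to run with RI clear.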

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/traps.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 12b54908c15d..f2191755fdf5 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -442,14 +442,32 @@ void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
 
 void system_reset_exception(struct pt_regs *regs)
 {
+	unsigned long hsrr0, hsrr1;
+	bool nested = in_nmi();
+	bool saved_hsrrs = false;
+
 	/*
 	 * Avoid crashes in case of nested NMI exceptions. Recoverability
 	 * is determined by RI and in_nmi
 	 */
-	bool nested = in_nmi();
 	if (!nested)
 		nmi_enter();
 
+	/*
+	 * System reset can interrupt code where HSRRs are live and MSR[RI]=1.
+	 * The system reset interrupt itself may clobber HSRRs (e.g., to call
+	 * OPAL), so save them here and restore them before returning.
+	 *
+	 * Machine checks don't need to save HSRRs, as the real mode handler
+	 * is careful to avoid them, and the regular handler is not delivered
+	 * as an NMI.
+	 */
+	if (cpu_has_feature(CPU_FTR_HVMODE)) {
+		hsrr0 = mfspr(SPRN_HSRR0);
+		hsrr1 = mfspr(SPRN_HSRR1);
+		saved_hsrrs = true;
+	}
+
 	hv_nmi_check_nonrecoverable(regs);
 
 	__this_cpu_inc(irq_stat.sreset_irqs);
@@ -499,6 +517,11 @@ void system_reset_exception(struct pt_regs *regs)
 	if (!(regs->msr & MSR_RI))
 		nmi_panic(regs, "Unrecoverable System Reset");
 
+	if (saved_hsrrs) {
+		mtspr(SPRN_HSRR0, hsrr0);
+		mtspr(SPRN_HSRR1, hsrr1);
+	}
+
 	if (!nested)
 		nmi_exit();
 
-- 
2.18.0



* [PATCH v3 3/4] powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy
  2019-02-26  6:08 [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs Nicholas Piggin
  2019-02-26  6:08 ` [PATCH v3 1/4] powerpc/64s: Fix HV NMI vs HV interrupt recoverability test Nicholas Piggin
  2019-02-26  6:08 ` [PATCH v3 2/4] powerpc/64s: system reset interrupt preserve HSRRs Nicholas Piggin
@ 2019-02-26  6:09 ` Nicholas Piggin
  2019-02-26  6:09 ` [PATCH v3 4/4] powerpc/64s: Fix " Nicholas Piggin
  2019-02-26  6:51 ` [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs Satheesh Rajendran
  4 siblings, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2019-02-26  6:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

A subsequent fix for data interrupts (those that set DAR / DSISR)
requires some interrupt macros to be open-coded, and also requires
the 0x300 interrupt handler to be moved out-of-line.

This patch does that without changing behaviour, which makes the later
fix a smaller change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 48 ++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index d2e9fc968655..0b8b57597837 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -574,8 +574,23 @@ EXC_COMMON_BEGIN(mce_return)
 	RFI_TO_KERNEL
 	b	.
 
-EXC_REAL(data_access, 0x300, 0x80)
-EXC_VIRT(data_access, 0x4300, 0x80, 0x300)
+EXC_REAL_BEGIN(data_access, 0x300, 0x80)
+SET_SCRATCH0(r13)		/* save r13 */
+EXCEPTION_PROLOG_0(PACA_EXGEN)
+	b	tramp_real_data_access
+EXC_REAL_END(data_access, 0x300, 0x80)
+
+TRAMP_REAL_BEGIN(tramp_real_data_access)
+EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, 0x300)
+EXCEPTION_PROLOG_2(data_access_common, EXC_STD)
+
+EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
+SET_SCRATCH0(r13)		/* save r13 */
+EXCEPTION_PROLOG_0(PACA_EXGEN)
+EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0x300)
+EXCEPTION_PROLOG_2_RELON(data_access_common, EXC_STD)
+EXC_VIRT_END(data_access, 0x4300, 0x80)
+
 TRAMP_KVM_SKIP(PACA_EXGEN, 0x300)
 
 EXC_COMMON_BEGIN(data_access_common)
@@ -604,11 +619,20 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-EXCEPTION_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, KVMTEST_PR, 0x380);
+SET_SCRATCH0(r13)		/* save r13 */
+EXCEPTION_PROLOG_0(PACA_EXSLB)
+	b	tramp_real_data_access_slb
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 
+TRAMP_REAL_BEGIN(tramp_real_data_access_slb)
+EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380)
+EXCEPTION_PROLOG_2(data_access_slb_common, EXC_STD)
+
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
-EXCEPTION_RELON_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, NOTEST, 0x380);
+SET_SCRATCH0(r13)		/* save r13 */
+EXCEPTION_PROLOG_0(PACA_EXSLB)
+EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380)
+EXCEPTION_PROLOG_2_RELON(data_access_slb_common, EXC_STD)
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
 
 TRAMP_KVM_SKIP(PACA_EXSLB, 0x380)
@@ -711,8 +735,20 @@ TRAMP_KVM_HV(PACA_EXGEN, 0x500)
 EXC_COMMON_ASYNC(hardware_interrupt_common, 0x500, do_IRQ)
 
 
-EXC_REAL(alignment, 0x600, 0x100)
-EXC_VIRT(alignment, 0x4600, 0x100, 0x600)
+EXC_REAL_BEGIN(alignment, 0x600, 0x100)
+SET_SCRATCH0(r13)		/* save r13 */
+EXCEPTION_PROLOG_0(PACA_EXGEN)
+EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, 0x600)
+EXCEPTION_PROLOG_2(alignment_common, EXC_STD)
+EXC_REAL_END(alignment, 0x600, 0x100)
+
+EXC_VIRT_BEGIN(alignment, 0x4600, 0x100)
+SET_SCRATCH0(r13)		/* save r13 */
+EXCEPTION_PROLOG_0(PACA_EXGEN)
+EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0x600)
+EXCEPTION_PROLOG_2_RELON(alignment_common, EXC_STD)
+EXC_VIRT_END(alignment, 0x4600, 0x100)
+
 TRAMP_KVM(PACA_EXGEN, 0x600)
 EXC_COMMON_BEGIN(alignment_common)
 	mfspr	r10,SPRN_DAR
-- 
2.18.0



* [PATCH v3 4/4] powerpc/64s: Fix data interrupts vs d-side MCE reentrancy
  2019-02-26  6:08 [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs Nicholas Piggin
                   ` (2 preceding siblings ...)
  2019-02-26  6:09 ` [PATCH v3 3/4] powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy Nicholas Piggin
@ 2019-02-26  6:09 ` Nicholas Piggin
  2019-02-26  6:51 ` [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs Satheesh Rajendran
  4 siblings, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2019-02-26  6:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Handlers for interrupts that set DAR / DSISR set MSR[RI] before those
SPRs are read. If a d-side machine check hits in this window, DAR /
DSISR will be clobbered silently, leading to random corruption.

Fix this by having handlers save those registers before setting MSR[RI].
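
The ordering problem can be illustrated with a small user-space C model.
It is a sketch under stated assumptions: struct model_cpu, struct
model_save and the function names are hypothetical, and a d-side machine
check is modelled simply as a function that overwrites DAR and DSISR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SPRs and the per-CPU save area. */
struct model_cpu {
	uint64_t dar;
	uint32_t dsisr;
	bool ri; /* models MSR[RI] */
};

struct model_save {
	uint64_t dar;
	uint32_t dsisr;
};

/* Models a d-side machine check: it overwrites DAR and DSISR. */
static void model_dside_mce(struct model_cpu *cpu)
{
	cpu->dar = ~0ULL;
	cpu->dsisr = ~0U;
}

/* Old ordering: RI is set first, so an MCE in the window goes unnoticed. */
static void model_entry_old(struct model_cpu *cpu, struct model_save *save,
			    bool mce_in_window)
{
	assert(cpu && save);
	cpu->ri = true;
	if (mce_in_window)
		model_dside_mce(cpu);
	save->dar = cpu->dar;     /* may already be clobbered */
	save->dsisr = cpu->dsisr;
}

/* Fixed ordering: DAR/DSISR are saved while RI is still clear. */
static void model_entry_fixed(struct model_cpu *cpu, struct model_save *save,
			      bool mce_in_window)
{
	assert(cpu && save);
	save->dar = cpu->dar;
	save->dsisr = cpu->dsisr;
	cpu->ri = true;
	if (mce_in_window)
		model_dside_mce(cpu); /* SPRs change, saved copies do not */
}
```

In the model, the old ordering saves the clobbered values when the
machine check lands in the window, while the fixed ordering keeps the
values of the original data interrupt.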

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 36 ++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 0b8b57597837..77e9c70f1b49 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -582,12 +582,25 @@ EXC_REAL_END(data_access, 0x300, 0x80)
 
 TRAMP_REAL_BEGIN(tramp_real_data_access)
 EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, 0x300)
+	/*
+	 * DAR/DSISR must be read before setting MSR[RI], because
+	 * a d-side MCE will clobber those registers so is not
+	 * recoverable if they are live.
+	 */
+	mfspr	r10,SPRN_DAR
+	mfspr	r11,SPRN_DSISR
+	std	r10,PACA_EXGEN+EX_DAR(r13)
+	stw	r11,PACA_EXGEN+EX_DSISR(r13)
 EXCEPTION_PROLOG_2(data_access_common, EXC_STD)
 
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
 SET_SCRATCH0(r13)		/* save r13 */
 EXCEPTION_PROLOG_0(PACA_EXGEN)
 EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0x300)
+	mfspr	r10,SPRN_DAR
+	mfspr	r11,SPRN_DSISR
+	std	r10,PACA_EXGEN+EX_DAR(r13)
+	stw	r11,PACA_EXGEN+EX_DSISR(r13)
 EXCEPTION_PROLOG_2_RELON(data_access_common, EXC_STD)
 EXC_VIRT_END(data_access, 0x4300, 0x80)
 
@@ -598,11 +611,8 @@ EXC_COMMON_BEGIN(data_access_common)
 	 * Here r13 points to the paca, r9 contains the saved CR,
 	 * SRR0 and SRR1 are saved in r11 and r12,
 	 * r9 - r13 are saved in paca->exgen.
+	 * EX_DAR and EX_DSISR have saved DAR/DSISR
 	 */
-	mfspr	r10,SPRN_DAR
-	std	r10,PACA_EXGEN+EX_DAR(r13)
-	mfspr	r10,SPRN_DSISR
-	stw	r10,PACA_EXGEN+EX_DSISR(r13)
 	EXCEPTION_PROLOG_COMMON(0x300, PACA_EXGEN)
 	RECONCILE_IRQ_STATE(r10, r11)
 	ld	r12,_MSR(r1)
@@ -626,20 +636,22 @@ EXC_REAL_END(data_access_slb, 0x380, 0x80)
 
 TRAMP_REAL_BEGIN(tramp_real_data_access_slb)
 EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380)
+	mfspr	r10,SPRN_DAR
+	std	r10,PACA_EXSLB+EX_DAR(r13)
 EXCEPTION_PROLOG_2(data_access_slb_common, EXC_STD)
 
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
 SET_SCRATCH0(r13)		/* save r13 */
 EXCEPTION_PROLOG_0(PACA_EXSLB)
 EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380)
+	mfspr	r10,SPRN_DAR
+	std	r10,PACA_EXSLB+EX_DAR(r13)
 EXCEPTION_PROLOG_2_RELON(data_access_slb_common, EXC_STD)
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
 
 TRAMP_KVM_SKIP(PACA_EXSLB, 0x380)
 
 EXC_COMMON_BEGIN(data_access_slb_common)
-	mfspr	r10,SPRN_DAR
-	std	r10,PACA_EXSLB+EX_DAR(r13)
 	EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB)
 	ld	r4,PACA_EXSLB+EX_DAR(r13)
 	std	r4,_DAR(r1)
@@ -739,6 +751,10 @@ EXC_REAL_BEGIN(alignment, 0x600, 0x100)
 SET_SCRATCH0(r13)		/* save r13 */
 EXCEPTION_PROLOG_0(PACA_EXGEN)
 EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, 0x600)
+	mfspr	r10,SPRN_DAR
+	mfspr	r11,SPRN_DSISR
+	std	r10,PACA_EXGEN+EX_DAR(r13)
+	stw	r11,PACA_EXGEN+EX_DSISR(r13)
 EXCEPTION_PROLOG_2(alignment_common, EXC_STD)
 EXC_REAL_END(alignment, 0x600, 0x100)
 
@@ -746,15 +762,15 @@ EXC_VIRT_BEGIN(alignment, 0x4600, 0x100)
 SET_SCRATCH0(r13)		/* save r13 */
 EXCEPTION_PROLOG_0(PACA_EXGEN)
 EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0x600)
+	mfspr	r10,SPRN_DAR
+	mfspr	r11,SPRN_DSISR
+	std	r10,PACA_EXGEN+EX_DAR(r13)
+	stw	r11,PACA_EXGEN+EX_DSISR(r13)
 EXCEPTION_PROLOG_2_RELON(alignment_common, EXC_STD)
 EXC_VIRT_END(alignment, 0x4600, 0x100)
 
 TRAMP_KVM(PACA_EXGEN, 0x600)
 EXC_COMMON_BEGIN(alignment_common)
-	mfspr	r10,SPRN_DAR
-	std	r10,PACA_EXGEN+EX_DAR(r13)
-	mfspr	r10,SPRN_DSISR
-	stw	r10,PACA_EXGEN+EX_DSISR(r13)
 	EXCEPTION_PROLOG_COMMON(0x600, PACA_EXGEN)
 	ld	r3,PACA_EXGEN+EX_DAR(r13)
 	lwz	r4,PACA_EXGEN+EX_DSISR(r13)
-- 
2.18.0



* Re: [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs
  2019-02-26  6:08 [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs Nicholas Piggin
                   ` (3 preceding siblings ...)
  2019-02-26  6:09 ` [PATCH v3 4/4] powerpc/64s: Fix " Nicholas Piggin
@ 2019-02-26  6:51 ` Satheesh Rajendran
  4 siblings, 0 replies; 6+ messages in thread
From: Satheesh Rajendran @ 2019-02-26  6:51 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev

On Tue, Feb 26, 2019 at 04:08:57PM +1000, Nicholas Piggin wrote:
> This series fixes several similar but unrelated bugs with NMIs
> clobbering live registers without noticing it, because MSR[RI] is set.
> Pretty rare bugs, but serious silent corruption consequences.
> 
> For the most part these can be observed and tested quite easily
> with the mambo simulator, except that it does not seem to follow
> the architecture wrt leaving MSR[RI] unchanged for HV interrupts.
> Mambo clears MSR[RI], so you have to account for that manually.
> 
> Since v1:
> - Fixed several build bugs.
> 
> Since v2:
> - Improved changelog and comments.
> - Fixed the NIA test for virt mode interrupts.

I hit the crash below on a Power8 box. The patches were applied to the linuxppc merge branch and built with `ppc64le_defconfig`.

UnknownStateTransition: Something happened system state="8" and we transitioned to UNKNOWN state.  Review the following for more details
Message="OpTestSystem in run_IPLing and Exception="Kernel OOPS (machine in state '5'): Oops: Kernel access of bad area, sig: 11 [#1]
[    0.000000] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc7-gf46b87021 #1
[    0.000000] NIP:  c000000000c1306c LR: c000000000c12f64 CTR: c00000000033d860
[    0.000000] REGS: c0000000014878b0 TRAP: 0380   Not tainted  (5.0.0-rc7-gf46b87021)
[    0.000000] MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28002224  XER: 00000000
[    0.000000] CFAR: c000000000c12f7c IRQMASK: 1 
[    0.000000] GPR00: c000000000c12f64 c000000001487b40 c000000001488400 f000000000000000 
[    0.000000] GPR04: c000000001487b18 c000000001487b20 0000000000000000 c000000001388400 
[    0.000000] GPR08: f000000000000000 f000000000000008 0000000000000000 0000000800000000 
[    0.000000] GPR12: c0000000015e1ed0 c000000001670000 0000000000000000 0000000000000000 
[    0.000000] GPR16: 0000000000000000 0000000000000000 c0000000015e0d40 0000000000000001 
[    0.000000] GPR20: ffffffffffffffff ffffffffffffffff 0000000008000000 c000000001413b90 
[    0.000000] GPR24: c000000001413b98 007ffff000000000 0000000000080000 0000000000000000 
[    0.000000] GPR28: 0000000000000000 0000000000000000 007ffff000001000 0000000000000000 
[    0.000000] NIP [c000000000c1306c] memmap_init_zone+0x258/0x308
[    0.000000] LR [c000000000c12f64] memmap_init_zone+0x150/0x308
[    0.000000] Call Trace:
[    0.000000] [c000000001487b40] [c000000000c12f64] memmap_init_zone+0x150/0x308 (unreliable)
[    0.000000] [c000000001487be0] [c000000000f87acc] free_area_init_node+0x480/0x518
[    0.000000] [c000000001487cf0] [c000000000f88630] free_area_init_nodes+0x838/0x940
[    0.000000] [c000000001487e10] [c000000000f6340c] paging_init+0x8c/0xa8
[    0.000000] [c000000001487e80] [c000000000f5bc00] setup_arch+0x3b4/0x3f0
[    0.000000] [c000000001487ef0] [c000000000f53b68] start_kernel+0x94/0x630
[    0.000000] [c000000001487f90] [c00000000000b37c] start_here_common+0x1c/0x520
[    0.000000] Instruction dump:
[    0.000000] 71290002 41820014 ebea0008 7cc6fa14 78df8402 48000070 3d22000c 7bea3664 
[    0.000000] 39299d20 e9090000 7c685214 39230008 <fa290010> fa290018 fa290020 fa290030 
[    0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x80 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] 
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Rebooting in 10 seconds" caused the system to go to UNKNOWN_BAD and the system will be stopping."

Regards,
-Satheesh.
> 
> Nicholas Piggin (4):
>   powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
>   powerpc/64s: system reset interrupt preserve HSRRs
>   powerpc/64s: Prepare to handle data interrupts vs d-side MCE
>     reentrancy
>   powerpc/64s: Fix data interrupts vs d-side MCE reentrancy
> 
>  arch/powerpc/include/asm/asm-prototypes.h |  8 ++
>  arch/powerpc/include/asm/nmi.h            |  2 +
>  arch/powerpc/kernel/exceptions-64s.S      | 92 +++++++++++++++++++----
>  arch/powerpc/kernel/mce.c                 |  3 +
>  arch/powerpc/kernel/traps.c               | 91 +++++++++++++++++++++-
>  5 files changed, 179 insertions(+), 17 deletions(-)
> 
> -- 
> 2.18.0
> 


