* [PATCH v4 00/16] powerpc: machine check and system reset fixes
@ 2020-05-08  4:33 Nicholas Piggin
  2020-05-08  4:33 ` [PATCH v4 01/16] powerpc/64s/exception: Fix machine check no-loss idle wakeup Nicholas Piggin
                   ` (16 more replies)
  0 siblings, 17 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Since v3, I fixed a compile error and made the generic machine check
exception handler an NMI again on 32-bit and 64e, as caught in
Christophe's review.

I also added the last patch, which I found just by reading the code;
a review for 32s would be good.

Thanks,
Nick

Nicholas Piggin (16):
  powerpc/64s/exception: Fix machine check no-loss idle wakeup
  powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable path
  powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing
    _DAR to RESULT
  powerpc/64s/exceptions: machine check reconcile irq state
  powerpc/pseries/ras: avoid calling rtas_token in NMI paths
  powerpc/pseries/ras: FWNMI_VALID off by one
  powerpc/pseries/ras: fwnmi avoid modifying r3 in error case
  powerpc/pseries/ras: fwnmi sreset should not interlock
  powerpc/pseries: limit machine check stack to 4GB
  powerpc/pseries: machine check use rtas_call_unlocked with args on
    stack
  powerpc/64s: machine check interrupt update NMI accounting
  powerpc: implement ftrace_enabled helper
  powerpc/64s: machine check do not trace real-mode handler
  powerpc/traps: system reset do not trace
  powerpc/traps: make unrecoverable NMIs die instead of panic
  powerpc/traps: Machine check fix RI=0 recoverability check

 arch/powerpc/include/asm/firmware.h    |  1 +
 arch/powerpc/include/asm/ftrace.h      | 14 ++++++
 arch/powerpc/kernel/exceptions-64s.S   | 47 +++++++++++++++-----
 arch/powerpc/kernel/mce.c              | 16 ++++++-
 arch/powerpc/kernel/process.c          |  2 +-
 arch/powerpc/kernel/setup_64.c         | 15 +++++--
 arch/powerpc/kernel/traps.c            | 31 ++++++++++---
 arch/powerpc/platforms/pseries/ras.c   | 60 +++++++++++++++++++-------
 arch/powerpc/platforms/pseries/setup.c | 14 ++++--
 9 files changed, 157 insertions(+), 43 deletions(-)

-- 
2.23.0



* [PATCH v4 01/16] powerpc/64s/exception: Fix machine check no-loss idle wakeup
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
@ 2020-05-08  4:33 ` Nicholas Piggin
  2020-05-08  4:33 ` [PATCH v4 02/16] powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable path Nicholas Piggin
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The architecture allows machine check exceptions to cause idle
wakeups, which resume at address 0x200 and have to return via the
idle wakeup code, but the early machine check handler runs first.

The no state-loss sleep case is broken because the early handler
uses the non-volatile register r1, which is needed by the wakeup
protocol, but does not restore it.

Fix this by loading r1 from the MCE exception frame before returning
to the idle wakeup code. Also update the comment which has become
stale since the idle rewrite in C.

Fixes: 10d91611f426d ("powerpc/64s: Reimplement book3s idle code in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

This crash was found, and the fix confirmed, with a machine check
injection test in the QEMU powernv model (not yet upstream in QEMU).
---
 arch/powerpc/kernel/exceptions-64s.S | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 728ccb0f560c..bbf3109c5cba 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1224,17 +1224,19 @@ EXC_COMMON_BEGIN(machine_check_idle_common)
 	bl	machine_check_queue_event
 
 	/*
-	 * We have not used any non-volatile GPRs here, and as a rule
-	 * most exception code including machine check does not.
-	 * Therefore PACA_NAPSTATELOST does not need to be set. Idle
-	 * wakeup will restore volatile registers.
+	 * GPR-loss wakeups are relatively straightforward, because the
+	 * idle sleep code has saved all non-volatile registers on its
+	 * own stack, and r1 in PACAR1.
 	 *
-	 * Load the original SRR1 into r3 for pnv_powersave_wakeup_mce.
+	 * For no-loss wakeups the r1 and lr registers used by the
+	 * early machine check handler have to be restored first. r2 is
+	 * the kernel TOC, so no need to restore it.
 	 *
 	 * Then decrement MCE nesting after finishing with the stack.
 	 */
 	ld	r3,_MSR(r1)
 	ld	r4,_LINK(r1)
+	ld	r1,GPR1(r1)
 
 	lhz	r11,PACA_IN_MCE(r13)
 	subi	r11,r11,1
@@ -1243,7 +1245,7 @@ EXC_COMMON_BEGIN(machine_check_idle_common)
 	mtlr	r4
 	rlwinm	r10,r3,47-31,30,31
 	cmpwi	cr1,r10,2
-	bltlr	cr1	/* no state loss, return to idle caller */
+	bltlr	cr1	/* no state loss, return to idle caller with r3=SRR1 */
 	b	idle_return_gpr_loss
 #endif
 
-- 
2.23.0

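For reference, the wake-state test at the end of machine_check_idle_common
(`rlwinm r10,r3,47-31,30,31`, `cmpwi cr1,r10,2`, `bltlr cr1`) can be
modelled in C roughly as below. This is a sketch, not kernel code: the
function names are hypothetical, and the encoding (1 = no loss,
2 = GPR loss, 3 = hypervisor-state loss) is assumed to match the
SRR1_WS_* definitions in asm/reg.h.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring "rlwinm r10,r3,47-31,30,31": rotate the
 * low word left by 16 and keep the low two bits, i.e. extract SRR1 bits
 * 46:47 (IBM numbering), which are bits 16-17 counting from the LSB.
 */
static unsigned int srr1_wake_state(uint64_t srr1)
{
	return (unsigned int)((srr1 >> 16) & 0x3);
}

/*
 * Hypothetical helper mirroring "cmpwi cr1,r10,2; bltlr cr1": a wake
 * state below 2 means no state was lost, so the handler returns straight
 * to the idle caller (with r1 now restored by this patch). Otherwise it
 * branches to idle_return_gpr_loss.
 */
static bool wakeup_is_no_loss(uint64_t srr1)
{
	return srr1_wake_state(srr1) < 2;
}
```

Only the no-loss case needs r1 reloaded from GPR1(r1); the GPR-loss path
recovers r1 from PACAR1 instead.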


* [PATCH v4 02/16] powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable path
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
  2020-05-08  4:33 ` [PATCH v4 01/16] powerpc/64s/exception: Fix machine check no-loss idle wakeup Nicholas Piggin
@ 2020-05-08  4:33 ` Nicholas Piggin
  2020-05-08  4:33 ` [PATCH v4 03/16] powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing _DAR to RESULT Nicholas Piggin
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Nicholas Piggin

Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index bbf3109c5cba..3322000316ab 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1267,6 +1267,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 	andc	r10,r10,r3
 	mtmsrd	r10
 
+	lhz	r12,PACA_IN_MCE(r13)
+	subi	r12,r12,1
+	sth	r12,PACA_IN_MCE(r13)
+
 	/* Invoke machine_check_exception to print MCE event and panic. */
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	machine_check_exception
-- 
2.23.0



* [PATCH v4 03/16] powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing _DAR to RESULT
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
  2020-05-08  4:33 ` [PATCH v4 01/16] powerpc/64s/exception: Fix machine check no-loss idle wakeup Nicholas Piggin
  2020-05-08  4:33 ` [PATCH v4 02/16] powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable path Nicholas Piggin
@ 2020-05-08  4:33 ` Nicholas Piggin
  2020-05-08  4:33 ` [PATCH v4 04/16] powerpc/64s/exceptions: machine check reconcile irq state Nicholas Piggin
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

A spare interrupt stack slot is needed to save irq state when
reconciling NMIs (sreset and decrementer soft-nmi). _DAR is used
for this, but we want to reconcile machine checks as well, which
do use _DAR. Switch to using RESULT instead, as it's used by
system calls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 3322000316ab..a42b73efb1a9 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -939,13 +939,13 @@ EXC_COMMON_BEGIN(system_reset_common)
 	 * the right thing. We do not want to reconcile because that goes
 	 * through irq tracing which we don't want in NMI.
 	 *
-	 * Save PACAIRQHAPPENED to _DAR (otherwise unused), and set HARD_DIS
+	 * Save PACAIRQHAPPENED to RESULT (otherwise unused), and set HARD_DIS
 	 * as we are running with MSR[EE]=0.
 	 */
 	li	r10,IRQS_ALL_DISABLED
 	stb	r10,PACAIRQSOFTMASK(r13)
 	lbz	r10,PACAIRQHAPPENED(r13)
-	std	r10,_DAR(r1)
+	std	r10,RESULT(r1)
 	ori	r10,r10,PACA_IRQ_HARD_DIS
 	stb	r10,PACAIRQHAPPENED(r13)
 
@@ -966,7 +966,7 @@ EXC_COMMON_BEGIN(system_reset_common)
 	/*
 	 * Restore soft mask settings.
 	 */
-	ld	r10,_DAR(r1)
+	ld	r10,RESULT(r1)
 	stb	r10,PACAIRQHAPPENED(r13)
 	ld	r10,SOFTE(r1)
 	stb	r10,PACAIRQSOFTMASK(r13)
@@ -2743,7 +2743,7 @@ EXC_COMMON_BEGIN(soft_nmi_common)
 	li	r10,IRQS_ALL_DISABLED
 	stb	r10,PACAIRQSOFTMASK(r13)
 	lbz	r10,PACAIRQHAPPENED(r13)
-	std	r10,_DAR(r1)
+	std	r10,RESULT(r1)
 	ori	r10,r10,PACA_IRQ_HARD_DIS
 	stb	r10,PACAIRQHAPPENED(r13)
 
@@ -2757,7 +2757,7 @@ EXC_COMMON_BEGIN(soft_nmi_common)
 	/*
 	 * Restore soft mask settings.
 	 */
-	ld	r10,_DAR(r1)
+	ld	r10,RESULT(r1)
 	stb	r10,PACAIRQHAPPENED(r13)
 	ld	r10,SOFTE(r1)
 	stb	r10,PACAIRQSOFTMASK(r13)
-- 
2.23.0

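The save/restore sequence that this patch moves from _DAR to RESULT can
be sketched in C as follows. The structs and field names are
hypothetical stand-ins for the PACA and the interrupt frame, and the
constant values are illustrative only. The point is that the saved
irq_happened byte needs a frame slot the interrupt itself does not use,
which rules out _DAR once machine checks are reconciled too.

```c
#include <stdint.h>

/* Illustrative values only; the real constants are in asm/hw_irq.h. */
#define IRQS_ALL_DISABLED 0x03
#define PACA_IRQ_HARD_DIS 0x01

/* Hypothetical stand-ins for the PACA fields and interrupt frame slots. */
struct fake_paca {
	uint8_t irq_soft_mask;	/* PACAIRQSOFTMASK */
	uint8_t irq_happened;	/* PACAIRQHAPPENED */
};

struct fake_frame {
	uint64_t softe;		/* SOFTE(r1), saved earlier on entry */
	uint64_t result;	/* RESULT(r1), used here as the spare slot */
};

/* Entry: mask everything, stash irq_happened, record MSR[EE]=0. */
static void nmi_reconcile_enter(struct fake_paca *paca, struct fake_frame *frame)
{
	paca->irq_soft_mask = IRQS_ALL_DISABLED;
	frame->result = paca->irq_happened;
	paca->irq_happened |= PACA_IRQ_HARD_DIS;
}

/* Exit: "Restore soft mask settings" from the two frame slots. */
static void nmi_reconcile_exit(struct fake_paca *paca, const struct fake_frame *frame)
{
	paca->irq_happened = (uint8_t)frame->result;
	paca->irq_soft_mask = (uint8_t)frame->softe;
}
```

As the asm comment notes, this deliberately avoids the full reconcile
path so that no irq tracing runs in NMI context.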


* [PATCH v4 04/16] powerpc/64s/exceptions: machine check reconcile irq state
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (2 preceding siblings ...)
  2020-05-08  4:33 ` [PATCH v4 03/16] powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing _DAR to RESULT Nicholas Piggin
@ 2020-05-08  4:33 ` Nicholas Piggin
  2020-05-08 13:39   ` Michael Ellerman
  2020-05-08  4:33 ` [PATCH v4 05/16] powerpc/pseries/ras: avoid calling rtas_token in NMI paths Nicholas Piggin
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The pseries fwnmi machine check code trips the soft-irq checks in
rtas_call (after the patch that removes rtas_token from this call
path). Rather than play whack-a-mole with these and keep the code
fragile forever, it seems better to have the early machine check
handler perform the same kind of reconcile as the other NMI
interrupts.

  WARNING: CPU: 0 PID: 493 at arch/powerpc/kernel/irq.c:343
  CPU: 0 PID: 493 Comm: a Tainted: G        W
  NIP:  c00000000001ed2c LR: c000000000042c40 CTR: 0000000000000000
  REGS: c0000001fffd38b0 TRAP: 0700   Tainted: G        W
  MSR:  8000000000021003 <SF,ME,RI,LE>  CR: 28000488  XER: 00000000
  CFAR: c00000000001ec90 IRQMASK: 0
  GPR00: c000000000043820 c0000001fffd3b40 c0000000012ba300 0000000000000000
  GPR04: 0000000048000488 0000000000000000 0000000000000000 00000000deadbeef
  GPR08: 0000000000000080 0000000000000000 0000000000000000 0000000000001001
  GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 0000000000000000 0000000000000001 c000000001360810 0000000000000000
  NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
  LR [c000000000042c40] unlock_rtas+0x30/0x90
  Call Trace:
  [c0000001fffd3b40] [c000000001360810] 0xc000000001360810 (unreliable)
  [c0000001fffd3b60] [c000000000043820] rtas_call+0x1c0/0x280
  [c0000001fffd3bb0] [c0000000000dc328] fwnmi_release_errinfo+0x38/0x70
  [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540
  [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70
  [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0
  --- interrupt: 200 at 0x13f1307c8
      LR = 0x7fff888b8528
  Instruction dump:
  60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
  60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index a42b73efb1a9..072772803b7c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1116,11 +1116,30 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 	li	r10,MSR_RI
 	mtmsrd	r10,1
 
+	/*
+	 * Set IRQS_ALL_DISABLED and save PACAIRQHAPPENED (see
+	 * system_reset_common)
+	 */
+	li	r10,IRQS_ALL_DISABLED
+	stb	r10,PACAIRQSOFTMASK(r13)
+	lbz	r10,PACAIRQHAPPENED(r13)
+	std	r10,RESULT(r1)
+	ori	r10,r10,PACA_IRQ_HARD_DIS
+	stb	r10,PACAIRQHAPPENED(r13)
+
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	machine_check_early
 	std	r3,RESULT(r1)	/* Save result */
 	ld	r12,_MSR(r1)
 
+	/*
+	 * Restore soft mask settings.
+	 */
+	ld	r10,RESULT(r1)
+	stb	r10,PACAIRQHAPPENED(r13)
+	ld	r10,SOFTE(r1)
+	stb	r10,PACAIRQSOFTMASK(r13)
+
 #ifdef CONFIG_PPC_P7_NAP
 	/*
 	 * Check if thread was in power saving mode. We come here when any
-- 
2.23.0



* [PATCH v4 05/16] powerpc/pseries/ras: avoid calling rtas_token in NMI paths
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (3 preceding siblings ...)
  2020-05-08  4:33 ` [PATCH v4 04/16] powerpc/64s/exceptions: machine check reconcile irq state Nicholas Piggin
@ 2020-05-08  4:33 ` Nicholas Piggin
  2020-05-08  4:33 ` [PATCH v4 06/16] powerpc/pseries/ras: FWNMI_VALID off by one Nicholas Piggin
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Nicholas Piggin

In the interest of reducing code and possible failures in the
machine check and system reset paths, grab the "ibm,nmi-interlock"
token at init time.

Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/firmware.h    |  1 +
 arch/powerpc/platforms/pseries/ras.c   |  2 +-
 arch/powerpc/platforms/pseries/setup.c | 14 ++++++++++----
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index ca33f4ef6cb4..6003c2e533a0 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -128,6 +128,7 @@ extern void machine_check_fwnmi(void);
 
 /* This is true if we are using the firmware NMI handler (typically LPAR) */
 extern int fwnmi_active;
+extern int ibm_nmi_interlock_token;
 
 extern unsigned int __start___fw_ftr_fixup, __stop___fw_ftr_fixup;
 
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 1d1da639b8b7..ac92f8687ea3 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -458,7 +458,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
  */
 static void fwnmi_release_errinfo(void)
 {
-	int ret = rtas_call(rtas_token("ibm,nmi-interlock"), 0, 1, NULL);
+	int ret = rtas_call(ibm_nmi_interlock_token, 0, 1, NULL);
 	if (ret != 0)
 		printk(KERN_ERR "FWNMI: nmi-interlock failed: %d\n", ret);
 }
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 0c8421dd01ab..dd234095ae4f 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -83,6 +83,7 @@ unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
 EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
+int ibm_nmi_interlock_token;
 
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
@@ -113,9 +114,14 @@ static void __init fwnmi_init(void)
 	struct slb_entry *slb_ptr;
 	size_t size;
 #endif
+	int ibm_nmi_register_token;
 
-	int ibm_nmi_register = rtas_token("ibm,nmi-register");
-	if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
+	ibm_nmi_register_token = rtas_token("ibm,nmi-register");
+	if (ibm_nmi_register_token == RTAS_UNKNOWN_SERVICE)
+		return;
+
+	ibm_nmi_interlock_token = rtas_token("ibm,nmi-interlock");
+	if (WARN_ON(ibm_nmi_interlock_token == RTAS_UNKNOWN_SERVICE))
 		return;
 
 	/* If the kernel's not linked at zero we point the firmware at low
@@ -123,8 +129,8 @@ static void __init fwnmi_init(void)
 	system_reset_addr  = __pa(system_reset_fwnmi) - PHYSICAL_START;
 	machine_check_addr = __pa(machine_check_fwnmi) - PHYSICAL_START;
 
-	if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
-				machine_check_addr))
+	if (0 == rtas_call(ibm_nmi_register_token, 2, 1, NULL,
+			   system_reset_addr, machine_check_addr))
 		fwnmi_active = 1;
 
 	/*
-- 
2.23.0

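The pattern here, resolving an RTAS token by name once at boot and
reading only a cached integer in the NMI path, can be sketched as
follows. fake_rtas_token() and its token numbers are hypothetical
stand-ins for rtas_token(); RTAS_UNKNOWN_SERVICE is assumed to be -1 as
in the kernel's asm/rtas.h.

```c
#include <stdbool.h>
#include <string.h>

#define RTAS_UNKNOWN_SERVICE (-1)	/* assumed to match asm/rtas.h */

/* Hypothetical lookup standing in for rtas_token(); tokens are made up. */
static int fake_rtas_token(const char *service)
{
	if (strcmp(service, "ibm,nmi-register") == 0)
		return 0x2a;
	if (strcmp(service, "ibm,nmi-interlock") == 0)
		return 0x2b;
	return RTAS_UNKNOWN_SERVICE;
}

static int cached_interlock_token;	/* filled once at boot */

/* Boot-time setup, shaped like fwnmi_init(): true if FWNMI can be used. */
static bool fake_fwnmi_init(void)
{
	if (fake_rtas_token("ibm,nmi-register") == RTAS_UNKNOWN_SERVICE)
		return false;

	cached_interlock_token = fake_rtas_token("ibm,nmi-interlock");
	if (cached_interlock_token == RTAS_UNKNOWN_SERVICE)
		return false;	/* the patch WARN_ON()s in this case */

	return true;
}

/* NMI path: no name lookup, just a read of the cached integer. */
static int nmi_interlock_token(void)
{
	return cached_interlock_token;
}
```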


* [PATCH v4 06/16] powerpc/pseries/ras: FWNMI_VALID off by one
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (4 preceding siblings ...)
  2020-05-08  4:33 ` [PATCH v4 05/16] powerpc/pseries/ras: avoid calling rtas_token in NMI paths Nicholas Piggin
@ 2020-05-08  4:33 ` Nicholas Piggin
  2020-05-08  4:33 ` [PATCH v4 07/16] powerpc/pseries/ras: fwnmi avoid modifying r3 in error case Nicholas Piggin
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Nicholas Piggin

This was discovered while developing QEMU fwnmi sreset support. The
off-by-one bug means the last 16 bytes of the RTAS area cannot be
used for a 16-byte save area.

It's not a serious bug, and the QEMU implementation has to retain a
workaround for old kernels, but it's good to tighten it.

Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/platforms/pseries/ras.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index ac92f8687ea3..a5bd0f747bb1 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -395,10 +395,11 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
 /*
  * Some versions of FWNMI place the buffer inside the 4kB page starting at
  * 0x7000. Other versions place it inside the rtas buffer. We check both.
+ * Minimum size of the buffer is 16 bytes.
  */
 #define VALID_FWNMI_BUFFER(A) \
-	((((A) >= 0x7000) && ((A) < 0x7ff0)) || \
-	(((A) >= rtas.base) && ((A) < (rtas.base + rtas.size - 16))))
+	((((A) >= 0x7000) && ((A) <= 0x8000 - 16)) || \
+	(((A) >= rtas.base) && ((A) <= (rtas.base + rtas.size - 16))))
 
 static inline struct rtas_error_log *fwnmi_get_errlog(void)
 {
-- 
2.23.0

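To see the off-by-one concretely, the old and new forms of
VALID_FWNMI_BUFFER can be compared on the last 16-byte slot of each
region. The RTAS base and size below are made-up values for
illustration; on a real system they come from firmware.

```c
#include <stdbool.h>

/* Hypothetical RTAS area; real values come from the device tree. */
static const unsigned long rtas_base = 0x1000000UL;
static const unsigned long rtas_size = 0x100UL;

/* Old check: the strict "<" rejects the last 16-byte slot. */
static bool valid_old(unsigned long a)
{
	return (a >= 0x7000 && a < 0x7ff0) ||
	       (a >= rtas_base && a < rtas_base + rtas_size - 16);
}

/* New check: a 16-byte buffer may end exactly at the region's end. */
static bool valid_new(unsigned long a)
{
	return (a >= 0x7000 && a <= 0x8000 - 16) ||
	       (a >= rtas_base && a <= rtas_base + rtas_size - 16);
}
```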


* [PATCH v4 07/16] powerpc/pseries/ras: fwnmi avoid modifying r3 in error case
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (5 preceding siblings ...)
  2020-05-08  4:33 ` [PATCH v4 06/16] powerpc/pseries/ras: FWNMI_VALID off by one Nicholas Piggin
@ 2020-05-08  4:33 ` Nicholas Piggin
  2020-05-08  4:34 ` [PATCH v4 08/16] powerpc/pseries/ras: fwnmi sreset should not interlock Nicholas Piggin
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

If there is an error with the fwnmi save area, r3 has already been
modified, which doesn't help with debugging.

Only update r3 when restoring the saved value.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/platforms/pseries/ras.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index a5bd0f747bb1..fe14186a8cef 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -423,18 +423,19 @@ static inline struct rtas_error_log *fwnmi_get_errlog(void)
  */
 static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 {
+	unsigned long savep_ra;
 	unsigned long *savep;
 	struct rtas_error_log *h;
 
 	/* Mask top two bits */
-	regs->gpr[3] &= ~(0x3UL << 62);
+	savep_ra = regs->gpr[3] & ~(0x3UL << 62);
 
-	if (!VALID_FWNMI_BUFFER(regs->gpr[3])) {
+	if (!VALID_FWNMI_BUFFER(savep_ra)) {
 		printk(KERN_ERR "FWNMI: corrupt r3 0x%016lx\n", regs->gpr[3]);
 		return NULL;
 	}
 
-	savep = __va(regs->gpr[3]);
+	savep = __va(savep_ra);
 	regs->gpr[3] = be64_to_cpu(savep[0]);	/* restore original r3 */
 
 	h = (struct rtas_error_log *)&savep[1];
-- 
2.23.0

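The fix amounts to computing the save-area real address into a local
instead of masking regs->gpr[3] in place, so the "corrupt r3" message
and any later debugging still see the value firmware passed in. A
minimal sketch with a hypothetical register struct:

```c
#include <stdint.h>

/* Hypothetical stand-in for struct pt_regs. */
struct fake_regs {
	uint64_t gpr[32];
};

/*
 * Mirrors "savep_ra = regs->gpr[3] & ~(0x3UL << 62)": clear the top two
 * bits to get a real address, without modifying the saved r3.
 */
static uint64_t fwnmi_savep_ra(const struct fake_regs *regs)
{
	return regs->gpr[3] & ~(UINT64_C(0x3) << 62);
}
```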


* [PATCH v4 08/16] powerpc/pseries/ras: fwnmi sreset should not interlock
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (6 preceding siblings ...)
  2020-05-08  4:33 ` [PATCH v4 07/16] powerpc/pseries/ras: fwnmi avoid modifying r3 in error case Nicholas Piggin
@ 2020-05-08  4:34 ` Nicholas Piggin
  2020-05-08  4:34 ` [PATCH v4 09/16] powerpc/pseries: limit machine check stack to 4GB Nicholas Piggin
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

PAPR does not specify that fwnmi sreset should be interlocked, and
PowerVM (and therefore now QEMU) do not require it.

These "ibm,nmi-interlock" calls are ignored by firmware, but there
is a possibility that the sreset could have interrupted a machine
check and released the machine check's interlock too early, corrupting
it if another machine check came in.

This is an extremely rare case, but it should be fixed for clarity
and to reduce the code executed in the sreset path. Firmware also
does not provide error information for the sreset case to look at, so
remove the XXX comment about looking at it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/platforms/pseries/ras.c | 46 +++++++++++++++++++---------
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index fe14186a8cef..b2adba59f0ff 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -406,6 +406,20 @@ static inline struct rtas_error_log *fwnmi_get_errlog(void)
 	return (struct rtas_error_log *)local_paca->mce_data_buf;
 }
 
+static unsigned long *fwnmi_get_savep(struct pt_regs *regs)
+{
+	unsigned long savep_ra;
+
+	/* Mask top two bits */
+	savep_ra = regs->gpr[3] & ~(0x3UL << 62);
+	if (!VALID_FWNMI_BUFFER(savep_ra)) {
+		printk(KERN_ERR "FWNMI: corrupt r3 0x%016lx\n", regs->gpr[3]);
+		return NULL;
+	}
+
+	return __va(savep_ra);
+}
+
 /*
  * Get the error information for errors coming through the
  * FWNMI vectors.  The pt_regs' r3 will be updated to reflect
@@ -423,20 +437,14 @@ static inline struct rtas_error_log *fwnmi_get_errlog(void)
  */
 static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 {
-	unsigned long savep_ra;
 	unsigned long *savep;
 	struct rtas_error_log *h;
 
-	/* Mask top two bits */
-	savep_ra = regs->gpr[3] & ~(0x3UL << 62);
-
-	if (!VALID_FWNMI_BUFFER(savep_ra)) {
-		printk(KERN_ERR "FWNMI: corrupt r3 0x%016lx\n", regs->gpr[3]);
+	savep = fwnmi_get_savep(regs);
+	if (!savep)
 		return NULL;
-	}
 
-	savep = __va(savep_ra);
-	regs->gpr[3] = be64_to_cpu(savep[0]);	/* restore original r3 */
+	regs->gpr[3] = be64_to_cpu(savep[0]); /* restore original r3 */
 
 	h = (struct rtas_error_log *)&savep[1];
 	/* Use the per cpu buffer from paca to store rtas error log */
@@ -483,11 +491,21 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 #endif
 
 	if (fwnmi_active) {
-		struct rtas_error_log *errhdr = fwnmi_get_errinfo(regs);
-		if (errhdr) {
-			/* XXX Should look at FWNMI information */
-		}
-		fwnmi_release_errinfo();
+		unsigned long *savep;
+
+		/*
+		 * Firmware (PowerVM and KVM) saves r3 to a save area like
+		 * machine check, which is not exactly what PAPR (2.9)
+		 * suggests but there is no way to detect otherwise, so this
+		 * is the interface now.
+		 *
+		 * System resets do not save any error log or require an
+		 * "ibm,nmi-interlock" rtas call to release.
+		 */
+
+		savep = fwnmi_get_savep(regs);
+		if (savep)
+			regs->gpr[3] = be64_to_cpu(savep[0]); /* restore original r3 */
 	}
 
 	if (smp_handle_nmi_ipi(regs))
-- 
2.23.0

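The race described above can be modelled with a single flag standing
in for the firmware interlock. Everything here is hypothetical and
single-threaded; it only shows that the old sreset path clears an
interlock that an interrupted machine check still owns, while the new
path leaves it alone.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the firmware interlock: held from the time
 * firmware delivers a machine check until the OS calls
 * "ibm,nmi-interlock".
 */
static bool fw_interlock_held;

static void fake_nmi_interlock(void)
{
	fw_interlock_held = false;
}

/* Old sreset path: always released the interlock. */
static void sreset_old(void)
{
	fake_nmi_interlock();
}

/*
 * New sreset path: only restore the original r3 from the save area.
 * The kernel converts with be64_to_cpu(); byte order is ignored here.
 */
static void sreset_new(uint64_t *gpr3, const uint64_t *savep)
{
	if (savep)
		*gpr3 = savep[0];
}
```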


* [PATCH v4 09/16] powerpc/pseries: limit machine check stack to 4GB
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (7 preceding siblings ...)
  2020-05-08  4:34 ` [PATCH v4 08/16] powerpc/pseries/ras: fwnmi sreset should not interlock Nicholas Piggin
@ 2020-05-08  4:34 ` Nicholas Piggin
  2020-05-08  4:34 ` [PATCH v4 10/16] powerpc/pseries: machine check use rtas_call_unlocked with args on stack Nicholas Piggin
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Nicholas Piggin

This allows rtas_args to be put on the machine check stack, which
avoids a lot of complications with re-entrancy deadlocks.

Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/setup_64.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 8105010b0e76..bb47555d48a2 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -711,7 +711,7 @@ void __init exc_lvl_early_init(void)
  */
 void __init emergency_stack_init(void)
 {
-	u64 limit;
+	u64 limit, mce_limit;
 	unsigned int i;
 
 	/*
@@ -728,7 +728,16 @@ void __init emergency_stack_init(void)
 	 * initialized in kernel/irq.c. These are initialized here in order
 	 * to have emergency stacks available as early as possible.
 	 */
-	limit = min(ppc64_bolted_size(), ppc64_rma_size);
+	limit = mce_limit = min(ppc64_bolted_size(), ppc64_rma_size);
+
+	/*
+	 * Machine check on pseries calls rtas, but can't use the static
+	 * rtas_args due to a machine check hitting while the lock is held.
+	 * rtas args have to be under 4GB, so the machine check stack is
+	 * limited to 4GB so args can be put on stack.
+	 */
+	if (firmware_has_feature(FW_FEATURE_LPAR) && mce_limit > SZ_4G)
+		mce_limit = SZ_4G;
 
 	for_each_possible_cpu(i) {
 		paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;
@@ -738,7 +747,7 @@ void __init emergency_stack_init(void)
 		paca_ptrs[i]->nmi_emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;
 
 		/* emergency stack for machine check exception handling. */
-		paca_ptrs[i]->mc_emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;
+		paca_ptrs[i]->mc_emergency_sp = alloc_stack(mce_limit, i) + THREAD_SIZE;
 #endif
 	}
 }
-- 
2.23.0

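The resulting limit computation can be sketched as a pure function. The
parameters are hypothetical stand-ins for ppc64_bolted_size(),
ppc64_rma_size and firmware_has_feature(FW_FEATURE_LPAR); only the
machine check stack gets the 4GB cap, while the other emergency stacks
keep using limit.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_4G (UINT64_C(1) << 32)

/* Mirrors the mce_limit computation in emergency_stack_init(). */
static uint64_t mce_stack_limit(uint64_t bolted_size, uint64_t rma_size,
				bool is_lpar)
{
	uint64_t limit = bolted_size < rma_size ? bolted_size : rma_size;

	/* rtas_args placed on this stack must sit below 4GB on pseries. */
	if (is_lpar && limit > SZ_4G)
		limit = SZ_4G;
	return limit;
}
```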


* [PATCH v4 10/16] powerpc/pseries: machine check use rtas_call_unlocked with args on stack
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (8 preceding siblings ...)
  2020-05-08  4:34 ` [PATCH v4 09/16] powerpc/pseries: limit machine check stack to 4GB Nicholas Piggin
@ 2020-05-08  4:34 ` Nicholas Piggin
  2020-05-08  4:34 ` [PATCH v4 11/16] powerpc/64s: machine check interrupt update NMI accounting Nicholas Piggin
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

With the previous patch, machine checks can use rtas_call_unlocked,
which avoids the rtas spinlock that would deadlock if a machine
check hits while an rtas call is in progress.

This also avoids the complex rtas error logging, which makes more rtas
calls and uses kmalloc (which can return memory beyond the RMA, which
would also crash).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/platforms/pseries/ras.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index b2adba59f0ff..ce1665e58d9b 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -468,7 +468,15 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
  */
 static void fwnmi_release_errinfo(void)
 {
-	int ret = rtas_call(ibm_nmi_interlock_token, 0, 1, NULL);
+	struct rtas_args rtas_args;
+	int ret;
+
+	/*
+	 * On pseries, the machine check stack is limited to under 4GB, so
+	 * args can be on-stack.
+	 */
+	rtas_call_unlocked(&rtas_args, ibm_nmi_interlock_token, 0, 1, NULL);
+	ret = be32_to_cpu(rtas_args.rets[0]);
 	if (ret != 0)
 		printk(KERN_ERR "FWNMI: nmi-interlock failed: %d\n", ret);
 }
-- 
2.23.0

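The deadlock that rtas_call_unlocked() sidesteps can be illustrated
with a toy model: a shared args buffer guarded by a non-recursive lock
(as rtas_call() uses) versus args owned by the caller's stack frame.
All names are hypothetical; the bounded retry loop exists only so the
sketch terminates where a real spinlock would spin forever.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down args with a single return slot. */
struct fake_args {
	int32_t rets[1];
};

static bool global_lock_held;
static struct fake_args global_args;

/* Hypothetical firmware call that always reports success. */
static void fake_firmware(struct fake_args *a)
{
	a->rets[0] = 0;
}

/*
 * Locked path: if the interrupted context already holds the lock, a real
 * spinlock never succeeds; this sketch gives up and returns -1.
 */
static int call_locked(void)
{
	for (int tries = 0; tries < 1000; tries++) {
		if (!global_lock_held) {
			global_lock_held = true;
			fake_firmware(&global_args);
			int ret = global_args.rets[0];
			global_lock_held = false;
			return ret;
		}
	}
	return -1;
}

/* Unlocked path: args live on this call's stack, so no lock is needed. */
static int call_unlocked(void)
{
	struct fake_args args;

	fake_firmware(&args);
	return args.rets[0];
}
```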


* [PATCH v4 11/16] powerpc/64s: machine check interrupt update NMI accounting
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (9 preceding siblings ...)
  2020-05-08  4:34 ` [PATCH v4 10/16] powerpc/pseries: machine check use rtas_call_unlocked with args on stack Nicholas Piggin
@ 2020-05-08  4:34 ` Nicholas Piggin
  2020-05-09  3:13     ` kbuild test robot
  2020-05-08  4:34 ` [PATCH v4 12/16] powerpc: implement ftrace_enabled helper Nicholas Piggin
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

machine_check_early is taken as an NMI, so nmi_enter is used there.
machine_check_exception is no longer taken as an NMI on Book3S-64
(it's invoked via irq_work when a machine check hits in kernel mode),
so remove the nmi_enter from that case.

In NMI context, hash faults don't try to refill the hash table, which
can lead to crashes accessing non-pinned kernel pages. System reset
still has this potential problem.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/mce.c     |  7 +++++++
 arch/powerpc/kernel/process.c |  2 +-
 arch/powerpc/kernel/traps.c   | 14 +++++++++++++-
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 8077b5fb18a7..be7e3f92a7b5 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -574,6 +574,9 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info);
 long machine_check_early(struct pt_regs *regs)
 {
 	long handled = 0;
+	bool nested = in_nmi();
+	if (!nested)
+		nmi_enter();
 
 	hv_nmi_check_nonrecoverable(regs);
 
@@ -582,6 +585,10 @@ long machine_check_early(struct pt_regs *regs)
 	 */
 	if (ppc_md.machine_check_early)
 		handled = ppc_md.machine_check_early(regs);
+
+	if (!nested)
+		nmi_exit();
+
 	return handled;
 }
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9c21288f8645..44410dd3029f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1421,7 +1421,7 @@ void show_regs(struct pt_regs * regs)
 		pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, regs->dsisr);
 #endif
 #ifdef CONFIG_PPC64
-	pr_cont("IRQMASK: %lx ", regs->softe);
+	pr_cont("IRQMASK: %lx IN_NMI:%d IN_MCE:%d", regs->softe, (int)get_paca()->in_nmi, (int)get_paca()->in_mce);
 #endif
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	if (MSR_TM_ACTIVE(regs->msr))
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 3fca22276bb1..9f6852322e59 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -823,7 +823,19 @@ int machine_check_generic(struct pt_regs *regs)
 void machine_check_exception(struct pt_regs *regs)
 {
 	int recover = 0;
-	bool nested = in_nmi();
+	bool nested;
+
+	/*
+	 * BOOK3S_64 does not call this handler as a non-maskable interrupt
+	 * (it uses its own early real-mode handler to handle the MCE proper
+	 * and then raises irq_work to call this handler when interrupts are
+	 * enabled). Set nested = true for this case, which just makes it avoid
+	 * the nmi_enter/exit.
+	 */
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) || in_nmi())
+		nested = true;
+	else
+		nested = false;
 	if (!nested)
 		nmi_enter();
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
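
The nesting guard that this patch adds to machine_check_early(), and the Book3S-64 special case in machine_check_exception(), can be modelled in plain C. This is a minimal user-space sketch, assuming a simple counter in place of the kernel's NMI accounting: mock_in_nmi(), mock_nmi_enter() and mock_nmi_exit() are hypothetical stand-ins for in_nmi(), nmi_enter() and nmi_exit(), and the book3s_64 argument stands in for IS_ENABLED(CONFIG_PPC_BOOK3S_64).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU NMI nesting state. */
static int mock_in_nmi_count;

static bool mock_in_nmi(void) { return mock_in_nmi_count > 0; }
static void mock_nmi_enter(void) { mock_in_nmi_count++; }
static void mock_nmi_exit(void)
{
	assert(mock_in_nmi_count > 0);	/* an unbalanced exit would be a bug */
	mock_in_nmi_count--;
}

/* Shape of machine_check_early() after the patch: only the outermost
 * entry calls enter/exit. Returns the nesting depth seen inside. */
static int early_handler(void)
{
	bool nested = mock_in_nmi();
	int depth_inside;

	if (!nested)
		mock_nmi_enter();
	depth_inside = mock_in_nmi_count;
	/* ... real-mode machine check handling would go here ... */
	if (!nested)
		mock_nmi_exit();
	return depth_inside;
}

/* Shape of machine_check_exception(): on Book3S-64 it runs from irq_work,
 * not as an NMI, so it treats itself as "nested" and skips enter/exit. */
static int generic_handler(bool book3s_64)
{
	bool nested = book3s_64 || mock_in_nmi();
	int depth_inside;

	if (!nested)
		mock_nmi_enter();
	depth_inside = mock_in_nmi_count;
	if (!nested)
		mock_nmi_exit();
	return depth_inside;
}
```

Only the outermost handler adjusts the count, so a machine check taken while already in NMI context leaves the accounting balanced.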

* [PATCH v4 12/16] powerpc: implement ftrace_enabled helper
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (10 preceding siblings ...)
  2020-05-08  4:34 ` [PATCH v4 11/16] powerpc/64s: machine check interrupt update NMI accounting Nicholas Piggin
@ 2020-05-08  4:34 ` Nicholas Piggin
  2020-05-08  4:34 ` [PATCH v4 13/16] powerpc/64s: machine check do not trace real-mode handler Nicholas Piggin
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/ftrace.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index f54a08a2cd70..bc76970b6ee5 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -108,9 +108,23 @@ static inline void this_cpu_enable_ftrace(void)
 {
 	get_paca()->ftrace_enabled = 1;
 }
+
+/* Disable ftrace on this CPU if possible (may not be implemented) */
+static inline void this_cpu_set_ftrace_enabled(u8 ftrace_enabled)
+{
+	get_paca()->ftrace_enabled = ftrace_enabled;
+}
+
+static inline u8 this_cpu_get_ftrace_enabled(void)
+{
+	return get_paca()->ftrace_enabled;
+}
+
 #else /* CONFIG_PPC64 */
 static inline void this_cpu_disable_ftrace(void) { }
 static inline void this_cpu_enable_ftrace(void) { }
+static inline void this_cpu_set_ftrace_enabled(u8 ftrace_enabled) { }
+static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
 #endif /* CONFIG_PPC64 */
 #endif /* !__ASSEMBLY__ */
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
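
The new helpers let a caller save the current per-CPU flag and later restore it exactly, rather than forcing it back to 1. Below is a minimal sketch, assuming a single mock paca in place of get_paca(). MOCK_PPC64, struct mock_paca and get_mock_paca() are hypothetical. Setting MOCK_PPC64 to 0 selects the stubs, where the getter reports tracing as enabled and the setter does nothing.

```c
#include <assert.h>	/* assert() is used by the usage checks */
#include <stdint.h>

typedef uint8_t u8;

#define MOCK_PPC64 1	/* hypothetical switch standing in for CONFIG_PPC64 */

/* Hypothetical model of the one paca field these helpers touch. */
struct mock_paca {
	u8 ftrace_enabled;
};

static struct mock_paca cpu_paca = { .ftrace_enabled = 1 };
static struct mock_paca *get_mock_paca(void) { return &cpu_paca; }

#if MOCK_PPC64
static inline void this_cpu_set_ftrace_enabled(u8 ftrace_enabled)
{
	get_mock_paca()->ftrace_enabled = ftrace_enabled;
}

static inline u8 this_cpu_get_ftrace_enabled(void)
{
	return get_mock_paca()->ftrace_enabled;
}
#else
/* No paca: setting is a no-op and tracing always reads as enabled. */
static inline void this_cpu_set_ftrace_enabled(u8 ftrace_enabled) { (void)ftrace_enabled; }
static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
#endif
```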

* [PATCH v4 13/16] powerpc/64s: machine check do not trace real-mode handler
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (11 preceding siblings ...)
  2020-05-08  4:34 ` [PATCH v4 12/16] powerpc: implement ftrace_enabled helper Nicholas Piggin
@ 2020-05-08  4:34 ` Nicholas Piggin
  2020-05-08  4:34 ` [PATCH v4 14/16] powerpc/traps: system reset do not trace Nicholas Piggin
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Naveen N . Rao, Nicholas Piggin

Rather than adding notrace annotations throughout a significant part of
the machine check code across kernel/, pseries/ and powernv/, which can
easily be broken and is infrequently tested, use paca->ftrace_enabled
to blanket-disable tracing of the real-mode non-maskable handler.

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/mce.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index be7e3f92a7b5..fd90c0eda229 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -16,6 +16,7 @@
 #include <linux/export.h>
 #include <linux/irq_work.h>
 #include <linux/extable.h>
+#include <linux/ftrace.h>
 
 #include <asm/machdep.h>
 #include <asm/mce.h>
@@ -571,10 +572,14 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info);
  *
  * regs->nip and regs->msr contains srr0 and ssr1.
  */
-long machine_check_early(struct pt_regs *regs)
+long notrace machine_check_early(struct pt_regs *regs)
 {
 	long handled = 0;
 	bool nested = in_nmi();
+	u8 ftrace_enabled = this_cpu_get_ftrace_enabled();
+
+	this_cpu_set_ftrace_enabled(0);
+
 	if (!nested)
 		nmi_enter();
 
@@ -589,6 +594,8 @@ long machine_check_early(struct pt_regs *regs)
 	if (!nested)
 		nmi_exit();
 
+	this_cpu_set_ftrace_enabled(ftrace_enabled);
+
 	return handled;
 }
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
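
The save/disable/restore sequence added to machine_check_early() matters when handlers nest. Restoring the saved value, instead of unconditionally re-enabling tracing, keeps tracing off for an outer handler after an inner one returns. This sketch assumes that a plain global models paca->ftrace_enabled and that recursion models a nested machine check. All names are hypothetical. The patch also marks the handler itself notrace, presumably because its own entry would otherwise be traced before the flag is cleared.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* Hypothetical per-CPU flag standing in for paca->ftrace_enabled. */
static u8 cpu_ftrace_enabled = 1;

static u8 this_cpu_get_ftrace_enabled(void) { return cpu_ftrace_enabled; }
static void this_cpu_set_ftrace_enabled(u8 v) { cpu_ftrace_enabled = v; }

/* Shape of machine_check_early() after the patch: save, disable, restore.
 * 'nest' simulates further machine checks arriving inside the handler. */
static void early_handler(int nest)
{
	u8 saved = this_cpu_get_ftrace_enabled();

	this_cpu_set_ftrace_enabled(0);

	if (nest > 0)
		early_handler(nest - 1);

	/* Still off here, even after a nested handler restored its
	 * saved value (which was 0). */
	assert(this_cpu_get_ftrace_enabled() == 0);

	this_cpu_set_ftrace_enabled(saved);
}
```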

* [PATCH v4 14/16] powerpc/traps: system reset do not trace
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (12 preceding siblings ...)
  2020-05-08  4:34 ` [PATCH v4 13/16] powerpc/64s: machine check do not trace real-mode handler Nicholas Piggin
@ 2020-05-08  4:34 ` Nicholas Piggin
  2020-05-08  4:34 ` [PATCH v4 15/16] powerpc/traps: make unrecoverable NMIs die instead of panic Nicholas Piggin
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Naveen N . Rao, Nicholas Piggin

Similarly to the previous patch, do not trace system reset. This code
is used when there is a crash or hang. Tracing disturbs the system
further and has been known to crash in the crash handling path.

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/traps.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 9f6852322e59..ee209c5a1ad7 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -443,6 +443,9 @@ void system_reset_exception(struct pt_regs *regs)
 	unsigned long hsrr0, hsrr1;
 	bool nested = in_nmi();
 	bool saved_hsrrs = false;
+	u8 ftrace_enabled = this_cpu_get_ftrace_enabled();
+
+	this_cpu_set_ftrace_enabled(0);
 
 	/*
 	 * Avoid crashes in case of nested NMI exceptions. Recoverability
@@ -524,6 +527,8 @@ void system_reset_exception(struct pt_regs *regs)
 	if (!nested)
 		nmi_exit();
 
+	this_cpu_set_ftrace_enabled(ftrace_enabled);
+
 	/* What should we do here? We could issue a shutdown or hard reset. */
 }
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
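
The per-CPU flag only helps if the tracing entry path consults it. As understood here, the powerpc64 ftrace entry code checks paca->ftrace_enabled before calling into the tracer. The sketch below models that check with a hypothetical mock_trace_hook() and shows that calls made from inside the system reset handler are not recorded. All names are illustrative, not kernel APIs.

```c
#include <assert.h>	/* assert() is used by the usage checks */
#include <stdint.h>

typedef uint8_t u8;

static u8 cpu_ftrace_enabled = 1;	/* stand-in for paca->ftrace_enabled */
static int trace_events;		/* calls recorded by the mock tracer */

/* Hypothetical tracer entry hook: record the call only if tracing is
 * enabled on this CPU. */
static void mock_trace_hook(void)
{
	if (!cpu_ftrace_enabled)
		return;
	trace_events++;
}

/* A helper that would be instrumented in a traced build. */
static void instrumented_helper(void)
{
	mock_trace_hook();
}

/* Shape of system_reset_exception() after the patch. */
static void system_reset_handler(void)
{
	u8 saved = cpu_ftrace_enabled;

	cpu_ftrace_enabled = 0;
	instrumented_helper();	/* not recorded: tracing is off */
	cpu_ftrace_enabled = saved;
}
```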

* [PATCH v4 15/16] powerpc/traps: make unrecoverable NMIs die instead of panic
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (13 preceding siblings ...)
  2020-05-08  4:34 ` [PATCH v4 14/16] powerpc/traps: system reset do not trace Nicholas Piggin
@ 2020-05-08  4:34 ` Nicholas Piggin
  2020-05-08  4:34 ` [PATCH v4 16/16] powerpc/traps: Machine check fix RI=0 recoverability check Nicholas Piggin
  2020-05-20 11:00 ` [PATCH v4 00/16] powerpc: machine check and system reset fixes Michael Ellerman
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

System Reset and Machine Check interrupts that are not recoverable,
because they are nested or interrupted a context with RI=0, currently
panic. This is not necessary: the kernel can often just kill the
current context and recover.

Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/traps.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ee209c5a1ad7..477befcda8d3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -513,11 +513,11 @@ void system_reset_exception(struct pt_regs *regs)
 #ifdef CONFIG_PPC_BOOK3S_64
 	BUG_ON(get_paca()->in_nmi == 0);
 	if (get_paca()->in_nmi > 1)
-		nmi_panic(regs, "Unrecoverable nested System Reset");
+		die("Unrecoverable nested System Reset", regs, SIGABRT);
 #endif
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
-		nmi_panic(regs, "Unrecoverable System Reset");
+		die("Unrecoverable System Reset", regs, SIGABRT);
 
 	if (saved_hsrrs) {
 		mtspr(SPRN_HSRR0, hsrr0);
@@ -875,7 +875,7 @@ void machine_check_exception(struct pt_regs *regs)
 
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
-		nmi_panic(regs, "Unrecoverable Machine check");
+		die("Unrecoverable Machine check", regs, SIGBUS);
 
 	return;
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
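
After this patch, both unrecoverable cases in system_reset_exception() call die() with SIGABRT instead of nmi_panic(). The machine check path does the same with SIGBUS. The decision itself can be sketched as follows, assuming MSR[RI] is the 0x2 bit of the MSR. The MSR value used in the checks is the one decoded as <SF,ME,RI,LE> in the oops log quoted in the reply to patch 04 below. system_reset_action() and enum nmi_action are hypothetical, and the sketch ignores that the real nesting check is only built for CONFIG_PPC_BOOK3S_64.

```c
#include <assert.h>	/* assert() is used by the usage checks */
#include <stdint.h>

/* Assumed value of the MSR[RI] (recoverable interrupt) bit. */
#define MOCK_MSR_RI 0x2ULL

enum nmi_action { NMI_RETURN, NMI_DIE };

/* Mirrors the two checks near the end of system_reset_exception();
 * both now lead to die() rather than nmi_panic(). */
static enum nmi_action system_reset_action(unsigned int in_nmi, uint64_t msr)
{
	if (in_nmi > 1)
		return NMI_DIE;		/* "Unrecoverable nested System Reset" */
	if (!(msr & MOCK_MSR_RI))
		return NMI_DIE;		/* "Unrecoverable System Reset" */
	return NMI_RETURN;
}
```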

* [PATCH v4 16/16] powerpc/traps: Machine check fix RI=0 recoverability check
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (14 preceding siblings ...)
  2020-05-08  4:34 ` [PATCH v4 15/16] powerpc/traps: make unrecoverable NMIs die instead of panic Nicholas Piggin
@ 2020-05-08  4:34 ` Nicholas Piggin
  2020-05-20 11:00 ` [PATCH v4 00/16] powerpc: machine check and system reset fixes Michael Ellerman
  16 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-08  4:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The MSR[RI]=0 recoverability check should be done in the recovered
machine check case. Without this, a machine check that hits in an RI=0
region (one that has live SRRs, for example) will cause the interrupted
context to resume with corrupted registers and crash unpredictably.

This does not affect 64s at the moment, because 64s does its own early
handling with an RI check, but it may affect 32s.

Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/traps.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 477befcda8d3..759d8dbf867b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -873,13 +873,13 @@ void machine_check_exception(struct pt_regs *regs)
 
 	die("Machine check", regs, SIGBUS);
 
+	return;
+
+bail:
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
 		die("Unrecoverable Machine check", regs, SIGBUS);
 
-	return;
-
-bail:
 	if (!nested)
 		nmi_exit();
 }
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
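
The fix moves the MSR[RI] check below the bail: label so that it also runs when the handler recovered. Below is a sketch of the two orderings, assuming die() does not return. 'recovered' stands for whatever conditions jump to bail: (those lines are outside this hunk), and the function and enum names are hypothetical.

```c
#include <assert.h>	/* assert() is used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MSR_RI 0x2ULL	/* assumed MSR[RI] bit value */

enum mce_outcome { MCE_RETURNED, MCE_DIED_UNHANDLED, MCE_DIED_UNRECOVERABLE };

/* Pre-fix ordering: the RI check sat after die() and before bail:,
 * so the recovered path never reached it. */
static enum mce_outcome mce_tail_before(bool recovered, uint64_t msr)
{
	if (!recovered)
		return MCE_DIED_UNHANDLED;	/* die("Machine check", ...) */
	(void)msr;	/* goto bail: skips the RI check entirely */
	return MCE_RETURNED;
}

/* Post-fix ordering: the recovered path falls into the RI check. */
static enum mce_outcome mce_tail_fixed(bool recovered, uint64_t msr)
{
	if (!recovered)
		return MCE_DIED_UNHANDLED;	/* die("Machine check", ...) */

	/* bail: */
	if (!(msr & MOCK_MSR_RI))
		return MCE_DIED_UNRECOVERABLE;	/* die("Unrecoverable Machine check", ...) */
	return MCE_RETURNED;
}
```

With RI=0 on the recovered path, the old ordering resumes the interrupted context while the new one dies.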

* Re: [PATCH v4 04/16] powerpc/64s/exceptions: machine check reconcile irq state
  2020-05-08  4:33 ` [PATCH v4 04/16] powerpc/64s/exceptions: machine check reconcile irq state Nicholas Piggin
@ 2020-05-08 13:39   ` Michael Ellerman
  2020-05-09  7:48     ` Nicholas Piggin
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Ellerman @ 2020-05-08 13:39 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin

Nicholas Piggin <npiggin@gmail.com> writes:

> pseries fwnmi machine check code pops the soft-irq checks in rtas_call
> (after the previous patch to remove rtas_token from this call path).
             ^
             I changed this to "next" which I think is what you meant?

cheers

> Rather than play whack a mole with these and forever having fragile
> code, it seems better to have the early machine check handler perform
> the same kind of reconcile as the other NMI interrupts.
>
>   WARNING: CPU: 0 PID: 493 at arch/powerpc/kernel/irq.c:343
>   CPU: 0 PID: 493 Comm: a Tainted: G        W
>   NIP:  c00000000001ed2c LR: c000000000042c40 CTR: 0000000000000000
>   REGS: c0000001fffd38b0 TRAP: 0700   Tainted: G        W
>   MSR:  8000000000021003 <SF,ME,RI,LE>  CR: 28000488  XER: 00000000
>   CFAR: c00000000001ec90 IRQMASK: 0
>   GPR00: c000000000043820 c0000001fffd3b40 c0000000012ba300 0000000000000000
>   GPR04: 0000000048000488 0000000000000000 0000000000000000 00000000deadbeef
>   GPR08: 0000000000000080 0000000000000000 0000000000000000 0000000000001001
>   GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
>   GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>   GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>   GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>   GPR28: 0000000000000000 0000000000000001 c000000001360810 0000000000000000
>   NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
>   LR [c000000000042c40] unlock_rtas+0x30/0x90
>   Call Trace:
>   [c0000001fffd3b40] [c000000001360810] 0xc000000001360810 (unreliable)
>   [c0000001fffd3b60] [c000000000043820] rtas_call+0x1c0/0x280
>   [c0000001fffd3bb0] [c0000000000dc328] fwnmi_release_errinfo+0x38/0x70
>   [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540
>   [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70
>   [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0
>   --- interrupt: 200 at 0x13f1307c8
>       LR = 0x7fff888b8528
>   Instruction dump:
>   60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
>   60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index a42b73efb1a9..072772803b7c 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1116,11 +1116,30 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>  	li	r10,MSR_RI
>  	mtmsrd	r10,1
>  
> +	/*
> +	 * Set IRQS_ALL_DISABLED and save PACAIRQHAPPENED (see
> +	 * system_reset_common)
> +	 */
> +	li	r10,IRQS_ALL_DISABLED
> +	stb	r10,PACAIRQSOFTMASK(r13)
> +	lbz	r10,PACAIRQHAPPENED(r13)
> +	std	r10,RESULT(r1)
> +	ori	r10,r10,PACA_IRQ_HARD_DIS
> +	stb	r10,PACAIRQHAPPENED(r13)
> +
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
>  	bl	machine_check_early
>  	std	r3,RESULT(r1)	/* Save result */
>  	ld	r12,_MSR(r1)
>  
> +	/*
> +	 * Restore soft mask settings.
> +	 */
> +	ld	r10,RESULT(r1)
> +	stb	r10,PACAIRQHAPPENED(r13)
> +	ld	r10,SOFTE(r1)
> +	stb	r10,PACAIRQSOFTMASK(r13)
> +
>  #ifdef CONFIG_PPC_P7_NAP
>  	/*
>  	 * Check if thread was in power saving mode. We come here when any
> -- 
> 2.23.0

^ permalink raw reply	[flat|nested] 25+ messages in thread
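
The assembly added by patch 04 can be read as the following C. This is a sketch, not kernel code, and it makes several assumptions:

- struct mock_paca models only the PACAIRQSOFTMASK and PACAIRQHAPPENED bytes.
- The constant values are believed to match asm/hw_irq.h.
- recording_handler() is a hypothetical stand-in for machine_check_early().

The saved byte is kept in a local variable. The sketch does not model the RESULT(r1) stack slot, which the surrounding assembly also uses for the handler's return value.

```c
#include <assert.h>	/* assert() is used by the usage checks */
#include <stdint.h>

typedef uint8_t u8;

/* Assumed values; check asm/hw_irq.h for the real definitions. */
#define IRQS_ALL_DISABLED	0x03
#define PACA_IRQ_HARD_DIS	0x01

/* Hypothetical model of the two paca bytes the assembly touches. */
struct mock_paca {
	u8 irq_soft_mask;	/* PACAIRQSOFTMASK */
	u8 irq_happened;	/* PACAIRQHAPPENED */
};

/* C reading of the code around "bl machine_check_early".
 * 'softe' is the soft mask saved in the interrupt frame (SOFTE(r1)). */
static void mce_reconcile_and_call(struct mock_paca *paca, u8 softe,
				   void (*handler)(struct mock_paca *))
{
	u8 saved_happened = paca->irq_happened;

	paca->irq_soft_mask = IRQS_ALL_DISABLED;
	paca->irq_happened = saved_happened | PACA_IRQ_HARD_DIS;

	handler(paca);

	/* Restore soft mask settings. */
	paca->irq_happened = saved_happened;
	paca->irq_soft_mask = softe;
}

/* Example handler that records the state it observed. */
static u8 seen_mask, seen_happened;
static void recording_handler(struct mock_paca *paca)
{
	seen_mask = paca->irq_soft_mask;
	seen_happened = paca->irq_happened;
}
```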

* Re: [PATCH v4 11/16] powerpc/64s: machine check interrupt update NMI accounting
  2020-05-08  4:34 ` [PATCH v4 11/16] powerpc/64s: machine check interrupt update NMI accounting Nicholas Piggin
@ 2020-05-09  3:13     ` kbuild test robot
  0 siblings, 0 replies; 25+ messages in thread
From: kbuild test robot @ 2020-05-09  3:13 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: kbuild-all, Nicholas Piggin

[-- Attachment #1: Type: text/plain, Size: 4851 bytes --]

Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on tip/perf/core v5.7-rc4 next-20200508]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-machine-check-and-system-reset-fixes/20200509-030554
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r002-20200509 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/kernel.h:15,
                    from include/asm-generic/bug.h:19,
                    from arch/powerpc/include/asm/bug.h:109,
                    from include/linux/bug.h:5,
                    from arch/powerpc/include/asm/mmu.h:130,
                    from arch/powerpc/include/asm/paca.h:18,
                    from arch/powerpc/include/asm/current.h:13,
                    from include/linux/sched.h:12,
                    from arch/powerpc/kernel/process.c:14:
   arch/powerpc/kernel/process.c: In function 'show_regs':
>> arch/powerpc/kernel/process.c:1425:74: error: 'struct paca_struct' has no member named 'in_nmi'
    1425 |  pr_cont("IRQMASK: %lx IN_NMI:%d IN_MCE:%d", regs->softe, (int)get_paca()->in_nmi, (int)get_paca()->in_mce);
         |                                                                          ^~
   include/linux/printk.h:312:26: note: in definition of macro 'pr_cont'
     312 |  printk(KERN_CONT fmt, ##__VA_ARGS__)
         |                          ^~~~~~~~~~~
>> arch/powerpc/kernel/process.c:1425:99: error: 'struct paca_struct' has no member named 'in_mce'
    1425 |  pr_cont("IRQMASK: %lx IN_NMI:%d IN_MCE:%d", regs->softe, (int)get_paca()->in_nmi, (int)get_paca()->in_mce);
         |                                                                                                   ^~
   include/linux/printk.h:312:26: note: in definition of macro 'pr_cont'
     312 |  printk(KERN_CONT fmt, ##__VA_ARGS__)
         |                          ^~~~~~~~~~~

vim +1425 arch/powerpc/kernel/process.c

  1401	
  1402	void show_regs(struct pt_regs * regs)
  1403	{
  1404		int i, trap;
  1405	
  1406		show_regs_print_info(KERN_DEFAULT);
  1407	
  1408		printk("NIP:  "REG" LR: "REG" CTR: "REG"\n",
  1409		       regs->nip, regs->link, regs->ctr);
  1410		printk("REGS: %px TRAP: %04lx   %s  (%s)\n",
  1411		       regs, regs->trap, print_tainted(), init_utsname()->release);
  1412		printk("MSR:  "REG" ", regs->msr);
  1413		print_msr_bits(regs->msr);
  1414		pr_cont("  CR: %08lx  XER: %08lx\n", regs->ccr, regs->xer);
  1415		trap = TRAP(regs);
  1416		if ((TRAP(regs) != 0xc00) && cpu_has_feature(CPU_FTR_CFAR))
  1417			pr_cont("CFAR: "REG" ", regs->orig_gpr3);
  1418		if (trap == 0x200 || trap == 0x300 || trap == 0x600)
  1419	#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
  1420			pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, regs->dsisr);
  1421	#else
  1422			pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, regs->dsisr);
  1423	#endif
  1424	#ifdef CONFIG_PPC64
> 1425		pr_cont("IRQMASK: %lx IN_NMI:%d IN_MCE:%d", regs->softe, (int)get_paca()->in_nmi, (int)get_paca()->in_mce);
  1426	#endif
  1427	#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
  1428		if (MSR_TM_ACTIVE(regs->msr))
  1429			pr_cont("\nPACATMSCRATCH: %016llx ", get_paca()->tm_scratch);
  1430	#endif
  1431	
  1432		for (i = 0;  i < 32;  i++) {
  1433			if ((i % REGS_PER_LINE) == 0)
  1434				pr_cont("\nGPR%02d: ", i);
  1435			pr_cont(REG " ", regs->gpr[i]);
  1436			if (i == LAST_VOLATILE && !FULL_REGS(regs))
  1437				break;
  1438		}
  1439		pr_cont("\n");
  1440	#ifdef CONFIG_KALLSYMS
  1441		/*
  1442		 * Lookup NIP late so we have the best change of getting the
  1443		 * above info out without failing
  1444		 */
  1445		printk("NIP ["REG"] %pS\n", regs->nip, (void *)regs->nip);
  1446		printk("LR ["REG"] %pS\n", regs->link, (void *)regs->link);
  1447	#endif
  1448		show_stack(current, (unsigned long *) regs->gpr[1]);
  1449		if (!user_mode(regs))
  1450			show_instructions(regs);
  1451	}
  1452	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26096 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread
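
The errors arise because the debug hunk dereferences in_nmi and in_mce, which struct paca_struct does not provide in this randconfig. The hunk was dropped before merging (see the replies below). If it returns as a proper patch, one approach is to guard the field accesses with the same configuration condition as the fields. The sketch below uses a hypothetical MOCK_HAS_NMI_FIELDS switch and struct mock_paca. The report does not show the real config symbol, and the field types here are chosen for illustration.

```c
#include <assert.h>	/* assert() is used by the usage checks */
#include <stdio.h>
#include <string.h>

/* Hypothetical switch standing in for the option that provides the fields. */
#define MOCK_HAS_NMI_FIELDS 0

struct mock_paca {
	unsigned long softe;
#if MOCK_HAS_NMI_FIELDS
	unsigned char in_nmi;
	unsigned short in_mce;
#endif
};

/* Format the IRQMASK part of show_regs(), touching in_nmi/in_mce only
 * when they exist, so this compiles in every configuration. */
static int format_irqmask(char *buf, size_t len, const struct mock_paca *p)
{
#if MOCK_HAS_NMI_FIELDS
	return snprintf(buf, len, "IRQMASK: %lx IN_NMI:%d IN_MCE:%d ",
			p->softe, (int)p->in_nmi, (int)p->in_mce);
#else
	return snprintf(buf, len, "IRQMASK: %lx ", p->softe);
#endif
}
```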

* Re: [PATCH v4 04/16] powerpc/64s/exceptions: machine check reconcile irq state
  2020-05-08 13:39   ` Michael Ellerman
@ 2020-05-09  7:48     ` Nicholas Piggin
  0 siblings, 0 replies; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-09  7:48 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman

Excerpts from Michael Ellerman's message of May 8, 2020 11:39 pm:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
>> pseries fwnmi machine check code pops the soft-irq checks in rtas_call
>> (after the previous patch to remove rtas_token from this call path).
>              ^
>              I changed this to "next" which I think is what you meant?

Yes I rearranged them, so yes.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 11/16] powerpc/64s: machine check interrupt update NMI accounting
  2020-05-09  3:13     ` kbuild test robot
@ 2020-05-09  7:50     ` Nicholas Piggin
  2020-05-11  9:50         ` Michael Ellerman
  -1 siblings, 1 reply; 25+ messages in thread
From: Nicholas Piggin @ 2020-05-09  7:50 UTC (permalink / raw)
  To: linuxppc-dev, kbuild test robot; +Cc: kbuild-all

Excerpts from kbuild test robot's message of May 9, 2020 1:13 pm:
> Hi Nicholas,
> 
> I love your patch! Yet something to improve:

...

>   1419	#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
>   1420			pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, regs->dsisr);
>   1421	#else
>   1422			pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, regs->dsisr);
>   1423	#endif
>   1424	#ifdef CONFIG_PPC64
>> 1425		pr_cont("IRQMASK: %lx IN_NMI:%d IN_MCE:%d", regs->softe, (int)get_paca()->in_nmi, (int)get_paca()->in_mce);

Oh I meant to get rid of that hunk, it crept back in :(

mpe if you could please take it out if you're merging this.

It was quite useful for debugging this stuff, I might do a proper patch 
for this, but for now not necessary (it doesn't matter for "normal" 
crashes only crash crashes).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 11/16] powerpc/64s: machine check interrupt update NMI accounting
  2020-05-09  7:50     ` Nicholas Piggin
@ 2020-05-11  9:50         ` Michael Ellerman
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Ellerman @ 2020-05-11  9:50 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev, kbuild test robot; +Cc: kbuild-all

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from kbuild test robot's message of May 9, 2020 1:13 pm:
>> Hi Nicholas,
>> 
>> I love your patch! Yet something to improve:
>
> ...
>
>>   1419	#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
>>   1420			pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, regs->dsisr);
>>   1421	#else
>>   1422			pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, regs->dsisr);
>>   1423	#endif
>>   1424	#ifdef CONFIG_PPC64
>>> 1425		pr_cont("IRQMASK: %lx IN_NMI:%d IN_MCE:%d", regs->softe, (int)get_paca()->in_nmi, (int)get_paca()->in_mce);
>
> Oh I meant to get rid of that hunk, it crept back in :(
>
> mpe if you could please take it out if you're merging this.

Yep. I just came here to tell you I'd dropped that hunk :)

> It was quite useful for debugging this stuff, I might do a proper patch 
> for this, but for now not necessary (it doesn't matter for "normal" 
> crashes only crash crashes).

Yeah would be good to print more of those flags.

cheers

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 00/16] powerpc: machine check and system reset fixes
  2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
                   ` (15 preceding siblings ...)
  2020-05-08  4:34 ` [PATCH v4 16/16] powerpc/traps: Machine check fix RI=0 recoverability check Nicholas Piggin
@ 2020-05-20 11:00 ` Michael Ellerman
  16 siblings, 0 replies; 25+ messages in thread
From: Michael Ellerman @ 2020-05-20 11:00 UTC (permalink / raw)
  To: linuxppc-dev, Nicholas Piggin

On Fri, 8 May 2020 14:33:52 +1000, Nicholas Piggin wrote:
> Since v3, I fixed a compile error and returned the generic machine check
> exception handler to be NMI on 32 and 64e, as caught by Christophe's
> review.
> 
> Also added the last patch, just found it by looking at the code, a
> review for 32s would be good.
> 
> [...]

Patches 1-15 applied to powerpc/next.

[01/16] powerpc/64s/exception: Fix machine check no-loss idle wakeup
        https://git.kernel.org/powerpc/c/8a5054d8cbbe03c68dcb0957c291c942132e4101
[02/16] powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable path
        https://git.kernel.org/powerpc/c/ac2a2a1417391180ef12f908a2864692d6d76d40
[03/16] powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing _DAR to RESULT
        https://git.kernel.org/powerpc/c/16754d25bd7d4e53a52b311d99cc7a8fba875d81
[04/16] powerpc/64s/exceptions: Machine check reconcile irq state
        https://git.kernel.org/powerpc/c/f0fd9dd3c213c947dfb5bc2cad3ef5e30d3258ec
[05/16] powerpc/pseries/ras: Avoid calling rtas_token() in NMI paths
        https://git.kernel.org/powerpc/c/7368b38b21bfa39df637701a480262c15ab1a49e
[06/16] powerpc/pseries/ras: Fix FWNMI_VALID off by one
        https://git.kernel.org/powerpc/c/deb70f7a35a22dffa55b2c3aac71bc6fb0f486ce
[07/16] powerpc/pseries/ras: fwnmi avoid modifying r3 in error case
        https://git.kernel.org/powerpc/c/dff681e95a23f28b3c688a8bd5535f78bd726bc8
[08/16] powerpc/pseries/ras: fwnmi sreset should not interlock
        https://git.kernel.org/powerpc/c/d7b14c5c042865070a1411078ab49ea17bad0b41
[09/16] powerpc/pseries: Limit machine check stack to 4GB
        https://git.kernel.org/powerpc/c/d2cbbd45d433b96e41711a293e59cff259143694
[10/16] powerpc/pseries: Machine check use rtas_call_unlocked() with args on stack
        https://git.kernel.org/powerpc/c/2576f5f9169620bf329cf1e91086e6041b98e4b2
[11/16] powerpc/64s: machine check interrupt update NMI accounting
        https://git.kernel.org/powerpc/c/116ac378bb3ff844df333e7609e7604651a0db9d
[12/16] powerpc: Implement ftrace_enabled() helpers
        https://git.kernel.org/powerpc/c/f2d7f62e4abdb03de3f4267361d96c417312d05c
[13/16] powerpc/64s: machine check do not trace real-mode handler
        https://git.kernel.org/powerpc/c/abd106fb437ad1cd8c8df8ccabd0fa941ef6342a
[14/16] powerpc/traps: Do not trace system reset
        https://git.kernel.org/powerpc/c/bbbc8032b00f8ef287894425fbdb691049e28d39
[15/16] powerpc/traps: Make unrecoverable NMIs die instead of panic
        https://git.kernel.org/powerpc/c/265d6e588d87194c2fe2d6c240247f0264e0c19b

cheers

Thread overview: 25+ messages
2020-05-08  4:33 [PATCH v4 00/16] powerpc: machine check and system reset fixes Nicholas Piggin
2020-05-08  4:33 ` [PATCH v4 01/16] powerpc/64s/exception: Fix machine check no-loss idle wakeup Nicholas Piggin
2020-05-08  4:33 ` [PATCH v4 02/16] powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable path Nicholas Piggin
2020-05-08  4:33 ` [PATCH v4 03/16] powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing _DAR to RESULT Nicholas Piggin
2020-05-08  4:33 ` [PATCH v4 04/16] powerpc/64s/exceptions: machine check reconcile irq state Nicholas Piggin
2020-05-08 13:39   ` Michael Ellerman
2020-05-09  7:48     ` Nicholas Piggin
2020-05-08  4:33 ` [PATCH v4 05/16] powerpc/pseries/ras: avoid calling rtas_token in NMI paths Nicholas Piggin
2020-05-08  4:33 ` [PATCH v4 06/16] powerpc/pseries/ras: FWNMI_VALID off by one Nicholas Piggin
2020-05-08  4:33 ` [PATCH v4 07/16] powerpc/pseries/ras: fwnmi avoid modifying r3 in error case Nicholas Piggin
2020-05-08  4:34 ` [PATCH v4 08/16] powerpc/pseries/ras: fwnmi sreset should not interlock Nicholas Piggin
2020-05-08  4:34 ` [PATCH v4 09/16] powerpc/pseries: limit machine check stack to 4GB Nicholas Piggin
2020-05-08  4:34 ` [PATCH v4 10/16] powerpc/pseries: machine check use rtas_call_unlocked with args on stack Nicholas Piggin
2020-05-08  4:34 ` [PATCH v4 11/16] powerpc/64s: machine check interrupt update NMI accounting Nicholas Piggin
2020-05-09  3:13   ` kbuild test robot
2020-05-09  3:13     ` kbuild test robot
2020-05-09  7:50     ` Nicholas Piggin
2020-05-11  9:50       ` Michael Ellerman
2020-05-11  9:50         ` Michael Ellerman
2020-05-08  4:34 ` [PATCH v4 12/16] powerpc: implement ftrace_enabled helper Nicholas Piggin
2020-05-08  4:34 ` [PATCH v4 13/16] powerpc/64s: machine check do not trace real-mode handler Nicholas Piggin
2020-05-08  4:34 ` [PATCH v4 14/16] powerpc/traps: system reset do not trace Nicholas Piggin
2020-05-08  4:34 ` [PATCH v4 15/16] powerpc/traps: make unrecoverable NMIs die instead of panic Nicholas Piggin
2020-05-08  4:34 ` [PATCH v4 16/16] powerpc/traps: Machine check fix RI=0 recoverability check Nicholas Piggin
2020-05-20 11:00 ` [PATCH v4 00/16] powerpc: machine check and system reset fixes Michael Ellerman