All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] machine check handling improvements
@ 2017-07-19  6:59 Nicholas Piggin
  2017-07-19  6:59 ` [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Nicholas Piggin @ 2017-07-19  6:59 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Michael Ellerman, Mahesh Jagannath Salgaonkar

This is a few patches to improve machine check reporting
and recovery.

Since last time:

- improved patch 1 according to review comments:
  https://patchwork.ozlabs.org/patch/785240/

- In patch 3 be a bit smarter about when to use die() and
  when to go straight to firmware reboot.

Nicholas Piggin (3):
  powerpc/powernv: handle the platform error reboot in ppc_md.restart
  powerpc/powernv: machine check use kernel crash path
  powerpc: machine check interrupt is a non-maskable interrupt

 arch/powerpc/include/asm/bug.h            |   1 +
 arch/powerpc/include/asm/fadump.h         |   2 +
 arch/powerpc/include/asm/opal.h           |   2 +-
 arch/powerpc/kernel/fadump.c              |   9 ++-
 arch/powerpc/kernel/traps.c               |  31 +++++++-
 arch/powerpc/platforms/powernv/opal-hmi.c |  22 +-----
 arch/powerpc/platforms/powernv/opal.c     | 121 +++++++++++++++++++-----------
 arch/powerpc/platforms/powernv/powernv.h  |   2 +
 8 files changed, 122 insertions(+), 68 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart
  2017-07-19  6:59 [PATCH v2 0/3] machine check handling improvements Nicholas Piggin
@ 2017-07-19  6:59 ` Nicholas Piggin
  2017-07-19  7:16   ` Nicholas Piggin
                     ` (2 more replies)
  2017-07-19  6:59 ` [PATCH v2 2/3] powerpc/powernv: machine check use kernel crash path Nicholas Piggin
  2017-07-19  6:59 ` [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt Nicholas Piggin
  2 siblings, 3 replies; 21+ messages in thread
From: Nicholas Piggin @ 2017-07-19  6:59 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Michael Ellerman, Mahesh Jagannath Salgaonkar

Unrecovered MCE and HMI errors are sent through a special restart OPAL
call to log the platform error. The downside is that they don't go
through normal Linux crash paths, so they don't give much information
to the Linux console.

Change this by providing a special crash function which does some of
the console flushing from the panic() path before calling firmware to
reboot.

The downside of this is a little more code to execute before reaching
the firmware reboot. However in practice, it's critical to get the
Linux console messages output in order to debug a problem. So this is
a desirable tradeoff.

Note on the implementation: It is difficult to plumb a custom reboot
handler into the panic path, because panic does a little bit too much
work. For example, it will try to delay with the timebase, but that
may be corrupted in some cases resulting in a hang without reaching
the platform reboot. Another problem is that panic can invoke the
crash dump code which is not what we want in the case of a hardware
platform error. Long-term the best solution will be to rework the
panic path so it can be suitable for this kind of panic, but for now
we just duplicate a bit of the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/opal.h           |  2 +-
 arch/powerpc/platforms/powernv/opal-hmi.c | 22 ++------
 arch/powerpc/platforms/powernv/opal.c     | 89 ++++++++++++++++++-------------
 arch/powerpc/platforms/powernv/powernv.h  |  2 +
 4 files changed, 57 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 588fb1c23af9..182dab435aad 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -50,7 +50,7 @@ int64_t opal_tpo_write(uint64_t token, uint32_t year_mon_day,
 		       uint32_t hour_min);
 int64_t opal_cec_power_down(uint64_t request);
 int64_t opal_cec_reboot(void);
-int64_t opal_cec_reboot2(uint32_t reboot_type, char *diag);
+int64_t opal_cec_reboot2(uint32_t reboot_type, const char *diag);
 int64_t opal_read_nvram(uint64_t buffer, uint64_t size, uint64_t offset);
 int64_t opal_write_nvram(uint64_t buffer, uint64_t size, uint64_t offset);
 int64_t opal_handle_interrupt(uint64_t isn, __be64 *outstanding_event_mask);
diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c
index 88f3c61eec95..d78fed728cdf 100644
--- a/arch/powerpc/platforms/powernv/opal-hmi.c
+++ b/arch/powerpc/platforms/powernv/opal-hmi.c
@@ -30,6 +30,8 @@
 #include <asm/cputable.h>
 #include <asm/machdep.h>
 
+#include "powernv.h"
+
 static int opal_hmi_handler_nb_init;
 struct OpalHmiEvtNode {
 	struct list_head list;
@@ -267,8 +269,6 @@ static void hmi_event_handler(struct work_struct *work)
 	spin_unlock_irqrestore(&opal_hmi_evt_lock, flags);
 
 	if (unrecoverable) {
-		int ret;
-
 		/* Pull all HMI events from OPAL before we panic. */
 		while (opal_get_msg(__pa(&msg), sizeof(msg)) == OPAL_SUCCESS) {
 			u32 type;
@@ -284,23 +284,7 @@ static void hmi_event_handler(struct work_struct *work)
 			print_hmi_event_info(hmi_evt);
 		}
 
-		/*
-		 * Unrecoverable HMI exception. We need to inform BMC/OCC
-		 * about this error so that it can collect relevant data
-		 * for error analysis before rebooting.
-		 */
-		ret = opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR,
-			"Unrecoverable HMI exception");
-		if (ret == OPAL_UNSUPPORTED) {
-			pr_emerg("Reboot type %d not supported\n",
-						OPAL_REBOOT_PLATFORM_ERROR);
-		}
-
-		/*
-		 * Fall through and panic if opal_cec_reboot2() returns
-		 * OPAL_UNSUPPORTED.
-		 */
-		panic("Unrecoverable HMI exception");
+		pnv_platform_error_reboot(NULL, "Unrecoverable HMI exception");
 	}
 }
 
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 9b87abb178f0..96436d129684 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -25,6 +25,10 @@
 #include <linux/memblock.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/printk.h>
+#include <linux/kmsg_dump.h>
+#include <linux/console.h>
+#include <linux/sched/debug.h>
 
 #include <asm/machdep.h>
 #include <asm/opal.h>
@@ -436,10 +440,55 @@ static int opal_recover_mce(struct pt_regs *regs,
 	return recovered;
 }
 
+void pnv_platform_error_reboot(struct pt_regs *regs, const char *msg)
+{
+	/*
+	 * This is mostly taken from kernel/panic.c, but tries to do
+	 * relatively minimal work. Don't use delay functions (TB may
+	 * be broken), don't crash dump (need to set a firmware log),
+	 * don't run notifiers. We do want to get some information to
+	 * Linux console.
+	 */
+	console_verbose();
+	bust_spinlocks(1);
+	pr_emerg("Hardware platform error: %s\n", msg);
+	if (regs)
+		show_regs(regs);
+	smp_send_stop();
+	printk_safe_flush_on_panic();
+	kmsg_dump(KMSG_DUMP_PANIC);
+	bust_spinlocks(0);
+	debug_locks_off();
+	console_flush_on_panic();
+
+	/*
+	 * Don't bother to shut things down because this will
+	 * xstop the system.
+	 */
+	if (opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR, msg)
+						== OPAL_UNSUPPORTED) {
+		pr_emerg("Reboot type %d not supported for %s\n",
+				OPAL_REBOOT_PLATFORM_ERROR, msg);
+	}
+
+	/*
+	 * We reached here. There can be three possibilities:
+	 * 1. We are running on a firmware level that do not support
+	 *    opal_cec_reboot2()
+	 * 2. We are running on a firmware level that do not support
+	 *    OPAL_REBOOT_PLATFORM_ERROR reboot type.
+	 * 3. We are running on FSP based system that does not need
+	 *    opal to trigger checkstop explicitly for error analysis.
+	 *    The FSP PRD component would have already got notified
+	 *    about this error through other channels.
+	 */
+
+	ppc_md.restart(NULL);
+}
+
 int opal_machine_check(struct pt_regs *regs)
 {
 	struct machine_check_event evt;
-	int ret;
 
 	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
 		return 0;
@@ -455,43 +504,7 @@ int opal_machine_check(struct pt_regs *regs)
 	if (opal_recover_mce(regs, &evt))
 		return 1;
 
-	/*
-	 * Unrecovered machine check, we are heading to panic path.
-	 *
-	 * We may have hit this MCE in very early stage of kernel
-	 * initialization even before opal-prd has started running. If
-	 * this is the case then this MCE error may go un-noticed or
-	 * un-analyzed if we go down panic path. We need to inform
-	 * BMC/OCC about this error so that they can collect relevant
-	 * data for error analysis before rebooting.
-	 * Use opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR) to do so.
-	 * This function may not return on BMC based system.
-	 */
-	ret = opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR,
-			"Unrecoverable Machine Check exception");
-	if (ret == OPAL_UNSUPPORTED) {
-		pr_emerg("Reboot type %d not supported\n",
-					OPAL_REBOOT_PLATFORM_ERROR);
-	}
-
-	/*
-	 * We reached here. There can be three possibilities:
-	 * 1. We are running on a firmware level that do not support
-	 *    opal_cec_reboot2()
-	 * 2. We are running on a firmware level that do not support
-	 *    OPAL_REBOOT_PLATFORM_ERROR reboot type.
-	 * 3. We are running on FSP based system that does not need opal
-	 *    to trigger checkstop explicitly for error analysis. The FSP
-	 *    PRD component would have already got notified about this
-	 *    error through other channels.
-	 *
-	 * If hardware marked this as an unrecoverable MCE, we are
-	 * going to panic anyway. Even if it didn't, it's not safe to
-	 * continue at this point, so we should explicitly panic.
-	 */
-
-	panic("PowerNV Unrecovered Machine Check");
-	return 0;
+	pnv_platform_error_reboot(regs, "Unrecoverable Machine Check exception");
 }
 
 /* Early hmi handler called in real mode. */
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 6dbc0a1da1f6..a159d48573d7 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -7,6 +7,8 @@ extern void pnv_smp_init(void);
 static inline void pnv_smp_init(void) { }
 #endif
 
+extern void pnv_platform_error_reboot(struct pt_regs *regs, const char *msg) __noreturn;
+
 struct pci_dev;
 
 #ifdef CONFIG_PCI
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/3] powerpc/powernv: machine check use kernel crash path
  2017-07-19  6:59 [PATCH v2 0/3] machine check handling improvements Nicholas Piggin
  2017-07-19  6:59 ` [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart Nicholas Piggin
@ 2017-07-19  6:59 ` Nicholas Piggin
  2017-07-20  7:14   ` Mahesh Jagannath Salgaonkar
  2017-07-19  6:59 ` [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt Nicholas Piggin
  2 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2017-07-19  6:59 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Michael Ellerman, Mahesh Jagannath Salgaonkar

There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.

There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.

As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:

Severe Machine check interrupt [Not recovered]
  NIP [ffff000000000000]: 0xffff000000000000
  Initiator: CPU
  Error type: Real address [Instruction fetch (foreign)]
[  127.426651616,0] OPAL: Reboot requested due to Platform error.
    Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
opal: Reboot type 1 not supported
Kernel panic - not syncing: PowerNV Unrecovered Machine Check
CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #35
Call Trace:
[  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to buggy/mising code in OPAL for this BMCRebooting in 10 seconds..
Trying to free IRQ 496 from IRQ context!


After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):

Severe Machine check interrupt [Not recovered]
  NIP [ffff000000000000]: 0xffff000000000000
  Initiator: CPU
  Error type: Real address [Instruction fetch (foreign)]
    Effective address: ffff000000000000
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=2048
NUMA
PowerNV
Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm_hv kvm iptable_filter binfmt_misc vmx_crypto ip_tables x_tables autofs4 crc32c_vpmsum
CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #36
task: c000000932300000 task.stack: c000000932380000
NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a261072-dirty)
MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
  CR: 24000484  XER: 20000000
CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
NIP [ffff000000000000] 0xffff000000000000
LR [00000000217706a4] 0x217706a4
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 32ae1dabb4f8dae6 ]---

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/bug.h        |  1 +
 arch/powerpc/include/asm/fadump.h     |  2 ++
 arch/powerpc/kernel/fadump.c          |  9 ++++++++-
 arch/powerpc/kernel/traps.c           | 22 ++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal.c | 32 ++++++++++++++++++++++++++------
 5 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 0151af6c2a50..9a918b3ca5ee 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -133,6 +133,7 @@ extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
 extern void bad_page_fault(struct pt_regs *, unsigned long, int);
 extern void _exception(int, struct pt_regs *, int, unsigned long);
 extern void die(const char *, struct pt_regs *, long);
+extern bool die_will_crash(void);
 
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index ce88bbe1d809..5a23010af600 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -209,11 +209,13 @@ extern int early_init_dt_scan_fw_dump(unsigned long node,
 extern int fadump_reserve_mem(void);
 extern int setup_fadump(void);
 extern int is_fadump_active(void);
+extern int should_fadump_crash(void);
 extern void crash_fadump(struct pt_regs *, const char *);
 extern void fadump_cleanup(void);
 
 #else	/* CONFIG_FA_DUMP */
 static inline int is_fadump_active(void) { return 0; }
+static inline int should_fadump_crash(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
 #endif
 #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index da8830e49696..8a3058f5943b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -125,6 +125,13 @@ int is_fadump_boot_memory_area(u64 addr, ulong size)
 	return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
 }
 
+int should_fadump_crash(void)
+{
+	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
+		return 0;
+	return 1;
+}
+
 int is_fadump_active(void)
 {
 	return fw_dump.dump_active;
@@ -518,7 +525,7 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 	struct fadump_crash_info_header *fdh = NULL;
 	int old_cpu, this_cpu;
 
-	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
+	if (!should_fadump_crash())
 		return;
 
 	/*
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 574e949f8db9..2849c4f50324 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -114,6 +114,28 @@ static void pmac_backlight_unblank(void)
 static inline void pmac_backlight_unblank(void) { }
 #endif
 
+/*
+ * If oops/die is expected to crash the machine, return true here.
+ *
+ * This should not be expected to be 100% accurate, there may be
+ * notifiers registered or other unexpected conditions that may bring
+ * down the kernel. Or if the current process in the kernel is holding
+ * locks or has other critical state, the kernel may become effectively
+ * unusable anyway.
+ */
+bool die_will_crash(void)
+{
+	if (should_fadump_crash())
+		return true;
+	if (kexec_should_crash(current))
+		return true;
+	if (in_interrupt() || panic_on_oops ||
+			!current->pid || is_global_init(current))
+		return true;
+
+	return false;
+}
+
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 static int die_owner = -1;
 static unsigned int die_nest_count;
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 96436d129684..140350aacca5 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -34,6 +34,7 @@
 #include <asm/opal.h>
 #include <asm/firmware.h>
 #include <asm/mce.h>
+#include <asm/bug.h>
 
 #include "powernv.h"
 
@@ -426,17 +427,36 @@ static int opal_recover_mce(struct pt_regs *regs,
 		/* Fatal machine check */
 		pr_err("Machine check interrupt is fatal\n");
 		recovered = 0;
-	} else if ((evt->severity == MCE_SEV_ERROR_SYNC) &&
-			(user_mode(regs) && !is_global_init(current))) {
+	}
+
+	if (!recovered && evt->severity == MCE_SEV_ERROR_SYNC) {
 		/*
-		 * For now, kill the task if we have received exception when
-		 * in userspace.
+		 * Try to kill processes if we get a synchronous machine check
+		 * (e.g., one caused by execution of this instruction). This
+		 * will devolve into a panic if we try to kill init or are in
+		 * an interrupt etc.
 		 *
 		 * TODO: Queue up this address for hwpoisioning later.
+		 * TODO: This is not quite right for d-side machine
+		 *       checks ->nip is not necessarily the important
+		 *       address.
 		 */
-		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
-		recovered = 1;
+		if ((user_mode(regs))) {
+			_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
+			recovered = 1;
+		} else if (die_will_crash()) {
+			/*
+			 * die() would kill the kernel, so better to go via
+			 * the platform reboot code that will log the
+			 * machine check.
+			 */
+			recovered = 0;
+		} else {
+			die("Machine check", regs, SIGBUS);
+			recovered = 1;
+		}
 	}
+
 	return recovered;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2017-07-19  6:59 [PATCH v2 0/3] machine check handling improvements Nicholas Piggin
  2017-07-19  6:59 ` [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart Nicholas Piggin
  2017-07-19  6:59 ` [PATCH v2 2/3] powerpc/powernv: machine check use kernel crash path Nicholas Piggin
@ 2017-07-19  6:59 ` Nicholas Piggin
  2018-10-08 15:39   ` Christophe LEROY
  2 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2017-07-19  6:59 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Michael Ellerman, Mahesh Jagannath Salgaonkar

Use nmi_enter similarly to system reset interrupts. This uses NMI
printk NMI buffers and turns off various debugging facilities that
helps avoid tripping on ourselves or other CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/traps.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2849c4f50324..6d31f9d7c333 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
 
 void machine_check_exception(struct pt_regs *regs)
 {
-	enum ctx_state prev_state = exception_enter();
 	int recover = 0;
+	bool nested = in_nmi();
+	if (!nested)
+		nmi_enter();
 
 	__this_cpu_inc(irq_stat.mce_exceptions);
 
@@ -820,10 +822,11 @@ void machine_check_exception(struct pt_regs *regs)
 
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
-		panic("Unrecoverable Machine check");
+		nmi_panic(regs, "Unrecoverable Machine check");
 
 bail:
-	exception_exit(prev_state);
+	if (!nested)
+		nmi_exit();
 }
 
 void SMIException(struct pt_regs *regs)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart
  2017-07-19  6:59 ` [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart Nicholas Piggin
@ 2017-07-19  7:16   ` Nicholas Piggin
  2017-07-20  5:39   ` Mahesh Jagannath Salgaonkar
  2017-08-31 11:36   ` [v2, " Michael Ellerman
  2 siblings, 0 replies; 21+ messages in thread
From: Nicholas Piggin @ 2017-07-19  7:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michael Ellerman, Mahesh Jagannath Salgaonkar

On Wed, 19 Jul 2017 16:59:10 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> Change this by providing a special crash function which does some of
> the console flushing from the panic() path before calling firmware to
> reboot.

Oops, I changed the changelog but not the patch subject. Should
be something like "flush console before platform error reboot".

Thanks,
Nick

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart
  2017-07-19  6:59 ` [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart Nicholas Piggin
  2017-07-19  7:16   ` Nicholas Piggin
@ 2017-07-20  5:39   ` Mahesh Jagannath Salgaonkar
  2017-08-31 11:36   ` [v2, " Michael Ellerman
  2 siblings, 0 replies; 21+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2017-07-20  5:39 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev

On 07/19/2017 12:29 PM, Nicholas Piggin wrote:
> Unrecovered MCE and HMI errors are sent through a special restart OPAL
> call to log the platform error. The downside is that they don't go
> through normal Linux crash paths, so they don't give much information
> to the Linux console.
> 
> Change this by providing a special crash function which does some of
> the console flushing from the panic() path before calling firmware to
> reboot.
> 
> The downside of this is a little more code to execute before reaching
> the firmware reboot. However in practice, it's critical to get the
> Linux console messages output in order to debug a problem. So this is
> a desirable tradeoff.
> 
> Note on the implementation: It is difficult to plumb a custom reboot
> handler into the panic path, because panic does a little bit too much
> work. For example, it will try to delay with the timebase, but that
> may be corrupted in some cases resulting in a hang without reaching
> the platform reboot. Another problem is that panic can invoke the
> crash dump code which is not what we want in the case of a hardware
> platform error. Long-term the best solution will be to rework the
> panic path so it can be suitable for this kind of panic, but for now
> we just duplicate a bit of the code.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Thanks,
-Mahesh.

> ---
>  arch/powerpc/include/asm/opal.h           |  2 +-
>  arch/powerpc/platforms/powernv/opal-hmi.c | 22 ++------
>  arch/powerpc/platforms/powernv/opal.c     | 89 ++++++++++++++++++-------------
>  arch/powerpc/platforms/powernv/powernv.h  |  2 +
>  4 files changed, 57 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 588fb1c23af9..182dab435aad 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -50,7 +50,7 @@ int64_t opal_tpo_write(uint64_t token, uint32_t year_mon_day,
>  		       uint32_t hour_min);
>  int64_t opal_cec_power_down(uint64_t request);
>  int64_t opal_cec_reboot(void);
> -int64_t opal_cec_reboot2(uint32_t reboot_type, char *diag);
> +int64_t opal_cec_reboot2(uint32_t reboot_type, const char *diag);
>  int64_t opal_read_nvram(uint64_t buffer, uint64_t size, uint64_t offset);
>  int64_t opal_write_nvram(uint64_t buffer, uint64_t size, uint64_t offset);
>  int64_t opal_handle_interrupt(uint64_t isn, __be64 *outstanding_event_mask);
> diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c
> index 88f3c61eec95..d78fed728cdf 100644
> --- a/arch/powerpc/platforms/powernv/opal-hmi.c
> +++ b/arch/powerpc/platforms/powernv/opal-hmi.c
> @@ -30,6 +30,8 @@
>  #include <asm/cputable.h>
>  #include <asm/machdep.h>
> 
> +#include "powernv.h"
> +
>  static int opal_hmi_handler_nb_init;
>  struct OpalHmiEvtNode {
>  	struct list_head list;
> @@ -267,8 +269,6 @@ static void hmi_event_handler(struct work_struct *work)
>  	spin_unlock_irqrestore(&opal_hmi_evt_lock, flags);
> 
>  	if (unrecoverable) {
> -		int ret;
> -
>  		/* Pull all HMI events from OPAL before we panic. */
>  		while (opal_get_msg(__pa(&msg), sizeof(msg)) == OPAL_SUCCESS) {
>  			u32 type;
> @@ -284,23 +284,7 @@ static void hmi_event_handler(struct work_struct *work)
>  			print_hmi_event_info(hmi_evt);
>  		}
> 
> -		/*
> -		 * Unrecoverable HMI exception. We need to inform BMC/OCC
> -		 * about this error so that it can collect relevant data
> -		 * for error analysis before rebooting.
> -		 */
> -		ret = opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR,
> -			"Unrecoverable HMI exception");
> -		if (ret == OPAL_UNSUPPORTED) {
> -			pr_emerg("Reboot type %d not supported\n",
> -						OPAL_REBOOT_PLATFORM_ERROR);
> -		}
> -
> -		/*
> -		 * Fall through and panic if opal_cec_reboot2() returns
> -		 * OPAL_UNSUPPORTED.
> -		 */
> -		panic("Unrecoverable HMI exception");
> +		pnv_platform_error_reboot(NULL, "Unrecoverable HMI exception");
>  	}
>  }
> 
> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 9b87abb178f0..96436d129684 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -25,6 +25,10 @@
>  #include <linux/memblock.h>
>  #include <linux/kthread.h>
>  #include <linux/freezer.h>
> +#include <linux/printk.h>
> +#include <linux/kmsg_dump.h>
> +#include <linux/console.h>
> +#include <linux/sched/debug.h>
> 
>  #include <asm/machdep.h>
>  #include <asm/opal.h>
> @@ -436,10 +440,55 @@ static int opal_recover_mce(struct pt_regs *regs,
>  	return recovered;
>  }
> 
> +void pnv_platform_error_reboot(struct pt_regs *regs, const char *msg)
> +{
> +	/*
> +	 * This is mostly taken from kernel/panic.c, but tries to do
> +	 * relatively minimal work. Don't use delay functions (TB may
> +	 * be broken), don't crash dump (need to set a firmware log),
> +	 * don't run notifiers. We do want to get some information to
> +	 * Linux console.
> +	 */
> +	console_verbose();
> +	bust_spinlocks(1);
> +	pr_emerg("Hardware platform error: %s\n", msg);
> +	if (regs)
> +		show_regs(regs);
> +	smp_send_stop();
> +	printk_safe_flush_on_panic();
> +	kmsg_dump(KMSG_DUMP_PANIC);
> +	bust_spinlocks(0);
> +	debug_locks_off();
> +	console_flush_on_panic();
> +
> +	/*
> +	 * Don't bother to shut things down because this will
> +	 * xstop the system.
> +	 */
> +	if (opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR, msg)
> +						== OPAL_UNSUPPORTED) {
> +		pr_emerg("Reboot type %d not supported for %s\n",
> +				OPAL_REBOOT_PLATFORM_ERROR, msg);
> +	}
> +
> +	/*
> +	 * We reached here. There can be three possibilities:
> +	 * 1. We are running on a firmware level that do not support
> +	 *    opal_cec_reboot2()
> +	 * 2. We are running on a firmware level that do not support
> +	 *    OPAL_REBOOT_PLATFORM_ERROR reboot type.
> +	 * 3. We are running on FSP based system that does not need
> +	 *    opal to trigger checkstop explicitly for error analysis.
> +	 *    The FSP PRD component would have already got notified
> +	 *    about this error through other channels.
> +	 */
> +
> +	ppc_md.restart(NULL);
> +}
> +
>  int opal_machine_check(struct pt_regs *regs)
>  {
>  	struct machine_check_event evt;
> -	int ret;
> 
>  	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
>  		return 0;
> @@ -455,43 +504,7 @@ int opal_machine_check(struct pt_regs *regs)
>  	if (opal_recover_mce(regs, &evt))
>  		return 1;
> 
> -	/*
> -	 * Unrecovered machine check, we are heading to panic path.
> -	 *
> -	 * We may have hit this MCE in very early stage of kernel
> -	 * initialization even before opal-prd has started running. If
> -	 * this is the case then this MCE error may go un-noticed or
> -	 * un-analyzed if we go down panic path. We need to inform
> -	 * BMC/OCC about this error so that they can collect relevant
> -	 * data for error analysis before rebooting.
> -	 * Use opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR) to do so.
> -	 * This function may not return on BMC based system.
> -	 */
> -	ret = opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR,
> -			"Unrecoverable Machine Check exception");
> -	if (ret == OPAL_UNSUPPORTED) {
> -		pr_emerg("Reboot type %d not supported\n",
> -					OPAL_REBOOT_PLATFORM_ERROR);
> -	}
> -
> -	/*
> -	 * We reached here. There can be three possibilities:
> -	 * 1. We are running on a firmware level that do not support
> -	 *    opal_cec_reboot2()
> -	 * 2. We are running on a firmware level that do not support
> -	 *    OPAL_REBOOT_PLATFORM_ERROR reboot type.
> -	 * 3. We are running on FSP based system that does not need opal
> -	 *    to trigger checkstop explicitly for error analysis. The FSP
> -	 *    PRD component would have already got notified about this
> -	 *    error through other channels.
> -	 *
> -	 * If hardware marked this as an unrecoverable MCE, we are
> -	 * going to panic anyway. Even if it didn't, it's not safe to
> -	 * continue at this point, so we should explicitly panic.
> -	 */
> -
> -	panic("PowerNV Unrecovered Machine Check");
> -	return 0;
> +	pnv_platform_error_reboot(regs, "Unrecoverable Machine Check exception");
>  }
> 
>  /* Early hmi handler called in real mode. */
> diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
> index 6dbc0a1da1f6..a159d48573d7 100644
> --- a/arch/powerpc/platforms/powernv/powernv.h
> +++ b/arch/powerpc/platforms/powernv/powernv.h
> @@ -7,6 +7,8 @@ extern void pnv_smp_init(void);
>  static inline void pnv_smp_init(void) { }
>  #endif
> 
> +extern void pnv_platform_error_reboot(struct pt_regs *regs, const char *msg) __noreturn;
> +
>  struct pci_dev;
> 
>  #ifdef CONFIG_PCI
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] powerpc/powernv: machine check use kernel crash path
  2017-07-19  6:59 ` [PATCH v2 2/3] powerpc/powernv: machine check use kernel crash path Nicholas Piggin
@ 2017-07-20  7:14   ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 21+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2017-07-20  7:14 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev

On 07/19/2017 12:29 PM, Nicholas Piggin wrote:
> There are quite a few machine check exceptions that can be caused by
> kernel bugs. To make debugging easier, use the kernel crash path in
> cases of synchronous machine checks that occur in kernel mode, if that
> would not result in the machine going straight to panic or crash dump.
> 
> There is a downside here that die()ing the process in kernel mode can
> still leave the system unstable. panic_on_oops will always force the
> system to fail-stop, so systems where that behaviour is important will
> still do the right thing.
> 
> As a test, when triggering an i-side 0111b error (ifetch from foreign
> address) in kernel mode process context on POWER9, the kernel currently
> dies quickly like this:
> 
> Severe Machine check interrupt [Not recovered]
>   NIP [ffff000000000000]: 0xffff000000000000
>   Initiator: CPU
>   Error type: Real address [Instruction fetch (foreign)]
> [  127.426651616,0] OPAL: Reboot requested due to Platform error.
>     Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
> opal: Reboot type 1 not supported
> Kernel panic - not syncing: PowerNV Unrecovered Machine Check
> CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #35
> Call Trace:
> [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to buggy/mising code in OPAL for this BMCRebooting in 10 seconds..
> Trying to free IRQ 496 from IRQ context!
> 
> 
> After this patch, the process is killed and the kernel continues with
> this message, which gives enough information to identify the offending
> branch (i.e., with CFAR):
> 
> Severe Machine check interrupt [Not recovered]
>   NIP [ffff000000000000]: 0xffff000000000000
>   Initiator: CPU
>   Error type: Real address [Instruction fetch (foreign)]
>     Effective address: ffff000000000000
> Oops: Machine check, sig: 7 [#1]
> SMP NR_CPUS=2048
> NUMA
> PowerNV
> Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm_hv kvm iptable_filter binfmt_misc vmx_crypto ip_tables x_tables autofs4 crc32c_vpmsum
> CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #36
> task: c000000932300000 task.stack: c000000932380000
> NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
> REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a261072-dirty)
> MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
>   CR: 24000484  XER: 20000000
> CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
> GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
> GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
> GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
> GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
> NIP [ffff000000000000] 0xffff000000000000
> LR [00000000217706a4] 0x217706a4
> Call Trace:
> Instruction dump:
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> ---[ end trace 32ae1dabb4f8dae6 ]---
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Thanks,
-Mahesh.

> ---
>  arch/powerpc/include/asm/bug.h        |  1 +
>  arch/powerpc/include/asm/fadump.h     |  2 ++
>  arch/powerpc/kernel/fadump.c          |  9 ++++++++-
>  arch/powerpc/kernel/traps.c           | 22 ++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/opal.c | 32 ++++++++++++++++++++++++++------
>  5 files changed, 59 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 0151af6c2a50..9a918b3ca5ee 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -133,6 +133,7 @@ extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
>  extern void bad_page_fault(struct pt_regs *, unsigned long, int);
>  extern void _exception(int, struct pt_regs *, int, unsigned long);
>  extern void die(const char *, struct pt_regs *, long);
> +extern bool die_will_crash(void);
> 
>  #endif /* !__ASSEMBLY__ */
> 
> diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
> index ce88bbe1d809..5a23010af600 100644
> --- a/arch/powerpc/include/asm/fadump.h
> +++ b/arch/powerpc/include/asm/fadump.h
> @@ -209,11 +209,13 @@ extern int early_init_dt_scan_fw_dump(unsigned long node,
>  extern int fadump_reserve_mem(void);
>  extern int setup_fadump(void);
>  extern int is_fadump_active(void);
> +extern int should_fadump_crash(void);
>  extern void crash_fadump(struct pt_regs *, const char *);
>  extern void fadump_cleanup(void);
> 
>  #else	/* CONFIG_FA_DUMP */
>  static inline int is_fadump_active(void) { return 0; }
> +static inline int should_fadump_crash(void) { return 0; }
>  static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
>  #endif
>  #endif
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index da8830e49696..8a3058f5943b 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -125,6 +125,13 @@ int is_fadump_boot_memory_area(u64 addr, ulong size)
>  	return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
>  }
> 
> +int should_fadump_crash(void)
> +{
> +	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
> +		return 0;
> +	return 1;
> +}
> +
>  int is_fadump_active(void)
>  {
>  	return fw_dump.dump_active;
> @@ -518,7 +525,7 @@ void crash_fadump(struct pt_regs *regs, const char *str)
>  	struct fadump_crash_info_header *fdh = NULL;
>  	int old_cpu, this_cpu;
> 
> -	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
> +	if (!should_fadump_crash())
>  		return;
> 
>  	/*
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 574e949f8db9..2849c4f50324 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -114,6 +114,28 @@ static void pmac_backlight_unblank(void)
>  static inline void pmac_backlight_unblank(void) { }
>  #endif
> 
> +/*
> + * If oops/die is expected to crash the machine, return true here.
> + *
> + * This should not be expected to be 100% accurate, there may be
> + * notifiers registered or other unexpected conditions that may bring
> + * down the kernel. Or if the current process in the kernel is holding
> + * locks or has other critical state, the kernel may become effectively
> + * unusable anyway.
> + */
> +bool die_will_crash(void)
> +{
> +	if (should_fadump_crash())
> +		return true;
> +	if (kexec_should_crash(current))
> +		return true;
> +	if (in_interrupt() || panic_on_oops ||
> +			!current->pid || is_global_init(current))
> +		return true;
> +
> +	return false;
> +}
> +
>  static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
>  static int die_owner = -1;
>  static unsigned int die_nest_count;
> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 96436d129684..140350aacca5 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -34,6 +34,7 @@
>  #include <asm/opal.h>
>  #include <asm/firmware.h>
>  #include <asm/mce.h>
> +#include <asm/bug.h>
> 
>  #include "powernv.h"
> 
> @@ -426,17 +427,36 @@ static int opal_recover_mce(struct pt_regs *regs,
>  		/* Fatal machine check */
>  		pr_err("Machine check interrupt is fatal\n");
>  		recovered = 0;
> -	} else if ((evt->severity == MCE_SEV_ERROR_SYNC) &&
> -			(user_mode(regs) && !is_global_init(current))) {
> +	}
> +
> +	if (!recovered && evt->severity == MCE_SEV_ERROR_SYNC) {
>  		/*
> -		 * For now, kill the task if we have received exception when
> -		 * in userspace.
> +		 * Try to kill processes if we get a synchronous machine check
> +		 * (e.g., one caused by execution of this instruction). This
> +		 * will devolve into a panic if we try to kill init or are in
> +		 * an interrupt etc.
>  		 *
>  		 * TODO: Queue up this address for hwpoisioning later.
> +		 * TODO: This is not quite right for d-side machine
> +		 *       checks ->nip is not necessarily the important
> +		 *       address.
>  		 */
> -		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> -		recovered = 1;
> +		if ((user_mode(regs))) {
> +			_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> +			recovered = 1;
> +		} else if (die_will_crash()) {
> +			/*
> +			 * die() would kill the kernel, so better to go via
> +			 * the platform reboot code that will log the
> +			 * machine check.
> +			 */
> +			recovered = 0;
> +		} else {
> +			die("Machine check", regs, SIGBUS);
> +			recovered = 1;
> +		}
>  	}
> +
>  	return recovered;
>  }
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v2, 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart
  2017-07-19  6:59 ` [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart Nicholas Piggin
  2017-07-19  7:16   ` Nicholas Piggin
  2017-07-20  5:39   ` Mahesh Jagannath Salgaonkar
@ 2017-08-31 11:36   ` Michael Ellerman
  2 siblings, 0 replies; 21+ messages in thread
From: Michael Ellerman @ 2017-08-31 11:36 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
  Cc: Nicholas Piggin, Mahesh Jagannath Salgaonkar

On Wed, 2017-07-19 at 06:59:10 UTC, Nicholas Piggin wrote:
> Unrecovered MCE and HMI errors are sent through a special restart OPAL
> call to log the platform error. The downside is that they don't go
> through normal Linux crash paths, so they don't give much information
> to the Linux console.
> 
> Change this by providing a special crash function which does some of
> the console flushing from the panic() path before calling firmware to
> reboot.
> 
> The downside of this is a little more code to execute before reaching
> the firmware reboot. However in practice, it's critical to get the
> Linux console messages output in order to debug a problem. So this is
> a desirable tradeoff.
> 
> Note on the implementation: It is difficult to plumb a custom reboot
> handler into the panic path, because panic does a little bit too much
> work. For example, it will try to delay with the timebase, but that
> may be corrupted in some cases resulting in a hang without reaching
> the platform reboot. Another problem is that panic can invoke the
> crash dump code which is not what we want in the case of a hardware
> platform error. Long-term the best solution will be to rework the
> panic path so it can be suitable for this kind of panic, but for now
> we just duplicate a bit of the code.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b746e3e01e70d23ef53dcde1203ab7

cheers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2017-07-19  6:59 ` [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt Nicholas Piggin
@ 2018-10-08 15:39   ` Christophe LEROY
  2018-10-09  4:32     ` Nicholas Piggin
  0 siblings, 1 reply; 21+ messages in thread
From: Christophe LEROY @ 2018-10-08 15:39 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Mahesh Jagannath Salgaonkar

Hi Nick,

Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
> Use nmi_enter similarly to system reset interrupts. This uses NMI
> printk NMI buffers and turns off various debugging facilities that
> helps avoid tripping on ourselves or other CPUs.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/kernel/traps.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 2849c4f50324..6d31f9d7c333 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>   
>   void machine_check_exception(struct pt_regs *regs)
>   {
> -	enum ctx_state prev_state = exception_enter();
>   	int recover = 0;
> +	bool nested = in_nmi();
> +	if (!nested)
> +		nmi_enter();

This alters preempt_count, then when die() is called
in_interrupt() returns true allthough the trap didn't happen in 
interrupt, so oops_end() panics for "fatal exception in interrupt" 
instead of gently sending SIGBUS the faulting app.

Any idea on how to fix this ?

Christophe

>   
>   	__this_cpu_inc(irq_stat.mce_exceptions);
>   
> @@ -820,10 +822,11 @@ void machine_check_exception(struct pt_regs *regs)
>   
>   	/* Must die if the interrupt is not recoverable */
>   	if (!(regs->msr & MSR_RI))
> -		panic("Unrecoverable Machine check");
> +		nmi_panic(regs, "Unrecoverable Machine check");
>   
>   bail:
> -	exception_exit(prev_state);
> +	if (!nested)
> +		nmi_exit();
>   }
>   
>   void SMIException(struct pt_regs *regs)
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-08 15:39   ` Christophe LEROY
@ 2018-10-09  4:32     ` Nicholas Piggin
  2018-10-09  4:46       ` Christophe LEROY
  0 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2018-10-09  4:32 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev

On Mon, 8 Oct 2018 17:39:11 +0200
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Hi Nick,
> 
> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
> > Use nmi_enter similarly to system reset interrupts. This uses NMI
> > printk NMI buffers and turns off various debugging facilities that
> > helps avoid tripping on ourselves or other CPUs.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   arch/powerpc/kernel/traps.c | 9 ++++++---
> >   1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 2849c4f50324..6d31f9d7c333 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >   
> >   void machine_check_exception(struct pt_regs *regs)
> >   {
> > -	enum ctx_state prev_state = exception_enter();
> >   	int recover = 0;
> > +	bool nested = in_nmi();
> > +	if (!nested)
> > +		nmi_enter();  
> 
> This alters preempt_count, then when die() is called
> in_interrupt() returns true allthough the trap didn't happen in 
> interrupt, so oops_end() panics for "fatal exception in interrupt" 
> instead of gently sending SIGBUS the faulting app.

Thanks for tracking that down.

> Any idea on how to fix this ?

I would say we have to deliver the sigbus by hand.

    if ((user_mode(regs)))
        _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
    else
        die("Machine check", regs, SIGBUS);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-09  4:32     ` Nicholas Piggin
@ 2018-10-09  4:46       ` Christophe LEROY
  2018-10-09  5:30         ` Nicholas Piggin
  0 siblings, 1 reply; 21+ messages in thread
From: Christophe LEROY @ 2018-10-09  4:46 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev



Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
> On Mon, 8 Oct 2018 17:39:11 +0200
> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> 
>> Hi Nick,
>>
>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
>>> printk NMI buffers and turns off various debugging facilities that
>>> helps avoid tripping on ourselves or other CPUs.
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>>    arch/powerpc/kernel/traps.c | 9 ++++++---
>>>    1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>> index 2849c4f50324..6d31f9d7c333 100644
>>> --- a/arch/powerpc/kernel/traps.c
>>> +++ b/arch/powerpc/kernel/traps.c
>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>>>    
>>>    void machine_check_exception(struct pt_regs *regs)
>>>    {
>>> -	enum ctx_state prev_state = exception_enter();
>>>    	int recover = 0;
>>> +	bool nested = in_nmi();
>>> +	if (!nested)
>>> +		nmi_enter();
>>
>> This alters preempt_count, then when die() is called
>> in_interrupt() returns true allthough the trap didn't happen in
>> interrupt, so oops_end() panics for "fatal exception in interrupt"
>> instead of gently sending SIGBUS the faulting app.
> 
> Thanks for tracking that down.
> 
>> Any idea on how to fix this ?
> 
> I would say we have to deliver the sigbus by hand.
> 
>      if ((user_mode(regs)))
>          _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>      else
>          die("Machine check", regs, SIGBUS);
> 

And what about all the other things done by 'die()' ?

And what if it is a kernel thread ?

In one of my boards, I have a kernel thread regularly checking the HW, 
and if it gets a machine check I expect it to gently stop and the die 
notification to be delivered to all registered notifiers.

Until before this patch, it was working well.

Christophe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-09  4:46       ` Christophe LEROY
@ 2018-10-09  5:30         ` Nicholas Piggin
  2018-10-09  9:36           ` Christophe Leroy
  0 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2018-10-09  5:30 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev

On Tue, 9 Oct 2018 06:46:30 +0200
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
> > On Mon, 8 Oct 2018 17:39:11 +0200
> > Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >   
> >> Hi Nick,
> >>
> >> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :  
> >>> Use nmi_enter similarly to system reset interrupts. This uses NMI
> >>> printk NMI buffers and turns off various debugging facilities that
> >>> helps avoid tripping on ourselves or other CPUs.
> >>>
> >>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>> ---
> >>>    arch/powerpc/kernel/traps.c | 9 ++++++---
> >>>    1 file changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>> index 2849c4f50324..6d31f9d7c333 100644
> >>> --- a/arch/powerpc/kernel/traps.c
> >>> +++ b/arch/powerpc/kernel/traps.c
> >>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >>>    
> >>>    void machine_check_exception(struct pt_regs *regs)
> >>>    {
> >>> -	enum ctx_state prev_state = exception_enter();
> >>>    	int recover = 0;
> >>> +	bool nested = in_nmi();
> >>> +	if (!nested)
> >>> +		nmi_enter();  
> >>
> >> This alters preempt_count, then when die() is called
> >> in_interrupt() returns true allthough the trap didn't happen in
> >> interrupt, so oops_end() panics for "fatal exception in interrupt"
> >> instead of gently sending SIGBUS the faulting app.  
> > 
> > Thanks for tracking that down.
> >   
> >> Any idea on how to fix this ?  
> > 
> > I would say we have to deliver the sigbus by hand.
> > 
> >      if ((user_mode(regs)))
> >          _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> >      else
> >          die("Machine check", regs, SIGBUS);
> >   
> 
> And what about all the other things done by 'die()' ?
> 
> And what if it is a kernel thread ?
> 
> In one of my boards, I have a kernel thread regularly checking the HW, 
> and if it gets a machine check I expect it to gently stop and the die 
> notification to be delivered to all registered notifiers.
> 
> Until before this patch, it was working well.

I guess the alternative is we could check regs->trap for machine
check in the die test. Complication is having to account for MCE
in an interrupt handler.

       if (in_interrupt()) {
                if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
                    panic("Fatal exception in interrupt");
       }

Something like that might work for you? We needs a ppc64 macro for the
MCE, and can probably add something like in_nmi_from_interrupt() for
the second part of the test.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-09  5:30         ` Nicholas Piggin
@ 2018-10-09  9:36           ` Christophe Leroy
  2018-10-09 11:16             ` Nicholas Piggin
  0 siblings, 1 reply; 21+ messages in thread
From: Christophe Leroy @ 2018-10-09  9:36 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev



On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
> On Tue, 9 Oct 2018 06:46:30 +0200
> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> 
>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
>>> On Mon, 8 Oct 2018 17:39:11 +0200
>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>    
>>>> Hi Nick,
>>>>
>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
>>>>> printk NMI buffers and turns off various debugging facilities that
>>>>> helps avoid tripping on ourselves or other CPUs.
>>>>>
>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>> ---
>>>>>     arch/powerpc/kernel/traps.c | 9 ++++++---
>>>>>     1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>>>> index 2849c4f50324..6d31f9d7c333 100644
>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>>>>>     
>>>>>     void machine_check_exception(struct pt_regs *regs)
>>>>>     {
>>>>> -	enum ctx_state prev_state = exception_enter();
>>>>>     	int recover = 0;
>>>>> +	bool nested = in_nmi();
>>>>> +	if (!nested)
>>>>> +		nmi_enter();
>>>>
>>>> This alters preempt_count, then when die() is called
>>>> in_interrupt() returns true allthough the trap didn't happen in
>>>> interrupt, so oops_end() panics for "fatal exception in interrupt"
>>>> instead of gently sending SIGBUS the faulting app.
>>>
>>> Thanks for tracking that down.
>>>    
>>>> Any idea on how to fix this ?
>>>
>>> I would say we have to deliver the sigbus by hand.
>>>
>>>       if ((user_mode(regs)))
>>>           _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>>>       else
>>>           die("Machine check", regs, SIGBUS);
>>>    
>>
>> And what about all the other things done by 'die()' ?
>>
>> And what if it is a kernel thread ?
>>
>> In one of my boards, I have a kernel thread regularly checking the HW,
>> and if it gets a machine check I expect it to gently stop and the die
>> notification to be delivered to all registered notifiers.
>>
>> Until before this patch, it was working well.
> 
> I guess the alternative is we could check regs->trap for machine
> check in the die test. Complication is having to account for MCE
> in an interrupt handler.
> 
>         if (in_interrupt()) {
>                  if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
>                      panic("Fatal exception in interrupt");
>         }
> 
> Something like that might work for you? We needs a ppc64 macro for the
> MCE, and can probably add something like in_nmi_from_interrupt() for
> the second part of the test.

Don't know, I'm away from home on business trip so I won't be able to 
test anything before next week. However it looks more or less like a 
hack, doesn't it ?

What about the following ?

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index fd58749b4d6b..1f09033a5103 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -208,7 +208,7 @@ static unsigned long oops_begin(struct pt_regs *regs)
  NOKPROBE_SYMBOL(oops_begin);

  static void oops_end(unsigned long flags, struct pt_regs *regs,
-			       int signr)
+		     int signr, bool is_in_interrupt)
  {
  	bust_spinlocks(0);
  	add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
@@ -247,7 +247,7 @@ static void oops_end(unsigned long flags, struct 
pt_regs *regs,
  		mdelay(MSEC_PER_SEC);
  	}

-	if (in_interrupt())
+	if (is_in_interrupt)
  		panic("Fatal exception in interrupt");
  	if (panic_on_oops)
  		panic("Fatal exception");
@@ -288,7 +288,7 @@ static int __die(const char *str, struct pt_regs 
*regs, long err)
  }
  NOKPROBE_SYMBOL(__die);

-void die(const char *str, struct pt_regs *regs, long err)
+static void nmi_die(const char *str, struct pt_regs *regs, long err, 
bool is_in_interrupt)
  {
  	unsigned long flags;

@@ -303,7 +303,13 @@ void die(const char *str, struct pt_regs *regs, 
long err)
  	flags = oops_begin(regs);
  	if (__die(str, regs, err))
  		err = 0;
-	oops_end(flags, regs, err);
+	oops_end(flags, regs, err, is_in_interrupt);
+}
+NOKPROBE_SYMBOL(nmi_die);
+
+void die(const char *str, struct pt_regs *regs, long err)
+{
+	nmi_die(str, regs, err, in_interrupt());
  }
  NOKPROBE_SYMBOL(die);

@@ -737,6 +743,7 @@ int machine_check_generic(struct pt_regs *regs)
  void machine_check_exception(struct pt_regs *regs)
  {
  	int recover = 0;
+	bool is_in_interrupt = in_interrupt();
  	bool nested = in_nmi();
  	if (!nested)
  		nmi_enter();
@@ -765,7 +772,7 @@ void machine_check_exception(struct pt_regs *regs)
  	if (check_io_access(regs))
  		goto bail;

-	die("Machine check", regs, SIGBUS);
+	nmi_die("Machine check", regs, SIGBUS, is_in_interrupt);

  	/* Must die if the interrupt is not recoverable */
  	if (!(regs->msr & MSR_RI))


Thanks
Christophe

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-09  9:36           ` Christophe Leroy
@ 2018-10-09 11:16             ` Nicholas Piggin
  2018-10-09 12:01               ` Christophe LEROY
  2018-10-11 14:31               ` Christophe LEROY
  0 siblings, 2 replies; 21+ messages in thread
From: Nicholas Piggin @ 2018-10-09 11:16 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev

On Tue, 9 Oct 2018 09:36:18 +0000
Christophe Leroy <christophe.leroy@c-s.fr> wrote:

> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
> > On Tue, 9 Oct 2018 06:46:30 +0200
> > Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >   
> >> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :  
> >>> On Mon, 8 Oct 2018 17:39:11 +0200
> >>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >>>      
> >>>> Hi Nick,
> >>>>
> >>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :  
> >>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
> >>>>> printk NMI buffers and turns off various debugging facilities that
> >>>>> helps avoid tripping on ourselves or other CPUs.
> >>>>>
> >>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>>> ---
> >>>>>     arch/powerpc/kernel/traps.c | 9 ++++++---
> >>>>>     1 file changed, 6 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>>>> index 2849c4f50324..6d31f9d7c333 100644
> >>>>> --- a/arch/powerpc/kernel/traps.c
> >>>>> +++ b/arch/powerpc/kernel/traps.c
> >>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >>>>>     
> >>>>>     void machine_check_exception(struct pt_regs *regs)
> >>>>>     {
> >>>>> -	enum ctx_state prev_state = exception_enter();
> >>>>>     	int recover = 0;
> >>>>> +	bool nested = in_nmi();
> >>>>> +	if (!nested)
> >>>>> +		nmi_enter();  
> >>>>
> >>>> This alters preempt_count, then when die() is called
> >>>> in_interrupt() returns true allthough the trap didn't happen in
> >>>> interrupt, so oops_end() panics for "fatal exception in interrupt"
> >>>> instead of gently sending SIGBUS the faulting app.  
> >>>
> >>> Thanks for tracking that down.
> >>>      
> >>>> Any idea on how to fix this ?  
> >>>
> >>> I would say we have to deliver the sigbus by hand.
> >>>
> >>>       if ((user_mode(regs)))
> >>>           _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> >>>       else
> >>>           die("Machine check", regs, SIGBUS);
> >>>      
> >>
> >> And what about all the other things done by 'die()' ?
> >>
> >> And what if it is a kernel thread ?
> >>
> >> In one of my boards, I have a kernel thread regularly checking the HW,
> >> and if it gets a machine check I expect it to gently stop and the die
> >> notification to be delivered to all registered notifiers.
> >>
> >> Until before this patch, it was working well.  
> > 
> > I guess the alternative is we could check regs->trap for machine
> > check in the die test. Complication is having to account for MCE
> > in an interrupt handler.
> > 
> >         if (in_interrupt()) {
> >                  if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
> >                      panic("Fatal exception in interrupt");
> >         }
> > 
> > Something like that might work for you? We needs a ppc64 macro for the
> > MCE, and can probably add something like in_nmi_from_interrupt() for
> > the second part of the test.  
> 
> Don't know, I'm away from home on business trip so I won't be able to 
> test anything before next week. However it looks more or less like a 
> hack, doesn't it ?

I thought it seemed okay (with the right functions added). Actually it
could be a bit nicer to do this, then it works generally :

         if (in_interrupt()) {
                  if (!in_nmi() || in_nmi_from_interrupt())
                      panic("Fatal exception in interrupt");
         }

> 
> What about the following ?

Hmm, in some ways maybe it's nicer. One complication is I would like the
same thing to be available for platform specific machine check
handlers, so then you need to pass is_in_interrupt to them. Which you
can do without any problem... But is it cleaner than the above?

I guess one advantage of yours is that a BUG somewhere in the NMI path
will panic the system. Or is that a disadvantage?

Thanks,
Nick


> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index fd58749b4d6b..1f09033a5103 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -208,7 +208,7 @@ static unsigned long oops_begin(struct pt_regs *regs)
>   NOKPROBE_SYMBOL(oops_begin);
> 
>   static void oops_end(unsigned long flags, struct pt_regs *regs,
> -			       int signr)
> +		     int signr, bool is_in_interrupt)
>   {
>   	bust_spinlocks(0);
>   	add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
> @@ -247,7 +247,7 @@ static void oops_end(unsigned long flags, struct 
> pt_regs *regs,
>   		mdelay(MSEC_PER_SEC);
>   	}
> 
> -	if (in_interrupt())
> +	if (is_in_interrupt)
>   		panic("Fatal exception in interrupt");
>   	if (panic_on_oops)
>   		panic("Fatal exception");
> @@ -288,7 +288,7 @@ static int __die(const char *str, struct pt_regs 
> *regs, long err)
>   }
>   NOKPROBE_SYMBOL(__die);
> 
> -void die(const char *str, struct pt_regs *regs, long err)
> +static void nmi_die(const char *str, struct pt_regs *regs, long err, 
> bool is_in_interrupt)
>   {
>   	unsigned long flags;
> 
> @@ -303,7 +303,13 @@ void die(const char *str, struct pt_regs *regs, 
> long err)
>   	flags = oops_begin(regs);
>   	if (__die(str, regs, err))
>   		err = 0;
> -	oops_end(flags, regs, err);
> +	oops_end(flags, regs, err, is_in_interrupt);
> +}
> +NOKPROBE_SYMBOL(nmi_die);
> +
> +void die(const char *str, struct pt_regs *regs, long err)
> +{
> +	nmi_die(str, regs, err, in_interrupt());
>   }
>   NOKPROBE_SYMBOL(die);
> 
> @@ -737,6 +743,7 @@ int machine_check_generic(struct pt_regs *regs)
>   void machine_check_exception(struct pt_regs *regs)
>   {
>   	int recover = 0;
> +	bool is_in_interrupt = in_interrupt();
>   	bool nested = in_nmi();
>   	if (!nested)
>   		nmi_enter();
> @@ -765,7 +772,7 @@ void machine_check_exception(struct pt_regs *regs)
>   	if (check_io_access(regs))
>   		goto bail;
> 
> -	die("Machine check", regs, SIGBUS);
> +	nmi_die("Machine check", regs, SIGBUS, is_in_interrupt);
> 
>   	/* Must die if the interrupt is not recoverable */
>   	if (!(regs->msr & MSR_RI))
> 
> 
> Thanks
> Christophe


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-09 11:16             ` Nicholas Piggin
@ 2018-10-09 12:01               ` Christophe LEROY
  2018-10-09 12:14                 ` Nicholas Piggin
  2018-10-11 14:31               ` Christophe LEROY
  1 sibling, 1 reply; 21+ messages in thread
From: Christophe LEROY @ 2018-10-09 12:01 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev



Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :
> On Tue, 9 Oct 2018 09:36:18 +0000
> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
>>> On Tue, 9 Oct 2018 06:46:30 +0200
>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>    
>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
>>>>> On Mon, 8 Oct 2018 17:39:11 +0200
>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>>>       
>>>>>> Hi Nick,
>>>>>>
>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
>>>>>>> printk NMI buffers and turns off various debugging facilities that
>>>>>>> helps avoid tripping on ourselves or other CPUs.
>>>>>>>
>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>>> ---
>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---
>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>>>>>> index 2849c4f50324..6d31f9d7c333 100644
>>>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>>>>>>>      
>>>>>>>      void machine_check_exception(struct pt_regs *regs)
>>>>>>>      {
>>>>>>> -	enum ctx_state prev_state = exception_enter();
>>>>>>>      	int recover = 0;
>>>>>>> +	bool nested = in_nmi();
>>>>>>> +	if (!nested)
>>>>>>> +		nmi_enter();
>>>>>>
>>>>>> This alters preempt_count, then when die() is called
>>>>>> in_interrupt() returns true allthough the trap didn't happen in
>>>>>> interrupt, so oops_end() panics for "fatal exception in interrupt"
>>>>>> instead of gently sending SIGBUS the faulting app.
>>>>>
>>>>> Thanks for tracking that down.
>>>>>       
>>>>>> Any idea on how to fix this ?
>>>>>
>>>>> I would say we have to deliver the sigbus by hand.
>>>>>
>>>>>        if ((user_mode(regs)))
>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>>>>>        else
>>>>>            die("Machine check", regs, SIGBUS);
>>>>>       
>>>>
>>>> And what about all the other things done by 'die()' ?
>>>>
>>>> And what if it is a kernel thread ?
>>>>
>>>> In one of my boards, I have a kernel thread regularly checking the HW,
>>>> and if it gets a machine check I expect it to gently stop and the die
>>>> notification to be delivered to all registered notifiers.
>>>>
>>>> Until before this patch, it was working well.
>>>
>>> I guess the alternative is we could check regs->trap for machine
>>> check in the die test. Complication is having to account for MCE
>>> in an interrupt handler.
>>>
>>>          if (in_interrupt()) {
>>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
>>>                       panic("Fatal exception in interrupt");
>>>          }
>>>
>>> Something like that might work for you? We needs a ppc64 macro for the
>>> MCE, and can probably add something like in_nmi_from_interrupt() for
>>> the second part of the test.
>>
>> Don't know, I'm away from home on business trip so I won't be able to
>> test anything before next week. However it looks more or less like a
>> hack, doesn't it ?
> 
> I thought it seemed okay (with the right functions added). Actually it
> could be a bit nicer to do this, then it works generally :
> 
>           if (in_interrupt()) {
>                    if (!in_nmi() || in_nmi_from_interrupt())
>                        panic("Fatal exception in interrupt");
>           }


Yes looks nice, but:
1/ what is in_nmi_from_interrupt() ? Is it (in_nmi() && (in_irq() || 
in_softirq()) ?
2/ what about in_nmi_from_nmi(), how do we detect that ?

Christophe

> 
>>
>> What about the following ?
> 
> Hmm, in some ways maybe it's nicer. One complication is I would like the
> same thing to be available for platform specific machine check
> handlers, so then you need to pass is_in_interrupt to them. Which you
> can do without any problem... But is it cleaner than the above?
> 
> I guess one advantage of yours is that a BUG somewhere in the NMI path
> will panic the system. Or is that a disadvantage?
> 
> Thanks,
> Nick
> 
> 
>>
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index fd58749b4d6b..1f09033a5103 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -208,7 +208,7 @@ static unsigned long oops_begin(struct pt_regs *regs)
>>    NOKPROBE_SYMBOL(oops_begin);
>>
>>    static void oops_end(unsigned long flags, struct pt_regs *regs,
>> -			       int signr)
>> +		     int signr, bool is_in_interrupt)
>>    {
>>    	bust_spinlocks(0);
>>    	add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
>> @@ -247,7 +247,7 @@ static void oops_end(unsigned long flags, struct
>> pt_regs *regs,
>>    		mdelay(MSEC_PER_SEC);
>>    	}
>>
>> -	if (in_interrupt())
>> +	if (is_in_interrupt)
>>    		panic("Fatal exception in interrupt");
>>    	if (panic_on_oops)
>>    		panic("Fatal exception");
>> @@ -288,7 +288,7 @@ static int __die(const char *str, struct pt_regs
>> *regs, long err)
>>    }
>>    NOKPROBE_SYMBOL(__die);
>>
>> -void die(const char *str, struct pt_regs *regs, long err)
>> +static void nmi_die(const char *str, struct pt_regs *regs, long err,
>> bool is_in_interrupt)
>>    {
>>    	unsigned long flags;
>>
>> @@ -303,7 +303,13 @@ void die(const char *str, struct pt_regs *regs,
>> long err)
>>    	flags = oops_begin(regs);
>>    	if (__die(str, regs, err))
>>    		err = 0;
>> -	oops_end(flags, regs, err);
>> +	oops_end(flags, regs, err, is_in_interrupt);
>> +}
>> +NOKPROBE_SYMBOL(nmi_die);
>> +
>> +void die(const char *str, struct pt_regs *regs, long err)
>> +{
>> +	nmi_die(str, regs, err, in_interrupt());
>>    }
>>    NOKPROBE_SYMBOL(die);
>>
>> @@ -737,6 +743,7 @@ int machine_check_generic(struct pt_regs *regs)
>>    void machine_check_exception(struct pt_regs *regs)
>>    {
>>    	int recover = 0;
>> +	bool is_in_interrupt = in_interrupt();
>>    	bool nested = in_nmi();
>>    	if (!nested)
>>    		nmi_enter();
>> @@ -765,7 +772,7 @@ void machine_check_exception(struct pt_regs *regs)
>>    	if (check_io_access(regs))
>>    		goto bail;
>>
>> -	die("Machine check", regs, SIGBUS);
>> +	nmi_die("Machine check", regs, SIGBUS, is_in_interrupt);
>>
>>    	/* Must die if the interrupt is not recoverable */
>>    	if (!(regs->msr & MSR_RI))
>>
>>
>> Thanks
>> Christophe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-09 12:01               ` Christophe LEROY
@ 2018-10-09 12:14                 ` Nicholas Piggin
  2018-10-11 14:23                   ` Christophe LEROY
  0 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2018-10-09 12:14 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev

On Tue, 9 Oct 2018 14:01:37 +0200
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :
> > On Tue, 9 Oct 2018 09:36:18 +0000
> > Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >   
> >> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:  
> >>> On Tue, 9 Oct 2018 06:46:30 +0200
> >>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >>>      
> >>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :  
> >>>>> On Mon, 8 Oct 2018 17:39:11 +0200
> >>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >>>>>         
> >>>>>> Hi Nick,
> >>>>>>
> >>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :  
> >>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
> >>>>>>> printk NMI buffers and turns off various debugging facilities that
> >>>>>>> helps avoid tripping on ourselves or other CPUs.
> >>>>>>>
> >>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>>>>> ---
> >>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---
> >>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>>>>>> index 2849c4f50324..6d31f9d7c333 100644
> >>>>>>> --- a/arch/powerpc/kernel/traps.c
> >>>>>>> +++ b/arch/powerpc/kernel/traps.c
> >>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >>>>>>>      
> >>>>>>>      void machine_check_exception(struct pt_regs *regs)
> >>>>>>>      {
> >>>>>>> -	enum ctx_state prev_state = exception_enter();
> >>>>>>>      	int recover = 0;
> >>>>>>> +	bool nested = in_nmi();
> >>>>>>> +	if (!nested)
> >>>>>>> +		nmi_enter();  
> >>>>>>
> >>>>>> This alters preempt_count, then when die() is called
> >>>>>> in_interrupt() returns true allthough the trap didn't happen in
> >>>>>> interrupt, so oops_end() panics for "fatal exception in interrupt"
> >>>>>> instead of gently sending SIGBUS the faulting app.  
> >>>>>
> >>>>> Thanks for tracking that down.
> >>>>>         
> >>>>>> Any idea on how to fix this ?  
> >>>>>
> >>>>> I would say we have to deliver the sigbus by hand.
> >>>>>
> >>>>>        if ((user_mode(regs)))
> >>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> >>>>>        else
> >>>>>            die("Machine check", regs, SIGBUS);
> >>>>>         
> >>>>
> >>>> And what about all the other things done by 'die()' ?
> >>>>
> >>>> And what if it is a kernel thread ?
> >>>>
> >>>> In one of my boards, I have a kernel thread regularly checking the HW,
> >>>> and if it gets a machine check I expect it to gently stop and the die
> >>>> notification to be delivered to all registered notifiers.
> >>>>
> >>>> Until before this patch, it was working well.  
> >>>
> >>> I guess the alternative is we could check regs->trap for machine
> >>> check in the die test. Complication is having to account for MCE
> >>> in an interrupt handler.
> >>>
> >>>          if (in_interrupt()) {
> >>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
> >>>                       panic("Fatal exception in interrupt");
> >>>          }
> >>>
> >>> Something like that might work for you? We needs a ppc64 macro for the
> >>> MCE, and can probably add something like in_nmi_from_interrupt() for
> >>> the second part of the test.  
> >>
> >> Don't know, I'm away from home on business trip so I won't be able to
> >> test anything before next week. However it looks more or less like a
> >> hack, doesn't it ?  
> > 
> > I thought it seemed okay (with the right functions added). Actually it
> > could be a bit nicer to do this, then it works generally :
> > 
> >           if (in_interrupt()) {
> >                    if (!in_nmi() || in_nmi_from_interrupt())
> >                        panic("Fatal exception in interrupt");
> >           }  
> 
> 
> Yes looks nice, but:
> 1/ what is in_nmi_from_interrupt() ? Is it (in_nmi() && (in_irq() || 
> in_softirq()) ?

  return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET))) != 0;

(basically just in_interrupt() with the nmi_enter undone)

> 2/ what about in_nmi_from_nmi(), how do we detect that ?

Oh good point, I'm not sure. I guess we could irq_enter() in the
nested case, I think that would make in_nmi_from_interrupt()
return true.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-09 12:14                 ` Nicholas Piggin
@ 2018-10-11 14:23                   ` Christophe LEROY
  0 siblings, 0 replies; 21+ messages in thread
From: Christophe LEROY @ 2018-10-11 14:23 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev



Le 09/10/2018 à 14:14, Nicholas Piggin a écrit :
> On Tue, 9 Oct 2018 14:01:37 +0200
> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> 
>> Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :
>>> On Tue, 9 Oct 2018 09:36:18 +0000
>>> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>    
>>>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
>>>>> On Tue, 9 Oct 2018 06:46:30 +0200
>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>>>       
>>>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
>>>>>>> On Mon, 8 Oct 2018 17:39:11 +0200
>>>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>>>>>          
>>>>>>>> Hi Nick,
>>>>>>>>
>>>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
>>>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
>>>>>>>>> printk NMI buffers and turns off various debugging facilities that
>>>>>>>>> helps avoid tripping on ourselves or other CPUs.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>>>>> ---
>>>>>>>>>       arch/powerpc/kernel/traps.c | 9 ++++++---
>>>>>>>>>       1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>>>>>>>> index 2849c4f50324..6d31f9d7c333 100644
>>>>>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>>>>>>>>>       
>>>>>>>>>       void machine_check_exception(struct pt_regs *regs)
>>>>>>>>>       {
>>>>>>>>> -	enum ctx_state prev_state = exception_enter();
>>>>>>>>>       	int recover = 0;
>>>>>>>>> +	bool nested = in_nmi();
>>>>>>>>> +	if (!nested)
>>>>>>>>> +		nmi_enter();
>>>>>>>>
>>>>>>>> This alters preempt_count, then when die() is called
>>>>>>>> in_interrupt() returns true allthough the trap didn't happen in
>>>>>>>> interrupt, so oops_end() panics for "fatal exception in interrupt"
>>>>>>>> instead of gently sending SIGBUS the faulting app.
>>>>>>>
>>>>>>> Thanks for tracking that down.
>>>>>>>          
>>>>>>>> Any idea on how to fix this ?
>>>>>>>
>>>>>>> I would say we have to deliver the sigbus by hand.
>>>>>>>
>>>>>>>         if ((user_mode(regs)))
>>>>>>>             _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>>>>>>>         else
>>>>>>>             die("Machine check", regs, SIGBUS);
>>>>>>>          
>>>>>>
>>>>>> And what about all the other things done by 'die()' ?
>>>>>>
>>>>>> And what if it is a kernel thread ?
>>>>>>
>>>>>> In one of my boards, I have a kernel thread regularly checking the HW,
>>>>>> and if it gets a machine check I expect it to gently stop and the die
>>>>>> notification to be delivered to all registered notifiers.
>>>>>>
>>>>>> Until before this patch, it was working well.
>>>>>
>>>>> I guess the alternative is we could check regs->trap for machine
>>>>> check in the die test. Complication is having to account for MCE
>>>>> in an interrupt handler.
>>>>>
>>>>>           if (in_interrupt()) {
>>>>>                    if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
>>>>>                        panic("Fatal exception in interrupt");
>>>>>           }
>>>>>
>>>>> Something like that might work for you? We needs a ppc64 macro for the
>>>>> MCE, and can probably add something like in_nmi_from_interrupt() for
>>>>> the second part of the test.
>>>>
>>>> Don't know, I'm away from home on business trip so I won't be able to
>>>> test anything before next week. However it looks more or less like a
>>>> hack, doesn't it ?
>>>
>>> I thought it seemed okay (with the right functions added). Actually it
>>> could be a bit nicer to do this, then it works generally :
>>>
>>>            if (in_interrupt()) {
>>>                     if (!in_nmi() || in_nmi_from_interrupt())
>>>                         panic("Fatal exception in interrupt");
>>>            }
>>
>>
>> Yes looks nice, but:
>> 1/ what is in_nmi_from_interrupt() ? Is it (in_nmi() && (in_irq() ||
>> in_softirq()) ?
> 
>    return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET))) != 0;
> 
> (basically just in_interrupt() with the nmi_enter undone)
> 
>> 2/ what about in_nmi_from_nmi(), how do we detect that ?
> 
> Oh good point, I'm not sure. I guess we could irq_enter() in the
> nested case, I think that would make in_nmi_from_interrupt()
> return true.

Yes we could, but I find it ugly.

Don't you think it looks less strange to just check in_interrupt() 
before calling nmi_enter()  ?

Christophe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-09 11:16             ` Nicholas Piggin
  2018-10-09 12:01               ` Christophe LEROY
@ 2018-10-11 14:31               ` Christophe LEROY
  2018-10-13  8:29                 ` Christophe Leroy
  1 sibling, 1 reply; 21+ messages in thread
From: Christophe LEROY @ 2018-10-11 14:31 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev



Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :
> On Tue, 9 Oct 2018 09:36:18 +0000
> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
>>> On Tue, 9 Oct 2018 06:46:30 +0200
>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>    
>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
>>>>> On Mon, 8 Oct 2018 17:39:11 +0200
>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>>>       
>>>>>> Hi Nick,
>>>>>>
>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
>>>>>>> printk NMI buffers and turns off various debugging facilities that
>>>>>>> helps avoid tripping on ourselves or other CPUs.
>>>>>>>
>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>>> ---
>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---
>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>>>>>> index 2849c4f50324..6d31f9d7c333 100644
>>>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>>>>>>>      
>>>>>>>      void machine_check_exception(struct pt_regs *regs)
>>>>>>>      {
>>>>>>> -	enum ctx_state prev_state = exception_enter();
>>>>>>>      	int recover = 0;
>>>>>>> +	bool nested = in_nmi();
>>>>>>> +	if (!nested)
>>>>>>> +		nmi_enter();
>>>>>>
>>>>>> This alters preempt_count, then when die() is called
>>>>>> in_interrupt() returns true allthough the trap didn't happen in
>>>>>> interrupt, so oops_end() panics for "fatal exception in interrupt"
>>>>>> instead of gently sending SIGBUS the faulting app.
>>>>>
>>>>> Thanks for tracking that down.
>>>>>       
>>>>>> Any idea on how to fix this ?
>>>>>
>>>>> I would say we have to deliver the sigbus by hand.
>>>>>
>>>>>        if ((user_mode(regs)))
>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>>>>>        else
>>>>>            die("Machine check", regs, SIGBUS);
>>>>>       
>>>>
>>>> And what about all the other things done by 'die()' ?
>>>>
>>>> And what if it is a kernel thread ?
>>>>
>>>> In one of my boards, I have a kernel thread regularly checking the HW,
>>>> and if it gets a machine check I expect it to gently stop and the die
>>>> notification to be delivered to all registered notifiers.
>>>>
>>>> Until before this patch, it was working well.
>>>
>>> I guess the alternative is we could check regs->trap for machine
>>> check in the die test. Complication is having to account for MCE
>>> in an interrupt handler.
>>>
>>>          if (in_interrupt()) {
>>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
>>>                       panic("Fatal exception in interrupt");
>>>          }
>>>
>>> Something like that might work for you? We needs a ppc64 macro for the
>>> MCE, and can probably add something like in_nmi_from_interrupt() for
>>> the second part of the test.
>>
>> Don't know, I'm away from home on business trip so I won't be able to
>> test anything before next week. However it looks more or less like a
>> hack, doesn't it ?
> 
> I thought it seemed okay (with the right functions added). Actually it
> could be a bit nicer to do this, then it works generally :
> 
>           if (in_interrupt()) {
>                    if (!in_nmi() || in_nmi_from_interrupt())
>                        panic("Fatal exception in interrupt");
>           }
> 
>>
>> What about the following ?
> 
> Hmm, in some ways maybe it's nicer. One complication is I would like the
> same thing to be available for platform specific machine check
> handlers, so then you need to pass is_in_interrupt to them. Which you
> can do without any problem... But is it cleaner than the above?

For me it looks cleaner than twiddle the preempt_count depending on 
whether we were or not already in nmi() .

Let's draft something and see what it looks like.


> 
> I guess one advantage of yours is that a BUG somewhere in the NMI path
> will panic the system. Or is that a disadvantage?

Why would it panic the system more than now ? And is it an issue at all 
? Doesn't BUG() panic in any case ?

Christophe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-11 14:31               ` Christophe LEROY
@ 2018-10-13  8:29                 ` Christophe Leroy
  2018-10-13  8:48                   ` Nicholas Piggin
  0 siblings, 1 reply; 21+ messages in thread
From: Christophe Leroy @ 2018-10-13  8:29 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev



On 10/11/2018 02:31 PM, Christophe LEROY wrote:
> 
> 
> Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :
>> On Tue, 9 Oct 2018 09:36:18 +0000
>> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>
>>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
>>>> On Tue, 9 Oct 2018 06:46:30 +0200
>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
>>>>>> On Mon, 8 Oct 2018 17:39:11 +0200
>>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>>>>> Hi Nick,
>>>>>>>
>>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
>>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
>>>>>>>> printk NMI buffers and turns off various debugging facilities that
>>>>>>>> helps avoid tripping on ourselves or other CPUs.
>>>>>>>>
>>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>>>> ---
>>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---
>>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/powerpc/kernel/traps.c 
>>>>>>>> b/arch/powerpc/kernel/traps.c
>>>>>>>> index 2849c4f50324..6d31f9d7c333 100644
>>>>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs 
>>>>>>>> *regs)
>>>>>>>>      void machine_check_exception(struct pt_regs *regs)
>>>>>>>>      {
>>>>>>>> -    enum ctx_state prev_state = exception_enter();
>>>>>>>>          int recover = 0;
>>>>>>>> +    bool nested = in_nmi();
>>>>>>>> +    if (!nested)
>>>>>>>> +        nmi_enter();
>>>>>>>
>>>>>>> This alters preempt_count, then when die() is called
>>>>>>> in_interrupt() returns true allthough the trap didn't happen in
>>>>>>> interrupt, so oops_end() panics for "fatal exception in interrupt"
>>>>>>> instead of gently sending SIGBUS the faulting app.
>>>>>>
>>>>>> Thanks for tracking that down.
>>>>>>> Any idea on how to fix this ?
>>>>>>
>>>>>> I would say we have to deliver the sigbus by hand.
>>>>>>
>>>>>>        if ((user_mode(regs)))
>>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>>>>>>        else
>>>>>>            die("Machine check", regs, SIGBUS);
>>>>>
>>>>> And what about all the other things done by 'die()' ?
>>>>>
>>>>> And what if it is a kernel thread ?
>>>>>
>>>>> In one of my boards, I have a kernel thread regularly checking the HW,
>>>>> and if it gets a machine check I expect it to gently stop and the die
>>>>> notification to be delivered to all registered notifiers.
>>>>>
>>>>> Until before this patch, it was working well.
>>>>
>>>> I guess the alternative is we could check regs->trap for machine
>>>> check in the die test. Complication is having to account for MCE
>>>> in an interrupt handler.
>>>>
>>>>          if (in_interrupt()) {
>>>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - 
>>>> (NMI_OFFSET + HARDIRQ_OFFSET)))
>>>>                       panic("Fatal exception in interrupt");
>>>>          }
>>>>
>>>> Something like that might work for you? We needs a ppc64 macro for the
>>>> MCE, and can probably add something like in_nmi_from_interrupt() for
>>>> the second part of the test.
>>>
>>> Don't know, I'm away from home on business trip so I won't be able to
>>> test anything before next week. However it looks more or less like a
>>> hack, doesn't it ?
>>
>> I thought it seemed okay (with the right functions added). Actually it
>> could be a bit nicer to do this, then it works generally :
>>
>>           if (in_interrupt()) {
>>                    if (!in_nmi() || in_nmi_from_interrupt())
>>                        panic("Fatal exception in interrupt");
>>           }
>>
>>>
>>> What about the following ?
>>
>> Hmm, in some ways maybe it's nicer. One complication is I would like the
>> same thing to be available for platform specific machine check
>> handlers, so then you need to pass is_in_interrupt to them. Which you
>> can do without any problem... But is it cleaner than the above?
> 
> For me it looks cleaner than twiddle the preempt_count depending on 
> whether we were or not already in nmi() .
> 
> Let's draft something and see what it looks like.

Ok, finaly I went to your solution, see below, as it avoids having to 
modify all subarch and platform specific machine check handlers.

Unfortunately it doesn't solves the issue, it only delays it:

oops_end() calls do_exit(), which has the following test:

	if (unlikely(in_interrupt()))
		panic("Aiee, killing interrupt handler!");


So at the time being I still have no idea how to fix that, have you ?

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index fd58749b4d6b..3569e826f0c2 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -132,6 +132,21 @@ static void pmac_backlight_unblank(void)
  static inline void pmac_backlight_unblank(void) { }
  #endif

+static bool from_interrupt(void)
+{
+	if (!in_nmi())
+		return in_interrupt();
+	/*
+	 * if we are in NMI, we need to determine if we were already in
+	 * interrupt before entering NMI. To do that, we recalculate irq_count()
+	 * from before the call to nmi_enter().
+	 * If we were already in NMI and reentered in a new one, we have
+	 * increased the preempt count by HARDIRQ_OFFSET, so the calculated
+	 * value will be not null
+	 */
+	return irq_count() - NMI_OFFSET - HARDIRQ_OFFSET;
+}
+
  /*
   * If oops/die is expected to crash the machine, return true here.
   *
@@ -147,8 +162,7 @@ bool die_will_crash(void)
  		return true;
  	if (kexec_should_crash(current))
  		return true;
-	if (in_interrupt() || panic_on_oops ||
-			!current->pid || is_global_init(current))
+	if (from_interrupt() || panic_on_oops || !current->pid || 
is_global_init(current))
  		return true;

  	return false;
@@ -242,12 +256,12 @@ static void oops_end(unsigned long flags, struct 
pt_regs *regs,
  	 * know we are going to panic, delay for 1 second so we have a
  	 * chance to get clean backtraces from all CPUs that are oopsing.
  	 */
-	if (in_interrupt() || panic_on_oops || !current->pid ||
+	if (from_interrupt() || panic_on_oops || !current->pid ||
  	    is_global_init(current)) {
  		mdelay(MSEC_PER_SEC);
  	}

-	if (in_interrupt())
+	if (from_interrupt())
  		panic("Fatal exception in interrupt");
  	if (panic_on_oops)
  		panic("Fatal exception");
@@ -378,15 +392,37 @@ void _exception(int signr, struct pt_regs *regs, 
int code, unsigned long addr)
  	_exception_pkey(signr, regs, code, addr, 0);
  }

+static bool exception_nmi_enter(void)
+{
+	bool nested = in_nmi();
+
+	/*
+	 * In case we are already in an NMI, increase preempt_count by
+	 * HARDIRQ_OFFSET in order to get from_interrupt() return true
+	 */
+	if (nested)
+		preempt_count_add(HARDIRQ_OFFSET);
+	else
+		nmi_enter();
+
+	return nested;
+}
+
+static void exception_nmi_exit(bool nested)
+{
+	if (nested)
+		preempt_count_sub(HARDIRQ_OFFSET);
+	else
+		nmi_exit();
+}
+
  void system_reset_exception(struct pt_regs *regs)
  {
  	/*
  	 * Avoid crashes in case of nested NMI exceptions. Recoverability
  	 * is determined by RI and in_nmi
  	 */
-	bool nested = in_nmi();
-	if (!nested)
-		nmi_enter();
+	bool nested = exception_nmi_enter();

  	__this_cpu_inc(irq_stat.sreset_irqs);

@@ -435,8 +471,7 @@ void system_reset_exception(struct pt_regs *regs)
  	if (!(regs->msr & MSR_RI))
  		nmi_panic(regs, "Unrecoverable System Reset");

-	if (!nested)
-		nmi_exit();
+	exception_nmi_exit(nested);

  	/* What should we do here? We could issue a shutdown or hard reset. */
  }
@@ -737,9 +772,7 @@ int machine_check_generic(struct pt_regs *regs)
  void machine_check_exception(struct pt_regs *regs)
  {
  	int recover = 0;
-	bool nested = in_nmi();
-	if (!nested)
-		nmi_enter();
+	bool nested = exception_nmi_enter();

  	__this_cpu_inc(irq_stat.mce_exceptions);

@@ -772,8 +805,7 @@ void machine_check_exception(struct pt_regs *regs)
  		nmi_panic(regs, "Unrecoverable Machine check");

  bail:
-	if (!nested)
-		nmi_exit();
+	exception_nmi_exit(nested);
  }

  void SMIException(struct pt_regs *regs)

> 
> 
>>
>> I guess one advantage of yours is that a BUG somewhere in the NMI path
>> will panic the system. Or is that a disadvantage?
> 
> Why would it panic the system more than now ? And is it an issue at all 
> ? Doesn't BUG() panic in any case ?
> 

Christophe

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-13  8:29                 ` Christophe Leroy
@ 2018-10-13  8:48                   ` Nicholas Piggin
  2018-10-13  8:56                     ` Christophe LEROY
  0 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2018-10-13  8:48 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev

On Sat, 13 Oct 2018 08:29:48 +0000
Christophe Leroy <christophe.leroy@c-s.fr> wrote:

> On 10/11/2018 02:31 PM, Christophe LEROY wrote:
> > 
> > 
> > Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :  
> >> On Tue, 9 Oct 2018 09:36:18 +0000
> >> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>  
> >>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:  
> >>>> On Tue, 9 Oct 2018 06:46:30 +0200
> >>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:  
> >>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :  
> >>>>>> On Mon, 8 Oct 2018 17:39:11 +0200
> >>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:  
> >>>>>>> Hi Nick,
> >>>>>>>
> >>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :  
> >>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
> >>>>>>>> printk NMI buffers and turns off various debugging facilities that
> >>>>>>>> helps avoid tripping on ourselves or other CPUs.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>>>>>> ---
> >>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---
> >>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/arch/powerpc/kernel/traps.c 
> >>>>>>>> b/arch/powerpc/kernel/traps.c
> >>>>>>>> index 2849c4f50324..6d31f9d7c333 100644
> >>>>>>>> --- a/arch/powerpc/kernel/traps.c
> >>>>>>>> +++ b/arch/powerpc/kernel/traps.c
> >>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs 
> >>>>>>>> *regs)
> >>>>>>>>      void machine_check_exception(struct pt_regs *regs)
> >>>>>>>>      {
> >>>>>>>> -    enum ctx_state prev_state = exception_enter();
> >>>>>>>>          int recover = 0;
> >>>>>>>> +    bool nested = in_nmi();
> >>>>>>>> +    if (!nested)
> >>>>>>>> +        nmi_enter();  
> >>>>>>>
> >>>>>>> This alters preempt_count, then when die() is called
> >>>>>>> in_interrupt() returns true allthough the trap didn't happen in
> >>>>>>> interrupt, so oops_end() panics for "fatal exception in interrupt"
> >>>>>>> instead of gently sending SIGBUS the faulting app.  
> >>>>>>
> >>>>>> Thanks for tracking that down.  
> >>>>>>> Any idea on how to fix this ?  
> >>>>>>
> >>>>>> I would say we have to deliver the sigbus by hand.
> >>>>>>
> >>>>>>        if ((user_mode(regs)))
> >>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> >>>>>>        else
> >>>>>>            die("Machine check", regs, SIGBUS);  
> >>>>>
> >>>>> And what about all the other things done by 'die()' ?
> >>>>>
> >>>>> And what if it is a kernel thread ?
> >>>>>
> >>>>> In one of my boards, I have a kernel thread regularly checking the HW,
> >>>>> and if it gets a machine check I expect it to gently stop and the die
> >>>>> notification to be delivered to all registered notifiers.
> >>>>>
> >>>>> Until before this patch, it was working well.  
> >>>>
> >>>> I guess the alternative is we could check regs->trap for machine
> >>>> check in the die test. Complication is having to account for MCE
> >>>> in an interrupt handler.
> >>>>
> >>>>          if (in_interrupt()) {
> >>>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - 
> >>>> (NMI_OFFSET + HARDIRQ_OFFSET)))
> >>>>                       panic("Fatal exception in interrupt");
> >>>>          }
> >>>>
> >>>> Something like that might work for you? We needs a ppc64 macro for the
> >>>> MCE, and can probably add something like in_nmi_from_interrupt() for
> >>>> the second part of the test.  
> >>>
> >>> Don't know, I'm away from home on business trip so I won't be able to
> >>> test anything before next week. However it looks more or less like a
> >>> hack, doesn't it ?  
> >>
> >> I thought it seemed okay (with the right functions added). Actually it
> >> could be a bit nicer to do this, then it works generally :
> >>
> >>           if (in_interrupt()) {
> >>                    if (!in_nmi() || in_nmi_from_interrupt())
> >>                        panic("Fatal exception in interrupt");
> >>           }
> >>  
> >>>
> >>> What about the following ?  
> >>
> >> Hmm, in some ways maybe it's nicer. One complication is I would like the
> >> same thing to be available for platform specific machine check
> >> handlers, so then you need to pass is_in_interrupt to them. Which you
> >> can do without any problem... But is it cleaner than the above?  
> > 
> > For me it looks cleaner than twiddle the preempt_count depending on 
> > whether we were or not already in nmi() .
> > 
> > Let's draft something and see what it looks like.  
> 
> Ok, finaly I went to your solution, see below, as it avoids having to 
> modify all subarch and platform specific machine check handlers.
> 
> Unfortunately it doesn't solves the issue, it only delays it:
> 
> oops_end() calls do_exit(), which has the following test:
> 
> 	if (unlikely(in_interrupt()))
> 		panic("Aiee, killing interrupt handler!");
> 
> 
> So at the time being I still have no idea how to fix that, have you ?

Huh, I'm not sure. x86's MCE handling looks like it does this:

                /*
                 * We might have interrupted pretty much anything.  In
                 * fact, if we're a machine check, we can even interrupt
                 * NMI processing.  We don't want in_nmi() to return true,
                 * but we need to notify RCU.
                 */
                rcu_nmi_enter();

But I don't see why they don't want the full NMI treatment there. I
thought the whole point was to do everything so you would get e.g.,
the NMI-safe printk and so on.

The reason the in_interrupt checks work below is because the synchronous
trap handlers e.g., for BUG do not enter interrupt context so the
question is about they context they interrupted. Maybe the right way to
go is nmi_exit just before deciding to oops.

Perhaps we could ask lkml.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
  2018-10-13  8:48                   ` Nicholas Piggin
@ 2018-10-13  8:56                     ` Christophe LEROY
  0 siblings, 0 replies; 21+ messages in thread
From: Christophe LEROY @ 2018-10-13  8:56 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Mahesh Jagannath Salgaonkar, linuxppc-dev



Le 13/10/2018 à 10:48, Nicholas Piggin a écrit :
> On Sat, 13 Oct 2018 08:29:48 +0000
> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>> On 10/11/2018 02:31 PM, Christophe LEROY wrote:
>>>
>>>
>>> Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :
>>>> On Tue, 9 Oct 2018 09:36:18 +0000
>>>> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>>   
>>>>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
>>>>>> On Tue, 9 Oct 2018 06:46:30 +0200
>>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
>>>>>>>> On Mon, 8 Oct 2018 17:39:11 +0200
>>>>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>>>>>>> Hi Nick,
>>>>>>>>>
>>>>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
>>>>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
>>>>>>>>>> printk NMI buffers and turns off various debugging facilities that
>>>>>>>>>> helps avoid tripping on ourselves or other CPUs.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>>>>>> ---
>>>>>>>>>>       arch/powerpc/kernel/traps.c | 9 ++++++---
>>>>>>>>>>       1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/powerpc/kernel/traps.c
>>>>>>>>>> b/arch/powerpc/kernel/traps.c
>>>>>>>>>> index 2849c4f50324..6d31f9d7c333 100644
>>>>>>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>>>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs
>>>>>>>>>> *regs)
>>>>>>>>>>       void machine_check_exception(struct pt_regs *regs)
>>>>>>>>>>       {
>>>>>>>>>> -    enum ctx_state prev_state = exception_enter();
>>>>>>>>>>           int recover = 0;
>>>>>>>>>> +    bool nested = in_nmi();
>>>>>>>>>> +    if (!nested)
>>>>>>>>>> +        nmi_enter();
>>>>>>>>>
>>>>>>>>> This alters preempt_count, then when die() is called
>>>>>>>>> in_interrupt() returns true allthough the trap didn't happen in
>>>>>>>>> interrupt, so oops_end() panics for "fatal exception in interrupt"
>>>>>>>>> instead of gently sending SIGBUS the faulting app.
>>>>>>>>
>>>>>>>> Thanks for tracking that down.
>>>>>>>>> Any idea on how to fix this ?
>>>>>>>>
>>>>>>>> I would say we have to deliver the sigbus by hand.
>>>>>>>>
>>>>>>>>         if ((user_mode(regs)))
>>>>>>>>             _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>>>>>>>>         else
>>>>>>>>             die("Machine check", regs, SIGBUS);
>>>>>>>
>>>>>>> And what about all the other things done by 'die()' ?
>>>>>>>
>>>>>>> And what if it is a kernel thread ?
>>>>>>>
>>>>>>> In one of my boards, I have a kernel thread regularly checking the HW,
>>>>>>> and if it gets a machine check I expect it to gently stop and the die
>>>>>>> notification to be delivered to all registered notifiers.
>>>>>>>
>>>>>>> Until before this patch, it was working well.
>>>>>>
>>>>>> I guess the alternative is we could check regs->trap for machine
>>>>>> check in the die test. Complication is having to account for MCE
>>>>>> in an interrupt handler.
>>>>>>
>>>>>>           if (in_interrupt()) {
>>>>>>                    if (!IS_MCHECK_EXC(regs) || (irq_count() -
>>>>>> (NMI_OFFSET + HARDIRQ_OFFSET)))
>>>>>>                        panic("Fatal exception in interrupt");
>>>>>>           }
>>>>>>
>>>>>> Something like that might work for you? We needs a ppc64 macro for the
>>>>>> MCE, and can probably add something like in_nmi_from_interrupt() for
>>>>>> the second part of the test.
>>>>>
>>>>> Don't know, I'm away from home on business trip so I won't be able to
>>>>> test anything before next week. However it looks more or less like a
>>>>> hack, doesn't it ?
>>>>
>>>> I thought it seemed okay (with the right functions added). Actually it
>>>> could be a bit nicer to do this, then it works generally :
>>>>
>>>>            if (in_interrupt()) {
>>>>                     if (!in_nmi() || in_nmi_from_interrupt())
>>>>                         panic("Fatal exception in interrupt");
>>>>            }
>>>>   
>>>>>
>>>>> What about the following ?
>>>>
>>>> Hmm, in some ways maybe it's nicer. One complication is I would like the
>>>> same thing to be available for platform specific machine check
>>>> handlers, so then you need to pass is_in_interrupt to them. Which you
>>>> can do without any problem... But is it cleaner than the above?
>>>
>>> For me it looks cleaner than twiddle the preempt_count depending on
>>> whether we were or not already in nmi() .
>>>
>>> Let's draft something and see what it looks like.
>>
>> Ok, finaly I went to your solution, see below, as it avoids having to
>> modify all subarch and platform specific machine check handlers.
>>
>> Unfortunately it doesn't solves the issue, it only delays it:
>>
>> oops_end() calls do_exit(), which has the following test:
>>
>> 	if (unlikely(in_interrupt()))
>> 		panic("Aiee, killing interrupt handler!");
>>
>>
>> So at the time being I still have no idea how to fix that, have you ?
> 
> Huh, I'm not sure. x86's MCE handling looks like it does this:
> 
>                  /*
>                   * We might have interrupted pretty much anything.  In
>                   * fact, if we're a machine check, we can even interrupt
>                   * NMI processing.  We don't want in_nmi() to return true,
>                   * but we need to notify RCU.
>                   */
>                  rcu_nmi_enter();
> 
> But I don't see why they don't want the full NMI treatment there. I
> thought the whole point was to do everything so you would get e.g.,
> the NMI-safe printk and so on.
> 
> The reason the in_interrupt checks work below is because the synchronous
> trap handlers e.g., for BUG do not enter interrupt context so the
> question is about they context they interrupted. Maybe the right way to
> go is nmi_exit just before deciding to oops.

Yes I arrived at the same conclusion. I tested it just now and it works 
for me. Thanks.

Christophe

> 
> Perhaps we could ask lkml.
> 
> Thanks,
> Nick
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-10-13  8:58 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-19  6:59 [PATCH v2 0/3] machine check handling improvements Nicholas Piggin
2017-07-19  6:59 ` [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart Nicholas Piggin
2017-07-19  7:16   ` Nicholas Piggin
2017-07-20  5:39   ` Mahesh Jagannath Salgaonkar
2017-08-31 11:36   ` [v2, " Michael Ellerman
2017-07-19  6:59 ` [PATCH v2 2/3] powerpc/powernv: machine check use kernel crash path Nicholas Piggin
2017-07-20  7:14   ` Mahesh Jagannath Salgaonkar
2017-07-19  6:59 ` [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt Nicholas Piggin
2018-10-08 15:39   ` Christophe LEROY
2018-10-09  4:32     ` Nicholas Piggin
2018-10-09  4:46       ` Christophe LEROY
2018-10-09  5:30         ` Nicholas Piggin
2018-10-09  9:36           ` Christophe Leroy
2018-10-09 11:16             ` Nicholas Piggin
2018-10-09 12:01               ` Christophe LEROY
2018-10-09 12:14                 ` Nicholas Piggin
2018-10-11 14:23                   ` Christophe LEROY
2018-10-11 14:31               ` Christophe LEROY
2018-10-13  8:29                 ` Christophe Leroy
2018-10-13  8:48                   ` Nicholas Piggin
2018-10-13  8:56                     ` Christophe LEROY

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.