linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
To: linuxppc-dev <linuxppc-dev@ozlabs.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jeremy Kerr <jeremy.kerr@au1.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	Anton Blanchard <anton@samba.org>
Subject: [RFC PATCH v3 12/12] powerpc/powernv: Machine check exception handling.
Date: Tue, 27 Aug 2013 01:02:53 +0530	[thread overview]
Message-ID: <20130826193252.2855.59425.stgit@mars> (raw)
In-Reply-To: <20130826192616.2855.18749.stgit@mars>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Add basic error handling in machine check exception handler.

- If MSR_RI isn't set, we can not recover.
- Check if disposition set to OpalMCE_DISPOSITION_RECOVERED.
- Check if address at fault is inside kernel address space, if not then send
  SIGBUS to process if we hit exception when in userspace.
- If address at fault is not provided then and if we get a synchronous machine
  check while in userspace then kill the task.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mce.h        |    1 +
 arch/powerpc/kernel/mce.c             |   27 +++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal.c |   43 ++++++++++++++++++++++++++++++++-
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 1c20731..f72ea4c 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -193,5 +193,6 @@ extern void release_mce_event(void);
 extern void machine_check_queue_event(void);
 extern void machine_check_process_queued_event(void);
 extern void machine_check_print_event_info(struct machine_check_event *evt);
+extern uint64_t get_mce_fault_addr(struct machine_check_event *evt);
 
 #endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 1cca4b6..3100509 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -316,3 +316,30 @@ void machine_check_print_event_info(struct machine_check_event *evt)
 		break;
 	}
 }
+
+uint64_t get_mce_fault_addr(struct machine_check_event *evt)
+{
+	switch (evt->error_type) {
+	case MCE_ERROR_TYPE_UE:
+		if (evt->u.ue_error.effective_address_provided)
+			return evt->u.ue_error.effective_address;
+		break;
+	case MCE_ERROR_TYPE_SLB:
+		if (evt->u.slb_error.effective_address_provided)
+			return evt->u.slb_error.effective_address;
+		break;
+	case MCE_ERROR_TYPE_ERAT:
+		if (evt->u.erat_error.effective_address_provided)
+			return evt->u.erat_error.effective_address;
+		break;
+	case MCE_ERROR_TYPE_TLB:
+		if (evt->u.tlb_error.effective_address_provided)
+			return evt->u.tlb_error.effective_address;
+		break;
+	default:
+	case MCE_ERROR_TYPE_UNKNOWN:
+		break;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(get_mce_fault_addr);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 0170d19..2070970 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -17,6 +17,7 @@
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 #include <linux/slab.h>
+#include <linux/sched.h>
 #include <asm/opal.h>
 #include <asm/firmware.h>
 #include <asm/mce.h>
@@ -240,6 +241,44 @@ int opal_put_chars(uint32_t vtermno, const char *data, int total_len)
 	return written;
 }
 
+static int opal_recover_mce(struct pt_regs *regs,
+					struct machine_check_event *evt)
+{
+	int recovered = 0;
+	uint64_t ea = get_mce_fault_addr(evt);
+
+	if (!(regs->msr & MSR_RI)) {
+		/* If MSR_RI isn't set, we cannot recover */
+		recovered = 0;
+	} else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
+		/* Platform corrected itself */
+		recovered = 1;
+	} else if (ea && !is_kernel_addr(ea)) {
+		/*
+		 * Faulting address is not in kernel text. We should be fine.
+		 * We need to find which process uses this address.
+		 * For now, kill the task if we have received exception when
+		 * in userspace.
+		 *
+		 * TODO: Queue up this address for hwpoisioning later.
+		 */
+		if (user_mode(regs) && !is_global_init(current)) {
+			_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
+			recovered = 1;
+		} else
+			recovered = 0;
+	} else if (user_mode(regs) && !is_global_init(current) &&
+		evt->severity == MCE_SEV_ERROR_SYNC) {
+		/*
+		 * If we have received a synchronous error when in userspace
+		 * kill the task.
+		 */
+		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
+		recovered = 1;
+	}
+	return recovered;
+}
+
 int opal_machine_check(struct pt_regs *regs)
 {
 	struct machine_check_event evt;
@@ -255,7 +294,9 @@ int opal_machine_check(struct pt_regs *regs)
 	}
 	machine_check_print_event_info(&evt);
 
-	return evt.severity == MCE_SEV_FATAL ? 0 : 1;
+	if (opal_recover_mce(regs, &evt))
+		return 1;
+	return 0;
 }
 
 static irqreturn_t opal_interrupt(int irq, void *data)

      parent reply	other threads:[~2013-08-26 19:32 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-26 19:31 [RFC PATCH v3 00/12] Machine check handling in linux host Mahesh J Salgaonkar
2013-08-26 19:31 ` [RFC PATCH v3 01/12] powerpc/book3s: Split the common exception prolog logic into two section Mahesh J Salgaonkar
2013-09-09  4:29   ` Paul Mackerras
2013-08-26 19:31 ` [RFC PATCH v3 02/12] powerpc/book3s: Introduce exclusive emergency stack for machine check exception Mahesh J Salgaonkar
2013-09-09  4:30   ` Paul Mackerras
2013-08-26 19:31 ` [RFC PATCH v3 03/12] powerpc/book3s: handle machine check in Linux host Mahesh J Salgaonkar
2013-09-09  4:52   ` Paul Mackerras
2013-08-26 19:31 ` [RFC PATCH v3 04/12] Validate r1 value before going to host kernel in virtual mode Mahesh J Salgaonkar
2013-09-09  5:29   ` Paul Mackerras
2013-09-09  9:26     ` Benjamin Herrenschmidt
2013-08-26 19:31 ` [RFC PATCH v3 05/12] powerpc/book3s: Introduce a early machine check hook in cpu_spec Mahesh J Salgaonkar
2013-09-09  5:33   ` Paul Mackerras
2013-08-26 19:32 ` [RFC PATCH v3 06/12] powerpc/book3s: Add flush_tlb operation " Mahesh J Salgaonkar
2013-09-09  5:36   ` Paul Mackerras
2013-08-26 19:32 ` [RFC PATCH v3 07/12] powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7 Mahesh J Salgaonkar
2013-09-09  6:00   ` Paul Mackerras
2013-08-26 19:32 ` [RFC PATCH v3 08/12] powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power8 Mahesh J Salgaonkar
2013-09-09  6:01   ` Paul Mackerras
2013-08-26 19:32 ` [RFC PATCH v3 09/12] powerpc/book3s: Decode and save machine check event Mahesh J Salgaonkar
2013-08-26 19:32 ` [RFC PATCH v3 10/12] Queue up and process delayed MCE events Mahesh J Salgaonkar
2013-08-26 19:32 ` [RFC PATCH v3 11/12] powerpc/powernv: Remove machine check handling in OPAL Mahesh J Salgaonkar
2013-08-26 19:32 ` Mahesh J Salgaonkar [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130826193252.2855.59425.stgit@mars \
    --to=mahesh@linux.vnet.ibm.com \
    --cc=anton@samba.org \
    --cc=benh@kernel.crashing.org \
    --cc=jeremy.kerr@au1.ibm.com \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).