From: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
To: linuxppc-dev <linuxppc-dev@ozlabs.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jeremy Kerr <jeremy.kerr@au1.ibm.com>,
Paul Mackerras <paulus@samba.org>,
Anton Blanchard <anton@samba.org>
Subject: [RFC PATCH 9/9] powerpc/powernv: Machine check exception handling.
Date: Wed, 07 Aug 2013 15:09:30 +0530
Message-ID: <20130807093930.5389.66368.stgit@mars.in.ibm.com>
In-Reply-To: <20130807093609.5389.26534.stgit@mars.in.ibm.com>
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Add basic error handling to the machine check exception handler:
- If MSR_RI isn't set, we cannot recover.
- Check whether the disposition is set to MCE_DISPOSITION_RECOVERED.
- If the faulting address is provided and lies outside the kernel address
space, send SIGBUS to the process if the exception was taken in userspace.
- If the faulting address is not provided and we get a synchronous machine
check while in userspace, kill the task.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
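For reviewers, the recovery decision described above can be sketched as a standalone user-space model. This is only an illustration of the intended ordering of the checks, not the kernel code: `struct mce_ctx`, `classify_mce()`, `addr_is_kernel()` and the `KERNEL_BASE` constant are hypothetical names introduced here, and the kernel-address test is assumed to be a simple comparison against the ppc64 PAGE_OFFSET value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome codes for this sketch (not kernel identifiers). */
enum mce_action {
	MCE_UNRECOVERABLE,	/* handler would return 0 */
	MCE_RECOVERED,		/* platform already corrected the error */
	MCE_KILL_TASK		/* send SIGBUS to current, then report recovered */
};

/* Hypothetical flattened view of the inputs the handler looks at. */
struct mce_ctx {
	bool msr_ri;			/* MSR_RI set in the saved MSR */
	bool platform_recovered;	/* disposition is "recovered" */
	bool in_user_mode;		/* exception taken in userspace */
	bool is_init;			/* current task is the global init */
	bool sync_error;		/* severity is a synchronous error */
	uint64_t fault_addr;		/* 0 means "address not provided" */
};

/* Assumption: kernel addresses start at the ppc64 PAGE_OFFSET. */
#define KERNEL_BASE 0xc000000000000000ULL

static bool addr_is_kernel(uint64_t ea)
{
	return ea >= KERNEL_BASE;
}

/* Apply the checks in the same order as the commit message lists them. */
static enum mce_action classify_mce(const struct mce_ctx *c)
{
	if (!c->msr_ri)
		return MCE_UNRECOVERABLE;
	if (c->platform_recovered)
		return MCE_RECOVERED;
	if (c->fault_addr && !addr_is_kernel(c->fault_addr)) {
		/* Known non-kernel address: only a user task can be killed. */
		if (c->in_user_mode && !c->is_init)
			return MCE_KILL_TASK;
		return MCE_UNRECOVERABLE;
	}
	if (c->in_user_mode && !c->is_init && c->sync_error)
		return MCE_KILL_TASK;
	return MCE_UNRECOVERABLE;
}
```

Note that the MSR_RI check comes first, so an event the platform reports as recovered is still treated as unrecoverable when MSR_RI is clear.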
arch/powerpc/include/asm/mce.h | 1 +
arch/powerpc/kernel/mce.c | 27 +++++++++++++++++++++
arch/powerpc/platforms/powernv/opal.c | 43 ++++++++++++++++++++++++++++++++-
3 files changed, 70 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index cf57f3d..558d0ca 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -185,5 +185,6 @@ struct mce_error_info {
extern void save_mce_event(struct pt_regs *regs, long handled,
struct mce_error_info *mce_err, uint64_t addr);
extern int get_mce_event(struct machine_check_event *mce);
+extern uint64_t get_mce_fault_addr(struct machine_check_event *evt);
#endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index a523d8b..eb9e059 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -150,3 +150,30 @@ int get_mce_event(struct machine_check_event *mce)
return ret;
}
+
+uint64_t get_mce_fault_addr(struct machine_check_event *evt)
+{
+ switch (evt->error_type) {
+ case MCE_ERROR_TYPE_UE:
+ if (evt->u.ue_error.effective_address_provided)
+ return evt->u.ue_error.effective_address;
+ break;
+ case MCE_ERROR_TYPE_SLB:
+ if (evt->u.slb_error.effective_address_provided)
+ return evt->u.slb_error.effective_address;
+ break;
+ case MCE_ERROR_TYPE_ERAT:
+ if (evt->u.erat_error.effective_address_provided)
+ return evt->u.erat_error.effective_address;
+ break;
+ case MCE_ERROR_TYPE_TLB:
+ if (evt->u.tlb_error.effective_address_provided)
+ return evt->u.tlb_error.effective_address;
+ break;
+ default:
+ case MCE_ERROR_TYPE_UNKNOWN:
+ break;
+ }
+ return 0;
+}
+EXPORT_SYMBOL(get_mce_fault_addr);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index a08243d..407d571 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -17,6 +17,7 @@
#include <linux/interrupt.h>
#include <linux/notifier.h>
#include <linux/slab.h>
+#include <linux/sched.h>
#include <asm/opal.h>
#include <asm/firmware.h>
#include <asm/mce.h>
@@ -240,6 +241,44 @@ int opal_put_chars(uint32_t vtermno, const char *data, int total_len)
return written;
}
+static int opal_recover_mce(struct pt_regs *regs,
+ struct machine_check_event *evt)
+{
+ int recovered = 0;
+ uint64_t ea = get_mce_fault_addr(evt);
+
+ if (!(regs->msr & MSR_RI)) {
+ /* If MSR_RI isn't set, we cannot recover */
+ recovered = 0;
+ } else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
+ /* Platform corrected itself */
+ recovered = 1;
+ } else if (ea && !is_kernel_addr(ea)) {
+ /*
+ * Faulting address is not a kernel address. We should be fine.
+ * We need to find which process uses this address.
+ * For now, kill the task if we received the exception while
+ * in userspace.
+ *
+ * TODO: Queue up this address for hwpoisoning later.
+ */
+ if (user_mode(regs) && !is_global_init(current)) {
+ _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
+ recovered = 1;
+ } else
+ recovered = 0;
+ } else if (user_mode(regs) && !is_global_init(current) &&
+ evt->severity == MCE_SEV_ERROR_SYNC) {
+ /*
+ * If we received a synchronous error while in userspace,
+ * kill the task.
+ */
+ _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
+ recovered = 1;
+ }
+ return recovered;
+}
+
int opal_machine_check(struct pt_regs *regs)
{
struct machine_check_event evt;
@@ -350,7 +389,9 @@ int opal_machine_check(struct pt_regs *regs)
printk("%s Error type: Unknown\n", level);
break;
}
- return evt.severity == MCE_SEV_FATAL ? 0 : 1;
+ if (opal_recover_mce(regs, &evt))
+ return 1;
+ return 0;
}
static irqreturn_t opal_interrupt(int irq, void *data)