All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/7] powerpc/pseries: Machien check handler improvements.
@ 2018-07-02  5:45 Mahesh J Salgaonkar
  2018-07-02  5:46 ` [PATCH v5 1/7] powerpc/pseries: Avoid using the size greater than Mahesh J Salgaonkar
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Mahesh J Salgaonkar @ 2018-07-02  5:45 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Michal Suchanek, Michael Ellerman, stable,
	Aneesh Kumar K.V, Nicholas Piggin, Aneesh Kumar K.V,
	Laurent Dufour, Michal Suchanek

This patch series includes some improvement to Machine check handler
for pseries. Patch 1 fixes a buffer overrun issue if rtas extended error
log size is greater than RTAS_ERROR_LOG_MAX.
Patch 2 fixes an issue where machine check handler crashes
kernel while accessing vmalloc-ed buffer while in nmi context.
Patch 3 fixes endain bug while restoring of r3 in MCE handler.
Patch 5 implements a real mode mce handler and flushes the SLBs on SLB error.
Patch 6 display's the MCE error details on console.
Patch 7 saves and dumps the SLB contents on SLB MCE errors to improve the
debugability.

Change in V5:
- Use min_t instead of max_t.
- Fix an issue reported by kbuild test robot and addressed review comments.

Change in V4:
- Flush the SLBs in real mode mce handler to handle SLB errors for entry 0.
- Allocate buffers per cpu to hold rtas error log and old slb contents.
- Defer the logging of rtas error log to irq work queue.

Change in V3:
- Moved patch 5 to patch 2

Change in V2:
- patch 3: Display additional info (NIP and task info) in MCE error details.
- patch 5: Fix endain bug while restoring of r3 in MCE handler.

---

Mahesh Salgaonkar (7):
      powerpc/pseries: Avoid using the size greater than
      powerpc/pseries: Defer the logging of rtas error to irq work queue.
      powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
      powerpc/pseries: Define MCE error event section.
      powerpc/pseries: flush SLB contents on SLB MCE errors.
      powerpc/pseries: Display machine check error details.
      powerpc/pseries: Dump the SLB contents on SLB MCE errors.


 arch/powerpc/include/asm/book3s/64/mmu-hash.h |    8 +
 arch/powerpc/include/asm/machdep.h            |    1 
 arch/powerpc/include/asm/paca.h               |    4 
 arch/powerpc/include/asm/rtas.h               |  116 ++++++++++++
 arch/powerpc/kernel/exceptions-64s.S          |   42 ++++
 arch/powerpc/kernel/mce.c                     |   16 +-
 arch/powerpc/mm/slb.c                         |   63 +++++++
 arch/powerpc/platforms/powernv/opal.c         |    1 
 arch/powerpc/platforms/pseries/pseries.h      |    1 
 arch/powerpc/platforms/pseries/ras.c          |  241 +++++++++++++++++++++++--
 arch/powerpc/platforms/pseries/setup.c        |   27 +++
 11 files changed, 499 insertions(+), 21 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 1/7] powerpc/pseries: Avoid using the size greater than
  2018-07-02  5:45 [PATCH v5 0/7] powerpc/pseries: Machien check handler improvements Mahesh J Salgaonkar
@ 2018-07-02  5:46 ` Mahesh J Salgaonkar
  2018-07-02  5:46 ` [PATCH v5 2/7] powerpc/pseries: Defer the logging of rtas error to irq work queue Mahesh J Salgaonkar
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Mahesh J Salgaonkar @ 2018-07-02  5:46 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Nicholas Piggin, Aneesh Kumar K.V,
	Laurent Dufour, Michal Suchanek

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

The global mce data buffer that used to copy rtas error log is of 2048
(RTAS_ERROR_LOG_MAX) bytes in size. Before the copy we read
extended_log_length from rtas error log header, then use max of
extended_log_length and RTAS_ERROR_LOG_MAX as a size of data to be copied.
Ideally the platform (phyp) will never send extended error log with
size > 2048. But if that happens, then we have a risk of buffer overrun
and corruption. Fix this by using min_t instead.

Fixes: d368514c3097 ("powerpc: Fix corruption when grabbing FWNMI data")
Reported-by: Michal Suchanek <msuchanek@suse.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/ras.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 5e1ef9150182..ef104144d4bc 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -371,7 +371,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 		int len, error_log_length;
 
 		error_log_length = 8 + rtas_error_extended_log_length(h);
-		len = max_t(int, error_log_length, RTAS_ERROR_LOG_MAX);
+		len = min_t(int, error_log_length, RTAS_ERROR_LOG_MAX);
 		memset(global_mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
 		memcpy(global_mce_data_buf, h, len);
 		errhdr = (struct rtas_error_log *)global_mce_data_buf;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 2/7] powerpc/pseries: Defer the logging of rtas error to irq work queue.
  2018-07-02  5:45 [PATCH v5 0/7] powerpc/pseries: Machien check handler improvements Mahesh J Salgaonkar
  2018-07-02  5:46 ` [PATCH v5 1/7] powerpc/pseries: Avoid using the size greater than Mahesh J Salgaonkar
@ 2018-07-02  5:46 ` Mahesh J Salgaonkar
  2018-07-03  3:25   ` Nicholas Piggin
  2018-07-02  5:46 ` [PATCH v5 3/7] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler Mahesh J Salgaonkar
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Mahesh J Salgaonkar @ 2018-07-02  5:46 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: stable, Nicholas Piggin, Aneesh Kumar K.V, Laurent Dufour,
	Michal Suchanek

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from vmalloc
(non-linear mapping) area.

On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. Apart from that, as Nick pointed out, pSeries_log_error()
also takes a spin_lock while logging error which is not safe in NMI
context. It may endup in deadlock if we get another MCE before releasing
the lock. Fix this by deferring the logging of rtas error to irq work queue.

Current implementation uses two different buffers to hold rtas error log
depending on whether extended log is provided or not. This makes bit
difficult to identify which buffer has valid data that needs to logged
later in irq work. Simplify this using single buffer, one per paca, and
copy rtas log to it irrespective of whether extended log is provided or
not. Allocate this buffer below RMA region so that it can be accessed
in real mode mce handler.

Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/paca.h        |    3 ++
 arch/powerpc/platforms/pseries/ras.c   |   47 ++++++++++++++++++++++----------
 arch/powerpc/platforms/pseries/setup.c |   16 +++++++++++
 3 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 3f109a3e3edb..b441fef53077 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -251,6 +251,9 @@ struct paca_struct {
 	void *rfi_flush_fallback_area;
 	u64 l1d_flush_size;
 #endif
+#ifdef CONFIG_PPC_PSERIES
+	u8 *mce_data_buf;		/* buffer to hold per cpu rtas errlog */
+#endif /* CONFIG_PPC_PSERIES */
 } ____cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index ef104144d4bc..14a46b07ab2f 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -22,6 +22,7 @@
 #include <linux/of.h>
 #include <linux/fs.h>
 #include <linux/reboot.h>
+#include <linux/irq_work.h>
 
 #include <asm/machdep.h>
 #include <asm/rtas.h>
@@ -32,11 +33,13 @@
 static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX];
 static DEFINE_SPINLOCK(ras_log_buf_lock);
 
-static char global_mce_data_buf[RTAS_ERROR_LOG_MAX];
-static DEFINE_PER_CPU(__u64, mce_data_buf);
-
 static int ras_check_exception_token;
 
+static void mce_process_errlog_event(struct irq_work *work);
+static struct irq_work mce_errlog_process_work = {
+	.func = mce_process_errlog_event,
+};
+
 #define EPOW_SENSOR_TOKEN	9
 #define EPOW_SENSOR_INDEX	0
 
@@ -330,16 +333,20 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
 	((((A) >= 0x7000) && ((A) < 0x7ff0)) || \
 	(((A) >= rtas.base) && ((A) < (rtas.base + rtas.size - 16))))
 
+static inline struct rtas_error_log *fwnmi_get_errlog(void)
+{
+	return (struct rtas_error_log *)local_paca->mce_data_buf;
+}
+
 /*
  * Get the error information for errors coming through the
  * FWNMI vectors.  The pt_regs' r3 will be updated to reflect
  * the actual r3 if possible, and a ptr to the error log entry
  * will be returned if found.
  *
- * If the RTAS error is not of the extended type, then we put it in a per
- * cpu 64bit buffer. If it is the extended type we use global_mce_data_buf.
+ * Use one buffer mce_data_buf per cpu to store RTAS error.
  *
- * The global_mce_data_buf does not have any locks or protection around it,
+ * The mce_data_buf does not have any locks or protection around it,
  * if a second machine check comes in, or a system reset is done
  * before we have logged the error, then we will get corruption in the
  * error log.  This is preferable over holding off on calling
@@ -349,7 +356,7 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
 static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 {
 	unsigned long *savep;
-	struct rtas_error_log *h, *errhdr = NULL;
+	struct rtas_error_log *h;
 
 	/* Mask top two bits */
 	regs->gpr[3] &= ~(0x3UL << 62);
@@ -362,22 +369,20 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 	savep = __va(regs->gpr[3]);
 	regs->gpr[3] = savep[0];	/* restore original r3 */
 
-	/* If it isn't an extended log we can use the per cpu 64bit buffer */
 	h = (struct rtas_error_log *)&savep[1];
+	/* Use the per cpu buffer from paca to store rtas error log */
+	memset(local_paca->mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
 	if (!rtas_error_extended(h)) {
-		memcpy(this_cpu_ptr(&mce_data_buf), h, sizeof(__u64));
-		errhdr = (struct rtas_error_log *)this_cpu_ptr(&mce_data_buf);
+		memcpy(local_paca->mce_data_buf, h, sizeof(__u64));
 	} else {
 		int len, error_log_length;
 
 		error_log_length = 8 + rtas_error_extended_log_length(h);
 		len = min_t(int, error_log_length, RTAS_ERROR_LOG_MAX);
-		memset(global_mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
-		memcpy(global_mce_data_buf, h, len);
-		errhdr = (struct rtas_error_log *)global_mce_data_buf;
+		memcpy(local_paca->mce_data_buf, h, len);
 	}
 
-	return errhdr;
+	return (struct rtas_error_log *)local_paca->mce_data_buf;
 }
 
 /* Call this when done with the data returned by FWNMI_get_errinfo.
@@ -422,6 +427,17 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 	return 0; /* need to perform reset */
 }
 
+/*
+ * Process MCE rtas errlog event.
+ */
+static void mce_process_errlog_event(struct irq_work *work)
+{
+	struct rtas_error_log *err;
+
+	err = fwnmi_get_errlog();
+	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
+}
+
 /*
  * See if we can recover from a machine check exception.
  * This is only called on power4 (or above) and only via
@@ -466,7 +482,8 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
 		recovered = 1;
 	}
 
-	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
+	/* Queue irq work to log this rtas event later. */
+	irq_work_queue(&mce_errlog_process_work);
 
 	return recovered;
 }
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index fdb32e056ef4..60a067a6e743 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -41,6 +41,7 @@
 #include <linux/root_dev.h>
 #include <linux/of.h>
 #include <linux/of_pci.h>
+#include <linux/memblock.h>
 
 #include <asm/mmu.h>
 #include <asm/processor.h>
@@ -101,6 +102,9 @@ static void pSeries_show_cpuinfo(struct seq_file *m)
 static void __init fwnmi_init(void)
 {
 	unsigned long system_reset_addr, machine_check_addr;
+	u8 *mce_data_buf;
+	unsigned int i;
+	int nr_cpus = num_possible_cpus();
 
 	int ibm_nmi_register = rtas_token("ibm,nmi-register");
 	if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
@@ -114,6 +118,18 @@ static void __init fwnmi_init(void)
 	if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
 				machine_check_addr))
 		fwnmi_active = 1;
+
+	/*
+	 * Allocate a chunk for per cpu buffer to hold rtas errorlog.
+	 * It will be used in real mode mce handler, hence it needs to be
+	 * below RMA.
+	 */
+	mce_data_buf = __va(memblock_alloc_base(RTAS_ERROR_LOG_MAX * nr_cpus,
+					RTAS_ERROR_LOG_MAX, ppc64_rma_size));
+	for_each_possible_cpu(i) {
+		paca_ptrs[i]->mce_data_buf = mce_data_buf +
+						(RTAS_ERROR_LOG_MAX * i);
+	}
 }
 
 static void pseries_8259_cascade(struct irq_desc *desc)

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 3/7] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
  2018-07-02  5:45 [PATCH v5 0/7] powerpc/pseries: Machien check handler improvements Mahesh J Salgaonkar
  2018-07-02  5:46 ` [PATCH v5 1/7] powerpc/pseries: Avoid using the size greater than Mahesh J Salgaonkar
  2018-07-02  5:46 ` [PATCH v5 2/7] powerpc/pseries: Defer the logging of rtas error to irq work queue Mahesh J Salgaonkar
@ 2018-07-02  5:46 ` Mahesh J Salgaonkar
  2018-07-02  5:46 ` [PATCH v5 4/7] powerpc/pseries: Define MCE error event section Mahesh J Salgaonkar
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Mahesh J Salgaonkar @ 2018-07-02  5:46 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: stable, Nicholas Piggin, Nicholas Piggin, Aneesh Kumar K.V,
	Laurent Dufour, Michal Suchanek

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.

Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.

[   62.878965] Severe Machine check interrupt [Recovered]
[   62.878968]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[   62.878969]   Initiator: CPU
[   62.878970]   Error type: SLB [Multihit]
[   62.878971]     Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
    pc: c0000000009694c0: vsnprintf+0x80/0x480
    lr: c0000000009698e0: vscnprintf+0x20/0x60
    sp: c0000000fc777830
   msr: 8000000002009033
   dar: a803a30c000000d0
  current = 0xc00000000bc9ef00
  paca    = 0xc00000001eca5c00	 softe: 3	 irq_happened: 0x01
    pid   = 8860, comm = insmod
[c0000000fc7778b0] c0000000009698e0 vscnprintf+0x20/0x60
[c0000000fc7778e0] c00000000016b6c4 vprintk_emit+0xb4/0x4b0
[c0000000fc777960] c00000000016d40c vprintk_func+0x5c/0xd0
[c0000000fc777980] c00000000016cbb4 printk+0x38/0x4c
[c0000000fc7779a0] d00000000ca301c0 init_module+0x1c0/0x338 [bork_kernel]
[c0000000fc777a40] c00000000000d9c4 do_one_initcall+0x54/0x230
[c0000000fc777b00] c0000000001b3b74 do_init_module+0x8c/0x248
[c0000000fc777b90] c0000000001b2478 load_module+0x12b8/0x15b0
[c0000000fc777d30] c0000000001b29e8 sys_finit_module+0xa8/0x110
[c0000000fc777e30] c00000000000b204 system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace

This patch fixes this issue.

Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/ras.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 14a46b07ab2f..851ce326874a 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -367,7 +367,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 	}
 
 	savep = __va(regs->gpr[3]);
-	regs->gpr[3] = savep[0];	/* restore original r3 */
+	regs->gpr[3] = be64_to_cpu(savep[0]);	/* restore original r3 */
 
 	h = (struct rtas_error_log *)&savep[1];
 	/* Use the per cpu buffer from paca to store rtas error log */

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 4/7] powerpc/pseries: Define MCE error event section.
  2018-07-02  5:45 [PATCH v5 0/7] powerpc/pseries: Machien check handler improvements Mahesh J Salgaonkar
                   ` (2 preceding siblings ...)
  2018-07-02  5:46 ` [PATCH v5 3/7] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler Mahesh J Salgaonkar
@ 2018-07-02  5:46 ` Mahesh J Salgaonkar
  2018-07-02  5:47 ` [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors Mahesh J Salgaonkar
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Mahesh J Salgaonkar @ 2018-07-02  5:46 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K.V, Laurent Dufour, Michal Suchanek

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

On pseries, the machine check error details are part of RTAS extended
event log passed under Machine check exception section. This patch adds
the definition of rtas MCE event section and related helper
functions.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  111 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index ec9dd79398ee..ceeed2dd489b 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -185,6 +185,13 @@ static inline uint8_t rtas_error_disposition(const struct rtas_error_log *elog)
 	return (elog->byte1 & 0x18) >> 3;
 }
 
+static inline
+void rtas_set_disposition_recovered(struct rtas_error_log *elog)
+{
+	elog->byte1 &= ~0x18;
+	elog->byte1 |= (RTAS_DISP_FULLY_RECOVERED << 3);
+}
+
 static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
 {
 	return (elog->byte1 & 0x04) >> 2;
@@ -275,6 +282,7 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
 #define PSERIES_ELOG_SECT_ID_CALL_HOME		(('C' << 8) | 'H')
 #define PSERIES_ELOG_SECT_ID_USER_DEF		(('U' << 8) | 'D')
 #define PSERIES_ELOG_SECT_ID_HOTPLUG		(('H' << 8) | 'P')
+#define PSERIES_ELOG_SECT_ID_MCE		(('M' << 8) | 'C')
 
 /* Vendor specific Platform Event Log Format, Version 6, section header */
 struct pseries_errorlog {
@@ -326,6 +334,109 @@ struct pseries_hp_errorlog {
 #define PSERIES_HP_ELOG_ID_DRC_COUNT	3
 #define PSERIES_HP_ELOG_ID_DRC_IC	4
 
+/* RTAS pseries MCE errorlog section */
+#pragma pack(push, 1)
+struct pseries_mc_errorlog {
+	__be32	fru_id;
+	__be32	proc_id;
+	uint8_t	error_type;
+	union {
+		struct {
+			uint8_t	ue_err_type;
+			/* XXXXXXXX
+			 * X		1: Permanent or Transient UE.
+			 *  X		1: Effective address provided.
+			 *   X		1: Logical address provided.
+			 *    XX	2: Reserved.
+			 *      XXX	3: Type of UE error.
+			 */
+			uint8_t	reserved_1[6];
+			__be64	effective_address;
+			__be64	logical_address;
+		} ue_error;
+		struct {
+			uint8_t	soft_err_type;
+			/* XXXXXXXX
+			 * X		1: Effective address provided.
+			 *  XXXXX	5: Reserved.
+			 *       XX	2: Type of SLB/ERAT/TLB error.
+			 */
+			uint8_t	reserved_1[6];
+			__be64	effective_address;
+			uint8_t	reserved_2[8];
+		} soft_error;
+	} u;
+};
+#pragma pack(pop)
+
+/* RTAS pseries MCE error types */
+#define PSERIES_MC_ERROR_TYPE_UE		0x00
+#define PSERIES_MC_ERROR_TYPE_SLB		0x01
+#define PSERIES_MC_ERROR_TYPE_ERAT		0x02
+#define PSERIES_MC_ERROR_TYPE_TLB		0x04
+#define PSERIES_MC_ERROR_TYPE_D_CACHE		0x05
+#define PSERIES_MC_ERROR_TYPE_I_CACHE		0x07
+
+/* RTAS pseries MCE error sub types */
+#define PSERIES_MC_ERROR_UE_INDETERMINATE		0
+#define PSERIES_MC_ERROR_UE_IFETCH			1
+#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_IFETCH	2
+#define PSERIES_MC_ERROR_UE_LOAD_STORE			3
+#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_LOAD_STORE	4
+
+#define PSERIES_MC_ERROR_SLB_PARITY		0
+#define PSERIES_MC_ERROR_SLB_MULTIHIT		1
+#define PSERIES_MC_ERROR_SLB_INDETERMINATE	2
+
+#define PSERIES_MC_ERROR_ERAT_PARITY		1
+#define PSERIES_MC_ERROR_ERAT_MULTIHIT		2
+#define PSERIES_MC_ERROR_ERAT_INDETERMINATE	3
+
+#define PSERIES_MC_ERROR_TLB_PARITY		1
+#define PSERIES_MC_ERROR_TLB_MULTIHIT		2
+#define PSERIES_MC_ERROR_TLB_INDETERMINATE	3
+
+static inline uint8_t rtas_mc_error_type(const struct pseries_mc_errorlog *mlog)
+{
+	return mlog->error_type;
+}
+
+static inline uint8_t rtas_mc_error_sub_type(
+					const struct pseries_mc_errorlog *mlog)
+{
+	switch (mlog->error_type) {
+	case	PSERIES_MC_ERROR_TYPE_UE:
+		return (mlog->u.ue_error.ue_err_type & 0x07);
+	case	PSERIES_MC_ERROR_TYPE_SLB:
+	case	PSERIES_MC_ERROR_TYPE_ERAT:
+	case	PSERIES_MC_ERROR_TYPE_TLB:
+		return (mlog->u.soft_error.soft_err_type & 0x03);
+	default:
+		return 0;
+	}
+}
+
+static inline uint64_t rtas_mc_get_effective_addr(
+					const struct pseries_mc_errorlog *mlog)
+{
+	uint64_t addr = 0;
+
+	switch (mlog->error_type) {
+	case	PSERIES_MC_ERROR_TYPE_UE:
+		if (mlog->u.ue_error.ue_err_type & 0x40)
+			addr = mlog->u.ue_error.effective_address;
+		break;
+	case	PSERIES_MC_ERROR_TYPE_SLB:
+	case	PSERIES_MC_ERROR_TYPE_ERAT:
+	case	PSERIES_MC_ERROR_TYPE_TLB:
+		if (mlog->u.soft_error.soft_err_type & 0x80)
+			addr = mlog->u.soft_error.effective_address;
+	default:
+		break;
+	}
+	return be64_to_cpu(addr);
+}
+
 struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
 					      uint16_t section_id);
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.
  2018-07-02  5:45 [PATCH v5 0/7] powerpc/pseries: Machien check handler improvements Mahesh J Salgaonkar
                   ` (3 preceding siblings ...)
  2018-07-02  5:46 ` [PATCH v5 4/7] powerpc/pseries: Define MCE error event section Mahesh J Salgaonkar
@ 2018-07-02  5:47 ` Mahesh J Salgaonkar
  2018-07-02 22:08   ` Nicholas Piggin
  2018-07-02  5:47 ` [PATCH v5 6/7] powerpc/pseries: Display machine check error details Mahesh J Salgaonkar
  2018-07-02  5:47 ` [PATCH v5 7/7] powerpc/pseries: Dump the SLB contents on SLB MCE errors Mahesh J Salgaonkar
  6 siblings, 1 reply; 17+ messages in thread
From: Mahesh J Salgaonkar @ 2018-07-02  5:47 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K.V, Laurent Dufour, Michal Suchanek

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

On pseries, as of today system crashes if we get a machine check
exceptions due to SLB errors. These are soft errors and can be fixed by
flushing the SLBs so the kernel can continue to function instead of
system crash. We do this in real mode before turning on MMU. Otherwise
we would run into nested machine checks. This patch now fetches the
rtas error log in real mode and flushes the SLBs on SLB errors.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 
 arch/powerpc/include/asm/machdep.h            |    1 
 arch/powerpc/kernel/exceptions-64s.S          |   42 +++++++++++++++++++++
 arch/powerpc/kernel/mce.c                     |   16 +++++++-
 arch/powerpc/mm/slb.c                         |    6 +++
 arch/powerpc/platforms/powernv/opal.c         |    1 
 arch/powerpc/platforms/pseries/pseries.h      |    1 
 arch/powerpc/platforms/pseries/ras.c          |   51 +++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/setup.c        |    1 
 9 files changed, 116 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 50ed64fba4ae..cc00a7088cf3 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -487,6 +487,7 @@ extern void hpte_init_native(void);
 
 extern void slb_initialize(void);
 extern void slb_flush_and_rebolt(void);
+extern void slb_flush_and_rebolt_realmode(void);
 
 extern void slb_vmalloc_update(void);
 extern void slb_set_size(u16 size);
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index ffe7c71e1132..fe447e0d4140 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -108,6 +108,7 @@ struct machdep_calls {
 
 	/* Early exception handlers called in realmode */
 	int		(*hmi_exception_early)(struct pt_regs *regs);
+	int		(*machine_check_early)(struct pt_regs *regs);
 
 	/* Called during machine check exception to retrive fixup address. */
 	bool		(*mce_check_early_recovery)(struct pt_regs *regs);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f283958129f2..0038596b7906 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -332,6 +332,9 @@ TRAMP_REAL_BEGIN(machine_check_pSeries)
 machine_check_fwnmi:
 	SET_SCRATCH0(r13)		/* save r13 */
 	EXCEPTION_PROLOG_0(PACA_EXMC)
+BEGIN_FTR_SECTION
+	b	machine_check_pSeries_early
+END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
 machine_check_pSeries_0:
 	EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
 	/*
@@ -343,6 +346,45 @@ machine_check_pSeries_0:
 
 TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
 
+TRAMP_REAL_BEGIN(machine_check_pSeries_early)
+BEGIN_FTR_SECTION
+	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
+	mr	r10,r1			/* Save r1 */
+	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency stack */
+	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
+	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
+	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
+	EXCEPTION_PROLOG_COMMON_1()
+	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
+	EXCEPTION_PROLOG_COMMON_3(0x200)
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
+
+	/* Move original SRR0 and SRR1 into the respective regs */
+	ld	r9,_MSR(r1)
+	mtspr	SPRN_SRR1,r9
+	ld	r3,_NIP(r1)
+	mtspr	SPRN_SRR0,r3
+	ld	r9,_CTR(r1)
+	mtctr	r9
+	ld	r9,_XER(r1)
+	mtxer	r9
+	ld	r9,_LINK(r1)
+	mtlr	r9
+	REST_GPR(0, r1)
+	REST_8GPRS(2, r1)
+	REST_GPR(10, r1)
+	ld	r11,_CCR(r1)
+	mtcr	r11
+	REST_GPR(11, r1)
+	REST_2GPRS(12, r1)
+	/* restore original r1. */
+	ld	r1,GPR1(r1)
+	SET_SCRATCH0(r13)		/* save r13 */
+	EXCEPTION_PROLOG_0(PACA_EXMC)
+	b	machine_check_pSeries_0
+END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
+
 EXC_COMMON_BEGIN(machine_check_common)
 	/*
 	 * Machine check is different because we use a different
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index efdd16a79075..221271c96a57 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -488,9 +488,21 @@ long machine_check_early(struct pt_regs *regs)
 {
 	long handled = 0;
 
-	__this_cpu_inc(irq_stat.mce_exceptions);
+	/*
+	 * For pSeries we count mce when we go into virtual mode machine
+	 * check handler. Hence skip it. Also, We can't access per cpu
+	 * variables in real mode for LPAR.
+	 */
+	if (early_cpu_has_feature(CPU_FTR_HVMODE))
+		__this_cpu_inc(irq_stat.mce_exceptions);
 
-	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
+	/*
+	 * See if platform is capable of handling machine check.
+	 * Otherwise fallthrough and allow CPU to handle this machine check.
+	 */
+	if (ppc_md.machine_check_early)
+		handled = ppc_md.machine_check_early(regs);
+	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
 		handled = cur_cpu_spec->machine_check_early(regs);
 	return handled;
 }
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 66577cc66dc9..5b1813b98358 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -145,6 +145,12 @@ void slb_flush_and_rebolt(void)
 	get_paca()->slb_cache_ptr = 0;
 }
 
+void slb_flush_and_rebolt_realmode(void)
+{
+	__slb_flush_and_rebolt();
+	get_paca()->slb_cache_ptr = 0;
+}
+
 void slb_vmalloc_update(void)
 {
 	unsigned long vflags;
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 48fbb41af5d1..ed548d40a9e1 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -417,7 +417,6 @@ static int opal_recover_mce(struct pt_regs *regs,
 
 	if (!(regs->msr & MSR_RI)) {
 		/* If MSR_RI isn't set, we cannot recover */
-		pr_err("Machine check interrupt unrecoverable: MSR(RI=0)\n");
 		recovered = 0;
 	} else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
 		/* Platform corrected itself */
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 60db2ee511fb..3611db5dd583 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -24,6 +24,7 @@ struct pt_regs;
 
 extern int pSeries_system_reset_exception(struct pt_regs *regs);
 extern int pSeries_machine_check_exception(struct pt_regs *regs);
+extern int pSeries_machine_check_realmode(struct pt_regs *regs);
 
 #ifdef CONFIG_SMP
 extern void smp_init_pseries(void);
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 851ce326874a..9aa7885e0148 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -427,6 +427,35 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 	return 0; /* need to perform reset */
 }
 
+static int mce_handle_error(struct rtas_error_log *errp)
+{
+	struct pseries_errorlog *pseries_log;
+	struct pseries_mc_errorlog *mce_log;
+	int disposition = rtas_error_disposition(errp);
+	uint8_t error_type;
+
+	if (!rtas_error_extended(errp))
+		goto out;
+
+	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
+	if (pseries_log == NULL)
+		goto out;
+
+	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
+	error_type = rtas_mc_error_type(mce_log);
+
+	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
+			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
+		/* Store the old slb content someplace. */
+		slb_flush_and_rebolt_realmode();
+		disposition = RTAS_DISP_FULLY_RECOVERED;
+		rtas_set_disposition_recovered(errp);
+	}
+
+out:
+	return disposition;
+}
+
 /*
  * Process MCE rtas errlog event.
  */
@@ -503,11 +532,31 @@ int pSeries_machine_check_exception(struct pt_regs *regs)
 	struct rtas_error_log *errp;
 
 	if (fwnmi_active) {
-		errp = fwnmi_get_errinfo(regs);
 		fwnmi_release_errinfo();
+		errp = fwnmi_get_errlog();
 		if (errp && recover_mce(regs, errp))
 			return 1;
 	}
 
 	return 0;
 }
+
+int pSeries_machine_check_realmode(struct pt_regs *regs)
+{
+	struct rtas_error_log *errp;
+	int disposition;
+
+	if (fwnmi_active) {
+		errp = fwnmi_get_errinfo(regs);
+		/*
+		 * Call to fwnmi_release_errinfo() in real mode causes kernel
+		 * to panic. Hence we will call it as soon as we go into
+		 * virtual mode.
+		 */
+		disposition = mce_handle_error(errp);
+		if (disposition == RTAS_DISP_FULLY_RECOVERED)
+			return 1;
+	}
+
+	return 0;
+}
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 60a067a6e743..249b02bc5c41 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -999,6 +999,7 @@ define_machine(pseries) {
 	.calibrate_decr		= generic_calibrate_decr,
 	.progress		= rtas_progress,
 	.system_reset_exception = pSeries_system_reset_exception,
+	.machine_check_early	= pSeries_machine_check_realmode,
 	.machine_check_exception = pSeries_machine_check_exception,
 #ifdef CONFIG_KEXEC_CORE
 	.machine_kexec          = pSeries_machine_kexec,

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 6/7] powerpc/pseries: Display machine check error details.
  2018-07-02  5:45 [PATCH v5 0/7] powerpc/pseries: Machien check handler improvements Mahesh J Salgaonkar
                   ` (4 preceding siblings ...)
  2018-07-02  5:47 ` [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors Mahesh J Salgaonkar
@ 2018-07-02  5:47 ` Mahesh J Salgaonkar
  2018-07-02  5:47 ` [PATCH v5 7/7] powerpc/pseries: Dump the SLB contents on SLB MCE errors Mahesh J Salgaonkar
  6 siblings, 0 replies; 17+ messages in thread
From: Mahesh J Salgaonkar @ 2018-07-02  5:47 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K.V, Laurent Dufour, Michal Suchanek

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Extract the MCE error details from RTAS extended log and display it to
console.

With this patch you should now see mce logs like below:

[  142.371818] Severe Machine check interrupt [Recovered]
[  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[  142.371822]   Initiator: CPU
[  142.371823]   Error type: SLB [Multihit]
[  142.371824]     Effective address: d00000000ca70000

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h      |    5 +
 arch/powerpc/platforms/pseries/ras.c |  131 ++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index ceeed2dd489b..26bc3d5c4992 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -197,6 +197,11 @@ static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
 	return (elog->byte1 & 0x04) >> 2;
 }
 
+static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog)
+{
+	return (elog->byte2 & 0xf0) >> 4;
+}
+
 #define rtas_error_type(x)	((x)->byte3)
 
 static inline
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 9aa7885e0148..7d4d2b8bc019 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -427,6 +427,135 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 	return 0; /* need to perform reset */
 }
 
+#define VAL_TO_STRING(ar, val)	((val < ARRAY_SIZE(ar)) ? ar[val] : "Unknown")
+
+static void pseries_print_mce_info(struct pt_regs *regs,
+						struct rtas_error_log *errp)
+{
+	const char *level, *sevstr;
+	struct pseries_errorlog *pseries_log;
+	struct pseries_mc_errorlog *mce_log;
+	uint8_t error_type, err_sub_type;
+	uint64_t addr;
+	uint8_t initiator = rtas_error_initiator(errp);
+	int disposition = rtas_error_disposition(errp);
+
+	static const char * const initiators[] = {
+		"Unknown",
+		"CPU",
+		"PCI",
+		"ISA",
+		"Memory",
+		"Power Mgmt",
+	};
+	static const char * const mc_err_types[] = {
+		"UE",
+		"SLB",
+		"ERAT",
+		"TLB",
+		"D-Cache",
+		"Unknown",
+		"I-Cache",
+	};
+	static const char * const mc_ue_types[] = {
+		"Indeterminate",
+		"Instruction fetch",
+		"Page table walk ifetch",
+		"Load/Store",
+		"Page table walk Load/Store",
+	};
+
+	/* SLB sub errors valid values are 0x0, 0x1, 0x2 */
+	static const char * const mc_slb_types[] = {
+		"Parity",
+		"Multihit",
+		"Indeterminate",
+	};
+
+	/* TLB and ERAT sub errors valid values are 0x1, 0x2, 0x3 */
+	static const char * const mc_soft_types[] = {
+		"Unknown",
+		"Parity",
+		"Multihit",
+		"Indeterminate",
+	};
+
+	if (!rtas_error_extended(errp)) {
+		pr_err("Machine check interrupt: Missing extended error log\n");
+		return;
+	}
+
+	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
+	if (pseries_log == NULL)
+		return;
+
+	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
+
+	error_type = rtas_mc_error_type(mce_log);
+	err_sub_type = rtas_mc_error_sub_type(mce_log);
+
+	switch (rtas_error_severity(errp)) {
+	case RTAS_SEVERITY_NO_ERROR:
+		level = KERN_INFO;
+		sevstr = "Harmless";
+		break;
+	case RTAS_SEVERITY_WARNING:
+		level = KERN_WARNING;
+		sevstr = "";
+		break;
+	case RTAS_SEVERITY_ERROR:
+	case RTAS_SEVERITY_ERROR_SYNC:
+		level = KERN_ERR;
+		sevstr = "Severe";
+		break;
+	case RTAS_SEVERITY_FATAL:
+	default:
+		level = KERN_ERR;
+		sevstr = "Fatal";
+		break;
+	}
+
+	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
+		disposition == RTAS_DISP_FULLY_RECOVERED ?
+		"Recovered" : "Not recovered");
+	if (user_mode(regs)) {
+		printk("%s  NIP: [%016lx] PID: %d Comm: %s\n", level,
+			regs->nip, current->pid, current->comm);
+	} else {
+		printk("%s  NIP [%016lx]: %pS\n", level, regs->nip,
+			(void *)regs->nip);
+	}
+	printk("%s  Initiator: %s\n", level,
+				VAL_TO_STRING(initiators, initiator));
+
+	switch (error_type) {
+	case PSERIES_MC_ERROR_TYPE_UE:
+		printk("%s  Error type: %s [%s]\n", level,
+			VAL_TO_STRING(mc_err_types, error_type),
+			VAL_TO_STRING(mc_ue_types, err_sub_type));
+		break;
+	case PSERIES_MC_ERROR_TYPE_SLB:
+		printk("%s  Error type: %s [%s]\n", level,
+			VAL_TO_STRING(mc_err_types, error_type),
+			VAL_TO_STRING(mc_slb_types, err_sub_type));
+		break;
+	case PSERIES_MC_ERROR_TYPE_ERAT:
+	case PSERIES_MC_ERROR_TYPE_TLB:
+		printk("%s  Error type: %s [%s]\n", level,
+			VAL_TO_STRING(mc_err_types, error_type),
+			VAL_TO_STRING(mc_soft_types, err_sub_type));
+		break;
+	default:
+		printk("%s  Error type: %s\n", level,
+			VAL_TO_STRING(mc_err_types, error_type));
+		break;
+	}
+
+	addr = rtas_mc_get_effective_addr(mce_log);
+	if (addr)
+		printk("%s    Effective address: %016llx\n", level, addr);
+}
+
 static int mce_handle_error(struct rtas_error_log *errp)
 {
 	struct pseries_errorlog *pseries_log;
@@ -481,6 +610,8 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
 	int recovered = 0;
 	int disposition = rtas_error_disposition(err);
 
+	pseries_print_mce_info(regs, err);
+
 	if (!(regs->msr & MSR_RI)) {
 		/* If MSR_RI isn't set, we cannot recover */
 		recovered = 0;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 7/7] powerpc/pseries: Dump the SLB contents on SLB MCE errors.
  2018-07-02  5:45 [PATCH v5 0/7] powerpc/pseries: Machien check handler improvements Mahesh J Salgaonkar
                   ` (5 preceding siblings ...)
  2018-07-02  5:47 ` [PATCH v5 6/7] powerpc/pseries: Display machine check error details Mahesh J Salgaonkar
@ 2018-07-02  5:47 ` Mahesh J Salgaonkar
  6 siblings, 0 replies; 17+ messages in thread
From: Mahesh J Salgaonkar @ 2018-07-02  5:47 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Aneesh Kumar K.V, Michael Ellerman, Nicholas Piggin,
	Aneesh Kumar K.V, Laurent Dufour, Michal Suchanek

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

If we get a machine check exceptions due to SLB errors then dump the
current SLB contents which will be very much helpful in debugging the
root cause of SLB errors. Introduce an exclusive buffer per cpu to hold
faulty SLB entries. In real mode mce handler saves the old SLB contents
into this buffer accessible through paca and print it out later in virtual
mode.

With this patch the console will log SLB contents like below on SLB MCE
errors:

[ 3022.938065] SLB contents of cpu 0x3
[ 3022.938066] 00 c000000008000000 400ea1b217000500
[ 3022.938067]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
[ 3022.938068] 01 d000000008000000 400d43642f000510
[ 3022.938069]   1T  ESID=   d00000  VSID=      d43642f LLP:110
[ 3022.938070] 05 f000000008000000 400a86c85f000500
[ 3022.938071]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
[ 3022.938072] 06 00007f0008000000 400a628b13000d90
[ 3022.938073]   1T  ESID=       7f  VSID=      a628b13 LLP:110
[ 3022.938074] 07 0000000018000000 000b7979f523fd90
[ 3022.938075]  256M ESID=        1  VSID=   b7979f523f LLP:110
[ 3022.938076] 08 c000000008000000 400ea1b217000510
[ 3022.938076]   1T  ESID=   c00000  VSID=      ea1b217 LLP:110
[ 3022.938077] 09 c000000008000000 400ea1b217000510
[ 3022.938078]   1T  ESID=   c00000  VSID=      ea1b217 LLP:110

Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |    7 +++
 arch/powerpc/include/asm/paca.h               |    1 
 arch/powerpc/mm/slb.c                         |   57 +++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/ras.c          |   10 ++++
 arch/powerpc/platforms/pseries/setup.c        |   10 ++++
 5 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index cc00a7088cf3..5a3fe282076d 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -485,9 +485,16 @@ static inline void hpte_init_pseries(void) { }
 
 extern void hpte_init_native(void);
 
+struct slb_entry {
+	u64	esid;
+	u64	vsid;
+};
+
 extern void slb_initialize(void);
 extern void slb_flush_and_rebolt(void);
 extern void slb_flush_and_rebolt_realmode(void);
+extern void slb_save_contents(struct slb_entry *slb_ptr);
+extern void slb_dump_contents(struct slb_entry *slb_ptr);
 
 extern void slb_vmalloc_update(void);
 extern void slb_set_size(u16 size);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index b441fef53077..653f87c69423 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -253,6 +253,7 @@ struct paca_struct {
 #endif
 #ifdef CONFIG_PPC_PSERIES
 	u8 *mce_data_buf;		/* buffer to hold per cpu rtas errlog */
+	struct slb_entry *mce_faulty_slbs;
 #endif /* CONFIG_PPC_PSERIES */
 } ____cacheline_aligned;
 
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 5b1813b98358..476ab0b1d4e8 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -151,6 +151,63 @@ void slb_flush_and_rebolt_realmode(void)
 	get_paca()->slb_cache_ptr = 0;
 }
 
+void slb_save_contents(struct slb_entry *slb_ptr)
+{
+	int i;
+	unsigned long e, v;
+
+	if (!slb_ptr)
+		return;
+
+	for (i = 0; i < mmu_slb_size; i++) {
+		asm volatile("slbmfee  %0,%1" : "=r" (e) : "r" (i));
+		asm volatile("slbmfev  %0,%1" : "=r" (v) : "r" (i));
+		slb_ptr->esid = e;
+		slb_ptr->vsid = v;
+		slb_ptr++;
+	}
+}
+
+void slb_dump_contents(struct slb_entry *slb_ptr)
+{
+	int i;
+	unsigned long e, v;
+	unsigned long llp;
+
+	if (!slb_ptr)
+		return;
+
+	pr_err("SLB contents of cpu 0x%x\n", smp_processor_id());
+
+	for (i = 0; i < mmu_slb_size; i++) {
+		e = slb_ptr->esid;
+		v = slb_ptr->vsid;
+		slb_ptr++;
+
+		if (!e && !v)
+			continue;
+
+		pr_err("%02d %016lx %016lx\n", i, e, v);
+
+		if (!(e & SLB_ESID_V)) {
+			pr_err("\n");
+			continue;
+		}
+		llp = v & SLB_VSID_LLP;
+		if (v & SLB_VSID_B_1T) {
+			pr_err("  1T  ESID=%9lx  VSID=%13lx LLP:%3lx\n",
+				GET_ESID_1T(e),
+				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T,
+				llp);
+		} else {
+			pr_err(" 256M ESID=%9lx  VSID=%13lx LLP:%3lx\n",
+				GET_ESID(e),
+				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT,
+				llp);
+		}
+	}
+}
+
 void slb_vmalloc_update(void)
 {
 	unsigned long vflags;
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 7d4d2b8bc019..d33c88e65fa1 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -515,6 +515,10 @@ static void pseries_print_mce_info(struct pt_regs *regs,
 		break;
 	}
 
+	/* Display faulty slb contents for SLB errors. */
+	if (error_type == PSERIES_MC_ERROR_TYPE_SLB)
+		slb_dump_contents(local_paca->mce_faulty_slbs);
+
 	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
 		disposition == RTAS_DISP_FULLY_RECOVERED ?
 		"Recovered" : "Not recovered");
@@ -575,7 +579,11 @@ static int mce_handle_error(struct rtas_error_log *errp)
 
 	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
 			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
-		/* Store the old slb content someplace. */
+		/*
+		 * Store the old slb content in paca before flushing. Print
+		 * this when we go to virtual mode.
+		 */
+		slb_save_contents(local_paca->mce_faulty_slbs);
 		slb_flush_and_rebolt_realmode();
 		disposition = RTAS_DISP_FULLY_RECOVERED;
 		rtas_set_disposition_recovered(errp);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 249b02bc5c41..76d15e46a152 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -105,6 +105,9 @@ static void __init fwnmi_init(void)
 	u8 *mce_data_buf;
 	unsigned int i;
 	int nr_cpus = num_possible_cpus();
+	struct slb_entry *slb_ptr;
+	size_t size;
+
 
 	int ibm_nmi_register = rtas_token("ibm,nmi-register");
 	if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
@@ -130,6 +133,13 @@ static void __init fwnmi_init(void)
 		paca_ptrs[i]->mce_data_buf = mce_data_buf +
 						(RTAS_ERROR_LOG_MAX * i);
 	}
+
+	/* Allocate per cpu slb area to save old slb contents during MCE */
+	size = sizeof(struct slb_entry) * mmu_slb_size * nr_cpus;
+	slb_ptr = __va(memblock_alloc_base(size, sizeof(struct slb_entry),
+							ppc64_rma_size));
+	for_each_possible_cpu(i)
+		paca_ptrs[i]->mce_faulty_slbs = slb_ptr + (mmu_slb_size * i);
 }
 
 static void pseries_8259_cascade(struct irq_desc *desc)

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.
  2018-07-02  5:47 ` [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors Mahesh J Salgaonkar
@ 2018-07-02 22:08   ` Nicholas Piggin
  2018-07-03  7:20     ` Mahesh Jagannath Salgaonkar
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Nicholas Piggin @ 2018-07-02 22:08 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Aneesh Kumar K.V, Laurent Dufour, Michal Suchanek

On Mon, 02 Jul 2018 11:17:06 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> On pseries, as of today system crashes if we get a machine check
> exceptions due to SLB errors. These are soft errors and can be fixed by
> flushing the SLBs so the kernel can continue to function instead of
> system crash. We do this in real mode before turning on MMU. Otherwise
> we would run into nested machine checks. This patch now fetches the
> rtas error log in real mode and flushes the SLBs on SLB errors.
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 
>  arch/powerpc/include/asm/machdep.h            |    1 
>  arch/powerpc/kernel/exceptions-64s.S          |   42 +++++++++++++++++++++
>  arch/powerpc/kernel/mce.c                     |   16 +++++++-
>  arch/powerpc/mm/slb.c                         |    6 +++
>  arch/powerpc/platforms/powernv/opal.c         |    1 
>  arch/powerpc/platforms/pseries/pseries.h      |    1 
>  arch/powerpc/platforms/pseries/ras.c          |   51 +++++++++++++++++++++++++
>  arch/powerpc/platforms/pseries/setup.c        |    1 
>  9 files changed, 116 insertions(+), 4 deletions(-)
> 


> +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> +BEGIN_FTR_SECTION
> +	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> +	mr	r10,r1			/* Save r1 */
> +	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency stack */
> +	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
> +	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
> +	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
> +	EXCEPTION_PROLOG_COMMON_1()
> +	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> +	EXCEPTION_PROLOG_COMMON_3(0x200)
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
> +	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */

Is there any reason you can't use the existing
machine_check_powernv_early code to do all this?

> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index efdd16a79075..221271c96a57 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -488,9 +488,21 @@ long machine_check_early(struct pt_regs *regs)
>  {
>  	long handled = 0;
>  
> -	__this_cpu_inc(irq_stat.mce_exceptions);
> +	/*
> +	 * For pSeries we count mce when we go into virtual mode machine
> +	 * check handler. Hence skip it. Also, We can't access per cpu
> +	 * variables in real mode for LPAR.
> +	 */
> +	if (early_cpu_has_feature(CPU_FTR_HVMODE))
> +		__this_cpu_inc(irq_stat.mce_exceptions);
>  
> -	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> +	/*
> +	 * See if platform is capable of handling machine check.
> +	 * Otherwise fallthrough and allow CPU to handle this machine check.
> +	 */
> +	if (ppc_md.machine_check_early)
> +		handled = ppc_md.machine_check_early(regs);
> +	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
>  		handled = cur_cpu_spec->machine_check_early(regs);

Would be good to add a powernv ppc_md handler which does the
cur_cpu_spec->machine_check_early() call now that other platforms are
calling this code. Because those aren't valid as a fallback call, but
specific to powernv.

> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 48fbb41af5d1..ed548d40a9e1 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -417,7 +417,6 @@ static int opal_recover_mce(struct pt_regs *regs,
>  
>  	if (!(regs->msr & MSR_RI)) {
>  		/* If MSR_RI isn't set, we cannot recover */
> -		pr_err("Machine check interrupt unrecoverable: MSR(RI=0)\n");

What's the reason for this change?

>  		recovered = 0;
>  	} else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
>  		/* Platform corrected itself */
> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
> index 60db2ee511fb..3611db5dd583 100644
> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -24,6 +24,7 @@ struct pt_regs;
>  
>  extern int pSeries_system_reset_exception(struct pt_regs *regs);
>  extern int pSeries_machine_check_exception(struct pt_regs *regs);
> +extern int pSeries_machine_check_realmode(struct pt_regs *regs);
>  
>  #ifdef CONFIG_SMP
>  extern void smp_init_pseries(void);
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 851ce326874a..9aa7885e0148 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -427,6 +427,35 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  	return 0; /* need to perform reset */
>  }
>  
> +static int mce_handle_error(struct rtas_error_log *errp)
> +{
> +	struct pseries_errorlog *pseries_log;
> +	struct pseries_mc_errorlog *mce_log;
> +	int disposition = rtas_error_disposition(errp);
> +	uint8_t error_type;
> +
> +	if (!rtas_error_extended(errp))
> +		goto out;
> +
> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> +	if (pseries_log == NULL)
> +		goto out;
> +
> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> +	error_type = rtas_mc_error_type(mce_log);
> +
> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
> +		/* Store the old slb content someplace. */
> +		slb_flush_and_rebolt_realmode();
> +		disposition = RTAS_DISP_FULLY_RECOVERED;
> +		rtas_set_disposition_recovered(errp);
> +	}
> +
> +out:
> +	return disposition;
> +}
> +
>  /*
>   * Process MCE rtas errlog event.
>   */
> @@ -503,11 +532,31 @@ int pSeries_machine_check_exception(struct pt_regs *regs)
>  	struct rtas_error_log *errp;
>  
>  	if (fwnmi_active) {
> -		errp = fwnmi_get_errinfo(regs);
>  		fwnmi_release_errinfo();

Should the fwnmi_release_errinfo be done in the realmode path as well
now, or is there some reason to leave it here?

> +		errp = fwnmi_get_errlog();
>  		if (errp && recover_mce(regs, errp))
>  			return 1;
>  	}
>  
>  	return 0;
>  }
> +
> +int pSeries_machine_check_realmode(struct pt_regs *regs)
> +{
> +	struct rtas_error_log *errp;
> +	int disposition;
> +
> +	if (fwnmi_active) {
> +		errp = fwnmi_get_errinfo(regs);
> +		/*
> +		 * Call to fwnmi_release_errinfo() in real mode causes kernel
> +		 * to panic. Hence we will call it as soon as we go into
> +		 * virtual mode.
> +		 */
> +		disposition = mce_handle_error(errp);
> +		if (disposition == RTAS_DISP_FULLY_RECOVERED)
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 60a067a6e743..249b02bc5c41 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -999,6 +999,7 @@ define_machine(pseries) {
>  	.calibrate_decr		= generic_calibrate_decr,
>  	.progress		= rtas_progress,
>  	.system_reset_exception = pSeries_system_reset_exception,
> +	.machine_check_early	= pSeries_machine_check_realmode,
>  	.machine_check_exception = pSeries_machine_check_exception,
>  #ifdef CONFIG_KEXEC_CORE
>  	.machine_kexec          = pSeries_machine_kexec,
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 2/7] powerpc/pseries: Defer the logging of rtas error to irq work queue.
  2018-07-02  5:46 ` [PATCH v5 2/7] powerpc/pseries: Defer the logging of rtas error to irq work queue Mahesh J Salgaonkar
@ 2018-07-03  3:25   ` Nicholas Piggin
  2018-07-03 10:32     ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2018-07-03  3:25 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, stable, Aneesh Kumar K.V, Laurent Dufour, Michal Suchanek

On Mon, 02 Jul 2018 11:16:29 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> rtas_log_buf is a buffer to hold RTAS event data that are communicated
> to kernel by hypervisor. This buffer is then used to pass RTAS event
> data to user through proc fs. This buffer is allocated from vmalloc
> (non-linear mapping) area.
> 
> On Machine check interrupt, register r3 points to RTAS extended event
> log passed by hypervisor that contains the MCE event. The pseries
> machine check handler then logs this error into rtas_log_buf. The
> rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
> page fault (vector 0x300) while accessing it. Since machine check
> interrupt handler runs in NMI context we can not afford to take any
> page fault. Page faults are not honored in NMI context and causes
> kernel panic. Apart from that, as Nick pointed out, pSeries_log_error()
> also takes a spin_lock while logging error which is not safe in NMI
> context. It may endup in deadlock if we get another MCE before releasing
> the lock. Fix this by deferring the logging of rtas error to irq work queue.
> 
> Current implementation uses two different buffers to hold rtas error log
> depending on whether extended log is provided or not. This makes bit
> difficult to identify which buffer has valid data that needs to logged
> later in irq work. Simplify this using single buffer, one per paca, and
> copy rtas log to it irrespective of whether extended log is provided or
> not. Allocate this buffer below RMA region so that it can be accessed
> in real mode mce handler.
> 
> Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

I think this looks reasonable. It doesn't fix that commit so much as
fixes the problem that's apparent after it's applied. I don't know if
we should backport this to a wider set of stable kernels? Aside from
that,

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Nick

> ---
>  arch/powerpc/include/asm/paca.h        |    3 ++
>  arch/powerpc/platforms/pseries/ras.c   |   47 ++++++++++++++++++++++----------
>  arch/powerpc/platforms/pseries/setup.c |   16 +++++++++++
>  3 files changed, 51 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 3f109a3e3edb..b441fef53077 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -251,6 +251,9 @@ struct paca_struct {
>  	void *rfi_flush_fallback_area;
>  	u64 l1d_flush_size;
>  #endif
> +#ifdef CONFIG_PPC_PSERIES
> +	u8 *mce_data_buf;		/* buffer to hold per cpu rtas errlog */
> +#endif /* CONFIG_PPC_PSERIES */
>  } ____cacheline_aligned;
>  
>  extern void copy_mm_to_paca(struct mm_struct *mm);
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index ef104144d4bc..14a46b07ab2f 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -22,6 +22,7 @@
>  #include <linux/of.h>
>  #include <linux/fs.h>
>  #include <linux/reboot.h>
> +#include <linux/irq_work.h>
>  
>  #include <asm/machdep.h>
>  #include <asm/rtas.h>
> @@ -32,11 +33,13 @@
>  static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX];
>  static DEFINE_SPINLOCK(ras_log_buf_lock);
>  
> -static char global_mce_data_buf[RTAS_ERROR_LOG_MAX];
> -static DEFINE_PER_CPU(__u64, mce_data_buf);
> -
>  static int ras_check_exception_token;
>  
> +static void mce_process_errlog_event(struct irq_work *work);
> +static struct irq_work mce_errlog_process_work = {
> +	.func = mce_process_errlog_event,
> +};
> +
>  #define EPOW_SENSOR_TOKEN	9
>  #define EPOW_SENSOR_INDEX	0
>  
> @@ -330,16 +333,20 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
>  	((((A) >= 0x7000) && ((A) < 0x7ff0)) || \
>  	(((A) >= rtas.base) && ((A) < (rtas.base + rtas.size - 16))))
>  
> +static inline struct rtas_error_log *fwnmi_get_errlog(void)
> +{
> +	return (struct rtas_error_log *)local_paca->mce_data_buf;
> +}
> +
>  /*
>   * Get the error information for errors coming through the
>   * FWNMI vectors.  The pt_regs' r3 will be updated to reflect
>   * the actual r3 if possible, and a ptr to the error log entry
>   * will be returned if found.
>   *
> - * If the RTAS error is not of the extended type, then we put it in a per
> - * cpu 64bit buffer. If it is the extended type we use global_mce_data_buf.
> + * Use one buffer mce_data_buf per cpu to store RTAS error.
>   *
> - * The global_mce_data_buf does not have any locks or protection around it,
> + * The mce_data_buf does not have any locks or protection around it,
>   * if a second machine check comes in, or a system reset is done
>   * before we have logged the error, then we will get corruption in the
>   * error log.  This is preferable over holding off on calling
> @@ -349,7 +356,7 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
>  static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>  {
>  	unsigned long *savep;
> -	struct rtas_error_log *h, *errhdr = NULL;
> +	struct rtas_error_log *h;
>  
>  	/* Mask top two bits */
>  	regs->gpr[3] &= ~(0x3UL << 62);
> @@ -362,22 +369,20 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>  	savep = __va(regs->gpr[3]);
>  	regs->gpr[3] = savep[0];	/* restore original r3 */
>  
> -	/* If it isn't an extended log we can use the per cpu 64bit buffer */
>  	h = (struct rtas_error_log *)&savep[1];
> +	/* Use the per cpu buffer from paca to store rtas error log */
> +	memset(local_paca->mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
>  	if (!rtas_error_extended(h)) {
> -		memcpy(this_cpu_ptr(&mce_data_buf), h, sizeof(__u64));
> -		errhdr = (struct rtas_error_log *)this_cpu_ptr(&mce_data_buf);
> +		memcpy(local_paca->mce_data_buf, h, sizeof(__u64));
>  	} else {
>  		int len, error_log_length;
>  
>  		error_log_length = 8 + rtas_error_extended_log_length(h);
>  		len = min_t(int, error_log_length, RTAS_ERROR_LOG_MAX);
> -		memset(global_mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
> -		memcpy(global_mce_data_buf, h, len);
> -		errhdr = (struct rtas_error_log *)global_mce_data_buf;
> +		memcpy(local_paca->mce_data_buf, h, len);
>  	}
>  
> -	return errhdr;
> +	return (struct rtas_error_log *)local_paca->mce_data_buf;
>  }
>  
>  /* Call this when done with the data returned by FWNMI_get_errinfo.
> @@ -422,6 +427,17 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  	return 0; /* need to perform reset */
>  }
>  
> +/*
> + * Process MCE rtas errlog event.
> + */
> +static void mce_process_errlog_event(struct irq_work *work)
> +{
> +	struct rtas_error_log *err;
> +
> +	err = fwnmi_get_errlog();
> +	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
> +}
> +
>  /*
>   * See if we can recover from a machine check exception.
>   * This is only called on power4 (or above) and only via
> @@ -466,7 +482,8 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
>  		recovered = 1;
>  	}
>  
> -	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
> +	/* Queue irq work to log this rtas event later. */
> +	irq_work_queue(&mce_errlog_process_work);
>  
>  	return recovered;
>  }
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index fdb32e056ef4..60a067a6e743 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -41,6 +41,7 @@
>  #include <linux/root_dev.h>
>  #include <linux/of.h>
>  #include <linux/of_pci.h>
> +#include <linux/memblock.h>
>  
>  #include <asm/mmu.h>
>  #include <asm/processor.h>
> @@ -101,6 +102,9 @@ static void pSeries_show_cpuinfo(struct seq_file *m)
>  static void __init fwnmi_init(void)
>  {
>  	unsigned long system_reset_addr, machine_check_addr;
> +	u8 *mce_data_buf;
> +	unsigned int i;
> +	int nr_cpus = num_possible_cpus();
>  
>  	int ibm_nmi_register = rtas_token("ibm,nmi-register");
>  	if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
> @@ -114,6 +118,18 @@ static void __init fwnmi_init(void)
>  	if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
>  				machine_check_addr))
>  		fwnmi_active = 1;
> +
> +	/*
> +	 * Allocate a chunk for per cpu buffer to hold rtas errorlog.
> +	 * It will be used in real mode mce handler, hence it needs to be
> +	 * below RMA.
> +	 */
> +	mce_data_buf = __va(memblock_alloc_base(RTAS_ERROR_LOG_MAX * nr_cpus,
> +					RTAS_ERROR_LOG_MAX, ppc64_rma_size));
> +	for_each_possible_cpu(i) {
> +		paca_ptrs[i]->mce_data_buf = mce_data_buf +
> +						(RTAS_ERROR_LOG_MAX * i);
> +	}
>  }
>  
>  static void pseries_8259_cascade(struct irq_desc *desc)
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.
  2018-07-02 22:08   ` Nicholas Piggin
@ 2018-07-03  7:20     ` Mahesh Jagannath Salgaonkar
  2018-07-03 10:37     ` Michal Suchánek
  2018-07-12 13:41     ` Michal Suchánek
  2 siblings, 0 replies; 17+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-07-03  7:20 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Aneesh Kumar K.V, Laurent Dufour, Michal Suchanek

On 07/03/2018 03:38 AM, Nicholas Piggin wrote:
> On Mon, 02 Jul 2018 11:17:06 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> On pseries, as of today system crashes if we get a machine check
>> exceptions due to SLB errors. These are soft errors and can be fixed by
>> flushing the SLBs so the kernel can continue to function instead of
>> system crash. We do this in real mode before turning on MMU. Otherwise
>> we would run into nested machine checks. This patch now fetches the
>> rtas error log in real mode and flushes the SLBs on SLB errors.
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 
>>  arch/powerpc/include/asm/machdep.h            |    1 
>>  arch/powerpc/kernel/exceptions-64s.S          |   42 +++++++++++++++++++++
>>  arch/powerpc/kernel/mce.c                     |   16 +++++++-
>>  arch/powerpc/mm/slb.c                         |    6 +++
>>  arch/powerpc/platforms/powernv/opal.c         |    1 
>>  arch/powerpc/platforms/pseries/pseries.h      |    1 
>>  arch/powerpc/platforms/pseries/ras.c          |   51 +++++++++++++++++++++++++
>>  arch/powerpc/platforms/pseries/setup.c        |    1 
>>  9 files changed, 116 insertions(+), 4 deletions(-)
>>
> 
> 
>> +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
>> +BEGIN_FTR_SECTION
>> +	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
>> +	mr	r10,r1			/* Save r1 */
>> +	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency stack */
>> +	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
>> +	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
>> +	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
>> +	EXCEPTION_PROLOG_COMMON_1()
>> +	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
>> +	EXCEPTION_PROLOG_COMMON_3(0x200)
>> +	addi	r3,r1,STACK_FRAME_OVERHEAD
>> +	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
> 
> Is there any reason you can't use the existing
> machine_check_powernv_early code to do all this?

I did think about that :-). But the machine_check_powernv_early code
does bit of extra stuff which isn't required in pseries like touching ME
bit in MSR and lots of checks that are done in
machine_check_handle_early() before going to virtual mode. But on second
look I see that we can bypass all that with HVMODE FTR section. Will
rename machine_check_powernv_early to machine_check_common_early and
reuse it.

> 
>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>> index efdd16a79075..221271c96a57 100644
>> --- a/arch/powerpc/kernel/mce.c
>> +++ b/arch/powerpc/kernel/mce.c
>> @@ -488,9 +488,21 @@ long machine_check_early(struct pt_regs *regs)
>>  {
>>  	long handled = 0;
>>  
>> -	__this_cpu_inc(irq_stat.mce_exceptions);
>> +	/*
>> +	 * For pSeries we count mce when we go into virtual mode machine
>> +	 * check handler. Hence skip it. Also, We can't access per cpu
>> +	 * variables in real mode for LPAR.
>> +	 */
>> +	if (early_cpu_has_feature(CPU_FTR_HVMODE))
>> +		__this_cpu_inc(irq_stat.mce_exceptions);
>>  
>> -	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
>> +	/*
>> +	 * See if platform is capable of handling machine check.
>> +	 * Otherwise fallthrough and allow CPU to handle this machine check.
>> +	 */
>> +	if (ppc_md.machine_check_early)
>> +		handled = ppc_md.machine_check_early(regs);
>> +	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
>>  		handled = cur_cpu_spec->machine_check_early(regs);
> 
> Would be good to add a powernv ppc_md handler which does the
> cur_cpu_spec->machine_check_early() call now that other platforms are
> calling this code. Because those aren't valid as a fallback call, but
> specific to powernv.
> 
>> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
>> index 48fbb41af5d1..ed548d40a9e1 100644
>> --- a/arch/powerpc/platforms/powernv/opal.c
>> +++ b/arch/powerpc/platforms/powernv/opal.c
>> @@ -417,7 +417,6 @@ static int opal_recover_mce(struct pt_regs *regs,
>>  
>>  	if (!(regs->msr & MSR_RI)) {
>>  		/* If MSR_RI isn't set, we cannot recover */
>> -		pr_err("Machine check interrupt unrecoverable: MSR(RI=0)\n");
> 
> What's the reason for this change?

Err... This is by mistake.. My bad. Thanks for catching this. Will
remove this hunk in next revision. We need a similar print for pSeries
in ras.c.

> 
>>  		recovered = 0;
>>  	} else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
>>  		/* Platform corrected itself */
>> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
>> index 60db2ee511fb..3611db5dd583 100644
>> --- a/arch/powerpc/platforms/pseries/pseries.h
>> +++ b/arch/powerpc/platforms/pseries/pseries.h
>> @@ -24,6 +24,7 @@ struct pt_regs;
>>  
>>  extern int pSeries_system_reset_exception(struct pt_regs *regs);
>>  extern int pSeries_machine_check_exception(struct pt_regs *regs);
>> +extern int pSeries_machine_check_realmode(struct pt_regs *regs);
>>  
>>  #ifdef CONFIG_SMP
>>  extern void smp_init_pseries(void);
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index 851ce326874a..9aa7885e0148 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -427,6 +427,35 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>  	return 0; /* need to perform reset */
>>  }
>>  
>> +static int mce_handle_error(struct rtas_error_log *errp)
>> +{
>> +	struct pseries_errorlog *pseries_log;
>> +	struct pseries_mc_errorlog *mce_log;
>> +	int disposition = rtas_error_disposition(errp);
>> +	uint8_t error_type;
>> +
>> +	if (!rtas_error_extended(errp))
>> +		goto out;
>> +
>> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>> +	if (pseries_log == NULL)
>> +		goto out;
>> +
>> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>> +	error_type = rtas_mc_error_type(mce_log);
>> +
>> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
>> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
>> +		/* Store the old slb content someplace. */
>> +		slb_flush_and_rebolt_realmode();
>> +		disposition = RTAS_DISP_FULLY_RECOVERED;
>> +		rtas_set_disposition_recovered(errp);
>> +	}
>> +
>> +out:
>> +	return disposition;
>> +}
>> +
>>  /*
>>   * Process MCE rtas errlog event.
>>   */
>> @@ -503,11 +532,31 @@ int pSeries_machine_check_exception(struct pt_regs *regs)
>>  	struct rtas_error_log *errp;
>>  
>>  	if (fwnmi_active) {
>> -		errp = fwnmi_get_errinfo(regs);
>>  		fwnmi_release_errinfo();
> 
> Should the fwnmi_release_errinfo be done in the realmode path as well
> now, or is there some reason to leave it here?

In real mode calling fwnmi_release_errinfo() causes kernel panic.
Couldn't debug further to find out why. So decided to keep it in virtual
mode. I have mentioned that in comment below in
pSeries_machine_check_realmode().

> 
>> +		errp = fwnmi_get_errlog();
>>  		if (errp && recover_mce(regs, errp))
>>  			return 1;
>>  	}
>>  
>>  	return 0;
>>  }
>> +
>> +int pSeries_machine_check_realmode(struct pt_regs *regs)
>> +{
>> +	struct rtas_error_log *errp;
>> +	int disposition;
>> +
>> +	if (fwnmi_active) {
>> +		errp = fwnmi_get_errinfo(regs);
>> +		/*
>> +		 * Call to fwnmi_release_errinfo() in real mode causes kernel
>> +		 * to panic. Hence we will call it as soon as we go into
>> +		 * virtual mode.
>> +		 */
>> +		disposition = mce_handle_error(errp);
>> +		if (disposition == RTAS_DISP_FULLY_RECOVERED)
>> +			return 1;
>> +	}
>> +
>> +	return 0;
>> +}
>> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
>> index 60a067a6e743..249b02bc5c41 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -999,6 +999,7 @@ define_machine(pseries) {
>>  	.calibrate_decr		= generic_calibrate_decr,
>>  	.progress		= rtas_progress,
>>  	.system_reset_exception = pSeries_system_reset_exception,
>> +	.machine_check_early	= pSeries_machine_check_realmode,
>>  	.machine_check_exception = pSeries_machine_check_exception,
>>  #ifdef CONFIG_KEXEC_CORE
>>  	.machine_kexec          = pSeries_machine_kexec,
>>
> 

Thanks for your review.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 2/7] powerpc/pseries: Defer the logging of rtas error to irq work queue.
  2018-07-03  3:25   ` Nicholas Piggin
@ 2018-07-03 10:32     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 17+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-07-03 10:32 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, stable, Aneesh Kumar K.V, Laurent Dufour, Michal Suchanek

On 07/03/2018 08:55 AM, Nicholas Piggin wrote:
> On Mon, 02 Jul 2018 11:16:29 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> rtas_log_buf is a buffer to hold RTAS event data that are communicated
>> to kernel by hypervisor. This buffer is then used to pass RTAS event
>> data to user through proc fs. This buffer is allocated from vmalloc
>> (non-linear mapping) area.
>>
>> On Machine check interrupt, register r3 points to RTAS extended event
>> log passed by hypervisor that contains the MCE event. The pseries
>> machine check handler then logs this error into rtas_log_buf. The
>> rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
>> page fault (vector 0x300) while accessing it. Since machine check
>> interrupt handler runs in NMI context we can not afford to take any
>> page fault. Page faults are not honored in NMI context and causes
>> kernel panic. Apart from that, as Nick pointed out, pSeries_log_error()
>> also takes a spin_lock while logging error which is not safe in NMI
>> context. It may endup in deadlock if we get another MCE before releasing
>> the lock. Fix this by deferring the logging of rtas error to irq work queue.
>>
>> Current implementation uses two different buffers to hold rtas error log
>> depending on whether extended log is provided or not. This makes bit
>> difficult to identify which buffer has valid data that needs to logged
>> later in irq work. Simplify this using single buffer, one per paca, and
>> copy rtas log to it irrespective of whether extended log is provided or
>> not. Allocate this buffer below RMA region so that it can be accessed
>> in real mode mce handler.
>>
>> Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> I think this looks reasonable. It doesn't fix that commit so much as
> fixes the problem that's apparent after it's applied. I don't know if
> we should backport this to a wider set of stable kernels? Aside from
> that,

Since the commit b96672dd840f went into 4.13, I think it is good if we
backport this to V4.14 and above stable kernels.

Thanks,
-Mahesh

> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
> Thanks,
> Nick
> 
>> ---
>>  arch/powerpc/include/asm/paca.h        |    3 ++
>>  arch/powerpc/platforms/pseries/ras.c   |   47 ++++++++++++++++++++++----------
>>  arch/powerpc/platforms/pseries/setup.c |   16 +++++++++++
>>  3 files changed, 51 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
>> index 3f109a3e3edb..b441fef53077 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@ -251,6 +251,9 @@ struct paca_struct {
>>  	void *rfi_flush_fallback_area;
>>  	u64 l1d_flush_size;
>>  #endif
>> +#ifdef CONFIG_PPC_PSERIES
>> +	u8 *mce_data_buf;		/* buffer to hold per cpu rtas errlog */
>> +#endif /* CONFIG_PPC_PSERIES */
>>  } ____cacheline_aligned;
>>  
>>  extern void copy_mm_to_paca(struct mm_struct *mm);
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index ef104144d4bc..14a46b07ab2f 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -22,6 +22,7 @@
>>  #include <linux/of.h>
>>  #include <linux/fs.h>
>>  #include <linux/reboot.h>
>> +#include <linux/irq_work.h>
>>  
>>  #include <asm/machdep.h>
>>  #include <asm/rtas.h>
>> @@ -32,11 +33,13 @@
>>  static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX];
>>  static DEFINE_SPINLOCK(ras_log_buf_lock);
>>  
>> -static char global_mce_data_buf[RTAS_ERROR_LOG_MAX];
>> -static DEFINE_PER_CPU(__u64, mce_data_buf);
>> -
>>  static int ras_check_exception_token;
>>  
>> +static void mce_process_errlog_event(struct irq_work *work);
>> +static struct irq_work mce_errlog_process_work = {
>> +	.func = mce_process_errlog_event,
>> +};
>> +
>>  #define EPOW_SENSOR_TOKEN	9
>>  #define EPOW_SENSOR_INDEX	0
>>  
>> @@ -330,16 +333,20 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
>>  	((((A) >= 0x7000) && ((A) < 0x7ff0)) || \
>>  	(((A) >= rtas.base) && ((A) < (rtas.base + rtas.size - 16))))
>>  
>> +static inline struct rtas_error_log *fwnmi_get_errlog(void)
>> +{
>> +	return (struct rtas_error_log *)local_paca->mce_data_buf;
>> +}
>> +
>>  /*
>>   * Get the error information for errors coming through the
>>   * FWNMI vectors.  The pt_regs' r3 will be updated to reflect
>>   * the actual r3 if possible, and a ptr to the error log entry
>>   * will be returned if found.
>>   *
>> - * If the RTAS error is not of the extended type, then we put it in a per
>> - * cpu 64bit buffer. If it is the extended type we use global_mce_data_buf.
>> + * Use one buffer mce_data_buf per cpu to store RTAS error.
>>   *
>> - * The global_mce_data_buf does not have any locks or protection around it,
>> + * The mce_data_buf does not have any locks or protection around it,
>>   * if a second machine check comes in, or a system reset is done
>>   * before we have logged the error, then we will get corruption in the
>>   * error log.  This is preferable over holding off on calling
>> @@ -349,7 +356,7 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
>>  static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>>  {
>>  	unsigned long *savep;
>> -	struct rtas_error_log *h, *errhdr = NULL;
>> +	struct rtas_error_log *h;
>>  
>>  	/* Mask top two bits */
>>  	regs->gpr[3] &= ~(0x3UL << 62);
>> @@ -362,22 +369,20 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>>  	savep = __va(regs->gpr[3]);
>>  	regs->gpr[3] = savep[0];	/* restore original r3 */
>>  
>> -	/* If it isn't an extended log we can use the per cpu 64bit buffer */
>>  	h = (struct rtas_error_log *)&savep[1];
>> +	/* Use the per cpu buffer from paca to store rtas error log */
>> +	memset(local_paca->mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
>>  	if (!rtas_error_extended(h)) {
>> -		memcpy(this_cpu_ptr(&mce_data_buf), h, sizeof(__u64));
>> -		errhdr = (struct rtas_error_log *)this_cpu_ptr(&mce_data_buf);
>> +		memcpy(local_paca->mce_data_buf, h, sizeof(__u64));
>>  	} else {
>>  		int len, error_log_length;
>>  
>>  		error_log_length = 8 + rtas_error_extended_log_length(h);
>>  		len = min_t(int, error_log_length, RTAS_ERROR_LOG_MAX);
>> -		memset(global_mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
>> -		memcpy(global_mce_data_buf, h, len);
>> -		errhdr = (struct rtas_error_log *)global_mce_data_buf;
>> +		memcpy(local_paca->mce_data_buf, h, len);
>>  	}
>>  
>> -	return errhdr;
>> +	return (struct rtas_error_log *)local_paca->mce_data_buf;
>>  }
>>  
>>  /* Call this when done with the data returned by FWNMI_get_errinfo.
>> @@ -422,6 +427,17 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>  	return 0; /* need to perform reset */
>>  }
>>  
>> +/*
>> + * Process MCE rtas errlog event.
>> + */
>> +static void mce_process_errlog_event(struct irq_work *work)
>> +{
>> +	struct rtas_error_log *err;
>> +
>> +	err = fwnmi_get_errlog();
>> +	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
>> +}
>> +
>>  /*
>>   * See if we can recover from a machine check exception.
>>   * This is only called on power4 (or above) and only via
>> @@ -466,7 +482,8 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
>>  		recovered = 1;
>>  	}
>>  
>> -	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
>> +	/* Queue irq work to log this rtas event later. */
>> +	irq_work_queue(&mce_errlog_process_work);
>>  
>>  	return recovered;
>>  }
>> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
>> index fdb32e056ef4..60a067a6e743 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -41,6 +41,7 @@
>>  #include <linux/root_dev.h>
>>  #include <linux/of.h>
>>  #include <linux/of_pci.h>
>> +#include <linux/memblock.h>
>>  
>>  #include <asm/mmu.h>
>>  #include <asm/processor.h>
>> @@ -101,6 +102,9 @@ static void pSeries_show_cpuinfo(struct seq_file *m)
>>  static void __init fwnmi_init(void)
>>  {
>>  	unsigned long system_reset_addr, machine_check_addr;
>> +	u8 *mce_data_buf;
>> +	unsigned int i;
>> +	int nr_cpus = num_possible_cpus();
>>  
>>  	int ibm_nmi_register = rtas_token("ibm,nmi-register");
>>  	if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
>> @@ -114,6 +118,18 @@ static void __init fwnmi_init(void)
>>  	if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
>>  				machine_check_addr))
>>  		fwnmi_active = 1;
>> +
>> +	/*
>> +	 * Allocate a chunk for per cpu buffer to hold rtas errorlog.
>> +	 * It will be used in real mode mce handler, hence it needs to be
>> +	 * below RMA.
>> +	 */
>> +	mce_data_buf = __va(memblock_alloc_base(RTAS_ERROR_LOG_MAX * nr_cpus,
>> +					RTAS_ERROR_LOG_MAX, ppc64_rma_size));
>> +	for_each_possible_cpu(i) {
>> +		paca_ptrs[i]->mce_data_buf = mce_data_buf +
>> +						(RTAS_ERROR_LOG_MAX * i);
>> +	}
>>  }
>>  
>>  static void pseries_8259_cascade(struct irq_desc *desc)
>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.
  2018-07-02 22:08   ` Nicholas Piggin
  2018-07-03  7:20     ` Mahesh Jagannath Salgaonkar
@ 2018-07-03 10:37     ` Michal Suchánek
  2018-07-04 13:15       ` Michael Ellerman
  2018-07-12 13:41     ` Michal Suchánek
  2 siblings, 1 reply; 17+ messages in thread
From: Michal Suchánek @ 2018-07-03 10:37 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Mahesh J Salgaonkar, linuxppc-dev, Laurent Dufour,
	Michal Suchanek, Aneesh Kumar K.V

On Tue, 3 Jul 2018 08:08:14 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Mon, 02 Jul 2018 11:17:06 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
> > From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > 
> > On pseries, as of today system crashes if we get a machine check
> > exceptions due to SLB errors. These are soft errors and can be
> > fixed by flushing the SLBs so the kernel can continue to function
> > instead of system crash. We do this in real mode before turning on
> > MMU. Otherwise we would run into nested machine checks. This patch
> > now fetches the rtas error log in real mode and flushes the SLBs on
> > SLB errors.
> > 
> > Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 
> >  arch/powerpc/include/asm/machdep.h            |    1 
> >  arch/powerpc/kernel/exceptions-64s.S          |   42
> > +++++++++++++++++++++ arch/powerpc/kernel/mce.c
> > |   16 +++++++- arch/powerpc/mm/slb.c                         |
> > 6 +++ arch/powerpc/platforms/powernv/opal.c         |    1 
> >  arch/powerpc/platforms/pseries/pseries.h      |    1 
> >  arch/powerpc/platforms/pseries/ras.c          |   51
> > +++++++++++++++++++++++++
> > arch/powerpc/platforms/pseries/setup.c        |    1 9 files
> > changed, 116 insertions(+), 4 deletions(-) 
> 
> 
> > +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> > +BEGIN_FTR_SECTION
> > +	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> > +	mr	r10,r1			/* Save r1 */
> > +	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency
> > stack */
> > +	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack
> > frame		*/
> > +	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
> > +	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
> > +	EXCEPTION_PROLOG_COMMON_1()
> > +	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> > +	EXCEPTION_PROLOG_COMMON_3(0x200)
> > +	addi	r3,r1,STACK_FRAME_OVERHEAD
> > +	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call
> > ABI */  
> 
> Is there any reason you can't use the existing
> machine_check_powernv_early code to do all this?

Code sharing is nice but if we envision this going to stable kernels
butchering the existing handler is going to be a nightmare. The code is
quite a bit different between kernel versions.

This code as is requires the bit that introduces
EXCEPTION_PROLOG_COMMON_1 and then should work on Linux 3.14+

Thanks

Michal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.
  2018-07-03 10:37     ` Michal Suchánek
@ 2018-07-04 13:15       ` Michael Ellerman
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2018-07-04 13:15 UTC (permalink / raw)
  To: Michal Suchánek, Nicholas Piggin
  Cc: Mahesh J Salgaonkar, Laurent Dufour, Michal Suchanek,
	Aneesh Kumar K.V, linuxppc-dev

Michal Such=C3=A1nek <msuchanek@suse.de> writes:
> On Tue, 3 Jul 2018 08:08:14 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
>> On Mon, 02 Jul 2018 11:17:06 +0530
>> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
>> > From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> >=20
>> > On pseries, as of today system crashes if we get a machine check
>> > exceptions due to SLB errors. These are soft errors and can be
>> > fixed by flushing the SLBs so the kernel can continue to function
>> > instead of system crash. We do this in real mode before turning on
>> > MMU. Otherwise we would run into nested machine checks. This patch
>> > now fetches the rtas error log in real mode and flushes the SLBs on
>> > SLB errors.
>> >=20
>> > Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> > ---
>> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1=20
>> >  arch/powerpc/include/asm/machdep.h            |    1=20
>> >  arch/powerpc/kernel/exceptions-64s.S          |   42
>> > +++++++++++++++++++++ arch/powerpc/kernel/mce.c
>> > |   16 +++++++- arch/powerpc/mm/slb.c                         |
>> > 6 +++ arch/powerpc/platforms/powernv/opal.c         |    1=20
>> >  arch/powerpc/platforms/pseries/pseries.h      |    1=20
>> >  arch/powerpc/platforms/pseries/ras.c          |   51
>> > +++++++++++++++++++++++++
>> > arch/powerpc/platforms/pseries/setup.c        |    1 9 files
>> > changed, 116 insertions(+), 4 deletions(-)=20
>>=20
>>=20
>> > +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
>> > +BEGIN_FTR_SECTION
>> > +	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
>> > +	mr	r10,r1			/* Save r1 */
>> > +	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency
>> > stack */
>> > +	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack
>> > frame		*/
>> > +	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
>> > +	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
>> > +	EXCEPTION_PROLOG_COMMON_1()
>> > +	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
>> > +	EXCEPTION_PROLOG_COMMON_3(0x200)
>> > +	addi	r3,r1,STACK_FRAME_OVERHEAD
>> > +	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call
>> > ABI */=20=20
>>=20
>> Is there any reason you can't use the existing
>> machine_check_powernv_early code to do all this?
>
> Code sharing is nice but if we envision this going to stable kernels
> butchering the existing handler is going to be a nightmare. The code is
> quite a bit different between kernel versions.

I'm not sure if we'll send it to stable kernels. But we obviously will
back port it to some distros :)

So if sharing the code is a significant impediment to that, then I'm
happy if we don't share code initially. That could be done as a
follow-up to this series.

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.
  2018-07-02 22:08   ` Nicholas Piggin
  2018-07-03  7:20     ` Mahesh Jagannath Salgaonkar
  2018-07-03 10:37     ` Michal Suchánek
@ 2018-07-12 13:41     ` Michal Suchánek
  2018-07-19 13:08       ` Michael Ellerman
  2018-08-01  5:49       ` Nicholas Piggin
  2 siblings, 2 replies; 17+ messages in thread
From: Michal Suchánek @ 2018-07-12 13:41 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Mahesh J Salgaonkar, Aneesh Kumar K.V, Laurent Dufour, linuxppc-dev

On Tue, 3 Jul 2018 08:08:14 +1000
"Nicholas Piggin" <npiggin@gmail.com> wrote:

> On Mon, 02 Jul 2018 11:17:06 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
> > From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > 
> > On pseries, as of today system crashes if we get a machine check
> > exceptions due to SLB errors. These are soft errors and can be
> > fixed by flushing the SLBs so the kernel can continue to function
> > instead of system crash. We do this in real mode before turning on
> > MMU. Otherwise we would run into nested machine checks. This patch
> > now fetches the rtas error log in real mode and flushes the SLBs on
> > SLB errors.
> > 
> > Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 
> >  arch/powerpc/include/asm/machdep.h            |    1 
> >  arch/powerpc/kernel/exceptions-64s.S          |   42
> > +++++++++++++++++++++ arch/powerpc/kernel/mce.c
> > |   16 +++++++- arch/powerpc/mm/slb.c                         |
> > 6 +++ arch/powerpc/platforms/powernv/opal.c         |    1 
> >  arch/powerpc/platforms/pseries/pseries.h      |    1 
> >  arch/powerpc/platforms/pseries/ras.c          |   51
> > +++++++++++++++++++++++++
> > arch/powerpc/platforms/pseries/setup.c        |    1 9 files
> > changed, 116 insertions(+), 4 deletions(-) 
> 
> 
> > +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> > +BEGIN_FTR_SECTION
> > +	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> > +	mr	r10,r1			/* Save r1 */
> > +	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency
> > stack */
> > +	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack
> > frame		*/
> > +	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
> > +	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
> > +	EXCEPTION_PROLOG_COMMON_1()
> > +	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> > +	EXCEPTION_PROLOG_COMMON_3(0x200)
> > +	addi	r3,r1,STACK_FRAME_OVERHEAD
> > +	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call
> > ABI */  
> 
> Is there any reason you can't use the existing
> machine_check_powernv_early code to do all this?
> 
> > diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> > index efdd16a79075..221271c96a57 100644
> > --- a/arch/powerpc/kernel/mce.c
> > +++ b/arch/powerpc/kernel/mce.c
> > @@ -488,9 +488,21 @@ long machine_check_early(struct pt_regs *regs)
> >  {
> >  	long handled = 0;
> >  
> > -	__this_cpu_inc(irq_stat.mce_exceptions);
> > +	/*
> > +	 * For pSeries we count mce when we go into virtual mode
> > machine
> > +	 * check handler. Hence skip it. Also, We can't access per
> > cpu
> > +	 * variables in real mode for LPAR.
> > +	 */
> > +	if (early_cpu_has_feature(CPU_FTR_HVMODE))
> > +		__this_cpu_inc(irq_stat.mce_exceptions);
> >  
> > -	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> > +	/*
> > +	 * See if platform is capable of handling machine check.
> > +	 * Otherwise fallthrough and allow CPU to handle this
> > machine check.
> > +	 */
> > +	if (ppc_md.machine_check_early)
> > +		handled = ppc_md.machine_check_early(regs);
> > +	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> >  		handled =
> > cur_cpu_spec->machine_check_early(regs);  
> 
> Would be good to add a powernv ppc_md handler which does the
> cur_cpu_spec->machine_check_early() call now that other platforms are
> calling this code. Because those aren't valid as a fallback call, but
> specific to powernv.
> 

Something like this (untested)?

Subject: [PATCH] powerpc/powernv: define platform MCE handler.

---
 arch/powerpc/kernel/mce.c              |  3 ---
 arch/powerpc/platforms/powernv/setup.c | 11 +++++++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 221271c96a57..ae17d8aa60c4 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -498,12 +498,9 @@ long machine_check_early(struct pt_regs *regs)
 
 	/*
 	 * See if platform is capable of handling machine check.
-	 * Otherwise fallthrough and allow CPU to handle this machine check.
 	 */
 	if (ppc_md.machine_check_early)
 		handled = ppc_md.machine_check_early(regs);
-	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
-		handled = cur_cpu_spec->machine_check_early(regs);
 	return handled;
 }
 
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index f96df0a25d05..b74c93bc2e55 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -431,6 +431,16 @@ static unsigned long pnv_get_proc_freq(unsigned int cpu)
 	return ret_freq;
 }
 
+static long pnv_machine_check_early(struct pt_regs *regs)
+{
+	long handled = 0;
+
+	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
+		handled = cur_cpu_spec->machine_check_early(regs);
+
+	return handled;
+}
+
 define_machine(powernv) {
 	.name			= "PowerNV",
 	.probe			= pnv_probe,
@@ -442,6 +452,7 @@ define_machine(powernv) {
 	.machine_shutdown	= pnv_shutdown,
 	.power_save             = NULL,
 	.calibrate_decr		= generic_calibrate_decr,
+	.machine_check_early	= pnv_machine_check_early,
 #ifdef CONFIG_KEXEC_CORE
 	.kexec_cpu_down		= pnv_kexec_cpu_down,
 #endif
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.
  2018-07-12 13:41     ` Michal Suchánek
@ 2018-07-19 13:08       ` Michael Ellerman
  2018-08-01  5:49       ` Nicholas Piggin
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2018-07-19 13:08 UTC (permalink / raw)
  To: Michal Suchánek, Nicholas Piggin
  Cc: Mahesh J Salgaonkar, Laurent Dufour, Aneesh Kumar K.V, linuxppc-dev

Michal Such=C3=A1nek <msuchanek@suse.de> writes:
> On Tue, 3 Jul 2018 08:08:14 +1000
> "Nicholas Piggin" <npiggin@gmail.com> wrote: >> On Mon, 02 Jul 2018 11:17=
:06 +0530
>> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
>> > From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> > diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>> > index efdd16a79075..221271c96a57 100644
>> > --- a/arch/powerpc/kernel/mce.c
>> > +++ b/arch/powerpc/kernel/mce.c
>> > @@ -488,9 +488,21 @@ long machine_check_early(struct pt_regs *regs)
>> >  {
>> >  	long handled =3D 0;
>> >=20=20
>> > -	__this_cpu_inc(irq_stat.mce_exceptions);
>> > +	/*
>> > +	 * For pSeries we count mce when we go into virtual mode
>> > machine
>> > +	 * check handler. Hence skip it. Also, We can't access per
>> > cpu
>> > +	 * variables in real mode for LPAR.
>> > +	 */
>> > +	if (early_cpu_has_feature(CPU_FTR_HVMODE))
>> > +		__this_cpu_inc(irq_stat.mce_exceptions);
>> >=20=20
>> > -	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
>> > +	/*
>> > +	 * See if platform is capable of handling machine check.
>> > +	 * Otherwise fallthrough and allow CPU to handle this
>> > machine check.
>> > +	 */
>> > +	if (ppc_md.machine_check_early)
>> > +		handled =3D ppc_md.machine_check_early(regs);
>> > +	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
>> >  		handled =3D
>> > cur_cpu_spec->machine_check_early(regs);=20=20
>>=20
>> Would be good to add a powernv ppc_md handler which does the
>> cur_cpu_spec->machine_check_early() call now that other platforms are
>> calling this code. Because those aren't valid as a fallback call, but
>> specific to powernv.
>>=20
>
> Something like this (untested)?
>
> Subject: [PATCH] powerpc/powernv: define platform MCE handler.

LGTM.

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.
  2018-07-12 13:41     ` Michal Suchánek
  2018-07-19 13:08       ` Michael Ellerman
@ 2018-08-01  5:49       ` Nicholas Piggin
  1 sibling, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2018-08-01  5:49 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Mahesh J Salgaonkar, Aneesh Kumar K.V, Laurent Dufour, linuxppc-dev

On Thu, 12 Jul 2018 15:41:13 +0200
Michal Such=C3=A1nek <msuchanek@suse.de> wrote:

> On Tue, 3 Jul 2018 08:08:14 +1000
> "Nicholas Piggin" <npiggin@gmail.com> wrote:
>=20
> > On Mon, 02 Jul 2018 11:17:06 +0530
> > Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> >  =20
> > > From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > >=20
> > > On pseries, as of today system crashes if we get a machine check
> > > exceptions due to SLB errors. These are soft errors and can be
> > > fixed by flushing the SLBs so the kernel can continue to function
> > > instead of system crash. We do this in real mode before turning on
> > > MMU. Otherwise we would run into nested machine checks. This patch
> > > now fetches the rtas error log in real mode and flushes the SLBs on
> > > SLB errors.
> > >=20
> > > Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > > ---
> > >  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1=20
> > >  arch/powerpc/include/asm/machdep.h            |    1=20
> > >  arch/powerpc/kernel/exceptions-64s.S          |   42
> > > +++++++++++++++++++++ arch/powerpc/kernel/mce.c
> > > |   16 +++++++- arch/powerpc/mm/slb.c                         |
> > > 6 +++ arch/powerpc/platforms/powernv/opal.c         |    1=20
> > >  arch/powerpc/platforms/pseries/pseries.h      |    1=20
> > >  arch/powerpc/platforms/pseries/ras.c          |   51
> > > +++++++++++++++++++++++++
> > > arch/powerpc/platforms/pseries/setup.c        |    1 9 files
> > > changed, 116 insertions(+), 4 deletions(-)  =20
> >=20
> >  =20
> > > +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> > > +BEGIN_FTR_SECTION
> > > +	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> > > +	mr	r10,r1			/* Save r1 */
> > > +	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency
> > > stack */
> > > +	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack
> > > frame		*/
> > > +	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
> > > +	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
> > > +	EXCEPTION_PROLOG_COMMON_1()
> > > +	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> > > +	EXCEPTION_PROLOG_COMMON_3(0x200)
> > > +	addi	r3,r1,STACK_FRAME_OVERHEAD
> > > +	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call
> > > ABI */   =20
> >=20
> > Is there any reason you can't use the existing
> > machine_check_powernv_early code to do all this?
> >  =20
> > > diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> > > index efdd16a79075..221271c96a57 100644
> > > --- a/arch/powerpc/kernel/mce.c
> > > +++ b/arch/powerpc/kernel/mce.c
> > > @@ -488,9 +488,21 @@ long machine_check_early(struct pt_regs *regs)
> > >  {
> > >  	long handled =3D 0;
> > > =20
> > > -	__this_cpu_inc(irq_stat.mce_exceptions);
> > > +	/*
> > > +	 * For pSeries we count mce when we go into virtual mode
> > > machine
> > > +	 * check handler. Hence skip it. Also, We can't access per
> > > cpu
> > > +	 * variables in real mode for LPAR.
> > > +	 */
> > > +	if (early_cpu_has_feature(CPU_FTR_HVMODE))
> > > +		__this_cpu_inc(irq_stat.mce_exceptions);
> > > =20
> > > -	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> > > +	/*
> > > +	 * See if platform is capable of handling machine check.
> > > +	 * Otherwise fallthrough and allow CPU to handle this
> > > machine check.
> > > +	 */
> > > +	if (ppc_md.machine_check_early)
> > > +		handled =3D ppc_md.machine_check_early(regs);
> > > +	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> > >  		handled =3D
> > > cur_cpu_spec->machine_check_early(regs);   =20
> >=20
> > Would be good to add a powernv ppc_md handler which does the
> > cur_cpu_spec->machine_check_early() call now that other platforms are
> > calling this code. Because those aren't valid as a fallback call, but
> > specific to powernv.
> >  =20
>=20
> Something like this (untested)?

Sorry, some emails fell through the cracks. Yes exactly like this would
be good. If you can add a quick changelog and SOB, and
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Nick

>=20
> Subject: [PATCH] powerpc/powernv: define platform MCE handler.
>=20
> ---
>  arch/powerpc/kernel/mce.c              |  3 ---
>  arch/powerpc/platforms/powernv/setup.c | 11 +++++++++++
>  2 files changed, 11 insertions(+), 3 deletions(-)
>=20
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 221271c96a57..ae17d8aa60c4 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -498,12 +498,9 @@ long machine_check_early(struct pt_regs *regs)
> =20
>  	/*
>  	 * See if platform is capable of handling machine check.
> -	 * Otherwise fallthrough and allow CPU to handle this machine check.
>  	 */
>  	if (ppc_md.machine_check_early)
>  		handled =3D ppc_md.machine_check_early(regs);
> -	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> -		handled =3D cur_cpu_spec->machine_check_early(regs);
>  	return handled;
>  }
> =20
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platfo=
rms/powernv/setup.c
> index f96df0a25d05..b74c93bc2e55 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -431,6 +431,16 @@ static unsigned long pnv_get_proc_freq(unsigned int =
cpu)
>  	return ret_freq;
>  }
> =20
> +static long pnv_machine_check_early(struct pt_regs *regs)
> +{
> +	long handled =3D 0;
> +
> +	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> +		handled =3D cur_cpu_spec->machine_check_early(regs);
> +
> +	return handled;
> +}
> +
>  define_machine(powernv) {
>  	.name			=3D "PowerNV",
>  	.probe			=3D pnv_probe,
> @@ -442,6 +452,7 @@ define_machine(powernv) {
>  	.machine_shutdown	=3D pnv_shutdown,
>  	.power_save             =3D NULL,
>  	.calibrate_decr		=3D generic_calibrate_decr,
> +	.machine_check_early	=3D pnv_machine_check_early,
>  #ifdef CONFIG_KEXEC_CORE
>  	.kexec_cpu_down		=3D pnv_kexec_cpu_down,
>  #endif

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-08-01  5:50 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-02  5:45 [PATCH v5 0/7] powerpc/pseries: Machien check handler improvements Mahesh J Salgaonkar
2018-07-02  5:46 ` [PATCH v5 1/7] powerpc/pseries: Avoid using the size greater than Mahesh J Salgaonkar
2018-07-02  5:46 ` [PATCH v5 2/7] powerpc/pseries: Defer the logging of rtas error to irq work queue Mahesh J Salgaonkar
2018-07-03  3:25   ` Nicholas Piggin
2018-07-03 10:32     ` Mahesh Jagannath Salgaonkar
2018-07-02  5:46 ` [PATCH v5 3/7] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler Mahesh J Salgaonkar
2018-07-02  5:46 ` [PATCH v5 4/7] powerpc/pseries: Define MCE error event section Mahesh J Salgaonkar
2018-07-02  5:47 ` [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors Mahesh J Salgaonkar
2018-07-02 22:08   ` Nicholas Piggin
2018-07-03  7:20     ` Mahesh Jagannath Salgaonkar
2018-07-03 10:37     ` Michal Suchánek
2018-07-04 13:15       ` Michael Ellerman
2018-07-12 13:41     ` Michal Suchánek
2018-07-19 13:08       ` Michael Ellerman
2018-08-01  5:49       ` Nicholas Piggin
2018-07-02  5:47 ` [PATCH v5 6/7] powerpc/pseries: Display machine check error details Mahesh J Salgaonkar
2018-07-02  5:47 ` [PATCH v5 7/7] powerpc/pseries: Dump the SLB contents on SLB MCE errors Mahesh J Salgaonkar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.