* [v3 PATCH 0/5] powerpc/pseries: Machine check handler improvements.
@ 2018-06-07 17:27 Mahesh J Salgaonkar
  2018-06-07 17:28 ` [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation Mahesh J Salgaonkar
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Mahesh J Salgaonkar @ 2018-06-07 17:27 UTC
  To: linuxppc-dev
  Cc: Michael Ellerman, stable, Aneesh Kumar K.V, Aneesh Kumar K.V,
	Michael Ellerman, Laurent Dufour, Nicholas Piggin

This patch series includes some improvements to the machine check handler
for pseries. Patch 1 fixes an issue where the machine check handler crashes
the kernel while accessing a vmalloc-ed buffer in NMI context.
Patch 2 fixes an endian bug while restoring r3 in the MCE handler.
Patch 3 defines the RTAS MCE error event section.
Patch 4 dumps and flushes the SLB contents on SLB MCE errors, improving
debuggability and letting the kernel recover instead of crashing.
Patch 5 displays the MCE error details on the console.

Changes in V3:
- Moved patch 5 to patch 2

Changes in V2:
- patch 3: Display additional info (NIP and task info) in MCE error details.
- patch 5: Fix endian bug while restoring r3 in MCE handler.

---

Mahesh Salgaonkar (5):
      powerpc/pseries: convert rtas_log_buf to linear allocation.
      powerpc/pseries: Fix endianness while restoring r3 in MCE handler.
      powerpc/pseries: Define MCE error event section.
      powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
      powerpc/pseries: Display machine check error details.


 arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 
 arch/powerpc/include/asm/rtas.h               |  109 ++++++++++++++++++
 arch/powerpc/kernel/rtasd.c                   |    2 
 arch/powerpc/mm/slb.c                         |   35 ++++++
 arch/powerpc/platforms/pseries/ras.c          |  155 +++++++++++++++++++++++++
 5 files changed, 299 insertions(+), 3 deletions(-)



* [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation.
  2018-06-07 17:27 [v3 PATCH 0/5] powerpc/pseries: Machine check handler improvements Mahesh J Salgaonkar
@ 2018-06-07 17:28 ` Mahesh J Salgaonkar
  2018-06-08  1:31   ` Nicholas Piggin
  2018-06-07 17:28 ` [v3 PATCH 2/5] powerpc/pseries: Fix endianness while restoring r3 in MCE handler Mahesh J Salgaonkar
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Mahesh J Salgaonkar @ 2018-06-07 17:28 UTC
  To: linuxppc-dev
  Cc: stable, Aneesh Kumar K.V, Aneesh Kumar K.V, Michael Ellerman,
	Laurent Dufour, Nicholas Piggin

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

rtas_log_buf is a buffer that holds RTAS event data communicated to the
kernel by the hypervisor. This buffer is then used to pass RTAS event
data to user space through procfs. It is allocated from the vmalloc
(non-linear mapping) area.

On a machine check interrupt, register r3 points to the RTAS extended
event log, passed by the hypervisor, that contains the MCE event. The
pseries machine check handler then logs this error into rtas_log_buf.
Since rtas_log_buf is a vmalloc-ed (non-linear) buffer, we end up taking
a page fault (vector 0x300) while accessing it. Since the machine check
interrupt handler runs in NMI context, we cannot afford to take any
page fault: page faults are not honored in NMI context and cause a
kernel panic. This patch fixes this issue by allocating rtas_log_buf
using kmalloc.

Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/rtasd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index f915db93cd42..3957d4ae2ba2 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -559,7 +559,7 @@ static int __init rtas_event_scan_init(void)
 	rtas_error_log_max = rtas_get_error_log_max();
 	rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);
 
-	rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER);
+	rtas_log_buf = kmalloc(rtas_error_log_buffer_max*LOG_NUMBER, GFP_KERNEL);
 	if (!rtas_log_buf) {
 		printk(KERN_ERR "rtasd: no memory\n");
 		return -ENOMEM;


* [v3 PATCH 2/5] powerpc/pseries: Fix endianness while restoring r3 in MCE handler.
  2018-06-07 17:27 [v3 PATCH 0/5] powerpc/pseries: Machine check handler improvements Mahesh J Salgaonkar
  2018-06-07 17:28 ` [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation Mahesh J Salgaonkar
@ 2018-06-07 17:28 ` Mahesh J Salgaonkar
  2018-06-08  1:33   ` Nicholas Piggin
  2018-06-08  6:50   ` Michael Ellerman
  2018-06-07 17:28 ` [v3 PATCH 3/5] powerpc/pseries: Define MCE error event section Mahesh J Salgaonkar
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 21+ messages in thread
From: Mahesh J Salgaonkar @ 2018-06-07 17:28 UTC
  To: linuxppc-dev
  Cc: stable, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour,
	Nicholas Piggin

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

During a machine check interrupt on the pseries platform, register r3
points to the RTAS extended event log passed by the hypervisor. Since the
hypervisor uses r3 to pass a pointer to the RTAS log, it stores the
original r3 value at the start of the memory (first 8 bytes) pointed to
by r3. Since the hypervisor stores this value, like the RTAS log itself,
in BE format, Linux must restore r3 in the correct (CPU) endian format.

Without this patch, when the MCE handler returns after recovery to the
code that caused the MCE, that code may take a Data SLB Access interrupt
for an invalid address, followed by a kernel panic or hang.

[   62.878965] Severe Machine check interrupt [Recovered]
[   62.878968]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[   62.878969]   Initiator: CPU
[   62.878970]   Error type: SLB [Multihit]
[   62.878971]     Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
    pc: c0000000009694c0: vsnprintf+0x80/0x480
    lr: c0000000009698e0: vscnprintf+0x20/0x60
    sp: c0000000fc777830
   msr: 8000000002009033
   dar: a803a30c000000d0
  current = 0xc00000000bc9ef00
  paca    = 0xc00000001eca5c00	 softe: 3	 irq_happened: 0x01
    pid   = 8860, comm = insmod
[c0000000fc7778b0] c0000000009698e0 vscnprintf+0x20/0x60
[c0000000fc7778e0] c00000000016b6c4 vprintk_emit+0xb4/0x4b0
[c0000000fc777960] c00000000016d40c vprintk_func+0x5c/0xd0
[c0000000fc777980] c00000000016cbb4 printk+0x38/0x4c
[c0000000fc7779a0] d00000000ca301c0 init_module+0x1c0/0x338 [bork_kernel]
[c0000000fc777a40] c00000000000d9c4 do_one_initcall+0x54/0x230
[c0000000fc777b00] c0000000001b3b74 do_init_module+0x8c/0x248
[c0000000fc777b90] c0000000001b2478 load_module+0x12b8/0x15b0
[c0000000fc777d30] c0000000001b29e8 sys_finit_module+0xa8/0x110
[c0000000fc777e30] c00000000000b204 system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace

This patch fixes this issue.

Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/ras.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 5e1ef9150182..2edc673be137 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -360,7 +360,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 	}
 
 	savep = __va(regs->gpr[3]);
-	regs->gpr[3] = savep[0];	/* restore original r3 */
+	regs->gpr[3] = be64_to_cpu(savep[0]);	/* restore original r3 */
 
 	/* If it isn't an extended log we can use the per cpu 64bit buffer */
 	h = (struct rtas_error_log *)&savep[1];


* [v3 PATCH 3/5] powerpc/pseries: Define MCE error event section.
  2018-06-07 17:27 [v3 PATCH 0/5] powerpc/pseries: Machine check handler improvements Mahesh J Salgaonkar
  2018-06-07 17:28 ` [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation Mahesh J Salgaonkar
  2018-06-07 17:28 ` [v3 PATCH 2/5] powerpc/pseries: Fix endianness while restoring r3 in MCE handler Mahesh J Salgaonkar
@ 2018-06-07 17:28 ` Mahesh J Salgaonkar
  2018-06-07 17:28 ` [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors Mahesh J Salgaonkar
  2018-06-07 17:29 ` [v3 PATCH 5/5] powerpc/pseries: Display machine check error details Mahesh J Salgaonkar
  4 siblings, 0 replies; 21+ messages in thread
From: Mahesh J Salgaonkar @ 2018-06-07 17:28 UTC
  To: linuxppc-dev
  Cc: Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour, Nicholas Piggin

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

On pseries, the machine check error details are part of the RTAS extended
event log, passed in the Machine Check Exception section. This patch adds
the definition of the RTAS MCE event section and related helper
functions.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  104 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index ec9dd79398ee..3f2fba7ef23b 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -275,6 +275,7 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
 #define PSERIES_ELOG_SECT_ID_CALL_HOME		(('C' << 8) | 'H')
 #define PSERIES_ELOG_SECT_ID_USER_DEF		(('U' << 8) | 'D')
 #define PSERIES_ELOG_SECT_ID_HOTPLUG		(('H' << 8) | 'P')
+#define PSERIES_ELOG_SECT_ID_MCE		(('M' << 8) | 'C')
 
 /* Vendor specific Platform Event Log Format, Version 6, section header */
 struct pseries_errorlog {
@@ -326,6 +327,109 @@ struct pseries_hp_errorlog {
 #define PSERIES_HP_ELOG_ID_DRC_COUNT	3
 #define PSERIES_HP_ELOG_ID_DRC_IC	4
 
+/* RTAS pseries MCE errorlog section */
+#pragma pack(push, 1)
+struct pseries_mc_errorlog {
+	__be32	fru_id;
+	__be32	proc_id;
+	uint8_t	error_type;
+	union {
+		struct {
+			uint8_t	ue_err_type;
+			/* XXXXXXXX
+			 * X		1: Permanent or Transient UE.
+			 *  X		1: Effective address provided.
+			 *   X		1: Logical address provided.
+			 *    XX	2: Reserved.
+			 *      XXX	3: Type of UE error.
+			 */
+			uint8_t	reserved_1[6];
+			__be64	effective_address;
+			__be64	logical_address;
+		} ue_error;
+		struct {
+			uint8_t	soft_err_type;
+			/* XXXXXXXX
+			 * X		1: Effective address provided.
+			 *  XXXXX	5: Reserved.
+			 *       XX	2: Type of SLB/ERAT/TLB error.
+			 */
+			uint8_t	reserved_1[6];
+			__be64	effective_address;
+			uint8_t	reserved_2[8];
+		} soft_error;
+	} u;
+};
+#pragma pack(pop)
+
+/* RTAS pseries MCE error types */
+#define PSERIES_MC_ERROR_TYPE_UE		0x00
+#define PSERIES_MC_ERROR_TYPE_SLB		0x01
+#define PSERIES_MC_ERROR_TYPE_ERAT		0x02
+#define PSERIES_MC_ERROR_TYPE_TLB		0x04
+#define PSERIES_MC_ERROR_TYPE_D_CACHE		0x05
+#define PSERIES_MC_ERROR_TYPE_I_CACHE		0x07
+
+/* RTAS pseries MCE error sub types */
+#define PSERIES_MC_ERROR_UE_INDETERMINATE		0
+#define PSERIES_MC_ERROR_UE_IFETCH			1
+#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_IFETCH	2
+#define PSERIES_MC_ERROR_UE_LOAD_STORE			3
+#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_LOAD_STORE	4
+
+#define PSERIES_MC_ERROR_SLB_PARITY		0
+#define PSERIES_MC_ERROR_SLB_MULTIHIT		1
+#define PSERIES_MC_ERROR_SLB_INDETERMINATE	2
+
+#define PSERIES_MC_ERROR_ERAT_PARITY		1
+#define PSERIES_MC_ERROR_ERAT_MULTIHIT		2
+#define PSERIES_MC_ERROR_ERAT_INDETERMINATE	3
+
+#define PSERIES_MC_ERROR_TLB_PARITY		1
+#define PSERIES_MC_ERROR_TLB_MULTIHIT		2
+#define PSERIES_MC_ERROR_TLB_INDETERMINATE	3
+
+static inline uint8_t rtas_mc_error_type(const struct pseries_mc_errorlog *mlog)
+{
+	return mlog->error_type;
+}
+
+static inline uint8_t rtas_mc_error_sub_type(
+					const struct pseries_mc_errorlog *mlog)
+{
+	switch (mlog->error_type) {
+	case	PSERIES_MC_ERROR_TYPE_UE:
+		return (mlog->u.ue_error.ue_err_type & 0x07);
+	case	PSERIES_MC_ERROR_TYPE_SLB:
+	case	PSERIES_MC_ERROR_TYPE_ERAT:
+	case	PSERIES_MC_ERROR_TYPE_TLB:
+		return (mlog->u.soft_error.soft_err_type & 0x03);
+	default:
+		return 0;
+	}
+}
+
+static inline uint64_t rtas_mc_get_effective_addr(
+					const struct pseries_mc_errorlog *mlog)
+{
+	uint64_t addr = 0;
+
+	switch (mlog->error_type) {
+	case	PSERIES_MC_ERROR_TYPE_UE:
+		if (mlog->u.ue_error.ue_err_type & 0x40)
+			addr = mlog->u.ue_error.effective_address;
+		break;
+	case	PSERIES_MC_ERROR_TYPE_SLB:
+	case	PSERIES_MC_ERROR_TYPE_ERAT:
+	case	PSERIES_MC_ERROR_TYPE_TLB:
+		if (mlog->u.soft_error.soft_err_type & 0x80)
+			addr = mlog->u.soft_error.effective_address;
+	default:
+		break;
+	}
+	return be64_to_cpu(addr);
+}
+
 struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
 					      uint16_t section_id);
 


* [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
  2018-06-07 17:27 [v3 PATCH 0/5] powerpc/pseries: Machine check handler improvements Mahesh J Salgaonkar
                   ` (2 preceding siblings ...)
  2018-06-07 17:28 ` [v3 PATCH 3/5] powerpc/pseries: Define MCE error event section Mahesh J Salgaonkar
@ 2018-06-07 17:28 ` Mahesh J Salgaonkar
  2018-06-08  1:48   ` Nicholas Piggin
  2018-06-12 13:47   ` Michael Ellerman
  2018-06-07 17:29 ` [v3 PATCH 5/5] powerpc/pseries: Display machine check error details Mahesh J Salgaonkar
  4 siblings, 2 replies; 21+ messages in thread
From: Mahesh J Salgaonkar @ 2018-06-07 17:28 UTC
  To: linuxppc-dev
  Cc: Aneesh Kumar K.V, Michael Ellerman, Aneesh Kumar K.V,
	Michael Ellerman, Laurent Dufour, Nicholas Piggin

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

If we get a machine check exception due to SLB errors, dump the
current SLB contents, which is very helpful in debugging the root
cause of SLB errors. On pseries, the system currently crashes on SLB
errors. These are soft errors that can be fixed by flushing the SLBs,
so the kernel can continue to function instead of crashing. This patch
fixes that as well.

With this patch, the console logs the SLB contents on SLB MCE errors,
like below:

[  822.711728] slb contents:
[  822.711730] 00 c000000008000000 400ea1b217000500
[  822.711731]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
[  822.711732] 01 d000000008000000 400d43642f000510
[  822.711733]   1T  ESID=   d00000  VSID=      d43642f LLP:110
[  822.711734] 09 f000000008000000 400a86c85f000500
[  822.711736]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
[  822.711737] 10 00007f0008000000 400d1f26e3000d90
[  822.711738]   1T  ESID=       7f  VSID=      d1f26e3 LLP:110
[  822.711739] 11 0000000018000000 000e3615f520fd90
[  822.711740]  256M ESID=        1  VSID=   e3615f520f LLP:110
[  822.711740] 12 d000000008000000 400d43642f000510
[  822.711741]   1T  ESID=   d00000  VSID=      d43642f LLP:110
[  822.711742] 13 d000000008000000 400d43642f000510
[  822.711743]   1T  ESID=   d00000  VSID=      d43642f LLP:110


Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 +
 arch/powerpc/mm/slb.c                         |   35 +++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/ras.c          |   29 ++++++++++++++++++++-
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 50ed64fba4ae..c0da68927235 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -487,6 +487,7 @@ extern void hpte_init_native(void);
 
 extern void slb_initialize(void);
 extern void slb_flush_and_rebolt(void);
+extern void slb_dump_contents(void);
 
 extern void slb_vmalloc_update(void);
 extern void slb_set_size(u16 size);
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 66577cc66dc9..799aa117cec3 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -145,6 +145,41 @@ void slb_flush_and_rebolt(void)
 	get_paca()->slb_cache_ptr = 0;
 }
 
+void slb_dump_contents(void)
+{
+	int i;
+	unsigned long e, v;
+	unsigned long llp;
+
+	pr_err("slb contents:\n");
+	for (i = 0; i < mmu_slb_size; i++) {
+		asm volatile("slbmfee  %0,%1" : "=r" (e) : "r" (i));
+		asm volatile("slbmfev  %0,%1" : "=r" (v) : "r" (i));
+
+		if (!e && !v)
+			continue;
+
+		pr_err("%02d %016lx %016lx", i, e, v);
+
+		if (!(e & SLB_ESID_V)) {
+			pr_err("\n");
+			continue;
+		}
+		llp = v & SLB_VSID_LLP;
+		if (v & SLB_VSID_B_1T) {
+			pr_err("  1T  ESID=%9lx  VSID=%13lx LLP:%3lx\n",
+				GET_ESID_1T(e),
+				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T,
+				llp);
+		} else {
+			pr_err(" 256M ESID=%9lx  VSID=%13lx LLP:%3lx\n",
+				GET_ESID(e),
+				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT,
+				llp);
+		}
+	}
+}
+
 void slb_vmalloc_update(void)
 {
 	unsigned long vflags;
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 2edc673be137..e56759d92356 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 	return 0; /* need to perform reset */
 }
 
+static int mce_handle_error(struct rtas_error_log *errp)
+{
+	struct pseries_errorlog *pseries_log;
+	struct pseries_mc_errorlog *mce_log;
+	int disposition = rtas_error_disposition(errp);
+	uint8_t error_type;
+
+	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
+	if (pseries_log == NULL)
+		goto out;
+
+	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
+	error_type = rtas_mc_error_type(mce_log);
+
+	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
+			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
+		slb_dump_contents();
+		slb_flush_and_rebolt();
+		disposition = RTAS_DISP_FULLY_RECOVERED;
+	}
+
+out:
+	return disposition;
+}
+
 /*
  * See if we can recover from a machine check exception.
  * This is only called on power4 (or above) and only via
@@ -434,7 +459,9 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
 {
 	int recovered = 0;
-	int disposition = rtas_error_disposition(err);
+	int disposition;
+
+	disposition = mce_handle_error(err);
 
 	if (!(regs->msr & MSR_RI)) {
 		/* If MSR_RI isn't set, we cannot recover */


* [v3 PATCH 5/5] powerpc/pseries: Display machine check error details.
  2018-06-07 17:27 [v3 PATCH 0/5] powerpc/pseries: Machine check handler improvements Mahesh J Salgaonkar
                   ` (3 preceding siblings ...)
  2018-06-07 17:28 ` [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors Mahesh J Salgaonkar
@ 2018-06-07 17:29 ` Mahesh J Salgaonkar
  2018-06-08  1:51   ` Nicholas Piggin
  4 siblings, 1 reply; 21+ messages in thread
From: Mahesh J Salgaonkar @ 2018-06-07 17:29 UTC
  To: linuxppc-dev
  Cc: Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour, Nicholas Piggin

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Extract the MCE error details from the RTAS extended log and display them
on the console.

With this patch, you should now see MCE logs like the following:

[  142.371818] Severe Machine check interrupt [Recovered]
[  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[  142.371822]   Initiator: CPU
[  142.371823]   Error type: SLB [Multihit]
[  142.371824]     Effective address: d00000000ca70000

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h      |    5 +
 arch/powerpc/platforms/pseries/ras.c |  128 +++++++++++++++++++++++++++++++++-
 2 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 3f2fba7ef23b..8100a95c133a 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -190,6 +190,11 @@ static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
 	return (elog->byte1 & 0x04) >> 2;
 }
 
+static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog)
+{
+	return (elog->byte2 & 0xf0) >> 4;
+}
+
 #define rtas_error_type(x)	((x)->byte3)
 
 static inline
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index e56759d92356..cd9446980092 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -422,7 +422,130 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 	return 0; /* need to perform reset */
 }
 
-static int mce_handle_error(struct rtas_error_log *errp)
+#define VAL_TO_STRING(ar, val)	((val < ARRAY_SIZE(ar)) ? ar[val] : "Unknown")
+
+static void pseries_print_mce_info(struct pt_regs *regs,
+				struct rtas_error_log *errp, int disposition)
+{
+	const char *level, *sevstr;
+	struct pseries_errorlog *pseries_log;
+	struct pseries_mc_errorlog *mce_log;
+	uint8_t error_type, err_sub_type;
+	uint8_t initiator = rtas_error_initiator(errp);
+	uint64_t addr;
+
+	static const char * const initiators[] = {
+		"Unknown",
+		"CPU",
+		"PCI",
+		"ISA",
+		"Memory",
+		"Power Mgmt",
+	};
+	static const char * const mc_err_types[] = {
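+		"UE",		/* 0x00 */
+		"SLB",		/* 0x01 */
+		"ERAT",		/* 0x02 */
+		"Unknown",	/* 0x03: not defined */
+		"TLB",		/* 0x04 */
+		"D-Cache",	/* 0x05 */
+		"Unknown", "I-Cache",	/* 0x06: not defined, 0x07 */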
+	};
+	static const char * const mc_ue_types[] = {
+		"Indeterminate",
+		"Instruction fetch",
+		"Page table walk ifetch",
+		"Load/Store",
+		"Page table walk Load/Store",
+	};
+
+	/* SLB sub errors valid values are 0x0, 0x1, 0x2 */
+	static const char * const mc_slb_types[] = {
+		"Parity",
+		"Multihit",
+		"Indeterminate",
+	};
+
+	/* TLB and ERAT sub errors valid values are 0x1, 0x2, 0x3 */
+	static const char * const mc_soft_types[] = {
+		"Unknown",
+		"Parity",
+		"Multihit",
+		"Indeterminate",
+	};
+
+	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
+	if (pseries_log == NULL)
+		return;
+
+	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
+
+	error_type = rtas_mc_error_type(mce_log);
+	err_sub_type = rtas_mc_error_sub_type(mce_log);
+
+	switch (rtas_error_severity(errp)) {
+	case RTAS_SEVERITY_NO_ERROR:
+		level = KERN_INFO;
+		sevstr = "Harmless";
+		break;
+	case RTAS_SEVERITY_WARNING:
+		level = KERN_WARNING;
+		sevstr = "";
+		break;
+	case RTAS_SEVERITY_ERROR:
+	case RTAS_SEVERITY_ERROR_SYNC:
+		level = KERN_ERR;
+		sevstr = "Severe";
+		break;
+	case RTAS_SEVERITY_FATAL:
+	default:
+		level = KERN_ERR;
+		sevstr = "Fatal";
+		break;
+	}
+
+	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
+		disposition == RTAS_DISP_FULLY_RECOVERED ?
+		"Recovered" : "Not recovered");
+	if (user_mode(regs)) {
+		printk("%s  NIP: [%016lx] PID: %d Comm: %s\n", level,
+			regs->nip, current->pid, current->comm);
+	} else {
+		printk("%s  NIP [%016lx]: %pS\n", level, regs->nip,
+			(void *)regs->nip);
+	}
+	printk("%s  Initiator: %s\n", level,
+				VAL_TO_STRING(initiators, initiator));
+
+	switch (error_type) {
+	case PSERIES_MC_ERROR_TYPE_UE:
+		printk("%s  Error type: %s [%s]\n", level,
+			VAL_TO_STRING(mc_err_types, error_type),
+			VAL_TO_STRING(mc_ue_types, err_sub_type));
+		break;
+	case PSERIES_MC_ERROR_TYPE_SLB:
+		printk("%s  Error type: %s [%s]\n", level,
+			VAL_TO_STRING(mc_err_types, error_type),
+			VAL_TO_STRING(mc_slb_types, err_sub_type));
+		break;
+	case PSERIES_MC_ERROR_TYPE_ERAT:
+	case PSERIES_MC_ERROR_TYPE_TLB:
+		printk("%s  Error type: %s [%s]\n", level,
+			VAL_TO_STRING(mc_err_types, error_type),
+			VAL_TO_STRING(mc_soft_types, err_sub_type));
+		break;
+	default:
+		printk("%s  Error type: %s\n", level,
+			VAL_TO_STRING(mc_err_types, error_type));
+		break;
+	}
+
+	addr = rtas_mc_get_effective_addr(mce_log);
+	if (addr)
+		printk("%s    Effective address: %016llx\n", level, addr);
+}
+
+static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
 {
 	struct pseries_errorlog *pseries_log;
 	struct pseries_mc_errorlog *mce_log;
@@ -442,6 +565,7 @@ static int mce_handle_error(struct rtas_error_log *errp)
 		slb_flush_and_rebolt();
 		disposition = RTAS_DISP_FULLY_RECOVERED;
 	}
+	pseries_print_mce_info(regs, errp, disposition);
 
 out:
 	return disposition;
@@ -461,7 +585,7 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
 	int recovered = 0;
 	int disposition;
 
-	disposition = mce_handle_error(err);
+	disposition = mce_handle_error(regs, err);
 
 	if (!(regs->msr & MSR_RI)) {
 		/* If MSR_RI isn't set, we cannot recover */


* Re: [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation.
  2018-06-07 17:28 ` [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation Mahesh J Salgaonkar
@ 2018-06-08  1:31   ` Nicholas Piggin
  2018-06-08  6:16     ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2018-06-08  1:31 UTC
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, stable, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour

On Thu, 07 Jun 2018 22:58:11 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index f915db93cd42..3957d4ae2ba2 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -559,7 +559,7 @@ static int __init rtas_event_scan_init(void)
>  	rtas_error_log_max = rtas_get_error_log_max();
>  	rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);
>  
> -	rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER);
> +	rtas_log_buf = kmalloc(rtas_error_log_buffer_max*LOG_NUMBER, GFP_KERNEL);

Does this have to be in the RMA region if it's to be accessed with
relocation off in the guest?

A comment about it being accessed with relocation off might be helpful
too.

Thanks,
Nick


* Re: [v3 PATCH 2/5] powerpc/pseries: Fix endianness while restoring r3 in MCE handler.
  2018-06-07 17:28 ` [v3 PATCH 2/5] powerpc/pseries: Fix endianness while restoring r3 in MCE handler Mahesh J Salgaonkar
@ 2018-06-08  1:33   ` Nicholas Piggin
  2018-06-08  6:50   ` Michael Ellerman
  1 sibling, 0 replies; 21+ messages in thread
From: Nicholas Piggin @ 2018-06-08  1:33 UTC
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, stable, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour

On Thu, 07 Jun 2018 22:58:33 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> This patch fixes this issue.

LGTM

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>


* Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
  2018-06-07 17:28 ` [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors Mahesh J Salgaonkar
@ 2018-06-08  1:48   ` Nicholas Piggin
  2018-06-08  6:19     ` Mahesh Jagannath Salgaonkar
  2018-06-12 13:47   ` Michael Ellerman
  1 sibling, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2018-06-08  1:48 UTC
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour

On Thu, 07 Jun 2018 22:58:55 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> If we get a machine check exception due to SLB errors, dump the
> current SLB contents, which will be very helpful in debugging the
> root cause of SLB errors. On pseries, the system currently crashes on
> SLB errors. These are soft errors and can be fixed by flushing the
> SLBs, so the kernel can continue to function instead of crashing.
> This patch fixes that as well.

So pseries never flushed and reloaded the SLB in response to multi-hit
errors? This seems like quite a good improvement then. I like
dumping the SLB too.

It's a bit annoying that we can't really share the same code with xmon.
That's okay, but if you take a copy like this, I suggest commenting
both copies with a note to keep them in sync when you re-post the
series.

> 
> With this patch, the console will log SLB contents like the following
> on SLB MCE errors:
> 
> [  822.711728] slb contents:

Suggest keeping the same format as the xmon dump (in particular the
CPU number: even though it's probably printed elsewhere in the MCE
message, it doesn't hurt).

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Nick

> [  822.711730] 00 c000000008000000 400ea1b217000500
> [  822.711731]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
> [  822.711732] 01 d000000008000000 400d43642f000510
> [  822.711733]   1T  ESID=   d00000  VSID=      d43642f LLP:110
> [  822.711734] 09 f000000008000000 400a86c85f000500
> [  822.711736]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
> [  822.711737] 10 00007f0008000000 400d1f26e3000d90
> [  822.711738]   1T  ESID=       7f  VSID=      d1f26e3 LLP:110
> [  822.711739] 11 0000000018000000 000e3615f520fd90
> [  822.711740]  256M ESID=        1  VSID=   e3615f520f LLP:110
> [  822.711740] 12 d000000008000000 400d43642f000510
> [  822.711741]   1T  ESID=   d00000  VSID=      d43642f LLP:110
> [  822.711742] 13 d000000008000000 400d43642f000510
> [  822.711743]   1T  ESID=   d00000  VSID=      d43642f LLP:110
> 
> 
> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 +
>  arch/powerpc/mm/slb.c                         |   35 +++++++++++++++++++++++++
>  arch/powerpc/platforms/pseries/ras.c          |   29 ++++++++++++++++++++-
>  3 files changed, 64 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index 50ed64fba4ae..c0da68927235 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -487,6 +487,7 @@ extern void hpte_init_native(void);
>  
>  extern void slb_initialize(void);
>  extern void slb_flush_and_rebolt(void);
> +extern void slb_dump_contents(void);
>  
>  extern void slb_vmalloc_update(void);
>  extern void slb_set_size(u16 size);
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index 66577cc66dc9..799aa117cec3 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -145,6 +145,41 @@ void slb_flush_and_rebolt(void)
>  	get_paca()->slb_cache_ptr = 0;
>  }
>  
> +void slb_dump_contents(void)
> +{
> +	int i;
> +	unsigned long e, v;
> +	unsigned long llp;
> +
> +	pr_err("slb contents:\n");
> +	for (i = 0; i < mmu_slb_size; i++) {
> +		asm volatile("slbmfee  %0,%1" : "=r" (e) : "r" (i));
> +		asm volatile("slbmfev  %0,%1" : "=r" (v) : "r" (i));
> +
> +		if (!e && !v)
> +			continue;
> +
> +		pr_err("%02d %016lx %016lx", i, e, v);
> +
> +		if (!(e & SLB_ESID_V)) {
> +			pr_err("\n");
> +			continue;
> +		}
> +		llp = v & SLB_VSID_LLP;
> +		if (v & SLB_VSID_B_1T) {
> +			pr_err("  1T  ESID=%9lx  VSID=%13lx LLP:%3lx\n",
> +				GET_ESID_1T(e),
> +				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T,
> +				llp);
> +		} else {
> +			pr_err(" 256M ESID=%9lx  VSID=%13lx LLP:%3lx\n",
> +				GET_ESID(e),
> +				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT,
> +				llp);
> +		}
> +	}
> +}
> +
>  void slb_vmalloc_update(void)
>  {
>  	unsigned long vflags;
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 2edc673be137..e56759d92356 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  	return 0; /* need to perform reset */
>  }
>  
> +static int mce_handle_error(struct rtas_error_log *errp)
> +{
> +	struct pseries_errorlog *pseries_log;
> +	struct pseries_mc_errorlog *mce_log;
> +	int disposition = rtas_error_disposition(errp);
> +	uint8_t error_type;
> +
> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> +	if (pseries_log == NULL)
> +		goto out;
> +
> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> +	error_type = rtas_mc_error_type(mce_log);
> +
> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
> +		slb_dump_contents();
> +		slb_flush_and_rebolt();
> +		disposition = RTAS_DISP_FULLY_RECOVERED;
> +	}
> +
> +out:
> +	return disposition;
> +}
> +
>  /*
>   * See if we can recover from a machine check exception.
>   * This is only called on power4 (or above) and only via
> @@ -434,7 +459,9 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
>  {
>  	int recovered = 0;
> -	int disposition = rtas_error_disposition(err);
> +	int disposition;
> +
> +	disposition = mce_handle_error(err);
>  
>  	if (!(regs->msr & MSR_RI)) {
>  		/* If MSR_RI isn't set, we cannot recover */
> 
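Here is a sketch of how each `slbmfee`/`slbmfev` pair is turned into a dump line. The userspace C below reproduces the arithmetic of `slb_dump_contents()` and builds the whole entry on one line, in the spirit of Nick's request to match xmon. The constant values are assumptions, copied by hand from `mmu-hash.h`, so check them against your tree. `slb_decode()` and `slb_format()` are hypothetical helpers. Composing the entry in one buffer also avoids the split visible in the sample output above, where the second `pr_err()` starts a new log record. In the kernel, `pr_cont()` is the usual way to continue a line.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match arch/powerpc/include/asm/book3s/64/mmu-hash.h. */
#define SLB_ESID_V        0x0000000008000000ULL
#define SID_SHIFT         28
#define SID_SHIFT_1T      40
#define SID_MASK          0xfffffffffULL
#define SID_MASK_1T       0xffffffULL
#define SLB_VSID_SHIFT    12
#define SLB_VSID_SHIFT_1T 24
#define SLB_VSID_B        0xc000000000000000ULL
#define SLB_VSID_B_1T     0x4000000000000000ULL
#define SLB_VSID_LLP      0x130ULL	/* SLB_VSID_L | SLB_VSID_LP */

struct slb_decoded {
	int valid;		/* ESID valid bit set */
	int is_1t;		/* 1T segment, otherwise 256M */
	uint64_t esid, vsid, llp;
};

/* Decode one ESID/VSID pair the same way slb_dump_contents() does. */
static struct slb_decoded slb_decode(uint64_t e, uint64_t v)
{
	struct slb_decoded d = { 0 };

	d.valid = (e & SLB_ESID_V) != 0;
	if (!d.valid)
		return d;
	d.is_1t = (v & SLB_VSID_B_1T) != 0;
	d.llp = v & SLB_VSID_LLP;
	if (d.is_1t) {
		d.esid = (e >> SID_SHIFT_1T) & SID_MASK_1T;
		d.vsid = (v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T;
	} else {
		d.esid = (e >> SID_SHIFT) & SID_MASK;
		d.vsid = (v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
	}
	return d;
}

/* Raw words plus, for valid entries, the decoded fields, on one line. */
static void slb_format(char *buf, size_t len, int idx, uint64_t e, uint64_t v)
{
	struct slb_decoded d = slb_decode(e, v);
	int n = snprintf(buf, len, "%02d %016" PRIx64 " %016" PRIx64,
			 idx, e, v);

	if (!d.valid || n < 0 || (size_t)n >= len)
		return;
	snprintf(buf + n, len - (size_t)n,
		 " %s ESID=%9" PRIx64 "  VSID=%13" PRIx64 " LLP:%3" PRIx64,
		 d.is_1t ? " 1T " : "256M", d.esid, d.vsid, d.llp);
}
```

Feeding it entry 00 from the sample dump (`c000000008000000`, `400ea1b217000500`) gives ESID `c00000`, VSID `ea1b217` and LLP `100`, the same values the patch printed on its second line.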


* Re: [v3 PATCH 5/5] powerpc/pseries: Display machine check error details.
  2018-06-07 17:29 ` [v3 PATCH 5/5] powerpc/pseries: Display machine check error details Mahesh J Salgaonkar
@ 2018-06-08  1:51   ` Nicholas Piggin
  2018-06-08  6:28     ` Mahesh Jagannath Salgaonkar
  2018-07-02 18:01     ` Michal Suchánek
  0 siblings, 2 replies; 21+ messages in thread
From: Nicholas Piggin @ 2018-06-08  1:51 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour

On Thu, 07 Jun 2018 22:59:04 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Extract the MCE error details from the RTAS extended log and display
> them on the console.
> 
> With this patch you should now see MCE logs like the one below:
> 
> [  142.371818] Severe Machine check interrupt [Recovered]
> [  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
> [  142.371822]   Initiator: CPU
> [  142.371823]   Error type: SLB [Multihit]
> [  142.371824]     Effective address: d00000000ca70000
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/rtas.h      |    5 +
>  arch/powerpc/platforms/pseries/ras.c |  128 +++++++++++++++++++++++++++++++++-
>  2 files changed, 131 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 3f2fba7ef23b..8100a95c133a 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -190,6 +190,11 @@ static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
>  	return (elog->byte1 & 0x04) >> 2;
>  }
>  
> +static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog)
> +{
> +	return (elog->byte2 & 0xf0) >> 4;
> +}
> +
>  #define rtas_error_type(x)	((x)->byte3)
>  
>  static inline
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index e56759d92356..cd9446980092 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -422,7 +422,130 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  	return 0; /* need to perform reset */
>  }
>  
> -static int mce_handle_error(struct rtas_error_log *errp)
> +#define VAL_TO_STRING(ar, val)	((val < ARRAY_SIZE(ar)) ? ar[val] : "Unknown")
> +
> +static void pseries_print_mce_info(struct pt_regs *regs,
> +				struct rtas_error_log *errp, int disposition)
> +{
> +	const char *level, *sevstr;
> +	struct pseries_errorlog *pseries_log;
> +	struct pseries_mc_errorlog *mce_log;
> +	uint8_t error_type, err_sub_type;
> +	uint8_t initiator = rtas_error_initiator(errp);
> +	uint64_t addr;
> +
> +	static const char * const initiators[] = {
> +		"Unknown",
> +		"CPU",
> +		"PCI",
> +		"ISA",
> +		"Memory",
> +		"Power Mgmt",
> +	};
> +	static const char * const mc_err_types[] = {
> +		"UE",
> +		"SLB",
> +		"ERAT",
> +		"TLB",
> +		"D-Cache",
> +		"Unknown",
> +		"I-Cache",
> +	};
> +	static const char * const mc_ue_types[] = {
> +		"Indeterminate",
> +		"Instruction fetch",
> +		"Page table walk ifetch",
> +		"Load/Store",
> +		"Page table walk Load/Store",
> +	};
> +
> +	/* SLB sub errors valid values are 0x0, 0x1, 0x2 */
> +	static const char * const mc_slb_types[] = {
> +		"Parity",
> +		"Multihit",
> +		"Indeterminate",
> +	};
> +
> +	/* TLB and ERAT sub errors valid values are 0x1, 0x2, 0x3 */
> +	static const char * const mc_soft_types[] = {
> +		"Unknown",
> +		"Parity",
> +		"Multihit",
> +		"Indeterminate",
> +	};
> +
> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> +	if (pseries_log == NULL)
> +		return;
> +
> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> +
> +	error_type = rtas_mc_error_type(mce_log);
> +	err_sub_type = rtas_mc_error_sub_type(mce_log);
> +
> +	switch (rtas_error_severity(errp)) {
> +	case RTAS_SEVERITY_NO_ERROR:
> +		level = KERN_INFO;
> +		sevstr = "Harmless";
> +		break;
> +	case RTAS_SEVERITY_WARNING:
> +		level = KERN_WARNING;
> +		sevstr = "";
> +		break;
> +	case RTAS_SEVERITY_ERROR:
> +	case RTAS_SEVERITY_ERROR_SYNC:
> +		level = KERN_ERR;
> +		sevstr = "Severe";
> +		break;
> +	case RTAS_SEVERITY_FATAL:
> +	default:
> +		level = KERN_ERR;
> +		sevstr = "Fatal";
> +		break;
> +	}
> +
> +	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
> +		disposition == RTAS_DISP_FULLY_RECOVERED ?
> +		"Recovered" : "Not recovered");
> +	if (user_mode(regs)) {
> +		printk("%s  NIP: [%016lx] PID: %d Comm: %s\n", level,
> +			regs->nip, current->pid, current->comm);
> +	} else {
> +		printk("%s  NIP [%016lx]: %pS\n", level, regs->nip,
> +			(void *)regs->nip);
> +	}

I think it's probably still useful to print pid/comm for kernel mode
faults if !in_interrupt()... I see you're basically taking kernel/mce.c
and doing the same thing.

Is there any reasonable way to share code here?

Thanks,
Nick
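`pseries_print_mce_info()` turns small RTAS fields into strings through bounds-checked table lookups. Below is a minimal sketch of that pattern. It assumes a hypothetical stand-in struct that carries only the four header bytes; the real `struct rtas_error_log` has more fields. The initiator is the high nibble of `byte2`, and any index past the end of the table maps to "Unknown". This variant also parenthesises the macro arguments, which the patch's `VAL_TO_STRING()` does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Like the patch's VAL_TO_STRING(), with the arguments parenthesised.
 * Note that val is still evaluated twice. */
#define VAL_TO_STRING(ar, val) \
	(((size_t)(val) < ARRAY_SIZE(ar)) ? (ar)[(val)] : "Unknown")

/* Hypothetical stand-in for struct rtas_error_log: header bytes only. */
struct fake_rtas_error_log {
	uint8_t byte0, byte1, byte2, byte3;
};

/* Same bit extraction as rtas_error_initiator() in the patch. */
static uint8_t fake_error_initiator(const struct fake_rtas_error_log *elog)
{
	return (elog->byte2 & 0xf0) >> 4;
}

static const char *initiator_name(const struct fake_rtas_error_log *elog)
{
	static const char * const initiators[] = {
		"Unknown", "CPU", "PCI", "ISA", "Memory", "Power Mgmt",
	};

	return VAL_TO_STRING(initiators, fake_error_initiator(elog));
}
```

A `byte2` of `0x10` yields "CPU". A nibble of `0xf` falls outside the six-entry table and yields "Unknown" instead of reading past the array.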


* Re: [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation.
  2018-06-08  1:31   ` Nicholas Piggin
@ 2018-06-08  6:16     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 21+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-06-08  6:16 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, stable, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour

On 06/08/2018 07:01 AM, Nicholas Piggin wrote:
> On Thu, 07 Jun 2018 22:58:11 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> rtas_log_buf is a buffer that holds RTAS event data communicated to
>> the kernel by the hypervisor. This buffer is then used to pass RTAS
>> event data to the user through procfs. It is allocated from the
>> vmalloc (non-linear mapping) area.
>>
>> On a machine check interrupt, register r3 points to the RTAS extended
>> event log, passed by the hypervisor, that contains the MCE event. The
>> pseries machine check handler then logs this error into rtas_log_buf.
>> Because rtas_log_buf is a vmalloc-ed (non-linear) buffer, we end up
>> taking a page fault (vector 0x300) while accessing it. Since the
>> machine check interrupt handler runs in NMI context, we cannot afford
>> to take any page fault: page faults are not honored in NMI context and
>> cause a kernel panic. This patch fixes the issue by allocating
>> rtas_log_buf using kmalloc.
>>
>> Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
>> Cc: stable@vger.kernel.org
>> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/rtasd.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
>> index f915db93cd42..3957d4ae2ba2 100644
>> --- a/arch/powerpc/kernel/rtasd.c
>> +++ b/arch/powerpc/kernel/rtasd.c
>> @@ -559,7 +559,7 @@ static int __init rtas_event_scan_init(void)
>>  	rtas_error_log_max = rtas_get_error_log_max();
>>  	rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);
>>  
>> -	rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER);
>> +	rtas_log_buf = kmalloc(rtas_error_log_buffer_max*LOG_NUMBER, GFP_KERNEL);
> 
> Does this have to be in the RMA region if it's to be accessed with
> relocation off in the guest?

Nope, not required. It never gets accessed with relocation off.

> 
> A comment about it being accessed with relocation off might be helpful
> too.

Sure.

Thanks,
-Mahesh.


* Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
  2018-06-08  1:48   ` Nicholas Piggin
@ 2018-06-08  6:19     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 21+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-06-08  6:19 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour

On 06/08/2018 07:18 AM, Nicholas Piggin wrote:
> On Thu, 07 Jun 2018 22:58:55 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> If we get a machine check exception due to SLB errors, dump the
>> current SLB contents, which will be very helpful in debugging the
>> root cause of SLB errors. On pseries, the system currently crashes on
>> SLB errors. These are soft errors and can be fixed by flushing the
>> SLBs, so the kernel can continue to function instead of crashing.
>> This patch fixes that as well.
> 
> So pseries never flushed and reloaded the SLB in response to multi-hit
> errors? This seems like quite a good improvement then. I like
> dumping the SLB too.
> 
> It's a bit annoying that we can't really share the same code with xmon.
> That's okay, but if you take a copy like this, I suggest commenting
> both copies with a note to keep them in sync when you re-post the
> series.
> 
>>
>> With this patch, the console will log SLB contents like the following
>> on SLB MCE errors:
>>
>> [  822.711728] slb contents:
> 
> Suggest keeping the same format as the xmon dump (in particular the
> CPU number: even though it's probably printed elsewhere in the MCE
> message, it doesn't hurt).

Sure, will do that and repost.

Thanks,
-Mahesh.

> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
> Thanks,
> Nick
> 
>> [  822.711730] 00 c000000008000000 400ea1b217000500
>> [  822.711731]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
>> [  822.711732] 01 d000000008000000 400d43642f000510
>> [  822.711733]   1T  ESID=   d00000  VSID=      d43642f LLP:110
>> [  822.711734] 09 f000000008000000 400a86c85f000500
>> [  822.711736]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
>> [  822.711737] 10 00007f0008000000 400d1f26e3000d90
>> [  822.711738]   1T  ESID=       7f  VSID=      d1f26e3 LLP:110
>> [  822.711739] 11 0000000018000000 000e3615f520fd90
>> [  822.711740]  256M ESID=        1  VSID=   e3615f520f LLP:110
>> [  822.711740] 12 d000000008000000 400d43642f000510
>> [  822.711741]   1T  ESID=   d00000  VSID=      d43642f LLP:110
>> [  822.711742] 13 d000000008000000 400d43642f000510
>> [  822.711743]   1T  ESID=   d00000  VSID=      d43642f LLP:110
>>
>>
>> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 +
>>  arch/powerpc/mm/slb.c                         |   35 +++++++++++++++++++++++++
>>  arch/powerpc/platforms/pseries/ras.c          |   29 ++++++++++++++++++++-
>>  3 files changed, 64 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>> index 50ed64fba4ae..c0da68927235 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>> @@ -487,6 +487,7 @@ extern void hpte_init_native(void);
>>  
>>  extern void slb_initialize(void);
>>  extern void slb_flush_and_rebolt(void);
>> +extern void slb_dump_contents(void);
>>  
>>  extern void slb_vmalloc_update(void);
>>  extern void slb_set_size(u16 size);
>> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
>> index 66577cc66dc9..799aa117cec3 100644
>> --- a/arch/powerpc/mm/slb.c
>> +++ b/arch/powerpc/mm/slb.c
>> @@ -145,6 +145,41 @@ void slb_flush_and_rebolt(void)
>>  	get_paca()->slb_cache_ptr = 0;
>>  }
>>  
>> +void slb_dump_contents(void)
>> +{
>> +	int i;
>> +	unsigned long e, v;
>> +	unsigned long llp;
>> +
>> +	pr_err("slb contents:\n");
>> +	for (i = 0; i < mmu_slb_size; i++) {
>> +		asm volatile("slbmfee  %0,%1" : "=r" (e) : "r" (i));
>> +		asm volatile("slbmfev  %0,%1" : "=r" (v) : "r" (i));
>> +
>> +		if (!e && !v)
>> +			continue;
>> +
>> +		pr_err("%02d %016lx %016lx", i, e, v);
>> +
>> +		if (!(e & SLB_ESID_V)) {
>> +			pr_err("\n");
>> +			continue;
>> +		}
>> +		llp = v & SLB_VSID_LLP;
>> +		if (v & SLB_VSID_B_1T) {
>> +			pr_err("  1T  ESID=%9lx  VSID=%13lx LLP:%3lx\n",
>> +				GET_ESID_1T(e),
>> +				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T,
>> +				llp);
>> +		} else {
>> +			pr_err(" 256M ESID=%9lx  VSID=%13lx LLP:%3lx\n",
>> +				GET_ESID(e),
>> +				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT,
>> +				llp);
>> +		}
>> +	}
>> +}
>> +
>>  void slb_vmalloc_update(void)
>>  {
>>  	unsigned long vflags;
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index 2edc673be137..e56759d92356 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>  	return 0; /* need to perform reset */
>>  }
>>  
>> +static int mce_handle_error(struct rtas_error_log *errp)
>> +{
>> +	struct pseries_errorlog *pseries_log;
>> +	struct pseries_mc_errorlog *mce_log;
>> +	int disposition = rtas_error_disposition(errp);
>> +	uint8_t error_type;
>> +
>> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>> +	if (pseries_log == NULL)
>> +		goto out;
>> +
>> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>> +	error_type = rtas_mc_error_type(mce_log);
>> +
>> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
>> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
>> +		slb_dump_contents();
>> +		slb_flush_and_rebolt();
>> +		disposition = RTAS_DISP_FULLY_RECOVERED;
>> +	}
>> +
>> +out:
>> +	return disposition;
>> +}
>> +
>>  /*
>>   * See if we can recover from a machine check exception.
>>   * This is only called on power4 (or above) and only via
>> @@ -434,7 +459,9 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>  static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
>>  {
>>  	int recovered = 0;
>> -	int disposition = rtas_error_disposition(err);
>> +	int disposition;
>> +
>> +	disposition = mce_handle_error(err);
>>  
>>  	if (!(regs->msr & MSR_RI)) {
>>  		/* If MSR_RI isn't set, we cannot recover */
>>
> 


* Re: [v3 PATCH 5/5] powerpc/pseries: Display machine check error details.
  2018-06-08  1:51   ` Nicholas Piggin
@ 2018-06-08  6:28     ` Mahesh Jagannath Salgaonkar
  2018-07-02 18:01     ` Michal Suchánek
  1 sibling, 0 replies; 21+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-06-08  6:28 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour

On 06/08/2018 07:21 AM, Nicholas Piggin wrote:
> On Thu, 07 Jun 2018 22:59:04 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Extract the MCE error details from the RTAS extended log and display
>> them on the console.
>>
>> With this patch you should now see MCE logs like the one below:
>>
>> [  142.371818] Severe Machine check interrupt [Recovered]
>> [  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
>> [  142.371822]   Initiator: CPU
>> [  142.371823]   Error type: SLB [Multihit]
>> [  142.371824]     Effective address: d00000000ca70000
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/rtas.h      |    5 +
>>  arch/powerpc/platforms/pseries/ras.c |  128 +++++++++++++++++++++++++++++++++-
>>  2 files changed, 131 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
>> index 3f2fba7ef23b..8100a95c133a 100644
>> --- a/arch/powerpc/include/asm/rtas.h
>> +++ b/arch/powerpc/include/asm/rtas.h
>> @@ -190,6 +190,11 @@ static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
>>  	return (elog->byte1 & 0x04) >> 2;
>>  }
>>  
>> +static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog)
>> +{
>> +	return (elog->byte2 & 0xf0) >> 4;
>> +}
>> +
>>  #define rtas_error_type(x)	((x)->byte3)
>>  
>>  static inline
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index e56759d92356..cd9446980092 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -422,7 +422,130 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>  	return 0; /* need to perform reset */
>>  }
>>  
>> -static int mce_handle_error(struct rtas_error_log *errp)
>> +#define VAL_TO_STRING(ar, val)	((val < ARRAY_SIZE(ar)) ? ar[val] : "Unknown")
>> +
>> +static void pseries_print_mce_info(struct pt_regs *regs,
>> +				struct rtas_error_log *errp, int disposition)
>> +{
>> +	const char *level, *sevstr;
>> +	struct pseries_errorlog *pseries_log;
>> +	struct pseries_mc_errorlog *mce_log;
>> +	uint8_t error_type, err_sub_type;
>> +	uint8_t initiator = rtas_error_initiator(errp);
>> +	uint64_t addr;
>> +
>> +	static const char * const initiators[] = {
>> +		"Unknown",
>> +		"CPU",
>> +		"PCI",
>> +		"ISA",
>> +		"Memory",
>> +		"Power Mgmt",
>> +	};
>> +	static const char * const mc_err_types[] = {
>> +		"UE",
>> +		"SLB",
>> +		"ERAT",
>> +		"TLB",
>> +		"D-Cache",
>> +		"Unknown",
>> +		"I-Cache",
>> +	};
>> +	static const char * const mc_ue_types[] = {
>> +		"Indeterminate",
>> +		"Instruction fetch",
>> +		"Page table walk ifetch",
>> +		"Load/Store",
>> +		"Page table walk Load/Store",
>> +	};
>> +
>> +	/* SLB sub errors valid values are 0x0, 0x1, 0x2 */
>> +	static const char * const mc_slb_types[] = {
>> +		"Parity",
>> +		"Multihit",
>> +		"Indeterminate",
>> +	};
>> +
>> +	/* TLB and ERAT sub errors valid values are 0x1, 0x2, 0x3 */
>> +	static const char * const mc_soft_types[] = {
>> +		"Unknown",
>> +		"Parity",
>> +		"Multihit",
>> +		"Indeterminate",
>> +	};
>> +
>> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>> +	if (pseries_log == NULL)
>> +		return;
>> +
>> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>> +
>> +	error_type = rtas_mc_error_type(mce_log);
>> +	err_sub_type = rtas_mc_error_sub_type(mce_log);
>> +
>> +	switch (rtas_error_severity(errp)) {
>> +	case RTAS_SEVERITY_NO_ERROR:
>> +		level = KERN_INFO;
>> +		sevstr = "Harmless";
>> +		break;
>> +	case RTAS_SEVERITY_WARNING:
>> +		level = KERN_WARNING;
>> +		sevstr = "";
>> +		break;
>> +	case RTAS_SEVERITY_ERROR:
>> +	case RTAS_SEVERITY_ERROR_SYNC:
>> +		level = KERN_ERR;
>> +		sevstr = "Severe";
>> +		break;
>> +	case RTAS_SEVERITY_FATAL:
>> +	default:
>> +		level = KERN_ERR;
>> +		sevstr = "Fatal";
>> +		break;
>> +	}
>> +
>> +	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
>> +		disposition == RTAS_DISP_FULLY_RECOVERED ?
>> +		"Recovered" : "Not recovered");
>> +	if (user_mode(regs)) {
>> +		printk("%s  NIP: [%016lx] PID: %d Comm: %s\n", level,
>> +			regs->nip, current->pid, current->comm);
>> +	} else {
>> +		printk("%s  NIP [%016lx]: %pS\n", level, regs->nip,
>> +			(void *)regs->nip);
>> +	}
> 
> I think it's probably still useful to print pid/comm for kernel mode
> faults if !in_interrupt()... I see you're basically taking kernel/mce.c
> and doing the same thing.
> 
> Is there any reasonable way to share code here?

I did think of doing that, but I wanted to keep this patch series simple
enough to backport easily to very old kernels. I will work on
consolidating the code as an enhancement later.

Thanks,
-Mahesh.
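The severity switch in `pseries_print_mce_info()` picks a prefix string for the report. Nick's suggestion is to also print PID/comm for kernel-mode faults outside interrupt context. The userspace sketch of that header logic below is hedged in several ways. The `SEV_*` values are hypothetical stand-ins for `RTAS_SEVERITY_*`. `have_task` is an assumed flag modelling `user_mode(regs) || !in_interrupt()`. The kernel's `%pS` symbol lookup is left out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the RTAS_SEVERITY_* codes. */
enum sev {
	SEV_NO_ERROR, SEV_EVENT, SEV_WARNING,
	SEV_ERROR_SYNC, SEV_ERROR, SEV_FATAL,
};

/* Mirrors the patch's switch: any value without a case gets "Fatal". */
static const char *sev_prefix(enum sev s)
{
	switch (s) {
	case SEV_NO_ERROR:
		return "Harmless";
	case SEV_WARNING:
		return "";
	case SEV_ERROR:
	case SEV_ERROR_SYNC:
		return "Severe";
	case SEV_FATAL:
	default:
		return "Fatal";
	}
}

/*
 * Build the first two report lines into buf. have_task models
 * "user_mode(regs) || !in_interrupt()", i.e. when PID/comm are meaningful.
 */
static void format_mce_header(char *buf, size_t len, enum sev s,
			      int recovered, uint64_t nip, int have_task,
			      int pid, const char *comm)
{
	int n = snprintf(buf, len, "%s Machine check interrupt [%s]\n",
			 sev_prefix(s),
			 recovered ? "Recovered" : "Not recovered");

	if (n < 0 || (size_t)n >= len)
		return;
	if (have_task)
		snprintf(buf + n, len - (size_t)n,
			 "  NIP: [%016" PRIx64 "] PID: %d Comm: %s\n",
			 nip, pid, comm);
	else
		snprintf(buf + n, len - (size_t)n,
			 "  NIP: [%016" PRIx64 "]\n", nip);
}
```

Deciding once whether task information is meaningful keeps the user-mode and kernel-mode paths on a single format. The patch, by contrast, uses two separate `printk()` formats.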


* Re: [v3 PATCH 2/5] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
  2018-06-07 17:28 ` [v3 PATCH 2/5] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler Mahesh J Salgaonkar
  2018-06-08  1:33   ` Nicholas Piggin
@ 2018-06-08  6:50   ` Michael Ellerman
  2018-06-08 10:31     ` Mahesh Jagannath Salgaonkar
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Ellerman @ 2018-06-08  6:50 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, linuxppc-dev
  Cc: stable, Aneesh Kumar K.V, Laurent Dufour, Nicholas Piggin

Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> During a machine check interrupt on the pseries platform, register r3
> points to the RTAS extended event log passed by the hypervisor. Since
> the hypervisor uses r3 to pass the pointer to the RTAS log, it stores
> the original r3 value at the start of the memory (first 8 bytes)
> pointed to by r3. Since the hypervisor stores this value, like the
> rest of the RTAS log, in BE format, Linux must restore r3 with the
> correct endianness.

Can we hit this under KVM? And if so, what if KVM/QEMU is running
little-endian: does it still write the value BE?

cheers


* Re: [v3 PATCH 2/5] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
  2018-06-08  6:50   ` Michael Ellerman
@ 2018-06-08 10:31     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 21+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-06-08 10:31 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: stable, Aneesh Kumar K.V, Laurent Dufour, Nicholas Piggin

On 06/08/2018 12:20 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> During a machine check interrupt on the pseries platform, register r3
>> points to the RTAS extended event log passed by the hypervisor. Since
>> the hypervisor uses r3 to pass the pointer to the RTAS log, it stores
>> the original r3 value at the start of the memory (first 8 bytes)
>> pointed to by r3. Since the hypervisor stores this value, like the
>> rest of the RTAS log, in BE format, Linux must restore r3 with the
>> correct endianness.
> 
> Can we hit this under KVM? And if so, what if KVM/QEMU is running
> little-endian: does it still write the value BE?

FWNMI support for QEMU is not merged yet. Once it is, we can hit this.
But whenever FWNMI support does get in, it should always pass the RTAS
event data, including the original r3 value, in BE format.

Thanks,
-Mahesh.
> 
> cheers
> 
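Patch 2 replaces the raw load of `savep[0]` with `be64_to_cpu()`. The userspace sketch below (not kernel code) shows why that works on either host endianness: it assembles the doubleword most-significant byte first. `load_be64()` and `restore_r3()` are hypothetical names. The kernel applies `be64_to_cpu()` to the save area directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for be64_to_cpu() on a raw buffer: build the value
 * most-significant byte first, so the result is the same on big- and
 * little-endian hosts.
 */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	assert(p != NULL);
	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical model of the restore step in fwnmi_get_errinfo(): the
 * first doubleword of the save area holds the original r3, written
 * big-endian by the hypervisor.
 */
static uint64_t restore_r3(const unsigned char *savep)
{
	return load_be64(savep);
}
```

Take the bytes `00 00 00 00 12 34 56 78`. `restore_r3()` returns `0x12345678`. A plain 64-bit load of the same bytes on a little-endian host would instead produce `0x7856341200000000`, which is the wrong r3 the patch fixes.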


* Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
  2018-06-07 17:28 ` [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors Mahesh J Salgaonkar
  2018-06-08  1:48   ` Nicholas Piggin
@ 2018-06-12 13:47   ` Michael Ellerman
  2018-06-13  2:38     ` Aneesh Kumar K.V
  2018-06-13  3:45     ` Mahesh Jagannath Salgaonkar
  1 sibling, 2 replies; 21+ messages in thread
From: Michael Ellerman @ 2018-06-12 13:47 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, linuxppc-dev
  Cc: Aneesh Kumar K.V, Aneesh Kumar K.V, Laurent Dufour, Nicholas Piggin

Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 2edc673be137..e56759d92356 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  	return 0; /* need to perform reset */
>  }
>  
> +static int mce_handle_error(struct rtas_error_log *errp)
> +{
> +	struct pseries_errorlog *pseries_log;
> +	struct pseries_mc_errorlog *mce_log;
> +	int disposition = rtas_error_disposition(errp);
> +	uint8_t error_type;
> +
> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> +	if (pseries_log == NULL)
> +		goto out;
> +
> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> +	error_type = rtas_mc_error_type(mce_log);
> +
> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
> +		slb_dump_contents();
> +		slb_flush_and_rebolt();

Aren't we back in virtual mode here?

Don't we need to do the flush in real mode before turning the MMU back
on? Otherwise we'll just take another multi-hit?

cheers


* Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
  2018-06-12 13:47   ` Michael Ellerman
@ 2018-06-13  2:38     ` Aneesh Kumar K.V
  2018-06-13  4:06       ` Michael Ellerman
  2018-06-13  3:45     ` Mahesh Jagannath Salgaonkar
  1 sibling, 1 reply; 21+ messages in thread
From: Aneesh Kumar K.V @ 2018-06-13  2:38 UTC (permalink / raw)
  To: Michael Ellerman, Mahesh J Salgaonkar, linuxppc-dev
  Cc: Aneesh Kumar K.V, Laurent Dufour, Nicholas Piggin

On 06/12/2018 07:17 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index 2edc673be137..e56759d92356 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>   	return 0; /* need to perform reset */
>>   }
>>   
>> +static int mce_handle_error(struct rtas_error_log *errp)
>> +{
>> +	struct pseries_errorlog *pseries_log;
>> +	struct pseries_mc_errorlog *mce_log;
>> +	int disposition = rtas_error_disposition(errp);
>> +	uint8_t error_type;
>> +
>> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>> +	if (pseries_log == NULL)
>> +		goto out;
>> +
>> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>> +	error_type = rtas_mc_error_type(mce_log);
>> +
>> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
>> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
>> +		slb_dump_contents();
>> +		slb_flush_and_rebolt();
> 
> Aren't we back in virtual mode here?
> 
> Don't we need to do the flush in real mode before turning the MMU back
> on? Otherwise we'll just take another multi-hit?
> 

slb_flush_and_rebolt() does an slbia, which preserves SLB entry 0, so
kernel code should not take another SLB miss. We also make sure we don't
touch the stack in slb_flush_and_rebolt(). So we flush everything and
put the vmalloc and stack entries back. That should be OK with the MMU on?

-aneesh


* Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
  2018-06-12 13:47   ` Michael Ellerman
  2018-06-13  2:38     ` Aneesh Kumar K.V
@ 2018-06-13  3:45     ` Mahesh Jagannath Salgaonkar
  1 sibling, 0 replies; 21+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2018-06-13  3:45 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Laurent Dufour, Aneesh Kumar K.V, Nicholas Piggin

On 06/12/2018 07:17 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index 2edc673be137..e56759d92356 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>  	return 0; /* need to perform reset */
>>  }
>>  
>> +static int mce_handle_error(struct rtas_error_log *errp)
>> +{
>> +	struct pseries_errorlog *pseries_log;
>> +	struct pseries_mc_errorlog *mce_log;
>> +	int disposition = rtas_error_disposition(errp);
>> +	uint8_t error_type;
>> +
>> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>> +	if (pseries_log == NULL)
>> +		goto out;
>> +
>> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>> +	error_type = rtas_mc_error_type(mce_log);
>> +
>> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
>> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
>> +		slb_dump_contents();
>> +		slb_flush_and_rebolt();
> 
> Aren't we back in virtual mode here?
> 
> Don't we need to do the flush in real mode before turning the MMU back
> on. Otherwise we'll just take another multi-hit?

Yeah, for duplicate entries for the kernel segment "0xc00" we will end up
with another multi-hit; for other segments we won't. I think I need to
move the fetching of the RTAS error log and the handling part into real
mode to avoid a loop, and do only the printing part in virtual mode.

> 
> cheers
> 


* Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
  2018-06-13  2:38     ` Aneesh Kumar K.V
@ 2018-06-13  4:06       ` Michael Ellerman
  2018-06-13  4:06         ` Aneesh Kumar K.V
  0 siblings, 1 reply; 21+ messages in thread
From: Michael Ellerman @ 2018-06-13  4:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Mahesh J Salgaonkar, linuxppc-dev
  Cc: Aneesh Kumar K.V, Laurent Dufour, Nicholas Piggin

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> On 06/12/2018 07:17 PM, Michael Ellerman wrote:
>> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>>> index 2edc673be137..e56759d92356 100644
>>> --- a/arch/powerpc/platforms/pseries/ras.c
>>> +++ b/arch/powerpc/platforms/pseries/ras.c
>>> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>>   	return 0; /* need to perform reset */
>>>   }
>>>   
>>> +static int mce_handle_error(struct rtas_error_log *errp)
>>> +{
>>> +	struct pseries_errorlog *pseries_log;
>>> +	struct pseries_mc_errorlog *mce_log;
>>> +	int disposition = rtas_error_disposition(errp);
>>> +	uint8_t error_type;
>>> +
>>> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>>> +	if (pseries_log == NULL)
>>> +		goto out;
>>> +
>>> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>>> +	error_type = rtas_mc_error_type(mce_log);
>>> +
>>> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
>>> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
>>> +		slb_dump_contents();
>>> +		slb_flush_and_rebolt();
>> 
>> Aren't we back in virtual mode here?
>> 
>> Don't we need to do the flush in real mode before turning the MMU back
>> on. Otherwise we'll just take another multi-hit?
>
> slb_flush_and_rebolt does slbia, which keeps slb index 0. So kernel code 
> should not get another slb miss. We also make sure we don't touch stack 
> in slb_flush_and_rebolt(). So we flush everything and put vmalloc and 
> stack back. That should be ok with MMU on?

I don't think so.

Imagine we take a multi-hit accessing the paca. The machine check is
delivered in real mode, so we can run and access the paca by its real
address. But as soon as we turn the MMU back on, we'll take another
multi-hit when we access the paca.

If I'm reading the code right we are turning the MMU back on essentially
straight away when we rfid to machine_check_common().

cheers


* Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
  2018-06-13  4:06       ` Michael Ellerman
@ 2018-06-13  4:06         ` Aneesh Kumar K.V
  0 siblings, 0 replies; 21+ messages in thread
From: Aneesh Kumar K.V @ 2018-06-13  4:06 UTC (permalink / raw)
  To: Michael Ellerman, Mahesh J Salgaonkar, linuxppc-dev
  Cc: Aneesh Kumar K.V, Laurent Dufour, Nicholas Piggin

On 06/13/2018 09:36 AM, Michael Ellerman wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> On 06/12/2018 07:17 PM, Michael Ellerman wrote:
>>> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>>>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>>>> index 2edc673be137..e56759d92356 100644
>>>> --- a/arch/powerpc/platforms/pseries/ras.c
>>>> +++ b/arch/powerpc/platforms/pseries/ras.c
>>>> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>>>    	return 0; /* need to perform reset */
>>>>    }
>>>>    
>>>> +static int mce_handle_error(struct rtas_error_log *errp)
>>>> +{
>>>> +	struct pseries_errorlog *pseries_log;
>>>> +	struct pseries_mc_errorlog *mce_log;
>>>> +	int disposition = rtas_error_disposition(errp);
>>>> +	uint8_t error_type;
>>>> +
>>>> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>>>> +	if (pseries_log == NULL)
>>>> +		goto out;
>>>> +
>>>> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>>>> +	error_type = rtas_mc_error_type(mce_log);
>>>> +
>>>> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
>>>> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
>>>> +		slb_dump_contents();
>>>> +		slb_flush_and_rebolt();
>>>
>>> Aren't we back in virtual mode here?
>>>
>>> Don't we need to do the flush in real mode before turning the MMU back
>>> on. Otherwise we'll just take another multi-hit?
>>
>> slb_flush_and_rebolt does slbia, which keeps slb index 0. So kernel code
>> should not get another slb miss. We also make sure we don't touch stack
>> in slb_flush_and_rebolt(). So we flush everything and put vmalloc and
>> stack back. That should be ok with MMU on?
> 
> I don't think so.
> 
> Imagine we take a multi-hit accessing the paca. The machine check is
> delivered in real mode, so we can run and access the paca by its real
> address. But as soon as we turn the MMU back on, we'll take another
> multi-hit when we access the paca.
> 
> If I'm reading the code right we are turning the MMU back on essentially
> straight away when we rfid to machine_check_common().
> 

Yes, for the linearly mapped first 1TB we will take a multi-hit again.

-aneesh


* Re: [v3 PATCH 5/5] powerpc/pseries: Display machine check error details.
  2018-06-08  1:51   ` Nicholas Piggin
  2018-06-08  6:28     ` Mahesh Jagannath Salgaonkar
@ 2018-07-02 18:01     ` Michal Suchánek
  1 sibling, 0 replies; 21+ messages in thread
From: Michal Suchánek @ 2018-07-02 18:01 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Mahesh J Salgaonkar, linuxppc-dev, Laurent Dufour, Aneesh Kumar K.V

On Fri, 8 Jun 2018 11:51:36 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Thu, 07 Jun 2018 22:59:04 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
> > From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > 
> > Extract the MCE error details from RTAS extended log and display it
> > to console.
> > 
> > With this patch you should now see mce logs like below:
> > 
> > [  142.371818] Severe Machine check interrupt [Recovered]
> > [  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
> > [  142.371822]   Initiator: CPU
> > [  142.371823]   Error type: SLB [Multihit]
> > [  142.371824]     Effective address: d00000000ca70000
> > 
> > Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/rtas.h      |    5 +
> >  arch/powerpc/platforms/pseries/ras.c |  128 +++++++++++++++++++++++++++++++++-
> >  2 files changed, 131 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> > index 3f2fba7ef23b..8100a95c133a 100644
> > --- a/arch/powerpc/include/asm/rtas.h
> > +++ b/arch/powerpc/include/asm/rtas.h
> > @@ -190,6 +190,11 @@ static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
> >  	return (elog->byte1 & 0x04) >> 2;
> >  }
> >  
> > +static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog)
> > +{
> > +	return (elog->byte2 & 0xf0) >> 4;
> > +}
> > +
> >  #define rtas_error_type(x)	((x)->byte3)
> >  
> >  static inline
> > diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> > index e56759d92356..cd9446980092 100644
> > --- a/arch/powerpc/platforms/pseries/ras.c
> > +++ b/arch/powerpc/platforms/pseries/ras.c
> > @@ -422,7 +422,130 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
> >  	return 0; /* need to perform reset */
> >  }
> >  
> > -static int mce_handle_error(struct rtas_error_log *errp)
> > +#define VAL_TO_STRING(ar, val)	((val < ARRAY_SIZE(ar)) ? ar[val] : "Unknown")
> > +
> > +static void pseries_print_mce_info(struct pt_regs *regs,
> > +				struct rtas_error_log *errp, int disposition)
> > +{
> > +	const char *level, *sevstr;
> > +	struct pseries_errorlog *pseries_log;
> > +	struct pseries_mc_errorlog *mce_log;
> > +	uint8_t error_type, err_sub_type;
> > +	uint8_t initiator = rtas_error_initiator(errp);
> > +	uint64_t addr;
> > +
> > +	static const char * const initiators[] = {
> > +		"Unknown",
> > +		"CPU",
> > +		"PCI",
> > +		"ISA",
> > +		"Memory",
> > +		"Power Mgmt",
> > +	};
> > +	static const char * const mc_err_types[] = {
> > +		"UE",
> > +		"SLB",
> > +		"ERAT",
> > +		"TLB",
> > +		"D-Cache",
> > +		"Unknown",
> > +		"I-Cache",
> > +	};
> > +	static const char * const mc_ue_types[] = {
> > +		"Indeterminate",
> > +		"Instruction fetch",
> > +		"Page table walk ifetch",
> > +		"Load/Store",
> > +		"Page table walk Load/Store",
> > +	};
> > +
> > +	/* SLB sub errors valid values are 0x0, 0x1, 0x2 */
> > +	static const char * const mc_slb_types[] = {
> > +		"Parity",
> > +		"Multihit",
> > +		"Indeterminate",
> > +	};
> > +
> > +	/* TLB and ERAT sub errors valid values are 0x1, 0x2, 0x3 */
> > +	static const char * const mc_soft_types[] = {
> > +		"Unknown",
> > +		"Parity",
> > +		"Multihit",
> > +		"Indeterminate",
> > +	};
> > +
> > +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> > +	if (pseries_log == NULL)
> > +		return;
> > +
> > +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> > +
> > +	error_type = rtas_mc_error_type(mce_log);
> > +	err_sub_type = rtas_mc_error_sub_type(mce_log);
> > +
> > +	switch (rtas_error_severity(errp)) {
> > +	case RTAS_SEVERITY_NO_ERROR:
> > +		level = KERN_INFO;
> > +		sevstr = "Harmless";
> > +		break;
> > +	case RTAS_SEVERITY_WARNING:
> > +		level = KERN_WARNING;
> > +		sevstr = "";
> > +		break;
> > +	case RTAS_SEVERITY_ERROR:
> > +	case RTAS_SEVERITY_ERROR_SYNC:
> > +		level = KERN_ERR;
> > +		sevstr = "Severe";
> > +		break;
> > +	case RTAS_SEVERITY_FATAL:
> > +	default:
> > +		level = KERN_ERR;
> > +		sevstr = "Fatal";
> > +		break;
> > +	}
> > +
> > +	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
> > +		disposition == RTAS_DISP_FULLY_RECOVERED ?
> > +		"Recovered" : "Not recovered");
> > +	if (user_mode(regs)) {
> > +		printk("%s  NIP: [%016lx] PID: %d Comm: %s\n", level,
> > +			regs->nip, current->pid, current->comm);
> > +	} else {
> > +		printk("%s  NIP [%016lx]: %pS\n", level, regs->nip,
> > +			(void *)regs->nip);
> > +	}  
> 
> I think it's probably still useful to print pid/comm for kernel mode
> faults if !in_interrupt()... I see you're basically taking
> kernel/mce.c and doing the same thing.
> 
> Is there any reasonable way to share code here?
> 

I don't think so. In commit 36df96f8acaf ("powerpc/book3s: Decode and
save machine check event.") these enums are added:

enum MCE_ErrorType {
        MCE_ERROR_TYPE_UNKNOWN = 0,
        MCE_ERROR_TYPE_UE = 1,
        MCE_ERROR_TYPE_SLB = 2,
        MCE_ERROR_TYPE_ERAT = 3,
        MCE_ERROR_TYPE_TLB = 4,
};

enum MCE_UeErrorType {
        MCE_UE_ERROR_INDETERMINATE = 0,
        MCE_UE_ERROR_IFETCH = 1,
        MCE_UE_ERROR_PAGE_TABLE_WALK_IFETCH = 2,
        MCE_UE_ERROR_LOAD_STORE = 3,
        MCE_UE_ERROR_PAGE_TABLE_WALK_LOAD_STORE = 4,
};

enum MCE_SlbErrorType {
        MCE_SLB_ERROR_INDETERMINATE = 0,
        MCE_SLB_ERROR_PARITY = 1,
        MCE_SLB_ERROR_MULTIHIT = 2,
};

enum MCE_EratErrorType {
        MCE_ERAT_ERROR_INDETERMINATE = 0,
        MCE_ERAT_ERROR_PARITY = 1,
        MCE_ERAT_ERROR_MULTIHIT = 2,
};

enum MCE_TlbErrorType {
        MCE_TLB_ERROR_INDETERMINATE = 0,
        MCE_TLB_ERROR_PARITY = 1,
        MCE_TLB_ERROR_MULTIHIT = 2,
};

And the patch in the series adds slightly different definitions:

/* RTAS pseries MCE error types */
#define PSERIES_MC_ERROR_TYPE_UE                0x00
#define PSERIES_MC_ERROR_TYPE_SLB               0x01
#define PSERIES_MC_ERROR_TYPE_ERAT              0x02
#define PSERIES_MC_ERROR_TYPE_TLB               0x04
#define PSERIES_MC_ERROR_TYPE_D_CACHE           0x05
#define PSERIES_MC_ERROR_TYPE_I_CACHE           0x07

/* RTAS pseries MCE error sub types */
#define PSERIES_MC_ERROR_UE_INDETERMINATE               0
#define PSERIES_MC_ERROR_UE_IFETCH                      1
#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_IFETCH      2
#define PSERIES_MC_ERROR_UE_LOAD_STORE                  3
#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_LOAD_STORE  4

#define PSERIES_MC_ERROR_SLB_PARITY             0
#define PSERIES_MC_ERROR_SLB_MULTIHIT           1
#define PSERIES_MC_ERROR_SLB_INDETERMINATE      2

#define PSERIES_MC_ERROR_ERAT_PARITY            1
#define PSERIES_MC_ERROR_ERAT_MULTIHIT          2
#define PSERIES_MC_ERROR_ERAT_INDETERMINATE     3

#define PSERIES_MC_ERROR_TLB_PARITY             1
#define PSERIES_MC_ERROR_TLB_MULTIHIT           2
#define PSERIES_MC_ERROR_TLB_INDETERMINATE      3


If the MCE encodings are indeed intentionally different between pSeries
and powernv, it might be worth mentioning that somewhere.

Thanks

Michal


end of thread, other threads:[~2018-07-02 18:02 UTC | newest]

Thread overview: 21+ messages
2018-06-07 17:27 [v3 PATCH 0/5] powerpc/pseries: Machien check handler improvements Mahesh J Salgaonkar
2018-06-07 17:28 ` [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation Mahesh J Salgaonkar
2018-06-08  1:31   ` Nicholas Piggin
2018-06-08  6:16     ` Mahesh Jagannath Salgaonkar
2018-06-07 17:28 ` [v3 PATCH 2/5] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler Mahesh J Salgaonkar
2018-06-08  1:33   ` Nicholas Piggin
2018-06-08  6:50   ` Michael Ellerman
2018-06-08 10:31     ` Mahesh Jagannath Salgaonkar
2018-06-07 17:28 ` [v3 PATCH 3/5] powerpc/pseries: Define MCE error event section Mahesh J Salgaonkar
2018-06-07 17:28 ` [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors Mahesh J Salgaonkar
2018-06-08  1:48   ` Nicholas Piggin
2018-06-08  6:19     ` Mahesh Jagannath Salgaonkar
2018-06-12 13:47   ` Michael Ellerman
2018-06-13  2:38     ` Aneesh Kumar K.V
2018-06-13  4:06       ` Michael Ellerman
2018-06-13  4:06         ` Aneesh Kumar K.V
2018-06-13  3:45     ` Mahesh Jagannath Salgaonkar
2018-06-07 17:29 ` [v3 PATCH 5/5] powerpc/pseries: Display machine check error details Mahesh J Salgaonkar
2018-06-08  1:51   ` Nicholas Piggin
2018-06-08  6:28     ` Mahesh Jagannath Salgaonkar
2018-07-02 18:01     ` Michal Suchánek
