linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC PATCH 0/9] Machine check handling in linux host.
@ 2013-08-07  9:37 Mahesh J Salgaonkar
  2013-08-07  9:38 ` [RFC PATCH 1/9] powerpc: Split the common exception prolog logic into two sections Mahesh J Salgaonkar
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:37 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

Hi,

Please find the patch set that performs machine check handling inside the
Linux host. The design is to handle re-entrancy, so that we do not clobber
the machine check information during a nested machine check interrupt.

Patch 2 implements the logic to save the raw MCE info onto the emergency
stack and to prepare to take another exception. Patches 3 and 4 add
CPU-specific hooks for the early machine check handler and the TLB flush.
Patches 5 and 6 detect SLB/TLB errors and flush them in real mode. Patch 7
implements the logic to decode and save high-level MCE information to a
per-CPU buffer without clobbering it. Patch 8 removes the machine check
handling from OPAL, and patch 9 adds basic error handling to the high-level
C code with the MMU on.

I have tested the SLB multihit scenario on powernv. For TLB multihit and
nested machine check testing, I am still working on getting a cronus setup
that can inject errors.

Please review and let me know your comments.

Thanks,
-Mahesh.

---

Mahesh Salgaonkar (9):
      powerpc: Split the common exception prolog logic into two sections.
      powerpc: handle machine check in Linux host.
      powerpc: Introduce an early machine check hook in cpu_spec.
      powerpc: Add flush_tlb operation in cpu_spec.
      powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7.
      powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power8.
      powerpc: Decode and save machine check event.
      powerpc/powernv: Remove machine check handling in OPAL.
      powerpc/powernv: Machine check exception handling.


 arch/powerpc/include/asm/bitops.h        |    5 +
 arch/powerpc/include/asm/cputable.h      |   12 +
 arch/powerpc/include/asm/exception-64s.h |  110 ++++++++---
 arch/powerpc/include/asm/mce.h           |  190 ++++++++++++++++++++
 arch/powerpc/kernel/Makefile             |    3 
 arch/powerpc/kernel/cpu_setup_power.S    |   39 +++-
 arch/powerpc/kernel/cputable.c           |   16 ++
 arch/powerpc/kernel/exceptions-64s.S     |   50 +++++
 arch/powerpc/kernel/mce.c                |  179 +++++++++++++++++++
 arch/powerpc/kernel/mce_power.c          |  287 ++++++++++++++++++++++++++++++
 arch/powerpc/kernel/traps.c              |   15 ++
 arch/powerpc/kvm/book3s_hv_ras.c         |   42 +---
 arch/powerpc/platforms/powernv/opal.c    |   84 ++++++---
 13 files changed, 935 insertions(+), 97 deletions(-)
 create mode 100644 arch/powerpc/include/asm/mce.h
 create mode 100644 arch/powerpc/kernel/mce.c
 create mode 100644 arch/powerpc/kernel/mce_power.c

-- 
-Mahesh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH 1/9] powerpc: Split the common exception prolog logic into two sections.
  2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
@ 2013-08-07  9:38 ` Mahesh J Salgaonkar
  2013-08-08  4:10   ` Anshuman Khandual
  2013-08-07  9:38 ` [RFC PATCH 2/9] powerpc: handle machine check in Linux host Mahesh J Salgaonkar
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:38 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch splits the common exception prolog logic into two parts so that
the second part can be reused by the machine check exception routine
introduced in the next patch.

Note that this patch does not introduce or change any logic; it is purely
code movement.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/exception-64s.h |   67 ++++++++++++++++--------------
 1 file changed, 35 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 07ca627..2386d40 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -248,6 +248,40 @@ do_kvm_##n:								\
 
 #define NOTEST(n)
 
+#define EXCEPTION_PROLOG_COMMON_2(n, area)				   \
+	std	r2,GPR2(r1);		/* save r2 in stackframe	*/ \
+	SAVE_4GPRS(3, r1);		/* save r3 - r6 in stackframe	*/ \
+	SAVE_2GPRS(7, r1);		/* save r7, r8 in stackframe	*/ \
+	ld	r9,area+EX_R9(r13);	/* move r9, r10 to stackframe	*/ \
+	ld	r10,area+EX_R10(r13);					   \
+	std	r9,GPR9(r1);						   \
+	std	r10,GPR10(r1);						   \
+	ld	r9,area+EX_R11(r13);	/* move r11 - r13 to stackframe	*/ \
+	ld	r10,area+EX_R12(r13);					   \
+	ld	r11,area+EX_R13(r13);					   \
+	std	r9,GPR11(r1);						   \
+	std	r10,GPR12(r1);						   \
+	std	r11,GPR13(r1);						   \
+	BEGIN_FTR_SECTION_NESTED(66);					   \
+	ld	r10,area+EX_CFAR(r13);					   \
+	std	r10,ORIG_GPR3(r1);					   \
+	END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66);		   \
+	GET_LR(r9,area);		/* Get LR, later save to stack	*/ \
+	ld	r2,PACATOC(r13);	/* get kernel TOC into r2	*/ \
+	std	r9,_LINK(r1);						   \
+	mfctr	r10;			/* save CTR in stackframe	*/ \
+	std	r10,_CTR(r1);						   \
+	lbz	r10,PACASOFTIRQEN(r13);				   \
+	mfspr	r11,SPRN_XER;		/* save XER in stackframe	*/ \
+	std	r10,SOFTE(r1);						   \
+	std	r11,_XER(r1);						   \
+	li	r9,(n)+1;						   \
+	std	r9,_TRAP(r1);		/* set trap number		*/ \
+	li	r10,0;							   \
+	ld	r11,exception_marker@toc(r2);				   \
+	std	r10,RESULT(r1);		/* clear regs->result		*/ \
+	std	r11,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame	*/
+
 /*
  * The common exception prolog is used for all except a few exceptions
  * such as a segment miss on a kernel address.  We have to be prepared
@@ -281,38 +315,7 @@ do_kvm_##n:								\
 	beq	4f;			/* if from kernel mode		*/ \
 	ACCOUNT_CPU_USER_ENTRY(r9, r10);				   \
 	SAVE_PPR(area, r9, r10);					   \
-4:	std	r2,GPR2(r1);		/* save r2 in stackframe	*/ \
-	SAVE_4GPRS(3, r1);		/* save r3 - r6 in stackframe	*/ \
-	SAVE_2GPRS(7, r1);		/* save r7, r8 in stackframe	*/ \
-	ld	r9,area+EX_R9(r13);	/* move r9, r10 to stackframe	*/ \
-	ld	r10,area+EX_R10(r13);					   \
-	std	r9,GPR9(r1);						   \
-	std	r10,GPR10(r1);						   \
-	ld	r9,area+EX_R11(r13);	/* move r11 - r13 to stackframe	*/ \
-	ld	r10,area+EX_R12(r13);					   \
-	ld	r11,area+EX_R13(r13);					   \
-	std	r9,GPR11(r1);						   \
-	std	r10,GPR12(r1);						   \
-	std	r11,GPR13(r1);						   \
-	BEGIN_FTR_SECTION_NESTED(66);					   \
-	ld	r10,area+EX_CFAR(r13);					   \
-	std	r10,ORIG_GPR3(r1);					   \
-	END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66);		   \
-	GET_LR(r9,area);		/* Get LR, later save to stack	*/ \
-	ld	r2,PACATOC(r13);	/* get kernel TOC into r2	*/ \
-	std	r9,_LINK(r1);						   \
-	mfctr	r10;			/* save CTR in stackframe	*/ \
-	std	r10,_CTR(r1);						   \
-	lbz	r10,PACASOFTIRQEN(r13);				   \
-	mfspr	r11,SPRN_XER;		/* save XER in stackframe	*/ \
-	std	r10,SOFTE(r1);						   \
-	std	r11,_XER(r1);						   \
-	li	r9,(n)+1;						   \
-	std	r9,_TRAP(r1);		/* set trap number		*/ \
-	li	r10,0;							   \
-	ld	r11,exception_marker@toc(r2);				   \
-	std	r10,RESULT(r1);		/* clear regs->result		*/ \
-	std	r11,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame	*/ \
+4:	EXCEPTION_PROLOG_COMMON_2(n, area)				   \
 	ACCOUNT_STOLEN_TIME
 
 /*


* [RFC PATCH 2/9] powerpc: handle machine check in Linux host.
  2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
  2013-08-07  9:38 ` [RFC PATCH 1/9] powerpc: Split the common exception prolog logic into two sections Mahesh J Salgaonkar
@ 2013-08-07  9:38 ` Mahesh J Salgaonkar
  2013-08-08  4:51   ` Paul Mackerras
  2013-08-08  5:01   ` Anshuman Khandual
  2013-08-07  9:38 ` [RFC PATCH 3/9] powerpc: Introduce an early machine check hook in cpu_spec Mahesh J Salgaonkar
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:38 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Move the machine check entry point into Linux. So far we have depended on
firmware to decode the MCE error details and hand the high-level info over
to the OS.

This patch introduces an early machine check routine that saves the MCE
information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
a stack frame on the emergency stack and set r1 accordingly, which prepares
us to take another exception without losing context. Note that if we get
another machine check while the ME bit is off, we risk a checkstop; hence we
restrict ourselves to saving only the MCE information, and then turn the ME
bit on.

This is the code flow:

		Machine Check Interrupt
			|
			V
		   0x200 vector				  ME=0, IR=0, DR=0
			|
			V
	+-----------------------------------------------+
	|machine_check_pSeries_early:			| ME=0, IR=0, DR=0
	|	Alloc frame on emergency stack		|
	|	Save srr1, srr0, dar and dsisr on stack |
	+-----------------------------------------------+
			|
		(ME=1, IR=0, DR=0, RFID)
			|
			V
		machine_check_handle_early		  ME=1, IR=0, DR=0
			|
			V
	+-----------------------------------------------+
	|	machine_check_early (r3=pt_regs)	| ME=1, IR=0, DR=0
	|	Things to do: (in next patches)		|
	|		Flush SLB for SLB errors	|
	|		Flush TLB for TLB errors	|
	|		Decode and save MCE info	|
	+-----------------------------------------------+
			|
	(Fall through existing exception handler routine.)
			|
			V
		machine_check_pSeries			  ME=1, IR=0, DR=0
			|
		(ME=1, IR=1, DR=1, RFID)
			|
			V
		machine_check_common			  ME=1, IR=1, DR=1
			.
			.
			.
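The emergency-stack switch at the top of this flow can be modelled in C for clarity. This is only an illustrative sketch of the re-entrancy check; the THREAD_SIZE and INT_FRAME_SIZE values below are made-up example constants, not the real asm-offsets values:

```c
#include <assert.h>
#include <stdint.h>

/* Made-up example constants, not the real asm-offsets values. */
#define THREAD_SIZE	0x4000UL
#define INT_FRAME_SIZE	0x200UL

/*
 * Round both the emergency stack pointer and the current r1 down to
 * their stack base. If the bases match, we are already on the emergency
 * stack (a nested machine check), so just push another frame below r1;
 * otherwise switch to the emergency stack first.
 */
static uint64_t alloc_mce_frame(uint64_t emerg_sp, uint64_t r1)
{
	uint64_t emerg_base = (emerg_sp - THREAD_SIZE) & ~(THREAD_SIZE - 1);
	uint64_t cur_base = r1 & ~(THREAD_SIZE - 1);
	uint64_t sp = (cur_base == emerg_base) ? r1 : emerg_sp;

	return sp - INT_FRAME_SIZE;	/* alloc stack frame */
}
```

With an emergency stack topping out at 0x24000, a machine check arriving on a normal stack switches over, while a nested one keeps stacking frames on the emergency stack.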


Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/exception-64s.h |   43 ++++++++++++++++++++++++++
 arch/powerpc/kernel/exceptions-64s.S     |   50 +++++++++++++++++++++++++++++-
 arch/powerpc/kernel/traps.c              |   12 +++++++
 3 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 2386d40..c5d2cbc 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -174,6 +174,49 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define EXCEPTION_PROLOG_1(area, extra, vec)				\
 	__EXCEPTION_PROLOG_1(area, extra, vec)
 
+/*
+ * Register contents:
+ * R12		= interrupt vector
+ * R13		= PACA
+ * R9		= CR
+ * R11 and R12 are saved in PACA_EXMC
+ *
+ * Switch to the emergency stack and handle re-entrancy (though we
+ * currently don't test for overflow). Save the MCE registers srr1,
+ * srr0, dar and dsisr, then turn the ME bit on.
+ */
+#define __EARLY_MACHINE_CHECK_HANDLER(area, label)			\
+	/* Check if we are laready using emergency stack. */		\
+	ld	r10,PACAEMERGSP(r13);					\
+	subi	r10,r10,THREAD_SIZE;					\
+	rldicr	r10,r10,0,(63 - THREAD_SHIFT);				\
+	rldicr	r11,r1,0,(63 - THREAD_SHIFT);				\
+	cmpd	r10,r11;	/* Are we using emergency stack? */	\
+	mr	r11,r1;			/* Save current stack pointer */\
+	beq	0f;							\
+	ld	r1,PACAEMERGSP(r13);	/* Use emergency stack */	\
+0:	subi	r1,r1,INT_FRAME_SIZE;	/* alloc stack frame */		\
+	std	r11,GPR1(r1);						\
+	std	r11,0(r1);		/* make stack chain pointer */	\
+	mfspr	r11,SPRN_SRR0;		/* Save SRR0 */			\
+	std	r11,_NIP(r1);						\
+	mfspr	r11,SPRN_SRR1;		/* Save SRR1 */			\
+	std	r11,_MSR(r1);						\
+	mfspr	r11,SPRN_DAR;		/* Save DAR */			\
+	std 	r11,_DAR(r1);						\
+	mfspr	r11,SPRN_DSISR;		/* Save DSISR */		\
+	std	r11,_DSISR(r1);						\
+	mfmsr	r11;			/* get MSR value */		\
+	ori	r11,r11,MSR_ME;		/* turn on ME bit */		\
+	ld	r12,PACAKBASE(r13);	/* get high part of &label */	\
+	LOAD_HANDLER(r12,label)						\
+	mtspr	SPRN_SRR0,r12;						\
+	mtspr	SPRN_SRR1,r11;						\
+	rfid;								\
+	b	.	/* prevent speculative execution */
+#define EARLY_MACHINE_CHECK_HANDLER(area, label)			\
+	__EARLY_MACHINE_CHECK_HANDLER(area, label)
+
 #define __EXCEPTION_PROLOG_PSERIES_1(label, h)				\
 	ld	r12,PACAKBASE(r13);	/* get high part of &label */	\
 	ld	r10,PACAKMSR(r13);	/* get MSR value for kernel */	\
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4e00d22..66d2a71 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -156,7 +156,7 @@ machine_check_pSeries_1:
 	HMT_MEDIUM_PPR_DISCARD
 	SET_SCRATCH0(r13)		/* save r13 */
 	EXCEPTION_PROLOG_0(PACA_EXMC)
-	b	machine_check_pSeries_0
+	b	machine_check_pSeries_early
 
 	. = 0x300
 	.globl data_access_pSeries
@@ -404,6 +404,9 @@ denorm_exception_hv:
 
 	.align	7
 	/* moved from 0x200 */
+machine_check_pSeries_early:
+	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
+	EARLY_MACHINE_CHECK_HANDLER(PACA_EXMC, machine_check_handle_early)
 machine_check_pSeries:
 	.globl machine_check_fwnmi
 machine_check_fwnmi:
@@ -681,6 +684,51 @@ machine_check_common:
 	bl	.machine_check_exception
 	b	.ret_from_except
 
+	/*
+	 * Handle machine check early in real mode. We come here with
+	 * ME bit on with MMU (IR and DR) off and using emergency stack.
+	 */
+	.align	7
+	.globl machine_check_handle_early
+machine_check_handle_early:
+	std	r9,_CCR(r1)	/* Save CR in stackframe */
+	std	r0,GPR0(r1)	/* Save r0, r2 - r10 */
+	EXCEPTION_PROLOG_COMMON_2(0x200, PACA_EXMC)
+	bl	.save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	.machine_check_early
+	cmpdi	r3,0			/* Did we handle it? */
+	ld	r9,_MSR(r1)
+	beq	1f
+	ori	r9,r9,MSR_RI		/* Yes we have handled it. */
+	/* Move original SRR0 and SRR1 into the respective regs */
+1:	mtspr	SPRN_SRR1,r9
+	ld	r3,_NIP(r1)
+	mtspr	SPRN_SRR0,r3
+	REST_NVGPRS(r1)
+	ld	r9,_CTR(r1)
+	mtctr	r9
+	ld	r9,_XER(r1)
+	mtxer	r9
+	ld	r9,_CCR(r1)
+	mtcr	r9
+BEGIN_FTR_SECTION
+	ld	r9,ORIG_GPR3(r1)
+	mtspr	SPRN_CFAR,r9
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
+	ld	r9,_LINK(r1)
+	mtlr	r9
+	REST_GPR(0, r1)
+	REST_8GPRS(2, r1)
+	REST_2GPRS(10, r1)
+	REST_GPR(12, r1)
+	/*
+	 * restore orig stack pointer. We have already saved MCE info
+	 * to per cpu MCE event buffer.
+	 */
+	ld	r1,GPR1(r1)
+	b	machine_check_pSeries
+
 	STD_EXCEPTION_COMMON_ASYNC(0x500, hardware_interrupt, do_IRQ)
 	STD_EXCEPTION_COMMON_ASYNC(0x900, decrementer, .timer_interrupt)
 	STD_EXCEPTION_COMMON(0x980, hdecrementer, .hdec_interrupt)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index bf33c22..1720e08 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -286,6 +286,18 @@ void system_reset_exception(struct pt_regs *regs)
 
 	/* What should we do here? We could issue a shutdown or hard reset. */
 }
+
+/*
+ * This function is called in real mode. Strictly no printk's please.
+ *
+ * regs->nip and regs->msr contain srr0 and srr1.
+ */
+long machine_check_early(struct pt_regs *regs)
+{
+	/* TODO: handle/decode machine check reason */
+	return 0;
+}
+
 #endif
 
 /*


* [RFC PATCH 3/9] powerpc: Introduce an early machine check hook in cpu_spec.
  2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
  2013-08-07  9:38 ` [RFC PATCH 1/9] powerpc: Split the common exception prolog logic into two sections Mahesh J Salgaonkar
  2013-08-07  9:38 ` [RFC PATCH 2/9] powerpc: handle machine check in Linux host Mahesh J Salgaonkar
@ 2013-08-07  9:38 ` Mahesh J Salgaonkar
  2013-08-07  9:38 ` [RFC PATCH 4/9] powerpc: Add flush_tlb operation " Mahesh J Salgaonkar
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:38 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch adds an early machine check function pointer in cputable for
CPU-specific early machine check handling. The early machine check handler
will be called in real mode to handle SLB and TLB errors. This patch just
sets up the mechanism to invoke a CPU-specific handler; the subsequent
patches will populate the function pointer.
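The dispatch being set up here is the usual cpu_spec function-pointer pattern: call the hook if it is populated, otherwise report "not handled". A minimal self-contained C model (the handler body and its return value are stand-ins, not real MCE decode logic):

```c
#include <assert.h>
#include <stddef.h>

struct pt_regs;			/* opaque for this sketch */

struct cpu_spec {
	long (*machine_check_early)(struct pt_regs *regs);
};

/* Stand-in CPU-specific handler; pretends the error was handled. */
static long p7_early_handler(struct pt_regs *regs)
{
	(void)regs;
	return 1;
}

static struct cpu_spec power7_spec = {
	.machine_check_early	= p7_early_handler,
};
static struct cpu_spec legacy_spec = {
	.machine_check_early	= NULL,	/* CPU without an early hook */
};
static struct cpu_spec *cur_cpu_spec = &power7_spec;

/* Same shape as machine_check_early() in traps.c: fall back to
 * "not handled" (0) when no CPU-specific hook is registered. */
static long machine_check_early(struct pt_regs *regs)
{
	long handled = 0;

	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
		handled = cur_cpu_spec->machine_check_early(regs);
	return handled;
}
```

The NULL checks keep the generic path safe on CPU families that never populate the hook.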

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/cputable.h |    7 +++++++
 arch/powerpc/kernel/traps.c         |    7 +++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 6f3887d..d8c098e 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -90,6 +90,13 @@ struct cpu_spec {
 	 * if the error is fatal, 1 if it was fully recovered and 0 to
 	 * pass up (not CPU originated) */
 	int		(*machine_check)(struct pt_regs *regs);
+
+	/*
+	 * Processor specific early machine check handler which is
+	 * called in real mode to handle SLB and TLB errors.
+	 */
+	long		(*machine_check_early)(struct pt_regs *regs);
+
 };
 
 extern struct cpu_spec		*cur_cpu_spec;
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1720e08..07331b7 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -294,8 +294,11 @@ void system_reset_exception(struct pt_regs *regs)
  */
 long machine_check_early(struct pt_regs *regs)
 {
-	/* TODO: handle/decode machine check reason */
-	return 0;
+	long handled = 0;
+
+	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
+		handled = cur_cpu_spec->machine_check_early(regs);
+	return handled;
 }
 
 #endif


* [RFC PATCH 4/9] powerpc: Add flush_tlb operation in cpu_spec.
  2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
                   ` (2 preceding siblings ...)
  2013-08-07  9:38 ` [RFC PATCH 3/9] powerpc: Introduce an early machine check hook in cpu_spec Mahesh J Salgaonkar
@ 2013-08-07  9:38 ` Mahesh J Salgaonkar
  2013-08-07  9:38 ` [RFC PATCH 5/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7 Mahesh J Salgaonkar
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:38 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch introduces a flush_tlb operation in the cpu_spec structure. This
will help us invoke the appropriate CPU-specific TLB flush routine. This
patch adds the foundation for invoking the CPU-specific flush routine;
currently it introduces flush_tlb for POWER7 and POWER8.
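The flush routines themselves loop tlbiel over every TLB congruence class, stepping the set index by 0x1000 per iteration (the `addi r7,r7,0x1000` in the assembly), starting from the caller-supplied IS selector. A C sketch that records the tlbiel operands instead of executing them — set counts of 128 (P7) and 512 (P8) as in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLBIEL_SET_SHIFT 12	/* matches the addi of 0x1000 per set */

/*
 * Record the rb operand that each tlbiel iteration would use, rather
 * than issuing the instruction. Returns the number of operands written.
 */
static size_t flush_tlb_operands(uint64_t inval_selector, unsigned int sets,
				 uint64_t *out, size_t max)
{
	uint64_t rb = inval_selector;
	size_t n = 0;
	unsigned int i;

	for (i = 0; i < sets && n < max; i++) {
		out[n++] = rb;		/* would be: tlbiel rb */
		rb += 1ULL << TLBIEL_SET_SHIFT;
	}
	return n;
}
```

Passing 0xc00 (IS field = 0b11) reproduces the full-TLB invalidation sequence used at CPU setup; the KVM path instead passes TLBIEL_INVAL_SET_LPID.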

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/cputable.h   |    5 ++++
 arch/powerpc/kernel/cpu_setup_power.S |   39 ++++++++++++++++++++++++---------
 arch/powerpc/kernel/cputable.c        |    8 +++++++
 arch/powerpc/kvm/book3s_hv_ras.c      |   18 +++------------
 4 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index d8c098e..d76e47b 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -97,6 +97,11 @@ struct cpu_spec {
 	 */
 	long		(*machine_check_early)(struct pt_regs *regs);
 
+	/*
+	 * Processor specific routine to flush tlbs.
+	 */
+	void		(*flush_tlb)(unsigned long inval_selector);
+
 };
 
 extern struct cpu_spec		*cur_cpu_spec;
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index 18b5b9c..17f1fd4 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -15,6 +15,7 @@
 #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
 #include <asm/cache.h>
+#include <asm/mmu-hash64.h>
 
 /* Entry: r3 = crap, r4 = ptr to cputable entry
  *
@@ -29,7 +30,7 @@ _GLOBAL(__setup_cpu_power7)
 	mtspr	SPRN_LPID,r0
 	mfspr	r3,SPRN_LPCR
 	bl	__init_LPCR
-	bl	__init_TLB
+	bl	__init_tlb_power7
 	mtlr	r11
 	blr
 
@@ -42,7 +43,7 @@ _GLOBAL(__restore_cpu_power7)
 	mtspr	SPRN_LPID,r0
 	mfspr	r3,SPRN_LPCR
 	bl	__init_LPCR
-	bl	__init_TLB
+	bl	__init_tlb_power7
 	mtlr	r11
 	blr
 
@@ -59,7 +60,7 @@ _GLOBAL(__setup_cpu_power8)
 	oris	r3, r3, LPCR_AIL_3@h
 	bl	__init_LPCR
 	bl	__init_HFSCR
-	bl	__init_TLB
+	bl	__init_tlb_power8
 	bl	__init_PMU_HV
 	mtlr	r11
 	blr
@@ -78,7 +79,7 @@ _GLOBAL(__restore_cpu_power8)
 	oris	r3, r3, LPCR_AIL_3@h
 	bl	__init_LPCR
 	bl	__init_HFSCR
-	bl	__init_TLB
+	bl	__init_tlb_power8
 	bl	__init_PMU_HV
 	mtlr	r11
 	blr
@@ -134,15 +135,31 @@ __init_HFSCR:
 	mtspr	SPRN_HFSCR,r3
 	blr
 
-__init_TLB:
-	/*
-	 * Clear the TLB using the "IS 3" form of tlbiel instruction
-	 * (invalidate by congruence class). P7 has 128 CCs, P8 has 512
-	 * so we just always do 512
-	 */
+/*
+ * Clear the TLB using the specified IS form of tlbiel instruction
+ * (invalidate by congruence class). P7 has 128 CCs, P8 has 512.
+ *
+ * r3 = IS field
+ */
+__init_tlb_power7:
+	li	r3,0xc00	/* IS field = 0b11 */
+_GLOBAL(__flush_tlb_power7)
+	li	r6,128
+	mtctr	r6
+	mr	r7,r3		/* IS field */
+	ptesync
+2:	tlbiel	r7
+	addi	r7,r7,0x1000
+	bdnz	2b
+	ptesync
+1:	blr
+
+__init_tlb_power8:
+	li	r3,0xc00	/* IS field = 0b11 */
+_GLOBAL(__flush_tlb_power8)
 	li	r6,512
 	mtctr	r6
-	li	r7,0xc00	/* IS field = 0b11 */
+	mr	r7,r3		/* IS field */
 	ptesync
 2:	tlbiel	r7
 	addi	r7,r7,0x1000
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 22973a7..cdbe115 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -71,6 +71,8 @@ extern void __restore_cpu_power7(void);
 extern void __setup_cpu_power8(unsigned long offset, struct cpu_spec* spec);
 extern void __restore_cpu_power8(void);
 extern void __restore_cpu_a2(void);
+extern void __flush_tlb_power7(unsigned long inval_selector);
+extern void __flush_tlb_power8(unsigned long inval_selector);
 #endif /* CONFIG_PPC64 */
 #if defined(CONFIG_E500)
 extern void __setup_cpu_e5500(unsigned long offset, struct cpu_spec* spec);
@@ -440,6 +442,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.oprofile_cpu_type	= "ppc64/ibm-compat-v1",
 		.cpu_setup		= __setup_cpu_power7,
 		.cpu_restore		= __restore_cpu_power7,
+		.flush_tlb		= __flush_tlb_power7,
 		.platform		= "power7",
 	},
 	{	/* 2.07-compliant processor, i.e. Power8 "architected" mode */
@@ -456,6 +459,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.oprofile_cpu_type	= "ppc64/ibm-compat-v1",
 		.cpu_setup		= __setup_cpu_power8,
 		.cpu_restore		= __restore_cpu_power8,
+		.flush_tlb		= __flush_tlb_power8,
 		.platform		= "power8",
 	},
 	{	/* Power7 */
@@ -474,6 +478,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.oprofile_type		= PPC_OPROFILE_POWER4,
 		.cpu_setup		= __setup_cpu_power7,
 		.cpu_restore		= __restore_cpu_power7,
+		.flush_tlb		= __flush_tlb_power7,
 		.platform		= "power7",
 	},
 	{	/* Power7+ */
@@ -492,6 +497,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.oprofile_type		= PPC_OPROFILE_POWER4,
 		.cpu_setup		= __setup_cpu_power7,
 		.cpu_restore		= __restore_cpu_power7,
+		.flush_tlb		= __flush_tlb_power7,
 		.platform		= "power7+",
 	},
 	{	/* Power8E */
@@ -510,6 +516,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.oprofile_type		= PPC_OPROFILE_INVALID,
 		.cpu_setup		= __setup_cpu_power8,
 		.cpu_restore		= __restore_cpu_power8,
+		.flush_tlb		= __flush_tlb_power8,
 		.platform		= "power8",
 	},
 	{	/* Power8 */
@@ -528,6 +535,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.oprofile_type		= PPC_OPROFILE_INVALID,
 		.cpu_setup		= __setup_cpu_power8,
 		.cpu_restore		= __restore_cpu_power8,
+		.flush_tlb		= __flush_tlb_power8,
 		.platform		= "power8",
 	},
 	{	/* Cell Broadband Engine */
diff --git a/arch/powerpc/kvm/book3s_hv_ras.c b/arch/powerpc/kvm/book3s_hv_ras.c
index a353c48..5c427b4 100644
--- a/arch/powerpc/kvm/book3s_hv_ras.c
+++ b/arch/powerpc/kvm/book3s_hv_ras.c
@@ -58,18 +58,6 @@ static void reload_slb(struct kvm_vcpu *vcpu)
 	}
 }
 
-/* POWER7 TLB flush */
-static void flush_tlb_power7(struct kvm_vcpu *vcpu)
-{
-	unsigned long i, rb;
-
-	rb = TLBIEL_INVAL_SET_LPID;
-	for (i = 0; i < POWER7_TLB_SETS; ++i) {
-		asm volatile("tlbiel %0" : : "r" (rb));
-		rb += 1 << TLBIEL_INVAL_SET_SHIFT;
-	}
-}
-
 /*
  * On POWER7, see if we can handle a machine check that occurred inside
  * the guest in real mode, without switching to the host partition.
@@ -96,7 +84,8 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
 				   DSISR_MC_SLB_PARITY | DSISR_MC_DERAT_MULTI);
 		}
 		if (dsisr & DSISR_MC_TLB_MULTI) {
-			flush_tlb_power7(vcpu);
+			if (cur_cpu_spec && cur_cpu_spec->flush_tlb)
+				cur_cpu_spec->flush_tlb(TLBIEL_INVAL_SET_LPID);
 			dsisr &= ~DSISR_MC_TLB_MULTI;
 		}
 		/* Any other errors we don't understand? */
@@ -113,7 +102,8 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
 		reload_slb(vcpu);
 		break;
 	case SRR1_MC_IFETCH_TLBMULTI:
-		flush_tlb_power7(vcpu);
+		if (cur_cpu_spec && cur_cpu_spec->flush_tlb)
+			cur_cpu_spec->flush_tlb(TLBIEL_INVAL_SET_LPID);
 		break;
 	default:
 		handled = 0;


* [RFC PATCH 5/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7.
  2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
                   ` (3 preceding siblings ...)
  2013-08-07  9:38 ` [RFC PATCH 4/9] powerpc: Add flush_tlb operation " Mahesh J Salgaonkar
@ 2013-08-07  9:38 ` Mahesh J Salgaonkar
  2013-08-08  4:58   ` Paul Mackerras
  2013-08-07  9:39 ` [RFC PATCH 6/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power8 Mahesh J Salgaonkar
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:38 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

If we get a machine check exception due to SLB or TLB errors, flush the
SLBs/TLBs and reload the SLBs to recover. We do this in real mode, before
turning on the MMU; otherwise we would run into nested machine checks.

If we get a machine check while we are in a guest, just flush the SLBs and
continue. This patch handles the errors for POWER7; the next patch will
handle them for POWER8.
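The error decode leans on the new PPC_BIT helpers, which express SRR1/DSISR bits in IBM numbering (bit 0 is the most significant bit of the 64-bit register). A small C check of how those macros compose — the macro bodies are copied from the patch, while the sample srr1 value is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* PPC bit number conversion, as added to asm/bitops.h by this patch. */
#define PPC_BIT(bit)		(0x8000000000000000UL >> (bit))
#define PPC_BITLSHIFT(be)	(63 - (be))
#define PPC_BITMASK(bs, be)	((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* SRR1 machine check ifetch field, bits 43:45, as in asm/mce.h. */
#define P7_SRR1_MC_IFETCH(srr1)		((srr1) & PPC_BITMASK(43, 45))
#define P7_SRR1_MC_IFETCH_SLB_PARITY	(0x2UL << PPC_BITLSHIFT(45))
#define P7_SRR1_MC_IFETCH_SLB_MULTIHIT	(0x3UL << PPC_BITLSHIFT(45))
```

Masking with PPC_BITMASK(43, 45) isolates the three-bit ifetch error code so it can be compared directly against the encoded values.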

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/bitops.h |    5 +
 arch/powerpc/include/asm/mce.h    |   67 ++++++++++++++++
 arch/powerpc/kernel/Makefile      |    1 
 arch/powerpc/kernel/cputable.c    |    4 +
 arch/powerpc/kernel/mce_power.c   |  153 +++++++++++++++++++++++++++++++++++++
 5 files changed, 230 insertions(+)
 create mode 100644 arch/powerpc/include/asm/mce.h
 create mode 100644 arch/powerpc/kernel/mce_power.c

diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
index 910194e..a57f4bc 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -46,6 +46,11 @@
 #include <asm/asm-compat.h>
 #include <asm/synch.h>
 
+/* PPC bit number conversion */
+#define PPC_BIT(bit)		(0x8000000000000000UL >> (bit))
+#define PPC_BITLSHIFT(be)	(63 - (be))
+#define PPC_BITMASK(bs, be)	((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
+
 /*
  * clear_bit doesn't imply a memory barrier
  */
diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
new file mode 100644
index 0000000..ba19073
--- /dev/null
+++ b/arch/powerpc/include/asm/mce.h
@@ -0,0 +1,67 @@
+/*
+ * Machine check exception header file.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2013 IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+ */
+
+#ifndef __ASM_PPC64_MCE_H__
+#define __ASM_PPC64_MCE_H__
+
+#include <linux/bitops.h>
+
+/*
+ * Machine Check bits on power7 and power8
+ */
+#define P7_SRR1_MC_LOADSTORE(srr1)	((srr1) & PPC_BIT(42)) /* P8 too */
+
+/* SRR1 bits for machine check (On Power7 and Power8) */
+#define P7_SRR1_MC_IFETCH(srr1)	((srr1) & PPC_BITMASK(43, 45)) /* P8 too */
+
+#define P7_SRR1_MC_IFETCH_UE		(0x1 << PPC_BITLSHIFT(45)) /* P8 too */
+#define P7_SRR1_MC_IFETCH_SLB_PARITY	(0x2 << PPC_BITLSHIFT(45)) /* P8 too */
+#define P7_SRR1_MC_IFETCH_SLB_MULTIHIT	(0x3 << PPC_BITLSHIFT(45)) /* P8 too */
+#define P7_SRR1_MC_IFETCH_SLB_BOTH	(0x4 << PPC_BITLSHIFT(45)) /* P8 too */
+#define P7_SRR1_MC_IFETCH_TLB_MULTIHIT	(0x5 << PPC_BITLSHIFT(45)) /* P8 too */
+#define P7_SRR1_MC_IFETCH_UE_TLB_RELOAD	(0x6 << PPC_BITLSHIFT(45)) /* P8 too */
+#define P7_SRR1_MC_IFETCH_UE_IFU_INTERNAL	(0x7 << PPC_BITLSHIFT(45))
+
+/* SRR1 bits for machine check (On Power8) */
+#define P8_SRR1_MC_IFETCH_ERAT_MULTIHIT	(0x4 << PPC_BITLSHIFT(45))
+
+/* DSISR bits for machine check (On Power7 and Power8) */
+#define P7_DSISR_MC_UE			(PPC_BIT(48))	/* P8 too */
+#define P7_DSISR_MC_UE_TABLEWALK	(PPC_BIT(49))	/* P8 too */
+#define P7_DSISR_MC_ERAT_MULTIHIT	(PPC_BIT(52))	/* P8 too */
+#define P7_DSISR_MC_TLB_MULTIHIT_MFTLB	(PPC_BIT(53))	/* P8 too */
+#define P7_DSISR_MC_SLB_PARITY_MFSLB	(PPC_BIT(55))	/* P8 too */
+#define P7_DSISR_MC_SLB_MULTIHIT	(PPC_BIT(56))	/* P8 too */
+#define P7_DSISR_MC_SLB_MULTIHIT_PARITY	(PPC_BIT(57))	/* P8 too */
+
+/*
+ * DSISR bits for machine check (Power8) in addition to above.
+ * Secondary DERAT Multihit
+ */
+#define P8_DSISR_MC_ERAT_MULTIHIT_SEC	(PPC_BIT(54))
+
+/* SLB error bits */
+#define P7_DSISR_MC_SLB_ERRORS		(P7_DSISR_MC_ERAT_MULTIHIT | \
+					 P7_DSISR_MC_SLB_PARITY_MFSLB | \
+					 P7_DSISR_MC_SLB_MULTIHIT | \
+					 P7_DSISR_MC_SLB_MULTIHIT_PARITY)
+
+#endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index a8619bf..a1aba53 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
 obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_power.o
+obj-$(CONFIG_PPC_BOOK3S_64)	+= mce_power.o
 obj64-$(CONFIG_RELOCATABLE)	+= reloc_64.o
 obj-$(CONFIG_PPC_BOOK3E_64)	+= exceptions-64e.o idle_book3e.o
 obj-$(CONFIG_PPC_A2)		+= cpu_setup_a2.o
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index cdbe115..c28cc2c 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -73,6 +73,7 @@ extern void __restore_cpu_power8(void);
 extern void __restore_cpu_a2(void);
 extern void __flush_tlb_power7(unsigned long inval_selector);
 extern void __flush_tlb_power8(unsigned long inval_selector);
+extern long __machine_check_early_realmode_p7(struct pt_regs *regs);
 #endif /* CONFIG_PPC64 */
 #if defined(CONFIG_E500)
 extern void __setup_cpu_e5500(unsigned long offset, struct cpu_spec* spec);
@@ -443,6 +444,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_setup		= __setup_cpu_power7,
 		.cpu_restore		= __restore_cpu_power7,
 		.flush_tlb		= __flush_tlb_power7,
+		.machine_check_early	= __machine_check_early_realmode_p7,
 		.platform		= "power7",
 	},
 	{	/* 2.07-compliant processor, i.e. Power8 "architected" mode */
@@ -479,6 +481,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_setup		= __setup_cpu_power7,
 		.cpu_restore		= __restore_cpu_power7,
 		.flush_tlb		= __flush_tlb_power7,
+		.machine_check_early	= __machine_check_early_realmode_p7,
 		.platform		= "power7",
 	},
 	{	/* Power7+ */
@@ -498,6 +501,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_setup		= __setup_cpu_power7,
 		.cpu_restore		= __restore_cpu_power7,
 		.flush_tlb		= __flush_tlb_power7,
+		.machine_check_early	= __machine_check_early_realmode_p7,
 		.platform		= "power7+",
 	},
 	{	/* Power8E */
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
new file mode 100644
index 0000000..645d722
--- /dev/null
+++ b/arch/powerpc/kernel/mce_power.c
@@ -0,0 +1,153 @@
+/*
+ * Machine check exception handling CPU-side for power7 and power8
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2013 IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+ */
+
+#undef DEBUG
+#define pr_fmt(fmt) "mce_power: " fmt
+
+#include <linux/types.h>
+#include <linux/ptrace.h>
+#include <asm/mmu.h>
+#include <asm/mce.h>
+
+/* flush SLBs and reload */
+static void flush_and_reload_slb(void)
+{
+	struct slb_shadow *slb;
+	unsigned long i, n;
+
+	if (!mmu_has_feature(MMU_FTR_SLB))
+		return;
+
+	/* Invalidate all SLBs */
+	asm volatile("slbmte %0,%0; slbia" : : "r" (0));
+
+#ifdef CONFIG_KVM_BOOK3S_HANDLER
+	/*
+	 * If machine check is hit when in guest or in transition, we will
+	 * only flush the SLBs and continue.
+	 */
+	if (get_paca()->kvm_hstate.in_guest)
+		return;
+#endif
+
+	/* For host kernel, reload the SLBs from shadow SLB buffer. */
+	slb = get_slb_shadow();
+	if (!slb)
+		return;
+
+	n = min_t(u32, slb->persistent, SLB_MIN_SIZE);
+
+	/* Load up the SLB entries from shadow SLB */
+	for (i = 0; i < n; i++) {
+		unsigned long rb = slb->save_area[i].esid;
+		unsigned long rs = slb->save_area[i].vsid;
+
+		rb = (rb & ~0xFFFul) | i;
+		asm volatile("slbmte %0,%1" : : "r" (rs), "r" (rb));
+	}
+}
+
+static long mce_handle_derror(uint64_t dsisr, uint64_t slb_error_bits)
+{
+	long handled = 1;
+
+	/*
+	 * Flush and reload SLBs for SLB errors and flush TLBs for TLB errors.
+	 * Reset the error bits whenever we handle them so that at the end
+	 * we can check whether we handled all of them or not.
+	 */
+	if (dsisr & slb_error_bits) {
+		flush_and_reload_slb();
+		/* reset error bits */
+		dsisr &= ~(slb_error_bits);
+	}
+	if (dsisr & P7_DSISR_MC_TLB_MULTIHIT_MFTLB) {
+		if (cur_cpu_spec && cur_cpu_spec->flush_tlb)
+			cur_cpu_spec->flush_tlb(TLBIEL_INVAL_PAGE);
+		/* reset error bits */
+		dsisr &= ~P7_DSISR_MC_TLB_MULTIHIT_MFTLB;
+	}
+	/* Any other errors we don't understand? */
+	if (dsisr & 0xffffffffUL)
+		handled = 0;
+
+	return handled;
+}
+
+static long mce_handle_derror_p7(uint64_t dsisr)
+{
+	return mce_handle_derror(dsisr, P7_DSISR_MC_SLB_ERRORS);
+}
+
+static long mce_handle_common_ierror(uint64_t srr1)
+{
+	long handled = 0;
+
+	switch (P7_SRR1_MC_IFETCH(srr1)) {
+	case 0:
+		break;
+	case P7_SRR1_MC_IFETCH_SLB_PARITY:
+	case P7_SRR1_MC_IFETCH_SLB_MULTIHIT:
+		/* flush and reload SLBs for SLB errors. */
+		flush_and_reload_slb();
+		handled = 1;
+		break;
+	case P7_SRR1_MC_IFETCH_TLB_MULTIHIT:
+		if (cur_cpu_spec && cur_cpu_spec->flush_tlb) {
+			cur_cpu_spec->flush_tlb(TLBIEL_INVAL_PAGE);
+			handled = 1;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return handled;
+}
+
+static long mce_handle_ierror_p7(uint64_t srr1)
+{
+	long handled = 0;
+
+	handled = mce_handle_common_ierror(srr1);
+
+	if (P7_SRR1_MC_IFETCH(srr1) == P7_SRR1_MC_IFETCH_SLB_BOTH) {
+		flush_and_reload_slb();
+		handled = 1;
+	}
+	return handled;
+}
+
+long __machine_check_early_realmode_p7(struct pt_regs *regs)
+{
+	uint64_t srr1;
+	long handled = 1;
+
+	srr1 = regs->msr;
+
+	if (P7_SRR1_MC_LOADSTORE(srr1))
+		handled = mce_handle_derror_p7(regs->dsisr);
+	else
+		handled = mce_handle_ierror_p7(srr1);
+
+	/* TODO: Decode machine check reason. */
+	return handled;
+}

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 6/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power8.
  2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
                   ` (4 preceding siblings ...)
  2013-08-07  9:38 ` [RFC PATCH 5/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7 Mahesh J Salgaonkar
@ 2013-08-07  9:39 ` Mahesh J Salgaonkar
  2013-08-07  9:39 ` [RFC PATCH 7/9] powerpc: Decode and save machine check event Mahesh J Salgaonkar
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:39 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch handles memory errors on power8. If we get a machine check
exception due to SLB or TLB errors, flush the SLBs/TLBs and reload the SLBs
to recover.

I do not have access to a power8 box, hence this patch hasn't been tested
yet.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mce.h  |    3 +++
 arch/powerpc/kernel/cputable.c  |    4 ++++
 arch/powerpc/kernel/mce_power.c |   34 ++++++++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index ba19073..6866062 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -64,4 +64,7 @@
 					 P7_DSISR_MC_SLB_MULTIHIT | \
 					 P7_DSISR_MC_SLB_MULTIHIT_PARITY)
 
+#define P8_DSISR_MC_SLB_ERRORS		(P7_DSISR_MC_SLB_ERRORS | \
+					 P8_DSISR_MC_ERAT_MULTIHIT_SEC)
+
 #endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index c28cc2c..0195358 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -74,6 +74,7 @@ extern void __restore_cpu_a2(void);
 extern void __flush_tlb_power7(unsigned long inval_selector);
 extern void __flush_tlb_power8(unsigned long inval_selector);
 extern long __machine_check_early_realmode_p7(struct pt_regs *regs);
+extern long __machine_check_early_realmode_p8(struct pt_regs *regs);
 #endif /* CONFIG_PPC64 */
 #if defined(CONFIG_E500)
 extern void __setup_cpu_e5500(unsigned long offset, struct cpu_spec* spec);
@@ -462,6 +463,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_setup		= __setup_cpu_power8,
 		.cpu_restore		= __restore_cpu_power8,
 		.flush_tlb		= __flush_tlb_power8,
+		.machine_check_early	= __machine_check_early_realmode_p8,
 		.platform		= "power8",
 	},
 	{	/* Power7 */
@@ -521,6 +523,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_setup		= __setup_cpu_power8,
 		.cpu_restore		= __restore_cpu_power8,
 		.flush_tlb		= __flush_tlb_power8,
+		.machine_check_early	= __machine_check_early_realmode_p8,
 		.platform		= "power8",
 	},
 	{	/* Power8 */
@@ -540,6 +543,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_setup		= __setup_cpu_power8,
 		.cpu_restore		= __restore_cpu_power8,
 		.flush_tlb		= __flush_tlb_power8,
+		.machine_check_early	= __machine_check_early_realmode_p8,
 		.platform		= "power8",
 	},
 	{	/* Cell Broadband Engine */
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index 645d722..949d102 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -151,3 +151,37 @@ long __machine_check_early_realmode_p7(struct pt_regs *regs)
 	/* TODO: Decode machine check reason. */
 	return handled;
 }
+
+static long mce_handle_ierror_p8(uint64_t srr1)
+{
+	long handled = 0;
+
+	handled = mce_handle_common_ierror(srr1);
+
+	if (P7_SRR1_MC_IFETCH(srr1) == P8_SRR1_MC_IFETCH_ERAT_MULTIHIT) {
+		flush_and_reload_slb();
+		handled = 1;
+	}
+	return handled;
+}
+
+static long mce_handle_derror_p8(uint64_t dsisr)
+{
+	return mce_handle_derror(dsisr, P8_DSISR_MC_SLB_ERRORS);
+}
+
+long __machine_check_early_realmode_p8(struct pt_regs *regs)
+{
+	uint64_t srr1;
+	long handled = 1;
+
+	srr1 = regs->msr;
+
+	if (P7_SRR1_MC_LOADSTORE(srr1))
+		handled = mce_handle_derror_p8(regs->dsisr);
+	else
+		handled = mce_handle_ierror_p8(srr1);
+
+	/* TODO: Decode machine check reason. */
+	return handled;
+}

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 7/9] powerpc: Decode and save machine check event.
  2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
                   ` (5 preceding siblings ...)
  2013-08-07  9:39 ` [RFC PATCH 6/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power8 Mahesh J Salgaonkar
@ 2013-08-07  9:39 ` Mahesh J Salgaonkar
  2013-08-07 18:41   ` Scott Wood
  2013-08-08  5:14   ` Paul Mackerras
  2013-08-07  9:39 ` [RFC PATCH 8/9] powerpc/powernv: Remove machine check handling in OPAL Mahesh J Salgaonkar
  2013-08-07  9:39 ` [RFC PATCH 9/9] powerpc/powernv: Machine check exception handling Mahesh J Salgaonkar
  8 siblings, 2 replies; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:39 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Now that we handle machine checks in the Linux host, the MCE decoding should
also take place there. This information is crucial to log before we go down
in case we cannot handle the machine check errors. This patch decodes
and populates a machine check event which contains high-level, meaningful
MCE information.

We do this in real mode C code with the ME bit on. The MCE information is
still available on the emergency stack (in pt_regs structure format). Even if
we take another exception at this point, the MCE early handler will allocate
a new stack frame on top of the current one, so when we return here we still
have our MCE information safe on the current stack.

We use a per-cpu buffer to save high-level MCE information. Each per-cpu
buffer is an array of machine check event structures indexed by the per-cpu
counter mce_nest_count. mce_nest_count is incremented every time we enter
the machine check early handler in real mode to get the current free slot
(index = mce_nest_count - 1), and decremented once the MCE info is consumed
by the virtual mode machine check exception handler.

This patch provides the generic routines save_mce_event and get_mce_event
that machine check handlers can use to populate and retrieve the event.

This patch also updates the KVM code to invoke get_mce_event to retrieve the
generic MCE event rather than paca->opal_mc_evt.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mce.h        |  119 ++++++++++++++++++++++++++
 arch/powerpc/kernel/Makefile          |    2 
 arch/powerpc/kernel/mce.c             |  152 +++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/mce_power.c       |  116 +++++++++++++++++++++++--
 arch/powerpc/kvm/book3s_hv_ras.c      |   24 ++---
 arch/powerpc/platforms/powernv/opal.c |   35 ++++----
 6 files changed, 407 insertions(+), 41 deletions(-)
 create mode 100644 arch/powerpc/kernel/mce.c

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 6866062..cf57f3d 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -66,5 +66,124 @@
 
 #define P8_DSISR_MC_SLB_ERRORS		(P7_DSISR_MC_SLB_ERRORS | \
 					 P8_DSISR_MC_ERAT_MULTIHIT_SEC)
+enum MCE_Version {
+	MCE_V1 = 1,
+};
+
+enum MCE_Severity {
+	MCE_SEV_NO_ERROR = 0,
+	MCE_SEV_WARNING = 1,
+	MCE_SEV_ERROR_SYNC = 2,
+	MCE_SEV_FATAL = 3,
+};
+
+enum MCE_Disposition {
+	MCE_DISPOSITION_RECOVERED = 0,
+	MCE_DISPOSITION_NOT_RECOVERED = 1,
+};
+
+enum MCE_Initiator {
+	MCE_INITIATOR_UNKNOWN = 0,
+	MCE_INITIATOR_CPU = 1,
+};
+
+enum MCE_ErrorType {
+	MCE_ERROR_TYPE_UNKNOWN = 0,
+	MCE_ERROR_TYPE_UE = 1,
+	MCE_ERROR_TYPE_SLB = 2,
+	MCE_ERROR_TYPE_ERAT = 3,
+	MCE_ERROR_TYPE_TLB = 4,
+};
+
+enum MCE_UeErrorType {
+	MCE_UE_ERROR_INDETERMINATE = 0,
+	MCE_UE_ERROR_IFETCH = 1,
+	MCE_UE_ERROR_PAGE_TABLE_WALK_IFETCH = 2,
+	MCE_UE_ERROR_LOAD_STORE = 3,
+	MCE_UE_ERROR_PAGE_TABLE_WALK_LOAD_STORE = 4,
+};
+
+enum MCE_SlbErrorType {
+	MCE_SLB_ERROR_INDETERMINATE = 0,
+	MCE_SLB_ERROR_PARITY = 1,
+	MCE_SLB_ERROR_MULTIHIT = 2,
+};
+
+enum MCE_EratErrorType {
+	MCE_ERAT_ERROR_INDETERMINATE = 0,
+	MCE_ERAT_ERROR_PARITY = 1,
+	MCE_ERAT_ERROR_MULTIHIT = 2,
+};
+
+enum MCE_TlbErrorType {
+	MCE_TLB_ERROR_INDETERMINATE = 0,
+	MCE_TLB_ERROR_PARITY = 1,
+	MCE_TLB_ERROR_MULTIHIT = 2,
+};
+
+struct machine_check_event {
+	enum MCE_Version	version:8;	/* 0x00 */
+	uint8_t			in_use;		/* 0x01 */
+	enum MCE_Severity	severity:8;	/* 0x02 */
+	enum MCE_Initiator	initiator:8;	/* 0x03 */
+	enum MCE_ErrorType	error_type:8;	/* 0x04 */
+	enum MCE_Disposition	disposition:8;	/* 0x05 */
+	uint8_t			reserved_1[2];	/* 0x06 */
+	uint64_t		gpr3;		/* 0x08 */
+	uint64_t		srr0;		/* 0x10 */
+	uint64_t		srr1;		/* 0x18 */
+	union {					/* 0x20 */
+		struct {
+			enum MCE_UeErrorType ue_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		physical_address_provided;
+			uint8_t		reserved_1[5];
+			uint64_t	effective_address;
+			uint64_t	physical_address;
+			uint8_t		reserved_2[8];
+		} ue_error;
+
+		struct {
+			enum MCE_SlbErrorType slb_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		reserved_1[6];
+			uint64_t	effective_address;
+			uint8_t		reserved_2[16];
+		} slb_error;
+
+		struct {
+			enum MCE_EratErrorType erat_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		reserved_1[6];
+			uint64_t	effective_address;
+			uint8_t		reserved_2[16];
+		} erat_error;
+
+		struct {
+			enum MCE_TlbErrorType tlb_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		reserved_1[6];
+			uint64_t	effective_address;
+			uint8_t		reserved_2[16];
+		} tlb_error;
+	} u;
+};
+
+struct mce_error_info {
+	enum MCE_ErrorType error_type:8;
+	union {
+		enum MCE_UeErrorType ue_error_type:8;
+		enum MCE_SlbErrorType slb_error_type:8;
+		enum MCE_EratErrorType erat_error_type:8;
+		enum MCE_TlbErrorType tlb_error_type:8;
+	} u;
+	uint8_t		reserved[2];
+};
+
+#define MAX_MC_EVT	100
+
+extern void save_mce_event(struct pt_regs *regs, long handled,
+			   struct mce_error_info *mce_err, uint64_t addr);
+extern int get_mce_event(struct machine_check_event *mce);
 
 #endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index a1aba53..0b5b04a 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -35,7 +35,7 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
 				   misc_$(CONFIG_WORD_SIZE).o vdso32/
 obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
 				   signal_64.o ptrace32.o \
-				   paca.o nvram_64.o firmware.o
+				   paca.o nvram_64.o firmware.o mce.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
 obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_power.o
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
new file mode 100644
index 0000000..a523d8b
--- /dev/null
+++ b/arch/powerpc/kernel/mce.c
@@ -0,0 +1,152 @@
+/*
+ * Machine check exception handling.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2013 IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+ */
+
+#undef DEBUG
+#define pr_fmt(fmt) "mce: " fmt
+
+#include <linux/types.h>
+#include <linux/ptrace.h>
+#include <linux/percpu.h>
+#include <linux/export.h>
+#include <asm/mce.h>
+
+static DEFINE_PER_CPU(int, mce_nest_count);
+static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event);
+
+static void mce_set_error_info(struct machine_check_event *mce,
+			       struct mce_error_info *mce_err)
+{
+	mce->error_type = mce_err->error_type;
+	switch (mce_err->error_type) {
+	case MCE_ERROR_TYPE_UE:
+		mce->u.ue_error.ue_error_type = mce_err->u.ue_error_type;
+		break;
+	case MCE_ERROR_TYPE_SLB:
+		mce->u.slb_error.slb_error_type = mce_err->u.slb_error_type;
+		break;
+	case MCE_ERROR_TYPE_ERAT:
+		mce->u.erat_error.erat_error_type = mce_err->u.erat_error_type;
+		break;
+	case MCE_ERROR_TYPE_TLB:
+		mce->u.tlb_error.tlb_error_type = mce_err->u.tlb_error_type;
+		break;
+	case MCE_ERROR_TYPE_UNKNOWN:
+	default:
+		break;
+	}
+}
+
+/*
+ * Decode and save high level MCE information into per cpu buffer which
+ * is an array of machine_check_event structure.
+ */
+void save_mce_event(struct pt_regs *regs, long handled,
+		    struct mce_error_info *mce_err,
+		    uint64_t addr)
+{
+	uint64_t srr1;
+	int index = __get_cpu_var(mce_nest_count)++;
+	struct machine_check_event *mce = &__get_cpu_var(mce_event[index]);
+
+	/*
+	 * Return if we don't have enough space to log mce event.
+	 * mce_nest_count may go beyond MAX_MC_EVT but that's ok,
+	 * the check below will stop buffer overrun.
+	 */
+	if (index >= MAX_MC_EVT)
+		return;
+
+	/* Populate generic machine check info */
+	mce->version = MCE_V1;
+	mce->srr0 = regs->nip;
+	mce->srr1 = regs->msr;
+	mce->gpr3 = regs->gpr[3];
+	mce->in_use = 1;
+
+	mce->initiator = MCE_INITIATOR_CPU;
+	if (handled)
+		mce->disposition = MCE_DISPOSITION_RECOVERED;
+	else
+		mce->disposition = MCE_DISPOSITION_NOT_RECOVERED;
+	mce->severity = MCE_SEV_ERROR_SYNC;
+
+	srr1 = regs->msr;
+
+	/*
+	 * Populate the mce error_type and type-specific error_type.
+	 */
+	mce_set_error_info(mce, mce_err);
+
+	if (!addr)
+		return;
+
+	if (mce->error_type == MCE_ERROR_TYPE_TLB) {
+		mce->u.tlb_error.effective_address_provided = true;
+		mce->u.tlb_error.effective_address = addr;
+	} else if (mce->error_type == MCE_ERROR_TYPE_SLB) {
+		mce->u.slb_error.effective_address_provided = true;
+		mce->u.slb_error.effective_address = addr;
+	} else if (mce->error_type == MCE_ERROR_TYPE_ERAT) {
+		mce->u.erat_error.effective_address_provided = true;
+		mce->u.erat_error.effective_address = addr;
+	} else if (mce->error_type == MCE_ERROR_TYPE_UE) {
+		mce->u.ue_error.effective_address_provided = true;
+		mce->u.ue_error.effective_address = addr;
+	}
+	return;
+}
+
+/*
+ * get_mce_event:
+ *	mce	Pointer to machine_check_event structure to be filled.
+ *
+ *	return	1 = success
+ *		0 = failure
+ *
+ * get_mce_event() will be called by platform specific machine check
+ * handler routines and in KVM.
+ * When we call get_mce_event(), we are still in interrupt context and
+ * preemption will not be scheduled until the ret_from_except() routine
+ * is called.
+ */
+int get_mce_event(struct machine_check_event *mce)
+{
+	int index = __get_cpu_var(mce_nest_count) - 1;
+	struct machine_check_event *mc_evt;
+	int ret = 0;
+
+	/* Sanity check */
+	if (!mce || (index < 0))
+		return ret;
+
+	/* Check if we have MCE info to process. */
+	if (index < MAX_MC_EVT) {
+		mc_evt = &__get_cpu_var(mce_event[index]);
+		/* Copy the event structure and release the original */
+		*mce = *mc_evt;
+		mc_evt->in_use = 0;
+		ret = 1;
+	}
+	/* Decrement the count to free the slot */
+	__get_cpu_var(mce_nest_count)--;
+
+	return ret;
+}
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index 949d102..c153d9c 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -136,22 +136,116 @@ static long mce_handle_ierror_p7(uint64_t srr1)
 	return handled;
 }
 
+static void mce_get_common_ierror(struct mce_error_info *mce_err, uint64_t srr1)
+{
+	switch (P7_SRR1_MC_IFETCH(srr1)) {
+	case P7_SRR1_MC_IFETCH_SLB_PARITY:
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_PARITY;
+		break;
+	case P7_SRR1_MC_IFETCH_SLB_MULTIHIT:
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_MULTIHIT;
+		break;
+	case P7_SRR1_MC_IFETCH_TLB_MULTIHIT:
+		mce_err->error_type = MCE_ERROR_TYPE_TLB;
+		mce_err->u.tlb_error_type = MCE_TLB_ERROR_MULTIHIT;
+		break;
+	case P7_SRR1_MC_IFETCH_UE:
+	case P7_SRR1_MC_IFETCH_UE_IFU_INTERNAL:
+		mce_err->error_type = MCE_ERROR_TYPE_UE;
+		mce_err->u.ue_error_type = MCE_UE_ERROR_IFETCH;
+		break;
+	case P7_SRR1_MC_IFETCH_UE_TLB_RELOAD:
+		mce_err->error_type = MCE_ERROR_TYPE_UE;
+		mce_err->u.ue_error_type =
+				MCE_UE_ERROR_PAGE_TABLE_WALK_IFETCH;
+		break;
+	}
+}
+
+static void mce_get_ierror_p7(struct mce_error_info *mce_err, uint64_t srr1)
+{
+	mce_get_common_ierror(mce_err, srr1);
+	if (P7_SRR1_MC_IFETCH(srr1) == P7_SRR1_MC_IFETCH_SLB_BOTH) {
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_INDETERMINATE;
+	}
+}
+
+static void mce_get_derror_p7(struct mce_error_info *mce_err, uint64_t dsisr)
+{
+	if (dsisr & P7_DSISR_MC_UE) {
+		mce_err->error_type = MCE_ERROR_TYPE_UE;
+		mce_err->u.ue_error_type = MCE_UE_ERROR_LOAD_STORE;
+	} else if (dsisr & P7_DSISR_MC_UE_TABLEWALK) {
+		mce_err->error_type = MCE_ERROR_TYPE_UE;
+		mce_err->u.ue_error_type =
+				MCE_UE_ERROR_PAGE_TABLE_WALK_LOAD_STORE;
+	} else if (dsisr & P7_DSISR_MC_ERAT_MULTIHIT) {
+		mce_err->error_type = MCE_ERROR_TYPE_ERAT;
+		mce_err->u.erat_error_type = MCE_ERAT_ERROR_MULTIHIT;
+	} else if (dsisr & P7_DSISR_MC_SLB_MULTIHIT) {
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_MULTIHIT;
+	} else if (dsisr & P7_DSISR_MC_SLB_PARITY_MFSLB) {
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_PARITY;
+	} else if (dsisr & P7_DSISR_MC_TLB_MULTIHIT_MFTLB) {
+		mce_err->error_type = MCE_ERROR_TYPE_TLB;
+		mce_err->u.tlb_error_type = MCE_TLB_ERROR_MULTIHIT;
+	} else if (dsisr & P7_DSISR_MC_SLB_MULTIHIT_PARITY) {
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_INDETERMINATE;
+	}
+}
+
 long __machine_check_early_realmode_p7(struct pt_regs *regs)
 {
-	uint64_t srr1;
+	uint64_t srr1, addr;
 	long handled = 1;
+	struct mce_error_info mce_error_info = { 0 };
 
 	srr1 = regs->msr;
 
-	if (P7_SRR1_MC_LOADSTORE(srr1))
+	/*
+	 * Handle memory errors depending whether this was a load/store or
+	 * ifetch exception. Also, populate the mce error_type and
+	 * type-specific error_type from either SRR1 or DSISR, depending
+	 * whether this was a load/store or ifetch exception
+	 */
+	if (P7_SRR1_MC_LOADSTORE(srr1)) {
 		handled = mce_handle_derror_p7(regs->dsisr);
-	else
+		mce_get_derror_p7(&mce_error_info, regs->dsisr);
+		addr = regs->dar;
+	} else {
 		handled = mce_handle_ierror_p7(srr1);
+		mce_get_ierror_p7(&mce_error_info, srr1);
+		addr = regs->nip;
+	}
 
-	/* TODO: Decode machine check reason. */
+	save_mce_event(regs, handled, &mce_error_info, addr);
 	return handled;
 }
 
+static void mce_get_ierror_p8(struct mce_error_info *mce_err, uint64_t srr1)
+{
+	mce_get_common_ierror(mce_err, srr1);
+	if (P7_SRR1_MC_IFETCH(srr1) == P8_SRR1_MC_IFETCH_ERAT_MULTIHIT) {
+		mce_err->error_type = MCE_ERROR_TYPE_ERAT;
+		mce_err->u.erat_error_type = MCE_ERAT_ERROR_MULTIHIT;
+	}
+}
+
+static void mce_get_derror_p8(struct mce_error_info *mce_err, uint64_t dsisr)
+{
+	mce_get_derror_p7(mce_err, dsisr);
+	if (dsisr & P8_DSISR_MC_ERAT_MULTIHIT_SEC) {
+		mce_err->error_type = MCE_ERROR_TYPE_ERAT;
+		mce_err->u.erat_error_type = MCE_ERAT_ERROR_MULTIHIT;
+	}
+}
+
 static long mce_handle_ierror_p8(uint64_t srr1)
 {
 	long handled = 0;
@@ -172,16 +266,22 @@ static long mce_handle_derror_p8(uint64_t dsisr)
 
 long __machine_check_early_realmode_p8(struct pt_regs *regs)
 {
-	uint64_t srr1;
+	uint64_t srr1, addr;
 	long handled = 1;
+	struct mce_error_info mce_error_info = { 0 };
 
 	srr1 = regs->msr;
 
-	if (P7_SRR1_MC_LOADSTORE(srr1))
+	if (P7_SRR1_MC_LOADSTORE(srr1)) {
 		handled = mce_handle_derror_p8(regs->dsisr);
-	else
+		mce_get_derror_p8(&mce_error_info, regs->dsisr);
+		addr = regs->dar;
+	} else {
 		handled = mce_handle_ierror_p8(srr1);
+		mce_get_ierror_p8(&mce_error_info, srr1);
+		addr = regs->nip;
+	}
 
-	/* TODO: Decode machine check reason. */
+	save_mce_event(regs, handled, &mce_error_info, addr);
 	return handled;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_ras.c b/arch/powerpc/kvm/book3s_hv_ras.c
index 5c427b4..cddeba3 100644
--- a/arch/powerpc/kvm/book3s_hv_ras.c
+++ b/arch/powerpc/kvm/book3s_hv_ras.c
@@ -12,6 +12,7 @@
 #include <linux/kvm_host.h>
 #include <linux/kernel.h>
 #include <asm/opal.h>
+#include <asm/mce.h>
 
 /* SRR1 bits for machine check on POWER7 */
 #define SRR1_MC_LDSTERR		(1ul << (63-42))
@@ -67,9 +68,7 @@ static void reload_slb(struct kvm_vcpu *vcpu)
 static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
 {
 	unsigned long srr1 = vcpu->arch.shregs.msr;
-#ifdef CONFIG_PPC_POWERNV
-	struct opal_machine_check_event *opal_evt;
-#endif
+	struct machine_check_event mce_evt;
 	long handled = 1;
 
 	if (srr1 & SRR1_MC_LDSTERR) {
@@ -109,22 +108,19 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
 		handled = 0;
 	}
 
-#ifdef CONFIG_PPC_POWERNV
 	/*
-	 * See if OPAL has already handled the condition.
-	 * We assume that if the condition is recovered then OPAL
+	 * See if we have already handled the condition in the linux host.
+	 * We assume that if the condition is recovered then linux host
 	 * will have generated an error log event that we will pick
 	 * up and log later.
 	 */
-	opal_evt = local_paca->opal_mc_evt;
-	if (opal_evt->version == OpalMCE_V1 &&
-	    (opal_evt->severity == OpalMCE_SEV_NO_ERROR ||
-	     opal_evt->disposition == OpalMCE_DISPOSITION_RECOVERED))
-		handled = 1;
+	if (!get_mce_event(&mce_evt))
+		return handled;
 
-	if (handled)
-		opal_evt->in_use = 0;
-#endif
+	if (mce_evt.version == MCE_V1 &&
+	    (mce_evt.severity == MCE_SEV_NO_ERROR ||
+	     mce_evt.disposition == MCE_DISPOSITION_RECOVERED))
+		handled = 1;
 
 	return handled;
 }
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 106301f..bd1afa4 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -19,6 +19,7 @@
 #include <linux/slab.h>
 #include <asm/opal.h>
 #include <asm/firmware.h>
+#include <asm/mce.h>
 
 #include "powernv.h"
 
@@ -245,8 +246,7 @@ int opal_put_chars(uint32_t vtermno, const char *data, int total_len)
 
 int opal_machine_check(struct pt_regs *regs)
 {
-	struct opal_machine_check_event *opal_evt = get_paca()->opal_mc_evt;
-	struct opal_machine_check_event evt;
+	struct machine_check_event evt;
 	const char *level, *sevstr, *subtype;
 	static const char *opal_mc_ue_types[] = {
 		"Indeterminate",
@@ -271,30 +271,29 @@ int opal_machine_check(struct pt_regs *regs)
 		"Multihit",
 	};
 
-	/* Copy the event structure and release the original */
-	evt = *opal_evt;
-	opal_evt->in_use = 0;
+	if (!get_mce_event(&evt))
+		return 0;
 
 	/* Print things out */
-	if (evt.version != OpalMCE_V1) {
+	if (evt.version != MCE_V1) {
 		pr_err("Machine Check Exception, Unknown event version %d !\n",
 		       evt.version);
 		return 0;
 	}
 	switch(evt.severity) {
-	case OpalMCE_SEV_NO_ERROR:
+	case MCE_SEV_NO_ERROR:
 		level = KERN_INFO;
 		sevstr = "Harmless";
 		break;
-	case OpalMCE_SEV_WARNING:
+	case MCE_SEV_WARNING:
 		level = KERN_WARNING;
 		sevstr = "";
 		break;
-	case OpalMCE_SEV_ERROR_SYNC:
+	case MCE_SEV_ERROR_SYNC:
 		level = KERN_ERR;
 		sevstr = "Severe";
 		break;
-	case OpalMCE_SEV_FATAL:
+	case MCE_SEV_FATAL:
 	default:
 		level = KERN_ERR;
 		sevstr = "Fatal";
@@ -302,12 +301,12 @@ int opal_machine_check(struct pt_regs *regs)
 	}
 
 	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
-	       evt.disposition == OpalMCE_DISPOSITION_RECOVERED ?
+	       evt.disposition == MCE_DISPOSITION_RECOVERED ?
 	       "Recovered" : "[Not recovered");
 	printk("%s  Initiator: %s\n", level,
-	       evt.initiator == OpalMCE_INITIATOR_CPU ? "CPU" : "Unknown");
+	       evt.initiator == MCE_INITIATOR_CPU ? "CPU" : "Unknown");
 	switch(evt.error_type) {
-	case OpalMCE_ERROR_TYPE_UE:
+	case MCE_ERROR_TYPE_UE:
 		subtype = evt.u.ue_error.ue_error_type <
 			ARRAY_SIZE(opal_mc_ue_types) ?
 			opal_mc_ue_types[evt.u.ue_error.ue_error_type]
@@ -320,7 +319,7 @@ int opal_machine_check(struct pt_regs *regs)
 			printk("%s      Physial address: %016llx\n",
 			       level, evt.u.ue_error.physical_address);
 		break;
-	case OpalMCE_ERROR_TYPE_SLB:
+	case MCE_ERROR_TYPE_SLB:
 		subtype = evt.u.slb_error.slb_error_type <
 			ARRAY_SIZE(opal_mc_slb_types) ?
 			opal_mc_slb_types[evt.u.slb_error.slb_error_type]
@@ -330,7 +329,7 @@ int opal_machine_check(struct pt_regs *regs)
 			printk("%s    Effective address: %016llx\n",
 			       level, evt.u.slb_error.effective_address);
 		break;
-	case OpalMCE_ERROR_TYPE_ERAT:
+	case MCE_ERROR_TYPE_ERAT:
 		subtype = evt.u.erat_error.erat_error_type <
 			ARRAY_SIZE(opal_mc_erat_types) ?
 			opal_mc_erat_types[evt.u.erat_error.erat_error_type]
@@ -340,7 +339,7 @@ int opal_machine_check(struct pt_regs *regs)
 			printk("%s    Effective address: %016llx\n",
 			       level, evt.u.erat_error.effective_address);
 		break;
-	case OpalMCE_ERROR_TYPE_TLB:
+	case MCE_ERROR_TYPE_TLB:
 		subtype = evt.u.tlb_error.tlb_error_type <
 			ARRAY_SIZE(opal_mc_tlb_types) ?
 			opal_mc_tlb_types[evt.u.tlb_error.tlb_error_type]
@@ -351,11 +350,11 @@ int opal_machine_check(struct pt_regs *regs)
 			       level, evt.u.tlb_error.effective_address);
 		break;
 	default:
-	case OpalMCE_ERROR_TYPE_UNKNOWN:
+	case MCE_ERROR_TYPE_UNKNOWN:
 		printk("%s  Error type: Unknown\n", level);
 		break;
 	}
-	return evt.severity == OpalMCE_SEV_FATAL ? 0 : 1;
+	return evt.severity == MCE_SEV_FATAL ? 0 : 1;
 }
 
 static irqreturn_t opal_interrupt(int irq, void *data)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 8/9] powerpc/powernv: Remove machine check handling in OPAL.
  2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
                   ` (6 preceding siblings ...)
  2013-08-07  9:39 ` [RFC PATCH 7/9] powerpc: Decode and save machine check event Mahesh J Salgaonkar
@ 2013-08-07  9:39 ` Mahesh J Salgaonkar
  2013-08-07  9:39 ` [RFC PATCH 9/9] powerpc/powernv: Machine check exception handling Mahesh J Salgaonkar
  8 siblings, 0 replies; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:39 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Now that we are ready to handle machine checks directly in Linux, do not
register with firmware to handle the machine check exception.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/opal.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index bd1afa4..a08243d 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -83,14 +83,10 @@ static int __init opal_register_exception_handlers(void)
 	if (!(powerpc_firmware_features & FW_FEATURE_OPAL))
 		return -ENODEV;
 
-	/* Hookup some exception handlers. We use the fwnmi area at 0x7000
-	 * to provide the glue space to OPAL
+	/* Hookup some exception handlers except machine check. We use the
+	 * fwnmi area at 0x7000 to provide the glue space to OPAL
 	 */
 	glue = 0x7000;
-	opal_register_exception_handler(OPAL_MACHINE_CHECK_HANDLER,
-					__pa(opal_mc_secondary_handler[0]),
-					glue);
-	glue += 128;
 	opal_register_exception_handler(OPAL_HYPERVISOR_MAINTENANCE_HANDLER,
 					0, glue);
 	glue += 128;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 9/9] powerpc/powernv: Machine check exception handling.
  2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
                   ` (7 preceding siblings ...)
  2013-08-07  9:39 ` [RFC PATCH 8/9] powerpc/powernv: Remove machine check handling in OPAL Mahesh J Salgaonkar
@ 2013-08-07  9:39 ` Mahesh J Salgaonkar
  8 siblings, 0 replies; 22+ messages in thread
From: Mahesh J Salgaonkar @ 2013-08-07  9:39 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt
  Cc: Jeremy Kerr, Paul Mackerras, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Add basic error handling to the machine check exception handler.

- If MSR_RI isn't set, we cannot recover.
- Check whether the disposition is set to MCE_DISPOSITION_RECOVERED.
- Check whether the faulting address lies inside the kernel address space; if
  not, send SIGBUS to the process when the exception hit in userspace.
- If the faulting address is not provided and we took a synchronous machine
  check while in userspace, kill the task.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mce.h        |    1 +
 arch/powerpc/kernel/mce.c             |   27 +++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal.c |   43 ++++++++++++++++++++++++++++++++-
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index cf57f3d..558d0ca 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -185,5 +185,6 @@ struct mce_error_info {
 extern void save_mce_event(struct pt_regs *regs, long handled,
 			   struct mce_error_info *mce_err, uint64_t addr);
 extern int get_mce_event(struct machine_check_event *mce);
+extern uint64_t get_mce_fault_addr(struct machine_check_event *evt);
 
 #endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index a523d8b..eb9e059 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -150,3 +150,30 @@ int get_mce_event(struct machine_check_event *mce)
 
 	return ret;
 }
+
+uint64_t get_mce_fault_addr(struct machine_check_event *evt)
+{
+	switch (evt->error_type) {
+	case MCE_ERROR_TYPE_UE:
+		if (evt->u.ue_error.effective_address_provided)
+			return evt->u.ue_error.effective_address;
+		break;
+	case MCE_ERROR_TYPE_SLB:
+		if (evt->u.slb_error.effective_address_provided)
+			return evt->u.slb_error.effective_address;
+		break;
+	case MCE_ERROR_TYPE_ERAT:
+		if (evt->u.erat_error.effective_address_provided)
+			return evt->u.erat_error.effective_address;
+		break;
+	case MCE_ERROR_TYPE_TLB:
+		if (evt->u.tlb_error.effective_address_provided)
+			return evt->u.tlb_error.effective_address;
+		break;
+	default:
+	case MCE_ERROR_TYPE_UNKNOWN:
+		break;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(get_mce_fault_addr);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index a08243d..407d571 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -17,6 +17,7 @@
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 #include <linux/slab.h>
+#include <linux/sched.h>
 #include <asm/opal.h>
 #include <asm/firmware.h>
 #include <asm/mce.h>
@@ -240,6 +241,44 @@ int opal_put_chars(uint32_t vtermno, const char *data, int total_len)
 	return written;
 }
 
+static int opal_recover_mce(struct pt_regs *regs,
+					struct machine_check_event *evt)
+{
+	int recovered = 0;
+	uint64_t ea = get_mce_fault_addr(evt);
+
+	if (!(regs->msr & MSR_RI)) {
+		/* If MSR_RI isn't set, we cannot recover */
+		recovered = 0;
+	} else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
+		/* Platform corrected itself */
+		recovered = 1;
+	} else if (ea && !is_kernel_addr(ea)) {
+		/*
+		 * Faulting address is not in kernel text. We should be fine.
+		 * We need to find which process uses this address.
+		 * For now, kill the task if we have received exception when
+		 * in userspace.
+		 *
+	 * TODO: Queue up this address for hwpoisoning later.
+		 */
+		if (user_mode(regs) && !is_global_init(current)) {
+			_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
+			recovered = 1;
+		} else
+			recovered = 0;
+	} else if (user_mode(regs) && !is_global_init(current) &&
+		evt->severity == MCE_SEV_ERROR_SYNC) {
+		/*
+		 * If we have received a synchronous error when in userspace
+		 * kill the task.
+		 */
+		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
+		recovered = 1;
+	}
+	return recovered;
+}
+
 int opal_machine_check(struct pt_regs *regs)
 {
 	struct machine_check_event evt;
@@ -350,7 +389,9 @@ int opal_machine_check(struct pt_regs *regs)
 		printk("%s  Error type: Unknown\n", level);
 		break;
 	}
-	return evt.severity == MCE_SEV_FATAL ? 0 : 1;
+	if (opal_recover_mce(regs, &evt))
+		return 1;
+	return 0;
 }
 
 static irqreturn_t opal_interrupt(int irq, void *data)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 7/9] powerpc: Decode and save machine check event.
  2013-08-07  9:39 ` [RFC PATCH 7/9] powerpc: Decode and save machine check event Mahesh J Salgaonkar
@ 2013-08-07 18:41   ` Scott Wood
  2013-08-08  3:40     ` Mahesh Jagannath Salgaonkar
  2013-08-08  5:14   ` Paul Mackerras
  1 sibling, 1 reply; 22+ messages in thread
From: Scott Wood @ 2013-08-07 18:41 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Paul Mackerras, Jeremy Kerr, Anton Blanchard

On Wed, 2013-08-07 at 15:09 +0530, Mahesh J Salgaonkar wrote:
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index a1aba53..0b5b04a 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -35,7 +35,7 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
>  				   misc_$(CONFIG_WORD_SIZE).o vdso32/
>  obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
>  				   signal_64.o ptrace32.o \
> -				   paca.o nvram_64.o firmware.o
> +				   paca.o nvram_64.o firmware.o mce.o

What happens when we build mce.o on book3e?

Please don't take things that are specific to certain hardware and try
to apply them to all powerpc.

For that matter, it would be nice if book3s-specific (or POWER-specific)
patches had that in the subject prefix, just as booke-specific patches
do.

-Scott

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 7/9] powerpc: Decode and save machine check event.
  2013-08-07 18:41   ` Scott Wood
@ 2013-08-08  3:40     ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 22+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2013-08-08  3:40 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Paul Mackerras, Jeremy Kerr, Anton Blanchard

On 08/08/2013 12:11 AM, Scott Wood wrote:
> On Wed, 2013-08-07 at 15:09 +0530, Mahesh J Salgaonkar wrote:
>> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
>> index a1aba53..0b5b04a 100644
>> --- a/arch/powerpc/kernel/Makefile
>> +++ b/arch/powerpc/kernel/Makefile
>> @@ -35,7 +35,7 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
>>  				   misc_$(CONFIG_WORD_SIZE).o vdso32/
>>  obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
>>  				   signal_64.o ptrace32.o \
>> -				   paca.o nvram_64.o firmware.o
>> +				   paca.o nvram_64.o firmware.o mce.o
> 
> What happens when we build mce.o on book3e?
> 
> Please don't take things that are specific to certain hardware and try
> to apply them to all powerpc.

Sure, I will move it under obj-$(CONFIG_PPC_BOOK3S_64).

> 
> For that matter, it would be nice if book3s-specific (or POWER-specific)
> patches had that in the subject prefix, just as booke-specific patches
> do.

Sure will follow this for future versions.

Thanks,
-Mahesh.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/9] powerpc: Split the common exception prolog logic into two section.
  2013-08-07  9:38 ` [RFC PATCH 1/9] powerpc: Split the common exception prolog logic into two section Mahesh J Salgaonkar
@ 2013-08-08  4:10   ` Anshuman Khandual
  2013-08-08  4:16     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 22+ messages in thread
From: Anshuman Khandual @ 2013-08-08  4:10 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Paul Mackerras, Jeremy Kerr, Anton Blanchard

On 08/07/2013 03:08 PM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> This patch splits the common exception prolog logic into two parts to
> facilitate reuse of existing code in the next patch. The second part will
> be reused in the machine check exception routine in the next patch.
> 

Please avoid describing the functionality as a requirement for upcoming
sibling patches. The justification for splitting the code should be a generic
functional or code-organizational requirement. We should avoid the words "next
patch" in the commit message, as they would be confusing when read at a later
point in time. The commit message should be self-sufficient with respect to
the exact change set under consideration.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/9] powerpc: Split the common exception prolog logic into two section.
  2013-08-08  4:10   ` Anshuman Khandual
@ 2013-08-08  4:16     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-08  4:16 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Mahesh J Salgaonkar, Paul Mackerras, Jeremy Kerr,
	Anton Blanchard, linuxppc-dev

On Thu, 2013-08-08 at 09:40 +0530, Anshuman Khandual wrote:
> On 08/07/2013 03:08 PM, Mahesh J Salgaonkar wrote:
> > From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > 
> > This patch splits the common exception prolog logic into two parts to
> > facilitate reuse of existing code in the next patch. The second part will
> > be reused in the machine check exception routine in the next patch.
> > 
> 
> Please avoid describing the functionality as a requirement for upcoming
> sibling patches. The justification for splitting the code should be a generic
> functional or code-organizational requirement. We should avoid the words "next
> patch" in the commit message, as they would be confusing when read at a later
> point in time. The commit message should be self-sufficient with respect to
> the exact change set under consideration.

Ugh ?

It's absolutely common practice to have a patch doing such a split
*specifically* for the purpose of subsequent patches....

In fact it's even *recommended* to separate the split from the
subsequent code change as the code split patch should be a nop, and it
makes the subsequent patch a lot easier to review.

Ben.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 2/9] powerpc: handle machine check in Linux host.
  2013-08-07  9:38 ` [RFC PATCH 2/9] powerpc: handle machine check in Linux host Mahesh J Salgaonkar
@ 2013-08-08  4:51   ` Paul Mackerras
  2013-08-08 13:19     ` Mahesh Jagannath Salgaonkar
  2013-08-08  5:01   ` Anshuman Khandual
  1 sibling, 1 reply; 22+ messages in thread
From: Paul Mackerras @ 2013-08-08  4:51 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard

On Wed, Aug 07, 2013 at 03:08:15PM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Move machine check entry point into Linux. So far we were dependent on
> firmware to decode MCE error details and handover the high level info to OS.
> 
> This patch introduces early machine check routine that saves the MCE
> information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
> stack frame on emergency stack and set the r1 accordingly. This allows us
> to be prepared to take another exception without loosing context. One thing
> to note here that, if we get another machine check while ME bit is off then
> we risk a checkstop. Hence we restrict ourselves to save only MCE information
> and turn the ME bit on.

Some comments below:

> + * Swicth to emergency stack and handle re-entrancy (though we currently
      ^
      Switch

> + * don't test for overflow). Save MCE registers srr1, srr0, dar and
> + * dsisr and then turn the ME bit on.
> + */
> +#define __EARLY_MACHINE_CHECK_HANDLER(area, label)			\
> +	/* Check if we are laready using emergency stack. */		\
> +	ld	r10,PACAEMERGSP(r13);					\
> +	subi	r10,r10,THREAD_SIZE;					\
> +	rldicr	r10,r10,0,(63 - THREAD_SHIFT);				\
> +	rldicr	r11,r1,0,(63 - THREAD_SHIFT);				\
> +	cmpd	r10,r11;	/* Are we using emergency stack? */	\

This assumes that r1 contains a host kernel stack pointer value.
However, we could be coming in from a guest, or from userspace, in
which case r1 could have any value at all, and we shouldn't even be
considering using it as our stack pointer.

There is another complication, too: if we are coming in from a KVM
guest running on a secondary thread (not thread 0), we will be using
part of the emergency stack already (see kvm_start_guest in
arch/powerpc/kvm/book3s_hv_rmhandlers.S).  However, because we have
come in from a guest, r1 contains a guest value, and we would have to
look in KVM data structures to find out how much of the emergency
stack we have already used.  I'm not sure at the moment what the best
way to fix this is.

> +	mr	r11,r1;			/* Save current stack pointer */\
> +	beq	0f;							\
> +	ld	r1,PACAEMERGSP(r13);	/* Use emergency stack */	\
> +0:	subi	r1,r1,INT_FRAME_SIZE;	/* alloc stack frame */		\
> +	std	r11,GPR1(r1);						\
> +	std	r11,0(r1);		/* make stack chain pointer */	\
> +	mfspr	r11,SPRN_SRR0;		/* Save SRR0 */			\
> +	std	r11,_NIP(r1);						\
> +	mfspr	r11,SPRN_SRR1;		/* Save SRR1 */			\
> +	std	r11,_MSR(r1);						\
> +	mfspr	r11,SPRN_DAR;		/* Save DAR */			\
> +	std 	r11,_DAR(r1);						\
> +	mfspr	r11,SPRN_DSISR;		/* Save DSISR */		\
> +	std	r11,_DSISR(r1);						\
> +	mfmsr	r11;			/* get MSR value */		\
> +	ori	r11,r11,MSR_ME;		/* turn on ME bit */		\
> +	ld	r12,PACAKBASE(r13);	/* get high part of &label */	\
> +	LOAD_HANDLER(r12,label)						\
> +	mtspr	SPRN_SRR0,r12;						\
> +	mtspr	SPRN_SRR1,r11;						\
> +	rfid;								\
> +	b	.	/* prevent speculative execution */
> +#define EARLY_MACHINE_CHECK_HANDLER(area, label)			\
> +	__EARLY_MACHINE_CHECK_HANDLER(area, label)

Given this is only used in one place, why make this a macro?  Wouldn't
it be simpler and neater to just spell out this code in the place
where you use the macro?

> +	/*
> +	 * Handle machine check early in real mode. We come here with
> +	 * ME bit on with MMU (IR and DR) off and using emergency stack.
> +	 */
> +	.align	7
> +	.globl machine_check_handle_early
> +machine_check_handle_early:
> +	std	r9,_CCR(r1)	/* Save CR in stackframe */
> +	std	r0,GPR0(r1)	/* Save r0, r2 - r10 */
> +	EXCEPTION_PROLOG_COMMON_2(0x200, PACA_EXMC)
> +	bl	.save_nvgprs
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
> +	bl	.machine_check_early
> +	cmpdi	r3,0			/* Did we handle it? */
> +	ld	r9,_MSR(r1)
> +	beq	1f
> +	ori	r9,r9,MSR_RI		/* Yes we have handled it. */

I don't think this is a good idea.  MSR_RI being off means either that
the machine check happened at a time when SRR0 and SRR1 contained live
contents, which have been overwritten by the machine check interrupt,
or else that the CPU was not able to determine a precise state at
which the interrupt could be taken.  Neither of those problems would
be fixed by our machine check handler.

What was the rationale for setting RI in this situation?

> +	/* Move original SRR0 and SRR1 into the respective regs */
> +1:	mtspr	SPRN_SRR1,r9
> +	ld	r3,_NIP(r1)
> +	mtspr	SPRN_SRR0,r3
> +	REST_NVGPRS(r1)
> +	ld	r9,_CTR(r1)
> +	mtctr	r9
> +	ld	r9,_XER(r1)
> +	mtxer	r9
> +	ld	r9,_CCR(r1)
> +	mtcr	r9
> +BEGIN_FTR_SECTION
> +	ld	r9,ORIG_GPR3(r1)
> +	mtspr	SPRN_CFAR,r9
> +END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
> +	ld	r9,_LINK(r1)
> +	mtlr	r9
> +	REST_GPR(0, r1)
> +	REST_8GPRS(2, r1)
> +	REST_2GPRS(10, r1)
> +	REST_GPR(12, r1)
> +	/*
> +	 * restore orig stack pointer. We have already saved MCE info
> +	 * to per cpu MCE event buffer.
> +	 */
> +	ld	r1,GPR1(r1)
> +	b	machine_check_pSeries

What about r13?  I don't see that you have restored it at this point.

> +
>  	STD_EXCEPTION_COMMON_ASYNC(0x500, hardware_interrupt, do_IRQ)
>  	STD_EXCEPTION_COMMON_ASYNC(0x900, decrementer, .timer_interrupt)
>  	STD_EXCEPTION_COMMON(0x980, hdecrementer, .hdec_interrupt)
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index bf33c22..1720e08 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -286,6 +286,18 @@ void system_reset_exception(struct pt_regs *regs)
>  
>  	/* What should we do here? We could issue a shutdown or hard reset. */
>  }
> +
> +/*
> + * This function is called in real mode. Strictly no printk's please.
> + *
> + * regs->nip and regs->msr contains srr0 and ssr1.
> + */
> +long machine_check_early(struct pt_regs *regs)
> +{
> +	/* TODO: handle/decode machine check reason */
> +	return 0;
> +}
> +
>  #endif
>  
>  /*

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 5/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7.
  2013-08-07  9:38 ` [RFC PATCH 5/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7 Mahesh J Salgaonkar
@ 2013-08-08  4:58   ` Paul Mackerras
  0 siblings, 0 replies; 22+ messages in thread
From: Paul Mackerras @ 2013-08-08  4:58 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard

On Wed, Aug 07, 2013 at 03:08:55PM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> If we get a machine check exception due to SLB or TLB errors, then flush
> SLBs/TLBs and reload SLBs to recover. We do this in real mode before turning
> on MMU. Otherwise we would run into nested machine checks.
> 
> If we get a machine check when we are in guest, then just flush the
> SLBs and continue. This patch handles errors for power7. The next
> patch will handle errors for power8

I don't see anywhere in this patch or the next where we check whether
we are in fact running in hypervisor mode.  It is possible for us to
get machine check interrupts to the 0x200 vector when running under a
hypervisor, for example if the hypervisor doesn't support the FWNMI
facility (e.g. KVM) or if it does but we haven't done the
ibm,nmi-register call yet.

If we're not in hypervisor mode we should probably not do any of this
new stuff, but just handle it like we always have.

Paul.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 2/9] powerpc: handle machine check in Linux host.
  2013-08-07  9:38 ` [RFC PATCH 2/9] powerpc: handle machine check in Linux host Mahesh J Salgaonkar
  2013-08-08  4:51   ` Paul Mackerras
@ 2013-08-08  5:01   ` Anshuman Khandual
  1 sibling, 0 replies; 22+ messages in thread
From: Anshuman Khandual @ 2013-08-08  5:01 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Paul Mackerras, Jeremy Kerr, Anton Blanchard

On 08/07/2013 03:08 PM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Move machine check entry point into Linux. So far we were dependent on
> firmware to decode MCE error details and handover the high level info to OS.
> 
> This patch introduces early machine check routine that saves the MCE
> information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
> stack frame on emergency stack and set the r1 accordingly. This allows us
> to be prepared to take another exception without loosing context. One thing
> to note here that, if we get another machine check while ME bit is off then
> we risk a checkstop. Hence we restrict ourselves to save only MCE information
> and turn the ME bit on.
> 
> This is the code flow:
> 
> 		Machine Check Interrupt
> 			|
> 			V
> 		   0x200 vector				  ME=0, IR=0, DR=0
> 			|
> 			V
> 	+-----------------------------------------------+
> 	|machine_check_pSeries_early:			| ME=0, IR=0, DR=0
> 	|	Alloc frame on emergency stack		|
> 	|	Save srr1, srr0, dar and dsisr on stack |
> 	+-----------------------------------------------+
> 			|
> 		(ME=1, IR=0, DR=0, RFID)
> 			|
> 			V
> 		machine_check_handle_early		  ME=1, IR=0, DR=0
> 			|
> 			V
> 	+-----------------------------------------------+
> 	|	machine_check_early (r3=pt_regs)	| ME=1, IR=0, DR=0
> 	|	Things to do: (in next patches)		|
> 	|		Flush SLB for SLB errors	|
> 	|		Flush TLB for TLB errors	|
> 	|		Decode and save MCE info	|
> 	+-----------------------------------------------+
> 			|
> 	(Fall through existing exception handler routine.)
> 			|
> 			V
> 		machine_check_pSerie			  ME=1, IR=0, DR=0
> 			|
> 		(ME=1, IR=1, DR=1, RFID)
> 			|
> 			V
> 		machine_check_common			  ME=1, IR=1, DR=1
> 			.
> 			.
> 			.
> 
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/exception-64s.h |   43 ++++++++++++++++++++++++++
>  arch/powerpc/kernel/exceptions-64s.S     |   50 +++++++++++++++++++++++++++++-
>  arch/powerpc/kernel/traps.c              |   12 +++++++
>  3 files changed, 104 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
> index 2386d40..c5d2cbc 100644
> --- a/arch/powerpc/include/asm/exception-64s.h
> +++ b/arch/powerpc/include/asm/exception-64s.h
> @@ -174,6 +174,49 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
>  #define EXCEPTION_PROLOG_1(area, extra, vec)				\
>  	__EXCEPTION_PROLOG_1(area, extra, vec)
> 
> +/*
> + * Register contents:
> + * R12		= interrupt vector
> + * R13		= PACA
> + * R9		= CR
> + * R11 & R12 is saved on PACA_EXMC
> + *
> + * Swicth to emergency stack and handle re-entrancy (though we currently
> + * don't test for overflow). Save MCE registers srr1, srr0, dar and
> + * dsisr and then turn the ME bit on.
> + */
> +#define __EARLY_MACHINE_CHECK_HANDLER(area, label)			\
> +	/* Check if we are laready using emergency stack. */		\
> +	ld	r10,PACAEMERGSP(r13);					\
> +	subi	r10,r10,THREAD_SIZE;					\
> +	rldicr	r10,r10,0,(63 - THREAD_SHIFT);				\
> +	rldicr	r11,r1,0,(63 - THREAD_SHIFT);				\
> +	cmpd	r10,r11;	/* Are we using emergency stack? */	\
> +	mr	r11,r1;			/* Save current stack pointer */\
> +	beq	0f;							\
> +	ld	r1,PACAEMERGSP(r13);	/* Use emergency stack */	\
> +0:	subi	r1,r1,INT_FRAME_SIZE;	/* alloc stack frame */		\
> +	std	r11,GPR1(r1);						\
> +	std	r11,0(r1);		/* make stack chain pointer */	\
> +	mfspr	r11,SPRN_SRR0;		/* Save SRR0 */			\
> +	std	r11,_NIP(r1);						\
> +	mfspr	r11,SPRN_SRR1;		/* Save SRR1 */			\
> +	std	r11,_MSR(r1);						\
> +	mfspr	r11,SPRN_DAR;		/* Save DAR */			\
> +	std 	r11,_DAR(r1);						\
> +	mfspr	r11,SPRN_DSISR;		/* Save DSISR */		\
> +	std	r11,_DSISR(r1);						\
> +	mfmsr	r11;			/* get MSR value */		\
> +	ori	r11,r11,MSR_ME;		/* turn on ME bit */		\

You need to mention here that we are vulnerable to a core checkstop if we
get another machine check exception before the ME bit is turned back on
(i.e. in the window starting at the occurrence of the interrupt).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 7/9] powerpc: Decode and save machine check event.
  2013-08-07  9:39 ` [RFC PATCH 7/9] powerpc: Decode and save machine check event Mahesh J Salgaonkar
  2013-08-07 18:41   ` Scott Wood
@ 2013-08-08  5:14   ` Paul Mackerras
  2013-08-08 13:19     ` Mahesh Jagannath Salgaonkar
  1 sibling, 1 reply; 22+ messages in thread
From: Paul Mackerras @ 2013-08-08  5:14 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard

On Wed, Aug 07, 2013 at 03:09:13PM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Now that we handle machine check in linux, the MCE decoding should also
> take place in linux host. This info is crucial to log before we go down
> in case we can not handle the machine check errors. This patch decodes
> and populates a machine check event which contain high level meaning full
> MCE information.

A couple of things worry me about this patch:

First, there is the fact that we can only do get_mce_event() once for
a given machine check.  You call it in kvmppc_realmode_mc_power7(),
which is fine, but if it is not something we recognize and can handle
we will proceed to exit the guest and jump to machine_check_fwnmi,
which will then proceed to machine_check_common() and then
opal_machine_check(), where you have added another call to
get_mce_event(), which will probably underflow your little per-cpu
stack of machine check events.

Secondly, we shouldn't call save_mce_event() if we're not in
hypervisor mode, since per-cpu variables are not in general accessible
in real mode when running under a hypervisor with a limited real-mode
area (RMA).

Paul.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 2/9] powerpc: handle machine check in Linux host.
  2013-08-08  4:51   ` Paul Mackerras
@ 2013-08-08 13:19     ` Mahesh Jagannath Salgaonkar
  2013-08-08 13:33       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 22+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2013-08-08 13:19 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard

On 08/08/2013 10:21 AM, Paul Mackerras wrote:
> On Wed, Aug 07, 2013 at 03:08:15PM +0530, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Move machine check entry point into Linux. So far we were dependent on
>> firmware to decode MCE error details and handover the high level info to OS.
>>
>> This patch introduces early machine check routine that saves the MCE
>> information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
>> stack frame on emergency stack and set the r1 accordingly. This allows us
>> to be prepared to take another exception without loosing context. One thing
>> to note here that, if we get another machine check while ME bit is off then
>> we risk a checkstop. Hence we restrict ourselves to save only MCE information
>> and turn the ME bit on.
> 
> Some comments below:
> 
>> + * Swicth to emergency stack and handle re-entrancy (though we currently
>       ^
>       Switch
> 
>> + * don't test for overflow). Save MCE registers srr1, srr0, dar and
>> + * dsisr and then turn the ME bit on.
>> + */
>> +#define __EARLY_MACHINE_CHECK_HANDLER(area, label)			\
>> +	/* Check if we are laready using emergency stack. */		\
>> +	ld	r10,PACAEMERGSP(r13);					\
>> +	subi	r10,r10,THREAD_SIZE;					\
>> +	rldicr	r10,r10,0,(63 - THREAD_SHIFT);				\
>> +	rldicr	r11,r1,0,(63 - THREAD_SHIFT);				\
>> +	cmpd	r10,r11;	/* Are we using emergency stack? */	\
> 
> This assumes that r1 contains a host kernel stack pointer value.
> However, we could be coming in from a guest, or from userspace, in
> which case r1 could have any value at all, and we shouldn't even be
> considering using it as our stack pointer.

I see your point. I need a different mechanism to keep track of the
emergency stack pointer, maybe using some scratch registers or something in
the paca structure.

> 
> There is another complication, too: if we are coming in from a KVM
> guest running on a secondary thread (not thread 0), we will be using
> part of the emergency stack already (see kvm_start_guest in
> arch/powerpc/kvm/book3s_hv_rmhandlers.S).  However, because we have
> come in from a guest, r1 contains a guest value, and we would have to
> look in KVM data structures to find out how much of the emergency
> stack we have already used.  I'm not sure at the moment what the best
> way to fix this is.

How about adding a paca member to maintain the emergency stack pointer
currently in use?

> 
>> +	mr	r11,r1;			/* Save current stack pointer */\
>> +	beq	0f;							\
>> +	ld	r1,PACAEMERGSP(r13);	/* Use emergency stack */	\
>> +0:	subi	r1,r1,INT_FRAME_SIZE;	/* alloc stack frame */		\
>> +	std	r11,GPR1(r1);						\
>> +	std	r11,0(r1);		/* make stack chain pointer */	\
>> +	mfspr	r11,SPRN_SRR0;		/* Save SRR0 */			\
>> +	std	r11,_NIP(r1);						\
>> +	mfspr	r11,SPRN_SRR1;		/* Save SRR1 */			\
>> +	std	r11,_MSR(r1);						\
>> +	mfspr	r11,SPRN_DAR;		/* Save DAR */			\
>> +	std 	r11,_DAR(r1);						\
>> +	mfspr	r11,SPRN_DSISR;		/* Save DSISR */		\
>> +	std	r11,_DSISR(r1);						\
>> +	mfmsr	r11;			/* get MSR value */		\
>> +	ori	r11,r11,MSR_ME;		/* turn on ME bit */		\
>> +	ld	r12,PACAKBASE(r13);	/* get high part of &label */	\
>> +	LOAD_HANDLER(r12,label)						\
>> +	mtspr	SPRN_SRR0,r12;						\
>> +	mtspr	SPRN_SRR1,r11;						\
>> +	rfid;								\
>> +	b	.	/* prevent speculative execution */
>> +#define EARLY_MACHINE_CHECK_HANDLER(area, label)			\
>> +	__EARLY_MACHINE_CHECK_HANDLER(area, label)
> 
> Given this is only used in one place, why make this a macro?  Wouldn't
> it be simpler and neater to just spell out this code in the place
> where you use the macro?

Sure, will do that.

> 
>> +	/*
>> +	 * Handle machine check early in real mode. We come here with
>> +	 * ME bit on with MMU (IR and DR) off and using emergency stack.
>> +	 */
>> +	.align	7
>> +	.globl machine_check_handle_early
>> +machine_check_handle_early:
>> +	std	r9,_CCR(r1)	/* Save CR in stackframe */
>> +	std	r0,GPR0(r1)	/* Save r0, r2 - r10 */
>> +	EXCEPTION_PROLOG_COMMON_2(0x200, PACA_EXMC)
>> +	bl	.save_nvgprs
>> +	addi	r3,r1,STACK_FRAME_OVERHEAD
>> +	bl	.machine_check_early
>> +	cmpdi	r3,0			/* Did we handle it? */
>> +	ld	r9,_MSR(r1)
>> +	beq	1f
>> +	ori	r9,r9,MSR_RI		/* Yes we have handled it. */
> 
> I don't think this is a good idea.  MSR_RI being off means either that
> the machine check happened at a time when SRR0 and SRR1 contained live
> contents, which have been overwritten by the machine check interrupt,
> or else that the CPU was not able to determine a precise state at
> which the interrupt could be taken.  Neither of those problems would
> be fixed by our machine check handler.

Agree.

> 
> What was the rationale for setting RI in this situation?

When I tested SLB multihit, I got an interrupt with RI=0. In
opal_machine_check() the very first check I added was to see whether MSR_RI
is set and to return accordingly, which ended up in a die and panic in the
machine_check_exception() routine.

But I think I should depend on
evt->disposition == MCE_DISPOSITION_RECOVERED and not play with MSR_RI and
SRR1. I will fix my code.

> 
>> +	/* Move original SRR0 and SRR1 into the respective regs */
>> +1:	mtspr	SPRN_SRR1,r9
>> +	ld	r3,_NIP(r1)
>> +	mtspr	SPRN_SRR0,r3
>> +	REST_NVGPRS(r1)
>> +	ld	r9,_CTR(r1)
>> +	mtctr	r9
>> +	ld	r9,_XER(r1)
>> +	mtxer	r9
>> +	ld	r9,_CCR(r1)
>> +	mtcr	r9
>> +BEGIN_FTR_SECTION
>> +	ld	r9,ORIG_GPR3(r1)
>> +	mtspr	SPRN_CFAR,r9
>> +END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>> +	ld	r9,_LINK(r1)
>> +	mtlr	r9
>> +	REST_GPR(0, r1)
>> +	REST_8GPRS(2, r1)
>> +	REST_2GPRS(10, r1)
>> +	REST_GPR(12, r1)
>> +	/*
>> +	 * restore orig stack pointer. We have already saved MCE info
>> +	 * to per cpu MCE event buffer.
>> +	 */
>> +	ld	r1,GPR1(r1)
>> +	b	machine_check_pSeries
> 
> What about r13?  I don't see that you have restored it at this point.

Will fix it.

> 
>> +
>>  	STD_EXCEPTION_COMMON_ASYNC(0x500, hardware_interrupt, do_IRQ)
>>  	STD_EXCEPTION_COMMON_ASYNC(0x900, decrementer, .timer_interrupt)
>>  	STD_EXCEPTION_COMMON(0x980, hdecrementer, .hdec_interrupt)
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index bf33c22..1720e08 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -286,6 +286,18 @@ void system_reset_exception(struct pt_regs *regs)
>>  
>>  	/* What should we do here? We could issue a shutdown or hard reset. */
>>  }
>> +
>> +/*
>> + * This function is called in real mode. Strictly no printk's please.
>> + *
>> + * regs->nip and regs->msr contain srr0 and srr1.
>> + */
>> +long machine_check_early(struct pt_regs *regs)
>> +{
>> +	/* TODO: handle/decode machine check reason */
>> +	return 0;
>> +}
>> +
>>  #endif
>>  
>>  /*
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 7/9] powerpc: Decode and save machine check event.
  2013-08-08  5:14   ` Paul Mackerras
@ 2013-08-08 13:19     ` Mahesh Jagannath Salgaonkar
  2013-08-08 13:33       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 22+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2013-08-08 13:19 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard

On 08/08/2013 10:44 AM, Paul Mackerras wrote:
> On Wed, Aug 07, 2013 at 03:09:13PM +0530, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Now that we handle machine checks in Linux, the MCE decoding should also
>> take place in the Linux host. This info is crucial to log before we go
>> down in case we cannot handle the machine check errors. This patch decodes
>> and populates a machine check event which contains high-level, meaningful
>> MCE information.
> 
> A couple of things worry me about this patch:
> 
> First, there is the fact that we can only do get_mce_event() once for
> a given machine check.  You call it in kvmppc_realmode_mc_power7(),
> which is fine, but if it is not something we recognize and can handle
> we will proceed to exit the guest and jump to machine_check_fwnmi,
> which will then proceed to machine_check_common() and then
> opal_machine_check(), where you have added another call to
> get_mce_event(), which will probably underflow your little per-cpu
> stack of machine check events.

Ouch! I missed that. Will work on fixing it.
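To make the hazard concrete, here is a toy model of the per-cpu event
stack (names, sizes and types are illustrative only): one machine check
produces one save, so a second consumer for the same event finds the
stack empty.

```c
#include <assert.h>

/* Toy model of a small per-CPU stack of decoded MCE events. */
#define MAX_MC_EVT 2

struct mce_evt_stack {
	int depth;           /* number of saved, unconsumed events */
	int evt[MAX_MC_EVT]; /* stand-in for struct machine_check_event */
};

/* save_mce_event(): push a decoded event (drop on overflow). */
static void save_mce_event(struct mce_evt_stack *s, int evt)
{
	if (s->depth < MAX_MC_EVT)
		s->evt[s->depth++] = evt;
}

/*
 * get_mce_event(): pop the most recent event.  Returns 0 on an empty
 * stack -- consuming the event twice for a single machine check, as
 * in the kvmppc_realmode path followed by opal_machine_check(), hits
 * exactly this underflow.
 */
static int get_mce_event(struct mce_evt_stack *s, int *evt)
{
	if (s->depth == 0)
		return 0;	/* underflow: nothing left to consume */
	*evt = s->evt[--s->depth];
	return 1;
}
```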

> 
> Secondly, we shouldn't call save_mce_event() if we're not in
> hypervisor mode, since per-cpu variables are not in general accessible
> in real mode when running under a hypervisor with a limited real-mode
> area (RMA).

Does that mean we can never access per-cpu variables in real mode? Or
do I need some tweaks to access them?

Thanks,
-Mahesh.


* Re: [RFC PATCH 2/9] powerpc: handle machine check in Linux host.
  2013-08-08 13:19     ` Mahesh Jagannath Salgaonkar
@ 2013-08-08 13:33       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-08 13:33 UTC (permalink / raw)
  To: Mahesh Jagannath Salgaonkar
  Cc: linuxppc-dev, Paul Mackerras, Anton Blanchard, Jeremy Kerr

On Thu, 2013-08-08 at 18:49 +0530, Mahesh Jagannath Salgaonkar wrote:
> But, I think I should depend on
> evt->disposition==MCE_DISPOSITION_RECOVERED and not play with
> MSR_RI and SRR1. I will fix my code.

If MSR:RI is 0, then you have clobbered SRR0/SRR1 beyond repair and
probably cannot recover.

Ben.


* Re: [RFC PATCH 7/9] powerpc: Decode and save machine check event.
  2013-08-08 13:19     ` Mahesh Jagannath Salgaonkar
@ 2013-08-08 13:33       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-08 13:33 UTC (permalink / raw)
  To: Mahesh Jagannath Salgaonkar
  Cc: linuxppc-dev, Paul Mackerras, Anton Blanchard, Jeremy Kerr

On Thu, 2013-08-08 at 18:49 +0530, Mahesh Jagannath Salgaonkar wrote:
> > Secondly, we shouldn't call save_mce_event() if we're not in
> > hypervisor mode, since per-cpu variables are not in general accessible
> > in real mode when running under a hypervisor with a limited real-mode
> > area (RMA).
> 
> Does that mean we can never access per-cpu variables in real mode? Or
> do I need some tweaks to access them?

Right, all you can access is stuff that was specifically allocated to be
in the RMA during boot, such as the PACAs.
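That suggests keeping the pending event in the PACA rather than in a
per-cpu variable; a rough sketch of the idea (field and function names
below are hypothetical, not the actual kernel layout):

```c
#include <assert.h>

/* Hypothetical, simplified: real-mode-safe storage for a pending MCE. */
struct machine_check_event {
	unsigned int version;
	unsigned int error_type;
};

struct paca_struct {
	/* ... other fields ... */
	/* The PACA lives in the RMA, so it stays addressable with the
	 * MMU off, unlike general per-cpu data under a hypervisor. */
	struct machine_check_event mce_event;
	int mce_event_valid;
};

/* In real mode, only the PACA copy may be touched. */
static void save_mce_event_realmode(struct paca_struct *paca,
				    const struct machine_check_event *evt)
{
	paca->mce_event = *evt;
	paca->mce_event_valid = 1;
}
```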

Ben.


end of thread, other threads:[~2013-08-08 13:34 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-07  9:37 [RFC PATCH 0/9] Machine check handling in linux host Mahesh J Salgaonkar
2013-08-07  9:38 ` [RFC PATCH 1/9] powerpc: Split the common exception prolog logic into two section Mahesh J Salgaonkar
2013-08-08  4:10   ` Anshuman Khandual
2013-08-08  4:16     ` Benjamin Herrenschmidt
2013-08-07  9:38 ` [RFC PATCH 2/9] powerpc: handle machine check in Linux host Mahesh J Salgaonkar
2013-08-08  4:51   ` Paul Mackerras
2013-08-08 13:19     ` Mahesh Jagannath Salgaonkar
2013-08-08 13:33       ` Benjamin Herrenschmidt
2013-08-08  5:01   ` Anshuman Khandual
2013-08-07  9:38 ` [RFC PATCH 3/9] powerpc: Introduce a early machine check hook in cpu_spec Mahesh J Salgaonkar
2013-08-07  9:38 ` [RFC PATCH 4/9] powerpc: Add flush_tlb operation " Mahesh J Salgaonkar
2013-08-07  9:38 ` [RFC PATCH 5/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7 Mahesh J Salgaonkar
2013-08-08  4:58   ` Paul Mackerras
2013-08-07  9:39 ` [RFC PATCH 6/9] powerpc: Flush SLB/TLBs if we get SLB/TLB machine check errors on power8 Mahesh J Salgaonkar
2013-08-07  9:39 ` [RFC PATCH 7/9] powerpc: Decode and save machine check event Mahesh J Salgaonkar
2013-08-07 18:41   ` Scott Wood
2013-08-08  3:40     ` Mahesh Jagannath Salgaonkar
2013-08-08  5:14   ` Paul Mackerras
2013-08-08 13:19     ` Mahesh Jagannath Salgaonkar
2013-08-08 13:33       ` Benjamin Herrenschmidt
2013-08-07  9:39 ` [RFC PATCH 8/9] powerpc/powernv: Remove machine check handling in OPAL Mahesh J Salgaonkar
2013-08-07  9:39 ` [RFC PATCH 9/9] powerpc/powernv: Machine check exception handling Mahesh J Salgaonkar
