All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
@ 2016-01-13  7:19 ` Aravinda Prasad
  0 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-13  7:07 UTC (permalink / raw)
  To: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini
  Cc: mahesh, david, kvm, mpe

This patch introduces a new KVM capability to control
how KVM behaves on machine check exception (MCE).
Without this capability, KVM redirects machine check
exceptions to guest's 0x200 vector if the address in
error belongs to the guest. With this capability KVM
causes a guest exit with NMI exit reason.

This is required to avoid problems if a new kernel/KVM
is used with an old QEMU for guests that don't issue
"ibm,nmi-register". As old QEMU does not understand the
NMI exit type, it treats it as a fatal error. However,
the guest could have handled the machine check error
if the exception was delivered to guest's 0x200 interrupt
vector instead of NMI exit in case of old QEMU.

QEMU part can be found at:
http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html

Change Log v3:
  - Split the patch into 2. First patch introduces the
    new capability while the second one enhances KVM to
    redirect MCE.
  - Fix access width bug
  - Rebased to v4.4-rc7

Change Log v2:
  - Added KVM capability

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h |    1 +
 arch/powerpc/kernel/asm-offsets.c   |    1 +
 arch/powerpc/kvm/powerpc.c          |    7 +++++++
 include/uapi/linux/kvm.h            |    1 +
 4 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index cfa758c..9ac2b84 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -243,6 +243,7 @@ struct kvm_arch {
 	int hpt_cma_alloc;
 	struct dentry *debugfs_dir;
 	struct dentry *htab_dentry;
+	u8 fwnmi_enabled;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	struct mutex hpt_mutex;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 221d584..6a4e81a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,7 @@ int main(void)
 	DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
 	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
 	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+	DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
 	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
 	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
 	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6fd2405..a8399b5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -570,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 #endif
+	case KVM_CAP_PPC_FWNMI:
+		r = 1;
+		break;
 	default:
 		r = 0;
 		break;
@@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		break;
 	}
 #endif /* CONFIG_KVM_XICS */
+	case KVM_CAP_PPC_FWNMI:
+		r = 0;
+		vcpu->kvm->arch.fwnmi_enabled = true;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 03f3618..d8a07b5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -831,6 +831,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
 #define KVM_CAP_SPLIT_IRQCHIP 121
 #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
+#define KVM_CAP_PPC_FWNMI 123
 
 #ifdef KVM_CAP_IRQ_ROUTING
 


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
  2016-01-13  7:19 ` Aravinda Prasad
@ 2016-01-13  7:20   ` Aravinda Prasad
  -1 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-13  7:08 UTC (permalink / raw)
  To: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini
  Cc: mahesh, david, kvm, mpe

Enhance KVM to cause a guest exit with KVM_EXIT_NMI
exit reasons upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivering of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector. In the earlier approach QEMU was enhanced to
patch the 0x200 interrupt vector during boot. The
patched code at 0x200 issued a private hcall to pass
the control to QEMU to build the error log.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c            |   12 ++------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   48 +++++++++++++++----------------
 2 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a7352b5..4fa03d0 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -858,15 +858,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		r = RESUME_GUEST;
 		break;
 	case BOOK3S_INTERRUPT_MACHINE_CHECK:
-		/*
-		 * Deliver a machine check interrupt to the guest.
-		 * We have to do this, even if the host has handled the
-		 * machine check, because machine checks use SRR0/1 and
-		 * the interrupt might have trashed guest state in them.
-		 */
-		kvmppc_book3s_queue_irqprio(vcpu,
-					    BOOK3S_INTERRUPT_MACHINE_CHECK);
-		r = RESUME_GUEST;
+		/* Exit to guest with KVM_EXIT_NMI as exit reason */
+		run->exit_reason = KVM_EXIT_NMI;
+		r = RESUME_HOST;
 		break;
 	case BOOK3S_INTERRUPT_PROGRAM:
 	{
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..84e32a3 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	stb	r0, HSTATE_HWTHREAD_REQ(r13)
 
 	/*
-	 * For external and machine check interrupts, we need
-	 * to call the Linux handler to process the interrupt.
-	 * We do that by jumping to absolute address 0x500 for
-	 * external interrupts, or the machine_check_fwnmi label
-	 * for machine checks (since firmware might have patched
-	 * the vector area at 0x200).  The [h]rfid at the end of the
-	 * handler will return to the book3s_hv_interrupts.S code.
-	 * For other interrupts we do the rfid to get back
-	 * to the book3s_hv_interrupts.S code here.
+	 * For external interrupts we need to call the Linux
+	 * handler to process the interrupt. We do that by jumping
+	 * to absolute address 0x500 for external interrupts.
+	 * The [h]rfid at the end of the handler will return to
+	 * the book3s_hv_interrupts.S code. For other interrupts
+	 * we do the rfid to get back to the book3s_hv_interrupts.S
+	 * code here.
 	 */
 	ld	r8, 112+PPC_LR_STKOFF(r1)
 	addi	r1, r1, 112
 	ld	r7, HSTATE_HOST_MSR(r13)
 
-	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
 	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
 	beq	11f
 	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
@@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtmsrd	r6, 1			/* Clear RI in MSR */
 	mtsrr0	r8
 	mtsrr1	r7
-	beq	cr1, 13f		/* machine check */
 	RFI
 
 	/* On POWER7, we have external interrupts set to use HSRR0/1 */
@@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_HSRR1, r7
 	ba	0x500
 
-13:	b	machine_check_fwnmi
-
 14:	mtspr	SPRN_HSRR0, r8
 	mtspr	SPRN_HSRR1, r7
 	b	hmi_exception_after_realmode
@@ -2390,15 +2384,13 @@ machine_check_realmode:
 	ld	r9, HSTATE_KVM_VCPU(r13)
 	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
 	/*
-	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
-	 * machine check interrupt (set HSRR0 to 0x200). And for handled
-	 * errors (no-fatal), just go back to guest execution with current
-	 * HSRR0 instead of exiting guest. This new approach will inject
-	 * machine check to guest for fatal error causing guest to crash.
-	 *
-	 * The old code used to return to host for unhandled errors which
-	 * was causing guest to hang with soft lockups inside guest and
-	 * makes it difficult to recover guest instance.
+	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either
+	 * through machine check interrupt (set HSRR0 to 0x200) or by
+	 * exiting the guest with KVM_EXIT_NMI exit reason if guest is
+	 * FWNMI capable. For handled errors (no-fatal), just go back
+	 * to guest execution with current HSRR0. This new approach
+	 * injects machine check errors in guest address space to guest
+	 * enabling guest kernel to suitably handle such errors.
 	 *
 	 * if we receive machine check with MSR(RI=0) then deliver it to
 	 * guest as machine check causing guest to crash.
@@ -2408,11 +2400,17 @@ machine_check_realmode:
 	beq	1f			/* Deliver a machine check to guest */
 	ld	r10, VCPU_PC(r9)
 	cmpdi	r3, 0		/* Did we handle MCE ? */
-	bne	2f	/* Continue guest execution. */
+	bne	3f	/* Continue guest execution. */
 	/* If not, deliver a machine check.  SRR0/1 are already set */
-1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
+1:  /* Check if guest is capable of handling NMI exit */
+	ld  r3, VCPU_KVM(r9)
+	lbz  r3, KVM_FWNMI(r3)
+	cmpdi   r3, 1       /* FWNMI capable? */
+	bne 2f
+	b   mc_cont
+2:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
 	bl	kvmppc_msr_interrupt
-2:	b	fast_interrupt_c_return
+3:	b	fast_interrupt_c_return
 
 /*
  * Check the reason we woke from nap, and take appropriate action.


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
@ 2016-01-13  7:19 ` Aravinda Prasad
  0 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-13  7:19 UTC (permalink / raw)
  To: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini
  Cc: mahesh, david, kvm, mpe

This patch introduces a new KVM capability to control
how KVM behaves on machine check exception (MCE).
Without this capability, KVM redirects machine check
exceptions to guest's 0x200 vector if the address in
error belongs to the guest. With this capability KVM
causes a guest exit with NMI exit reason.

This is required to avoid problems if a new kernel/KVM
is used with an old QEMU for guests that don't issue
"ibm,nmi-register". As old QEMU does not understand the
NMI exit type, it treats it as a fatal error. However,
the guest could have handled the machine check error
if the exception was delivered to guest's 0x200 interrupt
vector instead of NMI exit in case of old QEMU.

QEMU part can be found at:
http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html

Change Log v3:
  - Split the patch into 2. First patch introduces the
    new capability while the second one enhances KVM to
    redirect MCE.
  - Fix access width bug
  - Rebased to v4.4-rc7

Change Log v2:
  - Added KVM capability

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h |    1 +
 arch/powerpc/kernel/asm-offsets.c   |    1 +
 arch/powerpc/kvm/powerpc.c          |    7 +++++++
 include/uapi/linux/kvm.h            |    1 +
 4 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index cfa758c..9ac2b84 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -243,6 +243,7 @@ struct kvm_arch {
 	int hpt_cma_alloc;
 	struct dentry *debugfs_dir;
 	struct dentry *htab_dentry;
+	u8 fwnmi_enabled;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	struct mutex hpt_mutex;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 221d584..6a4e81a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,7 @@ int main(void)
 	DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
 	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
 	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+	DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
 	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
 	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
 	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6fd2405..a8399b5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -570,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 #endif
+	case KVM_CAP_PPC_FWNMI:
+		r = 1;
+		break;
 	default:
 		r = 0;
 		break;
@@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		break;
 	}
 #endif /* CONFIG_KVM_XICS */
+	case KVM_CAP_PPC_FWNMI:
+		r = 0;
+		vcpu->kvm->arch.fwnmi_enabled = true;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 03f3618..d8a07b5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -831,6 +831,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
 #define KVM_CAP_SPLIT_IRQCHIP 121
 #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
+#define KVM_CAP_PPC_FWNMI 123
 
 #ifdef KVM_CAP_IRQ_ROUTING
 


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-01-13  7:20   ` Aravinda Prasad
  0 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-13  7:20 UTC (permalink / raw)
  To: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini
  Cc: mahesh, david, kvm, mpe

Enhance KVM to cause a guest exit with KVM_EXIT_NMI
exit reasons upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivering of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector. In the earlier approach QEMU was enhanced to
patch the 0x200 interrupt vector during boot. The
patched code at 0x200 issued a private hcall to pass
the control to QEMU to build the error log.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c            |   12 ++------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   48 +++++++++++++++----------------
 2 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a7352b5..4fa03d0 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -858,15 +858,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		r = RESUME_GUEST;
 		break;
 	case BOOK3S_INTERRUPT_MACHINE_CHECK:
-		/*
-		 * Deliver a machine check interrupt to the guest.
-		 * We have to do this, even if the host has handled the
-		 * machine check, because machine checks use SRR0/1 and
-		 * the interrupt might have trashed guest state in them.
-		 */
-		kvmppc_book3s_queue_irqprio(vcpu,
-					    BOOK3S_INTERRUPT_MACHINE_CHECK);
-		r = RESUME_GUEST;
+		/* Exit to guest with KVM_EXIT_NMI as exit reason */
+		run->exit_reason = KVM_EXIT_NMI;
+		r = RESUME_HOST;
 		break;
 	case BOOK3S_INTERRUPT_PROGRAM:
 	{
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..84e32a3 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	stb	r0, HSTATE_HWTHREAD_REQ(r13)
 
 	/*
-	 * For external and machine check interrupts, we need
-	 * to call the Linux handler to process the interrupt.
-	 * We do that by jumping to absolute address 0x500 for
-	 * external interrupts, or the machine_check_fwnmi label
-	 * for machine checks (since firmware might have patched
-	 * the vector area at 0x200).  The [h]rfid at the end of the
-	 * handler will return to the book3s_hv_interrupts.S code.
-	 * For other interrupts we do the rfid to get back
-	 * to the book3s_hv_interrupts.S code here.
+	 * For external interrupts we need to call the Linux
+	 * handler to process the interrupt. We do that by jumping
+	 * to absolute address 0x500 for external interrupts.
+	 * The [h]rfid at the end of the handler will return to
+	 * the book3s_hv_interrupts.S code. For other interrupts
+	 * we do the rfid to get back to the book3s_hv_interrupts.S
+	 * code here.
 	 */
 	ld	r8, 112+PPC_LR_STKOFF(r1)
 	addi	r1, r1, 112
 	ld	r7, HSTATE_HOST_MSR(r13)
 
-	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
 	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
 	beq	11f
 	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
@@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtmsrd	r6, 1			/* Clear RI in MSR */
 	mtsrr0	r8
 	mtsrr1	r7
-	beq	cr1, 13f		/* machine check */
 	RFI
 
 	/* On POWER7, we have external interrupts set to use HSRR0/1 */
@@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_HSRR1, r7
 	ba	0x500
 
-13:	b	machine_check_fwnmi
-
 14:	mtspr	SPRN_HSRR0, r8
 	mtspr	SPRN_HSRR1, r7
 	b	hmi_exception_after_realmode
@@ -2390,15 +2384,13 @@ machine_check_realmode:
 	ld	r9, HSTATE_KVM_VCPU(r13)
 	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
 	/*
-	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
-	 * machine check interrupt (set HSRR0 to 0x200). And for handled
-	 * errors (no-fatal), just go back to guest execution with current
-	 * HSRR0 instead of exiting guest. This new approach will inject
-	 * machine check to guest for fatal error causing guest to crash.
-	 *
-	 * The old code used to return to host for unhandled errors which
-	 * was causing guest to hang with soft lockups inside guest and
-	 * makes it difficult to recover guest instance.
+	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either
+	 * through machine check interrupt (set HSRR0 to 0x200) or by
+	 * exiting the guest with KVM_EXIT_NMI exit reason if guest is
+	 * FWNMI capable. For handled errors (no-fatal), just go back
+	 * to guest execution with current HSRR0. This new approach
+	 * injects machine check errors in guest address space to guest
+	 * enabling guest kernel to suitably handle such errors.
 	 *
 	 * if we receive machine check with MSR(RI=0) then deliver it to
 	 * guest as machine check causing guest to crash.
@@ -2408,11 +2400,17 @@ machine_check_realmode:
 	beq	1f			/* Deliver a machine check to guest */
 	ld	r10, VCPU_PC(r9)
 	cmpdi	r3, 0		/* Did we handle MCE ? */
-	bne	2f	/* Continue guest execution. */
+	bne	3f	/* Continue guest execution. */
 	/* If not, deliver a machine check.  SRR0/1 are already set */
-1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
+1:  /* Check if guest is capable of handling NMI exit */
+	ld  r3, VCPU_KVM(r9)
+	lbz  r3, KVM_FWNMI(r3)
+	cmpdi   r3, 1       /* FWNMI capable? */
+	bne 2f
+	b   mc_cont
+2:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
 	bl	kvmppc_msr_interrupt
-2:	b	fast_interrupt_c_return
+3:	b	fast_interrupt_c_return
 
 /*
  * Check the reason we woke from nap, and take appropriate action.


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
  2016-01-13  7:19 ` Aravinda Prasad
@ 2016-01-14  0:02   ` David Gibson
  -1 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2016-01-14  0:02 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini, mahesh, kvm, mpe

[-- Attachment #1: Type: text/plain, Size: 4067 bytes --]

On Wed, Jan 13, 2016 at 12:37:59PM +0530, Aravinda Prasad wrote:
> This patch introduces a new KVM capability to control
> how KVM behaves on machine check exception (MCE).
> Without this capability, KVM redirects machine check
> exceptions to guest's 0x200 vector if the address in
> error belongs to the guest. With this capability KVM
> causes a guest exit with NMI exit reason.
> 
> This is required to avoid problems if a new kernel/KVM
> is used with an old QEMU for guests that don't issue
> "ibm,nmi-register". As old QEMU does not understand the
> NMI exit type, it treats it as a fatal error. However,
> the guest could have handled the machine check error
> if the exception was delivered to guest's 0x200 interrupt
> vector instead of NMI exit in case of old QEMU.
> 
> QEMU part can be found at:
> http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html
> 
> Change Log v3:
>   - Split the patch into 2. First patch introduces the
>     new capability while the second one enhances KVM to
>     redirect MCE.
>   - Fix access width bug
>   - Rebased to v4.4-rc7
> 
> Change Log v2:
>   - Added KVM capability
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_host.h |    1 +
>  arch/powerpc/kernel/asm-offsets.c   |    1 +
>  arch/powerpc/kvm/powerpc.c          |    7 +++++++
>  include/uapi/linux/kvm.h            |    1 +
>  4 files changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index cfa758c..9ac2b84 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -243,6 +243,7 @@ struct kvm_arch {
>  	int hpt_cma_alloc;
>  	struct dentry *debugfs_dir;
>  	struct dentry *htab_dentry;
> +	u8 fwnmi_enabled;

Um.. I don't see anything in this patch or 2/2 which actually tests
this flag...

>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>  	struct mutex hpt_mutex;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 221d584..6a4e81a 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -506,6 +506,7 @@ int main(void)
>  	DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
>  	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
>  	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
> +	DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
>  	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
>  	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
>  	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 6fd2405..a8399b5 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -570,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = 1;
>  		break;
>  #endif
> +	case KVM_CAP_PPC_FWNMI:
> +		r = 1;
> +		break;
>  	default:
>  		r = 0;
>  		break;
> @@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  		break;
>  	}
>  #endif /* CONFIG_KVM_XICS */
> +	case KVM_CAP_PPC_FWNMI:
> +		r = 0;
> +		vcpu->kvm->arch.fwnmi_enabled = true;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 03f3618..d8a07b5 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -831,6 +831,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
>  #define KVM_CAP_SPLIT_IRQCHIP 121
>  #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
> +#define KVM_CAP_PPC_FWNMI 123
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
@ 2016-01-14  0:02   ` David Gibson
  0 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2016-01-14  0:02 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini, mahesh, kvm, mpe

[-- Attachment #1: Type: text/plain, Size: 4067 bytes --]

On Wed, Jan 13, 2016 at 12:37:59PM +0530, Aravinda Prasad wrote:
> This patch introduces a new KVM capability to control
> how KVM behaves on machine check exception (MCE).
> Without this capability, KVM redirects machine check
> exceptions to guest's 0x200 vector if the address in
> error belongs to the guest. With this capability KVM
> causes a guest exit with NMI exit reason.
> 
> This is required to avoid problems if a new kernel/KVM
> is used with an old QEMU for guests that don't issue
> "ibm,nmi-register". As old QEMU does not understand the
> NMI exit type, it treats it as a fatal error. However,
> the guest could have handled the machine check error
> if the exception was delivered to guest's 0x200 interrupt
> vector instead of NMI exit in case of old QEMU.
> 
> QEMU part can be found at:
> http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html
> 
> Change Log v3:
>   - Split the patch into 2. First patch introduces the
>     new capability while the second one enhances KVM to
>     redirect MCE.
>   - Fix access width bug
>   - Rebased to v4.4-rc7
> 
> Change Log v2:
>   - Added KVM capability
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_host.h |    1 +
>  arch/powerpc/kernel/asm-offsets.c   |    1 +
>  arch/powerpc/kvm/powerpc.c          |    7 +++++++
>  include/uapi/linux/kvm.h            |    1 +
>  4 files changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index cfa758c..9ac2b84 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -243,6 +243,7 @@ struct kvm_arch {
>  	int hpt_cma_alloc;
>  	struct dentry *debugfs_dir;
>  	struct dentry *htab_dentry;
> +	u8 fwnmi_enabled;

Um.. I don't see anything in this patch or 2/2 which actually tests
this flag...

>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>  	struct mutex hpt_mutex;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 221d584..6a4e81a 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -506,6 +506,7 @@ int main(void)
>  	DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
>  	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
>  	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
> +	DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
>  	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
>  	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
>  	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 6fd2405..a8399b5 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -570,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = 1;
>  		break;
>  #endif
> +	case KVM_CAP_PPC_FWNMI:
> +		r = 1;
> +		break;
>  	default:
>  		r = 0;
>  		break;
> @@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  		break;
>  	}
>  #endif /* CONFIG_KVM_XICS */
> +	case KVM_CAP_PPC_FWNMI:
> +		r = 0;
> +		vcpu->kvm->arch.fwnmi_enabled = true;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 03f3618..d8a07b5 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -831,6 +831,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
>  #define KVM_CAP_SPLIT_IRQCHIP 121
>  #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
> +#define KVM_CAP_PPC_FWNMI 123
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
  2016-01-14  0:02   ` David Gibson
@ 2016-01-14  0:05     ` David Gibson
  -1 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2016-01-14  0:05 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini, mahesh, kvm, mpe

[-- Attachment #1: Type: text/plain, Size: 4455 bytes --]

On Thu, Jan 14, 2016 at 11:02:39AM +1100, David Gibson wrote:
> On Wed, Jan 13, 2016 at 12:37:59PM +0530, Aravinda Prasad wrote:
> > This patch introduces a new KVM capability to control
> > how KVM behaves on machine check exception (MCE).
> > Without this capability, KVM redirects machine check
> > exceptions to guest's 0x200 vector if the address in
> > error belongs to the guest. With this capability KVM
> > causes a guest exit with NMI exit reason.
> > 
> > This is required to avoid problems if a new kernel/KVM
> > is used with an old QEMU for guests that don't issue
> > "ibm,nmi-register". As old QEMU does not understand the
> > NMI exit type, it treats it as a fatal error. However,
> > the guest could have handled the machine check error
> > if the exception was delivered to guest's 0x200 interrupt
> > vector instead of NMI exit in case of old QEMU.
> > 
> > QEMU part can be found at:
> > http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html
> > 
> > Change Log v3:
> >   - Split the patch into 2. First patch introduces the
> >     new capability while the second one enhances KVM to
> >     redirect MCE.
> >   - Fix access width bug
> >   - Rebased to v4.4-rc7
> > 
> > Change Log v2:
> >   - Added KVM capability
> > 
> > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/kvm_host.h |    1 +
> >  arch/powerpc/kernel/asm-offsets.c   |    1 +
> >  arch/powerpc/kvm/powerpc.c          |    7 +++++++
> >  include/uapi/linux/kvm.h            |    1 +
> >  4 files changed, 10 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> > index cfa758c..9ac2b84 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -243,6 +243,7 @@ struct kvm_arch {
> >  	int hpt_cma_alloc;
> >  	struct dentry *debugfs_dir;
> >  	struct dentry *htab_dentry;
> > +	u8 fwnmi_enabled;
> 
> Um.. I don't see anything in this patch or 2/2 which actually tests
> this flag...

Sorry, I missed it in the asm, spotted it now.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> 
> >  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> >  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
> >  	struct mutex hpt_mutex;
> > diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> > index 221d584..6a4e81a 100644
> > --- a/arch/powerpc/kernel/asm-offsets.c
> > +++ b/arch/powerpc/kernel/asm-offsets.c
> > @@ -506,6 +506,7 @@ int main(void)
> >  	DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
> >  	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
> >  	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
> > +	DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
> >  	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
> >  	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
> >  	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index 6fd2405..a8399b5 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -570,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  		r = 1;
> >  		break;
> >  #endif
> > +	case KVM_CAP_PPC_FWNMI:
> > +		r = 1;
> > +		break;
> >  	default:
> >  		r = 0;
> >  		break;
> > @@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
> >  		break;
> >  	}
> >  #endif /* CONFIG_KVM_XICS */
> > +	case KVM_CAP_PPC_FWNMI:
> > +		r = 0;
> > +		vcpu->kvm->arch.fwnmi_enabled = true;
> > +		break;
> >  	default:
> >  		r = -EINVAL;
> >  		break;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 03f3618..d8a07b5 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -831,6 +831,7 @@ struct kvm_ppc_smmu_info {
> >  #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
> >  #define KVM_CAP_SPLIT_IRQCHIP 121
> >  #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
> > +#define KVM_CAP_PPC_FWNMI 123
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > 
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
@ 2016-01-14  0:05     ` David Gibson
  0 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2016-01-14  0:05 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini, mahesh, kvm, mpe

[-- Attachment #1: Type: text/plain, Size: 4455 bytes --]

On Thu, Jan 14, 2016 at 11:02:39AM +1100, David Gibson wrote:
> On Wed, Jan 13, 2016 at 12:37:59PM +0530, Aravinda Prasad wrote:
> > This patch introduces a new KVM capability to control
> > how KVM behaves on machine check exception (MCE).
> > Without this capability, KVM redirects machine check
> > exceptions to guest's 0x200 vector if the address in
> > error belongs to the guest. With this capability KVM
> > causes a guest exit with NMI exit reason.
> > 
> > This is required to avoid problems if a new kernel/KVM
> > is used with an old QEMU for guests that don't issue
> > "ibm,nmi-register". As old QEMU does not understand the
> > NMI exit type, it treats it as a fatal error. However,
> > the guest could have handled the machine check error
> > if the exception was delivered to guest's 0x200 interrupt
> > vector instead of NMI exit in case of old QEMU.
> > 
> > QEMU part can be found at:
> > http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html
> > 
> > Change Log v3:
> >   - Split the patch into 2. First patch introduces the
> >     new capability while the second one enhances KVM to
> >     redirect MCE.
> >   - Fix access width bug
> >   - Rebased to v4.4-rc7
> > 
> > Change Log v2:
> >   - Added KVM capability
> > 
> > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/kvm_host.h |    1 +
> >  arch/powerpc/kernel/asm-offsets.c   |    1 +
> >  arch/powerpc/kvm/powerpc.c          |    7 +++++++
> >  include/uapi/linux/kvm.h            |    1 +
> >  4 files changed, 10 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> > index cfa758c..9ac2b84 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -243,6 +243,7 @@ struct kvm_arch {
> >  	int hpt_cma_alloc;
> >  	struct dentry *debugfs_dir;
> >  	struct dentry *htab_dentry;
> > +	u8 fwnmi_enabled;
> 
> Um.. I don't see anything in this patch or 2/2 which actually tests
> this flag...

Sorry, I missed it in the asm, spotted it now.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> 
> >  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> >  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
> >  	struct mutex hpt_mutex;
> > diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> > index 221d584..6a4e81a 100644
> > --- a/arch/powerpc/kernel/asm-offsets.c
> > +++ b/arch/powerpc/kernel/asm-offsets.c
> > @@ -506,6 +506,7 @@ int main(void)
> >  	DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
> >  	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
> >  	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
> > +	DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
> >  	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
> >  	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
> >  	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index 6fd2405..a8399b5 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -570,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  		r = 1;
> >  		break;
> >  #endif
> > +	case KVM_CAP_PPC_FWNMI:
> > +		r = 1;
> > +		break;
> >  	default:
> >  		r = 0;
> >  		break;
> > @@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
> >  		break;
> >  	}
> >  #endif /* CONFIG_KVM_XICS */
> > +	case KVM_CAP_PPC_FWNMI:
> > +		r = 0;
> > +		vcpu->kvm->arch.fwnmi_enabled = true;
> > +		break;
> >  	default:
> >  		r = -EINVAL;
> >  		break;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 03f3618..d8a07b5 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -831,6 +831,7 @@ struct kvm_ppc_smmu_info {
> >  #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
> >  #define KVM_CAP_SPLIT_IRQCHIP 121
> >  #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
> > +#define KVM_CAP_PPC_FWNMI 123
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > 
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
  2016-01-13  7:20   ` Aravinda Prasad
@ 2016-01-14  0:06     ` David Gibson
  -1 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2016-01-14  0:06 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini, mahesh, kvm, mpe

[-- Attachment #1: Type: text/plain, Size: 6684 bytes --]

On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> exit reasons upon a machine check exception (MCE) in
> the guest address space if the KVM_CAP_PPC_FWNMI
> capability is enabled (instead of delivering 0x200
> interrupt to guest). This enables QEMU to build error
> log and deliver machine check exception to guest via
> guest registered machine check handler.
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier
> approach of KVM directly invoking 0x200 guest interrupt
> vector. In the earlier approach QEMU was enhanced to
> patch the 0x200 interrupt vector during boot. The
> patched code at 0x200 issued a private hcall to pass
> the control to QEMU to build the error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_hv.c            |   12 ++------
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   48 +++++++++++++++----------------
>  2 files changed, 26 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index a7352b5..4fa03d0 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -858,15 +858,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  		r = RESUME_GUEST;
>  		break;
>  	case BOOK3S_INTERRUPT_MACHINE_CHECK:
> -		/*
> -		 * Deliver a machine check interrupt to the guest.
> -		 * We have to do this, even if the host has handled the
> -		 * machine check, because machine checks use SRR0/1 and
> -		 * the interrupt might have trashed guest state in them.
> -		 */
> -		kvmppc_book3s_queue_irqprio(vcpu,
> -					    BOOK3S_INTERRUPT_MACHINE_CHECK);
> -		r = RESUME_GUEST;
> +		/* Exit to guest with KVM_EXIT_NMI as exit reason */
> +		run->exit_reason = KVM_EXIT_NMI;
> +		r = RESUME_HOST;
>  		break;
>  	case BOOK3S_INTERRUPT_PROGRAM:
>  	{
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 3c6badc..84e32a3 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>  
>  	/*
> -	 * For external and machine check interrupts, we need
> -	 * to call the Linux handler to process the interrupt.
> -	 * We do that by jumping to absolute address 0x500 for
> -	 * external interrupts, or the machine_check_fwnmi label
> -	 * for machine checks (since firmware might have patched
> -	 * the vector area at 0x200).  The [h]rfid at the end of the
> -	 * handler will return to the book3s_hv_interrupts.S code.
> -	 * For other interrupts we do the rfid to get back
> -	 * to the book3s_hv_interrupts.S code here.
> +	 * For external interrupts we need to call the Linux
> +	 * handler to process the interrupt. We do that by jumping
> +	 * to absolute address 0x500 for external interrupts.
> +	 * The [h]rfid at the end of the handler will return to
> +	 * the book3s_hv_interrupts.S code. For other interrupts
> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
> +	 * code here.
>  	 */
>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>  	addi	r1, r1, 112
>  	ld	r7, HSTATE_HOST_MSR(r13)
>  
> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>  	beq	11f
>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>  	mtsrr0	r8
>  	mtsrr1	r7
> -	beq	cr1, 13f		/* machine check */
>  	RFI
>  
>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_HSRR1, r7
>  	ba	0x500
>  
> -13:	b	machine_check_fwnmi
> -
>  14:	mtspr	SPRN_HSRR0, r8
>  	mtspr	SPRN_HSRR1, r7
>  	b	hmi_exception_after_realmode
> @@ -2390,15 +2384,13 @@ machine_check_realmode:
>  	ld	r9, HSTATE_KVM_VCPU(r13)
>  	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	/*
> -	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
> -	 * machine check interrupt (set HSRR0 to 0x200). And for handled
> -	 * errors (no-fatal), just go back to guest execution with current
> -	 * HSRR0 instead of exiting guest. This new approach will inject
> -	 * machine check to guest for fatal error causing guest to crash.
> -	 *
> -	 * The old code used to return to host for unhandled errors which
> -	 * was causing guest to hang with soft lockups inside guest and
> -	 * makes it difficult to recover guest instance.
> +	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either
> +	 * through machine check interrupt (set HSRR0 to 0x200) or by
> +	 * exiting the guest with KVM_EXIT_NMI exit reason if guest is
> +	 * FWNMI capable. For handled errors (no-fatal), just go back
> +	 * to guest execution with current HSRR0. This new approach
> +	 * injects machine check errors in guest address space to guest
> +	 * enabling guest kernel to suitably handle such errors.
>  	 *
>  	 * if we receive machine check with MSR(RI=0) then deliver it to
>  	 * guest as machine check causing guest to crash.
> @@ -2408,11 +2400,17 @@ machine_check_realmode:
>  	beq	1f			/* Deliver a machine check to guest */
>  	ld	r10, VCPU_PC(r9)
>  	cmpdi	r3, 0		/* Did we handle MCE ? */
> -	bne	2f	/* Continue guest execution. */
> +	bne	3f	/* Continue guest execution. */
>  	/* If not, deliver a machine check.  SRR0/1 are already set */
> -1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
> +1:  /* Check if guest is capable of handling NMI exit */
> +	ld  r3, VCPU_KVM(r9)
> +	lbz  r3, KVM_FWNMI(r3)
> +	cmpdi   r3, 1       /* FWNMI capable? */
> +	bne 2f
> +	b   mc_cont
> +2:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	bl	kvmppc_msr_interrupt
> -2:	b	fast_interrupt_c_return
> +3:	b	fast_interrupt_c_return
>  
>  /*
>   * Check the reason we woke from nap, and take appropriate action.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-01-14  0:06     ` David Gibson
  0 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2016-01-14  0:06 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, paulus, linuxppc-dev, pbonzini, mahesh, kvm, mpe

[-- Attachment #1: Type: text/plain, Size: 6684 bytes --]

On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> exit reasons upon a machine check exception (MCE) in
> the guest address space if the KVM_CAP_PPC_FWNMI
> capability is enabled (instead of delivering 0x200
> interrupt to guest). This enables QEMU to build error
> log and deliver machine check exception to guest via
> guest registered machine check handler.
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier
> approach of KVM directly invoking 0x200 guest interrupt
> vector. In the earlier approach QEMU was enhanced to
> patch the 0x200 interrupt vector during boot. The
> patched code at 0x200 issued a private hcall to pass
> the control to QEMU to build the error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_hv.c            |   12 ++------
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   48 +++++++++++++++----------------
>  2 files changed, 26 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index a7352b5..4fa03d0 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -858,15 +858,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  		r = RESUME_GUEST;
>  		break;
>  	case BOOK3S_INTERRUPT_MACHINE_CHECK:
> -		/*
> -		 * Deliver a machine check interrupt to the guest.
> -		 * We have to do this, even if the host has handled the
> -		 * machine check, because machine checks use SRR0/1 and
> -		 * the interrupt might have trashed guest state in them.
> -		 */
> -		kvmppc_book3s_queue_irqprio(vcpu,
> -					    BOOK3S_INTERRUPT_MACHINE_CHECK);
> -		r = RESUME_GUEST;
> +		/* Exit to guest with KVM_EXIT_NMI as exit reason */
> +		run->exit_reason = KVM_EXIT_NMI;
> +		r = RESUME_HOST;
>  		break;
>  	case BOOK3S_INTERRUPT_PROGRAM:
>  	{
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 3c6badc..84e32a3 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>  
>  	/*
> -	 * For external and machine check interrupts, we need
> -	 * to call the Linux handler to process the interrupt.
> -	 * We do that by jumping to absolute address 0x500 for
> -	 * external interrupts, or the machine_check_fwnmi label
> -	 * for machine checks (since firmware might have patched
> -	 * the vector area at 0x200).  The [h]rfid at the end of the
> -	 * handler will return to the book3s_hv_interrupts.S code.
> -	 * For other interrupts we do the rfid to get back
> -	 * to the book3s_hv_interrupts.S code here.
> +	 * For external interrupts we need to call the Linux
> +	 * handler to process the interrupt. We do that by jumping
> +	 * to absolute address 0x500 for external interrupts.
> +	 * The [h]rfid at the end of the handler will return to
> +	 * the book3s_hv_interrupts.S code. For other interrupts
> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
> +	 * code here.
>  	 */
>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>  	addi	r1, r1, 112
>  	ld	r7, HSTATE_HOST_MSR(r13)
>  
> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>  	beq	11f
>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>  	mtsrr0	r8
>  	mtsrr1	r7
> -	beq	cr1, 13f		/* machine check */
>  	RFI
>  
>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_HSRR1, r7
>  	ba	0x500
>  
> -13:	b	machine_check_fwnmi
> -
>  14:	mtspr	SPRN_HSRR0, r8
>  	mtspr	SPRN_HSRR1, r7
>  	b	hmi_exception_after_realmode
> @@ -2390,15 +2384,13 @@ machine_check_realmode:
>  	ld	r9, HSTATE_KVM_VCPU(r13)
>  	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	/*
> -	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
> -	 * machine check interrupt (set HSRR0 to 0x200). And for handled
> -	 * errors (no-fatal), just go back to guest execution with current
> -	 * HSRR0 instead of exiting guest. This new approach will inject
> -	 * machine check to guest for fatal error causing guest to crash.
> -	 *
> -	 * The old code used to return to host for unhandled errors which
> -	 * was causing guest to hang with soft lockups inside guest and
> -	 * makes it difficult to recover guest instance.
> +	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either
> +	 * through machine check interrupt (set HSRR0 to 0x200) or by
> +	 * exiting the guest with KVM_EXIT_NMI exit reason if guest is
> +	 * FWNMI capable. For handled errors (no-fatal), just go back
> +	 * to guest execution with current HSRR0. This new approach
> +	 * injects machine check errors in guest address space to guest
> +	 * enabling guest kernel to suitably handle such errors.
>  	 *
>  	 * if we receive machine check with MSR(RI=0) then deliver it to
>  	 * guest as machine check causing guest to crash.
> @@ -2408,11 +2400,17 @@ machine_check_realmode:
>  	beq	1f			/* Deliver a machine check to guest */
>  	ld	r10, VCPU_PC(r9)
>  	cmpdi	r3, 0		/* Did we handle MCE ? */
> -	bne	2f	/* Continue guest execution. */
> +	bne	3f	/* Continue guest execution. */
>  	/* If not, deliver a machine check.  SRR0/1 are already set */
> -1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
> +1:  /* Check if guest is capable of handling NMI exit */
> +	ld  r3, VCPU_KVM(r9)
> +	lbz  r3, KVM_FWNMI(r3)
> +	cmpdi   r3, 1       /* FWNMI capable? */
> +	bne 2f
> +	b   mc_cont
> +2:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	bl	kvmppc_msr_interrupt
> -2:	b	fast_interrupt_c_return
> +3:	b	fast_interrupt_c_return
>  
>  /*
>   * Check the reason we woke from nap, and take appropriate action.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
  2016-01-13  7:19 ` Aravinda Prasad
@ 2016-01-23 10:20   ` Paul Mackerras
  -1 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-01-23 10:20 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe

On Wed, Jan 13, 2016 at 12:37:59PM +0530, Aravinda Prasad wrote:
> This patch introduces a new KVM capability to control
> how KVM behaves on machine check exception (MCE).
> Without this capability, KVM redirects machine check
> exceptions to guest's 0x200 vector if the address in
> error belongs to the guest. With this capability KVM
> causes a guest exit with NMI exit reason.
> 
> This is required to avoid problems if a new kernel/KVM
> is used with an old QEMU for guests that don't issue
> "ibm,nmi-register". As old QEMU does not understand the
> NMI exit type, it treats it as a fatal error. However,
> the guest could have handled the machine check error
> if the exception was delivered to guest's 0x200 interrupt
> vector instead of NMI exit in case of old QEMU.

[snip]

> @@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  		break;
>  	}
>  #endif /* CONFIG_KVM_XICS */
> +	case KVM_CAP_PPC_FWNMI:
> +		r = 0;
> +		vcpu->kvm->arch.fwnmi_enabled = true;
> +		break;

Might we ever want to set this flag back to false after setting it to
true?  If so perhaps we should do vcpu->kvm->arch.fwnmi_enabled =
!!cap->args[0].  However, I admit I can't actually think of a
situation where we would need to reset it. :)

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
@ 2016-01-23 10:20   ` Paul Mackerras
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-01-23 10:20 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe

On Wed, Jan 13, 2016 at 12:37:59PM +0530, Aravinda Prasad wrote:
> This patch introduces a new KVM capability to control
> how KVM behaves on machine check exception (MCE).
> Without this capability, KVM redirects machine check
> exceptions to guest's 0x200 vector if the address in
> error belongs to the guest. With this capability KVM
> causes a guest exit with NMI exit reason.
> 
> This is required to avoid problems if a new kernel/KVM
> is used with an old QEMU for guests that don't issue
> "ibm,nmi-register". As old QEMU does not understand the
> NMI exit type, it treats it as a fatal error. However,
> the guest could have handled the machine check error
> if the exception was delivered to guest's 0x200 interrupt
> vector instead of NMI exit in case of old QEMU.

[snip]

> @@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  		break;
>  	}
>  #endif /* CONFIG_KVM_XICS */
> +	case KVM_CAP_PPC_FWNMI:
> +		r = 0;
> +		vcpu->kvm->arch.fwnmi_enabled = true;
> +		break;

Might we ever want to set this flag back to false after setting it to
true?  If so perhaps we should do vcpu->kvm->arch.fwnmi_enabled !!cap->args[0].  However, I admit I can't actually think of a
situation where we would need to reset it. :)

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
  2016-01-13  7:20   ` Aravinda Prasad
@ 2016-01-23 10:28     ` Paul Mackerras
  -1 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-01-23 10:28 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe

On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> exit reasons upon a machine check exception (MCE) in
> the guest address space if the KVM_CAP_PPC_FWNMI
> capability is enabled (instead of delivering 0x200
> interrupt to guest). This enables QEMU to build error
> log and deliver machine check exception to guest via
> guest registered machine check handler.
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier
> approach of KVM directly invoking 0x200 guest interrupt
> vector. In the earlier approach QEMU was enhanced to
> patch the 0x200 interrupt vector during boot. The
> patched code at 0x200 issued a private hcall to pass
> the control to QEMU to build the error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:

[snip]

> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>  
>  	/*
> -	 * For external and machine check interrupts, we need
> -	 * to call the Linux handler to process the interrupt.
> -	 * We do that by jumping to absolute address 0x500 for
> -	 * external interrupts, or the machine_check_fwnmi label
> -	 * for machine checks (since firmware might have patched
> -	 * the vector area at 0x200).  The [h]rfid at the end of the
> -	 * handler will return to the book3s_hv_interrupts.S code.
> -	 * For other interrupts we do the rfid to get back
> -	 * to the book3s_hv_interrupts.S code here.
> +	 * For external interrupts we need to call the Linux
> +	 * handler to process the interrupt. We do that by jumping
> +	 * to absolute address 0x500 for external interrupts.
> +	 * The [h]rfid at the end of the handler will return to
> +	 * the book3s_hv_interrupts.S code. For other interrupts
> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
> +	 * code here.
>  	 */
>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>  	addi	r1, r1, 112
>  	ld	r7, HSTATE_HOST_MSR(r13)
>  
> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>  	beq	11f
>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>  	mtsrr0	r8
>  	mtsrr1	r7
> -	beq	cr1, 13f		/* machine check */
>  	RFI
>  
>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_HSRR1, r7
>  	ba	0x500
>  
> -13:	b	machine_check_fwnmi
> -

So, what you're disabling here is the host-side handling of the
machine check after completing the guest->host switch.  This has
nothing to do with how the machine check gets communicated to the
guest.

Now, part of the host-side machine check handling has already
happened, but I thought there was more that was done in host kernel
virtual mode.  If this change really is needed then I would want an
ack from Mahesh that this is correct, and it will need to be explained
in detail in the patch description.

>  14:	mtspr	SPRN_HSRR0, r8
>  	mtspr	SPRN_HSRR1, r7
>  	b	hmi_exception_after_realmode
> @@ -2390,15 +2384,13 @@ machine_check_realmode:
>  	ld	r9, HSTATE_KVM_VCPU(r13)
>  	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	/*
> -	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
> -	 * machine check interrupt (set HSRR0 to 0x200). And for handled
> -	 * errors (no-fatal), just go back to guest execution with current
> -	 * HSRR0 instead of exiting guest. This new approach will inject
> -	 * machine check to guest for fatal error causing guest to crash.
> -	 *
> -	 * The old code used to return to host for unhandled errors which
> -	 * was causing guest to hang with soft lockups inside guest and
> -	 * makes it difficult to recover guest instance.
> +	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either
> +	 * through machine check interrupt (set HSRR0 to 0x200) or by
> +	 * exiting the guest with KVM_EXIT_NMI exit reason if guest is
> +	 * FWNMI capable. For handled errors (no-fatal), just go back
> +	 * to guest execution with current HSRR0. This new approach
> +	 * injects machine check errors in guest address space to guest
> +	 * enabling guest kernel to suitably handle such errors.
>  	 *
>  	 * if we receive machine check with MSR(RI=0) then deliver it to
>  	 * guest as machine check causing guest to crash.
> @@ -2408,11 +2400,17 @@ machine_check_realmode:
>  	beq	1f			/* Deliver a machine check to guest */
>  	ld	r10, VCPU_PC(r9)
>  	cmpdi	r3, 0		/* Did we handle MCE ? */
> -	bne	2f	/* Continue guest execution. */
> +	bne	3f	/* Continue guest execution. */
>  	/* If not, deliver a machine check.  SRR0/1 are already set */
> -1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
> +1:  /* Check if guest is capable of handling NMI exit */
> +	ld  r3, VCPU_KVM(r9)

Tab between opcode and first operand please, and also in the following
lines.

> +	lbz  r3, KVM_FWNMI(r3)
> +	cmpdi   r3, 1       /* FWNMI capable? */
> +	bne 2f
> +	b   mc_cont

Why not just beq mc_cont rather than the bne 2f; b mc_cont?

> +2:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	bl	kvmppc_msr_interrupt
> -2:	b	fast_interrupt_c_return
> +3:	b	fast_interrupt_c_return

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-01-23 10:28     ` Paul Mackerras
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-01-23 10:28 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe

On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> exit reasons upon a machine check exception (MCE) in
> the guest address space if the KVM_CAP_PPC_FWNMI
> capability is enabled (instead of delivering 0x200
> interrupt to guest). This enables QEMU to build error
> log and deliver machine check exception to guest via
> guest registered machine check handler.
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier
> approach of KVM directly invoking 0x200 guest interrupt
> vector. In the earlier approach QEMU was enhanced to
> patch the 0x200 interrupt vector during boot. The
> patched code at 0x200 issued a private hcall to pass
> the control to QEMU to build the error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:

[snip]

> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>  
>  	/*
> -	 * For external and machine check interrupts, we need
> -	 * to call the Linux handler to process the interrupt.
> -	 * We do that by jumping to absolute address 0x500 for
> -	 * external interrupts, or the machine_check_fwnmi label
> -	 * for machine checks (since firmware might have patched
> -	 * the vector area at 0x200).  The [h]rfid at the end of the
> -	 * handler will return to the book3s_hv_interrupts.S code.
> -	 * For other interrupts we do the rfid to get back
> -	 * to the book3s_hv_interrupts.S code here.
> +	 * For external interrupts we need to call the Linux
> +	 * handler to process the interrupt. We do that by jumping
> +	 * to absolute address 0x500 for external interrupts.
> +	 * The [h]rfid at the end of the handler will return to
> +	 * the book3s_hv_interrupts.S code. For other interrupts
> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
> +	 * code here.
>  	 */
>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>  	addi	r1, r1, 112
>  	ld	r7, HSTATE_HOST_MSR(r13)
>  
> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>  	beq	11f
>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>  	mtsrr0	r8
>  	mtsrr1	r7
> -	beq	cr1, 13f		/* machine check */
>  	RFI
>  
>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_HSRR1, r7
>  	ba	0x500
>  
> -13:	b	machine_check_fwnmi
> -

So, what you're disabling here is the host-side handling of the
machine check after completing the guest->host switch.  This has
nothing to do with how the machine check gets communicated to the
guest.

Now, part of the host-side machine check handling has already
happened, but I thought there was more that was done in host kernel
virtual mode.  If this change really is needed then I would want an
ack from Mahesh that this is correct, and it will need to be explained
in detail in the patch description.

>  14:	mtspr	SPRN_HSRR0, r8
>  	mtspr	SPRN_HSRR1, r7
>  	b	hmi_exception_after_realmode
> @@ -2390,15 +2384,13 @@ machine_check_realmode:
>  	ld	r9, HSTATE_KVM_VCPU(r13)
>  	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	/*
> -	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
> -	 * machine check interrupt (set HSRR0 to 0x200). And for handled
> -	 * errors (no-fatal), just go back to guest execution with current
> -	 * HSRR0 instead of exiting guest. This new approach will inject
> -	 * machine check to guest for fatal error causing guest to crash.
> -	 *
> -	 * The old code used to return to host for unhandled errors which
> -	 * was causing guest to hang with soft lockups inside guest and
> -	 * makes it difficult to recover guest instance.
> +	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either
> +	 * through machine check interrupt (set HSRR0 to 0x200) or by
> +	 * exiting the guest with KVM_EXIT_NMI exit reason if guest is
> +	 * FWNMI capable. For handled errors (no-fatal), just go back
> +	 * to guest execution with current HSRR0. This new approach
> +	 * injects machine check errors in guest address space to guest
> +	 * enabling guest kernel to suitably handle such errors.
>  	 *
>  	 * if we receive machine check with MSR(RI=0) then deliver it to
>  	 * guest as machine check causing guest to crash.
> @@ -2408,11 +2400,17 @@ machine_check_realmode:
>  	beq	1f			/* Deliver a machine check to guest */
>  	ld	r10, VCPU_PC(r9)
>  	cmpdi	r3, 0		/* Did we handle MCE ? */
> -	bne	2f	/* Continue guest execution. */
> +	bne	3f	/* Continue guest execution. */
>  	/* If not, deliver a machine check.  SRR0/1 are already set */
> -1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
> +1:  /* Check if guest is capable of handling NMI exit */
> +	ld  r3, VCPU_KVM(r9)

Tab between opcode and first operand please, and also in the following
lines.

> +	lbz  r3, KVM_FWNMI(r3)
> +	cmpdi   r3, 1       /* FWNMI capable? */
> +	bne 2f
> +	b   mc_cont

Why not just beq mc_cont rather than the bne 2f; b mc_cont?

> +2:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
>  	bl	kvmppc_msr_interrupt
> -2:	b	fast_interrupt_c_return
> +3:	b	fast_interrupt_c_return

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
  2016-01-23 10:20   ` Paul Mackerras
@ 2016-01-23 12:40     ` Aravinda Prasad
  -1 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-23 12:28 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe



On Saturday 23 January 2016 03:50 PM, Paul Mackerras wrote:
> On Wed, Jan 13, 2016 at 12:37:59PM +0530, Aravinda Prasad wrote:
>> This patch introduces a new KVM capability to control
>> how KVM behaves on machine check exception (MCE).
>> Without this capability, KVM redirects machine check
>> exceptions to guest's 0x200 vector if the address in
>> error belongs to the guest. With this capability KVM
>> causes a guest exit with NMI exit reason.
>>
>> This is required to avoid problems if a new kernel/KVM
>> is used with an old QEMU for guests that don't issue
>> "ibm,nmi-register". As old QEMU does not understand the
>> NMI exit type, it treats it as a fatal error. However,
>> the guest could have handled the machine check error
>> if the exception was delivered to guest's 0x200 interrupt
>> vector instead of NMI exit in case of old QEMU.
> 
> [snip]
> 
>> @@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>>  		break;
>>  	}
>>  #endif /* CONFIG_KVM_XICS */
>> +	case KVM_CAP_PPC_FWNMI:
>> +		r = 0;
>> +		vcpu->kvm->arch.fwnmi_enabled = true;
>> +		break;
> 
> Might we ever want to set this flag back to false after setting it to
> true?  If so perhaps we should do vcpu->kvm->arch.fwnmi_enabled =
> !!cap->args[0].  However, I admit I can't actually think of a
> situation where we would need to reset it. :)

Even I am not able to think of any situation where resetting is required.

Regards,
Aravinda

> 
> Paul.
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour
@ 2016-01-23 12:40     ` Aravinda Prasad
  0 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-23 12:40 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe



On Saturday 23 January 2016 03:50 PM, Paul Mackerras wrote:
> On Wed, Jan 13, 2016 at 12:37:59PM +0530, Aravinda Prasad wrote:
>> This patch introduces a new KVM capability to control
>> how KVM behaves on machine check exception (MCE).
>> Without this capability, KVM redirects machine check
>> exceptions to guest's 0x200 vector if the address in
>> error belongs to the guest. With this capability KVM
>> causes a guest exit with NMI exit reason.
>>
>> This is required to avoid problems if a new kernel/KVM
>> is used with an old QEMU for guests that don't issue
>> "ibm,nmi-register". As old QEMU does not understand the
>> NMI exit type, it treats it as a fatal error. However,
>> the guest could have handled the machine check error
>> if the exception was delivered to guest's 0x200 interrupt
>> vector instead of NMI exit in case of old QEMU.
> 
> [snip]
> 
>> @@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>>  		break;
>>  	}
>>  #endif /* CONFIG_KVM_XICS */
>> +	case KVM_CAP_PPC_FWNMI:
>> +		r = 0;
>> +		vcpu->kvm->arch.fwnmi_enabled = true;
>> +		break;
> 
> Might we ever want to set this flag back to false after setting it to
> true?  If so perhaps we should do vcpu->kvm->arch.fwnmi_enabled > !!cap->args[0].  However, I admit I can't actually think of a
> situation where we would need to reset it. :)

Even I am not able to think of any situation where resetting is required.

Regards,
Aravinda

> 
> Paul.
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
  2016-01-23 10:28     ` Paul Mackerras
@ 2016-01-23 12:53       ` Aravinda Prasad
  -1 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-23 12:53 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe



On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>> exit reasons upon a machine check exception (MCE) in
>> the guest address space if the KVM_CAP_PPC_FWNMI
>> capability is enabled (instead of delivering 0x200
>> interrupt to guest). This enables QEMU to build error
>> log and deliver machine check exception to guest via
>> guest registered machine check handler.
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier
>> approach of KVM directly invoking 0x200 guest interrupt
>> vector. In the earlier approach QEMU was enhanced to
>> patch the 0x200 interrupt vector during boot. The
>> patched code at 0x200 issued a private hcall to pass
>> the control to QEMU to build the error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
> 
> [snip]
> 
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>>  
>>  	/*
>> -	 * For external and machine check interrupts, we need
>> -	 * to call the Linux handler to process the interrupt.
>> -	 * We do that by jumping to absolute address 0x500 for
>> -	 * external interrupts, or the machine_check_fwnmi label
>> -	 * for machine checks (since firmware might have patched
>> -	 * the vector area at 0x200).  The [h]rfid at the end of the
>> -	 * handler will return to the book3s_hv_interrupts.S code.
>> -	 * For other interrupts we do the rfid to get back
>> -	 * to the book3s_hv_interrupts.S code here.
>> +	 * For external interrupts we need to call the Linux
>> +	 * handler to process the interrupt. We do that by jumping
>> +	 * to absolute address 0x500 for external interrupts.
>> +	 * The [h]rfid at the end of the handler will return to
>> +	 * the book3s_hv_interrupts.S code. For other interrupts
>> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
>> +	 * code here.
>>  	 */
>>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>>  	addi	r1, r1, 112
>>  	ld	r7, HSTATE_HOST_MSR(r13)
>>  
>> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>>  	beq	11f
>>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>>  	mtsrr0	r8
>>  	mtsrr1	r7
>> -	beq	cr1, 13f		/* machine check */
>>  	RFI
>>  
>>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  	mtspr	SPRN_HSRR1, r7
>>  	ba	0x500
>>  
>> -13:	b	machine_check_fwnmi
>> -
> 
> So, what you're disabling here is the host-side handling of the
> machine check after completing the guest->host switch.  This has
> nothing to do with how the machine check gets communicated to the
> guest.
> 
> Now, part of the host-side machine check handling has already
> happened, but I thought there was more that was done in host kernel
> virtual mode.  If this change really is needed then I would want an
> ack from Mahesh that this is correct, and it will need to be explained
> in detail in the patch description.

If we don't do that we will end up running into
panic() in opal_machine_check() if UE belonged to guest.

Details in this link:
http://marc.info/?l=kvm-ppc&m=144730552720044&w=2


> 
>>  14:	mtspr	SPRN_HSRR0, r8
>>  	mtspr	SPRN_HSRR1, r7
>>  	b	hmi_exception_after_realmode
>> @@ -2390,15 +2384,13 @@ machine_check_realmode:
>>  	ld	r9, HSTATE_KVM_VCPU(r13)
>>  	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  	/*
>> -	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
>> -	 * machine check interrupt (set HSRR0 to 0x200). And for handled
>> -	 * errors (no-fatal), just go back to guest execution with current
>> -	 * HSRR0 instead of exiting guest. This new approach will inject
>> -	 * machine check to guest for fatal error causing guest to crash.
>> -	 *
>> -	 * The old code used to return to host for unhandled errors which
>> -	 * was causing guest to hang with soft lockups inside guest and
>> -	 * makes it difficult to recover guest instance.
>> +	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either
>> +	 * through machine check interrupt (set HSRR0 to 0x200) or by
>> +	 * exiting the guest with KVM_EXIT_NMI exit reason if guest is
>> +	 * FWNMI capable. For handled errors (no-fatal), just go back
>> +	 * to guest execution with current HSRR0. This new approach
>> +	 * injects machine check errors in guest address space to guest
>> +	 * enabling guest kernel to suitably handle such errors.
>>  	 *
>>  	 * if we receive machine check with MSR(RI=0) then deliver it to
>>  	 * guest as machine check causing guest to crash.
>> @@ -2408,11 +2400,17 @@ machine_check_realmode:
>>  	beq	1f			/* Deliver a machine check to guest */
>>  	ld	r10, VCPU_PC(r9)
>>  	cmpdi	r3, 0		/* Did we handle MCE ? */
>> -	bne	2f	/* Continue guest execution. */
>> +	bne	3f	/* Continue guest execution. */
>>  	/* If not, deliver a machine check.  SRR0/1 are already set */
>> -1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
>> +1:  /* Check if guest is capable of handling NMI exit */
>> +	ld  r3, VCPU_KVM(r9)
> 
> Tab between opcode and first operand please, and also in the following
> lines.

ah.. missed it.

> 
>> +	lbz  r3, KVM_FWNMI(r3)
>> +	cmpdi   r3, 1       /* FWNMI capable? */
>> +	bne 2f
>> +	b   mc_cont
> 
> Why not just beq mc_cont rather than the bne 2f; b mc_cont?

Yes, beq mc_count is enough.

Regards,
Aravinda

> 
>> +2:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  	bl	kvmppc_msr_interrupt
>> -2:	b	fast_interrupt_c_return
>> +3:	b	fast_interrupt_c_return
> 
> Paul.
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-01-23 12:53       ` Aravinda Prasad
  0 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-23 12:53 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe



On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>> exit reasons upon a machine check exception (MCE) in
>> the guest address space if the KVM_CAP_PPC_FWNMI
>> capability is enabled (instead of delivering 0x200
>> interrupt to guest). This enables QEMU to build error
>> log and deliver machine check exception to guest via
>> guest registered machine check handler.
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier
>> approach of KVM directly invoking 0x200 guest interrupt
>> vector. In the earlier approach QEMU was enhanced to
>> patch the 0x200 interrupt vector during boot. The
>> patched code at 0x200 issued a private hcall to pass
>> the control to QEMU to build the error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
> 
> [snip]
> 
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>>  
>>  	/*
>> -	 * For external and machine check interrupts, we need
>> -	 * to call the Linux handler to process the interrupt.
>> -	 * We do that by jumping to absolute address 0x500 for
>> -	 * external interrupts, or the machine_check_fwnmi label
>> -	 * for machine checks (since firmware might have patched
>> -	 * the vector area at 0x200).  The [h]rfid at the end of the
>> -	 * handler will return to the book3s_hv_interrupts.S code.
>> -	 * For other interrupts we do the rfid to get back
>> -	 * to the book3s_hv_interrupts.S code here.
>> +	 * For external interrupts we need to call the Linux
>> +	 * handler to process the interrupt. We do that by jumping
>> +	 * to absolute address 0x500 for external interrupts.
>> +	 * The [h]rfid at the end of the handler will return to
>> +	 * the book3s_hv_interrupts.S code. For other interrupts
>> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
>> +	 * code here.
>>  	 */
>>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>>  	addi	r1, r1, 112
>>  	ld	r7, HSTATE_HOST_MSR(r13)
>>  
>> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>>  	beq	11f
>>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>>  	mtsrr0	r8
>>  	mtsrr1	r7
>> -	beq	cr1, 13f		/* machine check */
>>  	RFI
>>  
>>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  	mtspr	SPRN_HSRR1, r7
>>  	ba	0x500
>>  
>> -13:	b	machine_check_fwnmi
>> -
> 
> So, what you're disabling here is the host-side handling of the
> machine check after completing the guest->host switch.  This has
> nothing to do with how the machine check gets communicated to the
> guest.
> 
> Now, part of the host-side machine check handling has already
> happened, but I thought there was more that was done in host kernel
> virtual mode.  If this change really is needed then I would want an
> ack from Mahesh that this is correct, and it will need to be explained
> in detail in the patch description.

If we don't do that we will end up running into
panic() in opal_machine_check() if UE belonged to guest.

Details in this link:
http://marc.info/?l=kvm-ppc&m\x144730552720044&w=2


> 
>>  14:	mtspr	SPRN_HSRR0, r8
>>  	mtspr	SPRN_HSRR1, r7
>>  	b	hmi_exception_after_realmode
>> @@ -2390,15 +2384,13 @@ machine_check_realmode:
>>  	ld	r9, HSTATE_KVM_VCPU(r13)
>>  	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  	/*
>> -	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
>> -	 * machine check interrupt (set HSRR0 to 0x200). And for handled
>> -	 * errors (no-fatal), just go back to guest execution with current
>> -	 * HSRR0 instead of exiting guest. This new approach will inject
>> -	 * machine check to guest for fatal error causing guest to crash.
>> -	 *
>> -	 * The old code used to return to host for unhandled errors which
>> -	 * was causing guest to hang with soft lockups inside guest and
>> -	 * makes it difficult to recover guest instance.
>> +	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either
>> +	 * through machine check interrupt (set HSRR0 to 0x200) or by
>> +	 * exiting the guest with KVM_EXIT_NMI exit reason if guest is
>> +	 * FWNMI capable. For handled errors (no-fatal), just go back
>> +	 * to guest execution with current HSRR0. This new approach
>> +	 * injects machine check errors in guest address space to guest
>> +	 * enabling guest kernel to suitably handle such errors.
>>  	 *
>>  	 * if we receive machine check with MSR(RI=0) then deliver it to
>>  	 * guest as machine check causing guest to crash.
>> @@ -2408,11 +2400,17 @@ machine_check_realmode:
>>  	beq	1f			/* Deliver a machine check to guest */
>>  	ld	r10, VCPU_PC(r9)
>>  	cmpdi	r3, 0		/* Did we handle MCE ? */
>> -	bne	2f	/* Continue guest execution. */
>> +	bne	3f	/* Continue guest execution. */
>>  	/* If not, deliver a machine check.  SRR0/1 are already set */
>> -1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
>> +1:  /* Check if guest is capable of handling NMI exit */
>> +	ld  r3, VCPU_KVM(r9)
> 
> Tab between opcode and first operand please, and also in the following
> lines.

ah.. missed it.

> 
>> +	lbz  r3, KVM_FWNMI(r3)
>> +	cmpdi   r3, 1       /* FWNMI capable? */
>> +	bne 2f
>> +	b   mc_cont
> 
> Why not just beq mc_cont rather than the bne 2f; b mc_cont?

Yes, beq mc_count is enough.

Regards,
Aravinda

> 
>> +2:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  	bl	kvmppc_msr_interrupt
>> -2:	b	fast_interrupt_c_return
>> +3:	b	fast_interrupt_c_return
> 
> Paul.
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
  2016-01-23 12:53       ` Aravinda Prasad
  (?)
@ 2016-01-23 21:24         ` Paul Mackerras
  -1 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-01-23 21:24 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: kvm, gleb, mahesh, agraf, kvm-ppc, linuxppc-dev, pbonzini, david

On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
> 
> 
> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
> > On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> >> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> >> exit reasons upon a machine check exception (MCE) in
> >> the guest address space if the KVM_CAP_PPC_FWNMI
> >> capability is enabled (instead of delivering 0x200
> >> interrupt to guest). This enables QEMU to build error
> >> log and deliver machine check exception to guest via
> >> guest registered machine check handler.
> >>
> >> This approach simplifies the delivering of machine
> >> check exception to guest OS compared to the earlier
> >> approach of KVM directly invoking 0x200 guest interrupt
> >> vector. In the earlier approach QEMU was enhanced to
> >> patch the 0x200 interrupt vector during boot. The
> >> patched code at 0x200 issued a private hcall to pass
> >> the control to QEMU to build the error log.
> >>
> >> This design/approach is based on the feedback for the
> >> QEMU patches to handle machine check exception. Details
> >> of earlier approach of handling machine check exception
> >> in QEMU and related discussions can be found at:
> > 
> > [snip]
> > 
> >> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> >>  
> >>  	/*
> >> -	 * For external and machine check interrupts, we need
> >> -	 * to call the Linux handler to process the interrupt.
> >> -	 * We do that by jumping to absolute address 0x500 for
> >> -	 * external interrupts, or the machine_check_fwnmi label
> >> -	 * for machine checks (since firmware might have patched
> >> -	 * the vector area at 0x200).  The [h]rfid at the end of the
> >> -	 * handler will return to the book3s_hv_interrupts.S code.
> >> -	 * For other interrupts we do the rfid to get back
> >> -	 * to the book3s_hv_interrupts.S code here.
> >> +	 * For external interrupts we need to call the Linux
> >> +	 * handler to process the interrupt. We do that by jumping
> >> +	 * to absolute address 0x500 for external interrupts.
> >> +	 * The [h]rfid at the end of the handler will return to
> >> +	 * the book3s_hv_interrupts.S code. For other interrupts
> >> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
> >> +	 * code here.
> >>  	 */
> >>  	ld	r8, 112+PPC_LR_STKOFF(r1)
> >>  	addi	r1, r1, 112
> >>  	ld	r7, HSTATE_HOST_MSR(r13)
> >>  
> >> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
> >>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
> >>  	beq	11f
> >>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
> >> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >>  	mtmsrd	r6, 1			/* Clear RI in MSR */
> >>  	mtsrr0	r8
> >>  	mtsrr1	r7
> >> -	beq	cr1, 13f		/* machine check */
> >>  	RFI
> >>  
> >>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
> >> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >>  	mtspr	SPRN_HSRR1, r7
> >>  	ba	0x500
> >>  
> >> -13:	b	machine_check_fwnmi
> >> -
> > 
> > So, what you're disabling here is the host-side handling of the
> > machine check after completing the guest->host switch.  This has
> > nothing to do with how the machine check gets communicated to the
> > guest.
> > 
> > Now, part of the host-side machine check handling has already
> > happened, but I thought there was more that was done in host kernel
> > virtual mode.  If this change really is needed then I would want an
> > ack from Mahesh that this is correct, and it will need to be explained
> > in detail in the patch description.
> 
> If we don't do that we will end up running into
> panic() in opal_machine_check() if UE belonged to guest.
> 
> Details in this link:
> http://marc.info/?l=kvm-ppc&m=144730552720044&w=2

Well maybe the panic call needs to be changed.  But the way you have
it, we *never* get to opal_machine_check for any machine check
interrupt, and I have a hard time believing that is correct.

Paul.
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-01-23 21:24         ` Paul Mackerras
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-01-23 21:24 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe

On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
> 
> 
> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
> > On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> >> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> >> exit reasons upon a machine check exception (MCE) in
> >> the guest address space if the KVM_CAP_PPC_FWNMI
> >> capability is enabled (instead of delivering 0x200
> >> interrupt to guest). This enables QEMU to build error
> >> log and deliver machine check exception to guest via
> >> guest registered machine check handler.
> >>
> >> This approach simplifies the delivering of machine
> >> check exception to guest OS compared to the earlier
> >> approach of KVM directly invoking 0x200 guest interrupt
> >> vector. In the earlier approach QEMU was enhanced to
> >> patch the 0x200 interrupt vector during boot. The
> >> patched code at 0x200 issued a private hcall to pass
> >> the control to QEMU to build the error log.
> >>
> >> This design/approach is based on the feedback for the
> >> QEMU patches to handle machine check exception. Details
> >> of earlier approach of handling machine check exception
> >> in QEMU and related discussions can be found at:
> > 
> > [snip]
> > 
> >> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> >>  
> >>  	/*
> >> -	 * For external and machine check interrupts, we need
> >> -	 * to call the Linux handler to process the interrupt.
> >> -	 * We do that by jumping to absolute address 0x500 for
> >> -	 * external interrupts, or the machine_check_fwnmi label
> >> -	 * for machine checks (since firmware might have patched
> >> -	 * the vector area at 0x200).  The [h]rfid at the end of the
> >> -	 * handler will return to the book3s_hv_interrupts.S code.
> >> -	 * For other interrupts we do the rfid to get back
> >> -	 * to the book3s_hv_interrupts.S code here.
> >> +	 * For external interrupts we need to call the Linux
> >> +	 * handler to process the interrupt. We do that by jumping
> >> +	 * to absolute address 0x500 for external interrupts.
> >> +	 * The [h]rfid at the end of the handler will return to
> >> +	 * the book3s_hv_interrupts.S code. For other interrupts
> >> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
> >> +	 * code here.
> >>  	 */
> >>  	ld	r8, 112+PPC_LR_STKOFF(r1)
> >>  	addi	r1, r1, 112
> >>  	ld	r7, HSTATE_HOST_MSR(r13)
> >>  
> >> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
> >>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
> >>  	beq	11f
> >>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
> >> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >>  	mtmsrd	r6, 1			/* Clear RI in MSR */
> >>  	mtsrr0	r8
> >>  	mtsrr1	r7
> >> -	beq	cr1, 13f		/* machine check */
> >>  	RFI
> >>  
> >>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
> >> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >>  	mtspr	SPRN_HSRR1, r7
> >>  	ba	0x500
> >>  
> >> -13:	b	machine_check_fwnmi
> >> -
> > 
> > So, what you're disabling here is the host-side handling of the
> > machine check after completing the guest->host switch.  This has
> > nothing to do with how the machine check gets communicated to the
> > guest.
> > 
> > Now, part of the host-side machine check handling has already
> > happened, but I thought there was more that was done in host kernel
> > virtual mode.  If this change really is needed then I would want an
> > ack from Mahesh that this is correct, and it will need to be explained
> > in detail in the patch description.
> 
> If we don't do that we will end up running into
> panic() in opal_machine_check() if UE belonged to guest.
> 
> Details in this link:
> http://marc.info/?l=kvm-ppc&m=144730552720044&w=2

Well maybe the panic call needs to be changed.  But the way you have
it, we *never* get to opal_machine_check for any machine check
interrupt, and I have a hard time believing that is correct.

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-01-23 21:24         ` Paul Mackerras
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-01-23 21:24 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: kvm, gleb, mahesh, agraf, kvm-ppc, linuxppc-dev, pbonzini, david

On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
> 
> 
> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
> > On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> >> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> >> exit reasons upon a machine check exception (MCE) in
> >> the guest address space if the KVM_CAP_PPC_FWNMI
> >> capability is enabled (instead of delivering 0x200
> >> interrupt to guest). This enables QEMU to build error
> >> log and deliver machine check exception to guest via
> >> guest registered machine check handler.
> >>
> >> This approach simplifies the delivering of machine
> >> check exception to guest OS compared to the earlier
> >> approach of KVM directly invoking 0x200 guest interrupt
> >> vector. In the earlier approach QEMU was enhanced to
> >> patch the 0x200 interrupt vector during boot. The
> >> patched code at 0x200 issued a private hcall to pass
> >> the control to QEMU to build the error log.
> >>
> >> This design/approach is based on the feedback for the
> >> QEMU patches to handle machine check exception. Details
> >> of earlier approach of handling machine check exception
> >> in QEMU and related discussions can be found at:
> > 
> > [snip]
> > 
> >> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> >>  
> >>  	/*
> >> -	 * For external and machine check interrupts, we need
> >> -	 * to call the Linux handler to process the interrupt.
> >> -	 * We do that by jumping to absolute address 0x500 for
> >> -	 * external interrupts, or the machine_check_fwnmi label
> >> -	 * for machine checks (since firmware might have patched
> >> -	 * the vector area at 0x200).  The [h]rfid at the end of the
> >> -	 * handler will return to the book3s_hv_interrupts.S code.
> >> -	 * For other interrupts we do the rfid to get back
> >> -	 * to the book3s_hv_interrupts.S code here.
> >> +	 * For external interrupts we need to call the Linux
> >> +	 * handler to process the interrupt. We do that by jumping
> >> +	 * to absolute address 0x500 for external interrupts.
> >> +	 * The [h]rfid at the end of the handler will return to
> >> +	 * the book3s_hv_interrupts.S code. For other interrupts
> >> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
> >> +	 * code here.
> >>  	 */
> >>  	ld	r8, 112+PPC_LR_STKOFF(r1)
> >>  	addi	r1, r1, 112
> >>  	ld	r7, HSTATE_HOST_MSR(r13)
> >>  
> >> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
> >>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
> >>  	beq	11f
> >>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
> >> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >>  	mtmsrd	r6, 1			/* Clear RI in MSR */
> >>  	mtsrr0	r8
> >>  	mtsrr1	r7
> >> -	beq	cr1, 13f		/* machine check */
> >>  	RFI
> >>  
> >>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
> >> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >>  	mtspr	SPRN_HSRR1, r7
> >>  	ba	0x500
> >>  
> >> -13:	b	machine_check_fwnmi
> >> -
> > 
> > So, what you're disabling here is the host-side handling of the
> > machine check after completing the guest->host switch.  This has
> > nothing to do with how the machine check gets communicated to the
> > guest.
> > 
> > Now, part of the host-side machine check handling has already
> > happened, but I thought there was more that was done in host kernel
> > virtual mode.  If this change really is needed then I would want an
> > ack from Mahesh that this is correct, and it will need to be explained
> > in detail in the patch description.
> 
> If we don't do that we will end up running into
> panic() in opal_machine_check() if UE belonged to guest.
> 
> Details in this link:
> http://marc.info/?l=kvm-ppc&m\x144730552720044&w=2

Well maybe the panic call needs to be changed.  But the way you have
it, we *never* get to opal_machine_check for any machine check
interrupt, and I have a hard time believing that is correct.

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
  2016-01-23 21:24         ` Paul Mackerras
@ 2016-01-25  8:51           ` Aravinda Prasad
  -1 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-25  8:39 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, gleb, mahesh, agraf, kvm-ppc, linuxppc-dev, pbonzini, david



On Sunday 24 January 2016 02:54 AM, Paul Mackerras wrote:
> On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
>>> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>>>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>>>> exit reasons upon a machine check exception (MCE) in
>>>> the guest address space if the KVM_CAP_PPC_FWNMI
>>>> capability is enabled (instead of delivering 0x200
>>>> interrupt to guest). This enables QEMU to build error
>>>> log and deliver machine check exception to guest via
>>>> guest registered machine check handler.
>>>>
>>>> This approach simplifies the delivering of machine
>>>> check exception to guest OS compared to the earlier
>>>> approach of KVM directly invoking 0x200 guest interrupt
>>>> vector. In the earlier approach QEMU was enhanced to
>>>> patch the 0x200 interrupt vector during boot. The
>>>> patched code at 0x200 issued a private hcall to pass
>>>> the control to QEMU to build the error log.
>>>>
>>>> This design/approach is based on the feedback for the
>>>> QEMU patches to handle machine check exception. Details
>>>> of earlier approach of handling machine check exception
>>>> in QEMU and related discussions can be found at:
>>>
>>> [snip]
>>>
>>>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>>>>  
>>>>  	/*
>>>> -	 * For external and machine check interrupts, we need
>>>> -	 * to call the Linux handler to process the interrupt.
>>>> -	 * We do that by jumping to absolute address 0x500 for
>>>> -	 * external interrupts, or the machine_check_fwnmi label
>>>> -	 * for machine checks (since firmware might have patched
>>>> -	 * the vector area at 0x200).  The [h]rfid at the end of the
>>>> -	 * handler will return to the book3s_hv_interrupts.S code.
>>>> -	 * For other interrupts we do the rfid to get back
>>>> -	 * to the book3s_hv_interrupts.S code here.
>>>> +	 * For external interrupts we need to call the Linux
>>>> +	 * handler to process the interrupt. We do that by jumping
>>>> +	 * to absolute address 0x500 for external interrupts.
>>>> +	 * The [h]rfid at the end of the handler will return to
>>>> +	 * the book3s_hv_interrupts.S code. For other interrupts
>>>> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
>>>> +	 * code here.
>>>>  	 */
>>>>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>>>>  	addi	r1, r1, 112
>>>>  	ld	r7, HSTATE_HOST_MSR(r13)
>>>>  
>>>> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>>>>  	beq	11f
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
>>>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>>>>  	mtsrr0	r8
>>>>  	mtsrr1	r7
>>>> -	beq	cr1, 13f		/* machine check */
>>>>  	RFI
>>>>  
>>>>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
>>>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtspr	SPRN_HSRR1, r7
>>>>  	ba	0x500
>>>>  
>>>> -13:	b	machine_check_fwnmi
>>>> -
>>>
>>> So, what you're disabling here is the host-side handling of the
>>> machine check after completing the guest->host switch.  This has
>>> nothing to do with how the machine check gets communicated to the
>>> guest.
>>>
>>> Now, part of the host-side machine check handling has already
>>> happened, but I thought there was more that was done in host kernel
>>> virtual mode.  If this change really is needed then I would want an
>>> ack from Mahesh that this is correct, and it will need to be explained
>>> in detail in the patch description.
>>
>> If we don't do that we will end up running into
>> panic() in opal_machine_check() if UE belonged to guest.
>>
>> Details in this link:
>> http://marc.info/?l=kvm-ppc&m=144730552720044&w=2
> 
> Well maybe the panic call needs to be changed.  But the way you have
> it, we *never* get to opal_machine_check for any machine check
> interrupt, and I have a hard time believing that is correct.

If I am not wrong, earlier, even without this patch we never get to
opal_machine_check() from this place for the machine check happening
inside the guest (irrespective of whether the machine check is recovered
or not). Because, we jump to fast_interrupt_c_return() from
machine_check_realmode without causing the guest to exit.

With this patch I call mc_cont from machine_check_realmode and want the
guest to exit with NMI exit code. However, we don't want to call
opal_machine_check(), because if we call opal_machine_check() we don't
end up with NMI exit for the guest.

Regards,
Aravinda

> 
> Paul.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-01-25  8:51           ` Aravinda Prasad
  0 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-01-25  8:51 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, gleb, mahesh, agraf, kvm-ppc, linuxppc-dev, pbonzini, david



On Sunday 24 January 2016 02:54 AM, Paul Mackerras wrote:
> On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
>>> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>>>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>>>> exit reasons upon a machine check exception (MCE) in
>>>> the guest address space if the KVM_CAP_PPC_FWNMI
>>>> capability is enabled (instead of delivering 0x200
>>>> interrupt to guest). This enables QEMU to build error
>>>> log and deliver machine check exception to guest via
>>>> guest registered machine check handler.
>>>>
>>>> This approach simplifies the delivering of machine
>>>> check exception to guest OS compared to the earlier
>>>> approach of KVM directly invoking 0x200 guest interrupt
>>>> vector. In the earlier approach QEMU was enhanced to
>>>> patch the 0x200 interrupt vector during boot. The
>>>> patched code at 0x200 issued a private hcall to pass
>>>> the control to QEMU to build the error log.
>>>>
>>>> This design/approach is based on the feedback for the
>>>> QEMU patches to handle machine check exception. Details
>>>> of earlier approach of handling machine check exception
>>>> in QEMU and related discussions can be found at:
>>>
>>> [snip]
>>>
>>>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>>>>  
>>>>  	/*
>>>> -	 * For external and machine check interrupts, we need
>>>> -	 * to call the Linux handler to process the interrupt.
>>>> -	 * We do that by jumping to absolute address 0x500 for
>>>> -	 * external interrupts, or the machine_check_fwnmi label
>>>> -	 * for machine checks (since firmware might have patched
>>>> -	 * the vector area at 0x200).  The [h]rfid at the end of the
>>>> -	 * handler will return to the book3s_hv_interrupts.S code.
>>>> -	 * For other interrupts we do the rfid to get back
>>>> -	 * to the book3s_hv_interrupts.S code here.
>>>> +	 * For external interrupts we need to call the Linux
>>>> +	 * handler to process the interrupt. We do that by jumping
>>>> +	 * to absolute address 0x500 for external interrupts.
>>>> +	 * The [h]rfid at the end of the handler will return to
>>>> +	 * the book3s_hv_interrupts.S code. For other interrupts
>>>> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
>>>> +	 * code here.
>>>>  	 */
>>>>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>>>>  	addi	r1, r1, 112
>>>>  	ld	r7, HSTATE_HOST_MSR(r13)
>>>>  
>>>> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>>>>  	beq	11f
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
>>>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>>>>  	mtsrr0	r8
>>>>  	mtsrr1	r7
>>>> -	beq	cr1, 13f		/* machine check */
>>>>  	RFI
>>>>  
>>>>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
>>>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtspr	SPRN_HSRR1, r7
>>>>  	ba	0x500
>>>>  
>>>> -13:	b	machine_check_fwnmi
>>>> -
>>>
>>> So, what you're disabling here is the host-side handling of the
>>> machine check after completing the guest->host switch.  This has
>>> nothing to do with how the machine check gets communicated to the
>>> guest.
>>>
>>> Now, part of the host-side machine check handling has already
>>> happened, but I thought there was more that was done in host kernel
>>> virtual mode.  If this change really is needed then I would want an
>>> ack from Mahesh that this is correct, and it will need to be explained
>>> in detail in the patch description.
>>
>> If we don't do that we will end up running into
>> panic() in opal_machine_check() if UE belonged to guest.
>>
>> Details in this link:
>> http://marc.info/?l=kvm-ppc&m\x144730552720044&w=2
> 
> Well maybe the panic call needs to be changed.  But the way you have
> it, we *never* get to opal_machine_check for any machine check
> interrupt, and I have a hard time believing that is correct.

If I am not wrong, earlier, even without this patch we never get to
opal_machine_check() from this place for the machine check happening
inside the guest (irrespective of whether the machine check is recovered
or not). Because, we jump to fast_interrupt_c_return() from
machine_check_realmode without causing the guest to exit.

With this patch I call mc_cont from machine_check_realmode and want the
guest to exit with NMI exit code. However, we don't want to call
opal_machine_check(), because if we call opal_machine_check() we don't
end up with NMI exit for the guest.

Regards,
Aravinda

> 
> Paul.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
  2016-01-23 21:24         ` Paul Mackerras
  (?)
@ 2016-01-27  5:02           ` Mahesh Jagannath Salgaonkar
  -1 siblings, 0 replies; 31+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2016-01-27  5:02 UTC (permalink / raw)
  To: Paul Mackerras, Aravinda Prasad
  Cc: kvm, gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, david

On 01/24/2016 02:54 AM, Paul Mackerras wrote:
> On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
>>> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>>>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>>>> exit reasons upon a machine check exception (MCE) in
>>>> the guest address space if the KVM_CAP_PPC_FWNMI
>>>> capability is enabled (instead of delivering 0x200
>>>> interrupt to guest). This enables QEMU to build error
>>>> log and deliver machine check exception to guest via
>>>> guest registered machine check handler.
>>>>
>>>> This approach simplifies the delivering of machine
>>>> check exception to guest OS compared to the earlier
>>>> approach of KVM directly invoking 0x200 guest interrupt
>>>> vector. In the earlier approach QEMU was enhanced to
>>>> patch the 0x200 interrupt vector during boot. The
>>>> patched code at 0x200 issued a private hcall to pass
>>>> the control to QEMU to build the error log.
>>>>
>>>> This design/approach is based on the feedback for the
>>>> QEMU patches to handle machine check exception. Details
>>>> of earlier approach of handling machine check exception
>>>> in QEMU and related discussions can be found at:
>>>
>>> [snip]
>>>
>>>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>>>>  
>>>>  	/*
>>>> -	 * For external and machine check interrupts, we need
>>>> -	 * to call the Linux handler to process the interrupt.
>>>> -	 * We do that by jumping to absolute address 0x500 for
>>>> -	 * external interrupts, or the machine_check_fwnmi label
>>>> -	 * for machine checks (since firmware might have patched
>>>> -	 * the vector area at 0x200).  The [h]rfid at the end of the
>>>> -	 * handler will return to the book3s_hv_interrupts.S code.
>>>> -	 * For other interrupts we do the rfid to get back
>>>> -	 * to the book3s_hv_interrupts.S code here.
>>>> +	 * For external interrupts we need to call the Linux
>>>> +	 * handler to process the interrupt. We do that by jumping
>>>> +	 * to absolute address 0x500 for external interrupts.
>>>> +	 * The [h]rfid at the end of the handler will return to
>>>> +	 * the book3s_hv_interrupts.S code. For other interrupts
>>>> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
>>>> +	 * code here.
>>>>  	 */
>>>>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>>>>  	addi	r1, r1, 112
>>>>  	ld	r7, HSTATE_HOST_MSR(r13)
>>>>  
>>>> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>>>>  	beq	11f
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
>>>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>>>>  	mtsrr0	r8
>>>>  	mtsrr1	r7
>>>> -	beq	cr1, 13f		/* machine check */
>>>>  	RFI
>>>>  
>>>>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
>>>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtspr	SPRN_HSRR1, r7
>>>>  	ba	0x500
>>>>  
>>>> -13:	b	machine_check_fwnmi
>>>> -
>>>
>>> So, what you're disabling here is the host-side handling of the
>>> machine check after completing the guest->host switch.  This has
>>> nothing to do with how the machine check gets communicated to the
>>> guest.
>>>
>>> Now, part of the host-side machine check handling has already
>>> happened, but I thought there was more that was done in host kernel
>>> virtual mode.  If this change really is needed then I would want an
>>> ack from Mahesh that this is correct, and it will need to be explained
>>> in detail in the patch description.
>>
>> If we don't do that we will end up running into
>> panic() in opal_machine_check() if UE belonged to guest.
>>
>> Details in this link:
>> http://marc.info/?l=kvm-ppc&m=144730552720044&w=2
> 
> Well maybe the panic call needs to be changed.  But the way you have
> it, we *never* get to opal_machine_check for any machine check
> interrupt, and I have a hard time believing that is correct.

We do still go to opal_machine_check for MCE in host kernel/user space.
The host virtual mode prints the event and then checks whether MCE is in
kernel or user space context and acts accordingly.

For machine check while in guest, when we fall through
machine_check_fwnmi, the register context points to host kernel module
(__kvmppc_vcore_entry [kvm_hv]). And opal_machine_check thinks that
memory UE is in host kernel rather than in guest and goes down the panic
path. It becomes bit difficult to figure out whether MCE happened in
guest OR in host kernel by looking at the register set received by
opal_machine_check in this case.

Aravinda's implementation requires him to go back to KVM kernel module
which would end up in NMI exit inside QEMU and then he can handle Memory
error for guest inside QEMU and deliver it to guest. Hence he needs to
bypass machine_check_fwnmi. But by this time we would have already
hooked the mce event to work queue for console logging.

The other approach could be to add additional field into machine check
event that can help opal_machine_check to identify that MCE is from
guest and then just return from interrupt after printing event instead
of going down to panic path. But this could be bit messy.

Thanks,
-Mahesh.

> 
> Paul.
> 

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-01-27  5:02           ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 31+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2016-01-27  5:02 UTC (permalink / raw)
  To: Paul Mackerras, Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, david, kvm, mpe

On 01/24/2016 02:54 AM, Paul Mackerras wrote:
> On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
>>> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>>>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>>>> exit reasons upon a machine check exception (MCE) in
>>>> the guest address space if the KVM_CAP_PPC_FWNMI
>>>> capability is enabled (instead of delivering 0x200
>>>> interrupt to guest). This enables QEMU to build error
>>>> log and deliver machine check exception to guest via
>>>> guest registered machine check handler.
>>>>
>>>> This approach simplifies the delivering of machine
>>>> check exception to guest OS compared to the earlier
>>>> approach of KVM directly invoking 0x200 guest interrupt
>>>> vector. In the earlier approach QEMU was enhanced to
>>>> patch the 0x200 interrupt vector during boot. The
>>>> patched code at 0x200 issued a private hcall to pass
>>>> the control to QEMU to build the error log.
>>>>
>>>> This design/approach is based on the feedback for the
>>>> QEMU patches to handle machine check exception. Details
>>>> of earlier approach of handling machine check exception
>>>> in QEMU and related discussions can be found at:
>>>
>>> [snip]
>>>
>>>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>>>>  
>>>>  	/*
>>>> -	 * For external and machine check interrupts, we need
>>>> -	 * to call the Linux handler to process the interrupt.
>>>> -	 * We do that by jumping to absolute address 0x500 for
>>>> -	 * external interrupts, or the machine_check_fwnmi label
>>>> -	 * for machine checks (since firmware might have patched
>>>> -	 * the vector area at 0x200).  The [h]rfid at the end of the
>>>> -	 * handler will return to the book3s_hv_interrupts.S code.
>>>> -	 * For other interrupts we do the rfid to get back
>>>> -	 * to the book3s_hv_interrupts.S code here.
>>>> +	 * For external interrupts we need to call the Linux
>>>> +	 * handler to process the interrupt. We do that by jumping
>>>> +	 * to absolute address 0x500 for external interrupts.
>>>> +	 * The [h]rfid at the end of the handler will return to
>>>> +	 * the book3s_hv_interrupts.S code. For other interrupts
>>>> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
>>>> +	 * code here.
>>>>  	 */
>>>>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>>>>  	addi	r1, r1, 112
>>>>  	ld	r7, HSTATE_HOST_MSR(r13)
>>>>  
>>>> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>>>>  	beq	11f
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
>>>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>>>>  	mtsrr0	r8
>>>>  	mtsrr1	r7
>>>> -	beq	cr1, 13f		/* machine check */
>>>>  	RFI
>>>>  
>>>>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
>>>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtspr	SPRN_HSRR1, r7
>>>>  	ba	0x500
>>>>  
>>>> -13:	b	machine_check_fwnmi
>>>> -
>>>
>>> So, what you're disabling here is the host-side handling of the
>>> machine check after completing the guest->host switch.  This has
>>> nothing to do with how the machine check gets communicated to the
>>> guest.
>>>
>>> Now, part of the host-side machine check handling has already
>>> happened, but I thought there was more that was done in host kernel
>>> virtual mode.  If this change really is needed then I would want an
>>> ack from Mahesh that this is correct, and it will need to be explained
>>> in detail in the patch description.
>>
>> If we don't do that we will end up running into
>> panic() in opal_machine_check() if UE belonged to guest.
>>
>> Details in this link:
>> http://marc.info/?l=kvm-ppc&m=144730552720044&w=2
> 
> Well maybe the panic call needs to be changed.  But the way you have
> it, we *never* get to opal_machine_check for any machine check
> interrupt, and I have a hard time believing that is correct.

We do still go to opal_machine_check for MCE in host kernel/user space.
The host virtual mode prints the event and then checks whether MCE is in
kernel or user space context and acts accordingly.

For machine check while in guest, when we fall through
machine_check_fwnmi, the register context points to host kernel module
(__kvmppc_vcore_entry [kvm_hv]). And opal_machine_check thinks that
memory UE is in host kernel rather than in guest and goes down the panic
path. It becomes bit difficult to figure out whether MCE happened in
guest OR in host kernel by looking at the register set received by
opal_machine_check in this case.

Aravinda's implementation requires him to go back to KVM kernel module
which would end up in NMI exit inside QEMU and then he can handle Memory
error for guest inside QEMU and deliver it to guest. Hence he needs to
bypass machine_check_fwnmi. But by this time we would have already
hooked the mce event to work queue for console logging.

The other approach could be to add additional field into machine check
event that can help opal_machine_check to identify that MCE is from
guest and then just return from interrupt after printing event instead
of going down to panic path. But this could be bit messy.

Thanks,
-Mahesh.

> 
> Paul.
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-01-27  5:02           ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 31+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2016-01-27  5:14 UTC (permalink / raw)
  To: Paul Mackerras, Aravinda Prasad
  Cc: kvm, gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, david

On 01/24/2016 02:54 AM, Paul Mackerras wrote:
> On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
>>> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>>>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>>>> exit reasons upon a machine check exception (MCE) in
>>>> the guest address space if the KVM_CAP_PPC_FWNMI
>>>> capability is enabled (instead of delivering 0x200
>>>> interrupt to guest). This enables QEMU to build error
>>>> log and deliver machine check exception to guest via
>>>> guest registered machine check handler.
>>>>
>>>> This approach simplifies the delivering of machine
>>>> check exception to guest OS compared to the earlier
>>>> approach of KVM directly invoking 0x200 guest interrupt
>>>> vector. In the earlier approach QEMU was enhanced to
>>>> patch the 0x200 interrupt vector during boot. The
>>>> patched code at 0x200 issued a private hcall to pass
>>>> the control to QEMU to build the error log.
>>>>
>>>> This design/approach is based on the feedback for the
>>>> QEMU patches to handle machine check exception. Details
>>>> of earlier approach of handling machine check exception
>>>> in QEMU and related discussions can be found at:
>>>
>>> [snip]
>>>
>>>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
>>>>  
>>>>  	/*
>>>> -	 * For external and machine check interrupts, we need
>>>> -	 * to call the Linux handler to process the interrupt.
>>>> -	 * We do that by jumping to absolute address 0x500 for
>>>> -	 * external interrupts, or the machine_check_fwnmi label
>>>> -	 * for machine checks (since firmware might have patched
>>>> -	 * the vector area at 0x200).  The [h]rfid at the end of the
>>>> -	 * handler will return to the book3s_hv_interrupts.S code.
>>>> -	 * For other interrupts we do the rfid to get back
>>>> -	 * to the book3s_hv_interrupts.S code here.
>>>> +	 * For external interrupts we need to call the Linux
>>>> +	 * handler to process the interrupt. We do that by jumping
>>>> +	 * to absolute address 0x500 for external interrupts.
>>>> +	 * The [h]rfid at the end of the handler will return to
>>>> +	 * the book3s_hv_interrupts.S code. For other interrupts
>>>> +	 * we do the rfid to get back to the book3s_hv_interrupts.S
>>>> +	 * code here.
>>>>  	 */
>>>>  	ld	r8, 112+PPC_LR_STKOFF(r1)
>>>>  	addi	r1, r1, 112
>>>>  	ld	r7, HSTATE_HOST_MSR(r13)
>>>>  
>>>> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>>>>  	beq	11f
>>>>  	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
>>>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>>>>  	mtsrr0	r8
>>>>  	mtsrr1	r7
>>>> -	beq	cr1, 13f		/* machine check */
>>>>  	RFI
>>>>  
>>>>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
>>>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>  	mtspr	SPRN_HSRR1, r7
>>>>  	ba	0x500
>>>>  
>>>> -13:	b	machine_check_fwnmi
>>>> -
>>>
>>> So, what you're disabling here is the host-side handling of the
>>> machine check after completing the guest->host switch.  This has
>>> nothing to do with how the machine check gets communicated to the
>>> guest.
>>>
>>> Now, part of the host-side machine check handling has already
>>> happened, but I thought there was more that was done in host kernel
>>> virtual mode.  If this change really is needed then I would want an
>>> ack from Mahesh that this is correct, and it will need to be explained
>>> in detail in the patch description.
>>
>> If we don't do that we will end up running into
>> panic() in opal_machine_check() if UE belonged to guest.
>>
>> Details in this link:
>> http://marc.info/?l=kvm-ppc&m\x144730552720044&w=2
> 
> Well maybe the panic call needs to be changed.  But the way you have
> it, we *never* get to opal_machine_check for any machine check
> interrupt, and I have a hard time believing that is correct.

We do still go to opal_machine_check for MCE in host kernel/user space.
The host virtual mode prints the event and then checks whether MCE is in
kernel or user space context and acts accordingly.

For machine check while in guest, when we fall through
machine_check_fwnmi, the register context points to host kernel module
(__kvmppc_vcore_entry [kvm_hv]). And opal_machine_check thinks that
memory UE is in host kernel rather than in guest and goes down the panic
path. It becomes bit difficult to figure out whether MCE happened in
guest OR in host kernel by looking at the register set received by
opal_machine_check in this case.

Aravinda's implementation requires him to go back to KVM kernel module
which would end up in NMI exit inside QEMU and then he can handle Memory
error for guest inside QEMU and deliver it to guest. Hence he needs to
bypass machine_check_fwnmi. But by this time we would have already
hooked the mce event to work queue for console logging.

The other approach could be to add additional field into machine check
event that can help opal_machine_check to identify that MCE is from
guest and then just return from interrupt after printing event instead
of going down to panic path. But this could be bit messy.

Thanks,
-Mahesh.

> 
> Paul.
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
  2016-01-13  7:20   ` Aravinda Prasad
  (?)
@ 2016-06-20  5:18     ` Paul Mackerras
  -1 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-06-20  5:18 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: kvm, gleb, mahesh, agraf, kvm-ppc, linuxppc-dev, pbonzini, david

Hi Aravinda,

On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> exit reasons upon a machine check exception (MCE) in
> the guest address space if the KVM_CAP_PPC_FWNMI
> capability is enabled (instead of delivering 0x200
> interrupt to guest). This enables QEMU to build error
> log and deliver machine check exception to guest via
> guest registered machine check handler.
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier
> approach of KVM directly invoking 0x200 guest interrupt
> vector. In the earlier approach QEMU was enhanced to
> patch the 0x200 interrupt vector during boot. The
> patched code at 0x200 issued a private hcall to pass
> the control to QEMU to build the error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Are you in the process of doing a new version of this patch with the
requested changes?

Paul.
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-06-20  5:18     ` Paul Mackerras
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-06-20  5:18 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe

Hi Aravinda,

On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> exit reasons upon a machine check exception (MCE) in
> the guest address space if the KVM_CAP_PPC_FWNMI
> capability is enabled (instead of delivering 0x200
> interrupt to guest). This enables QEMU to build error
> log and deliver machine check exception to guest via
> guest registered machine check handler.
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier
> approach of KVM directly invoking 0x200 guest interrupt
> vector. In the earlier approach QEMU was enhanced to
> patch the 0x200 interrupt vector during boot. The
> patched code at 0x200 issued a private hcall to pass
> the control to QEMU to build the error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Are you in the process of doing a new version of this patch with the
requested changes?

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-06-20  5:18     ` Paul Mackerras
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2016-06-20  5:18 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: kvm, gleb, mahesh, agraf, kvm-ppc, linuxppc-dev, pbonzini, david

Hi Aravinda,

On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
> exit reasons upon a machine check exception (MCE) in
> the guest address space if the KVM_CAP_PPC_FWNMI
> capability is enabled (instead of delivering 0x200
> interrupt to guest). This enables QEMU to build error
> log and deliver machine check exception to guest via
> guest registered machine check handler.
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier
> approach of KVM directly invoking 0x200 guest interrupt
> vector. In the earlier approach QEMU was enhanced to
> patch the 0x200 interrupt vector during boot. The
> patched code at 0x200 issued a private hcall to pass
> the control to QEMU to build the error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Are you in the process of doing a new version of this patch with the
requested changes?

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
  2016-06-20  5:18     ` Paul Mackerras
@ 2016-06-21 21:13       ` Aravinda Prasad
  -1 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-06-21 21:01 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe



On Monday 20 June 2016 10:48 AM, Paul Mackerras wrote:
> Hi Aravinda,
> 
> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>> exit reasons upon a machine check exception (MCE) in
>> the guest address space if the KVM_CAP_PPC_FWNMI
>> capability is enabled (instead of delivering 0x200
>> interrupt to guest). This enables QEMU to build error
>> log and deliver machine check exception to guest via
>> guest registered machine check handler.
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier
>> approach of KVM directly invoking 0x200 guest interrupt
>> vector. In the earlier approach QEMU was enhanced to
>> patch the 0x200 interrupt vector during boot. The
>> patched code at 0x200 issued a private hcall to pass
>> the control to QEMU to build the error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> 
> Are you in the process of doing a new version of this patch with the
> requested changes?

Yes, I am working (intermittently) on the new version. But, not able to
finish off and post it. Will complete it and post the new version.

Regards,
Aravinda

> 
> Paul.
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
@ 2016-06-21 21:13       ` Aravinda Prasad
  0 siblings, 0 replies; 31+ messages in thread
From: Aravinda Prasad @ 2016-06-21 21:13 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: gleb, agraf, kvm-ppc, linuxppc-dev, pbonzini, mahesh, david, kvm, mpe



On Monday 20 June 2016 10:48 AM, Paul Mackerras wrote:
> Hi Aravinda,
> 
> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>> exit reasons upon a machine check exception (MCE) in
>> the guest address space if the KVM_CAP_PPC_FWNMI
>> capability is enabled (instead of delivering 0x200
>> interrupt to guest). This enables QEMU to build error
>> log and deliver machine check exception to guest via
>> guest registered machine check handler.
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier
>> approach of KVM directly invoking 0x200 guest interrupt
>> vector. In the earlier approach QEMU was enhanced to
>> patch the 0x200 interrupt vector during boot. The
>> patched code at 0x200 issued a private hcall to pass
>> the control to QEMU to build the error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> 
> Are you in the process of doing a new version of this patch with the
> requested changes?

Yes, I am working (intermittently) on the new version. But, not able to
finish off and post it. Will complete it and post the new version.

Regards,
Aravinda

> 
> Paul.
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2016-06-21 21:13 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-13  7:07 [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour Aravinda Prasad
2016-01-13  7:19 ` Aravinda Prasad
2016-01-13  7:08 ` [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled Aravinda Prasad
2016-01-13  7:20   ` Aravinda Prasad
2016-01-14  0:06   ` David Gibson
2016-01-14  0:06     ` David Gibson
2016-01-23 10:28   ` Paul Mackerras
2016-01-23 10:28     ` Paul Mackerras
2016-01-23 12:53     ` Aravinda Prasad
2016-01-23 12:53       ` Aravinda Prasad
2016-01-23 21:24       ` Paul Mackerras
2016-01-23 21:24         ` Paul Mackerras
2016-01-23 21:24         ` Paul Mackerras
2016-01-25  8:39         ` Aravinda Prasad
2016-01-25  8:51           ` Aravinda Prasad
2016-01-27  5:02         ` Mahesh Jagannath Salgaonkar
2016-01-27  5:14           ` Mahesh Jagannath Salgaonkar
2016-01-27  5:02           ` Mahesh Jagannath Salgaonkar
2016-06-20  5:18   ` Paul Mackerras
2016-06-20  5:18     ` Paul Mackerras
2016-06-20  5:18     ` Paul Mackerras
2016-06-21 21:01     ` Aravinda Prasad
2016-06-21 21:13       ` Aravinda Prasad
2016-01-14  0:02 ` [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour David Gibson
2016-01-14  0:02   ` David Gibson
2016-01-14  0:05   ` David Gibson
2016-01-14  0:05     ` David Gibson
2016-01-23 10:20 ` Paul Mackerras
2016-01-23 10:20   ` Paul Mackerras
2016-01-23 12:28   ` Aravinda Prasad
2016-01-23 12:40     ` Aravinda Prasad

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.