* [RFC PATCH 0/4] KVM: x86: allow overwriting L2 reinjected exception with L1 vmexit
From: Paolo Bonzini @ 2017-08-23 20:43 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: wanpeng.li, david, rkrcmar, jmattson

vcpu->arch.exception currently holds the vmcs02 IDT-vectored info
throughout the handling of a vmexit.  This makes it harder to keep that
information safe when vcpu->arch.exception is reused for an exception
(such as a page fault) that is raised while L0 handles the vmexit.

When this happens, there are two cases (both sketched in code below):

- the exception causes a vmexit to L1; in that case, the exception in the
  IDT-vectored info is not reinjected; vcpu->arch.exception is
  reused to build the VM-exit interruption info.

- the exception doesn't cause a vmexit to L1; in that case,
  vcpu->arch.exception is changed to a double fault which is injected
  normally into L2 via KVM_REQ_EVENT.
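
Condensed, the logic that patch 4 adds to kvm_multiple_exception looks
like this (a sketch only, not the exact hunk):

	/* In kvm_multiple_exception(), with ex being the new exception: */
	if (vcpu->arch.exception.pending) {
		if (vcpu->arch.exception.reinject && is_guest_mode(vcpu) &&
		    kvm_x86_ops->nested_check_exception(vcpu, &ex))
			/* case 1: the reinjected event is already saved
			 * in the vmcs12 IDT-vectored info, so overwrite
			 * vcpu->arch.exception and let it become the
			 * VM-exit interruption info. */
			goto queue;
		/* case 2: apply the usual merging rules, possibly
		 * turning the pair into a #DF injected into L2. */
	}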

The fix is easy if the vmcs12 IDT-vectored info is prepared early, in
vmx_complete_interrupts (patches 1-2).  I wanted to include this in 4.14,
but Radim is on vacation and I'm also not sure how this interacts with
Wanpeng's other refactoring of nested exceptions, so I'm sending it out
only as an RFC.

I am applying only the first patch to kvm/queue.  If I get a review, that
one can be put in 4.14.

Paolo

Paolo Bonzini (4):
  KVM: nVMX: move vmentry tasks from prepare_vmcs12 to
    enter_vmx_non_root_mode
  KVM: nVMX: fill nested IDT-vectored event info on all L2->L0 exits
  KVM: x86: pass struct kvm_queued_exception to kvm_multiple_exception
  KVM: x86: allow overwriting L2 reinjected exception with L1 vmexit

 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm.c              |  79 +++++++-------
 arch/x86/kvm/vmx.c              | 224 +++++++++++++++++++++-------------------
 arch/x86/kvm/x86.c              |  86 +++++++++------
 4 files changed, 217 insertions(+), 174 deletions(-)

-- 
1.8.3.1

* [PATCH 1/4] KVM: nVMX: move vmentry tasks from prepare_vmcs12 to enter_vmx_non_root_mode
From: Paolo Bonzini @ 2017-08-23 20:43 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: wanpeng.li, david, rkrcmar, jmattson

Setting the vmcs12 launch state and clearing vm_entry_intr_info_field
were done as part of the L0->L1 exit in prepare_vmcs12.  To simplify
prepare_vmcs12, move this to enter_vmx_non_root_mode, since at that
point we've already committed to the vmentry.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 19aa69af7c2d..01c29b6af254 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10752,6 +10752,13 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 	 * returned as far as L1 is concerned. It will only return (and set
 	 * the success flag) when L2 exits (see nested_vmx_vmexit()).
 	 */
+	if (from_vmentry) {
+		vmcs12->launch_state = 1;
+
+		/* vm_entry_intr_info_field is cleared on exit. Emulate this
+		 * instead of reading the real value. */
+		vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
+	}
 	return 0;
 }
 
@@ -11121,12 +11128,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 
 	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
-		vmcs12->launch_state = 1;
-
-		/* vm_entry_intr_info_field is cleared on exit. Emulate this
-		 * instead of reading the real value. */
-		vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
-
 		/*
 		 * Transfer the event that L0 or L1 may wanted to inject into
 		 * L2 to IDT_VECTORING_INFO_FIELD.
-- 
1.8.3.1

* [PATCH 2/4] KVM: nVMX: fill nested IDT-vectored event info on all L2->L0 exits
From: Paolo Bonzini @ 2017-08-23 20:43 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: wanpeng.li, david, rkrcmar, jmattson

vcpu->arch.exception currently holds the vmcs02 IDT-vectored info
throughout the handling of a vmexit.  This makes it harder to keep that
information safe when vcpu->arch.exception is reused for an exception
that is raised while L0 handles the vmexit.

When this happens, there are two cases:

- the exception causes a vmexit to L1; in that case, the exception in the
  IDT-vectored info is not reinjected; vcpu->arch.exception is
  reused to build the VM-exit interruption info.

- the exception doesn't cause a vmexit to L1; in that case,
  vcpu->arch.exception is changed to a double fault which is injected
  normally into L2 via KVM_REQ_EVENT.

We want to discard vcpu->arch.exception in the first case.  To prepare
for that, fill in the vmcs12 IDT-vectored info early.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 127 +++++++++++++++++++++++++++++------------------------
 1 file changed, 69 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 01c29b6af254..f8ef38094acc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9073,11 +9073,76 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
 	}
 }
 
-static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
+static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
+				       struct vmcs12 *vmcs12)
+{
+	u32 idt_vectoring;
+	unsigned int nr;
+
+	if (vcpu->arch.exception.pending && vcpu->arch.exception.reinject) {
+		nr = vcpu->arch.exception.nr;
+		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
+
+		if (kvm_exception_is_soft(nr)) {
+			vmcs12->vm_exit_instruction_len =
+				vcpu->arch.event_exit_inst_len;
+			idt_vectoring |= INTR_TYPE_SOFT_EXCEPTION;
+		} else
+			idt_vectoring |= INTR_TYPE_HARD_EXCEPTION;
+
+		if (vcpu->arch.exception.has_error_code) {
+			idt_vectoring |= VECTORING_INFO_DELIVER_CODE_MASK;
+			vmcs12->idt_vectoring_error_code =
+				vcpu->arch.exception.error_code;
+		}
+
+		vmcs12->idt_vectoring_info_field = idt_vectoring;
+	} else if (vcpu->arch.nmi_injected) {
+		vmcs12->idt_vectoring_info_field =
+			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR;
+	} else if (vcpu->arch.interrupt.pending) {
+		nr = vcpu->arch.interrupt.nr;
+		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
+
+		if (vcpu->arch.interrupt.soft) {
+			idt_vectoring |= INTR_TYPE_SOFT_INTR;
+			vmcs12->vm_entry_instruction_len =
+				vcpu->arch.event_exit_inst_len;
+		} else
+			idt_vectoring |= INTR_TYPE_EXT_INTR;
+
+		vmcs12->idt_vectoring_info_field = idt_vectoring;
+	}
+}
+
+static void vmx_complete_interrupts(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct vmcs12 *vmcs12;
+
 	__vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
 				  VM_EXIT_INSTRUCTION_LEN,
 				  IDT_VECTORING_ERROR_CODE);
+
+	if (!is_guest_mode(vcpu))
+		return;
+
+	/*
+	 * On a nested vmexit during event delivery, move the IDT-vectored event
+	 * fields to _both_ vcpu->arch and VMCS12.  If we exit to L1, having it
+	 * in VMCS12 makes it easier to reuse vcpu->arch for a non-reinjected
+	 * exception and error code; if we stay in L2, the vmcs12 writes go
+	 * unnoticed.
+	 */
+
+	vmcs12 = get_vmcs12(vcpu);
+	vmcs12->idt_vectoring_info_field = 0;
+	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+
+	if ((vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+	    !(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
+		vmcs12_save_pending_event(vcpu, vmcs12);
 }
 
 static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
@@ -9343,7 +9408,7 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	vmx_complete_atomic_exit(vmx);
 	vmx_recover_nmi_blocking(vmx);
-	vmx_complete_interrupts(vmx);
+	vmx_complete_interrupts(vcpu);
 }
 STACK_FRAME_NON_STANDARD(vmx_vcpu_run);
 
@@ -10887,48 +10952,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 			vcpu->arch.cr4_guest_owned_bits));
 }
 
-static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
-				       struct vmcs12 *vmcs12)
-{
-	u32 idt_vectoring;
-	unsigned int nr;
-
-	if (vcpu->arch.exception.pending && vcpu->arch.exception.reinject) {
-		nr = vcpu->arch.exception.nr;
-		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
-
-		if (kvm_exception_is_soft(nr)) {
-			vmcs12->vm_exit_instruction_len =
-				vcpu->arch.event_exit_inst_len;
-			idt_vectoring |= INTR_TYPE_SOFT_EXCEPTION;
-		} else
-			idt_vectoring |= INTR_TYPE_HARD_EXCEPTION;
-
-		if (vcpu->arch.exception.has_error_code) {
-			idt_vectoring |= VECTORING_INFO_DELIVER_CODE_MASK;
-			vmcs12->idt_vectoring_error_code =
-				vcpu->arch.exception.error_code;
-		}
-
-		vmcs12->idt_vectoring_info_field = idt_vectoring;
-	} else if (vcpu->arch.nmi_injected) {
-		vmcs12->idt_vectoring_info_field =
-			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR;
-	} else if (vcpu->arch.interrupt.pending) {
-		nr = vcpu->arch.interrupt.nr;
-		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
-
-		if (vcpu->arch.interrupt.soft) {
-			idt_vectoring |= INTR_TYPE_SOFT_INTR;
-			vmcs12->vm_entry_instruction_len =
-				vcpu->arch.event_exit_inst_len;
-		} else
-			idt_vectoring |= INTR_TYPE_EXT_INTR;
-
-		vmcs12->idt_vectoring_info_field = idt_vectoring;
-	}
-}
-
 static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -11123,21 +11146,9 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	vmcs12->exit_qualification = exit_qualification;
 	vmcs12->vm_exit_intr_info = exit_intr_info;
 
-	vmcs12->idt_vectoring_info_field = 0;
-	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
-	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
-
-	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
-		/*
-		 * Transfer the event that L0 or L1 may wanted to inject into
-		 * L2 to IDT_VECTORING_INFO_FIELD.
-		 */
-		vmcs12_save_pending_event(vcpu, vmcs12);
-	}
-
 	/*
-	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
-	 * preserved above and would only end up incorrectly in L1.
+	 * Clear these; they are already in vmcs12 via exit interruption info
+	 * or IDT-vectored event info.
 	 */
 	vcpu->arch.nmi_injected = false;
 	kvm_clear_exception_queue(vcpu);
-- 
1.8.3.1

* [PATCH 3/4] KVM: x86: pass struct kvm_queued_exception to kvm_multiple_exception
From: Paolo Bonzini @ 2017-08-23 20:43 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: wanpeng.li, david, rkrcmar, jmattson

Avoid early overwriting of vcpu->arch.exception.nested_apf, and
make it easier to add CR2 or DR6 in the future.
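
For instance, injecting #DB together with its ancillary DR6 value could
then become a single call (hypothetical: the .dr6 field does not exist
yet, it is only the extension suggested by the FIXME in patch 4):

	kvm_multiple_exception(vcpu, ((struct kvm_queued_exception) {
		.nr = DB_VECTOR,
		.dr6 = vcpu->arch.dr6	/* hypothetical new field */
	}));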

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 79 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e699238a113..88b91114c5a8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -381,25 +381,14 @@ static int exception_type(int vector)
 }
 
 static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
-		unsigned nr, bool has_error, u32 error_code,
-		bool reinject)
+                                   struct kvm_queued_exception ex)
 {
 	u32 prev_nr;
 	int class1, class2;
 
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
-
-	if (!vcpu->arch.exception.pending) {
-	queue:
-		if (has_error && !is_protmode(vcpu))
-			has_error = false;
-		vcpu->arch.exception.pending = true;
-		vcpu->arch.exception.has_error_code = has_error;
-		vcpu->arch.exception.nr = nr;
-		vcpu->arch.exception.error_code = error_code;
-		vcpu->arch.exception.reinject = reinject;
-		return;
-	}
+	if (!vcpu->arch.exception.pending)
+		goto queue;
 
 	/* to check exception */
 	prev_nr = vcpu->arch.exception.nr;
@@ -409,30 +398,43 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		return;
 	}
 	class1 = exception_class(prev_nr);
-	class2 = exception_class(nr);
+	class2 = exception_class(ex.nr);
 	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
 		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
 		/* generate double fault per SDM Table 5-5 */
-		vcpu->arch.exception.pending = true;
-		vcpu->arch.exception.has_error_code = true;
-		vcpu->arch.exception.nr = DF_VECTOR;
-		vcpu->arch.exception.error_code = 0;
-	} else
-		/* replace previous exception with a new one in a hope
-		   that instruction re-execution will regenerate lost
-		   exception */
-		goto queue;
+		ex.pending = true;
+		ex.has_error_code = true;
+		ex.nr = DF_VECTOR;
+		ex.error_code = 0;
+	}
+
+	/*
+	 * Else replace previous exception with a new one in a hope
+	 * that instruction re-execution will regenerate lost
+	 * exception.
+	 */
+
+queue:
+	ex.pending = true;
+	ex.has_error_code = ex.has_error_code && is_protmode(vcpu);
+	vcpu->arch.exception = ex;
+	return;
 }
 
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, false);
+	kvm_multiple_exception(vcpu, ((struct kvm_queued_exception) {
+		.nr = nr
+	}));
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception);
 
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, true);
+	kvm_multiple_exception(vcpu, ((struct kvm_queued_exception) {
+		.nr = nr,
+		.reinject = true
+	}));
 }
 EXPORT_SYMBOL_GPL(kvm_requeue_exception);
 
@@ -449,14 +451,18 @@ int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
 
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
+	bool nested_apf = is_guest_mode(vcpu) && fault->async_page_fault;
 	++vcpu->stat.pf_guest;
-	vcpu->arch.exception.nested_apf =
-		is_guest_mode(vcpu) && fault->async_page_fault;
-	if (vcpu->arch.exception.nested_apf)
+	if (nested_apf)
 		vcpu->arch.apf.nested_apf_token = fault->address;
 	else
 		vcpu->arch.cr2 = fault->address;
-	kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
+	kvm_multiple_exception(vcpu, ((struct kvm_queued_exception) {
+		.nr = PF_VECTOR,
+		.nested_apf = nested_apf,
+		.has_error_code = true,
+		.error_code = fault->error_code
+	}));
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
 
@@ -479,13 +485,22 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
 {
-	kvm_multiple_exception(vcpu, nr, true, error_code, false);
+	kvm_multiple_exception(vcpu, ((struct kvm_queued_exception) {
+		.nr = nr,
+		.has_error_code = true,
+		.error_code = error_code
+	}));
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
 
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
 {
-	kvm_multiple_exception(vcpu, nr, true, error_code, true);
+	kvm_multiple_exception(vcpu, ((struct kvm_queued_exception) {
+		.nr = nr,
+		.has_error_code = true,
+		.error_code = error_code,
+		.reinject = true
+	}));
 }
 EXPORT_SYMBOL_GPL(kvm_requeue_exception_e);
 
-- 
1.8.3.1

* [PATCH 4/4] KVM: x86: allow overwriting L2 reinjected exception with L1 vmexit
From: Paolo Bonzini @ 2017-08-23 20:43 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: wanpeng.li, david, rkrcmar, jmattson

A reinjected exception is already recorded in either the IDT-vectored
event information fields or the EXITINTINFO fields; if the handling of
an exception in L0 causes a vmexit, we don't really need to keep the
reinjected exception in vcpu->arch.exception.

Teach kvm_multiple_exception to recognize this scenario through a
new kvm_x86_ops callback.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm.c              | 79 +++++++++++++++++++-------------------
 arch/x86/kvm/vmx.c              | 84 +++++++++++++++++++++--------------------
 arch/x86/kvm/x86.c              |  9 +++++
 4 files changed, 95 insertions(+), 79 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6db0ed9cf59e..643308143bea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -962,6 +962,8 @@ struct kvm_x86_ops {
 				unsigned char *hypercall_addr);
 	void (*set_irq)(struct kvm_vcpu *vcpu);
 	void (*set_nmi)(struct kvm_vcpu *vcpu);
+	int (*nested_check_exception)(struct kvm_vcpu *vcpu,
+				      struct kvm_queued_exception *ex);
 	void (*queue_exception)(struct kvm_vcpu *vcpu);
 	void (*cancel_injection)(struct kvm_vcpu *vcpu);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7e190b21a30b..32c8d8f62985 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -291,8 +291,6 @@ struct amd_svm_iommu_ir {
 static int nested_svm_exit_handled(struct vcpu_svm *svm);
 static int nested_svm_intercept(struct vcpu_svm *svm);
 static int nested_svm_vmexit(struct vcpu_svm *svm);
-static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
-				      bool has_error_code, u32 error_code);
 
 enum {
 	VMCB_INTERCEPTS, /* Intercept vectors, TSC offset,
@@ -650,6 +648,42 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	svm_set_interrupt_shadow(vcpu, 0);
 }
 
+static void nested_svm_queue_exception(struct kvm_vcpu *vcpu,
+				       struct kvm_queued_exception *ex)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + ex->nr;
+	svm->vmcb->control.exit_code_hi = 0;
+	svm->vmcb->control.exit_info_1 = ex->error_code;
+
+	/*
+	 * FIXME: we should not write CR2 when L1 intercepts an L2 #PF exception.
+	 * The fix is to add the ancillary datum (CR2 or DR6) to structs
+	 * kvm_queued_exception and kvm_vcpu_events, so that CR2 and DR6 can be
+	 * written only when inject_pending_event runs (DR6 would be written here
+	 * too).  This should be conditional on a new capability---if the
+	 * capability is disabled, kvm_multiple_exception would write the
+	 * ancillary information to CR2 or DR6, for backwards ABI-compatibility.
+	 */
+	if (ex->nested_apf)
+		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
+	else
+		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
+
+	svm->nested.exit_required = true;
+}
+
+static int nested_svm_check_exception(struct kvm_vcpu *vcpu,
+				      struct kvm_queued_exception *ex)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned int nr = ex->nr;
+
+	return ((svm->nested.intercept_exceptions & (1 << nr)) ||
+		(nr == PF_VECTOR && ex->nested_apf));
+}
+
 static void svm_queue_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -662,9 +696,11 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu)
 	 * If we are within a nested VM we'd better #VMEXIT and let the guest
 	 * handle the exception
 	 */
-	if (!reinject &&
-	    nested_svm_check_exception(svm, nr, has_error_code, error_code))
+	if (!reinject && is_guest_mode(vcpu) &&
+	    nested_svm_check_exception(vcpu, &vcpu->arch.exception)) {
+		nested_svm_queue_exception(vcpu, &vcpu->arch.exception);
 		return;
+	}
 
 	if (nr == BP_VECTOR && !static_cpu_has(X86_FEATURE_NRIPS)) {
 		unsigned long rip, old_rip = kvm_rip_read(&svm->vcpu);
@@ -2428,40 +2464,6 @@ static int nested_svm_check_permissions(struct vcpu_svm *svm)
 	return 0;
 }
 
-static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
-				      bool has_error_code, u32 error_code)
-{
-	int vmexit;
-
-	if (!is_guest_mode(&svm->vcpu))
-		return 0;
-
-	vmexit = nested_svm_intercept(svm);
-	if (vmexit != NESTED_EXIT_DONE)
-		return 0;
-
-	svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
-	svm->vmcb->control.exit_code_hi = 0;
-	svm->vmcb->control.exit_info_1 = error_code;
-
-	/*
-	 * FIXME: we should not write CR2 when L1 intercepts an L2 #PF exception.
-	 * The fix is to add the ancillary datum (CR2 or DR6) to structs
-	 * kvm_queued_exception and kvm_vcpu_events, so that CR2 and DR6 can be
-	 * written only when inject_pending_event runs (DR6 would written here
-	 * too).  This should be conditional on a new capability---if the
-	 * capability is disabled, kvm_multiple_exception would write the
-	 * ancillary information to CR2 or DR6, for backwards ABI-compatibility.
-	 */
-	if (svm->vcpu.arch.exception.nested_apf)
-		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
-	else
-		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
-
-	svm->nested.exit_required = true;
-	return vmexit;
-}
-
 /* This function returns true if it is save to enable the irq window */
 static inline bool nested_svm_intr(struct vcpu_svm *svm)
 {
@@ -5448,6 +5450,7 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
 	.patch_hypercall = svm_patch_hypercall,
 	.set_irq = svm_set_irq,
 	.set_nmi = svm_inject_nmi,
+	.nested_check_exception = nested_svm_check_exception,
 	.queue_exception = svm_queue_exception,
 	.cancel_injection = svm_cancel_injection,
 	.interrupt_allowed = svm_interrupt_allowed,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f8ef38094acc..ddabed8425b3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2438,15 +2438,41 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	vmx_set_interrupt_shadow(vcpu, 0);
 }
 
-static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
-					       unsigned long exit_qual)
+static void nested_vmx_queue_exception(struct kvm_vcpu *vcpu,
+				       struct kvm_queued_exception *ex)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	unsigned int nr = vcpu->arch.exception.nr;
-	u32 intr_info = nr | INTR_INFO_VALID_MASK;
+	unsigned int nr = ex->nr;
+	u32 intr_info = nr | INTR_INFO_VALID_MASK;
+	unsigned long exit_qual = 0;
 
-	if (vcpu->arch.exception.has_error_code) {
-		vmcs12->vm_exit_intr_error_code = vcpu->arch.exception.error_code;
+	/*
+	 * FIXME: we must not write CR2 or DR6 when L1 intercepts an L2 #PF
+	 * or #DB exception.
+	 *
+	 * The fix is to add the ancillary datum (CR2 or DR6) to structs
+	 * kvm_queued_exception and kvm_vcpu_events, so that CR2 and DR6
+	 * can be written only when inject_pending_event runs.  This should be
+	 * conditional on a new capability---if the capability is disabled,
+	 * kvm_multiple_exception would write the ancillary information to
+	 * CR2 or DR6, for backwards ABI-compatibility.
+	 */
+	switch (nr) {
+	case PF_VECTOR:
+		if (ex->nested_apf)
+			exit_qual = vcpu->arch.apf.nested_apf_token;
+		else
+			exit_qual = vcpu->arch.cr2;
+		break;
+	case DB_VECTOR:
+		exit_qual = vcpu->arch.dr6;
+		break;
+	default:
+		exit_qual = 0;
+	}
+
+	if (ex->has_error_code) {
+		vmcs12->vm_exit_intr_error_code = ex->error_code;
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
@@ -2466,43 +2492,16 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
  * KVM wants to inject page-faults which it got to the guest. This function
  * checks whether in a nested guest, we need to inject them to L1 or L2.
  */
-static int nested_vmx_check_exception(struct kvm_vcpu *vcpu)
+static int nested_vmx_check_exception(struct kvm_vcpu *vcpu,
+				      struct kvm_queued_exception *ex)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	unsigned int nr = vcpu->arch.exception.nr;
+	unsigned int nr = ex->nr;
 
-	if (nr == PF_VECTOR) {
-		if (vcpu->arch.exception.nested_apf) {
-			nested_vmx_inject_exception_vmexit(vcpu,
-							   vcpu->arch.apf.nested_apf_token);
-			return 1;
-		}
-		/*
-		 * FIXME: we must not write CR2 when L1 intercepts an L2 #PF exception.
-		 * The fix is to add the ancillary datum (CR2 or DR6) to structs
-		 * kvm_queued_exception and kvm_vcpu_events, so that CR2 and DR6
-		 * can be written only when inject_pending_event runs.  This should be
-		 * conditional on a new capability---if the capability is disabled,
-		 * kvm_multiple_exception would write the ancillary information to
-		 * CR2 or DR6, for backwards ABI-compatibility.
-		 */
-		if (nested_vmx_is_page_fault_vmexit(vmcs12,
-						    vcpu->arch.exception.error_code)) {
-			nested_vmx_inject_exception_vmexit(vcpu, vcpu->arch.cr2);
-			return 1;
-		}
-	} else {
-		unsigned long exit_qual = 0;
-		if (nr == DB_VECTOR)
-			exit_qual = vcpu->arch.dr6;
-
-		if (vmcs12->exception_bitmap & (1u << nr)) {
-			nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
-			return 1;
-		}
-	}
-
-	return 0;
+	return (nr == PF_VECTOR
+		? (ex->nested_apf ||
+		   nested_vmx_is_page_fault_vmexit(vmcs12, ex->error_code))
+		: vmcs12->exception_bitmap & (1u << nr));
 }
 
 static void vmx_queue_exception(struct kvm_vcpu *vcpu)
@@ -2515,8 +2514,10 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
 	if (!reinject && is_guest_mode(vcpu) &&
-	    nested_vmx_check_exception(vcpu))
+	    nested_vmx_check_exception(vcpu, &vcpu->arch.exception)) {
+		nested_vmx_queue_exception(vcpu, &vcpu->arch.exception);
 		return;
+	}
 
 	if (has_error_code) {
 		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
@@ -11886,6 +11887,7 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 	.patch_hypercall = vmx_patch_hypercall,
 	.set_irq = vmx_inject_irq,
 	.set_nmi = vmx_inject_nmi,
+	.nested_check_exception = nested_vmx_check_exception,
 	.queue_exception = vmx_queue_exception,
 	.cancel_injection = vmx_cancel_injection,
 	.interrupt_allowed = vmx_interrupt_allowed,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 88b91114c5a8..55c709531eb9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -390,6 +390,15 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 	if (!vcpu->arch.exception.pending)
 		goto queue;
 
+	/*
+	 * If the previous exception was recorded during an L2->L0
+	 * exit, we can overwrite it with one that causes an L0->L1
+	 * nested vmexit.
+	 */
+	if (vcpu->arch.exception.reinject && is_guest_mode(vcpu) &&
+	    kvm_x86_ops->nested_check_exception(vcpu, &ex))
+		goto queue;
+
 	/* to check exception */
 	prev_nr = vcpu->arch.exception.nr;
 	if (prev_nr == DF_VECTOR) {
-- 
1.8.3.1

* Re: [PATCH 1/4] KVM: nVMX: move vmentry tasks from prepare_vmcs12 to enter_vmx_non_root_mode
From: Jim Mattson @ 2017-08-23 21:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm list, Wanpeng Li, David Hildenbrand,
	Radim Krčmář

On Wed, Aug 23, 2017 at 1:43 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Setting the VMCS12 to launched and clearing the vm_entry_intr_info_field
> was done as part of L0->L1 exit in prepare_vmcs12.  In order to simplify
> prepare_vmcs12, move this to enter_vmx_non_root_mode since at this point
> we've already committed to vmentry.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 19aa69af7c2d..01c29b6af254 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10752,6 +10752,13 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>          * returned as far as L1 is concerned. It will only return (and set
>          * the success flag) when L2 exits (see nested_vmx_vmexit()).
>          */
> +       if (from_vmentry) {
> +               vmcs12->launch_state = 1;

Because we defer most guest state validity checks to the hardware
vmlaunch of the vmcs02, it is too early to set the vmcs12 launched
state here. If the exit reason has the high bit set,
vmcs12->launch_state should not be modified.
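
In code, that would mean keeping the update conditional on the exit
reason, e.g. (a sketch only; where this runs depends on when the
hardware vmentry result is known):

	if (from_vmentry &&
	    !(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
		vmcs12->launch_state = 1;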

> +
> +               /* vm_entry_intr_info_field is cleared on exit. Emulate this
> +                * instead of reading the real value. */
> +               vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
> +       }
>         return 0;
>  }
>
> @@ -11121,12 +11128,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>         vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>
>         if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
> -               vmcs12->launch_state = 1;
> -
> -               /* vm_entry_intr_info_field is cleared on exit. Emulate this
> -                * instead of reading the real value. */
> -               vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
> -
>                 /*
>                  * Transfer the event that L0 or L1 may wanted to inject into
>                  * L2 to IDT_VECTORING_INFO_FIELD.
> --
> 1.8.3.1
>
>

* Re: [PATCH 1/4] KVM: nVMX: move vmentry tasks from prepare_vmcs12 to enter_vmx_non_root_mode
From: Paolo Bonzini @ 2017-08-23 21:47 UTC (permalink / raw)
  To: Jim Mattson
  Cc: LKML, kvm list, Wanpeng Li, David Hildenbrand,
	Radim Krčmář

On 23/08/2017 23:25, Jim Mattson wrote:
>> +       if (from_vmentry) {
>> +               vmcs12->launch_state = 1;
> Because we defer most guest state validity checks to the hardware
> vmlaunch of the vmcs02, it is too early to set the vmcs12 launched
> state here. If the exit reason has the high bit set,
> vmcs12->launch_state should not be modified.

Thanks---I'll do my homework and add a testcase then. :)  The launched
state cannot be discovered with VMREAD, but I can test that a
VMLAUNCH;VMLAUNCH works and a VMLAUNCH;VMRESUME fails.

The same applies to the next statement, which is part of step 6 of
the vmentry ("An event may be injected in the guest context").  This one
can be tested with VMREAD.
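
Roughly, the testcase would look like this (hypothetical harness code,
not actual kvm-unit-tests API; only the field and mask names are the
kernel's):

	/*
	 * After a vmentry that fails the guest-state checks (exit
	 * reason has bit 31 set), launch_state must still be clear,
	 * so a second VMLAUNCH must be allowed and VMRESUME must
	 * VMfail.
	 */
	force_invalid_guest_state();		/* made-up helper */
	enter_guest_expect_entry_failure();	/* made-up helper */
	assert(vmlaunch_is_allowed());		/* would VMfail if launched */

	/*
	 * VM_ENTRY_INTR_INFO_FIELD is only cleared when event injection
	 * actually happens in step 6 of vmentry, so after the failed
	 * entry its valid bit must survive.
	 */
	assert(vmcs_read(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK);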

Paolo

>> +
>> +               /* vm_entry_intr_info_field is cleared on exit. Emulate this
>> +                * instead of reading the real value. */
>> +               vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
