* [PATCH 0/5] KVM: nested virt SMM fixes
@ 2017-09-13 14:06 Ladi Prosek
  2017-09-13 14:06 ` [PATCH 1/5] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Ladi Prosek @ 2017-09-13 14:06 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar

Windows Server 2016 with Hyper-V enabled fails to boot on OVMF with SMM
(OVMF_CODE-need-smm.fd). It turns out that the SMM emulation code in KVM
does not handle nested virtualization very well, leading to a whole bunch
of issues.

For example, Hyper-V uses descriptor table exiting (SECONDARY_EXEC_DESC),
so when the SMM handler tries to switch out of real mode, a VM exit occurs
and is forwarded to a clueless L1.

This series fixes it by switching the vcpu to !guest_mode, i.e. to the L1
state, before entering SMM and then switching back to L2 after the RSM
instruction is emulated.
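
In rough pseudo-code (hypothetical helper names, only to illustrate the flow;
the real hooks are the prep_enter_smm/post_leave_smm callbacks added in patch 1):

	/* SMI injection while L2 is running -- sketch, not actual kernel code */
	if (is_guest_mode(vcpu))
		emulate_nested_vmexit(vcpu);	/* back to L1 state, L1 gets no control */
	save_smram_state(vcpu, buf);
	enter_smm_environment(vcpu);		/* real mode, SMBASE, HF_SMM_MASK, ... */

	/* RSM emulation -- sketch */
	restore_smram_state(vcpu, buf);
	if (was_in_guest_mode_on_smi(vcpu))
		reenter_guest_mode(vcpu);	/* back to L2, again without L1 noticing */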

Patch 1 is common to both Intel and AMD, patch 2 fixes Intel, and
patches 4-5 fix AMD. Patch 3 adds more state to the SMRAM save area as
prescribed by the Intel SDM; it is, however, not required to make Windows
work.

Ladi Prosek (5):
      KVM: x86: introduce ISA specific SMM entry/exit callbacks
      KVM: nVMX: fix SMI injection in guest mode
      KVM: nVMX: save nested EPT information in SMRAM state save map
      KVM: nSVM: refactor nested_svm_vmrun
      KVM: nSVM: fix SMI injection in guest mode

 arch/x86/include/asm/kvm_emulate.h |   1 +
 arch/x86/include/asm/kvm_host.h    |   8 ++
 arch/x86/kvm/emulate.c             |   2 +
 arch/x86/kvm/svm.c                 | 180 ++++++++++++++++++++++++-------------
 arch/x86/kvm/vmx.c                 |  82 +++++++++++++++--
 arch/x86/kvm/x86.c                 |   9 +-
 6 files changed, 207 insertions(+), 75 deletions(-)


* [PATCH 1/5] KVM: x86: introduce ISA specific SMM entry/exit callbacks
  2017-09-13 14:06 [PATCH 0/5] KVM: nested virt SMM fixes Ladi Prosek
@ 2017-09-13 14:06 ` Ladi Prosek
  2017-09-13 21:44   ` Paolo Bonzini
  2017-09-13 14:06 ` [PATCH 2/5] KVM: nVMX: fix SMI injection in guest mode Ladi Prosek
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Ladi Prosek @ 2017-09-13 14:06 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar

Entering and exiting SMM may require ISA-specific handling under certain
circumstances. This commit adds two new callbacks with empty implementations.
Actual functionality will be added in the following commits.

* prep_enter_smm() is to be called when injecting an SMI, before any
  SMM-related vcpu state has been changed
* post_leave_smm() is to be called when emulating the RSM instruction,
  after all SMM-related vcpu state has been restored

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  3 +++
 arch/x86/kvm/emulate.c             |  2 ++
 arch/x86/kvm/svm.c                 | 15 +++++++++++++++
 arch/x86/kvm/vmx.c                 | 15 +++++++++++++++
 arch/x86/kvm/x86.c                 |  6 ++++++
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index fde36f1..8d56703 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -298,6 +298,7 @@ struct x86_emulate_ctxt {
 	bool perm_ok; /* do not check permissions if true */
 	bool ud;	/* inject an #UD if host doesn't support insn */
 	bool tf;	/* TF value before instruction (after for syscall/sysret) */
+	bool left_smm;  /* post_leave_smm() needs to be called after emulation */
 
 	bool have_exception;
 	struct x86_exception exception;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 92c9032..26acdb3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1058,6 +1058,9 @@ struct kvm_x86_ops {
 	void (*cancel_hv_timer)(struct kvm_vcpu *vcpu);
 
 	void (*setup_mce)(struct kvm_vcpu *vcpu);
+
+	int (*prep_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
+	int (*post_leave_smm)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fb00559..5faaf85 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2601,6 +2601,8 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
 
 	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
 		~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));
+	ctxt->left_smm = true;
+
 	return X86EMUL_CONTINUE;
 }
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index af256b7..0c5a599 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5357,6 +5357,18 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
 	vcpu->arch.mcg_cap &= 0x1ff;
 }
 
+static int svm_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
+static int svm_post_leave_smm(struct kvm_vcpu *vcpu)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
 static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -5467,6 +5479,9 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.deliver_posted_interrupt = svm_deliver_avic_intr,
 	.update_pi_irte = svm_update_pi_irte,
 	.setup_mce = svm_setup_mce,
+
+	.prep_enter_smm = svm_prep_enter_smm,
+	.post_leave_smm = svm_post_leave_smm,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6ef294..d56a528 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11629,6 +11629,18 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEATURE_CONTROL_LMCE;
 }
 
+static int vmx_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
+static int vmx_post_leave_smm(struct kvm_vcpu *vcpu)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
 static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -11754,6 +11766,9 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 #endif
 
 	.setup_mce = vmx_setup_mce,
+
+	.prep_enter_smm = vmx_prep_enter_smm,
+	.post_leave_smm = vmx_post_leave_smm,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 272320e..21ade70 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5674,6 +5674,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 		ctxt->have_exception = false;
 		ctxt->exception.vector = -1;
 		ctxt->perm_ok = false;
+		ctxt->left_smm = false;
 
 		ctxt->ud = emulation_type & EMULTYPE_TRAP_UD;
 
@@ -5755,6 +5756,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
 		kvm_rip_write(vcpu, ctxt->eip);
+		if (r == EMULATE_DONE && ctxt->left_smm)
+			kvm_x86_ops->post_leave_smm(vcpu);
 		if (r == EMULATE_DONE &&
 		    (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
 			kvm_vcpu_do_singlestep(vcpu, &r);
@@ -6614,6 +6617,9 @@ static void enter_smm(struct kvm_vcpu *vcpu)
 	trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
 	vcpu->arch.hflags |= HF_SMM_MASK;
 	memset(buf, 0, 512);
+
+	kvm_x86_ops->prep_enter_smm(vcpu, buf);
+
 	if (guest_cpuid_has_longmode(vcpu))
 		enter_smm_save_state_64(vcpu, buf);
 	else
-- 
2.9.3


* [PATCH 2/5] KVM: nVMX: fix SMI injection in guest mode
  2017-09-13 14:06 [PATCH 0/5] KVM: nested virt SMM fixes Ladi Prosek
  2017-09-13 14:06 ` [PATCH 1/5] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
@ 2017-09-13 14:06 ` Ladi Prosek
  2017-09-13 14:06 ` [PATCH 3/5] KVM: nVMX: save nested EPT information in SMRAM state save map Ladi Prosek
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ladi Prosek @ 2017-09-13 14:06 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in the SMM execution environment)
* the SMM handler couldn't write CR4 because of validity checks in vmx_set_cr4
  predicated on nested.vmxon
* the MMU was confused (walk_mmu was still set to nested_mmu)

The Intel SDM actually prescribes that the logical processor "leave VMX operation"
upon entering SMM; see 34.14.1 Default Treatment of SMI Delivery. What we need to
do is basically get out of guest mode and set nested.vmxon to false for the
duration of SMM. All of this is completely transparent to L1, i.e. L1 is not given
control and no L1-observable state changes.

To avoid code duplication, this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_vmx_vmexit with exit_reason == -1 is called, a trick already
used in vmx_leave_nested. Re-entering is cleaner, using enter_vmx_non_root_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/vmx.c | 55 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 44 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d56a528..4926df2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -481,6 +481,14 @@ struct nested_vmx {
 	u64 nested_vmx_cr4_fixed0;
 	u64 nested_vmx_cr4_fixed1;
 	u64 nested_vmx_vmcs_enum;
+
+	/* SMM related state */
+	struct {
+		/* in VMX operation on SMM entry? */
+		bool vmxon;
+		/* in guest mode on SMM entry? */
+		bool guest_mode;
+	} smm;
 };
 
 #define POSTED_INTR_ON  0
@@ -11112,8 +11120,11 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	WARN_ON_ONCE(vmx->nested.nested_run_pending);
 
 	leave_guest_mode(vcpu);
-	prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
-		       exit_qualification);
+	if (exit_reason == -1)
+		sync_vmcs12(vcpu, vmcs12);
+	else
+		prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
+			       exit_qualification);
 
 	if (nested_vmx_store_msr(vcpu, vmcs12->vm_exit_msr_store_addr,
 				 vmcs12->vm_exit_msr_store_count))
@@ -11139,12 +11150,13 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 			INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
 	}
 
-	trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
-				       vmcs12->exit_qualification,
-				       vmcs12->idt_vectoring_info_field,
-				       vmcs12->vm_exit_intr_info,
-				       vmcs12->vm_exit_intr_error_code,
-				       KVM_ISA_VMX);
+	if (exit_reason != -1)
+		trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
+					       vmcs12->exit_qualification,
+					       vmcs12->idt_vectoring_info_field,
+					       vmcs12->vm_exit_intr_info,
+					       vmcs12->vm_exit_intr_error_code,
+					       KVM_ISA_VMX);
 
 	vm_entry_controls_reset_shadow(vmx);
 	vm_exit_controls_reset_shadow(vmx);
@@ -11214,7 +11226,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 		nested_vmx_failValid(vcpu, vm_inst_error);
 	} else
 		nested_vmx_succeed(vcpu);
-	if (enable_shadow_vmcs)
+	if (enable_shadow_vmcs && exit_reason != -1)
 		vmx->nested.sync_shadow_vmcs = true;
 
 	/* in case we halted in L2 */
@@ -11631,13 +11643,34 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 
 static int vmx_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
-	/* TODO: Implement */
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	vmx->nested.smm.guest_mode = is_guest_mode(vcpu);
+	if (vmx->nested.smm.guest_mode)
+		nested_vmx_vmexit(vcpu, -1, 0, 0);
+
+	vmx->nested.smm.vmxon = vmx->nested.vmxon;
+	vmx->nested.vmxon = false;
 	return 0;
 }
 
 static int vmx_post_leave_smm(struct kvm_vcpu *vcpu)
 {
-	/* TODO: Implement */
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int ret;
+
+	if (vmx->nested.smm.vmxon) {
+		vmx->nested.vmxon = true;
+		vmx->nested.smm.vmxon = false;
+	}
+
+	if (vmx->nested.smm.guest_mode) {
+		ret = enter_vmx_non_root_mode(vcpu, false);
+		if (ret)
+			return ret;
+
+		vmx->nested.smm.guest_mode = false;
+	}
 	return 0;
 }
 
-- 
2.9.3


* [PATCH 3/5] KVM: nVMX: save nested EPT information in SMRAM state save map
  2017-09-13 14:06 [PATCH 0/5] KVM: nested virt SMM fixes Ladi Prosek
  2017-09-13 14:06 ` [PATCH 1/5] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
  2017-09-13 14:06 ` [PATCH 2/5] KVM: nVMX: fix SMI injection in guest mode Ladi Prosek
@ 2017-09-13 14:06 ` Ladi Prosek
  2017-09-13 14:06 ` [PATCH 4/5] KVM: nSVM: refactor nested_svm_vmrun Ladi Prosek
  2017-09-13 14:06 ` [PATCH 5/5] KVM: nSVM: fix SMI injection in guest mode Ladi Prosek
  4 siblings, 0 replies; 9+ messages in thread
From: Ladi Prosek @ 2017-09-13 14:06 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar

According to Intel SDM 34.14.1 "Default Treatment of SMI Delivery", if the CPU
is in VMX non-root operation and EPT is enabled at the time of an SMI, this
fact is recorded in the SMRAM state save area.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++++
 arch/x86/kvm/vmx.c              | 18 +++++++++++++++++-
 arch/x86/kvm/x86.c              |  3 ---
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 26acdb3..6c89d4f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1440,4 +1440,9 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 #endif
 }
 
+#define get_smstate(type, buf, offset)                           \
+	*(type *)((buf) + (offset) - 0x7e00)
+#define put_smstate(type, buf, offset, val)                      \
+	*(type *)((buf) + (offset) - 0x7e00) = val
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4926df2..875c3ff 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11644,10 +11644,26 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 static int vmx_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct vmcs12 *vmcs12;
 
 	vmx->nested.smm.guest_mode = is_guest_mode(vcpu);
-	if (vmx->nested.smm.guest_mode)
+	if (vmx->nested.smm.guest_mode) {
+		vmcs12 = get_vmcs12(vcpu);
+		if (nested_cpu_has_ept(vmcs12)) {
+			/*
+			 * 34.14.1 Default Treatment of SMI Delivery
+			 * Bit 0 of the 32-bit field at offset SMBASE + 8000H + 7EE0H
+			 * indicates whether the cpu was in VMX non-root operation with
+			 * EPT enabled.
+			 * The 64-bit field at offset SMBASE + 8000H + 7ED8H holds the
+			 * value of the EPT pointer.
+			 */
+			put_smstate(u32, smstate, 0x7ee0,
+				    get_smstate(u32, smstate, 0x7ee0) | 1);
+			put_smstate(u64, smstate, 0x7ed8, vmcs12->ept_pointer);
+		}
 		nested_vmx_vmexit(vcpu, -1, 0, 0);
+	}
 
 	vmx->nested.smm.vmxon = vmx->nested.vmxon;
 	vmx->nested.vmxon = false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21ade70..7af9ab8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6447,9 +6447,6 @@ static void process_nmi(struct kvm_vcpu *vcpu)
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
-#define put_smstate(type, buf, offset, val)			  \
-	*(type *)((buf) + (offset) - 0x7e00) = val
-
 static u32 enter_smm_get_segment_flags(struct kvm_segment *seg)
 {
 	u32 flags = 0;
-- 
2.9.3


* [PATCH 4/5] KVM: nSVM: refactor nested_svm_vmrun
  2017-09-13 14:06 [PATCH 0/5] KVM: nested virt SMM fixes Ladi Prosek
                   ` (2 preceding siblings ...)
  2017-09-13 14:06 ` [PATCH 3/5] KVM: nVMX: save nested EPT information in SMRAM state save map Ladi Prosek
@ 2017-09-13 14:06 ` Ladi Prosek
  2017-09-13 14:06 ` [PATCH 5/5] KVM: nSVM: fix SMI injection in guest mode Ladi Prosek
  4 siblings, 0 replies; 9+ messages in thread
From: Ladi Prosek @ 2017-09-13 14:06 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar

Analogous to 858e25c06fb0 ("kvm: nVMX: Refactor nested_vmx_run()"), this commit splits
nested_svm_vmrun into two parts. The newly introduced enter_svm_guest_mode modifies the
vcpu state to transition from L1 to L2, while the code left in nested_svm_vmrun handles
the VMRUN instruction.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/svm.c | 132 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 69 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0c5a599..0f85062 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2892,70 +2892,9 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
 	return true;
 }
 
-static bool nested_svm_vmrun(struct vcpu_svm *svm)
+static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
+				 struct vmcb *nested_vmcb, struct page *page)
 {
-	struct vmcb *nested_vmcb;
-	struct vmcb *hsave = svm->nested.hsave;
-	struct vmcb *vmcb = svm->vmcb;
-	struct page *page;
-	u64 vmcb_gpa;
-
-	vmcb_gpa = svm->vmcb->save.rax;
-
-	nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page);
-	if (!nested_vmcb)
-		return false;
-
-	if (!nested_vmcb_checks(nested_vmcb)) {
-		nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
-		nested_vmcb->control.exit_code_hi = 0;
-		nested_vmcb->control.exit_info_1  = 0;
-		nested_vmcb->control.exit_info_2  = 0;
-
-		nested_svm_unmap(page);
-
-		return false;
-	}
-
-	trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
-			       nested_vmcb->save.rip,
-			       nested_vmcb->control.int_ctl,
-			       nested_vmcb->control.event_inj,
-			       nested_vmcb->control.nested_ctl);
-
-	trace_kvm_nested_intercepts(nested_vmcb->control.intercept_cr & 0xffff,
-				    nested_vmcb->control.intercept_cr >> 16,
-				    nested_vmcb->control.intercept_exceptions,
-				    nested_vmcb->control.intercept);
-
-	/* Clear internal status */
-	kvm_clear_exception_queue(&svm->vcpu);
-	kvm_clear_interrupt_queue(&svm->vcpu);
-
-	/*
-	 * Save the old vmcb, so we don't need to pick what we save, but can
-	 * restore everything when a VMEXIT occurs
-	 */
-	hsave->save.es     = vmcb->save.es;
-	hsave->save.cs     = vmcb->save.cs;
-	hsave->save.ss     = vmcb->save.ss;
-	hsave->save.ds     = vmcb->save.ds;
-	hsave->save.gdtr   = vmcb->save.gdtr;
-	hsave->save.idtr   = vmcb->save.idtr;
-	hsave->save.efer   = svm->vcpu.arch.efer;
-	hsave->save.cr0    = kvm_read_cr0(&svm->vcpu);
-	hsave->save.cr4    = svm->vcpu.arch.cr4;
-	hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
-	hsave->save.rip    = kvm_rip_read(&svm->vcpu);
-	hsave->save.rsp    = vmcb->save.rsp;
-	hsave->save.rax    = vmcb->save.rax;
-	if (npt_enabled)
-		hsave->save.cr3    = vmcb->save.cr3;
-	else
-		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
-
-	copy_vmcb_control_area(hsave, vmcb);
-
 	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
 		svm->vcpu.arch.hflags |= HF_HIF_MASK;
 	else
@@ -3048,6 +2987,73 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	enable_gif(svm);
 
 	mark_all_dirty(svm->vmcb);
+}
+
+static bool nested_svm_vmrun(struct vcpu_svm *svm)
+{
+	struct vmcb *nested_vmcb;
+	struct vmcb *hsave = svm->nested.hsave;
+	struct vmcb *vmcb = svm->vmcb;
+	struct page *page;
+	u64 vmcb_gpa;
+
+	vmcb_gpa = svm->vmcb->save.rax;
+
+	nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page);
+	if (!nested_vmcb)
+		return false;
+
+	if (!nested_vmcb_checks(nested_vmcb)) {
+		nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
+		nested_vmcb->control.exit_code_hi = 0;
+		nested_vmcb->control.exit_info_1  = 0;
+		nested_vmcb->control.exit_info_2  = 0;
+
+		nested_svm_unmap(page);
+
+		return false;
+	}
+
+	trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
+			       nested_vmcb->save.rip,
+			       nested_vmcb->control.int_ctl,
+			       nested_vmcb->control.event_inj,
+			       nested_vmcb->control.nested_ctl);
+
+	trace_kvm_nested_intercepts(nested_vmcb->control.intercept_cr & 0xffff,
+				    nested_vmcb->control.intercept_cr >> 16,
+				    nested_vmcb->control.intercept_exceptions,
+				    nested_vmcb->control.intercept);
+
+	/* Clear internal status */
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);
+
+	/*
+	 * Save the old vmcb, so we don't need to pick what we save, but can
+	 * restore everything when a VMEXIT occurs
+	 */
+	hsave->save.es     = vmcb->save.es;
+	hsave->save.cs     = vmcb->save.cs;
+	hsave->save.ss     = vmcb->save.ss;
+	hsave->save.ds     = vmcb->save.ds;
+	hsave->save.gdtr   = vmcb->save.gdtr;
+	hsave->save.idtr   = vmcb->save.idtr;
+	hsave->save.efer   = svm->vcpu.arch.efer;
+	hsave->save.cr0    = kvm_read_cr0(&svm->vcpu);
+	hsave->save.cr4    = svm->vcpu.arch.cr4;
+	hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
+	hsave->save.rip    = kvm_rip_read(&svm->vcpu);
+	hsave->save.rsp    = vmcb->save.rsp;
+	hsave->save.rax    = vmcb->save.rax;
+	if (npt_enabled)
+		hsave->save.cr3    = vmcb->save.cr3;
+	else
+		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
+
+	copy_vmcb_control_area(hsave, vmcb);
+
+	enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb, page);
 
 	return true;
 }
-- 
2.9.3


* [PATCH 5/5] KVM: nSVM: fix SMI injection in guest mode
  2017-09-13 14:06 [PATCH 0/5] KVM: nested virt SMM fixes Ladi Prosek
                   ` (3 preceding siblings ...)
  2017-09-13 14:06 ` [PATCH 4/5] KVM: nSVM: refactor nested_svm_vmrun Ladi Prosek
@ 2017-09-13 14:06 ` Ladi Prosek
  4 siblings, 0 replies; 9+ messages in thread
From: Ladi Prosek @ 2017-09-13 14:06 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in the SMM execution environment)
* the MMU was confused (walk_mmu was still set to nested_mmu)

The Intel SDM actually prescribes that the logical processor "leave VMX operation"
upon entering SMM; see 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem
to document this, but handling SMM in the same fashion is a safe bet. What we need
to do is basically get out of guest mode for the duration of SMM. All of this is
completely transparent to L1, i.e. L1 is not given control and no L1-observable
state changes.

To avoid code duplication, this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_svm_vmexit is called, unchanged. Re-entering is performed
using enter_svm_guest_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/svm.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0f85062..b9ae1a2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -152,6 +152,14 @@ struct nested_state {
 
 	/* Nested Paging related state */
 	u64 nested_cr3;
+
+	/* SMM related state */
+	struct {
+		/* in guest mode on SMM entry? */
+		bool guest_mode;
+		/* current nested VMCB on SMM entry */
+		u64 vmcb;
+	} smm;
 };
 
 #define MSRPM_OFFSETS	16
@@ -5365,13 +5373,38 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
 
 static int svm_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
-	/* TODO: Implement */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	svm->nested.smm.guest_mode = is_guest_mode(vcpu);
+	if (svm->nested.smm.guest_mode) {
+		svm->nested.smm.vmcb = svm->nested.vmcb;
+
+		svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
+		svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
+		svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+
+		ret = nested_svm_vmexit(svm);
+		if (ret)
+			return ret;
+	}
 	return 0;
 }
 
 static int svm_post_leave_smm(struct kvm_vcpu *vcpu)
 {
-	/* TODO: Implement */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *nested_vmcb;
+	struct page *page;
+
+	if (svm->nested.smm.guest_mode) {
+		nested_vmcb = nested_svm_map(svm, svm->nested.smm.vmcb, &page);
+		if (!nested_vmcb)
+			return 1;
+		enter_svm_guest_mode(svm, svm->nested.smm.vmcb, nested_vmcb, page);
+
+		svm->nested.smm.guest_mode = false;
+	}
 	return 0;
 }
 
-- 
2.9.3


* Re: [PATCH 1/5] KVM: x86: introduce ISA specific SMM entry/exit callbacks
  2017-09-13 14:06 ` [PATCH 1/5] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
@ 2017-09-13 21:44   ` Paolo Bonzini
  2017-09-14  7:14     ` Ladi Prosek
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2017-09-13 21:44 UTC (permalink / raw)
  To: Ladi Prosek, kvm; +Cc: rkrcmar

On 13/09/2017 16:06, Ladi Prosek wrote:
> +	bool left_smm;  /* post_leave_smm() needs to be called after emulation */

This is already stored (more or less) in hflags.  Would it work to
invoke the hook from kvm_smm_changed instead?

Thanks,

Paolo


* Re: [PATCH 1/5] KVM: x86: introduce ISA specific SMM entry/exit callbacks
  2017-09-13 21:44   ` Paolo Bonzini
@ 2017-09-14  7:14     ` Ladi Prosek
  2017-09-14  9:47       ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: Ladi Prosek @ 2017-09-14  7:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: KVM list, Radim Krcmar

On Wed, Sep 13, 2017 at 11:44 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 13/09/2017 16:06, Ladi Prosek wrote:
>> +     bool left_smm;  /* post_leave_smm() needs to be called after emulation */
>
> This is already stored (more or less) in hflags.  Would it work to
> invoke the hook from kvm_smm_changed instead?

I would have to reorder some of the calls under "if (writeback)" in
x86_emulate_instruction to make it work. The hook needs to be called
after all vcpu state has been synced. For example, kvm_rip_write currently
runs after kvm_set_hflags, so it would overwrite the rip restored by
the hook.

The left_smm field is indeed not necessary though. What if I leave it
as a separate statement in x86_emulate_instruction to make the
ordering requirement explicit, but use hflags to detect that we've
left SMM?
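
Roughly (untested sketch; emul_flags would be a new local sampled before
emulation, everything else already exists in the series):

	/* in x86_emulate_instruction(), before running the emulator */
	unsigned long emul_flags = vcpu->arch.hflags;
	...
	/* in the writeback path, after kvm_rip_write() */
	if (r == EMULATE_DONE &&
	    (emul_flags & HF_SMM_MASK) &&
	    !(vcpu->arch.hflags & HF_SMM_MASK))
		kvm_x86_ops->post_leave_smm(vcpu);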

Thanks!
Ladi


* Re: [PATCH 1/5] KVM: x86: introduce ISA specific SMM entry/exit callbacks
  2017-09-14  7:14     ` Ladi Prosek
@ 2017-09-14  9:47       ` Paolo Bonzini
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2017-09-14  9:47 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: KVM list, Radim Krcmar

On 14/09/2017 09:14, Ladi Prosek wrote:
> I would have to reorder some of the calls under "if (writeback)" in
> x86_emulate_instruction to make it work. The hook needs to be called
> after all vcpu state has been synced. For example now kvm_rip_write
> runs after kvm_set_hflags so it would overwrite the rip restored by
> the hook.
> 
> The left_smm field is indeed not necessary though. What if I leave it
> as a separate statement in x86_emulate_instruction to make the
> ordering requirement explicit, but use hflags to detect that we've
> left SMM?

That'd be even better, yes.

Paolo


