* [PATCH v3 0/6] KVM: nested virt SMM fixes
@ 2017-09-25  8:08 Ladi Prosek
  2017-09-25  8:08 ` [PATCH v3 1/6] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Ladi Prosek @ 2017-09-25  8:08 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Windows Server 2016 with Hyper-V enabled fails to boot on OVMF with SMM
(OVMF_CODE-need-smm.fd). It turns out that the SMM emulation code in KVM
does not handle nested virtualization very well, leading to a whole bunch
of issues.

For example, Hyper-V uses descriptor table exiting (SECONDARY_EXEC_DESC),
so when the SMM handler tries to switch out of real mode, a VM exit occurs
and is forwarded to a clueless L1.

This series fixes it by switching the vcpu to !guest_mode, i.e. to the L1
state, before entering SMM and then switching back to L2 after the RSM
instruction is emulated.
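
For anyone skimming the series, here is a minimal user-space model of the
intended ordering. It is purely illustrative: the struct and helper names
are made up and only mirror the prep_enter_smm()/post_leave_smm() callbacks
added in patch 1, and it models the mode transitions only, not the state
save/restore done by the real code.

#include <stdbool.h>
#include <stdio.h>

/* toy stand-in for the vcpu, not the real KVM structures */
struct vcpu_model {
        bool guest_mode;        /* running L2? */
        bool in_smm;            /* HF_SMM_MASK analogue */
        bool was_guest_mode;    /* stashed across SMM */
};

static void inject_smi(struct vcpu_model *v)
{
        /* prep_enter_smm(): leave guest mode before any SMM state is built */
        v->was_guest_mode = v->guest_mode;
        v->guest_mode = false;          /* emulated nested VM exit to L1 */
        v->in_smm = true;               /* the SMM handler runs in the L1 state */
}

static void emulate_rsm(struct vcpu_model *v)
{
        v->in_smm = false;              /* RSM restores the saved state... */
        /* post_leave_smm(): ...and only then is L2 re-entered */
        if (v->was_guest_mode) {
                v->guest_mode = true;
                v->was_guest_mode = false;
        }
}

int main(void)
{
        struct vcpu_model v = { .guest_mode = true };

        inject_smi(&v);
        printf("in SMM:    guest_mode=%d\n", v.guest_mode);     /* 0 */
        emulate_rsm(&v);
        printf("after RSM: guest_mode=%d\n", v.guest_mode);     /* 1 */
        return 0;
}

The actual patches of course also have to save and restore the nested
state; the diffs below show how.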

Patches 1 and 2 are common to both Intel and AMD, patch 3 fixes Intel,
and patches 5 and 6 fix AMD. Patch 4 prevents CR4.VMXE from being set in
SMM on Intel; it is, however, not required to make Windows work.

v1->v2:
* Moved left_smm detection to emulator_set_hflags (couldn't quite get rid
  of the field despite my original claim) (Paolo)
* Moved the kvm_x86_ops->post_leave_smm() call a few statements down so
  it really runs after all state has been synced.
* Added the smi_allowed callback (new patch 2) to avoid running into
  WARN_ON_ONCE(vmx->nested.nested_run_pending) on Intel.

v2->v3:
* Omitted patch 4 ("KVM: nVMX: save nested EPT information in SMRAM state
  save map") and replaced it with ("treat CR4.VMXE as reserved in SMM")
  (Paolo)
* Implemented smi_allowed on AMD to support SMI interception. It turns out
  Windows needs this when running on >1 vCPU.
* Eliminated internal SMM state on AMD and switched to using the SMM state
  save area in guest memory instead (Paolo)

Ladi Prosek (6):
      KVM: x86: introduce ISA specific SMM entry/exit callbacks
      KVM: x86: introduce ISA specific smi_allowed callback
      KVM: nVMX: fix SMI injection in guest mode
      KVM: nVMX: treat CR4.VMXE as reserved in SMM
      KVM: nSVM: refactor nested_svm_vmrun
      KVM: nSVM: fix SMI injection in guest mode

 arch/x86/include/asm/kvm_emulate.h |   2 +
 arch/x86/include/asm/kvm_host.h    |   7 ++
 arch/x86/kvm/emulate.c             |  18 +++-
 arch/x86/kvm/svm.c                 | 205 +++++++++++++++++++++++++------------
 arch/x86/kvm/vmx.c                 |  79 ++++++++++++--
 arch/x86/kvm/x86.c                 |  18 +++-
 6 files changed, 249 insertions(+), 80 deletions(-)


* [PATCH v3 1/6] KVM: x86: introduce ISA specific SMM entry/exit callbacks
  2017-09-25  8:08 [PATCH v3 0/6] KVM: nested virt SMM fixes Ladi Prosek
@ 2017-09-25  8:08 ` Ladi Prosek
  2017-09-25  8:09 ` [PATCH v3 2/6] KVM: x86: introduce ISA specific smi_allowed callback Ladi Prosek
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Ladi Prosek @ 2017-09-25  8:08 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Entering and exiting SMM may require ISA specific handling under certain
circumstances. This commit adds two new callbacks with empty implementations.
Actual functionality will be added in following commits.

* prep_enter_smm() is to be called when injecting an SMI, before any
  SMM related vcpu state has been changed
* post_leave_smm() is to be called when emulating the RSM instruction,
  after all SMM related vcpu state has been restored

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  2 ++
 arch/x86/include/asm/kvm_host.h    |  3 +++
 arch/x86/kvm/svm.c                 | 15 +++++++++++++++
 arch/x86/kvm/vmx.c                 | 15 +++++++++++++++
 arch/x86/kvm/x86.c                 | 13 ++++++++++++-
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index fa2558e12024..99f83367b92c 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -289,6 +289,7 @@ struct x86_emulate_ctxt {
 	/* Register state before/after emulation. */
 	unsigned long eflags;
 	unsigned long eip; /* eip before instruction emulation */
+	u64 smbase; /* smbase before instruction emulation */
 	/* Emulated execution mode, represented by an X86EMUL_MODE value. */
 	enum x86emul_mode mode;
 
@@ -298,6 +299,7 @@ struct x86_emulate_ctxt {
 	bool perm_ok; /* do not check permissions if true */
 	bool ud;	/* inject an #UD if host doesn't support insn */
 	bool tf;	/* TF value before instruction (after for syscall/sysret) */
+	bool left_smm;  /* post_leave_smm() needs to be called after emulation */
 
 	bool have_exception;
 	struct x86_exception exception;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c73e493adf07..596f2e826327 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1061,6 +1061,9 @@ struct kvm_x86_ops {
 	void (*cancel_hv_timer)(struct kvm_vcpu *vcpu);
 
 	void (*setup_mce)(struct kvm_vcpu *vcpu);
+
+	int (*prep_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
+	int (*post_leave_smm)(struct kvm_vcpu *vcpu, u64 smbase);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0e68f0b3cbf7..cdbbf9537111 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5393,6 +5393,18 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
 	vcpu->arch.mcg_cap &= 0x1ff;
 }
 
+static int svm_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
+static int svm_post_leave_smm(struct kvm_vcpu *vcpu, u64 smbase)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
 static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -5503,6 +5515,9 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.deliver_posted_interrupt = svm_deliver_avic_intr,
 	.update_pi_irte = svm_update_pi_irte,
 	.setup_mce = svm_setup_mce,
+
+	.prep_enter_smm = svm_prep_enter_smm,
+	.post_leave_smm = svm_post_leave_smm,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c83d28b0ab05..10f5526f1069 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11946,6 +11946,18 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEATURE_CONTROL_LMCE;
 }
 
+static int vmx_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
+static int vmx_post_leave_smm(struct kvm_vcpu *vcpu, u64 smbase)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
 static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -12071,6 +12083,9 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 #endif
 
 	.setup_mce = vmx_setup_mce,
+
+	.prep_enter_smm = vmx_prep_enter_smm,
+	.post_leave_smm = vmx_post_leave_smm,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd17b7d9a107..8007a6ec2e5b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5272,7 +5272,11 @@ static unsigned emulator_get_hflags(struct x86_emulate_ctxt *ctxt)
 
 static void emulator_set_hflags(struct x86_emulate_ctxt *ctxt, unsigned emul_flags)
 {
-	kvm_set_hflags(emul_to_vcpu(ctxt), emul_flags);
+	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+
+	if ((vcpu->arch.hflags & HF_SMM_MASK) && !(emul_flags & HF_SMM_MASK))
+		ctxt->left_smm = true;
+	kvm_set_hflags(vcpu, emul_flags);
 }
 
 static const struct x86_emulate_ops emulate_ops = {
@@ -5692,6 +5696,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 		ctxt->have_exception = false;
 		ctxt->exception.vector = -1;
 		ctxt->perm_ok = false;
+		ctxt->smbase = ctxt->ops->get_smbase(ctxt);
+		ctxt->left_smm = false;
 
 		ctxt->ud = emulation_type & EMULTYPE_TRAP_UD;
 
@@ -5779,6 +5785,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 		if (!ctxt->have_exception ||
 		    exception_type(ctxt->exception.vector) == EXCPT_TRAP)
 			__kvm_set_rflags(vcpu, ctxt->eflags);
+		if (r == EMULATE_DONE && ctxt->left_smm)
+			kvm_x86_ops->post_leave_smm(vcpu, ctxt->smbase);
 
 		/*
 		 * For STI, interrupts are shadowed; so KVM_REQ_EVENT will
@@ -6643,6 +6651,9 @@ static void enter_smm(struct kvm_vcpu *vcpu)
 	trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
 	vcpu->arch.hflags |= HF_SMM_MASK;
 	memset(buf, 0, 512);
+
+	kvm_x86_ops->prep_enter_smm(vcpu, buf);
+
 	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
 		enter_smm_save_state_64(vcpu, buf);
 	else
-- 
2.13.5


* [PATCH v3 2/6] KVM: x86: introduce ISA specific smi_allowed callback
  2017-09-25  8:08 [PATCH v3 0/6] KVM: nested virt SMM fixes Ladi Prosek
  2017-09-25  8:08 ` [PATCH v3 1/6] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
@ 2017-09-25  8:09 ` Ladi Prosek
  2017-09-25  8:09 ` [PATCH v3 3/6] KVM: nVMX: fix SMI injection in guest mode Ladi Prosek
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Ladi Prosek @ 2017-09-25  8:09 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Similar to NMI, there may be ISA specific reasons why an SMI cannot be
injected into the guest. This commit adds a new smi_allowed callback to
be implemented in following commits.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm.c              | 6 ++++++
 arch/x86/kvm/vmx.c              | 6 ++++++
 arch/x86/kvm/x86.c              | 2 +-
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 596f2e826327..2445b2ba26f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1062,6 +1062,7 @@ struct kvm_x86_ops {
 
 	void (*setup_mce)(struct kvm_vcpu *vcpu);
 
+	int (*smi_allowed)(struct kvm_vcpu *vcpu);
 	int (*prep_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
 	int (*post_leave_smm)(struct kvm_vcpu *vcpu, u64 smbase);
 };
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index cdbbf9537111..1d8dd364ead0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5393,6 +5393,11 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
 	vcpu->arch.mcg_cap &= 0x1ff;
 }
 
+static int svm_smi_allowed(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+
 static int svm_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
 	/* TODO: Implement */
@@ -5516,6 +5521,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.update_pi_irte = svm_update_pi_irte,
 	.setup_mce = svm_setup_mce,
 
+	.smi_allowed = svm_smi_allowed,
 	.prep_enter_smm = svm_prep_enter_smm,
 	.post_leave_smm = svm_post_leave_smm,
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 10f5526f1069..5ddd092fb140 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11946,6 +11946,11 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEATURE_CONTROL_LMCE;
 }
 
+static int vmx_smi_allowed(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+
 static int vmx_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
 	/* TODO: Implement */
@@ -12084,6 +12089,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 
 	.setup_mce = vmx_setup_mce,
 
+	.smi_allowed = vmx_smi_allowed,
 	.prep_enter_smm = vmx_prep_enter_smm,
 	.post_leave_smm = vmx_post_leave_smm,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8007a6ec2e5b..5c4c49e8e660 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6434,7 +6434,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 		}
 
 		kvm_x86_ops->queue_exception(vcpu);
-	} else if (vcpu->arch.smi_pending && !is_smm(vcpu)) {
+	} else if (vcpu->arch.smi_pending && !is_smm(vcpu) && kvm_x86_ops->smi_allowed(vcpu)) {
 		vcpu->arch.smi_pending = false;
 		enter_smm(vcpu);
 	} else if (vcpu->arch.nmi_pending && kvm_x86_ops->nmi_allowed(vcpu)) {
-- 
2.13.5


* [PATCH v3 3/6] KVM: nVMX: fix SMI injection in guest mode
  2017-09-25  8:08 [PATCH v3 0/6] KVM: nested virt SMM fixes Ladi Prosek
  2017-09-25  8:08 ` [PATCH v3 1/6] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
  2017-09-25  8:09 ` [PATCH v3 2/6] KVM: x86: introduce ISA specific smi_allowed callback Ladi Prosek
@ 2017-09-25  8:09 ` Ladi Prosek
  2017-09-25  8:09 ` [PATCH v3 4/6] KVM: nVMX: treat CR4.VMXE as reserved in SMM Ladi Prosek
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Ladi Prosek @ 2017-09-25  8:09 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* SMM handler couldn't update CR4 because vmx_set_cr4 applied validity
  checks predicated on nested.vmxon
* MMU was confused (walk_mmu was still set to nested_mmu)

The Intel SDM actually prescribes that the logical processor "leave VMX
operation" upon entering SMM, in 34.14.1 Default Treatment of SMI Delivery.
What we need to do is basically get out of guest mode and set nested.vmxon
to false for the duration of SMM. All of this is completely transparent to
L1, i.e. L1 is not given control and no L1-observable state changes.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_vmx_vmexit with exit_reason == -1 is called, a trick already
used in vmx_leave_nested. Re-entering is cleaner, using enter_vmx_non_root_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/vmx.c | 58 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 47 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5ddd092fb140..be4b5f9a84bf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -486,6 +486,14 @@ struct nested_vmx {
 	u64 nested_vmx_cr4_fixed1;
 	u64 nested_vmx_vmcs_enum;
 	u64 nested_vmx_vmfunc_controls;
+
+	/* SMM related state */
+	struct {
+		/* in VMX operation on SMM entry? */
+		bool vmxon;
+		/* in guest mode on SMM entry? */
+		bool guest_mode;
+	} smm;
 };
 
 #define POSTED_INTR_ON  0
@@ -11419,8 +11427,11 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	leave_guest_mode(vcpu);
 
 	if (likely(!vmx->fail)) {
-		prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
-			       exit_qualification);
+		if (exit_reason == -1)
+			sync_vmcs12(vcpu, vmcs12);
+		else
+			prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
+				       exit_qualification);
 
 		if (nested_vmx_store_msr(vcpu, vmcs12->vm_exit_msr_store_addr,
 					 vmcs12->vm_exit_msr_store_count))
@@ -11484,7 +11495,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	 */
 	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 
-	if (enable_shadow_vmcs)
+	if (enable_shadow_vmcs && exit_reason != -1)
 		vmx->nested.sync_shadow_vmcs = true;
 
 	/* in case we halted in L2 */
@@ -11508,12 +11519,13 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 				INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
 		}
 
-		trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
-					       vmcs12->exit_qualification,
-					       vmcs12->idt_vectoring_info_field,
-					       vmcs12->vm_exit_intr_info,
-					       vmcs12->vm_exit_intr_error_code,
-					       KVM_ISA_VMX);
+		if (exit_reason != -1)
+			trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
+						       vmcs12->exit_qualification,
+						       vmcs12->idt_vectoring_info_field,
+						       vmcs12->vm_exit_intr_info,
+						       vmcs12->vm_exit_intr_error_code,
+						       KVM_ISA_VMX);
 
 		load_vmcs12_host_state(vcpu, vmcs12);
 
@@ -11948,18 +11960,42 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 
 static int vmx_smi_allowed(struct kvm_vcpu *vcpu)
 {
+	/* we need a nested vmexit to enter SMM, postpone if run is pending */
+	if (to_vmx(vcpu)->nested.nested_run_pending)
+		return 0;
 	return 1;
 }
 
 static int vmx_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
-	/* TODO: Implement */
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	vmx->nested.smm.guest_mode = is_guest_mode(vcpu);
+	if (vmx->nested.smm.guest_mode)
+		nested_vmx_vmexit(vcpu, -1, 0, 0);
+
+	vmx->nested.smm.vmxon = vmx->nested.vmxon;
+	vmx->nested.vmxon = false;
 	return 0;
 }
 
 static int vmx_post_leave_smm(struct kvm_vcpu *vcpu, u64 smbase)
 {
-	/* TODO: Implement */
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int ret;
+
+	if (vmx->nested.smm.vmxon) {
+		vmx->nested.vmxon = true;
+		vmx->nested.smm.vmxon = false;
+	}
+
+	if (vmx->nested.smm.guest_mode) {
+		ret = enter_vmx_non_root_mode(vcpu, false);
+		if (ret)
+			return ret;
+
+		vmx->nested.smm.guest_mode = false;
+	}
 	return 0;
 }
 
-- 
2.13.5


* [PATCH v3 4/6] KVM: nVMX: treat CR4.VMXE as reserved in SMM
  2017-09-25  8:08 [PATCH v3 0/6] KVM: nested virt SMM fixes Ladi Prosek
                   ` (2 preceding siblings ...)
  2017-09-25  8:09 ` [PATCH v3 3/6] KVM: nVMX: fix SMI injection in guest mode Ladi Prosek
@ 2017-09-25  8:09 ` Ladi Prosek
  2017-09-25  8:09 ` [PATCH v3 5/6] KVM: nSVM: refactor nested_svm_vmrun Ladi Prosek
  2017-09-25  8:09 ` [PATCH v3 6/6] KVM: nSVM: fix SMI injection in guest mode Ladi Prosek
  5 siblings, 0 replies; 11+ messages in thread
From: Ladi Prosek @ 2017-09-25  8:09 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Intel SDM 34.14.3 Protection of CR4.VMXE in SMM:

"Under the default treatment, CR4.VMXE is treated as a reserved bit while a
logical processor is in SMM. Any attempt by software running in SMM to set
this bit causes a general-protection exception."

em_rsm() may set CR4.VMXE as part of its loading of saved SMM state, so
care must be taken to order this properly (HF_SMM_MASK is required to
access the state-save area but must be cleared before setting CR4.VMXE).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/emulate.c | 18 +++++++++++++++---
 arch/x86/kvm/vmx.c     |  4 ++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 15f527b44aa7..f12bf51d379a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2479,6 +2479,9 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt, u64 smbase)
 
 	ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smbase, 0x7ef8));
 
+	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
+		~X86EMUL_SMM_MASK);
+
 	return rsm_enter_protected_mode(ctxt, cr0, cr4);
 }
 
@@ -2531,7 +2534,8 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
 	dt.address =                GET_SMSTATE(u64, smbase, 0x7e68);
 	ctxt->ops->set_gdt(ctxt, &dt);
 
-	r = rsm_enter_protected_mode(ctxt, cr0, cr4);
+	/* We're still in SMM so CR4.VMXE is reserved. */
+	r = rsm_enter_protected_mode(ctxt, cr0, cr4 & ~X86_CR4_VMXE);
 	if (r != X86EMUL_CONTINUE)
 		return r;
 
@@ -2541,6 +2545,13 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
 			return r;
 	}
 
+	/* Out of SMM now and finish off CR4. */
+	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
+		~X86EMUL_SMM_MASK);
+
+	if (ctxt->ops->set_cr(ctxt, 4, cr4))
+		return X86EMUL_UNHANDLEABLE;
+
 	return X86EMUL_CONTINUE;
 }
 
@@ -2601,9 +2612,10 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
 
 	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
 		ctxt->ops->set_nmi_mask(ctxt, false);
+	else
+		ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
+			~X86EMUL_SMM_INSIDE_NMI_MASK);
 
-	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
-		~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));
 	return X86EMUL_CONTINUE;
 }
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index be4b5f9a84bf..7e350b8d724d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4366,6 +4366,10 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		 */
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
+
+		/* cr4.VMXE is a reserved bit in SMM */
+		if (is_smm(vcpu))
+			return 1;
 	}
 
 	if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
-- 
2.13.5


* [PATCH v3 5/6] KVM: nSVM: refactor nested_svm_vmrun
  2017-09-25  8:08 [PATCH v3 0/6] KVM: nested virt SMM fixes Ladi Prosek
                   ` (3 preceding siblings ...)
  2017-09-25  8:09 ` [PATCH v3 4/6] KVM: nVMX: treat CR4.VMXE as reserved in SMM Ladi Prosek
@ 2017-09-25  8:09 ` Ladi Prosek
  2017-09-25  8:09 ` [PATCH v3 6/6] KVM: nSVM: fix SMI injection in guest mode Ladi Prosek
  5 siblings, 0 replies; 11+ messages in thread
From: Ladi Prosek @ 2017-09-25  8:09 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Analogous to 858e25c06fb0 ("kvm: nVMX: Refactor nested_vmx_run()"), this commit splits
nested_svm_vmrun into two parts. The newly introduced enter_svm_guest_mode modifies the
vcpu state to transition from L1 to L2, while the code left in nested_svm_vmrun handles
the VMRUN instruction.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/svm.c | 132 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 69 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d8dd364ead0..b9f19d715fc3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2916,70 +2916,9 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
 	return true;
 }
 
-static bool nested_svm_vmrun(struct vcpu_svm *svm)
+static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
+				 struct vmcb *nested_vmcb, struct page *page)
 {
-	struct vmcb *nested_vmcb;
-	struct vmcb *hsave = svm->nested.hsave;
-	struct vmcb *vmcb = svm->vmcb;
-	struct page *page;
-	u64 vmcb_gpa;
-
-	vmcb_gpa = svm->vmcb->save.rax;
-
-	nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page);
-	if (!nested_vmcb)
-		return false;
-
-	if (!nested_vmcb_checks(nested_vmcb)) {
-		nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
-		nested_vmcb->control.exit_code_hi = 0;
-		nested_vmcb->control.exit_info_1  = 0;
-		nested_vmcb->control.exit_info_2  = 0;
-
-		nested_svm_unmap(page);
-
-		return false;
-	}
-
-	trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
-			       nested_vmcb->save.rip,
-			       nested_vmcb->control.int_ctl,
-			       nested_vmcb->control.event_inj,
-			       nested_vmcb->control.nested_ctl);
-
-	trace_kvm_nested_intercepts(nested_vmcb->control.intercept_cr & 0xffff,
-				    nested_vmcb->control.intercept_cr >> 16,
-				    nested_vmcb->control.intercept_exceptions,
-				    nested_vmcb->control.intercept);
-
-	/* Clear internal status */
-	kvm_clear_exception_queue(&svm->vcpu);
-	kvm_clear_interrupt_queue(&svm->vcpu);
-
-	/*
-	 * Save the old vmcb, so we don't need to pick what we save, but can
-	 * restore everything when a VMEXIT occurs
-	 */
-	hsave->save.es     = vmcb->save.es;
-	hsave->save.cs     = vmcb->save.cs;
-	hsave->save.ss     = vmcb->save.ss;
-	hsave->save.ds     = vmcb->save.ds;
-	hsave->save.gdtr   = vmcb->save.gdtr;
-	hsave->save.idtr   = vmcb->save.idtr;
-	hsave->save.efer   = svm->vcpu.arch.efer;
-	hsave->save.cr0    = kvm_read_cr0(&svm->vcpu);
-	hsave->save.cr4    = svm->vcpu.arch.cr4;
-	hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
-	hsave->save.rip    = kvm_rip_read(&svm->vcpu);
-	hsave->save.rsp    = vmcb->save.rsp;
-	hsave->save.rax    = vmcb->save.rax;
-	if (npt_enabled)
-		hsave->save.cr3    = vmcb->save.cr3;
-	else
-		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
-
-	copy_vmcb_control_area(hsave, vmcb);
-
 	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
 		svm->vcpu.arch.hflags |= HF_HIF_MASK;
 	else
@@ -3072,6 +3011,73 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	enable_gif(svm);
 
 	mark_all_dirty(svm->vmcb);
+}
+
+static bool nested_svm_vmrun(struct vcpu_svm *svm)
+{
+	struct vmcb *nested_vmcb;
+	struct vmcb *hsave = svm->nested.hsave;
+	struct vmcb *vmcb = svm->vmcb;
+	struct page *page;
+	u64 vmcb_gpa;
+
+	vmcb_gpa = svm->vmcb->save.rax;
+
+	nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page);
+	if (!nested_vmcb)
+		return false;
+
+	if (!nested_vmcb_checks(nested_vmcb)) {
+		nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
+		nested_vmcb->control.exit_code_hi = 0;
+		nested_vmcb->control.exit_info_1  = 0;
+		nested_vmcb->control.exit_info_2  = 0;
+
+		nested_svm_unmap(page);
+
+		return false;
+	}
+
+	trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
+			       nested_vmcb->save.rip,
+			       nested_vmcb->control.int_ctl,
+			       nested_vmcb->control.event_inj,
+			       nested_vmcb->control.nested_ctl);
+
+	trace_kvm_nested_intercepts(nested_vmcb->control.intercept_cr & 0xffff,
+				    nested_vmcb->control.intercept_cr >> 16,
+				    nested_vmcb->control.intercept_exceptions,
+				    nested_vmcb->control.intercept);
+
+	/* Clear internal status */
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);
+
+	/*
+	 * Save the old vmcb, so we don't need to pick what we save, but can
+	 * restore everything when a VMEXIT occurs
+	 */
+	hsave->save.es     = vmcb->save.es;
+	hsave->save.cs     = vmcb->save.cs;
+	hsave->save.ss     = vmcb->save.ss;
+	hsave->save.ds     = vmcb->save.ds;
+	hsave->save.gdtr   = vmcb->save.gdtr;
+	hsave->save.idtr   = vmcb->save.idtr;
+	hsave->save.efer   = svm->vcpu.arch.efer;
+	hsave->save.cr0    = kvm_read_cr0(&svm->vcpu);
+	hsave->save.cr4    = svm->vcpu.arch.cr4;
+	hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
+	hsave->save.rip    = kvm_rip_read(&svm->vcpu);
+	hsave->save.rsp    = vmcb->save.rsp;
+	hsave->save.rax    = vmcb->save.rax;
+	if (npt_enabled)
+		hsave->save.cr3    = vmcb->save.cr3;
+	else
+		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
+
+	copy_vmcb_control_area(hsave, vmcb);
+
+	enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb, page);
 
 	return true;
 }
-- 
2.13.5


* [PATCH v3 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-09-25  8:08 [PATCH v3 0/6] KVM: nested virt SMM fixes Ladi Prosek
                   ` (4 preceding siblings ...)
  2017-09-25  8:09 ` [PATCH v3 5/6] KVM: nSVM: refactor nested_svm_vmrun Ladi Prosek
@ 2017-09-25  8:09 ` Ladi Prosek
  2017-10-03 19:53   ` Radim Krčmář
  5 siblings, 1 reply; 11+ messages in thread
From: Ladi Prosek @ 2017-09-25  8:09 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* MMU was confused (walk_mmu was still set to nested_mmu)
* INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)

The Intel SDM actually prescribes that the logical processor "leave VMX
operation" upon entering SMM, in 34.14.1 Default Treatment of SMI Delivery.
AMD doesn't seem to document this, but they provide fields in the SMM
state-save area to stash the current state of SVM. What we need to do is
basically get out of guest mode for the duration of SMM. All of this is
completely transparent to L1, i.e. L1 is not given control and no
L1-observable state changes.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_svm_vmexit is called, unchanged. Re-entering is performed using
enter_svm_guest_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/svm.c              | 56 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |  3 ---
 3 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2445b2ba26f9..e582b8c9579b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1430,4 +1430,7 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 #endif
 }
 
+#define put_smstate(type, buf, offset, val)                      \
+	*(type *)((buf) + (offset) - 0x7e00) = val
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b9f19d715fc3..416ec56d6715 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5401,18 +5401,70 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
 
 static int svm_smi_allowed(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	/* Per APM Vol.2 15.22.2 "Response to SMI" */
+	if (!gif_set(svm))
+		return 0;
+
+	if (is_guest_mode(&svm->vcpu) &&
+	    svm->nested.intercept & (1ULL << INTERCEPT_SMI)) {
+		/* TODO: Might need to set exit_info_1 and exit_info_2 here */
+		svm->vmcb->control.exit_code = SVM_EXIT_SMI;
+		svm->nested.exit_required = true;
+		return 0;
+	}
+
 	return 1;
 }
 
 static int svm_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
-	/* TODO: Implement */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (is_guest_mode(vcpu)) {
+		/* FED8h - SVM Guest */
+		put_smstate(u64, smstate, 0x7ed8, 1);
+		/* FEE0h - SVM Guest VMCB Physical Address */
+		put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb);
+
+		svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
+		svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
+		svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+
+		ret = nested_svm_vmexit(svm);
+		if (ret)
+			return ret;
+	}
 	return 0;
 }
 
 static int svm_post_leave_smm(struct kvm_vcpu *vcpu, u64 smbase)
 {
-	/* TODO: Implement */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *nested_vmcb;
+	struct page *page;
+	struct {
+		u64 guest;
+		u64 vmcb;
+	} svm_state_save;
+	int r;
+
+	/* Temporarily set the SMM flag to access the SMM state-save area */
+	vcpu->arch.hflags |= HF_SMM_MASK;
+	r = kvm_vcpu_read_guest(vcpu, smbase + 0xfed8, &svm_state_save,
+				sizeof(svm_state_save));
+	vcpu->arch.hflags &= ~HF_SMM_MASK;
+	if (r)
+		return r;
+
+	if (svm_state_save.guest) {
+		nested_vmcb = nested_svm_map(svm, svm_state_save.vmcb, &page);
+		if (!nested_vmcb)
+			return 1;
+		enter_svm_guest_mode(svm, svm_state_save.vmcb, nested_vmcb, page);
+	}
 	return 0;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c4c49e8e660..41aa1da599bc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6481,9 +6481,6 @@ static void process_nmi(struct kvm_vcpu *vcpu)
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
-#define put_smstate(type, buf, offset, val)			  \
-	*(type *)((buf) + (offset) - 0x7e00) = val
-
 static u32 enter_smm_get_segment_flags(struct kvm_segment *seg)
 {
 	u32 flags = 0;
-- 
2.13.5


* Re: [PATCH v3 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-09-25  8:09 ` [PATCH v3 6/6] KVM: nSVM: fix SMI injection in guest mode Ladi Prosek
@ 2017-10-03 19:53   ` Radim Krčmář
  2017-10-04 10:10     ` Ladi Prosek
  0 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2017-10-03 19:53 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: kvm, pbonzini

2017-09-25 10:09+0200, Ladi Prosek:
> Entering SMM while running in guest mode wasn't working very well because several
> pieces of the vcpu state were left set up for nested operation.
> 
> Some of the issues observed:
> 
> * L1 was getting unexpected VM exits (using L1 interception controls but running
>   in SMM execution environment)
> * MMU was confused (walk_mmu was still set to nested_mmu)
> * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
> 
> Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
> entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
> document this but they provide fields in the SMM state-save area to stash the
> current state of SVM. What we need to do is basically get out of guest mode for
> the duration of SMM. All this completely transparent to L1, i.e. L1 is not given
> control and no L1 observable state changes.

The best description I found is in APM vol. 2, 15.22 SMM Support:

• The simplest solution is to not intercept SMI signals. SMIs
  encountered while in a guest context are taken from within the guest
  context. In this case, the SMM handler is not subject to any
  intercepts set up by the VMM and consequently runs outside of the
  virtualization controls. The state saved in the SMM State-Save area as
  seen by the SMM handler reflects the state of the guest that had been
  running at the time the SMI was encountered. When the SMM handler
  executes the RSM instruction, the processor returns to executing in
  the guest context, and any modifications to the SMM State-Save area
  made by the SMM handler are reflected in the guest state.

I think that the last sentence is not implemented correctly:
svm_prep_enter_smm() loads the host state into VMCB and
enter_smm_save_state_64() then puts host state where the SMM handler
would expect guest state.

Do Windows intercept SMI?

Thanks.


* Re: [PATCH v3 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-10-03 19:53   ` Radim Krčmář
@ 2017-10-04 10:10     ` Ladi Prosek
  2017-10-04 14:42       ` Radim Krčmář
  0 siblings, 1 reply; 11+ messages in thread
From: Ladi Prosek @ 2017-10-04 10:10 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: KVM list, Paolo Bonzini

On Tue, Oct 3, 2017 at 9:53 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2017-09-25 10:09+0200, Ladi Prosek:
>> Entering SMM while running in guest mode wasn't working very well because several
>> pieces of the vcpu state were left set up for nested operation.
>>
>> Some of the issues observed:
>>
>> * L1 was getting unexpected VM exits (using L1 interception controls but running
>>   in SMM execution environment)
>> * MMU was confused (walk_mmu was still set to nested_mmu)
>> * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
>>
>> Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
>> entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
>> document this but they provide fields in the SMM state-save area to stash the
>> current state of SVM. What we need to do is basically get out of guest mode for
>> the duration of SMM. All this completely transparent to L1, i.e. L1 is not given
>> control and no L1 observable state changes.
>
> The best description I found is in APM vol. 2, 15.22 SMM Support:
>
> • The simplest solution is to not intercept SMI signals. SMIs
>   encountered while in a guest context are taken from within the guest
>   context. In this case, the SMM handler is not subject to any
>   intercepts set up by the VMM and consequently runs outside of the
>   virtualization controls. The state saved in the SMM State-Save area as
>   seen by the SMM handler reflects the state of the guest that had been
>   running at the time the SMI was encountered. When the SMM handler
>   executes the RSM instruction, the processor returns to executing in
>   the guest context, and any modifications to the SMM State-Save area
>   made by the SMM handler are reflected in the guest state.
>
> I think that the last sentence is not implemented correctly:
> svm_prep_enter_smm() loads the host state into VMCB and
> enter_smm_save_state_64() then puts host state where the SMM handler
> would expect guest state.

That's true. It seems to be easier to switch to the nested guest after
we've restored all the basics from the SMM area. I briefly tried to
reverse the order, which should make it compliant with 15.22 quoted
above and hopefully also with what Intel wants (it's not clear to me
whether their "leave VMX operation" in 34.14.1 Default Treatment of
SMI Delivery means switching to host state or not). I quickly ran into
issues though because entering the nested guest doesn't quite work if
the vCPU is still in real mode.

I think that the bootstrap sequence in em_rsm() has to be tweaked to
make it possible to call post_leave_smm() before restoring state from
the SMM area. I'll look into it.

> Do Windows intercept SMI?

Yes, but only if started with >1 vCPU.

Thanks!


* Re: [PATCH v3 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-10-04 10:10     ` Ladi Prosek
@ 2017-10-04 14:42       ` Radim Krčmář
  2017-10-10  8:03         ` Ladi Prosek
  0 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2017-10-04 14:42 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: KVM list, Paolo Bonzini

2017-10-04 12:10+0200, Ladi Prosek:
> On Tue, Oct 3, 2017 at 9:53 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> > 2017-09-25 10:09+0200, Ladi Prosek:
> >> Entering SMM while running in guest mode wasn't working very well because several
> >> pieces of the vcpu state were left set up for nested operation.
> >>
> >> Some of the issues observed:
> >>
> >> * L1 was getting unexpected VM exits (using L1 interception controls but running
> >>   in SMM execution environment)
> >> * MMU was confused (walk_mmu was still set to nested_mmu)
> >> * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
> >>
> >> Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
> >> entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
> >> document this but they provide fields in the SMM state-save area to stash the
> >> current state of SVM. What we need to do is basically get out of guest mode for
> >> the duration of SMM. All this completely transparent to L1, i.e. L1 is not given
> >> control and no L1 observable state changes.
> >
> > The best description I found is in APM vol. 2, 15.22 SMM Support:
> >
> > • The simplest solution is to not intercept SMI signals. SMIs
> >   encountered while in a guest context are taken from within the guest
> >   context. In this case, the SMM handler is not subject to any
> >   intercepts set up by the VMM and consequently runs outside of the
> >   virtualization controls. The state saved in the SMM State-Save area as
> >   seen by the SMM handler reflects the state of the guest that had been
> >   running at the time the SMI was encountered. When the SMM handler
> >   executes the RSM instruction, the processor returns to executing in
> >   the guest context, and any modifications to the SMM State-Save area
> >   made by the SMM handler are reflected in the guest state.
> >
> > I think that the last sentence is not implemented correctly:
> > svm_prep_enter_smm() loads the host state into VMCB and
> > enter_smm_save_state_64() then puts host state where the SMM handler
> > would expect guest state.
> 
> That's true. It seems to be easier to switch to the nested guest after
> we've restored all the basics from the SMM area. I briefly tried to
> reverse the order, which should make it compliant with 15.22 quoted
> above and hopefully also with what Intel wants (it's not clear to me
> whether their "leave VMX operation" in 34.14.1 Default Treatment of
> SMI Delivery means switching to host state or not). I quickly ran into
> issues though because entering the nested guest doesn't quite work if
> the vCPU is still in real mode.

We basically want to keep the L2 state while removing the L2 controls.
I think the current order could be maintained if we copied the L2 state
into L1 before running the current SMM routines.  Afterward, we'd copy
the restored L1 state into L2.

It is inefficient, but should be simple. :)
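
Roughly this ordering, in a toy user-space sketch (all names are invented
for illustration, no real KVM interfaces):

struct cpu_state { long rip; };         /* stand-in for the full register state */

/* stand-in for the existing SMM entry + SMM handler + RSM path */
static void run_smm(struct cpu_state *s) { s->rip += 1; }

static void smi_in_guest_mode(struct cpu_state *l1, struct cpu_state *l2)
{
        *l1 = *l2;      /* copy L2 state into L1 before the current SMM routines */
        run_smm(l1);    /* the SMM handler sees, and may modify, the L2 state */
        *l2 = *l1;      /* afterward, copy the restored state back into L2 */
}

int main(void)
{
        struct cpu_state l1 = { 0x1000 }, l2 = { 0x2000 };

        smi_in_guest_mode(&l1, &l2);
        return l2.rip == 0x2001 ? 0 : 1;        /* handler's changes end up in L2 */
}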

> I think that the bootstrap sequence in em_rsm() has to be tweaked to
> make it possible to call post_leave_smm() before restoring state from
> the SMM area. I'll look into it.
> 
> > Do Windows intercept SMI?
> 
> Yes, but only if started with >1 vCPU.

Interesting, thanks.


* Re: [PATCH v3 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-10-04 14:42       ` Radim Krčmář
@ 2017-10-10  8:03         ` Ladi Prosek
  0 siblings, 0 replies; 11+ messages in thread
From: Ladi Prosek @ 2017-10-10  8:03 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: KVM list, Paolo Bonzini

On Wed, Oct 4, 2017 at 4:42 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2017-10-04 12:10+0200, Ladi Prosek:
>> On Tue, Oct 3, 2017 at 9:53 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> > 2017-09-25 10:09+0200, Ladi Prosek:
>> >> Entering SMM while running in guest mode wasn't working very well because several
>> >> pieces of the vcpu state were left set up for nested operation.
>> >>
>> >> Some of the issues observed:
>> >>
>> >> * L1 was getting unexpected VM exits (using L1 interception controls but running
>> >>   in SMM execution environment)
>> >> * MMU was confused (walk_mmu was still set to nested_mmu)
>> >> * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
>> >>
>> >> Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
>> >> entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
>> >> document this but they provide fields in the SMM state-save area to stash the
>> >> current state of SVM. What we need to do is basically get out of guest mode for
>> >> the duration of SMM. All this completely transparent to L1, i.e. L1 is not given
>> >> control and no L1 observable state changes.
>> >
>> > The best description I found is in APM vol. 2, 15.22 SMM Support:
>> >
>> > • The simplest solution is to not intercept SMI signals. SMIs
>> >   encountered while in a guest context are taken from within the guest
>> >   context. In this case, the SMM handler is not subject to any
>> >   intercepts set up by the VMM and consequently runs outside of the
>> >   virtualization controls. The state saved in the SMM State-Save area as
>> >   seen by the SMM handler reflects the state of the guest that had been
>> >   running at the time the SMI was encountered. When the SMM handler
>> >   executes the RSM instruction, the processor returns to executing in
>> >   the guest context, and any modifications to the SMM State-Save area
>> >   made by the SMM handler are reflected in the guest state.
>> >
>> > I think that the last sentence is not implemented correctly:
>> > svm_prep_enter_smm() loads the host state into VMCB and
>> > enter_smm_save_state_64() then puts host state where the SMM handler
>> > would expect guest state.
>>
>> That's true. It seems to be easier to switch to the nested guest after
>> we've restored all the basics from the SMM area. I briefly tried to
>> reverse the order, which should make it compliant with 15.22 quoted
>> above and hopefully also with what Intel wants (it's not clear to me
>> whether their "leave VMX operation" in 34.14.1 Default Treatment of
>> SMI Delivery means switching to host state or not). I quickly ran into
>> issues though because entering the nested guest doesn't quite work if
>> the vCPU is still in real mode.
>
> We basically want to keep the L2 state while removing L2 controls.
> I think the current order could be maintained if we copied L2 state into
> L1 before running current SMM routines.  Afterward, we'd copy restored
> L1 state into L2.
>
> It is inefficient, but should be simple. :)

I've been able to make it work like this:

- On SMM entry we save the L2 state into the state-save area.
- Then we exit to L1 and continue with SMM entry as usual.

- On RSM we restore the L2 state from the state-save area (while still
in L1, i.e. is_guest_mode == false).
- Then we re-enter L2.
- Finally we restore the L2 state again (with a minor tweak as we're
no longer switching from real to protected).

If we didn't enter from L2, the second restore is omitted.

Definitely inefficient but can be done reusing the existing routines.
I'll send a v4 shortly.
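
In sketch form, in case it helps review (a toy model only; the helpers
below are placeholders, not the actual KVM/emulator functions):

enum vcpu_mode { MODE_L1, MODE_L2 };

struct toy_vcpu { enum vcpu_mode mode; };

/* placeholders; the real work is done by the existing save/restore routines */
static void save_to_smram(struct toy_vcpu *v)      { (void)v; }
static void restore_from_smram(struct toy_vcpu *v) { (void)v; }

static void smi_entry(struct toy_vcpu *v, int *was_l2)
{
        *was_l2 = (v->mode == MODE_L2);
        save_to_smram(v);               /* 1. save the L2 state into the state-save area */
        v->mode = MODE_L1;              /* 2. exit to L1, continue SMM entry as usual */
}

static void rsm(struct toy_vcpu *v, int was_l2)
{
        restore_from_smram(v);          /* 3. restore saved state, still in L1 */
        if (was_l2) {
                v->mode = MODE_L2;      /* 4. re-enter L2 */
                restore_from_smram(v);  /* 5. restore again; omitted if we didn't come from L2 */
        }
}

int main(void)
{
        struct toy_vcpu v = { .mode = MODE_L2 };
        int was_l2;

        smi_entry(&v, &was_l2);
        rsm(&v, was_l2);
        return v.mode == MODE_L2 ? 0 : 1;
}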

>> I think that the bootstrap sequence in em_rsm() has to be tweaked to
>> make it possible to call post_leave_smm() before restoring state from
>> the SMM area. I'll look into it.
>>
>> > Do Windows intercept SMI?
>>
>> Yes, but only if started with >1 vCPU.
>
> Interesting, thanks.

I lied, Windows always intercepts SMI. It's just that on a single vCPU I
was able to get away without emulating the interception for some reason.
That means all this restore-after-RSM logic won't be exercised with
Windows guests on AMD. It's still going to be used on Intel, though, and
the code is common to both.

Thanks!
