* [PATCH v4 0/6] KVM: nested virt SMM fixes
@ 2017-10-10 12:17 Ladi Prosek
  2017-10-10 12:17 ` [PATCH v4 1/6] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Ladi Prosek @ 2017-10-10 12:17 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Windows Server 2016 with Hyper-V enabled fails to boot on OVMF with SMM
(OVMF_CODE-need-smm.fd). Turns out that the SMM emulation code in KVM
does not handle nested virtualization very well, leading to a whole bunch
of issues.

For example, Hyper-V uses descriptor table exiting (SECONDARY_EXEC_DESC),
so when the SMM handler tries to switch out of real mode, a VM exit occurs
and is forwarded to a clueless L1.

This series fixes it by switching the vcpu to !guest_mode, i.e. to the L1
state, before entering SMM and then switching back to L2 as part of
emulating the RSM instruction.
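
In rough outline, the flow implemented by the series looks like this
(heavily simplified sketch; error handling and the HF_SMM_MASK juggling
are omitted, see the individual patches for the real call sites):

    /* SMI injection, enter_smm() in x86.c: */
    enter_smm_save_state_64(vcpu, buf);      /* saves the (possibly L2) state */
    kvm_x86_ops->prep_enter_smm(vcpu, buf);  /* leaves guest mode and stashes
                                                the nested state (patches 3, 6) */
    /* ... switch to the SMM execution environment ... */

    /* RSM emulation, em_rsm() in emulate.c: */
    rsm_load_state(ctxt, smbase);            /* loads the saved state */
    ctxt->ops->post_leave_smm(ctxt, smbase, &reload_state);  /* re-enters L2 */
    if (reload_state)
        rsm_load_state(ctxt, smbase);        /* the state now applies to L2 */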

Patches 1 and 2 are common to both Intel and AMD, patch 3 fixes Intel,
and patches 5 and 6 fix AMD. Patch 4 prevents CR4.VMXE from being set in
SMM on Intel; it is, however, not required to make Windows work.

v1->v2:
* Moved left_smm detection to emulator_set_hflags (couldn't quite get rid
  of the field despite my original claim) (Paolo)
* Moved the kvm_x86_ops->post_leave_smm() call a few statements down so
  it really runs after all state has been synced.
* Added the smi_allowed callback (new patch 2) to avoid running into
  WARN_ON_ONCE(vmx->nested.nested_run_pending) on Intel.

v2->v3:
* Omitted patch 4 ("KVM: nVMX: save nested EPT information in SMRAM state
  save map") and replaced it with ("treat CR4.VMXE as reserved in SMM")
  (Paolo)
* Implemented smi_allowed on AMD to support SMI interception. Turns out
  Windows needs this when running on >1 vCPU.
* Eliminated internal SMM state on AMD and switched to using the SMM state
  save area in guest memory instead (Paolo)

v3->v4:
* Changed the order of operations in enter_smm(), now saving the original
  (and potentially L2) state into the SMM state save area.
* Made em_rsm() reload the SMM state save area if post_leave_smm() entered
  guest mode. This way, SMM handlers see and may change the actual state
  of the vCPU at the point where SMI was injected (Radim)
* In patch 4, switched to a different way of avoiding the problem of hitting
  the very check the patch is adding.

Ladi Prosek (6):
      KVM: x86: introduce ISA specific SMM entry/exit callbacks
      KVM: x86: introduce ISA specific smi_allowed callback
      KVM: nVMX: fix SMI injection in guest mode
      KVM: nVMX: treat CR4.VMXE as reserved in SMM
      KVM: nSVM: refactor nested_svm_vmrun
      KVM: nSVM: fix SMI injection in guest mode

 arch/x86/include/asm/kvm_emulate.h |   3 +
 arch/x86/include/asm/kvm_host.h    |   8 ++
 arch/x86/kvm/emulate.c             |  62 ++++++++---
 arch/x86/kvm/svm.c                 | 207 ++++++++++++++++++++++++++-----------
 arch/x86/kvm/vmx.c                 |  81 +++++++++++++--
 arch/x86/kvm/x86.c                 |  22 +++-
 6 files changed, 295 insertions(+), 88 deletions(-)


* [PATCH v4 1/6] KVM: x86: introduce ISA specific SMM entry/exit callbacks
  2017-10-10 12:17 [PATCH v4 0/6] KVM: nested virt SMM fixes Ladi Prosek
@ 2017-10-10 12:17 ` Ladi Prosek
  2017-10-10 12:17 ` [PATCH v4 2/6] KVM: x86: introduce ISA specific smi_allowed callback Ladi Prosek
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ladi Prosek @ 2017-10-10 12:17 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Entering and exiting SMM may require ISA specific handling under certain
circumstances. This commit adds two new callbacks with empty implementations.
Actual functionality will be added in following commits.

* prep_enter_smm() is to be called when injecting an SMI, before any
  SMM-related vcpu state has been changed
* post_leave_smm() is to be called when emulating the RSM instruction,
  after all SMM-related vcpu state has been restored; the function may
  ask the caller to reload the state from the SMM state-save area again
  via the reload_state parameter

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  3 +++
 arch/x86/include/asm/kvm_host.h    |  4 ++++
 arch/x86/kvm/emulate.c             | 30 ++++++++++++++++++++++++++----
 arch/x86/kvm/svm.c                 | 16 ++++++++++++++++
 arch/x86/kvm/vmx.c                 | 16 ++++++++++++++++
 arch/x86/kvm/x86.c                 | 17 +++++++++++++++++
 6 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index fa2558e12024..0ba3837173fb 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -225,6 +225,9 @@ struct x86_emulate_ops {
 
 	unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);
 	void (*set_hflags)(struct x86_emulate_ctxt *ctxt, unsigned hflags);
+	int (*post_leave_smm)(struct x86_emulate_ctxt *ctxt, u64 smbase,
+			      bool *reload_state);
+
 };
 
 typedef u32 __attribute__((vector_size(16))) sse128_t;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c73e493adf07..769de6d2e684 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1061,6 +1061,10 @@ struct kvm_x86_ops {
 	void (*cancel_hv_timer)(struct kvm_vcpu *vcpu);
 
 	void (*setup_mce)(struct kvm_vcpu *vcpu);
+
+	int (*prep_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
+	int (*post_leave_smm)(struct kvm_vcpu *vcpu, u64 smbase,
+			      bool *reload_state);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d90cdc77e077..1e6a8a824b8b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2546,9 +2546,18 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
 	return X86EMUL_CONTINUE;
 }
 
+static int rsm_load_state(struct x86_emulate_ctxt *ctxt, u64 smbase)
+{
+	if (emulator_has_longmode(ctxt))
+		return rsm_load_state_64(ctxt, smbase + 0x8000);
+	else
+		return rsm_load_state_32(ctxt, smbase + 0x8000);
+}
+
 static int em_rsm(struct x86_emulate_ctxt *ctxt)
 {
 	unsigned long cr0, cr4, efer;
+	bool reload_state = false;
 	u64 smbase;
 	int ret;
 
@@ -2591,16 +2600,29 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
 	ctxt->ops->set_msr(ctxt, MSR_EFER, efer);
 
 	smbase = ctxt->ops->get_smbase(ctxt);
-	if (emulator_has_longmode(ctxt))
-		ret = rsm_load_state_64(ctxt, smbase + 0x8000);
-	else
-		ret = rsm_load_state_32(ctxt, smbase + 0x8000);
 
+	ret = rsm_load_state(ctxt, smbase);
 	if (ret != X86EMUL_CONTINUE) {
 		/* FIXME: should triple fault */
 		return X86EMUL_UNHANDLEABLE;
 	}
 
+	if (ctxt->ops->post_leave_smm(ctxt, smbase, &reload_state))
+		return X86EMUL_UNHANDLEABLE;
+
+	if (reload_state) {
+		/*
+		 * post_leave_smm() made changes to the vCPU (e.g. entered
+		 * guest mode) and is asking us to load the SMM state-save
+		 * area again.
+		 */
+		ret = rsm_load_state(ctxt, smbase);
+		if (ret != X86EMUL_CONTINUE) {
+			/* FIXME: should triple fault */
+			return X86EMUL_UNHANDLEABLE;
+		}
+	}
+
 	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
 		ctxt->ops->set_nmi_mask(ctxt, false);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0e68f0b3cbf7..d9b3e1bea644 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5393,6 +5393,19 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
 	vcpu->arch.mcg_cap &= 0x1ff;
 }
 
+static int svm_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
+static int svm_post_leave_smm(struct kvm_vcpu *vcpu, u64 smbase,
+			      bool *reload_state)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
 static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -5503,6 +5516,9 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.deliver_posted_interrupt = svm_deliver_avic_intr,
 	.update_pi_irte = svm_update_pi_irte,
 	.setup_mce = svm_setup_mce,
+
+	.prep_enter_smm = svm_prep_enter_smm,
+	.post_leave_smm = svm_post_leave_smm,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a2b804e10c95..15478f413392 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11941,6 +11941,19 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEATURE_CONTROL_LMCE;
 }
 
+static int vmx_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
+static int vmx_post_leave_smm(struct kvm_vcpu *vcpu, u64 smbase,
+			      bool *reload_state)
+{
+	/* TODO: Implement */
+	return 0;
+}
+
 static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -12066,6 +12079,9 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 #endif
 
 	.setup_mce = vmx_setup_mce,
+
+	.prep_enter_smm = vmx_prep_enter_smm,
+	.post_leave_smm = vmx_post_leave_smm,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb7fcd6..e9aef1d858a8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5275,6 +5275,12 @@ static void emulator_set_hflags(struct x86_emulate_ctxt *ctxt, unsigned emul_fla
 	kvm_set_hflags(emul_to_vcpu(ctxt), emul_flags);
 }
 
+static int emulator_post_leave_smm(struct x86_emulate_ctxt *ctxt, u64 smbase,
+				   bool *reload_state)
+{
+	return kvm_x86_ops->post_leave_smm(emul_to_vcpu(ctxt), smbase, reload_state);
+}
+
 static const struct x86_emulate_ops emulate_ops = {
 	.read_gpr            = emulator_read_gpr,
 	.write_gpr           = emulator_write_gpr,
@@ -5316,6 +5322,7 @@ static const struct x86_emulate_ops emulate_ops = {
 	.set_nmi_mask        = emulator_set_nmi_mask,
 	.get_hflags          = emulator_get_hflags,
 	.set_hflags          = emulator_set_hflags,
+	.post_leave_smm      = emulator_post_leave_smm,
 };
 
 static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
@@ -6643,6 +6650,7 @@ static void enter_smm(struct kvm_vcpu *vcpu)
 	trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
 	vcpu->arch.hflags |= HF_SMM_MASK;
 	memset(buf, 0, 512);
+
 	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
 		enter_smm_save_state_64(vcpu, buf);
 	else
@@ -6650,6 +6658,15 @@ static void enter_smm(struct kvm_vcpu *vcpu)
 
 	kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, buf, sizeof(buf));
 
+	/*
+	 * Give prep_enter_smm() a chance to make ISA-specific changes to the
+	 * vCPU state (e.g. leave guest mode) after we've saved the state into
+	 * the SMM state-save area. Clear HF_SMM_MASK temporarily.
+	 */
+	vcpu->arch.hflags &= ~HF_SMM_MASK;
+	kvm_x86_ops->prep_enter_smm(vcpu, buf);
+	vcpu->arch.hflags |= HF_SMM_MASK;
+
 	if (kvm_x86_ops->get_nmi_mask(vcpu))
 		vcpu->arch.hflags |= HF_SMM_INSIDE_NMI_MASK;
 	else
-- 
2.13.5


* [PATCH v4 2/6] KVM: x86: introduce ISA specific smi_allowed callback
  2017-10-10 12:17 [PATCH v4 0/6] KVM: nested virt SMM fixes Ladi Prosek
  2017-10-10 12:17 ` [PATCH v4 1/6] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
@ 2017-10-10 12:17 ` Ladi Prosek
  2017-10-10 12:17 ` [PATCH v4 3/6] KVM: nVMX: fix SMI injection in guest mode Ladi Prosek
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ladi Prosek @ 2017-10-10 12:17 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Similar to NMI, there may be ISA specific reasons why an SMI cannot be
injected into the guest. This commit adds a new smi_allowed callback to
be implemented in following commits.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm.c              | 6 ++++++
 arch/x86/kvm/vmx.c              | 6 ++++++
 arch/x86/kvm/x86.c              | 2 +-
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 769de6d2e684..1c9d6b90f50c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1062,6 +1062,7 @@ struct kvm_x86_ops {
 
 	void (*setup_mce)(struct kvm_vcpu *vcpu);
 
+	int (*smi_allowed)(struct kvm_vcpu *vcpu);
 	int (*prep_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
 	int (*post_leave_smm)(struct kvm_vcpu *vcpu, u64 smbase,
 			      bool *reload_state);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d9b3e1bea644..e7c6c7fb3e19 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5393,6 +5393,11 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
 	vcpu->arch.mcg_cap &= 0x1ff;
 }
 
+static int svm_smi_allowed(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+
 static int svm_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
 	/* TODO: Implement */
@@ -5517,6 +5522,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.update_pi_irte = svm_update_pi_irte,
 	.setup_mce = svm_setup_mce,
 
+	.smi_allowed = svm_smi_allowed,
 	.prep_enter_smm = svm_prep_enter_smm,
 	.post_leave_smm = svm_post_leave_smm,
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 15478f413392..bde45f0c27cc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11941,6 +11941,11 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEATURE_CONTROL_LMCE;
 }
 
+static int vmx_smi_allowed(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+
 static int vmx_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
 	/* TODO: Implement */
@@ -12080,6 +12085,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 
 	.setup_mce = vmx_setup_mce,
 
+	.smi_allowed = vmx_smi_allowed,
 	.prep_enter_smm = vmx_prep_enter_smm,
 	.post_leave_smm = vmx_post_leave_smm,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9aef1d858a8..588ef3864ebd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6433,7 +6433,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 		}
 
 		kvm_x86_ops->queue_exception(vcpu);
-	} else if (vcpu->arch.smi_pending && !is_smm(vcpu)) {
+	} else if (vcpu->arch.smi_pending && !is_smm(vcpu) && kvm_x86_ops->smi_allowed(vcpu)) {
 		vcpu->arch.smi_pending = false;
 		enter_smm(vcpu);
 	} else if (vcpu->arch.nmi_pending && kvm_x86_ops->nmi_allowed(vcpu)) {
-- 
2.13.5


* [PATCH v4 3/6] KVM: nVMX: fix SMI injection in guest mode
  2017-10-10 12:17 [PATCH v4 0/6] KVM: nested virt SMM fixes Ladi Prosek
  2017-10-10 12:17 ` [PATCH v4 1/6] KVM: x86: introduce ISA specific SMM entry/exit callbacks Ladi Prosek
  2017-10-10 12:17 ` [PATCH v4 2/6] KVM: x86: introduce ISA specific smi_allowed callback Ladi Prosek
@ 2017-10-10 12:17 ` Ladi Prosek
  2017-10-10 12:17 ` [PATCH v4 4/6] KVM: nVMX: treat CR4.VMXE as reserved in SMM Ladi Prosek
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ladi Prosek @ 2017-10-10 12:17 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* the SMM handler couldn't write CR4 because of incorrect validity checks in
  vmx_set_cr4() predicated on nested.vmxon
* MMU was confused (walk_mmu was still set to nested_mmu)

The Intel SDM actually prescribes that the logical processor "leave VMX
operation" upon entering SMM (34.14.1 Default Treatment of SMI Delivery).
What we need to do is basically get out of guest mode and set nested.vmxon
to false for the duration of SMM. All of this is completely transparent to
L1, i.e. L1 is not given control and no L1-observable state changes.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_vmx_vmexit with exit_reason == -1 is called, a trick already
used in vmx_leave_nested. Re-entering is cleaner, using enter_vmx_non_root_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/vmx.c | 59 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bde45f0c27cc..ace5ca6bc41a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -486,6 +486,14 @@ struct nested_vmx {
 	u64 nested_vmx_cr4_fixed1;
 	u64 nested_vmx_vmcs_enum;
 	u64 nested_vmx_vmfunc_controls;
+
+	/* SMM related state */
+	struct {
+		/* in VMX operation on SMM entry? */
+		bool vmxon;
+		/* in guest mode on SMM entry? */
+		bool guest_mode;
+	} smm;
 };
 
 #define POSTED_INTR_ON  0
@@ -11424,8 +11432,11 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	leave_guest_mode(vcpu);
 
 	if (likely(!vmx->fail)) {
-		prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
-			       exit_qualification);
+		if (exit_reason == -1)
+			sync_vmcs12(vcpu, vmcs12);
+		else
+			prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
+				       exit_qualification);
 
 		if (nested_vmx_store_msr(vcpu, vmcs12->vm_exit_msr_store_addr,
 					 vmcs12->vm_exit_msr_store_count))
@@ -11489,7 +11500,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	 */
 	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 
-	if (enable_shadow_vmcs)
+	if (enable_shadow_vmcs && exit_reason != -1)
 		vmx->nested.sync_shadow_vmcs = true;
 
 	/* in case we halted in L2 */
@@ -11513,12 +11524,13 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 				INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
 		}
 
-		trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
-					       vmcs12->exit_qualification,
-					       vmcs12->idt_vectoring_info_field,
-					       vmcs12->vm_exit_intr_info,
-					       vmcs12->vm_exit_intr_error_code,
-					       KVM_ISA_VMX);
+		if (exit_reason != -1)
+			trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
+						       vmcs12->exit_qualification,
+						       vmcs12->idt_vectoring_info_field,
+						       vmcs12->vm_exit_intr_info,
+						       vmcs12->vm_exit_intr_error_code,
+						       KVM_ISA_VMX);
 
 		load_vmcs12_host_state(vcpu, vmcs12);
 
@@ -11943,19 +11955,44 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 
 static int vmx_smi_allowed(struct kvm_vcpu *vcpu)
 {
+	/* we need a nested vmexit to enter SMM, postpone if run is pending */
+	if (to_vmx(vcpu)->nested.nested_run_pending)
+		return 0;
 	return 1;
 }
 
 static int vmx_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
-	/* TODO: Implement */
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	vmx->nested.smm.guest_mode = is_guest_mode(vcpu);
+	if (vmx->nested.smm.guest_mode)
+		nested_vmx_vmexit(vcpu, -1, 0, 0);
+
+	vmx->nested.smm.vmxon = vmx->nested.vmxon;
+	vmx->nested.vmxon = false;
 	return 0;
 }
 
 static int vmx_post_leave_smm(struct kvm_vcpu *vcpu, u64 smbase,
 			      bool *reload_state)
 {
-	/* TODO: Implement */
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int ret;
+
+	if (vmx->nested.smm.vmxon) {
+		vmx->nested.vmxon = true;
+		vmx->nested.smm.vmxon = false;
+	}
+
+	if (vmx->nested.smm.guest_mode) {
+		ret = enter_vmx_non_root_mode(vcpu, false);
+		if (ret)
+			return ret;
+
+		vmx->nested.smm.guest_mode = false;
+		*reload_state = true;
+	}
 	return 0;
 }
 
-- 
2.13.5


* [PATCH v4 4/6] KVM: nVMX: treat CR4.VMXE as reserved in SMM
  2017-10-10 12:17 [PATCH v4 0/6] KVM: nested virt SMM fixes Ladi Prosek
                   ` (2 preceding siblings ...)
  2017-10-10 12:17 ` [PATCH v4 3/6] KVM: nVMX: fix SMI injection in guest mode Ladi Prosek
@ 2017-10-10 12:17 ` Ladi Prosek
  2017-10-10 14:31   ` Paolo Bonzini
  2017-10-10 12:17 ` [PATCH v4 5/6] KVM: nSVM: refactor nested_svm_vmrun Ladi Prosek
  2017-10-10 12:17 ` [PATCH v4 6/6] KVM: nSVM: fix SMI injection in guest mode Ladi Prosek
  5 siblings, 1 reply; 13+ messages in thread
From: Ladi Prosek @ 2017-10-10 12:17 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Intel SDM 34.14.3 Protection of CR4.VMXE in SMM:

"Under the default treatment, CR4.VMXE is treated as a reserved bit while a
logical processor is in SMM. Any attempt by software running in SMM to set
this bit causes a general-protection exception."

em_rsm() may set CR4.VMXE as part of loading the saved SMM state, so the
SMM flag is temporarily cleared in the affected function.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/emulate.c | 32 ++++++++++++++++++++++++--------
 arch/x86/kvm/vmx.c     |  4 ++++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1e6a8a824b8b..2d0b9dcfb812 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2406,7 +2406,14 @@ static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, u64 smbase, int n)
 static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
 				     u64 cr0, u64 cr4)
 {
-	int bad;
+	int bad, ret = X86EMUL_CONTINUE;
+
+	/*
+	 * Temporarily clear the SMM flag so we don't trip over the
+	 * no-CR4.VMXE-in-SMM check.
+	 */
+	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
+		~X86EMUL_SMM_MASK);
 
 	/*
 	 * First enable PAE, long mode needs it before CR0.PG = 1 is set.
@@ -2414,20 +2421,29 @@ static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
 	 * if EFER.LMA=0, so set it separately.
 	 */
 	bad = ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PCIDE);
-	if (bad)
-		return X86EMUL_UNHANDLEABLE;
+	if (bad) {
+		ret = X86EMUL_UNHANDLEABLE;
+		goto out;
+	}
 
 	bad = ctxt->ops->set_cr(ctxt, 0, cr0);
-	if (bad)
-		return X86EMUL_UNHANDLEABLE;
+	if (bad) {
+		ret = X86EMUL_UNHANDLEABLE;
+		goto out;
+	}
 
 	if (cr4 & X86_CR4_PCIDE) {
 		bad = ctxt->ops->set_cr(ctxt, 4, cr4);
-		if (bad)
-			return X86EMUL_UNHANDLEABLE;
+		if (bad) {
+			ret = X86EMUL_UNHANDLEABLE;
+			goto out;
+		}
 	}
+out:
+	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) |
+		X86EMUL_SMM_MASK);
 
-	return X86EMUL_CONTINUE;
+	return ret;
 }
 
 static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt, u64 smbase)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ace5ca6bc41a..f255038d5a91 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4364,6 +4364,10 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		 */
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
+
+		/* cr4.VMXE is a reserved bit in SMM */
+		if (is_smm(vcpu))
+			return 1;
 	}
 
 	if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
-- 
2.13.5


* [PATCH v4 5/6] KVM: nSVM: refactor nested_svm_vmrun
  2017-10-10 12:17 [PATCH v4 0/6] KVM: nested virt SMM fixes Ladi Prosek
                   ` (3 preceding siblings ...)
  2017-10-10 12:17 ` [PATCH v4 4/6] KVM: nVMX: treat CR4.VMXE as reserved in SMM Ladi Prosek
@ 2017-10-10 12:17 ` Ladi Prosek
  2017-10-10 12:17 ` [PATCH v4 6/6] KVM: nSVM: fix SMI injection in guest mode Ladi Prosek
  5 siblings, 0 replies; 13+ messages in thread
From: Ladi Prosek @ 2017-10-10 12:17 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Analogous to 858e25c06fb0 ("kvm: nVMX: Refactor nested_vmx_run()"), this commit splits
nested_svm_vmrun into two parts. The newly introduced enter_svm_guest_mode modifies the
vcpu state to transition from L1 to L2, while the code left in nested_svm_vmrun handles
the VMRUN instruction.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/svm.c | 132 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 69 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e7c6c7fb3e19..a10f95139d0a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2916,70 +2916,9 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
 	return true;
 }
 
-static bool nested_svm_vmrun(struct vcpu_svm *svm)
+static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
+				 struct vmcb *nested_vmcb, struct page *page)
 {
-	struct vmcb *nested_vmcb;
-	struct vmcb *hsave = svm->nested.hsave;
-	struct vmcb *vmcb = svm->vmcb;
-	struct page *page;
-	u64 vmcb_gpa;
-
-	vmcb_gpa = svm->vmcb->save.rax;
-
-	nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page);
-	if (!nested_vmcb)
-		return false;
-
-	if (!nested_vmcb_checks(nested_vmcb)) {
-		nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
-		nested_vmcb->control.exit_code_hi = 0;
-		nested_vmcb->control.exit_info_1  = 0;
-		nested_vmcb->control.exit_info_2  = 0;
-
-		nested_svm_unmap(page);
-
-		return false;
-	}
-
-	trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
-			       nested_vmcb->save.rip,
-			       nested_vmcb->control.int_ctl,
-			       nested_vmcb->control.event_inj,
-			       nested_vmcb->control.nested_ctl);
-
-	trace_kvm_nested_intercepts(nested_vmcb->control.intercept_cr & 0xffff,
-				    nested_vmcb->control.intercept_cr >> 16,
-				    nested_vmcb->control.intercept_exceptions,
-				    nested_vmcb->control.intercept);
-
-	/* Clear internal status */
-	kvm_clear_exception_queue(&svm->vcpu);
-	kvm_clear_interrupt_queue(&svm->vcpu);
-
-	/*
-	 * Save the old vmcb, so we don't need to pick what we save, but can
-	 * restore everything when a VMEXIT occurs
-	 */
-	hsave->save.es     = vmcb->save.es;
-	hsave->save.cs     = vmcb->save.cs;
-	hsave->save.ss     = vmcb->save.ss;
-	hsave->save.ds     = vmcb->save.ds;
-	hsave->save.gdtr   = vmcb->save.gdtr;
-	hsave->save.idtr   = vmcb->save.idtr;
-	hsave->save.efer   = svm->vcpu.arch.efer;
-	hsave->save.cr0    = kvm_read_cr0(&svm->vcpu);
-	hsave->save.cr4    = svm->vcpu.arch.cr4;
-	hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
-	hsave->save.rip    = kvm_rip_read(&svm->vcpu);
-	hsave->save.rsp    = vmcb->save.rsp;
-	hsave->save.rax    = vmcb->save.rax;
-	if (npt_enabled)
-		hsave->save.cr3    = vmcb->save.cr3;
-	else
-		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
-
-	copy_vmcb_control_area(hsave, vmcb);
-
 	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
 		svm->vcpu.arch.hflags |= HF_HIF_MASK;
 	else
@@ -3072,6 +3011,73 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	enable_gif(svm);
 
 	mark_all_dirty(svm->vmcb);
+}
+
+static bool nested_svm_vmrun(struct vcpu_svm *svm)
+{
+	struct vmcb *nested_vmcb;
+	struct vmcb *hsave = svm->nested.hsave;
+	struct vmcb *vmcb = svm->vmcb;
+	struct page *page;
+	u64 vmcb_gpa;
+
+	vmcb_gpa = svm->vmcb->save.rax;
+
+	nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page);
+	if (!nested_vmcb)
+		return false;
+
+	if (!nested_vmcb_checks(nested_vmcb)) {
+		nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
+		nested_vmcb->control.exit_code_hi = 0;
+		nested_vmcb->control.exit_info_1  = 0;
+		nested_vmcb->control.exit_info_2  = 0;
+
+		nested_svm_unmap(page);
+
+		return false;
+	}
+
+	trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
+			       nested_vmcb->save.rip,
+			       nested_vmcb->control.int_ctl,
+			       nested_vmcb->control.event_inj,
+			       nested_vmcb->control.nested_ctl);
+
+	trace_kvm_nested_intercepts(nested_vmcb->control.intercept_cr & 0xffff,
+				    nested_vmcb->control.intercept_cr >> 16,
+				    nested_vmcb->control.intercept_exceptions,
+				    nested_vmcb->control.intercept);
+
+	/* Clear internal status */
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);
+
+	/*
+	 * Save the old vmcb, so we don't need to pick what we save, but can
+	 * restore everything when a VMEXIT occurs
+	 */
+	hsave->save.es     = vmcb->save.es;
+	hsave->save.cs     = vmcb->save.cs;
+	hsave->save.ss     = vmcb->save.ss;
+	hsave->save.ds     = vmcb->save.ds;
+	hsave->save.gdtr   = vmcb->save.gdtr;
+	hsave->save.idtr   = vmcb->save.idtr;
+	hsave->save.efer   = svm->vcpu.arch.efer;
+	hsave->save.cr0    = kvm_read_cr0(&svm->vcpu);
+	hsave->save.cr4    = svm->vcpu.arch.cr4;
+	hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
+	hsave->save.rip    = kvm_rip_read(&svm->vcpu);
+	hsave->save.rsp    = vmcb->save.rsp;
+	hsave->save.rax    = vmcb->save.rax;
+	if (npt_enabled)
+		hsave->save.cr3    = vmcb->save.cr3;
+	else
+		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
+
+	copy_vmcb_control_area(hsave, vmcb);
+
+	enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb, page);
 
 	return true;
 }
-- 
2.13.5


* [PATCH v4 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-10-10 12:17 [PATCH v4 0/6] KVM: nested virt SMM fixes Ladi Prosek
                   ` (4 preceding siblings ...)
  2017-10-10 12:17 ` [PATCH v4 5/6] KVM: nSVM: refactor nested_svm_vmrun Ladi Prosek
@ 2017-10-10 12:17 ` Ladi Prosek
  2017-10-10 14:56   ` Paolo Bonzini
  5 siblings, 1 reply; 13+ messages in thread
From: Ladi Prosek @ 2017-10-10 12:17 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* MMU was confused (walk_mmu was still set to nested_mmu)
* INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)

The Intel SDM actually prescribes that the logical processor "leave VMX
operation" upon entering SMM (34.14.1 Default Treatment of SMI Delivery).
AMD doesn't seem to document this, but they provide fields in the SMM
state-save area to stash the current state of SVM. What we need to do is
basically get out of guest mode for the duration of SMM. All of this is
completely transparent to L1, i.e. L1 is not given control and no
L1-observable state changes.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_svm_vmexit is called, unchanged. Re-entering is performed using
enter_svm_guest_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/svm.c              | 57 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |  3 ---
 3 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1c9d6b90f50c..d67b342a5c0e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1431,4 +1431,7 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 #endif
 }
 
+#define put_smstate(type, buf, offset, val)                      \
+	*(type *)((buf) + (offset) - 0x7e00) = val
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a10f95139d0a..71ecc2549e2d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5401,19 +5401,72 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
 
 static int svm_smi_allowed(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	/* Per APM Vol.2 15.22.2 "Response to SMI" */
+	if (!gif_set(svm))
+		return 0;
+
+	if (is_guest_mode(&svm->vcpu) &&
+	    svm->nested.intercept & (1ULL << INTERCEPT_SMI)) {
+		/* TODO: Might need to set exit_info_1 and exit_info_2 here */
+		svm->vmcb->control.exit_code = SVM_EXIT_SMI;
+		svm->nested.exit_required = true;
+		return 0;
+	}
+
 	return 1;
 }
 
 static int svm_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
-	/* TODO: Implement */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (is_guest_mode(vcpu)) {
+		/* FED8h - SVM Guest */
+		put_smstate(u64, smstate, 0x7ed8, 1);
+		/* FEE0h - SVM Guest VMCB Physical Address */
+		put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb);
+
+		svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
+		svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
+		svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+
+		ret = nested_svm_vmexit(svm);
+		if (ret)
+			return ret;
+	}
 	return 0;
 }
 
 static int svm_post_leave_smm(struct kvm_vcpu *vcpu, u64 smbase,
 			      bool *reload_state)
 {
-	/* TODO: Implement */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *nested_vmcb;
+	struct page *page;
+	struct {
+		u64 guest;
+		u64 vmcb;
+	} svm_state_save;
+	int r;
+
+	/* Temporarily set the SMM flag to access the SMM state-save area */
+	vcpu->arch.hflags |= HF_SMM_MASK;
+	r = kvm_vcpu_read_guest(vcpu, smbase + 0xfed8, &svm_state_save,
+				sizeof(svm_state_save));
+	vcpu->arch.hflags &= ~HF_SMM_MASK;
+	if (r)
+		return r;
+
+	if (svm_state_save.guest) {
+		nested_vmcb = nested_svm_map(svm, svm_state_save.vmcb, &page);
+		if (!nested_vmcb)
+			return 1;
+		enter_svm_guest_mode(svm, svm_state_save.vmcb, nested_vmcb, page);
+		*reload_state = true;
+	}
 	return 0;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 588ef3864ebd..ae0cfe58dd30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6480,9 +6480,6 @@ static void process_nmi(struct kvm_vcpu *vcpu)
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
-#define put_smstate(type, buf, offset, val)			  \
-	*(type *)((buf) + (offset) - 0x7e00) = val
-
 static u32 enter_smm_get_segment_flags(struct kvm_segment *seg)
 {
 	u32 flags = 0;
-- 
2.13.5


* Re: [PATCH v4 4/6] KVM: nVMX: treat CR4.VMXE as reserved in SMM
  2017-10-10 12:17 ` [PATCH v4 4/6] KVM: nVMX: treat CR4.VMXE as reserved in SMM Ladi Prosek
@ 2017-10-10 14:31   ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2017-10-10 14:31 UTC (permalink / raw)
  To: Ladi Prosek, kvm; +Cc: rkrcmar

On 10/10/2017 14:17, Ladi Prosek wrote:
> Intel SDM 34.14.3 Protection of CR4.VMXE in SMM:
> 
> "Under the default treatment, CR4.VMXE is treated as a reserved bit while a
> logical processor is in SMM. Any attempt by software running in SMM to set
> this bit causes a general-protection exception."
> 
> em_rsm() may set CR4.VMXE as part of loading the saved SMM state, so the
> SMM flag is temporarily cleared in the affected function.
> 
> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> ---
>  arch/x86/kvm/emulate.c | 32 ++++++++++++++++++++++++--------
>  arch/x86/kvm/vmx.c     |  4 ++++
>  2 files changed, 28 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 1e6a8a824b8b..2d0b9dcfb812 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2406,7 +2406,14 @@ static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, u64 smbase, int n)
>  static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
>  				     u64 cr0, u64 cr4)
>  {
> -	int bad;
> +	int bad, ret = X86EMUL_CONTINUE;
> +
> +	/*
> +	 * Temporarily clear the SMM flag so we don't trip over the
> +	 * no-CR4.VMXE-in-SMM check.
> +	 */
> +	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
> +		~X86EMUL_SMM_MASK);

Maybe don't set VMXE here and set it in post_leave_smm?  I can just
leave out this patch and let you redo this as a follow up.

Paolo

>  	/*
>  	 * First enable PAE, long mode needs it before CR0.PG = 1 is set.
> @@ -2414,20 +2421,29 @@ static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
>  	 * if EFER.LMA=0, so set it separately.
>  	 */
>  	bad = ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PCIDE);
> -	if (bad)
> -		return X86EMUL_UNHANDLEABLE;
> +	if (bad) {
> +		ret = X86EMUL_UNHANDLEABLE;
> +		goto out;
> +	}
>  
>  	bad = ctxt->ops->set_cr(ctxt, 0, cr0);
> -	if (bad)
> -		return X86EMUL_UNHANDLEABLE;
> +	if (bad) {
> +		ret = X86EMUL_UNHANDLEABLE;
> +		goto out;
> +	}
>  
>  	if (cr4 & X86_CR4_PCIDE) {
>  		bad = ctxt->ops->set_cr(ctxt, 4, cr4);
> -		if (bad)
> -			return X86EMUL_UNHANDLEABLE;
> +		if (bad) {
> +			ret = X86EMUL_UNHANDLEABLE;
> +			goto out;
> +		}
>  	}
> +out:
> +	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) |
> +		X86EMUL_SMM_MASK);
>  
> -	return X86EMUL_CONTINUE;
> +	return ret;
>  }
>  
>  static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt, u64 smbase)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ace5ca6bc41a..f255038d5a91 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4364,6 +4364,10 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  		 */
>  		if (!nested_vmx_allowed(vcpu))
>  			return 1;
> +
> +		/* cr4.VMXE is a reserved bit in SMM */
> +		if (is_smm(vcpu))
> +			return 1;
>  	}
>  
>  	if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
> 


* Re: [PATCH v4 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-10-10 12:17 ` [PATCH v4 6/6] KVM: nSVM: fix SMI injection in guest mode Ladi Prosek
@ 2017-10-10 14:56   ` Paolo Bonzini
  2017-10-10 15:59     ` Ladi Prosek
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2017-10-10 14:56 UTC (permalink / raw)
  To: Ladi Prosek, kvm; +Cc: rkrcmar

On 10/10/2017 14:17, Ladi Prosek wrote:
> +	/* Temporarily set the SMM flag to access the SMM state-save area */
> +	vcpu->arch.hflags |= HF_SMM_MASK;
> +	r = kvm_vcpu_read_guest(vcpu, smbase + 0xfed8, &svm_state_save,
> +				sizeof(svm_state_save));
> +	vcpu->arch.hflags &= ~HF_SMM_MASK;
> +	if (r)

Isn't the flag still set here?  You have:

+	if (ctxt->ops->post_leave_smm(ctxt, smbase, &reload_state))
+		return X86EMUL_UNHANDLEABLE;
+
+	if (reload_state) {
+		/*
+		 * post_leave_smm() made changes to the vCPU (e.g. entered
+		 * guest mode) and is asking us to load the SMM state-save
+		 * area again.
+		 */
+		ret = rsm_load_state(ctxt, smbase);
+		if (ret != X86EMUL_CONTINUE) {
+			/* FIXME: should triple fault */
+			return X86EMUL_UNHANDLEABLE;
+		}
+	}
+
        if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
                ctxt->ops->set_nmi_mask(ctxt, false);

        ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
                ~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));

On the other hand you need to turn HF_SMM_MASK off and back on
around nested_svm_map/enter_svm_guest_mode, so that the VMCB is
not read from SMRAM.

And the mandatory stupid question: why is the first state load
needed at all?  Could VMX's post_leave_smm callback (which would
really become a pre_leave_smm callback) use load_vmcs12_host_state
instead?  SVM likewise could extract load_from_hsave_area from
nested_svm_vmexit, and pre-load the SMM state-save area data in
pre_leave_smm.

(Yes, load_vmcs12_host_state is overkill, but
it's already there and it already knows how to do things like
entering protected mode etc.).

Instead, post_leave_smm would run with the SMM flag clear, so
hopefully you wouldn't need games with HF_SMM_MASK at all (except
for the reserved VMXE flag---let's leave that out for now and
concentrate on the right way to do L2 state save restore...).

Paolo


* Re: [PATCH v4 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-10-10 14:56   ` Paolo Bonzini
@ 2017-10-10 15:59     ` Ladi Prosek
  2017-10-10 16:15       ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Ladi Prosek @ 2017-10-10 15:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: KVM list, Radim Krcmar

On Tue, Oct 10, 2017 at 4:56 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 10/10/2017 14:17, Ladi Prosek wrote:
>> +     /* Temporarily set the SMM flag to access the SMM state-save area */
>> +     vcpu->arch.hflags |= HF_SMM_MASK;
>> +     r = kvm_vcpu_read_guest(vcpu, smbase + 0xfed8, &svm_state_save,
>> +                             sizeof(svm_state_save));
>> +     vcpu->arch.hflags &= ~HF_SMM_MASK;
>> +     if (r)
>
> Isn't the flag still set here?  You have:
>
> +       if (ctxt->ops->post_leave_smm(ctxt, smbase, &reload_state))
> +               return X86EMUL_UNHANDLEABLE;
> +
> +       if (reload_state) {
> +               /*
> +                * post_leave_smm() made changes to the vCPU (e.g. entered
> +                * guest mode) and is asking us to load the SMM state-save
> +                * area again.
> +                */
> +               ret = rsm_load_state(ctxt, smbase);
> +               if (ret != X86EMUL_CONTINUE) {
> +                       /* FIXME: should triple fault */
> +                       return X86EMUL_UNHANDLEABLE;
> +               }
> +       }
> +
>         if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
>                 ctxt->ops->set_nmi_mask(ctxt, false);
>
>         ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
>                 ~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));
>
> On the other hand you need to turn HF_SMM_MASK off and back on
> around nested_svm_map/enter_svm_guest_mode, so that the VMCB is
> not read from SMRAM.

Ah, yes, that's a fallout from the previous arrangement where
post_leave_smm really ran after everything else. Sorry for the mix-up.

> And the mandatory stupid question: why is the first state load
> needed at all?  Could VMX's post_leave_smm callback (which would
> really become a pre_leave_smm callback) use load_vmcs12_host_state
> instead.   SVM likewise could extract load_from_hsave_area from
> nested_svm_vmexit, and pre-load the SMM state-save area data in
> pre_leave_smm.

I swear I've tried load_vmcs12_host_state and spent way too much time
trying to figure out why it didn't work. I'll sleep on it and try
again tomorrow :)

> (Yes, load_vmcs12_host_state is overkill, but
> it's already there and it already knows how to do things like
> entering protected mode etc.).

And that I'm not quite sure about. If I remember correctly I had to do
a bit of prep work before I could call load_vmcs12_host_state -- is it
really supposed to be able to enter protected from real?

> Instead, post_leave_smm would run with the SMM flag clear, so
> hopefully you wouldn't need games with HF_SMM_MASK at all (except
> for the reserved VMXE flag---let's leave that out for now and
> concentrate on the right way to do L2 state save restore...).

Sounds good, thanks!

> Paolo


* Re: [PATCH v4 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-10-10 15:59     ` Ladi Prosek
@ 2017-10-10 16:15       ` Paolo Bonzini
  2017-10-11  7:50         ` Ladi Prosek
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2017-10-10 16:15 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: KVM list, Radim Krcmar

On 10/10/2017 17:59, Ladi Prosek wrote:
> On Tue, Oct 10, 2017 at 4:56 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 10/10/2017 14:17, Ladi Prosek wrote:
>>> +     /* Temporarily set the SMM flag to access the SMM state-save area */
>>> +     vcpu->arch.hflags |= HF_SMM_MASK;
>>> +     r = kvm_vcpu_read_guest(vcpu, smbase + 0xfed8, &svm_state_save,
>>> +                             sizeof(svm_state_save));
>>> +     vcpu->arch.hflags &= ~HF_SMM_MASK;
>>> +     if (r)
>>
>> Isn't the flag still set here?  You have:
>>
>> +       if (ctxt->ops->post_leave_smm(ctxt, smbase, &reload_state))
>> +               return X86EMUL_UNHANDLEABLE;
>> +
>> +       if (reload_state) {
>> +               /*
>> +                * post_leave_smm() made changes to the vCPU (e.g. entered
>> +                * guest mode) and is asking us to load the SMM state-save
>> +                * area again.
>> +                */
>> +               ret = rsm_load_state(ctxt, smbase);
>> +               if (ret != X86EMUL_CONTINUE) {
>> +                       /* FIXME: should triple fault */
>> +                       return X86EMUL_UNHANDLEABLE;
>> +               }
>> +       }
>> +
>>         if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
>>                 ctxt->ops->set_nmi_mask(ctxt, false);
>>
>>         ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
>>                 ~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));
>>
>> On the other hand you need to turn HF_SMM_MASK off and back on
>> around nested_svm_map/enter_svm_guest_mode, so that the VMCB is
>> not read from SMRAM.
> 
> Ah, yes, that's a fallout from the previous arrangement where
> post_leave_smm really ran after everything else. Sorry for the mix-up.
> 
>> And the mandatory stupid question: why is the first state load
>> needed at all?  Could VMX's post_leave_smm callback (which would
>> really become a pre_leave_smm callback) use load_vmcs12_host_state
>> instead.   SVM likewise could extract load_from_hsave_area from
>> nested_svm_vmexit, and pre-load the SMM state-save area data in
>> pre_leave_smm.
> 
> I swear I've tried load_vmcs12_host_state and spent way too much time
> trying to figure out why it didn't work. I'll sleep on it and try
> again tomorrow :)
> 
>> (Yes, load_vmcs12_host_state is overkill, but
>> it's already there and it already knows how to do things like
>> entering protected mode etc.).
> 
> And that I'm not quite sure about. If I remember correctly I had to do
> a bit of prep work before I could call load_vmcs12_host_state -- is it
> really supposed to be able to enter protected from real?

Yes, if L2 guest is running in real mode (with unrestricted_guest=1)
then load_vmcs12_host_state would enter protected mode.  It avoids all
the hoops in emulate.c because it can call vmx_set_cr0/cr4 directly and
skip all the checks.

Maybe today's patch "[PATCH] KVM: nVMX: fix guest CR4 loading when
emulating L2 to L1 exit" can help, actually.  SVM has been using
svm_set_cr4 forever, but until that patch VMX was using kvm_set_cr4
incorrectly.

This also suggests that placing the "VMXE && SMM" check in kvm_set_cr4
would bypass some of the issues you're having.

Paolo

>> Instead, post_leave_smm would run with the SMM flag clear, so
>> hopefully you wouldn't need games with HF_SMM_MASK at all (except
>> for the reserved VMXE flag---let's leave that out for now and
>> concentrate on the right way to do L2 state save restore...).
> 
> Sounds good, thanks!
> 
>> Paolo


* Re: [PATCH v4 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-10-10 16:15       ` Paolo Bonzini
@ 2017-10-11  7:50         ` Ladi Prosek
  2017-10-11 11:21           ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Ladi Prosek @ 2017-10-11  7:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: KVM list, Radim Krcmar

On Tue, Oct 10, 2017 at 6:15 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 10/10/2017 17:59, Ladi Prosek wrote:
>> On Tue, Oct 10, 2017 at 4:56 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> On 10/10/2017 14:17, Ladi Prosek wrote:
>>>> +     /* Temporarily set the SMM flag to access the SMM state-save area */
>>>> +     vcpu->arch.hflags |= HF_SMM_MASK;
>>>> +     r = kvm_vcpu_read_guest(vcpu, smbase + 0xfed8, &svm_state_save,
>>>> +                             sizeof(svm_state_save));
>>>> +     vcpu->arch.hflags &= ~HF_SMM_MASK;
>>>> +     if (r)
>>>
>>> Isn't the flag still set here?  You have:
>>>
>>> +       if (ctxt->ops->post_leave_smm(ctxt, smbase, &reload_state))
>>> +               return X86EMUL_UNHANDLEABLE;
>>> +
>>> +       if (reload_state) {
>>> +               /*
>>> +                * post_leave_smm() made changes to the vCPU (e.g. entered
>>> +                * guest mode) and is asking us to load the SMM state-save
>>> +                * area again.
>>> +                */
>>> +               ret = rsm_load_state(ctxt, smbase);
>>> +               if (ret != X86EMUL_CONTINUE) {
>>> +                       /* FIXME: should triple fault */
>>> +                       return X86EMUL_UNHANDLEABLE;
>>> +               }
>>> +       }
>>> +
>>>         if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
>>>                 ctxt->ops->set_nmi_mask(ctxt, false);
>>>
>>>         ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
>>>                 ~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));
>>>
>>> On the other hand you need to turn HF_SMM_MASK off and back on
>>> around nested_svm_map/enter_svm_guest_mode, so that the VMCB is
>>> not read from SMRAM.
>>
>> Ah, yes, that's a fallout from the previous arrangement where
>> post_leave_smm really ran after everything else. Sorry for the mix-up.
>>
>>> And the mandatory stupid question: why is the first state load
>>> needed at all?  Could VMX's post_leave_smm callback (which would
>>> really become a pre_leave_smm callback) use load_vmcs12_host_state
>>> instead.   SVM likewise could extract load_from_hsave_area from
>>> nested_svm_vmexit, and pre-load the SMM state-save area data in
>>> pre_leave_smm.
>>
>> I swear I've tried load_vmcs12_host_state and spent way too much time
>> trying to figure out why it didn't work. I'll sleep on it and try
>> again tomorrow :)
>>
>>> (Yes, load_vmcs12_host_state is overkill, but
>>> it's already there and it already knows how to do things like
>>> entering protected mode etc.).
>>
>> And that I'm not quite sure about. If I remember correctly I had to do
>> a bit of prep work before I could call load_vmcs12_host_state -- is it
>> really supposed to be able to enter protected from real?
>
> Yes, if L2 guest is running in real mode (with unrestricted_guest=1)
> then load_vmcs12_host_state would enter protected mode.  It avoids all
> the hoops in emulate.c because it can call vmx_set_cr0/cr4 directly and
> skip all the checks.

Thanks, load_vmcs12_host_state() has a bug. It doesn't write to
GUEST_IDTR_LIMIT and GUEST_GDTR_LIMIT and the former is causing grief
in our case because it's set to 0 in enter_smm():

  /* Undocumented: IDT limit is set to zero on entry to SMM.  */
  dt.address = dt.size = 0;
  kvm_x86_ops->set_idt(vcpu, &dt);

Per Intel SDM 27.5.2 Loading Host Segment and Descriptor-Table Registers:
"The GDTR and IDTR limits are each set to FFFFH."

So this is what the first state load was doing for me, it just
restored IDTR limit to a good value.
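
The fix is basically just writing the architected values in
load_vmcs12_host_state(), something like:

  /* Intel SDM 27.5.2: the GDTR and IDTR limits are each set to FFFFH */
  vmcs_write32(GUEST_GDTR_LIMIT, 0xffff);
  vmcs_write32(GUEST_IDTR_LIMIT, 0xffff);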

Now after fixing load_vmcs12_host_state(), it looks like the
return-from-SMM-to-L2 works without doing anything special before
enter_vmx_non_root_mode (Intel) or enter_svm_guest_mode (AMD). In
particular I don't need to call load_vmcs12_host_state().

To recap, this is what I have in em_rsm() now:

1. Go back to real mode - existing code
2. enter_vmx_non_root_mode() / enter_svm_guest_mode()
3. rsm_load_state_* - existing code

Am I just getting lucky or is there a reason why we must have loaded
host state before switching to guest mode? I understand that it would
be cleaner with the intermediate host state load as going directly
from real to guest is kind of unnatural. But it works and saves a
bunch of cycles.

Thanks!

> Maybe today's patch "[PATCH] KVM: nVMX: fix guest CR4 loading when
> emulating L2 to L1 exit" can help, actually.  SVM has been using
> svm_set_cr4 forever, but until that patch VMX was using kvm_set_cr4
> incorrectly.
>
> This also suggests that placing the "VMXE && SMM" check in kvm_set_cr4
> would bypass some of the issues you're having.
>
> Paolo
>
>>> Instead, post_leave_smm would run with the SMM flag clear, so
>>> hopefully you wouldn't need games with HF_SMM_MASK at all (except
>>> for the reserved VMXE flag---let's leave that out for now and
>>> concentrate on the right way to do L2 state save restore...).
>>
>> Sounds good, thanks!
>>
>>> Paolo
>


* Re: [PATCH v4 6/6] KVM: nSVM: fix SMI injection in guest mode
  2017-10-11  7:50         ` Ladi Prosek
@ 2017-10-11 11:21           ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2017-10-11 11:21 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: KVM list, Radim Krcmar

On 11/10/2017 09:50, Ladi Prosek wrote:
> To recap, this is what I have in em_rsm() now:
> 
> 1. Go back to real mode - existing code
> 2. enter_vmx_non_root_mode() / enter_svm_guest_mode()
> 3. rsm_load_state_* - existing code
> 
> Am I just getting lucky or is there a reason why we must have loaded
> host state before switching to guest mode? I understand that it would
> be cleaner with the intermediate host state load as going directly
> from real to guest is kind of unnatural. But it works and saves a
> bunch of cycles.

Thinking more about it no, it shouldn't be needed.  It was just cleaner
than the first state load.

Paolo


