* [PATCH 0/2] KVM: x86: nSVM: fixes for SYSENTER emulation
@ 2021-03-15 17:43 Maxim Levitsky
  2021-03-15 17:43 ` [PATCH 1/2] KVM: x86: add guest_cpuid_is_intel Maxim Levitsky
  2021-03-15 17:43 ` [PATCH 2/2] KVM: nSVM: improve SYSENTER emulation on AMD Maxim Levitsky
  0 siblings, 2 replies; 8+ messages in thread
From: Maxim Levitsky @ 2021-03-15 17:43 UTC (permalink / raw)
  To: kvm
  Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Sean Christopherson, Ingo Molnar, Jim Mattson,
	Vitaly Kuznetsov, Joerg Roedel, Paolo Bonzini, Thomas Gleixner,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Borislav Petkov, Maxim Levitsky

This is the result of a deep dive down a rabbit hole into why nested
migration of 32 bit guests is currently completely broken on AMD.

It turns out that due to slight differences between the original AMD64
implementation and Intel's remake of it, the SYSENTER instruction behaves a
bit differently on Intel (its MSRs hold full 64 bit values there, while AMD
defines them as 32 bit wide), and to support migration from Intel to AMD we
try to emulate those differences away.

Sadly that collides with the virtual vmload/vmsave feature that is used for
nesting: on migration (and otherwise whenever userspace reads
MSR_IA32_SYSENTER_EIP/MSR_IA32_SYSENTER_ESP), a wrong, stale value is
returned, which leads to a #DF in the nested guest when that wrong value is
loaded back.

The patches I prepared fix this carefully, mostly by disabling that SYSENTER
emulation when we don't spoof Intel's vendor ID; if we do spoof it and yet
somehow SVM is enabled for the guest (a very rare corner case), then virtual
vmload/vmsave is force-disabled.
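
The core idea (a simplified sketch of what patch 2 below does; the
'sysenter_eip_read' helper here is made up purely for illustration) is to
keep the low 32 bits of these msrs in the VMCB, shadow only the high 32
bits in KVM, and recombine the two halves on an intercepted read:

	/*
	 * Illustration only: this is not a real helper, it just shows
	 * how the full 64 bit value is put back together.
	 */
	static u64 sysenter_eip_read(struct kvm_vcpu *vcpu)
	{
		struct vcpu_svm *svm = to_svm(vcpu);
		u64 data = (u32)svm->vmcb01.ptr->save.sysenter_eip;

		if (guest_cpuid_is_intel(vcpu))
			data |= (u64)svm->sysenter_eip_hi << 32;
		return data;
	}

The high halves are zeroed whenever an emulated vmload runs, so they can
never go stale behind KVM's back.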

Best regards,
	Maxim Levitsky

Maxim Levitsky (2):
  KVM: x86: add guest_cpuid_is_intel
  KVM: nSVM: improve SYSENTER emulation on AMD

 arch/x86/kvm/cpuid.h   |  8 ++++
 arch/x86/kvm/svm/svm.c | 97 ++++++++++++++++++++++++++++--------------
 arch/x86/kvm/svm/svm.h |  7 +--
 3 files changed, 77 insertions(+), 35 deletions(-)

-- 
2.26.2




* [PATCH 1/2] KVM: x86: add guest_cpuid_is_intel
  2021-03-15 17:43 [PATCH 0/2] KVM: x86: nSVM: fixes for SYSENTER emulation Maxim Levitsky
@ 2021-03-15 17:43 ` Maxim Levitsky
  2021-03-15 17:43 ` [PATCH 2/2] KVM: nSVM: improve SYSENTER emulation on AMD Maxim Levitsky
  1 sibling, 0 replies; 8+ messages in thread
From: Maxim Levitsky @ 2021-03-15 17:43 UTC (permalink / raw)
  To: kvm
  Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Sean Christopherson, Ingo Molnar, Jim Mattson,
	Vitaly Kuznetsov, Joerg Roedel, Paolo Bonzini, Thomas Gleixner,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Borislav Petkov, Maxim Levitsky

This is similar to the existing 'guest_cpuid_is_amd_or_hygon'.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/cpuid.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 2a0c5064497f3..ded84d244f19f 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -248,6 +248,14 @@ static inline bool guest_cpuid_is_amd_or_hygon(struct kvm_vcpu *vcpu)
 		is_guest_vendor_hygon(best->ebx, best->ecx, best->edx));
 }
 
+static inline bool guest_cpuid_is_intel(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0, 0);
+	return best && is_guest_vendor_intel(best->ebx, best->ecx, best->edx);
+}
+
 static inline int guest_cpuid_family(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
-- 
2.26.2



* [PATCH 2/2] KVM: nSVM: improve SYSENTER emulation on AMD
  2021-03-15 17:43 [PATCH 0/2] KVM: x86: nSVM: fixes for SYSENTER emulation Maxim Levitsky
  2021-03-15 17:43 ` [PATCH 1/2] KVM: x86: add guest_cpuid_is_intel Maxim Levitsky
@ 2021-03-15 17:43 ` Maxim Levitsky
  2021-03-15 17:56   ` Paolo Bonzini
  1 sibling, 1 reply; 8+ messages in thread
From: Maxim Levitsky @ 2021-03-15 17:43 UTC (permalink / raw)
  To: kvm
  Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Sean Christopherson, Ingo Molnar, Jim Mattson,
	Vitaly Kuznetsov, Joerg Roedel, Paolo Bonzini, Thomas Gleixner,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Borislav Petkov, Maxim Levitsky

Currently, to support Intel->AMD migration, if the guest CPU vendor is
GenuineIntel we emulate the full 64 bit value of the
MSR_IA32_SYSENTER_EIP/MSR_IA32_SYSENTER_ESP msrs, and we also emulate the
sysenter/sysexit instructions in long mode.

(The emulator still refuses to emulate sysenter in 64 bit mode, on the
grounds that the code for this wasn't tested and likely has no users.)

However, when virtual vmload/vmsave is enabled, the vmload instruction
updates these 32 bit msrs without triggering their msr intercept, which
leaves stale values in our shadow copy of these msrs, since that copy is
only updated when the msrs are intercepted.

Fix/optimize this by doing the following:

1. Enable the MSR intercepts for these MSRs iff vendor=GenuineIntel
   (This is both a tiny optimization and also ensures that when the guest
   cpu vendor is AMD, the msrs stay 32 bit wide, as AMD defined them).

2. Store only the high 32 bit part of these msrs on interception and combine
   it with the hardware msr value on intercepted reads/writes iff vendor=GenuineIntel.

3. Disable vmload/vmsave virtualization if vendor=GenuineIntel.
   (It is somewhat insane to set vendor=GenuineIntel and still enable
   SVM for the guest, but oh well.)
   Then zero the high 32 bit parts when we intercept and emulate vmload;
   since we now read the low 32 bit part from the VMCB, the combined
   value will be correct.

Thanks a lot to Paolo Bonzini for helping me fix this in the most correct way.

This patch fixes nested migration of 32 bit nested guests, which was broken
because incorrect cached values of these msrs were read if L1 had changed
them with vmload prior to L2 entry.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 97 ++++++++++++++++++++++++++++--------------
 arch/x86/kvm/svm/svm.h |  7 +--
 2 files changed, 69 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 271196400495f..8bf243e0b1f7c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -95,6 +95,8 @@ static const struct svm_direct_access_msrs {
 } direct_access_msrs[MAX_DIRECT_ACCESS_MSRS] = {
 	{ .index = MSR_STAR,				.always = true  },
 	{ .index = MSR_IA32_SYSENTER_CS,		.always = true  },
+	{ .index = MSR_IA32_SYSENTER_EIP,		.always = false },
+	{ .index = MSR_IA32_SYSENTER_ESP,		.always = false },
 #ifdef CONFIG_X86_64
 	{ .index = MSR_GS_BASE,				.always = true  },
 	{ .index = MSR_FS_BASE,				.always = true  },
@@ -1258,16 +1260,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm);
 
-	/*
-	 * If hardware supports Virtual VMLOAD VMSAVE then enable it
-	 * in VMCB and clear intercepts to avoid #VMEXIT.
-	 */
-	if (vls) {
-		svm_clr_intercept(svm, INTERCEPT_VMLOAD);
-		svm_clr_intercept(svm, INTERCEPT_VMSAVE);
-		svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
-	}
-
 	if (vgif) {
 		svm_clr_intercept(svm, INTERCEPT_STGI);
 		svm_clr_intercept(svm, INTERCEPT_CLGI);
@@ -2133,9 +2125,11 @@ static int vmload_vmsave_interception(struct kvm_vcpu *vcpu, bool vmload)
 
 	ret = kvm_skip_emulated_instruction(vcpu);
 
-	if (vmload)
+	if (vmload) {
 		nested_svm_vmloadsave(vmcb12, svm->vmcb);
-	else
+		svm->sysenter_eip_hi = 0;
+		svm->sysenter_esp_hi = 0;
+	} else
 		nested_svm_vmloadsave(svm->vmcb, vmcb12);
 
 	kvm_vcpu_unmap(vcpu, &map, true);
@@ -2676,11 +2670,18 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_SYSENTER_CS:
 		msr_info->data = svm->vmcb01.ptr->save.sysenter_cs;
 		break;
+
 	case MSR_IA32_SYSENTER_EIP:
-		msr_info->data = svm->sysenter_eip;
+		msr_info->data = (u32)svm->vmcb01.ptr->save.sysenter_eip;
+		if (guest_cpuid_is_intel(vcpu))
+			msr_info->data |= (u64)svm->sysenter_eip_hi << 32;
+
 		break;
 	case MSR_IA32_SYSENTER_ESP:
-		msr_info->data = svm->sysenter_esp;
+		msr_info->data = svm->vmcb01.ptr->save.sysenter_esp;
+		if (guest_cpuid_is_intel(vcpu))
+			msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
+
 		break;
 	case MSR_TSC_AUX:
 		if (!boot_cpu_has(X86_FEATURE_RDTSCP))
@@ -2885,12 +2886,20 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->vmcb01.ptr->save.sysenter_cs = data;
 		break;
 	case MSR_IA32_SYSENTER_EIP:
-		svm->sysenter_eip = data;
-		svm->vmcb01.ptr->save.sysenter_eip = data;
+		svm->vmcb01.ptr->save.sysenter_eip = (u32)data;
+		/*
+		 * We only intercept the MSR_IA32_SYSENTER_{EIP|ESP} msrs
+		 * when we spoof an Intel vendor ID (for cross vendor migration).
+		 * In this case we use this intercept to track the high
+		 * 32 bit part of these msrs to support Intel's
+		 * implementation of SYSENTER/SYSEXIT.
+		 */
+		svm->sysenter_eip_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
+
 		break;
 	case MSR_IA32_SYSENTER_ESP:
-		svm->sysenter_esp = data;
-		svm->vmcb01.ptr->save.sysenter_esp = data;
+		svm->vmcb01.ptr->save.sysenter_esp = (u32)data;
+		svm->sysenter_esp_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
 		break;
 	case MSR_TSC_AUX:
 		if (!boot_cpu_has(X86_FEATURE_RDTSCP))
@@ -4009,24 +4018,48 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 			vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
 	}
 
-	if (!kvm_vcpu_apicv_active(vcpu))
-		return;
+	if (kvm_vcpu_apicv_active(vcpu)) {
+		/*
+		 * AVIC does not work with an x2APIC mode guest. If the X2APIC feature
+		 * is exposed to the guest, disable AVIC.
+		 */
+		if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
+			kvm_request_apicv_update(vcpu->kvm, false,
+						 APICV_INHIBIT_REASON_X2APIC);
 
-	/*
-	 * AVIC does not work with an x2APIC mode guest. If the X2APIC feature
-	 * is exposed to the guest, disable AVIC.
-	 */
-	if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
-		kvm_request_apicv_update(vcpu->kvm, false,
-					 APICV_INHIBIT_REASON_X2APIC);
+		/*
+		 * Currently, AVIC does not work with nested virtualization.
+		 * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
+		 */
+		if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
+			kvm_request_apicv_update(vcpu->kvm, false,
+						 APICV_INHIBIT_REASON_NESTED);
+	}
 
 	/*
-	 * Currently, AVIC does not work with nested virtualization.
-	 * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
+	 * If we are emulating an Intel CPU we need to intercept
+	 * MSR_IA32_SYSENTER_EIP/MSR_IA32_SYSENTER_ESP to know their
+	 * full 64 bit values since on AMD these are 32 bit.
+	 *
+	 * Also we have to enable VMLOAD interception to not allow it
+	 * to overwrite these MSRs without KVM knowing (msr intercept
+	 * doesn't affect VMLOAD)
+	 *
 	 */
-	if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
-		kvm_request_apicv_update(vcpu->kvm, false,
-					 APICV_INHIBIT_REASON_NESTED);
+	if (!guest_cpuid_is_intel(vcpu)) {
+		/*
+		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
+		 * in VMCB and clear intercepts to avoid #VMEXIT.
+		 */
+		if (vls) {
+			svm_clr_intercept(svm, INTERCEPT_VMLOAD);
+			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
+			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+		}
+		/* No need to intercept these msrs either */
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
+	}
 }
 
 static bool svm_has_wbinvd_exit(void)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8e276c4fb33df..c71737f4abab8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -28,7 +28,7 @@ static const u32 host_save_user_msrs[] = {
 };
 #define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)
 
-#define MAX_DIRECT_ACCESS_MSRS	18
+#define MAX_DIRECT_ACCESS_MSRS	20
 #define MSRPM_OFFSETS	16
 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
@@ -116,8 +116,9 @@ struct vcpu_svm {
 	struct kvm_vmcb_info *current_vmcb;
 	struct svm_cpu_data *svm_data;
 	u32 asid;
-	uint64_t sysenter_esp;
-	uint64_t sysenter_eip;
+	u32 sysenter_esp_hi;
+	u32 sysenter_eip_hi;
+
 	uint64_t tsc_aux;
 
 	u64 msr_decfg;
-- 
2.26.2



* Re: [PATCH 2/2] KVM: nSVM: improve SYSENTER emulation on AMD
  2021-03-15 17:43 ` [PATCH 2/2] KVM: nSVM: improve SYSENTER emulation on AMD Maxim Levitsky
@ 2021-03-15 17:56   ` Paolo Bonzini
  2021-03-15 18:19     ` Maxim Levitsky
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-15 17:56 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Sean Christopherson, Ingo Molnar, Jim Mattson,
	Vitaly Kuznetsov, Joerg Roedel, Thomas Gleixner,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Borislav Petkov

On 15/03/21 18:43, Maxim Levitsky wrote:
> +	if (!guest_cpuid_is_intel(vcpu)) {
> +		/*
> +		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
> +		 * in VMCB and clear intercepts to avoid #VMEXIT.
> +		 */
> +		if (vls) {
> +			svm_clr_intercept(svm, INTERCEPT_VMLOAD);
> +			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
> +			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
> +		}
> +		/* No need to intercept these msrs either */
> +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
> +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
> +	}

An "else" is needed here to do the opposite setup (removing the "if 
(vls)" from init_vmcb).

This also makes the code more readable since you can write

	if (guest_cpuid_is_intel(vcpu)) {
		/*
		 * We must intercept SYSENTER_EIP and SYSENTER_ESP
		 * accesses because the processor only stores 32 bits.
		 * For the same reason we cannot use virtual
		 * VMLOAD/VMSAVE.
		 */
		...
	} else {
		/* Do the opposite.  */
		...
	}

Paolo



* Re: [PATCH 2/2] KVM: nSVM: improve SYSENTER emulation on AMD
  2021-03-15 17:56   ` Paolo Bonzini
@ 2021-03-15 18:19     ` Maxim Levitsky
  2021-03-16  8:16       ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Levitsky @ 2021-03-15 18:19 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Sean Christopherson, Ingo Molnar, Jim Mattson,
	Vitaly Kuznetsov, Joerg Roedel, Thomas Gleixner,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Borislav Petkov

On Mon, 2021-03-15 at 18:56 +0100, Paolo Bonzini wrote:
> On 15/03/21 18:43, Maxim Levitsky wrote:
> > +	if (!guest_cpuid_is_intel(vcpu)) {
> > +		/*
> > +		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
> > +		 * in VMCB and clear intercepts to avoid #VMEXIT.
> > +		 */
> > +		if (vls) {
> > +			svm_clr_intercept(svm, INTERCEPT_VMLOAD);
> > +			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
> > +			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
> > +		}
> > +		/* No need to intercept these msrs either */
> > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
> > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
> > +	}
> 
> An "else" is needed here to do the opposite setup (removing the "if 
> (vls)" from init_vmcb).

init_vmcb currently sets INTERCEPT_VMLOAD and INTERCEPT_VMSAVE and doesn't enable vls,
thus there is nothing to do if I don't want to enable vls.
It seems reasonable to me.

Both msrs I marked as '.always = false' in the 
'direct_access_msrs', which makes them intercepted by default.
If I were to use '.always = true' it would feel a bit wrong, as the intercept is not always
enabled.

What do you think?

> 
> This also makes the code more readable since you can write
> 
> 	if (guest_cpuid_is_intel(vcpu)) {
> 		/*
> 		 * We must intercept SYSENTER_EIP and SYSENTER_ESP
> 		 * accesses because the processor only stores 32 bits.
> 		 * For the same reason we cannot use virtual
> 		 * VMLOAD/VMSAVE.
> 		 */
> 		...
> 	} else {
> 		/* Do the opposite.  */
> 		...
> 	}

Best regards,
	Maxim Levitsky

> 
> Paolo
> 




* Re: [PATCH 2/2] KVM: nSVM: improve SYSENTER emulation on AMD
  2021-03-15 18:19     ` Maxim Levitsky
@ 2021-03-16  8:16       ` Paolo Bonzini
  2021-03-16 10:39         ` Maxim Levitsky
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-16  8:16 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Sean Christopherson, Ingo Molnar, Jim Mattson,
	Vitaly Kuznetsov, Joerg Roedel, Thomas Gleixner,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Borislav Petkov

On 15/03/21 19:19, Maxim Levitsky wrote:
> On Mon, 2021-03-15 at 18:56 +0100, Paolo Bonzini wrote:
>> On 15/03/21 18:43, Maxim Levitsky wrote:
>>> +	if (!guest_cpuid_is_intel(vcpu)) {
>>> +		/*
>>> +		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
>>> +		 * in VMCB and clear intercepts to avoid #VMEXIT.
>>> +		 */
>>> +		if (vls) {
>>> +			svm_clr_intercept(svm, INTERCEPT_VMLOAD);
>>> +			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
>>> +			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
>>> +		}
>>> +		/* No need to intercept these msrs either */
>>> +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
>>> +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
>>> +	}
>>
>> An "else" is needed here to do the opposite setup (removing the "if
>> (vls)" from init_vmcb).
> 
> init_vmcb currently sets INTERCEPT_VMLOAD and INTERCEPT_VMSAVE and doesn't enable vls,

There's also this towards the end of the function:

         /*
          * If hardware supports Virtual VMLOAD VMSAVE then enable it
          * in VMCB and clear intercepts to avoid #VMEXIT.
          */
         if (vls) {
                 svm_clr_intercept(svm, INTERCEPT_VMLOAD);
                 svm_clr_intercept(svm, INTERCEPT_VMSAVE);
                 svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
         }

> thus there is nothing to do if I don't want to enable vls.
> It seems reasonable to me.
> 
> Both msrs I marked as '.always = false' in the
> 'direct_access_msrs', which makes them intercepted by default.
> If I were to use '.always = true' it would feel a bit wrong, as the intercept is not always
> enabled.

I agree that .always = false is correct.

> What do you think?

You can set the CPUID multiple times, so you could go from AMD to Intel 
and back.
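
Concretely, svm_vcpu_after_set_cpuid() would then end up with both
branches, something along these lines (just a rough sketch of the shape,
not the exact final code):

	if (guest_cpuid_is_intel(vcpu)) {
		/*
		 * Intercept the SYSENTER msrs and VMLOAD/VMSAVE so that
		 * KVM can track the high 32 bits of those msrs.
		 */
		svm_set_intercept(svm, INTERCEPT_VMLOAD);
		svm_set_intercept(svm, INTERCEPT_VMSAVE);
		svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;

		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 0, 0);
		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 0, 0);
	} else {
		/* Undo the above, so flipping the CPUID back also works. */
		if (vls) {
			svm_clr_intercept(svm, INTERCEPT_VMLOAD);
			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
		}
		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
	}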

Thanks,

Paolo



* Re: [PATCH 2/2] KVM: nSVM: improve SYSENTER emulation on AMD
  2021-03-16  8:16       ` Paolo Bonzini
@ 2021-03-16 10:39         ` Maxim Levitsky
  0 siblings, 0 replies; 8+ messages in thread
From: Maxim Levitsky @ 2021-03-16 10:39 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Sean Christopherson, Ingo Molnar, Jim Mattson,
	Vitaly Kuznetsov, Joerg Roedel, Thomas Gleixner,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Borislav Petkov

On Tue, 2021-03-16 at 09:16 +0100, Paolo Bonzini wrote:
> On 15/03/21 19:19, Maxim Levitsky wrote:
> > On Mon, 2021-03-15 at 18:56 +0100, Paolo Bonzini wrote:
> > > On 15/03/21 18:43, Maxim Levitsky wrote:
> > > > +	if (!guest_cpuid_is_intel(vcpu)) {
> > > > +		/*
> > > > +		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
> > > > +		 * in VMCB and clear intercepts to avoid #VMEXIT.
> > > > +		 */
> > > > +		if (vls) {
> > > > +			svm_clr_intercept(svm, INTERCEPT_VMLOAD);
> > > > +			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
> > > > +			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
> > > > +		}
> > > > +		/* No need to intercept these msrs either */
> > > > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
> > > > +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
> > > > +	}
> > > 
> > > An "else" is needed here to do the opposite setup (removing the "if
> > > (vls)" from init_vmcb).
> > 
> > init_vmcb currently sets INTERCEPT_VMLOAD and INTERCEPT_VMSAVE and doesn't enable vls,
> 
> There's also this towards the end of the function:
> 
>          /*
>           * If hardware supports Virtual VMLOAD VMSAVE then enable it
>           * in VMCB and clear intercepts to avoid #VMEXIT.
>           */
>          if (vls) {
>                  svm_clr_intercept(svm, INTERCEPT_VMLOAD);
>                  svm_clr_intercept(svm, INTERCEPT_VMSAVE);
>                  svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
>          }
> 
> > thus there is nothing to do if I don't want to enable vls.
> > It seems reasonable to me.
> > 
> > Both msrs I marked as '.always = false' in the
> > 'direct_access_msrs', which makes them intercepted by default.
> > If I were to use '.always = true' it would feel a bit wrong, as the intercept is not always
> > enabled.
> 
> I agree that .always = false is correct.
> 
> > What do you think?
> 
> You can set the CPUID multiple times, so you could go from AMD to Intel 
> and back.

I understand now, and I will send a v2 with that. Thanks for the review!

Best regards,
	Maxim Levitsky

> 
> Thanks,
> 
> Paolo
> 




* [PATCH 2/2] KVM: nSVM: improve SYSENTER emulation on AMD
  2021-04-01 11:16 [PATCH 0/2] KVM: x86: nSVM: fixes for SYSENTER emulation Maxim Levitsky
@ 2021-04-01 11:16 ` Maxim Levitsky
  0 siblings, 0 replies; 8+ messages in thread
From: Maxim Levitsky @ 2021-04-01 11:16 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, Jim Mattson, Paolo Bonzini, Ingo Molnar,
	Wanpeng Li, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, H. Peter Anvin, Sean Christopherson,
	Vitaly Kuznetsov, Borislav Petkov, Maxim Levitsky

Currently, to support Intel->AMD migration, if the guest CPU vendor is
GenuineIntel we emulate the full 64 bit value of the
MSR_IA32_SYSENTER_{EIP|ESP} msrs, and we also emulate the sysenter/sysexit
instructions in long mode.

(The emulator still refuses to emulate sysenter in 64 bit mode, on the
grounds that the code for that wasn't tested and likely has no users.)

However, when virtual vmload/vmsave is enabled, the vmload instruction
updates these 32 bit msrs without triggering their msr intercept, which
leaves stale values in kvm's shadow copy of these msrs, since that copy
relies on the intercept to stay up to date.

Fix/optimize this by doing the following:

1. Enable the MSR intercepts for the SYSENTER MSRs iff vendor=GenuineIntel
   (This is both a tiny optimization and also ensures that when the
   guest cpu vendor is AMD, the msrs stay 32 bit wide, as
   AMD defined them).

2. Store only the high 32 bit part of these msrs on interception and combine
   it with the hardware msr value on intercepted reads/writes
   iff vendor=GenuineIntel.

3. Disable vmload/vmsave virtualization if vendor=GenuineIntel.
   (It is somewhat insane to set vendor=GenuineIntel and still enable
   SVM for the guest, but oh well.)
   Then zero the high 32 bit parts when kvm intercepts and emulates vmload.

Thanks a lot to Paolo Bonzini for helping me fix this in the most
correct way.

This patch fixes nested migration of 32 bit nested guests, which was
broken because incorrect cached values of the SYSENTER msrs were stored in
the migration stream if L1 had changed these msrs with vmload prior to
L2 entry.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 99 +++++++++++++++++++++++++++---------------
 arch/x86/kvm/svm/svm.h |  6 +--
 2 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 271196400495..6c39b0cd6ec6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -95,6 +95,8 @@ static const struct svm_direct_access_msrs {
 } direct_access_msrs[MAX_DIRECT_ACCESS_MSRS] = {
 	{ .index = MSR_STAR,				.always = true  },
 	{ .index = MSR_IA32_SYSENTER_CS,		.always = true  },
+	{ .index = MSR_IA32_SYSENTER_EIP,		.always = false },
+	{ .index = MSR_IA32_SYSENTER_ESP,		.always = false },
 #ifdef CONFIG_X86_64
 	{ .index = MSR_GS_BASE,				.always = true  },
 	{ .index = MSR_FS_BASE,				.always = true  },
@@ -1258,16 +1260,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm);
 
-	/*
-	 * If hardware supports Virtual VMLOAD VMSAVE then enable it
-	 * in VMCB and clear intercepts to avoid #VMEXIT.
-	 */
-	if (vls) {
-		svm_clr_intercept(svm, INTERCEPT_VMLOAD);
-		svm_clr_intercept(svm, INTERCEPT_VMSAVE);
-		svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
-	}
-
 	if (vgif) {
 		svm_clr_intercept(svm, INTERCEPT_STGI);
 		svm_clr_intercept(svm, INTERCEPT_CLGI);
@@ -2133,9 +2125,11 @@ static int vmload_vmsave_interception(struct kvm_vcpu *vcpu, bool vmload)
 
 	ret = kvm_skip_emulated_instruction(vcpu);
 
-	if (vmload)
+	if (vmload) {
 		nested_svm_vmloadsave(vmcb12, svm->vmcb);
-	else
+		svm->sysenter_eip_hi = 0;
+		svm->sysenter_esp_hi = 0;
+	} else
 		nested_svm_vmloadsave(svm->vmcb, vmcb12);
 
 	kvm_vcpu_unmap(vcpu, &map, true);
@@ -2677,10 +2671,14 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = svm->vmcb01.ptr->save.sysenter_cs;
 		break;
 	case MSR_IA32_SYSENTER_EIP:
-		msr_info->data = svm->sysenter_eip;
+		msr_info->data = (u32)svm->vmcb01.ptr->save.sysenter_eip;
+		if (guest_cpuid_is_intel(vcpu))
+			msr_info->data |= (u64)svm->sysenter_eip_hi << 32;
 		break;
 	case MSR_IA32_SYSENTER_ESP:
-		msr_info->data = svm->sysenter_esp;
+		msr_info->data = svm->vmcb01.ptr->save.sysenter_esp;
+		if (guest_cpuid_is_intel(vcpu))
+			msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
 		break;
 	case MSR_TSC_AUX:
 		if (!boot_cpu_has(X86_FEATURE_RDTSCP))
@@ -2885,12 +2883,19 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->vmcb01.ptr->save.sysenter_cs = data;
 		break;
 	case MSR_IA32_SYSENTER_EIP:
-		svm->sysenter_eip = data;
-		svm->vmcb01.ptr->save.sysenter_eip = data;
+		svm->vmcb01.ptr->save.sysenter_eip = (u32)data;
+		/*
+		 * We only intercept the MSR_IA32_SYSENTER_{EIP|ESP} msrs
+		 * when we spoof an Intel vendor ID (for cross vendor migration).
+		 * In this case we use this intercept to track the high
+		 * 32 bit part of these msrs to support Intel's
+		 * implementation of SYSENTER/SYSEXIT.
+		 */
+		svm->sysenter_eip_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
 		break;
 	case MSR_IA32_SYSENTER_ESP:
-		svm->sysenter_esp = data;
-		svm->vmcb01.ptr->save.sysenter_esp = data;
+		svm->vmcb01.ptr->save.sysenter_esp = (u32)data;
+		svm->sysenter_esp_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
 		break;
 	case MSR_TSC_AUX:
 		if (!boot_cpu_has(X86_FEATURE_RDTSCP))
@@ -4009,24 +4014,50 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 			vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
 	}
 
-	if (!kvm_vcpu_apicv_active(vcpu))
-		return;
+	if (kvm_vcpu_apicv_active(vcpu)) {
+		/*
+		 * AVIC does not work with an x2APIC mode guest. If the X2APIC feature
+		 * is exposed to the guest, disable AVIC.
+		 */
+		if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
+			kvm_request_apicv_update(vcpu->kvm, false,
+						 APICV_INHIBIT_REASON_X2APIC);
 
-	/*
-	 * AVIC does not work with an x2APIC mode guest. If the X2APIC feature
-	 * is exposed to the guest, disable AVIC.
-	 */
-	if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
-		kvm_request_apicv_update(vcpu->kvm, false,
-					 APICV_INHIBIT_REASON_X2APIC);
+		/*
+		 * Currently, AVIC does not work with nested virtualization.
+		 * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
+		 */
+		if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
+			kvm_request_apicv_update(vcpu->kvm, false,
+						 APICV_INHIBIT_REASON_NESTED);
+	}
 
-	/*
-	 * Currently, AVIC does not work with nested virtualization.
-	 * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
-	 */
-	if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
-		kvm_request_apicv_update(vcpu->kvm, false,
-					 APICV_INHIBIT_REASON_NESTED);
+	if (guest_cpuid_is_intel(vcpu)) {
+		/*
+		 * We must intercept SYSENTER_EIP and SYSENTER_ESP
+		 * accesses because the processor only stores 32 bits.
+		 * For the same reason we cannot use virtual VMLOAD/VMSAVE.
+		 */
+		svm_set_intercept(svm, INTERCEPT_VMLOAD);
+		svm_set_intercept(svm, INTERCEPT_VMSAVE);
+		svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 0, 0);
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 0, 0);
+	} else {
+		/*
+		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
+		 * in VMCB and clear intercepts to avoid #VMEXIT.
+		 */
+		if (vls) {
+			svm_clr_intercept(svm, INTERCEPT_VMLOAD);
+			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
+			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+		}
+		/* No need to intercept these MSRs */
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
+	}
 }
 
 static bool svm_has_wbinvd_exit(void)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8e276c4fb33d..fffdd5fb446d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -28,7 +28,7 @@ static const u32 host_save_user_msrs[] = {
 };
 #define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)
 
-#define MAX_DIRECT_ACCESS_MSRS	18
+#define MAX_DIRECT_ACCESS_MSRS	20
 #define MSRPM_OFFSETS	16
 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
@@ -116,8 +116,8 @@ struct vcpu_svm {
 	struct kvm_vmcb_info *current_vmcb;
 	struct svm_cpu_data *svm_data;
 	u32 asid;
-	uint64_t sysenter_esp;
-	uint64_t sysenter_eip;
+	u32 sysenter_esp_hi;
+	u32 sysenter_eip_hi;
 	uint64_t tsc_aux;
 
 	u64 msr_decfg;
-- 
2.26.2


