All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/15]  KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes
@ 2022-06-07 21:35 Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 01/15] KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits Sean Christopherson
                   ` (14 more replies)
  0 siblings, 15 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Resurrecting Oliver's series to quirk nVMX's manipulation of the VMX MSRs.

KVM has a quirk where nVMX overwrites select VMX MSR bits in response to
CPUID updates.  Specifically, KVM forces CR{0,4}_FIXED1 bits and the
VM-Entry/VM-Exit control bits for BNDCFGS and PERF_GLOBAL_CTRL to align
with the guest vCPU model.  Add a quirk to (a) allow userspace to opt out
of the existing behavior and (b) make it clear the the existing behavior
should not be propagated to new features.

Patches 0-4 are tangentially related fixes for correctly handling CR4
checks on nested VMXON and VM-Enter.  They are included here because there
is a subtle dependency created by the fix in patch 02, as it changes the
resulting behavior of patch 10, "Extend VMX MSRs quirk to CR0/4 fixed1 bits".

Patch 05 fixes a bug where KVM forces incoming VMX MSR values to be a
subset of _current_ value, not of KVM's support valu.  E.g. if userspace
clears an allowed-1 bit, it can never set that bit back to the original
value.

Patch 06, "Keep KVM updates to BNDCFGS ctrl bits across MSR write", fixes
a related bug where KVM's original quirky behavior kept the VMX MSRs
up-to-date (almost) all the time.

This series is technically based on my selftests overhaul[*], but practically
speaking that only affects the selftests.  The KVM should apply cleanly on
kvm/queue, 55371f1d0c01 ("KVM: x86/pmu: Update global ...")

I have a KUT test for patch 3 (VMXON fixes) that I'l post separately (yet
more cleanup involved, ugh).  I spot tested patch 2 by fudging PKU in guest
CPUID, but I'm not planning on submitting an official test anywhere (though
it could be done without too much pain in selftests).

v5:
 - Rebase (see above).
 - Fix "CR4 valid for nVMX" bugs.
 - Modify PERF_GLOBAL_CTRL bits iff the MSR exists.
 - Fix a bug where KVM doesn't allow userspace to restore VMX MSRs to
   _KVM's_ allowed values.
 - Fix the UMIP emulation goof.
 - Extend the quirk to CR0/4_FIXED1 MSRs.
 - Add a helper to identify if the vCPU has a vPMU.
 - Rewrote the selftests to more exhaustively test combos, and to test
   the aforementioned bugs fixed in v5.

v4:
 - https://lore.kernel.org/all/20220301060351.442881-1-oupton@google.com
 - Rebased to kvm/queue. Avoids conflicts with new CAPs and commit
   0bcd556e15f9 ("KVM: nVMX: Refactor PMU refresh to avoid referencing
   kvm_x86_ops.pmu_ops") on kvm/queue.
 - Grabbed KVM_CAP_DISABLE_QUIRKS2 patch, since this series also
   introduces a quirk.
 - Fix typo in KVM_CAP_DISABLE_QUIRKS2 documentation (Sean)
 - Eliminated the need to refresh 'load IA32_PGC' bits from PMU refresh.
 - Use consistent formatting to make test cases more easily readable
   (David Dunn)
 - Use correct 'Fixes: ' tag and correct a typo in Patch 2 changelog.

Oliver Upton (4):
  KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write
  KVM: nVMX: Keep KVM updates to PERF_GLOBAL_CTRL ctrl bits across MSR
    write
  KVM: nVMX: Drop nested_vmx_pmu_refresh()
  KVM: nVMX: Add a quirk for KVM tweaks to VMX MSRs

Sean Christopherson (11):
  KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits
  KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks
  KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4
  KVM: nVMX: Rename handle_vm{on,off}() to handle_vmx{on,off}()
  KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL
  KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP
  KVM: nVMX: Extend VMX MSRs quirk to CR0/4 fixed1 bits
  KVM: selftests: Add test to verify KVM's VMX MSRs quirk for controls
  KVM: selftests: Extend VMX MSRs test to cover CR4_FIXED1 (and its
    quirks)
  KVM: selftests: Verify VMX MSRs can be restored to KVM-supported
    values

 Documentation/virt/kvm/api.rst                |  29 ++
 arch/x86/include/asm/kvm_host.h               |   3 +-
 arch/x86/include/uapi/asm/kvm.h               |   1 +
 arch/x86/kvm/svm/nested.c                     |   3 +-
 arch/x86/kvm/vmx/nested.c                     | 199 ++++++------
 arch/x86/kvm/vmx/nested.h                     |   5 +-
 arch/x86/kvm/vmx/pmu_intel.c                  |   7 +-
 arch/x86/kvm/vmx/vmx.c                        |  23 +-
 arch/x86/kvm/vmx/vmx.h                        |  14 +
 arch/x86/kvm/x86.c                            |  12 +-
 arch/x86/kvm/x86.h                            |   2 +-
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |   8 +
 .../selftests/kvm/include/x86_64/vmx.h        |   2 +
 .../selftests/kvm/x86_64/vmx_msrs_test.c      | 287 ++++++++++++++++++
 16 files changed, 493 insertions(+), 104 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c


base-commit: 081ad4bbae8d503c79fae45f463766d28b2f3241
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v5 01/15] KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 02/15] KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks Sean Christopherson
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits
checks, into a separate helper, __kvm_is_valid_cr4(), and export only the
inner helper to vendor code in order to prevent nested VMX from calling
back into vmx_is_valid_cr4() via kvm_is_valid_cr4().

On SVM, this is a nop as SVM doesn't place any additional restrictions on
CR4.

On VMX, this is also currently a nop, but only because nested VMX is
missing checks on reserved CR4 bits for nested VM-Enter.  That bug will
be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is,
but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's
restrictions on CR0/CR4.  The cleanest and most intuitive way to fix the
VMXON bug is to use nested_host_cr{0,4}_valid().  If the CR4 variant
routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do
the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's
restrictions if and only if the vCPU is post-VMXON.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c |  3 ++-
 arch/x86/kvm/vmx/vmx.c    |  4 ++--
 arch/x86/kvm/x86.c        | 12 +++++++++---
 arch/x86/kvm/x86.h        |  2 +-
 4 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 88da8edbe1e1..2953939d5bf4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -320,7 +320,8 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
 			return false;
 	}
 
-	if (CC(!kvm_is_valid_cr4(vcpu, save->cr4)))
+	/* Note, SVM doesn't have any additional restrictions on CR4. */
+	if (CC(!__kvm_is_valid_cr4(vcpu, save->cr4)))
 		return false;
 
 	if (CC(!kvm_valid_efer(vcpu, save->efer)))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fd2e707faf2b..57df799ffa29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3187,8 +3187,8 @@ static bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	/*
 	 * We operate under the default treatment of SMM, so VMX cannot be
-	 * enabled under SMM.  Note, whether or not VMXE is allowed at all is
-	 * handled by kvm_is_valid_cr4().
+	 * enabled under SMM.  Note, whether or not VMXE is allowed at all,
+	 * i.e. is a reserved bit, is handled by common x86 code.
 	 */
 	if ((cr4 & X86_CR4_VMXE) && is_smm(vcpu))
 		return false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2db6f0373fa3..540651cd28d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1079,7 +1079,7 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
 
-bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	if (cr4 & cr4_reserved_bits)
 		return false;
@@ -1087,9 +1087,15 @@ bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
 		return false;
 
-	return static_call(kvm_x86_is_valid_cr4)(vcpu, cr4);
+	return true;
+}
+EXPORT_SYMBOL_GPL(__kvm_is_valid_cr4);
+
+static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+	return __kvm_is_valid_cr4(vcpu, cr4) &&
+	       static_call(kvm_x86_is_valid_cr4)(vcpu, cr4);
 }
-EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
 
 void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
 {
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 501b884b8cc4..1926d2cb8e79 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -434,7 +434,7 @@ static inline void kvm_machine_check(void)
 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
 int kvm_spec_ctrl_test_value(u64 value);
-bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 			      struct x86_exception *e);
 int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 02/15] KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 01/15] KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 03/15] KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4 Sean Christopherson
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Check that the guest (L2) and host (L1) CR4 values that would be loaded
by nested VM-Enter and VM-Exit respectively are valid with respect to
KVM's (L0 host) allowed CR4 bits.  Failure to check KVM reserved bits
would allow L1 to load an illegal CR4 (or trigger hardware VM-Fail or
failed VM-Entry) by massaging guest CPUID to allow features that are not
supported by KVM.  Amusingly, KVM itself is an accomplice in its doom, as
KVM adjusts L1's MSR_IA32_VMX_CR4_FIXED1 to allow L1 to enable bits for
L2 based on L1's CPUID model.

Note, although nested_{guest,host}_cr4_valid() are _currently_ used if
and only if the vCPU is post-VMXON (nested.vmxon == true), that may not
be true in the future, e.g. emulating VMXON has a bug where it doesn't
check the allowed/required CR0/CR4 bits.

Cc: stable@vger.kernel.org
Fixes: 3899152ccbf4 ("KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index c92cea0b8ccc..129ae4e01f7c 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -281,7 +281,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
 	u64 fixed0 = to_vmx(vcpu)->nested.msrs.cr4_fixed0;
 	u64 fixed1 = to_vmx(vcpu)->nested.msrs.cr4_fixed1;
 
-	return fixed_bits_valid(val, fixed0, fixed1);
+	return fixed_bits_valid(val, fixed0, fixed1) &&
+	       __kvm_is_valid_cr4(vcpu, val);
 }
 
 /* No difference in the restrictions on guest and host CR4 in VMX operation. */
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 03/15] KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 01/15] KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 02/15] KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 04/15] KVM: nVMX: Rename handle_vm{on,off}() to handle_vmx{on,off}() Sean Christopherson
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Inject a #UD if L1 attempts VMXON with a CR0 or CR4 that is disallowed
per the associated nested VMX MSRs' fixed0/1 settings.  KVM cannot rely
on hardware to perform the checks, even for the few checks that have
higher priority than VM-Exit, as (a) KVM may have forced CR0/CR4 bits in
hardware while running the guest, (b) there may incompatible CR0/CR4 bits
that have lower priority than VM-Exit, e.g. CR0.NE, and (c) userspace may
have further restricted the allowed CR0/CR4 values by manipulating the
guest's nested VMX MSRs.

Note, despite a very strong desire to throw shade at Jim, commit
70f3aac964ae ("kvm: nVMX: Remove superfluous VMX instruction fault checks")
is not to blame for the buggy behavior (though the comment...).  That
commit only removed the CR0.PE, EFLAGS.VM, and COMPATIBILITY mode checks
(though it did erroneously drop the CPL check, but that has already been
remedied).  KVM may force CR0.PE=1, but will do so only when also
forcing EFLAGS.VM=1 to emulate Real Mode, i.e. hardware will still #UD.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216033
Fixes: ec378aeef9df ("KVM: nVMX: Implement VMXON and VMXOFF")
Reported-by: Eric Li <ercli@ucdavis.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7d8cd0ebcc75..9a5b6ef16c1c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4968,20 +4968,25 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 		| FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
 
 	/*
-	 * The Intel VMX Instruction Reference lists a bunch of bits that are
-	 * prerequisite to running VMXON, most notably cr4.VMXE must be set to
-	 * 1 (see vmx_is_valid_cr4() for when we allow the guest to set this).
-	 * Otherwise, we should fail with #UD.  But most faulting conditions
-	 * have already been checked by hardware, prior to the VM-exit for
-	 * VMXON.  We do test guest cr4.VMXE because processor CR4 always has
-	 * that bit set to 1 in non-root mode.
+	 * Note, KVM cannot rely on hardware to perform the CR0/CR4 #UD checks
+	 * that have higher priority than VM-Exit (see Intel SDM's pseudocode
+	 * for VMXON), as KVM must load valid CR0/CR4 values into hardware while
+	 * running the guest, i.e. KVM needs to check the _guest_ values.
+	 *
+	 * Rely on hardware for the other two pre-VM-Exit checks, !VM86 and
+	 * !COMPATIBILITY modes.  KVM may run the guest in VM86 to emulate Real
+	 * Mode, but KVM will never take the guest out of those modes.
 	 */
-	if (!kvm_read_cr4_bits(vcpu, X86_CR4_VMXE)) {
+	if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) ||
+	    !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) {
 		kvm_queue_exception(vcpu, UD_VECTOR);
 		return 1;
 	}
 
-	/* CPL=0 must be checked manually. */
+	/*
+	 * CPL=0 and all other checks that are lower priority than VM-Exit must
+	 * be checked manually.
+	 */
 	if (vmx_get_cpl(vcpu)) {
 		kvm_inject_gp(vcpu, 0);
 		return 1;
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 04/15] KVM: nVMX: Rename handle_vm{on,off}() to handle_vmx{on,off}()
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (2 preceding siblings ...)
  2022-06-07 21:35 ` [PATCH v5 03/15] KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4 Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value Sean Christopherson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Rename the exit handlers for VMXON and VMXOFF to match the instruction
names, the terms "vmon" and "vmoff" are not used anywhere in Intel's
documentation, nor are they used elsehwere in KVM.

Sadly, the exit reasons are exposed to userspace and so cannot be renamed
without breaking userspace. :-(

Fixes: ec378aeef9df ("KVM: nVMX: Implement VMXON and VMXOFF")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9a5b6ef16c1c..00c7b00c017a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4958,7 +4958,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
 }
 
 /* Emulate the VMXON instruction. */
-static int handle_vmon(struct kvm_vcpu *vcpu)
+static int handle_vmxon(struct kvm_vcpu *vcpu)
 {
 	int ret;
 	gpa_t vmptr;
@@ -5055,7 +5055,7 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
 }
 
 /* Emulate the VMXOFF instruction */
-static int handle_vmoff(struct kvm_vcpu *vcpu)
+static int handle_vmxoff(struct kvm_vcpu *vcpu)
 {
 	if (!nested_vmx_check_permission(vcpu))
 		return 1;
@@ -6832,8 +6832,8 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
 	exit_handlers[EXIT_REASON_VMREAD]	= handle_vmread;
 	exit_handlers[EXIT_REASON_VMRESUME]	= handle_vmresume;
 	exit_handlers[EXIT_REASON_VMWRITE]	= handle_vmwrite;
-	exit_handlers[EXIT_REASON_VMOFF]	= handle_vmoff;
-	exit_handlers[EXIT_REASON_VMON]		= handle_vmon;
+	exit_handlers[EXIT_REASON_VMOFF]	= handle_vmxoff;
+	exit_handlers[EXIT_REASON_VMON]		= handle_vmxon;
 	exit_handlers[EXIT_REASON_INVEPT]	= handle_invept;
 	exit_handlers[EXIT_REASON_INVVPID]	= handle_invvpid;
 	exit_handlers[EXIT_REASON_VMFUNC]	= handle_vmfunc;
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (3 preceding siblings ...)
  2022-06-07 21:35 ` [PATCH v5 04/15] KVM: nVMX: Rename handle_vm{on,off}() to handle_vmx{on,off}() Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-10-31 16:39   ` Yu Zhang
  2022-06-07 21:35 ` [PATCH v5 06/15] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write Sean Christopherson
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Restrict the nVMX MSRs based on KVM's config, not based on the guest's
current config.  Using the guest's config to audit the new config
prevents userspace from restoring the original config (KVM's config) if
at any point in the past the guest's config was restricted in any way.

Fixes: 62cc6b9dc61e ("KVM: nVMX: support restore of VMX capability MSRs")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 100 ++++++++++++++++++++------------------
 1 file changed, 52 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 00c7b00c017a..fca30e79b3a0 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1223,7 +1223,7 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 		BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
 		/* reserved */
 		BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
-	u64 vmx_basic = vmx->nested.msrs.basic;
+	u64 vmx_basic = vmcs_config.nested.basic;
 
 	if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
 		return -EINVAL;
@@ -1246,36 +1246,42 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 	return 0;
 }
 
+static void vmx_get_control_msr(struct nested_vmx_msrs *msrs, u32 msr_index,
+				u32 **low, u32 **high)
+{
+	switch (msr_index) {
+	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
+		*low = &msrs->pinbased_ctls_low;
+		*high = &msrs->pinbased_ctls_high;
+		break;
+	case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+		*low = &msrs->procbased_ctls_low;
+		*high = &msrs->procbased_ctls_high;
+		break;
+	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
+		*low = &msrs->exit_ctls_low;
+		*high = &msrs->exit_ctls_high;
+		break;
+	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
+		*low = &msrs->entry_ctls_low;
+		*high = &msrs->entry_ctls_high;
+		break;
+	case MSR_IA32_VMX_PROCBASED_CTLS2:
+		*low = &msrs->secondary_ctls_low;
+		*high = &msrs->secondary_ctls_high;
+		break;
+	default:
+		BUG();
+	}
+}
+
 static int
 vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 {
-	u64 supported;
 	u32 *lowp, *highp;
+	u64 supported;
 
-	switch (msr_index) {
-	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-		lowp = &vmx->nested.msrs.pinbased_ctls_low;
-		highp = &vmx->nested.msrs.pinbased_ctls_high;
-		break;
-	case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-		lowp = &vmx->nested.msrs.procbased_ctls_low;
-		highp = &vmx->nested.msrs.procbased_ctls_high;
-		break;
-	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
-		lowp = &vmx->nested.msrs.exit_ctls_low;
-		highp = &vmx->nested.msrs.exit_ctls_high;
-		break;
-	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
-		lowp = &vmx->nested.msrs.entry_ctls_low;
-		highp = &vmx->nested.msrs.entry_ctls_high;
-		break;
-	case MSR_IA32_VMX_PROCBASED_CTLS2:
-		lowp = &vmx->nested.msrs.secondary_ctls_low;
-		highp = &vmx->nested.msrs.secondary_ctls_high;
-		break;
-	default:
-		BUG();
-	}
+	vmx_get_control_msr(&vmcs_config.nested, msr_index, &lowp, &highp);
 
 	supported = vmx_control_msr(*lowp, *highp);
 
@@ -1287,6 +1293,7 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 	if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
 		return -EINVAL;
 
+	vmx_get_control_msr(&vmx->nested.msrs, msr_index, &lowp, &highp);
 	*lowp = data;
 	*highp = data >> 32;
 	return 0;
@@ -1300,10 +1307,8 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
 		BIT_ULL(28) | BIT_ULL(29) | BIT_ULL(30) |
 		/* reserved */
 		GENMASK_ULL(13, 9) | BIT_ULL(31);
-	u64 vmx_misc;
-
-	vmx_misc = vmx_control_msr(vmx->nested.msrs.misc_low,
-				   vmx->nested.msrs.misc_high);
+	u64 vmx_misc = vmx_control_msr(vmcs_config.nested.misc_low,
+				       vmcs_config.nested.misc_high);
 
 	if (!is_bitwise_subset(vmx_misc, data, feature_and_reserved_bits))
 		return -EINVAL;
@@ -1331,10 +1336,8 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
 
 static int vmx_restore_vmx_ept_vpid_cap(struct vcpu_vmx *vmx, u64 data)
 {
-	u64 vmx_ept_vpid_cap;
-
-	vmx_ept_vpid_cap = vmx_control_msr(vmx->nested.msrs.ept_caps,
-					   vmx->nested.msrs.vpid_caps);
+	u64 vmx_ept_vpid_cap = vmx_control_msr(vmcs_config.nested.ept_caps,
+					       vmcs_config.nested.vpid_caps);
 
 	/* Every bit is either reserved or a feature bit. */
 	if (!is_bitwise_subset(vmx_ept_vpid_cap, data, -1ULL))
@@ -1345,20 +1348,21 @@ static int vmx_restore_vmx_ept_vpid_cap(struct vcpu_vmx *vmx, u64 data)
 	return 0;
 }
 
+static u64 *vmx_get_fixed0_msr(struct nested_vmx_msrs *msrs, u32 msr_index)
+{
+	switch (msr_index) {
+	case MSR_IA32_VMX_CR0_FIXED0:
+		return &msrs->cr0_fixed0;
+	case MSR_IA32_VMX_CR4_FIXED0:
+		return &msrs->cr4_fixed0;
+	default:
+		BUG();
+	}
+}
+
 static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 {
-	u64 *msr;
-
-	switch (msr_index) {
-	case MSR_IA32_VMX_CR0_FIXED0:
-		msr = &vmx->nested.msrs.cr0_fixed0;
-		break;
-	case MSR_IA32_VMX_CR4_FIXED0:
-		msr = &vmx->nested.msrs.cr4_fixed0;
-		break;
-	default:
-		BUG();
-	}
+	const u64 *msr = vmx_get_fixed0_msr(&vmcs_config.nested, msr_index);
 
 	/*
 	 * 1 bits (which indicates bits which "must-be-1" during VMX operation)
@@ -1367,7 +1371,7 @@ static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 	if (!is_bitwise_subset(data, *msr, -1ULL))
 		return -EINVAL;
 
-	*msr = data;
+	*vmx_get_fixed0_msr(&vmx->nested.msrs, msr_index) = data;
 	return 0;
 }
 
@@ -1428,7 +1432,7 @@ int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 		vmx->nested.msrs.vmcs_enum = data;
 		return 0;
 	case MSR_IA32_VMX_VMFUNC:
-		if (data & ~vmx->nested.msrs.vmfunc_controls)
+		if (data & ~vmcs_config.nested.vmfunc_controls)
 			return -EINVAL;
 		vmx->nested.msrs.vmfunc_controls = data;
 		return 0;
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 06/15] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (4 preceding siblings ...)
  2022-06-07 21:35 ` [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-07-22  9:06   ` Paolo Bonzini
  2022-06-07 21:35 ` [PATCH v5 07/15] KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL Sean Christopherson
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

From: Oliver Upton <oupton@google.com>

Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls
when guest MPX disabled"), KVM has taken ownership of the "load
IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI
is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS
MSRs if the guest's CPUID supports MPX, and clear otherwise.

However, commit aedbaf4f6afd ("KVM: x86: Extract
kvm_update_cpuid_runtime() from kvm_update_cpuid()") partially broke KVM
ownership of the aforementioned bits. Before, kvm_update_cpuid() was
exercised frequently when running a guest and constantly applied its own
changes to the BNDCFGS bits. Now, the BNDCFGS bits are only ever
updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a
subsequent MSR write from userspace will clobber these values.

Uphold the old ABI by reapplying KVM's tweaks to the BNDCFGS bits after
an MSR write from userspace.

Note, the old ABI that is being preserved is a KVM hack to workaround a
userspace bug; see commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX
controls when guest MPX disabled").  VMCS controls are not tied to CPUID,
i.e. KVM should not be mucking with unrelated things.  The argument that
it's KVM's responsibility to propagate CPUID state to VMX MSRs doesn't
hold water, as the MPX shenanigans are an exception, not the norm, e.g.
KVM doesn't perform the following adjustments (and this is but a subset
of all possible tweaks):

  X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE,
                    VMX_MISC_SAVE_EFER_LMA

  X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING,
                     SECONDARY_EXEC_TSC_SCALING

  X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID

  X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING

  X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA,
                          VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL

  X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES

Fixes: aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()")
Signed-off-by: Oliver Upton <oupton@google.com>
[sean: explicitly document the original KVM hack]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 9 +++++++++
 arch/x86/kvm/vmx/vmx.c    | 2 +-
 arch/x86/kvm/vmx/vmx.h    | 2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fca30e79b3a0..d1c21d387716 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1296,6 +1296,15 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 	vmx_get_control_msr(&vmx->nested.msrs, msr_index, &lowp, &highp);
 	*lowp = data;
 	*highp = data >> 32;
+
+	/*
+	 * To preserve an old, kludgy ABI, ensure KVM fiddling with the "true"
+	 * entry/exit controls MSRs is preserved after userspace modifications.
+	 */
+	if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
+	    msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS)
+		nested_vmx_entry_exit_ctls_update(&vmx->vcpu);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 57df799ffa29..3f1671d7cbe4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7398,7 +7398,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 #undef cr4_fixed1_update
 }
 
-static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
+void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 71bcb486e73f..576fed7e33de 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -425,6 +425,8 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
 
 void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
 
+void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu);
+
 /*
  * Note, early Intel manuals have the write-low and read-high bitmap offsets
  * the wrong way round.  The bitmaps control MSRs 0x00000000-0x00001fff and
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 07/15] KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (5 preceding siblings ...)
  2022-06-07 21:35 ` [PATCH v5 06/15] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 08/15] KVM: nVMX: Keep KVM updates to PERF_GLOBAL_CTRL ctrl bits across MSR write Sean Christopherson
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Add a helper to check of the guest PMU has PERF_GLOBAL_CTRL, which is
unintuitive _and_ diverges from Intel's architecturally defined behavior.
Even worse, KVM currently implements the check using two different, but
equivalent checksand , _and_ there has been at least one attempt to add a
_third_ flavor.

Link: https://lore.kernel.org/all/Yk4ugOETeo%2FqDRbW@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/pmu_intel.c |  4 ++--
 arch/x86/kvm/vmx/vmx.h       | 12 ++++++++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 5b85320fc9f1..6ce3b066f7d9 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -111,7 +111,7 @@ static bool intel_pmc_is_enabled(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 
-	if (pmu->version < 2)
+	if (!intel_pmu_has_perf_global_ctrl(pmu))
 		return true;
 
 	return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
@@ -208,7 +208,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr, bool host_initiat
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		if (host_initiated)
 			return true;
-		return pmu->version > 1;
+		return intel_pmu_has_perf_global_ctrl(pmu);
 		break;
 	case MSR_IA32_PEBS_ENABLE:
 		if (host_initiated)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 576fed7e33de..215f17eb6732 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -91,6 +91,18 @@ union vmx_exit_reason {
 	u32 full;
 };
 
+static inline bool intel_pmu_has_perf_global_ctrl(struct kvm_pmu *pmu)
+{
+	/*
+	 * Architecturally, Intel's SDM states that IA32_PERF_GLOBAL_CTRL is
+	 * supported if "CPUID.0AH: EAX[7:0] > 0", i.e. if the PMU version is
+	 * greater than zero.  However, KVM only exposes and emulates the MSR
+	 * to/for the guest if the guest PMU supports at least "Architectural
+	 * Performance Monitoring Version 2".
+	 */
+	return pmu->version > 1;
+}
+
 #define vcpu_to_lbr_desc(vcpu) (&to_vmx(vcpu)->lbr_desc)
 #define vcpu_to_lbr_records(vcpu) (&to_vmx(vcpu)->lbr_desc.records)
 
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 08/15] KVM: nVMX: Keep KVM updates to PERF_GLOBAL_CTRL ctrl bits across MSR write
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (6 preceding siblings ...)
  2022-06-07 21:35 ` [PATCH v5 07/15] KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 09/15] KVM: nVMX: Drop nested_vmx_pmu_refresh() Sean Christopherson
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

From: Oliver Upton <oupton@google.com>

Since commit 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
VM-{Entry,Exit} control"), KVM has taken ownership of the "load
IA32_PERF_GLOBAL_CTRL" VMX entry/exit control bits. The ABI is that
these bits will be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if
the guest's CPUID exposes a vPMU that supports the IA32_PERF_GLOBAL_CTRL
MSR (CPUID.0AH:EAX[7:0] > 1), and clear otherwise.

However, commit aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime()
from kvm_update_cpuid()") partially broke KVM ownership of the
aforementioned bits. Before, kvm_update_cpuid() was exercised frequently
when running a guest and constantly applied its own changes to the "load
IA32_PERF_GLOBAL_CTRL" bits. Now, the "load IA32_PERF_GLOBAL_CTRL" bits
are only ever updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning
that a subsequent MSR write from userspace will clobber these values.

Note that older kernels without commit c44d9b34701d ("KVM: x86: Invoke
vendor's vcpu_after_set_cpuid() after all common updates") still require
that the entry/exit controls be updated from kvm_pmu_refresh(). Leave
the benign call in place to allow for cleaner backporting and punt the
cleanup to a later change.

Uphold the old ABI by reapplying KVM's tweaks to the "load
IA32_PERF_GLOBAL_CTRL" bits after an MSR write from userspace.

Note, the old ABI that is being preserved is misguided KVM behavior that
was introduced by commit 03a8871add95 ("KVM: nVMX: Expose load
IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control").  KVM's bogus tweaking of
VMX MSRs was first implemented by commit 5f76f6f5ff96 ("KVM: nVMX: Do not
expose MPX VMX controls when guest MPX disabled") to hack around a QEMU
bug, and that bad behavior was unfortunately applied to PERF_GLOBAL_CTRL
before it could be stamped out.

Fixes: aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
[sean: explicitly document the original KVM hack, set bits iff CPU
       supports the control]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3f1671d7cbe4..73ec4746a4e6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7413,6 +7413,20 @@ void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
 			vmx->nested.msrs.exit_ctls_high &= ~VM_EXIT_CLEAR_BNDCFGS;
 		}
 	}
+
+	/*
+	 * KVM supports a 1-setting of the "load IA32_PERF_GLOBAL_CTRL"
+	 * VM-{Entry,Exit} controls if the vPMU supports IA32_PERF_GLOBAL_CTRL.
+	 */
+	if (cpu_has_load_perf_global_ctrl()) {
+		if (intel_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu))) {
+			vmx->nested.msrs.entry_ctls_high |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+			vmx->nested.msrs.exit_ctls_high |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+		} else {
+			vmx->nested.msrs.entry_ctls_high &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+			vmx->nested.msrs.exit_ctls_high &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+		}
+	}
 }
 
 static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 09/15] KVM: nVMX: Drop nested_vmx_pmu_refresh()
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (7 preceding siblings ...)
  2022-06-07 21:35 ` [PATCH v5 08/15] KVM: nVMX: Keep KVM updates to PERF_GLOBAL_CTRL ctrl bits across MSR write Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-06-07 21:35 ` [PATCH v5 10/15] KVM: nVMX: Add a quirk for KVM tweaks to VMX MSRs Sean Christopherson
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

From: Oliver Upton <oupton@google.com>

nested_vmx_pmu_refresh() is now unneeded, as the call to
nested_vmx_entry_exit_ctls_update() in vmx_vcpu_after_set_cpuid()
guarantees that the VM-{Entry,Exit} control MSR changes are applied
after setting CPUID. Drop all vestiges of nested_vmx_pmu_refresh().

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c    | 22 ----------------------
 arch/x86/kvm/vmx/nested.h    |  2 --
 arch/x86/kvm/vmx/pmu_intel.c |  3 ---
 3 files changed, 27 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d1c21d387716..4ba0e5540908 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4847,28 +4847,6 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
 	return 0;
 }
 
-void nested_vmx_pmu_refresh(struct kvm_vcpu *vcpu,
-			    bool vcpu_has_perf_global_ctrl)
-{
-	struct vcpu_vmx *vmx;
-
-	if (!nested_vmx_allowed(vcpu))
-		return;
-
-	vmx = to_vmx(vcpu);
-	if (vcpu_has_perf_global_ctrl) {
-		vmx->nested.msrs.entry_ctls_high |=
-				VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
-		vmx->nested.msrs.exit_ctls_high |=
-				VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
-	} else {
-		vmx->nested.msrs.entry_ctls_high &=
-				~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
-		vmx->nested.msrs.exit_ctls_high &=
-				~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
-	}
-}
-
 static int nested_vmx_get_vmptr(struct kvm_vcpu *vcpu, gpa_t *vmpointer,
 				int *ret)
 {
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 129ae4e01f7c..88b00a7359e4 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -32,8 +32,6 @@ int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
 int vmx_get_vmx_msr(struct nested_vmx_msrs *msrs, u32 msr_index, u64 *pdata);
 int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
 			u32 vmx_instruction_info, bool wr, int len, gva_t *ret);
-void nested_vmx_pmu_refresh(struct kvm_vcpu *vcpu,
-			    bool vcpu_has_perf_global_ctrl);
 void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu);
 bool nested_vmx_check_io_bitmaps(struct kvm_vcpu *vcpu, unsigned int port,
 				 int size);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 6ce3b066f7d9..515ab6594333 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -597,9 +597,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	bitmap_set(pmu->all_valid_pmc_idx,
 		INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
 
-	nested_vmx_pmu_refresh(vcpu,
-			       intel_is_valid_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, false));
-
 	if (cpuid_model_is_consistent(vcpu))
 		x86_perf_get_lbr(&lbr_desc->records);
 	else
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 10/15] KVM: nVMX: Add a quirk for KVM tweaks to VMX MSRs
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (8 preceding siblings ...)
  2022-06-07 21:35 ` [PATCH v5 09/15] KVM: nVMX: Drop nested_vmx_pmu_refresh() Sean Christopherson
@ 2022-06-07 21:35 ` Sean Christopherson
  2022-06-07 21:36 ` [PATCH v5 11/15] KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP Sean Christopherson
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

From: Oliver Upton <oupton@google.com>

Quirk KVM's misguided behavior of manipulating VMX MSRs based on guest
CPUID state.  There is no requirement, at all, that a CPU support
virtualizing a feature if said feature is supported in bare metal.  I.e.
the VMX MSRs exist independent of CPUID for a reason.

One could argue that disabling features, as KVM does for the entry/exit
controls for the IA32_PERF_GLOBAL_CTRL and IA32_BNDCFGS MSRs, is correct
as such a configuration is contradictory, but KVM's policy is to let
userspace have full control of the guest vCPU model so long as the host
kernel is not at risk.  Furthermore, mucking with the VMX MSRs creates a
subtle, difficult to maintain ABI as KVM must ensure that any internal
changes, e.g. to how KVM handles _any_ guest CPUID changes, yield the
same functional result.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst  | 21 +++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  5 +++--
 arch/x86/kvm/vmx/vmx.c          |  3 ++-
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 42a1984fafc8..1095692ddab7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7374,6 +7374,27 @@ The valid bits in cap.args[0] are:
                                     hypercall instructions. Executing the
                                     incorrect hypercall instruction will
                                     generate a #UD within the guest.
+
+ KVM_X86_QUIRK_TWEAK_VMX_MSRS       By default, during a guest CPUID update,
+                                    KVM adjusts the values of select VMX MSRs
+                                    (usually based on guest CPUID):
+
+                                    - If CPUID.07H:EBX[bit 14] (MPX) is set, KVM
+                                      sets IA32_VMX_TRUE_ENTRY_CTLS[bit 48]
+                                      ('load IA32_BNDCFGS') and
+                                      IA32_VMX_TRUE_EXIT_CTLS[bit 55]
+                                      ('clear IA32_BNDCFGS'). Otherwise, these
+                                      corresponding MSR bits are cleared.
+                                    - If CPUID.0AH:EAX[bits 7:0] > 1, KVM sets
+                                      IA32_VMX_TRUE_ENTRY_CTLS[bit 45]
+                                      ('load IA32_PERF_GLOBAL_CTRL') and
+                                      IA32_VMX_TRUE_EXIT_CTLS[bit 44]
+                                      ('load IA32_PERF_GLOBAL_CTRL'). Otherwise,
+                                      these corresponding MSR bits are cleared.
+
+                                    When this quirk is disabled, KVM will not
+                                    change the values of the aformentioned VMX
+                                    MSRs during guest CPUID updates.
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6cf5d77d7896..a783c82fb902 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2011,6 +2011,7 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 	 KVM_X86_QUIRK_LAPIC_MMIO_HOLE |	\
 	 KVM_X86_QUIRK_OUT_7E_INC_RIP |		\
 	 KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT |	\
-	 KVM_X86_QUIRK_FIX_HYPERCALL_INSN)
+	 KVM_X86_QUIRK_FIX_HYPERCALL_INSN |	\
+	 KVM_X86_QUIRK_TWEAK_VMX_MSRS)
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 24c807c8d5f7..0705178bd93d 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -438,6 +438,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_OUT_7E_INC_RIP		(1 << 3)
 #define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT	(1 << 4)
 #define KVM_X86_QUIRK_FIX_HYPERCALL_INSN	(1 << 5)
+#define KVM_X86_QUIRK_TWEAK_VMX_MSRS		(1 << 6)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4ba0e5540908..dc2f9b06b99a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1301,8 +1301,9 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 	 * To preserve an old, kludgy ABI, ensure KVM fiddling with the "true"
 	 * entry/exit controls MSRs is preserved after userspace modifications.
 	 */
-	if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
-	    msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS)
+	if ((msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
+	     msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS) &&
+	    kvm_check_has_quirk(vmx->vcpu.kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS))
 		nested_vmx_entry_exit_ctls_update(&vmx->vcpu);
 
 	return 0;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 73ec4746a4e6..4c31c8f24329 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7522,7 +7522,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	if (nested_vmx_allowed(vcpu)) {
 		nested_vmx_cr_fixed1_bits_update(vcpu);
-		nested_vmx_entry_exit_ctls_update(vcpu);
+		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS))
+			nested_vmx_entry_exit_ctls_update(vcpu);
 	}
 
 	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 11/15] KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (9 preceding siblings ...)
  2022-06-07 21:35 ` [PATCH v5 10/15] KVM: nVMX: Add a quirk for KVM tweaks to VMX MSRs Sean Christopherson
@ 2022-06-07 21:36 ` Sean Christopherson
  2022-07-22  9:49   ` Paolo Bonzini
  2022-06-07 21:36 ` [PATCH v5 12/15] KVM: nVMX: Extend VMX MSRs quirk to CR0/4 fixed1 bits Sean Christopherson
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Make UMIP an "allowed-1" bit CR4_FIXED1 MSR when KVM is emulating UMIP.
KVM emulates UMIP for both L1 and L2, and so should enumerate that L2 is
allowed to have CR4.UMIP=1.  Not setting the bit doesn't immediately
break nVMX, as KVM does set/clear the bit in CR4_FIXED1 in response to a
guest CPUID update, i.e. KVM will correctly (dis)allow nested VM-Entry
based on whether or not UMIP is exposed to L1.

That said, KVM should enumerate the bit as being allowed from time zero,
e.g. userspace will see the wrong value if the MSR is read before CPUID
is written.  And a future patch will quirk KVM's behavior of stuffing
CR4_FIXED1 in response to guest CPUID changes, as CR4_FIXED1 is not
strictly required to match the CPUID model exposed to L1.

Fixes: 0367f205a3b7 ("KVM: vmx: add support for emulating UMIP")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index dc2f9b06b99a..5533c2474128 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6781,6 +6781,9 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	rdmsrl(MSR_IA32_VMX_CR0_FIXED1, msrs->cr0_fixed1);
 	rdmsrl(MSR_IA32_VMX_CR4_FIXED1, msrs->cr4_fixed1);
 
+	if (vmx_umip_emulated())
+		msrs->cr4_fixed1 |= X86_CR4_UMIP;
+
 	msrs->vmcs_enum = nested_vmx_calc_vmcs_enum_msr();
 }
 
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 12/15] KVM: nVMX: Extend VMX MSRs quirk to CR0/4 fixed1 bits
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (10 preceding siblings ...)
  2022-06-07 21:36 ` [PATCH v5 11/15] KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP Sean Christopherson
@ 2022-06-07 21:36 ` Sean Christopherson
  2022-07-22  9:50   ` Paolo Bonzini
  2022-06-07 21:36 ` [PATCH v5 13/15] KVM: selftests: Add test to verify KVM's VMX MSRs quirk for controls Sean Christopherson
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Extend the VMX MSRs quirk to the CR0/4_FIXED1 MSRs, i.e. when the quirk
is disabled, allow userspace to set the MSRs and do not rewrite the MSRs
during CPUID updates.  The bits that the guest (L2 in this case) is
allowed to set are not directly tied to CPUID.  Enumerating to L1 that it
can set reserved CR0/4 bits is nonsensical and will ultimately result in
a failed nested VM-Entry (KVM enforces guest reserved CR4 bits on top of
the VMX MSRs), but KVM typically doesn't police the vCPU model except
when doing so is necessary to protect the host kernel.

Further restricting CR4 bits is however a reasonable thing to do, e.g. to
work around a bug in nested virtualization, in which case exposing a
feature to L1 is ok, but letting L2 use the feature is not.  Of course,
whether or not the L1 hypervisor will actually _check_ the FIXED1 bits is
another matter entirely, e.g. KVM currently assumes all bits that can be
set in the host can also be set in the guest.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst |  8 ++++++++
 arch/x86/kvm/vmx/nested.c      | 33 ++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/vmx.c         |  6 +++---
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1095692ddab7..88d1bbae031e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7391,6 +7391,14 @@ The valid bits in cap.args[0] are:
                                       IA32_VMX_TRUE_EXIT_CTLS[bit 44]
                                       ('load IA32_PERF_GLOBAL_CTRL'). Otherwise,
                                       these corresponding MSR bits are cleared.
+                                    - MSR_IA32_VMX_CR0_FIXED1 is unconditionally
+                                      set to 0xffffffff
+                                    - CR4.PCE is unconditionally set in
+                                      MSR_IA32_VMX_CR4_FIXED1.
+                                    - All CR4 bits with an associated CPUID
+                                      feature flag are set in
+                                      MSR_IA32_VMX_CR4_FIXED1 if the feature is
+                                      reported as supported in guest CPUID.
 
                                     When this quirk is disabled, KVM will not
                                     change the values of the aformentioned VMX
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 5533c2474128..abce74cfefc9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1385,6 +1385,30 @@ static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 	return 0;
 }
 
+static u64 *vmx_get_fixed1_msr(struct nested_vmx_msrs *msrs, u32 msr_index)
+{
+	switch (msr_index) {
+	case MSR_IA32_VMX_CR0_FIXED1:
+		return &msrs->cr0_fixed1;
+	case MSR_IA32_VMX_CR4_FIXED1:
+		return &msrs->cr4_fixed1;
+	default:
+		BUG();
+	}
+}
+
+static int vmx_restore_fixed1_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+{
+	const u64 *msr = vmx_get_fixed1_msr(&vmcs_config.nested, msr_index);
+
+	/* Bits that "must-be-0" must not be set in the restored value. */
+	if (!is_bitwise_subset(*msr, data, -1ULL))
+		return -EINVAL;
+
+	*vmx_get_fixed1_msr(&vmx->nested.msrs, msr_index) = data;
+	return 0;
+}
+
 /*
  * Called when userspace is restoring VMX MSRs.
  *
@@ -1432,10 +1456,13 @@ int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 	case MSR_IA32_VMX_CR0_FIXED1:
 	case MSR_IA32_VMX_CR4_FIXED1:
 		/*
-		 * These MSRs are generated based on the vCPU's CPUID, so we
-		 * do not support restoring them directly.
+		 * These MSRs are generated based on the vCPU's CPUID when KVM
+		 * "owns" the VMX MSRs, do not allow restoring them directly.
 		 */
-		return -EINVAL;
+		if (kvm_check_has_quirk(vmx->vcpu.kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS))
+			return -EINVAL;
+
+		return vmx_restore_fixed1_msr(vmx, msr_index, data);
 	case MSR_IA32_VMX_EPT_VPID_CAP:
 		return vmx_restore_vmx_ept_vpid_cap(vmx, data);
 	case MSR_IA32_VMX_VMCS_ENUM:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4c31c8f24329..139f365ca6bb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7520,10 +7520,10 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 			~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
 			  FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);
 
-	if (nested_vmx_allowed(vcpu)) {
+	if (nested_vmx_allowed(vcpu) &&
+	    kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS)) {
 		nested_vmx_cr_fixed1_bits_update(vcpu);
-		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS))
-			nested_vmx_entry_exit_ctls_update(vcpu);
+		nested_vmx_entry_exit_ctls_update(vcpu);
 	}
 
 	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 13/15] KVM: selftests: Add test to verify KVM's VMX MSRs quirk for controls
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (11 preceding siblings ...)
  2022-06-07 21:36 ` [PATCH v5 12/15] KVM: nVMX: Extend VMX MSRs quirk to CR0/4 fixed1 bits Sean Christopherson
@ 2022-06-07 21:36 ` Sean Christopherson
  2022-06-07 21:36 ` [PATCH v5 14/15] KVM: selftests: Extend VMX MSRs test to cover CR4_FIXED1 (and its quirks) Sean Christopherson
  2022-06-07 21:36 ` [PATCH v5 15/15] KVM: selftests: Verify VMX MSRs can be restored to KVM-supported values Sean Christopherson
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Add a test to verify KVM's established ABI with respect to the entry/exit
controls for PERF_GLOBAL_CTRL and BNDCFGS.  KVM has a quirk where KVM
updates the VMX "true" entry/exit control MSRs to force PERF_GLOBAL_CTRL
and BNDCFGS to follow the guest CPUID model, i.e. set when supported,
clear when not, even though the MSR values are not strictly associated
with CPUID.  Note, KVM's ABI is that its modifications to the MSRs are
preserved even when userspace explicitly writes the MSRs.

Verify that KVM correctly tweaks the MSRs When the quirk is enabled (the
default behavior), and does not touch them when the quirk is disabled.

Suggested-by: Oliver Upton <oupton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/include/x86_64/vmx.h        |   2 +
 .../selftests/kvm/x86_64/vmx_msrs_test.c      | 161 ++++++++++++++++++
 5 files changed, 166 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 0ab0e255d292..5893804b5196 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -47,6 +47,7 @@
 /x86_64/vmx_dirty_log_test
 /x86_64/vmx_exception_with_invalid_guest_state
 /x86_64/vmx_invalid_nested_guest_state
+/x86_64/vmx_msrs_test
 /x86_64/vmx_preemption_timer_test
 /x86_64/vmx_set_nested_state_test
 /x86_64/vmx_tsc_adjust_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 9a256c1f1bdf..2ee2dc55c100 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -74,6 +74,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_apic_access_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_exception_with_invalid_guest_state
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 3fd3d58148c2..51cab9b080f7 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -98,6 +98,7 @@ struct kvm_x86_cpu_feature {
 #define	X86_FEATURE_SMEP	        KVM_X86_CPU_FEATURE(0x7, 0, EBX, 7)
 #define	X86_FEATURE_INVPCID		KVM_X86_CPU_FEATURE(0x7, 0, EBX, 10)
 #define	X86_FEATURE_RTM			KVM_X86_CPU_FEATURE(0x7, 0, EBX, 11)
+#define	X86_FEATURE_MPX			KVM_X86_CPU_FEATURE(0x7, 0, EBX, 14)
 #define	X86_FEATURE_SMAP		KVM_X86_CPU_FEATURE(0x7, 0, EBX, 20)
 #define	X86_FEATURE_PCOMMIT		KVM_X86_CPU_FEATURE(0x7, 0, EBX, 22)
 #define	X86_FEATURE_CLFLUSHOPT		KVM_X86_CPU_FEATURE(0x7, 0, EBX, 23)
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index fe0ebb790b49..5a6002b34d2b 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -80,6 +80,7 @@
 #define VM_EXIT_SAVE_IA32_EFER			0x00100000
 #define VM_EXIT_LOAD_IA32_EFER			0x00200000
 #define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER	0x00400000
+#define VM_EXIT_CLEAR_BNDCFGS			0x00800000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -90,6 +91,7 @@
 #define VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL	0x00002000
 #define VM_ENTRY_LOAD_IA32_PAT			0x00004000
 #define VM_ENTRY_LOAD_IA32_EFER			0x00008000
+#define VM_ENTRY_LOAD_BNDCFGS			0x00010000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c b/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c
new file mode 100644
index 000000000000..9be2c2e3acf1
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VMX control MSR test
+ *
+ * Copyright (C) 2022 Google LLC.
+ *
+ * Tests for KVM ownership of bits in the VMX entry/exit control MSRs. Checks
+ * that KVM will set owned bits where appropriate, and will not if
+ * KVM_X86_QUIRK_TWEAK_VMX_CTRL_MSRS is disabled.
+ */
+
+#include "kvm_util.h"
+#include "vmx.h"
+
+#define SUBTEST_REQUIRE(f)					\
+	if (!(f)) {						\
+		print_skip("Requirement not met: %s", #f);	\
+		return;						\
+	}
+
+static bool vmx_has_ctrl(struct kvm_vcpu *vcpu, uint32_t msr, uint32_t ctrl_mask)
+{
+	return (vcpu_get_msr(vcpu, msr) >> 32) & ctrl_mask;
+}
+
+static void test_vmx_ctrl_msr(struct kvm_vcpu *vcpu,
+			      uint32_t msr, uint64_t ctrl_mask,
+			      bool quirk_enabled, bool feature_enabled)
+{
+	uint64_t ctrl_allowed1 = ctrl_mask << 32;
+	uint64_t val = vcpu_get_msr(vcpu, msr);
+
+	/*
+	 * If the quirk is enabled, KVM should have modified the MSR when the
+	 * guest's CPUID was set.  Don't assert anything when the quirk is
+	 * disabled, the value of the MSR is not known (it could be made known,
+	 * but it gets messy and the added value is minimal).
+	 */
+	TEST_ASSERT(!quirk_enabled || (!!(val & ctrl_allowed1) == feature_enabled),
+		    "KVM owns the ctrl when the quirk is enabled, want 0x%lx, got 0x%lx",
+		    feature_enabled ? ctrl_allowed1 : 0, val & ctrl_allowed1);
+
+	val |= ctrl_allowed1;
+	vcpu_set_msr(vcpu, msr, val);
+
+	val = vcpu_get_msr(vcpu, msr);
+	if (quirk_enabled)
+		TEST_ASSERT(!!(val & ctrl_allowed1) == feature_enabled,
+			    "KVM owns the ctrl when the quirk is enabled, want 0x%lx, got 0x%lx",
+			    feature_enabled ? ctrl_allowed1 : 0, val & ctrl_allowed1);
+	else
+		TEST_ASSERT(val & ctrl_allowed1,
+			    "KVM shouldn't clear the ctrl when the quirk is disabled");
+
+	val &= ~ctrl_allowed1;
+	vcpu_set_msr(vcpu, msr, val);
+
+	val = vcpu_get_msr(vcpu, msr);
+	if (quirk_enabled)
+		TEST_ASSERT(!!(val & ctrl_allowed1) == feature_enabled,
+			    "KVM owns the ctrl when the quirk is enabled, want 0x%lx, got 0x%lx",
+			    feature_enabled ? ctrl_allowed1 : 0, val & ctrl_allowed1);
+	else
+		TEST_ASSERT(!(val & ctrl_allowed1),
+			    "KVM shouldn't set the ctrl when the quirk is disabled");
+}
+
+static void test_vmx_ctrl_msrs_pair(struct kvm_vcpu *vcpu,
+				    bool quirk_enabled, bool feature_enabled,
+				    uint32_t entry_msr, uint64_t entry_mask,
+				    uint32_t exit_msr, uint64_t exit_mask)
+{
+	test_vmx_ctrl_msr(vcpu, entry_msr, entry_mask, quirk_enabled, feature_enabled);
+	test_vmx_ctrl_msr(vcpu, exit_msr, exit_mask, quirk_enabled, feature_enabled);
+}
+
+static void test_vmx_ctrls(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
+			   uint64_t entry_ctrl, uint64_t exit_ctrl)
+{
+	/*
+	 * KVM's quirky behavior only exists for PERF_GLOBAL_CTRL and BNDCFGS,
+	 * any attempt to extend KVM's quirky behavior must be met with fierce
+	 * resistance!
+	 */
+	TEST_ASSERT(entry_ctrl == VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL ||
+		    entry_ctrl == VM_ENTRY_LOAD_BNDCFGS,
+		    "Don't let KVM expand its quirk beyond PERF_GLOBAL_CTRL and BNDCFSG");
+
+	SUBTEST_REQUIRE(vmx_has_ctrl(vcpu, MSR_IA32_VMX_TRUE_ENTRY_CTLS, entry_ctrl));
+	SUBTEST_REQUIRE(vmx_has_ctrl(vcpu, MSR_IA32_VMX_TRUE_EXIT_CTLS, exit_ctrl));
+
+	/*
+	 * Test that, when the quirk is enabled, KVM sets/clears the VMX MSR
+	 * bits based on whether or not the feature is exposed to the guest.
+	 */
+	test_vmx_ctrl_msrs_pair(vcpu, true, true,
+				MSR_IA32_VMX_TRUE_ENTRY_CTLS, entry_ctrl,
+				MSR_IA32_VMX_TRUE_EXIT_CTLS, exit_ctrl);
+
+	/* Hide the feature in CPUID. */
+	if (entry_ctrl == VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
+		vcpu_clear_cpuid_entry(vcpu, 0xa);
+	else
+		vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_MPX);
+
+	test_vmx_ctrl_msrs_pair(vcpu, true, false,
+				MSR_IA32_VMX_TRUE_ENTRY_CTLS, entry_ctrl,
+				MSR_IA32_VMX_TRUE_EXIT_CTLS, exit_ctrl);
+
+	/*
+	 * Disable the quirk, giving userspace control of the VMX MSRs.  KVM
+	 * should not touch the MSR, i.e. should allow hiding the control when
+	 * a vPMU is supported, and should allow exposing the control when a
+	 * vPMU is not supported.
+	 */
+	vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_TWEAK_VMX_MSRS);
+
+	test_vmx_ctrl_msrs_pair(vcpu, false, false,
+				MSR_IA32_VMX_TRUE_ENTRY_CTLS, entry_ctrl,
+				MSR_IA32_VMX_TRUE_EXIT_CTLS, exit_ctrl);
+
+	/* Restore the full CPUID to expose the feature to the guest. */
+	vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
+	test_vmx_ctrl_msrs_pair(vcpu, false, true,
+				MSR_IA32_VMX_TRUE_ENTRY_CTLS, entry_ctrl,
+				MSR_IA32_VMX_TRUE_EXIT_CTLS, exit_ctrl);
+
+	vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, 0);
+}
+
+static void load_perf_global_ctrl_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+	SUBTEST_REQUIRE(kvm_get_cpuid_max_basic() >= 0xa);
+
+	test_vmx_ctrls(vm, vcpu, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
+		       VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL);
+}
+
+static void load_and_clear_bndcfgs_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+	SUBTEST_REQUIRE(kvm_cpu_has(X86_FEATURE_MPX));
+
+	test_vmx_ctrls(vm, vcpu, VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS);
+}
+
+int main(void)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_DISABLE_QUIRKS2));
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
+
+	/* No need to actually do KVM_RUN, thus no guest code. */
+	vm = vm_create_with_one_vcpu(&vcpu, NULL);
+
+	load_perf_global_ctrl_test(vm, vcpu);
+	load_and_clear_bndcfgs_test(vm, vcpu);
+
+	kvm_vm_free(vm);
+}
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 14/15] KVM: selftests: Extend VMX MSRs test to cover CR4_FIXED1 (and its quirks)
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (12 preceding siblings ...)
  2022-06-07 21:36 ` [PATCH v5 13/15] KVM: selftests: Add test to verify KVM's VMX MSRs quirk for controls Sean Christopherson
@ 2022-06-07 21:36 ` Sean Christopherson
  2022-06-07 21:36 ` [PATCH v5 15/15] KVM: selftests: Verify VMX MSRs can be restored to KVM-supported values Sean Christopherson
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Extend the VMX MSRs test to verify that KVM adheres to its established
quirky ABI for CR4_FIXED1 when VMX MSRS quirk is enabled, and that KV
doesn't touch CR4_FIXED1 when the quirk is disabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/include/x86_64/processor.h  |  7 ++
 .../selftests/kvm/x86_64/vmx_msrs_test.c      | 65 +++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 51cab9b080f7..716e72bc9163 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -87,9 +87,16 @@ struct kvm_x86_cpu_feature {
 #define	X86_FEATURE_XSAVE		KVM_X86_CPU_FEATURE(0x1, 0, ECX, 26)
 #define	X86_FEATURE_OSXSAVE		KVM_X86_CPU_FEATURE(0x1, 0, ECX, 27)
 #define	X86_FEATURE_RDRAND		KVM_X86_CPU_FEATURE(0x1, 0, ECX, 30)
+#define	X86_FEATURE_VME			KVM_X86_CPU_FEATURE(0x1, 0, EDX, 1)
+#define	X86_FEATURE_DE			KVM_X86_CPU_FEATURE(0x1, 0, EDX, 2)
+#define	X86_FEATURE_PSE			KVM_X86_CPU_FEATURE(0x1, 0, EDX, 3)
+#define	X86_FEATURE_TSC			KVM_X86_CPU_FEATURE(0x1, 0, EDX, 4)
+#define	X86_FEATURE_PAE			KVM_X86_CPU_FEATURE(0x1, 0, EDX, 6)
 #define	X86_FEATURE_MCE			KVM_X86_CPU_FEATURE(0x1, 0, EDX, 7)
 #define	X86_FEATURE_APIC		KVM_X86_CPU_FEATURE(0x1, 0, EDX, 9)
+#define	X86_FEATURE_PGE			KVM_X86_CPU_FEATURE(0x1, 0, EDX, 13)
 #define	X86_FEATURE_CLFLUSH		KVM_X86_CPU_FEATURE(0x1, 0, EDX, 19)
+#define	X86_FEATURE_FXSR		KVM_X86_CPU_FEATURE(0x1, 0, EDX, 24)
 #define	X86_FEATURE_XMM			KVM_X86_CPU_FEATURE(0x1, 0, EDX, 25)
 #define	X86_FEATURE_XMM2		KVM_X86_CPU_FEATURE(0x1, 0, EDX, 26)
 #define	X86_FEATURE_FSGSBASE		KVM_X86_CPU_FEATURE(0x7, 0, EBX, 0)
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c b/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c
index 9be2c2e3acf1..c0c4252a6a03 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c
@@ -143,6 +143,70 @@ static void load_and_clear_bndcfgs_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu
 	test_vmx_ctrls(vm, vcpu, VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS);
 }
 
+static void cr4_reserved_bit_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
+				  uint64_t cr4_bit,
+				  struct kvm_x86_cpu_feature feature)
+{
+	uint64_t val;
+	int r;
+
+	if (!kvm_cpu_has(feature))
+		return;
+
+	vcpu_set_cpuid_feature(vcpu, feature);
+	val = vcpu_get_msr(vcpu, MSR_IA32_VMX_CR4_FIXED1);
+	TEST_ASSERT(val & cr4_bit,
+		    "KVM should set CR4 bit when quirk and feature are enabled");
+
+	vcpu_clear_cpuid_feature(vcpu, feature);
+	val = vcpu_get_msr(vcpu, MSR_IA32_VMX_CR4_FIXED1);
+	TEST_ASSERT(!(val & cr4_bit),
+		    "KVM should clear CR4 bit when quirk and feature are enabled");
+
+	r = _vcpu_set_msr(vcpu, MSR_IA32_VMX_CR4_FIXED1, val);
+	TEST_ASSERT(r == 0, "Writing CR4_FIXED1 should fail when quirk is enabled");
+
+	vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_TWEAK_VMX_MSRS);
+
+	val &= ~cr4_bit;
+	vcpu_set_msr(vcpu, MSR_IA32_VMX_CR4_FIXED1, val);
+
+	vcpu_set_cpuid_feature(vcpu, feature);
+	TEST_ASSERT(!(val & cr4_bit),
+		    "KVM shouldn't set CR4 bit when quirk is disabled");
+
+	val |= cr4_bit;
+	vcpu_clear_cpuid_feature(vcpu, feature);
+	TEST_ASSERT(val & cr4_bit,
+		    "KVM shouldn't clear CR4 bit when quirk is disabled");
+
+	vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, 0);
+}
+
+static void cr4_reserved_bits_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_VME,        X86_FEATURE_VME);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_PVI,        X86_FEATURE_VME);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_TSD,        X86_FEATURE_TSC);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_DE,         X86_FEATURE_DE);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_PSE,        X86_FEATURE_PSE);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_PAE,        X86_FEATURE_PAE);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_MCE,        X86_FEATURE_MCE);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_PGE,        X86_FEATURE_PGE);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_OSFXSR,     X86_FEATURE_FXSR);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_OSXMMEXCPT, X86_FEATURE_XMM);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_VMXE,       X86_FEATURE_VMX);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_SMXE,       X86_FEATURE_SMX);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_PCIDE,      X86_FEATURE_PCID);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_OSXSAVE,    X86_FEATURE_XSAVE);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_FSGSBASE,   X86_FEATURE_FSGSBASE);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_SMEP,       X86_FEATURE_SMEP);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_SMAP,       X86_FEATURE_SMAP);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_PKE,        X86_FEATURE_PKU);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_UMIP,       X86_FEATURE_UMIP);
+	cr4_reserved_bit_test(vm, vcpu, X86_CR4_LA57,       X86_FEATURE_LA57);
+}
+
 int main(void)
 {
 	struct kvm_vcpu *vcpu;
@@ -156,6 +220,7 @@ int main(void)
 
 	load_perf_global_ctrl_test(vm, vcpu);
 	load_and_clear_bndcfgs_test(vm, vcpu);
+	cr4_reserved_bits_test(vm, vcpu);
 
 	kvm_vm_free(vm);
 }
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 15/15] KVM: selftests: Verify VMX MSRs can be restored to KVM-supported values
  2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
                   ` (13 preceding siblings ...)
  2022-06-07 21:36 ` [PATCH v5 14/15] KVM: selftests: Extend VMX MSRs test to cover CR4_FIXED1 (and its quirks) Sean Christopherson
@ 2022-06-07 21:36 ` Sean Christopherson
  14 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-06-07 21:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

Verify that KVM allows toggling VMX MSR bits to be "more" restrictive,
and also allows restoring each MSR to KVM's original, less restrictive
value.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/x86_64/vmx_msrs_test.c      | 61 +++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c b/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c
index c0c4252a6a03..99d9614999c9 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_msrs_test.c
@@ -8,6 +8,7 @@
  * that KVM will set owned bits where appropriate, and will not if
  * KVM_X86_QUIRK_TWEAK_VMX_CTRL_MSRS is disabled.
  */
+#include <linux/bitmap.h>
 
 #include "kvm_util.h"
 #include "vmx.h"
@@ -207,6 +208,65 @@ static void cr4_reserved_bits_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
 	cr4_reserved_bit_test(vm, vcpu, X86_CR4_LA57,       X86_FEATURE_LA57);
 }
 
+static void vmx_fixed1_msr_test(struct kvm_vcpu *vcpu, uint32_t msr_index,
+				  uint64_t mask)
+{
+	uint64_t val = vcpu_get_msr(vcpu, msr_index);
+	uint64_t bit;
+
+	mask &= val;
+
+	for_each_set_bit(bit, &mask, 64) {
+		vcpu_set_msr(vcpu, msr_index, val & ~BIT_ULL(bit));
+		vcpu_set_msr(vcpu, msr_index, val);
+	}
+}
+
+static void vmx_fixed0_msr_test(struct kvm_vcpu *vcpu, uint32_t msr_index,
+				uint64_t mask)
+{
+	uint64_t val = vcpu_get_msr(vcpu, msr_index);
+	uint64_t bit;
+
+	mask = ~mask | val;
+
+	for_each_clear_bit(bit, &mask, 64) {
+		vcpu_set_msr(vcpu, msr_index, val | BIT_ULL(bit));
+		vcpu_set_msr(vcpu, msr_index, val);
+	}
+}
+
+static void vmx_fixed0and1_msr_test(struct kvm_vcpu *vcpu, uint32_t msr_index)
+{
+	vmx_fixed0_msr_test(vcpu, msr_index, GENMASK_ULL(31, 0));
+	vmx_fixed1_msr_test(vcpu, msr_index, GENMASK_ULL(63, 32));
+}
+
+static void vmx_save_restore_msrs_test(struct kvm_vcpu *vcpu)
+{
+	vcpu_set_msr(vcpu, MSR_IA32_VMX_VMCS_ENUM, 0);
+	vcpu_set_msr(vcpu, MSR_IA32_VMX_VMCS_ENUM, -1ull);
+
+	vmx_fixed1_msr_test(vcpu, MSR_IA32_VMX_BASIC,
+			    BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55));
+
+	vmx_fixed1_msr_test(vcpu, MSR_IA32_VMX_BASIC,
+			    BIT_ULL(5) | GENMASK_ULL(8, 6) | BIT_ULL(14) |
+			    BIT_ULL(15) | BIT_ULL(28) | BIT_ULL(29) | BIT_ULL(30));
+
+	vmx_fixed0_msr_test(vcpu, MSR_IA32_VMX_CR0_FIXED0, -1ull);
+	vmx_fixed1_msr_test(vcpu, MSR_IA32_VMX_CR0_FIXED1, -1ull);
+	vmx_fixed0_msr_test(vcpu, MSR_IA32_VMX_CR4_FIXED0, -1ull);
+	vmx_fixed1_msr_test(vcpu, MSR_IA32_VMX_CR4_FIXED1, -1ull);
+	vmx_fixed0and1_msr_test(vcpu, MSR_IA32_VMX_PROCBASED_CTLS2);
+	vmx_fixed1_msr_test(vcpu, MSR_IA32_VMX_EPT_VPID_CAP, -1ull);
+	vmx_fixed0and1_msr_test(vcpu, MSR_IA32_VMX_TRUE_PINBASED_CTLS);
+	vmx_fixed0and1_msr_test(vcpu, MSR_IA32_VMX_TRUE_PROCBASED_CTLS);
+	vmx_fixed0and1_msr_test(vcpu, MSR_IA32_VMX_TRUE_EXIT_CTLS);
+	vmx_fixed0and1_msr_test(vcpu, MSR_IA32_VMX_TRUE_ENTRY_CTLS);
+	vmx_fixed1_msr_test(vcpu, MSR_IA32_VMX_VMFUNC, -1ull);
+}
+
 int main(void)
 {
 	struct kvm_vcpu *vcpu;
@@ -221,6 +281,7 @@ int main(void)
 	load_perf_global_ctrl_test(vm, vcpu);
 	load_and_clear_bndcfgs_test(vm, vcpu);
 	cr4_reserved_bits_test(vm, vcpu);
+	vmx_save_restore_msrs_test(vcpu);
 
 	kvm_vm_free(vm);
 }
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 06/15] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write
  2022-06-07 21:35 ` [PATCH v5 06/15] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write Sean Christopherson
@ 2022-07-22  9:06   ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2022-07-22  9:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Eric Li, David Matlack, Oliver Upton

On 6/7/22 23:35, Sean Christopherson wrote:
> From: Oliver Upton <oupton@google.com>
> 
> Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls
> when guest MPX disabled"), KVM has taken ownership of the "load
> IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI
> is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS
> MSRs if the guest's CPUID supports MPX, and clear otherwise.
> 
> However, commit aedbaf4f6afd ("KVM: x86: Extract
> kvm_update_cpuid_runtime() from kvm_update_cpuid()") partially broke KVM
> ownership of the aforementioned bits. Before, kvm_update_cpuid() was
> exercised frequently when running a guest and constantly applied its own
> changes to the BNDCFGS bits. Now, the BNDCFGS bits are only ever
> updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a
> subsequent MSR write from userspace will clobber these values.
> 
> Uphold the old ABI by reapplying KVM's tweaks to the BNDCFGS bits after
> an MSR write from userspace.
> 
> Note, the old ABI that is being preserved is a KVM hack to workaround a
> userspace bug; see commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX
> controls when guest MPX disabled").

Actually this is not a userspace bug.  It's an L1 workaround for running 
*correct* L1 userspace without a *kernel* bugfix on L0, namely commit 
691bd4340bef ("kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS").

But thanks for writing the incorrect commit message, because now that 
I've actually looked at the history, I'm going to say screw 6 year old 
kernels used as L0.  Let's just revert commit 5f76f6f5ff96.

I've applied patches 1-5, let's see how bad the conflicts are in the 
rest of the series.

(This shows another reason why sometimes series seem to be cursed and 
don't get reviews: maintainers having a fuzzy feeling that *something* 
is wrong with them and putting off the review until it finally clicks).

Paolo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 11/15] KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP
  2022-06-07 21:36 ` [PATCH v5 11/15] KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP Sean Christopherson
@ 2022-07-22  9:49   ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2022-07-22  9:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Eric Li, David Matlack, Oliver Upton

On 6/7/22 23:36, Sean Christopherson wrote:
> Make UMIP an "allowed-1" bit CR4_FIXED1 MSR when KVM is emulating UMIP.
> KVM emulates UMIP for both L1 and L2, and so should enumerate that L2 is
> allowed to have CR4.UMIP=1.  Not setting the bit doesn't immediately
> break nVMX, as KVM does set/clear the bit in CR4_FIXED1 in response to a
> guest CPUID update, i.e. KVM will correctly (dis)allow nested VM-Entry
> based on whether or not UMIP is exposed to L1.
> 
> That said, KVM should enumerate the bit as being allowed from time zero,
> e.g. userspace will see the wrong value if the MSR is read before CPUID
> is written.  And a future patch will quirk KVM's behavior of stuffing
> CR4_FIXED1 in response to guest CPUID changes, as CR4_FIXED1 is not
> strictly required to match the CPUID model exposed to L1.

I'm not sure about this; there's no *practical* need to allow it, since 
there is generally a 1:n mapping between CPUID and CR4 reserved bits. 
Do you mind removing the "future patch" reference from the commit message?

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 12/15] KVM: nVMX: Extend VMX MSRs quirk to CR0/4 fixed1 bits
  2022-06-07 21:36 ` [PATCH v5 12/15] KVM: nVMX: Extend VMX MSRs quirk to CR0/4 fixed1 bits Sean Christopherson
@ 2022-07-22  9:50   ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2022-07-22  9:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Eric Li, David Matlack, Oliver Upton

On 6/7/22 23:36, Sean Christopherson wrote:
> Extend the VMX MSRs quirk to the CR0/4_FIXED1 MSRs, i.e. when the quirk
> is disabled, allow userspace to set the MSRs and do not rewrite the MSRs
> during CPUID updates.  The bits that the guest (L2 in this case) is
> allowed to set are not directly tied to CPUID.  Enumerating to L1 that it
> can set reserved CR0/4 bits is nonsensical and will ultimately result in
> a failed nested VM-Entry (KVM enforces guest reserved CR4 bits on top of
> the VMX MSRs), but KVM typically doesn't police the vCPU model except
> when doing so is necessary to protect the host kernel.
> 
> Further restricting CR4 bits is however a reasonable thing to do, e.g. to
> work around a bug in nested virtualization, in which case exposing a
> feature to L1 is ok, but letting L2 use the feature is not.  Of course,
> whether or not the L1 hypervisor will actually _check_ the FIXED1 bits is
> another matter entirely, e.g. KVM currently assumes all bits that can be
> set in the host can also be set in the guest.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

I'm leaving this patch out for separate discussion since the quirk is 
not needed anymore.  I'll post the two reverts after some testing.

Paolo

> ---
>   Documentation/virt/kvm/api.rst |  8 ++++++++
>   arch/x86/kvm/vmx/nested.c      | 33 ++++++++++++++++++++++++++++++---
>   arch/x86/kvm/vmx/vmx.c         |  6 +++---
>   3 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 1095692ddab7..88d1bbae031e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7391,6 +7391,14 @@ The valid bits in cap.args[0] are:
>                                         IA32_VMX_TRUE_EXIT_CTLS[bit 44]
>                                         ('load IA32_PERF_GLOBAL_CTRL'). Otherwise,
>                                         these corresponding MSR bits are cleared.
> +                                    - MSR_IA32_VMX_CR0_FIXED1 is unconditionally
> +                                      set to 0xffffffff
> +                                    - CR4.PCE is unconditionally set in
> +                                      MSR_IA32_VMX_CR4_FIXED1.
> +                                    - All CR4 bits with an associated CPUID
> +                                      feature flag are set in
> +                                      MSR_IA32_VMX_CR4_FIXED1 if the feature is
> +                                      reported as supported in guest CPUID.
>   
>                                       When this quirk is disabled, KVM will not
>                                       change the values of the aformentioned VMX
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 5533c2474128..abce74cfefc9 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1385,6 +1385,30 @@ static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
>   	return 0;
>   }
>   
> +static u64 *vmx_get_fixed1_msr(struct nested_vmx_msrs *msrs, u32 msr_index)
> +{
> +	switch (msr_index) {
> +	case MSR_IA32_VMX_CR0_FIXED1:
> +		return &msrs->cr0_fixed1;
> +	case MSR_IA32_VMX_CR4_FIXED1:
> +		return &msrs->cr4_fixed1;
> +	default:
> +		BUG();
> +	}
> +}
> +
> +static int vmx_restore_fixed1_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
> +{
> +	const u64 *msr = vmx_get_fixed1_msr(&vmcs_config.nested, msr_index);
> +
> +	/* Bits that "must-be-0" must not be set in the restored value. */
> +	if (!is_bitwise_subset(*msr, data, -1ULL))
> +		return -EINVAL;
> +
> +	*vmx_get_fixed1_msr(&vmx->nested.msrs, msr_index) = data;
> +	return 0;
> +}
> +
>   /*
>    * Called when userspace is restoring VMX MSRs.
>    *
> @@ -1432,10 +1456,13 @@ int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
>   	case MSR_IA32_VMX_CR0_FIXED1:
>   	case MSR_IA32_VMX_CR4_FIXED1:
>   		/*
> -		 * These MSRs are generated based on the vCPU's CPUID, so we
> -		 * do not support restoring them directly.
> +		 * These MSRs are generated based on the vCPU's CPUID when KVM
> +		 * "owns" the VMX MSRs, do not allow restoring them directly.
>   		 */
> -		return -EINVAL;
> +		if (kvm_check_has_quirk(vmx->vcpu.kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS))
> +			return -EINVAL;
> +
> +		return vmx_restore_fixed1_msr(vmx, msr_index, data);
>   	case MSR_IA32_VMX_EPT_VPID_CAP:
>   		return vmx_restore_vmx_ept_vpid_cap(vmx, data);
>   	case MSR_IA32_VMX_VMCS_ENUM:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4c31c8f24329..139f365ca6bb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7520,10 +7520,10 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   			~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
>   			  FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);
>   
> -	if (nested_vmx_allowed(vcpu)) {
> +	if (nested_vmx_allowed(vcpu) &&
> +	    kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS)) {
>   		nested_vmx_cr_fixed1_bits_update(vcpu);
> -		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS))
> -			nested_vmx_entry_exit_ctls_update(vcpu);
> +		nested_vmx_entry_exit_ctls_update(vcpu);
>   	}
>   
>   	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-06-07 21:35 ` [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value Sean Christopherson
@ 2022-10-31 16:39   ` Yu Zhang
  2022-10-31 17:11     ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Yu Zhang @ 2022-10-31 16:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Eric Li, David Matlack, Oliver Upton

Hi Sean & Paolo,

On Tue, Jun 07, 2022 at 09:35:54PM +0000, Sean Christopherson wrote:
> Restrict the nVMX MSRs based on KVM's config, not based on the guest's
> current config.  Using the guest's config to audit the new config
> prevents userspace from restoring the original config (KVM's config) if
> at any point in the past the guest's config was restricted in any way.

May I ask for an example here, to explain why we use the KVM config
here, instead of the guest's? I mean, the guest's config can be
adjusted after cpuid updates by vmx_vcpu_after_set_cpuid(). Yet the
msr settings in vmcs_config.nested might be outdated by then.

Another question is about the setting of secondary_ctls_high in
nested_vmx_setup_ctls_msrs().  I saw there's a comment saying:
	"Do not include those that depend on CPUID bits, they are
	added later by vmx_vcpu_after_set_cpuid.".

But since cpuid updates can adjust the vmx->nested.msrs.secondary_ctls_high,
do we really need to clear those flags for secondary_ctls_high in this
global config? Could we just set 
	msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl?

If yes, code(in nested_vmx_setup_ctls_msrs()) such as
	if (enable_ept) {
		/* nested EPT: emulate EPT also to L1 */
		msrs->secondary_ctls_high |=
			SECONDARY_EXEC_ENABLE_EPT;
or 
	if (cpu_has_vmx_vmfunc()) {
		msrs->secondary_ctls_high |=
			SECONDARY_EXEC_ENABLE_VMFUNC;
and other similar ones may also be uncessary.

B.R.
Yu

> 
> Fixes: 62cc6b9dc61e ("KVM: nVMX: support restore of VMX capability MSRs")
> Cc: stable@vger.kernel.org
> Cc: David Matlack <dmatlack@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 100 ++++++++++++++++++++------------------
>  1 file changed, 52 insertions(+), 48 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 00c7b00c017a..fca30e79b3a0 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1223,7 +1223,7 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
>  		BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
>  		/* reserved */
>  		BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
> -	u64 vmx_basic = vmx->nested.msrs.basic;
> +	u64 vmx_basic = vmcs_config.nested.basic;
>  
>  	if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
>  		return -EINVAL;
> @@ -1246,36 +1246,42 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
>  	return 0;
>  }
>  
> +static void vmx_get_control_msr(struct nested_vmx_msrs *msrs, u32 msr_index,
> +				u32 **low, u32 **high)
> +{
> +	switch (msr_index) {
> +	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> +		*low = &msrs->pinbased_ctls_low;
> +		*high = &msrs->pinbased_ctls_high;
> +		break;
> +	case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
> +		*low = &msrs->procbased_ctls_low;
> +		*high = &msrs->procbased_ctls_high;
> +		break;
> +	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> +		*low = &msrs->exit_ctls_low;
> +		*high = &msrs->exit_ctls_high;
> +		break;
> +	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> +		*low = &msrs->entry_ctls_low;
> +		*high = &msrs->entry_ctls_high;
> +		break;
> +	case MSR_IA32_VMX_PROCBASED_CTLS2:
> +		*low = &msrs->secondary_ctls_low;
> +		*high = &msrs->secondary_ctls_high;
> +		break;
> +	default:
> +		BUG();
> +	}
> +}
> +
>  static int
>  vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
>  {
> -	u64 supported;
>  	u32 *lowp, *highp;
> +	u64 supported;
>  
> -	switch (msr_index) {
> -	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> -		lowp = &vmx->nested.msrs.pinbased_ctls_low;
> -		highp = &vmx->nested.msrs.pinbased_ctls_high;
> -		break;
> -	case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
> -		lowp = &vmx->nested.msrs.procbased_ctls_low;
> -		highp = &vmx->nested.msrs.procbased_ctls_high;
> -		break;
> -	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> -		lowp = &vmx->nested.msrs.exit_ctls_low;
> -		highp = &vmx->nested.msrs.exit_ctls_high;
> -		break;
> -	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> -		lowp = &vmx->nested.msrs.entry_ctls_low;
> -		highp = &vmx->nested.msrs.entry_ctls_high;
> -		break;
> -	case MSR_IA32_VMX_PROCBASED_CTLS2:
> -		lowp = &vmx->nested.msrs.secondary_ctls_low;
> -		highp = &vmx->nested.msrs.secondary_ctls_high;
> -		break;
> -	default:
> -		BUG();
> -	}
> +	vmx_get_control_msr(&vmcs_config.nested, msr_index, &lowp, &highp);
>  
>  	supported = vmx_control_msr(*lowp, *highp);
>  
> @@ -1287,6 +1293,7 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
>  	if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
>  		return -EINVAL;
>  
> +	vmx_get_control_msr(&vmx->nested.msrs, msr_index, &lowp, &highp);
>  	*lowp = data;
>  	*highp = data >> 32;
>  	return 0;
> @@ -1300,10 +1307,8 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
>  		BIT_ULL(28) | BIT_ULL(29) | BIT_ULL(30) |
>  		/* reserved */
>  		GENMASK_ULL(13, 9) | BIT_ULL(31);
> -	u64 vmx_misc;
> -
> -	vmx_misc = vmx_control_msr(vmx->nested.msrs.misc_low,
> -				   vmx->nested.msrs.misc_high);
> +	u64 vmx_misc = vmx_control_msr(vmcs_config.nested.misc_low,
> +				       vmcs_config.nested.misc_high);
>  
>  	if (!is_bitwise_subset(vmx_misc, data, feature_and_reserved_bits))
>  		return -EINVAL;
> @@ -1331,10 +1336,8 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
>  
>  static int vmx_restore_vmx_ept_vpid_cap(struct vcpu_vmx *vmx, u64 data)
>  {
> -	u64 vmx_ept_vpid_cap;
> -
> -	vmx_ept_vpid_cap = vmx_control_msr(vmx->nested.msrs.ept_caps,
> -					   vmx->nested.msrs.vpid_caps);
> +	u64 vmx_ept_vpid_cap = vmx_control_msr(vmcs_config.nested.ept_caps,
> +					       vmcs_config.nested.vpid_caps);
>  
>  	/* Every bit is either reserved or a feature bit. */
>  	if (!is_bitwise_subset(vmx_ept_vpid_cap, data, -1ULL))
> @@ -1345,20 +1348,21 @@ static int vmx_restore_vmx_ept_vpid_cap(struct vcpu_vmx *vmx, u64 data)
>  	return 0;
>  }
>  
> +static u64 *vmx_get_fixed0_msr(struct nested_vmx_msrs *msrs, u32 msr_index)
> +{
> +	switch (msr_index) {
> +	case MSR_IA32_VMX_CR0_FIXED0:
> +		return &msrs->cr0_fixed0;
> +	case MSR_IA32_VMX_CR4_FIXED0:
> +		return &msrs->cr4_fixed0;
> +	default:
> +		BUG();
> +	}
> +}
> +
>  static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
>  {
> -	u64 *msr;
> -
> -	switch (msr_index) {
> -	case MSR_IA32_VMX_CR0_FIXED0:
> -		msr = &vmx->nested.msrs.cr0_fixed0;
> -		break;
> -	case MSR_IA32_VMX_CR4_FIXED0:
> -		msr = &vmx->nested.msrs.cr4_fixed0;
> -		break;
> -	default:
> -		BUG();
> -	}
> +	const u64 *msr = vmx_get_fixed0_msr(&vmcs_config.nested, msr_index);
>  
>  	/*
>  	 * 1 bits (which indicates bits which "must-be-1" during VMX operation)
> @@ -1367,7 +1371,7 @@ static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
>  	if (!is_bitwise_subset(data, *msr, -1ULL))
>  		return -EINVAL;
>  
> -	*msr = data;
> +	*vmx_get_fixed0_msr(&vmx->nested.msrs, msr_index) = data;
>  	return 0;
>  }
>  
> @@ -1428,7 +1432,7 @@ int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
>  		vmx->nested.msrs.vmcs_enum = data;
>  		return 0;
>  	case MSR_IA32_VMX_VMFUNC:
> -		if (data & ~vmx->nested.msrs.vmfunc_controls)
> +		if (data & ~vmcs_config.nested.vmfunc_controls)
>  			return -EINVAL;
>  		vmx->nested.msrs.vmfunc_controls = data;
>  		return 0;
> -- 
> 2.36.1.255.ge46751e96f-goog
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-10-31 16:39   ` Yu Zhang
@ 2022-10-31 17:11     ` Sean Christopherson
  2022-11-01 10:18       ` Yu Zhang
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2022-10-31 17:11 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

On Tue, Nov 01, 2022, Yu Zhang wrote:
> Hi Sean & Paolo,
> 
> On Tue, Jun 07, 2022 at 09:35:54PM +0000, Sean Christopherson wrote:
> > Restrict the nVMX MSRs based on KVM's config, not based on the guest's
> > current config.  Using the guest's config to audit the new config
> > prevents userspace from restoring the original config (KVM's config) if
> > at any point in the past the guest's config was restricted in any way.
> 
> May I ask for an example here, to explain why we use the KVM config
> here, instead of the guest's? I mean, the guest's config can be
> adjusted after cpuid updates by vmx_vcpu_after_set_cpuid(). Yet the
> msr settings in vmcs_config.nested might be outdated by then.

vmcs_config.nested never becomes out-of-date, it's read-only after __init (not
currently marked as such, that will be remedied soon).

The auditing performed by KVM is purely to guard against userspace enabling
features that KVM doesn't support.  KVM is not responsible for ensuring that the
vCPU's CPUID model match the VMX MSR model.

An example would be if userspace loaded the VMX MSRs with a default model, and
then enabled features one-by-one.  In practice this doesn't happen because it's
more performant to gather all features and do a single KVM_SET_MSRS, but it's a
legitimate approach that KVM should allow.

> Another question is about the setting of secondary_ctls_high in
> nested_vmx_setup_ctls_msrs().  I saw there's a comment saying:
> 	"Do not include those that depend on CPUID bits, they are
> 	added later by vmx_vcpu_after_set_cpuid.".

That's a stale comment, see the very next commit, 8805875aa473 ("Revert "KVM: nVMX:
Do not expose MPX VMX controls when guest MPX disabled""), as well as the slightly
later commit 9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
VM-{Entry,Exit} control"").

> But since cpuid updates can adjust the vmx->nested.msrs.secondary_ctls_high,
> do we really need to clear those flags for secondary_ctls_high in this
> global config?

As above, the comment is stale, KVM should not manipulate the VMX MSRs in response
to guest CPUID changes.  The one exception to this is reserved CR0/CR4 bits.  We
discussed quirking that behavior, but ultimately decided not to because (a) no
userspace actually cares and and (b) KVM would effectively need to make up behavior
if userspace allowed the guest to load CR4 bits via VM-Enter or VM-Exit that are
disallowed by CPUID, e.g. L1 could end up running with a CR4 that is supposed to
be impossible according to CPUID.

> Could we just set 
> 	msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl?

KVM already does that in upstream (with further sanitization).  See commit
bcdf201f8a4d ("KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs").

> If yes, code(in nested_vmx_setup_ctls_msrs()) such as
> 	if (enable_ept) {
> 		/* nested EPT: emulate EPT also to L1 */
> 		msrs->secondary_ctls_high |=
> 			SECONDARY_EXEC_ENABLE_EPT;

This can't be completely removed, though unless I'm missing something, it can and
should be shifted to the sanitization code, e.g.

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 8f67a9c4a287..0c41d5808413 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6800,6 +6800,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
 
        msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
        msrs->secondary_ctls_high &=
+               SECONDARY_EXEC_ENABLE_EPT |
                SECONDARY_EXEC_DESC |
                SECONDARY_EXEC_ENABLE_RDTSCP |
                SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
@@ -6820,9 +6821,6 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
                SECONDARY_EXEC_SHADOW_VMCS;
 
        if (enable_ept) {
-               /* nested EPT: emulate EPT also to L1 */
-               msrs->secondary_ctls_high |=
-                       SECONDARY_EXEC_ENABLE_EPT;
                msrs->ept_caps =
                        VMX_EPT_PAGE_WALK_4_BIT |
                        VMX_EPT_PAGE_WALK_5_BIT |


> or 
> 	if (cpu_has_vmx_vmfunc()) {
> 		msrs->secondary_ctls_high |=
> 			SECONDARY_EXEC_ENABLE_VMFUNC;

This one is still required.  KVM never enables VMFUNC for itself, i.e. it won't
be set in KVM's VMCS configuration.

> and other similar ones may also be uncessary.



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-10-31 17:11     ` Sean Christopherson
@ 2022-11-01 10:18       ` Yu Zhang
  2022-11-01 17:58         ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Yu Zhang @ 2022-11-01 10:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

On Mon, Oct 31, 2022 at 05:11:10PM +0000, Sean Christopherson wrote:
> On Tue, Nov 01, 2022, Yu Zhang wrote:
> > Hi Sean & Paolo,
> > 
> > On Tue, Jun 07, 2022 at 09:35:54PM +0000, Sean Christopherson wrote:
> > > Restrict the nVMX MSRs based on KVM's config, not based on the guest's
> > > current config.  Using the guest's config to audit the new config
> > > prevents userspace from restoring the original config (KVM's config) if
> > > at any point in the past the guest's config was restricted in any way.
> > 
> > May I ask for an example here, to explain why we use the KVM config
> > here, instead of the guest's? I mean, the guest's config can be
> > adjusted after cpuid updates by vmx_vcpu_after_set_cpuid(). Yet the
> > msr settings in vmcs_config.nested might be outdated by then.

Thanks a lot for your explanation, Sean. Questions are embedded below:

> 
> vmcs_config.nested never becomes out-of-date, it's read-only after __init (not
> currently marked as such, that will be remedied soon).
> 
> The auditing performed by KVM is purely to guard against userspace enabling
> features that KVM doesn't support.  KVM is not responsible for ensuring that the
> vCPU's CPUID model match the VMX MSR model.

Do you mean the VMX MSR model shall not be changed after the cpuid updates?
And for VMX MSR model, do you mean the vmx->nested.msrs or the ones in 
vmcs_config->nested? 

What I observed is that vmx->nested.msrs.secondary_ctls_high will be changed
in vmx_adjust_secondary_exec_control(), which can be triggered after cpuid is
set. 

Since KVM's config(vmcs_config->nested.secondary_ctls_high) is done during init
by nested_vmx_setup_ctls_msrs(), which only kept a subset of the flags from the
vmcs_confg->cpu_based_2nd_exec_ctrl, the vmx_restore_control_msr() could fail
later, when userspace VMM tries to enable a feature(the only one I witnessed is
SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE) by setting MSR_IA32_VMX_PROCBASED_CTLS2.
Because the vmx->nested.msrs.secondary_ctls_high is updated by cpuid, but this
bit is not taken from vmcs_conf->cpu_based_2nd_exec_ctrl by nested_vmx_setup_ctls_msrs()
for vmcs_config->nested.secondary_ctls_high.

The failure can be fixed, simply by adding SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE in
nested_vmx_setup_ctls_msrs(), e.g.

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c62352dda6a..fa255391718c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6791,13 +6791,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
        msrs->procbased_ctls_low &=
                ~(CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING);

-       /*
-        * secondary cpu-based controls.  Do not include those that
-        * depend on CPUID bits, they are added later by
-        * vmx_vcpu_after_set_cpuid.
-        */
        msrs->secondary_ctls_low = 0;
-
        msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
        msrs->secondary_ctls_high &=
                SECONDARY_EXEC_DESC |
@@ -6810,7 +6804,8 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
                SECONDARY_EXEC_ENABLE_INVPCID |
                SECONDARY_EXEC_RDSEED_EXITING |
                SECONDARY_EXEC_XSAVES |
-               SECONDARY_EXEC_TSC_SCALING;
+               SECONDARY_EXEC_TSC_SCALING |
+               SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;

        /*
         * We can emulate "VMCS shadowing," even if the hardware

But then I wonder, why do we need the bitwise and operation here first:
        msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
        msrs->secondary_ctls_high &=
                SECONDARY_EXEC_DESC |
                SECONDARY_EXEC_ENABLE_RDTSCP |
                SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
                SECONDARY_EXEC_WBINVD_EXITING |
                SECONDARY_EXEC_APIC_REGISTER_VIRT |
                SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
                SECONDARY_EXEC_RDRAND_EXITING |
                SECONDARY_EXEC_ENABLE_INVPCID |
                SECONDARY_EXEC_RDSEED_EXITING |
                SECONDARY_EXEC_XSAVES |
                SECONDARY_EXEC_TSC_SCALING |
                SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;

And then reset many of the remaining flags based on configurations such as
enable_ept, enable_vpid, enable_unrestricted_guest etc... But maybe we need
go through this case by case.

> 
> An example would be if userspace loaded the VMX MSRs with a default model, and
> then enabled features one-by-one.  In practice this doesn't happen because it's
> more performant to gather all features and do a single KVM_SET_MSRS, but it's a
> legitimate approach that KVM should allow.
> 
> > Another question is about the setting of secondary_ctls_high in
> > nested_vmx_setup_ctls_msrs().  I saw there's a comment saying:
> > 	"Do not include those that depend on CPUID bits, they are
> > 	added later by vmx_vcpu_after_set_cpuid.".
> 
> That's a stale comment, see the very next commit, 8805875aa473 ("Revert "KVM: nVMX:
> Do not expose MPX VMX controls when guest MPX disabled""), as well as the slightly
> later commit 9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
> VM-{Entry,Exit} control"").
> 

So the comment can be and shall be removed, right? 

> > But since cpuid updates can adjust the vmx->nested.msrs.secondary_ctls_high,
> > do we really need to clear those flags for secondary_ctls_high in this
> > global config?
> 
> As above, the comment is stale, KVM should not manipulate the VMX MSRs in response
> to guest CPUID changes.  The one exception to this is reserved CR0/CR4 bits.  We
> discussed quirking that behavior, but ultimately decided not to because (a) no
> userspace actually cares and and (b) KVM would effectively need to make up behavior
> if userspace allowed the guest to load CR4 bits via VM-Enter or VM-Exit that are
> disallowed by CPUID, e.g. L1 could end up running with a CR4 that is supposed to
> be impossible according to CPUID.
> 
> > Could we just set 
> > 	msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl?
> 
> KVM already does that in upstream (with further sanitization).  See commit
> bcdf201f8a4d ("KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs").
> 
> > If yes, code(in nested_vmx_setup_ctls_msrs()) such as
> > 	if (enable_ept) {
> > 		/* nested EPT: emulate EPT also to L1 */
> > 		msrs->secondary_ctls_high |=
> > 			SECONDARY_EXEC_ENABLE_EPT;
> 
> This can't be completely removed, though unless I'm missing something, it can and
> should be shifted to the sanitization code, e.g.
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 8f67a9c4a287..0c41d5808413 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6800,6 +6800,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
>  
>         msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
>         msrs->secondary_ctls_high &=
> +               SECONDARY_EXEC_ENABLE_EPT |
>                 SECONDARY_EXEC_DESC |
>                 SECONDARY_EXEC_ENABLE_RDTSCP |
>                 SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
> @@ -6820,9 +6821,6 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
>                 SECONDARY_EXEC_SHADOW_VMCS;
>  
>         if (enable_ept) {
> -               /* nested EPT: emulate EPT also to L1 */
> -               msrs->secondary_ctls_high |=
> -                       SECONDARY_EXEC_ENABLE_EPT;
>                 msrs->ept_caps =
>                         VMX_EPT_PAGE_WALK_4_BIT |
>                         VMX_EPT_PAGE_WALK_5_BIT |
> 
> 
> > or 
> > 	if (cpu_has_vmx_vmfunc()) {
> > 		msrs->secondary_ctls_high |=
> > 			SECONDARY_EXEC_ENABLE_VMFUNC;
> 
> This one is still required.  KVM never enables VMFUNC for itself, i.e. it won't
> be set in KVM's VMCS configuration.
> 

My understanding is that, for VMFUNC, altough KVM does not support it,
SECONDARY_EXEC_ENABLE_VMFUNC is still set in the secondary proc-based
vm execution ctrol. KVM just uses different handlers for VMFUNC exits
from L1(to inject the #UD) and L2(to emulate the eptp switching). So
it doesn't matter if we do not clear this bit for vmcs_config->nested.
procbased_ctls_high. 

I may probably missed something. But I hope my questions are clear 
enough (though I also doubt it...) :)

B.R.
Yu

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-11-01 10:18       ` Yu Zhang
@ 2022-11-01 17:58         ` Sean Christopherson
  2022-11-02  8:54           ` Yu Zhang
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2022-11-01 17:58 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

On Tue, Nov 01, 2022, Yu Zhang wrote:
> On Mon, Oct 31, 2022 at 05:11:10PM +0000, Sean Christopherson wrote:
> > vmcs_config.nested never becomes out-of-date, it's read-only after __init (not
> > currently marked as such, that will be remedied soon).
> > 
> > The auditing performed by KVM is purely to guard against userspace enabling
> > features that KVM doesn't support.  KVM is not responsible for ensuring that the
> > vCPU's CPUID model match the VMX MSR model.
> 
> Do you mean the VMX MSR model shall not be changed after the cpuid updates?

No, I mean that the virtual CPU model (CPUID + VMX MSRs) that is presented to the
guest is the responsibility of host userspace.  KVM only cares about not enabling
bits/features that KVM doesn't supported.

> And for VMX MSR model, do you mean the vmx->nested.msrs or the ones in 
> vmcs_config->nested? 

vmx->nested.msrs.  vmcs_config->nested is effectively the VMX equivalent of
KVM_GET_SUPPORTED_CPUID.

> What I observed is that vmx->nested.msrs.secondary_ctls_high will be changed
> in vmx_adjust_secondary_exec_control(), which can be triggered after cpuid is
> set. 

Ugh, that path got overlooked when we yanked out KVM's manipulaton of VMX MSRs
in response to guest CPUID changes.  I wonder if we can get away with changing
KVM's behavior to only ensure a feature isn't exposed to L2 when it's not exposed
to L1.

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6b4266e949a3..cfc35d559d91 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4523,8 +4523,8 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
         * Update the nested MSR settings so that a nested VMM can/can't set
         * controls for features that are/aren't exposed to the guest.
         */
-       if (nested) {
-               if (enabled)
+       if (nested && !enabled)
+               if (exiting)
                        vmx->nested.msrs.secondary_ctls_high |= control;
                else
                        vmx->nested.msrs.secondary_ctls_high &= ~control;


> Since KVM's config(vmcs_config->nested.secondary_ctls_high) is done during init
> by nested_vmx_setup_ctls_msrs(), which only kept a subset of the flags from the
> vmcs_confg->cpu_based_2nd_exec_ctrl, the vmx_restore_control_msr() could fail
> later, when userspace VMM tries to enable a feature(the only one I witnessed is
> SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE) by setting MSR_IA32_VMX_PROCBASED_CTLS2.
> Because the vmx->nested.msrs.secondary_ctls_high is updated by cpuid, but this
> bit is not taken from vmcs_conf->cpu_based_2nd_exec_ctrl by nested_vmx_setup_ctls_msrs()
> for vmcs_config->nested.secondary_ctls_high.
> 
> The failure can be fixed, simply by adding SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE in
> nested_vmx_setup_ctls_msrs(), e.g.

Assuming KVM actually supports user wait/pause in L2, this is an orthogonal bug
to the CPUID-based manipulation above.  KVM simply neglects to advertise to userspace
that ENABLE_USR_WAIT_PAUSE is supported for nested virtualization.

If KVM doesn't correctly support virtualizing user wait/pause for L2, then the
correct location to fix this is in vmx_secondary_exec_control().

> > > Another question is about the setting of secondary_ctls_high in
> > > nested_vmx_setup_ctls_msrs().  I saw there's a comment saying:
> > > 	"Do not include those that depend on CPUID bits, they are
> > > 	added later by vmx_vcpu_after_set_cpuid.".
> > 
> > That's a stale comment, see the very next commit, 8805875aa473 ("Revert "KVM: nVMX:
> > Do not expose MPX VMX controls when guest MPX disabled""), as well as the slightly
> > later commit 9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
> > VM-{Entry,Exit} control"").
> > 
> 
> So the comment can be and shall be removed, right? 

Yep.

> > > 	if (cpu_has_vmx_vmfunc()) {
> > > 		msrs->secondary_ctls_high |=
> > > 			SECONDARY_EXEC_ENABLE_VMFUNC;
> > 
> > This one is still required.  KVM never enables VMFUNC for itself, i.e. it won't
> > be set in KVM's VMCS configuration.
> > 
> 
> My understanding is that, for VMFUNC, altough KVM does not support it,
> SECONDARY_EXEC_ENABLE_VMFUNC is still set in the secondary proc-based
> vm execution ctrol. KVM just uses different handlers for VMFUNC exits
> from L1(to inject the #UD) and L2(to emulate the eptp switching). So
> it doesn't matter if we do not clear this bit for vmcs_config->nested.
> procbased_ctls_high. 

Ah, you're right, I didn't realize KVM enables VMFUNC in L1.  Enabling VMFUNC for
L1 is silly though, it's trivial to clear the feature in vmx_secondary_exec_control().

That said, enabling VMFUNC in vmcs01 is an orthogonal topic, and it _is_ indeed
easier to keep the feature in the reference config.  Now that the nested config
is derived from the non-nested config, nested_vmx_setup_ctls_msrs() can do:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 61a2e551640a..751cc67aa1a9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6811,7 +6811,8 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
                SECONDARY_EXEC_ENABLE_INVPCID |
                SECONDARY_EXEC_RDSEED_EXITING |
                SECONDARY_EXEC_XSAVES |
-               SECONDARY_EXEC_TSC_SCALING;
+               SECONDARY_EXEC_TSC_SCALING |
+               SECONDARY_EXEC_ENABLE_VMFUNC;
 
        /*
         * We can emulate "VMCS shadowing," even if the hardware
@@ -6842,17 +6843,12 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
                }
        }
 
-       if (cpu_has_vmx_vmfunc()) {
-               msrs->secondary_ctls_high |=
-                       SECONDARY_EXEC_ENABLE_VMFUNC;
-               /*
-                * Advertise EPTP switching unconditionally
-                * since we emulate it
-                */
-               if (enable_ept)
-                       msrs->vmfunc_controls =
-                               VMX_VMFUNC_EPTP_SWITCHING;
-       }
+       /*
+        * Advertise EPTP switching if VMFUNC and EPT are supported, KVM
+        * emulates the actual EPTP switch in software.
+        */
+       if (cpu_has_vmx_vmfunc() && enable_ept)
+               msrs->vmfunc_controls = VMX_VMFUNC_EPTP_SWITCHING;
 
        /*
         * Old versions of KVM use the single-context version without

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-11-01 17:58         ` Sean Christopherson
@ 2022-11-02  8:54           ` Yu Zhang
  2022-11-03 16:53             ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Yu Zhang @ 2022-11-02  8:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

On Tue, Nov 01, 2022 at 05:58:21PM +0000, Sean Christopherson wrote:
> On Tue, Nov 01, 2022, Yu Zhang wrote:
> > On Mon, Oct 31, 2022 at 05:11:10PM +0000, Sean Christopherson wrote:
> > > vmcs_config.nested never becomes out-of-date, it's read-only after __init (not
> > > currently marked as such, that will be remedied soon).
> > > 
> > > The auditing performed by KVM is purely to guard against userspace enabling
> > > features that KVM doesn't support.  KVM is not responsible for ensuring that the
> > > vCPU's CPUID model match the VMX MSR model.
> > 
> > Do you mean the VMX MSR model shall not be changed after the cpuid updates?
> 
> No, I mean that the virtual CPU model (CPUID + VMX MSRs) that is presented to the
> guest is the responsibility of host userspace.  KVM only cares about not enabling
> bits/features that KVM doesn't supported.
> 

Oh, I see. We need to guarantee the userspace VMM can not successfully
set a feature in vmx msr, if KVM does not support it.

> > And for VMX MSR model, do you mean the vmx->nested.msrs or the ones in 
> > vmcs_config->nested? 
> 
> vmx->nested.msrs.  vmcs_config->nested is effectively the VMX equivalent of
> KVM_GET_SUPPORTED_CPUID.
> 
> > What I observed is that vmx->nested.msrs.secondary_ctls_high will be changed
> > in vmx_adjust_secondary_exec_control(), which can be triggered after cpuid is
> > set. 
> 
> Ugh, that path got overlooked when we yanked out KVM's manipulaton of VMX MSRs
> in response to guest CPUID changes.  I wonder if we can get away with changing
> KVM's behavior to only ensure a feature isn't exposed to L2 when it's not exposed
> to L1.
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 6b4266e949a3..cfc35d559d91 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4523,8 +4523,8 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
>          * Update the nested MSR settings so that a nested VMM can/can't set
>          * controls for features that are/aren't exposed to the guest.
>          */
> -       if (nested) {
> -               if (enabled)
> +       if (nested && !enabled)
> +               if (exiting)
>                         vmx->nested.msrs.secondary_ctls_high |= control;
>                 else
>                         vmx->nested.msrs.secondary_ctls_high &= ~control;
> 

Indeed, this change can make sure a feature won't be exposed to L2, if L1
does not have it. But for the feature bits that L1 has, yet cleared from
the vmcs_conf->nested.msrs.secondary_ctls_high in nested_vmx_setup_ctls_msrs(),
there's no chance for userspace vmm to reset it again.

Well, I am not suggesting to give userspace vmm such permission(which I believe
is incorrect). And IIUC, vmcs_conf->nested.msrs.secondary_ctls_high shall serve
as a template to initialize vmx->nested.msrs.secondary_ctls_high. So maybe we
shall not mask off some features in nested_vmx_setup_ctls_msrs() at the beginning.
 
> > Since KVM's config(vmcs_config->nested.secondary_ctls_high) is done during init
> > by nested_vmx_setup_ctls_msrs(), which only kept a subset of the flags from the
> > vmcs_confg->cpu_based_2nd_exec_ctrl, the vmx_restore_control_msr() could fail
> > later, when userspace VMM tries to enable a feature(the only one I witnessed is
> > SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE) by setting MSR_IA32_VMX_PROCBASED_CTLS2.
> > Because the vmx->nested.msrs.secondary_ctls_high is updated by cpuid, but this
> > bit is not taken from vmcs_conf->cpu_based_2nd_exec_ctrl by nested_vmx_setup_ctls_msrs()
> > for vmcs_config->nested.secondary_ctls_high.
> > 
> > The failure can be fixed, simply by adding SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE in
> > nested_vmx_setup_ctls_msrs(), e.g.
> 
> Assuming KVM actually supports user wait/pause in L2, this is an orthogonal bug
> to the CPUID-based manipulation above.  KVM simply neglects to advertise to userspace
> that ENABLE_USR_WAIT_PAUSE is supported for nested virtualization.
> 
> If KVM doesn't correctly support virtualizing user wait/pause for L2, then the
> correct location to fix this is in vmx_secondary_exec_control().
> 

Sorry, why vmx_secondary_exec_control()? Could we just change
nested_vmx_setup_ctls_msrs() like below:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c62352dda6a..fa255391718c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6791,13 +6791,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
        msrs->procbased_ctls_low &=
                ~(CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING);

-       /*
-        * secondary cpu-based controls.  Do not include those that
-        * depend on CPUID bits, they are added later by
-        * vmx_vcpu_after_set_cpuid.
-        */
        msrs->secondary_ctls_low = 0;
-
        msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
        msrs->secondary_ctls_high &=
                SECONDARY_EXEC_DESC |
@@ -6810,7 +6804,8 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
                SECONDARY_EXEC_ENABLE_INVPCID |
                SECONDARY_EXEC_RDSEED_EXITING |
                SECONDARY_EXEC_XSAVES |
-               SECONDARY_EXEC_TSC_SCALING;
+               SECONDARY_EXEC_TSC_SCALING |
+               SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;

        /*
         * We can emulate "VMCS shadowing," even if the hardware

Note: I did not use "if (cpu_has_vmx_waitpkg())" here, it looks like to
take off one's pants to fart(no offense, just a Chinese old saying meaning
unnecessary acts.:)).

> > > > Another question is about the setting of secondary_ctls_high in
> > > > nested_vmx_setup_ctls_msrs().  I saw there's a comment saying:
> > > > 	"Do not include those that depend on CPUID bits, they are
> > > > 	added later by vmx_vcpu_after_set_cpuid.".
> > > 
> > > That's a stale comment, see the very next commit, 8805875aa473 ("Revert "KVM: nVMX:
> > > Do not expose MPX VMX controls when guest MPX disabled""), as well as the slightly
> > > later commit 9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
> > > VM-{Entry,Exit} control"").
> > > 
> > 
> > So the comment can be and shall be removed, right? 
> 
> Yep.
> 
> > > > 	if (cpu_has_vmx_vmfunc()) {
> > > > 		msrs->secondary_ctls_high |=
> > > > 			SECONDARY_EXEC_ENABLE_VMFUNC;
> > > 
> > > This one is still required.  KVM never enables VMFUNC for itself, i.e. it won't
> > > be set in KVM's VMCS configuration.
> > > 
> > 
> > My understanding is that, for VMFUNC, altough KVM does not support it,
> > SECONDARY_EXEC_ENABLE_VMFUNC is still set in the secondary proc-based
> > vm execution ctrol. KVM just uses different handlers for VMFUNC exits
> > from L1(to inject the #UD) and L2(to emulate the eptp switching). So
> > it doesn't matter if we do not clear this bit for vmcs_config->nested.
> > procbased_ctls_high. 
> 
> Ah, you're right, I didn't realize KVM enables VMFUNC in L1.  Enabling VMFUNC for
> L1 is silly though, it's trivial to clear the feature in vmx_secondary_exec_control().
> 
> That said, enabling VMFUNC in vmcs01 is an orthogonal topic, and it _is_ indeed
> easier to keep the feature in the reference config.  Now that the nested config
> is derived from the non-nested config, nested_vmx_setup_ctls_msrs() can do:

Agreed. BTW, do you know why KVM took pains to do so? I mean, emulation for
L2's vmfunc does not rely on the existance of vmfunc, right? So, for L2,
we can just set vmcs02's SECONDARY_EXEC_ENABLE_VMFUNC based on vmcs12. And
for L1, we can just disable it by clearing it in vmx_secondary_exec_control(),
and remove the #UD injection logic from KVM?

B.R.
Yu

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-11-02  8:54           ` Yu Zhang
@ 2022-11-03 16:53             ` Sean Christopherson
  2022-11-07  8:28               ` Yu Zhang
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2022-11-03 16:53 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton

On Wed, Nov 02, 2022, Yu Zhang wrote:
> On Tue, Nov 01, 2022 at 05:58:21PM +0000, Sean Christopherson wrote:
> > On Tue, Nov 01, 2022, Yu Zhang wrote:
> > > What I observed is that vmx->nested.msrs.secondary_ctls_high will be changed
> > > in vmx_adjust_secondary_exec_control(), which can be triggered after cpuid is
> > > set. 
> > 
> > Ugh, that path got overlooked when we yanked out KVM's manipulaton of VMX MSRs
> > in response to guest CPUID changes.  I wonder if we can get away with changing
> > KVM's behavior to only ensure a feature isn't exposed to L2 when it's not exposed
> > to L1.
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 6b4266e949a3..cfc35d559d91 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -4523,8 +4523,8 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
> >          * Update the nested MSR settings so that a nested VMM can/can't set
> >          * controls for features that are/aren't exposed to the guest.
> >          */
> > -       if (nested) {
> > -               if (enabled)
> > +       if (nested && !enabled)
> > +               if (exiting)
> >                         vmx->nested.msrs.secondary_ctls_high |= control;
> >                 else
> >                         vmx->nested.msrs.secondary_ctls_high &= ~control;
> > 
> 
> Indeed, this change can make sure a feature won't be exposed to L2, if L1
> does not have it.

No, that's not the goal of the change.  KVM already hides features in the VMX MSRs
if the base feature is not supported in L1 according to guest CPUID.  The problem
is that, currently, KVM also _forces_ features to be enabled in the VMX MSRs when
the base feature IS supported in L1 (CPUID).

Ideally, KVM should NEVER manipulate VMX MSRs in response to guest CPUID changes.
That's what I was referring to earlier by commits:

  8805875aa473 ("Revert "KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"")
  9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL"")

E.g. if userspace sets VMX MSRs and then sets guest CPUID, KVM will override the
nVMX CPU model defined by userspace.  The scenario where userspace hides a "base"
feature but exposes the feature in the VMX MSRs is nonsensical, which is why I
think KVM can likely get away with force-disabling features.

The reverse is completely legitimate though: hiding a feature in VMX MSRs even if
the base feature is supported for L1, i.e. disallowing L1 from enabling the feature
in L2, is something that real VMMs will actually do, e.g. if the user doesn't trust
that KVM correctly handles all aspects of nested virtualization for the feature.

In other words, the behavior you're observing, where vmx->nested.msrs.secondary_ctls_high
is changed by vmx_adjust_secondary_exec_control(), is a completely separate bug
than the one below.

> But for the feature bits that L1 has, yet cleared from the
> vmcs_conf->nested.msrs.secondary_ctls_high in nested_vmx_setup_ctls_msrs(),
> there's no chance for userspace vmm to reset it again.
> 
> Well, I am not suggesting to give userspace vmm such permission(which I believe
> is incorrect). And IIUC, vmcs_conf->nested.msrs.secondary_ctls_high shall serve
> as a template to initialize vmx->nested.msrs.secondary_ctls_high. So maybe we
> shall not mask off some features in nested_vmx_setup_ctls_msrs() at the beginning.
>  
> > > Since KVM's config(vmcs_config->nested.secondary_ctls_high) is done during init
> > > by nested_vmx_setup_ctls_msrs(), which only kept a subset of the flags from the
> > > vmcs_confg->cpu_based_2nd_exec_ctrl, the vmx_restore_control_msr() could fail
> > > later, when userspace VMM tries to enable a feature(the only one I witnessed is
> > > SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE) by setting MSR_IA32_VMX_PROCBASED_CTLS2.
> > > Because the vmx->nested.msrs.secondary_ctls_high is updated by cpuid, but this
> > > bit is not taken from vmcs_conf->cpu_based_2nd_exec_ctrl by nested_vmx_setup_ctls_msrs()
> > > for vmcs_config->nested.secondary_ctls_high.
> > > 
> > > The failure can be fixed, simply by adding SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE in
> > > nested_vmx_setup_ctls_msrs(), e.g.
> > 
> > Assuming KVM actually supports user wait/pause in L2, this is an orthogonal bug
> > to the CPUID-based manipulation above.  KVM simply neglects to advertise to userspace
> > that ENABLE_USR_WAIT_PAUSE is supported for nested virtualization.
> > 
> > If KVM doesn't correctly support virtualizing user wait/pause for L2, then the
> > correct location to fix this is in vmx_secondary_exec_control().
> > 
> 
> Sorry, why vmx_secondary_exec_control()?

You missed the qualifier:

  If KVM doesn't correctly support virtualizing user wait/pause for L2

If KVM should NOT be exposing ENABLE_USR_WAIT_PAUSE to the L1 VMM, then NOT
propagating the feature to msrs->secondary_ctls_low is correct.  And if that's
the case, then vmx_secondary_exec_control() needs to be modified so that it does
NOT set ENABLE_USR_WAIT_PAUSE in vmx->nested.msrs.secondary_ctls_high.

> Could we just change nested_vmx_setup_ctls_msrs() like below:

If KVM correctly virtualizes the feature in a nested scenario, yes.  I haven't
looked into ENABLE_USR_WAIT_PAUSE enough to know whether or not KVM gets the
nested virtualization pieces correct, hence the above qualifier.

> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 0c62352dda6a..fa255391718c 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6791,13 +6791,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
>         msrs->procbased_ctls_low &=
>                 ~(CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING);
> 
> -       /*
> -        * secondary cpu-based controls.  Do not include those that
> -        * depend on CPUID bits, they are added later by
> -        * vmx_vcpu_after_set_cpuid.
> -        */
>         msrs->secondary_ctls_low = 0;
> -
>         msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
>         msrs->secondary_ctls_high &=
>                 SECONDARY_EXEC_DESC |
> @@ -6810,7 +6804,8 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
>                 SECONDARY_EXEC_ENABLE_INVPCID |
>                 SECONDARY_EXEC_RDSEED_EXITING |
>                 SECONDARY_EXEC_XSAVES |
> -               SECONDARY_EXEC_TSC_SCALING;
> +               SECONDARY_EXEC_TSC_SCALING |
> +               SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
> 
>         /*
>          * We can emulate "VMCS shadowing," even if the hardware

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-11-03 16:53             ` Sean Christopherson
@ 2022-11-07  8:28               ` Yu Zhang
  2022-11-07 15:06                 ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Yu Zhang @ 2022-11-07  8:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton, Liu Jingqi

On Thu, Nov 03, 2022 at 04:53:11PM +0000, Sean Christopherson wrote:
> > > 
> > > Ugh, that path got overlooked when we yanked out KVM's manipulaton of VMX MSRs
> > > in response to guest CPUID changes.  I wonder if we can get away with changing
> > > KVM's behavior to only ensure a feature isn't exposed to L2 when it's not exposed
> > > to L1.
> > > 
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 6b4266e949a3..cfc35d559d91 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -4523,8 +4523,8 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
> > >          * Update the nested MSR settings so that a nested VMM can/can't set
> > >          * controls for features that are/aren't exposed to the guest.
> > >          */
> > > -       if (nested) {
> > > -               if (enabled)
> > > +       if (nested && !enabled)
> > > +               if (exiting)
> > >                         vmx->nested.msrs.secondary_ctls_high |= control;
> > >                 else
> > >                         vmx->nested.msrs.secondary_ctls_high &= ~control;
> > > 
> > 
> > Indeed, this change can make sure a feature won't be exposed to L2, if L1
> > does not have it.
> 
> No, that's not the goal of the change.  KVM already hides features in the VMX MSRs
> if the base feature is not supported in L1 according to guest CPUID.  The problem
> is that, currently, KVM also _forces_ features to be enabled in the VMX MSRs when
> the base feature IS supported in L1 (CPUID).
> 
> Ideally, KVM should NEVER manipulate VMX MSRs in response to guest CPUID changes.
> That's what I was referring to earlier by commits:
> 
>   8805875aa473 ("Revert "KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"")
>   9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL"")
> 
> E.g. if userspace sets VMX MSRs and then sets guest CPUID, KVM will override the
> nVMX CPU model defined by userspace.  The scenario where userspace hides a "base"
> feature but exposes the feature in the VMX MSRs is nonsensical, which is why I
> think KVM can likely get away with force-disabling features.
> 
> The reverse is completely legitimate though: hiding a feature in VMX MSRs even if
> the base feature is supported for L1, i.e. disallowing L1 from enabling the feature
> in L2, is something that real VMMs will actually do, e.g. if the user doesn't trust
> that KVM correctly handles all aspects of nested virtualization for the feature.

Thanks Sean. Let me try to rephrase my understandings of your statement(
and pls feel free to correct me):

1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/
disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes.
2> What makes sense is, if a feature is 
	a. disabled by guest CPUID, it shall not be exposed in guest VMX MSR;
	b. enabled by guest CPUID, it could be either exposed or hidden in
	guest VMX MSR.
3> So your previous change is to guarantee 2.a, and userspace VMM can choose
to follow follow either choices in 2.b(depending on whether it believes this
feature is correctly supported by KVM in nested). 

Is above understanding correct? 

But what if userspace VMM sets guest CPUID first, disabling a feature, and
then sets the guest VMX MSR bit with this feature enabled? Does KVM need to
check guest CPUID again, in vmx_restore_control_msr()? 

I do not think above scenario is what QEMU does - QEMU checks guest CPUID
first with kvm_arch_get_supported_cpuid() before trying to set guest VMX MSR.
But I am not sure if this is mandatory step for all userspace VMM. 

> 
> In other words, the behavior you're observing, where vmx->nested.msrs.secondary_ctls_high
> is changed by vmx_adjust_secondary_exec_control(), is a completely separate bug
> than the one below.
> 
> > > 
> > > Assuming KVM actually supports user wait/pause in L2, this is an orthogonal bug
> > > to the CPUID-based manipulation above.  KVM simply neglects to advertise to userspace
> > > that ENABLE_USR_WAIT_PAUSE is supported for nested virtualization.
> > > 
> > > If KVM doesn't correctly support virtualizing user wait/pause for L2, then the
> > > correct location to fix this is in vmx_secondary_exec_control().
> > > 
> > 
> > Sorry, why vmx_secondary_exec_control()?
> 
> You missed the qualifier:
> 
>   If KVM doesn't correctly support virtualizing user wait/pause for L2
> 
> If KVM should NOT be exposing ENABLE_USR_WAIT_PAUSE to the L1 VMM, then NOT
> propagating the feature to msrs->secondary_ctls_low is correct.  And if that's
> the case, then vmx_secondary_exec_control() needs to be modified so that it does
> NOT set ENABLE_USR_WAIT_PAUSE in vmx->nested.msrs.secondary_ctls_high.
> 
> > Could we just change nested_vmx_setup_ctls_msrs() like below:
> 
> If KVM correctly virtualizes the feature in a nested scenario, yes.  I haven't
> looked into ENABLE_USR_WAIT_PAUSE enough to know whether or not KVM gets the
> nested virtualization pieces correct, hence the above qualifier.
> 

Got it. I'll check that. Thanks!


B.R.
Yu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-11-07  8:28               ` Yu Zhang
@ 2022-11-07 15:06                 ` Sean Christopherson
  2022-11-08 10:21                   ` Yu Zhang
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2022-11-07 15:06 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton, Liu Jingqi

On Mon, Nov 07, 2022, Yu Zhang wrote:
> On Thu, Nov 03, 2022 at 04:53:11PM +0000, Sean Christopherson wrote:
> > Ideally, KVM should NEVER manipulate VMX MSRs in response to guest CPUID changes.
> > That's what I was referring to earlier by commits:

...

> Thanks Sean. Let me try to rephrase my understandings of your statement(
> and pls feel free to correct me):
> 
> 1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/
> disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes.
> 2> What makes sense is, if a feature is 
> 	a. disabled by guest CPUID, it shall not be exposed in guest VMX MSR;
> 	b. enabled by guest CPUID, it could be either exposed or hidden in
> 	guest VMX MSR.
> 3> So your previous change is to guarantee 2.a, and userspace VMM can choose
> to follow follow either choices in 2.b(depending on whether it believes this
> feature is correctly supported by KVM in nested). 
> 
> Is above understanding correct? 

Not quite.  Again, in an ideal world, KVM would not modify the VMX MSRs based on
guest CPUID.  But it's possible userspace is relying on KVM to hide a feature from
L2 if it's hidden from L1, so to avoid breaking an otherwise valide userspace config,
it's worth enforcing that in KVM.

> But what if userspace VMM sets guest CPUID first, disabling a feature, and
> then sets the guest VMX MSR bit with this feature enabled? Does KVM need to
> check guest CPUID again, in vmx_restore_control_msr()? 

No.  It's not KVM's responsibility to provide a sane, valid vCPU model.  So long
as KVM is not endangered by a bad config, userspace can do whatever it wants.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-11-07 15:06                 ` Sean Christopherson
@ 2022-11-08 10:21                   ` Yu Zhang
  2022-11-08 18:35                     ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Yu Zhang @ 2022-11-08 10:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton, Liu Jingqi

On Mon, Nov 07, 2022 at 03:06:51PM +0000, Sean Christopherson wrote:
> On Mon, Nov 07, 2022, Yu Zhang wrote:
> > On Thu, Nov 03, 2022 at 04:53:11PM +0000, Sean Christopherson wrote:
> > > Ideally, KVM should NEVER manipulate VMX MSRs in response to guest CPUID changes.
> > > That's what I was referring to earlier by commits:
> 
> ...
> 
> > Thanks Sean. Let me try to rephrase my understandings of your statement(
> > and pls feel free to correct me):
> > 
> > 1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/
> > disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes.
> > 2> What makes sense is, if a feature is 
> > 	a. disabled by guest CPUID, it shall not be exposed in guest VMX MSR;
> > 	b. enabled by guest CPUID, it could be either exposed or hidden in
> > 	guest VMX MSR.
> > 3> So your previous change is to guarantee 2.a, and userspace VMM can choose
> > to follow follow either choices in 2.b(depending on whether it believes this
> > feature is correctly supported by KVM in nested). 
> > 
> > Is above understanding correct? 
> 
> Not quite.  Again, in an ideal world, KVM would not modify the VMX MSRs based on
> guest CPUID.  But it's possible userspace is relying on KVM to hide a feature from
> L2 if it's hidden from L1, so to avoid breaking an otherwise valide userspace config,
> it's worth enforcing that in KVM.
> 

Sorry, maybe I should understand this way:

In theroy, KVM shall not modify guest VMX MSRs in response to the guest CPUID
updates. Therefore we shall not enforce the exposure of a feature in guest VMX
MSR, just because it is enabled in guest CPUID (e.g., userspace VMM can choose
to hide such feature so long as it believes KVM can not provide correct nested
support for this feature). 

But in reverse, it is not reasonable for userspace VMM to expose a feature in
guest VMX MSR settings, if such feature is disabled in this guest's CPUID. So
KVM shall help to make sure such feature is hidden when guest CPUID changes.


BTW, I found my previous understanding of what vmx_adjust_secondary_exec_control()
currently does was also wrong. It could also be used for EXITING controls. And
for such flags(e.g., SECONDARY_EXEC_RDRAND_EXITING), values for the nested settings
(vmx->nested.msrs.secondary_ctls_high) and for the L1 execution controls(*exec_control)
could be opposite. So the statement:
	"1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/
	 disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes."
is wrong.

Hopefully we are gonna change vmx_adjust_secondary_exec_control() soon...

B.R.
Yu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-11-08 10:21                   ` Yu Zhang
@ 2022-11-08 18:35                     ` Sean Christopherson
  2022-11-10  8:44                       ` Yu Zhang
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2022-11-08 18:35 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton, Liu Jingqi

On Tue, Nov 08, 2022, Yu Zhang wrote:
> On Mon, Nov 07, 2022 at 03:06:51PM +0000, Sean Christopherson wrote:
> > On Mon, Nov 07, 2022, Yu Zhang wrote:
> > > On Thu, Nov 03, 2022 at 04:53:11PM +0000, Sean Christopherson wrote:
> > > > Ideally, KVM should NEVER manipulate VMX MSRs in response to guest CPUID changes.
> > > > That's what I was referring to earlier by commits:
> > 
> > ...
> > 
> > > Thanks Sean. Let me try to rephrase my understandings of your statement(
> > > and pls feel free to correct me):
> > > 
> > > 1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/
> > > disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes.
> > > 2> What makes sense is, if a feature is 
> > > 	a. disabled by guest CPUID, it shall not be exposed in guest VMX MSR;
> > > 	b. enabled by guest CPUID, it could be either exposed or hidden in
> > > 	guest VMX MSR.
> > > 3> So your previous change is to guarantee 2.a, and userspace VMM can choose
> > > to follow follow either choices in 2.b(depending on whether it believes this
> > > feature is correctly supported by KVM in nested). 
> > > 
> > > Is above understanding correct? 
> > 
> > Not quite.  Again, in an ideal world, KVM would not modify the VMX MSRs based on
> > guest CPUID.  But it's possible userspace is relying on KVM to hide a feature from
> > L2 if it's hidden from L1, so to avoid breaking an otherwise valide userspace config,
> > it's worth enforcing that in KVM.
> > 
> 
> Sorry, maybe I should understand this way:
> 
> In theroy, KVM shall not modify guest VMX MSRs in response to the guest CPUID
> updates. Therefore we shall not enforce the exposure of a feature in guest VMX
> MSR, just because it is enabled in guest CPUID (e.g., userspace VMM can choose
> to hide such feature so long as it believes KVM can not provide correct nested
> support for this feature). 
> 
> But in reverse, it is not reasonable for userspace VMM to expose a feature in
> guest VMX MSR settings, if such feature is disabled in this guest's CPUID. So
> KVM shall help to make sure such feature is hidden when guest CPUID changes.

No.  Again, KVM _should never_ manipulate VMX MSRs in response to CPUID changes.
Keeping the existing behavior would be done purely to maintain backwards
compability with existing userspace, not because it's strictly the right thing to do.

E.g. as a strawman, a weird userspace could do KVM_SET_MSRS => KVM_SET_CPUID =>
KVM_SET_CPUID, where the first KVM_SET_CPUID reset to a base config and the second
KVM_SET_CPUID incorporates "optional" features.  In that case, clearing bits in
the VMX MSRs on the first KVM_SET_CPUID would do the wrong thing if the second
KVM_SET_CPUID enabled the relevant features.

AFAIK, no userspace actually does something odd like that, whereas there are VMMs
that do KVM_SET_MSRS before KVM_SET_CPUID, e.g. disable a feature in VMX MSRs but
later enable the feature in CPUID for L1.  And so disabling features is likely
safe-ish, but enabling feature most definitely can cause problems for userspace.

Hrm, actually, there are likely older VMMs that never set VMX MSRs, and so dropping
the "enable features" code might not be safe either.  Grr.  The obvious solution
would be to add a quirk, but maybe we can avoid a quirk by skipping KVM's
misguided updates if userspace has set the MSR.  That should work for a userspace
that deliberately sets the MSR during setup, and for a userspace that blindly
migrates the MSR since the migrated value should already be correct/sane.

E.g.

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 45162c1bcd8f..671479cd7721 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -51,6 +51,7 @@ struct nested_vmx_msrs {
        u64 cr4_fixed1;
        u64 vmcs_enum;
        u64 vmfunc_controls;
+       bool secondary_set_by_userspace;
 };
 
 struct vmcs_config {
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 62e3967cf131..3f691ed169d8 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1257,6 +1257,9 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
        if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
                return -EINVAL;
 
+       if (msr_index == MSR_IA32_VMX_PROCBASED_CTLS2)
+               vmx->nested.msrs.secondary_set_by_userspace = true;
+
        vmx_get_control_msr(&vmx->nested.msrs, msr_index, &lowp, &highp);
        *lowp = data;
        *highp = data >> 32;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ab89755dce66..8aadaae5b81e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4523,7 +4523,7 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
         * Update the nested MSR settings so that a nested VMM can/can't set
         * controls for features that are/aren't exposed to the guest.
         */
-       if (nested) {
+       if (nested && !vmx->nested.msrs.secondary_set_by_userspace) {
                if (enabled)
                        vmx->nested.msrs.secondary_ctls_high |= control;
                else


> BTW, I found my previous understanding of what vmx_adjust_secondary_exec_control()
> currently does was also wrong. It could also be used for EXITING controls. And
> for such flags(e.g., SECONDARY_EXEC_RDRAND_EXITING), values for the nested settings
> (vmx->nested.msrs.secondary_ctls_high) and for the L1 execution controls(*exec_control)
> could be opposite. So the statement:
> 	"1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/
> 	 disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes."
> is wrong.

No, it's correct.  The EXITING controls are just inverted feature flags.  E.g. if
RDRAND is disabled in CPUID, KVM sets the EXITING control so that KVM intercepts
RDRAND in order to inject #UD.

	[EXIT_REASON_RDRAND]                  = kvm_handle_invalid_op,

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-11-08 18:35                     ` Sean Christopherson
@ 2022-11-10  8:44                       ` Yu Zhang
  2022-11-10 16:08                         ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Yu Zhang @ 2022-11-10  8:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton, Liu Jingqi

> 
> No.  Again, KVM _should never_ manipulate VMX MSRs in response to CPUID changes.
> Keeping the existing behavior would be done purely to maintain backwards
> compability with existing userspace, not because it's strictly the right thing to do.
> 
> E.g. as a strawman, a weird userspace could do KVM_SET_MSRS => KVM_SET_CPUID =>
> KVM_SET_CPUID, where the first KVM_SET_CPUID reset to a base config and the second
> KVM_SET_CPUID incorporates "optional" features.  In that case, clearing bits in
> the VMX MSRs on the first KVM_SET_CPUID would do the wrong thing if the second
> KVM_SET_CPUID enabled the relevant features.
> 
> AFAIK, no userspace actually does something odd like that, whereas there are VMMs
> that do KVM_SET_MSRS before KVM_SET_CPUID, e.g. disable a feature in VMX MSRs but
> later enable the feature in CPUID for L1.  And so disabling features is likely
> safe-ish, but enabling feature most definitely can cause problems for userspace.
> 
> Hrm, actually, there are likely older VMMs that never set VMX MSRs, and so dropping
> the "enable features" code might not be safe either.  Grr.  The obvious solution
> would be to add a quirk, but maybe we can avoid a quirk by skipping KVM's
> misguided updates if userspace has set the MSR.  That should work for a userspace
> that deliberately sets the MSR during setup, and for a userspace that blindly
> migrates the MSR since the migrated value should already be correct/sane.
> 
Oh. Just saw your new selftest code, and fininally get your point(I hope
so...).  Thanks!

> > BTW, I found my previous understanding of what vmx_adjust_secondary_exec_control()
> > currently does was also wrong. It could also be used for EXITING controls. And
> > for such flags(e.g., SECONDARY_EXEC_RDRAND_EXITING), values for the nested settings
> > (vmx->nested.msrs.secondary_ctls_high) and for the L1 execution controls(*exec_control)
> > could be opposite. So the statement:
> > 	"1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/
> > 	 disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes."
> > is wrong.
> 
> No, it's correct.  The EXITING controls are just inverted feature flags.  E.g. if
> RDRAND is disabled in CPUID, KVM sets the EXITING control so that KVM intercepts
> RDRAND in order to inject #UD.
> 
> 	[EXIT_REASON_RDRAND]                  = kvm_handle_invalid_op,
> 

Well, suppose
- cpu_has_vmx_rdrand() is true;
- meanwhile guest_cpuid_has(vcpu, X86_FEATURE_RDRAND) is false.

And then, what vmx_adjust_secondary_exec_control() currently does is:
1> keep the SECONDARY_EXEC_RDRAND_EXITING set in L1 secondary proc-
based execution control.
2> and then clear the SECONDARY_EXEC_RDRAND_EXITING in the high bits
of IA32_VMX_PROCBASED_CTLS2 MSR for nested by
        vmx->nested.msrs.secondary_ctls_high &= ~control;
That means for L1 VMM, SECONDARY_EXEC_RDRAND_EXITING must be cleared
in its(VMCS12's) secondary proc-based VM-execution control, even when
rdrand is disabled in L1's and L2's CPUID.

I wonder, for native environment, if an instruction is not supported,
will the allowed 1-setting for its corresponding exiting feature in
IA32_VMX_PROCBASED_CTLS2 MSR be set, or be cleared? Maybe it should
be cleared, and executing such instruction in non-root will just get
a #UD directly instead of triggering a VM-Exit?

Note: I do not think this will cause any problem, just curious if L1
VMM can observe a behavior that's not supposed to be in native scenario(
only because what we are doing in KVM). 

B.R.
Yu


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
  2022-11-10  8:44                       ` Yu Zhang
@ 2022-11-10 16:08                         ` Sean Christopherson
  0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2022-11-10 16:08 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Eric Li, David Matlack,
	Oliver Upton, Liu Jingqi

On Thu, Nov 10, 2022, Yu Zhang wrote:
> > > BTW, I found my previous understanding of what vmx_adjust_secondary_exec_control()
> > > currently does was also wrong. It could also be used for EXITING controls. And
> > > for such flags(e.g., SECONDARY_EXEC_RDRAND_EXITING), values for the nested settings
> > > (vmx->nested.msrs.secondary_ctls_high) and for the L1 execution controls(*exec_control)
> > > could be opposite. So the statement:
> > > 	"1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/
> > > 	 disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes."
> > > is wrong.
> > 
> > No, it's correct.  The EXITING controls are just inverted feature flags.  E.g. if
> > RDRAND is disabled in CPUID, KVM sets the EXITING control so that KVM intercepts
> > RDRAND in order to inject #UD.
> > 
> > 	[EXIT_REASON_RDRAND]                  = kvm_handle_invalid_op,
> > 
> 
> Well, suppose
> - cpu_has_vmx_rdrand() is true;
> - meanwhile guest_cpuid_has(vcpu, X86_FEATURE_RDRAND) is false.
> 
> And then, what vmx_adjust_secondary_exec_control() currently does is:
> 1> keep the SECONDARY_EXEC_RDRAND_EXITING set in L1 secondary proc-
> based execution control.
> 2> and then clear the SECONDARY_EXEC_RDRAND_EXITING in the high bits
> of IA32_VMX_PROCBASED_CTLS2 MSR for nested by
>         vmx->nested.msrs.secondary_ctls_high &= ~control;
> That means for L1 VMM, SECONDARY_EXEC_RDRAND_EXITING must be cleared
> in its(VMCS12's) secondary proc-based VM-execution control, even when
> rdrand is disabled in L1's and L2's CPUID.

Again, it is _userspace's_ responsibility to provide a sane, consistent CPU model
to the guest.

> I wonder, for native environment, if an instruction is not supported,
> will the allowed 1-setting for its corresponding exiting feature in
> IA32_VMX_PROCBASED_CTLS2 MSR be set, or be cleared? Maybe it should
> be cleared, and executing such instruction in non-root will just get
> a #UD directly instead of triggering a VM-Exit?

For any reasonable interpretation of the SDM, it's a moot point.  The SDM doesn't
call out these scenarios for instructions like RDTSCP because they're nonsensical,
but for other instructions that can be trapped by the hypervisor and can take a
#UD when they're supported, the #UD is prioritized of the VM-Exit.  E.g. VMX
instructions have pseudocode like:

  IF not in VMX operation
    THEN #UD;
  ELSIF in VMX non-root operation
    THEN VM exit;

In other words, if the CPU doesn't recognize an instruction, it will generate a
#UD without getting to the (presumed) microcode flow that checks for VM-Exit.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-11-10 16:08 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 01/15] KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 02/15] KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 03/15] KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4 Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 04/15] KVM: nVMX: Rename handle_vm{on,off}() to handle_vmx{on,off}() Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value Sean Christopherson
2022-10-31 16:39   ` Yu Zhang
2022-10-31 17:11     ` Sean Christopherson
2022-11-01 10:18       ` Yu Zhang
2022-11-01 17:58         ` Sean Christopherson
2022-11-02  8:54           ` Yu Zhang
2022-11-03 16:53             ` Sean Christopherson
2022-11-07  8:28               ` Yu Zhang
2022-11-07 15:06                 ` Sean Christopherson
2022-11-08 10:21                   ` Yu Zhang
2022-11-08 18:35                     ` Sean Christopherson
2022-11-10  8:44                       ` Yu Zhang
2022-11-10 16:08                         ` Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 06/15] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write Sean Christopherson
2022-07-22  9:06   ` Paolo Bonzini
2022-06-07 21:35 ` [PATCH v5 07/15] KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 08/15] KVM: nVMX: Keep KVM updates to PERF_GLOBAL_CTRL ctrl bits across MSR write Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 09/15] KVM: nVMX: Drop nested_vmx_pmu_refresh() Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 10/15] KVM: nVMX: Add a quirk for KVM tweaks to VMX MSRs Sean Christopherson
2022-06-07 21:36 ` [PATCH v5 11/15] KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP Sean Christopherson
2022-07-22  9:49   ` Paolo Bonzini
2022-06-07 21:36 ` [PATCH v5 12/15] KVM: nVMX: Extend VMX MSRs quirk to CR0/4 fixed1 bits Sean Christopherson
2022-07-22  9:50   ` Paolo Bonzini
2022-06-07 21:36 ` [PATCH v5 13/15] KVM: selftests: Add test to verify KVM's VMX MSRs quirk for controls Sean Christopherson
2022-06-07 21:36 ` [PATCH v5 14/15] KVM: selftests: Extend VMX MSRs test to cover CR4_FIXED1 (and its quirks) Sean Christopherson
2022-06-07 21:36 ` [PATCH v5 15/15] KVM: selftests: Verify VMX MSRs can be restored to KVM-supported values Sean Christopherson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.