linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some...
@ 2019-09-27 21:45 Sean Christopherson
  2019-09-27 21:45 ` [PATCH v2 1/8] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
                   ` (8 more replies)
  0 siblings, 9 replies; 34+ messages in thread
From: Sean Christopherson @ 2019-09-27 21:45 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki, Liran Alon

*sigh*

v2 was shaping up to be a trivial update, until I started working on
Vitaly's suggestion to add a helper to test for register availability.

The primary purpose of this series is to fix a CR3 corruption in L2
reported by Reto Buerki when running with HLT interception disabled in L1.
On a nested VM-Enter that puts L2 into HLT, KVM never actually enters L2
and instead mimics HLT interception by canceling the nested run and
pretending that VM-Enter to L2 completed and then exited on HLT (which
KVM intercepted).  Because KVM never actually runs L2, KVM skips the
pending MMU update for L2 and so leaves a stale value in vmcs02.GUEST_CR3.
If the next wake event for L2 triggers a nested VM-Exit, KVM will refresh
vmcs12->guest_cr3 from vmcs02.GUEST_CR3 and consume the stale value.

Fix the issue by unconditionally writing vmcs02.GUEST_CR3 during nested
VM-Enter instead of deferring the update to vmx_set_cr3(), and skip the
update of GUEST_CR3 in vmx_set_cr3() when running L2.  I.e. make the
nested code fully responsible for vmcs02.GUEST_CR3.

Patch 02/08 is a minor optimization to skip the GUEST_CR3 update if
vmcs01 is already up-to-date.

Patches 03 and beyond are Vitaly's fault ;-).

Patches 03 and 04 are tangentially related cleanup to vmx_set_rflags()
that was discovered when working through the avail/dirty testing code.
Ideally they'd be sent as a separate series, but they conflict with the
avail/dirty helper changes and are themselves minor and straightforward.

Patches 05 and 06 clean up the register caching code so that there is a
single enum for all registers which use avail/dirty tracking.  While not
a true prerequisite for the avail/dirty helpers, the cleanup allows the
new helpers to take an 'enum kvm_reg' instead of a less helpful 'int reg'.

Patch 07 is the helpers themselves, as suggested by Vitaly.

Patch 08 is a truly optional change to ditch decache_cr3() in favor of
handling CR3 via cache_reg() like any other avail/dirty register.


Note, I collected the Reviewed-by and Tested-by tags for patches 01 and 02
even though I inverted the boolean from 'skip_cr3' to 'update_guest_cr3'.
Please drop the tags if that constitutes a non-trivial functional change.

v2:
  - Invert skip_cr3 to update_guest_cr3.  [Liran]
  - Reword the changelog and comment to be more explicit in detailing
    how/when KVM will process a nested VM-Enter without running L2.  [Liran]
  - Added Reviewed-by and Tested-by tags.
  - Add a comment in vmx_set_cr3() to explicitly state that nested
    VM-Enter is responsible for loading vmcs02.GUEST_CR3.  [Jim]
  - All of the loveliness in patches 03-08. [Vitaly]

Sean Christopherson (8):
  KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
  KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors
  KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
  KVM: x86: Add WARNs to detect out-of-bounds register indices
  KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'
  KVM: x86: Add helpers to test/mark reg availability and dirtiness
  KVM: x86: Fold decache_cr3() into cache_reg()

 arch/x86/include/asm/kvm_host.h |  5 +-
 arch/x86/kvm/kvm_cache_regs.h   | 67 +++++++++++++++++------
 arch/x86/kvm/svm.c              |  5 --
 arch/x86/kvm/vmx/nested.c       | 14 ++++-
 arch/x86/kvm/vmx/vmx.c          | 94 ++++++++++++++++++---------------
 arch/x86/kvm/x86.c              | 13 ++---
 arch/x86/kvm/x86.h              |  6 +--
 7 files changed, 123 insertions(+), 81 deletions(-)

-- 
2.22.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 1/8] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
@ 2019-09-27 21:45 ` Sean Christopherson
  2019-09-27 23:37   ` Jim Mattson
  2019-09-27 21:45 ` [PATCH v2 2/8] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date Sean Christopherson
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2019-09-27 21:45 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki, Liran Alon

Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
instead of deferring the VMWRITE until vmx_set_cr3().  If the VMWRITE
is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
VM-Exit occurs without actually entering L2, e.g. if the nested run
is squashed because nested VM-Enter (from L1) is putting L2 into HLT.

Note, the above scenario can occur regardless of whether L1 is
intercepting HLT, e.g. L1 can intercept HLT and then re-enter L2 with
vmcs.GUEST_ACTIVITY_STATE=HALTED.  But practically speaking, a VMM will
likely put a guest into HALTED if and only if it's not intercepting HLT.

In an ideal world where EPT *requires* unrestricted guest (and vice
versa), VMX could handle CR3 similar to how it handles RSP and RIP,
e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run().  But
the unrestricted guest silliness complicates the dirty tracking logic
to the point that explicitly handling vmcs02.GUEST_CR3 during nested
VM-Enter is a simpler overall implementation.

Cc: stable@vger.kernel.org
Reported-and-tested-by: Reto Buerki <reet@codelabs.ch>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 10 ++++++++++
 arch/x86/kvm/vmx/vmx.c    | 10 +++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 41abc62c9a8a..b72a00b53e4a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2418,6 +2418,16 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 				entry_failure_code))
 		return -EINVAL;
 
+	/*
+	 * Immediately write vmcs02.GUEST_CR3.  It will be propagated to vmcs12
+	 * on nested VM-Exit, which can occur without actually running L2 and
+	 * thus without hitting vmx_set_cr3(), e.g. if L1 is entering L2 with
+	 * vmcs12.GUEST_ACTIVITY_STATE=HLT, in which case KVM will intercept the
+	 * transition to HLT instead of running L2.
+	 */
+	if (enable_ept)
+		vmcs_writel(GUEST_CR3, vmcs12->guest_cr3);
+
 	/* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
 	if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
 	    is_pae_paging(vcpu)) {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d4575ffb3cec..7679c2a05a50 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2984,6 +2984,7 @@ u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
 void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	struct kvm *kvm = vcpu->kvm;
+	bool update_guest_cr3 = true;
 	unsigned long guest_cr3;
 	u64 eptp;
 
@@ -3000,15 +3001,18 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 			spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
 		}
 
-		if (enable_unrestricted_guest || is_paging(vcpu) ||
-		    is_guest_mode(vcpu))
+		/* Loading vmcs02.GUEST_CR3 is handled by nested VM-Enter. */
+		if (is_guest_mode(vcpu))
+			update_guest_cr3 = false;
+		else if (enable_unrestricted_guest || is_paging(vcpu))
 			guest_cr3 = kvm_read_cr3(vcpu);
 		else
 			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
 		ept_load_pdptrs(vcpu);
 	}
 
-	vmcs_writel(GUEST_CR3, guest_cr3);
+	if (update_guest_cr3)
+		vmcs_writel(GUEST_CR3, guest_cr3);
 }
 
 int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-- 
2.22.0



* [PATCH v2 2/8] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
  2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
  2019-09-27 21:45 ` [PATCH v2 1/8] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
@ 2019-09-27 21:45 ` Sean Christopherson
  2019-09-27 21:45 ` [PATCH v2 3/8] KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors Sean Christopherson
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 34+ messages in thread
From: Sean Christopherson @ 2019-09-27 21:45 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki, Liran Alon

Skip the VMWRITE to update GUEST_CR3 if CR3 is not available, i.e. has
not been read from the VMCS since the last VM-Enter.  If vcpu->arch.cr3
is stale, kvm_read_cr3(vcpu) will refresh vcpu->arch.cr3 from the VMCS,
meaning KVM will do a VMREAD and then VMWRITE the value it just pulled
from the VMCS.

Note, this is a purely theoretical optimization: no instances of the
redundant VMREAD+VMWRITE have actually been observed.

Tested-by: Reto Buerki <reet@codelabs.ch>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7679c2a05a50..0b8dd9c315f8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3004,10 +3004,12 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		/* Loading vmcs02.GUEST_CR3 is handled by nested VM-Enter. */
 		if (is_guest_mode(vcpu))
 			update_guest_cr3 = false;
-		else if (enable_unrestricted_guest || is_paging(vcpu))
-			guest_cr3 = kvm_read_cr3(vcpu);
-		else
+		else if (!enable_unrestricted_guest && !is_paging(vcpu))
 			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
+		else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
+			guest_cr3 = vcpu->arch.cr3;
+		else /* vmcs01.GUEST_CR3 is already up-to-date. */
+			update_guest_cr3 = false;
 		ept_load_pdptrs(vcpu);
 	}
 
-- 
2.22.0



* [PATCH v2 3/8] KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors
  2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
  2019-09-27 21:45 ` [PATCH v2 1/8] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
  2019-09-27 21:45 ` [PATCH v2 2/8] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date Sean Christopherson
@ 2019-09-27 21:45 ` Sean Christopherson
  2019-09-30  8:48   ` Vitaly Kuznetsov
  2019-09-27 21:45 ` [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest Sean Christopherson
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2019-09-27 21:45 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki, Liran Alon

Capture struct vcpu_vmx in a local variable to improve the readability
of vmx_{g,s}et_rflags().

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0b8dd9c315f8..83fe8b02b732 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1407,35 +1407,37 @@ static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu);
 
 unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long rflags, save_rflags;
 
 	if (!test_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail)) {
 		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
 		rflags = vmcs_readl(GUEST_RFLAGS);
-		if (to_vmx(vcpu)->rmode.vm86_active) {
+		if (vmx->rmode.vm86_active) {
 			rflags &= RMODE_GUEST_OWNED_EFLAGS_BITS;
-			save_rflags = to_vmx(vcpu)->rmode.save_rflags;
+			save_rflags = vmx->rmode.save_rflags;
 			rflags |= save_rflags & ~RMODE_GUEST_OWNED_EFLAGS_BITS;
 		}
-		to_vmx(vcpu)->rflags = rflags;
+		vmx->rflags = rflags;
 	}
-	return to_vmx(vcpu)->rflags;
+	return vmx->rflags;
 }
 
 void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long old_rflags = vmx_get_rflags(vcpu);
 
 	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
-	to_vmx(vcpu)->rflags = rflags;
-	if (to_vmx(vcpu)->rmode.vm86_active) {
-		to_vmx(vcpu)->rmode.save_rflags = rflags;
+	vmx->rflags = rflags;
+	if (vmx->rmode.vm86_active) {
+		vmx->rmode.save_rflags = rflags;
 		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
 	}
 	vmcs_writel(GUEST_RFLAGS, rflags);
 
-	if ((old_rflags ^ to_vmx(vcpu)->rflags) & X86_EFLAGS_VM)
-		to_vmx(vcpu)->emulation_required = emulation_required(vcpu);
+	if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
+		vmx->emulation_required = emulation_required(vcpu);
 }
 
 u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu)
-- 
2.22.0



* [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
  2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
                   ` (2 preceding siblings ...)
  2019-09-27 21:45 ` [PATCH v2 3/8] KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors Sean Christopherson
@ 2019-09-27 21:45 ` Sean Christopherson
  2019-09-30  8:57   ` Vitaly Kuznetsov
  2019-10-09 10:40   ` Paolo Bonzini
  2019-09-27 21:45 ` [PATCH v2 5/8] KVM: x86: Add WARNs to detect out-of-bounds register indices Sean Christopherson
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Sean Christopherson @ 2019-09-27 21:45 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki, Liran Alon

Rework vmx_set_rflags() to avoid the extra code needed to handle emulation
of real mode and invalid state when unrestricted guest is disabled.  The
primary reason for doing so is to avoid the call to vmx_get_rflags(),
which will incur a VMREAD when RFLAGS is not already available.  When
running nested VMs, the majority of calls to vmx_set_rflags() will occur
without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS
during transitions between vmcs01 and vmcs02.

Note, vmx_get_rflags() guarantees RFLAGS is marked available.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 83fe8b02b732..814d3e6d0264 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1426,18 +1426,26 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
 void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long old_rflags = vmx_get_rflags(vcpu);
+	unsigned long old_rflags;
 
-	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
-	vmx->rflags = rflags;
-	if (vmx->rmode.vm86_active) {
-		vmx->rmode.save_rflags = rflags;
-		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
+	if (enable_unrestricted_guest) {
+		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
+
+		vmx->rflags = rflags;
+		vmcs_writel(GUEST_RFLAGS, rflags);
+	} else {
+		old_rflags = vmx_get_rflags(vcpu);
+
+		vmx->rflags = rflags;
+		if (vmx->rmode.vm86_active) {
+			vmx->rmode.save_rflags = rflags;
+			rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
+		}
+		vmcs_writel(GUEST_RFLAGS, rflags);
+
+		if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
+			vmx->emulation_required = emulation_required(vcpu);
 	}
-	vmcs_writel(GUEST_RFLAGS, rflags);
-
-	if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
-		vmx->emulation_required = emulation_required(vcpu);
 }
 
 u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu)
-- 
2.22.0



* [PATCH v2 5/8] KVM: x86: Add WARNs to detect out-of-bounds register indices
  2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
                   ` (3 preceding siblings ...)
  2019-09-27 21:45 ` [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest Sean Christopherson
@ 2019-09-27 21:45 ` Sean Christopherson
  2019-09-30  9:19   ` Vitaly Kuznetsov
  2019-10-09 10:50   ` Paolo Bonzini
  2019-09-27 21:45 ` [PATCH v2 6/8] KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg' Sean Christopherson
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Sean Christopherson @ 2019-09-27 21:45 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki, Liran Alon

Add WARN_ON_ONCE() checks in kvm_register_{read,write}() to detect reg
values that would cause KVM to overflow vcpu->arch.regs.  Change the reg
param to an 'int' to make it clear that the reg index is unverified.

Open code the RIP and RSP accessors to avoid the pointless overhead of
WARN_ON_ONCE().  Alternatively, lower-level helpers could be provided,
but that opens the door for improper use of said helpers, and the
ugliness of the open-coding will be slightly improved in future patches.

Regarding the overhead of WARN_ON_ONCE(), now that all fixed GPR reads
and writes use dedicated accessors, e.g. kvm_rax_read(), the overhead
is limited to flows where the reg index is generated at runtime.  And
there is at least one historical bug where KVM has generated an out-of-
bounds access to arch.regs (see commit b68f3cc7d9789, "KVM: x86: Always
use 32-bit SMRAM save state for 32-bit kernels").

Adding the WARN_ON_ONCE() protection paves the way for additional
cleanup related to kvm_reg and kvm_reg_ex.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/kvm_cache_regs.h | 30 ++++++++++++++++++++++--------
 arch/x86/kvm/x86.h            |  6 ++----
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 1cc6c47dc77e..3972e1b65635 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -37,19 +37,23 @@ BUILD_KVM_GPR_ACCESSORS(r14, R14)
 BUILD_KVM_GPR_ACCESSORS(r15, R15)
 #endif
 
-static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu,
-					      enum kvm_reg reg)
+static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
 {
+	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
+		return 0;
+
 	if (!test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail))
 		kvm_x86_ops->cache_reg(vcpu, reg);
 
 	return vcpu->arch.regs[reg];
 }
 
-static inline void kvm_register_write(struct kvm_vcpu *vcpu,
-				      enum kvm_reg reg,
+static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
 				      unsigned long val)
 {
+	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
+		return;
+
 	vcpu->arch.regs[reg] = val;
 	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
 	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
@@ -57,22 +61,32 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu,
 
 static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
 {
-	return kvm_register_read(vcpu, VCPU_REGS_RIP);
+	if (!test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_avail))
+		kvm_x86_ops->cache_reg(vcpu, VCPU_REGS_RIP);
+
+	return vcpu->arch.regs[VCPU_REGS_RIP];
 }
 
 static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
 {
-	kvm_register_write(vcpu, VCPU_REGS_RIP, val);
+	vcpu->arch.regs[VCPU_REGS_RIP] = val;
+	__set_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty);
+	__set_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_avail);
 }
 
 static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
 {
-	return kvm_register_read(vcpu, VCPU_REGS_RSP);
+	if (!test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_avail))
+		kvm_x86_ops->cache_reg(vcpu, VCPU_REGS_RSP);
+
+	return vcpu->arch.regs[VCPU_REGS_RSP];
 }
 
 static inline void kvm_rsp_write(struct kvm_vcpu *vcpu, unsigned long val)
 {
-	kvm_register_write(vcpu, VCPU_REGS_RSP, val);
+	vcpu->arch.regs[VCPU_REGS_RSP] = val;
+	__set_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty);
+	__set_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_avail);
 }
 
 static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index dbf7442a822b..45d82b8277e5 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -238,8 +238,7 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	return false;
 }
 
-static inline unsigned long kvm_register_readl(struct kvm_vcpu *vcpu,
-					       enum kvm_reg reg)
+static inline unsigned long kvm_register_readl(struct kvm_vcpu *vcpu, int reg)
 {
 	unsigned long val = kvm_register_read(vcpu, reg);
 
@@ -247,8 +246,7 @@ static inline unsigned long kvm_register_readl(struct kvm_vcpu *vcpu,
 }
 
 static inline void kvm_register_writel(struct kvm_vcpu *vcpu,
-				       enum kvm_reg reg,
-				       unsigned long val)
+				       int reg, unsigned long val)
 {
 	if (!is_64_bit_mode(vcpu))
 		val = (u32)val;
-- 
2.22.0



* [PATCH v2 6/8] KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'
  2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
                   ` (4 preceding siblings ...)
  2019-09-27 21:45 ` [PATCH v2 5/8] KVM: x86: Add WARNs to detect out-of-bounds register indices Sean Christopherson
@ 2019-09-27 21:45 ` Sean Christopherson
  2019-09-30  9:25   ` Vitaly Kuznetsov
  2019-09-27 21:45 ` [PATCH v2 7/8] KVM: x86: Add helpers to test/mark reg availability and dirtiness Sean Christopherson
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2019-09-27 21:45 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki, Liran Alon

Now that indexing into arch.regs is either protected by WARN_ON_ONCE or
done with hardcoded enums, combine all definitions for registers that
are tracked by regs_avail and regs_dirty into 'enum kvm_reg'.  Having a
single enum type will simplify additional cleanup related to regs_avail
and regs_dirty.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 4 +---
 arch/x86/kvm/kvm_cache_regs.h   | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 23edf56cf577..a27f7f6b6b7a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -156,10 +156,8 @@ enum kvm_reg {
 	VCPU_REGS_R15 = __VCPU_REGS_R15,
 #endif
 	VCPU_REGS_RIP,
-	NR_VCPU_REGS
-};
+	NR_VCPU_REGS,
 
-enum kvm_reg_ex {
 	VCPU_EXREG_PDPTR = NR_VCPU_REGS,
 	VCPU_EXREG_CR3,
 	VCPU_EXREG_RFLAGS,
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 3972e1b65635..b85fc4b4e04f 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -95,7 +95,7 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
 
 	if (!test_bit(VCPU_EXREG_PDPTR,
 		      (unsigned long *)&vcpu->arch.regs_avail))
-		kvm_x86_ops->cache_reg(vcpu, (enum kvm_reg)VCPU_EXREG_PDPTR);
+		kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
 
 	return vcpu->arch.walk_mmu->pdptrs[index];
 }
-- 
2.22.0



* [PATCH v2 7/8] KVM: x86: Add helpers to test/mark reg availability and dirtiness
  2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
                   ` (5 preceding siblings ...)
  2019-09-27 21:45 ` [PATCH v2 6/8] KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg' Sean Christopherson
@ 2019-09-27 21:45 ` Sean Christopherson
  2019-09-30  9:32   ` Vitaly Kuznetsov
  2019-09-27 21:45 ` [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg() Sean Christopherson
  2019-09-30 10:42 ` [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Reto Buerki
  8 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2019-09-27 21:45 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki, Liran Alon

Add helpers to prettify code that tests and/or marks whether or not a
register is available and/or dirty.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/kvm_cache_regs.h | 45 +++++++++++++++++++++++++----------
 arch/x86/kvm/vmx/nested.c     |  4 ++--
 arch/x86/kvm/vmx/vmx.c        | 29 ++++++++++------------
 arch/x86/kvm/x86.c            | 13 ++++------
 4 files changed, 53 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index b85fc4b4e04f..9c2bc528800b 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -37,12 +37,37 @@ BUILD_KVM_GPR_ACCESSORS(r14, R14)
 BUILD_KVM_GPR_ACCESSORS(r15, R15)
 #endif
 
+static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
+					     enum kvm_reg reg)
+{
+	return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+}
+
+static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
+					 enum kvm_reg reg)
+{
+	return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
+}
+
+static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
+					       enum kvm_reg reg)
+{
+	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+}
+
+static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
+					   enum kvm_reg reg)
+{
+	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
+}
+
 static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
 {
 	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
 		return 0;
 
-	if (!test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail))
+	if (!kvm_register_is_available(vcpu, reg))
 		kvm_x86_ops->cache_reg(vcpu, reg);
 
 	return vcpu->arch.regs[reg];
@@ -55,13 +80,12 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
 		return;
 
 	vcpu->arch.regs[reg] = val;
-	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
-	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+	kvm_register_mark_dirty(vcpu, reg);
 }
 
 static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
 {
-	if (!test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_avail))
+	if (!kvm_register_is_available(vcpu, VCPU_REGS_RIP))
 		kvm_x86_ops->cache_reg(vcpu, VCPU_REGS_RIP);
 
 	return vcpu->arch.regs[VCPU_REGS_RIP];
@@ -70,13 +94,12 @@ static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
 static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
 {
 	vcpu->arch.regs[VCPU_REGS_RIP] = val;
-	__set_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty);
-	__set_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_avail);
+	kvm_register_mark_dirty(vcpu, VCPU_REGS_RIP);
 }
 
 static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
 {
-	if (!test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_avail))
+	if (!kvm_register_is_available(vcpu, VCPU_REGS_RSP))
 		kvm_x86_ops->cache_reg(vcpu, VCPU_REGS_RSP);
 
 	return vcpu->arch.regs[VCPU_REGS_RSP];
@@ -85,16 +108,14 @@ static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
 static inline void kvm_rsp_write(struct kvm_vcpu *vcpu, unsigned long val)
 {
 	vcpu->arch.regs[VCPU_REGS_RSP] = val;
-	__set_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty);
-	__set_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_avail);
+	kvm_register_mark_dirty(vcpu, VCPU_REGS_RSP);
 }
 
 static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
 {
 	might_sleep();  /* on svm */
 
-	if (!test_bit(VCPU_EXREG_PDPTR,
-		      (unsigned long *)&vcpu->arch.regs_avail))
+	if (!kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR))
 		kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
 
 	return vcpu->arch.walk_mmu->pdptrs[index];
@@ -123,7 +144,7 @@ static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
 
 static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
 {
-	if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
+	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
 		kvm_x86_ops->decache_cr3(vcpu);
 	return vcpu->arch.cr3;
 }
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b72a00b53e4a..8946f11c574c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1012,7 +1012,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 		kvm_mmu_new_cr3(vcpu, cr3, false);
 
 	vcpu->arch.cr3 = cr3;
-	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
+	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
 	kvm_init_mmu(vcpu, false);
 
@@ -3986,7 +3986,7 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
 
 	nested_ept_uninit_mmu_context(vcpu);
 	vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
-	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
+	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
 	/*
 	 * Use ept_save_pdptrs(vcpu) to load the MMU's cached PDPTRs
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 814d3e6d0264..ed03d0cd1cc8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -721,8 +721,8 @@ static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg,
 	bool ret;
 	u32 mask = 1 << (seg * SEG_FIELD_NR + field);
 
-	if (!(vmx->vcpu.arch.regs_avail & (1 << VCPU_EXREG_SEGMENTS))) {
-		vmx->vcpu.arch.regs_avail |= (1 << VCPU_EXREG_SEGMENTS);
+	if (!kvm_register_is_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS)) {
+		kvm_register_mark_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS);
 		vmx->segment_cache.bitmask = 0;
 	}
 	ret = vmx->segment_cache.bitmask & mask;
@@ -1410,8 +1410,8 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long rflags, save_rflags;
 
-	if (!test_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail)) {
-		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
+	if (!kvm_register_is_available(vcpu, VCPU_EXREG_RFLAGS)) {
+		kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS);
 		rflags = vmcs_readl(GUEST_RFLAGS);
 		if (vmx->rmode.vm86_active) {
 			rflags &= RMODE_GUEST_OWNED_EFLAGS_BITS;
@@ -1429,7 +1429,7 @@ void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	unsigned long old_rflags;
 
 	if (enable_unrestricted_guest) {
-		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
+		kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS);
 
 		vmx->rflags = rflags;
 		vmcs_writel(GUEST_RFLAGS, rflags);
@@ -2175,7 +2175,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 {
-	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+	kvm_register_mark_available(vcpu, reg);
+
 	switch (reg) {
 	case VCPU_REGS_RSP:
 		vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP);
@@ -2862,7 +2863,7 @@ static void vmx_decache_cr3(struct kvm_vcpu *vcpu)
 {
 	if (enable_unrestricted_guest || (enable_ept && is_paging(vcpu)))
 		vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
-	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
+	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 }
 
 static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
@@ -2877,8 +2878,7 @@ static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
 
-	if (!test_bit(VCPU_EXREG_PDPTR,
-		      (unsigned long *)&vcpu->arch.regs_dirty))
+	if (!kvm_register_is_dirty(vcpu, VCPU_EXREG_PDPTR))
 		return;
 
 	if (is_pae_paging(vcpu)) {
@@ -2900,10 +2900,7 @@ void ept_save_pdptrs(struct kvm_vcpu *vcpu)
 		mmu->pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
 	}
 
-	__set_bit(VCPU_EXREG_PDPTR,
-		  (unsigned long *)&vcpu->arch.regs_avail);
-	__set_bit(VCPU_EXREG_PDPTR,
-		  (unsigned long *)&vcpu->arch.regs_dirty);
+	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
 }
 
 static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
@@ -2912,7 +2909,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
+	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
 		vmx_decache_cr3(vcpu);
 	if (!(cr0 & X86_CR0_PG)) {
 		/* From paging/starting to nonpaging */
@@ -6528,9 +6525,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vmx->nested.need_vmcs12_to_shadow_sync)
 		nested_sync_vmcs12_to_shadow(vcpu);
 
-	if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
+	if (kvm_register_is_dirty(vcpu, VCPU_REGS_RSP))
 		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
-	if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
+	if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP))
 		vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
 
 	cr3 = __get_current_cr3_fast();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0ed07d8d2caa..cd6bd7991c39 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -709,10 +709,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
 	ret = 1;
 
 	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
-	__set_bit(VCPU_EXREG_PDPTR,
-		  (unsigned long *)&vcpu->arch.regs_avail);
-	__set_bit(VCPU_EXREG_PDPTR,
-		  (unsigned long *)&vcpu->arch.regs_dirty);
+	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
+
 out:
 
 	return ret;
@@ -730,8 +728,7 @@ bool pdptrs_changed(struct kvm_vcpu *vcpu)
 	if (!is_pae_paging(vcpu))
 		return false;
 
-	if (!test_bit(VCPU_EXREG_PDPTR,
-		      (unsigned long *)&vcpu->arch.regs_avail))
+	if (!kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR))
 		return true;
 
 	gfn = (kvm_read_cr3(vcpu) & 0xffffffe0ul) >> PAGE_SHIFT;
@@ -976,7 +973,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 	kvm_mmu_new_cr3(vcpu, cr3, skip_tlb_flush);
 	vcpu->arch.cr3 = cr3;
-	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
+	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
 	return 0;
 }
@@ -8766,7 +8763,7 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	vcpu->arch.cr2 = sregs->cr2;
 	mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
 	vcpu->arch.cr3 = sregs->cr3;
-	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
+	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
 	kvm_set_cr8(vcpu, sregs->cr8);
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg()
  2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
                   ` (6 preceding siblings ...)
  2019-09-27 21:45 ` [PATCH v2 7/8] KVM: x86: Add helpers to test/mark reg availability and dirtiness Sean Christopherson
@ 2019-09-27 21:45 ` Sean Christopherson
  2019-09-30 10:58   ` Vitaly Kuznetsov
  2019-10-09 11:03   ` Paolo Bonzini
  2019-09-30 10:42 ` [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Reto Buerki
  8 siblings, 2 replies; 34+ messages in thread
From: Sean Christopherson @ 2019-09-27 21:45 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki, Liran Alon

Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common
cache_reg() callback and drop the dedicated decache_cr3().  The name
decache_cr3() is somewhat confusing as the caching behavior of CR3
follows that of GPRs, RFLAGS and PDPTRs (handled via cache_reg()), and
has nothing in common with the caching behavior of CR0/CR4 (whose
decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage).

Note, this effectively adds a BUG() if KVM attempts to cache CR3 on SVM.
Opportunistically add a WARN_ON_ONCE() in VMX to provide an equivalent
check.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/kvm_cache_regs.h   |  2 +-
 arch/x86/kvm/svm.c              |  5 -----
 arch/x86/kvm/vmx/vmx.c          | 15 ++++++---------
 4 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a27f7f6b6b7a..0411dc0a27b0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1040,7 +1040,6 @@ struct kvm_x86_ops {
 			    struct kvm_segment *var, int seg);
 	void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
 	void (*decache_cr0_guest_bits)(struct kvm_vcpu *vcpu);
-	void (*decache_cr3)(struct kvm_vcpu *vcpu);
 	void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu);
 	void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
 	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 9c2bc528800b..f18177cd0030 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -145,7 +145,7 @@ static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
 static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
-		kvm_x86_ops->decache_cr3(vcpu);
+		kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_CR3);
 	return vcpu->arch.cr3;
 }
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f8ecb6df5106..3102c44c12c6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2517,10 +2517,6 @@ static void svm_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
 {
 }
 
-static void svm_decache_cr3(struct kvm_vcpu *vcpu)
-{
-}
-
 static void svm_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
 {
 }
@@ -7208,7 +7204,6 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.get_cpl = svm_get_cpl,
 	.get_cs_db_l_bits = kvm_get_cs_db_l_bits,
 	.decache_cr0_guest_bits = svm_decache_cr0_guest_bits,
-	.decache_cr3 = svm_decache_cr3,
 	.decache_cr4_guest_bits = svm_decache_cr4_guest_bits,
 	.set_cr0 = svm_set_cr0,
 	.set_cr3 = svm_set_cr3,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ed03d0cd1cc8..c84798026e85 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2188,7 +2188,12 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 		if (enable_ept)
 			ept_save_pdptrs(vcpu);
 		break;
+	case VCPU_EXREG_CR3:
+		if (enable_unrestricted_guest || (enable_ept && is_paging(vcpu)))
+			vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
+		break;
 	default:
+		WARN_ON_ONCE(1);
 		break;
 	}
 }
@@ -2859,13 +2864,6 @@ static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
 	vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & cr0_guest_owned_bits;
 }
 
-static void vmx_decache_cr3(struct kvm_vcpu *vcpu)
-{
-	if (enable_unrestricted_guest || (enable_ept && is_paging(vcpu)))
-		vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
-	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
-}
-
 static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
 {
 	ulong cr4_guest_owned_bits = vcpu->arch.cr4_guest_owned_bits;
@@ -2910,7 +2908,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
-		vmx_decache_cr3(vcpu);
+		vmx_cache_reg(vcpu, VCPU_EXREG_CR3);
 	if (!(cr0 & X86_CR0_PG)) {
 		/* From paging/starting to nonpaging */
 		exec_controls_setbit(vmx, CPU_BASED_CR3_LOAD_EXITING |
@@ -7792,7 +7790,6 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.get_cpl = vmx_get_cpl,
 	.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
 	.decache_cr0_guest_bits = vmx_decache_cr0_guest_bits,
-	.decache_cr3 = vmx_decache_cr3,
 	.decache_cr4_guest_bits = vmx_decache_cr4_guest_bits,
 	.set_cr0 = vmx_set_cr0,
 	.set_cr3 = vmx_set_cr3,
-- 
2.22.0



* Re: [PATCH v2 1/8] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  2019-09-27 21:45 ` [PATCH v2 1/8] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
@ 2019-09-27 23:37   ` Jim Mattson
  0 siblings, 0 replies; 34+ messages in thread
From: Jim Mattson @ 2019-09-27 23:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm list, LKML,
	Reto Buerki, Liran Alon

On Fri, Sep 27, 2019 at 2:45 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
> instead of deferring the VMWRITE until vmx_set_cr3().  If the VMWRITE
> is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
> refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
> VM-Exit occurs without actually entering L2, e.g. if the nested run
> is squashed because nested VM-Enter (from L1) is putting L2 into HLT.
>
> Note, the above scenario can occur regardless of whether L1 is
> intercepting HLT, e.g. L1 can intercept HLT and then re-enter L2 with
> vmcs.GUEST_ACTIVITY_STATE=HALTED.  But practically speaking, a VMM will
> likely put a guest into HALTED if and only if it's not intercepting HLT.
>
> In an ideal world where EPT *requires* unrestricted guest (and vice
> versa), VMX could handle CR3 similar to how it handles RSP and RIP,
> e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run().  But
> the unrestricted guest silliness complicates the dirty tracking logic
> to the point that explicitly handling vmcs02.GUEST_CR3 during nested
> VM-Enter is a simpler overall implementation.
>
> Cc: stable@vger.kernel.org
> Reported-and-tested-by: Reto Buerki <reet@codelabs.ch>
> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>


* Re: [PATCH v2 3/8] KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors
  2019-09-27 21:45 ` [PATCH v2 3/8] KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors Sean Christopherson
@ 2019-09-30  8:48   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-30  8:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon, Paolo Bonzini,
	Radim Krčmář

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Capture struct vcpu_vmx in a local variable to improve the readability
> of vmx_{g,s}et_rflags().
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 0b8dd9c315f8..83fe8b02b732 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1407,35 +1407,37 @@ static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu);
>  
>  unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned long rflags, save_rflags;
>  
>  	if (!test_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail)) {
>  		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
>  		rflags = vmcs_readl(GUEST_RFLAGS);
> -		if (to_vmx(vcpu)->rmode.vm86_active) {
> +		if (vmx->rmode.vm86_active) {
>  			rflags &= RMODE_GUEST_OWNED_EFLAGS_BITS;
> -			save_rflags = to_vmx(vcpu)->rmode.save_rflags;
> +			save_rflags = vmx->rmode.save_rflags;
>  			rflags |= save_rflags & ~RMODE_GUEST_OWNED_EFLAGS_BITS;
>  		}
> -		to_vmx(vcpu)->rflags = rflags;
> +		vmx->rflags = rflags;
>  	}
> -	return to_vmx(vcpu)->rflags;
> +	return vmx->rflags;
>  }
>  
>  void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>  {
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned long old_rflags = vmx_get_rflags(vcpu);
>  
>  	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> -	to_vmx(vcpu)->rflags = rflags;
> -	if (to_vmx(vcpu)->rmode.vm86_active) {
> -		to_vmx(vcpu)->rmode.save_rflags = rflags;
> +	vmx->rflags = rflags;
> +	if (vmx->rmode.vm86_active) {
> +		vmx->rmode.save_rflags = rflags;
>  		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
>  	}
>  	vmcs_writel(GUEST_RFLAGS, rflags);
>  
> -	if ((old_rflags ^ to_vmx(vcpu)->rflags) & X86_EFLAGS_VM)
> -		to_vmx(vcpu)->emulation_required = emulation_required(vcpu);
> +	if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
> +		vmx->emulation_required = emulation_required(vcpu);
>  }
>  
>  u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu)

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly


* Re: [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
  2019-09-27 21:45 ` [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest Sean Christopherson
@ 2019-09-30  8:57   ` Vitaly Kuznetsov
  2019-09-30 15:19     ` Sean Christopherson
  2019-10-09 10:40   ` Paolo Bonzini
  1 sibling, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-30  8:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki, Liran Alon

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Rework vmx_set_rflags() to avoid the extra code needed to handle emulation
> of real mode and invalid state when unrestricted guest is disabled.  The
> primary reason for doing so is to avoid the call to vmx_get_rflags(),
> which will incur a VMREAD when RFLAGS is not already available.  When
> running nested VMs, the majority of calls to vmx_set_rflags() will occur
> without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS
> during transitions between vmcs01 and vmcs02.
>
> Note, vmx_get_rflags() guarantees RFLAGS is marked available.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 83fe8b02b732..814d3e6d0264 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1426,18 +1426,26 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
>  void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	unsigned long old_rflags = vmx_get_rflags(vcpu);
> +	unsigned long old_rflags;
>  
> -	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> -	vmx->rflags = rflags;
> -	if (vmx->rmode.vm86_active) {
> -		vmx->rmode.save_rflags = rflags;
> -		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
> +	if (enable_unrestricted_guest) {
> +		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> +
> +		vmx->rflags = rflags;
> +		vmcs_writel(GUEST_RFLAGS, rflags);
> +	} else {
> +		old_rflags = vmx_get_rflags(vcpu);
> +
> +		vmx->rflags = rflags;
> +		if (vmx->rmode.vm86_active) {
> +			vmx->rmode.save_rflags = rflags;
> +			rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
> +		}
> +		vmcs_writel(GUEST_RFLAGS, rflags);
> +
> +		if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
> +			vmx->emulation_required = emulation_required(vcpu);
>  	}
> -	vmcs_writel(GUEST_RFLAGS, rflags);

We're doing vmcs_writel() in both branches so it could've stayed here, right?
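For illustration, here is a minimal user-space sketch of that alternative shape, with the single vmcs_writel() kept after the if/else. Every name below is a hypothetical stand-in for the real KVM/VMX code (the emulation_required handling from the actual patch is elided):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the real KVM/VMX state -- every name here is hypothetical. */
static bool enable_unrestricted_guest;
static bool vm86_active;
static unsigned long vmx_rflags;          /* mimics vmx->rflags            */
static unsigned long rmode_save_rflags;   /* mimics vmx->rmode.save_rflags */
static unsigned long guest_rflags_field;  /* mimics the GUEST_RFLAGS field */

#define X86_EFLAGS_IOPL 0x3000UL
#define X86_EFLAGS_VM   0x20000UL

/* Pretend VMWRITE: just records the value that would hit the VMCS. */
static void vmcs_writel_stub(unsigned long val)
{
	guest_rflags_field = val;
}

/*
 * Shape of the suggested layout: both paths only massage 'rflags',
 * and the single vmcs_writel() stays after the conditional.
 */
static void set_rflags_sketch(unsigned long rflags)
{
	vmx_rflags = rflags;
	if (!enable_unrestricted_guest && vm86_active) {
		rmode_save_rflags = rflags;
		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
	}
	vmcs_writel_stub(rflags);
}
```

This keeps one VMWRITE site at the cost of re-evaluating the vm86 condition; the posted patch instead duplicates the write so the unrestricted-guest fast path stays branch-free.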

> -
> -	if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
> -		vmx->emulation_required = emulation_required(vcpu);
>  }
>  
>  u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu)

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly


* Re: [PATCH v2 5/8] KVM: x86: Add WARNs to detect out-of-bounds register indices
  2019-09-27 21:45 ` [PATCH v2 5/8] KVM: x86: Add WARNs to detect out-of-bounds register indices Sean Christopherson
@ 2019-09-30  9:19   ` Vitaly Kuznetsov
  2019-10-09 10:50   ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-30  9:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon, Paolo Bonzini,
	Radim Krčmář

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Add WARN_ON_ONCE() checks in kvm_register_{read,write}() to detect reg
> values that would cause KVM to overflow vcpu->arch.regs.  Change the reg
> param to an 'int' to make it clear that the reg index is unverified.
>

Hm, on multiple occasions I was thinking "an enum would do better here
but whatever" but maybe 'int' was there on purpose? Interesting... :-)

> Open code the RIP and RSP accessors so as to avoid pointless overhead of
> WARN_ON_ONCE().  Alternatively, lower-level helpers could be provided,
> but that opens the door for improper use of said helpers, and the
> ugliness of the open-coding will be slightly improved in future patches.
>
> Regarding the overhead of WARN_ON_ONCE(), now that all fixed GPR reads
> and writes use dedicated accessors, e.g. kvm_rax_read(), the overhead
> is limited to flows where the reg index is generated at runtime.  And
> there is at least one historical bug where KVM has generated an out-of-
> bounds access to arch.regs (see commit b68f3cc7d9789, "KVM: x86: Always
> use 32-bit SMRAM save state for 32-bit kernels").
>
> Adding the WARN_ON_ONCE() protection paves the way for additional
> cleanup related to kvm_reg and kvm_reg_ex.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/kvm_cache_regs.h | 30 ++++++++++++++++++++++--------
>  arch/x86/kvm/x86.h            |  6 ++----
>  2 files changed, 24 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index 1cc6c47dc77e..3972e1b65635 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -37,19 +37,23 @@ BUILD_KVM_GPR_ACCESSORS(r14, R14)
>  BUILD_KVM_GPR_ACCESSORS(r15, R15)
>  #endif
>  
> -static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu,
> -					      enum kvm_reg reg)
> +static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
>  {
> +	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
> +		return 0;
> +

(I'm just trying to think outside the box.) When this WARN fires it
means we have a bug in KVM, but replacing it with BUG_ON() is probably
not justified (other VMs on the host may be doing just fine).
Propagating (and checking) errors from every such place is probably too
cumbersome, so what if we introduce a flag, "emit KVM_INTERNAL_ERROR
and kill the VM ASAP", and check it before launching the vCPU again?
The goal is to not allow the VM to proceed, because its state is
definitely invalid.
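Roughly something like the following user-space sketch -- none of these names exist in KVM, it only illustrates the "latch an error, refuse to re-enter" idea:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SKETCH_REGS 16

/*
 * Hypothetical sketch: on an out-of-bounds register index, latch an
 * "internal error" flag on the VM instead of BUG()ing, and refuse to
 * enter the vCPU again until userspace has seen the error.
 */
struct sketch_vm {
	bool internal_error_pending;
	unsigned long regs[NR_SKETCH_REGS];
};

static unsigned long register_read_sketch(struct sketch_vm *vm, int reg)
{
	if ((unsigned int)reg >= NR_SKETCH_REGS) {
		/* KVM bug detected: poison the VM, don't crash the host. */
		vm->internal_error_pending = true;
		return 0;
	}
	return vm->regs[reg];
}

/* Checked before every VM-Enter; 'false' would translate to exiting
 * to userspace with something like KVM_EXIT_INTERNAL_ERROR. */
static bool vcpu_may_enter_sketch(struct sketch_vm *vm)
{
	return !vm->internal_error_pending;
}
```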

>  	if (!test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail))
>  		kvm_x86_ops->cache_reg(vcpu, reg);
>  
>  	return vcpu->arch.regs[reg];
>  }
>  
> -static inline void kvm_register_write(struct kvm_vcpu *vcpu,
> -				      enum kvm_reg reg,
> +static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
>  				      unsigned long val)
>  {
> +	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
> +		return;
> +
>  	vcpu->arch.regs[reg] = val;
>  	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
>  	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> @@ -57,22 +61,32 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu,
>  
>  static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
>  {
> -	return kvm_register_read(vcpu, VCPU_REGS_RIP);
> +	if (!test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_avail))
> +		kvm_x86_ops->cache_reg(vcpu, VCPU_REGS_RIP);
> +
> +	return vcpu->arch.regs[VCPU_REGS_RIP];
>  }
>  
>  static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
>  {
> -	kvm_register_write(vcpu, VCPU_REGS_RIP, val);
> +	vcpu->arch.regs[VCPU_REGS_RIP] = val;
> +	__set_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty);
> +	__set_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_avail);
>  }
>  
>  static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
>  {
> -	return kvm_register_read(vcpu, VCPU_REGS_RSP);
> +	if (!test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_avail))
> +		kvm_x86_ops->cache_reg(vcpu, VCPU_REGS_RSP);
> +
> +	return vcpu->arch.regs[VCPU_REGS_RSP];
>  }
>  
>  static inline void kvm_rsp_write(struct kvm_vcpu *vcpu, unsigned long val)
>  {
> -	kvm_register_write(vcpu, VCPU_REGS_RSP, val);
> +	vcpu->arch.regs[VCPU_REGS_RSP] = val;
> +	__set_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty);
> +	__set_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_avail);
>  }
>  
>  static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index dbf7442a822b..45d82b8277e5 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -238,8 +238,7 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
>  	return false;
>  }
>  
> -static inline unsigned long kvm_register_readl(struct kvm_vcpu *vcpu,
> -					       enum kvm_reg reg)
> +static inline unsigned long kvm_register_readl(struct kvm_vcpu *vcpu, int reg)
>  {
>  	unsigned long val = kvm_register_read(vcpu, reg);
>  
> @@ -247,8 +246,7 @@ static inline unsigned long kvm_register_readl(struct kvm_vcpu *vcpu,
>  }
>  
>  static inline void kvm_register_writel(struct kvm_vcpu *vcpu,
> -				       enum kvm_reg reg,
> -				       unsigned long val)
> +				       int reg, unsigned long val)
>  {
>  	if (!is_64_bit_mode(vcpu))
>  		val = (u32)val;

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly


* Re: [PATCH v2 6/8] KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'
  2019-09-27 21:45 ` [PATCH v2 6/8] KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg' Sean Christopherson
@ 2019-09-30  9:25   ` Vitaly Kuznetsov
  2019-10-09 10:52     ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-30  9:25 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon, Paolo Bonzini,
	Radim Krčmář

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Now that indexing into arch.regs is either protected by WARN_ON_ONCE or
> done with hardcoded enums, combine all definitions for registers that
> are tracked by regs_avail and regs_dirty into 'enum kvm_reg'.  Having a
> single enum type will simplify additional cleanup related to regs_avail
> and regs_dirty.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 4 +---
>  arch/x86/kvm/kvm_cache_regs.h   | 2 +-
>  2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 23edf56cf577..a27f7f6b6b7a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -156,10 +156,8 @@ enum kvm_reg {
>  	VCPU_REGS_R15 = __VCPU_REGS_R15,
>  #endif
>  	VCPU_REGS_RIP,
> -	NR_VCPU_REGS
> -};
> +	NR_VCPU_REGS,
>  
> -enum kvm_reg_ex {
>  	VCPU_EXREG_PDPTR = NR_VCPU_REGS,

(Personally, I would've changed that to NR_VCPU_REGS + 1)

>  	VCPU_EXREG_CR3,
>  	VCPU_EXREG_RFLAGS,
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index 3972e1b65635..b85fc4b4e04f 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -95,7 +95,7 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
>  
>  	if (!test_bit(VCPU_EXREG_PDPTR,
>  		      (unsigned long *)&vcpu->arch.regs_avail))
> -		kvm_x86_ops->cache_reg(vcpu, (enum kvm_reg)VCPU_EXREG_PDPTR);
> +		kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
>  
>  	return vcpu->arch.walk_mmu->pdptrs[index];
>  }

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly


* Re: [PATCH v2 7/8] KVM: x86: Add helpers to test/mark reg availability and dirtiness
  2019-09-27 21:45 ` [PATCH v2 7/8] KVM: x86: Add helpers to test/mark reg availability and dirtiness Sean Christopherson
@ 2019-09-30  9:32   ` Vitaly Kuznetsov
  2019-10-09 11:00     ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-30  9:32 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki, Liran Alon

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Add helpers to prettify code that tests and/or marks whether or not a
> register is available and/or dirty.
>
> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/kvm_cache_regs.h | 45 +++++++++++++++++++++++++----------
>  arch/x86/kvm/vmx/nested.c     |  4 ++--
>  arch/x86/kvm/vmx/vmx.c        | 29 ++++++++++------------
>  arch/x86/kvm/x86.c            | 13 ++++------
>  4 files changed, 53 insertions(+), 38 deletions(-)
>
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index b85fc4b4e04f..9c2bc528800b 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -37,12 +37,37 @@ BUILD_KVM_GPR_ACCESSORS(r14, R14)
>  BUILD_KVM_GPR_ACCESSORS(r15, R15)
>  #endif
>  
> +static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
> +					     enum kvm_reg reg)
> +{
> +	return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> +}
> +

(Interestingly enough, all call sites use !kvm_register_is_available()
but kvm_register_is_unavailable() sounds weird. So I'd prefer to keep it
as-is).

> +static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
> +					 enum kvm_reg reg)
> +{
> +	return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
> +}
> +
> +static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
> +					       enum kvm_reg reg)
> +{
> +	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> +}
> +
> +static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
> +					   enum kvm_reg reg)
> +{
> +	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> +	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
> +}
> +

Personal preference again, but I would've named this
"kvm_register_mark_avail_dirty" to indicate what we're actually doing
(and maybe even shortened 'kvm_register_' to 'kvm_reg_' everywhere as I
can't see how 'reg' could be misread).

>  static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
>  {
>  	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
>  		return 0;
>  
> -	if (!test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail))
> +	if (!kvm_register_is_available(vcpu, reg))
>  		kvm_x86_ops->cache_reg(vcpu, reg);
>  
>  	return vcpu->arch.regs[reg];
> @@ -55,13 +80,12 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
>  		return;
>  
>  	vcpu->arch.regs[reg] = val;
> -	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
> -	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> +	kvm_register_mark_dirty(vcpu, reg);
>  }
>  
>  static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
>  {
> -	if (!test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_avail))
> +	if (!kvm_register_is_available(vcpu, VCPU_REGS_RIP))
>  		kvm_x86_ops->cache_reg(vcpu, VCPU_REGS_RIP);
>  
>  	return vcpu->arch.regs[VCPU_REGS_RIP];
> @@ -70,13 +94,12 @@ static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
>  static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
>  {
>  	vcpu->arch.regs[VCPU_REGS_RIP] = val;
> -	__set_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty);
> -	__set_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_avail);
> +	kvm_register_mark_dirty(vcpu, VCPU_REGS_RIP);
>  }
>  
>  static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
>  {
> -	if (!test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_avail))
> +	if (!kvm_register_is_available(vcpu, VCPU_REGS_RSP))
>  		kvm_x86_ops->cache_reg(vcpu, VCPU_REGS_RSP);
>  
>  	return vcpu->arch.regs[VCPU_REGS_RSP];
> @@ -85,16 +108,14 @@ static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
>  static inline void kvm_rsp_write(struct kvm_vcpu *vcpu, unsigned long val)
>  {
>  	vcpu->arch.regs[VCPU_REGS_RSP] = val;
> -	__set_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty);
> -	__set_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_avail);
> +	kvm_register_mark_dirty(vcpu, VCPU_REGS_RSP);
>  }
>  
>  static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
>  {
>  	might_sleep();  /* on svm */
>  
> -	if (!test_bit(VCPU_EXREG_PDPTR,
> -		      (unsigned long *)&vcpu->arch.regs_avail))
> +	if (!kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR))
>  		kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
>  
>  	return vcpu->arch.walk_mmu->pdptrs[index];
> @@ -123,7 +144,7 @@ static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
>  
>  static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
>  {
> -	if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
> +	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
>  		kvm_x86_ops->decache_cr3(vcpu);
>  	return vcpu->arch.cr3;
>  }
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index b72a00b53e4a..8946f11c574c 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1012,7 +1012,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>  		kvm_mmu_new_cr3(vcpu, cr3, false);
>  
>  	vcpu->arch.cr3 = cr3;
> -	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
> +	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
>  
>  	kvm_init_mmu(vcpu, false);
>  
> @@ -3986,7 +3986,7 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
>  
>  	nested_ept_uninit_mmu_context(vcpu);
>  	vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
> -	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
> +	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
>  
>  	/*
>  	 * Use ept_save_pdptrs(vcpu) to load the MMU's cached PDPTRs
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 814d3e6d0264..ed03d0cd1cc8 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -721,8 +721,8 @@ static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg,
>  	bool ret;
>  	u32 mask = 1 << (seg * SEG_FIELD_NR + field);
>  
> -	if (!(vmx->vcpu.arch.regs_avail & (1 << VCPU_EXREG_SEGMENTS))) {
> -		vmx->vcpu.arch.regs_avail |= (1 << VCPU_EXREG_SEGMENTS);
> +	if (!kvm_register_is_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS)) {
> +		kvm_register_mark_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS);
>  		vmx->segment_cache.bitmask = 0;
>  	}
>  	ret = vmx->segment_cache.bitmask & mask;
> @@ -1410,8 +1410,8 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned long rflags, save_rflags;
>  
> -	if (!test_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail)) {
> -		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> +	if (!kvm_register_is_available(vcpu, VCPU_EXREG_RFLAGS)) {
> +		kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS);
>  		rflags = vmcs_readl(GUEST_RFLAGS);
>  		if (vmx->rmode.vm86_active) {
>  			rflags &= RMODE_GUEST_OWNED_EFLAGS_BITS;
> @@ -1429,7 +1429,7 @@ void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>  	unsigned long old_rflags;
>  
>  	if (enable_unrestricted_guest) {
> -		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> +		kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS);
>  
>  		vmx->rflags = rflags;
>  		vmcs_writel(GUEST_RFLAGS, rflags);
> @@ -2175,7 +2175,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  
>  static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
>  {
> -	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
> +	kvm_register_mark_available(vcpu, reg);
> +
>  	switch (reg) {
>  	case VCPU_REGS_RSP:
>  		vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP);
> @@ -2862,7 +2863,7 @@ static void vmx_decache_cr3(struct kvm_vcpu *vcpu)
>  {
>  	if (enable_unrestricted_guest || (enable_ept && is_paging(vcpu)))
>  		vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
> -	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
> +	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
>  }
>  
>  static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
> @@ -2877,8 +2878,7 @@ static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
>  
> -	if (!test_bit(VCPU_EXREG_PDPTR,
> -		      (unsigned long *)&vcpu->arch.regs_dirty))
> +	if (!kvm_register_is_dirty(vcpu, VCPU_EXREG_PDPTR))
>  		return;
>  
>  	if (is_pae_paging(vcpu)) {
> @@ -2900,10 +2900,7 @@ void ept_save_pdptrs(struct kvm_vcpu *vcpu)
>  		mmu->pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
>  	}
>  
> -	__set_bit(VCPU_EXREG_PDPTR,
> -		  (unsigned long *)&vcpu->arch.regs_avail);
> -	__set_bit(VCPU_EXREG_PDPTR,
> -		  (unsigned long *)&vcpu->arch.regs_dirty);
> +	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
>  }
>  
>  static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
> @@ -2912,7 +2909,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
> -	if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
> +	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
>  		vmx_decache_cr3(vcpu);
>  	if (!(cr0 & X86_CR0_PG)) {
>  		/* From paging/starting to nonpaging */
> @@ -6528,9 +6525,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (vmx->nested.need_vmcs12_to_shadow_sync)
>  		nested_sync_vmcs12_to_shadow(vcpu);
>  
> -	if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
> +	if (kvm_register_is_dirty(vcpu, VCPU_REGS_RSP))
>  		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
> -	if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
> +	if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP))
>  		vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
>  
>  	cr3 = __get_current_cr3_fast();
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0ed07d8d2caa..cd6bd7991c39 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -709,10 +709,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
>  	ret = 1;
>  
>  	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
> -	__set_bit(VCPU_EXREG_PDPTR,
> -		  (unsigned long *)&vcpu->arch.regs_avail);
> -	__set_bit(VCPU_EXREG_PDPTR,
> -		  (unsigned long *)&vcpu->arch.regs_dirty);
> +	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> +
>  out:
>  
>  	return ret;
> @@ -730,8 +728,7 @@ bool pdptrs_changed(struct kvm_vcpu *vcpu)
>  	if (!is_pae_paging(vcpu))
>  		return false;
>  
> -	if (!test_bit(VCPU_EXREG_PDPTR,
> -		      (unsigned long *)&vcpu->arch.regs_avail))
> +	if (!kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR))
>  		return true;
>  
>  	gfn = (kvm_read_cr3(vcpu) & 0xffffffe0ul) >> PAGE_SHIFT;
> @@ -976,7 +973,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  
>  	kvm_mmu_new_cr3(vcpu, cr3, skip_tlb_flush);
>  	vcpu->arch.cr3 = cr3;
> -	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
> +	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
>  
>  	return 0;
>  }
> @@ -8766,7 +8763,7 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>  	vcpu->arch.cr2 = sregs->cr2;
>  	mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
>  	vcpu->arch.cr3 = sregs->cr3;
> -	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
> +	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
>  
>  	kvm_set_cr8(vcpu, sregs->cr8);

-- 
Vitaly

^ permalink raw reply	[flat|nested] 34+ messages in thread
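The regs_avail/regs_dirty bookkeeping that the kvm_register_*() helpers in the diff above wrap can be modeled outside the kernel as a small standalone sketch. The struct layout, the register names, and the `vmcs[]` array standing in for vmcs_readl()/vmcs_writel() are simplified illustrations, not KVM's actual types:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative register indices; the kernel's 'enum kvm_reg' is larger. */
enum kvm_reg { VCPU_REGS_RSP, VCPU_REGS_RIP, VCPU_EXREG_CR3, NR_REGS };

struct vcpu {
	unsigned long regs_avail;    /* bit set: regs[reg] is current */
	unsigned long regs_dirty;    /* bit set: regs[reg] must be written back */
	unsigned long regs[NR_REGS];
	unsigned long vmcs[NR_REGS]; /* stand-in for vmcs_readl()/vmcs_writel() */
};

static bool register_is_available(struct vcpu *v, enum kvm_reg reg)
{
	return v->regs_avail & (1ul << reg);
}

static void register_mark_available(struct vcpu *v, enum kvm_reg reg)
{
	v->regs_avail |= 1ul << reg;
}

static void register_mark_dirty(struct vcpu *v, enum kvm_reg reg)
{
	/* Dirty implies available, mirroring kvm_register_mark_dirty(). */
	v->regs_avail |= 1ul << reg;
	v->regs_dirty |= 1ul << reg;
}

static unsigned long reg_read(struct vcpu *v, enum kvm_reg reg)
{
	if (!register_is_available(v, reg)) {  /* cache miss: "VMREAD" */
		v->regs[reg] = v->vmcs[reg];
		register_mark_available(v, reg);
	}
	return v->regs[reg];
}

static void reg_write(struct vcpu *v, enum kvm_reg reg, unsigned long val)
{
	v->regs[reg] = val;
	register_mark_dirty(v, reg);  /* flushed by "VMWRITE" before VM-Enter */
}
```

A read on a cache miss pulls from the "VMCS" and marks the register available; a write only marks it dirty so it can be flushed back before VM-Enter, which is exactly the pattern in the vmx_vcpu_run() hunk quoted above.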

* Re: [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some...
  2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
                   ` (7 preceding siblings ...)
  2019-09-27 21:45 ` [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg() Sean Christopherson
@ 2019-09-30 10:42 ` Reto Buerki
  2019-10-29 15:03   ` Martin Lucina
  8 siblings, 1 reply; 34+ messages in thread
From: Reto Buerki @ 2019-09-30 10:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Radim Krčmář
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Liran Alon

On 9/27/19 11:45 PM, Sean Christopherson wrote:
> *sigh*
> 
> v2 was shaping up to be a trivial update, until I started working on
> Vitaly's suggestion to add a helper to test for register availability.
> 
> The primary purpose of this series is to fix a CR3 corruption in L2
> reported by Reto Buerki when running with HLT interception disabled in L1.
> On a nested VM-Enter that puts L2 into HLT, KVM never actually enters L2
> and instead mimics HLT interception by canceling the nested run and
> pretending that VM-Enter to L2 completed and then exited on HLT (which
> KVM intercepted).  Because KVM never actually runs L2, KVM skips the
> pending MMU update for L2 and so leaves a stale value in vmcs02.GUEST_CR3.
> If the next wake event for L2 triggers a nested VM-Exit, KVM will refresh
> vmcs12->guest_cr3 from vmcs02.GUEST_CR3 and consume the stale value.
> 
> Fix the issue by unconditionally writing vmcs02.GUEST_CR3 during nested
> VM-Enter instead of deferring the update to vmx_set_cr3(), and skip the
> update of GUEST_CR3 in vmx_set_cr3() when running L2.  I.e. make the
> nested code fully responsible for vmcs02.GUEST_CR3.
> 
> Patch 02/08 is a minor optimization to skip the GUEST_CR3 update if
> vmcs01 is already up-to-date.
> 
> Patches 03 and beyond are Vitaly's fault ;-).
> 
> Patches 03 and 04 are tangentially related cleanup to vmx_set_rflags()
> that was discovered when working through the avail/dirty testing code.
> Ideally they'd be sent as a separate series, but they conflict with the
> avail/dirty helper changes and are themselves minor and straightforward.
> 
> Patches 05 and 06 clean up the register caching code so that there is a
> single enum for all registers which use avail/dirty tracking.  While not
> a true prerequisite for the avail/dirty helpers, the cleanup allows the
> new helpers to take an 'enum kvm_reg' instead of a less helpful 'int reg'.
> 
> Patch 07 is the helpers themselves, as suggested by Vitaly.
> 
> Patch 08 is a truly optional change to ditch decache_cr3() in favor of
> handling CR3 via cache_reg() like any other avail/dirty register.
> 
> 
> Note, I collected the Reviewed-by and Tested-by tags for patches 01 and 02
> even though I inverted the boolean from 'skip_cr3' to 'update_guest_cr3'.
> Please drop the tags if that constitutes a non-trivial functional change.
> 
> v2:
>   - Invert skip_cr3 to update_guest_cr3.  [Liran]
>   - Reword the changelog and comment to be more explicit in detailing
>     how/when KVM will process a nested VM-Enter without running L2.  [Liran]

>   - Added Reviewed-by and Tested-by tags.
>   - Add a comment in vmx_set_cr3() to explicitly state that nested
>     VM-Enter is responsible for loading vmcs02.GUEST_CR3.  [Jim]
>   - All of the loveliness in patches 03-08. [Vitaly]
> 
> Sean Christopherson (8):
>   KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
>   KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
>   KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors
>   KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
>   KVM: x86: Add WARNs to detect out-of-bounds register indices
>   KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'
>   KVM: x86: Add helpers to test/mark reg availability and dirtiness
>   KVM: x86: Fold decache_cr3() into cache_reg()
> 
>  arch/x86/include/asm/kvm_host.h |  5 +-
>  arch/x86/kvm/kvm_cache_regs.h   | 67 +++++++++++++++++------
>  arch/x86/kvm/svm.c              |  5 --
>  arch/x86/kvm/vmx/nested.c       | 14 ++++-
>  arch/x86/kvm/vmx/vmx.c          | 94 ++++++++++++++++++---------------
>  arch/x86/kvm/x86.c              | 13 ++---
>  arch/x86/kvm/x86.h              |  6 +--
>  7 files changed, 123 insertions(+), 81 deletions(-)

Series:
Tested-by: Reto Buerki <reet@codelabs.ch>

Thanks.
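
The failure mode fixed by patch 01 can be illustrated with a toy model. All names here are illustrative stand-ins for KVM's, with `vmcs02_guest_cr3` playing the hardware VMCS field and the `l2_halted` early return standing in for KVM canceling the nested run when HLT interception is disabled:

```c
#include <assert.h>
#include <stdbool.h>

struct state {
	unsigned long vmcs02_guest_cr3;  /* what hardware (vmcs02) holds */
	unsigned long vmcs12_guest_cr3;  /* what L1 sees after VM-Exit */
};

static void nested_vmenter(struct state *s, unsigned long l2_cr3,
			   bool l2_halted, bool write_cr3_unconditionally)
{
	s->vmcs12_guest_cr3 = l2_cr3;
	if (write_cr3_unconditionally)
		s->vmcs02_guest_cr3 = l2_cr3;  /* the fix */
	if (l2_halted)
		return;  /* run canceled: the deferred update below is skipped */
	s->vmcs02_guest_cr3 = l2_cr3;  /* deferred update via vmx_set_cr3() */
}

static void nested_vmexit(struct state *s)
{
	/* On nested VM-Exit, vmcs12->guest_cr3 is refreshed from vmcs02. */
	s->vmcs12_guest_cr3 = s->vmcs02_guest_cr3;
}
```

On the buggy path, a VM-Enter that immediately "halts" leaves the stale value in place, and the next VM-Exit copies that stale value back into vmcs12 — the corruption Reto reported.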


* Re: [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg()
  2019-09-27 21:45 ` [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg() Sean Christopherson
@ 2019-09-30 10:58   ` Vitaly Kuznetsov
  2019-09-30 15:04     ` Sean Christopherson
  2019-10-09 11:03   ` Paolo Bonzini
  1 sibling, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-30 10:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon, Paolo Bonzini,
	Radim Krčmář

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common
> cache_reg() callback and drop the dedicated decache_cr3().  The name
> decache_cr3() is somewhat confusing as the caching behavior of CR3
> follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and
> has nothing in common with the caching behavior of CR0/CR4 (whose
> decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage).
>
> Note, this effectively adds a BUG() if KVM attempts to cache CR3 on SVM.
> Opportunistically add a WARN_ON_ONCE() in VMX to provide an equivalent
> check.

Just to justify my idea of replacing such occasions with
KVM_INTERNAL_ERROR by setting a special 'kill ASAP' bit somewhere:

This WARN_ON_ONCE() falls in the same category (IMO).

>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/kvm_cache_regs.h   |  2 +-
>  arch/x86/kvm/svm.c              |  5 -----
>  arch/x86/kvm/vmx/vmx.c          | 15 ++++++---------
>  4 files changed, 7 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a27f7f6b6b7a..0411dc0a27b0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1040,7 +1040,6 @@ struct kvm_x86_ops {
>  			    struct kvm_segment *var, int seg);
>  	void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
>  	void (*decache_cr0_guest_bits)(struct kvm_vcpu *vcpu);
> -	void (*decache_cr3)(struct kvm_vcpu *vcpu);
>  	void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu);
>  	void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
>  	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index 9c2bc528800b..f18177cd0030 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -145,7 +145,7 @@ static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
>  static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
>  {
>  	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
> -		kvm_x86_ops->decache_cr3(vcpu);
> +		kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_CR3);
>  	return vcpu->arch.cr3;
>  }
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f8ecb6df5106..3102c44c12c6 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2517,10 +2517,6 @@ static void svm_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
>  {
>  }
>  
> -static void svm_decache_cr3(struct kvm_vcpu *vcpu)
> -{
> -}
> -
>  static void svm_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
>  {
>  }
> @@ -7208,7 +7204,6 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  	.get_cpl = svm_get_cpl,
>  	.get_cs_db_l_bits = kvm_get_cs_db_l_bits,
>  	.decache_cr0_guest_bits = svm_decache_cr0_guest_bits,
> -	.decache_cr3 = svm_decache_cr3,
>  	.decache_cr4_guest_bits = svm_decache_cr4_guest_bits,
>  	.set_cr0 = svm_set_cr0,
>  	.set_cr3 = svm_set_cr3,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ed03d0cd1cc8..c84798026e85 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2188,7 +2188,12 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
>  		if (enable_ept)
>  			ept_save_pdptrs(vcpu);
>  		break;
> +	case VCPU_EXREG_CR3:
> +		if (enable_unrestricted_guest || (enable_ept && is_paging(vcpu)))
> +			vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
> +		break;
>  	default:
> +		WARN_ON_ONCE(1);
>  		break;
>  	}
>  }
> @@ -2859,13 +2864,6 @@ static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
>  	vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & cr0_guest_owned_bits;
>  }
>  
> -static void vmx_decache_cr3(struct kvm_vcpu *vcpu)
> -{
> -	if (enable_unrestricted_guest || (enable_ept && is_paging(vcpu)))
> -		vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
> -	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
> -}
> -
>  static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
>  {
>  	ulong cr4_guest_owned_bits = vcpu->arch.cr4_guest_owned_bits;
> @@ -2910,7 +2908,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
>  	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
> -		vmx_decache_cr3(vcpu);
> +		vmx_cache_reg(vcpu, VCPU_EXREG_CR3);
>  	if (!(cr0 & X86_CR0_PG)) {
>  		/* From paging/starting to nonpaging */
>  		exec_controls_setbit(vmx, CPU_BASED_CR3_LOAD_EXITING |
> @@ -7792,7 +7790,6 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.get_cpl = vmx_get_cpl,
>  	.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
>  	.decache_cr0_guest_bits = vmx_decache_cr0_guest_bits,
> -	.decache_cr3 = vmx_decache_cr3,
>  	.decache_cr4_guest_bits = vmx_decache_cr4_guest_bits,
>  	.set_cr0 = vmx_set_cr0,
>  	.set_cr3 = vmx_set_cr3,

Reviewed (and Tested-On-Amd-By:): Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly


* Re: [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg()
  2019-09-30 10:58   ` Vitaly Kuznetsov
@ 2019-09-30 15:04     ` Sean Christopherson
  2019-09-30 15:27       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2019-09-30 15:04 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon, Paolo Bonzini,
	Radim Krčmář

On Mon, Sep 30, 2019 at 12:58:53PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common
> > cache_reg() callback and drop the dedicated decache_cr3().  The name
> > decache_cr3() is somewhat confusing as the caching behavior of CR3
> > follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and
> > has nothing in common with the caching behavior of CR0/CR4 (whose
> > decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage).
> >
> > Note, this effectively adds a BUG() if KVM attempts to cache CR3 on SVM.
> > Opportunistically add a WARN_ON_ONCE() in VMX to provide an equivalent
> > check.
> 
> Just to justify my idea of replacing such occasions with
> KVM_INTERNAL_ERROR by setting a special 'kill ASAP' bit somewhere:
> 
> This WARN_ON_ONCE() falls in the same category (IMO).

Maybe something like KVM_BUG_ON?  E.g.:

#define KVM_BUG_ON(kvm, cond)		\
({					\
	int r;				\
					\
	if ((r = WARN_ON_ONCE(cond)))	\
		kvm->vm_bugged = true;	\
	r;				\
})
	

> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---

...

> Reviewed (and Tested-On-Amd-By:): Vitaly Kuznetsov <vkuznets@redhat.com>

Thanks for the reviews and for testing on AMD!
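
A userspace model of the KVM_BUG_ON() idea discussed above (WARN_ON_ONCE() reduced to a plain truth test, `struct kvm` pared down to the one flag; all names illustrative). Like the kernel's own WARN macros, it relies on GCC's statement-expression extension so the macro yields a value:

```c
#include <assert.h>
#include <stdbool.h>

struct kvm { bool vm_bugged; };

/* Returns whether the condition fired, and latches the "kill ASAP" flag. */
#define KVM_BUG_ON(kvm, cond)			\
({						\
	bool r = !!(cond);			\
	if (r)					\
		(kvm)->vm_bugged = true;	\
	r;					\
})
```

Callers can then bail out of emulation paths when KVM_BUG_ON() returns true, and the VM-run loop can exit to userspace with KVM_INTERNAL_ERROR once `vm_bugged` is set.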


* Re: [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
  2019-09-30  8:57   ` Vitaly Kuznetsov
@ 2019-09-30 15:19     ` Sean Christopherson
  2019-09-30 15:55       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2019-09-30 15:19 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Radim Krčmář,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon

On Mon, Sep 30, 2019 at 10:57:17AM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > Rework vmx_set_rflags() to avoid the extra code needed to handle emulation
> > of real mode and invalid state when unrestricted guest is disabled.  The
> > primary reason for doing so is to avoid the call to vmx_get_rflags(),
> > which will incur a VMREAD when RFLAGS is not already available.  When
> > running nested VMs, the majority of calls to vmx_set_rflags() will occur
> > without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS
> > during transitions between vmcs01 and vmcs02.
> >
> > Note, vmx_get_rflags() guarantees RFLAGS is marked available.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 28 ++++++++++++++++++----------
> >  1 file changed, 18 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 83fe8b02b732..814d3e6d0264 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1426,18 +1426,26 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
> >  void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > -	unsigned long old_rflags = vmx_get_rflags(vcpu);
> > +	unsigned long old_rflags;
> >  
> > -	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> > -	vmx->rflags = rflags;
> > -	if (vmx->rmode.vm86_active) {
> > -		vmx->rmode.save_rflags = rflags;
> > -		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
> > +	if (enable_unrestricted_guest) {
> > +		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> > +
> > +		vmx->rflags = rflags;
> > +		vmcs_writel(GUEST_RFLAGS, rflags);
> > +	} else {
> > +		old_rflags = vmx_get_rflags(vcpu);
> > +
> > +		vmx->rflags = rflags;
> > +		if (vmx->rmode.vm86_active) {
> > +			vmx->rmode.save_rflags = rflags;
> > +			rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
> > +		}
> > +		vmcs_writel(GUEST_RFLAGS, rflags);
> > +
> > +		if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
> > +			vmx->emulation_required = emulation_required(vcpu);
> >  	}
> > -	vmcs_writel(GUEST_RFLAGS, rflags);
> 
> We're doing vmcs_writel() in both branches so it could've stayed here, right?

Yes, but the resulting code is a bit ugly.  emulation_required() consumes
vmcs.GUEST_RFLAGS, i.e. the if statement that reads old_rflags would also
need to be outside of the else{} case.  

This isn't too bad:

	if (!enable_unrestricted_guest && 
	    ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM))
		vmx->emulation_required = emulation_required(vcpu);

but gcc isn't smart enough to understand old_rflags won't be used if
enable_unrestricted_guest, so old_rflags either needs to be tagged with
uninitialized_var() or explicitly initialized in the if(){} case.

Duplicating a small amount of code felt like the lesser of two evils.

> > -
> > -	if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
> > -		vmx->emulation_required = emulation_required(vcpu);
> >  }
> >  
> >  u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu)
> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> -- 
> Vitaly


* Re: [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg()
  2019-09-30 15:04     ` Sean Christopherson
@ 2019-09-30 15:27       ` Vitaly Kuznetsov
  2019-09-30 15:33         ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-30 15:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon, Paolo Bonzini,
	Radim Krčmář

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Mon, Sep 30, 2019 at 12:58:53PM +0200, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> 
>> > Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common
>> > cache_reg() callback and drop the dedicated decache_cr3().  The name
>> > decache_cr3() is somewhat confusing as the caching behavior of CR3
>> > follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and
>> > has nothing in common with the caching behavior of CR0/CR4 (whose
>> > decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage).
>> >
>> > Note, this effectively adds a BUG() if KVM attempts to cache CR3 on SVM.
>> > Opportunistically add a WARN_ON_ONCE() in VMX to provide an equivalent
>> > check.
>> 
>> Just to justify my idea of replacing such occasions with
>> KVM_INTERNAL_ERROR by setting a special 'kill ASAP' bit somewhere:
>> 
>> This WARN_ON_ONCE() falls in the same category (IMO).
>
> Maybe something like KVM_BUG_ON?  E.g.:
>
> #define KVM_BUG_ON(kvm, cond)		\
> ({					\
> 	int r;				\
> 					\
> > 	if ((r = WARN_ON_ONCE(cond)))	\
> > 		kvm->vm_bugged = true;	\
> > 	r;				\
> > })
> 	

Yes, that's more or less what I meant! (to me 'vm_bugged' sounds like
there was a bug in the VM but the bug is actually in KVM so maybe
something like 'kvm_internal_bug' to make it explicit?)

-- 
Vitaly


* Re: [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg()
  2019-09-30 15:27       ` Vitaly Kuznetsov
@ 2019-09-30 15:33         ` Sean Christopherson
  0 siblings, 0 replies; 34+ messages in thread
From: Sean Christopherson @ 2019-09-30 15:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon, Paolo Bonzini,
	Radim Krčmář

On Mon, Sep 30, 2019 at 05:27:58PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > On Mon, Sep 30, 2019 at 12:58:53PM +0200, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> >> 
> >> > Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common
> >> > cache_reg() callback and drop the dedicated decache_cr3().  The name
> >> > decache_cr3() is somewhat confusing as the caching behavior of CR3
> >> > follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and
> >> > has nothing in common with the caching behavior of CR0/CR4 (whose
> >> > decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage).
> >> >
> >> > Note, this effectively adds a BUG() if KVM attempts to cache CR3 on SVM.
> >> > Opportunistically add a WARN_ON_ONCE() in VMX to provide an equivalent
> >> > check.
> >> 
> >> Just to justify my idea of replacing such occasions with
> >> KVM_INTERNAL_ERROR by setting a special 'kill ASAP' bit somewhere:
> >> 
> >> This WARN_ON_ONCE() falls in the same category (IMO).
> >
> > Maybe something like KVM_BUG_ON?  E.g.:
> >
> > #define KVM_BUG_ON(kvm, cond)		\
> > ({					\
> > 	int r;				\
> > 					\
> > 	if (r = WARN_ON_ONCE(cond))	\
> > 		kvm->vm_bugged = true;	\
> > 	r;				\
> > )}
> > 	
> 
> Yes, that's more or less what I meant! (to me 'vm_bugged' sounds like
> there was a bug in the VM but the bug is actually in KVM so maybe
> something like 'kvm_internal_bug' to make it explicit?)

Ya, kvm_internal_bug is better.


* Re: [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
  2019-09-30 15:19     ` Sean Christopherson
@ 2019-09-30 15:55       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-30 15:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Mon, Sep 30, 2019 at 10:57:17AM +0200, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> 
>> > Rework vmx_set_rflags() to avoid the extra code needed to handle emulation
>> > of real mode and invalid state when unrestricted guest is disabled.  The
>> > primary reason for doing so is to avoid the call to vmx_get_rflags(),
>> > which will incur a VMREAD when RFLAGS is not already available.  When
>> > running nested VMs, the majority of calls to vmx_set_rflags() will occur
>> > without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS
>> > during transitions between vmcs01 and vmcs02.
>> >
>> > Note, vmx_get_rflags() guarantees RFLAGS is marked available.
>> >
>> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> > ---
>> >  arch/x86/kvm/vmx/vmx.c | 28 ++++++++++++++++++----------
>> >  1 file changed, 18 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> > index 83fe8b02b732..814d3e6d0264 100644
>> > --- a/arch/x86/kvm/vmx/vmx.c
>> > +++ b/arch/x86/kvm/vmx/vmx.c
>> > @@ -1426,18 +1426,26 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
>> >  void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>> >  {
>> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> > -	unsigned long old_rflags = vmx_get_rflags(vcpu);
>> > +	unsigned long old_rflags;
>> >  
>> > -	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
>> > -	vmx->rflags = rflags;
>> > -	if (vmx->rmode.vm86_active) {
>> > -		vmx->rmode.save_rflags = rflags;
>> > -		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
>> > +	if (enable_unrestricted_guest) {
>> > +		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
>> > +
>> > +		vmx->rflags = rflags;
>> > +		vmcs_writel(GUEST_RFLAGS, rflags);
>> > +	} else {
>> > +		old_rflags = vmx_get_rflags(vcpu);
>> > +
>> > +		vmx->rflags = rflags;
>> > +		if (vmx->rmode.vm86_active) {
>> > +			vmx->rmode.save_rflags = rflags;
>> > +			rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
>> > +		}
>> > +		vmcs_writel(GUEST_RFLAGS, rflags);
>> > +
>> > +		if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
>> > +			vmx->emulation_required = emulation_required(vcpu);
>> >  	}
>> > -	vmcs_writel(GUEST_RFLAGS, rflags);
>> 
>> We're doing vmcs_writel() in both branches so it could've stayed here, right?
>
> Yes, but the resulting code is a bit ugly.  emulation_required() consumes
> vmcs.GUEST_RFLAGS, i.e. the if statement that reads old_rflags would also
> need to be outside of the else{} case.  
>
> This isn't too bad:
>
> 	if (!enable_unrestricted_guest && 
> 	    ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM))
> 		vmx->emulation_required = emulation_required(vcpu);
>
> but gcc isn't smart enough to understand old_rflags won't be used if
> enable_unrestricted_guest, so old_rflags either needs to be tagged with
> uninitialized_var() or explicitly initialized in the if(){} case.
>
> Duplicating a small amount of code felt like the lesser of two evils.
>

I see, thanks for these additional details!

-- 
Vitaly


* Re: [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
  2019-09-27 21:45 ` [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest Sean Christopherson
  2019-09-30  8:57   ` Vitaly Kuznetsov
@ 2019-10-09 10:40   ` Paolo Bonzini
  2019-10-09 16:38     ` Sean Christopherson
  1 sibling, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2019-10-09 10:40 UTC (permalink / raw)
  To: Sean Christopherson, Radim Krčmář
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki, Liran Alon

On 27/09/19 23:45, Sean Christopherson wrote:
> Rework vmx_set_rflags() to avoid the extra code needed to handle emulation
> of real mode and invalid state when unrestricted guest is disabled.  The
> primary reason for doing so is to avoid the call to vmx_get_rflags(),
> which will incur a VMREAD when RFLAGS is not already available.  When
> running nested VMs, the majority of calls to vmx_set_rflags() will occur
> without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS
> during transitions between vmcs01 and vmcs02.
> 
> Note, vmx_get_rflags() guarantees RFLAGS is marked available.

Slightly nicer this way:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8de9853d7ab6..62ab19d65efd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1431,9 +1431,17 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
 void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long old_rflags = vmx_get_rflags(vcpu);
+	unsigned long old_rflags;
+
+	if (enable_unrestricted_guest) {
+		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
+		vmx->rflags = rflags;
+		vmcs_writel(GUEST_RFLAGS, rflags);
+		return;
+	}
+
+	old_rflags = vmx_get_rflags(vcpu);
 
-	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
 	vmx->rflags = rflags;
 	if (vmx->rmode.vm86_active) {
 		vmx->rmode.save_rflags = rflags;

Paolo
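
The behavior both variants of the patch preserve can be reduced to a userspace sketch. Field names and the `vmcs_rflags` stand-in for vmcs_writel(GUEST_RFLAGS, ...) are illustrative, and the emulation_required() tracking is omitted:

```c
#include <assert.h>
#include <stdbool.h>

#define X86_EFLAGS_IOPL 0x3000ul   /* bits 12-13 */
#define X86_EFLAGS_VM   0x20000ul  /* bit 17 */

struct vmx {
	bool unrestricted_guest;
	bool vm86_active;
	unsigned long rflags;       /* cached guest-visible view */
	unsigned long save_rflags;  /* guest view saved during vm86 emulation */
	unsigned long vmcs_rflags;  /* what hardware actually sees */
};

static void set_rflags(struct vmx *v, unsigned long rflags)
{
	v->rflags = rflags;
	/* With unrestricted guest, vm86 is never emulated, so the guest's
	 * value goes to hardware untouched. */
	if (!v->unrestricted_guest && v->vm86_active) {
		v->save_rflags = rflags;
		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
	}
	v->vmcs_rflags = rflags;
}
```

The fast path writes the guest's value straight through; only the vm86 emulation path stuffs IOPL and VM into the hardware copy while preserving the guest's view in `save_rflags`, matching the two branches in the diff.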


* Re: [PATCH v2 5/8] KVM: x86: Add WARNs to detect out-of-bounds register indices
  2019-09-27 21:45 ` [PATCH v2 5/8] KVM: x86: Add WARNs to detect out-of-bounds register indices Sean Christopherson
  2019-09-30  9:19   ` Vitaly Kuznetsov
@ 2019-10-09 10:50   ` Paolo Bonzini
  2019-10-09 16:36     ` Sean Christopherson
  1 sibling, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2019-10-09 10:50 UTC (permalink / raw)
  To: Sean Christopherson, Radim Krčmář
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki, Liran Alon

On 27/09/19 23:45, Sean Christopherson wrote:
> Open code the RIP and RSP accessors so as to avoid the pointless overhead
> of WARN_ON_ONCE().

Is there actually an overhead here?  It is effectively WARN_ON_ONCE(0)
which should be compiled out just fine.

Paolo


* Re: [PATCH v2 6/8] KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'
  2019-09-30  9:25   ` Vitaly Kuznetsov
@ 2019-10-09 10:52     ` Paolo Bonzini
  2019-10-09 11:27       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2019-10-09 10:52 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Sean Christopherson
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon, Radim Krčmář

On 30/09/19 11:25, Vitaly Kuznetsov wrote:
>> -enum kvm_reg_ex {
>>  	VCPU_EXREG_PDPTR = NR_VCPU_REGS,
> (Personally, I would've changed that to NR_VCPU_REGS + 1)
> 

Why?

Paolo


* Re: [PATCH v2 7/8] KVM: x86: Add helpers to test/mark reg availability and dirtiness
  2019-09-30  9:32   ` Vitaly Kuznetsov
@ 2019-10-09 11:00     ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2019-10-09 11:00 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Sean Christopherson, Radim Krčmář
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon

On 30/09/19 11:32, Vitaly Kuznetsov wrote:
>> +static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
>> +					   enum kvm_reg reg)
>> +{
>> +	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
>> +	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
>> +}
>> +
> Personal preference again, but I would've named this
> "kvm_register_mark_avail_dirty" to indicate what we're actually doing
> (and maybe even shortened 'kvm_register_' to 'kvm_reg_' everywhere as I
> can't see how 'reg' could be misread).
> 

I think this is okay: a register can be either not cached, available, or
dirty.  But dirty means we have to write it back, so it implies
availability.

Paolo


* Re: [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg()
  2019-09-27 21:45 ` [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg() Sean Christopherson
  2019-09-30 10:58   ` Vitaly Kuznetsov
@ 2019-10-09 11:03   ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2019-10-09 11:03 UTC (permalink / raw)
  To: Sean Christopherson, Radim Krčmář
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki, Liran Alon

On 27/09/19 23:45, Sean Christopherson wrote:
> Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common
> cache_reg() callback and drop the dedicated decache_cr3().  The name
> decache_cr3() is somewhat confusing as the caching behavior of CR3
> follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and
> has nothing in common with the caching behavior of CR0/CR4 (whose
> decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage).
> 
> Note, this effectively adds a BUG() if KVM attempts to cache CR3 on SVM.
> Opportunistically add a WARN_ON_ONCE() in VMX to provide an equivalent
> check.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

BUG() is a bit heavy, I'll squash this instead:

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b8885bc0e7d7..e479ea9bc9da 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2376,7 +2376,7 @@ static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 		load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));
 		break;
 	default:
-		BUG();
+		WARN_ON_ONCE(1);
 	}
 }
 

Since the value is never cached, literally nothing can go wrong, at least
in theory.

Paolo


> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/kvm_cache_regs.h   |  2 +-
>  arch/x86/kvm/svm.c              |  5 -----
>  arch/x86/kvm/vmx/vmx.c          | 15 ++++++---------
>  4 files changed, 7 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a27f7f6b6b7a..0411dc0a27b0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1040,7 +1040,6 @@ struct kvm_x86_ops {
>  			    struct kvm_segment *var, int seg);
>  	void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
>  	void (*decache_cr0_guest_bits)(struct kvm_vcpu *vcpu);
> -	void (*decache_cr3)(struct kvm_vcpu *vcpu);
>  	void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu);
>  	void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
>  	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index 9c2bc528800b..f18177cd0030 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -145,7 +145,7 @@ static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
>  static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
>  {
>  	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
> -		kvm_x86_ops->decache_cr3(vcpu);
> +		kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_CR3);
>  	return vcpu->arch.cr3;
>  }
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f8ecb6df5106..3102c44c12c6 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2517,10 +2517,6 @@ static void svm_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
>  {
>  }
>  
> -static void svm_decache_cr3(struct kvm_vcpu *vcpu)
> -{
> -}
> -
>  static void svm_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
>  {
>  }
> @@ -7208,7 +7204,6 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  	.get_cpl = svm_get_cpl,
>  	.get_cs_db_l_bits = kvm_get_cs_db_l_bits,
>  	.decache_cr0_guest_bits = svm_decache_cr0_guest_bits,
> -	.decache_cr3 = svm_decache_cr3,
>  	.decache_cr4_guest_bits = svm_decache_cr4_guest_bits,
>  	.set_cr0 = svm_set_cr0,
>  	.set_cr3 = svm_set_cr3,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ed03d0cd1cc8..c84798026e85 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2188,7 +2188,12 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
>  		if (enable_ept)
>  			ept_save_pdptrs(vcpu);
>  		break;
> +	case VCPU_EXREG_CR3:
> +		if (enable_unrestricted_guest || (enable_ept && is_paging(vcpu)))
> +			vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
> +		break;
>  	default:
> +		WARN_ON_ONCE(1);
>  		break;
>  	}
>  }
> @@ -2859,13 +2864,6 @@ static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
>  	vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & cr0_guest_owned_bits;
>  }
>  
> -static void vmx_decache_cr3(struct kvm_vcpu *vcpu)
> -{
> -	if (enable_unrestricted_guest || (enable_ept && is_paging(vcpu)))
> -		vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
> -	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
> -}
> -
>  static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
>  {
>  	ulong cr4_guest_owned_bits = vcpu->arch.cr4_guest_owned_bits;
> @@ -2910,7 +2908,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
>  	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
> -		vmx_decache_cr3(vcpu);
> +		vmx_cache_reg(vcpu, VCPU_EXREG_CR3);
>  	if (!(cr0 & X86_CR0_PG)) {
>  		/* From paging/starting to nonpaging */
>  		exec_controls_setbit(vmx, CPU_BASED_CR3_LOAD_EXITING |
> @@ -7792,7 +7790,6 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.get_cpl = vmx_get_cpl,
>  	.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
>  	.decache_cr0_guest_bits = vmx_decache_cr0_guest_bits,
> -	.decache_cr3 = vmx_decache_cr3,
>  	.decache_cr4_guest_bits = vmx_decache_cr4_guest_bits,
>  	.set_cr0 = vmx_set_cr0,
>  	.set_cr3 = vmx_set_cr3,
> 



* Re: [PATCH v2 6/8] KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'
  2019-10-09 10:52     ` Paolo Bonzini
@ 2019-10-09 11:27       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2019-10-09 11:27 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki, Liran Alon, Radim Krčmář

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 30/09/19 11:25, Vitaly Kuznetsov wrote:
>>> -enum kvm_reg_ex {
>>>  	VCPU_EXREG_PDPTR = NR_VCPU_REGS,
>> (Personally, I would've changed that to NR_VCPU_REGS + 1)
>> 
>
> Why?
>

Just so every entry in the enum is different and NR_VCPU_REGS acts as a
guardian.

-- 
Vitaly


* Re: [PATCH v2 5/8] KVM: x86: Add WARNs to detect out-of-bounds register indices
  2019-10-09 10:50   ` Paolo Bonzini
@ 2019-10-09 16:36     ` Sean Christopherson
  0 siblings, 0 replies; 34+ messages in thread
From: Sean Christopherson @ 2019-10-09 16:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki, Liran Alon

On Wed, Oct 09, 2019 at 12:50:44PM +0200, Paolo Bonzini wrote:
> On 27/09/19 23:45, Sean Christopherson wrote:
> > Open code the RIP and RSP accessors so as to avoid the pointless overhead
> > of WARN_ON_ONCE().
> 
> Is there actually an overhead here?  It is effectively WARN_ON_ONCE(0)
> which should be compiled out just fine.

Doh, you're correct, it does get compiled out.


* Re: [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
  2019-10-09 10:40   ` Paolo Bonzini
@ 2019-10-09 16:38     ` Sean Christopherson
  2019-10-09 20:59       ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2019-10-09 16:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki, Liran Alon

On Wed, Oct 09, 2019 at 12:40:53PM +0200, Paolo Bonzini wrote:
> On 27/09/19 23:45, Sean Christopherson wrote:
> > Rework vmx_set_rflags() to avoid the extra code needed to handle emulation
> > of real mode and invalid state when unrestricted guest is disabled.  The
> > primary reason for doing so is to avoid the call to vmx_get_rflags(),
> > which will incur a VMREAD when RFLAGS is not already available.  When
> > running nested VMs, the majority of calls to vmx_set_rflags() will occur
> > without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS
> > during transitions between vmcs01 and vmcs02.
> > 
> > Note, vmx_get_rflags() guarantees RFLAGS is marked available.
> 
> Slightly nicer this way:
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 8de9853d7ab6..62ab19d65efd 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1431,9 +1431,17 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
>  void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	unsigned long old_rflags = vmx_get_rflags(vcpu);
> +	unsigned long old_rflags;
> +
> +	if (enable_unrestricted_guest) {
> +		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> +		vmx->rflags = rflags;
> +		vmcs_writel(GUEST_RFLAGS, rflags);
> +		return;
> +	}
> +
> +	old_rflags = vmx_get_rflags(vcpu);
>  
> -	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
>  	vmx->rflags = rflags;
>  	if (vmx->rmode.vm86_active) {
>  		vmx->rmode.save_rflags = rflags;

Works for me.  Do you want me to spin a v3 to incorporate this and remove
the open coding of the RIP/RSP accessors?  Or are you planning on squashing
the changes as you apply?


* Re: [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
  2019-10-09 16:38     ` Sean Christopherson
@ 2019-10-09 20:59       ` Paolo Bonzini
  2019-10-09 21:30         ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2019-10-09 20:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki, Liran Alon

On 09/10/19 18:38, Sean Christopherson wrote:
> On Wed, Oct 09, 2019 at 12:40:53PM +0200, Paolo Bonzini wrote:
>> On 27/09/19 23:45, Sean Christopherson wrote:
>>> Rework vmx_set_rflags() to avoid the extra code needed to handle emulation
>>> of real mode and invalid state when unrestricted guest is disabled.  The
>>> primary reason for doing so is to avoid the call to vmx_get_rflags(),
>>> which will incur a VMREAD when RFLAGS is not already available.  When
>>> running nested VMs, the majority of calls to vmx_set_rflags() will occur
>>> without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS
>>> during transitions between vmcs01 and vmcs02.
>>>
>>> Note, vmx_get_rflags() guarantees RFLAGS is marked available.
>>
>> Slightly nicer this way:
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 8de9853d7ab6..62ab19d65efd 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -1431,9 +1431,17 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
>>  void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>>  {
>>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> -	unsigned long old_rflags = vmx_get_rflags(vcpu);
>> +	unsigned long old_rflags;
>> +
>> +	if (enable_unrestricted_guest) {
>> +		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
>> +		vmx->rflags = rflags;
>> +		vmcs_writel(GUEST_RFLAGS, rflags);
>> +		return;
>> +	}
>> +
>> +	old_rflags = vmx_get_rflags(vcpu);
>>  
>> -	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
>>  	vmx->rflags = rflags;
>>  	if (vmx->rmode.vm86_active) {
>>  		vmx->rmode.save_rflags = rflags;
> 
> Works for me.  Do you want me to spin a v3 to incorporate this and remove
> the open coding of the RIP/RSP accessors?  Or are you planning on squashing
> the changes as you apply?

If it's okay for you I can squash it.

Paolo



* Re: [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
  2019-10-09 20:59       ` Paolo Bonzini
@ 2019-10-09 21:30         ` Sean Christopherson
  0 siblings, 0 replies; 34+ messages in thread
From: Sean Christopherson @ 2019-10-09 21:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki, Liran Alon

On Wed, Oct 09, 2019 at 10:59:10PM +0200, Paolo Bonzini wrote:
> On 09/10/19 18:38, Sean Christopherson wrote:
> > On Wed, Oct 09, 2019 at 12:40:53PM +0200, Paolo Bonzini wrote:
> >> On 27/09/19 23:45, Sean Christopherson wrote:
> >>> Rework vmx_set_rflags() to avoid the extra code needed to handle emulation
> >>> of real mode and invalid state when unrestricted guest is disabled.  The
> >>> primary reason for doing so is to avoid the call to vmx_get_rflags(),
> >>> which will incur a VMREAD when RFLAGS is not already available.  When
> >>> running nested VMs, the majority of calls to vmx_set_rflags() will occur
> >>> without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS
> >>> during transitions between vmcs01 and vmcs02.
> >>>
> >>> Note, vmx_get_rflags() guarantees RFLAGS is marked available.
> >>
> >> Slightly nicer this way:
> >>
> >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >> index 8de9853d7ab6..62ab19d65efd 100644
> >> --- a/arch/x86/kvm/vmx/vmx.c
> >> +++ b/arch/x86/kvm/vmx/vmx.c
> >> @@ -1431,9 +1431,17 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
> >>  void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> >>  {
> >>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >> -	unsigned long old_rflags = vmx_get_rflags(vcpu);
> >> +	unsigned long old_rflags;
> >> +
> >> +	if (enable_unrestricted_guest) {
> >> +		__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> >> +		vmx->rflags = rflags;
> >> +		vmcs_writel(GUEST_RFLAGS, rflags);
> >> +		return;
> >> +	}
> >> +
> >> +	old_rflags = vmx_get_rflags(vcpu);
> >>  
> >> -	__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
> >>  	vmx->rflags = rflags;
> >>  	if (vmx->rmode.vm86_active) {
> >>  		vmx->rmode.save_rflags = rflags;
> > 
> > Works for me.  Do you want me to spin a v3 to incorporate this and remove
> > the open coding of the RIP/RSP accessors?  Or are you planning on squashing
> > the changes as you apply?
> 
> If it's okay for you I can squash it.

Squash away.


* Re: [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some...
  2019-09-30 10:42 ` [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Reto Buerki
@ 2019-10-29 15:03   ` Martin Lucina
  2019-10-30  9:09     ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Lucina @ 2019-10-29 15:03 UTC (permalink / raw)
  To: Reto Buerki; +Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel

(Cc:s trimmed)

Hi,

On Monday, 30.09.2019 at 12:42, Reto Buerki wrote:
> On 9/27/19 11:45 PM, Sean Christopherson wrote:
> > *sigh*
> > 
> > v2 was shaping up to be a trivial update, until I started working on
> > Vitaly's suggestion to add a helper to test for register availability.
> > 
> > The primary purpose of this series is to fix a CR3 corruption in L2
> > reported by Reto Buerki when running with HLT interception disabled in L1.
> > On a nested VM-Enter that puts L2 into HLT, KVM never actually enters L2
> > and instead mimics HLT interception by canceling the nested run and
> > pretending that VM-Enter to L2 completed and then exited on HLT (which
> > KVM intercepted).  Because KVM never actually runs L2, KVM skips the
> > pending MMU update for L2 and so leaves a stale value in vmcs02.GUEST_CR3.
> > If the next wake event for L2 triggers a nested VM-Exit, KVM will refresh
> > vmcs12->guest_cr3 from vmcs02.GUEST_CR3 and consume the stale value.
> > 
> > Fix the issue by unconditionally writing vmcs02.GUEST_CR3 during nested
> > VM-Enter instead of deferring the update to vmx_set_cr3(), and skip the
> > update of GUEST_CR3 in vmx_set_cr3() when running L2.  I.e. make the
> > nested code fully responsible for vmcs02.GUEST_CR3.
> > 
> > Patch 02/08 is a minor optimization to skip the GUEST_CR3 update if
> > vmcs01 is already up-to-date.
> > 
> > Patches 03 and beyond are Vitaly's fault ;-).
> > 
> > Patches 03 and 04 are tangentially related cleanup to vmx_set_rflags()
> > that was discovered when working through the avail/dirty testing code.
> > Ideally they'd be sent as a separate series, but they conflict with the
> > avail/dirty helper changes and are themselves minor and straightforward.
> > 
> > Patches 05 and 06 clean up the register caching code so that there is a
> > single enum for all registers which use avail/dirty tracking.  While not
> > a true prerequisite for the avail/dirty helpers, the cleanup allows the
> > new helpers to take an 'enum kvm_reg' instead of a less helpful 'int reg'.
> > 
> > Patch 07 is the helpers themselves, as suggested by Vitaly.
> > 
> > Patch 08 is a truly optional change to ditch decache_cr3() in favor of
> > handling CR3 via cache_reg() like any other avail/dirty register.
> > 
> > 
> > Note, I collected the Reviewed-by and Tested-by tags for patches 01 and 02
> > even though I inverted the boolean from 'skip_cr3' to 'update_guest_cr3'.
> > Please drop the tags if that constitutes a non-trivial functional change.
> > 
> > v2:
> >   - Invert skip_cr3 to update_guest_cr3.  [Liran]
> >   - Reword the changelog and comment to be more explicit in detailing
> >     how/when KVM will process a nested VM-Enter without running L2.  [Liran]
> >   - Added Reviewed-by and Tested-by tags.
> >   - Add a comment in vmx_set_cr3() to explicitly state that nested
> >     VM-Enter is responsible for loading vmcs02.GUEST_CR3.  [Jim]
> >   - All of the loveliness in patches 03-08. [Vitaly]
> > 
> > Sean Christopherson (8):
> >   KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
> >   KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
> >   KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors
> >   KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
> >   KVM: x86: Add WARNs to detect out-of-bounds register indices
> >   KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'
> >   KVM: x86: Add helpers to test/mark reg availability and dirtiness
> >   KVM: x86: Fold decache_cr3() into cache_reg()
> > 
> >  arch/x86/include/asm/kvm_host.h |  5 +-
> >  arch/x86/kvm/kvm_cache_regs.h   | 67 +++++++++++++++++------
> >  arch/x86/kvm/svm.c              |  5 --
> >  arch/x86/kvm/vmx/nested.c       | 14 ++++-
> >  arch/x86/kvm/vmx/vmx.c          | 94 ++++++++++++++++++---------------
> >  arch/x86/kvm/x86.c              | 13 ++---
> >  arch/x86/kvm/x86.h              |  6 +--
> >  7 files changed, 123 insertions(+), 81 deletions(-)
> 
> Series:
> Tested-by: Reto Buerki <reet@codelabs.ch>

Any chance of this series making it into 5.4? Unless I'm looking in the
wrong place, I don't see the changes in either kvm.git or Linus' tree.

Thanks,

Martin


* Re: [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some...
  2019-10-29 15:03   ` Martin Lucina
@ 2019-10-30  9:09     ` Sean Christopherson
  0 siblings, 0 replies; 34+ messages in thread
From: Sean Christopherson @ 2019-10-30  9:09 UTC (permalink / raw)
  To: Reto Buerki, Paolo Bonzini, kvm, linux-kernel

On Tue, Oct 29, 2019 at 04:03:04PM +0100, Martin Lucina wrote:
> (Cc:s trimmed)
> 
> Hi,
> 
> On Monday, 30.09.2019 at 12:42, Reto Buerki wrote:
> > On 9/27/19 11:45 PM, Sean Christopherson wrote:
> > > Sean Christopherson (8):
> > >   KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
> > >   KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
> > >   KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors
> > >   KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest
> > >   KVM: x86: Add WARNs to detect out-of-bounds register indices
> > >   KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'
> > >   KVM: x86: Add helpers to test/mark reg availability and dirtiness
> > >   KVM: x86: Fold decache_cr3() into cache_reg()
> > > 
> > >  arch/x86/include/asm/kvm_host.h |  5 +-
> > >  arch/x86/kvm/kvm_cache_regs.h   | 67 +++++++++++++++++------
> > >  arch/x86/kvm/svm.c              |  5 --
> > >  arch/x86/kvm/vmx/nested.c       | 14 ++++-
> > >  arch/x86/kvm/vmx/vmx.c          | 94 ++++++++++++++++++---------------
> > >  arch/x86/kvm/x86.c              | 13 ++---
> > >  arch/x86/kvm/x86.h              |  6 +--
> > >  7 files changed, 123 insertions(+), 81 deletions(-)
> > 
> > Series:
> > Tested-by: Reto Buerki <reet@codelabs.ch>
> 
> Any chance of this series making it into 5.4? Unless I'm looking in the
> wrong place, I don't see the changes in either kvm.git or Linus' tree.

It's queued up in kvm.git for 5.5.  That being said, the first patch
should go into 5.4 (it's also tagged for stable).  The next round of KVM
fixes for 5.4 will probably be delayed due to KVM Forum.

Paolo?


end of thread, other threads:[~2019-10-30  9:09 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-27 21:45 [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Sean Christopherson
2019-09-27 21:45 ` [PATCH v2 1/8] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
2019-09-27 23:37   ` Jim Mattson
2019-09-27 21:45 ` [PATCH v2 2/8] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date Sean Christopherson
2019-09-27 21:45 ` [PATCH v2 3/8] KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors Sean Christopherson
2019-09-30  8:48   ` Vitaly Kuznetsov
2019-09-27 21:45 ` [PATCH v2 4/8] KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest Sean Christopherson
2019-09-30  8:57   ` Vitaly Kuznetsov
2019-09-30 15:19     ` Sean Christopherson
2019-09-30 15:55       ` Vitaly Kuznetsov
2019-10-09 10:40   ` Paolo Bonzini
2019-10-09 16:38     ` Sean Christopherson
2019-10-09 20:59       ` Paolo Bonzini
2019-10-09 21:30         ` Sean Christopherson
2019-09-27 21:45 ` [PATCH v2 5/8] KVM: x86: Add WARNs to detect out-of-bounds register indices Sean Christopherson
2019-09-30  9:19   ` Vitaly Kuznetsov
2019-10-09 10:50   ` Paolo Bonzini
2019-10-09 16:36     ` Sean Christopherson
2019-09-27 21:45 ` [PATCH v2 6/8] KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg' Sean Christopherson
2019-09-30  9:25   ` Vitaly Kuznetsov
2019-10-09 10:52     ` Paolo Bonzini
2019-10-09 11:27       ` Vitaly Kuznetsov
2019-09-27 21:45 ` [PATCH v2 7/8] KVM: x86: Add helpers to test/mark reg availability and dirtiness Sean Christopherson
2019-09-30  9:32   ` Vitaly Kuznetsov
2019-10-09 11:00     ` Paolo Bonzini
2019-09-27 21:45 ` [PATCH v2 8/8] KVM: x86: Fold decache_cr3() into cache_reg() Sean Christopherson
2019-09-30 10:58   ` Vitaly Kuznetsov
2019-09-30 15:04     ` Sean Christopherson
2019-09-30 15:27       ` Vitaly Kuznetsov
2019-09-30 15:33         ` Sean Christopherson
2019-10-09 11:03   ` Paolo Bonzini
2019-09-30 10:42 ` [PATCH v2 0/8] KVM: x86: nVMX GUEST_CR3 bug fix, and then some Reto Buerki
2019-10-29 15:03   ` Martin Lucina
2019-10-30  9:09     ` Sean Christopherson
