linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements
@ 2020-03-20 21:27 Sean Christopherson
  2020-03-20 21:27 ` [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush Sean Christopherson
                   ` (36 more replies)
  0 siblings, 37 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

VMX TLB flushing cleanup series to fix a variety of bugs, and to avoid
unnecessary TLB flushes on nested VMX transitions.

  1) Nested VMX doesn't properly flush all ASIDs/contexts on system events,
     e.g. on an mmu_notifier invalidation all contexts for L1 *and* L2 need
     to be invalidated, but KVM generally flushes only L1's or L2's context
     (and sometimes only L1's).

  2) #1 is largely benign because nested VMX always flushes the new
     context on nested VM-Entry/VM-Exit.

High level overview:

  a) Fix the main TLB flushing bug with a big hammer.

  b) Fix a few other flushing related bugs.

  c) Clean up vmx_flush_tlb(), i.e. what was v1 of this series.

  d) Reintroduce current-ASID/context flushing to regain some of the
     precision that got blasted away by the big hammer in (a).

  e) Fix random code paths that unnecessarily trigger TLB flushes on
     nested VMX transitions.

  f) Stop flushing on every nested VMX transition.

  g) Extra cleanup.


v3:
  - Fix freeing of roots during INVVPID; I botched things when tweaking
    Junaid's original patch.
  - Move "last vpid02" logic into nested VMX flushing helper. [Paolo]
  - Split "skip tlb flush" and "skip mmu sync" logic during fast roots
    switch. [Paolo]
  - Unconditionally skip tlb flush during fast roots switch on nested VMX
    transitions, i.e. let nested_vmx_transition_tlb_flush() do the work.
    This avoids flushing when EPT=0 and VPID=1. [Paolo]
  - Do more cr3->pgd conversions in related code that drove me bonkers
    when trying to figure out what was broken with VPID.  I think this
    knocks off the last code that uses "cr3" for variables/functions that
    work with cr3 or eptp.

v2:
  - Basically a new series.

v1:
  - https://patchwork.kernel.org/cover/11394987/

Junaid Shahid (2):
  KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT
  KVM: x86: Sync SPTEs when injecting page/EPT fault into L1

Sean Christopherson (35):
  KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush
  KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT)
  KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  KVM: x86: Export kvm_propagate_fault() (as
    kvm_inject_emulated_page_fault)
  KVM: x86: Consolidate logic for injecting page faults to L1
  KVM: VMX: Skip global INVVPID fallback if vpid==0 in
    vpid_sync_context()
  KVM: VMX: Use vpid_sync_context() directly when possible
  KVM: VMX: Move vpid_sync_vcpu_addr() down a few lines
  KVM: VMX: Handle INVVPID fallback logic in vpid_sync_vcpu_addr()
  KVM: VMX: Drop redundant capability checks in low level INVVPID
    helpers
  KVM: nVMX: Use vpid_sync_vcpu_addr() to emulate INVVPID with address
  KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
  KVM: VMX: Clean up vmx_flush_tlb_gva()
  KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush()
  KVM: SVM: Wire up ->tlb_flush_guest() directly to svm_flush_tlb()
  KVM: VMX: Move vmx_flush_tlb() to vmx.c
  KVM: nVMX: Move nested_get_vpid02() to vmx/nested.h
  KVM: VMX: Introduce vmx_flush_tlb_current()
  KVM: SVM: Document the ASID logic in svm_flush_tlb()
  KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all()
  KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit
  KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID
  KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes
  KVM: nVMX: Selectively use TLB_FLUSH_CURRENT for nested
    VM-Enter/VM-Exit
  KVM: nVMX: Reload APIC access page on nested VM-Exit only if necessary
  KVM: VMX: Retrieve APIC access page HPA only when necessary
  KVM: VMX: Don't reload APIC access page if its control is disabled
  KVM: x86/mmu: Move fast_cr3_switch() side effects to
    __kvm_mmu_new_cr3()
  KVM: x86/mmu: Add separate override for MMU sync during fast CR3
    switch
  KVM: x86/mmu: Add module param to force TLB flush on root reuse
  KVM: nVMX: Skip MMU sync on nested VMX transition when possible
  KVM: nVMX: Don't flush TLB on nested VMX transition
  KVM: nVMX: Free only the affected contexts when emulating INVEPT
  KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related code
  KVM: VMX: Clean cr3/pgd handling in vmx_load_mmu_pgd()

 arch/x86/include/asm/kvm_host.h |  25 +++-
 arch/x86/kvm/mmu/mmu.c          | 145 +++++++++----------
 arch/x86/kvm/mmu/paging_tmpl.h  |   2 +-
 arch/x86/kvm/svm.c              |  19 ++-
 arch/x86/kvm/vmx/nested.c       | 249 ++++++++++++++++++++++----------
 arch/x86/kvm/vmx/nested.h       |   7 +
 arch/x86/kvm/vmx/ops.h          |  32 ++--
 arch/x86/kvm/vmx/vmx.c          | 119 ++++++++++++---
 arch/x86/kvm/vmx/vmx.h          |  19 +--
 arch/x86/kvm/x86.c              |  67 ++++++---
 arch/x86/kvm/x86.h              |   6 +
 11 files changed, 448 insertions(+), 242 deletions(-)

-- 
2.24.1



* [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
@ 2020-03-20 21:27 ` Sean Christopherson
  2021-08-03  1:45   ` Lai Jiangshan
  2020-03-20 21:27 ` [PATCH v3 02/37] KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT) Sean Christopherson
                   ` (35 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
flushes require all contexts to be invalidated, not just the active
contexts, e.g. all mappings in all contexts for a given HVA need to be
invalidated on an mmu_notifier invalidation.  Similarly, the instigator
of the deferred TLB flush may be expecting all contexts to be flushed,
e.g. vmx_vcpu_load_vmcs().

Without nested VMX, flushing only the current EPTP/VPID context isn't
problematic because KVM uses a constant VPID for each vCPU, and
mmu_alloc_direct_roots() all but guarantees KVM will use a single EPTP
for L1.  In the rare case where a different EPTP is created or reused,
KVM (currently) unconditionally flushes the new EPTP context prior to
entering the guest.

With nested VMX, KVM conditionally uses a different VPID for L2, and
unconditionally uses a different EPTP for L2.  Because KVM doesn't
_intentionally_ guarantee L2's EPTP/VPID context is flushed on nested
VM-Enter, it'd be possible for a malicious L1 to attack the host and/or
different VMs by exploiting the lack of flushing for L2.

  1) Launch nested guest from malicious L1.

  2) Nested VM-Enter to L2.

  3) L2 accesses target GPA 'g'.  The CPU inserts a TLB entry, tagged with
     L2's ASID, mapping 'g' to host PFN 'x'.

  4) Nested VM-Exit to L1.

  5) L1 triggers kernel same-page merging (ksm) by duplicating/zeroing
     the page for PFN 'x'.

  6) Host kernel merges PFN 'x' with PFN 'y', i.e. unmaps PFN 'x' and
     remaps the page to PFN 'y'.  mmu_notifier sends an invalidation;
     KVM flushes the TLB only for L1's ASID.

  7) Host kernel reallocates PFN 'x' to some other task/guest.

  8) Nested VM-Enter to L2.  KVM does not invalidate L2's EPTP or VPID.

  9) L2 accesses GPA 'g' and gains read/write access to PFN 'x' via its
     stale TLB entry.
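
The scenario above can be reproduced with a toy, software-only model of an
ASID-tagged TLB.  This is a hypothetical sketch: toy_tlb and its helpers are
invented for illustration and are not KVM or hardware interfaces.  It shows
that flushing only L1's ASID leaves L2's cached translation of 'g' to PFN 'x'
in place, while flushing every context removes it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy TLB: a handful of entries tagged by ASID. */
#define TOY_TLB_SIZE 8

struct toy_tlb_entry {
	bool valid;
	uint16_t asid;	/* context tag, e.g. L1's or L2's VPID */
	uint64_t gpa;	/* guest address being translated */
	uint64_t pfn;	/* host PFN backing the guest address */
};

struct toy_tlb {
	struct toy_tlb_entry e[TOY_TLB_SIZE];
};

/* Insert a translation into the first free slot (no eviction policy). */
static bool toy_tlb_insert(struct toy_tlb *tlb, uint16_t asid,
			   uint64_t gpa, uint64_t pfn)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
		if (!tlb->e[i].valid) {
			tlb->e[i] = (struct toy_tlb_entry){ true, asid, gpa, pfn };
			return true;
		}
	}
	return false;
}

/* Look up @gpa in context @asid; on a hit, store the PFN in *pfn. */
static bool toy_tlb_lookup(const struct toy_tlb *tlb, uint16_t asid,
			   uint64_t gpa, uint64_t *pfn)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
		const struct toy_tlb_entry *t = &tlb->e[i];

		if (t->valid && t->asid == asid && t->gpa == gpa) {
			*pfn = t->pfn;
			return true;
		}
	}
	return false;
}

/* Single-context flush: drop only entries tagged with @asid. */
static void toy_tlb_flush_asid(struct toy_tlb *tlb, uint16_t asid)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++)
		if (tlb->e[i].asid == asid)
			tlb->e[i].valid = false;
}

/* The "big hammer": drop entries for every context. */
static void toy_tlb_flush_all(struct toy_tlb *tlb)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++)
		tlb->e[i].valid = false;
}
```

For example, after toy_tlb_insert() records 'g' -> 'x' under L2's ASID,
calling toy_tlb_flush_asid() with L1's ASID still lets toy_tlb_lookup() hit
for L2; only toy_tlb_flush_all() makes that lookup miss.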

However, current KVM unconditionally flushes the new EPTP/VPID context on
both nested VM-Enter and nested VM-Exit.  That behavior is mostly
unintentional: KVM doesn't go out of its way to flush EPTP/VPID on nested
VM-Enter/VM-Exit; rather, a TLB flush is guaranteed to occur before
entering L1 or L2 because __kvm_mmu_new_cr3() is always called with
skip_tlb_flush=false.  On nested VM-Enter, this happens via
kvm_init_shadow_ept_mmu() (nested EPT enabled) or in nested_vmx_load_cr3()
(nested EPT disabled).  On nested VM-Exit, it occurs via
nested_vmx_load_cr3().

This also fixes a bug where a deferred TLB flush in the context of L2,
with EPT disabled, would flush L1's VPID instead of L2's VPID, as
vmx_flush_tlb() flushes L1's VPID regardless of is_guest_mode().

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Junaid Shahid <junaids@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: John Haxby <john.haxby@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Fixes: efebf0aaec3d ("KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.h | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index be93d597306c..d6d67b816ebe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -518,7 +518,33 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
 
 static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
-	__vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid, invalidate_gpa);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/*
+	 * Flush all EPTP/VPID contexts if the TLB flush _may_ have been
+	 * invoked via kvm_flush_remote_tlbs(), which always passes %true for
+	 * @invalidate_gpa.  Flushing remote TLBs requires all contexts to be
+	 * flushed, not just the active context.
+	 *
+	 * Note, this also ensures a deferred TLB flush with VPID enabled and
+	 * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
+	 * L2's VPIDs.
+	 */
+	if (invalidate_gpa) {
+		if (enable_ept) {
+			ept_sync_global();
+		} else if (enable_vpid) {
+			if (cpu_has_vmx_invvpid_global()) {
+				vpid_sync_vcpu_global();
+			} else {
+				WARN_ON_ONCE(!cpu_has_vmx_invvpid_single());
+				vpid_sync_vcpu_single(vmx->vpid);
+				vpid_sync_vcpu_single(vmx->nested.vpid02);
+			}
+		}
+	} else {
+		__vmx_flush_tlb(vcpu, vmx->vpid, false);
+	}
 }
 
 static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
-- 
2.24.1

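The branch structure that this patch adds to vmx_flush_tlb() can be modeled
without touching hardware by recording which invalidation the code would
issue.  In this hypothetical sketch, plan_flush(), struct flush_cfg and the
OP_* values are stand-ins invented here; each OP_* value notes in a comment
which real helper it represents.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of which flush operation the policy chooses. */
enum flush_op {
	OP_EPT_GLOBAL,	/* stands in for ept_sync_global() */
	OP_VPID_GLOBAL,	/* stands in for vpid_sync_vcpu_global() */
	OP_VPID_SINGLE,	/* stands in for vpid_sync_vcpu_single(vpid) */
	OP_CURRENT,	/* stands in for __vmx_flush_tlb(vcpu, vpid, false) */
};

struct flush_cfg {
	bool enable_ept;
	bool enable_vpid;
	bool has_invvpid_global;
	uint16_t vpid;		/* L1's VPID */
	uint16_t vpid02;	/* VPID KVM uses for L2 */
};

struct flush_plan {
	enum flush_op ops[2];
	uint16_t vpids[2];
	int n;
};

static void plan_add(struct flush_plan *p, enum flush_op op, uint16_t vpid)
{
	p->ops[p->n] = op;
	p->vpids[p->n] = vpid;
	p->n++;
}

/* Mirrors the if/else chain of vmx_flush_tlb() as modified by the patch. */
static struct flush_plan plan_flush(const struct flush_cfg *c,
				    bool invalidate_gpa)
{
	struct flush_plan p = { .n = 0 };

	if (!invalidate_gpa) {
		plan_add(&p, OP_CURRENT, c->vpid);
	} else if (c->enable_ept) {
		plan_add(&p, OP_EPT_GLOBAL, 0);
	} else if (c->enable_vpid) {
		if (c->has_invvpid_global) {
			plan_add(&p, OP_VPID_GLOBAL, 0);
		} else {
			/* No global INVVPID: nuke both L1's and L2's VPIDs. */
			plan_add(&p, OP_VPID_SINGLE, c->vpid);
			plan_add(&p, OP_VPID_SINGLE, c->vpid02);
		}
	}
	return p;
}
```

As in the patch, a flush with @invalidate_gpa set while both EPT and VPID are
disabled issues no explicit invalidation in this model.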


* [PATCH v3 02/37] KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT)
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
  2020-03-20 21:27 ` [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush Sean Christopherson
@ 2020-03-20 21:27 ` Sean Christopherson
  2020-03-23 14:51   ` Vitaly Kuznetsov
  2020-03-20 21:27 ` [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 Sean Christopherson
                   ` (34 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Signal VM-Fail for the single-context variant of INVEPT if the specified
EPTP is invalid.  Per the INVEPT pseudocode in Intel's SDM, it's subject
to the standard EPT checks:

  If VM entry with the "enable EPT" VM execution control set to 1 would
  fail due to the EPTP value then VMfail(Invalid operand to INVEPT/INVVPID);

Fixes: bfd0a56b90005 ("nEPT: Nested INVEPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 8578513907d7..f3774cef4fd4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5156,8 +5156,12 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 	}
 
 	switch (type) {
-	case VMX_EPT_EXTENT_GLOBAL:
 	case VMX_EPT_EXTENT_CONTEXT:
+		if (!nested_vmx_check_eptp(vcpu, operand.eptp))
+			return nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		fallthrough;
+	case VMX_EPT_EXTENT_GLOBAL:
 	/*
 	 * TODO: Sync the necessary shadow EPT roots here, rather than
 	 * at the next emulated VM-entry.
-- 
2.24.1


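For reference, the VM-entry EPTP checks that the quoted pseudocode refers to
can be approximated as follows.  This is a simplified, hypothetical sketch
rather than KVM's nested_vmx_check_eptp(): it assumes 4-level EPT only,
models the EPT capability MSR as a plain struct (struct ept_caps, invented
here), and treats bits 11:7 as reserved even though newer CPUs may assign
meaning to some of them.  The field layout used (bits 2:0 memory type with
0 = UC and 6 = WB, bits 5:3 page-walk length minus one, bit 6 A/D enable)
follows the SDM's EPTP format.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPTP_MT_MASK	0x7ULL		/* bits 2:0: memory type */
#define EPTP_MT_UC	0x0ULL
#define EPTP_MT_WB	0x6ULL
#define EPTP_PWL_MASK	(0x7ULL << 3)	/* bits 5:3: page-walk length - 1 */
#define EPTP_PWL_4	(0x3ULL << 3)	/* 4-level EPT */
#define EPTP_AD_ENABLE	(1ULL << 6)	/* bit 6: accessed/dirty flags */
#define EPTP_RSVD_LOW	(0x1fULL << 7)	/* bits 11:7, treated as reserved here */

/* Hypothetical capability snapshot, standing in for the EPT/VPID cap MSR. */
struct ept_caps {
	bool uc_ok;
	bool wb_ok;
	bool ad_ok;
	unsigned int maxphyaddr;	/* physical-address width, e.g. 46 */
};

/* Returns true if @eptp would pass these simplified VM-entry checks. */
static bool sketch_check_eptp(const struct ept_caps *caps, uint64_t eptp)
{
	uint64_t mt = eptp & EPTP_MT_MASK;

	if (!((mt == EPTP_MT_UC && caps->uc_ok) ||
	      (mt == EPTP_MT_WB && caps->wb_ok)))
		return false;
	if ((eptp & EPTP_PWL_MASK) != EPTP_PWL_4)
		return false;
	if ((eptp & EPTP_AD_ENABLE) && !caps->ad_ok)
		return false;
	if (eptp & EPTP_RSVD_LOW)
		return false;
	/* Address bits at or above MAXPHYADDR must be zero. */
	if (caps->maxphyaddr < 64 && (eptp >> caps->maxphyaddr))
		return false;
	return true;
}
```

With this patch, a failing check makes the single-context INVEPT emulation
signal VMfail instead of falling through to the global case.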

* [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
  2020-03-20 21:27 ` [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush Sean Christopherson
  2020-03-20 21:27 ` [PATCH v3 02/37] KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT) Sean Christopherson
@ 2020-03-20 21:27 ` Sean Christopherson
  2020-03-23 15:24   ` Vitaly Kuznetsov
  2020-03-23 16:24   ` Jim Mattson
  2020-03-20 21:28 ` [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT Sean Christopherson
                   ` (33 subsequent siblings)
  36 siblings, 2 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
changes to the EPT tables managed by L1 need to be recognized, and
relying on KVM to always flush L2's EPTP context on nested VM-Enter is
dangerous.

Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
TLB flush if necessary, e.g. if L1 has never entered L2 then there is
nothing to be done.

Nuking all L2 roots is overkill for the single-context variant, but it's
the safe and easy bet.  A more precise zap mechanism will be added in
the future.  Add a TODO to call out that KVM only needs to invalidate
affected contexts.

Fixes: b119019847fbc ("kvm: nVMX: Remove unnecessary sync_roots from handle_invept")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f3774cef4fd4..9624cea4ed9f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5160,12 +5160,12 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 		if (!nested_vmx_check_eptp(vcpu, operand.eptp))
 			return nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+
+		/* TODO: sync only the target EPTP context. */
 		fallthrough;
 	case VMX_EPT_EXTENT_GLOBAL:
-	/*
-	 * TODO: Sync the necessary shadow EPT roots here, rather than
-	 * at the next emulated VM-entry.
-	 */
+		kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu,
+				   KVM_MMU_ROOTS_ALL);
 		break;
 	default:
 		BUG_ON(1);
-- 
2.24.1


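The TODO added above (sync only the target EPTP context) and the later
"Free only the affected contexts when emulating INVEPT" patch listed in the
cover letter both point toward freeing only matching roots.  That later
patch's code is not shown here, so the following is a hypothetical sketch of
the idea only: struct sketch_root and sketch_invept() are invented, and roots
are matched on the EPTP's physical-address bits 51:12, mirroring the
EPTP_PA_MASK comparison in nested_ept_root_matches() from patch 07.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_ROOTS 4
/* Bits 51:12, the page-table base address field of the EPTP. */
#define SKETCH_EPTP_PA_MASK (((1ULL << 52) - 1) & ~((1ULL << 12) - 1))

struct sketch_root {
	bool valid;
	uint64_t eptp;	/* L1 EPTP this shadow EPT root was built for */
};

enum sketch_invept_type { SKETCH_INVEPT_CONTEXT, SKETCH_INVEPT_GLOBAL };

/* Free matching roots (or all roots for global); returns how many were freed. */
static int sketch_invept(struct sketch_root roots[SKETCH_NUM_ROOTS],
			 enum sketch_invept_type type, uint64_t eptp)
{
	int freed = 0;

	for (int i = 0; i < SKETCH_NUM_ROOTS; i++) {
		if (!roots[i].valid)
			continue;
		/* Single-context: skip roots built for a different EPT base. */
		if (type == SKETCH_INVEPT_CONTEXT &&
		    (roots[i].eptp & SKETCH_EPTP_PA_MASK) !=
		    (eptp & SKETCH_EPTP_PA_MASK))
			continue;
		roots[i].valid = false;
		freed++;
	}
	return freed;
}
```

Comparing only the address bits means roots that differ just in memory type
or A/D enablement but point at the same EPT tables are freed together.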

* [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (2 preceding siblings ...)
  2020-03-20 21:27 ` [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-23 15:34   ` Vitaly Kuznetsov
  2020-03-20 21:28 ` [PATCH v3 05/37] KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault) Sean Christopherson
                   ` (32 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

From: Junaid Shahid <junaids@google.com>

Free all roots when emulating INVVPID for L1 and EPT is disabled, as
outstanding changes to the page tables managed by L1 need to be
recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
because VPID is not tracked by the MMU role, all roots in the current
MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
stale SPTEs.

Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
Signed-off-by: Junaid Shahid <junaids@google.com>
[sean: ported to upstream KVM, reworded the comment and changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9624cea4ed9f..bc74fbbf33c6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 		return kvm_skip_emulated_instruction(vcpu);
 	}
 
+	/*
+	 * Sync the shadow page tables if EPT is disabled, as L1 is invalidating
+	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
+	 * VPIDs are not tracked in the MMU role.
+	 *
+	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
+	 * an MMU when EPT is disabled.
+	 *
+	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
+	 */
+	if (!enable_ept)
+		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
+				   KVM_MMU_ROOTS_ALL);
+
 	return nested_vmx_succeed(vcpu);
 }
 
-- 
2.24.1


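Why freeing all roots matters can be seen in a toy model of a root cache
keyed only by CR3 (no VPID in the key), where a fast switch reuses a cached
root without re-syncing it.  This is a hypothetical sketch: struct sk_mmu and
its helpers are invented, and the model ignores write-protection and unsync
tracking.  It only shows that a cached shadow translation outlives an edit to
the guest PTE unless the roots are freed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_NUM_ROOTS 3

/* Toy shadow root: caches one translation copied from the guest PTE. */
struct sk_root {
	bool valid;
	uint64_t cr3;	/* lookup key; note there is no VPID in the key */
	uint64_t spte;	/* snapshot of the guest PTE when the root was built */
};

struct sk_mmu {
	struct sk_root roots[SK_NUM_ROOTS];
	uint64_t guest_pte;	/* the single guest PTE L1 can modify */
};

/* Fast switch: reuse a cached root for @cr3 without syncing, else build one. */
static uint64_t sk_switch_and_translate(struct sk_mmu *mmu, uint64_t cr3)
{
	int free_slot = -1;

	for (int i = 0; i < SK_NUM_ROOTS; i++) {
		if (mmu->roots[i].valid && mmu->roots[i].cr3 == cr3)
			return mmu->roots[i].spte;	/* no sync: may be stale */
		if (!mmu->roots[i].valid && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		free_slot = 0;	/* trivial eviction policy */
	mmu->roots[free_slot] = (struct sk_root){ true, cr3, mmu->guest_pte };
	return mmu->guest_pte;
}

/* Toy analogue of kvm_mmu_free_roots(..., KVM_MMU_ROOTS_ALL). */
static void sk_free_all_roots(struct sk_mmu *mmu)
{
	for (int i = 0; i < SK_NUM_ROOTS; i++)
		mmu->roots[i].valid = false;
}
```

In this model, emulating INVVPID without sk_free_all_roots() lets the next
switch to L2's CR3 return the old PTE value; freeing the roots forces a
rebuild from the current guest PTE.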

* [PATCH v3 05/37] KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault)
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (3 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-23 15:47   ` Vitaly Kuznetsov
  2020-03-20 21:28 ` [PATCH v3 06/37] KVM: x86: Consolidate logic for injecting page faults to L1 Sean Christopherson
                   ` (31 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Export the page fault propagation helper so that VMX can use it to
correctly emulate TLB invalidation on page faults in an upcoming patch.

In the (hopefully) not-too-distant future, SGX virtualization will also
want access to the helper for injecting page faults to the correct level
(L1 vs. L2) when emulating ENCLS instructions.

Rename the function to kvm_inject_emulated_page_fault() to clarify that
it is (a) injecting a fault and (b) only for page faults.  WARN if it's
invoked with an exception other than PF_VECTOR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/x86.c              | 8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9a183e9d4cb1..328b1765ff76 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1447,6 +1447,8 @@ void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
+bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+				    struct x86_exception *fault);
 int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			    gfn_t gfn, void *data, int offset, int len,
 			    u32 access);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e54c6ad628a8..64ed6e6e2b56 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -611,8 +611,11 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
 
-static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
+bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+				    struct x86_exception *fault)
 {
+	WARN_ON_ONCE(fault->vector != PF_VECTOR);
+
 	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
 		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
 	else
@@ -620,6 +623,7 @@ static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fau
 
 	return fault->nested_page_fault;
 }
+EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
@@ -6373,7 +6377,7 @@ static bool inject_emulated_exception(struct kvm_vcpu *vcpu)
 {
 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
 	if (ctxt->exception.vector == PF_VECTOR)
-		return kvm_propagate_fault(vcpu, &ctxt->exception);
+		return kvm_inject_emulated_page_fault(vcpu, &ctxt->exception);
 
 	if (ctxt->exception.error_code_valid)
 		kvm_queue_exception_e(vcpu, ctxt->exception.vector,
-- 
2.24.1


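The routing performed by kvm_inject_emulated_page_fault() reduces to a small
decision.  The sketch below models it with invented types (struct sk_fault,
enum sk_inject_target); the warned flag stands in for the WARN_ON_ONCE() on
non-#PF vectors, and #PF is exception vector 14 on x86.

```c
#include <stdbool.h>

#define SK_PF_VECTOR 14	/* #PF is exception vector 14 on x86 */

enum sk_inject_target {
	SK_INJECT_NESTED_MMU,	/* vcpu->arch.nested_mmu.inject_page_fault */
	SK_INJECT_CURRENT_MMU,	/* vcpu->arch.mmu->inject_page_fault */
};

struct sk_fault {
	int vector;
	bool nested_page_fault;
};

/*
 * Returns which hook would receive the fault.  *warned mimics the
 * WARN_ON_ONCE() for non-#PF vectors.  (The real function additionally
 * returns fault->nested_page_fault to its caller.)
 */
static enum sk_inject_target sk_route_fault(bool mmu_is_nested,
					    const struct sk_fault *f,
					    bool *warned)
{
	*warned = (f->vector != SK_PF_VECTOR);
	if (mmu_is_nested && !f->nested_page_fault)
		return SK_INJECT_NESTED_MMU;
	return SK_INJECT_CURRENT_MMU;
}
```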

* [PATCH v3 06/37] KVM: x86: Consolidate logic for injecting page faults to L1
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (4 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 05/37] KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault) Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-24  0:47   ` Paolo Bonzini
  2020-03-20 21:28 ` [PATCH v3 07/37] KVM: x86: Sync SPTEs when injecting page/EPT fault into L1 Sean Christopherson
                   ` (30 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Move the MMU's inject_page_fault(), which is used to inject page faults
encountered when walking L1's page tables, to x86.c and use it to handle
the non-nested path of kvm_inject_emulated_page_fault().  Using a common
helper will reduce duplicate code in a future patch to sync SPTEs on
emulated page faults, and also eliminates the rather confusing function
name "inject_page_fault", which collides with struct kvm_mmu's hook.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/mmu/mmu.c          | 6 ------
 arch/x86/kvm/mmu/paging_tmpl.h  | 2 +-
 arch/x86/kvm/x86.c              | 8 +++++++-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 328b1765ff76..cdbf822c5c8b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1447,6 +1447,8 @@ void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
+void kvm_inject_l1_page_fault(struct kvm_vcpu *vcpu,
+			      struct x86_exception *fault);
 bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 				    struct x86_exception *fault);
 int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 560e85ebdf22..5ae620881bbc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4357,12 +4357,6 @@ static unsigned long get_cr3(struct kvm_vcpu *vcpu)
 	return kvm_read_cr3(vcpu);
 }
 
-static void inject_page_fault(struct kvm_vcpu *vcpu,
-			      struct x86_exception *fault)
-{
-	vcpu->arch.mmu->inject_page_fault(vcpu, fault);
-}
-
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 			   unsigned int access, int *nr_present)
 {
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 1ddbfff64ccc..ac613f2fae01 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -812,7 +812,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	if (!r) {
 		pgprintk("%s: guest page fault\n", __func__);
 		if (!prefault)
-			inject_page_fault(vcpu, &walker.fault);
+			kvm_inject_l1_page_fault(vcpu, &walker.fault);
 
 		return RET_PF_RETRY;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64ed6e6e2b56..fcad522f221e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -611,6 +611,12 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
 
+void kvm_inject_l1_page_fault(struct kvm_vcpu *vcpu,
+			      struct x86_exception *fault)
+{
+	vcpu->arch.mmu->inject_page_fault(vcpu, fault);
+}
+
 bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 				    struct x86_exception *fault)
 {
@@ -619,7 +625,7 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
 		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
 	else
-		vcpu->arch.mmu->inject_page_fault(vcpu, fault);
+		kvm_inject_l1_page_fault(vcpu, fault);
 
 	return fault->nested_page_fault;
 }
-- 
2.24.1



* [PATCH v3 07/37] KVM: x86: Sync SPTEs when injecting page/EPT fault into L1
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (5 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 06/37] KVM: x86: Consolidate logic for injecting page faults to L1 Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 08/37] KVM: VMX: Skip global INVVPID fallback if vpid==0 in vpid_sync_context() Sean Christopherson
                   ` (29 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

From: Junaid Shahid <junaids@google.com>

When injecting a page fault or EPT violation/misconfiguration, invoke
->invlpg() to sync any shadow PTEs associated with the faulting address,
including those in previous MMUs that are associated with L1's current
EPTP (in a nested EPT scenario).  Skip the sync (which incurs a costly
retpoline) if the MMU can't have unsync'd SPTEs for the address.

In addition, flush any hardware TLB entries associated with the faulting
address if the fault is the result of emulation, i.e. not an async
page fault.  !PRESENT and RSVD page faults are exempt from the flushing
as the CPU is not allowed to cache such translations.
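
The two conditions described above reduce to predicates on the
architectural #PF error code, where bit 0 is P (present), bit 1 is W/R and
bit 3 is RSVD.  In this hypothetical sketch the SK_PFERR_* names are invented
stand-ins for KVM's PFERR_*_MASK constants; the predicates follow the checks
in the x86.c hunk of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural #PF error-code bits (KVM names them PFERR_*_MASK). */
#define SK_PFERR_PRESENT	(1u << 0)
#define SK_PFERR_WRITE		(1u << 1)
#define SK_PFERR_RSVD		(1u << 3)

/*
 * Flush the hardware TLB entry only if one can exist: !PRESENT and RSVD
 * faults are exempt because the CPU may not cache such translations.
 */
static bool sk_needs_tlb_flush(uint32_t error_code)
{
	return (error_code & SK_PFERR_PRESENT) &&
	       !(error_code & SK_PFERR_RSVD);
}

/* Sync SPTEs only for shadow (non-direct) MMUs and present faults. */
static bool sk_needs_spte_sync(bool direct_map, uint32_t error_code)
{
	return !direct_map && (error_code & SK_PFERR_PRESENT);
}
```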

Signed-off-by: Junaid Shahid <junaids@google.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 44 +++++++++++++++++++++++++++++----------
 arch/x86/kvm/vmx/vmx.c    |  2 +-
 arch/x86/kvm/x86.c        | 17 +++++++++++++++
 3 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bc74fbbf33c6..5554727d7ba8 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -323,6 +323,14 @@ void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu)
 	vcpu_put(vcpu);
 }
 
+#define EPTP_PA_MASK	GENMASK_ULL(51, 12)
+
+static bool nested_ept_root_matches(hpa_t root_hpa, u64 root_eptp, u64 eptp)
+{
+	return VALID_PAGE(root_hpa) &&
+	       ((root_eptp & EPTP_PA_MASK) == (eptp & EPTP_PA_MASK));
+}
+
 static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 		struct x86_exception *fault)
 {
@@ -330,18 +338,32 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exit_reason;
 	unsigned long exit_qualification = vcpu->arch.exit_qualification;
+	struct kvm_mmu_root_info *prev;
+	u64 gpa = fault->address;
+	int i;
 
 	if (vmx->nested.pml_full) {
 		exit_reason = EXIT_REASON_PML_FULL;
 		vmx->nested.pml_full = false;
 		exit_qualification &= INTR_INFO_UNBLOCK_NMI;
-	} else if (fault->error_code & PFERR_RSVD_MASK)
-		exit_reason = EXIT_REASON_EPT_MISCONFIG;
-	else
-		exit_reason = EXIT_REASON_EPT_VIOLATION;
+	} else {
+		if (fault->error_code & PFERR_RSVD_MASK)
+			exit_reason = EXIT_REASON_EPT_MISCONFIG;
+		else
+			exit_reason = EXIT_REASON_EPT_VIOLATION;
+
+		/* Sync SPTEs in cached MMUs that track the current L1 EPTP. */
+		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
+			prev = &vcpu->arch.mmu->prev_roots[i];
+
+			if (nested_ept_root_matches(prev->hpa, prev->cr3,
+						    vmcs12->ept_pointer))
+				vcpu->arch.mmu->invlpg(vcpu, gpa, prev->hpa);
+		}
+	}
 
 	nested_vmx_vmexit(vcpu, exit_reason, 0, exit_qualification);
-	vmcs12->guest_physical_address = fault->address;
+	vmcs12->guest_physical_address = gpa;
 }
 
 static void nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
@@ -4559,7 +4581,7 @@ static int nested_vmx_get_vmptr(struct kvm_vcpu *vcpu, gpa_t *vmpointer)
 		return 1;
 
 	if (kvm_read_guest_virt(vcpu, gva, vmpointer, sizeof(*vmpointer), &e)) {
-		kvm_inject_page_fault(vcpu, &e);
+		kvm_inject_emulated_page_fault(vcpu, &e);
 		return 1;
 	}
 
@@ -4868,7 +4890,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
 			return 1;
 		/* _system ok, nested_vmx_check_permission has verified cpl=0 */
 		if (kvm_write_guest_virt_system(vcpu, gva, &value, len, &e)) {
-			kvm_inject_page_fault(vcpu, &e);
+			kvm_inject_emulated_page_fault(vcpu, &e);
 			return 1;
 		}
 	}
@@ -4942,7 +4964,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
 					instr_info, false, len, &gva))
 			return 1;
 		if (kvm_read_guest_virt(vcpu, gva, &value, len, &e)) {
-			kvm_inject_page_fault(vcpu, &e);
+			kvm_inject_emulated_page_fault(vcpu, &e);
 			return 1;
 		}
 	}
@@ -5107,7 +5129,7 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
 	/* *_system ok, nested_vmx_check_permission has verified cpl=0 */
 	if (kvm_write_guest_virt_system(vcpu, gva, (void *)&current_vmptr,
 					sizeof(gpa_t), &e)) {
-		kvm_inject_page_fault(vcpu, &e);
+		kvm_inject_emulated_page_fault(vcpu, &e);
 		return 1;
 	}
 	return nested_vmx_succeed(vcpu);
@@ -5151,7 +5173,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 			vmx_instruction_info, false, sizeof(operand), &gva))
 		return 1;
 	if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) {
-		kvm_inject_page_fault(vcpu, &e);
+		kvm_inject_emulated_page_fault(vcpu, &e);
 		return 1;
 	}
 
@@ -5215,7 +5237,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 			vmx_instruction_info, false, sizeof(operand), &gva))
 		return 1;
 	if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) {
-		kvm_inject_page_fault(vcpu, &e);
+		kvm_inject_emulated_page_fault(vcpu, &e);
 		return 1;
 	}
 	if (operand.vpid >> 16)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b447d66f44e6..ba49323a89d8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5404,7 +5404,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 		return 1;
 
 	if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) {
-		kvm_inject_page_fault(vcpu, &e);
+		kvm_inject_emulated_page_fault(vcpu, &e);
 		return 1;
 	}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fcad522f221e..f506248d61a1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -614,6 +614,11 @@ EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
 void kvm_inject_l1_page_fault(struct kvm_vcpu *vcpu,
 			      struct x86_exception *fault)
 {
+	if (!vcpu->arch.mmu->direct_map &&
+	    (fault->error_code & PFERR_PRESENT_MASK))
+		vcpu->arch.mmu->invlpg(vcpu, fault->address,
+				       vcpu->arch.mmu->root_hpa);
+
 	vcpu->arch.mmu->inject_page_fault(vcpu, fault);
 }
 
@@ -622,7 +627,19 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 {
 	WARN_ON_ONCE(fault->vector != PF_VECTOR);
 
+	/*
+	 * Invalidate the TLB entry for the faulting address, if one can exist,
+	 * else the access will fault indefinitely (and to emulate hardware).
+	 */
+	if ((fault->error_code & PFERR_PRESENT_MASK) &&
+	    !(fault->error_code & PFERR_RSVD_MASK))
+		kvm_x86_ops->tlb_flush_gva(vcpu, fault->address);
+
 	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
+		/*
+		 * No need to sync SPTEs, the fault is being injected into L2,
+		 * whose page tables are not being shadowed.
+		 */
 		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
 	else
 		kvm_inject_l1_page_fault(vcpu, fault);
-- 
2.24.1
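The TLB-flush check added to kvm_inject_emulated_page_fault() above reduces to a predicate on the #PF error code. Below is a minimal userspace sketch. It assumes the architectural x86 error-code layout (P is bit 0, W is bit 1, RSVD is bit 3) and redefines the PFERR_* masks locally. should_flush_gva() is a hypothetical name, not a KVM function.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 #PF error-code bits (architectural), redefined locally for the sketch. */
#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_WRITE_MASK   (1u << 1)
#define PFERR_RSVD_MASK    (1u << 3)

/*
 * Hypothetical helper mirroring the condition added above: a TLB entry for
 * the faulting address can only exist if the translation was present and
 * did not hit reserved bits, so only that case needs a flush.
 */
static bool should_flush_gva(uint32_t error_code)
{
	return (error_code & PFERR_PRESENT_MASK) &&
	       !(error_code & PFERR_RSVD_MASK);
}
```

Both not-present faults and reserved-bit faults skip the flush. In either case there is no cached translation to invalidate.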


^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH v3 08/37] KVM: VMX: Skip global INVVPID fallback if vpid==0 in vpid_sync_context()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (6 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 07/37] KVM: x86: Sync SPTEs when injecting page/EPT fault into L1 Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-25  9:33   ` Vitaly Kuznetsov
  2020-03-20 21:28 ` [PATCH v3 09/37] KVM: VMX: Use vpid_sync_context() directly when possible Sean Christopherson
                   ` (28 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Skip the global INVVPID in the unlikely scenario that vpid==0 and the
SINGLE_CONTEXT variant of INVVPID is unsupported.  If vpid==0, there's
no need to do INVVPID, as it's impossible to do VM-Enter with VPID enabled
and vmcs.VPID==0, i.e. there can't be any TLB entries for the vCPU with
vpid==0.  The fact that the SINGLE_CONTEXT variant isn't supported is
irrelevant.
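A userspace model of vpid_sync_context() as it looks after this patch makes the vpid==0 case concrete. It is a sketch built on stated assumptions. The capability struct, the enum of issued operations, and the *_model() names are hypothetical. Instead of executing INVVPID, the functions report which variant would be issued. At this point in the series the low-level helpers still check their own capability, and the model mirrors that.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the host's INVVPID capabilities. */
struct invvpid_caps {
	bool single;	/* SINGLE_CONTEXT variant supported */
	bool global;	/* ALL_CONTEXT variant supported */
};

/* Which INVVPID variant the model would have executed. */
enum invvpid_op { INVVPID_NONE, INVVPID_SINGLE, INVVPID_GLOBAL };

/* Models vpid_sync_vcpu_single(): nothing to do for vpid 0. */
static enum invvpid_op vpid_sync_vcpu_single_model(int vpid)
{
	return vpid == 0 ? INVVPID_NONE : INVVPID_SINGLE;
}

/* Models vpid_sync_context() with the new "else if (vpid != 0)" guard. */
static enum invvpid_op vpid_sync_context_model(const struct invvpid_caps *caps,
					       int vpid)
{
	if (caps->single)
		return vpid_sync_vcpu_single_model(vpid);
	else if (vpid != 0)
		/* vpid_sync_vcpu_global() still checks its capability here. */
		return caps->global ? INVVPID_GLOBAL : INVVPID_NONE;
	return INVVPID_NONE;	/* vpid 0 cannot have TLB entries */
}
```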

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/ops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/ops.h b/arch/x86/kvm/vmx/ops.h
index 45eaedee2ac0..33645a8e5463 100644
--- a/arch/x86/kvm/vmx/ops.h
+++ b/arch/x86/kvm/vmx/ops.h
@@ -285,7 +285,7 @@ static inline void vpid_sync_context(int vpid)
 {
 	if (cpu_has_vmx_invvpid_single())
 		vpid_sync_vcpu_single(vpid);
-	else
+	else if (vpid != 0)
 		vpid_sync_vcpu_global();
 }
 
-- 
2.24.1



* [PATCH v3 09/37] KVM: VMX: Use vpid_sync_context() directly when possible
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (7 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 08/37] KVM: VMX: Skip global INVVPID fallback if vpid==0 in vpid_sync_context() Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 10/37] KVM: VMX: Move vpid_sync_vcpu_addr() down a few lines Sean Christopherson
                   ` (27 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Use vpid_sync_context() directly for flows that run if and only if
enable_vpid=1, or more specifically, nested VMX flows that are gated by
vmx->nested.msrs.secondary_ctls_high.SECONDARY_EXEC_ENABLE_VPID being
set, which is allowed if and only if enable_vpid=1.  Because these flows
call __vmx_flush_tlb() with @invalidate_gpa=false, the if-statement that
decides between INVEPT and INVVPID will always go down the INVVPID path,
i.e. call vpid_sync_context() because
"enable_ept && (invalidate_gpa || !enable_vpid)" always evaluates false.
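The claim that these flows never take the INVEPT branch can be checked exhaustively, because only one input (enable_ept) is free. The small sketch below assumes nothing beyond the boolean expression quoted above. Both function names are hypothetical.

```c
#include <stdbool.h>

/* The branch condition from the old __vmx_flush_tlb(): true selects INVEPT. */
static bool flush_uses_invept(bool enable_ept, bool enable_vpid,
			      bool invalidate_gpa)
{
	return enable_ept && (invalidate_gpa || !enable_vpid);
}

/*
 * For the nested VPID flows, enable_vpid=1 and invalidate_gpa=false, so try
 * both values of enable_ept.  Returns true if neither takes the INVEPT path.
 */
static bool nested_vpid_flows_never_invept(void)
{
	for (int ept = 0; ept <= 1; ept++) {
		if (flush_uses_invept(ept, true, false))
			return false;
	}
	return true;
}
```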

This helps pave the way toward removing @invalidate_gpa and @vpid from
__vmx_flush_tlb() and its callers.

Opportunistically drop unnecessary braces in handle_invvpid() around an
affected __vmx_flush_tlb()->vpid_sync_context() conversion.

No functional change intended.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 5554727d7ba8..81bc4791d704 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2481,7 +2481,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		if (nested_cpu_has_vpid(vmcs12) && nested_has_guest_tlb_tag(vcpu)) {
 			if (vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
 				vmx->nested.last_vpid = vmcs12->virtual_processor_id;
-				__vmx_flush_tlb(vcpu, nested_get_vpid02(vcpu), false);
+				vpid_sync_context(nested_get_vpid02(vcpu));
 			}
 		} else {
 			/*
@@ -5251,21 +5251,21 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 		    is_noncanonical_address(operand.gla, vcpu))
 			return nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
-		if (cpu_has_vmx_invvpid_individual_addr()) {
+		if (cpu_has_vmx_invvpid_individual_addr())
 			__invvpid(VMX_VPID_EXTENT_INDIVIDUAL_ADDR,
 				vpid02, operand.gla);
-		} else
-			__vmx_flush_tlb(vcpu, vpid02, false);
+		else
+			vpid_sync_context(vpid02);
 		break;
 	case VMX_VPID_EXTENT_SINGLE_CONTEXT:
 	case VMX_VPID_EXTENT_SINGLE_NON_GLOBAL:
 		if (!operand.vpid)
 			return nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
-		__vmx_flush_tlb(vcpu, vpid02, false);
+		vpid_sync_context(vpid02);
 		break;
 	case VMX_VPID_EXTENT_ALL_CONTEXT:
-		__vmx_flush_tlb(vcpu, vpid02, false);
+		vpid_sync_context(vpid02);
 		break;
 	default:
 		WARN_ON_ONCE(1);
-- 
2.24.1



* [PATCH v3 10/37] KVM: VMX: Move vpid_sync_vcpu_addr() down a few lines
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (8 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 09/37] KVM: VMX: Use vpid_sync_context() directly when possible Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 11/37] KVM: VMX: Handle INVVPID fallback logic in vpid_sync_vcpu_addr() Sean Christopherson
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Move vpid_sync_vcpu_addr() below vpid_sync_context() so that it can be
refactored in a future patch to call vpid_sync_context() directly when
the "individual address" INVVPID variant isn't supported.

No functional change intended.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/ops.h | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx/ops.h b/arch/x86/kvm/vmx/ops.h
index 33645a8e5463..dd7ab61bfcc1 100644
--- a/arch/x86/kvm/vmx/ops.h
+++ b/arch/x86/kvm/vmx/ops.h
@@ -253,19 +253,6 @@ static inline void __invept(unsigned long ext, u64 eptp, gpa_t gpa)
 	vmx_asm2(invept, "r"(ext), "m"(operand), ext, eptp, gpa);
 }
 
-static inline bool vpid_sync_vcpu_addr(int vpid, gva_t addr)
-{
-	if (vpid == 0)
-		return true;
-
-	if (cpu_has_vmx_invvpid_individual_addr()) {
-		__invvpid(VMX_VPID_EXTENT_INDIVIDUAL_ADDR, vpid, addr);
-		return true;
-	}
-
-	return false;
-}
-
 static inline void vpid_sync_vcpu_single(int vpid)
 {
 	if (vpid == 0)
@@ -289,6 +276,19 @@ static inline void vpid_sync_context(int vpid)
 		vpid_sync_vcpu_global();
 }
 
+static inline bool vpid_sync_vcpu_addr(int vpid, gva_t addr)
+{
+	if (vpid == 0)
+		return true;
+
+	if (cpu_has_vmx_invvpid_individual_addr()) {
+		__invvpid(VMX_VPID_EXTENT_INDIVIDUAL_ADDR, vpid, addr);
+		return true;
+	}
+
+	return false;
+}
+
 static inline void ept_sync_global(void)
 {
 	__invept(VMX_EPT_EXTENT_GLOBAL, 0, 0);
-- 
2.24.1



* [PATCH v3 11/37] KVM: VMX: Handle INVVPID fallback logic in vpid_sync_vcpu_addr()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (9 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 10/37] KVM: VMX: Move vpid_sync_vcpu_addr() down a few lines Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 12/37] KVM: VMX: Drop redundant capability checks in low level INVVPID helpers Sean Christopherson
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Directly invoke vpid_sync_context() to flush the entire context
(single-context INVVPID, or global INVVPID if single-context isn't
supported) when the individual address variant is unavailable, instead
of deferring such behavior to the caller.
code as the logic is basically identical to the emulation of the
individual address variant in handle_invvpid().
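The resulting fallback order can be modeled in userspace: individual address first, then single context, then global. As before, this is a hypothetical sketch and not KVM code. The capability struct and function names are invented for illustration. The functions return the INVVPID variant that would be issued instead of executing it.

```c
#include <stdbool.h>

/* Hypothetical host INVVPID capabilities. */
struct invvpid_caps {
	bool individual_addr;
	bool single;
	bool global;
};

enum invvpid_op { INVVPID_NONE, INVVPID_ADDR, INVVPID_SINGLE, INVVPID_GLOBAL };

/* Models vpid_sync_context(): single context if possible, else global. */
static enum invvpid_op sync_context(const struct invvpid_caps *caps, int vpid)
{
	if (caps->single)
		return vpid == 0 ? INVVPID_NONE : INVVPID_SINGLE;
	if (vpid != 0 && caps->global)
		return INVVPID_GLOBAL;
	return INVVPID_NONE;
}

/* Models vpid_sync_vcpu_addr() after this patch: it now owns the fallback. */
static enum invvpid_op sync_vcpu_addr(const struct invvpid_caps *caps,
				      int vpid, unsigned long addr)
{
	(void)addr;	/* only real hardware cares about the address */
	if (vpid == 0)
		return INVVPID_NONE;
	if (caps->individual_addr)
		return INVVPID_ADDR;
	return sync_context(caps, vpid);
}
```

Callers such as vmx_flush_tlb_gva() no longer need to inspect a return value. That matches the bool-to-void signature change in the diff.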

No functional change intended.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/ops.h | 12 +++++-------
 arch/x86/kvm/vmx/vmx.c |  3 +--
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/ops.h b/arch/x86/kvm/vmx/ops.h
index dd7ab61bfcc1..39122699cfeb 100644
--- a/arch/x86/kvm/vmx/ops.h
+++ b/arch/x86/kvm/vmx/ops.h
@@ -276,17 +276,15 @@ static inline void vpid_sync_context(int vpid)
 		vpid_sync_vcpu_global();
 }
 
-static inline bool vpid_sync_vcpu_addr(int vpid, gva_t addr)
+static inline void vpid_sync_vcpu_addr(int vpid, gva_t addr)
 {
 	if (vpid == 0)
-		return true;
+		return;
 
-	if (cpu_has_vmx_invvpid_individual_addr()) {
+	if (cpu_has_vmx_invvpid_individual_addr())
 		__invvpid(VMX_VPID_EXTENT_INDIVIDUAL_ADDR, vpid, addr);
-		return true;
-	}
-
-	return false;
+	else
+		vpid_sync_context(vpid);
 }
 
 static inline void ept_sync_global(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ba49323a89d8..ba24bbda2c12 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2853,8 +2853,7 @@ static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 {
 	int vpid = to_vmx(vcpu)->vpid;
 
-	if (!vpid_sync_vcpu_addr(vpid, addr))
-		vpid_sync_context(vpid);
+	vpid_sync_vcpu_addr(vpid, addr);
 
 	/*
 	 * If VPIDs are not supported or enabled, then the above is a no-op.
-- 
2.24.1



* [PATCH v3 12/37] KVM: VMX: Drop redundant capability checks in low level INVVPID helpers
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (10 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 11/37] KVM: VMX: Handle INVVPID fallback logic in vpid_sync_vcpu_addr() Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 13/37] KVM: nVMX: Use vpid_sync_vcpu_addr() to emulate INVVPID with address Sean Christopherson
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Remove the INVVPID capabilities checks from vpid_sync_vcpu_single() and
vpid_sync_vcpu_global() now that all callers ensure the INVVPID variant
is supported.  Note, in some cases the guarantee is provided in concert
with hardware_setup(), which enables VPID if and only if at least one of
invvpid_single() or invvpid_global() is supported.

Drop the WARN_ON_ONCE() from vmx_flush_tlb() as vpid_sync_vcpu_single()
will trigger a WARN() on INVVPID failure, i.e. if SINGLE_CONTEXT isn't
supported.
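The argument that dropping the capability checks is safe rests on the hardware_setup() invariant stated above. The hypothetical userspace sketch below enumerates every SINGLE_CONTEXT/ALL_CONTEXT capability combination for which VPID would be enabled. For each one, it confirms that vpid_sync_context() never picks an unsupported variant. The struct and function names are assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical host INVVPID capabilities. */
struct invvpid_caps {
	bool single;	/* SINGLE_CONTEXT supported */
	bool global;	/* ALL_CONTEXT supported */
};

/* The hardware_setup() rule: VPID is enabled only if some variant exists. */
static bool vpid_enabled(struct invvpid_caps caps)
{
	return caps.single || caps.global;
}

/*
 * True if vpid_sync_context(), with the capability checks removed from the
 * low-level helpers, would only issue an INVVPID variant that is supported.
 */
static bool sync_context_is_safe(struct invvpid_caps caps, int vpid)
{
	if (caps.single)
		return true;		/* SINGLE_CONTEXT, known supported */
	if (vpid != 0)
		return caps.global;	/* falls back to ALL_CONTEXT */
	return true;			/* vpid == 0: nothing issued */
}

static bool all_enabled_configs_safe(void)
{
	for (int s = 0; s <= 1; s++) {
		for (int g = 0; g <= 1; g++) {
			struct invvpid_caps caps = { .single = s, .global = g };

			if (!vpid_enabled(caps))
				continue;	/* VPID off: INVVPID never used */
			for (int vpid = 0; vpid <= 1; vpid++)
				if (!sync_context_is_safe(caps, vpid))
					return false;
		}
	}
	return true;
}
```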

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/ops.h | 6 ++----
 arch/x86/kvm/vmx/vmx.h | 1 -
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/ops.h b/arch/x86/kvm/vmx/ops.h
index 39122699cfeb..aa1aab52971a 100644
--- a/arch/x86/kvm/vmx/ops.h
+++ b/arch/x86/kvm/vmx/ops.h
@@ -258,14 +258,12 @@ static inline void vpid_sync_vcpu_single(int vpid)
 	if (vpid == 0)
 		return;
 
-	if (cpu_has_vmx_invvpid_single())
-		__invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vpid, 0);
+	__invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vpid, 0);
 }
 
 static inline void vpid_sync_vcpu_global(void)
 {
-	if (cpu_has_vmx_invvpid_global())
-		__invvpid(VMX_VPID_EXTENT_ALL_CONTEXT, 0, 0);
+	__invvpid(VMX_VPID_EXTENT_ALL_CONTEXT, 0, 0);
 }
 
 static inline void vpid_sync_context(int vpid)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d6d67b816ebe..3770ae111e6a 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -537,7 +537,6 @@ static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 			if (cpu_has_vmx_invvpid_global()) {
 				vpid_sync_vcpu_global();
 			} else {
-				WARN_ON_ONCE(!cpu_has_vmx_invvpid_single());
 				vpid_sync_vcpu_single(vmx->vpid);
 				vpid_sync_vcpu_single(vmx->nested.vpid02);
 			}
-- 
2.24.1



* [PATCH v3 13/37] KVM: nVMX: Use vpid_sync_vcpu_addr() to emulate INVVPID with address
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (11 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 12/37] KVM: VMX: Drop redundant capability checks in low level INVVPID helpers Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook Sean Christopherson
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Use vpid_sync_vcpu_addr() to emulate the "individual address" variant of
INVVPID now that said function handles the fallback case of the (host)
CPU not supporting "individual address".

Note, the "vpid == 0" checks in the vpid_sync_*() helpers aren't
actually redundant with the "!operand.vpid" check in handle_invvpid(),
as the vpid passed to vpid_sync_vcpu_addr() is a KVM (host) controlled
value, i.e. vpid02 can be zero even if operand.vpid is non-zero.
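The sketch below illustrates why vpid02 can be zero while operand.vpid is not. It assumes that, as in nested_get_vpid02(), the L2 VPID falls back to L1's VPID when no dedicated vpid02 was allocated. It also assumes that a failed allocation leaves the value at 0. The struct and get_vpid02() are hypothetical stand-ins.

```c
#include <stdint.h>

/* Hypothetical subset of per-vCPU VPID state. */
struct vcpu_vpids {
	uint16_t vpid;		/* L1's VPID, 0 if none could be allocated */
	uint16_t vpid02;	/* dedicated L2 VPID, 0 if none allocated */
};

/* Assumed fallback: use L1's VPID when no dedicated L2 VPID exists. */
static uint16_t get_vpid02(const struct vcpu_vpids *v)
{
	return v->vpid02 ? v->vpid02 : v->vpid;
}
```

The guest-chosen operand.vpid plays no part in this computation. A non-zero operand therefore says nothing about the host value handed to vpid_sync_vcpu_addr().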

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 81bc4791d704..0c71db6fec5a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5251,11 +5251,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 		    is_noncanonical_address(operand.gla, vcpu))
 			return nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
-		if (cpu_has_vmx_invvpid_individual_addr())
-			__invvpid(VMX_VPID_EXTENT_INDIVIDUAL_ADDR,
-				vpid02, operand.gla);
-		else
-			vpid_sync_context(vpid02);
+		vpid_sync_vcpu_addr(vpid02, operand.gla);
 		break;
 	case VMX_VPID_EXTENT_SINGLE_CONTEXT:
 	case VMX_VPID_EXTENT_SINGLE_NON_GLOBAL:
-- 
2.24.1



* [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (12 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 13/37] KVM: nVMX: Use vpid_sync_vcpu_addr() to emulate INVVPID with address Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-25 10:23   ` Vitaly Kuznetsov
  2020-03-20 21:28 ` [PATCH v3 15/37] KVM: VMX: Clean up vmx_flush_tlb_gva() Sean Christopherson
                   ` (22 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Add a dedicated hook to handle flushing TLB entries on behalf of the
guest, i.e. for a paravirtualized TLB flush, and use it directly instead
of bouncing through kvm_vcpu_flush_tlb().
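Some context on the paravirtualized path: the guest requests the flush by setting a flag in the shared steal-time "preempted" byte. record_steal_time() then consumes that flag with xchg(). Below is a C11 sketch of that read-and-clear. It assumes KVM_VCPU_FLUSH_TLB is bit 1 and KVM_VCPU_PREEMPTED is bit 0 of the byte, both redefined locally, as are the other names. The sketch illustrates the atomic handoff and is not KVM's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed steal-time flag layout, redefined locally for the sketch. */
#define VCPU_PREEMPTED (1u << 0)
#define VCPU_FLUSH_TLB (1u << 1)

/*
 * Host side: atomically read and clear the guest-writable byte, so that a
 * flush request set concurrently by the guest cannot be lost between the
 * read and the clear.  Returns true if a guest TLB flush was requested.
 */
static bool consume_pv_flush_request(_Atomic uint8_t *preempted)
{
	return atomic_exchange(preempted, 0) & VCPU_FLUSH_TLB;
}
```

When this returns true, the patch calls ->tlb_flush_guest() instead of the full kvm_vcpu_flush_tlb().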

For VMX, change the effective implementation to never do
INVEPT and flush only the current context, i.e. to always flush via
INVVPID(SINGLE_CONTEXT).  The INVEPT performed by __vmx_flush_tlb() when
@invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only
flush guest-physical mappings; linear and combined mappings are flushed
by VM-Enter when VPID is disabled, and changes in the guest page tables
do not affect guest-physical mappings.

When EPT and VPID are enabled, doing INVVPID is not required (by Intel's
architecture) to invalidate guest-physical mappings, i.e. TLB entries
that cache guest-physical mappings can live across INVVPID as the
mappings are associated with an EPTP, not a VPID.  The intent of
@invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate
gpa mappings", i.e. do INVEPT and not simply INVVPID.  Other than nested
VPID handling, which now calls vpid_sync_context() directly, the only
scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is
enabled) is if KVM is flushing TLB entries from the guest's perspective,
i.e. is only required to invalidate linear mappings.
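The argument above depends on which cached mappings each instruction is architecturally defined to invalidate. INVVPID drops linear and combined mappings for a VPID. INVEPT drops guest-physical and combined mappings for an EPTP. The hypothetical sketch below encodes just those rules and shows that INVVPID covers everything a guest-requested flush must remove. The enum and function names are assumptions, not kernel identifiers.

```c
#include <stdbool.h>

/* Kinds of cached translations the Intel SDM defines for VMX. */
enum mapping_kind { LINEAR, GUEST_PHYSICAL, COMBINED };

/* INVVPID: linear and combined mappings associated with the VPID. */
static bool invvpid_invalidates(enum mapping_kind k)
{
	return k == LINEAR || k == COMBINED;
}

/* INVEPT: guest-physical and combined mappings associated with the EPTP. */
static bool invept_invalidates(enum mapping_kind k)
{
	return k == GUEST_PHYSICAL || k == COMBINED;
}

/*
 * A guest-requested flush only needs to drop translations derived from the
 * guest's own page tables; GPA->HPA (guest-physical) mappings are unaffected.
 */
static bool guest_flush_needs(enum mapping_kind k)
{
	return k != GUEST_PHYSICAL;
}

/* True if INVVPID alone satisfies a guest-requested flush. */
static bool invvpid_sufficient_for_guest_flush(void)
{
	for (int k = LINEAR; k <= COMBINED; k++)
		if (guest_flush_needs(k) && !invvpid_invalidates(k))
			return false;
	return true;
}
```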

For SVM, flushing TLB entries from the guest's perspective can be done
by flushing the current ASID, as changes to the guest's page tables are
associated only with the current ASID.

Adding a dedicated ->tlb_flush_guest() paves the way toward removing
@invalidate_gpa, which is a potentially dangerous control flag as its
meaning is not exactly crystal clear, even for those who are familiar
with the subtleties of what mappings Intel CPUs are/aren't allowed to
keep across various invalidation scenarios.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  6 ++++++
 arch/x86/kvm/svm.c              |  6 ++++++
 arch/x86/kvm/vmx/vmx.c          | 13 +++++++++++++
 arch/x86/kvm/x86.c              |  2 +-
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cdbf822c5c8b..c08f4c0bf4d1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1118,6 +1118,12 @@ struct kvm_x86_ops {
 	 */
 	void (*tlb_flush_gva)(struct kvm_vcpu *vcpu, gva_t addr);
 
+	/*
+	 * Flush any TLB entries created by the guest.  Like tlb_flush_gva(),
+	 * does not need to flush GPA->HPA mappings.
+	 */
+	void (*tlb_flush_guest)(struct kvm_vcpu *vcpu);
+
 	void (*run)(struct kvm_vcpu *vcpu);
 	int (*handle_exit)(struct kvm_vcpu *vcpu,
 		enum exit_fastpath_completion exit_fastpath);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 08568ae9f7a1..396f42753489 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5643,6 +5643,11 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
 	invlpga(gva, svm->vmcb->control.asid);
 }
 
+static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
+{
+	svm_flush_tlb(vcpu, false);
+}
+
 static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
 {
 }
@@ -7400,6 +7405,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 
 	.tlb_flush = svm_flush_tlb,
 	.tlb_flush_gva = svm_flush_tlb_gva,
+	.tlb_flush_guest = svm_flush_tlb_guest,
 
 	.run = svm_vcpu_run,
 	.handle_exit = handle_exit,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ba24bbda2c12..57c1cee58d18 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2862,6 +2862,18 @@ static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 	 */
 }
 
+static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * vpid_sync_context() is a nop if vmx->vpid==0, e.g. if enable_vpid==0
+	 * or a vpid couldn't be allocated for this vCPU.  VM-Enter and VM-Exit
+	 * are required to flush GVA->{G,H}PA mappings from the TLB if vpid is
+	 * disabled (VM-Enter with vpid enabled and vpid==0 is disallowed),
+	 * i.e. no explicit INVVPID is necessary.
+	 */
+	vpid_sync_context(to_vmx(vcpu)->vpid);
+}
+
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
 {
 	ulong cr0_guest_owned_bits = vcpu->arch.cr0_guest_owned_bits;
@@ -7875,6 +7887,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 
 	.tlb_flush = vmx_flush_tlb,
 	.tlb_flush_gva = vmx_flush_tlb_gva,
+	.tlb_flush_guest = vmx_flush_tlb_guest,
 
 	.run = vmx_vcpu_run,
 	.handle_exit = vmx_handle_exit,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f506248d61a1..0b90ec2c93cf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2725,7 +2725,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 	trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
 		st->preempted & KVM_VCPU_FLUSH_TLB);
 	if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
-		kvm_vcpu_flush_tlb(vcpu, false);
+		kvm_x86_ops->tlb_flush_guest(vcpu);
 
 	vcpu->arch.st.preempted = 0;
 
-- 
2.24.1



* [PATCH v3 15/37] KVM: VMX: Clean up vmx_flush_tlb_gva()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (13 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 16/37] KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush() Sean Christopherson
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Refactor vmx_flush_tlb_gva() to remove a superfluous local variable and
clean up its comment, which is oddly located below the code it is
commenting.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 57c1cee58d18..43c0d4706f9a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2851,15 +2851,11 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 {
-	int vpid = to_vmx(vcpu)->vpid;
-
-	vpid_sync_vcpu_addr(vpid, addr);
-
 	/*
-	 * If VPIDs are not supported or enabled, then the above is a no-op.
-	 * But we don't really need a TLB flush in that case anyway, because
-	 * each VM entry/exit includes an implicit flush when VPID is 0.
+	 * vpid_sync_vcpu_addr() is a nop if vmx->vpid==0, see the comment in
+	 * vmx_flush_tlb_guest() for an explanation of why this is ok.
 	 */
+	vpid_sync_vcpu_addr(to_vmx(vcpu)->vpid, addr);
 }
 
 static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
-- 
2.24.1



* [PATCH v3 16/37] KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (14 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 15/37] KVM: VMX: Clean up vmx_flush_tlb_gva() Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-25 11:23   ` Vitaly Kuznetsov
  2020-03-20 21:28 ` [PATCH v3 17/37] KVM: SVM: Wire up ->tlb_flush_guest() directly to svm_flush_tlb() Sean Christopherson
                   ` (20 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Drop @invalidate_gpa from ->tlb_flush() and kvm_vcpu_flush_tlb() now
that all callers pass %true for said param, or ignore the param (SVM has
an internal call to svm_flush_tlb() in svm_flush_tlb_guest() that somewhat
arbitrarily passes %false).

Remove __vmx_flush_tlb() as it is no longer used.
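With @invalidate_gpa gone, vmx_flush_tlb() has a single decision tree. Below is a sketch of a userspace model of it. A hypothetical config struct and return enum stand in for the real globals and instructions. The model reports which flush the function would perform.

```c
#include <stdbool.h>

/* Which flush the model would perform. */
enum flush_op { FLUSH_NONE, INVEPT_GLOBAL, INVVPID_GLOBAL, INVVPID_SINGLE_BOTH };

/* Hypothetical stand-ins for enable_ept, enable_vpid and the INVVPID cap. */
struct vmx_cfg {
	bool enable_ept;
	bool enable_vpid;
	bool invvpid_global;	/* ALL_CONTEXT INVVPID supported */
};

/*
 * Models the post-patch vmx_flush_tlb(): always flush every EPTP/VPID
 * context the vCPU may have used, covering both L1 and L2.
 */
static enum flush_op vmx_flush_tlb_model(const struct vmx_cfg *c)
{
	if (c->enable_ept)
		return INVEPT_GLOBAL;
	if (c->enable_vpid)
		/* Without ALL_CONTEXT, flush vmx->vpid and nested.vpid02. */
		return c->invvpid_global ? INVVPID_GLOBAL : INVVPID_SINGLE_BOTH;
	return FLUSH_NONE;	/* no tags: VM-Enter/VM-Exit flush implicitly */
}
```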

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/svm.c              | 10 ++++----
 arch/x86/kvm/vmx/vmx.c          |  4 ++--
 arch/x86/kvm/vmx/vmx.h          | 42 ++++++++++-----------------------
 arch/x86/kvm/x86.c              |  6 ++---
 6 files changed, 24 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c08f4c0bf4d1..a5dfab4642d6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1105,7 +1105,7 @@ struct kvm_x86_ops {
 	unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
 	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 
-	void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
+	void (*tlb_flush)(struct kvm_vcpu *vcpu);
 	int  (*tlb_remote_flush)(struct kvm *kvm);
 	int  (*tlb_remote_flush_with_range)(struct kvm *kvm,
 			struct kvm_tlb_range *range);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5ae620881bbc..a87b8f9f3b1f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5177,7 +5177,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	if (r)
 		goto out;
 	kvm_mmu_load_pgd(vcpu);
-	kvm_x86_ops->tlb_flush(vcpu, true);
+	kvm_x86_ops->tlb_flush(vcpu);
 out:
 	return r;
 }
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 396f42753489..62fa45dcb6a4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -385,7 +385,7 @@ module_param(dump_invalid_vmcb, bool, 0644);
 static u8 rsm_ins_bytes[] = "\x0f\xaa";
 
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
+static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
 static void svm_toggle_avic_for_irq_window(struct kvm_vcpu *vcpu, bool activate);
 static inline void avic_post_state_restore(struct kvm_vcpu *vcpu);
@@ -2692,7 +2692,7 @@ static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		return 1;
 
 	if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
-		svm_flush_tlb(vcpu, true);
+		svm_flush_tlb(vcpu);
 
 	vcpu->arch.cr4 = cr4;
 	if (!npt_enabled)
@@ -3630,7 +3630,7 @@ static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	svm->nested.intercept_exceptions = nested_vmcb->control.intercept_exceptions;
 	svm->nested.intercept            = nested_vmcb->control.intercept;
 
-	svm_flush_tlb(&svm->vcpu, true);
+	svm_flush_tlb(&svm->vcpu);
 	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
 	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
 		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
@@ -5626,7 +5626,7 @@ static int svm_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
 	return 0;
 }
 
-static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
+static void svm_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -5645,7 +5645,7 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
 
 static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
 {
-	svm_flush_tlb(vcpu, false);
+	svm_flush_tlb(vcpu);
 }
 
 static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 43c0d4706f9a..477bdbc52ed0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6079,7 +6079,7 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
 		if (flexpriority_enabled) {
 			sec_exec_control |=
 				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-			vmx_flush_tlb(vcpu, true);
+			vmx_flush_tlb(vcpu);
 		}
 		break;
 	case LAPIC_MODE_X2APIC:
@@ -6097,7 +6097,7 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
 {
 	if (!is_guest_mode(vcpu)) {
 		vmcs_write64(APIC_ACCESS_ADDR, hpa);
-		vmx_flush_tlb(vcpu, true);
+		vmx_flush_tlb(vcpu);
 	}
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 3770ae111e6a..bab5d62ad964 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -503,46 +503,28 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
 
 u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa);
 
-static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
-				bool invalidate_gpa)
-{
-	if (enable_ept && (invalidate_gpa || !enable_vpid)) {
-		if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
-			return;
-		ept_sync_context(construct_eptp(vcpu,
-						vcpu->arch.mmu->root_hpa));
-	} else {
-		vpid_sync_context(vpid);
-	}
-}
-
-static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
+static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	/*
-	 * Flush all EPTP/VPID contexts if the TLB flush _may_ have been
-	 * invoked via kvm_flush_remote_tlbs(), which always passes %true for
-	 * @invalidate_gpa.  Flushing remote TLBs requires all contexts to be
-	 * flushed, not just the active context.
+	 * Flush all EPTP/VPID contexts, as the TLB flush _may_ have been
+	 * invoked via kvm_flush_remote_tlbs().  Flushing remote TLBs requires
+	 * all contexts to be flushed, not just the active context.
 	 *
 	 * Note, this also ensures a deferred TLB flush with VPID enabled and
 	 * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
 	 * L2's VPIDs.
 	 */
-	if (invalidate_gpa) {
-		if (enable_ept) {
-			ept_sync_global();
-		} else if (enable_vpid) {
-			if (cpu_has_vmx_invvpid_global()) {
-				vpid_sync_vcpu_global();
-			} else {
-				vpid_sync_vcpu_single(vmx->vpid);
-				vpid_sync_vcpu_single(vmx->nested.vpid02);
-			}
+	if (enable_ept) {
+		ept_sync_global();
+	} else if (enable_vpid) {
+		if (cpu_has_vmx_invvpid_global()) {
+			vpid_sync_vcpu_global();
+		} else {
+			vpid_sync_vcpu_single(vmx->vpid);
+			vpid_sync_vcpu_single(vmx->nested.vpid02);
 		}
-	} else {
-		__vmx_flush_tlb(vcpu, vmx->vpid, false);
 	}
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b90ec2c93cf..84cbd7ca1e18 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2696,10 +2696,10 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.time = 0;
 }
 
-static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
+static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	++vcpu->stat.tlb_flush;
-	kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
+	kvm_x86_ops->tlb_flush(vcpu);
 }
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
@@ -8223,7 +8223,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
 			kvm_mmu_load_pgd(vcpu);
 		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
-			kvm_vcpu_flush_tlb(vcpu, true);
+			kvm_vcpu_flush_tlb(vcpu);
 		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
 			vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
 			r = 0;
-- 
2.24.1



* [PATCH v3 17/37] KVM: SVM: Wire up ->tlb_flush_guest() directly to svm_flush_tlb()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (15 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 16/37] KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush() Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-25 11:23   ` Vitaly Kuznetsov
  2020-03-20 21:28 ` [PATCH v3 18/37] KVM: VMX: Move vmx_flush_tlb() to vmx.c Sean Christopherson
                   ` (19 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Use svm_flush_tlb() directly for kvm_x86_ops->tlb_flush_guest() now that
the @invalidate_gpa param to ->tlb_flush() is gone, i.e. the wrapper for
->tlb_flush_guest() is no longer necessary.

No functional change intended.
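The reasoning is that once svm_flush_tlb() and the ->tlb_flush_guest() hook share a signature, one function pointer can fill both slots of the ops table. Below is a minimal userspace sketch of that idea. It is not KVM code: struct ops_vcpu, struct x86_ops, and model_svm_flush_tlb() are hypothetical stand-ins for struct kvm_vcpu, struct kvm_x86_ops, and svm_flush_tlb().

```c
/* Hypothetical stand-in for struct kvm_vcpu. */
struct ops_vcpu {
	int flushes;	/* number of ASID flushes requested */
};

/* Hypothetical stand-in for the relevant part of struct kvm_x86_ops. */
struct x86_ops {
	void (*tlb_flush)(struct ops_vcpu *vcpu);
	void (*tlb_flush_guest)(struct ops_vcpu *vcpu);
};

/* Models svm_flush_tlb() after @invalidate_gpa was dropped. */
static void model_svm_flush_tlb(struct ops_vcpu *vcpu)
{
	vcpu->flushes++;
}

/* Both hooks now have the same signature, so no wrapper is needed. */
static const struct x86_ops model_svm_ops = {
	.tlb_flush = model_svm_flush_tlb,
	.tlb_flush_guest = model_svm_flush_tlb,
};
```

Calling either hook through the table ends up in the same function, which is all the removed svm_flush_tlb_guest() wrapper was doing.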

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/svm.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 62fa45dcb6a4..dfa3b53f8437 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5643,11 +5643,6 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
 	invlpga(gva, svm->vmcb->control.asid);
 }
 
-static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
-{
-	svm_flush_tlb(vcpu);
-}
-
 static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
 {
 }
@@ -7405,7 +7400,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 
 	.tlb_flush = svm_flush_tlb,
 	.tlb_flush_gva = svm_flush_tlb_gva,
-	.tlb_flush_guest = svm_flush_tlb_guest,
+	.tlb_flush_guest = svm_flush_tlb,
 
 	.run = svm_vcpu_run,
 	.handle_exit = handle_exit,
-- 
2.24.1



* [PATCH v3 18/37] KVM: VMX: Move vmx_flush_tlb() to vmx.c
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (16 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 17/37] KVM: SVM: Wire up ->tlb_flush_guest() directly to svm_flush_tlb() Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-25 11:25   ` Vitaly Kuznetsov
  2020-03-20 21:28 ` [PATCH v3 19/37] KVM: nVMX: Move nested_get_vpid02() to vmx/nested.h Sean Christopherson
                   ` (18 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Move vmx_flush_tlb() to vmx.c and make it non-inline static now that all
its callers live in vmx.c.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h | 25 -------------------------
 2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 477bdbc52ed0..c6affaaef138 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2849,6 +2849,31 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 #endif
 
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/*
+	 * Flush all EPTP/VPID contexts, as the TLB flush _may_ have been
+	 * invoked via kvm_flush_remote_tlbs().  Flushing remote TLBs requires
+	 * all contexts to be flushed, not just the active context.
+	 *
+	 * Note, this also ensures a deferred TLB flush with VPID enabled and
+	 * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
+	 * L2's VPIDs.
+	 */
+	if (enable_ept) {
+		ept_sync_global();
+	} else if (enable_vpid) {
+		if (cpu_has_vmx_invvpid_global()) {
+			vpid_sync_vcpu_global();
+		} else {
+			vpid_sync_vcpu_single(vmx->vpid);
+			vpid_sync_vcpu_single(vmx->nested.vpid02);
+		}
+	}
+}
+
 static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 {
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bab5d62ad964..571249e18bb6 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -503,31 +503,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
 
 u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa);
 
-static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-	/*
-	 * Flush all EPTP/VPID contexts, as the TLB flush _may_ have been
-	 * invoked via kvm_flush_remote_tlbs().  Flushing remote TLBs requires
-	 * all contexts to be flushed, not just the active context.
-	 *
-	 * Note, this also ensures a deferred TLB flush with VPID enabled and
-	 * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
-	 * L2's VPIDs.
-	 */
-	if (enable_ept) {
-		ept_sync_global();
-	} else if (enable_vpid) {
-		if (cpu_has_vmx_invvpid_global()) {
-			vpid_sync_vcpu_global();
-		} else {
-			vpid_sync_vcpu_single(vmx->vpid);
-			vpid_sync_vcpu_single(vmx->nested.vpid02);
-		}
-	}
-}
-
 static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
 {
 	vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;
-- 
2.24.1



* [PATCH v3 19/37] KVM: nVMX: Move nested_get_vpid02() to vmx/nested.h
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (17 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 18/37] KVM: VMX: Move vmx_flush_tlb() to vmx.c Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-25 11:25   ` Vitaly Kuznetsov
  2020-03-20 21:28 ` [PATCH v3 20/37] KVM: VMX: Introduce vmx_flush_tlb_current() Sean Christopherson
                   ` (17 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Move nested_get_vpid02() to vmx/nested.h so that a future patch can
reference it from vmx.c to implement context-specific TLB flushing.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 7 -------
 arch/x86/kvm/vmx/nested.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c71db6fec5a..77819d890088 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1154,13 +1154,6 @@ static bool nested_has_guest_tlb_tag(struct kvm_vcpu *vcpu)
 	       (nested_cpu_has_vpid(vmcs12) && to_vmx(vcpu)->nested.vpid02);
 }
 
-static u16 nested_get_vpid02(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-	return vmx->nested.vpid02 ? vmx->nested.vpid02 : vmx->vpid;
-}
-
 static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
 {
 	superset &= mask;
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 21d36652f213..debc5eeb5757 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -60,6 +60,13 @@ static inline int vmx_has_valid_vmcs12(struct kvm_vcpu *vcpu)
 		vmx->nested.hv_evmcs;
 }
 
+static inline u16 nested_get_vpid02(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	return vmx->nested.vpid02 ? vmx->nested.vpid02 : vmx->vpid;
+}
+
 static inline unsigned long nested_ept_get_eptp(struct kvm_vcpu *vcpu)
 {
 	/* return the page table to be shadowed - in our case, EPT12 */
-- 
2.24.1



* [PATCH v3 20/37] KVM: VMX: Introduce vmx_flush_tlb_current()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (18 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 19/37] KVM: nVMX: Move nested_get_vpid02() to vmx/nested.h Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 21/37] KVM: SVM: Document the ASID logic in svm_flush_tlb() Sean Christopherson
                   ` (16 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Add a helper to flush TLB entries only for the current EPTP/VPID context
and use it for the existing direct invocations of vmx_flush_tlb().  TLB
flushes that are specific to the current vCPU state do not need to flush
other contexts.
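The context selection in vmx_flush_tlb_current() can be modeled in userspace as follows. This is a sketch, not kernel code. It returns which single-context invalidation the helper would issue instead of executing INVEPT/INVVPID, and every name in it (flush_op, model_vcpu, model_get_vpid02(), model_flush_tlb_current()) is hypothetical. The struct fields stand in for the KVM state named in their comments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which single-context invalidation the modeled helper would issue. */
enum flush_op {
	FLUSH_NONE,		/* root_hpa invalid: nothing to flush */
	FLUSH_EPT_CONTEXT,	/* single-context INVEPT on the current EPTP */
	FLUSH_VPID_CONTEXT,	/* single-context INVVPID on one VPID */
};

struct model_vcpu {
	bool root_valid;	/* VALID_PAGE(vcpu->arch.mmu->root_hpa) */
	bool guest_mode;	/* is_guest_mode(vcpu): currently running L2 */
	uint16_t vpid;		/* vmx->vpid (L1's VPID) */
	uint16_t vpid02;	/* vmx->nested.vpid02, 0 if none allocated */
};

/* Mirrors nested_get_vpid02(): fall back to L1's VPID if L2 has none. */
static uint16_t model_get_vpid02(const struct model_vcpu *v)
{
	return v->vpid02 ? v->vpid02 : v->vpid;
}

/* Returns the flush to issue; *vpid_out is set only for VPID flushes. */
static enum flush_op model_flush_tlb_current(const struct model_vcpu *v,
					     bool enable_ept, uint16_t *vpid_out)
{
	if (!v->root_valid)
		return FLUSH_NONE;
	if (enable_ept)
		return FLUSH_EPT_CONTEXT;
	*vpid_out = v->guest_mode ? model_get_vpid02(v) : v->vpid;
	return FLUSH_VPID_CONTEXT;
}
```

With EPT disabled and no vpid02 allocated, a flush while in L2 falls back to L1's VPID. This is the nested_get_vpid02() behavior that the previous patch moved to vmx/nested.h.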

Note, both converted call sites happen to be related to the APIC access
page; this is purely coincidental.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c6affaaef138..2d0a8c7654d7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2874,6 +2874,22 @@ static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 	}
 }
 
+static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+	u64 root_hpa = vcpu->arch.mmu->root_hpa;
+
+	/* No flush required if the current context is invalid. */
+	if (!VALID_PAGE(root_hpa))
+		return;
+
+	if (enable_ept)
+		ept_sync_context(construct_eptp(vcpu, root_hpa));
+	else if (!is_guest_mode(vcpu))
+		vpid_sync_context(to_vmx(vcpu)->vpid);
+	else
+		vpid_sync_context(nested_get_vpid02(vcpu));
+}
+
 static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 {
 	/*
@@ -6104,7 +6120,7 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
 		if (flexpriority_enabled) {
 			sec_exec_control |=
 				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-			vmx_flush_tlb(vcpu);
+			vmx_flush_tlb_current(vcpu);
 		}
 		break;
 	case LAPIC_MODE_X2APIC:
@@ -6122,7 +6138,7 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
 {
 	if (!is_guest_mode(vcpu)) {
 		vmcs_write64(APIC_ACCESS_ADDR, hpa);
-		vmx_flush_tlb(vcpu);
+		vmx_flush_tlb_current(vcpu);
 	}
 }
 
-- 
2.24.1



* [PATCH v3 21/37] KVM: SVM: Document the ASID logic in svm_flush_tlb()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (19 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 20/37] KVM: VMX: Introduce vmx_flush_tlb_current() Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 22/37] KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all() Sean Christopherson
                   ` (15 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Add a comment in svm_flush_tlb() to document why it flushes only the
current ASID, even when it is invoked when flushing remote TLBs.
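The argument is that L1 and L2 share one ASID, so flushing that ASID necessarily drops both L1's and L2's translations. A toy model of a TLB whose entries are tagged by ASID makes this concrete. It is an illustrative sketch only: the table size, the ASID values, and all model_* names are hypothetical, and model_flush_asid() only approximates the effect of TLB_CONTROL_FLUSH_ASID.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy TLB: each entry is tagged with the ASID it was filled under. */
#define MODEL_TLB_SIZE 8

struct model_tlb {
	bool valid[MODEL_TLB_SIZE];
	unsigned int asid[MODEL_TLB_SIZE];
};

static void model_fill(struct model_tlb *tlb, size_t slot, unsigned int asid)
{
	tlb->valid[slot] = true;
	tlb->asid[slot] = asid;
}

/* Approximates a flush-by-ASID: drop only entries tagged with @asid. */
static void model_flush_asid(struct model_tlb *tlb, unsigned int asid)
{
	for (size_t i = 0; i < MODEL_TLB_SIZE; i++)
		if (tlb->asid[i] == asid)
			tlb->valid[i] = false;
}

static size_t model_count_valid(const struct model_tlb *tlb)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_TLB_SIZE; i++)
		n += tlb->valid[i];
	return n;
}
```

If L1 and L2 entries are both filled under the same ASID, a single model_flush_asid() call invalidates all of them. Only an entry tagged with a different ASID survives.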

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/svm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dfa3b53f8437..8c3700b44eb4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5630,6 +5630,13 @@ static void svm_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	/*
+	 * Flush only the current ASID even if the TLB flush was invoked via
+	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
+	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
+	 * unconditionally does a TLB flush on both nested VM-Enter and nested
+	 * VM-Exit (via kvm_mmu_reset_context()).
+	 */
 	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
 		svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
 	else
-- 
2.24.1



* [PATCH v3 22/37] KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (20 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 21/37] KVM: SVM: Document the ASID logic in svm_flush_tlb() Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit Sean Christopherson
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Rename ->tlb_flush() to ->tlb_flush_all() in preparation for adding a
new hook to flush only the current ASID/context.

Opportunistically replace the comment in vmx_flush_tlb() that explains
why it flushes all EPTP/VPID contexts with a comment explaining why it
unconditionally uses INVEPT when EPT is enabled.  I.e. rely on the "all"
part of the name to clarify why it does global INVEPT/INVVPID.
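The branch structure of vmx_flush_tlb_all() can be sketched as a function that records which invalidations it would issue. This is a hypothetical userspace model: inv_op, inv_log, log_op(), and model_flush_tlb_all() are illustrative names. It records the calls the diff makes but does not model their hardware effect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invalidations the modeled vmx_flush_tlb_all() may issue. */
enum inv_op {
	INV_EPT_GLOBAL,		/* ept_sync_global() */
	INV_VPID_GLOBAL,	/* vpid_sync_vcpu_global() */
	INV_VPID_SINGLE,	/* vpid_sync_vcpu_single(vpid) */
};

struct inv_log {
	int n;
	enum inv_op op[2];
	uint16_t vpid[2];	/* only meaningful for INV_VPID_SINGLE */
};

static void log_op(struct inv_log *log, enum inv_op op, uint16_t vpid)
{
	log->op[log->n] = op;
	log->vpid[log->n] = vpid;
	log->n++;
}

static void model_flush_tlb_all(struct inv_log *log, bool enable_ept,
				bool enable_vpid, bool has_invvpid_global,
				uint16_t vpid, uint16_t vpid02)
{
	if (enable_ept) {
		/* Guest-physical mappings are tied to the EPT root, not a VPID. */
		log_op(log, INV_EPT_GLOBAL, 0);
	} else if (enable_vpid) {
		if (has_invvpid_global) {
			log_op(log, INV_VPID_GLOBAL, 0);
		} else {
			/* No global INVVPID: flush L1's and L2's VPIDs one by one. */
			log_op(log, INV_VPID_SINGLE, vpid);
			log_op(log, INV_VPID_SINGLE, vpid02);
		}
	}
}
```

With EPT enabled, the model issues exactly one global INVEPT regardless of the VPID settings, which is the point the new comment makes.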

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/svm.c              |  2 +-
 arch/x86/kvm/vmx/vmx.c          | 16 +++++++---------
 arch/x86/kvm/x86.c              |  6 +++---
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a5dfab4642d6..0392a9db110d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1105,7 +1105,7 @@ struct kvm_x86_ops {
 	unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
 	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 
-	void (*tlb_flush)(struct kvm_vcpu *vcpu);
+	void (*tlb_flush_all)(struct kvm_vcpu *vcpu);
 	int  (*tlb_remote_flush)(struct kvm *kvm);
 	int  (*tlb_remote_flush_with_range)(struct kvm *kvm,
 			struct kvm_tlb_range *range);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a87b8f9f3b1f..c357cc79f0f3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5177,7 +5177,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	if (r)
 		goto out;
 	kvm_mmu_load_pgd(vcpu);
-	kvm_x86_ops->tlb_flush(vcpu);
+	kvm_x86_ops->tlb_flush_all(vcpu);
 out:
 	return r;
 }
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8c3700b44eb4..10e5b8c4b515 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7405,7 +7405,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.get_rflags = svm_get_rflags,
 	.set_rflags = svm_set_rflags,
 
-	.tlb_flush = svm_flush_tlb,
+	.tlb_flush_all = svm_flush_tlb,
 	.tlb_flush_gva = svm_flush_tlb_gva,
 	.tlb_flush_guest = svm_flush_tlb,
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2d0a8c7654d7..d6cf625b4011 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2849,18 +2849,16 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 #endif
 
-static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+static void vmx_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	/*
-	 * Flush all EPTP/VPID contexts, as the TLB flush _may_ have been
-	 * invoked via kvm_flush_remote_tlbs().  Flushing remote TLBs requires
-	 * all contexts to be flushed, not just the active context.
-	 *
-	 * Note, this also ensures a deferred TLB flush with VPID enabled and
-	 * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
-	 * L2's VPIDs.
+	 * INVEPT must be issued when EPT is enabled, irrespective of VPID, as
+	 * the CPU is not required to invalidate guest-physical mappings on
+	 * VM-Entry, even if VPID is disabled.  Guest-physical mappings are
+	 * associated with the root EPT structure and not any particular VPID
+	 * (INVVPID also isn't required to invalidate guest-physical mappings).
 	 */
 	if (enable_ept) {
 		ept_sync_global();
@@ -7922,7 +7920,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.get_rflags = vmx_get_rflags,
 	.set_rflags = vmx_set_rflags,
 
-	.tlb_flush = vmx_flush_tlb,
+	.tlb_flush_all = vmx_flush_tlb_all,
 	.tlb_flush_gva = vmx_flush_tlb_gva,
 	.tlb_flush_guest = vmx_flush_tlb_guest,
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84cbd7ca1e18..333968e5ef3c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2696,10 +2696,10 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.time = 0;
 }
 
-static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
+static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	++vcpu->stat.tlb_flush;
-	kvm_x86_ops->tlb_flush(vcpu);
+	kvm_x86_ops->tlb_flush_all(vcpu);
 }
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
@@ -8223,7 +8223,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
 			kvm_mmu_load_pgd(vcpu);
 		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
-			kvm_vcpu_flush_tlb(vcpu);
+			kvm_vcpu_flush_tlb_all(vcpu);
 		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
 			vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
 			r = 0;
-- 
2.24.1



* [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (21 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 22/37] KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all() Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2021-10-28 13:11   ` Lai Jiangshan
  2020-03-20 21:28 ` [PATCH v3 24/37] KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID Sean Christopherson
                   ` (13 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Add a helper to determine whether or not a full TLB flush needs to be
performed on nested VM-Enter/VM-Exit, as the logic is identical for both
flows and needs a fairly beefy comment to boot.  This also provides a
common point to make future adjustments to the logic.

Handle vpid12 changes in the new helper as well, even though it is specific
to VM-Enter.  The vpid12 logic is an extension of the flushing logic,
and it's worth the extra bool parameter to provide a single location for
the flushing logic.
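The decision made by nested_vmx_transition_tlb_flush() reduces to three outcomes, which the userspace sketch below enumerates. It is a model, not the kernel helper. The enum, struct, and model_transition_flush() are hypothetical, and each field stands in for the vmcs12 or vcpu_vmx state named in its comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcomes of the modeled nested_vmx_transition_tlb_flush(). */
enum nested_flush {
	NESTED_NO_FLUSH,	/* nothing to do */
	NESTED_REQ_FLUSH,	/* kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu) */
	NESTED_SYNC_VPID02,	/* vpid_sync_context(nested_get_vpid02(vcpu)) */
};

struct model_nested {
	bool vmcs12_has_vpid;	/* nested_cpu_has_vpid(vmcs12) */
	bool has_guest_tlb_tag;	/* nested_has_guest_tlb_tag(vcpu) */
	uint16_t vpid12;	/* vmcs12->virtual_processor_id */
	uint16_t last_vpid;	/* vmx->nested.last_vpid */
};

static enum nested_flush model_transition_flush(struct model_nested *n,
						bool enable_vpid,
						bool is_vmenter)
{
	/* Without VPID, linear/combined mappings are flushed on transitions. */
	if (!enable_vpid)
		return NESTED_NO_FLUSH;

	/* L1 expects a flush, or L1 and L2 share the effective tag. */
	if (!n->vmcs12_has_vpid || !n->has_guest_tlb_tag)
		return NESTED_REQ_FLUSH;

	/* A new vpid12 reuses the same vpid02, so sync vpid02 on VM-Enter. */
	if (is_vmenter && n->vpid12 != n->last_vpid) {
		n->last_vpid = n->vpid12;
		return NESTED_SYNC_VPID02;
	}
	return NESTED_NO_FLUSH;
}
```

The is_vmenter parameter only affects the last branch, matching the commit message: the vpid12 tracking is specific to VM-Enter, while the flush-request logic is shared by both directions.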

Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 88 +++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 77819d890088..580d5c98352f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1154,6 +1154,48 @@ static bool nested_has_guest_tlb_tag(struct kvm_vcpu *vcpu)
 	       (nested_cpu_has_vpid(vmcs12) && to_vmx(vcpu)->nested.vpid02);
 }
 
+static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
+					    struct vmcs12 *vmcs12,
+					    bool is_vmenter)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/*
+	 * If VPID is disabled, linear and combined mappings are flushed on
+	 * VM-Enter/VM-Exit, and guest-physical mappings are valid only for
+	 * their associated EPTP.
+	 */
+	if (!enable_vpid)
+		return;
+
+	/*
+	 * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
+	 * for *all* contexts to be flushed on VM-Enter/VM-Exit.
+	 *
+	 * If VPID is enabled and used by vmcs12, but L2 does not have a unique
+	 * TLB tag (ASID), i.e. EPT is disabled and KVM was unable to allocate
+	 * a VPID for L2, flush the TLB as the effective ASID is common to both
+	 * L1 and L2.
+	 *
+	 * Defer the flush so that it runs after vmcs02.EPTP has been set by
+	 * KVM_REQ_LOAD_MMU_PGD (if nested EPT is enabled) and to avoid
+	 * redundant flushes further down the nested pipeline.
+	 *
+	 * If a TLB flush isn't required due to any of the above, and vpid12 is
+	 * changing, then the new "virtual" VPID (vpid12) will reuse the same
+	 * "real" VPID (vpid02), and so needs to be sync'd.  There is no direct
+	 * mapping between vpid02 and vpid12; vpid02 is per-vCPU and reused for
+	 * all nested vCPUs.
+	 */
+	if (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu)) {
+		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+	} else if (is_vmenter &&
+		   vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
+		vmx->nested.last_vpid = vmcs12->virtual_processor_id;
+		vpid_sync_context(nested_get_vpid02(vcpu));
+	}
+}
+
 static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
 {
 	superset &= mask;
@@ -2462,32 +2504,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	if (kvm_has_tsc_control)
 		decache_tsc_multiplier(vmx);
 
-	if (enable_vpid) {
-		/*
-		 * There is no direct mapping between vpid02 and vpid12, the
-		 * vpid02 is per-vCPU for L0 and reused while the value of
-		 * vpid12 is changed w/ one invvpid during nested vmentry.
-		 * The vpid12 is allocated by L1 for L2, so it will not
-		 * influence global bitmap(for vpid01 and vpid02 allocation)
-		 * even if spawn a lot of nested vCPUs.
-		 */
-		if (nested_cpu_has_vpid(vmcs12) && nested_has_guest_tlb_tag(vcpu)) {
-			if (vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
-				vmx->nested.last_vpid = vmcs12->virtual_processor_id;
-				vpid_sync_context(nested_get_vpid02(vcpu));
-			}
-		} else {
-			/*
-			 * If L1 use EPT, then L0 needs to execute INVEPT on
-			 * EPTP02 instead of EPTP01. Therefore, delay TLB
-			 * flush until vmcs02->eptp is fully updated by
-			 * KVM_REQ_LOAD_MMU_PGD. Note that this assumes
-			 * KVM_REQ_TLB_FLUSH is evaluated after
-			 * KVM_REQ_LOAD_MMU_PGD in vcpu_enter_guest().
-			 */
-			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-		}
-	}
+	nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);
 
 	if (nested_cpu_has_ept(vmcs12))
 		nested_ept_init_mmu_context(vcpu);
@@ -4054,24 +4071,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	if (!enable_ept)
 		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
 
-	/*
-	 * If vmcs01 doesn't use VPID, CPU flushes TLB on every
-	 * VMEntry/VMExit. Thus, no need to flush TLB.
-	 *
-	 * If vmcs12 doesn't use VPID, L1 expects TLB to be
-	 * flushed on every VMEntry/VMExit.
-	 *
-	 * Otherwise, we can preserve TLB entries as long as we are
-	 * able to tag L1 TLB entries differently than L2 TLB entries.
-	 *
-	 * If vmcs12 uses EPT, we need to execute this flush on EPTP01
-	 * and therefore we request the TLB flush to happen only after VMCS EPTP
-	 * has been set by KVM_REQ_LOAD_MMU_PGD.
-	 */
-	if (enable_vpid &&
-	    (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu))) {
-		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-	}
+	nested_vmx_transition_tlb_flush(vcpu, vmcs12, false);
 
 	vmcs_write32(GUEST_SYSENTER_CS, vmcs12->host_ia32_sysenter_cs);
 	vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->host_ia32_sysenter_esp);
-- 
2.24.1



* [PATCH v3 24/37] KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (22 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 25/37] KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes Sean Christopherson
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's
EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g.
to flush L2's context during nested VM-Enter.

Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where
the flush is directly associated with vCPU-scoped instruction emulation,
i.e. MOV CR3 and INVPCID.

Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to
make it clear that it deliberately requests a flush of all contexts.

Service any pending flush request on nested VM-Exit as it's possible a
nested VM-Exit could occur after requesting a flush for L2.  Add the
same logic for nested VM-Enter even though it's _extremely_ unlikely
for a flush to be pending on nested VM-Enter, but theoretically possible
(in the future) due to RSM (SMM) emulation.

[*] Intel also has an Address Space Identifier (ASID) concept, e.g.
    EPTP+VPID+PCID == ASID, it's just not documented in the SDM because
    the rules of invalidation are different based on which piece of the
    ASID is being changed, i.e. whether the EPTP, VPID, or PCID context
    must be invalidated.
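The interaction between the two requests can be sketched with a userspace model of request bits. A pending KVM_REQ_TLB_FLUSH flushes all contexts and then clears KVM_REQ_TLB_FLUSH_CURRENT, because flushing everything subsumes the current context. The bit values, struct req_vcpu, and every function below are hypothetical. The model also assumes that a current-context request still pending after that check is serviced on its own; the vcpu_enter_guest() hunk is cut off above, so that step is an assumption rather than something quoted from the diff.

```c
#include <stdbool.h>

/* Hypothetical request bits; the real KVM_REQ_* numbers differ. */
#define REQ_TLB_FLUSH		(1u << 0)
#define REQ_TLB_FLUSH_CURRENT	(1u << 1)

struct req_vcpu {
	unsigned int requests;
	int flush_all;		/* count of modeled ->tlb_flush_all() calls */
	int flush_current;	/* count of modeled ->tlb_flush_current() calls */
};

/* Test-and-clear, in the spirit of kvm_check_request(). */
static bool check_request(struct req_vcpu *v, unsigned int req)
{
	bool pending = v->requests & req;

	v->requests &= ~req;
	return pending;
}

static void service_tlb_requests(struct req_vcpu *v)
{
	if (check_request(v, REQ_TLB_FLUSH)) {
		v->flush_all++;
		/* Flushing all contexts also flushes the current one. */
		v->requests &= ~REQ_TLB_FLUSH_CURRENT;
	}
	/* Assumption: a leftover current-context request is handled here. */
	if (check_request(v, REQ_TLB_FLUSH_CURRENT))
		v->flush_current++;
}
```

When both requests are pending, the model performs one full flush and no redundant current-context flush.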

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm.c              |  1 +
 arch/x86/kvm/vmx/nested.c       |  7 +++++++
 arch/x86/kvm/vmx/vmx.c          |  7 ++++++-
 arch/x86/kvm/x86.c              | 11 +++++++++--
 arch/x86/kvm/x86.h              |  6 ++++++
 6 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0392a9db110d..26fa52450569 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -83,6 +83,7 @@
 #define KVM_REQ_GET_VMCS12_PAGES	KVM_ARCH_REQ(24)
 #define KVM_REQ_APICV_UPDATE \
 	KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_TLB_FLUSH_CURRENT	KVM_ARCH_REQ(26)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1106,6 +1107,7 @@ struct kvm_x86_ops {
 	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 
 	void (*tlb_flush_all)(struct kvm_vcpu *vcpu);
+	void (*tlb_flush_current)(struct kvm_vcpu *vcpu);
 	int  (*tlb_remote_flush)(struct kvm *kvm);
 	int  (*tlb_remote_flush_with_range)(struct kvm *kvm,
 			struct kvm_tlb_range *range);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 10e5b8c4b515..0bf7ad5f62ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7406,6 +7406,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.set_rflags = svm_set_rflags,
 
 	.tlb_flush_all = svm_flush_tlb,
+	.tlb_flush_current = svm_flush_tlb,
 	.tlb_flush_gva = svm_flush_tlb_gva,
 	.tlb_flush_guest = svm_flush_tlb,
 
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 580d5c98352f..b9fa2f89b564 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3230,6 +3230,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	u32 exit_reason = EXIT_REASON_INVALID_STATE;
 	u32 exit_qual;
 
+	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
+		kvm_vcpu_flush_tlb_current(vcpu);
+
 	evaluate_pending_interrupts = exec_controls_get(vmx) &
 		(CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
 	if (likely(!evaluate_pending_interrupts) && kvm_vcpu_apicv_active(vcpu))
@@ -4295,6 +4298,10 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	/* trying to cancel vmlaunch/vmresume is a bug */
 	WARN_ON_ONCE(vmx->nested.nested_run_pending);
 
+	/* Service the TLB flush request for L2 before switching to L1. */
+	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
+		kvm_vcpu_flush_tlb_current(vcpu);
+
 	leave_guest_mode(vcpu);
 
 	if (nested_cpu_has_preemption_timer(vmcs12))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d6cf625b4011..ae7279802652 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1367,6 +1367,10 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu)
 		void *gdt = get_current_gdt_ro();
 		unsigned long sysenter_esp;
 
+		/*
+		 * Flush all EPTP/VPID contexts, the new pCPU may have stale
+		 * TLB entries from its previous association with the vCPU.
+		 */
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 
 		/*
@@ -5479,7 +5483,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 
 		if (kvm_get_active_pcid(vcpu) == operand.pcid) {
 			kvm_mmu_sync_roots(vcpu);
-			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 		}
 
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
@@ -7921,6 +7925,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.set_rflags = vmx_set_rflags,
 
 	.tlb_flush_all = vmx_flush_tlb_all,
+	.tlb_flush_current = vmx_flush_tlb_current,
 	.tlb_flush_gva = vmx_flush_tlb_gva,
 	.tlb_flush_guest = vmx_flush_tlb_guest,
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 333968e5ef3c..cccfcf612008 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1033,7 +1033,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	if (cr3 == kvm_read_cr3(vcpu) && !pdptrs_changed(vcpu)) {
 		if (!skip_tlb_flush) {
 			kvm_mmu_sync_roots(vcpu);
-			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 		}
 		return 0;
 	}
@@ -8222,8 +8222,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_mmu_sync_roots(vcpu);
 		if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
 			kvm_mmu_load_pgd(vcpu);
-		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
+		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
 			kvm_vcpu_flush_tlb_all(vcpu);
+
+			/* Flushing all ASIDs flushes the current ASID... */
+			kvm_clear_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+		}
+		if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
+			kvm_vcpu_flush_tlb_current(vcpu);
+
 		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
 			vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
 			r = 0;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c1954e216b41..e0816850ce5e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -125,6 +125,12 @@ static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
 	return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
 }
 
+static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+	++vcpu->stat.tlb_flush;
+	kvm_x86_ops->tlb_flush_current(vcpu);
+}
+
 static inline int is_pae(struct kvm_vcpu *vcpu)
 {
 	return kvm_read_cr4_bits(vcpu, X86_CR4_PAE);
-- 
2.24.1
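
In the vcpu_enter_guest() hunk above, a pending full flush subsumes a pending current-context flush, so at most one flush runs per entry. The userspace sketch below models only that ordering. The bit names, struct and counters are hypothetical stand-ins for KVM_REQ_TLB_FLUSH, KVM_REQ_TLB_FLUSH_CURRENT and the vendor flush hooks, not KVM's real request machinery:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request bits standing in for KVM_REQ_TLB_FLUSH{,_CURRENT}. */
#define MODEL_REQ_TLB_FLUSH		(1u << 0)
#define MODEL_REQ_TLB_FLUSH_CURRENT	(1u << 1)

struct model_vcpu {
	unsigned int requests;
	unsigned int flush_all_count;     /* times the "flush all" hook ran */
	unsigned int flush_current_count; /* times the "flush current" hook ran */
};

static void make_request(struct model_vcpu *v, unsigned int req)
{
	v->requests |= req;
}

/* Test-and-clear, in the spirit of kvm_check_request(). */
static bool check_request(struct model_vcpu *v, unsigned int req)
{
	bool pending = (v->requests & req) != 0;

	v->requests &= ~req;
	return pending;
}

/* Same ordering as the vcpu_enter_guest() hunk in the patch. */
static void service_tlb_requests(struct model_vcpu *v)
{
	if (check_request(v, MODEL_REQ_TLB_FLUSH)) {
		v->flush_all_count++;
		/* Flushing all ASIDs flushes the current ASID too. */
		v->requests &= ~MODEL_REQ_TLB_FLUSH_CURRENT;
	}
	if (check_request(v, MODEL_REQ_TLB_FLUSH_CURRENT))
		v->flush_current_count++;
}
```

If both requests are pending, only the full flush runs. A lone current-context request triggers only the narrower flush.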



* [PATCH v3 25/37] KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (23 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 24/37] KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 26/37] KVM: nVMX: Selectively use TLB_FLUSH_CURRENT for nested VM-Enter/VM-Exit Sean Christopherson
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Flush only the current ASID/context when requesting a TLB flush due to a
change in the current vCPU's MMU to avoid blasting away TLB entries
associated with other ASIDs/contexts, e.g. entries cached for L1 when
a change in L2's MMU requires a flush.
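
To see why the narrower flush preserves L1's cached translations, picture a toy TLB whose entries are tagged with the ASID/context that created them (e.g. L1's and L2's VPIDs). This is a minimal userspace sketch under that assumption. The structures and helpers are hypothetical and do not reflect how VMX or SVM actually store TLB entries:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TLB_SIZE 8

/* Toy TLB entry, tagged with the ASID that created it. */
struct model_tlb_entry {
	int valid;
	int asid;
	unsigned long gva;
};

struct model_tlb {
	struct model_tlb_entry e[MODEL_TLB_SIZE];
};

/* Insert into the first free slot; silently drops the entry if the TLB is full. */
static void tlb_insert(struct model_tlb *t, int asid, unsigned long gva)
{
	for (size_t i = 0; i < MODEL_TLB_SIZE; i++) {
		if (!t->e[i].valid) {
			t->e[i] = (struct model_tlb_entry){ 1, asid, gva };
			return;
		}
	}
}

/* Analogue of tlb_flush_current: drop only entries of the current ASID. */
static void tlb_flush_current(struct model_tlb *t, int current_asid)
{
	for (size_t i = 0; i < MODEL_TLB_SIZE; i++)
		if (t->e[i].valid && t->e[i].asid == current_asid)
			t->e[i].valid = 0;
}

/* Analogue of tlb_flush_all: drop every entry regardless of its tag. */
static void tlb_flush_all(struct model_tlb *t)
{
	for (size_t i = 0; i < MODEL_TLB_SIZE; i++)
		t->e[i].valid = 0;
}

static size_t tlb_count(const struct model_tlb *t, int asid)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_TLB_SIZE; i++)
		if (t->e[i].valid && t->e[i].asid == asid)
			n++;
	return n;
}
```

With L1 tagged 1 and L2 tagged 2, a change in L2's MMU followed by tlb_flush_current(t, 2) leaves L1's entries intact, whereas tlb_flush_all() would discard them as well.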

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c357cc79f0f3..97d906a42e81 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2313,7 +2313,7 @@ static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
 		return;
 
 	if (local_flush)
-		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
 #ifdef CONFIG_KVM_MMU_AUDIT
@@ -2520,11 +2520,11 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 				break;
 
 			WARN_ON(!list_empty(&invalid_list));
-			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 		}
 
 		if (sp->unsync_children)
-			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 
 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
@@ -3125,7 +3125,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (set_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
 		if (write_fault)
 			ret = RET_PF_EMULATE;
-		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 	}
 
 	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
@@ -4314,7 +4314,7 @@ static bool fast_cr3_switch(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
 			if (!skip_tlb_flush) {
 				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-				kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+				kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 			}
 
 			/*
@@ -5177,7 +5177,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	if (r)
 		goto out;
 	kvm_mmu_load_pgd(vcpu);
-	kvm_x86_ops->tlb_flush_all(vcpu);
+	kvm_x86_ops->tlb_flush_current(vcpu);
 out:
 	return r;
 }
-- 
2.24.1



* [PATCH v3 26/37] KVM: nVMX: Selectively use TLB_FLUSH_CURRENT for nested VM-Enter/VM-Exit
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (24 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 25/37] KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 27/37] KVM: nVMX: Reload APIC access page on nested VM-Exit only if necessary Sean Christopherson
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Flush only the current context, as opposed to all contexts, when
requesting a TLB flush to handle the scenario where L1 does not expect
a TLB flush, but one is required because L1 and L2 share an ASID.  This
occurs if EPT is disabled (no per-EPTP tag), VPID is enabled (hardware
doesn't flush unconditionally), and vmcs02 does not have its own VPID due
to exhaustion of available VPIDs.
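
The branch order in nested_vmx_transition_tlb_flush() after this patch can be sketched as a pure decision function. This is a hypothetical userspace model. It assumes that "L2 has a guest TLB tag" means EPT is enabled or a vpid02 was allocated, as the paragraph above describes. The handling of the VPID-change branch lies outside the visible hunk, so it is reported only as a placeholder value:

```c
#include <assert.h>
#include <stdbool.h>

enum flush_kind { FLUSH_NONE, FLUSH_ALL, FLUSH_CURRENT, FLUSH_VPID_CHANGED };

/*
 * Hypothetical model of the branch order in nested_vmx_transition_tlb_flush().
 * vmcs12_has_vpid: L1 enabled VPID for L2.
 * ept_enabled/have_vpid02: assumed inputs deciding whether L2 has its own tag.
 */
static enum flush_kind transition_flush(bool vmcs12_has_vpid, bool ept_enabled,
					bool have_vpid02, bool is_vmenter,
					int vpid12, int *last_vpid)
{
	bool has_guest_tlb_tag = ept_enabled || have_vpid02;

	if (!vmcs12_has_vpid)
		return FLUSH_ALL;		/* L1 expects a full flush */
	if (!has_guest_tlb_tag)
		return FLUSH_CURRENT;		/* L1 and L2 share one ASID */
	if (is_vmenter && vpid12 != *last_vpid) {
		*last_vpid = vpid12;
		return FLUSH_VPID_CHANGED;	/* real handling not shown in the hunk */
	}
	return FLUSH_NONE;			/* assumed: nothing else to do */
}
```

Only the case where L1 did not enable VPID still blasts all contexts. The shared-ASID case now flushes just the current one.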

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b9fa2f89b564..e630d656b211 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1174,8 +1174,8 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
 	 *
 	 * If VPID is enabled and used by vmc12, but L2 does not have a unique
 	 * TLB tag (ASID), i.e. EPT is disabled and KVM was unable to allocate
-	 * a VPID for L2, flush the TLB as the effective ASID is common to both
-	 * L1 and L2.
+	 * a VPID for L2, flush the current context as the effective ASID is
+	 * common to both L1 and L2.
 	 *
 	 * Defer the flush so that it runs after vmcs02.EPTP has been set by
 	 * KVM_REQ_LOAD_MMU_PGD (if nested EPT is enabled) and to avoid
@@ -1187,8 +1187,10 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
 	 * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
 	 * all nested vCPUs.
 	 */
-	if (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu)) {
+	if (!nested_cpu_has_vpid(vmcs12)) {
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+	} else if (!nested_has_guest_tlb_tag(vcpu)) {
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 	} else if (is_vmenter &&
 		   vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
 		vmx->nested.last_vpid = vmcs12->virtual_processor_id;
-- 
2.24.1



* [PATCH v3 27/37] KVM: nVMX: Reload APIC access page on nested VM-Exit only if necessary
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (25 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 26/37] KVM: nVMX: Selectively use TLB_FLUSH_CURRENT for nested VM-Enter/VM-Exit Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 28/37] KVM: VMX: Retrieve APIC access page HPA only when necessary Sean Christopherson
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Defer reloading L1's APIC access page by recording the need for a reload
and processing it during nested VM-Exit, instead of unconditionally
reloading the page on every nested VM-Exit.  This eliminates a TLB flush
on the majority of nested VM-Exits, as the APIC access page rarely needs
to be reloaded.
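
The deferral can be modeled as a flag that is set only when a reload arrives while vmcs02 is current, and consumed on nested VM-Exit. The sketch below is a hypothetical userspace model: reload_vmcs01_apic_access_page reuses the field name added by the patch, while reload_requested stands in for KVM_REQ_APIC_PAGE_RELOAD and flushes counts vmx_flush_tlb_current() calls:

```c
#include <assert.h>
#include <stdbool.h>

struct model_vmx {
	bool guest_mode;                     /* running L2, vmcs02 is current */
	bool reload_vmcs01_apic_access_page; /* deferred-reload flag from the patch */
	bool reload_requested;               /* stands in for KVM_REQ_APIC_PAGE_RELOAD */
	unsigned long vmcs01_apic_access_addr;
	unsigned int flushes;                /* counts "flush current" calls */
};

/* Mirrors vmx_set_apic_access_page_addr() after the patch. */
static void set_apic_access_page_addr(struct model_vmx *v, unsigned long hpa)
{
	if (v->guest_mode) {
		/* Defer until vmcs01 is the current VMCS. */
		v->reload_vmcs01_apic_access_page = true;
		return;
	}
	v->vmcs01_apic_access_addr = hpa;
	v->flushes++;
}

/* Mirrors the nested_vmx_vmexit() hunk: request a reload only if one was deferred. */
static void nested_vmexit(struct model_vmx *v)
{
	v->guest_mode = false;
	if (v->reload_vmcs01_apic_access_page) {
		v->reload_vmcs01_apic_access_page = false;
		v->reload_requested = true;
	}
}
```

A nested VM-Exit with no intervening APIC page change now requests no reload and therefore no flush.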

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c |  9 ++++-----
 arch/x86/kvm/vmx/vmx.c    | 10 +++++++---
 arch/x86/kvm/vmx/vmx.h    |  1 +
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e630d656b211..06fc0b68ecf3 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4367,11 +4367,10 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
 	vmx->nested.pi_desc = NULL;
 
-	/*
-	 * We are now running in L2, mmu_notifier will force to reload the
-	 * page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
-	 */
-	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
+	if (vmx->nested.reload_vmcs01_apic_access_page) {
+		vmx->nested.reload_vmcs01_apic_access_page = false;
+		kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
+	}
 
 	if ((exit_reason != -1) && (enable_shadow_vmcs || vmx->nested.hv_evmcs))
 		vmx->nested.need_vmcs12_to_shadow_sync = true;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ae7279802652..3155329bf844 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6138,10 +6138,14 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
 
 static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
 {
-	if (!is_guest_mode(vcpu)) {
-		vmcs_write64(APIC_ACCESS_ADDR, hpa);
-		vmx_flush_tlb_current(vcpu);
+	/* Defer reload until vmcs01 is the current VMCS. */
+	if (is_guest_mode(vcpu)) {
+		to_vmx(vcpu)->nested.reload_vmcs01_apic_access_page = true;
+		return;
 	}
+
+	vmcs_write64(APIC_ACCESS_ADDR, hpa);
+	vmx_flush_tlb_current(vcpu);
 }
 
 static void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 571249e18bb6..66cc9f639e4b 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -138,6 +138,7 @@ struct nested_vmx {
 	bool vmcs02_initialized;
 
 	bool change_vmcs01_virtual_apic_mode;
+	bool reload_vmcs01_apic_access_page;
 
 	/*
 	 * Enlightened VMCS has been enabled. It does not mean that L1 has to
-- 
2.24.1



* [PATCH v3 28/37] KVM: VMX: Retrieve APIC access page HPA only when necessary
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (26 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 27/37] KVM: nVMX: Reload APIC access page on nested VM-Exit only if necessary Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 29/37] KVM: VMX: Don't reload APIC access page if its control is disabled Sean Christopherson
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Move the retrieval of the HPA associated with L1's APIC access page into
VMX code to avoid unnecessarily calling gfn_to_page(), e.g. when the
vCPU is in guest mode (L2).  Alternatively, the optimization logic in
VMX could be mirrored into the common x86 code, but that will get ugly
fast when further optimizations are introduced.
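
Moving the lookup behind the guest-mode check means the get/put pair runs only when the result is actually written, and the reference is still dropped immediately so the page is never pinned. A minimal userspace sketch of that ordering follows. lookup_page() and release_page() are hypothetical stand-ins for gfn_to_page() and put_page(), and the physical address is an arbitrary example value:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page with a reference count, standing in for struct page. */
struct model_page {
	int refcount;
	unsigned long phys;
};

struct model_state {
	bool guest_mode;
	bool deferred_reload;
	unsigned int lookups;          /* how often the lookup actually ran */
	unsigned long apic_access_addr;
	struct model_page apic_page;
};

/* Hypothetical stand-ins for gfn_to_page() and put_page(). */
static struct model_page *lookup_page(struct model_state *s)
{
	s->lookups++;
	s->apic_page.refcount++;
	return &s->apic_page;
}

static void release_page(struct model_page *p)
{
	p->refcount--;
}

/* The lookup now lives in the vendor hook, after the guest-mode check. */
static void set_apic_access_page_addr(struct model_state *s)
{
	struct model_page *page;

	if (s->guest_mode) {
		s->deferred_reload = true;
		return;
	}
	page = lookup_page(s);
	s->apic_access_addr = page->phys;
	/* Don't pin the page; a later invalidation triggers another reload. */
	release_page(page);
}
```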

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx/vmx.c          | 16 ++++++++++++++--
 arch/x86/kvm/x86.c              | 13 +------------
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 26fa52450569..31aa93088bf9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1154,7 +1154,7 @@ struct kvm_x86_ops {
 	bool (*guest_apic_has_interrupt)(struct kvm_vcpu *vcpu);
 	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 	void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu);
-	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
+	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu);
 	int (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3155329bf844..e8d409b50afd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6136,16 +6136,28 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
 	vmx_update_msr_bitmap(vcpu);
 }
 
-static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
+static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 {
+	struct page *page;
+
 	/* Defer reload until vmcs01 is the current VMCS. */
 	if (is_guest_mode(vcpu)) {
 		to_vmx(vcpu)->nested.reload_vmcs01_apic_access_page = true;
 		return;
 	}
 
-	vmcs_write64(APIC_ACCESS_ADDR, hpa);
+	page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+	if (is_error_page(page))
+		return;
+
+	vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
 	vmx_flush_tlb_current(vcpu);
+
+	/*
+	 * Do not pin apic access page in memory, the MMU notifier
+	 * will call us again if it is migrated or swapped out.
+	 */
+	put_page(page);
 }
 
 static void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cccfcf612008..26c24af87cca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8157,24 +8157,13 @@ int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
-	struct page *page = NULL;
-
 	if (!lapic_in_kernel(vcpu))
 		return;
 
 	if (!kvm_x86_ops->set_apic_access_page_addr)
 		return;
 
-	page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-	if (is_error_page(page))
-		return;
-	kvm_x86_ops->set_apic_access_page_addr(vcpu, page_to_phys(page));
-
-	/*
-	 * Do not pin apic access page in memory, the MMU notifier
-	 * will call us again if it is migrated or swapped out.
-	 */
-	put_page(page);
+	kvm_x86_ops->set_apic_access_page_addr(vcpu);
 }
 
 void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
-- 
2.24.1



* [PATCH v3 29/37] KVM: VMX: Don't reload APIC access page if its control is disabled
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (27 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 28/37] KVM: VMX: Retrieve APIC access page HPA only when necessary Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 30/37] KVM: x86/mmu: Move fast_cr3_switch() side effects to __kvm_mmu_new_cr3() Sean Christopherson
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Don't reload the APIC access page if its control is disabled, e.g. if
the guest is running with x2APIC (likely) or with the local APIC
disabled (unlikely), to avoid unnecessary TLB flushes and VMWRITEs.
Unconditionally reload the APIC access page and flush the TLB when
the guest's virtual APIC transitions to "xAPIC enabled", as any
changes to the APIC access page's mapping will not be recorded while
the guest's virtual APIC is disabled.
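
The interplay between the mode switch and the reload can be sketched as follows. This is a hypothetical userspace model that assumes flexpriority is enabled and that the VIRTUALIZE_APIC_ACCESSES control ends up set only in xAPIC mode. The enum and counters are illustrative and are not KVM's LAPIC_MODE_* values:

```c
#include <assert.h>
#include <stdbool.h>

enum model_apic_mode { MODEL_APIC_DISABLED, MODEL_APIC_XAPIC, MODEL_APIC_X2APIC };

struct model_vcpu {
	bool virtualize_apic_accesses; /* SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES */
	bool req_apic_page_reload;     /* KVM_REQ_APIC_PAGE_RELOAD */
	bool req_tlb_flush_current;    /* KVM_REQ_TLB_FLUSH_CURRENT */
	unsigned int vmwrites;         /* APIC_ACCESS_ADDR writes */
	unsigned int flushes;          /* direct "flush current" calls */
};

/* Assumes flexpriority is enabled; only xAPIC mode sets the control. */
static void set_virtual_apic_mode(struct model_vcpu *v, enum model_apic_mode mode)
{
	v->virtualize_apic_accesses = (mode == MODEL_APIC_XAPIC);
	if (mode == MODEL_APIC_XAPIC) {
		/* Changes were not tracked while disabled: reload and flush. */
		v->req_apic_page_reload = true;
		v->req_tlb_flush_current = true;
	}
}

static void set_apic_access_page_addr(struct model_vcpu *v)
{
	if (!v->virtualize_apic_accesses)
		return; /* control disabled: skip the VMWRITE and the flush */
	v->vmwrites++;
	v->flushes++;
}
```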

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e8d409b50afd..d49d2a1ddf03 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6122,7 +6122,15 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
 		if (flexpriority_enabled) {
 			sec_exec_control |=
 				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-			vmx_flush_tlb_current(vcpu);
+			kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
+
+			/*
+			 * Flush the TLB, reloading the APIC access page will
+			 * only do so if its physical address has changed, but
+			 * the guest may have inserted a non-APIC mapping into
+			 * the TLB while the APIC access page was disabled.
+			 */
+			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 		}
 		break;
 	case LAPIC_MODE_X2APIC:
@@ -6146,6 +6154,10 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 		return;
 	}
 
+	if (!(secondary_exec_controls_get(to_vmx(vcpu)) &
+	    SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
+		return;
+
 	page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
 	if (is_error_page(page))
 		return;
-- 
2.24.1



* [PATCH v3 30/37] KVM: x86/mmu: Move fast_cr3_switch() side effects to __kvm_mmu_new_cr3()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (28 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 29/37] KVM: VMX: Don't reload APIC access page if its control is disabled Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 31/37] KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switch Sean Christopherson
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Handle the side effects of a fast CR3 (PGD) switch up a level in
__kvm_mmu_new_cr3(), which is the only caller of fast_cr3_switch().

This consolidates handling all side effects in __kvm_mmu_new_cr3()
(where freeing the current root when KVM can't do a fast switch is
already handled), and ameliorates the pain of adding a second boolean in
a future patch to provide a separate "skip" override for the MMU sync.
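
After the refactor, fast_cr3_switch() is a side-effect-free predicate and __kvm_mmu_new_cr3() owns every consequence of its answer. The userspace sketch below models that split. root_valid and cached_root_hit are hypothetical stand-ins for !mmu_check_root() and cached_root_available(), the requests are recorded as booleans, and the write-flooding reset is omitted:

```c
#include <assert.h>
#include <stdbool.h>

#define PT64_ROOT_4LEVEL 4

struct model_mmu {
	int shadow_root_level;
	int root_level;
	bool root_valid;      /* stands in for !mmu_check_root() */
	bool cached_root_hit; /* stands in for cached_root_available() */
};

struct model_effects {
	bool freed_current_root;
	bool load_pgd, mmu_sync, tlb_flush_current, cleared_mmio_info;
};

/* Pure predicate, like fast_cr3_switch() after the patch. */
static bool fast_cr3_switch(const struct model_mmu *mmu)
{
	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
	    mmu->root_level >= PT64_ROOT_4LEVEL)
		return mmu->root_valid && mmu->cached_root_hit;
	return false;
}

/* All side effects live in the caller, like __kvm_mmu_new_cr3(). */
static struct model_effects new_cr3(const struct model_mmu *mmu, bool skip_tlb_flush)
{
	struct model_effects fx = {0};

	if (!fast_cr3_switch(mmu)) {
		fx.freed_current_root = true;
		return fx;
	}
	fx.load_pgd = true;
	if (!skip_tlb_flush) {
		fx.mmu_sync = true;
		fx.tlb_flush_current = true;
	}
	fx.cleared_mmio_info = true;
	return fx;
}
```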

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 69 +++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 97d906a42e81..b95933198f4c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4288,8 +4288,7 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 }
 
 static bool fast_cr3_switch(struct kvm_vcpu *vcpu, gpa_t new_cr3,
-			    union kvm_mmu_page_role new_role,
-			    bool skip_tlb_flush)
+			    union kvm_mmu_page_role new_role)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 
@@ -4299,39 +4298,9 @@ static bool fast_cr3_switch(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 	 * later if necessary.
 	 */
 	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
-	    mmu->root_level >= PT64_ROOT_4LEVEL) {
-		if (mmu_check_root(vcpu, new_cr3 >> PAGE_SHIFT))
-			return false;
-
-		if (cached_root_available(vcpu, new_cr3, new_role)) {
-			/*
-			 * It is possible that the cached previous root page is
-			 * obsolete because of a change in the MMU generation
-			 * number. However, changing the generation number is
-			 * accompanied by KVM_REQ_MMU_RELOAD, which will free
-			 * the root set here and allocate a new one.
-			 */
-			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
-			if (!skip_tlb_flush) {
-				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-				kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-			}
-
-			/*
-			 * The last MMIO access's GVA and GPA are cached in the
-			 * VCPU. When switching to a new CR3, that GVA->GPA
-			 * mapping may no longer be valid. So clear any cached
-			 * MMIO info even when we don't need to sync the shadow
-			 * page tables.
-			 */
-			vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
-
-			__clear_sp_write_flooding_count(
-				page_header(mmu->root_hpa));
-
-			return true;
-		}
-	}
+	    mmu->root_level >= PT64_ROOT_4LEVEL)
+		return !mmu_check_root(vcpu, new_cr3 >> PAGE_SHIFT) &&
+		       cached_root_available(vcpu, new_cr3, new_role);
 
 	return false;
 }
@@ -4340,9 +4309,33 @@ static void __kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 			      union kvm_mmu_page_role new_role,
 			      bool skip_tlb_flush)
 {
-	if (!fast_cr3_switch(vcpu, new_cr3, new_role, skip_tlb_flush))
-		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu,
-				   KVM_MMU_ROOT_CURRENT);
+	if (!fast_cr3_switch(vcpu, new_cr3, new_role)) {
+		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
+		return;
+	}
+
+	/*
+	 * It's possible that the cached previous root page is obsolete because
+	 * of a change in the MMU generation number. However, changing the
+	 * generation number is accompanied by KVM_REQ_MMU_RELOAD, which will
+	 * free the root set here and allocate a new one.
+	 */
+	kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
+
+	if (!skip_tlb_flush) {
+		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+	}
+
+	/*
+	 * The last MMIO access's GVA and GPA are cached in the VCPU. When
+	 * switching to a new CR3, that GVA->GPA mapping may no longer be
+	 * valid. So clear any cached MMIO info even when we don't need to sync
+	 * the shadow page tables.
+	 */
+	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
+
+	__clear_sp_write_flooding_count(page_header(vcpu->arch.mmu->root_hpa));
 }
 
 void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush)
-- 
2.24.1



* [PATCH v3 31/37] KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switch
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (29 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 30/37] KVM: x86/mmu: Move fast_cr3_switch() side effects to __kvm_mmu_new_cr3() Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-24 11:07   ` Paolo Bonzini
  2020-03-20 21:28 ` [PATCH v3 32/37] KVM: x86/mmu: Add module param to force TLB flush on root reuse Sean Christopherson
                   ` (5 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Add a separate "skip" override for MMU sync, as a future change to avoid
TLB flushes on nested VMX transitions may need to sync the MMU even if
the TLB flush is unnecessary.
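
With two independent booleans, the cached-root path can request a sync without a flush, or vice versa. In this hypothetical sketch of the resulting requests, kvm_set_cr3() keeps its old behavior by passing the same value for both, and nested_vmx_load_cr3() passes false for both:

```c
#include <assert.h>
#include <stdbool.h>

struct model_reqs {
	bool mmu_sync;          /* KVM_REQ_MMU_SYNC */
	bool tlb_flush_current; /* KVM_REQ_TLB_FLUSH_CURRENT */
};

/* Requests made when a cached root is reused, after this patch. */
static struct model_reqs reuse_root_requests(bool skip_tlb_flush, bool skip_mmu_sync)
{
	struct model_reqs r = {
		.mmu_sync = !skip_mmu_sync,
		.tlb_flush_current = !skip_tlb_flush,
	};

	return r;
}
```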

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/mmu/mmu.c          | 13 +++++++------
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/x86.c              |  2 +-
 4 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 31aa93088bf9..6fca2e45886c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1517,7 +1517,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 		       void *insn, int insn_len);
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
-void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush);
+void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush,
+		     bool skip_mmu_sync);
 
 void kvm_configure_mmu(bool enable_tdp, int tdp_page_level);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b95933198f4c..06e94ca59a2d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4307,7 +4307,7 @@ static bool fast_cr3_switch(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 
 static void __kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 			      union kvm_mmu_page_role new_role,
-			      bool skip_tlb_flush)
+			      bool skip_tlb_flush, bool skip_mmu_sync)
 {
 	if (!fast_cr3_switch(vcpu, new_cr3, new_role)) {
 		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
@@ -4322,10 +4322,10 @@ static void __kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 	 */
 	kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
 
-	if (!skip_tlb_flush) {
+	if (!skip_mmu_sync)
 		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+	if (!skip_tlb_flush)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-	}
 
 	/*
 	 * The last MMIO access's GVA and GPA are cached in the VCPU. When
@@ -4338,10 +4338,11 @@ static void __kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 	__clear_sp_write_flooding_count(page_header(vcpu->arch.mmu->root_hpa));
 }
 
-void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush)
+void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush,
+		     bool skip_mmu_sync)
 {
 	__kvm_mmu_new_cr3(vcpu, new_cr3, kvm_mmu_calc_root_page_role(vcpu),
-			  skip_tlb_flush);
+			  skip_tlb_flush, skip_mmu_sync);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_new_cr3);
 
@@ -5034,7 +5035,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
 						   execonly, level);
 
-	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, false);
+	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, false, false);
 
 	if (new_role.as_u64 == context->mmu_role.as_u64)
 		return;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 06fc0b68ecf3..dd58563ee793 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1123,7 +1123,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 	}
 
 	if (!nested_ept)
-		kvm_mmu_new_cr3(vcpu, cr3, false);
+		kvm_mmu_new_cr3(vcpu, cr3, false, false);
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26c24af87cca..0d1572a0791c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1045,7 +1045,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		 !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
 		return 1;
 
-	kvm_mmu_new_cr3(vcpu, cr3, skip_tlb_flush);
+	kvm_mmu_new_cr3(vcpu, cr3, skip_tlb_flush, skip_tlb_flush);
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
-- 
2.24.1



* [PATCH v3 32/37] KVM: x86/mmu: Add module param to force TLB flush on root reuse
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (30 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 31/37] KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switch Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 33/37] KVM: nVMX: Skip MMU sync on nested VMX transition when possible Sean Christopherson
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Add a module param, flush_on_reuse, to override skip_tlb_flush and
skip_mmu_sync when performing a so-called "fast CR3 switch", i.e. when
reusing a cached root.  The primary motivation for the control is to
provide a fallback mechanism in the event that TLB flushing and/or MMU
sync bugs are exposed/introduced by upcoming changes to stop
unconditionally flushing on nested VMX transitions.
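
Because the param is registered with mode 0644, root can presumably toggle it at runtime through the module's sysfs parameters. When it is set, it wins over any caller's request to skip. In the hypothetical userspace sketch below, a plain global stands in for force_flush_and_sync_on_reuse:

```c
#include <assert.h>
#include <stdbool.h>

/* Plain global standing in for the flush_on_reuse module param. */
static bool force_flush_and_sync_on_reuse;

struct model_reqs {
	bool mmu_sync;          /* KVM_REQ_MMU_SYNC */
	bool tlb_flush_current; /* KVM_REQ_TLB_FLUSH_CURRENT */
};

static struct model_reqs reuse_root_requests(bool skip_tlb_flush, bool skip_mmu_sync)
{
	struct model_reqs r;

	/* The override forces both requests regardless of the callers' hints. */
	r.mmu_sync = !skip_mmu_sync || force_flush_and_sync_on_reuse;
	r.tlb_flush_current = !skip_tlb_flush || force_flush_and_sync_on_reuse;
	return r;
}
```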

Suggested-by: Jim Mattson <jmattson@google.com>
Suggested-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 06e94ca59a2d..6a986b66c867 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -78,6 +78,9 @@ module_param_cb(nx_huge_pages_recovery_ratio, &nx_huge_pages_recovery_ratio_ops,
 		&nx_huge_pages_recovery_ratio, 0644);
 __MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint");
 
+static bool __read_mostly force_flush_and_sync_on_reuse;
+module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
+
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
  * where the hardware walks 2 page tables:
@@ -4322,9 +4325,9 @@ static void __kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 	 */
 	kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
 
-	if (!skip_mmu_sync)
+	if (!skip_mmu_sync || force_flush_and_sync_on_reuse)
 		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-	if (!skip_tlb_flush)
+	if (!skip_tlb_flush || force_flush_and_sync_on_reuse)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 
 	/*
-- 
2.24.1



* [PATCH v3 33/37] KVM: nVMX: Skip MMU sync on nested VMX transition when possible
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (31 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 32/37] KVM: x86/mmu: Add module param to force TLB flush on root reuse Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-24 11:19   ` Paolo Bonzini
  2020-03-20 21:28 ` [PATCH v3 34/37] KVM: nVMX: Don't flush TLB on nested VMX transition Sean Christopherson
                   ` (3 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Skip the MMU sync when reusing a cached root if EPT is enabled or L1
enabled VPID for L2.

If EPT is enabled, guest-physical mappings aren't flushed even if VPID
is disabled, i.e. L1 can't expect stale TLB entries to be flushed if it
has enabled EPT.  And if L1 has EPT disabled, L0 isn't shadowing PTEs
(for L1 or L2), so there are no unsync'd SPTEs to sync.

If VPID is enabled (and EPT is disabled), then L1 can't expect stale TLB
entries to be flushed (for itself or L2).
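
Put differently, the sync decision is a two-input predicate.  A minimal
sketch, where the hypothetical parameters l0_enable_ept and
l1_vpid_for_l2 stand in for KVM's enable_ept and
nested_cpu_has_vpid(get_vmcs12(vcpu)):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of nested_vmx_transition_mmu_sync(): a sync is needed only when L0
 * shadows page tables (EPT disabled) and L1 did not enable VPID for L2, i.e.
 * L1 expects VM-Enter/VM-Exit to flush its (virtual) TLB.
 */
static bool transition_needs_mmu_sync(bool l0_enable_ept, bool l1_vpid_for_l2)
{
	return !l0_enable_ept && !l1_vpid_for_l2;
}

/* The caller passes the inverse as skip_mmu_sync, as nested_vmx_load_cr3() does. */
static bool skip_mmu_sync_on_transition(bool l0_enable_ept, bool l1_vpid_for_l2)
{
	return !transition_needs_mmu_sync(l0_enable_ept, l1_vpid_for_l2);
}
```

Only the shadow-paging, no-VPID combination keeps the sync; the other
three combinations are the cases argued away above.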

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu/mmu.c    |  2 +-
 arch/x86/kvm/vmx/nested.c | 44 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6a986b66c867..84e1e748c2b3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5038,7 +5038,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
 						   execonly, level);
 
-	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, false, false);
+	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, false, true);
 
 	if (new_role.as_u64 == context->mmu_role.as_u64)
 		return;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index dd58563ee793..db3ce8f297c2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1095,6 +1095,44 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
 	return (val & invalid_mask) == 0;
 }
 
+/*
+ * Returns true if the MMU needs to be sync'd on nested VM-Enter/VM-Exit.  The
+ * MMU needs to be sync'd if L0 is using shadow paging (EPT disabled) and L1
+ * didn't enable VPID for L2, i.e. L1 expects a TLB flush on VMX transitions.
+ *
+ * If EPT is enabled by L0 but disabled by L1, then L0 is not shadowing L1 or
+ * L2 PTEs, there cannot be unsync'd SPTEs for either L1 or L2.
+ *
+ * If EPT is enabled by L1 (and therefore L0), then L0 doesn't need to sync on
+ * VM-Enter as VM-Enter isn't required to invalidate guest-physical mappings
+ * (irrespective of VPID), i.e. L1 can't rely on the (virtual) CPU to flush
+ * stale GPA->HPA translations for L2 from the TLB.  And as above, L0 isn't
+ * shadowing L1 PTEs so there are no unsync'd SPTEs to sync on VM-Exit.
+ *
+ * If VPID is enabled by L1 (for L2), then L0 doesn't need to sync as VM-Enter
+ * and VM-Exit aren't required to invalidate linear mappings (EPT is disabled so
+ * there are no combined or guest-physical mappings), i.e. L1 can't rely on the
+ * (virtual) CPU to flush stale VA->PA mappings for either L2 or itself (L1).
+ *
+ * If EPT is disabled (by L0 and therefore L1) and VPID is disabled by L1, then
+ * a sync is needed as L1 expects all VA->PA mappings to be flushed on both
+ * VM-Enter and VM-Exit.
+ *
+ * Note, this logic is subtly different than nested_has_guest_tlb_tag(), which
+ * additionally checks that L2 has been assigned a VPID (when EPT is disabled).
+ * Whether or not L2 has been assigned a VPID by L0 is irrelevant with respect
+ * to L1's expectations, e.g. L0 needs to invalidate hardware TLB entries if L2
+ * doesn't have a unique VPID to prevent reusing L1's entries (assuming L1 has
+ * been assigned a VPID), but L0 doesn't need to do an MMU sync because L1
+ * doesn't expect stale (virtual) TLB entries to be flushed, i.e. L1 doesn't
+ * know that L0 will flush the TLB and so L1 will do INVVPID as needed to flush
+ * stale TLB entries, at which point L0 will sync L2's MMU.
+ */
+static bool nested_vmx_transition_mmu_sync(struct kvm_vcpu *vcpu)
+{
+	return !enable_ept && !nested_cpu_has_vpid(get_vmcs12(vcpu));
+}
+
 /*
  * Load guest's/host's cr3 at nested entry/exit.  @nested_ept is true if we are
  * emulating VM-Entry into a guest with EPT enabled.  On failure, the expected
@@ -1122,8 +1160,12 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 		}
 	}
 
+	/*
+	 * See nested_vmx_transition_mmu_sync for details on skipping the MMU sync.
+	 */
 	if (!nested_ept)
-		kvm_mmu_new_cr3(vcpu, cr3, false, false);
+		kvm_mmu_new_cr3(vcpu, cr3, false,
+				!nested_vmx_transition_mmu_sync(vcpu));
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
-- 
2.24.1



* [PATCH v3 34/37] KVM: nVMX: Don't flush TLB on nested VMX transition
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (32 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 33/37] KVM: nVMX: Skip MMU sync on nested VMX transition when possible Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-24 11:20   ` Paolo Bonzini
  2020-03-20 21:28 ` [PATCH v3 35/37] KVM: nVMX: Free only the affected contexts when emulating INVEPT Sean Christopherson
                   ` (2 subsequent siblings)
  36 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Unconditionally skip the TLB flush triggered when reusing a root for a
nested transition as nested_vmx_transition_tlb_flush() ensures the TLB
is flushed when needed, regardless of whether the MMU can reuse a cached
root (or the last root).
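
Combining this with the two previous patches, the requests made on a fast
root switch in the !nested_ept path of nested_vmx_load_cr3() can be
modeled as below.  This is a hedged sketch: all names are hypothetical,
and any flush the transition actually needs comes from
nested_vmx_transition_tlb_flush(), which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the flush_on_reuse module param from the previous patch. */
static bool force_flush_and_sync_on_reuse;

/* Hypothetical: requests made when a nested transition reuses a root. */
struct reuse_reqs {
	bool mmu_sync;		/* KVM_REQ_MMU_SYNC */
	bool tlb_flush_current;	/* KVM_REQ_TLB_FLUSH_CURRENT */
};

static struct reuse_reqs nested_transition_reuse_reqs(bool l0_enable_ept,
						      bool l1_vpid_for_l2)
{
	/* After this patch the TLB flush hint is always "skip"... */
	bool skip_tlb_flush = true;
	/* ...and the sync is skipped unless shadow paging without L1 VPID. */
	bool skip_mmu_sync = l0_enable_ept || l1_vpid_for_l2;
	struct reuse_reqs r = {
		.mmu_sync = !skip_mmu_sync || force_flush_and_sync_on_reuse,
		.tlb_flush_current = !skip_tlb_flush ||
				     force_flush_and_sync_on_reuse,
	};

	return r;
}
```

Without flush_on_reuse, a nested transition never requests
KVM_REQ_TLB_FLUSH_CURRENT from the root-reuse path; the module param
restores both requests as a fallback.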

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu/mmu.c    | 2 +-
 arch/x86/kvm/vmx/nested.c | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 84e1e748c2b3..7b0fb7f2c24d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5038,7 +5038,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
 						   execonly, level);
 
-	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, false, true);
+	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, true, true);
 
 	if (new_role.as_u64 == context->mmu_role.as_u64)
 		return;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index db3ce8f297c2..92aab4166498 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1161,10 +1161,12 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 	}
 
 	/*
-	 * See nested_vmx_transition_mmu_sync for details on skipping the MMU sync.
+	 * Unconditionally skip the TLB flush on fast CR3 switch, all TLB
+	 * flushes are handled by nested_vmx_transition_tlb_flush().  See
+	 * nested_vmx_transition_mmu_sync for details on skipping the MMU sync.
 	 */
 	if (!nested_ept)
-		kvm_mmu_new_cr3(vcpu, cr3, false,
+		kvm_mmu_new_cr3(vcpu, cr3, true,
 				!nested_vmx_transition_mmu_sync(vcpu));
 
 	vcpu->arch.cr3 = cr3;
-- 
2.24.1



* [PATCH v3 35/37] KVM: nVMX: Free only the affected contexts when emulating INVEPT
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (33 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 34/37] KVM: nVMX: Don't flush TLB on nested VMX transition Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 36/37] KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related code Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 37/37] KVM: VMX: Clean cr3/pgd handling in vmx_load_mmu_pgd() Sean Christopherson
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Add logic to handle_invept() to free only those roots that match the
target EPT context when emulating a single-context INVEPT.
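
The mask construction can be exercised outside the kernel.  In this
sketch the ROOT_CURRENT/ROOT_PREVIOUS(i) bit layout is an assumption
modeled on KVM_MMU_ROOT_CURRENT and KVM_MMU_ROOT_PREVIOUS(i), and the
hypothetical root_matches() is a simplified stand-in for
nested_ept_root_matches() that compares the full EPTP:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PREV_ROOTS		3			/* KVM_MMU_NUM_PREV_ROOTS */
#define ROOT_CURRENT		(1UL << 0)		/* assumed bit layout */
#define ROOT_PREVIOUS(i)	(1UL << (1 + (i)))	/* assumed bit layout */
#define INVALID_ROOT		UINT64_MAX		/* stand-in for INVALID_PAGE */

struct root_info {
	uint64_t hpa;	/* shadow EPT root; INVALID_ROOT if the slot is empty */
	uint64_t eptp;	/* EPTP the root was built for */
};

/* Simplified: a root matches if it is valid and was built for this EPTP. */
static bool root_matches(const struct root_info *root, uint64_t eptp)
{
	return root->hpa != INVALID_ROOT && root->eptp == eptp;
}

/* Mask of roots to free for a single-context INVEPT on @eptp. */
static unsigned long single_context_roots_to_free(const struct root_info *cur,
						  const struct root_info prev[NUM_PREV_ROOTS],
						  uint64_t eptp)
{
	unsigned long roots_to_free = 0;
	int i;

	if (root_matches(cur, eptp))
		roots_to_free |= ROOT_CURRENT;

	for (i = 0; i < NUM_PREV_ROOTS; i++)
		if (root_matches(&prev[i], eptp))
			roots_to_free |= ROOT_PREVIOUS(i);

	return roots_to_free;
}
```

A result of 0 means no cached root was built for that EPTP, which is why
the patch only calls kvm_mmu_free_roots() when roots_to_free is non-zero.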

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 92aab4166498..72e69d841531 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5185,12 +5185,14 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 vmx_instruction_info, types;
-	unsigned long type;
+	unsigned long type, roots_to_free;
+	struct kvm_mmu *mmu;
 	gva_t gva;
 	struct x86_exception e;
 	struct {
 		u64 eptp, gpa;
 	} operand;
+	int i;
 
 	if (!(vmx->nested.msrs.secondary_ctls_high &
 	      SECONDARY_EXEC_ENABLE_EPT) ||
@@ -5222,23 +5224,37 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 		return 1;
 	}
 
+	mmu = &vcpu->arch.guest_mmu;
+
 	switch (type) {
 	case VMX_EPT_EXTENT_CONTEXT:
 		if (!nested_vmx_check_eptp(vcpu, operand.eptp))
 			return nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
 
-		/* TODO: sync only the target EPTP context. */
-		fallthrough;
+		roots_to_free = 0;
+		if (nested_ept_root_matches(mmu->root_hpa, mmu->root_cr3,
+					    operand.eptp))
+			roots_to_free |= KVM_MMU_ROOT_CURRENT;
+
+		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
+			if (nested_ept_root_matches(mmu->prev_roots[i].hpa,
+						    mmu->prev_roots[i].cr3,
+						    operand.eptp))
+				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+		}
+		break;
 	case VMX_EPT_EXTENT_GLOBAL:
-		kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu,
-				   KVM_MMU_ROOTS_ALL);
+		roots_to_free = KVM_MMU_ROOTS_ALL;
 		break;
 	default:
 		BUG_ON(1);
 		break;
 	}
 
+	if (roots_to_free)
+		kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
+
 	return nested_vmx_succeed(vcpu);
 }
 
-- 
2.24.1



* [PATCH v3 36/37] KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related code
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (34 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 35/37] KVM: nVMX: Free only the affected contexts when emulating INVEPT Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  2020-03-20 21:28 ` [PATCH v3 37/37] KVM: VMX: Clean cr3/pgd handling in vmx_load_mmu_pgd() Sean Christopherson
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Rename functions and variables in kvm_mmu_new_cr3() and related code to
replace "cr3" with "pgd", i.e. continue the work started by commit
727a7e27cf88a ("KVM: x86: rename set_cr3 callback and related flags to
load_mmu_pgd").  kvm_mmu_new_cr3() and company are not always loading a
new CR3, e.g. when nested EPT is enabled "cr3" is actually an EPTP.
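
For context, the pgd/hpa pairs being renamed feed a small LRU cache of
roots: cached_root_available() swaps the current root into prev_roots[]
while searching.  A simplified userspace model of that swap walk follows;
it drops the page-role check done by is_root_usable(), and struct
mmu_model and cached_root_lookup() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PREV_ROOTS	3		/* KVM_MMU_NUM_PREV_ROOTS */
#define INVALID_ROOT	UINT64_MAX	/* stand-in for INVALID_PAGE */

struct root_info {
	uint64_t pgd;	/* CR3, or an EPTP when nested EPT is in use */
	uint64_t hpa;
};

struct mmu_model {
	struct root_info root;			/* root_pgd/root_hpa */
	struct root_info prev[NUM_PREV_ROOTS];	/* prev_roots[] */
};

/* Simplified is_root_usable(): valid root with a matching pgd. */
static bool root_usable(const struct root_info *r, uint64_t pgd)
{
	return r->hpa != INVALID_ROOT && r->pgd == pgd;
}

static bool cached_root_lookup(struct mmu_model *mmu, uint64_t new_pgd)
{
	struct root_info root = mmu->root, tmp;
	int i;

	if (root_usable(&root, new_pgd))
		return true;

	for (i = 0; i < NUM_PREV_ROOTS; i++) {
		/* Push the candidate into slot i and pull slot i's root out. */
		tmp = mmu->prev[i];
		mmu->prev[i] = root;
		root = tmp;
		if (root_usable(&root, new_pgd))
			break;
	}

	/* On a miss this is the evicted LRU root, which the caller frees. */
	mmu->root = root;
	return i < NUM_PREV_ROOTS;
}
```

On a miss the least recently used entry falls out of prev_roots[] into the
current slot, matching the comment above cached_root_available() that
this root should then be freed by the caller.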

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  8 ++---
 arch/x86/kvm/mmu/mmu.c          | 58 ++++++++++++++++-----------------
 arch/x86/kvm/vmx/nested.c       |  8 ++---
 arch/x86/kvm/vmx/vmx.c          |  2 +-
 arch/x86/kvm/x86.c              |  2 +-
 5 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6fca2e45886c..167729624149 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -373,12 +373,12 @@ struct rsvd_bits_validate {
 };
 
 struct kvm_mmu_root_info {
-	gpa_t cr3;
+	gpa_t pgd;
 	hpa_t hpa;
 };
 
 #define KVM_MMU_ROOT_INFO_INVALID \
-	((struct kvm_mmu_root_info) { .cr3 = INVALID_PAGE, .hpa = INVALID_PAGE })
+	((struct kvm_mmu_root_info) { .pgd = INVALID_PAGE, .hpa = INVALID_PAGE })
 
 #define KVM_MMU_NUM_PREV_ROOTS 3
 
@@ -404,7 +404,7 @@ struct kvm_mmu {
 	void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			   u64 *spte, const void *pte);
 	hpa_t root_hpa;
-	gpa_t root_cr3;
+	gpa_t root_pgd;
 	union kvm_mmu_role mmu_role;
 	u8 root_level;
 	u8 shadow_root_level;
@@ -1517,7 +1517,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 		       void *insn, int insn_len);
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
-void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush,
+void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush,
 		     bool skip_mmu_sync);
 
 void kvm_configure_mmu(bool enable_tdp, int tdp_page_level);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7b0fb7f2c24d..be03f353dd3d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3669,7 +3669,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 							   &invalid_list);
 			mmu->root_hpa = INVALID_PAGE;
 		}
-		mmu->root_cr3 = 0;
+		mmu->root_pgd = 0;
 	}
 
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
@@ -3726,8 +3726,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	} else
 		BUG();
 
-	/* root_cr3 is ignored for direct MMUs. */
-	vcpu->arch.mmu->root_cr3 = 0;
+	/* root_pgd is ignored for direct MMUs. */
+	vcpu->arch.mmu->root_pgd = 0;
 
 	return 0;
 }
@@ -3736,11 +3736,11 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_page *sp;
 	u64 pdptr, pm_mask;
-	gfn_t root_gfn, root_cr3;
+	gfn_t root_gfn, root_pgd;
 	int i;
 
-	root_cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
-	root_gfn = root_cr3 >> PAGE_SHIFT;
+	root_pgd = vcpu->arch.mmu->get_guest_pgd(vcpu);
+	root_gfn = root_pgd >> PAGE_SHIFT;
 
 	if (mmu_check_root(vcpu, root_gfn))
 		return 1;
@@ -3765,7 +3765,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu->root_hpa = root;
-		goto set_root_cr3;
+		goto set_root_pgd;
 	}
 
 	/*
@@ -3831,8 +3831,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->lm_root);
 	}
 
-set_root_cr3:
-	vcpu->arch.mmu->root_cr3 = root_cr3;
+set_root_pgd:
+	vcpu->arch.mmu->root_pgd = root_pgd;
 
 	return 0;
 }
@@ -4248,49 +4248,49 @@ static void nonpaging_init_context(struct kvm_vcpu *vcpu,
 	context->nx = false;
 }
 
-static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t cr3,
+static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
 				  union kvm_mmu_page_role role)
 {
-	return (role.direct || cr3 == root->cr3) &&
+	return (role.direct || pgd == root->pgd) &&
 	       VALID_PAGE(root->hpa) && page_header(root->hpa) &&
 	       role.word == page_header(root->hpa)->role.word;
 }
 
 /*
- * Find out if a previously cached root matching the new CR3/role is available.
+ * Find out if a previously cached root matching the new pgd/role is available.
  * The current root is also inserted into the cache.
  * If a matching root was found, it is assigned to kvm_mmu->root_hpa and true is
  * returned.
  * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root_hpa and
  * false is returned. This root should now be freed by the caller.
  */
-static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_cr3,
+static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 				  union kvm_mmu_page_role new_role)
 {
 	uint i;
 	struct kvm_mmu_root_info root;
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 
-	root.cr3 = mmu->root_cr3;
+	root.pgd = mmu->root_pgd;
 	root.hpa = mmu->root_hpa;
 
-	if (is_root_usable(&root, new_cr3, new_role))
+	if (is_root_usable(&root, new_pgd, new_role))
 		return true;
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
 		swap(root, mmu->prev_roots[i]);
 
-		if (is_root_usable(&root, new_cr3, new_role))
+		if (is_root_usable(&root, new_pgd, new_role))
 			break;
 	}
 
 	mmu->root_hpa = root.hpa;
-	mmu->root_cr3 = root.cr3;
+	mmu->root_pgd = root.pgd;
 
 	return i < KVM_MMU_NUM_PREV_ROOTS;
 }
 
-static bool fast_cr3_switch(struct kvm_vcpu *vcpu, gpa_t new_cr3,
+static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 			    union kvm_mmu_page_role new_role)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
@@ -4302,17 +4302,17 @@ static bool fast_cr3_switch(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 	 */
 	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
 	    mmu->root_level >= PT64_ROOT_4LEVEL)
-		return !mmu_check_root(vcpu, new_cr3 >> PAGE_SHIFT) &&
-		       cached_root_available(vcpu, new_cr3, new_role);
+		return !mmu_check_root(vcpu, new_pgd >> PAGE_SHIFT) &&
+		       cached_root_available(vcpu, new_pgd, new_role);
 
 	return false;
 }
 
-static void __kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3,
+static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 			      union kvm_mmu_page_role new_role,
 			      bool skip_tlb_flush, bool skip_mmu_sync)
 {
-	if (!fast_cr3_switch(vcpu, new_cr3, new_role)) {
+	if (!fast_pgd_switch(vcpu, new_pgd, new_role)) {
 		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
 		return;
 	}
@@ -4341,13 +4341,13 @@ static void __kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 	__clear_sp_write_flooding_count(page_header(vcpu->arch.mmu->root_hpa));
 }
 
-void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush,
+void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush,
 		     bool skip_mmu_sync)
 {
-	__kvm_mmu_new_cr3(vcpu, new_cr3, kvm_mmu_calc_root_page_role(vcpu),
+	__kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu),
 			  skip_tlb_flush, skip_mmu_sync);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_new_cr3);
+EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
 
 static unsigned long get_cr3(struct kvm_vcpu *vcpu)
 {
@@ -5038,7 +5038,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
 						   execonly, level);
 
-	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, true, true);
+	__kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base, true, true);
 
 	if (new_role.as_u64 == context->mmu_role.as_u64)
 		return;
@@ -5532,7 +5532,7 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
 		if (VALID_PAGE(mmu->prev_roots[i].hpa) &&
-		    pcid == kvm_get_pcid(vcpu, mmu->prev_roots[i].cr3)) {
+		    pcid == kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd)) {
 			mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
 			tlb_flush = true;
 		}
@@ -5686,13 +5686,13 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 
 	vcpu->arch.root_mmu.root_hpa = INVALID_PAGE;
-	vcpu->arch.root_mmu.root_cr3 = 0;
+	vcpu->arch.root_mmu.root_pgd = 0;
 	vcpu->arch.root_mmu.translate_gpa = translate_gpa;
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 		vcpu->arch.root_mmu.prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
 
 	vcpu->arch.guest_mmu.root_hpa = INVALID_PAGE;
-	vcpu->arch.guest_mmu.root_cr3 = 0;
+	vcpu->arch.guest_mmu.root_pgd = 0;
 	vcpu->arch.guest_mmu.translate_gpa = translate_gpa;
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 		vcpu->arch.guest_mmu.prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 72e69d841531..88fe87f8e140 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -356,7 +356,7 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
 			prev = &vcpu->arch.mmu->prev_roots[i];
 
-			if (nested_ept_root_matches(prev->hpa, prev->cr3,
+			if (nested_ept_root_matches(prev->hpa, prev->pgd,
 						    vmcs12->ept_pointer))
 				vcpu->arch.mmu->invlpg(vcpu, gpa, prev->hpa);
 		}
@@ -1166,7 +1166,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 	 * nested_vmx_transition_mmu_sync for details on skipping the MMU sync.
 	 */
 	if (!nested_ept)
-		kvm_mmu_new_cr3(vcpu, cr3, true,
+		kvm_mmu_new_pgd(vcpu, cr3, true,
 				!nested_vmx_transition_mmu_sync(vcpu));
 
 	vcpu->arch.cr3 = cr3;
@@ -5233,13 +5233,13 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
 
 		roots_to_free = 0;
-		if (nested_ept_root_matches(mmu->root_hpa, mmu->root_cr3,
+		if (nested_ept_root_matches(mmu->root_hpa, mmu->root_pgd,
 					    operand.eptp))
 			roots_to_free |= KVM_MMU_ROOT_CURRENT;
 
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
 			if (nested_ept_root_matches(mmu->prev_roots[i].hpa,
-						    mmu->prev_roots[i].cr3,
+						    mmu->prev_roots[i].pgd,
 						    operand.eptp))
 				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
 		}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d49d2a1ddf03..53fea2d38590 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5487,7 +5487,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 		}
 
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-			if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].cr3)
+			if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
 			    == operand.pcid)
 				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0d1572a0791c..210af343eebf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1045,7 +1045,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		 !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
 		return 1;
 
-	kvm_mmu_new_cr3(vcpu, cr3, skip_tlb_flush, skip_tlb_flush);
+	kvm_mmu_new_pgd(vcpu, cr3, skip_tlb_flush, skip_tlb_flush);
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
-- 
2.24.1



* [PATCH v3 37/37] KVM: VMX: Clean cr3/pgd handling in vmx_load_mmu_pgd()
  2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
                   ` (35 preceding siblings ...)
  2020-03-20 21:28 ` [PATCH v3 36/37] KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related code Sean Christopherson
@ 2020-03-20 21:28 ` Sean Christopherson
  36 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-20 21:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Junaid Shahid,
	Liran Alon, Boris Ostrovsky, John Haxby, Miaohe Lin,
	Tom Lendacky

Rename @cr3 to @pgd in vmx_load_mmu_pgd() to reflect that it will be
loaded into vmcs.EPT_POINTER and not vmcs.GUEST_CR3 when EPT is enabled.
Similarly, load guest_cr3 with @pgd if and only if EPT is disabled.

This fixes one of the last, if not _the_ last, cases in KVM where a
variable that is not strictly a cr3 value uses "cr3" instead of "pgd".
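
A rough model of the routing after this change: with EPT the pgd only
ever lands in EPT_POINTER, and GUEST_CR3 is written from the pgd only when
EPT is disabled.  Everything here is hypothetical and simplified:
make_eptp() is a placeholder for construct_eptp() (which also encodes the
memory type, page-walk length and A/D enablement), guest_visible_cr3
stands in for the value chosen in the part of the function not shown in
the diff, and the case where vmcs01.GUEST_CR3 is already up to date is
ignored:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two VMCS fields vmx_load_mmu_pgd() writes. */
struct vmcs_model {
	uint64_t ept_pointer;
	uint64_t guest_cr3;
};

/* Placeholder for construct_eptp(); real EPTP control bits are omitted. */
static uint64_t make_eptp(uint64_t root_hpa)
{
	return root_hpa;
}

static void load_mmu_pgd(struct vmcs_model *vmcs, bool enable_ept,
			 uint64_t pgd, uint64_t guest_visible_cr3)
{
	uint64_t guest_cr3;

	if (enable_ept) {
		/* pgd is the root of the EPT tables, never a guest CR3. */
		vmcs->ept_pointer = make_eptp(pgd);
		guest_cr3 = guest_visible_cr3;
	} else {
		/* Shadow paging: pgd is the shadow root and goes into GUEST_CR3. */
		guest_cr3 = pgd;
	}

	vmcs->guest_cr3 = guest_cr3;
}
```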

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 53fea2d38590..b7ca11d4766c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3045,16 +3045,15 @@ u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
 	return eptp;
 }
 
-void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long cr3)
+void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd)
 {
 	struct kvm *kvm = vcpu->kvm;
 	bool update_guest_cr3 = true;
 	unsigned long guest_cr3;
 	u64 eptp;
 
-	guest_cr3 = cr3;
 	if (enable_ept) {
-		eptp = construct_eptp(vcpu, cr3);
+		eptp = construct_eptp(vcpu, pgd);
 		vmcs_write64(EPT_POINTER, eptp);
 
 		if (kvm_x86_ops->tlb_remote_flush) {
@@ -3075,6 +3074,8 @@ void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long cr3)
 		else /* vmcs01.GUEST_CR3 is already up-to-date. */
 			update_guest_cr3 = false;
 		ept_load_pdptrs(vcpu);
+	} else {
+		guest_cr3 = pgd;
 	}
 
 	if (update_guest_cr3)
-- 
2.24.1



* Re: [PATCH v3 02/37] KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT)
  2020-03-20 21:27 ` [PATCH v3 02/37] KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT) Sean Christopherson
@ 2020-03-23 14:51   ` Vitaly Kuznetsov
  2020-03-23 15:45     ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-23 14:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Signal VM-Fail for the single-context variant of INVEPT if the specified
> EPTP is invalid.  Per the INVEPT pseudocode in Intel's SDM, it's subject
> to the standard EPT checks:
>
>   If VM entry with the "enable EPT" VM execution control set to 1 would
>   fail due to the EPTP value then VMfail(Invalid operand to INVEPT/INVVPID);
>
> Fixes: bfd0a56b90005 ("nEPT: Nested INVEPT")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 8578513907d7..f3774cef4fd4 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5156,8 +5156,12 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>  	}
>  
>  	switch (type) {
> -	case VMX_EPT_EXTENT_GLOBAL:
>  	case VMX_EPT_EXTENT_CONTEXT:
> +		if (!nested_vmx_check_eptp(vcpu, operand.eptp))
> +			return nested_vmx_failValid(vcpu,
> +				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);

I was going to ask "and we don't seem to check that current nested VMPTR
is valid, how can we know that nested_vmx_failValid() is the right
VMfail() to use" but then I checked our nested_vmx_failValid() and there
is a fallback there:

	if (vmx->nested.current_vmptr == -1ull && !vmx->nested.hv_evmcs)
		return nested_vmx_failInvalid(vcpu);

so this is a non-issue. My question, however, transforms into "would it
make sense to introduce nested_vmx_fail() implementing the logic from
SDM:

VMfail(ErrorNumber):
	IF VMCS pointer is valid
		THEN VMfailValid(ErrorNumber);
	ELSE VMfailInvalid;
	FI;

to assist an innocent reader of the code?"
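
For illustration, the proposed helper boils down to the SDM's VMfail()
selection.  A standalone sketch with hypothetical names, where
current_vmptr == UINT64_MAX plays the role of KVM's current_vmptr == -1ull
and has_evmcs mirrors the hv_evmcs check in the fallback quoted above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome codes for the model. */
enum vmfail_kind { VMFAIL_INVALID, VMFAIL_VALID };

struct vmfail_result {
	enum vmfail_kind kind;
	uint32_t error;	/* VM-instruction error; meaningful only for VMfailValid */
};

/* Models SDM VMfail(): VMfailValid only if there is a current VMCS. */
static struct vmfail_result vmfail(uint64_t current_vmptr, bool has_evmcs,
				   uint32_t error)
{
	struct vmfail_result r = { VMFAIL_INVALID, 0 };

	/* No current VMCS: VMfailInvalid (sets CF, no error number recorded). */
	if (current_vmptr == UINT64_MAX && !has_evmcs)
		return r;

	/* VMfailValid: sets ZF and records the error number in the VMCS. */
	r.kind = VMFAIL_VALID;
	r.error = error;
	return r;
}
```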

> +		fallthrough;
> +	case VMX_EPT_EXTENT_GLOBAL:
>  	/*
>  	 * TODO: Sync the necessary shadow EPT roots here, rather than
>  	 * at the next emulated VM-entry.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly



* Re: [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-20 21:27 ` [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 Sean Christopherson
@ 2020-03-23 15:24   ` Vitaly Kuznetsov
  2020-03-23 15:53     ` Sean Christopherson
  2020-03-23 16:24   ` Jim Mattson
  1 sibling, 1 reply; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-23 15:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
> changes to the EPT tables managed by L1 need to be recognized, and
> relying on KVM to always flush L2's EPTP context on nested VM-Enter is
> dangerous.
>
> Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
> TLB flush if necessary, e.g. if L1 has never entered L2 then there is
> nothing to be done.
>
> Nuking all L2 roots is overkill for the single-context variant, but it's
> the safe and easy bet.  A more precise zap mechanism will be added in
> the future.  Add a TODO to call out that KVM only needs to invalidate
> affected contexts.
>
> Fixes: b119019847fbc ("kvm: nVMX: Remove unnecessary sync_roots from handle_invept")
> Reported-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index f3774cef4fd4..9624cea4ed9f 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5160,12 +5160,12 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>  		if (!nested_vmx_check_eptp(vcpu, operand.eptp))
>  			return nested_vmx_failValid(vcpu,
>  				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
> +
> +		/* TODO: sync only the target EPTP context. */
>  		fallthrough;
>  	case VMX_EPT_EXTENT_GLOBAL:
> -	/*
> -	 * TODO: Sync the necessary shadow EPT roots here, rather than
> -	 * at the next emulated VM-entry.
> -	 */
> +		kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu,
> +				   KVM_MMU_ROOTS_ALL);
>  		break;

An ignorant reader may wonder "and how do we know that L1 actually uses
EPT" as he may find out that guest_mmu is not being used otherwise. The
answer to the question will likely be "if L1 doesn't use EPT for some of
its guests then there's nothing we should do here as we will be
resetting root_mmu when switching to/from them". Hope the ignorant
reviewer typing this is not very wrong :-)

>  	default:
>  		BUG_ON(1);

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly



* Re: [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT
  2020-03-20 21:28 ` [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT Sean Christopherson
@ 2020-03-23 15:34   ` Vitaly Kuznetsov
  2020-03-23 16:04     ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-23 15:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> From: Junaid Shahid <junaids@google.com>
>
> Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> outstanding changes to the page tables managed by L1 need to be
> recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
> because VPID is not tracked by the MMU role, all roots in the current
> MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> stale SPTEs.
>
> Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> Signed-off-by: Junaid Shahid <junaids@google.com>
> [sean: ported to upstream KVM, reworded the comment and changelog]
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 9624cea4ed9f..bc74fbbf33c6 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>  		return kvm_skip_emulated_instruction(vcpu);
>  	}
>  
> +	/*
> +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
> +	 * VPIDs are not tracked in the MMU role.
> +	 *
> +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> +	 * an MMU when EPT is disabled.
> +	 *
> +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
> +	 */
> +	if (!enable_ept)
> +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> +				   KVM_MMU_ROOTS_ALL);
> +

This is related to my remark on the previous patch; the comment above
makes me think I'm missing something obvious, enlighten me please)

My understanding is that L1 and L2 will share arch.root_mmu not only
when EPT is globally disabled: we seem to switch between
root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12), and different L2
guests may differ on this. Do we need to handle this somehow?

>  	return nested_vmx_succeed(vcpu);
>  }

-- 
Vitaly


* Re: [PATCH v3 02/37] KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT)
  2020-03-23 14:51   ` Vitaly Kuznetsov
@ 2020-03-23 15:45     ` Sean Christopherson
  2020-03-23 23:46       ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-23 15:45 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 03:51:17PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > Signal VM-Fail for the single-context variant of INVEPT if the specified
> > EPTP is invalid.  Per the INVEPT pseudocode in Intel's SDM, it's subject
> > to the standard EPT checks:
> >
> >   If VM entry with the "enable EPT" VM execution control set to 1 would
> >   fail due to the EPTP value then VMfail(Invalid operand to INVEPT/INVVPID);
> >
> > Fixes: bfd0a56b90005 ("nEPT: Nested INVEPT")
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 8578513907d7..f3774cef4fd4 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -5156,8 +5156,12 @@ static int handle_invept(struct kvm_vcpu *vcpu)
> >  	}
> >  
> >  	switch (type) {
> > -	case VMX_EPT_EXTENT_GLOBAL:
> >  	case VMX_EPT_EXTENT_CONTEXT:
> > +		if (!nested_vmx_check_eptp(vcpu, operand.eptp))
> > +			return nested_vmx_failValid(vcpu,
> > +				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
> 
> I was going to ask "and we don't seem to check that current nested VMPTR
> is valid, how can we know that nested_vmx_failValid() is the right
> VMfail() to use" but then I checked our nested_vmx_failValid() and there
> is a fallback there:
> 
> 	if (vmx->nested.current_vmptr == -1ull && !vmx->nested.hv_evmcs)
> 		return nested_vmx_failInvalid(vcpu);
> 
> so this is a non-issue. My question, however, transforms into "would it
> make sense to introduce nested_vmx_fail() implementing the logic from
> SDM:
> 
> VMfail(ErrorNumber):
> 	IF VMCS pointer is valid
> 		THEN VMfailValid(ErrorNumber);
> 	ELSE VMfailInvalid;
> 	FI;
> 

Hmm, I wouldn't be opposed to such a wrapper.  It would pair with
nested_vmx_succeed().
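For reference, a minimal user-space model of the wrapper being discussed. It is a sketch, not KVM code: struct nested_model, enum vmx_outcome and check_invept_context() are hypothetical stand-ins, and only the current_vmptr/hv_evmcs test is taken from nested_vmx_failValid() as quoted above. The error number 28 is the SDM's VM-instruction error for "invalid operand to INVEPT/INVVPID".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SDM VM-instruction error number: invalid operand to INVEPT/INVVPID. */
#define ERR_INVALID_OPERAND_TO_INVEPT_INVVPID 28

/* Outcome of an emulated VMX instruction; RFLAGS effects in comments. */
enum vmx_outcome {
	VMX_SUCCEED,		/* VMsucceed: arithmetic flags cleared */
	VMX_FAIL_INVALID,	/* VMfailInvalid: CF=1 */
	VMX_FAIL_VALID,		/* VMfailValid: ZF=1, error recorded in the VMCS */
};

/* Hypothetical stand-in for the nested VMX state VMfail() looks at. */
struct nested_model {
	uint64_t current_vmptr;		/* (uint64_t)-1 when no VMCS is current */
	bool hv_evmcs;			/* Hyper-V enlightened VMCS in use */
	uint32_t vm_instruction_error;	/* models the VMCS error field */
};

/*
 * VMfail(ErrorNumber) per the SDM pseudocode: VMfailValid if there is a
 * current VMCS to record the error in, VMfailInvalid otherwise.
 */
static enum vmx_outcome nested_vmx_fail(struct nested_model *n, uint32_t error)
{
	if (n->current_vmptr == (uint64_t)-1 && !n->hv_evmcs)
		return VMX_FAIL_INVALID;

	n->vm_instruction_error = error;
	return VMX_FAIL_VALID;
}

/* Example caller: the single-context INVEPT EPTP check from this patch. */
static enum vmx_outcome check_invept_context(struct nested_model *n,
					     bool eptp_valid)
{
	if (!eptp_valid)
		return nested_vmx_fail(n, ERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
	return VMX_SUCCEED;
}
```

Callers would then never need to pick between the Valid and Invalid variants themselves, mirroring how they already call nested_vmx_succeed().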

> 
> > +		fallthrough;
> > +	case VMX_EPT_EXTENT_GLOBAL:
> >  	/*
> >  	 * TODO: Sync the necessary shadow EPT roots here, rather than
> >  	 * at the next emulated VM-entry.
> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> -- 
> Vitaly
> 

* Re: [PATCH v3 05/37] KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault)
  2020-03-20 21:28 ` [PATCH v3 05/37] KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault) Sean Christopherson
@ 2020-03-23 15:47   ` Vitaly Kuznetsov
  2020-03-23 16:24     ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-23 15:47 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Export the page fault propagation helper so that VMX can use it to
> correctly emulate TLB invalidation on page faults in an upcoming patch.
>
> In the (hopefully) not-too-distant future, SGX virtualization will also
> want access to the helper for injecting page faults to the correct level
> (L1 vs. L2) when emulating ENCLS instructions.
>
> Rename the function to kvm_inject_emulated_page_fault() to clarify that
> it is (a) injecting a fault and (b) only for page faults.  WARN if it's
> invoked with an exception other than PF_VECTOR.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 2 ++
>  arch/x86/kvm/x86.c              | 8 ++++++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9a183e9d4cb1..328b1765ff76 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1447,6 +1447,8 @@ void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
>  void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
>  void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
>  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
> +bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
> +				    struct x86_exception *fault);
>  int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  			    gfn_t gfn, void *data, int offset, int len,
>  			    u32 access);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e54c6ad628a8..64ed6e6e2b56 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -611,8 +611,11 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
>  }
>  EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
>  
> -static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
> +bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
> +				    struct x86_exception *fault)
>  {
> +	WARN_ON_ONCE(fault->vector != PF_VECTOR);
> +
>  	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
>  		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
>  	else
> @@ -620,6 +623,7 @@ static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fau
>  
>  	return fault->nested_page_fault;
>  }
> +EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault);

We don't seem to use the return value a lot, actually,
inject_emulated_exception() seems to be the only one, the rest just call
it without checking the return value. Judging by the new name, I'd guess
that the function returns whether it was able to inject the exception or
not but this doesn't seem to be the case. My suggestion would then be to
make it return 'void' and return 'fault->nested_page_fault' separately
in inject_emulated_exception().

>  
>  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
>  {
> @@ -6373,7 +6377,7 @@ static bool inject_emulated_exception(struct kvm_vcpu *vcpu)
>  {
>  	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
>  	if (ctxt->exception.vector == PF_VECTOR)
> -		return kvm_propagate_fault(vcpu, &ctxt->exception);
> +		return kvm_inject_emulated_page_fault(vcpu, &ctxt->exception);
>  
>  	if (ctxt->exception.error_code_valid)
>  		kvm_queue_exception_e(vcpu, ctxt->exception.vector,

With or without the change suggested above,

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly


* Re: [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-23 15:24   ` Vitaly Kuznetsov
@ 2020-03-23 15:53     ` Sean Christopherson
  0 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-23 15:53 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 04:24:05PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
> > changes to the EPT tables managed by L1 need to be recognized, and
> > relying on KVM to always flush L2's EPTP context on nested VM-Enter is
> > dangerous.
> >
> > Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
> > TLB flush if necessary, e.g. if L1 has never entered L2 then there is
> > nothing to be done.
> >
> > Nuking all L2 roots is overkill for the single-context variant, but it's
> > the safe and easy bet.  A more precise zap mechanism will be added in
> > the future.  Add a TODO to call out that KVM only needs to invalidate
> > affected contexts.
> >
> > Fixes: b119019847fbc ("kvm: nVMX: Remove unnecessary sync_roots from handle_invept")
> > Reported-by: Jim Mattson <jmattson@google.com>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index f3774cef4fd4..9624cea4ed9f 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -5160,12 +5160,12 @@ static int handle_invept(struct kvm_vcpu *vcpu)
> >  		if (!nested_vmx_check_eptp(vcpu, operand.eptp))
> >  			return nested_vmx_failValid(vcpu,
> >  				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
> > +
> > +		/* TODO: sync only the target EPTP context. */
> >  		fallthrough;
> >  	case VMX_EPT_EXTENT_GLOBAL:
> > -	/*
> > -	 * TODO: Sync the necessary shadow EPT roots here, rather than
> > -	 * at the next emulated VM-entry.
> > -	 */
> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu,
> > +				   KVM_MMU_ROOTS_ALL);
> >  		break;
> 
> An ignorant reader may wonder "and how do we know that L1 actually uses
> EPT" as he may find out that guest_mmu is not being used otherwise. The
> answer to the question will likely be "if L1 doesn't use EPT for some of
> its guests then there's nothing we should do here as we will be
> resetting root_mmu when switching to/from them". Hope the ignorant
> reviewer typing this is not very wrong :-)

A different way to put it would be:

  KVM never uses root_mmu to hold nested EPT roots.

Invalidating too much is functionally ok, though sub-optimal for performance.
Invalidating too little is what we really care about.

FWIW, VMX currently uses guest_mmu iff nested EPT is enabled.  In theory,
KVM could be enhanced to also use guest_mmu when nested-TDP is disabled,
e.g. to enable VMX to preserve L1's root_mmu when emulating INVVPID.  That
would likely be a decent performance boost for nested VMX+VPID without
nested EPT, but I'm guessing that the cross-section of users that care
about nested performance and don't use nested EPT is quite small.
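To illustrate why over-invalidation is the only risk here, a hypothetical model follows (struct vcpu_mmus, run_l2() and emulate_invept() are made up for this sketch; real KVM root tracking is far more involved). It encodes the rule stated above, that guest_mmu is used iff nested EPT is enabled, which in turn requires enable_ept=1. Since nested EPT roots only ever land in guest_mmu, freeing all of guest_mmu on INVEPT can free too much but never misses one, and L2 instances run without nested EPT leave nothing there to free.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each MMU just counts the roots it caches. */
struct mmu_model {
	int nr_roots;
};

struct vcpu_mmus {
	struct mmu_model root_mmu;	/* L1, and L2 when nested EPT is off */
	struct mmu_model guest_mmu;	/* L2, only when nested EPT is on */
};

/* MMU used while running L2: guest_mmu iff nested EPT is enabled. */
static struct mmu_model *l2_mmu(struct vcpu_mmus *m, bool enable_ept,
				bool nested_ept)
{
	/* Nested EPT can only be offered to L1 when L0 has enable_ept=1. */
	return (enable_ept && nested_ept) ? &m->guest_mmu : &m->root_mmu;
}

/* Models a nested VM-Enter that builds one new root for L2. */
static void run_l2(struct vcpu_mmus *m, bool enable_ept, bool nested_ept)
{
	l2_mmu(m, enable_ept, nested_ept)->nr_roots++;
}

/* Emulated INVEPT: free every guest_mmu root, as this patch does. */
static void emulate_invept(struct vcpu_mmus *m)
{
	m->guest_mmu.nr_roots = 0;
}
```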

> >  	default:
> >  		BUG_ON(1);
> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> -- 
> Vitaly
> 

* Re: [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT
  2020-03-23 15:34   ` Vitaly Kuznetsov
@ 2020-03-23 16:04     ` Sean Christopherson
  2020-03-23 16:33       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-23 16:04 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > From: Junaid Shahid <junaids@google.com>
> >
> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> > outstanding changes to the page tables managed by L1 need to be
> > recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
> > because VPID is not tracked by the MMU role, all roots in the current
> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> > stale SPTEs.
> >
> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> > Signed-off-by: Junaid Shahid <junaids@google.com>
> > [sean: ported to upstream KVM, reworded the comment and changelog]
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 9624cea4ed9f..bc74fbbf33c6 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
> >  		return kvm_skip_emulated_instruction(vcpu);
> >  	}
> >  
> > +	/*
> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> > +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
> > +	 * VPIDs are not tracked in the MMU role.
> > +	 *
> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> > +	 * an MMU when EPT is disabled.
> > +	 *
> > +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
> > +	 */
> > +	if (!enable_ept)
> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> > +				   KVM_MMU_ROOTS_ALL);
> > +
> 
> This is related to my remark on the previous patch; the comment above
> makes me think I'm missing something obvious, enlighten me please)
> 
> My understanding is that L1 and L2 will share arch.root_mmu not only
> when EPT is globally disabled: we seem to switch between
> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12), and different L2
> guests may differ on this. Do we need to handle this somehow?

guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
enable_ept is global and cannot be changed without reloading kvm_intel.

This most definitely over-invalidates, e.g. it blasts away L1's page
tables.  But, fixing that requires tracking VPID in mmu_role and/or adding
support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
disabled.  Assuming the vast majority of nested deployments enable EPT in
L0, the cost of both options likely outweighs the benefits.
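The failure mode from the changelog can be modeled in a few lines. In this hypothetical sketch (none of these names are KVM's, and the cache size of 4 is illustrative), cached roots are looked up by (CR3, role) on a fast CR3 switch and reused without a sync or flush. Because the role carries no VPID, nothing marks a root that went stale behind L1's INVVPID, which is why the patch drops every cached root in root_mmu when enable_ept=0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CACHED_ROOTS 4	/* current root + previous roots; illustrative */

/* Hypothetical cached shadow root. */
struct cached_root {
	bool valid;
	uint64_t cr3;	/* guest CR3 the root shadows */
	uint32_t role;	/* MMU role bits; note there is no VPID field */
	bool stale;	/* L1 changed its page tables for L2, then did INVVPID */
};

struct mmu_model {
	struct cached_root roots[NR_CACHED_ROOTS];
};

/* Fast CR3 switch: reuse a matching cached root with no sync or flush. */
static struct cached_root *fast_cr3_switch(struct mmu_model *mmu,
					   uint64_t cr3, uint32_t role)
{
	for (size_t i = 0; i < NR_CACHED_ROOTS; i++) {
		struct cached_root *r = &mmu->roots[i];

		if (r->valid && r->cr3 == cr3 && r->role == role)
			return r;
	}
	return NULL;	/* miss: the caller would build a fresh root */
}

/* Models kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL). */
static void free_all_roots(struct mmu_model *mmu)
{
	for (size_t i = 0; i < NR_CACHED_ROOTS; i++)
		mmu->roots[i].valid = false;
}

/* Emulated INVVPID as in this patch: roots can't be filtered by VPID. */
static void emulate_invvpid(struct mmu_model *root_mmu, bool enable_ept)
{
	if (!enable_ept)
		free_all_roots(root_mmu);
}
```

Tracking the VPID in the role would let the lookup skip only the affected roots, at the cost discussed above.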

> >  	return nested_vmx_succeed(vcpu);
> >  }
> 
> -- 
> Vitaly
> 

* Re: [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-20 21:27 ` [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 Sean Christopherson
  2020-03-23 15:24   ` Vitaly Kuznetsov
@ 2020-03-23 16:24   ` Jim Mattson
  2020-03-23 16:28     ` Sean Christopherson
  1 sibling, 1 reply; 83+ messages in thread
From: Jim Mattson @ 2020-03-23 16:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Fri, Mar 20, 2020 at 2:29 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
> changes to the EPT tables managed by L1 need to be recognized, and
> relying on KVM to always flush L2's EPTP context on nested VM-Enter is
> dangerous.
>
> Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
> TLB flush if necessary, e.g. if L1 has never entered L2 then there is
> nothing to be done.
>
> Nuking all L2 roots is overkill for the single-context variant, but it's
> the safe and easy bet.  A more precise zap mechanism will be added in
> the future.  Add a TODO to call out that KVM only needs to invalidate
> affected contexts.
>
> Fixes: b119019847fbc ("kvm: nVMX: Remove unnecessary sync_roots from handle_invept")

The bug existed well before the commit indicated in the "Fixes" line.

* Re: [PATCH v3 05/37] KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault)
  2020-03-23 15:47   ` Vitaly Kuznetsov
@ 2020-03-23 16:24     ` Sean Christopherson
  2020-03-23 23:56       ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-23 16:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 04:47:49PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index e54c6ad628a8..64ed6e6e2b56 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -611,8 +611,11 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
> >  
> > -static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
> > +bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
> > +				    struct x86_exception *fault)
> >  {
> > +	WARN_ON_ONCE(fault->vector != PF_VECTOR);
> > +
> >  	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
> >  		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
> >  	else
> > @@ -620,6 +623,7 @@ static bool kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fau
> >  
> >  	return fault->nested_page_fault;
> >  }
> > +EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault);
> 
> We don't seem to use the return value a lot, actually,
> inject_emulated_exception() seems to be the only one, the rest just call
> it without checking the return value. Judging by the new name, I'd guess
> that the function returns whether it was able to inject the exception or
> not but this doesn't seem to be the case. My suggestion would then be to
> make it return 'void' and return 'fault->nested_page_fault' separately
> in inject_emulated_exception().

Oooh, I like that idea.  The return from the common helper also confuses me
every time I look at it.
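A rough model of the suggested shape, for illustration only: struct vcpu_model and its counters are hypothetical, and the else-branch target (elided from the quoted diff) is modeled as a plain counter. The helper becomes void, and inject_emulated_exception(), the one caller that consumed the old return value, reads fault->nested_page_fault itself.

```c
#include <assert.h>
#include <stdbool.h>

#define PF_VECTOR 14	/* #PF */

/* Hypothetical stand-ins; only the fields used below are modeled. */
struct x86_exception {
	unsigned char vector;
	bool nested_page_fault;	/* models x86_exception.nested_page_fault */
};

struct vcpu_model {
	bool mmu_is_nested;		/* models mmu_is_nested(vcpu) */
	int nr_nested_mmu_injections;	/* nested_mmu.inject_page_fault() calls */
	int nr_other_injections;	/* calls through the "else" branch */
};

/* Proposed shape: inject the fault and return nothing. */
static void kvm_inject_emulated_page_fault(struct vcpu_model *v,
					   const struct x86_exception *fault)
{
	/* The patch does WARN_ON_ONCE(fault->vector != PF_VECTOR) here. */
	if (v->mmu_is_nested && !fault->nested_page_fault)
		v->nr_nested_mmu_injections++;
	else
		v->nr_other_injections++;
}

/* The caller that used the old return value now reads it directly. */
static bool inject_emulated_exception(struct vcpu_model *v,
				      const struct x86_exception *ex)
{
	if (ex->vector == PF_VECTOR) {
		kvm_inject_emulated_page_fault(v, ex);
		return ex->nested_page_fault;
	}

	/* Non-#PF exceptions would be queued here; modeled as returning false. */
	return false;
}
```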

> >  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
> >  {
> > @@ -6373,7 +6377,7 @@ static bool inject_emulated_exception(struct kvm_vcpu *vcpu)
> >  {
> >  	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
> >  	if (ctxt->exception.vector == PF_VECTOR)
> > -		return kvm_propagate_fault(vcpu, &ctxt->exception);
> > +		return kvm_inject_emulated_page_fault(vcpu, &ctxt->exception);
> >  
> >  	if (ctxt->exception.error_code_valid)
> >  		kvm_queue_exception_e(vcpu, ctxt->exception.vector,
> 
> With or without the change suggested above,
> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> -- 
> Vitaly
> 

* Re: [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-23 16:24   ` Jim Mattson
@ 2020-03-23 16:28     ` Sean Christopherson
  2020-03-23 16:36       ` Jim Mattson
  0 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-23 16:28 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 09:24:25AM -0700, Jim Mattson wrote:
> On Fri, Mar 20, 2020 at 2:29 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
> > changes to the EPT tables managed by L1 need to be recognized, and
> > relying on KVM to always flush L2's EPTP context on nested VM-Enter is
> > dangerous.
> >
> > Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
> > TLB flush if necessary, e.g. if L1 has never entered L2 then there is
> > nothing to be done.
> >
> > Nuking all L2 roots is overkill for the single-context variant, but it's
> > the safe and easy bet.  A more precise zap mechanism will be added in
> > the future.  Add a TODO to call out that KVM only needs to invalidate
> > affected contexts.
> >
> > Fixes: b119019847fbc ("kvm: nVMX: Remove unnecessary sync_roots from handle_invept")
> 
> The bug existed well before the commit indicated in the "Fixes" line.

Ah, my bad.  A cursory glance at commit b119019847fbc makes that quite
obvious.  This should be

  Fixes: bfd0a56b9000 ("nEPT: Nested INVEPT")

* Re: [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT
  2020-03-23 16:04     ` Sean Christopherson
@ 2020-03-23 16:33       ` Vitaly Kuznetsov
  2020-03-23 16:50         ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-23 16:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> 
>> > From: Junaid Shahid <junaids@google.com>
>> >
>> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
>> > outstanding changes to the page tables managed by L1 need to be
>> > recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
>> > because VPID is not tracked by the MMU role, all roots in the current
>> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
>> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
>> > stale SPTEs.
>> >
>> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
>> > Signed-off-by: Junaid Shahid <junaids@google.com>
>> > [sean: ported to upstream KVM, reworded the comment and changelog]
>> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> > ---
>> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
>> >  1 file changed, 14 insertions(+)
>> >
>> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> > index 9624cea4ed9f..bc74fbbf33c6 100644
>> > --- a/arch/x86/kvm/vmx/nested.c
>> > +++ b/arch/x86/kvm/vmx/nested.c
>> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>> >  		return kvm_skip_emulated_instruction(vcpu);
>> >  	}
>> >  
>> > +	/*
>> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
>> > +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
>> > +	 * VPIDs are not tracked in the MMU role.
>> > +	 *
>> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
>> > +	 * an MMU when EPT is disabled.
>> > +	 *
>> > +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
>> > +	 */
>> > +	if (!enable_ept)
>> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
>> > +				   KVM_MMU_ROOTS_ALL);
>> > +
>> 
>> This is related to my remark on the previous patch; the comment above
>> makes me think I'm missing something obvious, enlighten me please)
>> 
>> My understanding is that L1 and L2 will share arch.root_mmu not only
>> when EPT is globally disabled: we seem to switch between
>> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12), and different L2
>> guests may differ on this. Do we need to handle this somehow?
>
> guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
> enable_ept is global and cannot be changed without reloading kvm_intel.
>
> This most definitely over-invalidates, e.g. it blasts away L1's page
> tables.  But, fixing that requires tracking VPID in mmu_role and/or adding
> support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
> disabled.  Assuming the vast majority of nested deployments enable EPT in
> L0, the cost of both options likely outweighs the benefits.
>

Yes but my question rather was: what if global 'enable_ept' is true but
nested EPT is not being used by L1, don't we still need to do
kvm_mmu_free_roots(&vcpu->arch.root_mmu) here?

-- 
Vitaly


* Re: [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-23 16:28     ` Sean Christopherson
@ 2020-03-23 16:36       ` Jim Mattson
  2020-03-23 16:44         ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Jim Mattson @ 2020-03-23 16:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 9:28 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Mar 23, 2020 at 09:24:25AM -0700, Jim Mattson wrote:
> > On Fri, Mar 20, 2020 at 2:29 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
> > > changes to the EPT tables managed by L1 need to be recognized, and
> > > relying on KVM to always flush L2's EPTP context on nested VM-Enter is
> > > dangerous.
> > >
> > > Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
> > > TLB flush if necessary, e.g. if L1 has never entered L2 then there is
> > > nothing to be done.
> > >
> > > Nuking all L2 roots is overkill for the single-context variant, but it's
> > > the safe and easy bet.  A more precise zap mechanism will be added in
> > > the future.  Add a TODO to call out that KVM only needs to invalidate
> > > affected contexts.
> > >
> > > Fixes: b119019847fbc ("kvm: nVMX: Remove unnecessary sync_roots from handle_invept")
> >
> > The bug existed well before the commit indicated in the "Fixes" line.
>
> Ah, my bad.  A cursory glance at commit b119019847fbc makes that quite
> obvious.  This should be
>
>   Fixes: bfd0a56b9000 ("nEPT: Nested INVEPT")

Actually, I think that things were fine back then (though we
gratuitously flushed L1's TLB as a result of an emulated INVEPT). The
problem started when we stopped flushing the TLB on every emulated
VM-entry (i.e. L1 -> L2 transitions). I'm not sure what that commit
was, but I think you referenced it in an earlier email.

* Re: [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-23 16:36       ` Jim Mattson
@ 2020-03-23 16:44         ` Sean Christopherson
  2020-03-23 23:50           ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-23 16:44 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 09:36:22AM -0700, Jim Mattson wrote:
> On Mon, Mar 23, 2020 at 9:28 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Mon, Mar 23, 2020 at 09:24:25AM -0700, Jim Mattson wrote:
> > > On Fri, Mar 20, 2020 at 2:29 PM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > > >
> > > > Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
> > > > changes to the EPT tables managed by L1 need to be recognized, and
> > > > relying on KVM to always flush L2's EPTP context on nested VM-Enter is
> > > > dangerous.
> > > >
> > > > Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
> > > > TLB flush if necessary, e.g. if L1 has never entered L2 then there is
> > > > nothing to be done.
> > > >
> > > > Nuking all L2 roots is overkill for the single-context variant, but it's
> > > > the safe and easy bet.  A more precise zap mechanism will be added in
> > > > the future.  Add a TODO to call out that KVM only needs to invalidate
> > > > affected contexts.
> > > >
> > > > Fixes: b119019847fbc ("kvm: nVMX: Remove unnecessary sync_roots from handle_invept")
> > >
> > > The bug existed well before the commit indicated in the "Fixes" line.
> >
> > Ah, my bad.  A cursory glance at commit b119019847fbc makes that quite
> > obvious.  This should be
> >
> >   Fixes: bfd0a56b9000 ("nEPT: Nested INVEPT")
> 
> Actually, I think that things were fine back then (though we
> gratuitously flushed L1's TLB as a result of an emulated INVEPT). The
> problem started when we stopped flushing the TLB on every emulated
> VM-entry (i.e. L1 -> L2 transitions). I'm not sure what that commit
> was, but I think you referenced it in an earlier email.

Hmm, true.  I was thinking it was the original commit because it didn't
operate on guest_mmu, but guest_mmu didn't exist back then.  So I think

  Fixes: 14c07ad89f4d ("x86/kvm/mmu: introduce guest_mmu")

would be appropriate?

* Re: [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT
  2020-03-23 16:33       ` Vitaly Kuznetsov
@ 2020-03-23 16:50         ` Sean Christopherson
  2020-03-23 16:57           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-23 16:50 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 05:33:08PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> >> 
> >> > From: Junaid Shahid <junaids@google.com>
> >> >
> >> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> >> > outstanding changes to the page tables managed by L1 need to be
> >> > recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
> >> > because VPID is not tracked by the MMU role, all roots in the current
> >> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> >> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> >> > stale SPTEs.
> >> >
> >> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> >> > Signed-off-by: Junaid Shahid <junaids@google.com>
> >> > [sean: ported to upstream KVM, reworded the comment and changelog]
> >> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >> > ---
> >> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
> >> >  1 file changed, 14 insertions(+)
> >> >
> >> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> >> > index 9624cea4ed9f..bc74fbbf33c6 100644
> >> > --- a/arch/x86/kvm/vmx/nested.c
> >> > +++ b/arch/x86/kvm/vmx/nested.c
> >> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
> >> >  		return kvm_skip_emulated_instruction(vcpu);
> >> >  	}
> >> >  
> >> > +	/*
> >> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> >> > +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
> >> > +	 * VPIDs are not tracked in the MMU role.
> >> > +	 *
> >> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> >> > +	 * an MMU when EPT is disabled.
> >> > +	 *
> >> > +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
> >> > +	 */
> >> > +	if (!enable_ept)
> >> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> >> > +				   KVM_MMU_ROOTS_ALL);
> >> > +
> >> 
> >> This is related to my remark on the previous patch; the comment above
> >> makes me think I'm missing something obvious, enlighten me please)
> >> 
> >> My understanding is that L1 and L2 will share arch.root_mmu not only
> >> when EPT is globally disabled, we seem to switch between
> >> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
> >> guests may be different on this. Do we need to handle this somehow?
> >
> > guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
> > enable_ept is global and cannot be changed without reloading kvm_intel.
> >
> > This most definitely over-invalidates, e.g. it blasts away L1's page
> > tables.  But, fixing that requires tracking VPID in mmu_role and/or adding
> > support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
> > disabled.  Assuming the vast majority of nested deployments enable EPT in
> > L0, the cost of both options likely outweighs the benefits.
> >
> 
> Yes but my question rather was: what if global 'enable_ept' is true but
> nested EPT is not being used by L1, don't we still need to do
> kvm_mmu_free_roots(&vcpu->arch.root_mmu) here?

No, because L0 isn't shadowing the L1->L2 page tables, i.e. there can't be
unsync'd SPTEs for L2.  The vpid_sync_*() above flushes the TLB for L2's
effective VPID, which is all that's required.
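
A minimal C model of the decision described above, for readers following along. It is not KVM code: the struct and function names are hypothetical, and the two booleans stand in for the global enable_ept module parameter and for whether vmcs12 enables EPT. The point it encodes is that the VPID flush always happens, while freeing the root_mmu roots depends only on L0's enable_ept.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the work an emulated INVVPID must do in L0. */
struct invvpid_actions {
	bool flush_l2_vpid;       /* stands in for vpid_sync_*() on L2's effective VPID */
	bool free_root_mmu_roots; /* stands in for kvm_mmu_free_roots(..., KVM_MMU_ROOTS_ALL) */
};

static struct invvpid_actions invvpid_model(bool enable_ept, bool l1_uses_nested_ept)
{
	struct invvpid_actions a = { .flush_l2_vpid = true };

	/* Ignored on purpose: only L0's paging mode matters here. */
	(void)l1_uses_nested_ept;

	/* Only shadow paging in L0 can leave unsync'd SPTEs behind for L2. */
	a.free_root_mmu_roots = !enable_ept;
	return a;
}
```

Taking the nested-EPT flag and ignoring it is deliberate: with enable_ept=1, L0 is not shadowing the L1->L2 page tables, so there are no unsync'd SPTEs to drop.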

* Re: [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT
  2020-03-23 16:50         ` Sean Christopherson
@ 2020-03-23 16:57           ` Vitaly Kuznetsov
  0 siblings, 0 replies; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-23 16:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Mon, Mar 23, 2020 at 05:33:08PM +0100, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> 
>> > On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
>> >> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> >> 
>> >> > From: Junaid Shahid <junaids@google.com>
>> >> >
>> >> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
>> >> > outstanding changes to the page tables managed by L1 need to be
>> >> > recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
>> >> > because VPID is not tracked by the MMU role, all roots in the current
>> >> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
>> >> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
>> >> > stale SPTEs.
>> >> >
>> >> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
>> >> > Signed-off-by: Junaid Shahid <junaids@google.com>
>> >> > [sean: ported to upstream KVM, reworded the comment and changelog]
>> >> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> >> > ---
>> >> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
>> >> >  1 file changed, 14 insertions(+)
>> >> >
>> >> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> >> > index 9624cea4ed9f..bc74fbbf33c6 100644
>> >> > --- a/arch/x86/kvm/vmx/nested.c
>> >> > +++ b/arch/x86/kvm/vmx/nested.c
>> >> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>> >> >  		return kvm_skip_emulated_instruction(vcpu);
>> >> >  	}
>> >> >  
>> >> > +	/*
>> >> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
>> >> > +	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
>> >> > +	 * VPIDs are not tracked in the MMU role.
>> >> > +	 *
>> >> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
>> >> > +	 * an MMU when EPT is disabled.
>> >> > +	 *
>> >> > +	 * TODO: sync only the affected SPTEs for INDIVIDUAL_ADDR.
>> >> > +	 */
>> >> > +	if (!enable_ept)
>> >> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
>> >> > +				   KVM_MMU_ROOTS_ALL);
>> >> > +
>> >> 
>> >> This is related to my remark on the previous patch; the comment above
>> >> makes me think I'm missing something obvious, enlighten me please)
>> >> 
>> >> My understanding is that L1 and L2 will share arch.root_mmu not only
>> >> when EPT is globally disabled, we seem to switch between
>> >> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
>> >> guests may be different on this. Do we need to handle this somehow?
>> >
>> > guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
>> > enable_ept is global and cannot be changed without reloading kvm_intel.
>> >
>> > This most definitely over-invalidates, e.g. it blasts away L1's page
>> > tables.  But, fixing that requires tracking VPID in mmu_role and/or adding
>> > support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
>> > disabled.  Assuming the vast majority of nested deployments enable EPT in
>> > L0, the cost of both options likely outweighs the benefits.
>> >
>> 
>> Yes but my question rather was: what if global 'enable_ept' is true but
>> nested EPT is not being used by L1, don't we still need to do
>> kvm_mmu_free_roots(&vcpu->arch.root_mmu) here?
>
> No, because L0 isn't shadowing the L1->L2 page tables, i.e. there can't be
> unsync'd SPTEs for L2.  The vpid_sync_*() above flushes the TLB for L2's
> effective VPID, which is all that's required.

Ah, stupid me, it's actually EPT and not nested EPT which we care about
here. Thank you for the clarification!

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly


* Re: [PATCH v3 02/37] KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT)
  2020-03-23 15:45     ` Sean Christopherson
@ 2020-03-23 23:46       ` Paolo Bonzini
  0 siblings, 0 replies; 83+ messages in thread
From: Paolo Bonzini @ 2020-03-23 23:46 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

On 23/03/20 16:45, Sean Christopherson wrote:
>> My question, however, transforms into "would it
>> make sense to introduce nested_vmx_fail() implementing the logic from
>> SDM:
>>
>> VMfail(ErrorNumber):
>> 	IF VMCS pointer is valid
>> 		THEN VMfailValid(ErrorNumber);
>> 	ELSE VMfailInvalid;
>> 	FI;
>>
> Hmm, I wouldn't be opposed to such a wrapper.  It would pair with
> nested_vmx_succeed().
> 

Neither would I.

Paolo
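
For illustration, the SDM's VMfail() pseudocode quoted above maps onto a small dispatcher. This is a hypothetical, self-contained model: the struct, the function names and the all-ones "no current VMCS" marker are assumptions made for the sketch, and only RFLAGS.CF/ZF plus the VM-instruction error field are modelled (the SDM also clears PF, AF, SF and OF in both cases).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed marker for "no current VMCS": the all-ones pointer value. */
#define INVALID_VMCS_PTR UINT64_MAX

/* Hypothetical, simplified vCPU state for illustration only. */
struct vcpu_model {
	uint64_t current_vmptr;
	bool cf, zf;                   /* RFLAGS.CF and RFLAGS.ZF */
	uint32_t vm_instruction_error; /* field in the current VMCS */
};

static void vm_fail_invalid(struct vcpu_model *v)
{
	v->cf = true;
	v->zf = false;
}

static void vm_fail_valid(struct vcpu_model *v, uint32_t error)
{
	v->cf = false;
	v->zf = true;
	v->vm_instruction_error = error; /* only possible with a current VMCS */
}

/* VMfail(ErrorNumber), following the SDM pseudocode. */
static void vm_fail(struct vcpu_model *v, uint32_t error)
{
	if (v->current_vmptr != INVALID_VMCS_PTR)
		vm_fail_valid(v, error);
	else
		vm_fail_invalid(v);
}
```

Callers would then report failures through one entry point and leave the valid/invalid choice to the wrapper, mirroring the success path.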


* Re: [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-23 16:44         ` Sean Christopherson
@ 2020-03-23 23:50           ` Paolo Bonzini
  2020-03-24  0:12             ` Jim Mattson
  0 siblings, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2020-03-23 23:50 UTC (permalink / raw)
  To: Sean Christopherson, Jim Mattson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm list, LKML,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

On 23/03/20 17:44, Sean Christopherson wrote:
> So I think
> 
>   Fixes: 14c07ad89f4d ("x86/kvm/mmu: introduce guest_mmu")
> 
> would be appropriate?
> 

Yes.  I changed it and also added the comment

+		/*
+		 * Nested EPT roots are always held through guest_mmu,
+		 * not root_mmu.
+		 */

which isn't unlike what you suggested elsewhere in the thread.

Paolo


* Re: [PATCH v3 05/37] KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault)
  2020-03-23 16:24     ` Sean Christopherson
@ 2020-03-23 23:56       ` Paolo Bonzini
  0 siblings, 0 replies; 83+ messages in thread
From: Paolo Bonzini @ 2020-03-23 23:56 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

On 23/03/20 17:24, Sean Christopherson wrote:
>> We don't seem to use the return value a lot, actually,
>> inject_emulated_exception() seems to be the only one, the rest just call
>> it without checking the return value. Judging by the new name, I'd guess
>> that the function returns whether it was able to inject the exception or
>> not but this doesn't seem to be the case. My suggestion would then be to
>> make it return 'void' and return 'fault->nested_page_fault' separately
>> in inject_emulated_exception().
> Oooh, I like that idea.  The return from the common helper also confuses me
> every time I look at it.
> 

Separate patch, please.  I'm not sure it makes a great difference though.

Paolo


* Re: [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-23 23:50           ` Paolo Bonzini
@ 2020-03-24  0:12             ` Jim Mattson
  2020-03-30 18:38               ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Jim Mattson @ 2020-03-24  0:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 4:51 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 23/03/20 17:44, Sean Christopherson wrote:
> > So I think
> >
> >   Fixes: 14c07ad89f4d ("x86/kvm/mmu: introduce guest_mmu")
> >
> > would be appropriate?
> >
>
> Yes.

I think it was actually commit efebf0aaec3d ("KVM: nVMX: Do not flush
TLB on L1<->L2 transitions if L1 uses VPID and EPT").

* Re: [PATCH v3 06/37] KVM: x86: Consolidate logic for injecting page faults to L1
  2020-03-20 21:28 ` [PATCH v3 06/37] KVM: x86: Consolidate logic for injecting page faults to L1 Sean Christopherson
@ 2020-03-24  0:47   ` Paolo Bonzini
  0 siblings, 0 replies; 83+ messages in thread
From: Paolo Bonzini @ 2020-03-24  0:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On 20/03/20 22:28, Sean Christopherson wrote:
> +void kvm_inject_l1_page_fault(struct kvm_vcpu *vcpu,
> +			      struct x86_exception *fault)
> +{
> +	vcpu->arch.mmu->inject_page_fault(vcpu, fault);
> +}
> +
>  bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
>  				    struct x86_exception *fault)
>  {
> @@ -619,7 +625,7 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
>  	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
>  		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
>  	else
> -		vcpu->arch.mmu->inject_page_fault(vcpu, fault);
> +		kvm_inject_l1_page_fault(vcpu, fault);
>  
>  	return fault->nested_page_fault;

This all started with "I don't like the name of the function" but
thinking more about it, we can also write this as

	if (mmu_is_nested(vcpu) && !fault->nested_page_fault)
		vcpu->arch.walk_mmu->inject_page_fault(vcpu, fault);
	else
		vcpu->arch.mmu->inject_page_fault(vcpu, fault);

Now, if !mmu_is_nested(vcpu) then walk_mmu == mmu, so it's much simpler
up until this patch:

	fault_mmu = fault->nested_page_fault ? vcpu->arch.mmu : vcpu->arch.walk_mmu;
	fault_mmu->inject_page_fault(vcpu, fault);

(which also matches how fault->nested_page_fault is assigned to).
In patch 7 we add the invalidation in kvm_inject_l1_page_fault, but
is it necessary to do it only in the else?

+	if (!vcpu->arch.mmu->direct_map &&
+	    (fault->error_code & PFERR_PRESENT_MASK))
+		vcpu->arch.mmu->invlpg(vcpu, fault->address,
+				       vcpu->arch.mmu->root_hpa);
+
 	vcpu->arch.mmu->inject_page_fault(vcpu, fault);
 }
 
The direct_map check is really just an optimization to avoid a
retpoline if ->invlpg is nonpaging_invlpg.  We can change it to
!vcpu->arch.mmu->invlpg if nonpaging_invlpg is replaced with NULL,
and then the same "if" condition can also be used for the nested_mmu,
i.e. what patch 7 writes as

+		/*
+		 * No need to sync SPTEs, the fault is being injected into L2,
+		 * whose page tables are not being shadowed.
+		 */
 		vcpu->arch.nested_mmu.inject_page_fault(vcpu, fault);
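
A hedged sketch of the NULL-hook idea: if an MMU with nothing to sync simply has no invlpg callback, a single NULL test replaces both the direct_map check and the (retpolined) indirect call to a no-op. All names below are hypothetical stand-ins; PFERR_PRESENT_MASK is assumed to be bit 0 of the page-fault error code, as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gva_model_t;        /* stand-in for the kernel's gva_t */
#define PFERR_PRESENT_MASK (1u << 0) /* "page was present" bit of the #PF error code */

/* Hypothetical MMU: invlpg is NULL when there are no SPTEs to sync. */
struct mmu_ops_model {
	void (*invlpg)(gva_model_t addr);
};

static int invlpg_calls;

static void shadow_invlpg(gva_model_t addr)
{
	(void)addr;
	invlpg_calls++; /* would sync the SPTE for addr and flush the TLB */
}

static void invlpg_before_inject(const struct mmu_ops_model *mmu,
				 uint32_t error_code, gva_model_t addr)
{
	/* One NULL test: no direct_map check, no indirect call to a no-op. */
	if (mmu->invlpg && (error_code & PFERR_PRESENT_MASK))
		mmu->invlpg(addr);
}
```

The same condition then works unchanged for root_mmu, guest_mmu and nested_mmu, which is what makes a single code path possible.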


Finally, patch 7 also adds a tlb_flush_gva call which is already present
in kvm_mmu_invlpg, and this brings the final form to look like this:

bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
                                    struct x86_exception *fault)
{
        struct kvm_mmu *fault_mmu;
        WARN_ON_ONCE(fault->vector != PF_VECTOR);

        fault_mmu = fault->nested_page_fault ? vcpu->arch.mmu : vcpu->arch.walk_mmu;

        /*
         * Invalidate the TLB entry for the faulting address, if it exists,
         * else the access will fault indefinitely (and to emulate hardware).
         */
        if (fault->error_code & PFERR_PRESENT_MASK)
                __kvm_mmu_invlpg(vcpu, fault_mmu, fault->address);

        fault_mmu->inject_page_fault(vcpu, fault);
        return fault->nested_page_fault;
}
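
To double-check the claim that the one-line fault_mmu selection is equivalent to the original if/else, here is a small exhaustive model. It is an assumption-laden sketch, not KVM code: it assumes that in the nested case mmu points at guest_mmu and walk_mmu at nested_mmu, and that otherwise both point at root_mmu (i.e. walk_mmu == mmu whenever !mmu_is_nested()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVM's MMUs and the vCPU's two MMU pointers. */
struct mmu_model { int id; };

struct arch_model {
	struct mmu_model root_mmu, guest_mmu, nested_mmu;
	struct mmu_model *mmu;      /* MMU that owns the SPTEs */
	struct mmu_model *walk_mmu; /* MMU that walks the guest page tables */
};

static bool mmu_is_nested_model(const struct arch_model *a)
{
	return a->walk_mmu == &a->nested_mmu;
}

/* The if/else form quoted earlier in this message. */
static struct mmu_model *pick_if_else(struct arch_model *a, bool nested_page_fault)
{
	if (mmu_is_nested_model(a) && !nested_page_fault)
		return a->walk_mmu;
	return a->mmu;
}

/* The proposed one-line selection. */
static struct mmu_model *pick_ternary(struct arch_model *a, bool nested_page_fault)
{
	return nested_page_fault ? a->mmu : a->walk_mmu;
}

/* Assumed pointer setup: !nested implies walk_mmu == mmu. */
static void set_mode(struct arch_model *a, bool nested)
{
	a->mmu = nested ? &a->guest_mmu : &a->root_mmu;
	a->walk_mmu = nested ? &a->nested_mmu : &a->root_mmu;
}

/* Exhaustively compare both selections over all four combinations. */
static bool selections_agree(void)
{
	static struct arch_model a;

	for (int nested = 0; nested <= 1; nested++) {
		set_mode(&a, nested);
		for (int npf = 0; npf <= 1; npf++) {
			if (pick_if_else(&a, npf) != pick_ternary(&a, npf))
				return false;
		}
	}
	return true;
}
```

The only case where the two forms could differ is !nested with !nested_page_fault, and there walk_mmu and mmu are the same pointer.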

This will become a formal mini-series replacing patches 6 and 7
after I test it, so no need to do anything on your part.

Paolo


* Re: [PATCH v3 31/37] KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switch
  2020-03-20 21:28 ` [PATCH v3 31/37] KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switch Sean Christopherson
@ 2020-03-24 11:07   ` Paolo Bonzini
  0 siblings, 0 replies; 83+ messages in thread
From: Paolo Bonzini @ 2020-03-24 11:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On 20/03/20 22:28, Sean Christopherson wrote:
> Add a separate "skip" override for MMU sync, a future change to avoid
> TLB flushes on nested VMX transitions may need to sync the MMU even if
> the TLB flush is unnecessary.
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

I added a WARN_ON(skip_tlb_flush && !skip_mmu_sync), which could help
catch misordered parameters.
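
A minimal model of that sanity check, assuming the two overrides are adjacent bools as in this series. The function name and the counter that replaces WARN_ON() are hypothetical. The idea is that (skip_tlb_flush=true, skip_mmu_sync=false) is exactly what a caller produces by swapping a legitimate (false, true) pair.

```c
#include <assert.h>
#include <stdbool.h>

static int warnings; /* counts would-be WARN_ON() splats */

/*
 * Hypothetical argument check for a fast CR3 switch taking two adjacent
 * bools.  Returns true if the combination looks like swapped arguments.
 */
static bool check_cr3_switch_args(bool skip_tlb_flush, bool skip_mmu_sync)
{
	bool suspicious = skip_tlb_flush && !skip_mmu_sync;

	if (suspicious)
		warnings++;
	return suspicious;
}
```

Note that patch 34 later passes skip_tlb_flush=true unconditionally with skip_mmu_sync possibly false, so this particular invariant does not survive the series.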

Paolo


* Re: [PATCH v3 33/37] KVM: nVMX: Skip MMU sync on nested VMX transition when possible
  2020-03-20 21:28 ` [PATCH v3 33/37] KVM: nVMX: Skip MMU sync on nested VMX transition when possible Sean Christopherson
@ 2020-03-24 11:19   ` Paolo Bonzini
  0 siblings, 0 replies; 83+ messages in thread
From: Paolo Bonzini @ 2020-03-24 11:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On 20/03/20 22:28, Sean Christopherson wrote:
> Skip the MMU sync when reusing a cached root if EPT is enabled or L1
> enabled VPID for L2.
> 
> If EPT is enabled, guest-physical mappings aren't flushed even if VPID
> is disabled, i.e. L1 can't expect stale TLB entries to be flushed if it
> has enabled EPT and L0 isn't shadowing PTEs (for L1 or L2) if L1 has
> EPT disabled.
> 
> If VPID is enabled (and EPT is disabled), then L1 can't expect stale TLB
> entries to be flushed (for itself or L2).
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Great, just a small rephrasing here and there:

/*
 * Returns true if the MMU needs to be sync'd on nested VM-Enter/VM-Exit.
 * tl;dr: the MMU needs a sync if L0 is using shadow paging and L1 didn't
 * enable VPID for L2 (implying it expects a TLB flush on VMX transitions).
 * Here's why.
 *
 * If EPT is enabled by L0 a sync is never needed:
 * - if it is disabled by L1, then L0 is not shadowing L1 or L2 PTEs, there
 *   cannot be unsync'd SPTEs for either L1 or L2.
 *
 * - if it is also enabled by L1, then L0 doesn't need to sync on VM-Enter,
 *   as VM-Enter isn't required to invalidate guest-physical mappings
 *   (irrespective of VPID), i.e. L1 can't rely on the (virtual) CPU to flush
 *   stale guest-physical mappings for L2 from the TLB.  And as above, L0 isn't
 *   shadowing L1 PTEs so there are no unsync'd SPTEs to sync on VM-Exit.
 *
 * If EPT is disabled by L0:
 * - if VPID is enabled by L1 (for L2), the situation is similar to when L1
 *   enables EPT: L0 doesn't need to sync as VM-Enter and VM-Exit aren't
 *   required to invalidate linear mappings (EPT is disabled so there are
 *   no combined or guest-physical mappings), i.e. L1 can't rely on the
 *   (virtual) CPU to flush stale linear mappings for either L2 or itself (L1).
 *
 * - however if VPID is disabled by L1, then a sync is needed as L1 expects all
 *   linear mappings (EPT is disabled so there are no combined or guest-physical
 *   mappings) to be invalidated on both VM-Enter and VM-Exit.
 *
 * Note, this logic is subtly different than nested_has_guest_tlb_tag(), which
 * additionally checks that L2 has been assigned a VPID (when EPT is disabled).
 * Whether or not L2 has been assigned a VPID by L0 is irrelevant with respect
 * to L1's expectations, e.g. L0 needs to invalidate hardware TLB entries if L2
 * doesn't have a unique VPID to prevent reusing L1's entries (assuming L1 has
 * been assigned a VPID), but L0 doesn't need to do an MMU sync because L1
 * doesn't expect stale (virtual) TLB entries to be flushed, i.e. L1 doesn't
 * know that L0 will flush the TLB and so L1 will do INVVPID as needed to flush
 * stale TLB entries, at which point L0 will sync L2's MMU.
 */
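
The rule in the comment condenses into a small predicate. This is a sketch, not the body of nested_vmx_transition_mmu_sync(), and the parameter names are hypothetical. The third parameter is accepted and ignored on purpose, to make explicit that whether L0 assigned L2 its own VPID does not affect L1's expectations.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * enable_ept:      L0's global EPT setting.
 * l1_enabled_vpid: L1 enabled VPID for L2 in vmcs12.
 * l2_has_vpid02:   L0 assigned L2 a unique hardware VPID.
 */
static bool transition_needs_mmu_sync(bool enable_ept, bool l1_enabled_vpid,
				      bool l2_has_vpid02)
{
	/* Irrelevant to L1's expectations; only matters for L0's own flushes. */
	(void)l2_has_vpid02;

	/* Shadow paging in L0, and L1 expects VM-Enter/VM-Exit to flush. */
	return !enable_ept && !l1_enabled_vpid;
}
```

With enable_ept=1 the predicate is false regardless of L1's EPT setting, matching both EPT bullets in the comment.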

Paolo


* Re: [PATCH v3 34/37] KVM: nVMX: Don't flush TLB on nested VMX transition
  2020-03-20 21:28 ` [PATCH v3 34/37] KVM: nVMX: Don't flush TLB on nested VMX transition Sean Christopherson
@ 2020-03-24 11:20   ` Paolo Bonzini
  2020-03-24 18:10     ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2020-03-24 11:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On 20/03/20 22:28, Sean Christopherson wrote:
> Unconditionally skip the TLB flush triggered when reusing a root for a
> nested transition as nested_vmx_transition_tlb_flush() ensures the TLB
> is flushed when needed, regardless of whether the MMU can reuse a cached
> root (or the last root).
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

So much for my WARN_ON. :)

Paolo

> ---
>  arch/x86/kvm/mmu/mmu.c    | 2 +-
>  arch/x86/kvm/vmx/nested.c | 6 ++++--
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 84e1e748c2b3..7b0fb7f2c24d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5038,7 +5038,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
>  		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
>  						   execonly, level);
>  
> -	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, false, true);
> +	__kvm_mmu_new_cr3(vcpu, new_eptp, new_role.base, true, true);
>  
>  	if (new_role.as_u64 == context->mmu_role.as_u64)
>  		return;
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index db3ce8f297c2..92aab4166498 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1161,10 +1161,12 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>  	}
>  
>  	/*
> -	 * See nested_vmx_transition_mmu_sync for details on skipping the MMU sync.
> +	 * Unconditionally skip the TLB flush on fast CR3 switch, all TLB
> +	 * flushes are handled by nested_vmx_transition_tlb_flush().  See
> +	 * nested_vmx_transition_mmu_sync for details on skipping the MMU sync.
>  	 */
>  	if (!nested_ept)
> -		kvm_mmu_new_cr3(vcpu, cr3, false,
> +		kvm_mmu_new_cr3(vcpu, cr3, true,
>  				!nested_vmx_transition_mmu_sync(vcpu));
>  
>  	vcpu->arch.cr3 = cr3;
> 


* Re: [PATCH v3 34/37] KVM: nVMX: Don't flush TLB on nested VMX transition
  2020-03-24 11:20   ` Paolo Bonzini
@ 2020-03-24 18:10     ` Sean Christopherson
  0 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-24 18:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Tue, Mar 24, 2020 at 12:20:31PM +0100, Paolo Bonzini wrote:
> On 20/03/20 22:28, Sean Christopherson wrote:
> > Unconditionally skip the TLB flush triggered when reusing a root for a
> > nested transition as nested_vmx_transition_tlb_flush() ensures the TLB
> > is flushed when needed, regardless of whether the MMU can reuse a cached
> > root (or the last root).
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> So much for my WARN_ON. :)

Ha, yeah.  The double boolean also makes me nervous, but since there are
only two options, it seemed cleaner overall than a single mask-based param,
à la EMULTYPE.
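
For comparison, a hypothetical mask-based variant in the style of the EMULTYPE_* flags. The flag names, the result struct and the function are invented for illustration; the sketch only shows how named bits make the two overrides self-describing at call sites.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flag names: one mask parameter instead of two adjacent
 * bools, so a call site reads NEW_CR3_SKIP_TLB_FLUSH rather than
 * "true, false" and the two meanings cannot be silently swapped.
 */
#define NEW_CR3_SKIP_TLB_FLUSH (1u << 0)
#define NEW_CR3_SKIP_MMU_SYNC  (1u << 1)

struct cr3_switch_result {
	bool flushed_tlb;
	bool synced_mmu;
};

static struct cr3_switch_result fast_cr3_switch_model(unsigned int flags)
{
	struct cr3_switch_result r = {
		.flushed_tlb = !(flags & NEW_CR3_SKIP_TLB_FLUSH),
		.synced_mmu  = !(flags & NEW_CR3_SKIP_MMU_SYNC),
	};

	return r;
}
```

The trade-off is the one mentioned above: with only two options, two bools keep the signature simple, while a mask scales better and reads better at call sites.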

* Re: [PATCH v3 08/37] KVM: VMX: Skip global INVVPID fallback if vpid==0 in vpid_sync_context()
  2020-03-20 21:28 ` [PATCH v3 08/37] KVM: VMX: Skip global INVVPID fallback if vpid==0 in vpid_sync_context() Sean Christopherson
@ 2020-03-25  9:33   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-25  9:33 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Skip the global INVVPID in the unlikely scenario that vpid==0 and the
> SINGLE_CONTEXT variant of INVVPID is unsupported.  If vpid==0, there's
> no need to INVVPID as it's impossible to do VM-Enter with VPID enabled
> and vmcs.VPID==0, i.e. there can't be any TLB entries for the vCPU with
> vpid==0.  The fact that the SINGLE_CONTEXT variant isn't supported is
> irrelevant.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/ops.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/ops.h b/arch/x86/kvm/vmx/ops.h
> index 45eaedee2ac0..33645a8e5463 100644
> --- a/arch/x86/kvm/vmx/ops.h
> +++ b/arch/x86/kvm/vmx/ops.h
> @@ -285,7 +285,7 @@ static inline void vpid_sync_context(int vpid)
>  {
>  	if (cpu_has_vmx_invvpid_single())
>  		vpid_sync_vcpu_single(vpid);
> -	else
> +	else if (vpid != 0)
>  		vpid_sync_vcpu_global();
>  }

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

(personally, I also prefer 'vpid != 0' to '!vpid'; however, nested.c
uses expressions like '&& !vmcs12->virtual_processor_id' instead...)
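
A self-contained model of the patched vpid_sync_context() logic, with counters standing in for the INVVPID instructions. Names carry a _model suffix because they are stand-ins; the sketch assumes, as the single-context helper did at the time, that it returns early for vpid == 0.

```c
#include <assert.h>
#include <stdbool.h>

static bool has_invvpid_single; /* stand-in for cpu_has_vmx_invvpid_single() */
static int single_ctx_flushes;  /* INVVPID(SINGLE_CONTEXT) executed */
static int global_flushes;      /* INVVPID(ALL_CONTEXT) executed */

static void vpid_sync_vcpu_single_model(int vpid)
{
	/* Assumed to mirror the kernel helper: nothing to flush for VPID 0. */
	if (vpid == 0)
		return;
	single_ctx_flushes++;
}

static void vpid_sync_vcpu_global_model(void)
{
	global_flushes++;
}

/* Patched logic: no global fallback when vpid == 0. */
static void vpid_sync_context_model(int vpid)
{
	if (has_invvpid_single)
		vpid_sync_vcpu_single_model(vpid);
	else if (vpid != 0)
		vpid_sync_vcpu_global_model();
}
```

With the change, vpid == 0 is a no-op on both paths, so the unsupported-SINGLE_CONTEXT case no longer pays for a global flush.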

-- 
Vitaly


* Re: [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
  2020-03-20 21:28 ` [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook Sean Christopherson
@ 2020-03-25 10:23   ` Vitaly Kuznetsov
  2020-03-25 15:41     ` Paolo Bonzini
  2020-03-25 15:48     ` Sean Christopherson
  0 siblings, 2 replies; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-25 10:23 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Add a dedicated hook to handle flushing TLB entries on behalf of the
> guest, i.e. for a paravirtualized TLB flush, and use it directly instead
> of bouncing through kvm_vcpu_flush_tlb().
>
> For VMX, change the effective implementation to never do
> INVEPT and flush only the current context, i.e. to always flush via
> INVVPID(SINGLE_CONTEXT).  The INVEPT performed by __vmx_flush_tlb() when
> @invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only
> flush guest-physical mappings; linear and combined mappings are flushed
> by VM-Enter when VPID is disabled, and changes in the guest page tables
> do not affect guest-physical mappings.
>
> When EPT and VPID are enabled, doing INVVPID is not required (by Intel's
> architecture) to invalidate guest-physical mappings, i.e. TLB entries
> that cache guest-physical mappings can live across INVVPID as the
> mappings are associated with an EPTP, not a VPID.  The intent of
> @invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate
> gpa mappings", i.e. do INVEPT and not simply INVVPID.  Other than nested
> VPID handling, which now calls vpid_sync_context() directly, the only
> scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is
> enabled) is if KVM is flushing TLB entries from the guest's perspective,
> i.e. is only required to invalidate linear mappings.
>
> For SVM, flushing TLB entries from the guest's perspective can be done
> by flushing the current ASID, as changes to the guest's page tables are
> associated only with the current ASID.
>
> Adding a dedicated ->tlb_flush_guest() paves the way toward removing
> @invalidate_gpa, which is a potentially dangerous control flag as its
> meaning is not exactly crystal clear, even for those who are familiar
> with the subtleties of what mappings Intel CPUs are/aren't allowed to
> keep across various invalidation scenarios.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  6 ++++++
>  arch/x86/kvm/svm.c              |  6 ++++++
>  arch/x86/kvm/vmx/vmx.c          | 13 +++++++++++++
>  arch/x86/kvm/x86.c              |  2 +-
>  4 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index cdbf822c5c8b..c08f4c0bf4d1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1118,6 +1118,12 @@ struct kvm_x86_ops {
>  	 */
>  	void (*tlb_flush_gva)(struct kvm_vcpu *vcpu, gva_t addr);
>  
> +	/*
> +	 * Flush any TLB entries created by the guest.  Like tlb_flush_gva(),
> +	 * does not need to flush GPA->HPA mappings.
> +	 */
> +	void (*tlb_flush_guest)(struct kvm_vcpu *vcpu);
> +
>  	void (*run)(struct kvm_vcpu *vcpu);
>  	int (*handle_exit)(struct kvm_vcpu *vcpu,
>  		enum exit_fastpath_completion exit_fastpath);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 08568ae9f7a1..396f42753489 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5643,6 +5643,11 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
>  	invlpga(gva, svm->vmcb->control.asid);
>  }
>  
> +static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
> +{
> +	svm_flush_tlb(vcpu, false);
> +}
> +
>  static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
>  {
>  }
> @@ -7400,6 +7405,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  
>  	.tlb_flush = svm_flush_tlb,
>  	.tlb_flush_gva = svm_flush_tlb_gva,
> +	.tlb_flush_guest = svm_flush_tlb_guest,
>  
>  	.run = svm_vcpu_run,
>  	.handle_exit = handle_exit,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ba24bbda2c12..57c1cee58d18 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2862,6 +2862,18 @@ static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
>  	 */
>  }
>  
> +static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * vpid_sync_context() is a nop if vmx->vpid==0, e.g. if enable_vpid==0
> +	 * or a vpid couldn't be allocated for this vCPU.  VM-Enter and VM-Exit
> +	 * are required to flush GVA->{G,H}PA mappings from the TLB if vpid is
> +	 * disabled (VM-Enter with vpid enabled and vpid==0 is disallowed),
> +	 * i.e. no explicit INVVPID is necessary.
> +	 */
> +	vpid_sync_context(to_vmx(vcpu)->vpid);
> +}
> +
>  static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
>  {
>  	ulong cr0_guest_owned_bits = vcpu->arch.cr0_guest_owned_bits;
> @@ -7875,6 +7887,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  
>  	.tlb_flush = vmx_flush_tlb,
>  	.tlb_flush_gva = vmx_flush_tlb_gva,
> +	.tlb_flush_guest = vmx_flush_tlb_guest,
>  
>  	.run = vmx_vcpu_run,
>  	.handle_exit = vmx_handle_exit,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f506248d61a1..0b90ec2c93cf 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2725,7 +2725,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>  	trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
>  		st->preempted & KVM_VCPU_FLUSH_TLB);
>  	if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
> -		kvm_vcpu_flush_tlb(vcpu, false);
> +		kvm_x86_ops->tlb_flush_guest(vcpu);
>  
>  	vcpu->arch.st.preempted = 0;

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
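
For readers less familiar with the KVM PV TLB flush protocol, here is a hedged model of the record_steal_time() hunk above: the guest sets a flag in shared memory, and the host consumes all flags with one atomic exchange before deciding whether to call the new hook. The flag values are assumed to mirror KVM_VCPU_PREEMPTED and KVM_VCPU_FLUSH_TLB from the KVM uapi headers; everything else is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed to match KVM_VCPU_PREEMPTED / KVM_VCPU_FLUSH_TLB in the uapi header. */
#define VCPU_PREEMPTED_MODEL (1u << 0)
#define VCPU_FLUSH_TLB_MODEL (1u << 1)

/* Hypothetical slice of the shared steal-time page. */
struct steal_time_model {
	_Atomic uint8_t preempted;
};

static int guest_flushes;

static void flush_tlb_guest_model(void)
{
	guest_flushes++; /* e.g. INVVPID(SINGLE_CONTEXT) on VMX */
}

/* Hypothetical ops table, mirroring the new ->tlb_flush_guest() hook. */
struct x86_ops_model {
	void (*tlb_flush_guest)(void);
};

static const struct x86_ops_model ops_model = {
	.tlb_flush_guest = flush_tlb_guest_model,
};

static void record_steal_time_model(struct steal_time_model *st)
{
	/*
	 * Consume all flags in one atomic exchange: a flush request the guest
	 * sets after this point stays pending for the next call.
	 */
	if (atomic_exchange(&st->preempted, 0) & VCPU_FLUSH_TLB_MODEL)
		ops_model.tlb_flush_guest();
}
```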

I *think* I've commented on the previous version that we also have
hyperv-style PV TLB flush and this will likely need to be switched to
tlb_flush_guest(). What do you think about the following (very lightly
tested)?

commit 485b4a579605597b9897b3d9ec118e0f7f1138ad
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Wed Mar 25 11:14:25 2020 +0100

    KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()
    
    Hyper-V PV TLB flush mechanism does TLB flush on behalf of the guest
    so doing tlb_flush_all() is overkill; switch to using tlb_flush_guest()
    (just like KVM PV TLB flush mechanism) instead. Introduce
    KVM_REQ_HV_TLB_FLUSH to support the change.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 167729624149..8c5659ed211b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -84,6 +84,7 @@
 #define KVM_REQ_APICV_UPDATE \
 	KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_TLB_FLUSH_CURRENT	KVM_ARCH_REQ(26)
+#define KVM_REQ_HV_TLB_FLUSH		KVM_ARCH_REQ(27)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index a86fda7a1d03..0d051ed11f38 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1425,8 +1425,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *current_vcpu, u64 ingpa,
 	 * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
 	 * analyze it here, flush TLB regardless of the specified address space.
 	 */
-	kvm_make_vcpus_request_mask(kvm,
-				    KVM_REQ_TLB_FLUSH | KVM_REQUEST_NO_WAKEUP,
+	kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH,
 				    vcpu_mask, &hv_vcpu->tlb_flush);
 
 ret_success:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 210af343eebf..5096a9b1a04e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2702,6 +2702,12 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->tlb_flush_all(vcpu);
 }
 
+static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
+{
+	++vcpu->stat.tlb_flush;
+	kvm_x86_ops->tlb_flush_guest(vcpu);
+}
+
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
 	struct kvm_host_map map;
@@ -2725,7 +2731,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 	trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
 		st->preempted & KVM_VCPU_FLUSH_TLB);
 	if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
-		kvm_x86_ops->tlb_flush_guest(vcpu);
+		kvm_vcpu_flush_tlb_guest(vcpu);
 
 	vcpu->arch.st.preempted = 0;
 
@@ -8219,7 +8225,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		}
 		if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
 			kvm_vcpu_flush_tlb_current(vcpu);
-
+		if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
+			kvm_vcpu_flush_tlb_guest(vcpu);
 		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
 			vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
 			r = 0;

-- 
Vitaly

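The request plumbing in the patch above can be pictured with a small userspace model. This is a sketch, not KVM code. The toy_* names, the bit numbers and the counters are hypothetical, and the service order (full flush, then current-context flush, then the new Hyper-V guest flush) is assumed from the hunks shown. The model illustrates two points. First, a request bit is consumed by a test-and-clear, so each pending request triggers exactly one flush. Second, routing the flush through a wrapper such as kvm_vcpu_flush_tlb_guest() keeps vcpu->stat.tlb_flush accounted, which a direct kvm_x86_ops->tlb_flush_guest() call in record_steal_time() would skip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request numbers; real KVM values differ. */
enum toy_req {
	TOY_REQ_TLB_FLUSH,         /* plays the role of KVM_REQ_TLB_FLUSH */
	TOY_REQ_TLB_FLUSH_CURRENT, /* plays the role of KVM_REQ_TLB_FLUSH_CURRENT */
	TOY_REQ_HV_TLB_FLUSH,      /* plays the role of KVM_REQ_HV_TLB_FLUSH */
};

struct toy_vcpu {
	uint64_t requests;            /* pending request bits */
	unsigned long stat_tlb_flush; /* analogous to vcpu->stat.tlb_flush */
	unsigned int nr_all, nr_current, nr_guest; /* which hook ran */
};

static void toy_make_request(struct toy_vcpu *v, enum toy_req r)
{
	assert((unsigned int)r < 64);
	v->requests |= UINT64_C(1) << r;
}

/* Test-and-clear, like kvm_check_request(). */
static bool toy_check_request(struct toy_vcpu *v, enum toy_req r)
{
	uint64_t bit = UINT64_C(1) << r;

	if (!(v->requests & bit))
		return false;
	v->requests &= ~bit;
	return true;
}

/* Each wrapper bumps the stat, then "calls" the vendor hook. */
static void toy_flush_tlb_all(struct toy_vcpu *v)
{
	v->stat_tlb_flush++;
	v->nr_all++;
}

static void toy_flush_tlb_current(struct toy_vcpu *v)
{
	v->stat_tlb_flush++;
	v->nr_current++;
}

/* Counterpart of the new kvm_vcpu_flush_tlb_guest() wrapper. */
static void toy_flush_tlb_guest(struct toy_vcpu *v)
{
	v->stat_tlb_flush++;
	v->nr_guest++;
}

/* Service pending flush requests before "entering the guest". */
static void toy_service_flush_requests(struct toy_vcpu *v)
{
	if (toy_check_request(v, TOY_REQ_TLB_FLUSH))
		toy_flush_tlb_all(v);
	if (toy_check_request(v, TOY_REQ_TLB_FLUSH_CURRENT))
		toy_flush_tlb_current(v);
	if (toy_check_request(v, TOY_REQ_HV_TLB_FLUSH))
		toy_flush_tlb_guest(v);
}
```

Servicing twice with a single HV request pending performs only one guest flush, because the first pass clears the bit.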


* Re: [PATCH v3 16/37] KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush()
  2020-03-20 21:28 ` [PATCH v3 16/37] KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush() Sean Christopherson
@ 2020-03-25 11:23   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-25 11:23 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Drop @invalidate_gpa from ->tlb_flush() and kvm_vcpu_flush_tlb() now
> that all callers pass %true for said param, or ignore the param (SVM has
> an internal call to svm_flush_tlb() in svm_flush_tlb_guest that somewhat
> arbitrarily passes %false).
>
> Remove __vmx_flush_tlb() as it is no longer used.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/mmu/mmu.c          |  2 +-
>  arch/x86/kvm/svm.c              | 10 ++++----
>  arch/x86/kvm/vmx/vmx.c          |  4 ++--
>  arch/x86/kvm/vmx/vmx.h          | 42 ++++++++++-----------------------
>  arch/x86/kvm/x86.c              |  6 ++---
>  6 files changed, 24 insertions(+), 42 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c08f4c0bf4d1..a5dfab4642d6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1105,7 +1105,7 @@ struct kvm_x86_ops {
>  	unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
>  	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
>  
> -	void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
> +	void (*tlb_flush)(struct kvm_vcpu *vcpu);
>  	int  (*tlb_remote_flush)(struct kvm *kvm);
>  	int  (*tlb_remote_flush_with_range)(struct kvm *kvm,
>  			struct kvm_tlb_range *range);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5ae620881bbc..a87b8f9f3b1f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5177,7 +5177,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  	if (r)
>  		goto out;
>  	kvm_mmu_load_pgd(vcpu);
> -	kvm_x86_ops->tlb_flush(vcpu, true);
> +	kvm_x86_ops->tlb_flush(vcpu);
>  out:
>  	return r;
>  }
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 396f42753489..62fa45dcb6a4 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -385,7 +385,7 @@ module_param(dump_invalid_vmcb, bool, 0644);
>  static u8 rsm_ins_bytes[] = "\x0f\xaa";
>  
>  static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
> -static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
> +static void svm_flush_tlb(struct kvm_vcpu *vcpu);
>  static void svm_complete_interrupts(struct vcpu_svm *svm);
>  static void svm_toggle_avic_for_irq_window(struct kvm_vcpu *vcpu, bool activate);
>  static inline void avic_post_state_restore(struct kvm_vcpu *vcpu);
> @@ -2692,7 +2692,7 @@ static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  		return 1;
>  
>  	if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
> -		svm_flush_tlb(vcpu, true);
> +		svm_flush_tlb(vcpu);
>  
>  	vcpu->arch.cr4 = cr4;
>  	if (!npt_enabled)
> @@ -3630,7 +3630,7 @@ static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
>  	svm->nested.intercept_exceptions = nested_vmcb->control.intercept_exceptions;
>  	svm->nested.intercept            = nested_vmcb->control.intercept;
>  
> -	svm_flush_tlb(&svm->vcpu, true);
> +	svm_flush_tlb(&svm->vcpu);
>  	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
>  	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
>  		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
> @@ -5626,7 +5626,7 @@ static int svm_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
>  	return 0;
>  }
>  
> -static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
> +static void svm_flush_tlb(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> @@ -5645,7 +5645,7 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
>  
>  static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
>  {
> -	svm_flush_tlb(vcpu, false);
> +	svm_flush_tlb(vcpu);
>  }
>  
>  static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 43c0d4706f9a..477bdbc52ed0 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6079,7 +6079,7 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
>  		if (flexpriority_enabled) {
>  			sec_exec_control |=
>  				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> -			vmx_flush_tlb(vcpu, true);
> +			vmx_flush_tlb(vcpu);
>  		}
>  		break;
>  	case LAPIC_MODE_X2APIC:
> @@ -6097,7 +6097,7 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
>  {
>  	if (!is_guest_mode(vcpu)) {
>  		vmcs_write64(APIC_ACCESS_ADDR, hpa);
> -		vmx_flush_tlb(vcpu, true);
> +		vmx_flush_tlb(vcpu);
>  	}
>  }
>  
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 3770ae111e6a..bab5d62ad964 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -503,46 +503,28 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
>  
>  u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa);
>  
> -static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
> -				bool invalidate_gpa)
> -{
> -	if (enable_ept && (invalidate_gpa || !enable_vpid)) {
> -		if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
> -			return;
> -		ept_sync_context(construct_eptp(vcpu,
> -						vcpu->arch.mmu->root_hpa));
> -	} else {
> -		vpid_sync_context(vpid);
> -	}
> -}
> -
> -static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
> +static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
>  	/*
> -	 * Flush all EPTP/VPID contexts if the TLB flush _may_ have been
> -	 * invoked via kvm_flush_remote_tlbs(), which always passes %true for
> -	 * @invalidate_gpa.  Flushing remote TLBs requires all contexts to be
> -	 * flushed, not just the active context.
> +	 * Flush all EPTP/VPID contexts, as the TLB flush _may_ have been
> +	 * invoked via kvm_flush_remote_tlbs().  Flushing remote TLBs requires
> +	 * all contexts to be flushed, not just the active context.
>  	 *
>  	 * Note, this also ensures a deferred TLB flush with VPID enabled and
>  	 * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
>  	 * L2's VPIDs.
>  	 */
> -	if (invalidate_gpa) {
> -		if (enable_ept) {
> -			ept_sync_global();
> -		} else if (enable_vpid) {
> -			if (cpu_has_vmx_invvpid_global()) {
> -				vpid_sync_vcpu_global();
> -			} else {
> -				vpid_sync_vcpu_single(vmx->vpid);
> -				vpid_sync_vcpu_single(vmx->nested.vpid02);
> -			}
> +	if (enable_ept) {
> +		ept_sync_global();
> +	} else if (enable_vpid) {
> +		if (cpu_has_vmx_invvpid_global()) {
> +			vpid_sync_vcpu_global();
> +		} else {
> +			vpid_sync_vcpu_single(vmx->vpid);
> +			vpid_sync_vcpu_single(vmx->nested.vpid02);
>  		}
> -	} else {
> -		__vmx_flush_tlb(vcpu, vmx->vpid, false);
>  	}
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0b90ec2c93cf..84cbd7ca1e18 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2696,10 +2696,10 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
>  	vcpu->arch.time = 0;
>  }
>  
> -static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
> +static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.tlb_flush;
> -	kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
> +	kvm_x86_ops->tlb_flush(vcpu);
>  }
>  
>  static void record_steal_time(struct kvm_vcpu *vcpu)
> @@ -8223,7 +8223,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
>  			kvm_mmu_load_pgd(vcpu);
>  		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
> -			kvm_vcpu_flush_tlb(vcpu, true);
> +			kvm_vcpu_flush_tlb(vcpu);
>  		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
>  			vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
>  			r = 0;

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

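You can check the branch structure of the new vmx_flush_tlb() in isolation by returning a description of the invalidation instead of executing INVEPT/INVVPID. This is a minimal sketch. toy_vmx_flush_tlb() and the toy_flush_plan type are hypothetical, and the feature checks are plain booleans. Skipping VPID 0 assumes that vpid_sync_vcpu_single() treats 0 as a no-op. With EPT enabled, one global INVEPT covers every EPTP, both L1's and L2's. Without global INVVPID support, both L1's vpid and L2's vpid02 must be named explicitly.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_flush_kind {
	TOY_FLUSH_NONE,     /* neither EPT nor VPID: the patch does nothing here */
	TOY_INVEPT_GLOBAL,  /* ept_sync_global() */
	TOY_INVVPID_GLOBAL, /* vpid_sync_vcpu_global() */
	TOY_INVVPID_SINGLE, /* vpid_sync_vcpu_single() per VPID */
};

struct toy_flush_plan {
	enum toy_flush_kind kind;
	unsigned short vpids[2]; /* VPIDs targeted by single-context INVVPID */
	int nr_vpids;
};

static void toy_add_vpid(struct toy_flush_plan *p, unsigned short vpid)
{
	assert(p->nr_vpids < 2);
	if (vpid) /* assumption: VPID 0 is skipped, as by vpid_sync_vcpu_single() */
		p->vpids[p->nr_vpids++] = vpid;
}

/* Mirrors the if/else chain of vmx_flush_tlb() after this patch. */
static struct toy_flush_plan toy_vmx_flush_tlb(bool enable_ept, bool enable_vpid,
					       bool has_invvpid_global,
					       unsigned short vpid,
					       unsigned short vpid02)
{
	struct toy_flush_plan p = { .kind = TOY_FLUSH_NONE, .nr_vpids = 0 };

	if (enable_ept) {
		p.kind = TOY_INVEPT_GLOBAL;
	} else if (enable_vpid) {
		if (has_invvpid_global) {
			p.kind = TOY_INVVPID_GLOBAL;
		} else {
			/* Nuke both L1's and L2's VPID, whichever is current. */
			p.kind = TOY_INVVPID_SINGLE;
			toy_add_vpid(&p, vpid);
			toy_add_vpid(&p, vpid02);
		}
	}
	return p;
}
```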


* Re: [PATCH v3 17/37] KVM: SVM: Wire up ->tlb_flush_guest() directly to svm_flush_tlb()
  2020-03-20 21:28 ` [PATCH v3 17/37] KVM: SVM: Wire up ->tlb_flush_guest() directly to svm_flush_tlb() Sean Christopherson
@ 2020-03-25 11:23   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-25 11:23 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Use svm_flush_tlb() directly for kvm_x86_ops->tlb_flush_guest() now that
> the @invalidate_gpa param to ->tlb_flush() is gone, i.e. the wrapper for
> ->tlb_flush_guest() is no longer necessary.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/svm.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 62fa45dcb6a4..dfa3b53f8437 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5643,11 +5643,6 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
>  	invlpga(gva, svm->vmcb->control.asid);
>  }
>  
> -static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
> -{
> -	svm_flush_tlb(vcpu);
> -}
> -
>  static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
>  {
>  }
> @@ -7405,7 +7400,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  
>  	.tlb_flush = svm_flush_tlb,
>  	.tlb_flush_gva = svm_flush_tlb_gva,
> -	.tlb_flush_guest = svm_flush_tlb_guest,
> +	.tlb_flush_guest = svm_flush_tlb,
>  
>  	.run = svm_vcpu_run,
>  	.handle_exit = handle_exit,

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

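Patch 17 relies on a plain C property. Once ->tlb_flush() and ->tlb_flush_guest() share the same prototype, one function can fill both slots of the ops table, so the svm_flush_tlb_guest() forwarding wrapper adds nothing. Below is a minimal sketch with hypothetical toy_* names; it is not the real struct kvm_x86_ops.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vcpu {
	unsigned int flushes; /* how many times the flush hook ran */
};

/* A cut-down vendor ops table; member names mirror kvm_x86_ops. */
struct toy_x86_ops {
	void (*tlb_flush)(struct toy_vcpu *vcpu);
	void (*tlb_flush_guest)(struct toy_vcpu *vcpu);
};

/* Stand-in for svm_flush_tlb(); here it only counts invocations. */
static void toy_svm_flush_tlb(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);
	vcpu->flushes++;
}

/* Same function backs both hooks; no forwarding wrapper is needed. */
static const struct toy_x86_ops toy_svm_ops = {
	.tlb_flush       = toy_svm_flush_tlb,
	.tlb_flush_guest = toy_svm_flush_tlb,
};
```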


* Re: [PATCH v3 18/37] KVM: VMX: Move vmx_flush_tlb() to vmx.c
  2020-03-20 21:28 ` [PATCH v3 18/37] KVM: VMX: Move vmx_flush_tlb() to vmx.c Sean Christopherson
@ 2020-03-25 11:25   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-25 11:25 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Move vmx_flush_tlb() to vmx.c and make it non-inline static now that all
> its callers live in vmx.c.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.h | 25 -------------------------
>  2 files changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 477bdbc52ed0..c6affaaef138 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2849,6 +2849,31 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
>  
>  #endif
>  
> +static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	/*
> +	 * Flush all EPTP/VPID contexts, as the TLB flush _may_ have been
> +	 * invoked via kvm_flush_remote_tlbs().  Flushing remote TLBs requires
> +	 * all contexts to be flushed, not just the active context.
> +	 *
> +	 * Note, this also ensures a deferred TLB flush with VPID enabled and
> +	 * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
> +	 * L2's VPIDs.
> +	 */
> +	if (enable_ept) {
> +		ept_sync_global();
> +	} else if (enable_vpid) {
> +		if (cpu_has_vmx_invvpid_global()) {
> +			vpid_sync_vcpu_global();
> +		} else {
> +			vpid_sync_vcpu_single(vmx->vpid);
> +			vpid_sync_vcpu_single(vmx->nested.vpid02);
> +		}
> +	}
> +}
> +
>  static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
>  {
>  	/*
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index bab5d62ad964..571249e18bb6 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -503,31 +503,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
>  
>  u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa);
>  
> -static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
> -	/*
> -	 * Flush all EPTP/VPID contexts, as the TLB flush _may_ have been
> -	 * invoked via kvm_flush_remote_tlbs().  Flushing remote TLBs requires
> -	 * all contexts to be flushed, not just the active context.
> -	 *
> -	 * Note, this also ensures a deferred TLB flush with VPID enabled and
> -	 * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
> -	 * L2's VPIDs.
> -	 */
> -	if (enable_ept) {
> -		ept_sync_global();
> -	} else if (enable_vpid) {
> -		if (cpu_has_vmx_invvpid_global()) {
> -			vpid_sync_vcpu_global();
> -		} else {
> -			vpid_sync_vcpu_single(vmx->vpid);
> -			vpid_sync_vcpu_single(vmx->nested.vpid02);
> -		}
> -	}
> -}
> -
>  static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
>  {
>  	vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly



* Re: [PATCH v3 19/37] KVM: nVMX: Move nested_get_vpid02() to vmx/nested.h
  2020-03-20 21:28 ` [PATCH v3 19/37] KVM: nVMX: Move nested_get_vpid02() to vmx/nested.h Sean Christopherson
@ 2020-03-25 11:25   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-25 11:25 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Move nested_get_vpid02() to vmx/nested.h so that a future patch can
> reference it from vmx.c to implement context-specific TLB flushing.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 7 -------
>  arch/x86/kvm/vmx/nested.h | 7 +++++++
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 0c71db6fec5a..77819d890088 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1154,13 +1154,6 @@ static bool nested_has_guest_tlb_tag(struct kvm_vcpu *vcpu)
>  	       (nested_cpu_has_vpid(vmcs12) && to_vmx(vcpu)->nested.vpid02);
>  }
>  
> -static u16 nested_get_vpid02(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
> -	return vmx->nested.vpid02 ? vmx->nested.vpid02 : vmx->vpid;
> -}
> -
>  static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
>  {
>  	superset &= mask;
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index 21d36652f213..debc5eeb5757 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -60,6 +60,13 @@ static inline int vmx_has_valid_vmcs12(struct kvm_vcpu *vcpu)
>  		vmx->nested.hv_evmcs;
>  }
>  
> +static inline u16 nested_get_vpid02(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	return vmx->nested.vpid02 ? vmx->nested.vpid02 : vmx->vpid;
> +}
> +
>  static inline unsigned long nested_ept_get_eptp(struct kvm_vcpu *vcpu)
>  {
>  	/* return the page table to be shadowed - in our case, EPT12 */

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

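nested_get_vpid02() picks the VPID that L2 runs with. It returns the dedicated vpid02 if KVM allocated one, and otherwise falls back to L1's own vpid. The sketch below shows that selection on a cut-down structure; toy_vmx and toy_get_vpid02() are hypothetical. Because of the fallback, when no vpid02 exists, L1 and L2 share a VPID. A context-specific flush of "L2's" VPID then also drops L1's entries.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the VPID fields of struct vcpu_vmx. */
struct toy_vmx {
	unsigned short vpid;   /* VPID used for L1 */
	unsigned short vpid02; /* VPID reserved for L2, or 0 if none */
};

/* Same selection as nested_get_vpid02(): prefer vpid02, else L1's VPID. */
static inline unsigned short toy_get_vpid02(const struct toy_vmx *vmx)
{
	assert(vmx != NULL);
	return vmx->vpid02 ? vmx->vpid02 : vmx->vpid;
}
```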


* Re: [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
  2020-03-25 10:23   ` Vitaly Kuznetsov
@ 2020-03-25 15:41     ` Paolo Bonzini
  2020-03-25 16:08       ` Vitaly Kuznetsov
  2020-03-25 15:48     ` Sean Christopherson
  1 sibling, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2020-03-25 15:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Sean Christopherson
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

On 25/03/20 11:23, Vitaly Kuznetsov wrote:
> What do you think about the following (very lightly
> tested)?
> 
> commit 485b4a579605597b9897b3d9ec118e0f7f1138ad
> Author: Vitaly Kuznetsov <vkuznets@redhat.com>
> Date:   Wed Mar 25 11:14:25 2020 +0100
> 
>     KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()
>     
>     Hyper-V PV TLB flush mechanism does TLB flush on behalf of the guest
>     so doing tlb_flush_all() is an overkill, switch to using tlb_flush_guest()
>     (just like KVM PV TLB flush mechanism) instead. Introduce
>     KVM_REQ_HV_TLB_FLUSH to support the change.
>     
>     Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 167729624149..8c5659ed211b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -84,6 +84,7 @@
>  #define KVM_REQ_APICV_UPDATE \
>  	KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_TLB_FLUSH_CURRENT	KVM_ARCH_REQ(26)
> +#define KVM_REQ_HV_TLB_FLUSH		KVM_ARCH_REQ(27)
>  
>  #define CR0_RESERVED_BITS                                               \
>  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index a86fda7a1d03..0d051ed11f38 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1425,8 +1425,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *current_vcpu, u64 ingpa,
>  	 * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
>  	 * analyze it here, flush TLB regardless of the specified address space.
>  	 */
> -	kvm_make_vcpus_request_mask(kvm,
> -				    KVM_REQ_TLB_FLUSH | KVM_REQUEST_NO_WAKEUP,
> +	kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH,
>  				    vcpu_mask, &hv_vcpu->tlb_flush);
>  

Looks good, but why are you dropping KVM_REQUEST_NO_WAKEUP?

Paolo



* Re: [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
  2020-03-25 10:23   ` Vitaly Kuznetsov
  2020-03-25 15:41     ` Paolo Bonzini
@ 2020-03-25 15:48     ` Sean Christopherson
  2020-03-25 16:11       ` Vitaly Kuznetsov
  1 sibling, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2020-03-25 15:48 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Wed, Mar 25, 2020 at 11:23:41AM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> I *think* I've commented on the previous version that we also have
> hyperv-style PV TLB flush and this will likely need to be switched to
> tlb_flush_guest().

Oh, you most definitely commented about HyperV's PV TLB flush, looking at
that code is what led me down this rabbit hole :-)


* Re: [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
  2020-03-25 15:41     ` Paolo Bonzini
@ 2020-03-25 16:08       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-25 16:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Ben Gardon, Junaid Shahid, Liran Alon, Boris Ostrovsky,
	John Haxby, Miaohe Lin, Tom Lendacky

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 25/03/20 11:23, Vitaly Kuznetsov wrote:
>> What do you think about the following (very lightly
>> tested)?
>> 
>> commit 485b4a579605597b9897b3d9ec118e0f7f1138ad
>> Author: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Date:   Wed Mar 25 11:14:25 2020 +0100
>> 
>>     KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()
>>     
>>     Hyper-V PV TLB flush mechanism does TLB flush on behalf of the guest
>>     so doing tlb_flush_all() is an overkill, switch to using tlb_flush_guest()
>>     (just like KVM PV TLB flush mechanism) instead. Introduce
>>     KVM_REQ_HV_TLB_FLUSH to support the change.
>>     
>>     Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 167729624149..8c5659ed211b 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -84,6 +84,7 @@
>>  #define KVM_REQ_APICV_UPDATE \
>>  	KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>  #define KVM_REQ_TLB_FLUSH_CURRENT	KVM_ARCH_REQ(26)
>> +#define KVM_REQ_HV_TLB_FLUSH		KVM_ARCH_REQ(27)
>>  
>>  #define CR0_RESERVED_BITS                                               \
>>  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index a86fda7a1d03..0d051ed11f38 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -1425,8 +1425,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *current_vcpu, u64 ingpa,
>>  	 * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
>>  	 * analyze it here, flush TLB regardless of the specified address space.
>>  	 */
>> -	kvm_make_vcpus_request_mask(kvm,
>> -				    KVM_REQ_TLB_FLUSH | KVM_REQUEST_NO_WAKEUP,
>> +	kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH,
>>  				    vcpu_mask, &hv_vcpu->tlb_flush);
>>  
>
> Looks good, but why are you dropping KVM_REQUEST_NO_WAKEUP?

My bad, KVM_REQUEST_NO_WAKEUP needs to stay.

-- 
Vitaly

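Paolo's question turns on how request values carry flags. A hedged reading of the generic code is as follows. The low bits of a request value select the bit set in vcpu->requests. Higher bits such as KVM_REQUEST_NO_WAKEUP only influence how the target vCPU is notified. Dropping the flag therefore does not change what gets flushed. It changes only whether a halted vCPU is woken just to perform a flush it would otherwise do before its next VM-Enter. The sketch below models this under stated assumptions. The mask and flag values, the toy_* names and the simplified notify logic are illustrative, not KVM's exact implementation. The notify logic wakes a halted vCPU unless NO_WAKEUP is set, and sends an IPI to a vCPU running in guest mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low 8 bits = request number, upper bits = flags. */
#define TOY_REQUEST_MASK      0xffu
#define TOY_REQUEST_NO_WAKEUP (1u << 8)

struct toy_vcpu {
	uint64_t requests; /* pending request bits */
	bool halted;       /* blocked, e.g. after HLT */
	bool in_guest;     /* currently running guest code */
	unsigned int wakeups;
	unsigned int ipis;
};

static void toy_make_vcpu_request(struct toy_vcpu *v, unsigned int req)
{
	unsigned int nr = req & TOY_REQUEST_MASK;

	assert(nr < 64);
	v->requests |= UINT64_C(1) << nr; /* flag bits never reach ->requests */

	if (v->halted) {
		if (!(req & TOY_REQUEST_NO_WAKEUP)) {
			v->halted = false; /* woken only to service the request */
			v->wakeups++;
		}
		/* else: the request waits until the vCPU runs again */
	} else if (v->in_guest) {
		v->ipis++; /* force an exit so the request is noticed */
	}
}
```

Request number 27 in the usage below is arbitrary. It ends up as the same pending bit with or without the flag.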


* Re: [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
  2020-03-25 15:48     ` Sean Christopherson
@ 2020-03-25 16:11       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 83+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-25 16:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Wed, Mar 25, 2020 at 11:23:41AM +0100, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> I *think* I've commented on the previous version that we also have
>> hyperv-style PV TLB flush and this will likely need to be switched to
>> tlb_flush_guest().
>
> Oh, you most definitely commented about HyperV's PV TLB flush, looking at
> that code is what led me down this rabbit hole :-)

Ah, I was just worried it was Groundhog Day all over again :-) And I
didn't see you touching hyperv.c.

-- 
Vitaly



* Re: [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1
  2020-03-24  0:12             ` Jim Mattson
@ 2020-03-30 18:38               ` Sean Christopherson
  0 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2020-03-30 18:38 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Mon, Mar 23, 2020 at 05:12:04PM -0700, Jim Mattson wrote:
> On Mon, Mar 23, 2020 at 4:51 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 23/03/20 17:44, Sean Christopherson wrote:
> > > So I think
> > >
> > >   Fixes: 14c07ad89f4d ("x86/kvm/mmu: introduce guest_mmu")
> > >
> > > would be appropriate?
> > >
> >
> > Yes.
> 
> I think it was actually commit efebf0aaec3d ("KVM: nVMX: Do not flush
> TLB on L1<->L2 transitions if L1 uses VPID and EPT").

Hmm, commit efebf0aaec3d only changed flushing behavior; it didn't
affect KVM's behavior with respect to refreshing unsync'd SPTEs, i.e.
reloading guest_mmu.

It's somewhat of a moot point, because _technically_ there is no bug since,
at the time of this fix, KVM always flushes and reloads on nested VM-Enter.


* Re: [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush
  2020-03-20 21:27 ` [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush Sean Christopherson
@ 2021-08-03  1:45   ` Lai Jiangshan
  2021-08-03 15:39     ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Lai Jiangshan @ 2021-08-03  1:45 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, LKML

(I'm replying to a very old email, so many CCs are dropped.)

On Sat, Mar 21, 2020 at 5:33 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
> a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
> flushes require all contexts to be invalidated, not just the active
> contexts, e.g. all mappings in all contexts for a given HVA need to be
> invalidated on a mmu_notifier invalidation.  Similarly, the instigator
> of the deferred TLB flush may be expecting all contexts to be flushed,
> e.g. vmx_vcpu_load_vmcs().
>
> Without nested VMX, flushing only the current EPTP/VPID context isn't
> problematic because KVM uses a constant VPID for each vCPU, and

Hello, Sean

Is this patch optimized for the case where nested VMX is active?
I think the non-nested case is the common one.

Although the related code has changed since, the logic of this patch
still applies today. Would it be better to restore the optimization
for the common (non-nested) case?

Thanks
Lai

> mmu_alloc_direct_roots() all but guarantees KVM will use a single EPTP
> for L1.  In the rare case where a different EPTP is created or reused,
> KVM (currently) unconditionally flushes the new EPTP context prior to
> entering the guest.
>
> With nested VMX, KVM conditionally uses a different VPID for L2, and
> unconditionally uses a different EPTP for L2.  Because KVM doesn't
> _intentionally_ guarantee L2's EPTP/VPID context is flushed on nested
> VM-Enter, it'd be possible for a malicious L1 to attack the host and/or
> different VMs by exploiting the lack of flushing for L2.
>
>   1) Launch nested guest from malicious L1.
>
>   2) Nested VM-Enter to L2.
>
>   3) Access target GPA 'g'.  CPU inserts TLB entry tagged with L2's ASID
>      mapping 'g' to host PFN 'x'.
>
>   4) Nested VM-Exit to L1.
>
>   5) L1 triggers kernel same-page merging (ksm) by duplicating/zeroing
>      the page for PFN 'x'.
>
>   6) Host kernel merges PFN 'x' with PFN 'y', i.e. unmaps PFN 'x' and
>      remaps the page to PFN 'y'.  mmu_notifier sends invalidate command,
>      KVM flushes TLB only for L1's ASID.
>
>   7) Host kernel reallocates PFN 'x' to some other task/guest.
>
>   8) Nested VM-Enter to L2.  KVM does not invalidate L2's EPTP or VPID.
>
>   9) L2 accesses GPA 'g' and gains read/write access to PFN 'x' via its
>      stale TLB entry.
>
> However, current KVM unconditionally flushes L1's EPTP/VPID context on
> nested VM-Exit.  That behavior is mostly unintentional: KVM doesn't go
> out of its way to flush EPTP/VPID on nested VM-Enter/VM-Exit; rather,
> a TLB flush is guaranteed to occur prior to re-entering L1 due to
> __kvm_mmu_new_cr3() always being called with skip_tlb_flush=false.  On
> nested VM-Enter, this happens via kvm_init_shadow_ept_mmu() (nested EPT
> enabled) or in nested_vmx_load_cr3() (nested EPT disabled).  On nested
> VM-Exit it occurs via nested_vmx_load_cr3().
>
> This also fixes a bug where a deferred TLB flush in the context of L2,
> with EPT disabled, would flush L1's VPID instead of L2's VPID, as
> vmx_flush_tlb() flushes L1's VPID regardless of is_guest_mode().
>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Ben Gardon <bgardon@google.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Junaid Shahid <junaids@google.com>
> Cc: Liran Alon <liran.alon@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: John Haxby <john.haxby@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Fixes: efebf0aaec3d ("KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.h | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index be93d597306c..d6d67b816ebe 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -518,7 +518,33 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
>
>  static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
>  {
> -       __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid, invalidate_gpa);
> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +       /*
> +        * Flush all EPTP/VPID contexts if the TLB flush _may_ have been
> +        * invoked via kvm_flush_remote_tlbs(), which always passes %true for
> +        * @invalidate_gpa.  Flushing remote TLBs requires all contexts to be
> +        * flushed, not just the active context.
> +        *
> +        * Note, this also ensures a deferred TLB flush with VPID enabled and
> +        * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
> +        * L2's VPIDs.
> +        */
> +       if (invalidate_gpa) {
> +               if (enable_ept) {
> +                       ept_sync_global();
> +               } else if (enable_vpid) {
> +                       if (cpu_has_vmx_invvpid_global()) {
> +                               vpid_sync_vcpu_global();
> +                       } else {
> +                               WARN_ON_ONCE(!cpu_has_vmx_invvpid_single());
> +                               vpid_sync_vcpu_single(vmx->vpid);
> +                               vpid_sync_vcpu_single(vmx->nested.vpid02);
> +                       }
> +               }
> +       } else {
> +               __vmx_flush_tlb(vcpu, vmx->vpid, false);
> +       }
>  }
>
>  static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
> --
> 2.24.1
>
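
The decision logic in the patched vmx_flush_tlb() can be exercised in isolation. The following is a minimal userspace sketch, not kernel code. The enum, struct, and model_* names are hypothetical. Plain booleans stand in for enable_ept, enable_vpid, cpu_has_vmx_invvpid_global() and cpu_has_vmx_invvpid_single(), and the WARN_ON_ONCE() is modeled as an assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags naming the invalidations the patched vmx_flush_tlb() issues. */
enum model_flush_op {
	MODEL_FLUSH_NONE          = 0,
	MODEL_FLUSH_EPT_GLOBAL    = 1 << 0, /* ept_sync_global() */
	MODEL_FLUSH_VPID_GLOBAL   = 1 << 1, /* vpid_sync_vcpu_global() */
	MODEL_FLUSH_VPID01_SINGLE = 1 << 2, /* vpid_sync_vcpu_single(vmx->vpid) */
	MODEL_FLUSH_VPID02_SINGLE = 1 << 3, /* vpid_sync_vcpu_single(vmx->nested.vpid02) */
	MODEL_FLUSH_CURRENT_VPID  = 1 << 4, /* __vmx_flush_tlb(vcpu, vmx->vpid, false) */
};

/* Booleans standing in for the module params and CPU capability checks. */
struct model_host_caps {
	bool enable_ept;
	bool enable_vpid;
	bool invvpid_global;
	bool invvpid_single;
};

static unsigned int model_vmx_flush_tlb(const struct model_host_caps *caps,
					bool invalidate_gpa)
{
	/* Cannot be a remote flush: only the active context is targeted. */
	if (!invalidate_gpa)
		return MODEL_FLUSH_CURRENT_VPID;

	/* May be a remote flush: every EPTP/VPID context must be invalidated. */
	if (caps->enable_ept)
		return MODEL_FLUSH_EPT_GLOBAL;

	if (caps->enable_vpid) {
		if (caps->invvpid_global)
			return MODEL_FLUSH_VPID_GLOBAL;
		/* Models WARN_ON_ONCE(!cpu_has_vmx_invvpid_single()). */
		assert(caps->invvpid_single);
		return MODEL_FLUSH_VPID01_SINGLE | MODEL_FLUSH_VPID02_SINGLE;
	}

	/* Neither EPT nor VPID: the patched code issues no explicit invalidation. */
	return MODEL_FLUSH_NONE;
}
```

The !invalidate_gpa path behaves exactly as before the patch, so only callers that pass %true pay for the broader flush.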

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush
  2021-08-03  1:45   ` Lai Jiangshan
@ 2021-08-03 15:39     ` Sean Christopherson
  2021-08-04  3:11       ` Lai Jiangshan
  0 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2021-08-03 15:39 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Paolo Bonzini, kvm, LKML

On Tue, Aug 03, 2021, Lai Jiangshan wrote:
> (I'm replying to a very old email, so many CCs are dropped.)
> 
> On Sat, Mar 21, 2020 at 5:33 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
> > a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
> > flushes require all contexts to be invalidated, not just the active
> > contexts, e.g. all mappings in all contexts for a given HVA need to be
> > invalidated on a mmu_notifier invalidation.  Similarly, the instigator
> > of the deferred TLB flush may be expecting all contexts to be flushed,
> > e.g. vmx_vcpu_load_vmcs().
> >
> > Without nested VMX, flushing only the current EPTP/VPID context isn't
> > problematic because KVM uses a constant VPID for each vCPU, and
> 
> Hello, Sean
> 
> Is the patch optimized for cases where nested VMX is active?

Well, this patch isn't, but KVM has since been optimized to do full EPT/VPID
flushes only when "necessary".  Necessary in quotes because the two uses can
technically be further optimized, but doing so would incur significant complexity.

Use #1 is remote flushes from the MMU, which don't strictly require a global flush,
but KVM would need to propagate more information (mmu_role?) in order for responding
vCPUs to determine what contexts need to be flushed.  And practically speaking,
for MMU flushes there's no meaningful difference when using TDP without nested
guests as the common case will be that each vCPU has a single active EPTP and
that EPTP will be affected by the MMU changes, i.e. needs to be flushed.

Use #2 is in VMX's pCPU migration path.  Again, not strictly necessary as KVM could
theoretically track which pCPUs have run a particular vCPU and when that pCPU last
flushed EPT contexts, but fully solving the problem would be quite complex.  Since
pCPU migration is always going to be a slow path, the extra complexity would be
very difficult to justify.

> I think the non-nested cases are the normal cases.
>
> Although the related code has changed, the logic of the patch
> is still in effect now.  Would it be better to restore the optimization
> for the normal (non-nested) cases?

As above, vmx_flush_tlb_all() hasn't changed, but the callers have.

* Re: [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush
  2021-08-03 15:39     ` Sean Christopherson
@ 2021-08-04  3:11       ` Lai Jiangshan
  2021-08-04 15:33         ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Lai Jiangshan @ 2021-08-04  3:11 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, LKML

On Tue, Aug 3, 2021 at 11:39 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Aug 03, 2021, Lai Jiangshan wrote:
> > (I'm replying to a very old email, so many CCs are dropped.)
> >
> > On Sat, Mar 21, 2020 at 5:33 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
> > > a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
> > > flushes require all contexts to be invalidated, not just the active
> > > contexts, e.g. all mappings in all contexts for a given HVA need to be
> > > invalidated on a mmu_notifier invalidation.  Similarly, the instigator
> > > of the deferred TLB flush may be expecting all contexts to be flushed,
> > > e.g. vmx_vcpu_load_vmcs().
> > >
> > > Without nested VMX, flushing only the current EPTP/VPID context isn't
> > > problematic because KVM uses a constant VPID for each vCPU, and
> >
> > Hello, Sean
> >
> > Is the patch optimized for cases where nested VMX is active?
>
> Well, this patch isn't, but KVM has since been optimized to do full EPT/VPID
> flushes only when "necessary".  Necessary in quotes because the two uses can
> technically be further optimized, but doing so would incur significant complexity.

Hello, thanks for your reply.

I know there might be a lot of possible optimizations to be considered, many of
which are too complicated to be implemented.

The optimization I had in mind yesterday is "ept_sync_global() vs.
ept_sync_context(this_vcpu's)" for the case where the VM uses EPT and
doesn't allow nested VMs.  (I failed to express this clearly yesterday.)

In this case, the vCPU uses only a single root_hpa, and I think a single-context
EPT sync is enough for both cases you listed below.

Once that context is flushed, the vCPU's TLB is clean and the vCPU can run.

If KVM changes mmu->root_hpa, it is KVM's responsibility to request
another flush, which is already implemented.

In other words, KVM_REQ_TLB_FLUSH == KVM_REQ_TLB_FLUSH_CURRENT in this case.
And before this patch, KVM flushed only the single context rather than all contexts.

>
> Use #1 is remote flushes from the MMU, which don't strictly require a global flush,
> but KVM would need to propagate more information (mmu_role?) in order for responding
> vCPUs to determine what contexts need to be flushed.  And practically speaking,
> for MMU flushes there's no meaningful difference when using TDP without nested
> guests as the common case will be that each vCPU has a single active EPTP and
> that EPTP will be affected by the MMU changes, i.e. needs to be flushed.

I don't see when we would need "to determine what contexts", since the vCPU
uses only one context in this case (that is the assumption in my mind);
please correct me if I'm wrong.

Thanks,
Lai.

>
> Use #2 is in VMX's pCPU migration path.  Again, not strictly necessary as KVM could
> theoretically track which pCPUs have run a particular vCPU and when that pCPU last
> flushed EPT contexts, but fully solving the problem would be quite complex.  Since
> pCPU migration is always going to be a slow path, the extra complexity would be
> very difficult to justify.
>
> > I think the non-nested cases are the normal cases.
> >
> > Although the related code has changed, the logic of the patch
> > is still in effect now.  Would it be better to restore the optimization
> > for the normal (non-nested) cases?
>
> As above, vmx_flush_tlb_all() hasn't changed, but the callers have.

* Re: [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush
  2021-08-04  3:11       ` Lai Jiangshan
@ 2021-08-04 15:33         ` Sean Christopherson
  0 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2021-08-04 15:33 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Paolo Bonzini, kvm, LKML

On Wed, Aug 04, 2021, Lai Jiangshan wrote:
> The optimization I had in mind yesterday is "ept_sync_global() vs.
> ept_sync_context(this_vcpu's)" for the case where the VM uses EPT and
> doesn't allow nested VMs.  (I failed to express this clearly yesterday.)
> 
> In this case, the vCPU uses only a single root_hpa,

This is not strictly guaranteed.  kvm_mmu_page_role tracks efer.NX, cr0.wp, and
cr4.SMEP/SMAP (if cr0.wp=0), which means that KVM will create a different root
if the guest toggles any of those bits.  I'm pretty sure that can be changed and
will look into doing so in the near future[*], but even that wouldn't guarantee
a single root.

SMM is also incorporated in the page role and will result in different roots
for SMM vs. non-SMM.  This is mandatory because SMM has its own memslot view.

A CPUID.MAXPHYADDR change can also change the role, but in this case zapping all
roots will always be the correct/desired behavior.

[*] https://lkml.kernel.org/r/YQGj8gj7fpWDdLg5@google.com
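
To illustrate why a single root isn't guaranteed, the hypothetical sketch below packs only the role inputs listed above (EFER.NX, CR0.WP, CR4.SMEP/SMAP when CR0.WP=0, and SMM) into a word. In this model, two guest states can share a root only if their role words match. The bit positions and all names are invented for illustration; the real kvm_mmu_page_role has more fields and a different layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit assignments; the real kvm_mmu_page_role layout differs. */
#define MODEL_ROLE_EFER_NX        (1u << 0)
#define MODEL_ROLE_CR0_WP         (1u << 1)
#define MODEL_ROLE_SMEP_ANDNOT_WP (1u << 2)
#define MODEL_ROLE_SMAP_ANDNOT_WP (1u << 3)
#define MODEL_ROLE_SMM            (1u << 4)

struct model_guest_state {
	bool efer_nx;
	bool cr0_wp;
	bool cr4_smep;
	bool cr4_smap;
	bool in_smm;
};

static unsigned int model_role_word(const struct model_guest_state *s)
{
	unsigned int word = 0;

	if (s->efer_nx)
		word |= MODEL_ROLE_EFER_NX;
	if (s->cr0_wp)
		word |= MODEL_ROLE_CR0_WP;
	/* SMEP/SMAP only feed into the role when CR0.WP=0. */
	if (s->cr4_smep && !s->cr0_wp)
		word |= MODEL_ROLE_SMEP_ANDNOT_WP;
	if (s->cr4_smap && !s->cr0_wp)
		word |= MODEL_ROLE_SMAP_ANDNOT_WP;
	/* SMM has its own memslot view, so it always gets a distinct root. */
	if (s->in_smm)
		word |= MODEL_ROLE_SMM;
	return word;
}

/* In this model, two states can share a root only if their roles are identical. */
static bool model_can_share_root(const struct model_guest_state *a,
				 const struct model_guest_state *b)
{
	return model_role_word(a) == model_role_word(b);
}
```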

> and I think a single-context EPT sync is enough for both cases you listed below.
> 
> Once that context is flushed, the vCPU's TLB is clean and the vCPU can run.
> 
> If KVM changes mmu->root_hpa, it is KVM's responsibility to request
> another flush, which is already implemented.

KVM needs to flush when it allocates a new root, largely because it has no way
of knowing if some other entity previously created a CR3/EPTP at that HPA, but
KVM isn't strictly required to flush when switching to a previous/cached root.

Currently this is a moot point because kvm_post_set_cr0(), kvm_post_set_cr4(),
set_efer(), and kvm_smm_changed() all do kvm_mmu_reset_context() instead of
attempting a fast PGD switch, but I am hoping to change this as well, at least
for the non-SMM cases.
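
The flush rule above (always flush a freshly allocated root, but not necessarily a cached one) can be sketched as a small root cache. This is a hypothetical userspace model. MODEL_NUM_PREV_ROOTS, the structs, and model_switch_root() are invented for illustration and only loosely mirror KVM's current root plus prev_roots. Roots are keyed by a PGD (CR3 or EPTP) and a role word.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache size; not taken from KVM. */
#define MODEL_NUM_PREV_ROOTS 3

struct model_root {
	unsigned long pgd;  /* guest CR3 (shadow paging) or L1 EPTP (nested EPT) */
	unsigned int role;  /* stand-in for kvm_mmu_page_role */
	bool valid;
};

struct model_mmu {
	struct model_root cur;
	struct model_root prev[MODEL_NUM_PREV_ROOTS];
};

static bool model_root_matches(const struct model_root *r, unsigned long pgd,
			       unsigned int role)
{
	return r->valid && r->pgd == pgd && r->role == role;
}

/*
 * Switch @mmu to the root for (@pgd, @role).  Returns true if the switch
 * requires a TLB flush: a fresh root may reuse an HPA that some other
 * CR3/EPTP previously occupied, whereas a cached root does not strictly
 * need one.
 */
static bool model_switch_root(struct model_mmu *mmu, unsigned long pgd,
			      unsigned int role)
{
	struct model_root old = mmu->cur;
	int i;

	if (model_root_matches(&mmu->cur, pgd, role))
		return false;

	for (i = 0; i < MODEL_NUM_PREV_ROOTS; i++) {
		if (model_root_matches(&mmu->prev[i], pgd, role)) {
			/* Cache hit: swap the cached root with the current one. */
			mmu->cur = mmu->prev[i];
			mmu->prev[i] = old;
			return false;
		}
	}

	/* Cache miss: age out the oldest entry and stash the old root at the front. */
	for (i = MODEL_NUM_PREV_ROOTS - 1; i > 0; i--)
		mmu->prev[i] = mmu->prev[i - 1];
	mmu->prev[0] = old;

	mmu->cur = (struct model_root){ .pgd = pgd, .role = role, .valid = true };
	return true;
}
```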

> In other words, KVM_REQ_TLB_FLUSH == KVM_REQ_TLB_FLUSH_CURRENT in this case.
> And before this patch, KVM flushed only the single context rather than all contexts.
> 
> >
> > Use #1 is remote flushes from the MMU, which don't strictly require a global flush,
> > but KVM would need to propagate more information (mmu_role?) in order for responding
> > vCPUs to determine what contexts need to be flushed.  And practically speaking,
> > for MMU flushes there's no meaningful difference when using TDP without nested
> > guests as the common case will be that each vCPU has a single active EPTP and
> > that EPTP will be affected by the MMU changes, i.e. needs to be flushed.
> 
> I don't see when we would need "to determine what contexts", since the vCPU
> uses only one context in this case (that is the assumption in my mind);
> please correct me if I'm wrong.

As it exists today, I believe you're correct that KVM will only ever have a
single reachable TDP root, but only because of overzealous kvm_mmu_reset_context()
usage.  The SMM case in particular could be optimized to not zap all roots (whether
or not it's worth optimizing is another question).

All that said, the easiest way to query the number of reachable roots would be to
check the previous/cached root.

But, even if we can guarantee there's exactly one reachable root, I would be
surprised if doing INVEPT.context instead of INVEPT.global actually provided any
meaningful performance benefit.  Using INVEPT.context is safe if and only if there
are no other TLB entries for this vCPU, and KVM must invalidate on pCPU migration,
so there can't be collateral damage in that sense.
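
The safety condition for INVEPT.context can be illustrated with a toy TLB whose entries are tagged by EPTP. This is a hypothetical model. model_invept_context() and model_invept_global() merely stand in for single-context and all-context INVEPT, and they ignore everything except the EPTP tag.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TLB_SIZE 8

struct model_tlb_entry {
	unsigned long eptp;  /* context tag */
	unsigned long gva;
	bool valid;
};

struct model_tlb {
	struct model_tlb_entry e[MODEL_TLB_SIZE];
};

static void model_tlb_fill(struct model_tlb *t, int slot, unsigned long eptp,
			   unsigned long gva)
{
	assert(slot >= 0 && slot < MODEL_TLB_SIZE);
	t->e[slot] = (struct model_tlb_entry){ .eptp = eptp, .gva = gva, .valid = true };
}

/* Stand-in for INVEPT single-context: drop only entries tagged with @eptp. */
static void model_invept_context(struct model_tlb *t, unsigned long eptp)
{
	for (int i = 0; i < MODEL_TLB_SIZE; i++)
		if (t->e[i].eptp == eptp)
			t->e[i].valid = false;
}

/* Stand-in for INVEPT all-context: drop everything. */
static void model_invept_global(struct model_tlb *t)
{
	for (int i = 0; i < MODEL_TLB_SIZE; i++)
		t->e[i].valid = false;
}

static int model_tlb_valid_entries(const struct model_tlb *t)
{
	int n = 0;

	for (int i = 0; i < MODEL_TLB_SIZE; i++)
		n += t->e[i].valid;
	return n;
}
```

With one reachable root the two flavors leave the same (empty) TLB. With a second root, e.g. an SMM root, a context flush leaves that root's entries behind.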

That leaves the latency of INVEPT as the only possible performance delta, and that
will be uarch specific.  It's entirely possible INVEPT.global is slower, but again
I would be surprised if it is so much slower than INVEPT.context that it actually
impacts guest performance given that its use is limited to slow paths.

* Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit
  2020-03-20 21:28 ` [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit Sean Christopherson
@ 2021-10-28 13:11   ` Lai Jiangshan
  2021-10-28 15:22     ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Lai Jiangshan @ 2021-10-28 13:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Sat, Mar 21, 2020 at 5:29 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:

> +       if (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu)) {
> +               kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> +       } else if (is_vmenter &&
> +                  vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> +               vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> +               vpid_sync_context(nested_get_vpid02(vcpu));
> +       }
> +}


(Sorry for picking this old email to reply to, but the problem has
nothing to do with this patch or 5c614b3583e7; it has existed since
nested VMX was introduced.)

I think kvm_mmu_free_guest_mode_roots() should be called
if (!enable_ept && vmcs12->virtual_processor_id != vmx->nested.last_vpid)
just because prev_roots doesn't cache the vpid12.
(prev_roots caches PCID, which is distinctive)

The problem hardly arises if L1's hypervisor is also KVM, but it can if
L1's hypervisor is different, or is KVM with some changes to the way
it manages VPIDs.  (Actually, I planned to change the way it manages
VPIDs to be SVM-like.)

nvcpu0 and nvcpu1 are in the same nested VM and are running the same
application process.

vcpu1: runs nvcpu1 with the same cr3 as nvcpu0.
vcpu0: runs nvcpu0, modifies the page table; L1 syncs the root and flushes
       VPID12, but L0 doesn't sync; it just removes the root from vcpu0's prev_roots.
vcpu1: L1 migrates nvcpu0 here and allocates a *fresh* VPID12 for nvcpu0,
       the way SVM allocates a fresh ASID.
vcpu1: runs nvcpu0 without any flush.  (vcpu1's prev_roots still holds the root,
       and L0 hasn't synced it.)

If my understanding is correct, please treat this as a bug report; I hope somebody can fix it.
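
The sequence above can be replayed against a toy model in which each vCPU caches one shadow root, keyed only by CR3 (vpid12 is not part of the key), plus a "synced" flag. Everything here is hypothetical. The struct and model_* functions are invented, and L1's VPID flush is modeled as dropping only the local vCPU's cached root, as described above. @emulate_guest_flush stands in for either proposed fix (freeing guest-mode roots, or requesting KVM_REQ_TLB_FLUSH_GUEST). In this model, that fix simply marks the cached root as synced.

```c
#include <assert.h>
#include <stdbool.h>

/* One cached shadow root per vCPU, keyed by CR3 only: vpid12 is not part of the key. */
struct model_vcpu {
	unsigned long root_cr3;
	bool root_valid;
	bool root_synced;        /* false once the guest page tables changed under it */
	unsigned int last_vpid12;
};

/* The page tables behind @cr3 change: every vCPU's cached root for it is now stale. */
static void model_guest_pt_modified(struct model_vcpu *vcpus, int nr, unsigned long cr3)
{
	for (int i = 0; i < nr; i++)
		if (vcpus[i].root_valid && vcpus[i].root_cr3 == cr3)
			vcpus[i].root_synced = false;
}

/* L1 flushes VPID12 on @v: L0 only drops @v's own cached root. */
static void model_l1_invvpid(struct model_vcpu *v)
{
	v->root_valid = false;
}

/*
 * Nested VM-Enter on @v.  Returns true if L2 would run on a stale root.
 * @emulate_guest_flush selects the fixed behaviour, where a vpid12 change is
 * treated as a full guest TLB flush (modeled here as syncing the cached root).
 */
static bool model_nested_vmenter(struct model_vcpu *v, unsigned long cr3,
				 unsigned int vpid12, bool emulate_guest_flush)
{
	if (vpid12 != v->last_vpid12) {
		v->last_vpid12 = vpid12;
		if (emulate_guest_flush)
			v->root_synced = true;
	}

	if (!v->root_valid || v->root_cr3 != cr3) {
		/* Fresh root, built from the current guest page tables. */
		v->root_cr3 = cr3;
		v->root_valid = true;
		v->root_synced = true;
	}
	return !v->root_synced;
}
```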

* Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit
  2021-10-28 13:11   ` Lai Jiangshan
@ 2021-10-28 15:22     ` Sean Christopherson
  2021-10-29  0:44       ` Lai Jiangshan
  0 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2021-10-28 15:22 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

-me :-)

On Thu, Oct 28, 2021, Lai Jiangshan wrote:
> On Sat, Mar 21, 2020 at 5:29 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> 
> > +       if (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu)) {
> > +               kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> > +       } else if (is_vmenter &&
> > +                  vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> > +               vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> > +               vpid_sync_context(nested_get_vpid02(vcpu));
> > +       }
> > +}
> 
> (Sorry for picking this old email to reply to, but the problem has
> nothing to do with this patch or 5c614b3583e7; it has existed since
> nested VMX was introduced.)
> 
> I think kvm_mmu_free_guest_mode_roots() should be called
> if (!enable_ept && vmcs12->virtual_processor_id != vmx->nested.last_vpid)
> just because prev_roots doesn't cache the vpid12.
> (prev_roots caches PCID, which is distinctive)
> 
> The problem hardly arises if L1's hypervisor is also KVM, but it can if L1's
> hypervisor is different, or is KVM with some changes to the way it
> manages VPIDs.

Indeed.  A more straightforward error case would be if L1 and L2 share CR3, and
vmcs02.VPID is toggled (or used for the first time) on the L1 => L2 VM-Enter.

The fix should simply be:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index eedcebf58004..574823370e7a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1202,17 +1202,15 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
         *
         * If a TLB flush isn't required due to any of the above, and vpid12 is
         * changing then the new "virtual" VPID (vpid12) will reuse the same
-        * "real" VPID (vpid02), and so needs to be flushed.  There's no direct
-        * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
-        * all nested vCPUs.  Remember, a flush on VM-Enter does not invalidate
-        * guest-physical mappings, so there is no need to sync the nEPT MMU.
+        * "real" VPID (vpid02), and so needs to be flushed.  Like the !vpid02
+        * case above, this is a full TLB flush from the guest's perspective.
         */
        if (!nested_has_guest_tlb_tag(vcpu)) {
                kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
        } else if (is_vmenter &&
                   vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
                vmx->nested.last_vpid = vmcs12->virtual_processor_id;
-               vpid_sync_context(nested_get_vpid02(vcpu));
+               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
        }
 }

* Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit
  2021-10-28 15:22     ` Sean Christopherson
@ 2021-10-29  0:44       ` Lai Jiangshan
  2021-10-29 17:10         ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Lai Jiangshan @ 2021-10-29  0:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Thu, Oct 28, 2021 at 11:22 PM Sean Christopherson <seanjc@google.com> wrote:
>
> -me :-)
>
> On Thu, Oct 28, 2021, Lai Jiangshan wrote:
> > On Sat, Mar 21, 2020 at 5:29 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> >
> > > +       if (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu)) {
> > > +               kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> > > +       } else if (is_vmenter &&
> > > +                  vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> > > +               vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> > > +               vpid_sync_context(nested_get_vpid02(vcpu));
> > > +       }
> > > +}
> >
> > (Sorry for picking this old email to reply to, but the problem has
> > nothing to do with this patch or 5c614b3583e7; it has existed since
> > nested VMX was introduced.)
> >
> > I think kvm_mmu_free_guest_mode_roots() should be called
> > if (!enable_ept && vmcs12->virtual_processor_id != vmx->nested.last_vpid)
> > just because prev_roots doesn't cache the vpid12.
> > (prev_roots caches PCID, which is distinctive)
> >
> > The problem hardly arises if L1's hypervisor is also KVM, but it can if L1's
> > hypervisor is different, or is KVM with some changes to the way it
> > manages VPIDs.
>
> Indeed.  A more straightforward error case would be if L1 and L2 share CR3, and
> vmcs02.VPID is toggled (or used for the first time) on the L1 => L2 VM-Enter.
>
> The fix should simply be:
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index eedcebf58004..574823370e7a 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1202,17 +1202,15 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
>          *
>          * If a TLB flush isn't required due to any of the above, and vpid12 is
>          * changing then the new "virtual" VPID (vpid12) will reuse the same
> -        * "real" VPID (vpid02), and so needs to be flushed.  There's no direct
> -        * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
> -        * all nested vCPUs.  Remember, a flush on VM-Enter does not invalidate
> -        * guest-physical mappings, so there is no need to sync the nEPT MMU.
> +        * "real" VPID (vpid02), and so needs to be flushed.  Like the !vpid02
> +        * case above, this is a full TLB flush from the guest's perspective.
>          */
>         if (!nested_has_guest_tlb_tag(vcpu)) {
>                 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>         } else if (is_vmenter &&
>                    vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
>                 vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> -               vpid_sync_context(nested_get_vpid02(vcpu));
> +               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);

This change is neat.

But current KVM_REQ_TLB_FLUSH_GUEST flushes vpid01 only, and it doesn't flush
vpid02.  vmx_flush_tlb_guest() might need to be changed to flush vpid02 too.

And if so, this nested_vmx_transition_tlb_flush() can be simplified further
since KVM_REQ_TLB_FLUSH_CURRENT(!enable_ept) can be replaced with
KVM_REQ_TLB_FLUSH_GUEST.

>         }
>  }

* Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit
  2021-10-29  0:44       ` Lai Jiangshan
@ 2021-10-29 17:10         ` Sean Christopherson
  2021-10-30  1:34           ` Lai Jiangshan
  0 siblings, 1 reply; 83+ messages in thread
From: Sean Christopherson @ 2021-10-29 17:10 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

TL;DR: I'll work on a proper series next week, there are multiple things that need
to be fixed.

On Fri, Oct 29, 2021, Lai Jiangshan wrote:
> On Thu, Oct 28, 2021 at 11:22 PM Sean Christopherson <seanjc@google.com> wrote:
> > The fix should simply be:
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index eedcebf58004..574823370e7a 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -1202,17 +1202,15 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
> >          *
> >          * If a TLB flush isn't required due to any of the above, and vpid12 is
> >          * changing then the new "virtual" VPID (vpid12) will reuse the same
> > -        * "real" VPID (vpid02), and so needs to be flushed.  There's no direct
> > -        * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
> > -        * all nested vCPUs.  Remember, a flush on VM-Enter does not invalidate
> > -        * guest-physical mappings, so there is no need to sync the nEPT MMU.
> > +        * "real" VPID (vpid02), and so needs to be flushed.  Like the !vpid02
> > +        * case above, this is a full TLB flush from the guest's perspective.
> >          */
> >         if (!nested_has_guest_tlb_tag(vcpu)) {
> >                 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> >         } else if (is_vmenter &&
> >                    vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> >                 vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> > -               vpid_sync_context(nested_get_vpid02(vcpu));
> > +               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> 
> This change is neat.

Heh, yeah, but too neat to be right :-)

> But current KVM_REQ_TLB_FLUSH_GUEST flushes vpid01 only, and it doesn't flush
> vpid02.  vmx_flush_tlb_guest() might need to be changed to flush vpid02 too.

Hmm.  I think vmx_flush_tlb_guest() is straight up broken.  E.g. if EPT is enabled
but L1 doesn't use EPT for L2 and doesn't intercept INVPCID, then KVM will handle
INVPCID from L2.  That means the recent addition to kvm_invalidate_pcid() (see
below) will flush the wrong VPID.  And it's incorrect (well, more than is required
by the SDM) to flush both VPIDs because flushes from INVPCID (and flushes from the
guest's perspective in general) are scoped to the current VPID, e.g. a "full" TLB
flush in the "host" by toggling CR4.PGE flushes only the current VPID:

  Operations that architecturally invalidate entries in the TLBs or paging-structure
  caches independent of VMX operation (e.g., the INVLPG and INVPCID instructions)
  invalidate linear mappings and combined mappings.  They are required to do so only
  for the current VPID (but, for combined mappings, all EP4TAs).

static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
{
	struct kvm_mmu *mmu = vcpu->arch.mmu;
	unsigned long roots_to_free = 0;
	int i;

	/*
	 * MOV CR3 and INVPCID are usually not intercepted when using TDP, but
	 * this is reachable when running EPT=1 and unrestricted_guest=0,  and
	 * also via the emulator.  KVM's TDP page tables are not in the scope of
	 * the invalidation, but the guest's TLB entries need to be flushed as
	 * the CPU may have cached entries in its TLB for the target PCID.
	 */
	if (unlikely(tdp_enabled)) {
		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
		return;
	}

	...
}

To fix that, the "guest" flushes should always operate on the current VPID.  But
this alone is insufficient (more below).

static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu)
{
	if (is_guest_mode(vcpu))
		return nested_get_vpid02(vcpu);
	return to_vmx(vcpu)->vpid;
}

static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu *mmu = vcpu->arch.mmu;
	u64 root_hpa = mmu->root_hpa;

	/* No flush required if the current context is invalid. */
	if (!VALID_PAGE(root_hpa))
		return;

	if (enable_ept)
		ept_sync_context(construct_eptp(vcpu, root_hpa,
						mmu->shadow_root_level));
	else
		vpid_sync_context(vmx_get_current_vpid(vcpu));
}

static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
{
	/*
	 * vpid_sync_vcpu_addr() is a nop if vpid==0, see the comment in
	 * vmx_flush_tlb_guest() for an explanation of why this is ok.
	 */
	vpid_sync_vcpu_addr(vmx_get_current_vpid(vcpu), addr);
}

static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
{
	/*
	 * vpid_sync_context() is a nop if vpid==0, e.g. if enable_vpid==0 or a
	 * vpid couldn't be allocated for this vCPU.  VM-Enter and VM-Exit are
	 * required to flush GVA->{G,H}PA mappings from the TLB if vpid is
	 * disabled (VM-Enter with vpid enabled and vpid==0 is disallowed),
	 * i.e. no explicit INVVPID is necessary.
	 */
	vpid_sync_context(vmx_get_current_vpid(vcpu));
}
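
The intent of routing every "guest" flush through vmx_get_current_vpid() can be checked with a toy VPID-tagged TLB. Everything below is a hypothetical userspace model; the structs and model_* functions are invented. The VPID-0 no-op mirrors the vpid_sync_context() comment above.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TLB_SIZE 4

struct model_tlb_entry {
	unsigned int vpid;  /* tag: vpid01 for L1, vpid02 for L2 */
	bool valid;
};

struct model_vmx {
	bool guest_mode;     /* is_guest_mode(vcpu) */
	unsigned int vpid;   /* to_vmx(vcpu)->vpid, i.e. vpid01 */
	unsigned int vpid02; /* nested_get_vpid02(vcpu) */
	struct model_tlb_entry tlb[MODEL_TLB_SIZE];
};

/* Mirrors the proposed vmx_get_current_vpid(). */
static unsigned int model_current_vpid(const struct model_vmx *v)
{
	return v->guest_mode ? v->vpid02 : v->vpid;
}

/* Stand-in for vpid_sync_context(): a nop for VPID 0, else drop that VPID's entries. */
static void model_vpid_sync_context(struct model_vmx *v, unsigned int vpid)
{
	if (!vpid)
		return;
	for (int i = 0; i < MODEL_TLB_SIZE; i++)
		if (v->tlb[i].vpid == vpid)
			v->tlb[i].valid = false;
}

/* Guest-scoped flush: per the SDM text quoted above, only the current VPID. */
static void model_flush_tlb_guest(struct model_vmx *v)
{
	model_vpid_sync_context(v, model_current_vpid(v));
}

static bool model_has_entry(const struct model_vmx *v, unsigned int vpid)
{
	for (int i = 0; i < MODEL_TLB_SIZE; i++)
		if (v->tlb[i].valid && v->tlb[i].vpid == vpid)
			return true;
	return false;
}
```

For example, an INVPCID emulated on behalf of L2 would drop only vpid02's entries and leave L1's entries untouched.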



> And if so, this nested_vmx_transition_tlb_flush() can be simplified further
> since KVM_REQ_TLB_FLUSH_CURRENT(!enable_ept) can be replaced with
> KVM_REQ_TLB_FLUSH_GUEST.

And as above KVM_REQ_TLB_FLUSH_GUEST is conceptually wrong, too.  E.g. in my
dummy case of L1 and L2 using the same CR3, if L1 assigns L2 a VPID then L1's
ASID is not flushed on VM-Exit, so pending PTE updates for that single
CR3 would not be flushed (sync'd in KVM) for L1 even though they were flushed
for L2.

kvm_mmu_page_role doesn't track VPID, but it does track is_guest_mode, so the
bizarre case of L1 but not L2 having stale entries for a single CR3 is "supported".

Another wrinkle that is being mishandled is if L1 doesn't intercept INVPCID and
KVM synthesizes a nested VM-Exit from L2=>L1 before servicing KVM_REQ_MMU_SYNC,
KVM will sync the wrong MMU because the nested transitions only service pending
"current" flushes.  The GUEST variant also has the same bug (which I alluded to
above).

To fix that, the nVMX code should handle all pending flushes and syncs that are
specific to the current vCPU, e.g. by replacing the open coded TLB_FLUSH_CURRENT
check with a call to a common helper as below.  Ideally enter_guest_mode() and
leave_guest_mode() would handle these calls so that SVM doesn't need to be updated
if/when SVM stops flushing on all nested transitions, but VMX switches to vmcs02
and has already modified state before getting to enter_guest_mode(), which makes
me more than a bit nervous.

@@ -3361,8 +3358,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
        };
        u32 failed_index;

-       if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
-               kvm_vcpu_flush_tlb_current(vcpu);
+       kvm_service_pending_tlb_flush_on_nested_transition(vcpu);

        evaluate_pending_interrupts = exec_controls_get(vmx) &
                (CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
@@ -4516,9 +4512,8 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
                (void)nested_get_evmcs_page(vcpu);
        }

-       /* Service the TLB flush request for L2 before switching to L1. */
-       if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
-               kvm_vcpu_flush_tlb_current(vcpu);
+       /* Service pending TLB flush requests for L2 before switching to L1. */
+       kvm_service_pending_tlb_flush_on_nested_transition(vcpu);

        /*
         * VCPU_EXREG_PDPTR will be clobbered in arch/x86/kvm/vmx/vmx.h between
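
The diff above only names kvm_service_pending_tlb_flush_on_nested_transition(); it does not define it. One hypothetical reading, modeled in userspace below, is that it services every pending per-vCPU flush request while the vCPU is still in the context the requests targeted. The request bits, struct, and model_* functions are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVM_REQ_TLB_FLUSH_CURRENT and KVM_REQ_TLB_FLUSH_GUEST. */
#define MODEL_REQ_FLUSH_CURRENT (1u << 0)
#define MODEL_REQ_FLUSH_GUEST   (1u << 1)

struct model_vcpu {
	unsigned int requests;
	bool guest_mode;
	/* Flush counts per context: index 0 = L1, index 1 = L2. */
	int current_flushes[2];
	int guest_flushes[2];
};

/* Like kvm_check_request(): test and clear a pending request. */
static bool model_check_request(struct model_vcpu *v, unsigned int req)
{
	bool pending = (v->requests & req) != 0;

	v->requests &= ~req;
	return pending;
}

static void model_service_pending_tlb_flush(struct model_vcpu *v)
{
	if (model_check_request(v, MODEL_REQ_FLUSH_CURRENT))
		v->current_flushes[v->guest_mode]++;
	if (model_check_request(v, MODEL_REQ_FLUSH_GUEST))
		v->guest_flushes[v->guest_mode]++;
}

/* Old behaviour: only the CURRENT request is serviced before leaving L2. */
static void model_vmexit_old(struct model_vcpu *v)
{
	if (model_check_request(v, MODEL_REQ_FLUSH_CURRENT))
		v->current_flushes[v->guest_mode]++;
	v->guest_mode = false;
}

/* Proposed behaviour: service all pending per-vCPU flushes before leaving L2. */
static void model_vmexit_new(struct model_vcpu *v)
{
	model_service_pending_tlb_flush(v);
	v->guest_mode = false;
}
```

In the old flow, a GUEST request raised while in L2 is still pending after the switch. It is then serviced against L1's context.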


And for nested_vmx_transition_tlb_flush(), assuming all the other things are fixed,
the "vpid12 is changing" case does indeed become KVM_REQ_TLB_FLUSH_GUEST.  It also
needs to be prioritized above nested_has_guest_tlb_tag() because a GUEST flush is
"stronger" than a CURRENT flush.

static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
					    struct vmcs12 *vmcs12,
					    bool is_vmenter)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	/*
	 * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
	 * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
	 * full TLB flush from the guest's perspective.  This is required even
	 * if VPID is disabled in the host as KVM may need to synchronize the
	 * MMU in response to the guest TLB flush.
	 *
	 * Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use.
	 * EPT is a special snowflake, as guest-physical mappings aren't
	 * flushed on VPID invalidations, including VM-Enter or VM-Exit with
	 * VPID disabled.  As a result, KVM _never_ needs to sync nEPT
	 * entries on VM-Enter because L1 can't rely on VM-Enter to flush
	 * those mappings.
	 */
	if (!nested_cpu_has_vpid(vmcs12)) {
		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
		return;
	}

	/* L2 should never have a VPID if VPID is disabled. */
	WARN_ON(!enable_vpid);

	/*
	 * VPID is enabled and in use by vmcs12.  If vpid12 is changing, then
	 * emulate a guest TLB flush as KVM does not track vpid12 history nor
	 * is the VPID incorporated into the MMU context.  I.e. KVM must assume
	 * that the new vpid12 has never been used and thus represents a new
	 * guest ASID that cannot have entries in the TLB.
	 */
	if (is_vmenter && vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
		vmx->nested.last_vpid = vmcs12->virtual_processor_id;
		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
		return;
	}

	/*
	 * If VPID is enabled and used by vmcs12, and vpid12 is not changing
	 * but does not have a unique TLB tag (ASID), i.e. EPT is disabled and
	 * KVM was unable to allocate a VPID for L2, flush the current context
	 * as the effective ASID is common to both L1 and L2.
	 */
	if (!nested_has_guest_tlb_tag(vcpu))
		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
}
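
The ordering of the three checks in the proposed nested_vmx_transition_tlb_flush() can be captured as a pure function. The following is a hypothetical userspace model. The booleans stand in for nested_cpu_has_vpid(vmcs12) and nested_has_guest_tlb_tag(vcpu), and the enum stands in for the request that would be made.

```c
#include <assert.h>
#include <stdbool.h>

enum model_req {
	MODEL_REQ_NONE,
	MODEL_REQ_TLB_FLUSH_CURRENT,
	MODEL_REQ_TLB_FLUSH_GUEST,
};

struct model_nested {
	bool vmcs12_has_vpid;   /* nested_cpu_has_vpid(vmcs12) */
	bool has_guest_tlb_tag; /* nested_has_guest_tlb_tag(vcpu) */
	unsigned int vpid12;    /* vmcs12->virtual_processor_id */
	unsigned int last_vpid; /* vmx->nested.last_vpid */
};

static enum model_req model_transition_tlb_flush(struct model_nested *n,
						 bool is_vmenter)
{
	/* No VPID in vmcs12: L1 expects a full (guest) TLB flush. */
	if (!n->vmcs12_has_vpid)
		return MODEL_REQ_TLB_FLUSH_GUEST;

	/* New vpid12 on VM-Enter: assume it was never used, so flush as a guest. */
	if (is_vmenter && n->vpid12 != n->last_vpid) {
		n->last_vpid = n->vpid12;
		return MODEL_REQ_TLB_FLUSH_GUEST;
	}

	/* Same vpid12, but L1 and L2 share the effective ASID: flush current. */
	if (!n->has_guest_tlb_tag)
		return MODEL_REQ_TLB_FLUSH_CURRENT;

	return MODEL_REQ_NONE;
}
```

Checking the vpid12 change before the guest-TLB-tag test reflects the point that a GUEST flush subsumes a CURRENT one. On VM-Exit, a differing vpid12 is ignored.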

* Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit
  2021-10-29 17:10         ` Sean Christopherson
@ 2021-10-30  1:34           ` Lai Jiangshan
  2021-11-04 17:47             ` Sean Christopherson
  0 siblings, 1 reply; 83+ messages in thread
From: Lai Jiangshan @ 2021-10-30  1:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Sat, Oct 30, 2021 at 1:10 AM Sean Christopherson <seanjc@google.com> wrote:
>
> TL;DR: I'll work on a proper series next week, there are multiple things that need
> to be fixed.
>
> On Fri, Oct 29, 2021, Lai Jiangshan wrote:
> > On Thu, Oct 28, 2021 at 11:22 PM Sean Christopherson <seanjc@google.com> wrote:
> > > The fix should simply be:
> > >
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index eedcebf58004..574823370e7a 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -1202,17 +1202,15 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
> > >          *
> > >          * If a TLB flush isn't required due to any of the above, and vpid12 is
> > >          * changing then the new "virtual" VPID (vpid12) will reuse the same
> > > -        * "real" VPID (vpid02), and so needs to be flushed.  There's no direct
> > > -        * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
> > > -        * all nested vCPUs.  Remember, a flush on VM-Enter does not invalidate
> > > -        * guest-physical mappings, so there is no need to sync the nEPT MMU.
> > > +        * "real" VPID (vpid02), and so needs to be flushed.  Like the !vpid02
> > > +        * case above, this is a full TLB flush from the guest's perspective.
> > >          */
> > >         if (!nested_has_guest_tlb_tag(vcpu)) {
> > >                 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > >         } else if (is_vmenter &&
> > >                    vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> > >                 vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> > > -               vpid_sync_context(nested_get_vpid02(vcpu));
> > > +               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> >
> > This change is neat.
>
> Heh, yeah, but too neat to be right :-)
>
> > But the current KVM_REQ_TLB_FLUSH_GUEST flushes only vpid01; it doesn't flush
> > vpid02.  vmx_flush_tlb_guest() might need to be changed to flush vpid02 too.
>
> Hmm.  I think vmx_flush_tlb_guest() is straight up broken.  E.g. if EPT is enabled
> but L1 doesn't use EPT for L2 and doesn't intercept INVPCID, then KVM will handle
> INVPCID from L2.  That means the recent addition to kvm_invalidate_pcid() (see
> below) will flush the wrong VPID.  And it's incorrect (well, more than is required
> by the SDM) to flush both VPIDs because flushes from INVPCID (and flushes from the
> guest's perspective in general) are scoped to the current VPID, e.g. a "full" TLB
> flush in the "host" by toggling CR4.PGE flushes only the current VPID:

I think KVM_REQ_TLB_FLUSH_GUEST/kvm_vcpu_flush_tlb_guest/vmx_flush_tlb_guest
was deliberately designed for the L1 guest only.  This can be seen from the
code, from its history, and from its callers.  For example,
nested_vmx_transition_tlb_flush() relies on KVM_REQ_TLB_FLUSH_GUEST flushing
the L1 guest:

        /*
         * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
         * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
         * full TLB flush from the guest's perspective.  This is required even
         * if VPID is disabled in the host as KVM may need to synchronize the
         * MMU in response to the guest TLB flush.
         *
         * Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use.
         * EPT is a special snowflake, as guest-physical mappings aren't
         * flushed on VPID invalidations, including VM-Enter or VM-Exit with
         * VPID disabled.  As a result, KVM _never_ needs to sync nEPT
         * entries on VM-Enter because L1 can't rely on VM-Enter to flush
         * those mappings.
         */
        if (!nested_cpu_has_vpid(vmcs12)) {
                kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
                return;
        }

In contrast, handle_invvpid() doesn't use KVM_REQ_TLB_FLUSH_GUEST.

So I don't think KVM_REQ_TLB_FLUSH_GUEST, kvm_vcpu_flush_tlb_guest,
or vmx_flush_tlb_guest is broken, since they are meant for L1 guests.
What we have to consider is whether it is worth extending them to
nested guests for the convenience of the nested code.

I support extending them.

A small comment on your proposal: KVM_REQ_TLB_FLUSH_CURRENT and
KVM_REQ_TLB_FLUSH_GUEST flush only the "current" VPID, so some special
work is needed when switching the MMU from L1 to L2 and vice versa:
the pending requests must be handled before switching.
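Lai's ordering concern can be illustrated with a toy model, assuming (as he states) that both requests act only on whichever VPID is loaded when they are serviced. Everything here is hypothetical and not KVM code: `struct flush_model`, `switch_context()`, and the `REQ_*` bits only count which context a deferred flush lands on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits standing in for KVM_REQ_TLB_FLUSH_{CURRENT,GUEST}. */
#define REQ_FLUSH_CURRENT (1u << 0)
#define REQ_FLUSH_GUEST   (1u << 1)

enum ctx { CTX_L1 = 0, CTX_L2 = 1 }; /* vpid01 vs. vpid02 */

struct flush_model {
	unsigned int requests;   /* pending (deferred) flush requests */
	enum ctx active;         /* context whose VPID is currently loaded */
	unsigned int flushes[2]; /* flushes performed, indexed by context */
};

/* Both requests act on whichever context is active when serviced. */
void service_pending_flushes(struct flush_model *m)
{
	if (m->requests & (REQ_FLUSH_CURRENT | REQ_FLUSH_GUEST))
		m->flushes[m->active]++;
	m->requests &= ~(REQ_FLUSH_CURRENT | REQ_FLUSH_GUEST);
}

/*
 * Model of a nested transition.  With service_first, requests made on
 * behalf of the outgoing context are handled before the switch; without
 * it, they stay pending and later flush the incoming context instead.
 */
void switch_context(struct flush_model *m, enum ctx next, bool service_first)
{
	assert(next == CTX_L1 || next == CTX_L2);
	if (service_first)
		service_pending_flushes(m);
	m->active = next;
}
```

A flush requested for L1 and serviced before switching to L2 lands on L1; if it is left pending across the switch, it lands on L2 and L1's stale entries survive.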

* Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit
  2021-10-30  1:34           ` Lai Jiangshan
@ 2021-11-04 17:47             ` Sean Christopherson
  0 siblings, 0 replies; 83+ messages in thread
From: Sean Christopherson @ 2021-11-04 17:47 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, LKML, Ben Gardon, Junaid Shahid, Liran Alon,
	Boris Ostrovsky, John Haxby, Miaohe Lin, Tom Lendacky

On Sat, Oct 30, 2021, Lai Jiangshan wrote:
> A small comment on your proposal: KVM_REQ_TLB_FLUSH_CURRENT and
> KVM_REQ_TLB_FLUSH_GUEST flush only the "current" VPID, so some special
> work is needed when switching the MMU from L1 to L2 and vice versa:
> the pending requests must be handled before switching.

Oh, yeah, that's the snippet of my pseudo patch below, but I didn't provide the
kvm_service_pending_tlb_flush_on_nested_transition() implementation, so it's not
exactly obvious what I intended.  The current code handles CURRENT but not GUEST;
the idea is to move both into a helper that can be shared between nVMX and nSVM.

And I believe the "flush" also needs to service KVM_REQ_MMU_SYNC.  For L1=>L2 it
should be irrelevant/impossible, since L1 can only be unsync if L1 and L2 share
an MMU, but the L2=>L1 path could result in a lost sync if something, e.g. an IRQ,
prompted a nested VM-Exit before re-entering L2.

Let me know if I misunderstood your comment.  Thanks!

@@ -3361,8 +3358,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
        };
        u32 failed_index;

-       if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
-               kvm_vcpu_flush_tlb_current(vcpu);
+       kvm_service_pending_tlb_flush_on_nested_transition(vcpu);

        evaluate_pending_interrupts = exec_controls_get(vmx) &
                (CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
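The pseudo patch above calls kvm_service_pending_tlb_flush_on_nested_transition() without showing its body. The following is a hypothetical, self-contained model of what such a helper might look like, based only on the description in this message: service MMU_SYNC, CURRENT, and GUEST through vendor hooks so that nVMX and nSVM could share it. All `toy_*` names, the request bit values, and the order (sync before flushes) are assumptions, not the eventual kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVM_REQ_TLB_FLUSH_{CURRENT,GUEST}, KVM_REQ_MMU_SYNC. */
#define TOY_REQ_TLB_FLUSH_CURRENT (1u << 0)
#define TOY_REQ_TLB_FLUSH_GUEST   (1u << 1)
#define TOY_REQ_MMU_SYNC          (1u << 2)
#define TOY_REQ_NESTED_MASK \
	(TOY_REQ_TLB_FLUSH_CURRENT | TOY_REQ_TLB_FLUSH_GUEST | TOY_REQ_MMU_SYNC)

struct toy_vcpu;

/* Vendor hooks, so one helper can be shared by nVMX and nSVM. */
struct toy_ops {
	void (*mmu_sync_roots)(struct toy_vcpu *vcpu);
	void (*flush_tlb_current)(struct toy_vcpu *vcpu);
	void (*flush_tlb_guest)(struct toy_vcpu *vcpu);
};

struct toy_vcpu {
	unsigned int requests;             /* pending request bitmap */
	const struct toy_ops *ops;
	int nr_sync, nr_current, nr_guest; /* counters used by the demo hooks */
};

/* Test-and-clear, like kvm_check_request(). */
static bool toy_check_request(unsigned int req, struct toy_vcpu *vcpu)
{
	if (!(vcpu->requests & req))
		return false;
	vcpu->requests &= ~req;
	return true;
}

/* Hypothetical body for the helper named in the pseudo patch. */
void toy_service_pending_tlb_flush_on_nested_transition(struct toy_vcpu *vcpu)
{
	/* Servicing the sync here avoids losing it on an L2=>L1 exit. */
	if (toy_check_request(TOY_REQ_MMU_SYNC, vcpu))
		vcpu->ops->mmu_sync_roots(vcpu);
	if (toy_check_request(TOY_REQ_TLB_FLUSH_CURRENT, vcpu))
		vcpu->ops->flush_tlb_current(vcpu);
	if (toy_check_request(TOY_REQ_TLB_FLUSH_GUEST, vcpu))
		vcpu->ops->flush_tlb_guest(vcpu);

	/* In this model hooks never raise new requests, so none carry over. */
	assert(!(vcpu->requests & TOY_REQ_NESTED_MASK));
}

/* Demo hooks that only count invocations. */
static void count_sync(struct toy_vcpu *vcpu)    { vcpu->nr_sync++; }
static void count_current(struct toy_vcpu *vcpu) { vcpu->nr_current++; }
static void count_guest(struct toy_vcpu *vcpu)   { vcpu->nr_guest++; }

const struct toy_ops toy_counting_ops = {
	.mmu_sync_roots    = count_sync,
	.flush_tlb_current = count_current,
	.flush_tlb_guest   = count_guest,
};
```

Routing the work through an ops table is one way to let both vendors share the helper; the real kernel may structure this differently.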

Thread overview: 83+ messages
2020-03-20 21:27 [PATCH v3 00/37] KVM: x86: TLB flushing fixes and enhancements Sean Christopherson
2020-03-20 21:27 ` [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush Sean Christopherson
2021-08-03  1:45   ` Lai Jiangshan
2021-08-03 15:39     ` Sean Christopherson
2021-08-04  3:11       ` Lai Jiangshan
2021-08-04 15:33         ` Sean Christopherson
2020-03-20 21:27 ` [PATCH v3 02/37] KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT) Sean Christopherson
2020-03-23 14:51   ` Vitaly Kuznetsov
2020-03-23 15:45     ` Sean Christopherson
2020-03-23 23:46       ` Paolo Bonzini
2020-03-20 21:27 ` [PATCH v3 03/37] KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 Sean Christopherson
2020-03-23 15:24   ` Vitaly Kuznetsov
2020-03-23 15:53     ` Sean Christopherson
2020-03-23 16:24   ` Jim Mattson
2020-03-23 16:28     ` Sean Christopherson
2020-03-23 16:36       ` Jim Mattson
2020-03-23 16:44         ` Sean Christopherson
2020-03-23 23:50           ` Paolo Bonzini
2020-03-24  0:12             ` Jim Mattson
2020-03-30 18:38               ` Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT Sean Christopherson
2020-03-23 15:34   ` Vitaly Kuznetsov
2020-03-23 16:04     ` Sean Christopherson
2020-03-23 16:33       ` Vitaly Kuznetsov
2020-03-23 16:50         ` Sean Christopherson
2020-03-23 16:57           ` Vitaly Kuznetsov
2020-03-20 21:28 ` [PATCH v3 05/37] KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault) Sean Christopherson
2020-03-23 15:47   ` Vitaly Kuznetsov
2020-03-23 16:24     ` Sean Christopherson
2020-03-23 23:56       ` Paolo Bonzini
2020-03-20 21:28 ` [PATCH v3 06/37] KVM: x86: Consolidate logic for injecting page faults to L1 Sean Christopherson
2020-03-24  0:47   ` Paolo Bonzini
2020-03-20 21:28 ` [PATCH v3 07/37] KVM: x86: Sync SPTEs when injecting page/EPT fault into L1 Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 08/37] KVM: VMX: Skip global INVVPID fallback if vpid==0 in vpid_sync_context() Sean Christopherson
2020-03-25  9:33   ` Vitaly Kuznetsov
2020-03-20 21:28 ` [PATCH v3 09/37] KVM: VMX: Use vpid_sync_context() directly when possible Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 10/37] KVM: VMX: Move vpid_sync_vcpu_addr() down a few lines Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 11/37] KVM: VMX: Handle INVVPID fallback logic in vpid_sync_vcpu_addr() Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 12/37] KVM: VMX: Drop redundant capability checks in low level INVVPID helpers Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 13/37] KVM: nVMX: Use vpid_sync_vcpu_addr() to emulate INVVPID with address Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 14/37] KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook Sean Christopherson
2020-03-25 10:23   ` Vitaly Kuznetsov
2020-03-25 15:41     ` Paolo Bonzini
2020-03-25 16:08       ` Vitaly Kuznetsov
2020-03-25 15:48     ` Sean Christopherson
2020-03-25 16:11       ` Vitaly Kuznetsov
2020-03-20 21:28 ` [PATCH v3 15/37] KVM: VMX: Clean up vmx_flush_tlb_gva() Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 16/37] KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush() Sean Christopherson
2020-03-25 11:23   ` Vitaly Kuznetsov
2020-03-20 21:28 ` [PATCH v3 17/37] KVM: SVM: Wire up ->tlb_flush_guest() directly to svm_flush_tlb() Sean Christopherson
2020-03-25 11:23   ` Vitaly Kuznetsov
2020-03-20 21:28 ` [PATCH v3 18/37] KVM: VMX: Move vmx_flush_tlb() to vmx.c Sean Christopherson
2020-03-25 11:25   ` Vitaly Kuznetsov
2020-03-20 21:28 ` [PATCH v3 19/37] KVM: nVMX: Move nested_get_vpid02() to vmx/nested.h Sean Christopherson
2020-03-25 11:25   ` Vitaly Kuznetsov
2020-03-20 21:28 ` [PATCH v3 20/37] KVM: VMX: Introduce vmx_flush_tlb_current() Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 21/37] KVM: SVM: Document the ASID logic in svm_flush_tlb() Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 22/37] KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all() Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit Sean Christopherson
2021-10-28 13:11   ` Lai Jiangshan
2021-10-28 15:22     ` Sean Christopherson
2021-10-29  0:44       ` Lai Jiangshan
2021-10-29 17:10         ` Sean Christopherson
2021-10-30  1:34           ` Lai Jiangshan
2021-11-04 17:47             ` Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 24/37] KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 25/37] KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 26/37] KVM: nVMX: Selectively use TLB_FLUSH_CURRENT for nested VM-Enter/VM-Exit Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 27/37] KVM: nVMX: Reload APIC access page on nested VM-Exit only if necessary Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 28/37] KVM: VMX: Retrieve APIC access page HPA only when necessary Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 29/37] KVM: VMX: Don't reload APIC access page if its control is disabled Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 30/37] KVM: x86/mmu: Move fast_cr3_switch() side effects to __kvm_mmu_new_cr3() Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 31/37] KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switch Sean Christopherson
2020-03-24 11:07   ` Paolo Bonzini
2020-03-20 21:28 ` [PATCH v3 32/37] KVM: x86/mmu: Add module param to force TLB flush on root reuse Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 33/37] KVM: nVMX: Skip MMU sync on nested VMX transition when possible Sean Christopherson
2020-03-24 11:19   ` Paolo Bonzini
2020-03-20 21:28 ` [PATCH v3 34/37] KVM: nVMX: Don't flush TLB on nested VMX transition Sean Christopherson
2020-03-24 11:20   ` Paolo Bonzini
2020-03-24 18:10     ` Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 35/37] KVM: nVMX: Free only the affected contexts when emulating INVEPT Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 36/37] KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related code Sean Christopherson
2020-03-20 21:28 ` [PATCH v3 37/37] KVM: VMX: Clean cr3/pgd handling in vmx_load_mmu_pgd() Sean Christopherson