* [PATCH 00/15] KVM: x86/mmu: TLB fixes and related cleanups
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Fixes for two (very) theoretical TLB flushing bugs (patches 1 and 4),
and cleanups on top to (hopefully) consolidate and simplify the TLB
flushing logic.

The basic gist of the TLB flush and MMU sync code shuffling is to stop
relying on the logic in __kvm_mmu_new_pgd() (but keep it for forced
flushing), and instead handle the flush+sync logic in the caller,
independent of whether or not the "fast" switch occurs.

I spent a fair bit of time trying to shove the necessary logic down into
__kvm_mmu_new_pgd(), but it always ended up a complete mess because the
requirements and contextual information are always different.  The rules
for MOV CR3 are different from nVMX transitions (and those vary based on
EPT+VPID), and nSVM will be different still (once it adds proper TLB
handling).  In particular, I like that nVMX no longer has special code
for synchronizing the MMU when using shadow paging and instead relies
on the common rules for TLB flushing.

Note, this series (indirectly) relies heavily on commit b53e84eed08b
("KVM: x86: Unload MMU on guest TLB flush if TDP disabled to force MMU
sync"), as it uses KVM_REQ_TLB_FLUSH_GUEST (was KVM_REQ_HV_TLB_FLUSH)
to do the TLB flush _and_ the MMU sync in non-PV code.

Tested all combinations of i386, EPT, NPT, and shadow paging.  I think...

The EPTP switching and INVPCID single-context changes in particular lack
meaningful coverage in kvm-unit-tests+Linux.  Long term it's on my todo
list to remedy that, but realistically I doubt I'll get it done anytime
soon.

To test EPTP switching, I hacked L1 to set up a duplicate top-level EPT
table, copy the "real" table to the duplicate table on EPT violation,
populate VMFUNC.EPTP_LIST with the two EPTPs, and expose
VMFUNC.EPTP_SWITCH to L2.  I then hacked L2 to do an EPTP switch to a
random (valid) EPTP index on every task switch.
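
The L2 side of the hack boiled down to roughly the following (a
from-memory sketch, not the actual diff; the function name is made up,
get_random_u32() is the stock kernel helper):

  static void l2_switch_to_random_eptp(void)
  {
          unsigned long function = 0;                  /* VMFUNC leaf 0: EPTP switching */
          unsigned long index = get_random_u32() % 2;  /* two entries in EPTP_LIST */

          /* VMFUNC: EAX = function, ECX = EPTP list index. */
          asm volatile(".byte 0x0f, 0x01, 0xd4"
                       : : "a" (function), "c" (index) : "memory");
  }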

To test INVPCID single-context, I modified L1 to iterate over all possible
PCIDs using INVPCID single-context in native_flush_tlb_global().  I also
verified that the guest crashed if it didn't do any INVPCID at all
(interestingly, the guest made it through boot without the flushes when
EPT was enabled, which implies the missing MMU sync on INVPCID was the
source of the crash, not a stale TLB entry).
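
The L1 side was similarly crude (again a sketch with a made-up function
name; invpcid_flush_single_context() is the existing helper from
asm/invpcid.h):

  /* Wedged into native_flush_tlb_global() in place of the normal flush. */
  static void flush_all_pcids(void)
  {
          unsigned long pcid;

          /* PCIDs are 12 bits, i.e. 4096 values including PCID=0. */
          for (pcid = 0; pcid < 4096; pcid++)
                  invpcid_flush_single_context(pcid);
  }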

Sean Christopherson (15):
  KVM: nVMX: Sync all PGDs on nested transition with shadow paging
  KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap
  KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch
  KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush
  KVM: x86: Unconditionally skip MMU sync/TLB flush in MOV CR3's PGD
    switch
  KVM: nSVM: Move TLB flushing logic (or lack thereof) to dedicated
    helper
  KVM: x86: Drop skip MMU sync and TLB flush params from "new PGD"
    helpers
  KVM: nVMX: Consolidate VM-Enter/VM-Exit TLB flush and MMU sync logic
  KVM: nVMX: Free only guest_mode (L2) roots on INVVPID w/o EPT
  KVM: x86: Use KVM_REQ_TLB_FLUSH_GUEST to handle INVPCID(ALL) emulation
  KVM: nVMX: Use fast PGD switch when emulating VMFUNC[EPTP_SWITCH]
  KVM: x86: Defer MMU sync on PCID invalidation
  KVM: x86: Drop pointless @reset_roots from kvm_init_mmu()
  KVM: nVMX: WARN if subtly-impossible VMFUNC conditions occur
  KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation

 arch/x86/include/asm/kvm_host.h |   6 +-
 arch/x86/kvm/hyperv.c           |   2 +-
 arch/x86/kvm/mmu.h              |   2 +-
 arch/x86/kvm/mmu/mmu.c          |  57 ++++++++-----
 arch/x86/kvm/svm/nested.c       |  40 ++++++---
 arch/x86/kvm/vmx/nested.c       | 139 ++++++++++++--------------------
 arch/x86/kvm/x86.c              |  75 ++++++++++-------
 7 files changed, 169 insertions(+), 152 deletions(-)

-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 01/15] KVM: nVMX: Sync all PGDs on nested transition with shadow paging
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Trigger a full TLB flush on behalf of the guest on nested VM-Enter and
VM-Exit when VPID is disabled for L2.  kvm_mmu_new_pgd() syncs only the
current PGD, which can theoretically leave stale, unsync'd entries in a
previous guest PGD; those stale entries could be consumed if L2 is
allowed to load CR3 with PCID_NOFLUSH=1.

Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can
be utilized for its obvious purpose of emulating a guest TLB flush.

Note, there is no change to the actual TLB flush executed by KVM, even
though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT.  When VPID is
disabled for L2, vpid02 is guaranteed to be '0', and thus
nested_get_vpid02() will return the VPID that is shared by L1 and L2.
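
For reference, the resolution helper is trivial (quoted from memory, see
arch/x86/kvm/vmx/nested.h for the real thing):

  static inline u16 nested_get_vpid02(struct kvm_vcpu *vcpu)
  {
          struct vcpu_vmx *vmx = to_vmx(vcpu);

          /* vpid02 == 0 when VPID is disabled for L2; fall back to L1's VPID. */
          return vmx->nested.vpid02 ? vmx->nested.vpid02 : vmx->vpid;
  }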

Generate the request outside of kvm_mmu_new_pgd(), as getting the common
helper to correctly identify which request is needed is quite painful.
E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong as
a TLB flush from the L1 kernel's perspective does not invalidate EPT
mappings.  And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can be further
simplified in the future by moving the logic into
nested_vmx_transition_tlb_flush().

Fixes: 41fab65e7c44 ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/hyperv.c           |  2 +-
 arch/x86/kvm/vmx/nested.c       | 17 ++++++++++++-----
 arch/x86/kvm/x86.c              |  2 +-
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9c7ced0e3171..6652e51a86fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -85,7 +85,7 @@
 #define KVM_REQ_APICV_UPDATE \
 	KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_TLB_FLUSH_CURRENT	KVM_ARCH_REQ(26)
-#define KVM_REQ_HV_TLB_FLUSH \
+#define KVM_REQ_TLB_FLUSH_GUEST \
 	KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_APF_READY		KVM_ARCH_REQ(28)
 #define KVM_REQ_MSR_FILTER_CHANGED	KVM_ARCH_REQ(29)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index f00830e5202f..fdd1eca717fd 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1704,7 +1704,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, u64 ingpa, u16 rep_cnt, bool
 	 * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
 	 * analyze it here, flush TLB regardless of the specified address space.
 	 */
-	kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH,
+	kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST,
 				    NULL, vcpu_mask, &hv_vcpu->tlb_flush);
 
 ret_success:
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6058a65a6ede..1c243758dd2c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1127,12 +1127,19 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 
 	/*
 	 * Unconditionally skip the TLB flush on fast CR3 switch, all TLB
-	 * flushes are handled by nested_vmx_transition_tlb_flush().  See
-	 * nested_vmx_transition_mmu_sync for details on skipping the MMU sync.
+	 * flushes are handled by nested_vmx_transition_tlb_flush().
 	 */
-	if (!nested_ept)
-		kvm_mmu_new_pgd(vcpu, cr3, true,
-				!nested_vmx_transition_mmu_sync(vcpu));
+	if (!nested_ept) {
+		kvm_mmu_new_pgd(vcpu, cr3, true, true);
+
+		/*
+		 * A TLB flush on VM-Enter/VM-Exit flushes all linear mappings
+		 * across all PCIDs, i.e. all PGDs need to be synchronized.
+		 * See nested_vmx_transition_mmu_sync() for more details.
+		 */
+		if (nested_vmx_transition_mmu_sync(vcpu))
+			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+	}
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9dd23bdfc6cc..905de6854efa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9167,7 +9167,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		}
 		if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
 			kvm_vcpu_flush_tlb_current(vcpu);
-		if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
+		if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
 			kvm_vcpu_flush_tlb_guest(vcpu);
 
 		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 02/15] KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Use BIT_ULL() instead of an open-coded shift to check whether or not a
function is enabled in L1's VMFUNC bitmap.  This is a benign bug as KVM
supports only bit 0, and will fail VM-Enter if any other bits are set,
i.e. bits 63:32 are guaranteed to be zero.

Note, "function" is bounded by hardware as VMFUNC will #UD before taking
a VM-Exit if the function is greater than 63.
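
To spell out the failure mode (illustrative snippet, not code from this
series): the literal '1' is a 32-bit int, x86's SHL masks a 32-bit shift
count to 5 bits, and the int result is sign-extended to 64 bits for the
'&', so:

  u64 ctl = vmcs12->vm_function_control;

  ctl & (1 << 31);      /* sign-extends, spuriously tests bits 63:32 too */
  ctl & (1 << 32);      /* undefined; on x86 the count wraps, tests bit 0 */
  ctl & BIT_ULL(32);    /* 64-bit shift, tests only bit 32 as intended */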

Before:
  if ((vmcs12->vm_function_control & (1 << function)) == 0)
   0x000000000001a916 <+118>:	mov    $0x1,%eax
   0x000000000001a91b <+123>:	shl    %cl,%eax
   0x000000000001a91d <+125>:	cltq
   0x000000000001a91f <+127>:	and    0x128(%rbx),%rax

After:
  if (!(vmcs12->vm_function_control & BIT_ULL(function & 63)))
   0x000000000001a955 <+117>:	mov    0x128(%rbx),%rdx
   0x000000000001a95c <+124>:	bt     %rax,%rdx

Fixes: 27c42a1bb867 ("KVM: nVMX: Enable VMFUNC for the L1 hypervisor")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1c243758dd2c..c3624109ffeb 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5540,7 +5540,7 @@ static int handle_vmfunc(struct kvm_vcpu *vcpu)
 	}
 
 	vmcs12 = get_vmcs12(vcpu);
-	if ((vmcs12->vm_function_control & (1 << function)) == 0)
+	if (!(vmcs12->vm_function_control & BIT_ULL(function)))
 		goto fail;
 
 	switch (function) {
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 03/15] KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Drop bogus logic that incorrectly clobbers the accessed/dirty enabling
status of the nested MMU on an EPTP switch.  When nested EPT is enabled,
walk_mmu points at L2's _legacy_ page tables, not L1's EPT for L2.

This is likely a benign bug, as mmu->ept_ad is never consumed (since the
MMU is not a nested EPT MMU), and stuffing mmu_role.base.ad_disabled will
never propagate into future shadow pages since the nested MMU isn't used
to map anything, just to walk L2's page tables.
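
To make the pointer soup concrete, when L2 is running with nested EPT
enabled the relevant assignments are roughly (sketch, not exact code):

  vcpu->arch.mmu      = &vcpu->arch.guest_mmu;   /* shadows L1's EPT for L2  */
  vcpu->arch.walk_mmu = &vcpu->arch.nested_mmu;  /* walks L2's legacy paging */

i.e. stuffing ept_ad/ad_disabled into walk_mmu pokes the wrong MMU.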

Note, KVM also does a full MMU reload, i.e. the guest_mmu will be
recreated using the new EPTP, and thus any change in A/D enabling will be
properly recognized in the relevant MMU.

Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c3624109ffeb..e102a5c10a83 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5488,8 +5488,6 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
 {
 	u32 index = kvm_rcx_read(vcpu);
 	u64 new_eptp;
-	bool accessed_dirty;
-	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
 
 	if (!nested_cpu_has_eptp_switching(vmcs12) ||
 	    !nested_cpu_has_ept(vmcs12))
@@ -5498,13 +5496,10 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
 	if (index >= VMFUNC_EPTP_ENTRIES)
 		return 1;
 
-
 	if (kvm_vcpu_read_guest_page(vcpu, vmcs12->eptp_list_address >> PAGE_SHIFT,
 				     &new_eptp, index * 8, 8))
 		return 1;
 
-	accessed_dirty = !!(new_eptp & VMX_EPTP_AD_ENABLE_BIT);
-
 	/*
 	 * If the (L2) guest does a vmfunc to the currently
 	 * active ept pointer, we don't have to do anything else
@@ -5513,8 +5508,6 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
 		if (!nested_vmx_check_eptp(vcpu, new_eptp))
 			return 1;
 
-		mmu->ept_ad = accessed_dirty;
-		mmu->mmu_role.base.ad_disabled = !accessed_dirty;
 		vmcs12->ept_pointer = new_eptp;
 
 		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 04/15] KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Flush and sync all PGDs for the current/target PCID on MOV CR3 with a
TLB flush, i.e. without PCID_NOFLUSH set.  Paraphrasing Intel's SDM
regarding the behavior of MOV to CR3:

  - If CR4.PCIDE = 0, invalidates all TLB entries associated with PCID
    000H and all entries in all paging-structure caches associated with
    PCID 000H.

  - If CR4.PCIDE = 1 and NOFLUSH=0, invalidates all TLB entries
    associated with the PCID specified in bits 11:0, and all entries in
    all paging-structure caches associated with that PCID. It is not
    required to invalidate entries in the TLBs and paging-structure
    caches that are associated with other PCIDs.

  - If CR4.PCIDE=1 and NOFLUSH=1, is not required to invalidate any TLB
    entries or entries in paging-structure caches.

Extract and reuse the logic for INVPCID(single) which is effectively the
same flow and works even if CR4.PCIDE=0, as the current PCID will be '0'
in that case, thus honoring the requirement of flushing PCID=0.
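
The reuse works because of how KVM derives the active PCID (the existing
helper in arch/x86/kvm/mmu.h, quoted from memory with the BUILD_BUG_ON
elided):

  static inline unsigned long kvm_get_pcid(struct kvm_vcpu *vcpu, gpa_t cr3)
  {
          return kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)
                 ? cr3 & X86_CR3_PCID_MASK
                 : 0;
  }

With CR4.PCIDE=0 the "current PCID" is always 0, so the common
INVPCID(single) flow naturally flushes PCID=0.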

Continue passing skip_tlb_flush to kvm_mmu_new_pgd() even though it
_should_ be redundant; the clean up will be done in a future patch.  The
overhead of an unnecessary nop sync is minimal (especially compared to
the actual sync), and the TLB flush is handled via request.  Avoiding the
the negligible overhead is not worth the risk of breaking kernels that
backport the fix.

Fixes: 956bf3531fba ("kvm: x86: Skip shadow page resync on CR3 switch when indicated by guest")
Cc: Junaid Shahid <junaids@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 69 ++++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 905de6854efa..e2f6d6a1ba54 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1084,25 +1084,45 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 }
 EXPORT_SYMBOL_GPL(kvm_set_cr4);
 
+static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
+{
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	unsigned long roots_to_free = 0;
+	int i;
+
+	/*
+	 * If neither the current CR3 nor any of the prev_roots use the given
+	 * PCID, then nothing needs to be done here because a resync will
+	 * happen anyway before switching to any other CR3.
+	 */
+	if (kvm_get_active_pcid(vcpu) == pcid) {
+		kvm_mmu_sync_roots(vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+	}
+
+	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
+		if (kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd) == pcid)
+			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+
+	kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
+}
+
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	bool skip_tlb_flush = false;
+	unsigned long pcid = 0;
 #ifdef CONFIG_X86_64
 	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
 
 	if (pcid_enabled) {
 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
 		cr3 &= ~X86_CR3_PCID_NOFLUSH;
+		pcid = cr3 & X86_CR3_PCID_MASK;
 	}
 #endif
 
-	if (cr3 == kvm_read_cr3(vcpu) && !pdptrs_changed(vcpu)) {
-		if (!skip_tlb_flush) {
-			kvm_mmu_sync_roots(vcpu);
-			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-		}
-		return 0;
-	}
+	if (cr3 == kvm_read_cr3(vcpu) && !pdptrs_changed(vcpu))
+		goto handle_tlb_flush;
 
 	/*
 	 * Do not condition the GPA check on long mode, this helper is used to
@@ -1115,10 +1135,23 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
 		return 1;
 
-	kvm_mmu_new_pgd(vcpu, cr3, skip_tlb_flush, skip_tlb_flush);
+	if (cr3 != kvm_read_cr3(vcpu))
+		kvm_mmu_new_pgd(vcpu, cr3, skip_tlb_flush, skip_tlb_flush);
+
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
+handle_tlb_flush:
+	/*
+	 * A load of CR3 that flushes the TLB flushes only the current PCID,
+	 * even if PCID is disabled, in which case PCID=0 is flushed.  It's a
+	 * moot point in the end because _disabling_ PCID will flush all PCIDs,
+	 * and it's impossible to use a non-zero PCID when PCID is disabled,
+	 * i.e. only PCID=0 can be relevant.
+	 */
+	if (!skip_tlb_flush)
+		kvm_invalidate_pcid(vcpu, pcid);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_set_cr3);
@@ -11697,8 +11730,6 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 {
 	bool pcid_enabled;
 	struct x86_exception e;
-	unsigned i;
-	unsigned long roots_to_free = 0;
 	struct {
 		u64 pcid;
 		u64 gla;
@@ -11732,23 +11763,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 			return 1;
 		}
 
-		if (kvm_get_active_pcid(vcpu) == operand.pcid) {
-			kvm_mmu_sync_roots(vcpu);
-			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-		}
-
-		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-			if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
-			    == operand.pcid)
-				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
-
-		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
-		/*
-		 * If neither the current cr3 nor any of the prev_roots use the
-		 * given PCID, then nothing needs to be done here because a
-		 * resync will happen anyway before switching to any other CR3.
-		 */
-
+		kvm_invalidate_pcid(vcpu, operand.pcid);
 		return kvm_skip_emulated_instruction(vcpu);
 
 	case INVPCID_TYPE_ALL_NON_GLOBAL:
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 05/15] KVM: x86: Unconditionally skip MMU sync/TLB flush in MOV CR3's PGD switch
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Stop leveraging the MMU sync and TLB flush requested by the fast PGD
switch helper now that kvm_set_cr3() manually handles the necessary sync,
frees, and TLB flush.  This will allow dropping the params from the fast
PGD helpers since nested SVM is now the odd blob out.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e2f6d6a1ba54..02ceb1f606f4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1136,7 +1136,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		return 1;
 
 	if (cr3 != kvm_read_cr3(vcpu))
-		kvm_mmu_new_pgd(vcpu, cr3, skip_tlb_flush, skip_tlb_flush);
+		kvm_mmu_new_pgd(vcpu, cr3, true, true);
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 06/15] KVM: nSVM: Move TLB flushing logic (or lack thereof) to dedicated helper
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Introduce nested_svm_transition_tlb_flush() and use it to force an MMU sync
and TLB flush on nSVM VM-Enter and VM-Exit instead of sneaking the logic
into the __kvm_mmu_new_pgd() call sites.  Add a partial todo list to
document issues that need to be addressed before the unconditional sync
and flush can be modified to look more like nVMX's logic.

In addition to making nSVM's forced flushing more overt (guess who keeps
losing track of it), the new helper brings further convergence between
nSVM and nVMX, and also sets the stage for dropping the "skip" params
from __kvm_mmu_new_pgd().

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c    |  2 +-
 arch/x86/kvm/svm/nested.c | 38 +++++++++++++++++++++++++++++---------
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0144c40d09c7..d7f29bf94ca3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4648,7 +4648,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer,
 	struct kvm_mmu *context = &vcpu->arch.guest_mmu;
 	union kvm_mmu_role new_role = kvm_calc_shadow_npt_root_page_role(vcpu);
 
-	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base, false, false);
+	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base, true, true);
 
 	if (new_role.as_u64 != context->mmu_role.as_u64) {
 		shadow_mmu_init_context(vcpu, context, cr0, cr4, efer, new_role);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5e8d8443154e..fe2705557960 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -380,6 +380,25 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
 	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
 }
 
+static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
+	 * things to fix before this can be conditional:
+	 *
+	 *  - Flush TLBs for both L1 and L2 remote TLB flush
+	 *  - Honor L1's request to flush an ASID on nested VMRUN
+	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
+	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
+	 *  - Flush L1's ASID on KVM_REQ_TLB_FLUSH_GUEST
+	 *
+	 * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
+	 *     NPT guest-physical mappings on VMRUN.
+	 */
+	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+}
+
 /*
  * Load guest's/host's cr3 on nested vmentry or vmexit. @nested_npt is true
  * if we are emulating VM-Entry into a guest with NPT enabled.
@@ -396,12 +415,8 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 			return -EINVAL;
 	}
 
-	/*
-	 * TODO: optimize unconditional TLB flush/MMU sync here and in
-	 * kvm_init_shadow_npt_mmu().
-	 */
 	if (!nested_npt)
-		kvm_mmu_new_pgd(vcpu, cr3, false, false);
+		kvm_mmu_new_pgd(vcpu, cr3, true, true);
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
@@ -481,6 +496,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 {
 	const u32 mask = V_INTR_MASKING_MASK | V_GIF_ENABLE_MASK | V_GIF_MASK;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
 
 	/*
 	 * Filled at exit: exit_code, exit_code_hi, exit_info_1, exit_info_2,
@@ -505,10 +521,10 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 
 	/* nested_cr3.  */
 	if (nested_npt_enabled(svm))
-		nested_svm_init_mmu_context(&svm->vcpu);
+		nested_svm_init_mmu_context(vcpu);
 
-	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset =
-		svm->vcpu.arch.l1_tsc_offset + svm->nested.ctl.tsc_offset;
+	svm->vmcb->control.tsc_offset = vcpu->arch.tsc_offset =
+		vcpu->arch.l1_tsc_offset + svm->nested.ctl.tsc_offset;
 
 	svm->vmcb->control.int_ctl             =
 		(svm->nested.ctl.int_ctl & ~mask) |
@@ -523,8 +539,10 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	svm->vmcb->control.pause_filter_count  = svm->nested.ctl.pause_filter_count;
 	svm->vmcb->control.pause_filter_thresh = svm->nested.ctl.pause_filter_thresh;
 
+	nested_svm_transition_tlb_flush(vcpu);
+
 	/* Enter Guest-Mode */
-	enter_guest_mode(&svm->vcpu);
+	enter_guest_mode(vcpu);
 
 	/*
 	 * Merge guest and host intercepts - must be called with vcpu in
@@ -803,6 +821,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	kvm_vcpu_unmap(vcpu, &map, true);
 
+	nested_svm_transition_tlb_flush(vcpu);
+
 	nested_svm_uninit_mmu_context(vcpu);
 
 	rc = nested_svm_load_cr3(vcpu, svm->vmcb->save.cr3, false);
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 07/15] KVM: x86: Drop skip MMU sync and TLB flush params from "new PGD" helpers
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Drop skip_mmu_sync and skip_tlb_flush from __kvm_mmu_new_pgd() now that
all call sites unconditionally skip both the sync and flush.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +--
 arch/x86/kvm/mmu/mmu.c          | 17 +++++++----------
 arch/x86/kvm/svm/nested.c       |  2 +-
 arch/x86/kvm/vmx/nested.c       |  6 +-----
 arch/x86/kvm/x86.c              |  2 +-
 5 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6652e51a86fd..c05448d3beff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1675,8 +1675,7 @@ void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			    gva_t gva, hpa_t root_hpa);
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
-void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush,
-		     bool skip_mmu_sync);
+void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
 
 void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
 		       int tdp_huge_page_level);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d7f29bf94ca3..a832e0fedf32 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3913,8 +3913,7 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 }
 
 static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
-			      union kvm_mmu_page_role new_role,
-			      bool skip_tlb_flush, bool skip_mmu_sync)
+			      union kvm_mmu_page_role new_role)
 {
 	if (!fast_pgd_switch(vcpu, new_pgd, new_role)) {
 		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
@@ -3929,10 +3928,10 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 	 */
 	kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
 
-	if (!skip_mmu_sync || force_flush_and_sync_on_reuse)
+	if (force_flush_and_sync_on_reuse) {
 		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-	if (!skip_tlb_flush || force_flush_and_sync_on_reuse)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+	}
 
 	/*
 	 * The last MMIO access's GVA and GPA are cached in the VCPU. When
@@ -3951,11 +3950,9 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 				to_shadow_page(vcpu->arch.mmu->root_hpa));
 }
 
-void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush,
-		     bool skip_mmu_sync)
+void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 {
-	__kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu),
-			  skip_tlb_flush, skip_mmu_sync);
+	__kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu));
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
 
@@ -4648,7 +4645,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer,
 	struct kvm_mmu *context = &vcpu->arch.guest_mmu;
 	union kvm_mmu_role new_role = kvm_calc_shadow_npt_root_page_role(vcpu);
 
-	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base, true, true);
+	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
 
 	if (new_role.as_u64 != context->mmu_role.as_u64) {
 		shadow_mmu_init_context(vcpu, context, cr0, cr4, efer, new_role);
@@ -4700,7 +4697,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
 						   execonly, level);
 
-	__kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base, true, true);
+	__kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base);
 
 	if (new_role.as_u64 == context->mmu_role.as_u64)
 		return;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fe2705557960..ccd90ea93acd 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -416,7 +416,7 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	}
 
 	if (!nested_npt)
-		kvm_mmu_new_pgd(vcpu, cr3, true, true);
+		kvm_mmu_new_pgd(vcpu, cr3);
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e102a5c10a83..3fb87e5aead4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1125,12 +1125,8 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 		}
 	}
 
-	/*
-	 * Unconditionally skip the TLB flush on fast CR3 switch, all TLB
-	 * flushes are handled by nested_vmx_transition_tlb_flush().
-	 */
 	if (!nested_ept) {
-		kvm_mmu_new_pgd(vcpu, cr3, true, true);
+		kvm_mmu_new_pgd(vcpu, cr3);
 
 		/*
 		 * A TLB flush on VM-Enter/VM-Exit flushes all linear mappings
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02ceb1f606f4..117acfbc7ba9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1136,7 +1136,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		return 1;
 
 	if (cr3 != kvm_read_cr3(vcpu))
-		kvm_mmu_new_pgd(vcpu, cr3, true, true);
+		kvm_mmu_new_pgd(vcpu, cr3);
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 08/15] KVM: nVMX: Consolidate VM-Enter/VM-Exit TLB flush and MMU sync logic
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Drop the dedicated nested_vmx_transition_mmu_sync() now that the MMU sync
is handled via KVM_REQ_TLB_FLUSH_GUEST, and fold that flush into the
all-encompassing nested_vmx_transition_tlb_flush().

Opportunistically add a comment explaining why nested EPT never needs to
sync the MMU on VM-Enter.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 91 +++++++++++----------------------------
 1 file changed, 25 insertions(+), 66 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3fb87e5aead4..6d7c368c92e7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1057,48 +1057,6 @@ static void prepare_vmx_msr_autostore_list(struct kvm_vcpu *vcpu,
 	}
 }
 
-/*
- * Returns true if the MMU needs to be sync'd on nested VM-Enter/VM-Exit.
- * tl;dr: the MMU needs a sync if L0 is using shadow paging and L1 didn't
- * enable VPID for L2 (implying it expects a TLB flush on VMX transitions).
- * Here's why.
- *
- * If EPT is enabled by L0 a sync is never needed:
- * - if it is disabled by L1, then L0 is not shadowing L1 or L2 PTEs, there
- *   cannot be unsync'd SPTEs for either L1 or L2.
- *
- * - if it is also enabled by L1, then L0 doesn't need to sync on VM-Enter
- *   VM-Enter as VM-Enter isn't required to invalidate guest-physical mappings
- *   (irrespective of VPID), i.e. L1 can't rely on the (virtual) CPU to flush
- *   stale guest-physical mappings for L2 from the TLB.  And as above, L0 isn't
- *   shadowing L1 PTEs so there are no unsync'd SPTEs to sync on VM-Exit.
- *
- * If EPT is disabled by L0:
- * - if VPID is enabled by L1 (for L2), the situation is similar to when L1
- *   enables EPT: L0 doesn't need to sync as VM-Enter and VM-Exit aren't
- *   required to invalidate linear mappings (EPT is disabled so there are
- *   no combined or guest-physical mappings), i.e. L1 can't rely on the
- *   (virtual) CPU to flush stale linear mappings for either L2 or itself (L1).
- *
- * - however if VPID is disabled by L1, then a sync is needed as L1 expects all
- *   linear mappings (EPT is disabled so there are no combined or guest-physical
- *   mappings) to be invalidated on both VM-Enter and VM-Exit.
- *
- * Note, this logic is subtly different than nested_has_guest_tlb_tag(), which
- * additionally checks that L2 has been assigned a VPID (when EPT is disabled).
- * Whether or not L2 has been assigned a VPID by L0 is irrelevant with respect
- * to L1's expectations, e.g. L0 needs to invalidate hardware TLB entries if L2
- * doesn't have a unique VPID to prevent reusing L1's entries (assuming L1 has
- * been assigned a VPID), but L0 doesn't need to do a MMU sync because L1
- * doesn't expect stale (virtual) TLB entries to be flushed, i.e. L1 doesn't
- * know that L0 will flush the TLB and so L1 will do INVVPID as needed to flush
- * stale TLB entries, at which point L0 will sync L2's MMU.
- */
-static bool nested_vmx_transition_mmu_sync(struct kvm_vcpu *vcpu)
-{
-	return !enable_ept && !nested_cpu_has_vpid(get_vmcs12(vcpu));
-}
-
 /*
  * Load guest's/host's cr3 at nested entry/exit.  @nested_ept is true if we are
  * emulating VM-Entry into a guest with EPT enabled.  On failure, the expected
@@ -1125,18 +1083,9 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 		}
 	}
 
-	if (!nested_ept) {
+	if (!nested_ept)
 		kvm_mmu_new_pgd(vcpu, cr3);
 
-		/*
-		 * A TLB flush on VM-Enter/VM-Exit flushes all linear mappings
-		 * across all PCIDs, i.e. all PGDs need to be synchronized.
-		 * See nested_vmx_transition_mmu_sync() for more details.
-		 */
-		if (nested_vmx_transition_mmu_sync(vcpu))
-			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
-	}
-
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
@@ -1172,18 +1121,29 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	/*
-	 * If VPID is disabled, linear and combined mappings are flushed on
-	 * VM-Enter/VM-Exit, and guest-physical mappings are valid only for
-	 * their associated EPTP.
-	 */
-	if (!enable_vpid)
-		return;
-
 	/*
 	 * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
-	 * for *all* contexts to be flushed on VM-Enter/VM-Exit.
+	 * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
+	 * full TLB flush from the guest's perspective.  This is required even
+	 * if VPID is disabled in the host as KVM may need to synchronize the
+	 * MMU in response to the guest TLB flush.
 	 *
+	 * Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use.
+	 * EPT is a special snowflake, as guest-physical mappings aren't
+	 * flushed on VPID invalidations, including VM-Enter or VM-Exit with
+	 * VPID disabled.  As a result, KVM _never_ needs to sync nEPT
+	 * entries on VM-Enter because L1 can't rely on VM-Enter to flush
+	 * those mappings.
+	 */
+	if (!nested_cpu_has_vpid(vmcs12)) {
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+		return;
+	}
+
+	/* L2 should never have a VPID if VPID is disabled. */
+	WARN_ON(!enable_vpid);
+
+	/*
 	 * If VPID is enabled and used by vmc12, but L2 does not have a unique
 	 * TLB tag (ASID), i.e. EPT is disabled and KVM was unable to allocate
 	 * a VPID for L2, flush the current context as the effective ASID is
@@ -1195,13 +1155,12 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
 	 *
 	 * If a TLB flush isn't required due to any of the above, and vpid12 is
 	 * changing then the new "virtual" VPID (vpid12) will reuse the same
-	 * "real" VPID (vpid02), and so needs to be sync'd.  There is no direct
+	 * "real" VPID (vpid02), and so needs to be flushed.  There's no direct
 	 * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
-	 * all nested vCPUs.
+	 * all nested vCPUs.  Remember, a flush on VM-Enter does not invalidate
+	 * guest-physical mappings, so there is no need to sync the nEPT MMU.
 	 */
-	if (!nested_cpu_has_vpid(vmcs12)) {
-		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-	} else if (!nested_has_guest_tlb_tag(vcpu)) {
+	if (!nested_has_guest_tlb_tag(vcpu)) {
 		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 	} else if (is_vmenter &&
 		   vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 09/15] KVM: nVMX: Free only guest_mode (L2) roots on INVVPID w/o EPT
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

When emulating INVVPID for L1, free only L2+ roots, using the guest_mode
tag in the MMU role to identify L2+ roots.  From L1's perspective, its
own TLB entries use VPID=0, and INVVPID is not required to invalidate such
entries.  Per Intel's SDM, INVVPID _may_ invalidate entries with VPID=0,
but it is not required to do so.

Cc: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c          | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx/nested.c       |  7 +++----
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c05448d3beff..05c5ca047c53 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1650,6 +1650,7 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			ulong roots_to_free);
+void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu);
 gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
 			   struct x86_exception *exception);
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a832e0fedf32..f987f2ea4a01 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3180,6 +3180,33 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
 
+void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+{
+	unsigned long roots_to_free = 0;
+	hpa_t root_hpa;
+	int i;
+
+	/*
+	 * This should not be called while L2 is active, L2 can't invalidate
+	 * _only_ its own roots, e.g. INVVPID unconditionally exits.
+	 */
+	WARN_ON_ONCE(mmu->mmu_role.base.guest_mode);
+
+	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
+		root_hpa = mmu->prev_roots[i].hpa;
+		if (!VALID_PAGE(root_hpa))
+			continue;
+
+		if (!to_shadow_page(root_hpa) ||
+			to_shadow_page(root_hpa)->role.guest_mode)
+			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+	}
+
+	kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
+
+
 static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
 {
 	int ret = 0;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6d7c368c92e7..2a881afc1fd0 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5423,8 +5423,8 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 
 	/*
 	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
-	 * linear mappings for L2 (tagged with L2's VPID).  Free all roots as
-	 * VPIDs are not tracked in the MMU role.
+	 * linear mappings for L2 (tagged with L2's VPID).  Free all guest
+	 * roots as VPIDs are not tracked in the MMU role.
 	 *
 	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
 	 * an MMU when EPT is disabled.
@@ -5432,8 +5432,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
 	 */
 	if (!enable_ept)
-		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
-				   KVM_MMU_ROOTS_ALL);
+		kvm_mmu_free_guest_mode_roots(vcpu, &vcpu->arch.root_mmu);
 
 	return nested_vmx_succeed(vcpu);
 }
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 10/15] KVM: x86: Use KVM_REQ_TLB_FLUSH_GUEST to handle INVPCID(ALL) emulation
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Use KVM_REQ_TLB_FLUSH_GUEST instead of KVM_REQ_MMU_RELOAD when emulating
INVPCID of all contexts.  In the current code, this is a glorified nop as
TLB_FLUSH_GUEST becomes kvm_mmu_unload(), same as MMU_RELOAD, when TDP
is disabled, which is the only time INVPCID is intercepted+emulated.
In the future, reusing TLB_FLUSH_GUEST will simplify optimizing paths
that emulate a guest TLB flush, e.g. by synchronizing as needed instead
of completely unloading all MMUs.
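
For context, servicing KVM_REQ_TLB_FLUSH_GUEST looks roughly like this
as of commit b53e84eed08b (paraphrased from memory, not an exact quote
of x86.c):

  static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
  {
          ++vcpu->stat.tlb_flush;

          if (!tdp_enabled) {
                  /*
                   * Unloading the MMU forces a sync of the shadow page
                   * tables (and a TLB flush) on the subsequent MMU load.
                   */
                  kvm_mmu_unload(vcpu);
                  return;
          }

          static_call(kvm_x86_tlb_flush_guest)(vcpu);
  }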

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 117acfbc7ba9..9620d8936dc4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11776,7 +11776,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 
 		fallthrough;
 	case INVPCID_TYPE_ALL_INCL_GLOBAL:
-		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 		return kvm_skip_emulated_instruction(vcpu);
 
 	default:
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 11/15] KVM: nVMX: Use fast PGD switch when emulating VMFUNC[EPTP_SWITCH]
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Use __kvm_mmu_new_pgd() via kvm_init_shadow_ept_mmu() to emulate
VMFUNC[EPTP_SWITCH] instead of nuking all MMUs.  EPTP_SWITCH is the EPT
equivalent of MOV to CR3, i.e. is a perfect fit for the common PGD flow,
the only hiccup being that A/D enabling is buried in the EPTP.  But, that
is easily handled by bouncing through kvm_init_shadow_ept_mmu().

Explicitly request a guest TLB flush if VPID is disabled.  Per Intel's
SDM, if VPID is disabled, "an EPTP-switching VMFUNC invalidates combined
mappings associated with VPID 0000H (for all PCIDs and for all EP4TA
values, where EP4TA is the value of bits 51:12 of EPTP)".

Note, this technically is a very bizarre bug fix of sorts if L2 is using
PAE paging, as avoiding the full MMU reload also avoids incorrectly
reloading the PDPTEs, which the SDM explicitly states are not touched:

  If PAE paging is in use, an EPTP-switching VMFUNC does not load the
  four page-directory-pointer-table entries (PDPTEs) from the
  guest-physical address in CR3. The logical processor continues to use
  the four guest-physical addresses already present in the PDPTEs. The
  guest-physical address in CR3 is not translated through the new EPT
  paging structures (until some operation that would load the PDPTEs).

In addition to optimizing L2's MMU shenanigans, avoiding the full reload
also optimizes L1's MMU as KVM_REQ_MMU_RELOAD wipes out all roots in both
root_mmu and guest_mmu.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2a881afc1fd0..4b8f5dca49ac 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -346,16 +346,21 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
 
+static void nested_ept_new_eptp(struct kvm_vcpu *vcpu)
+{
+	kvm_init_shadow_ept_mmu(vcpu,
+				to_vmx(vcpu)->nested.msrs.ept_caps &
+				VMX_EPT_EXECUTE_ONLY_BIT,
+				nested_ept_ad_enabled(vcpu),
+				nested_ept_get_eptp(vcpu));
+}
+
 static void nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
 {
 	WARN_ON(mmu_is_nested(vcpu));
 
 	vcpu->arch.mmu = &vcpu->arch.guest_mmu;
-	kvm_init_shadow_ept_mmu(vcpu,
-			to_vmx(vcpu)->nested.msrs.ept_caps &
-			VMX_EPT_EXECUTE_ONLY_BIT,
-			nested_ept_ad_enabled(vcpu),
-			nested_ept_get_eptp(vcpu));
+	nested_ept_new_eptp(vcpu);
 	vcpu->arch.mmu->get_guest_pgd     = nested_ept_get_eptp;
 	vcpu->arch.mmu->inject_page_fault = nested_ept_inject_page_fault;
 	vcpu->arch.mmu->get_pdptr         = kvm_pdptr_read;
@@ -5463,8 +5468,10 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
 			return 1;
 
 		vmcs12->ept_pointer = new_eptp;
+		nested_ept_new_eptp(vcpu);
 
-		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+		if (!nested_cpu_has_vpid(vmcs12))
+			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 	}
 
 	return 0;
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 12/15] KVM: x86: Defer MMU sync on PCID invalidation
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Defer the MMU sync on PCID invalidation so that multiple sync requests in
a single VM-Exit are batched.  This is a very minor optimization as
checking for unsync'd children is quite cheap.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9620d8936dc4..d3a2a3375541 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1096,7 +1096,7 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 	 * happen anyway before switching to any other CR3.
 	 */
 	if (kvm_get_active_pcid(vcpu) == pcid) {
-		kvm_mmu_sync_roots(vcpu);
+		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 	}
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 13/15] KVM: x86: Drop pointless @reset_roots from kvm_init_mmu()
From: Sean Christopherson @ 2021-06-09 23:42 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Remove the @reset_roots param from kvm_init_mmu(); the sole user that
passes reset_roots=true, kvm_mmu_reset_context(), has already unloaded
the MMU and thus freed and invalidated all roots.  This also happens to
be why the reset_roots=true path doesn't leak roots; they're already
invalid.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu.h        |  2 +-
 arch/x86/kvm/mmu/mmu.c    | 13 ++-----------
 arch/x86/kvm/svm/nested.c |  2 +-
 arch/x86/kvm/vmx/nested.c |  2 +-
 arch/x86/kvm/x86.c        |  2 +-
 5 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 88d0ed5225a4..63b49725fb24 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -65,7 +65,7 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
 void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
-void kvm_init_mmu(struct kvm_vcpu *vcpu, bool reset_roots);
+void kvm_init_mmu(struct kvm_vcpu *vcpu);
 void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer,
 			     gpa_t nested_cr3);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f987f2ea4a01..b4fa8ec8afce 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4817,17 +4817,8 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 	update_last_nonleaf_level(vcpu, g_context);
 }
 
-void kvm_init_mmu(struct kvm_vcpu *vcpu, bool reset_roots)
+void kvm_init_mmu(struct kvm_vcpu *vcpu)
 {
-	if (reset_roots) {
-		uint i;
-
-		vcpu->arch.mmu->root_hpa = INVALID_PAGE;
-
-		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-			vcpu->arch.mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
-	}
-
 	if (mmu_is_nested(vcpu))
 		init_kvm_nested_mmu(vcpu);
 	else if (tdp_enabled)
@@ -4853,7 +4844,7 @@ kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu)
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
 {
 	kvm_mmu_unload(vcpu);
-	kvm_init_mmu(vcpu, true);
+	kvm_init_mmu(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ccd90ea93acd..8a4276d8753d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -421,7 +421,7 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
-	kvm_init_mmu(vcpu, false);
+	kvm_init_mmu(vcpu);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4b8f5dca49ac..f686618d9ede 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1094,7 +1094,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
-	kvm_init_mmu(vcpu, false);
+	kvm_init_mmu(vcpu);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d3a2a3375541..32e93492273f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10409,7 +10409,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
 	kvm_vcpu_reset(vcpu, false);
-	kvm_init_mmu(vcpu, false);
+	kvm_init_mmu(vcpu);
 	vcpu_put(vcpu);
 	return 0;
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog
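
For illustration, a toy model of the ordering that makes the param dead
weight; the toy_* names are invented, the real sequence is
kvm_mmu_unload() followed by kvm_init_mmu() as in the mmu.c hunk above:

#include <stdio.h>

#define INVALID_PAGE	((unsigned long long)-1)
#define NUM_PREV_ROOTS	3

struct toy_mmu {
	unsigned long long root_hpa;
	unsigned long long prev_roots[NUM_PREV_ROOTS];
};

/* Models kvm_mmu_unload(): frees and invalidates every root. */
static void toy_unload(struct toy_mmu *mmu)
{
	int i;

	mmu->root_hpa = INVALID_PAGE;
	for (i = 0; i < NUM_PREV_ROOTS; i++)
		mmu->prev_roots[i] = INVALID_PAGE;
}

/* Models kvm_init_mmu() after this patch: role/callback setup only,
 * roots are never touched. */
static void toy_init(struct toy_mmu *mmu)
{
	(void)mmu;
}

int main(void)
{
	struct toy_mmu mmu = { .root_hpa = 0x1000 };

	/* kvm_mmu_reset_context() ordering: by the time init runs, a
	 * reset_roots pass could only rewrite already-invalid values. */
	toy_unload(&mmu);
	toy_init(&mmu);

	printf("root_hpa = %#llx (invalid before init ever ran)\n",
	       mmu.root_hpa);
	return 0;
}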



* [PATCH 14/15] KVM: nVMX: WARN if subtly-impossible VMFUNC conditions occur
  2021-06-09 23:42 [PATCH 00/15] KVM: x86/mmu: TLB fixes and related cleanups Sean Christopherson
                   ` (12 preceding siblings ...)
  2021-06-09 23:42 ` [PATCH 13/15] KVM: x86: Drop pointless @reset_roots from kvm_init_mmu() Sean Christopherson
@ 2021-06-09 23:42 ` Sean Christopherson
  2021-06-09 23:42 ` [PATCH 15/15] KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation Sean Christopherson
  2021-06-10 16:10 ` [PATCH 00/15] KVM: x86/mmu: TLB fixes and related cleanups Paolo Bonzini
  15 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2021-06-09 23:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

WARN and inject #UD when emulating VMFUNC for L2 if the function is
out-of-bounds or if VMFUNC is not enabled in vmcs12.  Neither condition
should occur in practice, as the CPU is supposed to prioritize the #UD
over VM-Exit for out-of-bounds input and KVM is supposed to enable
VMFUNC in vmcs02 if and only if it's enabled in vmcs12, but neither of
those dependencies is obvious.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f686618d9ede..0075d3f0f8fa 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5494,6 +5494,16 @@ static int handle_vmfunc(struct kvm_vcpu *vcpu)
 	}
 
 	vmcs12 = get_vmcs12(vcpu);
+
+	/*
+	 * #UD on out-of-bounds function has priority over VM-Exit, and VMFUNC
+	 * is enabled in vmcs02 if and only if it's enabled in vmcs12.
+	 */
+	if (WARN_ON_ONCE((function > 63) || !nested_cpu_has_vmfunc(vmcs12))) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
 	if (!(vmcs12->vm_function_control & BIT_ULL(function)))
 		goto fail;
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog
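
A side benefit of the WARN's placement, sketched as a toy model below
(toy_* names invented): BIT_ULL(function) expands to a 64-bit shift,
which is undefined behavior in C for a shift count above 63, so the
bounds check has to come first in the emulator as well as
architecturally:

#include <stdint.h>
#include <stdio.h>

#define BIT_ULL(n)	(1ULL << (n))

/* Toy model: reject out-of-bounds functions *before* testing the
 * bitmap, mirroring the CPU's #UD-over-VM-Exit priority and dodging
 * the undefined 1ULL << 64. */
static int toy_vmfunc_enabled(uint64_t vm_function_control,
			      unsigned int function)
{
	if (function > 63)
		return 0;	/* the real code injects #UD here */
	return !!(vm_function_control & BIT_ULL(function));
}

int main(void)
{
	uint64_t ctl = BIT_ULL(0);	/* only function 0 enabled */

	printf("function 0:  %d\n", toy_vmfunc_enabled(ctl, 0));	/* 1 */
	printf("function 1:  %d\n", toy_vmfunc_enabled(ctl, 1));	/* 0 */
	printf("function 64: %d\n", toy_vmfunc_enabled(ctl, 64));	/* 0 */
	return 0;
}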



* [PATCH 15/15] KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation
  2021-06-09 23:42 [PATCH 00/15] KVM: x86/mmu: TLB fixes and related cleanups Sean Christopherson
                   ` (13 preceding siblings ...)
  2021-06-09 23:42 ` [PATCH 14/15] KVM: nVMX: WARN if subtly-impossible VMFUNC conditions occur Sean Christopherson
@ 2021-06-09 23:42 ` Sean Christopherson
  2021-06-10 16:09   ` Paolo Bonzini
  2021-06-10 16:10 ` [PATCH 00/15] KVM: x86/mmu: TLB fixes and related cleanups Paolo Bonzini
  15 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2021-06-09 23:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Junaid Shahid, Maxim Levitsky,
	Lai Jiangshan

Drop the explicit checks on EPTP switching and EPT itself being enabled.
The EPTP switching check is handled in the generic VMFUNC function check,
the underlying VMFUNC enablement check is done by hardware and redone
by generic VMFUNC emulation, and the vmcs12 EPT check is handled by KVM
at VM-Enter in the form of a consistency check.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0075d3f0f8fa..479ec9378609 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5448,10 +5448,6 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
 	u32 index = kvm_rcx_read(vcpu);
 	u64 new_eptp;
 
-	if (!nested_cpu_has_eptp_switching(vmcs12) ||
-	    !nested_cpu_has_ept(vmcs12))
-		return 1;
-
 	if (index >= VMFUNC_EPTP_ENTRIES)
 		return 1;
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog
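
A toy sketch of the resulting layering, i.e. why the per-function
handler can now trust its caller (toy_* names invented; 512 matches
KVM's VMFUNC_EPTP_ENTRIES, a 4KiB EPTP list page of 8-byte entries):

#include <stdint.h>
#include <stdio.h>

#define TOY_EPTP_ENTRIES	512

/* Per-function handler: validates only its own operand, relying on
 * the dispatcher for function enablement. */
static int toy_eptp_switching(unsigned int index)
{
	if (index >= TOY_EPTP_ENTRIES)
		return 1;	/* fail -> nested VM-Exit in the real code */
	return 0;
}

/* Generic dispatcher: the one place that checks the enable bitmap,
 * so dropping the re-check in the handler loses nothing. */
static int toy_handle_vmfunc(uint64_t vm_function_control,
			     unsigned int function, unsigned int index)
{
	if (function > 63 || !(vm_function_control & (1ULL << function)))
		return 1;
	if (function == 0)	/* EPTP switching is VM function 0 */
		return toy_eptp_switching(index);
	return 1;
}

int main(void)
{
	printf("%d\n", toy_handle_vmfunc(1ULL << 0, 0, 3));	/* 0: ok */
	printf("%d\n", toy_handle_vmfunc(0, 0, 3));		/* 1: disabled */
	return 0;
}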



* Re: [PATCH 15/15] KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation
  2021-06-09 23:42 ` [PATCH 15/15] KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation Sean Christopherson
@ 2021-06-10 16:09   ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2021-06-10 16:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Junaid Shahid, Maxim Levitsky, Lai Jiangshan

On 10/06/21 01:42, Sean Christopherson wrote:
> Drop the explicit checks on EPTP switching and EPT itself being enabled.
> The EPTP switching check is handled in the generic VMFUNC function check,
> the underlying VMFUNC enablement check is done by hardware and redone
> by generic VMFUNC emulation, and the vmcs12 EPT check is handled by KVM
> at VM-Enter in the form of a consistency check.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/vmx/nested.c | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 0075d3f0f8fa..479ec9378609 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5448,10 +5448,6 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
>   	u32 index = kvm_rcx_read(vcpu);
>   	u64 new_eptp;
>   
> -	if (!nested_cpu_has_eptp_switching(vmcs12) ||
> -	    !nested_cpu_has_ept(vmcs12))
> -		return 1;
> -

Perhaps the EPT enabled check is worth keeping with a WARN_ON_ONCE?
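
Something like this, perhaps (untested, and only a sketch of the idea;
the early return mirrors the dropped check):

	/* sketch only: same spot as the dropped vmcs12 EPT check */
	if (WARN_ON_ONCE(!nested_cpu_has_ept(vmcs12)))
		return 1;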

Paolo

>   	if (index >= VMFUNC_EPTP_ENTRIES)
>   		return 1;
>   
> 



* Re: [PATCH 00/15] KVM: x86/mmu: TLB fixes and related cleanups
  2021-06-09 23:42 [PATCH 00/15] KVM: x86/mmu: TLB fixes and related cleanups Sean Christopherson
                   ` (14 preceding siblings ...)
  2021-06-09 23:42 ` [PATCH 15/15] KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation Sean Christopherson
@ 2021-06-10 16:10 ` Paolo Bonzini
  15 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2021-06-10 16:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Junaid Shahid, Maxim Levitsky, Lai Jiangshan

On 10/06/21 01:42, Sean Christopherson wrote:
> Fixes for two (very) theoretical TLB flushing bugs (patches 1 and 4),
> and clean ups on top to (hopefully) consolidate and simplifiy the TLB
> flushing logic.
> 
> The basic gist of the TLB flush and MMU sync code shuffling  is to stop
> relying on the logic in __kvm_mmu_new_pgd() (but keep it for forced
> flushing), and instead handle the flush+sync logic in the caller
> independent from whether or not the "fast" switch occurs.
> 
> I spent a fair bit of time trying to shove the necessary logic down into
> __kvm_mmu_new_pgd(), but it always ended up a complete mess because the
> requirements and contextual information is always different.  The rules
> for MOV CR3 are different from nVMX transitions (and those vary based on
> EPT+VPID), and nSVM will be different still (once it adds proper TLB
> handling).  In particular, I like that nVMX no longer has special code
> for synchronizing the MMU when using shadowing paging and instead relies
> on the common rules for TLB flushing.
> 
> Note, this series (indirectly) relies heavily on commit b53e84eed08b
> ("KVM: x86: Unload MMU on guest TLB flush if TDP disabled to force MMU
> sync"), as it uses KVM_REQ_TLB_FLUSH_GUEST (was KVM_REQ_HV_TLB_FLUSH)
> to do the TLB flush _and_ the MMU sync in non-PV code.
> 
> Tested all combinations for i386, EPT, NPT, and shadow paging. I think...
> 
> The EPTP switching and INVPCID single-context changes in particular lack
> meaningful coverage in kvm-unit-tests+Linux.  Long term it's on my todo
> list to remedy that, but realistically I doubt I'll get it done anytime
> soon.
> 
> To test EPTP switching, I hacked L1 to set up a duplicate top-level EPT
> table, copy the "real" table to the duplicate table on EPT violation,
> populate VMFUNC.EPTP_LIST with the two EPTPs, expose  VMFUNC.EPTP_SWITCH
> to L2.  I then hacked L2 to do an EPTP switch to a random (valid) EPTP
> index on every task switch.
> 
> To test INVPCID single-context I modified L1 to iterate over all possible
> PCIDs using INVPCID single-context in native_flush_tlb_global().  I also
> verified that the guest crashed if it didn't do any INVPCID at all
> (interestingly, the guest made it through boot without the flushes when
> EPT was enabled, which implies the missing MMU sync on INVPCID was the
> source of the crash, not a stale TLB entry).
> 
> Sean Christopherson (15):
>    KVM: nVMX: Sync all PGDs on nested transition with shadow paging
>    KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap
>    KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch
>    KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush
>    KVM: x86: Uncondtionally skip MMU sync/TLB flush in MOV CR3's PGD
>      switch
>    KVM: nSVM: Move TLB flushing logic (or lack thereof) to dedicated
>      helper
>    KVM: x86: Drop skip MMU sync and TLB flush params from "new PGD"
>      helpers
>    KVM: nVMX: Consolidate VM-Enter/VM-Exit TLB flush and MMU sync logic
>    KVM: nVMX: Free only guest_mode (L2) roots on INVVPID w/o EPT
>    KVM: x86: Use KVM_REQ_TLB_FLUSH_GUEST to handle INVPCID(ALL) emulation
>    KVM: nVMX: Use fast PGD switch when emulating VMFUNC[EPTP_SWITCH]
>    KVM: x86: Defer MMU sync on PCID invalidation
>    KVM: x86: Drop pointless @reset_roots from kvm_init_mmu()
>    KVM: nVMX: WARN if subtly-impossible VMFUNC conditions occur
>    KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation
> 
>   arch/x86/include/asm/kvm_host.h |   6 +-
>   arch/x86/kvm/hyperv.c           |   2 +-
>   arch/x86/kvm/mmu.h              |   2 +-
>   arch/x86/kvm/mmu/mmu.c          |  57 ++++++++-----
>   arch/x86/kvm/svm/nested.c       |  40 ++++++---
>   arch/x86/kvm/vmx/nested.c       | 139 ++++++++++++--------------------
>   arch/x86/kvm/x86.c              |  75 ++++++++++-------
>   7 files changed, 169 insertions(+), 152 deletions(-)
> 

I tried this a couple of times but was blocked on what is essentially 
your first patch, so thanks!  Patches queued for 5.14.

Paolo



end of thread

Thread overview: 18+ messages
2021-06-09 23:42 [PATCH 00/15] KVM: x86/mmu: TLB fixes and related cleanups Sean Christopherson
2021-06-09 23:42 ` [PATCH 01/15] KVM: nVMX: Sync all PGDs on nested transition with shadow paging Sean Christopherson
2021-06-09 23:42 ` [PATCH 02/15] KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap Sean Christopherson
2021-06-09 23:42 ` [PATCH 03/15] KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch Sean Christopherson
2021-06-09 23:42 ` [PATCH 04/15] KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush Sean Christopherson
2021-06-09 23:42 ` [PATCH 05/15] KVM: x86: Uncondtionally skip MMU sync/TLB flush in MOV CR3's PGD switch Sean Christopherson
2021-06-09 23:42 ` [PATCH 06/15] KVM: nSVM: Move TLB flushing logic (or lack thereof) to dedicated helper Sean Christopherson
2021-06-09 23:42 ` [PATCH 07/15] KVM: x86: Drop skip MMU sync and TLB flush params from "new PGD" helpers Sean Christopherson
2021-06-09 23:42 ` [PATCH 08/15] KVM: nVMX: Consolidate VM-Enter/VM-Exit TLB flush and MMU sync logic Sean Christopherson
2021-06-09 23:42 ` [PATCH 09/15] KVM: nVMX: Free only guest_mode (L2) roots on INVVPID w/o EPT Sean Christopherson
2021-06-09 23:42 ` [PATCH 10/15] KVM: x86: Use KVM_REQ_TLB_FLUSH_GUEST to handle INVPCID(ALL) emulation Sean Christopherson
2021-06-09 23:42 ` [PATCH 11/15] KVM: nVMX: Use fast PGD switch when emulating VMFUNC[EPTP_SWITCH] Sean Christopherson
2021-06-09 23:42 ` [PATCH 12/15] KVM: x86: Defer MMU sync on PCID invalidation Sean Christopherson
2021-06-09 23:42 ` [PATCH 13/15] KVM: x86: Drop pointless @reset_roots from kvm_init_mmu() Sean Christopherson
2021-06-09 23:42 ` [PATCH 14/15] KVM: nVMX: WARN if subtly-impossible VMFUNC conditions occur Sean Christopherson
2021-06-09 23:42 ` [PATCH 15/15] KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation Sean Christopherson
2021-06-10 16:09   ` Paolo Bonzini
2021-06-10 16:10 ` [PATCH 00/15] KVM: x86/mmu: TLB fixes and related cleanups Paolo Bonzini
