linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes
@ 2022-02-17 21:03 Paolo Bonzini
  2022-02-17 21:03 ` [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU Paolo Bonzini
                   ` (17 more replies)
  0 siblings, 18 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

The TDP MMU has a performance regression compared to the legacy MMU
when CR0 changes often.  This was reported for the grsecurity kernel,
which uses CR0.WP to implement kernel W^X.  In that case, each change to
CR0.WP unloads the MMU and causes a lot of unnecessary work.  When running
nested, this can even cause L1 to hardly make progress, as the L0
hypervisor is overwhelmed by the amount of MMU work that is needed.

The root cause of kvm_mmu_reset_context calling kvm_mmu_unload is a
subtlety in the implementation of fast PGD switching, which requires a
call to kvm_mmu_new_pgd (and therefore knowing the new MMU role) *before*
kvm_init_mmu.  kvm_mmu_reset_context chickens out and does not do fast
PGD switching at all, dropping all the roots instead.

Therefore, the most important part of this series is a reorganization
of fast PGD switching; it makes it possible to call kvm_mmu_new_pgd
*after* the MMU has been set up, just using the MMU role instead of
kvm_mmu_calc_root_page_role.
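
As an illustration of the new ordering, here is a simplified sketch of
the nested_svm_load_cr3 flow before and after patches 10 and 11 (error
handling and PDPTR loading omitted, comments mine):

    /* Before: the role must be known before kvm_init_mmu, because
     * kvm_mmu_new_pgd recomputes it via kvm_mmu_calc_root_page_role.
     */
    if (!nested_npt)
            kvm_mmu_new_pgd(vcpu, cr3);
    vcpu->arch.cr3 = cr3;
    kvm_init_mmu(vcpu);

    /* After: kvm_init_mmu computes the role, and kvm_mmu_new_pgd
     * just reads mmu->mmu_role.base from the current MMU.
     */
    vcpu->arch.cr3 = cr3;
    kvm_init_mmu(vcpu);
    if (!nested_npt)
            kvm_mmu_new_pgd(vcpu, cr3);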

Patches 1 and 2 are bugfixes found while working on the series.

Patches 3 and 4 add more sanity checks that triggered often during
development.

Patches 5 to 7 are related cleanups.  In particular patch 5 makes the
cache lookup code a bit more pleasant.

Patches 8 and 9 rework the fast PGD switching.  Patches 10 and 11 are
cleanups enabled by the rework, and the only survivors of the CPU role
patchset.

Patch 12 is a small cleanup that avoids leaving stale data in the MMIO
cache when the MMU is unloaded.  Patches 13 to 16 tidy up callers of
kvm_mmu_reset_context and kvm_mmu_new_pgd.  kvm_mmu_new_pgd is changed
to use the ->get_guest_pgd callback, avoiding the possibility of
confusion between the root_mmu and the guest_mmu, and a new request is
created for it (this will also be put to use once the role patchset
allows automatic detection of a changed MMU role).
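
For reference, the new request follows the standard KVM request pattern;
a rough sketch (the handler name below is only a placeholder, see patch
16 for the actual wiring):

    /* Caller side: defer updating the root until the next vCPU entry. */
    kvm_make_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu);

    /* Serviced in vcpu_enter_guest() among the other requests. */
    if (kvm_check_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu))
            kvm_mmu_update_root(vcpu);      /* placeholder name */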

Finally, patch 17 changes callers that expect kvm_mmu_reset_context to
perform a guest TLB flush, and patch 18 optimizes kvm_mmu_reset_context.

Paolo

Lai Jiangshan (1):
  KVM: x86/mmu: Do not use guest root level in audit

Paolo Bonzini (17):
  KVM: x86: host-initiated EFER.LME write affects the MMU
  KVM: x86: do not deliver asynchronous page faults if CR0.PG=0
  KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload
  KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs
  KVM: x86/mmu: use struct kvm_mmu_root_info for mmu->root
  KVM: x86/mmu: do not consult levels when freeing roots
  KVM: x86/mmu: do not pass vcpu to root freeing functions
  KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit
  KVM: x86/mmu: load new PGD after the shadow MMU is initialized
  KVM: x86/mmu: Always use current mmu's role when loading new PGD
  KVM: x86/mmu: clear MMIO cache when unloading the MMU
  KVM: x86: reset and reinitialize the MMU in __set_sregs_common
  KVM: x86/mmu: avoid indirect call for get_cr3
  KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls
    get_guest_pgd
  KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT
  KVM: x86: flush TLB separately from MMU reset
  KVM: x86: do not unload MMU roots on all role changes

 arch/x86/include/asm/kvm_host.h |  10 +-
 arch/x86/kvm/mmu.h              |  18 ++-
 arch/x86/kvm/mmu/mmu.c          | 273 +++++++++++++++++---------------
 arch/x86/kvm/mmu/mmu_audit.c    |  16 +-
 arch/x86/kvm/mmu/paging_tmpl.h  |   4 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |   2 +-
 arch/x86/kvm/mmu/tdp_mmu.h      |   2 +-
 arch/x86/kvm/svm/nested.c       |   6 +-
 arch/x86/kvm/vmx/nested.c       |  16 +-
 arch/x86/kvm/vmx/vmx.c          |   2 +-
 arch/x86/kvm/x86.c              | 135 ++++++++++------
 11 files changed, 279 insertions(+), 205 deletions(-)

-- 
2.31.1



* [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 17:08   ` Sean Christopherson
  2022-02-23 13:40   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 02/18] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0 Paolo Bonzini
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc, stable

While the guest runs, EFER.LME cannot change unless CR0.PG is clear, and therefore
EFER.NX is the only bit that can affect the MMU role.  However, set_efer accepts
a host-initiated change to EFER.LME even with CR0.PG=1.  In that case, the
MMU has to be reset.

Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.h | 1 +
 arch/x86/kvm/x86.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 51faa2c76ca5..a5a50cfeffff 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,6 +48,7 @@
 			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE)
 
 #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP)
+#define KVM_MMU_EFER_ROLE_BITS (EFER_LME | EFER_NX)
 
 static __always_inline u64 rsvd_bits(int s, int e)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d3da64106685..99a58c25f5c2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1647,7 +1647,7 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	}
 
 	/* Update reserved bits */
-	if ((efer ^ old_efer) & EFER_NX)
+	if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
 		kvm_mmu_reset_context(vcpu);
 
 	return 0;
-- 
2.31.1




* [PATCH v2 02/18] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
  2022-02-17 21:03 ` [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 17:12   ` Sean Christopherson
  2022-02-23 14:07   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 03/18] KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload Paolo Bonzini
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

Enabling async page faults is nonsensical if paging is disabled, but
it is allowed because CR0.PG=0 does not clear the async page fault
MSR.  Just ignore them and only use the artificial halt state,
similar to what happens in guest mode if async #PF vmexits are disabled.

Given the increasingly complex logic, and the nicer code that results
from placing the new "if" last, opportunistically change the "||" into
a chain of "if (...) return false" statements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 99a58c25f5c2..b912eef5dc1a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12270,14 +12270,28 @@ static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu)
 
 static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 {
-	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
+
+	if (!kvm_pv_async_pf_enabled(vcpu))
 		return false;
 
-	if (!kvm_pv_async_pf_enabled(vcpu) ||
-	    (vcpu->arch.apf.send_user_only && static_call(kvm_x86_get_cpl)(vcpu) == 0))
+	if (vcpu->arch.apf.send_user_only &&
+	    static_call(kvm_x86_get_cpl)(vcpu) == 0)
 		return false;
 
-	return true;
+	if (is_guest_mode(vcpu)) {
+		/*
+		 * L1 needs to opt into the special #PF vmexits that are
+		 * used to deliver async page faults.
+		 */
+		return vcpu->arch.apf.delivery_as_pf_vmexit;
+	} else {
+		/*
+		 * Play it safe in case the guest does a quick real mode
+		 * foray.  The real mode IDT is unlikely to have a #PF
+		 * exception setup.
+		 */
+		return is_paging(vcpu);
+	}
 }
 
 bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
-- 
2.31.1




* [PATCH v2 03/18] KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
  2022-02-17 21:03 ` [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU Paolo Bonzini
  2022-02-17 21:03 ` [PATCH v2 02/18] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0 Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 17:14   ` Sean Christopherson
  2022-02-17 21:03 ` [PATCH v2 04/18] KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs Paolo Bonzini
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 296f8723f9ae..a67071ac80f3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5086,12 +5086,21 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+{
+	int i;
+	kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
+	WARN_ON(VALID_PAGE(mmu->root_hpa));
+	if (mmu->pae_root) {
+		for (i = 0; i < 4; ++i)
+			WARN_ON(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
+	}
+}
+
 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
-	kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa));
-	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
+	__kvm_mmu_unload(vcpu, &vcpu->arch.root_mmu);
+	__kvm_mmu_unload(vcpu, &vcpu->arch.guest_mmu);
 }
 
 static bool need_remote_flush(u64 old, u64 new)
-- 
2.31.1




* [PATCH v2 04/18] KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (2 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 03/18] KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 17:15   ` Sean Christopherson
  2022-02-23 14:12   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 05/18] KVM: x86/mmu: use struct kvm_mmu_root_info for mmu->root Paolo Bonzini
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

WARN and bail if KVM attempts to free a root that isn't backed by a shadow
page.  KVM allocates a bare page for "special" roots, e.g. when using PAE
paging or shadowing 2/3/4-level page tables with 4/5-level, and so root_hpa
will be valid but won't be backed by a shadow page.  It's all too easy to
blindly call mmu_free_root_page() on root_hpa; be nice and WARN instead of
crashing KVM and possibly the kernel.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a67071ac80f3..6ea423b00824 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3222,6 +3222,8 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 		return;
 
 	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
+	if (WARN_ON(!sp))
+		return;
 
 	if (is_tdp_mmu_page(sp))
 		kvm_tdp_mmu_put_root(kvm, sp, false);
-- 
2.31.1




* [PATCH v2 05/18] KVM: x86/mmu: use struct kvm_mmu_root_info for mmu->root
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (3 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 04/18] KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-23 14:39   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 06/18] KVM: x86/mmu: do not consult levels when freeing roots Paolo Bonzini
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

The root_hpa and root_pgd fields form essentially a struct kvm_mmu_root_info.
Use the struct to have more consistency between mmu->root and
mmu->prev_roots.

The patch is entirely search and replace except for cached_root_available,
which does not need a temporary struct kvm_mmu_root_info anymore.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/mmu.h              |  4 +-
 arch/x86/kvm/mmu/mmu.c          | 69 +++++++++++++++------------------
 arch/x86/kvm/mmu/mmu_audit.c    |  4 +-
 arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.h      |  2 +-
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  2 +-
 arch/x86/kvm/x86.c              |  2 +-
 10 files changed, 42 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8e512f25a930..6442facfd5c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -432,8 +432,7 @@ struct kvm_mmu {
 	int (*sync_page)(struct kvm_vcpu *vcpu,
 			 struct kvm_mmu_page *sp);
 	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
-	hpa_t root_hpa;
-	gpa_t root_pgd;
+	struct kvm_mmu_root_info root;
 	union kvm_mmu_role mmu_role;
 	u8 root_level;
 	u8 shadow_root_level;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index a5a50cfeffff..1d0c1904d69a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -85,7 +85,7 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
 
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
-	if (likely(vcpu->arch.mmu->root_hpa != INVALID_PAGE))
+	if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE))
 		return 0;
 
 	return kvm_mmu_load(vcpu);
@@ -107,7 +107,7 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
 
 static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 {
-	u64 root_hpa = vcpu->arch.mmu->root_hpa;
+	u64 root_hpa = vcpu->arch.mmu->root.hpa;
 
 	if (!VALID_PAGE(root_hpa))
 		return;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6ea423b00824..a478667d7561 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2162,7 +2162,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
 		 * prev_root is currently only used for 64-bit hosts. So only
 		 * the active root_hpa is valid here.
 		 */
-		BUG_ON(root != vcpu->arch.mmu->root_hpa);
+		BUG_ON(root != vcpu->arch.mmu->root.hpa);
 
 		iterator->shadow_addr
 			= vcpu->arch.mmu->pae_root[(addr >> 30) & 3];
@@ -2176,7 +2176,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
 static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
 			     struct kvm_vcpu *vcpu, u64 addr)
 {
-	shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root_hpa,
+	shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root.hpa,
 				    addr);
 }
 
@@ -3245,7 +3245,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);
 
 	/* Before acquiring the MMU lock, see if we need to do any real work. */
-	if (!(free_active_root && VALID_PAGE(mmu->root_hpa))) {
+	if (!(free_active_root && VALID_PAGE(mmu->root.hpa))) {
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 			if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) &&
 			    VALID_PAGE(mmu->prev_roots[i].hpa))
@@ -3265,7 +3265,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	if (free_active_root) {
 		if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
 		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
-			mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
+			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
 		} else if (mmu->pae_root) {
 			for (i = 0; i < 4; ++i) {
 				if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
@@ -3276,8 +3276,8 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 				mmu->pae_root[i] = INVALID_PAE_ROOT;
 			}
 		}
-		mmu->root_hpa = INVALID_PAGE;
-		mmu->root_pgd = 0;
+		mmu->root.hpa = INVALID_PAGE;
+		mmu->root.pgd = 0;
 	}
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
@@ -3350,10 +3350,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 	if (is_tdp_mmu_enabled(vcpu->kvm)) {
 		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
-		mmu->root_hpa = root;
+		mmu->root.hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
-		mmu->root_hpa = root;
+		mmu->root.hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		if (WARN_ON_ONCE(!mmu->pae_root)) {
 			r = -EIO;
@@ -3368,15 +3368,15 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 			mmu->pae_root[i] = root | PT_PRESENT_MASK |
 					   shadow_me_mask;
 		}
-		mmu->root_hpa = __pa(mmu->pae_root);
+		mmu->root.hpa = __pa(mmu->pae_root);
 	} else {
 		WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
 		r = -EIO;
 		goto out_unlock;
 	}
 
-	/* root_pgd is ignored for direct MMUs. */
-	mmu->root_pgd = 0;
+	/* root.pgd is ignored for direct MMUs. */
+	mmu->root.pgd = 0;
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
 	return r;
@@ -3489,7 +3489,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	if (mmu->root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
 				      mmu->shadow_root_level, false);
-		mmu->root_hpa = root;
+		mmu->root.hpa = root;
 		goto set_root_pgd;
 	}
 
@@ -3539,14 +3539,14 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	}
 
 	if (mmu->shadow_root_level == PT64_ROOT_5LEVEL)
-		mmu->root_hpa = __pa(mmu->pml5_root);
+		mmu->root.hpa = __pa(mmu->pml5_root);
 	else if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
-		mmu->root_hpa = __pa(mmu->pml4_root);
+		mmu->root.hpa = __pa(mmu->pml4_root);
 	else
-		mmu->root_hpa = __pa(mmu->pae_root);
+		mmu->root.hpa = __pa(mmu->pae_root);
 
 set_root_pgd:
-	mmu->root_pgd = root_pgd;
+	mmu->root.pgd = root_pgd;
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
 
@@ -3659,13 +3659,13 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.mmu->direct_map)
 		return;
 
-	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
+	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
 		return;
 
 	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 
 	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
-		hpa_t root = vcpu->arch.mmu->root_hpa;
+		hpa_t root = vcpu->arch.mmu->root.hpa;
 		sp = to_shadow_page(root);
 
 		if (!is_unsync_root(root))
@@ -3956,7 +3956,7 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
 				struct kvm_page_fault *fault, int mmu_seq)
 {
-	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root_hpa);
+	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
 
 	/* Special roots, e.g. pae_root, are not backed by shadow pages. */
 	if (sp && is_obsolete_sp(vcpu->kvm, sp))
@@ -4113,34 +4113,27 @@ static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
 /*
  * Find out if a previously cached root matching the new pgd/role is available.
  * The current root is also inserted into the cache.
- * If a matching root was found, it is assigned to kvm_mmu->root_hpa and true is
+ * If a matching root was found, it is assigned to kvm_mmu->root.hpa and true is
  * returned.
- * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root_hpa and
+ * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root.hpa and
  * false is returned. This root should now be freed by the caller.
  */
 static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 				  union kvm_mmu_page_role new_role)
 {
 	uint i;
-	struct kvm_mmu_root_info root;
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 
-	root.pgd = mmu->root_pgd;
-	root.hpa = mmu->root_hpa;
-
-	if (is_root_usable(&root, new_pgd, new_role))
+	if (is_root_usable(&mmu->root, new_pgd, new_role))
 		return true;
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
-		swap(root, mmu->prev_roots[i]);
+		swap(mmu->root, mmu->prev_roots[i]);
 
-		if (is_root_usable(&root, new_pgd, new_role))
+		if (is_root_usable(&mmu->root, new_pgd, new_role))
 			break;
 	}
 
-	mmu->root_hpa = root.hpa;
-	mmu->root_pgd = root.pgd;
-
 	return i < KVM_MMU_NUM_PREV_ROOTS;
 }
 
@@ -4196,7 +4189,7 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 	 */
 	if (!new_role.direct)
 		__clear_sp_write_flooding_count(
-				to_shadow_page(vcpu->arch.mmu->root_hpa));
+				to_shadow_page(vcpu->arch.mmu->root.hpa));
 }
 
 void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
@@ -5092,7 +5085,7 @@ static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 {
 	int i;
 	kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(mmu->root_hpa));
+	WARN_ON(VALID_PAGE(mmu->root.hpa));
 	if (mmu->pae_root) {
 		for (i = 0; i < 4; ++i)
 			WARN_ON(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
@@ -5287,7 +5280,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 	int r, emulation_type = EMULTYPE_PF;
 	bool direct = vcpu->arch.mmu->direct_map;
 
-	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
+	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;
 
 	r = RET_PF_INVALID;
@@ -5359,7 +5352,7 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		return;
 
 	if (root_hpa == INVALID_PAGE) {
-		mmu->invlpg(vcpu, gva, mmu->root_hpa);
+		mmu->invlpg(vcpu, gva, mmu->root.hpa);
 
 		/*
 		 * INVLPG is required to invalidate any global mappings for the VA,
@@ -5395,7 +5388,7 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
 	uint i;
 
 	if (pcid == kvm_get_active_pcid(vcpu)) {
-		mmu->invlpg(vcpu, gva, mmu->root_hpa);
+		mmu->invlpg(vcpu, gva, mmu->root.hpa);
 		tlb_flush = true;
 	}
 
@@ -5508,8 +5501,8 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 	struct page *page;
 	int i;
 
-	mmu->root_hpa = INVALID_PAGE;
-	mmu->root_pgd = 0;
+	mmu->root.hpa = INVALID_PAGE;
+	mmu->root.pgd = 0;
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 		mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
 
diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index f31fdb874f1f..3e5d62a25350 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -56,11 +56,11 @@ static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
 	int i;
 	struct kvm_mmu_page *sp;
 
-	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
+	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
 		return;
 
 	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
-		hpa_t root = vcpu->arch.mmu->root_hpa;
+		hpa_t root = vcpu->arch.mmu->root.hpa;
 
 		sp = to_shadow_page(root);
 		__mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->root_level);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5b5bdac97c7b..346f3bad3cb9 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -668,7 +668,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	if (FNAME(gpte_changed)(vcpu, gw, top_level))
 		goto out_gpte_changed;
 
-	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
+	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		goto out_gpte_changed;
 
 	for (shadow_walk_init(&it, vcpu, fault->addr);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 8def8f810cb0..debf08212f12 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -657,7 +657,7 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
 		else
 
 #define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end)		\
-	for_each_tdp_pte(_iter, to_shadow_page(_mmu->root_hpa), _start, _end)
+	for_each_tdp_pte(_iter, to_shadow_page(_mmu->root.hpa), _start, _end)
 
 /*
  * Yield if the MMU lock is contended or this thread needs to return control
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 3f987785702a..57c73d8f76ce 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -95,7 +95,7 @@ static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu
 static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
 {
 	struct kvm_mmu_page *sp;
-	hpa_t hpa = mmu->root_hpa;
+	hpa_t hpa = mmu->root.hpa;
 
 	if (WARN_ON(!VALID_PAGE(hpa)))
 		return false;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c73e4d938ddc..29289ecca223 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5466,7 +5466,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
 
 		roots_to_free = 0;
-		if (nested_ept_root_matches(mmu->root_hpa, mmu->root_pgd,
+		if (nested_ept_root_matches(mmu->root.hpa, mmu->root.pgd,
 					    operand.eptp))
 			roots_to_free |= KVM_MMU_ROOT_CURRENT;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d8547144d3b7..b183dfc41d74 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2952,7 +2952,7 @@ static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu)
 static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	u64 root_hpa = mmu->root_hpa;
+	u64 root_hpa = mmu->root.hpa;
 
 	/* No flush required if the current context is invalid. */
 	if (!VALID_PAGE(root_hpa))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b912eef5dc1a..c0d7256e3a78 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -762,7 +762,7 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 	if ((fault->error_code & PFERR_PRESENT_MASK) &&
 	    !(fault->error_code & PFERR_RSVD_MASK))
 		kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
-				       fault_mmu->root_hpa);
+				       fault_mmu->root.hpa);
 
 	fault_mmu->inject_page_fault(vcpu, fault);
 	return fault->nested_page_fault;
-- 
2.31.1




* [PATCH v2 06/18] KVM: x86/mmu: do not consult levels when freeing roots
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (4 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 05/18] KVM: x86/mmu: use struct kvm_mmu_root_info for mmu->root Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 17:27   ` Sean Christopherson
  2022-02-23 14:59   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 07/18] KVM: x86/mmu: Do not use guest root level in audit Paolo Bonzini
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

Right now, PGD caching requires a complicated dance of first computing
the MMU role and passing it to __kvm_mmu_new_pgd(), and then separately calling
kvm_init_mmu().

Part of this is due to kvm_mmu_free_roots using mmu->root_level and
mmu->shadow_root_level to distinguish whether the page table uses a single
root or 4 PAE roots.  Because kvm_init_mmu() can overwrite mmu->root_level,
kvm_mmu_free_roots() must be called before kvm_init_mmu().

However, even after kvm_init_mmu() there is a way to detect whether the
page table may hold PAE roots, as root.hpa isn't backed by a shadow when
it points at PAE roots.  Using this method results in simpler code, and
is one less obstacle in moving all calls to __kvm_mmu_new_pgd() after the
MMU has been initialized.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a478667d7561..e1578f71feae 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3240,12 +3240,15 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	struct kvm *kvm = vcpu->kvm;
 	int i;
 	LIST_HEAD(invalid_list);
-	bool free_active_root = roots_to_free & KVM_MMU_ROOT_CURRENT;
+	bool free_active_root;
 
 	BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);
 
 	/* Before acquiring the MMU lock, see if we need to do any real work. */
-	if (!(free_active_root && VALID_PAGE(mmu->root.hpa))) {
+	free_active_root = (roots_to_free & KVM_MMU_ROOT_CURRENT)
+		&& VALID_PAGE(mmu->root.hpa);
+
+	if (!free_active_root) {
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 			if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) &&
 			    VALID_PAGE(mmu->prev_roots[i].hpa))
@@ -3263,8 +3266,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 					   &invalid_list);
 
 	if (free_active_root) {
-		if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
-		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
+		if (to_shadow_page(mmu->root.hpa)) {
 			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
 		} else if (mmu->pae_root) {
 			for (i = 0; i < 4; ++i) {
-- 
2.31.1




* [PATCH v2 07/18] KVM: x86/mmu: Do not use guest root level in audit
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (5 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 06/18] KVM: x86/mmu: do not consult levels when freeing roots Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 18:37   ` Sean Christopherson
  2022-02-17 21:03 ` [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions Paolo Bonzini
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

Walking from the root page of the shadow page table should start with
the level of the shadow page table, i.e. shadow_root_level.  Likewise,
do not consult the guest level to check whether the MMU has a single
root or uses pae_root; use to_shadow_page instead.

Also tweak audit_mappings(), where the current walking level is more
valuable to print than the guest root level.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu_audit.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index 3e5d62a25350..d1c59aed0465 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -53,17 +53,16 @@ static void __mmu_spte_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
 {
-	int i;
+	hpa_t root = vcpu->arch.mmu->root.hpa;
 	struct kvm_mmu_page *sp;
+	int i;
 
-	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
+	if (!VALID_PAGE(root))
 		return;
 
-	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
-		hpa_t root = vcpu->arch.mmu->root.hpa;
-
-		sp = to_shadow_page(root);
-		__mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->root_level);
+	sp = to_shadow_page(root);
+	if (sp) {
+		__mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->shadow_root_level);
 		return;
 	}
 
@@ -119,8 +118,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 	hpa =  pfn << PAGE_SHIFT;
 	if ((*sptep & PT64_BASE_ADDR_MASK) != hpa)
 		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
-			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
-			     hpa, *sptep);
+			     "ent %llxn", level, pfn, hpa, *sptep);
 }
 
 static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
-- 
2.31.1




* [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (6 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 07/18] KVM: x86/mmu: Do not use guest root level in audit Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 18:39   ` Sean Christopherson
  2022-02-23 15:16   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit Paolo Bonzini
                   ` (9 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

These functions only operate on a given MMU, of which there are two in a vCPU.
They also need a struct kvm in order to take the mmu_lock, but they do not
need anything else from the struct kvm_vcpu.  So, pass vcpu->kvm directly
to them.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/mmu.c          | 21 +++++++++++----------
 arch/x86/kvm/vmx/nested.c       |  8 ++++----
 arch/x86/kvm/x86.c              |  4 ++--
 4 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6442facfd5c0..79f37ccc8726 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1780,9 +1780,9 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 void kvm_update_dr7(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
-void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 			ulong roots_to_free);
-void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu);
+void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu);
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
 			      struct x86_exception *exception);
 gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e1578f71feae..0f2de811e871 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3234,10 +3234,9 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 }
 
 /* roots_to_free must be some combination of the KVM_MMU_ROOT_* flags */
-void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 			ulong roots_to_free)
 {
-	struct kvm *kvm = vcpu->kvm;
 	int i;
 	LIST_HEAD(invalid_list);
 	bool free_active_root;
@@ -3287,7 +3286,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
 
-void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 {
 	unsigned long roots_to_free = 0;
 	hpa_t root_hpa;
@@ -3309,7 +3308,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
 	}
 
-	kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
+	kvm_mmu_free_roots(kvm, mmu, roots_to_free);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
 
@@ -3710,7 +3709,7 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu)
 			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
 
 	/* sync prev_roots by simply freeing them */
-	kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
+	kvm_mmu_free_roots(vcpu->kvm, vcpu->arch.mmu, roots_to_free);
 }
 
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
@@ -4159,8 +4158,10 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 			      union kvm_mmu_page_role new_role)
 {
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+
 	if (!fast_pgd_switch(vcpu, new_pgd, new_role)) {
-		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
+		kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
 		return;
 	}
 
@@ -5083,10 +5084,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	return r;
 }
 
-static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+static void __kvm_mmu_unload(struct kvm *kvm, struct kvm_mmu *mmu)
 {
 	int i;
-	kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
+	kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOTS_ALL);
 	WARN_ON(VALID_PAGE(mmu->root.hpa));
 	if (mmu->pae_root) {
 		for (i = 0; i < 4; ++i)
@@ -5096,8 +5097,8 @@ static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 
 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
-	__kvm_mmu_unload(vcpu, &vcpu->arch.root_mmu);
-	__kvm_mmu_unload(vcpu, &vcpu->arch.guest_mmu);
+	__kvm_mmu_unload(vcpu->kvm, &vcpu->arch.root_mmu);
+	__kvm_mmu_unload(vcpu->kvm, &vcpu->arch.guest_mmu);
 }
 
 static bool need_remote_flush(u64 old, u64 new)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 29289ecca223..b7bc634d35e2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -321,7 +321,7 @@ static void free_nested(struct kvm_vcpu *vcpu)
 	kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
 	vmx->nested.pi_desc = NULL;
 
-	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
+	kvm_mmu_free_roots(vcpu->kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
 
 	nested_release_evmcs(vcpu);
 
@@ -5007,7 +5007,7 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
 				  vmx->nested.current_vmptr >> PAGE_SHIFT,
 				  vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
 
-	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
+	kvm_mmu_free_roots(vcpu->kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
 
 	vmx->nested.current_vmptr = INVALID_GPA;
 }
@@ -5486,7 +5486,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 	}
 
 	if (roots_to_free)
-		kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
+		kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
 
 	return nested_vmx_succeed(vcpu);
 }
@@ -5575,7 +5575,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
 	 */
 	if (!enable_ept)
-		kvm_mmu_free_guest_mode_roots(vcpu, &vcpu->arch.root_mmu);
+		kvm_mmu_free_guest_mode_roots(vcpu->kvm, &vcpu->arch.root_mmu);
 
 	return nested_vmx_succeed(vcpu);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0d7256e3a78..6aefd7ac7039 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -855,7 +855,7 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 	 * Shadow page roots need to be reconstructed instead.
 	 */
 	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
-		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
+		kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
 
 	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
@@ -1156,7 +1156,7 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 		if (kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd) == pcid)
 			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
 
-	kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
+	kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
 }
 
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
-- 
2.31.1




* [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (7 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 18:08   ` Sean Christopherson
  2022-02-23 16:01   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 10/18] KVM: x86/mmu: load new PGD after the shadow MMU is initialized Paolo Bonzini
                   ` (8 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

Right now, PGD caching avoids placing a PAE root in the cache by using the
old value of mmu->root_level and mmu->shadow_root_level; it does not look
for a cached PGD if the old root is a PAE one, and then frees it using
kvm_mmu_free_roots.

Change the logic instead to free the uncacheable root early.
This way, __kvm_mmu_new_pgd is able to look up the cache when going from
32-bit to 64-bit (if there is a hit, the invalid root becomes the least
recently used).  An example of this is nested virtualization with shadow
paging, when a 64-bit L1 runs a 32-bit L2.

As a side effect (which is actually the reason why this patch was
written), PGD caching does not use the old value of mmu->root_level
and mmu->shadow_root_level anymore.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 82 ++++++++++++++++++++++++++++++------------
 1 file changed, 59 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0f2de811e871..da324a317000 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4107,52 +4107,88 @@ static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
 				  union kvm_mmu_page_role role)
 {
 	return (role.direct || pgd == root->pgd) &&
-	       VALID_PAGE(root->hpa) && to_shadow_page(root->hpa) &&
+	       VALID_PAGE(root->hpa) &&
 	       role.word == to_shadow_page(root->hpa)->role.word;
 }
 
 /*
- * Find out if a previously cached root matching the new pgd/role is available.
- * The current root is also inserted into the cache.
- * If a matching root was found, it is assigned to kvm_mmu->root.hpa and true is
- * returned.
- * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root.hpa and
- * false is returned. This root should now be freed by the caller.
+ * Find out if a previously cached root matching the new pgd/role is available,
+ * and insert the current root as the MRU in the cache.
+ * If a matching root is found, it is assigned to kvm_mmu->root and
+ * true is returned.
+ * If no match is found, kvm_mmu->root is left invalid, the LRU root is
+ * evicted to make room for the current root, and false is returned.
  */
-static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
-				  union kvm_mmu_page_role new_role)
+static bool cached_root_find_and_keep_current(struct kvm *kvm, struct kvm_mmu *mmu,
+					      gpa_t new_pgd,
+					      union kvm_mmu_page_role new_role)
 {
 	uint i;
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
 
 	if (is_root_usable(&mmu->root, new_pgd, new_role))
 		return true;
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
+		/*
+		 * The swaps end up rotating the cache like this:
+		 *   C   0 1 2 3   (on entry to the function)
+		 *   0   C 1 2 3
+		 *   1   C 0 2 3
+		 *   2   C 0 1 3
+		 *   3   C 0 1 2   (on exit from the loop)
+		 */
 		swap(mmu->root, mmu->prev_roots[i]);
-
 		if (is_root_usable(&mmu->root, new_pgd, new_role))
-			break;
+			return true;
 	}
 
-	return i < KVM_MMU_NUM_PREV_ROOTS;
+	kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
+	return false;
 }
 
-static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
-			    union kvm_mmu_page_role new_role)
+/*
+ * Find out if a previously cached root matching the new pgd/role is available.
+ * On entry, mmu->root is invalid.
+ * If a matching root is found, it is assigned to kvm_mmu->root, the LRU entry
+ * of the cache becomes invalid, and true is returned.
+ * If no match is found, kvm_mmu->root is left invalid and false is returned.
+ */
+static bool cached_root_find_without_current(struct kvm *kvm, struct kvm_mmu *mmu,
+					     gpa_t new_pgd,
+					     union kvm_mmu_page_role new_role)
 {
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	uint i;
+
+	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
+		if (is_root_usable(&mmu->prev_roots[i], new_pgd, new_role))
+			goto hit;
 
+	return false;
+
+hit:
+	swap(mmu->root, mmu->prev_roots[i]);
+	/* Bubble up the remaining roots.  */
+	for (; i < KVM_MMU_NUM_PREV_ROOTS - 1; i++)
+		mmu->prev_roots[i] = mmu->prev_roots[i + 1];
+	mmu->prev_roots[i].hpa = INVALID_PAGE;
+	return true;
+}
+
+static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
+			    gpa_t new_pgd, union kvm_mmu_page_role new_role)
+{
 	/*
-	 * For now, limit the fast switch to 64-bit hosts+VMs in order to avoid
+	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
 	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
 	 * later if necessary.
 	 */
-	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
-	    mmu->root_level >= PT64_ROOT_4LEVEL)
-		return cached_root_available(vcpu, new_pgd, new_role);
+	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
+		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
 
-	return false;
+	if (VALID_PAGE(mmu->root.hpa))
+		return cached_root_find_and_keep_current(kvm, mmu, new_pgd, new_role);
+	else
+		return cached_root_find_without_current(kvm, mmu, new_pgd, new_role);
 }
 
 static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
@@ -4160,8 +4196,8 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 
-	if (!fast_pgd_switch(vcpu, new_pgd, new_role)) {
-		kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
+	if (!fast_pgd_switch(vcpu->kvm, mmu, new_pgd, new_role)) {
+		/* kvm_mmu_ensure_valid_pgd will set up a new root.  */
 		return;
 	}
 
-- 
2.31.1




* [PATCH v2 10/18] KVM: x86/mmu: load new PGD after the shadow MMU is initialized
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (8 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 23:59   ` Sean Christopherson
  2022-02-23 16:20   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 11/18] KVM: x86/mmu: Always use current mmu's role when loading new PGD Paolo Bonzini
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and
shadow_root_level anymore, pull the PGD load after the initialization of
the shadow MMUs.

Besides being more intuitive, this enables future simplifications
and optimizations because it's not necessary anymore to compute the
role outside kvm_init_mmu.  In particular, kvm_mmu_reset_context was not
attempting to use a cached PGD, so as to avoid having to figure out the new role.
It will soon be able to follow what nested_{vmx,svm}_load_cr3 are doing,
and avoid unloading all the cached roots.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c    | 37 +++++++++++++++++--------------------
 arch/x86/kvm/svm/nested.c |  6 +++---
 arch/x86/kvm/vmx/nested.c |  6 +++---
 3 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index da324a317000..906a9244ad28 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4903,9 +4903,8 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 
 	new_role = kvm_calc_shadow_npt_root_page_role(vcpu, &regs);
 
-	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
-
 	shadow_mmu_init_context(vcpu, context, &regs, new_role);
+	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
 
@@ -4943,27 +4942,25 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
 						   execonly, level);
 
-	__kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base);
-
-	if (new_role.as_u64 == context->mmu_role.as_u64)
-		return;
-
-	context->mmu_role.as_u64 = new_role.as_u64;
+	if (new_role.as_u64 != context->mmu_role.as_u64) {
+		context->mmu_role.as_u64 = new_role.as_u64;
 
-	context->shadow_root_level = level;
+		context->shadow_root_level = level;
 
-	context->ept_ad = accessed_dirty;
-	context->page_fault = ept_page_fault;
-	context->gva_to_gpa = ept_gva_to_gpa;
-	context->sync_page = ept_sync_page;
-	context->invlpg = ept_invlpg;
-	context->root_level = level;
-	context->direct_map = false;
+		context->ept_ad = accessed_dirty;
+		context->page_fault = ept_page_fault;
+		context->gva_to_gpa = ept_gva_to_gpa;
+		context->sync_page = ept_sync_page;
+		context->invlpg = ept_invlpg;
+		context->root_level = level;
+		context->direct_map = false;
+		update_permission_bitmask(context, true);
+		context->pkru_mask = 0;
+		reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level);
+		reset_ept_shadow_zero_bits_mask(context, execonly);
+	}
 
-	update_permission_bitmask(context, true);
-	context->pkru_mask = 0;
-	reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level);
-	reset_ept_shadow_zero_bits_mask(context, execonly);
+	__kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f284e61451c8..96bab464967f 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -492,14 +492,14 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	    CC(!load_pdptrs(vcpu, cr3)))
 		return -EINVAL;
 
-	if (!nested_npt)
-		kvm_mmu_new_pgd(vcpu, cr3);
-
 	vcpu->arch.cr3 = cr3;
 
 	/* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */
 	kvm_init_mmu(vcpu);
 
+	if (!nested_npt)
+		kvm_mmu_new_pgd(vcpu, cr3);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b7bc634d35e2..1dfe23963a9e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1126,15 +1126,15 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 		return -EINVAL;
 	}
 
-	if (!nested_ept)
-		kvm_mmu_new_pgd(vcpu, cr3);
-
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
 
 	/* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */
 	kvm_init_mmu(vcpu);
 
+	if (!nested_ept)
+		kvm_mmu_new_pgd(vcpu, cr3);
+
 	return 0;
 }
 
-- 
2.31.1




* [PATCH v2 11/18] KVM: x86/mmu: Always use current mmu's role when loading new PGD
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (9 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 10/18] KVM: x86/mmu: load new PGD after the shadow MMU is initialized Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 23:59   ` Sean Christopherson
  2022-02-23 16:23   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 12/18] KVM: x86/mmu: clear MMIO cache when unloading the MMU Paolo Bonzini
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

Since the guest PGD is now loaded after the MMU has been set up
completely, the desired role for a cache hit is simply the current
mmu_role.  There is no need to compute it again, so __kvm_mmu_new_pgd
can be folded into kvm_mmu_new_pgd.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 29 ++++-------------------------
 1 file changed, 4 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 906a9244ad28..b01160716c6a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -190,8 +190,6 @@ struct kmem_cache *mmu_page_header_cache;
 static struct percpu_counter kvm_total_used_mmu_pages;
 
 static void mmu_spte_set(u64 *sptep, u64 spte);
-static union kvm_mmu_page_role
-kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
 
 struct kvm_mmu_role_regs {
 	const unsigned long cr0;
@@ -4191,10 +4189,10 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
 		return cached_root_find_without_current(kvm, mmu, new_pgd, new_role);
 }
 
-static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
-			      union kvm_mmu_page_role new_role)
+void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	union kvm_mmu_page_role new_role = mmu->mmu_role.base;
 
 	if (!fast_pgd_switch(vcpu->kvm, mmu, new_pgd, new_role)) {
 		/* kvm_mmu_ensure_valid_pgd will set up a new root.  */
@@ -4230,11 +4228,6 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 		__clear_sp_write_flooding_count(
 				to_shadow_page(vcpu->arch.mmu->root.hpa));
 }
-
-void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
-{
-	__kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu));
-}
 EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
 
 static unsigned long get_cr3(struct kvm_vcpu *vcpu)
@@ -4904,7 +4897,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 	new_role = kvm_calc_shadow_npt_root_page_role(vcpu, &regs);
 
 	shadow_mmu_init_context(vcpu, context, &regs, new_role);
-	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
+	kvm_mmu_new_pgd(vcpu, nested_cr3);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
 
@@ -4960,7 +4953,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 		reset_ept_shadow_zero_bits_mask(context, execonly);
 	}
 
-	__kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base);
+	kvm_mmu_new_pgd(vcpu, new_eptp);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
 
@@ -5045,20 +5038,6 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_init_mmu);
 
-static union kvm_mmu_page_role
-kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu)
-{
-	struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu);
-	union kvm_mmu_role role;
-
-	if (tdp_enabled)
-		role = kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, true);
-	else
-		role = kvm_calc_shadow_mmu_root_page_role(vcpu, &regs, true);
-
-	return role.base;
-}
-
 void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	/*
-- 
2.31.1




* [PATCH v2 12/18] KVM: x86/mmu: clear MMIO cache when unloading the MMU
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (10 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 11/18] KVM: x86/mmu: Always use current mmu's role when loading new PGD Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 23:59   ` Sean Christopherson
  2022-02-23 16:32   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 13/18] KVM: x86: reset and reinitialize the MMU in __set_sregs_common Paolo Bonzini
                   ` (5 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

For cleanliness, do not leave a stale GVA in the cache after all the roots are
cleared.  In practice, kvm_mmu_load will go through kvm_mmu_sync_roots if
paging is on, and will not use vcpu_match_mmio_gva at all if paging is off.
However, leaving data in the cache might cause bugs in the future.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b01160716c6a..4e8e3e9530ca 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5111,6 +5111,7 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
 	__kvm_mmu_unload(vcpu->kvm, &vcpu->arch.root_mmu);
 	__kvm_mmu_unload(vcpu->kvm, &vcpu->arch.guest_mmu);
+	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 }
 
 static bool need_remote_flush(u64 old, u64 new)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 13/18] KVM: x86: reset and reinitialize the MMU in __set_sregs_common
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (11 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 12/18] KVM: x86/mmu: clear MMIO cache when unloading the MMU Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-19  0:22   ` Sean Christopherson
  2022-02-23 16:48   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3 Paolo Bonzini
                   ` (4 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

Do a full unload of the MMU in KVM_SET_SREGS and KVM_SET_SREGS2, in
preparation for not doing so in kvm_mmu_reset_context.  There is no
need to delay the reset until after the return, so do it directly in
the __set_sregs_common function and remove the mmu_reset_needed output
parameter.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
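Condensed restatement of the new flow, as a sketch only (the helper name is
made up and the unrelated register writes are elided):

static void sregs_mmu_flow_sketch(struct kvm_vcpu *vcpu,
				  struct kvm_sregs *sregs, int update_pdptrs)
{
	/* unload up front if any MMU-affecting state is about to change */
	if (vcpu->arch.efer != sregs->efer ||
	    kvm_read_cr0(vcpu) != sregs->cr0 ||
	    vcpu->arch.cr3 != sregs->cr3 || !update_pdptrs ||
	    kvm_read_cr4(vcpu) != sregs->cr4)
		kvm_mmu_unload(vcpu);

	/* ... CR0/CR3/CR4/EFER and the segment state are written here ... */

	/* reinitialize with the new role; if the roots were dropped above,
	 * kvm_mmu_reload() on the next entry builds a fresh one. */
	kvm_init_mmu(vcpu);
}
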
 arch/x86/kvm/x86.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6aefd7ac7039..f10878aa5b20 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10730,7 +10730,7 @@ static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 }
 
 static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
-		int *mmu_reset_needed, bool update_pdptrs)
+			      int update_pdptrs)
 {
 	struct msr_data apic_base_msr;
 	int idx;
@@ -10755,29 +10755,31 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
 	static_call(kvm_x86_set_gdt)(vcpu, &dt);
 
 	vcpu->arch.cr2 = sregs->cr2;
-	*mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
+
+	if (vcpu->arch.efer != sregs->efer ||
+	    kvm_read_cr0(vcpu) != sregs->cr0 ||
+	    vcpu->arch.cr3 != sregs->cr3 || !update_pdptrs ||
+	    kvm_read_cr4(vcpu) != sregs->cr4)
+		kvm_mmu_unload(vcpu);
+
 	vcpu->arch.cr3 = sregs->cr3;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
 	static_call_cond(kvm_x86_post_set_cr3)(vcpu, sregs->cr3);
 
 	kvm_set_cr8(vcpu, sregs->cr8);
 
-	*mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
 	static_call(kvm_x86_set_efer)(vcpu, sregs->efer);
 
-	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
 	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
 	vcpu->arch.cr0 = sregs->cr0;
 
-	*mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
 	static_call(kvm_x86_set_cr4)(vcpu, sregs->cr4);
 
+	kvm_init_mmu(vcpu);
 	if (update_pdptrs) {
 		idx = srcu_read_lock(&vcpu->kvm->srcu);
-		if (is_pae_paging(vcpu)) {
+		if (is_pae_paging(vcpu))
 			load_pdptrs(vcpu, kvm_read_cr3(vcpu));
-			*mmu_reset_needed = 1;
-		}
 		srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	}
 
@@ -10805,15 +10807,11 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
 static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 {
 	int pending_vec, max_bits;
-	int mmu_reset_needed = 0;
-	int ret = __set_sregs_common(vcpu, sregs, &mmu_reset_needed, true);
+	int ret = __set_sregs_common(vcpu, sregs, true);
 
 	if (ret)
 		return ret;
 
-	if (mmu_reset_needed)
-		kvm_mmu_reset_context(vcpu);
-
 	max_bits = KVM_NR_INTERRUPTS;
 	pending_vec = find_first_bit(
 		(const unsigned long *)sregs->interrupt_bitmap, max_bits);
@@ -10828,7 +10826,6 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 
 static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
 {
-	int mmu_reset_needed = 0;
 	bool valid_pdptrs = sregs2->flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
 	bool pae = (sregs2->cr0 & X86_CR0_PG) && (sregs2->cr4 & X86_CR4_PAE) &&
 		!(sregs2->efer & EFER_LMA);
@@ -10840,8 +10837,7 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
 	if (valid_pdptrs && (!pae || vcpu->arch.guest_state_protected))
 		return -EINVAL;
 
-	ret = __set_sregs_common(vcpu, (struct kvm_sregs *)sregs2,
-				 &mmu_reset_needed, !valid_pdptrs);
+	ret = __set_sregs_common(vcpu, (struct kvm_sregs *)sregs2, !valid_pdptrs);
 	if (ret)
 		return ret;
 
@@ -10850,11 +10846,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
 			kvm_pdptr_write(vcpu, i, sregs2->pdptrs[i]);
 
 		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
-		mmu_reset_needed = 1;
 		vcpu->arch.pdptrs_from_userspace = true;
+		/* kvm_mmu_reload will be called on the next entry.  */
 	}
-	if (mmu_reset_needed)
-		kvm_mmu_reset_context(vcpu);
 	return 0;
 }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (12 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 13/18] KVM: x86: reset and reinitialize the MMU in __set_sregs_common Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 20:30   ` Sean Christopherson
  2022-02-24 11:02   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd Paolo Bonzini
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

Most of the time, calls to get_guest_pgd result in calling
kvm_read_cr3 (the exception is only nested TDP).  Hardcode
the default instead of using the get_cr3 function, avoiding
a retpoline if retpolines are enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
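The idiom is the usual one for trimming retpoline overhead on a hot indirect
call: a NULL hook means "use the common default".  Minimal sketch with an
illustrative name; the real helpers are in the mmu.h hunk below:

static inline gpa_t guest_pgd_sketch(struct kvm_vcpu *vcpu,
				     struct kvm_mmu *mmu)
{
	if (!mmu->get_guest_pgd)		/* common case: direct call */
		return kvm_read_cr3(vcpu);
	return mmu->get_guest_pgd(vcpu);	/* nested TDP keeps the hook */
}
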
 arch/x86/kvm/mmu.h             | 13 +++++++++++++
 arch/x86/kvm/mmu/mmu.c         | 15 +++++----------
 arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
 arch/x86/kvm/x86.c             |  2 +-
 4 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 1d0c1904d69a..1808d6814ddb 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -116,6 +116,19 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 					  vcpu->arch.mmu->shadow_root_level);
 }
 
+static inline gpa_t __kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+{
+	if (!mmu->get_guest_pgd)
+		return kvm_read_cr3(vcpu);
+	else
+		return mmu->get_guest_pgd(vcpu);
+}
+
+static inline gpa_t kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu)
+{
+	return __kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu);
+}
+
 struct kvm_page_fault {
 	/* arguments to kvm_mmu_do_page_fault.  */
 	const gpa_t addr;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e8e3e9530ca..d422d0d2adf8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3451,7 +3451,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	unsigned i;
 	int r;
 
-	root_pgd = mmu->get_guest_pgd(vcpu);
+	root_pgd = kvm_mmu_get_guest_pgd(vcpu);
 	root_gfn = root_pgd >> PAGE_SHIFT;
 
 	if (mmu_check_root(vcpu, root_gfn))
@@ -3881,7 +3881,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;
 	arch.gfn = gfn;
 	arch.direct_map = vcpu->arch.mmu->direct_map;
-	arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
+	arch.cr3 = kvm_mmu_get_guest_pgd(vcpu);
 
 	return kvm_setup_async_pf(vcpu, cr2_or_gpa,
 				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
@@ -4230,11 +4230,6 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
 
-static unsigned long get_cr3(struct kvm_vcpu *vcpu)
-{
-	return kvm_read_cr3(vcpu);
-}
-
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 			   unsigned int access)
 {
@@ -4789,7 +4784,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->invlpg = NULL;
 	context->shadow_root_level = kvm_mmu_get_tdp_level(vcpu);
 	context->direct_map = true;
-	context->get_guest_pgd = get_cr3;
+	context->get_guest_pgd = NULL; /* use kvm_read_cr3 */
 	context->get_pdptr = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
 	context->root_level = role_regs_to_root_level(&regs);
@@ -4964,7 +4959,7 @@ static void init_kvm_softmmu(struct kvm_vcpu *vcpu)
 
 	kvm_init_shadow_mmu(vcpu, &regs);
 
-	context->get_guest_pgd     = get_cr3;
+	context->get_guest_pgd	   = NULL; /* use kvm_read_cr3 */
 	context->get_pdptr         = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
 }
@@ -4996,7 +4991,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 		return;
 
 	g_context->mmu_role.as_u64 = new_role.as_u64;
-	g_context->get_guest_pgd     = get_cr3;
+	g_context->get_guest_pgd     = NULL; /* use kvm_read_cr3 */
 	g_context->get_pdptr         = kvm_pdptr_read;
 	g_context->inject_page_fault = kvm_inject_page_fault;
 	g_context->root_level        = new_role.base.level;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 346f3bad3cb9..1a85aba837b2 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -362,7 +362,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	trace_kvm_mmu_pagetable_walk(addr, access);
 retry_walk:
 	walker->level = mmu->root_level;
-	pte           = mmu->get_guest_pgd(vcpu);
+	pte           = __kvm_mmu_get_guest_pgd(vcpu, mmu);
 	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
 
 #if PTTYPE == 64
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f10878aa5b20..adcee7c305ca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12161,7 +12161,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 		return;
 
 	if (!vcpu->arch.mmu->direct_map &&
-	      work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
+	      work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu))
 		return;
 
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (13 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3 Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18  9:39   ` Paolo Bonzini
  2022-02-17 21:03 ` [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT Paolo Bonzini
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

In the common case, the argument to kvm_mmu_new_pgd is already in
vcpu->arch.cr3, but that does not work when the guest_mmu is in use.
In that case, the root for L1 TDP tables needs to be retrieved via vendor
code.  Besides, kvm_mmu_new_pgd is a bad name: it can also be used when
the role bits change, not just when the PGD changes.

Kill two birds with one stone by renaming the old kvm_mmu_new_pgd
to __kvm_mmu_update_root.  The non-__ version, kvm_mmu_update_root,
covers the common case, including nested TDP, by calling the
get_guest_pgd callback to retrieve the desired PGD pointer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
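In caller terms the change boils down to the following sketch (shown for
nested_svm_load_cr3; the VMX and kvm_set_cr3 callers are analogous):

	/* before: the caller decided which PGD to pass */
	kvm_mmu_new_pgd(vcpu, cr3);

	/* after: the helper asks the MMU itself via ->get_guest_pgd */
	kvm_mmu_update_root(vcpu);
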
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu/mmu.c          | 15 +++++++++++----
 arch/x86/kvm/svm/nested.c       |  2 +-
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/x86.c              |  2 +-
 5 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 79f37ccc8726..319ac0918aa2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1808,7 +1808,7 @@ void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			    gva_t gva, hpa_t root_hpa);
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
-void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
+void kvm_mmu_update_root(struct kvm_vcpu *vcpu);
 
 void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 		       int tdp_max_root_level, int tdp_huge_page_level);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d422d0d2adf8..c44b5114f947 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4189,7 +4189,7 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
 		return cached_root_find_without_current(kvm, mmu, new_pgd, new_role);
 }
 
-void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
+static void __kvm_mmu_update_root(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	union kvm_mmu_page_role new_role = mmu->mmu_role.base;
@@ -4228,7 +4228,14 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 		__clear_sp_write_flooding_count(
 				to_shadow_page(vcpu->arch.mmu->root.hpa));
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
+
+void kvm_mmu_update_root(struct kvm_vcpu *vcpu)
+{
+	gpa_t new_pgd = kvm_mmu_get_guest_pgd(vcpu);
+
+	__kvm_mmu_update_root(vcpu, new_pgd);
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_update_root);
 
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 			   unsigned int access)
@@ -4892,7 +4899,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 	new_role = kvm_calc_shadow_npt_root_page_role(vcpu, &regs);
 
 	shadow_mmu_init_context(vcpu, context, &regs, new_role);
-	kvm_mmu_new_pgd(vcpu, nested_cr3);
+	__kvm_mmu_update_root(vcpu, nested_cr3);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
 
@@ -4948,7 +4955,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 		reset_ept_shadow_zero_bits_mask(context, execonly);
 	}
 
-	kvm_mmu_new_pgd(vcpu, new_eptp);
+	__kvm_mmu_update_root(vcpu, new_eptp);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 96bab464967f..2386fadae9ed 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -498,7 +498,7 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	kvm_init_mmu(vcpu);
 
 	if (!nested_npt)
-		kvm_mmu_new_pgd(vcpu, cr3);
+		kvm_mmu_update_root(vcpu);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1dfe23963a9e..2dbd7a9ada84 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1133,7 +1133,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	kvm_init_mmu(vcpu);
 
 	if (!nested_ept)
-		kvm_mmu_new_pgd(vcpu, cr3);
+		kvm_mmu_update_root(vcpu);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index adcee7c305ca..9800c8883a48 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1189,7 +1189,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		return 1;
 
 	if (cr3 != kvm_read_cr3(vcpu))
-		kvm_mmu_new_pgd(vcpu, cr3);
+		kvm_mmu_update_root(vcpu);
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (14 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 21:45   ` Sean Christopherson
  2022-02-17 21:03 ` [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset Paolo Bonzini
  2022-02-17 21:03 ` [PATCH v2 18/18] KVM: x86: do not unload MMU roots on all role changes Paolo Bonzini
  17 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

Whenever KVM knows the page role flags have changed, it needs to drop
the current MMU root and possibly load one from the prev_roots cache.
Currently it is papering over some overly simplistic code by just
dropping _all_ roots, so that the root will be reloaded by
kvm_mmu_reload, but this has bad performance for the TDP MMU
(which drops the whole of the page tables when freeing a root,
without the performance safety net of a hash table).

To do this, KVM needs to do no more than a kvm_mmu_update_root call from
kvm_mmu_reset_context.  Introduce a new request bit so that the call
can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
kill all hopes of finding a cached PGD.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
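The resulting ordering in vcpu_enter_guest(), condensed from the x86.c hunk
below (processing of the other requests is elided):

	if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu)) {
		kvm_mmu_unload(vcpu);
		/* every root is gone: a cached-PGD lookup cannot succeed */
		kvm_clear_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu);
	}
	/* ... other requests are handled here ... */
	if (kvm_check_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu))
		kvm_mmu_update_root(vcpu);
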
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm/nested.c       |  2 +-
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/x86.c              | 13 +++++++++++--
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 319ac0918aa2..532cda546eb9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -102,6 +102,7 @@
 #define KVM_REQ_MSR_FILTER_CHANGED	KVM_ARCH_REQ(29)
 #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
 	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_MMU_UPDATE_ROOT		KVM_ARCH_REQ(31)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 2386fadae9ed..8e6e62d8df36 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -498,7 +498,7 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	kvm_init_mmu(vcpu);
 
 	if (!nested_npt)
-		kvm_mmu_update_root(vcpu);
+		kvm_make_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2dbd7a9ada84..c3595bc0a02d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1133,7 +1133,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	kvm_init_mmu(vcpu);
 
 	if (!nested_ept)
-		kvm_mmu_update_root(vcpu);
+		kvm_make_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9800c8883a48..9043548e6baf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1189,7 +1189,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		return 1;
 
 	if (cr3 != kvm_read_cr3(vcpu))
-		kvm_mmu_update_root(vcpu);
+		kvm_make_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu);
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
@@ -9835,8 +9835,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 				goto out;
 			}
 		}
-		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
+		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu)) {
 			kvm_mmu_unload(vcpu);
+
+			/*
+			 * Dropping all roots leaves no hope for loading a cached
+			 * one.  Let kvm_mmu_reload build a new one.
+			 */
+			kvm_clear_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu);
+		}
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
 			__kvm_migrate_timers(vcpu);
 		if (kvm_check_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu))
@@ -9848,6 +9855,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			if (unlikely(r))
 				goto out;
 		}
+		if (kvm_check_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu))
+			kvm_mmu_update_root(vcpu);
 		if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
 			kvm_mmu_sync_roots(vcpu);
 		if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (15 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-18 23:57   ` Sean Christopherson
  2022-02-24 16:11   ` Maxim Levitsky
  2022-02-17 21:03 ` [PATCH v2 18/18] KVM: x86: do not unload MMU roots on all role changes Paolo Bonzini
  17 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

For both CR0 and CR4, disassociate the TLB flush logic from the
MMU role logic.  Instead of relying on kvm_mmu_reset_context() being
a superset of various TLB flushes (which is not necessarily going to
be the case in the future), always call it if the role changes
but also set the various TLB flush requests according to what is
in the manual.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
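Spelled out as a table, the CR4 rules that the kvm_post_set_cr4 hunk below
encodes are roughly the following (the guest-wide flush, being a superset,
takes precedence over the current-PCID one):

/*
 * CR4 change          action (sketch)
 * ----------------    --------------------------------------------------
 * any role bit        kvm_mmu_reset_context()
 * PCIDE 0 -> 1        KVM_REQ_MMU_RELOAD (shadow paging only): drop
 *                     cached roots so a NOFLUSH MOV to CR3 cannot
 *                     resurrect stale prev_roots
 * PCIDE 1 -> 0        KVM_REQ_TLB_FLUSH_GUEST   (flush all PCIDs)
 * PGE   any change    KVM_REQ_TLB_FLUSH_GUEST   (flush all PCIDs)
 * SMEP  0 -> 1        KVM_REQ_TLB_FLUSH_CURRENT (flush current PCID)
 * PAE   any change    KVM_REQ_TLB_FLUSH_CURRENT (flush current PCID)
 */
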
 arch/x86/kvm/x86.c | 58 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9043548e6baf..2b4663dfcd8d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -871,6 +871,13 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
 	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
 		kvm_clear_async_pf_completion_queue(vcpu);
 		kvm_async_pf_hash_reset(vcpu);
+
+		/*
+		 * Clearing CR0.PG is defined to flush the TLB from the guest's
+		 * perspective.
+		 */
+		if (!(cr0 & X86_CR0_PG))
+			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 	}
 
 	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
@@ -1057,28 +1064,41 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
 
 void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
 {
+	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
+		kvm_mmu_reset_context(vcpu);
+
 	/*
-	 * If any role bit is changed, the MMU needs to be reset.
-	 *
-	 * If CR4.PCIDE is changed 1 -> 0, the guest TLB must be flushed.
 	 * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
 	 * according to the SDM; however, stale prev_roots could be reused
 	 * incorrectly in the future after a MOV to CR3 with NOFLUSH=1, so we
-	 * free them all.  KVM_REQ_MMU_RELOAD is fit for the both cases; it
-	 * is slow, but changing CR4.PCIDE is a rare case.
-	 *
-	 * If CR4.PGE is changed, the guest TLB must be flushed.
-	 *
-	 * Note: resetting MMU is a superset of KVM_REQ_MMU_RELOAD and
-	 * KVM_REQ_MMU_RELOAD is a superset of KVM_REQ_TLB_FLUSH_GUEST, hence
-	 * the usage of "else if".
+	 * free them all.  This is *not* a superset of KVM_REQ_TLB_FLUSH_GUEST
+	 * or KVM_REQ_TLB_FLUSH_CURRENT, because the hardware TLB is not flushed,
+	 * so fall through.
 	 */
-	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
-		kvm_mmu_reset_context(vcpu);
-	else if ((cr4 ^ old_cr4) & X86_CR4_PCIDE)
+	if (!tdp_enabled &&
+	    (cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE))
 		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
-	else if ((cr4 ^ old_cr4) & X86_CR4_PGE)
-		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+
+	/*
+	 * The TLB has to be flushed for all PCIDs on:
+	 * - CR4.PCIDE changed from 1 to 0
+	 * - any change to CR4.PGE
+	 *
+	 * This is a superset of KVM_REQ_TLB_FLUSH_CURRENT.
+	 */
+	if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
+	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
+		 kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+
+	/*
+	 * The TLB has to be flushed for the current PCID on:
+	 * - CR4.SMEP changed from 0 to 1
+	 * - any change to CR4.PAE
+	 */
+	else if (((cr4 ^ old_cr4) & X86_CR4_PAE) ||
+		 ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP)))
+		 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+
 }
 EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
 
@@ -11323,15 +11343,17 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	static_call(kvm_x86_update_exception_bitmap)(vcpu);
 
 	/*
-	 * Reset the MMU context if paging was enabled prior to INIT (which is
+	 * A TLB flush is needed if paging was enabled prior to INIT (which is
 	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
 	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
 	 * checked because it is unconditionally cleared on INIT and all other
 	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
 	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
 	 */
-	if (old_cr0 & X86_CR0_PG)
+	if (old_cr0 & X86_CR0_PG) {
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 		kvm_mmu_reset_context(vcpu);
+	}
 
 	/*
 	 * Intel's SDM states that all TLB entries are flushed on INIT.  AMD's
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 18/18] KVM: x86: do not unload MMU roots on all role changes
  2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
                   ` (16 preceding siblings ...)
  2022-02-17 21:03 ` [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset Paolo Bonzini
@ 2022-02-17 21:03 ` Paolo Bonzini
  2022-02-24 16:25   ` Maxim Levitsky
  17 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-17 21:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

kvm_mmu_reset_context is called on all role changes and right now it
calls kvm_mmu_unload.  With the legacy MMU this is a relatively cheap
operation; the previous PGDs remain in the hash table and are picked up
up immediately on the next page fault.  With the TDP MMU, however, the
roots are thrown away for good and a full rebuild of the page tables is
necessary, which is many times more expensive.

Fortunately, throwing away the roots is not necessary except when
the manual says a TLB flush is required:

- changing CR0.PG from 1 to 0 (because it flushes the TLB according to
  the x86 architecture specification)

- changing CPUID (which changes the interpretation of page tables in
  ways not reflected by the role).

- changing CR4.SMEP from 0 to 1 (not throwing away the roots in this
  case actually breaks access.c)

Except for these cases, once the MMU has updated the CPU/MMU roles
and metadata, it is enough to force-reload the current value of CR3.
KVM will look up the cached roots for an entry with the right role and
PGD, and only if the cache misses a new root will be created.

Measuring with vmexit.flat from kvm-unit-tests shows the following
improvement:

             TDP         legacy       shadow
   before    46754       5096         5150
   after     4879        4875         5006

which is for very small page tables.  The impact is however much larger
when running as an L1 hypervisor, because the new page tables cause
extra work for L0 to shadow them.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
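An annotated sketch of what a CR0.WP flip looks like after this series; the
prev_roots hit in the last step is what the numbers above measure:

/*
 *   kvm_post_set_cr0()                    role bit changed, so ...
 *     kvm_mmu_reset_context()
 *       kvm_init_mmu()                    recompute the role only
 *       kvm_make_request(KVM_REQ_MMU_UPDATE_ROOT)
 *   ... next vcpu_enter_guest() ...
 *   kvm_mmu_update_root()
 *     fast_pgd_switch()                   finds a prev_roots entry with
 *                                         the same PGD and the new role;
 *                                         no page tables are rebuilt
 */
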
 arch/x86/kvm/mmu/mmu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c44b5114f947..913cc7229bf4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5043,8 +5043,8 @@ EXPORT_SYMBOL_GPL(kvm_init_mmu);
 void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * Invalidate all MMU roles to force them to reinitialize as CPUID
-	 * information is factored into reserved bit calculations.
+	 * Invalidate all MMU roles and roots to force them to reinitialize,
+	 * as CPUID information is factored into reserved bit calculations.
 	 *
 	 * Correctly handling multiple vCPU models with respect to paging and
 	 * physical address properties) in a single VM would require tracking
@@ -5057,6 +5057,7 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
 	vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
 	vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
+	kvm_mmu_unload(vcpu);
 	kvm_mmu_reset_context(vcpu);
 
 	/*
@@ -5068,8 +5069,8 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
 {
-	kvm_mmu_unload(vcpu);
 	kvm_init_mmu(vcpu);
+	kvm_make_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd
  2022-02-17 21:03 ` [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd Paolo Bonzini
@ 2022-02-18  9:39   ` Paolo Bonzini
  2022-02-18 21:00     ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-18  9:39 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc

On 2/17/22 22:03, Paolo Bonzini wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index adcee7c305ca..9800c8883a48 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1189,7 +1189,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   		return 1;
>   
>   	if (cr3 != kvm_read_cr3(vcpu))
> -		kvm_mmu_new_pgd(vcpu, cr3);
> +		kvm_mmu_update_root(vcpu);
>   
>   	vcpu->arch.cr3 = cr3;
>   	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);

Uh-oh, this has to become:

  	vcpu->arch.cr3 = cr3;
  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
	if (!is_pae_paging(vcpu))
		kvm_mmu_update_root(vcpu);

The regression would go away after patch 16, but this is more tidy apart
from having to check is_pae_paging *again*.

Incremental patch:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index adcee7c305ca..0085e9fba372 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1188,11 +1189,11 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
  		return 1;
  
-	if (cr3 != kvm_read_cr3(vcpu))
-		kvm_mmu_update_root(vcpu);
-
  	vcpu->arch.cr3 = cr3;
  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
+	if (!is_pae_paging(vcpu))
+		kvm_mmu_update_root(vcpu);
+
  	/* Do not call post_set_cr3, we do not get here for confidential guests.  */
  

An alternative is to move the vcpu->arch.cr3 update in load_pdptrs.
Reviewers, let me know if you prefer that, then I'll send v3.

Paolo


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU
  2022-02-17 21:03 ` [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU Paolo Bonzini
@ 2022-02-18 17:08   ` Sean Christopherson
  2022-02-18 17:26     ` Paolo Bonzini
  2022-02-23 13:40   ` Maxim Levitsky
  1 sibling, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 17:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, stable

The shortlog doesn't come remotely close to saying what this patch does, it's
simply a statement.

  KVM: x86: Reset the MMU context if host userspace toggles EFER.LME

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> While the guest runs, EFER.LME cannot change unless CR0.PG is clear, and therefore
> EFER.NX is the only bit that can affect the MMU role.  However, set_efer accepts
> a host-initiated change to EFER.LME even with CR0.PG=1.  In that case, the
> MMU has to be reset.

Wrap at ~75 please.

> Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

With nits addressed,

Reviewed-by: Sean Christopherson <seanjc@google.com>

>  arch/x86/kvm/mmu.h | 1 +
>  arch/x86/kvm/x86.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 51faa2c76ca5..a5a50cfeffff 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -48,6 +48,7 @@
>  			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE)
>  
>  #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP)
> +#define KVM_MMU_EFER_ROLE_BITS (EFER_LME | EFER_NX)
>  
>  static __always_inline u64 rsvd_bits(int s, int e)
>  {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d3da64106685..99a58c25f5c2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1647,7 +1647,7 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	}
>  
>  	/* Update reserved bits */

This comment needs to be dropped, toggling EFER.LME affects more than just reserved
bits.

> -	if ((efer ^ old_efer) & EFER_NX)
> +	if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
>  		kvm_mmu_reset_context(vcpu);
>  
>  	return 0;
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 02/18] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0
  2022-02-17 21:03 ` [PATCH v2 02/18] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0 Paolo Bonzini
@ 2022-02-18 17:12   ` Sean Christopherson
  2022-02-23 14:07   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 17:12 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Enabling async page faults is nonsensical if paging is disabled, but
> it is allowed because CR0.PG=0 does not clear the async page fault
> MSR.  Just ignore them and only use the artificial halt state,
> similar to what happens in guest mode if async #PF vmexits are disabled.
> 
> Given the increasingly complex logic, and the nicer code if the new
> "if" is placed last, opportunistically change the "||" into a chain
> of "if (...) return false" statements.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Comment nits aside,

Reviewed-by: Sean Christopherson <seanjc@google.com>

>  arch/x86/kvm/x86.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 99a58c25f5c2..b912eef5dc1a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12270,14 +12270,28 @@ static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu)
>  
>  static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
>  {
> -	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
> +
> +	if (!kvm_pv_async_pf_enabled(vcpu))
>  		return false;
>  
> -	if (!kvm_pv_async_pf_enabled(vcpu) ||
> -	    (vcpu->arch.apf.send_user_only && static_call(kvm_x86_get_cpl)(vcpu) == 0))
> +	if (vcpu->arch.apf.send_user_only &&
> +	    static_call(kvm_x86_get_cpl)(vcpu) == 0)
>  		return false;
>  
> -	return true;
> +	if (is_guest_mode(vcpu)) {
> +		/*
> +		 * L1 needs to opt into the special #PF vmexits that are
> +		 * used to deliver async page faults.

Wrap at 80 chars.

> +		 */
> +		return vcpu->arch.apf.delivery_as_pf_vmexit;
> +	} else {
> +		/*
> +		 * Play it safe in case the guest does a quick real mode
> +		 * foray.  The real mode IDT is unlikely to have a #PF
> +		 * exception setup.

I actually like the comment, but it's slightly confusing because based on the
"real mode" stuff, I would expect:

		return is_protmode(vcpu);

Maybe tweak it to:


		/*
		 * Play it safe in case the guest temporarily disables paging.
		 * The real mode IDT in particular is unlikely to have a #PF
		 * exception setup.
		 */

> +		 */
> +		return is_paging(vcpu);
> +	}
>  }
>  
>  bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 03/18] KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload
  2022-02-17 21:03 ` [PATCH v2 03/18] KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload Paolo Bonzini
@ 2022-02-18 17:14   ` Sean Christopherson
  2022-02-18 17:23     ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 17:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 296f8723f9ae..a67071ac80f3 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5086,12 +5086,21 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  	return r;
>  }
>  
> +static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
> +{
> +	int i;
> +	kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
> +	WARN_ON(VALID_PAGE(mmu->root_hpa));
> +	if (mmu->pae_root) {
> +		for (i = 0; i < 4; ++i)
> +			WARN_ON(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
> +	}
> +}
> +
>  void kvm_mmu_unload(struct kvm_vcpu *vcpu)
>  {
> -	kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
> -	WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa));
> -	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
> -	WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
> +	__kvm_mmu_unload(vcpu, &vcpu->arch.root_mmu);
> +	__kvm_mmu_unload(vcpu, &vcpu->arch.guest_mmu);

Can we just drop this one?  Checkpatch doesn't like it, and IMO the existing asserts
are unnecessary.  IIRC you said this one never actually fired?

WARNING: Missing commit description - Add an appropriate one

WARNING: Missing a blank line after declarations
#22: FILE: arch/x86/kvm/mmu/mmu.c:5092:
+	int i;
+	kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);


>  }
>  
>  static bool need_remote_flush(u64 old, u64 new)
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 04/18] KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs
  2022-02-17 21:03 ` [PATCH v2 04/18] KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs Paolo Bonzini
@ 2022-02-18 17:15   ` Sean Christopherson
  2022-02-23 14:12   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 17:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> WARN and bail if KVM attempts to free a root that isn't backed by a shadow
> page.  KVM allocates a bare page for "special" roots, e.g. when using PAE
> paging or shadowing 2/3/4-level page tables with 4/5-level, and so root_hpa
> will be valid but won't be backed by a shadow page.  It's all too easy to
> blindly call mmu_free_root_page() on root_hpa, be nice and WARN instead of
> crashing KVM and possibly the kernel.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 03/18] KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload
  2022-02-18 17:14   ` Sean Christopherson
@ 2022-02-18 17:23     ` Paolo Bonzini
  2022-02-23 14:11       ` Maxim Levitsky
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-18 17:23 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm

On 2/18/22 18:14, Sean Christopherson wrote:
> Checkpatch doesn't like it, and IMO the existing asserts
> are unnecessary.

I agree that removing the assertions could be another way to go.

A third and better one could be to just wait until pae_root is gone.  I 
have started looking at it but I would like your opinion on one detail; 
see question I posted at 
https://lore.kernel.org/kvm/7ccb16e5-579e-b3d9-cedc-305152ef9b8f@redhat.com/.

For now I'll drop this patch.

Paolo


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU
  2022-02-18 17:08   ` Sean Christopherson
@ 2022-02-18 17:26     ` Paolo Bonzini
  0 siblings, 0 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-18 17:26 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm, stable

On 2/18/22 18:08, Sean Christopherson wrote:
> The shortlog doesn't come remotely close to saying what this patch does, it's
> simply a statement.
> 
>    KVM: x86: Reset the MMU context if host userspace toggles EFER.LME

I'd like not to use "reset the MMU context" because 1) the meaning 
changes at the end of the series so it's not the best time to use the 
expression, 2) actually I hope to get rid of it completely and just use 
kvm_init_mmu.

I'll use "Reinitialize MMU" which is the important part of 
kvm_reset_mmu_context().

Paolo

> On Thu, Feb 17, 2022, Paolo Bonzini wrote:
>> While the guest runs, EFER.LME cannot change unless CR0.PG is clear, and therefore
>> EFER.NX is the only bit that can affect the MMU role.  However, set_efer accepts
>> a host-initiated change to EFER.LME even with CR0.PG=1.  In that case, the
>> MMU has to be reset.
> 
> Wrap at ~75 please.
> 
>> Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
> 
> With nits addressed,
> 
> Reviewed-by: Sean Christopherson <seanjc@google.com>
> 
>>   arch/x86/kvm/mmu.h | 1 +
>>   arch/x86/kvm/x86.c | 2 +-
>>   2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 51faa2c76ca5..a5a50cfeffff 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -48,6 +48,7 @@
>>   			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE)
>>   
>>   #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP)
>> +#define KVM_MMU_EFER_ROLE_BITS (EFER_LME | EFER_NX)
>>   
>>   static __always_inline u64 rsvd_bits(int s, int e)
>>   {
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index d3da64106685..99a58c25f5c2 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1647,7 +1647,7 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   	}
>>   
>>   	/* Update reserved bits */
> 
> This comment needs to be dropped, toggling EFER.LME affects more than just reserved
> bits.
> 
>> -	if ((efer ^ old_efer) & EFER_NX)
>> +	if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
>>   		kvm_mmu_reset_context(vcpu);
>>   
>>   	return 0;
>> -- 
>> 2.31.1
>>
>>
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 06/18] KVM: x86/mmu: do not consult levels when freeing roots
  2022-02-17 21:03 ` [PATCH v2 06/18] KVM: x86/mmu: do not consult levels when freeing roots Paolo Bonzini
@ 2022-02-18 17:27   ` Sean Christopherson
  2022-02-23 14:59   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 17:27 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Right now, PGD caching requires a complicated dance of first computing
> the MMU role and passing it to __kvm_mmu_new_pgd(), and then separately calling

Wrap at ~75 chars.  I'm starting to wonder if you role 5x d20 when deciding what
line number you wrap at :-)

> kvm_init_mmu().
> 
> Part of this is due to kvm_mmu_free_roots using mmu->root_level and
> mmu->shadow_root_level to distinguish whether the page table uses a single
> root or 4 PAE roots.  Because kvm_init_mmu() can overwrite mmu->root_level,
> kvm_mmu_free_roots() must be called before kvm_init_mmu().
> 
> However, even after kvm_init_mmu() there is a way to detect whether the
> page table may hold PAE roots, as root.hpa isn't backed by a shadow when
> it points at PAE roots.  Using this method results in simpler code, and
> is one less obstacle in moving all calls to __kvm_mmu_new_pgd() after the
> MMU has been initialized.

I think it's worth adding a blurb about 5-level nNPT.  Something like

  Note, this is technically wrong when KVM is using shadowing 4-level NPT
  in L1 with 5-level NPT in L0, as the PDPTEs are not used in that case
  and mmu->root.hpa will not be backed by a shadow page.  But the PDPTEs
  will be '0' so processing them does no harm, not to mention that that
  particular nNPT case is completely broken in KVM and this code will
  need to be reworked to correctly handle 5=>4-level nNPT no matter what.

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a478667d7561..e1578f71feae 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3240,12 +3240,15 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  	struct kvm *kvm = vcpu->kvm;
>  	int i;
>  	LIST_HEAD(invalid_list);
> -	bool free_active_root = roots_to_free & KVM_MMU_ROOT_CURRENT;
> +	bool free_active_root;
>  
>  	BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);
>  
>  	/* Before acquiring the MMU lock, see if we need to do any real work. */
> -	if (!(free_active_root && VALID_PAGE(mmu->root.hpa))) {
> +	free_active_root = (roots_to_free & KVM_MMU_ROOT_CURRENT)
> +		&& VALID_PAGE(mmu->root.hpa);

Pretty please, put the && on the first line and align the indentation.

	free_active_root = (roots_to_free & KVM_MMU_ROOT_CURRENT) &&
			   VALID_PAGE(mmu->root.hpa);

With that,

Reviewed-by: Sean Christopherson <seanjc@google.com>

> +
> +	if (!free_active_root) {
>  		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
>  			if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) &&
>  			    VALID_PAGE(mmu->prev_roots[i].hpa))
> @@ -3263,8 +3266,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  					   &invalid_list);
>  
>  	if (free_active_root) {
> -		if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
> -		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
> +		if (to_shadow_page(mmu->root.hpa)) {
>  			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
>  		} else if (mmu->pae_root) {
>  			for (i = 0; i < 4; ++i) {
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit
  2022-02-17 21:03 ` [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit Paolo Bonzini
@ 2022-02-18 18:08   ` Sean Christopherson
  2022-02-23 16:01   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 18:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Right now, PGD caching avoids placing a PAE root in the cache by using the
> old value of mmu->root_level and mmu->shadow_root_level; it does not look
> for a cached PGD if the old root is a PAE one, and then frees it using
> kvm_mmu_free_roots.
> 
> Change the logic instead to free the uncacheable root early.
> This way, __kvm_new_mmu_pgd is able to look up the cache when going from
> 32-bit to 64-bit (if there is a hit, the invalid root becomes the least
> recently used).  An example of this is nested virtualization with shadow
> paging, when a 64-bit L1 runs a 32-bit L2.
> 
> As a side effect (which is actually the reason why this patch was
> written), PGD caching does not use the old value of mmu->root_level
> and mmu->shadow_root_level anymore.

Maybe another blurb on 5=>4-level nNPT being broken?  I'm also ok omitting it.

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Nits aside,

Reviewed-by: Sean Christopherson <seanjc@google.com>

> +static bool cached_root_find_and_keep_current(struct kvm *kvm, struct kvm_mmu *mmu,
> +					      gpa_t new_pgd,
> +					      union kvm_mmu_page_role new_role)
>  {
>  	uint i;
> -	struct kvm_mmu *mmu = vcpu->arch.mmu;
>  
>  	if (is_root_usable(&mmu->root, new_pgd, new_role))
>  		return true;
>  
>  	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
> +		/*
> +		 * The swaps end up rotating the cache like this:
> +		 *   C   0 1 2 3   (on entry to the function)
> +		 *   0   C 1 2 3
> +		 *   1   C 0 2 3
> +		 *   2   C 0 1 3
> +		 *   3   C 0 1 2   (on exit from the loop)
> +		 */
>  		swap(mmu->root, mmu->prev_roots[i]);
> -

I'd prefer we keep this whitespace, I like that it separates the swap() and its
comment from the usability check.

>  		if (is_root_usable(&mmu->root, new_pgd, new_role))
> -			break;
> +			return true;
>  	}
>  
> -	return i < KVM_MMU_NUM_PREV_ROOTS;
> +	kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
> +	return false;
>  }
>  
> -static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> -			    union kvm_mmu_page_role new_role)
> +/*
> + * Find out if a previously cached root matching the new pgd/role is available.
> + * On entry, mmu->root is invalid.
> + * If a matching root is found, it is assigned to kvm_mmu->root, the LRU entry
> + * of the cache becomes invalid, and true is returned.
> + * If no match is found, kvm_mmu->root is left invalid and false is returned.
> + */
> +static bool cached_root_find_without_current(struct kvm *kvm, struct kvm_mmu *mmu,
> +					     gpa_t new_pgd,
> +					     union kvm_mmu_page_role new_role)
>  {
> -	struct kvm_mmu *mmu = vcpu->arch.mmu;
> +	uint i;
> +
> +	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
> +		if (is_root_usable(&mmu->prev_roots[i], new_pgd, new_role))
> +			goto hit;

The for-loop needs curly braces.

>  
> +	return false;
> +
> +hit:
> +	swap(mmu->root, mmu->prev_roots[i]);
> +	/* Bubble up the remaining roots.  */
> +	for (; i < KVM_MMU_NUM_PREV_ROOTS - 1; i++)
> +		mmu->prev_roots[i] = mmu->prev_roots[i + 1];
> +	mmu->prev_roots[i].hpa = INVALID_PAGE;
> +	return true;
> +}
> +
> +static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
> +			    gpa_t new_pgd, union kvm_mmu_page_role new_role)
> +{
>  	/*
> -	 * For now, limit the fast switch to 64-bit hosts+VMs in order to avoid
> +	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
>  	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
>  	 * later if necessary.
>  	 */
> -	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
> -	    mmu->root_level >= PT64_ROOT_4LEVEL)
> -		return cached_root_available(vcpu, new_pgd, new_role);
> +	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
>  
> -	return false;
> +	if (VALID_PAGE(mmu->root.hpa))
> +		return cached_root_find_and_keep_current(kvm, mmu, new_pgd, new_role);
> +	else
> +		return cached_root_find_without_current(kvm, mmu, new_pgd, new_role);
>  }
>  
>  static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> @@ -4160,8 +4196,8 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  {
>  	struct kvm_mmu *mmu = vcpu->arch.mmu;
>  
> -	if (!fast_pgd_switch(vcpu, new_pgd, new_role)) {
> -		kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
> +	if (!fast_pgd_switch(vcpu->kvm, mmu, new_pgd, new_role)) {
> +		/* kvm_mmu_ensure_valid_pgd will set up a new root.  */

The "kvm_mmu_ensure_valid_pgd" part is stale due to the bikeshedding stalemate.
Maybe reference vcpu_enter_guest() instead?  E.g.

	/*
	 * If no usable root is found there's nothing more to do, a new root
	 * will be set up during vcpu_enter_guest(), prior to the next VM-Enter.
	 */
	if (!fast_pgd_switch(vcpu->kvm, mmu, new_pgd, new_role))
		return;

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/18] KVM: x86/mmu: Do not use guest root level in audit
  2022-02-17 21:03 ` [PATCH v2 07/18] KVM: x86/mmu: Do not use guest root level in audit Paolo Bonzini
@ 2022-02-18 18:37   ` Sean Christopherson
  2022-02-18 18:46     ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 18:37 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Lai Jiangshan

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Walking from the root page of the shadow page table should start with
> the level of the shadow page table: shadow_root_level; do not
> consult the level in order to check whether the root has a single
> root or uses pae_root, either, and use to_shadow_page instead.
> 
> Also tweak audit_mappings(), where the current walking level is more
> valuable to print.
> 
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Since I keep bringing it up...

From: Sean Christopherson <seanjc@google.com>
Date: Fri, 18 Feb 2022 09:43:05 -0800
Subject: [PATCH] KVM: x86/mmu: Remove MMU auditing

Remove mmu_audit.c and all its collateral, the auditing code has suffered
severe bitrot, ironically partly due to shadow paging being more stable
and thus not benefiting as much from auditing, but mostly due to TDP
supplanting shadow paging for non-nested guests and shadowing of nested
TDP not heavily stressing the logic that is being audited.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../admin-guide/kernel-parameters.txt         |   4 -
 arch/x86/include/asm/kvm_host.h               |   4 -
 arch/x86/kvm/Kconfig                          |   7 -
 arch/x86/kvm/mmu/mmu.c                        |  25 --
 arch/x86/kvm/mmu/mmu_audit.c                  | 303 ------------------
 arch/x86/kvm/mmu/paging_tmpl.h                |   2 -
 6 files changed, 345 deletions(-)
 delete mode 100644 arch/x86/kvm/mmu/mmu_audit.c

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2a9746fe6c4a..05161afd7642 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2368,10 +2368,6 @@
 	kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
 				   Default is false (don't support).

-	kvm.mmu_audit=	[KVM] This is a R/W parameter which allows audit
-			KVM MMU at runtime.
-			Default is 0 (off)
-
 	kvm.nx_huge_pages=
 			[KVM] Controls the software workaround for the
 			X86_BUG_ITLB_MULTIHIT bug.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c880254300c2..c2fe020802d1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1127,10 +1127,6 @@ struct kvm_arch {
 	struct kvm_hv hyperv;
 	struct kvm_xen xen;

-	#ifdef CONFIG_KVM_MMU_AUDIT
-	int audit_point;
-	#endif
-
 	bool backwards_tsc_observed;
 	bool boot_vcpu_runs_old_kvmclock;
 	u32 bsp_vcpu_id;
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2b1548da00eb..e3cbd7706136 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,13 +126,6 @@ config KVM_XEN

 	  If in doubt, say "N".

-config KVM_MMU_AUDIT
-	bool "Audit KVM MMU"
-	depends on KVM && TRACEPOINTS
-	help
-	 This option adds a R/W kVM module parameter 'mmu_audit', which allows
-	 auditing of KVM MMU events at runtime.
-
 config KVM_EXTERNAL_WRITE_TRACKING
 	bool

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e1578f71feae..ed11a0383266 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -104,15 +104,6 @@ static int max_huge_page_level __read_mostly;
 static int tdp_root_level __read_mostly;
 static int max_tdp_level __read_mostly;

-enum {
-	AUDIT_PRE_PAGE_FAULT,
-	AUDIT_POST_PAGE_FAULT,
-	AUDIT_PRE_PTE_WRITE,
-	AUDIT_POST_PTE_WRITE,
-	AUDIT_PRE_SYNC,
-	AUDIT_POST_SYNC
-};
-
 #ifdef MMU_DEBUG
 bool dbg = 0;
 module_param(dbg, bool, 0644);
@@ -1904,13 +1895,6 @@ static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
 	return true;
 }

-#ifdef CONFIG_KVM_MMU_AUDIT
-#include "mmu_audit.c"
-#else
-static void kvm_mmu_audit(struct kvm_vcpu *vcpu, int point) { }
-static void mmu_audit_disable(void) { }
-#endif
-
 static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	if (sp->role.invalid)
@@ -3674,17 +3658,12 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 			return;

 		write_lock(&vcpu->kvm->mmu_lock);
-		kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
-
 		mmu_sync_children(vcpu, sp, true);
-
-		kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
 		write_unlock(&vcpu->kvm->mmu_lock);
 		return;
 	}

 	write_lock(&vcpu->kvm->mmu_lock);
-	kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);

 	for (i = 0; i < 4; ++i) {
 		hpa_t root = vcpu->arch.mmu->pae_root[i];
@@ -3696,7 +3675,6 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 		}
 	}

-	kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
 	write_unlock(&vcpu->kvm->mmu_lock);
 }

@@ -5247,7 +5225,6 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	gentry = mmu_pte_write_fetch_gpte(vcpu, &gpa, &bytes);

 	++vcpu->kvm->stat.mmu_pte_write;
-	kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);

 	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
 		if (detect_write_misaligned(sp, gpa, bytes) ||
@@ -5272,7 +5249,6 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		}
 	}
 	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
-	kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
 	write_unlock(&vcpu->kvm->mmu_lock);
 }

@@ -6218,7 +6194,6 @@ void kvm_mmu_module_exit(void)
 	mmu_destroy_caches();
 	percpu_counter_destroy(&kvm_total_used_mmu_pages);
 	unregister_shrinker(&mmu_shrinker);
-	mmu_audit_disable();
 }

 /*
diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
deleted file mode 100644
index 3e5d62a25350..000000000000
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ /dev/null
@@ -1,303 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * mmu_audit.c:
- *
- * Audit code for KVM MMU
- *
- * Copyright (C) 2006 Qumranet, Inc.
- * Copyright 2010 Red Hat, Inc. and/or its affiliates.
- *
- * Authors:
- *   Yaniv Kamay  <yaniv@qumranet.com>
- *   Avi Kivity   <avi@qumranet.com>
- *   Marcelo Tosatti <mtosatti@redhat.com>
- *   Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
- */
-
-#include <linux/ratelimit.h>
-
-static char const *audit_point_name[] = {
-	"pre page fault",
-	"post page fault",
-	"pre pte write",
-	"post pte write",
-	"pre sync",
-	"post sync"
-};
-
-#define audit_printk(kvm, fmt, args...)		\
-	printk(KERN_ERR "audit: (%s) error: "	\
-		fmt, audit_point_name[kvm->arch.audit_point], ##args)
-
-typedef void (*inspect_spte_fn) (struct kvm_vcpu *vcpu, u64 *sptep, int level);
-
-static void __mmu_spte_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			    inspect_spte_fn fn, int level)
-{
-	int i;
-
-	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-		u64 *ent = sp->spt;
-
-		fn(vcpu, ent + i, level);
-
-		if (is_shadow_present_pte(ent[i]) &&
-		      !is_last_spte(ent[i], level)) {
-			struct kvm_mmu_page *child;
-
-			child = to_shadow_page(ent[i] & PT64_BASE_ADDR_MASK);
-			__mmu_spte_walk(vcpu, child, fn, level - 1);
-		}
-	}
-}
-
-static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
-{
-	int i;
-	struct kvm_mmu_page *sp;
-
-	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
-		return;
-
-	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
-		hpa_t root = vcpu->arch.mmu->root.hpa;
-
-		sp = to_shadow_page(root);
-		__mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->root_level);
-		return;
-	}
-
-	for (i = 0; i < 4; ++i) {
-		hpa_t root = vcpu->arch.mmu->pae_root[i];
-
-		if (IS_VALID_PAE_ROOT(root)) {
-			root &= PT64_BASE_ADDR_MASK;
-			sp = to_shadow_page(root);
-			__mmu_spte_walk(vcpu, sp, fn, 2);
-		}
-	}
-
-	return;
-}
-
-typedef void (*sp_handler) (struct kvm *kvm, struct kvm_mmu_page *sp);
-
-static void walk_all_active_sps(struct kvm *kvm, sp_handler fn)
-{
-	struct kvm_mmu_page *sp;
-
-	list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link)
-		fn(kvm, sp);
-}
-
-static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
-{
-	struct kvm_mmu_page *sp;
-	gfn_t gfn;
-	kvm_pfn_t pfn;
-	hpa_t hpa;
-
-	sp = sptep_to_sp(sptep);
-
-	if (sp->unsync) {
-		if (level != PG_LEVEL_4K) {
-			audit_printk(vcpu->kvm, "unsync sp: %p "
-				     "level = %d\n", sp, level);
-			return;
-		}
-	}
-
-	if (!is_shadow_present_pte(*sptep) || !is_last_spte(*sptep, level))
-		return;
-
-	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
-	pfn = kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn);
-
-	if (is_error_pfn(pfn))
-		return;
-
-	hpa =  pfn << PAGE_SHIFT;
-	if ((*sptep & PT64_BASE_ADDR_MASK) != hpa)
-		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
-			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
-			     hpa, *sptep);
-}
-
-static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
-{
-	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
-	struct kvm_rmap_head *rmap_head;
-	struct kvm_mmu_page *rev_sp;
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *slot;
-	gfn_t gfn;
-
-	rev_sp = sptep_to_sp(sptep);
-	gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt);
-
-	slots = kvm_memslots_for_spte_role(kvm, rev_sp->role);
-	slot = __gfn_to_memslot(slots, gfn);
-	if (!slot) {
-		if (!__ratelimit(&ratelimit_state))
-			return;
-		audit_printk(kvm, "no memslot for gfn %llx\n", gfn);
-		audit_printk(kvm, "index %ld of sp (gfn=%llx)\n",
-		       (long int)(sptep - rev_sp->spt), rev_sp->gfn);
-		dump_stack();
-		return;
-	}
-
-	rmap_head = gfn_to_rmap(gfn, rev_sp->role.level, slot);
-	if (!rmap_head->val) {
-		if (!__ratelimit(&ratelimit_state))
-			return;
-		audit_printk(kvm, "no rmap for writable spte %llx\n",
-			     *sptep);
-		dump_stack();
-	}
-}
-
-static void audit_sptes_have_rmaps(struct kvm_vcpu *vcpu, u64 *sptep, int level)
-{
-	if (is_shadow_present_pte(*sptep) && is_last_spte(*sptep, level))
-		inspect_spte_has_rmap(vcpu->kvm, sptep);
-}
-
-static void audit_spte_after_sync(struct kvm_vcpu *vcpu, u64 *sptep)
-{
-	struct kvm_mmu_page *sp = sptep_to_sp(sptep);
-
-	if (vcpu->kvm->arch.audit_point == AUDIT_POST_SYNC && sp->unsync)
-		audit_printk(vcpu->kvm, "meet unsync sp(%p) after sync "
-			     "root.\n", sp);
-}
-
-static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	int i;
-
-	if (sp->role.level != PG_LEVEL_4K)
-		return;
-
-	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-		if (!is_shadow_present_pte(sp->spt[i]))
-			continue;
-
-		inspect_spte_has_rmap(kvm, sp->spt + i);
-	}
-}
-
-static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	struct kvm_rmap_head *rmap_head;
-	u64 *sptep;
-	struct rmap_iterator iter;
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *slot;
-
-	if (sp->role.direct || sp->unsync || sp->role.invalid)
-		return;
-
-	slots = kvm_memslots_for_spte_role(kvm, sp->role);
-	slot = __gfn_to_memslot(slots, sp->gfn);
-	rmap_head = gfn_to_rmap(sp->gfn, PG_LEVEL_4K, slot);
-
-	for_each_rmap_spte(rmap_head, &iter, sptep) {
-		if (is_writable_pte(*sptep))
-			audit_printk(kvm, "shadow page has writable "
-				     "mappings: gfn %llx role %x\n",
-				     sp->gfn, sp->role.word);
-	}
-}
-
-static void audit_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	check_mappings_rmap(kvm, sp);
-	audit_write_protection(kvm, sp);
-}
-
-static void audit_all_active_sps(struct kvm *kvm)
-{
-	walk_all_active_sps(kvm, audit_sp);
-}
-
-static void audit_spte(struct kvm_vcpu *vcpu, u64 *sptep, int level)
-{
-	audit_sptes_have_rmaps(vcpu, sptep, level);
-	audit_mappings(vcpu, sptep, level);
-	audit_spte_after_sync(vcpu, sptep);
-}
-
-static void audit_vcpu_spte(struct kvm_vcpu *vcpu)
-{
-	mmu_spte_walk(vcpu, audit_spte);
-}
-
-static bool mmu_audit;
-static DEFINE_STATIC_KEY_FALSE(mmu_audit_key);
-
-static void __kvm_mmu_audit(struct kvm_vcpu *vcpu, int point)
-{
-	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
-
-	if (!__ratelimit(&ratelimit_state))
-		return;
-
-	vcpu->kvm->arch.audit_point = point;
-	audit_all_active_sps(vcpu->kvm);
-	audit_vcpu_spte(vcpu);
-}
-
-static inline void kvm_mmu_audit(struct kvm_vcpu *vcpu, int point)
-{
-	if (static_branch_unlikely((&mmu_audit_key)))
-		__kvm_mmu_audit(vcpu, point);
-}
-
-static void mmu_audit_enable(void)
-{
-	if (mmu_audit)
-		return;
-
-	static_branch_inc(&mmu_audit_key);
-	mmu_audit = true;
-}
-
-static void mmu_audit_disable(void)
-{
-	if (!mmu_audit)
-		return;
-
-	static_branch_dec(&mmu_audit_key);
-	mmu_audit = false;
-}
-
-static int mmu_audit_set(const char *val, const struct kernel_param *kp)
-{
-	int ret;
-	unsigned long enable;
-
-	ret = kstrtoul(val, 10, &enable);
-	if (ret < 0)
-		return -EINVAL;
-
-	switch (enable) {
-	case 0:
-		mmu_audit_disable();
-		break;
-	case 1:
-		mmu_audit_enable();
-		break;
-	default:
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static const struct kernel_param_ops audit_param_ops = {
-	.set = mmu_audit_set,
-	.get = param_get_bool,
-};
-
-arch_param_cb(mmu_audit, &audit_param_ops, &mmu_audit, 0644);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 346f3bad3cb9..252c77805eb9 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -904,12 +904,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (is_page_fault_stale(vcpu, fault, mmu_seq))
 		goto out_unlock;

-	kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
 	r = make_mmu_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
 	r = FNAME(fetch)(vcpu, fault, &walker);
-	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);

 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);

base-commit: 385d1e4898fb823e0bb25b6c23d000400bf6340e
--


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions
  2022-02-17 21:03 ` [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions Paolo Bonzini
@ 2022-02-18 18:39   ` Sean Christopherson
  2022-02-23 15:16   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 18:39 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> These functions only operate on a given MMU, of which there are two in a vCPU.

Technically 3, but one is only used to walk guest page tables ;-)

> They also need a struct kvm in order to lock the mmu_lock, but they do not
> need anything else in the struct kvm_vcpu.  So, pass the vcpu->kvm directly
> to them.

Wrapping at ~75 chars is preferred for changelogs.

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>
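
Concretely, the change amounts to a signature change along these lines
(a sketch for orientation only, not the exact hunks from the patch):

	-void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
	-			ulong roots_to_free);
	+void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
	+			ulong roots_to_free);

with callers passing vcpu->kvm instead of vcpu.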

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/18] KVM: x86/mmu: Do not use guest root level in audit
  2022-02-18 18:37   ` Sean Christopherson
@ 2022-02-18 18:46     ` Paolo Bonzini
  2022-02-23 15:02       ` Maxim Levitsky
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-18 18:46 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm, Lai Jiangshan

On 2/18/22 19:37, Sean Christopherson wrote:
> Since I keep bringing it up...
> 
> From: Sean Christopherson<seanjc@google.com>
> Date: Fri, 18 Feb 2022 09:43:05 -0800
> Subject: [PATCH] KVM: x86/mmu: Remove MMU auditing
> 
> Remove mmu_audit.c and all its collateral, the auditing code has suffered
> severe bitrot, ironically partly due to shadow paging being more stable
> and thus not benefiting as much from auditing, but mostly due to TDP
> supplanting shadow paging for non-nested guests and shadowing of nested
> TDP not heavily stressing the logic that is being audited.
> 
> Signed-off-by: Sean Christopherson<seanjc@google.com>

Queued, thanks. O:-)

Paolo


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3
  2022-02-17 21:03 ` [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3 Paolo Bonzini
@ 2022-02-18 20:30   ` Sean Christopherson
  2022-02-19 10:03     ` Paolo Bonzini
  2022-02-24 11:02   ` Maxim Levitsky
  1 sibling, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 20:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Most of the time, calls to get_guest_pgd result in calling
> kvm_read_cr3 (the exception is only nested TDP).  Hardcode
> the default instead of using the get_cr3 function, avoiding
> a retpoline if they are enabled.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu.h             | 13 +++++++++++++
>  arch/x86/kvm/mmu/mmu.c         | 15 +++++----------
>  arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
>  arch/x86/kvm/x86.c             |  2 +-
>  4 files changed, 20 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 1d0c1904d69a..1808d6814ddb 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -116,6 +116,19 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>  					  vcpu->arch.mmu->shadow_root_level);
>  }
>  
> +static inline gpa_t __kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
> +{

I'd prefer to do what we do for page faults.  That approach avoids the need for a
comment to document NULL and avoids a conditional when RETPOLINE is not enabled.

Might be worth renaming get_cr3 => get_guest_cr3 though.

#ifdef CONFIG_RETPOLINE
	if (mmu->get_guest_pgd == get_guest_cr3)
		return kvm_read_cr3(vcpu);
#endif
	return mmu->get_guest_pgd(vcpu);


> +	if (!mmu->get_guest_pgd)
> +		return kvm_read_cr3(vcpu);
> +	else
> +		return mmu->get_guest_pgd(vcpu);
> +}
> +
> +static inline gpa_t kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu)
> +{
> +	return __kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu);

I'd much prefer we don't provide an @vcpu-only variant and force the caller to
provide the mmu.
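
Putting the two together, the helper would presumably end up as something
like this (a sketch only, assuming the get_cr3 -> get_guest_cr3 rename
suggested above; not the final patch):

static inline gpa_t kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu,
					  struct kvm_mmu *mmu)
{
#ifdef CONFIG_RETPOLINE
	/* Avoid the retpoline for the overwhelmingly common case. */
	if (mmu->get_guest_pgd == get_guest_cr3)
		return kvm_read_cr3(vcpu);
#endif
	return mmu->get_guest_pgd(vcpu);
}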

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd
  2022-02-18  9:39   ` Paolo Bonzini
@ 2022-02-18 21:00     ` Sean Christopherson
  2022-02-24 15:41       ` Maxim Levitsky
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 21:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Fri, Feb 18, 2022, Paolo Bonzini wrote:
> On 2/17/22 22:03, Paolo Bonzini wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index adcee7c305ca..9800c8883a48 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1189,7 +1189,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >   		return 1;
> >   	if (cr3 != kvm_read_cr3(vcpu))
> > -		kvm_mmu_new_pgd(vcpu, cr3);
> > +		kvm_mmu_update_root(vcpu);
> >   	vcpu->arch.cr3 = cr3;
> >   	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> 
> Uh-oh, this has to become:
> 
>  	vcpu->arch.cr3 = cr3;
>  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> 	if (!is_pae_paging(vcpu))
> 		kvm_mmu_update_root(vcpu);
> 
> The regression would go away after patch 16, but this is more tidy apart
> from having to check is_pae_paging *again*.
> 
> Incremental patch:
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index adcee7c305ca..0085e9fba372 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1188,11 +1189,11 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
>  		return 1;
> -	if (cr3 != kvm_read_cr3(vcpu))
> -		kvm_mmu_update_root(vcpu);
> -
>  	vcpu->arch.cr3 = cr3;
>  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> +	if (!is_pae_paging(vcpu))
> +		kvm_mmu_update_root(vcpu);
> +
>  	/* Do not call post_set_cr3, we do not get here for confidential guests.  */
> 
> An alternative is to move the vcpu->arch.cr3 update in load_pdptrs.
> Reviewers, let me know if you prefer that, then I'll send v3.

  c) None of the above.

MOV CR3 never requires a new root if TDP is enabled, and the guest_mmu is used if
and only if TDP is enabled.  Even when KVM intercepts CR3 when EPT=1 && URG=0, it
does so only to snapshot vcpu->arch.cr3, there's no need to get a new PGD.

Unless I'm missing something, your original suggestion of checking tdp_enabled is
the way to go.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6e0f7f22c6a7..2b02029c63d0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1187,7 +1187,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
        if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
                return 1;

-       if (cr3 != kvm_read_cr3(vcpu))
+       if (!tdp_enabled && cr3 != kvm_read_cr3(vcpu))
                kvm_mmu_new_pgd(vcpu, cr3);

        vcpu->arch.cr3 = cr3;



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT
  2022-02-17 21:03 ` [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT Paolo Bonzini
@ 2022-02-18 21:45   ` Sean Christopherson
  2022-02-19  7:54     ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 21:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Whenever KVM knows the page role flags have changed, it needs to drop
> the current MMU root and possibly load one from the prev_roots cache.
> Currently it is papering over some overly simplistic code by just
> dropping _all_ roots, so that the root will be reloaded by
> kvm_mmu_reload, but this has bad performance for the TDP MMU
> (which drops the whole of the page tables when freeing a root,
> without the performance safety net of a hash table).
> 
> To do this, KVM needs to do a kvm_mmu_update_root call from
> kvm_mmu_reset_context.  Introduce a new request bit so that the call
> can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
> kill all hopes of finding a cached PGD.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Please no.

I really, really do not want to add yet another deferred-load in the nested
virtualization paths.  As Jim pointed out[1], KVM_REQ_GET_NESTED_STATE_PAGES should
never have been merged. And on that point, I've no idea how this new request will
interact with KVM_REQ_GET_NESTED_STATE_PAGES.  It may be a complete non-issue, but
I'd honestly rather not have to spend the brain power.

And I still do not like the approach of converting kvm_mmu_reset_context() wholesale
to not doing kvm_mmu_unload().  There are currently eight kvm_mmu_reset_context() calls:

  1.   nested_vmx_restore_host_state() - Only for a missed VM-Entry => VM-Fail
       consistency check, not at all a performance concern.

  2.   kvm_mmu_after_set_cpuid() - Still needs to unload.  Not a perf concern.

  3.   kvm_vcpu_reset() - Relevant only to INIT.  Not a perf concern, but could be
       converted manually to a different path without too much fuss.

  4+5. enter_smm() / kvm_smm_changed() - IMO, not a perf concern, but again could
       be converted manually if anyone cares.

  6.   set_efer() - Silly corner case that basically requires host userspace abuse
       of KVM APIs.  Not a perf concern.

  7+8. kvm_post_set_cr0/4() - These are the ones we really care about, and they
       can be handled quite trivially, and can even share much of the logic with
       kvm_set_cr3().

I strongly prefer that we take a more conservative approach and fix 7+8, and then
tackle 1, 3, and 4+5 separately if someone cares enough about those flows to avoid
dropping roots.

Regarding KVM_REQ_MMU_RELOAD, that mess mostly goes away with my series to replace
that with KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.  Obsolete TDP MMU roots will never get
a cache hit because the obsolete root will have an "invalid" role.  And if we care
about optimizing this with respect to a memslot (highly unlikely), then we could
add an MMU generation check in the cache lookup.  I was planning on posting that
series as soon as this one is queued, but I'm more than happy to speculatively send
a refreshed version that applies on top of this series.

[1] https://lore.kernel.org/all/CALMp9eT2cP7kdptoP3=acJX+5_Wg6MXNwoDh42pfb21-wdXvJg@mail.gmail.com
[2] https://lore.kernel.org/all/20211209060552.2956723-1-seanjc@google.com
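
For reference, the mechanics being debated are the usual KVM request
pattern: kvm_mmu_reset_context() raises the bit and vcpu_enter_guest()
consumes it on the next entry, roughly (a sketch of the pattern using the
names from the quoted patch, not the exact hunk):

	if (kvm_check_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu))
		kvm_mmu_update_root(vcpu);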

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset
  2022-02-17 21:03 ` [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset Paolo Bonzini
@ 2022-02-18 23:57   ` Sean Christopherson
  2022-02-21 15:01     ` Paolo Bonzini
  2022-02-24 16:11   ` Maxim Levitsky
  1 sibling, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 23:57 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> For both CR0 and CR4, disassociate the TLB flush logic from the
> MMU role logic.  Instead  of relying on kvm_mmu_reset_context() being
> a superset of various TLB flushes (which is not necessarily going to
> be the case in the future), always call it if the role changes
> but also set the various TLB flush requests according to what is
> in the manual.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Code is good, a few nits on comments.

Reviewed-by: Sean Christopherson <seanjc@google.com>

> @@ -1057,28 +1064,41 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
>  
>  void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
>  {
> +	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
> +		kvm_mmu_reset_context(vcpu);
> +
>  	/*
> -	 * If any role bit is changed, the MMU needs to be reset.
> -	 *
> -	 * If CR4.PCIDE is changed 1 -> 0, the guest TLB must be flushed.
>  	 * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
>  	 * according to the SDM; however, stale prev_roots could be reused
>  	 * incorrectly in the future after a MOV to CR3 with NOFLUSH=1, so we
> -	 * free them all.  KVM_REQ_MMU_RELOAD is fit for the both cases; it
> -	 * is slow, but changing CR4.PCIDE is a rare case.
> -	 *
> -	 * If CR4.PGE is changed, the guest TLB must be flushed.
> -	 *
> -	 * Note: resetting MMU is a superset of KVM_REQ_MMU_RELOAD and
> -	 * KVM_REQ_MMU_RELOAD is a superset of KVM_REQ_TLB_FLUSH_GUEST, hence
> -	 * the usage of "else if".
> +	 * free them all.  This is *not* a superset of KVM_REQ_TLB_FLUSH_GUEST
> +	 * or KVM_REQ_TLB_FLUSH_CURRENT, because the hardware TLB is not flushed,
> +	 * so fall through.
>  	 */
> -	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
> -		kvm_mmu_reset_context(vcpu);
> -	else if ((cr4 ^ old_cr4) & X86_CR4_PCIDE)
> +	if (!tdp_enabled &&
> +	    (cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE))
>  		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> -	else if ((cr4 ^ old_cr4) & X86_CR4_PGE)
> -		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +
> +	/*
> +	 * The TLB has to be flushed for all PCIDs on:
> +	 * - CR4.PCIDE changed from 1 to 0

Uber nit, grammatically this should use "a ... change", not "changed".  And I
think it's worth calling out that the flush is architecturally required.
Something like this, though I don't like using "conditions" to describe the
cases (can't think of a better word, obviously).

	/*
	 * A TLB flush for all PCIDs is architecturally required if any of the
	 * following conditions is true:
	 * - CR4.PCIDE is changed from 1 to 0
	 * - CR4.PGE is toggled
	 */

> +	 * - any change to CR4.PGE
> +	 *
> +	 * This is a superset of KVM_REQ_TLB_FLUSH_CURRENT.
> +	 */
> +	if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
> +	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
> +		 kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +
> +	/*
> +	 * The TLB has to be flushed for the current PCID on:
> +	 * - CR4.SMEP changed from 0 to 1
> +	 * - any change to CR4.PAE
> +	 */

Same nit plus "architecturally required" feedback for this one.

> +	else if (((cr4 ^ old_cr4) & X86_CR4_PAE) ||
> +		 ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP)))
> +		 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> +
>  }
>  EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
>  
> @@ -11323,15 +11343,17 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	static_call(kvm_x86_update_exception_bitmap)(vcpu);
>  
>  	/*
> -	 * Reset the MMU context if paging was enabled prior to INIT (which is
> +	 * A TLB flush is needed if paging was enabled prior to INIT (which is

I appreciate the cleverness in changing only a single line, but I think both
pieces warrant a mention.  How 'bout this, to squeak by with two lines?

	/*
	 * Reset the MMU and flush the TLB if paging was enabled (INIT only, as
	 * CR0 is currently guaranteed to be '0' prior to RESET).  Unlike the

>  	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
>  	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
>  	 * checked because it is unconditionally cleared on INIT and all other
>  	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
>  	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
>  	 */
> -	if (old_cr0 & X86_CR0_PG)
> +	if (old_cr0 & X86_CR0_PG) {
> +		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
>  		kvm_mmu_reset_context(vcpu);
> +	}
>  
>  	/*
>  	 * Intel's SDM states that all TLB entries are flushed on INIT.  AMD's
> -- 
> 2.31.1
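
As a self-contained model of the architectural rules above (CR4 bit
positions per the SDM; this only illustrates the decision, it is not
KVM code):

#include <stdbool.h>

#define X86_CR4_PAE	(1UL << 5)
#define X86_CR4_PGE	(1UL << 7)
#define X86_CR4_PCIDE	(1UL << 17)
#define X86_CR4_SMEP	(1UL << 20)

/* All PCIDs: CR4.PGE toggled, or CR4.PCIDE cleared (1 -> 0). */
static bool cr4_needs_full_tlb_flush(unsigned long old_cr4, unsigned long cr4)
{
	return ((cr4 ^ old_cr4) & X86_CR4_PGE) ||
	       (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE));
}

/* Current PCID only: CR4.PAE toggled, or CR4.SMEP set (0 -> 1). */
static bool cr4_needs_current_pcid_flush(unsigned long old_cr4, unsigned long cr4)
{
	return ((cr4 ^ old_cr4) & X86_CR4_PAE) ||
	       ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP));
}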

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/18] KVM: x86/mmu: load new PGD after the shadow MMU is initialized
  2022-02-17 21:03 ` [PATCH v2 10/18] KVM: x86/mmu: load new PGD after the shadow MMU is initialized Paolo Bonzini
@ 2022-02-18 23:59   ` Sean Christopherson
  2022-02-23 16:20   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 23:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and
> shadow_root_level anymore, pull the PGD load after the initialization of
> the shadow MMUs.
> 
> Besides being more intuitive, this enables future simplifications
> and optimizations because it's not necessary anymore to compute the
> role outside kvm_init_mmu.  In particular, kvm_mmu_reset_context was not
> attempting to use a cached PGD to avoid having to figure out the new role.
> It will soon be able to follow what nested_{vmx,svm}_load_cr3 are doing,
> and avoid unloading all the cached roots.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 12/18] KVM: x86/mmu: clear MMIO cache when unloading the MMU
  2022-02-17 21:03 ` [PATCH v2 12/18] KVM: x86/mmu: clear MMIO cache when unloading the MMU Paolo Bonzini
@ 2022-02-18 23:59   ` Sean Christopherson
  2022-02-23 16:32   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 23:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> For cleanliness, do not leave a stale GVA in the cache after all the roots are
> cleared.  In practice, kvm_mmu_load will go through kvm_mmu_sync_roots if
> paging is on, and will not use vcpu_match_mmio_gva at all if paging is off.
> However, leaving data in the cache might cause bugs in the future.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>
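
For context, the cache in question is the per-vCPU MMIO GVA/GPA cache;
clearing it when the roots go away presumably boils down to the same
helper already used by kvm_mmu_sync_roots() (a sketch, not the actual
hunk):

	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);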

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 11/18] KVM: x86/mmu: Always use current mmu's role when loading new PGD
  2022-02-17 21:03 ` [PATCH v2 11/18] KVM: x86/mmu: Always use current mmu's role when loading new PGD Paolo Bonzini
@ 2022-02-18 23:59   ` Sean Christopherson
  2022-02-23 16:23   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-18 23:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Since the guest PGD is now loaded after the MMU has been set up
> completely, the desired role for a cache hit is simply the current
> mmu_role.  There is no need to compute it again, so __kvm_mmu_new_pgd
> can be folded in kvm_mmu_new_pgd.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 13/18] KVM: x86: reset and reinitialize the MMU in __set_sregs_common
  2022-02-17 21:03 ` [PATCH v2 13/18] KVM: x86: reset and reinitialize the MMU in __set_sregs_common Paolo Bonzini
@ 2022-02-19  0:22   ` Sean Christopherson
  2022-02-23 16:48   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-19  0:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Do a full unload of the MMU in KVM_SET_SREGS and KVM_SET_SREGS2, in
> preparation for not doing so in kvm_mmu_reset_context.  There is no
> need to delay the reset until after the return, so do it directly in
> the __set_sregs_common function and remove the mmu_reset_needed output
> parameter.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

> +	kvm_init_mmu(vcpu);
>  	if (update_pdptrs) {
>  		idx = srcu_read_lock(&vcpu->kvm->srcu);
> -		if (is_pae_paging(vcpu)) {
> +		if (is_pae_paging(vcpu))
>  			load_pdptrs(vcpu, kvm_read_cr3(vcpu));
> -			*mmu_reset_needed = 1;

Eww (not your code, just this whole pile).  It might be worth calling out in the
changelog that calling kvm_init_mmu() before load_pdptrs() will (subtly) _not_
impact the functionality of load_pdptrs().  If the MMU is nested, kvm_init_mmu()
will modify vcpu->arch.nested_mmu, whereas kvm_translate_gpa() will walk
vcpu->arch.guest_mmu.  And if the MMU is not nested, kvm_translate_gpa() will not
consume vcpu->arch.mmu other than to check if it's == &guest_mmu.

> -		}
>  		srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  	}
>  
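
Per the hunk quoted above, the resulting ordering in __set_sregs_common
is roughly (a sketch of the post-patch flow, not the literal code):

	kvm_init_mmu(vcpu);
	if (update_pdptrs) {
		idx = srcu_read_lock(&vcpu->kvm->srcu);
		if (is_pae_paging(vcpu))
			load_pdptrs(vcpu, kvm_read_cr3(vcpu));
		srcu_read_unlock(&vcpu->kvm->srcu, idx);
	}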

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT
  2022-02-18 21:45   ` Sean Christopherson
@ 2022-02-19  7:54     ` Paolo Bonzini
  2022-02-22 16:06       ` Sean Christopherson
  2022-02-24 15:50       ` Maxim Levitsky
  0 siblings, 2 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-19  7:54 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm

On 2/18/22 22:45, Sean Christopherson wrote:
> On Thu, Feb 17, 2022, Paolo Bonzini wrote:
>> Whenever KVM knows the page role flags have changed, it needs to drop
>> the current MMU root and possibly load one from the prev_roots cache.
>> Currently it is papering over some overly simplistic code by just
>> dropping _all_ roots, so that the root will be reloaded by
>> kvm_mmu_reload, but this has bad performance for the TDP MMU
>> (which drops the whole of the page tables when freeing a root,
>> without the performance safety net of a hash table).
>>
>> To do this, KVM needs to do a kvm_mmu_update_root call from
>> kvm_mmu_reset_context.  Introduce a new request bit so that the call
>> can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
>> kill all hopes of finding a cached PGD.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
> 
> Please no.
> 
> I really, really do not want to add yet another deferred-load in the nested
> virtualization paths.

This is not a deferred load, is it?  It's only kvm_mmu_new_pgd that is 
deferred, but the PDPTR load is not.

I think I should first merge patches 1-13, then revisit the root_role 
series (which only depends on the fast_pgd_switch and caching changes), 
and then finally get back to this final part.  The reason is that 
root_role is what enables the stale-root check that you wanted; and it's 
easier to think about loading the guest PGD post-kvm_init_mmu if I can 
show you the direction I'd like to have in general, and not leave things 
half-done.

(Patch 17 is also independent and perhaps fixing a case of premature 
optimization, so I'm inclined to merge it as well).

> As Jim pointed out[1], KVM_REQ_GET_NESTED_STATE_PAGES should
> never have been merged. And on that point, I've no idea how this new request will
> interact with KVM_REQ_GET_NESTED_STATE_PAGES.  It may be a complete non-issue, but
> I'd honestly rather not have to spend the brain power.

Fair enough on the interaction, but I still think 
KVM_REQ_GET_NESTED_STATE_PAGES is a good idea.  I don't think KVM should 
access guest memory outside KVM_RUN, though there may be cases (possibly 
some PV MSRs, if I had to guess) where it does.

> And I still do not like the approach of converting kvm_mmu_reset_context() wholesale
> to not doing kvm_mmu_unload().  There are currently eight kvm_mmu_reset_context() calls:
> 
>    1.   nested_vmx_restore_host_state() - Only for a missed VM-Entry => VM-Fail
>         consistency check, not at all a performance concern.
> 
>    2.   kvm_mmu_after_set_cpuid() - Still needs to unload.  Not a perf concern.
> 
>    3.   kvm_vcpu_reset() - Relevant only to INIT.  Not a perf concern, but could be
>         converted manually to a different path without too much fuss.
> 
>    4+5. enter_smm() / kvm_smm_changed() - IMO, not a perf concern, but again could
>         be converted manually if anyone cares.
> 
>    6.   set_efer() - Silly corner case that basically requires host userspace abuse
>         of KVM APIs.  Not a perf concern.
> 
>    7+8. kvm_post_set_cr0/4() - These are the ones we really care about, and they
>         can be handled quite trivially, and can even share much of the logic with
>         kvm_set_cr3().
> 
> I strongly prefer that we take a more conservative approach and fix 7+8, and then
> tackle 1, 3, and 4+5 separately if someone cares enough about those flows to avoid
> dropping roots.

The thing is, I want to get rid of kvm_mmu_reset_context() altogether. 
I dislike the fact that it kills the roots but still keeps them in the 
hash table, thus relying on separate syncing to avoid future bugs.  It's 
very unintuitive what is "reset" and what isn't.

> Regarding KVM_REQ_MMU_RELOAD, that mess mostly goes away with my series to replace
> that with KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.  Obsolete TDP MMU roots will never get
> a cache hit because the obsolete root will have an "invalid" role.  And if we care
> about optimizing this with respect to a memslot (highly unlikely), then we could
> add an MMU generation check in the cache lookup.  I was planning on posting that
> series as soon as this one is queued, but I'm more than happy to speculatively send
> a refreshed version that applies on top of this series.

Yes, please send a version on top of patches 1-13.  That can be reviewed 
and committed in parallel with the root_role changes.

Paolo

> [1] https://lore.kernel.org/all/CALMp9eT2cP7kdptoP3=acJX+5_Wg6MXNwoDh42pfb21-wdXvJg@mail.gmail.com
> [2] https://lore.kernel.org/all/20211209060552.2956723-1-seanjc@google.com


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3
  2022-02-18 20:30   ` Sean Christopherson
@ 2022-02-19 10:03     ` Paolo Bonzini
  0 siblings, 0 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-19 10:03 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm

On 2/18/22 21:30, Sean Christopherson wrote:
> On Thu, Feb 17, 2022, Paolo Bonzini wrote:
>> Most of the time, calls to get_guest_pgd result in calling
>> kvm_read_cr3 (the exception is only nested TDP).  Hardcode
>> the default instead of using the get_cr3 function, avoiding
>> a retpoline if they are enabled.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   arch/x86/kvm/mmu.h             | 13 +++++++++++++
>>   arch/x86/kvm/mmu/mmu.c         | 15 +++++----------
>>   arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
>>   arch/x86/kvm/x86.c             |  2 +-
>>   4 files changed, 20 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 1d0c1904d69a..1808d6814ddb 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -116,6 +116,19 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>>   					  vcpu->arch.mmu->shadow_root_level);
>>   }
>>   
>> +static inline gpa_t __kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
>> +{
> 
> I'd prefer to do what we do for page faults.  That approach avoids the need for a
> comment to document NULL and avoids a conditional when RETPOLINE is not enabled.
> 
> Might be worth renaming get_cr3 => get_guest_cr3 though.

I did it this way to avoid a slightly gratuitous extern function just 
because kvm_mmu_get_guest_pgd and kvm_read_cr3 are both inline.  But at 
least it's not an export since there are no callers in vmx/svm, so it's 
okay to do it as you suggested.

> #ifdef CONFIG_RETPOLINE
> 	if (mmu->get_guest_pgd == get_guest_cr3)
> 		return kvm_read_cr3(vcpu);
> #endif
> 	return mmu->get_guest_pgd(vcpu);
> 
> 
>> +	if (!mmu->get_guest_pgd)
>> +		return kvm_read_cr3(vcpu);
>> +	else
>> +		return mmu->get_guest_pgd(vcpu);
>> +}
>> +
>> +static inline gpa_t kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu)
>> +{
>> +	return __kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu);
> 
> I'd much prefer we don't provide an @vcpu-only variant and force the caller to
> provide the mmu.

No problem.

Paolo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset
  2022-02-18 23:57   ` Sean Christopherson
@ 2022-02-21 15:01     ` Paolo Bonzini
  0 siblings, 0 replies; 66+ messages in thread
From: Paolo Bonzini @ 2022-02-21 15:01 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm

On 2/19/22 00:57, Sean Christopherson wrote:
> I appreciate the cleverness in changing only a single line, but I think both
> pieces warrant a mention.  How 'bout this, to squeak by with two lines?
> 
> 	/*
> 	 * Reset the MMU and flush the TLB if paging was enabled (INIT only, as
> 	 * CR0 is currently guaranteed to be '0' prior to RESET).  Unlike the

Let's just make it clearer:

          * On the standard CR0/CR4/EFER modification paths, there are several
          * complex conditions determining whether the MMU has to be reset and/or
          * which PCIDs have to be flushed.  However, CR0.WP and the paging-related
          * bits in CR4 and EFER are irrelevant if CR0.PG was '0'; and a reset+flush
          * is needed anyway if CR0.PG was '1' (which can only happen for INIT, as
          * CR0 will be '0' prior to RESET).  So we only need to check CR0.PG here.

Paolo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT
  2022-02-19  7:54     ` Paolo Bonzini
@ 2022-02-22 16:06       ` Sean Christopherson
  2022-02-24 15:50       ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-22 16:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Sat, Feb 19, 2022, Paolo Bonzini wrote:
> On 2/18/22 22:45, Sean Christopherson wrote:
> > On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> > > Whenever KVM knows the page role flags have changed, it needs to drop
> > > the current MMU root and possibly load one from the prev_roots cache.
> > > Currently it is papering over some overly simplistic code by just
> > > dropping _all_ roots, so that the root will be reloaded by
> > > kvm_mmu_reload, but this has bad performance for the TDP MMU
> > > (which drops the whole of the page tables when freeing a root,
> > > without the performance safety net of a hash table).
> > > 
> > > To do this, KVM needs to do a kvm_mmu_update_root call from
> > > kvm_mmu_reset_context.  Introduce a new request bit so that the call
> > > can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
> > > kill all hopes of finding a cached PGD.
> > > 
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > ---
> > 
> > Please no.
> > 
> > I really, really do not want to add yet another deferred-load in the nested
> > virtualization paths.
> 
> This is not a deferred load, is it?  It's only kvm_mmu_new_pgd that is
> deferred, but the PDPTR load is not.

Yeah, I'm referring to kvm_mmu_new_pgd().

> > I strongly prefer that we take a more conservative approach and fix 7+8, and then
> > tackle 1, 3, and 4+5 separately if someone cares enough about those flows to avoid
> > dropping roots.
> 
> The thing is, I want to get rid of kvm_mmu_reset_context() altogether. I
> dislike the fact that it kills the roots but still keeps them in the hash
> table, thus relying on separate syncing to avoid future bugs.  It's very
> unintuitive what is "reset" and what isn't.

I agree with all of the above, I just don't think that forcing the issue is going
to be a net positive in the long run.

> > Regarding KVM_REQ_MMU_RELOAD, that mess mostly goes away with my series to replace
> > that with KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.  Obsolete TDP MMU roots will never get
> > a cache hit because the obsolete root will have an "invalid" role.  And if we care
> > about optimizing this with respect to a memslot (highly unlikely), then we could
> > add an MMU generation check in the cache lookup.  I was planning on posting that
> > series as soon as this one is queued, but I'm more than happy to speculatively send
> > a refreshed version that applies on top of this series.
> 
> Yes, please send a version on top of patches 1-13.  That can be reviewed and
> committed in parallel with the root_role changes.

Will do.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU
  2022-02-17 21:03 ` [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU Paolo Bonzini
  2022-02-18 17:08   ` Sean Christopherson
@ 2022-02-23 13:40   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 13:40 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc, stable

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> While the guest runs, EFER.LME cannot change unless CR0.PG is clear, and therefore
> EFER.NX is the only bit that can affect the MMU role.  However, set_efer accepts
> a host-initiated change to EFER.LME even with CR0.PG=1.  In that case, the
> MMU has to be reset.
> 
> Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu.h | 1 +
>  arch/x86/kvm/x86.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 51faa2c76ca5..a5a50cfeffff 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -48,6 +48,7 @@
>  			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE)
>  
>  #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP)
> +#define KVM_MMU_EFER_ROLE_BITS (EFER_LME | EFER_NX)
>  
>  static __always_inline u64 rsvd_bits(int s, int e)
>  {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d3da64106685..99a58c25f5c2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1647,7 +1647,7 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	}
>  
>  	/* Update reserved bits */
> -	if ((efer ^ old_efer) & EFER_NX)
> +	if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
>  		kvm_mmu_reset_context(vcpu);
>  
>  	return 0;

It makes sense.

I am just curious: is there a report of a failure
due to this issue? I can imagine something like this breaking
nested migration of 32-bit guests, SMM transitions, and the like.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 02/18] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0
  2022-02-17 21:03 ` [PATCH v2 02/18] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0 Paolo Bonzini
  2022-02-18 17:12   ` Sean Christopherson
@ 2022-02-23 14:07   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 14:07 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> Enabling async page faults is nonsensical if paging is disabled, but
> it is allowed because CR0.PG=0 does not clear the async page fault
> MSR.  Just ignore them and only use the artificial halt state,
> similar to what happens in guest mode if async #PF vmexits are disabled.

Well, in theory someone could use KVM to emulate DOS programs and
use async #PF for on-demand paging. I would question the sanity of
the author of such a hypervisor though...


The only thing I would add is a mention of the CR0.PG=1 restriction in
Documentation/virt/kvm/msr.rst.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky

> 
> Given the increasingly complex logic, and the nicer code if the new
> "if" is placed last, opportunistically change the "||" into a chain
> of "if (...) return false" statements.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 99a58c25f5c2..b912eef5dc1a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12270,14 +12270,28 @@ static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu)
>  
>  static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
>  {
> -	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
> +
> +	if (!kvm_pv_async_pf_enabled(vcpu))
>  		return false;
>  
> -	if (!kvm_pv_async_pf_enabled(vcpu) ||
> -	    (vcpu->arch.apf.send_user_only && static_call(kvm_x86_get_cpl)(vcpu) == 0))
> +	if (vcpu->arch.apf.send_user_only &&
> +	    static_call(kvm_x86_get_cpl)(vcpu) == 0)
>  		return false;
>  
> -	return true;
> +	if (is_guest_mode(vcpu)) {
> +		/*
> +		 * L1 needs to opt into the special #PF vmexits that are
> +		 * used to deliver async page faults.
> +		 */
> +		return vcpu->arch.apf.delivery_as_pf_vmexit;
> +	} else {
> +		/*
> +		 * Play it safe in case the guest does a quick real mode
> +		 * foray.  The real mode IDT is unlikely to have a #PF
> +		 * exception setup.
> +		 */
> +		return is_paging(vcpu);
> +	}
>  }
>  
>  bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 03/18] KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload
  2022-02-18 17:23     ` Paolo Bonzini
@ 2022-02-23 14:11       ` Maxim Levitsky
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 14:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson; +Cc: linux-kernel, kvm

On Fri, 2022-02-18 at 18:23 +0100, Paolo Bonzini wrote:
> On 2/18/22 18:14, Sean Christopherson wrote:
> > Checkpatch doesn't like it, and IMO the existing asserts
> > are unnecessary.
> 
> I agree that removing the assertions could be another way to go.
> 
> A third and better one could be to just wait until pae_root is gone.  I 
> have started looking at it but I would like your opinion on one detail; 
> see question I posted at 
> https://lore.kernel.org/kvm/7ccb16e5-579e-b3d9-cedc-305152ef9b8f@redhat.com/.
> 
> For now I'll drop this patch.

IMHO, having shadow pages back the synthetic pages like the PAE roots
is a very good idea.

I hope I get to review that RFC very soon.

Best regards,
	Maxim Levitsky

> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 04/18] KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs
  2022-02-17 21:03 ` [PATCH v2 04/18] KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs Paolo Bonzini
  2022-02-18 17:15   ` Sean Christopherson
@ 2022-02-23 14:12   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 14:12 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> WARN and bail if KVM attempts to free a root that isn't backed by a shadow
> page.  KVM allocates a bare page for "special" roots, e.g. when using PAE
> paging or shadowing 2/3/4-level page tables with 4/5-level, and so root_hpa
> will be valid but won't be backed by a shadow page.  It's all too easy to
> blindly call mmu_free_root_page() on root_hpa, be nice and WARN instead of
> crashing KVM and possibly the kernel.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a67071ac80f3..6ea423b00824 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3222,6 +3222,8 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
>  		return;
>  
>  	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
> +	if (WARN_ON(!sp))
> +		return;
>  
>  	if (is_tdp_mmu_page(sp))
>  		kvm_tdp_mmu_put_root(kvm, sp, false);



Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 05/18] KVM: x86/mmu: use struct kvm_mmu_root_info for mmu->root
  2022-02-17 21:03 ` [PATCH v2 05/18] KVM: x86/mmu: use struct kvm_mmu_root_info for mmu->root Paolo Bonzini
@ 2022-02-23 14:39   ` Maxim Levitsky
  2022-02-23 15:42     ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 14:39 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> The root_hpa and root_pgd fields form essentially a struct kvm_mmu_root_info.
> Use the struct to have more consistency between mmu->root and
> mmu->prev_roots.
> 
> The patch is entirely search and replace except for cached_root_available,
> which does not need a temporary struct kvm_mmu_root_info anymore.
> 
> Reviewed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  3 +-
>  arch/x86/kvm/mmu.h              |  4 +-
>  arch/x86/kvm/mmu/mmu.c          | 69 +++++++++++++++------------------
>  arch/x86/kvm/mmu/mmu_audit.c    |  4 +-
>  arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
>  arch/x86/kvm/mmu/tdp_mmu.c      |  2 +-
>  arch/x86/kvm/mmu/tdp_mmu.h      |  2 +-
>  arch/x86/kvm/vmx/nested.c       |  2 +-
>  arch/x86/kvm/vmx/vmx.c          |  2 +-
>  arch/x86/kvm/x86.c              |  2 +-
>  10 files changed, 42 insertions(+), 50 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8e512f25a930..6442facfd5c0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -432,8 +432,7 @@ struct kvm_mmu {
>  	int (*sync_page)(struct kvm_vcpu *vcpu,
>  			 struct kvm_mmu_page *sp);
>  	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
> -	hpa_t root_hpa;
> -	gpa_t root_pgd;
> +	struct kvm_mmu_root_info root;
>  	union kvm_mmu_role mmu_role;
>  	u8 root_level;
>  	u8 shadow_root_level;
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index a5a50cfeffff..1d0c1904d69a 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -85,7 +85,7 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
>  
>  static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
>  {
> -	if (likely(vcpu->arch.mmu->root_hpa != INVALID_PAGE))
> +	if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE))
>  		return 0;
>  
>  	return kvm_mmu_load(vcpu);
> @@ -107,7 +107,7 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
>  
>  static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>  {
> -	u64 root_hpa = vcpu->arch.mmu->root_hpa;
> +	u64 root_hpa = vcpu->arch.mmu->root.hpa;
>  
>  	if (!VALID_PAGE(root_hpa))
>  		return;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6ea423b00824..a478667d7561 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2162,7 +2162,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
>  		 * prev_root is currently only used for 64-bit hosts. So only
>  		 * the active root_hpa is valid here.
>  		 */
> -		BUG_ON(root != vcpu->arch.mmu->root_hpa);
> +		BUG_ON(root != vcpu->arch.mmu->root.hpa);
>  
>  		iterator->shadow_addr
>  			= vcpu->arch.mmu->pae_root[(addr >> 30) & 3];
> @@ -2176,7 +2176,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
>  static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
>  			     struct kvm_vcpu *vcpu, u64 addr)
>  {
> -	shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root_hpa,
> +	shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root.hpa,
>  				    addr);
>  }
>  
> @@ -3245,7 +3245,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  	BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);
>  
>  	/* Before acquiring the MMU lock, see if we need to do any real work. */
> -	if (!(free_active_root && VALID_PAGE(mmu->root_hpa))) {
> +	if (!(free_active_root && VALID_PAGE(mmu->root.hpa))) {
>  		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
>  			if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) &&
>  			    VALID_PAGE(mmu->prev_roots[i].hpa))
> @@ -3265,7 +3265,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  	if (free_active_root) {
>  		if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
>  		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
> -			mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
> +			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
>  		} else if (mmu->pae_root) {
>  			for (i = 0; i < 4; ++i) {
>  				if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
> @@ -3276,8 +3276,8 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  				mmu->pae_root[i] = INVALID_PAE_ROOT;
>  			}
>  		}
> -		mmu->root_hpa = INVALID_PAGE;
> -		mmu->root_pgd = 0;
> +		mmu->root.hpa = INVALID_PAGE;
> +		mmu->root.pgd = 0;
>  	}
>  
>  	kvm_mmu_commit_zap_page(kvm, &invalid_list);
> @@ -3350,10 +3350,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>  
>  	if (is_tdp_mmu_enabled(vcpu->kvm)) {
>  		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
> -		mmu->root_hpa = root;
> +		mmu->root.hpa = root;
>  	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
>  		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
> -		mmu->root_hpa = root;
> +		mmu->root.hpa = root;
>  	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
>  		if (WARN_ON_ONCE(!mmu->pae_root)) {
>  			r = -EIO;
> @@ -3368,15 +3368,15 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>  			mmu->pae_root[i] = root | PT_PRESENT_MASK |
>  					   shadow_me_mask;
>  		}
> -		mmu->root_hpa = __pa(mmu->pae_root);
> +		mmu->root.hpa = __pa(mmu->pae_root);
>  	} else {
>  		WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
>  		r = -EIO;
>  		goto out_unlock;
>  	}
>  
> -	/* root_pgd is ignored for direct MMUs. */
> -	mmu->root_pgd = 0;
> +	/* root.pgd is ignored for direct MMUs. */
> +	mmu->root.pgd = 0;
>  out_unlock:
>  	write_unlock(&vcpu->kvm->mmu_lock);
>  	return r;
> @@ -3489,7 +3489,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  	if (mmu->root_level >= PT64_ROOT_4LEVEL) {
>  		root = mmu_alloc_root(vcpu, root_gfn, 0,
>  				      mmu->shadow_root_level, false);
> -		mmu->root_hpa = root;
> +		mmu->root.hpa = root;
>  		goto set_root_pgd;
>  	}
>  
> @@ -3539,14 +3539,14 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  	}
>  
>  	if (mmu->shadow_root_level == PT64_ROOT_5LEVEL)
> -		mmu->root_hpa = __pa(mmu->pml5_root);
> +		mmu->root.hpa = __pa(mmu->pml5_root);
>  	else if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
> -		mmu->root_hpa = __pa(mmu->pml4_root);
> +		mmu->root.hpa = __pa(mmu->pml4_root);
>  	else
> -		mmu->root_hpa = __pa(mmu->pae_root);
> +		mmu->root.hpa = __pa(mmu->pae_root);
>  
>  set_root_pgd:
> -	mmu->root_pgd = root_pgd;
> +	mmu->root.pgd = root_pgd;
>  out_unlock:
>  	write_unlock(&vcpu->kvm->mmu_lock);
>  
> @@ -3659,13 +3659,13 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
>  	if (vcpu->arch.mmu->direct_map)
>  		return;
>  
> -	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
> +	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
>  		return;
>  
>  	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
>  
>  	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
> -		hpa_t root = vcpu->arch.mmu->root_hpa;
> +		hpa_t root = vcpu->arch.mmu->root.hpa;
>  		sp = to_shadow_page(root);
>  
>  		if (!is_unsync_root(root))
> @@ -3956,7 +3956,7 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>  static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
>  				struct kvm_page_fault *fault, int mmu_seq)
>  {
> -	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root_hpa);
> +	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
>  
>  	/* Special roots, e.g. pae_root, are not backed by shadow pages. */
>  	if (sp && is_obsolete_sp(vcpu->kvm, sp))
> @@ -4113,34 +4113,27 @@ static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
>  /*
>   * Find out if a previously cached root matching the new pgd/role is available.
>   * The current root is also inserted into the cache.
> - * If a matching root was found, it is assigned to kvm_mmu->root_hpa and true is
> + * If a matching root was found, it is assigned to kvm_mmu->root.hpa and true is
>   * returned.
> - * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root_hpa and
> + * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root.hpa and
>   * false is returned. This root should now be freed by the caller.
>   */
>  static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  				  union kvm_mmu_page_role new_role)
>  {
>  	uint i;
> -	struct kvm_mmu_root_info root;
>  	struct kvm_mmu *mmu = vcpu->arch.mmu;
>  
> -	root.pgd = mmu->root_pgd;
> -	root.hpa = mmu->root_hpa;
> -
> -	if (is_root_usable(&root, new_pgd, new_role))
> +	if (is_root_usable(&mmu->root, new_pgd, new_role))
>  		return true;
>  
>  	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
> -		swap(root, mmu->prev_roots[i]);
> +		swap(mmu->root, mmu->prev_roots[i]);
>  
> -		if (is_root_usable(&root, new_pgd, new_role))
> +		if (is_root_usable(&mmu->root, new_pgd, new_role))
>  			break;
>  	}
>  
> -	mmu->root_hpa = root.hpa;
> -	mmu->root_pgd = root.pgd;
> -
>  	return i < KVM_MMU_NUM_PREV_ROOTS;
>  }
>  
> @@ -4196,7 +4189,7 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  	 */
>  	if (!new_role.direct)
>  		__clear_sp_write_flooding_count(
> -				to_shadow_page(vcpu->arch.mmu->root_hpa));
> +				to_shadow_page(vcpu->arch.mmu->root.hpa));
>  }
>  
>  void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
> @@ -5092,7 +5085,7 @@ static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
>  {
>  	int i;
>  	kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
> -	WARN_ON(VALID_PAGE(mmu->root_hpa));
> +	WARN_ON(VALID_PAGE(mmu->root.hpa));
>  	if (mmu->pae_root) {
>  		for (i = 0; i < 4; ++i)
>  			WARN_ON(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
> @@ -5287,7 +5280,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
>  	int r, emulation_type = EMULTYPE_PF;
>  	bool direct = vcpu->arch.mmu->direct_map;
>  
> -	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
> +	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>  		return RET_PF_RETRY;
>  
>  	r = RET_PF_INVALID;
> @@ -5359,7 +5352,7 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  		return;
>  
>  	if (root_hpa == INVALID_PAGE) {
> -		mmu->invlpg(vcpu, gva, mmu->root_hpa);
> +		mmu->invlpg(vcpu, gva, mmu->root.hpa);
>  
>  		/*
>  		 * INVLPG is required to invalidate any global mappings for the VA,
> @@ -5395,7 +5388,7 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
>  	uint i;
>  
>  	if (pcid == kvm_get_active_pcid(vcpu)) {
> -		mmu->invlpg(vcpu, gva, mmu->root_hpa);
> +		mmu->invlpg(vcpu, gva, mmu->root.hpa);
>  		tlb_flush = true;
>  	}
>  
> @@ -5508,8 +5501,8 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
>  	struct page *page;
>  	int i;
>  
> -	mmu->root_hpa = INVALID_PAGE;
> -	mmu->root_pgd = 0;
> +	mmu->root.hpa = INVALID_PAGE;
> +	mmu->root.pgd = 0;
>  	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
>  		mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
>  
> diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
> index f31fdb874f1f..3e5d62a25350 100644
> --- a/arch/x86/kvm/mmu/mmu_audit.c
> +++ b/arch/x86/kvm/mmu/mmu_audit.c
> @@ -56,11 +56,11 @@ static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
>  	int i;
>  	struct kvm_mmu_page *sp;
>  
> -	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
> +	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
>  		return;
>  
>  	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
> -		hpa_t root = vcpu->arch.mmu->root_hpa;
> +		hpa_t root = vcpu->arch.mmu->root.hpa;
>  
>  		sp = to_shadow_page(root);
>  		__mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->root_level);
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 5b5bdac97c7b..346f3bad3cb9 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -668,7 +668,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>  	if (FNAME(gpte_changed)(vcpu, gw, top_level))
>  		goto out_gpte_changed;
>  
> -	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
> +	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>  		goto out_gpte_changed;
>  
>  	for (shadow_walk_init(&it, vcpu, fault->addr);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 8def8f810cb0..debf08212f12 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -657,7 +657,7 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
>  		else
>  
>  #define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end)		\
> -	for_each_tdp_pte(_iter, to_shadow_page(_mmu->root_hpa), _start, _end)
> +	for_each_tdp_pte(_iter, to_shadow_page(_mmu->root.hpa), _start, _end)
>  
>  /*
>   * Yield if the MMU lock is contended or this thread needs to return control
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 3f987785702a..57c73d8f76ce 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -95,7 +95,7 @@ static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu
>  static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
>  {
>  	struct kvm_mmu_page *sp;
> -	hpa_t hpa = mmu->root_hpa;
> +	hpa_t hpa = mmu->root.hpa;
>  
>  	if (WARN_ON(!VALID_PAGE(hpa)))
>  		return false;
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index c73e4d938ddc..29289ecca223 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5466,7 +5466,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>  				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
>  
>  		roots_to_free = 0;
> -		if (nested_ept_root_matches(mmu->root_hpa, mmu->root_pgd,
> +		if (nested_ept_root_matches(mmu->root.hpa, mmu->root.pgd,
>  					    operand.eptp))
>  			roots_to_free |= KVM_MMU_ROOT_CURRENT;
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d8547144d3b7..b183dfc41d74 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2952,7 +2952,7 @@ static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu)
>  static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_mmu *mmu = vcpu->arch.mmu;
> -	u64 root_hpa = mmu->root_hpa;
> +	u64 root_hpa = mmu->root.hpa;
>  
>  	/* No flush required if the current context is invalid. */
>  	if (!VALID_PAGE(root_hpa))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b912eef5dc1a..c0d7256e3a78 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -762,7 +762,7 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
>  	if ((fault->error_code & PFERR_PRESENT_MASK) &&
>  	    !(fault->error_code & PFERR_RSVD_MASK))
>  		kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
> -				       fault_mmu->root_hpa);
> +				       fault_mmu->root.hpa);
>  
>  	fault_mmu->inject_page_fault(vcpu, fault);
>  	return fault->nested_page_fault;


As a follow-up to this patch, I suggest that we also rename pgd to just 'gpa'.

This also brings up a question: what does the pgd acronym actually mean?
I guess paging guest directory?

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 06/18] KVM: x86/mmu: do not consult levels when freeing roots
  2022-02-17 21:03 ` [PATCH v2 06/18] KVM: x86/mmu: do not consult levels when freeing roots Paolo Bonzini
  2022-02-18 17:27   ` Sean Christopherson
@ 2022-02-23 14:59   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 14:59 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> Right now, PGD caching requires a complicated dance of first computing
> the MMU role and passing it to __kvm_mmu_new_pgd(), and then separately calling
> kvm_init_mmu().
> 
> Part of this is due to kvm_mmu_free_roots using mmu->root_level and
> mmu->shadow_root_level to distinguish whether the page table uses a single
> root or 4 PAE roots.  Because kvm_init_mmu() can overwrite mmu->root_level,
> kvm_mmu_free_roots() must be called before kvm_init_mmu().
> 
> However, even after kvm_init_mmu() there is a way to detect whether the
> page table may hold PAE roots, as root.hpa isn't backed by a shadow when
> it points at PAE roots.  Using this method results in simpler code, and
> is one less obstacle in moving all calls to __kvm_mmu_new_pgd() after the
> MMU has been initialized.
> 
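A note on why the to_shadow_page() check below is sufficient: roots allocated
through mmu_alloc_root() come from pages whose page_private points back at the
owning struct kvm_mmu_page, whereas pae_root is a bare page allocation, so the
lookup yields NULL for it.  A rough sketch of the helper, approximated from
mmu_internal.h of this era (treat the details as illustrative, not as part of
this patch):

static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
{
	struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);

	/* NULL for special roots such as pae_root: no backing kvm_mmu_page. */
	return (struct kvm_mmu_page *)page_private(page);
}
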
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a478667d7561..e1578f71feae 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3240,12 +3240,15 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  	struct kvm *kvm = vcpu->kvm;
>  	int i;
>  	LIST_HEAD(invalid_list);
> -	bool free_active_root = roots_to_free & KVM_MMU_ROOT_CURRENT;
> +	bool free_active_root;
>  
>  	BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);
>  
>  	/* Before acquiring the MMU lock, see if we need to do any real work. */
> -	if (!(free_active_root && VALID_PAGE(mmu->root.hpa))) {
> +	free_active_root = (roots_to_free & KVM_MMU_ROOT_CURRENT)
> +		&& VALID_PAGE(mmu->root.hpa);
> +
> +	if (!free_active_root) {
>  		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
>  			if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) &&
>  			    VALID_PAGE(mmu->prev_roots[i].hpa))
> @@ -3263,8 +3266,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  					   &invalid_list);
>  
>  	if (free_active_root) {
> -		if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
> -		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
> +		if (to_shadow_page(mmu->root.hpa)) {
>  			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
>  		} else if (mmu->pae_root) {
>  			for (i = 0; i < 4; ++i) {

Makes sense, although this will collide hard with the RFC about backing all shadow pages
with kvm_mmu_pages.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/18] KVM: x86/mmu: Do not use guest root level in audit
  2022-02-18 18:46     ` Paolo Bonzini
@ 2022-02-23 15:02       ` Maxim Levitsky
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 15:02 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson; +Cc: linux-kernel, kvm, Lai Jiangshan

On Fri, 2022-02-18 at 19:46 +0100, Paolo Bonzini wrote:
> On 2/18/22 19:37, Sean Christopherson wrote:
> > Since I keep bringing it up...
> > 
> > From: Sean Christopherson<seanjc@google.com>
> > Date: Fri, 18 Feb 2022 09:43:05 -0800
> > Subject: [PATCH] KVM: x86/mmu: Remove MMU auditing
> > 
> > Remove mmu_audit.c and all its collateral, the auditing code has suffered
> > severe bitrot, ironically partly due to shadow paging being more stable
> > and thus not benefiting as much from auditing, but mostly due to TDP
> > supplanting shadow paging for non-nested guests and shadowing of nested
> > TDP not heavily stressing the logic that is being audited.
> > 
> > Signed-off-by: Sean Christopherson<seanjc@google.com>
> 
> Queued, thanks. O:-)

I once kind of played with it.

Note that the shadow MMU does have bugs - I can easily crash L1/L2 when
doing repeated migrations with NPT disabled in either L0 or L1,
and when I force the MMU to always be in sync (see my strict_mmu patch),
the crashes go away.

MMU auditing might have helped with that.

But I won't argue too much about this.

Best regards,
	Maxim Levitsky

> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions
  2022-02-17 21:03 ` [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions Paolo Bonzini
  2022-02-18 18:39   ` Sean Christopherson
@ 2022-02-23 15:16   ` Maxim Levitsky
  2022-02-23 15:48     ` Sean Christopherson
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 15:16 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> These functions only operate on a given MMU, of which there are two in a vCPU.
> They also need a struct kvm in order to lock the mmu_lock, but they do not
> need anything else in the struct kvm_vcpu.  So, pass the vcpu->kvm directly
> to them.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  4 ++--
>  arch/x86/kvm/mmu/mmu.c          | 21 +++++++++++----------
>  arch/x86/kvm/vmx/nested.c       |  8 ++++----
>  arch/x86/kvm/x86.c              |  4 ++--
>  4 files changed, 19 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6442facfd5c0..79f37ccc8726 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1780,9 +1780,9 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu);
>  void kvm_update_dr7(struct kvm_vcpu *vcpu);
>  
>  int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
> -void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
>  			ulong roots_to_free);
> -void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu);
> +void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu);
>  gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
>  			      struct x86_exception *exception);
>  gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e1578f71feae..0f2de811e871 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3234,10 +3234,9 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
>  }
>  
>  /* roots_to_free must be some combination of the KVM_MMU_ROOT_* flags */
> -void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
>  			ulong roots_to_free)
>  {
> -	struct kvm *kvm = vcpu->kvm;
>  	int i;
>  	LIST_HEAD(invalid_list);
>  	bool free_active_root;
> @@ -3287,7 +3286,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
>  
> -void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
> +void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
>  {
>  	unsigned long roots_to_free = 0;
>  	hpa_t root_hpa;
> @@ -3309,7 +3308,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
>  			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
>  	}
>  
> -	kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
> +	kvm_mmu_free_roots(kvm, mmu, roots_to_free);
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
>  
> @@ -3710,7 +3709,7 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu)
>  			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
>  
>  	/* sync prev_roots by simply freeing them */
> -	kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
> +	kvm_mmu_free_roots(vcpu->kvm, vcpu->arch.mmu, roots_to_free);
>  }
>  
>  static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> @@ -4159,8 +4158,10 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  			      union kvm_mmu_page_role new_role)
>  {
> +	struct kvm_mmu *mmu = vcpu->arch.mmu;
> +
>  	if (!fast_pgd_switch(vcpu, new_pgd, new_role)) {
> -		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
> +		kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
>  		return;
>  	}
>  
> @@ -5083,10 +5084,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  	return r;
>  }
>  
> -static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
> +static void __kvm_mmu_unload(struct kvm *kvm, struct kvm_mmu *mmu)
>  {
>  	int i;
> -	kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
> +	kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOTS_ALL);
>  	WARN_ON(VALID_PAGE(mmu->root.hpa));
>  	if (mmu->pae_root) {
>  		for (i = 0; i < 4; ++i)
> @@ -5096,8 +5097,8 @@ static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
>  
>  void kvm_mmu_unload(struct kvm_vcpu *vcpu)
>  {
> -	__kvm_mmu_unload(vcpu, &vcpu->arch.root_mmu);
> -	__kvm_mmu_unload(vcpu, &vcpu->arch.guest_mmu);
> +	__kvm_mmu_unload(vcpu->kvm, &vcpu->arch.root_mmu);
> +	__kvm_mmu_unload(vcpu->kvm, &vcpu->arch.guest_mmu);
>  }
>  
>  static bool need_remote_flush(u64 old, u64 new)
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 29289ecca223..b7bc634d35e2 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -321,7 +321,7 @@ static void free_nested(struct kvm_vcpu *vcpu)
>  	kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
>  	vmx->nested.pi_desc = NULL;
>  
> -	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
> +	kvm_mmu_free_roots(vcpu->kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
>  
>  	nested_release_evmcs(vcpu);
>  
> @@ -5007,7 +5007,7 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
>  				  vmx->nested.current_vmptr >> PAGE_SHIFT,
>  				  vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
>  
> -	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
> +	kvm_mmu_free_roots(vcpu->kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
>  
>  	vmx->nested.current_vmptr = INVALID_GPA;
>  }
> @@ -5486,7 +5486,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>  	}
>  
>  	if (roots_to_free)
> -		kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
> +		kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
>  
>  	return nested_vmx_succeed(vcpu);
>  }
> @@ -5575,7 +5575,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>  	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
>  	 */
>  	if (!enable_ept)
> -		kvm_mmu_free_guest_mode_roots(vcpu, &vcpu->arch.root_mmu);
> +		kvm_mmu_free_guest_mode_roots(vcpu->kvm, &vcpu->arch.root_mmu);
>  
>  	return nested_vmx_succeed(vcpu);
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c0d7256e3a78..6aefd7ac7039 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -855,7 +855,7 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	 * Shadow page roots need to be reconstructed instead.
>  	 */
>  	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
> -		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
> +		kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
>  
>  	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
>  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> @@ -1156,7 +1156,7 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
>  		if (kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd) == pcid)
>  			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
>  
> -	kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
> +	kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
>  }
>  
>  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)

IMHO, anything that is related to guest memory should work at
the VM level (that is, struct kvm).

It is just ironically sad that writing to a guest page these days
requires a vCPU due to dirty ring tracking.


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 05/18] KVM: x86/mmu: use struct kvm_mmu_root_info for mmu->root
  2022-02-23 14:39   ` Maxim Levitsky
@ 2022-02-23 15:42     ` Sean Christopherson
  0 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-23 15:42 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Paolo Bonzini, linux-kernel, kvm

On Wed, Feb 23, 2022, Maxim Levitsky wrote:
> On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index b912eef5dc1a..c0d7256e3a78 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -762,7 +762,7 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
> >  	if ((fault->error_code & PFERR_PRESENT_MASK) &&
> >  	    !(fault->error_code & PFERR_RSVD_MASK))
> >  		kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
> > -				       fault_mmu->root_hpa);
> > +				       fault_mmu->root.hpa);
> >  
> >  	fault_mmu->inject_page_fault(vcpu, fault);
> >  	return fault->nested_page_fault;
> 
> 
> As a follow-up to this patch, I suggest that we also rename pgd to just 'gpa'.

Hmm, I prefer 'pgd' over 'gpa' because it provides a hint/reminder that the field
is unused for TDP.  It also pairs with e.g. kvm_mmu_new_pgd(), though I suppose we
could rename those to something else too.

> This also brings up a question: what does the pgd acronym actually mean?
> I guess paging guest directory?

Page Global Directory, borrowed from the kernel's arch-agnostic paging terminology.
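
For anyone unfamiliar with that terminology, the same names show up in the
kernel's generic page-table walk.  A minimal, purely illustrative sketch
(the *_none()/*_bad() error checks are omitted for brevity):

#include <linux/mm.h>

/* Illustrative only: descend the software page-table levels for one address. */
static pte_t *walk_to_pte(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd = pgd_offset(mm, addr);	/* Page Global Directory */
	p4d_t *p4d = p4d_offset(pgd, addr);
	pud_t *pud = pud_offset(p4d, addr);
	pmd_t *pmd = pmd_offset(pud, addr);

	return pte_offset_kernel(pmd, addr);	/* Page Table Entry */
}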

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions
  2022-02-23 15:16   ` Maxim Levitsky
@ 2022-02-23 15:48     ` Sean Christopherson
  0 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-23 15:48 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Paolo Bonzini, linux-kernel, kvm

On Wed, Feb 23, 2022, Maxim Levitsky wrote:
> On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> > @@ -1156,7 +1156,7 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
> >  		if (kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd) == pcid)
> >  			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
> >  
> > -	kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
> > +	kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
> >  }
> >  
> >  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> 
> IMHO, anything that is related to guest memory should work at
> the VM level (that is, struct kvm).

No, because there are plenty of per-CPU/vCPU properties that affect physical
memory accesses.  Some of them KVM mostly punts on, e.g. MTRRs and APIC base,
but others are relevant, e.g. SMM.

> It is just ironically sad that writing to a guest page these days
> requires a vCPU due to dirty ring tracking.

I dislike (understatement) that the dirty ring code uses the currently running
vCPU instead of passing it down the stack, but fundamentally all memory accesses
that originate from the "CPU", as opposed to a device or whatever, should be tied
to a vCPU.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit
  2022-02-17 21:03 ` [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit Paolo Bonzini
  2022-02-18 18:08   ` Sean Christopherson
@ 2022-02-23 16:01   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 16:01 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> Right now, PGD caching avoids placing a PAE root in the cache by using the
> old value of mmu->root_level and mmu->shadow_root_level; it does not look
> for a cached PGD if the old root is a PAE one, and then frees it using
> kvm_mmu_free_roots.
> 
> Change the logic instead to free the uncacheable root early.
> This way, __kvm_new_mmu_pgd is able to look up the cache when going from
> 32-bit to 64-bit (if there is a hit, the invalid root becomes the least
> recently used).  An example of this is nested virtualization with shadow
> paging, when a 64-bit L1 runs a 32-bit L2.
> 
> As a side effect (which is actually the reason why this patch was
> written), PGD caching does not use the old value of mmu->root_level
> and mmu->shadow_root_level anymore.

Which is great - I hated this code!!!

> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 82 ++++++++++++++++++++++++++++++------------
>  1 file changed, 59 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 0f2de811e871..da324a317000 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4107,52 +4107,88 @@ static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
>  				  union kvm_mmu_page_role role)
>  {
>  	return (role.direct || pgd == root->pgd) &&
> -	       VALID_PAGE(root->hpa) && to_shadow_page(root->hpa) &&
> +	       VALID_PAGE(root->hpa) &&
>  	       role.word == to_shadow_page(root->hpa)->role.word;
>  }
>  
>  /*
> - * Find out if a previously cached root matching the new pgd/role is available.
> - * The current root is also inserted into the cache.
> - * If a matching root was found, it is assigned to kvm_mmu->root.hpa and true is
> - * returned.
> - * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root.hpa and
> - * false is returned. This root should now be freed by the caller.
> + * Find out if a previously cached root matching the new pgd/role is available,
> + * and insert the current root as the MRU in the cache.
> + * If a matching root is found, it is assigned to kvm_mmu->root and
> + * true is returned.
> + * If no match is found, kvm_mmu->root is left invalid, the LRU root is
> + * evicted to make room for the current root, and false is returned.
>   */
> -static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> -				  union kvm_mmu_page_role new_role)
> +static bool cached_root_find_and_keep_current(struct kvm *kvm, struct kvm_mmu *mmu,
> +					      gpa_t new_pgd,
> +					      union kvm_mmu_page_role new_role)
>  {
>  	uint i;
> -	struct kvm_mmu *mmu = vcpu->arch.mmu;
>  
>  	if (is_root_usable(&mmu->root, new_pgd, new_role))
>  		return true;
>  
>  	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
> +		/*
> +		 * The swaps end up rotating the cache like this:
> +		 *   C   0 1 2 3   (on entry to the function)
> +		 *   0   C 1 2 3
> +		 *   1   C 0 2 3
> +		 *   2   C 0 1 3
> +		 *   3   C 0 1 2   (on exit from the loop)
> +		 */

Thanks a million for documenting this! I remember it took
me too much time to figure out what all of these swaps do.

I do want to mention that it would be nice to note that
the above trace is for the case when none of the roots in the cache match.


>  		swap(mmu->root, mmu->prev_roots[i]);
> -
>  		if (is_root_usable(&mmu->root, new_pgd, new_role))
> -			break;
Maybe even add a comment that if that break happens, the cache would look like
'2   C 0 1 3', with 2 being the matching root.

> +			return true;
>  	}
>  
> -	return i < KVM_MMU_NUM_PREV_ROOTS;
> +	kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
> +	return false;
>  }
>  
> -static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> -			    union kvm_mmu_page_role new_role)
> +/*
> + * Find out if a previously cached root matching the new pgd/role is available.
> + * On entry, mmu->root is invalid.
> + * If a matching root is found, it is assigned to kvm_mmu->root, the LRU entry
> + * of the cache becomes invalid, and true is returned.
> + * If no match is found, kvm_mmu->root is left invalid and false is returned.
> + */
> +static bool cached_root_find_without_current(struct kvm *kvm, struct kvm_mmu *mmu,
> +					     gpa_t new_pgd,
> +					     union kvm_mmu_page_role new_role)
>  {
> -	struct kvm_mmu *mmu = vcpu->arch.mmu;
> +	uint i;
> +
> +	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
> +		if (is_root_usable(&mmu->prev_roots[i], new_pgd, new_role))
> +			goto hit;
>  
> +	return false;
> +
> +hit:
> +	swap(mmu->root, mmu->prev_roots[i]);
> +	/* Bubble up the remaining roots.  */
> +	for (; i < KVM_MMU_NUM_PREV_ROOTS - 1; i++)
> +		mmu->prev_roots[i] = mmu->prev_roots[i + 1];
> +	mmu->prev_roots[i].hpa = INVALID_PAGE;

I would have invalidated the 'pgd' value as well, just in case.

> +	return true;
> +}

Since we just have 4 pointers in the LRU cache + root pointer, I wonder if something
dumber/slower could work for both cases.

Something like this (not tested at all):

struct kvm_mmu_root_info tmp_prev_roots[KVM_MMU_NUM_PREV_ROOTS];
uint i, j = 0;

for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
	tmp_prev_roots[i] = mmu->prev_roots[i];

/* The current mmu root becomes the MRU entry. */
if (VALID_PAGE(mmu->root.hpa))
	mmu->prev_roots[j++] = mmu->root;

for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS && j < KVM_MMU_NUM_PREV_ROOTS; i++)
	if (is_root_usable(&tmp_prev_roots[i], new_pgd, new_role))
		/* TODO: could also warn here if the current root is already usable */
		mmu->root = tmp_prev_roots[i];
	else
		mmu->prev_roots[j++] = tmp_prev_roots[i];

for (; j < KVM_MMU_NUM_PREV_ROOTS; j++)
	mmu->prev_roots[j].hpa = INVALID_PAGE;
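
Coming back to the existing swap loop: here is a tiny self-contained userspace
toy (plain C, hypothetical, not kernel code) that mimics the rotation described
in the comment above, with three prev slots as in KVM_MMU_NUM_PREV_ROOTS:

#include <stdio.h>

#define NUM_PREV 3
#define SWAP(a, b) do { int _t = (a); (a) = (b); (b) = _t; } while (0)

int main(void)
{
	int root = 9;			/* 'C', the current root */
	int prev[NUM_PREV] = { 0, 1, 2 };
	int i, j;

	for (i = 0; i < NUM_PREV; i++) {
		SWAP(root, prev[i]);
		printf("after swap %d: current=%d prev =", i, root);
		for (j = 0; j < NUM_PREV; j++)
			printf(" %d", prev[j]);
		printf("\n");
	}
	/*
	 * Prints:
	 *   after swap 0: current=0 prev = 9 1 2
	 *   after swap 1: current=1 prev = 9 0 2
	 *   after swap 2: current=2 prev = 9 0 1
	 * i.e. the old current root ends up as the MRU entry and the LRU
	 * entry is left in 'current', ready to be freed if nothing matched.
	 */
	return 0;
}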

	
> +
> +static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
> +			    gpa_t new_pgd, union kvm_mmu_page_role new_role)
> +{
>  	/*
> -	 * For now, limit the fast switch to 64-bit hosts+VMs in order to avoid
> +	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
>  	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
>  	 * later if necessary.
>  	 */
> -	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
> -	    mmu->root_level >= PT64_ROOT_4LEVEL)
> -		return cached_root_available(vcpu, new_pgd, new_role);
> +	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
>  
> -	return false;
> +	if (VALID_PAGE(mmu->root.hpa))
> +		return cached_root_find_and_keep_current(kvm, mmu, new_pgd, new_role);
> +	else
> +		return cached_root_find_without_current(kvm, mmu, new_pgd, new_role);
>  }
>  
>  static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> @@ -4160,8 +4196,8 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  {
>  	struct kvm_mmu *mmu = vcpu->arch.mmu;
>  
> -	if (!fast_pgd_switch(vcpu, new_pgd, new_role)) {
> -		kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
> +	if (!fast_pgd_switch(vcpu->kvm, mmu, new_pgd, new_role)) {
> +		/* kvm_mmu_ensure_valid_pgd will set up a new root.  */
I also agree with Sean's comment on this.
>  		return;
>  	}
>  


I don't see any bugs in the code, however, so:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/18] KVM: x86/mmu: load new PGD after the shadow MMU is initialized
  2022-02-17 21:03 ` [PATCH v2 10/18] KVM: x86/mmu: load new PGD after the shadow MMU is initialized Paolo Bonzini
  2022-02-18 23:59   ` Sean Christopherson
@ 2022-02-23 16:20   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 16:20 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and
> shadow_root_level anymore, pull the PGD load after the initialization of
> the shadow MMUs.

Again, thanks a million for this! I once spent at least an hour figuring
out why my kernel panicked when I made a similar change, only to discover
that __kvm_mmu_new_pgd needed to happen before the MMU re-initialization.

> 
> Besides being more intuitive, this enables future simplifications
> and optimizations because it's not necessary anymore to compute the
> role outside kvm_init_mmu.  In particular, kvm_mmu_reset_context was not
> attempting to use a cached PGD to avoid having to figure out the new role.
> It will soon be able to follow what nested_{vmx,svm}_load_cr3 are doing,
> and avoid unloading all the cached roots.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c    | 37 +++++++++++++++++--------------------
>  arch/x86/kvm/svm/nested.c |  6 +++---
>  arch/x86/kvm/vmx/nested.c |  6 +++---
>  3 files changed, 23 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index da324a317000..906a9244ad28 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4903,9 +4903,8 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
>  
>  	new_role = kvm_calc_shadow_npt_root_page_role(vcpu, &regs);
>  
> -	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
> -
>  	shadow_mmu_init_context(vcpu, context, &regs, new_role);
> +	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
>  }
>  EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
>  
> @@ -4943,27 +4942,25 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
>  		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
>  						   execonly, level);
>  
> -	__kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base);
> -
> -	if (new_role.as_u64 == context->mmu_role.as_u64)
> -		return;
> -
> -	context->mmu_role.as_u64 = new_role.as_u64;
> +	if (new_role.as_u64 != context->mmu_role.as_u64) {
> +		context->mmu_role.as_u64 = new_role.as_u64;
>  
> -	context->shadow_root_level = level;
> +		context->shadow_root_level = level;
>  
> -	context->ept_ad = accessed_dirty;
> -	context->page_fault = ept_page_fault;
> -	context->gva_to_gpa = ept_gva_to_gpa;
> -	context->sync_page = ept_sync_page;
> -	context->invlpg = ept_invlpg;
> -	context->root_level = level;
> -	context->direct_map = false;
> +		context->ept_ad = accessed_dirty;
> +		context->page_fault = ept_page_fault;
> +		context->gva_to_gpa = ept_gva_to_gpa;
> +		context->sync_page = ept_sync_page;
> +		context->invlpg = ept_invlpg;
> +		context->root_level = level;
> +		context->direct_map = false;
> +		update_permission_bitmask(context, true);
> +		context->pkru_mask = 0;
> +		reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level);
> +		reset_ept_shadow_zero_bits_mask(context, execonly);
> +	}
>  
> -	update_permission_bitmask(context, true);
> -	context->pkru_mask = 0;
> -	reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level);
> -	reset_ept_shadow_zero_bits_mask(context, execonly);
> +	__kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base);
>  }
>  EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
>  
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index f284e61451c8..96bab464967f 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -492,14 +492,14 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
>  	    CC(!load_pdptrs(vcpu, cr3)))
>  		return -EINVAL;
>  
> -	if (!nested_npt)
> -		kvm_mmu_new_pgd(vcpu, cr3);
> -
>  	vcpu->arch.cr3 = cr3;
>  
>  	/* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */
>  	kvm_init_mmu(vcpu);
>  
> +	if (!nested_npt)
> +		kvm_mmu_new_pgd(vcpu, cr3);
> +
>  	return 0;
>  }
>  
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index b7bc634d35e2..1dfe23963a9e 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1126,15 +1126,15 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
>  		return -EINVAL;
>  	}
>  
> -	if (!nested_ept)
> -		kvm_mmu_new_pgd(vcpu, cr3);
> -
>  	vcpu->arch.cr3 = cr3;
>  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>  
>  	/* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */
>  	kvm_init_mmu(vcpu);
>  
> +	if (!nested_ept)
> +		kvm_mmu_new_pgd(vcpu, cr3);
> +
>  	return 0;
>  }
>  


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 11/18] KVM: x86/mmu: Always use current mmu's role when loading new PGD
  2022-02-17 21:03 ` [PATCH v2 11/18] KVM: x86/mmu: Always use current mmu's role when loading new PGD Paolo Bonzini
  2022-02-18 23:59   ` Sean Christopherson
@ 2022-02-23 16:23   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 16:23 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> Since the guest PGD is now loaded after the MMU has been set up
> completely, the desired role for a cache hit is simply the current
> mmu_role.  There is no need to compute it again, so __kvm_mmu_new_pgd
> can be folded in kvm_mmu_new_pgd.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 29 ++++-------------------------
>  1 file changed, 4 insertions(+), 25 deletions(-)

https://www.monkeyuser.com/2020/levels-of-satisfaction/ ;-)

> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 906a9244ad28..b01160716c6a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -190,8 +190,6 @@ struct kmem_cache *mmu_page_header_cache;
>  static struct percpu_counter kvm_total_used_mmu_pages;
>  
>  static void mmu_spte_set(u64 *sptep, u64 spte);
> -static union kvm_mmu_page_role
> -kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
>  
>  struct kvm_mmu_role_regs {
>  	const unsigned long cr0;
> @@ -4191,10 +4189,10 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
>  		return cached_root_find_without_current(kvm, mmu, new_pgd, new_role);
>  }
>  
> -static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> -			      union kvm_mmu_page_role new_role)
> +void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
>  {
>  	struct kvm_mmu *mmu = vcpu->arch.mmu;
> +	union kvm_mmu_page_role new_role = mmu->mmu_role.base;
>  
>  	if (!fast_pgd_switch(vcpu->kvm, mmu, new_pgd, new_role)) {
>  		/* kvm_mmu_ensure_valid_pgd will set up a new root.  */
> @@ -4230,11 +4228,6 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  		__clear_sp_write_flooding_count(
>  				to_shadow_page(vcpu->arch.mmu->root.hpa));
>  }
> -
> -void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
> -{
> -	__kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu));
> -}
>  EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
>  
>  static unsigned long get_cr3(struct kvm_vcpu *vcpu)
> @@ -4904,7 +4897,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
>  	new_role = kvm_calc_shadow_npt_root_page_role(vcpu, &regs);
>  
>  	shadow_mmu_init_context(vcpu, context, &regs, new_role);
> -	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
> +	kvm_mmu_new_pgd(vcpu, nested_cr3);
>  }
>  EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
>  
> @@ -4960,7 +4953,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
>  		reset_ept_shadow_zero_bits_mask(context, execonly);
>  	}
>  
> -	__kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base);
> +	kvm_mmu_new_pgd(vcpu, new_eptp);
>  }
>  EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
>  
> @@ -5045,20 +5038,6 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_init_mmu);
>  
> -static union kvm_mmu_page_role
> -kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu)
> -{
> -	struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu);
> -	union kvm_mmu_role role;
> -
> -	if (tdp_enabled)
> -		role = kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, true);
> -	else
> -		role = kvm_calc_shadow_mmu_root_page_role(vcpu, &regs, true);
> -
> -	return role.base;
> -}
> -
>  void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	/*

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 12/18] KVM: x86/mmu: clear MMIO cache when unloading the MMU
  2022-02-17 21:03 ` [PATCH v2 12/18] KVM: x86/mmu: clear MMIO cache when unloading the MMU Paolo Bonzini
  2022-02-18 23:59   ` Sean Christopherson
@ 2022-02-23 16:32   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 16:32 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> For cleanliness, do not leave a stale GVA in the cache after all the roots are
> cleared.  In practice, kvm_mmu_load will go through kvm_mmu_sync_roots if
> paging is on, and will not use vcpu_match_mmio_gva at all if paging is off.
> However, leaving data in the cache might cause bugs in the future.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b01160716c6a..4e8e3e9530ca 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5111,6 +5111,7 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
>  {
>  	__kvm_mmu_unload(vcpu->kvm, &vcpu->arch.root_mmu);
>  	__kvm_mmu_unload(vcpu->kvm, &vcpu->arch.guest_mmu);
> +	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
>  }
>  
>  static bool need_remote_flush(u64 old, u64 new)


One thing that has bothered me for a while with all of this is that
vcpu->arch.{mmio_gva|mmio_access|mmio_gfn|mmio_gen} are often called the mmio cache,
while we also install reserved-bit SPTEs and call those an mmio cache as well.

The former is basically a cache of a cache, sort of.
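
For reference, the gva-side cache is just those few fields on the vCPU, and
clearing it amounts to roughly the following (a sketch of the x86.h helper
from memory, details may differ):

static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
{
	if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
		return;

	/* Drop the cached gva -> MMIO gfn translation. */
	vcpu->arch.mmio_gva = 0;
}

The reserved-bit MMIO SPTEs, by contrast, live in the page tables themselves
and are invalidated by bumping the MMIO generation, which is what makes the
gva-side cache a cache of a cache.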

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 13/18] KVM: x86: reset and reinitialize the MMU in __set_sregs_common
  2022-02-17 21:03 ` [PATCH v2 13/18] KVM: x86: reset and reinitialize the MMU in __set_sregs_common Paolo Bonzini
  2022-02-19  0:22   ` Sean Christopherson
@ 2022-02-23 16:48   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-23 16:48 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> Do a full unload of the MMU in KVM_SET_SREGS and KVM_SEST_REGS2, in
Typo
> preparation for not doing so in kvm_mmu_reset_context.  There is no
> need to delay the reset until after the return, so do it directly in
> the __set_sregs_common function and remove the mmu_reset_needed output
> parameter.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 32 +++++++++++++-------------------
>  1 file changed, 13 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6aefd7ac7039..f10878aa5b20 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10730,7 +10730,7 @@ static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>  }
>  
>  static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
> -		int *mmu_reset_needed, bool update_pdptrs)
> +			      int update_pdptrs)
>  {
>  	struct msr_data apic_base_msr;
>  	int idx;
> @@ -10755,29 +10755,31 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
>  	static_call(kvm_x86_set_gdt)(vcpu, &dt);
>  
>  	vcpu->arch.cr2 = sregs->cr2;
> -	*mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
> +
> +	if (vcpu->arch.efer != sregs->efer ||
> +	    kvm_read_cr0(vcpu) != sregs->cr0 ||
> +	    vcpu->arch.cr3 != sregs->cr3 || !update_pdptrs ||
> +	    kvm_read_cr4(vcpu) != sregs->cr4)
> +		kvm_mmu_unload(vcpu);

Should it be (update_pdptrs && is_pae_paging(vcpu)) instead?
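
For context, the PDPTRs only exist in 32-bit PAE paging mode, which is what
is_pae_paging() checks; roughly, as a sketch of the x86.h helper (not part of
this patch):

static inline bool is_pae_paging(struct kvm_vcpu *vcpu)
{
	/* CR0.PG=1, CR4.PAE=1, EFER.LMA=0 */
	return !is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu);
}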

> +
>  	vcpu->arch.cr3 = sregs->cr3;
>  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>  	static_call_cond(kvm_x86_post_set_cr3)(vcpu, sregs->cr3);
>  
>  	kvm_set_cr8(vcpu, sregs->cr8);
>  
> -	*mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
>  	static_call(kvm_x86_set_efer)(vcpu, sregs->efer);
>  
> -	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>  	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
>  	vcpu->arch.cr0 = sregs->cr0;
>  
> -	*mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
>  	static_call(kvm_x86_set_cr4)(vcpu, sregs->cr4);
>  
> +	kvm_init_mmu(vcpu);
>  	if (update_pdptrs) {
>  		idx = srcu_read_lock(&vcpu->kvm->srcu);
> -		if (is_pae_paging(vcpu)) {
> +		if (is_pae_paging(vcpu))
>  			load_pdptrs(vcpu, kvm_read_cr3(vcpu));
> -			*mmu_reset_needed = 1;
> -		}
>  		srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  	}
>  
> @@ -10805,15 +10807,11 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
>  static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>  {
>  	int pending_vec, max_bits;
> -	int mmu_reset_needed = 0;
> -	int ret = __set_sregs_common(vcpu, sregs, &mmu_reset_needed, true);
> +	int ret = __set_sregs_common(vcpu, sregs, true);
>  
>  	if (ret)
>  		return ret;
>  
> -	if (mmu_reset_needed)
> -		kvm_mmu_reset_context(vcpu);
> -
>  	max_bits = KVM_NR_INTERRUPTS;
>  	pending_vec = find_first_bit(
>  		(const unsigned long *)sregs->interrupt_bitmap, max_bits);
> @@ -10828,7 +10826,6 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>  
>  static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
>  {
> -	int mmu_reset_needed = 0;
>  	bool valid_pdptrs = sregs2->flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
>  	bool pae = (sregs2->cr0 & X86_CR0_PG) && (sregs2->cr4 & X86_CR4_PAE) &&
>  		!(sregs2->efer & EFER_LMA);
> @@ -10840,8 +10837,7 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
>  	if (valid_pdptrs && (!pae || vcpu->arch.guest_state_protected))
>  		return -EINVAL;
>  
> -	ret = __set_sregs_common(vcpu, (struct kvm_sregs *)sregs2,
> -				 &mmu_reset_needed, !valid_pdptrs);
> +	ret = __set_sregs_common(vcpu, (struct kvm_sregs *)sregs2, !valid_pdptrs);
>  	if (ret)
>  		return ret;
>  
> @@ -10850,11 +10846,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
>  			kvm_pdptr_write(vcpu, i, sregs2->pdptrs[i]);
>  
>  		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> -		mmu_reset_needed = 1;
>  		vcpu->arch.pdptrs_from_userspace = true;
> +		/* kvm_mmu_reload will be called on the next entry.  */
Could you elaborate on this? 

In theory, if set_sregs2 changed only the pdptrs and nothing else
(which won't really happen in practice, but can in theory), then there will
be no MMU reset with the new code, IMHO.

Best regards,
	Maxim Levitsky


>  	}
> -	if (mmu_reset_needed)
> -		kvm_mmu_reset_context(vcpu);
>  	return 0;
>  }
>  



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3
  2022-02-17 21:03 ` [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3 Paolo Bonzini
  2022-02-18 20:30   ` Sean Christopherson
@ 2022-02-24 11:02   ` Maxim Levitsky
  2022-02-24 15:12     ` Sean Christopherson
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-24 11:02 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> Most of the time, calls to get_guest_pgd result in calling
> kvm_read_cr3 (the exception is only nested TDP).  Hardcode
> the default instead of using the get_cr3 function, avoiding
> a retpoline if they are enabled.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu.h             | 13 +++++++++++++
>  arch/x86/kvm/mmu/mmu.c         | 15 +++++----------
>  arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
>  arch/x86/kvm/x86.c             |  2 +-
>  4 files changed, 20 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 1d0c1904d69a..1808d6814ddb 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -116,6 +116,19 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>  					  vcpu->arch.mmu->shadow_root_level);
>  }
>  
> +static inline gpa_t __kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
> +{
> +	if (!mmu->get_guest_pgd)
> +		return kvm_read_cr3(vcpu);
> +	else
> +		return mmu->get_guest_pgd(vcpu);
> +}
> +
> +static inline gpa_t kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu)
> +{
> +	return __kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu);
> +}
> +
>  struct kvm_page_fault {
>  	/* arguments to kvm_mmu_do_page_fault.  */
>  	const gpa_t addr;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4e8e3e9530ca..d422d0d2adf8 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3451,7 +3451,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  	unsigned i;
>  	int r;
>  
> -	root_pgd = mmu->get_guest_pgd(vcpu);
> +	root_pgd = kvm_mmu_get_guest_pgd(vcpu);
>  	root_gfn = root_pgd >> PAGE_SHIFT;
>  
>  	if (mmu_check_root(vcpu, root_gfn))
> @@ -3881,7 +3881,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  	arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;
>  	arch.gfn = gfn;
>  	arch.direct_map = vcpu->arch.mmu->direct_map;
> -	arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
> +	arch.cr3 = kvm_mmu_get_guest_pgd(vcpu);
>  
>  	return kvm_setup_async_pf(vcpu, cr2_or_gpa,
>  				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
> @@ -4230,11 +4230,6 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
>  
> -static unsigned long get_cr3(struct kvm_vcpu *vcpu)
> -{
> -	return kvm_read_cr3(vcpu);
> -}
> -
>  static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
>  			   unsigned int access)
>  {
> @@ -4789,7 +4784,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
>  	context->invlpg = NULL;
>  	context->shadow_root_level = kvm_mmu_get_tdp_level(vcpu);
>  	context->direct_map = true;
> -	context->get_guest_pgd = get_cr3;
> +	context->get_guest_pgd = NULL; /* use kvm_read_cr3 */
>  	context->get_pdptr = kvm_pdptr_read;
>  	context->inject_page_fault = kvm_inject_page_fault;
>  	context->root_level = role_regs_to_root_level(&regs);
> @@ -4964,7 +4959,7 @@ static void init_kvm_softmmu(struct kvm_vcpu *vcpu)
>  
>  	kvm_init_shadow_mmu(vcpu, &regs);
>  
> -	context->get_guest_pgd     = get_cr3;
> +	context->get_guest_pgd	   = NULL; /* use kvm_read_cr3 */
>  	context->get_pdptr         = kvm_pdptr_read;
>  	context->inject_page_fault = kvm_inject_page_fault;
>  }
> @@ -4996,7 +4991,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
>  		return;
>  
>  	g_context->mmu_role.as_u64 = new_role.as_u64;
> -	g_context->get_guest_pgd     = get_cr3;
> +	g_context->get_guest_pgd     = NULL; /* use kvm_read_cr3 */
>  	g_context->get_pdptr         = kvm_pdptr_read;
>  	g_context->inject_page_fault = kvm_inject_page_fault;
>  	g_context->root_level        = new_role.base.level;
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 346f3bad3cb9..1a85aba837b2 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -362,7 +362,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	trace_kvm_mmu_pagetable_walk(addr, access);
>  retry_walk:
>  	walker->level = mmu->root_level;
> -	pte           = mmu->get_guest_pgd(vcpu);
> +	pte           = __kvm_mmu_get_guest_pgd(vcpu, mmu);
>  	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
>  
>  #if PTTYPE == 64
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f10878aa5b20..adcee7c305ca 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12161,7 +12161,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  		return;
>  
>  	if (!vcpu->arch.mmu->direct_map &&
> -	      work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
> +	      work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu))
>  		return;
>  
>  	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Not sure if that is worth it, though. IMHO it would be better to convert the mmu callbacks
(and nested ops callbacks, etc.) to static calls.

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3
  2022-02-24 11:02   ` Maxim Levitsky
@ 2022-02-24 15:12     ` Sean Christopherson
  2022-02-24 15:14       ` Maxim Levitsky
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2022-02-24 15:12 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Paolo Bonzini, linux-kernel, kvm

On Thu, Feb 24, 2022, Maxim Levitsky wrote:
> Not sure if that is worth it, though. IMHO it would be better to
> convert the mmu callbacks (and nested ops callbacks, etc.) to static calls.

nested_ops can utilize static_call(), mmu hooks cannot.  static_call() patches
the code, which means there cannot be multiple targets at any given time.  The
"static" part refers to the target not changing, generally for the lifetime of
the kernel/module in question.  Even with TDP that doesn't hold true due to
nested virtualization.

We could selectively use INDIRECT_CALL_*() for some of the MMU calls, but given
how few cases and targets we really care about, I prefer our homebrewed manual
checks as there's less macro maze to navigate.

E.g. to convert the TDP fault case

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 1d0c1904d69a..940ec6a9d284 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -3,6 +3,8 @@
 #define __KVM_X86_MMU_H

 #include <linux/kvm_host.h>
+#include <linux/indirect_call_wrapper.h>
+
 #include "kvm_cache_regs.h"
 #include "cpuid.h"

@@ -169,7 +171,8 @@ struct kvm_page_fault {
        bool map_writable;
 };

-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+INDIRECT_CALLABLE_DECLARE(int kvm_tdp_page_fault(struct kvm_vcpu *vcpu,
+                                                struct kvm_page_fault *fault));

 extern int nx_huge_pages;
 static inline bool is_nx_huge_page_enabled(void)
@@ -196,11 +199,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
                .req_level = PG_LEVEL_4K,
                .goal_level = PG_LEVEL_4K,
        };
-#ifdef CONFIG_RETPOLINE
-       if (fault.is_tdp)
-               return kvm_tdp_page_fault(vcpu, &fault);
-#endif
-       return vcpu->arch.mmu->page_fault(vcpu, &fault);
+       struct kvm_mmu *mmu = vcpu->arch.mmu;
+
+       return INDIRECT_CALL_1(mmu->page_fault, kvm_tdp_page_fault, vcpu, &fault);
 }

 /*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c1deaec795c2..a3ad1bc58859 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4055,7 +4055,8 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 }
 EXPORT_SYMBOL_GPL(kvm_handle_page_fault);

-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+INDIRECT_CALLABLE_SCOPE int kvm_tdp_page_fault(struct kvm_vcpu *vcpu,
+                                              struct kvm_page_fault *fault)
 {
        while (fault->max_level > PG_LEVEL_4K) {
                int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3
  2022-02-24 15:12     ` Sean Christopherson
@ 2022-02-24 15:14       ` Maxim Levitsky
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-24 15:14 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, linux-kernel, kvm

On Thu, 2022-02-24 at 15:12 +0000, Sean Christopherson wrote:
> On Thu, Feb 24, 2022, Maxim Levitsky wrote:
> > Not sure if that is worth it, though. IMHO it would be better to
> > convert the mmu callbacks (and nested ops callbacks, etc.) to static calls.
> 
> nested_ops can utilize static_call(), mmu hooks cannot.  static_call() patches
> the code, which means there cannot be multiple targets at any given time.  The
> "static" part refers to the target not changing, generally for the lifetime of
> the kernel/module in question.  Even with TDP that doesn't hold true due to
> nested virtualization.

Ah, right, I forgot that static_call patches the call sites.

Best regards,
	Maxim Levitsky

> 
> We could selectively use INDIRECT_CALL_*() for some of the MMU calls, but given
> how few cases and targets we really care about, I prefer our homebrewed manual
> checks as there's less macro maze to navigate.
> 
> E.g. to convert the TDP fault case
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 1d0c1904d69a..940ec6a9d284 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -3,6 +3,8 @@
>  #define __KVM_X86_MMU_H
> 
>  #include <linux/kvm_host.h>
> +#include <linux/indirect_call_wrapper.h>
> +
>  #include "kvm_cache_regs.h"
>  #include "cpuid.h"
> 
> @@ -169,7 +171,8 @@ struct kvm_page_fault {
>         bool map_writable;
>  };
> 
> -int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> +INDIRECT_CALLABLE_DECLARE(int kvm_tdp_page_fault(struct kvm_vcpu *vcpu,
> +                                                struct kvm_page_fault *fault));
> 
>  extern int nx_huge_pages;
>  static inline bool is_nx_huge_page_enabled(void)
> @@ -196,11 +199,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>                 .req_level = PG_LEVEL_4K,
>                 .goal_level = PG_LEVEL_4K,
>         };
> -#ifdef CONFIG_RETPOLINE
> -       if (fault.is_tdp)
> -               return kvm_tdp_page_fault(vcpu, &fault);
> -#endif
> -       return vcpu->arch.mmu->page_fault(vcpu, &fault);
> +       struct kvm_mmu *mmu = vcpu->arch.mmu;
> +
> +       return INDIRECT_CALL_1(mmu->page_fault, kvm_tdp_page_fault, vcpu, &fault);
>  }
> 
>  /*
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c1deaec795c2..a3ad1bc58859 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4055,7 +4055,8 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
>  }
>  EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
> 
> -int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> +INDIRECT_CALLABLE_SCOPE int kvm_tdp_page_fault(struct kvm_vcpu *vcpu,
> +                                              struct kvm_page_fault *fault)
>  {
>         while (fault->max_level > PG_LEVEL_4K) {
>                 int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
> 



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd
  2022-02-18 21:00     ` Sean Christopherson
@ 2022-02-24 15:41       ` Maxim Levitsky
  2022-02-25 17:40         ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-24 15:41 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: linux-kernel, kvm

On Fri, 2022-02-18 at 21:00 +0000, Sean Christopherson wrote:
> On Fri, Feb 18, 2022, Paolo Bonzini wrote:
> > On 2/17/22 22:03, Paolo Bonzini wrote:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index adcee7c305ca..9800c8883a48 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -1189,7 +1189,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> > >   		return 1;
> > >   	if (cr3 != kvm_read_cr3(vcpu))
> > > -		kvm_mmu_new_pgd(vcpu, cr3);
> > > +		kvm_mmu_update_root(vcpu);
> > >   	vcpu->arch.cr3 = cr3;
> > >   	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> > 
> > Uh-oh, this has to become:
> > 
> >  	vcpu->arch.cr3 = cr3;
> >  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> > 	if (!is_pae_paging(vcpu))
> > 		kvm_mmu_update_root(vcpu);
> > 
> > The regression would go away after patch 16, but this is more tidy apart
> > from having to check is_pae_paging *again*.
> > 
> > Incremental patch:
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index adcee7c305ca..0085e9fba372 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1188,11 +1189,11 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
> >  		return 1;
> > -	if (cr3 != kvm_read_cr3(vcpu))
> > -		kvm_mmu_update_root(vcpu);
> > -
> >  	vcpu->arch.cr3 = cr3;
> >  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> > +	if (!is_pae_paging(vcpu))
> > +		kvm_mmu_update_root(vcpu);
> > +
> >  	/* Do not call post_set_cr3, we do not get here for confidential guests.  */
> > 
> > An alternative is to move the vcpu->arch.cr3 update in load_pdptrs.
> > Reviewers, let me know if you prefer that, then I'll send v3.
> 
>   c) None of the above.
> 
> MOV CR3 never requires a new root if TDP is enabled, and the guest_mmu is used if
> and only if TDP is enabled.  Even when KVM intercepts CR3 when EPT=1 && URG=0, it
> does so only to snapshot vcpu->arch.cr3, there's no need to get a new PGD.
> 
> Unless I'm missing something, your original suggestion of checking tdp_enabled is
> the way to go.
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6e0f7f22c6a7..2b02029c63d0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1187,7 +1187,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>         if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
>                 return 1;
> 
> -       if (cr3 != kvm_read_cr3(vcpu))
> +       if (!tdp_enabled && cr3 != kvm_read_cr3(vcpu))
>                 kvm_mmu_new_pgd(vcpu, cr3);
> 
>         vcpu->arch.cr3 = cr3;
> 
> 

Is this actually related to the discussion? The original issue that Paolo found in his patch
was that kvm_mmu_update_root now reads the _current_ cr3, thus vcpu->arch.cr3 has to be set
before calling it.

I do agree that kvm_set_cr3 doesn't need to do anything when TDP is enabled, but that is a
different issue and it doesn't cause much harm (fast_pgd_switch with direct roots will reuse
the current root); it would still raise KVM_REQ_LOAD_MMU_PGD needlessly, though.

As for the patch itself, other than the issue mentioned above, it looks fine to me.


Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT
  2022-02-19  7:54     ` Paolo Bonzini
  2022-02-22 16:06       ` Sean Christopherson
@ 2022-02-24 15:50       ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-24 15:50 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson; +Cc: linux-kernel, kvm

On Sat, 2022-02-19 at 08:54 +0100, Paolo Bonzini wrote:
> On 2/18/22 22:45, Sean Christopherson wrote:
> > On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> > > Whenever KVM knows the page role flags have changed, it needs to drop
> > > the current MMU root and possibly load one from the prev_roots cache.
> > > Currently it is papering over some overly simplistic code by just
> > > dropping _all_ roots, so that the root will be reloaded by
> > > kvm_mmu_reload, but this has bad performance for the TDP MMU
> > > (which drops the whole of the page tables when freeing a root,
> > > without the performance safety net of a hash table).
> > > 
> > > To do this, KVM needs to do a kvm_mmu_update_root call from
> > > kvm_mmu_reset_context.  Introduce a new request bit so that the call
> > > can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
> > > kill all hopes of finding a cached PGD.
> > > 
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > ---
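
Just to make the ordering described above concrete: a minimal sketch of how
such a request could be serviced in vcpu_enter_guest(), reusing the names from
the commit message (KVM_REQ_MMU_UPDATE_ROOT, kvm_mmu_update_root).  This is
illustrative only, not the actual hunk from the patch:

	if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
		kvm_mmu_unload(vcpu);

	/*
	 * Serviced after KVM_REQ_MMU_RELOAD: once the roots have been dropped
	 * there is no hope of a cached PGD hit anyway.
	 */
	if (kvm_check_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu))
		kvm_mmu_update_root(vcpu);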
> > 
> > Please no.
> > 
> > I really, really do not want to add yet another deferred-load in the nested
> > virtualization paths.
> 
> This is not a deferred load, is it?  It's only kvm_mmu_new_pgd that is 
> deferred, but the PDPTR load is not.
> 
> I think I should first merge patches 1-13, then revisit the root_role 
> series (which only depends on the fast_pgd_switch and caching changes), 
> and then finally get back to this final part.  The reason is that 
> root_role is what enables the stale-root check that you wanted; and it's 
> easier to think about loading the guest PGD post-kvm_init_mmu if I can 
> show you the direction I'd like to have in general, and not leave things 
> half-done.
> 
> (Patch 17 is also independent and perhaps fixing a case of premature 
> optimization, so I'm inclined to merge it as well).
> 
> > As Jim pointed out[1], KVM_REQ_GET_NESTED_STATE_PAGES should
> > never have been merged. And on that point, I've no idea how this new request will
> > interact with KVM_REQ_GET_NESTED_STATE_PAGE.  It may be a complete non-issue, but
> > I'd honestly rather not have to spend the brain power.
> 
> Fair enough on the interaction, but I still think 
> KVM_REQ_GET_NESTED_STATE_PAGES is a good idea.  I don't think KVM should 
> access guest memory outside KVM_RUN, though there may be cases (possibly 
> some PV MSRs, if I had to guess) where it does.

KVM_REQ_GET_NESTED_STATE_PAGES is a real source of bugs and a burden to maintain.
I have fixed too many bugs in it already, and it will only get worse over time.  Not to
mention that, without any proper tests, we are bound to end up accessing guest memory
while setting the nested state without anybody noticing.


Best regards,
	Maxim Levitsky

> 
> > And I still do not like the approach of converting kvm_mmu_reset_context() wholesale
> > to not doing kvm_mmu_unload().  There are currently eight kvm_mmu_reset_context() calls:
> > 
> >    1.   nested_vmx_restore_host_state() - Only for a missed VM-Entry => VM-Fail
> >         consistency check, not at all a performance concern.
> > 
> >    2.   kvm_mmu_after_set_cpuid() - Still needs to unload.  Not a perf concern.
> > 
> >    3.   kvm_vcpu_reset() - Relevant only to INIT.  Not a perf concern, but could be
> >         converted manually to a different path without too much fuss.
> > 
> >    4+5. enter_smm() / kvm_smm_changed() - IMO, not a perf concern, but again could
> >         be converted manually if anyone cares.
> > 
> >    6.   set_efer() - Silly corner case that basically requires host userspace abuse
> >         of KVM APIs.  Not a perf concern.
> > 
> >    7+8. kvm_post_set_cr0/4() - These are the ones we really care about, and they
> >         can be handled quite trivially, and can even share much of the logic with
> >         kvm_set_cr3().
> > 
> > I strongly prefer that we take a more conservative approach and fix 7+8, and then
> > tackle 1, 3, and 4+5 separately if someone cares enough about those flows to avoid
> > dropping roots.
> 
> The thing is, I want to get rid of kvm_mmu_reset_context() altogether. 
> I dislike the fact that it kills the roots but still keeps them in the 
> hash table, thus relying on separate syncing to avoid future bugs.  It's 
> very unintuitive what is "reset" and what isn't.
> 
> > Regarding KVM_REQ_MMU_RELOAD, that mess mostly goes away with my series to replace
> > that with KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.  Obsolete TDP MMU roots will never get
> > a cache hit because the obsolete root will have an "invalid" role.  And if we care
> > about optimizing this with respect to a memslot (highly unlikely), then we could
> > add an MMU generation check in the cache lookup.  I was planning on posting that
> > series as soon as this one is queued, but I'm more than happy to speculatively send
> > a refreshed version that applies on top of this series.
> 
> Yes, please send a version on top of patches 1-13.  That can be reviewed 
> and committed in parallel with the root_role changes.
> 
> Paolo
> 
> > [1] https://lore.kernel.org/all/CALMp9eT2cP7kdptoP3=acJX+5_Wg6MXNwoDh42pfb21-wdXvJg@mail.gmail.com
> > [2] https://lore.kernel.org/all/20211209060552.2956723-1-seanjc@google.com



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset
  2022-02-17 21:03 ` [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset Paolo Bonzini
  2022-02-18 23:57   ` Sean Christopherson
@ 2022-02-24 16:11   ` Maxim Levitsky
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-24 16:11 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> For both CR0 and CR4, disassociate the TLB flush logic from the
> MMU role logic.  Instead  of relying on kvm_mmu_reset_context() being
> a superset of various TLB flushes (which is not necessarily going to
> be the case in the future), always call it if the role changes
> but also set the various TLB flush requests according to what is
> in the manual.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 58 ++++++++++++++++++++++++++++++++--------------
>  1 file changed, 40 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9043548e6baf..2b4663dfcd8d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -871,6 +871,13 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
>  	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
>  		kvm_clear_async_pf_completion_queue(vcpu);
>  		kvm_async_pf_hash_reset(vcpu);
> +
> +		/*
> +		 * Clearing CR0.PG is defined to flush the TLB from the guest's
> +		 * perspective.
> +		 */
> +		if (!(cr0 & X86_CR0_PG))
> +			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
>  	}
>  
>  	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
> @@ -1057,28 +1064,41 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
>  
>  void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
>  {
> +	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
> +		kvm_mmu_reset_context(vcpu);
> +
>  	/*
> -	 * If any role bit is changed, the MMU needs to be reset.
> -	 *
> -	 * If CR4.PCIDE is changed 1 -> 0, the guest TLB must be flushed.
>  	 * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
>  	 * according to the SDM; however, stale prev_roots could be reused
>  	 * incorrectly in the future after a MOV to CR3 with NOFLUSH=1, so we
> -	 * free them all.  KVM_REQ_MMU_RELOAD is fit for the both cases; it
> -	 * is slow, but changing CR4.PCIDE is a rare case.
> -	 *
> -	 * If CR4.PGE is changed, the guest TLB must be flushed.
> -	 *
> -	 * Note: resetting MMU is a superset of KVM_REQ_MMU_RELOAD and
> -	 * KVM_REQ_MMU_RELOAD is a superset of KVM_REQ_TLB_FLUSH_GUEST, hence
> -	 * the usage of "else if".
> +	 * free them all.  This is *not* a superset of KVM_REQ_TLB_FLUSH_GUEST
> +	 * or KVM_REQ_TLB_FLUSH_CURRENT, because the hardware TLB is not flushed,
> +	 * so fall through.
>  	 */
> -	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
> -		kvm_mmu_reset_context(vcpu);
> -	else if ((cr4 ^ old_cr4) & X86_CR4_PCIDE)
> +	if (!tdp_enabled &&
> +	    (cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE))
>  		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> -	else if ((cr4 ^ old_cr4) & X86_CR4_PGE)
> -		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +
> +	/*
> +	 * The TLB has to be flushed for all PCIDs on:
> +	 * - CR4.PCIDE changed from 1 to 0
> +	 * - any change to CR4.PGE
> +	 *
> +	 * This is a superset of KVM_REQ_TLB_FLUSH_CURRENT.
> +	 */
> +	if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
> +	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
> +		 kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +
> +	/*
> +	 * The TLB has to be flushed for the current PCID on:
> +	 * - CR4.SMEP changed from 0 to 1
> +	 * - any change to CR4.PAE
> +	 */
> +	else if (((cr4 ^ old_cr4) & X86_CR4_PAE) ||
> +		 ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP)))
> +		 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> +
>  }
>  EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
>  
> @@ -11323,15 +11343,17 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	static_call(kvm_x86_update_exception_bitmap)(vcpu);
>  
>  	/*
> -	 * Reset the MMU context if paging was enabled prior to INIT (which is
> +	 * A TLB flush is needed if paging was enabled prior to INIT (which is
>  	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
>  	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
>  	 * checked because it is unconditionally cleared on INIT and all other
>  	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
>  	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
>  	 */
> -	if (old_cr0 & X86_CR0_PG)
> +	if (old_cr0 & X86_CR0_PG) {
> +		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
>  		kvm_mmu_reset_context(vcpu);
> +	}
>  
>  	/*
>  	 * Intel's SDM states that all TLB entries are flushed on INIT.  AMD's
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 18/18] KVM: x86: do not unload MMU roots on all role changes
  2022-02-17 21:03 ` [PATCH v2 18/18] KVM: x86: do not unload MMU roots on all role changes Paolo Bonzini
@ 2022-02-24 16:25   ` Maxim Levitsky
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-02-24 16:25 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc

On Thu, 2022-02-17 at 16:03 -0500, Paolo Bonzini wrote:
> kvm_mmu_reset_context is called on all role changes and right now it
> calls kvm_mmu_unload.  With the legacy MMU this is a relatively cheap
> operation; the previous PGDs remain in the hash table and are picked
> up immediately on the next page fault.  With the TDP MMU, however, the
> roots are thrown away for good and a full rebuild of the page tables is
> necessary, which is many times more expensive.
> 
> Fortunately, throwing away the roots is not necessary except when
> the manual says a TLB flush is required:
Actually, does a TLB flush throw away the roots? I think we only sync
them and keep using them, see kvm_vcpu_flush_tlb_guest (a rough sketch of
that path is included below).  I can't be 100% sure about that, though.
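
Roughly what kvm_vcpu_flush_tlb_guest() does today, paraphrased from memory of
arch/x86/kvm/x86.c (the exact body in the tree this series targets may differ
slightly):

	static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
	{
		++vcpu->stat.tlb_flush;

		if (!tdp_enabled) {
			/*
			 * With shadow paging, a guest TLB flush is emulated by
			 * syncing the shadow roots; the roots themselves are
			 * kept, not freed.
			 */
			kvm_mmu_sync_roots(vcpu);
		}

		/* Flush the hardware TLB for the current context. */
		static_call(kvm_x86_flush_tlb_guest)(vcpu);
	}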

> 
> - changing CR0.PG from 1 to 0 (because it flushes the TLB according to
>   the x86 architecture specification)
> 
> - changing CPUID (which changes the interpretation of page tables in
>   ways not reflected by the role).
> 
> - changing CR4.SMEP from 0 to 1 (not doing so actually breaks access.c)
> 
> Except for these cases, once the MMU has updated the CPU/MMU roles
> and metadata it is enough to force-reload the current value of CR3.
> KVM will look up the cached roots for an entry with the right role and
> PGD, and only if the cache misses will a new root be created.
> 
> Measuring with vmexit.flat from kvm-unit-tests shows the following
> improvement:
> 
>              TDP         legacy       shadow
>    before    46754       5096         5150
>    after     4879        4875         5006
> 
> which is for very small page tables.  The impact is however much larger
> when running as an L1 hypervisor, because the new page tables cause
> extra work for L0 to shadow them.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c44b5114f947..913cc7229bf4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5043,8 +5043,8 @@ EXPORT_SYMBOL_GPL(kvm_init_mmu);
>  void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	/*
> -	 * Invalidate all MMU roles to force them to reinitialize as CPUID
> -	 * information is factored into reserved bit calculations.
> +	 * Invalidate all MMU roles and roots to force them to reinitialize,
> +	 * as CPUID information is factored into reserved bit calculations.
>  	 *
>  	 * Correctly handling multiple vCPU models with respect to paging and
>  	 * physical address properties) in a single VM would require tracking
> @@ -5057,6 +5057,7 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
>  	vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
>  	vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
> +	kvm_mmu_unload(vcpu);
>  	kvm_mmu_reset_context(vcpu);
>  
>  	/*
> @@ -5068,8 +5069,8 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  
>  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
>  {
> -	kvm_mmu_unload(vcpu);
>  	kvm_init_mmu(vcpu);
> +	kvm_make_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
>  


What about the call to kvm_mmu_reset_context in nested_vmx_restore_host_state?
That is a fallback path, though.


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd
  2022-02-24 15:41       ` Maxim Levitsky
@ 2022-02-25 17:40         ` Sean Christopherson
  0 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-02-25 17:40 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Paolo Bonzini, linux-kernel, kvm

On Thu, Feb 24, 2022, Maxim Levitsky wrote:
> On Fri, 2022-02-18 at 21:00 +0000, Sean Christopherson wrote:
> > On Fri, Feb 18, 2022, Paolo Bonzini wrote:
> > > On 2/17/22 22:03, Paolo Bonzini wrote:
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index adcee7c305ca..9800c8883a48 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -1189,7 +1189,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> > > >   		return 1;
> > > >   	if (cr3 != kvm_read_cr3(vcpu))
> > > > -		kvm_mmu_new_pgd(vcpu, cr3);
> > > > +		kvm_mmu_update_root(vcpu);
> > > >   	vcpu->arch.cr3 = cr3;
> > > >   	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> > > 
> > > Uh-oh, this has to become:
> > > 
> > >  	vcpu->arch.cr3 = cr3;
> > >  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> > > 	if (!is_pae_paging(vcpu))
> > > 		kvm_mmu_update_root(vcpu);
> > > 
> > > The regression would go away after patch 16, but this is more tidy apart
> > > from having to check is_pae_paging *again*.
> > > 
> > > Incremental patch:
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index adcee7c305ca..0085e9fba372 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -1188,11 +1189,11 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> > >  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
> > >  		return 1;
> > > -	if (cr3 != kvm_read_cr3(vcpu))
> > > -		kvm_mmu_update_root(vcpu);
> > > -
> > >  	vcpu->arch.cr3 = cr3;
> > >  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> > > +	if (!is_pae_paging(vcpu))
> > > +		kvm_mmu_update_root(vcpu);
> > > +
> > >  	/* Do not call post_set_cr3, we do not get here for confidential guests.  */
> > > 
> > > An alternative is to move the vcpu->arch.cr3 update in load_pdptrs.
> > > Reviewers, let me know if you prefer that, then I'll send v3.
> > 
> >   c) None of the above.
> > 
> > MOV CR3 never requires a new root if TDP is enabled, and the guest_mmu is used if
> > and only if TDP is enabled.  Even when KVM intercepts CR3 when EPT=1 && URG=0, it
> > does so only to snapshot vcpu->arch.cr3, there's no need to get a new PGD.
> > 
> > Unless I'm missing something, your original suggestion of checking tdp_enabled is
> > the way to go.
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 6e0f7f22c6a7..2b02029c63d0 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1187,7 +1187,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >         if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
> >                 return 1;
> > 
> > -       if (cr3 != kvm_read_cr3(vcpu))
> > +       if (!tdp_enabled && cr3 != kvm_read_cr3(vcpu))
> >                 kvm_mmu_new_pgd(vcpu, cr3);
> > 
> >         vcpu->arch.cr3 = cr3;
> > 
> > 
> 
> Is this actually related to the discussion? The original issue that Paolo
> found in his patch was that kvm_mmu_update_root now reads _current_ cr3, thus
> vcpu->arch.cr3 has to be set before calling it.

Yes, if we instead do the above, then replacing kvm_mmu_new_pgd() with
kvm_mmu_update_root() is unnecessary.  Paolo is trying to fix the case where
kvm_mmu_new_pgd() does the wrong thing for guest_mmu.  My point is that we
should never call kvm_mmu_new_pgd() if mmu == guest_mmu in the first place, and
adding the tdp_enabled checks fixes that bug.

I'm ok with kvm_mmu_new_pgd() acting on a pre-computed role, assuming we actually
get sanity checks.  Deliberately ignoring the pgd/cr3 we already have is silly.
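
For what it's worth, one purely illustrative form such a sanity check could
take in the MOV-to-CR3 emulation path (the placement and the WARN condition
are assumptions for illustration, not taken from the series):

	if (!tdp_enabled && cr3 != kvm_read_cr3(vcpu)) {
		/*
		 * Shadow paging only: guest_mmu is used iff TDP is enabled,
		 * so a new PGD loaded here must be for root_mmu.
		 */
		WARN_ON_ONCE(vcpu->arch.mmu != &vcpu->arch.root_mmu);
		kvm_mmu_new_pgd(vcpu, cr3);
	}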

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2022-02-25 17:40 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-17 21:03 [PATCH v2 00/18] KVM: MMU: do not unload MMU roots on all role changes Paolo Bonzini
2022-02-17 21:03 ` [PATCH v2 01/18] KVM: x86: host-initiated EFER.LME write affects the MMU Paolo Bonzini
2022-02-18 17:08   ` Sean Christopherson
2022-02-18 17:26     ` Paolo Bonzini
2022-02-23 13:40   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 02/18] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0 Paolo Bonzini
2022-02-18 17:12   ` Sean Christopherson
2022-02-23 14:07   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 03/18] KVM: x86/mmu: WARN if PAE roots linger after kvm_mmu_unload Paolo Bonzini
2022-02-18 17:14   ` Sean Christopherson
2022-02-18 17:23     ` Paolo Bonzini
2022-02-23 14:11       ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 04/18] KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs Paolo Bonzini
2022-02-18 17:15   ` Sean Christopherson
2022-02-23 14:12   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 05/18] KVM: x86/mmu: use struct kvm_mmu_root_info for mmu->root Paolo Bonzini
2022-02-23 14:39   ` Maxim Levitsky
2022-02-23 15:42     ` Sean Christopherson
2022-02-17 21:03 ` [PATCH v2 06/18] KVM: x86/mmu: do not consult levels when freeing roots Paolo Bonzini
2022-02-18 17:27   ` Sean Christopherson
2022-02-23 14:59   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 07/18] KVM: x86/mmu: Do not use guest root level in audit Paolo Bonzini
2022-02-18 18:37   ` Sean Christopherson
2022-02-18 18:46     ` Paolo Bonzini
2022-02-23 15:02       ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 08/18] KVM: x86/mmu: do not pass vcpu to root freeing functions Paolo Bonzini
2022-02-18 18:39   ` Sean Christopherson
2022-02-23 15:16   ` Maxim Levitsky
2022-02-23 15:48     ` Sean Christopherson
2022-02-17 21:03 ` [PATCH v2 09/18] KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit Paolo Bonzini
2022-02-18 18:08   ` Sean Christopherson
2022-02-23 16:01   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 10/18] KVM: x86/mmu: load new PGD after the shadow MMU is initialized Paolo Bonzini
2022-02-18 23:59   ` Sean Christopherson
2022-02-23 16:20   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 11/18] KVM: x86/mmu: Always use current mmu's role when loading new PGD Paolo Bonzini
2022-02-18 23:59   ` Sean Christopherson
2022-02-23 16:23   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 12/18] KVM: x86/mmu: clear MMIO cache when unloading the MMU Paolo Bonzini
2022-02-18 23:59   ` Sean Christopherson
2022-02-23 16:32   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 13/18] KVM: x86: reset and reinitialize the MMU in __set_sregs_common Paolo Bonzini
2022-02-19  0:22   ` Sean Christopherson
2022-02-23 16:48   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 14/18] KVM: x86/mmu: avoid indirect call for get_cr3 Paolo Bonzini
2022-02-18 20:30   ` Sean Christopherson
2022-02-19 10:03     ` Paolo Bonzini
2022-02-24 11:02   ` Maxim Levitsky
2022-02-24 15:12     ` Sean Christopherson
2022-02-24 15:14       ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 15/18] KVM: x86/mmu: rename kvm_mmu_new_pgd, introduce variant that calls get_guest_pgd Paolo Bonzini
2022-02-18  9:39   ` Paolo Bonzini
2022-02-18 21:00     ` Sean Christopherson
2022-02-24 15:41       ` Maxim Levitsky
2022-02-25 17:40         ` Sean Christopherson
2022-02-17 21:03 ` [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT Paolo Bonzini
2022-02-18 21:45   ` Sean Christopherson
2022-02-19  7:54     ` Paolo Bonzini
2022-02-22 16:06       ` Sean Christopherson
2022-02-24 15:50       ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset Paolo Bonzini
2022-02-18 23:57   ` Sean Christopherson
2022-02-21 15:01     ` Paolo Bonzini
2022-02-24 16:11   ` Maxim Levitsky
2022-02-17 21:03 ` [PATCH v2 18/18] KVM: x86: do not unload MMU roots on all role changes Paolo Bonzini
2022-02-24 16:25   ` Maxim Levitsky
