* [PATCH 00/13] More parallel operations for the TDP MMU
@ 2021-03-31 21:08 Ben Gardon
  2021-03-31 21:08 ` [PATCH 01/13] KVM: x86/mmu: Re-add const qualifier in kvm_tdp_mmu_zap_collapsible_sptes Ben Gardon
                   ` (13 more replies)
  0 siblings, 14 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Now that the TDP MMU is able to handle page faults in parallel, it's a
relatively small change to extend that parallelism to other operations.
This series allows zapping a range of GFNs, reclaiming collapsible SPTEs
(when disabling dirty logging), and enabling / disabling dirty logging to
all happen under the MMU lock in read mode.
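
As a quick illustration of the locking model this series builds on, here
is a minimal userspace sketch (plain pthreads, with made-up function
names, not KVM code): work taken under the lock in read mode can proceed
in parallel, while write mode remains fully exclusive.

#include <pthread.h>

/* Userspace sketch only; the names below are illustrative, not KVM's. */
static pthread_rwlock_t mmu_lock = PTHREAD_RWLOCK_INITIALIZER;

static void parallel_operation(void)
{
        pthread_rwlock_rdlock(&mmu_lock);  /* many may hold this at once */
        /* e.g. zap a GFN range, clear dirty bits, ... */
        pthread_rwlock_unlock(&mmu_lock);
}

static void exclusive_operation(void)
{
        pthread_rwlock_wrlock(&mmu_lock);  /* excludes all other MMU work */
        /* e.g. operations that still need the lock in write mode */
        pthread_rwlock_unlock(&mmu_lock);
}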

This is partly a cleanup + rewrite of the last few patches of the parallel
page faults series. I've incorporated feedback from Sean and Paolo, but
the patches have changed so much that I'm sending this as a separate
series.

Ran kvm-unit-tests + selftests on an SMP kernel on an Intel Skylake host,
with the TDP MMU both enabled and disabled. This series introduces no new
failures or warnings.

I know this will conflict horribly with the patches from Sean's series
which were just queued, and I'll send a v2 to fix those conflicts +
address any feedback on this v1.

Ben Gardon (13):
  KVM: x86/mmu: Re-add const qualifier in
    kvm_tdp_mmu_zap_collapsible_sptes
  KVM: x86/mmu: Move kvm_mmu_(get|put)_root to TDP MMU
  KVM: x86/mmu: use tdp_mmu_free_sp to free roots
  KVM: x86/mmu: Merge TDP MMU put and free root
  KVM: x86/mmu: comment for_each_tdp_mmu_root requires MMU write lock
  KVM: x86/mmu: Refactor yield safe root iterator
  KVM: x86/mmu: Make TDP MMU root refcount atomic
  KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU
  KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock
  KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU read lock
  KVM: x86/mmu: Allow enabling / disabling dirty logging under MMU read
    lock
  KVM: x86/mmu: Fast invalidation for TDP MMU
  KVM: x86/mmu: Tear down roots in fast invalidation thread

 arch/x86/kvm/mmu/mmu.c          |  70 +++---
 arch/x86/kvm/mmu/mmu_internal.h |  27 +--
 arch/x86/kvm/mmu/tdp_mmu.c      | 383 ++++++++++++++++++++++++--------
 arch/x86/kvm/mmu/tdp_mmu.h      |  21 +-
 include/linux/kvm_host.h        |   2 +-
 5 files changed, 357 insertions(+), 146 deletions(-)

-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 01/13] KVM: x86/mmu: Re-add const qualifier in kvm_tdp_mmu_zap_collapsible_sptes
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 21:08 ` [PATCH 02/13] KVM: x86/mmu: Move kvm_mmu_(get|put)_root to TDP MMU Ben Gardon
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

kvm_tdp_mmu_zap_collapsible_sptes unnecessarily removes the const
qualifier from its memslot argument, leading to a compiler warning.
Re-add the const annotation and propagate it to the helper functions it
calls.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 10 +++++-----
 arch/x86/kvm/mmu/mmu_internal.h |  5 +++--
 arch/x86/kvm/mmu/tdp_mmu.c      |  4 ++--
 arch/x86/kvm/mmu/tdp_mmu.h      |  2 +-
 include/linux/kvm_host.h        |  2 +-
 5 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c6ed633594a2..f75cbb0fcc9c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -715,8 +715,7 @@ static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
  * handling slots that are not large page aligned.
  */
 static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
-					      struct kvm_memory_slot *slot,
-					      int level)
+		const struct kvm_memory_slot *slot, int level)
 {
 	unsigned long idx;
 
@@ -2736,7 +2735,7 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 }
 
 static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
-				  struct kvm_memory_slot *slot)
+				  const struct kvm_memory_slot *slot)
 {
 	unsigned long hva;
 	pte_t *pte;
@@ -2762,8 +2761,9 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 	return level;
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot,
-			      gfn_t gfn, kvm_pfn_t pfn, int max_level)
+int kvm_mmu_max_mapping_level(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn,
+			      kvm_pfn_t pfn, int max_level)
 {
 	struct kvm_lpage_info *linfo;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index e03267e93459..fc88f62d7bd9 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -156,8 +156,9 @@ enum {
 #define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
 #define SET_SPTE_SPURIOUS		BIT(2)
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot,
-			      gfn_t gfn, kvm_pfn_t pfn, int max_level);
+int kvm_mmu_max_mapping_level(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn,
+			      kvm_pfn_t pfn, int max_level);
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    int max_level, kvm_pfn_t *pfnp,
 			    bool huge_page_disallowed, int *req_level);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f2c335854afb..6d4f4e305163 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1268,7 +1268,7 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
  */
 static void zap_collapsible_spte_range(struct kvm *kvm,
 				       struct kvm_mmu_page *root,
-				       struct kvm_memory_slot *slot)
+				       const struct kvm_memory_slot *slot)
 {
 	gfn_t start = slot->base_gfn;
 	gfn_t end = start + slot->npages;
@@ -1309,7 +1309,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
  * be replaced by large mappings, for GFNs within the slot.
  */
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
-				       struct kvm_memory_slot *slot)
+				       const struct kvm_memory_slot *slot)
 {
 	struct kvm_mmu_page *root;
 	int root_as_id;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 3b761c111bff..683d1d69c8c8 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -34,7 +34,7 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 				       gfn_t gfn, unsigned long mask,
 				       bool wrprot);
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
-				       struct kvm_memory_slot *slot);
+				       const struct kvm_memory_slot *slot);
 
 bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1b65e7204344..74e56e8673a6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1116,7 +1116,7 @@ __gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
 }
 
 static inline unsigned long
-__gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
+__gfn_to_hva_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
 }
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 02/13] KVM: x86/mmu: Move kvm_mmu_(get|put)_root to TDP MMU
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
  2021-03-31 21:08 ` [PATCH 01/13] KVM: x86/mmu: Re-add const qualifier in kvm_tdp_mmu_zap_collapsible_sptes Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 21:08 ` [PATCH 03/13] KVM: x86/mmu: use tdp_mmu_free_sp to free roots Ben Gardon
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

The TDP MMU is almost the only user of kvm_mmu_get_root and
kvm_mmu_put_root; the legacy / shadow MMU has only a single use of
put_root, in mmu_free_root_page in mmu.c. Open code that one use and
move the get / put helpers into the TDP MMU so they can be extended in
future commits.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 10 ++++------
 arch/x86/kvm/mmu/mmu_internal.h | 16 ----------------
 arch/x86/kvm/mmu/tdp_mmu.c      |  6 +++---
 arch/x86/kvm/mmu/tdp_mmu.h      | 18 ++++++++++++++++++
 4 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f75cbb0fcc9c..618cc011f446 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3154,12 +3154,10 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 
 	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
 
-	if (kvm_mmu_put_root(kvm, sp)) {
-		if (is_tdp_mmu_page(sp))
-			kvm_tdp_mmu_free_root(kvm, sp);
-		else if (sp->role.invalid)
-			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
-	}
+	if (is_tdp_mmu_page(sp) && kvm_tdp_mmu_put_root(kvm, sp))
+		kvm_tdp_mmu_free_root(kvm, sp);
+	else if (!--sp->root_count && sp->role.invalid)
+		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
 
 	*root_hpa = INVALID_PAGE;
 }
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index fc88f62d7bd9..788dcf77c957 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -118,22 +118,6 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
 					u64 start_gfn, u64 pages);
 
-static inline void kvm_mmu_get_root(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	BUG_ON(!sp->root_count);
-	lockdep_assert_held(&kvm->mmu_lock);
-
-	++sp->root_count;
-}
-
-static inline bool kvm_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	lockdep_assert_held(&kvm->mmu_lock);
-	--sp->root_count;
-
-	return !sp->root_count;
-}
-
 /*
  * Return values of handle_mmio_page_fault, mmu.page_fault, and fast_page_fault().
  *
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6d4f4e305163..1929cc7a42ac 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -43,7 +43,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 
 static void tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
 {
-	if (kvm_mmu_put_root(kvm, root))
+	if (kvm_tdp_mmu_put_root(kvm, root))
 		kvm_tdp_mmu_free_root(kvm, root);
 }
 
@@ -55,7 +55,7 @@ static inline bool tdp_mmu_next_root_valid(struct kvm *kvm,
 	if (list_entry_is_head(root, &kvm->arch.tdp_mmu_roots, link))
 		return false;
 
-	kvm_mmu_get_root(kvm, root);
+	kvm_tdp_mmu_get_root(kvm, root);
 	return true;
 
 }
@@ -150,7 +150,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 	/* Check for an existing root before allocating a new one. */
 	for_each_tdp_mmu_root(kvm, root) {
 		if (root->role.word == role.word) {
-			kvm_mmu_get_root(kvm, root);
+			kvm_tdp_mmu_get_root(kvm, root);
 			goto out;
 		}
 	}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 683d1d69c8c8..2dc3b3ba48fb 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -8,6 +8,24 @@
 hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
 void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root);
 
+static inline void kvm_tdp_mmu_get_root(struct kvm *kvm,
+					struct kvm_mmu_page *root)
+{
+	BUG_ON(!root->root_count);
+	lockdep_assert_held(&kvm->mmu_lock);
+
+	++root->root_count;
+}
+
+static inline bool kvm_tdp_mmu_put_root(struct kvm *kvm,
+					struct kvm_mmu_page *root)
+{
+	lockdep_assert_held(&kvm->mmu_lock);
+	--root->root_count;
+
+	return !root->root_count;
+}
+
 bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
 
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 03/13] KVM: x86/mmu: use tdp_mmu_free_sp to free roots
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
  2021-03-31 21:08 ` [PATCH 01/13] KVM: x86/mmu: Re-add const qualifier in kvm_tdp_mmu_zap_collapsible_sptes Ben Gardon
  2021-03-31 21:08 ` [PATCH 02/13] KVM: x86/mmu: Move kvm_mmu_(get|put)_root to TDP MMU Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 21:08 ` [PATCH 04/13] KVM: x86/mmu: Merge TDP MMU put and free root Ben Gardon
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Minor cleanup to deduplicate the code used to free a struct kvm_mmu_page
in the TDP MMU.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1929cc7a42ac..5a2698d64957 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -88,6 +88,12 @@ static inline struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 			  gfn_t start, gfn_t end, bool can_yield);
 
+static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
+{
+	free_page((unsigned long)sp->spt);
+	kmem_cache_free(mmu_page_header_cache, sp);
+}
+
 void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root)
 {
 	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
@@ -101,8 +107,7 @@ void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root)
 
 	zap_gfn_range(kvm, root, 0, max_gfn, false);
 
-	free_page((unsigned long)root->spt);
-	kmem_cache_free(mmu_page_header_cache, root);
+	tdp_mmu_free_sp(root);
 }
 
 static union kvm_mmu_page_role page_role_for_level(struct kvm_vcpu *vcpu,
@@ -164,12 +169,6 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 	return __pa(root->spt);
 }
 
-static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
-{
-	free_page((unsigned long)sp->spt);
-	kmem_cache_free(mmu_page_header_cache, sp);
-}
-
 /*
  * This is called through call_rcu in order to free TDP page table memory
  * safely with respect to other kernel threads that may be operating on
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 04/13] KVM: x86/mmu: Merge TDP MMU put and free root
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (2 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 03/13] KVM: x86/mmu: use tdp_mmu_free_sp to free roots Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 21:08 ` [PATCH 05/13] KVM: x86/mmu: comment for_each_tdp_mmu_root requires MMU write lock Ben Gardon
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

kvm_tdp_mmu_put_root and kvm_tdp_mmu_free_root are always called
together, so merge the functions to simplify TDP MMU root refcounting /
freeing.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c     |  4 +--
 arch/x86/kvm/mmu/tdp_mmu.c | 54 ++++++++++++++++++--------------------
 arch/x86/kvm/mmu/tdp_mmu.h | 10 +------
 3 files changed, 28 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 618cc011f446..667d64daa82c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3154,8 +3154,8 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 
 	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
 
-	if (is_tdp_mmu_page(sp) && kvm_tdp_mmu_put_root(kvm, sp))
-		kvm_tdp_mmu_free_root(kvm, sp);
+	if (is_tdp_mmu_page(sp))
+		kvm_tdp_mmu_put_root(kvm, sp);
 	else if (!--sp->root_count && sp->role.invalid)
 		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 5a2698d64957..368091adab09 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -41,10 +41,31 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 	rcu_barrier();
 }
 
-static void tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
+static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
+			  gfn_t start, gfn_t end, bool can_yield);
+
+static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 {
-	if (kvm_tdp_mmu_put_root(kvm, root))
-		kvm_tdp_mmu_free_root(kvm, root);
+	free_page((unsigned long)sp->spt);
+	kmem_cache_free(mmu_page_header_cache, sp);
+}
+
+void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
+{
+	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	if (--root->root_count)
+		return;
+
+	WARN_ON(!root->tdp_mmu_page);
+
+	list_del(&root->link);
+
+	zap_gfn_range(kvm, root, 0, max_gfn, false);
+
+	tdp_mmu_free_sp(root);
 }
 
 static inline bool tdp_mmu_next_root_valid(struct kvm *kvm,
@@ -66,7 +87,7 @@ static inline struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 	struct kvm_mmu_page *next_root;
 
 	next_root = list_next_entry(root, link);
-	tdp_mmu_put_root(kvm, root);
+	kvm_tdp_mmu_put_root(kvm, root);
 	return next_root;
 }
 
@@ -85,31 +106,6 @@ static inline struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 #define for_each_tdp_mmu_root(_kvm, _root)				\
 	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)
 
-static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
-			  gfn_t start, gfn_t end, bool can_yield);
-
-static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
-{
-	free_page((unsigned long)sp->spt);
-	kmem_cache_free(mmu_page_header_cache, sp);
-}
-
-void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root)
-{
-	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
-
-	lockdep_assert_held_write(&kvm->mmu_lock);
-
-	WARN_ON(root->root_count);
-	WARN_ON(!root->tdp_mmu_page);
-
-	list_del(&root->link);
-
-	zap_gfn_range(kvm, root, 0, max_gfn, false);
-
-	tdp_mmu_free_sp(root);
-}
-
 static union kvm_mmu_page_role page_role_for_level(struct kvm_vcpu *vcpu,
 						   int level)
 {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 2dc3b3ba48fb..5d950e987fc7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -6,7 +6,6 @@
 #include <linux/kvm_host.h>
 
 hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
-void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root);
 
 static inline void kvm_tdp_mmu_get_root(struct kvm *kvm,
 					struct kvm_mmu_page *root)
@@ -17,14 +16,7 @@ static inline void kvm_tdp_mmu_get_root(struct kvm *kvm,
 	++root->root_count;
 }
 
-static inline bool kvm_tdp_mmu_put_root(struct kvm *kvm,
-					struct kvm_mmu_page *root)
-{
-	lockdep_assert_held(&kvm->mmu_lock);
-	--root->root_count;
-
-	return !root->root_count;
-}
+void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root);
 
 bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 05/13] KVM: x86/mmu: comment for_each_tdp_mmu_root requires MMU write lock
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (3 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 04/13] KVM: x86/mmu: Merge TDP MMU put and free root Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 21:08 ` [PATCH 06/13] KVM: x86/mmu: Refactor yield safe root iterator Ben Gardon
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Currently, iterating over the list of TDP MMU roots can only be done
under the MMU write lock, but that will change in future commits. Add a
defensive comment to for_each_tdp_mmu_root noting that it must only be
used under the MMU lock in write mode. That macro will not be converted
to work under the lock in read mode.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 368091adab09..365fa9f2f856 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -103,6 +103,7 @@ static inline struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 	     tdp_mmu_next_root_valid(_kvm, _root);			\
 	     _root = tdp_mmu_next_root(_kvm, _root))
 
+/* Only safe under the MMU lock in write mode, without yielding. */
 #define for_each_tdp_mmu_root(_kvm, _root)				\
 	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)
 
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 06/13] KVM: x86/mmu: Refactor yield safe root iterator
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (4 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 05/13] KVM: x86/mmu: comment for_each_tdp_mmu_root requires MMU write lock Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 21:08 ` [PATCH 07/13] KVM: x86/mmu: Make TDP MMU root refcount atomic Ben Gardon
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Refactor the yield-safe TDP MMU root iterator to be more amenable to
changes in future commits, which will allow it to be used under the MMU
lock in read mode. Currently the iterator requires a complicated dance
between the helper functions and different parts of the for loop, which
makes it hard to reason about. Moving all the logic into a single
function simplifies the iterator substantially.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 43 ++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 365fa9f2f856..ab1d26b40164 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -68,26 +68,34 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
 	tdp_mmu_free_sp(root);
 }
 
-static inline bool tdp_mmu_next_root_valid(struct kvm *kvm,
-					   struct kvm_mmu_page *root)
+/*
+ * Finds the next valid root after root (or the first valid root if root
+ * is NULL), takes a reference on it, and returns that next root. If root
+ * is not NULL, this thread should have already taken a reference on it, and
+ * that reference will be dropped. If no valid root is found, this
+ * function will return NULL.
+ */
+static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
+					      struct kvm_mmu_page *prev_root)
 {
-	lockdep_assert_held_write(&kvm->mmu_lock);
+	struct kvm_mmu_page *next_root;
 
-	if (list_entry_is_head(root, &kvm->arch.tdp_mmu_roots, link))
-		return false;
+	lockdep_assert_held_write(&kvm->mmu_lock);
 
-	kvm_tdp_mmu_get_root(kvm, root);
-	return true;
+	if (prev_root)
+		next_root = list_next_entry(prev_root, link);
+	else
+		next_root = list_first_entry(&kvm->arch.tdp_mmu_roots,
+					     typeof(*next_root), link);
 
-}
+	if (list_entry_is_head(next_root, &kvm->arch.tdp_mmu_roots, link))
+		next_root = NULL;
+	else
+		kvm_tdp_mmu_get_root(kvm, next_root);
 
-static inline struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
-						     struct kvm_mmu_page *root)
-{
-	struct kvm_mmu_page *next_root;
+	if (prev_root)
+		kvm_tdp_mmu_put_root(kvm, prev_root);
 
-	next_root = list_next_entry(root, link);
-	kvm_tdp_mmu_put_root(kvm, root);
 	return next_root;
 }
 
@@ -97,10 +105,9 @@ static inline struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * if exiting the loop early, the caller must drop the reference to the most
  * recent root. (Unless keeping a live reference is desirable.)
  */
-#define for_each_tdp_mmu_root_yield_safe(_kvm, _root)				\
-	for (_root = list_first_entry(&_kvm->arch.tdp_mmu_roots,	\
-				      typeof(*_root), link);		\
-	     tdp_mmu_next_root_valid(_kvm, _root);			\
+#define for_each_tdp_mmu_root_yield_safe(_kvm, _root)	\
+	for (_root = tdp_mmu_next_root(_kvm, NULL);	\
+	     _root;					\
 	     _root = tdp_mmu_next_root(_kvm, _root))
 
 /* Only safe under the MMU lock in write mode, without yielding. */
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 07/13] KVM: x86/mmu: Make TDP MMU root refcount atomic
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (5 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 06/13] KVM: x86/mmu: Refactor yield safe root iterator Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 22:21   ` Sean Christopherson
  2021-03-31 21:08 ` [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU Ben Gardon
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

In order to parallelize more operations for the TDP MMU, make the
refcount on TDP MMU roots atomic, so that a future patch can allow
multiple threads to take a reference on the root concurrently, while
holding the MMU lock in read mode.
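
As a rough sketch of why the refcount must be atomic, the userspace
example below (C11 atomics, illustrative names, not KVM code) shows the
"increment only if not already zero" semantics that
refcount_inc_not_zero() provides: a thread racing with the final put
either gets a reference or learns that the root is already being torn
down.

#include <stdatomic.h>
#include <stdbool.h>

struct root { atomic_int refcount; };

/* Take a reference only if the root is still live (refcount != 0). */
static bool get_root(struct root *r)
{
        int old = atomic_load(&r->refcount);

        while (old != 0) {
                /* On failure, 'old' is reloaded with the current value. */
                if (atomic_compare_exchange_weak(&r->refcount, &old, old + 1))
                        return true;
        }
        return false;
}

/* Returns true if the caller dropped the last reference and must free. */
static bool put_root(struct root *r)
{
        return atomic_fetch_sub(&r->refcount, 1) == 1;
}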

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu_internal.h |  6 +++++-
 arch/x86/kvm/mmu/tdp_mmu.c      | 15 ++++++++-------
 arch/x86/kvm/mmu/tdp_mmu.h      |  9 +++------
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 788dcf77c957..0a040d6a4f35 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -50,7 +50,11 @@ struct kvm_mmu_page {
 	u64 *spt;
 	/* hold the gfn of each spte inside spt */
 	gfn_t *gfns;
-	int root_count;          /* Currently serving as active root */
+	/* Currently serving as active root */
+	union {
+		int root_count;
+		refcount_t tdp_mmu_root_count;
+	};
 	unsigned int unsync_children;
 	struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 	DECLARE_BITMAP(unsync_child_bitmap, 512);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index ab1d26b40164..1f0b2d6124a2 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -56,7 +56,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
-	if (--root->root_count)
+	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
 		return;
 
 	WARN_ON(!root->tdp_mmu_page);
@@ -88,10 +88,12 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 		next_root = list_first_entry(&kvm->arch.tdp_mmu_roots,
 					     typeof(*next_root), link);
 
+	while (!list_entry_is_head(next_root, &kvm->arch.tdp_mmu_roots, link) &&
+	       !kvm_tdp_mmu_get_root(kvm, next_root))
+		next_root = list_next_entry(next_root, link);
+
 	if (list_entry_is_head(next_root, &kvm->arch.tdp_mmu_roots, link))
 		next_root = NULL;
-	else
-		kvm_tdp_mmu_get_root(kvm, next_root);
 
 	if (prev_root)
 		kvm_tdp_mmu_put_root(kvm, prev_root);
@@ -158,14 +160,13 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 
 	/* Check for an existing root before allocating a new one. */
 	for_each_tdp_mmu_root(kvm, root) {
-		if (root->role.word == role.word) {
-			kvm_tdp_mmu_get_root(kvm, root);
+		if (root->role.word == role.word &&
+		    kvm_tdp_mmu_get_root(kvm, root))
 			goto out;
-		}
 	}
 
 	root = alloc_tdp_mmu_page(vcpu, 0, vcpu->arch.mmu->shadow_root_level);
-	root->root_count = 1;
+	refcount_set(&root->tdp_mmu_root_count, 1);
 
 	list_add(&root->link, &kvm->arch.tdp_mmu_roots);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 5d950e987fc7..9961df505067 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -7,13 +7,10 @@
 
 hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
 
-static inline void kvm_tdp_mmu_get_root(struct kvm *kvm,
-					struct kvm_mmu_page *root)
+__must_check static inline bool kvm_tdp_mmu_get_root(struct kvm *kvm,
+						     struct kvm_mmu_page *root)
 {
-	BUG_ON(!root->root_count);
-	lockdep_assert_held(&kvm->mmu_lock);
-
-	++root->root_count;
+	return refcount_inc_not_zero(&root->tdp_mmu_root_count);
 }
 
 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root);
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (6 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 07/13] KVM: x86/mmu: Make TDP MMU root refcount atomic Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-04-01  9:37   ` Paolo Bonzini
  2021-04-01 13:16   ` kernel test robot
  2021-03-31 21:08 ` [PATCH 09/13] KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock Ben Gardon
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Protect the contents of the TDP MMU roots list with RCU in preparation
for a future patch which will allow the iterator macro to be used under
the MMU lock in read mode.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 64 +++++++++++++++++++++-----------------
 1 file changed, 36 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1f0b2d6124a2..d255125059c4 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -50,6 +50,22 @@ static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
 
+/*
+ * This is called through call_rcu in order to free TDP page table memory
+ * safely with respect to other kernel threads that may be operating on
+ * the memory.
+ * By only accessing TDP MMU page table memory in an RCU read critical
+ * section, and freeing it after a grace period, lockless access to that
+ * memory won't use it after it is freed.
+ */
+static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
+{
+	struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
+					       rcu_head);
+
+	tdp_mmu_free_sp(sp);
+}
+
 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
 {
 	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
@@ -61,11 +77,13 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
 
 	WARN_ON(!root->tdp_mmu_page);
 
-	list_del(&root->link);
+	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+	list_del_rcu(&root->link);
+	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
 	zap_gfn_range(kvm, root, 0, max_gfn, false);
 
-	tdp_mmu_free_sp(root);
+	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
 
 /*
@@ -82,18 +100,21 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
+	rcu_read_lock();
+
 	if (prev_root)
-		next_root = list_next_entry(prev_root, link);
+		next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
+						  &prev_root->link,
+						  typeof(*prev_root), link);
 	else
-		next_root = list_first_entry(&kvm->arch.tdp_mmu_roots,
-					     typeof(*next_root), link);
+		next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots,
+						   typeof(*next_root), link);
 
-	while (!list_entry_is_head(next_root, &kvm->arch.tdp_mmu_roots, link) &&
-	       !kvm_tdp_mmu_get_root(kvm, next_root))
-		next_root = list_next_entry(next_root, link);
+	while (next_root && !kvm_tdp_mmu_get_root(kvm, next_root))
+		next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
+				&next_root->link, typeof(*next_root), link);
 
-	if (list_entry_is_head(next_root, &kvm->arch.tdp_mmu_roots, link))
-		next_root = NULL;
+	rcu_read_unlock();
 
 	if (prev_root)
 		kvm_tdp_mmu_put_root(kvm, prev_root);
@@ -114,7 +135,8 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 
 /* Only safe under the MMU lock in write mode, without yielding. */
 #define for_each_tdp_mmu_root(_kvm, _root)				\
-	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)
+	list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link,	\
+				lockdep_is_held_write(&kvm->mmu_lock))
 
 static union kvm_mmu_page_role page_role_for_level(struct kvm_vcpu *vcpu,
 						   int level)
@@ -168,28 +190,14 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 	root = alloc_tdp_mmu_page(vcpu, 0, vcpu->arch.mmu->shadow_root_level);
 	refcount_set(&root->tdp_mmu_root_count, 1);
 
-	list_add(&root->link, &kvm->arch.tdp_mmu_roots);
+	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+	list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
+	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
 out:
 	return __pa(root->spt);
 }
 
-/*
- * This is called through call_rcu in order to free TDP page table memory
- * safely with respect to other kernel threads that may be operating on
- * the memory.
- * By only accessing TDP MMU page table memory in an RCU read critical
- * section, and freeing it after a grace period, lockless access to that
- * memory won't use it after it is freed.
- */
-static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
-{
-	struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
-					       rcu_head);
-
-	tdp_mmu_free_sp(sp);
-}
-
 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 				u64 old_spte, u64 new_spte, int level,
 				bool shared);
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 09/13] KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (7 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-04-01  9:58   ` Paolo Bonzini
  2021-03-31 21:08 ` [PATCH 10/13] KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU " Ben Gardon
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

To reduce lock contention and interference with page fault handlers,
allow the TDP MMU function that zaps a GFN range to operate under the
MMU read lock.
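
Under the read lock the zapping thread can no longer assume exclusive
ownership of an SPTE, so the patch below switches to clearing SPTEs with
tdp_mmu_zap_spte_atomic() and retrying when the cmpxchg fails. A minimal
userspace sketch of that retry pattern (C11 atomics, not KVM's helpers):

#include <stdatomic.h>
#include <stdint.h>

/* Sketch only: a "SPTE" is just a 64-bit word here. */
static void zap_spte(_Atomic uint64_t *sptep)
{
        uint64_t old = atomic_load(sptep);

        /*
         * Keep retrying until this thread observes a consistent value and
         * clears the SPTE; a failed compare-exchange reloads 'old' with
         * whatever another thread just wrote.
         */
        while (!atomic_compare_exchange_weak(sptep, &old, (uint64_t)0))
                ;
}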

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c     |  15 ++++--
 arch/x86/kvm/mmu/tdp_mmu.c | 102 ++++++++++++++++++++++++++-----------
 arch/x86/kvm/mmu/tdp_mmu.h |   6 ++-
 3 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 667d64daa82c..dcbfc784cf2f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3155,7 +3155,7 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
 
 	if (is_tdp_mmu_page(sp))
-		kvm_tdp_mmu_put_root(kvm, sp);
+		kvm_tdp_mmu_put_root(kvm, sp, false);
 	else if (!--sp->root_count && sp->role.invalid)
 		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
 
@@ -5514,13 +5514,17 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 		}
 	}
 
+	write_unlock(&kvm->mmu_lock);
+
 	if (is_tdp_mmu_enabled(kvm)) {
-		flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end);
+		read_lock(&kvm->mmu_lock);
+		flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end,
+						  true);
 		if (flush)
 			kvm_flush_remote_tlbs(kvm);
-	}
 
-	write_unlock(&kvm->mmu_lock);
+		read_unlock(&kvm->mmu_lock);
+	}
 }
 
 static bool slot_rmap_write_protect(struct kvm *kvm,
@@ -5959,7 +5963,8 @@ static void kvm_recover_nx_lpages(struct kvm *kvm)
 		WARN_ON_ONCE(!sp->lpage_disallowed);
 		if (is_tdp_mmu_page(sp)) {
 			kvm_tdp_mmu_zap_gfn_range(kvm, sp->gfn,
-				sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level));
+				sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level),
+				false);
 		} else {
 			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
 			WARN_ON_ONCE(sp->lpage_disallowed);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d255125059c4..0e99e4675dd4 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -27,6 +27,15 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
 }
 
+static __always_inline void kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
+							     bool shared)
+{
+	if (shared)
+		lockdep_assert_held_read(&kvm->mmu_lock);
+	else
+		lockdep_assert_held_write(&kvm->mmu_lock);
+}
+
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 {
 	if (!kvm->arch.tdp_mmu_enabled)
@@ -42,7 +51,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 }
 
 static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
-			  gfn_t start, gfn_t end, bool can_yield);
+			  gfn_t start, gfn_t end, bool can_yield, bool shared);
 
 static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 {
@@ -66,11 +75,12 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
 	tdp_mmu_free_sp(sp);
 }
 
-void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
+void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
+			  bool shared)
 {
 	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
 
-	lockdep_assert_held_write(&kvm->mmu_lock);
+	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
 
 	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
 		return;
@@ -81,7 +91,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
 	list_del_rcu(&root->link);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
-	zap_gfn_range(kvm, root, 0, max_gfn, false);
+	zap_gfn_range(kvm, root, 0, max_gfn, false, shared);
 
 	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
@@ -94,11 +104,11 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
  * function will return NULL.
  */
 static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
-					      struct kvm_mmu_page *prev_root)
+					      struct kvm_mmu_page *prev_root,
+					      bool shared)
 {
 	struct kvm_mmu_page *next_root;
 
-	lockdep_assert_held_write(&kvm->mmu_lock);
 
 	rcu_read_lock();
 
@@ -117,7 +127,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 	rcu_read_unlock();
 
 	if (prev_root)
-		kvm_tdp_mmu_put_root(kvm, prev_root);
+		kvm_tdp_mmu_put_root(kvm, prev_root, shared);
 
 	return next_root;
 }
@@ -127,11 +137,15 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * This makes it safe to release the MMU lock and yield within the loop, but
  * if exiting the loop early, the caller must drop the reference to the most
  * recent root. (Unless keeping a live reference is desirable.)
+ *
+ * If shared is set, this function is operating under the MMU lock in read
+ * mode. In the unlikely event that this thread must free a root, the lock
+ * will be temporarily dropped and reacquired in write mode.
  */
-#define for_each_tdp_mmu_root_yield_safe(_kvm, _root)	\
-	for (_root = tdp_mmu_next_root(_kvm, NULL);	\
-	     _root;					\
-	     _root = tdp_mmu_next_root(_kvm, _root))
+#define for_each_tdp_mmu_root_yield_safe(_kvm, _root, _shared)	\
+	for (_root = tdp_mmu_next_root(_kvm, NULL, _shared);	\
+	     _root;						\
+	     _root = tdp_mmu_next_root(_kvm, _root, _shared))
 
 /* Only safe under the MMU lock in write mode, without yielding. */
 #define for_each_tdp_mmu_root(_kvm, _root)				\
@@ -632,7 +646,8 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
  * Return false if a yield was not needed.
  */
 static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
-					     struct tdp_iter *iter, bool flush)
+					     struct tdp_iter *iter, bool flush,
+					     bool shared)
 {
 	/* Ensure forward progress has been made before yielding. */
 	if (iter->next_last_level_gfn == iter->yielded_gfn)
@@ -644,7 +659,11 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
 		if (flush)
 			kvm_flush_remote_tlbs(kvm);
 
-		cond_resched_rwlock_write(&kvm->mmu_lock);
+		if (shared)
+			cond_resched_rwlock_read(&kvm->mmu_lock);
+		else
+			cond_resched_rwlock_write(&kvm->mmu_lock);
+
 		rcu_read_lock();
 
 		WARN_ON(iter->gfn > iter->next_last_level_gfn);
@@ -662,23 +681,33 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
  * non-root pages mapping GFNs strictly within that range. Returns true if
  * SPTEs have been cleared and a TLB flush is needed before releasing the
  * MMU lock.
+ *
  * If can_yield is true, will release the MMU lock and reschedule if the
  * scheduler needs the CPU or there is contention on the MMU lock. If this
  * function cannot yield, it will not release the MMU lock or reschedule and
  * the caller must ensure it does not supply too large a GFN range, or the
  * operation can cause a soft lockup.
+ *
+ * If shared is true, this thread holds the MMU lock in read mode and must
+ * account for the possibility that other threads are modifying the paging
+ * structures concurrently. If shared is false, this thread should hold the
+ * MMU lock in write mode.
  */
 static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
-			  gfn_t start, gfn_t end, bool can_yield)
+			  gfn_t start, gfn_t end, bool can_yield, bool shared)
 {
 	struct tdp_iter iter;
 	bool flush_needed = false;
 
+	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
+
 	rcu_read_lock();
 
 	tdp_root_for_each_pte(iter, root, start, end) {
+retry:
 		if (can_yield &&
-		    tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed)) {
+		    tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed,
+					      shared)) {
 			flush_needed = false;
 			continue;
 		}
@@ -696,8 +725,17 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;
 
-		tdp_mmu_set_spte(kvm, &iter, 0);
-		flush_needed = true;
+		if (!shared) {
+			tdp_mmu_set_spte(kvm, &iter, 0);
+			flush_needed = true;
+		} else if (!tdp_mmu_zap_spte_atomic(kvm, &iter)) {
+			/*
+			 * The iter must explicitly re-read the SPTE because
+			 * the atomic cmpxchg failed.
+			 */
+			iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
+			goto retry;
+		}
 	}
 
 	rcu_read_unlock();
@@ -709,14 +747,20 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
  * non-root pages mapping GFNs strictly within that range. Returns true if
  * SPTEs have been cleared and a TLB flush is needed before releasing the
  * MMU lock.
+ *
+ * If shared is true, this thread holds the MMU lock in read mode and must
+ * account for the possibility that other threads are modifying the paging
+ * structures concurrently. If shared is false, this thread should hold the
+ * MMU in write mode.
  */
-bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end)
+bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end,
+			       bool shared)
 {
 	struct kvm_mmu_page *root;
 	bool flush = false;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root)
-		flush |= zap_gfn_range(kvm, root, start, end, true);
+	for_each_tdp_mmu_root_yield_safe(kvm, root, shared)
+		flush |= zap_gfn_range(kvm, root, start, end, true, shared);
 
 	return flush;
 }
@@ -726,7 +770,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
 	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
 	bool flush;
 
-	flush = kvm_tdp_mmu_zap_gfn_range(kvm, 0, max_gfn);
+	flush = kvm_tdp_mmu_zap_gfn_range(kvm, 0, max_gfn, false);
 	if (flush)
 		kvm_flush_remote_tlbs(kvm);
 }
@@ -893,7 +937,7 @@ static __always_inline int kvm_tdp_mmu_handle_hva_range(struct kvm *kvm,
 	int ret = 0;
 	int as_id;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root) {
+	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
 		as_id = kvm_mmu_page_as_id(root);
 		slots = __kvm_memslots(kvm, as_id);
 		kvm_for_each_memslot(memslot, slots) {
@@ -933,7 +977,7 @@ static int zap_gfn_range_hva_wrapper(struct kvm *kvm,
 				     struct kvm_mmu_page *root, gfn_t start,
 				     gfn_t end, unsigned long unused)
 {
-	return zap_gfn_range(kvm, root, start, end, false);
+	return zap_gfn_range(kvm, root, start, end, false, false);
 }
 
 int kvm_tdp_mmu_zap_hva_range(struct kvm *kvm, unsigned long start,
@@ -1098,7 +1142,7 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 
 	for_each_tdp_pte_min_level(iter, root->spt, root->role.level,
 				   min_level, start, end) {
-		if (tdp_mmu_iter_cond_resched(kvm, &iter, false))
+		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
 			continue;
 
 		if (!is_shadow_present_pte(iter.old_spte) ||
@@ -1128,7 +1172,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, struct kvm_memory_slot *slot,
 	int root_as_id;
 	bool spte_set = false;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root) {
+	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
 		root_as_id = kvm_mmu_page_as_id(root);
 		if (root_as_id != slot->as_id)
 			continue;
@@ -1157,7 +1201,7 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 	rcu_read_lock();
 
 	tdp_root_for_each_leaf_pte(iter, root, start, end) {
-		if (tdp_mmu_iter_cond_resched(kvm, &iter, false))
+		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
 			continue;
 
 		if (spte_ad_need_write_protect(iter.old_spte)) {
@@ -1193,7 +1237,7 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 	int root_as_id;
 	bool spte_set = false;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root) {
+	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
 		root_as_id = kvm_mmu_page_as_id(root);
 		if (root_as_id != slot->as_id)
 			continue;
@@ -1291,7 +1335,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 	rcu_read_lock();
 
 	tdp_root_for_each_pte(iter, root, start, end) {
-		if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set)) {
+		if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set, false)) {
 			spte_set = false;
 			continue;
 		}
@@ -1326,7 +1370,7 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 	struct kvm_mmu_page *root;
 	int root_as_id;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root) {
+	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
 		root_as_id = kvm_mmu_page_as_id(root);
 		if (root_as_id != slot->as_id)
 			continue;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 9961df505067..855e58856815 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -13,9 +13,11 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm *kvm,
 	return refcount_inc_not_zero(&root->tdp_mmu_root_count);
 }
 
-void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root);
+void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
+			  bool shared);
 
-bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end);
+bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end,
+			       bool shared);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
 
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 10/13] KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU read lock
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (8 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 09/13] KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-04-01 10:12   ` Paolo Bonzini
  2021-03-31 21:08 ` [PATCH 11/13] KVM: x86/mmu: Allow enabling / disabling dirty logging under " Ben Gardon
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

To speed the process of disabling dirty logging, change the TDP MMU
function which zaps collapsible SPTEs to run under the MMU read lock.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c     |  9 ++++++---
 arch/x86/kvm/mmu/tdp_mmu.c | 17 +++++++++++++----
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dcbfc784cf2f..81967b4e7d76 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5610,10 +5610,13 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 
 	write_lock(&kvm->mmu_lock);
 	slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
-
-	if (is_tdp_mmu_enabled(kvm))
-		kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot);
 	write_unlock(&kvm->mmu_lock);
+
+	if (is_tdp_mmu_enabled(kvm)) {
+		read_lock(&kvm->mmu_lock);
+		kvm_tdp_mmu_zap_collapsible_sptes(kvm, memslot);
+		read_unlock(&kvm->mmu_lock);
+	}
 }
 
 void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0e99e4675dd4..862acb868abd 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1335,7 +1335,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 	rcu_read_lock();
 
 	tdp_root_for_each_pte(iter, root, start, end) {
-		if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set, false)) {
+retry:
+		if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set, true)) {
 			spte_set = false;
 			continue;
 		}
@@ -1350,8 +1351,14 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 							    pfn, PG_LEVEL_NUM))
 			continue;
 
-		tdp_mmu_set_spte(kvm, &iter, 0);
-
+		if (!tdp_mmu_zap_spte_atomic(kvm, &iter)) {
+			/*
+			 * The iter must explicitly re-read the SPTE because
+			 * the atomic cmpxchg failed.
+			 */
+			iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
+			goto retry;
+		}
 		spte_set = true;
 	}
 
@@ -1370,7 +1377,9 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 	struct kvm_mmu_page *root;
 	int root_as_id;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
+	lockdep_assert_held_read(&kvm->mmu_lock);
+
+	for_each_tdp_mmu_root_yield_safe(kvm, root, true) {
 		root_as_id = kvm_mmu_page_as_id(root);
 		if (root_as_id != slot->as_id)
 			continue;
-- 
2.31.0.291.g576ba9dcdaf-goog



* [PATCH 11/13] KVM: x86/mmu: Allow enabling / disabling dirty logging under MMU read lock
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (9 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 10/13] KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU " Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 21:08 ` [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU Ben Gardon
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

To reduce lock contention and interference with page fault handlers,
allow the TDP MMU functions which enable and disable dirty logging
to operate under the MMU read lock.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 16 +++++++---
 arch/x86/kvm/mmu/tdp_mmu.c | 62 ++++++++++++++++++++++++++++++--------
 2 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 81967b4e7d76..bf535c9f7ff2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5543,10 +5543,14 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	write_lock(&kvm->mmu_lock);
 	flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
 				start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
-	if (is_tdp_mmu_enabled(kvm))
-		flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_4K);
 	write_unlock(&kvm->mmu_lock);
 
+	if (is_tdp_mmu_enabled(kvm)) {
+		read_lock(&kvm->mmu_lock);
+		flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_4K);
+		read_unlock(&kvm->mmu_lock);
+	}
+
 	/*
 	 * We can flush all the TLBs out of the mmu lock without TLB
 	 * corruption since we just change the spte from writable to
@@ -5641,10 +5645,14 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 
 	write_lock(&kvm->mmu_lock);
 	flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
-	if (is_tdp_mmu_enabled(kvm))
-		flush |= kvm_tdp_mmu_clear_dirty_slot(kvm, memslot);
 	write_unlock(&kvm->mmu_lock);
 
+	if (is_tdp_mmu_enabled(kvm)) {
+		read_lock(&kvm->mmu_lock);
+		flush |= kvm_tdp_mmu_clear_dirty_slot(kvm, memslot);
+		read_unlock(&kvm->mmu_lock);
+	}
+
 	/*
 	 * It's also safe to flush TLBs out of mmu lock here as currently this
 	 * function is only used for dirty logging, in which case flushing TLB
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 862acb868abd..0c90dc034819 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -491,8 +491,9 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 }
 
 /*
- * tdp_mmu_set_spte_atomic - Set a TDP MMU SPTE atomically and handle the
- * associated bookkeeping
+ * tdp_mmu_set_spte_atomic_no_dirty_log - Set a TDP MMU SPTE atomically
+ * and handle the associated bookkeeping, but do not mark the page dirty
+ * in KVM's dirty bitmaps.
  *
  * @kvm: kvm instance
  * @iter: a tdp_iter instance currently on the SPTE that should be set
@@ -500,9 +501,9 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
  * Returns: true if the SPTE was set, false if it was not. If false is returned,
  *	    this function will have no side-effects.
  */
-static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
-					   struct tdp_iter *iter,
-					   u64 new_spte)
+static inline bool tdp_mmu_set_spte_atomic_no_dirty_log(struct kvm *kvm,
+							struct tdp_iter *iter,
+							u64 new_spte)
 {
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
@@ -517,9 +518,22 @@ static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
 		      new_spte) != iter->old_spte)
 		return false;
 
-	handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
-			    new_spte, iter->level, true);
+	__handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
+			      new_spte, iter->level, true);
+	handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level);
+
+	return true;
+}
+
+static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
+					   struct tdp_iter *iter,
+					   u64 new_spte)
+{
+	if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, iter, new_spte))
+		return false;
 
+	handle_changed_spte_dirty_log(kvm, iter->as_id, iter->gfn,
+				      iter->old_spte, new_spte, iter->level);
 	return true;
 }
 
@@ -1142,7 +1156,8 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 
 	for_each_tdp_pte_min_level(iter, root->spt, root->role.level,
 				   min_level, start, end) {
-		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
+retry:
+		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
 		if (!is_shadow_present_pte(iter.old_spte) ||
@@ -1152,7 +1167,15 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 
 		new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
 
-		tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
+		if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, &iter,
+							  new_spte)) {
+			/*
+			 * The iter must explicitly re-read the SPTE because
+			 * the atomic cmpxchg failed.
+			 */
+			iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
+			goto retry;
+		}
 		spte_set = true;
 	}
 
@@ -1172,7 +1195,9 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, struct kvm_memory_slot *slot,
 	int root_as_id;
 	bool spte_set = false;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
+	lockdep_assert_held_read(&kvm->mmu_lock);
+
+	for_each_tdp_mmu_root_yield_safe(kvm, root, true) {
 		root_as_id = kvm_mmu_page_as_id(root);
 		if (root_as_id != slot->as_id)
 			continue;
@@ -1201,7 +1226,8 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 	rcu_read_lock();
 
 	tdp_root_for_each_leaf_pte(iter, root, start, end) {
-		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
+retry:
+		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
 		if (spte_ad_need_write_protect(iter.old_spte)) {
@@ -1216,7 +1242,15 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 				continue;
 		}
 
-		tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
+		if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, &iter,
+							  new_spte)) {
+			/*
+			 * The iter must explicitly re-read the SPTE because
+			 * the atomic cmpxchg failed.
+			 */
+			iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
+			goto retry;
+		}
 		spte_set = true;
 	}
 
@@ -1237,7 +1271,9 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 	int root_as_id;
 	bool spte_set = false;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
+	lockdep_assert_held_read(&kvm->mmu_lock);
+
+	for_each_tdp_mmu_root_yield_safe(kvm, root, true) {
 		root_as_id = kvm_mmu_page_as_id(root);
 		if (root_as_id != slot->as_id)
 			continue;
-- 
2.31.0.291.g576ba9dcdaf-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (10 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 11/13] KVM: x86/mmu: Allow enabling / disabling dirty logging under " Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 22:27   ` Sean Christopherson
  2021-04-01 10:36   ` Paolo Bonzini
  2021-03-31 21:08 ` [PATCH 13/13] KVM: x86/mmu: Tear down roots in fast invalidation thread Ben Gardon
  2021-04-01 10:43 ` [PATCH 00/13] More parallel operations for the TDP MMU Paolo Bonzini
  13 siblings, 2 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Provide a real mechanism for fast invalidation by marking roots as
invalid so that their reference count will quickly fall to zero
and they will be torn down.

One negative side effect of this approach is that a vCPU thread will
likely drop the last reference to a root and be saddled with the work of
tearing down an entire paging structure. This issue will be resolved in
a later commit.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c     |  6 +++---
 arch/x86/kvm/mmu/tdp_mmu.c | 14 ++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.h |  5 +++++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bf535c9f7ff2..49b7097fb55b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5430,6 +5430,9 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	write_lock(&kvm->mmu_lock);
 	trace_kvm_mmu_zap_all_fast(kvm);
 
+	if (is_tdp_mmu_enabled(kvm))
+		kvm_tdp_mmu_invalidate_roots(kvm);
+
 	/*
 	 * Toggle mmu_valid_gen between '0' and '1'.  Because slots_lock is
 	 * held for the entire duration of zapping obsolete pages, it's
@@ -5451,9 +5454,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 
 	kvm_zap_obsolete_pages(kvm);
 
-	if (is_tdp_mmu_enabled(kvm))
-		kvm_tdp_mmu_zap_all(kvm);
-
 	write_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0c90dc034819..428ff6778426 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -789,6 +789,20 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
 		kvm_flush_remote_tlbs(kvm);
 }
 
+/*
+ * This function depends on running in the same MMU lock critical section as
+ * kvm_reload_remote_mmus. Since this is in the same critical section, no new
+ * roots will be created between this function and the MMU reload signals
+ * being sent.
+ */
+void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm)
+{
+	struct kvm_mmu_page *root;
+
+	for_each_tdp_mmu_root(kvm, root)
+		root->role.invalid = true;
+}
+
 /*
  * Installs a last-level SPTE to handle a TDP page fault.
  * (NPT/EPT violation/misconfiguration)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 855e58856815..ff4978817fb8 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -10,6 +10,9 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm *kvm,
 						     struct kvm_mmu_page *root)
 {
+	if (root->role.invalid)
+		return false;
+
 	return refcount_inc_not_zero(&root->tdp_mmu_root_count);
 }
 
@@ -20,6 +23,8 @@ bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end,
 			       bool shared);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
 
+void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm);
+
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		    int map_writable, int max_level, kvm_pfn_t pfn,
 		    bool prefault);
-- 
2.31.0.291.g576ba9dcdaf-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 13/13] KVM: x86/mmu: Tear down roots in fast invalidation thread
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (11 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU Ben Gardon
@ 2021-03-31 21:08 ` Ben Gardon
  2021-03-31 22:29   ` Sean Christopherson
  2021-04-01 10:37   ` Paolo Bonzini
  2021-04-01 10:43 ` [PATCH 00/13] More parallel operations for the TDP MMU Paolo Bonzini
  13 siblings, 2 replies; 32+ messages in thread
From: Ben Gardon @ 2021-03-31 21:08 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

To avoid saddling a vCPU thread with the work of tearing down an entire
paging structure, take a reference on each root before they become
obsolete, so that the thread initiating the fast invalidation can tear
down the paging structure and (most likely) release the last reference.
As a bonus, this teardown can happen under the MMU lock in read mode so
as not to block the progress of vCPU threads.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c     |  6 ++++
 arch/x86/kvm/mmu/tdp_mmu.c | 74 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/mmu/tdp_mmu.h |  1 +
 3 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 49b7097fb55b..22742619698d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5455,6 +5455,12 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	kvm_zap_obsolete_pages(kvm);
 
 	write_unlock(&kvm->mmu_lock);
+
+	if (is_tdp_mmu_enabled(kvm)) {
+		read_lock(&kvm->mmu_lock);
+		kvm_tdp_mmu_zap_all_fast(kvm);
+		read_unlock(&kvm->mmu_lock);
+	}
 }
 
 static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 428ff6778426..5498df7e2e1f 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -794,13 +794,85 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
  * kvm_reload_remote_mmus. Since this is in the same critical section, no new
  * roots will be created between this function and the MMU reload signals
  * being sent.
+ * Take a reference on all roots so that this thread can do the bulk of
+ * the work required to free the roots once they are invalidated. Without
+ * this reference, a vCPU thread might drop the last reference to a root
+ * and get stuck with tearing down the entire paging structure.
  */
 void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm)
 {
 	struct kvm_mmu_page *root;
 
 	for_each_tdp_mmu_root(kvm, root)
-		root->role.invalid = true;
+		if (refcount_inc_not_zero(&root->tdp_mmu_root_count))
+			root->role.invalid = true;
+}
+
+static struct kvm_mmu_page *next_invalidated_root(struct kvm *kvm,
+						  struct kvm_mmu_page *prev_root)
+{
+	struct kvm_mmu_page *next_root;
+
+	if (prev_root)
+		next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
+						  &prev_root->link,
+						  typeof(*prev_root), link);
+	else
+		next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots,
+						   typeof(*next_root), link);
+
+	while (next_root && !(next_root->role.invalid &&
+			      refcount_read(&next_root->tdp_mmu_root_count)))
+		next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
+						  &next_root->link,
+						  typeof(*next_root), link);
+
+	return next_root;
+}
+
+/*
+ * Since kvm_tdp_mmu_invalidate_roots has acquired a reference to each
+ * invalidated root, they will not be freed until this function drops the
+ * reference. Before dropping that reference, tear down the paging
+ * structure so that whichever thread does drop the last reference
+ * only has to do a trivial amount of work. Since the roots are invalid,
+ * no new SPTEs should be created under them.
+ */
+void kvm_tdp_mmu_zap_all_fast(struct kvm *kvm)
+{
+	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
+	struct kvm_mmu_page *next_root;
+	struct kvm_mmu_page *root;
+	bool flush = false;
+
+	lockdep_assert_held_read(&kvm->mmu_lock);
+
+	rcu_read_lock();
+
+	root = next_invalidated_root(kvm, NULL);
+
+	while (root) {
+		next_root = next_invalidated_root(kvm, root);
+
+		rcu_read_unlock();
+
+		flush |= zap_gfn_range(kvm, root, 0, max_gfn, true, true);
+
+		/*
+		 * Put the reference acquired in
+		 * kvm_tdp_mmu_invalidate_roots
+		 */
+		kvm_tdp_mmu_put_root(kvm, root, true);
+
+		root = next_root;
+
+		rcu_read_lock();
+	}
+
+	rcu_read_unlock();
+
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
 }
 
 /*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index ff4978817fb8..d6d98f9047cd 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -24,6 +24,7 @@ bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end,
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
 
 void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm);
+void kvm_tdp_mmu_zap_all_fast(struct kvm *kvm);
 
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		    int map_writable, int max_level, kvm_pfn_t pfn,
-- 
2.31.0.291.g576ba9dcdaf-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/13] KVM: x86/mmu: Make TDP MMU root refcount atomic
  2021-03-31 21:08 ` [PATCH 07/13] KVM: x86/mmu: Make TDP MMU root refcount atomic Ben Gardon
@ 2021-03-31 22:21   ` Sean Christopherson
  2021-04-01 16:50     ` Ben Gardon
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2021-03-31 22:21 UTC (permalink / raw)
  To: Ben Gardon
  Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong

On Wed, Mar 31, 2021, Ben Gardon wrote:
> In order to parallelize more operations for the TDP MMU, make the
> refcount on TDP MMU roots atomic, so that a future patch can allow
> multiple threads to take a reference on the root concurrently, while
> holding the MMU lock in read mode.
> 
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---

...

> @@ -88,10 +88,12 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
>  		next_root = list_first_entry(&kvm->arch.tdp_mmu_roots,
>  					     typeof(*next_root), link);
>  
> +	while (!list_entry_is_head(next_root, &kvm->arch.tdp_mmu_roots, link) &&
> +	       !kvm_tdp_mmu_get_root(kvm, next_root))
> +		next_root = list_next_entry(next_root, link);
> +
>  	if (list_entry_is_head(next_root, &kvm->arch.tdp_mmu_roots, link))
>  		next_root = NULL;
> -	else
> -		kvm_tdp_mmu_get_root(kvm, next_root);
>  
>  	if (prev_root)
>  		kvm_tdp_mmu_put_root(kvm, prev_root);
> @@ -158,14 +160,13 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
>  
>  	/* Check for an existing root before allocating a new one. */
>  	for_each_tdp_mmu_root(kvm, root) {
> -		if (root->role.word == role.word) {
> -			kvm_tdp_mmu_get_root(kvm, root);
> +		if (root->role.word == role.word &&
> +		    kvm_tdp_mmu_get_root(kvm, root))

I'm not opposed to changing this logic while making the refcount atomic, but it
needs to be explained in the changelog.  As is, the changelog makes it sound
like the patch is a pure refactoring of the type.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU
  2021-03-31 21:08 ` [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU Ben Gardon
@ 2021-03-31 22:27   ` Sean Christopherson
  2021-04-01 16:50     ` Ben Gardon
  2021-04-01 10:36   ` Paolo Bonzini
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2021-03-31 22:27 UTC (permalink / raw)
  To: Ben Gardon
  Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong

On Wed, Mar 31, 2021, Ben Gardon wrote:
> Provide a real mechanism for fast invalidation by marking roots as
> invalid so that their reference count will quickly fall to zero
> and they will be torn down.
> 
> One negative side effect of this approach is that a vCPU thread will
> likely drop the last reference to a root and be saddled with the work of
> tearing down an entire paging structure. This issue will be resolved in
> a later commit.
> 
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---

...

> +/*
> + * This function depends on running in the same MMU lock critical section as
> + * kvm_reload_remote_mmus. Since this is in the same critical section, no new
> + * roots will be created between this function and the MMU reload signals
> + * being sent.

Eww.  That goes beyond just adding a lockdep assertion here.  I know you want to
isolate the TDP MMU as much as possible, but this really feels like it should be
open coded in kvm_mmu_zap_all_fast().  And assuming this lands after as_id is
added to for_each_tdp_mmu_root(), it's probably easier to open code anyways, e.g.
use list_for_each_entry() directly instead of bouncing through an iterator.
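
Something like this, perhaps (untested sketch; assumes mmu_lock is held
for write at that point and that kvm->arch.tdp_mmu_roots stays reachable
from mmu.c):

	if (is_tdp_mmu_enabled(kvm)) {
		struct kvm_mmu_page *root;

		/* mmu_lock is held for write, a plain list walk is safe. */
		list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link)
			root->role.invalid = true;
	}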

> + */
> +void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm)
> +{
> +	struct kvm_mmu_page *root;
> +
> +	for_each_tdp_mmu_root(kvm, root)
> +		root->role.invalid = true;
> +}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/13] KVM: x86/mmu: Tear down roots in fast invalidation thread
  2021-03-31 21:08 ` [PATCH 13/13] KVM: x86/mmu: Tear down roots in fast invalidation thread Ben Gardon
@ 2021-03-31 22:29   ` Sean Christopherson
  2021-04-01 10:41     ` Paolo Bonzini
  2021-04-01 10:37   ` Paolo Bonzini
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2021-03-31 22:29 UTC (permalink / raw)
  To: Ben Gardon
  Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong

On Wed, Mar 31, 2021, Ben Gardon wrote:
> ---
>  arch/x86/kvm/mmu/mmu.c     |  6 ++++
>  arch/x86/kvm/mmu/tdp_mmu.c | 74 +++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/mmu/tdp_mmu.h |  1 +
>  3 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 49b7097fb55b..22742619698d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5455,6 +5455,12 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
>  	kvm_zap_obsolete_pages(kvm);
>  
>  	write_unlock(&kvm->mmu_lock);
> +
> +	if (is_tdp_mmu_enabled(kvm)) {
> +		read_lock(&kvm->mmu_lock);
> +		kvm_tdp_mmu_zap_all_fast(kvm);

Purely because it exists first, I think we should follow the legacy MMU's
terminology, i.e. kvm_tdp_mmu_zap_obsolete_pages().

> +		read_unlock(&kvm->mmu_lock);
> +	}
>  }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU
  2021-03-31 21:08 ` [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU Ben Gardon
@ 2021-04-01  9:37   ` Paolo Bonzini
  2021-04-01 16:48     ` Ben Gardon
  2021-04-01 13:16   ` kernel test robot
  1 sibling, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2021-04-01  9:37 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Peter Feiner,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 31/03/21 23:08, Ben Gardon wrote:
> Protect the contents of the TDP MMU roots list with RCU in preparation
> for a future patch which will allow the iterator macro to be used under
> the MMU lock in read mode.
> 
> Signed-off-by: Ben Gardon<bgardon@google.com>
> ---
>   arch/x86/kvm/mmu/tdp_mmu.c | 64 +++++++++++++++++++++-----------------
>   1 file changed, 36 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> +	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> +	list_del_rcu(&root->link);
> +	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);


Please update the comment above tdp_mmu_pages_lock in 
arch/x86/include/asm/kvm_host.h as well.

>  /* Only safe under the MMU lock in write mode, without yielding. */
>  #define for_each_tdp_mmu_root(_kvm, _root)				\
> -	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)
> +	list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link,	\
> +				lockdep_is_held_write(&kvm->mmu_lock))

This should also add "... || 
lockdep_is_held(&kvm->arch.tdp_mmu_pages_lock)", if only for 
documentation purposes.
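
Something along these lines (untested):

/* Only safe under the MMU lock in write mode, or tdp_mmu_pages_lock. */
#define for_each_tdp_mmu_root(_kvm, _root)				\
	list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link,	\
				lockdep_is_held_write(&kvm->mmu_lock) ||	\
				lockdep_is_held(&kvm->arch.tdp_mmu_pages_lock))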

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/13] KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock
  2021-03-31 21:08 ` [PATCH 09/13] KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock Ben Gardon
@ 2021-04-01  9:58   ` Paolo Bonzini
  2021-04-01 16:50     ` Ben Gardon
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2021-04-01  9:58 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Peter Feiner,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 31/03/21 23:08, Ben Gardon wrote:
> To reduce lock contention and interference with page fault handlers,
> allow the TDP MMU function to zap a GFN range to operate under the MMU
> read lock.
> 
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c     |  15 ++++--
>   arch/x86/kvm/mmu/tdp_mmu.c | 102 ++++++++++++++++++++++++++-----------
>   arch/x86/kvm/mmu/tdp_mmu.h |   6 ++-
>   3 files changed, 87 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 667d64daa82c..dcbfc784cf2f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3155,7 +3155,7 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
>   	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
>   
>   	if (is_tdp_mmu_page(sp))
> -		kvm_tdp_mmu_put_root(kvm, sp);
> +		kvm_tdp_mmu_put_root(kvm, sp, false);
>   	else if (!--sp->root_count && sp->role.invalid)
>   		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
>   
> @@ -5514,13 +5514,17 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>   		}
>   	}
>   
> +	write_unlock(&kvm->mmu_lock);
> +
>   	if (is_tdp_mmu_enabled(kvm)) {
> -		flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end);
> +		read_lock(&kvm->mmu_lock);
> +		flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end,
> +						  true);
>   		if (flush)
>   			kvm_flush_remote_tlbs(kvm);
> -	}
>   
> -	write_unlock(&kvm->mmu_lock);
> +		read_unlock(&kvm->mmu_lock);
> +	}
>   }

This will conflict with Sean's MMU notifier series patches:

KVM: x86/mmu: Pass address space ID to __kvm_tdp_mmu_zap_gfn_range()

What I can do for now is change the mmu.c part of that patch to

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e6e02360ef67..9882bbd9b742 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5510,15 +5510,15 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
  		}
  	}
  
-	if (flush)
-		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
-
  	if (is_tdp_mmu_enabled(kvm)) {
-		flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end);
-		if (flush)
-			kvm_flush_remote_tlbs(kvm);
+		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
+			flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
+							  gfn_end, flush);
  	}
  
+	if (flush)
+		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
+
  	write_unlock(&kvm->mmu_lock);
  }
  
  
but you will have to add a separate "if (flush)" when moving the write_unlock
earlier, since there's no downgrade function for rwlocks.  In practice it's
not a huge deal since, unless running nested, there will be only one active MMU.
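
So the tail of kvm_zap_gfn_range would end up looking roughly like this
(untested; the exact kvm_tdp_mmu_zap_gfn_range() argument list depends on
how the two series get merged):

	if (flush)
		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);

	write_unlock(&kvm->mmu_lock);

	if (is_tdp_mmu_enabled(kvm)) {
		flush = false;

		read_lock(&kvm->mmu_lock);
		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
			flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
							  gfn_end, flush);

		/* Separate flush for the TDP MMU walk done under read lock. */
		if (flush)
			kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
							   gfn_end);
		read_unlock(&kvm->mmu_lock);
	}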

Paolo

>   static bool slot_rmap_write_protect(struct kvm *kvm,
> @@ -5959,7 +5963,8 @@ static void kvm_recover_nx_lpages(struct kvm *kvm)
>   		WARN_ON_ONCE(!sp->lpage_disallowed);
>   		if (is_tdp_mmu_page(sp)) {
>   			kvm_tdp_mmu_zap_gfn_range(kvm, sp->gfn,
> -				sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level));
> +				sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level),
> +				false);
>   		} else {
>   			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
>   			WARN_ON_ONCE(sp->lpage_disallowed);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index d255125059c4..0e99e4675dd4 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -27,6 +27,15 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
>   	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
>   }
>   
> +static __always_inline void kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
> +							     bool shared)
> +{
> +	if (shared)
> +		lockdep_assert_held_read(&kvm->mmu_lock);
> +	else
> +		lockdep_assert_held_write(&kvm->mmu_lock);
> +}
> +
>   void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>   {
>   	if (!kvm->arch.tdp_mmu_enabled)
> @@ -42,7 +51,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>   }
>   
>   static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> -			  gfn_t start, gfn_t end, bool can_yield);
> +			  gfn_t start, gfn_t end, bool can_yield, bool shared);
>   
>   static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
>   {
> @@ -66,11 +75,12 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
>   	tdp_mmu_free_sp(sp);
>   }
>   
> -void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
> +void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
> +			  bool shared)
>   {
>   	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
>   
> -	lockdep_assert_held_write(&kvm->mmu_lock);
> +	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
>   
>   	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
>   		return;
> @@ -81,7 +91,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
>   	list_del_rcu(&root->link);
>   	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
>   
> -	zap_gfn_range(kvm, root, 0, max_gfn, false);
> +	zap_gfn_range(kvm, root, 0, max_gfn, false, shared);
>   
>   	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
>   }
> @@ -94,11 +104,11 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
>    * function will return NULL.
>    */
>   static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
> -					      struct kvm_mmu_page *prev_root)
> +					      struct kvm_mmu_page *prev_root,
> +					      bool shared)
>   {
>   	struct kvm_mmu_page *next_root;
>   
> -	lockdep_assert_held_write(&kvm->mmu_lock);
>   
>   	rcu_read_lock();
>   
> @@ -117,7 +127,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
>   	rcu_read_unlock();
>   
>   	if (prev_root)
> -		kvm_tdp_mmu_put_root(kvm, prev_root);
> +		kvm_tdp_mmu_put_root(kvm, prev_root, shared);
>   
>   	return next_root;
>   }
> @@ -127,11 +137,15 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
>    * This makes it safe to release the MMU lock and yield within the loop, but
>    * if exiting the loop early, the caller must drop the reference to the most
>    * recent root. (Unless keeping a live reference is desirable.)
> + *
> + * If shared is set, this function is operating under the MMU lock in read
> + * mode. In the unlikely event that this thread must free a root, the lock
> + * will be temporarily dropped and reacquired in write mode.
>    */
> -#define for_each_tdp_mmu_root_yield_safe(_kvm, _root)	\
> -	for (_root = tdp_mmu_next_root(_kvm, NULL);	\
> -	     _root;					\
> -	     _root = tdp_mmu_next_root(_kvm, _root))
> +#define for_each_tdp_mmu_root_yield_safe(_kvm, _root, _shared)	\
> +	for (_root = tdp_mmu_next_root(_kvm, NULL, _shared);	\
> +	     _root;						\
> +	     _root = tdp_mmu_next_root(_kvm, _root, _shared))
>   
>   /* Only safe under the MMU lock in write mode, without yielding. */
>   #define for_each_tdp_mmu_root(_kvm, _root)				\
> @@ -632,7 +646,8 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
>    * Return false if a yield was not needed.
>    */
>   static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
> -					     struct tdp_iter *iter, bool flush)
> +					     struct tdp_iter *iter, bool flush,
> +					     bool shared)
>   {
>   	/* Ensure forward progress has been made before yielding. */
>   	if (iter->next_last_level_gfn == iter->yielded_gfn)
> @@ -644,7 +659,11 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
>   		if (flush)
>   			kvm_flush_remote_tlbs(kvm);
>   
> -		cond_resched_rwlock_write(&kvm->mmu_lock);
> +		if (shared)
> +			cond_resched_rwlock_read(&kvm->mmu_lock);
> +		else
> +			cond_resched_rwlock_write(&kvm->mmu_lock);
> +
>   		rcu_read_lock();
>   
>   		WARN_ON(iter->gfn > iter->next_last_level_gfn);
> @@ -662,23 +681,33 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
>    * non-root pages mapping GFNs strictly within that range. Returns true if
>    * SPTEs have been cleared and a TLB flush is needed before releasing the
>    * MMU lock.
> + *
>    * If can_yield is true, will release the MMU lock and reschedule if the
>    * scheduler needs the CPU or there is contention on the MMU lock. If this
>    * function cannot yield, it will not release the MMU lock or reschedule and
>    * the caller must ensure it does not supply too large a GFN range, or the
>    * operation can cause a soft lockup.
> + *
> + * If shared is true, this thread holds the MMU lock in read mode and must
> + * account for the possibility that other threads are modifying the paging
> + * structures concurrently. If shared is false, this thread should hold the
> + * MMU lock in write mode.
>    */
>   static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> -			  gfn_t start, gfn_t end, bool can_yield)
> +			  gfn_t start, gfn_t end, bool can_yield, bool shared)
>   {
>   	struct tdp_iter iter;
>   	bool flush_needed = false;
>   
> +	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
> +
>   	rcu_read_lock();
>   
>   	tdp_root_for_each_pte(iter, root, start, end) {
> +retry:
>   		if (can_yield &&
> -		    tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed)) {
> +		    tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed,
> +					      shared)) {
>   			flush_needed = false;
>   			continue;
>   		}
> @@ -696,8 +725,17 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>   		    !is_last_spte(iter.old_spte, iter.level))
>   			continue;
>   
> -		tdp_mmu_set_spte(kvm, &iter, 0);
> -		flush_needed = true;
> +		if (!shared) {
> +			tdp_mmu_set_spte(kvm, &iter, 0);
> +			flush_needed = true;
> +		} else if (!tdp_mmu_zap_spte_atomic(kvm, &iter)) {
> +			/*
> +			 * The iter must explicitly re-read the SPTE because
> +			 * the atomic cmpxchg failed.
> +			 */
> +			iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
> +			goto retry;
> +		}
>   	}
>   
>   	rcu_read_unlock();
> @@ -709,14 +747,20 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>    * non-root pages mapping GFNs strictly within that range. Returns true if
>    * SPTEs have been cleared and a TLB flush is needed before releasing the
>    * MMU lock.
> + *
> + * If shared is true, this thread holds the MMU lock in read mode and must
> + * account for the possibility that other threads are modifying the paging
> + * structures concurrently. If shared is false, this thread should hold the
> + * MMU in write mode.
>    */
> -bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end)
> +bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end,
> +			       bool shared)
>   {
>   	struct kvm_mmu_page *root;
>   	bool flush = false;
>   
> -	for_each_tdp_mmu_root_yield_safe(kvm, root)
> -		flush |= zap_gfn_range(kvm, root, start, end, true);
> +	for_each_tdp_mmu_root_yield_safe(kvm, root, shared)
> +		flush |= zap_gfn_range(kvm, root, start, end, true, shared);
>   
>   	return flush;
>   }
> @@ -726,7 +770,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
>   	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
>   	bool flush;
>   
> -	flush = kvm_tdp_mmu_zap_gfn_range(kvm, 0, max_gfn);
> +	flush = kvm_tdp_mmu_zap_gfn_range(kvm, 0, max_gfn, false);
>   	if (flush)
>   		kvm_flush_remote_tlbs(kvm);
>   }
> @@ -893,7 +937,7 @@ static __always_inline int kvm_tdp_mmu_handle_hva_range(struct kvm *kvm,
>   	int ret = 0;
>   	int as_id;
>   
> -	for_each_tdp_mmu_root_yield_safe(kvm, root) {
> +	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
>   		as_id = kvm_mmu_page_as_id(root);
>   		slots = __kvm_memslots(kvm, as_id);
>   		kvm_for_each_memslot(memslot, slots) {
> @@ -933,7 +977,7 @@ static int zap_gfn_range_hva_wrapper(struct kvm *kvm,
>   				     struct kvm_mmu_page *root, gfn_t start,
>   				     gfn_t end, unsigned long unused)
>   {
> -	return zap_gfn_range(kvm, root, start, end, false);
> +	return zap_gfn_range(kvm, root, start, end, false, false);
>   }
>   
>   int kvm_tdp_mmu_zap_hva_range(struct kvm *kvm, unsigned long start,
> @@ -1098,7 +1142,7 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>   
>   	for_each_tdp_pte_min_level(iter, root->spt, root->role.level,
>   				   min_level, start, end) {
> -		if (tdp_mmu_iter_cond_resched(kvm, &iter, false))
> +		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
>   			continue;
>   
>   		if (!is_shadow_present_pte(iter.old_spte) ||
> @@ -1128,7 +1172,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, struct kvm_memory_slot *slot,
>   	int root_as_id;
>   	bool spte_set = false;
>   
> -	for_each_tdp_mmu_root_yield_safe(kvm, root) {
> +	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
>   		root_as_id = kvm_mmu_page_as_id(root);
>   		if (root_as_id != slot->as_id)
>   			continue;
> @@ -1157,7 +1201,7 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>   	rcu_read_lock();
>   
>   	tdp_root_for_each_leaf_pte(iter, root, start, end) {
> -		if (tdp_mmu_iter_cond_resched(kvm, &iter, false))
> +		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
>   			continue;
>   
>   		if (spte_ad_need_write_protect(iter.old_spte)) {
> @@ -1193,7 +1237,7 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>   	int root_as_id;
>   	bool spte_set = false;
>   
> -	for_each_tdp_mmu_root_yield_safe(kvm, root) {
> +	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
>   		root_as_id = kvm_mmu_page_as_id(root);
>   		if (root_as_id != slot->as_id)
>   			continue;
> @@ -1291,7 +1335,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
>   	rcu_read_lock();
>   
>   	tdp_root_for_each_pte(iter, root, start, end) {
> -		if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set)) {
> +		if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set, false)) {
>   			spte_set = false;
>   			continue;
>   		}
> @@ -1326,7 +1370,7 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
>   	struct kvm_mmu_page *root;
>   	int root_as_id;
>   
> -	for_each_tdp_mmu_root_yield_safe(kvm, root) {
> +	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
>   		root_as_id = kvm_mmu_page_as_id(root);
>   		if (root_as_id != slot->as_id)
>   			continue;
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 9961df505067..855e58856815 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -13,9 +13,11 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm *kvm,
>   	return refcount_inc_not_zero(&root->tdp_mmu_root_count);
>   }
>   
> -void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root);
> +void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
> +			  bool shared);
>   
> -bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end);
> +bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end,
> +			       bool shared);
>   void kvm_tdp_mmu_zap_all(struct kvm *kvm);
>   
>   int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> 


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/13] KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU read lock
  2021-03-31 21:08 ` [PATCH 10/13] KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU " Ben Gardon
@ 2021-04-01 10:12   ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-04-01 10:12 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Peter Feiner,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 31/03/21 23:08, Ben Gardon wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index dcbfc784cf2f..81967b4e7d76 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5610,10 +5610,13 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>   
>   	write_lock(&kvm->mmu_lock);
>   	slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
> -
> -	if (is_tdp_mmu_enabled(kvm))
> -		kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot);
>   	write_unlock(&kvm->mmu_lock);
> +
> +	if (is_tdp_mmu_enabled(kvm)) {
> +		read_lock(&kvm->mmu_lock);
> +		kvm_tdp_mmu_zap_collapsible_sptes(kvm, memslot);
> +		read_unlock(&kvm->mmu_lock);
> +	}
>   }

Same here, this will conflict with

KVM: x86/mmu: Coalesce TLB flushes when zapping collapsible SPTEs

Again, you will have to add back some "if (flush)" before the write_unlock.
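
I.e. something like (untested; assumes kvm_tdp_mmu_zap_collapsible_sptes()
ends up returning a flush bool once that patch is applied):

	if (is_tdp_mmu_enabled(kvm)) {
		read_lock(&kvm->mmu_lock);
		flush = kvm_tdp_mmu_zap_collapsible_sptes(kvm, memslot);
		if (flush)
			kvm_flush_remote_tlbs(kvm);
		read_unlock(&kvm->mmu_lock);
	}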

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU
  2021-03-31 21:08 ` [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU Ben Gardon
  2021-03-31 22:27   ` Sean Christopherson
@ 2021-04-01 10:36   ` Paolo Bonzini
  2021-04-01 16:50     ` Ben Gardon
  1 sibling, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2021-04-01 10:36 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Peter Feiner,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 31/03/21 23:08, Ben Gardon wrote:
>   
> +	if (is_tdp_mmu_enabled(kvm))
> +		kvm_tdp_mmu_invalidate_roots(kvm);
> +
>   	/*
>   	 * Toggle mmu_valid_gen between '0' and '1'.  Because slots_lock is
>   	 * held for the entire duration of zapping obsolete pages, it's
> @@ -5451,9 +5454,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
>   
>   	kvm_zap_obsolete_pages(kvm);
>   
> -	if (is_tdp_mmu_enabled(kvm))
> -		kvm_tdp_mmu_zap_all(kvm);
> -

This is just cosmetic, but I'd prefer to keep the call to 
kvm_tdp_mmu_invalidate_roots at the original place, so that it's clear 
in the next patch that it's two separate parts because of the different 
locking requirements.

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/13] KVM: x86/mmu: Tear down roots in fast invalidation thread
  2021-03-31 21:08 ` [PATCH 13/13] KVM: x86/mmu: Tear down roots in fast invalidation thread Ben Gardon
  2021-03-31 22:29   ` Sean Christopherson
@ 2021-04-01 10:37   ` Paolo Bonzini
  1 sibling, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-04-01 10:37 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Peter Feiner,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 31/03/21 23:08, Ben Gardon wrote:
> +/*
> + * Since kvm_tdp_mmu_invalidate_roots has acquired a reference to each
> + * invalidated root, they will not be freed until this function drops the
> + * reference. Before dropping that reference, tear down the paging
> + * structure so that whichever thread does drop the last reference
> + * only has to do a trivial amount of work. Since the roots are invalid,
> + * no new SPTEs should be created under them.
> + */
> +void kvm_tdp_mmu_zap_all_fast(struct kvm *kvm)

Please rename this to kvm_tdp_mmu_zap_invalidated_roots.

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/13] KVM: x86/mmu: Tear down roots in fast invalidation thread
  2021-03-31 22:29   ` Sean Christopherson
@ 2021-04-01 10:41     ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-04-01 10:41 UTC (permalink / raw)
  To: Sean Christopherson, Ben Gardon
  Cc: linux-kernel, kvm, Peter Xu, Peter Shier, Peter Feiner,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 01/04/21 00:29, Sean Christopherson wrote:
>> +	if (is_tdp_mmu_enabled(kvm)) {
>> +		read_lock(&kvm->mmu_lock);
>> +		kvm_tdp_mmu_zap_all_fast(kvm);
> Purely because it exists first, I think we should follow the legacy MMU's
> terminology, i.e. kvm_tdp_mmu_zap_obsolete_pages().
> 

It's a bit different: obsolete pages in kvm_zap_obsolete_pages have an 
old mmu_valid_gen.  They are not invalid, so it is inaccurate to talk 
about obsolete pages in the context of the TDP MMU.  But I agree that 
the function should be renamed.

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/13] More parallel operations for the TDP MMU
  2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
                   ` (12 preceding siblings ...)
  2021-03-31 21:08 ` [PATCH 13/13] KVM: x86/mmu: Tear down roots in fast invalidation thread Ben Gardon
@ 2021-04-01 10:43 ` Paolo Bonzini
  13 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-04-01 10:43 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Peter Feiner,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 31/03/21 23:08, Ben Gardon wrote:
> Now that the TDP MMU is able to handle page faults in parallel, it's a
> relatively small change to expand to other operations. This series allows
> zapping a range of GFNs, reclaiming collapsible SPTEs (when disabling
> dirty logging), and enabling dirty logging to all happen under the MMU
> lock in read mode.
> 
> This is partly a cleanup + rewrite of the last few patches of the parallel
> page faults series. I've incorporated feedback from Sean and Paolo, but
> the patches have changed so much that I'm sending this as a separate
> series.
> 
> Ran kvm-unit-tests + selftests on an SMP kernel + Intel Skylake, with the
> TDP MMU enabled and disabled. This series introduces no new failures or
> warnings.
> 
> I know this will conflict horribly with the patches from Sean's series
> which were just queued, and I'll send a v2 to fix those conflicts +
> address any feedback on this v1.

Mostly looks good (the only substantial remark is from Sean's reply to 
patch 7); I've made a couple adjustments to the patches I had queued to 
ease the fixing of conflicts.  The patches should hit kvm/queue later 
today, right after I send my 5.11 pull request to Linus.

I have also just sent a completely unrelated remark on the existing page 
fault function, which was prompted by reviewing these patches.  If you 
agree, I can take care of the change or you can include it in v2, as you 
prefer (I don't expect conflicts).

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU
  2021-03-31 21:08 ` [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU Ben Gardon
  2021-04-01  9:37   ` Paolo Bonzini
@ 2021-04-01 13:16   ` kernel test robot
  2021-04-01 16:50     ` Ben Gardon
  1 sibling, 1 reply; 32+ messages in thread
From: kernel test robot @ 2021-04-01 13:16 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: kbuild-all, Paolo Bonzini, Peter Xu, Sean Christopherson,
	Peter Shier, Peter Feiner, Junaid Shahid, Jim Mattson,
	Yulei Zhang

[-- Attachment #1: Type: text/plain, Size: 7906 bytes --]

Hi Ben,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20210331]
[cannot apply to kvm/queue tip/master linux/master linus/master v5.12-rc5 v5.12-rc4 v5.12-rc3 v5.12-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ben-Gardon/More-parallel-operations-for-the-TDP-MMU/20210401-051137
base:    7a43c78d0573e0bbbb0456b033e2b9a895b89464
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/2b2c6d3bdc35269df5f9293a02da5b71c74095f3
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ben-Gardon/More-parallel-operations-for-the-TDP-MMU/20210401-051137
        git checkout 2b2c6d3bdc35269df5f9293a02da5b71c74095f3
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/rculist.h:11,
                    from include/linux/pid.h:5,
                    from include/linux/sched.h:14,
                    from include/linux/hardirq.h:9,
                    from include/linux/kvm_host.h:7,
                    from arch/x86/kvm/mmu.h:5,
                    from arch/x86/kvm/mmu/tdp_mmu.c:3:
   arch/x86/kvm/mmu/tdp_mmu.c: In function 'kvm_tdp_mmu_get_vcpu_root_hpa':
>> arch/x86/kvm/mmu/tdp_mmu.c:139:5: error: implicit declaration of function 'lockdep_is_held_write'; did you mean 'lockdep_is_held_type'? [-Werror=implicit-function-declaration]
     139 |     lockdep_is_held_write(&kvm->mmu_lock))
         |     ^~~~~~~~~~~~~~~~~~~~~
   include/linux/rcupdate.h:318:52: note: in definition of macro 'RCU_LOCKDEP_WARN'
     318 |   if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
         |                                                    ^
   include/linux/rculist.h:391:7: note: in expansion of macro '__list_check_rcu'
     391 |  for (__list_check_rcu(dummy, ## cond, 0),   \
         |       ^~~~~~~~~~~~~~~~
   arch/x86/kvm/mmu/tdp_mmu.c:138:2: note: in expansion of macro 'list_for_each_entry_rcu'
     138 |  list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link, \
         |  ^~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/kvm/mmu/tdp_mmu.c:184:2: note: in expansion of macro 'for_each_tdp_mmu_root'
     184 |  for_each_tdp_mmu_root(kvm, root) {
         |  ^~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +139 arch/x86/kvm/mmu/tdp_mmu.c

     2	
   > 3	#include "mmu.h"
     4	#include "mmu_internal.h"
     5	#include "mmutrace.h"
     6	#include "tdp_iter.h"
     7	#include "tdp_mmu.h"
     8	#include "spte.h"
     9	
    10	#include <asm/cmpxchg.h>
    11	#include <trace/events/kvm.h>
    12	
    13	static bool __read_mostly tdp_mmu_enabled = false;
    14	module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
    15	
    16	/* Initializes the TDP MMU for the VM, if enabled. */
    17	void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
    18	{
    19		if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled))
    20			return;
    21	
    22		/* This should not be changed for the lifetime of the VM. */
    23		kvm->arch.tdp_mmu_enabled = true;
    24	
    25		INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
    26		spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
    27		INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
    28	}
    29	
    30	void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
    31	{
    32		if (!kvm->arch.tdp_mmu_enabled)
    33			return;
    34	
    35		WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
    36	
    37		/*
    38		 * Ensure that all the outstanding RCU callbacks to free shadow pages
    39		 * can run before the VM is torn down.
    40		 */
    41		rcu_barrier();
    42	}
    43	
    44	static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
    45				  gfn_t start, gfn_t end, bool can_yield);
    46	
    47	static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
    48	{
    49		free_page((unsigned long)sp->spt);
    50		kmem_cache_free(mmu_page_header_cache, sp);
    51	}
    52	
    53	/*
    54	 * This is called through call_rcu in order to free TDP page table memory
    55	 * safely with respect to other kernel threads that may be operating on
    56	 * the memory.
    57	 * By only accessing TDP MMU page table memory in an RCU read critical
    58	 * section, and freeing it after a grace period, lockless access to that
    59	 * memory won't use it after it is freed.
    60	 */
    61	static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
    62	{
    63		struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
    64						       rcu_head);
    65	
    66		tdp_mmu_free_sp(sp);
    67	}
    68	
    69	void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
    70	{
    71		gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
    72	
    73		lockdep_assert_held_write(&kvm->mmu_lock);
    74	
    75		if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
    76			return;
    77	
    78		WARN_ON(!root->tdp_mmu_page);
    79	
    80		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
    81		list_del_rcu(&root->link);
    82		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
    83	
    84		zap_gfn_range(kvm, root, 0, max_gfn, false);
    85	
    86		call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
    87	}
    88	
    89	/*
    90	 * Finds the next valid root after root (or the first valid root if root
    91	 * is NULL), takes a reference on it, and returns that next root. If root
    92	 * is not NULL, this thread should have already taken a reference on it, and
    93	 * that reference will be dropped. If no valid root is found, this
    94	 * function will return NULL.
    95	 */
    96	static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
    97						      struct kvm_mmu_page *prev_root)
    98	{
    99		struct kvm_mmu_page *next_root;
   100	
   101		lockdep_assert_held_write(&kvm->mmu_lock);
   102	
   103		rcu_read_lock();
   104	
   105		if (prev_root)
   106			next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
   107							  &prev_root->link,
   108							  typeof(*prev_root), link);
   109		else
   110			next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots,
   111							   typeof(*next_root), link);
   112	
   113		while (next_root && !kvm_tdp_mmu_get_root(kvm, next_root))
   114			next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
   115					&next_root->link, typeof(*next_root), link);
   116	
   117		rcu_read_unlock();
   118	
   119		if (prev_root)
   120			kvm_tdp_mmu_put_root(kvm, prev_root);
   121	
   122		return next_root;
   123	}
   124	
   125	/*
   126	 * Note: this iterator gets and puts references to the roots it iterates over.
   127	 * This makes it safe to release the MMU lock and yield within the loop, but
   128	 * if exiting the loop early, the caller must drop the reference to the most
   129	 * recent root. (Unless keeping a live reference is desirable.)
   130	 */
   131	#define for_each_tdp_mmu_root_yield_safe(_kvm, _root)	\
   132		for (_root = tdp_mmu_next_root(_kvm, NULL);	\
   133		     _root;					\
   134		     _root = tdp_mmu_next_root(_kvm, _root))
   135	
   136	/* Only safe under the MMU lock in write mode, without yielding. */
   137	#define for_each_tdp_mmu_root(_kvm, _root)				\
   138		list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link,	\
 > 139					lockdep_is_held_write(&kvm->mmu_lock))
   140	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 65497 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU
  2021-04-01  9:37   ` Paolo Bonzini
@ 2021-04-01 16:48     ` Ben Gardon
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-04-01 16:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong

On Thu, Apr 1, 2021 at 2:37 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 31/03/21 23:08, Ben Gardon wrote:
> > Protect the contents of the TDP MMU roots list with RCU in preparation
> > for a future patch which will allow the iterator macro to be used under
> > the MMU lock in read mode.
> >
> > Signed-off-by: Ben Gardon<bgardon@google.com>
> > ---
> >   arch/x86/kvm/mmu/tdp_mmu.c | 64 +++++++++++++++++++++-----------------
> >   1 file changed, 36 insertions(+), 28 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > +     spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> > +     list_del_rcu(&root->link);
> > +     spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
>
>
> Please update the comment above tdp_mmu_pages_lock in
> arch/x86/include/asm/kvm_host.h as well.

Ah yes, thank you for catching that. Will do.

>
> >  /* Only safe under the MMU lock in write mode, without yielding. */
> >  #define for_each_tdp_mmu_root(_kvm, _root)                           \
> > -     list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)
> > +     list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link, \
> > +                             lockdep_is_held_write(&kvm->mmu_lock))
>
> This should also add "... ||
> lockdep_is_held(&kvm->arch.tdp_mmu_pages_lock)", if only for
> documentation purposes.

Good idea. I hope we never have a function try to protect its loop
over the roots with that lock, but it would be correct.

>
> Paolo
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU
  2021-04-01 13:16   ` kernel test robot
@ 2021-04-01 16:50     ` Ben Gardon
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-04-01 16:50 UTC (permalink / raw)
  To: kernel test robot
  Cc: LKML, kvm, kbuild-all, Paolo Bonzini, Peter Xu,
	Sean Christopherson, Peter Shier, Peter Feiner, Junaid Shahid,
	Jim Mattson, Yulei Zhang

On Thu, Apr 1, 2021 at 6:17 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Ben,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on next-20210331]
> [cannot apply to kvm/queue tip/master linux/master linus/master v5.12-rc5 v5.12-rc4 v5.12-rc3 v5.12-rc5]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/0day-ci/linux/commits/Ben-Gardon/More-parallel-operations-for-the-TDP-MMU/20210401-051137
> base:    7a43c78d0573e0bbbb0456b033e2b9a895b89464
> config: x86_64-allyesconfig (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> reproduce (this is a W=1 build):
>         # https://github.com/0day-ci/linux/commit/2b2c6d3bdc35269df5f9293a02da5b71c74095f3
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Ben-Gardon/More-parallel-operations-for-the-TDP-MMU/20210401-051137
>         git checkout 2b2c6d3bdc35269df5f9293a02da5b71c74095f3
>         # save the attached .config to linux build tree
>         make W=1 ARCH=x86_64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>    In file included from include/linux/rculist.h:11,
>                     from include/linux/pid.h:5,
>                     from include/linux/sched.h:14,
>                     from include/linux/hardirq.h:9,
>                     from include/linux/kvm_host.h:7,
>                     from arch/x86/kvm/mmu.h:5,
>                     from arch/x86/kvm/mmu/tdp_mmu.c:3:
>    arch/x86/kvm/mmu/tdp_mmu.c: In function 'kvm_tdp_mmu_get_vcpu_root_hpa':
> >> arch/x86/kvm/mmu/tdp_mmu.c:139:5: error: implicit declaration of function 'lockdep_is_held_write'; did you mean 'lockdep_is_held_type'? [-Werror=implicit-function-declaration]
>      139 |     lockdep_is_held_write(&kvm->mmu_lock))

Huh, I wonder why this isn't exported in some configuration. I'll fix
this in v2 as well.
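
If lockdep_is_held_write() really isn't available everywhere, one option
(untested) would be to spell it with lockdep_is_held_type(), which the
robot is pointing at and which, if I'm reading lockdep.h right, takes 0
for "held for write":

#define for_each_tdp_mmu_root(_kvm, _root)				\
	list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link,	\
				lockdep_is_held_type(&kvm->mmu_lock, 0))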

>          |     ^~~~~~~~~~~~~~~~~~~~~
>    include/linux/rcupdate.h:318:52: note: in definition of macro 'RCU_LOCKDEP_WARN'
>      318 |   if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
>          |                                                    ^
>    include/linux/rculist.h:391:7: note: in expansion of macro '__list_check_rcu'
>      391 |  for (__list_check_rcu(dummy, ## cond, 0),   \
>          |       ^~~~~~~~~~~~~~~~
>    arch/x86/kvm/mmu/tdp_mmu.c:138:2: note: in expansion of macro 'list_for_each_entry_rcu'
>      138 |  list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link, \
>          |  ^~~~~~~~~~~~~~~~~~~~~~~
>    arch/x86/kvm/mmu/tdp_mmu.c:184:2: note: in expansion of macro 'for_each_tdp_mmu_root'
>      184 |  for_each_tdp_mmu_root(kvm, root) {
>          |  ^~~~~~~~~~~~~~~~~~~~~
>    cc1: some warnings being treated as errors
>
>
> vim +139 arch/x86/kvm/mmu/tdp_mmu.c
>
>      2
>    > 3  #include "mmu.h"
>      4  #include "mmu_internal.h"
>      5  #include "mmutrace.h"
>      6  #include "tdp_iter.h"
>      7  #include "tdp_mmu.h"
>      8  #include "spte.h"
>      9
>     10  #include <asm/cmpxchg.h>
>     11  #include <trace/events/kvm.h>
>     12
>     13  static bool __read_mostly tdp_mmu_enabled = false;
>     14  module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
>     15
>     16  /* Initializes the TDP MMU for the VM, if enabled. */
>     17  void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
>     18  {
>     19          if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled))
>     20                  return;
>     21
>     22          /* This should not be changed for the lifetime of the VM. */
>     23          kvm->arch.tdp_mmu_enabled = true;
>     24
>     25          INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
>     26          spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
>     27          INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
>     28  }
>     29
>     30  void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>     31  {
>     32          if (!kvm->arch.tdp_mmu_enabled)
>     33                  return;
>     34
>     35          WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
>     36
>     37          /*
>     38           * Ensure that all the outstanding RCU callbacks to free shadow pages
>     39           * can run before the VM is torn down.
>     40           */
>     41          rcu_barrier();
>     42  }
>     43
>     44  static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>     45                            gfn_t start, gfn_t end, bool can_yield);
>     46
>     47  static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
>     48  {
>     49          free_page((unsigned long)sp->spt);
>     50          kmem_cache_free(mmu_page_header_cache, sp);
>     51  }
>     52
>     53  /*
>     54   * This is called through call_rcu in order to free TDP page table memory
>     55   * safely with respect to other kernel threads that may be operating on
>     56   * the memory.
>     57   * By only accessing TDP MMU page table memory in an RCU read critical
>     58   * section, and freeing it after a grace period, lockless access to that
>     59   * memory won't use it after it is freed.
>     60   */
>     61  static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
>     62  {
>     63          struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
>     64                                                 rcu_head);
>     65
>     66          tdp_mmu_free_sp(sp);
>     67  }
>     68
>     69  void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
>     70  {
>     71          gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
>     72
>     73          lockdep_assert_held_write(&kvm->mmu_lock);
>     74
>     75          if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
>     76                  return;
>     77
>     78          WARN_ON(!root->tdp_mmu_page);
>     79
>     80          spin_lock(&kvm->arch.tdp_mmu_pages_lock);
>     81          list_del_rcu(&root->link);
>     82          spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
>     83
>     84          zap_gfn_range(kvm, root, 0, max_gfn, false);
>     85
>     86          call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
>     87  }
>     88
>     89  /*
>     90   * Finds the next valid root after root (or the first valid root if root
>     91   * is NULL), takes a reference on it, and returns that next root. If root
>     92   * is not NULL, this thread should have already taken a reference on it, and
>     93   * that reference will be dropped. If no valid root is found, this
>     94   * function will return NULL.
>     95   */
>     96  static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
>     97                                                struct kvm_mmu_page *prev_root)
>     98  {
>     99          struct kvm_mmu_page *next_root;
>    100
>    101          lockdep_assert_held_write(&kvm->mmu_lock);
>    102
>    103          rcu_read_lock();
>    104
>    105          if (prev_root)
>    106                  next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
>    107                                                    &prev_root->link,
>    108                                                    typeof(*prev_root), link);
>    109          else
>    110                  next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots,
>    111                                                     typeof(*next_root), link);
>    112
>    113          while (next_root && !kvm_tdp_mmu_get_root(kvm, next_root))
>    114                  next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
>    115                                  &next_root->link, typeof(*next_root), link);
>    116
>    117          rcu_read_unlock();
>    118
>    119          if (prev_root)
>    120                  kvm_tdp_mmu_put_root(kvm, prev_root);
>    121
>    122          return next_root;
>    123  }
>    124
>    125  /*
>    126   * Note: this iterator gets and puts references to the roots it iterates over.
>    127   * This makes it safe to release the MMU lock and yield within the loop, but
>    128   * if exiting the loop early, the caller must drop the reference to the most
>    129   * recent root. (Unless keeping a live reference is desirable.)
>    130   */
>    131  #define for_each_tdp_mmu_root_yield_safe(_kvm, _root)   \
>    132          for (_root = tdp_mmu_next_root(_kvm, NULL);     \
>    133               _root;                                     \
>    134               _root = tdp_mmu_next_root(_kvm, _root))
>    135
>    136  /* Only safe under the MMU lock in write mode, without yielding. */
>    137  #define for_each_tdp_mmu_root(_kvm, _root)                              \
>    138          list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link, \
>  > 139                                  lockdep_is_held_write(&kvm->mmu_lock))
>    140
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU
  2021-04-01 10:36   ` Paolo Bonzini
@ 2021-04-01 16:50     ` Ben Gardon
  2021-04-01 17:02       ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Gardon @ 2021-04-01 16:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong

On Thu, Apr 1, 2021 at 3:36 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 31/03/21 23:08, Ben Gardon wrote:
> >
> > +     if (is_tdp_mmu_enabled(kvm))
> > +             kvm_tdp_mmu_invalidate_roots(kvm);
> > +
> >       /*
> >        * Toggle mmu_valid_gen between '0' and '1'.  Because slots_lock is
> >        * held for the entire duration of zapping obsolete pages, it's
> > @@ -5451,9 +5454,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> >
> >       kvm_zap_obsolete_pages(kvm);
> >
> > -     if (is_tdp_mmu_enabled(kvm))
> > -             kvm_tdp_mmu_zap_all(kvm);
> > -
>
> This is just cosmetic, but I'd prefer to keep the call to
> kvm_tdp_mmu_invalidate_roots at the original place, so that it's clear
> in the next patch that it's two separate parts because of the different
> locking requirements.

I'm not sure exactly what you mean, and I could certainly explain this
better in the commit description, but it's actually quite important
that kvm_tdp_mmu_invalidate_roots at least precede
kvm_zap_obsolete_pages, since kvm_zap_obsolete_pages drops the lock and
yields. If kvm_tdp_mmu_invalidate_roots doesn't go first, vCPUs could
wind up dropping their reference on an old root and then picking it up
again before the root's last reference had a chance to be dropped.

Explaining in the description that kvm_tdp_mmu_zap_all is being
dropped because it is no longer necessary (as opposed to being moved)
might help make that cleaner.

Alternatively I could just leave kvm_tdp_mmu_zap_all and replace it in
the next patch.
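
To make the ordering concrete, here's a rough sketch of what
kvm_mmu_zap_all_fast() looks like with this patch applied. Everything
other than the hunks quoted above is paraphrased and abbreviated, so
treat it as an illustration rather than the literal code:

static void kvm_mmu_zap_all_fast(struct kvm *kvm)
{
	write_lock(&kvm->mmu_lock);

	/*
	 * Mark TDP MMU roots invalid before anything that can yield, so
	 * a vCPU reloading its MMU can't re-take a reference on an old
	 * root after having dropped it.
	 */
	if (is_tdp_mmu_enabled(kvm))
		kvm_tdp_mmu_invalidate_roots(kvm);

	/* Toggle mmu_valid_gen and signal vCPUs to reload their roots. */
	kvm->arch.mmu_valid_gen = kvm->arch.mmu_valid_gen ? 0 : 1;
	kvm_reload_remote_mmus(kvm);

	/* May drop and re-take mmu_lock while zapping legacy MMU pages. */
	kvm_zap_obsolete_pages(kvm);

	/* No kvm_tdp_mmu_zap_all(): invalid roots tear down via refcount. */
	write_unlock(&kvm->mmu_lock);
}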

>
> Paolo
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU
  2021-03-31 22:27   ` Sean Christopherson
@ 2021-04-01 16:50     ` Ben Gardon
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-04-01 16:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Peter Xu, Peter Shier, Peter Feiner,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On Wed, Mar 31, 2021 at 3:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Mar 31, 2021, Ben Gardon wrote:
> > Provide a real mechanism for fast invalidation by marking roots as
> > invalid so that their reference count will quickly fall to zero
> > and they will be torn down.
> >
> > One negative side effect of this approach is that a vCPU thread will
> > likely drop the last reference to a root and be saddled with the work of
> > tearing down an entire paging structure. This issue will be resolved in
> > a later commit.
> >
> > Signed-off-by: Ben Gardon <bgardon@google.com>
> > ---
>
> ...
>
> > +/*
> > + * This function depends on running in the same MMU lock critical section as
> > + * kvm_reload_remote_mmus. Since this is in the same critical section, no new
> > + * roots will be created between this function and the MMU reload signals
> > + * being sent.
>
> Eww.  That goes beyond just adding a lockdep assertion here.  I know you want to
> isolate the TDP MMU as much as possible, but this really feels like it should be
> open coded in kvm_mmu_zap_all_fast().  And assuming this lands after as_id is
> added to for_each_tdp_mmu_root(), it's probably easier to open code anyways, e.g.
> use list_for_each_entry() directly instead of bouncing through an iterator.

Yeah, that's fair. I'll open-code it here. I agree that it will remove
confusion from the function, though it would be nice to be able to use
for_each_tdp_mmu_root for the lockdep and rcu annotations.
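
The open-coded version would presumably look something like this
(an untested sketch of your suggestion), at the cost of losing the
annotations that for_each_tdp_mmu_root would have provided:

	struct kvm_mmu_page *root;

	/*
	 * Walking tdp_mmu_roots directly is safe here because
	 * kvm_mmu_zap_all_fast() holds mmu_lock for write, so no roots
	 * can be added or removed underneath us.
	 */
	if (is_tdp_mmu_enabled(kvm)) {
		list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link)
			root->role.invalid = true;
	}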


>
> > + */
> > +void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm)
> > +{
> > +     struct kvm_mmu_page *root;
> > +
> > +     for_each_tdp_mmu_root(kvm, root)
> > +             root->role.invalid = true;
> > +}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/13] KVM: x86/mmu: Make TDP MMU root refcount atomic
  2021-03-31 22:21   ` Sean Christopherson
@ 2021-04-01 16:50     ` Ben Gardon
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-04-01 16:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Peter Xu, Peter Shier, Peter Feiner,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On Wed, Mar 31, 2021 at 3:22 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Mar 31, 2021, Ben Gardon wrote:
> > In order to parallelize more operations for the TDP MMU, make the
> > refcount on TDP MMU roots atomic, so that a future patch can allow
> > multiple threads to take a reference on the root concurrently, while
> > holding the MMU lock in read mode.
> >
> > Signed-off-by: Ben Gardon <bgardon@google.com>
> > ---
>
> ...
>
> > @@ -88,10 +88,12 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
> >               next_root = list_first_entry(&kvm->arch.tdp_mmu_roots,
> >                                            typeof(*next_root), link);
> >
> > +     while (!list_entry_is_head(next_root, &kvm->arch.tdp_mmu_roots, link) &&
> > +            !kvm_tdp_mmu_get_root(kvm, next_root))
> > +             next_root = list_next_entry(next_root, link);
> > +
> >       if (list_entry_is_head(next_root, &kvm->arch.tdp_mmu_roots, link))
> >               next_root = NULL;
> > -     else
> > -             kvm_tdp_mmu_get_root(kvm, next_root);
> >
> >       if (prev_root)
> >               kvm_tdp_mmu_put_root(kvm, prev_root);
> > @@ -158,14 +160,13 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
> >
> >       /* Check for an existing root before allocating a new one. */
> >       for_each_tdp_mmu_root(kvm, root) {
> > -             if (root->role.word == role.word) {
> > -                     kvm_tdp_mmu_get_root(kvm, root);
> > +             if (root->role.word == role.word &&
> > +                 kvm_tdp_mmu_get_root(kvm, root))
>
> I'm not opposed to changing this logic while making the refcount atomic, but it
> needs to be explained in the changelog.  As is, the changelog makes it sound
> like the patch is a pure refactoring of the type.

Thanks for pointing that out. I'll add a note to the description in
v2. Those felt like natural changes, since introducing the atomic
refcount requires additional failure handling. I don't think there's
any way to split that logic change into a separate commit without just
introducing dead code, though doing so would certainly be preferable.
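
Concretely, the failure handling in question is the try-get pattern,
roughly as below. This is condensed from the patch, so take it as a
sketch rather than the exact v2 code:

__must_check static inline bool kvm_tdp_mmu_get_root(struct kvm *kvm,
						     struct kvm_mmu_page *root)
{
	/* Fails, returning false, once the refcount has hit zero. */
	return refcount_inc_not_zero(&root->tdp_mmu_root_count);
}

and every caller that used to unconditionally take a reference now has
to skip roots it failed to grab, e.g. in kvm_tdp_mmu_get_vcpu_root_hpa:

	for_each_tdp_mmu_root(kvm, root) {
		if (root->role.word == role.word &&
		    kvm_tdp_mmu_get_root(kvm, root))
			goto out;
	}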

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/13] KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock
  2021-04-01  9:58   ` Paolo Bonzini
@ 2021-04-01 16:50     ` Ben Gardon
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Gardon @ 2021-04-01 16:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong

On Thu, Apr 1, 2021 at 2:58 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 31/03/21 23:08, Ben Gardon wrote:
> > To reduce lock contention and interference with page fault handlers,
> > allow the TDP MMU function to zap a GFN range to operate under the MMU
> > read lock.
> >
> > Signed-off-by: Ben Gardon <bgardon@google.com>
> > ---
> >   arch/x86/kvm/mmu/mmu.c     |  15 ++++--
> >   arch/x86/kvm/mmu/tdp_mmu.c | 102 ++++++++++++++++++++++++++-----------
> >   arch/x86/kvm/mmu/tdp_mmu.h |   6 ++-
> >   3 files changed, 87 insertions(+), 36 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 667d64daa82c..dcbfc784cf2f 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3155,7 +3155,7 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
> >       sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
> >
> >       if (is_tdp_mmu_page(sp))
> > -             kvm_tdp_mmu_put_root(kvm, sp);
> > +             kvm_tdp_mmu_put_root(kvm, sp, false);
> >       else if (!--sp->root_count && sp->role.invalid)
> >               kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
> >
> > @@ -5514,13 +5514,17 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> >               }
> >       }
> >
> > +     write_unlock(&kvm->mmu_lock);
> > +
> >       if (is_tdp_mmu_enabled(kvm)) {
> > -             flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end);
> > +             read_lock(&kvm->mmu_lock);
> > +             flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end,
> > +                                               true);
> >               if (flush)
> >                       kvm_flush_remote_tlbs(kvm);
> > -     }
> >
> > -     write_unlock(&kvm->mmu_lock);
> > +             read_unlock(&kvm->mmu_lock);
> > +     }
> >   }
>
> This will conflict with Sean's MMU notifier series patches:
>
> KVM: x86/mmu: Pass address space ID to __kvm_tdp_mmu_zap_gfn_range()
>
> What I can do for now is change the mmu.c part of that patch to
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e6e02360ef67..9882bbd9b742 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5510,15 +5510,15 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>                 }
>         }
>
> -       if (flush)
> -               kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
> -
>         if (is_tdp_mmu_enabled(kvm)) {
> -               flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end);
> -               if (flush)
> -                       kvm_flush_remote_tlbs(kvm);
> +               for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
> +                       flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
> +                                                         gfn_end, flush);
>         }
>
> +       if (flush)
> +               kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
> +
>         write_unlock(&kvm->mmu_lock);
>   }
>
>
> but you will have to add a separate "if (flush)" when moving the write_unlock
> earlier, since there's no downgrade function for rwlocks.  In practice it's
> not a huge deal since unless running nested there will be only one active MMU.
>
> Paolo

Thank you for doing that. I also figured that the extra flushes when
running nested would probably be worth it to get the parallelism
gains. I don't mind working out those conflicts in v2.
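
My working assumption for the merged result, with your as_id loop plus
the separate flush, is roughly the following. The signatures and the
KVM_ADDRESS_SPACE_NUM loop are taken from the snippets above; treat
this as a sketch of the conflict resolution, not the actual v2 code:

	if (flush)
		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);

	write_unlock(&kvm->mmu_lock);

	if (is_tdp_mmu_enabled(kvm)) {
		flush = false;

		read_lock(&kvm->mmu_lock);
		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
			flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
							  gfn_end, flush);

		/* Separate flush check: the write lock is already gone. */
		if (flush)
			kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
							   gfn_end);
		read_unlock(&kvm->mmu_lock);
	}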


>
> >   static bool slot_rmap_write_protect(struct kvm *kvm,
> > @@ -5959,7 +5963,8 @@ static void kvm_recover_nx_lpages(struct kvm *kvm)
> >               WARN_ON_ONCE(!sp->lpage_disallowed);
> >               if (is_tdp_mmu_page(sp)) {
> >                       kvm_tdp_mmu_zap_gfn_range(kvm, sp->gfn,
> > -                             sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level));
> > +                             sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level),
> > +                             false);
> >               } else {
> >                       kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> >                       WARN_ON_ONCE(sp->lpage_disallowed);
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index d255125059c4..0e99e4675dd4 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -27,6 +27,15 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
> >       INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
> >   }
> >
> > +static __always_inline void kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
> > +                                                          bool shared)
> > +{
> > +     if (shared)
> > +             lockdep_assert_held_read(&kvm->mmu_lock);
> > +     else
> > +             lockdep_assert_held_write(&kvm->mmu_lock);
> > +}
> > +
> >   void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
> >   {
> >       if (!kvm->arch.tdp_mmu_enabled)
> > @@ -42,7 +51,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
> >   }
> >
> >   static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> > -                       gfn_t start, gfn_t end, bool can_yield);
> > +                       gfn_t start, gfn_t end, bool can_yield, bool shared);
> >
> >   static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> >   {
> > @@ -66,11 +75,12 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
> >       tdp_mmu_free_sp(sp);
> >   }
> >
> > -void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
> > +void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
> > +                       bool shared)
> >   {
> >       gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
> >
> > -     lockdep_assert_held_write(&kvm->mmu_lock);
> > +     kvm_lockdep_assert_mmu_lock_held(kvm, shared);
> >
> >       if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
> >               return;
> > @@ -81,7 +91,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
> >       list_del_rcu(&root->link);
> >       spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> >
> > -     zap_gfn_range(kvm, root, 0, max_gfn, false);
> > +     zap_gfn_range(kvm, root, 0, max_gfn, false, shared);
> >
> >       call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
> >   }
> > @@ -94,11 +104,11 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
> >    * function will return NULL.
> >    */
> >   static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
> > -                                           struct kvm_mmu_page *prev_root)
> > +                                           struct kvm_mmu_page *prev_root,
> > +                                           bool shared)
> >   {
> >       struct kvm_mmu_page *next_root;
> >
> > -     lockdep_assert_held_write(&kvm->mmu_lock);
> >
> >       rcu_read_lock();
> >
> > @@ -117,7 +127,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
> >       rcu_read_unlock();
> >
> >       if (prev_root)
> > -             kvm_tdp_mmu_put_root(kvm, prev_root);
> > +             kvm_tdp_mmu_put_root(kvm, prev_root, shared);
> >
> >       return next_root;
> >   }
> > @@ -127,11 +137,15 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
> >    * This makes it safe to release the MMU lock and yield within the loop, but
> >    * if exiting the loop early, the caller must drop the reference to the most
> >    * recent root. (Unless keeping a live reference is desirable.)
> > + *
> > + * If shared is set, this function is operating under the MMU lock in read
> > + * mode. In the unlikely event that this thread must free a root, the lock
> > + * will be temporarily dropped and reacquired in write mode.
> >    */
> > -#define for_each_tdp_mmu_root_yield_safe(_kvm, _root)        \
> > -     for (_root = tdp_mmu_next_root(_kvm, NULL);     \
> > -          _root;                                     \
> > -          _root = tdp_mmu_next_root(_kvm, _root))
> > +#define for_each_tdp_mmu_root_yield_safe(_kvm, _root, _shared)       \
> > +     for (_root = tdp_mmu_next_root(_kvm, NULL, _shared);    \
> > +          _root;                                             \
> > +          _root = tdp_mmu_next_root(_kvm, _root, _shared))
> >
> >   /* Only safe under the MMU lock in write mode, without yielding. */
> >   #define for_each_tdp_mmu_root(_kvm, _root)                          \
> > @@ -632,7 +646,8 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
> >    * Return false if a yield was not needed.
> >    */
> >   static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
> > -                                          struct tdp_iter *iter, bool flush)
> > +                                          struct tdp_iter *iter, bool flush,
> > +                                          bool shared)
> >   {
> >       /* Ensure forward progress has been made before yielding. */
> >       if (iter->next_last_level_gfn == iter->yielded_gfn)
> > @@ -644,7 +659,11 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
> >               if (flush)
> >                       kvm_flush_remote_tlbs(kvm);
> >
> > -             cond_resched_rwlock_write(&kvm->mmu_lock);
> > +             if (shared)
> > +                     cond_resched_rwlock_read(&kvm->mmu_lock);
> > +             else
> > +                     cond_resched_rwlock_write(&kvm->mmu_lock);
> > +
> >               rcu_read_lock();
> >
> >               WARN_ON(iter->gfn > iter->next_last_level_gfn);
> > @@ -662,23 +681,33 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
> >    * non-root pages mapping GFNs strictly within that range. Returns true if
> >    * SPTEs have been cleared and a TLB flush is needed before releasing the
> >    * MMU lock.
> > + *
> >    * If can_yield is true, will release the MMU lock and reschedule if the
> >    * scheduler needs the CPU or there is contention on the MMU lock. If this
> >    * function cannot yield, it will not release the MMU lock or reschedule and
> >    * the caller must ensure it does not supply too large a GFN range, or the
> >    * operation can cause a soft lockup.
> > + *
> > + * If shared is true, this thread holds the MMU lock in read mode and must
> > + * account for the possibility that other threads are modifying the paging
> > + * structures concurrently. If shared is false, this thread should hold the
> > + * MMU lock in write mode.
> >    */
> >   static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> > -                       gfn_t start, gfn_t end, bool can_yield)
> > +                       gfn_t start, gfn_t end, bool can_yield, bool shared)
> >   {
> >       struct tdp_iter iter;
> >       bool flush_needed = false;
> >
> > +     kvm_lockdep_assert_mmu_lock_held(kvm, shared);
> > +
> >       rcu_read_lock();
> >
> >       tdp_root_for_each_pte(iter, root, start, end) {
> > +retry:
> >               if (can_yield &&
> > -                 tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed)) {
> > +                 tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed,
> > +                                           shared)) {
> >                       flush_needed = false;
> >                       continue;
> >               }
> > @@ -696,8 +725,17 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> >                   !is_last_spte(iter.old_spte, iter.level))
> >                       continue;
> >
> > -             tdp_mmu_set_spte(kvm, &iter, 0);
> > -             flush_needed = true;
> > +             if (!shared) {
> > +                     tdp_mmu_set_spte(kvm, &iter, 0);
> > +                     flush_needed = true;
> > +             } else if (!tdp_mmu_zap_spte_atomic(kvm, &iter)) {
> > +                     /*
> > +                      * The iter must explicitly re-read the SPTE because
> > +                      * the atomic cmpxchg failed.
> > +                      */
> > +                     iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
> > +                     goto retry;
> > +             }
> >       }
> >
> >       rcu_read_unlock();
> > @@ -709,14 +747,20 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> >    * non-root pages mapping GFNs strictly within that range. Returns true if
> >    * SPTEs have been cleared and a TLB flush is needed before releasing the
> >    * MMU lock.
> > + *
> > + * If shared is true, this thread holds the MMU lock in read mode and must
> > + * account for the possibility that other threads are modifying the paging
> > + * structures concurrently. If shared is false, this thread should hold the
> > + * MMU lock in write mode.
> >    */
> > -bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end)
> > +bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end,
> > +                            bool shared)
> >   {
> >       struct kvm_mmu_page *root;
> >       bool flush = false;
> >
> > -     for_each_tdp_mmu_root_yield_safe(kvm, root)
> > -             flush |= zap_gfn_range(kvm, root, start, end, true);
> > +     for_each_tdp_mmu_root_yield_safe(kvm, root, shared)
> > +             flush |= zap_gfn_range(kvm, root, start, end, true, shared);
> >
> >       return flush;
> >   }
> > @@ -726,7 +770,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
> >       gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
> >       bool flush;
> >
> > -     flush = kvm_tdp_mmu_zap_gfn_range(kvm, 0, max_gfn);
> > +     flush = kvm_tdp_mmu_zap_gfn_range(kvm, 0, max_gfn, false);
> >       if (flush)
> >               kvm_flush_remote_tlbs(kvm);
> >   }
> > @@ -893,7 +937,7 @@ static __always_inline int kvm_tdp_mmu_handle_hva_range(struct kvm *kvm,
> >       int ret = 0;
> >       int as_id;
> >
> > -     for_each_tdp_mmu_root_yield_safe(kvm, root) {
> > +     for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
> >               as_id = kvm_mmu_page_as_id(root);
> >               slots = __kvm_memslots(kvm, as_id);
> >               kvm_for_each_memslot(memslot, slots) {
> > @@ -933,7 +977,7 @@ static int zap_gfn_range_hva_wrapper(struct kvm *kvm,
> >                                    struct kvm_mmu_page *root, gfn_t start,
> >                                    gfn_t end, unsigned long unused)
> >   {
> > -     return zap_gfn_range(kvm, root, start, end, false);
> > +     return zap_gfn_range(kvm, root, start, end, false, false);
> >   }
> >
> >   int kvm_tdp_mmu_zap_hva_range(struct kvm *kvm, unsigned long start,
> > @@ -1098,7 +1142,7 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> >
> >       for_each_tdp_pte_min_level(iter, root->spt, root->role.level,
> >                                  min_level, start, end) {
> > -             if (tdp_mmu_iter_cond_resched(kvm, &iter, false))
> > +             if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
> >                       continue;
> >
> >               if (!is_shadow_present_pte(iter.old_spte) ||
> > @@ -1128,7 +1172,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, struct kvm_memory_slot *slot,
> >       int root_as_id;
> >       bool spte_set = false;
> >
> > -     for_each_tdp_mmu_root_yield_safe(kvm, root) {
> > +     for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
> >               root_as_id = kvm_mmu_page_as_id(root);
> >               if (root_as_id != slot->as_id)
> >                       continue;
> > @@ -1157,7 +1201,7 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> >       rcu_read_lock();
> >
> >       tdp_root_for_each_leaf_pte(iter, root, start, end) {
> > -             if (tdp_mmu_iter_cond_resched(kvm, &iter, false))
> > +             if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
> >                       continue;
> >
> >               if (spte_ad_need_write_protect(iter.old_spte)) {
> > @@ -1193,7 +1237,7 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
> >       int root_as_id;
> >       bool spte_set = false;
> >
> > -     for_each_tdp_mmu_root_yield_safe(kvm, root) {
> > +     for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
> >               root_as_id = kvm_mmu_page_as_id(root);
> >               if (root_as_id != slot->as_id)
> >                       continue;
> > @@ -1291,7 +1335,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
> >       rcu_read_lock();
> >
> >       tdp_root_for_each_pte(iter, root, start, end) {
> > -             if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set)) {
> > +             if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set, false)) {
> >                       spte_set = false;
> >                       continue;
> >               }
> > @@ -1326,7 +1370,7 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
> >       struct kvm_mmu_page *root;
> >       int root_as_id;
> >
> > -     for_each_tdp_mmu_root_yield_safe(kvm, root) {
> > +     for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
> >               root_as_id = kvm_mmu_page_as_id(root);
> >               if (root_as_id != slot->as_id)
> >                       continue;
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> > index 9961df505067..855e58856815 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.h
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> > @@ -13,9 +13,11 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm *kvm,
> >       return refcount_inc_not_zero(&root->tdp_mmu_root_count);
> >   }
> >
> > -void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root);
> > +void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
> > +                       bool shared);
> >
> > -bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end);
> > +bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end,
> > +                            bool shared);
> >   void kvm_tdp_mmu_zap_all(struct kvm *kvm);
> >
> >   int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> >
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU
  2021-04-01 16:50     ` Ben Gardon
@ 2021-04-01 17:02       ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-04-01 17:02 UTC (permalink / raw)
  To: Ben Gardon
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang,
	Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong

On 01/04/21 18:50, Ben Gardon wrote:
>> This is just cosmetic, but I'd prefer to keep the call to
>> kvm_tdp_mmu_invalidate_roots at the original place, so that it's clear
>> in the next patch that it's two separate parts because of the different
>> locking requirements.
> I'm not sure exactly what you mean and I could certainly do a better
> job explaining in the commit description, but it's actually quite
> important that kvm_tdp_mmu_invalidate_roots at least precede
> kvm_zap_obsolete_pages as kvm_zap_obsolete_pages drops the lock and
> yields. If kvm_tdp_mmu_invalidate_roots doesn't go first then vCPUs
> could wind up dropping their ref on an old root and then picking it up
> again before the last root had a chance to drop its ref.
> Explaining in the description that kvm_tdp_mmu_zap_all is being
> dropped because it is no longer necessary (as opposed to being moved)
> might help make that cleaner.

No, what would help is the remark you just made about
kvm_zap_obsolete_pages yielding.  But that doesn't matter after 13/13,
does it?  Perhaps it's easier to just combine them.

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-04-01 18:33 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-31 21:08 [PATCH 00/13] More parallel operations for the TDP MMU Ben Gardon
2021-03-31 21:08 ` [PATCH 01/13] KVM: x86/mmu: Re-add const qualifier in kvm_tdp_mmu_zap_collapsible_sptes Ben Gardon
2021-03-31 21:08 ` [PATCH 02/13] KVM: x86/mmu: Move kvm_mmu_(get|put)_root to TDP MMU Ben Gardon
2021-03-31 21:08 ` [PATCH 03/13] KVM: x86/mmu: use tdp_mmu_free_sp to free roots Ben Gardon
2021-03-31 21:08 ` [PATCH 04/13] KVM: x86/mmu: Merge TDP MMU put and free root Ben Gardon
2021-03-31 21:08 ` [PATCH 05/13] KVM: x86/mmu: comment for_each_tdp_mmu_root requires MMU write lock Ben Gardon
2021-03-31 21:08 ` [PATCH 06/13] KVM: x86/mmu: Refactor yield safe root iterator Ben Gardon
2021-03-31 21:08 ` [PATCH 07/13] KVM: x86/mmu: Make TDP MMU root refcount atomic Ben Gardon
2021-03-31 22:21   ` Sean Christopherson
2021-04-01 16:50     ` Ben Gardon
2021-03-31 21:08 ` [PATCH 08/13] KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU Ben Gardon
2021-04-01  9:37   ` Paolo Bonzini
2021-04-01 16:48     ` Ben Gardon
2021-04-01 13:16   ` kernel test robot
2021-04-01 16:50     ` Ben Gardon
2021-03-31 21:08 ` [PATCH 09/13] KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock Ben Gardon
2021-04-01  9:58   ` Paolo Bonzini
2021-04-01 16:50     ` Ben Gardon
2021-03-31 21:08 ` [PATCH 10/13] KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU " Ben Gardon
2021-04-01 10:12   ` Paolo Bonzini
2021-03-31 21:08 ` [PATCH 11/13] KVM: x86/mmu: Allow enabling / disabling dirty logging under " Ben Gardon
2021-03-31 21:08 ` [PATCH 12/13] KVM: x86/mmu: Fast invalidation for TDP MMU Ben Gardon
2021-03-31 22:27   ` Sean Christopherson
2021-04-01 16:50     ` Ben Gardon
2021-04-01 10:36   ` Paolo Bonzini
2021-04-01 16:50     ` Ben Gardon
2021-04-01 17:02       ` Paolo Bonzini
2021-03-31 21:08 ` [PATCH 13/13] KVM: x86/mmu: Tear down roots in fast invalidation thread Ben Gardon
2021-03-31 22:29   ` Sean Christopherson
2021-04-01 10:41     ` Paolo Bonzini
2021-04-01 10:37   ` Paolo Bonzini
2021-04-01 10:43 ` [PATCH 00/13] More parallel operations for the TDP MMU Paolo Bonzini
