linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] Huge page related cleanups
@ 2022-07-15 23:21 Sean Christopherson
  2022-07-15 23:21 ` [PATCH 1/4] KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs Sean Christopherson
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-07-15 23:21 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Mingwei Zhang

Simplify (hopefully it's simpler) zapping collapsible SPTEs by first
simplifying how the host mapping level is retrieved.  KVM currently requires
memory be backed by a refcounted struct page in order to be mapped as
a huge page.  This requires KVM to acquire the pfn that corresponds to
the gfn/hva before checking whether or not the gfn/hva can be mapped
huge.

Dropping that requirement allows the "zap collapsible" path to detect
that a shadow page can be zapped without having to first bottom out on
leaf entries.  This could theoretically be a minor performance win,
e.g. KVM then doesn't need to walk all not-present leaf SPTEs just to
find out that a shadow page has no children.  In basic testing I didn't
see any meaningful difference (the actual zapping dominates).

There are also potential use cases for allowing any mapping to be huge,
e.g. GPU buffers (IIUC).  Dropping the struct page requirement makes
KVM play nice with those.

This is most definitely post-5.20 material.

Sean Christopherson (4):
  KVM: x86/mmu: Don't require refcounted "struct page" to create huge
    SPTEs
  KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level()
  KVM: x86/mmu: Don't bottom out on leafs when zapping collapsible SPTEs
  KVM: selftests: Add an option to run vCPUs while disabling dirty
    logging

 arch/x86/kvm/mmu/mmu.c                        | 65 ++++++++++++-------
 arch/x86/kvm/mmu/mmu_internal.h               |  2 +-
 arch/x86/kvm/mmu/tdp_iter.c                   |  9 ---
 arch/x86/kvm/mmu/tdp_iter.h                   |  1 -
 arch/x86/kvm/mmu/tdp_mmu.c                    | 61 ++++++++---------
 .../selftests/kvm/dirty_log_perf_test.c       | 30 ++++++++-
 6 files changed, 94 insertions(+), 74 deletions(-)


base-commit: 8031d87aa9953ddeb047a5356ebd0b240c30f233
-- 
2.37.0.170.g444d1eabd0-goog


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs
  2022-07-15 23:21 [PATCH 0/4] Huge page related cleanups Sean Christopherson
@ 2022-07-15 23:21 ` Sean Christopherson
  2022-07-16  5:53   ` Mingwei Zhang
  2022-07-15 23:21 ` [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level() Sean Christopherson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-07-15 23:21 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Mingwei Zhang

Drop the requirement that a pfn be backed by a refcounted, compound or
ZONE_DEVICE, struct page, and instead rely solely on the host page
tables to identify huge pages.  The PageCompound() check is a remnant of
an old implementation that identified (well, attempted to identify) huge
pages without walking the host page tables.  The ZONE_DEVICE check was
added as an exception to the PageCompound() requirement.  In other words,
neither check is actually a hard requirement: if the primary MMU has the
pfn backed with a huge page, then KVM can back the pfn with a huge page
regardless of the backing store.

Dropping the @pfn parameter will also allow KVM to query the max host
mapping level without having to first get the pfn, which is advantageous
for use outside of the page fault path where KVM wants to take action if
and only if a page can be mapped huge, i.e. avoids the pfn lookup for
gfns that can't be backed with a huge page.
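
With the above checks gone, the remainder of host_pfn_mapping_level()
(not visible in this hunk) determines the level purely from the host
page tables.  As a rough sketch of that existing walk, trimmed of
comments (see the function itself for the authoritative version):

  hva = __gfn_to_hva_memslot(slot, gfn);

  /* IRQs disabled to prevent concurrent teardown of host page tables. */
  local_irq_save(flags);

  pgd = READ_ONCE(*pgd_offset(kvm->mm, hva));
  if (pgd_none(pgd))
          goto out;

  p4d = READ_ONCE(*p4d_offset(&pgd, hva));
  if (p4d_none(p4d) || !p4d_present(p4d))
          goto out;

  pud = READ_ONCE(*pud_offset(&p4d, hva));
  if (pud_none(pud) || !pud_present(pud))
          goto out;

  if (pud_large(pud)) {
          level = PG_LEVEL_1G;
          goto out;
  }

  pmd = READ_ONCE(*pmd_offset(&pud, hva));
  if (pmd_none(pmd) || !pmd_present(pmd))
          goto out;

  if (pmd_large(pmd))
          level = PG_LEVEL_2M;

  out:
  local_irq_restore(flags);
  return level;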

Cc: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 23 +++++------------------
 arch/x86/kvm/mmu/mmu_internal.h |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |  8 +-------
 3 files changed, 7 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 52664c3caaab..bebff1d5acd4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2919,11 +2919,10 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
+static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 				  const struct kvm_memory_slot *slot)
 {
 	int level = PG_LEVEL_4K;
-	struct page *page;
 	unsigned long hva;
 	unsigned long flags;
 	pgd_t pgd;
@@ -2931,17 +2930,6 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 	pud_t pud;
 	pmd_t pmd;
 
-	/*
-	 * Note, @slot must be non-NULL, i.e. the caller is responsible for
-	 * ensuring @pfn isn't garbage and is backed by a memslot.
-	 */
-	page = kvm_pfn_to_refcounted_page(pfn);
-	if (!page)
-		return PG_LEVEL_4K;
-
-	if (!PageCompound(page) && !kvm_is_zone_device_page(page))
-		return PG_LEVEL_4K;
-
 	/*
 	 * Note, using the already-retrieved memslot and __gfn_to_hva_memslot()
 	 * is not solely for performance, it's also necessary to avoid the
@@ -2994,7 +2982,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
-			      kvm_pfn_t pfn, int max_level)
+			      int max_level)
 {
 	struct kvm_lpage_info *linfo;
 	int host_level;
@@ -3009,7 +2997,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
-	host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
+	host_level = host_pfn_mapping_level(kvm, gfn, slot);
 	return min(host_level, max_level);
 }
 
@@ -3034,8 +3022,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * level, which will be used to do precise, accurate accounting.
 	 */
 	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						     fault->gfn, fault->pfn,
-						     fault->max_level);
+						     fault->gfn, fault->max_level);
 	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
 		return;
 
@@ -6406,7 +6393,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		 */
 		if (sp->role.direct &&
 		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
-							       pfn, PG_LEVEL_NUM)) {
+							       PG_LEVEL_NUM)) {
 			pte_list_remove(kvm, rmap_head, sptep);
 
 			if (kvm_available_flush_tlb_with_range())
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ae2d660e2dab..582def531d4d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -309,7 +309,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
-			      kvm_pfn_t pfn, int max_level);
+			      int max_level);
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f3a430d64975..d75d93edc40a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1733,7 +1733,6 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 	gfn_t end = start + slot->npages;
 	struct tdp_iter iter;
 	int max_mapping_level;
-	kvm_pfn_t pfn;
 
 	rcu_read_lock();
 
@@ -1745,13 +1744,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;
 
-		/*
-		 * This is a leaf SPTE. Check if the PFN it maps can
-		 * be mapped at a higher level.
-		 */
-		pfn = spte_to_pfn(iter.old_spte);
 		max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
-				iter.gfn, pfn, PG_LEVEL_NUM);
+							      iter.gfn, PG_LEVEL_NUM);
 
 		WARN_ON(max_mapping_level < iter.level);
 
-- 
2.37.0.170.g444d1eabd0-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level()
  2022-07-15 23:21 [PATCH 0/4] Huge page related cleanups Sean Christopherson
  2022-07-15 23:21 ` [PATCH 1/4] KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs Sean Christopherson
@ 2022-07-15 23:21 ` Sean Christopherson
  2022-07-16 18:51   ` Mingwei Zhang
  2022-07-15 23:21 ` [PATCH 3/4] KVM: x86/mmu: Don't bottom out on leafs when zapping collapsible SPTEs Sean Christopherson
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-07-15 23:21 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Mingwei Zhang

Add a comment to document how host_pfn_mapping_level() can be used safely,
as the line between safe and dangerous is quite thin.  E.g. if KVM were
to ever support in-place promotion to create huge pages, consuming the
level is safe if the caller holds mmu_lock and checks that there's an
existing _leaf_ SPTE, but unsafe if the caller only checks that there's a
non-leaf SPTE.

Opportunistically tweak the existing comments to explicitly document why
KVM needs to use READ_ONCE().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bebff1d5acd4..d5b644f3e003 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2919,6 +2919,31 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
+/*
+ * Lookup the mapping level for @gfn in the current mm.
+ *
+ * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
+ * consumer to be tied into KVM's handlers for MMU notifier events!
+ *
+ * There are several ways to safely use this helper:
+ *
+ * - Check mmu_notifier_retry_hva() after grabbing the mapping level, before
+ *   consuming it.  In this case, mmu_lock doesn't need to be held during the
+ *   lookup, but it does need to be held while checking the MMU notifier.
+ *
+ * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
+ *   event for the hva.  This can be done by explicit checking the MMU notifier
+ *   or by ensuring that KVM already has a valid mapping that covers the hva.
+ *
+ * - Do not use the result to install new mappings, e.g. use the host mapping
+ *   level only to decide whether or not to zap an entry.  In this case, it's
+ *   not required to hold mmu_lock (though it's highly likely the caller will
+ *   want to hold mmu_lock anyways, e.g. to modify SPTEs).
+ *
+ * Note!  The lookup can still race with modifications to host page tables, but
+ * the above "rules" ensure KVM will not _consume_ the result of the walk if a
+ * race with the primary MMU occurs.
+ */
 static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 				  const struct kvm_memory_slot *slot)
 {
@@ -2941,16 +2966,19 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	hva = __gfn_to_hva_memslot(slot, gfn);
 
 	/*
-	 * Lookup the mapping level in the current mm.  The information
-	 * may become stale soon, but it is safe to use as long as
-	 * 1) mmu_notifier_retry was checked after taking mmu_lock, and
-	 * 2) mmu_lock is taken now.
-	 *
-	 * We still need to disable IRQs to prevent concurrent tear down
-	 * of page tables.
+	 * Disable IRQs to prevent concurrent tear down of host page tables,
+	 * e.g. if the primary MMU promotes a P*D to a huge page and then frees
+	 * the original page table.
 	 */
 	local_irq_save(flags);
 
+	/*
+	 * Read each entry once.  As above, a non-leaf entry can be promoted to
+	 * a huge page _during_ this walk.  Re-reading the entry could send the
+	 * walk into the weeds, e.g. p*d_large() returns false (sees the old
+	 * value) and then p*d_offset() walks into the target huge page instead
+	 * of the old page table (sees the new value).
+	 */
 	pgd = READ_ONCE(*pgd_offset(kvm->mm, hva));
 	if (pgd_none(pgd))
 		goto out;
-- 
2.37.0.170.g444d1eabd0-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] KVM: x86/mmu: Don't bottom out on leafs when zapping collapsible SPTEs
  2022-07-15 23:21 [PATCH 0/4] Huge page related cleanups Sean Christopherson
  2022-07-15 23:21 ` [PATCH 1/4] KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs Sean Christopherson
  2022-07-15 23:21 ` [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level() Sean Christopherson
@ 2022-07-15 23:21 ` Sean Christopherson
  2022-07-15 23:21 ` [PATCH 4/4] KVM: selftests: Add an option to run vCPUs while disabling dirty logging Sean Christopherson
  2022-07-19 18:02 ` [PATCH 0/4] Huge page related cleanups Paolo Bonzini
  4 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-07-15 23:21 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Mingwei Zhang

When zapping collapsible SPTEs in the TDP MMU, don't bottom out on a leaf
SPTE now that KVM doesn't require a PFN to compute the host mapping level,
i.e. now that there's no need to first find a leaf SPTE and then step
back up.

Drop the now unused tdp_iter_step_up(), as it is not the safest of
helpers (using any of the low level iterators requires some understanding
of the various side effects).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_iter.c |  9 ------
 arch/x86/kvm/mmu/tdp_iter.h |  1 -
 arch/x86/kvm/mmu/tdp_mmu.c  | 57 ++++++++++++++++++-------------------
 3 files changed, 27 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 9c65a64a56d9..39b48e7d7d1a 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -145,15 +145,6 @@ static bool try_step_up(struct tdp_iter *iter)
 	return true;
 }
 
-/*
- * Step the iterator back up a level in the paging structure. Should only be
- * used when the iterator is below the root level.
- */
-void tdp_iter_step_up(struct tdp_iter *iter)
-{
-	WARN_ON(!try_step_up(iter));
-}
-
 /*
  * Step to the next SPTE in a pre-order traversal of the paging structure.
  * To get to the next SPTE, the iterator either steps down towards the goal
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index adfca0cf94d3..f0af385c56e0 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -114,6 +114,5 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
 		    int min_level, gfn_t next_last_level_gfn);
 void tdp_iter_next(struct tdp_iter *iter);
 void tdp_iter_restart(struct tdp_iter *iter);
-void tdp_iter_step_up(struct tdp_iter *iter);
 
 #endif /* __KVM_X86_MMU_TDP_ITER_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d75d93edc40a..40ccb5fba870 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1721,10 +1721,6 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 		clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot);
 }
 
-/*
- * Clear leaf entries which could be replaced by large mappings, for
- * GFNs within the slot.
- */
 static void zap_collapsible_spte_range(struct kvm *kvm,
 				       struct kvm_mmu_page *root,
 				       const struct kvm_memory_slot *slot)
@@ -1736,48 +1732,49 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 
 	rcu_read_lock();
 
-	tdp_root_for_each_pte(iter, root, start, end) {
+	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_2M, start, end) {
+retry:
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level))
+		if (iter.level > KVM_MAX_HUGEPAGE_LEVEL ||
+		    !is_shadow_present_pte(iter.old_spte))
+			continue;
+
+		/*
+		 * Don't zap leaf SPTEs, if a leaf SPTE could be replaced with
+		 * a large page size, then its parent would have been zapped
+		 * instead of stepping down.
+		 */
+		if (is_last_spte(iter.old_spte, iter.level))
+			continue;
+
+		/*
+		 * If iter.gfn resides outside of the slot, i.e. the page for
+		 * the current level overlaps but is not contained by the slot,
+		 * then the SPTE can't be made huge.  More importantly, trying
+		 * to query that info from slot->arch.lpage_info will cause an
+		 * out-of-bounds access.
+		 */
+		if (iter.gfn < start || iter.gfn >= end)
 			continue;
 
 		max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
 							      iter.gfn, PG_LEVEL_NUM);
-
-		WARN_ON(max_mapping_level < iter.level);
-
-		/*
-		 * If this page is already mapped at the highest
-		 * viable level, there's nothing more to do.
-		 */
-		if (max_mapping_level == iter.level)
+		if (max_mapping_level < iter.level)
 			continue;
 
-		/*
-		 * The page can be remapped at a higher level, so step
-		 * up to zap the parent SPTE.
-		 */
-		while (max_mapping_level > iter.level)
-			tdp_iter_step_up(&iter);
-
 		/* Note, a successful atomic zap also does a remote TLB flush. */
-		tdp_mmu_zap_spte_atomic(kvm, &iter);
-
-		/*
-		 * If the atomic zap fails, the iter will recurse back into
-		 * the same subtree to retry.
-		 */
+		if (tdp_mmu_zap_spte_atomic(kvm, &iter))
+			goto retry;
 	}
 
 	rcu_read_unlock();
 }
 
 /*
- * Clear non-leaf entries (and free associated page tables) which could
- * be replaced by large mappings, for GFNs within the slot.
+ * Zap non-leaf SPTEs (and free their associated page tables) which could
+ * be replaced by huge pages, for GFNs within the slot.
  */
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				       const struct kvm_memory_slot *slot)
-- 
2.37.0.170.g444d1eabd0-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] KVM: selftests: Add an option to run vCPUs while disabling dirty logging
  2022-07-15 23:21 [PATCH 0/4] Huge page related cleanups Sean Christopherson
                   ` (2 preceding siblings ...)
  2022-07-15 23:21 ` [PATCH 3/4] KVM: x86/mmu: Don't bottom out on leafs when zapping collapsible SPTEs Sean Christopherson
@ 2022-07-15 23:21 ` Sean Christopherson
  2022-07-19 18:02 ` [PATCH 0/4] Huge page related cleanups Paolo Bonzini
  4 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-07-15 23:21 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel, Mingwei Zhang

Add a command line option to dirty_log_perf_test to run vCPUs for the
entire duration of disabling dirty logging.  By default, the test stops
running vCPUs before disabling dirty logging, which is faster but
less interesting as it doesn't stress KVM's handling of contention
between page faults and the zapping of collapsible SPTEs.  Enabling the
flag also lets the user verify that KVM is indeed rebuilding zapped SPTEs
as huge pages by checking KVM's pages_{1g,2m,4k} stats.  Without vCPUs to
fault in the zapped SPTEs, the stats will show that KVM is zapping pages,
but they never show whether or not KVM actually allows huge pages to be
recreated.

Note!  Enabling the flag can _significantly_ increase runtime, especially
if the thread that's disabling dirty logging doesn't have a dedicated
pCPU, e.g. if all pCPUs are used to run vCPUs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/dirty_log_perf_test.c       | 30 +++++++++++++++++--
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index 808a36dbf0c0..f99e39a672d3 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -59,6 +59,7 @@ static void arch_cleanup_vm(struct kvm_vm *vm)
 
 static int nr_vcpus = 1;
 static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE;
+static bool run_vcpus_while_disabling_dirty_logging;
 
 /* Host variables */
 static u64 dirty_log_manual_caps;
@@ -109,8 +110,13 @@ static void vcpu_worker(struct perf_test_vcpu_args *vcpu_args)
 				ts_diff.tv_nsec);
 		}
 
+		/*
+		 * Keep running the guest while dirty logging is being disabled
+		 * (iteration is negative) so that vCPUs are accessing memory
+		 * for the entire duration of zapping collapsible SPTEs.
+		 */
 		while (current_iteration == READ_ONCE(iteration) &&
-		       !READ_ONCE(host_quit)) {}
+		       READ_ONCE(iteration) >= 0 && !READ_ONCE(host_quit)) {}
 	}
 
 	avg = timespec_div(total, vcpu_last_completed_iteration[vcpu_idx]);
@@ -302,6 +308,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		}
 	}
 
+	/*
+	 * Run vCPUs while dirty logging is being disabled to stress disabling
+	 * in terms of both performance and correctness.  Opt-in via command
+	 * line as this significantly increases time to disable dirty logging.
+	 */
+	if (run_vcpus_while_disabling_dirty_logging)
+		WRITE_ONCE(iteration, -1);
+
 	/* Disable dirty logging */
 	clock_gettime(CLOCK_MONOTONIC, &start);
 	disable_dirty_logging(vm, p->slots);
@@ -309,7 +323,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	pr_info("Disabling dirty logging time: %ld.%.9lds\n",
 		ts_diff.tv_sec, ts_diff.tv_nsec);
 
-	/* Tell the vcpu thread to quit */
+	/*
+	 * Tell the vCPU threads to quit.  No need to manually check that vCPUs
+	 * have stopped running after disabling dirty logging, the join will
+	 * wait for them to exit.
+	 */
 	host_quit = true;
 	perf_test_join_vcpu_threads(nr_vcpus);
 
@@ -349,6 +367,9 @@ static void help(char *name)
 	       "     Warning: a low offset can conflict with the loaded test code.\n");
 	guest_modes_help();
 	printf(" -n: Run the vCPUs in nested mode (L2)\n");
+	printf(" -e: Run vCPUs while dirty logging is being disabled.  This\n"
+	       "     can significantly increase runtime, especially if there\n"
+	       "     isn't a dedicated pCPU for the main thread.\n");
 	printf(" -b: specify the size of the memory region which should be\n"
 	       "     dirtied by each vCPU. e.g. 10M or 3G.\n"
 	       "     (default: 1G)\n");
@@ -385,8 +406,11 @@ int main(int argc, char *argv[])
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "ghi:p:m:nb:f:v:os:x:")) != -1) {
+	while ((opt = getopt(argc, argv, "eghi:p:m:nb:f:v:os:x:")) != -1) {
 		switch (opt) {
+		case 'e':
+			/* 'e' is for evil. */
+			run_vcpus_while_disabling_dirty_logging = true;
 		case 'g':
 			dirty_log_manual_caps = 0;
 			break;
-- 
2.37.0.170.g444d1eabd0-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs
  2022-07-15 23:21 ` [PATCH 1/4] KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs Sean Christopherson
@ 2022-07-16  5:53   ` Mingwei Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Mingwei Zhang @ 2022-07-16  5:53 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

On Fri, Jul 15, 2022, Sean Christopherson wrote:
> Drop the requirement that a pfn be backed by a refcounted, compound or
> ZONE_DEVICE, struct page, and instead rely solely on the host page
> tables to identify huge pages.  The PageCompound() check is a remnant of
> an old implementation that identified (well, attempted to identify) huge
> pages without walking the host page tables.  The ZONE_DEVICE check was
> added as an exception to the PageCompound() requirement.  In other words,
> neither check is actually a hard requirement: if the primary MMU has the
> pfn backed with a huge page, then KVM can back the pfn with a huge page
> regardless of the backing store.
> 
> Dropping the @pfn parameter will also allow KVM to query the max host
> mapping level without having to first get the pfn, which is advantageous
> for use outside of the page fault path where KVM wants to take action if
> and only if a page can be mapped huge, i.e. avoids the pfn lookup for
> gfns that can't be backed with a huge page.

Out of curiosity, when the host maps huge pages under a VMA with VM_PFNMAP,
they are basically out of the control of MM (I presume). So, when the
drivers of those pages remove them from the host page tables, do we (KVM)
get mmu_notifier callbacks?

> 
> Cc: Mingwei Zhang <mizhang@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c          | 23 +++++------------------
>  arch/x86/kvm/mmu/mmu_internal.h |  2 +-
>  arch/x86/kvm/mmu/tdp_mmu.c      |  8 +-------
>  3 files changed, 7 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 52664c3caaab..bebff1d5acd4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2919,11 +2919,10 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
>  	__direct_pte_prefetch(vcpu, sp, sptep);
>  }
>  
> -static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> +static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>  				  const struct kvm_memory_slot *slot)
>  {
>  	int level = PG_LEVEL_4K;
> -	struct page *page;
>  	unsigned long hva;
>  	unsigned long flags;
>  	pgd_t pgd;
> @@ -2931,17 +2930,6 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>  	pud_t pud;
>  	pmd_t pmd;
>  
> -	/*
> -	 * Note, @slot must be non-NULL, i.e. the caller is responsible for
> -	 * ensuring @pfn isn't garbage and is backed by a memslot.
> -	 */
> -	page = kvm_pfn_to_refcounted_page(pfn);
> -	if (!page)
> -		return PG_LEVEL_4K;
> -
> -	if (!PageCompound(page) && !kvm_is_zone_device_page(page))
> -		return PG_LEVEL_4K;
> -
>  	/*
>  	 * Note, using the already-retrieved memslot and __gfn_to_hva_memslot()
>  	 * is not solely for performance, it's also necessary to avoid the
> @@ -2994,7 +2982,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>  
>  int kvm_mmu_max_mapping_level(struct kvm *kvm,
>  			      const struct kvm_memory_slot *slot, gfn_t gfn,
> -			      kvm_pfn_t pfn, int max_level)
> +			      int max_level)
>  {
>  	struct kvm_lpage_info *linfo;
>  	int host_level;
> @@ -3009,7 +2997,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
>  	if (max_level == PG_LEVEL_4K)
>  		return PG_LEVEL_4K;
>  
> -	host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
> +	host_level = host_pfn_mapping_level(kvm, gfn, slot);
>  	return min(host_level, max_level);
>  }
>  
> @@ -3034,8 +3022,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	 * level, which will be used to do precise, accurate accounting.
>  	 */
>  	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
> -						     fault->gfn, fault->pfn,
> -						     fault->max_level);
> +						     fault->gfn, fault->max_level);
>  	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
>  		return;
>  
> @@ -6406,7 +6393,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>  		 */
>  		if (sp->role.direct &&
>  		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
> -							       pfn, PG_LEVEL_NUM)) {
> +							       PG_LEVEL_NUM)) {
>  			pte_list_remove(kvm, rmap_head, sptep);
>  
>  			if (kvm_available_flush_tlb_with_range())
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index ae2d660e2dab..582def531d4d 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -309,7 +309,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  
>  int kvm_mmu_max_mapping_level(struct kvm *kvm,
>  			      const struct kvm_memory_slot *slot, gfn_t gfn,
> -			      kvm_pfn_t pfn, int max_level);
> +			      int max_level);
>  void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>  void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
>  
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index f3a430d64975..d75d93edc40a 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1733,7 +1733,6 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
>  	gfn_t end = start + slot->npages;
>  	struct tdp_iter iter;
>  	int max_mapping_level;
> -	kvm_pfn_t pfn;
>  
>  	rcu_read_lock();
>  
> @@ -1745,13 +1744,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
>  		    !is_last_spte(iter.old_spte, iter.level))
>  			continue;
>  
> -		/*
> -		 * This is a leaf SPTE. Check if the PFN it maps can
> -		 * be mapped at a higher level.
> -		 */
> -		pfn = spte_to_pfn(iter.old_spte);
>  		max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
> -				iter.gfn, pfn, PG_LEVEL_NUM);
> +							      iter.gfn, PG_LEVEL_NUM);
>  
>  		WARN_ON(max_mapping_level < iter.level);
>  
> -- 
> 2.37.0.170.g444d1eabd0-goog
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level()
  2022-07-15 23:21 ` [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level() Sean Christopherson
@ 2022-07-16 18:51   ` Mingwei Zhang
  2022-07-18 16:50     ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Mingwei Zhang @ 2022-07-16 18:51 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

On Fri, Jul 15, 2022, Sean Christopherson wrote:
> Add a comment to document how host_pfn_mapping_level() can be used safely,
> as the line between safe and dangerous is quite thin.  E.g. if KVM were
> to ever support in-place promotion to create huge pages, consuming the
> level is safe if the caller holds mmu_lock and checks that there's an
> existing _leaf_ SPTE, but unsafe if the caller only checks that there's a
> non-leaf SPTE.
> 
> Opportunistically tweak the existing comments to explicitly document why
> KVM needs to use READ_ONCE().
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++++++++++++++++-------
>  1 file changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index bebff1d5acd4..d5b644f3e003 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2919,6 +2919,31 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
>  	__direct_pte_prefetch(vcpu, sp, sptep);
>  }
>  
> +/*
> + * Lookup the mapping level for @gfn in the current mm.
> + *
> + * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
> + * consumer to be tied into KVM's handlers for MMU notifier events!
Since calling this function won't cause kernel crash now, I guess we can
remove the warning sign here, but keep the remaining statement since it
is necessary.
> + *
> + * There are several ways to safely use this helper:
> + *
> + * - Check mmu_notifier_retry_hva() after grabbing the mapping level, before
> + *   consuming it.  In this case, mmu_lock doesn't need to be held during the
> + *   lookup, but it does need to be held while checking the MMU notifier.

but it does need to be held while checking the MMU notifier and
consuming the result.
> + *
> + * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
> + *   event for the hva.  This can be done by explicit checking the MMU notifier
> + *   or by ensuring that KVM already has a valid mapping that covers the hva.

Yes, more specifically, "mmu notifier sequence counter".
> + *
> + * - Do not use the result to install new mappings, e.g. use the host mapping
> + *   level only to decide whether or not to zap an entry.  In this case, it's
> + *   not required to hold mmu_lock (though it's highly likely the caller will
> + *   want to hold mmu_lock anyways, e.g. to modify SPTEs).
> + *
> + * Note!  The lookup can still race with modifications to host page tables, but
> + * the above "rules" ensure KVM will not _consume_ the result of the walk if a
> + * race with the primary MMU occurs.
> + */
>  static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>  				  const struct kvm_memory_slot *slot)
>  {
> @@ -2941,16 +2966,19 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>  	hva = __gfn_to_hva_memslot(slot, gfn);
>  
>  	/*
> -	 * Lookup the mapping level in the current mm.  The information
> -	 * may become stale soon, but it is safe to use as long as
> -	 * 1) mmu_notifier_retry was checked after taking mmu_lock, and
> -	 * 2) mmu_lock is taken now.
> -	 *
> -	 * We still need to disable IRQs to prevent concurrent tear down
> -	 * of page tables.
> +	 * Disable IRQs to prevent concurrent tear down of host page tables,
> +	 * e.g. if the primary MMU promotes a P*D to a huge page and then frees
> +	 * the original page table.
>  	 */
>  	local_irq_save(flags);
>  
> +	/*
> +	 * Read each entry once.  As above, a non-leaf entry can be promoted to
> +	 * a huge page _during_ this walk.  Re-reading the entry could send the
> +	 * walk into the weeds, e.g. p*d_large() returns false (sees the old
> +	 * value) and then p*d_offset() walks into the target huge page instead
> +	 * of the old page table (sees the new value).
> +	 */
>  	pgd = READ_ONCE(*pgd_offset(kvm->mm, hva));
>  	if (pgd_none(pgd))
>  		goto out;
> -- 
> 2.37.0.170.g444d1eabd0-goog
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level()
  2022-07-16 18:51   ` Mingwei Zhang
@ 2022-07-18 16:50     ` Sean Christopherson
  2022-07-18 20:48       ` Mingwei Zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-07-18 16:50 UTC (permalink / raw)
  To: Mingwei Zhang; +Cc: Paolo Bonzini, kvm, linux-kernel

On Sat, Jul 16, 2022, Mingwei Zhang wrote:
> On Fri, Jul 15, 2022, Sean Christopherson wrote:
> > Add a comment to document how host_pfn_mapping_level() can be used safely,
> > as the line between safe and dangerous is quite thin.  E.g. if KVM were
> > to ever support in-place promotion to create huge pages, consuming the
> > level is safe if the caller holds mmu_lock and checks that there's an
> > existing _leaf_ SPTE, but unsafe if the caller only checks that there's a
> > non-leaf SPTE.
> > 
> > Opportunistically tweak the existing comments to explicitly document why
> > KVM needs to use READ_ONCE().
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 35 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index bebff1d5acd4..d5b644f3e003 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2919,6 +2919,31 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
> >  	__direct_pte_prefetch(vcpu, sp, sptep);
> >  }
> >  
> > +/*
> > + * Lookup the mapping level for @gfn in the current mm.
> > + *
> > + * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
> > + * consumer to be tied into KVM's handlers for MMU notifier events!
> Since calling this function won't cause kernel crash now, I guess we can
> remove the warning sign here, but keep the remaining statement since it
> is necessary.

Calling this function won't _directly_ crash the kernel, but improper usage can
most definitely crash the host kernel, or even worse, silently corrupt host and
or guest data.  E.g. if KVM were to race with an mmu_notifier event and incorrectly
map a stale huge page into the guest.

So yes, the function itself is robust, but usage is still very subtle and delicate.

> > + *
> > + * There are several ways to safely use this helper:
> > + *
> > + * - Check mmu_notifier_retry_hva() after grabbing the mapping level, before
> > + *   consuming it.  In this case, mmu_lock doesn't need to be held during the
> > + *   lookup, but it does need to be held while checking the MMU notifier.
> 
> but it does need to be held while checking the MMU notifier and
> consuming the result.

I didn't want to include "consuming the result" because arguably the result is
being consumed while running the guest, and obviously KVM doesn't hold mmu_lock
while running the guest (though I fully acknowledge the above effectively uses
"consume" in the sense of shoving the result into SPTEs).  

> > + *
> > + * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
> > + *   event for the hva.  This can be done by explicit checking the MMU notifier

s/explicit/explicitly

> > + *   or by ensuring that KVM already has a valid mapping that covers the hva.
> 
> Yes, more specifically, "mmu notifier sequence counter".

Heh, depends on what the reader interprets as "sequence counter".  If the reader
interprets that as the literal sequence counter, mmu_notifier_seq, then this phrasing
is incorrect as mmu_notifier_seq isn't bumped until the invalidation completes,
i.e. it guards against _past_ invalidations, not in-progress invalidations.

My preference is to intentionally not be precise in describing how to check for an
in-progress invalidation, e.g. so that this comment doesn't need to be updated if
the details change, and also to try and force developers to do more than copy
and paste if they want to use this helper.
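
For completeness, the first pattern in the comment (lookup, then check
mmu_notifier_retry_hva() under mmu_lock) looks roughly like this in
practice.  This is a sketch modeled on the page fault path as of this
series, not a drop-in; hva comes from __gfn_to_hva_memslot(), and the
lock flavor and error handling depend on the caller:

  mmu_seq = kvm->mmu_notifier_seq;
  smp_rmb();

  /* Lockless lookup; the result may be stale and must not be consumed yet. */
  level = host_pfn_mapping_level(kvm, gfn, slot);

  write_lock(&kvm->mmu_lock);

  /*
   * Bail (and let the caller retry) if any invalidation completed after
   * the sequence snapshot, or if one is in progress for the hva.
   */
  if (mmu_notifier_retry_hva(kvm, mmu_seq, hva))
          goto out_unlock;

  /* Only now is it safe to consume "level", e.g. to install a huge SPTE. */
  ...

  out_unlock:
  write_unlock(&kvm->mmu_lock);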

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level()
  2022-07-18 16:50     ` Sean Christopherson
@ 2022-07-18 20:48       ` Mingwei Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Mingwei Zhang @ 2022-07-18 20:48 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

On Mon, Jul 18, 2022, Sean Christopherson wrote:
> On Sat, Jul 16, 2022, Mingwei Zhang wrote:
> > On Fri, Jul 15, 2022, Sean Christopherson wrote:
> > > Add a comment to document how host_pfn_mapping_level() can be used safely,
> > > as the line between safe and dangerous is quite thin.  E.g. if KVM were
> > > to ever support in-place promotion to create huge pages, consuming the
> > > level is safe if the caller holds mmu_lock and checks that there's an
> > > existing _leaf_ SPTE, but unsafe if the caller only checks that there's a
> > > non-leaf SPTE.
> > > 
> > > Opportunistically tweak the existing comments to explicitly document why
> > > KVM needs to use READ_ONCE().
> > > 
> > > No functional change intended.
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++++++++++++++++-------
> > >  1 file changed, 35 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index bebff1d5acd4..d5b644f3e003 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -2919,6 +2919,31 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
> > >  	__direct_pte_prefetch(vcpu, sp, sptep);
> > >  }
> > >  
> > > +/*
> > > + * Lookup the mapping level for @gfn in the current mm.
> > > + *
> > > + * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
> > > + * consumer to be tied into KVM's handlers for MMU notifier events!
> > Since calling this function won't cause kernel crash now, I guess we can
> > remove the warning sign here, but keep the remaining statement since it
> > is necessary.
> 
> Calling this function won't _directly_ crash the kernel, but improper usage can
> most definitely crash the host kernel, or even worse, silently corrupt host and
> or guest data.  E.g. if KVM were to race with an mmu_notifier event and incorrectly
> map a stale huge page into the guest.
> 
> So yes, the function itself is robust, but usage is still very subtle and delicate.

Understood. So we basically create another "gup_fast_only()" within KVM,
and we worry that it may confuse other developers, so we add the warning
sign.
> 
> > > + *
> > > + * There are several ways to safely use this helper:
> > > + *
> > > + * - Check mmu_notifier_retry_hva() after grabbing the mapping level, before
> > > + *   consuming it.  In this case, mmu_lock doesn't need to be held during the
> > > + *   lookup, but it does need to be held while checking the MMU notifier.
> > 
> > but it does need to be held while checking the MMU notifier and
> > consuming the result.
> 
> I didn't want to include "consuming the result" because arguably the result is
> being consumed while running the guest, and obviously KVM doesn't hold mmu_lock
> while running the guest (though I fully acknowledge the above effectively uses
> "consume" in the sense of shoving the result into SPTEs).  
> 
> > > + *
> > > + * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
> > > + *   event for the hva.  This can be done by explicit checking the MMU notifier
> 
> s/explicit/explicitly
> 
> > > + *   or by ensuring that KVM already has a valid mapping that covers the hva.
> > 
> > Yes, more specifically, "mmu notifier sequence counter".
> 
> Heh, depends on what the reader interprets as "sequence counter".  If the reader
> interprets that as the literal sequence counter, mmu_notifier_seq, then this phrasing
> is incorrect as mmu_notifier_seq isn't bumped until the invalidation completes,
> i.e. it guards against _past_ invalidations, not in-progress invalidations.
> 
> My preference is to intentionally not be precise in describing how to check for an
> in-progress invalidation, e.g. so that this comment doesn't need to be updated if
> the details change, and also to try and force developers to do more than copy
> and paste if they want to use this helper.

Hmm, I was going to say that I strongly disagree about the intentional
unclearness. But then I found that the MMU notifier implementation requires
not just the counter but also the range, so yeah, talking too much may fall
into the weeds. But in general, I think the MMU notifier deserves better
documentation in KVM, in both concept and implementation.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/4] Huge page related cleanups
  2022-07-15 23:21 [PATCH 0/4] Huge page related cleanups Sean Christopherson
                   ` (3 preceding siblings ...)
  2022-07-15 23:21 ` [PATCH 4/4] KVM: selftests: Add an option to run vCPUs while disabling dirty logging Sean Christopherson
@ 2022-07-19 18:02 ` Paolo Bonzini
  4 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2022-07-19 18:02 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, linux-kernel, Mingwei Zhang

Queued, thanks.

Paolo



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-07-19 18:02 UTC | newest]

Thread overview: 10+ messages
2022-07-15 23:21 [PATCH 0/4] Huge page related cleanups Sean Christopherson
2022-07-15 23:21 ` [PATCH 1/4] KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs Sean Christopherson
2022-07-16  5:53   ` Mingwei Zhang
2022-07-15 23:21 ` [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level() Sean Christopherson
2022-07-16 18:51   ` Mingwei Zhang
2022-07-18 16:50     ` Sean Christopherson
2022-07-18 20:48       ` Mingwei Zhang
2022-07-15 23:21 ` [PATCH 3/4] KVM: x86/mmu: Don't bottom out on leafs when zapping collapsible SPTEs Sean Christopherson
2022-07-15 23:21 ` [PATCH 4/4] KVM: selftests: Add an option to run vCPUs while disabling dirty logging Sean Christopherson
2022-07-19 18:02 ` [PATCH 0/4] Huge page related cleanups Paolo Bonzini
