linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Add detailed page size stats in KVM stats
@ 2021-07-26 17:53 Mingwei Zhang
  2021-07-26 17:53 ` [PATCH v2 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte Mingwei Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mingwei Zhang @ 2021-07-26 17:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Mingwei Zhang,
	Jing Zhang

This commit basically adds detailed (large and regular) page size info to
KVM stats and deprecate the old one: lpages.

To support legacy MMU and TDP mmu, we use atomic type for all page stats.

v1 -> v2:
 - refactor kvm_update_page_stats and remove 'spte' argument. [sean]
 - remove 'lpages' as it can be aggregated by user level [sean]
 - fix lpages stats update issue in __handle_change_pte [sean]
 - fix style issues and typos. [ben/sean]

pre-v1 (internal reviewers):
 - use atomic in all page stats and use 'level' as index. [sean]
 - use an extra argument in kvm_update_page_stats for atomic/non-atomic.
   [bgardon]
 - should be careful on the difference between legacy mmu and tdp mmu.
   [jingzhangos]


Mingwei Zhang (3):
  kvm: mmu/x86: Remove redundant spte present check in mmu_set_spte
  KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage
    stats
  kvm: mmu/x86: Add detailed page size stats

 arch/x86/include/asm/kvm_host.h | 10 +++++++-
 arch/x86/kvm/mmu.h              |  2 ++
 arch/x86/kvm/mmu/mmu.c          | 42 ++++++++++++++++++---------------
 arch/x86/kvm/mmu/tdp_mmu.c      |  9 ++-----
 arch/x86/kvm/x86.c              |  7 ++++--
 5 files changed, 41 insertions(+), 29 deletions(-)

--
2.32.0.432.gabb21c7263-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte
  2021-07-26 17:53 [PATCH v2 0/3] Add detailed page size stats in KVM stats Mingwei Zhang
@ 2021-07-26 17:53 ` Mingwei Zhang
  2021-07-26 20:23   ` Ben Gardon
  2021-07-29 18:16   ` Sean Christopherson
  2021-07-26 17:53 ` [PATCH v2 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats Mingwei Zhang
  2021-07-26 17:53 ` [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats Mingwei Zhang
  2 siblings, 2 replies; 15+ messages in thread
From: Mingwei Zhang @ 2021-07-26 17:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Mingwei Zhang,
	Jing Zhang

Drop an unnecessary is_shadow_present_pte() check when updating the rmaps
after installing a non-MMIO SPTE.  set_spte() is used only to create
shadow-present SPTEs, e.g. MMIO SPTEs are handled early on, mmu_set_spte()
runs with mmu_lock held for write, i.e. the SPTE can't be zapped between
writing the SPTE and updating the rmaps.

Opportunistically combine the "new SPTE" logic for large pages and rmaps.

No functional change intended.

Suggested-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b888385d1933..442cc554ebd6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2690,15 +2690,13 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
 	trace_kvm_mmu_set_spte(level, gfn, sptep);
-	if (!was_rmapped && is_large_pte(*sptep))
-		++vcpu->kvm->stat.lpages;
 
-	if (is_shadow_present_pte(*sptep)) {
-		if (!was_rmapped) {
-			rmap_count = rmap_add(vcpu, sptep, gfn);
-			if (rmap_count > RMAP_RECYCLE_THRESHOLD)
-				rmap_recycle(vcpu, sptep, gfn);
-		}
+	if (!was_rmapped) {
+		if (is_large_pte(*sptep))
+			++vcpu->kvm->stat.lpages;
+		rmap_count = rmap_add(vcpu, sptep, gfn);
+		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
+			rmap_recycle(vcpu, sptep, gfn);
 	}
 
 	return ret;
-- 
2.32.0.432.gabb21c7263-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats
  2021-07-26 17:53 [PATCH v2 0/3] Add detailed page size stats in KVM stats Mingwei Zhang
  2021-07-26 17:53 ` [PATCH v2 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte Mingwei Zhang
@ 2021-07-26 17:53 ` Mingwei Zhang
  2021-07-26 21:02   ` Ben Gardon
  2021-07-29 18:34   ` Sean Christopherson
  2021-07-26 17:53 ` [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats Mingwei Zhang
  2 siblings, 2 replies; 15+ messages in thread
From: Mingwei Zhang @ 2021-07-26 17:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Mingwei Zhang,
	Jing Zhang

Factor in whether or not the old/new SPTEs are shadow-present when
adjusting the large page stats in the TDP MMU. A modified MMIO SPTE can
toggle the page size bit, as bit 7 is used to store the MMIO generation,
i.e. is_large_pte() can get a false positive when called on a MMIO SPTE.
Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic
value.

Opportunistically move the logic below the check to verify at least one
of the old/new SPTEs is shadow present.

Use is/was_leaf even though is/was_present would suffice.  The code
generation is roughly equivalent since all flags need to be computed
prior to the code in question, and using the *_leaf flags will minimize
the diff in a future enhancement to account all pages, i.e. will change
the check to "is_leaf != was_leaf".

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index caac4ddb46df..cba2ab5db2a0 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -413,6 +413,7 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	bool was_leaf = was_present && is_last_spte(old_spte, level);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	bool was_large, is_large;
 
 	WARN_ON(level > PT64_ROOT_MAX_LEVEL);
 	WARN_ON(level < PG_LEVEL_4K);
@@ -446,13 +447,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 
 	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
 
-	if (is_large_pte(old_spte) != is_large_pte(new_spte)) {
-		if (is_large_pte(old_spte))
-			atomic64_sub(1, (atomic64_t*)&kvm->stat.lpages);
-		else
-			atomic64_add(1, (atomic64_t*)&kvm->stat.lpages);
-	}
-
 	/*
 	 * The only times a SPTE should be changed from a non-present to
 	 * non-present state is when an MMIO entry is installed/modified/
@@ -478,6 +472,18 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 		return;
 	}
 
+	/*
+	 * Update large page stats if a large page is being zapped, created, or
+	 * is replacing an existing shadow page.
+	 */
+	was_large = was_leaf && is_large_pte(old_spte);
+	is_large = is_leaf && is_large_pte(new_spte);
+	if (was_large != is_large) {
+		if (was_large)
+			atomic64_sub(1, (atomic64_t *)&kvm->stat.lpages);
+		else
+			atomic64_add(1, (atomic64_t *)&kvm->stat.lpages);
+	}
 
 	if (was_leaf && is_dirty_spte(old_spte) &&
 	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
-- 
2.32.0.432.gabb21c7263-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-07-26 17:53 [PATCH v2 0/3] Add detailed page size stats in KVM stats Mingwei Zhang
  2021-07-26 17:53 ` [PATCH v2 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte Mingwei Zhang
  2021-07-26 17:53 ` [PATCH v2 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats Mingwei Zhang
@ 2021-07-26 17:53 ` Mingwei Zhang
  2021-07-26 20:41   ` Ben Gardon
  2021-07-27 15:36   ` David Matlack
  2 siblings, 2 replies; 15+ messages in thread
From: Mingwei Zhang @ 2021-07-26 17:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Mingwei Zhang,
	Jing Zhang

Existing KVM code tracks the number of large pages regardless of their
sizes. Therefore, when large page of 1GB (or larger) is adopted, the
information becomes less useful because lpages counts a mix of 1G and 2M
pages.

So remove the lpages since it is easy for user space to aggregate the info.
Instead, provide a comprehensive page stats of all sizes from 4K to 512G.

Suggested-by: Ben Gardon <bgardon@google.com>
Suggested-by: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/include/asm/kvm_host.h | 10 +++++++++-
 arch/x86/kvm/mmu.h              |  2 ++
 arch/x86/kvm/mmu/mmu.c          | 32 +++++++++++++++++++-------------
 arch/x86/kvm/mmu/tdp_mmu.c      | 15 ++-------------
 arch/x86/kvm/x86.c              |  7 +++++--
 5 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 974cbfb1eefe..2e4b6fd36e62 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1206,9 +1206,17 @@ struct kvm_vm_stat {
 	u64 mmu_recycled;
 	u64 mmu_cache_miss;
 	u64 mmu_unsync;
-	u64 lpages;
 	u64 nx_lpage_splits;
 	u64 max_mmu_page_hash_collisions;
+	union {
+		struct {
+			atomic64_t pages_4k;
+			atomic64_t pages_2m;
+			atomic64_t pages_1g;
+			atomic64_t pages_512g;
+		};
+		atomic64_t pages[4];
+	} page_stats;
 };
 
 struct kvm_vcpu_stat {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 83e6c6965f1e..ad5638815311 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -240,4 +240,6 @@ static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 	return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
 }
 
+void kvm_update_page_stats(struct kvm *kvm, int level, int count);
+
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 442cc554ebd6..7e0fc760739b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -588,16 +588,22 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 	return flush;
 }
 
+void kvm_update_page_stats(struct kvm *kvm, int level, int count)
+{
+	atomic64_add(count, &kvm->stat.page_stats.pages[level - 1]);
+}
+
 /*
  * Rules for using mmu_spte_clear_track_bits:
  * It sets the sptep from present to nonpresent, and track the
  * state bits, it is used to clear the last level sptep.
  * Returns non-zero if the PTE was previously valid.
  */
-static int mmu_spte_clear_track_bits(u64 *sptep)
+static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
 {
 	kvm_pfn_t pfn;
 	u64 old_spte = *sptep;
+	int level = sptep_to_sp(sptep)->role.level;
 
 	if (!spte_has_volatile_bits(old_spte))
 		__update_clear_spte_fast(sptep, 0ull);
@@ -607,6 +613,9 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
 	if (!is_shadow_present_pte(old_spte))
 		return 0;
 
+	if (is_last_spte(old_spte, level))
+		kvm_update_page_stats(kvm, level, -1);
+
 	pfn = spte_to_pfn(old_spte);
 
 	/*
@@ -984,9 +993,10 @@ static void __pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 	}
 }
 
-static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
+static void pte_list_remove(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			    u64 *sptep)
 {
-	mmu_spte_clear_track_bits(sptep);
+	mmu_spte_clear_track_bits(kvm, sptep);
 	__pte_list_remove(sptep, rmap_head);
 }
 
@@ -1119,7 +1129,7 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
-	if (mmu_spte_clear_track_bits(sptep))
+	if (mmu_spte_clear_track_bits(kvm, sptep))
 		rmap_remove(kvm, sptep);
 }
 
@@ -1129,7 +1139,6 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
 	if (is_large_pte(*sptep)) {
 		WARN_ON(sptep_to_sp(sptep)->role.level == PG_LEVEL_4K);
 		drop_spte(kvm, sptep);
-		--kvm->stat.lpages;
 		return true;
 	}
 
@@ -1386,7 +1395,7 @@ static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 	while ((sptep = rmap_get_first(rmap_head, &iter))) {
 		rmap_printk("spte %p %llx.\n", sptep, *sptep);
 
-		pte_list_remove(rmap_head, sptep);
+		pte_list_remove(kvm, rmap_head, sptep);
 		flush = true;
 	}
 
@@ -1421,13 +1430,13 @@ static bool kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 		need_flush = 1;
 
 		if (pte_write(pte)) {
-			pte_list_remove(rmap_head, sptep);
+			pte_list_remove(kvm, rmap_head, sptep);
 			goto restart;
 		} else {
 			new_spte = kvm_mmu_changed_pte_notifier_make_spte(
 					*sptep, new_pfn);
 
-			mmu_spte_clear_track_bits(sptep);
+			mmu_spte_clear_track_bits(kvm, sptep);
 			mmu_spte_set(sptep, new_spte);
 		}
 	}
@@ -2232,8 +2241,6 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 	if (is_shadow_present_pte(pte)) {
 		if (is_last_spte(pte, sp->role.level)) {
 			drop_spte(kvm, spte);
-			if (is_large_pte(pte))
-				--kvm->stat.lpages;
 		} else {
 			child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
 			drop_parent_pte(child, spte);
@@ -2692,8 +2699,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	trace_kvm_mmu_set_spte(level, gfn, sptep);
 
 	if (!was_rmapped) {
-		if (is_large_pte(*sptep))
-			++vcpu->kvm->stat.lpages;
+		kvm_update_page_stats(vcpu->kvm, level, 1);
 		rmap_count = rmap_add(vcpu, sptep, gfn);
 		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
 			rmap_recycle(vcpu, sptep, gfn);
@@ -5669,7 +5675,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
 		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
 							       pfn, PG_LEVEL_NUM)) {
-			pte_list_remove(rmap_head, sptep);
+			pte_list_remove(kvm, rmap_head, sptep);
 
 			if (kvm_available_flush_tlb_with_range())
 				kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index cba2ab5db2a0..eae404c15364 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -413,7 +413,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	bool was_leaf = was_present && is_last_spte(old_spte, level);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
-	bool was_large, is_large;
 
 	WARN_ON(level > PT64_ROOT_MAX_LEVEL);
 	WARN_ON(level < PG_LEVEL_4K);
@@ -472,18 +471,8 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 		return;
 	}
 
-	/*
-	 * Update large page stats if a large page is being zapped, created, or
-	 * is replacing an existing shadow page.
-	 */
-	was_large = was_leaf && is_large_pte(old_spte);
-	is_large = is_leaf && is_large_pte(new_spte);
-	if (was_large != is_large) {
-		if (was_large)
-			atomic64_sub(1, (atomic64_t *)&kvm->stat.lpages);
-		else
-			atomic64_add(1, (atomic64_t *)&kvm->stat.lpages);
-	}
+	if (is_leaf != was_leaf)
+		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
 
 	if (was_leaf && is_dirty_spte(old_spte) &&
 	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8166ad113fb2..3858d36d3c49 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -235,9 +235,12 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	STATS_DESC_COUNTER(VM, mmu_recycled),
 	STATS_DESC_COUNTER(VM, mmu_cache_miss),
 	STATS_DESC_ICOUNTER(VM, mmu_unsync),
-	STATS_DESC_ICOUNTER(VM, lpages),
 	STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
-	STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
+	STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions),
+	STATS_DESC_ICOUNTER(VM, page_stats.pages_4k),
+	STATS_DESC_ICOUNTER(VM, page_stats.pages_2m),
+	STATS_DESC_ICOUNTER(VM, page_stats.pages_1g),
+	STATS_DESC_ICOUNTER(VM, page_stats.pages_512g)
 };
 static_assert(ARRAY_SIZE(kvm_vm_stats_desc) ==
 		sizeof(struct kvm_vm_stat) / sizeof(u64));
-- 
2.32.0.432.gabb21c7263-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte
  2021-07-26 17:53 ` [PATCH v2 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte Mingwei Zhang
@ 2021-07-26 20:23   ` Ben Gardon
  2021-07-29 18:16   ` Sean Christopherson
  1 sibling, 0 replies; 15+ messages in thread
From: Ben Gardon @ 2021-07-26 20:23 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, LKML, Jing Zhang

On Mon, Jul 26, 2021 at 10:54 AM Mingwei Zhang <mizhang@google.com> wrote:
>
> Drop an unnecessary is_shadow_present_pte() check when updating the rmaps
> after installing a non-MMIO SPTE.  set_spte() is used only to create
> shadow-present SPTEs, e.g. MMIO SPTEs are handled early on, mmu_set_spte()
> runs with mmu_lock held for write, i.e. the SPTE can't be zapped between
> writing the SPTE and updating the rmaps.
>
> Opportunistically combine the "new SPTE" logic for large pages and rmaps.
>
> No functional change intended.
>
> Suggested-by: Ben Gardon <bgardon@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> ---
>  arch/x86/kvm/mmu/mmu.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b888385d1933..442cc554ebd6 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2690,15 +2690,13 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>
>         pgprintk("%s: setting spte %llx\n", __func__, *sptep);
>         trace_kvm_mmu_set_spte(level, gfn, sptep);
> -       if (!was_rmapped && is_large_pte(*sptep))
> -               ++vcpu->kvm->stat.lpages;
>
> -       if (is_shadow_present_pte(*sptep)) {
> -               if (!was_rmapped) {
> -                       rmap_count = rmap_add(vcpu, sptep, gfn);
> -                       if (rmap_count > RMAP_RECYCLE_THRESHOLD)
> -                               rmap_recycle(vcpu, sptep, gfn);
> -               }
> +       if (!was_rmapped) {
> +               if (is_large_pte(*sptep))
> +                       ++vcpu->kvm->stat.lpages;
> +               rmap_count = rmap_add(vcpu, sptep, gfn);
> +               if (rmap_count > RMAP_RECYCLE_THRESHOLD)
> +                       rmap_recycle(vcpu, sptep, gfn);
>         }
>
>         return ret;
> --
> 2.32.0.432.gabb21c7263-goog
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-07-26 17:53 ` [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats Mingwei Zhang
@ 2021-07-26 20:41   ` Ben Gardon
  2021-07-26 21:06     ` Ben Gardon
  2021-07-29 18:45     ` Sean Christopherson
  2021-07-27 15:36   ` David Matlack
  1 sibling, 2 replies; 15+ messages in thread
From: Ben Gardon @ 2021-07-26 20:41 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, LKML, Jing Zhang

On Mon, Jul 26, 2021 at 10:54 AM Mingwei Zhang <mizhang@google.com> wrote:
>
> Existing KVM code tracks the number of large pages regardless of their
> sizes. Therefore, when large page of 1GB (or larger) is adopted, the
> information becomes less useful because lpages counts a mix of 1G and 2M
> pages.
>
> So remove the lpages since it is easy for user space to aggregate the info.
> Instead, provide a comprehensive page stats of all sizes from 4K to 512G.
>
> Suggested-by: Ben Gardon <bgardon@google.com>
> Suggested-by: Jing Zhang <jingzhangos@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 10 +++++++++-
>  arch/x86/kvm/mmu.h              |  2 ++
>  arch/x86/kvm/mmu/mmu.c          | 32 +++++++++++++++++++-------------
>  arch/x86/kvm/mmu/tdp_mmu.c      | 15 ++-------------
>  arch/x86/kvm/x86.c              |  7 +++++--
>  5 files changed, 37 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 974cbfb1eefe..2e4b6fd36e62 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1206,9 +1206,17 @@ struct kvm_vm_stat {
>         u64 mmu_recycled;
>         u64 mmu_cache_miss;
>         u64 mmu_unsync;
> -       u64 lpages;
>         u64 nx_lpage_splits;
>         u64 max_mmu_page_hash_collisions;
> +       union {
> +               struct {
> +                       atomic64_t pages_4k;
> +                       atomic64_t pages_2m;
> +                       atomic64_t pages_1g;
> +                       atomic64_t pages_512g;
> +               };
> +               atomic64_t pages[4];
> +       } page_stats;
>  };
>
>  struct kvm_vcpu_stat {
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 83e6c6965f1e..ad5638815311 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -240,4 +240,6 @@ static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
>         return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
>  }
>
> +void kvm_update_page_stats(struct kvm *kvm, int level, int count);
> +
>  #endif
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 442cc554ebd6..7e0fc760739b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -588,16 +588,22 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
>         return flush;
>  }
>
> +void kvm_update_page_stats(struct kvm *kvm, int level, int count)
> +{
> +       atomic64_add(count, &kvm->stat.page_stats.pages[level - 1]);
> +}
> +
>  /*
>   * Rules for using mmu_spte_clear_track_bits:
>   * It sets the sptep from present to nonpresent, and track the
>   * state bits, it is used to clear the last level sptep.
>   * Returns non-zero if the PTE was previously valid.
>   */
> -static int mmu_spte_clear_track_bits(u64 *sptep)
> +static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
>  {
>         kvm_pfn_t pfn;
>         u64 old_spte = *sptep;
> +       int level = sptep_to_sp(sptep)->role.level;
>
>         if (!spte_has_volatile_bits(old_spte))
>                 __update_clear_spte_fast(sptep, 0ull);
> @@ -607,6 +613,9 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
>         if (!is_shadow_present_pte(old_spte))
>                 return 0;
>
> +       if (is_last_spte(old_spte, level))

You can drop this check since it's part of the contract for calling
this function.

> +               kvm_update_page_stats(kvm, level, -1);
> +
>         pfn = spte_to_pfn(old_spte);
>
>         /*
> @@ -984,9 +993,10 @@ static void __pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
>         }
>  }
>
> -static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
> +static void pte_list_remove(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> +                           u64 *sptep)
>  {
> -       mmu_spte_clear_track_bits(sptep);
> +       mmu_spte_clear_track_bits(kvm, sptep);
>         __pte_list_remove(sptep, rmap_head);
>  }
>
> @@ -1119,7 +1129,7 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
>
>  static void drop_spte(struct kvm *kvm, u64 *sptep)
>  {
> -       if (mmu_spte_clear_track_bits(sptep))
> +       if (mmu_spte_clear_track_bits(kvm, sptep))
>                 rmap_remove(kvm, sptep);
>  }
>
> @@ -1129,7 +1139,6 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
>         if (is_large_pte(*sptep)) {
>                 WARN_ON(sptep_to_sp(sptep)->role.level == PG_LEVEL_4K);
>                 drop_spte(kvm, sptep);
> -               --kvm->stat.lpages;
>                 return true;
>         }
>
> @@ -1386,7 +1395,7 @@ static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
>         while ((sptep = rmap_get_first(rmap_head, &iter))) {
>                 rmap_printk("spte %p %llx.\n", sptep, *sptep);
>
> -               pte_list_remove(rmap_head, sptep);
> +               pte_list_remove(kvm, rmap_head, sptep);
>                 flush = true;
>         }
>
> @@ -1421,13 +1430,13 @@ static bool kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
>                 need_flush = 1;
>
>                 if (pte_write(pte)) {
> -                       pte_list_remove(rmap_head, sptep);
> +                       pte_list_remove(kvm, rmap_head, sptep);
>                         goto restart;
>                 } else {
>                         new_spte = kvm_mmu_changed_pte_notifier_make_spte(
>                                         *sptep, new_pfn);
>
> -                       mmu_spte_clear_track_bits(sptep);
> +                       mmu_spte_clear_track_bits(kvm, sptep);
>                         mmu_spte_set(sptep, new_spte);
>                 }
>         }
> @@ -2232,8 +2241,6 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
>         if (is_shadow_present_pte(pte)) {
>                 if (is_last_spte(pte, sp->role.level)) {
>                         drop_spte(kvm, spte);
> -                       if (is_large_pte(pte))
> -                               --kvm->stat.lpages;
>                 } else {
>                         child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
>                         drop_parent_pte(child, spte);
> @@ -2692,8 +2699,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>         trace_kvm_mmu_set_spte(level, gfn, sptep);
>
>         if (!was_rmapped) {
> -               if (is_large_pte(*sptep))
> -                       ++vcpu->kvm->stat.lpages;
> +               kvm_update_page_stats(vcpu->kvm, level, 1);
>                 rmap_count = rmap_add(vcpu, sptep, gfn);
>                 if (rmap_count > RMAP_RECYCLE_THRESHOLD)
>                         rmap_recycle(vcpu, sptep, gfn);
> @@ -5669,7 +5675,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>                 if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
>                     sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
>                                                                pfn, PG_LEVEL_NUM)) {
> -                       pte_list_remove(rmap_head, sptep);
> +                       pte_list_remove(kvm, rmap_head, sptep);
>
>                         if (kvm_available_flush_tlb_with_range())
>                                 kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index cba2ab5db2a0..eae404c15364 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -413,7 +413,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>         bool was_leaf = was_present && is_last_spte(old_spte, level);
>         bool is_leaf = is_present && is_last_spte(new_spte, level);
>         bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
> -       bool was_large, is_large;
>
>         WARN_ON(level > PT64_ROOT_MAX_LEVEL);
>         WARN_ON(level < PG_LEVEL_4K);
> @@ -472,18 +471,8 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>                 return;
>         }
>
> -       /*
> -        * Update large page stats if a large page is being zapped, created, or
> -        * is replacing an existing shadow page.
> -        */
> -       was_large = was_leaf && is_large_pte(old_spte);
> -       is_large = is_leaf && is_large_pte(new_spte);
> -       if (was_large != is_large) {
> -               if (was_large)
> -                       atomic64_sub(1, (atomic64_t *)&kvm->stat.lpages);
> -               else
> -                       atomic64_add(1, (atomic64_t *)&kvm->stat.lpages);
> -       }
> +       if (is_leaf != was_leaf)
> +               kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
>
>         if (was_leaf && is_dirty_spte(old_spte) &&
>             (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8166ad113fb2..3858d36d3c49 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -235,9 +235,12 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
>         STATS_DESC_COUNTER(VM, mmu_recycled),
>         STATS_DESC_COUNTER(VM, mmu_cache_miss),
>         STATS_DESC_ICOUNTER(VM, mmu_unsync),
> -       STATS_DESC_ICOUNTER(VM, lpages),
>         STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
> -       STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
> +       STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions),
> +       STATS_DESC_ICOUNTER(VM, page_stats.pages_4k),
> +       STATS_DESC_ICOUNTER(VM, page_stats.pages_2m),
> +       STATS_DESC_ICOUNTER(VM, page_stats.pages_1g),
> +       STATS_DESC_ICOUNTER(VM, page_stats.pages_512g)
>  };
>  static_assert(ARRAY_SIZE(kvm_vm_stats_desc) ==
>                 sizeof(struct kvm_vm_stat) / sizeof(u64));
> --
> 2.32.0.432.gabb21c7263-goog
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats
  2021-07-26 17:53 ` [PATCH v2 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats Mingwei Zhang
@ 2021-07-26 21:02   ` Ben Gardon
  2021-07-29 18:34   ` Sean Christopherson
  1 sibling, 0 replies; 15+ messages in thread
From: Ben Gardon @ 2021-07-26 21:02 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, LKML, Jing Zhang

On Mon, Jul 26, 2021 at 10:54 AM Mingwei Zhang <mizhang@google.com> wrote:
>
> Factor in whether or not the old/new SPTEs are shadow-present when
> adjusting the large page stats in the TDP MMU. A modified MMIO SPTE can
> toggle the page size bit, as bit 7 is used to store the MMIO generation,
> i.e. is_large_pte() can get a false positive when called on a MMIO SPTE.
> Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic
> value.
>
> Opportunistically move the logic below the check to verify at least one
> of the old/new SPTEs is shadow present.
>
> Use is/was_leaf even though is/was_present would suffice.  The code
> generation is roughly equivalent since all flags need to be computed
> prior to the code in question, and using the *_leaf flags will minimize
> the diff in a future enhancement to account all pages, i.e. will change
> the check to "is_leaf != was_leaf".
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index caac4ddb46df..cba2ab5db2a0 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -413,6 +413,7 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>         bool was_leaf = was_present && is_last_spte(old_spte, level);
>         bool is_leaf = is_present && is_last_spte(new_spte, level);
>         bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
> +       bool was_large, is_large;
>
>         WARN_ON(level > PT64_ROOT_MAX_LEVEL);
>         WARN_ON(level < PG_LEVEL_4K);
> @@ -446,13 +447,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>
>         trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
>
> -       if (is_large_pte(old_spte) != is_large_pte(new_spte)) {
> -               if (is_large_pte(old_spte))
> -                       atomic64_sub(1, (atomic64_t*)&kvm->stat.lpages);
> -               else
> -                       atomic64_add(1, (atomic64_t*)&kvm->stat.lpages);
> -       }
> -
>         /*
>          * The only times a SPTE should be changed from a non-present to
>          * non-present state is when an MMIO entry is installed/modified/
> @@ -478,6 +472,18 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>                 return;
>         }
>
> +       /*
> +        * Update large page stats if a large page is being zapped, created, or
> +        * is replacing an existing shadow page.
> +        */
> +       was_large = was_leaf && is_large_pte(old_spte);
> +       is_large = is_leaf && is_large_pte(new_spte);
> +       if (was_large != is_large) {
> +               if (was_large)
> +                       atomic64_sub(1, (atomic64_t *)&kvm->stat.lpages);
> +               else
> +                       atomic64_add(1, (atomic64_t *)&kvm->stat.lpages);
> +       }
>
>         if (was_leaf && is_dirty_spte(old_spte) &&
>             (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
> --
> 2.32.0.432.gabb21c7263-goog
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-07-26 20:41   ` Ben Gardon
@ 2021-07-26 21:06     ` Ben Gardon
  2021-07-29 18:45     ` Sean Christopherson
  1 sibling, 0 replies; 15+ messages in thread
From: Ben Gardon @ 2021-07-26 21:06 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, LKML, Jing Zhang

On Mon, Jul 26, 2021 at 1:41 PM Ben Gardon <bgardon@google.com> wrote:
>
> On Mon, Jul 26, 2021 at 10:54 AM Mingwei Zhang <mizhang@google.com> wrote:
> >
> > Existing KVM code tracks the number of large pages regardless of their
> > sizes. Therefore, when large page of 1GB (or larger) is adopted, the
> > information becomes less useful because lpages counts a mix of 1G and 2M
> > pages.
> >
> > So remove the lpages since it is easy for user space to aggregate the info.
> > Instead, provide a comprehensive page stats of all sizes from 4K to 512G.
> >
> > Suggested-by: Ben Gardon <bgardon@google.com>
> > Suggested-by: Jing Zhang <jingzhangos@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>

Besides the check which can be dropped in mmu_spte_clear_track_bits,
this looks good to me.

Reviewed-by: Ben Gardon <bgardon@google.com>

> > ---
> >  arch/x86/include/asm/kvm_host.h | 10 +++++++++-
> >  arch/x86/kvm/mmu.h              |  2 ++
> >  arch/x86/kvm/mmu/mmu.c          | 32 +++++++++++++++++++-------------
> >  arch/x86/kvm/mmu/tdp_mmu.c      | 15 ++-------------
> >  arch/x86/kvm/x86.c              |  7 +++++--
> >  5 files changed, 37 insertions(+), 29 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 974cbfb1eefe..2e4b6fd36e62 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1206,9 +1206,17 @@ struct kvm_vm_stat {
> >         u64 mmu_recycled;
> >         u64 mmu_cache_miss;
> >         u64 mmu_unsync;
> > -       u64 lpages;
> >         u64 nx_lpage_splits;
> >         u64 max_mmu_page_hash_collisions;
> > +       union {
> > +               struct {
> > +                       atomic64_t pages_4k;
> > +                       atomic64_t pages_2m;
> > +                       atomic64_t pages_1g;
> > +                       atomic64_t pages_512g;
> > +               };
> > +               atomic64_t pages[4];
> > +       } page_stats;
> >  };
> >
> >  struct kvm_vcpu_stat {
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 83e6c6965f1e..ad5638815311 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -240,4 +240,6 @@ static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
> >         return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
> >  }
> >
> > +void kvm_update_page_stats(struct kvm *kvm, int level, int count);
> > +
> >  #endif
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 442cc554ebd6..7e0fc760739b 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -588,16 +588,22 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
> >         return flush;
> >  }
> >
> > +void kvm_update_page_stats(struct kvm *kvm, int level, int count)
> > +{
> > +       atomic64_add(count, &kvm->stat.page_stats.pages[level - 1]);
> > +}
> > +
> >  /*
> >   * Rules for using mmu_spte_clear_track_bits:
> >   * It sets the sptep from present to nonpresent, and track the
> >   * state bits, it is used to clear the last level sptep.
> >   * Returns non-zero if the PTE was previously valid.
> >   */
> > -static int mmu_spte_clear_track_bits(u64 *sptep)
> > +static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
> >  {
> >         kvm_pfn_t pfn;
> >         u64 old_spte = *sptep;
> > +       int level = sptep_to_sp(sptep)->role.level;
> >
> >         if (!spte_has_volatile_bits(old_spte))
> >                 __update_clear_spte_fast(sptep, 0ull);
> > @@ -607,6 +613,9 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
> >         if (!is_shadow_present_pte(old_spte))
> >                 return 0;
> >
> > +       if (is_last_spte(old_spte, level))
>
> You can drop this check since it's part of the contract for calling
> this function.
>
> > +               kvm_update_page_stats(kvm, level, -1);
> > +
> >         pfn = spte_to_pfn(old_spte);
> >
> >         /*
> > @@ -984,9 +993,10 @@ static void __pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
> >         }
> >  }
> >
> > -static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
> > +static void pte_list_remove(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> > +                           u64 *sptep)
> >  {
> > -       mmu_spte_clear_track_bits(sptep);
> > +       mmu_spte_clear_track_bits(kvm, sptep);
> >         __pte_list_remove(sptep, rmap_head);
> >  }
> >
> > @@ -1119,7 +1129,7 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
> >
> >  static void drop_spte(struct kvm *kvm, u64 *sptep)
> >  {
> > -       if (mmu_spte_clear_track_bits(sptep))
> > +       if (mmu_spte_clear_track_bits(kvm, sptep))
> >                 rmap_remove(kvm, sptep);
> >  }
> >
> > @@ -1129,7 +1139,6 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
> >         if (is_large_pte(*sptep)) {
> >                 WARN_ON(sptep_to_sp(sptep)->role.level == PG_LEVEL_4K);
> >                 drop_spte(kvm, sptep);
> > -               --kvm->stat.lpages;
> >                 return true;
> >         }
> >
> > @@ -1386,7 +1395,7 @@ static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> >         while ((sptep = rmap_get_first(rmap_head, &iter))) {
> >                 rmap_printk("spte %p %llx.\n", sptep, *sptep);
> >
> > -               pte_list_remove(rmap_head, sptep);
> > +               pte_list_remove(kvm, rmap_head, sptep);
> >                 flush = true;
> >         }
> >
> > @@ -1421,13 +1430,13 @@ static bool kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> >                 need_flush = 1;
> >
> >                 if (pte_write(pte)) {
> > -                       pte_list_remove(rmap_head, sptep);
> > +                       pte_list_remove(kvm, rmap_head, sptep);
> >                         goto restart;
> >                 } else {
> >                         new_spte = kvm_mmu_changed_pte_notifier_make_spte(
> >                                         *sptep, new_pfn);
> >
> > -                       mmu_spte_clear_track_bits(sptep);
> > +                       mmu_spte_clear_track_bits(kvm, sptep);
> >                         mmu_spte_set(sptep, new_spte);
> >                 }
> >         }
> > @@ -2232,8 +2241,6 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
> >         if (is_shadow_present_pte(pte)) {
> >                 if (is_last_spte(pte, sp->role.level)) {
> >                         drop_spte(kvm, spte);
> > -                       if (is_large_pte(pte))
> > -                               --kvm->stat.lpages;
> >                 } else {
> >                         child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
> >                         drop_parent_pte(child, spte);
> > @@ -2692,8 +2699,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >         trace_kvm_mmu_set_spte(level, gfn, sptep);
> >
> >         if (!was_rmapped) {
> > -               if (is_large_pte(*sptep))
> > -                       ++vcpu->kvm->stat.lpages;
> > +               kvm_update_page_stats(vcpu->kvm, level, 1);
> >                 rmap_count = rmap_add(vcpu, sptep, gfn);
> >                 if (rmap_count > RMAP_RECYCLE_THRESHOLD)
> >                         rmap_recycle(vcpu, sptep, gfn);
> > @@ -5669,7 +5675,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
> >                 if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
> >                     sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
> >                                                                pfn, PG_LEVEL_NUM)) {
> > -                       pte_list_remove(rmap_head, sptep);
> > +                       pte_list_remove(kvm, rmap_head, sptep);
> >
> >                         if (kvm_available_flush_tlb_with_range())
> >                                 kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index cba2ab5db2a0..eae404c15364 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -413,7 +413,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> >         bool was_leaf = was_present && is_last_spte(old_spte, level);
> >         bool is_leaf = is_present && is_last_spte(new_spte, level);
> >         bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
> > -       bool was_large, is_large;
> >
> >         WARN_ON(level > PT64_ROOT_MAX_LEVEL);
> >         WARN_ON(level < PG_LEVEL_4K);
> > @@ -472,18 +471,8 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> >                 return;
> >         }
> >
> > -       /*
> > -        * Update large page stats if a large page is being zapped, created, or
> > -        * is replacing an existing shadow page.
> > -        */
> > -       was_large = was_leaf && is_large_pte(old_spte);
> > -       is_large = is_leaf && is_large_pte(new_spte);
> > -       if (was_large != is_large) {
> > -               if (was_large)
> > -                       atomic64_sub(1, (atomic64_t *)&kvm->stat.lpages);
> > -               else
> > -                       atomic64_add(1, (atomic64_t *)&kvm->stat.lpages);
> > -       }
> > +       if (is_leaf != was_leaf)
> > +               kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
> >
> >         if (was_leaf && is_dirty_spte(old_spte) &&
> >             (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 8166ad113fb2..3858d36d3c49 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -235,9 +235,12 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
> >         STATS_DESC_COUNTER(VM, mmu_recycled),
> >         STATS_DESC_COUNTER(VM, mmu_cache_miss),
> >         STATS_DESC_ICOUNTER(VM, mmu_unsync),
> > -       STATS_DESC_ICOUNTER(VM, lpages),
> >         STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
> > -       STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
> > +       STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions),
> > +       STATS_DESC_ICOUNTER(VM, page_stats.pages_4k),
> > +       STATS_DESC_ICOUNTER(VM, page_stats.pages_2m),
> > +       STATS_DESC_ICOUNTER(VM, page_stats.pages_1g),
> > +       STATS_DESC_ICOUNTER(VM, page_stats.pages_512g)
> >  };
> >  static_assert(ARRAY_SIZE(kvm_vm_stats_desc) ==
> >                 sizeof(struct kvm_vm_stat) / sizeof(u64));
> > --
> > 2.32.0.432.gabb21c7263-goog
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-07-26 17:53 ` [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats Mingwei Zhang
  2021-07-26 20:41   ` Ben Gardon
@ 2021-07-27 15:36   ` David Matlack
  2021-07-29  6:24     ` Mingwei Zhang
  1 sibling, 1 reply; 15+ messages in thread
From: David Matlack @ 2021-07-27 15:36 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, Ben Gardon,
	Jing Zhang

On Mon, Jul 26, 2021 at 10:53:57AM -0700, Mingwei Zhang wrote:
> Existing KVM code tracks the number of large pages regardless of their
> sizes. Therefore, when large page of 1GB (or larger) is adopted, the
> information becomes less useful because lpages counts a mix of 1G and 2M
> pages.
> 
> So remove the lpages since it is easy for user space to aggregate the info.
> Instead, provide a comprehensive page stats of all sizes from 4K to 512G.
> 
> Suggested-by: Ben Gardon <bgardon@google.com>
> Suggested-by: Jing Zhang <jingzhangos@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 10 +++++++++-
>  arch/x86/kvm/mmu.h              |  2 ++
>  arch/x86/kvm/mmu/mmu.c          | 32 +++++++++++++++++++-------------
>  arch/x86/kvm/mmu/tdp_mmu.c      | 15 ++-------------
>  arch/x86/kvm/x86.c              |  7 +++++--
>  5 files changed, 37 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 974cbfb1eefe..2e4b6fd36e62 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1206,9 +1206,17 @@ struct kvm_vm_stat {
>  	u64 mmu_recycled;
>  	u64 mmu_cache_miss;
>  	u64 mmu_unsync;
> -	u64 lpages;
>  	u64 nx_lpage_splits;
>  	u64 max_mmu_page_hash_collisions;
> +	union {
> +		struct {
> +			atomic64_t pages_4k;
> +			atomic64_t pages_2m;
> +			atomic64_t pages_1g;
> +			atomic64_t pages_512g;
> +		};
> +		atomic64_t pages[4];
> +	} page_stats;
>  };
>  
>  struct kvm_vcpu_stat {
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 83e6c6965f1e..ad5638815311 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -240,4 +240,6 @@ static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
>  	return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
>  }
>  
> +void kvm_update_page_stats(struct kvm *kvm, int level, int count);
> +
>  #endif
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 442cc554ebd6..7e0fc760739b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -588,16 +588,22 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
>  	return flush;
>  }
>  
> +void kvm_update_page_stats(struct kvm *kvm, int level, int count)
> +{
> +	atomic64_add(count, &kvm->stat.page_stats.pages[level - 1]);
> +}
> +
>  /*
>   * Rules for using mmu_spte_clear_track_bits:
>   * It sets the sptep from present to nonpresent, and track the
>   * state bits, it is used to clear the last level sptep.
>   * Returns non-zero if the PTE was previously valid.
>   */
> -static int mmu_spte_clear_track_bits(u64 *sptep)
> +static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
>  {
>  	kvm_pfn_t pfn;
>  	u64 old_spte = *sptep;
> +	int level = sptep_to_sp(sptep)->role.level;
>  
>  	if (!spte_has_volatile_bits(old_spte))
>  		__update_clear_spte_fast(sptep, 0ull);
> @@ -607,6 +613,9 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
>  	if (!is_shadow_present_pte(old_spte))
>  		return 0;
>  
> +	if (is_last_spte(old_spte, level))
> +		kvm_update_page_stats(kvm, level, -1);
> +
>  	pfn = spte_to_pfn(old_spte);
>  
>  	/*
> @@ -984,9 +993,10 @@ static void __pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
>  	}
>  }
>  
> -static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
> +static void pte_list_remove(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> +			    u64 *sptep)
>  {
> -	mmu_spte_clear_track_bits(sptep);
> +	mmu_spte_clear_track_bits(kvm, sptep);
>  	__pte_list_remove(sptep, rmap_head);
>  }
>  
> @@ -1119,7 +1129,7 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
>  
>  static void drop_spte(struct kvm *kvm, u64 *sptep)
>  {
> -	if (mmu_spte_clear_track_bits(sptep))
> +	if (mmu_spte_clear_track_bits(kvm, sptep))
>  		rmap_remove(kvm, sptep);
>  }
>  
> @@ -1129,7 +1139,6 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
>  	if (is_large_pte(*sptep)) {
>  		WARN_ON(sptep_to_sp(sptep)->role.level == PG_LEVEL_4K);
>  		drop_spte(kvm, sptep);
> -		--kvm->stat.lpages;
>  		return true;
>  	}
>  
> @@ -1386,7 +1395,7 @@ static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
>  	while ((sptep = rmap_get_first(rmap_head, &iter))) {
>  		rmap_printk("spte %p %llx.\n", sptep, *sptep);
>  
> -		pte_list_remove(rmap_head, sptep);
> +		pte_list_remove(kvm, rmap_head, sptep);
>  		flush = true;
>  	}
>  
> @@ -1421,13 +1430,13 @@ static bool kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
>  		need_flush = 1;
>  
>  		if (pte_write(pte)) {
> -			pte_list_remove(rmap_head, sptep);
> +			pte_list_remove(kvm, rmap_head, sptep);
>  			goto restart;
>  		} else {
>  			new_spte = kvm_mmu_changed_pte_notifier_make_spte(
>  					*sptep, new_pfn);
>  
> -			mmu_spte_clear_track_bits(sptep);
> +			mmu_spte_clear_track_bits(kvm, sptep);
>  			mmu_spte_set(sptep, new_spte);
>  		}
>  	}
> @@ -2232,8 +2241,6 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
>  	if (is_shadow_present_pte(pte)) {
>  		if (is_last_spte(pte, sp->role.level)) {
>  			drop_spte(kvm, spte);
> -			if (is_large_pte(pte))
> -				--kvm->stat.lpages;
>  		} else {
>  			child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
>  			drop_parent_pte(child, spte);
> @@ -2692,8 +2699,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	trace_kvm_mmu_set_spte(level, gfn, sptep);
>  
>  	if (!was_rmapped) {
> -		if (is_large_pte(*sptep))
> -			++vcpu->kvm->stat.lpages;
> +		kvm_update_page_stats(vcpu->kvm, level, 1);
>  		rmap_count = rmap_add(vcpu, sptep, gfn);
>  		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
>  			rmap_recycle(vcpu, sptep, gfn);
> @@ -5669,7 +5675,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>  		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
>  		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
>  							       pfn, PG_LEVEL_NUM)) {
> -			pte_list_remove(rmap_head, sptep);
> +			pte_list_remove(kvm, rmap_head, sptep);
>  
>  			if (kvm_available_flush_tlb_with_range())
>  				kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index cba2ab5db2a0..eae404c15364 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -413,7 +413,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>  	bool was_leaf = was_present && is_last_spte(old_spte, level);
>  	bool is_leaf = is_present && is_last_spte(new_spte, level);
>  	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
> -	bool was_large, is_large;
>  
>  	WARN_ON(level > PT64_ROOT_MAX_LEVEL);
>  	WARN_ON(level < PG_LEVEL_4K);
> @@ -472,18 +471,8 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>  		return;
>  	}
>  
> -	/*
> -	 * Update large page stats if a large page is being zapped, created, or
> -	 * is replacing an existing shadow page.
> -	 */
> -	was_large = was_leaf && is_large_pte(old_spte);
> -	is_large = is_leaf && is_large_pte(new_spte);
> -	if (was_large != is_large) {
> -		if (was_large)
> -			atomic64_sub(1, (atomic64_t *)&kvm->stat.lpages);
> -		else
> -			atomic64_add(1, (atomic64_t *)&kvm->stat.lpages);
> -	}
> +	if (is_leaf != was_leaf)
> +		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
>  
>  	if (was_leaf && is_dirty_spte(old_spte) &&
>  	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8166ad113fb2..3858d36d3c49 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -235,9 +235,12 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
>  	STATS_DESC_COUNTER(VM, mmu_recycled),
>  	STATS_DESC_COUNTER(VM, mmu_cache_miss),
>  	STATS_DESC_ICOUNTER(VM, mmu_unsync),
> -	STATS_DESC_ICOUNTER(VM, lpages),
>  	STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
> -	STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
> +	STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions),
> +	STATS_DESC_ICOUNTER(VM, page_stats.pages_4k),
> +	STATS_DESC_ICOUNTER(VM, page_stats.pages_2m),
> +	STATS_DESC_ICOUNTER(VM, page_stats.pages_1g),
> +	STATS_DESC_ICOUNTER(VM, page_stats.pages_512g)

FYI this will make the stat names "page_stats.pages_4k",
"page_stats.pages_2m", etc. Is that ok?

If you want the stat names to be just "pages_4k", "pages_2m", etc. you
can make the page_stats union anonymous.

>  };
>  static_assert(ARRAY_SIZE(kvm_vm_stats_desc) ==
>  		sizeof(struct kvm_vm_stat) / sizeof(u64));
> -- 
> 2.32.0.432.gabb21c7263-goog
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-07-27 15:36   ` David Matlack
@ 2021-07-29  6:24     ` Mingwei Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2021-07-29  6:24 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, Ben Gardon,
	Jing Zhang

Hi David and Ben,

Thank you both for the suggestions. It Makes sense to me and I will
update the patch set with the change added in the next version.

Regards.
-Mingwei

On Tue, Jul 27, 2021 at 8:36 AM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, Jul 26, 2021 at 10:53:57AM -0700, Mingwei Zhang wrote:
> > Existing KVM code tracks the number of large pages regardless of their
> > sizes. Therefore, when large page of 1GB (or larger) is adopted, the
> > information becomes less useful because lpages counts a mix of 1G and 2M
> > pages.
> >
> > So remove the lpages since it is easy for user space to aggregate the info.
> > Instead, provide a comprehensive page stats of all sizes from 4K to 512G.
> >
> > Suggested-by: Ben Gardon <bgardon@google.com>
> > Suggested-by: Jing Zhang <jingzhangos@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 10 +++++++++-
> >  arch/x86/kvm/mmu.h              |  2 ++
> >  arch/x86/kvm/mmu/mmu.c          | 32 +++++++++++++++++++-------------
> >  arch/x86/kvm/mmu/tdp_mmu.c      | 15 ++-------------
> >  arch/x86/kvm/x86.c              |  7 +++++--
> >  5 files changed, 37 insertions(+), 29 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 974cbfb1eefe..2e4b6fd36e62 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1206,9 +1206,17 @@ struct kvm_vm_stat {
> >       u64 mmu_recycled;
> >       u64 mmu_cache_miss;
> >       u64 mmu_unsync;
> > -     u64 lpages;
> >       u64 nx_lpage_splits;
> >       u64 max_mmu_page_hash_collisions;
> > +     union {
> > +             struct {
> > +                     atomic64_t pages_4k;
> > +                     atomic64_t pages_2m;
> > +                     atomic64_t pages_1g;
> > +                     atomic64_t pages_512g;
> > +             };
> > +             atomic64_t pages[4];
> > +     } page_stats;
> >  };
> >
> >  struct kvm_vcpu_stat {
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 83e6c6965f1e..ad5638815311 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -240,4 +240,6 @@ static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
> >       return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
> >  }
> >
> > +void kvm_update_page_stats(struct kvm *kvm, int level, int count);
> > +
> >  #endif
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 442cc554ebd6..7e0fc760739b 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -588,16 +588,22 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
> >       return flush;
> >  }
> >
> > +void kvm_update_page_stats(struct kvm *kvm, int level, int count)
> > +{
> > +     atomic64_add(count, &kvm->stat.page_stats.pages[level - 1]);
> > +}
> > +
> >  /*
> >   * Rules for using mmu_spte_clear_track_bits:
> >   * It sets the sptep from present to nonpresent, and track the
> >   * state bits, it is used to clear the last level sptep.
> >   * Returns non-zero if the PTE was previously valid.
> >   */
> > -static int mmu_spte_clear_track_bits(u64 *sptep)
> > +static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
> >  {
> >       kvm_pfn_t pfn;
> >       u64 old_spte = *sptep;
> > +     int level = sptep_to_sp(sptep)->role.level;
> >
> >       if (!spte_has_volatile_bits(old_spte))
> >               __update_clear_spte_fast(sptep, 0ull);
> > @@ -607,6 +613,9 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
> >       if (!is_shadow_present_pte(old_spte))
> >               return 0;
> >
> > +     if (is_last_spte(old_spte, level))
> > +             kvm_update_page_stats(kvm, level, -1);
> > +
> >       pfn = spte_to_pfn(old_spte);
> >
> >       /*
> > @@ -984,9 +993,10 @@ static void __pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
> >       }
> >  }
> >
> > -static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
> > +static void pte_list_remove(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> > +                         u64 *sptep)
> >  {
> > -     mmu_spte_clear_track_bits(sptep);
> > +     mmu_spte_clear_track_bits(kvm, sptep);
> >       __pte_list_remove(sptep, rmap_head);
> >  }
> >
> > @@ -1119,7 +1129,7 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
> >
> >  static void drop_spte(struct kvm *kvm, u64 *sptep)
> >  {
> > -     if (mmu_spte_clear_track_bits(sptep))
> > +     if (mmu_spte_clear_track_bits(kvm, sptep))
> >               rmap_remove(kvm, sptep);
> >  }
> >
> > @@ -1129,7 +1139,6 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
> >       if (is_large_pte(*sptep)) {
> >               WARN_ON(sptep_to_sp(sptep)->role.level == PG_LEVEL_4K);
> >               drop_spte(kvm, sptep);
> > -             --kvm->stat.lpages;
> >               return true;
> >       }
> >
> > @@ -1386,7 +1395,7 @@ static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> >       while ((sptep = rmap_get_first(rmap_head, &iter))) {
> >               rmap_printk("spte %p %llx.\n", sptep, *sptep);
> >
> > -             pte_list_remove(rmap_head, sptep);
> > +             pte_list_remove(kvm, rmap_head, sptep);
> >               flush = true;
> >       }
> >
> > @@ -1421,13 +1430,13 @@ static bool kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> >               need_flush = 1;
> >
> >               if (pte_write(pte)) {
> > -                     pte_list_remove(rmap_head, sptep);
> > +                     pte_list_remove(kvm, rmap_head, sptep);
> >                       goto restart;
> >               } else {
> >                       new_spte = kvm_mmu_changed_pte_notifier_make_spte(
> >                                       *sptep, new_pfn);
> >
> > -                     mmu_spte_clear_track_bits(sptep);
> > +                     mmu_spte_clear_track_bits(kvm, sptep);
> >                       mmu_spte_set(sptep, new_spte);
> >               }
> >       }
> > @@ -2232,8 +2241,6 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
> >       if (is_shadow_present_pte(pte)) {
> >               if (is_last_spte(pte, sp->role.level)) {
> >                       drop_spte(kvm, spte);
> > -                     if (is_large_pte(pte))
> > -                             --kvm->stat.lpages;
> >               } else {
> >                       child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
> >                       drop_parent_pte(child, spte);
> > @@ -2692,8 +2699,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >       trace_kvm_mmu_set_spte(level, gfn, sptep);
> >
> >       if (!was_rmapped) {
> > -             if (is_large_pte(*sptep))
> > -                     ++vcpu->kvm->stat.lpages;
> > +             kvm_update_page_stats(vcpu->kvm, level, 1);
> >               rmap_count = rmap_add(vcpu, sptep, gfn);
> >               if (rmap_count > RMAP_RECYCLE_THRESHOLD)
> >                       rmap_recycle(vcpu, sptep, gfn);
> > @@ -5669,7 +5675,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
> >               if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
> >                   sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
> >                                                              pfn, PG_LEVEL_NUM)) {
> > -                     pte_list_remove(rmap_head, sptep);
> > +                     pte_list_remove(kvm, rmap_head, sptep);
> >
> >                       if (kvm_available_flush_tlb_with_range())
> >                               kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index cba2ab5db2a0..eae404c15364 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -413,7 +413,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> >       bool was_leaf = was_present && is_last_spte(old_spte, level);
> >       bool is_leaf = is_present && is_last_spte(new_spte, level);
> >       bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
> > -     bool was_large, is_large;
> >
> >       WARN_ON(level > PT64_ROOT_MAX_LEVEL);
> >       WARN_ON(level < PG_LEVEL_4K);
> > @@ -472,18 +471,8 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> >               return;
> >       }
> >
> > -     /*
> > -      * Update large page stats if a large page is being zapped, created, or
> > -      * is replacing an existing shadow page.
> > -      */
> > -     was_large = was_leaf && is_large_pte(old_spte);
> > -     is_large = is_leaf && is_large_pte(new_spte);
> > -     if (was_large != is_large) {
> > -             if (was_large)
> > -                     atomic64_sub(1, (atomic64_t *)&kvm->stat.lpages);
> > -             else
> > -                     atomic64_add(1, (atomic64_t *)&kvm->stat.lpages);
> > -     }
> > +     if (is_leaf != was_leaf)
> > +             kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
> >
> >       if (was_leaf && is_dirty_spte(old_spte) &&
> >           (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 8166ad113fb2..3858d36d3c49 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -235,9 +235,12 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
> >       STATS_DESC_COUNTER(VM, mmu_recycled),
> >       STATS_DESC_COUNTER(VM, mmu_cache_miss),
> >       STATS_DESC_ICOUNTER(VM, mmu_unsync),
> > -     STATS_DESC_ICOUNTER(VM, lpages),
> >       STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
> > -     STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
> > +     STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions),
> > +     STATS_DESC_ICOUNTER(VM, page_stats.pages_4k),
> > +     STATS_DESC_ICOUNTER(VM, page_stats.pages_2m),
> > +     STATS_DESC_ICOUNTER(VM, page_stats.pages_1g),
> > +     STATS_DESC_ICOUNTER(VM, page_stats.pages_512g)
>
> FYI this will make the stat names "page_stats.pages_4k",
> "page_stats.pages_2m", etc. Is that ok?
>
> If you want the stat names to be just "pages_4k", "pages_2m", etc. you
> can make the page_stats union anonymous.
>
> >  };
> >  static_assert(ARRAY_SIZE(kvm_vm_stats_desc) ==
> >               sizeof(struct kvm_vm_stat) / sizeof(u64));
> > --
> > 2.32.0.432.gabb21c7263-goog
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte
  2021-07-26 17:53 ` [PATCH v2 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte Mingwei Zhang
  2021-07-26 20:23   ` Ben Gardon
@ 2021-07-29 18:16   ` Sean Christopherson
  1 sibling, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2021-07-29 18:16 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Jing Zhang

On Mon, Jul 26, 2021, Mingwei Zhang wrote:
> Drop an unnecessary is_shadow_present_pte() check when updating the rmaps
> after installing a non-MMIO SPTE.  set_spte() is used only to create
> shadow-present SPTEs, e.g. MMIO SPTEs are handled early on, mmu_set_spte()
> runs with mmu_lock held for write, i.e. the SPTE can't be zapped between
> writing the SPTE and updating the rmaps.
> 
> Opportunistically combine the "new SPTE" logic for large pages and rmaps.
> 
> No functional change intended.
> 
> Suggested-by: Ben Gardon <bgardon@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats
  2021-07-26 17:53 ` [PATCH v2 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats Mingwei Zhang
  2021-07-26 21:02   ` Ben Gardon
@ 2021-07-29 18:34   ` Sean Christopherson
  2021-07-29 19:01     ` Mingwei Zhang
  1 sibling, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2021-07-29 18:34 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Jing Zhang

On Mon, Jul 26, 2021, Mingwei Zhang wrote:
> Factor in whether or not the old/new SPTEs are shadow-present when
> adjusting the large page stats in the TDP MMU. A modified MMIO SPTE can
> toggle the page size bit, as bit 7 is used to store the MMIO generation,
> i.e. is_large_pte() can get a false positive when called on a MMIO SPTE.
> Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic
> value.
> 
> Opportunistically move the logic below the check to verify at least one
> of the old/new SPTEs is shadow present.
> 
> Use is/was_leaf even though is/was_present would suffice.  The code
> generation is roughly equivalent since all flags need to be computed
> prior to the code in question, and using the *_leaf flags will minimize
> the diff in a future enhancement to account all pages, i.e. will change
> the check to "is_leaf != was_leaf".
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>

There's no hard rule for when to use Suggested-by vs. giving Author credit, but
in this case, since you took the patch and changelog verbatim[*] (sans the missing
tags below), it's more polite to take the full patch (with me as Author in
this case) and add your SOB since you're posting the patch.

  Fixes: 1699f65c8b65 ("kvm/x86: Fix 'lpages' kvm stat for TDM MMU")
  Cc: stable@vger.kernel.org

[*] https://lkml.kernel.org/r/YPho0ME5pSjqRSoc@google.com

> Signed-off-by: Mingwei Zhang <mizhang@google.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-07-26 20:41   ` Ben Gardon
  2021-07-26 21:06     ` Ben Gardon
@ 2021-07-29 18:45     ` Sean Christopherson
  2021-07-29 19:02       ` Mingwei Zhang
  1 sibling, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2021-07-29 18:45 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Mingwei Zhang, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, LKML, Jing Zhang

On Mon, Jul 26, 2021, Ben Gardon wrote:
> On Mon, Jul 26, 2021 at 10:54 AM Mingwei Zhang <mizhang@google.com> wrote:
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 83e6c6965f1e..ad5638815311 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -240,4 +240,6 @@ static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
> >         return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
> >  }
> >
> > +void kvm_update_page_stats(struct kvm *kvm, int level, int count);
> > +
> >  #endif
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 442cc554ebd6..7e0fc760739b 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -588,16 +588,22 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
> >         return flush;
> >  }
> >
> > +void kvm_update_page_stats(struct kvm *kvm, int level, int count)
> > +{
> > +       atomic64_add(count, &kvm->stat.page_stats.pages[level - 1]);
> > +}

This can be static inline in the header.  Ignoring prolog+RET, it's four instructions,
and two of those are sign extending input params.

> > +
> >  /*
> >   * Rules for using mmu_spte_clear_track_bits:
> >   * It sets the sptep from present to nonpresent, and track the
> >   * state bits, it is used to clear the last level sptep.
> >   * Returns non-zero if the PTE was previously valid.
> >   */
> > -static int mmu_spte_clear_track_bits(u64 *sptep)
> > +static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
> >  {
> >         kvm_pfn_t pfn;
> >         u64 old_spte = *sptep;
> > +       int level = sptep_to_sp(sptep)->role.level;
> >
> >         if (!spte_has_volatile_bits(old_spte))
> >                 __update_clear_spte_fast(sptep, 0ull);
> > @@ -607,6 +613,9 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
> >         if (!is_shadow_present_pte(old_spte))
> >                 return 0;
> >
> > +       if (is_last_spte(old_spte, level))
> 
> You can drop this check since it's part of the contract for calling
> this function.

Ah, nice!  I overlooked that.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats
  2021-07-29 18:34   ` Sean Christopherson
@ 2021-07-29 19:01     ` Mingwei Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2021-07-29 19:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Jing Zhang

oh, definitely. Sorry for the confusion.

On Thu, Jul 29, 2021 at 11:34 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jul 26, 2021, Mingwei Zhang wrote:
> > Factor in whether or not the old/new SPTEs are shadow-present when
> > adjusting the large page stats in the TDP MMU. A modified MMIO SPTE can
> > toggle the page size bit, as bit 7 is used to store the MMIO generation,
> > i.e. is_large_pte() can get a false positive when called on a MMIO SPTE.
> > Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic
> > value.
> >
> > Opportunistically move the logic below the check to verify at least one
> > of the old/new SPTEs is shadow present.
> >
> > Use is/was_leaf even though is/was_present would suffice.  The code
> > generation is roughly equivalent since all flags need to be computed
> > prior to the code in question, and using the *_leaf flags will minimize
> > the diff in a future enhancement to account all pages, i.e. will change
> > the check to "is_leaf != was_leaf".
> >
> > Suggested-by: Sean Christopherson <seanjc@google.com>
>
> There's no hard rule for when to use Suggested-by vs. giving Author credit, but
> in this case, since you took the patch and changelog verbatim[*] (sans the missing
> tags below), it's more polite to take the full patch (with me as Author in
> this case) and add your SOB since you're posting the patch.
>
>   Fixes: 1699f65c8b65 ("kvm/x86: Fix 'lpages' kvm stat for TDM MMU")
>   Cc: stable@vger.kernel.org
>
> [*] https://lkml.kernel.org/r/YPho0ME5pSjqRSoc@google.com
>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-07-29 18:45     ` Sean Christopherson
@ 2021-07-29 19:02       ` Mingwei Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2021-07-29 19:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ben Gardon, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, LKML, Jing Zhang

On Thu, Jul 29, 2021 at 11:45 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jul 26, 2021, Ben Gardon wrote:
> > On Mon, Jul 26, 2021 at 10:54 AM Mingwei Zhang <mizhang@google.com> wrote:
> > > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > > index 83e6c6965f1e..ad5638815311 100644
> > > --- a/arch/x86/kvm/mmu.h
> > > +++ b/arch/x86/kvm/mmu.h
> > > @@ -240,4 +240,6 @@ static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
> > >         return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
> > >  }
> > >
> > > +void kvm_update_page_stats(struct kvm *kvm, int level, int count);
> > > +
> > >  #endif
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 442cc554ebd6..7e0fc760739b 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -588,16 +588,22 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
> > >         return flush;
> > >  }
> > >
> > > +void kvm_update_page_stats(struct kvm *kvm, int level, int count)
> > > +{
> > > +       atomic64_add(count, &kvm->stat.page_stats.pages[level - 1]);
> > > +}
>
> This can be static inline in the header.  Ignoring prolog+RET, it's four instructions,
> and two of those are sign extending input params.

will do.It is really nice to see that this big function has been
finally shrinked to a single-line routine.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-07-29 19:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-26 17:53 [PATCH v2 0/3] Add detailed page size stats in KVM stats Mingwei Zhang
2021-07-26 17:53 ` [PATCH v2 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte Mingwei Zhang
2021-07-26 20:23   ` Ben Gardon
2021-07-29 18:16   ` Sean Christopherson
2021-07-26 17:53 ` [PATCH v2 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats Mingwei Zhang
2021-07-26 21:02   ` Ben Gardon
2021-07-29 18:34   ` Sean Christopherson
2021-07-29 19:01     ` Mingwei Zhang
2021-07-26 17:53 ` [PATCH v2 3/3] KVM: x86/mmu: Add detailed page size stats Mingwei Zhang
2021-07-26 20:41   ` Ben Gardon
2021-07-26 21:06     ` Ben Gardon
2021-07-29 18:45     ` Sean Christopherson
2021-07-29 19:02       ` Mingwei Zhang
2021-07-27 15:36   ` David Matlack
2021-07-29  6:24     ` Mingwei Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).