* [PATCH v4 0/3] Add detailed page size stats in KVM stats
@ 2021-08-03  4:46 Mingwei Zhang
  2021-08-03  4:46 ` [PATCH v4 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte Mingwei Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Mingwei Zhang @ 2021-08-03  4:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, David Matlack,
	Mingwei Zhang, Jing Zhang

This series adds detailed (large and regular) page size info to KVM
stats and deprecates the old aggregate counter: lpages.

To support both the legacy MMU and the TDP MMU, we use atomic types
for all page stats.

v3 -> v4:
 - rebase to origin/queue. [sean]
 - replace the lpages with page stats in place to avoid conflicts. [sean]

v2 -> v3:
 - move kvm_update_page_stats to mmu.h as a static inline function. [sean]
 - remove is_last_spte check in mmu_spte_clear_track_bits. [bgardon]
 - change page_stats union by making it anonymous. [dmatlack]

v1 -> v2:
 - refactor kvm_update_page_stats and remove 'spte' argument. [sean]
 - remove 'lpages' as it can be aggregated in user space. [sean]
 - fix lpages stats update issue in __handle_change_pte [sean]
 - fix style issues and typos. [ben/sean]

pre-v1 (internal reviewers):
 - use atomic in all page stats and use 'level' as index. [sean]
 - use an extra argument in kvm_update_page_stats for atomic/non-atomic.
   [bgardon]
 - be careful about the difference between the legacy MMU and the TDP MMU.
   [jingzhangos]

Mingwei Zhang (2):
  KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte
  KVM: x86/mmu: Add detailed page size stats

Sean Christopherson (1):
  KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage
    stats

 arch/x86/include/asm/kvm_host.h | 10 ++++++-
 arch/x86/kvm/mmu.h              |  4 +++
 arch/x86/kvm/mmu/mmu.c          | 50 +++++++++++++++------------------
 arch/x86/kvm/mmu/tdp_mmu.c      |  9 ++----
 arch/x86/kvm/x86.c              |  5 +++-
 5 files changed, 42 insertions(+), 36 deletions(-)

--
2.32.0.554.ge1b32706d8-goog



* [PATCH v4 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte
  2021-08-03  4:46 [PATCH v4 0/3] Add detailed page size stats in KVM stats Mingwei Zhang
@ 2021-08-03  4:46 ` Mingwei Zhang
  2021-08-03  4:46 ` [PATCH v4 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats Mingwei Zhang
  2021-08-03  4:46 ` [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats Mingwei Zhang
  2 siblings, 0 replies; 12+ messages in thread
From: Mingwei Zhang @ 2021-08-03  4:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, David Matlack,
	Mingwei Zhang, Jing Zhang

Drop an unnecessary is_shadow_present_pte() check when updating the rmaps
after installing a non-MMIO SPTE.  set_spte() is used only to create
shadow-present SPTEs, e.g. MMIO SPTEs are handled early on, and
mmu_set_spte() runs with mmu_lock held for write, i.e. the SPTE can't
be zapped between writing the SPTE and updating the rmaps.

Opportunistically combine the "new SPTE" logic for large pages and rmaps.

No functional change intended.

Suggested-by: Ben Gardon <bgardon@google.com>

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8cdfd8d45c4..f614e9df3c3b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2734,17 +2734,13 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
 	trace_kvm_mmu_set_spte(level, gfn, sptep);
-	if (!was_rmapped && is_large_pte(*sptep))
-		++vcpu->kvm->stat.lpages;
 
-	if (is_shadow_present_pte(*sptep)) {
-		if (!was_rmapped) {
-			rmap_count = rmap_add(vcpu, sptep, gfn);
-			if (rmap_count > vcpu->kvm->stat.max_mmu_rmap_size)
-				vcpu->kvm->stat.max_mmu_rmap_size = rmap_count;
-			if (rmap_count > RMAP_RECYCLE_THRESHOLD)
-				rmap_recycle(vcpu, sptep, gfn);
-		}
+	if (!was_rmapped) {
+		if (is_large_pte(*sptep))
+			++vcpu->kvm->stat.lpages;
+		rmap_count = rmap_add(vcpu, sptep, gfn);
+		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
+			rmap_recycle(vcpu, sptep, gfn);
 	}
 
 	return ret;
-- 
2.32.0.554.ge1b32706d8-goog



* [PATCH v4 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats
  2021-08-03  4:46 [PATCH v4 0/3] Add detailed page size stats in KVM stats Mingwei Zhang
  2021-08-03  4:46 ` [PATCH v4 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte Mingwei Zhang
@ 2021-08-03  4:46 ` Mingwei Zhang
  2021-08-03  4:46 ` [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats Mingwei Zhang
  2 siblings, 0 replies; 12+ messages in thread
From: Mingwei Zhang @ 2021-08-03  4:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, David Matlack,
	Mingwei Zhang, Jing Zhang

From: Sean Christopherson <seanjc@google.com>

Factor in whether or not the old/new SPTEs are shadow-present when
adjusting the large page stats in the TDP MMU.  A modified MMIO SPTE can
toggle the page size bit, as bit 7 is used to store the MMIO generation,
i.e. is_large_pte() can get a false positive when called on an MMIO SPTE.
Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic
value.

Opportunistically move the logic below the check to verify at least one
of the old/new SPTEs is shadow present.

Use is/was_leaf even though is/was_present would suffice.  The code
generation is roughly equivalent since all flags need to be computed
prior to the code in question, and using the *_leaf flags will minimize
the diff in a future enhancement to account all pages, i.e. will change
the check to "is_leaf != was_leaf".

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>

Fixes: 1699f65c8b65 ("kvm/x86: Fix 'lpages' kvm stat for TDM MMU")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 43f12f5d12c0..4b0953fed12e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -413,6 +413,7 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	bool was_leaf = was_present && is_last_spte(old_spte, level);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	bool was_large, is_large;
 
 	WARN_ON(level > PT64_ROOT_MAX_LEVEL);
 	WARN_ON(level < PG_LEVEL_4K);
@@ -446,13 +447,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 
 	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
 
-	if (is_large_pte(old_spte) != is_large_pte(new_spte)) {
-		if (is_large_pte(old_spte))
-			atomic64_sub(1, (atomic64_t*)&kvm->stat.lpages);
-		else
-			atomic64_add(1, (atomic64_t*)&kvm->stat.lpages);
-	}
-
 	/*
 	 * The only times a SPTE should be changed from a non-present to
 	 * non-present state is when an MMIO entry is installed/modified/
@@ -478,6 +472,18 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 		return;
 	}
 
+	/*
+	 * Update large page stats if a large page is being zapped, created, or
+	 * is replacing an existing shadow page.
+	 */
+	was_large = was_leaf && is_large_pte(old_spte);
+	is_large = is_leaf && is_large_pte(new_spte);
+	if (was_large != is_large) {
+		if (was_large)
+			atomic64_sub(1, (atomic64_t *)&kvm->stat.lpages);
+		else
+			atomic64_add(1, (atomic64_t *)&kvm->stat.lpages);
+	}
 
 	if (was_leaf && is_dirty_spte(old_spte) &&
 	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
-- 
2.32.0.554.ge1b32706d8-goog



* [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-08-03  4:46 [PATCH v4 0/3] Add detailed page size stats in KVM stats Mingwei Zhang
  2021-08-03  4:46 ` [PATCH v4 1/3] KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte Mingwei Zhang
  2021-08-03  4:46 ` [PATCH v4 2/3] KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats Mingwei Zhang
@ 2021-08-03  4:46 ` Mingwei Zhang
  2021-08-09 22:26   ` Jim Mattson
  2021-08-11 11:36   ` Paolo Bonzini
  2 siblings, 2 replies; 12+ messages in thread
From: Mingwei Zhang @ 2021-08-03  4:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, David Matlack,
	Mingwei Zhang, Jing Zhang

Existing KVM code tracks the number of large pages regardless of their
sizes. Therefore, when 1GB (or larger) pages are used, the information
becomes less useful because lpages counts a mix of 1G and 2M pages.

So remove lpages, since it is easy for user space to aggregate the
info. Instead, provide comprehensive page stats for all sizes from 4K
to 512G.

Suggested-by: Ben Gardon <bgardon@google.com>

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Cc: Jing Zhang <jingzhangos@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 10 ++++++++-
 arch/x86/kvm/mmu.h              |  4 ++++
 arch/x86/kvm/mmu/mmu.c          | 38 ++++++++++++++++-----------------
 arch/x86/kvm/mmu/tdp_mmu.c      | 15 ++-----------
 arch/x86/kvm/x86.c              |  5 ++++-
 5 files changed, 38 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 99f37781a6fc..c95bf4bbd2ff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1206,7 +1206,15 @@ struct kvm_vm_stat {
 	u64 mmu_recycled;
 	u64 mmu_cache_miss;
 	u64 mmu_unsync;
-	u64 lpages;
+	union {
+		struct {
+			atomic64_t pages_4k;
+			atomic64_t pages_2m;
+			atomic64_t pages_1g;
+			atomic64_t pages_512g;
+		};
+		atomic64_t pages[4];
+	};
 	u64 nx_lpage_splits;
 	u64 max_mmu_page_hash_collisions;
 	u64 max_mmu_rmap_size;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 83e6c6965f1e..2883789fb5fb 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -240,4 +240,8 @@ static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 	return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
 }
 
+static inline void kvm_update_page_stats(struct kvm *kvm, int level, int count)
+{
+	atomic64_add(count, &kvm->stat.pages[level - 1]);
+}
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f614e9df3c3b..8f46c1164f3f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -599,10 +599,11 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
  * state bits, it is used to clear the last level sptep.
  * Returns the old PTE.
  */
-static u64 mmu_spte_clear_track_bits(u64 *sptep)
+static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
 {
 	kvm_pfn_t pfn;
 	u64 old_spte = *sptep;
+	int level = sptep_to_sp(sptep)->role.level;
 
 	if (!spte_has_volatile_bits(old_spte))
 		__update_clear_spte_fast(sptep, 0ull);
@@ -612,6 +613,8 @@ static u64 mmu_spte_clear_track_bits(u64 *sptep)
 	if (!is_shadow_present_pte(old_spte))
 		return old_spte;
 
+	kvm_update_page_stats(kvm, level, -1);
+
 	pfn = spte_to_pfn(old_spte);
 
 	/*
@@ -996,15 +999,16 @@ static void __pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 	}
 }
 
-static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
+static void pte_list_remove(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			    u64 *sptep)
 {
-	mmu_spte_clear_track_bits(sptep);
+	mmu_spte_clear_track_bits(kvm, sptep);
 	__pte_list_remove(sptep, rmap_head);
 }
 
 /* Return true if rmap existed and callback called, false otherwise */
-static bool pte_list_destroy(struct kvm_rmap_head *rmap_head,
-			     void (*callback)(u64 *sptep))
+static bool pte_list_destroy(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			     void (*callback)(struct kvm *kvm, u64 *sptep))
 {
 	struct pte_list_desc *desc, *next;
 	int i;
@@ -1014,7 +1018,7 @@ static bool pte_list_destroy(struct kvm_rmap_head *rmap_head,
 
 	if (!(rmap_head->val & 1)) {
 		if (callback)
-			callback((u64 *)rmap_head->val);
+			callback(kvm, (u64 *)rmap_head->val);
 		goto out;
 	}
 
@@ -1023,7 +1027,7 @@ static bool pte_list_destroy(struct kvm_rmap_head *rmap_head,
 	while (desc) {
 		if (callback)
 			for (i = 0; i < desc->spte_count; i++)
-				callback(desc->sptes[i]);
+				callback(kvm, desc->sptes[i]);
 		next = desc->more;
 		mmu_free_pte_list_desc(desc);
 		desc = next;
@@ -1163,7 +1167,7 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
-	u64 old_spte = mmu_spte_clear_track_bits(sptep);
+	u64 old_spte = mmu_spte_clear_track_bits(kvm, sptep);
 
 	if (is_shadow_present_pte(old_spte))
 		rmap_remove(kvm, sptep);
@@ -1175,7 +1179,6 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
 	if (is_large_pte(*sptep)) {
 		WARN_ON(sptep_to_sp(sptep)->role.level == PG_LEVEL_4K);
 		drop_spte(kvm, sptep);
-		--kvm->stat.lpages;
 		return true;
 	}
 
@@ -1422,15 +1425,15 @@ static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
 	return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn, PG_LEVEL_4K);
 }
 
-static void mmu_spte_clear_track_bits_cb(u64 *sptep)
+static void mmu_spte_clear_track_bits_cb(struct kvm *kvm, u64 *sptep)
 {
-	mmu_spte_clear_track_bits(sptep);
+	mmu_spte_clear_track_bits(kvm, sptep);
 }
 
 static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			  const struct kvm_memory_slot *slot)
 {
-	return pte_list_destroy(rmap_head, mmu_spte_clear_track_bits_cb);
+	return pte_list_destroy(kvm, rmap_head, mmu_spte_clear_track_bits_cb);
 }
 
 static bool kvm_unmap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
@@ -1461,13 +1464,13 @@ static bool kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 		need_flush = 1;
 
 		if (pte_write(pte)) {
-			pte_list_remove(rmap_head, sptep);
+			pte_list_remove(kvm, rmap_head, sptep);
 			goto restart;
 		} else {
 			new_spte = kvm_mmu_changed_pte_notifier_make_spte(
 					*sptep, new_pfn);
 
-			mmu_spte_clear_track_bits(sptep);
+			mmu_spte_clear_track_bits(kvm, sptep);
 			mmu_spte_set(sptep, new_spte);
 		}
 	}
@@ -2276,8 +2279,6 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 	if (is_shadow_present_pte(pte)) {
 		if (is_last_spte(pte, sp->role.level)) {
 			drop_spte(kvm, spte);
-			if (is_large_pte(pte))
-				--kvm->stat.lpages;
 		} else {
 			child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
 			drop_parent_pte(child, spte);
@@ -2736,8 +2737,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	trace_kvm_mmu_set_spte(level, gfn, sptep);
 
 	if (!was_rmapped) {
-		if (is_large_pte(*sptep))
-			++vcpu->kvm->stat.lpages;
+		kvm_update_page_stats(vcpu->kvm, level, 1);
 		rmap_count = rmap_add(vcpu, sptep, gfn);
 		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
 			rmap_recycle(vcpu, sptep, gfn);
@@ -5740,7 +5740,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
 		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
 							       pfn, PG_LEVEL_NUM)) {
-			pte_list_remove(rmap_head, sptep);
+			pte_list_remove(kvm, rmap_head, sptep);
 
 			if (kvm_available_flush_tlb_with_range())
 				kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4b0953fed12e..45a5c1f43433 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -413,7 +413,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	bool was_leaf = was_present && is_last_spte(old_spte, level);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
-	bool was_large, is_large;
 
 	WARN_ON(level > PT64_ROOT_MAX_LEVEL);
 	WARN_ON(level < PG_LEVEL_4K);
@@ -472,18 +471,8 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 		return;
 	}
 
-	/*
-	 * Update large page stats if a large page is being zapped, created, or
-	 * is replacing an existing shadow page.
-	 */
-	was_large = was_leaf && is_large_pte(old_spte);
-	is_large = is_leaf && is_large_pte(new_spte);
-	if (was_large != is_large) {
-		if (was_large)
-			atomic64_sub(1, (atomic64_t *)&kvm->stat.lpages);
-		else
-			atomic64_add(1, (atomic64_t *)&kvm->stat.lpages);
-	}
+	if (is_leaf != was_leaf)
+		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
 
 	if (was_leaf && is_dirty_spte(old_spte) &&
 	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 916c976e99ab..a0a1d70981a8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -233,7 +233,10 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	STATS_DESC_COUNTER(VM, mmu_recycled),
 	STATS_DESC_COUNTER(VM, mmu_cache_miss),
 	STATS_DESC_ICOUNTER(VM, mmu_unsync),
-	STATS_DESC_ICOUNTER(VM, lpages),
+	STATS_DESC_ICOUNTER(VM, pages_4k),
+	STATS_DESC_ICOUNTER(VM, pages_2m),
+	STATS_DESC_ICOUNTER(VM, pages_1g),
+	STATS_DESC_ICOUNTER(VM, pages_512g),
 	STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
 	STATS_DESC_PCOUNTER(VM, max_mmu_rmap_size),
 	STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
-- 
2.32.0.554.ge1b32706d8-goog



* Re: [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-08-03  4:46 ` [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats Mingwei Zhang
@ 2021-08-09 22:26   ` Jim Mattson
  2021-08-09 23:39     ` Mingwei Zhang
  2021-08-11 11:36   ` Paolo Bonzini
  1 sibling, 1 reply; 12+ messages in thread
From: Jim Mattson @ 2021-08-09 22:26 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, David Matlack,
	Jing Zhang

On Mon, Aug 2, 2021 at 9:46 PM Mingwei Zhang <mizhang@google.com> wrote:
>
> Existing KVM code tracks the number of large pages regardless of their
> sizes. Therefore, when large page of 1GB (or larger) is adopted, the
> information becomes less useful because lpages counts a mix of 1G and 2M
> pages.
>
> So remove the lpages since it is easy for user space to aggregate the info.
> Instead, provide a comprehensive page stats of all sizes from 4K to 512G.

There is no such thing as a 512GiB page, is there? If this is an
attempt at future-proofing, why not go to 256TiB?


* Re: [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-08-09 22:26   ` Jim Mattson
@ 2021-08-09 23:39     ` Mingwei Zhang
  2021-08-10  0:01       ` Mingwei Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Mingwei Zhang @ 2021-08-09 23:39 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, LKML, Ben Gardon, David Matlack, Jing Zhang

Hi Jim,

No, I don't think 512GB pages are supported. So I will remove the
'pages_512G' metric in my next version.

Thanks.
-Mingwei

On Mon, Aug 9, 2021 at 3:26 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Mon, Aug 2, 2021 at 9:46 PM Mingwei Zhang <mizhang@google.com> wrote:
> >
> > Existing KVM code tracks the number of large pages regardless of their
> > sizes. Therefore, when large page of 1GB (or larger) is adopted, the
> > information becomes less useful because lpages counts a mix of 1G and 2M
> > pages.
> >
> > So remove the lpages since it is easy for user space to aggregate the info.
> > Instead, provide a comprehensive page stats of all sizes from 4K to 512G.
>
> There is no such thing as a 512GiB page, is there? If this is an
> attempt at future-proofing, why not go to 256TiB?


* Re: [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-08-09 23:39     ` Mingwei Zhang
@ 2021-08-10  0:01       ` Mingwei Zhang
  2021-08-10 23:22         ` Peter Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Mingwei Zhang @ 2021-08-10  0:01 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, LKML, Ben Gardon, David Matlack, Jing Zhang,
	peterx

Hi Paolo,

I recently looked at the queued patches and found this patch from
Peter Xu (CCed), which also adds 'page stats' information to KVM:

https://patchwork.kernel.org/project/kvm/patch/20210625153214.43106-7-peterx@redhat.com/

From a functionality point of view, the above patch seems to duplicate
mine. But in detail, Peter's approach uses debugfs with proper locking
to traverse the whole rmap and obtain detailed page sizes at different
granularities.

In comparison, mine adds extra code in the low-level SPTE update
routines and stores aggregated data in kvm->kvm_stats. This data can
be retrieved through Jing's fd-based API without any lock required,
but it does not provide fine-grained information such as the number
of contiguous 4KB/2MB/1GB pages.
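
For example, the old lpages view (or the total bytes mapped) is
trivial to rebuild in user space from these counters. A rough sketch,
with made-up variable names that are assumed to hold values already
read out through the stats interface:

	/* Hypothetical user-space aggregation of the per-size counters. */
	uint64_t lpages = pages_2m + pages_1g + pages_512g;
	uint64_t bytes_mapped = pages_4k * (4ULL << 10) +
				pages_2m * (2ULL << 20) +
				pages_1g * (1ULL << 30) +
				pages_512g * (512ULL << 30);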

So would you mind giving me some feedback on this patch? I would
really appreciate it.

Thank you. Regards
-Mingwei

On Mon, Aug 9, 2021 at 4:39 PM Mingwei Zhang <mizhang@google.com> wrote:
>
> Hi Jim,
>
> No, I don't think 512G is supported. So, I will remove the
> 'pages_512G' metric in my next version.
>
> Thanks.
> -Mingwei
>
> On Mon, Aug 9, 2021 at 3:26 PM Jim Mattson <jmattson@google.com> wrote:
> >
> > On Mon, Aug 2, 2021 at 9:46 PM Mingwei Zhang <mizhang@google.com> wrote:
> > >
> > > Existing KVM code tracks the number of large pages regardless of their
> > > sizes. Therefore, when large page of 1GB (or larger) is adopted, the
> > > information becomes less useful because lpages counts a mix of 1G and 2M
> > > pages.
> > >
> > > So remove the lpages since it is easy for user space to aggregate the info.
> > > Instead, provide a comprehensive page stats of all sizes from 4K to 512G.
> >
> > There is no such thing as a 512GiB page, is there? If this is an
> > attempt at future-proofing, why not go to 256TiB?


* Re: [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-08-10  0:01       ` Mingwei Zhang
@ 2021-08-10 23:22         ` Peter Xu
  2021-08-11  1:06           ` Mingwei Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2021-08-10 23:22 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Jim Mattson, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, LKML,
	Ben Gardon, David Matlack, Jing Zhang

On Mon, Aug 09, 2021 at 05:01:39PM -0700, Mingwei Zhang wrote:
> Hi Paolo,

Hi, Mingwei,

> 
> I recently looked at the patches queued and I find this patch from
> Peter Xu (Cced), which is also adding 'page stats' information into
> KVM:
> 
> https://patchwork.kernel.org/project/kvm/patch/20210625153214.43106-7-peterx@redhat.com/
> 
> From a functionality point of view, the above patch seems duplicate
> with mine.

The rmap statistics are mainly for rmap, not huge pages.

> But in detail, Peter's approach is using debugfs with
> proper locking to traverse the whole rmap to get the detailed page
> sizes in different granularity.
> 
> In comparison, mine is to add extra code in low level SPTE update
> routines and store aggregated data in kvm->kvm_stats. This data could
> be retrieved from Jing's fd based API without any lock required, but
> it does not provide the fine granular information such as the number
> of contiguous 4KG/2MB/1GB pages.
> 
> So would you mind giving me some feedback on this patch? I would
> really appreciate it.

I have a question: why change to using atomic ops?  Most KVM
statistics did not use atomics before.

AFAIK atomics are expensive, and they get even more expensive as the
host gets bigger (it can easily go into the ten-nanosecond range).  So
I have no idea whether it's worth pursuing that accuracy.

Thanks,

-- 
Peter Xu



* Re: [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-08-10 23:22         ` Peter Xu
@ 2021-08-11  1:06           ` Mingwei Zhang
  2021-08-11 13:12             ` Peter Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Mingwei Zhang @ 2021-08-11  1:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jim Mattson, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, LKML,
	Ben Gardon, David Matlack, Jing Zhang

Hi Peter,

Thanks for the comments. Please check my feedback inline.

> > From a functionality point of view, the above patch seems duplicate
> > with mine.
>
> The rmap statistics are majorly for rmap, not huge pages.
>

Got you. I guess the focus of the stat is different.


>
> I have a question: why change to using atomic ops?  As most kvm statistics
> seems to be not with atomics before.
>
> AFAIK atomics are expensive, and they get even more expensive when the host is
> bigger (it should easily go into ten-nanosecond level).  So I have no idea
> whether it's worth it for persuing that accuracy.
>
> Thanks,

Yes, regarding the use of atomic types, we previously had internal
discussions with Sean. The main reason is that the KVM MMU currently
has several modes: one is the MMU running with TDP enabled (call it
the legacy mode in this scope), and another is the TDP MMU mode
(mmu/tdp_mmu.c). In legacy mode the MMU updates SPTEs under a write
lock, while in TDP MMU mode it uses a read lock. I copied a simple
code snippet to illustrate the situation:

	if (is_tdp_mmu_fault)
		read_lock(&vcpu->kvm->mmu_lock);
	else
		write_lock(&vcpu->kvm->mmu_lock);

So here comes the problem: how do we update the page stats correctly
across all modes? For the TDP MMU case, we definitely have to update
the stats atomically unless we want extra locking (we don't want
that). So we could either 1) use a branch to update the stats
differently for each mode, or 2) use an atomic update in all cases.
During the pre-v1 internal review, Sean suggested we just use atomics.
I agree, since adding a branch on such a hot path may slow things down
even more globally, especially with branch mispredictions. But I
appreciate your feedback as well.
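
To make the trade-off concrete, here is the helper added in patch 3
again, with an illustrative comment about the race a plain increment
would have (the comment is only a sketch, not part of the patch):

	/*
	 * With the TDP MMU, page faults hold mmu_lock only for read, so
	 * two vCPUs can update a counter concurrently.  A plain
	 * read-modify-write like "kvm->stat.pages[level - 1]++" could
	 * lose updates there, whereas a single atomic update works for
	 * both MMU modes without branching on the MMU flavor.
	 */
	static inline void kvm_update_page_stats(struct kvm *kvm, int level, int count)
	{
		atomic64_add(count, &kvm->stat.pages[level - 1]);
	}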

Regarding the pursuit for accuracy, I think there might be several
reasons. One of the most critical reasons that I know is that we need
to ensure dirty logging works correctly, i.e., when dirty logging is
enabled, all huge pages (both 2MB and 1GB) _are_ gone. Hope that
clarifies a little bit?

Thanks.
-Mingwei


>
> --
> Peter Xu
>


* Re: [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-08-03  4:46 ` [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats Mingwei Zhang
  2021-08-09 22:26   ` Jim Mattson
@ 2021-08-11 11:36   ` Paolo Bonzini
  1 sibling, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2021-08-11 11:36 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, David Matlack,
	Jing Zhang

On 03/08/21 06:46, Mingwei Zhang wrote:
> +	union {
> +		struct {
> +			atomic64_t pages_4k;
> +			atomic64_t pages_2m;
> +			atomic64_t pages_1g;
> +			atomic64_t pages_512g;
> +		};
> +		atomic64_t pages[4];
> +	};
>   	u64 nx_lpage_splits;

This array can use KVM_NR_PAGE_SIZES as the size.
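
For illustration, the struct might then look roughly like this (a
sketch only, assuming pages_512g is dropped as discussed earlier in
the thread, and that KVM_NR_PAGE_SIZES stays at three entries for
4K/2M/1G on x86):

	union {
		struct {
			atomic64_t pages_4k;
			atomic64_t pages_2m;
			atomic64_t pages_1g;
		};
		atomic64_t pages[KVM_NR_PAGE_SIZES];
	};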

Queued, thanks!

Paolo



* Re: [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-08-11  1:06           ` Mingwei Zhang
@ 2021-08-11 13:12             ` Peter Xu
  2021-08-12 17:44               ` Mingwei Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2021-08-11 13:12 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Jim Mattson, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, LKML,
	Ben Gardon, David Matlack, Jing Zhang

On Tue, Aug 10, 2021 at 06:06:51PM -0700, Mingwei Zhang wrote:
> Regarding the pursuit for accuracy, I think there might be several
> reasons. One of the most critical reasons that I know is that we need
> to ensure dirty logging works correctly, i.e., when dirty logging is
> enabled, all huge pages (both 2MB and 1GB) _are_ gone. Hope that
> clarifies a little bit?

It's just for statistics, right?  I mean, dirty logging should work
even without this change.

But I didn't read closely last night: we want "how many huge pages
we're mapping now", not "how many we've mapped historically".  Yes,
that makes sense to be accurate.  I should have looked more carefully,
sorry.

PS: it turns out atomics are not as expensive as I thought, even on a
200-core system, where one takes 7ns (though for sure it's still more
expensive than normal memory ops, due to bus locking); I thought it
would be bigger, since on a 40-core system I got 15ns, which is 2x my
8-core laptop, but it didn't really grow, it actually shrank.

Thanks,

-- 
Peter Xu



* Re: [PATCH v4 3/3] KVM: x86/mmu: Add detailed page size stats
  2021-08-11 13:12             ` Peter Xu
@ 2021-08-12 17:44               ` Mingwei Zhang
  0 siblings, 0 replies; 12+ messages in thread
From: Mingwei Zhang @ 2021-08-12 17:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jim Mattson, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, LKML,
	Ben Gardon, David Matlack, Jing Zhang

Hi Peter,


On Wed, Aug 11, 2021 at 6:12 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Aug 10, 2021 at 06:06:51PM -0700, Mingwei Zhang wrote:
> > Regarding the pursuit for accuracy, I think there might be several
> > reasons. One of the most critical reasons that I know is that we need
> > to ensure dirty logging works correctly, i.e., when dirty logging is
> > enabled, all huge pages (both 2MB and 1GB) _are_ gone. Hope that
> > clarifies a little bit?
>
> It's just for statistics, right?  I mean dirty log should be working even
> without this change.

That's true. What I meant was that the accurate stats might help
verify a property of dirty logging as a side benefit. Sorry for the
confusion.

>
> But I didn't read closely last night, so we want to have "how many huge pages
> we're mapping", not "how many we've mapped in the history".  Yes that makes
> sense to be accurate.  I should have looked more carefully, sorry.
>
> PS: it turns out atomic is not that expensive as I thought even on a 200 core
> system, which takes 7ns (but for sure it's still expensive than normal memory
> ops, and bus locking); I thought it'll be bigger as on a 40 core system I got
> 15ns which is 2x of my laptop of 8 cores, but it didn't really grow but shrink.

Thanks for the information about atomic access!
>
> Thanks,
>
> --
> Peter Xu
>

