linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 0/4] KVM: mm: count KVM mmu usage in memory stats
@ 2022-06-06 22:20 Yosry Ahmed
  2022-06-06 22:20 ` [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Yosry Ahmed @ 2022-06-06 22:20 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

We keep track of several kernel memory stats (total kernel memory, page
tables, stack, vmalloc, etc) on multiple levels (global, per-node,
per-memcg, etc). These stats give users insight into how much memory
is used by the kernel and for what purposes.

Currently, memory used by the KVM MMU is not accounted in any of those
kernel memory stats. This patch series accounts the memory pages
used by KVM for page tables in a new NR_SECONDARY_PAGETABLE stat.
This stat can later be extended to account for other types of
secondary page tables (e.g. iommu page tables).

KVM has a decent number of large allocations that aren't for page
tables, but for most of them, the number/size of those allocations
scales linearly with either the number of vCPUs or the amount of memory
assigned to the VM. KVM's secondary page table allocations do not scale
linearly, especially when nested virtualization is in use.

From a KVM perspective, NR_SECONDARY_PAGETABLE will scale with KVM's
per-VM pages_{4k,2m,1g} stats unless the guest is doing something
bizarre (e.g. accessing only 4kb chunks of 2mb pages so that KVM is
forced to allocate a large number of page tables even though the guest
isn't accessing that much memory). However, someone would need to either
understand how KVM works to make that connection, or know (or be told) to
go look at KVM's stats if they're running VMs to better decipher the stats.
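
To put rough numbers on the "bizarre" case above, here is a minimal
sketch (not KVM code). It assumes 4 KiB page-table pages with 512
entries each, so one leaf table spans 2 MiB, and it ignores upper-level
tables. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_BYTES   4096ULL  /* assumed size of one page-table page (and of a base page) */
#define PTES_PER_TABLE  512ULL   /* assumed entries per leaf table */

/*
 * Leaf page-table bytes needed to map `regions` distinct 2 MiB regions
 * with 4 KiB entries: one leaf table per touched region.
 */
static uint64_t leaf_table_bytes(uint64_t regions)
{
	return regions * PT_PAGE_BYTES;
}

/*
 * Guest memory actually touched: `pages_per_region` 4 KiB pages in each
 * of `regions` 2 MiB regions.
 */
static uint64_t accessed_bytes(uint64_t regions, uint64_t pages_per_region)
{
	assert(pages_per_region >= 1 && pages_per_region <= PTES_PER_TABLE);
	return regions * pages_per_region * PT_PAGE_BYTES;
}
```

With 1024 regions, touching a single 4 KiB page per region gives 4 MiB
of accessed memory and 4 MiB of leaf tables. Touching all 512 pages per
region gives 2 GiB of accessed memory for the same 4 MiB of tables.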

Also, having NR_PAGETABLE side-by-side with NR_SECONDARY_PAGETABLE is
informative. For example, when backing a VM with THP vs. HugeTLB,
NR_SECONDARY_PAGETABLE is roughly the same, but NR_PAGETABLE is an order
of magnitude higher with THP. So having this stat will at the very least
prove to be useful for understanding tradeoffs between VM backing types,
and likely even steer folks towards potential optimizations.
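
For reading the two stats side by side from userspace, something along
these lines should work. It is only a sketch: it parses
/proc/meminfo-style text that the caller has already read into a
buffer, the helper name meminfo_field_kb is hypothetical, and the
SecPageTables field is only present on kernels carrying this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of field `name` (e.g. "SecPageTables") from
 * /proc/meminfo-style text, or -1 if the field is absent or malformed.
 */
static long meminfo_field_kb(const char *text, const char *name)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Match "<name>:" at the start of a line only. */
		if (strncmp(line, name, len) == 0 && line[len] == ':') {
			long kb;

			/* %ld skips the spaces or tabs after the colon. */
			if (sscanf(line + len + 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		/* Advance to the character after the next newline. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Matching at the start of a line keeps "PageTables" from matching the
"SecPageTables" line.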

---

Changes in V5:
- Updated cover letter to better explain the rationale behind the change
  (Thanks to contributions by Sean Christopherson).
- Removed extraneous + in arm64 patch (Oliver Upton, Marc Zyngier).
- Shortened secondary_pagetables to sec_pagetables (Shakeel Butt).
- Removed dependency on other patchsets (applies to queue branch).

Changes in V4:
- Changed accounting hooks in arm64 to only account s2 page tables and
  refactored them to a much cleaner form, based on recommendations from
  Oliver Upton and Marc Zyngier.
- Dropped patches for mips and riscv. I am not interested in those archs
  anyway and don't have the resources to test them. I posted them for
  completeness but it doesn't seem like anyone was interested.

Changes in V3:
- Added NR_SECONDARY_PAGETABLE instead of piggybacking on NR_PAGETABLE
  stats.

Changes in V2:
- Added accounting stats for archs other than x86.
- Changed locations in the code where x86 KVM page table stats were
  accounted based on suggestions from Sean Christopherson.

---

Yosry Ahmed (4):
  mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  KVM: mmu: add a helper to account memory used by KVM MMU.
  KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats.
  KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats

 Documentation/admin-guide/cgroup-v2.rst |  5 ++++
 Documentation/filesystems/proc.rst      |  4 +++
 arch/arm64/kvm/mmu.c                    | 35 ++++++++++++++++++++++---
 arch/x86/kvm/mmu/mmu.c                  | 16 +++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c              | 12 +++++++++
 drivers/base/node.c                     |  2 ++
 fs/proc/meminfo.c                       |  2 ++
 include/linux/kvm_host.h                |  9 +++++++
 include/linux/mmzone.h                  |  1 +
 mm/memcontrol.c                         |  1 +
 mm/page_alloc.c                         |  6 ++++-
 mm/vmstat.c                             |  1 +
 12 files changed, 87 insertions(+), 7 deletions(-)

-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-06-06 22:20 [PATCH v5 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
@ 2022-06-06 22:20 ` Yosry Ahmed
  2022-06-10 19:55   ` Shakeel Butt
                     ` (3 more replies)
  2022-06-06 22:20 ` [PATCH v5 2/4] KVM: mmu: add a helper to account memory used by KVM MMU Yosry Ahmed
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 16+ messages in thread
From: Yosry Ahmed @ 2022-06-06 22:20 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
KVM mmu. This provides more insights on the kernel memory used
by a workload.

This stat will be used by subsequent patches to count KVM mmu
memory usage.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 5 +++++
 Documentation/filesystems/proc.rst      | 4 ++++
 drivers/base/node.c                     | 2 ++
 fs/proc/meminfo.c                       | 2 ++
 include/linux/mmzone.h                  | 1 +
 mm/memcontrol.c                         | 1 +
 mm/page_alloc.c                         | 6 +++++-
 mm/vmstat.c                             | 1 +
 8 files changed, 21 insertions(+), 1 deletion(-)
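
The counter itself is kept in pages. The vmstat entry
(nr_sec_page_table_pages) is a raw page count, while the meminfo-style
outputs in this patch go through K() to print kB. A minimal sketch of
that conversion, assuming K() amounts to a shift by PAGE_SHIFT - 10;
pages_to_kb is a hypothetical name, and the page shift is passed in
explicitly because it depends on the kernel configuration:

```c
#include <assert.h>

/*
 * Convert a page count to kB for a given page shift (12 means 4 KiB
 * pages). This is the same arithmetic as shifting by (PAGE_SHIFT - 10).
 */
static unsigned long pages_to_kb(unsigned long pages, unsigned int page_shift)
{
	assert(page_shift >= 10);
	return pages << (page_shift - 10);
}
```

For example, with 4 KiB pages, 6112 pages come out as 24448 kB, which
is the PageTables value shown in the proc.rst sample.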

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 69d7a6983f781..307a284b99189 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
 	  pagetables
                 Amount of memory allocated for page tables.
 
+	  sec_pagetables
+		Amount of memory allocated for secondary page tables,
+		this currently includes KVM mmu allocations on x86
+		and arm64.
+
 	  percpu (npn)
 		Amount of memory used for storing per-cpu kernel
 		data structures.
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 061744c436d99..894d6317f3bdc 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -973,6 +973,7 @@ You may not have all of these fields.
     SReclaimable:   159856 kB
     SUnreclaim:     124508 kB
     PageTables:      24448 kB
+    SecPageTables:	 0 kB
     NFS_Unstable:        0 kB
     Bounce:              0 kB
     WritebackTmp:        0 kB
@@ -1067,6 +1068,9 @@ SUnreclaim
 PageTables
               amount of memory dedicated to the lowest level of page
               tables.
+SecPageTables
+	      amount of memory dedicated to secondary page tables, this
+	      currently includes KVM mmu allocations on x86 and arm64.
 NFS_Unstable
               Always zero. Previous counted pages which had been written to
               the server, but has not been committed to stable storage.
diff --git a/drivers/base/node.c b/drivers/base/node.c
index ec8bb24a5a227..9fe716832546f 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -433,6 +433,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 			     "Node %d ShadowCallStack:%8lu kB\n"
 #endif
 			     "Node %d PageTables:     %8lu kB\n"
+			     "Node %d SecPageTables:  %8lu kB\n"
 			     "Node %d NFS_Unstable:   %8lu kB\n"
 			     "Node %d Bounce:         %8lu kB\n"
 			     "Node %d WritebackTmp:   %8lu kB\n"
@@ -459,6 +460,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 			     nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
 #endif
 			     nid, K(node_page_state(pgdat, NR_PAGETABLE)),
+			     nid, K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
 			     nid, 0UL,
 			     nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
 			     nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78e..fad29024eb2e0 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -108,6 +108,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
 	show_val_kb(m, "PageTables:     ",
 		    global_node_page_state(NR_PAGETABLE));
+	show_val_kb(m, "SecPageTables:	",
+		    global_node_page_state(NR_SECONDARY_PAGETABLE));
 
 	show_val_kb(m, "NFS_Unstable:   ", 0);
 	show_val_kb(m, "Bounce:         ",
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 46ffab808f037..81d109e6c623a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -219,6 +219,7 @@ enum node_stat_item {
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
 #endif
 	NR_PAGETABLE,		/* used for pagetables */
+	NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. kvm shadow pagetables */
 #ifdef CONFIG_SWAP
 	NR_SWAPCACHE,
 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 598fece89e2b7..ee1c3d464857c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1398,6 +1398,7 @@ static const struct memory_stat memory_stats[] = {
 	{ "kernel",			MEMCG_KMEM			},
 	{ "kernel_stack",		NR_KERNEL_STACK_KB		},
 	{ "pagetables",			NR_PAGETABLE			},
+	{ "sec_pagetables",		NR_SECONDARY_PAGETABLE		},
 	{ "percpu",			MEMCG_PERCPU_B			},
 	{ "sock",			MEMCG_SOCK			},
 	{ "vmalloc",			MEMCG_VMALLOC			},
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e42038382c12..29a7e9cd28c74 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5932,7 +5932,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
 		" unevictable:%lu dirty:%lu writeback:%lu\n"
 		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
-		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
+		" mapped:%lu shmem:%lu pagetables:%lu\n"
+		" sec_pagetables:%lu bounce:%lu\n"
 		" kernel_misc_reclaimable:%lu\n"
 		" free:%lu free_pcp:%lu free_cma:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
@@ -5949,6 +5950,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		global_node_page_state(NR_FILE_MAPPED),
 		global_node_page_state(NR_SHMEM),
 		global_node_page_state(NR_PAGETABLE),
+		global_node_page_state(NR_SECONDARY_PAGETABLE),
 		global_zone_page_state(NR_BOUNCE),
 		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
 		global_zone_page_state(NR_FREE_PAGES),
@@ -5982,6 +5984,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			" shadow_call_stack:%lukB"
 #endif
 			" pagetables:%lukB"
+			" sec_pagetables:%lukB"
 			" all_unreclaimable? %s"
 			"\n",
 			pgdat->node_id,
@@ -6007,6 +6010,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			node_page_state(pgdat, NR_KERNEL_SCS_KB),
 #endif
 			K(node_page_state(pgdat, NR_PAGETABLE)),
+			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
 			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
 				"yes" : "no");
 	}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b75b1a64b54cb..06eb52fe5be94 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1240,6 +1240,7 @@ const char * const vmstat_text[] = {
 	"nr_shadow_call_stack",
 #endif
 	"nr_page_table_pages",
+	"nr_sec_page_table_pages",
 #ifdef CONFIG_SWAP
 	"nr_swapcached",
 #endif
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v5 2/4] KVM: mmu: add a helper to account memory used by KVM MMU.
  2022-06-06 22:20 [PATCH v5 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
  2022-06-06 22:20 ` [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
@ 2022-06-06 22:20 ` Yosry Ahmed
  2022-06-27 16:20   ` Sean Christopherson
  2022-06-06 22:20 ` [PATCH v5 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats Yosry Ahmed
  2022-06-06 22:20 ` [PATCH v5 4/4] KVM: arm64/mmu: count KVM s2 " Yosry Ahmed
  3 siblings, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2022-06-06 22:20 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

Add a helper to account pages used by KVM for page tables in secondary
pagetable stats. This function will be used by subsequent patches in
different archs.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/kvm_host.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 883e86ec8e8c4..645585f3a4bed 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2246,6 +2246,15 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
 
+/*
+ * If nr > 1, we assume virt is the address of the first page of a block of
+ * pages that were allocated together (i.e accounted together).
+ */
+static inline void kvm_account_pgtable_pages(void *virt, int nr)
+{
+	mod_lruvec_page_state(virt_to_page(virt), NR_SECONDARY_PAGETABLE, nr);
+}
+
 /*
  * This defines how many reserved entries we want to keep before we
  * kick the vcpu to the userspace to avoid dirty ring full.  This
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v5 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats.
  2022-06-06 22:20 [PATCH v5 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
  2022-06-06 22:20 ` [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
  2022-06-06 22:20 ` [PATCH v5 2/4] KVM: mmu: add a helper to account memory used by KVM MMU Yosry Ahmed
@ 2022-06-06 22:20 ` Yosry Ahmed
  2022-06-27 16:22   ` Sean Christopherson
  2022-06-06 22:20 ` [PATCH v5 4/4] KVM: arm64/mmu: count KVM s2 " Yosry Ahmed
  3 siblings, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2022-06-06 22:20 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

Count the pages used by KVM mmu on x86 in secondary pagetable stats.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 16 ++++++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++++++++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efe5a3dca1e09..4090d228e1756 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1652,6 +1652,18 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
 	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
 }
 
+static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_mod_used_mmu_pages(kvm, +1);
+	kvm_account_pgtable_pages((void *)sp->spt, +1);
+}
+
+static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_mod_used_mmu_pages(kvm, -1);
+	kvm_account_pgtable_pages((void *)sp->spt, -1);
+}
+
 static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
 {
 	MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
@@ -1707,7 +1719,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 	 */
 	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
+	kvm_account_mmu_page(vcpu->kvm, sp);
 	return sp;
 }
 
@@ -2336,7 +2348,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 			list_add(&sp->link, invalid_list);
 		else
 			list_move(&sp->link, invalid_list);
-		kvm_mod_used_mmu_pages(kvm, -1);
+		kvm_unaccount_mmu_page(kvm, sp);
 	} else {
 		/*
 		 * Remove the active root from the active page list, the root
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 841feaa48be5e..0b70d1a1a3534 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -372,6 +372,16 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 	}
 }
 
+static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_account_pgtable_pages((void *)sp->spt, +1);
+}
+
+static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_account_pgtable_pages((void *)sp->spt, -1);
+}
+
 /**
  * tdp_mmu_unlink_sp() - Remove a shadow page from the list of used pages
  *
@@ -384,6 +394,7 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 			      bool shared)
 {
+	tdp_unaccount_mmu_page(kvm, sp);
 	if (shared)
 		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	else
@@ -1146,6 +1157,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 	if (account_nx)
 		account_huge_nx_page(kvm, sp);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+	tdp_account_mmu_page(kvm, sp);
 
 	return 0;
 }
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v5 4/4] KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats
  2022-06-06 22:20 [PATCH v5 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
                   ` (2 preceding siblings ...)
  2022-06-06 22:20 ` [PATCH v5 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats Yosry Ahmed
@ 2022-06-06 22:20 ` Yosry Ahmed
  2022-06-28 18:53   ` Oliver Upton
  3 siblings, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2022-06-06 22:20 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

Count the pages used by KVM in arm64 for stage2 mmu in secondary pagetable
stats.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/arm64/kvm/mmu.c | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)
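
The accounting pattern in this patch can be modelled in userspace as
follows. This is a simplified sketch, not the kernel code: the model_*
names, the plain int refcount and the fixed 4 KiB page shift are all
assumptions made for illustration. It shows the two ideas used here:
converting a byte size to a page count with a shift, and unaccounting a
page only when the last reference is dropped.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4 KiB pages */

struct model_page  { int refcount; };
struct model_stats { long sec_pagetable_pages; };

/*
 * Pages covered by a byte size, as in (size >> PAGE_SHIFT). The shift
 * rounds down, so this assumes sizes that are a multiple of the page size.
 */
static long model_size_to_pages(size_t size)
{
	return (long)(size >> MODEL_PAGE_SHIFT);
}

/* Allocation: the page starts with one reference and is accounted once. */
static void model_alloc_page(struct model_stats *st, struct model_page *p)
{
	p->refcount = 1;
	st->sec_pagetable_pages += 1;
}

static void model_get_page(struct model_page *p)
{
	p->refcount++;
}

/* Dropping the last reference frees the page, so unaccount it first. */
static void model_put_page(struct model_stats *st, struct model_page *p)
{
	assert(p->refcount > 0);
	if (p->refcount == 1)
		st->sec_pagetable_pages -= 1;
	p->refcount--;
}

/*
 * Allocate one page, take `extra_refs` additional references, drop
 * `puts` references, and return the resulting stat value.
 */
static long model_scenario(int extra_refs, int puts)
{
	struct model_stats st = { 0 };
	struct model_page page;
	int i;

	assert(extra_refs >= 0 && puts >= 0 && puts <= extra_refs + 1);
	model_alloc_page(&st, &page);
	for (i = 0; i < extra_refs; i++)
		model_get_page(&page);
	for (i = 0; i < puts; i++)
		model_put_page(&st, &page);
	return st.sec_pagetable_pages;
}
```

In this model the stat stays at 1 while any reference remains and drops
to 0 only on the final put, so intermediate get/put pairs do not skew
the count.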

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5651a05b6a85..80bc92601fd96 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -92,9 +92,13 @@ static bool kvm_is_device_pfn(unsigned long pfn)
 static void *stage2_memcache_zalloc_page(void *arg)
 {
 	struct kvm_mmu_memory_cache *mc = arg;
+	void *virt;
 
 	/* Allocated with __GFP_ZERO, so no need to zero */
-	return kvm_mmu_memory_cache_alloc(mc);
+	virt = kvm_mmu_memory_cache_alloc(mc);
+	if (virt)
+		kvm_account_pgtable_pages(virt, 1);
+	return virt;
 }
 
 static void *kvm_host_zalloc_pages_exact(size_t size)
@@ -102,6 +106,21 @@ static void *kvm_host_zalloc_pages_exact(size_t size)
 	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 }
 
+static void *kvm_s2_zalloc_pages_exact(size_t size)
+{
+	void *virt = kvm_host_zalloc_pages_exact(size);
+
+	if (virt)
+		kvm_account_pgtable_pages(virt, (size >> PAGE_SHIFT));
+	return virt;
+}
+
+static void kvm_s2_free_pages_exact(void *virt, size_t size)
+{
+	kvm_account_pgtable_pages(virt, -(size >> PAGE_SHIFT));
+	free_pages_exact(virt, size);
+}
+
 static void kvm_host_get_page(void *addr)
 {
 	get_page(virt_to_page(addr));
@@ -112,6 +131,15 @@ static void kvm_host_put_page(void *addr)
 	put_page(virt_to_page(addr));
 }
 
+static void kvm_s2_put_page(void *addr)
+{
+	struct page *p = virt_to_page(addr);
+	/* Dropping last refcount, the page will be freed */
+	if (page_count(p) == 1)
+		kvm_account_pgtable_pages(addr, -1);
+	put_page(p);
+}
+
 static int kvm_host_page_count(void *addr)
 {
 	return page_count(virt_to_page(addr));
@@ -625,10 +653,10 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.zalloc_page		= stage2_memcache_zalloc_page,
-	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
-	.free_pages_exact	= free_pages_exact,
+	.zalloc_pages_exact	= kvm_s2_zalloc_pages_exact,
+	.free_pages_exact	= kvm_s2_free_pages_exact,
 	.get_page		= kvm_host_get_page,
-	.put_page		= kvm_host_put_page,
+	.put_page		= kvm_s2_put_page,
 	.page_count		= kvm_host_page_count,
 	.phys_to_virt		= kvm_host_va,
 	.virt_to_phys		= kvm_host_pa,
-- 
2.36.1.255.ge46751e96f-goog



* Re: [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-06-06 22:20 ` [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
@ 2022-06-10 19:55   ` Shakeel Butt
  2022-06-13  3:18   ` Huang, Shaoqin
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Shakeel Butt @ 2022-06-10 19:55 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Oliver Upton, Cgroups, LKML, Linux ARM, kvmarm, kvm, Linux MM

On Mon, Jun 6, 2022 at 3:21 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> KVM mmu. This provides more insights on the kernel memory used
> by a workload.
>
> This stat will be used by subsequent patches to count KVM mmu
> memory usage.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Acked-by: Shakeel Butt <shakeelb@google.com>


* Re: [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-06-06 22:20 ` [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
  2022-06-10 19:55   ` Shakeel Butt
@ 2022-06-13  3:18   ` Huang, Shaoqin
  2022-06-13 17:11     ` Yosry Ahmed
  2022-06-27 16:07   ` Sean Christopherson
  2022-06-27 16:27   ` Sean Christopherson
  3 siblings, 1 reply; 16+ messages in thread
From: Huang, Shaoqin @ 2022-06-13  3:18 UTC (permalink / raw)
  To: Yosry Ahmed, Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm



On 6/7/2022 6:20 AM, Yosry Ahmed wrote:
> Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> KVM mmu. This provides more insights on the kernel memory used
> by a workload.
> 
> This stat will be used by subsequent patches to count KVM mmu
> memory usage.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>   Documentation/admin-guide/cgroup-v2.rst | 5 +++++
>   Documentation/filesystems/proc.rst      | 4 ++++
>   drivers/base/node.c                     | 2 ++
>   fs/proc/meminfo.c                       | 2 ++
>   include/linux/mmzone.h                  | 1 +
>   mm/memcontrol.c                         | 1 +
>   mm/page_alloc.c                         | 6 +++++-
>   mm/vmstat.c                             | 1 +
>   8 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 69d7a6983f781..307a284b99189 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
>   	  pagetables
>                   Amount of memory allocated for page tables.
>   
> +	  sec_pagetables
> +		Amount of memory allocated for secondary page tables,
> +		this currently includes KVM mmu allocations on x86
> +		and arm64.
> +
>   	  percpu (npn)
>   		Amount of memory used for storing per-cpu kernel
>   		data structures.
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 061744c436d99..894d6317f3bdc 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -973,6 +973,7 @@ You may not have all of these fields.
>       SReclaimable:   159856 kB
>       SUnreclaim:     124508 kB
>       PageTables:      24448 kB
> +    SecPageTables:	 0 kB
>       NFS_Unstable:        0 kB
>       Bounce:              0 kB
>       WritebackTmp:        0 kB
> @@ -1067,6 +1068,9 @@ SUnreclaim
>   PageTables
>                 amount of memory dedicated to the lowest level of page
>                 tables.
> +SecPageTables
> +	      amount of memory dedicated to secondary page tables, this
> +	      currently includes KVM mmu allocations on x86 and arm64.

Just a note: against the latest 5.19.0-rc2+, this patch has a conflict in
Documentation/filesystems/proc.rst file. But that's not a problem.

>   NFS_Unstable
>                 Always zero. Previous counted pages which had been written to
>                 the server, but has not been committed to stable storage.
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index ec8bb24a5a227..9fe716832546f 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -433,6 +433,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>   			     "Node %d ShadowCallStack:%8lu kB\n"
>   #endif
>   			     "Node %d PageTables:     %8lu kB\n"
> +			     "Node %d SecPageTables:  %8lu kB\n"
>   			     "Node %d NFS_Unstable:   %8lu kB\n"
>   			     "Node %d Bounce:         %8lu kB\n"
>   			     "Node %d WritebackTmp:   %8lu kB\n"
> @@ -459,6 +460,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>   			     nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
>   #endif
>   			     nid, K(node_page_state(pgdat, NR_PAGETABLE)),
> +			     nid, K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
>   			     nid, 0UL,
>   			     nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
>   			     nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 6fa761c9cc78e..fad29024eb2e0 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -108,6 +108,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>   #endif
>   	show_val_kb(m, "PageTables:     ",
>   		    global_node_page_state(NR_PAGETABLE));
> +	show_val_kb(m, "SecPageTables:	",
> +		    global_node_page_state(NR_SECONDARY_PAGETABLE));
>   
>   	show_val_kb(m, "NFS_Unstable:   ", 0);
>   	show_val_kb(m, "Bounce:         ",
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 46ffab808f037..81d109e6c623a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -219,6 +219,7 @@ enum node_stat_item {
>   	NR_KERNEL_SCS_KB,	/* measured in KiB */
>   #endif
>   	NR_PAGETABLE,		/* used for pagetables */
> +	NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. kvm shadow pagetables */
>   #ifdef CONFIG_SWAP
>   	NR_SWAPCACHE,
>   #endif
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 598fece89e2b7..ee1c3d464857c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1398,6 +1398,7 @@ static const struct memory_stat memory_stats[] = {
>   	{ "kernel",			MEMCG_KMEM			},
>   	{ "kernel_stack",		NR_KERNEL_STACK_KB		},
>   	{ "pagetables",			NR_PAGETABLE			},
> +	{ "sec_pagetables",		NR_SECONDARY_PAGETABLE		},
>   	{ "percpu",			MEMCG_PERCPU_B			},
>   	{ "sock",			MEMCG_SOCK			},
>   	{ "vmalloc",			MEMCG_VMALLOC			},
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0e42038382c12..29a7e9cd28c74 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5932,7 +5932,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>   		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
>   		" unevictable:%lu dirty:%lu writeback:%lu\n"
>   		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
> -		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
> +		" mapped:%lu shmem:%lu pagetables:%lu\n"
> +		" sec_pagetables:%lu bounce:%lu\n"
>   		" kernel_misc_reclaimable:%lu\n"
>   		" free:%lu free_pcp:%lu free_cma:%lu\n",
>   		global_node_page_state(NR_ACTIVE_ANON),
> @@ -5949,6 +5950,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>   		global_node_page_state(NR_FILE_MAPPED),
>   		global_node_page_state(NR_SHMEM),
>   		global_node_page_state(NR_PAGETABLE),
> +		global_node_page_state(NR_SECONDARY_PAGETABLE),
>   		global_zone_page_state(NR_BOUNCE),
>   		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
>   		global_zone_page_state(NR_FREE_PAGES),
> @@ -5982,6 +5984,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>   			" shadow_call_stack:%lukB"
>   #endif
>   			" pagetables:%lukB"
> +			" sec_pagetables:%lukB"
>   			" all_unreclaimable? %s"
>   			"\n",
>   			pgdat->node_id,
> @@ -6007,6 +6010,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>   			node_page_state(pgdat, NR_KERNEL_SCS_KB),
>   #endif
>   			K(node_page_state(pgdat, NR_PAGETABLE)),
> +			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
>   			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
>   				"yes" : "no");
>   	}
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index b75b1a64b54cb..06eb52fe5be94 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1240,6 +1240,7 @@ const char * const vmstat_text[] = {
>   	"nr_shadow_call_stack",
>   #endif
>   	"nr_page_table_pages",
> +	"nr_sec_page_table_pages",
>   #ifdef CONFIG_SWAP
>   	"nr_swapcached",
>   #endif


* Re: [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-06-13  3:18   ` Huang, Shaoqin
@ 2022-06-13 17:11     ` Yosry Ahmed
  0 siblings, 0 replies; 16+ messages in thread
From: Yosry Ahmed @ 2022-06-13 17:11 UTC (permalink / raw)
  To: Huang, Shaoqin
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Sun, Jun 12, 2022 at 8:18 PM Huang, Shaoqin <shaoqin.huang@intel.com> wrote:
>
>
>
> On 6/7/2022 6:20 AM, Yosry Ahmed wrote:
> > Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> > KVM mmu. This provides more insights on the kernel memory used
> > by a workload.
> >
> > This stat will be used by subsequent patches to count KVM mmu
> > memory usage.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >   Documentation/admin-guide/cgroup-v2.rst | 5 +++++
> >   Documentation/filesystems/proc.rst      | 4 ++++
> >   drivers/base/node.c                     | 2 ++
> >   fs/proc/meminfo.c                       | 2 ++
> >   include/linux/mmzone.h                  | 1 +
> >   mm/memcontrol.c                         | 1 +
> >   mm/page_alloc.c                         | 6 +++++-
> >   mm/vmstat.c                             | 1 +
> >   8 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 69d7a6983f781..307a284b99189 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
> >         pagetables
> >                   Amount of memory allocated for page tables.
> >
> > +       sec_pagetables
> > +             Amount of memory allocated for secondary page tables,
> > +             this currently includes KVM mmu allocations on x86
> > +             and arm64.
> > +
> >         percpu (npn)
> >               Amount of memory used for storing per-cpu kernel
> >               data structures.
> > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > index 061744c436d99..894d6317f3bdc 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -973,6 +973,7 @@ You may not have all of these fields.
> >       SReclaimable:   159856 kB
> >       SUnreclaim:     124508 kB
> >       PageTables:      24448 kB
> > +    SecPageTables:    0 kB
> >       NFS_Unstable:        0 kB
> >       Bounce:              0 kB
> >       WritebackTmp:        0 kB
> > @@ -1067,6 +1068,9 @@ SUnreclaim
> >   PageTables
> >                 amount of memory dedicated to the lowest level of page
> >                 tables.
> > +SecPageTables
> > +           amount of memory dedicated to secondary page tables, this
> > +           currently includes KVM mmu allocations on x86 and arm64.
>
> Just a note: on the latest 5.19.0-rc2+, this patch has a conflict in
> Documentation/filesystems/proc.rst. But that's not a problem.

Thanks for pointing this out. Let me know if a rebase and resend is necessary.

<snip>

* Re: [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-06-06 22:20 ` [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
  2022-06-10 19:55   ` Shakeel Butt
  2022-06-13  3:18   ` Huang, Shaoqin
@ 2022-06-27 16:07   ` Sean Christopherson
  2022-06-27 16:23     ` Yosry Ahmed
  2022-06-27 16:27   ` Sean Christopherson
  3 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2022-06-27 16:07 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, linux-kernel, linux-arm-kernel, kvmarm,
	kvm, linux-mm

On Mon, Jun 06, 2022, Yosry Ahmed wrote:
> Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> KVM mmu. This provides more insights on the kernel memory used
> by a workload.

Please provide more justification for NR_SECONDARY_PAGETABLE in the changelog.
Specifically, answer the questions that were asked in the previous version:

  1. Why not piggyback NR_PAGETABLE?
  2. Why a "generic" NR_SECONDARY_PAGETABLE instead of NR_VIRT_PAGETABLE?

It doesn't have to be super long, but provide enough info so that reviewers and
future readers don't need to go spelunking to understand the motivation for the
new counter type.

And it's probably worth an explicit Link to Marc's question that prompted the long
discussion in the previous version, that way if someone does want the gory details
they have a link readily available.

Link: https://lore.kernel.org/all/87ilqoi77b.wl-maz@kernel.org

* Re: [PATCH v5 2/4] KVM: mmu: add a helper to account memory used by KVM MMU.
  2022-06-06 22:20 ` [PATCH v5 2/4] KVM: mmu: add a helper to account memory used by KVM MMU Yosry Ahmed
@ 2022-06-27 16:20   ` Sean Christopherson
  2022-06-27 16:28     ` Yosry Ahmed
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2022-06-27 16:20 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, linux-kernel, linux-arm-kernel, kvmarm,
	kvm, linux-mm

On Mon, Jun 06, 2022, Yosry Ahmed wrote:
> Add a helper to account pages used by KVM for page tables in secondary
> pagetable stats. This function will be used by subsequent patches in
> different archs.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  include/linux/kvm_host.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 883e86ec8e8c4..645585f3a4bed 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2246,6 +2246,15 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
>  }
>  #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
>  
> +/*
> + * If nr > 1, we assume virt is the address of the first page of a block of

But what if @nr is -2, which is technically less than 1?  :-)

> + * pages that were allocated together (i.e accounted together).

Don't document assumptions, document the rules.  And avoid "we", pronouns are
ambiguous, e.g. is "we" the author, or KVM, or something else entirely?

/*
 * If more than one page is being (un)accounted, @virt must be the address of
 * the first page of a block of pages that were allocated together.
 */


> + */
> +static inline void kvm_account_pgtable_pages(void *virt, int nr)
> +{
> +	mod_lruvec_page_state(virt_to_page(virt), NR_SECONDARY_PAGETABLE, nr);
> +}
> +
>  /*
>   * This defines how many reserved entries we want to keep before we
>   * kick the vcpu to the userspace to avoid dirty ring full.  This
> -- 
> 2.36.1.255.ge46751e96f-goog
> 
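
The sign convention behind the "-2" remark can be sketched in user space. This is a minimal model, not the kernel helper: `nr_secondary_pagetable`, `account_pgtable_pages()` and `size_to_pages()` are hypothetical stand-ins, the `virt` argument is dropped because there is no lruvec to look up, and 4 KiB pages are assumed. A positive `nr` accounts pages and a negative `nr` unaccounts them, which is why the rule is better stated in terms of "more than one page" than "nr > 1".

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the NR_SECONDARY_PAGETABLE counter. */
static long nr_secondary_pagetable;

/*
 * Model of the helper's contract: nr > 0 accounts pages, nr < 0 unaccounts
 * them. In this model the counter must never go negative.
 */
static void account_pgtable_pages(int nr)
{
	nr_secondary_pagetable += nr;
	assert(nr_secondary_pagetable >= 0);
}

/* Convert a page-multiple byte size to a page count (size >> PAGE_SHIFT). */
static int size_to_pages(size_t size)
{
	return (int)(size >> PAGE_SHIFT);
}
```

Accounting a four-page block would be `account_pgtable_pages(size_to_pages(4 * 4096))`, and freeing it passes the negated count. The sketch converts to `int` before negating so the arithmetic stays signed.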

* Re: [PATCH v5 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats.
  2022-06-06 22:20 ` [PATCH v5 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats Yosry Ahmed
@ 2022-06-27 16:22   ` Sean Christopherson
  2022-06-27 16:29     ` Yosry Ahmed
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2022-06-27 16:22 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, linux-kernel, linux-arm-kernel, kvmarm,
	kvm, linux-mm

On Mon, Jun 06, 2022, Yosry Ahmed wrote:
> Count the pages used by KVM mmu on x86 for in secondary pagetable stats.

"for in" is funky.  And it's worth providing a brief explanation of what the
secondary pagetable stats actually are.  "secondary" is confusingly close to
"second level pagetables", e.g. might be misconstrued as KVM counters for the
number of stage-2 / two-dimensional paging (TDP) page tables.

Code looks good, though it needs a rebase on kvm/queue.

* Re: [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-06-27 16:07   ` Sean Christopherson
@ 2022-06-27 16:23     ` Yosry Ahmed
  0 siblings, 0 replies; 16+ messages in thread
From: Yosry Ahmed @ 2022-06-27 16:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Mon, Jun 27, 2022 at 9:07 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jun 06, 2022, Yosry Ahmed wrote:
> > Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> > KVM mmu. This provides more insights on the kernel memory used
> > by a workload.
>
> Please provide more justification for NR_SECONDARY_PAGETABLE in the changelog.
> Specifically, answer the questions that were asked in the previous version:
>
>   1. Why not piggyback NR_PAGETABLE?
>   2. Why a "generic" NR_SECONDARY_PAGETABLE instead of NR_VIRT_PAGETABLE?
>
> It doesn't have to be super long, but provide enough info so that reviewers and
> future readers don't need to go spelunking to understand the motivation for the
> new counter type.

I added that justification in the cover letter. Would it be better to
include it here instead, or do you think the description in the cover
letter is lacking?

>
> And it's probably worth an explicit Link to Marc's question that prompted the long
> discussion in the previous version, that way if someone does want the gory details
> they have a link readily available.
>
> Link: https://lore.kernel.org/all/87ilqoi77b.wl-maz@kernel.org

I will include the link in the next version.
Thanks!

* Re: [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-06-06 22:20 ` [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
                     ` (2 preceding siblings ...)
  2022-06-27 16:07   ` Sean Christopherson
@ 2022-06-27 16:27   ` Sean Christopherson
  3 siblings, 0 replies; 16+ messages in thread
From: Sean Christopherson @ 2022-06-27 16:27 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, linux-kernel, linux-arm-kernel, kvmarm,
	kvm, linux-mm

On Mon, Jun 06, 2022, Yosry Ahmed wrote:
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 061744c436d99..894d6317f3bdc 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -973,6 +973,7 @@ You may not have all of these fields.
>      SReclaimable:   159856 kB
>      SUnreclaim:     124508 kB
>      PageTables:      24448 kB
> +    SecPageTables:	 0 kB

If/when you rebase, this should probably use all spaces and no tabs to match the
other fields.  Given that it's documentation, I'm guessing the use of spaces is
deliberate.
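
For readers who want to consume the new field, here is a small sketch of pulling one value out of meminfo-style text. The field name `SecPageTables` comes from the patch; the function `meminfo_field_kb()` is hypothetical and not part of any kernel or libc API. It parses a string rather than opening /proc/meminfo, so it can be tried on any sample, and it treats spaces and tabs after the colon the same way.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "<key>:" at the start of a line (ignoring leading blanks) in
 * meminfo-style text and parse the number that follows.
 * Returns 0 on success, -1 if the key is absent or has no number.
 */
static int meminfo_field_kb(const char *text, const char *key, long *kb)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		line += strspn(line, " \t");
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			/* %ld skips leading spaces and tabs alike. */
			return sscanf(line + klen + 1, "%ld", kb) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller should treat -1 as "field not present", since kernels without this series will not report `SecPageTables` at all.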

* Re: [PATCH v5 2/4] KVM: mmu: add a helper to account memory used by KVM MMU.
  2022-06-27 16:20   ` Sean Christopherson
@ 2022-06-27 16:28     ` Yosry Ahmed
  0 siblings, 0 replies; 16+ messages in thread
From: Yosry Ahmed @ 2022-06-27 16:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Mon, Jun 27, 2022 at 9:20 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jun 06, 2022, Yosry Ahmed wrote:
> > Add a helper to account pages used by KVM for page tables in secondary
> > pagetable stats. This function will be used by subsequent patches in
> > different archs.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  include/linux/kvm_host.h | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 883e86ec8e8c4..645585f3a4bed 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2246,6 +2246,15 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
> >  }
> >  #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
> >
> > +/*
> > + * If nr > 1, we assume virt is the address of the first page of a block of
>
> But what if @nr is -2, which is technically less than 1?  :-)
>
> > + * pages that were allocated together (i.e accounted together).
>
> Don't document assumptions, document the rules.  And avoid "we", pronouns are
> ambiguous, e.g. is "we" the author, or KVM, or something else entirely?
>
> /*
>  * If more than one page is being (un)accounted, @virt must be the address of
>  * the first page of a block of pages that were allocated together.
>  */
>

Looks much better, I will use that in the next version.

Thanks!

>
> > + */
> > +static inline void kvm_account_pgtable_pages(void *virt, int nr)
> > +{
> > +     mod_lruvec_page_state(virt_to_page(virt), NR_SECONDARY_PAGETABLE, nr);
> > +}
> > +
> >  /*
> >   * This defines how many reserved entries we want to keep before we
> >   * kick the vcpu to the userspace to avoid dirty ring full.  This
> > --
> > 2.36.1.255.ge46751e96f-goog
> >

* Re: [PATCH v5 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats.
  2022-06-27 16:22   ` Sean Christopherson
@ 2022-06-27 16:29     ` Yosry Ahmed
  0 siblings, 0 replies; 16+ messages in thread
From: Yosry Ahmed @ 2022-06-27 16:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Mon, Jun 27, 2022 at 9:22 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jun 06, 2022, Yosry Ahmed wrote:
> > Count the pages used by KVM mmu on x86 for in secondary pagetable stats.
>
> "for in" is funky.  And it's worth providing a brief explanation of what the
> secondary pagetable stats actually are.  "secondary" is confusingly close to
> "second level pagetables", e.g. might be misconstrued as KVM counters for the
> number of stage-2 / two-dimensional paging (TDP) page tables.
>
> Code looks good, though it needs a rebase on kvm/queue.

Will rebase and modify the commit message accordingly, thanks!

* Re: [PATCH v5 4/4] KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats
  2022-06-06 22:20 ` [PATCH v5 4/4] KVM: arm64/mmu: count KVM s2 " Yosry Ahmed
@ 2022-06-28 18:53   ` Oliver Upton
  0 siblings, 0 replies; 16+ messages in thread
From: Oliver Upton @ 2022-06-28 18:53 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, kvm, linux-kernel, linux-mm, cgroups, kvmarm,
	linux-arm-kernel

Hi Yosry,

On Mon, Jun 06, 2022 at 10:20:58PM +0000, Yosry Ahmed wrote:
> Count the pages used by KVM in arm64 for stage2 mmu in secondary pagetable
> stats.

You could probably benefit from being a bit more verbose in the commit
message here as well, per Sean's feedback.

> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  arch/arm64/kvm/mmu.c | 36 ++++++++++++++++++++++++++++++++----
>  1 file changed, 32 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index f5651a05b6a85..80bc92601fd96 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -92,9 +92,13 @@ static bool kvm_is_device_pfn(unsigned long pfn)
>  static void *stage2_memcache_zalloc_page(void *arg)
>  {
>  	struct kvm_mmu_memory_cache *mc = arg;
> +	void *virt;
>  
>  	/* Allocated with __GFP_ZERO, so no need to zero */
> -	return kvm_mmu_memory_cache_alloc(mc);
> +	virt = kvm_mmu_memory_cache_alloc(mc);
> +	if (virt)
> +		kvm_account_pgtable_pages(virt, 1);
> +	return virt;
>  }
>  
>  static void *kvm_host_zalloc_pages_exact(size_t size)
> @@ -102,6 +106,21 @@ static void *kvm_host_zalloc_pages_exact(size_t size)
>  	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>  }
>  
> +static void *kvm_s2_zalloc_pages_exact(size_t size)
> +{
> +	void *virt = kvm_host_zalloc_pages_exact(size);
> +
> +	if (virt)
> +		kvm_account_pgtable_pages(virt, (size >> PAGE_SHIFT));
> +	return virt;
> +}
> +
> +static void kvm_s2_free_pages_exact(void *virt, size_t size)
> +{
> +	kvm_account_pgtable_pages(virt, -(size >> PAGE_SHIFT));
> +	free_pages_exact(virt, size);
> +}
> +
>  static void kvm_host_get_page(void *addr)
>  {
>  	get_page(virt_to_page(addr));
> @@ -112,6 +131,15 @@ static void kvm_host_put_page(void *addr)
>  	put_page(virt_to_page(addr));
>  }
>  
> +static void kvm_s2_put_page(void *addr)
> +{
> +	struct page *p = virt_to_page(addr);
> +	/* Dropping last refcount, the page will be freed */
> +	if (page_count(p) == 1)
> +		kvm_account_pgtable_pages(addr, -1);
> +	put_page(p);

Probably more of a note to myself with the parallel fault series, but
this is a race waiting to happen. This only works because stage 2 pages
are dropped behind the write lock.

Besides the commit message nit:

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
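
The race noted above comes from the check-then-act shape of `kvm_s2_put_page()`: `page_count(p) == 1` is read in one step and `put_page(p)` drops the reference in another, so without the write lock two callers could both see 2 (neither unaccounts) or interleave in other ways. One common way to close that kind of window is to let the decrement itself report whether it was the last reference. The following is a user-space sketch with C11 atomics, not a proposed kernel change; `struct fake_page`, `fake_put_page()` and the global counter are hypothetical stand-ins.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; not kernel APIs. */
struct fake_page {
	atomic_int refcount;
};

static atomic_long nr_secondary_pagetable;

/*
 * Drop one reference. atomic_fetch_sub() returns the value held before the
 * subtraction, so exactly one caller observes 1 and does the unaccounting.
 */
static bool fake_put_page(struct fake_page *p)
{
	if (atomic_fetch_sub(&p->refcount, 1) == 1) {
		atomic_fetch_sub(&nr_secondary_pagetable, 1);
		return true;	/* last reference: caller may free the page */
	}
	return false;
}
```

This ordering does not map directly onto the patch, because the real helper looks up the lruvec from the page and so needs the page to still be valid when it unaccounts. The sketch only illustrates why a single atomic decrement-and-test avoids the separate read of the count.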

Thread overview: 16+ messages
2022-06-06 22:20 [PATCH v5 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
2022-06-06 22:20 ` [PATCH v5 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
2022-06-10 19:55   ` Shakeel Butt
2022-06-13  3:18   ` Huang, Shaoqin
2022-06-13 17:11     ` Yosry Ahmed
2022-06-27 16:07   ` Sean Christopherson
2022-06-27 16:23     ` Yosry Ahmed
2022-06-27 16:27   ` Sean Christopherson
2022-06-06 22:20 ` [PATCH v5 2/4] KVM: mmu: add a helper to account memory used by KVM MMU Yosry Ahmed
2022-06-27 16:20   ` Sean Christopherson
2022-06-27 16:28     ` Yosry Ahmed
2022-06-06 22:20 ` [PATCH v5 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats Yosry Ahmed
2022-06-27 16:22   ` Sean Christopherson
2022-06-27 16:29     ` Yosry Ahmed
2022-06-06 22:20 ` [PATCH v5 4/4] KVM: arm64/mmu: count KVM s2 " Yosry Ahmed
2022-06-28 18:53   ` Oliver Upton
