linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] KVM: mm: count KVM page table pages in memory stats
@ 2022-04-26  5:38 Yosry Ahmed
  2022-04-26  5:38 ` [PATCH v3 1/6] mm: add NR_SECONDARY_PAGETABLE stat Yosry Ahmed
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-26  5:38 UTC (permalink / raw)
  To: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose
  Cc: linux-mips, kvm, kvm-riscv, linux-kernel, linux-fsdevel,
	linux-mm, cgroups, linux-arm-kernel, kvmarm, Yosry Ahmed

We keep track of several kernel memory stats (total kernel memory, page
tables, stack, vmalloc, etc) on multiple levels (global, per-node,
per-memcg, etc). These stats give insights to users to how much memory
is used by the kernel and for what purposes.

Currently, memory used by kvm for its page tables is not accounted in
any of those kernel memory stats. This patch series accounts the memory
pages used by KVM for page tables in those stats in a new
NR_SECONDARY_PAGETABLE stat.

The riscv and mips patches are not tested due to lack of
resources. Feel free to test or drop them.

Changes in V3:
- Added NR_SECONDARY_PAGETABLE instead of piggybacking on NR_PAGETABLE
  stats.

Changes in V2:
- Added accounting stats for other archs than x86.
- Changed locations in the code where x86 KVM page table stats were
  accounted based on suggestions from Sean Christopherson.


Yosry Ahmed (6):
  mm: add NR_SECONDARY_PAGETABLE stat
  KVM: mmu: add a helper to account page table pages used by KVM.
  KVM: x86/mmu: count KVM page table pages in pagetable stats
  KVM: arm64/mmu: count KVM page table pages in pagetable stats
  KVM: riscv/mmu: count KVM page table pages in pagetable stats
  KVM: mips/mmu: count KVM page table pages in pagetable stats

 arch/arm64/kernel/image-vars.h |  3 ++
 arch/arm64/kvm/hyp/pgtable.c   | 50 +++++++++++++++++++++-------------
 arch/mips/kvm/mips.c           |  1 +
 arch/mips/kvm/mmu.c            |  9 +++++-
 arch/riscv/kvm/mmu.c           | 26 +++++++++++++-----
 arch/x86/kvm/mmu/mmu.c         | 16 +++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c     | 16 +++++++++--
 drivers/base/node.c            |  2 ++
 fs/proc/meminfo.c              |  2 ++
 include/linux/kvm_host.h       |  9 ++++++
 include/linux/mmzone.h         |  1 +
 mm/memcontrol.c                |  1 +
 mm/page_alloc.c                |  6 +++-
 mm/vmstat.c                    |  1 +
 14 files changed, 111 insertions(+), 32 deletions(-)

-- 
2.36.0.rc2.479.g8af0fa9b8e-goog


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/6] mm: add NR_SECONDARY_PAGETABLE stat
  2022-04-26  5:38 [PATCH v3 0/6] KVM: mm: count KVM page table pages in memory stats Yosry Ahmed
@ 2022-04-26  5:38 ` Yosry Ahmed
  2022-04-26  5:39 ` [PATCH v3 2/6] KVM: mmu: add a helper to account page table pages used by KVM Yosry Ahmed
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-26  5:38 UTC (permalink / raw)
  To: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose
  Cc: linux-mips, kvm, kvm-riscv, linux-kernel, linux-fsdevel,
	linux-mm, cgroups, linux-arm-kernel, kvmarm, Yosry Ahmed

Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
KVM shadow page tables. This provides more insights on the kernel memory
used by a workload.

This stat will be used by subsequent patches to count KVM pagetable
pages usage.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 drivers/base/node.c    | 2 ++
 fs/proc/meminfo.c      | 2 ++
 include/linux/mmzone.h | 1 +
 mm/memcontrol.c        | 1 +
 mm/page_alloc.c        | 6 +++++-
 mm/vmstat.c            | 1 +
 6 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index ec8bb24a5a22..9fe716832546 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -433,6 +433,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 			     "Node %d ShadowCallStack:%8lu kB\n"
 #endif
 			     "Node %d PageTables:     %8lu kB\n"
+			     "Node %d SecPageTables:  %8lu kB\n"
 			     "Node %d NFS_Unstable:   %8lu kB\n"
 			     "Node %d Bounce:         %8lu kB\n"
 			     "Node %d WritebackTmp:   %8lu kB\n"
@@ -459,6 +460,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 			     nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
 #endif
 			     nid, K(node_page_state(pgdat, NR_PAGETABLE)),
+			     nid, K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
 			     nid, 0UL,
 			     nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
 			     nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..fad29024eb2e 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -108,6 +108,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
 	show_val_kb(m, "PageTables:     ",
 		    global_node_page_state(NR_PAGETABLE));
+	show_val_kb(m, "SecPageTables:	",
+		    global_node_page_state(NR_SECONDARY_PAGETABLE));
 
 	show_val_kb(m, "NFS_Unstable:   ", 0);
 	show_val_kb(m, "Bounce:         ",
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 962b14d403e8..35f57f2578c0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -219,6 +219,7 @@ enum node_stat_item {
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
 #endif
 	NR_PAGETABLE,		/* used for pagetables */
+	NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. kvm shadow pagetables */
 #ifdef CONFIG_SWAP
 	NR_SWAPCACHE,
 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 725f76723220..89fbd1793960 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,7 @@ static const struct memory_stat memory_stats[] = {
 	{ "kernel",			MEMCG_KMEM			},
 	{ "kernel_stack",		NR_KERNEL_STACK_KB		},
 	{ "pagetables",			NR_PAGETABLE			},
+	{ "secondary_pagetables",	NR_SECONDARY_PAGETABLE		},
 	{ "percpu",			MEMCG_PERCPU_B			},
 	{ "sock",			MEMCG_SOCK			},
 	{ "vmalloc",			MEMCG_VMALLOC			},
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2db95780e003..96d00ae9d5c1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5932,7 +5932,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
 		" unevictable:%lu dirty:%lu writeback:%lu\n"
 		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
-		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
+		" mapped:%lu shmem:%lu pagetables:%lu\n"
+		" secondary_pagetables:%lu bounce:%lu\n"
 		" kernel_misc_reclaimable:%lu\n"
 		" free:%lu free_pcp:%lu free_cma:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
@@ -5949,6 +5950,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		global_node_page_state(NR_FILE_MAPPED),
 		global_node_page_state(NR_SHMEM),
 		global_node_page_state(NR_PAGETABLE),
+		global_node_page_state(NR_SECONDARY_PAGETABLE),
 		global_zone_page_state(NR_BOUNCE),
 		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
 		global_zone_page_state(NR_FREE_PAGES),
@@ -5982,6 +5984,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			" shadow_call_stack:%lukB"
 #endif
 			" pagetables:%lukB"
+			" secondary_pagetables:%lukB"
 			" all_unreclaimable? %s"
 			"\n",
 			pgdat->node_id,
@@ -6007,6 +6010,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			node_page_state(pgdat, NR_KERNEL_SCS_KB),
 #endif
 			K(node_page_state(pgdat, NR_PAGETABLE)),
+			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
 			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
 				"yes" : "no");
 	}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b75b1a64b54c..50bbec73809b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1240,6 +1240,7 @@ const char * const vmstat_text[] = {
 	"nr_shadow_call_stack",
 #endif
 	"nr_page_table_pages",
+	"nr_secondary_page_table_pages",
 #ifdef CONFIG_SWAP
 	"nr_swapcached",
 #endif
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/6] KVM: mmu: add a helper to account page table pages used by KVM.
  2022-04-26  5:38 [PATCH v3 0/6] KVM: mm: count KVM page table pages in memory stats Yosry Ahmed
  2022-04-26  5:38 ` [PATCH v3 1/6] mm: add NR_SECONDARY_PAGETABLE stat Yosry Ahmed
@ 2022-04-26  5:39 ` Yosry Ahmed
  2022-04-26  5:39 ` [PATCH v3 3/6] KVM: x86/mmu: count KVM page table pages in pagetable stats Yosry Ahmed
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-26  5:39 UTC (permalink / raw)
  To: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose
  Cc: linux-mips, kvm, kvm-riscv, linux-kernel, linux-fsdevel,
	linux-mm, cgroups, linux-arm-kernel, kvmarm, Yosry Ahmed

Add a helper to account pages used by KVM for page tables as pagetable
stats. This function will be used by subsequent patches in different
archs.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/kvm_host.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 252ee4a61b58..54cc4634053c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2221,6 +2221,15 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
 
+/*
+ * If nr > 1, we assume virt is the address of the first page of a block of
+ * pages that were allocated together (i.e accounted together).
+ */
+static inline void kvm_account_pgtable_pages(void *virt, int nr)
+{
+	mod_lruvec_page_state(virt_to_page(virt), NR_SECONDARY_PAGETABLE, nr);
+}
+
 /*
  * This defines how many reserved entries we want to keep before we
  * kick the vcpu to the userspace to avoid dirty ring full.  This
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/6] KVM: x86/mmu: count KVM page table pages in pagetable stats
  2022-04-26  5:38 [PATCH v3 0/6] KVM: mm: count KVM page table pages in memory stats Yosry Ahmed
  2022-04-26  5:38 ` [PATCH v3 1/6] mm: add NR_SECONDARY_PAGETABLE stat Yosry Ahmed
  2022-04-26  5:39 ` [PATCH v3 2/6] KVM: mmu: add a helper to account page table pages used by KVM Yosry Ahmed
@ 2022-04-26  5:39 ` Yosry Ahmed
  2022-04-26  5:39 ` [PATCH v3 4/6] KVM: arm64/mmu: " Yosry Ahmed
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-26  5:39 UTC (permalink / raw)
  To: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose
  Cc: linux-mips, kvm, kvm-riscv, linux-kernel, linux-fsdevel,
	linux-mm, cgroups, linux-arm-kernel, kvmarm, Yosry Ahmed

Count the pages used by KVM in x86 for page tables in pagetable stats.

For legacy code, accounting pagetable stats is combined KVM's
existing for mmu pages in newly introduced kvm_[un]account_mmu_page()
helpers.

For tdp mmu, introduce new tdp_[un]account_mmu_page() helpers. That
combines accounting pagetable stats with the tdp_mmu_pages counter
accounting.

tdp_mmu_pages counter introduced in this series [1]. This patch was
rebased on top of the first two patches in that series.

[1]https://lore.kernel.org/lkml/20220401063636.2414200-1-mizhang@google.com/

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 16 ++++++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c | 16 ++++++++++++++--
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 78d8e1d8fb99..e5b0e826445d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1679,6 +1679,18 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
 	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
 }
 
+static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_mod_used_mmu_pages(kvm, +1);
+	kvm_account_pgtable_pages((void *)sp->spt, +1);
+}
+
+static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_mod_used_mmu_pages(kvm, -1);
+	kvm_account_pgtable_pages((void *)sp->spt, -1);
+}
+
 static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
 {
 	MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
@@ -1734,7 +1746,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 	 */
 	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
+	kvm_account_mmu_page(vcpu->kvm, sp);
 	return sp;
 }
 
@@ -2363,7 +2375,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 			list_add(&sp->link, invalid_list);
 		else
 			list_move(&sp->link, invalid_list);
-		kvm_mod_used_mmu_pages(kvm, -1);
+		kvm_unaccount_mmu_page(kvm, sp);
 	} else {
 		/*
 		 * Remove the active root from the active page list, the root
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3456277ade18..6295c4da5dee 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -371,6 +371,18 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 	}
 }
 
+static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	atomic64_inc(&kvm->arch.tdp_mmu_pages);
+	kvm_account_pgtable_pages((void *)sp->spt, +1);
+}
+
+static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	atomic64_dec(&kvm->arch.tdp_mmu_pages);
+	kvm_account_pgtable_pages((void *)sp->spt, -1);
+}
+
 /**
  * tdp_mmu_unlink_sp() - Remove a shadow page from the list of used pages
  *
@@ -383,7 +395,7 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 			      bool shared)
 {
-	atomic64_dec(&kvm->arch.tdp_mmu_pages);
+	tdp_unaccount_mmu_page(kvm, sp);
 
 	if (!sp->lpage_disallowed)
 		return;
@@ -1121,7 +1133,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 		tdp_mmu_set_spte(kvm, iter, spte);
 	}
 
-	atomic64_inc(&kvm->arch.tdp_mmu_pages);
+	tdp_account_mmu_page(kvm, sp);
 
 	return 0;
 }
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 4/6] KVM: arm64/mmu: count KVM page table pages in pagetable stats
  2022-04-26  5:38 [PATCH v3 0/6] KVM: mm: count KVM page table pages in memory stats Yosry Ahmed
                   ` (2 preceding siblings ...)
  2022-04-26  5:39 ` [PATCH v3 3/6] KVM: x86/mmu: count KVM page table pages in pagetable stats Yosry Ahmed
@ 2022-04-26  5:39 ` Yosry Ahmed
  2022-04-26  7:34   ` Oliver Upton
  2022-04-26 15:58   ` Marc Zyngier
  2022-04-26  5:39 ` [PATCH v3 5/6] KVM: riscv/mmu: " Yosry Ahmed
  2022-04-26  5:39 ` [PATCH v3 6/6] KVM: mips/mmu: " Yosry Ahmed
  5 siblings, 2 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-26  5:39 UTC (permalink / raw)
  To: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose
  Cc: linux-mips, kvm, kvm-riscv, linux-kernel, linux-fsdevel,
	linux-mm, cgroups, linux-arm-kernel, kvmarm, Yosry Ahmed

Count the pages used by KVM in arm64 for page tables in pagetable stats.

Account pages allocated for PTEs in pgtable init functions and
kvm_set_table_pte().

Since most page table pages are freed using put_page(), add a helper
function put_pte_page() that checks if this is the last ref for a pte
page before putting it, and unaccounts stats accordingly.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/arm64/kernel/image-vars.h |  3 ++
 arch/arm64/kvm/hyp/pgtable.c   | 50 +++++++++++++++++++++-------------
 2 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 241c86b67d01..25bf058714f6 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -143,6 +143,9 @@ KVM_NVHE_ALIAS(__hyp_rodata_end);
 /* pKVM static key */
 KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
 
+/* Called by kvm_account_pgtable_pages() to update pagetable stats */
+KVM_NVHE_ALIAS(__mod_lruvec_page_state);
+
 #endif /* CONFIG_KVM */
 
 #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 2cb3867eb7c2..53e13c3313e9 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -152,6 +152,7 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
 
 	WARN_ON(kvm_pte_valid(old));
 	smp_store_release(ptep, pte);
+	kvm_account_pgtable_pages((void *)childp, +1);
 }
 
 static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
@@ -326,6 +327,14 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
 	return ret;
 }
 
+static void put_pte_page(kvm_pte_t *ptep, struct kvm_pgtable_mm_ops *mm_ops)
+{
+	/* If this is the last page ref, decrement pagetable stats first. */
+	if (!mm_ops->page_count || mm_ops->page_count(ptep) == 1)
+		kvm_account_pgtable_pages((void *)ptep, -1);
+	mm_ops->put_page(ptep);
+}
+
 struct hyp_map_data {
 	u64				phys;
 	kvm_pte_t			attr;
@@ -488,10 +497,10 @@ static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 
 	dsb(ish);
 	isb();
-	mm_ops->put_page(ptep);
+	put_pte_page(ptep, mm_ops);
 
 	if (childp)
-		mm_ops->put_page(childp);
+		put_pte_page(childp, mm_ops);
 
 	return 0;
 }
@@ -522,6 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 	pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
 	if (!pgt->pgd)
 		return -ENOMEM;
+	kvm_account_pgtable_pages((void *)pgt->pgd, +1);
 
 	pgt->ia_bits		= va_bits;
 	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
@@ -541,10 +551,10 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (!kvm_pte_valid(pte))
 		return 0;
 
-	mm_ops->put_page(ptep);
+	put_pte_page(ptep, mm_ops);
 
 	if (kvm_pte_table(pte, level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+		put_pte_page(kvm_pte_follow(pte, mm_ops), mm_ops);
 
 	return 0;
 }
@@ -558,7 +568,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-	pgt->mm_ops->put_page(pgt->pgd);
+	put_pte_page(pgt->pgd, pgt->mm_ops);
 	pgt->pgd = NULL;
 }
 
@@ -694,7 +704,7 @@ static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
 	}
 
-	mm_ops->put_page(ptep);
+	put_pte_page(ptep, mm_ops);
 }
 
 static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
@@ -795,7 +805,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 
 	if (data->anchor) {
 		if (stage2_pte_is_counted(pte))
-			mm_ops->put_page(ptep);
+			put_pte_page(ptep, mm_ops);
 
 		return 0;
 	}
@@ -848,8 +858,8 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
 		childp = kvm_pte_follow(*ptep, mm_ops);
 	}
 
-	mm_ops->put_page(childp);
-	mm_ops->put_page(ptep);
+	put_pte_page(childp, mm_ops);
+	put_pte_page(ptep, mm_ops);
 
 	return ret;
 }
@@ -962,7 +972,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (!kvm_pte_valid(pte)) {
 		if (stage2_pte_is_counted(pte)) {
 			kvm_clear_pte(ptep);
-			mm_ops->put_page(ptep);
+			put_pte_page(ptep, mm_ops);
 		}
 		return 0;
 	}
@@ -988,7 +998,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 					       kvm_granule_size(level));
 
 	if (childp)
-		mm_ops->put_page(childp);
+		put_pte_page(childp, mm_ops);
 
 	return 0;
 }
@@ -1177,16 +1187,17 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 			      enum kvm_pgtable_stage2_flags flags,
 			      kvm_pgtable_force_pte_cb_t force_pte_cb)
 {
-	size_t pgd_sz;
+	u32 pgd_num;
 	u64 vtcr = mmu->arch->vtcr;
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
 	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 
-	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
-	pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
+	pgd_num = kvm_pgd_pages(ia_bits, start_level);
+	pgt->pgd = mm_ops->zalloc_pages_exact(pgd_num * PAGE_SIZE);
 	if (!pgt->pgd)
 		return -ENOMEM;
+	kvm_account_pgtable_pages((void *)pgt->pgd, +pgd_num);
 
 	pgt->ia_bits		= ia_bits;
 	pgt->start_level	= start_level;
@@ -1210,17 +1221,17 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (!stage2_pte_is_counted(pte))
 		return 0;
 
-	mm_ops->put_page(ptep);
+	put_pte_page(ptep, mm_ops);
 
 	if (kvm_pte_table(pte, level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+		put_pte_page(kvm_pte_follow(pte, mm_ops), mm_ops);
 
 	return 0;
 }
 
 void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 {
-	size_t pgd_sz;
+	u32 pgd_num;
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
@@ -1229,7 +1240,8 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-	pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
-	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
+	pgd_num = kvm_pgd_pages(pgt->ia_bits, pgt->start_level);
+	kvm_account_pgtable_pages((void *)pgt->pgd, -pgd_num);
+	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_num * PAGE_SIZE);
 	pgt->pgd = NULL;
 }
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 5/6] KVM: riscv/mmu: count KVM page table pages in pagetable stats
  2022-04-26  5:38 [PATCH v3 0/6] KVM: mm: count KVM page table pages in memory stats Yosry Ahmed
                   ` (3 preceding siblings ...)
  2022-04-26  5:39 ` [PATCH v3 4/6] KVM: arm64/mmu: " Yosry Ahmed
@ 2022-04-26  5:39 ` Yosry Ahmed
  2022-04-26  5:39 ` [PATCH v3 6/6] KVM: mips/mmu: " Yosry Ahmed
  5 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-26  5:39 UTC (permalink / raw)
  To: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose
  Cc: linux-mips, kvm, kvm-riscv, linux-kernel, linux-fsdevel,
	linux-mm, cgroups, linux-arm-kernel, kvmarm, Yosry Ahmed

Count the pages used by KVM in riscv for page tables in pagetable stats.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/riscv/kvm/mmu.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index f80a34fbf102..fcfb75713750 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -152,6 +152,7 @@ static int stage2_set_pte(struct kvm *kvm, u32 level,
 			next_ptep = kvm_mmu_memory_cache_alloc(pcache);
 			if (!next_ptep)
 				return -ENOMEM;
+			kvm_account_pgtable_pages((void *)next_ptep, +1);
 			*ptep = pfn_pte(PFN_DOWN(__pa(next_ptep)),
 					__pgprot(_PAGE_TABLE));
 		} else {
@@ -229,6 +230,7 @@ static void stage2_op_pte(struct kvm *kvm, gpa_t addr,
 	pte_t *next_ptep;
 	u32 next_ptep_level;
 	unsigned long next_page_size, page_size;
+	struct page *p;
 
 	ret = stage2_level_to_page_size(ptep_level, &page_size);
 	if (ret)
@@ -252,8 +254,13 @@ static void stage2_op_pte(struct kvm *kvm, gpa_t addr,
 		for (i = 0; i < PTRS_PER_PTE; i++)
 			stage2_op_pte(kvm, addr + i * next_page_size,
 					&next_ptep[i], next_ptep_level, op);
-		if (op == STAGE2_OP_CLEAR)
-			put_page(virt_to_page(next_ptep));
+		if (op == STAGE2_OP_CLEAR) {
+			p = virt_to_page(next_ptep);
+			if (page_count(p) == 1)
+				kvm_account_pgtable_pages((void *)next_ptep,
+							  -1);
+			put_page(p);
+		}
 	} else {
 		if (op == STAGE2_OP_CLEAR)
 			set_pte(ptep, __pte(0));
@@ -700,25 +707,27 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
 int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm)
 {
 	struct page *pgd_page;
+	int order;
 
 	if (kvm->arch.pgd != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
 
-	pgd_page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
-				get_order(stage2_pgd_size));
+	order = get_order(stage2_pgd_size);
+	pgd_page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
 	if (!pgd_page)
 		return -ENOMEM;
 	kvm->arch.pgd = page_to_virt(pgd_page);
 	kvm->arch.pgd_phys = page_to_phys(pgd_page);
-
+	kvm_account_pgtable_pages((void *)kvm->arch.pgd, +(1UL << order));
 	return 0;
 }
 
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm)
 {
 	void *pgd = NULL;
+	int order;
 
 	spin_lock(&kvm->mmu_lock);
 	if (kvm->arch.pgd) {
@@ -729,8 +738,11 @@ void kvm_riscv_stage2_free_pgd(struct kvm *kvm)
 	}
 	spin_unlock(&kvm->mmu_lock);
 
-	if (pgd)
-		free_pages((unsigned long)pgd, get_order(stage2_pgd_size));
+	if (pgd) {
+		order = get_order(stage2_pgd_size);
+		kvm_account_pgtable_pages((void *)pgd, -(1UL << order));
+		free_pages((unsigned long)pgd, order);
+	}
 }
 
 void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu)
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 6/6] KVM: mips/mmu: count KVM page table pages in pagetable stats
  2022-04-26  5:38 [PATCH v3 0/6] KVM: mm: count KVM page table pages in memory stats Yosry Ahmed
                   ` (4 preceding siblings ...)
  2022-04-26  5:39 ` [PATCH v3 5/6] KVM: riscv/mmu: " Yosry Ahmed
@ 2022-04-26  5:39 ` Yosry Ahmed
  5 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-26  5:39 UTC (permalink / raw)
  To: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose
  Cc: linux-mips, kvm, kvm-riscv, linux-kernel, linux-fsdevel,
	linux-mm, cgroups, linux-arm-kernel, kvmarm, Yosry Ahmed

Count the pages used by KVM in mips for page tables in pagetable stats.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/mips/kvm/mips.c | 1 +
 arch/mips/kvm/mmu.c  | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a25e0b73ee70..e60c1920a408 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -175,6 +175,7 @@ static void kvm_mips_free_gpa_pt(struct kvm *kvm)
 {
 	/* It should always be safe to remove after flushing the whole range */
 	WARN_ON(!kvm_mips_flush_gpa_pt(kvm, 0, ~0));
+	kvm_account_pgtable_pages((void *)kvm->arch.gpa_mm.pgd, -1);
 	pgd_free(NULL, kvm->arch.gpa_mm.pgd);
 }
 
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 1bfd1b501d82..18da2ac2ded7 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -81,8 +81,10 @@ pgd_t *kvm_pgd_alloc(void)
 	pgd_t *ret;
 
 	ret = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD_ORDER);
-	if (ret)
+	if (ret) {
 		kvm_pgd_init(ret);
+		kvm_account_pgtable_pages((void *)ret, +1);
+	}
 
 	return ret;
 }
@@ -125,6 +127,7 @@ static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct kvm_mmu_memory_cache *cache,
 		pmd_init((unsigned long)new_pmd,
 			 (unsigned long)invalid_pte_table);
 		pud_populate(NULL, pud, new_pmd);
+		kvm_account_pgtable_pages((void *)new_pmd, +1);
 	}
 	pmd = pmd_offset(pud, addr);
 	if (pmd_none(*pmd)) {
@@ -135,6 +138,7 @@ static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct kvm_mmu_memory_cache *cache,
 		new_pte = kvm_mmu_memory_cache_alloc(cache);
 		clear_page(new_pte);
 		pmd_populate_kernel(NULL, pmd, new_pte);
+		kvm_account_pgtable_pages((void *)new_pte, +1);
 	}
 	return pte_offset_kernel(pmd, addr);
 }
@@ -189,6 +193,7 @@ static bool kvm_mips_flush_gpa_pmd(pmd_t *pmd, unsigned long start_gpa,
 
 		if (kvm_mips_flush_gpa_pte(pte, start_gpa, end)) {
 			pmd_clear(pmd + i);
+			kvm_account_pgtable_pages((void *)pte, -1);
 			pte_free_kernel(NULL, pte);
 		} else {
 			safe_to_remove = false;
@@ -217,6 +222,7 @@ static bool kvm_mips_flush_gpa_pud(pud_t *pud, unsigned long start_gpa,
 
 		if (kvm_mips_flush_gpa_pmd(pmd, start_gpa, end)) {
 			pud_clear(pud + i);
+			kvm_account_pgtable_pages((void *)pmd, -1);
 			pmd_free(NULL, pmd);
 		} else {
 			safe_to_remove = false;
@@ -247,6 +253,7 @@ static bool kvm_mips_flush_gpa_pgd(pgd_t *pgd, unsigned long start_gpa,
 
 		if (kvm_mips_flush_gpa_pud(pud, start_gpa, end)) {
 			pgd_clear(pgd + i);
+			kvm_account_pgtable_pages((void *)pud, -1);
 			pud_free(NULL, pud);
 		} else {
 			safe_to_remove = false;
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/6] KVM: arm64/mmu: count KVM page table pages in pagetable stats
  2022-04-26  5:39 ` [PATCH v3 4/6] KVM: arm64/mmu: " Yosry Ahmed
@ 2022-04-26  7:34   ` Oliver Upton
  2022-04-26 19:27     ` Yosry Ahmed
  2022-04-26 15:58   ` Marc Zyngier
  1 sibling, 1 reply; 13+ messages in thread
From: Oliver Upton @ 2022-04-26  7:34 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose, linux-mips, kvm, kvm-riscv,
	linux-kernel, linux-fsdevel, linux-mm, cgroups, linux-arm-kernel,
	kvmarm

Hi Yosry,

On Tue, Apr 26, 2022 at 05:39:02AM +0000, Yosry Ahmed wrote:
> Count the pages used by KVM in arm64 for page tables in pagetable stats.
> 
> Account pages allocated for PTEs in pgtable init functions and
> kvm_set_table_pte().
> 
> Since most page table pages are freed using put_page(), add a helper
> function put_pte_page() that checks if this is the last ref for a pte
> page before putting it, and unaccounts stats accordingly.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  arch/arm64/kernel/image-vars.h |  3 ++
>  arch/arm64/kvm/hyp/pgtable.c   | 50 +++++++++++++++++++++-------------
>  2 files changed, 34 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index 241c86b67d01..25bf058714f6 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -143,6 +143,9 @@ KVM_NVHE_ALIAS(__hyp_rodata_end);
>  /* pKVM static key */
>  KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
>  
> +/* Called by kvm_account_pgtable_pages() to update pagetable stats */
> +KVM_NVHE_ALIAS(__mod_lruvec_page_state);
> +
>  #endif /* CONFIG_KVM */
>  
>  #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 2cb3867eb7c2..53e13c3313e9 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -152,6 +152,7 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
>  
>  	WARN_ON(kvm_pte_valid(old));
>  	smp_store_release(ptep, pte);
> +	kvm_account_pgtable_pages((void *)childp, +1);

What page tables do we want to account? KVM on ARM manages several page
tables.

For regular KVM, the host kernel manages allocations for the hyp stage 1
tables in addition to the stage 2 tables used for a particular VM. The
former is system overhead whereas the latter could be attributed to a
guest VM.

I imagine protected KVM is out of scope, since it actually manages its
own allocations outside of the host kernel.

Given this, I would recommend adding the accounting hooks to mmu.c as
that is where we alloc/free table pages and it is in the host address
space. kvm_s2_mm_ops and kvm_hyp_mm_ops point to all the relevant
functions, though the latter is only relevant if we want to count system
page tables too.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/6] KVM: arm64/mmu: count KVM page table pages in pagetable stats
  2022-04-26  5:39 ` [PATCH v3 4/6] KVM: arm64/mmu: " Yosry Ahmed
  2022-04-26  7:34   ` Oliver Upton
@ 2022-04-26 15:58   ` Marc Zyngier
  2022-04-26 19:33     ` Yosry Ahmed
  1 sibling, 1 reply; 13+ messages in thread
From: Marc Zyngier @ 2022-04-26 15:58 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Alexandru Elisei,
	Suzuki K Poulose, linux-mips, kvm, kvm-riscv, linux-kernel,
	linux-fsdevel, linux-mm, cgroups, linux-arm-kernel, kvmarm

On Tue, 26 Apr 2022 06:39:02 +0100,
Yosry Ahmed <yosryahmed@google.com> wrote:
> 
> Count the pages used by KVM in arm64 for page tables in pagetable stats.
> 
> Account pages allocated for PTEs in pgtable init functions and
> kvm_set_table_pte().
> 
> Since most page table pages are freed using put_page(), add a helper
> function put_pte_page() that checks if this is the last ref for a pte
> page before putting it, and unaccounts stats accordingly.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  arch/arm64/kernel/image-vars.h |  3 ++
>  arch/arm64/kvm/hyp/pgtable.c   | 50 +++++++++++++++++++++-------------
>  2 files changed, 34 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index 241c86b67d01..25bf058714f6 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -143,6 +143,9 @@ KVM_NVHE_ALIAS(__hyp_rodata_end);
>  /* pKVM static key */
>  KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
>  
> +/* Called by kvm_account_pgtable_pages() to update pagetable stats */
> +KVM_NVHE_ALIAS(__mod_lruvec_page_state);

This cannot be right. It means that this function will be called
directly from the EL2 code when in protected mode, and will result in
extreme fireworks.  There is no way you can call core kernel stuff
like this from this context.

Please do not add random symbols to this list just for the sake of
being able to link the kernel.

> +
>  #endif /* CONFIG_KVM */
>  
>  #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 2cb3867eb7c2..53e13c3313e9 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -152,6 +152,7 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
>  
>  	WARN_ON(kvm_pte_valid(old));
>  	smp_store_release(ptep, pte);
> +	kvm_account_pgtable_pages((void *)childp, +1);

Why the + sign?

>  }
>  
>  static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> @@ -326,6 +327,14 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>  	return ret;
>  }
>  
> +static void put_pte_page(kvm_pte_t *ptep, struct kvm_pgtable_mm_ops *mm_ops)
> +{
> +	/* If this is the last page ref, decrement pagetable stats first. */
> +	if (!mm_ops->page_count || mm_ops->page_count(ptep) == 1)
> +		kvm_account_pgtable_pages((void *)ptep, -1);
> +	mm_ops->put_page(ptep);
> +}
> +
>  struct hyp_map_data {
>  	u64				phys;
>  	kvm_pte_t			attr;
> @@ -488,10 +497,10 @@ static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  
>  	dsb(ish);
>  	isb();
> -	mm_ops->put_page(ptep);
> +	put_pte_page(ptep, mm_ops);
>  
>  	if (childp)
> -		mm_ops->put_page(childp);
> +		put_pte_page(childp, mm_ops);
>  
>  	return 0;
>  }
> @@ -522,6 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>  	pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
>  	if (!pgt->pgd)
>  		return -ENOMEM;
> +	kvm_account_pgtable_pages((void *)pgt->pgd, +1);
>  
>  	pgt->ia_bits		= va_bits;
>  	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
> @@ -541,10 +551,10 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  	if (!kvm_pte_valid(pte))
>  		return 0;
>  
> -	mm_ops->put_page(ptep);
> +	put_pte_page(ptep, mm_ops);
>  
>  	if (kvm_pte_table(pte, level))
> -		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> +		put_pte_page(kvm_pte_follow(pte, mm_ops), mm_ops);

OK, I see the pattern. I don't think this workable as such. I'd rather
the callbacks themselves (put_page, zalloc_page*) call into the
accounting code when it makes sense, rather than spreading the
complexity and having to special case the protected case.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/6] KVM: arm64/mmu: count KVM page table pages in pagetable stats
  2022-04-26  7:34   ` Oliver Upton
@ 2022-04-26 19:27     ` Yosry Ahmed
  2022-04-28 17:45       ` Oliver Upton
  0 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-26 19:27 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose, linux-mips, kvm, kvm-riscv,
	Linux Kernel Mailing List, linux-fsdevel, Linux-MM, cgroups,
	linux-arm-kernel, kvmarm

Hi Oliver,
Thanks so much for taking the time to take a look at this!

On Tue, Apr 26, 2022 at 12:35 AM Oliver Upton <oupton@google.com> wrote:
>
> Hi Yosry,
>
> On Tue, Apr 26, 2022 at 05:39:02AM +0000, Yosry Ahmed wrote:
> > Count the pages used by KVM in arm64 for page tables in pagetable stats.
> >
> > Account pages allocated for PTEs in pgtable init functions and
> > kvm_set_table_pte().
> >
> > Since most page table pages are freed using put_page(), add a helper
> > function put_pte_page() that checks if this is the last ref for a pte
> > page before putting it, and unaccounts stats accordingly.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  arch/arm64/kernel/image-vars.h |  3 ++
> >  arch/arm64/kvm/hyp/pgtable.c   | 50 +++++++++++++++++++++-------------
> >  2 files changed, 34 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> > index 241c86b67d01..25bf058714f6 100644
> > --- a/arch/arm64/kernel/image-vars.h
> > +++ b/arch/arm64/kernel/image-vars.h
> > @@ -143,6 +143,9 @@ KVM_NVHE_ALIAS(__hyp_rodata_end);
> >  /* pKVM static key */
> >  KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
> >
> > +/* Called by kvm_account_pgtable_pages() to update pagetable stats */
> > +KVM_NVHE_ALIAS(__mod_lruvec_page_state);
> > +
> >  #endif /* CONFIG_KVM */
> >
> >  #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 2cb3867eb7c2..53e13c3313e9 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -152,6 +152,7 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
> >
> >       WARN_ON(kvm_pte_valid(old));
> >       smp_store_release(ptep, pte);
> > +     kvm_account_pgtable_pages((void *)childp, +1);
>
> What page tables do we want to account? KVM on ARM manages several page
> tables.
>
> For regular KVM, the host kernel manages allocations for the hyp stage 1
> tables in addition to the stage 2 tables used for a particular VM. The
> former is system overhead whereas the latter could be attributed to a
> guest VM.

Honestly I would love to get your input on this. The main motivation
here is to give users insights on the kernel memory usage on their
system (or in a cgroup). We currently have NR_PAGETABLE stats for
normal kernel page tables (allocated using
__pte_alloc_one()/pte_free()), this shows up in /proc/meminfo,
/path/to/cgroup/memory.stat, and node stats. The idea is to add
NR_SECONDARY_PAGETABLE that should include the memory used for kvm
pagetables, which should be a separate category (no overlap). What
gets included or not depends on the semantics of KVM and what exactly
falls under the category of secondary pagetables from the user's pov.

Currently it looks like s2 page table allocations get accounted to
kmem of memory control groups (GFP_KERNEL_ACCOUNT), while hyp page
table allocations do not (GFP_KERNEL). So we could either follow this
and only account s2 page table allocations in the stats, or make hyp
allocations use GFP_KERNEL_ACCOUNT as well and add them to the stats.
Let me know what you think.

>
> I imagine protected KVM is out of scope, since it actually manages its
> own allocations outside of the host kernel.
>
> Given this, I would recommend adding the accounting hooks to mmu.c as
> that is where we alloc/free table pages and it is in the host address
> space. kvm_s2_mm_ops and kvm_hyp_mm_ops point to all the relevant
> functions, though the latter is only relevant if we want to count system
> page tables too.

Yeah moving the accounting hooks to mmu.c is much cleaner, I will do
this in the next version. The only reason I did not do this is that I
found other kvm_pgtable_mm_ops structs (such as pkvm_pgtable_mm_ops),
but it looks like these may be irrelevant here.

>
> --
> Thanks,
> Oliver

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/6] KVM: arm64/mmu: count KVM page table pages in pagetable stats
  2022-04-26 15:58   ` Marc Zyngier
@ 2022-04-26 19:33     ` Yosry Ahmed
  0 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-26 19:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Alexandru Elisei,
	Suzuki K Poulose, linux-mips, kvm, kvm-riscv,
	Linux Kernel Mailing List, linux-fsdevel, Linux-MM, cgroups,
	linux-arm-kernel, kvmarm

Thanks a lot for taking the time to look at this!

On Tue, Apr 26, 2022 at 8:58 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 26 Apr 2022 06:39:02 +0100,
> Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > Count the pages used by KVM in arm64 for page tables in pagetable stats.
> >
> > Account pages allocated for PTEs in pgtable init functions and
> > kvm_set_table_pte().
> >
> > Since most page table pages are freed using put_page(), add a helper
> > function put_pte_page() that checks if this is the last ref for a pte
> > page before putting it, and unaccounts stats accordingly.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  arch/arm64/kernel/image-vars.h |  3 ++
> >  arch/arm64/kvm/hyp/pgtable.c   | 50 +++++++++++++++++++++-------------
> >  2 files changed, 34 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> > index 241c86b67d01..25bf058714f6 100644
> > --- a/arch/arm64/kernel/image-vars.h
> > +++ b/arch/arm64/kernel/image-vars.h
> > @@ -143,6 +143,9 @@ KVM_NVHE_ALIAS(__hyp_rodata_end);
> >  /* pKVM static key */
> >  KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
> >
> > +/* Called by kvm_account_pgtable_pages() to update pagetable stats */
> > +KVM_NVHE_ALIAS(__mod_lruvec_page_state);
>
> This cannot be right. It means that this function will be called
> directly from the EL2 code when in protected mode, and will result in
> extreme fireworks.  There is no way you can call core kernel stuff
> like this from this context.
>
> Please do not add random symbols to this list just for the sake of
> being able to link the kernel.

Excuse my ignorance, this is my first time touching kvm code. Thanks a
lot for pointing this out.

>
> > +
> >  #endif /* CONFIG_KVM */
> >
> >  #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 2cb3867eb7c2..53e13c3313e9 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -152,6 +152,7 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
> >
> >       WARN_ON(kvm_pte_valid(old));
> >       smp_store_release(ptep, pte);
> > +     kvm_account_pgtable_pages((void *)childp, +1);
>
> Why the + sign?

I am following conventions in other existing stat accounting hooks
(e.g. kvm_mod_used_mmu_pages(vcpu->kvm, +1) call in
arch/x86/kvm/mmu/mmu.c), but I can certainly remove it if you think
this is better.

>
> >  }
> >
> >  static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> > @@ -326,6 +327,14 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
> >       return ret;
> >  }
> >
> > +static void put_pte_page(kvm_pte_t *ptep, struct kvm_pgtable_mm_ops *mm_ops)
> > +{
> > +     /* If this is the last page ref, decrement pagetable stats first. */
> > +     if (!mm_ops->page_count || mm_ops->page_count(ptep) == 1)
> > +             kvm_account_pgtable_pages((void *)ptep, -1);
> > +     mm_ops->put_page(ptep);
> > +}
> > +
> >  struct hyp_map_data {
> >       u64                             phys;
> >       kvm_pte_t                       attr;
> > @@ -488,10 +497,10 @@ static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> >
> >       dsb(ish);
> >       isb();
> > -     mm_ops->put_page(ptep);
> > +     put_pte_page(ptep, mm_ops);
> >
> >       if (childp)
> > -             mm_ops->put_page(childp);
> > +             put_pte_page(childp, mm_ops);
> >
> >       return 0;
> >  }
> > @@ -522,6 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> >       pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
> >       if (!pgt->pgd)
> >               return -ENOMEM;
> > +     kvm_account_pgtable_pages((void *)pgt->pgd, +1);
> >
> >       pgt->ia_bits            = va_bits;
> >       pgt->start_level        = KVM_PGTABLE_MAX_LEVELS - levels;
> > @@ -541,10 +551,10 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> >       if (!kvm_pte_valid(pte))
> >               return 0;
> >
> > -     mm_ops->put_page(ptep);
> > +     put_pte_page(ptep, mm_ops);
> >
> >       if (kvm_pte_table(pte, level))
> > -             mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> > +             put_pte_page(kvm_pte_follow(pte, mm_ops), mm_ops);
>
> OK, I see the pattern. I don't think this workable as such. I'd rather
> the callbacks themselves (put_page, zalloc_page*) call into the
> accounting code when it makes sense, rather than spreading the
> complexity and having to special case the protected case.
>

This makes sense. I am working on moving calls to
kvm_account_pgtable_pages to callbacks in mmu.c in the next version
(stage2_memcache_zalloc_page, kvm_host_put_page, etc).


> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/6] KVM: arm64/mmu: count KVM page table pages in pagetable stats
  2022-04-26 19:27     ` Yosry Ahmed
@ 2022-04-28 17:45       ` Oliver Upton
  2022-04-28 23:49         ` Yosry Ahmed
  0 siblings, 1 reply; 13+ messages in thread
From: Oliver Upton @ 2022-04-28 17:45 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose, linux-mips, kvm, kvm-riscv,
	Linux Kernel Mailing List, linux-fsdevel, Linux-MM, cgroups,
	linux-arm-kernel, kvmarm

On Tue, Apr 26, 2022 at 12:27:57PM -0700, Yosry Ahmed wrote:
> > What page tables do we want to account? KVM on ARM manages several page
> > tables.
> >
> > For regular KVM, the host kernel manages allocations for the hyp stage 1
> > tables in addition to the stage 2 tables used for a particular VM. The
> > former is system overhead whereas the latter could be attributed to a
> > guest VM.
> 
> Honestly I would love to get your input on this. The main motivation
> here is to give users insights on the kernel memory usage on their
> system (or in a cgroup). We currently have NR_PAGETABLE stats for
> normal kernel page tables (allocated using
> __pte_alloc_one()/pte_free()), this shows up in /proc/meminfo,
> /path/to/cgroup/memory.stat, and node stats. The idea is to add
> NR_SECONDARY_PAGETABLE that should include the memory used for kvm
> pagetables, which should be a separate category (no overlap). What
> gets included or not depends on the semantics of KVM and what exactly
> falls under the category of secondary pagetables from the user's pov.
> 
> Currently it looks like s2 page table allocations get accounted to
> kmem of memory control groups (GFP_KERNEL_ACCOUNT), while hyp page
> table allocations do not (GFP_KERNEL). So we could either follow this
> and only account s2 page table allocations in the stats, or make hyp
> allocations use GFP_KERNEL_ACCOUNT as well and add them to the stats.
> Let me know what you think.

I think it is reasonable to just focus on stage 2 table allocations and
ignore all else. As Marc pointed out it isn't workable in other
contexts anyway (pKVM), and keeps the patch tidy too.

GFP_KERNEL_ACCOUNT for hyp allocations wouldn't make sense, as it is
done at init to build out the system page tables for EL2.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/6] KVM: arm64/mmu: count KVM page table pages in pagetable stats
  2022-04-28 17:45       ` Oliver Upton
@ 2022-04-28 23:49         ` Yosry Ahmed
  0 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-04-28 23:49 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Sean Christopherson, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	James Morse, Catalin Marinas, Shameer Kolothum, Marc Zyngier,
	Alexandru Elisei, Suzuki K Poulose, linux-mips, kvm, kvm-riscv,
	Linux Kernel Mailing List, linux-fsdevel, Linux-MM, cgroups,
	linux-arm-kernel, kvmarm

On Thu, Apr 28, 2022 at 10:45 AM Oliver Upton <oupton@google.com> wrote:
>
> On Tue, Apr 26, 2022 at 12:27:57PM -0700, Yosry Ahmed wrote:
> > > What page tables do we want to account? KVM on ARM manages several page
> > > tables.
> > >
> > > For regular KVM, the host kernel manages allocations for the hyp stage 1
> > > tables in addition to the stage 2 tables used for a particular VM. The
> > > former is system overhead whereas the latter could be attributed to a
> > > guest VM.
> >
> > Honestly I would love to get your input on this. The main motivation
> > here is to give users insights on the kernel memory usage on their
> > system (or in a cgroup). We currently have NR_PAGETABLE stats for
> > normal kernel page tables (allocated using
> > __pte_alloc_one()/pte_free()), this shows up in /proc/meminfo,
> > /path/to/cgroup/memory.stat, and node stats. The idea is to add
> > NR_SECONDARY_PAGETABLE that should include the memory used for kvm
> > pagetables, which should be a separate category (no overlap). What
> > gets included or not depends on the semantics of KVM and what exactly
> > falls under the category of secondary pagetables from the user's pov.
> >
> > Currently it looks like s2 page table allocations get accounted to
> > kmem of memory control groups (GFP_KERNEL_ACCOUNT), while hyp page
> > table allocations do not (GFP_KERNEL). So we could either follow this
> > and only account s2 page table allocations in the stats, or make hyp
> > allocations use GFP_KERNEL_ACCOUNT as well and add them to the stats.
> > Let me know what you think.
>
> I think it is reasonable to just focus on stage 2 table allocations and
> ignore all else. As Marc pointed out it isn't workable in other
> contexts anyway (pKVM), and keeps the patch tidy too.
>
> GFP_KERNEL_ACCOUNT for hyp allocations wouldn't make sense, as it is
> done at init to build out the system page tables for EL2.

Thanks so much for the insights, will send out v4 according to our discussion.

>
> --
> Thanks,
> Oliver

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-04-28 23:49 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-26  5:38 [PATCH v3 0/6] KVM: mm: count KVM page table pages in memory stats Yosry Ahmed
2022-04-26  5:38 ` [PATCH v3 1/6] mm: add NR_SECONDARY_PAGETABLE stat Yosry Ahmed
2022-04-26  5:39 ` [PATCH v3 2/6] KVM: mmu: add a helper to account page table pages used by KVM Yosry Ahmed
2022-04-26  5:39 ` [PATCH v3 3/6] KVM: x86/mmu: count KVM page table pages in pagetable stats Yosry Ahmed
2022-04-26  5:39 ` [PATCH v3 4/6] KVM: arm64/mmu: " Yosry Ahmed
2022-04-26  7:34   ` Oliver Upton
2022-04-26 19:27     ` Yosry Ahmed
2022-04-28 17:45       ` Oliver Upton
2022-04-28 23:49         ` Yosry Ahmed
2022-04-26 15:58   ` Marc Zyngier
2022-04-26 19:33     ` Yosry Ahmed
2022-04-26  5:39 ` [PATCH v3 5/6] KVM: riscv/mmu: " Yosry Ahmed
2022-04-26  5:39 ` [PATCH v3 6/6] KVM: mips/mmu: " Yosry Ahmed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).