linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/4] KVM: mm: count KVM mmu usage in memory stats
@ 2022-04-29 20:11 Yosry Ahmed
  2022-04-29 20:11 ` [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Yosry Ahmed @ 2022-04-29 20:11 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

We keep track of several kernel memory stats (total kernel memory, page
tables, stack, vmalloc, etc) on multiple levels (global, per-node,
per-memcg, etc). These stats give insights to users to how much memory
is used by the kernel and for what purposes.

Currently, memory used by kvm mmu is not accounted in any of those
kernel memory stats. This patch series accounts the memory pages
used by KVM for page tables in those stats in a new
NR_SECONDARY_PAGETABLE stat.

---

Changes in V4:
- Changed accounting hooks in arm64 to only account s2 page tables and
  refactored them to a much cleaner form, based on recommendations from
  Oliver Upton and Marc Zyngier.
- Dropped patches for mips and riscv. I am not interested in those archs
  anyway and don't have the resources to test them. I posted them for
  completeness but it doesn't seem like anyone was interested.

Changes in V3:
- Added NR_SECONDARY_PAGETABLE instead of piggybacking on NR_PAGETABLE
  stats.

Changes in V2:
- Added accounting stats for other archs than x86.
- Changed locations in the code where x86 KVM page table stats were
  accounted based on suggestions from Sean Christopherson.

---

Yosry Ahmed (4):
  mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  KVM: mmu: add a helper to account memory used by KVM mmu.
  KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats.
  KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats

 Documentation/admin-guide/cgroup-v2.rst |  5 ++++
 Documentation/filesystems/proc.rst      |  4 +++
 arch/arm64/kvm/mmu.c                    | 35 ++++++++++++++++++++++---
 arch/x86/kvm/mmu/mmu.c                  | 16 +++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c              | 16 +++++++++--
 drivers/base/node.c                     |  2 ++
 fs/proc/meminfo.c                       |  2 ++
 include/linux/kvm_host.h                |  9 +++++++
 include/linux/mmzone.h                  |  1 +
 mm/memcontrol.c                         |  1 +
 mm/page_alloc.c                         |  6 ++++-
 mm/vmstat.c                             |  1 +
 12 files changed, 89 insertions(+), 9 deletions(-)

-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-04-29 20:11 [PATCH v4 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
@ 2022-04-29 20:11 ` Yosry Ahmed
  2022-05-02 10:01   ` Marc Zyngier
  2022-04-29 20:11 ` [PATCH v4 2/4] KVM: mmu: add a helper to account memory used by KVM mmu Yosry Ahmed
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2022-04-29 20:11 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
KVM mmu. This provides more insights on the kernel memory used
by a workload.

This stat will be used by subsequent patches to count KVM mmu
memory usage.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 5 +++++
 Documentation/filesystems/proc.rst      | 4 ++++
 drivers/base/node.c                     | 2 ++
 fs/proc/meminfo.c                       | 2 ++
 include/linux/mmzone.h                  | 1 +
 mm/memcontrol.c                         | 1 +
 mm/page_alloc.c                         | 6 +++++-
 mm/vmstat.c                             | 1 +
 8 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 69d7a6983f78..828cb6b6f918 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
 	  pagetables
                 Amount of memory allocated for page tables.
 
+	  secondary_pagetables
+		Amount of memory allocated for secondary page tables,
+		this currently includes KVM mmu allocations on x86
+		and arm64.
+
 	  percpu (npn)
 		Amount of memory used for storing per-cpu kernel
 		data structures.
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 061744c436d9..894d6317f3bd 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -973,6 +973,7 @@ You may not have all of these fields.
     SReclaimable:   159856 kB
     SUnreclaim:     124508 kB
     PageTables:      24448 kB
+    SecPageTables:	 0 kB
     NFS_Unstable:        0 kB
     Bounce:              0 kB
     WritebackTmp:        0 kB
@@ -1067,6 +1068,9 @@ SUnreclaim
 PageTables
               amount of memory dedicated to the lowest level of page
               tables.
+SecPageTables
+	      amount of memory dedicated to secondary page tables, this
+	      currently includes KVM mmu allocations on x86 and arm64.
 NFS_Unstable
               Always zero. Previous counted pages which had been written to
               the server, but has not been committed to stable storage.
diff --git a/drivers/base/node.c b/drivers/base/node.c
index ec8bb24a5a22..9fe716832546 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -433,6 +433,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 			     "Node %d ShadowCallStack:%8lu kB\n"
 #endif
 			     "Node %d PageTables:     %8lu kB\n"
+			     "Node %d SecPageTables:  %8lu kB\n"
 			     "Node %d NFS_Unstable:   %8lu kB\n"
 			     "Node %d Bounce:         %8lu kB\n"
 			     "Node %d WritebackTmp:   %8lu kB\n"
@@ -459,6 +460,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 			     nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
 #endif
 			     nid, K(node_page_state(pgdat, NR_PAGETABLE)),
+			     nid, K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
 			     nid, 0UL,
 			     nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
 			     nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..fad29024eb2e 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -108,6 +108,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
 	show_val_kb(m, "PageTables:     ",
 		    global_node_page_state(NR_PAGETABLE));
+	show_val_kb(m, "SecPageTables:	",
+		    global_node_page_state(NR_SECONDARY_PAGETABLE));
 
 	show_val_kb(m, "NFS_Unstable:   ", 0);
 	show_val_kb(m, "Bounce:         ",
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 962b14d403e8..35f57f2578c0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -219,6 +219,7 @@ enum node_stat_item {
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
 #endif
 	NR_PAGETABLE,		/* used for pagetables */
+	NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. kvm shadow pagetables */
 #ifdef CONFIG_SWAP
 	NR_SWAPCACHE,
 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 725f76723220..89fbd1793960 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,7 @@ static const struct memory_stat memory_stats[] = {
 	{ "kernel",			MEMCG_KMEM			},
 	{ "kernel_stack",		NR_KERNEL_STACK_KB		},
 	{ "pagetables",			NR_PAGETABLE			},
+	{ "secondary_pagetables",	NR_SECONDARY_PAGETABLE		},
 	{ "percpu",			MEMCG_PERCPU_B			},
 	{ "sock",			MEMCG_SOCK			},
 	{ "vmalloc",			MEMCG_VMALLOC			},
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2db95780e003..96d00ae9d5c1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5932,7 +5932,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
 		" unevictable:%lu dirty:%lu writeback:%lu\n"
 		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
-		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
+		" mapped:%lu shmem:%lu pagetables:%lu\n"
+		" secondary_pagetables:%lu bounce:%lu\n"
 		" kernel_misc_reclaimable:%lu\n"
 		" free:%lu free_pcp:%lu free_cma:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
@@ -5949,6 +5950,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		global_node_page_state(NR_FILE_MAPPED),
 		global_node_page_state(NR_SHMEM),
 		global_node_page_state(NR_PAGETABLE),
+		global_node_page_state(NR_SECONDARY_PAGETABLE),
 		global_zone_page_state(NR_BOUNCE),
 		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
 		global_zone_page_state(NR_FREE_PAGES),
@@ -5982,6 +5984,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			" shadow_call_stack:%lukB"
 #endif
 			" pagetables:%lukB"
+			" secondary_pagetables:%lukB"
 			" all_unreclaimable? %s"
 			"\n",
 			pgdat->node_id,
@@ -6007,6 +6010,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			node_page_state(pgdat, NR_KERNEL_SCS_KB),
 #endif
 			K(node_page_state(pgdat, NR_PAGETABLE)),
+			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
 			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
 				"yes" : "no");
 	}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b75b1a64b54c..50bbec73809b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1240,6 +1240,7 @@ const char * const vmstat_text[] = {
 	"nr_shadow_call_stack",
 #endif
 	"nr_page_table_pages",
+	"nr_secondary_page_table_pages",
 #ifdef CONFIG_SWAP
 	"nr_swapcached",
 #endif
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 2/4] KVM: mmu: add a helper to account memory used by KVM mmu.
  2022-04-29 20:11 [PATCH v4 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
  2022-04-29 20:11 ` [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
@ 2022-04-29 20:11 ` Yosry Ahmed
  2022-04-29 20:11 ` [PATCH v4 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats Yosry Ahmed
  2022-04-29 20:11 ` [PATCH v4 4/4] KVM: arm64/mmu: count KVM s2 " Yosry Ahmed
  3 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2022-04-29 20:11 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

Add a helper to account pages used by KVM for page tables in secondary
pagetable stats. This function will be used by subsequent patches in
different archs.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/kvm_host.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 252ee4a61b58..54cc4634053c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2221,6 +2221,15 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
 
+/*
+ * If nr > 1, we assume virt is the address of the first page of a block of
+ * pages that were allocated together (i.e accounted together).
+ */
+static inline void kvm_account_pgtable_pages(void *virt, int nr)
+{
+	mod_lruvec_page_state(virt_to_page(virt), NR_SECONDARY_PAGETABLE, nr);
+}
+
 /*
  * This defines how many reserved entries we want to keep before we
  * kick the vcpu to the userspace to avoid dirty ring full.  This
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats.
  2022-04-29 20:11 [PATCH v4 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
  2022-04-29 20:11 ` [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
  2022-04-29 20:11 ` [PATCH v4 2/4] KVM: mmu: add a helper to account memory used by KVM mmu Yosry Ahmed
@ 2022-04-29 20:11 ` Yosry Ahmed
  2022-07-22 20:58   ` Mingwei Zhang
  2022-04-29 20:11 ` [PATCH v4 4/4] KVM: arm64/mmu: count KVM s2 " Yosry Ahmed
  3 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2022-04-29 20:11 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

Count the pages used by KVM mmu on x86 for in secondary pagetable stats.

For the legacy mmu, accounting pagetable stats is combined KVM's
existing for mmu pages in newly introduced kvm_[un]account_mmu_page()
helpers.

For tdp mmu, introduce new tdp_[un]account_mmu_page() helpers. That
combines accounting pagetable stats with the tdp_mmu_pages counter
accounting.

tdp_mmu_pages counter introduced in this series [1]. This patch was
rebased on top of the first two patches in that series.

[1]https://lore.kernel.org/lkml/20220401063636.2414200-1-mizhang@google.com/

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 16 ++++++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c | 16 ++++++++++++++--
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 78d8e1d8fb99..e5b0e826445d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1679,6 +1679,18 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
 	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
 }
 
+static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_mod_used_mmu_pages(kvm, +1);
+	kvm_account_pgtable_pages((void *)sp->spt, +1);
+}
+
+static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_mod_used_mmu_pages(kvm, -1);
+	kvm_account_pgtable_pages((void *)sp->spt, -1);
+}
+
 static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
 {
 	MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
@@ -1734,7 +1746,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 	 */
 	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
+	kvm_account_mmu_page(vcpu->kvm, sp);
 	return sp;
 }
 
@@ -2363,7 +2375,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 			list_add(&sp->link, invalid_list);
 		else
 			list_move(&sp->link, invalid_list);
-		kvm_mod_used_mmu_pages(kvm, -1);
+		kvm_unaccount_mmu_page(kvm, sp);
 	} else {
 		/*
 		 * Remove the active root from the active page list, the root
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3456277ade18..6295c4da5dee 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -371,6 +371,18 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 	}
 }
 
+static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	atomic64_inc(&kvm->arch.tdp_mmu_pages);
+	kvm_account_pgtable_pages((void *)sp->spt, +1);
+}
+
+static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	atomic64_dec(&kvm->arch.tdp_mmu_pages);
+	kvm_account_pgtable_pages((void *)sp->spt, -1);
+}
+
 /**
  * tdp_mmu_unlink_sp() - Remove a shadow page from the list of used pages
  *
@@ -383,7 +395,7 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 			      bool shared)
 {
-	atomic64_dec(&kvm->arch.tdp_mmu_pages);
+	tdp_unaccount_mmu_page(kvm, sp);
 
 	if (!sp->lpage_disallowed)
 		return;
@@ -1121,7 +1133,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 		tdp_mmu_set_spte(kvm, iter, spte);
 	}
 
-	atomic64_inc(&kvm->arch.tdp_mmu_pages);
+	tdp_account_mmu_page(kvm, sp);
 
 	return 0;
 }
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 4/4] KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats
  2022-04-29 20:11 [PATCH v4 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
                   ` (2 preceding siblings ...)
  2022-04-29 20:11 ` [PATCH v4 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats Yosry Ahmed
@ 2022-04-29 20:11 ` Yosry Ahmed
  2022-05-02  7:24   ` Oliver Upton
  3 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2022-04-29 20:11 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton
  Cc: cgroups, linux-kernel, linux-arm-kernel, kvmarm, kvm, linux-mm,
	Yosry Ahmed

Count the pages used by KVM in arm64 for stage2 mmu in secondary pagetable
stats.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/arm64/kvm/mmu.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 53ae2c0640bc..fc5030307cce 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -92,9 +92,13 @@ static bool kvm_is_device_pfn(unsigned long pfn)
 static void *stage2_memcache_zalloc_page(void *arg)
 {
 	struct kvm_mmu_memory_cache *mc = arg;
+	void *virt;
 
 	/* Allocated with __GFP_ZERO, so no need to zero */
-	return kvm_mmu_memory_cache_alloc(mc);
+	virt = kvm_mmu_memory_cache_alloc(mc);
+	if (virt)
+		kvm_account_pgtable_pages(virt, +1);
+	return virt;
 }
 
 static void *kvm_host_zalloc_pages_exact(size_t size)
@@ -102,6 +106,20 @@ static void *kvm_host_zalloc_pages_exact(size_t size)
 	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 }
 
+static void *kvm_s2_zalloc_pages_exact(size_t size)
+{
+	void *virt = kvm_host_zalloc_pages_exact(size);
+	if (virt)
+		kvm_account_pgtable_pages(virt, +(size >> PAGE_SHIFT));
+	return virt;
+}
+
+static void kvm_s2_free_pages_exact(void *virt, size_t size)
+{
+	kvm_account_pgtable_pages(virt, -(size >> PAGE_SHIFT));
+	free_pages_exact(virt, size);
+}
+
 static void kvm_host_get_page(void *addr)
 {
 	get_page(virt_to_page(addr));
@@ -112,6 +130,15 @@ static void kvm_host_put_page(void *addr)
 	put_page(virt_to_page(addr));
 }
 
+static void kvm_s2_put_page(void *addr)
+{
+	struct page *p = virt_to_page(addr);
+	/* Dropping last refcount, the page will be freed */
+	if (page_count(p) == 1)
+		kvm_account_pgtable_pages(addr, -1);
+	put_page(p);
+}
+
 static int kvm_host_page_count(void *addr)
 {
 	return page_count(virt_to_page(addr));
@@ -603,10 +630,10 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.zalloc_page		= stage2_memcache_zalloc_page,
-	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
-	.free_pages_exact	= free_pages_exact,
+	.zalloc_pages_exact	= kvm_s2_zalloc_pages_exact,
+	.free_pages_exact	= kvm_s2_free_pages_exact,
 	.get_page		= kvm_host_get_page,
-	.put_page		= kvm_host_put_page,
+	.put_page		= kvm_s2_put_page,
 	.page_count		= kvm_host_page_count,
 	.phys_to_virt		= kvm_host_va,
 	.virt_to_phys		= kvm_host_pa,
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 4/4] KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats
  2022-04-29 20:11 ` [PATCH v4 4/4] KVM: arm64/mmu: count KVM s2 " Yosry Ahmed
@ 2022-05-02  7:24   ` Oliver Upton
  2022-05-02  9:49     ` Marc Zyngier
  0 siblings, 1 reply; 26+ messages in thread
From: Oliver Upton @ 2022-05-02  7:24 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, cgroups, linux-kernel, linux-arm-kernel, kvmarm,
	kvm, linux-mm

Hi Yosry,

On Fri, Apr 29, 2022 at 08:11:31PM +0000, Yosry Ahmed wrote:
> Count the pages used by KVM in arm64 for stage2 mmu in secondary pagetable
> stats.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  arch/arm64/kvm/mmu.c | 35 +++++++++++++++++++++++++++++++----
>  1 file changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 53ae2c0640bc..fc5030307cce 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -92,9 +92,13 @@ static bool kvm_is_device_pfn(unsigned long pfn)
>  static void *stage2_memcache_zalloc_page(void *arg)
>  {
>  	struct kvm_mmu_memory_cache *mc = arg;
> +	void *virt;
>  
>  	/* Allocated with __GFP_ZERO, so no need to zero */
> -	return kvm_mmu_memory_cache_alloc(mc);
> +	virt = kvm_mmu_memory_cache_alloc(mc);
> +	if (virt)
> +		kvm_account_pgtable_pages(virt, +1);

Sorry I didn't say it last time around, would now be a good time to
clean up the funky sign convention of kvm_mod_used_mmu_pages()? Or limit
the funk to just x86 :)

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 4/4] KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats
  2022-05-02  7:24   ` Oliver Upton
@ 2022-05-02  9:49     ` Marc Zyngier
  0 siblings, 0 replies; 26+ messages in thread
From: Marc Zyngier @ 2022-05-02  9:49 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Yosry Ahmed, Tejun Heo, Johannes Weiner, Zefan Li, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, cgroups, linux-kernel, linux-arm-kernel, kvmarm,
	kvm, linux-mm

On Mon, 02 May 2022 08:24:28 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> Hi Yosry,
> 
> On Fri, Apr 29, 2022 at 08:11:31PM +0000, Yosry Ahmed wrote:
> > Count the pages used by KVM in arm64 for stage2 mmu in secondary pagetable
> > stats.
> > 
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  arch/arm64/kvm/mmu.c | 35 +++++++++++++++++++++++++++++++----
> >  1 file changed, 31 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 53ae2c0640bc..fc5030307cce 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -92,9 +92,13 @@ static bool kvm_is_device_pfn(unsigned long pfn)
> >  static void *stage2_memcache_zalloc_page(void *arg)
> >  {
> >  	struct kvm_mmu_memory_cache *mc = arg;
> > +	void *virt;
> >  
> >  	/* Allocated with __GFP_ZERO, so no need to zero */
> > -	return kvm_mmu_memory_cache_alloc(mc);
> > +	virt = kvm_mmu_memory_cache_alloc(mc);
> > +	if (virt)
> > +		kvm_account_pgtable_pages(virt, +1);
> 
> Sorry I didn't say it last time around, would now be a good time to
> clean up the funky sign convention of kvm_mod_used_mmu_pages()? Or limit
> the funk to just x86 :)

Indeed. I pointed this out in my initial review of this series, and
expected these to be gone by now.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-04-29 20:11 ` [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
@ 2022-05-02 10:01   ` Marc Zyngier
  2022-05-02 18:46     ` Yosry Ahmed
  0 siblings, 1 reply; 26+ messages in thread
From: Marc Zyngier @ 2022-05-02 10:01 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton, cgroups, linux-kernel,
	linux-arm-kernel, kvmarm, kvm, linux-mm

On Fri, 29 Apr 2022 21:11:28 +0100,
Yosry Ahmed <yosryahmed@google.com> wrote:
> 
> Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> KVM mmu. This provides more insights on the kernel memory used
> by a workload.
> 
> This stat will be used by subsequent patches to count KVM mmu
> memory usage.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 5 +++++
>  Documentation/filesystems/proc.rst      | 4 ++++
>  drivers/base/node.c                     | 2 ++
>  fs/proc/meminfo.c                       | 2 ++
>  include/linux/mmzone.h                  | 1 +
>  mm/memcontrol.c                         | 1 +
>  mm/page_alloc.c                         | 6 +++++-
>  mm/vmstat.c                             | 1 +
>  8 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 69d7a6983f78..828cb6b6f918 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
>  	  pagetables
>                  Amount of memory allocated for page tables.
>  
> +	  secondary_pagetables
> +		Amount of memory allocated for secondary page tables,
> +		this currently includes KVM mmu allocations on x86
> +		and arm64.

Can you please explain what the rationale is for this? We already
account for the (arm64) S2 PTs as a userspace allocation (see
115bae923ac8bb29ee635). You are saying that this is related to a
'workload', but given that the accounting is global, I fail to see how
you can attribute these allocations on a particular VM.

What do you plan to do for IOMMU page tables? After all, they serve
the exact same purpose, and I'd expect these to be handled the same
way (i.e. why is this KVM specific?).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-02 10:01   ` Marc Zyngier
@ 2022-05-02 18:46     ` Yosry Ahmed
  2022-05-09 16:38       ` Yosry Ahmed
  2022-05-12 23:07       ` Johannes Weiner
  0 siblings, 2 replies; 26+ messages in thread
From: Yosry Ahmed @ 2022-05-02 18:46 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton, cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 29 Apr 2022 21:11:28 +0100,
> Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> > KVM mmu. This provides more insights on the kernel memory used
> > by a workload.
> >
> > This stat will be used by subsequent patches to count KVM mmu
> > memory usage.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst | 5 +++++
> >  Documentation/filesystems/proc.rst      | 4 ++++
> >  drivers/base/node.c                     | 2 ++
> >  fs/proc/meminfo.c                       | 2 ++
> >  include/linux/mmzone.h                  | 1 +
> >  mm/memcontrol.c                         | 1 +
> >  mm/page_alloc.c                         | 6 +++++-
> >  mm/vmstat.c                             | 1 +
> >  8 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 69d7a6983f78..828cb6b6f918 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
> >         pagetables
> >                  Amount of memory allocated for page tables.
> >
> > +       secondary_pagetables
> > +             Amount of memory allocated for secondary page tables,
> > +             this currently includes KVM mmu allocations on x86
> > +             and arm64.
>
> Can you please explain what the rationale is for this? We already
> account for the (arm64) S2 PTs as a userspace allocation (see

This can be considered as continuation for that work. The mentioned
commit accounts S2 PTs to the VM process cgroup kernel memory. We have
stats for the total kernel memory, and some fine-grained categories of
that, like (pagetables, stack, slab, etc.).

This patch just adds another category to give further insights into
what exactly is using kernel memory.

> 115bae923ac8bb29ee635). You are saying that this is related to a
> 'workload', but given that the accounting is global, I fail to see how
> you can attribute these allocations on a particular VM.

The main motivation is having the memcg stats, which give attribution
to workloads. If you think it's more appropriate, we can add it as a
memcg-only stat, like MEMCG_VMALLOC (see 4e5aa1f4c2b4 ("memcg: add
per-memcg vmalloc stat")). The only reason I made this as a global
stat too is to be consistent with NR_PAGETABLE.

>
> What do you plan to do for IOMMU page tables? After all, they serve
> the exact same purpose, and I'd expect these to be handled the same
> way (i.e. why is this KVM specific?).

The reason this was named NR_SECONDARY_PAGTABLE instead of
NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
account other types of secondary page tables to this stat. It is just
that we are currently interested in the KVM MMU usage.

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-02 18:46     ` Yosry Ahmed
@ 2022-05-09 16:38       ` Yosry Ahmed
  2022-05-12 20:36         ` Shakeel Butt
  2022-05-12 23:07       ` Johannes Weiner
  1 sibling, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2022-05-09 16:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton, cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Mon, May 2, 2022 at 11:46 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 29 Apr 2022 21:11:28 +0100,
> > Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> > > KVM mmu. This provides more insights on the kernel memory used
> > > by a workload.
> > >
> > > This stat will be used by subsequent patches to count KVM mmu
> > > memory usage.
> > >
> > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > > ---
> > >  Documentation/admin-guide/cgroup-v2.rst | 5 +++++
> > >  Documentation/filesystems/proc.rst      | 4 ++++
> > >  drivers/base/node.c                     | 2 ++
> > >  fs/proc/meminfo.c                       | 2 ++
> > >  include/linux/mmzone.h                  | 1 +
> > >  mm/memcontrol.c                         | 1 +
> > >  mm/page_alloc.c                         | 6 +++++-
> > >  mm/vmstat.c                             | 1 +
> > >  8 files changed, 21 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 69d7a6983f78..828cb6b6f918 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
> > >         pagetables
> > >                  Amount of memory allocated for page tables.
> > >
> > > +       secondary_pagetables
> > > +             Amount of memory allocated for secondary page tables,
> > > +             this currently includes KVM mmu allocations on x86
> > > +             and arm64.
> >
> > Can you please explain what the rationale is for this? We already
> > account for the (arm64) S2 PTs as a userspace allocation (see
>
> This can be considered as continuation for that work. The mentioned
> commit accounts S2 PTs to the VM process cgroup kernel memory. We have
> stats for the total kernel memory, and some fine-grained categories of
> that, like (pagetables, stack, slab, etc.).
>
> This patch just adds another category to give further insights into
> what exactly is using kernel memory.
>
> > 115bae923ac8bb29ee635). You are saying that this is related to a
> > 'workload', but given that the accounting is global, I fail to see how
> > you can attribute these allocations on a particular VM.
>
> The main motivation is having the memcg stats, which give attribution
> to workloads. If you think it's more appropriate, we can add it as a
> memcg-only stat, like MEMCG_VMALLOC (see 4e5aa1f4c2b4 ("memcg: add
> per-memcg vmalloc stat")). The only reason I made this as a global
> stat too is to be consistent with NR_PAGETABLE.
>
> >
> > What do you plan to do for IOMMU page tables? After all, they serve
> > the exact same purpose, and I'd expect these to be handled the same
> > way (i.e. why is this KVM specific?).
>
> The reason this was named NR_SECONDARY_PAGTABLE instead of
> NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> account other types of secondary page tables to this stat. It is just
> that we are currently interested in the KVM MMU usage.
>


Any thoughts on this? Do you think MEMCG_SECONDARY_PAGETABLE would be
more appropriate here?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-09 16:38       ` Yosry Ahmed
@ 2022-05-12 20:36         ` Shakeel Butt
  0 siblings, 0 replies; 26+ messages in thread
From: Shakeel Butt @ 2022-05-12 20:36 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Marc Zyngier, Tejun Heo, Johannes Weiner, Zefan Li, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Mon, May 9, 2022 at 9:38 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
[...]
> > >
> > > What do you plan to do for IOMMU page tables? After all, they serve
> > > the exact same purpose, and I'd expect these to be handled the same
> > > way (i.e. why is this KVM specific?).
> >
> > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > account other types of secondary page tables to this stat. It is just
> > that we are currently interested in the KVM MMU usage.
> >
>
>
> Any thoughts on this? Do you think MEMCG_SECONDARY_PAGETABLE would be
> more appropriate here?

I think NR_SECONDARY_PAGTABLE is good. Later it can include pagetables
from other subsystems. The only nit (which you can ignore) I have is
the very long memcg stat and vmstat names. Other than that the patch
looks good to me.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-02 18:46     ` Yosry Ahmed
  2022-05-09 16:38       ` Yosry Ahmed
@ 2022-05-12 23:07       ` Johannes Weiner
  2022-05-12 23:29         ` Sean Christopherson
  1 sibling, 1 reply; 26+ messages in thread
From: Johannes Weiner @ 2022-05-12 23:07 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Marc Zyngier, Tejun Heo, Zefan Li, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

Hey Yosry,

On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <maz@kernel.org> wrote:
> > 115bae923ac8bb29ee635). You are saying that this is related to a
> > 'workload', but given that the accounting is global, I fail to see how
> > you can attribute these allocations on a particular VM.
> 
> The main motivation is having the memcg stats, which give attribution
> to workloads. If you think it's more appropriate, we can add it as a
> memcg-only stat, like MEMCG_VMALLOC (see 4e5aa1f4c2b4 ("memcg: add
> per-memcg vmalloc stat")). The only reason I made this as a global
> stat too is to be consistent with NR_PAGETABLE.

Please no memcg-specific stats if a regular vmstat item is possible
and useful at the system level as well, like in this case. It's extra
memcg code, extra callbacks, and it doesn't have NUMA node awareness.

> > What do you plan to do for IOMMU page tables? After all, they serve
> > the exact same purpose, and I'd expect these to be handled the same
> > way (i.e. why is this KVM specific?).
> 
> The reason this was named NR_SECONDARY_PAGTABLE instead of
> NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> account other types of secondary page tables to this stat. It is just
> that we are currently interested in the KVM MMU usage.

Do you actually care at the supervisor level that this memory is used
for guest page tables?

It seems to me you primarily care that it is reported *somewhere*
(hence the piggybacking off of NR_PAGETABLE at first). And whether
it's page tables or iommu tables or whatever else allocated for the
purpose of virtualization, it doesn't make much of a difference to the
host/cgroup that is tracking it, right?

(The proximity to nr_pagetable could also be confusing. A high page
table count can be a hint to userspace to enable THP. It seems
actionable in a different way than a high number of kvm page tables or
iommu page tables.)

How about NR_VIRT? It's shorter, seems descriptive enough, less room
for confusion, and is more easily extensible in the future.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-12 23:07       ` Johannes Weiner
@ 2022-05-12 23:29         ` Sean Christopherson
  2022-05-13 15:50           ` Johannes Weiner
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-05-12 23:29 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yosry Ahmed, Marc Zyngier, Tejun Heo, Zefan Li, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Thu, May 12, 2022, Johannes Weiner wrote:
> Hey Yosry,
> 
> On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <maz@kernel.org> wrote:
> > > 115bae923ac8bb29ee635). You are saying that this is related to a
> > > 'workload', but given that the accounting is global, I fail to see how
> > > you can attribute these allocations on a particular VM.
> > 
> > The main motivation is having the memcg stats, which give attribution
> > to workloads. If you think it's more appropriate, we can add it as a
> > memcg-only stat, like MEMCG_VMALLOC (see 4e5aa1f4c2b4 ("memcg: add
> > per-memcg vmalloc stat")). The only reason I made this as a global
> > stat too is to be consistent with NR_PAGETABLE.
> 
> Please no memcg-specific stats if a regular vmstat item is possible
> and useful at the system level as well, like in this case. It's extra
> memcg code, extra callbacks, and it doesn't have NUMA node awareness.
> 
> > > What do you plan to do for IOMMU page tables? After all, they serve
> > > the exact same purpose, and I'd expect these to be handled the same
> > > way (i.e. why is this KVM specific?).
> > 
> > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > account other types of secondary page tables to this stat. It is just
> > that we are currently interested in the KVM MMU usage.
> 
> Do you actually care at the supervisor level that this memory is used
> for guest page tables?

Hmm, yes?  KVM does have a decent number of large-ish allocations that aren't
for page tables, but except for page tables, the number/size of those allocations
scales linearly with either the number of vCPUs or the amount of memory assigned
to the VM (with no room for improvement barring KVM changes).

Off the top of my head, KVM's secondary page tables are the only allocations that
don't scale linearly, especially when nested virtualization is in use.

> It seems to me you primarily care that it is reported *somewhere*
> (hence the piggybacking off of NR_PAGETABLE at first). And whether
> it's page tables or iommu tables or whatever else allocated for the
> purpose of virtualization, it doesn't make much of a difference to the
> host/cgroup that is tracking it, right?
> 
> (The proximity to nr_pagetable could also be confusing. A high page
> table count can be a hint to userspace to enable THP. It seems
> actionable in a different way than a high number of kvm page tables or
> iommu page tables.)

I don't know about iommu page tables, but on the KVM side a high count can also
be a good signal that enabling THP would be beneficial.  It's definitely actionable
in a different way though too.

> How about NR_VIRT? It's shorter, seems descriptive enough, less room
> for confusion, and is more easily extensible in the future.

I don't like NR_VIRT because VFIO/iommu can be used for non-virtualization things,
and we'd be lying by omission unless KVM (and other users) updates all of its
large-ish allocations to account them correctly.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-12 23:29         ` Sean Christopherson
@ 2022-05-13 15:50           ` Johannes Weiner
  2022-05-13 16:12             ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Johannes Weiner @ 2022-05-13 15:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yosry Ahmed, Marc Zyngier, Tejun Heo, Zefan Li, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Thu, May 12, 2022 at 11:29:38PM +0000, Sean Christopherson wrote:
> On Thu, May 12, 2022, Johannes Weiner wrote:
> > On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > What do you plan to do for IOMMU page tables? After all, they serve
> > > > the exact same purpose, and I'd expect these to be handled the same
> > > > way (i.e. why is this KVM specific?).
> > > 
> > > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > > account other types of secondary page tables to this stat. It is just
> > > that we are currently interested in the KVM MMU usage.
> > 
> > Do you actually care at the supervisor level that this memory is used
> > for guest page tables?
> 
> Hmm, yes?  KVM does have a decent number of large-ish allocations that aren't
> for page tables, but except for page tables, the number/size of those allocations
> scales linearly with either the number of vCPUs or the amount of memory assigned
> to the VM (with no room for improvement barring KVM changes).
> 
> Off the top of my head, KVM's secondary page tables are the only allocations that
> don't scale linearly, especially when nested virtualization is in use.

Thanks, that's useful information.

Are these other allocations accounted somewhere? If not, are they
potential containment holes that will need fixing eventually?

> > It seems to me you primarily care that it is reported *somewhere*
> > (hence the piggybacking off of NR_PAGETABLE at first). And whether
> > it's page tables or iommu tables or whatever else allocated for the
> > purpose of virtualization, it doesn't make much of a difference to the
> > host/cgroup that is tracking it, right?
> > 
> > (The proximity to nr_pagetable could also be confusing. A high page
> > table count can be a hint to userspace to enable THP. It seems
> > actionable in a different way than a high number of kvm page tables or
> > iommu page tables.)
> 
> I don't know about iommu page tables, but on the KVM side a high count can also
> be a good signal that enabling THP would be beneficial.

Well, maybe.

It might help, but ultimately it's the process that's in control in
all cases: it's unmovable kernel memory allocated to manage virtual
address space inside the task.

So I'm still a bit at a loss whether these things should all be lumped
in together or kept separately. meminfo and memory.stat are permanent
ABI, so we should try to establish in advance whether the new itme is
really a first-class consumer or part of something bigger.

The patch initially piggybacked on NR_PAGETABLE. I found an email of
you asking why it couldn't be a separate item, but it didn't provide a
reasoning for that decision. Could you share your thoughts on that?

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-13 15:50           ` Johannes Weiner
@ 2022-05-13 16:12             ` Sean Christopherson
  2022-05-13 16:22               ` Yosry Ahmed
  2022-05-13 17:13               ` Shakeel Butt
  0 siblings, 2 replies; 26+ messages in thread
From: Sean Christopherson @ 2022-05-13 16:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yosry Ahmed, Marc Zyngier, Tejun Heo, Zefan Li, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Fri, May 13, 2022, Johannes Weiner wrote:
> On Thu, May 12, 2022 at 11:29:38PM +0000, Sean Christopherson wrote:
> > On Thu, May 12, 2022, Johannes Weiner wrote:
> > > On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > > > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > What do you plan to do for IOMMU page tables? After all, they serve
> > > > > the exact same purpose, and I'd expect these to be handled the same
> > > > > way (i.e. why is this KVM specific?).
> > > > 
> > > > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > > > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > > > account other types of secondary page tables to this stat. It is just
> > > > that we are currently interested in the KVM MMU usage.
> > > 
> > > Do you actually care at the supervisor level that this memory is used
> > > for guest page tables?
> > 
> > Hmm, yes?  KVM does have a decent number of large-ish allocations that aren't
> > for page tables, but except for page tables, the number/size of those allocations
> > scales linearly with either the number of vCPUs or the amount of memory assigned
> > to the VM (with no room for improvement barring KVM changes).
> > 
> > Off the top of my head, KVM's secondary page tables are the only allocations that
> > don't scale linearly, especially when nested virtualization is in use.
> 
> Thanks, that's useful information.
> 
> Are these other allocations accounted somewhere? If not, are they
> potential containment holes that will need fixing eventually?

All allocations that are tied to specific VM/vCPU are tagged GFP_KERNEL_ACCOUNT,
so we should be good on that front.
 
> > > It seems to me you primarily care that it is reported *somewhere*
> > > (hence the piggybacking off of NR_PAGETABLE at first). And whether
> > > it's page tables or iommu tables or whatever else allocated for the
> > > purpose of virtualization, it doesn't make much of a difference to the
> > > host/cgroup that is tracking it, right?
> > > 
> > > (The proximity to nr_pagetable could also be confusing. A high page
> > > table count can be a hint to userspace to enable THP. It seems
> > > actionable in a different way than a high number of kvm page tables or
> > > iommu page tables.)
> > 
> > I don't know about iommu page tables, but on the KVM side a high count can also
> > be a good signal that enabling THP would be beneficial.
> 
> Well, maybe.
> 
> It might help, but ultimately it's the process that's in control in
> all cases: it's unmovable kernel memory allocated to manage virtual
> address space inside the task.
> 
> So I'm still a bit at a loss whether these things should all be lumped
> in together or kept separately. meminfo and memory.stat are permanent
> ABI, so we should try to establish in advance whether the new itme is
> really a first-class consumer or part of something bigger.
> 
> The patch initially piggybacked on NR_PAGETABLE. I found an email of
> you asking why it couldn't be a separate item, but it didn't provide a
> reasoning for that decision. Could you share your thoughts on that?

It was mostly an honest question, I too am trying to understand what userspace
wants to do with this information.  I was/am also trying to understand the benefits
of doing the tracking through page_state and not a dedicated KVM stat.  E.g. KVM
already has specific stats for the number of leaf pages mapped into a VM, why not
do the same for non-leaf pages?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-13 16:12             ` Sean Christopherson
@ 2022-05-13 16:22               ` Yosry Ahmed
  2022-05-13 17:13               ` Shakeel Butt
  1 sibling, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2022-05-13 16:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Johannes Weiner, Marc Zyngier, Tejun Heo, Zefan Li, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

Thanks everyone for participating in this discussion and looking into this.

On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, May 13, 2022, Johannes Weiner wrote:
> > On Thu, May 12, 2022 at 11:29:38PM +0000, Sean Christopherson wrote:
> > > On Thu, May 12, 2022, Johannes Weiner wrote:
> > > > On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > > > > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > What do you plan to do for IOMMU page tables? After all, they serve
> > > > > > the exact same purpose, and I'd expect these to be handled the same
> > > > > > way (i.e. why is this KVM specific?).
> > > > >
> > > > > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > > > > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > > > > account other types of secondary page tables to this stat. It is just
> > > > > that we are currently interested in the KVM MMU usage.
> > > >
> > > > Do you actually care at the supervisor level that this memory is used
> > > > for guest page tables?
> > >
> > > Hmm, yes?  KVM does have a decent number of large-ish allocations that aren't
> > > for page tables, but except for page tables, the number/size of those allocations
> > > scales linearly with either the number of vCPUs or the amount of memory assigned
> > > to the VM (with no room for improvement barring KVM changes).
> > >
> > > Off the top of my head, KVM's secondary page tables are the only allocations that
> > > don't scale linearly, especially when nested virtualization is in use.
> >
> > Thanks, that's useful information.
> >
> > Are these other allocations accounted somewhere? If not, are they
> > potential containment holes that will need fixing eventually?
>
> All allocations that are tied to specific VM/vCPU are tagged GFP_KERNEL_ACCOUNT,
> so we should be good on that front.
>
> > > > It seems to me you primarily care that it is reported *somewhere*
> > > > (hence the piggybacking off of NR_PAGETABLE at first). And whether
> > > > it's page tables or iommu tables or whatever else allocated for the
> > > > purpose of virtualization, it doesn't make much of a difference to the
> > > > host/cgroup that is tracking it, right?
> > > >
> > > > (The proximity to nr_pagetable could also be confusing. A high page
> > > > table count can be a hint to userspace to enable THP. It seems
> > > > actionable in a different way than a high number of kvm page tables or
> > > > iommu page tables.)
> > >
> > > I don't know about iommu page tables, but on the KVM side a high count can also
> > > be a good signal that enabling THP would be beneficial.
> >
> > Well, maybe.
> >
> > It might help, but ultimately it's the process that's in control in
> > all cases: it's unmovable kernel memory allocated to manage virtual
> > address space inside the task.
> >
> > So I'm still a bit at a loss whether these things should all be lumped
> > in together or kept separately. meminfo and memory.stat are permanent
> > ABI, so we should try to establish in advance whether the new itme is
> > really a first-class consumer or part of something bigger.
> >
> > The patch initially piggybacked on NR_PAGETABLE. I found an email of
> > you asking why it couldn't be a separate item, but it didn't provide a
> > reasoning for that decision. Could you share your thoughts on that?
>
> It was mostly an honest question, I too am trying to understand what userspace
> wants to do with this information.  I was/am also trying to understand the benefits
> of doing the tracking through page_state and not a dedicated KVM stat.  E.g. KVM
> already has specific stats for the number of leaf pages mapped into a VM, why not
> do the same for non-leaf pages?

Let me cast some light on this. The reason this started being
piggybacked on NR_PAGETABLE is that we had a remnant of an old
internal implementation of NR_PAGETABLE before it was introduced
upstream, that accounted KVM secondary page tables as normal page
tables. This made me think this behavior was preferable. Personally, I
wanted to make it a separate thing since the beginning. When I found
opinions here that also suggested a separate stat I went ahead for
that.

As for where to put this information, it does not have to be
NR_SECONDARY_PAGETABLE. Ultimately, people working on KVM are the ones
that will interpret and act upon this data, so if you have somewhere
else in mind please let me know, Sean.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-13 16:12             ` Sean Christopherson
  2022-05-13 16:22               ` Yosry Ahmed
@ 2022-05-13 17:13               ` Shakeel Butt
  2022-05-20  1:56                 ` Yosry Ahmed
  1 sibling, 1 reply; 26+ messages in thread
From: Shakeel Butt @ 2022-05-13 17:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Johannes Weiner, Yosry Ahmed, Marc Zyngier, Tejun Heo, Zefan Li,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Oliver Upton,
	Cgroups, Linux Kernel Mailing List, linux-arm-kernel, kvmarm,
	kvm, Linux-MM

On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <seanjc@google.com> wrote:
>
[...]
>
> It was mostly an honest question, I too am trying to understand what userspace
> wants to do with this information.  I was/am also trying to understand the benefits
> of doing the tracking through page_state and not a dedicated KVM stat.  E.g. KVM
> already has specific stats for the number of leaf pages mapped into a VM, why not
> do the same for non-leaf pages?

Let me answer why a more general stat is useful and the potential
userspace reaction:

For a memory type which is significant enough, it is useful to expose
it in the general interfaces, so that the general data/stat collection
infra can collect them instead of having workload dependent stat
collectors. In addition, not necessarily that stat has to have a
userspace reaction in an online fashion. We do collect stats for
offline analysis which greatly influence the priority order of
optimization workitems.

Next the question is do we really need a separate stat item
(secondary_pagetable instead of just plain pagetable) exposed in the
stable API? To me secondary_pagetable is general (not kvm specific)
enough and can be significant, so having a separate dedicated stat
should be ok. Though I am ok with lump it with pagetable stat for now
but we do want it to be accounted somewhere.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-13 17:13               ` Shakeel Butt
@ 2022-05-20  1:56                 ` Yosry Ahmed
  2022-05-20 14:39                   ` Johannes Weiner
  0 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2022-05-20  1:56 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Sean Christopherson, Johannes Weiner, Marc Zyngier, Tejun Heo,
	Zefan Li, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Fri, May 13, 2022 at 10:14 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> [...]
> >
> > It was mostly an honest question, I too am trying to understand what userspace
> > wants to do with this information.  I was/am also trying to understand the benefits
> > of doing the tracking through page_state and not a dedicated KVM stat.  E.g. KVM
> > already has specific stats for the number of leaf pages mapped into a VM, why not
> > do the same for non-leaf pages?
>
> Let me answer why a more general stat is useful and the potential
> userspace reaction:
>
> For a memory type which is significant enough, it is useful to expose
> it in the general interfaces, so that the general data/stat collection
> infra can collect them instead of having workload dependent stat
> collectors. In addition, not necessarily that stat has to have a
> userspace reaction in an online fashion. We do collect stats for
> offline analysis which greatly influence the priority order of
> optimization workitems.
>
> Next the question is do we really need a separate stat item
> (secondary_pagetable instead of just plain pagetable) exposed in the
> stable API? To me secondary_pagetable is general (not kvm specific)
> enough and can be significant, so having a separate dedicated stat
> should be ok. Though I am ok with lump it with pagetable stat for now
> but we do want it to be accounted somewhere.

Any thoughts on this? Johannes?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-20  1:56                 ` Yosry Ahmed
@ 2022-05-20 14:39                   ` Johannes Weiner
  2022-05-24 22:31                     ` Yosry Ahmed
  0 siblings, 1 reply; 26+ messages in thread
From: Johannes Weiner @ 2022-05-20 14:39 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Shakeel Butt, Sean Christopherson, Marc Zyngier, Tejun Heo,
	Zefan Li, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Thu, May 19, 2022 at 06:56:54PM -0700, Yosry Ahmed wrote:
> On Fri, May 13, 2022 at 10:14 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > [...]
> > >
> > > It was mostly an honest question, I too am trying to understand what userspace
> > > wants to do with this information.  I was/am also trying to understand the benefits
> > > of doing the tracking through page_state and not a dedicated KVM stat.  E.g. KVM
> > > already has specific stats for the number of leaf pages mapped into a VM, why not
> > > do the same for non-leaf pages?
> >
> > Let me answer why a more general stat is useful and the potential
> > userspace reaction:
> >
> > For a memory type which is significant enough, it is useful to expose
> > it in the general interfaces, so that the general data/stat collection
> > infra can collect them instead of having workload dependent stat
> > collectors. In addition, not necessarily that stat has to have a
> > userspace reaction in an online fashion. We do collect stats for
> > offline analysis which greatly influence the priority order of
> > optimization workitems.
> >
> > Next the question is do we really need a separate stat item
> > (secondary_pagetable instead of just plain pagetable) exposed in the
> > stable API? To me secondary_pagetable is general (not kvm specific)
> > enough and can be significant, so having a separate dedicated stat
> > should be ok. Though I am ok with lump it with pagetable stat for now
> > but we do want it to be accounted somewhere.
> 
> Any thoughts on this? Johannes?

I agree that this memory should show up in vmstat/memory.stat in some
form or another.

The arguments on whether this should be part of NR_PAGETABLE or a
separate entry seem a bit vague to me. I was hoping somebody more
familiar with KVM could provide a better picture of memory consumption
in that area.

Sean had mentioned that these allocations already get tracked through
GFP_KERNEL_ACCOUNT. That's good, but if they are significant enough to
track, they should be represented in memory.stat in some form. Sean
also pointed out though that those allocations tend to scale rather
differently than the page tables, so it probably makes sense to keep
those two things separate at least.

Any thoughts on putting shadow page tables and iommu page tables into
the existing NR_PAGETABLE item? If not, what are the cons?

And creating (maybe later) a separate NR_VIRT for the other
GPF_KERNEL_ACCOUNT allocations in kvm?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-20 14:39                   ` Johannes Weiner
@ 2022-05-24 22:31                     ` Yosry Ahmed
  2022-05-25 11:56                       ` Johannes Weiner
  0 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2022-05-24 22:31 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Shakeel Butt, Sean Christopherson, Marc Zyngier, Tejun Heo,
	Zefan Li, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Fri, May 20, 2022 at 7:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, May 19, 2022 at 06:56:54PM -0700, Yosry Ahmed wrote:
> > On Fri, May 13, 2022 at 10:14 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > [...]
> > > >
> > > > It was mostly an honest question, I too am trying to understand what userspace
> > > > wants to do with this information.  I was/am also trying to understand the benefits
> > > > of doing the tracking through page_state and not a dedicated KVM stat.  E.g. KVM
> > > > already has specific stats for the number of leaf pages mapped into a VM, why not
> > > > do the same for non-leaf pages?
> > >
> > > Let me answer why a more general stat is useful and the potential
> > > userspace reaction:
> > >
> > > For a memory type which is significant enough, it is useful to expose
> > > it in the general interfaces, so that the general data/stat collection
> > > infra can collect them instead of having workload dependent stat
> > > collectors. In addition, not necessarily that stat has to have a
> > > userspace reaction in an online fashion. We do collect stats for
> > > offline analysis which greatly influence the priority order of
> > > optimization workitems.
> > >
> > > Next the question is do we really need a separate stat item
> > > (secondary_pagetable instead of just plain pagetable) exposed in the
> > > stable API? To me secondary_pagetable is general (not kvm specific)
> > > enough and can be significant, so having a separate dedicated stat
> > > should be ok. Though I am ok with lump it with pagetable stat for now
> > > but we do want it to be accounted somewhere.
> >
> > Any thoughts on this? Johannes?
>
> I agree that this memory should show up in vmstat/memory.stat in some
> form or another.
>
> The arguments on whether this should be part of NR_PAGETABLE or a
> separate entry seem a bit vague to me. I was hoping somebody more
> familiar with KVM could provide a better picture of memory consumption
> in that area.
>
> Sean had mentioned that these allocations already get tracked through
> GFP_KERNEL_ACCOUNT. That's good, but if they are significant enough to
> track, they should be represented in memory.stat in some form. Sean
> also pointed out though that those allocations tend to scale rather
> differently than the page tables, so it probably makes sense to keep
> those two things separate at least.
>
> Any thoughts on putting shadow page tables and iommu page tables into
> the existing NR_PAGETABLE item? If not, what are the cons?
>
> And creating (maybe later) a separate NR_VIRT for the other
> GPF_KERNEL_ACCOUNT allocations in kvm?

I agree with Sean that a NR_VIRT stat would be inaccurate by omission,
unless we account for all KVM allocations under this stat. This might
be an unnecessary burden according to what Sean said, as most other
allocations scale linearly with the number of vCPUs or the memory
assigned to the VM.

I don't have enough context to say whether we should piggyback KVM MMU
pages to the existing NR_PAGETABLE item, but from a high level it
seems like it would be more helpful if they are a separate stat.
Anyway, I am willing to go with whatever Sean thinks is best.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-24 22:31                     ` Yosry Ahmed
@ 2022-05-25 11:56                       ` Johannes Weiner
  2022-05-26  0:38                         ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Johannes Weiner @ 2022-05-25 11:56 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Shakeel Butt, Sean Christopherson, Marc Zyngier, Tejun Heo,
	Zefan Li, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Tue, May 24, 2022 at 03:31:52PM -0700, Yosry Ahmed wrote:
> On Fri, May 20, 2022 at 7:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > I agree that this memory should show up in vmstat/memory.stat in some
> > form or another.
> >
> > The arguments on whether this should be part of NR_PAGETABLE or a
> > separate entry seem a bit vague to me. I was hoping somebody more
> > familiar with KVM could provide a better picture of memory consumption
> > in that area.
> >
> > Sean had mentioned that these allocations already get tracked through
> > GFP_KERNEL_ACCOUNT. That's good, but if they are significant enough to
> > track, they should be represented in memory.stat in some form. Sean
> > also pointed out though that those allocations tend to scale rather
> > differently than the page tables, so it probably makes sense to keep
> > those two things separate at least.
> >
> > Any thoughts on putting shadow page tables and iommu page tables into
> > the existing NR_PAGETABLE item? If not, what are the cons?
> >
> > And creating (maybe later) a separate NR_VIRT for the other
> > GPF_KERNEL_ACCOUNT allocations in kvm?
> 
> I agree with Sean that a NR_VIRT stat would be inaccurate by omission,
> unless we account for all KVM allocations under this stat. This might
> be an unnecessary burden according to what Sean said, as most other
> allocations scale linearly with the number of vCPUs or the memory
> assigned to the VM.

I think it's fine to table the addition of NR_VIRT for now. My
conclusion from this discussion was just that if we do want to add
more KVM-related allocation sites later on, they likely would be
something separate and not share an item with the shadow tables. This
simplifies the discussion around how to present the shadow tables.

That said, stats can be incremental and still useful. memory.current
itself lies by ommission. It's more important to catch what's of
significance and allow users to narrow down pathological cases.

> I don't have enough context to say whether we should piggyback KVM MMU
> pages to the existing NR_PAGETABLE item, but from a high level it
> seems like it would be more helpful if they are a separate stat.
> Anyway, I am willing to go with whatever Sean thinks is best.

Somebody should work this out and put it into a changelog. It's
permanent ABI.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-25 11:56                       ` Johannes Weiner
@ 2022-05-26  0:38                         ` Sean Christopherson
  2022-05-27 18:33                           ` Yosry Ahmed
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-05-26  0:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yosry Ahmed, Shakeel Butt, Marc Zyngier, Tejun Heo, Zefan Li,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Oliver Upton,
	Cgroups, Linux Kernel Mailing List, linux-arm-kernel, kvmarm,
	kvm, Linux-MM

On Wed, May 25, 2022, Johannes Weiner wrote:
> On Tue, May 24, 2022 at 03:31:52PM -0700, Yosry Ahmed wrote:
> > I don't have enough context to say whether we should piggyback KVM MMU
> > pages to the existing NR_PAGETABLE item, but from a high level it
> > seems like it would be more helpful if they are a separate stat.
> > Anyway, I am willing to go with whatever Sean thinks is best.
> 
> Somebody should work this out and put it into a changelog. It's
> permanent ABI.

After a lot of waffling, my vote is to add a dedicated NR_SECONDARY_PAGETABLE.

It's somewhat redundant from a KVM perspective, as NR_SECONDARY_PAGETABLE will
scale with KVM's per-VM pages_{4k,2m,1g} stats unless the guest is doing something
bizarre, e.g. accessing only 4kb chunks of 2mb pages so that KVM is forced to
allocate a large number of page tables even though the guest isn't accessing that
much memory.

But, someone would need to either understand how KVM works to make that connection,
or know (or be told) to go look at KVM's stats if they're running VMs to better
decipher the stats.

And even in the little bit of time I played with this, I found having
nr_page_table_pages side-by-side with nr_secondary_page_table_pages to be very
informative.  E.g. when backing a VM with THP versus HugeTLB,
nr_secondary_page_table_pages is roughly the same, but nr_page_table_pages is an
order of a magnitude higher with THP.  I'm guessing the THP behavior is due to
something triggering DoubleMap, but now I want to find out why that's happening.

So while I'm pretty sure a clever user could glean the same info by cross-referencing
NR_PAGETABLE stats with KVM stats, I think having NR_SECONDARY_PAGETABLE will at the
very least prove to be helpful for understanding tradeoffs between VM backing types,
and likely even steer folks towards potential optimizations.

Baseline:
  # grep page_table /proc/vmstat 
  nr_page_table_pages 2830
  nr_secondary_page_table_pages 0

THP:
  # grep page_table /proc/vmstat 
  nr_page_table_pages 7584
  nr_secondary_page_table_pages 140

HugeTLB:
  # grep page_table /proc/vmstat
  nr_page_table_pages 3153
  nr_secondary_page_table_pages 153


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-26  0:38                         ` Sean Christopherson
@ 2022-05-27 18:33                           ` Yosry Ahmed
  2022-06-03 16:42                             ` Johannes Weiner
  0 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2022-05-27 18:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Johannes Weiner, Shakeel Butt, Marc Zyngier, Tejun Heo, Zefan Li,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Oliver Upton,
	Cgroups, Linux Kernel Mailing List, linux-arm-kernel, kvmarm,
	kvm, Linux-MM

On Wed, May 25, 2022 at 5:39 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, May 25, 2022, Johannes Weiner wrote:
> > On Tue, May 24, 2022 at 03:31:52PM -0700, Yosry Ahmed wrote:
> > > I don't have enough context to say whether we should piggyback KVM MMU
> > > pages to the existing NR_PAGETABLE item, but from a high level it
> > > seems like it would be more helpful if they are a separate stat.
> > > Anyway, I am willing to go with whatever Sean thinks is best.
> >
> > Somebody should work this out and put it into a changelog. It's
> > permanent ABI.
>
> After a lot of waffling, my vote is to add a dedicated NR_SECONDARY_PAGETABLE.
>
> It's somewhat redundant from a KVM perspective, as NR_SECONDARY_PAGETABLE will
> scale with KVM's per-VM pages_{4k,2m,1g} stats unless the guest is doing something
> bizarre, e.g. accessing only 4kb chunks of 2mb pages so that KVM is forced to
> allocate a large number of page tables even though the guest isn't accessing that
> much memory.
>
> But, someone would need to either understand how KVM works to make that connection,
> or know (or be told) to go look at KVM's stats if they're running VMs to better
> decipher the stats.
>
> And even in the little bit of time I played with this, I found having
> nr_page_table_pages side-by-side with nr_secondary_page_table_pages to be very
> informative.  E.g. when backing a VM with THP versus HugeTLB,
> nr_secondary_page_table_pages is roughly the same, but nr_page_table_pages is an
> order of a magnitude higher with THP.  I'm guessing the THP behavior is due to
> something triggering DoubleMap, but now I want to find out why that's happening.
>
> So while I'm pretty sure a clever user could glean the same info by cross-referencing
> NR_PAGETABLE stats with KVM stats, I think having NR_SECONDARY_PAGETABLE will at the
> very least prove to be helpful for understanding tradeoffs between VM backing types,
> and likely even steer folks towards potential optimizations.
>
> Baseline:
>   # grep page_table /proc/vmstat
>   nr_page_table_pages 2830
>   nr_secondary_page_table_pages 0
>
> THP:
>   # grep page_table /proc/vmstat
>   nr_page_table_pages 7584
>   nr_secondary_page_table_pages 140
>
> HugeTLB:
>   # grep page_table /proc/vmstat
>   nr_page_table_pages 3153
>   nr_secondary_page_table_pages 153
>

Interesting findings! Thanks for taking the time to look into this, Sean!
I will refresh this patchset and summarize the discussion in the
commit message, and also fix some nits on the KVM side. Does this
sound good to everyone?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
  2022-05-27 18:33                           ` Yosry Ahmed
@ 2022-06-03 16:42                             ` Johannes Weiner
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2022-06-03 16:42 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Sean Christopherson, Shakeel Butt, Marc Zyngier, Tejun Heo,
	Zefan Li, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Oliver Upton, Cgroups, Linux Kernel Mailing List,
	linux-arm-kernel, kvmarm, kvm, Linux-MM

On Fri, May 27, 2022 at 11:33:27AM -0700, Yosry Ahmed wrote:
> On Wed, May 25, 2022 at 5:39 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, May 25, 2022, Johannes Weiner wrote:
> > > On Tue, May 24, 2022 at 03:31:52PM -0700, Yosry Ahmed wrote:
> > > > I don't have enough context to say whether we should piggyback KVM MMU
> > > > pages to the existing NR_PAGETABLE item, but from a high level it
> > > > seems like it would be more helpful if they are a separate stat.
> > > > Anyway, I am willing to go with whatever Sean thinks is best.
> > >
> > > Somebody should work this out and put it into a changelog. It's
> > > permanent ABI.
> >
> > After a lot of waffling, my vote is to add a dedicated NR_SECONDARY_PAGETABLE.
> >
> > It's somewhat redundant from a KVM perspective, as NR_SECONDARY_PAGETABLE will
> > scale with KVM's per-VM pages_{4k,2m,1g} stats unless the guest is doing something
> > bizarre, e.g. accessing only 4kb chunks of 2mb pages so that KVM is forced to
> > allocate a large number of page tables even though the guest isn't accessing that
> > much memory.
> >
> > But, someone would need to either understand how KVM works to make that connection,
> > or know (or be told) to go look at KVM's stats if they're running VMs to better
> > decipher the stats.
> >
> > And even in the little bit of time I played with this, I found having
> > nr_page_table_pages side-by-side with nr_secondary_page_table_pages to be very
> > informative.  E.g. when backing a VM with THP versus HugeTLB,
> > nr_secondary_page_table_pages is roughly the same, but nr_page_table_pages is an
> > order of a magnitude higher with THP.  I'm guessing the THP behavior is due to
> > something triggering DoubleMap, but now I want to find out why that's happening.
> >
> > So while I'm pretty sure a clever user could glean the same info by cross-referencing
> > NR_PAGETABLE stats with KVM stats, I think having NR_SECONDARY_PAGETABLE will at the
> > very least prove to be helpful for understanding tradeoffs between VM backing types,
> > and likely even steer folks towards potential optimizations.
> >
> > Baseline:
> >   # grep page_table /proc/vmstat
> >   nr_page_table_pages 2830
> >   nr_secondary_page_table_pages 0
> >
> > THP:
> >   # grep page_table /proc/vmstat
> >   nr_page_table_pages 7584
> >   nr_secondary_page_table_pages 140
> >
> > HugeTLB:
> >   # grep page_table /proc/vmstat
> >   nr_page_table_pages 3153
> >   nr_secondary_page_table_pages 153
> >
> 
> Interesting findings! Thanks for taking the time to look into this, Sean!
> I will refresh this patchset and summarize the discussion in the
> commit message, and also fix some nits on the KVM side. Does this
> sound good to everyone?

Yes, thanks for summarizing this. Sounds good to me!


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats.
  2022-04-29 20:11 ` [PATCH v4 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats Yosry Ahmed
@ 2022-07-22 20:58   ` Mingwei Zhang
  2022-07-26 18:03     ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Mingwei Zhang @ 2022-07-22 20:58 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Oliver Upton, cgroups, linux-kernel,
	linux-arm-kernel, kvmarm, kvm, linux-mm

On Fri, Apr 29, 2022, Yosry Ahmed wrote:
> Count the pages used by KVM mmu on x86 for in secondary pagetable stats.
> 
> For the legacy mmu, accounting pagetable stats is combined KVM's
> existing for mmu pages in newly introduced kvm_[un]account_mmu_page()
> helpers.
> 
> For tdp mmu, introduce new tdp_[un]account_mmu_page() helpers. That
> combines accounting pagetable stats with the tdp_mmu_pages counter
> accounting.
> 
> tdp_mmu_pages counter introduced in this series [1]. This patch was
> rebased on top of the first two patches in that series.
> 
> [1]https://lore.kernel.org/lkml/20220401063636.2414200-1-mizhang@google.com/
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---

It looks like there are two metrics for mmu in x86: one for shadow mmu
and the other for TDP mmu. Is there any plan to merge them together?

>  arch/x86/kvm/mmu/mmu.c     | 16 ++++++++++++++--
>  arch/x86/kvm/mmu/tdp_mmu.c | 16 ++++++++++++++--
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 78d8e1d8fb99..e5b0e826445d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1679,6 +1679,18 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
>  	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
>  }
>  
> +static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> +	kvm_mod_used_mmu_pages(kvm, +1);
> +	kvm_account_pgtable_pages((void *)sp->spt, +1);
> +}
> +
> +static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> +	kvm_mod_used_mmu_pages(kvm, -1);
> +	kvm_account_pgtable_pages((void *)sp->spt, -1);
> +}
> +
>  static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>  {
>  	MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
> @@ -1734,7 +1746,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
>  	 */
>  	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
>  	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
> -	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
> +	kvm_account_mmu_page(vcpu->kvm, sp);
>  	return sp;
>  }
>  
> @@ -2363,7 +2375,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
>  			list_add(&sp->link, invalid_list);
>  		else
>  			list_move(&sp->link, invalid_list);
> -		kvm_mod_used_mmu_pages(kvm, -1);
> +		kvm_unaccount_mmu_page(kvm, sp);
>  	} else {
>  		/*
>  		 * Remove the active root from the active page list, the root
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 3456277ade18..6295c4da5dee 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -371,6 +371,18 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
>  	}
>  }
>  
> +static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> +	atomic64_inc(&kvm->arch.tdp_mmu_pages);
> +	kvm_account_pgtable_pages((void *)sp->spt, +1);
> +}
> +
> +static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> +	atomic64_dec(&kvm->arch.tdp_mmu_pages);
> +	kvm_account_pgtable_pages((void *)sp->spt, -1);
> +}
> +
>  /**
>   * tdp_mmu_unlink_sp() - Remove a shadow page from the list of used pages
>   *
> @@ -383,7 +395,7 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
>  static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
>  			      bool shared)
>  {
> -	atomic64_dec(&kvm->arch.tdp_mmu_pages);
> +	tdp_unaccount_mmu_page(kvm, sp);
>  
>  	if (!sp->lpage_disallowed)
>  		return;
> @@ -1121,7 +1133,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
>  		tdp_mmu_set_spte(kvm, iter, spte);
>  	}
>  
> -	atomic64_inc(&kvm->arch.tdp_mmu_pages);
> +	tdp_account_mmu_page(kvm, sp);
>  
>  	return 0;
>  }
> -- 
> 2.36.0.464.gb9c8b46e94-goog
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats.
  2022-07-22 20:58   ` Mingwei Zhang
@ 2022-07-26 18:03     ` Sean Christopherson
  0 siblings, 0 replies; 26+ messages in thread
From: Sean Christopherson @ 2022-07-26 18:03 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Yosry Ahmed, Tejun Heo, Johannes Weiner, Zefan Li, Marc Zyngier,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Oliver Upton, cgroups, linux-kernel, linux-arm-kernel, kvmarm,
	kvm, linux-mm

On Fri, Jul 22, 2022, Mingwei Zhang wrote:
> On Fri, Apr 29, 2022, Yosry Ahmed wrote:
> > Count the pages used by KVM mmu on x86 for in secondary pagetable stats.
> > 
> > For the legacy mmu, accounting pagetable stats is combined KVM's
> > existing for mmu pages in newly introduced kvm_[un]account_mmu_page()
> > helpers.
> > 
> > For tdp mmu, introduce new tdp_[un]account_mmu_page() helpers. That
> > combines accounting pagetable stats with the tdp_mmu_pages counter
> > accounting.
> > 
> > tdp_mmu_pages counter introduced in this series [1]. This patch was
> > rebased on top of the first two patches in that series.
> > 
> > [1]https://lore.kernel.org/lkml/20220401063636.2414200-1-mizhang@google.com/
> > 
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> 
> It looks like there are two metrics for mmu in x86: one for shadow mmu
> and the other for TDP mmu. Is there any plan to merge them together?

There aren't two _separate_ metrics per se, rather that the TDP MMU (intentionally)
doesn't honor KVM_SET_NR_MMU_PAGES, nor does it play nice the the core mm shrinkers.
Thus, the TDP MMU doesn't udpate kvm_mod_used_mmu_pages(), which feeds into both of
those things.

Long term, I don't think the TDP MMU will ever honor KVM_SET_NR_MMU_PAGES.  That
particular knob predates proper integration with memcg and probably should be
deprecated.

As for supporting shrinkers in the TDP MMU, it's unclear whether or not that's truly
necessary.  And until mmu_shrink_scan() is made a _lot_ smarter, it's somewhat of a
moot point because KVM's shrinker implementation is just too naive for it to be a net
positive, e.g. it tends to zap upper level entries and wipe out large swaths of KVM's
page tables.  KVM_SET_NR_MMU_PAGES uses the same naive algorithm, so it's not any better.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-07-26 18:04 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-29 20:11 [PATCH v4 0/4] KVM: mm: count KVM mmu usage in memory stats Yosry Ahmed
2022-04-29 20:11 ` [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses Yosry Ahmed
2022-05-02 10:01   ` Marc Zyngier
2022-05-02 18:46     ` Yosry Ahmed
2022-05-09 16:38       ` Yosry Ahmed
2022-05-12 20:36         ` Shakeel Butt
2022-05-12 23:07       ` Johannes Weiner
2022-05-12 23:29         ` Sean Christopherson
2022-05-13 15:50           ` Johannes Weiner
2022-05-13 16:12             ` Sean Christopherson
2022-05-13 16:22               ` Yosry Ahmed
2022-05-13 17:13               ` Shakeel Butt
2022-05-20  1:56                 ` Yosry Ahmed
2022-05-20 14:39                   ` Johannes Weiner
2022-05-24 22:31                     ` Yosry Ahmed
2022-05-25 11:56                       ` Johannes Weiner
2022-05-26  0:38                         ` Sean Christopherson
2022-05-27 18:33                           ` Yosry Ahmed
2022-06-03 16:42                             ` Johannes Weiner
2022-04-29 20:11 ` [PATCH v4 2/4] KVM: mmu: add a helper to account memory used by KVM mmu Yosry Ahmed
2022-04-29 20:11 ` [PATCH v4 3/4] KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats Yosry Ahmed
2022-07-22 20:58   ` Mingwei Zhang
2022-07-26 18:03     ` Sean Christopherson
2022-04-29 20:11 ` [PATCH v4 4/4] KVM: arm64/mmu: count KVM s2 " Yosry Ahmed
2022-05-02  7:24   ` Oliver Upton
2022-05-02  9:49     ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).