* [PATCH v3 0/7] KVM: X86: Some light optimizations on rmap logic
From: Peter Xu @ 2021-07-30 22:04 UTC
  To: kvm, linux-kernel
  Cc: Vitaly Kuznetsov, Sean Christopherson, peterx, Maxim Levitsky,
	Paolo Bonzini

The major change in v3 is to address Sean's comments.

Since I retested the two relevant patches and the numbers changed slightly, I
updated the numbers in the two optimization patches to reflect that.  In the
latest measurement, the 3->15 slots change showed a larger share of the speedup.
Summary:

        Vanilla:      473.90 (+-5.93%)
        3->15 slots:  366.10 (+-4.94%)
        Add counter:  351.00 (+-3.70%)

All the numbers are also updated in the commit messages.

To apply the series on top of kvm/queue, the patches below should be replaced
by the corresponding patches in this v3:

        KVM: X86: MMU: Tune PTE_LIST_EXT to be bigger
        KVM: X86: Optimize pte_list_desc with per-array counter
        KVM: X86: Optimize zapping rmap

The 1st one-liner patch needs to be replaced because its commit message was
updated with the new numbers (so that all the numbers align); the 2nd and 3rd
patches address Sean's comments and also carry the new numbers.

I didn't repost the initial two patches because they're already in kvm/queue
and they'll be identical in content.  Please have a look, thanks.

v2: https://lore.kernel.org/kvm/20210625153214.43106-1-peterx@redhat.com/
v1: https://lore.kernel.org/kvm/20210624181356.10235-1-peterx@redhat.com/

-- original cover letter --

Everything started from patch 1, which introduced a new statistic to keep the
"max rmap entry count per vm".  At that time I was just curious how many rmap
entries there normally are for a guest, and the answer surprised me a bit.

For TDP mappings it's all fine, as the rmap count of a page is mostly either 0
or 1 depending on whether it has been faulted in.  It turns out that with EPT=N
there can be a huge number of pages with tens or even hundreds of rmap entries,
even for an idle guest.  So I continued with the rest.

To better understand how many of those pages there are, I did patches 2-6,
which introduce the idea of per-arch per-vm debugfs nodes and add a debugfs
file with rmap statistics; this is similar to kvm_arch_create_vcpu_debugfs()
but for the vm rather than the vcpu.

I believe this is the clean approach, as I also see other archs creating
per-vm debugfs nodes in an ad-hoc fashion:

---8<---
*** arch/arm64/kvm/vgic/vgic-debug.c:
vgic_debug_init[274]           debugfs_create_file("vgic-state", 0444, kvm->debugfs_dentry, kvm,

*** arch/powerpc/kvm/book3s_64_mmu_hv.c:
kvmppc_mmu_debugfs_init[2115]  debugfs_create_file("htab", 0400, kvm->arch.debugfs_dir, kvm,

*** arch/powerpc/kvm/book3s_64_mmu_radix.c:
kvmhv_radix_debugfs_init[1434] debugfs_create_file("radix", 0400, kvm->arch.debugfs_dir, kvm,

*** arch/powerpc/kvm/book3s_hv.c:
debugfs_vcpu_init[2395]        debugfs_create_file("timings", 0444, vcpu->arch.debugfs_dir, vcpu,

*** arch/powerpc/kvm/book3s_xics.c:
xics_debugfs_init[1027]        xics->dentry = debugfs_create_file(name, 0444, powerpc_debugfs_root,

*** arch/powerpc/kvm/book3s_xive.c:
xive_debugfs_init[2236]        xive->dentry = debugfs_create_file(name, S_IRUGO, powerpc_debugfs_root,

*** arch/powerpc/kvm/timing.c:
kvmppc_create_vcpu_debugfs[214] debugfs_file = debugfs_create_file(dbg_fname, 0666, kvm_debugfs_dir,
---8<---

PPC even has its own per-vm dir for that.  If patches 2-6 are considered
acceptable, the next thing to consider is merging all these usages under the
same existing per-vm dentry via the newly introduced per-arch hooks.

The last 3 patches (patches 7-9) are a few optimizations of the existing rmap
logic.  The major test case I used is rmap_fork [1]; however, it's not really
the ideal one to show their effect, since the test covers both rmap_add and
rmap_remove, and I don't have a good idea for optimizing rmap_remove without
changing the array structure or adding a lot of overhead (e.g. sorting the
array, or somehow replacing the array list with a tree-like structure).
Still, the test already shows some benefit from these changes, so I'm posting
them.
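
The test itself lives at [1]; purely as an illustration of its shape (this is
a hedged guess, not the actual rmap_fork code), the idea is a fork/recycle
loop run inside an EPT=N guest, where the children's COW faults churn
rmap_add/remove:

	#include <stdlib.h>
	#include <string.h>
	#include <sys/wait.h>
	#include <unistd.h>

	#define BUF_SIZE	(1UL << 30)	/* 1G of anonymous memory */

	int main(int argc, char **argv)
	{
		int i, n = argc > 1 ? atoi(argv[1]) : 500;
		char *buf = malloc(BUF_SIZE);

		if (!buf)
			return 1;
		memset(buf, 1, BUF_SIZE);	/* fault everything in */
		for (i = 0; i < n; i++) {
			if (fork() == 0) {
				/* the child's COW write faults churn the rmaps */
				memset(buf, 2, BUF_SIZE);
				_exit(0);
			}
			wait(NULL);
		}
		return 0;
	}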

Applying patches 7-8 brings an overall perf boost of 38% when I fork 500
children with the test I used.  I didn't run a perf test on patch 9.  More
details are in the commit logs.

Please review, thanks.

[1] https://github.com/xzpeter/clibs/commit/825436f825453de2ea5aaee4bdb1c92281efe5b3

Peter Xu (7):
  KVM: Allow to have arch-specific per-vm debugfs files
  KVM: X86: Introduce pte_list_count() helper
  KVM: X86: Introduce kvm_mmu_slot_lpages() helpers
  KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file
  KVM: X86: MMU: Tune PTE_LIST_EXT to be bigger
  KVM: X86: Optimize pte_list_desc with per-array counter
  KVM: X86: Optimize zapping rmap

 arch/x86/kvm/mmu/mmu.c          |  98 +++++++++++++++++-------
 arch/x86/kvm/mmu/mmu_internal.h |   1 +
 arch/x86/kvm/x86.c              | 130 +++++++++++++++++++++++++++++++-
 include/linux/kvm_host.h        |   1 +
 virt/kvm/kvm_main.c             |  20 ++++-
 5 files changed, 221 insertions(+), 29 deletions(-)

-- 
2.31.1




* [PATCH v3 1/7] KVM: Allow to have arch-specific per-vm debugfs files
From: Peter Xu @ 2021-07-30 22:04 UTC
  To: kvm, linux-kernel
  Cc: Vitaly Kuznetsov, Sean Christopherson, peterx, Maxim Levitsky,
	Paolo Bonzini

Allow archs to create arch-specific nodes under the kvm->debugfs_dentry
directory besides the stats fields.  The new interface
kvm_arch_create_vm_debugfs() is defined but not yet used.  It's called after
kvm->debugfs_dentry is created, so the dentry can be referenced directly
inside kvm_arch_create_vm_debugfs().  Archs should define their own versions
when they want to create extra debugfs nodes.
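
For illustration only, an arch-specific override might look like the
following minimal sketch (the "foo_stats" node and foo_stats_fops are
hypothetical names):

	int kvm_arch_create_vm_debugfs(struct kvm *kvm)
	{
		/* kvm->debugfs_dentry is guaranteed to exist at this point */
		debugfs_create_file("foo_stats", 0444, kvm->debugfs_dentry,
				    kvm, &foo_stats_fops);
		return 0;
	}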

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 20 +++++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9d6b4ad407b8..a3ec3271c4c8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1067,6 +1067,7 @@ bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_post_init_vm(struct kvm *kvm);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
+int kvm_arch_create_vm_debugfs(struct kvm *kvm);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a96cbe24c688..327f8fae80a5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -915,7 +915,7 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 	char dir_name[ITOA_MAX_LEN * 2];
 	struct kvm_stat_data *stat_data;
 	const struct _kvm_stats_desc *pdesc;
-	int i;
+	int i, ret;
 	int kvm_debugfs_num_entries = kvm_vm_stats_header.num_desc +
 				      kvm_vcpu_stats_header.num_desc;
 
@@ -960,6 +960,13 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 				    kvm->debugfs_dentry, stat_data,
 				    &stat_fops_per_vm);
 	}
+
+	ret = kvm_arch_create_vm_debugfs(kvm);
+	if (ret) {
+		kvm_destroy_vm_debugfs(kvm);
+		return i;
+	}
+
 	return 0;
 }
 
@@ -980,6 +987,17 @@ void __weak kvm_arch_pre_destroy_vm(struct kvm *kvm)
 {
 }
 
+/*
+ * Called after per-vm debugfs is created.  When called, kvm->debugfs_dentry
+ * is already set up, so arch-specific debugfs entries can be created under
+ * it.  Cleanup is done automatically and recursively by
+ * kvm_destroy_vm_debugfs(), so a per-arch destroy interface is not needed.
+ */
+int __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+	return 0;
+}
+
 static struct kvm *kvm_create_vm(unsigned long type)
 {
 	struct kvm *kvm = kvm_arch_alloc_vm();
-- 
2.31.1



* [PATCH v3 2/7] KVM: X86: Introduce pte_list_count() helper
From: Peter Xu @ 2021-07-30 22:04 UTC
  To: kvm, linux-kernel
  Cc: Vitaly Kuznetsov, Sean Christopherson, peterx, Maxim Levitsky,
	Paolo Bonzini

This helper is used to count the number of rmap entries in the rmap list
pointed to by the kvm_rmap_head.
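
For context, this is the kvm_rmap_head::val encoding that the helper decodes
(the same encoding used by pte_list_add()):

	/*
	 * rmap_head->val == 0:        no entries, count == 0
	 * (rmap_head->val & 1) == 0:  val is a single spte pointer, count == 1
	 * (rmap_head->val & 1) == 1:  (val & ~1ul) points to the first
	 *                             pte_list_desc in a chain of arrays
	 */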

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c          | 21 +++++++++++++++++++++
 arch/x86/kvm/mmu/mmu_internal.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5429c20cf2cf..16c99f771c9e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -998,6 +998,27 @@ static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
 	__pte_list_remove(sptep, rmap_head);
 }
 
+unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
+{
+	struct pte_list_desc *desc;
+	unsigned int i, count = 0;
+
+	if (!rmap_head->val)
+		return 0;
+	else if (!(rmap_head->val & 1))
+		return 1;
+
+	desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+
+	while (desc) {
+		for (i = 0; (i < PTE_LIST_EXT) && desc->sptes[i]; i++)
+			count++;
+		desc = desc->more;
+	}
+
+	return count;
+}
+
 static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
 					   const struct kvm_memory_slot *slot)
 {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ca7b7595bbfc..62bb8f758b3f 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -131,6 +131,7 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    int min_level);
 void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
 					u64 start_gfn, u64 pages);
+unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
 
 /*
  * Return values of handle_mmio_page_fault, mmu.page_fault, and fast_page_fault().
-- 
2.31.1



* [PATCH v3 3/7] KVM: X86: Introduce kvm_mmu_slot_lpages() helpers
From: Peter Xu @ 2021-07-30 22:04 UTC
  To: kvm, linux-kernel
  Cc: Vitaly Kuznetsov, Sean Christopherson, peterx, Maxim Levitsky,
	Paolo Bonzini

Introduce kvm_mmu_slot_lpages() to calculate the lpage_info and rmap array
sizes.  The companion __kvm_mmu_slot_lpages() takes npages as an extra
parameter rather than fetching it from the memslot pointer.  Start using the
latter in kvm_alloc_memslot_metadata().
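
As a worked example (a sketch; the numbers assume a single slot with
base_gfn == 0 covering 6GB):

	/*
	 * kvm_mmu_slot_lpages(slot, PG_LEVEL_4K) == 1572864  (6G / 4K)
	 * kvm_mmu_slot_lpages(slot, PG_LEVEL_2M) ==    3072  (6G / 2M)
	 * kvm_mmu_slot_lpages(slot, PG_LEVEL_1G) ==       6  (6G / 1G)
	 *
	 * For slots that aren't aligned to a large page boundary,
	 * gfn_to_index() rounds so that partially covered large pages at
	 * either end are still counted.
	 */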

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/x86.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 916c976e99ab..e44d8f7781b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -299,6 +299,20 @@ static struct kmem_cache *x86_fpu_cache;
 
 static struct kmem_cache *x86_emulator_cache;
 
+static inline unsigned long
+__kvm_mmu_slot_lpages(struct kvm_memory_slot *slot, unsigned long npages,
+		      int level)
+{
+	return gfn_to_index(slot->base_gfn + npages - 1,
+			    slot->base_gfn, level) + 1;
+}
+
+static inline unsigned long
+kvm_mmu_slot_lpages(struct kvm_memory_slot *slot, int level)
+{
+	return __kvm_mmu_slot_lpages(slot, slot->npages, level);
+}
+
 /*
  * When called, it means the previous get/set msr reached an invalid msr.
  * Return true if we want to ignore/silent this failed msr access.
@@ -11443,8 +11457,7 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm,
 		int lpages;
 		int level = i + 1;
 
-		lpages = gfn_to_index(slot->base_gfn + npages - 1,
-				      slot->base_gfn, level) + 1;
+		lpages = __kvm_mmu_slot_lpages(slot, npages, level);
 
 		linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
 		if (!linfo)
-- 
2.31.1



* [PATCH v3 4/7] KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file
From: Peter Xu @ 2021-07-30 22:04 UTC
  To: kvm, linux-kernel
  Cc: Vitaly Kuznetsov, Sean Christopherson, peterx, Maxim Levitsky,
	Paolo Bonzini

Use this file to dump rmap statistics.  The statistics are gathered by
calculating the rmap count of every page, with the results bucketed on a
log-2 scale.

An example output of this looks like (idle 6GB guest, right after Linux boots):

Rmap_Count:     0       1       2-3     4-7     8-15    16-31   32-63   64-127  128-255 256-511 512-1023
Level=4K:       3086676 53045   12330   1272    502     121     76      2       0       0       0
Level=2M:       5947    231     0       0       0       0       0       0       0       0       0
Level=1G:       32      0       0       0       0       0       0       0       0       0       0
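
The bucket for a given rmap count is derived with fls(); fls(x) returns the
1-based index of the highest set bit and fls(0) == 0, which yields exactly
the log-2 buckets in the header line:

	/*
	 * fls(0)         ==  0  ->  bucket "0"
	 * fls(1)         ==  1  ->  bucket "1"
	 * fls(2..3)      ==  2  ->  bucket "2-3"
	 * fls(4..7)      ==  3  ->  bucket "4-7"
	 *    ...
	 * fls(512..1023) == 10  ->  bucket "512-1023"
	 */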

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/x86.c | 113 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e44d8f7781b6..0877340dc6ff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -30,6 +30,7 @@
 #include "hyperv.h"
 #include "lapic.h"
 #include "xen.h"
+#include "mmu/mmu_internal.h"
 
 #include <linux/clocksource.h>
 #include <linux/interrupt.h>
@@ -59,6 +60,7 @@
 #include <linux/mem_encrypt.h>
 #include <linux/entry-kvm.h>
 #include <linux/suspend.h>
+#include <linux/debugfs.h>
 
 #include <trace/events/kvm.h>
 
@@ -11193,6 +11195,117 @@ int kvm_arch_post_init_vm(struct kvm *kvm)
 	return kvm_mmu_post_init_vm(kvm);
 }
 
+/*
+ * This covers rmap counts < 1024 (11 == log2(1024) + 1 buckets), which
+ * should be enough to cover RMAP_RECYCLE_THRESHOLD.
+ */
+#define  RMAP_LOG_SIZE  11
+
+static const char *kvm_lpage_str[KVM_NR_PAGE_SIZES] = { "4K", "2M", "1G" };
+
+static int kvm_mmu_rmaps_stat_show(struct seq_file *m, void *v)
+{
+	struct kvm_rmap_head *rmap;
+	struct kvm *kvm = m->private;
+	struct kvm_memory_slot *slot;
+	struct kvm_memslots *slots;
+	unsigned int lpage_size, index;
+	/* Still small enough to be on the stack */
+	unsigned int *log[KVM_NR_PAGE_SIZES], *cur;
+	int i, j, k, l, ret;
+
+	memset(log, 0, sizeof(log));
+
+	ret = -ENOMEM;
+	for (i = 0; i < KVM_NR_PAGE_SIZES; i++) {
+		log[i] = kzalloc(RMAP_LOG_SIZE * sizeof(unsigned int), GFP_KERNEL);
+		if (!log[i])
+			goto out;
+	}
+
+	mutex_lock(&kvm->slots_lock);
+	write_lock(&kvm->mmu_lock);
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+		for (j = 0; j < slots->used_slots; j++) {
+			slot = &slots->memslots[j];
+			for (k = 0; k < KVM_NR_PAGE_SIZES; k++) {
+				rmap = slot->arch.rmap[k];
+				lpage_size = kvm_mmu_slot_lpages(slot, k + 1);
+				cur = log[k];
+				for (l = 0; l < lpage_size; l++) {
+					index = fls(pte_list_count(&rmap[l]));
+					if (WARN_ON_ONCE(index >= RMAP_LOG_SIZE))
+						index = RMAP_LOG_SIZE - 1;
+					cur[index]++;
+				}
+			}
+		}
+	}
+
+	write_unlock(&kvm->mmu_lock);
+	mutex_unlock(&kvm->slots_lock);
+
+	/* index=0 counts no rmap; index=1 counts 1 rmap */
+	seq_printf(m, "Rmap_Count:\t0\t1\t");
+	for (i = 2; i < RMAP_LOG_SIZE; i++) {
+		j = 1 << (i - 1);
+		k = (1 << i) - 1;
+		seq_printf(m, "%d-%d\t", j, k);
+	}
+	seq_printf(m, "\n");
+
+	for (i = 0; i < KVM_NR_PAGE_SIZES; i++) {
+		seq_printf(m, "Level=%s:\t", kvm_lpage_str[i]);
+		cur = log[i];
+		for (j = 0; j < RMAP_LOG_SIZE; j++)
+			seq_printf(m, "%d\t", cur[j]);
+		seq_printf(m, "\n");
+	}
+
+	ret = 0;
+out:
+	for (i = 0; i < KVM_NR_PAGE_SIZES; i++)
+		if (log[i])
+			kfree(log[i]);
+
+	return ret;
+}
+
+static int kvm_mmu_rmaps_stat_open(struct inode *inode, struct file *file)
+{
+	struct kvm *kvm = inode->i_private;
+
+	if (!kvm_get_kvm_safe(kvm))
+		return -ENOENT;
+
+	return single_open(file, kvm_mmu_rmaps_stat_show, kvm);
+}
+
+static int kvm_mmu_rmaps_stat_release(struct inode *inode, struct file *file)
+{
+	struct kvm *kvm = inode->i_private;
+
+	kvm_put_kvm(kvm);
+
+	return single_release(inode, file);
+}
+
+static const struct file_operations mmu_rmaps_stat_fops = {
+	.open		= kvm_mmu_rmaps_stat_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= kvm_mmu_rmaps_stat_release,
+};
+
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+	debugfs_create_file("mmu_rmaps_stat", 0644, kvm->debugfs_dentry, kvm,
+			    &mmu_rmaps_stat_fops);
+	return 0;
+}
+
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 {
 	vcpu_load(vcpu);
-- 
2.31.1



* [PATCH v3 5/7] KVM: X86: MMU: Tune PTE_LIST_EXT to be bigger
From: Peter Xu @ 2021-07-30 22:04 UTC
  To: kvm, linux-kernel
  Cc: Vitaly Kuznetsov, Sean Christopherson, peterx, Maxim Levitsky,
	Paolo Bonzini

Currently each rmap array element contains only 3 entries.  However, with
EPT=N there can be many guest pages that have tens or even hundreds of rmap
entries.

The rmap count statistics of a typical 6G guest (even if idle) show this:

Rmap_Count:     0       1       2-3     4-7     8-15    16-31   32-63   64-127  128-255 256-511 512-1023
Level=4K:       3089171 49005   14016   1363    235     212     15      7       0       0       0
Level=2M:       5951    227     0       0       0       0       0       0       0       0       0
Level=1G:       32      0       0       0       0       0       0       0       0       0       0

With some more forking, some pages will grow even larger rmap counts.

This patch makes PTE_LIST_EXT bigger so it'll be more efficient for the
general EPT=N use case: we dereference the list less often, and the loops over
PTE_LIST_EXT become slightly more efficient.  It is still not too large, so
there is less waste when the array is not full.
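
As a quick check on the sizing (a standalone sketch, assuming 8-byte pointers
and 64-byte cachelines):

	#include <stdint.h>

	#define PTE_LIST_EXT 15

	struct pte_list_desc {
		uint64_t *sptes[PTE_LIST_EXT];
		struct pte_list_desc *more;
	};

	/*
	 * 15 sptes + 1 `more' pointer = 16 * 8 = 128 bytes, i.e. exactly two
	 * cachelines; the old value of 3 filled only half of one.
	 */
	_Static_assert(sizeof(struct pte_list_desc) == 128, "two cachelines");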

It should not affect EPT=Y, since with EPT each page normally has only zero or
one rmap entry, so no array is even allocated.

With a test case that forks 500 children and recycles them ("./rmap_fork 500"
[1]), this patch speeds up the fork time by about 29%:

    Before: 473.90 (+-5.93%)
    After:  366.10 (+-4.94%)

[1] https://github.com/xzpeter/clibs/commit/825436f825453de2ea5aaee4bdb1c92281efe5b3

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 16c99f771c9e..c0b452bb5dd9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -137,8 +137,8 @@ module_param(dbg, bool, 0644);
 
 #include <trace/events/kvm.h>
 
-/* make pte_list_desc fit well in cache line */
-#define PTE_LIST_EXT 3
+/* make pte_list_desc fit well in cache lines */
+#define PTE_LIST_EXT 15
 
 struct pte_list_desc {
 	u64 *sptes[PTE_LIST_EXT];
-- 
2.31.1



* [PATCH v3 6/7] KVM: X86: Optimize pte_list_desc with per-array counter
From: Peter Xu @ 2021-07-30 22:06 UTC
  To: linux-kernel, kvm
  Cc: Maxim Levitsky, Sean Christopherson, Vitaly Kuznetsov, peterx,
	Paolo Bonzini

Add a counter field into pte_list_desc, so as to simplify the add/remove/loop
logic.  E.g., we don't need to loop over the array any more for most
operations.

This makes more sense now that we've switched the array size to be larger;
otherwise the counter would be a waste.

Initially I wanted to store a tail pointer at the head of the array list so
that we wouldn't need to traverse the list, at least when pushing new entries
(without the counter we traverse both the list and the array).  However, that
would need slightly more change without much extra benefit; e.g., after we
grow the number of entries per array, the list traversal is not so expensive.

So let's keep it simple, but still try to get as much benefit as we can with
just these few extra lines of change (not to mention the code reads more
easily too, without looping over the arrays).
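
For reference, the cacheline math behind the layout comment in the diff
below (a sketch, assuming 64-byte cachelines and the new 14-entry array):

	/*
	 * bytes   0..7    more
	 * bytes   8..15   spte_count
	 * bytes  16..63   sptes[0..5]   <- first cacheline ends here
	 * bytes  64..127  sptes[6..13]  <- second cacheline
	 *
	 * So a desc holding <= 6 entries, or a fullness check against
	 * spte_count, touches only the first cacheline.
	 */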

With the same test case that forks 500 children and recycles them
("./rmap_fork 500" [1]), this patch further speeds up the total fork time by
about 4%, which is a total of 33% over the vanilla kernel:

        Vanilla:      473.90 (+-5.93%)
        3->15 slots:  366.10 (+-4.94%)
        Add counter:  351.00 (+-3.70%)

[1] https://github.com/xzpeter/clibs/commit/825436f825453de2ea5aaee4bdb1c92281efe5b3

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c0b452bb5dd9..111c37141dbe 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -138,11 +138,21 @@ module_param(dbg, bool, 0644);
 #include <trace/events/kvm.h>
 
 /* make pte_list_desc fit well in cache lines */
-#define PTE_LIST_EXT 15
+#define PTE_LIST_EXT 14
 
+/*
+ * Slightly optimize cacheline layout by putting `more' and `spte_count' at
+ * the start; then checking a desc for fullness (spte_count==PTE_LIST_EXT)
+ * or accessing a desc with <=6 entries touches only a single cacheline.
+ */
 struct pte_list_desc {
-	u64 *sptes[PTE_LIST_EXT];
 	struct pte_list_desc *more;
+	/*
+	 * Stores the number of entries in this pte_list_desc.  It needn't be
+	 * u64; u64 just eases alignment.  A value of PTE_LIST_EXT means full.
+	 */
+	u64 spte_count;
+	u64 *sptes[PTE_LIST_EXT];
 };
 
 struct kvm_shadow_walk_iterator {
@@ -901,7 +911,7 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 			struct kvm_rmap_head *rmap_head)
 {
 	struct pte_list_desc *desc;
-	int i, count = 0;
+	int count = 0;
 
 	if (!rmap_head->val) {
 		rmap_printk("%p %llx 0->1\n", spte, *spte);
@@ -911,24 +921,24 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 		desc = mmu_alloc_pte_list_desc(vcpu);
 		desc->sptes[0] = (u64 *)rmap_head->val;
 		desc->sptes[1] = spte;
+		desc->spte_count = 2;
 		rmap_head->val = (unsigned long)desc | 1;
 		++count;
 	} else {
 		rmap_printk("%p %llx many->many\n", spte, *spte);
 		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-		while (desc->sptes[PTE_LIST_EXT-1]) {
+		while (desc->spte_count == PTE_LIST_EXT) {
 			count += PTE_LIST_EXT;
-
 			if (!desc->more) {
 				desc->more = mmu_alloc_pte_list_desc(vcpu);
 				desc = desc->more;
+				desc->spte_count = 0;
 				break;
 			}
 			desc = desc->more;
 		}
-		for (i = 0; desc->sptes[i]; ++i)
-			++count;
-		desc->sptes[i] = spte;
+		count += desc->spte_count;
+		desc->sptes[desc->spte_count++] = spte;
 	}
 	return count;
 }
@@ -938,13 +948,12 @@ pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
 			   struct pte_list_desc *desc, int i,
 			   struct pte_list_desc *prev_desc)
 {
-	int j;
+	int j = desc->spte_count - 1;
 
-	for (j = PTE_LIST_EXT - 1; !desc->sptes[j] && j > i; --j)
-		;
 	desc->sptes[i] = desc->sptes[j];
 	desc->sptes[j] = NULL;
-	if (j != 0)
+	desc->spte_count--;
+	if (desc->spte_count)
 		return;
 	if (!prev_desc && !desc->more)
 		rmap_head->val = 0;
@@ -977,7 +986,7 @@ static void __pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 		prev_desc = NULL;
 		while (desc) {
-			for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i) {
+			for (i = 0; i < desc->spte_count; ++i) {
 				if (desc->sptes[i] == spte) {
 					pte_list_desc_remove_entry(rmap_head,
 							desc, i, prev_desc);
@@ -1001,7 +1010,7 @@ static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
 {
 	struct pte_list_desc *desc;
-	unsigned int i, count = 0;
+	unsigned int count = 0;
 
 	if (!rmap_head->val)
 		return 0;
@@ -1011,8 +1020,7 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
 	desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 
 	while (desc) {
-		for (i = 0; (i < PTE_LIST_EXT) && desc->sptes[i]; i++)
-			count++;
+		count += desc->spte_count;
 		desc = desc->more;
 	}
 
-- 
2.31.1



* [PATCH v3 7/7] KVM: X86: Optimize zapping rmap
From: Peter Xu @ 2021-07-30 22:06 UTC
  To: linux-kernel, kvm
  Cc: Maxim Levitsky, Sean Christopherson, Vitaly Kuznetsov, peterx,
	Paolo Bonzini

Using rmap_get_first() and rmap_remove() to zap a huge rmap list can be slow.
The easy fix is to traverse the rmap list once, collecting the accessed/dirty
bits and freeing the slots along the way.

Provide pte_list_destroy() and do exactly that.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 111c37141dbe..9b2616760e23 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1027,6 +1027,34 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
 	return count;
 }
 
+/* Return true if rmap existed, false otherwise */
+static bool pte_list_destroy(struct kvm_rmap_head *rmap_head)
+{
+	struct pte_list_desc *desc, *next;
+	int i;
+
+	if (!rmap_head->val)
+		return false;
+
+	if (!(rmap_head->val & 1)) {
+		mmu_spte_clear_track_bits((u64 *)rmap_head->val);
+		goto out;
+	}
+
+	desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+
+	for (; desc; desc = next) {
+		for (i = 0; i < desc->spte_count; i++)
+			mmu_spte_clear_track_bits(desc->sptes[i]);
+		next = desc->more;
+		mmu_free_pte_list_desc(desc);
+	}
+out:
+	/* rmap_head is meaningless now, remember to reset it */
+	rmap_head->val = 0;
+	return true;
+}
+
 static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
 					   const struct kvm_memory_slot *slot)
 {
@@ -1418,18 +1446,7 @@ static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
 static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			  const struct kvm_memory_slot *slot)
 {
-	u64 *sptep;
-	struct rmap_iterator iter;
-	bool flush = false;
-
-	while ((sptep = rmap_get_first(rmap_head, &iter))) {
-		rmap_printk("spte %p %llx.\n", sptep, *sptep);
-
-		pte_list_remove(rmap_head, sptep);
-		flush = true;
-	}
-
-	return flush;
+	return pte_list_destroy(rmap_head);
 }
 
 static bool kvm_unmap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-- 
2.31.1



* Re: [PATCH v3 4/7] KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file
From: Paolo Bonzini @ 2021-08-02 15:25 UTC
  To: Peter Xu, kvm, linux-kernel
  Cc: Vitaly Kuznetsov, Sean Christopherson, Maxim Levitsky

On 31/07/21 00:04, Peter Xu wrote:
> Use this file to dump rmap statistics.  The statistics are gathered by
> calculating the rmap count of every page, with the results bucketed on a
> log-2 scale.
> 
> An example output of this looks like (idle 6GB guest, right after Linux boots):
> 
> Rmap_Count:     0       1       2-3     4-7     8-15    16-31   32-63   64-127  128-255 256-511 512-1023
> Level=4K:       3086676 53045   12330   1272    502     121     76      2       0       0       0
> Level=2M:       5947    231     0       0       0       0       0       0       0       0       0
> Level=1G:       32      0       0       0       0       0       0       0       0       0       0
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   arch/x86/kvm/x86.c | 113 +++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 113 insertions(+)

This should be in debugfs.c, meaning that the kvm_mmu_slot_lpages() must 
be in a header.  I think mmu.h should do, let me take a look and I can 
post myself a v4 of these debugfs parts.

Paolo


* Re: [PATCH v3 1/7] KVM: Allow to have arch-specific per-vm debugfs files
From: Greg KH @ 2021-08-03 11:15 UTC
  To: Peter Xu
  Cc: kvm, linux-kernel, Vitaly Kuznetsov, Sean Christopherson,
	Maxim Levitsky, Paolo Bonzini

On Fri, Jul 30, 2021 at 06:04:49PM -0400, Peter Xu wrote:
> Allow archs to create arch-specific nodes under the kvm->debugfs_dentry
> directory besides the stats fields.  The new interface
> kvm_arch_create_vm_debugfs() is defined but not yet used.  It's called after
> kvm->debugfs_dentry is created, so the dentry can be referenced directly
> inside kvm_arch_create_vm_debugfs().  Archs should define their own versions
> when they want to create extra debugfs nodes.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> [...]
> +/*
> + * Called after per-vm debugfs is created.  When called, kvm->debugfs_dentry
> + * is already set up, so arch-specific debugfs entries can be created under
> + * it.  Cleanup is done automatically and recursively by
> + * kvm_destroy_vm_debugfs(), so a per-arch destroy interface is not needed.
> + */
> +int __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)

This should be a void function; nothing should matter if creating the
debugfs files succeeds or not.

As proof, your one implementation always returned 0 :)

thanks,

greg k-h


* Re: [PATCH v3 4/7] KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file
From: Peter Xu @ 2021-08-03 19:14 UTC
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Vitaly Kuznetsov, Sean Christopherson, Maxim Levitsky

On Mon, Aug 02, 2021 at 05:25:12PM +0200, Paolo Bonzini wrote:
> On 31/07/21 00:04, Peter Xu wrote:
> > Use this file to dump rmap statistics.  The statistics are gathered by
> > calculating the rmap count of every page, with the results bucketed on a
> > log-2 scale.
> > 
> > An example output of this looks like (idle 6GB guest, right after Linux boots):
> > 
> > Rmap_Count:     0       1       2-3     4-7     8-15    16-31   32-63   64-127  128-255 256-511 512-1023
> > Level=4K:       3086676 53045   12330   1272    502     121     76      2       0       0       0
> > Level=2M:       5947    231     0       0       0       0       0       0       0       0       0
> > Level=1G:       32      0       0       0       0       0       0       0       0       0       0
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   arch/x86/kvm/x86.c | 113 +++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 113 insertions(+)
> 
> This should be in debugfs.c, meaning that the kvm_mmu_slot_lpages() must be
> in a header.  I think mmu.h should do, let me take a look and I can post
> myself a v4 of these debugfs parts.

Thanks, Paolo!

-- 
Peter Xu



* Re: [PATCH v3 1/7] KVM: Allow to have arch-specific per-vm debugfs files
From: Peter Xu @ 2021-08-03 19:25 UTC
  To: Greg KH
  Cc: kvm, linux-kernel, Vitaly Kuznetsov, Sean Christopherson,
	Maxim Levitsky, Paolo Bonzini

On Tue, Aug 03, 2021 at 01:15:19PM +0200, Greg KH wrote:
> > +/*
> > + * Called after per-vm debugfs is created.  When called, kvm->debugfs_dentry
> > + * is already set up, so arch-specific debugfs entries can be created under
> > + * it.  Cleanup is done automatically and recursively by
> > + * kvm_destroy_vm_debugfs(), so a per-arch destroy interface is not needed.
> > + */
> > +int __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
> 
> This should be a void function, nothing should matter if creating
> debugfs files succeeds or not.
> 
> As proof, your one implementation always returned 0 :)

Right. :)

But we do have code that prepares for failures of debugfs creation; please
have a look at kvm_create_vm_debugfs().  So I kept that behavior for the
per-arch hook.

The existing x86 one should not fail, but the hope is that we can convert some
other archs' existing debugfs code into per-arch hooks, like what we did with
x86 here.  I didn't check again (please refer to the cover letter; we do have
some of those), but it's still easier to allow the per-arch hook to fail VM
creation, just like what we have for kvm_create_vm_debugfs().

PS: It makes sense to me to fail VM creation, since 99% of those debugfs
creation failures are about -ENOMEM; an early failure sounds better than
failing right after the VM has booted.

Meanwhile, I never expected to receive comments from you; thanks for having a
look!

-- 
Peter Xu



* Re: [PATCH v3 4/7] KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file
From: Sean Christopherson @ 2021-08-05 18:19 UTC
  To: Paolo Bonzini
  Cc: Peter Xu, kvm, linux-kernel, Vitaly Kuznetsov, Maxim Levitsky

On Mon, Aug 02, 2021, Paolo Bonzini wrote:
> On 31/07/21 00:04, Peter Xu wrote:
> > Use this file to dump rmap statistics.  The statistics are gathered by
> > calculating the rmap count of every page, with the results bucketed on a
> > log-2 scale.
> > 
> > An example output of this looks like (idle 6GB guest, right after Linux boots):
> > 
> > Rmap_Count:     0       1       2-3     4-7     8-15    16-31   32-63   64-127  128-255 256-511 512-1023
> > Level=4K:       3086676 53045   12330   1272    502     121     76      2       0       0       0
> > Level=2M:       5947    231     0       0       0       0       0       0       0       0       0
> > Level=1G:       32      0       0       0       0       0       0       0       0       0       0
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   arch/x86/kvm/x86.c | 113 +++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 113 insertions(+)
> 
> This should be in debugfs.c, meaning that the kvm_mmu_slot_lpages() must be
> in a header.  I think mmu.h should do, let me take a look and I can post
> myself a v4 of these debugfs parts.

When you do post v4, don't forget to include both mmu.h and mmu/mmu_internal.h. :-)
kvm/queue is still broken...

arch/x86/kvm/debugfs.c: In function ‘kvm_mmu_rmaps_stat_show’:
arch/x86/kvm/debugfs.c:115:18: error: implicit declaration of function ‘kvm_mmu_slot_lpages’;
  115 |     lpage_size = kvm_mmu_slot_lpages(slot, k + 1);
      |                  ^~~~~~~~~~~~~~~~~~~
      |                  kvm_mmu_gfn_allow_lpage
