* [PATCH v3 0/7] KVM: X86: Some light optimizations on rmap logic
@ 2021-07-30 22:04 Peter Xu
From: Peter Xu @ 2021-07-30 22:04 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: Vitaly Kuznetsov, Sean Christopherson, peterx, Maxim Levitsky,
	Paolo Bonzini

The major change in v3 is to address comments from Sean.

Since I retested the two relevant patches and the numbers changed slightly, I
updated the numbers in the two optimization patches to reflect that.  In the
latest measurement the 3->15 slots change showed more effect on the speedup.
Summary:

        Vanilla:      473.90 (+-5.93%)
        3->15 slots:  366.10 (+-4.94%)
        Add counter:  351.00 (+-3.70%)

All the numbers are also updated in the commit messages.
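As a quick sanity check on the summary above, the relative improvement of each
step over vanilla can be computed as below.  This is only a sketch: it assumes
the figures are durations where lower is better (the letter calls the change a
speedup but does not state a unit), and reduction_pct() is a helper name made
up for this illustration.

```c
/*
 * Relative reduction of `after` compared with `before`, in percent.
 * Assumption: both values are durations, so a smaller value is better.
 */
static double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the numbers above, reduction_pct(473.90, 366.10) is roughly 22.7 and
reduction_pct(473.90, 351.00) is roughly 25.9, so most of the measured gain
comes from the 3->15 slots change.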

To apply the series on top of kvm/queue, the patches below should be replaced
by the corresponding patches in this v3:

        KVM: X86: MMU: Tune PTE_LIST_EXT to be bigger
        KVM: X86: Optimize pte_list_desc with per-array counter
        KVM: X86: Optimize zapping rmap

The first one-liner patch needs to be replaced because its commit message is
updated so that all the numbers align; the second and third patches address
Sean's comments and also carry the new numbers.

I didn't repost the initial two patches because they're already in kvm/queue
and they'll be identical in content.  Please have a look, thanks.

v2: https://lore.kernel.org/kvm/20210625153214.43106-1-peterx@redhat.com/
v1: https://lore.kernel.org/kvm/20210624181356.10235-1-peterx@redhat.com/

-- original cover letter --

It all started from patch 1, which introduced a new statistic to keep the "max
rmap entry count per vm".  At that time I was just curious about how many rmap
entries a guest normally has, and the result surprised me a bit.

For TDP mappings it's all fine, as the rmap count of a page is mostly either 0
or 1 depending on whether it has been faulted in.  With EPT=N, however, there
seems to be a huge number of pages that can have tens or hundreds of rmap
entries even for an idle guest.  Then I continued with the rest.

To better understand "how many of those pages" there are, I did patches 2-6,
which introduce the idea of per-arch per-vm debugfs nodes and add a debug file
that reports rmap statistics.  This is similar to
kvm_arch_create_vcpu_debugfs(), but per vm rather than per vcpu.

I think this is the cleaner approach, as I also see other archs creating
per-vm debugfs nodes in ad hoc ways:

---8<---
*** arch/arm64/kvm/vgic/vgic-debug.c:
vgic_debug_init[274]           debugfs_create_file("vgic-state", 0444, kvm->debugfs_dentry, kvm,

*** arch/powerpc/kvm/book3s_64_mmu_hv.c:
kvmppc_mmu_debugfs_init[2115]  debugfs_create_file("htab", 0400, kvm->arch.debugfs_dir, kvm,

*** arch/powerpc/kvm/book3s_64_mmu_radix.c:
kvmhv_radix_debugfs_init[1434] debugfs_create_file("radix", 0400, kvm->arch.debugfs_dir, kvm,

*** arch/powerpc/kvm/book3s_hv.c:
debugfs_vcpu_init[2395]        debugfs_create_file("timings", 0444, vcpu->arch.debugfs_dir, vcpu,

*** arch/powerpc/kvm/book3s_xics.c:
xics_debugfs_init[1027]        xics->dentry = debugfs_create_file(name, 0444, powerpc_debugfs_root,

*** arch/powerpc/kvm/book3s_xive.c:
xive_debugfs_init[2236]        xive->dentry = debugfs_create_file(name, S_IRUGO, powerpc_debugfs_root,

*** arch/powerpc/kvm/timing.c:
kvmppc_create_vcpu_debugfs[214] debugfs_file = debugfs_create_file(dbg_fname, 0666, kvm_debugfs_dir,
---8<---

PPC even has its own per-vm dir for that.  If patches 2-6 are considered
acceptable, the next thing to consider is merging all these usages under the
same existing per-vm dentry, using the per-arch hooks introduced here.
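To illustrate the hook idea in isolation, here is a minimal userspace sketch.
It is not the kernel code from this series: struct vm, its fields, and both
function names are hypothetical stand-ins.  The point is only that generic code
owns the per-vm directory and gives each arch one optional chance to add its
own files under it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VM; not a kernel type. */
struct vm {
	const char *debugfs_dir;                 /* per-vm dir, owned by generic code */
	int (*arch_create_debugfs)(struct vm *); /* optional per-arch hook, may be NULL */
	int nr_arch_files;                       /* files the arch hook has added */
};

/* Hypothetical arch hook: pretend to add one file under the per-vm dir. */
static int x86_create_vm_debugfs(struct vm *vm)
{
	vm->nr_arch_files++;
	return 0;
}

/* Generic path: set up the per-vm dir, then let the arch add its files. */
static int vm_create_debugfs(struct vm *vm, const char *dir)
{
	vm->debugfs_dir = dir;
	if (vm->arch_create_debugfs == NULL)
		return 0;	/* arch has nothing to add */
	return vm->arch_create_debugfs(vm);
}
```

An arch that sets arch_create_debugfs to x86_create_vm_debugfs gets its file
created under the common directory; an arch that leaves the hook NULL is
unaffected.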

The last 3 patches (patches 7-9) are a few optimizations of the existing rmap
logic.  The major test case I used is rmap_fork [1].  It is not really the ideal
test to show their effect, because it covers both rmap_add and rmap_remove, and
I don't have a good idea for optimizing rmap_remove without changing the array
structure or adding much overhead (e.g. sorting the array, or building a
tree-like structure to replace the array list).  However, it already shows some
benefit from those changes, so I am posting them.
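For readers unfamiliar with the "array list" shape mentioned above, the
following userspace sketch shows a chain of fixed-size descriptors, each with a
per-descriptor counter.  It is an illustration under assumptions, not the
kernel's implementation: struct desc, DESC_SLOTS, and the list_*() helpers are
made-up names, and new descriptors are pushed at the head for brevity.

```c
#include <stdlib.h>

#define DESC_SLOTS 15	/* hypothetical; mirrors the "3->15 slots" change */

struct desc {
	unsigned int count;	/* used slots in this descriptor */
	struct desc *next;
	void *slots[DESC_SLOTS];
};

/* Add an entry; returns 0 on success, -1 if allocation fails. */
static int list_add(struct desc **head, void *entry)
{
	struct desc *d = *head;

	/* Start a new descriptor when there is none or the head is full. */
	if (d == NULL || d->count == DESC_SLOTS) {
		d = calloc(1, sizeof(*d));
		if (d == NULL)
			return -1;
		d->next = *head;
		*head = d;
	}
	d->slots[d->count++] = entry;
	return 0;
}

/* Count entries by summing the counters; no slot is inspected. */
static unsigned int list_count(const struct desc *d)
{
	unsigned int n = 0;

	for (; d != NULL; d = d->next)
		n += d->count;
	return n;
}

/* Release every descriptor in the chain. */
static void list_free(struct desc *d)
{
	while (d != NULL) {
		struct desc *next = d->next;

		free(d);
		d = next;
	}
}
```

With larger descriptors there are fewer links to follow for the same number of
entries, and with the counter the free slot is found by index rather than by
scanning for an empty one.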

Applying patches 7-8 brings a total perf boost of 38% when I fork 500 children
with the test I used.  I didn't run a perf test on patch 9.  More in the
commit logs.

Please review, thanks.

[1] https://github.com/xzpeter/clibs/commit/825436f825453de2ea5aaee4bdb1c92281efe5b3

Peter Xu (7):
  KVM: Allow to have arch-specific per-vm debugfs files
  KVM: X86: Introduce pte_list_count() helper
  KVM: X86: Introduce kvm_mmu_slot_lpages() helpers
  KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file
  KVM: X86: MMU: Tune PTE_LIST_EXT to be bigger
  KVM: X86: Optimize pte_list_desc with per-array counter
  KVM: X86: Optimize zapping rmap

 arch/x86/kvm/mmu/mmu.c          |  98 +++++++++++++++++-------
 arch/x86/kvm/mmu/mmu_internal.h |   1 +
 arch/x86/kvm/x86.c              | 130 +++++++++++++++++++++++++++++++-
 include/linux/kvm_host.h        |   1 +
 virt/kvm/kvm_main.c             |  20 ++++-
 5 files changed, 221 insertions(+), 29 deletions(-)

-- 
2.31.1


