* [PATCH 0/6] Improve gfn-to-memslot performance during page faults
@ 2021-07-30 22:37 David Matlack
From: David Matlack @ 2021-07-30 22:37 UTC
  To: kvm
  Cc: Ben Gardon, Joerg Roedel, Jim Mattson, Wanpeng Li,
	Vitaly Kuznetsov, Sean Christopherson, Paolo Bonzini,
	Junaid Shahid, Andrew Jones, David Matlack

This series improves the performance of gfn-to-memslot lookups during
page faults. Ben Gardon originally identified this performance gap and
sufficiently addressed it in Google's kernel by reading the memslot once
at the beginning of the page fault and passing around the pointer.

This series takes an alternative approach by introducing a per-vCPU
cache of the least recently used memslot index. This avoids needing to
binary search the existing memslots multiple times during a page fault.
Unlike passing the pointer around, the LRU cache has the additional
benefit of speeding up gfn-to-memslot lookups *across* faults and
during spte prefetching, where the gfn changes.
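
As a rough sketch of the idea (not the literal patch; only the
vcpu->lru_slot_index name is taken from the patch titles, everything
else below is assumed), the lookup helper can check the cached index
first and only fall back to the binary search on a miss:

#include <linux/kvm_host.h>

/*
 * Hedged sketch of the per-vCPU cached lookup: try the slot this vCPU
 * used last before falling back to the existing binary search.
 */
static struct kvm_memory_slot *
vcpu_gfn_to_memslot_sketch(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	struct kvm_memslots *slots = kvm_vcpu_memslots(vcpu);
	int idx = vcpu->lru_slot_index;
	struct kvm_memory_slot *slot;

	if (idx < slots->used_slots) {
		slot = &slots->memslots[idx];
		/* Fast path: the cached slot still contains this gfn. */
		if (gfn >= slot->base_gfn &&
		    gfn < slot->base_gfn + slot->npages)
			return slot;
	}

	/* Slow path: binary search, then remember the result. */
	slot = search_memslots(slots, gfn);
	if (slot)
		vcpu->lru_slot_index = slot - slots->memslots;
	return slot;
}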

This difference can be seen clearly when looking at the performance of
fast_page_fault when multiple slots are in play:

Metric                        | Baseline     | Pass*    | LRU**
----------------------------- | ------------ | -------- | ----------
Iteration 2 dirty memory time | 2.8s         | 1.6s     | 0.30s

* Pass: Lookup the memslot once per fault and pass it around.
** LRU: Cache the LRU slot per vCPU (i.e. this series).

(Collected via ./dirty_log_perf_test -v64 -x64)

I also plan to send a follow-up series with a version of Ben's patches
that passes the memslot pointer through the page fault handling code
rather than looking it up multiple times. Even when applied on top of
the LRU series it yields some additional performance improvement by
avoiding a few extra memory accesses (mainly kvm->memslots[as_id] and
slots->used_slots). But it will be a judgement call whether it's worth
the code churn and complexity.
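
For comparison, the pointer-passing approach looks roughly like the
sketch below (the shape is assumed, not taken from Ben's patches;
kvm_vcpu_gfn_to_memslot() and mark_page_dirty_in_slot() are existing
helpers, the surrounding function is made up for illustration):

/*
 * Assumed shape of the pointer-passing alternative: resolve the slot
 * once at the top of the fault and hand it to the helpers that need
 * it, so they never re-read kvm->memslots[as_id] or slots->used_slots.
 */
static int handle_gfn_write_fault(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);

	if (!slot)
		return -EFAULT;

	/* Reuse 'slot' instead of doing another gfn-to-memslot lookup. */
	mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
	return 0;
}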

Here is a breakdown of this series:

Patches 1-2 introduce a per-vCPU cache of the least recently used
memslot index.

Patches 3-5 convert existing gfn-to-memslot lookups to use
kvm_vcpu_gfn_to_memslot so that they can leverage the new LRU cache.
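
The conversion itself is mechanical; roughly (both helpers already
exist in KVM, the wrapper below is only for illustration):

/*
 * Illustrative only: the kind of call-site change patches 3-5 make.
 * gfn_to_memslot() and kvm_vcpu_gfn_to_memslot() are existing KVM
 * helpers; this wrapper exists only for the sake of the example.
 */
static struct kvm_memory_slot *
fault_path_slot_lookup(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	/* Before: VM-wide lookup, binary search on every call. */
	/* return gfn_to_memslot(vcpu->kvm, gfn); */

	/* After: vCPU-scoped lookup that can hit the per-vCPU cache. */
	return kvm_vcpu_gfn_to_memslot(vcpu, gfn);
}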

Patch 6 adds support for multiple slots to dirty_log_perf_test, which
is used to generate the performance data in this series.

David Matlack (6):
  KVM: Cache the least recently used slot index per vCPU
  KVM: Avoid VM-wide lru_slot lookup in kvm_vcpu_gfn_to_memslot
  KVM: x86/mmu: Speed up dirty logging in
    tdp_mmu_map_handle_target_level
  KVM: x86/mmu: Leverage vcpu->lru_slot_index for rmap_add and
    rmap_recycle
  KVM: x86/mmu: Rename __gfn_to_rmap to gfn_to_rmap
  KVM: selftests: Support multiple slots in dirty_log_perf_test

 arch/x86/kvm/mmu/mmu.c                        | 54 +++++++------
 arch/x86/kvm/mmu/tdp_mmu.c                    | 15 +++-
 include/linux/kvm_host.h                      | 73 +++++++++++++-----
 .../selftests/kvm/access_tracking_perf_test.c |  2 +-
 .../selftests/kvm/demand_paging_test.c        |  2 +-
 .../selftests/kvm/dirty_log_perf_test.c       | 76 ++++++++++++++++---
 .../selftests/kvm/include/perf_test_util.h    |  2 +-
 .../selftests/kvm/lib/perf_test_util.c        | 20 +++--
 .../kvm/memslot_modification_stress_test.c    |  2 +-
 virt/kvm/kvm_main.c                           | 21 ++++-
 10 files changed, 198 insertions(+), 69 deletions(-)

-- 
2.32.0.554.ge1b32706d8-goog


Thread overview:
2021-07-30 22:37 [PATCH 0/6] Improve gfn-to-memslot performance during page faults David Matlack
2021-07-30 22:37 ` [PATCH 1/6] KVM: Cache the least recently used slot index per vCPU David Matlack
2021-08-02 14:36   ` Paolo Bonzini
2021-08-02 16:27     ` David Matlack
2021-08-02 16:38       ` Sean Christopherson
2021-07-30 22:37 ` [PATCH 2/6] KVM: Avoid VM-wide lru_slot lookup in kvm_vcpu_gfn_to_memslot David Matlack
2021-07-30 22:37 ` [PATCH 3/6] KVM: x86/mmu: Speed up dirty logging in tdp_mmu_map_handle_target_level David Matlack
2021-08-02 14:58   ` Paolo Bonzini
2021-08-02 16:31     ` David Matlack
2021-07-30 22:37 ` [PATCH 4/6] KVM: x86/mmu: Leverage vcpu->lru_slot_index for rmap_add and rmap_recycle David Matlack
2021-08-02 14:58   ` Paolo Bonzini
2021-07-30 22:37 ` [PATCH 5/6] KVM: x86/mmu: Rename __gfn_to_rmap to gfn_to_rmap David Matlack
2021-07-31  9:41   ` kernel test robot
2021-07-31 12:22   ` kernel test robot
2021-08-02 14:59   ` Paolo Bonzini
2021-07-30 22:37 ` [PATCH 6/6] KVM: selftests: Support multiple slots in dirty_log_perf_test David Matlack
