* [RFC PATCH 0/6] Pass memslot around during page fault handling
@ 2021-08-13 20:34 David Matlack
From: David Matlack @ 2021-08-13 20:34 UTC
  To: Paolo Bonzini
  Cc: kvm, Ben Gardon, Joerg Roedel, Jim Mattson, Wanpeng Li,
	Vitaly Kuznetsov, Sean Christopherson, David Matlack

This series avoids kvm_vcpu_gfn_to_memslot() calls during page fault
handling by passing around the memslot in struct kvm_page_fault. This
idea came from Ben Gardon, who authored a similar series in Google's
kernel.

This series is an RFC because kvm_vcpu_gfn_to_memslot() calls are
actually quite cheap after commit fe22ed827c5b ("KVM: Cache the last
used slot index per vCPU") since we always hit the cache. However,
profiling shows there is still some time (1-2%) spent in
kvm_vcpu_gfn_to_memslot() and that the hot instructions are the memory loads
for kvm->memslots[as_id] and slots->used_slots. This series eliminates
this remaining overhead but at the cost of a bit of code churn.
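To make the remaining cost concrete, here is a minimal sketch of a per-vCPU last-used-slot cache in the spirit of commit fe22ed827c5b. The structures and the function name vcpu_gfn_to_memslot() are hypothetical simplifications, not the kernel's code. The point it illustrates is that even a cache hit must load kvm->memslots[as_id] and slots->used_slots before the gfn can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for KVM's memslot structures; the real
 * struct kvm_memory_slot and struct kvm_memslots carry many more fields.
 */
struct memslot {
	uint64_t base_gfn;
	uint64_t npages;
};

struct memslots {
	int used_slots;
	struct memslot slots[8];
};

struct kvm {
	struct memslots *memslots[2];	/* one set per address space */
};

struct vcpu {
	struct kvm *kvm;
	int last_used_slot;		/* per-vCPU index of the last hit */
};

static int slot_contains(const struct memslot *slot, uint64_t gfn)
{
	return gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages;
}

static struct memslot *vcpu_gfn_to_memslot(struct vcpu *vcpu, int as_id,
					   uint64_t gfn)
{
	struct memslots *slots;
	int idx = vcpu->last_used_slot;
	int i;

	assert(as_id == 0 || as_id == 1);

	/*
	 * Even on a cache hit, the kvm->memslots[as_id] load and the
	 * slots->used_slots load below complete before gfn is checked.
	 */
	slots = vcpu->kvm->memslots[as_id];
	if (idx < slots->used_slots && slot_contains(&slots->slots[idx], gfn))
		return &slots->slots[idx];

	/* Miss: search all slots (a linear scan here, for brevity). */
	for (i = 0; i < slots->used_slots; i++) {
		if (slot_contains(&slots->slots[i], gfn)) {
			vcpu->last_used_slot = i;
			return &slots->slots[i];
		}
	}
	return NULL;
}
```

Passing the already-resolved slot to later callers skips both loads entirely, which is what the rest of the series does.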

Design
------

We can avoid the cost of kvm_vcpu_gfn_to_memslot() by looking up the
slot once and passing it around. In fact this is quite easy to do now
that KVM passes around struct kvm_page_fault to most of the page fault
handling code.  We can store the slot there without changing most of the
call sites.
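A minimal sketch of the look-up-once pattern follows. It assumes a hypothetical cut-down struct page_fault standing in for struct kvm_page_fault and a gfn_to_memslot() helper standing in for kvm_vcpu_gfn_to_memslot(). The entry point resolves the slot once, and the helpers read fault->slot instead of repeating the lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified types; not the kernel's definitions. */
struct memslot {
	uint64_t base_gfn;
	uint64_t npages;
};

struct page_fault {
	uint64_t gfn;
	struct memslot *slot;	/* resolved once at the start of the fault */
};

static struct memslot guest_slot = { .base_gfn = 256, .npages = 1024 };
static int lookups;		/* counts calls to the lookup below */

/* Stand-in for kvm_vcpu_gfn_to_memslot(). */
static struct memslot *gfn_to_memslot(uint64_t gfn)
{
	lookups++;
	if (gfn >= guest_slot.base_gfn &&
	    gfn < guest_slot.base_gfn + guest_slot.npages)
		return &guest_slot;
	return NULL;
}

/* Helpers that would otherwise each repeat the lookup. */
static int fault_has_slot(const struct page_fault *fault)
{
	return fault->slot != NULL;
}

static uint64_t fault_slot_offset(const struct page_fault *fault)
{
	assert(fault->slot != NULL);
	return fault->gfn - fault->slot->base_gfn;
}

/* Fault entry point: one lookup, stored in the fault for later use. */
static int handle_fault(uint64_t gfn, uint64_t *offset)
{
	struct page_fault fault = { .gfn = gfn };

	fault.slot = gfn_to_memslot(gfn);
	if (!fault_has_slot(&fault))
		return -1;
	*offset = fault_slot_offset(&fault);
	return 0;
}
```

Because the helpers already receive the fault structure, adding the slot field changes no call sites, which matches the low churn described above.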

The one exception to this is mmu_set_spte, which does not take a
kvm_page_fault since it is also used during spte prefetching. There are
three memslot lookups under mmu_set_spte:

mmu_set_spte
  rmap_add
    kvm_vcpu_gfn_to_memslot
  rmap_recycle
    kvm_vcpu_gfn_to_memslot
  set_spte
    make_spte
      mmu_try_to_unsync_pages
        kvm_page_track_is_active
          kvm_vcpu_gfn_to_memslot

Avoiding these lookups requires plumbing the slot through all of the
above functions. I explored creating a synthetic kvm_page_fault for
prefetching so that kvm_page_fault could be passed to all of these
functions instead, but that resulted in even more code churn.
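Because mmu_set_spte() has no kvm_page_fault to read from, patches 5-6 plumb the slot through its callees as an explicit argument. The sketch below shows that shape. The struct fields, the signatures, and the prefetch() caller, which looks up the slot once for a run of contiguous gfns, are hypothetical simplifications and not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified slot; real KVM keeps rmap and page-track data
 * in per-slot arrays indexed by gfn. */
struct memslot {
	uint64_t base_gfn;
	uint64_t npages;
	int write_tracked;	/* stand-in for page-track state */
	int rmap_count;		/* stand-in for the rmap arrays */
};

static int lookups;		/* counts slot lookups */

static struct memslot *gfn_to_memslot(struct memslot *slots, int n,
				      uint64_t gfn)
{
	lookups++;
	for (int i = 0; i < n; i++)
		if (gfn >= slots[i].base_gfn &&
		    gfn < slots[i].base_gfn + slots[i].npages)
			return &slots[i];
	return NULL;
}

/* Callees take the slot instead of looking it up from gfn. */
static void rmap_add(struct memslot *slot, uint64_t gfn)
{
	(void)gfn;
	slot->rmap_count++;
}

static int page_track_is_active(const struct memslot *slot, uint64_t gfn)
{
	(void)gfn;
	return slot->write_tracked;
}

/* Returns 1 if the SPTE would be made writable. */
static int mmu_set_spte(struct memslot *slot, uint64_t gfn, int write)
{
	int writable;

	assert(slot != NULL);
	writable = write && !page_track_is_active(slot, gfn);
	rmap_add(slot, gfn);
	return writable;
}

/* Prefetch has no page fault: look up once, reuse for contiguous gfns. */
static int prefetch(struct memslot *slots, int n, uint64_t start, int count)
{
	struct memslot *slot = gfn_to_memslot(slots, n, start);
	int installed = 0;

	if (!slot)
		return 0;
	for (int i = 0; i < count; i++) {
		uint64_t gfn = start + i;

		if (gfn >= slot->base_gfn + slot->npages)
			break;	/* stop at the end of the slot */
		mmu_set_spte(slot, gfn, 0);
		installed++;
	}
	return installed;
}
```

The cost of this approach is visible in the signatures: every function on the path gains a slot parameter, which is the churn the cover letter weighs against the lookups saved.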

Patches
-------

Patches 1-2 are small cleanups related to the series.

Patches 3-4 pass the memslot through kvm_page_fault and use it where
kvm_page_fault is already accessible.

Patches 5-6 plumb the memslot down into the guts of mmu_set_spte to
avoid the remaining memslot lookups.

Performance
-----------

I measured the performance using dirty_log_perf_test and taking the
average "Populate memory time" over 10 runs. To help inform whether or
not different parts of this series are worth the code churn, I measured
the performance of patches 1-4 and 1-6 separately.

Test                            | tdp_mmu | kvm/queue (baseline) | Patches 1-4 | Patches 1-6
------------------------------- | ------- | -------------------- | ----------- | -----------
./dirty_log_perf_test -v64      | Y       | 5.22s                | 5.20s       | 5.20s
./dirty_log_perf_test -v64 -x64 | Y       | 5.23s                | 5.14s       | 5.14s
./dirty_log_perf_test -v64      | N       | 17.14s               | 16.39s      | 15.36s
./dirty_log_perf_test -v64 -x64 | N       | 17.17s               | 16.60s      | 15.31s

This series provides little to no performance improvement to the tdp_mmu
(at most about 2%) but improves legacy MMU page fault handling by about
10%, of which patches 1-4 alone account for roughly 3-4%.
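The relative improvements can be recomputed from the table above as (baseline - patched) / baseline. A small sketch that does this per row; the row labels are ad hoc and the times are copied from the table.

```c
#include <assert.h>
#include <stdio.h>

/* Average "Populate memory time" in seconds, copied from the table. */
struct result {
	const char *config;
	double baseline;	/* kvm/queue */
	double p1_4;		/* patches 1-4 applied */
	double p1_6;		/* patches 1-6 applied */
};

static const struct result results[] = {
	{ "tdp_mmu=Y -v64",      5.22,  5.20,  5.20 },
	{ "tdp_mmu=Y -v64 -x64", 5.23,  5.14,  5.14 },
	{ "tdp_mmu=N -v64",      17.14, 16.39, 15.36 },
	{ "tdp_mmu=N -v64 -x64", 17.17, 16.60, 15.31 },
};

/* Percentage reduction in populate time relative to the baseline. */
static double pct_faster(double base, double t)
{
	assert(base > 0.0);
	return (base - t) / base * 100.0;
}

static void print_results(void)
{
	size_t i;

	for (i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		printf("%-20s 1-4: %4.1f%%  1-6: %4.1f%%\n", results[i].config,
		       pct_faster(results[i].baseline, results[i].p1_4),
		       pct_faster(results[i].baseline, results[i].p1_6));
}
```

For the legacy MMU rows this gives about 10.4% and 10.8% for patches 1-6, versus about 4.4% and 3.3% for patches 1-4 alone.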

David Matlack (6):
  KVM: x86/mmu: Rename try_async_pf to kvm_faultin_pfn in comment
  KVM: x86/mmu: Fold rmap_recycle into rmap_add
  KVM: x86/mmu: Pass around the memslot in kvm_page_fault
  KVM: x86/mmu: Avoid memslot lookup in page_fault_handle_page_track
  KVM: x86/mmu: Avoid memslot lookup in rmap_add
  KVM: x86/mmu: Avoid memslot lookup in mmu_try_to_unsync_pages

 arch/x86/include/asm/kvm_page_track.h |   4 +-
 arch/x86/kvm/mmu.h                    |   5 +-
 arch/x86/kvm/mmu/mmu.c                | 110 +++++++++-----------------
 arch/x86/kvm/mmu/mmu_internal.h       |   3 +-
 arch/x86/kvm/mmu/page_track.c         |   6 +-
 arch/x86/kvm/mmu/paging_tmpl.h        |  18 ++++-
 arch/x86/kvm/mmu/spte.c               |  11 +--
 arch/x86/kvm/mmu/spte.h               |   9 ++-
 arch/x86/kvm/mmu/tdp_mmu.c            |  12 +--
 9 files changed, 80 insertions(+), 98 deletions(-)

-- 
2.33.0.rc1.237.g0d66db33f3-goog



Thread overview: 18+ messages
2021-08-13 20:34 [RFC PATCH 0/6] Pass memslot around during page fault handling David Matlack
2021-08-13 20:34 ` [RFC PATCH 1/6] KVM: x86/mmu: Rename try_async_pf to kvm_faultin_pfn in comment David Matlack
2021-08-13 20:35 ` [RFC PATCH 2/6] KVM: x86/mmu: Fold rmap_recycle into rmap_add David Matlack
2021-08-13 20:35 ` [RFC PATCH 3/6] KVM: x86/mmu: Pass the memslot around via struct kvm_page_fault David Matlack
2021-08-17 13:00   ` Paolo Bonzini
2021-08-17 16:13     ` David Matlack
2021-08-17 17:02       ` Paolo Bonzini
2021-08-19 16:37   ` Sean Christopherson
2021-08-20 22:54     ` David Matlack
2021-08-20 23:02       ` Sean Christopherson
2021-08-13 20:35 ` [RFC PATCH 4/6] KVM: x86/mmu: Avoid memslot lookup in page_fault_handle_page_track David Matlack
2021-08-13 20:35 ` [RFC PATCH 5/6] KVM: x86/mmu: Avoid memslot lookup in rmap_add David Matlack
2021-08-17 12:03   ` Paolo Bonzini
2021-08-19 16:15     ` David Matlack
2021-08-19 16:39       ` Sean Christopherson
2021-08-19 16:47         ` Paolo Bonzini
2021-08-13 20:35 ` [RFC PATCH 6/6] KVM: x86/mmu: Avoid memslot lookup in mmu_try_to_unsync_pages David Matlack
2021-08-17 11:12 ` [RFC PATCH 0/6] Pass memslot around during page fault handling Paolo Bonzini
