linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/28] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing
@ 2022-02-26  0:15 Sean Christopherson
  2022-02-26  0:15 ` [PATCH v3 01/28] KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots Sean Christopherson
                   ` (27 more replies)
  0 siblings, 28 replies; 79+ messages in thread
From: Sean Christopherson @ 2022-02-26  0:15 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger, Janosch Frank, Claudio Imbrenda
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, David Hildenbrand, kvm, linux-kernel,
	David Matlack, Ben Gardon, Mingwei Zhang

Overhaul TDP MMU's handling of zapping and TLB flushing to reduce the
number of TLB flushes, fix soft lockups and RCU stalls, avoid blocking
vCPUs for long durations while zapping paging structure, and to clean up
the zapping code.

The largest cleanup is to separate the flows for zapping roots (zap
_everything_), zapping leaf SPTEs (zap guest mappings for whatever reason),
and zapping a specific SP (NX recovery).  They're currently smushed into a
single zap_gfn_range(), which was a good idea at the time, but became a
mess when trying to handle the different rules, e.g. TLB flushes aren't
needed when zapping a root because KVM can safely zap a root if and only
if it's unreachable.

To solve the soft lockups, stalls, and vCPU performance issues:

 - Defer remote TLB flushes to the caller when zapping TDP MMU shadow
   pages by relying on RCU to ensure the paging structure isn't freed
   until all vCPUs have exited the guest.

 - Allowing yielding when zapping TDP MMU roots in response to the root's
   last reference being put.  This requires a bit of trickery to ensure
   the root is reachable via mmu_notifier, but it's not too gross.

 - Zap roots in two passes to avoid holding RCU for potential hundreds of
   seconds when zapping guest with terabytes of memory that is backed
   entirely by 4kb SPTEs.

 - Zap defunct roots asynchronously via the common work_queue so that a
   vCPU doesn't get stuck doing the work if the vCPU happens to drop the
   last reference to a root.

The selftest at the end allows populating a guest with the max amount of
memory allowed by the underlying architecture.  The most I've tested is
~64tb (MAXPHYADDR=46) as I don't have easy access to a system with
MAXPHYADDR=52.  The selftest compiles on arm64 and s390x, but otherwise
hasn't been tested outside of x86-64.  It will hopefully do something
useful as is, but there's a non-zero chance it won't get past init with
a high max memory.  Running on x86 without the TDP MMU is comically slow.

v3:
  - Drop patches that were applied.
  - Rebase to latest kvm/queue.
  - Collect a review. [David]
  - Use helper instead of goto to zap roots in two passes. [David]
  - Add patches to disallow REMOVED "old" SPTE when atomically
    setting SPTE.

v2:
  - https://lore.kernel.org/all/20211223222318.1039223-1-seanjc@google.com
  - Drop patches that were applied.
  - Collect reviews for patches that weren't modified. [Ben]
  - Abandon the idea of taking invalid roots off the list of roots.
  - Add a patch to fix misleading/wrong comments with respect to KVM's
    responsibilities in the "fast zap" flow, specifically that all SPTEs
    must be dropped before the zap completes.
  - Rework yielding in kvm_tdp_mmu_put_root() to keep the root visibile
    while yielding.
  - Add patch to zap roots in two passes. [Mingwei, David]
  - Add a patch to asynchronously zap defunct roots.
  - Add the selftest.

v1: https://lore.kernel.org/all/20211120045046.3940942-1-seanjc@google.com


Sean Christopherson (28):
  KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots
  KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP
    MMU
  KVM: x86/mmu: Fix wrong/misleading comments in TDP MMU fast zap
  KVM: x86/mmu: Formalize TDP MMU's (unintended?) deferred TLB flush
    logic
  KVM: x86/mmu: Document that zapping invalidated roots doesn't need to
    flush
  KVM: x86/mmu: Require mmu_lock be held for write in unyielding root
    iter
  KVM: x86/mmu: Check for !leaf=>leaf, not PFN change, in TDP MMU SP
    removal
  KVM: x86/mmu: Batch TLB flushes from TDP MMU for MMU notifier
    change_spte
  KVM: x86/mmu: Drop RCU after processing each root in MMU notifier
    hooks
  KVM: x86/mmu: Add helpers to read/write TDP MMU SPTEs and document RCU
  KVM: x86/mmu: WARN if old _or_ new SPTE is REMOVED in non-atomic path
  KVM: x86/mmu: Refactor low-level TDP MMU set SPTE helper to take raw
    vals
  KVM: x86/mmu: Zap only the target TDP MMU shadow page in NX recovery
  KVM: x86/mmu: Skip remote TLB flush when zapping all of TDP MMU
  KVM: x86/mmu: Add dedicated helper to zap TDP MMU root shadow page
  KVM: x86/mmu: Require mmu_lock be held for write to zap TDP MMU range
  KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()
  KVM: x86/mmu: Do remote TLB flush before dropping RCU in TDP MMU
    resched
  KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow
    pages
  KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU
    root
  KVM: x86/mmu: Zap roots in two passes to avoid inducing RCU stalls
  KVM: x86/mmu: Zap defunct roots via asynchronous worker
  KVM: x86/mmu: Check for a REMOVED leaf SPTE before making the SPTE
  KVM: x86/mmu: WARN on any attempt to atomically update REMOVED SPTE
  KVM: selftests: Move raw KVM_SET_USER_MEMORY_REGION helper to utils
  KVM: selftests: Split out helper to allocate guest mem via memfd
  KVM: selftests: Define cpu_relax() helpers for s390 and x86
  KVM: selftests: Add test to populate a VM with the max possible guest
    mem

 arch/x86/kvm/mmu/mmu.c                        |  42 +-
 arch/x86/kvm/mmu/mmu_internal.h               |  15 +-
 arch/x86/kvm/mmu/tdp_iter.c                   |   6 +-
 arch/x86/kvm/mmu/tdp_iter.h                   |  15 +-
 arch/x86/kvm/mmu/tdp_mmu.c                    | 595 ++++++++++++------
 arch/x86/kvm/mmu/tdp_mmu.h                    |  26 +-
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 .../selftests/kvm/include/kvm_util_base.h     |   5 +
 .../selftests/kvm/include/s390x/processor.h   |   8 +
 .../selftests/kvm/include/x86_64/processor.h  |   5 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  66 +-
 .../selftests/kvm/max_guest_memory_test.c     | 292 +++++++++
 .../selftests/kvm/set_memory_region_test.c    |  35 +-
 14 files changed, 832 insertions(+), 282 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/max_guest_memory_test.c


base-commit: 625e7ef7da1a4addd8db41c2504fe8a25b93acd5
-- 
2.35.1.574.g5d30c73bfb-goog


^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2022-03-03 23:07 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-26  0:15 [PATCH v3 00/28] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 01/28] KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots Sean Christopherson
2022-03-02 19:08   ` Mingwei Zhang
2022-03-02 19:51     ` Sean Christopherson
2022-03-03  0:57       ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 02/28] KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU Sean Christopherson
2022-03-02 19:50   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 03/28] KVM: x86/mmu: Fix wrong/misleading comments in TDP MMU fast zap Sean Christopherson
2022-02-28 23:15   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 04/28] KVM: x86/mmu: Formalize TDP MMU's (unintended?) deferred TLB flush logic Sean Christopherson
2022-03-02 23:59   ` Mingwei Zhang
2022-03-03  0:12     ` Sean Christopherson
2022-03-03  1:20       ` Mingwei Zhang
2022-03-03  1:41         ` Sean Christopherson
2022-03-03  4:50           ` Mingwei Zhang
2022-03-03 16:45             ` Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 05/28] KVM: x86/mmu: Document that zapping invalidated roots doesn't need to flush Sean Christopherson
2022-02-28 23:17   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 06/28] KVM: x86/mmu: Require mmu_lock be held for write in unyielding root iter Sean Christopherson
2022-02-28 23:26   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 07/28] KVM: x86/mmu: Check for !leaf=>leaf, not PFN change, in TDP MMU SP removal Sean Christopherson
2022-03-01  0:11   ` Ben Gardon
2022-03-03 18:02   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 08/28] KVM: x86/mmu: Batch TLB flushes from TDP MMU for MMU notifier change_spte Sean Christopherson
2022-03-03 18:08   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 09/28] KVM: x86/mmu: Drop RCU after processing each root in MMU notifier hooks Sean Christopherson
2022-03-03 18:24   ` Mingwei Zhang
2022-03-03 18:32   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 10/28] KVM: x86/mmu: Add helpers to read/write TDP MMU SPTEs and document RCU Sean Christopherson
2022-03-03 18:34   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 11/28] KVM: x86/mmu: WARN if old _or_ new SPTE is REMOVED in non-atomic path Sean Christopherson
2022-03-03 18:37   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 12/28] KVM: x86/mmu: Refactor low-level TDP MMU set SPTE helper to take raw vals Sean Christopherson
2022-03-03 18:47   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 13/28] KVM: x86/mmu: Zap only the target TDP MMU shadow page in NX recovery Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 14/28] KVM: x86/mmu: Skip remote TLB flush when zapping all of TDP MMU Sean Christopherson
2022-03-01  0:19   ` Ben Gardon
2022-03-03 18:50   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 15/28] KVM: x86/mmu: Add dedicated helper to zap TDP MMU root shadow page Sean Christopherson
2022-03-01  0:32   ` Ben Gardon
2022-03-03 21:19   ` Mingwei Zhang
2022-03-03 21:24     ` Mingwei Zhang
2022-03-03 23:06       ` Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 16/28] KVM: x86/mmu: Require mmu_lock be held for write to zap TDP MMU range Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 17/28] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range() Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 18/28] KVM: x86/mmu: Do remote TLB flush before dropping RCU in TDP MMU resched Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 19/28] KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow pages Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 20/28] KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root Sean Christopherson
2022-03-01 18:21   ` Paolo Bonzini
2022-03-01 19:43     ` Sean Christopherson
2022-03-01 20:12       ` Paolo Bonzini
2022-03-02  2:13         ` Sean Christopherson
2022-03-02 14:54           ` Paolo Bonzini
2022-03-02 17:43             ` Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 21/28] KVM: x86/mmu: Zap roots in two passes to avoid inducing RCU stalls Sean Christopherson
2022-03-01  0:43   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 22/28] KVM: x86/mmu: Zap defunct roots via asynchronous worker Sean Christopherson
2022-03-01 17:57   ` Ben Gardon
2022-03-02 17:25   ` Paolo Bonzini
2022-03-02 17:35     ` Sean Christopherson
2022-03-02 18:33       ` David Matlack
2022-03-02 18:36         ` Paolo Bonzini
2022-03-02 18:01     ` Sean Christopherson
2022-03-02 18:20       ` Paolo Bonzini
2022-03-02 19:33         ` Sean Christopherson
2022-03-02 20:14           ` Paolo Bonzini
2022-03-02 20:47             ` Sean Christopherson
2022-03-02 21:22               ` Paolo Bonzini
2022-03-02 22:25                 ` Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 23/28] KVM: x86/mmu: Check for a REMOVED leaf SPTE before making the SPTE Sean Christopherson
2022-03-01 18:06   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 24/28] KVM: x86/mmu: WARN on any attempt to atomically update REMOVED SPTE Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 25/28] KVM: selftests: Move raw KVM_SET_USER_MEMORY_REGION helper to utils Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 26/28] KVM: selftests: Split out helper to allocate guest mem via memfd Sean Christopherson
2022-02-28 23:36   ` David Woodhouse
2022-03-02 18:36     ` Paolo Bonzini
2022-03-02 21:55       ` David Woodhouse
2022-02-26  0:15 ` [PATCH v3 27/28] KVM: selftests: Define cpu_relax() helpers for s390 and x86 Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 28/28] KVM: selftests: Add test to populate a VM with the max possible guest mem Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).