linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mingwei Zhang <mizhang@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>, kvm <kvm@vger.kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	David Hildenbrand <david@redhat.com>,
	David Matlack <dmatlack@google.com>,
	Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH v4 18/30] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()
Date: Sun, 13 Mar 2022 11:40:44 -0700	[thread overview]
Message-ID: <CAL715WJc3QdFe4gkbefW5zHPaYZfErG9vQmOLsbXz=kbaB-6uw@mail.gmail.com> (raw)
In-Reply-To: <20220303193842.370645-19-pbonzini@redhat.com>

On Thu, Mar 3, 2022 at 11:39 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Zap only leaf SPTEs in the TDP MMU's zap_gfn_range(), and rename various
> functions accordingly.  When removing mappings for functional correctness
> (except for the stupid VFIO GPU passthrough memslots bug), zapping the
> leaf SPTEs is sufficient as the paging structures themselves do not point
> at guest memory and do not directly impact the final translation (in the
> TDP MMU).
>
> Note, this aligns the TDP MMU with the legacy/full MMU, which zaps only
> the rmaps, a.k.a. leaf SPTEs, in kvm_zap_gfn_range() and
> kvm_unmap_gfn_range().
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Ben Gardon <bgardon@google.com>
> Message-Id: <20220226001546.360188-18-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c     |  4 ++--
>  arch/x86/kvm/mmu/tdp_mmu.c | 41 ++++++++++----------------------------
>  arch/x86/kvm/mmu/tdp_mmu.h |  8 +-------
>  3 files changed, 14 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 8408d7db8d2a..febdcaaa7b94 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5834,8 +5834,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>
>         if (is_tdp_mmu_enabled(kvm)) {
>                 for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
> -                       flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
> -                                                         gfn_end, flush);
> +                       flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start,
> +                                                     gfn_end, true, flush);
>         }
>
>         if (flush)
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index f3939ce4a115..c71debdbc732 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -834,10 +834,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>  }
>
>  /*
> - * Tears down the mappings for the range of gfns, [start, end), and frees the
> - * non-root pages mapping GFNs strictly within that range. Returns true if
> - * SPTEs have been cleared and a TLB flush is needed before releasing the
> - * MMU lock.
> + * Zap leafs SPTEs for the range of gfns, [start, end). Returns true if SPTEs
> + * have been cleared and a TLB flush is needed before releasing the MMU lock.
>   *
>   * If can_yield is true, will release the MMU lock and reschedule if the
>   * scheduler needs the CPU or there is contention on the MMU lock. If this
> @@ -845,42 +843,25 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>   * the caller must ensure it does not supply too large a GFN range, or the
>   * operation can cause a soft lockup.
>   */
> -static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> -                         gfn_t start, gfn_t end, bool can_yield, bool flush)
> +static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> +                             gfn_t start, gfn_t end, bool can_yield, bool flush)
>  {
> -       bool zap_all = (start == 0 && end >= tdp_mmu_max_gfn_host());
>         struct tdp_iter iter;
>
> -       /*
> -        * No need to try to step down in the iterator when zapping all SPTEs,
> -        * zapping the top-level non-leaf SPTEs will recurse on their children.
> -        */
> -       int min_level = zap_all ? root->role.level : PG_LEVEL_4K;
> -
>         end = min(end, tdp_mmu_max_gfn_host());
>
>         lockdep_assert_held_write(&kvm->mmu_lock);
>
>         rcu_read_lock();
>
> -       for_each_tdp_pte_min_level(iter, root, min_level, start, end) {
> +       for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
>                 if (can_yield &&
>                     tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) {
>                         flush = false;
>                         continue;
>                 }
>
> -               if (!is_shadow_present_pte(iter.old_spte))
> -                       continue;
> -
> -               /*
> -                * If this is a non-last-level SPTE that covers a larger range
> -                * than should be zapped, continue, and zap the mappings at a
> -                * lower level, except when zapping all SPTEs.
> -                */
> -               if (!zap_all &&
> -                   (iter.gfn < start ||
> -                    iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) &&
> +               if (!is_shadow_present_pte(iter.old_spte) ||
>                     !is_last_spte(iter.old_spte, iter.level))
>                         continue;
>
> @@ -898,13 +879,13 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>   * SPTEs have been cleared and a TLB flush is needed before releasing the
>   * MMU lock.
>   */
> -bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
> -                                gfn_t end, bool can_yield, bool flush)
> +bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
> +                          bool can_yield, bool flush)
>  {
>         struct kvm_mmu_page *root;
>
>         for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
> -               flush = zap_gfn_range(kvm, root, start, end, can_yield, flush);
> +               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, false);

hmm, I think we might have to be very careful here. If we only zap
leafs, then there could be side effects. For instance, the code in
disallowed_hugepage_adjust() may not work as intended. If you check
the following condition in arch/x86/kvm/mmu/mmu.c:2918

if (cur_level > PG_LEVEL_4K &&
    cur_level == fault->goal_level &&
    is_shadow_present_pte(spte) &&
    !is_large_pte(spte)) {

If we previously use 4K mappings in this range due to various reasons
(dirty logging etc), then afterwards, we zap the range. Then the guest
touches a 4K and now we should map the range with whatever the maximum
level we can for the guest.

However, if we just zap only the leafs, then when the code comes to
the above location, is_shadow_present_pte(spte) will return true,
since the spte is a non-leaf (say a regular PMD entry). The whole if
statement will be true, then we never allow remapping guest memory
with huge pages.

>
>         return flush;
>  }
> @@ -1202,8 +1183,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
>                                  bool flush)
>  {
> -       return __kvm_tdp_mmu_zap_gfn_range(kvm, range->slot->as_id, range->start,
> -                                          range->end, range->may_block, flush);
> +       return kvm_tdp_mmu_zap_leafs(kvm, range->slot->as_id, range->start,
> +                                    range->end, range->may_block, flush);
>  }
>
>  typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 5e5ef2576c81..54bc8118c40a 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -15,14 +15,8 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
>  void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>                           bool shared);
>
> -bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
> +bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start,
>                                  gfn_t end, bool can_yield, bool flush);
> -static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
> -                                            gfn_t start, gfn_t end, bool flush)
> -{
> -       return __kvm_tdp_mmu_zap_gfn_range(kvm, as_id, start, end, true, flush);
> -}
> -
>  bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
>  void kvm_tdp_mmu_zap_all(struct kvm *kvm);
>  void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
> --
> 2.31.1
>
>

  parent reply	other threads:[~2022-03-13 18:41 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-03 19:38 [PATCH v4 00/30] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 01/30] KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 02/30] KVM: x86/mmu: Fix wrong/misleading comments in TDP MMU fast zap Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 03/30] KVM: x86/mmu: Formalize TDP MMU's (unintended?) deferred TLB flush logic Paolo Bonzini
2022-03-03 23:39   ` Mingwei Zhang
2022-03-03 19:38 ` [PATCH v4 04/30] KVM: x86/mmu: Document that zapping invalidated roots doesn't need to flush Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 05/30] KVM: x86/mmu: Require mmu_lock be held for write in unyielding root iter Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 06/30] KVM: x86/mmu: only perform eager page splitting on valid roots Paolo Bonzini
2022-03-03 20:03   ` Sean Christopherson
2022-03-03 19:38 ` [PATCH v4 07/30] KVM: x86/mmu: do not allow readers to acquire references to invalid roots Paolo Bonzini
2022-03-03 20:12   ` Sean Christopherson
2022-03-03 19:38 ` [PATCH v4 08/30] KVM: x86/mmu: Check for !leaf=>leaf, not PFN change, in TDP MMU SP removal Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 09/30] KVM: x86/mmu: Batch TLB flushes from TDP MMU for MMU notifier change_spte Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 10/30] KVM: x86/mmu: Drop RCU after processing each root in MMU notifier hooks Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 11/30] KVM: x86/mmu: Add helpers to read/write TDP MMU SPTEs and document RCU Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 12/30] KVM: x86/mmu: WARN if old _or_ new SPTE is REMOVED in non-atomic path Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 13/30] KVM: x86/mmu: Refactor low-level TDP MMU set SPTE helper to take raw values Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 14/30] KVM: x86/mmu: Zap only the target TDP MMU shadow page in NX recovery Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 15/30] KVM: x86/mmu: Skip remote TLB flush when zapping all of TDP MMU Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 16/30] KVM: x86/mmu: Add dedicated helper to zap TDP MMU root shadow page Paolo Bonzini
2022-03-04  0:07   ` Mingwei Zhang
2022-03-03 19:38 ` [PATCH v4 17/30] KVM: x86/mmu: Require mmu_lock be held for write to zap TDP MMU range Paolo Bonzini
2022-03-04  0:14   ` Mingwei Zhang
2022-03-03 19:38 ` [PATCH v4 18/30] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range() Paolo Bonzini
2022-03-04  1:16   ` Mingwei Zhang
2022-03-04 16:11     ` Sean Christopherson
2022-03-04 18:00       ` Mingwei Zhang
2022-03-04 18:42         ` Sean Christopherson
2022-03-11 15:09   ` Vitaly Kuznetsov
2022-03-13 18:40   ` Mingwei Zhang [this message]
2022-03-25 15:13     ` Sean Christopherson
2022-03-26 18:10       ` Mingwei Zhang
2022-03-28 15:06         ` Sean Christopherson
2022-03-03 19:38 ` [PATCH v4 19/30] KVM: x86/mmu: Do remote TLB flush before dropping RCU in TDP MMU resched Paolo Bonzini
2022-03-04  1:19   ` Mingwei Zhang
2022-03-03 19:38 ` [PATCH v4 20/30] KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow pages Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 21/30] KVM: x86/mmu: Zap invalidated roots via asynchronous worker Paolo Bonzini
2022-03-03 20:54   ` Sean Christopherson
2022-03-03 21:06     ` Sean Christopherson
2022-03-03 21:20   ` Sean Christopherson
2022-03-03 21:32     ` Sean Christopherson
2022-03-04  6:48       ` Paolo Bonzini
2022-03-04 16:02         ` Sean Christopherson
2022-03-04 18:11           ` Paolo Bonzini
2022-03-05  0:34             ` Sean Christopherson
2022-03-05 19:53               ` Paolo Bonzini
2022-03-08 21:29                 ` Sean Christopherson
2022-03-11 17:50                   ` Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 22/30] KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 23/30] KVM: x86/mmu: Zap roots in two passes to avoid inducing RCU stalls Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 24/30] KVM: x86/mmu: Zap defunct roots via asynchronous worker Paolo Bonzini
2022-03-03 22:08   ` Sean Christopherson
2022-03-03 19:38 ` [PATCH v4 25/30] KVM: x86/mmu: Check for a REMOVED leaf SPTE before making the SPTE Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 26/30] KVM: x86/mmu: WARN on any attempt to atomically update REMOVED SPTE Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 27/30] KVM: selftests: Move raw KVM_SET_USER_MEMORY_REGION helper to utils Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 28/30] KVM: selftests: Split out helper to allocate guest mem via memfd Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 29/30] KVM: selftests: Define cpu_relax() helpers for s390 and x86 Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 30/30] KVM: selftests: Add test to populate a VM with the max possible guest mem Paolo Bonzini
2022-03-08 14:47   ` Paolo Bonzini
2022-03-08 15:36     ` Christian Borntraeger
2022-03-08 21:09     ` Sean Christopherson
2022-03-08 17:25 ` [PATCH v4 00/30] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAL715WJc3QdFe4gkbefW5zHPaYZfErG9vQmOLsbXz=kbaB-6uw@mail.gmail.com' \
    --to=mizhang@google.com \
    --cc=bgardon@google.com \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).