From: Ben Gardon <bgardon@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
David Hildenbrand <david@redhat.com>, kvm <kvm@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
David Matlack <dmatlack@google.com>,
Mingwei Zhang <mizhang@google.com>
Subject: Re: [PATCH v3 03/28] KVM: x86/mmu: Fix wrong/misleading comments in TDP MMU fast zap
Date: Mon, 28 Feb 2022 15:15:03 -0800
Message-ID: <CANgfPd9dFhYpQdVQQ9ic+yepPKCG3Vrek0tcYbP8rjzuZD_OLA@mail.gmail.com>
In-Reply-To: <20220226001546.360188-4-seanjc@google.com>

On Fri, Feb 25, 2022 at 4:16 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Fix misleading and arguably wrong comments in the TDP MMU's fast zap
> flow. The comments, and the fact that actually zapping invalid roots was
> added separately, strongly suggest that zapping invalid roots is an
> optimization and not required for correctness. That is a lie.
>
> KVM _must_ zap invalid roots before returning from kvm_mmu_zap_all_fast(),
> because when it's called from kvm_mmu_invalidate_zap_pages_in_memslot(),
> KVM is relying on it to fully remove all references to the memslot. Once
> the memslot is gone, KVM's mmu_notifier hooks will be unable to find the
> stale references as the hva=>gfn translation is done via the memslots.
> If KVM doesn't immediately zap SPTEs and userspace unmaps a range after
> deleting a memslot, KVM will fail to zap in response to the mmu_notifier
> due to not finding a memslot corresponding to the notifier's range, which
> leads to a variation of use-after-free.
>
> The other misleading comment (and code) explicitly states that roots
> without a reference should be skipped. While that's technically true,
> it's also extremely misleading as it should be impossible for KVM to
> encounter a defunct root on the list while holding mmu_lock for write.
> Opportunistically add a WARN to enforce that invariant.
>
> Fixes: b7cccd397f31 ("KVM: x86/mmu: Fast invalidation for TDP MMU")
> Fixes: 4c6654bd160d ("KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returns")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
A couple nits about missing words, but otherwise looks good.
Reviewed-by: Ben Gardon <bgardon@google.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 8 +++++++
> arch/x86/kvm/mmu/tdp_mmu.c | 46 +++++++++++++++++++++-----------------
> 2 files changed, 33 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b2c1c4eb6007..80607513a1f2 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5662,6 +5662,14 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
>
> write_unlock(&kvm->mmu_lock);
>
> + /*
> + * Zap the invalidated TDP MMU roots, all SPTEs must be dropped before
> + * returning to the caller, e.g. if the zap is in response to a memslot
> + * deletion, mmu_notifier callbacks will be unable to reach the SPTEs
> + * associated with the deleted memslot once the update completes, and
> + * deferring the zap until the final reference to the root is put would
> + * lead to use-after-free.
> + */
> if (is_tdp_mmu_enabled(kvm)) {
> read_lock(&kvm->mmu_lock);
> kvm_tdp_mmu_zap_invalidated_roots(kvm);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 9357780ec28f..12866113fb4f 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -826,12 +826,11 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
> }
>
> /*
> - * Since kvm_tdp_mmu_zap_all_fast has acquired a reference to each
> - * invalidated root, they will not be freed until this function drops the
> - * reference. Before dropping that reference, tear down the paging
> - * structure so that whichever thread does drop the last reference
> - * only has to do a trivial amount of work. Since the roots are invalid,
> - * no new SPTEs should be created under them.
> + * Zap all invalidated roots to ensure all SPTEs are dropped before the "fast
> + * zap" completes. Since kvm_tdp_mmu_invalidate_all_roots() has acquired a
> + * reference to each invalidated root, roots will not be freed until after this
> + * function drops the gifted reference, e.g. so that vCPUs don't get stuck with
> + * tearing paging structures.
Nit: tearing down paging structures
> */
> void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
> {
> @@ -855,21 +854,25 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
> }
>
> /*
> - * Mark each TDP MMU root as invalid so that other threads
> - * will drop their references and allow the root count to
> - * go to 0.
> + * Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that
> + * is about to be zapped, e.g. in response to a memslots update. The caller is
> + * responsible for invoking kvm_tdp_mmu_zap_invalidated_roots() to the actual
Nit: to do
> + * zapping.
> *
> - * Also take a reference on all roots so that this thread
> - * can do the bulk of the work required to free the roots
> - * once they are invalidated. Without this reference, a
> - * vCPU thread might drop the last reference to a root and
> - * get stuck with tearing down the entire paging structure.
> + * Take a reference on all roots to prevent the root from being freed before it
> + * is zapped by this thread. Freeing a root is not a correctness issue, but if
> + * a vCPU drops the last reference to a root prior to the root being zapped, it
> + * will get stuck with tearing down the entire paging structure.
> *
> - * Roots which have a zero refcount should be skipped as
> - * they're already being torn down.
> - * Already invalid roots should be referenced again so that
> - * they aren't freed before kvm_tdp_mmu_zap_all_fast is
> - * done with them.
> + * Get a reference even if the root is already invalid,
> + * kvm_tdp_mmu_zap_invalidated_roots() assumes it was gifted a reference to all
> + * invalid roots, e.g. there's no epoch to identify roots that were invalidated
> + * by a previous call. Roots stay on the list until the last reference is
> + * dropped, so even though all invalid roots are zapped, a root may not go away
> + * for quite some time, e.g. if a vCPU blocks across multiple memslot updates.
> + *
> + * Because mmu_lock is held for write, it should be impossible to observe a
> + * root with zero refcount, i.e. the list of roots cannot be stale.
> *
> * This has essentially the same effect for the TDP MMU
> * as updating mmu_valid_gen does for the shadow MMU.
> @@ -879,9 +882,10 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
> struct kvm_mmu_page *root;
>
> lockdep_assert_held_write(&kvm->mmu_lock);
> - list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link)
> - if (refcount_inc_not_zero(&root->tdp_mmu_root_count))
> + list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
> + if (!WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
> root->role.invalid = true;
> + }
> }
>
> /*
> --
> 2.35.1.574.g5d30c73bfb-goog
>
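
For readers following the two-phase flow described in the comments above
(invalidate every root and take a reference under mmu_lock held for write,
then zap the invalidated roots and drop the gifted reference), here is a
minimal user-space sketch. It is only a toy model, not the kernel code:
struct toy_root and every toy_* function are hypothetical names made up for
illustration, the refcount is a plain int rather than refcount_t, and no
locking or RCU is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TDP MMU root; only what the sketch needs. */
struct toy_root {
	int refcount;	/* models tdp_mmu_root_count (non-atomic here) */
	bool invalid;	/* models role.invalid */
	int nr_sptes;	/* models SPTEs still reachable under the root */
};

/* Models an "increment unless zero" get: fails if the root is defunct. */
static bool toy_get_root(struct toy_root *root)
{
	if (root->refcount == 0)
		return false;
	root->refcount++;
	return true;
}

static void toy_put_root(struct toy_root *root)
{
	assert(root->refcount > 0);
	root->refcount--;
}

/*
 * Phase 1: mark each root invalid and gift one reference to the zapper.
 * Returns the number of roots that could not be referenced; per the patch,
 * that should be impossible while mmu_lock is held for write, so the
 * kernel WARNs instead of silently skipping.
 */
static size_t toy_invalidate_all_roots(struct toy_root *roots, size_t n)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (toy_get_root(&roots[i]))
			roots[i].invalid = true;
		else
			skipped++;
	}
	return skipped;
}

/*
 * Phase 2: drop all SPTEs under each invalid root, then put the gifted
 * reference. Assumes every invalid root was gifted a reference in phase 1.
 */
static void toy_zap_invalidated_roots(struct toy_root *roots, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!roots[i].invalid)
			continue;
		roots[i].nr_sptes = 0;
		toy_put_root(&roots[i]);
	}
}
```

In this model a vCPU that still holds its own reference keeps the root alive
after phase 2, but the root no longer maps anything, so whoever drops the
last reference has no paging structure left to tear down.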