kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Gardon <bgardon@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Hou Wenlong <houwenlong93@linux.alibaba.com>
Subject: Re: [PATCH 15/28] KVM: x86/mmu: Take TDP MMU roots off list when invalidating all roots
Date: Mon, 22 Nov 2021 16:03:10 -0800	[thread overview]
Message-ID: <CANgfPd8s=8SY2R_Cg+gytU6VU2PhOqkOwtq9fAdXCgp+GRpmQg@mail.gmail.com> (raw)
In-Reply-To: <YZwi+TzVLQi5YlIX@google.com>

On Mon, Nov 22, 2021 at 3:08 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Nov 22, 2021, Ben Gardon wrote:
> > On Fri, Nov 19, 2021 at 8:51 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > Take TDP MMU roots off the list of roots when they're invalidated instead
> > > of walking later on to find the roots that were just invalidated.  In
> > > addition to making the flow more straightforward, this allows warning
> > > if something attempts to elevate the refcount of an invalid root, which
> > > should be unreachable (no longer on the list so can't be reached by MMU
> > > notifier, and vCPUs must reload a new root before installing new SPTE).
> > >
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> >
> > There are a bunch of awesome little cleanups and unrelated fixes
> > included in this commit that could be factored out.
> >
> > I'm skeptical of immediately moving the invalidated roots into another
> > list as that seems like it has a lot of potential for introducing
> > weird races.
>
> I disagree, the entire premise of fast invalidate is that there can't be races,
> i.e. mmu_lock must be held for write.  IMO, it's actually the opposite, as the only
> reason leaving roots on the per-VM list doesn't have weird races is because slots_lock
> is held.  If slots_lock weren't required to do a fast zap, which is feasible for the
> TDP MMU since it doesn't rely on the memslots generation, then it would be possible
> for multiple calls to kvm_tdp_mmu_zap_invalidated_roots() to run in parallel.  And in
> that case, leaving roots on the per-VM list would lead to a single instance of a
> "fast zap" zapping roots it didn't invalidate.  That's wouldn't be problematic per se,
> but I don't like not having a clear "owner" of the invalidated root.

That's a good point, the potential interleaving of zap_alls would be gross.

My mental model for the invariant here was "roots that are still in
use are on the roots list," but I can see how "the roots list contains
all valid, in-use roots" could be a more useful invariant.

>
> > I'm not sure it actually solves a problem either. Part of
> > the motive from the commit description "this allows warning if
> > something attempts to elevate the refcount of an invalid root" can be
> > achieved already without moving the roots into a separate list.
>
> Hmm, true in the sense that kvm_tdp_mmu_get_root() could be converted to a WARN,
> but that would require tdp_mmu_next_root() to manually skip invalid roots.
> kvm_tdp_mmu_get_vcpu_root_hpa() is naturally safe because page_role_for_level()
> will never set the invalid flag.
>
> I don't care too much about adding a manual check in tdp_mmu_next_root(), what I don't
> like is that a WARN in kvm_tdp_mmu_get_root() wouldn't be establishing an invariant
> that invalidated roots are unreachable, it would simply be forcing callers to check
> role.invalid.

That makes sense.

>
> > Maybe this would seem more straightforward with some of the little
> > cleanups factored out, but this feels more complicated to me.
> > > @@ -124,6 +137,27 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
> > >  {
> > >         struct kvm_mmu_page *next_root;
> > >
> > > +       lockdep_assert_held(&kvm->mmu_lock);
> > > +
> > > +       /*
> > > +        * Restart the walk if the previous root was invalidated, which can
> > > +        * happen if the caller drops mmu_lock when yielding.  Restarting the
> > > +        * walke is necessary because invalidating a root also removes it from
> >
> > Nit: *walk
> >
> > > +        * tdp_mmu_roots.  Restarting is safe and correct because invalidating
> > > +        * a root is done if and only if _all_ roots are invalidated, i.e. any
> > > +        * root on tdp_mmu_roots was added _after_ the invalidation event.
> > > +        */
> > > +       if (prev_root && prev_root->role.invalid) {
> > > +               kvm_tdp_mmu_put_root(kvm, prev_root, shared);
> > > +               prev_root = NULL;
> > > +       }
> > > +
> > > +       /*
> > > +        * Finding the next root must be done under RCU read lock.  Although
> > > +        * @prev_root itself cannot be removed from tdp_mmu_roots because this
> > > +        * task holds a reference, its next and prev pointers can be modified
> > > +        * when freeing a different root.  Ditto for tdp_mmu_roots itself.
> > > +        */
> >
> > I'm not sure this is correct with the rest of the changes in this
> > patch. The new version of invalidate_roots removes roots from the list
> > immediately, even if they have a non-zero ref-count.
>
> Roots don't have to be invalidated to be removed, e.g. if the last reference is
> put due to kvm_mmu_reset_context().  Or did I misunderstand?

Ah, sorry that was kind of a pedantic comment. I think the comment
should say "@prev_root itself cannot be removed from tdp_mmu_roots
because this task holds the MMU lock (in read or write mode)"
With the other changes in the patch, holding a reference on a root
provides protection against it being freed, but not against it being
removed from the roots list.

>
> > >         rcu_read_lock();
> > >
> > >         if (prev_root)
> > > @@ -230,10 +264,13 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
> > >         root = alloc_tdp_mmu_page(vcpu, 0, vcpu->arch.mmu->shadow_root_level);
> > >         refcount_set(&root->tdp_mmu_root_count, 1);
> > >
> > > -       spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> > > -       list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
> > > -       spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> > > -
> > > +       /*
> > > +        * Because mmu_lock must be held for write to ensure that KVM doesn't
> > > +        * create multiple roots for a given role, this does not need to use
> > > +        * an RCU-friendly variant as readers of tdp_mmu_roots must also hold
> > > +        * mmu_lock in some capacity.
> > > +        */
> >
> > I doubt we're doing it now, but in principle we could allocate new
> > roots with mmu_lock in read + tdp_mmu_pages_lock. That might be better
> > than depending on the write lock.
>
> We're not, this function does lockdep_assert_held_write(&kvm->mmu_lock) a few
> lines above.  I don't have a preference between using mmu_lock.read+tdp_mmu_pages_lock
> versus mmu_lock.write, but I do care that the current code doesn't incorrectly imply
> that it's possible for something else to be walking the roots while this runs.

That makes sense. It seems like it'd be pretty easy to let this run
under the read lock, but I'd be fine with either way. I agree both
options are an improvement.

>
> Either way, this should definitely be a separate patch, pretty sure I just lost
> track of it.
>
> > > +       list_add(&root->link, &kvm->arch.tdp_mmu_roots);
> > >  out:
> > >         return __pa(root->spt);
> > >  }

  reply	other threads:[~2021-11-23  0:03 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-20  4:50 [PATCH 00/28] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing Sean Christopherson
2021-11-20  4:50 ` [PATCH 01/28] KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmapping Sean Christopherson
2021-11-22 19:48   ` Ben Gardon
2021-11-30  8:03   ` Paolo Bonzini
2021-11-20  4:50 ` [PATCH 02/28] KVM: x86/mmu: Skip tlb flush if it has been done in zap_gfn_range() Sean Christopherson
2021-11-20  4:50 ` [PATCH 03/28] KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path Sean Christopherson
2021-11-20  4:50 ` [PATCH 04/28] KVM: x86/mmu: Retry page fault if root is invalidated by memslot update Sean Christopherson
2021-11-22 19:54   ` Ben Gardon
2021-12-01 20:49   ` Paolo Bonzini
2021-12-08 19:17   ` Sean Christopherson
2021-11-20  4:50 ` [PATCH 05/28] KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU Sean Christopherson
2021-11-22 19:57   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 06/28] KVM: x86/mmu: Formalize TDP MMU's (unintended?) deferred TLB flush logic Sean Christopherson
2021-11-20  4:50 ` [PATCH 07/28] KVM: x86/mmu: Document that zapping invalidated roots doesn't need to flush Sean Christopherson
2021-11-20  4:50 ` [PATCH 08/28] KVM: x86/mmu: Drop unused @kvm param from kvm_tdp_mmu_get_root() Sean Christopherson
2021-11-22 20:02   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 09/28] KVM: x86/mmu: Require mmu_lock be held for write in unyielding root iter Sean Christopherson
2021-11-22 20:10   ` Ben Gardon
2021-11-22 20:19     ` Sean Christopherson
2021-11-20  4:50 ` [PATCH 10/28] KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root Sean Christopherson
2021-11-22 21:30   ` Ben Gardon
2021-11-22 22:40     ` Sean Christopherson
2021-11-22 23:03       ` Ben Gardon
2021-12-14 23:45     ` Sean Christopherson
2021-12-14 23:52       ` Sean Christopherson
2021-11-20  4:50 ` [PATCH 11/28] KVM: x86/mmu: Check for !leaf=>leaf, not PFN change, in TDP MMU SP removal Sean Christopherson
2021-11-20  4:50 ` [PATCH 12/28] KVM: x86/mmu: Batch TLB flushes from TDP MMU for MMU notifier change_spte Sean Christopherson
2021-11-22 21:45   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 13/28] KVM: x86/mmu: Drop RCU after processing each root in MMU notifier hooks Sean Christopherson
2021-11-22 21:47   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 14/28] KVM: x86/mmu: Add helpers to read/write TDP MMU SPTEs and document RCU Sean Christopherson
2021-11-22 21:55   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 15/28] KVM: x86/mmu: Take TDP MMU roots off list when invalidating all roots Sean Christopherson
2021-11-22 22:20   ` Ben Gardon
2021-11-22 23:08     ` Sean Christopherson
2021-11-23  0:03       ` Ben Gardon [this message]
2021-12-14 23:34         ` Sean Christopherson
2021-11-20  4:50 ` [PATCH 16/28] KVM: x86/mmu: WARN if old _or_ new SPTE is REMOVED in non-atomic path Sean Christopherson
2021-11-22 21:57   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 17/28] KVM: x86/mmu: Terminate yield-friendly walk if invalid root observed Sean Christopherson
2021-11-22 22:25   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 18/28] KVM: x86/mmu: Refactor low-level TDP MMU set SPTE helper to take raw vals Sean Christopherson
2021-11-22 22:29   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 19/28] KVM: x86/mmu: Zap only the target TDP MMU shadow page in NX recovery Sean Christopherson
2021-11-22 22:43   ` Ben Gardon
2021-11-23  1:16     ` Sean Christopherson
2021-11-23 19:35       ` Ben Gardon
2021-11-20  4:50 ` [PATCH 20/28] KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook Sean Christopherson
2021-11-22 22:49   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 21/28] KVM: x86/mmu: Add TDP MMU helper to zap a root Sean Christopherson
2021-11-22 22:54   ` Ben Gardon
2021-11-22 23:15     ` Sean Christopherson
2021-11-22 23:38       ` Ben Gardon
2021-11-20  4:50 ` [PATCH 22/28] KVM: x86/mmu: Skip remote TLB flush when zapping all of TDP MMU Sean Christopherson
2021-11-22 23:00   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 23/28] KVM: x86/mmu: Use "zap root" path for "slow" zap of all TDP MMU SPTEs Sean Christopherson
2021-11-20  4:50 ` [PATCH 24/28] KVM: x86/mmu: Add dedicated helper to zap TDP MMU root shadow page Sean Christopherson
2021-11-23  1:04   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 25/28] KVM: x86/mmu: Require mmu_lock be held for write to zap TDP MMU range Sean Christopherson
2021-11-23 19:58   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 26/28] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range() Sean Christopherson
2021-11-23 19:58   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 27/28] KVM: x86/mmu: Do remote TLB flush before dropping RCU in TDP MMU resched Sean Christopherson
2021-11-23 19:58   ` Ben Gardon
2021-11-24 18:42     ` Sean Christopherson
2021-11-30 11:29   ` Paolo Bonzini
2021-11-30 15:45     ` Sean Christopherson
2021-11-30 16:16       ` Paolo Bonzini
2021-11-20  4:50 ` [PATCH 28/28] KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow pages Sean Christopherson
2021-11-23 20:12   ` Ben Gardon
2021-12-01 17:53 ` [PATCH 00/28] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing David Matlack
2021-12-02  2:03   ` Sean Christopherson
2021-12-03  0:16     ` David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANgfPd8s=8SY2R_Cg+gytU6VU2PhOqkOwtq9fAdXCgp+GRpmQg@mail.gmail.com' \
    --to=bgardon@google.com \
    --cc=houwenlong93@linux.alibaba.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).