From: Sean Christopherson <seanjc@google.com>
To: Ben Gardon <bgardon@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Hou Wenlong <houwenlong93@linux.alibaba.com>
Subject: Re: [PATCH 15/28] KVM: x86/mmu: Take TDP MMU roots off list when invalidating all roots
Date: Mon, 22 Nov 2021 23:08:41 +0000
Message-ID: <YZwi+TzVLQi5YlIX@google.com>
In-Reply-To: <CANgfPd8Kz41FpvooznGW2VLp8GZFei28FCjonr2+YEZoturi0A@mail.gmail.com>

On Mon, Nov 22, 2021, Ben Gardon wrote:
> On Fri, Nov 19, 2021 at 8:51 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Take TDP MMU roots off the list of roots when they're invalidated instead
> > of walking later on to find the roots that were just invalidated.  In
> > addition to making the flow more straightforward, this allows warning
> > if something attempts to elevate the refcount of an invalid root, which
> > should be unreachable (no longer on the list so can't be reached by MMU
> > notifier, and vCPUs must reload a new root before installing new SPTE).
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> There are a bunch of awesome little cleanups and unrelated fixes
> included in this commit that could be factored out.
> 
> I'm skeptical of immediately moving the invalidated roots into another
> list as that seems like it has a lot of potential for introducing
> weird races.

I disagree, the entire premise of fast invalidate is that there can't be races,
i.e. mmu_lock must be held for write.  IMO, it's actually the opposite, as the only
reason leaving roots on the per-VM list doesn't have weird races is because slots_lock
is held.  If slots_lock weren't required to do a fast zap, which is feasible for the
TDP MMU since it doesn't rely on the memslots generation, then it would be possible
for multiple calls to kvm_tdp_mmu_zap_invalidated_roots() to run in parallel.  And in
that case, leaving roots on the per-VM list would lead to a single instance of a
"fast zap" zapping roots it didn't invalidate.  That wouldn't be problematic per se,
but I don't like not having a clear "owner" of the invalidated root.
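
To illustrate the "owner" point, here is a minimal userspace sketch, not kernel
code.  The struct and function names are hypothetical stand-ins, and the sketch
assumes the caller holds the equivalent of mmu_lock for write while invalidating.
Each invalidation unlinks the roots it finds and takes them as a private list, so
a later zap only ever touches roots that the same caller invalidated:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TDP MMU root; not the kernel's struct kvm_mmu_page. */
struct root {
	struct root *next;
	bool invalid;
};

/* Hypothetical per-VM state: a singly linked list of live roots. */
struct vm {
	struct root *roots;
};

/* Publish a new, valid root on the per-VM list. */
static void add_root(struct vm *vm, struct root *r)
{
	r->invalid = false;
	r->next = vm->roots;
	vm->roots = r;
}

/*
 * Invalidate every root currently on vm->roots and hand them to the caller
 * as a private list.  Assumes the caller holds the write lock, so nothing
 * else can be walking or modifying vm->roots at the same time.
 */
static struct root *invalidate_all_roots(struct vm *vm)
{
	struct root *mine = vm->roots;
	struct root *r;

	vm->roots = NULL;
	for (r = mine; r; r = r->next)
		r->invalid = true;
	return mine;
}

/* Zap only the roots on the caller's private list; returns how many. */
static int zap_invalidated_roots(struct root *mine)
{
	int zapped = 0;

	for (; mine; mine = mine->next) {
		assert(mine->invalid);	/* only invalidated roots can be here */
		zapped++;
	}
	return zapped;
}
```

With two back-to-back invalidations, a root added between them ends up only on
the second caller's list, i.e. neither zap can pick up roots it doesn't own.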

> I'm not sure it actually solves a problem either. Part of
> the motive from the commit description "this allows warning if
> something attempts to elevate the refcount of an invalid root" can be
> achieved already without moving the roots into a separate list.

Hmm, true in the sense that kvm_tdp_mmu_get_root() could be converted to a WARN,
but that would require tdp_mmu_next_root() to manually skip invalid roots.
kvm_tdp_mmu_get_vcpu_root_hpa() is naturally safe because page_role_for_level()
will never set the invalid flag.

I don't care too much about adding a manual check in tdp_mmu_next_root(); what I don't
like is that a WARN in kvm_tdp_mmu_get_root() wouldn't be establishing an invariant
that invalidated roots are unreachable, it would simply be forcing callers to check
role.invalid.
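
A rough userspace sketch of the difference, again with hypothetical names and a
counter standing in for WARN.  In the first walker the invalid roots are still
on the list, so the walker has to check the flag itself to keep the warning
quiet; in the second the list is assumed to hold only live roots because
invalidation unlinked the rest, so the warning describes a real invariant:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a root; only the fields the sketch needs. */
struct root {
	struct root *next;
	int refcount;
	bool invalid;
};

/* Counts how often the "should be unreachable" check fired (stand-in for WARN). */
static int warn_count;

/* Take a reference, flagging any attempt to do so on an invalid root. */
static bool get_root(struct root *r)
{
	if (r->invalid) {
		warn_count++;
		return false;
	}
	r->refcount++;
	return true;
}

/*
 * Option A: invalid roots stay on the list, so the walker must check the
 * flag itself before calling get_root() to avoid tripping the warning.
 */
static struct root *next_root_checking(struct root *pos)
{
	for (; pos; pos = pos->next)
		if (!pos->invalid && get_root(pos))
			return pos;
	return NULL;
}

/*
 * Option B: invalid roots are unlinked at invalidation time, so the walker
 * needs no check; the warning fires only if the invariant is broken.
 */
static struct root *next_root_unlinked(struct root *pos)
{
	for (; pos; pos = pos->next)
		if (get_root(pos))
			return pos;
	return NULL;
}
```

Running the second walker over a list that still contains an invalid root bumps
warn_count, which is the case the unlink-on-invalidate approach is meant to rule
out by construction.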

> Maybe this would seem more straightforward with some of the little
> cleanups factored out, but this feels more complicated to me.
> > @@ -124,6 +137,27 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
> >  {
> >         struct kvm_mmu_page *next_root;
> >
> > +       lockdep_assert_held(&kvm->mmu_lock);
> > +
> > +       /*
> > +        * Restart the walk if the previous root was invalidated, which can
> > +        * happen if the caller drops mmu_lock when yielding.  Restarting the
> > +        * walke is necessary because invalidating a root also removes it from
> 
> Nit: *walk
> 
> > +        * tdp_mmu_roots.  Restarting is safe and correct because invalidating
> > +        * a root is done if and only if _all_ roots are invalidated, i.e. any
> > +        * root on tdp_mmu_roots was added _after_ the invalidation event.
> > +        */
> > +       if (prev_root && prev_root->role.invalid) {
> > +               kvm_tdp_mmu_put_root(kvm, prev_root, shared);
> > +               prev_root = NULL;
> > +       }
> > +
> > +       /*
> > +        * Finding the next root must be done under RCU read lock.  Although
> > +        * @prev_root itself cannot be removed from tdp_mmu_roots because this
> > +        * task holds a reference, its next and prev pointers can be modified
> > +        * when freeing a different root.  Ditto for tdp_mmu_roots itself.
> > +        */
> 
> I'm not sure this is correct with the rest of the changes in this
> patch. The new version of invalidate_roots removes roots from the list
> immediately, even if they have a non-zero ref-count.

Roots don't have to be invalidated to be removed, e.g. if the last reference is
put due to kvm_mmu_reset_context().  Or did I misunderstand?

> >         rcu_read_lock();
> >
> >         if (prev_root)
> > @@ -230,10 +264,13 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
> >         root = alloc_tdp_mmu_page(vcpu, 0, vcpu->arch.mmu->shadow_root_level);
> >         refcount_set(&root->tdp_mmu_root_count, 1);
> >
> > -       spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> > -       list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
> > -       spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> > -
> > +       /*
> > +        * Because mmu_lock must be held for write to ensure that KVM doesn't
> > +        * create multiple roots for a given role, this does not need to use
> > +        * an RCU-friendly variant as readers of tdp_mmu_roots must also hold
> > +        * mmu_lock in some capacity.
> > +        */
> 
> I doubt we're doing it now, but in principle we could allocate new
> roots with mmu_lock in read + tdp_mmu_pages_lock. That might be better
> than depending on the write lock.

We're not, this function does lockdep_assert_held_write(&kvm->mmu_lock) a few
lines above.  I don't have a preference between using mmu_lock.read+tdp_mmu_pages_lock
versus mmu_lock.write, but I do care that the current code doesn't incorrectly imply
that it's possible for something else to be walking the roots while this runs.

Either way, this should definitely be a separate patch; I'm pretty sure I just lost
track of it.
 
> > +       list_add(&root->link, &kvm->arch.tdp_mmu_roots);
> >  out:
> >         return __pa(root->spt);
> >  }
