kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Gardon <bgardon@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Hou Wenlong <houwenlong93@linux.alibaba.com>
Subject: Re: [PATCH 19/28] KVM: x86/mmu: Zap only the target TDP MMU shadow page in NX recovery
Date: Tue, 23 Nov 2021 11:35:28 -0800	[thread overview]
Message-ID: <CANgfPd9CdP-4aYkM7SCtCtV+v4T3HsyG6F8tLu=FCBz1nt=htg@mail.gmail.com> (raw)
In-Reply-To: <YZxA1VAs5FNbjmH9@google.com>

On Mon, Nov 22, 2021 at 5:16 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Nov 22, 2021, Ben Gardon wrote:
> > On Fri, Nov 19, 2021 at 8:51 PM Sean Christopherson <seanjc@google.com> wrote:
> > > @@ -755,6 +759,26 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
> > >         return false;
> > >  }
> > >
> > > +bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > > +{
> > > +       u64 old_spte;
> > > +
> > > +       rcu_read_lock();
> > > +
> > > +       old_spte = kvm_tdp_mmu_read_spte(sp->ptep);
> > > +       if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte))) {
> > > +               rcu_read_unlock();
> > > +               return false;
> > > +       }
> > > +
> > > +       __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
> > > +                          sp->gfn, sp->role.level + 1, true, true);
> > > +
> > > +       rcu_read_unlock();
> > > +
> > > +       return true;
> > > +}
> > > +
> >
> > Ooooh this makes me really nervous. There are a lot of gotchas to
> > modifying SPTEs in a new context without traversing the paging
> > structure like this. For example, we could modify an SPTE under an
> > invalidated root here. I don't think that would be a problem since
> > we're just clearing it, but it makes the code more fragile.
>
> Heh, it better not be a problem, because there are plently of flows in the TDP MMU
> that can modify SPTEs under an invalidated root, e.g. fast_page_fault(),
> tdp_mmu_zap_leafs(), kvm_age_gfn(), kvm_test_age_gfn(), etc...  And before the
> patch that introduced is_page_fault_stale(), kvm_tdp_mmu_map() was even installing
> SPTEs into an invalid root!  Anything that takes a reference to a root and yields
> (or never takes mmu_lock) can potentially modify a SPTE under an invalid root.

That's true, I don't think there's really a problem with this commit,
just a different way of dealing with the PTs.


>
> Checking the paging structures for this flow wouldn't change anything.  Invalidating
> a root doesn't immediately zap SPTEs, it just marks the root invalid.  The other
> subtle gotcha is that kvm_reload_remote_mmus() doesn't actually gaurantee all vCPUs
> will have dropped the invalid root or performed a TLB flush when mmu_lock is dropped,
> those guarantees are only with respect to re-entering the guest!
>
> All of the above is no small part of why I don't want to walk the page tables:
> it's completely misleading as walking the page tables doesn't actually provide any
> protection, it's holding RCU that guarantees KVM doesn't write memory it doesn't own.

That's a great point. I was thinking about the RCU protection being
sort of passed down through the RCU dereferences from the root of the
paging structure to whatever SPTE we modify, but since we protect the
SPs with RCU too, dereferencing from them is just as good, I suppose.

>
> > Another approach to this would be to do in-place promotion / in-place
> > splitting once the patch sets David and I just sent out are merged.  That
> > would avoid causing extra page faults here to bring in the page after this
> > zap, but it probably wouldn't be safe if we did it under an invalidated root.
>
> I agree that in-place promotion would be better, but if we do that, I think a logical
> intermediate step would be to stop zapping unrelated roots and entries.  If there's
> a bug that is exposed/introduced by not zapping other stuff, I would much rather it
> show up when KVM stops zapping other stuff, not when KVM stops zapping other stuff
> _and_ promotes in place.  Ditto for if in-place promotion introduces a bug.

That makes sense. I think this is a good first step.

>
> > I'd rather avoid this extra complexity and just tolerate the worse
> > performance on the iTLB multi hit mitigation at this point since new
> > CPUs seem to be moving past that vulnerability.
>
> IMO, it reduces complexity, especially when looking at the series as a whole, which
> I fully realize you haven't yet done :-)  Setting aside the complexities of each
> chunk of code, what I find complex with the current TDP MMU zapping code is that
> there are no precise rules for what needs to be done in each situation.  I'm not
> criticizing how we got to this point, I absolutely think that hitting everything
> with a big hammer to get the initial version stable was the right thing to do.
>
> But a side effect of the big hammer approach is that it makes reasoning about things
> more difficult, e.g. "when is it safe to modify a SPTE versus when is it safe to insert
> a SPTE into the paging structures?" or "what needs to be zapped when the mmu_notifier
> unmaps a range?".
>
> And I also really want to avoid another snafu like the memslots with passthrough
> GPUs bug, where using a big hammer (zap all) when a smaller hammer (zap SPTEs for
> the memslot) _should_ work allows bugs to creep in unnoticed because they're hidden
> by overzealous zapping.
>
> > If you think this is worth the complexity, it'd be nice to do a little
> > benchmarking to make sure it's giving us a substantial improvement.
>
> Performance really isn't a motivating factor.  Per the changelog, the motivation
> is mostly to allow later patches to simplify zap_gfn_range() by having it zap only
> leaf SPTEs, and now that I've typed it up, also all of the above :-)

That makes sense, and addresses all my concerns. Carry on.
Thanks for writing all that up.

  reply	other threads:[~2021-11-23 19:35 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-20  4:50 [PATCH 00/28] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing Sean Christopherson
2021-11-20  4:50 ` [PATCH 01/28] KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmapping Sean Christopherson
2021-11-22 19:48   ` Ben Gardon
2021-11-30  8:03   ` Paolo Bonzini
2021-11-20  4:50 ` [PATCH 02/28] KVM: x86/mmu: Skip tlb flush if it has been done in zap_gfn_range() Sean Christopherson
2021-11-20  4:50 ` [PATCH 03/28] KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path Sean Christopherson
2021-11-20  4:50 ` [PATCH 04/28] KVM: x86/mmu: Retry page fault if root is invalidated by memslot update Sean Christopherson
2021-11-22 19:54   ` Ben Gardon
2021-12-01 20:49   ` Paolo Bonzini
2021-12-08 19:17   ` Sean Christopherson
2021-11-20  4:50 ` [PATCH 05/28] KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU Sean Christopherson
2021-11-22 19:57   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 06/28] KVM: x86/mmu: Formalize TDP MMU's (unintended?) deferred TLB flush logic Sean Christopherson
2021-11-20  4:50 ` [PATCH 07/28] KVM: x86/mmu: Document that zapping invalidated roots doesn't need to flush Sean Christopherson
2021-11-20  4:50 ` [PATCH 08/28] KVM: x86/mmu: Drop unused @kvm param from kvm_tdp_mmu_get_root() Sean Christopherson
2021-11-22 20:02   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 09/28] KVM: x86/mmu: Require mmu_lock be held for write in unyielding root iter Sean Christopherson
2021-11-22 20:10   ` Ben Gardon
2021-11-22 20:19     ` Sean Christopherson
2021-11-20  4:50 ` [PATCH 10/28] KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root Sean Christopherson
2021-11-22 21:30   ` Ben Gardon
2021-11-22 22:40     ` Sean Christopherson
2021-11-22 23:03       ` Ben Gardon
2021-12-14 23:45     ` Sean Christopherson
2021-12-14 23:52       ` Sean Christopherson
2021-11-20  4:50 ` [PATCH 11/28] KVM: x86/mmu: Check for !leaf=>leaf, not PFN change, in TDP MMU SP removal Sean Christopherson
2021-11-20  4:50 ` [PATCH 12/28] KVM: x86/mmu: Batch TLB flushes from TDP MMU for MMU notifier change_spte Sean Christopherson
2021-11-22 21:45   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 13/28] KVM: x86/mmu: Drop RCU after processing each root in MMU notifier hooks Sean Christopherson
2021-11-22 21:47   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 14/28] KVM: x86/mmu: Add helpers to read/write TDP MMU SPTEs and document RCU Sean Christopherson
2021-11-22 21:55   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 15/28] KVM: x86/mmu: Take TDP MMU roots off list when invalidating all roots Sean Christopherson
2021-11-22 22:20   ` Ben Gardon
2021-11-22 23:08     ` Sean Christopherson
2021-11-23  0:03       ` Ben Gardon
2021-12-14 23:34         ` Sean Christopherson
2021-11-20  4:50 ` [PATCH 16/28] KVM: x86/mmu: WARN if old _or_ new SPTE is REMOVED in non-atomic path Sean Christopherson
2021-11-22 21:57   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 17/28] KVM: x86/mmu: Terminate yield-friendly walk if invalid root observed Sean Christopherson
2021-11-22 22:25   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 18/28] KVM: x86/mmu: Refactor low-level TDP MMU set SPTE helper to take raw vals Sean Christopherson
2021-11-22 22:29   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 19/28] KVM: x86/mmu: Zap only the target TDP MMU shadow page in NX recovery Sean Christopherson
2021-11-22 22:43   ` Ben Gardon
2021-11-23  1:16     ` Sean Christopherson
2021-11-23 19:35       ` Ben Gardon [this message]
2021-11-20  4:50 ` [PATCH 20/28] KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook Sean Christopherson
2021-11-22 22:49   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 21/28] KVM: x86/mmu: Add TDP MMU helper to zap a root Sean Christopherson
2021-11-22 22:54   ` Ben Gardon
2021-11-22 23:15     ` Sean Christopherson
2021-11-22 23:38       ` Ben Gardon
2021-11-20  4:50 ` [PATCH 22/28] KVM: x86/mmu: Skip remote TLB flush when zapping all of TDP MMU Sean Christopherson
2021-11-22 23:00   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 23/28] KVM: x86/mmu: Use "zap root" path for "slow" zap of all TDP MMU SPTEs Sean Christopherson
2021-11-20  4:50 ` [PATCH 24/28] KVM: x86/mmu: Add dedicated helper to zap TDP MMU root shadow page Sean Christopherson
2021-11-23  1:04   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 25/28] KVM: x86/mmu: Require mmu_lock be held for write to zap TDP MMU range Sean Christopherson
2021-11-23 19:58   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 26/28] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range() Sean Christopherson
2021-11-23 19:58   ` Ben Gardon
2021-11-20  4:50 ` [PATCH 27/28] KVM: x86/mmu: Do remote TLB flush before dropping RCU in TDP MMU resched Sean Christopherson
2021-11-23 19:58   ` Ben Gardon
2021-11-24 18:42     ` Sean Christopherson
2021-11-30 11:29   ` Paolo Bonzini
2021-11-30 15:45     ` Sean Christopherson
2021-11-30 16:16       ` Paolo Bonzini
2021-11-20  4:50 ` [PATCH 28/28] KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow pages Sean Christopherson
2021-11-23 20:12   ` Ben Gardon
2021-12-01 17:53 ` [PATCH 00/28] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing David Matlack
2021-12-02  2:03   ` Sean Christopherson
2021-12-03  0:16     ` David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANgfPd9CdP-4aYkM7SCtCtV+v4T3HsyG6F8tLu=FCBz1nt=htg@mail.gmail.com' \
    --to=bgardon@google.com \
    --cc=houwenlong93@linux.alibaba.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).