From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>,
Huacai Chen <chenhuacai@kernel.org>,
Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
Paul Mackerras <paulus@ozlabs.org>,
James Morse <james.morse@arm.com>,
Julien Thierry <julien.thierry.kdev@gmail.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, linux-mips@vger.kernel.org,
kvm@vger.kernel.org, kvm-ppc@vger.kernel.org,
linux-kernel@vger.kernel.org, Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH 10/18] KVM: Move x86's MMU notifier memslot walkers to generic code
Date: Wed, 31 Mar 2021 16:20:20 +0000
Message-ID: <YGShRP9E49p3vcos@google.com>
In-Reply-To: <ba3f7a9c-0b59-cbeb-5d46-4236cde2c51f@redhat.com>
On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> On 26/03/21 03:19, Sean Christopherson wrote:
> > +#ifdef KVM_ARCH_WANT_NEW_MMU_NOTIFIER_APIS
> > +	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
> > +#else
> > 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
> > 	int idx;
> > 	trace_kvm_set_spte_hva(address);
> > 	idx = srcu_read_lock(&kvm->srcu);
> >
> > 	KVM_MMU_LOCK(kvm);
> >
> > 	kvm->mmu_notifier_seq++;
> >
> > 	if (kvm_set_spte_hva(kvm, address, pte))
> > 		kvm_flush_remote_tlbs(kvm);
> >
> > 	KVM_MMU_UNLOCK(kvm);
> > 	srcu_read_unlock(&kvm->srcu, idx);
> > +#endif
>
> The kvm->mmu_notifier_seq is missing in the new API side. I guess you can
> add an argument to __kvm_handle_hva_range and handle it also in patch 15
> ("KVM: Take mmu_lock when handling MMU notifier iff the hva hits a
> memslot").

Yikes. Superb eyes!

That does bring up an oddity I discovered when digging into this. Every call
to .change_pte() is bookended by .invalidate_range_{start,end}(), i.e. the above
missing kvm->mmu_notifier_seq++ is benign because kvm->mmu_notifier_count is
guaranteed to be non-zero.
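
To make the "benign" claim concrete, here is a minimal userspace sketch. It is not kernel code: struct kvm_model and every model_* function are hypothetical stand-ins, and the retry check is only loosely modelled on KVM's mmu_notifier_retry(). It assumes .invalidate_range_end() bumps the sequence and drops the count, and that a fault handler retries when the count is non-zero or the sequence has moved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two struct kvm fields discussed. */
struct kvm_model {
	unsigned long mmu_notifier_seq;
	long mmu_notifier_count;
};

/* Models .invalidate_range_start(): an invalidation is now in flight. */
void model_range_start(struct kvm_model *kvm)
{
	kvm->mmu_notifier_count++;
}

/* Models .invalidate_range_end(): bump the sequence, then drop the count. */
void model_range_end(struct kvm_model *kvm)
{
	assert(kvm->mmu_notifier_count > 0);
	kvm->mmu_notifier_seq++;
	kvm->mmu_notifier_count--;
}

/* Models .change_pte(); bump_seq == false is the new-API path in the patch. */
void model_change_pte(struct kvm_model *kvm, bool bump_seq)
{
	if (bump_seq)
		kvm->mmu_notifier_seq++;
}

/* Retry check in the style of mmu_notifier_retry(): true means "retry". */
bool model_retry(const struct kvm_model *kvm, unsigned long snapshot)
{
	if (kvm->mmu_notifier_count)
		return true;
	return kvm->mmu_notifier_seq != snapshot;
}

/*
 * A fault handler snapshots the sequence, .change_pte() runs, and the handler
 * then checks whether its snapshot is stale while the call is still in flight.
 */
bool fault_must_retry(bool bracketed, bool bump_seq)
{
	struct kvm_model kvm = { 0, 0 };
	unsigned long snapshot = kvm.mmu_notifier_seq;
	bool retry;

	if (bracketed)
		model_range_start(&kvm);
	model_change_pte(&kvm, bump_seq);
	retry = model_retry(&kvm, snapshot);
	if (bracketed)
		model_range_end(&kvm);
	return retry;
}
```

In this model, fault_must_retry(true, false) is true: with the bracketing in place the non-zero count forces a retry even though the sequence was never bumped. Only the unbracketed case, fault_must_retry(false, false), lets a stale snapshot through, which is the situation the increment originally guarded against.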

I'm also fairly certain it means kvm_set_spte_gfn() is effectively dead code on
_all_ architectures. x86 and MIPS are clear-cut nops if the old SPTE is
not-present, and that's guaranteed due to the prior invalidation. PPC simply
unmaps the SPTE, which again should be a nop due to the invalidation. arm64 is
a bit murky, but if I'm reading the code correctly, it's also a nop because
kvm_pgtable_stage2_map() is called without a cache pointer, which I think means
it will map an entry if and only if an existing PTE was found.

I haven't actually tested the above analysis, e.g. by asserting that
kvm->mmu_notifier_count is indeed non-zero. I'll do that sooner rather than
later. But, given the shortlog of commit:

  6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with invalidate_range_start
                 and invalidate_range_end")

I'm fairly confident my analysis is correct. And if so, it also means that the
whole point of adding .change_pte() in the first place (for KSM, commit
828502d30073, "ksm: add mmu_notifier set_pte_at_notify()"), has since been lost.

When it was originally added, .change_pte() was a pure alternative to
invalidating the entry.

  void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address,
				 pte_t pte)
  {
	struct mmu_notifier *mn;
	struct hlist_node *n;

	rcu_read_lock();
	hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) {
		if (mn->ops->change_pte)
			mn->ops->change_pte(mn, mm, address, pte);
		/*
		 * Some drivers don't have change_pte,
		 * so we must call invalidate_page in that case.
		 */
		else if (mn->ops->invalidate_page)
			mn->ops->invalidate_page(mn, mm, address);
	}
	rcu_read_unlock();
  }

The aforementioned commit 6bdb913f0a70 wrapped set_pte_at_notify() with
invalidate_range_{start,end}() so that .invalidate_page() implementations could
sleep. But, no one noticed that in doing so, .change_pte() was completely
neutered.
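
The neutering can be illustrated with another small userspace sketch. Everything here is hypothetical (struct spte_model, the spte_* helpers, the pfn values); it assumes an x86/MIPS-style handler that only repoints an SPTE that is already present, and that .invalidate_range_start() zaps the SPTE first. It also carries the untested assertion mentioned earlier, as a plain assert() on the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-entry secondary MMU: a single SPTE plus the counter. */
struct spte_model {
	bool present;
	unsigned long pfn;
	long mmu_notifier_count;
};

/* Models .invalidate_range_start(): flag the invalidation, zap the SPTE. */
void spte_range_start(struct spte_model *mmu)
{
	mmu->mmu_notifier_count++;
	mmu->present = false;
}

/* Models .invalidate_range_end(): the invalidation is no longer in flight. */
void spte_range_end(struct spte_model *mmu)
{
	assert(mmu->mmu_notifier_count > 0);
	mmu->mmu_notifier_count--;
}

/*
 * Models an x86/MIPS-style .change_pte() handler: only an already-present
 * SPTE is repointed. Returns true if the SPTE was actually changed.
 */
bool spte_change_pte(struct spte_model *mmu, unsigned long new_pfn)
{
	if (!mmu->present)
		return false;
	mmu->pfn = new_pfn;
	return true;
}

/* Runs one set_pte_at_notify()-like sequence, with or without bracketing. */
bool change_pte_updates_spte(bool bracketed)
{
	struct spte_model mmu = { true, 0x1000, 0 };
	bool updated;

	if (bracketed) {
		spte_range_start(&mmu);
		/* The assertion from the analysis: count is non-zero here. */
		assert(mmu.mmu_notifier_count > 0);
	}
	updated = spte_change_pte(&mmu, 0x2000);
	if (bracketed)
		spte_range_end(&mmu);
	return updated;
}
```

In this model, change_pte_updates_spte(false) returns true (the original pre-6bdb913f0a70 behavior, where the SPTE is updated in place), while change_pte_updates_spte(true) returns false because the handler only ever sees a zapped SPTE.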

Assuming all of the above is correct, I'm very tempted to rip out .change_pte()
entirely. It's been dead weight for 8+ years and no one has complained about
KSM+KVM performance (I'd also be curious to know how much performance was gained
by shaving VM-Exits). As KVM is the only user of .change_pte(), dropping it in
KVM would mean the .change_pte() notifier itself could also go away.