All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ignat Korchagin <ignat@cloudflare.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	stevensd@chromium.org, kernel-team <kernel-team@cloudflare.com>
Subject: Re: Potential bug in TDP MMU
Date: Sat, 11 Dec 2021 09:38:14 +0000	[thread overview]
Message-ID: <CALrw=nFK7vhBXXzAB0pti-pdp1T_wtr+50Dj8nwYDHF77AsBZA@mail.gmail.com> (raw)
In-Reply-To: <YbQPcsnpowmCP7G8@google.com>

On Sat, Dec 11, 2021 at 2:39 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Dec 10, 2021, Ignat Korchagin wrote:
> > I've been trying to figure out the difference between "good" runs and
> > "bad" runs of gvisor. So, if I've been running the following bpftrace
> > onliner:
>
> ...
>
> > That is, I never get a stack with
> > kvm_tdp_mmu_put_root->..->kvm_set_pfn_dirty with a "good" run.
> > Perhaps, this may shed some light onto what is going on.
>
> Hmm, a little?
>
> Based on the WARN backtrace, KVM encounters an entire chain of valid, present TDP
> MMU paging structures _after_ exit_mm() in the do_exit() path, as the call to
> task_work_run() in do_exit() occurs after exit_mm().
>
> That means that kvm_mmu_zap_all() is guaranteed to have been called before the
> fatal kvm_arch_destroy_vm(), as either:
>
>   a) exit_mm() put the last reference to mm_users and thus called __mmput ->
>      exit_mmap() -> mmu_notifier_release() -> ... -> kvm_mmu_zap_all().
>
>   b) Something else had a reference to mm_users, and so KVM's ->release hook was
>      invoked by kvm_destroy_vm() -> mmu_notifier_unregister().
>
> It's probably fairly safe to assume this is a TDP MMU bug, which rules out races
> or bad refcounts in other areas.

Most likely. Currently we're using kvm.tdp_mmu=0 kernel cmdline as a
workaround and haven't encountered any issues.

> That means that KVM (a) is somehow losing track of a root, (b) isn't zapping all
> SPTEs in kvm_mmu_zap_all(), or (c) is installing a SPTE after the mm has been released.
>
> (a) is unlikely because kvm_tdp_mmu_get_vcpu_root_hpa() is the only way for a
> vCPU to get a reference, and it holds mmu_lock for write, doesn't yield, and
> either gets a root from the list or adds a root to the list.
>
> (b) is unlikely because I would expect the fallout to be much larger and not
> unique to your setup.
>
> That leaves (c), which isn't all that likely either.  I can think of a variety of
> ways KVM might write a defunct SPTE, but I can't concoct a scenario where an
> entire tree of a present paging structures is written.
>
> Can you run with the below debug patch and see if you get a hit in the failure
> scenario?  Or possibly even a non-failure scenario?  This should either confirm
> or rule out (c).
>
>
> ---
>  arch/x86/kvm/mmu/mmu.c     | 2 ++
>  arch/x86/kvm/mmu/tdp_mmu.c | 5 +++++
>  include/linux/kvm_host.h   | 2 ++
>  3 files changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 1ccee4d17481..e4e283a38570 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5939,6 +5939,8 @@ void kvm_mmu_zap_all(struct kvm *kvm)
>         LIST_HEAD(invalid_list);
>         int ign;
>
> +       atomic_set(&kvm->mm_released, 1);
> +
>         write_lock(&kvm->mmu_lock);
>  restart:
>         list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index b69e47e68307..432ccf05f446 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -504,6 +504,9 @@ static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
>  {
>         lockdep_assert_held_read(&kvm->mmu_lock);
>
> +       WARN_ON(atomic_read(&kvm->mm_released) &&
> +               new_spte && !is_removed_spte(new_spte));
> +
>         /*
>          * Do not change removed SPTEs. Only the thread that froze the SPTE
>          * may modify it.
> @@ -577,6 +580,8 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
>  {
>         lockdep_assert_held_write(&kvm->mmu_lock);
>
> +       WARN_ON(atomic_read(&kvm->mm_released) && new_spte);
> +
>         /*
>          * No thread should be using this function to set SPTEs to the
>          * temporary removed SPTE value.
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e7bfcc3b6b0b..8e76e2f6c3be 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -569,6 +569,8 @@ struct kvm {
>
>         struct mutex slots_lock;
>
> +       atomic_t mm_released;
> +
>         /*
>          * Protects the arch-specific fields of struct kvm_memory_slots in
>          * use by the VM. To be used under the slots_lock (above) or in a
>
> base-commit: 1c10f4b4877ffaed602d12ff8cbbd5009e82c970
> --

Thanks. Applied the patch, but no warnings are triggered neither in
"good" case nor in "bad" case.

Ignat

  reply	other threads:[~2021-12-11  9:38 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-29 21:44 Potential bug in TDP MMU Ignat Korchagin
2021-11-30  9:29 ` Paolo Bonzini
2021-11-30 10:58   ` Ignat Korchagin
2021-11-30 10:59     ` Ignat Korchagin
2021-11-30 11:11     ` Paolo Bonzini
2021-11-30 11:19       ` Ignat Korchagin
2021-11-30 11:43         ` Ignat Korchagin
2021-11-30 11:49           ` Paolo Bonzini
2021-11-30 12:13           ` Paolo Bonzini
2021-11-30 20:23     ` Sean Christopherson
2021-12-01 23:44       ` Ignat Korchagin
2021-12-10 23:04         ` Ignat Korchagin
2021-12-11  1:34           ` David Matlack
2021-12-11  1:49             ` Paolo Bonzini
2021-12-11 17:46               ` David Matlack
2021-12-11  1:37           ` Paolo Bonzini
2021-12-11  2:39           ` Sean Christopherson
2021-12-11  9:38             ` Ignat Korchagin [this message]
2021-12-11 20:49             ` Paolo Bonzini
2021-12-13 16:14               ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALrw=nFK7vhBXXzAB0pti-pdp1T_wtr+50Dj8nwYDHF77AsBZA@mail.gmail.com' \
    --to=ignat@cloudflare.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=stevensd@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.