kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	kvm@vger.kernel.org, "Xiao Guangrong" <guangrong.xiao@gmail.com>
Subject: Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot
Date: Fri, 26 Jun 2020 10:32:50 -0700	[thread overview]
Message-ID: <20200626173250.GD6583@linux.intel.com> (raw)
In-Reply-To: <20190820200318.GA15808@linux.intel.com>

/cast <thread necromancy>

On Tue, Aug 20, 2019 at 01:03:19PM -0700, Sean Christopherson wrote:
> On Mon, Aug 19, 2019 at 06:03:05PM +0200, Paolo Bonzini wrote:
> > It seems like the problem occurs when the sp->gfn you "continue over"
> > includes a userspace-MMIO gfn.  But since I have no better ideas right
> > now, I'm going to apply the revert (we don't know for sure that it only
> > happens with assigned devices).
> 
> After many hours of staring, I've come to the conclusion that
> kvm_mmu_invalidate_zap_pages_in_memslot() is completely broken, i.e.
> a revert is definitely in order for 5.3 and stable.

Whelp, past me was wrong.  The patch wasn't completely broken, as the rmap
zapping piece of kvm_mmu_invalidate_zap_pages_in_memslot() was sufficient
to satisfy removal of the memslot.  I.e. zapping leaf PTEs (including large
pages) should prevent referencing the old memslot, the fact that zapping
upper level shadow pages was broken was irrelevant because there should be
no need to zap non-leaf PTEs.

The one thing that sticks out as potentially concerning is passing %false
for @lock_flush_tlb.  Dropping mmu_lock during slot_handle_level_range()
without flushing would allow a vCPU to create and use a new entry while a
different vCPU has the old entry cached in its TLB.  I think that could
even happen with a single vCPU if the memslot is deleted by a helper task,
and the zapped page was a large page that was fractured into small pages
when inserted into the TLB.

TL;DR: Assuming no other bugs in the kernel, this should be sufficient if
the goal is simply to prevent usage of a memslot:

static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
			struct kvm_memory_slot *slot,
			struct kvm_page_track_notifier_node *node)
{
	bool flush;

	spin_lock(&kvm->mmu_lock);
	flush = slot_handle_all_level(kvm, slot, kvm_zap_rmapp, true);
	if (flush)
		kvm_flush_remote_tlbs(kvm);
	spin_unlock(&
}

> mmu_page_hash is indexed by the gfn of the shadow pages, not the gfn of
> the shadow ptes, e.g. for_each_valid_sp() will find page tables, page
> directories, etc...  Passing in the raw gfns of the memslot doesn't work
> because the gfn isn't properly adjusted/aligned to match how KVM tracks
> gfns for shadow pages, e.g. removal of the companion audio memslot that
> occupies gfns 0xc1080 - 0xc1083 would need to query gfn 0xc1000 to find
> the shadow page table containing the relevant sptes.
> 
> This is why Paolo's suggestion to remove slot_handle_all_level() on
> kvm_zap_rmapp() caused a BUG(), as zapping the rmaps cleaned up KVM's
> accounting without actually zapping the relevant sptes.

This is just straight up wrong, zapping the rmaps does zap the leaf sptes.
The BUG() occurs because gfn_to_rmap() works on the _new_ memslots instance,
and if a memslot is being deleted there is no memslot for the gfn, hence the
NULL pointer bug when mmu_page_zap_pte() attempts to zap a PTE.  Zapping the
rmaps (leaf/last PTEs) first "fixes" the issue by making it so that
mmu_page_zap_pte() will never see a present PTE for a non-existent meslot.

I don't think any of this explains the pass-through GPU issue.  But, we
have a few use cases where zapping the entire MMU is undesirable, so I'm
going to retry upstreaming this patch as with per-VM opt-in.  I wanted to
set the record straight for posterity before doing so.

  parent reply	other threads:[~2020-06-26 17:32 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-05 20:54 [PATCH v2 00/27] KVM: x86/mmu: Remove fast invalidate mechanism Sean Christopherson
2019-02-05 20:54 ` [PATCH v2 01/27] KVM: Call kvm_arch_memslots_updated() before updating memslots Sean Christopherson
2019-02-06  9:12   ` Cornelia Huck
2019-02-12 12:36 ` [PATCH v2 00/27] KVM: x86/mmu: Remove fast invalidate mechanism Paolo Bonzini
     [not found] ` <20190205210137.1377-11-sean.j.christopherson@intel.com>
2019-08-13 16:04   ` [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot Alex Williamson
2019-08-13 17:04     ` Sean Christopherson
2019-08-13 17:57       ` Alex Williamson
2019-08-13 19:33         ` Alex Williamson
2019-08-13 20:19           ` Sean Christopherson
2019-08-13 20:37             ` Paolo Bonzini
2019-08-13 21:14               ` Alex Williamson
2019-08-13 21:15                 ` Paolo Bonzini
2019-08-13 22:10                   ` Alex Williamson
2019-08-15 14:46                 ` Sean Christopherson
2019-08-15 15:23             ` Alex Williamson
2019-08-15 16:00               ` Sean Christopherson
2019-08-15 18:16                 ` Alex Williamson
2019-08-15 19:25                   ` Sean Christopherson
2019-08-15 20:11                     ` Alex Williamson
2019-08-19 16:03               ` Paolo Bonzini
2019-08-20 20:03                 ` Sean Christopherson
2019-08-20 20:42                   ` Alex Williamson
2019-08-20 21:02                     ` Sean Christopherson
2019-08-21 19:08                       ` Alex Williamson
2019-08-21 19:35                         ` Alex Williamson
2019-08-21 20:30                           ` Sean Christopherson
2019-08-23  2:25                             ` Sean Christopherson
2019-08-23 22:05                               ` Alex Williamson
2019-08-21 20:10                         ` Sean Christopherson
2019-08-26  7:36                           ` Tian, Kevin
2019-08-26 14:56                           ` Sean Christopherson
2020-06-26 17:32                   ` Sean Christopherson [this message]
2022-10-20 18:31                     ` Alexander Graf
2022-10-20 20:37                       ` Sean Christopherson
2022-10-20 21:06                         ` Alexander Graf
2022-10-21 19:40                           ` Sean Christopherson
2022-10-24  6:12                             ` Alexander Graf
2022-10-24 15:55                               ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200626173250.GD6583@linux.intel.com \
    --to=sean.j.christopherson@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=guangrong.xiao@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).