From: Sean Christopherson <seanjc@google.com>
To: Alexander Graf <graf@amazon.com>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	kvm@vger.kernel.org, "Xiao Guangrong" <guangrong.xiao@gmail.com>,
	"Chandrasekaran, Siddharth" <sidcha@amazon.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot
Date: Mon, 24 Oct 2022 15:55:55 +0000
Message-ID: <Y1a1i9vbJ/pVmV9r@google.com>
In-Reply-To: <490509f6-ae1a-4fc8-42a1-b037d6bffada@amazon.com>

On Mon, Oct 24, 2022, Alexander Graf wrote:
> Hey Sean,
> 
> On 21.10.22 21:40, Sean Christopherson wrote:
> > 
> > On Thu, Oct 20, 2022, Alexander Graf wrote:
> > > On 20.10.22 22:37, Sean Christopherson wrote:
> > > > On Thu, Oct 20, 2022, Alexander Graf wrote:
> > > > > On 26.06.20 19:32, Sean Christopherson wrote:
> > > > > > /cast <thread necromancy>
> > > > > > 
> > > > > > On Tue, Aug 20, 2019 at 01:03:19PM -0700, Sean Christopherson wrote:
> > > > > [...]
> > > > > 
> > > > > > I don't think any of this explains the pass-through GPU issue.  But, we
> > > > > > have a few use cases where zapping the entire MMU is undesirable, so I'm
> > > > > > going to retry upstreaming this patch as with per-VM opt-in.  I wanted to
> > > > > > set the record straight for posterity before doing so.
> > > > > Hey Sean,
> > > > > 
> > > > > Did you ever get around to upstream or rework the zap optimization? The way
> > > > > I read current upstream, a memslot change still always wipes all SPTEs, not
> > > > > only the ones that were changed.
> > > > Nope, I've more or less given up hope on zapping only the deleted/moved memslot.
> > > > TDX (and SNP?) will preserve SPTEs for guest private memory, but they're very
> > > > much a special case.
> > > > 
> > > > Do you have use case and/or issue that doesn't play nice with the "zap all" behavior?
> > > 
> > > Yeah, we're looking at adding support for the Hyper-V VSM extensions which
> > > Windows uses to implement Credential Guard. With that, the guest gets access
> > > to hypercalls that allow it to set reduced permissions for arbitrary gfns.
> > > To ensure that user space has full visibility into those for live migration,
> > > memory slots to model access would be a great fit. But it means we'd do
> > > ~100k memslot modifications on boot.
> > Oof.  100k memslot updates is going to be painful irrespective of flushing.  And
> > memslots (in their current form) won't work if the guest can drop executable
> > permissions.
> > 
> > Assuming KVM needs to support a KVM_MEM_NO_EXEC flag, rather than trying to solve
> > the "KVM flushes everything on memslot deletion", I think we should instead
> > properly support toggling KVM_MEM_READONLY (and KVM_MEM_NO_EXEC) without forcing
> > userspace to delete the memslot.  Commit 75d61fbcf563 ("KVM: set_memory_region:
> 
> 
> That would be a cute acceleration for the case where we have to change
> permissions for a full slot. Unfortunately, the bulk of the changes are slot
> splits.

Ah, right, the guest will be operating on per-page granularity.
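
To make the cost of a slot split concrete, here is a minimal userspace sketch
of what a single per-page permission change turns into: one deleted slot and
up to three created ones. The struct, the helper name split_slot() and the
SLOT_READONLY constant are hypothetical stand-ins for illustration, not KVM's
actual memslot code or uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a read-only slot flag; illustrative only. */
#define SLOT_READONLY (1u << 1)

/* Hypothetical, simplified view of a memslot: a gfn range plus flags. */
struct slot {
	uint64_t base_gfn;
	uint64_t npages;
	uint32_t flags;
};

/*
 * Split 'old' around [gfn, gfn + npages) and give that range 'new_flags'.
 * Writes the replacement slots to 'out' and returns how many there are
 * (1 to 3), or 0 if the range is empty or not fully inside 'old'.
 */
size_t split_slot(const struct slot *old, uint64_t gfn, uint64_t npages,
		  uint32_t new_flags, struct slot out[3])
{
	uint64_t old_end, end;
	size_t n = 0;

	assert(old && out);

	old_end = old->base_gfn + old->npages;
	end = gfn + npages;

	if (npages == 0 || end < gfn || gfn < old->base_gfn || end > old_end)
		return 0;

	/* Head: pages before the modified range keep the old flags. */
	if (gfn > old->base_gfn)
		out[n++] = (struct slot){ old->base_gfn, gfn - old->base_gfn,
					  old->flags };

	/* Middle: the range whose permissions the guest changed. */
	out[n++] = (struct slot){ gfn, npages, new_flags };

	/* Tail: pages after the modified range keep the old flags. */
	if (end < old_end)
		out[n++] = (struct slot){ end, old_end - end, old->flags };

	return n;
}
```

Changing a single page in the middle of a slot yields three slots, so
repeated per-page changes scattered through a large slot keep multiplying the
slot count.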

> We already built a prototype implementation of an atomic memslot update
> ioctl that allows us to keep other vCPUs running while we do the
> delete/create/create/create operation.

Please weigh in with your use case on a relevant upstream discussion regarding
"atomic" memslot updates[*].  I suspect we'll end up with a different solution
for this use case (see below), but we should at least capture all potential use
cases and ideas for modifying memslots without pausing vCPUs.

[*] https://lore.kernel.org/all/20220909104506.738478-1-eesposit@redhat.com

> But even with that, we see up to 30 min boot times for larger guests that
> most of the time are stuck in zapping pages.

Out of curiosity, did you measure runtime performance?  I would expect some amount
of runtime overhead as well due to fragmenting memslots to that degree.

> I guess we have 2 options to make this viable:
> 
>   1) Optimize memslot splits + modifications to a point where they're fast
> enough
>   2) Add a different, faster mechanism on top of memslots for page granular
> permission bits

#2 crossed my mind as well.  This is actually nearly identical to the confidential
VM use case, where KVM needs to handle guest-initiated conversions of memory between
"private" and "shared" on a per-page granularity.  The proposed solution for that
is indeed a layer on top of memslots[*], which we arrived at in no small part because
splitting memslots was going to be a bottleneck.

Extending the proposed mem_attr_array to support additional state should be quite
easy.  The framework is all there, KVM just needs a few extra flag values, e.g.

	KVM_MEM_ATTR_SHARED	BIT(0)
	KVM_MEM_ATTR_READONLY	BIT(1)
	KVM_MEM_ATTR_NOEXEC	BIT(2)

and then new ioctls to expose the functionality to userspace.  Actually, if we
want to go this route, it might even make sense to define a new generic MEM_ATTR
ioctl() right away instead of repurposing KVM_MEMORY_ENCRYPT_(UN)REG_REGION for
the private vs. shared use case.

[*] https://lore.kernel.org/all/20220915142913.2213336-6-chao.p.peng@linux.intel.com
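
As a rough illustration of the "layer on top of memslots" idea, the sketch
below keeps one attribute byte per gfn in a flat array and checks an access
against it. The KVM_MEM_ATTR_* names are the ones floated above. Everything
else (the flat-array layout, struct mem_attr_array as written here, and the
helpers mem_attr_init(), mem_attr_set() and mem_attr_allows()) is a
hypothetical userspace model, not the data structure or API from the linked
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BIT(n) (1u << (n))

/* Flag values as floated in this mail; not a merged uAPI. */
#define KVM_MEM_ATTR_SHARED	BIT(0)
#define KVM_MEM_ATTR_READONLY	BIT(1)
#define KVM_MEM_ATTR_NOEXEC	BIT(2)

enum access { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC };

/* Hypothetical model: one attribute byte per guest page frame. */
struct mem_attr_array {
	uint64_t base_gfn;
	uint64_t npages;
	uint8_t *attrs;
};

/* Allocate a zeroed attribute array; returns 0 on success, -1 on failure. */
int mem_attr_init(struct mem_attr_array *a, uint64_t base_gfn, uint64_t npages)
{
	a->attrs = calloc(npages, sizeof(*a->attrs));
	if (!a->attrs)
		return -1;
	a->base_gfn = base_gfn;
	a->npages = npages;
	return 0;
}

/* Set attributes for [gfn, gfn + npages); no slot is split or deleted. */
int mem_attr_set(struct mem_attr_array *a, uint64_t gfn, uint64_t npages,
		 uint8_t attrs)
{
	uint64_t off, i;

	if (gfn < a->base_gfn || npages > a->npages)
		return -1;
	off = gfn - a->base_gfn;
	if (off > a->npages - npages)
		return -1;

	for (i = 0; i < npages; i++)
		a->attrs[off + i] = attrs;
	return 0;
}

/* Would an access of the given type to 'gfn' be permitted? */
bool mem_attr_allows(const struct mem_attr_array *a, uint64_t gfn,
		     enum access acc)
{
	uint8_t attrs;

	assert(gfn >= a->base_gfn && gfn - a->base_gfn < a->npages);
	attrs = a->attrs[gfn - a->base_gfn];

	if (acc == ACCESS_WRITE)
		return !(attrs & KVM_MEM_ATTR_READONLY);
	if (acc == ACCESS_EXEC)
		return !(attrs & KVM_MEM_ATTR_NOEXEC);
	return true;
}
```

In this model a per-page permission change is an array update rather than a
memslot delete/create sequence. A real implementation would still need to zap
the affected SPTEs for the changed range.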
