kvm.vger.kernel.org archive mirror
From: Alexander Graf <graf@amazon.com>
To: Sean Christopherson <seanjc@google.com>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	kvm@vger.kernel.org, "Xiao Guangrong" <guangrong.xiao@gmail.com>,
	"Chandrasekaran, Siddharth" <sidcha@amazon.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot
Date: Mon, 24 Oct 2022 08:12:22 +0200
Message-ID: <490509f6-ae1a-4fc8-42a1-b037d6bffada@amazon.com>
In-Reply-To: <Y1L1t6Qw2CaLwJk3@google.com>

Hey Sean,

On 21.10.22 21:40, Sean Christopherson wrote:
>
> On Thu, Oct 20, 2022, Alexander Graf wrote:
>> On 20.10.22 22:37, Sean Christopherson wrote:
>>> On Thu, Oct 20, 2022, Alexander Graf wrote:
>>>> On 26.06.20 19:32, Sean Christopherson wrote:
>>>>> /cast <thread necromancy>
>>>>>
>>>>> On Tue, Aug 20, 2019 at 01:03:19PM -0700, Sean Christopherson wrote:
>>>> [...]
>>>>
>>>>> I don't think any of this explains the pass-through GPU issue.  But, we
>>>>> have a few use cases where zapping the entire MMU is undesirable, so I'm
>>>>> going to retry upstreaming this patch as with per-VM opt-in.  I wanted to
>>>>> set the record straight for posterity before doing so.
>>>> Hey Sean,
>>>>
>>>> Did you ever get around to upstream or rework the zap optimization? The way
>>>> I read current upstream, a memslot change still always wipes all SPTEs, not
>>>> only the ones that were changed.
>>> Nope, I've more or less given up hope on zapping only the deleted/moved memslot.
>>> TDX (and SNP?) will preserve SPTEs for guest private memory, but they're very
>>> much a special case.
>>>
>>> Do you have use case and/or issue that doesn't play nice with the "zap all" behavior?
>>
>> Yeah, we're looking at adding support for the Hyper-V VSM extensions which
>> Windows uses to implement Credential Guard. With that, the guest gets access
>> to hypercalls that allow it to set reduced permissions for arbitrary gfns.
>> To ensure that user space has full visibility into those for live migration,
>> memory slots to model access would be a great fit. But it means we'd do
>> ~100k memslot modifications on boot.
> Oof.  100k memslot updates is going to be painful irrespective of flushing.  And
> memslots (in their current form) won't work if the guest can drop executable
> permissions.
>
> Assuming KVM needs to support a KVM_MEM_NO_EXEC flag, rather than trying to solve
> the "KVM flushes everything on memslot deletion", I think we should instead
> properly support toggling KVM_MEM_READONLY (and KVM_MEM_NO_EXEC) without forcing
> userspace to delete the memslot.  Commit 75d61fbcf563 ("KVM: set_memory_region:


That would be a nice acceleration for the case where we have to change
permissions for a full slot. Unfortunately, the bulk of the changes are
slot splits. Let me explain with numbers from a 1 vCPU, 8GB Windows
Server 2019 boot:

GFN permission modification requests: 46294
Unique GFNs: 21200

That means on boot, we start off with a few huge memslots for guest RAM. 
Then down the road, we need to change permissions for individual pages 
inside these larger regions. The obvious option for that is a memslot 
split - delete, create, create, create. Now we have 2 large memslots and 
1 that only spans a single page.
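
To make the split concrete, here is a minimal Python sketch of the
delete/create/create/create sequence. It only models memslots as plain
(base_gfn, npages, perms) records and does not call any KVM ioctl. The
names Slot and split_slot are hypothetical, and guest RAM is modelled as
a single slot purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    base_gfn: int  # first guest frame number covered by the slot
    npages: int    # number of pages in the slot
    perms: str     # illustrative permission string, e.g. "rwx"


def split_slot(slot, gfn, new_perms):
    """Return the operations needed to change the permissions of one page.

    The old slot is deleted and replaced by up to three new slots: the
    pages before gfn, the single page at gfn, and the pages after gfn.
    """
    end = slot.base_gfn + slot.npages
    if not slot.base_gfn <= gfn < end:
        raise ValueError("gfn is outside the slot")

    ops = [("delete", slot)]
    if gfn > slot.base_gfn:
        # Pages in front of the target page keep the old permissions.
        ops.append(("create", Slot(slot.base_gfn, gfn - slot.base_gfn, slot.perms)))
    # The single page that gets the reduced permissions.
    ops.append(("create", Slot(gfn, 1, new_perms)))
    if gfn + 1 < end:
        # Pages behind the target page keep the old permissions.
        ops.append(("create", Slot(gfn + 1, end - gfn - 1, slot.perms)))
    return ops


# Assumption: 8GB of guest RAM with 4KiB pages, modelled as one big slot.
ram = Slot(base_gfn=0, npages=(8 << 30) >> 12, perms="rwx")
ops = split_slot(ram, gfn=0x1234, new_perms="r--")
for op, s in ops:
    print(op, hex(s.base_gfn), s.npages, s.perms)
```

When the target page sits at the very start or end of a slot, one of the
two large neighbours is empty and the sketch emits only two creates.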

Later in the boot process, Windows sometimes also toggles permissions
for pages that it already split off earlier. That's the case we can
optimize with the in-place modification you described in the previous
email. But that's only about half the requests. The other half are
memslot split requests.
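
As a rough cross-check of the "about half" figure, the arithmetic can be
written down as below. The assumption (mine, not measured) is that the
first request for a GFN needs a split and every later request for the
same GFN can be handled in place. The per-split cost of one delete plus
three creates is likewise an upper-bound assumption.

```python
requests = 46294     # GFN permission modification requests during boot
unique_gfns = 21200  # distinct GFNs touched by those requests

# Assumption: first touch of a GFN is a split, repeats are in-place updates.
splits = unique_gfns
in_place = requests - unique_gfns

split_share = splits / requests
in_place_share = in_place / requests

# Assumption: each split costs 1 delete + 3 creates, each in-place
# update counts as a single memslot operation.
memslot_ops = splits * 4 + in_place

print(f"splits:      {splits} ({split_share:.0%})")
print(f"in place:    {in_place} ({in_place_share:.0%})")
print(f"memslot ops: {memslot_ops}")
```

Under these assumptions the split comes out at roughly 46% splits versus
54% in-place updates, and the total operation count lands in the same
ballpark as the ~100k memslot modifications mentioned earlier.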

We already built a prototype implementation of an atomic memslot update
ioctl that allows us to keep other vCPUs running while we do the
delete/create/create/create operation. But even with that, we see boot
times of up to 30 minutes for larger guests, and most of that time is
spent zapping pages.

I guess we have 2 options to make this viable:

   1) Optimize memslot splits + modifications to a point where they're
      fast enough
   2) Add a different, faster mechanism on top of memslots for
      page-granular permission bits
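
To illustrate what option 2 could mean, here is a minimal sketch of a
per-GFN permission overlay that sits on top of unchanged memslots. This
is a thought experiment, not an existing KVM interface: the class
PermissionOverlay and its methods are hypothetical, and permissions are
modelled as sets of the letters r, w and x.

```python
class PermissionOverlay:
    """Hypothetical per-page restrictions layered over memslot permissions."""

    def __init__(self, slots):
        # slots: list of (base_gfn, npages, perms) tuples; never modified here.
        self.slots = [(base, n, frozenset(perms)) for base, n, perms in slots]
        self.overrides = {}  # gfn -> frozenset of permissions still allowed

    def _slot_perms(self, gfn):
        for base, npages, perms in self.slots:
            if base <= gfn < base + npages:
                return perms
        raise KeyError(f"gfn {gfn:#x} is not backed by any slot")

    def restrict(self, gfn, allowed):
        """Record reduced permissions for one page without touching slots."""
        self._slot_perms(gfn)  # reject GFNs that no slot covers
        self.overrides[gfn] = frozenset(allowed)

    def effective(self, gfn):
        """Slot permissions intersected with the per-page restriction, if any."""
        perms = self._slot_perms(gfn)
        if gfn in self.overrides:
            perms = perms & self.overrides[gfn]
        return perms


overlay = PermissionOverlay([(0, 1024, "rwx")])
overlay.restrict(5, "r")
print(sorted(overlay.effective(5)))  # restricted page
print(sorted(overlay.effective(6)))  # untouched page keeps slot permissions
```

The point of the sketch is that a permission change becomes a single
map update and the slot layout stays the same, so no slot is deleted or
recreated.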

Also, sorry for not posting the underlying Credential Guard and atomic
memslot patches yet. I wanted to kick off this conversation before
sending them out - they're still too raw for upstream review atm :).


Thanks,

Alex






