From: Alexander Graf <graf@amazon.com>
To: Sean Christopherson <seanjc@google.com>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
"Radim Krčmář" <rkrcmar@redhat.com>,
kvm@vger.kernel.org, "Xiao Guangrong" <guangrong.xiao@gmail.com>,
"Chandrasekaran, Siddharth" <sidcha@amazon.de>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot
Date: Mon, 24 Oct 2022 08:12:22 +0200
Message-ID: <490509f6-ae1a-4fc8-42a1-b037d6bffada@amazon.com>
In-Reply-To: <Y1L1t6Qw2CaLwJk3@google.com>
Hey Sean,
On 21.10.22 21:40, Sean Christopherson wrote:
>
> On Thu, Oct 20, 2022, Alexander Graf wrote:
>> On 20.10.22 22:37, Sean Christopherson wrote:
>>> On Thu, Oct 20, 2022, Alexander Graf wrote:
>>>> On 26.06.20 19:32, Sean Christopherson wrote:
>>>>> /cast <thread necromancy>
>>>>>
>>>>> On Tue, Aug 20, 2019 at 01:03:19PM -0700, Sean Christopherson wrote:
>>>> [...]
>>>>
>>>>> I don't think any of this explains the pass-through GPU issue. But, we
>>>>> have a few use cases where zapping the entire MMU is undesirable, so I'm
>>>>> going to retry upstreaming this patch as with per-VM opt-in. I wanted to
>>>>> set the record straight for posterity before doing so.
>>>> Hey Sean,
>>>>
>>>> Did you ever get around to upstream or rework the zap optimization? The way
>>>> I read current upstream, a memslot change still always wipes all SPTEs, not
>>>> only the ones that were changed.
>>> Nope, I've more or less given up hope on zapping only the deleted/moved memslot.
>>> TDX (and SNP?) will preserve SPTEs for guest private memory, but they're very
>>> much a special case.
>>>
>>> Do you have use case and/or issue that doesn't play nice with the "zap all" behavior?
>>
>> Yeah, we're looking at adding support for the Hyper-V VSM extensions which
>> Windows uses to implement Credential Guard. With that, the guest gets access
>> to hypercalls that allow it to set reduced permissions for arbitrary gfns.
>> To ensure that user space has full visibility into those for live migration,
>> memory slots to model access would be a great fit. But it means we'd do
>> ~100k memslot modifications on boot.
> Oof. 100k memslot updates is going to be painful irrespective of flushing. And
> memslots (in their current form) won't work if the guest can drop executable
> permissions.
>
> Assuming KVM needs to support a KVM_MEM_NO_EXEC flag, rather than trying to solve
> the "KVM flushes everything on memslot deletion", I think we should instead
> properly support toggling KVM_MEM_READONLY (and KVM_MEM_NO_EXEC) without forcing
> userspace to delete the memslot. Commit 75d61fbcf563 ("KVM: set_memory_region:
That would be a cute acceleration for the case where we have to change
permissions for a full slot. Unfortunately, the bulk of the changes are
slot splits. Let me explain with numbers from a 1-vCPU, 8 GB Windows
Server 2019 boot:
GFN permission modification requests: 46294
Unique GFNs: 21200
That means on boot, we start off with a few huge memslots for guest RAM.
Then down the road, we need to change permissions for individual pages
inside these larger regions. The obvious option for that is a memslot
split: delete the original slot, then create three new ones (the range
below the page, the page itself, the range above it). Now we have 2 large
memslots and 1 that only spans a single page.
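
To make the split concrete, here is a rough sketch of the range arithmetic
behind one delete/create/create/create sequence. It is not taken from the
prototype: struct slot_range and split_slot() are made-up names, and real
memslots also carry a userspace address and flags that are ignored here.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a memslot: a run of guest frames. */
struct slot_range {
	uint64_t base_gfn;	/* first guest frame number covered */
	uint64_t npages;	/* number of pages; 0 means "empty piece" */
};

/*
 * Split 'orig' around the single page 'gfn'.
 *
 * out[0] is the range below gfn, out[1] the page itself, out[2] the range
 * above it. Returns the number of non-empty pieces (the slots that would
 * have to be created after deleting 'orig'), or 0 if gfn is not inside
 * 'orig'.
 */
static int split_slot(struct slot_range orig, uint64_t gfn,
		      struct slot_range out[3])
{
	uint64_t offset;
	int i, count = 0;

	if (gfn < orig.base_gfn || gfn - orig.base_gfn >= orig.npages)
		return 0;

	offset = gfn - orig.base_gfn;

	out[0].base_gfn = orig.base_gfn;
	out[0].npages = offset;

	out[1].base_gfn = gfn;
	out[1].npages = 1;

	out[2].base_gfn = gfn + 1;
	out[2].npages = orig.npages - offset - 1;

	for (i = 0; i < 3; i++)
		if (out[i].npages)
			count++;

	return count;
}
```

A page in the middle of a slot yields three pieces; a page at either edge
yields two, because one of the outer ranges is empty.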
Later in the boot process, Windows sometimes also toggles permissions
for pages that it already split off earlier. That's the case we can
optimize with the in-place modify optimization you described in the
previous email. But that's only about half the requests. The other half
are memslot split requests.
We already built a prototype implementation of an atomic memslot update
ioctl that allows us to keep other vCPUs running while we do the
delete/create/create/create operation. But even with that, we see boot
times of up to 30 min for larger guests, which spend most of that time
zapping pages.
I guess we have 2 options to make this viable:
1) Optimize memslot splits + modifications to a point where they're
fast enough
2) Add a different, faster mechanism on top of memslots for page
granular permission bits
Also sorry for not posting the underlying credguard and atomic memslot
patches yet. I wanted to kick off this conversation before sending them
out - they're still too raw for upstream review atm :).
Thanks,
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879