linux-kernel.vger.kernel.org archive mirror
From: Nicolas Saenz Julienne <nsaenz@amazon.com>
To: Sean Christopherson <seanjc@google.com>
Cc: <kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-hyperv@vger.kernel.org>, <pbonzini@redhat.com>,
	<vkuznets@redhat.com>, <anelkz@amazon.com>, <graf@amazon.com>,
	<dwmw@amazon.co.uk>, <jgowans@amazon.com>, <corbert@lwn.net>,
	<kys@microsoft.com>, <haiyangz@microsoft.com>,
	<decui@microsoft.com>, <x86@kernel.org>,
	<linux-doc@vger.kernel.org>
Subject: Re: [RFC 0/33] KVM: x86: hyperv: Introduce VSM support
Date: Fri, 10 Nov 2023 17:56:41 +0000	[thread overview]
Message-ID: <CWVBQQ6GVQVG.2FKWLPBUF77UT@amazon.com> (raw)
In-Reply-To: <ZUvUZytj1AabvvrB@google.com>

Hi Sean,
Thanks for taking the time to review the series. I took note of your
comments across the series, and will incorporate them into the LPC
discussion.

On Wed Nov 8, 2023 at 6:33 PM UTC, Sean Christopherson wrote:
> On Wed, Nov 08, 2023, Sean Christopherson wrote:
> > On Wed, Nov 08, 2023, Nicolas Saenz Julienne wrote:
> > > This RFC series introduces the necessary infrastructure to emulate VSM
> > > enabled guests. It is a snapshot of the progress we made so far, and its
> > > main goal is to gather design feedback.
> >
> > Heh, then please provide an overview of the design, and ideally context and/or
> > justification for various design decisions.  It doesn't need to be a proper design
> > doc, and you can certainly point at other documentation for explaining VSM/VTLs,
> > but a few paragraphs and/or verbose bullet points would go a long way.
> >
> > The documentation in patch 33 provides an explanation of VSM itself, and a little
> > insight into how userspace can utilize the KVM implementation.  But the documentation
> > provides no explanation of the mechanics that KVM *developers* care about, e.g.
> > the use of memory attributes, how memory attributes are enforced, whether or not
> > an in-kernel local APIC is required, etc.
> >
> > Nor does the documentation explain *why*, e.g. why store a separate set of memory
> > attributes per VTL "device", which by the by is broken and unnecessary.
>
> After speed reading the series..  An overview of the design, why you made certain
> choices, and the tradeoffs between various options is definitely needed.
>
> A few questions off the top of my head:
>
>  - What is the split between userspace and KVM?  How did you arrive at that split?

Our original design, which we presented at KVM Forum 2023 [1] and which
is public [2], implemented most of VSM in-kernel. Notably, we
introduced VTL awareness in struct kvm_vcpu. This turned out to be far
too complex: vcpus then carry multiple architectural CPU states,
events, APICs, MMUs, etc. First of all, the code was very intrusive;
for example, most APIC APIs had to be reworked one way or another to
accommodate the fact that multiple APICs are available. We were also
forced to introduce VSM-specific semantics in the x86 emulation code.
But more importantly, the biggest pain has been dealing with state
changes: they may be triggered remotely through requests, and some are
already fairly delicate as-is. They involve a multitude of corner cases
that almost never apply for a VTL-aware kvm_vcpu, especially once you
factor in features like live migration. It's been a continuous source
of regressions.

Memory protections were implemented through memory slot modifications.
We introduced a downstream API that allows updating memory slots
concurrently with vCPUs running; I think there was a similar proposal
upstream from Red Hat some time ago. The result is complex, hard to
generalize, and slow.

So we decided to move all this complexity out of struct kvm_vcpu and,
as much as possible, out of the kernel. We figured out the basic kernel
building blocks that are absolutely necessary, and let user-space deal
with the rest.

>  - How much *needs* to be in KVM?  I.e. how much can be pushed to userspace while
>    maintaininly good performance?

As I said above, the aim of the current design is to keep the kernel
side as light as possible. The biggest move was shifting VTL switching
into user-space. We don't see any indication that this affects
performance in a major way, but we will know for sure once we finish
the implementation and test it under real use-cases.

>  - Why not make VTLs a first-party concept in KVM?  E.g. rather than bury info
>    in a VTL device and APIC ID groups, why not modify "struct kvm" to support
>    replicating state that needs to be tracked per-VTL?  Because of how memory
>    attributes affect hugepages, duplicating *memslots* might actually be easier
>    than teaching memslots to be VTL-aware.

I do agree that we need to introduce some level of VTL awareness into
memslots. There's the hugepages issue you pointed out, but it will also
be necessary once we look at how to implement per-VTL overlay pages. (A
topic I didn't mention in the series, as I thought I had managed to
solve memory protections while avoiding the need for multiple slots.)
What I have in mind is introducing a memory slot address space per VTL,
similar to how we do things with SMM.

It's important to note that the requirements for overlay pages and
memory protections are very different. Overlay pages are scarce, and
are set up once and never changed (AFAICT), so we think stopping all
vCPUs, updating the slots, and resuming execution will provide good
enough performance. Memory protections, by contrast, happen very
frequently, generally at page granularity, and may be short-lived.

>  - Is "struct kvm_vcpu" the best representation of an execution context (if I'm
>    getting the terminology right)?

Let's forget I ever mentioned execution contexts. I used the term in
the hope of making the VTL concept a little more understandable for
non-VSM-aware people. It's meant to be interchangeable with VTL, but I
see how it creates confusion.

>    E.g. if 90% of the state is guaranteed to be identical for a given
>    vCPU across execution contexts, then modeling that with separate
>    kvm_vcpu structures is very inefficient.  I highly doubt it's 90%,
>    but it might be quite high depending on how much the TFLS restricts
>    the state of the vCPU, e.g. if it's 64-bit only.

For the record here's the private VTL state (TLFS 15.11.1):

"In general, each VTL has its own control registers, RIP register, RSP
 register, and MSRs:

 SYSENTER_CS, SYSENTER_ESP, SYSENTER_EIP, STAR, LSTAR, CSTAR, SFMASK,
 EFER, PAT, KERNEL_GSBASE, FS.BASE, GS.BASE, TSC_AUX
 HV_X64_MSR_HYPERCALL
 HV_X64_MSR_GUEST_OS_ID
 HV_X64_MSR_REFERENCE_TSC
 HV_X64_MSR_APIC_FREQUENCY
 HV_X64_MSR_EOI
 HV_X64_MSR_ICR
 HV_X64_MSR_TPR
 HV_X64_MSR_APIC_ASSIST_PAGE
 HV_X64_MSR_NPIEP_CONFIG
 HV_X64_MSR_SIRBP
 HV_X64_MSR_SCONTROL
 HV_X64_MSR_SVERSION
 HV_X64_MSR_SIEFP
 HV_X64_MSR_SIMP
 HV_X64_MSR_EOM
 HV_X64_MSR_SINT0 – HV_X64_MSR_SINT15
 HV_X64_MSR_STIMER0_COUNT – HV_X64_MSR_STIMER3_COUNT
 Local APIC registers (including CR8/TPR)
"

The rest is shared.

Note that we've observed that VTL switches don't happen that often
during normal operation. The boot process is the most affected by any
performance impact VSM might introduce, as it issues 100000s of VTL
switches (mostly for memory protections).

Nicolas

[1] https://kvm-forum.qemu.org/2023/talk/TK7YGD/
[2] Partial rebase of our original implementation:
    https://github.com/vianpl/linux vsm

Thread overview: 108+ messages
2023-11-08 11:17 [RFC 0/33] KVM: x86: hyperv: Introduce VSM support Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 01/33] KVM: x86: Decouple lapic.h from hyperv.h Nicolas Saenz Julienne
2023-11-08 16:11   ` Sean Christopherson
2023-11-08 11:17 ` [RFC 02/33] KVM: x86: Introduce KVM_CAP_APIC_ID_GROUPS Nicolas Saenz Julienne
2023-11-08 12:11   ` Alexander Graf
2023-11-08 17:47   ` Sean Christopherson
2023-11-10 18:46     ` Nicolas Saenz Julienne
2023-11-28  6:56   ` Maxim Levitsky
2023-12-01 15:25     ` Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 03/33] KVM: x86: hyper-v: Introduce XMM output support Nicolas Saenz Julienne
2023-11-08 11:44   ` Alexander Graf
2023-11-08 12:11     ` Vitaly Kuznetsov
2023-11-08 12:16       ` Alexander Graf
2023-11-28  6:57         ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 04/33] KVM: x86: hyper-v: Move hypercall page handling into separate function Nicolas Saenz Julienne
2023-11-28  7:01   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page Nicolas Saenz Julienne
2023-11-08 11:53   ` Alexander Graf
2023-11-08 14:10     ` Nicolas Saenz Julienne
2023-11-28  7:08   ` Maxim Levitsky
2023-11-28 16:33     ` Sean Christopherson
2023-12-01 16:19     ` Nicolas Saenz Julienne
2023-12-01 16:32       ` Sean Christopherson
2023-12-01 16:50         ` Nicolas Saenz Julienne
2023-12-01 17:47           ` Sean Christopherson
2023-12-01 18:15             ` Nicolas Saenz Julienne
2023-12-05 19:21               ` Sean Christopherson
2023-12-05 20:04                 ` Maxim Levitsky
2023-12-06  0:07                   ` Sean Christopherson
2023-12-06 16:19                     ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 06/33] KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs Nicolas Saenz Julienne
2023-11-28  7:14   ` Maxim Levitsky
2023-12-01 16:31     ` Nicolas Saenz Julienne
2023-12-05 15:02       ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 07/33] KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_VSM Nicolas Saenz Julienne
2023-11-28  7:16   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 08/33] KVM: x86: Don't use hv_timer if CAP_HYPERV_VSM enabled Nicolas Saenz Julienne
2023-11-28  7:21   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 09/33] KVM: x86: hyper-v: Introduce per-VTL vcpu helpers Nicolas Saenz Julienne
2023-11-08 12:21   ` Alexander Graf
2023-11-08 14:04     ` Nicolas Saenz Julienne
2023-11-28  7:25   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 10/33] KVM: x86: hyper-v: Introduce KVM_HV_GET_VSM_STATE Nicolas Saenz Julienne
2023-11-28  7:26   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 11/33] KVM: x86: hyper-v: Handle GET/SET_VP_REGISTER hcall in user-space Nicolas Saenz Julienne
2023-11-08 12:14   ` Alexander Graf
2023-11-28  7:26     ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 12/33] KVM: x86: hyper-v: Handle VSM hcalls " Nicolas Saenz Julienne
2023-11-28  7:28   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 13/33] KVM: Allow polling vCPUs for events Nicolas Saenz Julienne
2023-11-28  7:30   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 14/33] KVM: x86: Add VTL to the MMU role Nicolas Saenz Julienne
2023-11-08 17:26   ` Sean Christopherson
2023-11-10 18:52     ` Nicolas Saenz Julienne
2023-11-28  7:34       ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 15/33] KVM: x86/mmu: Introduce infrastructure to handle non-executable faults Nicolas Saenz Julienne
2023-11-28  7:34   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 16/33] KVM: x86/mmu: Expose R/W/X flags during memory fault exits Nicolas Saenz Julienne
2023-11-28  7:36   ` Maxim Levitsky
2023-11-28 16:31     ` Sean Christopherson
2023-11-08 11:17 ` [RFC 17/33] KVM: x86/mmu: Allow setting memory attributes if VSM enabled Nicolas Saenz Julienne
2023-11-28  7:39   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 18/33] KVM: x86: Decouple kvm_get_memory_attributes() from struct kvm's mem_attr_array Nicolas Saenz Julienne
2023-11-08 16:59   ` Sean Christopherson
2023-11-28  7:41   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 19/33] KVM: x86: Decouple kvm_range_has_memory_attributes() " Nicolas Saenz Julienne
2023-11-28  7:42   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 20/33] KVM: x86/mmu: Decouple hugepage_has_attrs() " Nicolas Saenz Julienne
2023-11-28  7:43   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 21/33] KVM: Pass memory attribute array as a MMU notifier argument Nicolas Saenz Julienne
2023-11-08 17:08   ` Sean Christopherson
2023-11-08 11:17 ` [RFC 22/33] KVM: Decouple kvm_ioctl_set_mem_attributes() from kvm's mem_attr_array Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 23/33] KVM: Expose memory attribute helper functions unanimously Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 24/33] KVM: x86: hyper-v: Introduce KVM VTL device Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 25/33] KVM: Introduce a set of new memory attributes Nicolas Saenz Julienne
2023-11-08 12:30   ` Alexander Graf
2023-11-08 16:43     ` Sean Christopherson
2023-11-08 11:17 ` [RFC 26/33] KVM: x86: hyper-vsm: Allow setting per-VTL " Nicolas Saenz Julienne
2023-11-28  7:44   ` Maxim Levitsky
2023-11-08 11:18 ` [RFC 27/33] KVM: x86/mmu/hyper-v: Validate memory faults against per-VTL memprots Nicolas Saenz Julienne
2023-11-28  7:46   ` Maxim Levitsky
2023-11-08 11:18 ` [RFC 28/33] x86/hyper-v: Introduce memory intercept message structure Nicolas Saenz Julienne
2023-11-28  7:53   ` Maxim Levitsky
2023-11-08 11:18 ` [RFC 29/33] KVM: VMX: Save instruction length on EPT violation Nicolas Saenz Julienne
2023-11-08 12:40   ` Alexander Graf
2023-11-08 16:15     ` Sean Christopherson
2023-11-08 17:11       ` Alexander Graf
2023-11-08 17:20   ` Sean Christopherson
2023-11-08 17:27     ` Alexander Graf
2023-11-08 18:19       ` Jim Mattson
2023-11-08 11:18 ` [RFC 30/33] KVM: x86: hyper-v: Introduce KVM_REQ_HV_INJECT_INTERCEPT request Nicolas Saenz Julienne
2023-11-08 12:45   ` Alexander Graf
2023-11-08 13:38     ` Nicolas Saenz Julienne
2023-11-28  8:19       ` Maxim Levitsky
2023-11-08 11:18 ` [RFC 31/33] KVM: x86: hyper-v: Inject intercept on VTL memory protection fault Nicolas Saenz Julienne
2023-11-08 11:18 ` [RFC 32/33] KVM: x86: hyper-v: Implement HVCALL_TRANSLATE_VIRTUAL_ADDRESS Nicolas Saenz Julienne
2023-11-08 12:49   ` Alexander Graf
2023-11-08 13:44     ` Nicolas Saenz Julienne
2023-11-08 11:18 ` [RFC 33/33] Documentation: KVM: Introduce "Emulating Hyper-V VSM with KVM" Nicolas Saenz Julienne
2023-11-28  8:19   ` Maxim Levitsky
2023-11-08 11:40 ` [RFC 0/33] KVM: x86: hyperv: Introduce VSM support Alexander Graf
2023-11-08 14:41   ` Nicolas Saenz Julienne
2023-11-08 16:55 ` Sean Christopherson
2023-11-08 18:33   ` Sean Christopherson
2023-11-10 17:56     ` Nicolas Saenz Julienne [this message]
2023-11-10 19:32       ` Sean Christopherson
2023-11-11 11:55         ` Nicolas Saenz Julienne
2023-11-10 19:04   ` Nicolas Saenz Julienne
