All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <jbeulich@suse.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: "George Dunlap" <george.dunlap@citrix.com>,
	"Ian Jackson" <iwj@xenproject.org>,
	"Julien Grall" <julien@xen.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Wei Liu" <wl@xen.org>, "Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v3 03/22] x86/xstate: re-size save area when CPUID policy changes
Date: Mon, 3 May 2021 14:57:15 +0100	[thread overview]
Message-ID: <5a954be8-e213-36d8-27da-4c51243dc280@citrix.com> (raw)
In-Reply-To: <8ba8f016-0aed-277b-bbea-80022d057791@suse.com>

On 22/04/2021 15:44, Jan Beulich wrote:
> vCPU-s get maximum size areas allocated initially. Hidden (and in
> particular default-off) features may allow for a smaller size area to
> suffice.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Use 1ul instead of 1ull. Re-base.
> ---
> This could be further shrunk if we used XSAVEC / if we really used
> XSAVES, as then we don't need to also cover the holes. But since we
> currently use neither of the two in reality, this would require more
> work than just adding the alternative size calculation here.
>
> Seeing that both vcpu_init_fpu() and cpuid_policy_updated() get called
> from arch_vcpu_create(), I'm not sure we really need this two-stage
> approach - the slightly longer period of time during which
> v->arch.xsave_area would remain NULL doesn't look all that problematic.
> But since xstate_alloc_save_area() gets called for idle vCPU-s, it has
> to stay anyway in some form, so the extra code churn may not be worth
> it.
>
> Instead of cpuid_policy_xcr0_max(), cpuid_policy_xstates() may be the
> interface to use here. But it remains to be determined whether the
> xcr0_accum field is meant to be inclusive of XSS (in which case it would
> better be renamed) or exclusive. Right now there's no difference as we
> don't support any XSS-controlled features.

I've been figuring out what we need to for supervisors states.  The
current code is not in a good shape, but I also think some of the
changes in this series take us in an unhelpful direction.

I've got a cleanup series which I will post shortly.  It interacts
texturally although not fundamentally with this series, but does fix
several issues.

For supervisor states, we need use XSAVES unilaterally, even for PV. 
This is because XSS_CET_S needs to be the HVM kernel's context, or Xen's
in PV context (specifically, MSR_PL0_SSP which is the shstk equivalent
of TSS.RSP0).


A consequence is that Xen's data handling shall use the compressed
format, and include supervisor states.  (While in principle we could
manage CET_S, CET_U, and potentially PT when vmtrace gets expanded, each
WRMSR there is a similar order of magnitude to an XSAVES/XRSTORS
instruction.)

I'm planning a host xss setting, similar to mmu_cr4_features, which
shall be the setting in context for everything other than HVM vcpus
(which need the guest setting in context, and/or the VT-x bodge to
support host-only states).  Amongst other things, all context switch
paths, including from-HVM, need to step XSS up to the host setting to
let XSAVES function correctly.

However, a consequence of this is that the size of the xsave area needs
deriving from host, as well as guest-max state.  i.e. even if some VMs
aren't using CET, we still need space in the xsave areas to function
correctly when a single VM is using CET.

Another consequence is that we need to rethink our hypercall behaviour. 
There is no such thing as supervisor states in an uncompressed XSAVE
image, which means we can't continue with that being the ABI.

I've also found some substantial issues with how we handle
xcr0/xcr0_accum and plan to address these.  There is no such thing as
xcr0 without the bottom bit set, ever, and xcr0_accum needs to default
to X87|SSE seeing as that's how we use it anyway.  However, in a context
switch, I expect we'll still be using xcr0_accum | host_xss when it
comes to the context switch path.

In terms of actual context switching, we want to be using XSAVES/XRSTORS
whenever it is available, even if we're not using supervisor states. 
XSAVES has both the inuse and modified optimisations, without the broken
consequence of XSAVEOPT (which is firmly in the "don't ever use this"
bucket now).

There's no point ever using XSAVEC.  There is no hardware where it
exists in the absence of XSAVES, and can't even in theoretical
circumstances due to (perhaps unintentional) linkage of the CPUID data. 
XSAVEC also doesn't use the modified optimisation, and is therefore
strictly worse than XSAVES, even when MSR_XSS is 0.

Therefore, our choice of instruction wants to be XSAVES, or XSAVE, or
FXSAVE, depending on hardware capability.

~Andrew



  reply	other threads:[~2021-05-03 13:57 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22 14:38 [PATCH v3 00/22] xvmalloc() / x86 xstate area / x86 CPUID / AMX+XFD Jan Beulich
2021-04-22 14:43 ` [PATCH v3 01/22] mm: introduce xvmalloc() et al and use for grant table allocations Jan Beulich
2021-05-03 11:31   ` Roger Pau Monné
2021-05-03 13:50     ` Jan Beulich
2021-05-03 14:54       ` Roger Pau Monné
2021-05-03 15:21         ` Jan Beulich
2021-05-03 16:39           ` Roger Pau Monné
2021-04-22 14:44 ` [PATCH v3 02/22] x86/xstate: use xvzalloc() for save area allocation Jan Beulich
2021-05-05 13:29   ` Roger Pau Monné
2021-04-22 14:44 ` [PATCH v3 03/22] x86/xstate: re-size save area when CPUID policy changes Jan Beulich
2021-05-03 13:57   ` Andrew Cooper [this message]
2021-05-03 14:22     ` Jan Beulich
2021-05-11 16:41       ` Andrew Cooper
2021-05-17  7:33         ` Jan Beulich
2021-04-22 14:45 ` [PATCH v3 04/22] x86/xstate: re-use valid_xcr0() for boot-time checks Jan Beulich
2021-05-03 11:53   ` Andrew Cooper
2021-04-22 14:45 ` [PATCH v3 05/22] x86/xstate: drop xstate_offsets[] and xstate_sizes[] Jan Beulich
2021-05-03 16:10   ` Andrew Cooper
2021-05-04  7:57     ` Jan Beulich
2021-04-22 14:46 ` [PATCH v3 06/22] x86/xstate: replace xsave_cntxt_size and drop XCNTXT_MASK Jan Beulich
2021-04-22 14:47 ` [PATCH v3 07/22] x86/xstate: avoid accounting for unsupported components Jan Beulich
2021-04-22 14:47 ` [PATCH v3 08/22] x86: use xvmalloc() for extended context buffer allocations Jan Beulich
2021-04-22 14:48 ` [PATCH v3 09/22] x86/xstate: enable AMX components Jan Beulich
2021-04-22 14:50 ` [PATCH v3 10/22] x86/CPUID: adjust extended leaves out of range clearing Jan Beulich
2021-04-22 14:50 ` [PATCH v3 11/22] x86/CPUID: move bounding of max_{,sub}leaf fields to library code Jan Beulich
2021-04-22 14:51 ` [PATCH v3 12/22] x86/CPUID: enable AMX leaves Jan Beulich
2021-04-22 14:52 ` [PATCH v3 13/22] x86: XFD enabling Jan Beulich
2021-04-22 14:53 ` [PATCH v3 14/22] x86emul: introduce X86EMUL_FPU_{tilecfg,tile} Jan Beulich
2021-04-22 14:53 ` [PATCH v3 15/22] x86emul: support TILERELEASE Jan Beulich
2021-04-22 14:53 ` [PATCH v3 16/22] x86: introduce struct for TILECFG register Jan Beulich
2021-04-22 14:54 ` [PATCH v3 17/22] x86emul: support {LD,ST}TILECFG Jan Beulich
2021-04-22 14:55 ` [PATCH v3 18/22] x86emul: support TILEZERO Jan Beulich
2021-04-22 14:55 ` [PATCH v3 19/22] x86emul: support TILELOADD{,T1} and TILESTORE Jan Beulich
2021-04-22 15:06   ` Jan Beulich
2021-04-22 15:11     ` Jan Beulich
2021-04-26  7:12       ` Paul Durrant
2021-04-29  9:40         ` Jan Beulich
2021-04-22 14:56 ` [PATCH v3 20/22] x86emul: support tile multiplication insns Jan Beulich
2021-04-22 14:57 ` [PATCH v3 21/22] x86emul: test AMX insns Jan Beulich
2021-04-22 14:57 ` [PATCH v3 22/22] x86: permit guests to use AMX and XFD Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a954be8-e213-36d8-27da-4c51243dc280@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=iwj@xenproject.org \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.