From: Yang Weijiang <weijiang.yang@intel.com>
To: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, yuan.yao@linux.intel.com
Cc: peterz@infradead.org, chao.gao@intel.com,
	rick.p.edgecombe@intel.com, mlevitsk@redhat.com,
	john.allen@amd.com, weijiang.yang@intel.com
Subject: [PATCH v9 01/27] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm
Date: Tue, 23 Jan 2024 18:41:34 -0800
Message-ID: <20240124024200.102792-2-weijiang.yang@intel.com>
In-Reply-To: <20240124024200.102792-1-weijiang.yang@intel.com>

From: Sean Christopherson <seanjc@google.com>

When granting userspace or a KVM guest access to an xfeature, preserve the
entity's existing supervisor and software-defined permissions as tracked
by __state_perm, i.e. use __state_perm to track *all* permissions even
though all supported supervisor xfeatures are granted to all FPUs and
FPU_GUEST_PERM_LOCKED disallows changing permissions.

Effectively clobbering supervisor permissions results in inconsistent
behavior, as xstate_get_group_perm() will report supervisor features for
processes that do NOT request access to dynamic user xfeatures, whereas any
and all supervisor features will be absent from the set of permissions for
any process that is granted access to one or more dynamic xfeatures (which
right now means AMX).

The inconsistency isn't problematic because fpu_xstate_prctl() already
strips out everything except user xfeatures:

        case ARCH_GET_XCOMP_PERM:
                /*
                 * Lockless snapshot as it can also change right after the
                 * dropping the lock.
                 */
                permitted = xstate_get_host_group_perm();
                permitted &= XFEATURE_MASK_USER_SUPPORTED;
                return put_user(permitted, uptr);

        case ARCH_GET_XCOMP_GUEST_PERM:
                permitted = xstate_get_guest_group_perm();
                permitted &= XFEATURE_MASK_USER_SUPPORTED;
                return put_user(permitted, uptr);

and similarly KVM doesn't apply the __state_perm to supervisor states
(kvm_get_filtered_xcr0() incorporates xstate_get_guest_group_perm()):

        case 0xd: {
                u64 permitted_xcr0 = kvm_get_filtered_xcr0();
                u64 permitted_xss = kvm_caps.supported_xss;

But if KVM in particular were to ever change, dropping supervisor
permissions would result in subtle bugs in KVM's reporting of supported
CPUID settings.  And the above behavior also means that having supervisor
xfeatures in __state_perm is correctly handled by all users.
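
As a rough userspace illustration of why the read-side filtering hides the
inconsistency (this is not kernel code; the DEMO_* bit positions and the
demo_reported_perm() helper are made up for the sketch), masking with the
user-supported set yields the same answer whether or not supervisor bits are
present in the stored permissions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only, not the kernel's xfeature numbering. */
#define DEMO_BIT(n)		(UINT64_C(1) << (n))
#define DEMO_USER_SUPPORTED	(DEMO_BIT(0) | DEMO_BIT(1) | DEMO_BIT(18))
#define DEMO_SUPERVISOR		DEMO_BIT(10)

/*
 * Hypothetical stand-in for the ARCH_GET_XCOMP_PERM path quoted above:
 * take a snapshot of the stored permissions and strip everything except
 * user xfeatures before reporting.
 */
static uint64_t demo_reported_perm(uint64_t state_perm)
{
	return state_perm & DEMO_USER_SUPPORTED;
}
```

With these definitions, demo_reported_perm(DEMO_USER_SUPPORTED |
DEMO_SUPERVISOR) and demo_reported_perm(DEMO_USER_SUPPORTED) return the same
value, which is why keeping supervisor bits in the stored mask is harmless
for such callers.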

Dropping supervisor permissions also creates another landmine for KVM.  If
more dynamic user xfeatures are ever added, requesting access to multiple
xfeatures in separate ARCH_REQ_XCOMP_GUEST_PERM calls will result in the
second invocation of __xstate_request_perm() computing the wrong ksize, as
the mask passed to xstate_calculate_size() would not contain *any*
supervisor features.
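
The two-request scenario can be modeled in userspace roughly as follows.
This is only a sketch under stated assumptions: DEMO_DYN_B is a hypothetical
second dynamic xfeature, the DEMO_* bit positions are arbitrary, and
demo_calculate_size() is a toy that charges 64 bytes per set bit rather than
anything resembling xstate_calculate_size():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions only, not the kernel's xfeature numbering. */
#define DEMO_BIT(n)		(UINT64_C(1) << (n))
#define DEMO_USER_BASE		(DEMO_BIT(0) | DEMO_BIT(1))
#define DEMO_SUPERVISOR		DEMO_BIT(10)
#define DEMO_DYN_A		DEMO_BIT(18)
#define DEMO_DYN_B		DEMO_BIT(19)	/* hypothetical future xfeature */
#define DEMO_USER_SUPPORTED	(DEMO_USER_BASE | DEMO_DYN_A | DEMO_DYN_B)

struct demo_perm {
	uint64_t state_perm;	/* models __state_perm */
	size_t ksize;		/* models the computed kernel state size */
};

/* Toy size model: every enabled feature costs 64 bytes. */
static size_t demo_calculate_size(uint64_t mask)
{
	size_t size = 0;

	for (; mask; mask &= mask - 1)
		size += 64;
	return size;
}

/* Models the guest path before this change: mask is narrowed, then stored. */
static void demo_guest_request_old(struct demo_perm *perm, uint64_t requested)
{
	uint64_t mask = perm->state_perm | requested;

	perm->ksize = demo_calculate_size(mask);
	mask &= DEMO_USER_SUPPORTED;	/* supervisor bits are lost here */
	perm->state_perm = mask;
}

/* Models the guest path after this change: the full mask is stored. */
static void demo_guest_request_new(struct demo_perm *perm, uint64_t requested)
{
	uint64_t mask = perm->state_perm | requested;

	perm->ksize = demo_calculate_size(mask);
	perm->state_perm = mask;
}
```

Starting from DEMO_USER_BASE | DEMO_SUPERVISOR and requesting DEMO_DYN_A and
then DEMO_DYN_B, the old model's second call sizes a mask of four features
(256 bytes in this toy) because the supervisor bit was dropped by the first
call, whereas the new model sizes all five (320 bytes).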

Commit 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE
permissions") fudged around the size issue for userspace FPUs, but for
reasons unknown skipped guest FPUs.  Lack of a fix for KVM "works" only
because KVM doesn't yet support virtualizing features that have supervisor
xfeatures, i.e. as of today, KVM guest FPUs will never need the relevant
xfeatures.

Simply extending the hack-a-fix for guests would temporarily solve the
ksize issue, but wouldn't address the inconsistency issue and would leave
another lurking pitfall for KVM.  KVM support for virtualizing CET will
likely add CET_KERNEL as a guest-only xfeature, i.e. CET_KERNEL will not
be set in xfeatures_mask_supervisor() and would again be dropped when
granting access to dynamic xfeatures.

Note, the existing clobbering behavior is rather subtle.  The @permitted
parameter to __xstate_request_perm() comes from:

	permitted = xstate_get_group_perm(guest);

which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm,
where __state_perm is initialized to:

        fpu->perm.__state_perm          = fpu_kernel_cfg.default_features;

and copied to the guest side of things:

	/* Same defaults for guests */
	fpu->guest_perm = fpu->perm;

fpu_kernel_cfg.default_features contains everything except the dynamic
xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA:

        fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
        fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;

When __xstate_request_perm() restricts the local "mask" variable to
compute the user state size:

	mask &= XFEATURE_MASK_USER_SUPPORTED;
	usize = xstate_calculate_size(mask, false);

it subtly overwrites the target __state_perm with "mask" containing only
user xfeatures:

	perm = guest ? &fpu->guest_perm : &fpu->perm;
	/* Pairs with the READ_ONCE() in xstate_get_group_perm() */
	WRITE_ONCE(perm->__state_perm, mask);
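
Reduced to its essence, the difference between the old and new flows is
whether the user-only narrowing is applied to the variable that is later
stored.  The following userspace sketch shows just that; the DEMO_* values
and the demo_request_perm_*() helpers are invented for illustration and omit
the size checks, locking and error handling of the real function:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only, not the kernel's xfeature numbering. */
#define DEMO_BIT(n)		(UINT64_C(1) << (n))
#define DEMO_USER_BASE		(DEMO_BIT(0) | DEMO_BIT(1))
#define DEMO_USER_DYNAMIC	DEMO_BIT(18)
#define DEMO_SUPERVISOR		DEMO_BIT(10)
#define DEMO_USER_SUPPORTED	(DEMO_USER_BASE | DEMO_USER_DYNAMIC)

/* Old flow: "mask" is narrowed in place, and the narrowed value is stored. */
static uint64_t demo_request_perm_old(uint64_t permitted, uint64_t requested)
{
	uint64_t mask = permitted | requested;

	mask &= DEMO_USER_SUPPORTED;	/* meant only for the user size */
	return mask;			/* value written to __state_perm */
}

/* New flow: the narrowing happens on a temporary, "mask" stays intact. */
static uint64_t demo_request_perm_new(uint64_t permitted, uint64_t requested)
{
	uint64_t mask = permitted | requested;
	uint64_t user_mask = mask & DEMO_USER_SUPPORTED;

	(void)user_mask;		/* would feed the user size calculation */
	return mask;			/* value written to __state_perm */
}
```

Given permitted = DEMO_USER_BASE | DEMO_SUPERVISOR and requested =
DEMO_USER_DYNAMIC, the old helper returns a mask without DEMO_SUPERVISOR,
while the new helper returns all three groups.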

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Weijiang Yang <weijiang.yang@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chao Gao <chao.gao@intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: John Allen <john.allen@amd.com>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/all/ZTqgzZl-reO1m01I@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kernel/fpu/xstate.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 117e74c44e75..07911532b108 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1601,16 +1601,20 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
 	if ((permitted & requested) == requested)
 		return 0;
 
-	/* Calculate the resulting kernel state size */
+	/*
+	 * Calculate the resulting kernel state size.  Note, @permitted also
+	 * contains supervisor xfeatures even though supervisor xfeatures are
+	 * always permitted for kernel and guest FPUs, and never permitted
+	 * for user FPUs.
+	 */
 	mask = permitted | requested;
-	/* Take supervisor states into account on the host */
-	if (!guest)
-		mask |= xfeatures_mask_supervisor();
 	ksize = xstate_calculate_size(mask, compacted);
 
-	/* Calculate the resulting user state size */
-	mask &= XFEATURE_MASK_USER_SUPPORTED;
-	usize = xstate_calculate_size(mask, false);
+	/*
+	 * Calculate the resulting user state size.  Take care not to clobber
+	 * the supervisor xfeatures in the new mask!
+	 */
+	usize = xstate_calculate_size(mask & XFEATURE_MASK_USER_SUPPORTED, false);
 
 	if (!guest) {
 		ret = validate_sigaltstack(usize);
-- 
2.39.3



Thread overview: 55+ messages
2024-01-24  2:41 [PATCH v9 00/27] Enable CET Virtualization Yang Weijiang
2024-01-24  2:41 ` Yang Weijiang [this message]
2024-01-30  1:29   ` [PATCH v9 01/27] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Edgecombe, Rick P
2024-01-24  2:41 ` [PATCH v9 02/27] x86/fpu/xstate: Refine CET user xstate bit enabling Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 03/27] x86/fpu/xstate: Add CET supervisor mode state support Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 04/27] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 05/27] x86/fpu/xstate: Introduce fpu_guest_cfg for guest FPU configuration Yang Weijiang
2024-01-30  1:29   ` Edgecombe, Rick P
2024-01-30 15:00     ` Yang, Weijiang
2024-01-24  2:41 ` [PATCH v9 06/27] x86/fpu/xstate: Create guest fpstate with guest specific config Yang Weijiang
2024-01-30  1:38   ` Edgecombe, Rick P
2024-01-30 14:54     ` Yang, Weijiang
2024-01-24  2:41 ` [PATCH v9 07/27] x86/fpu/xstate: Warn if kernel dynamic xfeatures detected in normal fpstate Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 08/27] KVM: x86: Rework cpuid_get_supported_xcr0() to operate on vCPU data Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 09/27] KVM: x86: Rename kvm_{g,s}et_msr() to menifest emulation operations Yang Weijiang
2024-01-25  3:43   ` Chao Gao
2024-01-24  2:41 ` [PATCH v9 10/27] KVM: x86: Refine xsave-managed guest register/MSR reset handling Yang Weijiang
2024-01-25 10:17   ` Chao Gao
2024-01-26  9:13     ` Yang, Weijiang
2024-01-24  2:41 ` [PATCH v9 11/27] KVM: x86: Add kvm_msr_{read,write}() helpers Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 12/27] KVM: x86: Report XSS as to-be-saved if there are supported features Yang Weijiang
2024-01-25 10:37   ` Chao Gao
2024-01-24  2:41 ` [PATCH v9 13/27] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Yang Weijiang
2024-01-25 10:57   ` Chao Gao
2024-01-26  9:30     ` Yang, Weijiang
2024-01-24  2:41 ` [PATCH v9 14/27] KVM: x86: Initialize kvm_caps.supported_xss Yang Weijiang
2024-01-26  1:35   ` Chao Gao
2024-01-24  2:41 ` [PATCH v9 15/27] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 16/27] KVM: x86: Add fault checks for guest CR4.CET setting Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 17/27] KVM: x86: Report KVM supported CET MSRs as to-be-saved Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 18/27] KVM: VMX: Introduce CET VMCS fields and control bits Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 19/27] KVM: x86: Use KVM-governed feature framework to track "SHSTK/IBT enabled" Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 20/27] KVM: VMX: Emulate read and write to CET MSRs Yang Weijiang
2024-01-24  2:41 ` [PATCH v9 21/27] KVM: x86: Save and reload SSP to/from SMRAM Yang Weijiang
2024-01-26  3:17   ` Chao Gao
2024-01-26  6:51     ` Chao Gao
2024-01-24  2:41 ` [PATCH v9 22/27] KVM: VMX: Set up interception for CET MSRs Yang Weijiang
2024-01-26  3:54   ` Chao Gao
2024-01-26  9:36     ` Yang, Weijiang
2024-01-24  2:41 ` [PATCH v9 23/27] KVM: VMX: Set host constant supervisor states to VMCS fields Yang Weijiang
2024-01-26  6:31   ` Chao Gao
2024-01-26  9:37     ` Yang, Weijiang
2024-01-24  2:41 ` [PATCH v9 24/27] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
2024-01-26  7:50   ` Chao Gao
2024-01-26 12:54     ` Yang, Weijiang
2024-01-24  2:41 ` [PATCH v9 25/27] KVM: nVMX: Introduce new VMX_BASIC bit for event error_code delivery to L1 Yang Weijiang
2024-01-26  7:54   ` Chao Gao
2024-01-24  2:41 ` [PATCH v9 26/27] KVM: nVMX: Enable CET support for nested guest Yang Weijiang
2024-01-29  7:04   ` Chao Gao
2024-01-30  7:38     ` Yang, Weijiang
2024-01-24  2:42 ` [PATCH v9 27/27] KVM: x86: Stop emulating for CET protected branch instructions Yang Weijiang
2024-01-26  8:53   ` Chao Gao
2024-01-26 12:56     ` Yang, Weijiang
2024-01-30  1:40 ` [PATCH v9 00/27] Enable CET Virtualization Edgecombe, Rick P
2024-01-30 15:05   ` Yang, Weijiang
