linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: luto@kernel.org, tglx@linutronix.de, mingo@kernel.org,
	x86@kernel.org, len.brown@intel.com, dave.hansen@intel.com,
	thiago.macieira@intel.com, jing2.liu@intel.com,
	ravi.v.shankar@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 12/26] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state
Date: Wed, 18 Aug 2021 18:24:51 +0200	[thread overview]
Message-ID: <YR00U19168BGoRB9@zn.tnic> (raw)
In-Reply-To: <20210730145957.7927-13-chang.seok.bae@intel.com>

On Fri, Jul 30, 2021 at 07:59:43AM -0700, Chang S. Bae wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d0ce5cfd3ac1..37150b7a8e44 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -277,6 +277,7 @@
>  #define X86_FEATURE_XSAVEC		(10*32+ 1) /* XSAVEC instruction */
>  #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
>  #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
> +#define X86_FEATURE_XFD			(10*32+ 4) /* eXtended Feature Disabling */
							     ^
							     |

Add "" at the marker above - it doesn't look like we wanna show "xfd" in
/proc/cpuinfo.

>   * Extended auxiliary flags: Linux defined - for features scattered in various
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 263e349ff85a..e3590cf55325 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -535,14 +535,55 @@ static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
>   * Misc helper functions:
>   */
>  
> +/* The Extended Feature Disable (XFD) helpers: */
> +
> +static inline void xfd_write(u64 value)
> +{
> +	wrmsrl_safe(MSR_IA32_XFD, value);
> +}
> +
> +static inline u64 xfd_read(void)
> +{
> +	u64 value;
> +
> +	rdmsrl_safe(MSR_IA32_XFD, &value);
> +	return value;
> +}

Those look useless. Imagine we had to add wrappers around *every* MSR we
touch in the kernel...

> +
> +static inline u64 xfd_capable(void)
> +{
> +	return xfeatures_mask_user_dynamic;
> +}

A small helper which returns an u64 but is used in boolean context?

Also, this name is wrong: XFD capable is a system which has
X86_FEATURE_XFD set. You should simply use xfeatures_mask_user_dynamic
everywhere since it is __ro_after_init.

> +/**
> + * xfd_switch - Switches the MSR IA32_XFD context if needed.
> + * @prev:	The previous task's struct fpu pointer
> + * @next:	The next task's struct fpu pointer
> + */
> +static inline void xfd_switch(struct fpu *prev, struct fpu *next)
> +{
> +	u64 prev_xfd_mask, next_xfd_mask;
> +
> +	if (!static_cpu_has(X86_FEATURE_XFD) || !xfd_capable())

cpu_feature_enabled(). Use that everywhere in your patchset. But you
know already...

> +		return;
> +
> +	prev_xfd_mask = prev->state_mask & xfd_capable();
> +	next_xfd_mask = next->state_mask & xfd_capable();

This is just plain misleading:

You're *AND*ing a mask with xfd_capable?!?

Just use xfeatures_mask_user_dynamic directly instead, as already
mentioned.

> +	if (unlikely(prev_xfd_mask != next_xfd_mask))
> +		xfd_write(xfd_capable() ^ next_xfd_mask);
> +}

Here too.

Also, I must be missing something. Let's play with some imaginary masks:

prev->state_mask = 110b
next->state_mask = 111b
dyn		 = 101b

("dyn" is short for xfeatures_mask_user_dynamic)

prev_xfd_mask = 100b
next_xfd_mask = 101b

if (unlikely(100b != 101b))
	xfd_write(101b ^ 101b) == xfd_write(0)

so next has bits 2 and 0 set but the xfd write zaps them so next won't
get any more #NMs for those states.

Why?

>  /*
>   * Delay loading of the complete FPU state until the return to userland.
>   * PKRU is handled separately.
>   */
> -static inline void switch_fpu_finish(struct fpu *new_fpu)
> +static inline void switch_fpu_finish(struct fpu *old_fpu, struct fpu *new_fpu)
>  {
> -	if (cpu_feature_enabled(X86_FEATURE_FPU))
> +	if (cpu_feature_enabled(X86_FEATURE_FPU)) {
>  		set_thread_flag(TIF_NEED_FPU_LOAD);
> +		xfd_switch(old_fpu, new_fpu);
> +	}
>  }
>  
>  #endif /* _ASM_X86_FPU_INTERNAL_H */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index a7c413432b33..eac0cfd9210b 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -626,6 +626,8 @@
>  #define MSR_IA32_BNDCFGS_RSVD		0x00000ffc
>  
>  #define MSR_IA32_XSS			0x00000da0
> +#define MSR_IA32_XFD			0x000001c4
> +#define MSR_IA32_XFD_ERR		0x000001c5

At least try to keep those numerically sorted, at least among the
architectural MSR_IA32_ ones. That is, provided those XFD things are
architectural...

>  #define MSR_IA32_APICBASE		0x0000001b
>  #define MSR_IA32_APICBASE_BSP		(1<<8)
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index defda61f372d..7f891d2eb52e 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -75,6 +75,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>  	{ X86_FEATURE_SGX_LC,			X86_FEATURE_SGX	      },
>  	{ X86_FEATURE_SGX1,			X86_FEATURE_SGX       },
>  	{ X86_FEATURE_SGX2,			X86_FEATURE_SGX1      },
> +	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVE     },
>  	{}
>  };
>  
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 3b56e7612c45..c6ff0575d87d 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -182,6 +182,26 @@ static bool xfeature_is_supervisor(int xfeature_nr)
>  	return ecx & 1;
>  }
>  
> +/**
> + * xfd_supported - Check if the feature supports Extended Feature Disable (XFD).
> + * @feature_nr:	The feature number.
> + *
> + * Returns:	True if supported; otherwise, false.
> + */
> +static bool xfd_supported(int feature_nr)

xfeature_supports_xfd()

> +{
> +	u32 eax, ebx, ecx, edx;
> +
> +	if (!boot_cpu_has(X86_FEATURE_XFD))
> +		return false;
> +
> +	/*
> +	 * If state component 'i' supports it, ECX[2] return 1; otherwise, 0.
> +	 */
> +	cpuid_count(XSTATE_CPUID, feature_nr, &eax, &ebx, &ecx, &edx);
> +	return ecx & 4;
> +}
> +
>  /**
>   * get_xstate_comp_offset - Find the feature's offset in the compacted
>   *			    format.
> @@ -274,6 +294,9 @@ void fpu__init_cpu_xstate(void)
>  		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor() |
>  				     xfeatures_mask_independent());
>  	}
> +
> +	if (boot_cpu_has(X86_FEATURE_XFD))
> +		xfd_write(xfd_capable());
>  }
>  
>  static bool xfeature_enabled(enum xfeature xfeature)
> @@ -473,8 +496,9 @@ static void __init print_xstate_offset_size(void)
>  	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
>  		if (!xfeature_enabled(i))
>  			continue;
> -		pr_info("x86/fpu: xstate_offset[%d]: %4d, xstate_sizes[%d]: %4d\n",
> -			 i, xstate_comp_offsets[i], i, xstate_sizes[i]);
> +		pr_info("x86/fpu: xstate_offset[%d]: %4d, xstate_sizes[%d]: %4d (%s)\n",
> +			i, xstate_comp_offsets[i], i, xstate_sizes[i],
> +			(xfeatures_mask_user_dynamic & BIT_ULL(i)) ? "dynamic" : "default");

Make that

			(xfeatures_mask_user_dynamic & BIT_ULL(i)) ? "(dynamic)" : "");

> @@ -920,6 +944,16 @@ void __init fpu__init_system_xstate(void)
>  	/* Do not support the dynamically allocated buffer yet. */
>  	xfeatures_mask_user_dynamic = 0;
>  
> +	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
> +		u64 feature_mask = BIT_ULL(i);
> +
> +		if (!(xfeatures_mask_uabi() & feature_mask))
> +			continue;
> +
> +		if (xfd_supported(i))
> +			xfeatures_mask_user_dynamic |= feature_mask;
> +	}
> +
>  	/* Enable xstate instructions to be able to continue with initialization: */
>  	fpu__init_cpu_xstate();
>  	err = init_xstate_size();
> @@ -981,6 +1015,12 @@ void fpu__resume_cpu(void)
>  		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor()  |
>  				     xfeatures_mask_independent());
>  	}
> +
> +	if (boot_cpu_has(X86_FEATURE_XFD)) {
> +		u64 fpu_xfd_mask = current->thread.fpu.state_mask & xfd_capable();
> +
> +		xfd_write(xfd_capable() ^ fpu_xfd_mask);
> +	}
>  }
>  
>  /**
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 534b9fb7e7ee..b85fa499f195 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -97,6 +97,12 @@ void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
>  	*size = get_xstate_config(XSTATE_MIN_SIZE);
>  }
>  
> +void arch_release_task_struct(struct task_struct *task)
> +{
> +	if (cpu_feature_enabled(X86_FEATURE_FPU))
> +		free_xstate_buffer(&task->thread.fpu);
> +}
> +
>  /*
>   * Free thread data structures etc..
>   */
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 4f2f54e1281c..7bd5d08eeb41 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -213,7 +213,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  
>  	this_cpu_write(current_task, next_p);
>  
> -	switch_fpu_finish(next_fpu);
> +	switch_fpu_finish(prev_fpu, next_fpu);
>  
>  	/* Load the Intel cache allocation PQR MSR. */
>  	resctrl_sched_in();
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index ec0d836a13b1..41c9855158d6 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -620,7 +620,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  	this_cpu_write(current_task, next_p);
>  	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
>  
> -	switch_fpu_finish(next_fpu);
> +	switch_fpu_finish(prev_fpu, next_fpu);
>  
>  	/* Reload sp0. */
>  	update_task_stack(next_p);
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index a58800973aed..dd66d528afd8 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1112,6 +1112,45 @@ DEFINE_IDTENTRY(exc_device_not_available)
>  {
>  	unsigned long cr0 = read_cr0();
>  
> +	if (boot_cpu_has(X86_FEATURE_XFD)) {

This whole thing wants to be in a separate function. Even the
indentation level is begging for it.

> +		u64 xfd_err;
> +
> +		rdmsrl_safe(MSR_IA32_XFD_ERR, &xfd_err);
> +		wrmsrl_safe(MSR_IA32_XFD_ERR, 0);
> +
> +		if (xfd_err) {
> +			u64 xfd_event = xfd_err & xfd_capable();
> +
> +			if (WARN_ON(!xfd_event)) {
> +				/*
> +				 * Unexpected event is raised. But update XFD state to
> +				 * unblock the task.
> +				 */
> +				xfd_write(xfd_read() & ~xfd_err);

So AFAIU, xfd_err points to some other feature which caused this
exception.

So if that feature bit is set in XFD, you're clearing it here. Why?

So that it doesn't raise that #NM for it anymore?

This looks weird.

> +			} else {
> +				struct fpu *fpu = &current->thread.fpu;
> +				int err = -1;
> +
> +				/*
> +				 * Make sure not in interrupt context as handling a
> +				 * trap from userspace.
> +				 */
> +				if (!WARN_ON(in_interrupt())) {

I'm guessing that's supposed to stop people from using AMX and other
dynamic states in the kernel?

> +					err = alloc_xstate_buffer(fpu, xfd_event);
> +					if (!err)
> +						xfd_write((fpu->state_mask & xfd_capable()) ^
> +							  xfd_capable());
> +				}
> +
> +				/* Raise a signal when it failed to handle. */
> +				if (err)
> +					force_sig_fault(SIGILL, ILL_ILLOPC,
> +							error_get_trap_addr(regs));

Where is it documented that that configuration of SIG* types means,
failure to allocate the dynamic buffer?

To the general picture: why is this thing even allocating a buffer in #NM?

Why isn't the buffer pre-allocated for the process after latter having
done prctl() so that when an #NM happens, no allocation happens at all?

And with those buffers preallocated, all that XFD muck is not needed
either.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

  reply	other threads:[~2021-08-18 16:24 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-30 14:59 [PATCH v9 00/26] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 01/26] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 02/26] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 03/26] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 04/26] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 05/26] x86/fpu/xstate: Add new variables to indicate dynamic XSTATE buffer size Chang S. Bae
2021-08-12 15:03   ` Borislav Petkov
2021-07-30 14:59 ` [PATCH v9 06/26] x86/fpu/xstate: Calculate and remember dynamic XSTATE buffer sizes Chang S. Bae
2021-08-12 16:36   ` Borislav Petkov
2021-07-30 14:59 ` [PATCH v9 07/26] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
2021-08-12 17:09   ` Borislav Petkov
2021-07-30 14:59 ` [PATCH v9 08/26] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically Chang S. Bae
2021-08-12 19:44   ` Borislav Petkov
2021-08-13  8:04     ` Bae, Chang Seok
2021-08-13 10:04       ` Borislav Petkov
2021-08-13 19:43         ` Bae, Chang Seok
2021-08-18  9:28           ` Borislav Petkov
2021-08-18 19:46             ` Bae, Chang Seok
2021-08-25 16:01               ` Bae, Chang Seok
2021-08-30 17:07               ` Borislav Petkov
2021-08-30 23:39                 ` Bae, Chang Seok
2021-08-16 18:33     ` Bae, Chang Seok
2021-08-16 18:53       ` Borislav Petkov
2021-08-30 17:45   ` Dave Hansen
2021-08-30 23:39     ` Bae, Chang Seok
2021-07-30 14:59 ` [PATCH v9 09/26] x86/fpu/xstate: Update the XSTATE save function to support dynamic states Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 10/26] x86/fpu/xstate: Update the XSTATE buffer address finder " Chang S. Bae
2021-08-18 11:33   ` Borislav Petkov
2021-08-18 19:47     ` Bae, Chang Seok
2021-08-30 17:18       ` Borislav Petkov
2021-08-30 23:38         ` Bae, Chang Seok
2021-07-30 14:59 ` [PATCH v9 11/26] x86/fpu/xstate: Update the XSTATE context copy function " Chang S. Bae
2021-08-18 12:03   ` Borislav Petkov
2021-08-18 19:47     ` Bae, Chang Seok
2021-07-30 14:59 ` [PATCH v9 12/26] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state Chang S. Bae
2021-08-18 16:24   ` Borislav Petkov [this message]
2021-08-18 17:20     ` Thiago Macieira
2021-08-18 17:46       ` Borislav Petkov
2021-08-18 17:58         ` Thiago Macieira
2021-08-18 18:10           ` Borislav Petkov
2021-08-24 22:51             ` Len Brown
2021-08-18 20:43         ` Bae, Chang Seok
2021-08-18 21:04           ` Thiago Macieira
2021-08-18 21:12             ` Bae, Chang Seok
2021-08-18 22:27               ` Thiago Macieira
2021-08-19  1:21             ` Andy Lutomirski
2021-08-19 16:06               ` Thiago Macieira
2021-08-18 21:17           ` Borislav Petkov
2021-08-18 21:37             ` Bae, Chang Seok
2021-08-19  8:00               ` Borislav Petkov
2021-08-19 15:24                 ` Bae, Chang Seok
2021-08-24 23:22             ` Len Brown
2021-08-30 17:31               ` Borislav Petkov
2021-09-17  3:48                 ` Len Brown
2021-08-18 19:47     ` Bae, Chang Seok
2021-08-24 22:21     ` Len Brown
2021-08-30 17:41       ` Borislav Petkov
2021-08-31 21:44         ` Len Brown
2021-08-24 23:17     ` Len Brown
2021-08-30 17:53       ` Borislav Petkov
2021-08-31 22:07         ` Len Brown
2021-08-31 22:11           ` Dave Hansen
2021-08-30 18:04       ` Dave Hansen
2021-08-31 22:15         ` Len Brown
2021-08-31 22:16           ` Len Brown
2021-08-31 22:39           ` Thiago Macieira
2021-08-31 22:44             ` Len Brown
2021-07-30 14:59 ` [PATCH v9 13/26] x86/fpu/xstate: Support ptracer-induced XSTATE buffer expansion Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 14/26] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE Chang S. Bae
2021-08-06 16:46   ` Thiago Macieira
2021-08-09 22:08     ` Bae, Chang Seok
2021-08-09 23:42       ` Thiago Macieira
2021-08-10  0:57         ` Bae, Chang Seok
2021-08-13 19:44           ` Bae, Chang Seok
2021-07-30 14:59 ` [PATCH v9 15/26] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 16/26] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 17/26] x86/fpu/xstate: Disable XSTATE support if an inconsistent state is detected Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 18/26] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 19/26] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 20/26] x86/fpu/amx: Initialize child's AMX state Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 21/26] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 22/26] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 23/26] selftest/x86/amx: Test cases for the AMX state management Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 24/26] x86/insn/amx: Add TILERELEASE instruction to the opcode map Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 25/26] intel_idle/amx: Add SPR support with XTILEDATA capability Chang S. Bae
2021-07-30 18:41   ` Dave Hansen
2021-08-03 21:32     ` Bae, Chang Seok
2021-08-03 21:38       ` Dave Hansen
2021-08-03 21:43         ` Brown, Len
2021-07-30 20:15   ` Dave Hansen
2021-07-30 14:59 ` [PATCH v9 26/26] x86/fpu/xstate: Add a sanity check for XFD state when saving XSTATE Chang S. Bae

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YR00U19168BGoRB9@zn.tnic \
    --to=bp@alien8.de \
    --cc=chang.seok.bae@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=jing2.liu@intel.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=tglx@linutronix.de \
    --cc=thiago.macieira@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).