From: Dave Hansen <dave.hansen@intel.com>
To: "Chang S. Bae" <chang.seok.bae@intel.com>,
	bp@suse.de, luto@kernel.org, tglx@linutronix.de,
	mingo@kernel.org, x86@kernel.org
Cc: len.brown@intel.com, jing2.liu@intel.com,
	ravi.v.shankar@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 15/28] x86/arch_prctl: Create ARCH_GET_XSTATE/ARCH_PUT_XSTATE
Date: Tue, 25 May 2021 08:46:44 -0700
Message-ID: <cfd7a4bd-4381-4128-c193-e767a2cc9686@intel.com>
In-Reply-To: <20210523193259.26200-16-chang.seok.bae@intel.com>

> The kernel enforces access to the specified state using XFD hardware support.
> By default, XFD is armed and results in #NM traps on unauthorized access.
> Upon successful ARCH_GET_XSTATE, XFD traps are disarmed and the user is
> free to access the feature.

Does this really need to talk about XFD?

I also don't really like this talking about being "authorized" or not.
Isn't this interface simply there to give userspace a way to
deterministically avoid being killed by a signal when an allocation
fails (ENOMEM) during a #NM?

I'd also define the behavior a bit more generically.  Maybe:

After a successful ARCH_GET_XSTATE, the kernel guarantees that no #NM
exception will be generated for access to any of the specified XSAVE
features.  This guarantee persists at least until a corresponding
ARCH_PUT_XSTATE operation occurs, possibly longer.

The kernel may choose to return an error for any ARCH_GET_XSTATE request
at any time, even if a prior one succeeded.  This might be the result of
a memory allocation failure, resource exhaustion, or exceeding the
implementation's limit on "outstanding" ARCH_GET_XSTATE operations.

The kernel may return errors if the number of ARCH_PUT_XSTATE operations
for a given XSAVE feature exceeds the number of ARCH_GET_XSTATE operations.

--

Note that there's no discussion of XFD.
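
For illustration, here's roughly how I'd expect userspace to use the
interface under those semantics.  This is just a hypothetical sketch:
the ARCH_* names come from this series, and the feature bit is only an
example, not something this patch defines.

	#include <unistd.h>
	#include <sys/syscall.h>
	#include <asm/prctl.h>	/* ARCH_GET_XSTATE/ARCH_PUT_XSTATE from this series */

	/* Example dynamic feature bit; use whatever the final ABI defines: */
	#define XFEATURE_MASK_XTILE_DATA	(1UL << 18)

	static int use_amx(void)
	{
		/* Ask for a guarantee of #NM-free access to the tile data: */
		if (syscall(SYS_arch_prctl, ARCH_GET_XSTATE,
			    XFEATURE_MASK_XTILE_DATA))
			return -1;	/* no guarantee; do not touch AMX */

		/* ... use AMX here ... */

		/* Done; the kernel may free the buffer and re-arm XFD: */
		syscall(SYS_arch_prctl, ARCH_PUT_XSTATE, XFEATURE_MASK_XTILE_DATA);
		return 0;
	}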

> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 25c9c7dad3f9..016c3adebec3 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -254,6 +254,8 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
> 
>  	WARN_ON_FPU(src_fpu != &current->thread.fpu);
> 
> +	dst_fpu->refcount = NULL;

For the future, don't forget to call out the fork/exec() behavior.

>  	/*
>  	 * The child does not inherit the dynamic states. Thus, use the buffer
>  	 * embedded in struct task_struct, which has the minimum size.
> @@ -541,3 +543,15 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
>  	 */
>  	return 0;
>  }
> +
> +/**
> + * free_fpu() - Free up memory that belongs to the FPU context.
> + * @fpu:	A struct fpu * pointer
> + *
> + * Return:	Nothing
> + */
> +void free_fpu(struct fpu *fpu)
> +{
> +	kfree(fpu->refcount);
> +	free_xstate_buffer(fpu);
> +}

FWIW, I don't think that needs a formal kdoc comment.

> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index e60a20a1b24b..126c4a509669 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -21,6 +21,7 @@
>  #include <asm/tlbflush.h>
>  #include <asm/cpufeature.h>
>  #include <asm/trace/fpu.h>
> +#include <asm/prctl.h>
> 
>  /*
>   * Although we spell it out in here, the Processor Trace
> @@ -78,6 +79,11 @@ static unsigned int xstate_supervisor_only_offsets[XFEATURE_MAX] = { [ 0 ... XFE
>   * byte boundary. Otherwise, it follows the preceding component immediately.
>   */
>  static bool xstate_aligns[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = false};
> +/*
> + * Remember the index number in the reference counter array that supports
> + * access request. '-1' indicates that a state component does not support it.
> + */
> +static unsigned int xstate_refcount_idx[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};

For now, when we have a single feature, isn't this overkill?  Also, even
if you decide to keep this, there are only 63 possible XSAVE features.
We don't need an 'unsigned int' to store a maximum value of 63.
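
If you do keep it, a byte-sized array is plenty.  Untested sketch:

	/* 63 possible XSAVE features; -1 means "no refcount slot": */
	static s8 xstate_refcount_idx[XFEATURE_MAX] = { [0 ... XFEATURE_MAX - 1] = -1 };

...and the (idx == -1) checks below become plain (idx < 0) with no
signed/unsigned games.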

>  /**
>   * struct fpu_xstate_buffer_config - xstate per-task buffer configuration
> @@ -969,8 +975,7 @@ void __init fpu__init_system_xstate(void)
>  {
>  	unsigned int eax, ebx, ecx, edx;
>  	static int on_boot_cpu __initdata = 1;
> -	int err;
> -	int i;
> +	int err, i, j;
> 
>  	WARN_ON_FPU(!on_boot_cpu);
>  	on_boot_cpu = 0;
> @@ -1025,14 +1030,17 @@ void __init fpu__init_system_xstate(void)
>  	xfeatures_mask_all &= fpu__get_supported_xfeatures_mask();
>  	xfeatures_mask_user_dynamic = 0;
> 
> -	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
> +	for (i = FIRST_EXTENDED_XFEATURE, j = 0; i < XFEATURE_MAX; i++) {
>  		u64 feature_mask = BIT_ULL(i);
> 
>  		if (!(xfeatures_mask_user() & feature_mask))
>  			continue;
> 
> -		if (xfd_supported(i))
> +		if (xfd_supported(i)) {
>  			xfeatures_mask_user_dynamic |= feature_mask;
> +			xstate_refcount_idx[i] = j;
> +			j++;
> +		}
>  	}
> 
>  	/* Enable xstate instructions to be able to continue with initialization: */
> @@ -1339,6 +1347,93 @@ int alloc_xstate_buffer(struct fpu *fpu, u64 mask)
>  	return 0;
>  }
> 
> +/**
> + * do_arch_prctl_xstate() - Handle xstate-related arch_prctl requests.

Not the most helpful function description.

> + * @fpu:	A struct fpu * pointer
> + * @option:	A subfunction of arch_prctl()
> + * @mask:	A xstate-component bitmap
> + *
> + * Return:	0 if successful; otherwise, return a relevant error code.
> + */
> +long do_arch_prctl_xstate(struct fpu *fpu, int option, unsigned long mask)
> +{
> +	bool need_xfd_update = false;
> +	int i;
> +
> +	switch (option) {
> +	case ARCH_GET_XSTATE: {
> +		int err = 0;
> +
> +		if (mask & ~xfeatures_mask_user())
> +			return -EPERM;

This would also return -EPERM for unknown features.  That's a bit odd.

How about just -EINVAL, to cover all cases: supervisor or unknown?

> +		if (!fpu->refcount) {
> +			fpu->refcount = kcalloc(hweight64(xfeatures_mask_user_dynamic),
> +						sizeof(int), GFP_KERNEL);
> +			if (!fpu->refcount)
> +				return -ENOMEM;
> +		}

If someone calls this on a non-XFD system, this kcalloc() will fail.
It's a bit odd that if I say "get XSTATE_FP", it returns -ENOMEM.

Maybe you could check 'mask' against xfeatures_mask_user_dynamic up front.

IIRC, this dynamic allocation costs 32 bytes of kmalloc() space for a
single integer, plus the pointer.  This would be simpler, faster and
smaller if just a single XFD feature was supported for now.
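
Something along these lines at the top of the ARCH_GET_XSTATE case would
handle both the unknown/supervisor features and the non-XFD case up
front (untested):

	/*
	 * Reject supervisor, unknown and non-dynamic (non-XFD) features
	 * up front.  Nothing else needs this request.
	 */
	if (mask & ~xfeatures_mask_user_dynamic)
		return -EINVAL;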

> +		for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
> +			unsigned int idx = xstate_refcount_idx[i];
> +
> +			if ((idx == -1) || !(BIT_ULL(i) & mask))
> +				continue;
> +
> +			if (fpu->refcount[idx] == INT_MAX)
> +				return -EINVAL;
> +
> +			fpu->refcount[idx]++;
> +		}

Let's say you have 5 xfeatures that support XFD.  The first 4 have their
fpu->refcount[]++ and the fifth hits the limit.  This will bump those 4
refcounts and then return -EINVAL.  How could the user ever recover from
that?

Also, a few comments in here would really help.  It's bare and fairly
hard to grok at the moment.  I *think* it's guaranteed by this point
that at least *ONE* refcount has to be bumped (or the kcalloc() would
fail), but it took me a while to convince myself that it works.

> +		if ((mask & fpu->state_mask) == mask)
> +			return 0;
> +
> +		err = alloc_xstate_buffer(fpu, mask);
> +		if (!err)
> +			need_xfd_update = true;
> +		else
> +			return err;

'return' without dropping the refcounts?

> +		break;
> +	}
> +	case ARCH_PUT_XSTATE: {
> +		if (mask & ~xfeatures_mask_user())
> +			return -EPERM;
> +
> +		if (!fpu->refcount)
> +			return -EINVAL;

This needs a comment:

		/* No successful GET_XSTATE was ever performed */

> +		for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
> +			int idx = xstate_refcount_idx[i];
> +			u64 feature_mask = BIT_ULL(i);
> +
> +			if ((idx == -1) || !(feature_mask & mask))
> +				continue;
> +
> +			if (fpu->refcount[idx] <= 0)
> +				return -EINVAL;

This has the same bug as the upper loop.

> +			fpu->refcount[idx]--;
> +			if (!fpu->refcount[idx]) {
> +				need_xfd_update = true;
> +				fpu->state_mask &= ~(feature_mask);
> +			}

Because of that bug, it's possible to return from this without getting
to the 'need_xfd_update' below.
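
The validate-then-commit structure from the GET side would fix both
issues here as well: check every refcount in 'mask' first, e.g.
(untested):

	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
		int idx = xstate_refcount_idx[i];

		if ((idx == -1) || !(BIT_ULL(i) & mask))
			continue;
		if (fpu->refcount[idx] <= 0)
			return -EINVAL;
	}

...and then do the decrements and state_mask updates in a second loop
that cannot fail, so the xfd_write() at the bottom is always reached.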

> +		}
> +		break;
> +	}
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (need_xfd_update) {
> +		u64 fpu_xfd_mask = fpu->state_mask & xfd_capable();
> +
> +		xfd_write(xfd_capable() ^ fpu_xfd_mask);
> +	}
> +	return 0;
> +}
> +
>  static void fill_gap(struct membuf *to, unsigned *last, unsigned offset)
>  {
>  	if (*last >= offset)
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 5252464a27e3..c166243f64e4 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -98,6 +98,12 @@ void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
>  	*size = get_xstate_config(XSTATE_MIN_SIZE);
>  }
> 
> +void arch_release_task_struct(struct task_struct *tsk)
> +{
> +	if (boot_cpu_has(X86_FEATURE_FPU))
> +		free_fpu(&tsk->thread.fpu);
> +}
> +
>  /*
>   * Free thread data structures etc..
>   */
> @@ -990,13 +996,16 @@ unsigned long get_wchan(struct task_struct *p)
>  }
> 
>  long do_arch_prctl_common(struct task_struct *task, int option,
> -			  unsigned long cpuid_enabled)
> +			  unsigned long arg2)
>  {
>  	switch (option) {
>  	case ARCH_GET_CPUID:
>  		return get_cpuid_mode();
>  	case ARCH_SET_CPUID:
> -		return set_cpuid_mode(task, cpuid_enabled);
> +		return set_cpuid_mode(task, arg2);
> +	case ARCH_GET_XSTATE:
> +	case ARCH_PUT_XSTATE:
> +		return do_arch_prctl_xstate(&task->thread.fpu, option, arg2);
>  	}
> 
>  	return -EINVAL;
> --
> 2.17.1
> 


