Re: [patch 2/3] x86/fpu/xsave: Prepare for optimized compaction

From: Dave Hansen <dave.hansen@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, Andrew Cooper <andrew.cooper3@citrix.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Subject: Re: [patch 2/3] x86/fpu/xsave: Prepare for optimized compaction
Date: Thu, 14 Apr 2022 08:46:36 -0700	[thread overview]
Message-ID: <d2f927c9-187a-906c-e1f3-33541b3b5a84@intel.com> (raw)
In-Reply-To: <20220404104820.656881574@linutronix.de>

[-- Attachment #1: Type: text/plain, Size: 6564 bytes --]

On 4/4/22 05:11, Thomas Gleixner wrote:
...
> For copy_uabi_to_xstate() it's more effort when supervisor states are
> supported and active. If the user space buffer has active states which are
> not in the set of current states of the XSTATE buffer, then the buffer
> layout will change which means the supervisor states might be overwritten.
> 
> Provide a function, which moves them out of the way and invoke it before
> any of the extended features is copied from the user space buffer.

This took me a minute to sort through.  There are three kinds of state
in the kernel buffer in copy_uabi_to_xstate().

1. User state, set in hdr.features which is copied from uabi buffer
2. User state, clear in hdr.features.  Feature will be set to its init
   state because xsave->header.xfeatures is cleared.
3. Supervisor state which is preserved.

I was confused for a *moment* because the space for a #2 might be
overwritten by a #1 copy-in from userspace.  But, that only happens for
states that will end up being set to init.  No harm, no foul.

Any interest in doing something like the attached to make
copy_uabi_to_xstate() easier to read?

> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1147,10 +1147,10 @@ void __copy_xstate_to_uabi_buf(struct me
>  			pkru.pkru = pkru_val;
>  			membuf_write(&to, &pkru, sizeof(pkru));
>  		} else {
> -			copy_feature(header.xfeatures & BIT_ULL(i), &to,
> -				     __raw_xsave_addr(xsave, i),
> -				     __raw_xsave_addr(xinit, i),
> -				     xstate_sizes[i]);
> +			bool active = header.xfeatures & BIT_ULL(i);
> +			void *from = __raw_xsave_addr(active ? xsave : xinit, i);
> +
> +			membuf_write(&to, from, xstate_sizes[i]);
>  		}
>  		/*
>  		 * Keep track of the last copied state in the non-compacted
> @@ -1195,6 +1195,73 @@ static int copy_from_buffer(void *dst, u
>  	return 0;
>  }
>  
> +/*
> + * Prepare the kernel XSTATE buffer for saving the user supplied XSTATE. If
> + * the XGETBV1 based optimization is in use then the kernel buffer might
> + * not have the user supplied active features set and an eventual active
> + * supervisor state has to be moved out of the way. With optimized
> + * compaction the features which are to be stored need to be set in the
> + * XCOMP_BV field of the XSTATE header.
> + */

What does "eventual active supervisor state" mean?  Just that the state
needs to be preserved since it needs to be restored eventually?  I found
that phrase a bit confusing.

I was thinking of a comment more along these lines:

/*
 * Adjust 'fpstate' so that it can additionally store all the xfeatures
 * set in 'xuser' when optimized compaction is enabled.  All bits in
 * 'xuser' will be set in XCOMP_BV.  Supervisor state will be preserved
 * and may be moved to make room for new 'xuser' states.  User state may
 * be destroyed.
 */

> +static void xsave_adjust_xcomp(struct fpstate *fpstate, u64 xuser)
> +{
> +	struct xregs_state *xsave = &fpstate->regs.xsave;
> +	u64 xtmp, xall, xbv, xcur = xsave->header.xfeatures;
> +	int i;
> +
> +	/* Nothing to do if optimized compaction is not in use */
> +	if (!xsave_use_xgetbv1())
> +		return;

The comment makes me wonder if we need a more descriptive name for
xsave_use_xgetbv1(), like:

	if (!xsave_do_optimized_compaction())
		return;

> +	/*
> +	 * Check whether the current xstate buffer contains supervisor
> +	 * state. If not, just set the user supplied features.
> +	 */
> +	if (!(xcur & XFEATURE_MASK_SUPERVISOR_ALL)) {
> +		__xstate_init_xcomp_bv(xsave, xuser);
> +		return;
> +	}
> +
> +	/*
> +	 * Set legacy features. They are at a fixed offset and do not
> +	 * affect the layout.
> +	 */
> +	xbv = xsave->header.xcomp_bv;
> +	xbv |= xuser & (XFEATURE_MASK_FP | XFEATURE_MASK_SSE);
> +
> +	/*
> +	 * Check whether there is new extended state in the user supplied
> +	 * buffer. If not, then nothing has to be moved around.
> +	 */
> +	if (!(xuser & ~xbv)) {
> +		__xstate_init_xcomp_bv(xsave, xbv);
> +		return;
> +	}
> +
> +	/*
> +	 * No more optimizations. Set the user features and move the
> +	 * supervisor state(s). If the new user feature is past
> +	 * the supervisor state, then the loop is moving nothing.
> +	 */
> +	xtmp = xbv & XFEATURE_MASK_SUPERVISOR_ALL;
> +	xall = xbv | xuser;


I'd probably at least comment why the loop is backwards:

	/*
	 * Features are only be moved up in the buffer.  Start with
	 * high features to avoid overwriting them with a lower ones.
	 */

I know this is a very typical way to implement non-destructive moves,
but my stupid brain seems to default to assuming that for loops only go
forward.

> +	for (i = fls64(xtmp) - 1; i >= FIRST_EXTENDED_XFEATURE;
> +	     i = fls64(xtmp) - 1) {
> +		unsigned int to, from;

Is it worth a check here like:

		/* Do not move features in their init state: */
		if (!(xcur & BIT_ULL(i))) {
			xtmp &= ~BIT_ULL(i);
			continue;
		}

(probably also moving the &= around instead of copying it)

> +		from = xfeature_get_offset(xbv, i);
> +		to = xfeature_get_offset(xall, i);
> +		if (from < to) {
> +			memmove((void *)xsave + to, (void *)xsave + from,
> +				xstate_sizes[i]);
> +		} else {
> +			WARN_ON_ONCE(to < from);
> +		}
> +		xtmp &= ~BIT_ULL(i);
> +	}
> +	__xstate_init_xcomp_bv(xsave, xall);
> +}
>  
>  static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
>  			       const void __user *ubuf)
> @@ -1232,6 +1299,8 @@ static int copy_uabi_to_xstate(struct fp
>  		}
>  	}
>  
> +	xsave_adjust_xcomp(fpstate, hdr.xfeatures);
> +
>  	for (i = 0; i < XFEATURE_MAX; i++) {
>  		u64 mask = ((u64)1 << i);
>  
> --- a/arch/x86/kernel/fpu/xstate.h
> +++ b/arch/x86/kernel/fpu/xstate.h
> @@ -10,14 +10,22 @@
>  DECLARE_PER_CPU(u64, xfd_state);
>  #endif
>  
> -static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
> +static inline bool xsave_use_xgetbv1(void) { return false; }
> +
> +static inline void __xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
>  {
>  	/*
>  	 * XRSTORS requires these bits set in xcomp_bv, or it will
> -	 * trigger #GP:
> +	 * trigger #GP. It's also required for XRSTOR when the buffer
> +	 * was saved with XSAVEC in compacted format.
>  	 */
> +	xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
> +}
> +
> +static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
> +{
>  	if (cpu_feature_enabled(X86_FEATURE_XCOMPACTED))
> -		xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
> +		__xstate_init_xcomp_bv(xsave, mask);
>  }
>  
>  static inline u64 xstate_get_group_perm(bool guest)

[-- Attachment #2: copy_uabi_to_xstate.1.patch --]
[-- Type: text/x-patch, Size: 2077 bytes --]

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 39e1c8626ab9..8a32a8398e1f 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1178,6 +1178,15 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
 }
 
 
+/*
+ * Replace all user xfeatures with data from 'ubuf'.  Features
+ * marked as init in 'ubuf' will be marked as init in the kernel
+ * buffer.  Supervisor xfeatures will be preserved.
+ *
+ * Returns 0 on success.  Non-zero return codes indicate that
+ * the kernel xsave buffer may have been left in an inconsistent
+ * state.  Caller must reset fpstate to recover.
+ */
 static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
 			       const void __user *ubuf)
 {
@@ -1214,30 +1223,30 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
 		}
 	}
 
+	/*
+	 * Leave only supervisor states in 'xfeatures'.  User xfeature
+	 * bits are set in the loop below.
+	 */
+	xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR_ALL;
+
 	for (i = 0; i < XFEATURE_MAX; i++) {
 		u64 mask = ((u64)1 << i);
+		void *dst = __raw_xsave_addr(xsave, i);
 
-		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, i);
-
-			offset = xstate_offsets[i];
-			size = xstate_sizes[i];
-
-			if (copy_from_buffer(dst, offset, size, kbuf, ubuf))
-				return -EFAULT;
+		if (!hdr.xfeatures & mask) {
+			/* Feature was marked as init in uabi buffer: */
+			xsave->header.xfeatures &= ~mask;
+			continue
 		}
-	}
+		/* Feature was present in uabi buffer: */
+		xsave->header.xfeatures |= mask;
 
-	/*
-	 * The state that came in from userspace was user-state only.
-	 * Mask all the user states out of 'xfeatures':
-	 */
-	xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR_ALL;
+		offset = xstate_offsets[i];
+		size = xstate_sizes[i];
 
-	/*
-	 * Add back in the features that came in from userspace:
-	 */
-	xsave->header.xfeatures |= hdr.xfeatures;
+		if (copy_from_buffer(dst, offset, size, kbuf, ubuf))
+			return -EFAULT;
+	}
 
 	return 0;
 }