On 4/4/22 05:11, Thomas Gleixner wrote: ... > For copy_uabi_to_xstate() it's more effort when supervisor states are > supported and active. If the user space buffer has active states which are > not in the set of current states of the XSTATE buffer, then the buffer > layout will change which means the supervisor states might be overwritten. > > Provide a function, which moves them out of the way and invoke it before > any of the extended features is copied from the user space buffer. This took me a minute to sort through. There are three kinds of state in the kernel buffer in copy_uabi_to_xstate(). 1. User state, set in hdr.features which is copied from uabi buffer 2. User state, clear in hdr.features. Feature will be set to its init state because xsave->header.xfeatures is cleared. 3. Supervisor state which is preserved. I was confused for a *moment* because the space for a #2 might be overwritten by a #1 copy-in from userspace. But, that only happens for states that will end up being set to init. No harm, no foul. Any interest in doing something like the attached to make copy_uabi_to_xstate() easier to read? > --- a/arch/x86/kernel/fpu/xstate.c > +++ b/arch/x86/kernel/fpu/xstate.c > @@ -1147,10 +1147,10 @@ void __copy_xstate_to_uabi_buf(struct me > pkru.pkru = pkru_val; > membuf_write(&to, &pkru, sizeof(pkru)); > } else { > - copy_feature(header.xfeatures & BIT_ULL(i), &to, > - __raw_xsave_addr(xsave, i), > - __raw_xsave_addr(xinit, i), > - xstate_sizes[i]); > + bool active = header.xfeatures & BIT_ULL(i); > + void *from = __raw_xsave_addr(active ? xsave : xinit, i); > + > + membuf_write(&to, from, xstate_sizes[i]); > } > /* > * Keep track of the last copied state in the non-compacted > @@ -1195,6 +1195,73 @@ static int copy_from_buffer(void *dst, u > return 0; > } > > +/* > + * Prepare the kernel XSTATE buffer for saving the user supplied XSTATE. If > + * the XGETBV1 based optimization is in use then the kernel buffer might > + * not have the user supplied active features set and an eventual active > + * supervisor state has to be moved out of the way. With optimized > + * compaction the features which are to be stored need to be set in the > + * XCOMP_BV field of the XSTATE header. > + */ What does "eventual active supervisor state" mean? Just that the state needs to be preserved since it needs to be restored eventually? I found that phrase a bit confusing. I was thinking of a comment more along these lines: /* * Adjust 'fpstate' so that it can additionally store all the xfeatures * set in 'xuser' when optimized compaction is enabled. All bits in * 'xuser' will be set in XCOMP_BV. Supervisor state will be preserved * and may be moved to make room for new 'xuser' states. User state may * be destroyed. */ > +static void xsave_adjust_xcomp(struct fpstate *fpstate, u64 xuser) > +{ > + struct xregs_state *xsave = &fpstate->regs.xsave; > + u64 xtmp, xall, xbv, xcur = xsave->header.xfeatures; > + int i; > + > + /* Nothing to do if optimized compaction is not in use */ > + if (!xsave_use_xgetbv1()) > + return; The comment makes me wonder if we need a more descriptive name for xsave_use_xgetbv1(), like: if (!xsave_do_optimized_compaction()) return; > + /* > + * Check whether the current xstate buffer contains supervisor > + * state. If not, just set the user supplied features. > + */ > + if (!(xcur & XFEATURE_MASK_SUPERVISOR_ALL)) { > + __xstate_init_xcomp_bv(xsave, xuser); > + return; > + } > + > + /* > + * Set legacy features. They are at a fixed offset and do not > + * affect the layout. > + */ > + xbv = xsave->header.xcomp_bv; > + xbv |= xuser & (XFEATURE_MASK_FP | XFEATURE_MASK_SSE); > + > + /* > + * Check whether there is new extended state in the user supplied > + * buffer. If not, then nothing has to be moved around. > + */ > + if (!(xuser & ~xbv)) { > + __xstate_init_xcomp_bv(xsave, xbv); > + return; > + } > + > + /* > + * No more optimizations. Set the user features and move the > + * supervisor state(s). If the new user feature is past > + * the supervisor state, then the loop is moving nothing. > + */ > + xtmp = xbv & XFEATURE_MASK_SUPERVISOR_ALL; > + xall = xbv | xuser; I'd probably at least comment why the loop is backwards: /* * Features are only be moved up in the buffer. Start with * high features to avoid overwriting them with a lower ones. */ I know this is a very typical way to implement non-destructive moves, but my stupid brain seems to default to assuming that for loops only go forward. > + for (i = fls64(xtmp) - 1; i >= FIRST_EXTENDED_XFEATURE; > + i = fls64(xtmp) - 1) { > + unsigned int to, from; Is it worth a check here like: /* Do not move features in their init state: */ if (!(xcur & BIT_ULL(i))) { xtmp &= ~BIT_ULL(i); continue; } (probably also moving the &= around instead of copying it) > + from = xfeature_get_offset(xbv, i); > + to = xfeature_get_offset(xall, i); > + if (from < to) { > + memmove((void *)xsave + to, (void *)xsave + from, > + xstate_sizes[i]); > + } else { > + WARN_ON_ONCE(to < from); > + } > + xtmp &= ~BIT_ULL(i); > + } > + __xstate_init_xcomp_bv(xsave, xall); > +} > > static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf, > const void __user *ubuf) > @@ -1232,6 +1299,8 @@ static int copy_uabi_to_xstate(struct fp > } > } > > + xsave_adjust_xcomp(fpstate, hdr.xfeatures); > + > for (i = 0; i < XFEATURE_MAX; i++) { > u64 mask = ((u64)1 << i); > > --- a/arch/x86/kernel/fpu/xstate.h > +++ b/arch/x86/kernel/fpu/xstate.h > @@ -10,14 +10,22 @@ > DECLARE_PER_CPU(u64, xfd_state); > #endif > > -static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask) > +static inline bool xsave_use_xgetbv1(void) { return false; } > + > +static inline void __xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask) > { > /* > * XRSTORS requires these bits set in xcomp_bv, or it will > - * trigger #GP: > + * trigger #GP. It's also required for XRSTOR when the buffer > + * was saved with XSAVEC in compacted format. > */ > + xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT; > +} > + > +static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask) > +{ > if (cpu_feature_enabled(X86_FEATURE_XCOMPACTED)) > - xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT; > + __xstate_init_xcomp_bv(xsave, mask); > } > > static inline u64 xstate_get_group_perm(bool guest)