linux-kernel.vger.kernel.org archive mirror
* [PATCH v4] x86: load FPU registers on return to userland
@ 2018-11-07 19:48 Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 01/23] x86/fpu: Use ULL for shift in xfeature_uncompacted_offset() Sebastian Andrzej Siewior
                   ` (23 more replies)
  0 siblings, 24 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

This is a refurbished series originally started by Rik van Riel. The
goal is to load the FPU registers on return to userland and not on every
context switch. With this optimisation we can:
- avoid loading the registers if the task stays in the kernel and does
  not return to userland
- make kernel_fpu_begin() cheaper: it only saves the registers on the
  first invocation. The second invocation does not need to save them
  again.

To access the FPU registers in the kernel we need to (a rough sketch
follows after this list):
- disable preemption so that the scheduler does not switch tasks. A task
  switch would set TIF_NEED_FPU_LOAD and the FPU registers would no
  longer be valid.
- disable BH because a softirq might use kernel_fpu_begin() and then set
  TIF_NEED_FPU_LOAD instead of loading the FPU registers on completion.
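
A rough sketch of how such a section would look (illustrative only, not
part of this series; local_bh_disable() and preempt_disable() are the
existing kernel primitives):

	local_bh_disable();	/* a softirq could set TIF_NEED_FPU_LOAD */
	preempt_disable();	/* so could a task switch */

	/* ... read or modify the FPU registers directly ... */

	preempt_enable();
	local_bh_enable();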

v3…v4:
It has been suggested to remove the `initialized' member of the struct
fpu because it should not be needed with lazy FPU restore, and removing
it would make the review easier. This is the first part of the series;
the second is basically the rebase of the v3 queue. As a result, the
diffstat became negative (which wasn't the case in the previous
version) :)
I tried to incorporate all the review comments that came up, some of
them were "outdated" after the removal of the `initialized' member. I'm
sorry if I missed any.

v1…v3:
v2 was never posted. I followed the idea to completely decouple PKRU
from xstate. This didn't quite work and made a few things complicated.
One obvious required fixup is copy_fpstate_to_sigframe() where the PKRU
state needs to be fiddled into xstate. This required another
xfeatures_mask so that the sanity checks were performed and
xstate_offsets would be computed. Additionally ptrace also reads/sets
xstate in order to get/set the registers, and PKRU is one of them. So
this would need some fiddling, too.
In v3 I dropped that decoupling idea. I also learned that the wrpkru
instruction is not privileged, so caching PKRU in the kernel does not
work (userland can change the register at any time). Instead I keep PKRU
in the xstate area and load it at context switch time while the
remaining registers are deferred (until return to userland). The offset
of PKRU within xstate is enumerated at boot time, so we might as well
use it.

Rik van Riel (5):
  x86/fpu: Add (__)make_fpregs_active helpers
  x86/fpu: Eager switch PKRU state
  x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
  x86/fpu: Defer FPU state load until return to userspace

Sebastian Andrzej Siewior (18):
  x86/fpu: Use ULL for shift in xfeature_uncompacted_offset()
  x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  x86/fpu: Remove fpu__restore()
  x86/entry/32: Remove asm/math_emu.h include
  x86/fpu: Remove preempt_disable() in fpu__clear()
  x86/fpu: Always init the `state' in fpu__clear()
  x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  x86/fpu: Remove fpu->initialized
  x86/fpu: Remove user_fpu_begin()
  x86/entry: Remove _TIF_ALLWORK_MASK
  x86/fpu: Make __raw_xsave_addr() use feature number instead of mask
  x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use feature
    number instead of mask
  x86/pkeys: Make init_pkru_value static
  x86/fpu: Only write PKRU if it is different from current
  x86/pkeys: Don't check if PKRU is zero before writing it
  x86/entry: Add TIF_NEED_FPU_LOAD
  x86/fpu: Update xstate's PKRU value on write_pkru()
  x86/fpu: Don't restore the FPU state directly from userland in
    __fpu__restore_sig()

 Documentation/preempt-locking.txt    |   1 -
 arch/x86/entry/common.c              |   8 ++
 arch/x86/ia32/ia32_signal.c          |  17 +--
 arch/x86/include/asm/fpu/api.h       |  25 ++++
 arch/x86/include/asm/fpu/internal.h  | 149 ++++++----------------
 arch/x86/include/asm/fpu/signal.h    |   2 +-
 arch/x86/include/asm/fpu/types.h     |   9 --
 arch/x86/include/asm/fpu/xstate.h    |   6 +-
 arch/x86/include/asm/pgtable.h       |  19 ++-
 arch/x86/include/asm/special_insns.h |  13 +-
 arch/x86/include/asm/thread_info.h   |  10 +-
 arch/x86/include/asm/trace/fpu.h     |   8 +-
 arch/x86/kernel/fpu/core.c           | 181 +++++++++++++--------------
 arch/x86/kernel/fpu/init.c           |   2 -
 arch/x86/kernel/fpu/regset.c         |  24 +---
 arch/x86/kernel/fpu/signal.c         | 133 +++++++++-----------
 arch/x86/kernel/fpu/xstate.c         |  45 ++++---
 arch/x86/kernel/process.c            |   2 +-
 arch/x86/kernel/process_32.c         |  14 +--
 arch/x86/kernel/process_64.c         |  11 +-
 arch/x86/kernel/signal.c             |  17 ++-
 arch/x86/kernel/traps.c              |   2 +-
 arch/x86/kvm/x86.c                   |  47 ++++---
 arch/x86/math-emu/fpu_entry.c        |   3 -
 arch/x86/mm/mpx.c                    |   6 +-
 arch/x86/mm/pkeys.c                  |  15 +--
 26 files changed, 343 insertions(+), 426 deletions(-)

 git://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git x86_fpu_rtu_v4

Sebastian



* [PATCH 01/23] x86/fpu: Use ULL for shift in xfeature_uncompacted_offset()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-08 11:38   ` Borislav Petkov
  2018-11-07 19:48 ` [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The xfeature mask is 64bit wide, so a shift from a feature number to its
mask needs a ULL suffix or else bits for nr > 31 will be lost. This is
not a problem now, but should XFEATURE_MASK_SUPERVISOR gain a bit > 31
then this check won't catch it.

Add the ULL suffix so feature numbers > 31 are properly shifted.
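
For illustration (a hypothetical feature number, not from the patch):

	u64 mask;

	mask = 1 << 40;		/* int-width shift: undefined, bit 40 is lost */
	mask = 1ULL << 40;	/* 64bit shift: bit 40 is set as intended */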

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/xstate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 87a57b7642d36..9dc0a2c8c2755 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -444,7 +444,7 @@ static int xfeature_uncompacted_offset(int xfeature_nr)
 	 * format. Checking a supervisor state's uncompacted offset is
 	 * an error.
 	 */
-	if (XFEATURE_MASK_SUPERVISOR & (1 << xfeature_nr)) {
+	if (XFEATURE_MASK_SUPERVISOR & (1ULL << xfeature_nr)) {
 		WARN_ONCE(1, "No fixed offset for xstate %d\n", xfeature_nr);
 		return -1;
 	}
-- 
2.19.1



* [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 01/23] x86/fpu: Use ULL for shift in xfeature_uncompacted_offset() Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-08 14:57   ` Borislav Petkov
  2018-11-07 19:48 ` [PATCH 03/23] x86/fpu: Remove fpu__restore() Sebastian Andrzej Siewior
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

This is a preparation for the removal of the ->initialized member in the
fpu struct.
__fpu__restore_sig() deactivates the FPU via fpu__drop(), then manually
sets ->initialized and invokes fpu__restore(). The result is that
fpu->state can be manipulated in the meantime because the registers
won't be saved/restored on a context switch, which would otherwise
overwrite the prepared state.

Don't access fpu->state while the content is read from user space and
examined/sanitized. Use a temporary kmalloc() buffer for the preparation
of the FPU registers and, once the state is considered okay, load it.
Should something go wrong, return with an error and without altering the
original FPU registers.

The removal of fpu__initialize() is a nop because fpu->initialized is
already set for the user task.
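
The new flow, condensed from the diff below (error handling trimmed):

	union fpregs_state *state;
	void *tmp;

	/* XRSTOR needs a 64 byte aligned buffer, so over-allocate and align */
	tmp = kmalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
	state = PTR_ALIGN(tmp, 64);

	/* copy the user-supplied state into 'state', sanitize it, then: */
	copy_kernel_to_fpregs(state);
	kfree(tmp);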

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/signal.h |  2 +-
 arch/x86/kernel/fpu/regset.c      |  5 ++--
 arch/x86/kernel/fpu/signal.c      | 41 ++++++++++++-------------------
 3 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/fpu/signal.h b/arch/x86/include/asm/fpu/signal.h
index 44bbc39a57b30..7fb516b6893a8 100644
--- a/arch/x86/include/asm/fpu/signal.h
+++ b/arch/x86/include/asm/fpu/signal.h
@@ -22,7 +22,7 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
 
 extern void convert_from_fxsr(struct user_i387_ia32_struct *env,
 			      struct task_struct *tsk);
-extern void convert_to_fxsr(struct task_struct *tsk,
+extern void convert_to_fxsr(struct fxregs_state *fxsave,
 			    const struct user_i387_ia32_struct *env);
 
 unsigned long
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index bc02f5144b958..5dbc099178a88 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -269,11 +269,10 @@ convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
 		memcpy(&to[i], &from[i], sizeof(to[0]));
 }
 
-void convert_to_fxsr(struct task_struct *tsk,
+void convert_to_fxsr(struct fxregs_state *fxsave,
 		     const struct user_i387_ia32_struct *env)
 
 {
-	struct fxregs_state *fxsave = &tsk->thread.fpu.state.fxsave;
 	struct _fpreg *from = (struct _fpreg *) &env->st_space[0];
 	struct _fpxreg *to = (struct _fpxreg *) &fxsave->st_space[0];
 	int i;
@@ -350,7 +349,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 
 	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &env, 0, -1);
 	if (!ret)
-		convert_to_fxsr(target, &env);
+		convert_to_fxsr(&target->thread.fpu.state.fxsave, &env);
 
 	/*
 	 * update the header bit in the xsave header, indicating the
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 61a949d84dfa5..5777ee0c32fed 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -207,11 +207,11 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 }
 
 static inline void
-sanitize_restored_xstate(struct task_struct *tsk,
+sanitize_restored_xstate(union fpregs_state *state,
 			 struct user_i387_ia32_struct *ia32_env,
 			 u64 xfeatures, int fx_only)
 {
-	struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
+	struct xregs_state *xsave = &state->xsave;
 	struct xstate_header *header = &xsave->header;
 
 	if (use_xsave()) {
@@ -238,7 +238,7 @@ sanitize_restored_xstate(struct task_struct *tsk,
 		 */
 		xsave->i387.mxcsr &= mxcsr_feature_mask;
 
-		convert_to_fxsr(tsk, ia32_env);
+		convert_to_fxsr(&state->fxsave, ia32_env);
 	}
 }
 
@@ -284,8 +284,6 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	if (!access_ok(VERIFY_READ, buf, size))
 		return -EACCES;
 
-	fpu__initialize(fpu);
-
 	if (!static_cpu_has(X86_FEATURE_FPU))
 		return fpregs_soft_set(current, NULL,
 				       0, sizeof(struct user_i387_ia32_struct),
@@ -314,41 +312,34 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 * thread's fpu state, reconstruct fxstate from the fsave
 		 * header. Validate and sanitize the copied state.
 		 */
+		union fpregs_state *state;
+		void *tmp;
 		struct user_i387_ia32_struct env;
 		int err = 0;
 
-		/*
-		 * Drop the current fpu which clears fpu->initialized. This ensures
-		 * that any context-switch during the copy of the new state,
-		 * avoids the intermediate state from getting restored/saved.
-		 * Thus avoiding the new restored state from getting corrupted.
-		 * We will be ready to restore/save the state only after
-		 * fpu->initialized is again set.
-		 */
-		fpu__drop(fpu);
+		tmp = kmalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+		state = PTR_ALIGN(tmp, 64);
 
 		if (using_compacted_format()) {
-			err = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
+			err = copy_user_to_xstate(&state->xsave, buf_fx);
 		} else {
-			err = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
+			err = __copy_from_user(&state->xsave, buf_fx, state_size);
 
 			if (!err && state_size > offsetof(struct xregs_state, header))
-				err = validate_xstate_header(&fpu->state.xsave.header);
+				err = validate_xstate_header(&state->xsave.header);
 		}
 
 		if (err || __copy_from_user(&env, buf, sizeof(env))) {
-			fpstate_init(&fpu->state);
-			trace_x86_fpu_init_state(fpu);
 			err = -1;
 		} else {
-			sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
+			sanitize_restored_xstate(state, &env,
+						 xfeatures, fx_only);
+			copy_kernel_to_fpregs(state);
 		}
 
-		fpu->initialized = 1;
-		preempt_disable();
-		fpu__restore(fpu);
-		preempt_enable();
-
+		kfree(tmp);
 		return err;
 	} else {
 		/*
-- 
2.19.1



* [PATCH 03/23] x86/fpu: Remove fpu__restore()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 01/23] x86/fpu: Use ULL for shift in xfeature_uncompacted_offset() Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 04/23] x86/entry/32: Remove asm/math_emu.h include Sebastian Andrzej Siewior
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

There are no users of fpu__restore() so it is time to remove it.
The comment regarding fpu__restore() and the TS bit has been stale since
commit b3b0870ef3ffe ("i387: do not preload FPU state at task switch
time") and has had no meaning since then.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 Documentation/preempt-locking.txt   |  1 -
 arch/x86/include/asm/fpu/internal.h |  1 -
 arch/x86/kernel/fpu/core.c          | 24 ------------------------
 arch/x86/kernel/process_32.c        |  4 +---
 arch/x86/kernel/process_64.c        |  4 +---
 5 files changed, 2 insertions(+), 32 deletions(-)

diff --git a/Documentation/preempt-locking.txt b/Documentation/preempt-locking.txt
index 509f5a422d571..dce336134e54a 100644
--- a/Documentation/preempt-locking.txt
+++ b/Documentation/preempt-locking.txt
@@ -52,7 +52,6 @@ preemption must be disabled around such regions.
 
 Note, some FPU functions are already explicitly preempt safe.  For example,
 kernel_fpu_begin and kernel_fpu_end will disable and enable preemption.
-However, fpu__restore() must be called with preemption disabled.
 
 
 RULE #3: Lock acquire and release must be performed by same task
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 5f7290e6e954e..32ea458f13b5c 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -28,7 +28,6 @@ extern void fpu__initialize(struct fpu *fpu);
 extern void fpu__prepare_read(struct fpu *fpu);
 extern void fpu__prepare_write(struct fpu *fpu);
 extern void fpu__save(struct fpu *fpu);
-extern void fpu__restore(struct fpu *fpu);
 extern int  fpu__restore_sig(void __user *buf, int ia32_frame);
 extern void fpu__drop(struct fpu *fpu);
 extern int  fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2ea85b32421a0..9387e0fec7d17 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -305,30 +305,6 @@ void fpu__prepare_write(struct fpu *fpu)
 	}
 }
 
-/*
- * 'fpu__restore()' is called to copy FPU registers from
- * the FPU fpstate to the live hw registers and to activate
- * access to the hardware registers, so that FPU instructions
- * can be used afterwards.
- *
- * Must be called with kernel preemption disabled (for example
- * with local interrupts disabled, as it is in the case of
- * do_device_not_available()).
- */
-void fpu__restore(struct fpu *fpu)
-{
-	fpu__initialize(fpu);
-
-	/* Avoid __kernel_fpu_begin() right after fpregs_activate() */
-	kernel_fpu_disable();
-	trace_x86_fpu_before_restore(fpu);
-	fpregs_activate(fpu);
-	copy_kernel_to_fpregs(&fpu->state);
-	trace_x86_fpu_after_restore(fpu);
-	kernel_fpu_enable();
-}
-EXPORT_SYMBOL_GPL(fpu__restore);
-
 /*
  * Drops current FPU state: deactivates the fpregs and
  * the fpstate. NOTE: it still leaves previous contents
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 5046a3c9dec2f..d9510452b69a6 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -274,9 +274,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Leave lazy mode, flushing any hypercalls made here.
 	 * This must be done before restoring TLS segments so
-	 * the GDT and LDT are properly updated, and must be
-	 * done before fpu__restore(), so the TS bit is up
-	 * to date.
+	 * the GDT and LDT are properly updated.
 	 */
 	arch_end_context_switch(next_p);
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 0e0b4288a4b2b..cf2c157714594 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -576,9 +576,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Leave lazy mode, flushing any hypercalls made here.  This
 	 * must be done after loading TLS entries in the GDT but before
-	 * loading segments that might reference them, and and it must
-	 * be done before fpu__restore(), so the TS bit is up to
-	 * date.
+	 * loading segments that might reference them.
 	 */
 	arch_end_context_switch(next_p);
 
-- 
2.19.1



* [PATCH 04/23] x86/entry/32: Remove asm/math_emu.h include
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 03/23] x86/fpu: Remove fpu__restore() Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-08 18:45   ` Andy Lutomirski
  2018-11-07 19:48 ` [PATCH 05/23] x86/fpu: Remove preempt_disable() in fpu__clear() Sebastian Andrzej Siewior
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The math_emu.h header file contains the definition of struct
math_emu_info, which is not used in this file.

Remove the asm/math_emu.h include.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/process_32.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index d9510452b69a6..b8a935adc96fc 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -44,9 +44,6 @@
 #include <asm/processor.h>
 #include <asm/fpu/internal.h>
 #include <asm/desc.h>
-#ifdef CONFIG_MATH_EMULATION
-#include <asm/math_emu.h>
-#endif
 
 #include <linux/err.h>
 
-- 
2.19.1



* [PATCH 05/23] x86/fpu: Remove preempt_disable() in fpu__clear()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 04/23] x86/entry/32: Remove asm/math_emu.h include Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 06/23] x86/fpu: Always init the `state' " Sebastian Andrzej Siewior
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The preempt_disable() section was introduced in commit a10b6a16cdad8
("x86/fpu: Make the fpu state change in fpu__clear() scheduler-atomic")
and it was said to be temporary.

fpu__initialize() initializes the FPU struct to its "init" value and
then sets ->initialized to 1. The last part is the important one.
The content of `state' does not matter because the registers get set via
copy_init_fpstate_to_fpregs().
A preemption here has little consequence because the registers will
always be set to the same content after copy_init_fpstate_to_fpregs(). A
softirq with a kernel_fpu_begin() could also force the FPU registers to
be saved after fpu__initialize() without changing the outcome here.

Remove the preempt_disable() section in fpu__clear(); preemption here
does not hurt.
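
The resulting sequence in fpu__clear(), with the harmless race annotated
(condensed; comments added for illustration):

	fpu__initialize(fpu);
	/*
	 * A preemption or a softirq doing kernel_fpu_begin() here may
	 * save the registers into fpu->state, but the registers are
	 * overwritten below either way:
	 */
	user_fpu_begin();
	copy_init_fpstate_to_fpregs();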

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 9387e0fec7d17..b7d9b19ab8116 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -368,11 +368,9 @@ void fpu__clear(struct fpu *fpu)
 	 * Make sure fpstate is cleared and initialized.
 	 */
 	if (static_cpu_has(X86_FEATURE_FPU)) {
-		preempt_disable();
 		fpu__initialize(fpu);
 		user_fpu_begin();
 		copy_init_fpstate_to_fpregs();
-		preempt_enable();
 	}
 }
 
-- 
2.19.1



* [PATCH 06/23] x86/fpu: Always init the `state' in fpu__clear()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 05/23] x86/fpu: Remove preempt_disable() in fpu__clear() Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 07/23] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

fpu__clear() only initializes `state' if the FPU is present. This
initialisation is also required on FPU-less systems and takes place in
math_emulate(). Since fpu__initialize() only performs the initialization
if ->initialized is zero, it does not matter that it is invoked each
time an opcode is emulated. It makes the removal of ->initialized easier
if the struct is also initialized in the FPU-less case at the same time.

Move fpu__initialize() before the FPU check so it is also performed in
the FPU-less case.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 1 -
 arch/x86/kernel/fpu/core.c          | 5 ++---
 arch/x86/math-emu/fpu_entry.c       | 3 ---
 3 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 32ea458f13b5c..971fb0988d56f 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -24,7 +24,6 @@
 /*
  * High level FPU state handling functions:
  */
-extern void fpu__initialize(struct fpu *fpu);
 extern void fpu__prepare_read(struct fpu *fpu);
 extern void fpu__prepare_write(struct fpu *fpu);
 extern void fpu__save(struct fpu *fpu);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index b7d9b19ab8116..8391debc57265 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -225,7 +225,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
  * Activate the current task's in-memory FPU context,
  * if it has not been used before:
  */
-void fpu__initialize(struct fpu *fpu)
+static void fpu__initialize(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
@@ -238,7 +238,6 @@ void fpu__initialize(struct fpu *fpu)
 		fpu->initialized = 1;
 	}
 }
-EXPORT_SYMBOL_GPL(fpu__initialize);
 
 /*
  * This function must be called before we read a task's fpstate.
@@ -367,8 +366,8 @@ void fpu__clear(struct fpu *fpu)
 	/*
 	 * Make sure fpstate is cleared and initialized.
 	 */
+	fpu__initialize(fpu);
 	if (static_cpu_has(X86_FEATURE_FPU)) {
-		fpu__initialize(fpu);
 		user_fpu_begin();
 		copy_init_fpstate_to_fpregs();
 	}
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 9e2ba7e667f61..a873da6b46d6b 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -113,9 +113,6 @@ void math_emulate(struct math_emu_info *info)
 	unsigned long code_base = 0;
 	unsigned long code_limit = 0;	/* Initialized to stop compiler warnings */
 	struct desc_struct code_descriptor;
-	struct fpu *fpu = &current->thread.fpu;
-
-	fpu__initialize(fpu);
 
 #ifdef RE_ENTRANT_CHECKING
 	if (emulating) {
-- 
2.19.1



* [PATCH 07/23] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 06/23] x86/fpu: Always init the `state' " Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 08/23] x86/fpu: Remove fpu->initialized Sebastian Andrzej Siewior
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Since ->initialized is always true for user tasks and kernel threads
don't get this far, we always save the registers directly to userspace.

Remove the check for ->initialized because it is always true, and remove
the now-dead false branch.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 30 ++++++------------------------
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 5777ee0c32fed..30e65085dc4d9 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -157,7 +157,6 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct fpu *fpu = &current->thread.fpu;
-	struct xregs_state *xsave = &fpu->state.xsave;
 	struct task_struct *tsk = current;
 	int ia32_fxstate = (buf != buf_fx);
 
@@ -172,29 +171,12 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	if (fpu->initialized || using_compacted_format()) {
-		/* Save the live register state to the user directly. */
-		if (copy_fpregs_to_sigframe(buf_fx))
-			return -1;
-		/* Update the thread's fxstate to save the fsave header. */
-		if (ia32_fxstate)
-			copy_fxregs_to_kernel(fpu);
-	} else {
-		/*
-		 * It is a *bug* if kernel uses compacted-format for xsave
-		 * area and we copy it out directly to a signal frame. It
-		 * should have been handled above by saving the registers
-		 * directly.
-		 */
-		if (boot_cpu_has(X86_FEATURE_XSAVES)) {
-			WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
-			return -1;
-		}
-
-		fpstate_sanitize_xstate(fpu);
-		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
-			return -1;
-	}
+	/* Save the live register state to the user directly. */
+	if (copy_fpregs_to_sigframe(buf_fx))
+		return -1;
+	/* Update the thread's fxstate to save the fsave header. */
+	if (ia32_fxstate)
+		copy_fxregs_to_kernel(fpu);
 
 	/* Save the fsave header for the 32-bit frames. */
 	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))
-- 
2.19.1



* [PATCH 08/23] x86/fpu: Remove fpu->initialized
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 07/23] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 09/23] x86/fpu: Remove user_fpu_begin() Sebastian Andrzej Siewior
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The `initialized' member of the fpu struct is always set to one for user
tasks and zero for kernel tasks. This avoids saving/restoring the FPU
registers for kernel threads.

By dropping that member, the context switch time between kernel threads
increases because the FPU registers are needlessly stored and restored.
The following context switch
	1. TaskA -> kernel thread1
	2. kernel thread1 -> TaskA

will restore TaskA's FPU registers in step 2. This wasn't the case
earlier because they would not have been destroyed by the kernel thread.
fpu_fpregs_owner_ctx now always points to current's FPU struct (even for
kernel threads).

One behavioural difference: since the FPU registers are now restored for
kernel threads, they will have PKRU=0. Before this change, a kernel
thread had a random value (inherited from the previous user task).

The increased context switch time will be reverted once the FPU
registers are restored on return to userland.
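
The switch-in path, condensed from the diff below: the registers are now
always restored, with init_fpstate used for kernel threads (no ->mm):

	if (!fpregs_state_valid(new_fpu, cpu)) {
		if (current->mm)
			copy_kernel_to_fpregs(&new_fpu->state);
		else
			copy_kernel_to_fpregs(&init_fpstate);
	}
	fpregs_activate(new_fpu);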

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/ia32/ia32_signal.c         | 17 +++----
 arch/x86/include/asm/fpu/internal.h | 17 ++++---
 arch/x86/include/asm/fpu/types.h    |  9 ----
 arch/x86/include/asm/trace/fpu.h    |  5 +-
 arch/x86/kernel/fpu/core.c          | 77 +++++++----------------------
 arch/x86/kernel/fpu/init.c          |  2 -
 arch/x86/kernel/fpu/regset.c        | 19 ++-----
 arch/x86/kernel/fpu/xstate.c        |  2 -
 arch/x86/kernel/process_32.c        |  4 +-
 arch/x86/kernel/process_64.c        |  4 +-
 arch/x86/kernel/signal.c            | 17 +++----
 arch/x86/mm/pkeys.c                 |  7 +--
 12 files changed, 51 insertions(+), 129 deletions(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 86b1341cba9ac..eb215ac5d9f46 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -216,8 +216,7 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 				 size_t frame_size,
 				 void __user **fpstate)
 {
-	struct fpu *fpu = &current->thread.fpu;
-	unsigned long sp;
+	unsigned long sp, fx_aligned, math_size;
 
 	/* Default to using normal stack */
 	sp = regs->sp;
@@ -231,15 +230,11 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 		 ksig->ka.sa.sa_restorer)
 		sp = (unsigned long) ksig->ka.sa.sa_restorer;
 
-	if (fpu->initialized) {
-		unsigned long fx_aligned, math_size;
-
-		sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
-		*fpstate = (struct _fpstate_32 __user *) sp;
-		if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
-				    math_size) < 0)
-			return (void __user *) -1L;
-	}
+	sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
+	*fpstate = (struct _fpstate_32 __user *) sp;
+	if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
+				     math_size) < 0)
+		return (void __user *) -1L;
 
 	sp -= frame_size;
 	/* Align the stack pointer according to the i386 ABI,
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 971fb0988d56f..d7ef8ac8b17eb 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -526,7 +526,7 @@ static inline void fpregs_activate(struct fpu *fpu)
 static inline void
 switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU) && old_fpu->initialized) {
+	if (static_cpu_has(X86_FEATURE_FPU)) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
 			old_fpu->last_cpu = -1;
 		else
@@ -534,8 +534,7 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 
 		/* But leave fpu_fpregs_owner_ctx! */
 		trace_x86_fpu_regs_deactivated(old_fpu);
-	} else
-		old_fpu->last_cpu = -1;
+	}
 }
 
 /*
@@ -548,12 +547,14 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
-		       new_fpu->initialized;
+	if (static_cpu_has(X86_FEATURE_FPU)) {
+		if (!fpregs_state_valid(new_fpu, cpu)) {
+			if (current->mm)
+				copy_kernel_to_fpregs(&new_fpu->state);
+			else
+				copy_kernel_to_fpregs(&init_fpstate);
+		}
 
-	if (preload) {
-		if (!fpregs_state_valid(new_fpu, cpu))
-			copy_kernel_to_fpregs(&new_fpu->state);
 		fpregs_activate(new_fpu);
 	}
 }
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 202c53918ecfa..c5a6edd92de4f 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -293,15 +293,6 @@ struct fpu {
 	 */
 	unsigned int			last_cpu;
 
-	/*
-	 * @initialized:
-	 *
-	 * This flag indicates whether this context is initialized: if the task
-	 * is not running then we can restore from this context, if the task
-	 * is running then we should save into this context.
-	 */
-	unsigned char			initialized;
-
 	/*
 	 * @state:
 	 *
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index 069c04be15076..bd65f6ba950f8 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -13,22 +13,19 @@ DECLARE_EVENT_CLASS(x86_fpu,
 
 	TP_STRUCT__entry(
 		__field(struct fpu *, fpu)
-		__field(bool, initialized)
 		__field(u64, xfeatures)
 		__field(u64, xcomp_bv)
 		),
 
 	TP_fast_assign(
 		__entry->fpu		= fpu;
-		__entry->initialized	= fpu->initialized;
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
 			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
 			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
 		}
 	),
-	TP_printk("x86/fpu: %p initialized: %d xfeatures: %llx xcomp_bv: %llx",
+	TP_printk("x86/fpu: %p xfeatures: %llx xcomp_bv: %llx",
 			__entry->fpu,
-			__entry->initialized,
 			__entry->xfeatures,
 			__entry->xcomp_bv
 	)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8391debc57265..2ed82c6e646b9 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -101,15 +101,7 @@ void __kernel_fpu_begin(void)
 
 	kernel_fpu_disable();
 
-	if (fpu->initialized) {
-		/*
-		 * Ignore return value -- we don't care if reg state
-		 * is clobbered.
-		 */
-		copy_fpregs_to_fpstate(fpu);
-	} else {
-		__cpu_invalidate_fpregs_state();
-	}
+	copy_fpregs_to_fpstate(fpu);
 }
 EXPORT_SYMBOL(__kernel_fpu_begin);
 
@@ -117,8 +109,7 @@ void __kernel_fpu_end(void)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (fpu->initialized)
-		copy_kernel_to_fpregs(&fpu->state);
+	copy_kernel_to_fpregs(&fpu->state);
 
 	kernel_fpu_enable();
 }
@@ -149,10 +140,9 @@ void fpu__save(struct fpu *fpu)
 
 	preempt_disable();
 	trace_x86_fpu_before_save(fpu);
-	if (fpu->initialized) {
-		if (!copy_fpregs_to_fpstate(fpu)) {
-			copy_kernel_to_fpregs(&fpu->state);
-		}
+
+	if (!copy_fpregs_to_fpstate(fpu)) {
+		copy_kernel_to_fpregs(&fpu->state);
 	}
 	trace_x86_fpu_after_save(fpu);
 	preempt_enable();
@@ -192,7 +182,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 {
 	dst_fpu->last_cpu = -1;
 
-	if (!src_fpu->initialized || !static_cpu_has(X86_FEATURE_FPU))
+	if (!static_cpu_has(X86_FEATURE_FPU))
 		return 0;
 
 	WARN_ON_FPU(src_fpu != &current->thread.fpu);
@@ -229,14 +219,10 @@ static void fpu__initialize(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	if (!fpu->initialized) {
-		fpstate_init(&fpu->state);
-		trace_x86_fpu_init_state(fpu);
+	fpstate_init(&fpu->state);
+	trace_x86_fpu_init_state(fpu);
 
-		trace_x86_fpu_activate_state(fpu);
-		/* Safe to do for the current task: */
-		fpu->initialized = 1;
-	}
+	trace_x86_fpu_activate_state(fpu);
 }
 
 /*
@@ -249,32 +235,20 @@ static void fpu__initialize(struct fpu *fpu)
  *
  * - or it's called for stopped tasks (ptrace), in which case the
  *   registers were already saved by the context-switch code when
- *   the task scheduled out - we only have to initialize the registers
- *   if they've never been initialized.
+ *   the task scheduled out.
  *
  * If the task has used the FPU before then save it.
  */
 void fpu__prepare_read(struct fpu *fpu)
 {
-	if (fpu == &current->thread.fpu) {
+	if (fpu == &current->thread.fpu)
 		fpu__save(fpu);
-	} else {
-		if (!fpu->initialized) {
-			fpstate_init(&fpu->state);
-			trace_x86_fpu_init_state(fpu);
-
-			trace_x86_fpu_activate_state(fpu);
-			/* Safe to do for current and for stopped child tasks: */
-			fpu->initialized = 1;
-		}
-	}
 }
 
 /*
  * This function must be called before we write a task's fpstate.
  *
- * If the task has used the FPU before then invalidate any cached FPU registers.
- * If the task has not used the FPU before then initialize its fpstate.
+ * Invalidate any cached FPU registers.
  *
  * After this function call, after registers in the fpstate are
  * modified and the child task has woken up, the child task will
@@ -291,17 +265,8 @@ void fpu__prepare_write(struct fpu *fpu)
 	 */
 	WARN_ON_FPU(fpu == &current->thread.fpu);
 
-	if (fpu->initialized) {
-		/* Invalidate any cached state: */
-		__fpu_invalidate_fpregs_state(fpu);
-	} else {
-		fpstate_init(&fpu->state);
-		trace_x86_fpu_init_state(fpu);
-
-		trace_x86_fpu_activate_state(fpu);
-		/* Safe to do for stopped child tasks: */
-		fpu->initialized = 1;
-	}
+	/* Invalidate any cached state: */
+	__fpu_invalidate_fpregs_state(fpu);
 }
 
 /*
@@ -318,17 +283,13 @@ void fpu__drop(struct fpu *fpu)
 	preempt_disable();
 
 	if (fpu == &current->thread.fpu) {
-		if (fpu->initialized) {
-			/* Ignore delayed exceptions from user space */
-			asm volatile("1: fwait\n"
-				     "2:\n"
-				     _ASM_EXTABLE(1b, 2b));
-			fpregs_deactivate(fpu);
-		}
+		/* Ignore delayed exceptions from user space */
+		asm volatile("1: fwait\n"
+			     "2:\n"
+			     _ASM_EXTABLE(1b, 2b));
+		fpregs_deactivate(fpu);
 	}
 
-	fpu->initialized = 0;
-
 	trace_x86_fpu_dropped(fpu);
 
 	preempt_enable();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6abd83572b016..20d8fa7124c77 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -239,8 +239,6 @@ static void __init fpu__init_system_ctx_switch(void)
 
 	WARN_ON_FPU(!on_boot_cpu);
 	on_boot_cpu = 0;
-
-	WARN_ON_FPU(current->thread.fpu.initialized);
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 5dbc099178a88..d652b939ccfb5 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -15,16 +15,12 @@
  */
 int regset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
-	struct fpu *target_fpu = &target->thread.fpu;
-
-	return target_fpu->initialized ? regset->n : 0;
+	return regset->n;
 }
 
 int regset_xregset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
-	struct fpu *target_fpu = &target->thread.fpu;
-
-	if (boot_cpu_has(X86_FEATURE_FXSR) && target_fpu->initialized)
+	if (boot_cpu_has(X86_FEATURE_FXSR))
 		return regset->n;
 	else
 		return 0;
@@ -370,16 +366,9 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 int dump_fpu(struct pt_regs *regs, struct user_i387_struct *ufpu)
 {
 	struct task_struct *tsk = current;
-	struct fpu *fpu = &tsk->thread.fpu;
-	int fpvalid;
 
-	fpvalid = fpu->initialized;
-	if (fpvalid)
-		fpvalid = !fpregs_get(tsk, NULL,
-				      0, sizeof(struct user_i387_ia32_struct),
-				      ufpu, NULL);
-
-	return fpvalid;
+	return !fpregs_get(tsk, NULL, 0, sizeof(struct user_i387_ia32_struct),
+			   ufpu, NULL);
 }
 EXPORT_SYMBOL(dump_fpu);
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9dc0a2c8c2755..a2041961f3008 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -894,8 +894,6 @@ const void *get_xsave_field_ptr(int xsave_state)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (!fpu->initialized)
-		return NULL;
 	/*
 	 * fpu__save() takes the CPU's xstate registers
 	 * and saves them off to the 'fpu memory buffer.
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index b8a935adc96fc..5c99e49c0d392 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -292,10 +292,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	if (prev->gs | next->gs)
 		lazy_load_gs(next->gs);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	this_cpu_write(current_task, next_p);
 
+	switch_fpu_finish(next_fpu, cpu);
+
 	/* Load the Intel cache allocation PQR MSR. */
 	intel_rdt_sched_in();
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index cf2c157714594..4fd0dfa5bd83e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -604,14 +604,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	x86_fsgsbase_load(prev, next);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	/*
 	 * Switch the PDA and FPU contexts.
 	 */
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
+	switch_fpu_finish(next_fpu, cpu);
+
 	/* Reload sp0. */
 	update_task_stack(next_p);
 
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 92a3b312a53c4..2687e698c31fc 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -246,7 +246,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 	unsigned long sp = regs->sp;
 	unsigned long buf_fx = 0;
 	int onsigstack = on_sig_stack(sp);
-	struct fpu *fpu = &current->thread.fpu;
+	int ret;
 
 	/* redzone */
 	if (IS_ENABLED(CONFIG_X86_64))
@@ -265,11 +265,9 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 		sp = (unsigned long) ka->sa.sa_restorer;
 	}
 
-	if (fpu->initialized) {
-		sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
-					  &buf_fx, &math_size);
-		*fpstate = (void __user *)sp;
-	}
+	sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
+				  &buf_fx, &math_size);
+	*fpstate = (void __user *)sp;
 
 	sp = align_sigframe(sp - frame_size);
 
@@ -281,8 +279,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 		return (void __user *)-1L;
 
 	/* save i387 and extended state */
-	if (fpu->initialized &&
-	    copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size) < 0)
+	ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size);
+	if (ret < 0)
 		return (void __user *)-1L;
 
 	return (void __user *)sp;
@@ -763,8 +761,7 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 		/*
 		 * Ensure the signal handler starts with the new fpu state.
 		 */
-		if (fpu->initialized)
-			fpu__clear(fpu);
+		fpu__clear(fpu);
 	}
 	signal_setup_done(failed, ksig, stepping);
 }
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 6e98e0a7c9231..0af3ece18f113 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -39,17 +39,12 @@ int __execute_only_pkey(struct mm_struct *mm)
 	 * dance to set PKRU if we do not need to.  Check it
 	 * first and assume that if the execute-only pkey is
 	 * write-disabled that we do not have to set it
-	 * ourselves.  We need preempt off so that nobody
-	 * can make fpregs inactive.
+	 * ourselves.
 	 */
-	preempt_disable();
 	if (!need_to_set_mm_pkey &&
-	    current->thread.fpu.initialized &&
 	    !__pkru_allows_read(read_pkru(), execute_only_pkey)) {
-		preempt_enable();
 		return execute_only_pkey;
 	}
-	preempt_enable();
 
 	/*
 	 * Set up PKRU so that it denies access for everything
-- 
2.19.1



* [PATCH 09/23] x86/fpu: Remove user_fpu_begin()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (7 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 08/23] x86/fpu: Remove fpu->initialized Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 10/23] x86/entry: Remove _TIF_ALLWORK_MASK Sebastian Andrzej Siewior
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

user_fpu_begin() sets fpu_fpregs_owner_ctx to the task's fpu struct.
This is already always the case since there is no lazy FPU anymore.
fpu_fpregs_owner_ctx is used during context switch to decide whether the
saved registers need to be loaded or the currently loaded registers are
still valid. This field is always updated on context switch, so setting
it manually has no benefit.
Before the removal of the ->initialized member, fpu_fpregs_owner_ctx
wouldn't be updated for kernel threads. But kernel threads don't use the
fpu struct, so again user_fpu_begin() would not make a difference in
user context (and would be wrong in kernel context).

Remove user_fpu_begin(); it does not change fpu_fpregs_owner_ctx's
content.
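
For reference, the context-switch validity check that makes the manual
update unnecessary looks roughly like this (condensed from the existing
code in fpu/internal.h):

	static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
	{
		return fpu == this_cpu_read(fpu_fpregs_owner_ctx) &&
		       cpu == fpu->last_cpu;
	}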

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 17 -----------------
 arch/x86/kernel/fpu/core.c          |  4 +---
 arch/x86/kernel/fpu/signal.c        |  1 -
 3 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d7ef8ac8b17eb..d7f9e14c73c19 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -559,23 +559,6 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 	}
 }
 
-/*
- * Needs to be preemption-safe.
- *
- * NOTE! user_fpu_begin() must be used only immediately before restoring
- * the save state. It does not do any saving/restoring on its own. In
- * lazy FPU mode, it is just an optimization to avoid a #NM exception,
- * the task can lose the FPU right after preempt_enable().
- */
-static inline void user_fpu_begin(void)
-{
-	struct fpu *fpu = &current->thread.fpu;
-
-	preempt_disable();
-	fpregs_activate(fpu);
-	preempt_enable();
-}
-
 /*
  * MXCSR and XCR definitions:
  */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2ed82c6e646b9..02bbd40fe1e95 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -328,10 +328,8 @@ void fpu__clear(struct fpu *fpu)
 	 * Make sure fpstate is cleared and initialized.
 	 */
 	fpu__initialize(fpu);
-	if (static_cpu_has(X86_FEATURE_FPU)) {
-		user_fpu_begin();
+	if (static_cpu_has(X86_FEATURE_FPU))
 		copy_init_fpstate_to_fpregs();
-	}
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 30e65085dc4d9..f61f4d804537a 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -328,7 +328,6 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 * For 64-bit frames and 32-bit fsave frames, restore the user
 		 * state to the registers directly (with exceptions handled).
 		 */
-		user_fpu_begin();
 		if (copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only)) {
 			fpu__clear(fpu);
 			return -1;
-- 
2.19.1



* [PATCH 10/23] x86/entry: Remove _TIF_ALLWORK_MASK
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (8 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 09/23] x86/fpu: Remove user_fpu_begin() Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 11/23] x86/fpu: Add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

There is no user of _TIF_ALLWORK_MASK since commit 21d375b6b34ff
("x86/entry/64: Remove the SYSCALL64 fast path").
Remove unused define _TIF_ALLWORK_MASK.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/thread_info.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 2ff2a30a264f4..cd6920674b905 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -136,14 +136,6 @@ struct thread_info {
 	 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT |	\
 	 _TIF_NOHZ)
 
-/* work to do on any return to user space */
-#define _TIF_ALLWORK_MASK						\
-	(_TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |	\
-	 _TIF_NEED_RESCHED | _TIF_SINGLESTEP | _TIF_SYSCALL_EMU |	\
-	 _TIF_SYSCALL_AUDIT | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE |	\
-	 _TIF_PATCH_PENDING | _TIF_NOHZ | _TIF_SYSCALL_TRACEPOINT |	\
-	 _TIF_FSCHECK)
-
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW							\
 	(_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP|_TIF_SSBD)
-- 
2.19.1



* [PATCH 11/23] x86/fpu: Add (__)make_fpregs_active helpers
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (9 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 10/23] x86/entry: Remove _TIF_ALLWORK_MASK Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 12/23] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

Add helper functions that ensure the floating point registers for the
current task are active. Use with preemption disabled.
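
A hypothetical caller would look like this (illustrative only; the
helpers are put to use later in the series):

	struct fpu *fpu = &current->thread.fpu;

	__fpregs_changes_begin();		/* disables preemption */
	__fpregs_load_activate(fpu, smp_processor_id());
	/* the task's FPU registers are now live in this CPU */
	__fpregs_changes_end();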

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/api.h      | 11 +++++++++++
 arch/x86/include/asm/fpu/internal.h | 26 ++++++++++++++++----------
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index a9caac9d4a729..8c03b42416656 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -10,6 +10,7 @@
 
 #ifndef _ASM_X86_FPU_API_H
 #define _ASM_X86_FPU_API_H
+#include <linux/preempt.h>
 
 /*
  * Careful: __kernel_fpu_begin/end() must be called with preempt disabled
@@ -27,6 +28,16 @@ extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
 
+static inline void __fpregs_changes_begin(void)
+{
+	preempt_disable();
+}
+
+static inline void __fpregs_changes_end(void)
+{
+	preempt_enable();
+}
+
 /*
  * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
  *
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d7f9e14c73c19..260cd4f4ba2bb 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -512,6 +512,20 @@ static inline void fpregs_activate(struct fpu *fpu)
 	trace_x86_fpu_regs_activated(fpu);
 }
 
+/*
+ * Load the FPU state for the current task. Call with preemption disabled.
+ */
+static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
+{
+	if (!fpregs_state_valid(fpu, cpu)) {
+		if (current->mm)
+			copy_kernel_to_fpregs(&fpu->state);
+		else
+			copy_kernel_to_fpregs(&init_fpstate);
+	}
+	fpregs_activate(fpu);
+}
+
 /*
  * FPU state switching for scheduling.
  *
@@ -547,16 +561,8 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU)) {
-		if (!fpregs_state_valid(new_fpu, cpu)) {
-			if (current->mm)
-				copy_kernel_to_fpregs(&new_fpu->state);
-			else
-				copy_kernel_to_fpregs(&init_fpstate);
-		}
-
-		fpregs_activate(new_fpu);
-	}
+	if (static_cpu_has(X86_FEATURE_FPU))
+		__fpregs_load_activate(new_fpu, cpu);
 }
 
 /*
-- 
2.19.1



* [PATCH 12/23] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (10 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 11/23] x86/fpu: Add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 13/23] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() " Sebastian Andrzej Siewior
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Most users of __raw_xsave_addr() take a feature number, shift it to a
mask, and then __raw_xsave_addr() shifts it back to the feature number.

Make __raw_xsave_addr() take the feature number as its argument.
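
The callers change as follows (taken from the diff below):

	/* before: the feature number i is shifted to a mask ... */
	src = __raw_xsave_addr(xsave, 1 << i);

	/* ... after: the feature number is passed directly */
	src = __raw_xsave_addr(xsave, i);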

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/xstate.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index a2041961f3008..3dfe3627deaf6 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -805,22 +805,20 @@ void fpu__resume_cpu(void)
 }
 
 /*
- * Given an xstate feature mask, calculate where in the xsave
+ * Given an xstate feature nr, calculate where in the xsave
  * buffer the state is.  Callers should ensure that the buffer
  * is valid.
  *
  * Note: does not work for compacted buffers.
  */
-void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
+static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 {
-	int feature_nr = fls64(xstate_feature_mask) - 1;
-
-	if (!xfeature_enabled(feature_nr)) {
+	if (!xfeature_enabled(xfeature_nr)) {
 		WARN_ON_FPU(1);
 		return NULL;
 	}
 
-	return (void *)xsave + xstate_comp_offsets[feature_nr];
+	return (void *)xsave + xstate_comp_offsets[xfeature_nr];
 }
 /*
  * Given the xsave area and a state inside, this function returns the
@@ -842,6 +840,7 @@ void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
  */
 void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 {
+	int xfeature_nr;
 	/*
 	 * Do we even *have* xsave state?
 	 */
@@ -869,7 +868,8 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 	if (!(xsave->header.xfeatures & xstate_feature))
 		return NULL;
 
-	return __raw_xsave_addr(xsave, xstate_feature);
+	xfeature_nr = fls64(xstate_feature) - 1;
+	return __raw_xsave_addr(xsave, xfeature_nr);
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
 
@@ -1016,7 +1016,7 @@ int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int of
 		 * Copy only in-use xstates:
 		 */
 		if ((header.xfeatures >> i) & 1) {
-			void *src = __raw_xsave_addr(xsave, 1 << i);
+			void *src = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1102,7 +1102,7 @@ int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned i
 		 * Copy only in-use xstates:
 		 */
 		if ((header.xfeatures >> i) & 1) {
-			void *src = __raw_xsave_addr(xsave, 1 << i);
+			void *src = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1159,7 +1159,7 @@ int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
 		u64 mask = ((u64)1 << i);
 
 		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, 1 << i);
+			void *dst = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1213,7 +1213,7 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
 		u64 mask = ((u64)1 << i);
 
 		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, 1 << i);
+			void *dst = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 13/23] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use feature number instead of mask
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (11 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 12/23] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 14/23] x86/pkeys: Make init_pkru_value static Sebastian Andrzej Siewior
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

After changing the argument of __raw_xsave_addr() from a mask to a
number, Dave suggested checking whether it makes sense to do the same
for get_xsave_addr(). As it turns out, it does: get_xsave_addr() only
needs the mask to check whether the requested feature is part of what
is supported/saved, and then uses the number again. The shift operation
is cheaper than "find last bit set". Also, the feature number uses less
opcode space than the mask :)

Make the get_xsave_addr() argument an xfeature number instead of a mask
and fix up its callers.
As part of this, use xfeature_nr and xfeature_mask consistently.
This results in changes to the kvm code as:
	feature -> xfeature_mask
	index -> xfeature_nr
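
To make the mask/number relationship concrete, a short sketch (assuming
XFEATURE_PKRU == 9, as currently enumerated in the xfeature list):

	u64 xfeature_mask = 1ULL << XFEATURE_PKRU;	/* 0x200: cheap shift */
	int xfeature_nr   = fls64(xfeature_mask) - 1;	/* 9: find last set  */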

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/xstate.h |  4 ++--
 arch/x86/kernel/fpu/xstate.c      | 23 +++++++++++------------
 arch/x86/kernel/traps.c           |  2 +-
 arch/x86/kvm/x86.c                | 28 ++++++++++++++--------------
 arch/x86/mm/mpx.c                 |  6 +++---
 5 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 48581988d78c7..fbe41f808e5d8 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -46,8 +46,8 @@ extern void __init update_regset_xstate_info(unsigned int size,
 					     u64 xstate_mask);
 
 void fpu__xstate_clear_all_cpu_caps(void);
-void *get_xsave_addr(struct xregs_state *xsave, int xstate);
-const void *get_xsave_field_ptr(int xstate_field);
+void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
+const void *get_xsave_field_ptr(int xfeature_nr);
 int using_compacted_format(void);
 int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
 int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 3dfe3627deaf6..375226055a413 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -832,15 +832,15 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
  *
  * Inputs:
  *	xstate: the thread's storage area for all FPU data
- *	xstate_feature: state which is defined in xsave.h (e.g.
- *	XFEATURE_MASK_FP, XFEATURE_MASK_SSE, etc...)
+ *	xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
+ *	XFEATURE_SSE, etc...)
  * Output:
  *	address of the state in the xsave area, or NULL if the
  *	field is not present in the xsave buffer.
  */
-void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
+void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 {
-	int xfeature_nr;
+	u64 xfeature_mask = 1ULL << xfeature_nr;
 	/*
 	 * Do we even *have* xsave state?
 	 */
@@ -852,11 +852,11 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 	 * have not enabled.  Remember that pcntxt_mask is
 	 * what we write to the XCR0 register.
 	 */
-	WARN_ONCE(!(xfeatures_mask & xstate_feature),
+	WARN_ONCE(!(xfeatures_mask & xfeature_mask),
 		  "get of unsupported state");
 	/*
 	 * This assumes the last 'xsave*' instruction to
-	 * have requested that 'xstate_feature' be saved.
+	 * have requested that 'xfeature_mask' be saved.
 	 * If it did not, we might be seeing and old value
 	 * of the field in the buffer.
 	 *
@@ -865,10 +865,9 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 	 * or because the "init optimization" caused it
 	 * to not be saved.
 	 */
-	if (!(xsave->header.xfeatures & xstate_feature))
+	if (!(xsave->header.xfeatures & xfeature_mask))
 		return NULL;
 
-	xfeature_nr = fls64(xstate_feature) - 1;
 	return __raw_xsave_addr(xsave, xfeature_nr);
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
@@ -884,13 +883,13 @@ EXPORT_SYMBOL_GPL(get_xsave_addr);
  * Note that this only works on the current task.
  *
  * Inputs:
- *	@xsave_state: state which is defined in xsave.h (e.g. XFEATURE_MASK_FP,
- *	XFEATURE_MASK_SSE, etc...)
+ *	@xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
+ *	XFEATURE_SSE, etc...)
  * Output:
  *	address of the state in the xsave area or NULL if the state
  *	is not present or is in its 'init state'.
  */
-const void *get_xsave_field_ptr(int xsave_state)
+const void *get_xsave_field_ptr(int xfeature_nr)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
@@ -900,7 +899,7 @@ const void *get_xsave_field_ptr(int xsave_state)
 	 */
 	fpu__save(fpu);
 
-	return get_xsave_addr(&fpu->state.xsave, xsave_state);
+	return get_xsave_addr(&fpu->state.xsave, xfeature_nr);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9b7c4ca8f0a73..626853b2ac344 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -455,7 +455,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 	 * which is all zeros which indicates MPX was not
 	 * responsible for the exception.
 	 */
-	bndcsr = get_xsave_field_ptr(XFEATURE_MASK_BNDCSR);
+	bndcsr = get_xsave_field_ptr(XFEATURE_BNDCSR);
 	if (!bndcsr)
 		goto exit_trap;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5cd5647120f2b..75301b439b6e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3650,15 +3650,15 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 	 */
 	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
 	while (valid) {
-		u64 feature = valid & -valid;
-		int index = fls64(feature) - 1;
-		void *src = get_xsave_addr(xsave, feature);
+		u64 xfeature_mask = valid & -valid;
+		int xfeature_nr = fls64(xfeature_mask) - 1;
+		void *src = get_xsave_addr(xsave, xfeature_nr);
 
 		if (src) {
 			u32 size, offset, ecx, edx;
-			cpuid_count(XSTATE_CPUID, index,
+			cpuid_count(XSTATE_CPUID, xfeature_nr,
 				    &size, &offset, &ecx, &edx);
-			if (feature == XFEATURE_MASK_PKRU)
+			if (xfeature_nr == XFEATURE_PKRU)
 				memcpy(dest + offset, &vcpu->arch.pkru,
 				       sizeof(vcpu->arch.pkru));
 			else
@@ -3666,7 +3666,7 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 
 		}
 
-		valid -= feature;
+		valid -= xfeature_mask;
 	}
 }
 
@@ -3693,22 +3693,22 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	 */
 	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
 	while (valid) {
-		u64 feature = valid & -valid;
-		int index = fls64(feature) - 1;
-		void *dest = get_xsave_addr(xsave, feature);
+		u64 xfeature_mask = valid & -valid;
+		int xfeature_nr = fls64(xfeature_mask) - 1;
+		void *dest = get_xsave_addr(xsave, xfeature_nr);
 
 		if (dest) {
 			u32 size, offset, ecx, edx;
-			cpuid_count(XSTATE_CPUID, index,
+			cpuid_count(XSTATE_CPUID, xfeature_nr,
 				    &size, &offset, &ecx, &edx);
-			if (feature == XFEATURE_MASK_PKRU)
+			if (xfeature_nr == XFEATURE_PKRU)
 				memcpy(&vcpu->arch.pkru, src + offset,
 				       sizeof(vcpu->arch.pkru));
 			else
 				memcpy(dest, src + offset, size);
 		}
 
-		valid -= feature;
+		valid -= xfeature_mask;
 	}
 }
 
@@ -8704,11 +8704,11 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		if (init_event)
 			kvm_put_guest_fpu(vcpu);
 		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
-					XFEATURE_MASK_BNDREGS);
+					XFEATURE_BNDREGS);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndreg_state));
 		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
-					XFEATURE_MASK_BNDCSR);
+					XFEATURE_BNDCSR);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
 		if (init_event)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 2385538e80656..bcd55cc3050c2 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -142,7 +142,7 @@ int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs)
 		goto err_out;
 	}
 	/* get bndregs field from current task's xsave area */
-	bndregs = get_xsave_field_ptr(XFEATURE_MASK_BNDREGS);
+	bndregs = get_xsave_field_ptr(XFEATURE_BNDREGS);
 	if (!bndregs) {
 		err = -EINVAL;
 		goto err_out;
@@ -190,7 +190,7 @@ static __user void *mpx_get_bounds_dir(void)
 	 * The bounds directory pointer is stored in a register
 	 * only accessible if we first do an xsave.
 	 */
-	bndcsr = get_xsave_field_ptr(XFEATURE_MASK_BNDCSR);
+	bndcsr = get_xsave_field_ptr(XFEATURE_BNDCSR);
 	if (!bndcsr)
 		return MPX_INVALID_BOUNDS_DIR;
 
@@ -376,7 +376,7 @@ static int do_mpx_bt_fault(void)
 	const struct mpx_bndcsr *bndcsr;
 	struct mm_struct *mm = current->mm;
 
-	bndcsr = get_xsave_field_ptr(XFEATURE_MASK_BNDCSR);
+	bndcsr = get_xsave_field_ptr(XFEATURE_BNDCSR);
 	if (!bndcsr)
 		return -EINVAL;
 	/*
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 14/23] x86/pkeys: Make init_pkru_value static
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (12 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 13/23] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() " Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 15/23] x86/fpu: Only write PKRU if it is different from current Sebastian Andrzej Siewior
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The variable init_pkru_value isn't used outside of this file.
Make init_pkru_value static.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/mm/pkeys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 0af3ece18f113..05bb9a44eb1c3 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -126,6 +126,7 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
  * in the process's lifetime will not accidentally get access
  * to data which is pkey-protected later on.
  */
+static
 u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
 		      PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) |
 		      PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) |
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 15/23] x86/fpu: Only write PKRU if it is different from current
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (13 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 14/23] x86/pkeys: Make init_pkru_value static Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 16/23] x86/pkeys: Don't check if PKRU is zero before writing it Sebastian Andrzej Siewior
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Dave Hansen says that `wrpkru' is more expensive than `rdpkru': it has
a higher cycle cost and it is also practically a (light) speculation
barrier.

As an optimisation, read the current PKRU value and only write the new
one if it is different.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/special_insns.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 43c029cdc3fe8..c2ccf71b22dd6 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -107,7 +107,7 @@ static inline u32 __read_pkru(void)
 	return pkru;
 }
 
-static inline void __write_pkru(u32 pkru)
+static inline void __write_pkru_insn(u32 pkru)
 {
 	u32 ecx = 0, edx = 0;
 
@@ -118,6 +118,17 @@ static inline void __write_pkru(u32 pkru)
 	asm volatile(".byte 0x0f,0x01,0xef\n\t"
 		     : : "a" (pkru), "c"(ecx), "d"(edx));
 }
+
+static inline void __write_pkru(u32 pkru)
+{
+	/*
+	 * Writing PKRU is expensive. Only write the PKRU value if it is
+	 * different from the current one.
+	 */
+	if (pkru == __read_pkru())
+		return;
+	__write_pkru_insn(pkru);
+}
 #else
 static inline u32 __read_pkru(void)
 {
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 16/23] x86/pkeys: Don't check if PKRU is zero before writing it
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (14 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 15/23] x86/fpu: Only write PKRU if it is different from current Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 17/23] x86/fpu: Eager switch PKRU state Sebastian Andrzej Siewior
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

write_pkru() now checks whether the current value is the same as the
new one and skips the write if so. This covers the old special case of
skipping the write when both the current and the new value are zero, so
we can simply rely on write_pkru().

Remove the zero check of PKRU; write_pkru() provides an equivalent
check.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/mm/pkeys.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 05bb9a44eb1c3..50f65fc1b9a3f 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -142,13 +142,6 @@ u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
 void copy_init_pkru_to_fpregs(void)
 {
 	u32 init_pkru_value_snapshot = READ_ONCE(init_pkru_value);
-	/*
-	 * Any write to PKRU takes it out of the XSAVE 'init
-	 * state' which increases context switch cost.  Avoid
-	 * writing 0 when PKRU was already 0.
-	 */
-	if (!init_pkru_value_snapshot && !read_pkru())
-		return;
 	/*
 	 * Override the PKRU state that came from 'init_fpstate'
 	 * with the baseline from the process.
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 17/23] x86/fpu: Eager switch PKRU state
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (15 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 16/23] x86/pkeys: Don't check if PKRU is zero before writing it Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-08 11:12   ` Paolo Bonzini
  2018-11-07 19:48 ` [PATCH 18/23] x86/entry: Add TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

While most of a task's FPU state is only needed in user space, the
protection keys need to be in place immediately after a context switch.

The reason is that any access to userspace memory while running in
kernel mode also needs to abide by the memory permissions specified in
the protection keys.
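
To illustrate (a sketch, not part of this patch; val and buf are
illustrative): a plain uaccess from kernel mode is governed by the
user's protection keys, so PKRU has to be correct before any such
access is made:

	/* returns a non-zero count if PKRU denies access to buf */
	if (copy_from_user(&val, buf, sizeof(val)))
		return -EFAULT;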

The "eager switch" is a preparation for loading the FPU state on return
to userland. Instead of decoupling PKRU state from xstate I update PKRU
within xstate on write operations by the kernel.

read_pkru()/write_pkru() are moved to another header file so they can
be easily accessed from pgtable.h and fpu/internal.h.

Signed-off-by: Rik van Riel <riel@surriel.com>
[bigeasy: save pkru to xstate, no cache]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 13 +++++++++++--
 arch/x86/include/asm/fpu/xstate.h   |  2 ++
 arch/x86/kernel/fpu/xstate.c        |  2 +-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 260cd4f4ba2bb..ed65e0642a1e1 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -561,8 +561,17 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU))
-		__fpregs_load_activate(new_fpu, cpu);
+	if (!static_cpu_has(X86_FEATURE_FPU))
+		return;
+
+	__fpregs_load_activate(new_fpu, cpu);
+
+	if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
+		struct pkru_state *pk;
+
+		pk = __raw_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
+		__write_pkru(pk->pkru);
+	}
 }
 
 /*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index fbe41f808e5d8..dd138f5eb5c66 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -5,6 +5,7 @@
 #include <linux/types.h>
 #include <asm/processor.h>
 #include <linux/uaccess.h>
+#include <asm/user.h>
 
 /* Bit 63 of XCR0 is reserved for future expansion */
 #define XFEATURE_MASK_EXTEND	(~(XFEATURE_MASK_FPSSE | (1ULL << 63)))
@@ -47,6 +48,7 @@ extern void __init update_regset_xstate_info(unsigned int size,
 
 void fpu__xstate_clear_all_cpu_caps(void);
 void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
+void *__raw_xsave_addr(struct xregs_state *xsave, int feature_nr);
 const void *get_xsave_field_ptr(int xfeature_nr);
 int using_compacted_format(void);
 int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 375226055a413..5b33985d9f475 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -811,7 +811,7 @@ void fpu__resume_cpu(void)
  *
  * Note: does not work for compacted buffers.
  */
-static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
+void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 {
 	if (!xfeature_enabled(xfeature_nr)) {
 		WARN_ON_FPU(1);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 18/23] x86/entry: Add TIF_NEED_FPU_LOAD
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (16 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 17/23] x86/fpu: Eager switch PKRU state Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 19/23] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Add TIF_NEED_FPU_LOAD. This flag indicates that the FPU registers need
to be loaded before returning to userland. It must not be set on
systems without an FPU.
If this flag is cleared, the CPU's FPU registers hold the current
content of current()'s FPU registers and the in-memory copy (union
fpregs_state) is not valid.
If this flag is set, then all of the CPU's FPU registers may hold
random values (except for PKRU) and the FPU registers need to be
loaded from memory on return to userland.
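
A minimal sketch of the resulting contract (switch_fpu_return() is only
introduced later in this series):

	if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return();	/* memory copy valid, registers not */
	/* from here on the CPU's FPU registers belong to current() */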

The flag is introduced now so that the code handling it can be added
before the main feature.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/thread_info.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cd6920674b905..963a39da0a95c 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -83,6 +83,7 @@ struct thread_info {
 #define TIF_SYSCALL_EMU		6	/* syscall emulation active */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SECCOMP		8	/* secure computing */
+#define TIF_NEED_FPU_LOAD		10	/* load FPU on return to userspace */
 #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
 #define TIF_UPROBE		12	/* breakpointed or singlestepping */
 #define TIF_PATCH_PENDING	13	/* pending live patching update */
@@ -110,6 +111,7 @@ struct thread_info {
 #define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_NEED_FPU_LOAD		(1 << TIF_NEED_FPU_LOAD)
 #define _TIF_USER_RETURN_NOTIFY	(1 << TIF_USER_RETURN_NOTIFY)
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_PATCH_PENDING	(1 << TIF_PATCH_PENDING)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 19/23] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (17 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 18/23] x86/entry: Add TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 20/23] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

copy_fpstate_to_sigframe() stores the registers directly to user space.
This is okay because the FPU registers are valid and saving them
directly avoids saving them into kernel memory and then making a copy.
However, we can't keep doing this if we are going to restore the FPU
registers on the return to userland: the FPU registers could be
invalidated in the middle of the save operation, so the save would have
to be done with preemption / BH disabled.

Save the FPU registers to the task's FPU struct and copy them to user
memory later on.

This code is extracted from an earlier version of the patchset, from
when x86 still had lazy FPU switching.
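
The resulting flow, as a rough sketch (error handling and the
sanitizing step omitted):

	/* 1) save the live registers into the task's fpu->state */
	if (ia32_fxstate)
		copy_fxregs_to_kernel(fpu);
	else
		copy_fpregs_to_fpstate(fpu);

	/* 2) copy that in-kernel state out to the user's sigframe */
	if (using_compacted_format())
		copy_xstate_to_user(buf_fx, xsave, 0, size);
	else
		__copy_to_user(buf_fx, xsave, fpu_user_xstate_size);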

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 45 -----------------------------
 arch/x86/kernel/fpu/signal.c        | 30 +++++++------------
 2 files changed, 11 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index ed65e0642a1e1..9e213a6703c84 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -123,22 +123,6 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
 		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore)	\
 		     : output : input)
 
-static inline int copy_fregs_to_user(struct fregs_state __user *fx)
-{
-	return user_insn(fnsave %[fx]; fwait,  [fx] "=m" (*fx), "m" (*fx));
-}
-
-static inline int copy_fxregs_to_user(struct fxregs_state __user *fx)
-{
-	if (IS_ENABLED(CONFIG_X86_32))
-		return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
-	else if (IS_ENABLED(CONFIG_AS_FXSAVEQ))
-		return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
-
-	/* See comment in copy_fxregs_to_kernel() below. */
-	return user_insn(rex64/fxsave (%[fx]), "=m" (*fx), [fx] "R" (fx));
-}
-
 static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
 {
 	if (IS_ENABLED(CONFIG_X86_32)) {
@@ -349,35 +333,6 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
 	XSTATE_XRESTORE(xstate, lmask, hmask);
 }
 
-/*
- * Save xstate to user space xsave area.
- *
- * We don't use modified optimization because xrstor/xrstors might track
- * a different application.
- *
- * We don't use compacted format xsave area for
- * backward compatibility for old applications which don't understand
- * compacted format of xsave area.
- */
-static inline int copy_xregs_to_user(struct xregs_state __user *buf)
-{
-	int err;
-
-	/*
-	 * Clear the xsave header first, so that reserved fields are
-	 * initialized to zero.
-	 */
-	err = __clear_user(&buf->header, sizeof(buf->header));
-	if (unlikely(err))
-		return -EFAULT;
-
-	stac();
-	XSTATE_OP(XSAVE, buf, -1, -1, err);
-	clac();
-
-	return err;
-}
-
 /*
  * Restore xstate from user space xsave area.
  */
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index f61f4d804537a..254a8dff9cd82 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -118,22 +118,6 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
 	return err;
 }
 
-static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
-{
-	int err;
-
-	if (use_xsave())
-		err = copy_xregs_to_user(buf);
-	else if (use_fxsr())
-		err = copy_fxregs_to_user((struct fxregs_state __user *) buf);
-	else
-		err = copy_fregs_to_user((struct fregs_state __user *) buf);
-
-	if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))
-		err = -EFAULT;
-	return err;
-}
-
 /*
  * Save the fpu, extended register state to the user signal frame.
  *
@@ -157,6 +141,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct fpu *fpu = &current->thread.fpu;
+	struct xregs_state *xsave = &fpu->state.xsave;
 	struct task_struct *tsk = current;
 	int ia32_fxstate = (buf != buf_fx);
 
@@ -171,12 +156,19 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	/* Save the live register state to the user directly. */
-	if (copy_fpregs_to_sigframe(buf_fx))
-		return -1;
 	/* Update the thread's fxstate to save the fsave header. */
 	if (ia32_fxstate)
 		copy_fxregs_to_kernel(fpu);
+	else
+		copy_fpregs_to_fpstate(fpu);
+
+	if (using_compacted_format()) {
+		copy_xstate_to_user(buf_fx, xsave, 0, size);
+	} else {
+		fpstate_sanitize_xstate(fpu);
+		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
+			return -1;
+	}
 
 	/* Save the fsave header for the 32-bit frames. */
 	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 20/23] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (18 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 19/23] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 21/23] x86/fpu: Update xstate's PKRU value on write_pkru() Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

The FPU registers only need to be saved if TIF_NEED_FPU_LOAD is not
set. Otherwise the save has already been done and can be skipped.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 254a8dff9cd82..179e2b19976ad 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -156,11 +156,20 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	/* Update the thread's fxstate to save the fsave header. */
-	if (ia32_fxstate)
-		copy_fxregs_to_kernel(fpu);
-	else
-		copy_fpregs_to_fpstate(fpu);
+	__fpregs_changes_begin();
+	/*
+	 * If we do not need to load the FPU registers at return to userspace
+	 * then the CPU has the current state and we need to save it. Otherwise
+	 * it is already done and we can skip it.
+	 */
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		/* Update the thread's fxstate to save the fsave header. */
+		if (ia32_fxstate)
+			copy_fxregs_to_kernel(fpu);
+		else
+			copy_fpregs_to_fpstate(fpu);
+	}
+	__fpregs_changes_end();
 
 	if (using_compacted_format()) {
 		copy_xstate_to_user(buf_fx, xsave, 0, size);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 21/23] x86/fpu: Update xstate's PKRU value on write_pkru()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (19 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 20/23] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-07 19:48 ` [PATCH 22/23] x86/fpu: Don't restore the FPU state directly from userland in __fpu__restore_sig() Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

During a context switch the xstate is loaded, which also includes the
PKRU value.
If the xstate is restored on return to userland, the PKRU value in the
xstate must be the same as the one in the CPU.

Save PKRU to the xstate whenever it is modified.
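
The invariant, as a sketch (mirroring write_pkru() in the diff below):

	struct pkru_state *pk;

	pk = __raw_xsave_addr(&current->thread.fpu.state.xsave,
			      XFEATURE_PKRU);
	__fpregs_changes_begin();
	pk->pkru = pkru;	/* keep the in-memory xstate copy ... */
	__write_pkru(pkru);	/* ... in sync with the hardware register */
	__fpregs_changes_end();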

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/pgtable.h | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 40616e8052924..45ca1c7802812 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -23,6 +23,8 @@
 
 #ifndef __ASSEMBLY__
 #include <asm/x86_init.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/api.h>
 
 extern pgd_t early_top_pgt[PTRS_PER_PGD];
 int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
@@ -133,8 +135,21 @@ static inline u32 read_pkru(void)
 
 static inline void write_pkru(u32 pkru)
 {
-	if (boot_cpu_has(X86_FEATURE_OSPKE))
-		__write_pkru(pkru);
+	struct pkru_state *pk;
+
+	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+		return;
+
+	pk = __raw_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
+	/*
+	 * The PKRU value in xstate needs to be in sync with the value that is
+	 * written to the CPU. The FPU restore on return to userland would
+	 * otherwise load the previous value again.
+	 */
+	__fpregs_changes_begin();
+	pk->pkru = pkru;
+	__write_pkru(pkru);
+	__fpregs_changes_end();
 }
 
 static inline int pte_young(pte_t pte)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 22/23] x86/fpu: Don't restore the FPU state directly from userland in __fpu__restore_sig()
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (20 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 21/23] x86/fpu: Update xstate's PKRU value on write_pkru() Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-08 18:25   ` Andy Lutomirski
  2018-11-07 19:48 ` [PATCH 23/23] x86/fpu: Defer FPU state load until return to userspace Sebastian Andrzej Siewior
  2018-11-12  3:02 ` [PATCH v4] x86: load FPU registers on return to userland Wanpeng Li
  23 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

__fpu__restore_sig() restores the CPU's FPU state directly from
userland. If we restore registers on return to userland then we can't
load them directly from userland because a context switch/BH could
destroy them.

Restore the FPU registers after they have been copied from userland.
__fpregs_changes_begin() ensures that they are not modified while being
worked on. TIF_NEED_FPU_LOAD is cleared because we want to keep our
state, not the saved state.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 34 -----------------------------
 arch/x86/kernel/fpu/signal.c        | 33 ++++++++++++++++++----------
 2 files changed, 22 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 9e213a6703c84..5e86ff60a3a5c 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -137,28 +137,11 @@ static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
 	}
 }
 
-static inline int copy_user_to_fxregs(struct fxregs_state __user *fx)
-{
-	if (IS_ENABLED(CONFIG_X86_32))
-		return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
-	else if (IS_ENABLED(CONFIG_AS_FXSAVEQ))
-		return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
-
-	/* See comment in copy_fxregs_to_kernel() below. */
-	return user_insn(rex64/fxrstor (%[fx]), "=m" (*fx), [fx] "R" (fx),
-			  "m" (*fx));
-}
-
 static inline void copy_kernel_to_fregs(struct fregs_state *fx)
 {
 	kernel_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
 }
 
-static inline int copy_user_to_fregs(struct fregs_state __user *fx)
-{
-	return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
-}
-
 static inline void copy_fxregs_to_kernel(struct fpu *fpu)
 {
 	if (IS_ENABLED(CONFIG_X86_32))
@@ -333,23 +316,6 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
 	XSTATE_XRESTORE(xstate, lmask, hmask);
 }
 
-/*
- * Restore xstate from user space xsave area.
- */
-static inline int copy_user_to_xregs(struct xregs_state __user *buf, u64 mask)
-{
-	struct xregs_state *xstate = ((__force struct xregs_state *)buf);
-	u32 lmask = mask;
-	u32 hmask = mask >> 32;
-	int err;
-
-	stac();
-	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
-	clac();
-
-	return err;
-}
-
 /*
  * These must be called with preempt disabled. Returns
  * 'true' if the FPU state is still intact and we can
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 179e2b19976ad..9720529859483 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -228,23 +228,30 @@ sanitize_restored_xstate(union fpregs_state *state,
 /*
  * Restore the extended state if present. Otherwise, restore the FP/SSE state.
  */
-static inline int copy_user_to_fpregs_zeroing(void __user *buf, u64 xbv, int fx_only)
+static void copy_to_fpregs_zeroing(struct fpu *fpu, u64 xbv, int fx_only)
 {
+	__fpregs_changes_begin();
 	if (use_xsave()) {
-		if ((unsigned long)buf % 64 || fx_only) {
+		if (fx_only) {
 			u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
+
 			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
-			return copy_user_to_fxregs(buf);
+			copy_kernel_to_fxregs(&fpu->state.fxsave);
 		} else {
 			u64 init_bv = xfeatures_mask & ~xbv;
+
 			if (unlikely(init_bv))
 				copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
-			return copy_user_to_xregs(buf, xbv);
+			copy_kernel_to_xregs(&fpu->state.xsave, xbv);
 		}
 	} else if (use_fxsr()) {
-		return copy_user_to_fxregs(buf);
-	} else
-		return copy_user_to_fregs(buf);
+		copy_kernel_to_fxregs(&fpu->state.fxsave);
+	} else {
+		copy_kernel_to_fregs(&fpu->state.fsave);
+	}
+	clear_thread_flag(TIF_NEED_FPU_LOAD);
+	fpregs_activate(fpu);
+	__fpregs_changes_end();
 }
 
 static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
@@ -255,6 +262,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	int state_size = fpu_kernel_xstate_size;
 	u64 xfeatures = 0;
 	int fx_only = 0;
+	int err = 0;
 
 	ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
 			 IS_ENABLED(CONFIG_IA32_EMULATION));
@@ -298,7 +306,6 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		union fpregs_state *state;
 		void *tmp;
 		struct user_i387_ia32_struct env;
-		int err = 0;
 
 		tmp = kmalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
 		if (!tmp)
@@ -327,12 +334,16 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	} else {
 		/*
 		 * For 64-bit frames and 32-bit fsave frames, restore the user
-		 * state to the registers directly (with exceptions handled).
+		 * state from a copy in thread's fpu state.
 		 */
-		if (copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only)) {
+		err = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
+		if (err) {
 			fpu__clear(fpu);
-			return -1;
+			return -EFAULT;
 		}
+		if ((unsigned long)buf_fx % 64)
+			fx_only = 1;
+		copy_to_fpregs_zeroing(fpu, xfeatures, fx_only);
 	}
 
 	return 0;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 23/23] x86/fpu: Defer FPU state load until return to userspace
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (21 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 22/23] x86/fpu: Don't restore the FPU state directly from userland in __fpu__restore_sig() Sebastian Andrzej Siewior
@ 2018-11-07 19:48 ` Sebastian Andrzej Siewior
  2018-11-08  8:33   ` Sebastian Andrzej Siewior
  2018-11-12  3:02 ` [PATCH v4] x86: load FPU registers on return to userland Wanpeng Li
  23 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-07 19:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

Defer loading of FPU state until return to userspace. This gives
the kernel the potential to skip loading FPU state for tasks that
stay in kernel mode, or for tasks that end up with repeated
invocations of kernel_fpu_begin() & kernel_fpu_end().

A __fpregs_changes_{begin|end}() section ensures that the registers
remain unchanged. Otherwise a context switch or a BH could save the
registers to its FPU context and the processor's FPU registers would
become random if modified at the same time.
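
The expected usage pattern, as a sketch:

	__fpregs_changes_begin();	/* disables preemption and BH */
	/* ... read or modify current's FPU state or registers ... */
	__fpregs_changes_end();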

KVM swaps the host/guest registers on the entry/exit path. I kept the
flow as is: first it ensures that the registers are loaded and then
saves the current (host) state before it loads the guest's registers.
The swap is done at the very end with interrupts disabled so the state
should not change anymore before the guest is entered. The read/save
version seems to be cheaper than memcpy() in a micro benchmark.
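
In sketch form, the "ensure the registers are loaded" step on the KVM
entry path looks like this (simplified from the diff below):

	fpregs_assert_state_consistent();
	if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return();	/* reload before entering the guest */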

Each thread gets TIF_NEED_FPU_LOAD set as part of fork() / fpu__copy().
For kernel threads, this flag never gets cleared, which avoids saving /
restoring the FPU state for kernel threads and during in-kernel usage
of the FPU registers. This should fix the performance regression which
was introduced after the removal of the ->initialized field in struct
fpu.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/entry/common.c             |  8 +++
 arch/x86/include/asm/fpu/api.h      | 14 +++++
 arch/x86/include/asm/fpu/internal.h | 31 +++++----
 arch/x86/include/asm/trace/fpu.h    |  5 +-
 arch/x86/kernel/fpu/core.c          | 97 +++++++++++++++++++++++------
 arch/x86/kernel/fpu/signal.c        | 49 ++++++++-------
 arch/x86/kernel/process.c           |  2 +-
 arch/x86/kernel/process_32.c        |  5 +-
 arch/x86/kernel/process_64.c        |  5 +-
 arch/x86/kvm/x86.c                  | 19 ++++--
 10 files changed, 172 insertions(+), 63 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3b2490b819181..e46fe9aa2934a 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -31,6 +31,7 @@
 #include <asm/vdso.h>
 #include <linux/uaccess.h>
 #include <asm/cpufeature.h>
+#include <asm/fpu/api.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
@@ -196,6 +197,13 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
 		exit_to_usermode_loop(regs, cached_flags);
 
+	/* Reload ti->flags; we may have rescheduled above. */
+	cached_flags = READ_ONCE(ti->flags);
+
+	fpregs_assert_state_consistent();
+	if (unlikely(cached_flags & _TIF_NEED_FPU_LOAD))
+		switch_fpu_return();
+
 #ifdef CONFIG_COMPAT
 	/*
 	 * Compat syscalls set TS_COMPAT.  Make sure we clear it before
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 8c03b42416656..d10b60d80d35a 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -27,17 +27,31 @@ extern void __kernel_fpu_end(void);
 extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
+extern void fpregs_mark_activate(void);
 
 static inline void __fpregs_changes_begin(void)
 {
 	preempt_disable();
+	local_bh_disable();
 }
 
 static inline void __fpregs_changes_end(void)
 {
+	local_bh_enable();
 	preempt_enable();
 }
 
+#ifdef CONFIG_X86_DEBUG_FPU
+extern void fpregs_assert_state_consistent(void);
+#else
+static inline void fpregs_assert_state_consistent(void) { }
+#endif
+
+/*
+ * Load the task FPU state before returning to userspace.
+ */
+extern void switch_fpu_return(void);
+
 /*
  * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
  *
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 5e86ff60a3a5c..0406f0987697f 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -29,7 +29,7 @@ extern void fpu__prepare_write(struct fpu *fpu);
 extern void fpu__save(struct fpu *fpu);
 extern int  fpu__restore_sig(void __user *buf, int ia32_frame);
 extern void fpu__drop(struct fpu *fpu);
-extern int  fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
+extern int  fpu__copy(struct task_struct *dst, struct task_struct *src);
 extern void fpu__clear(struct fpu *fpu);
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 extern int  dump_fpu(struct pt_regs *ptregs, struct user_i387_struct *fpstate);
@@ -436,15 +436,20 @@ static inline void fpregs_activate(struct fpu *fpu)
 /*
  * Load the FPU state for the current task. Call with preemption disabled.
  */
-static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
+static inline void __fpregs_load_activate(void)
 {
+	struct fpu *fpu = &current->thread.fpu;
+	int cpu = smp_processor_id();
+	int kthread = current->mm == NULL;
+
+	if (kthread)
+		return;
 	if (!fpregs_state_valid(fpu, cpu)) {
-		if (current->mm)
-			copy_kernel_to_fpregs(&fpu->state);
-		else
-			copy_kernel_to_fpregs(&init_fpstate);
+		copy_kernel_to_fpregs(&fpu->state);
+		fpregs_activate(fpu);
+		fpu->last_cpu = cpu;
 	}
-	fpregs_activate(fpu);
+	clear_thread_flag(TIF_NEED_FPU_LOAD);
 }
 
 /*
@@ -455,8 +460,8 @@ static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
  *  - switch_fpu_prepare() saves the old state.
  *    This is done within the context of the old process.
  *
- *  - switch_fpu_finish() restores the new state as
- *    necessary.
+ *  - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
+ *    will get loaded on return to userspace, or when the kernel needs it.
  */
 static inline void
 switch_fpu_prepare(struct fpu *old_fpu, int cpu)
@@ -477,15 +482,15 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 
 /*
- * Set up the userspace FPU context for the new task, if the task
- * has used the FPU.
+ * Load PKRU from the FPU context if available. Delay the loading of the
+ * complete FPU state until the return to userland.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
+static inline void switch_fpu_finish(struct fpu *new_fpu)
 {
 	if (!static_cpu_has(X86_FEATURE_FPU))
 		return;
 
-	__fpregs_load_activate(new_fpu, cpu);
+	set_thread_flag(TIF_NEED_FPU_LOAD);
 
 	if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
 		struct pkru_state *pk;
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index bd65f6ba950f8..91a1422091ceb 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -13,19 +13,22 @@ DECLARE_EVENT_CLASS(x86_fpu,
 
 	TP_STRUCT__entry(
 		__field(struct fpu *, fpu)
+		__field(bool, load_fpu)
 		__field(u64, xfeatures)
 		__field(u64, xcomp_bv)
 		),
 
 	TP_fast_assign(
 		__entry->fpu		= fpu;
+		__entry->load_fpu	= test_thread_flag(TIF_NEED_FPU_LOAD);
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
 			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
 			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
 		}
 	),
-	TP_printk("x86/fpu: %p xfeatures: %llx xcomp_bv: %llx",
+	TP_printk("x86/fpu: %p load: %d xfeatures: %llx xcomp_bv: %llx",
 			__entry->fpu,
+			__entry->load_fpu,
 			__entry->xfeatures,
 			__entry->xcomp_bv
 	)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 02bbd40fe1e95..85746819deed2 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -101,16 +101,20 @@ void __kernel_fpu_begin(void)
 
 	kernel_fpu_disable();
 
-	copy_fpregs_to_fpstate(fpu);
+	__cpu_invalidate_fpregs_state();
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		set_thread_flag(TIF_NEED_FPU_LOAD);
+		/*
+		 * Ignore return value -- we don't care if reg state
+		 * is clobbered.
+		 */
+		copy_fpregs_to_fpstate(fpu);
+	}
 }
 EXPORT_SYMBOL(__kernel_fpu_begin);
 
 void __kernel_fpu_end(void)
 {
-	struct fpu *fpu = &current->thread.fpu;
-
-	copy_kernel_to_fpregs(&fpu->state);
-
 	kernel_fpu_enable();
 }
 EXPORT_SYMBOL(__kernel_fpu_end);
@@ -138,14 +142,16 @@ void fpu__save(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	preempt_disable();
+	__fpregs_changes_begin();
 	trace_x86_fpu_before_save(fpu);
 
-	if (!copy_fpregs_to_fpstate(fpu)) {
-		copy_kernel_to_fpregs(&fpu->state);
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		if (!copy_fpregs_to_fpstate(fpu)) {
+			copy_kernel_to_fpregs(&fpu->state);
+		}
 	}
 	trace_x86_fpu_after_save(fpu);
-	preempt_enable();
+	__fpregs_changes_end();
 }
 EXPORT_SYMBOL_GPL(fpu__save);
 
@@ -178,8 +184,11 @@ void fpstate_init(union fpregs_state *state)
 }
 EXPORT_SYMBOL_GPL(fpstate_init);
 
-int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
+int fpu__copy(struct task_struct *dst, struct task_struct *src)
 {
+	struct fpu *dst_fpu = &dst->thread.fpu;
+	struct fpu *src_fpu = &src->thread.fpu;
+
 	dst_fpu->last_cpu = -1;
 
 	if (!static_cpu_has(X86_FEATURE_FPU))
@@ -194,16 +203,23 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
 
 	/*
-	 * Save current FPU registers directly into the child
-	 * FPU context, without any memory-to-memory copying.
+	 * If the FPU registers are not current just memcpy() the state.
+	 * Otherwise save current FPU registers directly into the child's FPU
+	 * context, without any memory-to-memory copying.
 	 *
 	 * ( The function 'fails' in the FNSAVE case, which destroys
-	 *   register contents so we have to copy them back. )
+	 *   register contents so we have to load them back. )
 	 */
-	if (!copy_fpregs_to_fpstate(dst_fpu)) {
-		memcpy(&src_fpu->state, &dst_fpu->state, fpu_kernel_xstate_size);
-		copy_kernel_to_fpregs(&src_fpu->state);
-	}
+	__fpregs_changes_begin();
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_size);
+
+	else if (!copy_fpregs_to_fpstate(dst_fpu))
+		copy_kernel_to_fpregs(&dst_fpu->state);
+
+	__fpregs_changes_end();
+
+	set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD);
 
 	trace_x86_fpu_copy_src(src_fpu);
 	trace_x86_fpu_copy_dst(dst_fpu);
@@ -219,10 +235,9 @@ static void fpu__initialize(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
+	set_thread_flag(TIF_NEED_FPU_LOAD);
 	fpstate_init(&fpu->state);
 	trace_x86_fpu_init_state(fpu);
-
-	trace_x86_fpu_activate_state(fpu);
 }
 
 /*
@@ -301,6 +316,8 @@ void fpu__drop(struct fpu *fpu)
  */
 static inline void copy_init_fpstate_to_fpregs(void)
 {
+	__fpregs_changes_begin();
+
 	if (use_xsave())
 		copy_kernel_to_xregs(&init_fpstate.xsave, -1);
 	else if (static_cpu_has(X86_FEATURE_FXSR))
@@ -310,6 +327,9 @@ static inline void copy_init_fpstate_to_fpregs(void)
 
 	if (boot_cpu_has(X86_FEATURE_OSPKE))
 		copy_init_pkru_to_fpregs();
+
+	fpregs_mark_activate();
+	__fpregs_changes_end();
 }
 
 /*
@@ -332,6 +352,45 @@ void fpu__clear(struct fpu *fpu)
 		copy_init_fpstate_to_fpregs();
 }
 
+/*
+ * Load FPU context before returning to userspace.
+ */
+void switch_fpu_return(void)
+{
+	if (!static_cpu_has(X86_FEATURE_FPU))
+		return;
+
+	__fpregs_load_activate();
+}
+EXPORT_SYMBOL_GPL(switch_fpu_return);
+
+#ifdef CONFIG_X86_DEBUG_FPU
+/*
+ * If current FPU state according to its tracking (loaded FPU ctx on this CPU)
+ * is not valid then we must have TIF_NEED_FPU_LOAD set so the context is loaded on
+ * return to userland.
+ */
+void fpregs_assert_state_consistent(void)
+{
+	struct fpu *fpu = &current->thread.fpu;
+
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		return;
+	WARN_ON_FPU(!fpregs_state_valid(fpu, smp_processor_id()));
+}
+EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent);
+#endif
+
+void fpregs_mark_activate(void)
+{
+	struct fpu *fpu = &current->thread.fpu;
+
+	fpregs_activate(fpu);
+	fpu->last_cpu = smp_processor_id();
+	clear_thread_flag(TIF_NEED_FPU_LOAD);
+}
+EXPORT_SYMBOL_GPL(fpregs_mark_activate);
+
 /*
  * x87 math exception handling:
  */
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 9720529859483..9a9c379c23c35 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -251,6 +251,7 @@ static void copy_to_fpregs_zeroing(struct fpu *fpu, u64 xbv, int fx_only)
 	}
 	clear_thread_flag(TIF_NEED_FPU_LOAD);
 	fpregs_activate(fpu);
+	fpu->last_cpu = smp_processor_id();
 	__fpregs_changes_end();
 }
 
@@ -262,7 +263,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	int state_size = fpu_kernel_xstate_size;
 	u64 xfeatures = 0;
 	int fx_only = 0;
-	int err = 0;
+	int err;
 
 	ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
 			 IS_ENABLED(CONFIG_IA32_EMULATION));
@@ -297,40 +298,43 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		}
 	}
 
+	/*
+	 * The current state of the FPU registers does not matter. By setting
+	 * TIF_NEED_FPU_LOAD unconditionally it is ensured that our xstate is
+	 * not modified on context switch and that the xstate is considered
+	 * to be loaded again on return to userland (invalidating last_cpu
+	 * avoids the optimisation).
+	 */
+	set_thread_flag(TIF_NEED_FPU_LOAD);
+	__fpu_invalidate_fpregs_state(fpu);
+
 	if (ia32_fxstate) {
 		/*
 		 * For 32-bit frames with fxstate, copy the user state to the
 		 * thread's fpu state, reconstruct fxstate from the fsave
 		 * header. Validate and sanitize the copied state.
 		 */
-		union fpregs_state *state;
-		void *tmp;
 		struct user_i387_ia32_struct env;
 
-		tmp = kmalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
-		if (!tmp)
-			return -ENOMEM;
-		state = PTR_ALIGN(tmp, 64);
-
 		if (using_compacted_format()) {
-			err = copy_user_to_xstate(&state->xsave, buf_fx);
+			err = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
 		} else {
-			err = __copy_from_user(&state->xsave, buf_fx, state_size);
+			err = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
 
 			if (!err && state_size > offsetof(struct xregs_state, header))
-				err = validate_xstate_header(&state->xsave.header);
+				err = validate_xstate_header(&fpu->state.xsave.header);
+		}
+		if (err) {
+			err = -EINVAL;
+			goto out;
 		}
 
-		if (err || __copy_from_user(&env, buf, sizeof(env))) {
-			err = -1;
-		} else {
-			sanitize_restored_xstate(state, &env,
-						 xfeatures, fx_only);
-			copy_kernel_to_fpregs(state);
+		if (__copy_from_user(&env, buf, sizeof(env))) {
+			err = -EFAULT;
+			goto out;
 		}
 
-		kfree(tmp);
-		return err;
+		sanitize_restored_xstate(&fpu->state, &env, xfeatures, fx_only);
 	} else {
 		/*
 		 * For 64-bit frames and 32-bit fsave frames, restore the user
@@ -338,8 +342,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 */
 		err = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
 		if (err) {
-			fpu__clear(fpu);
-			return -EFAULT;
+			err = -EFAULT;
+			goto out;
 		}
 		if ((unsigned long)buf_fx % 64)
 			fx_only = 1;
@@ -347,6 +351,9 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	}
 
 	return 0;
+out:
+	fpu__clear(fpu);
+	return err;
 }
 
 static inline int xstate_sigframe_size(void)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c93fcfdf16734..cd7105fb92bfc 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -96,7 +96,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	dst->thread.vm86 = NULL;
 #endif
 
-	return fpu__copy(&dst->thread.fpu, &src->thread.fpu);
+	return fpu__copy(dst, src);
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 5c99e49c0d392..5c059ecb136db 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -233,7 +233,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 
-	switch_fpu_prepare(prev_fpu, cpu);
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
+		switch_fpu_prepare(prev_fpu, cpu);
 
 	/*
 	 * Save away %gs. No need to save %fs, as it was saved on the
@@ -294,7 +295,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	this_cpu_write(current_task, next_p);
 
-	switch_fpu_finish(next_fpu, cpu);
+	switch_fpu_finish(next_fpu);
 
 	/* Load the Intel cache allocation PQR MSR. */
 	intel_rdt_sched_in();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4fd0dfa5bd83e..83c3ea439190a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -558,7 +558,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
 		     this_cpu_read(irq_count) != -1);
 
-	switch_fpu_prepare(prev_fpu, cpu);
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
+		switch_fpu_prepare(prev_fpu, cpu);
 
 	/* We must save %fs and %gs before load_TLS() because
 	 * %fs and %gs may be cleared by load_TLS().
@@ -610,7 +611,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish(next_fpu, cpu);
+	switch_fpu_finish(next_fpu);
 
 	/* Reload sp0. */
 	update_task_stack(next_p);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 75301b439b6e4..4cc57223893c5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7734,6 +7734,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		wait_lapic_expire(vcpu);
 	guest_enter_irqoff();
 
+	fpregs_assert_state_consistent();
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		switch_fpu_return();
+
 	if (unlikely(vcpu->arch.switch_db_regs)) {
 		set_debugreg(0, 7);
 		set_debugreg(vcpu->arch.eff_db[0], 0);
@@ -7993,22 +7997,29 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 /* Swap (qemu) user FPU context for the guest FPU context. */
 static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	preempt_disable();
+	__fpregs_changes_begin();
+
 	copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
+
 	/* PKRU is separately restored in kvm_x86_ops->run.  */
 	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
 				~XFEATURE_MASK_PKRU);
-	preempt_enable();
+
+	fpregs_mark_activate();
+	__fpregs_changes_end();
 	trace_kvm_fpu(1);
 }
 
 /* When vcpu_run ends, restore user space FPU context. */
 static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	preempt_disable();
+	__fpregs_changes_begin();
+
 	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
 	copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
-	preempt_enable();
+
+	fpregs_mark_activate();
+	__fpregs_changes_end();
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
 }
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 23/23] x86/fpu: Defer FPU state load until return to userspace
  2018-11-07 19:48 ` [PATCH 23/23] x86/fpu: Defer FPU state load until return to userspace Sebastian Andrzej Siewior
@ 2018-11-08  8:33   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-08  8:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-11-07 20:48:58 [+0100], To linux-kernel@vger.kernel.org wrote:
> index 8c03b42416656..d10b60d80d35a 100644
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h

The bot complained about missing definition for local_bh_disable() so I
swapped the preempt.h for bottom_half.h at the beginning of this file.

> @@ -27,17 +27,31 @@ extern void __kernel_fpu_end(void);
>  extern void kernel_fpu_begin(void);
>  extern void kernel_fpu_end(void);
>  extern bool irq_fpu_usable(void);
> +extern void fpregs_mark_activate(void);
>  
>  static inline void __fpregs_changes_begin(void)
>  {
>  	preempt_disable();
> +	local_bh_disable();
>  }
>  
>  static inline void __fpregs_changes_end(void)
>  {
> +	local_bh_enable();
>  	preempt_enable();
>  }
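
The include swap itself is tiny, roughly (a sketch from memory, the
exact context lines may differ):

| -#include <linux/preempt.h>
| +#include <linux/bottom_half.h>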

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 17/23] x86/fpu: Eager switch PKRU state
  2018-11-07 19:48 ` [PATCH 17/23] x86/fpu: Eager switch PKRU state Sebastian Andrzej Siewior
@ 2018-11-08 11:12   ` Paolo Bonzini
  2018-11-19 18:17     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2018-11-08 11:12 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 07/11/2018 20:48, Sebastian Andrzej Siewior wrote:
> index 375226055a413..5b33985d9f475 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -811,7 +811,7 @@ void fpu__resume_cpu(void)
>   *
>   * Note: does not work for compacted buffers.
>   */

The comment is wrong, which was already the case before but it becomes a
bit more important if the function is used outside its module.

However, why not use get_xsave_addr?  I don't see why it is important to
skip the checks, and if it is, it probably deserves a comment. "Raw" and
double underscores in the function name are scary...

Paolo

> -static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
> +void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
>  {
>  	if (!xfeature_enabled(xfeature_nr)) {
>  		WARN_ON_FPU(1);


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 01/23] x86/fpu: Use ULL for shift in xfeature_uncompacted_offset()
  2018-11-07 19:48 ` [PATCH 01/23] x86/fpu: Use ULL for shift in xfeature_uncompacted_offset() Sebastian Andrzej Siewior
@ 2018-11-08 11:38   ` Borislav Petkov
  2018-11-09 16:56     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2018-11-08 11:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Nov 07, 2018 at 08:48:36PM +0100, Sebastian Andrzej Siewior wrote:
> The xfeature mask is 64bit so a shift from a feature number to its mask
> should have a ULL suffix or else bits for nr > 31 will be lost. This is
> not a problem now but should XFEATURE_MASK_SUPERVISOR gain a bit >31
> then this check won't catch it.
> 
> Add a ULL suffix so a >31 bit number will be properly shifted.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/kernel/fpu/xstate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 87a57b7642d36..9dc0a2c8c2755 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -444,7 +444,7 @@ static int xfeature_uncompacted_offset(int xfeature_nr)
>  	 * format. Checking a supervisor state's uncompacted offset is
>  	 * an error.
>  	 */
> -	if (XFEATURE_MASK_SUPERVISOR & (1 << xfeature_nr)) {
> +	if (XFEATURE_MASK_SUPERVISOR & (1ULL << xfeature_nr)) {

Or simply BIT_ULL(xfeature_nr).
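
FWIW, BIT_ULL(nr) expands to (1ULL << (nr)), so the shift is done in
64-bit arithmetic. A hypothetical illustration (feature number made up):

	u64 lost = 1 << 40;	/* int shift, nr > 31: undefined, bit lost */
	u64 ok   = BIT_ULL(40);	/* 1ULL << 40: bit 40 set as intended */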

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  2018-11-07 19:48 ` [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
@ 2018-11-08 14:57   ` Borislav Petkov
  2018-11-09 17:35     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2018-11-08 14:57 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Nov 07, 2018 at 08:48:37PM +0100, Sebastian Andrzej Siewior wrote:
> This is a preparation for the removal of the ->initialized member in the
> fpu struct.
> __fpu__restore_sig() is deactivating the FPU via fpu__drop() and then
> setting manually ->initialized followed by fpu__restore(). The result is
> that it is possible to manipulate fpu->state and the state of registers
> won't be saved/restore on a context switch which would overwrite state.

		restored

> 
> Don't access the fpu->state while the content is read from user space
> and examined / sanitized. Use a temporary buffer kmalloc() buffer for

one "buffer" too many.

More importantly, what I'm missing here is a more detailed explanation
about how that manipulation can happen. Especially since the comment
over fpu__drop() you're removing below is claiming the exact opposite.
AFAICT.

Yeah, FPU code has always been nasty and tricky to follow so I think
we'd need to have this stuff explained in much more detail.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 22/23] x86/fpu: Don't restore the FPU state directly from userland in __fpu__restore_sig()
  2018-11-07 19:48 ` [PATCH 22/23] x86/fpu: Don't restore the FPU state directly from userland in __fpu__restore_sig() Sebastian Andrzej Siewior
@ 2018-11-08 18:25   ` Andy Lutomirski
  2018-11-09 19:09     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 45+ messages in thread
From: Andy Lutomirski @ 2018-11-08 18:25 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, X86 ML, Andrew Lutomirski, Paolo Bonzini, Radim Krcmar,
	kvm list, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Nov 7, 2018 at 11:49 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> __fpu__restore_sig() restores the CPU's FPU state directly from
> userland. If we restore registers on return to userland then we can't
> load them directly from userland because a context switch/BH could
> destroy them.
>
> Restore the FPU registers after they have been copied from userland.
> __fpregs_changes_begin() ensures that they are not modified while being
> worked on. TIF_NEED_FPU_LOAD is cleared because we want to keep our state, not
> the saved state.

I'm conceptually okay with this change, but what happens if the
registers that are copied into the kernel are garbage?  We used to
fail the restore and presumably kill the task.  What happens now?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 04/23] x86/entry/32: Remove asm/math_emu.h include
  2018-11-07 19:48 ` [PATCH 04/23] x86/entry/32: Remove asm/math_emu.h include Sebastian Andrzej Siewior
@ 2018-11-08 18:45   ` Andy Lutomirski
  0 siblings, 0 replies; 45+ messages in thread
From: Andy Lutomirski @ 2018-11-08 18:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, X86 ML, Andrew Lutomirski, Paolo Bonzini, Radim Krcmar,
	kvm list, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Nov 7, 2018 at 11:49 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The math_emu.h header files contains the definition of struct
> math_emu_info. It is not used in this file.
>
> Remove asm/math_emu.h include.
>

Reviewed-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 01/23] x86/fpu: Use ULL for shift in xfeature_uncompacted_offset()
  2018-11-08 11:38   ` Borislav Petkov
@ 2018-11-09 16:56     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-09 16:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-11-08 12:38:17 [+0100], Borislav Petkov wrote:
> Or simply BIT_ULL(xfeature_nr).

Yes, why not. Updated. Thanks.

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  2018-11-08 14:57   ` Borislav Petkov
@ 2018-11-09 17:35     ` Sebastian Andrzej Siewior
  2018-11-09 18:52       ` Borislav Petkov
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-09 17:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-11-08 15:57:21 [+0100], Borislav Petkov wrote:
> On Wed, Nov 07, 2018 at 08:48:37PM +0100, Sebastian Andrzej Siewior wrote:
> > This is a preparation for the removal of the ->initialized member in the
> > fpu struct.
> > __fpu__restore_sig() is deactivating the FPU via fpu__drop() and then
> > setting manually ->initialized followed by fpu__restore(). The result is
> > that it is possible to manipulate fpu->state and the state of registers
> > won't be saved/restore on a context switch which would overwrite state.
> 
> 		restored
> 
> > 
> > Don't access the fpu->state while the content is read from user space
> > and examined / sanitized. Use a temporary buffer kmalloc() buffer for
> 
> one "buffer" too many.

corrected.

> More importantly, what I'm missing here is more detailed explanation
> about how that manipulation can happen. Especially since the comment
> over fpu__drop() you're removing below is claiming the exact opposite.
> AFAICT.

fpu__drop() sets ->initialized to 0. As a result the context switch
will not save current FPU registers and so _not_ write to fpu->state.
This also means that CPU's FPU register will be random (inherited from
the last context) after the context switch. This is also true for usage
in softirq via kernel_fpu_begin().

The "new" FPU state is prepared in fpu->state and once it is done, it
gets loaded via
  fpu->initialized = 1; // make sure fpu__initialize() in fpu__restore()
                        // is a nop
  fpu__restore();	// Load the registers.

Since I plan to remove the ->initialized member, I don't have the luxury
to play with fpu->state because the "new" content obtained by
copy_from_user() will be overwritten with CPU's current FPU state during
a context switch.
Now with that information, what do you plan to alter? :)

> Yeah, FPU code has always been nasty and tricky to follow so I think
> we'd need to have this stuff explained in much more detail.

Yeah, tell me about it. Now that you made me look into this again, I
spotted this gem:

| __fpu__restore_sig()
…
|        if (ia32_fxstate) {
…
|                 fpu__drop(fpu);
…
|	/* prepare new FPU state in fpu->state */
| 
|                 fpu->initialized = 1;

*BOOM* context switch. ->initialized == 1 is seen so it stashes current
CPU's FPU state into fpu->state and overwrites what has been prepared
before.

On the switch back to this task, the fpu__restore() becomes a "nop"
because the saved registers are the same but not what was expected /
prepared before.

|                 preempt_disable();
|                 fpu__restore(fpu);
|                 preempt_enable();
|

So. The fix would be:
@@ -344,10 +344,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
                        sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
                }
 
+               local_bh_disable();
                fpu->initialized = 1;
-               preempt_disable();
                fpu__restore(fpu);
-               preempt_enable();
+               local_bh_enable();
 
                return err;
        } else {

local_bh_disable() due to possible kernel_fpu_begin() usage in softirq.
How much do we care here about a theoretical race on 32bit anyway? I
don't think anyone complained :) I would have to rebase my queue…
otherwise…

> Thx.

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  2018-11-09 17:35     ` Sebastian Andrzej Siewior
@ 2018-11-09 18:52       ` Borislav Petkov
  2018-11-09 23:25         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2018-11-09 18:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Ingo Molnar
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

[-- Attachment #1: Type: text/plain, Size: 2365 bytes --]

On Fri, Nov 09, 2018 at 06:35:21PM +0100, Sebastian Andrzej Siewior wrote:
> fpu__drop() sets ->initialized to 0. As a result the context switch

"... the context switch path landing in switch_fpu_prepare()... " is what you
mean, right?

> will not save current FPU registers and so _not_ write to fpu->state.
> This also means that CPU's FPU register will be random (inherited from
> the last context)

You mean, the FPU regs will have random values, yes.

> after the context switch. This is also true for usage
> in softirq via kernel_fpu_begin().

So far so good.

Except maybe because I'm dense about FPU, I still am missing something.

You have this path:

__fpu__restore_sig
|-> fpu__clear
 |-> fpu__drop

and that happens on the sigreturn() path.

Now, the context switch happens ... when exactly?

After the sigreturn is done?

It must be because then you'd get that ->state corruption after
->initialized has been cleared.

Right?

<snip a bunch of stuff, we'll get back to it later>

> So. The fix would be:
> @@ -344,10 +344,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
>                         sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
>                 }
>  
> +               local_bh_disable();
>                 fpu->initialized = 1;
> -               preempt_disable();
>                 fpu__restore(fpu);
> -               preempt_enable();
> +               local_bh_enable();
>  
>                 return err;
>         } else {
> 
> local_bh_disable() due to possible kernel_fpu_begin() usage in softirq.
> How much do we care here about a theoretical race on 32bit anyway? I
> don't think anyone complained :) I would have to rebase my queue…
> otherwise…

Funny, you should mention that.

But this very much rings a bell about a very elusive bug we had on
32-bit at the time. See attached mbox (yeah, the web archives were crap
and couldn't find the links so I'm sending you the whole thread).

And at the time Ingo said that there's something still missing about
*why* it would happen.

And I think it is this context switch happening right after the
sigreturn - *AFAICT* - which would cause this.

I could very well be off but this smells very similar to your thing.

Hmmm.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

[-- Attachment #2: fpu.mbox --]
[-- Type: application/mbox, Size: 149765 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 22/23] x86/fpu: Don't restore the FPU state directly from userland in __fpu__restore_sig()
  2018-11-08 18:25   ` Andy Lutomirski
@ 2018-11-09 19:09     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-09 19:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Paolo Bonzini, Radim Krcmar, kvm list,
	Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-11-08 10:25:24 [-0800], Andy Lutomirski wrote:
> On Wed, Nov 7, 2018 at 11:49 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > __fpu__restore_sig() restores the CPU's FPU state directly from
> > userland. If we restore registers on return to userland then we can't
> > load them directly from userland because a context switch/BH could
> > destroy them.
> >
> > Restore the FPU registers after they have been copied from userland.
> > __fpregs_changes_begin() ensures that they are not modified while being
> > worked on. TIF_NEED_FPU_LOAD is cleared because we want to keep our state, not
> > the saved state.
> 
> I'm conceptually okay with this change, but what happens if the
> registers that are copied into the kernel are garbage?  We used to
> fail the restore and presumably kill the task.  What happens now?

What do you mean by garbage? Assuming you mean something like
memset(xstate, 0xff,), then this would happen:

| ------------[ cut here ]------------
| Bad FPU state detected at __fpu__restore_sig+0x2a3/0x660, reinitializing FPU registers.
| WARNING: CPU: 0 PID: 1687 at arch/x86/mm/extable.c:113 ex_handler_fprestore+0x60/0x70
| Modules linked in:
| CPU: 0 PID: 1687 Comm: ssltc Not tainted 4.20.0-rc1+ #116
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.1-1 04/01/2014
| RIP: 0010:ex_handler_fprestore+0x60/0x70
| Code: 00 00 00 5d c3 48 0f ae 0d 0d 13 2c 01 b8 01 00 00 00 5d c3 48 89 c6 48 c7 c7 40 39 02 82 c6 05 c3 1d 2b 01 01 e8 0a 01 01 00 <0f> 0b eb b9 66 66 2e 0f 1f 84 00 00 7
| RSP: 0018:ffffc90000563cf8 EFLAGS: 00010282
| RAX: 0000000000000000 RBX: ffffc90000563d68 RCX: 0000000000000000
| RDX: 0000000000000046 RSI: 0000000000000000 RDI: ffff88003a1516c0
| RBP: ffffc90000563cf8 R08: 0000000000000000 R09: 0000000000000000
| R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000d
| R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
| FS:  00007f17678f1b80(0000) GS:ffff88003e800000(0000) knlGS:0000000000000000
| CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 00007f1767dc2000 CR3: 000000003d0b6003 CR4: 0000000000060ef0
| Call Trace:
|  fixup_exception+0x45/0x5c
|  do_general_protection+0x61/0x1a0
|  general_protection+0x1e/0x30
| RIP: 0010:__fpu__restore_sig+0x2a3/0x660
| Code: 00 00 48 8b 95 48 ff ff ff 48 f7 d2 48 21 d0 0f 85 73 03 00 00 48 8b 85 48 ff ff ff 4c 89 f7 48 89 c2 48 c1 ea 20 48 0f ae 2f <65> 48 8b 04 25 00 4f 01 00 f0 80 60 e
| RSP: 0018:ffffc90000563e18 EFLAGS: 00010246
| RAX: 0000000000000007 RBX: 00007ffe7f19df40 RCX: 00000000000031d7
| RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffff88003a152a40
| RBP: ffffc90000563ed0 R08: 0000000000000000 R09: 0000000000000000
| R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
| R13: ffff88003a1516c0 R14: ffff88003a152a40 R15: ffff88003a152a00
|  ? __fpu__restore_sig+0x261/0x660
|  ? trace_hardirqs_on+0x22/0x110
|  fpu__restore_sig+0x28/0x40
|  __ia32_sys_rt_sigreturn+0x218/0x2aa
|  do_syscall_64+0x50/0x1a0
|  entry_SYSCALL_64_after_hwframe+0x49/0xbe
| RIP: 0033:0x7f1767ca8a7b
| Code: 31 d8 0f a4 ed 05 c5 79 7f 4c 24 30 01 fa 21 c6 c5 b9 72 d4 1f 31 d8 01 ea 0f ac ed 07 31 de c5 a9 73 fc 0c c5 d9 fe e4 89 d7 <03> 4c 24 08 31 c5 0f a4 d2 05 c4 c1 1
| RSP: 002b:00007ffe7f19e340 EFLAGS: 00000282
| RAX: 000000001c9f00ab RBX: 00000000837732a0 RCX: 000000001bff4f26
| RDX: 0000000037535a7c RSI: 00000000979700a8 RDI: 0000000037535a7c
| RBP: 00000000053caff3 R08: 00005615ffd442a0 R09: 00007f1767dc54c0
| R10: 00007f1767dc6000 R11: 00007ffe7f19e3c8 R12: 00007f1767dc2000
| R13: 00005615ff6c30a0 R14: 00007f1767cab080 R15: 00007f1767d95100
| irq event stamp: 12775
| hardirqs last  enabled at (12774): [<ffffffff810de72c>] vprintk_emit+0xac/0x270
| hardirqs last disabled at (12775): [<ffffffff81001c01>] trace_hardirqs_off_thunk+0x1a/0x1c
| softirqs last  enabled at (12756): [<ffffffff81c003a2>] __do_softirq+0x3a2/0x4d1
| softirqs last disabled at (12760): [<ffffffff8102a210>] __fpu__restore_sig+0x230/0x660
| ---[ end trace 163c456a84752e26 ]---

and the task continues. Before, we had this:
| BUG: GPF in non-whitelisted uaccess (non-canonical address?)
| general protection fault: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC PTI
| CPU: 1 PID: 1687 Comm: ssltc Tainted: G      D           4.20.0-rc1+ #120
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.1-1 04/01/2014
| RIP: 0010:__fpu__restore_sig+0x2fa/0x710
| Code: 34 8b 95 5c ff ff ff 85 d2 75 2a 4c 89 fa 48 f7 d2 48 21 d0 0f 85 b8 03 00 00 66 66 90 4c 89 fa 48 89 df 44 89 f8 48 c1 ea 20 <48> 0f ae 2f 31 db 66 66 90 eb 24 48 2
| RSP: 0018:ffffc900004abe20 EFLAGS: 00010246
| RAX: 0000000000000007 RBX: 00007ffe5b4c1680 RCX: 0000000000000000
| RDX: 0000000000000000 RSI: ffffffff820544ad RDI: 00007ffe5b4c1680
| RBP: ffffc900004abed0 R08: 0000000000000000 R09: 0000000000000001
| R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800394440c0
| R13: ffff880039442d80 R14: ffff8800394440c0 R15: 0000000000000007
| FS:  00007f81db2d4b80(0000) GS:ffff88003e880000(0000) knlGS:0000000000000000
| CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 00007f81db7a5000 CR3: 000000003aafa002 CR4: 0000000000060ee0
| Call Trace:
|  ? trace_hardirqs_on+0x22/0x110
|  fpu__restore_sig+0x28/0x40
|  __ia32_sys_rt_sigreturn+0x218/0x2aa
|  do_syscall_64+0x50/0x180
|  entry_SYSCALL_64_after_hwframe+0x49/0xbe
| RIP: 0033:0x7f81db68bdc7
| Code: 54 24 14 31 df 89 ee 0f a4 ed 05 c5 b9 72 d1 1e c5 79 7f 0c 24 01 fa 31 de 0f ac c0 07 01 ea c5 f1 72 f1 02 03 4c 24 18 31 c6 <89> d7 0f a4 d2 05 01 f1 31 c7 0f ac 3
| RSP: 002b:00007ffe5b4c1a80 EFLAGS: 00000286
| RAX: 00000000ad356612 RBX: 000000007d63f272 RCX: 00000000adc87c8e
| RDX: 00000000d90909d0 RSI: 000000008b541c65 RDI: 00000000188b44f4
| RBP: 00000000605100ab R08: 0000555c9ecd52a0 R09: 00007f81db7a8f80
| R10: 00007f81db7a9000 R11: 00007ffe5b4c1ae8 R12: 00007f81db7a5000
| R13: 0000555c9d0c60a0 R14: 00007f81db68e080 R15: 00007f81db778100
| Modules linked in:
| Dumping ftrace buffer:
|    (ftrace buffer empty)
| ---[ end trace a16fb09d293317cc ]---
| RIP: 0010:__fpu__restore_sig+0x2fa/0x710
| Code: 34 8b 95 5c ff ff ff 85 d2 75 2a 4c 89 fa 48 f7 d2 48 21 d0 0f 85 b8 03 00 00 66 66 90 4c 89 fa 48 89 df 44 89 f8 48 c1 ea 20 <48> 0f ae 2f 31 db 66 66 90 eb 24 48 2
| RSP: 0018:ffffc900004abe20 EFLAGS: 00010246
| RAX: 0000000000000007 RBX: 00007ffe1945ae40 RCX: 0000000000000000
| RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00007ffe1945ae40
| RBP: ffffc900004abed0 R08: 0000000000000000 R09: 0000000000000000
| R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800394440c0
| R13: ffff880039442d80 R14: ffff8800394440c0 R15: 0000000000000007
| FS:  00007f81db2d4b80(0000) GS:ffff88003e880000(0000) knlGS:0000000000000000
| CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 00007f81db7a5000 CR3: 000000003aafa002 CR4: 0000000000060ee0

Noisier but the task segfaulted.
I could add validate_xstate_header() + sanitize_restored_xstate() which
is what we do for 32bit-frames and should catch garbage. Then it ends
with:
| ssltc[1724] bad frame in rt_sigreturn frame:00000000ac8c6496 ip:7f4a10e36eda sp:7fff75054540 orax:ffffffffffffffff in libcrypto.so.1.1[7f4a10ce9000+19f000]

which would also be a segfault but with a smaller backtrace. Also,
checking for XFEATURE_MASK_SUPERVISOR sounds like a good thing.
Right now XFEATURE_MASK_SUPERVISOR is not enabled in BV so it should not
make a difference, but it should be safe in the future.
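
For the validation part, roughly what the 32-bit path already does (a
sketch against this series, not a posted patch):

|	err = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
|	if (!err && state_size > offsetof(struct xregs_state, header))
|		err = validate_xstate_header(&fpu->state.xsave.header);
|	if (err) {
|		err = -EFAULT;
|		goto out;	/* -> fpu__clear() and the "bad frame" exit */
|	}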

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  2018-11-09 18:52       ` Borislav Petkov
@ 2018-11-09 23:25         ` Sebastian Andrzej Siewior
  2018-11-12 15:56           ` [PATCH] x86/fpu: Disable BH while while loading FPU registers " Sebastian Andrzej Siewior
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-09 23:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-11-09 19:52:02 [+0100], Borislav Petkov wrote:
> On Fri, Nov 09, 2018 at 06:35:21PM +0100, Sebastian Andrzej Siewior wrote:
> > fpu__drop() sets ->initialized to 0. As a result the context switch
> 
> "... the context switch path landing in switch_fpu_prepare()... " is what you
> mean, right?
I mean both. switch_fpu_prepare() while the task is leaving and then
switch_fpu_finish() while the task is coming back. But yes.

> > will not save current FPU registers and so _not_ write to fpu->state.
> > This also means that CPU's FPU register will be random (inherited from
> > the last context)
> 
> You mean, the FPU regs will have random values, yes.
correct. Same as for kernel threads.

> > after the context switch. This is also true for usage
> > in softirq via kernel_fpu_begin().
> 
> So far so good.
> 
> Except maybe because I'm dense about FPU, I still am missing something.
> 
> You have this path:
> 
> __fpu__restore_sig
> |-> fpu__clear
>  |-> fpu__drop
> 
> and that happens on the sigreturn() path.
> 
> Now, the context switch happens ... when exactly?
> 
> After the sigreturn is done?

Is fpu__clear() correct here? If so, a context switch after
->initialized has been set to 1 wouldn't matter, because in the end the
register state is restored from init_fpstate and not from the task's
FPU struct.

> 
> It must be because then you'd get that ->state corruption after
> ->initialized has been cleared.
> 
> Right?

I might have got your question wrong. If so, quote the code and try
again, and I will do so, too :)

> <snip a bunch of stuff, we'll get back to it later>
> 
> > So. The fix would be:
> > @@ -344,10 +344,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
> >                         sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
> >                 }
> >  
> > +               local_bh_disable();
> >                 fpu->initialized = 1;
> > -               preempt_disable();
> >                 fpu__restore(fpu);
> > -               preempt_enable();
> > +               local_bh_enable();
> >  
> >                 return err;
> >         } else {
> > 
> > local_bh_disable() due to possible kernel_fpu_begin() usage in softirq.
> > How much do we care here about a theoretical race on 32bit anyway? I
> > don't think anyone complained :) I would have to rebase my queue…
> > otherwise…
> 
> Funny, you should mention that.
> 
> But this very much rings a bell about a very elusive bug we had on
> 32-bit at the time. See attached mbox (yeah, the web archives were crap
> and couldn't find the links so I'm sending you the whole thread).
> 
> And at the time Ingo said that there's something still missing about
> *why* it would happen.
> 
> And I think it is this context switch happening right after the
> sigreturn - *AFAICT* - which would cause this.
> 
> I could very well be off but this smells very similar to your thing.

So, checking out v4.5-rc3-15-g58122bf1d856a, __fpu__restore_sig() looks
something like this:

|	fpu__drop(fpu);
…
|	fpu->fpstate_active = 1;
X
|	if (use_eager_fpu()) {
|		preempt_disable();
|		fpu__restore(fpu);
|		preempt_enable();
|	}

fpu__drop() sets fpstate_active & fpregs_active to 0[¹]. A context switch
at X would _not_ save the current FPU registers, and thus not overwrite
what was prepared, because fpregs_active should still be zero.
Now on the switch back to the task, fpstate_active was set which means
fpu.preload might be true. If so it would load the FPU registers and set
fpregs_active to 1. Later fpu__restore() would try the same and
fpregs_activate() would trigger the warning because fpregs_active was
already set to 1.
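
In timeline form (a sketch restating the above, not actual trace
output):

| fpu__drop()              -> fpregs_active = 0, fpstate_active = 0
| /* prepare fpu->state */
| fpu->fpstate_active = 1;
| --- switch out: fpregs_active == 0, registers are not saved ---
| --- switch in:  fpu.preload -> loads regs, fpregs_active = 1 ---
| fpu__restore()           -> fpregs_activate() again -> WARN_ON_FPU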

> Hmmm.
So I just came up with a possible, hard-to-trigger case, and a robot
already triggered it a while back. Well, CONFIG_PREEMPT=y is also there,
so it matches this part of the story. But you connected the dots.

[¹] side note: in my early research it took a while to notice that
    fpstate_active and fpregs_active were two different things. My brain
    used fp.*_active for matching. It also added to my confusion that
    those were later renamed and removed…

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] x86: load FPU registers on return to userland
  2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (22 preceding siblings ...)
  2018-11-07 19:48 ` [PATCH 23/23] x86/fpu: Defer FPU state load until return to userspace Sebastian Andrzej Siewior
@ 2018-11-12  3:02 ` Wanpeng Li
  2018-11-12  3:26   ` Jason A. Donenfeld
  23 siblings, 1 reply; 45+ messages in thread
From: Wanpeng Li @ 2018-11-12  3:02 UTC (permalink / raw)
  To: Sebastian Sewior
  Cc: LKML, the arch/x86 maintainers, Andy Lutomirski, Paolo Bonzini,
	Radim Krcmar, kvm, Jason, riel, Dave Hansen

On Thu, 8 Nov 2018 at 03:55, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> This is a refurbished series originally started by by Rik van Riel. The
> goal is load the FPU registers on return to userland and not on every
> context switch. By this optimisation we can:
> - avoid loading the registers if the task stays in kernel and does
>   not return to userland
> - make kernel_fpu_begin() cheaper: it only saves the registers on the
>   first invocation. The second invocation does not need save them again.
>

Do you have any performance data?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] x86: load FPU registers on return to userland
  2018-11-12  3:02 ` [PATCH v4] x86: load FPU registers on return to userland Wanpeng Li
@ 2018-11-12  3:26   ` Jason A. Donenfeld
  0 siblings, 0 replies; 45+ messages in thread
From: Jason A. Donenfeld @ 2018-11-12  3:26 UTC (permalink / raw)
  To: kernellwp
  Cc: Sebastian Siewior, LKML, X86 ML, Andrew Lutomirski,
	Paolo Bonzini, rkrcmar, KVM list, Rik van Riel, dave.hansen

On Sun, Nov 11, 2018 at 10:02 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> On Thu, 8 Nov 2018 at 03:55, Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > This is a refurbished series originally started by by Rik van Riel. The
> > goal is load the FPU registers on return to userland and not on every
> > context switch. By this optimisation we can:
> > - avoid loading the registers if the task stays in kernel and does
> >   not return to userland
> > - make kernel_fpu_begin() cheaper: it only saves the registers on the
> >   first invocation. The second invocation does not need save them again.
>
> Do you have any performance data?

In WireGuard, the savings from not having to
save/restore/save/restore/save/restore the registers wind up being on
the order of hundreds of megabits.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()
  2018-11-09 23:25         ` Sebastian Andrzej Siewior
@ 2018-11-12 15:56           ` Sebastian Andrzej Siewior
  2018-11-12 17:48             ` Dave Hansen
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-12 15:56 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Ingo Molnar, linux-kernel, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

The sequence
   fpu->initialized = 1;
   preempt_disable();
   fpu__restore(fpu);
   preempt_enable();

is racy in regard to a context switch. A context switch after the first
line would save the `actual' content of the FPU registers and trash
the state that has been prepared (since fpu__drop()).

Use local_bh_disable() around the restore sequence to avoid the race. BH
needs to be disabled because BH is allowed to run (even with preemption
disabled) and might invoke kernel_fpu_begin().

This possible race was reported by the kernel test robot in February
2016, while there was still lazy FPU support.

Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnic
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 61a949d84dfa5..d99a8ee9e185e 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -344,10 +344,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 			sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
 		}
 
+		local_bh_disable();
 		fpu->initialized = 1;
-		preempt_disable();
 		fpu__restore(fpu);
-		preempt_enable();
+		local_bh_enable();
 
 		return err;
 	} else {
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()
  2018-11-12 15:56           ` [PATCH] x86/fpu: Disable BH while while loading FPU registers " Sebastian Andrzej Siewior
@ 2018-11-12 17:48             ` Dave Hansen
  2018-11-19 11:41               ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Hansen @ 2018-11-12 17:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Borislav Petkov, x86
  Cc: Ingo Molnar, linux-kernel, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 11/12/18 7:56 AM, Sebastian Andrzej Siewior wrote:
> Use local_bh_disable() around the restore sequence to avoid the race. BH
> needs to be disabled because BH is allowed to run (even with preemption
> disabled) and might invoke kernel_fpu_begin().

FWIW, that would make nice comment fodder for the local_bh_disable().
I'd much rather run into it there than in a changelog.
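
Something like this perhaps (wording is just a sketch):

	/*
	 * A softirq may run kernel_fpu_begin() even with preemption
	 * disabled, which would save the current (random) register
	 * content into fpu->state and trash the state prepared
	 * above. So BH must be off, not just preemption.
	 */
	local_bh_disable();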



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()
  2018-11-12 17:48             ` Dave Hansen
@ 2018-11-19 11:41               ` Sebastian Andrzej Siewior
  2018-11-19 15:04                 ` Dave Hansen
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-19 11:41 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Borislav Petkov, x86, Ingo Molnar, linux-kernel, Andy Lutomirski,
	Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-11-12 09:48:08 [-0800], Dave Hansen wrote:
> On 11/12/18 7:56 AM, Sebastian Andrzej Siewior wrote:
> > Use local_bh_disable() around the restore sequence to avoid the race. BH
> > needs to be disabled because BH is allowed to run (even with preemption
> > disabled) and might invoke kernel_fpu_begin().
> 
> FWIW, that would make nice comment fodder for the local_bh_disable().
> I'd much rather run into it there than in a changelog.

Hmm. Should I really resend a v2 of this with the additional comment?
This patch should go to stable to fix the bug, and at the end of this
series I remove/replace this hunk anyway.
I can do it either way, just wanted to check first.

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()
  2018-11-19 11:41               ` Sebastian Andrzej Siewior
@ 2018-11-19 15:04                 ` Dave Hansen
  2018-11-19 15:06                   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Hansen @ 2018-11-19 15:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Borislav Petkov, x86, Ingo Molnar, linux-kernel, Andy Lutomirski,
	Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 11/19/18 3:41 AM, Sebastian Andrzej Siewior wrote:
> On 2018-11-12 09:48:08 [-0800], Dave Hansen wrote:
>> On 11/12/18 7:56 AM, Sebastian Andrzej Siewior wrote:
>>> Use local_bh_disable() around the restore sequence to avoid the race. BH
>>> needs to be disabled because BH is allowed to run (even with preemption
>>> disabled) and might invoke kernel_fpu_begin().
>> FWIW, that would make nice comment fodder for the local_bh_disable().
>> I'd much rather run into it there than in a changelog.
> Hmm. Should I really resend a v2 of this with the additional comment?
> This patch should go to stable to fix the bug, and at the end of this
> series I remove/replace this hunk anyway.
> I can do it either way, just wanted to check first.

Does the local_bh_disable() itself survive?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()
  2018-11-19 15:04                 ` Dave Hansen
@ 2018-11-19 15:06                   ` Sebastian Andrzej Siewior
  2018-11-19 15:08                     ` Dave Hansen
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-19 15:06 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Borislav Petkov, x86, Ingo Molnar, linux-kernel, Andy Lutomirski,
	Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-11-19 07:04:35 [-0800], Dave Hansen wrote:
> 
> Does the local_bh_disable() itself survive?

Not in __fpu__restore_sig(). I do have:
| static inline void __fpregs_changes_begin(void)
| {
|        preempt_disable();
|        local_bh_disable();
| }

and __fpregs_changes_begin() is introduced as part of the series.

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()
  2018-11-19 15:06                   ` Sebastian Andrzej Siewior
@ 2018-11-19 15:08                     ` Dave Hansen
  2018-11-19 15:19                       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Hansen @ 2018-11-19 15:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Borislav Petkov, x86, Ingo Molnar, linux-kernel, Andy Lutomirski,
	Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 11/19/18 7:06 AM, Sebastian Andrzej Siewior wrote:
> On 2018-11-19 07:04:35 [-0800], Dave Hansen wrote:
>> Does the local_bh_disable() itself survive?
> Not in __fpu__restore_sig(). I do have:
> | static inline void __fpregs_changes_begin(void)
> | {
> |        preempt_disable();
> |        local_bh_disable();
> | }
> 
> and __fpregs_changes_begin() is introduced as part of the series.

OK, so can we just comment *that*, please?  Basically, why do we need both?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()
  2018-11-19 15:08                     ` Dave Hansen
@ 2018-11-19 15:19                       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-19 15:19 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Borislav Petkov, x86, Ingo Molnar, linux-kernel, Andy Lutomirski,
	Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-11-19 07:08:44 [-0800], Dave Hansen wrote:
> On 11/19/18 7:06 AM, Sebastian Andrzej Siewior wrote:
> > On 2018-11-19 07:04:35 [-0800], Dave Hansen wrote:
> >> Does the local_bh_disable() itself survive?
> > Not in __fpu__restore_sig(). I do have:
> > | static inline void __fpregs_changes_begin(void)
> > | {
> > |        preempt_disable();
> > |        local_bh_disable();
> > | }
> > 
> > and __fpregs_changes_begin() is introduced as part of the series.
> 
> OK, so can we just comment *that*, please?  Basically, why do we need botj?

let me do this then.
local_bh_disable() should be enough. However, I had a discussion with
PeterZ in which he argued that local_bh_disable() also acting as
preempt_disable() is an implementation detail and should not be relied
upon. It is not true on PREEMPT_RT, for instance.
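
So the commented pair would look something like this (a sketch, final
wording may differ):

| static inline void __fpregs_changes_begin(void)
| {
|	/*
|	 * local_bh_disable() implying preempt_disable() is an
|	 * implementation detail (and not true on PREEMPT_RT), so
|	 * keep the explicit preempt_disable() as well.
|	 */
|	preempt_disable();
|	local_bh_disable();
| }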

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 17/23] x86/fpu: Eager switch PKRU state
  2018-11-08 11:12   ` Paolo Bonzini
@ 2018-11-19 18:17     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-19 18:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, x86, Andy Lutomirski, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-11-08 12:12:52 [+0100], Paolo Bonzini wrote:
> On 07/11/2018 20:48, Sebastian Andrzej Siewior wrote:
> > index 375226055a413..5b33985d9f475 100644
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -811,7 +811,7 @@ void fpu__resume_cpu(void)
> >   *
> >   * Note: does not work for compacted buffers.
> >   */
> 
> The comment is wrong, which was already the case before but it becomes a
> bit more important if the function is used outside its module.

let me fix it.

> However, why not use get_xsave_addr?  I don't see why it is important to
> skip the checks, and if it is it probably deserves a comment. "Raw" and
> double underscores in the function name is scary...

yeah, it is scary and only those who can face Baba Yaga may use it.
I thought it was a fast path and that it would be okay to skip those
checks. However, let me fix the comment and use the normal function like
everyone else. If it is too slow then we can still short-circuit it
later.

> Paolo

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread

Thread overview: 45+ messages
2018-11-07 19:48 [PATCH v4] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 01/23] x86/fpu: Use ULL for shift in xfeature_uncompacted_offset() Sebastian Andrzej Siewior
2018-11-08 11:38   ` Borislav Petkov
2018-11-09 16:56     ` Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
2018-11-08 14:57   ` Borislav Petkov
2018-11-09 17:35     ` Sebastian Andrzej Siewior
2018-11-09 18:52       ` Borislav Petkov
2018-11-09 23:25         ` Sebastian Andrzej Siewior
2018-11-12 15:56           ` [PATCH] x86/fpu: Disable BH while while loading FPU registers " Sebastian Andrzej Siewior
2018-11-12 17:48             ` Dave Hansen
2018-11-19 11:41               ` Sebastian Andrzej Siewior
2018-11-19 15:04                 ` Dave Hansen
2018-11-19 15:06                   ` Sebastian Andrzej Siewior
2018-11-19 15:08                     ` Dave Hansen
2018-11-19 15:19                       ` Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 03/23] x86/fpu: Remove fpu__restore() Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 04/23] x86/entry/32: Remove asm/math_emu.h include Sebastian Andrzej Siewior
2018-11-08 18:45   ` Andy Lutomirski
2018-11-07 19:48 ` [PATCH 05/23] x86/fpu: Remove preempt_disable() in fpu__clear() Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 06/23] x86/fpu: Always init the `state' " Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 07/23] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 08/23] x86/fpu: Remove fpu->initialized Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 09/23] x86/fpu: Remove user_fpu_begin() Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 10/23] x86/entry: Remove _TIF_ALLWORK_MASK Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 11/23] x86/fpu: Add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 12/23] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 13/23] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() " Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 14/23] x86/pkeys: Make init_pkru_value static Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 15/23] x86/fpu: Only write PKRU if it is different from current Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 16/23] x86/pkeys: Don't check if PKRU is zero before writting it Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 17/23] x86/fpu: Eager switch PKRU state Sebastian Andrzej Siewior
2018-11-08 11:12   ` Paolo Bonzini
2018-11-19 18:17     ` Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 18/23] x86/entry: Add TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 19/23] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 20/23] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 21/23] x86/fpu: Update xstate's PKRU value on write_pkru() Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 22/23] x86/fpu: Don't restore the FPU state directly from userland in __fpu__restore_sig() Sebastian Andrzej Siewior
2018-11-08 18:25   ` Andy Lutomirski
2018-11-09 19:09     ` Sebastian Andrzej Siewior
2018-11-07 19:48 ` [PATCH 23/23] x86/fpu: Defer FPU state load until return to userspace Sebastian Andrzej Siewior
2018-11-08  8:33   ` Sebastian Andrzej Siewior
2018-11-12  3:02 ` [PATCH v4] x86: load FPU registers on return to userland Wanpeng Li
2018-11-12  3:26   ` Jason A. Donenfeld
