linux-kernel.vger.kernel.org archive mirror
* [PATCH v6] x86: load FPU registers on return to userland
@ 2019-01-09 11:47 Sebastian Andrzej Siewior
  2019-01-09 11:47 ` [PATCH 01/22] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
                   ` (23 more replies)
  0 siblings, 24 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

This is a refurbished series originally started by Rik van Riel. The
goal is to load the FPU registers on return to userland rather than on
every context switch. With this optimisation we can:
- avoid loading the registers if the task stays in the kernel and does
  not return to userland
- make kernel_fpu_begin() cheaper: it saves the registers only on the
  first invocation; a second invocation does not need to save them
  again.

To access the FPU registers in the kernel we need to:
- disable preemption so that the scheduler does not switch tasks. If it
  did, it would set TIF_NEED_FPU_LOAD and the FPU registers would no
  longer be valid.
- disable BH, because a softirq might use kernel_fpu_begin() and then
  set TIF_NEED_FPU_LOAD instead of loading the FPU registers on
  completion.
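
A hypothetical sketch (not part of the series, helper name invented for
illustration) of how code would have to bracket direct FPU-register
access under these rules:

	/* Illustrative only: the protocol the cover letter describes. */
	static void with_fpregs_locked(void (*fn)(void))
	{
		preempt_disable();	/* no task switch: it would set TIF_NEED_FPU_LOAD */
		local_bh_disable();	/* no softirq kernel_fpu_begin() underneath us */

		fn();			/* FPU registers are guaranteed valid in here */

		local_bh_enable();
		preempt_enable();
	}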

v5…v6:
Rebased on top of v5.0-rc1. Integrated a few fixes which I noticed while
looking over the patches, dropped the first few patches which were
already applied.

v4…v5:
Rebased on top of a fix, noticed a problem with XSAVES and then redid
the restore on sig return (patch #26 to #28).

I don't much like the sig save+restore thing that we are doing. It
has always been like that. I *think* this is just because we have
nowhere to stash the FPU state while we are handling the signal. We
could add another fpu->state for the signal handler and avoid it.
A Debian code search revealed that `criu' is using it (and I didn't
figure out why); nothing else that is packaged in Debian does. Maybe we
could get rid of this if `criu' then used a dedicated interface for its
needs rather than the signal interface that happens to do what it
wants :)

v3…v4:
It has been suggested to remove the `initialized' member of the struct
fpu because it should not be needed with lazy FPU restore, and removing
it would make the review easier. This is the first part of the series;
the second is basically a rebase of the v3 queue. As a result, the
diffstat became negative (which wasn't the case in the previous
version) :)
I tried to incorporate all the review comments that came up; some of
them were "outdated" after the removal of the `initialized' member. I'm
sorry if I missed any.

v1…v3:
v2 was never posted. I followed the idea to completely decouple PKRU
from xstate. This didn't quite work and made a few things complicated.
One obvious required fixup is copy_fpstate_to_sigframe(), where the PKRU
state needs to be fiddled into xstate. This required another
xfeatures_mask so that the sanity checks were performed and
xstate_offsets would be computed. Additionally, ptrace also reads/sets
xstate in order to get/set the registers, and PKRU is one of them, so
this would need some fiddling, too.
In v3 I dropped that decoupling idea. I also learned that the wrpkru
instruction is not privileged, so caching the value in the kernel does
not work. Instead I keep PKRU in the xstate area and load it at
context-switch time, while the remaining registers are deferred (until
return to userland). The offset of PKRU within xstate is enumerated at
boot time, so why not use it.

The following changes since commit bfeffd155283772bbe78c6a05dec7c0128ee500c:

  Linux 5.0-rc1 (2019-01-06 17:08:20 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git x86_fpu_rtu_v6

for you to fetch changes up to b7b783d84166b6b74aed2bfd4a07128ff303fed6:

  x86/fpu: Defer FPU state load until return to userspace (2019-01-07 11:32:57 +0100)

----------------------------------------------------------------
Rik van Riel (5):
      x86/fpu: Add (__)make_fpregs_active helpers
      x86/fpu: Eager switch PKRU state
      x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
      x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
      x86/fpu: Defer FPU state load until return to userspace

Sebastian Andrzej Siewior (17):
      x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
      x86/fpu: Remove fpu__restore()
      x86/fpu: Remove preempt_disable() in fpu__clear()
      x86/fpu: Always init the `state' in fpu__clear()
      x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
      x86/fpu: Don't save fxregs for ia32 frames in copy_fpstate_to_sigframe()
      x86/fpu: Remove fpu->initialized
      x86/fpu: Remove user_fpu_begin()
      x86/fpu: Make __raw_xsave_addr() use feature number instead of mask
      x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use feature number instead of mask
      x86/fpu: Only write PKRU if it is different from current
      x86/pkeys: Don't check if PKRU is zero before writting it
      x86/entry: Add TIF_NEED_FPU_LOAD
      x86/fpu: Update xstate's PKRU value on write_pkru()
      x86/fpu: Inline copy_user_to_fpregs_zeroing()
      x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory
      x86/fpu: Merge the two code paths in __fpu__restore_sig()

 Documentation/preempt-locking.txt    |   1 -
 arch/x86/entry/common.c              |   8 ++
 arch/x86/ia32/ia32_signal.c          |  17 +--
 arch/x86/include/asm/fpu/api.h       |  31 ++++++
 arch/x86/include/asm/fpu/internal.h  | 151 +++++++++++---------------
 arch/x86/include/asm/fpu/signal.h    |   2 +-
 arch/x86/include/asm/fpu/types.h     |   9 --
 arch/x86/include/asm/fpu/xstate.h    |   5 +-
 arch/x86/include/asm/pgtable.h       |  20 +++-
 arch/x86/include/asm/special_insns.h |  13 ++-
 arch/x86/include/asm/thread_info.h   |   2 +
 arch/x86/include/asm/trace/fpu.h     |   8 +-
 arch/x86/kernel/fpu/core.c           | 193 ++++++++++++++++-----------------
 arch/x86/kernel/fpu/init.c           |   2 -
 arch/x86/kernel/fpu/regset.c         |  24 +---
 arch/x86/kernel/fpu/signal.c         | 205 ++++++++++++++++-------------------
 arch/x86/kernel/fpu/xstate.c         |  43 ++++----
 arch/x86/kernel/process.c            |   2 +-
 arch/x86/kernel/process_32.c         |  11 +-
 arch/x86/kernel/process_64.c         |  11 +-
 arch/x86/kernel/signal.c             |  17 ++-
 arch/x86/kernel/traps.c              |   2 +-
 arch/x86/kvm/x86.c                   |  48 +++++---
 arch/x86/math-emu/fpu_entry.c        |   3 -
 arch/x86/mm/mpx.c                    |   6 +-
 arch/x86/mm/pkeys.c                  |  14 +--
 26 files changed, 411 insertions(+), 437 deletions(-)


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 01/22] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-14 16:24   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 02/22] x86/fpu: Remove fpu__restore() Sebastian Andrzej Siewior
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

This is a preparation for the removal of the ->initialized member in the
fpu struct.
__fpu__restore_sig() deactivates the FPU via fpu__drop() and then
manually sets ->initialized, followed by fpu__restore(). The result is
that it is possible to manipulate fpu->state, and the state of the
registers won't be saved/restored on a context switch, which would
otherwise overwrite fpu->state.

Don't access fpu->state while its content is being read from user space
and examined / sanitized. Use a temporary kmalloc() buffer to prepare
the FPU registers and, once the state is considered okay, load it.
Should something go wrong, return with an error and without altering
the original FPU registers.

The removal of fpu__initialize() is a nop because fpu->initialized is
already set for the user task.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/signal.h |  2 +-
 arch/x86/kernel/fpu/regset.c      |  5 ++--
 arch/x86/kernel/fpu/signal.c      | 41 ++++++++++++-------------------
 3 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/fpu/signal.h b/arch/x86/include/asm/fpu/signal.h
index 44bbc39a57b30..7fb516b6893a8 100644
--- a/arch/x86/include/asm/fpu/signal.h
+++ b/arch/x86/include/asm/fpu/signal.h
@@ -22,7 +22,7 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
 
 extern void convert_from_fxsr(struct user_i387_ia32_struct *env,
 			      struct task_struct *tsk);
-extern void convert_to_fxsr(struct task_struct *tsk,
+extern void convert_to_fxsr(struct fxregs_state *fxsave,
 			    const struct user_i387_ia32_struct *env);
 
 unsigned long
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index bc02f5144b958..5dbc099178a88 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -269,11 +269,10 @@ convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
 		memcpy(&to[i], &from[i], sizeof(to[0]));
 }
 
-void convert_to_fxsr(struct task_struct *tsk,
+void convert_to_fxsr(struct fxregs_state *fxsave,
 		     const struct user_i387_ia32_struct *env)
 
 {
-	struct fxregs_state *fxsave = &tsk->thread.fpu.state.fxsave;
 	struct _fpreg *from = (struct _fpreg *) &env->st_space[0];
 	struct _fpxreg *to = (struct _fpxreg *) &fxsave->st_space[0];
 	int i;
@@ -350,7 +349,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 
 	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &env, 0, -1);
 	if (!ret)
-		convert_to_fxsr(target, &env);
+		convert_to_fxsr(&target->thread.fpu.state.fxsave, &env);
 
 	/*
 	 * update the header bit in the xsave header, indicating the
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index f6a1d299627c5..c0cdcb9b7de5a 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -207,11 +207,11 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 }
 
 static inline void
-sanitize_restored_xstate(struct task_struct *tsk,
+sanitize_restored_xstate(union fpregs_state *state,
 			 struct user_i387_ia32_struct *ia32_env,
 			 u64 xfeatures, int fx_only)
 {
-	struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
+	struct xregs_state *xsave = &state->xsave;
 	struct xstate_header *header = &xsave->header;
 
 	if (use_xsave()) {
@@ -238,7 +238,7 @@ sanitize_restored_xstate(struct task_struct *tsk,
 		 */
 		xsave->i387.mxcsr &= mxcsr_feature_mask;
 
-		convert_to_fxsr(tsk, ia32_env);
+		convert_to_fxsr(&state->fxsave, ia32_env);
 	}
 }
 
@@ -284,8 +284,6 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	if (!access_ok(buf, size))
 		return -EACCES;
 
-	fpu__initialize(fpu);
-
 	if (!static_cpu_has(X86_FEATURE_FPU))
 		return fpregs_soft_set(current, NULL,
 				       0, sizeof(struct user_i387_ia32_struct),
@@ -315,40 +313,33 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 * header. Validate and sanitize the copied state.
 		 */
 		struct user_i387_ia32_struct env;
+		union fpregs_state *state;
 		int err = 0;
+		void *tmp;
 
-		/*
-		 * Drop the current fpu which clears fpu->initialized. This ensures
-		 * that any context-switch during the copy of the new state,
-		 * avoids the intermediate state from getting restored/saved.
-		 * Thus avoiding the new restored state from getting corrupted.
-		 * We will be ready to restore/save the state only after
-		 * fpu->initialized is again set.
-		 */
-		fpu__drop(fpu);
+		tmp = kzalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+		state = PTR_ALIGN(tmp, 64);
 
 		if (using_compacted_format()) {
-			err = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
+			err = copy_user_to_xstate(&state->xsave, buf_fx);
 		} else {
-			err = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
+			err = __copy_from_user(&state->xsave, buf_fx, state_size);
 
 			if (!err && state_size > offsetof(struct xregs_state, header))
-				err = validate_xstate_header(&fpu->state.xsave.header);
+				err = validate_xstate_header(&state->xsave.header);
 		}
 
 		if (err || __copy_from_user(&env, buf, sizeof(env))) {
-			fpstate_init(&fpu->state);
-			trace_x86_fpu_init_state(fpu);
 			err = -1;
 		} else {
-			sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
+			sanitize_restored_xstate(state, &env,
+						 xfeatures, fx_only);
+			copy_kernel_to_fpregs(state);
 		}
 
-		local_bh_disable();
-		fpu->initialized = 1;
-		fpu__restore(fpu);
-		local_bh_enable();
-
+		kfree(tmp);
 		return err;
 	} else {
 		/*
-- 
2.20.1



* [PATCH 02/22] x86/fpu: Remove fpu__restore()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
  2019-01-09 11:47 ` [PATCH 01/22] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-09 11:47 ` [PATCH 03/22] x86/fpu: Remove preempt_disable() in fpu__clear() Sebastian Andrzej Siewior
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

There are no users of fpu__restore(), so it is time to remove it.
The comment regarding fpu__restore() and the TS bit has been stale since
commit
  b3b0870ef3ffe ("i387: do not preload FPU state at task switch time")
and has had no meaning since then.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 Documentation/preempt-locking.txt   |  1 -
 arch/x86/include/asm/fpu/internal.h |  1 -
 arch/x86/kernel/fpu/core.c          | 24 ------------------------
 arch/x86/kernel/process_32.c        |  4 +---
 arch/x86/kernel/process_64.c        |  4 +---
 5 files changed, 2 insertions(+), 32 deletions(-)

diff --git a/Documentation/preempt-locking.txt b/Documentation/preempt-locking.txt
index 509f5a422d571..dce336134e54a 100644
--- a/Documentation/preempt-locking.txt
+++ b/Documentation/preempt-locking.txt
@@ -52,7 +52,6 @@ preemption must be disabled around such regions.
 
 Note, some FPU functions are already explicitly preempt safe.  For example,
 kernel_fpu_begin and kernel_fpu_end will disable and enable preemption.
-However, fpu__restore() must be called with preemption disabled.
 
 
 RULE #3: Lock acquire and release must be performed by same task
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index fa2c93cb42a27..67675d023d4f8 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -28,7 +28,6 @@ extern void fpu__initialize(struct fpu *fpu);
 extern void fpu__prepare_read(struct fpu *fpu);
 extern void fpu__prepare_write(struct fpu *fpu);
 extern void fpu__save(struct fpu *fpu);
-extern void fpu__restore(struct fpu *fpu);
 extern int  fpu__restore_sig(void __user *buf, int ia32_frame);
 extern void fpu__drop(struct fpu *fpu);
 extern int  fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2e5003fef51a9..1d3ae7988f7f2 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -303,30 +303,6 @@ void fpu__prepare_write(struct fpu *fpu)
 	}
 }
 
-/*
- * 'fpu__restore()' is called to copy FPU registers from
- * the FPU fpstate to the live hw registers and to activate
- * access to the hardware registers, so that FPU instructions
- * can be used afterwards.
- *
- * Must be called with kernel preemption disabled (for example
- * with local interrupts disabled, as it is in the case of
- * do_device_not_available()).
- */
-void fpu__restore(struct fpu *fpu)
-{
-	fpu__initialize(fpu);
-
-	/* Avoid __kernel_fpu_begin() right after fpregs_activate() */
-	kernel_fpu_disable();
-	trace_x86_fpu_before_restore(fpu);
-	fpregs_activate(fpu);
-	copy_kernel_to_fpregs(&fpu->state);
-	trace_x86_fpu_after_restore(fpu);
-	kernel_fpu_enable();
-}
-EXPORT_SYMBOL_GPL(fpu__restore);
-
 /*
  * Drops current FPU state: deactivates the fpregs and
  * the fpstate. NOTE: it still leaves previous contents
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index e471d8e6f0b24..7888a41a03cdb 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -267,9 +267,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Leave lazy mode, flushing any hypercalls made here.
 	 * This must be done before restoring TLS segments so
-	 * the GDT and LDT are properly updated, and must be
-	 * done before fpu__restore(), so the TS bit is up
-	 * to date.
+	 * the GDT and LDT are properly updated.
 	 */
 	arch_end_context_switch(next_p);
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6a62f4af9fcf7..e1983b3a16c43 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -538,9 +538,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Leave lazy mode, flushing any hypercalls made here.  This
 	 * must be done after loading TLS entries in the GDT but before
-	 * loading segments that might reference them, and and it must
-	 * be done before fpu__restore(), so the TS bit is up to
-	 * date.
+	 * loading segments that might reference them.
 	 */
 	arch_end_context_switch(next_p);
 
-- 
2.20.1



* [PATCH 03/22] x86/fpu: Remove preempt_disable() in fpu__clear()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
  2019-01-09 11:47 ` [PATCH 01/22] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
  2019-01-09 11:47 ` [PATCH 02/22] x86/fpu: Remove fpu__restore() Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-14 18:55   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 04/22] x86/fpu: Always init the `state' " Sebastian Andrzej Siewior
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The preempt_disable() section was introduced in commit
  a10b6a16cdad8 ("x86/fpu: Make the fpu state change in fpu__clear() scheduler-atomic")
and was said to be temporary.

fpu__initialize() initializes the FPU struct to its "init" value and
then sets ->initialized to 1. The last part is the important one.
The content of `state' does not matter because the registers get set
via copy_init_fpstate_to_fpregs().
Preemption here has little meaning because the registers will always be
set to the same content after copy_init_fpstate_to_fpregs(). A softirq
with a kernel_fpu_begin() could also force the FPU's registers to be
saved after fpu__initialize() without changing the outcome here.

Remove the preempt_disable() section in fpu__clear(); preemption here
does not hurt.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1d3ae7988f7f2..1940319268aef 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -366,11 +366,9 @@ void fpu__clear(struct fpu *fpu)
 	 * Make sure fpstate is cleared and initialized.
 	 */
 	if (static_cpu_has(X86_FEATURE_FPU)) {
-		preempt_disable();
 		fpu__initialize(fpu);
 		user_fpu_begin();
 		copy_init_fpstate_to_fpregs();
-		preempt_enable();
 	}
 }
 
-- 
2.20.1



* [PATCH 04/22] x86/fpu: Always init the `state' in fpu__clear()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 03/22] x86/fpu: Remove preempt_disable() in fpu__clear() Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-14 19:32   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

fpu__clear() only initializes `state' if the FPU is present. This
initialisation is also required on FPU-less systems and takes place in
math_emulate(). Since fpu__initialize() only performs the
initialization if ->initialized is zero, it does not matter that it is
invoked each time an opcode is emulated. It makes the removal of
->initialized easier if the struct is also initialized in the FPU-less
case at the same time.

Move fpu__initialize() before the FPU check so it is also performed in
the FPU-less case.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 1 -
 arch/x86/kernel/fpu/core.c          | 5 ++---
 arch/x86/math-emu/fpu_entry.c       | 3 ---
 3 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 67675d023d4f8..9f0b3ff8c9b7b 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -24,7 +24,6 @@
 /*
  * High level FPU state handling functions:
  */
-extern void fpu__initialize(struct fpu *fpu);
 extern void fpu__prepare_read(struct fpu *fpu);
 extern void fpu__prepare_write(struct fpu *fpu);
 extern void fpu__save(struct fpu *fpu);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1940319268aef..e43296854e379 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -223,7 +223,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
  * Activate the current task's in-memory FPU context,
  * if it has not been used before:
  */
-void fpu__initialize(struct fpu *fpu)
+static void fpu__initialize(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
@@ -236,7 +236,6 @@ void fpu__initialize(struct fpu *fpu)
 		fpu->initialized = 1;
 	}
 }
-EXPORT_SYMBOL_GPL(fpu__initialize);
 
 /*
  * This function must be called before we read a task's fpstate.
@@ -365,8 +364,8 @@ void fpu__clear(struct fpu *fpu)
 	/*
 	 * Make sure fpstate is cleared and initialized.
 	 */
+	fpu__initialize(fpu);
 	if (static_cpu_has(X86_FEATURE_FPU)) {
-		fpu__initialize(fpu);
 		user_fpu_begin();
 		copy_init_fpstate_to_fpregs();
 	}
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 9e2ba7e667f61..a873da6b46d6b 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -113,9 +113,6 @@ void math_emulate(struct math_emu_info *info)
 	unsigned long code_base = 0;
 	unsigned long code_limit = 0;	/* Initialized to stop compiler warnings */
 	struct desc_struct code_descriptor;
-	struct fpu *fpu = &current->thread.fpu;
-
-	fpu__initialize(fpu);
 
 #ifdef RE_ENTRANT_CHECKING
 	if (emulating) {
-- 
2.20.1



* [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 04/22] x86/fpu: Always init the `state' " Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-16 19:36   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 06/22] x86/fpu: Don't save fxregs for ia32 frames " Sebastian Andrzej Siewior
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Since ->initialized is always true for user tasks, and kernel threads
don't get this far, we always save the registers directly to userspace.

Remove the check for ->initialized because it is always true, and
remove the now-dead false branch.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 30 ++++++------------------------
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index c0cdcb9b7de5a..c136a4327659d 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -157,7 +157,6 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct fpu *fpu = &current->thread.fpu;
-	struct xregs_state *xsave = &fpu->state.xsave;
 	struct task_struct *tsk = current;
 	int ia32_fxstate = (buf != buf_fx);
 
@@ -172,29 +171,12 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	if (fpu->initialized || using_compacted_format()) {
-		/* Save the live register state to the user directly. */
-		if (copy_fpregs_to_sigframe(buf_fx))
-			return -1;
-		/* Update the thread's fxstate to save the fsave header. */
-		if (ia32_fxstate)
-			copy_fxregs_to_kernel(fpu);
-	} else {
-		/*
-		 * It is a *bug* if kernel uses compacted-format for xsave
-		 * area and we copy it out directly to a signal frame. It
-		 * should have been handled above by saving the registers
-		 * directly.
-		 */
-		if (boot_cpu_has(X86_FEATURE_XSAVES)) {
-			WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
-			return -1;
-		}
-
-		fpstate_sanitize_xstate(fpu);
-		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
-			return -1;
-	}
+	/* Save the live register state to the user directly. */
+	if (copy_fpregs_to_sigframe(buf_fx))
+		return -1;
+	/* Update the thread's fxstate to save the fsave header. */
+	if (ia32_fxstate)
+		copy_fxregs_to_kernel(fpu);
 
 	/* Save the fsave header for the 32-bit frames. */
 	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))
-- 
2.20.1



* [PATCH 06/22] x86/fpu: Don't save fxregs for ia32 frames in copy_fpstate_to_sigframe()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-24 11:17   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 07/22] x86/fpu: Remove fpu->initialized Sebastian Andrzej Siewior
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Why does copy_fpstate_to_sigframe() do copy_fxregs_to_kernel() in the
ia32_fxstate case? I don't know. It just does.
Maybe it was required at some point, or maybe it was added by accident
and nobody noticed because it makes no difference.

In copy_fpstate_to_sigframe() we stash the FPU state into the task's
signal frame. Then the CPU's FPU registers (and its fpu->state) are
cleared (handle_signal() does fpu__clear()). So it makes *no* difference
what happens to fpu->state after copy_fpregs_to_sigframe().

Remove copy_fxregs_to_kernel() since its result does not matter, and
save a few cycles.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index c136a4327659d..047390a45e016 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -174,9 +174,6 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 	/* Save the live register state to the user directly. */
 	if (copy_fpregs_to_sigframe(buf_fx))
 		return -1;
-	/* Update the thread's fxstate to save the fsave header. */
-	if (ia32_fxstate)
-		copy_fxregs_to_kernel(fpu);
 
 	/* Save the fsave header for the 32-bit frames. */
 	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))
-- 
2.20.1



* [PATCH 07/22] x86/fpu: Remove fpu->initialized
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 06/22] x86/fpu: Don't save fxregs for ia32 frames " Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-24 13:34   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 08/22] x86/fpu: Remove user_fpu_begin() Sebastian Andrzej Siewior
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The `initialized' member of the fpu struct is always set to one for
user tasks and zero for kernel tasks. This avoids saving/restoring the
FPU registers for kernel threads.

I expect that fpu->initialized is always 1 and the 0 case has been
removed or is not important. For instance, fpu__drop() sets the value to
zero and its callers either call fpu__initialize() (which would set it
back to one) or don't return to userland.

The context switch code (switch_fpu_prepare() + switch_fpu_finish())
can't unconditionally save/restore registers for kernel threads. I have
no idea what will happen if we restore a zero FPU context to a kernel
thread (since it was never initialized). Also, it has been agreed that
for PKRU we don't want a random state (inherited from the previous task)
but a deterministic one.

For kernel_fpu_begin() (+end) the situation is similar: the kernel test
bot told me that EFI with runtime services uses it before
alternatives_patched is true, which means the function is now used
earlier than it was before.

For those two cases current->mm is used to distinguish between user and
kernel threads. For kernel_fpu_begin() we skip the save/restore of the
FPU registers.
During a context switch into a kernel thread we don't do anything. There
is no reason to save the FPU state of a kernel thread.
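
A sketch of that distinction (hypothetical helper name; the series
open-codes the current->mm check at the call sites):

	/* Kernel threads have no userspace mm, so a NULL ->mm tells us
	 * there is no user FPU state worth saving or restoring. */
	static inline bool task_has_user_fpu_state(struct task_struct *tsk)
	{
		return tsk->mm != NULL;
	}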

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/ia32/ia32_signal.c         | 17 +++-----
 arch/x86/include/asm/fpu/internal.h | 15 +++----
 arch/x86/include/asm/fpu/types.h    |  9 ----
 arch/x86/include/asm/trace/fpu.h    |  5 +--
 arch/x86/kernel/fpu/core.c          | 68 ++++++++---------------------
 arch/x86/kernel/fpu/init.c          |  2 -
 arch/x86/kernel/fpu/regset.c        | 19 ++------
 arch/x86/kernel/fpu/xstate.c        |  2 -
 arch/x86/kernel/process_32.c        |  4 +-
 arch/x86/kernel/process_64.c        |  4 +-
 arch/x86/kernel/signal.c            | 17 +++-----
 arch/x86/mm/pkeys.c                 |  7 +--
 12 files changed, 49 insertions(+), 120 deletions(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 321fe5f5d0e96..6eeb3249f22ff 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -216,8 +216,7 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 				 size_t frame_size,
 				 void __user **fpstate)
 {
-	struct fpu *fpu = &current->thread.fpu;
-	unsigned long sp;
+	unsigned long sp, fx_aligned, math_size;
 
 	/* Default to using normal stack */
 	sp = regs->sp;
@@ -231,15 +230,11 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 		 ksig->ka.sa.sa_restorer)
 		sp = (unsigned long) ksig->ka.sa.sa_restorer;
 
-	if (fpu->initialized) {
-		unsigned long fx_aligned, math_size;
-
-		sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
-		*fpstate = (struct _fpstate_32 __user *) sp;
-		if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
-				    math_size) < 0)
-			return (void __user *) -1L;
-	}
+	sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
+	*fpstate = (struct _fpstate_32 __user *) sp;
+	if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
+				     math_size) < 0)
+		return (void __user *) -1L;
 
 	sp -= frame_size;
 	/* Align the stack pointer according to the i386 ABI,
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 9f0b3ff8c9b7b..3d5121d2bc0bc 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -529,7 +529,7 @@ static inline void fpregs_activate(struct fpu *fpu)
 static inline void
 switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU) && old_fpu->initialized) {
+	if (static_cpu_has(X86_FEATURE_FPU) && current->mm) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
 			old_fpu->last_cpu = -1;
 		else
@@ -537,8 +537,7 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 
 		/* But leave fpu_fpregs_owner_ctx! */
 		trace_x86_fpu_regs_deactivated(old_fpu);
-	} else
-		old_fpu->last_cpu = -1;
+	}
 }
 
 /*
@@ -551,12 +550,12 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
-		       new_fpu->initialized;
+	if (static_cpu_has(X86_FEATURE_FPU)) {
+		if (!fpregs_state_valid(new_fpu, cpu)) {
+			if (current->mm)
+				copy_kernel_to_fpregs(&new_fpu->state);
+		}
 
-	if (preload) {
-		if (!fpregs_state_valid(new_fpu, cpu))
-			copy_kernel_to_fpregs(&new_fpu->state);
 		fpregs_activate(new_fpu);
 	}
 }
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 202c53918ecfa..c5a6edd92de4f 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -293,15 +293,6 @@ struct fpu {
 	 */
 	unsigned int			last_cpu;
 
-	/*
-	 * @initialized:
-	 *
-	 * This flag indicates whether this context is initialized: if the task
-	 * is not running then we can restore from this context, if the task
-	 * is running then we should save into this context.
-	 */
-	unsigned char			initialized;
-
 	/*
 	 * @state:
 	 *
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index 069c04be15076..bd65f6ba950f8 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -13,22 +13,19 @@ DECLARE_EVENT_CLASS(x86_fpu,
 
 	TP_STRUCT__entry(
 		__field(struct fpu *, fpu)
-		__field(bool, initialized)
 		__field(u64, xfeatures)
 		__field(u64, xcomp_bv)
 		),
 
 	TP_fast_assign(
 		__entry->fpu		= fpu;
-		__entry->initialized	= fpu->initialized;
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
 			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
 			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
 		}
 	),
-	TP_printk("x86/fpu: %p initialized: %d xfeatures: %llx xcomp_bv: %llx",
+	TP_printk("x86/fpu: %p xfeatures: %llx xcomp_bv: %llx",
 			__entry->fpu,
-			__entry->initialized,
 			__entry->xfeatures,
 			__entry->xcomp_bv
 	)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index e43296854e379..3a4668c9d24f1 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -101,7 +101,7 @@ static void __kernel_fpu_begin(void)
 
 	kernel_fpu_disable();
 
-	if (fpu->initialized) {
+	if (current->mm) {
 		/*
 		 * Ignore return value -- we don't care if reg state
 		 * is clobbered.
@@ -116,7 +116,7 @@ static void __kernel_fpu_end(void)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (fpu->initialized)
+	if (current->mm)
 		copy_kernel_to_fpregs(&fpu->state);
 
 	kernel_fpu_enable();
@@ -147,10 +147,9 @@ void fpu__save(struct fpu *fpu)
 
 	preempt_disable();
 	trace_x86_fpu_before_save(fpu);
-	if (fpu->initialized) {
-		if (!copy_fpregs_to_fpstate(fpu)) {
-			copy_kernel_to_fpregs(&fpu->state);
-		}
+
+	if (!copy_fpregs_to_fpstate(fpu)) {
+		copy_kernel_to_fpregs(&fpu->state);
 	}
 	trace_x86_fpu_after_save(fpu);
 	preempt_enable();
@@ -190,7 +189,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 {
 	dst_fpu->last_cpu = -1;
 
-	if (!src_fpu->initialized || !static_cpu_has(X86_FEATURE_FPU))
+	if (!static_cpu_has(X86_FEATURE_FPU))
 		return 0;
 
 	WARN_ON_FPU(src_fpu != &current->thread.fpu);
@@ -227,14 +226,10 @@ static void fpu__initialize(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	if (!fpu->initialized) {
-		fpstate_init(&fpu->state);
-		trace_x86_fpu_init_state(fpu);
+	fpstate_init(&fpu->state);
+	trace_x86_fpu_init_state(fpu);
 
-		trace_x86_fpu_activate_state(fpu);
-		/* Safe to do for the current task: */
-		fpu->initialized = 1;
-	}
+	trace_x86_fpu_activate_state(fpu);
 }
 
 /*
@@ -247,32 +242,20 @@ static void fpu__initialize(struct fpu *fpu)
  *
  * - or it's called for stopped tasks (ptrace), in which case the
  *   registers were already saved by the context-switch code when
- *   the task scheduled out - we only have to initialize the registers
- *   if they've never been initialized.
+ *   the task scheduled out.
  *
  * If the task has used the FPU before then save it.
  */
 void fpu__prepare_read(struct fpu *fpu)
 {
-	if (fpu == &current->thread.fpu) {
+	if (fpu == &current->thread.fpu)
 		fpu__save(fpu);
-	} else {
-		if (!fpu->initialized) {
-			fpstate_init(&fpu->state);
-			trace_x86_fpu_init_state(fpu);
-
-			trace_x86_fpu_activate_state(fpu);
-			/* Safe to do for current and for stopped child tasks: */
-			fpu->initialized = 1;
-		}
-	}
 }
 
 /*
  * This function must be called before we write a task's fpstate.
  *
- * If the task has used the FPU before then invalidate any cached FPU registers.
- * If the task has not used the FPU before then initialize its fpstate.
+ * Invalidate any cached FPU registers.
  *
  * After this function call, after registers in the fpstate are
  * modified and the child task has woken up, the child task will
@@ -289,17 +272,8 @@ void fpu__prepare_write(struct fpu *fpu)
 	 */
 	WARN_ON_FPU(fpu == &current->thread.fpu);
 
-	if (fpu->initialized) {
-		/* Invalidate any cached state: */
-		__fpu_invalidate_fpregs_state(fpu);
-	} else {
-		fpstate_init(&fpu->state);
-		trace_x86_fpu_init_state(fpu);
-
-		trace_x86_fpu_activate_state(fpu);
-		/* Safe to do for stopped child tasks: */
-		fpu->initialized = 1;
-	}
+	/* Invalidate any cached state: */
+	__fpu_invalidate_fpregs_state(fpu);
 }
 
 /*
@@ -316,17 +290,13 @@ void fpu__drop(struct fpu *fpu)
 	preempt_disable();
 
 	if (fpu == &current->thread.fpu) {
-		if (fpu->initialized) {
-			/* Ignore delayed exceptions from user space */
-			asm volatile("1: fwait\n"
-				     "2:\n"
-				     _ASM_EXTABLE(1b, 2b));
-			fpregs_deactivate(fpu);
-		}
+		/* Ignore delayed exceptions from user space */
+		asm volatile("1: fwait\n"
+			     "2:\n"
+			     _ASM_EXTABLE(1b, 2b));
+		fpregs_deactivate(fpu);
 	}
 
-	fpu->initialized = 0;
-
 	trace_x86_fpu_dropped(fpu);
 
 	preempt_enable();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6abd83572b016..20d8fa7124c77 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -239,8 +239,6 @@ static void __init fpu__init_system_ctx_switch(void)
 
 	WARN_ON_FPU(!on_boot_cpu);
 	on_boot_cpu = 0;
-
-	WARN_ON_FPU(current->thread.fpu.initialized);
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 5dbc099178a88..d652b939ccfb5 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -15,16 +15,12 @@
  */
 int regset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
-	struct fpu *target_fpu = &target->thread.fpu;
-
-	return target_fpu->initialized ? regset->n : 0;
+	return regset->n;
 }
 
 int regset_xregset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
-	struct fpu *target_fpu = &target->thread.fpu;
-
-	if (boot_cpu_has(X86_FEATURE_FXSR) && target_fpu->initialized)
+	if (boot_cpu_has(X86_FEATURE_FXSR))
 		return regset->n;
 	else
 		return 0;
@@ -370,16 +366,9 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 int dump_fpu(struct pt_regs *regs, struct user_i387_struct *ufpu)
 {
 	struct task_struct *tsk = current;
-	struct fpu *fpu = &tsk->thread.fpu;
-	int fpvalid;
 
-	fpvalid = fpu->initialized;
-	if (fpvalid)
-		fpvalid = !fpregs_get(tsk, NULL,
-				      0, sizeof(struct user_i387_ia32_struct),
-				      ufpu, NULL);
-
-	return fpvalid;
+	return !fpregs_get(tsk, NULL, 0, sizeof(struct user_i387_ia32_struct),
+			   ufpu, NULL);
 }
 EXPORT_SYMBOL(dump_fpu);
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9cc108456d0be..914d9886c6ee8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -892,8 +892,6 @@ const void *get_xsave_field_ptr(int xsave_state)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (!fpu->initialized)
-		return NULL;
 	/*
 	 * fpu__save() takes the CPU's xstate registers
 	 * and saves them off to the 'fpu memory buffer.
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7888a41a03cdb..77d9eb43ccac8 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -288,10 +288,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	if (prev->gs | next->gs)
 		lazy_load_gs(next->gs);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	this_cpu_write(current_task, next_p);
 
+	switch_fpu_finish(next_fpu, cpu);
+
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e1983b3a16c43..ffea7c557963a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -566,14 +566,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	x86_fsgsbase_load(prev, next);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	/*
 	 * Switch the PDA and FPU contexts.
 	 */
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
+	switch_fpu_finish(next_fpu, cpu);
+
 	/* Reload sp0. */
 	update_task_stack(next_p);
 
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 08dfd4c1a4f95..6f45f795690f6 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -246,7 +246,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 	unsigned long sp = regs->sp;
 	unsigned long buf_fx = 0;
 	int onsigstack = on_sig_stack(sp);
-	struct fpu *fpu = &current->thread.fpu;
+	int ret;
 
 	/* redzone */
 	if (IS_ENABLED(CONFIG_X86_64))
@@ -265,11 +265,9 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 		sp = (unsigned long) ka->sa.sa_restorer;
 	}
 
-	if (fpu->initialized) {
-		sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
-					  &buf_fx, &math_size);
-		*fpstate = (void __user *)sp;
-	}
+	sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
+				  &buf_fx, &math_size);
+	*fpstate = (void __user *)sp;
 
 	sp = align_sigframe(sp - frame_size);
 
@@ -281,8 +279,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 		return (void __user *)-1L;
 
 	/* save i387 and extended state */
-	if (fpu->initialized &&
-	    copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size) < 0)
+	ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size);
+	if (ret < 0)
 		return (void __user *)-1L;
 
 	return (void __user *)sp;
@@ -763,8 +761,7 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 		/*
 		 * Ensure the signal handler starts with the new fpu state.
 		 */
-		if (fpu->initialized)
-			fpu__clear(fpu);
+		fpu__clear(fpu);
 	}
 	signal_setup_done(failed, ksig, stepping);
 }
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 047a77f6a10cb..05bb9a44eb1c3 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -39,17 +39,12 @@ int __execute_only_pkey(struct mm_struct *mm)
 	 * dance to set PKRU if we do not need to.  Check it
 	 * first and assume that if the execute-only pkey is
 	 * write-disabled that we do not have to set it
-	 * ourselves.  We need preempt off so that nobody
-	 * can make fpregs inactive.
+	 * ourselves.
 	 */
-	preempt_disable();
 	if (!need_to_set_mm_pkey &&
-	    current->thread.fpu.initialized &&
 	    !__pkru_allows_read(read_pkru(), execute_only_pkey)) {
-		preempt_enable();
 		return execute_only_pkey;
 	}
-	preempt_enable();
 
 	/*
 	 * Set up PKRU so that it denies access for everything
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 08/22] x86/fpu: Remove user_fpu_begin()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 07/22] x86/fpu: Remove fpu->initialized Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-25 15:18   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 09/22] x86/fpu: Add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

user_fpu_begin() sets fpu_fpregs_owner_ctx to the task's fpu struct.
This is always already the case since there is no lazy FPU anymore.

fpu_fpregs_owner_ctx is used during context switch to decide whether the
saved registers need to be loaded or whether the currently loaded
registers are still valid. The load can be skipped during
	taskA -> kernel thread -> taskA

because the switch to a kernel thread does not alter the CPU's FPU state.

Since this field is always updated during context switch and never
invalidated, setting it manually (in user context) makes no difference.
A kernel thread within a kernel_fpu_begin() block could set
fpu_fpregs_owner_ctx to NULL, but a kernel thread does not use
user_fpu_begin(). It is a leftover from the lazy-FPU era.

Remove user_fpu_begin(); it does not change fpu_fpregs_owner_ctx's
content.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 17 -----------------
 arch/x86/kernel/fpu/core.c          |  4 +---
 arch/x86/kernel/fpu/signal.c        |  1 -
 3 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 3d5121d2bc0bc..03acb9aeb32fc 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -560,23 +560,6 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 	}
 }
 
-/*
- * Needs to be preemption-safe.
- *
- * NOTE! user_fpu_begin() must be used only immediately before restoring
- * the save state. It does not do any saving/restoring on its own. In
- * lazy FPU mode, it is just an optimization to avoid a #NM exception,
- * the task can lose the FPU right after preempt_enable().
- */
-static inline void user_fpu_begin(void)
-{
-	struct fpu *fpu = &current->thread.fpu;
-
-	preempt_disable();
-	fpregs_activate(fpu);
-	preempt_enable();
-}
-
 /*
  * MXCSR and XCR definitions:
  */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3a4668c9d24f1..78d8037635932 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -335,10 +335,8 @@ void fpu__clear(struct fpu *fpu)
 	 * Make sure fpstate is cleared and initialized.
 	 */
 	fpu__initialize(fpu);
-	if (static_cpu_has(X86_FEATURE_FPU)) {
-		user_fpu_begin();
+	if (static_cpu_has(X86_FEATURE_FPU))
 		copy_init_fpstate_to_fpregs();
-	}
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 047390a45e016..555c469878874 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -325,7 +325,6 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 * For 64-bit frames and 32-bit fsave frames, restore the user
 		 * state to the registers directly (with exceptions handled).
 		 */
-		user_fpu_begin();
 		if (copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only)) {
 			fpu__clear(fpu);
 			return -1;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 09/22] x86/fpu: Add (__)make_fpregs_active helpers
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (7 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 08/22] x86/fpu: Remove user_fpu_begin() Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-28 18:23   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 10/22] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

Add a helper function that ensures the floating point registers for
the current task are active. It must be used with preemption disabled.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/api.h      | 11 +++++++++++
 arch/x86/include/asm/fpu/internal.h | 19 +++++++++++--------
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index b56d504af6545..31b66af8eb914 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -10,6 +10,7 @@
 
 #ifndef _ASM_X86_FPU_API_H
 #define _ASM_X86_FPU_API_H
+#include <linux/preempt.h>
 
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
@@ -22,6 +23,16 @@ extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
 
+static inline void __fpregs_changes_begin(void)
+{
+	preempt_disable();
+}
+
+static inline void __fpregs_changes_end(void)
+{
+	preempt_enable();
+}
+
 /*
  * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
  *
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 03acb9aeb32fc..795a0a2df135e 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -515,6 +515,15 @@ static inline void fpregs_activate(struct fpu *fpu)
 	trace_x86_fpu_regs_activated(fpu);
 }
 
+static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
+{
+	if (!fpregs_state_valid(fpu, cpu)) {
+		if (current->mm)
+			copy_kernel_to_fpregs(&fpu->state);
+		fpregs_activate(fpu);
+	}
+}
+
 /*
  * FPU state switching for scheduling.
  *
@@ -550,14 +559,8 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU)) {
-		if (!fpregs_state_valid(new_fpu, cpu)) {
-			if (current->mm)
-				copy_kernel_to_fpregs(&new_fpu->state);
-		}
-
-		fpregs_activate(new_fpu);
-	}
+	if (static_cpu_has(X86_FEATURE_FPU))
+		__fpregs_load_activate(new_fpu, cpu);
 }
 
 /*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 10/22] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (8 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 09/22] x86/fpu: Add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-28 18:30   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 11/22] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() " Sebastian Andrzej Siewior
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Most users of __raw_xsave_addr() start with a feature number, shift it
into a mask, and __raw_xsave_addr() then shifts the mask back into the
feature number.

Make __raw_xsave_addr() take the feature number as its argument directly.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/xstate.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 914d9886c6ee8..0e759a032c1c7 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -805,20 +805,18 @@ void fpu__resume_cpu(void)
 }
 
 /*
- * Given an xstate feature mask, calculate where in the xsave
+ * Given an xstate feature nr, calculate where in the xsave
  * buffer the state is.  Callers should ensure that the buffer
  * is valid.
  */
-static void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
+static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 {
-	int feature_nr = fls64(xstate_feature_mask) - 1;
-
-	if (!xfeature_enabled(feature_nr)) {
+	if (!xfeature_enabled(xfeature_nr)) {
 		WARN_ON_FPU(1);
 		return NULL;
 	}
 
-	return (void *)xsave + xstate_comp_offsets[feature_nr];
+	return (void *)xsave + xstate_comp_offsets[xfeature_nr];
 }
 /*
  * Given the xsave area and a state inside, this function returns the
@@ -840,6 +838,7 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask
  */
 void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 {
+	int xfeature_nr;
 	/*
 	 * Do we even *have* xsave state?
 	 */
@@ -867,7 +866,8 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 	if (!(xsave->header.xfeatures & xstate_feature))
 		return NULL;
 
-	return __raw_xsave_addr(xsave, xstate_feature);
+	xfeature_nr = fls64(xstate_feature) - 1;
+	return __raw_xsave_addr(xsave, xfeature_nr);
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
 
@@ -1014,7 +1014,7 @@ int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int of
 		 * Copy only in-use xstates:
 		 */
 		if ((header.xfeatures >> i) & 1) {
-			void *src = __raw_xsave_addr(xsave, 1 << i);
+			void *src = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1100,7 +1100,7 @@ int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned i
 		 * Copy only in-use xstates:
 		 */
 		if ((header.xfeatures >> i) & 1) {
-			void *src = __raw_xsave_addr(xsave, 1 << i);
+			void *src = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1157,7 +1157,7 @@ int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
 		u64 mask = ((u64)1 << i);
 
 		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, 1 << i);
+			void *dst = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1211,7 +1211,7 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
 		u64 mask = ((u64)1 << i);
 
 		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, 1 << i);
+			void *dst = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 11/22] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use feature number instead of mask
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (9 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 10/22] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-28 18:49   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 12/22] x86/fpu: Only write PKRU if it is different from current Sebastian Andrzej Siewior
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

After changing the argument of __raw_xsave_addr() from a mask to a
number, Dave suggested checking whether the same makes sense for
get_xsave_addr(). As it turns out, it does: get_xsave_addr() needs the
mask only to check whether the requested feature is part of what is
supported/saved, and then uses the number again. The shift operation is
cheaper than "find last bit set". Also, the feature number uses less
opcode space than the mask :)

Make the get_xsave_addr() argument an xfeature number instead of a mask
and fix up its callers.
As part of this, use xfeature_nr and xfeature_mask consistently.
This results in changes to the kvm code as:
	feature -> xfeature_mask
	index -> xfeature_nr

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/xstate.h |  4 ++--
 arch/x86/kernel/fpu/xstate.c      | 23 +++++++++++------------
 arch/x86/kernel/traps.c           |  2 +-
 arch/x86/kvm/x86.c                | 28 ++++++++++++++--------------
 arch/x86/mm/mpx.c                 |  6 +++---
 5 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 48581988d78c7..fbe41f808e5d8 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -46,8 +46,8 @@ extern void __init update_regset_xstate_info(unsigned int size,
 					     u64 xstate_mask);
 
 void fpu__xstate_clear_all_cpu_caps(void);
-void *get_xsave_addr(struct xregs_state *xsave, int xstate);
-const void *get_xsave_field_ptr(int xstate_field);
+void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
+const void *get_xsave_field_ptr(int xfeature_nr);
 int using_compacted_format(void);
 int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
 int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0e759a032c1c7..d288e4d271b71 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -830,15 +830,15 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
  *
  * Inputs:
  *	xstate: the thread's storage area for all FPU data
- *	xstate_feature: state which is defined in xsave.h (e.g.
- *	XFEATURE_MASK_FP, XFEATURE_MASK_SSE, etc...)
+ *	xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
+ *	XFEATURE_SSE, etc...)
  * Output:
  *	address of the state in the xsave area, or NULL if the
  *	field is not present in the xsave buffer.
  */
-void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
+void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 {
-	int xfeature_nr;
+	u64 xfeature_mask = 1ULL << xfeature_nr;
 	/*
 	 * Do we even *have* xsave state?
 	 */
@@ -850,11 +850,11 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 	 * have not enabled.  Remember that pcntxt_mask is
 	 * what we write to the XCR0 register.
 	 */
-	WARN_ONCE(!(xfeatures_mask & xstate_feature),
+	WARN_ONCE(!(xfeatures_mask & xfeature_mask),
 		  "get of unsupported state");
 	/*
 	 * This assumes the last 'xsave*' instruction to
-	 * have requested that 'xstate_feature' be saved.
+	 * have requested that 'xfeature_mask' be saved.
 	 * If it did not, we might be seeing and old value
 	 * of the field in the buffer.
 	 *
@@ -863,10 +863,9 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 	 * or because the "init optimization" caused it
 	 * to not be saved.
 	 */
-	if (!(xsave->header.xfeatures & xstate_feature))
+	if (!(xsave->header.xfeatures & xfeature_mask))
 		return NULL;
 
-	xfeature_nr = fls64(xstate_feature) - 1;
 	return __raw_xsave_addr(xsave, xfeature_nr);
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
@@ -882,13 +881,13 @@ EXPORT_SYMBOL_GPL(get_xsave_addr);
  * Note that this only works on the current task.
  *
  * Inputs:
- *	@xsave_state: state which is defined in xsave.h (e.g. XFEATURE_MASK_FP,
- *	XFEATURE_MASK_SSE, etc...)
+ *	@xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
+ *	XFEATURE_SSE, etc...)
  * Output:
  *	address of the state in the xsave area or NULL if the state
  *	is not present or is in its 'init state'.
  */
-const void *get_xsave_field_ptr(int xsave_state)
+const void *get_xsave_field_ptr(int xfeature_nr)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
@@ -898,7 +897,7 @@ const void *get_xsave_field_ptr(int xsave_state)
 	 */
 	fpu__save(fpu);
 
-	return get_xsave_addr(&fpu->state.xsave, xsave_state);
+	return get_xsave_addr(&fpu->state.xsave, xfeature_nr);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9b7c4ca8f0a73..626853b2ac344 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -455,7 +455,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 	 * which is all zeros which indicates MPX was not
 	 * responsible for the exception.
 	 */
-	bndcsr = get_xsave_field_ptr(XFEATURE_MASK_BNDCSR);
+	bndcsr = get_xsave_field_ptr(XFEATURE_BNDCSR);
 	if (!bndcsr)
 		goto exit_trap;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02c8e095a2390..6c21aa5c00e58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3662,15 +3662,15 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 	 */
 	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
 	while (valid) {
-		u64 feature = valid & -valid;
-		int index = fls64(feature) - 1;
-		void *src = get_xsave_addr(xsave, feature);
+		u64 xfeature_mask = valid & -valid;
+		int xfeature_nr = fls64(xfeature_mask) - 1;
+		void *src = get_xsave_addr(xsave, xfeature_nr);
 
 		if (src) {
 			u32 size, offset, ecx, edx;
-			cpuid_count(XSTATE_CPUID, index,
+			cpuid_count(XSTATE_CPUID, xfeature_nr,
 				    &size, &offset, &ecx, &edx);
-			if (feature == XFEATURE_MASK_PKRU)
+			if (xfeature_nr == XFEATURE_PKRU)
 				memcpy(dest + offset, &vcpu->arch.pkru,
 				       sizeof(vcpu->arch.pkru));
 			else
@@ -3678,7 +3678,7 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 
 		}
 
-		valid -= feature;
+		valid -= xfeature_mask;
 	}
 }
 
@@ -3705,22 +3705,22 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	 */
 	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
 	while (valid) {
-		u64 feature = valid & -valid;
-		int index = fls64(feature) - 1;
-		void *dest = get_xsave_addr(xsave, feature);
+		u64 xfeature_mask = valid & -valid;
+		int xfeature_nr = fls64(xfeature_mask) - 1;
+		void *dest = get_xsave_addr(xsave, xfeature_nr);
 
 		if (dest) {
 			u32 size, offset, ecx, edx;
-			cpuid_count(XSTATE_CPUID, index,
+			cpuid_count(XSTATE_CPUID, xfeature_nr,
 				    &size, &offset, &ecx, &edx);
-			if (feature == XFEATURE_MASK_PKRU)
+			if (xfeature_nr == XFEATURE_PKRU)
 				memcpy(&vcpu->arch.pkru, src + offset,
 				       sizeof(vcpu->arch.pkru));
 			else
 				memcpy(dest, src + offset, size);
 		}
 
-		valid -= feature;
+		valid -= xfeature_mask;
 	}
 }
 
@@ -8804,11 +8804,11 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		if (init_event)
 			kvm_put_guest_fpu(vcpu);
 		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu->state.xsave,
-					XFEATURE_MASK_BNDREGS);
+					XFEATURE_BNDREGS);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndreg_state));
 		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu->state.xsave,
-					XFEATURE_MASK_BNDCSR);
+					XFEATURE_BNDCSR);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
 		if (init_event)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index de1851d156997..c1ec9d81c627c 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -142,7 +142,7 @@ int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs)
 		goto err_out;
 	}
 	/* get bndregs field from current task's xsave area */
-	bndregs = get_xsave_field_ptr(XFEATURE_MASK_BNDREGS);
+	bndregs = get_xsave_field_ptr(XFEATURE_BNDREGS);
 	if (!bndregs) {
 		err = -EINVAL;
 		goto err_out;
@@ -190,7 +190,7 @@ static __user void *mpx_get_bounds_dir(void)
 	 * The bounds directory pointer is stored in a register
 	 * only accessible if we first do an xsave.
 	 */
-	bndcsr = get_xsave_field_ptr(XFEATURE_MASK_BNDCSR);
+	bndcsr = get_xsave_field_ptr(XFEATURE_BNDCSR);
 	if (!bndcsr)
 		return MPX_INVALID_BOUNDS_DIR;
 
@@ -376,7 +376,7 @@ static int do_mpx_bt_fault(void)
 	const struct mpx_bndcsr *bndcsr;
 	struct mm_struct *mm = current->mm;
 
-	bndcsr = get_xsave_field_ptr(XFEATURE_MASK_BNDCSR);
+	bndcsr = get_xsave_field_ptr(XFEATURE_BNDCSR);
 	if (!bndcsr)
 		return -EINVAL;
 	/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 12/22] x86/fpu: Only write PKRU if it is different from current
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (10 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 11/22] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() " Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-23 18:09   ` Dave Hansen
  2019-01-09 11:47 ` [PATCH 13/22] x86/pkeys: Don't check if PKRU is zero before writing it Sebastian Andrzej Siewior
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Dave Hansen says that `wrpkru' is more expensive than `rdpkru': it has a
higher cycle cost and it is also practically a (light) speculation
barrier.

As an optimisation, read the current PKRU value and only write the new
one if it differs.
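The skip-identical-write pattern can be sketched in plain user-space C. The fake_* names below are illustrative stand-ins for the rdpkru/wrpkru instructions (plus a write counter so the effect is observable); they are not kernel APIs:

```c
#include <stdint.h>

/* Stand-ins for the PKRU register and the expensive wrpkru instruction. */
static uint32_t fake_pkru;
static unsigned int wrpkru_count;

static uint32_t fake_read_pkru(void)
{
	return fake_pkru;
}

static void fake_write_pkru_insn(uint32_t pkru)
{
	fake_pkru = pkru;
	wrpkru_count++;		/* count how often the "expensive" write runs */
}

/* The optimisation from the patch: only issue the expensive write when
 * the new value actually differs from the current one. */
static void fake_write_pkru(uint32_t pkru)
{
	if (pkru == fake_read_pkru())
		return;
	fake_write_pkru_insn(pkru);
}
```

With this shape, repeated writes of an unchanged value cost only a (cheaper) read.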

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/special_insns.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 43c029cdc3fe8..c2ccf71b22dd6 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -107,7 +107,7 @@ static inline u32 __read_pkru(void)
 	return pkru;
 }
 
-static inline void __write_pkru(u32 pkru)
+static inline void __write_pkru_insn(u32 pkru)
 {
 	u32 ecx = 0, edx = 0;
 
@@ -118,6 +118,17 @@ static inline void __write_pkru(u32 pkru)
 	asm volatile(".byte 0x0f,0x01,0xef\n\t"
 		     : : "a" (pkru), "c"(ecx), "d"(edx));
 }
+
+static inline void __write_pkru(u32 pkru)
+{
+	/*
+	 * Writing PKRU is expensive. Only write the PKRU value if it is
+	 * different from the current one.
+	 */
+	if (pkru == __read_pkru())
+		return;
+	__write_pkru_insn(pkru);
+}
 #else
 static inline u32 __read_pkru(void)
 {
-- 
2.20.1



* [PATCH 13/22] x86/pkeys: Don't check if PKRU is zero before writing it
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (11 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 12/22] x86/fpu: Only write PKRU if it is different from current Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-09 11:47 ` [PATCH 14/22] x86/fpu: Eager switch PKRU state Sebastian Andrzej Siewior
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

write_pkru() already checks whether the new value is the same as the
current one and skips the write in that case. So instead of only checking
whether both the current and the new value are zero (and skipping the
write in that case), we can benefit from that more general check.

Remove the zero check of PKRU; write_pkru() provides a similar check.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/mm/pkeys.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 05bb9a44eb1c3..50f65fc1b9a3f 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -142,13 +142,6 @@ u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
 void copy_init_pkru_to_fpregs(void)
 {
 	u32 init_pkru_value_snapshot = READ_ONCE(init_pkru_value);
-	/*
-	 * Any write to PKRU takes it out of the XSAVE 'init
-	 * state' which increases context switch cost.  Avoid
-	 * writing 0 when PKRU was already 0.
-	 */
-	if (!init_pkru_value_snapshot && !read_pkru())
-		return;
 	/*
 	 * Override the PKRU state that came from 'init_fpstate'
 	 * with the baseline from the process.
-- 
2.20.1



* [PATCH 14/22] x86/fpu: Eager switch PKRU state
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (12 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 13/22] x86/pkeys: Don't check if PKRU is zero before writing it Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-09 11:47 ` [PATCH 15/22] x86/entry: Add TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

While most of a task's FPU state is only needed in user space, the
protection keys need to be in place immediately after a context switch.

The reason is that any access to userspace memory while running in
kernel mode also needs to abide by the memory permissions specified in
the protection keys.

The "eager switch" is a preparation for loading the FPU state on return
to userland. Instead of decoupling PKRU state from xstate I update PKRU
within xstate on write operations by the kernel.

The read/write_pkru() helpers are moved to another header file so they
can easily be accessed from pgtable.h and fpu/internal.h.

For user tasks we should always get the PKRU from the xsave area and it
should not change anything because the PKRU value was loaded as part of
FPU restore.
For kernel threads we will now have the default "allow everything"
written.  Before this commit a kernel thread would end up with a random
value which it inherited from the previous user task.
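As a rough sketch of the switch_fpu_finish() decision: user tasks get their PKRU value from the saved xstate, kernel threads get 0 ("allow everything"). struct fake_task and pkru_for_next_task() are invented for illustration; the real code reads the value out of the xsave area via get_xsave_addr():

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for a task: either it has a user address space
 * (mm != NULL) and a PKRU value stashed in its saved FPU state, or it
 * is a kernel thread. */
struct fake_task {
	void *mm;		/* NULL for kernel threads */
	uint32_t xstate_pkru;	/* PKRU as saved in the task's xsave area */
};

/* Mirror of the switch_fpu_finish() logic: user tasks get PKRU from
 * their xstate; kernel threads get 0 instead of inheriting whatever
 * the previous user task left in the register. */
static uint32_t pkru_for_next_task(const struct fake_task *next)
{
	uint32_t pkru_val = 0;

	if (next->mm)
		pkru_val = next->xstate_pkru;
	return pkru_val;
}
```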

Signed-off-by: Rik van Riel <riel@surriel.com>
[bigeasy: save pkru to xstate, no cache, don't use __raw_xsave_addr()]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 20 ++++++++++++++++++--
 arch/x86/include/asm/fpu/xstate.h   |  1 +
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 795a0a2df135e..7191eb9686827 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -559,8 +559,24 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU))
-		__fpregs_load_activate(new_fpu, cpu);
+	struct pkru_state *pk;
+	u32 pkru_val = 0;
+
+	if (!static_cpu_has(X86_FEATURE_FPU))
+		return;
+
+	__fpregs_load_activate(new_fpu, cpu);
+
+	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
+		return;
+
+	if (current->mm) {
+		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
+		WARN_ON_ONCE(!pk);
+		if (pk)
+			pkru_val = pk->pkru;
+	}
+	__write_pkru(pkru_val);
 }
 
 /*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index fbe41f808e5d8..4e18a837223ff 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -5,6 +5,7 @@
 #include <linux/types.h>
 #include <asm/processor.h>
 #include <linux/uaccess.h>
+#include <asm/user.h>
 
 /* Bit 63 of XCR0 is reserved for future expansion */
 #define XFEATURE_MASK_EXTEND	(~(XFEATURE_MASK_FPSSE | (1ULL << 63)))
-- 
2.20.1



* [PATCH 15/22] x86/entry: Add TIF_NEED_FPU_LOAD
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (13 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 14/22] x86/fpu: Eager switch PKRU state Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-30 11:55   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 16/22] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Add TIF_NEED_FPU_LOAD. This flag is reserved for loading the FPU
registers before returning to userland. It must not be set on systems
without an FPU.
If this flag is cleared, the CPU's FPU registers hold the current content
of current()'s FPU registers and the in-memory copy (union fpregs_state)
is not valid.
If this flag is set, then all of the CPU's FPU registers may hold random
values (except for PKRU) and the content of the FPU registers needs to be
loaded on return to userland.

The flag is introduced now so that the code handling it can be added
before the main feature.
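The TIF_*/_TIF_* convention in the hunk below — a bit number plus a derived mask — can be exercised in a small user-space sketch; the fake_* helpers are illustrative stand-ins for the kernel's set/clear/test_thread_flag(), not kernel code:

```c
/* The thread_info flags follow a simple convention: TIF_* is a bit
 * number, _TIF_* is the corresponding mask. The values mirror the
 * patch; the helpers below are invented for illustration. */
#define TIF_NEED_FPU_LOAD	14
#define _TIF_NEED_FPU_LOAD	(1 << TIF_NEED_FPU_LOAD)

static unsigned long fake_ti_flags;

static void fake_set_thread_flag(int flag)
{
	fake_ti_flags |= 1UL << flag;
}

static void fake_clear_thread_flag(int flag)
{
	fake_ti_flags &= ~(1UL << flag);
}

static int fake_test_thread_flag(int flag)
{
	return (fake_ti_flags & (1UL << flag)) != 0;
}
```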

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/thread_info.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index e0eccbcb8447d..f9453536f9bbc 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -88,6 +88,7 @@ struct thread_info {
 #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
 #define TIF_UPROBE		12	/* breakpointed or singlestepping */
 #define TIF_PATCH_PENDING	13	/* pending live patching update */
+#define TIF_NEED_FPU_LOAD	14	/* load FPU on return to userspace */
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
@@ -117,6 +118,7 @@ struct thread_info {
 #define _TIF_USER_RETURN_NOTIFY	(1 << TIF_USER_RETURN_NOTIFY)
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_PATCH_PENDING	(1 << TIF_PATCH_PENDING)
+#define _TIF_NEED_FPU_LOAD	(1 << TIF_NEED_FPU_LOAD)
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
-- 
2.20.1



* [PATCH 16/22] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (14 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 15/22] x86/entry: Add TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-30 11:43   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 17/22] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

copy_fpstate_to_sigframe() stores the registers directly to user space.
This is okay because the FPU registers are valid and saving them directly
avoids first saving them into kernel memory and then making a copy.
However, we can't keep doing this once we are going to restore the FPU
registers on the return to userland: the FPU registers could be
invalidated in the middle of the save operation, so the save would have
to be done with preemption / BH disabled.

Save the FPU registers to the task's FPU struct and copy them to the user
memory later on.

This code is extracted from an earlier version of the patch set while
there still was lazy-FPU on x86.
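A minimal sketch of the resulting two-phase save, assuming fixed-size buffers; the fake_* names are invented, the real code uses copy_fpregs_to_fpstate() and then copies from the fpstate to the user frame:

```c
#include <string.h>
#include <stdint.h>

#define FAKE_XSTATE_SIZE 64

/* Stand-ins for the live FPU registers and the task's in-kernel fpstate. */
static uint8_t fake_fpregs[FAKE_XSTATE_SIZE];
static uint8_t fake_fpstate[FAKE_XSTATE_SIZE];

/* The pattern from the patch: first snapshot the registers into kernel
 * memory (a short section that can run with preemption/BH disabled),
 * then do the potentially faulting copy to the user buffer from that
 * stable snapshot. */
static int fake_copy_fpstate_to_sigframe(uint8_t *user_buf)
{
	/* phase 1: registers -> kernel fpstate */
	memcpy(fake_fpstate, fake_fpregs, FAKE_XSTATE_SIZE);

	/* phase 2: kernel fpstate -> user frame (may fault and be retried) */
	memcpy(user_buf, fake_fpstate, FAKE_XSTATE_SIZE);
	return 0;
}
```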

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 45 -----------------------------
 arch/x86/kernel/fpu/signal.c        | 29 +++++++------------
 2 files changed, 10 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 7191eb9686827..16ea30235b025 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -126,22 +126,6 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
 		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore)	\
 		     : output : input)
 
-static inline int copy_fregs_to_user(struct fregs_state __user *fx)
-{
-	return user_insn(fnsave %[fx]; fwait,  [fx] "=m" (*fx), "m" (*fx));
-}
-
-static inline int copy_fxregs_to_user(struct fxregs_state __user *fx)
-{
-	if (IS_ENABLED(CONFIG_X86_32))
-		return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
-	else if (IS_ENABLED(CONFIG_AS_FXSAVEQ))
-		return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
-
-	/* See comment in copy_fxregs_to_kernel() below. */
-	return user_insn(rex64/fxsave (%[fx]), "=m" (*fx), [fx] "R" (fx));
-}
-
 static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
 {
 	if (IS_ENABLED(CONFIG_X86_32)) {
@@ -352,35 +336,6 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
 	XSTATE_XRESTORE(xstate, lmask, hmask);
 }
 
-/*
- * Save xstate to user space xsave area.
- *
- * We don't use modified optimization because xrstor/xrstors might track
- * a different application.
- *
- * We don't use compacted format xsave area for
- * backward compatibility for old applications which don't understand
- * compacted format of xsave area.
- */
-static inline int copy_xregs_to_user(struct xregs_state __user *buf)
-{
-	int err;
-
-	/*
-	 * Clear the xsave header first, so that reserved fields are
-	 * initialized to zero.
-	 */
-	err = __clear_user(&buf->header, sizeof(buf->header));
-	if (unlikely(err))
-		return -EFAULT;
-
-	stac();
-	XSTATE_OP(XSAVE, buf, -1, -1, err);
-	clac();
-
-	return err;
-}
-
 /*
  * Restore xstate from user space xsave area.
  */
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 555c469878874..bf4e6caad305e 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -118,22 +118,6 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
 	return err;
 }
 
-static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
-{
-	int err;
-
-	if (use_xsave())
-		err = copy_xregs_to_user(buf);
-	else if (use_fxsr())
-		err = copy_fxregs_to_user((struct fxregs_state __user *) buf);
-	else
-		err = copy_fregs_to_user((struct fregs_state __user *) buf);
-
-	if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))
-		err = -EFAULT;
-	return err;
-}
-
 /*
  * Save the fpu, extended register state to the user signal frame.
  *
@@ -157,6 +141,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct fpu *fpu = &current->thread.fpu;
+	struct xregs_state *xsave = &fpu->state.xsave;
 	struct task_struct *tsk = current;
 	int ia32_fxstate = (buf != buf_fx);
 
@@ -171,9 +156,15 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	/* Save the live register state to the user directly. */
-	if (copy_fpregs_to_sigframe(buf_fx))
-		return -1;
+	copy_fpregs_to_fpstate(fpu);
+
+	if (using_compacted_format()) {
+		copy_xstate_to_user(buf_fx, xsave, 0, size);
+	} else {
+		fpstate_sanitize_xstate(fpu);
+		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
+			return -1;
+	}
 
 	/* Save the fsave header for the 32-bit frames. */
 	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))
-- 
2.20.1



* [PATCH 17/22] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (15 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 16/22] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-30 11:56   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 18/22] x86/fpu: Update xstate's PKRU value on write_pkru() Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

The FPU registers need to be saved only if TIF_NEED_FPU_LOAD is not set.
Otherwise the saving has already been done and can be skipped.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index bf4e6caad305e..a25be217f9a2c 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -156,7 +156,16 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	copy_fpregs_to_fpstate(fpu);
+	__fpregs_changes_begin();
+	/*
+	 * If we do not need to load the FPU registers at return to userspace
+	 * then the CPU has the current state and we need to save it. Otherwise
+	 * it is already done and we can skip it.
+	 */
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
+		copy_fpregs_to_fpstate(fpu);
+
+	__fpregs_changes_end();
 
 	if (using_compacted_format()) {
 		copy_xstate_to_user(buf_fx, xsave, 0, size);
-- 
2.20.1



* [PATCH 18/22] x86/fpu: Update xstate's PKRU value on write_pkru()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (16 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 17/22] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-23 17:28   ` Dave Hansen
  2019-01-09 11:47 ` [PATCH 19/22] x86/fpu: Inline copy_user_to_fpregs_zeroing() Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

During the context switch the xstate is loaded, which also includes the
PKRU value.
If xstate is restored on return to userland, the PKRU value in xstate
must be the same as the one in the CPU.

Save the PKRU value in xstate whenever it is modified.
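The keep-both-copies-in-sync idea can be sketched as follows; fake_xrstor_pkru() models what an FPU restore on return to userland effectively does to PKRU (all names here are illustrative, not kernel APIs):

```c
#include <stdint.h>

/* Two copies of PKRU exist: the hardware register and the value saved
 * in the task's xstate. A later XRSTOR reloads the xstate copy, so
 * both must be updated together. */
static uint32_t fake_hw_pkru;
static uint32_t fake_xstate_pkru;

static void fake_write_pkru(uint32_t pkru)
{
	fake_xstate_pkru = pkru;	/* update the saved copy ... */
	fake_hw_pkru = pkru;		/* ... and the hardware register */
}

/* What an FPU restore on return to userland effectively does to PKRU:
 * it loads whatever is in the xstate copy. */
static void fake_xrstor_pkru(void)
{
	fake_hw_pkru = fake_xstate_pkru;
}
```

If fake_write_pkru() updated only the register, the restore would silently bring back the stale xstate value.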

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/pgtable.h | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 40616e8052924..5eed44798ae95 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -23,6 +23,8 @@
 
 #ifndef __ASSEMBLY__
 #include <asm/x86_init.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/api.h>
 
 extern pgd_t early_top_pgt[PTRS_PER_PGD];
 int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
@@ -133,8 +135,22 @@ static inline u32 read_pkru(void)
 
 static inline void write_pkru(u32 pkru)
 {
-	if (boot_cpu_has(X86_FEATURE_OSPKE))
-		__write_pkru(pkru);
+	struct pkru_state *pk;
+
+	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+		return;
+
+	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
+	/*
+	 * The PKRU value in xstate needs to be in sync with the value that is
+	 * written to the CPU. The FPU restore on return to userland would
+	 * otherwise load the previous value again.
+	 */
+	__fpregs_changes_begin();
+	if (pk)
+		pk->pkru = pkru;
+	__write_pkru(pkru);
+	__fpregs_changes_end();
 }
 
 static inline int pte_young(pte_t pte)
-- 
2.20.1



* [PATCH 19/22] x86/fpu: Inline copy_user_to_fpregs_zeroing()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (17 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 18/22] x86/fpu: Update xstate's PKRU value on write_pkru() Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-09 11:47 ` [PATCH 20/22] x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Start refactoring __fpu__restore_sig() by inlining
copy_user_to_fpregs_zeroing().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 42 ++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a25be217f9a2c..970091fb011e9 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -221,28 +221,6 @@ sanitize_restored_xstate(union fpregs_state *state,
 	}
 }
 
-/*
- * Restore the extended state if present. Otherwise, restore the FP/SSE state.
- */
-static inline int copy_user_to_fpregs_zeroing(void __user *buf, u64 xbv, int fx_only)
-{
-	if (use_xsave()) {
-		if ((unsigned long)buf % 64 || fx_only) {
-			u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
-			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
-			return copy_user_to_fxregs(buf);
-		} else {
-			u64 init_bv = xfeatures_mask & ~xbv;
-			if (unlikely(init_bv))
-				copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
-			return copy_user_to_xregs(buf, xbv);
-		}
-	} else if (use_fxsr()) {
-		return copy_user_to_fxregs(buf);
-	} else
-		return copy_user_to_fregs(buf);
-}
-
 static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 {
 	int ia32_fxstate = (buf != buf_fx);
@@ -321,11 +299,29 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		kfree(tmp);
 		return err;
 	} else {
+		int ret;
+
 		/*
 		 * For 64-bit frames and 32-bit fsave frames, restore the user
 		 * state to the registers directly (with exceptions handled).
 		 */
-		if (copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only)) {
+		if (use_xsave()) {
+			if ((unsigned long)buf_fx % 64 || fx_only) {
+				u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
+				copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
+				ret = copy_user_to_fxregs(buf_fx);
+			} else {
+				u64 init_bv = xfeatures_mask & ~xfeatures;
+				if (unlikely(init_bv))
+					copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
+				ret = copy_user_to_xregs(buf_fx, xfeatures);
+			}
+		} else if (use_fxsr()) {
+			ret = copy_user_to_fxregs(buf_fx);
+		} else
+			ret = copy_user_to_fregs(buf_fx);
+
+		if (ret) {
 			fpu__clear(fpu);
 			return -1;
 		}
-- 
2.20.1



* [PATCH 20/22] x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (18 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 19/22] x86/fpu: Inline copy_user_to_fpregs_zeroing() Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-30 21:29   ` Borislav Petkov
  2019-01-09 11:47 ` [PATCH 21/22] x86/fpu: Merge the two code paths in __fpu__restore_sig() Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The !32bit+fxsr case loads the new state directly from user memory. Once
we restore the FPU state on return to userland we can't do this any more:
preemption would have to be disabled in order to avoid a context switch
which would set TIF_NEED_FPU_LOAD. If that happened before the "restore"
operation then the loaded registers would become volatile.

Disabling preemption while accessing user memory requires disabling the
pagefault handler. An error during XRSTOR would then mean that either a
page fault occurred (and we have to retry with the page fault handler
enabled) or a #GP occurred because the xstate is bogus (after all, the
sig-handler can modify it).

In order to avoid that mess, copy the FPU state from userland, validate
it and then load it. The copy_users_…() helpers are basically the old
helpers except that they operate on kernel memory and the fault handler
just sets the error value which the caller handles.
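The copy-validate-load sequence can be sketched in user-space C, assuming a simplified header whose reserved area must be zero (the fake_* names are invented; the real validate_xstate_header() checks more than this):

```c
#include <stdint.h>
#include <string.h>

struct fake_xstate_header {
	uint64_t xfeatures;
	uint64_t reserved[7];	/* must be zero, like the real xsave header */
};

/* Validation step: reject a header whose reserved area is not all-zero
 * before "loading" it. Simplified compared to the kernel's checks. */
static int fake_validate_xstate_header(const struct fake_xstate_header *hdr)
{
	unsigned int i;

	for (i = 0; i < 7; i++)
		if (hdr->reserved[i])
			return -22;	/* -EINVAL */
	return 0;
}

/* Copy into a kernel-side buffer, validate, and only then "load". */
static int fake_restore_from_buf(const void *buf,
				 struct fake_xstate_header *dst)
{
	struct fake_xstate_header tmp;

	memcpy(&tmp, buf, sizeof(tmp));		/* stands in for __copy_from_user() */
	if (fake_validate_xstate_header(&tmp))
		return -1;			/* bogus state never reaches XRSTOR */
	*dst = tmp;				/* stands in for the register load */
	return 0;
}
```

Because validation happens on the kernel copy, a bogus signal frame is rejected before any restore instruction runs, so no #GP fixup is needed.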

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 32 ++++++++++-----
 arch/x86/kernel/fpu/signal.c        | 62 +++++++++++++++++++++++------
 2 files changed, 71 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 16ea30235b025..672e51bc0e9b5 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -120,6 +120,21 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
 	err;								\
 })
 
+#define kernel_insn_norestore(insn, output, input...)			\
+({									\
+	int err;							\
+	asm volatile("1:" #insn "\n\t"					\
+		     "2:\n"						\
+		     ".section .fixup,\"ax\"\n"				\
+		     "3:  movl $-1,%[err]\n"				\
+		     "    jmp  2b\n"					\
+		     ".previous\n"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : [err] "=r" (err), output				\
+		     : "0"(0), input);					\
+	err;								\
+})
+
 #define kernel_insn(insn, output, input...)				\
 	asm volatile("1:" #insn "\n\t"					\
 		     "2:\n"						\
@@ -140,15 +155,15 @@ static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
 	}
 }
 
-static inline int copy_user_to_fxregs(struct fxregs_state __user *fx)
+static inline int copy_users_to_fxregs(struct fxregs_state *fx)
 {
 	if (IS_ENABLED(CONFIG_X86_32))
-		return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+		return kernel_insn_norestore(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
 	else if (IS_ENABLED(CONFIG_AS_FXSAVEQ))
-		return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
+		return kernel_insn_norestore(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
 
 	/* See comment in copy_fxregs_to_kernel() below. */
-	return user_insn(rex64/fxrstor (%[fx]), "=m" (*fx), [fx] "R" (fx),
+	return kernel_insn_norestore(rex64/fxrstor (%[fx]), "=m" (*fx), [fx] "R" (fx),
 			  "m" (*fx));
 }
 
@@ -157,9 +172,9 @@ static inline void copy_kernel_to_fregs(struct fregs_state *fx)
 	kernel_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
 }
 
-static inline int copy_user_to_fregs(struct fregs_state __user *fx)
+static inline int copy_users_to_fregs(struct fregs_state *fx)
 {
-	return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+	return kernel_insn_norestore(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
 }
 
 static inline void copy_fxregs_to_kernel(struct fpu *fpu)
@@ -339,16 +354,13 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
 /*
  * Restore xstate from user space xsave area.
  */
-static inline int copy_user_to_xregs(struct xregs_state __user *buf, u64 mask)
+static inline int copy_users_to_xregs(struct xregs_state *xstate, u64 mask)
 {
-	struct xregs_state *xstate = ((__force struct xregs_state *)buf);
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
 
-	stac();
 	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
-	clac();
 
 	return err;
 }
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 970091fb011e9..4ed5c400cac58 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -217,7 +217,8 @@ sanitize_restored_xstate(union fpregs_state *state,
 		 */
 		xsave->i387.mxcsr &= mxcsr_feature_mask;
 
-		convert_to_fxsr(&state->fxsave, ia32_env);
+		if (ia32_env)
+			convert_to_fxsr(&state->fxsave, ia32_env);
 	}
 }
 
@@ -299,28 +300,63 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		kfree(tmp);
 		return err;
 	} else {
+		union fpregs_state *state;
+		void *tmp;
 		int ret;
 
+		tmp = kzalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+		state = PTR_ALIGN(tmp, 64);
+
 		/*
 		 * For 64-bit frames and 32-bit fsave frames, restore the user
 		 * state to the registers directly (with exceptions handled).
 		 */
-		if (use_xsave()) {
-			if ((unsigned long)buf_fx % 64 || fx_only) {
+		if ((unsigned long)buf_fx % 64)
+			fx_only = 1;
+
+		if (use_xsave() && !fx_only) {
+			u64 init_bv = xfeatures_mask & ~xfeatures;
+
+			if (using_compacted_format()) {
+				ret = copy_user_to_xstate(&state->xsave, buf_fx);
+			} else {
+				ret = __copy_from_user(&state->xsave, buf_fx, state_size);
+
+				if (!ret && state_size > offsetof(struct xregs_state, header))
+					ret = validate_xstate_header(&state->xsave.header);
+			}
+			if (ret)
+				goto err_out;
+			sanitize_restored_xstate(state, NULL, xfeatures,
+						 fx_only);
+
+			if (unlikely(init_bv))
+				copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
+			ret = copy_users_to_xregs(&state->xsave, xfeatures);
+
+		} else if (use_fxsr()) {
+			ret = __copy_from_user(&state->fxsave, buf_fx, state_size);
+			if (ret)
+				goto err_out;
+
+			if (use_xsave()) {
 				u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
 				copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
-				ret = copy_user_to_fxregs(buf_fx);
-			} else {
-				u64 init_bv = xfeatures_mask & ~xfeatures;
-				if (unlikely(init_bv))
-					copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
-				ret = copy_user_to_xregs(buf_fx, xfeatures);
 			}
-		} else if (use_fxsr()) {
-			ret = copy_user_to_fxregs(buf_fx);
-		} else
-			ret = copy_user_to_fregs(buf_fx);
+			state->fxsave.mxcsr &= mxcsr_feature_mask;
 
+			ret = copy_users_to_fxregs(&state->fxsave);
+		} else {
+			ret = __copy_from_user(&state->fsave, buf_fx, state_size);
+			if (ret)
+				goto err_out;
+			ret = copy_users_to_fregs(buf_fx);
+		}
+
+err_out:
+		kfree(tmp);
 		if (ret) {
 			fpu__clear(fpu);
 			return -1;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 21/22] x86/fpu: Merge the two code paths in __fpu__restore_sig()
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (19 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 20/22] x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-09 11:47 ` [PATCH 22/22] x86/fpu: Defer FPU state load until return to userspace Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The ia32_fxstate case (32bit with fxsr) and the other case (64bit, and
32bit without fxsr) both restore from kernel memory and sanitize the
content. The !ia32_fxstate version additionally restores missing xstates
from the "init state", which the ia32_fxstate version skips.

Merge the two code paths and keep the !ia32_fxstate version. In the
ia32_fxstate case, additionally copy only the user_i387_ia32_struct data
structure.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 162 ++++++++++++++---------------------
 1 file changed, 65 insertions(+), 97 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 4ed5c400cac58..a17e75fa1a0a6 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -224,12 +224,17 @@ sanitize_restored_xstate(union fpregs_state *state,
 
 static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 {
+	struct user_i387_ia32_struct *envp = NULL;
 	int ia32_fxstate = (buf != buf_fx);
 	struct task_struct *tsk = current;
 	struct fpu *fpu = &tsk->thread.fpu;
 	int state_size = fpu_kernel_xstate_size;
+	struct user_i387_ia32_struct env;
+	union fpregs_state *state;
 	u64 xfeatures = 0;
 	int fx_only = 0;
+	int ret = 0;
+	void *tmp;
 
 	ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
 			 IS_ENABLED(CONFIG_IA32_EMULATION));
@@ -264,106 +269,69 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		}
 	}
 
+	tmp = kzalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+	state = PTR_ALIGN(tmp, 64);
+
+	if ((unsigned long)buf_fx % 64)
+		fx_only = 1;
+
+	/*
+	 * For 32-bit frames with fxstate, copy the fxstate so it can be
+	 * reconstructed later.
+	 */
 	if (ia32_fxstate) {
-		/*
-		 * For 32-bit frames with fxstate, copy the user state to the
-		 * thread's fpu state, reconstruct fxstate from the fsave
-		 * header. Validate and sanitize the copied state.
-		 */
-		struct user_i387_ia32_struct env;
-		union fpregs_state *state;
-		int err = 0;
-		void *tmp;
-
-		tmp = kzalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
-		if (!tmp)
-			return -ENOMEM;
-		state = PTR_ALIGN(tmp, 64);
-
-		if (using_compacted_format()) {
-			err = copy_user_to_xstate(&state->xsave, buf_fx);
-		} else {
-			err = __copy_from_user(&state->xsave, buf_fx, state_size);
-
-			if (!err && state_size > offsetof(struct xregs_state, header))
-				err = validate_xstate_header(&state->xsave.header);
-		}
-
-		if (err || __copy_from_user(&env, buf, sizeof(env))) {
-			err = -1;
-		} else {
-			sanitize_restored_xstate(state, &env,
-						 xfeatures, fx_only);
-			copy_kernel_to_fpregs(state);
-		}
-
-		kfree(tmp);
-		return err;
-	} else {
-		union fpregs_state *state;
-		void *tmp;
-		int ret;
-
-		tmp = kzalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
-		if (!tmp)
-			return -ENOMEM;
-		state = PTR_ALIGN(tmp, 64);
-
-		/*
-		 * For 64-bit frames and 32-bit fsave frames, restore the user
-		 * state to the registers directly (with exceptions handled).
-		 */
-		if ((unsigned long)buf_fx % 64)
-			fx_only = 1;
-
-		if (use_xsave() && !fx_only) {
-			u64 init_bv = xfeatures_mask & ~xfeatures;
-
-			if (using_compacted_format()) {
-				ret = copy_user_to_xstate(&state->xsave, buf_fx);
-			} else {
-				ret = __copy_from_user(&state->xsave, buf_fx, state_size);
-
-				if (!ret && state_size > offsetof(struct xregs_state, header))
-					ret = validate_xstate_header(&state->xsave.header);
-			}
-			if (ret)
-				goto err_out;
-			sanitize_restored_xstate(state, NULL, xfeatures,
-						 fx_only);
-
-			if (unlikely(init_bv))
-				copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
-			ret = copy_users_to_xregs(&state->xsave, xfeatures);
-
-		} else if (use_fxsr()) {
-			ret = __copy_from_user(&state->fxsave, buf_fx, state_size);
-			if (ret)
-				goto err_out;
-
-			if (use_xsave()) {
-				u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
-				copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
-			}
-			state->fxsave.mxcsr &= mxcsr_feature_mask;
-
-			ret = copy_users_to_fxregs(&state->fxsave);
-		} else {
-			ret = __copy_from_user(&state->fsave, buf_fx, state_size);
-			if (ret)
-				goto err_out;
-			ret = copy_users_to_fregs(buf_fx);
-		}
-
-err_out:
-		kfree(tmp);
-		if (ret) {
-			fpu__clear(fpu);
-			return -1;
-		}
+		ret = __copy_from_user(&env, buf, sizeof(env));
+		if (ret)
+			goto err_out;
+		envp = &env;
 	}
 
-	return 0;
+	if (use_xsave() && !fx_only) {
+		u64 init_bv = xfeatures_mask & ~xfeatures;
+
+		if (using_compacted_format()) {
+			ret = copy_user_to_xstate(&state->xsave, buf_fx);
+		} else {
+			ret = __copy_from_user(&state->xsave, buf_fx, state_size);
+
+			if (!ret && state_size > offsetof(struct xregs_state, header))
+				ret = validate_xstate_header(&state->xsave.header);
+		}
+		if (ret)
+			goto err_out;
+
+		sanitize_restored_xstate(state, envp, xfeatures, fx_only);
+
+		if (unlikely(init_bv))
+			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
+		ret = copy_users_to_xregs(&state->xsave, xfeatures);
+
+	} else if (use_fxsr()) {
+		ret = __copy_from_user(&state->fxsave, buf_fx, state_size);
+		if (ret)
+			goto err_out;
+
+		sanitize_restored_xstate(state, envp, xfeatures, fx_only);
+		if (use_xsave()) {
+			u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
+			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
+		}
+
+		ret = copy_users_to_fxregs(&state->fxsave);
+	} else {
+		ret = __copy_from_user(&state->fsave, buf_fx, state_size);
+		if (ret)
+			goto err_out;
+		ret = copy_users_to_fregs(buf_fx);
+	}
+
+err_out:
+	kfree(tmp);
+	if (ret)
+		fpu__clear(fpu);
+	return ret;
 }
 
 static inline int xstate_sigframe_size(void)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 22/22] x86/fpu: Defer FPU state load until return to userspace
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (20 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 21/22] x86/fpu: Merge the two code paths in __fpu__restore_sig() Sebastian Andrzej Siewior
@ 2019-01-09 11:47 ` Sebastian Andrzej Siewior
  2019-01-31  9:16   ` Borislav Petkov
  2019-01-15 12:44 ` [PATCH v6] x86: load FPU registers on return to userland David Laight
  2019-01-30 11:35 ` Borislav Petkov
  23 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-09 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

Defer loading of FPU state until return to userspace. This gives
the kernel the potential to skip loading FPU state for tasks that
stay in kernel mode, or for tasks that end up with repeated
invocations of kernel_fpu_begin() & kernel_fpu_end().

A __fpregs_changes_{begin|end}() section ensures that the registers
remain unchanged. Otherwise a context switch or a softirq could save the
registers to its FPU context, and the processor's FPU registers would
become random if modified at the same time.

KVM swaps the host/guest registers on the entry/exit path. I kept the
flow as is: first it ensures that the registers are loaded, then it
saves the current (host) state before loading the guest's registers. The
swap is done at the very end with interrupts disabled, so the state
should not change anymore before the guest is entered. The read/save
version seems to be cheaper than memcpy() in a micro benchmark.

Each thread gets TIF_NEED_FPU_LOAD set as part of fork() / fpu__copy().
For kernel threads, this flag never gets cleared, which avoids saving /
restoring the FPU state for kernel threads and during in-kernel usage of
the FPU registers.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/entry/common.c             |   8 +++
 arch/x86/include/asm/fpu/api.h      |  22 +++++-
 arch/x86/include/asm/fpu/internal.h |  27 +++++---
 arch/x86/include/asm/trace/fpu.h    |   5 +-
 arch/x86/kernel/fpu/core.c          | 104 +++++++++++++++++++++-------
 arch/x86/kernel/fpu/signal.c        |  46 +++++++-----
 arch/x86/kernel/process.c           |   2 +-
 arch/x86/kernel/process_32.c        |   5 +-
 arch/x86/kernel/process_64.c        |   5 +-
 arch/x86/kvm/x86.c                  |  20 ++++--
 10 files changed, 179 insertions(+), 65 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 7bc105f47d21a..13e8e29af6ab7 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -31,6 +31,7 @@
 #include <asm/vdso.h>
 #include <linux/uaccess.h>
 #include <asm/cpufeature.h>
+#include <asm/fpu/api.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
@@ -196,6 +197,13 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
 		exit_to_usermode_loop(regs, cached_flags);
 
+	/* Reload ti->flags; we may have rescheduled above. */
+	cached_flags = READ_ONCE(ti->flags);
+
+	fpregs_assert_state_consistent();
+	if (unlikely(cached_flags & _TIF_NEED_FPU_LOAD))
+		switch_fpu_return();
+
 #ifdef CONFIG_COMPAT
 	/*
 	 * Compat syscalls set TS_COMPAT.  Make sure we clear it before
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 31b66af8eb914..c17620af5d797 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -10,7 +10,7 @@
 
 #ifndef _ASM_X86_FPU_API_H
 #define _ASM_X86_FPU_API_H
-#include <linux/preempt.h>
+#include <linux/bottom_half.h>
 
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
@@ -22,17 +22,37 @@
 extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
+extern void fpregs_mark_activate(void);
 
+/*
+ * Use __fpregs_changes_begin() while editing the CPU's FPU registers or
+ * fpu->state. A context switch will (and a softirq might) save the CPU's FPU
+ * registers to fpu->state and set TIF_NEED_FPU_LOAD, leaving the CPU's FPU
+ * registers in a random state.
+ */
 static inline void __fpregs_changes_begin(void)
 {
 	preempt_disable();
+	local_bh_disable();
 }
 
 static inline void __fpregs_changes_end(void)
 {
+	local_bh_enable();
 	preempt_enable();
 }
 
+#ifdef CONFIG_X86_DEBUG_FPU
+extern void fpregs_assert_state_consistent(void);
+#else
+static inline void fpregs_assert_state_consistent(void) { }
+#endif
+
+/*
+ * Load the task FPU state before returning to userspace.
+ */
+extern void switch_fpu_return(void);
+
 /*
  * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
  *
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 672e51bc0e9b5..61627f8cb3ff4 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -29,7 +29,7 @@ extern void fpu__prepare_write(struct fpu *fpu);
 extern void fpu__save(struct fpu *fpu);
 extern int  fpu__restore_sig(void __user *buf, int ia32_frame);
 extern void fpu__drop(struct fpu *fpu);
-extern int  fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
+extern int  fpu__copy(struct task_struct *dst, struct task_struct *src);
 extern void fpu__clear(struct fpu *fpu);
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 extern int  dump_fpu(struct pt_regs *ptregs, struct user_i387_struct *fpstate);
@@ -482,13 +482,20 @@ static inline void fpregs_activate(struct fpu *fpu)
 	trace_x86_fpu_regs_activated(fpu);
 }
 
-static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
+static inline void __fpregs_load_activate(void)
 {
+	struct fpu *fpu = &current->thread.fpu;
+	int cpu = smp_processor_id();
+
+	if (WARN_ON_ONCE(current->mm == NULL))
+		return;
+
 	if (!fpregs_state_valid(fpu, cpu)) {
-		if (current->mm)
-			copy_kernel_to_fpregs(&fpu->state);
+		copy_kernel_to_fpregs(&fpu->state);
 		fpregs_activate(fpu);
+		fpu->last_cpu = cpu;
 	}
+	clear_thread_flag(TIF_NEED_FPU_LOAD);
 }
 
 /*
@@ -499,8 +506,8 @@ static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
  *  - switch_fpu_prepare() saves the old state.
  *    This is done within the context of the old process.
  *
- *  - switch_fpu_finish() restores the new state as
- *    necessary.
+ *  - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
+ *    will get loaded on return to userspace, or when the kernel needs it.
  */
 static inline void
 switch_fpu_prepare(struct fpu *old_fpu, int cpu)
@@ -521,10 +528,10 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 
 /*
- * Set up the userspace FPU context for the new task, if the task
- * has used the FPU.
+ * Load PKRU from the FPU context if available. Delay the loading of the
+ * complete FPU state until the return to userland.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
+static inline void switch_fpu_finish(struct fpu *new_fpu)
 {
 	struct pkru_state *pk;
 	u32 pkru_val = 0;
@@ -532,7 +539,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 	if (!static_cpu_has(X86_FEATURE_FPU))
 		return;
 
-	__fpregs_load_activate(new_fpu, cpu);
+	set_thread_flag(TIF_NEED_FPU_LOAD);
 
 	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
 		return;
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index bd65f6ba950f8..91a1422091ceb 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -13,19 +13,22 @@ DECLARE_EVENT_CLASS(x86_fpu,
 
 	TP_STRUCT__entry(
 		__field(struct fpu *, fpu)
+		__field(bool, load_fpu)
 		__field(u64, xfeatures)
 		__field(u64, xcomp_bv)
 		),
 
 	TP_fast_assign(
 		__entry->fpu		= fpu;
+		__entry->load_fpu	= test_thread_flag(TIF_NEED_FPU_LOAD);
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
 			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
 			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
 		}
 	),
-	TP_printk("x86/fpu: %p xfeatures: %llx xcomp_bv: %llx",
+	TP_printk("x86/fpu: %p load: %d xfeatures: %llx xcomp_bv: %llx",
 			__entry->fpu,
+			__entry->load_fpu,
 			__entry->xfeatures,
 			__entry->xcomp_bv
 	)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 78d8037635932..f52e687dff9ee 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -102,23 +102,20 @@ static void __kernel_fpu_begin(void)
 	kernel_fpu_disable();
 
 	if (current->mm) {
-		/*
-		 * Ignore return value -- we don't care if reg state
-		 * is clobbered.
-		 */
-		copy_fpregs_to_fpstate(fpu);
-	} else {
-		__cpu_invalidate_fpregs_state();
+		if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+			set_thread_flag(TIF_NEED_FPU_LOAD);
+			/*
+			 * Ignore return value -- we don't care if reg state
+			 * is clobbered.
+			 */
+			copy_fpregs_to_fpstate(fpu);
+		}
 	}
+	__cpu_invalidate_fpregs_state();
 }
 
 static void __kernel_fpu_end(void)
 {
-	struct fpu *fpu = &current->thread.fpu;
-
-	if (current->mm)
-		copy_kernel_to_fpregs(&fpu->state);
-
 	kernel_fpu_enable();
 }
 
@@ -145,14 +142,16 @@ void fpu__save(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	preempt_disable();
+	__fpregs_changes_begin();
 	trace_x86_fpu_before_save(fpu);
 
-	if (!copy_fpregs_to_fpstate(fpu)) {
-		copy_kernel_to_fpregs(&fpu->state);
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		if (!copy_fpregs_to_fpstate(fpu)) {
+			copy_kernel_to_fpregs(&fpu->state);
+		}
 	}
 	trace_x86_fpu_after_save(fpu);
-	preempt_enable();
+	__fpregs_changes_end();
 }
 EXPORT_SYMBOL_GPL(fpu__save);
 
@@ -185,8 +184,11 @@ void fpstate_init(union fpregs_state *state)
 }
 EXPORT_SYMBOL_GPL(fpstate_init);
 
-int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
+int fpu__copy(struct task_struct *dst, struct task_struct *src)
 {
+	struct fpu *dst_fpu = &dst->thread.fpu;
+	struct fpu *src_fpu = &src->thread.fpu;
+
 	dst_fpu->last_cpu = -1;
 
 	if (!static_cpu_has(X86_FEATURE_FPU))
@@ -201,16 +203,23 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
 
 	/*
-	 * Save current FPU registers directly into the child
-	 * FPU context, without any memory-to-memory copying.
+	 * If the FPU registers are not current, just memcpy() the state.
+	 * Otherwise save current FPU registers directly into the child's FPU
+	 * context, without any memory-to-memory copying.
 	 *
 	 * ( The function 'fails' in the FNSAVE case, which destroys
-	 *   register contents so we have to copy them back. )
+	 *   register contents so we have to load them back. )
 	 */
-	if (!copy_fpregs_to_fpstate(dst_fpu)) {
-		memcpy(&src_fpu->state, &dst_fpu->state, fpu_kernel_xstate_size);
-		copy_kernel_to_fpregs(&src_fpu->state);
-	}
+	__fpregs_changes_begin();
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_size);
+
+	else if (!copy_fpregs_to_fpstate(dst_fpu))
+		copy_kernel_to_fpregs(&dst_fpu->state);
+
+	__fpregs_changes_end();
+
+	set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD);
 
 	trace_x86_fpu_copy_src(src_fpu);
 	trace_x86_fpu_copy_dst(dst_fpu);
@@ -226,10 +235,9 @@ static void fpu__initialize(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
+	set_thread_flag(TIF_NEED_FPU_LOAD);
 	fpstate_init(&fpu->state);
 	trace_x86_fpu_init_state(fpu);
-
-	trace_x86_fpu_activate_state(fpu);
 }
 
 /*
@@ -308,6 +316,8 @@ void fpu__drop(struct fpu *fpu)
  */
 static inline void copy_init_fpstate_to_fpregs(void)
 {
+	__fpregs_changes_begin();
+
 	if (use_xsave())
 		copy_kernel_to_xregs(&init_fpstate.xsave, -1);
 	else if (static_cpu_has(X86_FEATURE_FXSR))
@@ -317,6 +327,9 @@ static inline void copy_init_fpstate_to_fpregs(void)
 
 	if (boot_cpu_has(X86_FEATURE_OSPKE))
 		copy_init_pkru_to_fpregs();
+
+	fpregs_mark_activate();
+	__fpregs_changes_end();
 }
 
 /*
@@ -339,6 +352,45 @@ void fpu__clear(struct fpu *fpu)
 		copy_init_fpstate_to_fpregs();
 }
 
+/*
+ * Load FPU context before returning to userspace.
+ */
+void switch_fpu_return(void)
+{
+	if (!static_cpu_has(X86_FEATURE_FPU))
+		return;
+
+	__fpregs_load_activate();
+}
+EXPORT_SYMBOL_GPL(switch_fpu_return);
+
+#ifdef CONFIG_X86_DEBUG_FPU
+/*
+ * If the FPU state tracking says that the context loaded on this CPU is
+ * not current's, then TIF_NEED_FPU_LOAD must be set so the context is
+ * loaded on return to userland.
+ */
+void fpregs_assert_state_consistent(void)
+{
+	struct fpu *fpu = &current->thread.fpu;
+
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		return;
+	WARN_ON_FPU(!fpregs_state_valid(fpu, smp_processor_id()));
+}
+EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent);
+#endif
+
+void fpregs_mark_activate(void)
+{
+	struct fpu *fpu = &current->thread.fpu;
+
+	fpregs_activate(fpu);
+	fpu->last_cpu = smp_processor_id();
+	clear_thread_flag(TIF_NEED_FPU_LOAD);
+}
+EXPORT_SYMBOL_GPL(fpregs_mark_activate);
+
 /*
  * x87 math exception handling:
  */
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a17e75fa1a0a6..61a03a34a7304 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -230,11 +230,9 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	struct fpu *fpu = &tsk->thread.fpu;
 	int state_size = fpu_kernel_xstate_size;
 	struct user_i387_ia32_struct env;
-	union fpregs_state *state;
 	u64 xfeatures = 0;
 	int fx_only = 0;
 	int ret = 0;
-	void *tmp;
 
 	ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
 			 IS_ENABLED(CONFIG_IA32_EMULATION));
@@ -269,14 +267,18 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		}
 	}
 
-	tmp = kzalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
-	if (!tmp)
-		return -ENOMEM;
-	state = PTR_ALIGN(tmp, 64);
+	/*
+	 * The current state of the FPU registers does not matter. By setting
+	 * TIF_NEED_FPU_LOAD unconditionally it is ensured that our xstate is
+	 * not modified on context switch and that the xstate is considered
+	 * to be loaded again on return to userland (invalidating last_cpu
+	 * avoids the fpregs_state_valid() optimisation).
+	 */
+	set_thread_flag(TIF_NEED_FPU_LOAD);
+	__fpu_invalidate_fpregs_state(fpu);
 
 	if ((unsigned long)buf_fx % 64)
 		fx_only = 1;
-
 	/*
 	 * For 32-bit frames with fxstate, copy the fxstate so it can be
 	 * reconstructed later.
@@ -292,43 +294,51 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		u64 init_bv = xfeatures_mask & ~xfeatures;
 
 		if (using_compacted_format()) {
-			ret = copy_user_to_xstate(&state->xsave, buf_fx);
+			ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
 		} else {
-			ret = __copy_from_user(&state->xsave, buf_fx, state_size);
+			ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
 
 			if (!ret && state_size > offsetof(struct xregs_state, header))
-				ret = validate_xstate_header(&state->xsave.header);
+				ret = validate_xstate_header(&fpu->state.xsave.header);
 		}
 		if (ret)
 			goto err_out;
 
-		sanitize_restored_xstate(state, envp, xfeatures, fx_only);
+		sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only);
 
+		__fpregs_changes_begin();
 		if (unlikely(init_bv))
 			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
-		ret = copy_users_to_xregs(&state->xsave, xfeatures);
+		ret = copy_users_to_xregs(&fpu->state.xsave, xfeatures);
 
 	} else if (use_fxsr()) {
-		ret = __copy_from_user(&state->fxsave, buf_fx, state_size);
-		if (ret)
+		ret = __copy_from_user(&fpu->state.fxsave, buf_fx, state_size);
+		if (ret) {
+			ret = -EFAULT;
 			goto err_out;
+		}
 
-		sanitize_restored_xstate(state, envp, xfeatures, fx_only);
+		sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only);
+
+		__fpregs_changes_begin();
 		if (use_xsave()) {
 			u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
 			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
 		}
 
-		ret = copy_users_to_fxregs(&state->fxsave);
+		ret = copy_users_to_fxregs(&fpu->state.fxsave);
 	} else {
-		ret = __copy_from_user(&state->fsave, buf_fx, state_size);
+		ret = __copy_from_user(&fpu->state.fsave, buf_fx, state_size);
 		if (ret)
 			goto err_out;
+		__fpregs_changes_begin();
 		ret = copy_users_to_fregs(buf_fx);
 	}
+	if (!ret)
+		fpregs_mark_activate();
+	__fpregs_changes_end();
 
 err_out:
-	kfree(tmp);
 	if (ret)
 		fpu__clear(fpu);
 	return ret;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 90ae0ca510837..2e38a14fdbd3f 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -101,7 +101,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	dst->thread.vm86 = NULL;
 #endif
 
-	return fpu__copy(&dst->thread.fpu, &src->thread.fpu);
+	return fpu__copy(dst, src);
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 77d9eb43ccac8..1bc47f3a48854 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -234,7 +234,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 
-	switch_fpu_prepare(prev_fpu, cpu);
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
+		switch_fpu_prepare(prev_fpu, cpu);
 
 	/*
 	 * Save away %gs. No need to save %fs, as it was saved on the
@@ -290,7 +291,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	this_cpu_write(current_task, next_p);
 
-	switch_fpu_finish(next_fpu, cpu);
+	switch_fpu_finish(next_fpu);
 
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ffea7c557963a..37b2ecef041e6 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -520,7 +520,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
 		     this_cpu_read(irq_count) != -1);
 
-	switch_fpu_prepare(prev_fpu, cpu);
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
+		switch_fpu_prepare(prev_fpu, cpu);
 
 	/* We must save %fs and %gs before load_TLS() because
 	 * %fs and %gs may be cleared by load_TLS().
@@ -572,7 +573,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish(next_fpu, cpu);
+	switch_fpu_finish(next_fpu);
 
 	/* Reload sp0. */
 	update_task_stack(next_p);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6c21aa5c00e58..e52e8ac73e755 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7833,6 +7833,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		wait_lapic_expire(vcpu);
 	guest_enter_irqoff();
 
+	fpregs_assert_state_consistent();
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		switch_fpu_return();
+
 	if (unlikely(vcpu->arch.switch_db_regs)) {
 		set_debugreg(0, 7);
 		set_debugreg(vcpu->arch.eff_db[0], 0);
@@ -8092,22 +8096,30 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 /* Swap (qemu) user FPU context for the guest FPU context. */
 static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	preempt_disable();
+	__fpregs_changes_begin();
+
 	copy_fpregs_to_fpstate(&current->thread.fpu);
 	/* PKRU is separately restored in kvm_x86_ops->run.  */
 	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
 				~XFEATURE_MASK_PKRU);
-	preempt_enable();
+
+	fpregs_mark_activate();
+	__fpregs_changes_end();
+
 	trace_kvm_fpu(1);
 }
 
 /* When vcpu_run ends, restore user space FPU context. */
 static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	preempt_disable();
+	__fpregs_changes_begin();
+
 	copy_fpregs_to_fpstate(vcpu->arch.guest_fpu);
 	copy_kernel_to_fpregs(&current->thread.fpu.state);
-	preempt_enable();
+
+	fpregs_mark_activate();
+	__fpregs_changes_end();
+
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/22] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  2019-01-09 11:47 ` [PATCH 01/22] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
@ 2019-01-14 16:24   ` Borislav Petkov
  2019-02-05 10:08     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-14 16:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:23PM +0100, Sebastian Andrzej Siewior wrote:
> This is a preparation for the removal of the ->initialized member in the
> fpu struct.
> __fpu__restore_sig() is deactivating the FPU via fpu__drop() and then
> setting manually ->initialized followed by fpu__restore(). The result is
> that it is possible to manipulate fpu->state and the state of registers
> won't be saved/restored on a context switch which would overwrite
> fpu->state.
> 
> Don't access the fpu->state while the content is read from user space
> and examined / sanitized. Use a temporary kmalloc() buffer for the
> preparation of the FPU registers and once the state is considered okay,
> load it. Should something go wrong, return with an error and without
> altering the original FPU registers.
> 
> The removal of "fpu__initialize()" is a nop because fpu->initialized is
> already set for the user task.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/signal.h |  2 +-
>  arch/x86/kernel/fpu/regset.c      |  5 ++--
>  arch/x86/kernel/fpu/signal.c      | 41 ++++++++++++-------------------
>  3 files changed, 19 insertions(+), 29 deletions(-)

...

> @@ -315,40 +313,33 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
>  		 * header. Validate and sanitize the copied state.
>  		 */
>  		struct user_i387_ia32_struct env;
> +		union fpregs_state *state;
>  		int err = 0;
> +		void *tmp;
>  
> -		/*
> -		 * Drop the current fpu which clears fpu->initialized. This ensures
> -		 * that any context-switch during the copy of the new state,
> -		 * avoids the intermediate state from getting restored/saved.
> -		 * Thus avoiding the new restored state from getting corrupted.
> -		 * We will be ready to restore/save the state only after
> -		 * fpu->initialized is again set.
> -		 */
> -		fpu__drop(fpu);
> +		tmp = kzalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;
> +		state = PTR_ALIGN(tmp, 64);
>  
>  		if (using_compacted_format()) {
> -			err = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
> +			err = copy_user_to_xstate(&state->xsave, buf_fx);
>  		} else {
> -			err = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
> +			err = __copy_from_user(&state->xsave, buf_fx, state_size);
>  
>  			if (!err && state_size > offsetof(struct xregs_state, header))
> -				err = validate_xstate_header(&fpu->state.xsave.header);
> +				err = validate_xstate_header(&state->xsave.header);
>  		}
>  
>  		if (err || __copy_from_user(&env, buf, sizeof(env))) {
> -			fpstate_init(&fpu->state);
> -			trace_x86_fpu_init_state(fpu);
>  			err = -1;
>  		} else {
> -			sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
> +			sanitize_restored_xstate(state, &env,
> +						 xfeatures, fx_only);

Just let that one stick out - there are other lines in this file already
longer than 80.

Notwithstanding, I don't see anything wrong with this patch.
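The over-allocate-then-align pattern the patch uses (kzalloc() of the state size plus 64, then PTR_ALIGN(tmp, 64)) can be sketched in plain userspace C — a minimal illustration with invented names, nothing kernel-specific:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define XSAVE_ALIGN 64	/* XSAVE areas must be 64-byte aligned */

/*
 * Round p up to the next multiple of a (a must be a power of two),
 * like the kernel's PTR_ALIGN() macro does.
 */
static void *ptr_align(void *p, uintptr_t a)
{
	return (void *)(((uintptr_t)p + a - 1) & ~(a - 1));
}

/*
 * Over-allocate by the alignment so an aligned block of state_size
 * bytes is guaranteed to fit; *raw keeps the original pointer so the
 * caller can free() it later.
 */
void *alloc_aligned_state(size_t state_size, void **raw)
{
	void *tmp = calloc(1, state_size + XSAVE_ALIGN);

	if (!tmp)
		return NULL;
	*raw = tmp;
	return ptr_align(tmp, XSAVE_ALIGN);
}
```

The aligned pointer always lands within the first 64 bytes of the raw allocation, so the full state still fits behind it.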

Acked-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 03/22] x86/fpu: Remove preempt_disable() in fpu__clear()
  2019-01-09 11:47 ` [PATCH 03/22] x86/fpu: Remove preempt_disable() in fpu__clear() Sebastian Andrzej Siewior
@ 2019-01-14 18:55   ` Borislav Petkov
  0 siblings, 0 replies; 91+ messages in thread
From: Borislav Petkov @ 2019-01-14 18:55 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:25PM +0100, Sebastian Andrzej Siewior wrote:
> The preempt_disable() section was introduced in commit

<---- newline here.

>   a10b6a16cdad8 ("x86/fpu: Make the fpu state change in fpu__clear() scheduler-atomic")

<---- newline here.

> and it was said to be temporary.
> 
> fpu__initialize() initializes the FPU struct to its "init" value and
> then sets ->initialized to 1. The last part is the important one.
> The content of the `state' does not matter because it gets set via
> copy_init_fpstate_to_fpregs().
> A preemption here has little meaning because the register will always be

s/register/registers/

> set to the same content after copy_init_fpstate_to_fpregs(). A softirq
> with a kernel_fpu_begin() could also force to save FPU's register after

ditto.

> fpu__initialize() without changing the outcome here.
> 
> Remove the preempt_disable() section in fpu__clear(), preemption here
> does not hurt.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/kernel/fpu/core.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 1d3ae7988f7f2..1940319268aef 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -366,11 +366,9 @@ void fpu__clear(struct fpu *fpu)
>  	 * Make sure fpstate is cleared and initialized.
>  	 */
>  	if (static_cpu_has(X86_FEATURE_FPU)) {
> -		preempt_disable();
>  		fpu__initialize(fpu);
>  		user_fpu_begin();
>  		copy_init_fpstate_to_fpregs();
> -		preempt_enable();
>  	}
>  }
>  
> -- 

With the above addressed:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/22] x86/fpu: Always init the `state' in fpu__clear()
  2019-01-09 11:47 ` [PATCH 04/22] x86/fpu: Always init the `state' " Sebastian Andrzej Siewior
@ 2019-01-14 19:32   ` Borislav Petkov
  0 siblings, 0 replies; 91+ messages in thread
From: Borislav Petkov @ 2019-01-14 19:32 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:26PM +0100, Sebastian Andrzej Siewior wrote:
> fpu__clear() only initializes the `state' if the FPU is present. This
> initialisation is also required for the FPU-less system and takes place

"in math_emulate()."

> math_emulate(). Since fpu__initialize() only performs the initialization
> if ->initialized is zero it does not matter that it is invoked each time
> an opcode is emulated. It makes the removal of ->initialized easier if
> the struct is also initialized in FPU-less case at the same time.

				in the

> 
> Move fpu__initialize() before the FPU check so it is also performed in
> FPU-less case.

in the...

> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/internal.h | 1 -
>  arch/x86/kernel/fpu/core.c          | 5 ++---
>  arch/x86/math-emu/fpu_entry.c       | 3 ---
>  3 files changed, 2 insertions(+), 7 deletions(-)

With that fixed:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (21 preceding siblings ...)
  2019-01-09 11:47 ` [PATCH 22/22] x86/fpu: Defer FPU state load until return to userspace Sebastian Andrzej Siewior
@ 2019-01-15 12:44 ` David Laight
  2019-01-15 13:15   ` 'Sebastian Andrzej Siewior'
  2019-01-15 19:46   ` Dave Hansen
  2019-01-30 11:35 ` Borislav Petkov
  23 siblings, 2 replies; 91+ messages in thread
From: David Laight @ 2019-01-15 12:44 UTC (permalink / raw)
  To: 'Sebastian Andrzej Siewior', linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

From:  Sebastian Andrzej Siewior
> Sent: 09 January 2019 11:47
>
> This is a refurbished series originally started by by Rik van Riel. The
> goal is load the FPU registers on return to userland and not on every
> context switch. By this optimisation we can:
> - avoid loading the registers if the task stays in kernel and does
>   not return to userland
> - make kernel_fpu_begin() cheaper: it only saves the registers on the
>   first invocation. The second invocation does not need save them again.
> 
> To access the FPU registers in kernel we need:
> - disable preemption to avoid that the scheduler switches tasks. By
>   doing so it would set TIF_NEED_FPU_LOAD and the FPU registers would be
>   not valid.
> - disable BH because the softirq might use kernel_fpu_begin() and then
>   set TIF_NEED_FPU_LOAD instead loading the FPU registers on completion.

Once this is done it might be worth while adding a parameter to
kernel_fpu_begin() to request the registers only when they don't
need saving.
This would benefit code paths where the gains are reasonable but not massive.

The return value from kernel_fpu_begin() ought to indicate which
registers are available - none, SSE, SSE2, AVX, AVX512 etc.
So code can use an appropriate implementation.
(I've not looked to see if this is already the case!)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-15 12:44 ` [PATCH v6] x86: load FPU registers on return to userland David Laight
@ 2019-01-15 13:15   ` 'Sebastian Andrzej Siewior'
  2019-01-15 14:33     ` David Laight
  2019-01-15 19:46   ` Dave Hansen
  1 sibling, 1 reply; 91+ messages in thread
From: 'Sebastian Andrzej Siewior' @ 2019-01-15 13:15 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-15 12:44:53 [+0000], David Laight wrote:
> Once this is done it might be worth while adding a parameter to
> kernel_fpu_begin() to request the registers only when they don't
> need saving.
> This would benefit code paths where the gains are reasonable but not massive.

So if saving + FPU code is a small win, why not do this always?

> The return value from kernel_fpu_begin() ought to indicate which
> registers are available - none, SSE, SSE2, AVX, AVX512 etc.
> So code can use an appropriate implementation.
> (I've not looked to see if this is already the case!)

Either everything is saved or nothing. So if SSE registers are saved
then AVX512 are, too.
I would like to see some benefit of this first before adding/adjusting
the API in a way which makes it possible to do something wrong. That
said, one thing I would like to do is to get rid of irq_fpu_usable() so
code can use FPU registers and need not implement a fallback.

> 	David

Sebastian

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-15 13:15   ` 'Sebastian Andrzej Siewior'
@ 2019-01-15 14:33     ` David Laight
  0 siblings, 0 replies; 91+ messages in thread
From: David Laight @ 2019-01-15 14:33 UTC (permalink / raw)
  To: 'Sebastian Andrzej Siewior'
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

From: 'Sebastian Andrzej Siewior'
> Sent: 15 January 2019 13:15
> On 2019-01-15 12:44:53 [+0000], David Laight wrote:
> > Once this is done it might be worth while adding a parameter to
> > kernel_fpu_begin() to request the registers only when they don't
> > need saving.
> > This would benefit code paths where the gains are reasonable but not massive.
> 
> So if saving + FPU code is a small win why not do this always?

I was thinking of the case when the cost of the fpu save is greater
than the saving.
This might be true for (say) a crc on a short buffer.

> > The return value from kernel_fpu_begin() ought to indicate which
> > registers are available - none, SSE, SSE2, AVX, AVX512 etc.
> > So code can use an appropriate implementation.
> > (I've not looked to see if this is already the case!)
> 
> Either everything is saved or nothing. So if SSE registers are saved
> then AVX512 are, too.

(I know that - I've written fpu save code for AVX).
I was thinking that the return value would depend on what the cpu supports.

In fact, given some talk about big-little cpus it might be worth being
able to ask for a specific register set.
Potentially that could cause a processor switch.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-15 12:44 ` [PATCH v6] x86: load FPU registers on return to userland David Laight
  2019-01-15 13:15   ` 'Sebastian Andrzej Siewior'
@ 2019-01-15 19:46   ` Dave Hansen
  2019-01-15 20:26     ` Andy Lutomirski
  1 sibling, 1 reply; 91+ messages in thread
From: Dave Hansen @ 2019-01-15 19:46 UTC (permalink / raw)
  To: David Laight, 'Sebastian Andrzej Siewior', linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 1/15/19 4:44 AM, David Laight wrote:
> Once this is done it might be worth while adding a parameter to
> kernel_fpu_begin() to request the registers only when they don't
> need saving.
> This would benefit code paths where the gains are reasonable but not massive.
> 
> The return value from kernel_fpu_begin() ought to indicate which
> registers are available - none, SSE, SSE2, AVX, AVX512 etc.
> So code can use an appropriate implementation.
> (I've not looked to see if this is already the case!)

Yeah, it would be sane to have both a mask passed, and returned, say:

	got = kernel_fpu_begin(XFEATURE_MASK_AVX512, NO_XSAVE_ALLOWED);

	if (got == XFEATURE_MASK_AVX512)
		do_avx_512_goo();
	else
		do_integer_goo();

	kernel_fpu_end(got)

Then, kernel_fpu_begin() can actually work without even *doing* an XSAVE:

	/* Do we have to save state for anything in 'ask_mask'? */
	if (all_states_are_init(ask_mask))
		return ask_mask;

Then kernel_fpu_end() just needs to zero out (re-init) the state, which
it can do with XRSTORS and a careful combination of XSTATE_BV and the
requested feature bitmap (RFBM).

This is all just optimization, though.
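As a plain-C sketch of the init-state check above (all names here are hypothetical — today's kernel_fpu_begin() takes no mask and returns void):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical xfeature bits, in the style of XFEATURE_MASK_*. */
#define XFEAT_SSE     (1u << 1)
#define XFEAT_AVX2    (1u << 2)
#define XFEAT_AVX512  (1u << 5)

/* Bit set == that state component is in its init (all-zero) state. */
static uint32_t cpu_init_state = XFEAT_AVX512;	/* e.g. no live AVX-512 data */

static bool all_states_are_init(uint32_t mask)
{
	return (mask & ~cpu_init_state) == 0;
}

/*
 * Grant the requested features; skip the (expensive) XSAVE entirely
 * when everything asked for is already in init state.  did_save
 * reports whether a real implementation would have had to save.
 */
uint32_t kernel_fpu_begin_mask(uint32_t ask_mask, bool *did_save)
{
	if (all_states_are_init(ask_mask)) {
		*did_save = false;	/* nothing to save */
		return ask_mask;
	}
	*did_save = true;		/* would XSAVE here */
	return ask_mask;
}
```

With only AVX-512 in init state, asking for AVX-512 alone skips the save, while asking for SSE as well forces one.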

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-15 19:46   ` Dave Hansen
@ 2019-01-15 20:26     ` Andy Lutomirski
  2019-01-15 20:54       ` Dave Hansen
  2019-01-16 10:18       ` David Laight
  0 siblings, 2 replies; 91+ messages in thread
From: Andy Lutomirski @ 2019-01-15 20:26 UTC (permalink / raw)
  To: Dave Hansen, Jason A. Donenfeld
  Cc: David Laight, Sebastian Andrzej Siewior, linux-kernel, x86,
	Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Rik van Riel, Dave Hansen

On Tue, Jan 15, 2019 at 11:46 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/15/19 4:44 AM, David Laight wrote:
> > Once this is done it might be worth while adding a parameter to
> > kernel_fpu_begin() to request the registers only when they don't
> > need saving.
> > This would benefit code paths where the gains are reasonable but not massive.
> >
> > The return value from kernel_fpu_begin() ought to indicate which
> > registers are available - none, SSE, SSE2, AVX, AVX512 etc.
> > So code can use an appropriate implementation.
> > (I've not looked to see if this is already the case!)
>
> Yeah, it would be sane to have both a mask passed, and returned, say:
>
>         got = kernel_fpu_begin(XFEATURE_MASK_AVX512, NO_XSAVE_ALLOWED);
>
>         if (got == XFEATURE_MASK_AVX512)
>                 do_avx_512_goo();
>         else
>                 do_integer_goo();
>
>         kernel_fpu_end(got)
>
> Then, kernel_fpu_begin() can actually work without even *doing* an XSAVE:
>
>         /* Do we have to save state for anything in 'ask_mask'? */
>         if (all_states_are_init(ask_mask))
>                 return ask_mask;
>
> Then kernel_fpu_end() just needs to zero out (re-init) the state, which
> it can do with XRSTORS and a careful combination of XSTATE_BV and the
> requested feature bitmap (RFBM).
>
> This is all just optimization, though.

I don't think we'd ever want kernel_fpu_end() to restore anything,
right?  I'm a bit confused as to when this optimization would actually
be useful.

Jason Donenfeld has a rather nice API for this in his Zinc series.
Jason, how is that coming?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-15 20:26     ` Andy Lutomirski
@ 2019-01-15 20:54       ` Dave Hansen
  2019-01-15 21:11         ` Andy Lutomirski
  2019-01-16 10:18       ` David Laight
  1 sibling, 1 reply; 91+ messages in thread
From: Dave Hansen @ 2019-01-15 20:54 UTC (permalink / raw)
  To: Andy Lutomirski, Jason A. Donenfeld
  Cc: David Laight, Sebastian Andrzej Siewior, linux-kernel, x86,
	Paolo Bonzini, Radim Krčmář,
	kvm, Rik van Riel, Dave Hansen

On 1/15/19 12:26 PM, Andy Lutomirski wrote:
> I don't think we'd ever want kernel_fpu_end() to restore anything,
> right?  I'm a bit confused as to when this optimization would actually
> be useful.

Using AVX-512 as an example...

Let's say there was AVX-512 state, and a kernel_fpu_begin() user only
used AVX2.  We could totally avoid doing *any* AVX-512 state save/restore.

The init optimization doesn't help us if there _is_ AVX-512 state, and
the modified optimization only helps if we recently did a XRSTOR at
context switch and have not written to AVX-512 state since XRSTOR.

This probably only matters for AVX-512-using apps that have run on a
kernel with lots of kernel_fpu_begin()s that don't use AVX-512.  So, not
a big deal right now.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-15 20:54       ` Dave Hansen
@ 2019-01-15 21:11         ` Andy Lutomirski
  2019-01-16 10:31           ` David Laight
  0 siblings, 1 reply; 91+ messages in thread
From: Andy Lutomirski @ 2019-01-15 21:11 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Jason A. Donenfeld, David Laight,
	Sebastian Andrzej Siewior, linux-kernel, x86, Paolo Bonzini,
	Radim Krčmář,
	kvm, Rik van Riel, Dave Hansen

On Tue, Jan 15, 2019 at 12:54 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/15/19 12:26 PM, Andy Lutomirski wrote:
> > I don't think we'd ever want kernel_fpu_end() to restore anything,
> > right?  I'm a bit confused as to when this optimization would actually
> > be useful.
>
> Using AVX-512 as an example...
>
> Let's say there was AVX-512 state, and a kernel_fpu_begin() user only
> used AVX2.  We could totally avoid doing *any* AVX-512 state save/restore.
>
> The init optimization doesn't help us if there _is_ AVX-512 state, and
> the modified optimization only helps if we recently did a XRSTOR at
> context switch and have not written to AVX-512 state since XRSTOR.
>
> This probably only matters for AVX-512-using apps that have run on a
> kernel with lots of kernel_fpu_begin()s that don't use AVX-512.  So, not
> a big deal right now.

On top of this series, this gets rather awkward, I think -- now we
need to be able to keep track of a state in which some of the user
registers live in the CPU and some live in memory, and we need to be
able to do the partial restore if we go back to user mode like this.
We also need to be able to do a partial save if we end up context
switching.  This seems rather complicated.

Last time I measured it (on Skylake IIRC), a full save was only about
twice as slow as a save that saved nothing at all, so I think we'd
need numbers.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-15 20:26     ` Andy Lutomirski
  2019-01-15 20:54       ` Dave Hansen
@ 2019-01-16 10:18       ` David Laight
  1 sibling, 0 replies; 91+ messages in thread
From: David Laight @ 2019-01-16 10:18 UTC (permalink / raw)
  To: 'Andy Lutomirski', Dave Hansen, Jason A. Donenfeld
  Cc: Sebastian Andrzej Siewior, linux-kernel, x86, Paolo Bonzini,
	Radim Krčmář,
	kvm, Rik van Riel, Dave Hansen

From: Andy Lutomirski
> Sent: 15 January 2019 20:27
> On Tue, Jan 15, 2019 at 11:46 AM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 1/15/19 4:44 AM, David Laight wrote:
> > > Once this is done it might be worth while adding a parameter to
> > > kernel_fpu_begin() to request the registers only when they don't
> > > need saving.
> > > This would benefit code paths where the gains are reasonable but not massive.
> > >
> > > The return value from kernel_fpu_begin() ought to indicate which
> > > registers are available - none, SSE, SSE2, AVX, AVX512 etc.
> > > So code can use an appropriate implementation.
> > > (I've not looked to see if this is already the case!)
> >
> > Yeah, it would be sane to have both a mask passed, and returned, say:
> >
> >         got = kernel_fpu_begin(XFEATURE_MASK_AVX512, NO_XSAVE_ALLOWED);

You could merge the two arguments.

> >         if (got == XFEATURE_MASK_AVX512)

	got & XFEATURE_MASK_AVX512

> >                 do_avx_512_goo();
> >         else
> >                 do_integer_goo();
> >
> >         kernel_fpu_end(got)
> >
> > Then, kernel_fpu_begin() can actually work without even *doing* an XSAVE:
> >
> >         /* Do we have to save state for anything in 'ask_mask'? */
> >         if (all_states_are_init(ask_mask))
> >                 return ask_mask;

It almost certainly needs to disable pre-emption - there isn't another
fpu save area.

> >
> > Then kernel_fpu_end() just needs to zero out (re-init) the state, which
> > it can do with XRSTORS and a careful combination of XSTATE_BV and the
> > requested feature bitmap (RFBM).
> >
> > This is all just optimization, though.
> 
> I don't think we'd ever want kernel_fpu_end() to restore anything,
> right?  I'm a bit confused as to when this optimization would actually
> be useful.

The user register restore is deferred to 'return to user'.

What you need to ensure is that the kernel values never leak out
to userspace.

ISTR there is a flag that says that all the AVX registers are zero
(XSAVE writes one, I can't remember if it is readable).
If the registers are all zero I think the kernel code can use them
even if they are 'live' - provided they get zeroed again before
return to user.
I also can't remember whether the fpu flags register is set by AVX
instructions - I know that is a pita to recover.

Also are all system calls entered via asm stubs that look like real functions?
(I think I've seen inline system calls in a linux binary - but that was a
long time ago.)
If that assumption can be made then because the AVX registers are all
caller-saved they are not 'live' on system call entry so can be zeroed
and need not be saved on a context switch.
(They still need saving if the kernel is entered by trap or interrupt.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-15 21:11         ` Andy Lutomirski
@ 2019-01-16 10:31           ` David Laight
  0 siblings, 0 replies; 91+ messages in thread
From: David Laight @ 2019-01-16 10:31 UTC (permalink / raw)
  To: 'Andy Lutomirski', Dave Hansen
  Cc: Jason A. Donenfeld, Sebastian Andrzej Siewior, linux-kernel, x86,
	Paolo Bonzini, Radim Krčmář,
	kvm, Rik van Riel, Dave Hansen

From: Andy Lutomirski [mailto:luto@kernel.org]
> On Tue, Jan 15, 2019 at 12:54 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 1/15/19 12:26 PM, Andy Lutomirski wrote:
> > > I don't think we'd ever want kernel_fpu_end() to restore anything,
> > > right?  I'm a bit confused as to when this optimization would actually
> > > be useful.
> >
> > Using AVX-512 as an example...
> >
> > Let's say there was AVX-512 state, and a kernel_fpu_begin() user only
> > used AVX2.  We could totally avoid doing *any* AVX-512 state save/restore.
> >
> > The init optimization doesn't help us if there _is_ AVX-512 state, and
> > the modified optimization only helps if we recently did a XRSTOR at
> > context switch and have not written to AVX-512 state since XRSTOR.
> >
> > This probably only matters for AVX-512-using apps that have run on a
> > kernel with lots of kernel_fpu_begin()s that don't use AVX-512.  So, not
> > a big deal right now.
> 
> On top of this series, this gets rather awkward, I think -- now we
> need to be able to keep track of a state in which some of the user
> registers live in the CPU and some live in memory, and we need to be
> able to do the partial restore if we go back to user mode like this.
> We also need to be able to do a partial save if we end up context
> switching.  This seems rather complicated.

If kernel_fpu_begin() requests registers that are 'live' for userspace,
or if the user registers have been saved then you (more or less) have
to disable pre-emption.
OTOH if the kernel wants the AVX2 registers and the user ones are all 0
then the kernel can just use the registers provided kernel_fpu_end()
zeroes them. In this case you can allow pre-emption because it will save
everything and it will all get restored correctly (will need to be
restored when the process is scheduled, not return to user).
The register save area might need zapping (if used) because it might
be readable from user space (by a debugger).

The other case is kernel code that guarantees to save and restore
any registers is uses (it might only want 2 registers for a CRC).
Such code can nest with other kernel users (eg in an ISR).
I'm not sure whether it needs a small 'save area' for fpu flags?
It might be worth adding such a structure to the interface - even
if it is currently a dummy structure.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-09 11:47 ` [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
@ 2019-01-16 19:36   ` Borislav Petkov
  2019-01-16 22:40     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-16 19:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:27PM +0100, Sebastian Andrzej Siewior wrote:
> Since ->initialized is always true for user tasks and kernel threads
> don't get this far,

Yeah, this commit message is too laconic. Don't get this far "where"?

> we always save the registers directly to userspace.

We don't save registers to userspace - please write stuff out.

So from looking at what you're removing I can barely rhyme up what
you're doing but this needs a lot more explanation why it is OK to
remove the else case. Hell, why was the thing added in the first place
if ->initialized is always true?

And why is it ok to save registers directly to the user task's buffers?

So please be more verbose even at the risk of explaning the obvious.
This is the FPU code, remember? :)

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-16 19:36   ` Borislav Petkov
@ 2019-01-16 22:40     ` Sebastian Andrzej Siewior
  2019-01-17 12:22       ` Borislav Petkov
  0 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-16 22:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-16 20:36:03 [+0100], Borislav Petkov wrote:
> On Wed, Jan 09, 2019 at 12:47:27PM +0100, Sebastian Andrzej Siewior wrote:
> > Since ->initialized is always true for user tasks and kernel threads
> > don't get this far,
> 
> Yeah, this commit message is too laconic. Don't get this far "where"?

To reach copy_fpregs_to_sigframe(). A kernel thread never invokes
copy_fpregs_to_sigframe(). Which means only user threads invoke
copy_fpregs_to_sigframe() and they have ->initialized set to one.

> > we always save the registers directly to userspace.
> 
> We don't save registers to userspace - please write stuff out.

Actually we do. copy_fpregs_to_sigframe() saves current FPU registers to
task's stack frame which is userspace memory.

> So from looking at what you're removing I can barely rhyme up what
> you're doing but this needs a lot more explanation why it is OK to
> remove the else case. Hell, why was the thing added in the first place
> if ->initialized is always true?

I think *parts* of the ->initialized field were wrongly converted while
lazy-FPU was removed *or* it was forgotten to be removed afterwards. Or
I don't know but it looks like a leftover.

At the beginning (while it was added) it was part of the lazy-FPU code.
So if task's FPU registers are not active then they are saved in task's
FPU struct. So in this case (the else path) it does
	__copy_to_user(buf_fx, xsave, fpu_user_xstate_size)

In the other case (task's FPU struct is not up-to date, the current
FPU register content is in CPU's registers) it does
	copy_fpregs_to_sigframe(buf_fx)
	
which copies CPU's registers. In both cases it copies them (the FPU
registers) to the task's stack frame (the same location). Easy so far?
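Reduced to a toy model (plain userspace C with invented names — not the kernel code), the two branches are:

```c
#include <assert.h>
#include <string.h>

#define STATE_SIZE 64

struct toy_task {
	int  fpregs_active;		/* registers live in the CPU? */
	char saved_state[STATE_SIZE];	/* task's in-memory FPU struct */
};

/* Stand-in for the CPU's FPU register file. */
static char cpu_regs[STATE_SIZE];

/*
 * Model of copy_fpstate_to_sigframe(): if the task's registers are
 * live, save them from the CPU straight to the (user) stack frame;
 * otherwise copy the already-saved in-memory state there.  Either way
 * the same destination gets a complete copy of the FPU state.
 */
void save_to_sigframe(struct toy_task *t, char *frame)
{
	if (t->fpregs_active)
		memcpy(frame, cpu_regs, STATE_SIZE);	/* ~ XSAVE to user */
	else
		memcpy(frame, t->saved_state, STATE_SIZE);
}
```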

How does using_compacted_format() fit in here?
The point is that the "compacted" format is never exposed to
userland so it requires normal xsave. So far so good, right? But how
does it work in the '->initialized = 0' case, right? It was
introduced in commit
  99aa22d0d8f7 ("x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use")

and it probably does not explain why this works, right?
So *either* fpregs_active() was always true if the task used FPU *once*
or if it used FPU *recently* and task's FPU registers are active (I don't
remember anymore). Anyway:
a) we don't get here because caller checks for fpregs_active() before
   invoking copy_fpstate_to_sigframe()
b) a preemption check resets fpregs_active() after the first check
   then we do "xsave", xsaves traps because FPU is off/disabled, trap
   loads task's FPU registers, gets back to "xsave", "xsave" saves
   CPU's register to the stack frame.

The b part does not work like that since commit
  bef8b6da9522 ("x86/fpu: Handle #NM without FPU emulation as an error")

but then at that point it was "okay" because fpregs_active() would
return true if the task used FPU registers at least once. If it did not
use them then it would not invoke that function (the caller checks for
fpregs_active()).

> And why is it ok to save registers directly to the user task's buffers?
So I can't tell you why it is okay but I can explain why it is done
(well, that part I puzzled together).
The task is running and using FPU registers. Then an evil mind sends a
signal. The task goes into kernel, prepares itself and is about to
handle the signal in userland. It saves its FPU registers on the stack
frame. It zeros its current FPU registers (ready for a fresh start),
loads the address of the signal handler and returns to user land
handling the signal.

Now. The signal handler may use FPU registers and the signal handler
maybe be preempted so you need to save the FPU registers of the signal
handler and you can't mix them up with the FPU register's of the task
(before it started handling the signal).

So in order to avoid a second FPU struct it saves them on user's stack
frame. I *think* this (avoiding a second FPU struct) is the primary
motivation. A bonus point might be that the signal handler has a third
argument, the `context'. That means you can access the task's FPU
registers from the signal handler. Not sure *why* you would want to do
so but you can.
I can't imagine a use case and I was looking for a user and expecting it
to be glibc but I didn't find anything in glibc that would explain
it. Intel even defines a few bytes as "user reserved" which are used by
"struct _fpx_sw_bytes" to add a marker in the signal and recognise it on
restore.
The only user that seems to make use of that is `criu' (or it looked
like it does use it). I would prefer to add a second struct-FPU and use
that for the signal handler. This would avoid the whole dance here. And
`criu' could maybe become a proper interface. I don't think as of now
that it will break something in userland if the signal handler suddenly
does not have a pointer to the FPU struct.

> So please be more verbose even at the risk of explaning the obvious.
> This is the FPU code, remember? :)

Okay. So I was verbose *now*. Depending on what you say (or don't) I
will try to recycle this into the commit message in a few days.

> Thx.

Sebastian

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-16 22:40     ` Sebastian Andrzej Siewior
@ 2019-01-17 12:22       ` Borislav Petkov
  2019-01-18 21:14         ` Sebastian Andrzej Siewior
  2019-02-05 14:37         ` [PATCH 05/22 v2] " Sebastian Andrzej Siewior
  0 siblings, 2 replies; 91+ messages in thread
From: Borislav Petkov @ 2019-01-17 12:22 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 16, 2019 at 11:40:37PM +0100, Sebastian Andrzej Siewior wrote:
> Actually we do. copy_fpregs_to_sigframe() saves current FPU registers to
> task's stack frame which is userspace memory.

I know we do - I was only pointing at the not optimal choice of words -
"save registers to userspace" and to rather say "save hardware registers
to user buffers" or so.

> I think *parts* of the ->initialized usage were wrongly converted while
> lazy-FPU was removed *or* removing them was forgotten afterwards. Or
> I don't know, but it looks like a leftover.
> 
> At the beginning (while it was added) it was part of the lazy-FPU code.
> So if the task's FPU registers are not active then they are saved in the
> task's FPU struct. So in this case (the else path) it does
> 	__copy_to_user(buf_fx, xsave, fpu_user_xstate_size)

So far, so good. Comment above says so too:

 * If the fpu, extended register state is live, save the state directly
 * to the user frame pointed by the aligned pointer 'buf_fx'. Otherwise,
 * copy the thread's fpu state to the user frame starting at 'buf_fx'.

> In the other case (task's FPU struct is not up to date, the current
> FPU register content is in the CPU's registers) it does
> 	copy_fpregs_to_sigframe(buf_fx)

ACK.

> How does using_compacted_format() fit in here?
> The point is that the "compacted" format is never exposed to
> userland so it requires a normal xsave. So far so good, right? But how
> does it work in the '->initialized = 0' case? It was
> introduced in commit
>   99aa22d0d8f7 ("x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use")
> 
> and it probably does not explain why this works, right?

I think this was imposed by our inability to handle XSAVES compacted
format. And that should be fixed now, AFAICR.

> So *either* fpregs_active() was always true if the task used FPU *once*
> or if it used FPU *recently* and the task's FPU registers are active (I
> don't remember anymore). Anyway:
> a) we don't get here because caller checks for fpregs_active() before
>    invoking copy_fpstate_to_sigframe()

Ok.

> b) a preemption check resets fpregs_active() after the first check
>    then we do "xsave", xsaves traps because FPU is off/disabled, trap
>    loads task's FPU registers, gets back to "xsave", "xsave" saves
>    CPU's register to the stack frame.
> 
> The b part does not work like that since commit
>   bef8b6da9522 ("x86/fpu: Handle #NM without FPU emulation as an error")
> 
> but then at that point it was "okay" because fpregs_active() would
> return true if the task used FPU registers at least once. If it did not
> use them then it would not invoke that function (the caller checks for
> fpregs_active()).

Right, AFAICT, we were moving to eager FPU at that time and this commit
is part of the lazy FPU removal stuff.

> So I can't tell you why it is okay but I can explain why it is done
> (well, that part I puzzled together).

I hate the fact that we have to puzzle stuff together for the FPU code.
;-\

> The task is running and using FPU registers. Then an evil mind sends a
> signal. The task goes into kernel, prepares itself and is about to
> handle the signal in userland. It saves its FPU registers on the stack
> frame. It zeros its current FPU registers (ready for a fresh start),
> loads the address of the signal handler and returns to user land
> handling the signal.
> 
> Now. The signal handler may use FPU registers and the signal handler
> may be preempted so you need to save the FPU registers of the signal
> handler and you can't mix them up with the FPU registers of the task
> (before it started handling the signal).
> 
> So in order to avoid a second FPU struct it saves them on user's stack
> frame. I *think* this (avoiding a second FPU struct) is the primary
> motivation.

Yah, makes sense. Sounds like something we'd do :-)

> A bonus point might be that the signal handler has a third
> argument, the `context'. That means you can access the task's FPU
> registers from the signal handler. Not sure *why* you would want to do
> so but you can.

For <raisins>.

> I can't imagine a use case and I was looking for a user and expecting it
> to be glibc but I didn't find anything in the glibc that would explain
> it. Intel even defines a few bytes as "user reserved" which are used by
> "struct _fpx_sw_bytes" to add a marker in the signal and recognise it on
> restore.
> The only user that seems to make use of that is `criu' (or it looked
> like it does use it). I would prefer to add a second struct-FPU and use
> that for the signal handler. This would avoid the whole dance here.

That would be interesting from the perspective of making the code
straight-forward and not having to document all that dance somewhere.

> And `criu' could maybe become a proper interface. I don't think as of
> now that it will break something in userland if the signal handler
> suddenly does not have a pointer to the FPU struct.

Well, but allocating a special FPU pointer for the signal handler
context sounds simple and clean, no? Or are we afraid that that would
slowdown signal handling, the whole allocation and assignment and
stuff...?

> Okay. So I was verbose *now*. Depending on what you say (or don't) I
> will try to recycle this into commit message in a few days.

Yeah, much much better. Thanks a lot for the effort!

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-17 12:22       ` Borislav Petkov
@ 2019-01-18 21:14         ` Sebastian Andrzej Siewior
  2019-01-18 21:17           ` Dave Hansen
  2019-02-05 14:37         ` [PATCH 05/22 v2] " Sebastian Andrzej Siewior
  1 sibling, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-18 21:14 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Oleg Nesterov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

tl;dr
The kernel saves task's FPU registers on user's signal stack before
entering the signal handler. Can we avoid that and have in-kernel memory
for that? Does someone rely on the FPU registers from the task in the
signal handler?

On 2019-01-17 13:22:53 [+0100], Borislav Petkov wrote:
> > The task is running and using FPU registers. Then an evil mind sends a
> > signal. The task goes into kernel, prepares itself and is about to
> > handle the signal in userland. It saves its FPU registers on the stack
> > frame. It zeros its current FPU registers (ready for a fresh start),
> > loads the address of the signal handler and returns to user land
> > handling the signal.
> > 
> > Now. The signal handler may use FPU registers and the signal handler
> > may be preempted so you need to save the FPU registers of the signal
> > handler and you can't mix them up with the FPU registers of the task
> > (before it started handling the signal).
> > 
> > So in order to avoid a second FPU struct it saves them on user's stack
> > frame. I *think* this (avoiding a second FPU struct) is the primary
> > motivation.
> 
> Yah, makes sense. Sounds like something we'd do :-)
> 
> > A bonus point might be that the signal handler has a third
> > argument, the `context'. That means you can access the task's FPU
> > registers from the signal handler. Not sure *why* you would want to do
> > so but you can.
> 
> For <raisins>.
> 
> > I can't imagine a use case and I was looking for a user and expecting it
> > to be glibc but I didn't find anything in the glibc that would explain
> > it. Intel even defines a few bytes as "user reserved" which are used by
> > "struct _fpx_sw_bytes" to add a marker in the signal and recognise it on
> > restore.
> > The only user that seems to make use of that is `criu' (or it looked
> > like it does use it). I would prefer to add a second struct-FPU and use
> > that for the signal handler. This would avoid the whole dance here.
> 
> That would be interesting from the perspective of making the code
> straight-forward and not having to document all that dance somewhere.
> 
> > And `criu' could maybe become a proper interface. I don't think as of
> > now that it will break something in userland if the signal handler
> > suddenly does not have a pointer to the FPU struct.
> 
> Well, but allocating a special FPU pointer for the signal handler
> context sounds simple and clean, no? Or are we afraid that that would
> slowdown signal handling, the whole allocation and assignment and
> stuff...?

So I *think* we could allocate a second struct fpu for the signal
handler at task creation time and use it.
It should not slow down signal handling. So instead of saving it to the
user's stack we would save it to "our" memory. On the restore path we
could trust our buffer and simply load it again.

Sebastian

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-18 21:14         ` Sebastian Andrzej Siewior
@ 2019-01-18 21:17           ` Dave Hansen
  2019-01-18 21:37             ` Sebastian Andrzej Siewior
  2019-01-21 11:21             ` Oleg Nesterov
  0 siblings, 2 replies; 91+ messages in thread
From: Dave Hansen @ 2019-01-18 21:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Borislav Petkov, Ingo Molnar, Oleg Nesterov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote:
> The kernel saves task's FPU registers on user's signal stack before
> entering the signal handler. Can we avoid that and have in-kernel memory
> for that? Does someone rely on the FPU registers from the task in the
> signal handler?

This is part of our ABI for *sure*.  Inspecting that state is how
userspace makes sense of MPX or protection keys faults.  We even use
this in selftests/.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-18 21:17           ` Dave Hansen
@ 2019-01-18 21:37             ` Sebastian Andrzej Siewior
  2019-01-18 21:43               ` Dave Hansen
  2019-01-21 11:21             ` Oleg Nesterov
  1 sibling, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-18 21:37 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Borislav Petkov, Ingo Molnar, Oleg Nesterov, linux-kernel, x86,
	Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-18 13:17:28 [-0800], Dave Hansen wrote:
> On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote:
> > The kernel saves task's FPU registers on user's signal stack before
> > entering the signal handler. Can we avoid that and have in-kernel memory
> > for that? Does someone rely on the FPU registers from the task in the
> > signal handler?
> 
> This is part of our ABI for *sure*.  

I missed that part. I will try to look it up and see if it says
anything about it being optional.
But ABI means we must keep doing it even if there are no users?

> Inspecting that state is how
> userspace makes sense of MPX or protection keys faults.  We even use
> this in selftests/.

Okay. MPX does not check for FP_XSTATE_MAGIC[12] and simply assumes it
is there. That is why I didn't find it.
So we would break MPX. But then MPX is on its way out, so…

Sebastian

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-18 21:37             ` Sebastian Andrzej Siewior
@ 2019-01-18 21:43               ` Dave Hansen
  0 siblings, 0 replies; 91+ messages in thread
From: Dave Hansen @ 2019-01-18 21:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Borislav Petkov, Ingo Molnar, Oleg Nesterov, linux-kernel, x86,
	Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 1/18/19 1:37 PM, Sebastian Andrzej Siewior wrote:
> On 2019-01-18 13:17:28 [-0800], Dave Hansen wrote:
>> On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote:
>>> The kernel saves task's FPU registers on user's signal stack before
>>> entering the signal handler. Can we avoid that and have in-kernel memory
>>> for that? Does someone rely on the FPU registers from the task in the
>>> signal handler?
>>
>> This is part of our ABI for *sure*.  
> 
> I missed that part. I will try to look it up and look see if says
> something about optional part.
> But ABI means we must keep doing it even if there are no users?

I'd bet a large sum of money there are users.

Google for 'uc_mcontext fpregs'.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-18 21:17           ` Dave Hansen
  2019-01-18 21:37             ` Sebastian Andrzej Siewior
@ 2019-01-21 11:21             ` Oleg Nesterov
  2019-01-22 13:40               ` Borislav Petkov
  2019-02-05 11:17               ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 91+ messages in thread
From: Oleg Nesterov @ 2019-01-21 11:21 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sebastian Andrzej Siewior, Borislav Petkov, Ingo Molnar,
	linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 01/18, Dave Hansen wrote:
>
> On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote:
> > The kernel saves task's FPU registers on user's signal stack before
> > entering the signal handler. Can we avoid that and have in-kernel memory
> > for that? Does someone rely on the FPU registers from the task in the
> > signal handler?
>
> This is part of our ABI for *sure*.  Inspecting that state is how
> userspace makes sense of MPX or protection keys faults.  We even use
> this in selftests/.

Yes.

And in any case I do not understand the idea of using a second in-kernel
struct fpu. A signal handler can be interrupted by another signal, and that
would require saving/restoring the FPU state again.

Oleg.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-21 11:21             ` Oleg Nesterov
@ 2019-01-22 13:40               ` Borislav Petkov
  2019-01-22 16:15                 ` Oleg Nesterov
  2019-02-05 11:17               ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-22 13:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Sebastian Andrzej Siewior, Ingo Molnar,
	linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Mon, Jan 21, 2019 at 12:21:17PM +0100, Oleg Nesterov wrote:
> And in any case I do not understand the idea of using a second
> in-kernel struct fpu. A signal handler can be interrupted by another
> signal, and that would require saving/restoring the FPU state again.

Well, we were just speculating whether doing that would simplify the
code around get_sigframe() et al. But if that is an ABI, then we can't
really touch it.

Btw, where is that whole ABI deal about saving FPU regs on the user
signal stack documented?

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-22 13:40               ` Borislav Petkov
@ 2019-01-22 16:15                 ` Oleg Nesterov
  2019-01-22 17:00                   ` Borislav Petkov
  0 siblings, 1 reply; 91+ messages in thread
From: Oleg Nesterov @ 2019-01-22 16:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Sebastian Andrzej Siewior, Ingo Molnar,
	linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 01/22, Borislav Petkov wrote:
>
> On Mon, Jan 21, 2019 at 12:21:17PM +0100, Oleg Nesterov wrote:
> > And in any case I do not understand the idea of using a second
> > in-kernel struct fpu. A signal handler can be interrupted by another
> > signal, and that would require saving/restoring the FPU state again.
>
> Well, we were just speculating whether doing that would simplify the
> code around get_sigframe() et al. But if that is an ABI, then we can't
> really touch it.
>
> Btw, where is that whole ABI deal about saving FPU regs on the user
> signal stack documented?

I don't know... tried to google, found nothing.

the comment in /usr/include/sys/ucontext.h mentions SysV/i386 ABI + historical
reasons, this didn't help.

Oleg.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-22 16:15                 ` Oleg Nesterov
@ 2019-01-22 17:00                   ` Borislav Petkov
  2019-02-05 11:34                     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-22 17:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Sebastian Andrzej Siewior, Ingo Molnar,
	linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen, Michael Matz

On Tue, Jan 22, 2019 at 05:15:51PM +0100, Oleg Nesterov wrote:
> I don't know... tried to google, found nothing.
> 
> the comment in /usr/include/sys/ucontext.h mentions SysV/i386 ABI + historical
> reasons, this didn't help.

So I'm being told by one of the psABI folks that this is not really
written down somewhere explicitly but it is the result of the POSIX
and psABI treatment of signal handlers, what they're supposed to do,
caller- and callee-saved registers, etc.

And FPU registers are volatile, i.e., caller-saved. Which means, the
handler itself doesn't save them but the caller, which doesn't really
expect any signals - they are async. So the kernel must do that and
slap the FPU regs onto the user stack...

Hohumm. Makes sense.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 18/22] x86/fpu: Update xstate's PKRU value on write_pkru()
  2019-01-09 11:47 ` [PATCH 18/22] x86/fpu: Update xstate's PKRU value on write_pkru() Sebastian Andrzej Siewior
@ 2019-01-23 17:28   ` Dave Hansen
  0 siblings, 0 replies; 91+ messages in thread
From: Dave Hansen @ 2019-01-23 17:28 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 1/9/19 3:47 AM, Sebastian Andrzej Siewior wrote:
> +	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> +	/*
> +	 * The PKRU value in xstate needs to be in sync with the value that is
> +	 * written to the CPU. The FPU restore on return to userland would
> +	 * otherwise load the previous value again.
> +	 */
> +	__fpregs_changes_begin();
> +	if (pk)
> +		pk->pkru = pkru;
> +	__write_pkru(pkru);
> +	__fpregs_changes_end();
>  }

I'm not sure this is right.

The "if (pk)" check is basically to see if there was a location for
XFEATURE_PKRU in the XSAVE buffer.  The only way this can be false in
the current code is if the "init optimization" is in play and
XFEATURE_PKRU was in the init state (all 0's for PKRU).

If it were in the init state, we need to take it *out* of the init
state, both in the buffer and in the registers.  The __write_pkru()
obviously does this for the registers, but "pk->pkru = pkru" is not
enough for the XSAVE buffer.  xsave->header.xfeatures (aka. XSTATE_BV)
also needs to have XFEATURE_PKRU set.  Otherwise, two calls to this
function in succession would break.

	pk = get_xsave_addr(...xsave, XFEATURE_PKRU);
	pk->pkru = pkru;
	__write_pkru(pkru);

	pk = get_xsave_addr(...xsave, XFEATURE_PKRU);
	/* 'pk' is still NULL, won't see 'pkru' set */

I *think* just setting

	xsave->header.xfeatures |= XFEATURE_MASK_PKRU;

will fix this.  I thought we did that whole dance somewhere else in the
code, but I don't see it right now.  Might have been in some other patch.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 12/22] x86/fpu: Only write PKRU if it is different from current
  2019-01-09 11:47 ` [PATCH 12/22] x86/fpu: Only write PKRU if it is different from current Sebastian Andrzej Siewior
@ 2019-01-23 18:09   ` Dave Hansen
  2019-02-07 11:27     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Dave Hansen @ 2019-01-23 18:09 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 1/9/19 3:47 AM, Sebastian Andrzej Siewior wrote:
> +static inline void __write_pkru(u32 pkru)
> +{
> +	/*
> +	 * Writing PKRU is expensive. Only write the PKRU value if it is
> +	 * different from the current one.
> +	 */

I'd say:

	WRPKRU is relatively expensive compared to RDPKRU.
	Avoid WRPKRU when it would not change the value.

In the grand scheme of things, WRPKRU is cheap.  It's certainly not an
"expensive instruction" compared to things like WBINVD.

> +	if (pkru == __read_pkru())
> +		return;
> +	__write_pkru_insn(pkru);
> +}

Is there a case where we need __write_pkru_insn() directly?  Why not
just put the inline assembly in here?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/22] x86/fpu: Don't save fxregs for ia32 frames in copy_fpstate_to_sigframe()
  2019-01-09 11:47 ` [PATCH 06/22] x86/fpu: Don't save fxregs for ia32 frames " Sebastian Andrzej Siewior
@ 2019-01-24 11:17   ` Borislav Petkov
  2019-02-05 16:43     ` [PATCH 06/22 v2] x86/fpu: Don't save fxregs for ia32 frames in Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-24 11:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:28PM +0100, Sebastian Andrzej Siewior wrote:
> Why does copy_fpstate_to_sigframe() do copy_fxregs_to_kernel() in the
> ia32_fxstate case? I don't know. It just does.
> Maybe it was required at some point, maybe it was added by accident and
> nobody noticed it because it makes no difference.

So

  72a671ced66d ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")

talks about some exclusion of legacy fsave state.

> In copy_fpstate_to_sigframe() we stash the FPU state into the task's
> stackframe. Then the CPU's FPU registers (and its fpu->state) are
> cleared (handle_signal() does fpu__clear()).

So that fpu__clear() name is not optimal. It should be
fpu__reinitialize() or so. The comment above it says so too:

/*
 * Clear the FPU state back to init state.

> So it makes *no* difference
> what happens to fpu->state after copy_fpregs_to_sigframe().
> 
> Remove copy_fxregs_to_kernel() since it does not matter what it does and
> save a few cycles.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/kernel/fpu/signal.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index c136a4327659d..047390a45e016 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -174,9 +174,6 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
>  	/* Save the live register state to the user directly. */
>  	if (copy_fpregs_to_sigframe(buf_fx))
>  		return -1;
> -	/* Update the thread's fxstate to save the fsave header. */
> -	if (ia32_fxstate)
> -		copy_fxregs_to_kernel(fpu);

Need to get rid of that local "fpu" var too:

arch/x86/kernel/fpu/signal.c: In function ‘copy_fpstate_to_sigframe’:
arch/x86/kernel/fpu/signal.c:159:14: warning: unused variable ‘fpu’ [-Wunused-variable]
  struct fpu *fpu = &current->thread.fpu;
              ^~~

>  	/* Save the fsave header for the 32-bit frames. */
>  	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))
> -- 
> 2.20.1
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 07/22] x86/fpu: Remove fpu->initialized
  2019-01-09 11:47 ` [PATCH 07/22] x86/fpu: Remove fpu->initialized Sebastian Andrzej Siewior
@ 2019-01-24 13:34   ` Borislav Petkov
  2019-02-05 18:03     ` Sebastian Andrzej Siewior
  2019-02-05 18:06     ` [PATCH 07/22 v2] " Sebastian Andrzej Siewior
  0 siblings, 2 replies; 91+ messages in thread
From: Borislav Petkov @ 2019-01-24 13:34 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:29PM +0100, Sebastian Andrzej Siewior wrote:
> The `initialized' member of the fpu struct is always set to one user
								 ^
								 for

> tasks and zero for kernel tasks. This avoids saving/restoring the FPU
> registers for kernel threads.
> 
> I expect that fpu->initialized is always 1 and the 0 case has been

"I expect" reads funny. Make that impartial and passive and "expecting"
is the wrong formulation. It needs to be a clear statement talking about
the FPU context's state and why that state is always initialized.

> removed or is not important. For instance fpu__drop() sets the value to
> zero and its caller call either fpu__initialize() (which would

"its callers invoke" or so

> set it back to one) or don't return to userland.
> 
> The context switch code (switch_fpu_prepare() + switch_fpu_finish())
> can't unconditionally save/restore registers for kernel threads. I have
> no idea what will happen if we restore a zero FPU context for the kernel
> thread (since it never was initialized).

Yeah, avoid those "author is wondering" statements.

> Also it has been agreed that
> for PKRU we don't want a random state (inherited from the previous task)
> but a deterministic one.

Rewrite that to state what the PKRU state is going to be.

> For kernel_fpu_begin() (+end) the situation is similar: The kernel test
> bot told me, that EFI with runtime services uses this before
> alternatives_patched is true. Which means that this function is used too
> early and it wasn't the case before.
> 
> For those two cases current->mm is used to determine between user &
> kernel thread.

Now that we start looking at ->mm, I think we should document this
somewhere prominently, maybe

  arch/x86/include/asm/fpu/internal.h

or so along with all the logic this patchset changes wrt FPU handling.
Then we wouldn't have to wonder in the future why stuff is being done
the way it is done.

Like the FPU saving on the user stack frame or why this was needed:

-	/* Update the thread's fxstate to save the fsave header. */
-	if (ia32_fxstate)
-		copy_fxregs_to_kernel(fpu);

Some sort of a high-level invariants written down would save us a lot of
head scratching in the future.

> For kernel_fpu_begin() we skip save/restore of the FPU
> registers.
> During the context switch into a kernel thread we don't do anything.
> There is no reason to save the FPU state of a kernel thread.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/ia32/ia32_signal.c         | 17 +++-----
>  arch/x86/include/asm/fpu/internal.h | 15 +++----
>  arch/x86/include/asm/fpu/types.h    |  9 ----
>  arch/x86/include/asm/trace/fpu.h    |  5 +--
>  arch/x86/kernel/fpu/core.c          | 68 ++++++++---------------------
>  arch/x86/kernel/fpu/init.c          |  2 -
>  arch/x86/kernel/fpu/regset.c        | 19 ++------
>  arch/x86/kernel/fpu/xstate.c        |  2 -
>  arch/x86/kernel/process_32.c        |  4 +-
>  arch/x86/kernel/process_64.c        |  4 +-
>  arch/x86/kernel/signal.c            | 17 +++-----
>  arch/x86/mm/pkeys.c                 |  7 +--
>  12 files changed, 49 insertions(+), 120 deletions(-)

...

> diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
> index 069c04be15076..bd65f6ba950f8 100644
> --- a/arch/x86/include/asm/trace/fpu.h
> +++ b/arch/x86/include/asm/trace/fpu.h
> @@ -13,22 +13,19 @@ DECLARE_EVENT_CLASS(x86_fpu,
>  
>  	TP_STRUCT__entry(
>  		__field(struct fpu *, fpu)
> -		__field(bool, initialized)
>  		__field(u64, xfeatures)
>  		__field(u64, xcomp_bv)
>  		),

Yikes, can you do that?

rostedt has been preaching that adding members at the end of tracepoints
is ok but not changing them in the middle as that breaks ABI.

Might wanna ping him about it first.

>  
>  	TP_fast_assign(
>  		__entry->fpu		= fpu;
> -		__entry->initialized	= fpu->initialized;
>  		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
>  			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
>  			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
>  		}
>  	),
> -	TP_printk("x86/fpu: %p initialized: %d xfeatures: %llx xcomp_bv: %llx",
> +	TP_printk("x86/fpu: %p xfeatures: %llx xcomp_bv: %llx",
>  			__entry->fpu,
> -			__entry->initialized,
>  			__entry->xfeatures,
>  			__entry->xcomp_bv
>  	)
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index e43296854e379..3a4668c9d24f1 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -101,7 +101,7 @@ static void __kernel_fpu_begin(void)
>  
>  	kernel_fpu_disable();
>  
> -	if (fpu->initialized) {
> +	if (current->mm) {
>  		/*
>  		 * Ignore return value -- we don't care if reg state
>  		 * is clobbered.
> @@ -116,7 +116,7 @@ static void __kernel_fpu_end(void)
>  {
>  	struct fpu *fpu = &current->thread.fpu;
>  
> -	if (fpu->initialized)
> +	if (current->mm)
>  		copy_kernel_to_fpregs(&fpu->state);
>  
>  	kernel_fpu_enable();
> @@ -147,10 +147,9 @@ void fpu__save(struct fpu *fpu)
>  
>  	preempt_disable();
>  	trace_x86_fpu_before_save(fpu);
> -	if (fpu->initialized) {
> -		if (!copy_fpregs_to_fpstate(fpu)) {
> -			copy_kernel_to_fpregs(&fpu->state);
> -		}
> +
> +	if (!copy_fpregs_to_fpstate(fpu)) {
> +		copy_kernel_to_fpregs(&fpu->state);
>  	}

WARNING: braces {} are not necessary for single statement blocks
#217: FILE: arch/x86/kernel/fpu/core.c:151:
+       if (!copy_fpregs_to_fpstate(fpu)) {
+               copy_kernel_to_fpregs(&fpu->state);
        }


...

> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 7888a41a03cdb..77d9eb43ccac8 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -288,10 +288,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  	if (prev->gs | next->gs)
>  		lazy_load_gs(next->gs);
>  
> -	switch_fpu_finish(next_fpu, cpu);
> -
>  	this_cpu_write(current_task, next_p);
>  
> +	switch_fpu_finish(next_fpu, cpu);
> +
>  	/* Load the Intel cache allocation PQR MSR. */
>  	resctrl_sched_in();
>  
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index e1983b3a16c43..ffea7c557963a 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -566,14 +566,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  
>  	x86_fsgsbase_load(prev, next);
>  
> -	switch_fpu_finish(next_fpu, cpu);
> -
>  	/*
>  	 * Switch the PDA and FPU contexts.
>  	 */
>  	this_cpu_write(current_task, next_p);
>  	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
>  
> +	switch_fpu_finish(next_fpu, cpu);
> +
>  	/* Reload sp0. */
>  	update_task_stack(next_p);
>  

Those moves need at least a comment in the commit message or a separate
patch.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/22] x86/fpu: Remove user_fpu_begin()
  2019-01-09 11:47 ` [PATCH 08/22] x86/fpu: Remove user_fpu_begin() Sebastian Andrzej Siewior
@ 2019-01-25 15:18   ` Borislav Petkov
  2019-02-05 18:16     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-25 15:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:30PM +0100, Sebastian Andrzej Siewior wrote:
> user_fpu_begin() sets fpu_fpregs_owner_ctx to the task's fpu struct.
> This is always the case since there is no lazy FPU anymore.
> 
> fpu_fpregs_owner_ctx is used during context switch to decide if it needs
> to load the saved registers or if the currently loaded registers are
> valid. It could be skipped during
> 	taskA -> kernel thread -> taskA
> 
> because the switch to a kernel thread would not alter the CPU's FPU state.
> 
> Since this field is always updated during context switch and never
> invalidated, setting it manually (in user context) makes no difference.
> A kernel thread with a kernel_fpu_begin() block could set
> fpu_fpregs_owner_ctx to NULL but a kernel thread does not use
> user_fpu_begin().
> This is a leftover from the lazy-FPU time.
> 
> Remove user_fpu_begin(), it does not change fpu_fpregs_owner_ctx's
> content.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/internal.h | 17 -----------------
>  arch/x86/kernel/fpu/core.c          |  4 +---
>  arch/x86/kernel/fpu/signal.c        |  1 -
>  3 files changed, 1 insertion(+), 21 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

Should we do this microoptimization in addition, to save us the
activation when the kernel thread here:

	taskA -> kernel thread -> taskA

doesn't call kernel_fpu_begin() and thus fpu_fpregs_owner_ctx remains
the same?

It would be a bit more correct as it won't invoke the
trace_x86_fpu_regs_activated() TP in case the FPU context is the same.

---
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index bfe0bfc7d0d1..ee1ac46a7820 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -510,7 +510,7 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  * Set up the userspace FPU context for the new task, if the task
  * has used the FPU.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
+static inline void switch_fpu_finish(struct fpu *prev_fpu, struct fpu *new_fpu, int cpu)
 {
 	if (static_cpu_has(X86_FEATURE_FPU)) {
 		if (!fpregs_state_valid(new_fpu, cpu)) {
@@ -518,7 +518,8 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 				copy_kernel_to_fpregs(&new_fpu->state);
 		}
 
-		fpregs_activate(new_fpu);
+		if (prev_fpu != new_fpu)
+			fpregs_activate(new_fpu);
 	}
 }
 
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 77d9eb43ccac..f8205df2df1d 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -290,7 +290,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	this_cpu_write(current_task, next_p);
 
-	switch_fpu_finish(next_fpu, cpu);
+	switch_fpu_finish(prev_fpu, next_fpu, cpu);
 
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ffea7c557963..5f153b963180 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -572,7 +572,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish(next_fpu, cpu);
+	switch_fpu_finish(prev_fpu, next_fpu, cpu);
 
 	/* Reload sp0. */
 	update_task_stack(next_p);


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 09/22] x86/fpu: Add (__)make_fpregs_active helpers
  2019-01-09 11:47 ` [PATCH 09/22] x86/fpu: Add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
@ 2019-01-28 18:23   ` Borislav Petkov
  2019-02-07 10:43     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-28 18:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:31PM +0100, Sebastian Andrzej Siewior wrote:
> From: Rik van Riel <riel@surriel.com>
> 
> Add a helper function that ensures the floating point registers for
> the current task are active. Use with preemption disabled.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/api.h      | 11 +++++++++++
>  arch/x86/include/asm/fpu/internal.h | 19 +++++++++++--------
>  2 files changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
> index b56d504af6545..31b66af8eb914 100644
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -10,6 +10,7 @@
>  
>  #ifndef _ASM_X86_FPU_API_H
>  #define _ASM_X86_FPU_API_H
> +#include <linux/preempt.h>
>  
>  /*
>   * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
> @@ -22,6 +23,16 @@ extern void kernel_fpu_begin(void);
>  extern void kernel_fpu_end(void);
>  extern bool irq_fpu_usable(void);
>  
> +static inline void __fpregs_changes_begin(void)
> +{
> +	preempt_disable();
> +}
> +
> +static inline void __fpregs_changes_end(void)

How am I to understand that "fpregs_changes" thing? That FPU register
changes will begin and end respectively?

I probably would call them fpregs_lock and fpregs_unlock even if
it isn't doing any locking to denote that FPU regs are locked and
inaccessible inside the region.

And why the "__" prefix? Is there a counterpart without the "__" coming?

> +{
> +	preempt_enable();
> +}
> +
>  /*
>   * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
>   *
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 03acb9aeb32fc..795a0a2df135e 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -515,6 +515,15 @@ static inline void fpregs_activate(struct fpu *fpu)
>  	trace_x86_fpu_regs_activated(fpu);
>  }
>  
> +static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
> +{
> +	if (!fpregs_state_valid(fpu, cpu)) {
> +		if (current->mm)
> +			copy_kernel_to_fpregs(&fpu->state);
> +		fpregs_activate(fpu);
> +	}
> +}
> +
>  /*
>   * FPU state switching for scheduling.
>   *
> @@ -550,14 +559,8 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
>   */
>  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
>  {
> -	if (static_cpu_has(X86_FEATURE_FPU)) {
> -		if (!fpregs_state_valid(new_fpu, cpu)) {
> -			if (current->mm)
> -				copy_kernel_to_fpregs(&new_fpu->state);
> -		}
> -
> -		fpregs_activate(new_fpu);
> -	}
> +	if (static_cpu_has(X86_FEATURE_FPU))
> +		__fpregs_load_activate(new_fpu, cpu);

And that second part of a cleanup looks strange in this patch. Why isn't
it in a separate patch or how is it related to the addition of the
helpers?

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 10/22] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask
  2019-01-09 11:47 ` [PATCH 10/22] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
@ 2019-01-28 18:30   ` Borislav Petkov
  0 siblings, 0 replies; 91+ messages in thread
From: Borislav Petkov @ 2019-01-28 18:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:32PM +0100, Sebastian Andrzej Siewior wrote:
> Most users of __raw_xsave_addr() use a feature number, shift it to a
> mask and then __raw_xsave_addr() shifts it back to the feature number.
> 
> Make __raw_xsave_addr() use the feature number as an argument.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/kernel/fpu/xstate.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 11/22] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use feature number instead of mask
  2019-01-09 11:47 ` [PATCH 11/22] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() " Sebastian Andrzej Siewior
@ 2019-01-28 18:49   ` Borislav Petkov
  2019-02-07 11:13     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-28 18:49 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:33PM +0100, Sebastian Andrzej Siewior wrote:
> After changing the argument of __raw_xsave_addr() from a mask to a
> number, Dave suggested checking whether it makes sense to do the same
> for get_xsave_addr(). As it turns out, it does. get_xsave_addr() only
> needs the mask to check if the requested feature is part of what is
> supported/saved and then uses the number again. The shift operation is
> cheaper compared to "find last bit set". Also, the feature number uses
> less opcode space compared to the mask :)
> 
> Make get_xsave_addr() argument a xfeature number instead of mask and fix
> up its callers.
> As part of this use xfeature_nr and xfeature_mask consistently.

Good!

> This results in changes to the kvm code as:
> 	feature -> xfeature_mask
> 	index -> xfeature_nr
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/xstate.h |  4 ++--
>  arch/x86/kernel/fpu/xstate.c      | 23 +++++++++++------------
>  arch/x86/kernel/traps.c           |  2 +-
>  arch/x86/kvm/x86.c                | 28 ++++++++++++++--------------
>  arch/x86/mm/mpx.c                 |  6 +++---
>  5 files changed, 31 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
> index 48581988d78c7..fbe41f808e5d8 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -46,8 +46,8 @@ extern void __init update_regset_xstate_info(unsigned int size,
>  					     u64 xstate_mask);
>  
>  void fpu__xstate_clear_all_cpu_caps(void);
> -void *get_xsave_addr(struct xregs_state *xsave, int xstate);
> -const void *get_xsave_field_ptr(int xstate_field);
> +void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
> +const void *get_xsave_field_ptr(int xfeature_nr);
>  int using_compacted_format(void);
>  int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
>  int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 0e759a032c1c7..d288e4d271b71 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -830,15 +830,15 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
>   *
>   * Inputs:
>   *	xstate: the thread's storage area for all FPU data
> - *	xstate_feature: state which is defined in xsave.h (e.g.
> - *	XFEATURE_MASK_FP, XFEATURE_MASK_SSE, etc...)
> + *	xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
> + *	XFEATURE_SSE, etc...)
>   * Output:
>   *	address of the state in the xsave area, or NULL if the
>   *	field is not present in the xsave buffer.
>   */
> -void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
> +void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
>  {
> -	int xfeature_nr;
> +	u64 xfeature_mask = 1ULL << xfeature_nr;

You can paste directly BIT_ULL(xfeature_nr) where you need it in this
function...

>  	/*
>  	 * Do we even *have* xsave state?
>  	 */
> @@ -850,11 +850,11 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
>  	 * have not enabled.  Remember that pcntxt_mask is
>  	 * what we write to the XCR0 register.
>  	 */
> -	WARN_ONCE(!(xfeatures_mask & xstate_feature),
> +	WARN_ONCE(!(xfeatures_mask & xfeature_mask),

... and turn this into:

	WARN_ONCE(!(xfeatures_mask & BIT_ULL(xfeature_nr))

which is more readable than the AND of two variables whose difference I
had to re-focus my eyes to see. :)

Oh and this way, gcc generates better code by doing simply a BT
directly:

# arch/x86/kernel/fpu/xstate.c:852:     WARN_ONCE(!(xfeatures_mask & BIT_ULL(xfeature_nr)),
        .loc 1 852 2 view .LVU258
        movq    xfeatures_mask(%rip), %rax      # xfeatures_mask, tmp124
        btq     %rsi, %rax      # xfeature_nr, tmp124


without first computing the shift into xfeature_mask:

# arch/x86/kernel/fpu/xstate.c:841:     u64 xfeature_mask = 1ULL << xfeature_nr;
        .loc 1 841 6 view .LVU249
        movl    %esi, %ecx      # xfeature_nr, tmp120
        movl    $1, %ebp        #, tmp105
        salq    %cl, %rbp       # tmp120, xfeature_mask

and then testing it:

# arch/x86/kernel/fpu/xstate.c:853:     WARN_ONCE(!(xfeatures_mask & xfeature_mask),
        .loc 1 853 2 view .LVU256
        testq   %rbp, xfeatures_mask(%rip)      # xfeature_mask, xfeatures_mask
        movq    %rdi, %rbx      # xsave, xsave


Otherwise a nice cleanup!

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (22 preceding siblings ...)
  2019-01-15 12:44 ` [PATCH v6] x86: load FPU registers on return to userland David Laight
@ 2019-01-30 11:35 ` Borislav Petkov
  2019-01-30 12:06   ` Sebastian Andrzej Siewior
  23 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-30 11:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:22PM +0100, Sebastian Andrzej Siewior wrote:
> This is a refurbished series originally started by Rik van Riel. The
> goal is to load the FPU registers on return to userland and not on
> every context switch. With this optimisation we can:
> - avoid loading the registers if the task stays in the kernel and does
>   not return to userland
> - make kernel_fpu_begin() cheaper: it only saves the registers on the
>   first invocation. The second invocation does not need to save them
>   again.

Btw, do we have any benchmark data showing the improvement this brings?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 16/22] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  2019-01-09 11:47 ` [PATCH 16/22] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
@ 2019-01-30 11:43   ` Borislav Petkov
  2019-02-07 13:28     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-30 11:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:38PM +0100, Sebastian Andrzej Siewior wrote:
> From: Rik van Riel <riel@surriel.com>
> 
> copy_fpstate_to_sigframe() stores the registers directly to user space.
> This is okay because the FPU registers are valid and saving them
> directly avoids saving them into kernel memory and making a copy.
> However… we can't keep doing this if we are going to restore the FPU
> registers on the return to userland. It is possible that the FPU
> registers will be invalidated in the middle of the save operation, so
> the save should be done with preemption / BH disabled.
> 
> Save the FPU registers to task's FPU struct and copy them to the user
> memory later on.
> 
> This code is extracted from an earlier version of the patchset while
> there still was lazy-FPU on x86.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/internal.h | 45 -----------------------------
>  arch/x86/kernel/fpu/signal.c        | 29 +++++++------------
>  2 files changed, 10 insertions(+), 64 deletions(-)

...

> @@ -171,9 +156,15 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
>  			sizeof(struct user_i387_ia32_struct), NULL,
>  			(struct _fpstate_32 __user *) buf) ? -1 : 1;
>  
> -	/* Save the live register state to the user directly. */
> -	if (copy_fpregs_to_sigframe(buf_fx))
> -		return -1;
> +	copy_fpregs_to_fpstate(fpu);
> +
> +	if (using_compacted_format()) {
> +		copy_xstate_to_user(buf_fx, xsave, 0, size);
> +	} else {
> +		fpstate_sanitize_xstate(fpu);
> +		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
> +			return -1;
> +	}
>  
>  	/* Save the fsave header for the 32-bit frames. */
>  	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))

Comments above that function need updating.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 15/22] x86/entry: Add TIF_NEED_FPU_LOAD
  2019-01-09 11:47 ` [PATCH 15/22] x86/entry: Add TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
@ 2019-01-30 11:55   ` Borislav Petkov
  2019-02-07 11:49     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-30 11:55 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:37PM +0100, Sebastian Andrzej Siewior wrote:
> Add TIF_NEED_FPU_LOAD. This is reserved for loading the FPU registers
> before returning to userland. This flag must not be set for systems
> without an FPU.
> If this flag is cleared, the CPU's FPU registers hold the current
> content of current()'s FPU registers. The in-memory copy (union
> fpregs_state) is not valid.
> If this flag is set, then all of the CPU's FPU registers may hold a
> random value (except for PKRU) and the content of the FPU registers
> must be loaded on return to userland.

This definitely needs to be written somewhere in

arch/x86/include/asm/fpu/internal.h

or where we decide to put the FPU handling rules.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 17/22] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
  2019-01-09 11:47 ` [PATCH 17/22] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
@ 2019-01-30 11:56   ` Borislav Petkov
  2019-01-30 12:28     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-30 11:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:39PM +0100, Sebastian Andrzej Siewior wrote:
> From: Rik van Riel <riel@surriel.com>
> 
> The FPU registers need to be saved only if TIF_NEED_FPU_LOAD is not set.
> Otherwise this has already been done and can be skipped.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/kernel/fpu/signal.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index bf4e6caad305e..a25be217f9a2c 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -156,7 +156,16 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
>  			sizeof(struct user_i387_ia32_struct), NULL,
>  			(struct _fpstate_32 __user *) buf) ? -1 : 1;
>  
> -	copy_fpregs_to_fpstate(fpu);
> +	__fpregs_changes_begin();
> +	/*
> +	 * If we do not need to load the FPU registers at return to userspace
> +	 * then the CPU has the current state and we need to save it. Otherwise
> +	 * it is already done and we can skip it.
> +	 */
> +	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
> +		copy_fpregs_to_fpstate(fpu);

I wonder if this flag would make the code more easy to follow by calling
it

	TIF_FPU_REGS_VALID

instead, to denote that the FPU registers in the CPU have a valid
content.

Then the test becomes:

	if (test_thread_flag(TIF_FPU_REGS_VALID))
		copy_fpregs_to_fpstate(fpu);

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-30 11:35 ` Borislav Petkov
@ 2019-01-30 12:06   ` Sebastian Andrzej Siewior
  2019-01-30 12:27     ` Borislav Petkov
  0 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-30 12:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-30 12:35:55 [+0100], Borislav Petkov wrote:
> On Wed, Jan 09, 2019 at 12:47:22PM +0100, Sebastian Andrzej Siewior wrote:
> > This is a refurbished series originally started by Rik van Riel. The
> > goal is to load the FPU registers on return to userland and not on
> > every context switch. With this optimisation we can:
> > - avoid loading the registers if the task stays in the kernel and does
> >   not return to userland
> > - make kernel_fpu_begin() cheaper: it only saves the registers on the
> >   first invocation. The second invocation does not need to save them
> >   again.
> 
> Btw, do we have any benchmark data showing the improvement this brings?

nope. There is sig_lat or something like that which would measure how
many signals you can handle per second. That could show how bad the
changes are in the signal path.
I don't think that I need any numbers to show that all but the first
invocation of kernel_fpu_begin() is "free". That would be the claim in
the second bullet.
And for the first bullet: hmm, I could add a trace point to see how
often we entered schedule() without saving FPU registers for user tasks.
That would mean the benefit is that we didn't restore them while leaving
schedule() and didn't save them while entering schedule() (again).
I don't know if hackbench would show anything besides noise.

Sebastian


* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-30 12:06   ` Sebastian Andrzej Siewior
@ 2019-01-30 12:27     ` Borislav Petkov
  2019-02-08 13:12       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-30 12:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 30, 2019 at 01:06:47PM +0100, Sebastian Andrzej Siewior wrote:
> I don't know if hackbench would show anything besides noise.

Yeah, if a sensible benchmark (dunno if hackbench is among them :))
shows no difference, is also saying something.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 17/22] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
  2019-01-30 11:56   ` Borislav Petkov
@ 2019-01-30 12:28     ` Sebastian Andrzej Siewior
  2019-01-30 12:53       ` Borislav Petkov
  0 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-01-30 12:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-30 12:56:14 [+0100], Borislav Petkov wrote:
> > diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> > index bf4e6caad305e..a25be217f9a2c 100644
> > --- a/arch/x86/kernel/fpu/signal.c
> > +++ b/arch/x86/kernel/fpu/signal.c
> > @@ -156,7 +156,16 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
> >  			sizeof(struct user_i387_ia32_struct), NULL,
> >  			(struct _fpstate_32 __user *) buf) ? -1 : 1;
> >  
> > -	copy_fpregs_to_fpstate(fpu);
> > +	__fpregs_changes_begin();
> > +	/*
> > +	 * If we do not need to load the FPU registers at return to userspace
> > +	 * then the CPU has the current state and we need to save it. Otherwise
> > +	 * it is already done and we can skip it.
> > +	 */
> > +	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
> > +		copy_fpregs_to_fpstate(fpu);
> 
> I wonder if this flag would make the code more easy to follow by calling
> it
> 
> 	TIF_FPU_REGS_VALID
> 
> instead, to denote that the FPU registers in the CPU have a valid
> content.
> 
> Then the test becomes:
> 
> 	if (test_thread_flag(TIF_FPU_REGS_VALID))
> 		copy_fpregs_to_fpstate(fpu);

I've been asked to add a comment above the sequence so it is understood.
I think the general approach is easy to follow once the concept is
understood. I don't mind renaming the TIF_ thingy once again (it
happened once or twice and I think the current one was suggested by Andy
unless I mixed things up).
The problem I have with the above is that

	if (test_thread_flag(TIF_NEED_FPU_LOAD))
		do_that()

becomes
	if (!test_thread_flag(TIF_FPU_REGS_VALID))
		do_that()

and you could argue again the other way around. So do we want a
NEED_LOAD or a NEED_SAVE flag, which is another way of saying
REGS_VALID?
More importantly, the logic is inverted when the bit is set and this
requires more thinking than just doing sed on the patch series.

Sebastian


* Re: [PATCH 17/22] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
  2019-01-30 12:28     ` Sebastian Andrzej Siewior
@ 2019-01-30 12:53       ` Borislav Petkov
  2019-02-07 14:10         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-01-30 12:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 30, 2019 at 01:28:20PM +0100, Sebastian Andrzej Siewior wrote:
> > > +	/*
> > > +	 * If we do not need to load the FPU registers at return to userspace
> > > +	 * then the CPU has the current state and we need to save it. Otherwise
> > > +	 * it is already done and we can skip it.
> > > +	 */
> > > +	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
> > > +		copy_fpregs_to_fpstate(fpu);
> > 
> > I wonder if this flag would make the code more easy to follow by calling
> > it
> > 
> > 	TIF_FPU_REGS_VALID
> > 
> > instead, to denote that the FPU registers in the CPU have a valid
> > content.
> > 
> > Then the test becomes:
> > 
> > 	if (test_thread_flag(TIF_FPU_REGS_VALID))
> > 		copy_fpregs_to_fpstate(fpu);
> 
> I've been asked to add comment above the sequence so it is understood. I
> think the general approach is easy to follow once the concept is
> understood. I don't mind renaming the TIF_ thingy once again (it
> happend once or twice and I think the current one was suggested by Andy
> unless I mixed things up).
> The problem I have with the above is that
> 
> 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
> 		do_that()
> 
> becomes
> 	if (!test_thread_flag(TIF_FPU_REGS_VALID))
> 		do_that()

Err, above it becomes

	if (test_thread_flag(TIF_FPU_REGS_VALID))
		copy_fpregs_to_fpstate(fpu);

without the "!". I.e., CPU's FPU regs are valid and we need to save them.

Or am I misreading the comment above?

> and you could argue again the other way around. So do we want NEED_LOAD
> or NEED_SAVE flag which is another way of saying REGS_VALID?

All fine and dandy except NEED_FPU_LOAD is ambiguous to me: we need to
load them where? Into the CPU? Or into the FPU state save area?

TIF_FPU_REGS_VALID is clearer in the sense that the CPU's FPU registers
are currently valid for the current task. As there are no other FPU
registers except the CPU's.

> More importantly the logic is changed when the bit is set and this
> requires more thinking than just doing sed on the patch series.

Sure.

And I'll get accustomed to the logic whatever the name is - this is just
a "wouldn't it be better" thing.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 20/22] x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory
  2019-01-09 11:47 ` [PATCH 20/22] x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory Sebastian Andrzej Siewior
@ 2019-01-30 21:29   ` Borislav Petkov
  0 siblings, 0 replies; 91+ messages in thread
From: Borislav Petkov @ 2019-01-30 21:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:42PM +0100, Sebastian Andrzej Siewior wrote:
> The !32bit+fxsr case loads the new state from user memory. In case we
      ^^^^^^^^^^^

Let's "decrypt" that: "The 64-bit case where fast FXSAVE/FXRSTOR are used... "

But looking at the patch, it is not only about the fxsr but also the
use_xsave() case. So pls write out exactly what you mean here.

Ditto for the patch title.

> restore the FPU state on return to userland we can't do this. We would
> have to disable preemption in order to avoid a context switch
> which would set TIF_NEED_FPU_LOAD. If this happens before the "restore"
> operation then the loaded registers would become volatile.
> 
> Disabling preemption while accessing user memory requires disabling the
> pagefault handler. An error during XRSTOR would then mean that either a
> page fault occurred (and we have to retry with the page fault handler
> enabled) or a #GP occurred because the xstate is bogus (after all, the
> sig-handler can modify it).
> 
> In order to avoid that mess, copy the FPU state from userland, validate
> it and then load it. The copy_users_…() helpers are basically the old
> helpers except that they operate on kernel memory and the fault handler
> just sets the error value and the caller handles it.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/internal.h | 32 ++++++++++-----
>  arch/x86/kernel/fpu/signal.c        | 62 +++++++++++++++++++++++------
>  2 files changed, 71 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 16ea30235b025..672e51bc0e9b5 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -120,6 +120,21 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
>  	err;								\
>  })
>  
> +#define kernel_insn_norestore(insn, output, input...)			\
> +({									\
> +	int err;							\
> +	asm volatile("1:" #insn "\n\t"					\
> +		     "2:\n"						\
> +		     ".section .fixup,\"ax\"\n"				\
> +		     "3:  movl $-1,%[err]\n"				\
> +		     "    jmp  2b\n"					\
> +		     ".previous\n"					\
> +		     _ASM_EXTABLE(1b, 3b)				\
> +		     : [err] "=r" (err), output				\
> +		     : "0"(0), input);					\
> +	err;								\
> +})

user_insn above looks unused - just repurpose it.

> +
>  #define kernel_insn(insn, output, input...)				\
>  	asm volatile("1:" #insn "\n\t"					\
>  		     "2:\n"						\
> @@ -140,15 +155,15 @@ static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
>  	}
>  }
>  
> -static inline int copy_user_to_fxregs(struct fxregs_state __user *fx)
> +static inline int copy_users_to_fxregs(struct fxregs_state *fx)

Why "users" ?

>  {
>  	if (IS_ENABLED(CONFIG_X86_32))
> -		return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
> +		return kernel_insn_norestore(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
>  	else if (IS_ENABLED(CONFIG_AS_FXSAVEQ))
> -		return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
> +		return kernel_insn_norestore(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
>  
>  	/* See comment in copy_fxregs_to_kernel() below. */
> -	return user_insn(rex64/fxrstor (%[fx]), "=m" (*fx), [fx] "R" (fx),
> +	return kernel_insn_norestore(rex64/fxrstor (%[fx]), "=m" (*fx), [fx] "R" (fx),
>  			  "m" (*fx));
>  }
>  
> @@ -157,9 +172,9 @@ static inline void copy_kernel_to_fregs(struct fregs_state *fx)
>  	kernel_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
>  }
>  
> -static inline int copy_user_to_fregs(struct fregs_state __user *fx)
> +static inline int copy_users_to_fregs(struct fregs_state *fx)
>  {
> -	return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
> +	return kernel_insn_norestore(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
>  }
>  
>  static inline void copy_fxregs_to_kernel(struct fpu *fpu)
> @@ -339,16 +354,13 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
>  /*
>   * Restore xstate from user space xsave area.
>   */
> -static inline int copy_user_to_xregs(struct xregs_state __user *buf, u64 mask)
> +static inline int copy_users_to_xregs(struct xregs_state *xstate, u64 mask)
>  {
> -	struct xregs_state *xstate = ((__force struct xregs_state *)buf);
>  	u32 lmask = mask;
>  	u32 hmask = mask >> 32;
>  	int err;
>  
> -	stac();
>  	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
> -	clac();
>  
>  	return err;
>  }
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index 970091fb011e9..4ed5c400cac58 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -217,7 +217,8 @@ sanitize_restored_xstate(union fpregs_state *state,
>  		 */
>  		xsave->i387.mxcsr &= mxcsr_feature_mask;
>  
> -		convert_to_fxsr(&state->fxsave, ia32_env);
> +		if (ia32_env)
> +			convert_to_fxsr(&state->fxsave, ia32_env);
>  	}
>  }
>  
> @@ -299,28 +300,63 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
>  		kfree(tmp);
>  		return err;
>  	} else {
> +		union fpregs_state *state;
> +		void *tmp;
>  		int ret;
>  
> +		tmp = kzalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;

<---- newline here.

> +		state = PTR_ALIGN(tmp, 64);
> +
>  		/*
>  		 * For 64-bit frames and 32-bit fsave frames, restore the user
>  		 * state to the registers directly (with exceptions handled).
>  		 */
> -		if (use_xsave()) {
> -			if ((unsigned long)buf_fx % 64 || fx_only) {
> +		if ((unsigned long)buf_fx % 64)
> +			fx_only = 1;
> +
> +		if (use_xsave() && !fx_only) {
> +			u64 init_bv = xfeatures_mask & ~xfeatures;

Define that init_bv in the function prologue above and then you don't
need to define it again here and below in a narrower scope.
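For context, the mask arithmetic behind `init_bv` is simple set subtraction: components the kernel supports (`xfeatures_mask`) but which the signal frame did not provide (`xfeatures`) must be reset from `init_fpstate`, while the provided ones are restored from the buffer. A plain-C sketch with invented bit values:

```c
#include <assert.h>
#include <stdint.h>

/* Bit assignments invented for this sketch. */
#define XFEATURE_MASK_FP   (1ull << 0)
#define XFEATURE_MASK_SSE  (1ull << 1)
#define XFEATURE_MASK_YMM  (1ull << 2)

/* Features to initialize from init_fpstate: supported but not in the frame. */
static uint64_t compute_init_bv(uint64_t xfeatures_mask, uint64_t xfeatures)
{
	return xfeatures_mask & ~xfeatures;
}
```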

> +
> +			if (using_compacted_format()) {
> +				ret = copy_user_to_xstate(&state->xsave, buf_fx);
> +			} else {
> +				ret = __copy_from_user(&state->xsave, buf_fx, state_size);
> +
> +				if (!ret && state_size > offsetof(struct xregs_state, header))
> +					ret = validate_xstate_header(&state->xsave.header);
> +			}
> +			if (ret)
> +				goto err_out;

<---- newline here.

> +			sanitize_restored_xstate(state, NULL, xfeatures,
> +						 fx_only);

Let that stick out.

And then do that here:

			init_bv = xfeatures_mask & ~xfeatures;
> +
> +			if (unlikely(init_bv))
> +				copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);

and add a newline here so that this code above belongs together visually.

> +			ret = copy_users_to_xregs(&state->xsave, xfeatures);
> +

...
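As background on the `kzalloc()` + `PTR_ALIGN()` pattern quoted above: XRSTOR requires a 64-byte-aligned save area, but the allocator does not guarantee that, hence "allocate 64 bytes extra, then round the pointer up". A userspace sketch of the same rounding (the kernel's `PTR_ALIGN` is essentially this):

```c
#include <assert.h>
#include <stdint.h>

/* Round p up to the next multiple of a; a must be a power of two. */
static void *ptr_align(void *p, uintptr_t a)
{
	return (void *)(((uintptr_t)p + a - 1) & ~(a - 1));
}
```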

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 22/22] x86/fpu: Defer FPU state load until return to userspace
  2019-01-09 11:47 ` [PATCH 22/22] x86/fpu: Defer FPU state load until return to userspace Sebastian Andrzej Siewior
@ 2019-01-31  9:16   ` Borislav Petkov
  0 siblings, 0 replies; 91+ messages in thread
From: Borislav Petkov @ 2019-01-31  9:16 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Wed, Jan 09, 2019 at 12:47:44PM +0100, Sebastian Andrzej Siewior wrote:
> From: Rik van Riel <riel@surriel.com>
> 
> Defer loading of FPU state until return to userspace. This gives
> the kernel the potential to skip loading FPU state for tasks that
> stay in kernel mode, or for tasks that end up with repeated
> invocations of kernel_fpu_begin() & kernel_fpu_end().

s/&/and/

> The __fpregs_changes_{begin|end}() section ensures that the register

registers - please check all your commit messages and code comments for
usage of singular "register" where the plural "registers" is supposed to
be. There are a couple of places.

> remain unchanged. Otherwise a context switch or a BH could save the

s/BH/bottom half/

> registers to its FPU context and processor's FPU register would became

"become"

> random if beeing modified at the same time.
> 
> KVM swaps the host/guest register on entry/exit path. I kept the flow as

Passive: "The flow has been kept."

> is. First it ensures that the registers are loaded and then saves the
> current (host) state before it loads the guest's register. The swap is
> done at the very end with disabled interrupts so it should not change
> anymore before theg guest is entered. The read/save version seems to be
> cheaper compared to memcpy() in a micro benchmark.
> 
> Each thread gets TIF_NEED_FPU_LOAD set as part of fork() / fpu__copy().
> For kernel threads, this flag gets never cleared which avoids saving /
> restoring the FPU state for kernel threads and during in-kernel usage of
> the FPU register.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/entry/common.c             |   8 +++
>  arch/x86/include/asm/fpu/api.h      |  22 +++++-
>  arch/x86/include/asm/fpu/internal.h |  27 +++++---
>  arch/x86/include/asm/trace/fpu.h    |   5 +-
>  arch/x86/kernel/fpu/core.c          | 104 +++++++++++++++++++++-------
>  arch/x86/kernel/fpu/signal.c        |  46 +++++++-----
>  arch/x86/kernel/process.c           |   2 +-
>  arch/x86/kernel/process_32.c        |   5 +-
>  arch/x86/kernel/process_64.c        |   5 +-
>  arch/x86/kvm/x86.c                  |  20 ++++--
>  10 files changed, 179 insertions(+), 65 deletions(-)

Needs checkpatch fixing:

WARNING: 'beeing' may be misspelled - perhaps 'being'?
#29: 
random if beeing modified at the same time.

WARNING: braces {} are not necessary for single statement blocks
#282: FILE: arch/x86/kernel/fpu/core.c:149:
+               if (!copy_fpregs_to_fpstate(fpu)) {
+                       copy_kernel_to_fpregs(&fpu->state);
+               }

WARNING: please, no spaces at the start of a line
#391: FILE: arch/x86/kernel/fpu/core.c:375:
+       struct fpu *fpu = &current->thread.fpu;$

WARNING: please, no spaces at the start of a line
#393: FILE: arch/x86/kernel/fpu/core.c:377:
+       if (test_thread_flag(TIF_NEED_FPU_LOAD))$

WARNING: suspect code indent for conditional statements (7, 15)
#393: FILE: arch/x86/kernel/fpu/core.c:377:
+       if (test_thread_flag(TIF_NEED_FPU_LOAD))
+               return;

ERROR: code indent should use tabs where possible
#394: FILE: arch/x86/kernel/fpu/core.c:378:
+               return;$

WARNING: please, no spaces at the start of a line
#394: FILE: arch/x86/kernel/fpu/core.c:378:
+               return;$

WARNING: please, no spaces at the start of a line
#395: FILE: arch/x86/kernel/fpu/core.c:379:
+       WARN_ON_FPU(!fpregs_state_valid(fpu, smp_processor_id()));$

total: 1 errors, 7 warnings, 499 lines checked

Also, this patch could use some splitting for easier review, like adding
the helpers in a pre-patch and then wiring in all the logic in another,
for example.

> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 7bc105f47d21a..13e8e29af6ab7 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -31,6 +31,7 @@
>  #include <asm/vdso.h>
>  #include <linux/uaccess.h>
>  #include <asm/cpufeature.h>
> +#include <asm/fpu/api.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/syscalls.h>
> @@ -196,6 +197,13 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
>  	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
>  		exit_to_usermode_loop(regs, cached_flags);
>  
> +	/* Reload ti->flags; we may have rescheduled above. */
> +	cached_flags = READ_ONCE(ti->flags);
> +
> +	fpregs_assert_state_consistent();

So this one already tests TIF_NEED_FPU_LOAD when the kernel has been
built with CONFIG_X86_DEBUG_FPU=y.

I guess we can remove that CONFIG_X86_DEBUG_FPU around it and run it
unconditionally. And then make it return the test result so that you
don't have to run the same test again on cached_flags, or ?

> +	if (unlikely(cached_flags & _TIF_NEED_FPU_LOAD))
> +		switch_fpu_return();

And looking at the code, the consistency check and the potential loading
of FPU registers in switch_fpu_return() belong together so they're kinda
begging to be a single function...?
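A toy model (deliberately not kernel code, all names and bit values invented) of what folding the two together might look like: one helper performs the consistency check and returns whether the registers still need to be loaded, so the caller does not test the cached flags a second time.

```c
#include <assert.h>
#include <stdbool.h>

#define TIF_NEED_FPU_LOAD (1u << 0)	/* bit value invented for the sketch */

struct task_state {
	unsigned int flags;
	bool fpregs_valid;	/* stand-in for fpregs_state_valid() */
};

/* Returns true when the caller must (re)load the FPU registers. */
static bool fpregs_check_and_need_load(struct task_state *t)
{
	if (t->flags & TIF_NEED_FPU_LOAD)
		return true;
	assert(t->fpregs_valid);	/* the consistency check */
	return false;
}
```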

> +
>  #ifdef CONFIG_COMPAT
>  	/*
>  	 * Compat syscalls set TS_COMPAT.  Make sure we clear it before
> diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
> index 31b66af8eb914..c17620af5d797 100644
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -10,7 +10,7 @@
>  
>  #ifndef _ASM_X86_FPU_API_H
>  #define _ASM_X86_FPU_API_H
> -#include <linux/preempt.h>
> +#include <linux/bottom_half.h>
>  
>  /*
>   * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
> @@ -22,17 +22,37 @@
>  extern void kernel_fpu_begin(void);
>  extern void kernel_fpu_end(void);
>  extern bool irq_fpu_usable(void);
> +extern void fpregs_mark_activate(void);
>  
> +/*
> + * Use __fpregs_changes_begin() while editing CPU's FPU registers or fpu->state.
> + * A context switch will (and softirq might) save CPU's FPU register to
> + * fpu->state and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in a random
> + * state.
> + */
>  static inline void __fpregs_changes_begin(void)
>  {
>  	preempt_disable();
> +	local_bh_disable();
>  }
>  
>  static inline void __fpregs_changes_end(void)
>  {
> +	local_bh_enable();
>  	preempt_enable();
>  }
>  
> +#ifdef CONFIG_X86_DEBUG_FPU
> +extern void fpregs_assert_state_consistent(void);
> +#else
> +static inline void fpregs_assert_state_consistent(void) { }
> +#endif
> +
> +/*
> + * Load the task FPU state before returning to userspace.
> + */
> +extern void switch_fpu_return(void);
> +
>  /*
>   * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
>   *
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 672e51bc0e9b5..61627f8cb3ff4 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -29,7 +29,7 @@ extern void fpu__prepare_write(struct fpu *fpu);
>  extern void fpu__save(struct fpu *fpu);
>  extern int  fpu__restore_sig(void __user *buf, int ia32_frame);
>  extern void fpu__drop(struct fpu *fpu);
> -extern int  fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
> +extern int  fpu__copy(struct task_struct *dst, struct task_struct *src);
>  extern void fpu__clear(struct fpu *fpu);
>  extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
>  extern int  dump_fpu(struct pt_regs *ptregs, struct user_i387_struct *fpstate);
> @@ -482,13 +482,20 @@ static inline void fpregs_activate(struct fpu *fpu)
>  	trace_x86_fpu_regs_activated(fpu);
>  }
>  
> -static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
> +static inline void __fpregs_load_activate(void)
>  {
> +	struct fpu *fpu = &current->thread.fpu;
> +	int cpu = smp_processor_id();
> +
> +	if (WARN_ON_ONCE(current->mm == NULL))

			!current->mm

is what we do with NULL checks.

> +		return;
> +
>  	if (!fpregs_state_valid(fpu, cpu)) {
> -		if (current->mm)
> -			copy_kernel_to_fpregs(&fpu->state);
> +		copy_kernel_to_fpregs(&fpu->state);
>  		fpregs_activate(fpu);
> +		fpu->last_cpu = cpu;
>  	}
> +	clear_thread_flag(TIF_NEED_FPU_LOAD);
>  }
>  
>  /*
> @@ -499,8 +506,8 @@ static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
>   *  - switch_fpu_prepare() saves the old state.
>   *    This is done within the context of the old process.
>   *
> - *  - switch_fpu_finish() restores the new state as
> - *    necessary.
> + *  - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
> + *    will get loaded on return to userspace, or when the kernel needs it.
>   */
>  static inline void
>  switch_fpu_prepare(struct fpu *old_fpu, int cpu)
> @@ -521,10 +528,10 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
>   */
>  
>  /*
> - * Set up the userspace FPU context for the new task, if the task
> - * has used the FPU.
> + * Load PKRU from the FPU context if available. Delay loading the loading of the

"loading the loading"?

> + * complete FPU state until the return to userland.
>   */
> -static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
> +static inline void switch_fpu_finish(struct fpu *new_fpu)
>  {
>  	struct pkru_state *pk;
>  	u32 pkru_val = 0;
> @@ -532,7 +539,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
>  	if (!static_cpu_has(X86_FEATURE_FPU))
>  		return;
>  
> -	__fpregs_load_activate(new_fpu, cpu);
> +	set_thread_flag(TIF_NEED_FPU_LOAD);
>  
>  	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
>  		return;
> diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
> index bd65f6ba950f8..91a1422091ceb 100644
> --- a/arch/x86/include/asm/trace/fpu.h
> +++ b/arch/x86/include/asm/trace/fpu.h
> @@ -13,19 +13,22 @@ DECLARE_EVENT_CLASS(x86_fpu,
>  
>  	TP_STRUCT__entry(
>  		__field(struct fpu *, fpu)
> +		__field(bool, load_fpu)

Yeah, I don't think you can do this.

>  		__field(u64, xfeatures)
>  		__field(u64, xcomp_bv)
>  		),
>  
>  	TP_fast_assign(
>  		__entry->fpu		= fpu;
> +		__entry->load_fpu	= test_thread_flag(TIF_NEED_FPU_LOAD);
>  		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
>  			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
>  			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
>  		}
>  	),
> -	TP_printk("x86/fpu: %p xfeatures: %llx xcomp_bv: %llx",
> +	TP_printk("x86/fpu: %p load: %d xfeatures: %llx xcomp_bv: %llx",
>  			__entry->fpu,
> +			__entry->load_fpu,
>  			__entry->xfeatures,
>  			__entry->xcomp_bv
>  	)
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 78d8037635932..f52e687dff9ee 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -102,23 +102,20 @@ static void __kernel_fpu_begin(void)
>  	kernel_fpu_disable();
>  
>  	if (current->mm) {
> -		/*
> -		 * Ignore return value -- we don't care if reg state
> -		 * is clobbered.
> -		 */
> -		copy_fpregs_to_fpstate(fpu);
> -	} else {
> -		__cpu_invalidate_fpregs_state();
> +		if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> +			set_thread_flag(TIF_NEED_FPU_LOAD);

test_and_set_thread_flag ?
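The point of the suggestion: `test_and_set_thread_flag()` does the read and the set as one operation and returns the old value, so the separate test/set pair collapses into one call. A C11 sketch of the same idiom (the flag bit value here is invented):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TIF_NEED_FPU_LOAD (1u << 0)	/* bit value invented for the sketch */

static _Atomic unsigned int thread_flags;

/* Sets the flag and returns its previous state, atomically. */
static bool test_and_set_flag(unsigned int flag)
{
	return atomic_fetch_or(&thread_flags, flag) & flag;
}
```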

> +			/*
> +			 * Ignore return value -- we don't care if reg state
> +			 * is clobbered.
> +			 */
> +			copy_fpregs_to_fpstate(fpu);
> +		}
>  	}
> +	__cpu_invalidate_fpregs_state();
>  }
>  
>  static void __kernel_fpu_end(void)
>  {
> -	struct fpu *fpu = &current->thread.fpu;
> -
> -	if (current->mm)
> -		copy_kernel_to_fpregs(&fpu->state);
> -
>  	kernel_fpu_enable();
>  }
>  
> @@ -145,14 +142,16 @@ void fpu__save(struct fpu *fpu)
>  {
>  	WARN_ON_FPU(fpu != &current->thread.fpu);
>  
> -	preempt_disable();
> +	__fpregs_changes_begin();
>  	trace_x86_fpu_before_save(fpu);
>  
> -	if (!copy_fpregs_to_fpstate(fpu)) {
> -		copy_kernel_to_fpregs(&fpu->state);
> +	if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> +		if (!copy_fpregs_to_fpstate(fpu)) {
> +			copy_kernel_to_fpregs(&fpu->state);
> +		}
>  	}
>  	trace_x86_fpu_after_save(fpu);
> -	preempt_enable();
> +	__fpregs_changes_end();
>  }
>  EXPORT_SYMBOL_GPL(fpu__save);
>  
> @@ -185,8 +184,11 @@ void fpstate_init(union fpregs_state *state)
>  }
>  EXPORT_SYMBOL_GPL(fpstate_init);
>  
> -int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
> +int fpu__copy(struct task_struct *dst, struct task_struct *src)
>  {
> +	struct fpu *dst_fpu = &dst->thread.fpu;
> +	struct fpu *src_fpu = &src->thread.fpu;
> +
>  	dst_fpu->last_cpu = -1;
>  
>  	if (!static_cpu_has(X86_FEATURE_FPU))
> @@ -201,16 +203,23 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
>  	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
>  
>  	/*
> -	 * Save current FPU registers directly into the child
> -	 * FPU context, without any memory-to-memory copying.
> +	 * If the FPU registers are not current just memcpy() the state.
> +	 * Otherwise save current FPU registers directly into the child's FPU
> +	 * context, without any memory-to-memory copying.
>  	 *
>  	 * ( The function 'fails' in the FNSAVE case, which destroys
> -	 *   register contents so we have to copy them back. )
> +	 *   register contents so we have to load them back. )
>  	 */
> -	if (!copy_fpregs_to_fpstate(dst_fpu)) {
> -		memcpy(&src_fpu->state, &dst_fpu->state, fpu_kernel_xstate_size);
> -		copy_kernel_to_fpregs(&src_fpu->state);
> -	}
> +	__fpregs_changes_begin();
> +	if (test_thread_flag(TIF_NEED_FPU_LOAD))
> +		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_size);
> +

Superfluous newline.

> +	else if (!copy_fpregs_to_fpstate(dst_fpu))
> +		copy_kernel_to_fpregs(&dst_fpu->state);
> +
> +	__fpregs_changes_end();
> +
> +	set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD);
>  
>  	trace_x86_fpu_copy_src(src_fpu);
>  	trace_x86_fpu_copy_dst(dst_fpu);
> @@ -226,10 +235,9 @@ static void fpu__initialize(struct fpu *fpu)
>  {
>  	WARN_ON_FPU(fpu != &current->thread.fpu);
>  
> +	set_thread_flag(TIF_NEED_FPU_LOAD);
>  	fpstate_init(&fpu->state);
>  	trace_x86_fpu_init_state(fpu);
> -
> -	trace_x86_fpu_activate_state(fpu);
>  }
>  
>  /*
> @@ -308,6 +316,8 @@ void fpu__drop(struct fpu *fpu)
>   */
>  static inline void copy_init_fpstate_to_fpregs(void)
>  {
> +	__fpregs_changes_begin();
> +
>  	if (use_xsave())
>  		copy_kernel_to_xregs(&init_fpstate.xsave, -1);
>  	else if (static_cpu_has(X86_FEATURE_FXSR))
> @@ -317,6 +327,9 @@ static inline void copy_init_fpstate_to_fpregs(void)
>  
>  	if (boot_cpu_has(X86_FEATURE_OSPKE))
>  		copy_init_pkru_to_fpregs();
> +
> +	fpregs_mark_activate();
> +	__fpregs_changes_end();
>  }
>  
>  /*
> @@ -339,6 +352,45 @@ void fpu__clear(struct fpu *fpu)
>  		copy_init_fpstate_to_fpregs();
>  }
>  
> +/*
> + * Load FPU context before returning to userspace.
> + */
> +void switch_fpu_return(void)
> +{
> +	if (!static_cpu_has(X86_FEATURE_FPU))
> +		return;
> +
> +	__fpregs_load_activate();
> +}
> +EXPORT_SYMBOL_GPL(switch_fpu_return);
> +
> +#ifdef CONFIG_X86_DEBUG_FPU
> +/*
> + * If current FPU state according to its tracking (loaded FPU ctx on this CPU)
> + * is not valid then we must have TIF_NEED_FPU_LOAD set so the context is loaded on
> + * return to userland.
> + */
> +void fpregs_assert_state_consistent(void)
> +{
> +       struct fpu *fpu = &current->thread.fpu;
> +
> +       if (test_thread_flag(TIF_NEED_FPU_LOAD))
> +               return;
> +       WARN_ON_FPU(!fpregs_state_valid(fpu, smp_processor_id()));
> +}
> +EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent);
> +#endif
> +
> +void fpregs_mark_activate(void)
> +{
> +	struct fpu *fpu = &current->thread.fpu;
> +
> +	fpregs_activate(fpu);
> +	fpu->last_cpu = smp_processor_id();
> +	clear_thread_flag(TIF_NEED_FPU_LOAD);
> +}
> +EXPORT_SYMBOL_GPL(fpregs_mark_activate);
> +
>  /*
>   * x87 math exception handling:
>   */
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index a17e75fa1a0a6..61a03a34a7304 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -230,11 +230,9 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
>  	struct fpu *fpu = &tsk->thread.fpu;
>  	int state_size = fpu_kernel_xstate_size;
>  	struct user_i387_ia32_struct env;
> -	union fpregs_state *state;
>  	u64 xfeatures = 0;
>  	int fx_only = 0;
>  	int ret = 0;
> -	void *tmp;
>  
>  	ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
>  			 IS_ENABLED(CONFIG_IA32_EMULATION));
> @@ -269,14 +267,18 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
>  		}
>  	}
>  
> -	tmp = kzalloc(sizeof(*state) + fpu_kernel_xstate_size + 64, GFP_KERNEL);
> -	if (!tmp)
> -		return -ENOMEM;
> -	state = PTR_ALIGN(tmp, 64);
> +	/*
> +	 * The current state of the FPU registers does not matter. By setting
> +	 * TIF_NEED_FPU_LOAD unconditionally it is ensured that the our xstate

our xstate? You mean the xstate copy in memory I think...

> +	 * is not modified on context switch and that the xstate is considered
> +	 * to loaded again on return to userland (overriding last_cpu avoids the
> +	 * optimisation).
> +	 */
> +	set_thread_flag(TIF_NEED_FPU_LOAD);
> +	__fpu_invalidate_fpregs_state(fpu);
>  
>  	if ((unsigned long)buf_fx % 64)
>  		fx_only = 1;
> -
>  	/*
>  	 * For 32-bit frames with fxstate, copy the fxstate so it can be
>  	 * reconstructed later.
> @@ -292,43 +294,51 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
>  		u64 init_bv = xfeatures_mask & ~xfeatures;
>  
>  		if (using_compacted_format()) {
> -			ret = copy_user_to_xstate(&state->xsave, buf_fx);
> +			ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
>  		} else {
> -			ret = __copy_from_user(&state->xsave, buf_fx, state_size);
> +			ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
>  
>  			if (!ret && state_size > offsetof(struct xregs_state, header))
> -				ret = validate_xstate_header(&state->xsave.header);
> +				ret = validate_xstate_header(&fpu->state.xsave.header);
>  		}
>  		if (ret)
>  			goto err_out;
>  
> -		sanitize_restored_xstate(state, envp, xfeatures, fx_only);
> +		sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only);
>  
> +		__fpregs_changes_begin();
>  		if (unlikely(init_bv))
>  			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
> -		ret = copy_users_to_xregs(&state->xsave, xfeatures);
> +		ret = copy_users_to_xregs(&fpu->state.xsave, xfeatures);
>  
>  	} else if (use_fxsr()) {
> -		ret = __copy_from_user(&state->fxsave, buf_fx, state_size);
> -		if (ret)
> +		ret = __copy_from_user(&fpu->state.fxsave, buf_fx, state_size);
> +		if (ret) {
> +			ret = -EFAULT;
>  			goto err_out;
> +		}
>  
> -		sanitize_restored_xstate(state, envp, xfeatures, fx_only);
> +		sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only);
> +
> +		__fpregs_changes_begin();
>  		if (use_xsave()) {
>  			u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
>  			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
>  		}
>  
> -		ret = copy_users_to_fxregs(&state->fxsave);
> +		ret = copy_users_to_fxregs(&fpu->state.fxsave);
>  	} else {
> -		ret = __copy_from_user(&state->fsave, buf_fx, state_size);
> +		ret = __copy_from_user(&fpu->state.fsave, buf_fx, state_size);
>  		if (ret)
>  			goto err_out;
> +		__fpregs_changes_begin();

Eww, each branch is doing __fpregs_changes_begin() and we do
__fpregs_changes_end() once at the end. Had to open eyes wider here :)

>  		ret = copy_users_to_fregs(buf_fx);
>  	}
> +	if (!ret)
> +		fpregs_mark_activate();
> +	__fpregs_changes_end();
>  
>  err_out:
> -	kfree(tmp);
>  	if (ret)
>  		fpu__clear(fpu);
>  	return ret;
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 90ae0ca510837..2e38a14fdbd3f 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -101,7 +101,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  	dst->thread.vm86 = NULL;
>  #endif
>  
> -	return fpu__copy(&dst->thread.fpu, &src->thread.fpu);
> +	return fpu__copy(dst, src);
>  }
>  
>  /*
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 77d9eb43ccac8..1bc47f3a48854 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -234,7 +234,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  
>  	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
>  
> -	switch_fpu_prepare(prev_fpu, cpu);

Let's add a comment here like the rest of the function does. Ditto for below.

> +	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
> +		switch_fpu_prepare(prev_fpu, cpu);
>  
>  	/*
>  	 * Save away %gs. No need to save %fs, as it was saved on the

...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/22] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  2019-01-14 16:24   ` Borislav Petkov
@ 2019-02-05 10:08     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-05 10:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-14 17:24:00 [+0100], Borislav Petkov wrote:
> > @@ -315,40 +313,33 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
> > -			sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
> > +			sanitize_restored_xstate(state, &env,
> > +						 xfeatures, fx_only);
> 
> Just let that one stick out - there are other lines in this file already
> longer than 80.
Didn't want to add more of these but okay.

> Notwithstanding, I don't see anything wrong with this patch.
> 
> Acked-by: Borislav Petkov <bp@suse.de>
Thanks.

Sebastian

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-21 11:21             ` Oleg Nesterov
  2019-01-22 13:40               ` Borislav Petkov
@ 2019-02-05 11:17               ` Sebastian Andrzej Siewior
  2019-02-26 16:38                 ` Oleg Nesterov
  1 sibling, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-05 11:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Borislav Petkov, Ingo Molnar, linux-kernel, x86,
	Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-21 12:21:17 [+0100], Oleg Nesterov wrote:
> > This is part of our ABI for *sure*.  Inspecting that state is how
> > userspace makes sense of MPX or protection keys faults.  We even use
> > this in selftests/.
> 
> Yes.
> 
> And in any case I do not understand the idea to use the second in-kernel struct fpu.
> A signal handler can be interrupted by another signal, this will need to save/restore
> the FPU state again.

So I assumed that while SIGUSR1 is handled, SIGUSR2 will wait until the
current signal is handled, so no interruption. But SIGSEGV is probably
the exception which will interrupt SIGUSR1, so we would need a third
one…

The idea was to save the FPU state in-kernel so we don't have to
revalidate everything because userspace had access to it and could do
things.
Some things are complicated and it is not documented why they are the
way they are. For instance, on 64bit (based on the code) the signal
handler can remove SSE from the state-mask and the kernel loads the
"default-empty" SSE registers and the enabled states from user. This
isn't done on 32bit. Also: we save with XSAVE and allocate the buffer on
the stack. But if we can't find the FP_XSTATE_MAGIC* or the buffer is
not properly aligned then we fall back to FXSR and assume that we have
an FXSR buffer in front of us.
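That fallback decision can be sketched in a few lines of plain C (simplified; `FP_XSTATE_MAGIC1` is the marker the uapi sigcontext header defines, and 64 is XRSTOR's alignment requirement):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FP_XSTATE_MAGIC1 0x46505853u

/*
 * Treat the signal frame as an XSAVE area only when the magic marker is
 * present and the buffer is 64-byte aligned; otherwise assume FXSR.
 */
static bool use_xsave_frame(uintptr_t buf, uint32_t magic1)
{
	return (buf % 64) == 0 && magic1 == FP_XSTATE_MAGIC1;
}
```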

> Oleg.

Sebastian

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-22 17:00                   ` Borislav Petkov
@ 2019-02-05 11:34                     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-05 11:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Oleg Nesterov, Dave Hansen, Ingo Molnar, linux-kernel, x86,
	Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen, Michael Matz

On 2019-01-22 18:00:23 [+0100], Borislav Petkov wrote:
> On Tue, Jan 22, 2019 at 05:15:51PM +0100, Oleg Nesterov wrote:
> > I don't know... tried to google, found nothing.
> > 
> > the comment in /usr/include/sys/ucontext.h mentions SysV/i386 ABI + historical
> > reasons, this didn't help.
> 
> So I'm being told by one of the psABI folks that this is not really
> written down somewhere explicitly but it is the result from the POSIX
> and psABI treatise of signal handlers, what they're supposed to do,
> caller- and callee-saved registers, etc.
> 
> And FPU registers are volatile, i.e., caller-saved. Which means, the
> handler itself doesn't save them but the caller, which, doesn't really
> expect any signals - they are async. So the kernel must do that and
> slap the FPU regs onto the user stack...

My point was to save them somewhere else if possible. That way we could
save a few cycles during signal delivery and it would make the code a
little simpler.

Let me finish the series and then we can think how we could improve it.

> Hohumm. Makes sense.
> 

Sebastian

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 05/22 v2] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-01-17 12:22       ` Borislav Petkov
  2019-01-18 21:14         ` Sebastian Andrzej Siewior
@ 2019-02-05 14:37         ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-05 14:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

With lazy-FPU support the variable (now named ->initialized) was set to true if
the CPU's FPU registers were holding a valid state of the FPU registers for
the active process. If it was set to false then the FPU state was saved in
fpu->state and the FPU was deactivated.
With lazy-FPU gone, ->initialized is always true for user threads, and kernel
threads never reach this function, so ->initialized is always true in
copy_fpstate_to_sigframe().
The using_compacted_format() check is also a leftover from the lazy-FPU time.
In the `->initialized == false' case copy_to_user() would copy the compacted
buffer while userland would expect the non-compacted format instead. So in
order to save the FPU state in the non-compacted form it issues the xsave
opcode to save the *current* FPU state.
The FPU is not enabled so the attempt raises the FPU trap, the trap restores
the FPU content and re-enables the FPU and the xsave opcode is invoked again and
succeeds. *This* does not longer work since commit

  bef8b6da9522 ("x86/fpu: Handle #NM without FPU emulation as an error")

Remove check for ->initialized because it is always true and remove the
false condition. Update the comment to reflect that the "state is always live".

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2: rewrote the patch description.

 arch/x86/kernel/fpu/signal.c | 30 ++++++------------------------
 1 file changed, 6 insertions(+), 24 deletions(-)

Index: staging/arch/x86/kernel/fpu/signal.c
===================================================================
--- staging.orig/arch/x86/kernel/fpu/signal.c
+++ staging/arch/x86/kernel/fpu/signal.c
@@ -144,9 +144,8 @@ static inline int copy_fpregs_to_sigfram
  *	buf == buf_fx for 64-bit frames and 32-bit fsave frame.
  *	buf != buf_fx for 32-bit frames with fxstate.
  *
- * If the fpu, extended register state is live, save the state directly
- * to the user frame pointed by the aligned pointer 'buf_fx'. Otherwise,
- * copy the thread's fpu state to the user frame starting at 'buf_fx'.
+ * Save the state directly to the user frame pointed by the aligned pointer
+ * 'buf_fx'.
  *
  * If this is a 32-bit frame with fxstate, put a fsave header before
  * the aligned state at 'buf_fx'.
@@ -157,7 +156,6 @@ static inline int copy_fpregs_to_sigfram
 int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct fpu *fpu = &current->thread.fpu;
-	struct xregs_state *xsave = &fpu->state.xsave;
 	struct task_struct *tsk = current;
 	int ia32_fxstate = (buf != buf_fx);
 
@@ -172,29 +170,12 @@ int copy_fpstate_to_sigframe(void __user
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	if (fpu->initialized || using_compacted_format()) {
-		/* Save the live register state to the user directly. */
-		if (copy_fpregs_to_sigframe(buf_fx))
-			return -1;
-		/* Update the thread's fxstate to save the fsave header. */
-		if (ia32_fxstate)
-			copy_fxregs_to_kernel(fpu);
-	} else {
-		/*
-		 * It is a *bug* if kernel uses compacted-format for xsave
-		 * area and we copy it out directly to a signal frame. It
-		 * should have been handled above by saving the registers
-		 * directly.
-		 */
-		if (boot_cpu_has(X86_FEATURE_XSAVES)) {
-			WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
-			return -1;
-		}
-
-		fpstate_sanitize_xstate(fpu);
-		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
-			return -1;
-	}
+	/* Save the live register state to the user directly. */
+	if (copy_fpregs_to_sigframe(buf_fx))
+		return -1;
+	/* Update the thread's fxstate to save the fsave header. */
+	if (ia32_fxstate)
+		copy_fxregs_to_kernel(fpu);
 
 	/* Save the fsave header for the 32-bit frames. */
 	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 06/22 v2] x86/fpu: Don't save fxregs for ia32 frames in
  2019-01-24 11:17   ` Borislav Petkov
@ 2019-02-05 16:43     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-05 16:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

In commit 

  72a671ced66db ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")

the 32-bit and 64-bit paths of the signal delivery code were merged. The 32-bit version:
|int save_i387_xstate_ia32(void __user *buf)
|…
|       if (cpu_has_xsave)
|               return save_i387_xsave(fp);
|       if (cpu_has_fxsr)
|               return save_i387_fxsave(fp);

The 64-bit version:
|int save_i387_xstate(void __user *buf)
|…
|       if (user_has_fpu()) {
|               if (use_xsave())
|                       err = xsave_user(buf);
|               else
|                       err = fxsave_user(buf);
|
|               if (unlikely(err)) {
|                       __clear_user(buf, xstate_size);
|                       return err;

The merge:
|int save_xstate_sig(void __user *buf, void __user *buf_fx, int size)
|…
|       if (user_has_fpu()) {
|               /* Save the live register state to the user directly. */
|               if (save_user_xstate(buf_fx))
|                       return -1;
|               /* Update the thread's fxstate to save the fsave header. */
|               if (ia32_fxstate)
|                       fpu_fxsave(&tsk->thread.fpu);

I don't think that we needed to save the FPU registers to ->thread.fpu
because the registers were already stored in `buf_fx'. Today the state is
restored from `buf_fx' after the signal has been handled (I assume that
this was also the case with lazy-FPU). Since commit

  66463db4fc560 ("x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()")

it is ensured that the signal handler starts with a clean/fresh set of
FPU registers, which means that the previous store is futile.

Remove copy_fxregs_to_kernel() because the task's FPU state is cleared
later in handle_signal() via fpu__clear().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2:
 - rewrote the description. Replaced the "I don't know why it is like it
   is makes no sense buh" part with some pointers which might explain why
   copy_fxregs_to_kernel() ended up there and since when it definitely is
   a nop.
 - removed `fpu' since it is unused after this change.

 arch/x86/kernel/fpu/signal.c | 3 ---
 1 file changed, 3 deletions(-)

Index: staging/arch/x86/kernel/fpu/signal.c
===================================================================
--- staging.orig/arch/x86/kernel/fpu/signal.c
+++ staging/arch/x86/kernel/fpu/signal.c
@@ -155,7 +155,6 @@ static inline int copy_fpregs_to_sigfram
  */
 int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
-	struct fpu *fpu = &current->thread.fpu;
 	struct task_struct *tsk = current;
 	int ia32_fxstate = (buf != buf_fx);
 
@@ -173,9 +172,6 @@ int copy_fpstate_to_sigframe(void __user
 	/* Save the live register state to the user directly. */
 	if (copy_fpregs_to_sigframe(buf_fx))
 		return -1;
-	/* Update the thread's fxstate to save the fsave header. */
-	if (ia32_fxstate)
-		copy_fxregs_to_kernel(fpu);
 
 	/* Save the fsave header for the 32-bit frames. */
 	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 07/22] x86/fpu: Remove fpu->initialized
  2019-01-24 13:34   ` Borislav Petkov
@ 2019-02-05 18:03     ` Sebastian Andrzej Siewior
  2019-02-06 14:01       ` Borislav Petkov
  2019-02-05 18:06     ` [PATCH 07/22 v2] " Sebastian Andrzej Siewior
  1 sibling, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-05 18:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-24 14:34:49 [+0100], Borislav Petkov wrote:
> > set it back to one) or don't return to userland.
> > 
> > The context switch code (switch_fpu_prepare() + switch_fpu_finish())
> > can't unconditionally save/restore registers for kernel threads. I have
> > no idea what will happen if we restore a zero FPU context for the kernel
> > thread (since it never was initialized).
> 
> Yeah, avoid those "author is wondering" statements.

So, no more statements where the author is unsure about certain things. Understood.

> > Also it has been agreed that
> > for PKRU we don't want a random state (inherited from the previous task)
> > but a deterministic one.
> 
> Rewrite that to state what the PKRU state is going to be.
I dropped that part. It was part of this patch in an earlier version
but it was moved.

> > For kernel_fpu_begin() (+end) the situation is similar: The kernel test
> > bot told me, that EFI with runtime services uses this before
> > alternatives_patched is true. Which means that this function is used too
> > early and it wasn't the case before.
> > 
> > For those two cases current->mm is used to determine between user &
> > kernel thread.
> 
> Now that we start looking at ->mm, I think we should document this
> somewhere prominently, maybe
> 
>   arch/x86/include/asm/fpu/internal.h
> 
> or so along with all the logic this patchset changes wrt FPU handling.
> Then we wouldn't have to wonder in the future why stuff is being done
> the way it is done.

Well, nothing changes in regard to the logic. Earlier we had a variable
which helped us to distinguish between user & kernel thread. Now we have
a different one. 
I'm going to add a comment to switch_fpu_prepare() about ->mm since you
insist but I would like to avoid it.

> Like the FPU saving on the user stack frame or why this was needed:
> 
> -	/* Update the thread's fxstate to save the fsave header. */
> -	if (ia32_fxstate)
> -		copy_fxregs_to_kernel(fpu);
> 
> Some sort of a high-level invariants written down would save us a lot of
> head scratching in the future.

We have a comment, it is just not helping.

> > diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
> > index 069c04be15076..bd65f6ba950f8 100644
> > --- a/arch/x86/include/asm/trace/fpu.h
> > +++ b/arch/x86/include/asm/trace/fpu.h
> > @@ -13,22 +13,19 @@ DECLARE_EVENT_CLASS(x86_fpu,
> >  
> >  	TP_STRUCT__entry(
> >  		__field(struct fpu *, fpu)
> > -		__field(bool, initialized)
> >  		__field(u64, xfeatures)
> >  		__field(u64, xcomp_bv)
> >  		),
> 
> Yikes, can you do that?
> 
> rostedt has been preaching that adding members at the end of tracepoints
> is ok but not changing them in the middle as that breaks ABI.
> 
> Might wanna ping him about it first.

Steven said on IRC that it can be removed.

> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> > index e43296854e379..3a4668c9d24f1 100644
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -147,10 +147,9 @@ void fpu__save(struct fpu *fpu)
> >  
> >  	preempt_disable();
> >  	trace_x86_fpu_before_save(fpu);
> > -	if (fpu->initialized) {
> > -		if (!copy_fpregs_to_fpstate(fpu)) {
> > -			copy_kernel_to_fpregs(&fpu->state);
> > -		}
> > +
> > +	if (!copy_fpregs_to_fpstate(fpu)) {
> > +		copy_kernel_to_fpregs(&fpu->state);
> >  	}
> 
> WARNING: braces {} are not necessary for single statement blocks
> #217: FILE: arch/x86/kernel/fpu/core.c:151:
> +       if (!copy_fpregs_to_fpstate(fpu)) {
> +               copy_kernel_to_fpregs(&fpu->state);
>         }
removed.

> 
> ...
> 
> > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> > index 7888a41a03cdb..77d9eb43ccac8 100644
> > --- a/arch/x86/kernel/process_32.c
> > +++ b/arch/x86/kernel/process_32.c
> > @@ -288,10 +288,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
> >  	if (prev->gs | next->gs)
> >  		lazy_load_gs(next->gs);
> >  
> > -	switch_fpu_finish(next_fpu, cpu);
> > -
> >  	this_cpu_write(current_task, next_p);
> >  
> > +	switch_fpu_finish(next_fpu, cpu);
> > +
> >  	/* Load the Intel cache allocation PQR MSR. */
> >  	resctrl_sched_in();
> >  
> > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> > index e1983b3a16c43..ffea7c557963a 100644
> > --- a/arch/x86/kernel/process_64.c
> > +++ b/arch/x86/kernel/process_64.c
> > @@ -566,14 +566,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
> >  
> >  	x86_fsgsbase_load(prev, next);
> >  
> > -	switch_fpu_finish(next_fpu, cpu);
> > -
> >  	/*
> >  	 * Switch the PDA and FPU contexts.
> >  	 */
> >  	this_cpu_write(current_task, next_p);
> >  	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
> >  
> > +	switch_fpu_finish(next_fpu, cpu);
> > +
> >  	/* Reload sp0. */
> >  	update_task_stack(next_p);
> >  
> 
> Those moves need at least a comment in the commit message or a separate
> patch.

This needs to be part of this patch. I will add a note to the commit message.

Sebastian

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 07/22 v2] x86/fpu: Remove fpu->initialized
  2019-01-24 13:34   ` Borislav Petkov
  2019-02-05 18:03     ` Sebastian Andrzej Siewior
@ 2019-02-05 18:06     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-05 18:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

The `initialized' member of the fpu struct is always set to one for user
tasks and zero for kernel tasks. This avoids saving/restoring the FPU
registers for kernel threads.

The ->initialized = 0 case for user tasks has been removed in previous
changes, for instance by always doing an explicit init at fork() time for
FPU-less systems, which was otherwise delayed until the first emulated
opcode.

The context switch code (switch_fpu_prepare() + switch_fpu_finish())
can't unconditionally save/restore registers for kernel threads. Not only
would it slow down the switch but it would also load a zeroed xcomp_bv
for XSAVES.

For kernel_fpu_begin() (+end) the situation is similar: EFI with runtime
services uses it before alternatives_patched is true. This means the
function is now used too early, which wasn't the case before.

For those two cases current->mm is used to distinguish between user and
kernel threads. For kernel_fpu_begin() we skip the save/restore of the
FPU registers.
During the context switch into a kernel thread we don't do anything.
There is no reason to save the FPU state of a kernel thread.
The reordering in __switch_to() is important because the current() pointer
needs to be valid before switch_fpu_finish() is invoked, so that ->mm of
the new task is seen instead of the old one's.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2:
  - patch description changes.
  - dropped the braces around a single statement in fpu__save().

 arch/x86/ia32/ia32_signal.c         | 17 +++-----
 arch/x86/include/asm/fpu/internal.h | 15 +++----
 arch/x86/include/asm/fpu/types.h    |  9 ----
 arch/x86/include/asm/trace/fpu.h    |  5 +--
 arch/x86/kernel/fpu/core.c          | 68 ++++++++---------------------
 arch/x86/kernel/fpu/init.c          |  2 -
 arch/x86/kernel/fpu/regset.c        | 19 ++------
 arch/x86/kernel/fpu/xstate.c        |  2 -
 arch/x86/kernel/process_32.c        |  4 +-
 arch/x86/kernel/process_64.c        |  4 +-
 arch/x86/kernel/signal.c            | 17 +++-----
 arch/x86/mm/pkeys.c                 |  7 +--
 12 files changed, 49 insertions(+), 120 deletions(-)

Index: staging/arch/x86/ia32/ia32_signal.c
===================================================================
--- staging.orig/arch/x86/ia32/ia32_signal.c
+++ staging/arch/x86/ia32/ia32_signal.c
@@ -216,8 +216,7 @@ static void __user *get_sigframe(struct
 				 size_t frame_size,
 				 void __user **fpstate)
 {
-	struct fpu *fpu = &current->thread.fpu;
-	unsigned long sp;
+	unsigned long sp, fx_aligned, math_size;
 
 	/* Default to using normal stack */
 	sp = regs->sp;
@@ -231,15 +230,11 @@ static void __user *get_sigframe(struct
 		 ksig->ka.sa.sa_restorer)
 		sp = (unsigned long) ksig->ka.sa.sa_restorer;
 
-	if (fpu->initialized) {
-		unsigned long fx_aligned, math_size;
-
-		sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
-		*fpstate = (struct _fpstate_32 __user *) sp;
-		if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
-				    math_size) < 0)
-			return (void __user *) -1L;
-	}
+	sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
+	*fpstate = (struct _fpstate_32 __user *) sp;
+	if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
+				     math_size) < 0)
+		return (void __user *) -1L;
 
 	sp -= frame_size;
 	/* Align the stack pointer according to the i386 ABI,
Index: staging/arch/x86/include/asm/fpu/internal.h
===================================================================
--- staging.orig/arch/x86/include/asm/fpu/internal.h
+++ staging/arch/x86/include/asm/fpu/internal.h
@@ -525,11 +525,14 @@ static inline void fpregs_activate(struc
  *
  *  - switch_fpu_finish() restores the new state as
  *    necessary.
+ *
+ * The FPU context is only stored/restored for user tasks and ->mm is used to
+ * distinguish between kernel and user threads.
  */
 static inline void
 switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU) && old_fpu->initialized) {
+	if (static_cpu_has(X86_FEATURE_FPU) && current->mm) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
 			old_fpu->last_cpu = -1;
 		else
@@ -537,8 +540,7 @@ switch_fpu_prepare(struct fpu *old_fpu,
 
 		/* But leave fpu_fpregs_owner_ctx! */
 		trace_x86_fpu_regs_deactivated(old_fpu);
-	} else
-		old_fpu->last_cpu = -1;
+	}
 }
 
 /*
@@ -551,12 +553,12 @@ switch_fpu_prepare(struct fpu *old_fpu,
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
-		       new_fpu->initialized;
+	if (static_cpu_has(X86_FEATURE_FPU)) {
+		if (!fpregs_state_valid(new_fpu, cpu)) {
+			if (current->mm)
+				copy_kernel_to_fpregs(&new_fpu->state);
+		}
 
-	if (preload) {
-		if (!fpregs_state_valid(new_fpu, cpu))
-			copy_kernel_to_fpregs(&new_fpu->state);
 		fpregs_activate(new_fpu);
 	}
 }
Index: staging/arch/x86/include/asm/fpu/types.h
===================================================================
--- staging.orig/arch/x86/include/asm/fpu/types.h
+++ staging/arch/x86/include/asm/fpu/types.h
@@ -294,15 +294,6 @@ struct fpu {
 	unsigned int			last_cpu;
 
 	/*
-	 * @initialized:
-	 *
-	 * This flag indicates whether this context is initialized: if the task
-	 * is not running then we can restore from this context, if the task
-	 * is running then we should save into this context.
-	 */
-	unsigned char			initialized;
-
-	/*
 	 * @state:
 	 *
 	 * In-memory copy of all FPU registers that we save/restore
Index: staging/arch/x86/include/asm/trace/fpu.h
===================================================================
--- staging.orig/arch/x86/include/asm/trace/fpu.h
+++ staging/arch/x86/include/asm/trace/fpu.h
@@ -13,22 +13,19 @@ DECLARE_EVENT_CLASS(x86_fpu,
 
 	TP_STRUCT__entry(
 		__field(struct fpu *, fpu)
-		__field(bool, initialized)
 		__field(u64, xfeatures)
 		__field(u64, xcomp_bv)
 		),
 
 	TP_fast_assign(
 		__entry->fpu		= fpu;
-		__entry->initialized	= fpu->initialized;
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
 			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
 			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
 		}
 	),
-	TP_printk("x86/fpu: %p initialized: %d xfeatures: %llx xcomp_bv: %llx",
+	TP_printk("x86/fpu: %p xfeatures: %llx xcomp_bv: %llx",
 			__entry->fpu,
-			__entry->initialized,
 			__entry->xfeatures,
 			__entry->xcomp_bv
 	)
Index: staging/arch/x86/kernel/fpu/core.c
===================================================================
--- staging.orig/arch/x86/kernel/fpu/core.c
+++ staging/arch/x86/kernel/fpu/core.c
@@ -101,7 +101,7 @@ static void __kernel_fpu_begin(void)
 
 	kernel_fpu_disable();
 
-	if (fpu->initialized) {
+	if (current->mm) {
 		/*
 		 * Ignore return value -- we don't care if reg state
 		 * is clobbered.
@@ -116,7 +116,7 @@ static void __kernel_fpu_end(void)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (fpu->initialized)
+	if (current->mm)
 		copy_kernel_to_fpregs(&fpu->state);
 
 	kernel_fpu_enable();
@@ -147,11 +147,10 @@ void fpu__save(struct fpu *fpu)
 
 	preempt_disable();
 	trace_x86_fpu_before_save(fpu);
-	if (fpu->initialized) {
-		if (!copy_fpregs_to_fpstate(fpu)) {
-			copy_kernel_to_fpregs(&fpu->state);
-		}
-	}
+
+	if (!copy_fpregs_to_fpstate(fpu))
+		copy_kernel_to_fpregs(&fpu->state);
+
 	trace_x86_fpu_after_save(fpu);
 	preempt_enable();
 }
@@ -190,7 +189,7 @@ int fpu__copy(struct fpu *dst_fpu, struc
 {
 	dst_fpu->last_cpu = -1;
 
-	if (!src_fpu->initialized || !static_cpu_has(X86_FEATURE_FPU))
+	if (!static_cpu_has(X86_FEATURE_FPU))
 		return 0;
 
 	WARN_ON_FPU(src_fpu != &current->thread.fpu);
@@ -227,14 +226,10 @@ static void fpu__initialize(struct fpu *
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	if (!fpu->initialized) {
-		fpstate_init(&fpu->state);
-		trace_x86_fpu_init_state(fpu);
-
-		trace_x86_fpu_activate_state(fpu);
-		/* Safe to do for the current task: */
-		fpu->initialized = 1;
-	}
+	fpstate_init(&fpu->state);
+	trace_x86_fpu_init_state(fpu);
+
+	trace_x86_fpu_activate_state(fpu);
 }
 
 /*
@@ -247,32 +242,20 @@ static void fpu__initialize(struct fpu *
  *
  * - or it's called for stopped tasks (ptrace), in which case the
  *   registers were already saved by the context-switch code when
- *   the task scheduled out - we only have to initialize the registers
- *   if they've never been initialized.
+ *   the task scheduled out.
  *
  * If the task has used the FPU before then save it.
  */
 void fpu__prepare_read(struct fpu *fpu)
 {
-	if (fpu == &current->thread.fpu) {
+	if (fpu == &current->thread.fpu)
 		fpu__save(fpu);
-	} else {
-		if (!fpu->initialized) {
-			fpstate_init(&fpu->state);
-			trace_x86_fpu_init_state(fpu);
-
-			trace_x86_fpu_activate_state(fpu);
-			/* Safe to do for current and for stopped child tasks: */
-			fpu->initialized = 1;
-		}
-	}
 }
 
 /*
  * This function must be called before we write a task's fpstate.
  *
- * If the task has used the FPU before then invalidate any cached FPU registers.
- * If the task has not used the FPU before then initialize its fpstate.
+ * Invalidate any cached FPU registers.
  *
  * After this function call, after registers in the fpstate are
  * modified and the child task has woken up, the child task will
@@ -289,17 +272,8 @@ void fpu__prepare_write(struct fpu *fpu)
 	 */
 	WARN_ON_FPU(fpu == &current->thread.fpu);
 
-	if (fpu->initialized) {
-		/* Invalidate any cached state: */
-		__fpu_invalidate_fpregs_state(fpu);
-	} else {
-		fpstate_init(&fpu->state);
-		trace_x86_fpu_init_state(fpu);
-
-		trace_x86_fpu_activate_state(fpu);
-		/* Safe to do for stopped child tasks: */
-		fpu->initialized = 1;
-	}
+	/* Invalidate any cached state: */
+	__fpu_invalidate_fpregs_state(fpu);
 }
 
 /*
@@ -316,17 +290,13 @@ void fpu__drop(struct fpu *fpu)
 	preempt_disable();
 
 	if (fpu == &current->thread.fpu) {
-		if (fpu->initialized) {
-			/* Ignore delayed exceptions from user space */
-			asm volatile("1: fwait\n"
-				     "2:\n"
-				     _ASM_EXTABLE(1b, 2b));
-			fpregs_deactivate(fpu);
-		}
+		/* Ignore delayed exceptions from user space */
+		asm volatile("1: fwait\n"
+			     "2:\n"
+			     _ASM_EXTABLE(1b, 2b));
+		fpregs_deactivate(fpu);
 	}
 
-	fpu->initialized = 0;
-
 	trace_x86_fpu_dropped(fpu);
 
 	preempt_enable();
Index: staging/arch/x86/kernel/fpu/init.c
===================================================================
--- staging.orig/arch/x86/kernel/fpu/init.c
+++ staging/arch/x86/kernel/fpu/init.c
@@ -239,8 +239,6 @@ static void __init fpu__init_system_ctx_
 
 	WARN_ON_FPU(!on_boot_cpu);
 	on_boot_cpu = 0;
-
-	WARN_ON_FPU(current->thread.fpu.initialized);
 }
 
 /*
Index: staging/arch/x86/kernel/fpu/regset.c
===================================================================
--- staging.orig/arch/x86/kernel/fpu/regset.c
+++ staging/arch/x86/kernel/fpu/regset.c
@@ -15,16 +15,12 @@
  */
 int regset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
-	struct fpu *target_fpu = &target->thread.fpu;
-
-	return target_fpu->initialized ? regset->n : 0;
+	return regset->n;
 }
 
 int regset_xregset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
-	struct fpu *target_fpu = &target->thread.fpu;
-
-	if (boot_cpu_has(X86_FEATURE_FXSR) && target_fpu->initialized)
+	if (boot_cpu_has(X86_FEATURE_FXSR))
 		return regset->n;
 	else
 		return 0;
@@ -370,16 +366,9 @@ int fpregs_set(struct task_struct *targe
 int dump_fpu(struct pt_regs *regs, struct user_i387_struct *ufpu)
 {
 	struct task_struct *tsk = current;
-	struct fpu *fpu = &tsk->thread.fpu;
-	int fpvalid;
-
-	fpvalid = fpu->initialized;
-	if (fpvalid)
-		fpvalid = !fpregs_get(tsk, NULL,
-				      0, sizeof(struct user_i387_ia32_struct),
-				      ufpu, NULL);
 
-	return fpvalid;
+	return !fpregs_get(tsk, NULL, 0, sizeof(struct user_i387_ia32_struct),
+			   ufpu, NULL);
 }
 EXPORT_SYMBOL(dump_fpu);
 
Index: staging/arch/x86/kernel/fpu/xstate.c
===================================================================
--- staging.orig/arch/x86/kernel/fpu/xstate.c
+++ staging/arch/x86/kernel/fpu/xstate.c
@@ -892,8 +892,6 @@ const void *get_xsave_field_ptr(int xsav
 {
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (!fpu->initialized)
-		return NULL;
 	/*
 	 * fpu__save() takes the CPU's xstate registers
 	 * and saves them off to the 'fpu memory buffer.
Index: staging/arch/x86/kernel/process_32.c
===================================================================
--- staging.orig/arch/x86/kernel/process_32.c
+++ staging/arch/x86/kernel/process_32.c
@@ -288,10 +288,10 @@ __switch_to(struct task_struct *prev_p,
 	if (prev->gs | next->gs)
 		lazy_load_gs(next->gs);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	this_cpu_write(current_task, next_p);
 
+	switch_fpu_finish(next_fpu, cpu);
+
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
 
Index: staging/arch/x86/kernel/process_64.c
===================================================================
--- staging.orig/arch/x86/kernel/process_64.c
+++ staging/arch/x86/kernel/process_64.c
@@ -566,14 +566,14 @@ __switch_to(struct task_struct *prev_p,
 
 	x86_fsgsbase_load(prev, next);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	/*
 	 * Switch the PDA and FPU contexts.
 	 */
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
+	switch_fpu_finish(next_fpu, cpu);
+
 	/* Reload sp0. */
 	update_task_stack(next_p);
 
Index: staging/arch/x86/kernel/signal.c
===================================================================
--- staging.orig/arch/x86/kernel/signal.c
+++ staging/arch/x86/kernel/signal.c
@@ -246,7 +246,7 @@ get_sigframe(struct k_sigaction *ka, str
 	unsigned long sp = regs->sp;
 	unsigned long buf_fx = 0;
 	int onsigstack = on_sig_stack(sp);
-	struct fpu *fpu = &current->thread.fpu;
+	int ret;
 
 	/* redzone */
 	if (IS_ENABLED(CONFIG_X86_64))
@@ -265,11 +265,9 @@ get_sigframe(struct k_sigaction *ka, str
 		sp = (unsigned long) ka->sa.sa_restorer;
 	}
 
-	if (fpu->initialized) {
-		sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
-					  &buf_fx, &math_size);
-		*fpstate = (void __user *)sp;
-	}
+	sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
+				  &buf_fx, &math_size);
+	*fpstate = (void __user *)sp;
 
 	sp = align_sigframe(sp - frame_size);
 
@@ -281,8 +279,8 @@ get_sigframe(struct k_sigaction *ka, str
 		return (void __user *)-1L;
 
 	/* save i387 and extended state */
-	if (fpu->initialized &&
-	    copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size) < 0)
+	ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size);
+	if (ret < 0)
 		return (void __user *)-1L;
 
 	return (void __user *)sp;
@@ -763,8 +761,7 @@ handle_signal(struct ksignal *ksig, stru
 		/*
 		 * Ensure the signal handler starts with the new fpu state.
 		 */
-		if (fpu->initialized)
-			fpu__clear(fpu);
+		fpu__clear(fpu);
 	}
 	signal_setup_done(failed, ksig, stepping);
 }
Index: staging/arch/x86/mm/pkeys.c
===================================================================
--- staging.orig/arch/x86/mm/pkeys.c
+++ staging/arch/x86/mm/pkeys.c
@@ -39,17 +39,12 @@ int __execute_only_pkey(struct mm_struct
 	 * dance to set PKRU if we do not need to.  Check it
 	 * first and assume that if the execute-only pkey is
 	 * write-disabled that we do not have to set it
-	 * ourselves.  We need preempt off so that nobody
-	 * can make fpregs inactive.
+	 * ourselves.
 	 */
-	preempt_disable();
 	if (!need_to_set_mm_pkey &&
-	    current->thread.fpu.initialized &&
 	    !__pkru_allows_read(read_pkru(), execute_only_pkey)) {
-		preempt_enable();
 		return execute_only_pkey;
 	}
-	preempt_enable();
 
 	/*
 	 * Set up PKRU so that it denies access for everything

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/22] x86/fpu: Remove user_fpu_begin()
  2019-01-25 15:18   ` Borislav Petkov
@ 2019-02-05 18:16     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-05 18:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-25 16:18:40 [+0100], Borislav Petkov wrote:
> Reviewed-by: Borislav Petkov <bp@suse.de>
thanks.

> Should we do this microoptimization in addition, to save us the
> activation when the kernel thread here:
> 
> 	taskA -> kernel thread -> taskA
> 
> doesn't call kernel_fpu_begin() and thus fpu_fpregs_owner_ctx remains
> the same?

This might work now but by the end of the series this case will be
handled. The switch
	taskA -> kernel thread

will save taskA's registers. The switch
	kernel thread -> taskA

will only set the TIF_NEED_FPU_LOAD flag to restore the FPU registers on
the return to userland. The load happens only if the ctx pointer is
different.

> It would be a bit more correct as it won't invoke the
> trace_x86_fpu_regs_activated() TP in case the FPU context is the same.

The trace point is not wrong. As of now the same context will be loaded
again.

Sebastian

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 07/22] x86/fpu: Remove fpu->initialized
  2019-02-05 18:03     ` Sebastian Andrzej Siewior
@ 2019-02-06 14:01       ` Borislav Petkov
  2019-02-07 10:13         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-02-06 14:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Tue, Feb 05, 2019 at 07:03:37PM +0100, Sebastian Andrzej Siewior wrote:
> Well, nothing changes in regard to the logic. Earlier we had a variable
> which helped us to distinguish between user & kernel thread. Now we have
> a different one. 
> I'm going to add a comment to switch_fpu_prepare() about ->mm since you
> insist but I would like to avoid it.

I don't understand what that aversion is towards commenting stuff,
especially important stuff like the meaning of the presence of ->mm for
the FPU code. What is the downside to documenting that?

Considering that in this very thread we ourselves encountered the fact
that stuff is not documented and we complained that it wasn't!

> We have a comment, it is just not helping.

Why is it not helping?

> Steven said on IRC that it can be removed.

Did he give an explanation why it is ok?

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 07/22] x86/fpu: Remove fpu->initialized
  2019-02-06 14:01       ` Borislav Petkov
@ 2019-02-07 10:13         ` Sebastian Andrzej Siewior
  2019-02-07 10:37           ` Borislav Petkov
  0 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-07 10:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-02-06 15:01:14 [+0100], Borislav Petkov wrote:
> On Tue, Feb 05, 2019 at 07:03:37PM +0100, Sebastian Andrzej Siewior wrote:
> > Well, nothing changes in regard to the logic. Earlier we had a variable
> > which helped us to distinguish between user & kernel thread. Now we have
> > a different one. 
> > I'm going to add a comment to switch_fpu_prepare() about ->mm since you
> > insist but I would like to avoid it.
> 
> I don't understand what that aversion is towards commenting stuff,
> especially important stuff like the meaning of the presence of ->mm for
> the FPU code. What is the downside to documenting that?

I don't like commenting on obvious things in code, but I might be wrong
about what I consider obvious here. The important part is probably that
we don't save/restore FPU registers for kernel threads, but this isn't
new; it has always been like that (more or less implicitly). The ->mm
part is an implementation detail (and is used in other places).
That said I already added this:
|@@ -525,11 +525,14 @@ static inline void fpregs_activate(struc
|  *
|  *  - switch_fpu_finish() restores the new state as
|  *    necessary.
|+ *
|+ * The FPU context is only stored/restored for user tasks and ->mm is used to
|+ * distinguish between kernel and user threads.
|  */
| static inline void
| switch_fpu_prepare(struct fpu *old_fpu, int cpu)
| {

and I *think* that this is enough. This is *what* we do and not *why*.
I don't have an answer to the *why*.

> Considering that in this very thread we ourselves encountered the fact
> that stuff is not documented and we complained that it wasn't!

Yes. We had no idea why we save the FPU registers on the user's stack
during signal handling. Was this an implementation detail on the kernel
side as part of signal handling, or is this required/expected by the
user as part of a use case? We have now the explanation that signals may
cascade. Do we know by now whether userland is supposed to use it, or
whether it accessed the registers just because they were available?
The MPX code did access the MPX part of the xsave area (others do it for
"testing/debug" as per my Google research). This kind of thing should
be part of the ABI document and not only a comment in the kernel.
Are the MAGIC constants for in-kernel use only (to check whether the user
accidentally overwrote its stack) or should they be checked by the user
during signal handling to ensure that the xsave area is available?

> > We have a comment, it is just not helping.
> 
> Why is it not helping?

The part you referred to was:
|-       /* Update the thread's fxstate to save the fsave header. */
|-       if (ia32_fxstate) 
|-               copy_fxregs_to_kernel(fpu);

and it is not helping because it does not explain why it is done. I can
see based on the code that the FXstate is saved in case of a 32bit
frame. It is saved into the thread's state. It does not explain why it
needs to be done. That is the "not helping" part.

> > Steven said on IRC that it can be removed.
> 
> Did he give an explanation why is it ok?

I can forward you the IRC pieces offlist if you like. He said I can
remove it if there are no users and I am not aware of any. He pointed
out that sched_wakeup had a "success" member which was relied on by
tools so it remained in order not to break them. So we have
	__entry->success        = 1; /* rudiment, kill when possible */

in the tree now. I can loop him in if this is not enough.

> Thx.

Sebastian


* Re: [PATCH 07/22] x86/fpu: Remove fpu->initialized
  2019-02-07 10:13         ` Sebastian Andrzej Siewior
@ 2019-02-07 10:37           ` Borislav Petkov
  0 siblings, 0 replies; 91+ messages in thread
From: Borislav Petkov @ 2019-02-07 10:37 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Thu, Feb 07, 2019 at 11:13:01AM +0100, Sebastian Andrzej Siewior wrote:
> and I *think* that this is enough. This is *what* we do and not *why*.
> I don't have an answer to the *why*.

Well, it is a start.

You now have everything in your L1 and it is all clear, but I'm sure all
the details will be LRU-evicted soon :) and then you'll wish you'd
written down at least a small hint explaining the grand scheme.

> 
> > Considering that in this very thread we ourselves encountered the fact
> > that stuff is not documented and we complained that it wasn't!
> 
> Yes. We had no idea why we save the FPU registers on the user's stack
> during signal handling. Was this an implementation detail on the kernel
> side as part of signal handling, or is this required/expected by the
> user as part of a use case?

Well, at least a comment over get_sigframe() would've helped a long way,
right?

Instead of scratching our heads over why this is being done this way.

> We have now the explanation that signals may cascade. Do we know by now
> whether userland is supposed to use it, or whether it accessed the
> registers just because they were available? The MPX code did access the
> MPX part of the xsave area (others do it for "testing/debug" as per my
> Google research). This kind of thing should be part of the ABI document
> and not only a comment in the kernel.

Absolutely agreed.

> Are the MAGIC constants for in-kernel use only (to check whether the user
> accidentally overwrote its stack) or should they be checked by the user
> during signal handling to ensure that the xsave area is available?

I don't think the user should care but what do I know?!

> The part you referred to was:
> |-       /* Update the thread's fxstate to save the fsave header. */
> |-       if (ia32_fxstate) 
> |-               copy_fxregs_to_kernel(fpu);
> 
> and it is not helping because it does not explain why it is done. I can
> see based on the code that the FXstate is saved in case of a 32bit
> frame. It is saved into the thread's state. It does not explain why it
> needs to be done. That is the "not helping" part.

This is *exactly* why I propose that we should have a
"grand-scheme-of-things" explanation somewhere about what we're doing
with the FPU context.

Figuring out what exactly to do in which context should be easier then.
I hope.

> I can forward you the IRC pieces offlist if you like. He said I can
> remove it if there are no users and I am not aware of any. He pointed
> out that sched_wakeup had a "success" member which was relied on by
> tools so it remained in order not to break them. So we have
> 	__entry->success        = 1; /* rudiment, kill when possible */
> 
> in the tree now. I can loop him in if this is not enough.

So you're replacing the old member with the new, AFAICT, and I guess
that doesn't change offsets, so even tools which don't use libtraceevent
should be fine. But we'd better make sure before we break userspace,
because we don't break userspace :)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 09/22] x86/fpu: Add (__)make_fpregs_active helpers
  2019-01-28 18:23   ` Borislav Petkov
@ 2019-02-07 10:43     ` Sebastian Andrzej Siewior
  2019-02-13  9:30       ` Borislav Petkov
  0 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-07 10:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-28 19:23:49 [+0100], Borislav Petkov wrote:
> > diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
> > index b56d504af6545..31b66af8eb914 100644
> > --- a/arch/x86/include/asm/fpu/api.h
> > +++ b/arch/x86/include/asm/fpu/api.h
> > @@ -10,6 +10,7 @@
> >  
> >  #ifndef _ASM_X86_FPU_API_H
> >  #define _ASM_X86_FPU_API_H
> > +#include <linux/preempt.h>
> >  
> >  /*
> >   * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
> > @@ -22,6 +23,16 @@ extern void kernel_fpu_begin(void);
> >  extern void kernel_fpu_end(void);
> >  extern bool irq_fpu_usable(void);
> >  
> > +static inline void __fpregs_changes_begin(void)
> > +{
> > +	preempt_disable();
> > +}
> > +
> > +static inline void __fpregs_changes_end(void)
> 
> How am I to understand that "fpregs_changes" thing? That FPU registers
> changes will begin and end respectively?

correct.

> I probably would call them fpregs_lock and fpregs_unlock even if
> it isn't doing any locking to denote that FPU regs are locked and
> inaccessible inside the region.

They are accessible inside the region. But they should not be touched by
context switch code (and later BH).
Is that what you meant?

> And why the "__" prefix? Is there a counterpart without the "__" coming?

No. When I picked up the patches, that function was already named like
that and I kept it. The __ probably denotes that it is an internal
function, but then it has to be usable outside (by KVM) if they plan to
"reload" registers (which happens when they switch between host/guest
registers).

> > +{
> > +	preempt_enable();
> > +}
> > +
> >  /*
> >   * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
> >   *
> > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > index 03acb9aeb32fc..795a0a2df135e 100644
> > --- a/arch/x86/include/asm/fpu/internal.h
> > +++ b/arch/x86/include/asm/fpu/internal.h
> > @@ -515,6 +515,15 @@ static inline void fpregs_activate(struct fpu *fpu)
> >  	trace_x86_fpu_regs_activated(fpu);
> >  }
> >  
> > +static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
> > +{
> > +	if (!fpregs_state_valid(fpu, cpu)) {
> > +		if (current->mm)
> > +			copy_kernel_to_fpregs(&fpu->state);
> > +		fpregs_activate(fpu);
> > +	}
> > +}
> > +
> >  /*
> >   * FPU state switching for scheduling.
> >   *
> > @@ -550,14 +559,8 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
> >   */
> >  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
> >  {
> > -	if (static_cpu_has(X86_FEATURE_FPU)) {
> > -		if (!fpregs_state_valid(new_fpu, cpu)) {
> > -			if (current->mm)
> > -				copy_kernel_to_fpregs(&new_fpu->state);
> > -		}
> > -
> > -		fpregs_activate(new_fpu);
> > -	}
> > +	if (static_cpu_has(X86_FEATURE_FPU))
> > +		__fpregs_load_activate(new_fpu, cpu);
> 
> And that second part of a cleanup looks strange in this patch. Why isn't
> it in a separate patch or how is it related to the addition of the
> helpers?

Two helpers are added:
- __fpregs_changes_{begin|end}()
  new.

- __fpregs_load_activate()
  refactored from switch_fpu_finish().

> Thx.
> 

Sebastian


* Re: [PATCH 11/22] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use feature number instead of mask
  2019-01-28 18:49   ` Borislav Petkov
@ 2019-02-07 11:13     ` Sebastian Andrzej Siewior
  2019-02-13  9:31       ` Borislav Petkov
  0 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-07 11:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-28 19:49:59 [+0100], Borislav Petkov wrote:
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -830,15 +830,15 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
> > -void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
> > +void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
> >  {
> > -	int xfeature_nr;
> > +	u64 xfeature_mask = 1ULL << xfeature_nr;
> 
> You can paste directly BIT_ULL(xfeature_nr) where you need it in this
> function...
changed.

> > @@ -850,11 +850,11 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
> >  	 * have not enabled.  Remember that pcntxt_mask is
> >  	 * what we write to the XCR0 register.
> >  	 */
> > -	WARN_ONCE(!(xfeatures_mask & xstate_feature),
> > +	WARN_ONCE(!(xfeatures_mask & xfeature_mask),
> 
> ... and turn this into:
> 
> 	WARN_ONCE(!(xfeatures_mask & BIT_ULL(xfeature_nr))
>
> which is more readable than the AND of two variables which I had to
> re-focus my eyes to see the difference. :)
> 
you mean with vs without the `s' ?

> Oh and this way, gcc generates better code by doing simply a BT
> directly:
> 
> # arch/x86/kernel/fpu/xstate.c:852:     WARN_ONCE(!(xfeatures_mask & BIT_ULL(xfeature_nr)),
>         .loc 1 852 2 view .LVU258
>         movq    xfeatures_mask(%rip), %rax      # xfeatures_mask, tmp124
>         btq     %rsi, %rax      # xfeature_nr, tmp124

Interesting. gcc should know that it can use btq or shift+and because
it has all the raw data.
Anyway, I replaced the two users of xfeature_mask with
BIT_ULL(xfeature_nr).

> Thx.

Sebastian


* Re: [PATCH 12/22] x86/fpu: Only write PKRU if it is different from current
  2019-01-23 18:09   ` Dave Hansen
@ 2019-02-07 11:27     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-07 11:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-23 10:09:24 [-0800], Dave Hansen wrote:
> On 1/9/19 3:47 AM, Sebastian Andrzej Siewior wrote:
> > +static inline void __write_pkru(u32 pkru)
> > +{
> > +	/*
> > +	 * Writing PKRU is expensive. Only write the PKRU value if it is
> > +	 * different from the current one.
> > +	 */
> 
> I'd say:
> 
> 	WRPKRU is relatively expensive compared to RDPKRU.
> 	Avoid WRPKRU when it would not change the value.
> 
> In the grand scheme of things, WRPKRU is cheap.  It's certainly not an
> "expensive instruction" compared to things like WBINVD.

Okay.

> > +	if (pkru == __read_pkru())
> > +		return;
> > +	__write_pkru_insn(pkru);
> > +}
> 
> Is there a case where we need __write_pkru_insn() directly?  Why not
> just put the inline assembly in here?

There is no user of __write_pkru_insn(). I had one in the past I think.
Let me merge it for now.

Sebastian


* Re: [PATCH 15/22] x86/entry: Add TIF_NEED_FPU_LOAD
  2019-01-30 11:55   ` Borislav Petkov
@ 2019-02-07 11:49     ` Sebastian Andrzej Siewior
  2019-02-13  9:35       ` Borislav Petkov
  0 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-07 11:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-30 12:55:07 [+0100], Borislav Petkov wrote:
> This definitely needs to be written somewhere in
> 
> arch/x86/include/asm/fpu/internal.h
> 
> or where we decide to put the FPU handling rules.

Added:

Index: staging/arch/x86/include/asm/fpu/internal.h
===================================================================
--- staging.orig/arch/x86/include/asm/fpu/internal.h
+++ staging/arch/x86/include/asm/fpu/internal.h
@@ -537,6 +537,12 @@ static inline void __fpregs_load_activat
  *
  * The FPU context is only stored/restored for user tasks and ->mm is used to
  * distinguish between kernel and user threads.
+ *
+ * If TIF_NEED_FPU_LOAD is cleared then CPU's FPU registers are holding the
+ * current content of current()'s FPU register state.
+ * If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not hold current()'s
+ * FPU registers. It is required to load the register before returning to
+ * userland or using the content otherwise.
  */
 static inline void
 switch_fpu_prepare(struct fpu *old_fpu, int cpu)

Sebastian


* Re: [PATCH 16/22] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  2019-01-30 11:43   ` Borislav Petkov
@ 2019-02-07 13:28     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-07 13:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-30 12:43:22 [+0100], Borislav Petkov wrote:
> > @@ -171,9 +156,15 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
> >  			sizeof(struct user_i387_ia32_struct), NULL,
> >  			(struct _fpstate_32 __user *) buf) ? -1 : 1;
> >  
> > -	/* Save the live register state to the user directly. */
> > -	if (copy_fpregs_to_sigframe(buf_fx))
> > -		return -1;
> > +	copy_fpregs_to_fpstate(fpu);
> > +
> > +	if (using_compacted_format()) {
> > +		copy_xstate_to_user(buf_fx, xsave, 0, size);
> > +	} else {
> > +		fpstate_sanitize_xstate(fpu);
> > +		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
> > +			return -1;
> > +	}
> >  
> >  	/* Save the fsave header for the 32-bit frames. */
> >  	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))
> 
> Comments above that function need updating.

Did:
- * Save the state directly to the user frame pointed by the aligned pointer
- * 'buf_fx'.
+ * Save the state to task's fpu->state and then copy it to the user frame
+ * pointed by the aligned pointer 'buf_fx'.

Sebastian


* Re: [PATCH 17/22] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
  2019-01-30 12:53       ` Borislav Petkov
@ 2019-02-07 14:10         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-07 14:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-30 13:53:51 [+0100], Borislav Petkov wrote:
> > I've been asked to add comment above the sequence so it is understood. I
> > think the general approach is easy to follow once the concept is
> > understood. I don't mind renaming the TIF_ thingy once again (it
> > happend once or twice and I think the current one was suggested by Andy
> > unless I mixed things up).
> > The problem I have with the above is that
> > 
> > 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
> > 		do_that()
> > 
> > becomes
> > 	if (!test_thread_flag(TIF_FPU_REGS_VALID))
> > 		do_that()
> 
> Err, above it becomes
> 
> 	if (test_thread_flag(TIF_FPU_REGS_VALID))
> 		copy_fpregs_to_fpstate(fpu);

Your example above, yes. But the reverse case
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		do_that()

becomes
 	if (!test_thread_flag(TIF_FPU_REGS_VALID))
 		do_that()

> without the "!". I.e., CPU's FPU regs are valid and we need to save them.
> 
> Or am I misreading the comment above?

Your example is correct. But in the opposite case, where the ! was not
there before, we have to add it.

> > and you could argue again the other way around. So do we want NEED_LOAD
> > or NEED_SAVE flag which is another way of saying REGS_VALID?
> 
> All fine and dandy except NEED_FPU_LOAD is ambiguous to me: we need to
> load them where? Into the CPU? Or into the FPU state save area?

If you need to LOAD, it goes from the task's save area into the CPU
state. If you need to SAVE, it goes from the CPU state into the task's
save area.

> TIF_FPU_REGS_VALID is clearer in the sense that the CPU's FPU registers
> are currently valid for the current task. As there are no other FPU
> registers except the CPU's.

hmmm. I think it is just taste / habit.

> > More importantly the logic is changed when the bit is set and this
> > requires more thinking than just doing sed on the patch series.
> 
> Sure.
> 
> And I'll get accustomed to the logic whatever the name is - this is just
> a "wouldn't it be better" thing.

If it were just a naming thing then I probably wouldn't mind. But
swapping the logic might break things, so I try to avoid that.

> Thx.
> 

Sebastian


* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-01-30 12:27     ` Borislav Petkov
@ 2019-02-08 13:12       ` Sebastian Andrzej Siewior
  2019-02-13 15:54         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-08 13:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-01-30 13:27:13 [+0100], Borislav Petkov wrote:
> On Wed, Jan 30, 2019 at 01:06:47PM +0100, Sebastian Andrzej Siewior wrote:
> > I don't know if hackbench would show anything besides noise.
> 
> Yeah, if a sensible benchmark (dunno if hackbench is among them :))
> shows no difference, is also saying something.

"hackbench -g80 -l 1000 -s 255" shows just noise. I don't see any
reasonable difference with or without the series.

Tracing. The following patch

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index c5a6edd92de4f..aa1914e5bf5c0 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -292,6 +292,7 @@ struct fpu {
 	 * FPU state should be reloaded next time the task is run.
 	 */
 	unsigned int			last_cpu;
+	unsigned int avoided_loads;
 
 	/*
 	 * @state:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index c98c54e796186..7560942a550ed 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -358,9 +358,11 @@ void fpu__clear(struct fpu *fpu)
  */
 void switch_fpu_return(void)
 {
+	struct fpu *fpu = &current->thread.fpu;
+
 	if (!static_cpu_has(X86_FEATURE_FPU))
 		return;
-
+	fpu->avoided_loads = 0;
 	__fpregs_load_activate();
 }
 EXPORT_SYMBOL_GPL(switch_fpu_return);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 37b2ecef041e6..875f74b1e8779 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -522,6 +522,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
 		switch_fpu_prepare(prev_fpu, cpu);
+	else if (current->mm) {
+		prev_fpu->avoided_loads++;
+		trace_printk("skipped save %d\n", prev_fpu->avoided_loads);
+	}
 
 	/* We must save %fs and %gs before load_TLS() because
 	 * %fs and %gs may be cleared by load_TLS().

should help to spot the optimization. So if TIF_NEED_FPU_LOAD is set at
this point then between this and the last invocation of schedule() we
haven't been in userland and so we avoided loading + storing of FPU
registers. I saw things like:

|  http-1935  [001] d..2   223.460434: sched_switch: prev_comm=http prev_pid=1935 prev_prio=120 prev_state=R+ ==> next_comm=apt next_pid=1931 next_prio=120
|   apt-1931  [001] d..2   223.460680: sched_switch: prev_comm=apt prev_pid=1931 prev_prio=120 prev_state=D ==> next_comm=http next_pid=1935 next_prio=120
|  http-1935  [001] d..2   223.460729: sched_switch: prev_comm=http prev_pid=1935 prev_prio=120 prev_state=R+ ==> next_comm=apt next_pid=1931 next_prio=120
|  http-1935  [001] d..2   223.460732: __switch_to: skipped save 1
|   apt-1931  [001] d..2   223.461076: sched_switch: prev_comm=apt prev_pid=1931 prev_prio=120 prev_state=D ==> next_comm=http next_pid=1935 next_prio=120
|  http-1935  [001] d..2   223.461111: sched_switch: prev_comm=http prev_pid=1935 prev_prio=120 prev_state=R+ ==> next_comm=apt next_pid=1931 next_prio=120
|  http-1935  [001] d..2   223.461112: __switch_to: skipped save 2

which means we avoided loading FPU registers for `http' because for some
reason it was not required. Here we switched between two user tasks so
without the patches we would have to save and restore them.

I captured a few instances of something like:

|  rcu_preempt-10    [000] d..2  1032.867293: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=98 prev_state=I ==> next_comm=kswapd0 next_pid=536 next_prio=120
|          apt-1954  [001] d..2  1032.867435: sched_switch: prev_comm=apt prev_pid=1954 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:0 next_pid=1943 next_prio=120
|          apt-1954  [001] d..2  1032.867436: __switch_to: skipped save 30
|  kworker/1:0-1943  [001] d..2  1032.867455: sched_switch: prev_comm=kworker/1:0 prev_pid=1943 prev_prio=120 prev_state=I ==> next_comm=apt next_pid=1954 next_prio=120
|          apt-1954  [001] d..2  1032.867459: sched_switch: prev_comm=apt prev_pid=1954 prev_prio=120 prev_state=D ==> next_comm=swapper/1 next_pid=0 next_prio=120
|          apt-1954  [001] d..2  1032.867460: __switch_to: skipped save 31

Restoring and saving the FPU registers of `apt' was avoided 31 times
(in a row). This isn't 100% accurate: we switched from `apt' to a kernel
thread and back, so switch_fpu_finish() wouldn't load the registers
anyway, because the switch to the kernel thread (switch_fpu_prepare())
would not destroy them. *However*, the switch away from `apt' would save
the FPU registers, and that is what we avoid (the current code always
saves the FPU registers on context switch, see switch_fpu_prepare()).
My understanding is that if the CPU supports `xsaves' then it wouldn't
save anything in this scenario, because the CPU would notice that its
FPU state didn't change since last time, so there is nothing to save.

Then we have lat_sig [0]. Without the series 64bit:
|Signal handler overhead: 2.6839 microseconds
|Signal handler overhead: 2.6996 microseconds
|Signal handler overhead: 2.6821 microseconds

with the series:
|Signal handler overhead: 3.2976 microseconds
|Signal handler overhead: 3.3033 microseconds
|Signal handler overhead: 3.2980 microseconds

that is approximately 22% worse. Without the series 64bit kernel with
32bit binary:
| Signal handler overhead: 3.8139 microseconds
| Signal handler overhead: 3.8035 microseconds
| Signal handler overhead: 3.8127 microseconds

with the series:
| Signal handler overhead: 4.0434 microseconds
| Signal handler overhead: 4.0438 microseconds
| Signal handler overhead: 4.0408 microseconds

approximately 6% worse. I'm a little surprised by the 32bit case
because it did the save+copy even before (while the 64bit path saved
directly to the signal stack).

If we restore directly from signal stack (instead the copy_from_user())
we get to (64bit only):
| Signal handler overhead: 3.0376 microseconds
| Signal handler overhead: 3.0687 microseconds
| Signal handler overhead: 3.0510 microseconds

and if additionally save the registers to the signal stack:
| Signal handler overhead: 2.7835 microseconds
| Signal handler overhead: 2.7850 microseconds
| Signal handler overhead: 2.7766 microseconds

then we get back almost to where we started. I will run a
commit-by-commit benchmark to see if I notice anything.
Ach, and this was PREEMPT on a
|x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.

machine. So those with AVX-512 might be worse but I don't have any of
those.

[0] Part of lmbench, test
    taskset 2 /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_sig -P 1 -W 64 -N 5000 catch

Sebastian


* Re: [PATCH 09/22] x86/fpu: Add (__)make_fpregs_active helpers
  2019-02-07 10:43     ` Sebastian Andrzej Siewior
@ 2019-02-13  9:30       ` Borislav Petkov
  2019-02-14 14:51         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-02-13  9:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Thu, Feb 07, 2019 at 11:43:25AM +0100, Sebastian Andrzej Siewior wrote:
> They are accessible inside the region. But they should not be touched by
> context switch code (and later BH).
> Is that what you meant?

I just don't like that "changes" name. So when called, those functions
practically lock the FPU regs from being accessed by others. So with

fpregs_lock
fpregs_unlock

for example, is kinda clear what's going on and you don't have to wonder
what it does.

> No. I picked up the patches, that function was named like that. I kept
> it. That __ probably denotes that it is an internal function but then it
> has to be used outside (KVM) if they plan to "reload" registers (which
> happens if they switch between host/guest registers).

Ok, so you can drop the "__".

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 11/22] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use feature number instead of mask
  2019-02-07 11:13     ` Sebastian Andrzej Siewior
@ 2019-02-13  9:31       ` Borislav Petkov
  0 siblings, 0 replies; 91+ messages in thread
From: Borislav Petkov @ 2019-02-13  9:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Thu, Feb 07, 2019 at 12:13:40PM +0100, Sebastian Andrzej Siewior wrote:
> you mean with vs without the `s' ?

Yahaa. :)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 15/22] x86/entry: Add TIF_NEED_FPU_LOAD
  2019-02-07 11:49     ` Sebastian Andrzej Siewior
@ 2019-02-13  9:35       ` Borislav Petkov
  2019-02-14 15:28         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Borislav Petkov @ 2019-02-13  9:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Thu, Feb 07, 2019 at 12:49:42PM +0100, Sebastian Andrzej Siewior wrote:
> On 2019-01-30 12:55:07 [+0100], Borislav Petkov wrote:
> > This definitely needs to be written somewhere in
> > 
> > arch/x86/include/asm/fpu/internal.h
> > 
> > or where we decide to put the FPU handling rules.
> 
> Added:
> 
> Index: staging/arch/x86/include/asm/fpu/internal.h
> ===================================================================
> --- staging.orig/arch/x86/include/asm/fpu/internal.h
> +++ staging/arch/x86/include/asm/fpu/internal.h
> @@ -537,6 +537,12 @@ static inline void __fpregs_load_activat
>   *
>   * The FPU context is only stored/restored for user tasks and ->mm is used to
>   * distinguish between kernel and user threads.
> + *
> + * If TIF_NEED_FPU_LOAD is cleared then CPU's FPU registers are holding the
> + * current content of current()'s FPU register state.

"current content of current" - that's a lot of c...

Make that

"... then the CPU's FPU registers are mirrored in the current thread's
FPU registers state."

> + * If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not hold current()'s
> + * FPU registers. It is required to load the register before returning to
						^^^^^^^^

s/register/registers/ - plural.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v6] x86: load FPU registers on return to userland
  2019-02-08 13:12       ` Sebastian Andrzej Siewior
@ 2019-02-13 15:54         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-13 15:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-02-08 14:12:33 [+0100], To Borislav Petkov wrote:
> Then we have lat_sig [0]. Without the series 64bit:
> |Signal handler overhead: 2.6839 microseconds
> |Signal handler overhead: 2.6996 microseconds
> |Signal handler overhead: 2.6821 microseconds
> 
> with the series:
> |Signal handler overhead: 3.2976 microseconds
> |Signal handler overhead: 3.3033 microseconds
> |Signal handler overhead: 3.2980 microseconds

Did a patch-by-patch run (64bit only, server preemption model, output in
us ("commit")):

2.368 ("Linux 5.0-rc5")

2.603 ("x86/fpu: Always store the registers in copy_fpstate_to_sigframe()")
  copy_fpstate_to_sigframe() stores to the thread's FPU area and then copies
  it to the user stack area.

2.668 ("x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD")
  this should be noise since preempt_disable()/preempt_enable() is a nop
  here; test_thread_flag() isn't.

2.701 ("x86/fpu: Inline copy_user_to_fpregs_zeroing()")
  This pops up somehow but is simply code movement.

3.474 ("x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory")
  This stands out. There is a kmalloc() + save to kernel memory + copy,
  instead of a direct save to the kernel stack.

2.928 ("x86/fpu: Defer FPU state load until return to userspace")
  The kmalloc() has been removed. Just "copy-to-kernel-memory" and
  copy_to_user() remained.

So this looks like 0.3us for the save+copy and 0.3us for the copy+restore.
The numbers for the preempt/low-latency-desktop models show the same two
spikes and the same drop at the end.

Sebastian


* Re: [PATCH 09/22] x86/fpu: Add (__)make_fpregs_active helpers
  2019-02-13  9:30       ` Borislav Petkov
@ 2019-02-14 14:51         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-14 14:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-02-13 10:30:25 [+0100], Borislav Petkov wrote:
> On Thu, Feb 07, 2019 at 11:43:25AM +0100, Sebastian Andrzej Siewior wrote:
> > They are accessible inside the region. But they should not be touched by
> > context switch code (and later BH).
> > Is that what you meant?
> 
> I just don't like that "changes" name. So when called, those functions
> practically lock the FPU regs from being accessed by others. So with
> 
> fpregs_lock
> fpregs_unlock
> 
> for example, is kinda clear what's going on and you don't have to wonder
> what it does.

renamed as suggested.

Sebastian


* Re: [PATCH 15/22] x86/entry: Add TIF_NEED_FPU_LOAD
  2019-02-13  9:35       ` Borislav Petkov
@ 2019-02-14 15:28         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-14 15:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-02-13 10:35:53 [+0100], Borislav Petkov wrote:
…
> > + *
> > + * If TIF_NEED_FPU_LOAD is cleared then CPU's FPU registers are holding the
> > + * current content of current()'s FPU register state.
> 
> "current content of current" - that's a lot of c...
> 
> Make that
> 
> "... then the CPU's FPU registers are mirrored in the current thread's
> FPU registers state."

Replaced `mirrored' with saved:

+ * If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers are saved in
+ * the current thread's FPU registers state.
+ * If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not hold current()'s
+ * FPU registers. It is required to load the registers before returning to
+ * userland or using the content otherwise.

Sebastian


* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-02-05 11:17               ` Sebastian Andrzej Siewior
@ 2019-02-26 16:38                 ` Oleg Nesterov
  2019-03-08 18:12                   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 91+ messages in thread
From: Oleg Nesterov @ 2019-02-26 16:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Dave Hansen, Borislav Petkov, Ingo Molnar, linux-kernel, x86,
	Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

Hi Sebastian,

Sorry, I just noticed your email...

On 02/05, Sebastian Andrzej Siewior wrote:
>
> On 2019-01-21 12:21:17 [+0100], Oleg Nesterov wrote:
> > > This is part of our ABI for *sure*.  Inspecting that state is how
> > > userspace makes sense of MPX or protection keys faults.  We even use
> > > this in selftests/.
> >
> > Yes.
> >
> > And in any case I do not understand the idea to use the second in-kernel struct fpu.
> > A signal handler can be interrupted by another signal, this will need to save/restore
> > the FPU state again.
>
> So I assumed that while SIGUSR1 is handled SIGUSR2 will wait until the
> current signal is handled. So no interruption. But then SIGSEGV is
> probably the exception which will interrupt SIGUSR1. So we would need a
> third one…

I guess you do not need my answer, but just in case.

SIGSEGV is not an exception. A SIGUSR1 handler can be interrupted by any other
signal which is not included in sigaction->sa_mask. Even SIGUSR1 can interrupt
the handler if SA_NODEFER was used.


> The idea was to save the FPU state in-kernel so we don't have to
> revalidate everything because userspace had access to it and could do
> things.

I understand, but this simply can't work, see above.

Oleg.



* Re: [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-02-26 16:38                 ` Oleg Nesterov
@ 2019-03-08 18:12                   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-03-08 18:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Borislav Petkov, Ingo Molnar, linux-kernel, x86,
	Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2019-02-26 17:38:22 [+0100], Oleg Nesterov wrote:
> Hi Sebastian,
Hi Oleg,

> Sorry, I just noticed your email...

no worries.

> > So I assumed that while SIGUSR1 is handled SIGUSR2 will wait until the
> > current signal is handled. So no interruption. But then SIGSEGV is
> > probably the exception which will interrupt SIGUSR1. So we would need a
> > third one…
> 
> I guess you do not need my answer, but just in case.
> 
> SIGSEGV is not an exception. A SIGUSR1 handler can be interrupted by any other
> signal which is not included in sigaction->sa_mask. Even SIGUSR1 can interrupt
> the handler if SA_NODEFER was used.

okay, understood. My assumption was that since signal delivery is not very
deterministic, and you can't reliably control whether one signal arrives
before the other's handler has finished, it should not matter.
But well, this is all gone now…

Thank you for the explanation.

> Oleg.

Sebastian


* [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  2019-02-21 11:49 [PATCH v7] " Sebastian Andrzej Siewior
@ 2019-02-21 11:50 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 91+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-21 11:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

With lazy-FPU support the (since renamed) variable ->initialized was set to true
if the CPU's FPU registers were holding a valid state for the active process. If
it was set to false then the FPU state was saved in fpu->state and the FPU was
deactivated.
With lazy-FPU gone, ->initialized is always true for user threads, and kernel
threads never invoke this function, so ->initialized is always true in
copy_fpstate_to_sigframe().
The using_compacted_format() check is also a leftover from the lazy-FPU time:
in the `->initialized == false' case copy_to_user() would copy the compacted
buffer while userland would expect the non-compacted format instead. So in
order to save the FPU state in the non-compacted form it issues the xsave
opcode to save the *current* FPU state.
The FPU is not enabled, so the attempt raises the FPU trap, the trap restores
the FPU content, re-enables the FPU, and the xsave opcode is invoked again and
succeeds. *This* no longer works since commit

  bef8b6da9522 ("x86/fpu: Handle #NM without FPU emulation as an error")

Remove the check for ->initialized because it is always true, and drop the
false branch. Update the comment to reflect that the state is always live.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 35 ++++++++---------------------------
 1 file changed, 8 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a874931edf6a9..de83d0ed9e14e 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -144,9 +144,8 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
  *	buf == buf_fx for 64-bit frames and 32-bit fsave frame.
  *	buf != buf_fx for 32-bit frames with fxstate.
  *
- * If the fpu, extended register state is live, save the state directly
- * to the user frame pointed by the aligned pointer 'buf_fx'. Otherwise,
- * copy the thread's fpu state to the user frame starting at 'buf_fx'.
+ * Save the state directly to the user frame pointed by the aligned pointer
+ * 'buf_fx'.
  *
  * If this is a 32-bit frame with fxstate, put a fsave header before
  * the aligned state at 'buf_fx'.
@@ -157,7 +156,6 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct fpu *fpu = &current->thread.fpu;
-	struct xregs_state *xsave = &fpu->state.xsave;
 	struct task_struct *tsk = current;
 	int ia32_fxstate = (buf != buf_fx);
 
@@ -172,29 +170,12 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	if (fpu->initialized || using_compacted_format()) {
-		/* Save the live register state to the user directly. */
-		if (copy_fpregs_to_sigframe(buf_fx))
-			return -1;
-		/* Update the thread's fxstate to save the fsave header. */
-		if (ia32_fxstate)
-			copy_fxregs_to_kernel(fpu);
-	} else {
-		/*
-		 * It is a *bug* if kernel uses compacted-format for xsave
-		 * area and we copy it out directly to a signal frame. It
-		 * should have been handled above by saving the registers
-		 * directly.
-		 */
-		if (boot_cpu_has(X86_FEATURE_XSAVES)) {
-			WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
-			return -1;
-		}
-
-		fpstate_sanitize_xstate(fpu);
-		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
-			return -1;
-	}
+	/* Save the live register state to the user directly. */
+	if (copy_fpregs_to_sigframe(buf_fx))
+		return -1;
+	/* Update the thread's fxstate to save the fsave header. */
+	if (ia32_fxstate)
+		copy_fxregs_to_kernel(fpu);
 
 	/* Save the fsave header for the 32-bit frames. */
 	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))
-- 
2.20.1



end of thread, other threads:[~2019-03-08 18:12 UTC | newest]

Thread overview: 91+ messages
2019-01-09 11:47 [PATCH v6] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 01/22] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
2019-01-14 16:24   ` Borislav Petkov
2019-02-05 10:08     ` Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 02/22] x86/fpu: Remove fpu__restore() Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 03/22] x86/fpu: Remove preempt_disable() in fpu__clear() Sebastian Andrzej Siewior
2019-01-14 18:55   ` Borislav Petkov
2019-01-09 11:47 ` [PATCH 04/22] x86/fpu: Always init the `state' " Sebastian Andrzej Siewior
2019-01-14 19:32   ` Borislav Petkov
2019-01-09 11:47 ` [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
2019-01-16 19:36   ` Borislav Petkov
2019-01-16 22:40     ` Sebastian Andrzej Siewior
2019-01-17 12:22       ` Borislav Petkov
2019-01-18 21:14         ` Sebastian Andrzej Siewior
2019-01-18 21:17           ` Dave Hansen
2019-01-18 21:37             ` Sebastian Andrzej Siewior
2019-01-18 21:43               ` Dave Hansen
2019-01-21 11:21             ` Oleg Nesterov
2019-01-22 13:40               ` Borislav Petkov
2019-01-22 16:15                 ` Oleg Nesterov
2019-01-22 17:00                   ` Borislav Petkov
2019-02-05 11:34                     ` Sebastian Andrzej Siewior
2019-02-05 11:17               ` Sebastian Andrzej Siewior
2019-02-26 16:38                 ` Oleg Nesterov
2019-03-08 18:12                   ` Sebastian Andrzej Siewior
2019-02-05 14:37         ` [PATCH 05/22 v2] " Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 06/22] x86/fpu: Don't save fxregs for ia32 frames " Sebastian Andrzej Siewior
2019-01-24 11:17   ` Borislav Petkov
2019-02-05 16:43     ` [PATCH 06/22 v2] x86/fpu: Don't save fxregs for ia32 frames in Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 07/22] x86/fpu: Remove fpu->initialized Sebastian Andrzej Siewior
2019-01-24 13:34   ` Borislav Petkov
2019-02-05 18:03     ` Sebastian Andrzej Siewior
2019-02-06 14:01       ` Borislav Petkov
2019-02-07 10:13         ` Sebastian Andrzej Siewior
2019-02-07 10:37           ` Borislav Petkov
2019-02-05 18:06     ` [PATCH 07/22 v2] " Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 08/22] x86/fpu: Remove user_fpu_begin() Sebastian Andrzej Siewior
2019-01-25 15:18   ` Borislav Petkov
2019-02-05 18:16     ` Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 09/22] x86/fpu: Add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
2019-01-28 18:23   ` Borislav Petkov
2019-02-07 10:43     ` Sebastian Andrzej Siewior
2019-02-13  9:30       ` Borislav Petkov
2019-02-14 14:51         ` Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 10/22] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
2019-01-28 18:30   ` Borislav Petkov
2019-01-09 11:47 ` [PATCH 11/22] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() " Sebastian Andrzej Siewior
2019-01-28 18:49   ` Borislav Petkov
2019-02-07 11:13     ` Sebastian Andrzej Siewior
2019-02-13  9:31       ` Borislav Petkov
2019-01-09 11:47 ` [PATCH 12/22] x86/fpu: Only write PKRU if it is different from current Sebastian Andrzej Siewior
2019-01-23 18:09   ` Dave Hansen
2019-02-07 11:27     ` Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 13/22] x86/pkeys: Don't check if PKRU is zero before writting it Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 14/22] x86/fpu: Eager switch PKRU state Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 15/22] x86/entry: Add TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
2019-01-30 11:55   ` Borislav Petkov
2019-02-07 11:49     ` Sebastian Andrzej Siewior
2019-02-13  9:35       ` Borislav Petkov
2019-02-14 15:28         ` Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 16/22] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
2019-01-30 11:43   ` Borislav Petkov
2019-02-07 13:28     ` Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 17/22] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
2019-01-30 11:56   ` Borislav Petkov
2019-01-30 12:28     ` Sebastian Andrzej Siewior
2019-01-30 12:53       ` Borislav Petkov
2019-02-07 14:10         ` Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 18/22] x86/fpu: Update xstate's PKRU value on write_pkru() Sebastian Andrzej Siewior
2019-01-23 17:28   ` Dave Hansen
2019-01-09 11:47 ` [PATCH 19/22] x86/fpu: Inline copy_user_to_fpregs_zeroing() Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 20/22] x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory Sebastian Andrzej Siewior
2019-01-30 21:29   ` Borislav Petkov
2019-01-09 11:47 ` [PATCH 21/22] x86/fpu: Merge the two code paths in __fpu__restore_sig() Sebastian Andrzej Siewior
2019-01-09 11:47 ` [PATCH 22/22] x86/fpu: Defer FPU state load until return to userspace Sebastian Andrzej Siewior
2019-01-31  9:16   ` Borislav Petkov
2019-01-15 12:44 ` [PATCH v6] x86: load FPU registers on return to userland David Laight
2019-01-15 13:15   ` 'Sebastian Andrzej Siewior'
2019-01-15 14:33     ` David Laight
2019-01-15 19:46   ` Dave Hansen
2019-01-15 20:26     ` Andy Lutomirski
2019-01-15 20:54       ` Dave Hansen
2019-01-15 21:11         ` Andy Lutomirski
2019-01-16 10:31           ` David Laight
2019-01-16 10:18       ` David Laight
2019-01-30 11:35 ` Borislav Petkov
2019-01-30 12:06   ` Sebastian Andrzej Siewior
2019-01-30 12:27     ` Borislav Petkov
2019-02-08 13:12       ` Sebastian Andrzej Siewior
2019-02-13 15:54         ` Sebastian Andrzej Siewior
2019-02-21 11:49 [PATCH v7] " Sebastian Andrzej Siewior
2019-02-21 11:50 ` [PATCH 05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
