linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] x86/fpu: eagerfpu fixes, speedups, and default enablement
@ 2016-01-23  0:56 Andy Lutomirski
  2016-01-23  0:56 ` [PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode Andy Lutomirski
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Andy Lutomirski @ 2016-01-23  0:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Borislav Petkov, Fenghua Yu, Oleg Nesterov, Peter Zijlstra,
	Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen, Rik van Riel,
	Linus Torvalds, Andy Lutomirski

Hi all-

Patches 1, 2, and 3 are fixes.

Patch 4 is probably a small speedup.  It also only matters in lazy
FPU mode, which means that, most likely, no one cares.  Apply or
don't -- I don't care much.

Patch 5 is, in some sense, a radical change.  Currently we select
eager or lazy mode depending on CPU type.  I think that lazy mode
sucks and that we should deprecate and remove it.

With patches 1-3 applied, I think that eagerfpu works on all
systems.  Patch 5 will use it on all systems subject to a chicken
flag -- eagerfpu=off will still disable it.

I propose that we apply patch 5, let it soak in -next until the 4.6
merge window opens, possibly let it actually land in 4.6, and then
remove lazy mode entirely for 4.7.  This will open up enormous
cleanup possibilities, and it will make the fpu code vastly more
comprehensible.

Thoughts?

Andy Lutomirski (5):
  x86/fpu: Fix math emulation in eager fpu mode
  x86/fpu: Fix FNSAVE usage in eagerfpu mode
  x86/fpu: Fold fpu_copy into fpu__copy
  x86/fpu: Speed up lazy FPU restores slightly
  x86/fpu: Default eagerfpu=on on all CPUs

 arch/x86/include/asm/fpu/internal.h |  3 ++-
 arch/x86/kernel/fpu/core.c          | 52 +++++++++++++++++++------------------
 arch/x86/kernel/fpu/init.c          | 13 ++++------
 arch/x86/kernel/traps.c             |  3 +--
 4 files changed, 35 insertions(+), 36 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode
  2016-01-23  0:56 [PATCH 0/5] x86/fpu: eagerfpu fixes, speedups, and default enablement Andy Lutomirski
@ 2016-01-23  0:56 ` Andy Lutomirski
  2016-01-23 10:02   ` Borislav Petkov
  2016-01-23  0:56 ` [PATCH 2/5] x86/fpu: Fix FNSAVE usage in eagerfpu mode Andy Lutomirski
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2016-01-23  0:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Borislav Petkov, Fenghua Yu, Oleg Nesterov, Peter Zijlstra,
	Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen, Rik van Riel,
	Linus Torvalds, Andy Lutomirski

Systems without an FPU are generally old and therefore use lazy FPU
switching.  Unsurprisingly, math emulation in eager FPU mode is a
bit buggy.  Fix it.

There were two bugs involving kernel code trying to use the FPU
registers in eager mode even if they didn't exist and one BUG_ON
that was incorrect.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/fpu/internal.h | 3 ++-
 arch/x86/kernel/fpu/core.c          | 2 +-
 arch/x86/kernel/traps.c             | 1 -
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 0fd440df63f1..a1f78a9fbf41 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -589,7 +589,8 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 	 * If the task has used the math, pre-load the FPU on xsave processors
 	 * or if the past 5 consecutive context-switches used math.
 	 */
-	fpu.preload = new_fpu->fpstate_active &&
+	fpu.preload = static_cpu_has(X86_FEATURE_FPU) &&
+		      new_fpu->fpstate_active &&
 		      (use_eager_fpu() || new_fpu->counter > 5);
 
 	if (old_fpu->fpregs_active) {
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index d25097c3fc1d..08e1e11a05ca 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -423,7 +423,7 @@ void fpu__clear(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu); /* Almost certainly an anomaly */
 
-	if (!use_eager_fpu()) {
+	if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
 		/* FPU state will be reallocated lazily at the first use. */
 		fpu__drop(fpu);
 	} else {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index ade185a46b1d..87f80febf477 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -750,7 +750,6 @@ dotraplinkage void
 do_device_not_available(struct pt_regs *regs, long error_code)
 {
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
-	BUG_ON(use_eager_fpu());
 
 #ifdef CONFIG_MATH_EMULATION
 	if (read_cr0() & X86_CR0_EM) {
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/5] x86/fpu: Fix FNSAVE usage in eagerfpu mode
  2016-01-23  0:56 [PATCH 0/5] x86/fpu: eagerfpu fixes, speedups, and default enablement Andy Lutomirski
  2016-01-23  0:56 ` [PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode Andy Lutomirski
@ 2016-01-23  0:56 ` Andy Lutomirski
  2016-01-23  0:56 ` [PATCH 3/5] x86/fpu: Fold fpu_copy into fpu__copy Andy Lutomirski
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2016-01-23  0:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Borislav Petkov, Fenghua Yu, Oleg Nesterov, Peter Zijlstra,
	Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen, Rik van Riel,
	Linus Torvalds, Andy Lutomirski

In eager fpu mode, having deactivated fpu without immediately
reloading some other context is illegal.  Therefore, to recover from
FNSAVE, we can't just deactivate the state -- we need to reload it
if we're not actively context switching.

We had this wrong in fpu__save and fpu__copy.  Fix both.
__kernel_fpu_begin was fine -- add a comment.

This fixes a warning triggerable with nofxsr eagerfpu=on.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/fpu/core.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 08e1e11a05ca..7a9244df33e2 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -114,6 +114,10 @@ void __kernel_fpu_begin(void)
 	kernel_fpu_disable();
 
 	if (fpu->fpregs_active) {
+		/*
+		 * Ignore return value -- we don't care if reg state
+		 * is clobbered.
+		 */
 		copy_fpregs_to_fpstate(fpu);
 	} else {
 		this_cpu_write(fpu_fpregs_owner_ctx, NULL);
@@ -189,8 +193,12 @@ void fpu__save(struct fpu *fpu)
 
 	preempt_disable();
 	if (fpu->fpregs_active) {
-		if (!copy_fpregs_to_fpstate(fpu))
-			fpregs_deactivate(fpu);
+		if (!copy_fpregs_to_fpstate(fpu)) {
+			if (use_eager_fpu())
+				copy_kernel_to_fpregs(&fpu->state);
+			else
+				fpregs_deactivate(fpu);
+		}
 	}
 	preempt_enable();
 }
@@ -259,7 +267,11 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 	preempt_disable();
 	if (!copy_fpregs_to_fpstate(dst_fpu)) {
 		memcpy(&src_fpu->state, &dst_fpu->state, xstate_size);
-		fpregs_deactivate(src_fpu);
+
+		if (use_eager_fpu())
+			copy_kernel_to_fpregs(&src_fpu->state);
+		else
+			fpregs_deactivate(src_fpu);
 	}
 	preempt_enable();
 }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/5] x86/fpu: Fold fpu_copy into fpu__copy
  2016-01-23  0:56 [PATCH 0/5] x86/fpu: eagerfpu fixes, speedups, and default enablement Andy Lutomirski
  2016-01-23  0:56 ` [PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode Andy Lutomirski
  2016-01-23  0:56 ` [PATCH 2/5] x86/fpu: Fix FNSAVE usage in eagerfpu mode Andy Lutomirski
@ 2016-01-23  0:56 ` Andy Lutomirski
  2016-01-23  0:56 ` [PATCH 4/5] x86/fpu: Speed up lazy FPU restores slightly Andy Lutomirski
  2016-01-23  0:56 ` [PATCH 5/5] x86/fpu: Default eagerfpu=on on all CPUs Andy Lutomirski
  4 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2016-01-23  0:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Borislav Petkov, Fenghua Yu, Oleg Nesterov, Peter Zijlstra,
	Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen, Rik van Riel,
	Linus Torvalds, Andy Lutomirski

Splitting it into two functions needlessly obfuscated the code.
While we're at it, improve the comment slightly.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/fpu/core.c | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7a9244df33e2..299b58bb975b 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -231,14 +231,15 @@ void fpstate_init(union fpregs_state *state)
 }
 EXPORT_SYMBOL_GPL(fpstate_init);
 
-/*
- * Copy the current task's FPU state to a new task's FPU context.
- *
- * In both the 'eager' and the 'lazy' case we save hardware registers
- * directly to the destination buffer.
- */
-static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
+int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 {
+	dst_fpu->counter = 0;
+	dst_fpu->fpregs_active = 0;
+	dst_fpu->last_cpu = -1;
+
+	if (!src_fpu->fpstate_active || !cpu_has_fpu)
+		return 0;
+
 	WARN_ON_FPU(src_fpu != &current->thread.fpu);
 
 	/*
@@ -251,10 +252,9 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 	/*
 	 * Save current FPU registers directly into the child
 	 * FPU context, without any memory-to-memory copying.
-	 *
-	 * If the FPU context got destroyed in the process (FNSAVE
-	 * done on old CPUs) then copy it back into the source
-	 * context and mark the current task for lazy restore.
+	 * In lazy mode, if the FPU context isn't loaded into
+	 * fpregs, CR0.TS will be set and do_device_not_available
+	 * will load the FPU context.
 	 *
 	 * We have to do all this with preemption disabled,
 	 * mostly because of the FNSAVE case, because in that
@@ -274,16 +274,6 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 			fpregs_deactivate(src_fpu);
 	}
 	preempt_enable();
-}
-
-int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
-{
-	dst_fpu->counter = 0;
-	dst_fpu->fpregs_active = 0;
-	dst_fpu->last_cpu = -1;
-
-	if (src_fpu->fpstate_active && cpu_has_fpu)
-		fpu_copy(dst_fpu, src_fpu);
 
 	return 0;
 }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/5] x86/fpu: Speed up lazy FPU restores slightly
  2016-01-23  0:56 [PATCH 0/5] x86/fpu: eagerfpu fixes, speedups, and default enablement Andy Lutomirski
                   ` (2 preceding siblings ...)
  2016-01-23  0:56 ` [PATCH 3/5] x86/fpu: Fold fpu_copy into fpu__copy Andy Lutomirski
@ 2016-01-23  0:56 ` Andy Lutomirski
  2016-01-23 10:14   ` Borislav Petkov
  2016-01-23  0:56 ` [PATCH 5/5] x86/fpu: Default eagerfpu=on on all CPUs Andy Lutomirski
  4 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2016-01-23  0:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Borislav Petkov, Fenghua Yu, Oleg Nesterov, Peter Zijlstra,
	Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen, Rik van Riel,
	Linus Torvalds, Andy Lutomirski

If we have an FPU, there's no need to check CR0 for FPU emulation.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 87f80febf477..183b300f6a8b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -752,7 +752,7 @@ do_device_not_available(struct pt_regs *regs, long error_code)
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
 
 #ifdef CONFIG_MATH_EMULATION
-	if (read_cr0() & X86_CR0_EM) {
+	if (!cpu_has_fpu && (read_cr0() & X86_CR0_EM)) {
 		struct math_emu_info info = { };
 
 		conditional_sti(regs);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/5] x86/fpu: Default eagerfpu=on on all CPUs
  2016-01-23  0:56 [PATCH 0/5] x86/fpu: eagerfpu fixes, speedups, and default enablement Andy Lutomirski
                   ` (3 preceding siblings ...)
  2016-01-23  0:56 ` [PATCH 4/5] x86/fpu: Speed up lazy FPU restores slightly Andy Lutomirski
@ 2016-01-23  0:56 ` Andy Lutomirski
  2016-01-23 10:19   ` Borislav Petkov
  4 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2016-01-23  0:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Borislav Petkov, Fenghua Yu, Oleg Nesterov, Peter Zijlstra,
	Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen, Rik van Riel,
	Linus Torvalds, Andy Lutomirski

We have eager and lazy fpu modes, introduced in 304bceda6a18 ("x86,
fpu: use non-lazy fpu restore for processors supporting xsave").

The result is rather messy.  There are two code paths in almost all
of the FPU code, and only one of them (the eager case) is tested
frequently, since most kernel developers have new enough hardware
that we use eagerfpu.

It seems that, on any remotely recent hardware, eagerfpu is a win:
glibc uses SSE2, so laziness is probably overoptimistic, and, in any
case, manipulating TS is far slower that saving and restoring the
full state.  (Stores to CR0.TS are serializing and are poorly
optimized.)

To try to shake out any latent issues on old hardware, this changes
the default to eager on all CPUs.  If no performance or functionality
problems show up, a subsequent patch could remove lazy mode entirely.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/fpu/init.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index d53ab3d3b8e8..e12cc0ad368e 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -262,7 +262,10 @@ static void __init fpu__init_system_xstate_size_legacy(void)
  * not only saved the restores along the way, but we also have the
  * FPU ready to be used for the original task.
  *
- * 'eager' switching is used on modern CPUs, there we switch the FPU
+ * 'lazy' is deprecated because it's almost never a performance win
+ * and it's much more complicated than 'eager'.
+ *
+ * 'eager' switching is by default on all CPUs, there we switch the FPU
  * state during every context switch, regardless of whether the task
  * has used FPU instructions in that time slice or not. This is done
  * because modern FPU context saving instructions are able to optimize
@@ -273,7 +276,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
  *   to use 'eager' restores, if we detect that a task is using the FPU
  *   frequently. See the fpu->counter logic in fpu/internal.h for that. ]
  */
-static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;
+static enum { ENABLE, DISABLE } eagerfpu = ENABLE;
 
 /*
  * Find supported xfeatures based on cpu features and command-line input.
@@ -350,15 +353,9 @@ static void __init fpu__init_system_ctx_switch(void)
  */
 static void __init fpu__init_parse_early_param(void)
 {
-	/*
-	 * No need to check "eagerfpu=auto" again, since it is the
-	 * initial default.
-	 */
 	if (cmdline_find_option_bool(boot_command_line, "eagerfpu=off")) {
 		eagerfpu = DISABLE;
 		fpu__clear_eager_fpu_features();
-	} else if (cmdline_find_option_bool(boot_command_line, "eagerfpu=on")) {
-		eagerfpu = ENABLE;
 	}
 
 	if (cmdline_find_option_bool(boot_command_line, "no387"))
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode
  2016-01-23  0:56 ` [PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode Andy Lutomirski
@ 2016-01-23 10:02   ` Borislav Petkov
  2016-01-23 17:40     ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2016-01-23 10:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Fenghua Yu, Oleg Nesterov, Peter Zijlstra,
	Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen, Rik van Riel,
	Linus Torvalds

On Fri, Jan 22, 2016 at 04:56:02PM -0800, Andy Lutomirski wrote:
> Systems without an FPU are generally old and therefore use lazy FPU
> switching.  Unsurprisingly, math emulation in eager FPU mode is a
> bit buggy.  Fix it.
> 
> There were two bugs involving kernel code trying to use the FPU
> registers in eager mode even if they didn't exist and one BUG_ON
> that was incorrect.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/fpu/internal.h | 3 ++-
>  arch/x86/kernel/fpu/core.c          | 2 +-
>  arch/x86/kernel/traps.c             | 1 -
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 0fd440df63f1..a1f78a9fbf41 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -589,7 +589,8 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
>  	 * If the task has used the math, pre-load the FPU on xsave processors
>  	 * or if the past 5 consecutive context-switches used math.
>  	 */
> -	fpu.preload = new_fpu->fpstate_active &&
> +	fpu.preload = static_cpu_has(X86_FEATURE_FPU) &&
> +		      new_fpu->fpstate_active &&
>  		      (use_eager_fpu() || new_fpu->counter > 5);

Should we move that static_cpu_has(X86_FEATURE_FPU) check in
use_eager_fpu()?

I mean, when !X86_FEATURE_FPU, then we most certainly aren't doing eager
FPU anyway.

Looking at the call sites briefly says we should be covered but I might
be missing out some detail.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/5] x86/fpu: Speed up lazy FPU restores slightly
  2016-01-23  0:56 ` [PATCH 4/5] x86/fpu: Speed up lazy FPU restores slightly Andy Lutomirski
@ 2016-01-23 10:14   ` Borislav Petkov
  2016-01-23 22:09     ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2016-01-23 10:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Fenghua Yu, Oleg Nesterov, Peter Zijlstra,
	Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen, Rik van Riel,
	Linus Torvalds

On Fri, Jan 22, 2016 at 04:56:05PM -0800, Andy Lutomirski wrote:
> If we have an FPU, there's no need to check CR0 for FPU emulation.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/kernel/traps.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 87f80febf477..183b300f6a8b 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -752,7 +752,7 @@ do_device_not_available(struct pt_regs *regs, long error_code)
>  	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
>  
>  #ifdef CONFIG_MATH_EMULATION
> -	if (read_cr0() & X86_CR0_EM) {
> +	if (!cpu_has_fpu && (read_cr0() & X86_CR0_EM)) {

Please use

	boot_cpu_has(X86_FEATURE_FPU)

We want to kill those cpu_has_xxx things.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] x86/fpu: Default eagerfpu=on on all CPUs
  2016-01-23  0:56 ` [PATCH 5/5] x86/fpu: Default eagerfpu=on on all CPUs Andy Lutomirski
@ 2016-01-23 10:19   ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2016-01-23 10:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Fenghua Yu, Oleg Nesterov, Peter Zijlstra,
	Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen, Rik van Riel,
	Linus Torvalds

On Fri, Jan 22, 2016 at 04:56:06PM -0800, Andy Lutomirski wrote:
> To try to shake out any latent issues on old hardware, this changes
> the default to eager on all CPUs.  If no performance or functionality
> problems show up, a subsequent patch could remove lazy mode entirely.

Yes, I think we should do it, or at least try it. Maybe wait longer than
a kernel release - maybe 2 to be absolutely sure and so that we have
enough time to handle possible breakages.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode
  2016-01-23 10:02   ` Borislav Petkov
@ 2016-01-23 17:40     ` Andy Lutomirski
  2016-01-23 17:51       ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2016-01-23 17:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Fenghua Yu, Oleg Nesterov,
	Peter Zijlstra, Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen,
	Rik van Riel, Linus Torvalds

On Sat, Jan 23, 2016 at 2:02 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Fri, Jan 22, 2016 at 04:56:02PM -0800, Andy Lutomirski wrote:
>> Systems without an FPU are generally old and therefore use lazy FPU
>> switching.  Unsurprisingly, math emulation in eager FPU mode is a
>> bit buggy.  Fix it.
>>
>> There were two bugs involving kernel code trying to use the FPU
>> registers in eager mode even if they didn't exist and one BUG_ON
>> that was incorrect.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  arch/x86/include/asm/fpu/internal.h | 3 ++-
>>  arch/x86/kernel/fpu/core.c          | 2 +-
>>  arch/x86/kernel/traps.c             | 1 -
>>  3 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
>> index 0fd440df63f1..a1f78a9fbf41 100644
>> --- a/arch/x86/include/asm/fpu/internal.h
>> +++ b/arch/x86/include/asm/fpu/internal.h
>> @@ -589,7 +589,8 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
>>        * If the task has used the math, pre-load the FPU on xsave processors
>>        * or if the past 5 consecutive context-switches used math.
>>        */
>> -     fpu.preload = new_fpu->fpstate_active &&
>> +     fpu.preload = static_cpu_has(X86_FEATURE_FPU) &&
>> +                   new_fpu->fpstate_active &&
>>                     (use_eager_fpu() || new_fpu->counter > 5);
>
> Should we move that static_cpu_has(X86_FEATURE_FPU) check in
> use_eager_fpu()?

I'm confused.  I want to make use_eager_fpu() return true always, so
how would this help?  The idea is for FPU emulation to work despite
being in eager mode.

>
> I mean, when !X86_FEATURE_FPU, then we most certainly aren't doing eager
> FPU anyway.

With this patch applied, we can do eager fpu without X86_FEATURE_FPU :)

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode
  2016-01-23 17:40     ` Andy Lutomirski
@ 2016-01-23 17:51       ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2016-01-23 17:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Fenghua Yu, Oleg Nesterov,
	Peter Zijlstra, Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen,
	Rik van Riel, Linus Torvalds

On Sat, Jan 23, 2016 at 09:40:06AM -0800, Andy Lutomirski wrote:
> I'm confused.  I want to make use_eager_fpu() return true always, so
> how would this help?  The idea is for FPU emulation to work despite
> being in eager mode.

I mean this:

static __always_inline __pure bool use_eager_fpu(void)
{
	if (!static_cpu_has(X86_FEATURE_FPU))
		return false;

        return static_cpu_has(X86_FEATURE_EAGER_FPU);
}

Although now I realize that returning false is ambiguous here. F'get it.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/5] x86/fpu: Speed up lazy FPU restores slightly
  2016-01-23 10:14   ` Borislav Petkov
@ 2016-01-23 22:09     ` Andy Lutomirski
  2016-01-23 23:35       ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2016-01-23 22:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Fenghua Yu, Oleg Nesterov,
	Peter Zijlstra, Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen,
	Rik van Riel, Linus Torvalds

On Sat, Jan 23, 2016 at 2:14 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Fri, Jan 22, 2016 at 04:56:05PM -0800, Andy Lutomirski wrote:
>> If we have an FPU, there's no need to check CR0 for FPU emulation.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  arch/x86/kernel/traps.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>> index 87f80febf477..183b300f6a8b 100644
>> --- a/arch/x86/kernel/traps.c
>> +++ b/arch/x86/kernel/traps.c
>> @@ -752,7 +752,7 @@ do_device_not_available(struct pt_regs *regs, long error_code)
>>       RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
>>
>>  #ifdef CONFIG_MATH_EMULATION
>> -     if (read_cr0() & X86_CR0_EM) {
>> +     if (!cpu_has_fpu && (read_cr0() & X86_CR0_EM)) {
>
> Please use
>
>         boot_cpu_has(X86_FEATURE_FPU)
>
> We want to kill those cpu_has_xxx things.

Queued for v2.

Maybe this will be just cpu_has some day if my theory about the new
improved static_cpu_has being shorter than boot_cpu_has pans out :)

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/5] x86/fpu: Speed up lazy FPU restores slightly
  2016-01-23 22:09     ` Andy Lutomirski
@ 2016-01-23 23:35       ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2016-01-23 23:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Fenghua Yu, Oleg Nesterov,
	Peter Zijlstra, Sai Praneeth Prakhya, yu-cheng yu, Dave Hansen,
	Rik van Riel, Linus Torvalds

On Sat, Jan 23, 2016 at 02:09:59PM -0800, Andy Lutomirski wrote:
> Maybe this will be just cpu_has some day if my theory about the new
> improved static_cpu_has being shorter than boot_cpu_has pans out :)

Yeah, I have the static_cpu_has() simplification patchset v2 ready, will
send out tomorrow.

And yeah, as a next step, we probably should think about hiding
boot_cpu_has() and using solely static_cpu_has() everywhere instead.

The cpu_has() thing takes struct cpuinfo_x86 * and I'll bet a bunch of
money that a lot of the callsites could do static_cpu_has(), i.e., look
at boot CPU bits instead. That's for later, though.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-01-23 23:35 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-23  0:56 [PATCH 0/5] x86/fpu: eagerfpu fixes, speedups, and default enablement Andy Lutomirski
2016-01-23  0:56 ` [PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode Andy Lutomirski
2016-01-23 10:02   ` Borislav Petkov
2016-01-23 17:40     ` Andy Lutomirski
2016-01-23 17:51       ` Borislav Petkov
2016-01-23  0:56 ` [PATCH 2/5] x86/fpu: Fix FNSAVE usage in eagerfpu mode Andy Lutomirski
2016-01-23  0:56 ` [PATCH 3/5] x86/fpu: Fold fpu_copy into fpu__copy Andy Lutomirski
2016-01-23  0:56 ` [PATCH 4/5] x86/fpu: Speed up lazy FPU restores slightly Andy Lutomirski
2016-01-23 10:14   ` Borislav Petkov
2016-01-23 22:09     ` Andy Lutomirski
2016-01-23 23:35       ` Borislav Petkov
2016-01-23  0:56 ` [PATCH 5/5] x86/fpu: Default eagerfpu=on on all CPUs Andy Lutomirski
2016-01-23 10:19   ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).