* [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups
@ 2016-10-05  0:34 riel
  2016-10-05  0:34 ` [PATCH 1/9] x86/crypto: Remove X86_FEATURE_EAGER_FPU ifdef from the crc32c code riel
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

This series removes lazy FPU mode, and cleans up various bits
and pieces around the FPU code.

I have run this through a basic floating point test that involves
about 1.5 billion context switches and 45 minutes of swapping at
250MB/second.

This seems to tease out bugs fairly well, though I would not mind
an actual floating point test suite...
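
As an illustration only (not part of this series), a minimal user-space
check of that kind might look like the sketch below: each thread drives a
thread-unique value through the FPU across forced context switches and
verifies it never changes. All names are hypothetical; build with
gcc -O2 -pthread.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS  8
#define NSWITCHES 1000000L

/* Each thread drives a thread-unique value through the FPU and
 * re-checks it after every forced context switch. */
static void *fpu_worker(void *arg)
{
	double base = (double)(long)arg;
	double v = base;
	long i;

	for (i = 0; i < NSWITCHES; i++) {
		v += 1.0;
		sched_yield();	/* encourage a context switch here */
		if (v != base + (double)(i + 1)) {
			fprintf(stderr, "FPU state corruption detected\n");
			exit(1);
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	long i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, fpu_worker, (void *)(i + 1));
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	printf("no corruption after %ld switches per thread\n", NSWITCHES);
	return 0;
}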


* [PATCH 1/9] x86/crypto: Remove X86_FEATURE_EAGER_FPU ifdef from the crc32c code
  2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
@ 2016-10-05  0:34 ` riel
  2016-10-05  0:40   ` Rik van Riel
  2016-10-07  9:43   ` [tip:x86/fpu] x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef " tip-bot for Andy Lutomirski
  2016-10-05  0:34 ` [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode riel
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

From: Rik van Riel <riel@redhat.com>

From : Andy Lutomirski <luto@kernel.org>

The crypto code was checking both use_eager_fpu() and
defined(X86_FEATURE_EAGER_FPU).  The latter was nonsensical, so
remove it.  This will avoid breakage when we remove
X86_FEATURE_EAGER_FPU.
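
To illustrate why it was nonsensical (standalone sketch, not from the
patch): the flags in cpufeatures.h are plain preprocessor macros that
expand to bit positions, so defined() only tests that the header is
visible and says nothing about the CPU at runtime; runtime checks must go
through use_eager_fpu()/static_cpu_has().

#include <stdio.h>

#define X86_FEATURE_EAGER_FPU	( 3*32+29)	/* as in cpufeatures.h */

int main(void)
{
#if defined(X86_FEATURE_EAGER_FPU)
	puts("always taken: defined() only sees the macro, not the CPU");
#else
	puts("dead code while the #define above is visible");
#endif
	return 0;
}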

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/crypto/crc32c-intel_glue.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 0857b1a1de3b..715399b14ed7 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -58,16 +58,11 @@
 asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
 				unsigned int crc_init);
 static int crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_EAGERFPU;
-#if defined(X86_FEATURE_EAGER_FPU)
 #define set_pcl_breakeven_point()					\
 do {									\
 	if (!use_eager_fpu())						\
 		crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_NOEAGERFPU;	\
 } while (0)
-#else
-#define set_pcl_breakeven_point()					\
-	(crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_NOEAGERFPU)
-#endif
 #endif /* CONFIG_X86_64 */
 
 static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length)
-- 
2.7.4


* [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode
  2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
  2016-10-05  0:34 ` [PATCH 1/9] x86/crypto: Remove X86_FEATURE_EAGER_FPU ifdef from the crc32c code riel
@ 2016-10-05  0:34 ` riel
  2016-10-05  7:14   ` Paolo Bonzini
  2016-10-07  9:43   ` [tip:x86/fpu] x86/fpu: Hard-disable lazy FPU mode tip-bot for Andy Lutomirski
  2016-10-05  0:34 ` [PATCH 3/9] x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction riel
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

From: Andy Lutomirski <luto@kernel.org>

Since commit 58122bf1d856 ("x86/fpu: Default eagerfpu=on on all
CPUs") in Linux 4.6, eager FPU mode has been the default on all x86
systems, and no one has reported any regressions.

This patch removes the ability to enable lazy mode: use_eager_fpu()
becomes "return true" and all of the FPU mode selection machinery is
removed.
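
For readers unfamiliar with the machinery being deleted, here is a toy
user-space model of lazy mode (simulation only, not kernel code; it
paraphrases the comment removed from fpu/init.c below). CR0.TS becomes a
plain flag and the #NM trap a check before each FPU use; saves stay
eager, only restores are lazy.

#include <stdbool.h>
#include <stdio.h>

static bool cr0_ts = true;	/* models CR0.TS; set = "FPU not ready" */
static int fpregs_task = -1;	/* task whose state sits in the "registers" */

static void lazy_context_switch(void)
{
	cr0_ts = true;		/* stts(): defer any restore to first use */
}

static void fpu_insn(int task)
{
	if (cr0_ts) {		/* models the device-not-available (#NM) trap */
		if (fpregs_task != task) {
			printf("#NM: restoring task %d's FPU state\n", task);
			fpregs_task = task;
		} else {
			printf("#NM: registers still valid, restore skipped\n");
		}
		cr0_ts = false;	/* clts() in the trap handler */
	}
	printf("task %d executes an FPU instruction\n", task);
}

int main(void)
{
	fpu_insn(1);		/* first use: trap + restore */
	lazy_context_switch();	/* switch to task 2, which never uses the FPU */
	lazy_context_switch();	/* switch back to task 1 */
	fpu_insn(1);		/* trap, but the restore is saved */
	return 0;
}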

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/cpufeatures.h  |  2 +-
 arch/x86/include/asm/fpu/internal.h |  2 +-
 arch/x86/kernel/fpu/init.c          | 91 ++-----------------------------------
 3 files changed, 5 insertions(+), 90 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1188bc849ee3..b212b862314a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -104,7 +104,7 @@
 #define X86_FEATURE_EXTD_APICID	( 3*32+26) /* has extended APICID (8 bits) */
 #define X86_FEATURE_AMD_DCM     ( 3*32+27) /* multi-node processor */
 #define X86_FEATURE_APERFMPERF	( 3*32+28) /* APERFMPERF */
-#define X86_FEATURE_EAGER_FPU	( 3*32+29) /* "eagerfpu" Non lazy FPU restore */
+/* free, was #define X86_FEATURE_EAGER_FPU	( 3*32+29) * "eagerfpu" Non lazy FPU restore */
 #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
 
 /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 2737366ea583..8852e3afa1ad 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -62,7 +62,7 @@ extern u64 fpu__get_supported_xfeatures_mask(void);
  */
 static __always_inline __pure bool use_eager_fpu(void)
 {
-	return static_cpu_has(X86_FEATURE_EAGER_FPU);
+	return true;
 }
 
 static __always_inline __pure bool use_xsaveopt(void)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 2f2b8c7ccb85..1a09d133c801 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -15,10 +15,7 @@
  */
 static void fpu__init_cpu_ctx_switch(void)
 {
-	if (!boot_cpu_has(X86_FEATURE_EAGER_FPU))
-		stts();
-	else
-		clts();
+	clts();
 }
 
 /*
@@ -233,82 +230,16 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 }
 
 /*
- * FPU context switching strategies:
- *
- * Against popular belief, we don't do lazy FPU saves, due to the
- * task migration complications it brings on SMP - we only do
- * lazy FPU restores.
- *
- * 'lazy' is the traditional strategy, which is based on setting
- * CR0::TS to 1 during context-switch (instead of doing a full
- * restore of the FPU state), which causes the first FPU instruction
- * after the context switch (whenever it is executed) to fault - at
- * which point we lazily restore the FPU state into FPU registers.
- *
- * Tasks are of course under no obligation to execute FPU instructions,
- * so it can easily happen that another context-switch occurs without
- * a single FPU instruction being executed. If we eventually switch
- * back to the original task (that still owns the FPU) then we have
- * not only saved the restores along the way, but we also have the
- * FPU ready to be used for the original task.
- *
- * 'lazy' is deprecated because it's almost never a performance win
- * and it's much more complicated than 'eager'.
- *
- * 'eager' switching is by default on all CPUs, there we switch the FPU
- * state during every context switch, regardless of whether the task
- * has used FPU instructions in that time slice or not. This is done
- * because modern FPU context saving instructions are able to optimize
- * state saving and restoration in hardware: they can detect both
- * unused and untouched FPU state and optimize accordingly.
- *
- * [ Note that even in 'lazy' mode we might optimize context switches
- *   to use 'eager' restores, if we detect that a task is using the FPU
- *   frequently. See the fpu->counter logic in fpu/internal.h for that. ]
- */
-static enum { ENABLE, DISABLE } eagerfpu = ENABLE;
-
-/*
  * Find supported xfeatures based on cpu features and command-line input.
  * This must be called after fpu__init_parse_early_param() is called and
  * xfeatures_mask is enumerated.
  */
 u64 __init fpu__get_supported_xfeatures_mask(void)
 {
-	/* Support all xfeatures known to us */
-	if (eagerfpu != DISABLE)
-		return XCNTXT_MASK;
-
-	/* Warning of xfeatures being disabled for no eagerfpu mode */
-	if (xfeatures_mask & XFEATURE_MASK_EAGER) {
-		pr_err("x86/fpu: eagerfpu switching disabled, disabling the following xstate features: 0x%llx.\n",
-			xfeatures_mask & XFEATURE_MASK_EAGER);
-	}
-
-	/* Return a mask that masks out all features requiring eagerfpu mode */
-	return ~XFEATURE_MASK_EAGER;
-}
-
-/*
- * Disable features dependent on eagerfpu.
- */
-static void __init fpu__clear_eager_fpu_features(void)
-{
-	setup_clear_cpu_cap(X86_FEATURE_MPX);
+	return XCNTXT_MASK;
 }
 
-/*
- * Pick the FPU context switching strategy:
- *
- * When eagerfpu is AUTO or ENABLE, we ensure it is ENABLE if either of
- * the following is true:
- *
- * (1) the cpu has xsaveopt, as it has the optimization and doing eager
- *     FPU switching has a relatively low cost compared to a plain xsave;
- * (2) the cpu has xsave features (e.g. MPX) that depend on eager FPU
- *     switching. Should the kernel boot with noxsaveopt, we support MPX
- *     with eager FPU switching at a higher cost.
- */
+/* Legacy code to initialize eager fpu mode. */
 static void __init fpu__init_system_ctx_switch(void)
 {
 	static bool on_boot_cpu __initdata = 1;
@@ -317,17 +248,6 @@ static void __init fpu__init_system_ctx_switch(void)
 	on_boot_cpu = 0;
 
 	WARN_ON_FPU(current->thread.fpu.fpstate_active);
-
-	if (boot_cpu_has(X86_FEATURE_XSAVEOPT) && eagerfpu != DISABLE)
-		eagerfpu = ENABLE;
-
-	if (xfeatures_mask & XFEATURE_MASK_EAGER)
-		eagerfpu = ENABLE;
-
-	if (eagerfpu == ENABLE)
-		setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
-
-	printk(KERN_INFO "x86/fpu: Using '%s' FPU context switches.\n", eagerfpu == ENABLE ? "eager" : "lazy");
 }
 
 /*
@@ -336,11 +256,6 @@ static void __init fpu__init_system_ctx_switch(void)
  */
 static void __init fpu__init_parse_early_param(void)
 {
-	if (cmdline_find_option_bool(boot_command_line, "eagerfpu=off")) {
-		eagerfpu = DISABLE;
-		fpu__clear_eager_fpu_features();
-	}
-
 	if (cmdline_find_option_bool(boot_command_line, "no387"))
 		setup_clear_cpu_cap(X86_FEATURE_FPU);
 
-- 
2.7.4


* [PATCH 3/9] x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction
  2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
  2016-10-05  0:34 ` [PATCH 1/9] x86/crypto: Remove X86_FEATURE_EAGER_FPU ifdef from the crc32c code riel
  2016-10-05  0:34 ` [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode riel
@ 2016-10-05  0:34 ` riel
  2016-10-07  9:44   ` [tip:x86/fpu] " tip-bot for Andy Lutomirski
  2016-10-05  0:34 ` [PATCH 4/9] x86/fpu: Remove use_eager_fpu() riel
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

From: Andy Lutomirski <luto@kernel.org>

Now that lazy mode is gone, we don't need to distinguish which
xfeatures require eager mode.
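
A standalone spot-check that this is a pure consolidation (compiles with
gcc -std=c11; bit positions per the x86 XSAVE architecture, not taken
from this patch, with names shortened for the sketch): the new
XCNTXT_MASK is exactly the union of the two masks it replaces.

#define FP		(1ULL << 0)
#define SSE		(1ULL << 1)
#define YMM		(1ULL << 2)
#define BNDREGS		(1ULL << 3)
#define BNDCSR		(1ULL << 4)
#define OPMASK		(1ULL << 5)
#define ZMM_Hi256	(1ULL << 6)
#define Hi16_ZMM	(1ULL << 7)
#define PKRU		(1ULL << 9)

#define OLD_LAZY	(FP | SSE | YMM | OPMASK | ZMM_Hi256 | Hi16_ZMM | PKRU)
#define OLD_EAGER	(BNDREGS | BNDCSR)
#define NEW_XCNTXT	(FP | SSE | YMM | OPMASK | ZMM_Hi256 | Hi16_ZMM | \
			 PKRU | BNDREGS | BNDCSR)

_Static_assert((OLD_LAZY | OLD_EAGER) == NEW_XCNTXT,
	       "consolidated mask must cover the same feature bits");

int main(void) { return 0; }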

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/fpu/xstate.h | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index d4957ac72b48..1b2799e0699a 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -21,20 +21,16 @@
 /* Supervisor features */
 #define XFEATURE_MASK_SUPERVISOR (XFEATURE_MASK_PT)
 
-/* Supported features which support lazy state saving */
-#define XFEATURE_MASK_LAZY	(XFEATURE_MASK_FP | \
+/* All currently supported features */
+#define XCNTXT_MASK		(XFEATURE_MASK_FP | \
 				 XFEATURE_MASK_SSE | \
 				 XFEATURE_MASK_YMM | \
 				 XFEATURE_MASK_OPMASK | \
 				 XFEATURE_MASK_ZMM_Hi256 | \
 				 XFEATURE_MASK_Hi16_ZMM	 | \
-				 XFEATURE_MASK_PKRU)
-
-/* Supported features which require eager state saving */
-#define XFEATURE_MASK_EAGER	(XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)
-
-/* All currently supported features */
-#define XCNTXT_MASK	(XFEATURE_MASK_LAZY | XFEATURE_MASK_EAGER)
+				 XFEATURE_MASK_PKRU | \
+				 XFEATURE_MASK_BNDREGS | \
+				 XFEATURE_MASK_BNDCSR)
 
 #ifdef CONFIG_X86_64
 #define REX_PREFIX	"0x48, "
-- 
2.7.4


* [PATCH 4/9] x86/fpu: Remove use_eager_fpu()
  2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
                   ` (2 preceding siblings ...)
  2016-10-05  0:34 ` [PATCH 3/9] x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction riel
@ 2016-10-05  0:34 ` riel
  2016-10-07  9:44   ` [tip:x86/fpu] " tip-bot for Andy Lutomirski
  2016-10-05  0:34 ` [PATCH 5/9] x86/fpu: remove fpu.counter riel
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

From: Andy Lutomirski <luto@kernel.org>

This removes all the obvious code paths that depend on lazy FPU mode.
It shouldn't change the generated code at all.
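
That claim is easy to spot-check outside the kernel (hypothetical
functions, assuming gcc -O2): once use_eager_fpu() is a constant
"return true", every !use_eager_fpu() branch is statically dead and the
optimizer deletes it, so removing the branch by hand cannot change the
output.

#include <stdbool.h>

static inline bool use_eager_fpu(void) { return true; }

int with_dead_branch(int x)
{
	if (!use_eager_fpu())
		return x + 1;	/* statically unreachable */
	return x * 2;
}

int hand_simplified(int x)
{
	return x * 2;
}

/* "gcc -O2 -S" emits identical bodies for both functions. */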

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/crypto/crc32c-intel_glue.c | 17 ++++-------------
 arch/x86/include/asm/fpu/internal.h | 34 +--------------------------------
 arch/x86/kernel/fpu/core.c          | 38 +++++--------------------------------
 arch/x86/kernel/fpu/signal.c        |  8 +++-----
 arch/x86/kernel/fpu/xstate.c        |  9 ---------
 arch/x86/kvm/cpuid.c                |  4 +---
 arch/x86/kvm/x86.c                  | 10 ----------
 7 files changed, 14 insertions(+), 106 deletions(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 715399b14ed7..c194d5717ae5 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -48,21 +48,13 @@
 #ifdef CONFIG_X86_64
 /*
  * use carryless multiply version of crc32c when buffer
- * size is >= 512 (when eager fpu is enabled) or
- * >= 1024 (when eager fpu is disabled) to account
+ * size is >= 512 to account
  * for fpu state save/restore overhead.
  */
-#define CRC32C_PCL_BREAKEVEN_EAGERFPU	512
-#define CRC32C_PCL_BREAKEVEN_NOEAGERFPU	1024
+#define CRC32C_PCL_BREAKEVEN	512
 
 asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
 				unsigned int crc_init);
-static int crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_EAGERFPU;
-#define set_pcl_breakeven_point()					\
-do {									\
-	if (!use_eager_fpu())						\
-		crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_NOEAGERFPU;	\
-} while (0)
 #endif /* CONFIG_X86_64 */
 
 static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length)
@@ -185,7 +177,7 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
 	 * use faster PCL version if datasize is large enough to
 	 * overcome kernel fpu state save/restore overhead
 	 */
-	if (len >= crc32c_pcl_breakeven && irq_fpu_usable()) {
+	if (len >= CRC32C_PCL_BREAKEVEN && irq_fpu_usable()) {
 		kernel_fpu_begin();
 		*crcp = crc_pcl(data, len, *crcp);
 		kernel_fpu_end();
@@ -197,7 +189,7 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
 static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
 				u8 *out)
 {
-	if (len >= crc32c_pcl_breakeven && irq_fpu_usable()) {
+	if (len >= CRC32C_PCL_BREAKEVEN && irq_fpu_usable()) {
 		kernel_fpu_begin();
 		*(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
 		kernel_fpu_end();
@@ -256,7 +248,6 @@ static int __init crc32c_intel_mod_init(void)
 		alg.update = crc32c_pcl_intel_update;
 		alg.finup = crc32c_pcl_intel_finup;
 		alg.digest = crc32c_pcl_intel_digest;
-		set_pcl_breakeven_point();
 	}
 #endif
 	return crypto_register_shash(&alg);
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 8852e3afa1ad..7801d32347a2 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -60,11 +60,6 @@ extern u64 fpu__get_supported_xfeatures_mask(void);
 /*
  * FPU related CPU feature flag helper routines:
  */
-static __always_inline __pure bool use_eager_fpu(void)
-{
-	return true;
-}
-
 static __always_inline __pure bool use_xsaveopt(void)
 {
 	return static_cpu_has(X86_FEATURE_XSAVEOPT);
@@ -501,24 +496,6 @@ static inline int fpu_want_lazy_restore(struct fpu *fpu, unsigned int cpu)
 }
 
 
-/*
- * Wrap lazy FPU TS handling in a 'hw fpregs activation/deactivation'
- * idiom, which is then paired with the sw-flag (fpregs_active) later on:
- */
-
-static inline void __fpregs_activate_hw(void)
-{
-	if (!use_eager_fpu())
-		clts();
-}
-
-static inline void __fpregs_deactivate_hw(void)
-{
-	if (!use_eager_fpu())
-		stts();
-}
-
-/* Must be paired with an 'stts' (fpregs_deactivate_hw()) after! */
 static inline void __fpregs_deactivate(struct fpu *fpu)
 {
 	WARN_ON_FPU(!fpu->fpregs_active);
@@ -528,7 +505,6 @@ static inline void __fpregs_deactivate(struct fpu *fpu)
 	trace_x86_fpu_regs_deactivated(fpu);
 }
 
-/* Must be paired with a 'clts' (fpregs_activate_hw()) before! */
 static inline void __fpregs_activate(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu->fpregs_active);
@@ -554,22 +530,17 @@ static inline int fpregs_active(void)
 }
 
 /*
- * Encapsulate the CR0.TS handling together with the
- * software flag.
- *
  * These generally need preemption protection to work,
  * do try to avoid using these on their own.
  */
 static inline void fpregs_activate(struct fpu *fpu)
 {
-	__fpregs_activate_hw();
 	__fpregs_activate(fpu);
 }
 
 static inline void fpregs_deactivate(struct fpu *fpu)
 {
 	__fpregs_deactivate(fpu);
-	__fpregs_deactivate_hw();
 }
 
 /*
@@ -596,8 +567,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 	 * or if the past 5 consecutive context-switches used math.
 	 */
 	fpu.preload = static_cpu_has(X86_FEATURE_FPU) &&
-		      new_fpu->fpstate_active &&
-		      (use_eager_fpu() || new_fpu->counter > 5);
+		      new_fpu->fpstate_active;
 
 	if (old_fpu->fpregs_active) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
@@ -615,8 +585,6 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 			__fpregs_activate(new_fpu);
 			trace_x86_fpu_regs_activated(new_fpu);
 			prefetch(&new_fpu->state);
-		} else {
-			__fpregs_deactivate_hw();
 		}
 	} else {
 		old_fpu->counter = 0;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3fc03a09a93b..036e14fe3b77 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -57,27 +57,9 @@ static bool kernel_fpu_disabled(void)
 	return this_cpu_read(in_kernel_fpu);
 }
 
-/*
- * Were we in an interrupt that interrupted kernel mode?
- *
- * On others, we can do a kernel_fpu_begin/end() pair *ONLY* if that
- * pair does nothing at all: the thread must not have fpu (so
- * that we don't try to save the FPU state), and TS must
- * be set (so that the clts/stts pair does nothing that is
- * visible in the interrupted kernel thread).
- *
- * Except for the eagerfpu case when we return true; in the likely case
- * the thread has FPU but we are not going to set/clear TS.
- */
 static bool interrupted_kernel_fpu_idle(void)
 {
-	if (kernel_fpu_disabled())
-		return false;
-
-	if (use_eager_fpu())
-		return true;
-
-	return !current->thread.fpu.fpregs_active && (read_cr0() & X86_CR0_TS);
+	return !kernel_fpu_disabled();
 }
 
 /*
@@ -125,7 +107,6 @@ void __kernel_fpu_begin(void)
 		copy_fpregs_to_fpstate(fpu);
 	} else {
 		this_cpu_write(fpu_fpregs_owner_ctx, NULL);
-		__fpregs_activate_hw();
 	}
 }
 EXPORT_SYMBOL(__kernel_fpu_begin);
@@ -136,8 +117,6 @@ void __kernel_fpu_end(void)
 
 	if (fpu->fpregs_active)
 		copy_kernel_to_fpregs(&fpu->state);
-	else
-		__fpregs_deactivate_hw();
 
 	kernel_fpu_enable();
 }
@@ -199,10 +178,7 @@ void fpu__save(struct fpu *fpu)
 	trace_x86_fpu_before_save(fpu);
 	if (fpu->fpregs_active) {
 		if (!copy_fpregs_to_fpstate(fpu)) {
-			if (use_eager_fpu())
-				copy_kernel_to_fpregs(&fpu->state);
-			else
-				fpregs_deactivate(fpu);
+			copy_kernel_to_fpregs(&fpu->state);
 		}
 	}
 	trace_x86_fpu_after_save(fpu);
@@ -259,8 +235,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 	 * Don't let 'init optimized' areas of the XSAVE area
 	 * leak into the child task:
 	 */
-	if (use_eager_fpu())
-		memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
+	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
 
 	/*
 	 * Save current FPU registers directly into the child
@@ -282,10 +257,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 		memcpy(&src_fpu->state, &dst_fpu->state,
 		       fpu_kernel_xstate_size);
 
-		if (use_eager_fpu())
-			copy_kernel_to_fpregs(&src_fpu->state);
-		else
-			fpregs_deactivate(src_fpu);
+		copy_kernel_to_fpregs(&src_fpu->state);
 	}
 	preempt_enable();
 
@@ -517,7 +489,7 @@ void fpu__clear(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu); /* Almost certainly an anomaly */
 
-	if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
+	if (!static_cpu_has(X86_FEATURE_FPU)) {
 		/* FPU state will be reallocated lazily at the first use. */
 		fpu__drop(fpu);
 	} else {
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a184c210efba..83c23c230b4c 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -340,11 +340,9 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		}
 
 		fpu->fpstate_active = 1;
-		if (use_eager_fpu()) {
-			preempt_disable();
-			fpu__restore(fpu);
-			preempt_enable();
-		}
+		preempt_disable();
+		fpu__restore(fpu);
+		preempt_enable();
 
 		return err;
 	} else {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 01567aa87503..76bc2a1a3a79 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -886,15 +886,6 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	 */
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return -EINVAL;
-	/*
-	 * For most XSAVE components, this would be an arduous task:
-	 * brining fpstate up to date with fpregs, updating fpstate,
-	 * then re-populating fpregs.  But, for components that are
-	 * never lazily managed, we can just access the fpregs
-	 * directly.  PKRU is never managed lazily, so we can just
-	 * manipulate it directly.  Make sure it stays that way.
-	 */
-	WARN_ON_ONCE(!use_eager_fpu());
 
 	/* Set the bits we need in PKRU:  */
 	if (init_val & PKEY_DISABLE_ACCESS)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 3235e0fe7792..1cf143e2b794 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -16,7 +16,6 @@
 #include <linux/export.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
-#include <asm/fpu/internal.h> /* For use_eager_fpu.  Ugh! */
 #include <asm/user.h>
 #include <asm/fpu/xstate.h>
 #include "cpuid.h"
@@ -114,8 +113,7 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 	if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
-	if (use_eager_fpu())
-		kvm_x86_ops->fpu_activate(vcpu);
+	kvm_x86_ops->fpu_activate(vcpu);
 
 	/*
 	 * The existing code assumes virtual address is 48-bit in the canonical
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 699f8726539a..59d7761fd6df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7357,16 +7357,6 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
 	__kernel_fpu_end();
 	++vcpu->stat.fpu_reload;
-	/*
-	 * If using eager FPU mode, or if the guest is a frequent user
-	 * of the FPU, just leave the FPU active for next time.
-	 * Every 255 times fpu_counter rolls over to 0; a guest that uses
-	 * the FPU in bursts will revert to loading it on demand.
-	 */
-	if (!use_eager_fpu()) {
-		if (++vcpu->fpu_counter < 5)
-			kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
-	}
 	trace_kvm_fpu(0);
 }
 
-- 
2.7.4


* [PATCH 5/9] x86/fpu: remove fpu.counter
  2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
                   ` (3 preceding siblings ...)
  2016-10-05  0:34 ` [PATCH 4/9] x86/fpu: Remove use_eager_fpu() riel
@ 2016-10-05  0:34 ` riel
  2016-10-05  2:07   ` Andy Lutomirski
  2016-10-07  9:45   ` [tip:x86/fpu] x86/fpu: Remove struct fpu::counter tip-bot for Rik van Riel
  2016-10-05  0:34 ` [PATCH 6/9] x86/fpu,kvm: remove kvm vcpu->fpu_counter riel
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

From: Rik van Riel <riel@redhat.com>

With the lazy FPU code gone, we no longer use the counter field
in struct fpu for anything. Get rid of it.
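
For reference, the heuristic that dies with it behaved roughly like the
model below (a sketch paraphrasing the comment deleted from fpu/types.h
further down; the threshold of 5 matches the old switch_fpu_prepare()
test, and the unsigned char wrap-around resets bursty tasks back to
lazy).

#include <stdbool.h>
#include <stdio.h>

static unsigned char counter;

static bool should_preload(bool task_used_fpu)
{
	if (!task_used_fpu) {
		counter = 0;
		return false;
	}
	counter++;		/* wraps to 0 after 256 FPU-using switches */
	return counter > 5;
}

int main(void)
{
	bool prev = false;
	int i;

	for (i = 1; i <= 300; i++) {
		bool now = should_preload(true);

		if (now != prev)
			printf("switch %3d: preload turns %s\n",
			       i, now ? "on" : "off");
		prev = now;
	}
	return 0;
}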

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/include/asm/fpu/internal.h |  3 ---
 arch/x86/include/asm/fpu/types.h    | 11 -----------
 arch/x86/include/asm/trace/fpu.h    |  5 +----
 arch/x86/kernel/fpu/core.c          |  3 ---
 4 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 7801d32347a2..499d6ed0e376 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -581,16 +581,13 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 
 		/* Don't change CR0.TS if we just switch! */
 		if (fpu.preload) {
-			new_fpu->counter++;
 			__fpregs_activate(new_fpu);
 			trace_x86_fpu_regs_activated(new_fpu);
 			prefetch(&new_fpu->state);
 		}
 	} else {
-		old_fpu->counter = 0;
 		old_fpu->last_cpu = -1;
 		if (fpu.preload) {
-			new_fpu->counter++;
 			if (fpu_want_lazy_restore(new_fpu, cpu))
 				fpu.preload = 0;
 			else
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 48df486b02f9..e31332d6f0e8 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -322,17 +322,6 @@ struct fpu {
 	unsigned char			fpregs_active;
 
 	/*
-	 * @counter:
-	 *
-	 * This counter contains the number of consecutive context switches
-	 * during which the FPU stays used. If this is over a threshold, the
-	 * lazy FPU restore logic becomes eager, to save the trap overhead.
-	 * This is an unsigned char so that after 256 iterations the counter
-	 * wraps and the context switch behavior turns lazy again; this is to
-	 * deal with bursty apps that only use the FPU for a short time:
-	 */
-	unsigned char			counter;
-	/*
 	 * @state:
 	 *
 	 * In-memory copy of all FPU registers that we save/restore
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index 9217ab1f5bf6..342e59789fcd 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -14,7 +14,6 @@ DECLARE_EVENT_CLASS(x86_fpu,
 		__field(struct fpu *, fpu)
 		__field(bool, fpregs_active)
 		__field(bool, fpstate_active)
-		__field(int, counter)
 		__field(u64, xfeatures)
 		__field(u64, xcomp_bv)
 		),
@@ -23,17 +22,15 @@ DECLARE_EVENT_CLASS(x86_fpu,
 		__entry->fpu		= fpu;
 		__entry->fpregs_active	= fpu->fpregs_active;
 		__entry->fpstate_active	= fpu->fpstate_active;
-		__entry->counter	= fpu->counter;
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
 			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
 			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
 		}
 	),
-	TP_printk("x86/fpu: %p fpregs_active: %d fpstate_active: %d counter: %d xfeatures: %llx xcomp_bv: %llx",
+	TP_printk("x86/fpu: %p fpregs_active: %d fpstate_active: %d xfeatures: %llx xcomp_bv: %llx",
 			__entry->fpu,
 			__entry->fpregs_active,
 			__entry->fpstate_active,
-			__entry->counter,
 			__entry->xfeatures,
 			__entry->xcomp_bv
 	)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 036e14fe3b77..6a37d525bdbe 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -222,7 +222,6 @@ EXPORT_SYMBOL_GPL(fpstate_init);
 
 int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 {
-	dst_fpu->counter = 0;
 	dst_fpu->fpregs_active = 0;
 	dst_fpu->last_cpu = -1;
 
@@ -430,7 +429,6 @@ void fpu__restore(struct fpu *fpu)
 	trace_x86_fpu_before_restore(fpu);
 	fpregs_activate(fpu);
 	copy_kernel_to_fpregs(&fpu->state);
-	fpu->counter++;
 	trace_x86_fpu_after_restore(fpu);
 	kernel_fpu_enable();
 }
@@ -448,7 +446,6 @@ EXPORT_SYMBOL_GPL(fpu__restore);
 void fpu__drop(struct fpu *fpu)
 {
 	preempt_disable();
-	fpu->counter = 0;
 
 	if (fpu->fpregs_active) {
 		/* Ignore delayed exceptions from user space */
-- 
2.7.4


* [PATCH 6/9] x86/fpu,kvm: remove kvm vcpu->fpu_counter
  2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
                   ` (4 preceding siblings ...)
  2016-10-05  0:34 ` [PATCH 5/9] x86/fpu: remove fpu.counter riel
@ 2016-10-05  0:34 ` riel
  2016-10-07  9:45   ` [tip:x86/fpu] x86/fpu, kvm: Remove KVM vcpu->fpu_counter tip-bot for Rik van Riel
  2016-10-05  0:34 ` [PATCH 7/9] x86/fpu: rename lazy restore functions to "register state valid" riel
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

From: Rik van Riel <riel@redhat.com>

With the removal of the lazy FPU code, this field is no longer used.
Get rid of it.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/kvm/x86.c       | 4 +---
 include/linux/kvm_host.h | 1 -
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59d7761fd6df..2c7e775d7295 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7348,10 +7348,8 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	if (!vcpu->guest_fpu_loaded) {
-		vcpu->fpu_counter = 0;
+	if (!vcpu->guest_fpu_loaded)
 		return;
-	}
 
 	vcpu->guest_fpu_loaded = 0;
 	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9c28b4d4c90b..4e6905cd1e8e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -224,7 +224,6 @@ struct kvm_vcpu {
 
 	int fpu_active;
 	int guest_fpu_loaded, guest_xcr0_loaded;
-	unsigned char fpu_counter;
 	struct swait_queue_head wq;
 	struct pid *pid;
 	int sigset_active;
-- 
2.7.4


* [PATCH 7/9] x86/fpu: rename lazy restore functions to "register state valid"
  2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
                   ` (5 preceding siblings ...)
  2016-10-05  0:34 ` [PATCH 6/9] x86/fpu,kvm: remove kvm vcpu->fpu_counter riel
@ 2016-10-05  0:34 ` riel
  2016-10-05  2:09   ` Andy Lutomirski
  2016-10-07  9:46   ` [tip:x86/fpu] x86/fpu: Rename " tip-bot for Rik van Riel
  2016-10-05  0:34 ` [PATCH 8/9] x86/fpu: remove __fpregs_(de)activate riel
  2016-10-05  0:34 ` [PATCH 9/9] x86/fpu: split old & new fpu code paths riel
  8 siblings, 2 replies; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

From: Rik van Riel <riel@redhat.com>

Name the functions after the state they track, rather than the functionality
they currently enable. This should make it more obvious when we use the
fpregs_state_valid() function for something else in the future.
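
A toy model of the validity rule the new names describe (simulation
only, mirroring the comment added to fpu/internal.h below): the
registers are valid for a task iff the per-CPU owner pointer and the
task's last_cpu still agree, and either invalidation breaks the link.

#include <stdio.h>

struct fpu { int last_cpu; };

#define NR_CPUS 2
static struct fpu *fpu_fpregs_owner_ctx[NR_CPUS];

static int fpregs_state_valid(struct fpu *fpu, int cpu)
{
	return fpu == fpu_fpregs_owner_ctx[cpu] && cpu == fpu->last_cpu;
}

static void __cpu_invalidate_fpregs_state(int cpu)
{
	fpu_fpregs_owner_ctx[cpu] = NULL;	/* CPU's registers clobbered */
}

static void __fpu_invalidate_fpregs_state(struct fpu *fpu)
{
	fpu->last_cpu = -1;	/* task's in-memory state modified */
}

int main(void)
{
	struct fpu task = { .last_cpu = 0 };

	fpu_fpregs_owner_ctx[0] = &task;
	printf("valid: %d\n", fpregs_state_valid(&task, 0));	/* 1 */

	__fpu_invalidate_fpregs_state(&task);	/* e.g. ptrace wrote fpstate */
	printf("valid: %d\n", fpregs_state_valid(&task, 0));	/* 0 */

	task.last_cpu = 0;
	__cpu_invalidate_fpregs_state(0);	/* e.g. the CPU was reused */
	printf("valid: %d\n", fpregs_state_valid(&task, 0));	/* 0 */
	return 0;
}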

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/include/asm/fpu/internal.h | 26 ++++++++++++++++++++------
 arch/x86/kernel/fpu/core.c          |  4 ++--
 arch/x86/kernel/smpboot.c           |  2 +-
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 499d6ed0e376..d2cfe16dd9fa 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -479,18 +479,32 @@ extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size)
 DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
 /*
+ * The in-register FPU state for an FPU context on a CPU is assumed to be
+ * valid if the fpu->last_cpu matches the CPU, and the fpu_fpregs_owner_ctx
+ * matches the FPU.
+ *
+ * If the FPU register state is valid, the kernel can skip restoring the
+ * FPU state from memory.
+ *
+ * Any code that clobbers the FPU registers or updates the in-memory
+ * FPU state for a task MUST let the rest of the kernel know that the
+ * FPU registers are no longer valid for this task. Calling either of
+ * these two invalidate functions is enough, use whichever is convenient.
+ *
  * Must be run with preemption disabled: this clears the fpu_fpregs_owner_ctx,
  * on this CPU.
- *
- * This will disable any lazy FPU state restore of the current FPU state,
- * but if the current thread owns the FPU, it will still be saved by.
  */
-static inline void __cpu_disable_lazy_restore(unsigned int cpu)
+static inline void __cpu_invalidate_fpregs_state(unsigned int cpu)
 {
 	per_cpu(fpu_fpregs_owner_ctx, cpu) = NULL;
 }
 
-static inline int fpu_want_lazy_restore(struct fpu *fpu, unsigned int cpu)
+static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
+{
+	fpu->last_cpu = -1;
+}
+
+static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
 {
 	return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
 }
@@ -588,7 +602,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 	} else {
 		old_fpu->last_cpu = -1;
 		if (fpu.preload) {
-			if (fpu_want_lazy_restore(new_fpu, cpu))
+			if (fpregs_state_valid(new_fpu, cpu))
 				fpu.preload = 0;
 			else
 				prefetch(&new_fpu->state);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 6a37d525bdbe..25a45ddfdbcf 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -336,7 +336,7 @@ void fpu__activate_fpstate_write(struct fpu *fpu)
 
 	if (fpu->fpstate_active) {
 		/* Invalidate any lazy state: */
-		fpu->last_cpu = -1;
+		__fpu_invalidate_fpregs_state(fpu);
 	} else {
 		fpstate_init(&fpu->state);
 		trace_x86_fpu_init_state(fpu);
@@ -379,7 +379,7 @@ void fpu__current_fpstate_write_begin(void)
 	 * ensures we will not be lazy and skip a XRSTOR in the
 	 * future.
 	 */
-	fpu->last_cpu = -1;
+	__fpu_invalidate_fpregs_state(fpu);
 }
 
 /*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 42a93621f5b0..ca4c4ca2f6af 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1111,7 +1111,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 		return err;
 
 	/* the FPU context is blank, nobody can own it */
-	__cpu_disable_lazy_restore(cpu);
+	__cpu_invalidate_fpregs_state(cpu);
 
 	common_cpu_up(cpu, tidle);
 
-- 
2.7.4


* [PATCH 8/9] x86/fpu: remove __fpregs_(de)activate
  2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
                   ` (6 preceding siblings ...)
  2016-10-05  0:34 ` [PATCH 7/9] x86/fpu: rename lazy restore functions to "register state valid" riel
@ 2016-10-05  0:34 ` riel
  2016-10-05  2:10   ` Andy Lutomirski
  2016-10-07  9:46   ` [tip:x86/fpu] x86/fpu: Remove __fpregs_(de)activate() tip-bot for Rik van Riel
  2016-10-05  0:34 ` [PATCH 9/9] x86/fpu: split old & new fpu code paths riel
  8 siblings, 2 replies; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

From: Rik van Riel <riel@redhat.com>

Now that fpregs_activate() and fpregs_deactivate() do nothing except
call the double-underscored versions of themselves, we can get
rid of the double-underscore versions.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/include/asm/fpu/internal.h | 25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d2cfe16dd9fa..aa7a117b43f8 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -509,8 +509,11 @@ static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
 	return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
 }
 
-
-static inline void __fpregs_deactivate(struct fpu *fpu)
+/*
+ * These generally need preemption protection to work,
+ * do try to avoid using these on their own.
+ */
+static inline void fpregs_deactivate(struct fpu *fpu)
 {
 	WARN_ON_FPU(!fpu->fpregs_active);
 
@@ -519,7 +522,7 @@ static inline void __fpregs_deactivate(struct fpu *fpu)
 	trace_x86_fpu_regs_deactivated(fpu);
 }
 
-static inline void __fpregs_activate(struct fpu *fpu)
+static inline void fpregs_activate(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu->fpregs_active);
 
@@ -544,20 +547,6 @@ static inline int fpregs_active(void)
 }
 
 /*
- * These generally need preemption protection to work,
- * do try to avoid using these on their own.
- */
-static inline void fpregs_activate(struct fpu *fpu)
-{
-	__fpregs_activate(fpu);
-}
-
-static inline void fpregs_deactivate(struct fpu *fpu)
-{
-	__fpregs_deactivate(fpu);
-}
-
-/*
  * FPU state switching for scheduling.
  *
  * This is a two-stage process:
@@ -595,7 +584,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 
 		/* Don't change CR0.TS if we just switch! */
 		if (fpu.preload) {
-			__fpregs_activate(new_fpu);
+			fpregs_activate(new_fpu);
 			trace_x86_fpu_regs_activated(new_fpu);
 			prefetch(&new_fpu->state);
 		}
-- 
2.7.4


* [PATCH 9/9] x86/fpu: split old & new fpu code paths
  2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
                   ` (7 preceding siblings ...)
  2016-10-05  0:34 ` [PATCH 8/9] x86/fpu: remove __fpregs_(de)activate riel
@ 2016-10-05  0:34 ` riel
  2016-10-07  9:47   ` [tip:x86/fpu] x86/fpu: Split old & new FPU " tip-bot for Rik van Riel
  8 siblings, 1 reply; 28+ messages in thread
From: riel @ 2016-10-05  0:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

From: Rik van Riel <riel@redhat.com>

Now that CR0.TS is no longer being manipulated, we can simplify
switch_fpu_prepare() by no longer nesting the handling of the new
FPU inside the two branches for the old FPU.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/include/asm/fpu/internal.h | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index aa7a117b43f8..ef52935f8a17 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -581,23 +581,17 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 		/* But leave fpu_fpregs_owner_ctx! */
 		old_fpu->fpregs_active = 0;
 		trace_x86_fpu_regs_deactivated(old_fpu);
+	} else
+		old_fpu->last_cpu = -1;
 
-		/* Don't change CR0.TS if we just switch! */
-		if (fpu.preload) {
-			fpregs_activate(new_fpu);
-			trace_x86_fpu_regs_activated(new_fpu);
+	if (fpu.preload) {
+		if (fpregs_state_valid(new_fpu, cpu))
+			fpu.preload = 0;
+		else
 			prefetch(&new_fpu->state);
-		}
-	} else {
-		old_fpu->last_cpu = -1;
-		if (fpu.preload) {
-			if (fpregs_state_valid(new_fpu, cpu))
-				fpu.preload = 0;
-			else
-				prefetch(&new_fpu->state);
-			fpregs_activate(new_fpu);
-		}
+		fpregs_activate(new_fpu);
 	}
+
 	return fpu;
 }
 
-- 
2.7.4


* Re: [PATCH 1/9] x86/crypto: Remove X86_FEATURE_EAGER_FPU ifdef from the crc32c code
  2016-10-05  0:34 ` [PATCH 1/9] x86/crypto: Remove X86_FEATURE_EAGER_FPU ifdef from the crc32c code riel
@ 2016-10-05  0:40   ` Rik van Riel
  2016-10-07  9:43   ` [tip:x86/fpu] x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef " tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 28+ messages in thread
From: Rik van Riel @ 2016-10-05  0:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: dave.hansen, x86, tglx, pbonzini, mingo, luto, pa, bp

On Tue, 2016-10-04 at 20:34 -0400, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> From : Andy Lutomirski <luto@kernel.org>

OK, I am not sure why "guilt patchbomb" (which invokes
git patchbomb) would do that :(

Obviously these first four patches are Andy's, not my
own.

-- 
All Rights Reversed.



* Re: [PATCH 5/9] x86/fpu: remove fpu.counter
  2016-10-05  0:34 ` [PATCH 5/9] x86/fpu: remove fpu.counter riel
@ 2016-10-05  2:07   ` Andy Lutomirski
  2016-10-07  9:45   ` [tip:x86/fpu] x86/fpu: Remove struct fpu::counter tip-bot for Rik van Riel
  1 sibling, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2016-10-05  2:07 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, Dave Hansen, X86 ML, Thomas Gleixner,
	Paolo Bonzini, Ingo Molnar, Andrew Lutomirski, pa,
	Borislav Petkov

On Tue, Oct 4, 2016 at 5:34 PM,  <riel@redhat.com> wrote:
> From: Rik van Riel <riel@redhat.com>
>
> With the lazy FPU code gone, we no longer use the counter field
> in struct fpu for anything. Get rid of it.

Reviewed-by: Andy Lutomirski <luto@kernel.org>


* Re: [PATCH 7/9] x86/fpu: rename lazy restore functions to "register state valid"
  2016-10-05  0:34 ` [PATCH 7/9] x86/fpu: rename lazy restore functions to "register state valid" riel
@ 2016-10-05  2:09   ` Andy Lutomirski
  2016-10-07  9:46   ` [tip:x86/fpu] x86/fpu: Rename " tip-bot for Rik van Riel
  1 sibling, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2016-10-05  2:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, Dave Hansen, X86 ML, Thomas Gleixner,
	Paolo Bonzini, Ingo Molnar, Andrew Lutomirski, pa,
	Borislav Petkov

On Tue, Oct 4, 2016 at 5:34 PM,  <riel@redhat.com> wrote:
> From: Rik van Riel <riel@redhat.com>
>
> Name the functions after the state they track, rather than the functionality
> they currently enable. This should make it more obvious when we use the
> fpregs_state_valid() function for something else in the future.

I like this.

Reviewed-by: Andy Lutomirski <luto@kernel.org>


* Re: [PATCH 8/9] x86/fpu: remove __fpregs_(de)activate
  2016-10-05  0:34 ` [PATCH 8/9] x86/fpu: remove __fpregs_(de)activate riel
@ 2016-10-05  2:10   ` Andy Lutomirski
  2016-10-07  9:46   ` [tip:x86/fpu] x86/fpu: Remove __fpregs_(de)activate() tip-bot for Rik van Riel
  1 sibling, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2016-10-05  2:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, Dave Hansen, X86 ML, Thomas Gleixner,
	Paolo Bonzini, Ingo Molnar, Andrew Lutomirski, Borislav Petkov,
	H. Peter Anvin

On Tue, Oct 4, 2016 at 5:34 PM,  <riel@redhat.com> wrote:
> From: Rik van Riel <riel@redhat.com>
>
> Now that fpregs_activate() and fpregs_deactivate() do nothing except
> call the double-underscored versions of themselves, we can get
> rid of the double-underscore versions.

Reviewed-by: Andy Lutomirski <luto@kernel.org>


* Re: [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode
  2016-10-05  0:34 ` [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode riel
@ 2016-10-05  7:14   ` Paolo Bonzini
  2016-10-05 13:57     ` Rik van Riel
  2016-10-07  9:43   ` [tip:x86/fpu] x86/fpu: Hard-disable lazy FPU mode tip-bot for Andy Lutomirski
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-10-05  7:14 UTC (permalink / raw)
  To: riel, linux-kernel; +Cc: dave.hansen, x86, tglx, mingo, luto, pa, bp



On 05/10/2016 02:34, riel@redhat.com wrote:
> From: Andy Lutomirski <luto@kernel.org>
> 
> Since commit 58122bf1d856 ("x86/fpu: Default eagerfpu=on on all
> CPUs") in Linux 4.6, eager FPU mode has been the default on all x86
> systems, and no one has reported any regressions.
> 
> This patch removes the ability to enable lazy mode: use_eager_fpu()
> becomes "return true" and all of the FPU mode selection machinery is
> removed.

I haven't quite followed up on my promise to benchmark lazy vs. eager
FPU, but I probably should do that now...

I see two possible issues with this.  First, AMD as far as I know does
not have XSAVEOPT.  Second, when using virtualization, depending on how
you configure your cluster it's enough to have one pre-SandyBridge Intel
machine to force no XSAVE on all machines.

Thanks,

Paolo


* Re: [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode
  2016-10-05  7:14   ` Paolo Bonzini
@ 2016-10-05 13:57     ` Rik van Riel
  2016-10-05 14:03       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Rik van Riel @ 2016-10-05 13:57 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel; +Cc: dave.hansen, x86, tglx, mingo, luto, pa, bp

On Wed, 2016-10-05 at 09:14 +0200, Paolo Bonzini wrote:
> 
> On 05/10/2016 02:34, riel@redhat.com wrote:
> > 
> > From: Andy Lutomirski <luto@kernel.org>
> > 
> > Since commit 58122bf1d856 ("x86/fpu: Default eagerfpu=on on all
> > CPUs") in Linux 4.6, eager FPU mode has been the default on all x86
> > systems, and no one has reported any regressions.
> > 
> > This patch removes the ability to enable lazy mode: use_eager_fpu()
> > becomes "return true" and all of the FPU mode selection machinery
> > is
> > removed.
> 
> I haven't quite followed up on my promise to benchmark lazy vs. eager
> FPU, but I probably should do that now...
> 
> I see two possible issues with this.  First, AMD as far as I know
> does
> not have XSAVEOPT.  Second, when using virtualization, depending on
> how
> you configure your cluster it's enough to have one pre-SandyBridge
> Intel
> machine to force no XSAVE on all machines.

The "OPT" part of XSAVEOPT does not work across the
host/guest boundary, anyway.

One of the items used in the tuple that determines
whether the optimization can be used is whether
or not the system is in the VMX root, or in a guest.

In other words, across a VMEXIT / VMENTER boundary,
it does full saves & restores, if I am reading the
manual right.

-- 
All Rights Reversed.



* Re: [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode
  2016-10-05 13:57     ` Rik van Riel
@ 2016-10-05 14:03       ` Paolo Bonzini
  2016-10-05 15:59         ` Andy Lutomirski
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-10-05 14:03 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel; +Cc: dave.hansen, x86, tglx, mingo, luto, pa, bp



On 05/10/2016 15:57, Rik van Riel wrote:
> On Wed, 2016-10-05 at 09:14 +0200, Paolo Bonzini wrote:
>>
>> On 05/10/2016 02:34, riel@redhat.com wrote:
>>>
>>> From: Andy Lutomirski <luto@kernel.org>
>>>
>>> Since commit 58122bf1d856 ("x86/fpu: Default eagerfpu=on on all
>>> CPUs") in Linux 4.6, eager FPU mode has been the default on all x86
>>> systems, and no one has reported any regressions.
>>>
>>> This patch removes the ability to enable lazy mode: use_eager_fpu()
>>> becomes "return true" and all of the FPU mode selection machinery
>>> is
>>> removed.
>>
>> I haven't quite followed up on my promise to benchmark lazy vs. eager
>> FPU, but I probably should do that now...
>>
>> I see two possible issues with this.  First, AMD as far as I know does
>> not have XSAVEOPT.  Second, when using virtualization, depending on
>> how you configure your cluster it's enough to have one pre-SandyBridge
>> Intel machine to force no XSAVE on all machines.
> 
> The "OPT" part of XSAVEOPT does not work across the
> host/guest boundary, anyway.

Yes, but it works for bare metal (and in fact eager FPU was keyed on
XSAVEOPT before 58122bf1d856, not XSAVE).

I'm not talking about KVM here; I am just saying that the lazy FPU code
might be used more than we'd like to, because of AMD machines and of
cases where XSAVE is hidden altogether from guests.  Of course it is
quite unlikely that it would be reported as a regression, since things just
work.  But as far as I know 58122bf1d856 went in without any substantial
(or not-so-substantial) benchmarking.

Paolo


* Re: [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode
  2016-10-05 14:03       ` Paolo Bonzini
@ 2016-10-05 15:59         ` Andy Lutomirski
  2016-10-05 16:09           ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-10-05 15:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rik van Riel, linux-kernel, Dave Hansen, X86 ML, Thomas Gleixner,
	Ingo Molnar, Andrew Lutomirski, pa, Borislav Petkov

On Wed, Oct 5, 2016 at 7:03 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 05/10/2016 15:57, Rik van Riel wrote:
>> On Wed, 2016-10-05 at 09:14 +0200, Paolo Bonzini wrote:
>>>
>>> On 05/10/2016 02:34, riel@redhat.com wrote:
>>>>
>>>> From: Andy Lutomirski <luto@kernel.org>
>>>>
>>>> Since commit 58122bf1d856 ("x86/fpu: Default eagerfpu=on on all
>>>> CPUs") in Linux 4.6, eager FPU mode has been the default on all x86
>>>> systems, and no one has reported any regressions.
>>>>
>>>> This patch removes the ability to enable lazy mode: use_eager_fpu()
>>>> becomes "return true" and all of the FPU mode selection machinery
>>>> is
>>>> removed.
>>>
>>> I haven't quite followed up on my promise to benchmark lazy vs. eager
>>> FPU, but I probably should do that now...
>>>
>>> I see two possible issues with this.  First, AMD as far as I know does
>>> not have XSAVEOPT.  Second, when using virtualization, depending on
>>> how you configure your cluster it's enough to have one pre-SandyBridge
>>> Intel machine to force no XSAVE on all machines.
>>
>> The "OPT" part of XSAVEOPT does not work across the
>> host/guest boundary, anyway.
>
> Yes, but it works for bare metal (and in fact eager FPU was keyed on
> XSAVEOPT before 58122bf1d856, not XSAVE).
>
> I'm not talking about KVM here; I am just saying that the lazy FPU code
> might be used more than we'd like to, because of AMD machines and of
> cases where XSAVE is hidden altogether from guests.  Of course it is
> quite unlikely that it would be reported as a regression, since things just
> work.  But as far as I know 58122bf1d856 went in without any substantial
> (or not-so-substantial) benchmarking.

I actually benchmarked the underlying instructions quite a bit on
Intel.  (Not on AMD, but I doubt the results are very different.)
Writes to CR0.TS are *incredibly* slow, as are device-not-available
exceptions.  Keep in mind that, while there's a (slow) CLTS
instruction, there is no corresponding STTS instruction, so we're left
with a fully serializing, slowly microcoded move to CR0.  On SVM, I
think it's worse, because IIRC SVM doesn't have fancy execution
controls that let MOV to CR0 avoid exiting.  We're talking a couple
hundred cycles best case for a TS set/clear pair, and thousands of
cycles if we actually take a fault.

In contrast, an unconditional XSAVE + XRSTOR was considerably faster.
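
(For reference, a minimal user-space sketch of the XSAVE+XRSTOR half of
such a measurement; the CR0.TS side needs ring 0, so it is not shown.
Build with gcc -O2 -mxsave; the numbers are only indicative, since a
hot, cache-resident save area is the best case.)

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <x86intrin.h>

#define LOOPS 1000000ULL

int main(void)
{
	/* generously sized, 64-byte-aligned XSAVE area */
	static unsigned char area[4096] __attribute__((aligned(64)));
	uint64_t mask = 0x3;	/* x87 + SSE; widen to taste */
	uint64_t t0, t1, i;

	memset(area, 0, sizeof(area));
	_xsave64(area, mask);	/* make the header valid for XRSTOR */

	t0 = __rdtsc();
	for (i = 0; i < LOOPS; i++) {
		_xsave64(area, mask);
		_xrstor64(area, mask);
	}
	t1 = __rdtsc();

	printf("%.1f cycles per XSAVE+XRSTOR pair\n",
	       (double)(t1 - t0) / LOOPS);
	return 0;
}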

This leads to the counterintuitive result that, if we switch from task
A to B and back and task A is heavily using the FPU, then it's faster
to unconditionally save and restore the full state both ways than it
is to set and clear TS so we can avoid it.

I would guess that the lazy mode hasn't been a win under most
workloads for many years.  It's worse on 64-bit CPUs, since almost all
userspace uses XMM regs for memcpy.  At least on 32-bit CPUs, SIMD
instructions weren't always available and userspace was conservative.

--Andy


* Re: [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode
  2016-10-05 15:59         ` Andy Lutomirski
@ 2016-10-05 16:09           ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2016-10-05 16:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Rik van Riel, linux-kernel, Dave Hansen, X86 ML, Thomas Gleixner,
	Ingo Molnar, Andrew Lutomirski, pa, Borislav Petkov



On 05/10/2016 17:59, Andy Lutomirski wrote:
> I actually benchmarked the underlying instructions quite a bit on
> Intel.  (Not on AMD, but I doubt the results are very different.)
> Writes to CR0.TS are *incredibly* slow, as are device-not-available
> exceptions.  Keep in mind that, while there's a (slow) CLTS
> instruction, there is no corresponding STTS instruction, so we're left
> with a fully serializing, slowly microcoded move to CR0.  On SVM, I
> think it's worse, because IIRC SVM doesn't have fancy execution
> controls that let MOV to CR0 avoid exiting.

SVM lets you choose whether to trap on TS and MP; update_cr0_intercept
is where KVM does that (the "selective CR0 write" intercept is always
on, while the "CR0 write" intercept is toggled in that function).

> We're talking a couple
> hundred cycles best case for a TS set/clear pair, and thousands of
> cycles if we actually take a fault.
> 
> In contrast, an unconditional XSAVE + XRSTOR was considerably faster.

Did you also do a comparison against FXSAVE/FXRSTOR (on either pre- or
post-SandyBridge processors)?

But yeah, it's possible that the lack of STTS screws the whole plan,
despite the fpu.preload optimization in switch_fpu_prepare.

Paolo

> This leads to the counterintuitive result that, if we switch from task
> A to B and back and task A is heavily using the FPU, then it's faster
> to unconditionally save and restore the full state both ways than it
> is to set and clear TS so we can avoid it.


* [tip:x86/fpu] x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef from the crc32c code
  2016-10-05  0:34 ` [PATCH 1/9] x86/crypto: Remove X86_FEATURE_EAGER_FPU ifdef from the crc32c code riel
  2016-10-05  0:40   ` Rik van Riel
@ 2016-10-07  9:43   ` tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-10-07  9:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: fenghua.yu, jpoimboe, dvlasenk, brgerst, dave.hansen, peterz,
	riel, mingo, hpa, tglx, quentin.casasnovas, oleg, linux-kernel,
	luto, bp, torvalds

Commit-ID:  02f39b2379fb81557ae864ec8f85421c0250c954
Gitweb:     http://git.kernel.org/tip/02f39b2379fb81557ae864ec8f85421c0250c954
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 4 Oct 2016 20:34:30 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 7 Oct 2016 11:14:06 +0200

x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef from the crc32c code

The crypto code was checking both use_eager_fpu() and
defined(X86_FEATURE_EAGER_FPU).  The latter was nonsensical, so
remove it.  This will avoid breakage when we remove
X86_FEATURE_EAGER_FPU.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1475627678-20788-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/crypto/crc32c-intel_glue.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 0857b1a..715399b 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -58,16 +58,11 @@
 asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
 				unsigned int crc_init);
 static int crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_EAGERFPU;
-#if defined(X86_FEATURE_EAGER_FPU)
 #define set_pcl_breakeven_point()					\
 do {									\
 	if (!use_eager_fpu())						\
 		crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_NOEAGERFPU;	\
 } while (0)
-#else
-#define set_pcl_breakeven_point()					\
-	(crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_NOEAGERFPU)
-#endif
 #endif /* CONFIG_X86_64 */
 
 static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length)

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/fpu] x86/fpu: Hard-disable lazy FPU mode
  2016-10-05  0:34 ` [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode riel
  2016-10-05  7:14   ` Paolo Bonzini
@ 2016-10-07  9:43   ` tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-10-07  9:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, dave.hansen, dvlasenk, mingo, riel, brgerst,
	quentin.casasnovas, bp, tglx, torvalds, fenghua.yu, jpoimboe,
	hpa, peterz, oleg, luto

Commit-ID:  ca6938a1cd8a1c5e861a99b67f84ac166fc2b9e7
Gitweb:     http://git.kernel.org/tip/ca6938a1cd8a1c5e861a99b67f84ac166fc2b9e7
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 4 Oct 2016 20:34:31 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 7 Oct 2016 11:14:17 +0200

x86/fpu: Hard-disable lazy FPU mode

Since commit:

  58122bf1d856 ("x86/fpu: Default eagerfpu=on on all CPUs")

... in Linux 4.6, eager FPU mode has been the default on all x86
systems, and no one has reported any regressions.

This patch removes the ability to enable lazy mode: use_eager_fpu()
becomes "return true" and all of the FPU mode selection machinery is
removed.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1475627678-20788-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/cpufeatures.h  |  2 +-
 arch/x86/include/asm/fpu/internal.h |  2 +-
 arch/x86/kernel/fpu/init.c          | 91 ++-----------------------------------
 3 files changed, 5 insertions(+), 90 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1188bc8..b212b86 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -104,7 +104,7 @@
 #define X86_FEATURE_EXTD_APICID	( 3*32+26) /* has extended APICID (8 bits) */
 #define X86_FEATURE_AMD_DCM     ( 3*32+27) /* multi-node processor */
 #define X86_FEATURE_APERFMPERF	( 3*32+28) /* APERFMPERF */
-#define X86_FEATURE_EAGER_FPU	( 3*32+29) /* "eagerfpu" Non lazy FPU restore */
+/* free, was #define X86_FEATURE_EAGER_FPU	( 3*32+29) * "eagerfpu" Non lazy FPU restore */
 #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
 
 /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 2737366..8852e3a 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -62,7 +62,7 @@ extern u64 fpu__get_supported_xfeatures_mask(void);
  */
 static __always_inline __pure bool use_eager_fpu(void)
 {
-	return static_cpu_has(X86_FEATURE_EAGER_FPU);
+	return true;
 }
 
 static __always_inline __pure bool use_xsaveopt(void)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 2f2b8c7..1a09d13 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -15,10 +15,7 @@
  */
 static void fpu__init_cpu_ctx_switch(void)
 {
-	if (!boot_cpu_has(X86_FEATURE_EAGER_FPU))
-		stts();
-	else
-		clts();
+	clts();
 }
 
 /*
@@ -233,82 +230,16 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 }
 
 /*
- * FPU context switching strategies:
- *
- * Against popular belief, we don't do lazy FPU saves, due to the
- * task migration complications it brings on SMP - we only do
- * lazy FPU restores.
- *
- * 'lazy' is the traditional strategy, which is based on setting
- * CR0::TS to 1 during context-switch (instead of doing a full
- * restore of the FPU state), which causes the first FPU instruction
- * after the context switch (whenever it is executed) to fault - at
- * which point we lazily restore the FPU state into FPU registers.
- *
- * Tasks are of course under no obligation to execute FPU instructions,
- * so it can easily happen that another context-switch occurs without
- * a single FPU instruction being executed. If we eventually switch
- * back to the original task (that still owns the FPU) then we have
- * not only saved the restores along the way, but we also have the
- * FPU ready to be used for the original task.
- *
- * 'lazy' is deprecated because it's almost never a performance win
- * and it's much more complicated than 'eager'.
- *
- * 'eager' switching is by default on all CPUs, there we switch the FPU
- * state during every context switch, regardless of whether the task
- * has used FPU instructions in that time slice or not. This is done
- * because modern FPU context saving instructions are able to optimize
- * state saving and restoration in hardware: they can detect both
- * unused and untouched FPU state and optimize accordingly.
- *
- * [ Note that even in 'lazy' mode we might optimize context switches
- *   to use 'eager' restores, if we detect that a task is using the FPU
- *   frequently. See the fpu->counter logic in fpu/internal.h for that. ]
- */
-static enum { ENABLE, DISABLE } eagerfpu = ENABLE;
-
-/*
  * Find supported xfeatures based on cpu features and command-line input.
  * This must be called after fpu__init_parse_early_param() is called and
  * xfeatures_mask is enumerated.
  */
 u64 __init fpu__get_supported_xfeatures_mask(void)
 {
-	/* Support all xfeatures known to us */
-	if (eagerfpu != DISABLE)
-		return XCNTXT_MASK;
-
-	/* Warning of xfeatures being disabled for no eagerfpu mode */
-	if (xfeatures_mask & XFEATURE_MASK_EAGER) {
-		pr_err("x86/fpu: eagerfpu switching disabled, disabling the following xstate features: 0x%llx.\n",
-			xfeatures_mask & XFEATURE_MASK_EAGER);
-	}
-
-	/* Return a mask that masks out all features requiring eagerfpu mode */
-	return ~XFEATURE_MASK_EAGER;
-}
-
-/*
- * Disable features dependent on eagerfpu.
- */
-static void __init fpu__clear_eager_fpu_features(void)
-{
-	setup_clear_cpu_cap(X86_FEATURE_MPX);
+	return XCNTXT_MASK;
 }
 
-/*
- * Pick the FPU context switching strategy:
- *
- * When eagerfpu is AUTO or ENABLE, we ensure it is ENABLE if either of
- * the following is true:
- *
- * (1) the cpu has xsaveopt, as it has the optimization and doing eager
- *     FPU switching has a relatively low cost compared to a plain xsave;
- * (2) the cpu has xsave features (e.g. MPX) that depend on eager FPU
- *     switching. Should the kernel boot with noxsaveopt, we support MPX
- *     with eager FPU switching at a higher cost.
- */
+/* Legacy code to initialize eager fpu mode. */
 static void __init fpu__init_system_ctx_switch(void)
 {
 	static bool on_boot_cpu __initdata = 1;
@@ -317,17 +248,6 @@ static void __init fpu__init_system_ctx_switch(void)
 	on_boot_cpu = 0;
 
 	WARN_ON_FPU(current->thread.fpu.fpstate_active);
-
-	if (boot_cpu_has(X86_FEATURE_XSAVEOPT) && eagerfpu != DISABLE)
-		eagerfpu = ENABLE;
-
-	if (xfeatures_mask & XFEATURE_MASK_EAGER)
-		eagerfpu = ENABLE;
-
-	if (eagerfpu == ENABLE)
-		setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
-
-	printk(KERN_INFO "x86/fpu: Using '%s' FPU context switches.\n", eagerfpu == ENABLE ? "eager" : "lazy");
 }
 
 /*
@@ -336,11 +256,6 @@ static void __init fpu__init_system_ctx_switch(void)
  */
 static void __init fpu__init_parse_early_param(void)
 {
-	if (cmdline_find_option_bool(boot_command_line, "eagerfpu=off")) {
-		eagerfpu = DISABLE;
-		fpu__clear_eager_fpu_features();
-	}
-
 	if (cmdline_find_option_bool(boot_command_line, "no387"))
 		setup_clear_cpu_cap(X86_FEATURE_FPU);
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/fpu] x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction
  2016-10-05  0:34 ` [PATCH 3/9] x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction riel
@ 2016-10-07  9:44   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-10-07  9:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, bp, peterz, hpa, torvalds, luto, oleg, dvlasenk,
	brgerst, jpoimboe, linux-kernel, fenghua.yu, tglx, riel,
	quentin.casasnovas, mingo

Commit-ID:  2f7fada23549b4657e9ea92eeddebc878db569d2
Gitweb:     http://git.kernel.org/tip/2f7fada23549b4657e9ea92eeddebc878db569d2
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 4 Oct 2016 20:34:32 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 7 Oct 2016 11:14:29 +0200

x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction

Now that lazy mode is gone, we don't need to distinguish which
xfeatures require eager mode.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1475627678-20788-4-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu/xstate.h | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index d4957ac..1b2799e 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -21,20 +21,16 @@
 /* Supervisor features */
 #define XFEATURE_MASK_SUPERVISOR (XFEATURE_MASK_PT)
 
-/* Supported features which support lazy state saving */
-#define XFEATURE_MASK_LAZY	(XFEATURE_MASK_FP | \
+/* All currently supported features */
+#define XCNTXT_MASK		(XFEATURE_MASK_FP | \
 				 XFEATURE_MASK_SSE | \
 				 XFEATURE_MASK_YMM | \
 				 XFEATURE_MASK_OPMASK | \
 				 XFEATURE_MASK_ZMM_Hi256 | \
 				 XFEATURE_MASK_Hi16_ZMM	 | \
-				 XFEATURE_MASK_PKRU)
-
-/* Supported features which require eager state saving */
-#define XFEATURE_MASK_EAGER	(XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)
-
-/* All currently supported features */
-#define XCNTXT_MASK	(XFEATURE_MASK_LAZY | XFEATURE_MASK_EAGER)
+				 XFEATURE_MASK_PKRU | \
+				 XFEATURE_MASK_BNDREGS | \
+				 XFEATURE_MASK_BNDCSR)
 
 #ifdef CONFIG_X86_64
 #define REX_PREFIX	"0x48, "

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/fpu] x86/fpu: Remove use_eager_fpu()
  2016-10-05  0:34 ` [PATCH 4/9] x86/fpu: Remove use_eager_fpu() riel
@ 2016-10-07  9:44   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-10-07  9:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, luto, dvlasenk, riel, brgerst, jpoimboe, oleg, torvalds,
	tglx, quentin.casasnovas, linux-kernel, mingo, hpa, bp,
	fenghua.yu, dave.hansen

Commit-ID:  c592b57347069abfc0dcad3b3a302cf882602597
Gitweb:     http://git.kernel.org/tip/c592b57347069abfc0dcad3b3a302cf882602597
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 4 Oct 2016 20:34:33 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 7 Oct 2016 11:14:36 +0200

x86/fpu: Remove use_eager_fpu()

This removes all the obvious code paths that depend on lazy FPU mode.
It shouldn't change the generated code at all.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1475627678-20788-5-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/crypto/crc32c-intel_glue.c | 17 ++++-------------
 arch/x86/include/asm/fpu/internal.h | 34 +--------------------------------
 arch/x86/kernel/fpu/core.c          | 38 +++++--------------------------------
 arch/x86/kernel/fpu/signal.c        |  8 +++-----
 arch/x86/kernel/fpu/xstate.c        |  9 ---------
 arch/x86/kvm/cpuid.c                |  4 +---
 arch/x86/kvm/x86.c                  | 10 ----------
 7 files changed, 14 insertions(+), 106 deletions(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 715399b..c194d57 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -48,21 +48,13 @@
 #ifdef CONFIG_X86_64
 /*
  * use carryless multiply version of crc32c when buffer
- * size is >= 512 (when eager fpu is enabled) or
- * >= 1024 (when eager fpu is disabled) to account
+ * size is >= 512 to account
  * for fpu state save/restore overhead.
  */
-#define CRC32C_PCL_BREAKEVEN_EAGERFPU	512
-#define CRC32C_PCL_BREAKEVEN_NOEAGERFPU	1024
+#define CRC32C_PCL_BREAKEVEN	512
 
 asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
 				unsigned int crc_init);
-static int crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_EAGERFPU;
-#define set_pcl_breakeven_point()					\
-do {									\
-	if (!use_eager_fpu())						\
-		crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_NOEAGERFPU;	\
-} while (0)
 #endif /* CONFIG_X86_64 */
 
 static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length)
@@ -185,7 +177,7 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
 	 * use faster PCL version if datasize is large enough to
 	 * overcome kernel fpu state save/restore overhead
 	 */
-	if (len >= crc32c_pcl_breakeven && irq_fpu_usable()) {
+	if (len >= CRC32C_PCL_BREAKEVEN && irq_fpu_usable()) {
 		kernel_fpu_begin();
 		*crcp = crc_pcl(data, len, *crcp);
 		kernel_fpu_end();
@@ -197,7 +189,7 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
 static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
 				u8 *out)
 {
-	if (len >= crc32c_pcl_breakeven && irq_fpu_usable()) {
+	if (len >= CRC32C_PCL_BREAKEVEN && irq_fpu_usable()) {
 		kernel_fpu_begin();
 		*(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
 		kernel_fpu_end();
@@ -256,7 +248,6 @@ static int __init crc32c_intel_mod_init(void)
 		alg.update = crc32c_pcl_intel_update;
 		alg.finup = crc32c_pcl_intel_finup;
 		alg.digest = crc32c_pcl_intel_digest;
-		set_pcl_breakeven_point();
 	}
 #endif
 	return crypto_register_shash(&alg);
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 8852e3a..7801d32 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -60,11 +60,6 @@ extern u64 fpu__get_supported_xfeatures_mask(void);
 /*
  * FPU related CPU feature flag helper routines:
  */
-static __always_inline __pure bool use_eager_fpu(void)
-{
-	return true;
-}
-
 static __always_inline __pure bool use_xsaveopt(void)
 {
 	return static_cpu_has(X86_FEATURE_XSAVEOPT);
@@ -501,24 +496,6 @@ static inline int fpu_want_lazy_restore(struct fpu *fpu, unsigned int cpu)
 }
 
 
-/*
- * Wrap lazy FPU TS handling in a 'hw fpregs activation/deactivation'
- * idiom, which is then paired with the sw-flag (fpregs_active) later on:
- */
-
-static inline void __fpregs_activate_hw(void)
-{
-	if (!use_eager_fpu())
-		clts();
-}
-
-static inline void __fpregs_deactivate_hw(void)
-{
-	if (!use_eager_fpu())
-		stts();
-}
-
-/* Must be paired with an 'stts' (fpregs_deactivate_hw()) after! */
 static inline void __fpregs_deactivate(struct fpu *fpu)
 {
 	WARN_ON_FPU(!fpu->fpregs_active);
@@ -528,7 +505,6 @@ static inline void __fpregs_deactivate(struct fpu *fpu)
 	trace_x86_fpu_regs_deactivated(fpu);
 }
 
-/* Must be paired with a 'clts' (fpregs_activate_hw()) before! */
 static inline void __fpregs_activate(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu->fpregs_active);
@@ -554,22 +530,17 @@ static inline int fpregs_active(void)
 }
 
 /*
- * Encapsulate the CR0.TS handling together with the
- * software flag.
- *
  * These generally need preemption protection to work,
  * do try to avoid using these on their own.
  */
 static inline void fpregs_activate(struct fpu *fpu)
 {
-	__fpregs_activate_hw();
 	__fpregs_activate(fpu);
 }
 
 static inline void fpregs_deactivate(struct fpu *fpu)
 {
 	__fpregs_deactivate(fpu);
-	__fpregs_deactivate_hw();
 }
 
 /*
@@ -596,8 +567,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 	 * or if the past 5 consecutive context-switches used math.
 	 */
 	fpu.preload = static_cpu_has(X86_FEATURE_FPU) &&
-		      new_fpu->fpstate_active &&
-		      (use_eager_fpu() || new_fpu->counter > 5);
+		      new_fpu->fpstate_active;
 
 	if (old_fpu->fpregs_active) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
@@ -615,8 +585,6 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 			__fpregs_activate(new_fpu);
 			trace_x86_fpu_regs_activated(new_fpu);
 			prefetch(&new_fpu->state);
-		} else {
-			__fpregs_deactivate_hw();
 		}
 	} else {
 		old_fpu->counter = 0;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3fc03a0..036e14f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -57,27 +57,9 @@ static bool kernel_fpu_disabled(void)
 	return this_cpu_read(in_kernel_fpu);
 }
 
-/*
- * Were we in an interrupt that interrupted kernel mode?
- *
- * On others, we can do a kernel_fpu_begin/end() pair *ONLY* if that
- * pair does nothing at all: the thread must not have fpu (so
- * that we don't try to save the FPU state), and TS must
- * be set (so that the clts/stts pair does nothing that is
- * visible in the interrupted kernel thread).
- *
- * Except for the eagerfpu case when we return true; in the likely case
- * the thread has FPU but we are not going to set/clear TS.
- */
 static bool interrupted_kernel_fpu_idle(void)
 {
-	if (kernel_fpu_disabled())
-		return false;
-
-	if (use_eager_fpu())
-		return true;
-
-	return !current->thread.fpu.fpregs_active && (read_cr0() & X86_CR0_TS);
+	return !kernel_fpu_disabled();
 }
 
 /*
@@ -125,7 +107,6 @@ void __kernel_fpu_begin(void)
 		copy_fpregs_to_fpstate(fpu);
 	} else {
 		this_cpu_write(fpu_fpregs_owner_ctx, NULL);
-		__fpregs_activate_hw();
 	}
 }
 EXPORT_SYMBOL(__kernel_fpu_begin);
@@ -136,8 +117,6 @@ void __kernel_fpu_end(void)
 
 	if (fpu->fpregs_active)
 		copy_kernel_to_fpregs(&fpu->state);
-	else
-		__fpregs_deactivate_hw();
 
 	kernel_fpu_enable();
 }
@@ -199,10 +178,7 @@ void fpu__save(struct fpu *fpu)
 	trace_x86_fpu_before_save(fpu);
 	if (fpu->fpregs_active) {
 		if (!copy_fpregs_to_fpstate(fpu)) {
-			if (use_eager_fpu())
-				copy_kernel_to_fpregs(&fpu->state);
-			else
-				fpregs_deactivate(fpu);
+			copy_kernel_to_fpregs(&fpu->state);
 		}
 	}
 	trace_x86_fpu_after_save(fpu);
@@ -259,8 +235,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 	 * Don't let 'init optimized' areas of the XSAVE area
 	 * leak into the child task:
 	 */
-	if (use_eager_fpu())
-		memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
+	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
 
 	/*
 	 * Save current FPU registers directly into the child
@@ -282,10 +257,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 		memcpy(&src_fpu->state, &dst_fpu->state,
 		       fpu_kernel_xstate_size);
 
-		if (use_eager_fpu())
-			copy_kernel_to_fpregs(&src_fpu->state);
-		else
-			fpregs_deactivate(src_fpu);
+		copy_kernel_to_fpregs(&src_fpu->state);
 	}
 	preempt_enable();
 
@@ -517,7 +489,7 @@ void fpu__clear(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu); /* Almost certainly an anomaly */
 
-	if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
+	if (!static_cpu_has(X86_FEATURE_FPU)) {
 		/* FPU state will be reallocated lazily at the first use. */
 		fpu__drop(fpu);
 	} else {
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a184c21..83c23c2 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -340,11 +340,9 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		}
 
 		fpu->fpstate_active = 1;
-		if (use_eager_fpu()) {
-			preempt_disable();
-			fpu__restore(fpu);
-			preempt_enable();
-		}
+		preempt_disable();
+		fpu__restore(fpu);
+		preempt_enable();
 
 		return err;
 	} else {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 01567aa..76bc2a1 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -886,15 +886,6 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	 */
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return -EINVAL;
-	/*
-	 * For most XSAVE components, this would be an arduous task:
-	 * brining fpstate up to date with fpregs, updating fpstate,
-	 * then re-populating fpregs.  But, for components that are
-	 * never lazily managed, we can just access the fpregs
-	 * directly.  PKRU is never managed lazily, so we can just
-	 * manipulate it directly.  Make sure it stays that way.
-	 */
-	WARN_ON_ONCE(!use_eager_fpu());
 
 	/* Set the bits we need in PKRU:  */
 	if (init_val & PKEY_DISABLE_ACCESS)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 3235e0f..1cf143e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -16,7 +16,6 @@
 #include <linux/export.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
-#include <asm/fpu/internal.h> /* For use_eager_fpu.  Ugh! */
 #include <asm/user.h>
 #include <asm/fpu/xstate.h>
 #include "cpuid.h"
@@ -114,8 +113,7 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 	if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
-	if (use_eager_fpu())
-		kvm_x86_ops->fpu_activate(vcpu);
+	kvm_x86_ops->fpu_activate(vcpu);
 
 	/*
 	 * The existing code assumes virtual address is 48-bit in the canonical
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 699f872..59d7761 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7357,16 +7357,6 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
 	__kernel_fpu_end();
 	++vcpu->stat.fpu_reload;
-	/*
-	 * If using eager FPU mode, or if the guest is a frequent user
-	 * of the FPU, just leave the FPU active for next time.
-	 * Every 255 times fpu_counter rolls over to 0; a guest that uses
-	 * the FPU in bursts will revert to loading it on demand.
-	 */
-	if (!use_eager_fpu()) {
-		if (++vcpu->fpu_counter < 5)
-			kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
-	}
 	trace_kvm_fpu(0);
 }
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/fpu] x86/fpu: Remove struct fpu::counter
  2016-10-05  0:34 ` [PATCH 5/9] x86/fpu: remove fpu.counter riel
  2016-10-05  2:07   ` Andy Lutomirski
@ 2016-10-07  9:45   ` tip-bot for Rik van Riel
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Rik van Riel @ 2016-10-07  9:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, brgerst, dave.hansen, quentin.casasnovas, hpa,
	dvlasenk, jpoimboe, peterz, luto, oleg, bp, mingo, tglx,
	fenghua.yu, riel, linux-kernel

Commit-ID:  3913cc3507575273beb165a5e027a081913ed507
Gitweb:     http://git.kernel.org/tip/3913cc3507575273beb165a5e027a081913ed507
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Tue, 4 Oct 2016 20:34:34 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 7 Oct 2016 11:14:40 +0200

x86/fpu: Remove struct fpu::counter

With the lazy FPU code gone, we no longer use the counter field
in struct fpu for anything. Get rid of it.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1475627678-20788-6-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu/internal.h |  3 ---
 arch/x86/include/asm/fpu/types.h    | 11 -----------
 arch/x86/include/asm/trace/fpu.h    |  5 +----
 arch/x86/kernel/fpu/core.c          |  3 ---
 4 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 7801d32..499d6ed 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -581,16 +581,13 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 
 		/* Don't change CR0.TS if we just switch! */
 		if (fpu.preload) {
-			new_fpu->counter++;
 			__fpregs_activate(new_fpu);
 			trace_x86_fpu_regs_activated(new_fpu);
 			prefetch(&new_fpu->state);
 		}
 	} else {
-		old_fpu->counter = 0;
 		old_fpu->last_cpu = -1;
 		if (fpu.preload) {
-			new_fpu->counter++;
 			if (fpu_want_lazy_restore(new_fpu, cpu))
 				fpu.preload = 0;
 			else
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 48df486..e31332d 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -322,17 +322,6 @@ struct fpu {
 	unsigned char			fpregs_active;
 
 	/*
-	 * @counter:
-	 *
-	 * This counter contains the number of consecutive context switches
-	 * during which the FPU stays used. If this is over a threshold, the
-	 * lazy FPU restore logic becomes eager, to save the trap overhead.
-	 * This is an unsigned char so that after 256 iterations the counter
-	 * wraps and the context switch behavior turns lazy again; this is to
-	 * deal with bursty apps that only use the FPU for a short time:
-	 */
-	unsigned char			counter;
-	/*
 	 * @state:
 	 *
 	 * In-memory copy of all FPU registers that we save/restore
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index 9217ab1..342e597 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -14,7 +14,6 @@ DECLARE_EVENT_CLASS(x86_fpu,
 		__field(struct fpu *, fpu)
 		__field(bool, fpregs_active)
 		__field(bool, fpstate_active)
-		__field(int, counter)
 		__field(u64, xfeatures)
 		__field(u64, xcomp_bv)
 		),
@@ -23,17 +22,15 @@ DECLARE_EVENT_CLASS(x86_fpu,
 		__entry->fpu		= fpu;
 		__entry->fpregs_active	= fpu->fpregs_active;
 		__entry->fpstate_active	= fpu->fpstate_active;
-		__entry->counter	= fpu->counter;
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
 			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
 			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
 		}
 	),
-	TP_printk("x86/fpu: %p fpregs_active: %d fpstate_active: %d counter: %d xfeatures: %llx xcomp_bv: %llx",
+	TP_printk("x86/fpu: %p fpregs_active: %d fpstate_active: %d xfeatures: %llx xcomp_bv: %llx",
 			__entry->fpu,
 			__entry->fpregs_active,
 			__entry->fpstate_active,
-			__entry->counter,
 			__entry->xfeatures,
 			__entry->xcomp_bv
 	)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 036e14f..6a37d52 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -222,7 +222,6 @@ EXPORT_SYMBOL_GPL(fpstate_init);
 
 int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 {
-	dst_fpu->counter = 0;
 	dst_fpu->fpregs_active = 0;
 	dst_fpu->last_cpu = -1;
 
@@ -430,7 +429,6 @@ void fpu__restore(struct fpu *fpu)
 	trace_x86_fpu_before_restore(fpu);
 	fpregs_activate(fpu);
 	copy_kernel_to_fpregs(&fpu->state);
-	fpu->counter++;
 	trace_x86_fpu_after_restore(fpu);
 	kernel_fpu_enable();
 }
@@ -448,7 +446,6 @@ EXPORT_SYMBOL_GPL(fpu__restore);
 void fpu__drop(struct fpu *fpu)
 {
 	preempt_disable();
-	fpu->counter = 0;
 
 	if (fpu->fpregs_active) {
 		/* Ignore delayed exceptions from user space */

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/fpu] x86/fpu, kvm: Remove KVM vcpu->fpu_counter
  2016-10-05  0:34 ` [PATCH 6/9] x86/fpu,kvm: remove kvm vcpu->fpu_counter riel
@ 2016-10-07  9:45   ` tip-bot for Rik van Riel
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Rik van Riel @ 2016-10-07  9:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, brgerst, quentin.casasnovas, riel, dvlasenk, torvalds,
	jpoimboe, bp, oleg, hpa, peterz, dave.hansen, tglx, luto,
	fenghua.yu, linux-kernel

Commit-ID:  3d42de25d290fdfe604835d1b389845b8cba5bff
Gitweb:     http://git.kernel.org/tip/3d42de25d290fdfe604835d1b389845b8cba5bff
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Tue, 4 Oct 2016 20:34:35 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 7 Oct 2016 11:14:41 +0200

x86/fpu, kvm: Remove KVM vcpu->fpu_counter

With the removal of the lazy FPU code, this field is no longer used.
Get rid of it.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1475627678-20788-7-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kvm/x86.c       | 4 +---
 include/linux/kvm_host.h | 1 -
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59d7761..2c7e775 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7348,10 +7348,8 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	if (!vcpu->guest_fpu_loaded) {
-		vcpu->fpu_counter = 0;
+	if (!vcpu->guest_fpu_loaded)
 		return;
-	}
 
 	vcpu->guest_fpu_loaded = 0;
 	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9c28b4d..4e6905c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -224,7 +224,6 @@ struct kvm_vcpu {
 
 	int fpu_active;
 	int guest_fpu_loaded, guest_xcr0_loaded;
-	unsigned char fpu_counter;
 	struct swait_queue_head wq;
 	struct pid *pid;
 	int sigset_active;

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/fpu] x86/fpu: Rename lazy restore functions to "register state valid"
  2016-10-05  0:34 ` [PATCH 7/9] x86/fpu: rename lazy restore functions to "register state valid" riel
  2016-10-05  2:09   ` Andy Lutomirski
@ 2016-10-07  9:46   ` tip-bot for Rik van Riel
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Rik van Riel @ 2016-10-07  9:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, hpa, torvalds, jpoimboe, dave.hansen, bp, dvlasenk, mingo,
	riel, quentin.casasnovas, peterz, linux-kernel, brgerst,
	fenghua.yu, luto, oleg

Commit-ID:  25d83b531c1aa4fca5b4e24ed10f493268f162bc
Gitweb:     http://git.kernel.org/tip/25d83b531c1aa4fca5b4e24ed10f493268f162bc
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Tue, 4 Oct 2016 20:34:36 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 7 Oct 2016 11:14:41 +0200

x86/fpu: Rename lazy restore functions to "register state valid"

Name the functions after the state they track, rather than the function
they currently enable. This should make it more obvious when we use the
fpregs_state_valid() function for something else in the future.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1475627678-20788-8-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu/internal.h | 26 ++++++++++++++++++++------
 arch/x86/kernel/fpu/core.c          |  4 ++--
 arch/x86/kernel/smpboot.c           |  2 +-
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 499d6ed..d2cfe16 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -479,18 +479,32 @@ extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size)
 DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
 /*
+ * The in-register FPU state for an FPU context on a CPU is assumed to be
+ * valid if the fpu->last_cpu matches the CPU, and the fpu_fpregs_owner_ctx
+ * matches the FPU.
+ *
+ * If the FPU register state is valid, the kernel can skip restoring the
+ * FPU state from memory.
+ *
+ * Any code that clobbers the FPU registers or updates the in-memory
+ * FPU state for a task MUST let the rest of the kernel know that the
+ * FPU registers are no longer valid for this task. Calling either of
+ * these two invalidate functions is enough, use whichever is convenient.
+ *
  * Must be run with preemption disabled: this clears the fpu_fpregs_owner_ctx,
  * on this CPU.
- *
- * This will disable any lazy FPU state restore of the current FPU state,
- * but if the current thread owns the FPU, it will still be saved by.
  */
-static inline void __cpu_disable_lazy_restore(unsigned int cpu)
+static inline void __cpu_invalidate_fpregs_state(unsigned int cpu)
 {
 	per_cpu(fpu_fpregs_owner_ctx, cpu) = NULL;
 }
 
-static inline int fpu_want_lazy_restore(struct fpu *fpu, unsigned int cpu)
+static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
+{
+	fpu->last_cpu = -1;
+}
+
+static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
 {
 	return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
 }
@@ -588,7 +602,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 	} else {
 		old_fpu->last_cpu = -1;
 		if (fpu.preload) {
-			if (fpu_want_lazy_restore(new_fpu, cpu))
+			if (fpregs_state_valid(new_fpu, cpu))
 				fpu.preload = 0;
 			else
 				prefetch(&new_fpu->state);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 6a37d52..25a45dd 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -336,7 +336,7 @@ void fpu__activate_fpstate_write(struct fpu *fpu)
 
 	if (fpu->fpstate_active) {
 		/* Invalidate any lazy state: */
-		fpu->last_cpu = -1;
+		__fpu_invalidate_fpregs_state(fpu);
 	} else {
 		fpstate_init(&fpu->state);
 		trace_x86_fpu_init_state(fpu);
@@ -379,7 +379,7 @@ void fpu__current_fpstate_write_begin(void)
 	 * ensures we will not be lazy and skip a XRSTOR in the
 	 * future.
 	 */
-	fpu->last_cpu = -1;
+	__fpu_invalidate_fpregs_state(fpu);
 }
 
 /*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 42a9362..ca4c4ca 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1111,7 +1111,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 		return err;
 
 	/* the FPU context is blank, nobody can own it */
-	__cpu_disable_lazy_restore(cpu);
+	__cpu_invalidate_fpregs_state(cpu);
 
 	common_cpu_up(cpu, tidle);
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/fpu] x86/fpu: Remove __fpregs_(de)activate()
  2016-10-05  0:34 ` [PATCH 8/9] x86/fpu: remove __fpregs_(de)activate riel
  2016-10-05  2:10   ` Andy Lutomirski
@ 2016-10-07  9:46   ` tip-bot for Rik van Riel
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Rik van Riel @ 2016-10-07  9:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: quentin.casasnovas, torvalds, jpoimboe, dave.hansen, mingo, luto,
	riel, linux-kernel, oleg, fenghua.yu, dvlasenk, bp, hpa, peterz,
	tglx, brgerst

Commit-ID:  66f314efca3843a8874405ab015e354d041f86dd
Gitweb:     http://git.kernel.org/tip/66f314efca3843a8874405ab015e354d041f86dd
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Tue, 4 Oct 2016 20:34:37 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 7 Oct 2016 11:14:42 +0200

x86/fpu: Remove __fpregs_(de)activate()

Now that fpregs_activate() and fpregs_deactivate() do nothing except
call the double-underscored versions of themselves, we can get
rid of the double-underscore versions.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1475627678-20788-9-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu/internal.h | 25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d2cfe16..d0324bc 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -509,8 +509,11 @@ static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
 	return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
 }
 
-
-static inline void __fpregs_deactivate(struct fpu *fpu)
+/*
+ * These generally need preemption protection to work,
+ * do try to avoid using these on their own:
+ */
+static inline void fpregs_deactivate(struct fpu *fpu)
 {
 	WARN_ON_FPU(!fpu->fpregs_active);
 
@@ -519,7 +522,7 @@ static inline void __fpregs_deactivate(struct fpu *fpu)
 	trace_x86_fpu_regs_deactivated(fpu);
 }
 
-static inline void __fpregs_activate(struct fpu *fpu)
+static inline void fpregs_activate(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu->fpregs_active);
 
@@ -544,20 +547,6 @@ static inline int fpregs_active(void)
 }
 
 /*
- * These generally need preemption protection to work,
- * do try to avoid using these on their own.
- */
-static inline void fpregs_activate(struct fpu *fpu)
-{
-	__fpregs_activate(fpu);
-}
-
-static inline void fpregs_deactivate(struct fpu *fpu)
-{
-	__fpregs_deactivate(fpu);
-}
-
-/*
  * FPU state switching for scheduling.
  *
  * This is a two-stage process:
@@ -595,7 +584,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 
 		/* Don't change CR0.TS if we just switch! */
 		if (fpu.preload) {
-			__fpregs_activate(new_fpu);
+			fpregs_activate(new_fpu);
 			trace_x86_fpu_regs_activated(new_fpu);
 			prefetch(&new_fpu->state);
 		}

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/fpu] x86/fpu: Split old & new FPU code paths
  2016-10-05  0:34 ` [PATCH 9/9] x86/fpu: split old & new fpu code paths riel
@ 2016-10-07  9:47   ` tip-bot for Rik van Riel
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Rik van Riel @ 2016-10-07  9:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: riel, oleg, brgerst, hpa, peterz, quentin.casasnovas, tglx,
	mingo, linux-kernel, torvalds, jpoimboe, dvlasenk, dave.hansen,
	luto, bp, fenghua.yu

Commit-ID:  9ad93fe35aff616fca4e2b9581fdeed498605f9e
Gitweb:     http://git.kernel.org/tip/9ad93fe35aff616fca4e2b9581fdeed498605f9e
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Tue, 4 Oct 2016 20:34:38 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 7 Oct 2016 11:14:43 +0200

x86/fpu: Split old & new FPU code paths

Now that CR0.TS is no longer being manipulated, we can simplify
switch_fpu_prepare() by no longer nesting the handling of new_fpu
inside the two branches for the old_fpu.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1475627678-20788-10-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu/internal.h | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d0324bc..1dcb29e 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -581,23 +581,17 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 		/* But leave fpu_fpregs_owner_ctx! */
 		old_fpu->fpregs_active = 0;
 		trace_x86_fpu_regs_deactivated(old_fpu);
+	} else
+		old_fpu->last_cpu = -1;
 
-		/* Don't change CR0.TS if we just switch! */
-		if (fpu.preload) {
-			fpregs_activate(new_fpu);
-			trace_x86_fpu_regs_activated(new_fpu);
+	if (fpu.preload) {
+		if (fpregs_state_valid(new_fpu, cpu))
+			fpu.preload = 0;
+		else
 			prefetch(&new_fpu->state);
-		}
-	} else {
-		old_fpu->last_cpu = -1;
-		if (fpu.preload) {
-			if (fpregs_state_valid(new_fpu, cpu))
-				fpu.preload = 0;
-			else
-				prefetch(&new_fpu->state);
-			fpregs_activate(new_fpu);
-		}
+		fpregs_activate(new_fpu);
 	}
+
 	return fpu;
 }
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-10-07  9:50 UTC]

Thread overview: 28+ messages
2016-10-05  0:34 [PATCH 0/9] x86/fpu: remove lazy FPU mode & various FPU cleanups riel
2016-10-05  0:34 ` [PATCH 1/9] x86/crypto: Remove X86_FEATURE_EAGER_FPU ifdef from the crc32c code riel
2016-10-05  0:40   ` Rik van Riel
2016-10-07  9:43   ` [tip:x86/fpu] x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef " tip-bot for Andy Lutomirski
2016-10-05  0:34 ` [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode riel
2016-10-05  7:14   ` Paolo Bonzini
2016-10-05 13:57     ` Rik van Riel
2016-10-05 14:03       ` Paolo Bonzini
2016-10-05 15:59         ` Andy Lutomirski
2016-10-05 16:09           ` Paolo Bonzini
2016-10-07  9:43   ` [tip:x86/fpu] x86/fpu: Hard-disable lazy FPU mode tip-bot for Andy Lutomirski
2016-10-05  0:34 ` [PATCH 3/9] x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction riel
2016-10-07  9:44   ` [tip:x86/fpu] " tip-bot for Andy Lutomirski
2016-10-05  0:34 ` [PATCH 4/9] x86/fpu: Remove use_eager_fpu() riel
2016-10-07  9:44   ` [tip:x86/fpu] " tip-bot for Andy Lutomirski
2016-10-05  0:34 ` [PATCH 5/9] x86/fpu: remove fpu.counter riel
2016-10-05  2:07   ` Andy Lutomirski
2016-10-07  9:45   ` [tip:x86/fpu] x86/fpu: Remove struct fpu::counter tip-bot for Rik van Riel
2016-10-05  0:34 ` [PATCH 6/9] x86/fpu,kvm: remove kvm vcpu->fpu_counter riel
2016-10-07  9:45   ` [tip:x86/fpu] x86/fpu, kvm: Remove KVM vcpu->fpu_counter tip-bot for Rik van Riel
2016-10-05  0:34 ` [PATCH 7/9] x86/fpu: rename lazy restore functions to "register state valid" riel
2016-10-05  2:09   ` Andy Lutomirski
2016-10-07  9:46   ` [tip:x86/fpu] x86/fpu: Rename " tip-bot for Rik van Riel
2016-10-05  0:34 ` [PATCH 8/9] x86/fpu: remove __fpregs_(de)activate riel
2016-10-05  2:10   ` Andy Lutomirski
2016-10-07  9:46   ` [tip:x86/fpu] x86/fpu: Remove __fpregs_(de)activate() tip-bot for Rik van Riel
2016-10-05  0:34 ` [PATCH 9/9] x86/fpu: split old & new fpu code paths riel
2016-10-07  9:47   ` [tip:x86/fpu] x86/fpu: Split old & new FPU " tip-bot for Rik van Riel
