* [PATCH v3 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage
From: Andy Lutomirski @ 2021-01-21  5:09 UTC
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski

This series fixes two regressions: a boot failure on AMD K7 and a
performance regression on everything.

I did a double-take here -- the regressions were reported by different
people, both named Krzysztof :)

Changes from v2:
 - Tidy up the if statements (Sean)
 - Changelog and comment improvements (Boris)

Changes from v1:
 - Fix MMX better -- MMX really does need FNINIT.
 - Improve the EFI code.
 - Rename the KFPU constants.
 - Changelog improvements.

Andy Lutomirski (4):
  x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
  x86/mmx: Use KFPU_387 for MMX string operations
  x86/fpu: Make the EFI FPU calling convention explicit
  x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

 arch/x86/include/asm/efi.h     | 24 ++++++++++++++++++++----
 arch/x86/include/asm/fpu/api.h | 27 +++++++++++++++++++++++++--
 arch/x86/kernel/fpu/core.c     |  9 +++++----
 arch/x86/lib/mmx_32.c          | 20 +++++++++++++++-----
 arch/x86/platform/efi/efi_64.c |  4 ++--
 5 files changed, 67 insertions(+), 17 deletions(-)

-- 
2.29.2


* [PATCH v3 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
From: Andy Lutomirski @ 2021-01-21  5:09 UTC
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski

Currently, requesting kernel FPU access doesn't distinguish which parts of
the extended ("FPU") state are needed.  This is nice for simplicity, but
there are a few cases in which it's suboptimal:

 - The vast majority of in-kernel FPU users want XMM/YMM/ZMM state but do
   not use legacy 387 state.  These users want MXCSR initialized but don't
   care about the FPU control word.  Skipping FNINIT would save time.
   (Empirically, FNINIT is several times slower than LDMXCSR.)

 - Code that wants MMX doesn't want or need MXCSR initialized.
   _mmx_memcpy(), for example, can run before CR4.OSFXSR gets set, and
   initializing MXCSR will fail because LDMXCSR generates an #UD when the
   aforementioned CR4 bit is not set.

 - Any future in-kernel users of XFD (eXtended Feature Disable)-capable
   dynamic states will need special handling.

Add a more specific API that allows callers to specify exactly what they
want.
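
For illustration, a typical XMM-only caller would then look like this
(a minimal sketch, not code from this series):

	/*
	 * Hypothetical SSE-only user: it wants MXCSR sanitized but never
	 * touches 387 state, so it skips the comparatively slow FNINIT.
	 */
	kernel_fpu_begin_mask(KFPU_MXCSR);
	/* ... XMM/YMM/ZMM work ... */
	kernel_fpu_end();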

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/fpu/api.h | 15 +++++++++++++--
 arch/x86/kernel/fpu/core.c     |  9 +++++----
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index dcd9503b1098..38f4936045ab 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -16,14 +16,25 @@
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
  * of time.
- * If you intend to use the FPU in softirq you need to check first with
+ * If you intend to use the FPU in irq/softirq you need to check first with
  * irq_fpu_usable() if it is possible.
  */
-extern void kernel_fpu_begin(void);
+
+/* Kernel FPU states to initialize in kernel_fpu_begin_mask() */
+#define KFPU_387	_BITUL(0)	/* 387 state will be initialized */
+#define KFPU_MXCSR	_BITUL(1)	/* MXCSR will be initialized */
+
+extern void kernel_fpu_begin_mask(unsigned int kfpu_mask);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
 extern void fpregs_mark_activate(void);
 
+/* Code that is unaware of kernel_fpu_begin_mask() can use this */
+static inline void kernel_fpu_begin(void)
+{
+	kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR);
+}
+
 /*
  * Use fpregs_lock() while editing CPU's FPU registers or fpu->state.
  * A context switch will (and softirq might) save CPU's FPU registers to
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index eb86a2b831b1..571220ac8bea 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -121,7 +121,7 @@ int copy_fpregs_to_fpstate(struct fpu *fpu)
 }
 EXPORT_SYMBOL(copy_fpregs_to_fpstate);
 
-void kernel_fpu_begin(void)
+void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 {
 	preempt_disable();
 
@@ -141,13 +141,14 @@ void kernel_fpu_begin(void)
 	}
 	__cpu_invalidate_fpregs_state();
 
-	if (boot_cpu_has(X86_FEATURE_XMM))
+	/* Put sane initial values into the control registers. */
+	if (likely(kfpu_mask & KFPU_MXCSR) && boot_cpu_has(X86_FEATURE_XMM))
 		ldmxcsr(MXCSR_DEFAULT);
 
-	if (boot_cpu_has(X86_FEATURE_FPU))
+	if (unlikely(kfpu_mask & KFPU_387) && boot_cpu_has(X86_FEATURE_FPU))
 		asm volatile ("fninit");
 }
-EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+EXPORT_SYMBOL_GPL(kernel_fpu_begin_mask);
 
 void kernel_fpu_end(void)
 {
-- 
2.29.2


* [PATCH v3 2/4] x86/mmx: Use KFPU_387 for MMX string operations
From: Andy Lutomirski @ 2021-01-21  5:09 UTC
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski, stable

The default kernel_fpu_begin() doesn't work on systems that support XMM but
haven't yet enabled CR4.OSFXSR.  This causes crashes when _mmx_memcpy() is
called too early because LDMXCSR generates #UD when the aforementioned bit
is clear.

Fix it by using kernel_fpu_begin_mask(KFPU_387) explicitly.
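
In practice the fixed call sites follow this pattern (taken from the
hunks below; the comment is added here for illustration):

	/*
	 * Request 387 init (FNINIT) only: kernel_fpu_begin_mask() then
	 * never executes LDMXCSR, which would #UD before CR4.OSFXSR is
	 * set.
	 */
	kernel_fpu_begin_mask(KFPU_387);
	/* ... MMX operations ... */
	kernel_fpu_end();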

Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()")
Cc: <stable@vger.kernel.org>
Reported-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/lib/mmx_32.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..ad1dabce931e 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -26,6 +26,16 @@
 #include <asm/fpu/api.h>
 #include <asm/asm.h>
 
+/*
+ * Use KFPU_387.  MMX instructions are not affected by MXCSR,
+ * but both AMD and Intel documentation states that even integer MMX
+ * operations will result in #MF if an exception is pending in FCW.
+ *
+ * EMMS is not needed afterwards because, after we call kernel_fpu_end(),
+ * any subsequent user of the 387 stack will reinitialize it using
+ * KFPU_387.
+ */
+
 void *_mmx_memcpy(void *to, const void *from, size_t len)
 {
 	void *p;
@@ -37,7 +47,7 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
 	p = to;
 	i = len >> 6; /* len/64 */
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"		/* This set is 28 bytes */
@@ -127,7 +137,7 @@ static void fast_clear_page(void *page)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	__asm__ __volatile__ (
 		"  pxor %%mm0, %%mm0\n" : :
@@ -160,7 +170,7 @@ static void fast_copy_page(void *to, void *from)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	/*
 	 * maybe the prefetch stuff can go before the expensive fnsave...
@@ -247,7 +257,7 @@ static void fast_clear_page(void *page)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	__asm__ __volatile__ (
 		"  pxor %%mm0, %%mm0\n" : :
@@ -282,7 +292,7 @@ static void fast_copy_page(void *to, void *from)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"
-- 
2.29.2


* [PATCH v3 3/4] x86/fpu: Make the EFI FPU calling convention explicit
From: Andy Lutomirski @ 2021-01-21  5:09 UTC
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski

EFI uses kernel_fpu_begin() to conform to the UEFI calling convention.
This specifically requires initializing FCW, whereas no sane 64-bit kernel
code should use legacy 387 operations that reference FCW.

This should enable us to safely change the default semantics of
kernel_fpu_begin() to stop initializing FCW on 64-bit kernels.
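
The resulting call-site shape is sketched below; do_firmware_call() is a
hypothetical stand-in for the arch_efi_call_virt() machinery:

	efi_fpu_begin();		/* UEFI ABI: FCW and MXCSR sanitized */
	status = do_firmware_call();	/* hypothetical firmware entry */
	efi_fpu_end();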

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/efi.h     | 24 ++++++++++++++++++++----
 arch/x86/platform/efi/efi_64.c |  4 ++--
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index bc9758ef292e..f3201544fbf8 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -68,17 +68,33 @@ extern unsigned long efi_fw_vendor, efi_config_table;
 		#f " called with too many arguments (" #p ">" #n ")");	\
 })
 
+static inline void efi_fpu_begin(void)
+{
+	/*
+	 * The UEFI calling convention (UEFI spec 2.3.2 and 2.3.4) requires
+	 * that FCW and MXCSR (64-bit) must be initialized prior to calling
+	 * UEFI code.  (Oddly the spec does not require that the FPU stack
+	 * be empty.)
+	 */
+	kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR);
+}
+
+static inline void efi_fpu_end(void)
+{
+	kernel_fpu_end();
+}
+
 #ifdef CONFIG_X86_32
 #define arch_efi_call_virt_setup()					\
 ({									\
-	kernel_fpu_begin();						\
+	efi_fpu_begin();						\
 	firmware_restrict_branch_speculation_start();			\
 })
 
 #define arch_efi_call_virt_teardown()					\
 ({									\
 	firmware_restrict_branch_speculation_end();			\
-	kernel_fpu_end();						\
+	efi_fpu_end();							\
 })
 
 #define arch_efi_call_virt(p, f, args...)	p->f(args)
@@ -107,7 +123,7 @@ struct efi_scratch {
 #define arch_efi_call_virt_setup()					\
 ({									\
 	efi_sync_low_kernel_mappings();					\
-	kernel_fpu_begin();						\
+	efi_fpu_begin();						\
 	firmware_restrict_branch_speculation_start();			\
 	efi_switch_mm(&efi_mm);						\
 })
@@ -119,7 +135,7 @@ struct efi_scratch {
 ({									\
 	efi_switch_mm(efi_scratch.prev_mm);				\
 	firmware_restrict_branch_speculation_end();			\
-	kernel_fpu_end();						\
+	efi_fpu_end();							\
 })
 
 #ifdef CONFIG_KASAN
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 8f5759df7776..f042040b5da1 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -848,7 +848,7 @@ efi_set_virtual_address_map(unsigned long memory_map_size,
 							 virtual_map);
 	efi_switch_mm(&efi_mm);
 
-	kernel_fpu_begin();
+	efi_fpu_begin();
 
 	/* Disable interrupts around EFI calls: */
 	local_irq_save(flags);
@@ -857,7 +857,7 @@ efi_set_virtual_address_map(unsigned long memory_map_size,
 			  descriptor_version, virtual_map);
 	local_irq_restore(flags);
 
-	kernel_fpu_end();
+	efi_fpu_end();
 
 	/* grab the virtually remapped EFI runtime services table pointer */
 	efi.runtime = READ_ONCE(systab->runtime);
-- 
2.29.2


* [PATCH v3 4/4] x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
From: Andy Lutomirski @ 2021-01-21  5:09 UTC
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski

The remaining callers of kernel_fpu_begin() in 64-bit kernels don't use 387
instructions, so there's no need to sanitize the FPU state.  Skip it to get
most of the performance we lost back.
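
With this change, 64-bit code that genuinely needs the legacy 387 state
must opt in explicitly (a sketch, assuming such a caller existed):

	/*
	 * kernel_fpu_begin() no longer runs FNINIT on 64-bit kernels,
	 * so x87 users must name KFPU_387 themselves.
	 */
	kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR);
	/* ... legacy x87 work ... */
	kernel_fpu_end();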

Reported-by: Krzysztof Olędzki <ole@ans.pl>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/fpu/api.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 38f4936045ab..435bc59d539b 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -32,7 +32,19 @@ extern void fpregs_mark_activate(void);
 /* Code that is unaware of kernel_fpu_begin_mask() can use this */
 static inline void kernel_fpu_begin(void)
 {
+#ifdef CONFIG_X86_64
+	/*
+	 * Any 64-bit code that uses 387 instructions must explicitly request
+	 * KFPU_387.
+	 */
+	kernel_fpu_begin_mask(KFPU_MXCSR);
+#else
+	/*
+	 * 32-bit kernel code may use 387 operations as well as SSE2, etc,
+	 * as long as it checks that the CPU has the required capability.
+	 */
 	kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR);
+#endif
 }
 
 /*
-- 
2.29.2


* Re: [PATCH v3 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage
From: Krzysztof Olędzki @ 2021-01-21  9:29 UTC
  To: Andy Lutomirski, x86; +Cc: LKML, Krzysztof Mazur, Arnd Bergmann

On 2021-01-20 at 21:09, Andy Lutomirski wrote:
> This series fixes two regressions: a boot failure on AMD K7 and a
> performance regression on everything.
> 
> I did a double-take here -- the regressions were reported by different
> people, both named Krzysztof :)
> 
> Changes from v2:
>   - Tidy up the if statements (Sean)
>   - Changelog and comment improvements (Boris)
> 
> Changes from v1:
>   - Fix MMX better -- MMX really does need FNINIT.
>   - Improve the EFI code.
>   - Rename the KFPU constants.
>   - Changelog improvements.
> 
> Andy Lutomirski (4):
>    x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
>    x86/mmx: Use KFPU_387 for MMX string operations
>    x86/fpu: Make the EFI FPU calling convention explicit
>    x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

Hi Andy,

I have tested the new patchset on the following CPUs running 5.4.90
(with some adjustments required for it to apply) and 5.10.9 kernels:
  - AMD Phenom(tm) II X3 B77 Processor (family: 0x10, model: 0x4, stepping: 0x3)
  - Intel(R) Xeon(R) CPU 3070  @ 2.66GHz (family: 0x6, model: 0xf, stepping: 0x6)
  - Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz (family: 0x6, model: 0x3a, stepping: 0x9)

For all of them, it was possible to recover most of the performance lost
due to the introduction of "Reset MXCSR to default in kernel_fpu_begin":
  - B77: 90% instead of 82% for prefetch64-sse, 92% instead of 84% for generic_sse
  - 3070: 93% instead of 86% for prefetch64-sse, 93% instead of 88% for generic_sse
  - 1280v2: 99% instead of 88% for prefetch64-sse, 99% instead of 88% for generic_sse.

For some reason, 1280v2 (Ivy Bridge) sees almost no regression for
prefetch64-sse and generic_sse. The only issue is that AVX is still at
67% of its original performance, although that is still better than the
60% seen before this series. There is no AVX on the other two CPUs.

I was using 64-bit kernels for testing; please let me know if 32-bit
testing is also needed.

Tested-by: Krzysztof Piotr Olędzki <ole@ans.pl>

Thanks,
  Krzysztof


* Re: [PATCH v3 2/4] x86/mmx: Use KFPU_387 for MMX string operations
From: Krzysztof Mazur @ 2021-01-21 11:29 UTC
  To: Andy Lutomirski; +Cc: x86, LKML, Krzysztof Olędzki, Arnd Bergmann, stable

On Wed, Jan 20, 2021 at 09:09:49PM -0800, Andy Lutomirski wrote:
> The default kernel_fpu_begin() doesn't work on systems that support XMM but
> haven't yet enabled CR4.OSFXSR.  This causes crashes when _mmx_memcpy() is
> called too early because LDMXCSR generates #UD when the aforementioned bit
> is clear.
> 
> Fix it by using kernel_fpu_begin_mask(KFPU_387) explicitly.
> 
> Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()")
> Cc: <stable@vger.kernel.org>
> Reported-by: Krzysztof Mazur <krzysiek@podlesie.net>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/lib/mmx_32.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)

Thanks, 5.10 + this patch series boots on K7 with SSE.

Tested-by: Krzysztof Mazur <krzysiek@podlesie.net>

Regards,
Krzysiek

* [tip: x86/urgent] x86/mmx: Use KFPU_387 for MMX string operations
From: tip-bot2 for Andy Lutomirski @ 2021-01-21 14:26 UTC
  To: linux-tip-commits
  Cc: Krzysztof Mazur, Andy Lutomirski, Borislav Petkov, ole, stable,
	x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     67de8dca50c027ca0fa3b62a488ee5035036a0da
Gitweb:        https://git.kernel.org/tip/67de8dca50c027ca0fa3b62a488ee5035036a0da
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Wed, 20 Jan 2021 21:09:49 -08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Thu, 21 Jan 2021 13:39:36 +01:00

x86/mmx: Use KFPU_387 for MMX string operations

The default kernel_fpu_begin() doesn't work on systems that support XMM but
haven't yet enabled CR4.OSFXSR.  This causes crashes when _mmx_memcpy() is
called too early because LDMXCSR generates #UD when the aforementioned bit
is clear.

Fix it by using kernel_fpu_begin_mask(KFPU_387) explicitly.

Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()")
Reported-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Krzysztof Piotr Olędzki <ole@ans.pl>
Tested-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/e7bf21855fe99e5f3baa27446e32623358f69e8d.1611205691.git.luto@kernel.org
---
 arch/x86/lib/mmx_32.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa0..419365c 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -26,6 +26,16 @@
 #include <asm/fpu/api.h>
 #include <asm/asm.h>
 
+/*
+ * Use KFPU_387.  MMX instructions are not affected by MXCSR,
+ * but both AMD and Intel documentation states that even integer MMX
+ * operations will result in #MF if an exception is pending in FCW.
+ *
+ * EMMS is not needed afterwards because, after calling kernel_fpu_end(),
+ * any subsequent user of the 387 stack will reinitialize it using
+ * KFPU_387.
+ */
+
 void *_mmx_memcpy(void *to, const void *from, size_t len)
 {
 	void *p;
@@ -37,7 +47,7 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
 	p = to;
 	i = len >> 6; /* len/64 */
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"		/* This set is 28 bytes */
@@ -127,7 +137,7 @@ static void fast_clear_page(void *page)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	__asm__ __volatile__ (
 		"  pxor %%mm0, %%mm0\n" : :
@@ -160,7 +170,7 @@ static void fast_copy_page(void *to, void *from)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	/*
 	 * maybe the prefetch stuff can go before the expensive fnsave...
@@ -247,7 +257,7 @@ static void fast_clear_page(void *page)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	__asm__ __volatile__ (
 		"  pxor %%mm0, %%mm0\n" : :
@@ -282,7 +292,7 @@ static void fast_copy_page(void *to, void *from)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_387);
 
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"

* [tip: x86/urgent] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
From: tip-bot2 for Andy Lutomirski @ 2021-01-21 14:26 UTC
  To: linux-tip-commits
  Cc: Andy Lutomirski, Borislav Petkov, ole, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     e45122893a9870813f9bd7b4add4f613e6f29008
Gitweb:        https://git.kernel.org/tip/e45122893a9870813f9bd7b4add4f613e6f29008
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Wed, 20 Jan 2021 21:09:48 -08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Thu, 21 Jan 2021 12:07:28 +01:00

x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state

Currently, requesting kernel FPU access doesn't distinguish which parts of
the extended ("FPU") state are needed.  This is nice for simplicity, but
there are a few cases in which it's suboptimal:

 - The vast majority of in-kernel FPU users want XMM/YMM/ZMM state but do
   not use legacy 387 state.  These users want MXCSR initialized but don't
   care about the FPU control word.  Skipping FNINIT would save time.
   (Empirically, FNINIT is several times slower than LDMXCSR.)

 - Code that wants MMX doesn't want or need MXCSR initialized.
   _mmx_memcpy(), for example, can run before CR4.OSFXSR gets set, and
   initializing MXCSR will fail because LDMXCSR generates an #UD when the
   aforementioned CR4 bit is not set.

 - Any future in-kernel users of XFD (eXtended Feature Disable)-capable
   dynamic states will need special handling.

Add a more specific API that allows callers to specify exactly what they
want.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Krzysztof Piotr Olędzki <ole@ans.pl>
Link: https://lkml.kernel.org/r/aff1cac8b8fc7ee900cf73e8f2369966621b053f.1611205691.git.luto@kernel.org
---
 arch/x86/include/asm/fpu/api.h | 15 +++++++++++++--
 arch/x86/kernel/fpu/core.c     |  9 +++++----
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index a5aba4a..67a4f1c 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -16,14 +16,25 @@
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
  * of time.
- * If you intend to use the FPU in softirq you need to check first with
+ * If you intend to use the FPU in irq/softirq you need to check first with
  * irq_fpu_usable() if it is possible.
  */
-extern void kernel_fpu_begin(void);
+
+/* Kernel FPU states to initialize in kernel_fpu_begin_mask() */
+#define KFPU_387	_BITUL(0)	/* 387 state will be initialized */
+#define KFPU_MXCSR	_BITUL(1)	/* MXCSR will be initialized */
+
+extern void kernel_fpu_begin_mask(unsigned int kfpu_mask);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
 extern void fpregs_mark_activate(void);
 
+/* Code that is unaware of kernel_fpu_begin_mask() can use this */
+static inline void kernel_fpu_begin(void)
+{
+	kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR);
+}
+
 /*
  * Use fpregs_lock() while editing CPU's FPU registers or fpu->state.
  * A context switch will (and softirq might) save CPU's FPU registers to
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index eb86a2b..571220a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -121,7 +121,7 @@ int copy_fpregs_to_fpstate(struct fpu *fpu)
 }
 EXPORT_SYMBOL(copy_fpregs_to_fpstate);
 
-void kernel_fpu_begin(void)
+void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 {
 	preempt_disable();
 
@@ -141,13 +141,14 @@ void kernel_fpu_begin(void)
 	}
 	__cpu_invalidate_fpregs_state();
 
-	if (boot_cpu_has(X86_FEATURE_XMM))
+	/* Put sane initial values into the control registers. */
+	if (likely(kfpu_mask & KFPU_MXCSR) && boot_cpu_has(X86_FEATURE_XMM))
 		ldmxcsr(MXCSR_DEFAULT);
 
-	if (boot_cpu_has(X86_FEATURE_FPU))
+	if (unlikely(kfpu_mask & KFPU_387) && boot_cpu_has(X86_FEATURE_FPU))
 		asm volatile ("fninit");
 }
-EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+EXPORT_SYMBOL_GPL(kernel_fpu_begin_mask);
 
 void kernel_fpu_end(void)
 {
