linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage
@ 2021-01-18  6:20 Andy Lutomirski
  2021-01-18  6:20 ` [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state Andy Lutomirski
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Andy Lutomirski @ 2021-01-18  6:20 UTC (permalink / raw)
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski

This series fixes two regressions: a boot failure on AMD K7 and a
performance regression on everything.

I did a double-take here -- the regressions were reported by different
people, both named Krzysztof :)

Andy Lutomirski (4):
  x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
  x86/mmx: Use KFPU_MMX for MMX string operations
  x86/fpu: Make the EFI FPU calling convention explicit
  x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

 arch/x86/include/asm/efi.h     |  4 ++--
 arch/x86/include/asm/fpu/api.h | 35 ++++++++++++++++++++++++++++++++--
 arch/x86/kernel/fpu/core.c     | 17 +++++++++++------
 arch/x86/lib/mmx_32.c          | 10 +++++-----
 arch/x86/platform/efi/efi_64.c |  2 +-
 5 files changed, 52 insertions(+), 16 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
  2021-01-18  6:20 [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage Andy Lutomirski
@ 2021-01-18  6:20 ` Andy Lutomirski
  2021-01-18 12:21   ` Peter Zijlstra
                     ` (2 more replies)
  2021-01-18  6:20 ` [PATCH 2/4] x86/mmx: Use KFPU_MMX for MMX string operations Andy Lutomirski
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 10+ messages in thread
From: Andy Lutomirski @ 2021-01-18  6:20 UTC (permalink / raw)
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski

Currently, requesting kernel FPU access doesn't distinguish which parts of
the extended ("FPU") state are needed.  This is nice for simplicity, but
there are a few cases in which it's suboptimal:

 - The vast majority of in-kernel FPU users want XMM/YMM/ZMM state but do
   not use legacy 387 state.  These users want MXCSR initialized but don't
   care about the FPU control word.  Skipping FNINIT would save time.
   (Empirically, FNINIT is several times slower than LDMXCSR.)

 - Code that wants MMX doesn't want need MXCSR or FCW initialized.
   _mmx_memcpy(), for example, can run before CR4.OSFXSR gets set, and
   initializing MXCSR will fail.

 - Any future in-kernel users of XFD (eXtended Feature Disable)-capable
   dynamic states will need special handling.

This patch adds a more specific API that allows callers specify exactly
what they want.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/fpu/api.h | 16 ++++++++++++++--
 arch/x86/kernel/fpu/core.c     | 17 +++++++++++------
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index dcd9503b1098..133907a200ef 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -16,14 +16,26 @@
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
  * of time.
- * If you intend to use the FPU in softirq you need to check first with
+ * If you intend to use the FPU in irq/softirq you need to check first with
  * irq_fpu_usable() if it is possible.
  */
-extern void kernel_fpu_begin(void);
+
+/* Kernel FPU states to initialize in kernel_fpu_begin_mask() */
+#define KFPU_387	_BITUL(0)	/* FCW will be initialized */
+#define KFPU_XYZMM	_BITUL(1)	/* MXCSR will be initialized */
+#define KFPU_MMX	0		/* nothing gets initialized */
+
+extern void kernel_fpu_begin_mask(unsigned int kfpu_mask);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
 extern void fpregs_mark_activate(void);
 
+/* Code that is unaware of kernel_fpu_begin_mask() can use this */
+static inline void kernel_fpu_begin(void)
+{
+	kernel_fpu_begin_mask(KFPU_387 | KFPU_XYZMM);
+}
+
 /*
  * Use fpregs_lock() while editing CPU's FPU registers or fpu->state.
  * A context switch will (and softirq might) save CPU's FPU registers to
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index eb86a2b831b1..52d05c806aa6 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -121,7 +121,7 @@ int copy_fpregs_to_fpstate(struct fpu *fpu)
 }
 EXPORT_SYMBOL(copy_fpregs_to_fpstate);
 
-void kernel_fpu_begin(void)
+void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 {
 	preempt_disable();
 
@@ -141,13 +141,18 @@ void kernel_fpu_begin(void)
 	}
 	__cpu_invalidate_fpregs_state();
 
-	if (boot_cpu_has(X86_FEATURE_XMM))
-		ldmxcsr(MXCSR_DEFAULT);
+	/* Put sane initial values into the control registers. */
+	if (likely(kfpu_mask & KFPU_XYZMM)) {
+		if (boot_cpu_has(X86_FEATURE_XMM))
+			ldmxcsr(MXCSR_DEFAULT);
+	}
 
-	if (boot_cpu_has(X86_FEATURE_FPU))
-		asm volatile ("fninit");
+	if (unlikely(kfpu_mask & KFPU_387)) {
+		if (boot_cpu_has(X86_FEATURE_FPU))
+			asm volatile ("fninit");
+	}
 }
-EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+EXPORT_SYMBOL_GPL(kernel_fpu_begin_mask);
 
 void kernel_fpu_end(void)
 {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] x86/mmx: Use KFPU_MMX for MMX string operations
  2021-01-18  6:20 [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage Andy Lutomirski
  2021-01-18  6:20 ` [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state Andy Lutomirski
@ 2021-01-18  6:20 ` Andy Lutomirski
  2021-01-18  6:20 ` [PATCH 3/4] x86/fpu: Make the EFI FPU calling convention explicit Andy Lutomirski
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2021-01-18  6:20 UTC (permalink / raw)
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski

The default kernel_fpu_begin() doesn't work on systems that support XMM but
haven't yet enabled OSFXSR.  This causes crashes when _mmx_memcpy() is
called too early.

Fix it by using kernel_fpu_begin(KFPU_MMX) explicitly.  This should also be
faster, since it skips both the reasonably fast LDMXCSR and also the rather
slow FNINIT instructions.

Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()")
Reported-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/lib/mmx_32.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..daa80fa005fb 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -37,7 +37,7 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
 	p = to;
 	i = len >> 6; /* len/64 */
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_MMX);
 
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"		/* This set is 28 bytes */
@@ -127,7 +127,7 @@ static void fast_clear_page(void *page)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_MMX);
 
 	__asm__ __volatile__ (
 		"  pxor %%mm0, %%mm0\n" : :
@@ -160,7 +160,7 @@ static void fast_copy_page(void *to, void *from)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_MMX);
 
 	/*
 	 * maybe the prefetch stuff can go before the expensive fnsave...
@@ -247,7 +247,7 @@ static void fast_clear_page(void *page)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_MMX);
 
 	__asm__ __volatile__ (
 		"  pxor %%mm0, %%mm0\n" : :
@@ -282,7 +282,7 @@ static void fast_copy_page(void *to, void *from)
 {
 	int i;
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_MMX);
 
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] x86/fpu: Make the EFI FPU calling convention explicit
  2021-01-18  6:20 [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage Andy Lutomirski
  2021-01-18  6:20 ` [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state Andy Lutomirski
  2021-01-18  6:20 ` [PATCH 2/4] x86/mmx: Use KFPU_MMX for MMX string operations Andy Lutomirski
@ 2021-01-18  6:20 ` Andy Lutomirski
  2021-01-18 12:13   ` Borislav Petkov
  2021-01-18  6:20 ` [PATCH 4/4] x86/fpu/64: Don't FNINIT in kernel_fpu_begin() Andy Lutomirski
  2021-01-18  8:25 ` [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage Krzysztof Olędzki
  4 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2021-01-18  6:20 UTC (permalink / raw)
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski

EFI uses kernel_fpu_begin() to conform to the UEFI calling convention.
This specifically requires initializing FCW, whereas no sane 64-bit kernel
code should use legacy 387 operations that reference FCW.

Add KFPU_EFI to make this self-documenting, and use it in the EFI code.
This should enable us to safely change the default semantics of
kernel_fpu_begin() to stop initializing FCW on 64-bit kernels.

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/efi.h     | 4 ++--
 arch/x86/include/asm/fpu/api.h | 7 +++++++
 arch/x86/platform/efi/efi_64.c | 2 +-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index bc9758ef292e..c60be69a5c82 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -71,7 +71,7 @@ extern unsigned long efi_fw_vendor, efi_config_table;
 #ifdef CONFIG_X86_32
 #define arch_efi_call_virt_setup()					\
 ({									\
-	kernel_fpu_begin();						\
+	kernel_fpu_begin_mask(KFPU_EFI);				\
 	firmware_restrict_branch_speculation_start();			\
 })
 
@@ -107,7 +107,7 @@ struct efi_scratch {
 #define arch_efi_call_virt_setup()					\
 ({									\
 	efi_sync_low_kernel_mappings();					\
-	kernel_fpu_begin();						\
+	kernel_fpu_begin_mask(KFPU_EFI);				\
 	firmware_restrict_branch_speculation_start();			\
 	efi_switch_mm(&efi_mm);						\
 })
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 133907a200ef..e95a06845443 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -25,6 +25,13 @@
 #define KFPU_XYZMM	_BITUL(1)	/* MXCSR will be initialized */
 #define KFPU_MMX	0		/* nothing gets initialized */
 
+/*
+ * The UEFI calling convention (UEFI spec 2.3.2 and 2.3.4) requires
+ * that FCW (32-bit and 64-bit) and MXCSR (64-bit) must be initialized
+ * prior to calling UEFI code.
+ */
+#define KFPU_EFI	(KFPU_387 | KFPU_XYZMM)
+
 extern void kernel_fpu_begin_mask(unsigned int kfpu_mask);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 8f5759df7776..c304c8da862b 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -848,7 +848,7 @@ efi_set_virtual_address_map(unsigned long memory_map_size,
 							 virtual_map);
 	efi_switch_mm(&efi_mm);
 
-	kernel_fpu_begin();
+	kernel_fpu_begin_mask(KFPU_EFI);
 
 	/* Disable interrupts around EFI calls: */
 	local_irq_save(flags);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
  2021-01-18  6:20 [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage Andy Lutomirski
                   ` (2 preceding siblings ...)
  2021-01-18  6:20 ` [PATCH 3/4] x86/fpu: Make the EFI FPU calling convention explicit Andy Lutomirski
@ 2021-01-18  6:20 ` Andy Lutomirski
  2021-01-18  8:25 ` [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage Krzysztof Olędzki
  4 siblings, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2021-01-18  6:20 UTC (permalink / raw)
  To: x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann,
	Andy Lutomirski

The remaining callers of kernel_fpu_begin() in 64-bit kernels don't use 387
instructions, so there's no need to sanitize FCW.  Skip it to get the
performance we lost back.

Reported-by: Krzysztof Olędzki <ole@ans.pl>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/fpu/api.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index e95a06845443..6e826796a734 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -40,7 +40,19 @@ extern void fpregs_mark_activate(void);
 /* Code that is unaware of kernel_fpu_begin_mask() can use this */
 static inline void kernel_fpu_begin(void)
 {
+#ifdef CONFIG_X86_64
+	/*
+	 * Any 64-bit code that uses 387 instructions must explicitly request
+	 * KFPU_387.
+	 */
+	kernel_fpu_begin_mask(KFPU_XYZMM);
+#else
+	/*
+	 * 32-bit kernel code may use 387 operations as well as SSE2, etc,
+	 * as long as it checks that the CPU has the required capability.
+	 */
 	kernel_fpu_begin_mask(KFPU_387 | KFPU_XYZMM);
+#endif
 }
 
 /*
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage
  2021-01-18  6:20 [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage Andy Lutomirski
                   ` (3 preceding siblings ...)
  2021-01-18  6:20 ` [PATCH 4/4] x86/fpu/64: Don't FNINIT in kernel_fpu_begin() Andy Lutomirski
@ 2021-01-18  8:25 ` Krzysztof Olędzki
  4 siblings, 0 replies; 10+ messages in thread
From: Krzysztof Olędzki @ 2021-01-18  8:25 UTC (permalink / raw)
  To: Andy Lutomirski, x86; +Cc: LKML, Krzysztof Mazur, Arnd Bergmann

On 2021-01-17 at 22:20, Andy Lutomirski wrote:
> This series fixes two regressions: a boot failure on AMD K7 and a
> performance regression on everything.
> 
> I did a double-take here -- the regressions were reported by different
> people, both named Krzysztof :)
> 
> Andy Lutomirski (4):
>    x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
>    x86/mmx: Use KFPU_MMX for MMX string operations
>    x86/fpu: Make the EFI FPU calling convention explicit
>    x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

Thank you so much Andy!

What a coincidence! Sadly, my AMD K7 is sitting somewhere in a closet, 
on a different continent, and was running Linux for the last time over 
10 years ago. :/ However, I can offer some testing on different AMD & 
Intel CPUs.

Now... It is 12 AM here so I tested it very quickly only on 5.4-stable, 
where I initially noticed the problem. The patch applies almost cleanly 
in this release, almost as arch/x86/platform/efi/efi_64.c does not have 
kernel_fpu_begin() call to update. The kernel complies and boots.

Here is the result for:
  Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz (family: 0x6, model: 0x3a, 
stepping: 0x9)

5.4-stable (with "Reset MXCSR to default in kernel_fpu_begin"):
     avx       : 21072.000 MB/sec
     prefetch64-sse: 20392.000 MB/sec
     generic_sse: 18572.000 MB/sec
xor: using function: avx (21072.000 MB/sec)

5.4-stable-c4db485dd3f2378b4923503aed995f7816e265b7-revert:
     avx       : 33764.000 MB/sec
     prefetch64-sse: 23432.000 MB/sec
     generic_sse: 21036.000 MB/sec
xor: using function: avx (33764.000 MB/sec)

5.4-stable-kernel_fpu_begin_mask:
    avx       : 23576.000 MB/sec
    prefetch64-sse: 23024.000 MB/sec
    generic_sse: 20880.000 MB/sec
xor: using function: avx (23576.000 MB/sec)

So, the performance regression for prefetch64-sse and generic_sse is 
almost gone, but the AVX code is still impacted. Not as much as before, 
but still noticeably, and it is now barely better than fixed prefetch64-sse.

I'm going to test the patches on 5.10 / 5.11-rc to make sure what I have 
seen on 5.4 is not due to wrong backporting, and on different CPUs. 
However, this may have to wait until Tuesday / Wednesday due to family 
duties, as Monday is a holiday here.

Best regards,
  Krzysztof Olędzki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] x86/fpu: Make the EFI FPU calling convention explicit
  2021-01-18  6:20 ` [PATCH 3/4] x86/fpu: Make the EFI FPU calling convention explicit Andy Lutomirski
@ 2021-01-18 12:13   ` Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2021-01-18 12:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann

On Sun, Jan 17, 2021 at 10:20:40PM -0800, Andy Lutomirski wrote:
> EFI uses kernel_fpu_begin() to conform to the UEFI calling convention.
> This specifically requires initializing FCW, whereas no sane 64-bit kernel
> code should use legacy 387 operations that reference FCW.
> 
> Add KFPU_EFI to make this self-documenting, and use it in the EFI code.

I'd prefer if you slap a comment over the kernel_fpu_begin() calls in
efi instead of adding a separate define etc.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
  2021-01-18  6:20 ` [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state Andy Lutomirski
@ 2021-01-18 12:21   ` Peter Zijlstra
  2021-01-18 14:03   ` David Laight
  2021-01-19 11:08   ` Borislav Petkov
  2 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2021-01-18 12:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann

On Sun, Jan 17, 2021 at 10:20:38PM -0800, Andy Lutomirski wrote:

>  - Code that wants MMX doesn't want need MXCSR or FCW initialized.
>    _mmx_memcpy(), for example, can run before CR4.OSFXSR gets set, and
>    initializing MXCSR will fail.

> +#define KFPU_MMX	0		/* nothing gets initialized */

This... why is that correct?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
  2021-01-18  6:20 ` [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state Andy Lutomirski
  2021-01-18 12:21   ` Peter Zijlstra
@ 2021-01-18 14:03   ` David Laight
  2021-01-19 11:08   ` Borislav Petkov
  2 siblings, 0 replies; 10+ messages in thread
From: David Laight @ 2021-01-18 14:03 UTC (permalink / raw)
  To: 'Andy Lutomirski', x86
  Cc: LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann

From: Andy Lutomirski
> Sent: 18 January 2021 06:21
> 
> Currently, requesting kernel FPU access doesn't distinguish which parts of
> the extended ("FPU") state are needed.  This is nice for simplicity, but
> there are a few cases in which it's suboptimal:
> 
>  - The vast majority of in-kernel FPU users want XMM/YMM/ZMM state but do
>    not use legacy 387 state.  These users want MXCSR initialized but don't
>    care about the FPU control word.  Skipping FNINIT would save time.
>    (Empirically, FNINIT is several times slower than LDMXCSR.)
> 
>  - Code that wants MMX doesn't want need MXCSR or FCW initialized.
>    _mmx_memcpy(), for example, can run before CR4.OSFXSR gets set, and
>    initializing MXCSR will fail.
> 
>  - Any future in-kernel users of XFD (eXtended Feature Disable)-capable
>    dynamic states will need special handling.
> 
> This patch adds a more specific API that allows callers specify exactly
> what they want.

Is it worth returning whether the required fpu feature is available?
Or, maybe optionally, available cheaply?

There are also code fragments that really just want one or two
[xyx]mm registers to speed something up.
For instance PCIe reads can be a lot faster if a wide register
can be used.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
  2021-01-18  6:20 ` [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state Andy Lutomirski
  2021-01-18 12:21   ` Peter Zijlstra
  2021-01-18 14:03   ` David Laight
@ 2021-01-19 11:08   ` Borislav Petkov
  2 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2021-01-19 11:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, LKML, Krzysztof Mazur, Krzysztof Olędzki, Arnd Bergmann

Just nitpicks:

On Sun, Jan 17, 2021 at 10:20:38PM -0800, Andy Lutomirski wrote:
> Currently, requesting kernel FPU access doesn't distinguish which parts of
> the extended ("FPU") state are needed.  This is nice for simplicity, but
> there are a few cases in which it's suboptimal:
> 
>  - The vast majority of in-kernel FPU users want XMM/YMM/ZMM state but do
>    not use legacy 387 state.  These users want MXCSR initialized but don't
>    care about the FPU control word.  Skipping FNINIT would save time.
>    (Empirically, FNINIT is several times slower than LDMXCSR.)
> 
>  - Code that wants MMX doesn't want need MXCSR or FCW initialized.

"want/need" ?

>    _mmx_memcpy(), for example, can run before CR4.OSFXSR gets set, and
>    initializing MXCSR will fail.

"... because LDMXCSR generates an #UD when the aforementioned CR4 bit is
not set."

>  - Any future in-kernel users of XFD (eXtended Feature Disable)-capable
>    dynamic states will need special handling.
> 
> This patch adds a more specific API that allows callers specify exactly

s/This patch adds/Add/

> what they want.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-01-19 11:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-18  6:20 [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage Andy Lutomirski
2021-01-18  6:20 ` [PATCH 1/4] x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state Andy Lutomirski
2021-01-18 12:21   ` Peter Zijlstra
2021-01-18 14:03   ` David Laight
2021-01-19 11:08   ` Borislav Petkov
2021-01-18  6:20 ` [PATCH 2/4] x86/mmx: Use KFPU_MMX for MMX string operations Andy Lutomirski
2021-01-18  6:20 ` [PATCH 3/4] x86/fpu: Make the EFI FPU calling convention explicit Andy Lutomirski
2021-01-18 12:13   ` Borislav Petkov
2021-01-18  6:20 ` [PATCH 4/4] x86/fpu/64: Don't FNINIT in kernel_fpu_begin() Andy Lutomirski
2021-01-18  8:25 ` [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage Krzysztof Olędzki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).