* [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1)
From: Thomas Gleixner @ 2021-10-11 23:59 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

The recent attempts to support the new AMX feature just tried to bolt it
into the existing FPU code:

     https://lore.kernel.org/r/20211001223728.9309-1-chang.seok.bae@intel.com

As demonstrated with the supervisor bits, that's not really sensible and
leads to similar issues.

I've worked with Chang and Dave in the past few days on sorting this
out. Many thanks for their effort and support!

This series is a renewed effort to make this more palatable. It's the
first part of a four-part submission which works towards a clean AMX
integration into the FPU code:

  1) Cleans up the existing mess: historical leftovers, shortcomings and
     especially the universal kitchen sink asm/fpu/internal.h, which is
     included all over the place for the wrong reasons.

     This series has value independent of AMX, but it also makes the
     integration and the conversion to the new world order of dynamically
     enabled feature bits simpler.

  2) Introduces a container for the actual register storage which carries
     information about the kernel and user space features and sizes
     supported by it, to ease the integration of dynamically enabled
     features and the resulting different buffer sizes (a rough sketch
     follows after this list).

  3) Replaces a ton of state variables with structures which carry that
     information.

  4) The actual AMX and dynamic feature enable bits which have been
     significantly reworked on top of #1 - #3 and to address shortcomings
     of the previous submissions.
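
As a rough sketch of the container idea in #2 (the field names here are
illustrative only, not a final interface; the real structure comes with
part 2):

   struct fpstate {
	unsigned int		size;		/* Kernel buffer size */
	unsigned int		user_size;	/* UABI buffer size */
	u64			xfeatures;	/* Enabled kernel components */
	u64			user_xfeatures;	/* Enabled UABI components */
	union fpregs_state	regs;		/* Actual register storage */
   };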

The current series (#1) is based on:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/fpu

and also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu-1

The full series which has #1-#4 included can be found at:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu

Thanks,

	tglx
---
 arch/x86/events/perf_event.h        |    1 
 arch/x86/ia32/ia32_signal.c         |    1 
 arch/x86/include/asm/fpu/api.h      |   31 ++
 arch/x86/include/asm/fpu/internal.h |  530 ------------------------------------
 arch/x86/include/asm/fpu/signal.h   |   13 
 arch/x86/include/asm/fpu/xcr.h      |   11 
 arch/x86/include/asm/fpu/xstate.h   |    6 
 arch/x86/include/asm/pkru.h         |    2 
 arch/x86/kernel/cpu/bugs.c          |    2 
 arch/x86/kernel/cpu/common.c        |    2 
 arch/x86/kernel/fpu/bugs.c          |    2 
 arch/x86/kernel/fpu/core.c          |  163 ++++++++---
 arch/x86/kernel/fpu/init.c          |   29 -
 arch/x86/kernel/fpu/regset.c        |    6 
 arch/x86/kernel/fpu/signal.c        |   21 -
 arch/x86/kernel/fpu/xstate.c        |  164 ++++++-----
 arch/x86/kernel/process.c           |    6 
 arch/x86/kernel/process_32.c        |    5 
 arch/x86/kernel/process_64.c        |    5 
 arch/x86/kernel/ptrace.c            |    1 
 arch/x86/kernel/sev.c               |    2 
 arch/x86/kernel/signal.c            |    1 
 arch/x86/kernel/smpboot.c           |    2 
 arch/x86/kernel/traps.c             |    2 
 arch/x86/kvm/svm/sev.c              |    2 
 arch/x86/kvm/vmx/vmx.c              |    2 
 arch/x86/kvm/x86.c                  |  192 +------------
 arch/x86/math-emu/fpu_entry.c       |    2 
 arch/x86/mm/extable.c               |    4 
 arch/x86/power/cpu.c                |    2 
 b/arch/x86/include/asm/fpu/sched.h  |   68 ++++
 b/arch/x86/kernel/fpu/context.h     |   85 +++++
 b/arch/x86/kernel/fpu/internal.h    |   30 ++
 b/arch/x86/kernel/fpu/legacy.h      |  115 +++++++
 b/arch/x86/kernel/fpu/xstate.h      |  198 +++++++++++++
 35 files changed, 822 insertions(+), 886 deletions(-)


* [patch 01/31] x86/fpu: Remove pointless argument from switch_fpu_finish()
From: Thomas Gleixner @ 2021-10-11 23:59 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Unused since the FPU switching rework.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |    2 +-
 arch/x86/kernel/process_32.c        |    3 +--
 arch/x86/kernel/process_64.c        |    3 +--
 3 files changed, 3 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -521,7 +521,7 @@ static inline void switch_fpu_prepare(st
  * Delay loading of the complete FPU state until the return to userland.
  * PKRU is handled separately.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu)
+static inline void switch_fpu_finish(void)
 {
 	if (cpu_feature_enabled(X86_FEATURE_FPU))
 		set_thread_flag(TIF_NEED_FPU_LOAD);
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -160,7 +160,6 @@ EXPORT_SYMBOL_GPL(start_thread);
 	struct thread_struct *prev = &prev_p->thread,
 			     *next = &next_p->thread;
 	struct fpu *prev_fpu = &prev->fpu;
-	struct fpu *next_fpu = &next->fpu;
 	int cpu = smp_processor_id();
 
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
@@ -213,7 +212,7 @@ EXPORT_SYMBOL_GPL(start_thread);
 
 	this_cpu_write(current_task, next_p);
 
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish();
 
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -559,7 +559,6 @@ void compat_start_thread(struct pt_regs
 	struct thread_struct *prev = &prev_p->thread;
 	struct thread_struct *next = &next_p->thread;
 	struct fpu *prev_fpu = &prev->fpu;
-	struct fpu *next_fpu = &next->fpu;
 	int cpu = smp_processor_id();
 
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
@@ -620,7 +619,7 @@ void compat_start_thread(struct pt_regs
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish();
 
 	/* Reload sp0. */
 	update_task_stack(next_p);



* [patch 02/31] x86/fpu: Update stale comments
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

copy_fpstate_to_sigframe() does not have a slow path anymore. Neither does
the !ia32 restore in __fpu_restore_sig().

Update the comments accordingly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c |   13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -155,10 +155,8 @@ static inline int copy_fpregs_to_sigfram
  *	buf == buf_fx for 64-bit frames and 32-bit fsave frame.
  *	buf != buf_fx for 32-bit frames with fxstate.
  *
- * Try to save it directly to the user frame with disabled page fault handler.
- * If this fails then do the slow path where the FPU state is first saved to
- * task's fpu->state and then copy it to the user frame pointed to by the
- * aligned pointer 'buf_fx'.
+ * Save it directly to the user frame with disabled page fault handler. If
+ * that faults, try to clear the frame which handles the page fault.
  *
  * If this is a 32-bit frame with fxstate, put a fsave header before
  * the aligned state at 'buf_fx'.
@@ -334,12 +332,7 @@ static bool __fpu_restore_sig(void __use
 	}
 
 	if (likely(!ia32_fxstate)) {
-		/*
-		 * Attempt to restore the FPU registers directly from user
-		 * memory. For that to succeed, the user access cannot cause page
-		 * faults. If it does, fall back to the slow path below, going
-		 * through the kernel buffer with the enabled pagefault handler.
-		 */
+		/* Restore the FPU registers directly from user memory. */
 		return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only,
 						state_size);
 	}



* [patch 03/31] x86/pkru: Remove useless include
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

The PKRU code does not need anything from the FPU headers. Include
cpufeature.h instead and fix up the resulting fallout in perf.

This is a preparation for FPU changes in order to prevent recursive include
hell.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/events/perf_event.h |    1 +
 arch/x86/include/asm/pkru.h  |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -14,6 +14,7 @@
 
 #include <linux/perf_event.h>
 
+#include <asm/fpu/xstate.h>
 #include <asm/intel_ds.h>
 #include <asm/cpu.h>
 
--- a/arch/x86/include/asm/pkru.h
+++ b/arch/x86/include/asm/pkru.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_X86_PKRU_H
 #define _ASM_X86_PKRU_H
 
-#include <asm/fpu/xstate.h>
+#include <asm/cpufeature.h>
 
 #define PKRU_AD_BIT 0x1
 #define PKRU_WD_BIT 0x2



* [patch 04/31] x86/fpu: Restrict xsaves()/xrstors() to independent states
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

These interfaces are really only valid for features which are independently
managed and not part of the task context state for various reasons.

Tighten the checks and adjust the misleading comments.
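
For illustration, the distinction matters for callers like the perf arch
LBR code. A sketch (not taken from this series; task_ctx stands in for
the real perf context in arch/x86/events/intel/lbr.c):

   /*
    * Arch LBR state is independently managed, so this passes the
    * tightened validation:
    */
   xsaves(&task_ctx->xsave, XFEATURE_MASK_LBR);

   /*
    * A task->fpstate related component is rejected now and triggers
    * the WARN_ON_ONCE() in validate_xsaves_xrstors():
    */
   xsaves(&task_ctx->xsave, XFEATURE_MASK_PKRU);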

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/fpu/xstate.c |   14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1182,13 +1182,9 @@ static bool validate_xsaves_xrstors(u64
 	if (WARN_ON_FPU(!cpu_feature_enabled(X86_FEATURE_XSAVES)))
 		return false;
 	/*
-	 * Validate that this is either a task->fpstate related component
-	 * subset or an independent one.
+	 * Validate that this is an independent component.
 	 */
-	if (mask & xfeatures_mask_independent())
-		xchk = ~xfeatures_mask_independent();
-	else
-		xchk = ~xfeatures_mask_all;
+	xchk = ~xfeatures_mask_independent();
 
 	if (WARN_ON_ONCE(!mask || mask & xchk))
 		return false;
@@ -1206,8 +1202,7 @@ static bool validate_xsaves_xrstors(u64
  * buffer should be zeroed otherwise a consecutive XRSTORS from that buffer
  * can #GP.
  *
- * The feature mask must either be a subset of the independent features or
- * a subset of the task->fpstate related features.
+ * The feature mask must be a subset of the independent features.
  */
 void xsaves(struct xregs_state *xstate, u64 mask)
 {
@@ -1231,8 +1226,7 @@ void xsaves(struct xregs_state *xstate,
  * Proper usage is to restore the state which was saved with
  * xsaves() into @xstate.
  *
- * The feature mask must either be a subset of the independent features or
- * a subset of the task->fpstate related features.
+ * The feature mask must be a subset of the independent features.
  */
 void xrstors(struct xregs_state *xstate, u64 mask)
 {



* [patch 05/31] x86/fpu: Cleanup the on_boot_cpu clutter
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Defensive programming is useful, but this on_boot_cpu debug is really
silly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/fpu/init.c   |   16 ----------------
 arch/x86/kernel/fpu/xstate.c |    9 ---------
 2 files changed, 25 deletions(-)

--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -192,11 +192,6 @@ static void __init fpu__init_task_struct
  */
 static void __init fpu__init_system_xstate_size_legacy(void)
 {
-	static int on_boot_cpu __initdata = 1;
-
-	WARN_ON_FPU(!on_boot_cpu);
-	on_boot_cpu = 0;
-
 	/*
 	 * Note that xstate sizes might be overwritten later during
 	 * fpu__init_system_xstate().
@@ -216,15 +211,6 @@ static void __init fpu__init_system_xsta
 	fpu_user_xstate_size = fpu_kernel_xstate_size;
 }
 
-/* Legacy code to initialize eager fpu mode. */
-static void __init fpu__init_system_ctx_switch(void)
-{
-	static bool on_boot_cpu __initdata = 1;
-
-	WARN_ON_FPU(!on_boot_cpu);
-	on_boot_cpu = 0;
-}
-
 /*
  * Called on the boot CPU once per system bootup, to set up the initial
  * FPU state that is later cloned into all processes:
@@ -243,6 +229,4 @@ void __init fpu__init_system(struct cpui
 	fpu__init_system_xstate_size_legacy();
 	fpu__init_system_xstate();
 	fpu__init_task_struct_size();
-
-	fpu__init_system_ctx_switch();
 }
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -379,15 +379,10 @@ static void __init print_xstate_offset_s
  */
 static void __init setup_init_fpu_buf(void)
 {
-	static int on_boot_cpu __initdata = 1;
-
 	BUILD_BUG_ON((XFEATURE_MASK_USER_SUPPORTED |
 		      XFEATURE_MASK_SUPERVISOR_SUPPORTED) !=
 		     XFEATURES_INIT_FPSTATE_HANDLED);
 
-	WARN_ON_FPU(!on_boot_cpu);
-	on_boot_cpu = 0;
-
 	if (!boot_cpu_has(X86_FEATURE_XSAVE))
 		return;
 
@@ -721,14 +716,10 @@ static void fpu__init_disable_system_xst
 void __init fpu__init_system_xstate(void)
 {
 	unsigned int eax, ebx, ecx, edx;
-	static int on_boot_cpu __initdata = 1;
 	u64 xfeatures;
 	int err;
 	int i;
 
-	WARN_ON_FPU(!on_boot_cpu);
-	on_boot_cpu = 0;
-
 	if (!boot_cpu_has(X86_FEATURE_FPU)) {
 		pr_info("x86/fpu: No FPU detected\n");
 		return;



* [patch 06/31] x86/fpu: Remove pointless memset in fpu_clone()
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Zeroing the forked task's FPU register buffer to avoid leaking init
optimized stale data into the clone is a pointless exercise for the case
where the current task has TIF_NEED_FPU_LOAD set. In that case the FPU
register state is copied from current's FPU register buffer which can
contain stale init optimized data as well.

The alleged information leak is non-existent because this stale init
optimized data is not used anywhere and cannot leak.
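
For reference, a minimal sketch of the underlying XSAVE init optimization
semantics (illustrative helper, not part of this series):

   /*
    * XSAVE* clears the component's bit in header.xfeatures when the
    * registers are in their init state and leaves the data area
    * untouched. Consumers check the bit before touching the data,
    * so the stale bytes are never read:
    */
   static bool xfeature_data_is_valid(struct xregs_state *xsave, int nr)
   {
	return xsave->header.xfeatures & BIT_ULL(nr);
   }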

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/fpu/core.c |    6 ------
 1 file changed, 6 deletions(-)

--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -260,12 +260,6 @@ int fpu_clone(struct task_struct *dst)
 		return 0;
 
 	/*
-	 * Don't let 'init optimized' areas of the XSAVE area
-	 * leak into the child task:
-	 */
-	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
-
-	/*
 	 * If the FPU registers are not owned by current just memcpy() the
 	 * state.  Otherwise save the FPU registers directly into the
 	 * child's FPU context, without any memory-to-memory copying.



* [patch 07/31] x86/process: Clone FPU in copy_thread()
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

There is no reason to clone the FPU in arch_dup_task_struct(). Quite the
contrary, it prevents optimizations. Move it to copy_thread().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/process.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -87,7 +87,7 @@ int arch_dup_task_struct(struct task_str
 #ifdef CONFIG_VM86
 	dst->thread.vm86 = NULL;
 #endif
-	return fpu_clone(dst);
+	return 0;
 }
 
 /*
@@ -154,6 +154,8 @@ int copy_thread(unsigned long clone_flag
 	frame->flags = X86_EFLAGS_FIXED;
 #endif
 
+	fpu_clone(p);
+
 	/* Kernel thread ? */
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		p->thread.pkru = pkru_get_init_value();



* [patch 08/31] x86/fpu: Do not inherit FPU context for kernel and IO worker threads
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

There is no reason why kernel and IO worker threads need a full clone of
the parent's FPU state. Both are kernel threads which are not supposed to
use the FPU. So copying a large state or doing XSAVE() is pointless. Just
clean out the minimally required state for those tasks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/fpu/core.c |   37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -212,6 +212,15 @@ static inline void fpstate_init_xstate(s
 	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_all;
 }
 
+static inline unsigned int init_fpstate_copy_size(void)
+{
+	if (!use_xsave())
+		return fpu_kernel_xstate_size;
+
+	/* XSAVE(S) just needs the legacy and the xstate header part */
+	return sizeof(init_fpstate.xsave);
+}
+
 static inline void fpstate_init_fxstate(struct fxregs_state *fx)
 {
 	fx->cwd = 0x37f;
@@ -260,6 +269,23 @@ int fpu_clone(struct task_struct *dst)
 		return 0;
 
 	/*
+	 * Enforce reload for user space tasks and prevent kernel threads
+	 * from trying to save the FPU registers on context switch.
+	 */
+	set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD);
+
+	/*
+	 * No FPU state inheritance for kernel threads and IO
+	 * worker threads.
+	 */
+	if (dst->flags & (PF_KTHREAD | PF_IO_WORKER)) {
+		/* Clear out the minimal state */
+		memcpy(&dst_fpu->state, &init_fpstate,
+		       init_fpstate_copy_size());
+		return 0;
+	}
+
+	/*
 	 * If the FPU registers are not owned by current just memcpy() the
 	 * state.  Otherwise save the FPU registers directly into the
 	 * child's FPU context, without any memory-to-memory copying.
@@ -272,8 +298,6 @@ int fpu_clone(struct task_struct *dst)
 		save_fpregs_to_fpstate(dst_fpu);
 	fpregs_unlock();
 
-	set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD);
-
 	trace_x86_fpu_copy_src(src_fpu);
 	trace_x86_fpu_copy_dst(dst_fpu);
 
@@ -322,15 +346,6 @@ static inline void restore_fpregs_from_i
 	pkru_write_default();
 }
 
-static inline unsigned int init_fpstate_copy_size(void)
-{
-	if (!use_xsave())
-		return fpu_kernel_xstate_size;
-
-	/* XSAVE(S) just needs the legacy and the xstate header part */
-	return sizeof(init_fpstate.xsave);
-}
-
 /*
  * Reset current->fpu memory state to the init values.
  */



* [patch 09/31] x86/fpu: Do not inherit FPU context for CLONE_THREAD
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

CLONE_THREAD does not come with the guarantee of a true fork that all
state is inherited. The FPU state in particular is meaningless for
CLONE_THREAD.

Just wipe out the minimally required state so the restore on return to
user space lets the thread start with a clean FPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |    2 +-
 arch/x86/kernel/fpu/core.c          |    8 +++++---
 arch/x86/kernel/process.c           |    2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -34,7 +34,7 @@ extern int  fpu__exception_code(struct f
 extern void fpu_sync_fpstate(struct fpu *fpu);
 
 /* Clone and exit operations */
-extern int  fpu_clone(struct task_struct *dst);
+extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags);
 extern void fpu_flush_thread(void);
 
 /*
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -257,7 +257,7 @@ void fpstate_init(union fpregs_state *st
 EXPORT_SYMBOL_GPL(fpstate_init);
 
 /* Clone current's FPU state on fork */
-int fpu_clone(struct task_struct *dst)
+int fpu_clone(struct task_struct *dst, unsigned long clone_flags)
 {
 	struct fpu *src_fpu = &current->thread.fpu;
 	struct fpu *dst_fpu = &dst->thread.fpu;
@@ -276,9 +276,11 @@ int fpu_clone(struct task_struct *dst)
 
 	/*
 	 * No FPU state inheritance for kernel threads and IO
-	 * worker threads.
+	 * worker threads. CLONE_THREAD does not need a copy of the
+	 * FPU state either.
 	 */
-	if (dst->flags & (PF_KTHREAD | PF_IO_WORKER)) {
+	if (clone_flags & CLONE_THREAD ||
+	    dst->flags & (PF_KTHREAD | PF_IO_WORKER)) {
 		/* Clear out the minimal state */
 		memcpy(&dst_fpu->state, &init_fpstate,
 		       init_fpstate_copy_size());
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -154,7 +154,7 @@ int copy_thread(unsigned long clone_flag
 	frame->flags = X86_EFLAGS_FIXED;
 #endif
 
-	fpu_clone(p);
+	fpu_clone(p, clone_flags);
 
 	/* Kernel thread ? */
 	if (unlikely(p->flags & PF_KTHREAD)) {



* [patch 10/31] x86/fpu: Cleanup xstate xcomp_bv initialization
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

No point in having this duplicated all over the place with needlessly
different defines.

Provide a proper initialization function which initializes user buffers
correctly and make KVM use it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |    4 +++-
 arch/x86/kernel/fpu/core.c          |   35 +++++++++++++++++++----------------
 arch/x86/kernel/fpu/init.c          |    6 +++---
 arch/x86/kernel/fpu/xstate.c        |    8 +++-----
 arch/x86/kernel/fpu/xstate.h        |   18 ++++++++++++++++++
 arch/x86/kvm/x86.c                  |   11 +++--------
 6 files changed, 49 insertions(+), 33 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -80,7 +80,9 @@ static __always_inline __pure bool use_f
 
 extern union fpregs_state init_fpstate;
 
-extern void fpstate_init(union fpregs_state *state);
+extern void fpstate_init_user(union fpregs_state *state);
+extern void fpu_init_fpstate_user(struct fpu *fpu);
+
 #ifdef CONFIG_MATH_EMULATION
 extern void fpstate_init_soft(struct swregs_state *soft);
 #else
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -16,6 +16,8 @@
 #include <linux/hardirq.h>
 #include <linux/pkeys.h>
 
+#include "xstate.h"
+
 #define CREATE_TRACE_POINTS
 #include <asm/trace/fpu.h>
 
@@ -203,15 +205,6 @@ void fpu_sync_fpstate(struct fpu *fpu)
 	fpregs_unlock();
 }
 
-static inline void fpstate_init_xstate(struct xregs_state *xsave)
-{
-	/*
-	 * XRSTORS requires these bits set in xcomp_bv, or it will
-	 * trigger #GP:
-	 */
-	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_all;
-}
-
 static inline unsigned int init_fpstate_copy_size(void)
 {
 	if (!use_xsave())
@@ -238,23 +231,33 @@ static inline void fpstate_init_fstate(s
 	fp->fos = 0xffff0000u;
 }
 
-void fpstate_init(union fpregs_state *state)
+/*
+ * Used in two places:
+ * 1) Early boot to setup init_fpstate for non XSAVE systems
+ * 2) fpu_init_fpstate_user() which is invoked from KVM
+ */
+void fpstate_init_user(union fpregs_state *state)
 {
-	if (!static_cpu_has(X86_FEATURE_FPU)) {
+	if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
 		fpstate_init_soft(&state->soft);
 		return;
 	}
 
-	memset(state, 0, fpu_kernel_xstate_size);
+	xstate_init_xcomp_bv(&state->xsave, xfeatures_mask_uabi());
 
-	if (static_cpu_has(X86_FEATURE_XSAVES))
-		fpstate_init_xstate(&state->xsave);
-	if (static_cpu_has(X86_FEATURE_FXSR))
+	if (cpu_feature_enabled(X86_FEATURE_FXSR))
 		fpstate_init_fxstate(&state->fxsave);
 	else
 		fpstate_init_fstate(&state->fsave);
 }
-EXPORT_SYMBOL_GPL(fpstate_init);
+
+#if IS_ENABLED(CONFIG_KVM)
+void fpu_init_fpstate_user(struct fpu *fpu)
+{
+	fpstate_init_user(&fpu->state);
+}
+EXPORT_SYMBOL_GPL(fpu_init_fpstate_user);
+#endif
 
 /* Clone current's FPU state on fork */
 int fpu_clone(struct task_struct *dst, unsigned long clone_flags)
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -121,10 +121,10 @@ static void __init fpu__init_system_mxcs
 static void __init fpu__init_system_generic(void)
 {
 	/*
-	 * Set up the legacy init FPU context. (xstate init might overwrite this
-	 * with a more modern format, if the CPU supports it.)
+	 * Set up the legacy init FPU context. Will be updated when the
+	 * CPU supports XSAVE[S].
 	 */
-	fpstate_init(&init_fpstate);
+	fpstate_init_user(&init_fpstate);
 
 	fpu__init_system_mxcsr();
 }
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -15,10 +15,10 @@
 #include <asm/fpu/internal.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/regset.h>
-#include <asm/fpu/xstate.h>
 
 #include <asm/tlbflush.h>
-#include <asm/cpufeature.h>
+
+#include "xstate.h"
 
 /*
  * Although we spell it out in here, the Processor Trace
@@ -389,9 +389,7 @@ static void __init setup_init_fpu_buf(vo
 	setup_xstate_features();
 	print_xstate_features();
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		init_fpstate.xsave.header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
-						     xfeatures_mask_all;
+	xstate_init_xcomp_bv(&init_fpstate.xsave, xfeatures_mask_all);
 
 	/*
 	 * Init all the features state with header.xfeatures being 0x0
--- /dev/null
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __X86_KERNEL_FPU_XSTATE_H
+#define __X86_KERNEL_FPU_XSTATE_H
+
+#include <asm/cpufeature.h>
+#include <asm/fpu/xstate.h>
+
+static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
+{
+	/*
+	 * XRSTORS requires these bits set in xcomp_bv, or it will
+	 * trigger #GP:
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
+		xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
+}
+
+#endif
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10612,14 +10612,6 @@ static int sync_regs(struct kvm_vcpu *vc
 
 static void fx_init(struct kvm_vcpu *vcpu)
 {
-	if (!vcpu->arch.guest_fpu)
-		return;
-
-	fpstate_init(&vcpu->arch.guest_fpu->state);
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		vcpu->arch.guest_fpu->state.xsave.header.xcomp_bv =
-			host_xcr0 | XSTATE_COMPACTION_ENABLED;
-
 	/*
 	 * Ensure guest xcr0 is valid for loading
 	 */
@@ -10704,6 +10696,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu
 		pr_err("kvm: failed to allocate vcpu's fpu\n");
 		goto free_user_fpu;
 	}
+
+	fpu_init_fpstate_user(vcpu->arch.user_fpu);
+	fpu_init_fpstate_user(vcpu->arch.guest_fpu);
 	fx_init(vcpu);
 
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);



* [patch 11/31] x86/fpu/xstate: Provide and use for_each_xfeature()
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

These loops evaluating xfeature bits are really hard to read. Create an
iterator macro based on for_each_set_bit_from(), which already does the
right thing.
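
A hypothetical caller for illustration; the macro starts the walk at
FIRST_EXTENDED_XFEATURE, so the legacy FP/SSE bits are skipped:

   unsigned int i;

   /* Visits only the set bits >= FIRST_EXTENDED_XFEATURE */
   for_each_extended_xfeature(i, xfeatures_mask_all)
	pr_info("x86/fpu: xstate component %u is enabled\n", i);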

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/fpu/xstate.c |   56 +++++++++++++++++--------------------------
 1 file changed, 23 insertions(+), 33 deletions(-)

--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -4,6 +4,7 @@
  *
  * Author: Suresh Siddha <suresh.b.siddha@intel.com>
  */
+#include <linux/bitops.h>
 #include <linux/compat.h>
 #include <linux/cpu.h>
 #include <linux/mman.h>
@@ -20,6 +21,10 @@
 
 #include "xstate.h"
 
+#define for_each_extended_xfeature(bit, mask)				\
+	(bit) = FIRST_EXTENDED_XFEATURE;				\
+	for_each_set_bit_from(bit, (unsigned long *)&(mask), 8 * sizeof(mask))
+
 /*
  * Although we spell it out in here, the Processor Trace
  * xfeature is completely unused.  We use other mechanisms
@@ -184,10 +189,7 @@ static void __init setup_xstate_features
 	xstate_sizes[XFEATURE_SSE]	= sizeof_field(struct fxregs_state,
 						       xmm_space);
 
-	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
-		if (!xfeature_enabled(i))
-			continue;
-
+	for_each_extended_xfeature(i, xfeatures_mask_all) {
 		cpuid_count(XSTATE_CPUID, i, &eax, &ebx, &ecx, &edx);
 
 		xstate_sizes[i] = eax;
@@ -291,20 +293,15 @@ static void __init setup_xstate_comp_off
 	xstate_comp_offsets[XFEATURE_SSE] = offsetof(struct fxregs_state,
 						     xmm_space);
 
-	if (!boot_cpu_has(X86_FEATURE_XSAVES)) {
-		for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
-			if (xfeature_enabled(i))
-				xstate_comp_offsets[i] = xstate_offsets[i];
-		}
+	if (!cpu_feature_enabled(X86_FEATURE_XSAVES)) {
+		for_each_extended_xfeature(i, xfeatures_mask_all)
+			xstate_comp_offsets[i] = xstate_offsets[i];
 		return;
 	}
 
 	next_offset = FXSAVE_SIZE + XSAVE_HDR_SIZE;
 
-	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
-		if (!xfeature_enabled(i))
-			continue;
-
+	for_each_extended_xfeature(i, xfeatures_mask_all) {
 		if (xfeature_is_aligned(i))
 			next_offset = ALIGN(next_offset, 64);
 
@@ -328,8 +325,8 @@ static void __init setup_supervisor_only
 
 	next_offset = FXSAVE_SIZE + XSAVE_HDR_SIZE;
 
-	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
-		if (!xfeature_enabled(i) || !xfeature_is_supervisor(i))
+	for_each_extended_xfeature(i, xfeatures_mask_all) {
+		if (!xfeature_is_supervisor(i))
 			continue;
 
 		if (xfeature_is_aligned(i))
@@ -347,9 +344,7 @@ static void __init print_xstate_offset_s
 {
 	int i;
 
-	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
-		if (!xfeature_enabled(i))
-			continue;
+	for_each_extended_xfeature(i, xfeatures_mask_all) {
 		pr_info("x86/fpu: xstate_offset[%d]: %4d, xstate_sizes[%d]: %4d\n",
 			 i, xstate_comp_offsets[i], i, xstate_sizes[i]);
 	}
@@ -554,10 +549,7 @@ static void do_extra_xstate_size_checks(
 	int paranoid_xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
 	int i;
 
-	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
-		if (!xfeature_enabled(i))
-			continue;
-
+	for_each_extended_xfeature(i, xfeatures_mask_all) {
 		check_xstate_against_struct(i);
 		/*
 		 * Supervisor state components can be managed only by
@@ -586,7 +578,6 @@ static void do_extra_xstate_size_checks(
 	XSTATE_WARN_ON(paranoid_xstate_size != fpu_kernel_xstate_size);
 }
 
-
 /*
  * Get total size of enabled xstates in XCR0 | IA32_XSS.
  *
@@ -969,6 +960,7 @@ void copy_xstate_to_uabi_buf(struct memb
 	struct xregs_state *xinit = &init_fpstate.xsave;
 	struct xstate_header header;
 	unsigned int zerofrom;
+	u64 mask;
 	int i;
 
 	memset(&header, 0, sizeof(header));
@@ -1022,17 +1014,15 @@ void copy_xstate_to_uabi_buf(struct memb
 
 	zerofrom = offsetof(struct xregs_state, extended_state_area);
 
-	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
-		/*
-		 * The ptrace buffer is in non-compacted XSAVE format.
-		 * In non-compacted format disabled features still occupy
-		 * state space, but there is no state to copy from in the
-		 * compacted init_fpstate. The gap tracking will zero this
-		 * later.
-		 */
-		if (!(xfeatures_mask_uabi() & BIT_ULL(i)))
-			continue;
+	/*
+	 * The ptrace buffer is in non-compacted XSAVE format.  In
+	 * non-compacted format disabled features still occupy state space,
+	 * but there is no state to copy from in the compacted
+	 * init_fpstate. The gap tracking will zero these states.
+	 */
+	mask = xfeatures_mask_uabi();
 
+	for_each_extended_xfeature(i, mask) {
 		/*
 		 * If there was a feature or alignment gap, zero the space
 		 * in the destination buffer.



* [patch 12/31] x86/fpu/xstate: Mark all init only functions __init
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

No point in keeping them around after boot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/fpu/xstate.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -462,7 +462,7 @@ static int validate_user_xstate_header(c
 	return 0;
 }
 
-static void __xstate_dump_leaves(void)
+static void __init __xstate_dump_leaves(void)
 {
 	int i;
 	u32 eax, ebx, ecx, edx;
@@ -502,7 +502,7 @@ static void __xstate_dump_leaves(void)
  * that our software representation matches what the CPU
  * tells us about the state's size.
  */
-static void check_xstate_against_struct(int nr)
+static void __init check_xstate_against_struct(int nr)
 {
 	/*
 	 * Ask the CPU for the size of the state.
@@ -544,7 +544,7 @@ static void check_xstate_against_struct(
  * covered by these checks. Only the size of the buffer for task->fpu
  * is checked here.
  */
-static void do_extra_xstate_size_checks(void)
+static void __init do_extra_xstate_size_checks(void)
 {
 	int paranoid_xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
 	int i;
@@ -646,7 +646,7 @@ static unsigned int __init get_xsave_siz
  * Will the runtime-enumerated 'xstate_size' fit in the init
  * task's statically-allocated buffer?
  */
-static bool is_supported_xstate_size(unsigned int test_xstate_size)
+static bool __init is_supported_xstate_size(unsigned int test_xstate_size)
 {
 	if (test_xstate_size <= sizeof(union fpregs_state))
 		return true;
@@ -691,7 +691,7 @@ static int __init init_xstate_size(void)
  * We enabled the XSAVE hardware, but something went wrong and
  * we can not use it.  Disable it.
  */
-static void fpu__init_disable_system_xstate(void)
+static void __init fpu__init_disable_system_xstate(void)
 {
 	xfeatures_mask_all = 0;
 	cr4_clear_bits(X86_CR4_OSXSAVE);



* [patch 13/31] x86/fpu: Move KVM's FPU swapping to the FPU core
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Swapping the host/guest FPU directly fiddles with FPU internals, which
requires 5 exports. The upcoming support of dynamically enabled states
would need even more.

Implement a swap function in the FPU core code and export that instead.
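
The intended usage, for illustration (either pointer can be NULL, which
turns the swap into a pure save or a pure restore; the real callers are
in the KVM hunk below):

   /*
    * vCPU entry: save the host state, restore the guest state minus
    * PKRU, which KVM restores separately in kvm_x86_ops.run():
    */
   fpu_swap_kvm_fpu(vcpu->arch.user_fpu, vcpu->arch.guest_fpu,
		    ~XFEATURE_MASK_PKRU);

   /* vCPU exit: save the guest state, restore the host state */
   fpu_swap_kvm_fpu(vcpu->arch.guest_fpu, vcpu->arch.user_fpu, ~0ULL);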

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/fpu/api.h      |    8 +++++
 arch/x86/include/asm/fpu/internal.h |   15 +---------
 arch/x86/kernel/fpu/core.c          |   30 ++++++++++++++++++---
 arch/x86/kernel/fpu/init.c          |    1 
 arch/x86/kernel/fpu/xstate.c        |    1 
 arch/x86/kvm/x86.c                  |   51 +++++++-----------------------------
 arch/x86/mm/extable.c               |    2 -
 7 files changed, 48 insertions(+), 60 deletions(-)

--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,6 +12,8 @@
 #define _ASM_X86_FPU_API_H
 #include <linux/bottom_half.h>
 
+#include <asm/fpu/types.h>
+
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
@@ -108,4 +110,10 @@ extern int cpu_has_xfeatures(u64 xfeatur
 
 static inline void update_pasid(void) { }
 
+/* FPSTATE related functions which are exported to KVM */
+extern void fpu_init_fpstate_user(struct fpu *fpu);
+
+/* KVM specific functions */
+extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
+
 #endif /* _ASM_X86_FPU_API_H */
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -74,14 +74,8 @@ static __always_inline __pure bool use_f
 	return static_cpu_has(X86_FEATURE_FXSR);
 }
 
-/*
- * fpstate handling functions:
- */
-
 extern union fpregs_state init_fpstate;
-
 extern void fpstate_init_user(union fpregs_state *state);
-extern void fpu_init_fpstate_user(struct fpu *fpu);
 
 #ifdef CONFIG_MATH_EMULATION
 extern void fpstate_init_soft(struct swregs_state *soft);
@@ -381,12 +375,7 @@ static inline int os_xrstor_safe(struct
 	return err;
 }
 
-extern void __restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask);
-
-static inline void restore_fpregs_from_fpstate(union fpregs_state *fpstate)
-{
-	__restore_fpregs_from_fpstate(fpstate, xfeatures_mask_fpstate());
-}
+extern void restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask);
 
 extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
 
@@ -467,7 +456,7 @@ static inline void fpregs_restore_userre
 		 */
 		mask = xfeatures_mask_restore_user() |
 			xfeatures_mask_supervisor();
-		__restore_fpregs_from_fpstate(&fpu->state, mask);
+		restore_fpregs_from_fpstate(&fpu->state, mask);
 
 		fpregs_activate(fpu);
 		fpu->last_cpu = cpu;
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -124,9 +124,8 @@ void save_fpregs_to_fpstate(struct fpu *
 	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
 	frstor(&fpu->state.fsave);
 }
-EXPORT_SYMBOL(save_fpregs_to_fpstate);
 
-void __restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask)
+void restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask)
 {
 	/*
 	 * AMD K7/K8 and later CPUs up to Zen don't save/restore
@@ -151,7 +150,31 @@ void __restore_fpregs_from_fpstate(union
 			frstor(&fpstate->fsave);
 	}
 }
-EXPORT_SYMBOL_GPL(__restore_fpregs_from_fpstate);
+
+#if IS_ENABLED(CONFIG_KVM)
+void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask)
+{
+	fpregs_lock();
+
+	if (save) {
+		if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+			memcpy(&save->state, &current->thread.fpu.state,
+			       fpu_kernel_xstate_size);
+		} else {
+			save_fpregs_to_fpstate(save);
+		}
+	}
+
+	if (rstor) {
+		restore_mask &= xfeatures_mask_fpstate();
+		restore_fpregs_from_fpstate(&rstor->state, restore_mask);
+	}
+
+	fpregs_mark_activate();
+	fpregs_unlock();
+}
+EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
+#endif
 
 void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 {
@@ -459,7 +482,6 @@ void fpregs_mark_activate(void)
 	fpu->last_cpu = smp_processor_id();
 	clear_thread_flag(TIF_NEED_FPU_LOAD);
 }
-EXPORT_SYMBOL_GPL(fpregs_mark_activate);
 
 /*
  * x87 math exception handling:
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -136,7 +136,6 @@ static void __init fpu__init_system_gene
  * components into a single, continuous memory block:
  */
 unsigned int fpu_kernel_xstate_size __ro_after_init;
-EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size);
 
 /* Get alignment of the TYPE. */
 #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -65,7 +65,6 @@ static short xsave_cpuid_features[] __in
  * XSAVE buffer, both supervisor and user xstates.
  */
 u64 xfeatures_mask_all __ro_after_init;
-EXPORT_SYMBOL_GPL(xfeatures_mask_all);
 
 static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
 	{ [ 0 ... XFEATURE_MAX - 1] = -1};
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -68,7 +68,9 @@
 #include <asm/mce.h>
 #include <asm/pkru.h>
 #include <linux/kernel_stat.h>
-#include <asm/fpu/internal.h> /* Ugh! */
+#include <asm/fpu/api.h>
+#include <asm/fpu/xcr.h>
+#include <asm/fpu/xstate.h>
 #include <asm/pvclock.h>
 #include <asm/div64.h>
 #include <asm/irq_remapping.h>
@@ -9899,58 +9901,27 @@ static int complete_emulated_mmio(struct
 	return 0;
 }
 
-static void kvm_save_current_fpu(struct fpu *fpu)
-{
-	/*
-	 * If the target FPU state is not resident in the CPU registers, just
-	 * memcpy() from current, else save CPU state directly to the target.
-	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
-		       fpu_kernel_xstate_size);
-	else
-		save_fpregs_to_fpstate(fpu);
-}
-
 /* Swap (qemu) user FPU context for the guest FPU context. */
 static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	fpregs_lock();
-
-	kvm_save_current_fpu(vcpu->arch.user_fpu);
-
 	/*
-	 * Guests with protected state can't have it set by the hypervisor,
-	 * so skip trying to set it.
+	 * Guests with protected state have guest_fpu == NULL which makes
+	 * the swap only save the host state. Exclude PKRU from the
+	 * restore as it is restored separately in kvm_x86_ops.run().
 	 */
-	if (vcpu->arch.guest_fpu)
-		/* PKRU is separately restored in kvm_x86_ops.run. */
-		__restore_fpregs_from_fpstate(&vcpu->arch.guest_fpu->state,
-					~XFEATURE_MASK_PKRU);
-
-	fpregs_mark_activate();
-	fpregs_unlock();
-
+	fpu_swap_kvm_fpu(vcpu->arch.user_fpu, vcpu->arch.guest_fpu,
+			 ~XFEATURE_MASK_PKRU);
 	trace_kvm_fpu(1);
 }
 
 /* When vcpu_run ends, restore user space FPU context. */
 static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	fpregs_lock();
-
 	/*
-	 * Guests with protected state can't have it read by the hypervisor,
-	 * so skip trying to save it.
+	 * Guests with protected state have guest_fpu == NULL which makes
+	 * the swap only restore the host state.
 	 */
-	if (vcpu->arch.guest_fpu)
-		kvm_save_current_fpu(vcpu->arch.guest_fpu);
-
-	restore_fpregs_from_fpstate(&vcpu->arch.user_fpu->state);
-
-	fpregs_mark_activate();
-	fpregs_unlock();
-
+	fpu_swap_kvm_fpu(vcpu->arch.guest_fpu, vcpu->arch.user_fpu, ~0ULL);
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
 }
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -47,7 +47,7 @@ static bool ex_handler_fprestore(const s
 	WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
 		  (void *)instruction_pointer(regs));
 
-	__restore_fpregs_from_fpstate(&init_fpstate, xfeatures_mask_fpstate());
+	restore_fpregs_from_fpstate(&init_fpstate, xfeatures_mask_fpstate());
 	return true;
 }
 



* [patch 14/31] x86/fpu: Replace KVM's home-brewed FPU copy from user
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Copying a user space buffer into a kernel FPU state buffer is already
available in the FPU core. The copy mechanism in KVM lacks sanity checks
and needs to use cpuid() to look up the offset of each component, while
the FPU core has this information cached.

Make the FPU core variant accessible to KVM and replace the home-brewed
mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/fpu/api.h |    3 +
 arch/x86/kernel/fpu/core.c     |   38 ++++++++++++++++++++-
 arch/x86/kernel/fpu/xstate.c   |    3 -
 arch/x86/kvm/x86.c             |   74 +----------------------------------------
 4 files changed, 44 insertions(+), 74 deletions(-)

--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -116,4 +116,7 @@ extern void fpu_init_fpstate_user(struct
 /* KVM specific functions */
 extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
 
+struct kvm_vcpu;
+extern int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0, u32 *pkru);
+
 #endif /* _ASM_X86_FPU_API_H */
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -174,7 +174,43 @@ void fpu_swap_kvm_fpu(struct fpu *save,
 	fpregs_unlock();
 }
 EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
-#endif
+
+int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0,
+			      u32 *vpkru)
+{
+	union fpregs_state *kstate = &fpu->state;
+	const union fpregs_state *ustate = buf;
+	struct pkru_state *xpkru;
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) {
+		if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE)
+			return -EINVAL;
+		if (ustate->fxsave.mxcsr & ~mxcsr_feature_mask)
+			return -EINVAL;
+		memcpy(&kstate->fxsave, &ustate->fxsave, sizeof(ustate->fxsave));
+		return 0;
+	}
+
+	if (ustate->xsave.header.xfeatures & ~xcr0)
+		return -EINVAL;
+
+	ret = copy_uabi_from_kernel_to_xstate(&kstate->xsave, ustate);
+	if (ret)
+		return ret;
+
+	/* Retrieve PKRU if not in init state */
+	if (kstate->xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
+		xpkru = get_xsave_addr(&kstate->xsave, XFEATURE_PKRU);
+		*vpkru = xpkru->pkru;
+	}
+
+	/* Ensure that XCOMP_BV is set up for XSAVES */
+	xstate_init_xcomp_bv(&kstate->xsave, xfeatures_mask_uabi());
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fpu_copy_kvm_uabi_to_vcpu);
+#endif /* CONFIG_KVM */
 
 void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 {
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1134,8 +1134,7 @@ static int copy_uabi_to_xstate(struct xr
 
 /*
  * Convert from a ptrace standard-format kernel buffer to kernel XSAVE[S]
- * format and copy to the target thread. This is called from
- * xstateregs_set().
+ * format and copy to the target thread. Used by ptrace and KVM.
  */
 int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
 {
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4695,8 +4695,6 @@ static int kvm_vcpu_ioctl_x86_set_debugr
 	return 0;
 }
 
-#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
-
 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
 	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
@@ -4740,50 +4738,6 @@ static void fill_xsave(u8 *dest, struct
 	}
 }
 
-static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
-{
-	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
-	u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
-	u64 valid;
-
-	/*
-	 * Copy legacy XSAVE area, to avoid complications with CPUID
-	 * leaves 0 and 1 in the loop below.
-	 */
-	memcpy(xsave, src, XSAVE_HDR_OFFSET);
-
-	/* Set XSTATE_BV and possibly XCOMP_BV.  */
-	xsave->header.xfeatures = xstate_bv;
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
-
-	/*
-	 * Copy each region from the non-compacted offset to the
-	 * possibly compacted offset.
-	 */
-	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
-	while (valid) {
-		u32 size, offset, ecx, edx;
-		u64 xfeature_mask = valid & -valid;
-		int xfeature_nr = fls64(xfeature_mask) - 1;
-
-		cpuid_count(XSTATE_CPUID, xfeature_nr,
-			    &size, &offset, &ecx, &edx);
-
-		if (xfeature_nr == XFEATURE_PKRU) {
-			memcpy(&vcpu->arch.pkru, src + offset,
-			       sizeof(vcpu->arch.pkru));
-		} else {
-			void *dest = get_xsave_addr(xsave, xfeature_nr);
-
-			if (dest)
-				memcpy(dest, src + offset, size);
-		}
-
-		valid -= xfeature_mask;
-	}
-}
-
 static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 					 struct kvm_xsave *guest_xsave)
 {
@@ -4802,37 +4756,15 @@ static void kvm_vcpu_ioctl_x86_get_xsave
 	}
 }
 
-#define XSAVE_MXCSR_OFFSET 24
-
 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 					struct kvm_xsave *guest_xsave)
 {
-	u64 xstate_bv;
-	u32 mxcsr;
-
 	if (!vcpu->arch.guest_fpu)
 		return 0;
 
-	xstate_bv = *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)];
-	mxcsr = *(u32 *)&guest_xsave->region[XSAVE_MXCSR_OFFSET / sizeof(u32)];
-
-	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
-		/*
-		 * Here we allow setting states that are not present in
-		 * CPUID leaf 0xD, index 0, EDX:EAX.  This is for compatibility
-		 * with old userspace.
-		 */
-		if (xstate_bv & ~supported_xcr0 || mxcsr & ~mxcsr_feature_mask)
-			return -EINVAL;
-		load_xsave(vcpu, (u8 *)guest_xsave->region);
-	} else {
-		if (xstate_bv & ~XFEATURE_MASK_FPSSE ||
-			mxcsr & ~mxcsr_feature_mask)
-			return -EINVAL;
-		memcpy(&vcpu->arch.guest_fpu->state.fxsave,
-			guest_xsave->region, sizeof(struct fxregs_state));
-	}
-	return 0;
+	return fpu_copy_kvm_uabi_to_vcpu(vcpu->arch.guest_fpu,
+					 guest_xsave->region,
+					 supported_xcr0, &vcpu->arch.pkru);
 }
 
 static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,



* [patch 15/31] x86/fpu: Rework copy_xstate_to_uabi_buf()
From: Thomas Gleixner @ 2021-10-12  0:00 UTC
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Prepare for replacing KVM's copy xstate to user function by extending
copy_xstate_to_uabi_buf() with a pkru argument which allows the caller to
hand in the PKRU value. This is required for KVM because the guest PKRU
is not accessible via current. Fix up all call sites accordingly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/fpu/xstate.c |   34 ++++++++++++++++++++++++++--------
 arch/x86/kernel/fpu/xstate.h |    3 +++
 2 files changed, 29 insertions(+), 8 deletions(-)

--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -940,9 +940,10 @@ static void copy_feature(bool from_xstat
 }
 
 /**
- * copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
+ * __copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
  * @to:		membuf descriptor
- * @tsk:	The task from which to copy the saved xstate
+ * @xsave:	The xsave from which to copy
+ * @pkru_val:	The PKRU value to store in the PKRU component
  * @copy_mode:	The requested copy mode
  *
  * Converts from kernel XSAVE or XSAVES compacted format to UABI conforming
@@ -951,11 +952,10 @@ static void copy_feature(bool from_xstat
  *
  * It supports partial copy but @to.pos always starts from zero.
  */
-void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
-			     enum xstate_copy_mode copy_mode)
+void __copy_xstate_to_uabi_buf(struct membuf to, struct xregs_state *xsave,
+			       u32 pkru_val, enum xstate_copy_mode copy_mode)
 {
 	const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
-	struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
 	struct xregs_state *xinit = &init_fpstate.xsave;
 	struct xstate_header header;
 	unsigned int zerofrom;
@@ -1033,10 +1033,9 @@ void copy_xstate_to_uabi_buf(struct memb
 			struct pkru_state pkru = {0};
 			/*
 			 * PKRU is not necessarily up to date in the
-			 * thread's XSAVE buffer.  Fill this part from the
-			 * per-thread storage.
+			 * XSAVE buffer. Use the provided value.
 			 */
-			pkru.pkru = tsk->thread.pkru;
+			pkru.pkru = pkru_val;
 			membuf_write(&to, &pkru, sizeof(pkru));
 		} else {
 			copy_feature(header.xfeatures & BIT_ULL(i), &to,
@@ -1056,6 +1055,25 @@ void copy_xstate_to_uabi_buf(struct memb
 		membuf_zero(&to, to.left);
 }
 
+/**
+ * copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
+ * @to:		membuf descriptor
+ * @tsk:	The task from which to copy the saved xstate
+ * @copy_mode:	The requested copy mode
+ *
+ * Converts from kernel XSAVE or XSAVES compacted format to UABI conforming
+ * format, i.e. from the kernel internal hardware dependent storage format
+ * to the requested @copy_mode. UABI XSTATE is always uncompacted!
+ *
+ * It supports partial copy but @to.pos always starts from zero.
+ */
+void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
+			     enum xstate_copy_mode copy_mode)
+{
+	__copy_xstate_to_uabi_buf(to, &tsk->thread.fpu.state.xsave,
+				  tsk->thread.pkru, copy_mode);
+}
+
 static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
 			    const void *kbuf, const void __user *ubuf)
 {
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -15,4 +15,7 @@ static inline void xstate_init_xcomp_bv(
 		xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
 }
 
+extern void __copy_xstate_to_uabi_buf(struct membuf to, struct xregs_state *xsave,
+				      u32 pkru_val, enum xstate_copy_mode copy_mode);
+
 #endif



* [patch 16/31] x86/fpu: Replace KVM's homebrew FPU copy to user
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (14 preceding siblings ...)
  2021-10-12  0:00 ` [patch 15/31] x86/fpu: Rework copy_xstate_to_uabi_buf() Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12 17:10   ` Borislav Petkov
  2021-10-12 17:36   ` Paolo Bonzini
  2021-10-12  0:00 ` [patch 17/31] x86/fpu: Mark fpu__init_prepare_fx_sw_frame() as __init Thomas Gleixner
                   ` (15 subsequent siblings)
  31 siblings, 2 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Similar to the copy-from-user function, the FPU core already has this
implemented with all bells and whistles.

Get rid of the duplicated code and use the core functionality.
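
The "bells and whistles" are the membuf based copy machinery:
__copy_xstate_to_uabi_buf() converts from the kernel's (possibly compacted)
storage format, fills the PKRU component from the caller-supplied value and
zero-fills the unused tail of the buffer. A hedged sketch of the resulting
KVM-facing call (buf and size stand in for the kvm_xsave region and its
size):

	fpu_copy_vcpu_to_kvm_uabi(vcpu_fpu, buf, size, vcpu_pkru);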

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/fpu/api.h |    2 -
 arch/x86/kernel/fpu/core.c     |   16 +++++++++++
 arch/x86/kvm/x86.c             |   56 ++---------------------------------------
 3 files changed, 20 insertions(+), 54 deletions(-)

--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -116,7 +116,7 @@ extern void fpu_init_fpstate_user(struct
 /* KVM specific functions */
 extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
 
-struct kvm_vcpu;
 extern int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0, u32 *pkru);
+extern void fpu_copy_vcpu_to_kvm_uabi(struct fpu *fpu, void *buf, unsigned int size, u32 pkru);
 
 #endif /* _ASM_X86_FPU_API_H */
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -175,6 +175,22 @@ void fpu_swap_kvm_fpu(struct fpu *save,
 }
 EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
 
+void fpu_copy_vcpu_to_kvm_uabi(struct fpu *fpu, void *buf,
+			       unsigned int size, u32 pkru)
+{
+	union fpregs_state *kstate = &fpu->state;
+	union fpregs_state *ustate = buf;
+	struct membuf mb = { .p = buf, .left = size };
+
+	if (cpu_feature_enabled(X86_FEATURE_XSAVE)) {
+		__copy_xstate_to_uabi_buf(mb, &kstate->xsave, pkru,
+					  XSTATE_COPY_XSAVE);
+	} else {
+		memcpy(&ustate->fxsave, &kstate->fxsave, sizeof(ustate->fxsave));
+	}
+}
+EXPORT_SYMBOL_GPL(fpu_copy_vcpu_to_kvm_uabi);
+
 int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0,
 			      u32 *vpkru)
 {
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4695,65 +4695,15 @@ static int kvm_vcpu_ioctl_x86_set_debugr
 	return 0;
 }
 
-static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
-{
-	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
-	u64 xstate_bv = xsave->header.xfeatures;
-	u64 valid;
-
-	/*
-	 * Copy legacy XSAVE area, to avoid complications with CPUID
-	 * leaves 0 and 1 in the loop below.
-	 */
-	memcpy(dest, xsave, XSAVE_HDR_OFFSET);
-
-	/* Set XSTATE_BV */
-	xstate_bv &= vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FPSSE;
-	*(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv;
-
-	/*
-	 * Copy each region from the possibly compacted offset to the
-	 * non-compacted offset.
-	 */
-	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
-	while (valid) {
-		u32 size, offset, ecx, edx;
-		u64 xfeature_mask = valid & -valid;
-		int xfeature_nr = fls64(xfeature_mask) - 1;
-		void *src;
-
-		cpuid_count(XSTATE_CPUID, xfeature_nr,
-			    &size, &offset, &ecx, &edx);
-
-		if (xfeature_nr == XFEATURE_PKRU) {
-			memcpy(dest + offset, &vcpu->arch.pkru,
-			       sizeof(vcpu->arch.pkru));
-		} else {
-			src = get_xsave_addr(xsave, xfeature_nr);
-			if (src)
-				memcpy(dest + offset, src, size);
-		}
-
-		valid -= xfeature_mask;
-	}
-}
-
 static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 					 struct kvm_xsave *guest_xsave)
 {
 	if (!vcpu->arch.guest_fpu)
 		return;
 
-	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
-		memset(guest_xsave, 0, sizeof(struct kvm_xsave));
-		fill_xsave((u8 *) guest_xsave->region, vcpu);
-	} else {
-		memcpy(guest_xsave->region,
-			&vcpu->arch.guest_fpu->state.fxsave,
-			sizeof(struct fxregs_state));
-		*(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] =
-			XFEATURE_MASK_FPSSE;
-	}
+	fpu_copy_vcpu_to_kvm_uabi(vcpu->arch.guest_fpu, guest_xsave->region,
+				  sizeof(guest_xsave->region),
+				  vcpu->arch.pkru);
 }
 
 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,



* [patch 17/31] x86/fpu: Mark fpu__init_prepare_fx_sw_frame() as __init
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (15 preceding siblings ...)
  2021-10-12  0:00 ` [patch 16/31] x86/fpu: Replace KVM's homebrew FPU copy to user Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 18/31] x86/fpu: Move context switch and exit to user inlines into sched.h Thomas Gleixner
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

No need to keep it around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/signal.h |    2 --
 arch/x86/kernel/fpu/internal.h    |    8 ++++++++
 arch/x86/kernel/fpu/signal.c      |    4 +++-
 arch/x86/kernel/fpu/xstate.c      |    1 +
 4 files changed, 12 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/fpu/signal.h
+++ b/arch/x86/include/asm/fpu/signal.h
@@ -31,6 +31,4 @@ fpu__alloc_mathframe(unsigned long sp, i
 
 unsigned long fpu__get_fpstate_size(void);
 
-extern void fpu__init_prepare_fx_sw_frame(void);
-
 #endif /* _ASM_X86_FPU_SIGNAL_H */
--- /dev/null
+++ b/arch/x86/kernel/fpu/internal.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __X86_KERNEL_FPU_INTERNAL_H
+#define __X86_KERNEL_FPU_INTERNAL_H
+
+/* Init functions */
+extern void fpu__init_prepare_fx_sw_frame(void);
+
+#endif
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -16,6 +16,8 @@
 #include <asm/trapnr.h>
 #include <asm/trace/fpu.h>
 
+#include "internal.h"
+
 static struct _fpx_sw_bytes fx_sw_reserved __ro_after_init;
 static struct _fpx_sw_bytes fx_sw_reserved_ia32 __ro_after_init;
 
@@ -514,7 +516,7 @@ unsigned long fpu__get_fpstate_size(void
  * This will be saved when ever the FP and extended state context is
  * saved on the user stack during the signal handler delivery to the user.
  */
-void fpu__init_prepare_fx_sw_frame(void)
+void __init fpu__init_prepare_fx_sw_frame(void)
 {
 	int size = fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE;
 
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -19,6 +19,7 @@
 
 #include <asm/tlbflush.h>
 
+#include "internal.h"
 #include "xstate.h"
 
 #define for_each_extended_xfeature(bit, mask)				\



* [patch 18/31] x86/fpu: Move context switch and exit to user inlines into sched.h
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (16 preceding siblings ...)
  2021-10-12  0:00 ` [patch 17/31] x86/fpu: Mark fpu__init_prepare_fx_sw_frame() as __init Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 19/31] x86/fpu: Clean up cpu feature tests Thomas Gleixner
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

internal.h is a kitchen sink which needs to get out of the way to prepare
for the upcoming changes.

Move the context switch and exit-to-user inlines into a separate header,
which is all that code needs.
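
A hedged sketch of how the context switch path is expected to use the pair
(the surrounding __switch_to() plumbing is an assumption for illustration):

	/* Save the outgoing task's register state ... */
	switch_fpu_prepare(&prev->thread.fpu, cpu);

	/* ... and defer the restore of the incoming task's state */
	switch_fpu_finish();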

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |   60 -------------------------------
 arch/x86/include/asm/fpu/sched.h    |   68 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/fpu/core.c          |    1 
 arch/x86/kernel/process.c           |    2 -
 arch/x86/kernel/process_32.c        |    2 -
 arch/x86/kernel/process_64.c        |    2 -
 6 files changed, 72 insertions(+), 63 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -27,16 +27,11 @@
  * High level FPU state handling functions:
  */
 extern bool fpu__restore_sig(void __user *buf, int ia32_frame);
-extern void fpu__drop(struct fpu *fpu);
 extern void fpu__clear_user_states(struct fpu *fpu);
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 
 extern void fpu_sync_fpstate(struct fpu *fpu);
 
-/* Clone and exit operations */
-extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags);
-extern void fpu_flush_thread(void);
-
 /*
  * Boot time FPU initialization functions:
  */
@@ -82,7 +77,6 @@ extern void fpstate_init_soft(struct swr
 #else
 static inline void fpstate_init_soft(struct swregs_state *soft) {}
 #endif
-extern void save_fpregs_to_fpstate(struct fpu *fpu);
 
 /*
  * Returns 0 on success or the trap number when the operation raises an
@@ -464,58 +458,4 @@ static inline void fpregs_restore_userre
 	clear_thread_flag(TIF_NEED_FPU_LOAD);
 }
 
-/*
- * FPU state switching for scheduling.
- *
- * This is a two-stage process:
- *
- *  - switch_fpu_prepare() saves the old state.
- *    This is done within the context of the old process.
- *
- *  - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
- *    will get loaded on return to userspace, or when the kernel needs it.
- *
- * If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers
- * are saved in the current thread's FPU register state.
- *
- * If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not
- * hold current()'s FPU registers. It is required to load the
- * registers before returning to userland or using the content
- * otherwise.
- *
- * The FPU context is only stored/restored for a user task and
- * PF_KTHREAD is used to distinguish between kernel and user threads.
- */
-static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
-{
-	if (static_cpu_has(X86_FEATURE_FPU) && !(current->flags & PF_KTHREAD)) {
-		save_fpregs_to_fpstate(old_fpu);
-		/*
-		 * The save operation preserved register state, so the
-		 * fpu_fpregs_owner_ctx is still @old_fpu. Store the
-		 * current CPU number in @old_fpu, so the next return
-		 * to user space can avoid the FPU register restore
-		 * when is returns on the same CPU and still owns the
-		 * context.
-		 */
-		old_fpu->last_cpu = cpu;
-
-		trace_x86_fpu_regs_deactivated(old_fpu);
-	}
-}
-
-/*
- * Misc helper functions:
- */
-
-/*
- * Delay loading of the complete FPU state until the return to userland.
- * PKRU is handled separately.
- */
-static inline void switch_fpu_finish(void)
-{
-	if (cpu_feature_enabled(X86_FEATURE_FPU))
-		set_thread_flag(TIF_NEED_FPU_LOAD);
-}
-
 #endif /* _ASM_X86_FPU_INTERNAL_H */
--- /dev/null
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_FPU_SCHED_H
+#define _ASM_X86_FPU_SCHED_H
+
+#include <linux/sched.h>
+
+#include <asm/cpufeature.h>
+#include <asm/fpu/types.h>
+
+#include <asm/trace/fpu.h>
+
+extern void save_fpregs_to_fpstate(struct fpu *fpu);
+extern void fpu__drop(struct fpu *fpu);
+extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags);
+extern void fpu_flush_thread(void);
+
+/*
+ * FPU state switching for scheduling.
+ *
+ * This is a two-stage process:
+ *
+ *  - switch_fpu_prepare() saves the old state.
+ *    This is done within the context of the old process.
+ *
+ *  - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
+ *    will get loaded on return to userspace, or when the kernel needs it.
+ *
+ * If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers
+ * are saved in the current thread's FPU register state.
+ *
+ * If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not
+ * hold current()'s FPU registers. It is required to load the
+ * registers before returning to userland or using the content
+ * otherwise.
+ *
+ * The FPU context is only stored/restored for a user task and
+ * PF_KTHREAD is used to distinguish between kernel and user threads.
+ */
+static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
+{
+	if (cpu_feature_enabled(X86_FEATURE_FPU) &&
+	    !(current->flags & PF_KTHREAD)) {
+		save_fpregs_to_fpstate(old_fpu);
+		/*
+		 * The save operation preserved register state, so the
+		 * fpu_fpregs_owner_ctx is still @old_fpu. Store the
+		 * current CPU number in @old_fpu, so the next return
+		 * to user space can avoid the FPU register restore
+		 * when it returns on the same CPU and still owns the
+		 * context.
+		 */
+		old_fpu->last_cpu = cpu;
+
+		trace_x86_fpu_regs_deactivated(old_fpu);
+	}
+}
+
+/*
+ * Delay loading of the complete FPU state until the return to userland.
+ * PKRU is handled separately.
+ */
+static inline void switch_fpu_finish(void)
+{
+	if (cpu_feature_enabled(X86_FEATURE_FPU))
+		set_thread_flag(TIF_NEED_FPU_LOAD);
+}
+
+#endif /* _ASM_X86_FPU_SCHED_H */
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -8,6 +8,7 @@
  */
 #include <asm/fpu/internal.h>
 #include <asm/fpu/regset.h>
+#include <asm/fpu/sched.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/types.h>
 #include <asm/traps.h>
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -30,7 +30,7 @@
 #include <asm/apic.h>
 #include <linux/uaccess.h>
 #include <asm/mwait.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/sched.h>
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
 #include <asm/tlbflush.h>
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -41,7 +41,7 @@
 
 #include <asm/ldt.h>
 #include <asm/processor.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/sched.h>
 #include <asm/desc.h>
 
 #include <linux/err.h>
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -42,7 +42,7 @@
 
 #include <asm/processor.h>
 #include <asm/pkru.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/sched.h>
 #include <asm/mmu_context.h>
 #include <asm/prctl.h>
 #include <asm/desc.h>



* [patch 19/31] x86/fpu: Clean up cpu feature tests
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (17 preceding siblings ...)
  2021-10-12  0:00 ` [patch 18/31] x86/fpu: Move context switch and exit to user inlines into sched.h Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 20/31] x86/fpu: Make os_xrstor_booting() private Thomas Gleixner
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Further disintegration of internal.h:

Move the CPU feature tests to a core header and remove the unused
use_xsaveopt() wrapper.
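
The retained wrappers keep call sites terse; a hypothetical save path,
shown only for illustration:

	if (use_xsave())
		os_xsave(&fpu->state.xsave);
	else if (use_fxsr())
		fxsave(&fpu->state.fxsave);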

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |   18 ------------------
 arch/x86/kernel/fpu/core.c          |    1 +
 arch/x86/kernel/fpu/internal.h      |   11 +++++++++++
 arch/x86/kernel/fpu/regset.c        |    2 ++
 4 files changed, 14 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -51,24 +51,6 @@ extern void fpu__resume_cpu(void);
 # define WARN_ON_FPU(x) ({ (void)(x); 0; })
 #endif
 
-/*
- * FPU related CPU feature flag helper routines:
- */
-static __always_inline __pure bool use_xsaveopt(void)
-{
-	return static_cpu_has(X86_FEATURE_XSAVEOPT);
-}
-
-static __always_inline __pure bool use_xsave(void)
-{
-	return static_cpu_has(X86_FEATURE_XSAVE);
-}
-
-static __always_inline __pure bool use_fxsr(void)
-{
-	return static_cpu_has(X86_FEATURE_FXSR);
-}
-
 extern union fpregs_state init_fpstate;
 extern void fpstate_init_user(union fpregs_state *state);
 
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -17,6 +17,7 @@
 #include <linux/hardirq.h>
 #include <linux/pkeys.h>
 
+#include "internal.h"
 #include "xstate.h"
 
 #define CREATE_TRACE_POINTS
--- a/arch/x86/kernel/fpu/internal.h
+++ b/arch/x86/kernel/fpu/internal.h
@@ -2,6 +2,17 @@
 #ifndef __X86_KERNEL_FPU_INTERNAL_H
 #define __X86_KERNEL_FPU_INTERNAL_H
 
+/* CPU feature check wrappers */
+static __always_inline __pure bool use_xsave(void)
+{
+	return cpu_feature_enabled(X86_FEATURE_XSAVE);
+}
+
+static __always_inline __pure bool use_fxsr(void)
+{
+	return cpu_feature_enabled(X86_FEATURE_FXSR);
+}
+
 /* Init functions */
 extern void fpu__init_prepare_fx_sw_frame(void);
 
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -10,6 +10,8 @@
 #include <asm/fpu/regset.h>
 #include <asm/fpu/xstate.h>
 
+#include "internal.h"
+
 /*
  * The xstateregs_active() routine is the same as the regset_fpregs_active() routine,
  * as the "regset->n" for the xstate regset will be updated based on the feature



* [patch 20/31] x86/fpu: Make os_xrstor_booting() private
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (18 preceding siblings ...)
  2021-10-12  0:00 ` [patch 19/31] x86/fpu: Clean up cpu feature tests Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 21/31] x86/fpu: Move os_xsave() and os_xrstor() to core Thomas Gleixner
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

It's only required in the xstate init code, so move it there and mark it
__init. The SYSTEM_BOOTING sanity check becomes superfluous as a result.
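
For illustration, the only class of caller left is the boot-time restore of
the init state during xstate setup (the call below is a sketch, not part of
this diff):

	/* Boot time only: alternatives are not patched yet */
	os_xrstor_booting(&init_fpstate.xsave);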

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |   25 -------------------------
 arch/x86/kernel/fpu/xstate.c        |   23 +++++++++++++++++++++++
 2 files changed, 23 insertions(+), 25 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -226,31 +226,6 @@ static inline void fxsave(struct fxregs_
 		     : "memory")
 
 /*
- * This function is called only during boot time when x86 caps are not set
- * up and alternative can not be used yet.
- */
-static inline void os_xrstor_booting(struct xregs_state *xstate)
-{
-	u64 mask = xfeatures_mask_fpstate();
-	u32 lmask = mask;
-	u32 hmask = mask >> 32;
-	int err;
-
-	WARN_ON(system_state != SYSTEM_BOOTING);
-
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
-	else
-		XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
-
-	/*
-	 * We should never fault when copying from a kernel buffer, and the FPU
-	 * state we set at boot time should be valid.
-	 */
-	WARN_ON_FPU(err);
-}
-
-/*
  * Save processor xstate to xsave area.
  *
  * Uses either XSAVE or XSAVEOPT or XSAVES depending on the CPU features
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -351,6 +351,29 @@ static void __init print_xstate_offset_s
 }
 
 /*
+ * This function is called only during boot time when x86 caps are not set
+ * up and alternative can not be used yet.
+ */
+static __init void os_xrstor_booting(struct xregs_state *xstate)
+{
+	u64 mask = xfeatures_mask_fpstate();
+	u32 lmask = mask;
+	u32 hmask = mask >> 32;
+	int err;
+
+	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
+		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
+	else
+		XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
+
+	/*
+	 * We should never fault when copying from a kernel buffer, and the FPU
+	 * state we set at boot time should be valid.
+	 */
+	WARN_ON_FPU(err);
+}
+
+/*
  * All supported features have either init state all zeros or are
  * handled in setup_init_fpu() individually. This is an explicit
  * feature list and does not use XFEATURE_MASK*SUPPORTED to catch



* [patch 21/31] x86/fpu: Move os_xsave() and os_xrstor() to core
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (19 preceding siblings ...)
  2021-10-12  0:00 ` [patch 20/31] x86/fpu: Make os_xrstor_booting() private Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 22/31] x86/fpu: Move legacy ASM wrappers " Thomas Gleixner
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Nothing outside the core code needs these.
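
One convention visible throughout the moved wrappers: the XSAVE instruction
family takes the requested feature bitmap in the EDX:EAX register pair,
hence the recurring split of the 64-bit mask:

	u32 lmask = mask;	/* low 32 bits, passed in EAX */
	u32 hmask = mask >> 32;	/* high 32 bits, passed in EDX */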

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |  165 ----------------------------------
 arch/x86/include/asm/fpu/xstate.h   |    6 -
 arch/x86/kernel/fpu/signal.c        |    1 
 arch/x86/kernel/fpu/xstate.h        |  174 ++++++++++++++++++++++++++++++++++++
 4 files changed, 175 insertions(+), 171 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -161,171 +161,6 @@ static inline void fxsave(struct fxregs_
 		asm volatile("fxsaveq %[fx]" : [fx] "=m" (*fx));
 }
 
-/* These macros all use (%edi)/(%rdi) as the single memory argument. */
-#define XSAVE		".byte " REX_PREFIX "0x0f,0xae,0x27"
-#define XSAVEOPT	".byte " REX_PREFIX "0x0f,0xae,0x37"
-#define XSAVES		".byte " REX_PREFIX "0x0f,0xc7,0x2f"
-#define XRSTOR		".byte " REX_PREFIX "0x0f,0xae,0x2f"
-#define XRSTORS		".byte " REX_PREFIX "0x0f,0xc7,0x1f"
-
-/*
- * After this @err contains 0 on success or the trap number when the
- * operation raises an exception.
- */
-#define XSTATE_OP(op, st, lmask, hmask, err)				\
-	asm volatile("1:" op "\n\t"					\
-		     "xor %[err], %[err]\n"				\
-		     "2:\n\t"						\
-		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FAULT_MCE_SAFE)	\
-		     : [err] "=a" (err)					\
-		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
-		     : "memory")
-
-/*
- * If XSAVES is enabled, it replaces XSAVEOPT because it supports a compact
- * format and supervisor states in addition to modified optimization in
- * XSAVEOPT.
- *
- * Otherwise, if XSAVEOPT is enabled, XSAVEOPT replaces XSAVE because XSAVEOPT
- * supports modified optimization which is not supported by XSAVE.
- *
- * We use XSAVE as a fallback.
- *
- * The 661 label is defined in the ALTERNATIVE* macros as the address of the
- * original instruction which gets replaced. We need to use it here as the
- * address of the instruction where we might get an exception at.
- */
-#define XSTATE_XSAVE(st, lmask, hmask, err)				\
-	asm volatile(ALTERNATIVE_2(XSAVE,				\
-				   XSAVEOPT, X86_FEATURE_XSAVEOPT,	\
-				   XSAVES,   X86_FEATURE_XSAVES)	\
-		     "\n"						\
-		     "xor %[err], %[err]\n"				\
-		     "3:\n"						\
-		     ".pushsection .fixup,\"ax\"\n"			\
-		     "4: movl $-2, %[err]\n"				\
-		     "jmp 3b\n"						\
-		     ".popsection\n"					\
-		     _ASM_EXTABLE(661b, 4b)				\
-		     : [err] "=r" (err)					\
-		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
-		     : "memory")
-
-/*
- * Use XRSTORS to restore context if it is enabled. XRSTORS supports compact
- * XSAVE area format.
- */
-#define XSTATE_XRESTORE(st, lmask, hmask)				\
-	asm volatile(ALTERNATIVE(XRSTOR,				\
-				 XRSTORS, X86_FEATURE_XSAVES)		\
-		     "\n"						\
-		     "3:\n"						\
-		     _ASM_EXTABLE_TYPE(661b, 3b, EX_TYPE_FPU_RESTORE)	\
-		     :							\
-		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
-		     : "memory")
-
-/*
- * Save processor xstate to xsave area.
- *
- * Uses either XSAVE or XSAVEOPT or XSAVES depending on the CPU features
- * and command line options. The choice is permanent until the next reboot.
- */
-static inline void os_xsave(struct xregs_state *xstate)
-{
-	u64 mask = xfeatures_mask_all;
-	u32 lmask = mask;
-	u32 hmask = mask >> 32;
-	int err;
-
-	WARN_ON_FPU(!alternatives_patched);
-
-	XSTATE_XSAVE(xstate, lmask, hmask, err);
-
-	/* We should never fault when copying to a kernel buffer: */
-	WARN_ON_FPU(err);
-}
-
-/*
- * Restore processor xstate from xsave area.
- *
- * Uses XRSTORS when XSAVES is used, XRSTOR otherwise.
- */
-static inline void os_xrstor(struct xregs_state *xstate, u64 mask)
-{
-	u32 lmask = mask;
-	u32 hmask = mask >> 32;
-
-	XSTATE_XRESTORE(xstate, lmask, hmask);
-}
-
-/*
- * Save xstate to user space xsave area.
- *
- * We don't use modified optimization because xrstor/xrstors might track
- * a different application.
- *
- * We don't use compacted format xsave area for backward compatibility for
- * old applications which don't understand the compacted format of the
- * xsave area.
- *
- * The caller has to zero buf::header before calling this because XSAVE*
- * does not touch the reserved fields in the header.
- */
-static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
-{
-	/*
-	 * Include the features which are not xsaved/rstored by the kernel
-	 * internally, e.g. PKRU. That's user space ABI and also required
-	 * to allow the signal handler to modify PKRU.
-	 */
-	u64 mask = xfeatures_mask_uabi();
-	u32 lmask = mask;
-	u32 hmask = mask >> 32;
-	int err;
-
-	stac();
-	XSTATE_OP(XSAVE, buf, lmask, hmask, err);
-	clac();
-
-	return err;
-}
-
-/*
- * Restore xstate from user space xsave area.
- */
-static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64 mask)
-{
-	struct xregs_state *xstate = ((__force struct xregs_state *)buf);
-	u32 lmask = mask;
-	u32 hmask = mask >> 32;
-	int err;
-
-	stac();
-	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
-	clac();
-
-	return err;
-}
-
-/*
- * Restore xstate from kernel space xsave area, return an error code instead of
- * an exception.
- */
-static inline int os_xrstor_safe(struct xregs_state *xstate, u64 mask)
-{
-	u32 lmask = mask;
-	u32 hmask = mask >> 32;
-	int err;
-
-	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
-		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
-	else
-		XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
-
-	return err;
-}
-
 extern void restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask);
 
 extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -78,12 +78,6 @@
 				      XFEATURE_MASK_INDEPENDENT | \
 				      XFEATURE_MASK_SUPERVISOR_UNSUPPORTED)
 
-#ifdef CONFIG_X86_64
-#define REX_PREFIX	"0x48, "
-#else
-#define REX_PREFIX
-#endif
-
 extern u64 xfeatures_mask_all;
 
 static inline u64 xfeatures_mask_supervisor(void)
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -17,6 +17,7 @@
 #include <asm/trace/fpu.h>
 
 #include "internal.h"
+#include "xstate.h"
 
 static struct _fpx_sw_bytes fx_sw_reserved __ro_after_init;
 static struct _fpx_sw_bytes fx_sw_reserved_ia32 __ro_after_init;
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -18,4 +18,178 @@ static inline void xstate_init_xcomp_bv(
 extern void __copy_xstate_to_uabi_buf(struct membuf to, struct xregs_state *xsave,
 				      u32 pkru_val, enum xstate_copy_mode copy_mode);
 
+/* XSAVE/XRSTOR wrapper functions */
+
+#ifdef CONFIG_X86_64
+#define REX_PREFIX	"0x48, "
+#else
+#define REX_PREFIX
+#endif
+
+/* These macros all use (%edi)/(%rdi) as the single memory argument. */
+#define XSAVE		".byte " REX_PREFIX "0x0f,0xae,0x27"
+#define XSAVEOPT	".byte " REX_PREFIX "0x0f,0xae,0x37"
+#define XSAVES		".byte " REX_PREFIX "0x0f,0xc7,0x2f"
+#define XRSTOR		".byte " REX_PREFIX "0x0f,0xae,0x2f"
+#define XRSTORS		".byte " REX_PREFIX "0x0f,0xc7,0x1f"
+
+/*
+ * After this @err contains 0 on success or the trap number when the
+ * operation raises an exception.
+ */
+#define XSTATE_OP(op, st, lmask, hmask, err)				\
+	asm volatile("1:" op "\n\t"					\
+		     "xor %[err], %[err]\n"				\
+		     "2:\n\t"						\
+		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FAULT_MCE_SAFE)	\
+		     : [err] "=a" (err)					\
+		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
+		     : "memory")
+
+/*
+ * If XSAVES is enabled, it replaces XSAVEOPT because it supports a compact
+ * format and supervisor states in addition to modified optimization in
+ * XSAVEOPT.
+ *
+ * Otherwise, if XSAVEOPT is enabled, XSAVEOPT replaces XSAVE because XSAVEOPT
+ * supports modified optimization which is not supported by XSAVE.
+ *
+ * We use XSAVE as a fallback.
+ *
+ * The 661 label is defined in the ALTERNATIVE* macros as the address of the
+ * original instruction which gets replaced. We need to use it here as the
+ * address of the instruction where we might get an exception at.
+ */
+#define XSTATE_XSAVE(st, lmask, hmask, err)				\
+	asm volatile(ALTERNATIVE_2(XSAVE,				\
+				   XSAVEOPT, X86_FEATURE_XSAVEOPT,	\
+				   XSAVES,   X86_FEATURE_XSAVES)	\
+		     "\n"						\
+		     "xor %[err], %[err]\n"				\
+		     "3:\n"						\
+		     ".pushsection .fixup,\"ax\"\n"			\
+		     "4: movl $-2, %[err]\n"				\
+		     "jmp 3b\n"						\
+		     ".popsection\n"					\
+		     _ASM_EXTABLE(661b, 4b)				\
+		     : [err] "=r" (err)					\
+		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
+		     : "memory")
+
+/*
+ * Use XRSTORS to restore context if it is enabled. XRSTORS supports compact
+ * XSAVE area format.
+ */
+#define XSTATE_XRESTORE(st, lmask, hmask)				\
+	asm volatile(ALTERNATIVE(XRSTOR,				\
+				 XRSTORS, X86_FEATURE_XSAVES)		\
+		     "\n"						\
+		     "3:\n"						\
+		     _ASM_EXTABLE_TYPE(661b, 3b, EX_TYPE_FPU_RESTORE)	\
+		     :							\
+		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
+		     : "memory")
+
+/*
+ * Save processor xstate to xsave area.
+ *
+ * Uses either XSAVE or XSAVEOPT or XSAVES depending on the CPU features
+ * and command line options. The choice is permanent until the next reboot.
+ */
+static inline void os_xsave(struct xregs_state *xstate)
+{
+	u64 mask = xfeatures_mask_all;
+	u32 lmask = mask;
+	u32 hmask = mask >> 32;
+	int err;
+
+	WARN_ON_FPU(!alternatives_patched);
+
+	XSTATE_XSAVE(xstate, lmask, hmask, err);
+
+	/* We should never fault when copying to a kernel buffer: */
+	WARN_ON_FPU(err);
+}
+
+/*
+ * Restore processor xstate from xsave area.
+ *
+ * Uses XRSTORS when XSAVES is used, XRSTOR otherwise.
+ */
+static inline void os_xrstor(struct xregs_state *xstate, u64 mask)
+{
+	u32 lmask = mask;
+	u32 hmask = mask >> 32;
+
+	XSTATE_XRESTORE(xstate, lmask, hmask);
+}
+
+/*
+ * Save xstate to user space xsave area.
+ *
+ * We don't use modified optimization because xrstor/xrstors might track
+ * a different application.
+ *
+ * We don't use compacted format xsave area for backward compatibility for
+ * old applications which don't understand the compacted format of the
+ * xsave area.
+ *
+ * The caller has to zero buf::header before calling this because XSAVE*
+ * does not touch the reserved fields in the header.
+ */
+static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
+{
+	/*
+	 * Include the features which are not xsaved/rstored by the kernel
+	 * internally, e.g. PKRU. That's user space ABI and also required
+	 * to allow the signal handler to modify PKRU.
+	 */
+	u64 mask = xfeatures_mask_uabi();
+	u32 lmask = mask;
+	u32 hmask = mask >> 32;
+	int err;
+
+	stac();
+	XSTATE_OP(XSAVE, buf, lmask, hmask, err);
+	clac();
+
+	return err;
+}
+
+/*
+ * Restore xstate from user space xsave area.
+ */
+static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64 mask)
+{
+	struct xregs_state *xstate = ((__force struct xregs_state *)buf);
+	u32 lmask = mask;
+	u32 hmask = mask >> 32;
+	int err;
+
+	stac();
+	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
+	clac();
+
+	return err;
+}
+
+/*
+ * Restore xstate from kernel space xsave area, return an error code instead of
+ * an exception.
+ */
+static inline int os_xrstor_safe(struct xregs_state *xstate, u64 mask)
+{
+	u32 lmask = mask;
+	u32 hmask = mask >> 32;
+	int err;
+
+	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
+		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
+	else
+		XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
+
+	return err;
+}
+
+
 #endif



* [patch 22/31] x86/fpu: Move legacy ASM wrappers to core
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (20 preceding siblings ...)
  2021-10-12  0:00 ` [patch 21/31] x86/fpu: Move os_xsave() and os_xrstor() to core Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 23/31] x86/fpu: Make WARN_ON_FPU() private Thomas Gleixner
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Nothing outside the core code requires them.
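
The distinction the legacy header keeps is between kernel and user memory
variants: user_insn() brackets the instruction with STAC/CLAC so that the
access to user space is permitted under SMAP. A hedged usage sketch (buf
points into the user signal frame and is an assumption here):

	if (fxsave_to_user_sigframe(buf))
		return -EFAULT;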

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |  101 ---------------------------------
 arch/x86/kernel/fpu/core.c          |    1 
 arch/x86/kernel/fpu/legacy.h        |  108 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/fpu/signal.c        |    1 
 arch/x86/kernel/fpu/xstate.c        |    1 
 5 files changed, 111 insertions(+), 101 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -60,107 +60,6 @@ extern void fpstate_init_soft(struct swr
 static inline void fpstate_init_soft(struct swregs_state *soft) {}
 #endif
 
-/*
- * Returns 0 on success or the trap number when the operation raises an
- * exception.
- */
-#define user_insn(insn, output, input...)				\
-({									\
-	int err;							\
-									\
-	might_fault();							\
-									\
-	asm volatile(ASM_STAC "\n"					\
-		     "1: " #insn "\n"					\
-		     "2: " ASM_CLAC "\n"				\
-		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FAULT_MCE_SAFE)	\
-		     : [err] "=a" (err), output				\
-		     : "0"(0), input);					\
-	err;								\
-})
-
-#define kernel_insn_err(insn, output, input...)				\
-({									\
-	int err;							\
-	asm volatile("1:" #insn "\n\t"					\
-		     "2:\n"						\
-		     ".section .fixup,\"ax\"\n"				\
-		     "3:  movl $-1,%[err]\n"				\
-		     "    jmp  2b\n"					\
-		     ".previous\n"					\
-		     _ASM_EXTABLE(1b, 3b)				\
-		     : [err] "=r" (err), output				\
-		     : "0"(0), input);					\
-	err;								\
-})
-
-#define kernel_insn(insn, output, input...)				\
-	asm volatile("1:" #insn "\n\t"					\
-		     "2:\n"						\
-		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FPU_RESTORE)	\
-		     : output : input)
-
-static inline int fnsave_to_user_sigframe(struct fregs_state __user *fx)
-{
-	return user_insn(fnsave %[fx]; fwait,  [fx] "=m" (*fx), "m" (*fx));
-}
-
-static inline int fxsave_to_user_sigframe(struct fxregs_state __user *fx)
-{
-	if (IS_ENABLED(CONFIG_X86_32))
-		return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
-	else
-		return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
-
-}
-
-static inline void fxrstor(struct fxregs_state *fx)
-{
-	if (IS_ENABLED(CONFIG_X86_32))
-		kernel_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
-	else
-		kernel_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
-}
-
-static inline int fxrstor_safe(struct fxregs_state *fx)
-{
-	if (IS_ENABLED(CONFIG_X86_32))
-		return kernel_insn_err(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
-	else
-		return kernel_insn_err(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
-}
-
-static inline int fxrstor_from_user_sigframe(struct fxregs_state __user *fx)
-{
-	if (IS_ENABLED(CONFIG_X86_32))
-		return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
-	else
-		return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
-}
-
-static inline void frstor(struct fregs_state *fx)
-{
-	kernel_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
-}
-
-static inline int frstor_safe(struct fregs_state *fx)
-{
-	return kernel_insn_err(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
-}
-
-static inline int frstor_from_user_sigframe(struct fregs_state __user *fx)
-{
-	return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
-}
-
-static inline void fxsave(struct fxregs_state *fx)
-{
-	if (IS_ENABLED(CONFIG_X86_32))
-		asm volatile( "fxsave %[fx]" : [fx] "=m" (*fx));
-	else
-		asm volatile("fxsaveq %[fx]" : [fx] "=m" (*fx));
-}
-
 extern void restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask);
 
 extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -18,6 +18,7 @@
 #include <linux/pkeys.h>
 
 #include "internal.h"
+#include "legacy.h"
 #include "xstate.h"
 
 #define CREATE_TRACE_POINTS
--- /dev/null
+++ b/arch/x86/kernel/fpu/legacy.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __X86_KERNEL_FPU_LEGACY_H
+#define __X86_KERNEL_FPU_LEGACY_H
+
+#include <asm/fpu/types.h>
+
+/*
+ * Returns 0 on success or the trap number when the operation raises an
+ * exception.
+ */
+#define user_insn(insn, output, input...)				\
+({									\
+	int err;							\
+									\
+	might_fault();							\
+									\
+	asm volatile(ASM_STAC "\n"					\
+		     "1: " #insn "\n"					\
+		     "2: " ASM_CLAC "\n"				\
+		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FAULT_MCE_SAFE)	\
+		     : [err] "=a" (err), output				\
+		     : "0"(0), input);					\
+	err;								\
+})
+
+#define kernel_insn_err(insn, output, input...)				\
+({									\
+	int err;							\
+	asm volatile("1:" #insn "\n\t"					\
+		     "2:\n"						\
+		     ".section .fixup,\"ax\"\n"				\
+		     "3:  movl $-1,%[err]\n"				\
+		     "    jmp  2b\n"					\
+		     ".previous\n"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : [err] "=r" (err), output				\
+		     : "0"(0), input);					\
+	err;								\
+})
+
+#define kernel_insn(insn, output, input...)				\
+	asm volatile("1:" #insn "\n\t"					\
+		     "2:\n"						\
+		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FPU_RESTORE)	\
+		     : output : input)
+
+static inline int fnsave_to_user_sigframe(struct fregs_state __user *fx)
+{
+	return user_insn(fnsave %[fx]; fwait,  [fx] "=m" (*fx), "m" (*fx));
+}
+
+static inline int fxsave_to_user_sigframe(struct fxregs_state __user *fx)
+{
+	if (IS_ENABLED(CONFIG_X86_32))
+		return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
+	else
+		return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
+
+}
+
+static inline void fxrstor(struct fxregs_state *fx)
+{
+	if (IS_ENABLED(CONFIG_X86_32))
+		kernel_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+	else
+		kernel_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
+}
+
+static inline int fxrstor_safe(struct fxregs_state *fx)
+{
+	if (IS_ENABLED(CONFIG_X86_32))
+		return kernel_insn_err(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+	else
+		return kernel_insn_err(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
+}
+
+static inline int fxrstor_from_user_sigframe(struct fxregs_state __user *fx)
+{
+	if (IS_ENABLED(CONFIG_X86_32))
+		return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+	else
+		return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
+}
+
+static inline void frstor(struct fregs_state *fx)
+{
+	kernel_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+}
+
+static inline int frstor_safe(struct fregs_state *fx)
+{
+	return kernel_insn_err(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+}
+
+static inline int frstor_from_user_sigframe(struct fregs_state __user *fx)
+{
+	return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+}
+
+static inline void fxsave(struct fxregs_state *fx)
+{
+	if (IS_ENABLED(CONFIG_X86_32))
+		asm volatile( "fxsave %[fx]" : [fx] "=m" (*fx));
+	else
+		asm volatile("fxsaveq %[fx]" : [fx] "=m" (*fx));
+}
+
+#endif
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -17,6 +17,7 @@
 #include <asm/trace/fpu.h>
 
 #include "internal.h"
+#include "legacy.h"
 #include "xstate.h"
 
 static struct _fpx_sw_bytes fx_sw_reserved __ro_after_init;
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -20,6 +20,7 @@
 #include <asm/tlbflush.h>
 
 #include "internal.h"
+#include "legacy.h"
 #include "xstate.h"
 
 #define for_each_extended_xfeature(bit, mask)				\



* [patch 23/31] x86/fpu: Make WARN_ON_FPU() private
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (21 preceding siblings ...)
  2021-10-12  0:00 ` [patch 22/31] x86/fpu: Move legacy ASM wrappers " Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 24/31] x86/fpu: Move fpregs_restore_userregs() to core Thomas Gleixner
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

There is no point in keeping this in global headers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |    9 ---------
 arch/x86/kernel/fpu/init.c          |    2 ++
 arch/x86/kernel/fpu/internal.h      |    6 ++++++
 3 files changed, 8 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -42,15 +42,6 @@ extern void fpu__init_system(struct cpui
 extern void fpu__init_check_bugs(void);
 extern void fpu__resume_cpu(void);
 
-/*
- * Debugging facility:
- */
-#ifdef CONFIG_X86_DEBUG_FPU
-# define WARN_ON_FPU(x) WARN_ON_ONCE(x)
-#else
-# define WARN_ON_FPU(x) ({ (void)(x); 0; })
-#endif
-
 extern union fpregs_state init_fpstate;
 extern void fpstate_init_user(union fpregs_state *state);
 
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -10,6 +10,8 @@
 #include <linux/sched/task.h>
 #include <linux/init.h>
 
+#include "internal.h"
+
 /*
  * Initialize the registers found in all CPUs, CR0 and CR4:
  */
--- a/arch/x86/kernel/fpu/internal.h
+++ b/arch/x86/kernel/fpu/internal.h
@@ -13,6 +13,12 @@ static __always_inline __pure bool use_f
 	return cpu_feature_enabled(X86_FEATURE_FXSR);
 }
 
+#ifdef CONFIG_X86_DEBUG_FPU
+# define WARN_ON_FPU(x) WARN_ON_ONCE(x)
+#else
+# define WARN_ON_FPU(x) ({ (void)(x); 0; })
+#endif
+
 /* Init functions */
 extern void fpu__init_prepare_fx_sw_frame(void);
 



* [patch 24/31] x86/fpu: Move fpregs_restore_userregs() to core
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (22 preceding siblings ...)
  2021-10-12  0:00 ` [patch 23/31] x86/fpu: Make WARN_ON_FPU() private Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12 17:32   ` Borislav Petkov
  2021-10-12  0:00 ` [patch 25/31] x86/fpu: Move mxcsr related code " Thomas Gleixner
                   ` (7 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Only used internally by the FPU core.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |   83 -----------------------------------
 arch/x86/kernel/fpu/context.h       |   85 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/fpu/core.c          |    1 
 arch/x86/kernel/fpu/regset.c        |    1 
 arch/x86/kernel/fpu/signal.c        |    1 
 5 files changed, 88 insertions(+), 83 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -55,89 +55,6 @@ extern void restore_fpregs_from_fpstate(
 
 extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
 
-/*
- * FPU context switch related helper methods:
- */
-
 DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
-/*
- * The in-register FPU state for an FPU context on a CPU is assumed to be
- * valid if the fpu->last_cpu matches the CPU, and the fpu_fpregs_owner_ctx
- * matches the FPU.
- *
- * If the FPU register state is valid, the kernel can skip restoring the
- * FPU state from memory.
- *
- * Any code that clobbers the FPU registers or updates the in-memory
- * FPU state for a task MUST let the rest of the kernel know that the
- * FPU registers are no longer valid for this task.
- *
- * Either one of these invalidation functions is enough. Invalidate
- * a resource you control: CPU if using the CPU for something else
- * (with preemption disabled), FPU for the current task, or a task that
- * is prevented from running by the current task.
- */
-static inline void __cpu_invalidate_fpregs_state(void)
-{
-	__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
-}
-
-static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
-{
-	fpu->last_cpu = -1;
-}
-
-static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
-{
-	return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
-}
-
-/*
- * These generally need preemption protection to work,
- * do try to avoid using these on their own:
- */
-static inline void fpregs_deactivate(struct fpu *fpu)
-{
-	this_cpu_write(fpu_fpregs_owner_ctx, NULL);
-	trace_x86_fpu_regs_deactivated(fpu);
-}
-
-static inline void fpregs_activate(struct fpu *fpu)
-{
-	this_cpu_write(fpu_fpregs_owner_ctx, fpu);
-	trace_x86_fpu_regs_activated(fpu);
-}
-
-/* Internal helper for switch_fpu_return() and signal frame setup */
-static inline void fpregs_restore_userregs(void)
-{
-	struct fpu *fpu = &current->thread.fpu;
-	int cpu = smp_processor_id();
-
-	if (WARN_ON_ONCE(current->flags & PF_KTHREAD))
-		return;
-
-	if (!fpregs_state_valid(fpu, cpu)) {
-		u64 mask;
-
-		/*
-		 * This restores _all_ xstate which has not been
-		 * established yet.
-		 *
-		 * If PKRU is enabled, then the PKRU value is already
-		 * correct because it was either set in switch_to() or in
-		 * flush_thread(). So it is excluded because it might be
-		 * not up to date in current->thread.fpu.xsave state.
-		 */
-		mask = xfeatures_mask_restore_user() |
-			xfeatures_mask_supervisor();
-		restore_fpregs_from_fpstate(&fpu->state, mask);
-
-		fpregs_activate(fpu);
-		fpu->last_cpu = cpu;
-	}
-	clear_thread_flag(TIF_NEED_FPU_LOAD);
-}
-
 #endif /* _ASM_X86_FPU_INTERNAL_H */
--- /dev/null
+++ b/arch/x86/kernel/fpu/context.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __X86_KERNEL_FPU_CONTEXT_H
+#define __X86_KERNEL_FPU_CONTEXT_H
+
+#include <asm/fpu/xstate.h>
+#include <asm/trace/fpu.h>
+
+/* Functions related to FPU context tracking */
+
+/*
+ * The in-register FPU state for an FPU context on a CPU is assumed to be
+ * valid if the fpu->last_cpu matches the CPU, and the fpu_fpregs_owner_ctx
+ * matches the FPU.
+ *
+ * If the FPU register state is valid, the kernel can skip restoring the
+ * FPU state from memory.
+ *
+ * Any code that clobbers the FPU registers or updates the in-memory
+ * FPU state for a task MUST let the rest of the kernel know that the
+ * FPU registers are no longer valid for this task.
+ *
+ * Either one of these invalidation functions is enough. Invalidate
+ * a resource you control: CPU if using the CPU for something else
+ * (with preemption disabled), FPU for the current task, or a task that
+ * is prevented from running by the current task.
+ */
+static inline void __cpu_invalidate_fpregs_state(void)
+{
+	__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
+}
+
+static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
+{
+	fpu->last_cpu = -1;
+}
+
+static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
+{
+	return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
+}
+
+static inline void fpregs_deactivate(struct fpu *fpu)
+{
+	__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
+	trace_x86_fpu_regs_deactivated(fpu);
+}
+
+static inline void fpregs_activate(struct fpu *fpu)
+{
+	__this_cpu_write(fpu_fpregs_owner_ctx, fpu);
+	trace_x86_fpu_regs_activated(fpu);
+}
+
+/* Internal helper for switch_fpu_return() and signal frame setup */
+static inline void fpregs_restore_userregs(void)
+{
+	struct fpu *fpu = &current->thread.fpu;
+	int cpu = smp_processor_id();
+
+	if (WARN_ON_ONCE(current->flags & PF_KTHREAD))
+		return;
+
+	if (!fpregs_state_valid(fpu, cpu)) {
+		u64 mask;
+
+		/*
+		 * This restores _all_ xstate which has not been
+		 * established yet.
+		 *
+		 * If PKRU is enabled, then the PKRU value is already
+		 * correct because it was either set in switch_to() or in
+		 * flush_thread(). So it is excluded because it might be
+		 * not up to date in current->thread.fpu.xsave state.
+		 */
+		mask = xfeatures_mask_restore_user() |
+			xfeatures_mask_supervisor();
+		restore_fpregs_from_fpstate(&fpu->state, mask);
+
+		fpregs_activate(fpu);
+		fpu->last_cpu = cpu;
+	}
+	clear_thread_flag(TIF_NEED_FPU_LOAD);
+}
+
+#endif
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -17,6 +17,7 @@
 #include <linux/hardirq.h>
 #include <linux/pkeys.h>
 
+#include "context.h"
 #include "internal.h"
 #include "legacy.h"
 #include "xstate.h"
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -10,6 +10,7 @@
 #include <asm/fpu/regset.h>
 #include <asm/fpu/xstate.h>
 
+#include "context.h"
 #include "internal.h"
 
 /*
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -16,6 +16,7 @@
 #include <asm/trapnr.h>
 #include <asm/trace/fpu.h>
 
+#include "context.h"
 #include "internal.h"
 #include "legacy.h"
 #include "xstate.h"



* [patch 25/31] x86/fpu: Move mxcsr related code to core
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (23 preceding siblings ...)
  2021-10-12  0:00 ` [patch 24/31] x86/fpu: Move fpregs_restore_userregs() to core Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 26/31] x86/fpu: Move fpstate functions to api.h Thomas Gleixner
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

No need to expose that to code which only needs the XCR0 accessors.
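
What remains in xcr.h are the plain XCR0 accessors; a minimal usage sketch:

	/* Read the set of XSAVE features enabled in XCR0 */
	u64 xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);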

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/xcr.h |   11 -----------
 arch/x86/kernel/fpu/init.c     |    1 +
 arch/x86/kernel/fpu/legacy.h   |    7 +++++++
 arch/x86/kernel/fpu/regset.c   |    1 +
 arch/x86/kernel/fpu/xstate.c   |    3 ++-
 arch/x86/kvm/svm/sev.c         |    2 +-
 6 files changed, 12 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/fpu/xcr.h
+++ b/arch/x86/include/asm/fpu/xcr.h
@@ -2,17 +2,6 @@
 #ifndef _ASM_X86_FPU_XCR_H
 #define _ASM_X86_FPU_XCR_H
 
-/*
- * MXCSR and XCR definitions:
- */
-
-static inline void ldmxcsr(u32 mxcsr)
-{
-	asm volatile("ldmxcsr %0" :: "m" (mxcsr));
-}
-
-extern unsigned int mxcsr_feature_mask;
-
 #define XCR_XFEATURE_ENABLED_MASK	0x00000000
 
 static inline u64 xgetbv(u32 index)
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -11,6 +11,7 @@
 #include <linux/init.h>
 
 #include "internal.h"
+#include "legacy.h"
 
 /*
  * Initialize the registers found in all CPUs, CR0 and CR4:
--- a/arch/x86/kernel/fpu/legacy.h
+++ b/arch/x86/kernel/fpu/legacy.h
@@ -4,6 +4,13 @@
 
 #include <asm/fpu/types.h>
 
+extern unsigned int mxcsr_feature_mask;
+
+static inline void ldmxcsr(u32 mxcsr)
+{
+	asm volatile("ldmxcsr %0" :: "m" (mxcsr));
+}
+
 /*
  * Returns 0 on success or the trap number when the operation raises an
  * exception.
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -12,6 +12,7 @@
 
 #include "context.h"
 #include "internal.h"
+#include "legacy.h"
 
 /*
  * The xstateregs_active() routine is the same as the regset_fpregs_active() routine,
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -14,8 +14,9 @@
 
 #include <asm/fpu/api.h>
 #include <asm/fpu/internal.h>
-#include <asm/fpu/signal.h>
 #include <asm/fpu/regset.h>
+#include <asm/fpu/signal.h>
+#include <asm/fpu/xcr.h>
 
 #include <asm/tlbflush.h>
 
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -17,10 +17,10 @@
 #include <linux/misc_cgroup.h>
 #include <linux/processor.h>
 #include <linux/trace_events.h>
-#include <asm/fpu/internal.h>
 
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
+#include <asm/fpu/xcr.h>
 
 #include "x86.h"
 #include "svm.h"



* [patch 26/31] x86/fpu: Move fpstate functions to api.h
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (24 preceding siblings ...)
  2021-10-12  0:00 ` [patch 25/31] x86/fpu: Move mxcsr related code " Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12 17:46   ` Borislav Petkov
  2021-10-12  0:00 ` [patch 27/31] x86/fpu: Remove internal.h dependency from fpu/signal.h Thomas Gleixner
                   ` (5 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Move function declarations which need to be globally available to api.h,
where they belong.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/api.h      |    9 +++++++++
 arch/x86/include/asm/fpu/internal.h |    9 ---------
 arch/x86/kernel/fpu/internal.h      |    3 +++
 arch/x86/math-emu/fpu_entry.c       |    2 +-
 4 files changed, 13 insertions(+), 10 deletions(-)

--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -110,6 +110,15 @@ extern int cpu_has_xfeatures(u64 xfeatur
 
 static inline void update_pasid(void) { }
 
+#ifdef CONFIG_MATH_EMULATION
+extern void fpstate_init_soft(struct swregs_state *soft);
+#else
+static inline void fpstate_init_soft(struct swregs_state *soft) {}
+#endif
+
+/* FPSTATE */
+extern union fpregs_state init_fpstate;
+
 /* FPSTATE related functions which are exported to KVM */
 extern void fpu_init_fpstate_user(struct fpu *fpu);
 
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -42,15 +42,6 @@ extern void fpu__init_system(struct cpui
 extern void fpu__init_check_bugs(void);
 extern void fpu__resume_cpu(void);
 
-extern union fpregs_state init_fpstate;
-extern void fpstate_init_user(union fpregs_state *state);
-
-#ifdef CONFIG_MATH_EMULATION
-extern void fpstate_init_soft(struct swregs_state *soft);
-#else
-static inline void fpstate_init_soft(struct swregs_state *soft) {}
-#endif
-
 extern void restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask);
 
 extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
--- a/arch/x86/kernel/fpu/internal.h
+++ b/arch/x86/kernel/fpu/internal.h
@@ -22,4 +22,7 @@ static __always_inline __pure bool use_f
 /* Init functions */
 extern void fpu__init_prepare_fx_sw_frame(void);
 
+/* Used in init.c */
+extern void fpstate_init_user(union fpregs_state *state);
+
 #endif
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -31,7 +31,7 @@
 #include <linux/uaccess.h>
 #include <asm/traps.h>
 #include <asm/user.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 
 #include "fpu_system.h"
 #include "fpu_emu.h"


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [patch 27/31] x86/fpu: Remove internal.h dependency from fpu/signal.h
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (25 preceding siblings ...)
  2021-10-12  0:00 ` [patch 26/31] x86/fpu: Move fpstate functions to api.h Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 28/31] x86/sev: Include fpu/xcr.h Thomas Gleixner
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

In order to remove internal.h, make signal.h independent of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/ia32/ia32_signal.c         |    1 -
 arch/x86/include/asm/fpu/api.h      |    3 +++
 arch/x86/include/asm/fpu/internal.h |    7 -------
 arch/x86/include/asm/fpu/signal.h   |   13 +++++++++++++
 arch/x86/kernel/fpu/signal.c        |    1 -
 arch/x86/kernel/ptrace.c            |    1 -
 arch/x86/kernel/signal.c            |    1 -
 arch/x86/mm/extable.c               |    3 ++-
 8 files changed, 18 insertions(+), 12 deletions(-)

--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -24,7 +24,6 @@
 #include <linux/syscalls.h>
 #include <asm/ucontext.h>
 #include <linux/uaccess.h>
-#include <asm/fpu/internal.h>
 #include <asm/fpu/signal.h>
 #include <asm/ptrace.h>
 #include <asm/ia32_unistd.h>
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -116,6 +116,9 @@ extern void fpstate_init_soft(struct swr
 static inline void fpstate_init_soft(struct swregs_state *soft) {}
 #endif
 
+/* State tracking */
+DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
+
 /* FPSTATE */
 extern union fpregs_state init_fpstate;
 
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -26,7 +26,6 @@
 /*
  * High level FPU state handling functions:
  */
-extern bool fpu__restore_sig(void __user *buf, int ia32_frame);
 extern void fpu__clear_user_states(struct fpu *fpu);
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 
@@ -42,10 +41,4 @@ extern void fpu__init_system(struct cpui
 extern void fpu__init_check_bugs(void);
 extern void fpu__resume_cpu(void);
 
-extern void restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask);
-
-extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
-
-DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
-
 #endif /* _ASM_X86_FPU_INTERNAL_H */
--- a/arch/x86/include/asm/fpu/signal.h
+++ b/arch/x86/include/asm/fpu/signal.h
@@ -5,6 +5,11 @@
 #ifndef _ASM_X86_FPU_SIGNAL_H
 #define _ASM_X86_FPU_SIGNAL_H
 
+#include <linux/compat.h>
+#include <linux/user.h>
+
+#include <asm/fpu/types.h>
+
 #ifdef CONFIG_X86_64
 # include <uapi/asm/sigcontext.h>
 # include <asm/user32.h>
@@ -31,4 +36,12 @@ fpu__alloc_mathframe(unsigned long sp, i
 
 unsigned long fpu__get_fpstate_size(void);
 
+extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
+extern void fpu__clear_user_states(struct fpu *fpu);
+extern bool fpu__restore_sig(void __user *buf, int ia32_frame);
+
+extern void restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask);
+
+extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
+
 #endif /* _ASM_X86_FPU_SIGNAL_H */
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -7,7 +7,6 @@
 #include <linux/cpu.h>
 #include <linux/pagemap.h>
 
-#include <asm/fpu/internal.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/regset.h>
 #include <asm/fpu/xstate.h>
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -29,7 +29,6 @@
 
 #include <linux/uaccess.h>
 #include <asm/processor.h>
-#include <asm/fpu/internal.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/regset.h>
 #include <asm/debugreg.h>
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -30,7 +30,6 @@
 
 #include <asm/processor.h>
 #include <asm/ucontext.h>
-#include <asm/fpu/internal.h>
 #include <asm/fpu/signal.h>
 #include <asm/vdso.h>
 #include <asm/mce.h>
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -4,7 +4,8 @@
 #include <linux/sched/debug.h>
 #include <xen/xen.h>
 
-#include <asm/fpu/internal.h>
+#include <asm/fpu/signal.h>
+#include <asm/fpu/xstate.h>
 #include <asm/sev.h>
 #include <asm/traps.h>
 #include <asm/kdebug.h>


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [patch 28/31] x86/sev: Include fpu/xcr.h
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (26 preceding siblings ...)
  2021-10-12  0:00 ` [patch 27/31] x86/fpu: Remove internal.h dependency from fpu/signal.h Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  7:24   ` Xiaoyao Li
  2021-10-12  0:00 ` [patch 29/31] x86/fpu: Mop up the internal.h leftovers Thomas Gleixner
                   ` (3 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Include the header which only provides the XRC accessors. That's all what
is needed here.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/sev.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -23,7 +23,7 @@
 #include <asm/stacktrace.h>
 #include <asm/sev.h>
 #include <asm/insn-eval.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/xcr.h>
 #include <asm/processor.h>
 #include <asm/realmode.h>
 #include <asm/traps.h>
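
For reference, the XCR accessors in question are the xgetbv()/xsetbv() style
wrappers quoted from xcr.h earlier in this series. A minimal sketch of the
kind of use sev.c has for them (the exact call site is an assumption, e.g.
reading XCR0 while emulating CPUID leaf 0xD in the #VC handler):

        /* Read XCR0, the enabled-xfeatures mask, for CPUID emulation */
        u64 xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);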


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [patch 29/31] x86/fpu: Mop up the internal.h leftovers
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (27 preceding siblings ...)
  2021-10-12  0:00 ` [patch 28/31] x86/sev: Include fpu/xcr.h Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 30/31] x86/fpu: Replace the includes of fpu/internal.h Thomas Gleixner
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Move the global interfaces to api.h and the rest into the core.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/api.h      |   10 ++++++++++
 arch/x86/include/asm/fpu/internal.h |   18 ------------------
 arch/x86/kernel/fpu/init.c          |    1 +
 arch/x86/kernel/fpu/xstate.h        |    3 +++
 4 files changed, 14 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -110,6 +110,16 @@ extern int cpu_has_xfeatures(u64 xfeatur
 
 static inline void update_pasid(void) { }
 
+/* Trap handling */
+extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
+extern void fpu_sync_fpstate(struct fpu *fpu);
+
+/* Boot, hotplug and resume */
+extern void fpu__init_cpu(void);
+extern void fpu__init_system(struct cpuinfo_x86 *c);
+extern void fpu__init_check_bugs(void);
+extern void fpu__resume_cpu(void);
+
 #ifdef CONFIG_MATH_EMULATION
 extern void fpstate_init_soft(struct swregs_state *soft);
 #else
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -23,22 +23,4 @@
 #include <asm/cpufeature.h>
 #include <asm/trace/fpu.h>
 
-/*
- * High level FPU state handling functions:
- */
-extern void fpu__clear_user_states(struct fpu *fpu);
-extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
-
-extern void fpu_sync_fpstate(struct fpu *fpu);
-
-/*
- * Boot time FPU initialization functions:
- */
-extern void fpu__init_cpu(void);
-extern void fpu__init_system_xstate(void);
-extern void fpu__init_cpu_xstate(void);
-extern void fpu__init_system(struct cpuinfo_x86 *c);
-extern void fpu__init_check_bugs(void);
-extern void fpu__resume_cpu(void);
-
 #endif /* _ASM_X86_FPU_INTERNAL_H */
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -12,6 +12,7 @@
 
 #include "internal.h"
 #include "legacy.h"
+#include "xstate.h"
 
 /*
  * Initialize the registers found in all CPUs, CR0 and CR4:
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -18,6 +18,9 @@ static inline void xstate_init_xcomp_bv(
 extern void __copy_xstate_to_uabi_buf(struct membuf to, struct xregs_state *xsave,
 				      u32 pkru_val, enum xstate_copy_mode copy_mode);
 
+extern void fpu__init_cpu_xstate(void);
+extern void fpu__init_system_xstate(void);
+
 /* XSAVE/XRSTOR wrapper functions */
 
 #ifdef CONFIG_X86_64


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [patch 30/31] x86/fpu: Replace the includes of fpu/internal.h
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (28 preceding siblings ...)
  2021-10-12  0:00 ` [patch 29/31] x86/fpu: Mop up the internal.h leftovers Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12  0:00 ` [patch 31/31] x86/fpu: Provide a proper function for ex_handler_fprestore() Thomas Gleixner
  2021-10-12 21:15 ` [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

Now that the file is empty, fix up all references with the proper includes
and delete the former kitchen sink.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h |   26 --------------------------
 arch/x86/kernel/cpu/bugs.c          |    2 +-
 arch/x86/kernel/cpu/common.c        |    2 +-
 arch/x86/kernel/fpu/bugs.c          |    2 +-
 arch/x86/kernel/fpu/core.c          |    2 +-
 arch/x86/kernel/fpu/init.c          |    2 +-
 arch/x86/kernel/fpu/regset.c        |    2 +-
 arch/x86/kernel/fpu/xstate.c        |    1 -
 arch/x86/kernel/smpboot.c           |    2 +-
 arch/x86/kernel/traps.c             |    2 +-
 arch/x86/kvm/vmx/vmx.c              |    2 +-
 arch/x86/power/cpu.c                |    2 +-
 12 files changed, 10 insertions(+), 37 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -1,26 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Copyright (C) 1994 Linus Torvalds
- *
- * Pentium III FXSR, SSE support
- * General FPU state handling cleanups
- *	Gareth Hughes <gareth@valinux.com>, May 2000
- * x86-64 work by Andi Kleen 2002
- */
-
-#ifndef _ASM_X86_FPU_INTERNAL_H
-#define _ASM_X86_FPU_INTERNAL_H
-
-#include <linux/compat.h>
-#include <linux/sched.h>
-#include <linux/slab.h>
-#include <linux/mm.h>
-
-#include <asm/user.h>
-#include <asm/fpu/api.h>
-#include <asm/fpu/xstate.h>
-#include <asm/fpu/xcr.h>
-#include <asm/cpufeature.h>
-#include <asm/trace/fpu.h>
-
-#endif /* _ASM_X86_FPU_INTERNAL_H */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -22,7 +22,7 @@
 #include <asm/bugs.h>
 #include <asm/processor.h>
 #include <asm/processor-flags.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/msr.h>
 #include <asm/vmx.h>
 #include <asm/paravirt.h>
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -42,7 +42,7 @@
 #include <asm/setup.h>
 #include <asm/apic.h>
 #include <asm/desc.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/mtrr.h>
 #include <asm/hwcap2.h>
 #include <linux/numa.h>
--- a/arch/x86/kernel/fpu/bugs.c
+++ b/arch/x86/kernel/fpu/bugs.c
@@ -2,7 +2,7 @@
 /*
  * x86 FPU bug checks:
  */
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 
 /*
  * Boot time CPU/FPU FDIV bug detection code:
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -6,7 +6,7 @@
  *  General FPU state handling cleanups
  *	Gareth Hughes <gareth@valinux.com>, May 2000
  */
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/fpu/regset.h>
 #include <asm/fpu/sched.h>
 #include <asm/fpu/signal.h>
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -2,7 +2,7 @@
 /*
  * x86 FPU boot time init code:
  */
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/tlbflush.h>
 #include <asm/setup.h>
 
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -5,7 +5,7 @@
 #include <linux/sched/task_stack.h>
 #include <linux/vmalloc.h>
 
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/regset.h>
 #include <asm/fpu/xstate.h>
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -13,7 +13,6 @@
 #include <linux/proc_fs.h>
 
 #include <asm/fpu/api.h>
-#include <asm/fpu/internal.h>
 #include <asm/fpu/regset.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/xcr.h>
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -70,7 +70,7 @@
 #include <asm/mwait.h>
 #include <asm/apic.h>
 #include <asm/io_apic.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/setup.h>
 #include <asm/uv/uv.h>
 #include <linux/mc146818rtc.h>
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -48,7 +48,7 @@
 #include <asm/ftrace.h>
 #include <asm/traps.h>
 #include <asm/desc.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -35,7 +35,7 @@
 #include <asm/cpu_device_id.h>
 #include <asm/debugreg.h>
 #include <asm/desc.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/idtentry.h>
 #include <asm/io.h>
 #include <asm/irq_remapping.h>
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -20,7 +20,7 @@
 #include <asm/page.h>
 #include <asm/mce.h>
 #include <asm/suspend.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/debugreg.h>
 #include <asm/cpu.h>
 #include <asm/mmu_context.h>


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [patch 31/31] x86/fpu: Provide a proper function for ex_handler_fprestore()
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (29 preceding siblings ...)
  2021-10-12  0:00 ` [patch 30/31] x86/fpu: Replace the includes of fpu/internal.h Thomas Gleixner
@ 2021-10-12  0:00 ` Thomas Gleixner
  2021-10-12 21:15 ` [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12  0:00 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

To make upcoming changes for support of dynamically enabled features
simpler, provide a proper function for the exception handler which removes
exposure of FPU internals.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/fpu/api.h |    4 +---
 arch/x86/kernel/fpu/core.c     |    5 +++++
 arch/x86/kernel/fpu/internal.h |    2 ++
 arch/x86/mm/extable.c          |    5 ++---
 4 files changed, 10 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -113,6 +113,7 @@ static inline void update_pasid(void) {
 /* Trap handling */
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 extern void fpu_sync_fpstate(struct fpu *fpu);
+extern void fpu_reset_from_exception_fixup(void);
 
 /* Boot, hotplug and resume */
 extern void fpu__init_cpu(void);
@@ -129,9 +130,6 @@ static inline void fpstate_init_soft(str
 /* State tracking */
 DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
-/* FPSTATE */
-extern union fpregs_state init_fpstate;
-
 /* FPSTATE related functions which are exported to KVM */
 extern void fpu_init_fpstate_user(struct fpu *fpu);
 
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -155,6 +155,11 @@ void restore_fpregs_from_fpstate(union f
 	}
 }
 
+void fpu_reset_from_exception_fixup(void)
+{
+	restore_fpregs_from_fpstate(&init_fpstate, xfeatures_mask_fpstate());
+}
+
 #if IS_ENABLED(CONFIG_KVM)
 void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask)
 {
--- a/arch/x86/kernel/fpu/internal.h
+++ b/arch/x86/kernel/fpu/internal.h
@@ -2,6 +2,8 @@
 #ifndef __X86_KERNEL_FPU_INTERNAL_H
 #define __X86_KERNEL_FPU_INTERNAL_H
 
+extern union fpregs_state init_fpstate;
+
 /* CPU feature check wrappers */
 static __always_inline __pure bool use_xsave(void)
 {
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -4,8 +4,7 @@
 #include <linux/sched/debug.h>
 #include <xen/xen.h>
 
-#include <asm/fpu/signal.h>
-#include <asm/fpu/xstate.h>
+#include <asm/fpu/api.h>
 #include <asm/sev.h>
 #include <asm/traps.h>
 #include <asm/kdebug.h>
@@ -48,7 +47,7 @@ static bool ex_handler_fprestore(const s
 	WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
 		  (void *)instruction_pointer(regs));
 
-	restore_fpregs_from_fpstate(&init_fpstate, xfeatures_mask_fpstate());
+	fpu_reset_from_exception_fixup();
 	return true;
 }
 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 28/31] x86/sev: Include fpu/xcr.h
  2021-10-12  0:00 ` [patch 28/31] x86/sev: Include fpu/xcr.h Thomas Gleixner
@ 2021-10-12  7:24   ` Xiaoyao Li
  0 siblings, 0 replies; 96+ messages in thread
From: Xiaoyao Li @ 2021-10-12  7:24 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

On 10/12/2021 8:00 AM, Thomas Gleixner wrote:
> Include the header which only provides the XRC accessors. That's all what
                                               ^
                                             typo, should be XCR

> is needed here.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>   arch/x86/kernel/sev.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -23,7 +23,7 @@
>   #include <asm/stacktrace.h>
>   #include <asm/sev.h>
>   #include <asm/insn-eval.h>
> -#include <asm/fpu/internal.h>
> +#include <asm/fpu/xcr.h>
>   #include <asm/processor.h>
>   #include <asm/realmode.h>
>   #include <asm/traps.h>
> 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 04/31] x86/fpu: Restrict xsaves()/xrstors() to independent states
  2021-10-12  0:00 ` [patch 04/31] x86/fpu: Restrict xsaves()/xrstors() to independent states Thomas Gleixner
@ 2021-10-12 14:24   ` Borislav Petkov
  0 siblings, 0 replies; 96+ messages in thread
From: Borislav Petkov @ 2021-10-12 14:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12, 2021 at 02:00:04AM +0200, Thomas Gleixner wrote:
> These interfaces are really only valid for features which are independently
> managed and not part of the task context state for various reasons.
> 
> Tighten the checks and adjust the misleading comments.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/kernel/fpu/xstate.c |   14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
> 
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1182,13 +1182,9 @@ static bool validate_xsaves_xrstors(u64

I guess then change the name too, to:

validate_indep_xstate_components()

or so?

Then you don't need the comment below.

>  	if (WARN_ON_FPU(!cpu_feature_enabled(X86_FEATURE_XSAVES)))
>  		return false;
>  	/*
> -	 * Validate that this is either a task->fpstate related component
> -	 * subset or an independent one.
> +	 * Validate that this is a independent compoment.

WARNING: 'compoment' may be misspelled - perhaps 'component'?
#78: FILE: arch/x86/kernel/fpu/xstate.c:1185:
+        * Validate that this is a independent compoment.
                                               ^^^^^^^^^
>  	 */
> -	if (mask & xfeatures_mask_independent())
> -		xchk = ~xfeatures_mask_independent();
> -	else
> -		xchk = ~xfeatures_mask_all;
> +	xchk = ~xfeatures_mask_independent();
>  
>  	if (WARN_ON_ONCE(!mask || mask & xchk))
>  		return false;
> @@ -1206,8 +1202,7 @@ static bool validate_xsaves_xrstors(u64
>   * buffer should be zeroed otherwise a consecutive XRSTORS from that buffer
>   * can #GP.
>   *
> - * The feature mask must either be a subset of the independent features or
> - * a subset of the task->fpstate related features.
> + * The feature mask must be a subset of the independent features

End with a fullstop.

>   */
>  void xsaves(struct xregs_state *xstate, u64 mask)
>  {
> @@ -1231,8 +1226,7 @@ void xsaves(struct xregs_state *xstate,
>   * Proper usage is to restore the state which was saved with
>   * xsaves() into @xstate.
>   *
> - * The feature mask must either be a subset of the independent features or
> - * a subset of the task->fpstate related features.
> + * The feature mask must be a subset of the independent features

Ditto.

>   */
>  void xrstors(struct xregs_state *xstate, u64 mask)
>  {
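
To make the tightened check concrete, a worked sketch (assuming arch LBR is
the only independent feature, so xfeatures_mask_independent() returns
XFEATURE_MASK_LBR; validate() stands in for validate_xsaves_xrstors()):

        u64 xchk = ~xfeatures_mask_independent(); /* all bits except LBR */

        validate(XFEATURE_MASK_LBR); /* ok: mask != 0 && !(mask & xchk) */
        validate(XFEATURE_MASK_SSE); /* rejected: SSE is not independent */
        validate(0);                 /* rejected: empty mask */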

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 09/31] x86/fpu: Do not inherit FPU context for CLONE_THREAD
  2021-10-12  0:00 ` [patch 09/31] x86/fpu: Do not inherit FPU context for CLONE_THREAD Thomas Gleixner
@ 2021-10-12 16:10   ` Borislav Petkov
  2021-10-12 18:52     ` Thomas Gleixner
  0 siblings, 1 reply; 96+ messages in thread
From: Borislav Petkov @ 2021-10-12 16:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12, 2021 at 02:00:11AM +0200, Thomas Gleixner wrote:
> CLONE_THREAD does not have the guarantee of a true fork to inherit all
> state. Especially the FPU state is meaningless for CLONE_THREAD.
> 
> Just wipe out the minimal required state so restore on return to user space
> let's the thread start with a clean FPU.

This sentence reads weird, needs massaging.
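
For context, the change being described could look roughly like this in the
clone path (a sketch derived from the quoted changelog alone; dst_fpu and the
exact helpers are placeholders, and the actual patch body is not shown in
this message):

        /* Sketch: threads do not inherit the parent's FPU contents */
        if (clone_flags & CLONE_THREAD) {
                fpstate_init(&dst_fpu->state);  /* start from clean init state */
                return 0;
        }
        /* A real fork() still copies the full FPU state as before */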

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 11/31] x86/fpu/xstate: Provide and use for_each_xfeature()
  2021-10-12  0:00 ` [patch 11/31] x86/fpu/xstate: Provide and use for_each_xfeature() Thomas Gleixner
@ 2021-10-12 16:45   ` Borislav Petkov
  0 siblings, 0 replies; 96+ messages in thread
From: Borislav Petkov @ 2021-10-12 16:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12, 2021 at 02:00:14AM +0200, Thomas Gleixner wrote:
> These loops evaluating xfeature bits are really hard to read. Create an
> iterator and use for_each_set_bit_from() inside, which already does the right
> thing.

<--- No functional changes.

> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/kernel/fpu/xstate.c |   56 +++++++++++++++++--------------------------
>  1 file changed, 23 insertions(+), 33 deletions(-)
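
Since only the diffstat is quoted above, a plausible shape for such an
iterator, built on for_each_set_bit_from() as the changelog says, would be
the following (the macro name and the FIRST_EXTENDED_XFEATURE start bound
are assumptions, not quoted from the patch):

        #define for_each_extended_xfeature(bit, mask)                   \
                (bit) = FIRST_EXTENDED_XFEATURE;                        \
                for_each_set_bit_from(bit, (unsigned long *)&(mask),    \
                                      8 * sizeof(mask))

        /* Usage sketch: walk every enabled extended xfeature */
        for_each_extended_xfeature(i, xfeatures_mask_all)
                pr_info("xfeature %d enabled\n", i);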

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-12  0:00 ` [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core Thomas Gleixner
@ 2021-10-12 16:53   ` Borislav Petkov
  2021-10-12 18:25     ` Thomas Gleixner
  2021-10-12 17:22   ` Paolo Bonzini
  1 sibling, 1 reply; 96+ messages in thread
From: Borislav Petkov @ 2021-10-12 16:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

Just typos:

On Tue, Oct 12, 2021 at 02:00:17AM +0200, Thomas Gleixner wrote:
> Swapping the host/guest FPU is directly fiddling with FPU internals which
> requires 5 exports. The upcoming support of dymanically enabled states

"dynamically"

>  /*
>   * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
>   * disables preemption so be careful if you intend to use it for long periods
> @@ -108,4 +110,10 @@ extern int cpu_has_xfeatures(u64 xfeatur
>  
>  static inline void update_pasid(void) { }
>  
> +/* FPSTATE related functions which are exported to KVM */

fpstate-related

> +extern void fpu_init_fpstate_user(struct fpu *fpu);
> +
> +/* KVM specific functions */
> +extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
> +
>  #endif /* _ASM_X86_FPU_API_H */

...

>  /* Swap (qemu) user FPU context for the guest FPU context. */
>  static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -	fpregs_lock();
> -
> -	kvm_save_current_fpu(vcpu->arch.user_fpu);
> -
>  	/*
> -	 * Guests with protected state can't have it set by the hypervisor,
> -	 * so skip trying to set it.
> +	 * Guest with protected state have guest_fpu == NULL which makes

"Guests ... "

> +	 * the swap only safe the host state. Exclude PKRU from restore as

"save"

> +	 * it is restored separately in kvm_x86_ops.run().
>  	 */
> -	if (vcpu->arch.guest_fpu)
> -		/* PKRU is separately restored in kvm_x86_ops.run. */
> -		__restore_fpregs_from_fpstate(&vcpu->arch.guest_fpu->state,
> -					~XFEATURE_MASK_PKRU);
> -
> -	fpregs_mark_activate();
> -	fpregs_unlock();
> -
> +	fpu_swap_kvm_fpu(vcpu->arch.user_fpu, vcpu->arch.guest_fpu,
> +			 ~XFEATURE_MASK_PKRU);
>  	trace_kvm_fpu(1);
>  }
>  
>  /* When vcpu_run ends, restore user space FPU context. */
>  static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -	fpregs_lock();
> -
>  	/*
> -	 * Guests with protected state can't have it read by the hypervisor,
> -	 * so skip trying to save it.
> +	 * Guest with protected state have guest_fpu == NULL which makes

"Guests ... "

> +	 * swap only restore the host state.
>  	 */
> -	if (vcpu->arch.guest_fpu)
> -		kvm_save_current_fpu(vcpu->arch.guest_fpu);
> -
> -	restore_fpregs_from_fpstate(&vcpu->arch.user_fpu->state);
> -
> -	fpregs_mark_activate();
> -	fpregs_unlock();
> -
> +	fpu_swap_kvm_fpu(vcpu->arch.guest_fpu, vcpu->arch.user_fpu, ~0ULL);
>  	++vcpu->stat.fpu_reload;
>  	trace_kvm_fpu(0);
>  }
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -47,7 +47,7 @@ static bool ex_handler_fprestore(const s
>  	WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
>  		  (void *)instruction_pointer(regs));
>  
> -	__restore_fpregs_from_fpstate(&init_fpstate, xfeatures_mask_fpstate());
> +	restore_fpregs_from_fpstate(&init_fpstate, xfeatures_mask_fpstate());
>  	return true;
>  }
>  
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 14/31] x86/fpu: Replace KVMs homebrewn FPU copy from user
  2021-10-12  0:00 ` [patch 14/31] x86/fpu: Replace KVMs homebrewn FPU copy from user Thomas Gleixner
@ 2021-10-12 17:00   ` Borislav Petkov
  2021-10-13 14:57     ` Sean Christopherson
  2021-10-12 17:30   ` Paolo Bonzini
  1 sibling, 1 reply; 96+ messages in thread
From: Borislav Petkov @ 2021-10-12 17:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12, 2021 at 02:00:19AM +0200, Thomas Gleixner wrote:
> Copying a user space buffer to the memory buffer is already available in
> the FPU core. The copy mechanism in KVM lacks sanity checks and needs to
> use cpuid() to lookup the offset of each component, while the FPU core has
> this information cached.
> 
> Make the FPU core variant accessible for KVM and replace the homebrewn
> mechanism.

I think you mean "homebred" in that patch... or "home brewed", that
works too, I think.

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: kvm@vger.kernel.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/fpu/api.h |    3 +
>  arch/x86/kernel/fpu/core.c     |   38 ++++++++++++++++++++-
>  arch/x86/kernel/fpu/xstate.c   |    3 -
>  arch/x86/kvm/x86.c             |   74 +----------------------------------------
>  4 files changed, 44 insertions(+), 74 deletions(-)
> 
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -116,4 +116,7 @@ extern void fpu_init_fpstate_user(struct
>  /* KVM specific functions */
>  extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
>  
> +struct kvm_vcpu;
> +extern int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0, u32 *pkru);
> +
>  #endif /* _ASM_X86_FPU_API_H */
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -174,7 +174,43 @@ void fpu_swap_kvm_fpu(struct fpu *save,
>  	fpregs_unlock();
>  }
>  EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
> -#endif
> +
> +int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0,
> +			      u32 *vpkru)

Right, except that there's no @vcpu in the args of that function. I
guess you could call it

fpu_copy_kvm_uabi_to_buf()

and that @buf can be

vcpu->arch.guest_fpu

...

Just a nitpick anyway.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 16/31] x86/fpu: Replace KVMs homebrewn FPU copy to user
  2021-10-12  0:00 ` [patch 16/31] x86/fpu: Replace KVMs homebrewn FPU copy to user Thomas Gleixner
@ 2021-10-12 17:10   ` Borislav Petkov
  2021-10-12 17:36   ` Paolo Bonzini
  1 sibling, 0 replies; 96+ messages in thread
From: Borislav Petkov @ 2021-10-12 17:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12, 2021 at 02:00:22AM +0200, Thomas Gleixner wrote:
> Similar to the copy from user function the FPU core has this already
> implemented with all bells and whistels.

"whistles"

And also, same nitpicks as here:

https://lore.kernel.org/r/YWW/PEQyQAwS9/qv@zn.tnic

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-12  0:00 ` [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core Thomas Gleixner
  2021-10-12 16:53   ` Borislav Petkov
@ 2021-10-12 17:22   ` Paolo Bonzini
  2021-10-13  6:15     ` Liu, Jing2
  1 sibling, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-12 17:22 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm

On 12/10/21 02:00, Thomas Gleixner wrote:
> Swapping the host/guest FPU is directly fiddling with FPU internals which
> requires 5 exports. The upcoming support of dymanically enabled states
> would even need more.
> 
> Implement a swap function in the FPU core code and export that instead.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: kvm@vger.kernel.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/include/asm/fpu/api.h      |    8 +++++
>   arch/x86/include/asm/fpu/internal.h |   15 +---------
>   arch/x86/kernel/fpu/core.c          |   30 ++++++++++++++++++---
>   arch/x86/kernel/fpu/init.c          |    1
>   arch/x86/kernel/fpu/xstate.c        |    1
>   arch/x86/kvm/x86.c                  |   51 +++++++-----------------------------
>   arch/x86/mm/extable.c               |    2 -
>   7 files changed, 48 insertions(+), 60 deletions(-)
> 
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -12,6 +12,8 @@
>   #define _ASM_X86_FPU_API_H
>   #include <linux/bottom_half.h>
>   
> +#include <asm/fpu/types.h>
> +
>   /*
>    * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
>    * disables preemption so be careful if you intend to use it for long periods
> @@ -108,4 +110,10 @@ extern int cpu_has_xfeatures(u64 xfeatur
>   
>   static inline void update_pasid(void) { }
>   
> +/* FPSTATE related functions which are exported to KVM */
> +extern void fpu_init_fpstate_user(struct fpu *fpu);
> +
> +/* KVM specific functions */
> +extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
> +
>   #endif /* _ASM_X86_FPU_API_H */
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -74,14 +74,8 @@ static __always_inline __pure bool use_f
>   	return static_cpu_has(X86_FEATURE_FXSR);
>   }
>   
> -/*
> - * fpstate handling functions:
> - */
> -
>   extern union fpregs_state init_fpstate;
> -
>   extern void fpstate_init_user(union fpregs_state *state);
> -extern void fpu_init_fpstate_user(struct fpu *fpu);
>   
>   #ifdef CONFIG_MATH_EMULATION
>   extern void fpstate_init_soft(struct swregs_state *soft);
> @@ -381,12 +375,7 @@ static inline int os_xrstor_safe(struct
>   	return err;
>   }
>   
> -extern void __restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask);
> -
> -static inline void restore_fpregs_from_fpstate(union fpregs_state *fpstate)
> -{
> -	__restore_fpregs_from_fpstate(fpstate, xfeatures_mask_fpstate());
> -}
> +extern void restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask);
>   
>   extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
>   
> @@ -467,7 +456,7 @@ static inline void fpregs_restore_userre
>   		 */
>   		mask = xfeatures_mask_restore_user() |
>   			xfeatures_mask_supervisor();
> -		__restore_fpregs_from_fpstate(&fpu->state, mask);
> +		restore_fpregs_from_fpstate(&fpu->state, mask);
>   
>   		fpregs_activate(fpu);
>   		fpu->last_cpu = cpu;
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -124,9 +124,8 @@ void save_fpregs_to_fpstate(struct fpu *
>   	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
>   	frstor(&fpu->state.fsave);
>   }
> -EXPORT_SYMBOL(save_fpregs_to_fpstate);
>   
> -void __restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask)
> +void restore_fpregs_from_fpstate(union fpregs_state *fpstate, u64 mask)
>   {
>   	/*
>   	 * AMD K7/K8 and later CPUs up to Zen don't save/restore
> @@ -151,7 +150,31 @@ void __restore_fpregs_from_fpstate(union
>   			frstor(&fpstate->fsave);
>   	}
>   }
> -EXPORT_SYMBOL_GPL(__restore_fpregs_from_fpstate);
> +
> +#if IS_ENABLED(CONFIG_KVM)
> +void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask)
> +{
> +	fpregs_lock();
> +
> +	if (save) {
> +		if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
> +			memcpy(&save->state, &current->thread.fpu.state,
> +			       fpu_kernel_xstate_size);
> +		} else {
> +			save_fpregs_to_fpstate(save);
> +		}
> +	}
> +
> +	if (rstor) {
> +		restore_mask &= xfeatures_mask_fpstate();
> +		restore_fpregs_from_fpstate(&rstor->state, restore_mask);
> +	}
> +
> +	fpregs_mark_activate();
> +	fpregs_unlock();
> +}
> +EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
> +#endif
>   
>   void kernel_fpu_begin_mask(unsigned int kfpu_mask)
>   {
> @@ -459,7 +482,6 @@ void fpregs_mark_activate(void)
>   	fpu->last_cpu = smp_processor_id();
>   	clear_thread_flag(TIF_NEED_FPU_LOAD);
>   }
> -EXPORT_SYMBOL_GPL(fpregs_mark_activate);
>   
>   /*
>    * x87 math exception handling:
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -136,7 +136,6 @@ static void __init fpu__init_system_gene
>    * components into a single, continuous memory block:
>    */
>   unsigned int fpu_kernel_xstate_size __ro_after_init;
> -EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size);
>   
>   /* Get alignment of the TYPE. */
>   #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -65,7 +65,6 @@ static short xsave_cpuid_features[] __in
>    * XSAVE buffer, both supervisor and user xstates.
>    */
>   u64 xfeatures_mask_all __ro_after_init;
> -EXPORT_SYMBOL_GPL(xfeatures_mask_all);
>   
>   static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
>   	{ [ 0 ... XFEATURE_MAX - 1] = -1};
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -68,7 +68,9 @@
>   #include <asm/mce.h>
>   #include <asm/pkru.h>
>   #include <linux/kernel_stat.h>
> -#include <asm/fpu/internal.h> /* Ugh! */
> +#include <asm/fpu/api.h>
> +#include <asm/fpu/xcr.h>
> +#include <asm/fpu/xstate.h>
>   #include <asm/pvclock.h>
>   #include <asm/div64.h>
>   #include <asm/irq_remapping.h>
> @@ -9899,58 +9901,27 @@ static int complete_emulated_mmio(struct
>   	return 0;
>   }
>   
> -static void kvm_save_current_fpu(struct fpu *fpu)
> -{
> -	/*
> -	 * If the target FPU state is not resident in the CPU registers, just
> -	 * memcpy() from current, else save CPU state directly to the target.
> -	 */
> -	if (test_thread_flag(TIF_NEED_FPU_LOAD))
> -		memcpy(&fpu->state, &current->thread.fpu.state,
> -		       fpu_kernel_xstate_size);
> -	else
> -		save_fpregs_to_fpstate(fpu);
> -}
> -
>   /* Swap (qemu) user FPU context for the guest FPU context. */
>   static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>   {
> -	fpregs_lock();
> -
> -	kvm_save_current_fpu(vcpu->arch.user_fpu);
> -
>   	/*
> -	 * Guests with protected state can't have it set by the hypervisor,
> -	 * so skip trying to set it.
> +	 * Guest with protected state have guest_fpu == NULL which makes
> +	 * the swap only safe the host state. Exclude PKRU from restore as
> +	 * it is restored separately in kvm_x86_ops.run().
>   	 */
> -	if (vcpu->arch.guest_fpu)
> -		/* PKRU is separately restored in kvm_x86_ops.run. */
> -		__restore_fpregs_from_fpstate(&vcpu->arch.guest_fpu->state,
> -					~XFEATURE_MASK_PKRU);
> -
> -	fpregs_mark_activate();
> -	fpregs_unlock();
> -
> +	fpu_swap_kvm_fpu(vcpu->arch.user_fpu, vcpu->arch.guest_fpu,
> +			 ~XFEATURE_MASK_PKRU);
>   	trace_kvm_fpu(1);
>   }
>   
>   /* When vcpu_run ends, restore user space FPU context. */
>   static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>   {
> -	fpregs_lock();
> -
>   	/*
> -	 * Guests with protected state can't have it read by the hypervisor,
> -	 * so skip trying to save it.
> +	 * Guest with protected state have guest_fpu == NULL which makes
> +	 * swap only restore the host state.
>   	 */
> -	if (vcpu->arch.guest_fpu)
> -		kvm_save_current_fpu(vcpu->arch.guest_fpu);
> -
> -	restore_fpregs_from_fpstate(&vcpu->arch.user_fpu->state);
> -
> -	fpregs_mark_activate();
> -	fpregs_unlock();
> -
> +	fpu_swap_kvm_fpu(vcpu->arch.guest_fpu, vcpu->arch.user_fpu, ~0ULL);
>   	++vcpu->stat.fpu_reload;
>   	trace_kvm_fpu(0);
>   }
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -47,7 +47,7 @@ static bool ex_handler_fprestore(const s
>   	WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
>   		  (void *)instruction_pointer(regs));
>   
> -	__restore_fpregs_from_fpstate(&init_fpstate, xfeatures_mask_fpstate());
> +	restore_fpregs_from_fpstate(&init_fpstate, xfeatures_mask_fpstate());
>   	return true;
>   }
>   
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
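
A note on the swap semantics shown above: passing NULL for @save or @rstor
makes fpu_swap_kvm_fpu() degenerate into a pure restore or a pure save,
which is how the protected-guest (guest_fpu == NULL) case falls out. The
two call sites quoted from x86.c reduce to:

        /* vcpu entry: save user FPU, restore guest FPU (PKRU excluded) */
        fpu_swap_kvm_fpu(vcpu->arch.user_fpu, vcpu->arch.guest_fpu,
                         ~XFEATURE_MASK_PKRU);

        /* vcpu exit: save guest FPU, restore user FPU */
        fpu_swap_kvm_fpu(vcpu->arch.guest_fpu, vcpu->arch.user_fpu, ~0ULL);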


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 14/31] x86/fpu: Replace KVMs homebrewn FPU copy from user
  2021-10-12  0:00 ` [patch 14/31] x86/fpu: Replace KVMs homebrewn FPU copy from user Thomas Gleixner
  2021-10-12 17:00   ` Borislav Petkov
@ 2021-10-12 17:30   ` Paolo Bonzini
  1 sibling, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-12 17:30 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm

On 12/10/21 02:00, Thomas Gleixner wrote:
> Copying a user space buffer to the memory buffer is already available in
> the FPU core. The copy mechanism in KVM lacks sanity checks and needs to
> use cpuid() to lookup the offset of each component, while the FPU core has
> this information cached.
> 
> Make the FPU core variant accessible for KVM and replace the homebrewn
> mechanism.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: kvm@vger.kernel.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/include/asm/fpu/api.h |    3 +
>   arch/x86/kernel/fpu/core.c     |   38 ++++++++++++++++++++-
>   arch/x86/kernel/fpu/xstate.c   |    3 -
>   arch/x86/kvm/x86.c             |   74 +----------------------------------------
>   4 files changed, 44 insertions(+), 74 deletions(-)
> 
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -116,4 +116,7 @@ extern void fpu_init_fpstate_user(struct
>   /* KVM specific functions */
>   extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
>   
> +struct kvm_vcpu;
> +extern int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0, u32 *pkru);
> +
>   #endif /* _ASM_X86_FPU_API_H */
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -174,7 +174,43 @@ void fpu_swap_kvm_fpu(struct fpu *save,
>   	fpregs_unlock();
>   }
>   EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
> -#endif
> +
> +int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0,
> +			      u32 *vpkru)
> +{
> +	union fpregs_state *kstate = &fpu->state;
> +	const union fpregs_state *ustate = buf;
> +	struct pkru_state *xpkru;
> +	int ret;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) {
> +		if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE)
> +			return -EINVAL;
> +		if (ustate->fxsave.mxcsr & ~mxcsr_feature_mask)
> +			return -EINVAL;
> +		memcpy(&kstate->fxsave, &ustate->fxsave, sizeof(ustate->fxsave));
> +		return 0;
> +	}
> +
> +	if (ustate->xsave.header.xfeatures & ~xcr0)
> +		return -EINVAL;
> +
> +	ret = copy_uabi_from_kernel_to_xstate(&kstate->xsave, ustate);
> +	if (ret)
> +		return ret;
> +
> +	/* Retrieve PKRU if not in init state */
> +	if (kstate->xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
> +		xpkru = get_xsave_addr(&kstate->xsave, XFEATURE_PKRU);
> +		*vpkru = xpkru->pkru;
> +	}
> +
> +	/* Ensure that XCOMP_BV is set up for XSAVES */
> +	xstate_init_xcomp_bv(&kstate->xsave, xfeatures_mask_uabi());
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(fpu_copy_kvm_uabi_to_vcpu);
> +#endif /* CONFIG_KVM */
>   
>   void kernel_fpu_begin_mask(unsigned int kfpu_mask)
>   {
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1134,8 +1134,7 @@ static int copy_uabi_to_xstate(struct xr
>   
>   /*
>    * Convert from a ptrace standard-format kernel buffer to kernel XSAVE[S]
> - * format and copy to the target thread. This is called from
> - * xstateregs_set().
> + * format and copy to the target thread. Used by ptrace and KVM.
>    */
>   int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
>   {
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4695,8 +4695,6 @@ static int kvm_vcpu_ioctl_x86_set_debugr
>   	return 0;
>   }
>   
> -#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
> -
>   static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
>   {
>   	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
> @@ -4740,50 +4738,6 @@ static void fill_xsave(u8 *dest, struct
>   	}
>   }
>   
> -static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
> -{
> -	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
> -	u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
> -	u64 valid;
> -
> -	/*
> -	 * Copy legacy XSAVE area, to avoid complications with CPUID
> -	 * leaves 0 and 1 in the loop below.
> -	 */
> -	memcpy(xsave, src, XSAVE_HDR_OFFSET);
> -
> -	/* Set XSTATE_BV and possibly XCOMP_BV.  */
> -	xsave->header.xfeatures = xstate_bv;
> -	if (boot_cpu_has(X86_FEATURE_XSAVES))
> -		xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
> -
> -	/*
> -	 * Copy each region from the non-compacted offset to the
> -	 * possibly compacted offset.
> -	 */
> -	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
> -	while (valid) {
> -		u32 size, offset, ecx, edx;
> -		u64 xfeature_mask = valid & -valid;
> -		int xfeature_nr = fls64(xfeature_mask) - 1;
> -
> -		cpuid_count(XSTATE_CPUID, xfeature_nr,
> -			    &size, &offset, &ecx, &edx);
> -
> -		if (xfeature_nr == XFEATURE_PKRU) {
> -			memcpy(&vcpu->arch.pkru, src + offset,
> -			       sizeof(vcpu->arch.pkru));
> -		} else {
> -			void *dest = get_xsave_addr(xsave, xfeature_nr);
> -
> -			if (dest)
> -				memcpy(dest, src + offset, size);
> -		}
> -
> -		valid -= xfeature_mask;
> -	}
> -}
> -
>   static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
>   					 struct kvm_xsave *guest_xsave)
>   {
> @@ -4802,37 +4756,15 @@ static void kvm_vcpu_ioctl_x86_get_xsave
>   	}
>   }
>   
> -#define XSAVE_MXCSR_OFFSET 24
> -
>   static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
>   					struct kvm_xsave *guest_xsave)
>   {
> -	u64 xstate_bv;
> -	u32 mxcsr;
> -
>   	if (!vcpu->arch.guest_fpu)
>   		return 0;
>   
> -	xstate_bv = *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)];
> -	mxcsr = *(u32 *)&guest_xsave->region[XSAVE_MXCSR_OFFSET / sizeof(u32)];
> -
> -	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
> -		/*
> -		 * Here we allow setting states that are not present in
> -		 * CPUID leaf 0xD, index 0, EDX:EAX.  This is for compatibility
> -		 * with old userspace.
> -		 */
> -		if (xstate_bv & ~supported_xcr0 || mxcsr & ~mxcsr_feature_mask)
> -			return -EINVAL;
> -		load_xsave(vcpu, (u8 *)guest_xsave->region);
> -	} else {
> -		if (xstate_bv & ~XFEATURE_MASK_FPSSE ||
> -			mxcsr & ~mxcsr_feature_mask)
> -			return -EINVAL;
> -		memcpy(&vcpu->arch.guest_fpu->state.fxsave,
> -			guest_xsave->region, sizeof(struct fxregs_state));
> -	}
> -	return 0;
> +	return fpu_copy_kvm_uabi_to_vcpu(vcpu->arch.guest_fpu,
> +					 guest_xsave->region,
> +					 supported_xcr0, &vcpu->arch.pkru);
>   }
>   
>   static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 15/31] x86/fpu: Rework copy_xstate_to_uabi_buf()
  2021-10-12  0:00 ` [patch 15/31] x86/fpu: Rework copy_xstate_to_uabi_buf() Thomas Gleixner
@ 2021-10-12 17:30   ` Paolo Bonzini
  0 siblings, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-12 17:30 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm

On 12/10/21 02:00, Thomas Gleixner wrote:
> Prepare for replacing the KVM copy xstate to user function by extending
> copy_xstate_to_uabi_buf() with a pkru argument which allows the caller to
> hand in the pkru value, which is required for KVM because the guest PKRU is
> not accessible via current. Fixup all callsites accordingly.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>   arch/x86/kernel/fpu/xstate.c |   34 ++++++++++++++++++++++++++--------
>   arch/x86/kernel/fpu/xstate.h |    3 +++
>   2 files changed, 29 insertions(+), 8 deletions(-)
> 
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -940,9 +940,10 @@ static void copy_feature(bool from_xstat
>   }
>   
>   /**
> - * copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
> + * __copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
>    * @to:		membuf descriptor
> - * @tsk:	The task from which to copy the saved xstate
> + * @xsave:	The xsave from which to copy
> + * @pkru_val:	The PKRU value to store in the PKRU component
>    * @copy_mode:	The requested copy mode
>    *
>    * Converts from kernel XSAVE or XSAVES compacted format to UABI conforming
> @@ -951,11 +952,10 @@ static void copy_feature(bool from_xstat
>    *
>    * It supports partial copy but @to.pos always starts from zero.
>    */
> -void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
> -			     enum xstate_copy_mode copy_mode)
> +void __copy_xstate_to_uabi_buf(struct membuf to, struct xregs_state *xsave,
> +			       u32 pkru_val, enum xstate_copy_mode copy_mode)
>   {
>   	const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
> -	struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
>   	struct xregs_state *xinit = &init_fpstate.xsave;
>   	struct xstate_header header;
>   	unsigned int zerofrom;
> @@ -1033,10 +1033,9 @@ void copy_xstate_to_uabi_buf(struct memb
>   			struct pkru_state pkru = {0};
>   			/*
>   			 * PKRU is not necessarily up to date in the
> -			 * thread's XSAVE buffer.  Fill this part from the
> -			 * per-thread storage.
> +			 * XSAVE buffer. Use the provided value.
>   			 */
> -			pkru.pkru = tsk->thread.pkru;
> +			pkru.pkru = pkru_val;
>   			membuf_write(&to, &pkru, sizeof(pkru));
>   		} else {
>   			copy_feature(header.xfeatures & BIT_ULL(i), &to,
> @@ -1056,6 +1055,25 @@ void copy_xstate_to_uabi_buf(struct memb
>   		membuf_zero(&to, to.left);
>   }
>   
> +/**
> + * copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
> + * @to:		membuf descriptor
> + * @tsk:	The task from which to copy the saved xstate
> + * @copy_mode:	The requested copy mode
> + *
> + * Converts from kernel XSAVE or XSAVES compacted format to UABI conforming
> + * format, i.e. from the kernel internal hardware dependent storage format
> + * to the requested @mode. UABI XSTATE is always uncompacted!
> + *
> + * It supports partial copy but @to.pos always starts from zero.
> + */
> +void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
> +			     enum xstate_copy_mode copy_mode)
> +{
> +	__copy_xstate_to_uabi_buf(to, &tsk->thread.fpu.state.xsave,
> +				  tsk->thread.pkru, copy_mode);
> +}
> +
>   static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
>   			    const void *kbuf, const void __user *ubuf)
>   {
> --- a/arch/x86/kernel/fpu/xstate.h
> +++ b/arch/x86/kernel/fpu/xstate.h
> @@ -15,4 +15,7 @@ static inline void xstate_init_xcomp_bv(
>   		xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
>   }
>   
> +extern void __copy_xstate_to_uabi_buf(struct membuf to, struct xregs_state *xsave,
> +				      u32 pkru_val, enum xstate_copy_mode copy_mode);
> +
>   #endif
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 24/31] x86/fpu: Move fpregs_restore_userregs() to core
  2021-10-12  0:00 ` [patch 24/31] x86/fpu: Move fpregs_restore_userregs() to core Thomas Gleixner
@ 2021-10-12 17:32   ` Borislav Petkov
  0 siblings, 0 replies; 96+ messages in thread
From: Borislav Petkov @ 2021-10-12 17:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12, 2021 at 02:00:34AM +0200, Thomas Gleixner wrote:
> Only used core internaly.

"Only used internally in the FPU core."

or so.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 16/31] x86/fpu: Replace KVMs homebrewn FPU copy to user
  2021-10-12  0:00 ` [patch 16/31] x86/fpu: Replace KVMs homebrewn FPU copy to user Thomas Gleixner
  2021-10-12 17:10   ` Borislav Petkov
@ 2021-10-12 17:36   ` Paolo Bonzini
  2021-10-12 17:47     ` Thomas Gleixner
  1 sibling, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-12 17:36 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm

On 12/10/21 02:00, Thomas Gleixner wrote:
> 
> -	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
> -		memset(guest_xsave, 0, sizeof(struct kvm_xsave));
> -		fill_xsave((u8 *) guest_xsave->region, vcpu);
> -	} else {
> -		memcpy(guest_xsave->region,
> -			&vcpu->arch.guest_fpu->state.fxsave,
> -			sizeof(struct fxregs_state));
> -		*(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] =
> -			XFEATURE_MASK_FPSSE;
> -	}

After the patch, this final assignment is not done in the else case:

> +
> +	if (cpu_feature_enabled(X86_FEATURE_XSAVE)) {
> +		__copy_xstate_to_uabi_buf(mb, &kstate->xsave, pkru,
> +					  XSTATE_COPY_XSAVE);
> +	} else {
> +		memcpy(&ustate->fxsave, &kstate->fxsave, sizeof(ustate->fxsave));
> +	}
> +}

This leaves the xstate_bv set to 0 instead of XFEATURE_MASK_FPSSE. 
Resuming a VM then fails if you save on a non-XSAVE machine and restore 
it on an XSAVE machine.
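
A sketch of the fix being asked for here (field names are taken from the
quoted diff; the actual follow-up patch may structure it differently):

        } else {
                memcpy(&ustate->fxsave, &kstate->fxsave, sizeof(ustate->fxsave));
                /* Mark FP|SSE as present so the UABI buffer stays valid */
                ustate->xsave.header.xfeatures = XFEATURE_MASK_FPSSE;
        }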

The memset(guest_xsave, 0, sizeof(struct kvm_xsave)) also is not
reproduced; you can make it unconditional for simplicity, as this is not a
fast path.

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 26/31] x86/fpu: Move fpstate functions to api.h
  2021-10-12  0:00 ` [patch 26/31] x86/fpu: Move fpstate functions to api.h Thomas Gleixner
@ 2021-10-12 17:46   ` Borislav Petkov
  0 siblings, 0 replies; 96+ messages in thread
From: Borislav Petkov @ 2021-10-12 17:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12, 2021 at 02:00:37AM +0200, Thomas Gleixner wrote:
> Move function declarations which need to be globaly available to api.h
> where they belong.

"globally"

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 16/31] x86/fpu: Replace KVMs homebrewn FPU copy to user
  2021-10-12 17:36   ` Paolo Bonzini
@ 2021-10-12 17:47     ` Thomas Gleixner
  2021-10-12 18:40       ` [patch V2 16/31] x86/fpu: Replace KVMs home brewed " Thomas Gleixner
  2021-10-13  5:34       ` [patch 16/31] x86/fpu: Replace KVMs homebrewn " Paolo Bonzini
  0 siblings, 2 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm

On Tue, Oct 12 2021 at 19:36, Paolo Bonzini wrote:
> On 12/10/21 02:00, Thomas Gleixner wrote:
>> 
>> -	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
>> -		memset(guest_xsave, 0, sizeof(struct kvm_xsave));
>> -		fill_xsave((u8 *) guest_xsave->region, vcpu);
>> -	} else {
>> -		memcpy(guest_xsave->region,
>> -			&vcpu->arch.guest_fpu->state.fxsave,
>> -			sizeof(struct fxregs_state));
>> -		*(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] =
>> -			XFEATURE_MASK_FPSSE;
>> -	}
>
> After the patch, this final assignment is not done in the else case:

Doh.

>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_XSAVE)) {
>> +		__copy_xstate_to_uabi_buf(mb, &kstate->xsave, pkru,
>> +					  XSTATE_COPY_XSAVE);
>> +	} else {
>> +		memcpy(&ustate->fxsave, &kstate->fxsave, sizeof(ustate->fxsave));
>> +	}
>> +}
>
> This leaves the xstate_bv set to 0 instead of XFEATURE_MASK_FPSSE. 
> Resuming a VM then fails if you save on a non-XSAVE machine and restore 
> it on an XSAVE machine.

Yup.

> The memset(guest_xsave, 0, sizeof(struct kvm_xsave)) also is not 
> reproduced, you can make it unconditional for simplicity; this is not a 
> fast path.

Duh, I should have mentioned that in the changelog. The buffer is
allocated with kzalloc() so the memset is redundant, right?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-12 16:53   ` Borislav Petkov
@ 2021-10-12 18:25     ` Thomas Gleixner
  2021-10-12 18:26       ` Thomas Gleixner
  0 siblings, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12 18:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12 2021 at 18:53, Borislav Petkov wrote:
> On Tue, Oct 12, 2021 at 02:00:17AM +0200, Thomas Gleixner wrote:
>>  	/*
>> -	 * Guests with protected state can't have it set by the hypervisor,
>> -	 * so skip trying to set it.
>> +	 * Guest with protected state have guest_fpu == NULL which makes
>
> "Guests ... "
>
>> +	 * the swap only safe the host state. Exclude PKRU from restore as
>
> "save"

No, I meant safe, but let me rephrase it. Swap does both save and
restore. But it's not safe to dereference a NULL pointer :)

 .... makes the swap only handle the host state. Exclude PKRU from restore as

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-12 18:25     ` Thomas Gleixner
@ 2021-10-12 18:26       ` Thomas Gleixner
  0 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12 18:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12 2021 at 20:25, Thomas Gleixner wrote:

> On Tue, Oct 12 2021 at 18:53, Borislav Petkov wrote:
>> On Tue, Oct 12, 2021 at 02:00:17AM +0200, Thomas Gleixner wrote:
>>>  	/*
>>> -	 * Guests with protected state can't have it set by the hypervisor,
>>> -	 * so skip trying to set it.
>>> +	 * Guest with protected state have guest_fpu == NULL which makes
>>
>> "Guests ... "
>>
>>> +	 * the swap only safe the host state. Exclude PKRU from restore as
>>
>> "save"
>
> No, I meant safe, but let me rephrase it. Swap does both save and
> restore. But it's not safe to dereference a NULL pointer :)
>
>  .... makes the swap only handle the host state. Exclude PKRU from restore as

Gah. I should have looked at the context first. "save" is correct
here. Oh well...

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [patch V2 16/31] x86/fpu: Replace KVMs home brewed FPU copy to user
  2021-10-12 17:47     ` Thomas Gleixner
@ 2021-10-12 18:40       ` Thomas Gleixner
  2021-10-13  5:34       ` [patch 16/31] x86/fpu: Replace KVMs homebrewn " Paolo Bonzini
  1 sibling, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12 18:40 UTC (permalink / raw)
  To: Paolo Bonzini, LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm

Similar to the copy from user function, the FPU core has this already
implemented with all bells and whistles.

Get rid of the duplicated code and use the core functionality.

The memset(0) of the buffer is not required as it is already allocated
with kzalloc() at the call site.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
V2: Add the missing xsave header assignment in the !XSAVE path
    and explain the memset(0) removal in the changelog - Paolo
    Rename the function and fix subject - Borislav
---
 arch/x86/include/asm/fpu/api.h |    1 
 arch/x86/kernel/fpu/core.c     |   18 +++++++++++++
 arch/x86/kvm/x86.c             |   56 ++---------------------------------------
 3 files changed, 22 insertions(+), 53 deletions(-)

--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -117,5 +117,6 @@ extern void fpu_init_fpstate_user(struct
 extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
 
 extern int fpu_copy_kvm_uabi_to_fpstate(struct fpu *fpu, const void *buf, u64 xcr0, u32 *pkru);
+extern void fpu_copy_fpstate_to_kvm_uabi(struct fpu *fpu, void *buf, unsigned int size, u32 pkru);
 
 #endif /* _ASM_X86_FPU_API_H */
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -175,6 +175,24 @@ void fpu_swap_kvm_fpu(struct fpu *save,
 }
 EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
 
+void fpu_copy_fpstate_to_kvm_uabi(struct fpu *fpu, void *buf,
+			       unsigned int size, u32 pkru)
+{
+	union fpregs_state *kstate = &fpu->state;
+	union fpregs_state *ustate = buf;
+	struct membuf mb = { .p = buf, .left = size };
+
+	if (cpu_feature_enabled(X86_FEATURE_XSAVE)) {
+		__copy_xstate_to_uabi_buf(mb, &kstate->xsave, pkru,
+					  XSTATE_COPY_XSAVE);
+	} else {
+		memcpy(&ustate->fxsave, &kstate->fxsave, sizeof(ustate->fxsave));
+		/* Make it restorable on a XSAVE enabled host */
+		ustate->xsave.header.xfeatures = XFEATURE_MASK_FPSSE;
+	}
+}
+EXPORT_SYMBOL_GPL(fpu_copy_fpstate_to_kvm_uabi);
+
 int fpu_copy_kvm_uabi_to_fpstate(struct fpu *fpu, const void *buf, u64 xcr0,
 				 u32 *vpkru)
 {
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4695,65 +4695,15 @@ static int kvm_vcpu_ioctl_x86_set_debugr
 	return 0;
 }
 
-static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
-{
-	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
-	u64 xstate_bv = xsave->header.xfeatures;
-	u64 valid;
-
-	/*
-	 * Copy legacy XSAVE area, to avoid complications with CPUID
-	 * leaves 0 and 1 in the loop below.
-	 */
-	memcpy(dest, xsave, XSAVE_HDR_OFFSET);
-
-	/* Set XSTATE_BV */
-	xstate_bv &= vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FPSSE;
-	*(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv;
-
-	/*
-	 * Copy each region from the possibly compacted offset to the
-	 * non-compacted offset.
-	 */
-	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
-	while (valid) {
-		u32 size, offset, ecx, edx;
-		u64 xfeature_mask = valid & -valid;
-		int xfeature_nr = fls64(xfeature_mask) - 1;
-		void *src;
-
-		cpuid_count(XSTATE_CPUID, xfeature_nr,
-			    &size, &offset, &ecx, &edx);
-
-		if (xfeature_nr == XFEATURE_PKRU) {
-			memcpy(dest + offset, &vcpu->arch.pkru,
-			       sizeof(vcpu->arch.pkru));
-		} else {
-			src = get_xsave_addr(xsave, xfeature_nr);
-			if (src)
-				memcpy(dest + offset, src, size);
-		}
-
-		valid -= xfeature_mask;
-	}
-}
-
 static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 					 struct kvm_xsave *guest_xsave)
 {
 	if (!vcpu->arch.guest_fpu)
 		return;
 
-	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
-		memset(guest_xsave, 0, sizeof(struct kvm_xsave));
-		fill_xsave((u8 *) guest_xsave->region, vcpu);
-	} else {
-		memcpy(guest_xsave->region,
-			&vcpu->arch.guest_fpu->state.fxsave,
-			sizeof(struct fxregs_state));
-		*(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] =
-			XFEATURE_MASK_FPSSE;
-	}
+	fpu_copy_fpstate_to_kvm_uabi(vcpu->arch.guest_fpu, guest_xsave->region,
+				     sizeof(guest_xsave->region),
+				     vcpu->arch.pkru);
 }
 
 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 09/31] x86/fpu: Do not inherit FPU context for CLONE_THREAD
  2021-10-12 16:10   ` Borislav Petkov
@ 2021-10-12 18:52     ` Thomas Gleixner
  2021-10-12 19:01       ` Thomas Gleixner
  0 siblings, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12 18:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12 2021 at 18:10, Borislav Petkov wrote:

> On Tue, Oct 12, 2021 at 02:00:11AM +0200, Thomas Gleixner wrote:
>> CLONE_THREAD does not have the guarantee of a true fork to inherit all
>> state. Especially the FPU state is meaningless for CLONE_THREAD.
>> 
>> Just wipe out the minimal required state so restore on return to user space
>> let's the thread start with a clean FPU.
>
> This sentence reads weird, needs massaging.

The patch is wrong and needs to be removed. I just double checked
pthread_create() again and it says:

The new thread inherits the calling thread's floating-point environment
(fenv(3))

No idea what I was looking at a few days ago. :(

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 09/31] x86/fpu: Do not inherit FPU context for CLONE_THREAD
  2021-10-12 18:52     ` Thomas Gleixner
@ 2021-10-12 19:01       ` Thomas Gleixner
  0 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12 19:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Tue, Oct 12 2021 at 20:52, Thomas Gleixner wrote:

> On Tue, Oct 12 2021 at 18:10, Borislav Petkov wrote:
>
>> On Tue, Oct 12, 2021 at 02:00:11AM +0200, Thomas Gleixner wrote:
>>> CLONE_THREAD does not have the guarantee of a true fork to inherit all
>>> state. Especially the FPU state is meaningless for CLONE_THREAD.
>>> 
>>> Just wipe out the minimal required state so restore on return to user space
>>> let's the thread start with a clean FPU.
>>
>> This sentence reads weird, needs massaging.
>
> The patch is wrong and needs to be removed. I just double checked
> pthread_create() again and it says:
>
> The new thread inherits the calling thread's floating-point environment
> (fenv(3))
>
> No idea what I was looking at a few days ago. :(

But fenv(3) is not the FPU state. Duh!

Anyway, it's an optimization which we can still do later and it's not
required for the cleanups here.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1)
  2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
                   ` (30 preceding siblings ...)
  2021-10-12  0:00 ` [patch 31/31] x86/fpu: Provide a proper function for ex_handler_fprestore() Thomas Gleixner
@ 2021-10-12 21:15 ` Thomas Gleixner
  31 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-12 21:15 UTC (permalink / raw)
  To: LKML; +Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm, Paolo Bonzini

On Tue, Oct 12 2021 at 01:59, Thomas Gleixner wrote:
>
> The current series (#1) is based on:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/fpu
>
> and also available from git:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu-1

I've updated the git branch with the review comments which came in today
addressed.

The full stack is rebased on top of that along with a few other fixes.

The delta patch to the current part-1 series is below.

I'm going to wait a bit before sending out a V2 to give people time to
react. Though I'm planning to send out part-2 based on the current state
soonish.

Thanks,

        tglx
---
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 4cf54d8ce17d..5ac5e4596b53 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -130,13 +130,13 @@ static inline void fpstate_init_soft(struct swregs_state *soft) {}
 /* State tracking */
 DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
-/* FPSTATE related functions which are exported to KVM */
+/* fpstate-related functions which are exported to KVM */
 extern void fpu_init_fpstate_user(struct fpu *fpu);
 
 /* KVM specific functions */
 extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
 
-extern int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0, u32 *pkru);
-extern void fpu_copy_vcpu_to_kvm_uabi(struct fpu *fpu, void *buf, unsigned int size, u32 pkru);
+extern int fpu_copy_kvm_uabi_to_fpstate(struct fpu *fpu, const void *buf, u64 xcr0, u32 *pkru);
+extern void fpu_copy_fpstate_to_kvm_uabi(struct fpu *fpu, void *buf, unsigned int size, u32 pkru);
 
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index 99a8820e8cc4..cdb78d590c86 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -11,7 +11,7 @@
 
 extern void save_fpregs_to_fpstate(struct fpu *fpu);
 extern void fpu__drop(struct fpu *fpu);
-extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags);
+extern int  fpu_clone(struct task_struct *dst);
 extern void fpu_flush_thread(void);
 
 /*
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1c5e753ba3f1..ac540a7d410e 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -184,7 +184,7 @@ void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask)
 }
 EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
 
-void fpu_copy_vcpu_to_kvm_uabi(struct fpu *fpu, void *buf,
+void fpu_copy_fpstate_to_kvm_uabi(struct fpu *fpu, void *buf,
 			       unsigned int size, u32 pkru)
 {
 	union fpregs_state *kstate = &fpu->state;
@@ -196,12 +196,14 @@ void fpu_copy_vcpu_to_kvm_uabi(struct fpu *fpu, void *buf,
 					  XSTATE_COPY_XSAVE);
 	} else {
 		memcpy(&ustate->fxsave, &kstate->fxsave, sizeof(ustate->fxsave));
+		/* Make it restorable on a XSAVE enabled host */
+		ustate->xsave.header.xfeatures = XFEATURE_MASK_FPSSE;
 	}
 }
-EXPORT_SYMBOL_GPL(fpu_copy_vcpu_to_kvm_uabi);
+EXPORT_SYMBOL_GPL(fpu_copy_fpstate_to_kvm_uabi);
 
-int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0,
-			      u32 *vpkru)
+int fpu_copy_kvm_uabi_to_fpstate(struct fpu *fpu, const void *buf, u64 xcr0,
+				 u32 *vpkru)
 {
 	union fpregs_state *kstate = &fpu->state;
 	const union fpregs_state *ustate = buf;
@@ -234,7 +236,7 @@ int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0,
 	xstate_init_xcomp_bv(&kstate->xsave, xfeatures_mask_uabi());
 	return 0;
 }
-EXPORT_SYMBOL_GPL(fpu_copy_kvm_uabi_to_vcpu);
+EXPORT_SYMBOL_GPL(fpu_copy_kvm_uabi_to_fpstate);
 #endif /* CONFIG_KVM */
 
 void kernel_fpu_begin_mask(unsigned int kfpu_mask)
@@ -344,7 +346,7 @@ EXPORT_SYMBOL_GPL(fpu_init_fpstate_user);
 #endif
 
 /* Clone current's FPU state on fork */
-int fpu_clone(struct task_struct *dst, unsigned long clone_flags)
+int fpu_clone(struct task_struct *dst)
 {
 	struct fpu *src_fpu = &current->thread.fpu;
 	struct fpu *dst_fpu = &dst->thread.fpu;
@@ -363,11 +365,9 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags)
 
 	/*
 	 * No FPU state inheritance for kernel threads and IO
-	 * worker threads. Neither CLONE_THREAD needs a copy
-	 * of the FPU state.
+	 * worker threads.
 	 */
-	if (clone_flags & CLONE_THREAD ||
-	    dst->flags & (PF_KTHREAD | PF_IO_WORKER)) {
+	if (dst->flags & (PF_KTHREAD | PF_IO_WORKER)) {
 		/* Clear out the minimal state */
 		memcpy(&dst_fpu->state, &init_fpstate,
 		       init_fpstate_copy_size());
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6e729060beb3..b022df95a302 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1195,15 +1195,13 @@ int copy_sigframe_from_user_to_xstate(struct xregs_state *xsave,
 	return copy_uabi_to_xstate(xsave, NULL, ubuf);
 }
 
-static bool validate_xsaves_xrstors(u64 mask)
+static bool validate_independent_components(u64 mask)
 {
 	u64 xchk;
 
 	if (WARN_ON_FPU(!cpu_feature_enabled(X86_FEATURE_XSAVES)))
 		return false;
-	/*
-	 * Validate that this is a independent compoment.
-	 */
+
 	xchk = ~xfeatures_mask_independent();
 
 	if (WARN_ON_ONCE(!mask || mask & xchk))
@@ -1222,13 +1220,13 @@ static bool validate_xsaves_xrstors(u64 mask)
  * buffer should be zeroed otherwise a consecutive XRSTORS from that buffer
  * can #GP.
  *
- * The feature mask must be a subset of the independent features
+ * The feature mask must be a subset of the independent features.
  */
 void xsaves(struct xregs_state *xstate, u64 mask)
 {
 	int err;
 
-	if (!validate_xsaves_xrstors(mask))
+	if (!validate_independent_components(mask))
 		return;
 
 	XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err);
@@ -1246,13 +1244,13 @@ void xsaves(struct xregs_state *xstate, u64 mask)
  * Proper usage is to restore the state which was saved with
  * xsaves() into @xstate.
  *
- * The feature mask must be a subset of the independent features
+ * The feature mask must be a subset of the independent features.
  */
 void xrstors(struct xregs_state *xstate, u64 mask)
 {
 	int err;
 
-	if (!validate_xsaves_xrstors(mask))
+	if (!validate_independent_components(mask))
 		return;
 
 	XSTATE_OP(XRSTORS, xstate, (u32)mask, (u32)(mask >> 32), err);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 83a34fd828d5..5cd82082353e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -154,7 +154,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	frame->flags = X86_EFLAGS_FIXED;
 #endif
 
-	fpu_clone(p, clone_flags);
+	fpu_clone(p);
 
 	/* Kernel thread ? */
 	if (unlikely(p->flags & PF_KTHREAD)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac02945756ec..f7826148edc9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4701,9 +4701,9 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 	if (!vcpu->arch.guest_fpu)
 		return;
 
-	fpu_copy_vcpu_to_kvm_uabi(vcpu->arch.guest_fpu, guest_xsave->region,
-				  sizeof(guest_xsave->region),
-				  vcpu->arch.pkru);
+	fpu_copy_fpstate_to_kvm_uabi(vcpu->arch.guest_fpu, guest_xsave->region,
+				     sizeof(guest_xsave->region),
+				     vcpu->arch.pkru);
 }
 
 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
@@ -4712,9 +4712,9 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 	if (!vcpu->arch.guest_fpu)
 		return 0;
 
-	return fpu_copy_kvm_uabi_to_vcpu(vcpu->arch.guest_fpu,
-					 guest_xsave->region,
-					 supported_xcr0, &vcpu->arch.pkru);
+	return fpu_copy_kvm_uabi_to_fpstate(vcpu->arch.guest_fpu,
+					    guest_xsave->region,
+					    supported_xcr0, &vcpu->arch.pkru);
 }
 
 static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
@@ -9787,8 +9787,8 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * Guest with protected state have guest_fpu == NULL which makes
-	 * the swap only safe the host state. Exclude PKRU from restore as
+	 * Guests with protected state have guest_fpu == NULL which makes
+	 * the swap only save the host state. Exclude PKRU from restore as
 	 * it is restored separately in kvm_x86_ops.run().
 	 */
 	fpu_swap_kvm_fpu(vcpu->arch.user_fpu, vcpu->arch.guest_fpu,
@@ -9800,7 +9800,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * Guest with protected state have guest_fpu == NULL which makes
+	 * Guests with protected state have guest_fpu == NULL which makes
 	 * swap only restore the host state.
 	 */
 	fpu_swap_kvm_fpu(vcpu->arch.guest_fpu, vcpu->arch.user_fpu, ~0ULL);

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [patch 16/31] x86/fpu: Replace KVMs homebrewn FPU copy to user
  2021-10-12 17:47     ` Thomas Gleixner
  2021-10-12 18:40       ` [patch V2 16/31] x86/fpu: Replace KVMs home brewed " Thomas Gleixner
@ 2021-10-13  5:34       ` Paolo Bonzini
  1 sibling, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-13  5:34 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm

On 12/10/21 19:47, Thomas Gleixner wrote:
>> The memset(guest_xsave, 0, sizeof(struct kvm_xsave)) also is not
>> reproduced, you can make it unconditional for simplicity; this is not a
>> fast path.
> Duh, I should have mentioned that in the changelog. The buffer is
> allocated with kzalloc() so the memset is redundant, right?

Yes, I always confuse the __user pointers with the temporary ones that 
are allocated in the callers.

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-12 17:22   ` Paolo Bonzini
@ 2021-10-13  6:15     ` Liu, Jing2
  2021-10-13  6:26       ` Paolo Bonzini
  2021-10-13 15:12       ` Thomas Gleixner
  0 siblings, 2 replies; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-13  6:15 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc

> On 12/10/21 02:00, Thomas Gleixner wrote:
> > Swapping the host/guest FPU is directly fiddling with FPU internals
> > which requires 5 exports. The upcoming support of dynamically enabled
> > states would even need more.
> >
> > Implement a swap function in the FPU core code and export that instead.
> >
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Cc: kvm@vger.kernel.org
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/x86/include/asm/fpu/api.h      |    8 +++++
> >   arch/x86/include/asm/fpu/internal.h |   15 +---------
> >   arch/x86/kernel/fpu/core.c          |   30 ++++++++++++++++++---
> >   arch/x86/kernel/fpu/init.c          |    1
> >   arch/x86/kernel/fpu/xstate.c        |    1
> >   arch/x86/kvm/x86.c                  |   51 +++++++-----------------------------
> >   arch/x86/mm/extable.c               |    2 -
> >   7 files changed, 48 insertions(+), 60 deletions(-)
> >

When looking into the tglx/devel.git x86/fpu branch for the full #1-#4
series and the KVM AMX support, I'd like to discuss two things,
as follows.

1. KVM dynamic allocation API:
Since KVM also uses dynamic allocation, after KVM detects the guest
requesting AMX via a #NM trap, KVM needs to allocate an extra buffer for
this vcpu's current->thread.fpu.fpstate and the related guest_fpu state.
So far the kernel itself has such an API, fpstate_realloc(), but it's
static. How about making a common function usable for KVM?


2. There is a case where *guest AMX state can be lost*:

After KVM passes XFD through to the guest, when a vmexit opens the
irq window and KVM is interrupted, the kernel softirq path can call
kernel_fpu_begin() to touch XSAVE state. This function does
XSAVES. If guest XFD[18] is 1 and guest AMX state is live in the
registers, then the guest AMX state is lost by XSAVES.

The detailed example call trace is in this commit:
commit 2620fe268e80d667a94553cd37a94ccaa2cb8c83
Author: Sean Christopherson <seanjc@google.com>
Date:   Fri Jan 17 11:30:51 2020 -0800

    KVM: x86: Revert "KVM: X86: Fix fpu state crash in kvm guest"

    Reload the current thread's FPU state, which contains the guest's FPU
    state, to the CPU registers if necessary during vcpu_enter_guest().
    TIF_NEED_FPU_LOAD can be set any time control is transferred out of
    KVM,
    e.g. if I/O is triggered during a KVM call to get_user_pages() or if a
    softirq occurs while KVM is scheduled in.
    ...
   A sample trace triggered by warning if TIF_NEED_FPU_LOAD is set while
    vcpu state is loaded:

     <IRQ>
      gcmaes_crypt_by_sg.constprop.12+0x26e/0x660
      ? 0xffffffffc024547d
      ? __qdisc_run+0x83/0x510
      ? __dev_queue_xmit+0x45e/0x990
      ...
      ? do_IRQ+0x7f/0xd0
      ? common_interrupt+0xf/0xf
      </IRQ>
      ? irq_entries_start+0x20/0x660
      ? vmx_get_interrupt_shadow+0x2f0/0x710 [kvm_intel]
      ? kvm_set_msr_common+0xfc7/0x2380 [kvm]
      ? recalibrate_cpu_khz+0x10/0x10
      ? ktime_get+0x3a/0xa0
      ? kvm_arch_vcpu_ioctl_run+0x107/0x560 [kvm]
      ? kvm_init+0x6bf/0xd00 [kvm]

For this case, I think one way is the kernel doing something before XSAVES
for the KVM thread; another way is to let KVM fix it: maintain a zero XFD
value (via current->thread.fpu.fpstate->xfd = 0) after the vcpu fpu state
is loaded, and restore the real guest XFD value before vmenter.
Logic as follows.

after vmexit:
if XFD is passthrough
then
	sync guest XFD to vmx->xfd;
	set XFD to current->thread.fpu.fpstate->xfd (= 0)
	__this_cpu_write(xfd_state, 0);

before vmenter (irq is disabled):
if passthrough
then
	restore to real guest XFD by vmx->xfd;

vcpu_run: (if XFD is passthrough)
load: swap from qemu's to a zero XFD
put: swap zero to qemu's


Thanks,
Jing

[...]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13  6:15     ` Liu, Jing2
@ 2021-10-13  6:26       ` Paolo Bonzini
  2021-10-13  7:46         ` Liu, Jing2
  2021-10-13 15:12       ` Thomas Gleixner
  1 sibling, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-13  6:26 UTC (permalink / raw)
  To: Liu, Jing2, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc

On 13/10/21 08:15, Liu, Jing2 wrote:
> After KVM passes XFD through to the guest, when a vmexit opens the
> irq window and KVM is interrupted, the kernel softirq path can call
> kernel_fpu_begin() to touch XSAVE state. This function does
> XSAVES. If guest XFD[18] is 1 and guest AMX state is live in the
> registers, then the guest AMX state is lost by XSAVES.

Yes, the host value of XFD (which is zero) has to be restored after 
vmexit.  See how KVM already handles SPEC_CTRL.

Passthrough of XFD is only enabled after the guest has caused an #NM 
vmexit and the full XSAVE state has been dynamically allocated, 
therefore it is always possible to do an XSAVES even from atomic context.
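
Roughly, following the SPEC_CTRL pattern (untested sketch; the MSR
name is an assumption):

	/* on vmexit: stash the guest XFD, restore the host value (0) */
	rdmsrl(MSR_IA32_XFD, vcpu->arch.xfd);
	wrmsrl(MSR_IA32_XFD, 0);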

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13  6:26       ` Paolo Bonzini
@ 2021-10-13  7:46         ` Liu, Jing2
  2021-10-13  8:42           ` Paolo Bonzini
  0 siblings, 1 reply; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-13  7:46 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc


> On 13/10/21 08:15, Liu, Jing2 wrote:
> > After KVM passes XFD through to the guest, when a vmexit opens the irq
> > window and KVM is interrupted, the kernel softirq path can call
> > kernel_fpu_begin() to touch XSAVE state. This function does XSAVES. If
> > guest XFD[18] is 1 and guest AMX state is live in the registers, then
> > the guest AMX state is lost by XSAVES.
> 
> Yes, the host value of XFD (which is zero) has to be restored after vmexit.
> See how KVM already handles SPEC_CTRL.
> 

I'm trying to understand why qemu's XFD is zero after kernel supports AMX.
Do you mean that in the guest #NM trap KVM also allocates an extra
user_fpu buffer and clears qemu's XFD? But why do we need to do that?

I think the host kernel clears qemu's XFD[18] only when qemu userspace
requests AMX permission and executes an AMX instruction that generates
a host #NM. If a guest #NM is trapped, KVM *doesn't* need to clear the
host's XFD; it only needs to allocate the guest_fpu buffer and
current->thread.fpu's buffer, and clear the guest's XFD.

 
> Passthrough of XFD is only enabled after the guest has caused an #NM
> vmexit 

Yes, passthrough happens in two cases: one is the guest #NM being
trapped; the other is the guest clearing XFD before it generates #NM
(this is possible for a guest), then passthrough.
In both cases, we pass through and allocate buffers for guest_fpu and
current->thread.fpu.

Thanks,
Jing

> and the full XSAVE state has been dynamically allocated, therefore it
> is always possible to do an XSAVES even from atomic context.
> 
> Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13  7:46         ` Liu, Jing2
@ 2021-10-13  8:42           ` Paolo Bonzini
  2021-10-13 10:14             ` Andy Lutomirski
                               ` (2 more replies)
  0 siblings, 3 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-13  8:42 UTC (permalink / raw)
  To: Liu, Jing2, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc

On 13/10/21 09:46, Liu, Jing2 wrote:
> 
>> On 13/10/21 08:15, Liu, Jing2 wrote:
>>> After KVM passes XFD through to the guest, when a vmexit opens the irq
>>> window and KVM is interrupted, the kernel softirq path can call
>>> kernel_fpu_begin() to touch XSAVE state. This function does XSAVES. If
>>> guest XFD[18] is 1 and guest AMX state is live in the registers, then
>>> the guest AMX state is lost by XSAVES.
>>
>> Yes, the host value of XFD (which is zero) has to be restored after vmexit.
>> See how KVM already handles SPEC_CTRL.
> 
> I'm trying to understand why qemu's XFD is zero after kernel supports AMX.

There are three copies of XFD:

- the guest value stored in vcpu->arch.

- the "QEMU" value attached to host_fpu.  This one only becomes zero if 
QEMU requires AMX (which shouldn't happen).

- the internal KVM value attached to guest_fpu.  When #NM happens, this 
one becomes zero.


The CPU value is:

- the host_fpu value before kvm_load_guest_fpu and after 
kvm_put_guest_fpu.  This ensures that QEMU context switch is as cheap as 
possible.

- the guest_fpu value between kvm_load_guest_fpu and kvm_put_guest_fpu. 
  This ensures that no state is lost in the case you are describing.

- the OR of the guest value and the guest_fpu value while the guest runs 
(using either MSR load/save lists, or manual wrmsr like 
pt_guest_enter/pt_guest_exit).  This ensures that the host has the 
opportunity to get a #NM exception, and allocate AMX state in the 
guest_fpu and in current->thread.fpu.

> Yes, passthrough happens in two cases: one is the guest #NM being
> trapped; the other is the guest clearing XFD before it generates #NM
> (this is possible for a guest), then passthrough.
> In both cases, we pass through and allocate buffers for guest_fpu and
> current->thread.fpu.

I think it's simpler to always wait for #NM, it will only happen once 
per vCPU.  In other words, even if the guest clears XFD before it 
generates #NM, the guest_fpu's XFD remains nonzero and an #NM vmexit is 
possible.  After #NM the guest_fpu's XFD is zero; then passthrough can 
happen and the #NM vmexit trap can be disabled.
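
In pseudocode, the #NM vmexit handler would then be something like
(sketch; helper names and the xfd field placement are made up):

	static int handle_nm(struct kvm_vcpu *vcpu)
	{
		/* one-time: grow the buffers, then stop intercepting #NM */
		kvm_realloc_guest_fpstate(vcpu);
		vcpu->arch.guest_fpu->xfd = 0;
		disable_nm_intercept(vcpu);
		/* retry; a further #NM is delivered directly to the guest */
		return 1;
	}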

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13  8:42           ` Paolo Bonzini
@ 2021-10-13 10:14             ` Andy Lutomirski
  2021-10-13 12:26               ` Paolo Bonzini
  2021-10-13 10:25             ` Liu, Jing2
  2021-10-13 14:06             ` Thomas Gleixner
  2 siblings, 1 reply; 96+ messages in thread
From: Andy Lutomirski @ 2021-10-13 10:14 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Jing2, Thomas Gleixner, Linux Kernel Mailing List
  Cc: the arch/x86 maintainers, Bae, Chang Seok, Dave Hansen,
	Arjan van de Ven, kvm list, Nakajima, Jun, Jing Liu,
	Sean Christopherson



On Wed, Oct 13, 2021, at 1:42 AM, Paolo Bonzini wrote:
> On 13/10/21 09:46, Liu, Jing2 wrote:
>> 
>>> On 13/10/21 08:15, Liu, Jing2 wrote:
>>>> After KVM passes XFD through to the guest, when a vmexit opens the irq
>>>> window and KVM is interrupted, the kernel softirq path can call
>>>> kernel_fpu_begin() to touch XSAVE state. This function does XSAVES. If
>>>> guest XFD[18] is 1 and guest AMX state is live in the registers, then
>>>> the guest AMX state is lost by XSAVES.
>>>
>>> Yes, the host value of XFD (which is zero) has to be restored after vmexit.
>>> See how KVM already handles SPEC_CTRL.
>> 
>> I'm trying to understand why qemu's XFD is zero after kernel supports AMX.
>
> There are three copies of XFD:
>
> - the guest value stored in vcpu->arch.
>
> - the "QEMU" value attached to host_fpu.  This one only becomes zero if 
> QEMU requires AMX (which shouldn't happen).
>
> - the internal KVM value attached to guest_fpu.  When #NM happens, this 
> one becomes zero.
>
>
> The CPU value is:
>
> - the host_fpu value before kvm_load_guest_fpu and after 
> kvm_put_guest_fpu.  This ensures that QEMU context switch is as cheap as 
> possible.
>
> - the guest_fpu value between kvm_load_guest_fpu and kvm_put_guest_fpu. 
>   This ensures that no state is lost in the case you are describing.
>
> - the OR of the guest value and the guest_fpu value while the guest runs 
> (using either MSR load/save lists, or manual wrmsr like 
> pt_guest_enter/pt_guest_exit).  This ensures that the host has the 
> opportunity to get a #NM exception, and allocate AMX state in the 
> guest_fpu and in current->thread.fpu.
>
>> Yes, passthrough happens in two cases: one is the guest #NM being
>> trapped; the other is the guest clearing XFD before it generates #NM
>> (this is possible for a guest), then passthrough.
>> In both cases, we pass through and allocate buffers for guest_fpu and
>> current->thread.fpu.
>
> I think it's simpler to always wait for #NM, it will only happen once 
> per vCPU.  In other words, even if the guest clears XFD before it 
> generates #NM, the guest_fpu's XFD remains nonzero and an #NM vmexit is 
> possible.  After #NM the guest_fpu's XFD is zero; then passthrough can 
> happen and the #NM vmexit trap can be disabled.

This will stop being at all optimal when Intel inevitably adds another feature that uses XFD.  In the potentially infinite window in which the guest manages XFD and #NM on behalf of its userspace and when the guest allocates the other hypothetical feature, all the #NMs will have to be trapped by KVM.

Is it really worthwhile for KVM to use XFD at all instead of preallocating the state and being done with it?  KVM would still have to avoid data loss if the guest sets XFD with non-init state, but #NM could always pass through.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13  8:42           ` Paolo Bonzini
  2021-10-13 10:14             ` Andy Lutomirski
@ 2021-10-13 10:25             ` Liu, Jing2
  2021-10-13 12:37               ` Paolo Bonzini
  2021-10-13 14:06             ` Thomas Gleixner
  2 siblings, 1 reply; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-13 10:25 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc


On 13/10/21 10:42, Paolo Bonzini wrote:
> On 13/10/21 09:46, Liu, Jing2 wrote:
> >
> >> On 13/10/21 08:15, Liu, Jing2 wrote:
> >>> After KVM passes XFD through to the guest, when a vmexit opens the
> >>> irq window and KVM is interrupted, the kernel softirq path can call
> >>> kernel_fpu_begin() to touch XSAVE state. This function does XSAVES.
> >>> If guest XFD[18] is 1 and guest AMX state is live in the registers,
> >>> then the guest AMX state is lost by XSAVES.
> >>
> >> Yes, the host value of XFD (which is zero) has to be restored after vmexit.
> >> See how KVM already handles SPEC_CTRL.
> >
> > I'm trying to understand why qemu's XFD is zero after kernel supports AMX.
> 
> There are three copies of XFD:
> 
> - the guest value stored in vcpu->arch.

OK, let's call it e.g. vcpu->arch.xfd

[...]
> - the internal KVM value attached to guest_fpu.  When #NM happens, this
> one becomes zero.

> The CPU value is:
> 
> - the guest_fpu value between kvm_load_guest_fpu and kvm_put_guest_fpu.
>   This ensures that no state is lost in the case you are describing.
> 

OK, you mean using guest_fpu as a KVM value. Let me describe the
flow to see if anything is missing.

When the #NM is trapped, which enables passthrough, the guest_fpu XFD is
set to 0 and stays 0 forever. (The HW XFD is not changed and is still 1.)
In the #NM trap, KVM allocates the buffer and re-injects a #NM exception
into the guest to make the guest kernel allocate its thread buffer.
Then on the next vmexit, KVM syncs vcpu->arch.xfd, loads the guest_fpu
value (=0) and updates the current->thread.fpu XFD to 0 for kernel
reference.


> - the OR of the guest value and the guest_fpu value while the guest runs
> (using either MSR load/save lists, or manual wrmsr like
> pt_guest_enter/pt_guest_exit).  This ensures that the host has the
> opportunity to get a #NM exception, and allocate AMX state in the
> guest_fpu and in current->thread.fpu.
> 
> > Yes, passthrough is done by two cases: one is guest #NM trapped;
> > another is guest clearing XFD before it generates #NM (this is possible for
> > guest), then passthrough.
> > For the two cases, we passthrough and allocate buffer for guest_fpu, and
> > current->thread.fpu.
> 
> I think it's simpler to always wait for #NM, it will only happen once
> per vCPU.  In other words, even if the guest clears XFD before it
> generates #NM, the guest_fpu's XFD remains nonzero 

You mean a wrmsr trap doesn't do anything and just returns?
In this case, on the next vmenter the OR of the guest value
(vcpu->arch.xfd) and the guest_fpu value is still 1, so doesn't
this violate the guest's HW assumption? (The guest finds that the
wrmsr didn't work.)
 
Thanks,
Jing

> and an #NM vmexit is
> possible.  After #NM the guest_fpu's XFD is zero; then passthrough can
> happen and the #NM vmexit trap can be disabled.

> 
> Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13 10:14             ` Andy Lutomirski
@ 2021-10-13 12:26               ` Paolo Bonzini
  2021-10-13 14:14                 ` Thomas Gleixner
  2021-10-13 14:59                 ` Andy Lutomirski
  0 siblings, 2 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-13 12:26 UTC (permalink / raw)
  To: Andy Lutomirski, Liu, Jing2, Thomas Gleixner, Linux Kernel Mailing List
  Cc: the arch/x86 maintainers, Bae, Chang Seok, Dave Hansen,
	Arjan van de Ven, kvm list, Nakajima, Jun, Jing Liu,
	Sean Christopherson

On 13/10/21 12:14, Andy Lutomirski wrote:
>> I think it's simpler to always wait for #NM, it will only happen
>> once per vCPU.  In other words, even if the guest clears XFD before
>> it generates #NM, the guest_fpu's XFD remains nonzero and an #NM
>> vmexit is possible.  After #NM the guest_fpu's XFD is zero; then
>> passthrough can happen and the #NM vmexit trap can be disabled.
>
> This will stop being at all optimal when Intel inevitably adds
> another feature that uses XFD.  In the potentially infinite window in
> which the guest manages XFD and #NM on behalf of its userspace and
> when the guest allocates the other hypothetical feature, all the #NMs
> will have to be trapped by KVM.

The reason is that it's quite common to simply let the guest see all 
CPUID bits that KVM knows about.  But it's not unlikely that most guests 
will not ever use any XFD feature, and therefore will not ever see an 
#NM.  I wouldn't have any problem with allocating _all_ of the dynamic 
state space on the first #NM.

Thinking more about it, #NM only has to be trapped if XCR0 enables a 
dynamic feature.  In other words, the guest value of XFD can be limited 
to (host_XFD|guest_XFD) & guest_XCR0.  This avoids that KVM 
unnecessarily traps for old guests that use CR0.TS.
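
I.e. roughly (sketch):

	/* old guests flipping CR0.TS never enable a dynamic XCR0 bit */
	u64 run_xfd = (host_xfd | guest_xfd) & guest_xcr0;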

Paolo

> Is it really worthwhile for KVM to use XFD at all instead of
> preallocating the state and being done with it?  KVM would still have
> to avoid data loss if the guest sets XFD with non-init state, but #NM
> could always pass through.
> 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13 10:25             ` Liu, Jing2
@ 2021-10-13 12:37               ` Paolo Bonzini
  0 siblings, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-13 12:37 UTC (permalink / raw)
  To: Liu, Jing2, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc

On 13/10/21 12:25, Liu, Jing2 wrote:
> [...]
>> - the internal KVM value attached to guest_fpu.  When #NM happens, this
>> one becomes zero.
> 
>> The CPU value is:
>>
>> - the guest_fpu value between kvm_load_guest_fpu and kvm_put_guest_fpu.
>>    This ensures that no state is lost in the case you are describing.
>>
> 
> OK, you mean using guest_fpu as a KVM value. Let me describe the
> flow to see if anything is missing.
> 
> When the #NM is trapped, which enables passthrough, the guest_fpu XFD is
> set to 0 and stays 0 forever. (The HW XFD is not changed and is still 1.)
> In the #NM trap, KVM allocates the buffer and re-injects a #NM exception
> into the guest to make the guest kernel allocate its thread buffer.
> Then on the next vmexit, KVM syncs vcpu->arch.xfd, loads the guest_fpu
> value (=0) and updates the current->thread.fpu XFD to 0 for kernel
> reference.

In the #NM handler, KVM allocates the buffer and the guest_fpu XFD 
becomes zero.  Also because the guest_fpu XFD is zero:

- #NM vmexits are disabled.  More precisely, trapping #NM is only 
necessary if guest_fpu->xfd & ~vcpu->arch.xfd & vcpu->arch.xcr0 is 
nonzero (i.e. only if there is a state that is guest_fpu-disabled, but 
enabled according to both XFD and XCR0).

- On the next vmentry XFD is set to just vcpu->arch.xfd and the 
instruction is retried.  If the instruction causes an #NM in the guest, 
it is not trapped and delivered normally to the guest.
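
I.e. (sketch, field names assumed):

	/* intercept #NM only while such a state can still exist */
	bool trap_nm = guest_fpu->xfd & ~vcpu->arch.xfd & vcpu->arch.xcr0;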

>> I think it's simpler to always wait for #NM, it will only happen once
>> per vCPU.  In other words, even if the guest clears XFD before it
>> generates #NM, the guest_fpu's XFD remains nonzero
> 
> You mean a wrmsr trap doesn't do anything and just returns?

The guest might run with the same XFD value as before (which is 
guest_fpu->xfd | vcpu->arch.xfd), but vcpu->arch.xfd is changed.  The 
value in vcpu->arch.xfd will be read back by an RDMSR, because 
passthrough is not enabled and the RDMSR will cause a vmexit.

Once an #NM is received and guest_fpu->xfd becomes zero, passthrough can 
be enabled.
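
I.e. until passthrough is enabled, both directions trap (sketch; the
MSR name and plumbing are assumptions):

	/* in kvm_set_msr_common() */
	case MSR_IA32_XFD:
		vcpu->arch.xfd = data;			/* WRMSR traps */
		break;

	/* in kvm_get_msr_common() */
	case MSR_IA32_XFD:
		msr_info->data = vcpu->arch.xfd;	/* RDMSR reads it back */
		break;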

Paolo

> In this case, on the next vmenter the OR of the guest value
> (vcpu->arch.xfd) and the guest_fpu value is still 1, so doesn't
> this violate the guest's HW assumption? (The guest finds that the
> wrmsr didn't work.)


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13  8:42           ` Paolo Bonzini
  2021-10-13 10:14             ` Andy Lutomirski
  2021-10-13 10:25             ` Liu, Jing2
@ 2021-10-13 14:06             ` Thomas Gleixner
  2021-10-14  6:50               ` Paolo Bonzini
  2 siblings, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-13 14:06 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Andrew Cooper

Paolo,

On Wed, Oct 13 2021 at 10:42, Paolo Bonzini wrote:
> On 13/10/21 09:46, Liu, Jing2 wrote:
>>> Yes, the host value of XFD (which is zero) has to be restored after vmexit.
>>> See how KVM already handles SPEC_CTRL.
>> 
>> I'm trying to understand why qemu's XFD is zero after kernel supports AMX.
>
> There are three copies of XFD:
>
> - the guest value stored in vcpu->arch.
>
> - the "QEMU" value attached to host_fpu.  This one only becomes zero if 
> QEMU requires AMX (which shouldn't happen).

I don't think that makes sense.

First of all, if QEMU wants to expose AMX to guests, then it has to ask
for permission to do so like any other user space process. We're not going
to make that special just because.

The guest configuration will have to have a 'needs AMX' flag set. So
QEMU knows that it is required upfront.

Which also means that a guest configuration which has it not set will
never get AMX passed through.

That tells me that we should not bother at all with on demand buffer
reallocations for that case and just keep things simple.

The on demand buffer allocation from the general OS point of view makes
sense because there it really matters whether we allocate $N kilobytes
per thread or not.

But does it matter for the QEMU process and its vCPU threads when the
guest is allowed to use AMX? I don't think so. It's an academic exercise
IMO and just makes the handling of this way more complex than required.

So the logic should be:

   qemu()
     read_config()
     if (dynamic_features_passthrough())
     	request_permission(feature)

     create_vcpu_threads()
       ....

       vcpu_thread()
         kvm_ioctl(ENABLE_DYN_FEATURE, feature)
           reallocate_buffers()
             realloc(tsk->fpu.fpstate, feature)
             realloc(guest_fpu.fpstate, feature)
             realloc(host_fpu.fpstate, feature)

             All of them will have

             fpstate.xfd = default_xfd & ~feature

That also makes resume and migration simple because that's going to use
exactly the same mechanism.

Yes, it _allows_ QEMU user space to use AMX, but that's not the end of
the world, really, and it avoids a ton of special cases to worry about.

Also the extra memory consumption per vCPU thread is probably just noise
compared to the rest of the vCPU state.

With that the only thing you have to take care of is in vmx_vcpu_run():

   local_irq_disable();
   ...
   vmx_vcpu_run()
     wrmsrl(XFD, guest->xfd)
     vmenter()
     guest->xfd = rdmsrl(XFD)
     wrmsrl(XFD, host->xfd)

It does not matter if some day there is an XFD-controlled bit 19 and
you want to selectively allow access to guests, because we have two
mechanisms here:

  1) XCR0

    XSETBV in the guest is intercepted and checked against the allowed
    bits. If it tries to set one which is not allowed, then this is
    not any different from what KVM is doing today.

    I.e. Guest1 is allowed to set bit 18, but not 19
         Guest2 is allowed to set bit 19, but not 18
         Guest3 is allowed to set both 18 and 19

  2) XFD

     Intercepting XFD is optional I think. It does not matter what the
     guest writes into it, because if XCRO[i] = 0 then the state of
     XFD[i] is irrelevant according to the ISE:

     "(IA32_XFD[i] does not affect processor operations if XCR0[i] = 0.)"

     The only thing different vs. bare metal is that when the guest writes
     XFD[i]=1 it won't get a #GP despite the fact that the virtualized CPUID
     suggests that it should get one:
     
     "Bit i of either MSR can be set to 1 only if CPUID.(EAX=0DH,ECX=i):ECX[2]
      is enumerated as 1.  An execution of WRMSR that attempts to set an
      unsupported bit in either MSR causes a general-protection fault
      (#GP)."

     Does it matter?  Probably not, all it can figure out is that
     component[i] is supported in hardware, but it can't do anything
     with that information because the VMM will not allow it to set the
     corresponding XCR0 bit...

     Sure you can intercept XFD, check the write against the allowed
     guest bits and inject #GP if not.

     But keep in mind that the guest kernel will context switch it and
     that will not be any better than context switching XCR0 in the
     guest kernel...

The thing we need to think about is the case where guest has XCR0[i] =
XFD[i] = 1 and host has XFD[i] = 0, because setting XFD[i] = 1 does not
bring the component[i] into init state.

In that case we have the following situation after a vmexit:

     guest->xfd = rdmsrl(XFD)         [i] = 1
     wrmsrl(XFD, host->xfd)           [i] = 0

If the component[i] is _not_ in init state then the next XSAVES on the
host will save it and therefore have xsave.header.XSAVE_BV[i] = 1 in the
buffer. A subsequent XRSTORS of that buffer on the host will restore the
saved data into component[i].

But the subsequent vmenter() will restore the guest XFD which will just
bring the guest into exactly the same state as before the VMEXIT.

Ergo it does not matter at all.

That also makes #NM handling trivial. Any #NM generated in the guest is
completely uninteresting for the host with that scheme and it's the
guest's problem to deal with it.

But that brings me to another issue: XFD_ERR.

Assume the guest takes a #NM and, before the handler can run and
read/clear XFD_ERR, a VMEXIT happens, which means XFD_ERR will have the
guest error bit set and nothing will clear it. So XFD_ERR has to be
handled properly, otherwise a subsequent #NM on the host will see a
stale bit from the guest.

   vmx_vcpu_run()
     wrmsrl(XFD, guest->xfd)
     wrmsrl(XFD_ERR, guest->xfd_err)
     vmenter()
     guest->xfd_err = rdmsrl(XFD_ERR)
     guest->xfd = rdmsrl(XFD)
     wrmsrl(XFD_ERR, 0)
     wrmsrl(XFD, host->xfd)

Of course that wants to be conditional on the guest configuration and
you probably want all of that to be in the auto-load/store area, but
you get the idea.

Anything else will just create more problems than it solves. Especially
#NM handling (think nested guest) and the XFD_ERR additive behaviour
will be a nasty playground and easy to get wrong.

Not having that at all makes life way simpler, right?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13 12:26               ` Paolo Bonzini
@ 2021-10-13 14:14                 ` Thomas Gleixner
  2021-10-13 14:24                   ` Thomas Gleixner
  2021-10-13 14:59                 ` Andy Lutomirski
  1 sibling, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-13 14:14 UTC (permalink / raw)
  To: Paolo Bonzini, Andy Lutomirski, Liu, Jing2, Linux Kernel Mailing List
  Cc: the arch/x86 maintainers, Bae, Chang Seok, Dave Hansen,
	Arjan van de Ven, kvm list, Nakajima, Jun, Jing Liu,
	Sean Christopherson

On Wed, Oct 13 2021 at 14:26, Paolo Bonzini wrote:

> On 13/10/21 12:14, Andy Lutomirski wrote:
>>> I think it's simpler to always wait for #NM, it will only happen
>>> once per vCPU.  In other words, even if the guest clears XFD before
>>> it generates #NM, the guest_fpu's XFD remains nonzero and an #NM
>>> vmexit is possible.  After #NM the guest_fpu's XFD is zero; then
>>> passthrough can happen and the #NM vmexit trap can be disabled.
>>
>> This will stop being at all optimal when Intel inevitably adds
>> another feature that uses XFD.  In the potentially infinite window in
>> which the guest manages XFD and #NM on behalf of its userspace and
>> when the guest allocates the other hypothetical feature, all the #NMs
>> will have to be trapped by KVM.
>
> The reason is that it's quite common to simply let the guest see all 
> CPUID bits that KVM knows about.

On fleets the cpu features exposed to guests matter a lot to ensure
migratability, and I would be surprised if such a feature were just
universally available to anyone.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13 14:14                 ` Thomas Gleixner
@ 2021-10-13 14:24                   ` Thomas Gleixner
  0 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-13 14:24 UTC (permalink / raw)
  To: Paolo Bonzini, Andy Lutomirski, Liu, Jing2, Linux Kernel Mailing List
  Cc: the arch/x86 maintainers, Bae, Chang Seok, Dave Hansen,
	Arjan van de Ven, kvm list, Nakajima, Jun, Jing Liu,
	Sean Christopherson

On Wed, Oct 13 2021 at 16:14, Thomas Gleixner wrote:

> On Wed, Oct 13 2021 at 14:26, Paolo Bonzini wrote:
>
>> On 13/10/21 12:14, Andy Lutomirski wrote:
>>>> I think it's simpler to always wait for #NM, it will only happen
>>>> once per vCPU.  In other words, even if the guest clears XFD before
>>>> it generates #NM, the guest_fpu's XFD remains nonzero and an #NM
>>>> vmexit is possible.  After #NM the guest_fpu's XFD is zero; then
>>>> passthrough can happen and the #NM vmexit trap can be disabled.
>>>
>>> This will stop being at all optimal when Intel inevitably adds
>>> another feature that uses XFD.  In the potentially infinite window in
>>> which the guest manages XFD and #NM on behalf of its userspace and
>>> when the guest allocates the other hypothetical feature, all the #NMs
>>> will have to be trapped by KVM.
>>
>> The reason is that it's quite common to simply let the guest see all 
>> CPUID bits that KVM knows about.
>
> On fleets the cpu features exposed to guests matter a lot to ensure
> migratability, and I would be surprised if such a feature were just
> universally available to anyone.

As a VM customer you get charged for RAM, CPUs, storage and whatever
extra features you need. So why would AMX be any different?

So far Intel ignored the fact that these accelerators are managed
resources even if they are accessible via instructions and do not
require an open(/dev/magic_accelerator). But that's just wrong and XFD
should already have happened with AVX512.

Trying to expose AMX unconditionally is just wrong and overengineered
and proliferates the mess we already have to suffer from.

As I said in the other mail, QEMU has to get permission to use AMX
first and not do it by circumventing the permission part via a KVM
hack.

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 14/31] x86/fpu: Replace KVMs homebrewn FPU copy from user
  2021-10-12 17:00   ` Borislav Petkov
@ 2021-10-13 14:57     ` Sean Christopherson
  2021-10-13 15:12       ` Paolo Bonzini
  2021-10-13 15:16       ` Thomas Gleixner
  0 siblings, 2 replies; 96+ messages in thread
From: Sean Christopherson @ 2021-10-13 14:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, LKML, x86, Chang S. Bae, Dave Hansen,
	Arjan van de Ven, kvm, Paolo Bonzini

On Tue, Oct 12, 2021, Borislav Petkov wrote:
> On Tue, Oct 12, 2021 at 02:00:19AM +0200, Thomas Gleixner wrote:
> > --- a/arch/x86/include/asm/fpu/api.h
> > +++ b/arch/x86/include/asm/fpu/api.h
> > @@ -116,4 +116,7 @@ extern void fpu_init_fpstate_user(struct
> >  /* KVM specific functions */
> >  extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
> >  
> > +struct kvm_vcpu;
> > +extern int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0, u32 *pkru);
> > +
> >  #endif /* _ASM_X86_FPU_API_H */
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -174,7 +174,43 @@ void fpu_swap_kvm_fpu(struct fpu *save,
> >  	fpregs_unlock();
> >  }
> >  EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
> > -#endif
> > +
> > +int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0,
> > +			      u32 *vpkru)
> 
> Right, except that there's no @vcpu in the args of that function. I
> guess you could call it
> 
> fpu_copy_kvm_uabi_to_buf()
> 
> and that @buf can be
> 
> vcpu->arch.guest_fpu

But the existing @buf is the userspace pointer, which semantically makes sense
because the userspace pointer is the "buffer" and the destination @fpu (and @pkru)
is vCPU state, not a buffer.

That said, I also struggled with the lack of @vcpu.  What about prepending vcpu_
to fpu and to pkru?  E.g.

  int fpu_copy_kvm_uabi_to_vcpu(struct fpu *vcpu_fpu, const void *buf, u64 xcr0,
  				u32 *vcpu_pkru)

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13 12:26               ` Paolo Bonzini
  2021-10-13 14:14                 ` Thomas Gleixner
@ 2021-10-13 14:59                 ` Andy Lutomirski
  2021-10-13 15:05                   ` Paolo Bonzini
  1 sibling, 1 reply; 96+ messages in thread
From: Andy Lutomirski @ 2021-10-13 14:59 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Jing2, Thomas Gleixner, Linux Kernel Mailing List
  Cc: the arch/x86 maintainers, Bae, Chang Seok, Dave Hansen,
	Arjan van de Ven, kvm list, Nakajima, Jun, Jing Liu,
	Sean Christopherson



On Wed, Oct 13, 2021, at 5:26 AM, Paolo Bonzini wrote:
> On 13/10/21 12:14, Andy Lutomirski wrote:
>>> I think it's simpler to always wait for #NM, it will only happen
>>> once per vCPU.  In other words, even if the guest clears XFD before
>>> it generates #NM, the guest_fpu's XFD remains nonzero and an #NM
>>> vmexit is possible.  After #NM the guest_fpu's XFD is zero; then
>>> passthrough can happen and the #NM vmexit trap can be disabled.
>>
>> This will stop being at all optimal when Intel inevitably adds
>> another feature that uses XFD.  In the potentially infinite window in
>> which the guest manages XFD and #NM on behalf of its userspace and
>> when the guest allocates the other hypothetical feature, all the #NMs
>> will have to be trapped by KVM.
>
> The reason is that it's quite common to simply let the guest see all 
> CPUID bits that KVM knows about.  But it's not unlikely that most guests 
> will not ever use any XFD feature, and therefore will not ever see an 
> #NM.  I wouldn't have any problem with allocating _all_ of the dynamic 
> state space on the first #NM.
>
> Thinking more about it, #NM only has to be trapped if XCR0 enables a 
> dynamic feature.  In other words, the guest value of XFD can be limited 
> to (host_XFD|guest_XFD) & guest_XCR0.  This avoids that KVM 
> unnecessarily traps for old guests that use CR0.TS.
>

You could simplify this by allocating the state the first time XCR0 enables the feature in question.

(This is how regular non-virt userspace *should* work too, but it looks like I’ve probably been outvoted on that front…)

> Paolo
>
>> Is it really worthwhile for KVM to use XFD at all instead of
>> preallocating the state and being done with it?  KVM would still have
>> to avoid data loss if the guest sets XFD with non-init state, but #NM
>> could always pass through.
>>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13 14:59                 ` Andy Lutomirski
@ 2021-10-13 15:05                   ` Paolo Bonzini
  0 siblings, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-13 15:05 UTC (permalink / raw)
  To: Andy Lutomirski, Liu, Jing2, Thomas Gleixner, Linux Kernel Mailing List
  Cc: the arch/x86 maintainers, Bae, Chang Seok, Dave Hansen,
	Arjan van de Ven, kvm list, Nakajima, Jun, Jing Liu,
	Sean Christopherson

On 13/10/21 16:59, Andy Lutomirski wrote:
>> 
>> Thinking more about it, #NM only has to be trapped if XCR0 enables
>> a dynamic feature.  In other words, the guest value of XFD can be
>> limited to (host_XFD|guest_XFD) & guest_XCR0.  This avoids that
>> KVM unnecessarily traps for old guests that use CR0.TS.
>> 
> You could simplify this by allocating the state the first time XCR0
> enables the feature in question.
> 
> (This is how regular non-virt userspace*should*  work too, but it
> looks like I’ve probably been outvoted on that front…)

Good point, you could do that too and do the work on the XCR0 vmexit 
instead of #NM.

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13  6:15     ` Liu, Jing2
  2021-10-13  6:26       ` Paolo Bonzini
@ 2021-10-13 15:12       ` Thomas Gleixner
  2021-10-14  8:21         ` Liu, Jing2
  1 sibling, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-13 15:12 UTC (permalink / raw)
  To: Liu, Jing2, Paolo Bonzini, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc

Jing,

On Wed, Oct 13 2021 at 06:15, Jing2 Liu wrote:
>> On 12/10/21 02:00, Thomas Gleixner wrote:
> When looking into the tglx/devel.git x86/fpu for the full #1-#4 
> series and the KVM AMX support, I'd like to talk two things
>  as follows,
>
> 1. KVM dynamic allocation API:
> Since KVM also uses dynamic allocation, after KVM detects guest
> requesting AMX by #NM trap, KVM need alloc extra buffer for
> this vcpu's current->thread.fpu.fpstate and guest_fpu related.
> So far, the kernel itself has such API like fpstate_realloc(), but it's
> static. How about making a common function usable for KVM?

Just making that function usable without a proper design how this should
work at all does not solve anything.

We first need a conclusion vs. buffer reallocation.

Once that is sorted then we can create proper infrastructure for that in
the FPU core code and not just expose a random function to KVM and hack
it into submssion.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 14/31] x86/fpu: Replace KVMs homebrewn FPU copy from user
  2021-10-13 14:57     ` Sean Christopherson
@ 2021-10-13 15:12       ` Paolo Bonzini
  2021-10-13 15:16       ` Thomas Gleixner
  1 sibling, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-13 15:12 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: Thomas Gleixner, LKML, x86, Chang S. Bae, Dave Hansen,
	Arjan van de Ven, kvm

On 13/10/21 16:57, Sean Christopherson wrote:
>>> +int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0,
>>> +			      u32 *vpkru)
>> Right, except that there's no @vcpu in the args of that function. I
>> guess you could call it
>>
>> fpu_copy_kvm_uabi_to_buf()
>>
>> and that @buf can be
>>
>> vcpu->arch.guest_fpu
> But the existing @buf is the userspace pointer, which semantically makes sense
> because the userspace pointer is the "buffer" and the destination @fpu (and @prku)
> is vCPU state, not a buffer.
> 
> That said, I also struggled with the lack of @vcpu.  What about prepending vcpu_
> to fpu and to pkru?  E.g.
> 
>    int fpu_copy_kvm_uabi_to_vcpu(struct fpu *vcpu_fpu, const void *buf, u64 xcr0,
>    				u32 *vcpu_pkru)
> 

It doesn't matter much that the source is somehow related to a vCPU, as 
long as the FPU is concerned.  If anything I would even drop the "v" 
from vpkru, but that's really nitpicking.

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 14/31] x86/fpu: Replace KVMs homebrewn FPU copy from user
  2021-10-13 14:57     ` Sean Christopherson
  2021-10-13 15:12       ` Paolo Bonzini
@ 2021-10-13 15:16       ` Thomas Gleixner
  1 sibling, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-13 15:16 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: LKML, x86, Chang S. Bae, Dave Hansen, Arjan van de Ven, kvm,
	Paolo Bonzini

On Wed, Oct 13 2021 at 14:57, Sean Christopherson wrote:
> On Tue, Oct 12, 2021, Borislav Petkov wrote:
>> On Tue, Oct 12, 2021 at 02:00:19AM +0200, Thomas Gleixner wrote:
>> > --- a/arch/x86/include/asm/fpu/api.h
>> > +++ b/arch/x86/include/asm/fpu/api.h
>> > @@ -116,4 +116,7 @@ extern void fpu_init_fpstate_user(struct
>> >  /* KVM specific functions */
>> >  extern void fpu_swap_kvm_fpu(struct fpu *save, struct fpu *rstor, u64 restore_mask);
>> >  
>> > +struct kvm_vcpu;
>> > +extern int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0, u32 *pkru);
>> > +
>> >  #endif /* _ASM_X86_FPU_API_H */
>> > --- a/arch/x86/kernel/fpu/core.c
>> > +++ b/arch/x86/kernel/fpu/core.c
>> > @@ -174,7 +174,43 @@ void fpu_swap_kvm_fpu(struct fpu *save,
>> >  	fpregs_unlock();
>> >  }
>> >  EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpu);
>> > -#endif
>> > +
>> > +int fpu_copy_kvm_uabi_to_vcpu(struct fpu *fpu, const void *buf, u64 xcr0,
>> > +			      u32 *vpkru)
>> 
>> Right, except that there's no @vcpu in the args of that function. I
>> guess you could call it
>> 
>> fpu_copy_kvm_uabi_to_buf()
>> 
>> and that @buf can be
>> 
>> vcpu->arch.guest_fpu
>
> But the existing @buf is the userspace pointer, which semantically makes sense
> because the userspace pointer is the "buffer" and the destination @fpu (and @prku)
> is vCPU state, not a buffer.
>
> That said, I also struggled with the lack of @vcpu.  What about prepending vcpu_
> to fpu and to pkru?  E.g.
>
>   int fpu_copy_kvm_uabi_to_vcpu(struct fpu *vcpu_fpu, const void *buf, u64 xcr0,
>   				u32 *vcpu_pkru)

I've renamed them to:

     fpu_copy_kvm_uabi_to_fpstate()
     fpu_copy_fpstate_to_kvm_uabi()

See
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/log/?h=x86/fpu-1

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13 14:06             ` Thomas Gleixner
@ 2021-10-14  6:50               ` Paolo Bonzini
  2021-10-14  8:02                 ` Liu, Jing2
  2021-10-14 12:23                 ` Thomas Gleixner
  0 siblings, 2 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-14  6:50 UTC (permalink / raw)
  To: Thomas Gleixner, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Andrew Cooper

On 13/10/21 16:06, Thomas Gleixner wrote:
>> - the guest value stored in vcpu->arch.
>>
>> - the "QEMU" value attached to host_fpu.  This one only becomes zero if
>> QEMU requires AMX (which shouldn't happen).
> 
> I don't think that makes sense.
> 
> First of all, if QEMU wants to expose AMX to guests, then it has to ask
> for permission to do so as any other user space process. We're not going
> to make that special just because.

Hmm, I would have preferred if there was no need to enable AMX for the 
QEMU FPU.  But you're saying that guest_fpu needs to swap out to 
current->thread.fpu if the guest is preempted, which would require 
XFD=0; and affect QEMU operation as well.

In principle I don't like it very much; it would be nicer to say "you 
enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and for 
the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you want to 
keep things simple, so it's not a strong objection at all.

> Anything else will just create more problems than it solves. Especially
> #NM handling (think nested guest) and the XFD_ERR additive behaviour
> will be a nasty playground and easy to get wrong.
> 
> Not having that at all makes life way simpler, right?

It is simpler indeed, and it makes sense to start simple.  I am not sure 
if it will hold, but I agree it's better for the first implementation.

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14  6:50               ` Paolo Bonzini
@ 2021-10-14  8:02                 ` Liu, Jing2
  2021-10-14  9:01                   ` Paolo Bonzini
  2021-10-14 12:23                 ` Thomas Gleixner
  1 sibling, 1 reply; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-14  8:02 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On 10/14/2021 2:50 PM, Paolo Bonzini wrote:
> On 13/10/21 16:06, Thomas Gleixner wrote:
> >> - the guest value stored in vcpu->arch.
> >>
> >> - the "QEMU" value attached to host_fpu.  This one only becomes zero
> >> if QEMU requires AMX (which shouldn't happen).
> >
> > I don't think that makes sense.
> >
> > First of all, if QEMU wants to expose AMX to guests, then it has to
> > ask for permission to do so as any other user space process. We're not
> > going to make that special just because.
> 
> Hmm, I would have preferred if there was no need to enable AMX for the
> QEMU FPU.  But you're saying that guest_fpu needs to swap out to
> current->thread.fpu if the guest is preempted, which would require
> XFD=0; and affect QEMU operation as well.

For preemption, if guest_fpu XFD is used as KVM internal value, then
we can simply set current->thread.fpu XFD the same as KVM internal
value in vmexit so kernel preemption can refer to it.

Thus, I think this issue doesn't much effect if enabling AMX for Qemu
FPU or not.

> 
> In principle I don't like it very much; it would be nicer to say "you
> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and for
> the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you want to
> keep things simple, so it's not a strong objection at all.
> 

Does this mean that KVM allocate 3 buffers via 
1) Qemu's request, instead of via 2) guest XCR0 trap? 

For the two ways, I think what we need care is the same: a) allocation time;
b) lazy passthrough time which related to XFD handling and values. Because
we don't want always rdmsr and clear XFD in vmexit, and don't want to
trap different XFD switching in guest.

For 1), Qemu need prctl() and ioctl(ENABLE_DYN_FEATURE).
But *when* does Qemu do ioctl(ENABLE_DYN_FEATURE)? I mean if
guest XCR0 doesn't set bit18, then KVM doesn't need alloc 3 buffers
at all.

Thus, XCR0 trap is a simple way?

Meanwhile, for lazy passthrough, do we want to make it when guest
wrmsr trap (i.e. guest changes XFD, not inits XFD) if using 1) qemu's
request?  Or using 2) via XCR0 trap, directly passthrough when XCR0
trap?

> > Anything else will just create more problems than it solves. Especially
> > #NM handling (think nested guest) and the XFD_ERR additive behaviour
> > will be a nasty playground and easy to get wrong.
> >
> > Not having that at all makes life way simpler, right?
> 
> It is simpler indeed, and it makes sense to start simple.  
I'd like to confirm which is the simpler way we'd like to :)

Thanks,
Jing

I am not sure
> if it will hold, but I agree it's better for the first implementation.
> 
> Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-13 15:12       ` Thomas Gleixner
@ 2021-10-14  8:21         ` Liu, Jing2
  2021-10-14 13:08           ` Thomas Gleixner
  0 siblings, 1 reply; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-14  8:21 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc



> > 1. KVM dynamic allocation API:
> > Since KVM also uses dynamic allocation, after KVM detects guest
> > requesting AMX by #NM trap, KVM need alloc extra buffer for this
> > vcpu's current->thread.fpu.fpstate and guest_fpu related.
> > So far, the kernel itself has such API like fpstate_realloc(), but
> > it's static. How about making a common function usable for KVM?
> 
> Just making that function usable without a proper design how this should
> work at all does not solve anything.
> 
> We first need a conclusion vs. buffer reallocation.
> 
> Once that is sorted then we can create proper infrastructure for that in the
> FPU core code and not just expose a random function to KVM and hack it into
> submssion.
Yes, we need a consensus on the way we choose and then to see if need a
kernel function for KVM usage.

Thanks,
Jing

> 
> Thanks,
> 
>         tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14  8:02                 ` Liu, Jing2
@ 2021-10-14  9:01                   ` Paolo Bonzini
  2021-10-14 11:21                     ` Liu, Jing2
                                       ` (2 more replies)
  0 siblings, 3 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-14  9:01 UTC (permalink / raw)
  To: Liu, Jing2, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On 14/10/21 10:02, Liu, Jing2 wrote:
>> In principle I don't like it very much; it would be nicer to say "you
>> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and for
>> the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you want to
>> keep things simple, so it's not a strong objection at all.
> 
> Does this mean that KVM allocate 3 buffers via
> 1) Qemu's request, instead of via 2) guest XCR0 trap?

Based on the input from Andy and Thomas, the new way would be like this:

1) host_fpu must always be checked for reallocation in 
kvm_load_guest_fpu (or in the FPU functions that it calls, that depends 
on the rest of Thomas's patches).  That's because arch_prctl can enable 
AMX for QEMU at any point after KVM_CREATE_VCPU.

2) every use of vcpu->arch.guest_supported_xcr0 is changed to only 
include those dynamic-feature bits that were enabled via arch_prctl.
That is, something like:

static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.guest_supported_xcr0 &
		(~xfeatures_mask_user_dynamic | \
		 current->thread.fpu.dynamic_state_perm);
}

3) Even with passthrough disabled, the guest can run with XFD set to 
vcpu->arch.guest_xfd (and likewise for XFD_ERR) which is much simpler 
than trapping #NM.  The traps for writing XCR0 and XFD are used to 
allocate dynamic state for guest_fpu, and start the passthrough of XFD 
and XFD_ERR.  What we need is:

- if a dynamic state has XCR0[n]=0, bit n will never be set in XFD_ERR 
and the state will never be dirtied by the guest.

- if a dynamic state has XCR0[n]=1, but all enabled dynamic states have 
XFD[n]=1, the guest is not able to dirty any dynamic XSAVE state, 
because they all have either XCR0[n]=0 or XFD[n]=1.  An attempt to do so 
will cause an #NM trap and set the bit in XFD_ERR.

- if a dynamic state has XCR0[n]=1 and XFD[n]=0, the state for bit n is 
allocated in guest_fpu, and it can also disable the vmexits for XFD and 
XFD_ERR.

Therefore:

- if passthrough is disabled, the XCR0 and XFD write traps can check 
guest_xcr0 & ~guest_xfd.  If it includes a dynamic state bit, dynamic 
state is allocated for all bits enabled in guest_xcr0 and passthrough is 
started; this should happen shortly after the guest gets its first #NM 
trap for AMX.

- if passthrough is enabled, the XCR0 write trap must still ensure that 
dynamic state is allocated for all bits enabled in guest_xcr0.

So something like this pseudocode is called by both XCR0 and XFD writes:

int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu)
{
	u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;
	u64 enabled_dynamic =
		vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;

	/* All dynamic features have to be arch_prctl'd first.  */
	WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);

	if (!vcpu->arch.xfd_passthrough) {
		/* All dynamic states will #NM?  Wait and see.  */
		if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
			return 0;

		kvm_x86_ops.enable_xfd_passthrough(vcpu);
	}

	/* current->thread.fpu was already handled by arch_prctl.  */
	return fpu_alloc_features(vcpu->guest_fpu,
		vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic);
}

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14  9:01                   ` Paolo Bonzini
@ 2021-10-14 11:21                     ` Liu, Jing2
  2021-10-14 11:33                       ` Paolo Bonzini
  2021-10-14 11:30                     ` Liu, Jing2
  2021-10-14 14:09                     ` Thomas Gleixner
  2 siblings, 1 reply; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-14 11:21 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On 10/14/2021 5:01 PM, Paolo Bonzini wrote:

> On 14/10/21 10:02, Liu, Jing2 wrote:
> >> In principle I don't like it very much; it would be nicer to say "you
> >> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and
> >> for the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you
> >> want to keep things simple, so it's not a strong objection at all.
> >
> > Does this mean that KVM allocate 3 buffers via
> > 1) Qemu's request, instead of via 2) guest XCR0 trap?
> 
> Based on the input from Andy and Thomas, the new way would be like this:
> 
> 1) host_fpu must always be checked for reallocation in kvm_load_guest_fpu
> (or in the FPU functions that it calls, that depends on the rest of Thomas's
> patches).  That's because arch_prctl can enable AMX for QEMU at any point
> after KVM_CREATE_VCPU.

For Qemu's XFD, I'd like to confirm that:
Since the arch_prctl() onlys add current->group_leader->thread.fpu's  state_perm,
__state_size, (current->thread.fpu.* is not changed), thus in
kvm_load_guest_fpu, host_fpu->xfd is always 1. That is to say, Qemu's arch_prctl()
doesn't change any copies of XFD.

> 
> 2) every use of vcpu->arch.guest_supported_xcr0 is changed to only include
> those dynamic-feature bits that were enabled via arch_prctl.
> That is, something like:
> 
> static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu) {
> 	return vcpu->arch.guest_supported_xcr0 &
> 		(~xfeatures_mask_user_dynamic | \
> 		 current->thread.fpu.dynamic_state_perm);
> }
> 
> 3) Even with passthrough disabled, the guest can run with XFD set to
> vcpu->arch.guest_xfd (and likewise for XFD_ERR) which is much simpler
> than trapping #NM.  The traps for writing XCR0 and XFD are used to allocate
> dynamic state for guest_fpu, and start the passthrough of XFD and XFD_ERR.
> What we need is:
> 
> - if a dynamic state has XCR0[n]=0, bit n will never be set in XFD_ERR and the
> state will never be dirtied by the guest.
> 
> - if a dynamic state has XCR0[n]=1, but all enabled dynamic states have
> XFD[n]=1, the guest is not able to dirty any dynamic XSAVE state, because
> they all have either XCR0[n]=0 or XFD[n]=1.  An attempt to do so will cause an
> #NM trap and set the bit in XFD_ERR.
> 
> - if a dynamic state has XCR0[n]=1 and XFD[n]=0, the state for bit n is
> allocated in guest_fpu, and it can also disable the vmexits for XFD and
> XFD_ERR.
> 

Got it, the principle is once XCR0[n]=1 and XFD[n]=0, then guest is allowed
to use the dynamic XSAVE state, thus KVM must prepare all things well
before. This probably happens shortly after guest #NM.

Only one thing: it seems we assume that vcpu->arch.xfd is guest runtime
value. And before guest initializes XFD, KVM provides
vcpu->arch.xfd[18]=1, right? But the spec asks XFD reset value as zero.
If so, between guest init XCR0 to 1 and init XFD to 1, it's XCR0[n]=1 and
XFD[n]=0. If a guest never init XFD and directly use dynamic state...

Or do we want to provide guest a XFD[18]=1 value at the very beginning?

> Therefore:
> 
> - if passthrough is disabled, the XCR0 and XFD write traps can check
> guest_xcr0 & ~guest_xfd.  If it includes a dynamic state bit, dynamic state is
> allocated for all bits enabled in guest_xcr0 and passthrough is started; this
> should happen shortly after the guest gets its first #NM trap for AMX.
> 
> - if passthrough is enabled, the XCR0 write trap must still ensure that
> dynamic state is allocated for all bits enabled in guest_xcr0.
> 
> So something like this pseudocode is called by both XCR0 and XFD writes:
> 
> int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu) {
> 	u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;
> 	u64 enabled_dynamic =
> 		vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;
> 
> 	/* All dynamic features have to be arch_prctl'd first.  */
> 	WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);
> 
> 	if (!vcpu->arch.xfd_passthrough) {
> 		/* All dynamic states will #NM?  Wait and see.  */
> 		if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
Here, when guest init XCR0 to 1, vcpu->arch.xfd should be 1
otherwise XCR0 trap makes passthrough and allocates buffer, which
is not what we want.

> 			return 0;
> 
> 		kvm_x86_ops.enable_xfd_passthrough(vcpu);
> 	}
> 
> 	/* current->thread.fpu was already handled by arch_prctl.  */
It seems so far, arch_prctl does not change current->thread.fpu,
only #NM handler itself does it. We here alloc current too.

Thanks,
Jing
> 	return fpu_alloc_features(vcpu->guest_fpu,
> 		vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic); }
> 
> Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14  9:01                   ` Paolo Bonzini
  2021-10-14 11:21                     ` Liu, Jing2
@ 2021-10-14 11:30                     ` Liu, Jing2
  2021-10-14 11:39                       ` Paolo Bonzini
  2021-10-14 14:09                     ` Thomas Gleixner
  2 siblings, 1 reply; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-14 11:30 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew


On 10/14/2021 5:01 PM, Paolo Bonzini wrote:
> On 14/10/21 10:02, Liu, Jing2 wrote:
> >> In principle I don't like it very much; it would be nicer to say "you
> >> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and
> >> for the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you
> >> want to keep things simple, so it's not a strong objection at all.
> >
> > Does this mean that KVM allocate 3 buffers via
> > 1) Qemu's request, instead of via 2) guest XCR0 trap?
> 
> Based on the input from Andy and Thomas, the new way would be like this:
> 
> 1) host_fpu must always be checked for reallocation in kvm_load_guest_fpu
> (or in the FPU functions that it calls, that depends on the rest of Thomas's
> patches).  That's because arch_prctl can enable AMX for QEMU at any point
> after KVM_CREATE_VCPU.
> 
> 2) every use of vcpu->arch.guest_supported_xcr0 is changed to only include
> those dynamic-feature bits that were enabled via arch_prctl.
> That is, something like:
> 
> static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu) {
> 	return vcpu->arch.guest_supported_xcr0 &
> 		(~xfeatures_mask_user_dynamic | \
> 		 current->thread.fpu.dynamic_state_perm);
> }
> 
> 3) Even with passthrough disabled, the guest can run with XFD set to
> vcpu->arch.guest_xfd (and likewise for XFD_ERR) which is much simpler
> than trapping #NM.  The traps for writing XCR0 and XFD are used to allocate
> dynamic state for guest_fpu, and start the passthrough of XFD and XFD_ERR.

For XFD_ERR, since it can be auto changed by HW, write-protect is not
need I think. KVM also not need trap rdmsr of it because no use.

I guess we're worrying about is when KVM is sched_out, a nonzero XFD_ERR
can be lost by other host thread. We can save guest XFD_ERR in sched_out
and restore before next vmenter. Kernel is assumed not using AMX thus
softirq won't make it lost.
I think this solves the problem. So we can directly passthrough RW of it,
and no need to rdmsr(XFD_ERR) in vmexit.

Thanks,
Jing
 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 11:21                     ` Liu, Jing2
@ 2021-10-14 11:33                       ` Paolo Bonzini
  0 siblings, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-14 11:33 UTC (permalink / raw)
  To: Liu, Jing2, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On 14/10/21 13:21, Liu, Jing2 wrote:
> Got it, the principle is once XCR0[n]=1 and XFD[n]=0, then guest is allowed
> to use the dynamic XSAVE state, thus KVM must prepare all things well
> before. This probably happens shortly after guest #NM.
> 
> Only one thing: it seems we assume that vcpu->arch.xfd is guest runtime
> value. And before guest initializes XFD, KVM provides
> vcpu->arch.xfd[18]=1, right? But the spec asks XFD reset value as zero.
> If so, between guest init XCR0 to 1 and init XFD to 1, it's XCR0[n]=1 and
> XFD[n]=0. If a guest never init XFD and directly use dynamic state...
> 
> Or do we want to provide guest a XFD[18]=1 value at the very beginning?

On reset the guest value has to be zero.  For Linux, which we control, 
we probably want to write the bit in XFD before XSETBV.  For other OSes 
there's nothing we can do, but hopefully they make similar considerations.

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 11:30                     ` Liu, Jing2
@ 2021-10-14 11:39                       ` Paolo Bonzini
  2021-11-22  8:50                         ` Liu, Jing2
  0 siblings, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-14 11:39 UTC (permalink / raw)
  To: Liu, Jing2, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On 14/10/21 13:30, Liu, Jing2 wrote:
> I guess we're worrying about is when KVM is sched_out, a nonzero XFD_ERR
> can be lost by other host thread. We can save guest XFD_ERR in sched_out
> and restore before next vmenter. Kernel is assumed not using AMX thus
> softirq won't make it lost.
> I think this solves the problem. So we can directly passthrough RW of it,
> and no need to rdmsr(XFD_ERR) in vmexit.

Correct; you can also use the "user-return MSRs" machinery (until Linux 
starts using AMX in the kernel, but that shouldn't happen too soon).

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14  6:50               ` Paolo Bonzini
  2021-10-14  8:02                 ` Liu, Jing2
@ 2021-10-14 12:23                 ` Thomas Gleixner
  2021-10-14 12:26                   ` Paolo Bonzini
  1 sibling, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-14 12:23 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Andrew Cooper

Paolo,

On Thu, Oct 14 2021 at 08:50, Paolo Bonzini wrote:
> On 13/10/21 16:06, Thomas Gleixner wrote:
>>> - the guest value stored in vcpu->arch.
>>>
>>> - the "QEMU" value attached to host_fpu.  This one only becomes zero if
>>> QEMU requires AMX (which shouldn't happen).
>> 
>> I don't think that makes sense.
>> 
>> First of all, if QEMU wants to expose AMX to guests, then it has to ask
>> for permission to do so as any other user space process. We're not going
>> to make that special just because.
>
> Hmm, I would have preferred if there was no need to enable AMX for the 
> QEMU FPU.  But you're saying that guest_fpu needs to swap out to 
> current->thread.fpu if the guest is preempted, which would require 
> XFD=0; and affect QEMU operation as well.

Exactly. If we don't enable it for QEMY itself, then this is creating
just a horrible inconsistency which requires nasty hacks. I'm not at
all interested in those as I just got rid of quite some and made the
code consistent.

> In principle I don't like it very much; it would be nicer to say "you 
> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and for 
> the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you want to 
> keep things simple, so it's not a strong objection at all.

Errm.

   qemu()
     read_config()
     if (dynamic_features_passthrough())
	request_permission(feature)             <- prctl(ARCH_SET_STATE_ENABLE)

     create_vcpu_threads()
       ....

       vcpu_thread()
	 kvm_ioctl(ENABLE_DYN_FEATURE, feature) <- KVM ioctl

That's what I lined out, right?

>> Anything else will just create more problems than it solves. Especially
>> #NM handling (think nested guest) and the XFD_ERR additive behaviour
>> will be a nasty playground and easy to get wrong.
>> 
>> Not having that at all makes life way simpler, right?
>
> It is simpler indeed, and it makes sense to start simple.  I am not sure 
> if it will hold, but I agree it's better for the first implementation.

KISS is a very reasonable engineering principle :)

If there is a real world use case and a proper technical justification
for doing the dynamic buffer allocation then I'm happy to discuss that.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 12:23                 ` Thomas Gleixner
@ 2021-10-14 12:26                   ` Paolo Bonzini
  2021-10-14 14:23                     ` Thomas Gleixner
  0 siblings, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-14 12:26 UTC (permalink / raw)
  To: Thomas Gleixner, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Andrew Cooper

On 14/10/21 14:23, Thomas Gleixner wrote:
>> In principle I don't like it very much; it would be nicer to say "you
>> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and for
>> the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you want to
>> keep things simple, so it's not a strong objection at all.
> Errm.
> 
>     qemu()
>       read_config()
>       if (dynamic_features_passthrough())
> 	request_permission(feature)             <- prctl(ARCH_SET_STATE_ENABLE)
> 
>       create_vcpu_threads()
>         ....
> 
>         vcpu_thread()
> 	 kvm_ioctl(ENABLE_DYN_FEATURE, feature) <- KVM ioctl
> 
> That's what I lined out, right?
> 

I meant prctl for QEMU-in-user-mode vs. ioctl QEMU-in-guest-mode (i.e. 
no prctl if only the guest uses it).  But anyway it's just abstract 
"beauty", let's stick to simple. :)

Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14  8:21         ` Liu, Jing2
@ 2021-10-14 13:08           ` Thomas Gleixner
  0 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-14 13:08 UTC (permalink / raw)
  To: Liu, Jing2, Paolo Bonzini, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc

On Thu, Oct 14 2021 at 08:21, Jing2 Liu wrote:
>> 
>> Once that is sorted then we can create proper infrastructure for that in the
>> FPU core code and not just expose a random function to KVM and hack it into
>> submssion.
> Yes, we need a consensus on the way we choose and then to see if need a
> kernel function for KVM usage.

The question is not 'if'. The question is 'which' functionality we need.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14  9:01                   ` Paolo Bonzini
  2021-10-14 11:21                     ` Liu, Jing2
  2021-10-14 11:30                     ` Liu, Jing2
@ 2021-10-14 14:09                     ` Thomas Gleixner
  2021-10-14 14:37                       ` Thomas Gleixner
  2021-10-14 15:01                       ` Paolo Bonzini
  2 siblings, 2 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-14 14:09 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On Thu, Oct 14 2021 at 11:01, Paolo Bonzini wrote:
> On 14/10/21 10:02, Liu, Jing2 wrote:
> Based on the input from Andy and Thomas, the new way would be like this:
>
> 1) host_fpu must always be checked for reallocation in 
> kvm_load_guest_fpu (or in the FPU functions that it calls, that depends 
> on the rest of Thomas's patches).  That's because arch_prctl can enable 
> AMX for QEMU at any point after KVM_CREATE_VCPU.

No.

   1) QEMU starts
   2) QEMU requests permissions via prctl()
   3) QEMU creates vCPU threads

Doing it the other way around makes no sense at all and wont work.

> 2) every use of vcpu->arch.guest_supported_xcr0 is changed to only 
> include those dynamic-feature bits that were enabled via arch_prctl.
> That is, something like:
>
> static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu)
> {
> 	return vcpu->arch.guest_supported_xcr0 &
> 		(~xfeatures_mask_user_dynamic | \
> 		 current->thread.fpu.dynamic_state_perm);

Bah. You can't get enough from poking in internals, right?

vcpu_create()

  fpu_init_fpstate_user(guest_fpu, supported_xcr0)

That will (it does not today) do:

     guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();

for you. Once.

The you have the information you need right in the guest FPU.

See?

> So something like this pseudocode is called by both XCR0 and XFD writes:
>
> int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu)
> {
> 	u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;

That's not a valid assumption.

> 	u64 enabled_dynamic =
> 		vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;
>
> 	/* All dynamic features have to be arch_prctl'd first.  */
> 	WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);
>
> 	if (!vcpu->arch.xfd_passthrough) {
> 		/* All dynamic states will #NM?  Wait and see.  */
> 		if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
> 			return 0;
>
> 		kvm_x86_ops.enable_xfd_passthrough(vcpu);
> 	}
>
> 	/* current->thread.fpu was already handled by arch_prctl.  */

No. current->thread.fpu has the default buffer unless QEMU used AMX or
something forced it to allocate a larger buffer.

> 	return fpu_alloc_features(vcpu->guest_fpu,
> 		vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic);

This unconditionally calls into that allocation for every XCR0/XFD
trap ?

> }

Also you really should not wait until _all_ dynamic states are cleared
in guest XFD. Because a guest which has bit 18 and 19 available but only
uses one of them is going to trap on every other context switch due to
XFD writes.

So you check for

   (guest_xfd & guest_perm) != guest_perm)

and

   (guest_xr0 & guest_perm) != 0

If both are true, then you reallocate the buffers for _all_ permitted
states _and_ set XFD to pass through.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 12:26                   ` Paolo Bonzini
@ 2021-10-14 14:23                     ` Thomas Gleixner
  0 siblings, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-14 14:23 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Andrew Cooper

On Thu, Oct 14 2021 at 14:26, Paolo Bonzini wrote:
> On 14/10/21 14:23, Thomas Gleixner wrote:
>>> In principle I don't like it very much; it would be nicer to say "you
>>> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and for
>>> the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you want to
>>> keep things simple, so it's not a strong objection at all.
>> Errm.
>> 
>>     qemu()
>>       read_config()
>>       if (dynamic_features_passthrough())
>> 	request_permission(feature)             <- prctl(ARCH_SET_STATE_ENABLE)
>> 
>>       create_vcpu_threads()
>>         ....
>> 
>>         vcpu_thread()
>> 	 kvm_ioctl(ENABLE_DYN_FEATURE, feature) <- KVM ioctl
>> 
>> That's what I lined out, right?
>> 
>
> I meant prctl for QEMU-in-user-mode vs. ioctl QEMU-in-guest-mode (i.e. 
> no prctl if only the guest uses it).  But anyway it's just abstract 
> "beauty", let's stick to simple. :)

It's not about simple. It's about correctness in the first place.

The prctl() is process wide and grants permission. If that permission is
not granted, e.g. by a seccomp rule, then the vCPU threads cannot use it
either. We are _not_ making exceptions for KVM just because it's KVM.

Trying to pretend that the usermode thread does not need it is just
illusion. The kernel representation of that very usermode vCPU thread must
have a large fpstate. It still can have XFD set, but that's a detail.

So what you are trying to sell me has nothing to do with beauty at all
except when your definition of beauty originates from a tunnel of horrors.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 14:09                     ` Thomas Gleixner
@ 2021-10-14 14:37                       ` Thomas Gleixner
  2021-10-14 15:01                       ` Paolo Bonzini
  1 sibling, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-14 14:37 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On Thu, Oct 14 2021 at 16:09, Thomas Gleixner wrote:
> On Thu, Oct 14 2021 at 11:01, Paolo Bonzini wrote:
>
> Also you really should not wait until _all_ dynamic states are cleared
> in guest XFD. Because a guest which has bit 18 and 19 available but only
> uses one of them is going to trap on every other context switch due to
> XFD writes.
>
> So you check for
>
>    (guest_xfd & guest_perm) != guest_perm)
>
> and
>
>    (guest_xr0 & guest_perm) != 0
>
> If both are true, then you reallocate the buffers for _all_ permitted
> states _and_ set XFD to pass through.

And for that to work we must write XFD _before_ XSETBV in the guest boot
phase.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 14:09                     ` Thomas Gleixner
  2021-10-14 14:37                       ` Thomas Gleixner
@ 2021-10-14 15:01                       ` Paolo Bonzini
  2021-10-14 19:14                         ` Thomas Gleixner
  2021-10-15  9:00                         ` Liu, Jing2
  1 sibling, 2 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-14 15:01 UTC (permalink / raw)
  To: Thomas Gleixner, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On 14/10/21 16:09, Thomas Gleixner wrote:
> On Thu, Oct 14 2021 at 11:01, Paolo Bonzini wrote:
>> On 14/10/21 10:02, Liu, Jing2 wrote:
>> Based on the input from Andy and Thomas, the new way would be like this:
>>
>> 1) host_fpu must always be checked for reallocation in
>> kvm_load_guest_fpu (or in the FPU functions that it calls, that depends
>> on the rest of Thomas's patches).  That's because arch_prctl can enable
>> AMX for QEMU at any point after KVM_CREATE_VCPU.
> 
> No.
> 
>     1) QEMU starts
>     2) QEMU requests permissions via prctl()
>     3) QEMU creates vCPU threads
> 
> Doing it the other way around makes no sense at all and wont work.

Sure, but KVM needs to do something that makes sense even for userspaces 
that are not QEMU.

For example, there could be a program that uses AMX *itself* and does 
not expose it to the guest.  In that case, the arch_prctl can come at 
the point AMX is needed, which can be after the program creates vCPU 
threads.  That's for host_fpu.

For the guest_fpu, I agree that the arch_prctl must come before creating 
vCPUs.

>> 2) every use of vcpu->arch.guest_supported_xcr0 is changed to only
>> include those dynamic-feature bits that were enabled via arch_prctl.
>> That is, something like:
>>
>> static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu)
>> {
>> 	return vcpu->arch.guest_supported_xcr0 &
>> 		(~xfeatures_mask_user_dynamic | \
>> 		 current->thread.fpu.dynamic_state_perm);
> 
> Bah. You can't get enough from poking in internals, right?
> 
> vcpu_create()
> 
>    fpu_init_fpstate_user(guest_fpu, supported_xcr0)
> 
> That will (it does not today) do:
> 
>       guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();
> 
> The you have the information you need right in the guest FPU.

Good, I wasn't aware of the APIs that will be there.

>> int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu)
>> {
>> 	u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;
> 
> That's not a valid assumption.
> 
>> 	u64 enabled_dynamic =
>> 		vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;
>>
>> 	/* All dynamic features have to be arch_prctl'd first.  */
>> 	WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);
>>
>> 	if (!vcpu->arch.xfd_passthrough) {
>> 		/* All dynamic states will #NM?  Wait and see.  */
>> 		if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
>> 			return 0;
>>
>> 		kvm_x86_ops.enable_xfd_passthrough(vcpu);
>> 	}
>>
>> 	/* current->thread.fpu was already handled by arch_prctl.  */
> 
> No. current->thread.fpu has the default buffer unless QEMU used AMX or
> something forced it to allocate a larger buffer.
> 
>> 	return fpu_alloc_features(vcpu->guest_fpu,
>> 		vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic);
> 
> This unconditionally calls into that allocation for every XCR0/XFD
> trap ?

Calls into the function, but doesn't necessarily allocate anything. 
What you wrote below looks correct to me, thanks.

Paolo

> Also you really should not wait until _all_ dynamic states are cleared
> in guest XFD.  Because a guest which has bit 18 and 19 available but only > uses one of them is going to trap on every other context switch due to
> XFD writes.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 15:01                       ` Paolo Bonzini
@ 2021-10-14 19:14                         ` Thomas Gleixner
  2021-10-15  9:20                           ` Liu, Jing2
  2021-10-15  9:36                           ` Thomas Gleixner
  2021-10-15  9:00                         ` Liu, Jing2
  1 sibling, 2 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-14 19:14 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

Paolo,

On Thu, Oct 14 2021 at 17:01, Paolo Bonzini wrote:
> On 14/10/21 16:09, Thomas Gleixner wrote:
>> On Thu, Oct 14 2021 at 11:01, Paolo Bonzini wrote:
>>> On 14/10/21 10:02, Liu, Jing2 wrote:
>>> Based on the input from Andy and Thomas, the new way would be like this:
>>>
>>> 1) host_fpu must always be checked for reallocation in
>>> kvm_load_guest_fpu (or in the FPU functions that it calls, that depends
>>> on the rest of Thomas's patches).  That's because arch_prctl can enable
>>> AMX for QEMU at any point after KVM_CREATE_VCPU.
>> 
>> No.
>> 
>>     1) QEMU starts
>>     2) QEMU requests permissions via prctl()
>>     3) QEMU creates vCPU threads
>> 
>> Doing it the other way around makes no sense at all and wont work.
>
> Sure, but KVM needs to do something that makes sense even for userspaces 
> that are not QEMU.
>
> For example, there could be a program that uses AMX *itself* and does 
> not expose it to the guest.  In that case, the arch_prctl can come at 
> the point AMX is needed, which can be after the program creates vCPU 
> threads.  That's for host_fpu.

That wont affect the vCPU threads unless they start to use AMX in user
space themself. Which means they have the default buffer and their vCPU
user/guest FPU's too.

The prctl() sets the permission nothing else.  As long as they don't use
AMX their XFD[18] stays set. Only when they start using AMX in user
space themself they trigger #NM which allocates a larger buffer for the
thread.

So then the point where it matters is fpu_swap_kvm_fpu() and that's
preemptible context so we can do allocations before fiddling with the
buffers. Not rocket science.

And that has nothing to do with the whole XCR0/XFD/XFD_ERR/#NM guest
mess.

> For the guest_fpu, I agree that the arch_prctl must come before creating 
> vCPUs.

Good :)

>> vcpu_create()
>> 
>>    fpu_init_fpstate_user(guest_fpu, supported_xcr0)
>> 
>> That will (it does not today) do:
>> 
>>       guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();
>> 
>> The you have the information you need right in the guest FPU.
>
> Good, I wasn't aware of the APIs that will be there.

Me neither, but that's a pretty obvious consequence of the work I'm
doing for AMX. So I made it up for you. :)

>> This unconditionally calls into that allocation for every XCR0/XFD
>> trap ?
>
> Calls into the function, but doesn't necessarily allocate anything.

Sure.

> What you wrote below looks correct to me, thanks.
>
> Paolo
>

Properly quoting mail is hard, right?

>> Also you really should not wait until _all_ dynamic states are cleared
>> in guest XFD.  Because a guest which has bit 18 and 19 available but only > uses one of them is going to trap on every other context switch due to
>> XFD writes.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 15:01                       ` Paolo Bonzini
  2021-10-14 19:14                         ` Thomas Gleixner
@ 2021-10-15  9:00                         ` Liu, Jing2
  2021-10-15 10:50                           ` Thomas Gleixner
  1 sibling, 1 reply; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-15  9:00 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew


On 10/14/2021 11:01 PM, Paolo Bonzini wrote:
[...]
> Calls into the function, but doesn't necessarily allocate anything.
> What you wrote below looks correct to me, thanks.
> 

For the guest dynamic state support, based on the latest discussion,
four copies of XFD need be cared and switched, I'd like to list as follows.

- vcpu->arch.xfd: this is the real guest value for running.
Since kernel init XFD before XCR0, so I think KVM can initialize it as
bit[n]=0, for a guest start value. Otherwise, kvm_arch_vcpu_create()
need initializes vcpu->arch.xfd=guest_fpu->xfd=user_fpu->xfd=1.
Guest wrmsr XFD trap will make it update.

- user_fpu->fpstate->xfd: Qemu itself and not for guest, which is
probably always set.

- guest_fpu->fpstate->xfd: this is for KVM internal value between time[*].
KVM reinitializes it as bit[n]=0 (not the same as user_fpu), and it will be
updated when guest wrmsr trap. Thus, before passthrough, it's the same
as vcpu->arch.xfd, thus vmenter/vmexit need not rewrite msr.
After passthrough, this keeps bit[n] as 0 forever.

- current_fpu->fpstate->xfd: it should be the same as KVM internal value
between time[*].
[*] this means between kvm_load_guest_fpu and kvm_put_guest_fpu.

From guest booting timeline,  the values are: 

Booting start...   # In this time, vcpu->arch.xfd[n]=guest_fpu->xfd[n]=0
Init XFD by WRMSR(XFD[n], 1)  	# Then, vcpu->arch.xfd[n]=guest_fpu->xfd[n]=1
Init XCR0 by XSETBV 	
...
#NM WRMSR(XFD[n], 0)  # Then, guest_fpu->xfd[n]=0, vcpu->arch.xfd[n]=0.
vcpu->arch.xfd will be updated in later vmexits. 

BTW, we only need lazy-passthrough XFD WRITE and passthrough
READ directly.

Thanks,
Jing

> Paolo
> 
> > Also you really should not wait until _all_ dynamic states are cleared
> > in guest XFD.  Because a guest which has bit 18 and 19 available but
> > only > uses one of them is going to trap on every other context switch due
> to XFD writes.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 19:14                         ` Thomas Gleixner
@ 2021-10-15  9:20                           ` Liu, Jing2
  2021-10-15  9:36                           ` Thomas Gleixner
  1 sibling, 0 replies; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-15  9:20 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew


On 10/15/2021 3:14 AM, Thomas Gleixner wrote:
> Paolo,
> 
[...]
> >> vcpu_create()
> >>
> >>    fpu_init_fpstate_user(guest_fpu, supported_xcr0)
> >>
> >> That will (it does not today) do:
> >>
> >>       guest_fpu::__state_perm = supported_xcr0 &
> >> xstate_get_group_perm();
> >>
> >> The you have the information you need right in the guest FPU.
> >
> > Good, I wasn't aware of the APIs that will be there.
> 
> Me neither, but that's a pretty obvious consequence of the work I'm doing
> for AMX. So I made it up for you. :)

Do you mean that fpu_init_fpstate_user() will be updated to add
supported_xcr0 later? :) 

I'm thinking if guest_fpu::xfd is good to directly initialize as user's
init_fpstate.xfd. Because before guest initializes XFD, "hardware"
is reset value. So it would be better to make guest_fpu::xfd the
same so no need to reload zero before vmenter during this time.

Thanks,
Jing

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 19:14                         ` Thomas Gleixner
  2021-10-15  9:20                           ` Liu, Jing2
@ 2021-10-15  9:36                           ` Thomas Gleixner
  2021-10-15 14:24                             ` Liu, Jing2
  1 sibling, 1 reply; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-15  9:36 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

Paolo,

On Thu, Oct 14 2021 at 21:14, Thomas Gleixner wrote:
> On Thu, Oct 14 2021 at 17:01, Paolo Bonzini wrote:
>>> vcpu_create()
>>> 
>>>    fpu_init_fpstate_user(guest_fpu, supported_xcr0)
>>> 
>>> That will (it does not today) do:
>>> 
>>>       guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();
>>> 
>>> The you have the information you need right in the guest FPU.
>>
>> Good, I wasn't aware of the APIs that will be there.
>
> Me neither, but that's a pretty obvious consequence of the work I'm
> doing for AMX. So I made it up for you. :)

let me make some more up for you!

If you carefully look at part 2 of the rework, then you might notice
that there is a fundamental change which allows to do a real
simplification for KVM FPU handling:

   current->thread.fpu.fpstate

is now a pointer. So you can spare one FPU allocation because we can now
do:

fpu_attach_guest_fpu(supported_xcr0)
{
        guest_fpstate = alloc_fpstate(supported_xcr0);
        fpu_init_fpstate_user(guest_fpstate, supported_xcr0);
        current->thread.fpu.guest_fpstate = guest_fpstate;
}

fpu_swap_kvm_fpu() becomes in the first step:

fpu_swap_kvm_fpu(bool enter_guest)
{
        safe_fpregs_to_fpstate(current->thread.fpu.fpstate);

        swap(current->thread.fpu.fpstate, current->thread.fpu.guest_fpstate);

        restore_fpregs_from_fpstate(current->thread.fpu.fpstate);
}

@enter guest will allow to do some sanity checks

In a second step:

fpu_swap_kvm_fpu(bool enter_guest, u64 guest_needs_features)
{
        possibly_reallocate(enter_guest, guest_needs_features);
        safe_fpregs_to_fpstate(current->thread.fpu.fpstate);

        swap(current->thread.fpu.fpstate, current->thread.fpu.guest_fpstate);

        restore_fpregs_from_fpstate(current->thread.fpu.fpstate);
        possibly_reallocate(enter_guest, guest_needs_features);
}

@guest_needs_features is the information which you gather via guest XCR0
and guest XFD.

So fpu_swap_kvm_fpu() is going to be the place where reallocation happens
and that's good enough for both cases:

vcpu_run()

     fpu_swap_kvm_fpu(); <- 1

     while (...)
           vmenter();

     fpu_swap_kvm_fpu(); <- 2

#1 QEMU user space used feature and has already large fpstate

#2 Guest requires feature but has not used it yet (XCR0/XFD trapping)

See?

It's not only correct, it's also simple and truly beautiful.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-15  9:00                         ` Liu, Jing2
@ 2021-10-15 10:50                           ` Thomas Gleixner
  2021-10-15 11:17                             ` Paolo Bonzini
  2021-10-15 13:01                             ` Liu, Jing2
  0 siblings, 2 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-15 10:50 UTC (permalink / raw)
  To: Liu, Jing2, Paolo Bonzini, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

Jing,

On Fri, Oct 15 2021 at 09:00, Jing2 Liu wrote:
> On 10/14/2021 11:01 PM, Paolo Bonzini wrote:
> For the guest dynamic state support, based on the latest discussion,
> four copies of XFD need be cared and switched, I'd like to list as
> follows.

There will not be 4 copies. Read my last mail and think about the
consequences.

I'm really tired of this tinkering frenzy. There is only one correct
approach to this:

   1) Define the requirements

   2) Define the best trapping mechanism

   3) Sit down, look at the existing code including the FPU rework for
      AMX. Come up with a proper integration plan

   4) Clean up the existing KVM FPU mess further so the integration
      can be done smoothly

   5) Add the required infrastructure in FPU core and KVM

   6) Add the trapping mechanics

   7) Enable feature

What you are doing is looking for the quickest way to duct tape this
into the existing mess.

That might be matching the KVM expectations, but it's not going to
happen.

KVM already violates all well known rules of encapsulation and just
fiddles in the guts of FPU mechanism, duplicates code in buggy ways.

This has to stop now!

You are free to ignore me, but all you are going to achieve is to delay
AMX integration further. Seriously, I'm not even going to reply to
anything which is not based on the above approach.

I'm sure you can figure out at which point we are at the moment.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-15 10:50                           ` Thomas Gleixner
@ 2021-10-15 11:17                             ` Paolo Bonzini
  2021-10-15 13:01                             ` Liu, Jing2
  1 sibling, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-15 11:17 UTC (permalink / raw)
  To: Thomas Gleixner, Liu, Jing2, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On 15/10/21 12:50, Thomas Gleixner wrote:
> That might be matching the KVM expectations, but it's not going to
> happen.
> 
> KVM already violates all well known rules of encapsulation and just
> fiddles in the guts of FPU mechanism, duplicates code in buggy ways.
> 
> This has to stop now!

FWIW, I totally agree about that.  Over the years we've gotten more 
well-thought hooks in the kernel for KVM and less hacks, and I'll only 
be happy if that extends to the FPU code which I'm quite wary of 
touching.  Most of it has been unchanged since Ingo's last rewrite.

Paolo

> You are free to ignore me, but all you are going to achieve is to delay
> AMX integration further. Seriously, I'm not even going to reply to
> anything which is not based on the above approach.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-15 10:50                           ` Thomas Gleixner
  2021-10-15 11:17                             ` Paolo Bonzini
@ 2021-10-15 13:01                             ` Liu, Jing2
  1 sibling, 0 replies; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-15 13:01 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

Hi Thomas,
On 10/15/2021 6:50 PM, Thomas Gleixner wrote:
> Jing,
> 
> On Fri, Oct 15 2021 at 09:00, Jing2 Liu wrote:
> > On 10/14/2021 11:01 PM, Paolo Bonzini wrote:
> > For the guest dynamic state support, based on the latest discussion,
> > four copies of XFD need to be tracked and switched; I'd like to list
> > them as follows.
> 
> There will not be 4 copies. Read my last mail and think about the
> consequences.
> 
Actually, I saw both fpu_init_fpstate_user(vcpu->arch.user_fpu)
and fpu_init_fpstate_user(vcpu->arch.guest_fpu) in the full series,
so I understood that we'd keep it this way. (Your last mail corrects me.)

But yes, these xstate copies really make things complex and fragile,
and I'm glad to work toward a good, clean way. I'll reply with my
thinking (based on your approach below) in that thread later.


> I'm really tired of this tinkering frenzy. There is only one correct approach to
> this:

> 
>    1) Define the requirements
> 
>    2) Define the best trapping mechanism
> 
>    3) Sit down, look at the existing code including the FPU rework for
>       AMX. Come up with a proper integration plan
> 
>    4) Clean up the existing KVM FPU mess further so the integration
>       can be done smoothly
> 
>    5) Add the required infrastructure in FPU core and KVM
> 
>    6) Add the trapping mechanics
> 
>    7) Enable feature
> 
> What you are doing is looking for the quickest way to duct tape this into the
> existing mess.
> 
> That might be matching the KVM expectations, but it's not going to happen.
> 
> KVM already violates all well-known rules of encapsulation, just fiddles
> in the guts of the FPU mechanism, and duplicates code in buggy ways.
> 
> This has to stop now!
> 

Yes, this is an opportunity to make the current KVM FPU code better.

> You are free to ignore me,
Of course I won't, because I also want to find a good approach that
both KVM and the kernel are happy to use.

Thanks,
Jing

> but all you are going to achieve is to delay AMX
> integration further. Seriously, I'm not even going to reply to anything which is
> not based on the above approach.
> 
> I'm sure you can figure out at which point we are at the moment.
> 
> Thanks,
> 
>         tglx


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-15  9:36                           ` Thomas Gleixner
@ 2021-10-15 14:24                             ` Liu, Jing2
  2021-10-15 15:53                               ` Paolo Bonzini
  2021-10-16 14:45                               ` Thomas Gleixner
  0 siblings, 2 replies; 96+ messages in thread
From: Liu, Jing2 @ 2021-10-15 14:24 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

Hi Thomas,

On 10/15/2021 5:36 PM, Thomas Gleixner wrote:
> Paolo,
> 
> On Thu, Oct 14 2021 at 21:14, Thomas Gleixner wrote:
> > On Thu, Oct 14 2021 at 17:01, Paolo Bonzini wrote:
> >>> vcpu_create()
> >>>
> >>>    fpu_init_fpstate_user(guest_fpu, supported_xcr0)
> >>>
> >>> That will (it does not today) do:
> >>>
> >>>       guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();
> >>>
> >>> The you have the information you need right in the guest FPU.
> >>
> >> Good, I wasn't aware of the APIs that will be there.
> >
> > Me neither, but that's a pretty obvious consequence of the work I'm
> > doing for AMX. So I made it up for you. :)
> 
> let me make some more up for you!
> 
> If you carefully look at part 2 of the rework, then you might notice that there
> is a fundamental change which allows a real simplification of the KVM FPU
> handling:
> 
>    current->thread.fpu.fpstate
> 
> is now a pointer. So you can spare one FPU allocation because we can now
> do:

Trying to understand your point: it seems struct fpu will gain a
guest_fpstate pointer, which will be allocated at vcpu_create() time by
the following function, and the two pointers are swapped in load/put.
What I was thinking is that the vcpu keeps guest_fpu and user_fpu is
deleted.

> 
> fpu_attach_guest_fpu(supported_xcr0)
> {
>         guest_fpstate = alloc_fpstate(supported_xcr0);

I suppose this is called from vcpu_create(). I'm not sure about the
reason for the supported_xcr0 input here: supported_xcr0[n]=1 and the
guest's __state_perm[n]=1 are requested before vcpu_create(), so will
this allocate the full buffer already at the vcpu_create() stage?
Or do you mean vcpu->arch.guest_supported_xcr0?

Please correct me if I misunderstood. Thanks.

>         fpu_init_fpstate_user(guest_fpstate, supported_xcr0);
>         current->thread.fpu.guest_fpstate = guest_fpstate;
> }
> 


> fpu_swap_kvm_fpu() becomes in the first step:
> 
> fpu_swap_kvm_fpu(bool enter_guest)
> {
>         safe_fpregs_to_fpstate(current->thread.fpu.fpstate);
> 
>         swap(current->thread.fpu.fpstate, current->thread.fpu.guest_fpstate);
> 
>         restore_fpregs_from_fpstate(current->thread.fpu.fpstate);
> }
> 
> @enter guest will allow to do some sanity checks
> 
> In a second step:
> 
> fpu_swap_kvm_fpu(bool enter_guest, u64 guest_needs_features)
> {
>         possibly_reallocate(enter_guest, guest_needs_features);

When KVM traps the guest wrmsr to XFD in #NM, I think KVM needs to
allocate the fpstate buffer for full features.
Because at the next vmexit, the guest might have dynamic state and KVM
can be preempted before running fpu_swap_kvm_fpu().
Thus, here the current->thread.fpu.fpstate already has enough space
for saving the guest state.

Thanks,
Jing

>         safe_fpregs_to_fpstate(current->thread.fpu.fpstate);
> 
>         swap(current->thread.fpu.fpstate, current->thread.fpu.guest_fpstate);
> 
>         restore_fpregs_from_fpstate(current->thread.fpu.fpstate);
>         possibly_reallocate(enter_guest, guest_needs_features);
> }
> 
> @guest_needs_features is the information which you gather via guest XCR0
> and guest XFD.
> 
> So fpu_swap_kvm_fpu() is going to be the place where reallocation happens
> and that's good enough for both cases:
> 
> vcpu_run()
> 
>      fpu_swap_kvm_fpu(); <- 1
> 
>      while (...)
>            vmenter();
> 
>      fpu_swap_kvm_fpu(); <- 2
> 
> #1 QEMU user space used feature and has already large fpstate
> 
> #2 Guest requires feature but has not used it yet (XCR0/XFD trapping)
> 
> See?
> 
> It's not only correct, it's also simple and truly beautiful.
> 
> Thanks,
> 
>         tglx

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-15 14:24                             ` Liu, Jing2
@ 2021-10-15 15:53                               ` Paolo Bonzini
  2021-10-16 14:45                               ` Thomas Gleixner
  1 sibling, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2021-10-15 15:53 UTC (permalink / raw)
  To: Liu, Jing2, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

On 15/10/21 16:24, Liu, Jing2 wrote:
>> fpu_swap_kvm_fpu(bool enter_guest, u64 guest_needs_features)
>> {
>>          possibly_reallocate(enter_guest, guest_needs_features);
> When KVM traps the guest wrmsr to XFD in #NM, I think KVM needs to
> allocate the fpstate buffer for full features.

You mean XCR0 and XFD (not XFD in #NM), but yeah, at the point of
fpu_swap_kvm_fpu we are in atomic context.

Still, for now the first pass of the AMX implementation doesn't need to
do anything but swap the pointers, and it can simply allocate the full
buffer at vCPU creation.
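
For illustration, a minimal sketch of that first pass; alloc_fpstate(),
fpu_init_fpstate_user() and fpu_swap_kvm_fpu() reuse names from earlier
in this thread, while the wrapper, the vcpu field and the exact
signatures are assumptions:

	/* vcpu_create(): allocate the guest fpstate once, sized for
	 * everything KVM can expose, so it never has to grow */
	static int kvm_alloc_guest_fpstate(struct kvm_vcpu *vcpu,
					   u64 supported_xcr0)
	{
		struct fpstate *fps = alloc_fpstate(supported_xcr0);

		if (!fps)
			return -ENOMEM;

		fpu_init_fpstate_user(fps, supported_xcr0);
		vcpu->arch.guest_fpstate = fps;
		return 0;
	}

	/* vcpu_run(): nothing but the pointer swaps around the loop */
	fpu_swap_kvm_fpu(true);
	/* ... vmenter loop ... */
	fpu_swap_kvm_fpu(false);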

Paolo

> Because at the next vmexit, the guest might have dynamic state and KVM
> can be preempted before running fpu_swap_kvm_fpu().
> Thus, here the current->thread.fpu.fpstate already has enough space
> for saving the guest state.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-15 14:24                             ` Liu, Jing2
  2021-10-15 15:53                               ` Paolo Bonzini
@ 2021-10-16 14:45                               ` Thomas Gleixner
  1 sibling, 0 replies; 96+ messages in thread
From: Thomas Gleixner @ 2021-10-16 14:45 UTC (permalink / raw)
  To: Liu, Jing2, Paolo Bonzini, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew

Jing,

On Fri, Oct 15 2021 at 14:24, Jing2 Liu wrote:
> On 10/15/2021 5:36 PM, Thomas Gleixner wrote:
>> If you carefully look at part 2 of the rework, then you might notice that there
>> is a fundamental change which allows a real simplification of the KVM FPU
>> handling:
>> 
>>    current->thread.fpu.fpstate
>> 
>> is now a pointer. So you can spare one FPU allocation because we can now
>> do:
>
> Trying to understand your point: it seems struct fpu will gain a
> guest_fpstate pointer, which will be allocated at vcpu_create() time by
> the following function, and the two pointers are swapped in load/put.
> What I was thinking is that the vcpu keeps guest_fpu and user_fpu is
> deleted.

Unfortunately we can't do that in vcpu_create() because the thread doing
that is not necessarily the vCPU thread which invokes vcpu_run()
later. But that does not matter much.

So vcpu_create() will do

   vcpu->arch.guest_fpstate = fpu_alloc_guest_fpstate();

and in vcpu_run() invoke

    fpu_swap_kvm_fpstate(guest_fpstate, ...)

which in turn does:

int fpu_swap_kvm_fpstate(struct fpstate *guest_fps, bool enter_guest,
			 u64 restore_mask)
{
	struct fpu *fpu = &current->thread.fpu;
	struct fpstate *fps = fpu->fpstate;

	fpregs_lock();
	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
		save_fpregs_to_fpstate(fpu);

	/* Swap fpstate */
	if (enter_guest) {
		fpu->__task_fpstate = fps;
		fpu->fpstate = guest_fps;
	} else {
		fpu->fpstate = fpu->__task_fpstate;
		fpu->__task_fpstate = NULL;
	}

	fps = fpu->fpstate;

	/*
	 * Once XFD support is added, XFD switching happens here
	 * right before the restore.
	 */
	restore_mask &= XFEATURE_MASK_FPSTATE;
	restore_fpregs_from_fpstate(fps, restore_mask);

	fpregs_mark_activate();
	fpregs_unlock();
	return 0;
}

That's a simplified version of what I have already running on top of the
FPU rework part 3, but you get the idea.

If you are curious:

  https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/log/?h=x86/fpu-3-kvm

If you compare that to the current KVM FPU swap handling then you'll
notice that there is only _one_ buffer required, and in case
TIF_NEED_FPU_LOAD is set there is no memcpy() required either, because
the state is already saved in the to-be-swapped-out buffer.

That's a valuable cleanup and improvement independent of AMX.

See?

This also makes the confidential computing case less awkward because we
can do:

	if (!fpstate->is_scratch && !test_thread_flag(TIF_NEED_FPU_LOAD))
		save_fpregs_to_fpstate(fpu);

instead of the current hack of freeing guest_fpu. See the git tree.

Though I'm not sure whether the logic for this "is_scratch" optimization
is correct as I implemented it, nor am I sure that the current logic in
KVM is correct. But that's just an implementation detail which needs to
be looked at.

XFD support will be also fully consistently integrated:

  XFD will be switched before the restore and this will be fully
  consistent with everything we are doing vs. host side support because
  current->thread.fpu.fpstate->xfd will always be the authoritative
  answer. No need to copy any information from one place to another.

Ergo: No 4 copies of XFD.

>> In a second step:
>> 
>> fpu_swap_kvm_fpu(bool enter_guest, u64 guest_needs_features)
>> {
>>         possibly_reallocate(enter_guest, guest_needs_features);
>
> When KVM traps the guest wrmsr to XFD in #NM, I think KVM needs to
> allocate the fpstate buffer for full features.
> Because at the next vmexit, the guest might have dynamic state and KVM
> can be preempted before running fpu_swap_kvm_fpu().
> Thus, here the current->thread.fpu.fpstate already has enough space
> for saving the guest state.

I think we are talking past each other.

You are looking at this from the point of view of what you have been
doing so far and I am looking at it from a design and abstraction point
of view.

That explains why we have different expectations vs. XCR0/XFD/#NM.

So the regular boring case will be:

H   vcpu_run()
H   	fpu_swap_kvm_fpstate() <- Sets guest_fpstate->xfd
H
H        while (..) {
H           vmenter()

G              ....
G              ....     -> vmexit (unrelated to XCR0/XFD)

H           ...
H        }
H
H  	fpu_swap_kvm_fpstate() <- Sets user (task) XFD

Now let's look at the XFD/XCR0 intercept case:

H   vcpu_run()
H   	fpu_swap_kvm_fpstate() <- Sets guest_fpstate->xfd
H
H        while (..) {
H           vmenter()

G              ....
G              write to XFD/XCR0;        -> intercept

H           ...
H           if (reason == write to XFD/XCR0)) {
H                if (needs_action(guest_fpstate, $XFDVAL, $XCR0VAL)) {
H                        fpstate_set_realloc_request(guest_fpstate);
H
H                        break;
H
H                }
H           }
H           .....
H        }
H
H  	fpu_swap_kvm_fpstate()

fpu_swap_kvm_fpstate() will see the reallocation request in
guest_fpstate and act accordingly.

Both user and guest state are fully consistent at that point. Why?

It does not matter at all whether the wrmsrl(XFD) or XSETBV affecting
XCR0 in the guest happens because the guest decided it is cool to enable
it just for fun or because the guest took a #NM and wrote to XFD.

In both cases the XFD controlled component is in init state at that
point. So there is _nothing_ to save and _nothing_ which can be lost and
no buffer size problem at all.

Therefore it does also not matter whether the vCPU thread gets preempted
or not on the way out to fpu_swap_kvm_fpstate(). It's all still
consistent.
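
That invariant could even be written down as a one-line sanity check;
the register layout accessors here are assumptions, not the posted code:

	/* An XFD-armed component must still be in init state, i.e. its
	 * bit must be clear in the saved XSTATE_BV */
	WARN_ON_ONCE(guest_fps->regs.xsave.header.xfeatures & guest_fps->xfd);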

So fpu_swap_kvm_fpstate() will do in case of a reallocation request:

  1) Allocate new guest fpstate

     If that fails, it does a save and restore with the existing
     fpstates and return an error code which makes KVM drop out to user
     space to decide what to do.

     On success initialize the state properly including the new XFD
     value.

  2) Save guest state into new guest fpstate

  3) Restore host (user) state from the existing fpstate
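
A minimal sketch of those three steps on the exit path of
fpu_swap_kvm_fpstate(); need_realloc, realloc_features, realloc_xfd and
the save/free helpers are made-up names standing in for the final code:

	int ret = 0;

	if (unlikely(guest_fps->need_realloc)) {
		struct fpstate *new_fps;

		/* 1) Allocate and initialize the new guest fpstate,
		 *    including the new XFD value */
		new_fps = alloc_fpstate(guest_fps->realloc_features);
		if (!new_fps) {
			/* Keep the existing fpstates; KVM drops out
			 * to user space to decide what to do */
			ret = -ENOMEM;
			goto restore;
		}
		fpu_init_fpstate_user(new_fps, guest_fps->realloc_features);
		new_fps->xfd = guest_fps->realloc_xfd;

		/* 2) Guest registers are still live: save them into
		 *    the new, larger guest fpstate */
		save_fpregs_to_fpstate_buf(new_fps);

		/* Hand new_fps back to KVM and free the old guest
		 * fpstate (pointer handoff details omitted) */
		free_fpstate(guest_fps);
		guest_fps = new_fps;
	}
restore:
	/* 3) Restore host (user) state from the existing fpstate */
	fpu->fpstate = fpu->__task_fpstate;
	fpu->__task_fpstate = NULL;
	restore_fpregs_from_fpstate(fpu->fpstate, restore_mask);
	return ret;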

See?

It does not even have to allocate a new host (user) fpstate to do
that. Why?

  Because once the fpstate pointer is flipped the state is consistent in
  both directions including XFD.

See?

Now if you think about the other way round then the same principle
applies:

  If the host (user) side of the vCPU thread used a dynamic state it has
  a large buffer, but that does not require the guest side buffer to be
  large as well.

  So this is what Paolo wanted, right? I was fighting that because with
  the existing three-buffer scheme this cannot work.

See?

The point is that: 

  - the simple state switching was impossible because the proposed host
    side infrastructure had the wrong abstraction:

    It just reallocated the register buffer, but did not give
    it a container which carries the other relevant information,
    i.e. features, sizes and importantly xfd.

  - the simple state switching was impossible because the user/guest FPU
    concept of KVM was preventing that.

  - it was tried to plug the reallocation into the wrong place:

    There is no point to do that from inside the vcpu_run() loop. It's a
    one off event and that extra overhead of going out to the place
    where this can be handled sanely does not matter at all.

Just to be clear: I'm not blaming you for any of this at all.

There have been enough senior people involved who should have seen the
obvious instead of misguiding you.

So please just forget the horrors which you went through due to lack of
proper guidance, sit back and think about it.

The code in that git branch is just a first step and requires a few
tweaks to get the reallocation handled correctly, but if you look at the
above then you might realize that there are two related but largely
independent problems to solve:

  1) What needs to be intercepted and analyzed in the intercept handlers
     of XCR0 and XFD

  2) Handling the reallocation problem

#1 is a KVM problem and #2 is a host FPU problem

As you are a KVM wizard, I'll let you sort out #1 with the KVM folks and
I'll look at the #2 part together with Chang and Dave. Why?

  #1 is not my area of expertise, but I surely might have opinions.
  
  #2 is not any different from handling the host side lazy reallocation.

Can you spot the difference between the original approach and the approach
I have taken?

Maybe you understand now why I was explaining over and over that we need
consistent state and asked everyone to look at the AMX host series.

Just for the record. When I looked at that KVM FPU switching exactly two
weeks ago while I was thinking about the right abstraction for AMX, it
was bloody obvious that just reallocating the register state buffer is
wrong. And it was bloody obvious that the user/guest FPU concept of KVM
is nonsense to begin with and going to be in the way of doing a clean
integration.

Why?

Because when you switch buffers, you need to switch state information
which belongs to the buffer, i.e. features, sizes and xfd, as well
because otherwise you create inconsistent state. Sure, you can copy tons
of stuff back and forth, but why would you do that if you can just
switch the full state by swapping the pointer to a container which
contains all the information that is needed and makes everything else
(KVM trap bits aside) just work?

So you can rightfully ask why I did not share that plan right away.

The reason is that I wanted all of you to look at the AMX host series and
I desperately hoped that my insisting on state consistency would make at
least one of the involved people come back and say:

  "Thomas, why don't you do the obvious in fpu_swap_kvm_fpu() and switch
   the fpstate pointers? That FPU switching with the three buffers you
   kept around is bloody stupid."

My answer would have been: "Doh, of course, it's obvious. Stupid me."

But that did not happen. Even when I brought up the

    vcpu_create() -> alloc_and_attach()
    vcpu_run() -> swap() -> vmenter_loop() -> swap()
    vcpu_destroy() -> detach_and_free()

proposal nobody told me:

     "Thomas, this can't work because the thread which creates the  vCPU
      is not necessarily the same as the one which runs it."

No, I had to ask the question myself because I had second thoughts when
I was starting to implement that scheme. I had not thought about that
when I wrote it up in mail simply because I'm not a KVM expert. But it
did not matter because the concept stayed the same, just the
implementation details changed:

    vcpu_create() -> alloc()
    vcpu_run() -> attach() -> vmenter_loop() -> detach()
    vcpu_destroy() -> free()

Why? Because everyone was busy trying to cram their hacks into the code
I just changed instead of rethinking the situation.

See?

Jing, as I said before, I'm not blaming you personally. What I blame is
the general approach to add new features to the kernel:

    Hack it into the existing mess until it works by some definition
    of works.

That simply cannot go anywhere because it makes the code slow and
unmaintainable in the long run.

If a new feature does not fit nicely into the existing code, then the
only viable approach is to sit back, look at the requirements of the new
feature and come up with proper abstractions and a plan how to refactor
the code so that the feature falls into place at the very end and does
not create mess and overhead all over the place.

If you look at the three series I posted, then you see not a single bit
which does not make sense on its own, except for the last series which
adds the fpu config data structure with the pairs of default_* and
max_*.

Even the fpstate pointer makes sense on its own because the resulting
cleanup of the KVM FPU switch code is already worthwhile even w/o AMX
and XFD in terms of memory consumption, performance and robustness.

See?

The real AMX stuff which still needs to be posted is just building upon
this refactoring. It adds the necessary infrastructure for AMX, which is
all slow path code.

In the hotpath it adds only the XFD update at exactly 4 places where
state is saved or restored. IOW, all hotpath operations are exactly the
same. If XFD is not available on a CPU then the overhead of the XFD
update code is a few extra NOPs due to the patched out static branch.
If enabled then yes, it has an extra conditional and when the XFD value
really changes then a wrmsrl, but that's inevitable.
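
For reference, a minimal sketch of that hotpath update; the static key
and the per-CPU shadow are illustrative names, not the actual patches:

	DEFINE_STATIC_KEY_FALSE(xfd_enabled);
	static DEFINE_PER_CPU(u64, xfd_shadow);

	static inline void xfd_update_state(struct fpstate *fps)
	{
		/* Patched-out static branch: a few NOPs when XFD is
		 * not available on this CPU */
		if (static_branch_unlikely(&xfd_enabled)) {
			u64 xfd = fps->xfd;

			/* Only pay for the wrmsrl when the value changes */
			if (__this_cpu_read(xfd_shadow) != xfd) {
				wrmsrl(MSR_IA32_XFD, xfd);
				__this_cpu_write(xfd_shadow, xfd);
			}
		}
	}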

See?

Now if you sit back and look at the KVM concepts I explained above then
you surely can see that the overhead for the KVM case is going to be
exactly a few extra NOPs in the hotpath when XFD is not available.

When XFD is enabled then yes, it needs the extra setup for XFD, but the
common case in the vmenter_loop() will have only a minimalistic overhead
if at all. The common case in fpu_swap_kvm_fpstate() will only grow a
single conditional in the hotpath:

       if (unlikely(guest_fpstate->need_realloc)) {
               /* One-off slow path: handle the reallocation request */
       }

but that's not even measurable.

See?

Thanks,

        Thomas

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
  2021-10-14 11:39                       ` Paolo Bonzini
@ 2021-11-22  8:50                         ` Liu, Jing2
  0 siblings, 0 replies; 96+ messages in thread
From: Liu, Jing2 @ 2021-11-22  8:50 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, LKML
  Cc: x86, Bae, Chang Seok, Dave Hansen, Arjan van de Ven, kvm,
	Nakajima, Jun, Jing Liu, seanjc, Cooper, Andrew, Tian, Jun J

Hi Paolo,

> On 10/14/2021 7:39 PM, Paolo Bonzini wrote:
> 
> On 14/10/21 13:30, Liu, Jing2 wrote:
> > I guess we're worrying about is when KVM is sched_out, a nonzero
> > XFD_ERR can be lost by other host thread. We can save guest XFD_ERR in
> > sched_out and restore before next vmenter. Kernel is assumed not using
> > AMX thus softirq won't make it lost.
> > I think this solves the problem. So we can directly passthrough RW of
> > it, and no need to rdmsr(XFD_ERR) in vmexit.
> 
> Correct; you can also use the "user-return MSRs" machinery (until Linux
> starts using AMX in the kernel, but that shouldn't happen too soon).
> 
Thanks for the suggestion. The user-return MSR mechanism, as used by
emulated MSRs, calls kvm_set_user_return_msr() to wrmsr the guest value,
update the current value, and switch back to the host value once the
kernel exits to userspace.

For XFD_ERR, it's changed automatically by hardware in the guest, so KVM
needs to update the guest XFD_ERR value at the right points in time, which
is different from other user-return MSRs: e.g., at KVM preemption and at
kvm_put_guest_fpu() time, and in both cases no wrmsr is needed. And
kvm_put_guest_fpu() does happen on the return to userspace. Also, XFD_ERR
cannot rely on vmx->guest_uret_msrs_loaded being updated before vmenter,
since the current value may not be up to date. My feeling is that the
mechanism is not well suited to this case and would need special handling.

Since a non-zero guest XFD_ERR is a rare case at vmexit, how about saving
XFD_ERR on preemption, marking flag=true, and restoring it if non-zero
before vcpu enter? This seems a simple and direct way; the drawback is
that if XFD_ERR has not changed when we are scheduled out, KVM needs a
wrmsr anyway, but this only happens when it's non-zero && flag==true.
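
A minimal sketch of what I mean; the guest_xfd_err and xfd_err_valid
fields are hypothetical vcpu state, not existing KVM code:

	/* On sched-out (preemption), while the guest value may be live */
	static void kvm_save_guest_xfd_err(struct kvm_vcpu *vcpu)
	{
		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_xfd_err);
		vcpu->arch.xfd_err_valid = true;
	}

	/* Right before the next vmenter */
	static void kvm_restore_guest_xfd_err(struct kvm_vcpu *vcpu)
	{
		if (vcpu->arch.xfd_err_valid && vcpu->arch.guest_xfd_err)
			wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_xfd_err);
		vcpu->arch.xfd_err_valid = false;
	}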

Thanks,
Jing

> Paolo


^ permalink raw reply	[flat|nested] 96+ messages in thread

end of thread, other threads:[~2021-11-22  8:50 UTC | newest]

Thread overview: 96+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-11 23:59 [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
2021-10-11 23:59 ` [patch 01/31] x86/fpu: Remove pointless argument from switch_fpu_finish() Thomas Gleixner
2021-10-12  0:00 ` [patch 02/31] x86/fpu: Update stale comments Thomas Gleixner
2021-10-12  0:00 ` [patch 03/31] x86/pkru: Remove useless include Thomas Gleixner
2021-10-12  0:00 ` [patch 04/31] x86/fpu: Restrict xsaves()/xrstors() to independent states Thomas Gleixner
2021-10-12 14:24   ` Borislav Petkov
2021-10-12  0:00 ` [patch 05/31] x86/fpu: Cleanup the on_boot_cpu clutter Thomas Gleixner
2021-10-12  0:00 ` [patch 06/31] x86/fpu: Remove pointless memset in fpu_clone() Thomas Gleixner
2021-10-12  0:00 ` [patch 07/31] x86/process: Clone FPU in copy_thread() Thomas Gleixner
2021-10-12  0:00 ` [patch 08/31] x86/fpu: Do not inherit FPU context for kernel and IO worker threads Thomas Gleixner
2021-10-12  0:00 ` [patch 09/31] x86/fpu: Do not inherit FPU context for CLONE_THREAD Thomas Gleixner
2021-10-12 16:10   ` Borislav Petkov
2021-10-12 18:52     ` Thomas Gleixner
2021-10-12 19:01       ` Thomas Gleixner
2021-10-12  0:00 ` [patch 10/31] x86/fpu: Cleanup xstate xcomp_bv initialization Thomas Gleixner
2021-10-12  0:00 ` [patch 11/31] x86/fpu/xstate: Provide and use for_each_xfeature() Thomas Gleixner
2021-10-12 16:45   ` Borislav Petkov
2021-10-12  0:00 ` [patch 12/31] x86/fpu/xstate: Mark all init only functions __init Thomas Gleixner
2021-10-12  0:00 ` [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core Thomas Gleixner
2021-10-12 16:53   ` Borislav Petkov
2021-10-12 18:25     ` Thomas Gleixner
2021-10-12 18:26       ` Thomas Gleixner
2021-10-12 17:22   ` Paolo Bonzini
2021-10-13  6:15     ` Liu, Jing2
2021-10-13  6:26       ` Paolo Bonzini
2021-10-13  7:46         ` Liu, Jing2
2021-10-13  8:42           ` Paolo Bonzini
2021-10-13 10:14             ` Andy Lutomirski
2021-10-13 12:26               ` Paolo Bonzini
2021-10-13 14:14                 ` Thomas Gleixner
2021-10-13 14:24                   ` Thomas Gleixner
2021-10-13 14:59                 ` Andy Lutomirski
2021-10-13 15:05                   ` Paolo Bonzini
2021-10-13 10:25             ` Liu, Jing2
2021-10-13 12:37               ` Paolo Bonzini
2021-10-13 14:06             ` Thomas Gleixner
2021-10-14  6:50               ` Paolo Bonzini
2021-10-14  8:02                 ` Liu, Jing2
2021-10-14  9:01                   ` Paolo Bonzini
2021-10-14 11:21                     ` Liu, Jing2
2021-10-14 11:33                       ` Paolo Bonzini
2021-10-14 11:30                     ` Liu, Jing2
2021-10-14 11:39                       ` Paolo Bonzini
2021-11-22  8:50                         ` Liu, Jing2
2021-10-14 14:09                     ` Thomas Gleixner
2021-10-14 14:37                       ` Thomas Gleixner
2021-10-14 15:01                       ` Paolo Bonzini
2021-10-14 19:14                         ` Thomas Gleixner
2021-10-15  9:20                           ` Liu, Jing2
2021-10-15  9:36                           ` Thomas Gleixner
2021-10-15 14:24                             ` Liu, Jing2
2021-10-15 15:53                               ` Paolo Bonzini
2021-10-16 14:45                               ` Thomas Gleixner
2021-10-15  9:00                         ` Liu, Jing2
2021-10-15 10:50                           ` Thomas Gleixner
2021-10-15 11:17                             ` Paolo Bonzini
2021-10-15 13:01                             ` Liu, Jing2
2021-10-14 12:23                 ` Thomas Gleixner
2021-10-14 12:26                   ` Paolo Bonzini
2021-10-14 14:23                     ` Thomas Gleixner
2021-10-13 15:12       ` Thomas Gleixner
2021-10-14  8:21         ` Liu, Jing2
2021-10-14 13:08           ` Thomas Gleixner
2021-10-12  0:00 ` [patch 14/31] x86/fpu: Replace KVMs homebrewn FPU copy from user Thomas Gleixner
2021-10-12 17:00   ` Borislav Petkov
2021-10-13 14:57     ` Sean Christopherson
2021-10-13 15:12       ` Paolo Bonzini
2021-10-13 15:16       ` Thomas Gleixner
2021-10-12 17:30   ` Paolo Bonzini
2021-10-12  0:00 ` [patch 15/31] x86/fpu: Rework copy_xstate_to_uabi_buf() Thomas Gleixner
2021-10-12 17:30   ` Paolo Bonzini
2021-10-12  0:00 ` [patch 16/31] x86/fpu: Replace KVMs homebrewn FPU copy to user Thomas Gleixner
2021-10-12 17:10   ` Borislav Petkov
2021-10-12 17:36   ` Paolo Bonzini
2021-10-12 17:47     ` Thomas Gleixner
2021-10-12 18:40       ` [patch V2 16/31] x86/fpu: Replace KVMs home brewed " Thomas Gleixner
2021-10-13  5:34       ` [patch 16/31] x86/fpu: Replace KVMs homebrewn " Paolo Bonzini
2021-10-12  0:00 ` [patch 17/31] x86/fpu: Mark fpu__init_prepare_fx_sw_frame() as __init Thomas Gleixner
2021-10-12  0:00 ` [patch 18/31] x86/fpu: Move context switch and exit to user inlines into sched.h Thomas Gleixner
2021-10-12  0:00 ` [patch 19/31] x86/fpu: Clean up cpu feature tests Thomas Gleixner
2021-10-12  0:00 ` [patch 20/31] x86/fpu: Make os_xrstor_booting() private Thomas Gleixner
2021-10-12  0:00 ` [patch 21/31] x86/fpu: Move os_xsave() and os_xrstor() to core Thomas Gleixner
2021-10-12  0:00 ` [patch 22/31] x86/fpu: Move legacy ASM wrappers " Thomas Gleixner
2021-10-12  0:00 ` [patch 23/31] x86/fpu: Make WARN_ON_FPU() private Thomas Gleixner
2021-10-12  0:00 ` [patch 24/31] x86/fpu: Move fpregs_restore_userregs() to core Thomas Gleixner
2021-10-12 17:32   ` Borislav Petkov
2021-10-12  0:00 ` [patch 25/31] x86/fpu: Move mxcsr related code " Thomas Gleixner
2021-10-12  0:00 ` [patch 26/31] x86/fpu: Move fpstate functions to api.h Thomas Gleixner
2021-10-12 17:46   ` Borislav Petkov
2021-10-12  0:00 ` [patch 27/31] x86/fpu: Remove internal.h dependency from fpu/signal.h Thomas Gleixner
2021-10-12  0:00 ` [patch 28/31] x86/sev: Include fpu/xcr.h Thomas Gleixner
2021-10-12  7:24   ` Xiaoyao Li
2021-10-12  0:00 ` [patch 29/31] x86/fpu: Mop up the internal.h leftovers Thomas Gleixner
2021-10-12  0:00 ` [patch 30/31] x86/fpu: Replace the includes of fpu/internal.h Thomas Gleixner
2021-10-12  0:00 ` [patch 31/31] x86/fpu: Provide a proper function for ex_handler_fprestore() Thomas Gleixner
2021-10-12 21:15 ` [patch 00/31] x86/fpu: Preparatory cleanups for AMX support (part 1) Thomas Gleixner
