* [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions
@ 2020-12-23 15:56 Chang S. Bae
  2020-12-23 15:56 ` [PATCH v3 01/21] x86/fpu/xstate: Modify initialization helper to handle both static and dynamic buffers Chang S. Bae
                   ` (21 more replies)
  0 siblings, 22 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:56 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

Intel Advanced Matrix Extensions (AMX)[1][2] will be shipping on servers
soon.  AMX consists of configurable TMM "TILE" registers plus new
accelerator instructions that operate on them.  TMUL (Tile matrix MULtiply)
is the first accelerator instruction set to use the new registers, and we
anticipate additional instructions in the future.
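
(For reference, these features are enumerated in CPUID.(EAX=7,ECX=0):EDX --
bit 22 for AMX-BF16, bit 24 for AMX-TILE, and bit 25 for AMX-INT8.  A
minimal user-space probe, shown here only for illustration and not part of
this series:)

  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* CPUID leaf 7, sub-leaf 0: structured extended feature flags */
          if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                  return 1;

          printf("AMX-BF16: %s\n", (edx & (1u << 22)) ? "yes" : "no");
          printf("AMX-TILE: %s\n", (edx & (1u << 24)) ? "yes" : "no");
          printf("AMX-INT8: %s\n", (edx & (1u << 25)) ? "yes" : "no");
          return 0;
  }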

Neither AMX state nor TMUL instructions depend on AVX.  However, AMX and
AVX do share common challenges.  The TMM registers total 8KB today and are
architecturally defined to grow as large as 64KB, which merits updates to
hardware and software state management.

Further, both technologies run faster when they are not simultaneously
running on SMT siblings, and both technologies' use of power and bandwidth
impacts the power and performance available to neighboring cores.  (This
impact has measurably improved in recent hardware.)

If the existing kernel approach for managing XSAVE state were employed to
handle AMX, 8KB of space would be added to every task, even though it may
rarely be used.  So Linux support is optimized by using a new XSAVE
feature: eXtended Feature Disabling (XFD).  The kernel arms XFD to deliver
a #NM exception upon a task's first access to TILE state.  The kernel
exception handler installs the appropriate XSAVE context switch buffer,
and the task behaves as if the kernel had done that for all tasks.  Using
XFD, AMX space is allocated only when needed, eliminating the memory waste
for unused state components.
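
As a rough illustration of that first-use flow (conceptual only -- not the
code from this series; alloc_xstate_buffer() is the helper introduced in
patch 07, while handle_xfd_nm(), xfd_disarm(), and XFEATURE_MASK_XTILE are
hypothetical stand-ins):

  /*
   * Sketch of XFD-driven first-use handling: expand the task's context
   * switch buffer, then disarm XFD so the task no longer takes #NM on
   * TILE access.
   */
  static void handle_xfd_nm(struct fpu *fpu)
  {
          /* Expand the context switch buffer to cover the TILE states. */
          if (alloc_xstate_buffer(fpu, XFEATURE_MASK_XTILE)) {
                  /* Allocation failed; the task cannot use AMX. */
                  force_sig(SIGSEGV);
                  return;
          }
          /* Clear the TILE bits in the XFD MSR for this task. */
          xfd_disarm(XFEATURE_MASK_XTILE);
  }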

This series requires the new minimum sigaltstack support [3] and is based
on the mainline kernel.  The series is composed of three parts:
* Patch 01-14: Foundation to support dynamic user state management
* Patch 15-19: AMX enablement, including unit tests
* Patch 20-21: Signal handling optimization and new boot-parameters

Thanks to Len Brown and Dave Hansen for help with the cover letter.

Changes from v2 [5]:
* Removed the patch for tile data inheritance and updated the selftest
  patch accordingly. (Andy Lutomirski)
* Changed the kernel to be tainted when any unknown state is enabled. (Andy
  Lutomirski)
* Changed to use the XFD feature only when the compacted format is in use.
* Improved the test code.
* Simplified the command-line handling.
* Removed 'task->fpu' in changelogs. (Boris Petkov)
* Updated variable names / comments / changelogs for clarification.

Changes from v1 [4]:
* Added vmalloc() error tracing (Dave Hansen, PeterZ, and Andy Lutomirski)
* Inlined the #NM handling code (Andy Lutomirski)
* Made signal handling optimization revertible
* Revised the new parameter handling code (Andy Lutomirski and Dave Hansen)
* Rebased on the upstream kernel

[1]: Intel Architecture Instruction Set Extension Programming Reference
    October 2020, https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
[2]: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-matrix-extensions-intel-amx-instructions.html
[3]: https://lore.kernel.org/lkml/20201223015312.4882-1-chang.seok.bae@intel.com/
[4]: https://lore.kernel.org/lkml/20201001203913.9125-1-chang.seok.bae@intel.com/
[5]: https://lore.kernel.org/lkml/20201119233257.2939-1-chang.seok.bae@intel.com/

Chang S. Bae (21):
  x86/fpu/xstate: Modify initialization helper to handle both static and
    dynamic buffers
  x86/fpu/xstate: Modify state copy helpers to handle both static and
    dynamic buffers
  x86/fpu/xstate: Modify address finders to handle both static and
    dynamic buffers
  x86/fpu/xstate: Modify context switch helpers to handle both static
    and dynamic buffers
  x86/fpu/xstate: Add a new variable to indicate dynamic user states
  x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes
  x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
  x86/fpu/xstate: Define the scope of the initial xstate data
  x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer
    access
  x86/fpu/xstate: Update xstate save function to support dynamic xstate
  x86/fpu/xstate: Update xstate buffer address finder to support dynamic
    xstate
  x86/fpu/xstate: Update xstate context copy function to support dynamic
    buffer
  x86/fpu/xstate: Expand dynamic context switch buffer on first use
  x86/fpu/xstate: Support ptracer-induced xstate buffer expansion
  x86/fpu/xstate: Extend the table to map xstate components with
    features
  x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature
    bits
  x86/fpu/amx: Define AMX state components and have it used for
    boot-time checks
  x86/fpu/amx: Enable the AMX feature in 64-bit mode
  selftest/x86/amx: Include test cases for the AMX state management
  x86/fpu/xstate: Support dynamic user state in the signal handling path
  x86/fpu/xstate: Introduce boot-parameters to control some state
    component support

 .../admin-guide/kernel-parameters.txt         |  15 +
 arch/x86/include/asm/cpufeatures.h            |   4 +
 arch/x86/include/asm/fpu/internal.h           |  97 ++-
 arch/x86/include/asm/fpu/types.h              |  62 +-
 arch/x86/include/asm/fpu/xstate.h             |  61 +-
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/include/asm/pgtable.h                |   2 +-
 arch/x86/include/asm/processor.h              |  10 +-
 arch/x86/include/asm/trace/fpu.h              |  11 +-
 arch/x86/kernel/cpu/common.c                  |   2 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |   4 +
 arch/x86/kernel/fpu/core.c                    |  50 +-
 arch/x86/kernel/fpu/init.c                    | 103 ++-
 arch/x86/kernel/fpu/regset.c                  |  65 +-
 arch/x86/kernel/fpu/signal.c                  |  40 +-
 arch/x86/kernel/fpu/xstate.c                  | 481 ++++++++++--
 arch/x86/kernel/process.c                     |  11 +
 arch/x86/kernel/process_32.c                  |   2 +-
 arch/x86/kernel/process_64.c                  |   2 +-
 arch/x86/kernel/traps.c                       |  40 +
 arch/x86/kvm/x86.c                            |  43 +-
 arch/x86/mm/pkeys.c                           |   2 +-
 tools/testing/selftests/x86/Makefile          |   2 +-
 tools/testing/selftests/x86/amx.c             | 743 ++++++++++++++++++
 24 files changed, 1631 insertions(+), 223 deletions(-)
 create mode 100644 tools/testing/selftests/x86/amx.c

-- 
2.17.1



* [PATCH v3 01/21] x86/fpu/xstate: Modify initialization helper to handle both static and dynamic buffers
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
@ 2020-12-23 15:56 ` Chang S. Bae
  2021-01-15 12:40   ` Borislav Petkov
  2020-12-23 15:56 ` [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:56 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae, kvm

In preparation for dynamic xstate buffer expansion, update the buffer
initialization function parameters to handle both the static in-line
xstate buffer and a dynamically allocated xstate buffer.

init_fpstate is a special case, which is indicated by a null pointer
parameter to fpstate_init().

Also, fpstate_init_xstate() now accepts the state component bitmap to
configure XCOMP_BV for the compacted format.

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v2:
* Updated the changelog with task->fpu removed. (Boris Petkov)
---
 arch/x86/include/asm/fpu/internal.h |  6 +++---
 arch/x86/kernel/fpu/core.c          | 14 +++++++++++---
 arch/x86/kernel/fpu/init.c          |  2 +-
 arch/x86/kernel/fpu/regset.c        |  2 +-
 arch/x86/kernel/fpu/xstate.c        |  3 +--
 arch/x86/kvm/x86.c                  |  2 +-
 6 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 8d33ad80704f..d81d8c407dc0 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -80,20 +80,20 @@ static __always_inline __pure bool use_fxsr(void)
 
 extern union fpregs_state init_fpstate;
 
-extern void fpstate_init(union fpregs_state *state);
+extern void fpstate_init(struct fpu *fpu);
 #ifdef CONFIG_MATH_EMULATION
 extern void fpstate_init_soft(struct swregs_state *soft);
 #else
 static inline void fpstate_init_soft(struct swregs_state *soft) {}
 #endif
 
-static inline void fpstate_init_xstate(struct xregs_state *xsave)
+static inline void fpstate_init_xstate(struct xregs_state *xsave, u64 xcomp_mask)
 {
 	/*
 	 * XRSTORS requires these bits set in xcomp_bv, or it will
 	 * trigger #GP:
 	 */
-	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_all;
+	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xcomp_mask;
 }
 
 static inline void fpstate_init_fxstate(struct fxregs_state *fx)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index eb86a2b831b1..f23e5ffbb307 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -191,8 +191,16 @@ static inline void fpstate_init_fstate(struct fregs_state *fp)
 	fp->fos = 0xffff0000u;
 }
 
-void fpstate_init(union fpregs_state *state)
+/* A null pointer parameter indicates init_fpstate. */
+void fpstate_init(struct fpu *fpu)
 {
+	union fpregs_state *state;
+
+	if (fpu)
+		state = &fpu->state;
+	else
+		state = &init_fpstate;
+
 	if (!static_cpu_has(X86_FEATURE_FPU)) {
 		fpstate_init_soft(&state->soft);
 		return;
@@ -201,7 +209,7 @@ void fpstate_init(union fpregs_state *state)
 	memset(state, 0, fpu_kernel_xstate_size);
 
 	if (static_cpu_has(X86_FEATURE_XSAVES))
-		fpstate_init_xstate(&state->xsave);
+		fpstate_init_xstate(&state->xsave, xfeatures_mask_all);
 	if (static_cpu_has(X86_FEATURE_FXSR))
 		fpstate_init_fxstate(&state->fxsave);
 	else
@@ -261,7 +269,7 @@ static void fpu__initialize(struct fpu *fpu)
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
 	set_thread_flag(TIF_NEED_FPU_LOAD);
-	fpstate_init(&fpu->state);
+	fpstate_init(fpu);
 	trace_x86_fpu_init_state(fpu);
 }
 
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 701f196d7c68..74e03e3bc20f 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -124,7 +124,7 @@ static void __init fpu__init_system_generic(void)
 	 * Set up the legacy init FPU context. (xstate init might overwrite this
 	 * with a more modern format, if the CPU supports it.)
 	 */
-	fpstate_init(&init_fpstate);
+	fpstate_init(NULL);
 
 	fpu__init_system_mxcsr();
 }
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index c413756ba89f..4c4d9059ff36 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -144,7 +144,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 	 * In case of failure, mark all states as init:
 	 */
 	if (ret)
-		fpstate_init(&fpu->state);
+		fpstate_init(fpu);
 
 	return ret;
 }
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 5d8047441a0a..1a3e5effe0fa 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -457,8 +457,7 @@ static void __init setup_init_fpu_buf(void)
 	print_xstate_features();
 
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		init_fpstate.xsave.header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
-						     xfeatures_mask_all;
+		fpstate_init_xstate(&init_fpstate.xsave, xfeatures_mask_all);
 
 	/*
 	 * Init all the features state with header.xfeatures being 0x0
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e545a8a613b1..45704f106815 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9820,7 +9820,7 @@ static int sync_regs(struct kvm_vcpu *vcpu)
 
 static void fx_init(struct kvm_vcpu *vcpu)
 {
-	fpstate_init(&vcpu->arch.guest_fpu->state);
+	fpstate_init(vcpu->arch.guest_fpu);
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
 		vcpu->arch.guest_fpu->state.xsave.header.xcomp_bv =
 			host_xcr0 | XSTATE_COMPACTION_ENABLED;
-- 
2.17.1



* [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers to handle both static and dynamic buffers
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
  2020-12-23 15:56 ` [PATCH v3 01/21] x86/fpu/xstate: Modify initialization helper to handle both static and dynamic buffers Chang S. Bae
@ 2020-12-23 15:56 ` Chang S. Bae
  2021-01-15 12:50   ` Borislav Petkov
  2020-12-23 15:56 ` [PATCH v3 03/21] x86/fpu/xstate: Modify address finders " Chang S. Bae
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:56 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

In preparation for dynamic xstate buffer expansion, update the xstate
copy function parameters to handle both the static in-line buffer and a
dynamically allocated xstate buffer.

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the changelog with task->fpu removed. (Boris Petkov)
---
 arch/x86/include/asm/fpu/xstate.h |  8 ++++----
 arch/x86/kernel/fpu/regset.c      |  6 +++---
 arch/x86/kernel/fpu/signal.c      | 16 +++++++---------
 arch/x86/kernel/fpu/xstate.c      | 19 +++++++++++++++----
 4 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 47a92232d595..e0f1b22f53ce 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -105,10 +105,10 @@ const void *get_xsave_field_ptr(int xfeature_nr);
 int using_compacted_format(void);
 int xfeature_size(int xfeature_nr);
 struct membuf;
-void copy_xstate_to_kernel(struct membuf to, struct xregs_state *xsave);
-int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf);
-int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf);
-void copy_supervisor_to_kernel(struct xregs_state *xsave);
+void copy_xstate_to_kernel(struct membuf to, struct fpu *fpu);
+int copy_kernel_to_xstate(struct fpu *fpu, const void *kbuf);
+int copy_user_to_xstate(struct fpu *fpu, const void __user *ubuf);
+void copy_supervisor_to_kernel(struct fpu *fpu);
 void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask);
 void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask);
 
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 4c4d9059ff36..5e13e58d11d4 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -85,7 +85,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
 	fpu__prepare_read(fpu);
 
 	if (using_compacted_format()) {
-		copy_xstate_to_kernel(to, xsave);
+		copy_xstate_to_kernel(to, fpu);
 		return 0;
 	} else {
 		fpstate_sanitize_xstate(fpu);
@@ -126,9 +126,9 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 
 	if (using_compacted_format()) {
 		if (kbuf)
-			ret = copy_kernel_to_xstate(xsave, kbuf);
+			ret = copy_kernel_to_xstate(fpu, kbuf);
 		else
-			ret = copy_user_to_xstate(xsave, ubuf);
+			ret = copy_user_to_xstate(fpu, ubuf);
 	} else {
 		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
 		if (!ret)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a4ec65317a7f..0d6deb75c507 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -212,11 +212,11 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 }
 
 static inline void
-sanitize_restored_user_xstate(union fpregs_state *state,
+sanitize_restored_user_xstate(struct fpu *fpu,
 			      struct user_i387_ia32_struct *ia32_env,
 			      u64 user_xfeatures, int fx_only)
 {
-	struct xregs_state *xsave = &state->xsave;
+	struct xregs_state *xsave = &fpu->state.xsave;
 	struct xstate_header *header = &xsave->header;
 
 	if (use_xsave()) {
@@ -253,7 +253,7 @@ sanitize_restored_user_xstate(union fpregs_state *state,
 		xsave->i387.mxcsr &= mxcsr_feature_mask;
 
 		if (ia32_env)
-			convert_to_fxsr(&state->fxsave, ia32_env);
+			convert_to_fxsr(&fpu->state.fxsave, ia32_env);
 	}
 }
 
@@ -396,7 +396,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 * current supervisor states first and invalidate the FPU regs.
 		 */
 		if (xfeatures_mask_supervisor())
-			copy_supervisor_to_kernel(&fpu->state.xsave);
+			copy_supervisor_to_kernel(fpu);
 		set_thread_flag(TIF_NEED_FPU_LOAD);
 	}
 	__fpu_invalidate_fpregs_state(fpu);
@@ -406,7 +406,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		u64 init_bv = xfeatures_mask_user() & ~user_xfeatures;
 
 		if (using_compacted_format()) {
-			ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
+			ret = copy_user_to_xstate(fpu, buf_fx);
 		} else {
 			ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
 
@@ -416,8 +416,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		if (ret)
 			goto err_out;
 
-		sanitize_restored_user_xstate(&fpu->state, envp, user_xfeatures,
-					      fx_only);
+		sanitize_restored_user_xstate(fpu, envp, user_xfeatures, fx_only);
 
 		fpregs_lock();
 		if (unlikely(init_bv))
@@ -437,8 +436,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 			goto err_out;
 		}
 
-		sanitize_restored_user_xstate(&fpu->state, envp, user_xfeatures,
-					      fx_only);
+		sanitize_restored_user_xstate(fpu, envp, user_xfeatures, fx_only);
 
 		fpregs_lock();
 		if (use_xsave()) {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1a3e5effe0fa..6156dad0feb6 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1071,14 +1071,17 @@ static void copy_part(struct membuf *to, unsigned *last, unsigned offset,
  * It supports partial copy but pos always starts from zero. This is called
  * from xstateregs_get() and there we check the CPU has XSAVES.
  */
-void copy_xstate_to_kernel(struct membuf to, struct xregs_state *xsave)
+void copy_xstate_to_kernel(struct membuf to, struct fpu *fpu)
 {
 	struct xstate_header header;
 	const unsigned off_mxcsr = offsetof(struct fxregs_state, mxcsr);
+	struct xregs_state *xsave;
 	unsigned size = to.left;
 	unsigned last = 0;
 	int i;
 
+	xsave = &fpu->state.xsave;
+
 	/*
 	 * The destination is a ptrace buffer; we put in only user xstates:
 	 */
@@ -1127,8 +1130,9 @@ void copy_xstate_to_kernel(struct membuf to, struct xregs_state *xsave)
  * Convert from a ptrace standard-format kernel buffer to kernel XSAVES format
  * and copy to the target thread. This is called from xstateregs_set().
  */
-int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
+int copy_kernel_to_xstate(struct fpu *fpu, const void *kbuf)
 {
+	struct xregs_state *xsave;
 	unsigned int offset, size;
 	int i;
 	struct xstate_header hdr;
@@ -1141,6 +1145,8 @@ int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
 	if (validate_user_xstate_header(&hdr))
 		return -EINVAL;
 
+	xsave = &fpu->state.xsave;
+
 	for (i = 0; i < XFEATURE_MAX; i++) {
 		u64 mask = ((u64)1 << i);
 
@@ -1180,8 +1186,9 @@ int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
  * xstateregs_set(), as well as potentially from the sigreturn() and
  * rt_sigreturn() system calls.
  */
-int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
+int copy_user_to_xstate(struct fpu *fpu, const void __user *ubuf)
 {
+	struct xregs_state *xsave;
 	unsigned int offset, size;
 	int i;
 	struct xstate_header hdr;
@@ -1195,6 +1202,8 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
 	if (validate_user_xstate_header(&hdr))
 		return -EINVAL;
 
+	xsave = &fpu->state.xsave;
+
 	for (i = 0; i < XFEATURE_MAX; i++) {
 		u64 mask = ((u64)1 << i);
 
@@ -1235,9 +1244,10 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
  * old states, and is intended to be used only in __fpu__restore_sig(), where
  * user states are restored from the user buffer.
  */
-void copy_supervisor_to_kernel(struct xregs_state *xstate)
+void copy_supervisor_to_kernel(struct fpu *fpu)
 {
 	struct xstate_header *header;
+	struct xregs_state *xstate;
 	u64 max_bit, min_bit;
 	u32 lmask, hmask;
 	int err, i;
@@ -1251,6 +1261,7 @@ void copy_supervisor_to_kernel(struct xregs_state *xstate)
 	max_bit = __fls(xfeatures_mask_supervisor());
 	min_bit = __ffs(xfeatures_mask_supervisor());
 
+	xstate = &fpu->state.xsave;
 	lmask = xfeatures_mask_supervisor();
 	hmask = xfeatures_mask_supervisor() >> 32;
 	XSTATE_OP(XSAVES, xstate, lmask, hmask, err);
-- 
2.17.1



* [PATCH v3 03/21] x86/fpu/xstate: Modify address finders to handle both static and dynamic buffers
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
  2020-12-23 15:56 ` [PATCH v3 01/21] x86/fpu/xstate: Modify initialization helper to handle both static and dynamic buffers Chang S. Bae
  2020-12-23 15:56 ` [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
@ 2020-12-23 15:56 ` Chang S. Bae
  2021-01-15 13:06   ` Borislav Petkov
  2020-12-23 15:57 ` [PATCH v3 04/21] x86/fpu/xstate: Modify context switch helpers " Chang S. Bae
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:56 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae, kvm

In preparation for dynamic xstate buffer expansion, update the buffer
address finder function parameters to handle both the static in-line
xstate buffer and a dynamically allocated xstate buffer.

init_fpstate is a special case, which is indicated by a null pointer
parameter to get_xsave_addr() and __raw_xsave_addr().

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v2:
* Updated the changelog with task->fpu removed. (Boris Petkov)

Changes from v1:
* Rebased on the upstream kernel (5.10)
---
 arch/x86/include/asm/fpu/internal.h |  2 +-
 arch/x86/include/asm/fpu/xstate.h   |  2 +-
 arch/x86/include/asm/pgtable.h      |  2 +-
 arch/x86/kernel/cpu/common.c        |  2 +-
 arch/x86/kernel/fpu/xstate.c        | 50 +++++++++++++++++++----------
 arch/x86/kvm/x86.c                  | 26 +++++++++------
 arch/x86/mm/pkeys.c                 |  2 +-
 7 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d81d8c407dc0..0153c4d4ca77 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -579,7 +579,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 	 * return to userland e.g. for a copy_to_user() operation.
 	 */
 	if (current->mm) {
-		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
+		pk = get_xsave_addr(new_fpu, XFEATURE_PKRU);
 		if (pk)
 			pkru_val = pk->pkru;
 	}
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index e0f1b22f53ce..24bf8d3f559a 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -100,7 +100,7 @@ extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 extern void __init update_regset_xstate_info(unsigned int size,
 					     u64 xstate_mask);
 
-void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
+void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
 const void *get_xsave_field_ptr(int xfeature_nr);
 int using_compacted_format(void);
 int xfeature_size(int xfeature_nr);
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a02c67291cfc..83268b41444f 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -141,7 +141,7 @@ static inline void write_pkru(u32 pkru)
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return;
 
-	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
+	pk = get_xsave_addr(&current->thread.fpu, XFEATURE_PKRU);
 
 	/*
 	 * The PKRU value in xstate needs to be in sync with the value that is
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 35ad8480c464..860b19db208b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -478,7 +478,7 @@ static __always_inline void setup_pku(struct cpuinfo_x86 *c)
 		return;
 
 	cr4_set_bits(X86_CR4_PKE);
-	pk = get_xsave_addr(&init_fpstate.xsave, XFEATURE_PKRU);
+	pk = get_xsave_addr(NULL, XFEATURE_PKRU);
 	if (pk)
 		pk->pkru = init_pkru_value;
 	/*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6156dad0feb6..2010c31d25e1 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -894,15 +894,24 @@ void fpu__resume_cpu(void)
  * Given an xstate feature nr, calculate where in the xsave
  * buffer the state is.  Callers should ensure that the buffer
  * is valid.
+ *
+ * A null pointer parameter indicates to use init_fpstate.
  */
-static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
+static void *__raw_xsave_addr(struct fpu *fpu, int xfeature_nr)
 {
+	void *xsave;
+
 	if (!xfeature_enabled(xfeature_nr)) {
 		WARN_ON_FPU(1);
 		return NULL;
 	}
 
-	return (void *)xsave + xstate_comp_offsets[xfeature_nr];
+	if (fpu)
+		xsave = &fpu->state.xsave;
+	else
+		xsave = &init_fpstate.xsave;
+
+	return xsave + xstate_comp_offsets[xfeature_nr];
 }
 /*
  * Given the xsave area and a state inside, this function returns the
@@ -915,15 +924,18 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
  * this will return NULL.
  *
  * Inputs:
- *	xstate: the thread's storage area for all FPU data
+ *	fpu: the thread's FPU data to reference xstate buffer(s).
+ *	     (A null pointer parameter indicates init_fpstate.)
  *	xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
  *	XFEATURE_SSE, etc...)
  * Output:
  *	address of the state in the xsave area, or NULL if the
  *	field is not present in the xsave buffer.
  */
-void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
+void *get_xsave_addr(struct fpu *fpu, int xfeature_nr)
 {
+	struct xregs_state *xsave;
+
 	/*
 	 * Do we even *have* xsave state?
 	 */
@@ -936,6 +948,12 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 	 */
 	WARN_ONCE(!(xfeatures_mask_all & BIT_ULL(xfeature_nr)),
 		  "get of unsupported state");
+
+	if (fpu)
+		xsave = &fpu->state.xsave;
+	else
+		xsave = &init_fpstate.xsave;
+
 	/*
 	 * This assumes the last 'xsave*' instruction to
 	 * have requested that 'xfeature_nr' be saved.
@@ -950,7 +968,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 	if (!(xsave->header.xfeatures & BIT_ULL(xfeature_nr)))
 		return NULL;
 
-	return __raw_xsave_addr(xsave, xfeature_nr);
+	return __raw_xsave_addr(fpu, xfeature_nr);
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
 
@@ -981,7 +999,7 @@ const void *get_xsave_field_ptr(int xfeature_nr)
 	 */
 	fpu__save(fpu);
 
-	return get_xsave_addr(&fpu->state.xsave, xfeature_nr);
+	return get_xsave_addr(fpu, xfeature_nr);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -1116,7 +1134,7 @@ void copy_xstate_to_kernel(struct membuf to, struct fpu *fpu)
 		 * Copy only in-use xstates:
 		 */
 		if ((header.xfeatures >> i) & 1) {
-			void *src = __raw_xsave_addr(xsave, i);
+			void *src = __raw_xsave_addr(fpu, i);
 
 			copy_part(&to, &last, xstate_offsets[i],
 				  xstate_sizes[i], src);
@@ -1145,13 +1163,11 @@ int copy_kernel_to_xstate(struct fpu *fpu, const void *kbuf)
 	if (validate_user_xstate_header(&hdr))
 		return -EINVAL;
 
-	xsave = &fpu->state.xsave;
-
 	for (i = 0; i < XFEATURE_MAX; i++) {
 		u64 mask = ((u64)1 << i);
 
 		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, i);
+			void *dst = __raw_xsave_addr(fpu, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1160,6 +1176,8 @@ int copy_kernel_to_xstate(struct fpu *fpu, const void *kbuf)
 		}
 	}
 
+	xsave = &fpu->state.xsave;
+
 	if (xfeatures_mxcsr_quirk(hdr.xfeatures)) {
 		offset = offsetof(struct fxregs_state, mxcsr);
 		size = MXCSR_AND_FLAGS_SIZE;
@@ -1202,13 +1220,11 @@ int copy_user_to_xstate(struct fpu *fpu, const void __user *ubuf)
 	if (validate_user_xstate_header(&hdr))
 		return -EINVAL;
 
-	xsave = &fpu->state.xsave;
-
 	for (i = 0; i < XFEATURE_MAX; i++) {
 		u64 mask = ((u64)1 << i);
 
 		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, i);
+			void *dst = __raw_xsave_addr(fpu, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1218,6 +1234,8 @@ int copy_user_to_xstate(struct fpu *fpu, const void __user *ubuf)
 		}
 	}
 
+	xsave = &fpu->state.xsave;
+
 	if (xfeatures_mxcsr_quirk(hdr.xfeatures)) {
 		offset = offsetof(struct fxregs_state, mxcsr);
 		size = MXCSR_AND_FLAGS_SIZE;
@@ -1441,16 +1459,14 @@ void update_pasid(void)
 	} else {
 		struct fpu *fpu = &current->thread.fpu;
 		struct ia32_pasid_state *ppasid_state;
-		struct xregs_state *xsave;
 
 		/*
 		 * The CPU's xstate registers are not currently active. Just
 		 * update the PASID state in the memory buffer here. The
 		 * PASID MSR will be loaded when returning to user mode.
 		 */
-		xsave = &fpu->state.xsave;
-		xsave->header.xfeatures |= XFEATURE_MASK_PASID;
-		ppasid_state = get_xsave_addr(xsave, XFEATURE_PASID);
+		fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_PASID;
+		ppasid_state = get_xsave_addr(fpu, XFEATURE_PASID);
 		/*
 		 * Since XFEATURE_MASK_PASID is set in xfeatures, ppasid_state
 		 * won't be NULL and no need to check its value.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45704f106815..09368201d9cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4396,10 +4396,15 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 
 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
-	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
-	u64 xstate_bv = xsave->header.xfeatures;
+	struct xregs_state *xsave;
+	struct fpu *guest_fpu;
+	u64 xstate_bv;
 	u64 valid;
 
+	guest_fpu = vcpu->arch.guest_fpu;
+	xsave = &guest_fpu->state.xsave;
+	xstate_bv = xsave->header.xfeatures;
+
 	/*
 	 * Copy legacy XSAVE area, to avoid complications with CPUID
 	 * leaves 0 and 1 in the loop below.
@@ -4418,7 +4423,7 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 	while (valid) {
 		u64 xfeature_mask = valid & -valid;
 		int xfeature_nr = fls64(xfeature_mask) - 1;
-		void *src = get_xsave_addr(xsave, xfeature_nr);
+		void *src = get_xsave_addr(guest_fpu, xfeature_nr);
 
 		if (src) {
 			u32 size, offset, ecx, edx;
@@ -4438,10 +4443,14 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 
 static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 {
-	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
 	u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
+	struct xregs_state *xsave;
+	struct fpu *guest_fpu;
 	u64 valid;
 
+	guest_fpu = vcpu->arch.guest_fpu;
+	xsave = &guest_fpu->state.xsave;
+
 	/*
 	 * Copy legacy XSAVE area, to avoid complications with CPUID
 	 * leaves 0 and 1 in the loop below.
@@ -4461,7 +4470,7 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	while (valid) {
 		u64 xfeature_mask = valid & -valid;
 		int xfeature_nr = fls64(xfeature_mask) - 1;
-		void *dest = get_xsave_addr(xsave, xfeature_nr);
+		void *dest = get_xsave_addr(guest_fpu, xfeature_nr);
 
 		if (dest) {
 			u32 size, offset, ecx, edx;
@@ -10031,6 +10040,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->arch.apf.halted = false;
 
 	if (kvm_mpx_supported()) {
+		struct fpu *guest_fpu = vcpu->arch.guest_fpu;
 		void *mpx_state_buffer;
 
 		/*
@@ -10039,12 +10049,10 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		 */
 		if (init_event)
 			kvm_put_guest_fpu(vcpu);
-		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu->state.xsave,
-					XFEATURE_BNDREGS);
+		mpx_state_buffer = get_xsave_addr(guest_fpu, XFEATURE_BNDREGS);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndreg_state));
-		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu->state.xsave,
-					XFEATURE_BNDCSR);
+		mpx_state_buffer = get_xsave_addr(guest_fpu, XFEATURE_BNDCSR);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
 		if (init_event)
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 8873ed1438a9..772e8bc3d49d 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -177,7 +177,7 @@ static ssize_t init_pkru_write_file(struct file *file,
 		return -EINVAL;
 
 	WRITE_ONCE(init_pkru_value, new_init_pkru);
-	pk = get_xsave_addr(&init_fpstate.xsave, XFEATURE_PKRU);
+	pk = get_xsave_addr(NULL, XFEATURE_PKRU);
 	if (!pk)
 		return -EINVAL;
 	pk->pkru = new_init_pkru;
-- 
2.17.1



* [PATCH v3 04/21] x86/fpu/xstate: Modify context switch helpers to handle both static and dynamic buffers
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (2 preceding siblings ...)
  2020-12-23 15:56 ` [PATCH v3 03/21] x86/fpu/xstate: Modify address finders " Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2021-01-15 13:18   ` Borislav Petkov
  2020-12-23 15:57 ` [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae, kvm

In preparation for dynamic xstate buffer expansion, update the xstate
restore function parameters to handle both the static in-line xstate
buffer and a dynamically allocated xstate buffer.

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v2:
* Updated the changelog with task->fpu removed. (Boris Petkov)
---
 arch/x86/include/asm/fpu/internal.h | 9 ++++++---
 arch/x86/kernel/fpu/core.c          | 4 ++--
 arch/x86/kernel/fpu/signal.c        | 3 +--
 arch/x86/kvm/x86.c                  | 2 +-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 0153c4d4ca77..37ea5e37f21c 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -397,8 +397,9 @@ static inline int copy_user_to_xregs(struct xregs_state __user *buf, u64 mask)
  * Restore xstate from kernel space xsave area, return an error code instead of
  * an exception.
  */
-static inline int copy_kernel_to_xregs_err(struct xregs_state *xstate, u64 mask)
+static inline int copy_kernel_to_xregs_err(struct fpu *fpu, u64 mask)
 {
+	struct xregs_state *xstate = &fpu->state.xsave;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
@@ -425,8 +426,10 @@ static inline void __copy_kernel_to_fpregs(union fpregs_state *fpstate, u64 mask
 	}
 }
 
-static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate)
+static inline void copy_kernel_to_fpregs(struct fpu *fpu)
 {
+	union fpregs_state *fpstate = &fpu->state;
+
 	/*
 	 * AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
 	 * pending. Clear the x87 state here by setting it to fixed values.
@@ -511,7 +514,7 @@ static inline void __fpregs_load_activate(void)
 		return;
 
 	if (!fpregs_state_valid(fpu, cpu)) {
-		copy_kernel_to_fpregs(&fpu->state);
+		copy_kernel_to_fpregs(fpu);
 		fpregs_activate(fpu);
 		fpu->last_cpu = cpu;
 	}
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index f23e5ffbb307..20925cae2a84 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -172,7 +172,7 @@ void fpu__save(struct fpu *fpu)
 
 	if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
 		if (!copy_fpregs_to_fpstate(fpu)) {
-			copy_kernel_to_fpregs(&fpu->state);
+			copy_kernel_to_fpregs(fpu);
 		}
 	}
 
@@ -248,7 +248,7 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
 		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_size);
 
 	else if (!copy_fpregs_to_fpstate(dst_fpu))
-		copy_kernel_to_fpregs(&dst_fpu->state);
+		copy_kernel_to_fpregs(dst_fpu);
 
 	fpregs_unlock();
 
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 0d6deb75c507..414a13427934 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -426,8 +426,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 * Restore previously saved supervisor xstates along with
 		 * copied-in user xstates.
 		 */
-		ret = copy_kernel_to_xregs_err(&fpu->state.xsave,
-					       user_xfeatures | xfeatures_mask_supervisor());
+		ret = copy_kernel_to_xregs_err(fpu, user_xfeatures | xfeatures_mask_supervisor());
 
 	} else if (use_fxsr()) {
 		ret = __copy_from_user(&fpu->state.fxsave, buf_fx, state_size);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09368201d9cc..a087bbf252b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9249,7 +9249,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 
 	kvm_save_current_fpu(vcpu->arch.guest_fpu);
 
-	copy_kernel_to_fpregs(&vcpu->arch.user_fpu->state);
+	copy_kernel_to_fpregs(vcpu->arch.user_fpu);
 
 	fpregs_mark_activate();
 	fpregs_unlock();
-- 
2.17.1



* [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (3 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 04/21] x86/fpu/xstate: Modify context switch helpers " Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2021-01-15 13:39   ` Borislav Petkov
  2020-12-23 15:57 ` [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes Chang S. Bae
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

perf has a buffer that is allocated on demand. The states saved in that
buffer were named 'dynamic' (supervisor) states, but the buffer is not
updated on every context switch.

The context switch buffer is in preparation to become dynamic for user
states. Adjust the wording to differentiate between these two kinds of
'dynamic' states.

Add a new variable -- xfeatures_mask_user_dynamic -- to indicate the
dynamic user states, and rename the define and helper related to the
dynamic supervisor states:
	xfeatures_mask_supervisor_dynamic()
	XFEATURE_MASK_SUPERVISOR_DYNAMIC

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the changelog for clarification.
---
 arch/x86/include/asm/fpu/xstate.h | 12 +++++++-----
 arch/x86/kernel/fpu/xstate.c      | 29 +++++++++++++++++++----------
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 24bf8d3f559a..6ce8350672c2 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -56,7 +56,7 @@
  * - Don't set the bit corresponding to the dynamic supervisor feature in
  *   IA32_XSS at run time, since it has been set at boot time.
  */
-#define XFEATURE_MASK_DYNAMIC (XFEATURE_MASK_LBR)
+#define XFEATURE_MASK_SUPERVISOR_DYNAMIC (XFEATURE_MASK_LBR)
 
 /*
  * Unsupported supervisor features. When a supervisor feature in this mask is
@@ -66,7 +66,7 @@
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
-				      XFEATURE_MASK_DYNAMIC | \
+				      XFEATURE_MASK_SUPERVISOR_DYNAMIC | \
 				      XFEATURE_MASK_SUPERVISOR_UNSUPPORTED)
 
 #ifdef CONFIG_X86_64
@@ -87,14 +87,16 @@ static inline u64 xfeatures_mask_user(void)
 	return xfeatures_mask_all & XFEATURE_MASK_USER_SUPPORTED;
 }
 
-static inline u64 xfeatures_mask_dynamic(void)
+static inline u64 xfeatures_mask_supervisor_dynamic(void)
 {
 	if (!boot_cpu_has(X86_FEATURE_ARCH_LBR))
-		return XFEATURE_MASK_DYNAMIC & ~XFEATURE_MASK_LBR;
+		return XFEATURE_MASK_SUPERVISOR_DYNAMIC & ~XFEATURE_MASK_LBR;
 
-	return XFEATURE_MASK_DYNAMIC;
+	return XFEATURE_MASK_SUPERVISOR_DYNAMIC;
 }
 
+extern u64 xfeatures_mask_user_dynamic;
+
 extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 
 extern void __init update_regset_xstate_info(unsigned int size,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2010c31d25e1..6620d0a3caff 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -61,6 +61,12 @@ static short xsave_cpuid_features[] __initdata = {
  */
 u64 xfeatures_mask_all __read_mostly;
 
+/*
+ * This represents user xstates, a subset of xfeatures_mask_all, saved in a
+ * dynamic kernel XSAVE buffer.
+ */
+u64 xfeatures_mask_user_dynamic __read_mostly;
+
 static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
 static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 1] = -1};
 static unsigned int xstate_comp_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
@@ -237,7 +243,7 @@ void fpu__init_cpu_xstate(void)
 	 */
 	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
 		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor() |
-				     xfeatures_mask_dynamic());
+				     xfeatures_mask_supervisor_dynamic());
 	}
 }
 
@@ -686,7 +692,7 @@ static unsigned int __init get_xsaves_size(void)
  */
 static unsigned int __init get_xsaves_size_no_dynamic(void)
 {
-	u64 mask = xfeatures_mask_dynamic();
+	u64 mask = xfeatures_mask_supervisor_dynamic();
 	unsigned int size;
 
 	if (!mask)
@@ -773,6 +779,7 @@ static int __init init_xstate_size(void)
 static void fpu__init_disable_system_xstate(void)
 {
 	xfeatures_mask_all = 0;
+	xfeatures_mask_user_dynamic = 0;
 	cr4_clear_bits(X86_CR4_OSXSAVE);
 	setup_clear_cpu_cap(X86_FEATURE_XSAVE);
 }
@@ -839,6 +846,8 @@ void __init fpu__init_system_xstate(void)
 	}
 
 	xfeatures_mask_all &= fpu__get_supported_xfeatures_mask();
+	/* Do not support the dynamically allocated buffer yet. */
+	xfeatures_mask_user_dynamic = 0;
 
 	/* Enable xstate instructions to be able to continue with initialization: */
 	fpu__init_cpu_xstate();
@@ -886,7 +895,7 @@ void fpu__resume_cpu(void)
 	 */
 	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
 		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor()  |
-				     xfeatures_mask_dynamic());
+				     xfeatures_mask_supervisor_dynamic());
 	}
 }
 
@@ -1321,8 +1330,8 @@ void copy_supervisor_to_kernel(struct fpu *fpu)
  * @mask: Represent the dynamic supervisor features saved into the xsave area
  *
  * Only the dynamic supervisor states sets in the mask are saved into the xsave
- * area (See the comment in XFEATURE_MASK_DYNAMIC for the details of dynamic
- * supervisor feature). Besides the dynamic supervisor states, the legacy
+ * area (See the comment in XFEATURE_MASK_SUPERVISOR_DYNAMIC for the details of
+ * dynamic supervisor feature). Besides the dynamic supervisor states, the legacy
  * region and XSAVE header are also saved into the xsave area. The supervisor
  * features in the XFEATURE_MASK_SUPERVISOR_SUPPORTED and
  * XFEATURE_MASK_SUPERVISOR_UNSUPPORTED are not saved.
@@ -1331,7 +1340,7 @@ void copy_supervisor_to_kernel(struct fpu *fpu)
  */
 void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask)
 {
-	u64 dynamic_mask = xfeatures_mask_dynamic() & mask;
+	u64 dynamic_mask = xfeatures_mask_supervisor_dynamic() & mask;
 	u32 lmask, hmask;
 	int err;
 
@@ -1357,9 +1366,9 @@ void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask)
  * @mask: Represent the dynamic supervisor features restored from the xsave area
  *
  * Only the dynamic supervisor states sets in the mask are restored from the
- * xsave area (See the comment in XFEATURE_MASK_DYNAMIC for the details of
- * dynamic supervisor feature). Besides the dynamic supervisor states, the
- * legacy region and XSAVE header are also restored from the xsave area. The
+ * xsave area (See the comment in XFEATURE_MASK_SUPERVISOR_DYNAMIC for the
+ * details of dynamic supervisor feature). Besides the dynamic supervisor states,
+ * the legacy region and XSAVE header are also restored from the xsave area. The
  * supervisor features in the XFEATURE_MASK_SUPERVISOR_SUPPORTED and
  * XFEATURE_MASK_SUPERVISOR_UNSUPPORTED are not restored.
  *
@@ -1367,7 +1376,7 @@ void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask)
  */
 void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask)
 {
-	u64 dynamic_mask = xfeatures_mask_dynamic() & mask;
+	u64 dynamic_mask = xfeatures_mask_supervisor_dynamic() & mask;
 	u32 lmask, hmask;
 	int err;
 
-- 
2.17.1



* [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (4 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2021-01-22 11:44   ` Borislav Petkov
  2020-12-23 15:57 ` [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers Chang S. Bae
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae, kvm

The xstate buffer is currently in-line and of static size. To accommodate
dynamic user xstates, introduce variables to represent the maximum and
minimum buffer sizes.

do_extra_xstate_size_checks() calculates the maximum xstate size and
sanity-checks it against CPUID. It calculates the static in-line buffer
size by excluding the dynamic user states from the maximum xstate size.

No functional change until the kernel enables dynamic buffer support.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v2:
* Updated the changelog with task->fpu removed. (Boris Petkov)
* Renamed the in-line size variable.
* Updated some code comments.
---
 arch/x86/include/asm/processor.h | 10 +++----
 arch/x86/kernel/fpu/core.c       |  6 ++---
 arch/x86/kernel/fpu/init.c       | 36 ++++++++++++++++---------
 arch/x86/kernel/fpu/signal.c     |  2 +-
 arch/x86/kernel/fpu/xstate.c     | 46 +++++++++++++++++++++-----------
 arch/x86/kernel/process.c        |  6 +++++
 arch/x86/kvm/x86.c               |  2 +-
 7 files changed, 67 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 82a08b585818..c9c608f8af91 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -477,7 +477,8 @@ DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 #endif	/* X86_64 */
 
-extern unsigned int fpu_kernel_xstate_size;
+extern unsigned int fpu_kernel_xstate_min_size;
+extern unsigned int fpu_kernel_xstate_max_size;
 extern unsigned int fpu_user_xstate_size;
 
 struct perf_event;
@@ -545,12 +546,7 @@ struct thread_struct {
 };
 
 /* Whitelist the FPU state from the task_struct for hardened usercopy. */
-static inline void arch_thread_struct_whitelist(unsigned long *offset,
-						unsigned long *size)
-{
-	*offset = offsetof(struct thread_struct, fpu.state);
-	*size = fpu_kernel_xstate_size;
-}
+extern void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
 
 /*
  * Thread-synchronous status.
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 20925cae2a84..1a428803e6b2 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -206,7 +206,7 @@ void fpstate_init(struct fpu *fpu)
 		return;
 	}
 
-	memset(state, 0, fpu_kernel_xstate_size);
+	memset(state, 0, fpu_kernel_xstate_min_size);
 
 	if (static_cpu_has(X86_FEATURE_XSAVES))
 		fpstate_init_xstate(&state->xsave, xfeatures_mask_all);
@@ -233,7 +233,7 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
 	 * Don't let 'init optimized' areas of the XSAVE area
 	 * leak into the child task:
 	 */
-	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
+	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_min_size);
 
 	/*
 	 * If the FPU registers are not current just memcpy() the state.
@@ -245,7 +245,7 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
 	 */
 	fpregs_lock();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_size);
+		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_min_size);
 
 	else if (!copy_fpregs_to_fpstate(dst_fpu))
 		copy_kernel_to_fpregs(dst_fpu);
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 74e03e3bc20f..5dac97158030 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -130,13 +130,20 @@ static void __init fpu__init_system_generic(void)
 }
 
 /*
- * Size of the FPU context state. All tasks in the system use the
- * same context size, regardless of what portion they use.
- * This is inherent to the XSAVE architecture which puts all state
- * components into a single, continuous memory block:
+ * Size of the minimally allocated FPU context state. All threads have this amount
+ * of xstate buffer at minimum.
+ *
+ * This buffer is inherent to the XSAVE architecture which puts all state components
+ * into a single, continuous memory block:
+ */
+unsigned int fpu_kernel_xstate_min_size;
+EXPORT_SYMBOL_GPL(fpu_kernel_xstate_min_size);
+
+/*
+ * Size of the maximum FPU context state. When using the compacted format, the buffer
+ * can be dynamically expanded to include some states up to this size.
  */
-unsigned int fpu_kernel_xstate_size;
-EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size);
+unsigned int fpu_kernel_xstate_max_size;
 
 /* Get alignment of the TYPE. */
 #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
@@ -167,8 +174,10 @@ static void __init fpu__init_task_struct_size(void)
 	/*
 	 * Add back the dynamically-calculated register state
 	 * size.
+	 *
+	 * Use the minimum size as in-lined to the task_struct.
 	 */
-	task_size += fpu_kernel_xstate_size;
+	task_size += fpu_kernel_xstate_min_size;
 
 	/*
 	 * We dynamically size 'struct fpu', so we require that
@@ -193,6 +202,7 @@ static void __init fpu__init_task_struct_size(void)
 static void __init fpu__init_system_xstate_size_legacy(void)
 {
 	static int on_boot_cpu __initdata = 1;
+	unsigned int size;
 
 	WARN_ON_FPU(!on_boot_cpu);
 	on_boot_cpu = 0;
@@ -203,17 +213,17 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	 */
 
 	if (!boot_cpu_has(X86_FEATURE_FPU)) {
-		fpu_kernel_xstate_size = sizeof(struct swregs_state);
+		size = sizeof(struct swregs_state);
 	} else {
 		if (boot_cpu_has(X86_FEATURE_FXSR))
-			fpu_kernel_xstate_size =
-				sizeof(struct fxregs_state);
+			size = sizeof(struct fxregs_state);
 		else
-			fpu_kernel_xstate_size =
-				sizeof(struct fregs_state);
+			size = sizeof(struct fregs_state);
 	}
 
-	fpu_user_xstate_size = fpu_kernel_xstate_size;
+	fpu_kernel_xstate_min_size = size;
+	fpu_kernel_xstate_max_size = size;
+	fpu_user_xstate_size = size;
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 414a13427934..b6d2706b6886 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -289,8 +289,8 @@ static int copy_user_to_fpregs_zeroing(void __user *buf, u64 xbv, int fx_only)
 
 static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 {
+	int state_size = fpu_kernel_xstate_min_size;
 	struct user_i387_ia32_struct *envp = NULL;
-	int state_size = fpu_kernel_xstate_size;
 	int ia32_fxstate = (buf != buf_fx);
 	struct task_struct *tsk = current;
 	struct fpu *fpu = &tsk->thread.fpu;
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6620d0a3caff..2012b17b1793 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -627,13 +627,18 @@ static void check_xstate_against_struct(int nr)
  */
 static void do_extra_xstate_size_checks(void)
 {
-	int paranoid_xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+	int paranoid_min_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+	int paranoid_max_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
 	int i;
 
 	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
+		bool dynamic;
+
 		if (!xfeature_enabled(i))
 			continue;
 
+		dynamic = (xfeatures_mask_user_dynamic & BIT_ULL(i)) ? true : false;
+
 		check_xstate_against_struct(i);
 		/*
 		 * Supervisor state components can be managed only by
@@ -643,23 +648,32 @@ static void do_extra_xstate_size_checks(void)
 			XSTATE_WARN_ON(xfeature_is_supervisor(i));
 
 		/* Align from the end of the previous feature */
-		if (xfeature_is_aligned(i))
-			paranoid_xstate_size = ALIGN(paranoid_xstate_size, 64);
+		if (xfeature_is_aligned(i)) {
+			paranoid_max_size = ALIGN(paranoid_max_size, 64);
+			if (!dynamic)
+				paranoid_min_size = ALIGN(paranoid_min_size, 64);
+		}
 		/*
 		 * The offset of a given state in the non-compacted
 		 * format is given to us in a CPUID leaf.  We check
 		 * them for being ordered (increasing offsets) in
 		 * setup_xstate_features().
 		 */
-		if (!using_compacted_format())
-			paranoid_xstate_size = xfeature_uncompacted_offset(i);
+		if (!using_compacted_format()) {
+			paranoid_max_size = xfeature_uncompacted_offset(i);
+			if (!dynamic)
+				paranoid_min_size = xfeature_uncompacted_offset(i);
+		}
 		/*
 		 * The compacted-format offset always depends on where
 		 * the previous state ended.
 		 */
-		paranoid_xstate_size += xfeature_size(i);
+		paranoid_max_size += xfeature_size(i);
+		if (!dynamic)
+			paranoid_min_size += xfeature_size(i);
 	}
-	XSTATE_WARN_ON(paranoid_xstate_size != fpu_kernel_xstate_size);
+	XSTATE_WARN_ON(paranoid_max_size != fpu_kernel_xstate_max_size);
+	fpu_kernel_xstate_min_size = paranoid_min_size;
 }
 
 
@@ -744,27 +758,27 @@ static bool is_supported_xstate_size(unsigned int test_xstate_size)
 static int __init init_xstate_size(void)
 {
 	/* Recompute the context size for enabled features: */
-	unsigned int possible_xstate_size;
+	unsigned int possible_max_xstate_size;
 	unsigned int xsave_size;
 
 	xsave_size = get_xsave_size();
 
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		possible_xstate_size = get_xsaves_size_no_dynamic();
+		possible_max_xstate_size = get_xsaves_size_no_dynamic();
 	else
-		possible_xstate_size = xsave_size;
-
-	/* Ensure we have the space to store all enabled: */
-	if (!is_supported_xstate_size(possible_xstate_size))
-		return -EINVAL;
+		possible_max_xstate_size = xsave_size;
 
 	/*
 	 * The size is OK, we are definitely going to use xsave,
 	 * make it known to the world that we need more space.
 	 */
-	fpu_kernel_xstate_size = possible_xstate_size;
+	fpu_kernel_xstate_max_size = possible_max_xstate_size;
 	do_extra_xstate_size_checks();
 
+	/* Ensure the static in-line buffer size is supported: */
+	if (!is_supported_xstate_size(fpu_kernel_xstate_min_size))
+		return -EINVAL;
+
 	/*
 	 * User space is always in standard format.
 	 */
@@ -869,7 +883,7 @@ void __init fpu__init_system_xstate(void)
 
 	pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is %d bytes, using '%s' format.\n",
 		xfeatures_mask_all,
-		fpu_kernel_xstate_size,
+		fpu_kernel_xstate_max_size,
 		boot_cpu_has(X86_FEATURE_XSAVES) ? "compacted" : "standard");
 	return;
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 145a7ac0c19a..326b16aefb06 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -96,6 +96,12 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	return fpu__copy(dst, src);
 }
 
+void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
+{
+	*offset = offsetof(struct thread_struct, fpu.state);
+	*size = fpu_kernel_xstate_min_size;
+}
+
 /*
  * Free thread data structures etc..
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a087bbf252b6..4aecfba04bd3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9220,7 +9220,7 @@ static void kvm_save_current_fpu(struct fpu *fpu)
 	 */
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		memcpy(&fpu->state, &current->thread.fpu.state,
-		       fpu_kernel_xstate_size);
+		       fpu_kernel_xstate_min_size);
 	else
 		copy_fpregs_to_fpstate(fpu);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (5 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2021-01-26 20:17   ` Borislav Petkov
  2020-12-23 15:57 ` [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data Chang S. Bae
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

The static per-task xstate buffer contains the extended register states,
but it is not expandable at runtime. Introduce runtime methods and new fpu
struct fields to support the expansion.

fpu->state_mask indicates the saved states per task and fpu->state_ptr
points to the dynamically allocated buffer.
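
The two fields combine as follows (a sketch of the intended invariant,
not code from this patch):

	/*
	 * fpu->state_ptr == NULL: fpu->state holds the components named
	 *                         in fpu->state_mask (no dynamic states).
	 * fpu->state_ptr != NULL: the vmalloc()ed buffer holds everything
	 *                         in fpu->state_mask, dynamic states
	 *                         included.
	 */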

alloc_xstate_buffer() uses vmalloc(). If use of this mechanism grows to
allocate buffers larger than 64KB, a more sophisticated allocation scheme
that includes purpose-built reclaim capability might be justified.

Introduce a new helper -- get_xstate_size() to calculate the buffer size.

No functional change until the kernel supports dynamic user states.
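
As a rough sketch of the intended call pattern (cover_states() is a
hypothetical caller; the real first-use path arrives with the XFD patch
later in this series):

	/* Grow the task's xstate buffer to also cover 'new_states'. */
	static int cover_states(struct fpu *fpu, u64 new_states)
	{
		/*
		 * alloc_xstate_buffer() sizes the buffer via
		 * get_xstate_size(), is a no-op when the current buffer
		 * already has room, and otherwise vmalloc()s a larger
		 * buffer, frees the old one, and updates fpu->state_mask.
		 */
		return alloc_xstate_buffer(fpu, new_states);
	}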

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the changelog with task->fpu removed. (Boris Petkov)
* Replaced 'area' with 'buffer' in the comments and the changelog.
* Updated the code comments.

Changes from v1:
* Removed unneeded interrupt masking (Andy Lutomirski)
* Added vmalloc() error tracing (Dave Hansen, PeterZ, and Andy Lutomirski)
---
 arch/x86/include/asm/fpu/types.h  |  29 ++++++--
 arch/x86/include/asm/fpu/xstate.h |   3 +
 arch/x86/include/asm/trace/fpu.h  |   5 ++
 arch/x86/kernel/fpu/core.c        |   3 +
 arch/x86/kernel/fpu/xstate.c      | 115 ++++++++++++++++++++++++++++++
 5 files changed, 150 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f5a38a5f3ae1..3fc6dbbe3ede 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -336,14 +336,33 @@ struct fpu {
 	 */
 	unsigned long			avx512_timestamp;
 
+	/*
+	 * @state_mask:
+	 *
+	 * The state component bitmap. It indicates which xstate is saved
+	 * in either @state or @state_ptr. The bitmap describes @state at
+	 * first, and @state_ptr once that buffer is in use.
+	 */
+	u64				state_mask;
+
+	/*
+	 * @state_ptr:
+	 *
+	 * In-memory copy of all extended register states, in a dynamically
+	 * allocated buffer. While the task runs, the registers hold the
+	 * most current state. When the task context-switches away, the
+	 * registers get saved here, and this copy becomes more recent
+	 * than @state.
+	 */
+	union fpregs_state		*state_ptr;
+
 	/*
 	 * @state:
 	 *
-	 * In-memory copy of all FPU registers that we save/restore
-	 * over context switches. If the task is using the FPU then
-	 * the registers in the FPU are more recent than this state
-	 * copy. If the task context-switches away then they get
-	 * saved here and represent the FPU state.
+	 * In-memory copy of some extended register state. If a task uses
+	 * a dynamically allocated buffer, @state_ptr, then that buffer
+	 * holds the more recent state copy. Otherwise, this copy behaves
+	 * as described for @state_ptr.
 	 */
 	union fpregs_state		state;
 	/*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 6ce8350672c2..379e8f8b8440 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -103,6 +103,9 @@ extern void __init update_regset_xstate_info(unsigned int size,
 					     u64 xstate_mask);
 
 void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
+int alloc_xstate_buffer(struct fpu *fpu, u64 mask);
+void free_xstate_buffer(struct fpu *fpu);
+
 const void *get_xsave_field_ptr(int xfeature_nr);
 int using_compacted_format(void);
 int xfeature_size(int xfeature_nr);
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index 879b77792f94..bf88b3333873 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -89,6 +89,11 @@ DEFINE_EVENT(x86_fpu, x86_fpu_xstate_check_failed,
 	TP_ARGS(fpu)
 );
 
+DEFINE_EVENT(x86_fpu, x86_fpu_xstate_alloc_failed,
+	TP_PROTO(struct fpu *fpu),
+	TP_ARGS(fpu)
+);
+
 #undef TRACE_INCLUDE_PATH
 #define TRACE_INCLUDE_PATH asm/trace/
 #undef TRACE_INCLUDE_FILE
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1a428803e6b2..6dafed34be4f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -235,6 +235,9 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
 	 */
 	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_min_size);
 
+	dst_fpu->state_mask = xfeatures_mask_all & ~xfeatures_mask_user_dynamic;
+	dst_fpu->state_ptr = NULL;
+
 	/*
 	 * If the FPU registers are not current just memcpy() the state.
 	 * Otherwise save current FPU registers directly into the child's FPU
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2012b17b1793..af4d7d9aa977 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -10,6 +10,7 @@
 #include <linux/pkeys.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
+#include <linux/vmalloc.h>
 
 #include <asm/fpu/api.h>
 #include <asm/fpu/internal.h>
@@ -19,6 +20,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/cpufeature.h>
+#include <asm/trace/fpu.h>
 
 /*
  * Although we spell it out in here, the Processor Trace
@@ -71,6 +73,7 @@ static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] =
 static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 1] = -1};
 static unsigned int xstate_comp_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
 static unsigned int xstate_supervisor_only_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
+static bool xstate_aligns[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = false};
 
 /*
  * The XSAVE area of kernel can be in standard or compacted format;
@@ -130,6 +133,48 @@ static bool xfeature_is_supervisor(int xfeature_nr)
 	return ecx & 1;
 }
 
+/*
+ * Available once the offset, size, and alignment info arrays have been
+ * set up by setup_xstate_features().
+ */
+static unsigned int get_xstate_size(u64 mask)
+{
+	unsigned int size;
+	u64 xmask;
+	int i, nr;
+
+	if (!mask)
+		return 0;
+	else if (mask == (xfeatures_mask_all & ~xfeatures_mask_user_dynamic))
+		return fpu_kernel_xstate_min_size;
+	else if (mask == xfeatures_mask_all)
+		return fpu_kernel_xstate_max_size;
+
+	nr = fls64(mask) - 1;
+
+	if (!using_compacted_format())
+		return xstate_offsets[nr] + xstate_sizes[nr];
+
+	xmask = BIT_ULL(nr + 1) - 1;
+
+	if (mask == (xmask & xfeatures_mask_all))
+		return xstate_comp_offsets[nr] + xstate_sizes[nr];
+
+	/*
+	 * No precomputed size matches the given mask; calculate the size by
+	 * summing up each requested state component.
+	 */
+	for (size = FXSAVE_SIZE + XSAVE_HDR_SIZE, i = FIRST_EXTENDED_XFEATURE; i <= nr; i++) {
+		if (!(mask & BIT_ULL(i)))
+			continue;
+
+		if (xstate_aligns[i])
+			size = ALIGN(size, 64);
+		size += xstate_sizes[i];
+	}
+	return size;
+}
+
 /*
  * When executing XSAVEOPT (or other optimized XSAVE instructions), if
  * a processor implementation detects that an FPU state component is still
@@ -270,10 +315,12 @@ static void __init setup_xstate_features(void)
 	xstate_offsets[XFEATURE_FP]	= 0;
 	xstate_sizes[XFEATURE_FP]	= offsetof(struct fxregs_state,
 						   xmm_space);
+	xstate_aligns[XFEATURE_FP]	= true;
 
 	xstate_offsets[XFEATURE_SSE]	= xstate_sizes[XFEATURE_FP];
 	xstate_sizes[XFEATURE_SSE]	= sizeof_field(struct fxregs_state,
 						       xmm_space);
+	xstate_aligns[XFEATURE_SSE]	= true;
 
 	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
 		if (!xfeature_enabled(i))
@@ -291,6 +338,7 @@ static void __init setup_xstate_features(void)
 			continue;
 
 		xstate_offsets[i] = ebx;
+		xstate_aligns[i] = !!(ecx & 2);
 
 		/*
 		 * In our xstate size checks, we assume that the highest-numbered
@@ -755,6 +803,9 @@ static bool is_supported_xstate_size(unsigned int test_xstate_size)
 	return false;
 }
 
+/* Warn when the xstate buffer would grow beyond this threshold size */
+#define XSTATE_BUFFER_MAX_BYTES		(64 * 1024)
+
 static int __init init_xstate_size(void)
 {
 	/* Recompute the context size for enabled features: */
@@ -779,6 +830,14 @@ static int __init init_xstate_size(void)
 	if (!is_supported_xstate_size(fpu_kernel_xstate_min_size))
 		return -EINVAL;
 
+	/*
+	 * If buffers larger than this threshold become common, a more
+	 * sophisticated allocation mechanism might be worth considering.
+	 */
+	if (fpu_kernel_xstate_max_size > XSTATE_BUFFER_MAX_BYTES)
+		pr_warn("x86/fpu: xstate buffer too large (%u > %u)\n",
+			fpu_kernel_xstate_max_size, XSTATE_BUFFER_MAX_BYTES);
+
 	/*
 	 * User space is always in standard format.
 	 */
@@ -869,6 +928,9 @@ void __init fpu__init_system_xstate(void)
 	if (err)
 		goto out_disable;
 
+	/* Make sure init_task does not include the dynamic user states */
+	current->thread.fpu.state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
+
 	/*
 	 * Update info used for ptrace frames; use standard-format size and no
 	 * supervisor xstates:
@@ -1089,6 +1151,59 @@ static inline bool xfeatures_mxcsr_quirk(u64 xfeatures)
 	return true;
 }
 
+void free_xstate_buffer(struct fpu *fpu)
+{
+	vfree(fpu->state_ptr);
+}
+
+/*
+ * Allocate an xstate buffer with the size calculated based on 'mask'.
+ *
+ * The allocation mechanism does not shrink or reclaim the buffer.
+ */
+int alloc_xstate_buffer(struct fpu *fpu, u64 mask)
+{
+	union fpregs_state *state_ptr;
+	unsigned int oldsz, newsz;
+	u64 state_mask;
+
+	state_mask = fpu->state_mask | mask;
+
+	oldsz = get_xstate_size(fpu->state_mask);
+	newsz = get_xstate_size(state_mask);
+
+	if (oldsz >= newsz)
+		return 0;
+
+	if (newsz > fpu_kernel_xstate_max_size) {
+		pr_warn_once("x86/fpu: xstate buffer too large (%u > %u bytes)\n",
+			     newsz, fpu_kernel_xstate_max_size);
+		XSTATE_WARN_ON(1);
+		return 0;
+	}
+
+	/* A 64B-aligned pointer is needed; vmalloc()'s page-aligned address suffices. */
+	state_ptr = vmalloc(newsz);
+	if (!state_ptr) {
+		trace_x86_fpu_xstate_alloc_failed(fpu);
+		return -ENOMEM;
+	}
+
+	memset(state_ptr, 0, newsz);
+	if (using_compacted_format())
+		fpstate_init_xstate(&state_ptr->xsave, state_mask);
+
+	/*
+	 * The current register state is intact. It gets saved into the new
+	 * buffer at the next context switch/copy or ptrace-driven xstate write.
+	 */
+
+	vfree(fpu->state_ptr);
+	fpu->state_ptr = state_ptr;
+	fpu->state_mask = state_mask;
+	return 0;
+}
+
 static void fill_gap(struct membuf *to, unsigned *last, unsigned offset)
 {
 	if (*last >= offset)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (6 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2021-02-08 12:33   ` Borislav Petkov
  2020-12-23 15:57 ` [PATCH v3 09/21] x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer access Chang S. Bae
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

init_fpstate records the initial xstate value for convenience and covers
all the states. But covering large states whose initial data is trivial
wastes memory.

Limit the size and coverage of init_fpstate to everything except the
dynamic user states. The dynamic states are assumed to be large and to
have all-zero initial data.

No functional change until the kernel supports dynamic user states.
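
The resulting invariants can be sketched as (illustrative checks, not
part of the patch):

	/* init_fpstate covers everything except the dynamic user states: */
	WARN_ON(get_init_fpstate_mask() & xfeatures_mask_user_dynamic);

	/* ... and its size tracks the static part of the per-task buffer: */
	WARN_ON(get_init_fpstate_size() != fpu_kernel_xstate_min_size);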

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the changelog for clarification.
* Updated the code comments.
---
 arch/x86/include/asm/fpu/internal.h | 18 +++++++++++++++---
 arch/x86/include/asm/fpu/xstate.h   |  1 +
 arch/x86/kernel/fpu/core.c          |  4 ++--
 arch/x86/kernel/fpu/xstate.c        |  4 ++--
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 37ea5e37f21c..bbdd304719c6 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -80,6 +80,18 @@ static __always_inline __pure bool use_fxsr(void)
 
 extern union fpregs_state init_fpstate;
 
+static inline u64 get_init_fpstate_mask(void)
+{
+	/* init_fpstate covers states in fpu->state. */
+	return (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
+}
+
+static inline unsigned int get_init_fpstate_size(void)
+{
+	/* The fpu->state size matches the init_fpstate size. */
+	return fpu_kernel_xstate_min_size;
+}
+
 extern void fpstate_init(struct fpu *fpu);
 #ifdef CONFIG_MATH_EMULATION
 extern void fpstate_init_soft(struct swregs_state *soft);
@@ -269,12 +281,12 @@ static inline void copy_fxregs_to_kernel(struct fpu *fpu)
 		     : "memory")
 
 /*
- * This function is called only during boot time when x86 caps are not set
- * up and alternative can not be used yet.
+ * Use this function to dump the initial state, only during boot when the
+ * x86 caps are not set up and alternatives are not available yet.
  */
 static inline void copy_xregs_to_kernel_booting(struct xregs_state *xstate)
 {
-	u64 mask = xfeatures_mask_all;
+	u64 mask = get_init_fpstate_mask();
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 379e8f8b8440..62f6583f34fa 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -103,6 +103,7 @@ extern void __init update_regset_xstate_info(unsigned int size,
 					     u64 xstate_mask);
 
 void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
+unsigned int get_xstate_size(u64 mask);
 int alloc_xstate_buffer(struct fpu *fpu, u64 mask);
 void free_xstate_buffer(struct fpu *fpu);
 
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 6dafed34be4f..aad1a7102096 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -206,10 +206,10 @@ void fpstate_init(struct fpu *fpu)
 		return;
 	}
 
-	memset(state, 0, fpu_kernel_xstate_min_size);
+	memset(state, 0, fpu ? get_xstate_size(fpu->state_mask) : get_init_fpstate_size());
 
 	if (static_cpu_has(X86_FEATURE_XSAVES))
-		fpstate_init_xstate(&state->xsave, xfeatures_mask_all);
+		fpstate_init_xstate(&state->xsave, fpu ? fpu->state_mask : get_init_fpstate_mask());
 	if (static_cpu_has(X86_FEATURE_FXSR))
 		fpstate_init_fxstate(&state->fxsave);
 	else
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index af4d7d9aa977..43877005b4e2 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -137,7 +137,7 @@ static bool xfeature_is_supervisor(int xfeature_nr)
  * Available once the offset, size, and alignment info arrays have been
  * set up by setup_xstate_features().
  */
-static unsigned int get_xstate_size(u64 mask)
+unsigned int get_xstate_size(u64 mask)
 {
 	unsigned int size;
 	u64 xmask;
@@ -511,7 +511,7 @@ static void __init setup_init_fpu_buf(void)
 	print_xstate_features();
 
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		fpstate_init_xstate(&init_fpstate.xsave, xfeatures_mask_all);
+		fpstate_init_xstate(&init_fpstate.xsave, get_init_fpstate_mask());
 
 	/*
 	 * Init all the features state with header.xfeatures being 0x0
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 09/21] x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer access
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (7 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2021-02-08 12:33   ` Borislav Petkov
  2020-12-23 15:57 ` [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate Chang S. Bae
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

The struct fpu includes two possible xstate buffers -- fpu->state and
fpu->state_ptr. Instead of open-coding the choice between them at each
access, provide wrappers that cover both cases.

KVM does not yet use fpu->state_ptr, and so it is left unchanged.

No functional change until the kernel supports dynamic user states.
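
The choice the wrappers centralize is small but recurs at every access
site (a sketch of the semantics; see the internal.h hunk below):

	union fpregs_state *xstate = __xstate(fpu); /* fpu->state_ptr ?: &fpu->state */
	struct xregs_state *xsave  = __xsave(fpu);  /* &__xstate(fpu)->xsave */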

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the changelog with task->fpu removed. (Boris Petkov)
---
 arch/x86/include/asm/fpu/internal.h | 10 ++++++----
 arch/x86/include/asm/fpu/xstate.h   | 10 ++++++++++
 arch/x86/include/asm/trace/fpu.h    |  6 ++++--
 arch/x86/kernel/fpu/core.c          | 27 ++++++++++++++++-----------
 arch/x86/kernel/fpu/regset.c        | 28 +++++++++++++++++-----------
 arch/x86/kernel/fpu/signal.c        | 23 +++++++++++++----------
 arch/x86/kernel/fpu/xstate.c        | 20 +++++++++++---------
 7 files changed, 77 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index bbdd304719c6..67ffd1d7c95e 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -210,10 +210,12 @@ static inline int copy_user_to_fregs(struct fregs_state __user *fx)
 
 static inline void copy_fxregs_to_kernel(struct fpu *fpu)
 {
+	union fpregs_state *xstate = __xstate(fpu);
+
 	if (IS_ENABLED(CONFIG_X86_32))
-		asm volatile( "fxsave %[fx]" : [fx] "=m" (fpu->state.fxsave));
+		asm volatile("fxsave %[fx]" : [fx] "=m" (xstate->fxsave));
 	else
-		asm volatile("fxsaveq %[fx]" : [fx] "=m" (fpu->state.fxsave));
+		asm volatile("fxsaveq %[fx]" : [fx] "=m" (xstate->fxsave));
 }
 
 /* These macros all use (%edi)/(%rdi) as the single memory argument. */
@@ -411,7 +413,7 @@ static inline int copy_user_to_xregs(struct xregs_state __user *buf, u64 mask)
  */
 static inline int copy_kernel_to_xregs_err(struct fpu *fpu, u64 mask)
 {
-	struct xregs_state *xstate = &fpu->state.xsave;
+	struct xregs_state *xstate = __xsave(fpu);
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
@@ -440,7 +442,7 @@ static inline void __copy_kernel_to_fpregs(union fpregs_state *fpstate, u64 mask
 
 static inline void copy_kernel_to_fpregs(struct fpu *fpu)
 {
-	union fpregs_state *fpstate = &fpu->state;
+	union fpregs_state *fpstate = __xstate(fpu);
 
 	/*
 	 * AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 62f6583f34fa..5927033e017f 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -102,6 +102,16 @@ extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 extern void __init update_regset_xstate_info(unsigned int size,
 					     u64 xstate_mask);
 
+static inline union fpregs_state *__xstate(struct fpu *fpu)
+{
+	return (fpu->state_ptr) ? fpu->state_ptr : &fpu->state;
+}
+
+static inline struct xregs_state *__xsave(struct fpu *fpu)
+{
+	return &__xstate(fpu)->xsave;
+}
+
 void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
 unsigned int get_xstate_size(u64 mask);
 int alloc_xstate_buffer(struct fpu *fpu, u64 mask);
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index bf88b3333873..4b21c34436f9 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -22,8 +22,10 @@ DECLARE_EVENT_CLASS(x86_fpu,
 		__entry->fpu		= fpu;
 		__entry->load_fpu	= test_thread_flag(TIF_NEED_FPU_LOAD);
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
-			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
-			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
+			struct xregs_state *xsave = __xsave(fpu);
+
+			__entry->xfeatures = xsave->header.xfeatures;
+			__entry->xcomp_bv  = xsave->header.xcomp_bv;
 		}
 	),
 	TP_printk("x86/fpu: %p load: %d xfeatures: %llx xcomp_bv: %llx",
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index aad1a7102096..8b9d3ec9ac46 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -94,14 +94,18 @@ EXPORT_SYMBOL(irq_fpu_usable);
  */
 int copy_fpregs_to_fpstate(struct fpu *fpu)
 {
+	union fpregs_state *xstate = __xstate(fpu);
+
 	if (likely(use_xsave())) {
-		copy_xregs_to_kernel(&fpu->state.xsave);
+		struct xregs_state *xsave = &xstate->xsave;
+
+		copy_xregs_to_kernel(xsave);
 
 		/*
 		 * AVX512 state is tracked here because its use is
 		 * known to slow the max clock speed of the core.
 		 */
-		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
+		if (xsave->header.xfeatures & XFEATURE_MASK_AVX512)
 			fpu->avx512_timestamp = jiffies;
 		return 1;
 	}
@@ -115,7 +119,7 @@ int copy_fpregs_to_fpstate(struct fpu *fpu)
 	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
 	 * so we have to mark them inactive:
 	 */
-	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
+	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (xstate->fsave));
 
 	return 0;
 }
@@ -197,7 +201,7 @@ void fpstate_init(struct fpu *fpu)
 	union fpregs_state *state;
 
 	if (fpu)
-		state = &fpu->state;
+		state = __xstate(fpu);
 	else
 		state = &init_fpstate;
 
@@ -248,7 +252,7 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
 	 */
 	fpregs_lock();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_min_size);
+		memcpy(__xstate(dst_fpu), __xstate(src_fpu), fpu_kernel_xstate_min_size);
 
 	else if (!copy_fpregs_to_fpstate(dst_fpu))
 		copy_kernel_to_fpregs(dst_fpu);
@@ -384,7 +388,7 @@ static void fpu__clear(struct fpu *fpu, bool user_only)
 	if (user_only) {
 		if (!fpregs_state_valid(fpu, smp_processor_id()) &&
 		    xfeatures_mask_supervisor())
-			copy_kernel_to_xregs(&fpu->state.xsave,
+			copy_kernel_to_xregs(__xsave(fpu),
 					     xfeatures_mask_supervisor());
 		copy_init_fpstate_to_fpregs(xfeatures_mask_user());
 	} else {
@@ -451,6 +455,7 @@ EXPORT_SYMBOL_GPL(fpregs_mark_activate);
 
 int fpu__exception_code(struct fpu *fpu, int trap_nr)
 {
+	union fpregs_state *xstate = __xstate(fpu);
 	int err;
 
 	if (trap_nr == X86_TRAP_MF) {
@@ -466,11 +471,11 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
 		 * fully reproduce the context of the exception.
 		 */
 		if (boot_cpu_has(X86_FEATURE_FXSR)) {
-			cwd = fpu->state.fxsave.cwd;
-			swd = fpu->state.fxsave.swd;
+			cwd = xstate->fxsave.cwd;
+			swd = xstate->fxsave.swd;
 		} else {
-			cwd = (unsigned short)fpu->state.fsave.cwd;
-			swd = (unsigned short)fpu->state.fsave.swd;
+			cwd = (unsigned short)xstate->fsave.cwd;
+			swd = (unsigned short)xstate->fsave.swd;
 		}
 
 		err = swd & ~cwd;
@@ -484,7 +489,7 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
 		unsigned short mxcsr = MXCSR_DEFAULT;
 
 		if (boot_cpu_has(X86_FEATURE_XMM))
-			mxcsr = fpu->state.fxsave.mxcsr;
+			mxcsr = xstate->fxsave.mxcsr;
 
 		err = ~(mxcsr >> 7) & mxcsr;
 	}
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 5e13e58d11d4..8d863240b9c6 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -37,7 +37,7 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
 	fpu__prepare_read(fpu);
 	fpstate_sanitize_xstate(fpu);
 
-	return membuf_write(&to, &fpu->state.fxsave, sizeof(struct fxregs_state));
+	return membuf_write(&to, &__xstate(fpu)->fxsave, sizeof(struct fxregs_state));
 }
 
 int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
@@ -45,6 +45,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
 		const void *kbuf, const void __user *ubuf)
 {
 	struct fpu *fpu = &target->thread.fpu;
+	union fpregs_state *xstate;
 	int ret;
 
 	if (!boot_cpu_has(X86_FEATURE_FXSR))
@@ -53,20 +54,22 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
 	fpu__prepare_write(fpu);
 	fpstate_sanitize_xstate(fpu);
 
+	xstate = __xstate(fpu);
+
 	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
-				 &fpu->state.fxsave, 0, -1);
+				 &xstate->fxsave, 0, -1);
 
 	/*
 	 * mxcsr reserved bits must be masked to zero for security reasons.
 	 */
-	fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
+	xstate->fxsave.mxcsr &= mxcsr_feature_mask;
 
 	/*
 	 * update the header bits in the xsave header, indicating the
 	 * presence of FP and SSE state.
 	 */
 	if (boot_cpu_has(X86_FEATURE_XSAVE))
-		fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
+		xstate->xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
 
 	return ret;
 }
@@ -80,7 +83,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
 	if (!boot_cpu_has(X86_FEATURE_XSAVE))
 		return -ENODEV;
 
-	xsave = &fpu->state.xsave;
+	xsave = __xsave(fpu);
 
 	fpu__prepare_read(fpu);
 
@@ -120,7 +123,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 	if ((pos != 0) || (count < fpu_user_xstate_size))
 		return -EFAULT;
 
-	xsave = &fpu->state.xsave;
+	xsave = __xsave(fpu);
 
 	fpu__prepare_write(fpu);
 
@@ -224,7 +227,7 @@ static inline u32 twd_fxsr_to_i387(struct fxregs_state *fxsave)
 void
 convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
 {
-	struct fxregs_state *fxsave = &tsk->thread.fpu.state.fxsave;
+	struct fxregs_state *fxsave = &__xstate(&tsk->thread.fpu)->fxsave;
 	struct _fpreg *to = (struct _fpreg *) &env->st_space[0];
 	struct _fpxreg *from = (struct _fpxreg *) &fxsave->st_space[0];
 	int i;
@@ -297,7 +300,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
 		return fpregs_soft_get(target, regset, to);
 
 	if (!boot_cpu_has(X86_FEATURE_FXSR)) {
-		return membuf_write(&to, &fpu->state.fsave,
+		return membuf_write(&to, &__xstate(fpu)->fsave,
 				    sizeof(struct fregs_state));
 	}
 
@@ -318,6 +321,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 {
 	struct fpu *fpu = &target->thread.fpu;
 	struct user_i387_ia32_struct env;
+	union fpregs_state *xstate;
 	int ret;
 
 	fpu__prepare_write(fpu);
@@ -326,9 +330,11 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 	if (!boot_cpu_has(X86_FEATURE_FPU))
 		return fpregs_soft_set(target, regset, pos, count, kbuf, ubuf);
 
+	xstate = __xstate(fpu);
+
 	if (!boot_cpu_has(X86_FEATURE_FXSR))
 		return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
-					  &fpu->state.fsave, 0,
+					  &xstate->fsave, 0,
 					  -1);
 
 	if (pos > 0 || count < sizeof(env))
@@ -336,14 +342,14 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 
 	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &env, 0, -1);
 	if (!ret)
-		convert_to_fxsr(&target->thread.fpu.state.fxsave, &env);
+		convert_to_fxsr(&__xstate(&target->thread.fpu)->fxsave, &env);
 
 	/*
 	 * update the header bit in the xsave header, indicating the
 	 * presence of FP.
 	 */
 	if (boot_cpu_has(X86_FEATURE_XSAVE))
-		fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_FP;
+		xstate->xsave.header.xfeatures |= XFEATURE_MASK_FP;
 	return ret;
 }
 
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index b6d2706b6886..59b6111d3223 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -58,7 +58,7 @@ static inline int check_for_xstate(struct fxregs_state __user *buf,
 static inline int save_fsave_header(struct task_struct *tsk, void __user *buf)
 {
 	if (use_fxsr()) {
-		struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
+		struct xregs_state *xsave = __xsave(&tsk->thread.fpu);
 		struct user_i387_ia32_struct env;
 		struct _fpstate_32 __user *fp = buf;
 
@@ -152,8 +152,8 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
  *
  * Try to save it directly to the user frame with disabled page fault handler.
  * If this fails then do the slow path where the FPU state is first saved to
- * task's fpu->state and then copy it to the user frame pointed to by the
- * aligned pointer 'buf_fx'.
+ * task->fpu and then copy it to the user frame pointed to by the aligned
+ * pointer 'buf_fx'.
  *
  * If this is a 32-bit frame with fxstate, put a fsave header before
  * the aligned state at 'buf_fx'.
@@ -216,7 +216,7 @@ sanitize_restored_user_xstate(struct fpu *fpu,
 			      struct user_i387_ia32_struct *ia32_env,
 			      u64 user_xfeatures, int fx_only)
 {
-	struct xregs_state *xsave = &fpu->state.xsave;
+	struct xregs_state *xsave = __xsave(fpu);
 	struct xstate_header *header = &xsave->header;
 
 	if (use_xsave()) {
@@ -253,7 +253,7 @@ sanitize_restored_user_xstate(struct fpu *fpu,
 		xsave->i387.mxcsr &= mxcsr_feature_mask;
 
 		if (ia32_env)
-			convert_to_fxsr(&fpu->state.fxsave, ia32_env);
+			convert_to_fxsr(&__xstate(fpu)->fxsave, ia32_env);
 	}
 }
 
@@ -295,6 +295,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	struct task_struct *tsk = current;
 	struct fpu *fpu = &tsk->thread.fpu;
 	struct user_i387_ia32_struct env;
+	union fpregs_state *xstate;
 	u64 user_xfeatures = 0;
 	int fx_only = 0;
 	int ret = 0;
@@ -335,6 +336,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	if ((unsigned long)buf_fx % 64)
 		fx_only = 1;
 
+	xstate = __xstate(fpu);
+
 	if (!ia32_fxstate) {
 		/*
 		 * Attempt to restore the FPU registers directly from user
@@ -363,7 +366,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 			 */
 			if (test_thread_flag(TIF_NEED_FPU_LOAD) &&
 			    xfeatures_mask_supervisor())
-				copy_kernel_to_xregs(&fpu->state.xsave,
+				copy_kernel_to_xregs(&xstate->xsave,
 						     xfeatures_mask_supervisor());
 			fpregs_mark_activate();
 			fpregs_unlock();
@@ -429,7 +432,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		ret = copy_kernel_to_xregs_err(fpu, user_xfeatures | xfeatures_mask_supervisor());
 
 	} else if (use_fxsr()) {
-		ret = __copy_from_user(&fpu->state.fxsave, buf_fx, state_size);
+		ret = __copy_from_user(&xstate->fxsave, buf_fx, state_size);
 		if (ret) {
 			ret = -EFAULT;
 			goto err_out;
@@ -445,14 +448,14 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
 		}
 
-		ret = copy_kernel_to_fxregs_err(&fpu->state.fxsave);
+		ret = copy_kernel_to_fxregs_err(&xstate->fxsave);
 	} else {
-		ret = __copy_from_user(&fpu->state.fsave, buf_fx, state_size);
+		ret = __copy_from_user(&xstate->fsave, buf_fx, state_size);
 		if (ret)
 			goto err_out;
 
 		fpregs_lock();
-		ret = copy_kernel_to_fregs_err(&fpu->state.fsave);
+		ret = copy_kernel_to_fregs_err(&xstate->fsave);
 	}
 	if (!ret)
 		fpregs_mark_activate();
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 43877005b4e2..8dfbc7d1702a 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -192,14 +192,16 @@ unsigned int get_xstate_size(u64 mask)
  */
 void fpstate_sanitize_xstate(struct fpu *fpu)
 {
-	struct fxregs_state *fx = &fpu->state.fxsave;
+	union fpregs_state *xstate = __xstate(fpu);
+	struct xregs_state *xsave = &xstate->xsave;
+	struct fxregs_state *fx = &xstate->fxsave;
 	int feature_bit;
 	u64 xfeatures;
 
 	if (!use_xsaveopt())
 		return;
 
-	xfeatures = fpu->state.xsave.header.xfeatures;
+	xfeatures = xsave->header.xfeatures;
 
 	/*
 	 * None of the feature bits are in init state. So nothing else
@@ -244,7 +246,7 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
 			int offset = xstate_comp_offsets[feature_bit];
 			int size = xstate_sizes[feature_bit];
 
-			memcpy((void *)fx + offset,
+			memcpy((void *)xsave + offset,
 			       (void *)&init_fpstate.xsave + offset,
 			       size);
 		}
@@ -992,7 +994,7 @@ static void *__raw_xsave_addr(struct fpu *fpu, int xfeature_nr)
 	}
 
 	if (fpu)
-		xsave = &fpu->state.xsave;
+		xsave = __xsave(fpu);
 	else
 		xsave = &init_fpstate.xsave;
 
@@ -1035,7 +1037,7 @@ void *get_xsave_addr(struct fpu *fpu, int xfeature_nr)
 		  "get of unsupported state");
 
 	if (fpu)
-		xsave = &fpu->state.xsave;
+		xsave = __xsave(fpu);
 	else
 		xsave = &init_fpstate.xsave;
 
@@ -1236,7 +1238,7 @@ void copy_xstate_to_kernel(struct membuf to, struct fpu *fpu)
 	unsigned last = 0;
 	int i;
 
-	xsave = &fpu->state.xsave;
+	xsave = __xsave(fpu);
 
 	/*
 	 * The destination is a ptrace buffer; we put in only user xstates:
@@ -1314,7 +1316,7 @@ int copy_kernel_to_xstate(struct fpu *fpu, const void *kbuf)
 		}
 	}
 
-	xsave = &fpu->state.xsave;
+	xsave = __xsave(fpu);
 
 	if (xfeatures_mxcsr_quirk(hdr.xfeatures)) {
 		offset = offsetof(struct fxregs_state, mxcsr);
@@ -1372,7 +1374,7 @@ int copy_user_to_xstate(struct fpu *fpu, const void __user *ubuf)
 		}
 	}
 
-	xsave = &fpu->state.xsave;
+	xsave = __xsave(fpu);
 
 	if (xfeatures_mxcsr_quirk(hdr.xfeatures)) {
 		offset = offsetof(struct fxregs_state, mxcsr);
@@ -1417,7 +1419,7 @@ void copy_supervisor_to_kernel(struct fpu *fpu)
 	max_bit = __fls(xfeatures_mask_supervisor());
 	min_bit = __ffs(xfeatures_mask_supervisor());
 
-	xstate = &fpu->state.xsave;
+	xstate = __xsave(fpu);
 	lmask = xfeatures_mask_supervisor();
 	hmask = xfeatures_mask_supervisor() >> 32;
 	XSTATE_OP(XSAVES, xstate, lmask, hmask, err);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (8 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 09/21] x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer access Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2021-01-07  8:41   ` Liu, Jing2
  2021-02-08 12:33   ` Borislav Petkov
  2020-12-23 15:57 ` [PATCH v3 11/21] x86/fpu/xstate: Update xstate buffer address finder " Chang S. Bae
                   ` (11 subsequent siblings)
  21 siblings, 2 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae, kvm

copy_xregs_to_kernel() used to save all user states in a kernel buffer.
With dynamic user states enabled, which states get saved becomes
conditional.

fpu->state_mask indicates which state components have space reserved in
the XSAVE buffer. Use it as the XSAVE instruction mask to select the
states to save.

KVM used to save all xstate via copy_xregs_to_kernel(). Update KVM to set a
valid fpu->state_mask, which will be necessary to correctly handle dynamic
state buffers.

No functional change until the kernel supports dynamic user states.
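
For reference, the mask reaches the XSAVE* instructions through EDX:EAX,
so the helper splits it the same way the existing code already does:

	u32 lmask = mask;        /* EAX: requested feature bits 31:0  */
	u32 hmask = mask >> 32;  /* EDX: requested feature bits 63:32 */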

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v2:
* Updated the changelog to clarify the KVM code changes.
---
 arch/x86/include/asm/fpu/internal.h |  3 +--
 arch/x86/kernel/fpu/core.c          |  2 +-
 arch/x86/kvm/x86.c                  | 11 ++++++++---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 67ffd1d7c95e..d409a6ae0c38 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -332,9 +332,8 @@ static inline void copy_kernel_to_xregs_booting(struct xregs_state *xstate)
 /*
  * Save processor xstate to xsave area.
  */
-static inline void copy_xregs_to_kernel(struct xregs_state *xstate)
+static inline void copy_xregs_to_kernel(struct xregs_state *xstate, u64 mask)
 {
-	u64 mask = xfeatures_mask_all;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8b9d3ec9ac46..5a12e4b22db2 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -99,7 +99,7 @@ int copy_fpregs_to_fpstate(struct fpu *fpu)
 	if (likely(use_xsave())) {
 		struct xregs_state *xsave = &xstate->xsave;
 
-		copy_xregs_to_kernel(xsave);
+		copy_xregs_to_kernel(xsave, fpu->state_mask);
 
 		/*
 		 * AVX512 state is tracked here because its use is
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4aecfba04bd3..93b5bacad67a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9214,15 +9214,20 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 
 static void kvm_save_current_fpu(struct fpu *fpu)
 {
+	struct fpu *src_fpu = &current->thread.fpu;
+
 	/*
 	 * If the target FPU state is not resident in the CPU registers, just
 	 * memcpy() from current, else save CPU state directly to the target.
 	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		memcpy(&fpu->state, &src_fpu->state,
 		       fpu_kernel_xstate_min_size);
-	else
+	} else {
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;
 		copy_fpregs_to_fpstate(fpu);
+	}
 }
 
 /* Swap (qemu) user FPU context for the guest FPU context. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 11/21] x86/fpu/xstate: Update xstate buffer address finder to support dynamic xstate
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (9 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2021-02-19 15:00   ` Borislav Petkov
  2020-12-23 15:57 ` [PATCH v3 12/21] x86/fpu/xstate: Update xstate context copy function to support dynamic buffer Chang S. Bae
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

__raw_xsave_addr() returns a pointer to the requested component in an
xstate buffer by simply looking it up in the offset table. The offset
used to be fixed, but, with dynamic user states, it becomes variable.

get_xstate_size() already contains a routine that finds an offset at
runtime. Factor that routine out and use it in the address finder as
well.

No functional change until the kernel enables dynamic user states.
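
For intuition, a worked example of the compacted-offset walk (component
sizes here are illustrative):

	/*
	 * mask = FP | SSE | YMM | TILEDATA; find offset(TILEDATA):
	 *   next_offset = FXSAVE_SIZE + XSAVE_HDR_SIZE = 512 + 64 = 576
	 *   YMM (256 bytes):        offset = 576, next_offset = 832
	 *   TILEDATA (64B-aligned): offset = ALIGN(832, 64) = 832
	 */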

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/fpu/xstate.c | 82 +++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 8dfbc7d1702a..6b863b2ca405 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -133,15 +133,50 @@ static bool xfeature_is_supervisor(int xfeature_nr)
 	return ecx & 1;
 }
 
+/*
+ * Available once the offset, size, and alignment info arrays have been
+ * set up by setup_xstate_features().
+ */
+static unsigned int __get_xstate_comp_offset(u64 mask, int feature_nr)
+{
+	u64 xmask = BIT_ULL(feature_nr + 1) - 1;
+	unsigned int next_offset, offset = 0;
+	int i;
+
+	if ((mask & xmask) == (xfeatures_mask_all & xmask))
+		return xstate_comp_offsets[feature_nr];
+
+	/*
+	 * No precomputed offset matches the given mask; calculate it by
+	 * summing up the sizes of the preceding state components.
+	 */
+
+	next_offset = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+
+	for (i = FIRST_EXTENDED_XFEATURE; i <= feature_nr; i++) {
+		if (!(mask & BIT_ULL(i)))
+			continue;
+
+		offset = xstate_aligns[i] ? ALIGN(next_offset, 64) : next_offset;
+		next_offset += xstate_sizes[i];
+	}
+
+	return offset;
+}
+
+static unsigned int get_xstate_comp_offset(struct fpu *fpu, int feature_nr)
+{
+	return __get_xstate_comp_offset(fpu->state_mask, feature_nr);
+}
+
 /*
  * Available once the offset, size, and alignment info arrays have been
  * set up by setup_xstate_features().
  */
 unsigned int get_xstate_size(u64 mask)
 {
-	unsigned int size;
-	u64 xmask;
-	int i, nr;
+	unsigned int offset;
+	int nr;
 
 	if (!mask)
 		return 0;
@@ -155,24 +190,8 @@ unsigned int get_xstate_size(u64 mask)
 	if (!using_compacted_format())
 		return xstate_offsets[nr] + xstate_sizes[nr];
 
-	xmask = BIT_ULL(nr + 1) - 1;
-
-	if (mask == (xmask & xfeatures_mask_all))
-		return xstate_comp_offsets[nr] + xstate_sizes[nr];
-
-	/*
-	 * No precomputed size matches the given mask; calculate the size by
-	 * summing up each requested state component.
-	 */
-	for (size = FXSAVE_SIZE + XSAVE_HDR_SIZE, i = FIRST_EXTENDED_XFEATURE; i <= nr; i++) {
-		if (!(mask & BIT_ULL(i)))
-			continue;
-
-		if (xstate_aligns[i])
-			size = ALIGN(size, 64);
-		size += xstate_sizes[i];
-	}
-	return size;
+	offset = __get_xstate_comp_offset(mask, nr);
+	return offset + xstate_sizes[nr];
 }
 
 /*
@@ -988,17 +1007,20 @@ static void *__raw_xsave_addr(struct fpu *fpu, int xfeature_nr)
 {
 	void *xsave;
 
-	if (!xfeature_enabled(xfeature_nr)) {
-		WARN_ON_FPU(1);
-		return NULL;
-	}
-
-	if (fpu)
-		xsave = __xsave(fpu);
-	else
+	if (!xfeature_enabled(xfeature_nr))
+		goto not_found;
+	else if (!fpu)
 		xsave = &init_fpstate.xsave;
+	else if (!(fpu->state_mask & BIT_ULL(xfeature_nr)))
+		goto not_found;
+	else
+		xsave = __xsave(fpu);
+
+	if (fpu)
+		return xsave + get_xstate_comp_offset(fpu, xfeature_nr);
+
+	/* init_fpstate is laid out with the non-dynamic components only. */
+	return xsave + __get_xstate_comp_offset(get_init_fpstate_mask(),
+						xfeature_nr);
 
-	return xsave + xstate_comp_offsets[xfeature_nr];
+not_found:
+	WARN_ON_FPU(1);
+	return NULL;
 }
 /*
  * Given the xsave area and a state inside, this function returns the
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 12/21] x86/fpu/xstate: Update xstate context copy function to support dynamic buffer
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (10 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 11/21] x86/fpu/xstate: Update xstate buffer address finder " Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 15:57 ` [PATCH v3 13/21] x86/fpu/xstate: Expand dynamic context switch buffer on first use Chang S. Bae
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

ptrace() and signal return paths use xstate context copy functions. They
allow callers to read (or write) xstate values in the target's buffer. With
dynamic user states, a component's position in the buffer may vary and the
initial value is not always stored in init_fpstate.

Change the helpers to find a component's offset accordingly.

When copying an initial value, explicitly check whether init_fpstate
covers the component. If it does not, zero the destination memory.
Otherwise, copy the value from init_fpstate.

No functional change until the kernel supports dynamic user states.
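
The per-component decision in the copy-out path boils down to the
following (an illustrative sketch; copy_feature_or_init() is a made-up
name, not added by this patch):

	static void copy_feature_or_init(struct membuf *to, struct fpu *fpu, int i)
	{
		unsigned int size = xstate_sizes[i];

		if (__xsave(fpu)->header.xfeatures & BIT_ULL(i)) {
			/* The task has live data for this component. */
			membuf_write(to, __raw_xsave_addr(fpu, i), size);
		} else if (get_init_fpstate_mask() & BIT_ULL(i)) {
			/* Enabled but unused: copy the recorded init value. */
			membuf_write(to, (void *)&init_fpstate.xsave +
				     xstate_offsets[i], size);
		} else {
			/* Dynamic state: the init value is all zeros. */
			membuf_zero(to, size);
		}
	}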

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the changelog with task->fpu removed. (Boris Petkov)
---
 arch/x86/kernel/fpu/xstate.c | 55 +++++++++++++++++++++++++++---------
 1 file changed, 41 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6b863b2ca405..1d7d0cce6cc5 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -248,12 +248,14 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
 	if (!(xfeatures & XFEATURE_MASK_SSE))
 		memset(&fx->xmm_space[0], 0, 256);
 
+	/* Make sure 'xfeatures' is a subset of fpu->state_mask */
+	xfeatures = ((xfeatures_mask_user() & fpu->state_mask) & ~xfeatures);
 	/*
 	 * First two features are FPU and SSE, which above we handled
 	 * in a special way already:
 	 */
 	feature_bit = 0x2;
-	xfeatures = (xfeatures_mask_user() & ~xfeatures) >> 2;
+	xfeatures >>= 2;
 
 	/*
 	 * Update all the remaining memory layouts according to their
@@ -262,12 +264,15 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
 	 */
 	while (xfeatures) {
 		if (xfeatures & 0x1) {
-			int offset = xstate_comp_offsets[feature_bit];
-			int size = xstate_sizes[feature_bit];
-
-			memcpy((void *)xsave + offset,
-			       (void *)&init_fpstate.xsave + offset,
-			       size);
+			unsigned int offset = get_xstate_comp_offset(fpu, feature_bit);
+			unsigned int size = xstate_sizes[feature_bit];
+
+			if (get_init_fpstate_mask() & BIT_ULL(feature_bit))
+				memcpy((void *)xsave + offset,
+				       (void *)&init_fpstate.xsave + offset,
+				       size);
+			else
+				memset((void *)xsave + offset, 0, size);
 		}
 
 		xfeatures >>= 1;
@@ -1232,7 +1237,10 @@ static void fill_gap(struct membuf *to, unsigned *last, unsigned offset)
 {
 	if (*last >= offset)
 		return;
-	membuf_write(to, (void *)&init_fpstate.xsave + *last, offset - *last);
+	if (offset <= get_init_fpstate_size())
+		membuf_write(to, (void *)&init_fpstate.xsave + *last, offset - *last);
+	else
+		membuf_zero(to, offset - *last);
 	*last = offset;
 }
 
@@ -1240,7 +1248,10 @@ static void copy_part(struct membuf *to, unsigned *last, unsigned offset,
 		      unsigned size, void *from)
 {
 	fill_gap(to, last, offset);
-	membuf_write(to, from, size);
+	if (from)
+		membuf_write(to, from, size);
+	else
+		membuf_zero(to, size);
 	*last = offset + size;
 }
 
@@ -1292,12 +1303,22 @@ void copy_xstate_to_kernel(struct membuf to, struct fpu *fpu)
 		  sizeof(header), &header);
 
 	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
+		u64 mask = BIT_ULL(i);
+		void *src;
 		/*
-		 * Copy only in-use xstates:
+		 * Copy the in-use xstate first. For a feature that is
+		 * enabled but not in use, find its init value -- stored in
+		 * init_fpstate or simply zeros -- and copy that instead.
 		 */
-		if ((header.xfeatures >> i) & 1) {
-			void *src = __raw_xsave_addr(fpu, i);
-
+		if (header.xfeatures & mask) {
+			src = __raw_xsave_addr(fpu, i);
+			copy_part(&to, &last, xstate_offsets[i],
+				  xstate_sizes[i], src);
+		} else if (xfeatures_mask_user() & mask) {
+			if (get_init_fpstate_mask() & mask)
+				src = (void *)&init_fpstate.xsave + last;
+			else
+				src = NULL;
 			copy_part(&to, &last, xstate_offsets[i],
 				  xstate_sizes[i], src);
 		}
@@ -1331,6 +1352,9 @@ int copy_kernel_to_xstate(struct fpu *fpu, const void *kbuf)
 		if (hdr.xfeatures & mask) {
 			void *dst = __raw_xsave_addr(fpu, i);
 
+			if (!dst)
+				continue;
+
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
 
@@ -1388,6 +1412,9 @@ int copy_user_to_xstate(struct fpu *fpu, const void __user *ubuf)
 		if (hdr.xfeatures & mask) {
 			void *dst = __raw_xsave_addr(fpu, i);
 
+			if (!dst)
+				continue;
+
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
 
@@ -1470,7 +1497,7 @@ void copy_supervisor_to_kernel(struct fpu *fpu)
 			continue;
 
 		/* Move xfeature 'i' into its normal location */
-		memmove(xbuf + xstate_comp_offsets[i],
+		memmove(xbuf + get_xstate_comp_offset(fpu, i),
 			xbuf + xstate_supervisor_only_offsets[i],
 			xstate_sizes[i]);
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 13/21] x86/fpu/xstate: Expand dynamic context switch buffer on first use
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (11 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 12/21] x86/fpu/xstate: Update xstate context copy function to support dynamic buffer Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 15:57 ` [PATCH v3 14/21] x86/fpu/xstate: Support ptracer-induced xstate buffer expansion Chang S. Bae
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

Intel's Extended Feature Disable (XFD) feature is an extension of the XSAVE
architecture. XFD allows the kernel to enable a feature state in XCR0 and
to receive a #NM trap when a task uses instructions accessing that state.
In this way, Linux can defer allocating the large XSAVE buffer until tasks
need it.

XFD introduces two MSRs: IA32_XFD to enable/disable the feature and
IA32_XFD_ERR to assist the #NM trap handler. Both use the same
state-component bitmap format as XCR0.
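
For example, arming first-use detection for the AMX tile data component
would be (illustrative; bit 18 is XTILEDATA in the ISA extensions
reference):

	wrmsrl_safe(MSR_IA32_XFD, BIT_ULL(18)); /* #NM on first tile access */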

Use this hardware capability to find the right time to expand the xstate
buffer. Introduce two sets of helper functions for that:

1. The first set is primarily for interacting with the XFD hardware:
	xdisable_setbits()
	xdisable_getbits()
	xdisable_switch()

2. The second set is for managing the first-use status and handling the
   #NM trap:
	xfirstuse_enabled()
	xfirstuse_not_detected()

The #NM handler induces the xstate buffer expansion to save the first-used
states.
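
Roughly, the #NM path then looks like this (a simplified sketch of the
traps.c change; error handling elided):

	u64 event_mask;

	rdmsrl_safe(MSR_IA32_XFD_ERR, &event_mask); /* which state trapped */
	if (event_mask & xfirstuse_enabled()) {
		wrmsrl_safe(MSR_IA32_XFD_ERR, 0);   /* acknowledge the event */
		if (!alloc_xstate_buffer(&current->thread.fpu, event_mask))
			xdisable_setbits(xfirstuse_not_detected(&current->thread.fpu));
	}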

If the standard (uncompacted) format is used in the kernel, the XSAVE
buffer has the maximum size already, and so XFD is not needed. The XFD
feature is enabled only when the compacted format is in use.

No functional change until the kernel enables dynamic user states and XFD.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Changed to enable XFD only when the compacted format is used.
* Updated the changelog with task->fpu removed. (Boris Petkov)

Changes from v1:
* Inlined the XFD-induced #NM handling code (Andy Lutomirski)
---
 arch/x86/include/asm/cpufeatures.h  |  1 +
 arch/x86/include/asm/fpu/internal.h | 51 ++++++++++++++++++++++++++++-
 arch/x86/include/asm/msr-index.h    |  2 ++
 arch/x86/kernel/cpu/cpuid-deps.c    |  1 +
 arch/x86/kernel/fpu/xstate.c        | 37 +++++++++++++++++++--
 arch/x86/kernel/process.c           |  5 +++
 arch/x86/kernel/process_32.c        |  2 +-
 arch/x86/kernel/process_64.c        |  2 +-
 arch/x86/kernel/traps.c             | 40 ++++++++++++++++++++++
 9 files changed, 135 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dad350d42ecf..5b6496ee3703 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -275,6 +275,7 @@
 #define X86_FEATURE_XSAVEC		(10*32+ 1) /* XSAVEC instruction */
 #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
+#define X86_FEATURE_XFD			(10*32+ 4) /* eXtended Feature Disabling */
 
 /*
  * Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d409a6ae0c38..5eba9a466249 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -573,11 +573,58 @@ static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  * Misc helper functions:
  */
 
+/* The first-use detection helpers: */
+
+static inline void xdisable_setbits(u64 value)
+{
+	wrmsrl_safe(MSR_IA32_XFD, value);
+}
+
+static inline u64 xdisable_getbits(void)
+{
+	u64 value;
+
+	rdmsrl_safe(MSR_IA32_XFD, &value);
+	return value;
+}
+
+static inline u64 xfirstuse_enabled(void)
+{
+	/* All the dynamic user components are first-use enabled. */
+	return xfeatures_mask_user_dynamic;
+}
+
+/*
+ * Convert the first-use status in fpu->state_mask to an IA32_XFD MSR value.
+ * The result is consumed only by xdisable_setbits().
+ */
+static inline u64 xfirstuse_not_detected(struct fpu *fpu)
+{
+	u64 firstuse_bv = (fpu->state_mask & xfirstuse_enabled());
+
+	/*
+	 * Set the bit for each state whose first use has not been
+	 * detected yet. A state without detection enabled always has a
+	 * zero bit in firstuse_bv, so the following conversion works:
+	 */
+	return (xfirstuse_enabled() ^ firstuse_bv);
+}
+
+/* Update MSR IA32_XFD based on the incoming task's first-use status */
+static inline void xdisable_switch(struct fpu *prev, struct fpu *next)
+{
+	if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
+		return;
+
+	if (unlikely(prev->state_mask != next->state_mask))
+		xdisable_setbits(xfirstuse_not_detected(next));
+}
+
 /*
  * Load PKRU from the FPU context if available. Delay loading of the
  * complete FPU state until the return to userland.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu)
+static inline void switch_fpu_finish(struct fpu *old_fpu, struct fpu *new_fpu)
 {
 	u32 pkru_val = init_pkru_value;
 	struct pkru_state *pk;
@@ -587,6 +634,8 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 
 	set_thread_flag(TIF_NEED_FPU_LOAD);
 
+	xdisable_switch(old_fpu, new_fpu);
+
 	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
 		return;
 
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 972a34d93505..f8b5f9b3c845 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -617,6 +617,8 @@
 #define MSR_IA32_BNDCFGS_RSVD		0x00000ffc
 
 #define MSR_IA32_XSS			0x00000da0
+#define MSR_IA32_XFD			0x000001c4
+#define MSR_IA32_XFD_ERR		0x000001c5
 
 #define MSR_IA32_APICBASE		0x0000001b
 #define MSR_IA32_APICBASE_BSP		(1<<8)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index d502241995a3..a9e8e160ae30 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -71,6 +71,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_PER_THREAD_MBA,		X86_FEATURE_MBA       },
+	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{}
 };
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1d7d0cce6cc5..592e67ff6fa7 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -133,6 +133,21 @@ static bool xfeature_is_supervisor(int xfeature_nr)
 	return ecx & 1;
 }
 
+static bool xfeature_disable_supported(int xfeature_nr)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!boot_cpu_has(X86_FEATURE_XFD))
+		return false;
+
+	/*
+	 * If state component 'i' supports xfeature disable (first-use
+	 * detection), ECX[2] returns 1; otherwise, 0.
+	 */
+	cpuid_count(XSTATE_CPUID, xfeature_nr, &eax, &ebx, &ecx, &edx);
+	return ecx & 4;
+}
+
 /*
  * Available once those arrays for the offset, size, and alignment info are set up,
  * by setup_xstate_features().
@@ -316,6 +331,9 @@ void fpu__init_cpu_xstate(void)
 		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor() |
 				     xfeatures_mask_supervisor_dynamic());
 	}
+
+	if (boot_cpu_has(X86_FEATURE_XFD))
+		xdisable_setbits(xfirstuse_enabled());
 }
 
 static bool xfeature_enabled(enum xfeature xfeature)
@@ -515,8 +533,9 @@ static void __init print_xstate_offset_size(void)
 	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
 		if (!xfeature_enabled(i))
 			continue;
-		pr_info("x86/fpu: xstate_offset[%d]: %4d, xstate_sizes[%d]: %4d\n",
-			 i, xstate_comp_offsets[i], i, xstate_sizes[i]);
+		pr_info("x86/fpu: xstate_offset[%d]: %4d, xstate_sizes[%d]: %4d (%s)\n",
+			i, xstate_comp_offsets[i], i, xstate_sizes[i],
+			(xfeatures_mask_user_dynamic & BIT_ULL(i)) ? "on-demand" : "default");
 	}
 }
 
@@ -945,9 +964,18 @@ void __init fpu__init_system_xstate(void)
 	}
 
 	xfeatures_mask_all &= fpu__get_supported_xfeatures_mask();
-	/* Do not support the dynamically allocated buffer yet. */
 	xfeatures_mask_user_dynamic = 0;
 
+	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
+		u64 feature_mask = BIT_ULL(i);
+
+		if (!(xfeatures_mask_user() & feature_mask))
+			continue;
+
+		if (xfeature_disable_supported(i))
+			xfeatures_mask_user_dynamic |= feature_mask;
+	}
+
 	/* Enable xstate instructions to be able to continue with initialization: */
 	fpu__init_cpu_xstate();
 	err = init_xstate_size();
@@ -999,6 +1027,9 @@ void fpu__resume_cpu(void)
 		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor()  |
 				     xfeatures_mask_supervisor_dynamic());
 	}
+
+	if (boot_cpu_has(X86_FEATURE_XFD))
+		xdisable_setbits(xfirstuse_not_detected(&current->thread.fpu));
 }
 
 /*
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 326b16aefb06..3c335870051c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -102,6 +102,11 @@ void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
 	*size = fpu_kernel_xstate_min_size;
 }
 
+void arch_release_task_struct(struct task_struct *tsk)
+{
+	free_xstate_buffer(&tsk->thread.fpu);
+}
+
 /*
  * Free thread data structures etc..
  */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4f2f54e1281c..7bd5d08eeb41 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -213,7 +213,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	this_cpu_write(current_task, next_p);
 
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish(prev_fpu, next_fpu);
 
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index df342bedea88..4f3bef245863 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -595,7 +595,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish(prev_fpu, next_fpu);
 
 	/* Reload sp0. */
 	update_task_stack(next_p);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index e19df6cde35d..5dca7e70794f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1094,10 +1094,50 @@ DEFINE_IDTENTRY(exc_spurious_interrupt_bug)
 	 */
 }
 
+static __always_inline bool handle_xfirstuse_event(struct fpu *fpu)
+{
+	bool handled = false;
+	u64 event_mask;
+
+	/* Check whether the first-use detection is running. */
+	if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
+		return handled;
+
+	rdmsrl_safe(MSR_IA32_XFD_ERR, &event_mask);
+
+	/* The trap event should happen for one of the first-use enabled features. */
+	WARN_ON(!(event_mask & xfirstuse_enabled()));
+
+	/* If IA32_XFD_ERR is empty, the current trap has nothing to do with first-use detection. */
+	if (!event_mask)
+		return handled;
+
+	/*
+	 * The first-use event is presumed to be from userspace, so it should have
+	 * nothing to do with interrupt context.
+	 */
+	if (WARN_ON(in_interrupt()))
+		return handled;
+
+	if (alloc_xstate_buffer(fpu, event_mask))
+		return handled;
+
+	xdisable_setbits(xfirstuse_not_detected(fpu));
+
+	/* Clear the trap record. */
+	wrmsrl_safe(MSR_IA32_XFD_ERR, 0);
+	handled = true;
+
+	return handled;
+}
+
 DEFINE_IDTENTRY(exc_device_not_available)
 {
 	unsigned long cr0 = read_cr0();
 
+	if (handle_xfirstuse_event(&current->thread.fpu))
+		return;
+
 #ifdef CONFIG_MATH_EMULATION
 	if (!boot_cpu_has(X86_FEATURE_FPU) && (cr0 & X86_CR0_EM)) {
 		struct math_emu_info info = { };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread
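
For reference, the XFD round trip that this patch implements can be modeled
in a few lines. The following is a minimal userspace-style sketch, not
kernel code; the flat variables standing in for the IA32_XFD MSR and
fpu->state_mask are assumptions for illustration only:

#include <stdint.h>
#include <stdio.h>

#define XFEATURE_MASK_XTILE_DATA	(1ULL << 18)

static uint64_t msr_ia32_xfd;	/* stands in for the per-CPU IA32_XFD MSR */
static uint64_t state_mask;	/* stands in for fpu->state_mask */

/* Models xdisable_switch(): arm XFD for dynamic states not yet allocated. */
static void arm_xfd(uint64_t dynamic_mask)
{
	msr_ia32_xfd = dynamic_mask & ~state_mask;
}

/* Models handle_xfirstuse_event(): allocate the state, then disarm it. */
static void first_use(uint64_t xfd_err)
{
	state_mask |= xfd_err;			/* alloc_xstate_buffer() */
	arm_xfd(XFEATURE_MASK_XTILE_DATA);	/* xfirstuse_not_detected() */
}

int main(void)
{
	arm_xfd(XFEATURE_MASK_XTILE_DATA);
	printf("armed:  IA32_XFD=%#llx\n", (unsigned long long)msr_ia32_xfd);

	/* A TILELOADD here would raise #NM; model the handler's effect: */
	first_use(XFEATURE_MASK_XTILE_DATA);
	printf("in use: IA32_XFD=%#llx state_mask=%#llx\n",
	       (unsigned long long)msr_ia32_xfd,
	       (unsigned long long)state_mask);
	return 0;
}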

* [PATCH v3 14/21] x86/fpu/xstate: Support ptracer-induced xstate buffer expansion
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (12 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 13/21] x86/fpu/xstate: Expand dynamic context switch buffer on first use Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 15:57 ` [PATCH v3 15/21] x86/fpu/xstate: Extend the table to map xstate components with features Chang S. Bae
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

ptrace() may update xstate data before the target task has taken an XFD
fault and expanded the context switch buffer. Detect this case and allocate
a sufficient buffer to support the request. Also, disable the (now
unnecessary) associated first-use fault.

No functional change until the kernel supports dynamic user states.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the changelog with task->fpu removed. (Boris Petkov)
* Updated the code comments.
---
 arch/x86/kernel/fpu/regset.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 8d863240b9c6..16ff8ac765c1 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -125,6 +125,35 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 
 	xsave = __xsave(fpu);
 
+	/*
+	 * If a ptracer attempts to write states beyond what the target buffer
+	 * currently holds, expand the buffer dynamically before the write.
+	 */
+	if (count > get_xstate_size(fpu->state_mask)) {
+		unsigned int offset, size;
+		struct xstate_header hdr;
+		u64 mask;
+
+		offset = offsetof(struct xregs_state, header);
+		size = sizeof(hdr);
+
+		/* Retrieve XSTATE_BV */
+		if (kbuf) {
+			memcpy(&hdr, kbuf + offset, size);
+		} else {
+			ret = __copy_from_user(&hdr, ubuf + offset, size);
+			if (ret)
+				return ret;
+		}
+
+		mask = hdr.xfeatures & xfeatures_mask_user_dynamic;
+		if (mask) {
+			ret = alloc_xstate_buffer(fpu, mask);
+			if (ret)
+				return ret;
+		}
+	}
+
 	fpu__prepare_write(fpu);
 
 	if (using_compacted_format()) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread
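
For reference, a ptracer reaching this path might look like the hedged
sketch below. The buffer size is assumed to be discovered via CPUID, and
poke_tile_data() is a hypothetical helper, not part of this series:

#include <elf.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#define XFEATURE_XTILE_DATA	18
#define XSAVE_HDR_OFFSET	512

/* Hypothetical helper: write tile data into a stopped tracee. */
static long poke_tile_data(pid_t pid, void *xbuf, size_t full_size)
{
	uint64_t *xstate_bv = (uint64_t *)((char *)xbuf + XSAVE_HDR_OFFSET);
	struct iovec iov = { .iov_base = xbuf, .iov_len = full_size };

	if (ptrace(PTRACE_GETREGSET, pid, (void *)NT_X86_XSTATE, &iov))
		return -1;

	/* Marking XTILEDATA valid makes the kernel expand the buffer. */
	*xstate_bv |= 1ULL << XFEATURE_XTILE_DATA;

	return ptrace(PTRACE_SETREGSET, pid, (void *)NT_X86_XSTATE, &iov);
}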

* [PATCH v3 15/21] x86/fpu/xstate: Extend the table to map xstate components with features
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (13 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 14/21] x86/fpu/xstate: Support ptracer-induced xstate buffer expansion Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 15:57 ` [PATCH v3 16/21] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

At compile-time xfeatures_mask_all includes all possible XCR0 features. At
run-time fpu__init_system_xstate() clears features in xfeatures_mask_all
that are not enabled in CPUID. It does this by looping through all possible
XCR0 features.

Update the code to handle the possibility that there will be gaps in the
XCR0 feature bit numbers.

No functional change until hardware with bit-number gaps in XCR0 arrives.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v1:
* Rebased on the upstream kernel (5.10)
---
 arch/x86/kernel/fpu/xstate.c | 41 ++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 592e67ff6fa7..c2acfee581ba 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -43,18 +43,23 @@ static const char *xfeature_names[] =
 	"unknown xstate feature"	,
 };
 
-static short xsave_cpuid_features[] __initdata = {
-	X86_FEATURE_FPU,
-	X86_FEATURE_XMM,
-	X86_FEATURE_AVX,
-	X86_FEATURE_MPX,
-	X86_FEATURE_MPX,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_INTEL_PT,
-	X86_FEATURE_PKU,
-	X86_FEATURE_ENQCMD,
+struct xfeature_capflag_info {
+	int xfeature_idx;
+	short cpu_cap;
+};
+
+static struct xfeature_capflag_info xfeature_capflags[] __initdata = {
+	{ XFEATURE_FP,				X86_FEATURE_FPU },
+	{ XFEATURE_SSE,				X86_FEATURE_XMM },
+	{ XFEATURE_YMM,				X86_FEATURE_AVX },
+	{ XFEATURE_BNDREGS,			X86_FEATURE_MPX },
+	{ XFEATURE_BNDCSR,			X86_FEATURE_MPX },
+	{ XFEATURE_OPMASK,			X86_FEATURE_AVX512F },
+	{ XFEATURE_ZMM_Hi256,			X86_FEATURE_AVX512F },
+	{ XFEATURE_Hi16_ZMM,			X86_FEATURE_AVX512F },
+	{ XFEATURE_PT_UNIMPLEMENTED_SO_FAR,	X86_FEATURE_INTEL_PT },
+	{ XFEATURE_PKRU,			X86_FEATURE_PKU },
+	{ XFEATURE_PASID,			X86_FEATURE_ENQCMD },
 };
 
 /*
@@ -956,11 +961,15 @@ void __init fpu__init_system_xstate(void)
 	}
 
 	/*
-	 * Clear XSAVE features that are disabled in the normal CPUID.
+	 * Cross-check XSAVE feature with CPU capability flag. Clear the
+	 * mask bit for disabled features.
 	 */
-	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
-		if (!boot_cpu_has(xsave_cpuid_features[i]))
-			xfeatures_mask_all &= ~BIT_ULL(i);
+	for (i = 0; i < ARRAY_SIZE(xfeature_capflags); i++) {
+		short cpu_cap = xfeature_capflags[i].cpu_cap;
+		int idx = xfeature_capflags[i].xfeature_idx;
+
+		if (!boot_cpu_has(cpu_cap))
+			xfeatures_mask_all &= ~BIT_ULL(idx);
 	}
 
 	xfeatures_mask_all &= fpu__get_supported_xfeatures_mask();
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread
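
To make the bit-number gaps concrete: with the tile state components
defined later in this series, the xfeature numbers are no longer dense. A
small sketch of the arithmetic (the bit numbers are the ones this series
uses):

#include <stdio.h>

/* xfeature bit numbers used by this series. */
enum { XFEATURE_PASID = 10, XFEATURE_XTILE_CFG = 17, XFEATURE_XTILE_DATA = 18 };

int main(void)
{
	/*
	 * An array indexed by bit number, like the old xsave_cpuid_features[],
	 * would need filler entries for every unused bit in between:
	 */
	printf("filler entries needed: %d (bits 11..16)\n",
	       XFEATURE_XTILE_CFG - XFEATURE_PASID - 1);
	return 0;
}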

* [PATCH v3 16/21] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (14 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 15/21] x86/fpu/xstate: Extend the table to map xstate components with features Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 15:57 ` [PATCH v3 17/21] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

Intel's Advanced Matrix Extension (AMX) is a new 64-bit extended feature
consisting of two-dimensional registers and an accelerator unit. The first
implementation of the latter is the tile matrix multiply unit (TMUL). TMUL
performs SIMD dot-products on four bytes (INT8) or two bfloat16
floating-point (BF16) elements.

Here we add AMX to the kernel/user ABI, by enumerating the capability.
E.g., /proc/cpuinfo: amx_tile, amx_bf16, amx_int8

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/cpufeatures.h | 3 +++
 arch/x86/kernel/cpu/cpuid-deps.c   | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 5b6496ee3703..a1839b6a1929 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -375,6 +375,9 @@
 #define X86_FEATURE_TSXLDTRK		(18*32+16) /* TSX Suspend Load Address Tracking */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR		(18*32+19) /* Intel ARCH LBR */
+#define X86_FEATURE_AMX_BF16		(18*32+22) /* AMX BF16 Support */
+#define X86_FEATURE_AMX_TILE		(18*32+24) /* AMX tile Support */
+#define X86_FEATURE_AMX_INT8		(18*32+25) /* AMX INT8 Support */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index a9e8e160ae30..1cef9264067e 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -72,6 +72,9 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_PER_THREAD_MBA,		X86_FEATURE_MBA       },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
+	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_AMX_INT8,			X86_FEATURE_AMX_TILE  },
+	{ X86_FEATURE_AMX_BF16,			X86_FEATURE_AMX_TILE  },
 	{}
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread
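
For reference, these flags can also be read straight from CPUID; a minimal
sketch, assuming the word-18 bit positions above map to
CPUID.(EAX=7,ECX=0):EDX as cpufeatures.h defines them:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 1;

	printf("amx_bf16: %u\n", (edx >> 22) & 1);
	printf("amx_tile: %u\n", (edx >> 24) & 1);
	printf("amx_int8: %u\n", (edx >> 25) & 1);
	return 0;
}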

* [PATCH v3 17/21] x86/fpu/amx: Define AMX state components and have it used for boot-time checks
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (15 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 16/21] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 15:57 ` [PATCH v3 18/21] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

Linux uses check_xstate_against_struct() to sanity-check the size of
XSTATE-enabled features. AMX is an XSAVE-enabled feature, and its size is
not hard-coded but discoverable at run-time via CPUID.

The AMX state is composed of state components 17 and 18, both of which are
user state components. The first component is the XTILECFG state of a
64-byte tile-related control register. State component 18, called
XTILEDATA, contains the actual tile data, and its size varies across
implementations. The architectural maximum, as defined in CPUID(0x1d, 1):
EAX[15:0], is a byte less than 64KB. The first implementation supports 8KB.

Check the XTILEDATA state size dynamically. The feature introduces the new
tile register, TMM. Define one register struct only and read the number of
registers from CPUID. Cross-check the overall size with CPUID again.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the code comments.

Changes from v1:
* Rebased on the upstream kernel (5.10)
---
 arch/x86/include/asm/fpu/types.h  | 27 ++++++++++++++
 arch/x86/include/asm/fpu/xstate.h |  2 +
 arch/x86/kernel/fpu/xstate.c      | 62 +++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 3fc6dbbe3ede..bf9511efd546 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -120,6 +120,9 @@ enum xfeature {
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
+	XFEATURE_RSRVD_COMP_16,
+	XFEATURE_XTILE_CFG,
+	XFEATURE_XTILE_DATA,
 
 	XFEATURE_MAX,
 };
@@ -136,11 +139,15 @@ enum xfeature {
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
+#define XFEATURE_MASK_XTILE_CFG	(1 << XFEATURE_XTILE_CFG)
+#define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
 
 #define XFEATURE_MASK_FPSSE		(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512		(XFEATURE_MASK_OPMASK \
 					 | XFEATURE_MASK_ZMM_Hi256 \
 					 | XFEATURE_MASK_Hi16_ZMM)
+#define XFEATURE_MASK_XTILE		(XFEATURE_MASK_XTILE_DATA \
+					 | XFEATURE_MASK_XTILE_CFG)
 
 #define FIRST_EXTENDED_XFEATURE	XFEATURE_YMM
 
@@ -153,6 +160,9 @@ struct reg_256_bit {
 struct reg_512_bit {
 	u8	regbytes[512/8];
 };
+struct reg_1024_byte {
+	u8	regbytes[1024];
+};
 
 /*
  * State component 2:
@@ -255,6 +265,23 @@ struct arch_lbr_state {
 	u64 ler_to;
 	u64 ler_info;
 	struct lbr_entry		entries[];
+};
+
+/*
+ * State component 17: 64-byte tile configuration register.
+ */
+struct xtile_cfg {
+	u64				tcfg[8];
+} __packed;
+
+/*
+ * State component 18: 1KB tile data register.
+ * Each register represents 16 64-byte rows of the matrix
+ * data. But the number of registers depends on the actual
+ * implementation.
+ */
+struct xtile_data {
+	struct reg_1024_byte		tmm;
 } __packed;
 
 /*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 5927033e017f..08d3dd18d7d8 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -13,6 +13,8 @@
 
 #define XSTATE_CPUID		0x0000000d
 
+#define TILE_CPUID		0x0000001d
+
 #define FXSAVE_SIZE	512
 
 #define XSAVE_HDR_SIZE	    64
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c2acfee581ba..f54ff1d4a44b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -41,6 +41,14 @@ static const char *xfeature_names[] =
 	"Protection Keys User registers",
 	"PASID state",
 	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"AMX Tile config"		,
+	"AMX Tile data"			,
+	"unknown xstate feature"	,
 };
 
 struct xfeature_capflag_info {
@@ -60,6 +68,8 @@ static struct xfeature_capflag_info xfeature_capflags[] __initdata = {
 	{ XFEATURE_PT_UNIMPLEMENTED_SO_FAR,	X86_FEATURE_INTEL_PT },
 	{ XFEATURE_PKRU,			X86_FEATURE_PKU },
 	{ XFEATURE_PASID,			X86_FEATURE_ENQCMD },
+	{ XFEATURE_XTILE_CFG,			X86_FEATURE_AMX_TILE },
+	{ XFEATURE_XTILE_DATA,			X86_FEATURE_AMX_TILE }
 };
 
 /*
@@ -424,6 +434,8 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
+	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
 
 /*
@@ -676,6 +688,51 @@ static void __xstate_dump_leaves(void)
 	}								\
 } while (0)
 
+static void check_xtile_data_against_struct(int size)
+{
+	u32 max_palid, palid, state_size;
+	u32 eax, ebx, ecx, edx;
+	u16 max_tile;
+
+	/*
+	 * Check the maximum palette id:
+	 *   eax: the highest numbered palette subleaf.
+	 */
+	cpuid_count(TILE_CPUID, 0, &max_palid, &ebx, &ecx, &edx);
+
+	/*
+	 * Cross-check each tile size and find the maximum
+	 * number of supported tiles.
+	 */
+	for (palid = 1, max_tile = 0; palid <= max_palid; palid++) {
+		u16 tile_size, max;
+
+		/*
+		 * Check the tile size info:
+		 *   eax[31:16]:  bytes per tile
+		 *   ebx[31:16]:  the max names (or max number of tiles)
+		 */
+		cpuid_count(TILE_CPUID, palid, &eax, &ebx, &ecx, &edx);
+		tile_size = eax >> 16;
+		max = ebx >> 16;
+
+		if (WARN_ONCE(tile_size != sizeof(struct xtile_data),
+			      "%s: struct is %zu bytes, cpu xtile %d bytes\n",
+			      __stringify(XFEATURE_XTILE_DATA),
+			      sizeof(struct xtile_data), tile_size))
+			__xstate_dump_leaves();
+
+		if (max > max_tile)
+			max_tile = max;
+	}
+
+	state_size = sizeof(struct xtile_data) * max_tile;
+	if (WARN_ONCE(size != state_size,
+		      "%s: calculated size is %u bytes, cpu state %d bytes\n",
+		      __stringify(XFEATURE_XTILE_DATA), state_size, size))
+		__xstate_dump_leaves();
+}
+
 /*
  * We have a C struct for each 'xstate'.  We need to ensure
  * that our software representation matches what the CPU
@@ -699,6 +756,11 @@ static void check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
+	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
+
+	/* The tile data size varies between implementations */
+	if (nr == XFEATURE_XTILE_DATA)
+		check_xtile_data_against_struct(sz);
 
 	/*
 	 * Make *SURE* to add any feature numbers in below if
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread
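
The palette walk described above can be reproduced from userspace. A hedged
sketch following the CPUID(0x1d) layout in the patch (sub-leaf 0 reports
the highest palette; each palette sub-leaf reports the tile geometry):

#include <cpuid.h>
#include <stdio.h>

#define TILE_CPUID	0x1d

int main(void)
{
	unsigned int eax, ebx, ecx, edx, max_palid, palid;

	/* Sub-leaf 0: EAX reports the highest numbered palette. */
	__cpuid_count(TILE_CPUID, 0, eax, ebx, ecx, edx);
	max_palid = eax;

	for (palid = 1; palid <= max_palid; palid++) {
		/* EAX[31:16]: bytes per tile; EBX[31:16]: number of tiles. */
		__cpuid_count(TILE_CPUID, palid, eax, ebx, ecx, edx);
		printf("palette %u: %u bytes/tile, %u tiles, total %u bytes\n",
		       palid, eax >> 16, ebx >> 16,
		       (eax >> 16) * (ebx >> 16));
	}
	return 0;
}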

* [PATCH v3 18/21] x86/fpu/amx: Enable the AMX feature in 64-bit mode
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (16 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 17/21] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 15:57 ` [PATCH v3 19/21] selftest/x86/amx: Include test cases for the AMX state management Chang S. Bae
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae

In 64-bit mode, include the AMX state components in
XFEATURE_MASK_USER_SUPPORTED.

The XFD feature will be used to dynamically allocate per-task XSAVE
buffer on first use.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/fpu/xstate.h | 3 ++-
 arch/x86/kernel/fpu/init.c        | 8 ++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 08d3dd18d7d8..8f5218d420ad 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -34,7 +34,8 @@
 				      XFEATURE_MASK_Hi16_ZMM	 | \
 				      XFEATURE_MASK_PKRU | \
 				      XFEATURE_MASK_BNDREGS | \
-				      XFEATURE_MASK_BNDCSR)
+				      XFEATURE_MASK_BNDCSR | \
+				      XFEATURE_MASK_XTILE)
 
 /* All currently supported supervisor features */
 #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 5dac97158030..c77c1c5580f9 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -233,8 +233,12 @@ static void __init fpu__init_system_xstate_size_legacy(void)
  */
 u64 __init fpu__get_supported_xfeatures_mask(void)
 {
-	return XFEATURE_MASK_USER_SUPPORTED |
-	       XFEATURE_MASK_SUPERVISOR_SUPPORTED;
+	u64 mask = XFEATURE_MASK_USER_SUPPORTED | XFEATURE_MASK_SUPERVISOR_SUPPORTED;
+
+	if (!IS_ENABLED(CONFIG_X86_64))
+		mask &= ~(XFEATURE_MASK_XTILE);
+
+	return mask;
 }
 
 /* Legacy code to initialize eager fpu mode. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 19/21] selftest/x86/amx: Include test cases for the AMX state management
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (17 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 18/21] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 15:57 ` [PATCH v3 20/21] x86/fpu/xstate: Support dynamic user state in the signal handling path Chang S. Bae
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae, linux-kselftest

This selftest verifies that the kernel does not inherit AMX state at
fork() and that context switching preserves unique tile data across
multiple threads.

Also, ptrace() is used to insert AMX state into existing threads -- both
before and after the existing thread has initialized its AMX state.

Collect the test cases of validating those operations together, as they
share some common setup for the AMX state.

These test cases do not depend on AMX compiler support, as they use the
XSAVE instruction directly from userspace to access the AMX state.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
---
Changes from v2:
* Updated the test messages and the changelog as tile data is not inherited
  to a child anymore.
* Removed bytecode for the instructions already supported by binutils.
* Changed to check the XSAVE availability in a reliable way.

Changes from v1:
* Removed signal testing code
---
 tools/testing/selftests/x86/Makefile |   2 +-
 tools/testing/selftests/x86/amx.c    | 677 +++++++++++++++++++++++++++
 2 files changed, 678 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/amx.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 6703c7906b71..8408bbde788f 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -17,7 +17,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
-TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering
+TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering amx
 # Some selftests require 32bit support enabled also on 64bit systems
 TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
 
diff --git a/tools/testing/selftests/x86/amx.c b/tools/testing/selftests/x86/amx.c
new file mode 100644
index 000000000000..f4ecdfd27ae9
--- /dev/null
+++ b/tools/testing/selftests/x86/amx.c
@@ -0,0 +1,677 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <elf.h>
+#include <pthread.h>
+#include <sched.h>
+#include <setjmp.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <time.h>
+#include <malloc.h>
+#include <unistd.h>
+#include <ucontext.h>
+
+#include <linux/futex.h>
+
+#include <sys/ipc.h>
+#include <sys/mman.h>
+#include <sys/ptrace.h>
+#include <sys/shm.h>
+#include <sys/signal.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/uio.h>
+#include <sys/ucontext.h>
+
+#include <x86intrin.h>
+
+#ifndef __x86_64__
+# error This test is 64-bit only
+#endif
+
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+#define PAGE_SIZE			(1 << 12)
+
+#define NUM_TILES			8
+#define TILE_SIZE			1024
+#define XSAVE_SIZE			((NUM_TILES * TILE_SIZE) + PAGE_SIZE)
+
+struct xsave_data {
+	u8 area[XSAVE_SIZE];
+} __attribute__((aligned(64)));
+
+/* Tile configuration associated: */
+#define MAX_TILES			16
+#define RESERVED_BYTES			14
+
+struct tile_config {
+	u8  palette_id;
+	u8  start_row;
+	u8  reserved[RESERVED_BYTES];
+	u16 colsb[MAX_TILES];
+	u8  rows[MAX_TILES];
+};
+
+struct tile_data {
+	u8 data[NUM_TILES * TILE_SIZE];
+};
+
+static inline u64 __xgetbv(u32 index)
+{
+	u32 eax, edx;
+
+	asm volatile("xgetbv;"
+		     : "=a" (eax), "=d" (edx)
+		     : "c" (index));
+	return eax + ((u64)edx << 32);
+}
+
+static inline void __cpuid(u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
+{
+	asm volatile("cpuid;"
+		     : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
+		     : "0" (*eax), "2" (*ecx));
+}
+
+/* Load tile configuration */
+static inline void __ldtilecfg(void *cfg)
+{
+	asm volatile(".byte 0xc4,0xe2,0x78,0x49,0x00"
+		     : : "a"(cfg));
+}
+
+/* Load tile data to %tmm0 register only */
+static inline void __tileloadd(void *tile)
+{
+	asm volatile(".byte 0xc4,0xe2,0x7b,0x4b,0x04,0x10"
+		     : : "a"(tile), "d"(0));
+}
+
+/* Save extended states */
+static inline void __xsave(void *buffer, u32 lo, u32 hi)
+{
+	asm volatile("xsave (%%rdi)"
+		     : : "D" (buffer), "a" (lo), "d" (hi)
+		     : "memory");
+}
+
+/* Restore extended states */
+static inline void __xrstor(void *buffer, u32 lo, u32 hi)
+{
+	asm volatile("xrstor (%%rdi)"
+		     : : "D" (buffer), "a" (lo), "d" (hi));
+}
+
+/* Release tile states to init values */
+static inline void __tilerelease(void)
+{
+	asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0" ::);
+}
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
+{
+	struct sigaction sa;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+static void clearhandler(int sig)
+{
+	struct sigaction sa;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_handler = SIG_DFL;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+/* Hardware info check: */
+
+static jmp_buf jmpbuf;
+static bool xsave_disabled;
+
+static void handle_sigill(int sig, siginfo_t *si, void *ctx_void)
+{
+	xsave_disabled = true;
+	siglongjmp(jmpbuf, 1);
+}
+
+#define XFEATURE_XTILE_CFG      17
+#define XFEATURE_XTILE_DATA     18
+#define XFEATURE_MASK_XTILE     ((1 << XFEATURE_XTILE_DATA) | \
+				 (1 << XFEATURE_XTILE_CFG))
+
+static inline bool check_xsave_supports_xtile(void)
+{
+	bool supported = false;
+
+	sethandler(SIGILL, handle_sigill, 0);
+
+	if (!sigsetjmp(jmpbuf, 1))
+		supported = __xgetbv(0) & XFEATURE_MASK_XTILE;
+
+	clearhandler(SIGILL);
+	return supported;
+}
+
+struct xtile_hwinfo {
+	struct {
+		u16 bytes_per_tile;
+		u16 bytes_per_row;
+		u16 max_names;
+		u16 max_rows;
+	} spec;
+
+	struct {
+		u32 offset;
+		u32 size;
+	} xsave;
+};
+
+static struct xtile_hwinfo xtile;
+
+static bool __enum_xtile_config(void)
+{
+	u32 eax, ebx, ecx, edx;
+	u16 bytes_per_tile;
+	bool valid = false;
+
+#define TILE_CPUID			0x1d
+#define TILE_PALETTE_CPUID_SUBLEAVE	0x1
+
+	eax = TILE_CPUID;
+	ecx = TILE_PALETTE_CPUID_SUBLEAVE;
+
+	__cpuid(&eax, &ebx, &ecx, &edx);
+	if (!eax || !ebx || !ecx)
+		return valid;
+
+	xtile.spec.max_names = ebx >> 16;
+	if (xtile.spec.max_names < NUM_TILES)
+		return valid;
+
+	bytes_per_tile = eax >> 16;
+	if (bytes_per_tile < TILE_SIZE)
+		return valid;
+
+	xtile.spec.bytes_per_row = ebx;
+	xtile.spec.max_rows = ecx;
+	valid = true;
+
+	return valid;
+}
+
+static bool __enum_xsave_tile(void)
+{
+	u32 eax, ebx, ecx, edx;
+	bool valid = false;
+
+#define XSTATE_CPUID			0xd
+#define XSTATE_USER_STATE_SUBLEAVE	0x0
+
+	eax = XSTATE_CPUID;
+	ecx = XFEATURE_XTILE_DATA;
+
+	__cpuid(&eax, &ebx, &ecx, &edx);
+	if (!eax || !ebx)
+		return valid;
+
+	xtile.xsave.offset = ebx;
+	xtile.xsave.size = eax;
+	valid = true;
+
+	return valid;
+}
+
+static bool __check_xsave_size(void)
+{
+	u32 eax, ebx, ecx, edx;
+	bool valid = false;
+
+	eax = XSTATE_CPUID;
+	ecx = XSTATE_USER_STATE_SUBLEAVE;
+
+	__cpuid(&eax, &ebx, &ecx, &edx);
+	if (ebx && ebx <= XSAVE_SIZE)
+		valid = true;
+
+	return valid;
+}
+
+/*
+ * Check the hardware-provided tile state info and cross-check it with the
+ * hard-coded values: XSAVE_SIZE, NUM_TILES, and TILE_SIZE.
+ */
+static int check_xtile_hwinfo(void)
+{
+	bool success = false;
+
+	if (!__check_xsave_size())
+		return success;
+
+	if (!__enum_xsave_tile())
+		return success;
+
+	if (!__enum_xtile_config())
+		return success;
+
+	if (sizeof(struct tile_data) >= xtile.xsave.size)
+		success = true;
+
+	return success;
+}
+
+/* The helpers for managing XSAVE buffer and tile states: */
+
+/* Use the uncompacted format without 'init optimization' */
+static void save_xdata(void *data)
+{
+	__xsave(data, -1, -1);
+}
+
+static void restore_xdata(void *data)
+{
+	__xrstor(data, -1, -1);
+}
+
+static inline u64 __get_xsave_xstate_bv(void *data)
+{
+#define XSAVE_HDR_OFFSET	512
+	return *(u64 *)(data + XSAVE_HDR_OFFSET);
+}
+
+static void set_tilecfg(struct tile_config *cfg)
+{
+	int i;
+
+	memset(cfg, 0, sizeof(*cfg));
+	/* The first implementation has one significant palette with id 1 */
+	cfg->palette_id = 1;
+	for (i = 0; i < xtile.spec.max_names; i++) {
+		cfg->colsb[i] = xtile.spec.bytes_per_row;
+		cfg->rows[i] = xtile.spec.max_rows;
+	}
+}
+
+static void load_tilecfg(struct tile_config *cfg)
+{
+	__ldtilecfg(cfg);
+}
+
+static void make_tiles(void *tiles)
+{
+	u32 iterations = xtile.xsave.size / sizeof(u32);
+	static u32 value = 1;
+	u32 *ptr = tiles;
+	int i;
+
+	for (i = 0, ptr = tiles; i < iterations; i++, ptr++)
+		*ptr  = value;
+	value++;
+}
+
+/*
+ * Initialize the XSAVE buffer:
+ *
+ * Make sure the tile configuration is already loaded. Load limited tile data
+ * (%tmm0 only) and save all the states, so the buffer is ready to be completed with full tile data.
+ */
+static void init_xdata(void *data)
+{
+	struct tile_data tiles;
+
+	make_tiles(&tiles);
+	__tileloadd(&tiles);
+	__xsave(data, -1, -1);
+}
+
+static inline void *__get_xsave_tile_data_addr(void *data)
+{
+	return data + xtile.xsave.offset;
+}
+
+static void copy_tiles_to_xdata(void *xdata, void *tiles)
+{
+	void *dst = __get_xsave_tile_data_addr(xdata);
+
+	memcpy(dst, tiles, xtile.xsave.size);
+}
+
+static int compare_xdata_tiles(void *xdata, void *tiles)
+{
+	void *tile_data = __get_xsave_tile_data_addr(xdata);
+
+	if (memcmp(tile_data, tiles, xtile.xsave.size))
+		return 1;
+
+	return 0;
+}
+
+static int nerrs, errs;
+
+/* Testing tile data inheritance */
+
+static void test_tile_data_inheritance(void)
+{
+	struct xsave_data xdata;
+	struct tile_data tiles;
+	struct tile_config cfg;
+	pid_t child;
+	int status;
+
+	set_tilecfg(&cfg);
+	load_tilecfg(&cfg);
+	init_xdata(&xdata);
+
+	make_tiles(&tiles);
+	copy_tiles_to_xdata(&xdata, &tiles);
+	restore_xdata(&xdata);
+
+	errs = 0;
+
+	child = fork();
+	if (child < 0)
+		err(1, "fork");
+
+	if (child == 0) {
+		memset(&xdata, 0, sizeof(xdata));
+		save_xdata(&xdata);
+		if (compare_xdata_tiles(&xdata, &tiles)) {
+			printf("[OK]\tchild didn't inherit tile data at fork()\n");
+		} else {
+			printf("[FAIL]\tchild inherited tile data at fork()\n");
+			nerrs++;
+		}
+		_exit(0);
+	}
+	wait(&status);
+}
+
+static void test_fork(void)
+{
+	pid_t child;
+	int status;
+
+	child = fork();
+	if (child < 0)
+		err(1, "fork");
+
+	if (child == 0) {
+		test_tile_data_inheritance();
+		_exit(0);
+	}
+
+	wait(&status);
+}
+
+/* Context switching test */
+
+#define ITERATIONS			10
+#define NUM_THREADS			5
+
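+/*
+ * The threads pass a single shared futex around in a ring: the futex value
+ * encodes a thread id in bits [31:1], and bit 0 distinguishes the periodic
+ * iteration command from the final endpoint command (see the helpers below).
+ */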
+struct futex_info {
+	int current;
+	int next;
+	int *futex;
+};
+
+static inline void command_wait(struct futex_info *info, int value)
+{
+	do {
+		sched_yield();
+	} while (syscall(SYS_futex, info->futex, FUTEX_WAIT, value, 0, 0, 0));
+}
+
+static inline void command_wake(struct futex_info *info, int value)
+{
+	do {
+		*info->futex = value;
+		while (!syscall(SYS_futex, info->futex, FUTEX_WAKE, 1, 0, 0, 0))
+			sched_yield();
+	} while (0);
+}
+
+static inline int get_iterative_value(int id)
+{
+	return ((id << 1) & ~0x1);
+}
+
+static inline int get_endpoint_value(int id)
+{
+	return ((id << 1) | 0x1);
+}
+
+static void *check_tiles(void *info)
+{
+	struct futex_info *finfo = (struct futex_info *)info;
+	struct xsave_data xdata;
+	struct tile_data tiles;
+	struct tile_config cfg;
+	int i;
+
+	set_tilecfg(&cfg);
+	load_tilecfg(&cfg);
+	init_xdata(&xdata);
+
+	make_tiles(&tiles);
+	copy_tiles_to_xdata(&xdata, &tiles);
+	restore_xdata(&xdata);
+
+	for (i = 0; i < ITERATIONS; i++) {
+		command_wait(finfo, get_iterative_value(finfo->current));
+
+		memset(&xdata, 0, sizeof(xdata));
+		save_xdata(&xdata);
+		errs += compare_xdata_tiles(&xdata, &tiles);
+
+		make_tiles(&tiles);
+		copy_tiles_to_xdata(&xdata, &tiles);
+		restore_xdata(&xdata);
+
+		command_wake(finfo, get_iterative_value(finfo->next));
+	}
+
+	command_wait(finfo, get_endpoint_value(finfo->current));
+	__tilerelease();
+	return NULL;
+}
+
+static int create_children(int num, struct futex_info *finfo)
+{
+	const int shm_id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0666);
+	int *futex = shmat(shm_id, NULL, 0);
+	pthread_t thread;
+	int i;
+
+	for (i = 0; i < num; i++) {
+		finfo[i].futex = futex;
+		finfo[i].current = i + 1;
+		finfo[i].next = (i + 2) % (num + 1);
+
+		if (pthread_create(&thread, NULL, check_tiles, &finfo[i])) {
+			err(1, "pthread_create");
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static void test_context_switch(void)
+{
+	struct futex_info *finfo;
+	cpu_set_t cpuset;
+	int i;
+
+	printf("[RUN]\t%u context switches of tile states in %d threads\n",
+	       ITERATIONS * NUM_THREADS, NUM_THREADS);
+
+	errs = 0;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
+		err(1, "sched_setaffinity to CPU 0");
+
+	finfo = malloc(sizeof(*finfo) * NUM_THREADS);
+
+	if (create_children(NUM_THREADS, finfo))
+		return;
+
+	for (i = 0; i < ITERATIONS; i++) {
+		command_wake(finfo, get_iterative_value(1));
+		command_wait(finfo, get_iterative_value(0));
+	}
+
+	for (i = 1; i <= NUM_THREADS; i++)
+		command_wake(finfo, get_endpoint_value(i));
+
+	if (errs) {
+		printf("[FAIL]\t%u incorrect tile states\n", errs);
+		nerrs += errs;
+		return;
+	}
+
+	printf("[OK]\tall tile states are correct\n");
+}
+
+/* Ptrace test */
+
+static inline long get_tile_state(pid_t child, struct iovec *iov)
+{
+	return ptrace(PTRACE_GETREGSET, child, (u32)NT_X86_XSTATE, iov);
+}
+
+static inline long set_tile_state(pid_t child, struct iovec *iov)
+{
+	return ptrace(PTRACE_SETREGSET, child, (u32)NT_X86_XSTATE, iov);
+}
+
+static int write_tile_state(bool load_tile, pid_t child)
+{
+	struct xsave_data xdata;
+	struct tile_data tiles;
+	struct iovec iov;
+
+	iov.iov_base = &xdata;
+	iov.iov_len = sizeof(xdata);
+
+	if (get_tile_state(child, &iov))
+		err(1, "PTRACE_GETREGSET");
+
+	make_tiles(&tiles);
+	copy_tiles_to_xdata(&xdata, &tiles);
+	if (set_tile_state(child, &iov))
+		err(1, "PTRACE_SETREGSET");
+
+	memset(&xdata, 0, sizeof(xdata));
+	if (get_tile_state(child, &iov))
+		err(1, "PTRACE_GETREGSET");
+
+	if (!load_tile)
+		memset(&tiles, 0, sizeof(tiles));
+
+	return compare_xdata_tiles(&xdata, &tiles);
+}
+
+static void test_tile_state_write(bool load_tile)
+{
+	pid_t child;
+	int status;
+
+	child = fork();
+	if (child < 0)
+		err(1, "fork");
+
+	if (child == 0) {
+		printf("[RUN]\tPtrace-induced tile state write, ");
+		printf("%s tile data loaded\n", load_tile ? "with" : "without");
+
+		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL))
+			err(1, "PTRACE_TRACEME");
+
+		if (load_tile) {
+			struct tile_config cfg;
+			struct tile_data tiles;
+
+			set_tilecfg(&cfg);
+			load_tilecfg(&cfg);
+			make_tiles(&tiles);
+			/* Load only %tmm0, inducing the #NM */
+			__tileloadd(&tiles);
+		}
+
+		raise(SIGTRAP);
+		_exit(0);
+	}
+
+	do {
+		wait(&status);
+	} while (WSTOPSIG(status) != SIGTRAP);
+
+	errs = write_tile_state(load_tile, child);
+	if (errs) {
+		nerrs++;
+		printf("[FAIL]\t%s write\n", load_tile ? "incorrect" : "unexpected");
+	} else {
+		printf("[OK]\t%s write\n", load_tile ? "correct" : "no");
+	}
+
+	ptrace(PTRACE_DETACH, child, NULL, NULL);
+	wait(&status);
+}
+
+static void test_ptrace(void)
+{
+	bool ptracee_loads_tiles;
+
+	ptracee_loads_tiles = true;
+	test_tile_state_write(ptracee_loads_tiles);
+
+	ptracee_loads_tiles = false;
+	test_tile_state_write(ptracee_loads_tiles);
+}
+
+int main(void)
+{
+	/* Check hardware availability at first */
+
+	if (!check_xsave_supports_xtile()) {
+		if (xsave_disabled)
+			printf("XSAVE disabled.\n");
+		else
+			printf("Tile data not available.\n");
+		return 0;
+	}
+
+	if (!check_xtile_hwinfo()) {
+		printf("Available tile state size is insufficient to test.\n");
+		return 0;
+	}
+
+	nerrs = 0;
+
+	test_fork();
+	test_context_switch();
+	test_ptrace();
+
+	return nerrs ? 1 : 0;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 20/21] x86/fpu/xstate: Support dynamic user state in the signal handling path
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (18 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 19/21] selftest/x86/amx: Include test cases for the AMX state management Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 15:57 ` [PATCH v3 21/21] x86/fpu/xstate: Introduce boot-parameters to control some state component support Chang S. Bae
  2021-01-14 21:31 ` [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Bae, Chang Seok
  21 siblings, 0 replies; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae, linux-kselftest

When entering a signal handler, the kernel saves the xstate to the signal
frame. The dynamic user state is best saved only when in use, and
fpu->state_mask helps to exclude the unused state components.

Returning from a signal handler, XRSTOR re-initializes the excluded state
components.

Add a test case to verify in the signal handler that the signal frame
excludes AMX data when the signaled thread has initialized AMX state.

No functional change until the kernel supports the dynamic user states.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
---
Changes from v1:
* Made it revertible (moved close to the end of the series).
* Included the test case.
---
 arch/x86/include/asm/fpu/internal.h |  2 +-
 tools/testing/selftests/x86/amx.c   | 66 +++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 5eba9a466249..202874bb79da 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -369,7 +369,7 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
  */
 static inline int copy_xregs_to_user(struct xregs_state __user *buf)
 {
-	u64 mask = xfeatures_mask_user();
+	u64 mask = current->thread.fpu.state_mask;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
diff --git a/tools/testing/selftests/x86/amx.c b/tools/testing/selftests/x86/amx.c
index f4ecdfd27ae9..a7386b886532 100644
--- a/tools/testing/selftests/x86/amx.c
+++ b/tools/testing/selftests/x86/amx.c
@@ -650,6 +650,71 @@ static void test_ptrace(void)
 	test_tile_state_write(ptracee_loads_tiles);
 }
 
+/* Signal handling test */
+
+static int sigtrapped;
+struct tile_data sig_tiles, sighdl_tiles;
+
+static void handle_sigtrap(int sig, siginfo_t *info, void *ctx_void)
+{
+	ucontext_t *uctxt = (ucontext_t *)ctx_void;
+	struct xsave_data xdata;
+	struct tile_config cfg;
+	struct tile_data tiles;
+	u64 header;
+
+	header = __get_xsave_xstate_bv((void *)uctxt->uc_mcontext.fpregs);
+
+	if (header & (1 << XFEATURE_XTILE_DATA))
+		printf("[FAIL]\ttile data was written in sigframe\n");
+	else
+		printf("[OK]\ttile data was skipped in sigframe\n");
+
+	set_tilecfg(&cfg);
+	load_tilecfg(&cfg);
+	init_xdata(&xdata);
+
+	make_tiles(&tiles);
+	copy_tiles_to_xdata(&xdata, &tiles);
+	restore_xdata(&xdata);
+
+	save_xdata(&xdata);
+	if (compare_xdata_tiles(&xdata, &tiles))
+		err(1, "tile load fail");
+
+	printf("\tsignal handler: load tile data\n");
+
+	sigtrapped = sig;
+}
+
+static void test_signal_handling(void)
+{
+	struct xsave_data xdata = { 0 };
+	struct tile_data tiles = { 0 };
+
+	sethandler(SIGTRAP, handle_sigtrap, 0);
+	sigtrapped = 0;
+
+	printf("[RUN]\tCheck tile state management in handling signal\n");
+
+	printf("\tbefore signal: initial tile data state\n");
+
+	raise(SIGTRAP);
+
+	if (sigtrapped == 0)
+		err(1, "sigtrap");
+
+	save_xdata(&xdata);
+	if (compare_xdata_tiles(&xdata, &tiles)) {
+		printf("[FAIL]\ttile data was not loaded at sigreturn\n");
+		nerrs++;
+	} else {
+		printf("[OK]\ttile data was re-initialized at sigreturn\n");
+	}
+
+	clearhandler(SIGTRAP);
+}
+
 int main(void)
 {
 	/* Check hardware availability at first */
@@ -672,6 +737,7 @@ int main(void)
 	test_fork();
 	test_context_switch();
 	test_ptrace();
+	test_signal_handling();
 
 	return nerrs ? 1 : 0;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread
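
For reference, the sigframe check the test performs can be reduced to a few
lines. A minimal sketch that reads XSTATE_BV out of a signal frame,
mirroring __get_xsave_xstate_bv() above; the 512-byte legacy-area offset is
the same constant the test uses:

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>

#define XSAVE_HDR_OFFSET	512

static void handler(int sig, siginfo_t *si, void *ctx)
{
	ucontext_t *uc = ctx;
	uint64_t xstate_bv;

	/* The extended state area follows the legacy fpregs region. */
	memcpy(&xstate_bv,
	       (char *)uc->uc_mcontext.fpregs + XSAVE_HDR_OFFSET,
	       sizeof(xstate_bv));
	printf("XSTATE_BV in sigframe: %#llx\n",
	       (unsigned long long)xstate_bv);
}

int main(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGTRAP, &sa, NULL);
	raise(SIGTRAP);
	return 0;
}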

* [PATCH v3 21/21] x86/fpu/xstate: Introduce boot-parameters to control some state component support
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (19 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 20/21] x86/fpu/xstate: Support dynamic user state in the signal handling path Chang S. Bae
@ 2020-12-23 15:57 ` Chang S. Bae
  2020-12-23 18:37   ` Randy Dunlap
  2021-01-14 21:31 ` [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Bae, Chang Seok
  21 siblings, 1 reply; 64+ messages in thread
From: Chang S. Bae @ 2020-12-23 15:57 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	chang.seok.bae, linux-doc

"xstate.disable=0x60000" will disable AMX on a system that has AMX compiled
into XFEATURE_MASK_USER_ENABLED.

"xstate.enable=0x60000" will enable AMX on a system that does NOT have AMX
compiled into XFEATURE_MASK_USER_ENABLED (assuming the kernel is new enough
to support this feature).

Rename XFEATURE_MASK_USER_SUPPORTED to XFEATURE_MASK_USER_ENABLED to be
aligned with the new parameters.

While this cmdline is currently enabled only for AMX, it is intended to be
easily enabled to be useful for future XSAVE-enabled features.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Changed the kernel tainted when any unknown state is enabled. (Andy
  Lutomirski)
* Simplified the cmdline handling.
* Edited the changelog.

Changes from v1:
* Renamed the user state mask define (Andy Lutomirski and Dave Hansen)
* Changed the error message (Dave Hansen)
* Fixed xfeatures_mask_user()
* Rebased the upstream kernel (5.10) -- revived the param parse function
---
 .../admin-guide/kernel-parameters.txt         | 15 +++++
 arch/x86/include/asm/fpu/types.h              |  6 ++
 arch/x86/include/asm/fpu/xstate.h             | 24 +++----
 arch/x86/kernel/fpu/init.c                    | 65 +++++++++++++++++--
 4 files changed, 93 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 44fde25bb221..a67ae04d43c5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6002,6 +6002,21 @@
 			which allow the hypervisor to 'idle' the guest on lock
 			contention.
 
+	xstate.enable=	[X86-64]
+	xstate.disable=	[X86-64]
+			The kernel is compiled with a default xstate bitmask --
+			enabling it to use the XSAVE hardware to efficiently
+			save and restore thread states on context switch.
+			xstate.enable allows adding to that default mask at
+			boot-time without recompiling the kernel just to support
+			the new thread state. (Note that the kernel will ignore
+			any bits in the mask that do not correspond to features
+			that are actually available in CPUID)  xstate.disable
+			allows clearing bits in the default mask, forcing the
+			kernel to forget that it supports the specified thread
+			state. When a bit set for both, the kernel takes
+			xstate.disable in a priority.
+
 	xirc2ps_cs=	[NET,PCMCIA]
 			Format:
 			<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index bf9511efd546..8835d3f6acb7 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -149,6 +149,12 @@ enum xfeature {
 #define XFEATURE_MASK_XTILE		(XFEATURE_MASK_XTILE_DATA \
 					 | XFEATURE_MASK_XTILE_CFG)
 
+#define XFEATURE_REGION_MASK(max_bit, min_bit) \
+	((BIT_ULL((max_bit) - (min_bit) + 1) - 1) << (min_bit))
+
+#define XFEATURE_MASK_CONFIGURABLE \
+	XFEATURE_REGION_MASK(XFEATURE_XTILE_DATA, XFEATURE_XTILE_CFG)
+
 #define FIRST_EXTENDED_XFEATURE	XFEATURE_YMM
 
 struct reg_128_bit {
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 8f5218d420ad..c27feca8e66c 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -25,17 +25,17 @@
 
 #define XSAVE_ALIGNMENT     64
 
-/* All currently supported user features */
-#define XFEATURE_MASK_USER_SUPPORTED (XFEATURE_MASK_FP | \
-				      XFEATURE_MASK_SSE | \
-				      XFEATURE_MASK_YMM | \
-				      XFEATURE_MASK_OPMASK | \
-				      XFEATURE_MASK_ZMM_Hi256 | \
-				      XFEATURE_MASK_Hi16_ZMM	 | \
-				      XFEATURE_MASK_PKRU | \
-				      XFEATURE_MASK_BNDREGS | \
-				      XFEATURE_MASK_BNDCSR | \
-				      XFEATURE_MASK_XTILE)
+/* All currently enabled user features */
+#define XFEATURE_MASK_USER_ENABLED (XFEATURE_MASK_FP | \
+				    XFEATURE_MASK_SSE | \
+				    XFEATURE_MASK_YMM | \
+				    XFEATURE_MASK_OPMASK | \
+				    XFEATURE_MASK_ZMM_Hi256 | \
+				    XFEATURE_MASK_Hi16_ZMM	 | \
+				    XFEATURE_MASK_PKRU | \
+				    XFEATURE_MASK_BNDREGS | \
+				    XFEATURE_MASK_BNDCSR | \
+				    XFEATURE_MASK_XTILE)
 
 /* All currently supported supervisor features */
 #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
@@ -87,7 +87,7 @@ static inline u64 xfeatures_mask_supervisor(void)
 
 static inline u64 xfeatures_mask_user(void)
 {
-	return xfeatures_mask_all & XFEATURE_MASK_USER_SUPPORTED;
+	return xfeatures_mask_all & ~(XFEATURE_MASK_SUPERVISOR_ALL);
 }
 
 static inline u64 xfeatures_mask_supervisor_dynamic(void)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index c77c1c5580f9..f73aaae81ed9 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -5,6 +5,7 @@
 #include <asm/fpu/internal.h>
 #include <asm/tlbflush.h>
 #include <asm/setup.h>
+#include <asm/cmdline.h>
 
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -229,14 +230,45 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 /*
  * Find supported xfeatures based on cpu features and command-line input.
  * This must be called after fpu__init_parse_early_param() is called and
- * xfeatures_mask is enumerated.
+ * xfeatures_mask_all is enumerated.
  */
+
+static u64 xstate_enable;
+static u64 xstate_disable;
+
 u64 __init fpu__get_supported_xfeatures_mask(void)
 {
-	u64 mask = XFEATURE_MASK_USER_SUPPORTED | XFEATURE_MASK_SUPERVISOR_SUPPORTED;
-
-	if (!IS_ENABLED(CONFIG_X86_64))
-		mask &= ~(XFEATURE_MASK_XTILE);
+	u64 mask = XFEATURE_MASK_USER_ENABLED | XFEATURE_MASK_SUPERVISOR_SUPPORTED;
+
+	if (!IS_ENABLED(CONFIG_X86_64)) {
+		mask  &= ~(XFEATURE_MASK_XTILE);
+	} else if (xstate_enable || xstate_disable) {
+		u64 custom = mask;
+		u64 unknown;
+
+		custom |= xstate_enable;
+		custom &= ~xstate_disable;
+
+		unknown = custom & ~mask;
+		if (unknown) {
+			/*
+			 * User should fully understand the result of using undocumented
+			 * xstate component.
+			 */
+			add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+			pr_warn("x86/fpu: Attempt to enable unknown xstate features 0x%llx\n",
+				unknown);
+			WARN_ON_FPU(1);
+		}
+
+		if ((custom & XFEATURE_MASK_XTILE) != XFEATURE_MASK_XTILE) {
+			pr_warn("x86/fpu: Error in xstate.disable. Additionally disabling 0x%x components.\n",
+				XFEATURE_MASK_XTILE);
+			custom &= ~(XFEATURE_MASK_XTILE);
+		}
+
+		mask = custom;
+	}
 
 	return mask;
 }
@@ -250,12 +282,35 @@ static void __init fpu__init_system_ctx_switch(void)
 	on_boot_cpu = 0;
 }
 
+/*
+ * The longest parameter of 'xstate.enable=' is 22 octal characters with a
+ * '0' prefix and an extra '\0' for termination.
+ */
+#define MAX_XSTATE_MASK_CHARS	24
+/*
+ * We parse xstate parameters early because fpu__init_system() is executed before
+ * parse_early_param().
+ */
+static void __init fpu__init_parse_early_param(void)
+{
+	char arg[MAX_XSTATE_MASK_CHARS];
+
+	if (cmdline_find_option(boot_command_line, "xstate.enable", arg, sizeof(arg)) &&
+	    !kstrtoull(arg, 0, &xstate_enable))
+		xstate_enable &= XFEATURE_MASK_CONFIGURABLE;
+
+	if (cmdline_find_option(boot_command_line, "xstate.disable", arg, sizeof(arg)) &&
+	    !kstrtoull(arg, 0, &xstate_disable))
+		xstate_disable &= XFEATURE_MASK_CONFIGURABLE;
+}
+
 /*
  * Called on the boot CPU once per system bootup, to set up the initial
  * FPU state that is later cloned into all processes:
  */
 void __init fpu__init_system(struct cpuinfo_x86 *c)
 {
+	fpu__init_parse_early_param();
 	fpu__init_system_early_generic(c);
 
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread
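
For reference, a quick check of where the 0x60000 value in the changelog
comes from, using the tile state bit numbers defined earlier in this
series:

#include <stdio.h>

enum { XFEATURE_XTILE_CFG = 17, XFEATURE_XTILE_DATA = 18 };

int main(void)
{
	unsigned long long mask = (1ULL << XFEATURE_XTILE_CFG) |
				  (1ULL << XFEATURE_XTILE_DATA);

	/* Prints 0x60000, the value passed to xstate.enable/xstate.disable. */
	printf("XFEATURE_MASK_XTILE = %#llx\n", mask);
	return 0;
}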

* Re: [PATCH v3 21/21] x86/fpu/xstate: Introduce boot-parameters to control some state component support
  2020-12-23 15:57 ` [PATCH v3 21/21] x86/fpu/xstate: Introduce boot-parameters to control some state component support Chang S. Bae
@ 2020-12-23 18:37   ` Randy Dunlap
  2021-01-14 21:31     ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Randy Dunlap @ 2020-12-23 18:37 UTC (permalink / raw)
  To: Chang S. Bae, bp, luto, tglx, mingo, x86
  Cc: len.brown, dave.hansen, jing2.liu, ravi.v.shankar, linux-kernel,
	linux-doc

On 12/23/20 7:57 AM, Chang S. Bae wrote:
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 44fde25bb221..a67ae04d43c5 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6002,6 +6002,21 @@
>  			which allow the hypervisor to 'idle' the guest on lock
>  			contention.
>  
> +	xstate.enable=	[X86-64]
> +	xstate.disable=	[X86-64]
> +			The kernel is compiled with a default xstate bitmask --
> +			enabling it to use the XSAVE hardware to efficiently
> +			save and restore thread states on context switch.
> +			xstate.enable allows adding to that default mask at
> +			boot-time without recompiling the kernel just to support
> +			the new thread state. (Note that the kernel will ignore
> +			any bits in the mask that do not correspond to features
> +			that are actually available in CPUID)  xstate.disable

			                               CPUID.)

> +			allows clearing bits in the default mask, forcing the
> +			kernel to forget that it supports the specified thread
> +			state. When a bit set for both, the kernel takes
> +			xstate.disable in a priority.

			               as a priority.
?


thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
  2020-12-23 15:57 ` [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate Chang S. Bae
@ 2021-01-07  8:41   ` Liu, Jing2
  2021-01-07 18:40     ` Bae, Chang Seok
  2021-02-08 12:33   ` Borislav Petkov
  1 sibling, 1 reply; 64+ messages in thread
From: Liu, Jing2 @ 2021-01-07  8:41 UTC (permalink / raw)
  To: Bae, Chang Seok, bp, luto, tglx, mingo, x86
  Cc: Brown, Len, Hansen, Dave, Shankar, Ravi V, linux-kernel, kvm



-----Original Message-----
From: Bae, Chang Seok <chang.seok.bae@intel.com> 
Sent: Wednesday, December 23, 2020 11:57 PM
To: bp@suse.de; luto@kernel.org; tglx@linutronix.de; mingo@kernel.org; x86@kernel.org
Cc: Brown, Len <len.brown@intel.com>; Hansen, Dave <dave.hansen@intel.com>; Liu, Jing2 <jing2.liu@intel.com>; Shankar, Ravi V <ravi.v.shankar@intel.com>; linux-kernel@vger.kernel.org; Bae, Chang Seok <chang.seok.bae@intel.com>; kvm@vger.kernel.org
Subject: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate

copy_xregs_to_kernel() used to save all user states in a kernel buffer.
When dynamic user states are enabled, which states are saved becomes conditional.

fpu->state_mask can indicate which state components are reserved to be
saved in the XSAVE buffer. Use it as XSAVE's instruction mask to select states.

KVM used to save all xstate via copy_xregs_to_kernel(). Update KVM to set a valid fpu->state_mask, which will be necessary to correctly handle dynamic state buffers.

See comments inline below.

No functional change until the kernel supports dynamic user states.
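
A sketch of what using fpu->state_mask as the instruction mask means, modeled
on copy_xregs_to_kernel() (illustrative fragment, not the patch itself; the
mask feeds the EDX:EAX pair that XSAVE/XSAVES consume):

	u64 mask = fpu->state_mask;
	u32 lmask = mask;
	u32 hmask = mask >> 32;
	int err;

	/* Save only the components selected in EDX:EAX. */
	XSTATE_XSAVE(&fpu->state.xsave, lmask, hmask, err);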

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
[...]
 		/*
 		 * AVX512 state is tracked here because its use is

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4aecfba04bd3..93b5bacad67a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9214,15 +9214,20 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 
static void kvm_save_current_fpu(struct fpu *fpu)
{
+	struct fpu *src_fpu = &current->thread.fpu;
+
 	/*
 	 * If the target FPU state is not resident in the CPU registers, just
 	 * memcpy() from current, else save CPU state directly to the target.
 	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		memcpy(&fpu->state, &src_fpu->state,
 		       fpu_kernel_xstate_min_size);
For kvm, if we assume that it does not support dynamic features until this series,
memcpy for only fpu->state is correct. 
I think this kind of assumption is reasonable and we only make original xstate work.

-	else
+	} else {
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;

Though dynamic feature is not supported in kvm now, this function still need
consider more things for fpu->state_mask.
I suggest that we can set it before if...else (for both cases) and not change other. 

Thanks,
Jing

 		copy_fpregs_to_fpstate(fpu);
+	}

 }

 
 /* Swap (qemu) user FPU context for the guest FPU context. */
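
A sketch of the shape suggested above, setting the mask for both paths
(illustrative only, not code from this series):

	static void kvm_save_current_fpu(struct fpu *fpu)
	{
		struct fpu *src_fpu = &current->thread.fpu;

		/*
		 * Mirror the mask unconditionally, so that fpu->state_mask
		 * always describes what the buffer is expected to hold.
		 */
		fpu->state_mask = src_fpu->state_mask;

		if (test_thread_flag(TIF_NEED_FPU_LOAD))
			memcpy(&fpu->state, &src_fpu->state,
			       fpu_kernel_xstate_min_size);
		else
			copy_fpregs_to_fpstate(fpu);
	}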
--
2.17.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
  2021-01-07  8:41   ` Liu, Jing2
@ 2021-01-07 18:40     ` Bae, Chang Seok
  2021-01-12  2:52       ` Liu, Jing2
  0 siblings, 1 reply; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-07 18:40 UTC (permalink / raw)
  To: Liu, Jing2
  Cc: bp, luto, tglx, mingo, x86, Brown, Len, Hansen, Dave, Shankar,
	Ravi V, linux-kernel, kvm


> On Jan 7, 2021, at 17:41, Liu, Jing2 <jing2.liu@intel.com> wrote:
> 
> static void kvm_save_current_fpu(struct fpu *fpu)
> {
> +	struct fpu *src_fpu = &current->thread.fpu;
> +
> 	/*
> 	 * If the target FPU state is not resident in the CPU registers, just
> 	 * memcpy() from current, else save CPU state directly to the target.
> 	 */
> -	if (test_thread_flag(TIF_NEED_FPU_LOAD))
> -		memcpy(&fpu->state, &current->thread.fpu.state,
> +	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
> +		memcpy(&fpu->state, &src_fpu->state,
> 		       fpu_kernel_xstate_min_size);
> For kvm, if we assume that it does not support dynamic features until this series,
> memcpy for only fpu->state is correct. 
> I think this kind of assumption is reasonable and we only make original xstate work.
> 
> -	else
> +	} else {
> +		if (fpu->state_mask != src_fpu->state_mask)
> +			fpu->state_mask = src_fpu->state_mask;
> 
> Though dynamic feature is not supported in kvm now, this function still need
> consider more things for fpu->state_mask.

Can you elaborate this? Which path might be affected by fpu->state_mask
without dynamic state supported in KVM?

> I suggest that we can set it before if...else (for both cases) and not change other. 

I tried a minimum change here.  The fpu->state_mask value does not impact the
memcpy(). So, why do we need to change it for both?

Thanks,
Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
  2021-01-07 18:40     ` Bae, Chang Seok
@ 2021-01-12  2:52       ` Liu, Jing2
  2021-01-15  4:59         ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Liu, Jing2 @ 2021-01-12  2:52 UTC (permalink / raw)
  To: Bae, Chang Seok, Liu, Jing2
  Cc: bp, luto, tglx, mingo, x86, Brown, Len, Hansen, Dave, Shankar,
	Ravi V, linux-kernel, kvm


On 1/8/2021 2:40 AM, Bae, Chang Seok wrote:
>> On Jan 7, 2021, at 17:41, Liu, Jing2 <jing2.liu@intel.com> wrote:
>>
>> static void kvm_save_current_fpu(struct fpu *fpu)
>> {
>> +	struct fpu *src_fpu = &current->thread.fpu;
>> +
>> 	/*
>> 	 * If the target FPU state is not resident in the CPU registers, just
>> 	 * memcpy() from current, else save CPU state directly to the target.
>> 	 */
>> -	if (test_thread_flag(TIF_NEED_FPU_LOAD))
>> -		memcpy(&fpu->state, &current->thread.fpu.state,
>> +	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
>> +		memcpy(&fpu->state, &src_fpu->state,
>> 		       fpu_kernel_xstate_min_size);
>> For kvm, if we assume that it does not support dynamic features until this series,
>> memcpy for only fpu->state is correct.
>> I think this kind of assumption is reasonable and we only make original xstate work.
>>
>> -	else
>> +	} else {
>> +		if (fpu->state_mask != src_fpu->state_mask)
>> +			fpu->state_mask = src_fpu->state_mask;
>>
>> Though dynamic feature is not supported in kvm now, this function still need
>> consider more things for fpu->state_mask.
> Can you elaborate this? Which path might be affected by fpu->state_mask
> without dynamic state supported in KVM?
>
>> I suggest that we can set it before if...else (for both cases) and not change other.
> I tried a minimum change here.  The fpu->state_mask value does not impact the
> memcpy(). So, why do we need to change it for both?

Sure, what I'm considering is that "mask" is the first time introduced into "fpu",
representing the usage, so not only set it when needed, but also make it as a
representation, in case of anywhere using it (especially between the interval
of this series and kvm series in future).

Thanks,
Jing

> Thanks,
> Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 21/21] x86/fpu/xstate: Introduce boot-parameters to control some state component support
  2020-12-23 18:37   ` Randy Dunlap
@ 2021-01-14 21:31     ` Bae, Chang Seok
  0 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-14 21:31 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Borislav Petkov, luto, tglx, mingo, x86, Brown, Len, Hansen,
	Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel, linux-doc


> On Dec 23, 2020, at 10:37, Randy Dunlap <rdunlap@infradead.org> wrote:
> 
> On 12/23/20 7:57 AM, Chang S. Bae wrote:
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 44fde25bb221..a67ae04d43c5 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -6002,6 +6002,21 @@
>> 			which allow the hypervisor to 'idle' the guest on lock
>> 			contention.
>> 
>> +	xstate.enable=	[X86-64]
>> +	xstate.disable=	[X86-64]
>> +			The kernel is compiled with a default xstate bitmask --
>> +			enabling it to use the XSAVE hardware to efficiently
>> +			save and restore thread states on context switch.
>> +			xstate.enable allows adding to that default mask at
>> +			boot-time without recompiling the kernel just to support
>> +			the new thread state. (Note that the kernel will ignore
>> +			any bits in the mask that do not correspond to features
>> +			that are actually available in CPUID)  xstate.disable
> 
> 			                               CPUID.)
> 
>> +			allows clearing bits in the default mask, forcing the
>> +			kernel to forget that it supports the specified thread
>> +			state. When a bit set for both, the kernel takes
>> +			xstate.disable in a priority.
> 
> 			               as a priority.
> ?

Thank you. I fixed those typos in my tree.

Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions
  2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (20 preceding siblings ...)
  2020-12-23 15:57 ` [PATCH v3 21/21] x86/fpu/xstate: Introduce boot-parameters to control some state component support Chang S. Bae
@ 2021-01-14 21:31 ` Bae, Chang Seok
  21 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-14 21:31 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, x86-ml, Brown, Len, Hansen, Dave,
	Liu, Jing2, Shankar, Ravi V, lkml


> On Dec 23, 2020, at 07:56, Bae, Chang Seok <chang.seok.bae@intel.com> wrote:
> 
> Changes from v2 [5]:
> * Removed the patch for the tile data inheritance. Also, updated the
>  selftest patch. (Andy Lutomirski)
> * Changed the kernel tainted when any unknown state is enabled. (Andy
>  Lutomirski)
> * Changed to use the XFD feature only when the compacted format in use.
> * Improved the test code.
> * Simplified the cmdline handling.
> * Removed 'task->fpu' in changelogs. (Boris Petkov)
> * Updated the variable name / comments / changelogs for clarification.

Hi Boris,

Thanks for the feedback. Please let me know if the updated commit messages
meet your expectations.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
  2021-01-12  2:52       ` Liu, Jing2
@ 2021-01-15  4:59         ` Bae, Chang Seok
  2021-01-15  5:45           ` Liu, Jing2
  0 siblings, 1 reply; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-15  4:59 UTC (permalink / raw)
  To: Liu, Jing2
  Cc: Liu, Jing2, bp, luto, tglx, mingo, x86, Brown, Len, Hansen, Dave,
	Shankar, Ravi V, linux-kernel, kvm


> On Jan 11, 2021, at 18:52, Liu, Jing2 <jing2.liu@linux.intel.com> wrote:
> 
> On 1/8/2021 2:40 AM, Bae, Chang Seok wrote:
>>> On Jan 7, 2021, at 17:41, Liu, Jing2 <jing2.liu@intel.com> wrote:
>>> 
>>> static void kvm_save_current_fpu(struct fpu *fpu)
>>> {
>>> +	struct fpu *src_fpu = &current->thread.fpu;
>>> +
>>> 	/*
>>> 	 * If the target FPU state is not resident in the CPU registers, just
>>> 	 * memcpy() from current, else save CPU state directly to the target.
>>> 	 */
>>> -	if (test_thread_flag(TIF_NEED_FPU_LOAD))
>>> -		memcpy(&fpu->state, &current->thread.fpu.state,
>>> +	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
>>> +		memcpy(&fpu->state, &src_fpu->state,
>>> 		       fpu_kernel_xstate_min_size);

<snip>

>>> -	else
>>> +	} else {
>>> +		if (fpu->state_mask != src_fpu->state_mask)
>>> +			fpu->state_mask = src_fpu->state_mask;
>>> 
>>> Though dynamic feature is not supported in kvm now, this function still need
>>> consider more things for fpu->state_mask.
>> Can you elaborate this? Which path might be affected by fpu->state_mask
>> without dynamic state supported in KVM?
>> 
>>> I suggest that we can set it before if...else (for both cases) and not change other.
>> I tried a minimum change here.  The fpu->state_mask value does not impact the
>> memcpy(). So, why do we need to change it for both?
> 
> Sure, what I'm considering is that "mask" is the first time introduced into "fpu",
> representing the usage, so not only set it when needed, but also make it as a
> representation, in case of anywhere using it (especially between the interval
> of this series and kvm series in future).

Thank you for the feedback. Sorry, I don't get any logical reason to set the
mask always here. Perhaps, KVM code can be updated like you mentioned when
supporting the dynamic states there.

Please let me know if I’m missing any functional issues.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
  2021-01-15  4:59         ` Bae, Chang Seok
@ 2021-01-15  5:45           ` Liu, Jing2
  0 siblings, 0 replies; 64+ messages in thread
From: Liu, Jing2 @ 2021-01-15  5:45 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Liu, Jing2, bp, luto, tglx, mingo, x86, Brown, Len, Hansen, Dave,
	Shankar, Ravi V, linux-kernel, kvm


On 1/15/2021 12:59 PM, Bae, Chang Seok wrote:
>> On Jan 11, 2021, at 18:52, Liu, Jing2 <jing2.liu@linux.intel.com> wrote:
>>
>> On 1/8/2021 2:40 AM, Bae, Chang Seok wrote:
>>>> On Jan 7, 2021, at 17:41, Liu, Jing2 <jing2.liu@intel.com> wrote:
>>>>
>>>> static void kvm_save_current_fpu(struct fpu *fpu)
>>>> {
>>>> +	struct fpu *src_fpu = &current->thread.fpu;
>>>> +
>>>> 	/*
>>>> 	 * If the target FPU state is not resident in the CPU registers, just
>>>> 	 * memcpy() from current, else save CPU state directly to the target.
>>>> 	 */
>>>> -	if (test_thread_flag(TIF_NEED_FPU_LOAD))
>>>> -		memcpy(&fpu->state, &current->thread.fpu.state,
>>>> +	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
>>>> +		memcpy(&fpu->state, &src_fpu->state,
>>>> 		       fpu_kernel_xstate_min_size);
> <snip>
>
>>>> -	else
>>>> +	} else {
>>>> +		if (fpu->state_mask != src_fpu->state_mask)
>>>> +			fpu->state_mask = src_fpu->state_mask;
>>>>
>>>> Though dynamic feature is not supported in kvm now, this function still need
>>>> consider more things for fpu->state_mask.
>>> Can you elaborate this? Which path might be affected by fpu->state_mask
>>> without dynamic state supported in KVM?
>>>
>>>> I suggest that we can set it before if...else (for both cases) and not change other.
>>> I tried a minimum change here.  The fpu->state_mask value does not impact the
>>> memcpy(). So, why do we need to change it for both?
>> Sure, what I'm considering is that "mask" is the first time introduced into "fpu",
>> representing the usage, so not only set it when needed, but also make it as a
>> representation, in case of anywhere using it (especially between the interval
>> of this series and kvm series in future).
> Thank you for the feedback. Sorry, I don't get any logical reason to set the
> mask always here.

Sure, I'd like to see if fx_init()->memset is the case, though maybe no hurt
so far in test.

Thanks,
Jing

>   Perhaps, KVM code can be updated like you mentioned when
> supporting the dynamic states there.
>
> Please let me know if I’m missing any functional issues.
>
> Thanks,
> Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 01/21] x86/fpu/xstate: Modify initialization helper to handle both static and dynamic buffers
  2020-12-23 15:56 ` [PATCH v3 01/21] x86/fpu/xstate: Modify initialization helper to handle both static and dynamic buffers Chang S. Bae
@ 2021-01-15 12:40   ` Borislav Petkov
  0 siblings, 0 replies; 64+ messages in thread
From: Borislav Petkov @ 2021-01-15 12:40 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel, kvm

On Wed, Dec 23, 2020 at 07:56:57AM -0800, Chang S. Bae wrote:
> In preparation for dynamic xstate buffer expansion, update the buffer
> initialization function parameters to equally handle static in-line xstate
> buffer, as well as dynamically allocated xstate buffer.
> 
> init_fpstate is a special case, which is indicated by a null pointer
> parameter to fpstate_init().
> 
> Also, fpstate_init_xstate() now accepts the state component bitmap to
> configure XCOMP_BV for the compacted format.
> 
> No functional change.

Much better, thanks!

> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index eb86a2b831b1..f23e5ffbb307 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -191,8 +191,16 @@ static inline void fpstate_init_fstate(struct fregs_state *fp)
>  	fp->fos = 0xffff0000u;
>  }
>  
> -void fpstate_init(union fpregs_state *state)
> +/* A null pointer parameter indicates init_fpstate. */

Use kernel-doc comment style instead:

/**
 * ..
 *
 * @fpu: If NULL, use init_fpstate
 */

> +void fpstate_init(struct fpu *fpu)
>  {

...

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers to handle both static and dynamic buffers
  2020-12-23 15:56 ` [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
@ 2021-01-15 12:50   ` Borislav Petkov
  2021-01-19 18:50     ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-01-15 12:50 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel

On Wed, Dec 23, 2020 at 07:56:58AM -0800, Chang S. Bae wrote:
> In preparation for dynamic xstate buffer expansion, update the xstate
> copy function parameters to equally handle static in-line buffer, as well
> as dynamically allocated xstate buffer.

This is repeated from the previous patch. I'm sure you can think of text
which fits here.

> 
> No functional change.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Changes from v2:
> * Updated the changelog with task->fpu removed. (Boris Petkov)
> ---
>  arch/x86/include/asm/fpu/xstate.h |  8 ++++----
>  arch/x86/kernel/fpu/regset.c      |  6 +++---
>  arch/x86/kernel/fpu/signal.c      | 16 +++++++---------
>  arch/x86/kernel/fpu/xstate.c      | 19 +++++++++++++++----
>  4 files changed, 29 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
> index 47a92232d595..e0f1b22f53ce 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -105,10 +105,10 @@ const void *get_xsave_field_ptr(int xfeature_nr);
>  int using_compacted_format(void);
>  int xfeature_size(int xfeature_nr);
>  struct membuf;
> -void copy_xstate_to_kernel(struct membuf to, struct xregs_state *xsave);
> -int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf);
> -int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf);
> -void copy_supervisor_to_kernel(struct xregs_state *xsave);
> +void copy_xstate_to_kernel(struct membuf to, struct fpu *fpu);
> +int copy_kernel_to_xstate(struct fpu *fpu, const void *kbuf);
> +int copy_user_to_xstate(struct fpu *fpu, const void __user *ubuf);
> +void copy_supervisor_to_kernel(struct fpu *fpu);

Hmm, so those functions have "xstate" in the name because they took an
@xstate parameter. I guess not such a big deal you changing them, just
pointing out what the naming logic was.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 03/21] x86/fpu/xstate: Modify address finders to handle both static and dynamic buffers
  2020-12-23 15:56 ` [PATCH v3 03/21] x86/fpu/xstate: Modify address finders " Chang S. Bae
@ 2021-01-15 13:06   ` Borislav Petkov
  0 siblings, 0 replies; 64+ messages in thread
From: Borislav Petkov @ 2021-01-15 13:06 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel, kvm

On Wed, Dec 23, 2020 at 07:56:59AM -0800, Chang S. Bae wrote:
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 6156dad0feb6..2010c31d25e1 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -894,15 +894,24 @@ void fpu__resume_cpu(void)
>   * Given an xstate feature nr, calculate where in the xsave
>   * buffer the state is.  Callers should ensure that the buffer
>   * is valid.
> + *
> + * A null pointer parameter indicates to use init_fpstate.
>   */

kernel-doc style comment pls.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 04/21] x86/fpu/xstate: Modify context switch helpers to handle both static and dynamic buffers
  2020-12-23 15:57 ` [PATCH v3 04/21] x86/fpu/xstate: Modify context switch helpers " Chang S. Bae
@ 2021-01-15 13:18   ` Borislav Petkov
  2021-01-19 18:49     ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-01-15 13:18 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel, kvm

On Wed, Dec 23, 2020 at 07:57:00AM -0800, Chang S. Bae wrote:
> In preparation for dynamic xstate buffer expansion, update the xstate
> restore function parameters to equally handle static in-line xstate buffer,
> as well as dynamically allocated xstate buffer.

Ok, I see what you've done: you've slightly changed that same
formulation depending on what the patch is doing. I need to read very
carefully.

What I would've written is:

"Have all functions handling FPU state take a struct fpu * pointer in
preparation for dynamic state buffer support."

Plain and simple.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states
  2020-12-23 15:57 ` [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
@ 2021-01-15 13:39   ` Borislav Petkov
  2021-01-15 19:47     ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-01-15 13:39 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel

On Wed, Dec 23, 2020 at 07:57:01AM -0800, Chang S. Bae wrote:
> The perf has a buffer that is allocated on demand. The states saved in the

What's "the perf"? I hope to find out when I countinue reading...

> buffer were named as 'dynamic' (supervisor) states but the buffer is not
> updated in every context switch.
> 
> The context switch buffer is in preparation to be dynamic for user states.
> Make the wording to differentiate between those 'dynamic' states.
> 
> Add a new variable -- xfeatures_mask_user_dynamic to indicate the dynamic
> user states, and rename some define and helper as related to the dynamic
> supervisor states:
> 	xfeatures_mask_supervisor_dynamic()
> 	XFEATURE_MASK_SUPERVISOR_DYNAMIC
> 
> No functional change.

Text needs cleaning up.

> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Changes from v2:
> * Updated the changelog for clarification.
> ---
>  arch/x86/include/asm/fpu/xstate.h | 12 +++++++-----
>  arch/x86/kernel/fpu/xstate.c      | 29 +++++++++++++++++++----------
>  2 files changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
> index 24bf8d3f559a..6ce8350672c2 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -56,7 +56,7 @@
>   * - Don't set the bit corresponding to the dynamic supervisor feature in
>   *   IA32_XSS at run time, since it has been set at boot time.
>   */
> -#define XFEATURE_MASK_DYNAMIC (XFEATURE_MASK_LBR)
> +#define XFEATURE_MASK_SUPERVISOR_DYNAMIC (XFEATURE_MASK_LBR)

Is XFEATURE_MASK_USER_DYNAMIC coming too?

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states
  2021-01-15 13:39   ` Borislav Petkov
@ 2021-01-15 19:47     ` Bae, Chang Seok
  2021-01-19 15:57       ` Borislav Petkov
  0 siblings, 1 reply; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-15 19:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Jan 15, 2021, at 05:39, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:57:01AM -0800, Chang S. Bae wrote:
>> The perf has a buffer that is allocated on demand. The states saved in the
> 
> What's "the perf"? I hope to find out when I countinue reading…

Maybe it was better to write ‘Linux perf (tools)’ [1] here. Sorry for the
confusion.

>> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
>> index 24bf8d3f559a..6ce8350672c2 100644
>> --- a/arch/x86/include/asm/fpu/xstate.h
>> +++ b/arch/x86/include/asm/fpu/xstate.h
>> @@ -56,7 +56,7 @@
>>  * - Don't set the bit corresponding to the dynamic supervisor feature in
>>  *   IA32_XSS at run time, since it has been set at boot time.
>>  */
>> -#define XFEATURE_MASK_DYNAMIC (XFEATURE_MASK_LBR)
>> +#define XFEATURE_MASK_SUPERVISOR_DYNAMIC (XFEATURE_MASK_LBR)
> 
> Is XFEATURE_MASK_USER_DYNAMIC coming too?

Instead of the new define, I thought the new variable --
xfeatures_mask_user_dynamic, as its value needs to be determined at boot
time.

PATCH13/21 has the routine:

        xfeatures_mask_all &= fpu__get_supported_xfeatures_mask();
-       /* Do not support the dynamically allocated buffer yet. */
        xfeatures_mask_user_dynamic = 0;
 
+       for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
+               u64 feature_mask = BIT_ULL(i);
+
+               if (!(xfeatures_mask_user() & feature_mask))
+                       continue;
+
+               if (xfeature_disable_supported(i))
+                       xfeatures_mask_user_dynamic |= feature_mask;
+       }
+

Thanks,
Chang

[1] https://en.wikipedia.org/wiki/Perf_(Linux)


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states
  2021-01-15 19:47     ` Bae, Chang Seok
@ 2021-01-19 15:57       ` Borislav Petkov
  2021-01-19 18:57         ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-01-19 15:57 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Fri, Jan 15, 2021 at 07:47:38PM +0000, Bae, Chang Seok wrote:
> On Jan 15, 2021, at 05:39, Borislav Petkov <bp@suse.de> wrote:
> > On Wed, Dec 23, 2020 at 07:57:01AM -0800, Chang S. Bae wrote:
> >> The perf has a buffer that is allocated on demand. The states saved in the
> > 
> > What's "the perf"? I hope to find out when I countinue reading…
> 
> Maybe it was better to write ‘Linux perf (tools)’ [1] here. Sorry for the
> confusion.

Well, I'm also confused as to what does the perf buffer have to do with
AMX?

> >> -#define XFEATURE_MASK_DYNAMIC (XFEATURE_MASK_LBR)
> >> +#define XFEATURE_MASK_SUPERVISOR_DYNAMIC (XFEATURE_MASK_LBR)
> > 
> > Is XFEATURE_MASK_USER_DYNAMIC coming too?
> 
> Instead of the new define, I thought the new variable --
> xfeatures_mask_user_dynamic, as its value needs to be determined at boot
> time.

Why isn't that in your commit message?

All I see is a patch doing a bunch of renames, some unrelated blurb in the
commit message and I have no clue what's going on here and why you're
doing this. Your commit messages should contain simple English sentences
and explain *why* the change is needed - not *what* you're doing. The
*what* I can see from the diff itself, for the *why* I need a crystal
ball which I can't buy in any store.

So how about explaining the *why*?

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 04/21] x86/fpu/xstate: Modify context switch helpers to handle both static and dynamic buffers
  2021-01-15 13:18   ` Borislav Petkov
@ 2021-01-19 18:49     ` Bae, Chang Seok
  0 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-19 18:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel,
	kvm

On Jan 15, 2021, at 05:18, Borislav Petkov <bp@suse.de> wrote:
> 
> What I would've written is:
> 
> "Have all functions handling FPU state take a struct fpu * pointer in
> preparation for dynamic state buffer support."
> 
> Plain and simple.

Thank you. I will apply this on my next revision.

Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers to handle both static and dynamic buffers
  2021-01-15 12:50   ` Borislav Petkov
@ 2021-01-19 18:50     ` Bae, Chang Seok
  2021-01-20 20:53       ` Borislav Petkov
  0 siblings, 1 reply; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-19 18:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel

On Jan 15, 2021, at 04:50, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:56:58AM -0800, Chang S. Bae wrote:
>> 
>> -void copy_xstate_to_kernel(struct membuf to, struct xregs_state *xsave);
>> -int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf);
>> -int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf);
>> -void copy_supervisor_to_kernel(struct xregs_state *xsave);
>> +void copy_xstate_to_kernel(struct membuf to, struct fpu *fpu);
>> +int copy_kernel_to_xstate(struct fpu *fpu, const void *kbuf);
>> +int copy_user_to_xstate(struct fpu *fpu, const void __user *ubuf);
>> +void copy_supervisor_to_kernel(struct fpu *fpu);
> 
> Hmm, so those functions have "xstate" in the name because they took an
> @xstate parameter. I guess not such a big deal you changing them, just
> pointing out what the naming logic was.

I will add a sentence like this if it looks fine:

"The copy functions used to have ‘xstate' in the name as they took a struct
xregs_state * pointer."

Thanks,
Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states
  2021-01-19 15:57       ` Borislav Petkov
@ 2021-01-19 18:57         ` Bae, Chang Seok
  2021-01-22 10:56           ` Borislav Petkov
  0 siblings, 1 reply; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-19 18:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Jan 19, 2021, at 07:57, Borislav Petkov <bp@suse.de> wrote:
> On Fri, Jan 15, 2021 at 07:47:38PM +0000, Bae, Chang Seok wrote:
>> On Jan 15, 2021, at 05:39, Borislav Petkov <bp@suse.de> wrote:
>>> On Wed, Dec 23, 2020 at 07:57:01AM -0800, Chang S. Bae wrote:
>>>> The perf has a buffer that is allocated on demand. The states saved in the
>>> 
>>> What's "the perf"? I hope to find out when I countinue reading…
>> 
>> Maybe it was better to write ‘Linux perf (tools)’ [1] here. Sorry for the
>> confusion.
> 
> Well, I'm also confused as to what does the perf buffer have to do with
> AMX?

This series attempts to save the AMX state in the context switch buffer only
when needed -- so it is called out ‘dynamic’ user states.

The LBR state is saved in the perf buffer [1], and this state is named
'dynamic' supervisor states [2]. But some naming in the change has ‘dynamic’
state only.

So, these two kinds of dynamic states are different and need to be named
clearly.

>>>> -#define XFEATURE_MASK_DYNAMIC (XFEATURE_MASK_LBR)
>>>> +#define XFEATURE_MASK_SUPERVISOR_DYNAMIC (XFEATURE_MASK_LBR)
>>> 
>>> Is XFEATURE_MASK_USER_DYNAMIC coming too?
>> 
>> Instead of the new define, I thought the new variable --
>> xfeatures_mask_user_dynamic, as its value needs to be determined at boot
>> time.
> 
> Why isn't that in your commit message?

I will add it on my next revision.

> All I see is a patch doing a bunch of renames, some unrelated blurb in the
> commit message and I have no clue what's going on here and why you're
> doing this. Your commit messages should contain simple English sentences
> and explain *why* the change is needed - not *what* you're doing. The
> *what* I can see from the diff itself, for the *why* I need a crystal
> ball which I can't buy in any store.
> 
> So how about explaining the *why*?

How about the changelog message like this:

"
The context switch buffer is in preparation to be dynamic for user states.
Introduce a new mask variable to indicate the 'dynamic' user states. The value
is determined at boot time.

The perf subsystem has a separate buffer to save some states only when needed,
not in every context switch. The states are named as 'dynamic' supervisor
states. Some define and helper are not named with dynamic supervisor states,
so rename them.

No functional change.
“

Thanks,
Chang

[1] https://lore.kernel.org/lkml/1593780569-62993-21-git-send-email-kan.liang@linux.intel.com/
[2] https://lore.kernel.org/lkml/1593780569-62993-22-git-send-email-kan.liang@linux.intel.com/



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers to handle both static and dynamic buffers
  2021-01-19 18:50     ` Bae, Chang Seok
@ 2021-01-20 20:53       ` Borislav Petkov
  2021-01-20 21:12         ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-01-20 20:53 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel

On Tue, Jan 19, 2021 at 06:50:52PM +0000, Bae, Chang Seok wrote:
> I will add a sentence like this if it looks fine:
> 
> "The copy functions used to have ‘xstate' in the name as they took a struct
> xregs_state * pointer."

What for?

I was just pointing out what the naming logic was and that you're
changing that...

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers to handle both static and dynamic buffers
  2021-01-20 20:53       ` Borislav Petkov
@ 2021-01-20 21:12         ` Bae, Chang Seok
  0 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-20 21:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel

On Jan 20, 2021, at 12:53, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Jan 19, 2021 at 06:50:52PM +0000, Bae, Chang Seok wrote:
>> I will add a sentence like this if it looks fine:
>> 
>> "The copy functions used to have ‘xstate' in the name as they took a struct
>> xregs_state * pointer."
> 
> What for?
> 
> I was just pointing out what the naming logic was and that you're
> changing that…

Oh, I thought you meant in there. Okay, I will not add it. 

Thank you for clarifying this.
Chang


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states
  2021-01-19 18:57         ` Bae, Chang Seok
@ 2021-01-22 10:56           ` Borislav Petkov
  2021-01-27  1:23             ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-01-22 10:56 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Tue, Jan 19, 2021 at 06:57:26PM +0000, Bae, Chang Seok wrote:
> This series attempts to save the AMX state in the context switch buffer only

What is the context switch buffer?

I think you mean simply the xstate per-task buffer which is switched on
context switches...

> when needed -- so it is called out ‘dynamic’ user states.
> 
> The LBR state is saved in the perf buffer [1], and this state is named
> 'dynamic' supervisor states [2]. But some naming in the change has ‘dynamic’
> state only.
> 
> So, these two kinds of dynamic states are different and need to be named
> clearly.

Oh well, this is going to be a mess, there's also CET coming but at
least stuff is properly documented with comments - I guess thanks
dhansen :) - so we can fix it up later if something's still amiss.

> How about the changelog message like this:
> 
> "
> The context switch buffer is in preparation to be dynamic for user states.
> Introduce a new mask variable to indicate the 'dynamic' user states. The value
> is determined at boot time.
> 
> The perf subsystem has a separate buffer to save some states only when needed,
> not in every context switch. The states are named as 'dynamic' supervisor
> states. Some define and helper are not named with dynamic supervisor states,
> so rename them.
> 
> No functional change.
> “

Yah, better.

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes
  2020-12-23 15:57 ` [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes Chang S. Bae
@ 2021-01-22 11:44   ` Borislav Petkov
  2021-01-27  1:23     ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-01-22 11:44 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel, kvm

On Wed, Dec 23, 2020 at 07:57:02AM -0800, Chang S. Bae wrote:
> The xstate buffer is currently in-line with static size. To accommodate

"in-line" doesn't fit in this context, especially since "inline"
is a keyword with another meaning. Please replace it with a better
formulation in this patch.

> dynamic user xstates, introduce variables to represent the maximum and
> minimum buffer sizes.
> 
> do_extra_xstate_size_checks() calculates the maximum xstate size and sanity
> checks it with CPUID. It calculates the static in-line buffer size by
> excluding the dynamic user states from the maximum xstate size.
> 
> No functional change, until the kernel enables dynamic buffer support.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: kvm@vger.kernel.org
> ---
> Changes from v2:
> * Updated the changelog with task->fpu removed. (Boris Petkov)
> * Renamed the in-line size variable.
> * Updated some code comments.
> ---
>  arch/x86/include/asm/processor.h | 10 +++----
>  arch/x86/kernel/fpu/core.c       |  6 ++---
>  arch/x86/kernel/fpu/init.c       | 36 ++++++++++++++++---------
>  arch/x86/kernel/fpu/signal.c     |  2 +-
>  arch/x86/kernel/fpu/xstate.c     | 46 +++++++++++++++++++++-----------
>  arch/x86/kernel/process.c        |  6 +++++
>  arch/x86/kvm/x86.c               |  2 +-
>  7 files changed, 67 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 82a08b585818..c9c608f8af91 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -477,7 +477,8 @@ DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
>  DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
>  #endif	/* X86_64 */
>  
> -extern unsigned int fpu_kernel_xstate_size;
> +extern unsigned int fpu_kernel_xstate_min_size;
> +extern unsigned int fpu_kernel_xstate_max_size;

Is it time to group this into a struct so that all those settings go
together instead in single variables?

struct fpu_xstate {
	unsigned int min_size, max_size;
	unsigned int user_size;
	...
};

etc.

>  extern unsigned int fpu_user_xstate_size;
>  
>  struct perf_event;
> @@ -545,12 +546,7 @@ struct thread_struct {
>  };
>  
>  /* Whitelist the FPU state from the task_struct for hardened usercopy. */
> -static inline void arch_thread_struct_whitelist(unsigned long *offset,
> -						unsigned long *size)
> -{
> -	*offset = offsetof(struct thread_struct, fpu.state);
> -	*size = fpu_kernel_xstate_size;
> -}
> +extern void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size);

What's that move for?

> diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
> index 74e03e3bc20f..5dac97158030 100644
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -130,13 +130,20 @@ static void __init fpu__init_system_generic(void)
>  }
>  
>  /*
> - * Size of the FPU context state. All tasks in the system use the
> - * same context size, regardless of what portion they use.
> - * This is inherent to the XSAVE architecture which puts all state
> - * components into a single, continuous memory block:
> + * Size of the minimally allocated FPU context state. All threads have this amount
> + * of xstate buffer at minimum.
> + *
> + * This buffer is inherent to the XSAVE architecture which puts all state components
> + * into a single, continuous memory block:
> + */
> +unsigned int fpu_kernel_xstate_min_size;
> +EXPORT_SYMBOL_GPL(fpu_kernel_xstate_min_size);
> +
> +/*
> + * Size of the maximum FPU context state. When using the compacted format, the buffer
> + * can be dynamically expanded to include some states up to this size.
>   */
> -unsigned int fpu_kernel_xstate_size;
> -EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size);
> +unsigned int fpu_kernel_xstate_max_size;

And since we're probably going to start querying different aspects about
the buffer, instead of exporting all kinds of variables in the future,
maybe this should be a single exported function called

get_xstate_buffer_attr(typedef buffer_attr)

which gives you what you wanna know about it... For example:

get_xstate_buffer_attr(MIN_SIZE);
get_xstate_buffer_attr(MAX_SIZE);
...
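
Fleshed out slightly, one possible shape (a sketch; neither this enum nor the
function exists in the series):

	enum xstate_buffer_attr {
		XSTATE_MIN_SIZE,
		XSTATE_MAX_SIZE,
		XSTATE_USER_SIZE,
	};

	static unsigned int get_xstate_buffer_attr(enum xstate_buffer_attr attr)
	{
		switch (attr) {
		case XSTATE_MIN_SIZE:
			return fpu_kernel_xstate_min_size;
		case XSTATE_MAX_SIZE:
			return fpu_kernel_xstate_max_size;
		case XSTATE_USER_SIZE:
			return fpu_user_xstate_size;
		}

		return 0;
	}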

> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index 414a13427934..b6d2706b6886 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -289,8 +289,8 @@ static int copy_user_to_fpregs_zeroing(void __user *buf, u64 xbv, int fx_only)
>  
>  static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
>  {
> +	int state_size = fpu_kernel_xstate_min_size;
>  	struct user_i387_ia32_struct *envp = NULL;
> -	int state_size = fpu_kernel_xstate_size;
>  	int ia32_fxstate = (buf != buf_fx);
>  	struct task_struct *tsk = current;
>  	struct fpu *fpu = &tsk->thread.fpu;
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 6620d0a3caff..2012b17b1793 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -627,13 +627,18 @@ static void check_xstate_against_struct(int nr)
>   */

<-- There's a comment over this function that might need adjustment.

>  static void do_extra_xstate_size_checks(void)
>  {
> -	int paranoid_xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
> +	int paranoid_min_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
> +	int paranoid_max_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
>  	int i;

...

> @@ -744,27 +758,27 @@ static bool is_supported_xstate_size(unsigned int test_xstate_size)
>  static int __init init_xstate_size(void)
>  {
>  	/* Recompute the context size for enabled features: */
> -	unsigned int possible_xstate_size;
> +	unsigned int possible_max_xstate_size;
>  	unsigned int xsave_size;
>  
>  	xsave_size = get_xsave_size();
>  
>  	if (boot_cpu_has(X86_FEATURE_XSAVES))

using_compacted_format()

FPU code needs to agree on one helper and not use both. :-\

> -		possible_xstate_size = get_xsaves_size_no_dynamic();
> +		possible_max_xstate_size = get_xsaves_size_no_dynamic();
>  	else
> -		possible_xstate_size = xsave_size;
> -
> -	/* Ensure we have the space to store all enabled: */
> -	if (!is_supported_xstate_size(possible_xstate_size))
> -		return -EINVAL;
> +		possible_max_xstate_size = xsave_size;
>  
>  	/*
>  	 * The size is OK, we are definitely going to use xsave,
>  	 * make it known to the world that we need more space.
>  	 */
> -	fpu_kernel_xstate_size = possible_xstate_size;
> +	fpu_kernel_xstate_max_size = possible_max_xstate_size;
>  	do_extra_xstate_size_checks();
>  
> +	/* Ensure we have the supported in-line space: */

Who's "we"?

> +	if (!is_supported_xstate_size(fpu_kernel_xstate_min_size))
> +		return -EINVAL;
> +
>  	/*
>  	 * User space is always in standard format.
>  	 */

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
  2020-12-23 15:57 ` [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers Chang S. Bae
@ 2021-01-26 20:17   ` Borislav Petkov
  2021-01-27  1:23     ` Bae, Chang Seok
  2021-02-03  4:10     ` Bae, Chang Seok
  0 siblings, 2 replies; 64+ messages in thread
From: Borislav Petkov @ 2021-01-26 20:17 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel

On Wed, Dec 23, 2020 at 07:57:03AM -0800, Chang S. Bae wrote:
> diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
> index f5a38a5f3ae1..3fc6dbbe3ede 100644
> --- a/arch/x86/include/asm/fpu/types.h
> +++ b/arch/x86/include/asm/fpu/types.h
> @@ -336,14 +336,33 @@ struct fpu {
>  	 */
>  	unsigned long			avx512_timestamp;
>  
> +	/*
> +	 * @state_mask:
> +	 *
> +	 * The state component bitmap. It indicates the saved xstate in
> +	 * either @state or @state_ptr. The map value starts to be aligned
> +	 * with @state and then with @state_ptr once it is in use.

Are you trying to say here that the mask describes the state saved in
@state initially and then, when the task is switched to dynamic state,
it denotes the state in ->state_ptr?

> +	 */
> +	u64				state_mask;
> +
> +	/*
> +	 * @state_ptr:
> +	 *
> +	 * Copy of all extended register states, in a dynamically allocated
> +	 * buffer. When a task is using extended features, the register state
> +	 * is always the most current. This state copy is more recent than
> +	 * @state. If the task context-switches away, they get saved here,
> +	 * representing the xstate.

Calling it a copy here is confusing - you wanna say that when dynamic
states get used, the state in state_ptr supersedes and invalidates the
state in @state. AFAIU, at least.

> +	 */
> +	union fpregs_state		*state_ptr;
> +
>  	/*
>  	 * @state:
>  	 *
> -	 * In-memory copy of all FPU registers that we save/restore
> -	 * over context switches. If the task is using the FPU then
> -	 * the registers in the FPU are more recent than this state
> -	 * copy. If the task context-switches away then they get
> -	 * saved here and represent the FPU state.
> +	 * Copy of some extended register state. If a task uses a dynamically

Copy of some?

Why not, "Initial in-memory copy of all FPU registers that we
save/restore over context switches. When the task is switched to dynamic
states, this copy is replaced with the one in ->state_ptr."

Which brings me to the more important question and I guess I'll see when
I get to the end of this: are we aiming at having a *single* ->state
pointer which gets used in both static and dynamic FPU state settings?

> +	 * allocated buffer, @state_ptr, then it has a more recent state copy
> +	 * than this. This copy follows the same attributes as described for
> +	 * @state_ptr.
>  	 */
>  	union fpregs_state		state;
>  	/*
> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
> index 6ce8350672c2..379e8f8b8440 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -103,6 +103,9 @@ extern void __init update_regset_xstate_info(unsigned int size,
>  					     u64 xstate_mask);
>  
>  void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
> +int alloc_xstate_buffer(struct fpu *fpu, u64 mask);
> +void free_xstate_buffer(struct fpu *fpu);
> +
>  const void *get_xsave_field_ptr(int xfeature_nr);
>  int using_compacted_format(void);
>  int xfeature_size(int xfeature_nr);
> diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
> index 879b77792f94..bf88b3333873 100644
> --- a/arch/x86/include/asm/trace/fpu.h
> +++ b/arch/x86/include/asm/trace/fpu.h
> @@ -89,6 +89,11 @@ DEFINE_EVENT(x86_fpu, x86_fpu_xstate_check_failed,
>  	TP_ARGS(fpu)
>  );
>  
> +DEFINE_EVENT(x86_fpu, x86_fpu_xstate_alloc_failed,
> +	TP_PROTO(struct fpu *fpu),
> +	TP_ARGS(fpu)
> +);
> +

Huh, what's that for?

>  #undef TRACE_INCLUDE_PATH
>  #define TRACE_INCLUDE_PATH asm/trace/
>  #undef TRACE_INCLUDE_FILE
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 1a428803e6b2..6dafed34be4f 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -235,6 +235,9 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
>  	 */
>  	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_min_size);
>  
> +	dst_fpu->state_mask = xfeatures_mask_all & ~xfeatures_mask_user_dynamic;
> +	dst_fpu->state_ptr = NULL;
> +
>  	/*
>  	 * If the FPU registers are not current just memcpy() the state.
>  	 * Otherwise save current FPU registers directly into the child's FPU
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 2012b17b1793..af4d7d9aa977 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -10,6 +10,7 @@
>  #include <linux/pkeys.h>
>  #include <linux/seq_file.h>
>  #include <linux/proc_fs.h>
> +#include <linux/vmalloc.h>
>  
>  #include <asm/fpu/api.h>
>  #include <asm/fpu/internal.h>
> @@ -19,6 +20,7 @@
>  
>  #include <asm/tlbflush.h>
>  #include <asm/cpufeature.h>
> +#include <asm/trace/fpu.h>
>  
>  /*
>   * Although we spell it out in here, the Processor Trace
> @@ -71,6 +73,7 @@ static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] =
>  static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 1] = -1};
>  static unsigned int xstate_comp_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
>  static unsigned int xstate_supervisor_only_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
> +static bool xstate_aligns[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = false};

What's that for?

>  
>  /*
>   * The XSAVE area of kernel can be in standard or compacted format;
> @@ -130,6 +133,48 @@ static bool xfeature_is_supervisor(int xfeature_nr)
>  	return ecx & 1;
>  }
>  
> +/*
> + * Available once those arrays for the offset, size, and alignment info are set up,
> + * by setup_xstate_features().
> + */
> +static unsigned int get_xstate_size(u64 mask)
> +{
> +	unsigned int size;
> +	u64 xmask;
> +	int i, nr;
> +
> +	if (!mask)
> +		return 0;
> +	else if (mask == (xfeatures_mask_all & ~xfeatures_mask_user_dynamic))
> +		return fpu_kernel_xstate_min_size;
> +	else if (mask == xfeatures_mask_all)
> +		return fpu_kernel_xstate_max_size;
> +
> +	nr = fls64(mask) - 1;
> +
> +	if (!using_compacted_format())
> +		return xstate_offsets[nr] + xstate_sizes[nr];
> +
> +	xmask = BIT_ULL(nr + 1) - 1;
> +
> +	if (mask == (xmask & xfeatures_mask_all))
> +		return xstate_comp_offsets[nr] + xstate_sizes[nr];
> +
> +	/*
> +	 * Calculate the size by summing up each state together, since no known
> +	 * size found with the xstate buffer format out of the given mask.
> +	 */

I barely can imagine what that comment is trying to tell me...

> +	for (size = FXSAVE_SIZE + XSAVE_HDR_SIZE, i = FIRST_EXTENDED_XFEATURE; i <= nr; i++) {
> +		if (!(mask & BIT_ULL(i)))
> +			continue;
> +
> +		if (xstate_aligns[i])
> +			size = ALIGN(size, 64);
> +		size += xstate_sizes[i];
> +	}
> +	return size;
> +}
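
A worked example of that fallback path, using the AMX sizes from this series
and assuming XTILEDATA reports the 64-byte-alignment attribute: starting from
FXSAVE_SIZE + XSAVE_HDR_SIZE = 512 + 64 = 576 bytes, a 64-byte XTILECFG brings
the running total to 640, which is already 64-byte aligned, so the 8192-byte
XTILEDATA lands directly after it, for 8832 bytes in all.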
> +
>  /*
>   * When executing XSAVEOPT (or other optimized XSAVE instructions), if
>   * a processor implementation detects that an FPU state component is still
> @@ -270,10 +315,12 @@ static void __init setup_xstate_features(void)
>  	xstate_offsets[XFEATURE_FP]	= 0;
>  	xstate_sizes[XFEATURE_FP]	= offsetof(struct fxregs_state,
>  						   xmm_space);
> +	xstate_aligns[XFEATURE_FP]	= true;
>  
>  	xstate_offsets[XFEATURE_SSE]	= xstate_sizes[XFEATURE_FP];
>  	xstate_sizes[XFEATURE_SSE]	= sizeof_field(struct fxregs_state,
>  						       xmm_space);
> +	xstate_aligns[XFEATURE_SSE]	= true;
>  
>  	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
>  		if (!xfeature_enabled(i))
> @@ -291,6 +338,7 @@ static void __init setup_xstate_features(void)
>  			continue;
>  
>  		xstate_offsets[i] = ebx;
> +		xstate_aligns[i] = (ecx & 2) ? true : false;
>  
>  		/*
>  		 * In our xstate size checks, we assume that the highest-numbered
> @@ -755,6 +803,9 @@ static bool is_supported_xstate_size(unsigned int test_xstate_size)
>  	return false;
>  }
>  
> +/* The watched threshold size of dynamically allocated xstate buffer */

Watched?

> +#define XSTATE_BUFFER_MAX_BYTES		(64 * 1024)

What's that thing for when we have fpu_kernel_xstate_max_size too?

> +
>  static int __init init_xstate_size(void)
>  {
>  	/* Recompute the context size for enabled features: */
> @@ -779,6 +830,14 @@ static int __init init_xstate_size(void)
>  	if (!is_supported_xstate_size(fpu_kernel_xstate_min_size))
>  		return -EINVAL;
>  
> +	/*
> +	 * When allocating buffers larger than the threshold, a more sophisticated
> +	 * mechanism might be considerable.
> +	 * mechanism might be worth considering.
> +	if (fpu_kernel_xstate_max_size > XSTATE_BUFFER_MAX_BYTES)
> +		pr_warn("x86/fpu: xstate buffer too large (%u > %u)\n",
> +			fpu_kernel_xstate_max_size, XSTATE_BUFFER_MAX_BYTES);

So why doesn't this return an error?

> +
>  	/*
>  	 * User space is always in standard format.
>  	 */
> @@ -869,6 +928,9 @@ void __init fpu__init_system_xstate(void)
>  	if (err)
>  		goto out_disable;
>  
> +	/* Make sure init_task does not include the dynamic user states */
> +	current->thread.fpu.state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);

xfeatures_mask_user_dynamic just got set to 0 a couple of lines above...

> +
>  	/*
>  	 * Update info used for ptrace frames; use standard-format size and no
>  	 * supervisor xstates:
> @@ -1089,6 +1151,59 @@ static inline bool xfeatures_mxcsr_quirk(u64 xfeatures)
>  	return true;
>  }
>  
> +void free_xstate_buffer(struct fpu *fpu)
> +{
> +	vfree(fpu->state_ptr);
> +}
> +
> +/*
> + * Allocate an xstate buffer with the size calculated based on 'mask'.
> + *
> + * The allocation mechanism does not shrink or reclaim the buffer.
> + */
> +int alloc_xstate_buffer(struct fpu *fpu, u64 mask)
> +{
> +	union fpregs_state *state_ptr;
> +	unsigned int oldsz, newsz;
> +	u64 state_mask;
> +
> +	state_mask = fpu->state_mask | mask;
> +
> +	oldsz = get_xstate_size(fpu->state_mask);
> +	newsz = get_xstate_size(state_mask);
> +
> +	if (oldsz >= newsz)
> +		return 0;
> +
> +	if (newsz > fpu_kernel_xstate_max_size) {
> +		pr_warn_once("x86/fpu: xstate buffer too large (%u > %u bytes)\n",
> +			     newsz, fpu_kernel_xstate_max_size);
> +		XSTATE_WARN_ON(1);
> +		return 0;

return 0?!? On an error?!?

> +	}
> +
> +	/* We need 64B aligned pointer, but vmalloc() returns a page-aligned address. */

So this comment is useless, basically...

> +	state_ptr = vmalloc(newsz);
> +	if (!state_ptr) {
> +		trace_x86_fpu_xstate_alloc_failed(fpu);

WTH is that tracepoint here for?

> +		return -ENOMEM;
> +	}
> +
> +	memset(state_ptr, 0, newsz);

So vzalloc() above?

> +	if (using_compacted_format())
> +		fpstate_init_xstate(&state_ptr->xsave, state_mask);
> +
> +	/*
> +	 * As long as the register state is intact, save the xstate in the new buffer
> +	 * at the next context copy/switch or potentially ptrace-driven xstate writing.
> +	 */
> +
> +	vfree(fpu->state_ptr);
> +	fpu->state_ptr = state_ptr;
> +	fpu->state_mask = state_mask;

I must be missing something here but where's the logic that decides
between the static and dynamic buffer? Later patches?

I have to admit I've yet to see how the "switching" between static and
dynamic state happens...

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states
  2021-01-22 10:56           ` Borislav Petkov
@ 2021-01-27  1:23             ` Bae, Chang Seok
  0 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-27  1:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Jan 22, 2021, at 02:56, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Jan 19, 2021 at 06:57:26PM +0000, Bae, Chang Seok wrote:
>> This series attempts to save the AMX state in the context switch buffer only
> 
> What is the context switch buffer?
> 
> I think you mean simply the xstate per-task buffer which is switched on
> context switches...

Yes, I will use ‘xstate per-task buffer’ instead.

>> How about the changelog message like this:
>> 
>> "
>> The context switch buffer is in preparation to be dynamic for user states.

s/context switch/xstate per-task/

Thanks,
Chang


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes
  2021-01-22 11:44   ` Borislav Petkov
@ 2021-01-27  1:23     ` Bae, Chang Seok
  2021-01-27  9:38       ` Borislav Petkov
  0 siblings, 1 reply; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-27  1:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, mingo, x86, Brown, Len, Hansen,
	Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel, kvm

On Jan 22, 2021, at 03:44, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:57:02AM -0800, Chang S. Bae wrote:
>> The xstate buffer is currently in-line with static size. To accommodate
> 
> "in-line" doesn't fit in this context, especially since "inline"
> is a keyword with another meaning. Please replace it with a better
> formulation in this patch.

How about ‘embedded’?
    “The xstate buffer is currently embedded into struct fpu with static size."

>> -extern unsigned int fpu_kernel_xstate_size;
>> +extern unsigned int fpu_kernel_xstate_min_size;
>> +extern unsigned int fpu_kernel_xstate_max_size;
> 
> Is it time to group this into a struct so that all those settings go
> together instead in single variables?
> 
> struct fpu_xstate {
> 	unsigned int min_size, max_size;
> 	unsigned int user_size;
> 	...
> };
> 
> etc.

<snip>

> And since we're probably going to start querying different aspects about
> the buffer, instead of exporting all kinds of variables in the future,
> maybe this should be a single exported function called
> 
> get_xstate_buffer_attr(typedef buffer_attr)
> 
> which gives you what you wanna know about it... For example:
> 
> get_xstate_buffer_attr(MIN_SIZE);
> get_xstate_buffer_attr(MAX_SIZE);
> ...

Okay. I will prepare a separate cleanup patch that can be applied at the end
of the series. I will post the change in this thread first.

>> /* Whitelist the FPU state from the task_struct for hardened usercopy. */
>> -static inline void arch_thread_struct_whitelist(unsigned long *offset,
>> -						unsigned long *size)
>> -{
>> -	*offset = offsetof(struct thread_struct, fpu.state);
>> -	*size = fpu_kernel_xstate_size;
>> -}
>> +extern void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
> 
> What's that move for?

One of my drafts had an internal helper that was called in there. There is no
reason for the move before the get_xstate_buffer_attr() helper is applied, but
with it, I think it is better to move this out of the header file.
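
The out-of-line definition would then be roughly like this (untested sketch,
reusing the current inline body with the renamed min size from this series):

/* In fpu/core.c (sketch): whitelist only the embedded buffer part. */
void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
{
	*offset = offsetof(struct thread_struct, fpu.state);
	*size = fpu_kernel_xstate_min_size;
}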

>> @@ -627,13 +627,18 @@ static void check_xstate_against_struct(int nr)
>>  */
> 
> <-- There's a comment over this function that might need adjustment.

Do you mean an empty line? (Just want to clarify.)

>> static void do_extra_xstate_size_checks(void)
>> {

<snip>

>> 	if (boot_cpu_has(X86_FEATURE_XSAVES))
> 
> using_compacted_format()
> 
> FPU code needs to agree on one helper and not use both. :-\

Agreed. I will prepare a patch, and will at least post the diff here.

<snip>

>> +	/* Ensure we have the supported in-line space: */
> 
> Who's "we"?

How about:
    “Ensure the size fits in the statically-allocated buffer:"

>> +	if (!is_supported_xstate_size(fpu_kernel_xstate_min_size))
>> +		return -EINVAL;

No excuse, just pointing out the upstream code has “we” there [1].

Thanks,
Chang

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n752


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
  2021-01-26 20:17   ` Borislav Petkov
@ 2021-01-27  1:23     ` Bae, Chang Seok
  2021-01-27 10:41       ` Borislav Petkov
  2021-02-03  4:10     ` Bae, Chang Seok
  1 sibling, 1 reply; 64+ messages in thread
From: Bae, Chang Seok @ 2021-01-27  1:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Jan 26, 2021, at 12:17, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:57:03AM -0800, Chang S. Bae wrote:
>> 
>> +	/*
>> +	 * @state_mask:
>> +	 *
>> +	 * The state component bitmap. It indicates the saved xstate in
>> +	 * either @state or @state_ptr. The map value starts to be aligned
>> +	 * with @state and then with @state_ptr once it is in use.
> 
> Are you trying to say here that the mask describes the state saved in
> @state initially and then, when the task is switched to dynamic state,
> it denotes the state in ->state_ptr?

Yes, it is. I will take your sentence in the comment. Thank you.

>> +	 */
>> +	u64				state_mask;
>> +
>> +	/*
>> +	 * @state_ptr:
>> +	 *
>> +	 * Copy of all extended register states, in a dynamically allocated
>> +	 * buffer. When a task is using extended features, the register state
>> +	 * is always the most current. This state copy is more recent than
>> +	 * @state. If the task context-switches away, they get saved here,
>> +	 * representing the xstate.
> 
> Calling it a copy here is confusing - you wanna say that when dynamic
> states get used, the state in state_ptr supercedes and invalidates the
> state in @state. AFAIU, at least.

True, it looks better here too.

>> +DEFINE_EVENT(x86_fpu, x86_fpu_xstate_alloc_failed,
>> +	TP_PROTO(struct fpu *fpu),
>> +	TP_ARGS(fpu)
>> +);
>> +
> 
> Huh, what's that for?

This tracepoint can pinpoint the allocation failure even when only the NMI
handling failure message is available. (You can also check the comment below
at the call site.)

>> /*
>>  * Although we spell it out in here, the Processor Trace
>> @@ -71,6 +73,7 @@ static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] =
>> static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 1] = -1};
>> static unsigned int xstate_comp_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
>> static unsigned int xstate_supervisor_only_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
>> +static bool xstate_aligns[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = false};
> 
> What's that for?

The xstate buffer may expand on the fly, so the size has to be correctly
recalculated when needed. CPUID provides the essential information for the
calculation. Instead of reading CPUID repeatedly, store the info -- the
offset and size are already stored here. The 64B alignment info looked to be
missing, so it is added here.
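
To illustrate, the calculation boils down to something like this compilable
userspace sketch (the arrays stand in for the cached CPUID info, and
calc_xstate_size is a made-up name):

#include <stdbool.h>
#include <stdint.h>

#define ALIGN64(sz)	(((sz) + 63u) & ~63u)

/* Stand-ins for the cached per-component CPUID info. */
static unsigned int xstate_sizes[64];
static bool xstate_aligns[64];

static unsigned int calc_xstate_size(uint64_t mask, unsigned int hdr_end)
{
	unsigned int size = hdr_end;	/* FXSAVE area + XSAVE header */
	int i;

	/* Walk the extended features and honor the 64B alignment bits. */
	for (i = 2; i < 64; i++) {
		if (!(mask & (1ULL << i)))
			continue;
		if (xstate_aligns[i])
			size = ALIGN64(size);
		size += xstate_sizes[i];
	}
	return size;
}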

>> +	/*
>> +	 * Calculate the size by summing up each state together, since no known
>> +	 * size found with the xstate buffer format out of the given mask.
>> +	 */
> 
> I barely can imagine what that comment is trying to tell me...

How about:
    “No pre-computed size fits the given mask. So, calculate it by summing up
     each state size."

>> +/* The watched threshold size of dynamically allocated xstate buffer */
> 
> Watched?

Maybe: 
    "When the buffer is more than this size, the current mechanism is
     potentially marginal to support the allocations."

>> +#define XSTATE_BUFFER_MAX_BYTES		(64 * 1024)
> 
> What's that thing for when we have fpu_kernel_xstate_max_size too?

The threshold size is what the current mechanism can comfortably allocate
(maybe at most). A warning is emitted when the buffer size goes beyond the
threshold. Then, we may need to consider a better allocation mechanism.

>> static int __init init_xstate_size(void)
>> {
>> 	/* Recompute the context size for enabled features: */
>> @@ -779,6 +830,14 @@ static int __init init_xstate_size(void)
>> 	if (!is_supported_xstate_size(fpu_kernel_xstate_min_size))
>> 		return -EINVAL;
>> 
>> +	/*
>> +	 * When allocating buffers larger than the threshold, a more sophisticated
>> +	 * mechanism might be considerable.
>> +	 */
>> +	if (fpu_kernel_xstate_max_size > XSTATE_BUFFER_MAX_BYTES)
>> +		pr_warn("x86/fpu: xstate buffer too large (%u > %u)\n",
>> +			fpu_kernel_xstate_max_size, XSTATE_BUFFER_MAX_BYTES);
> 
> So why doesn't this return an error?

Although a warning is given, vmalloc() may manage to allocate this size. So,
it was not considered a hard hit yet. vmalloc() failure will return an error
later.

>> 	/*
>> 	 * User space is always in standard format.
>> 	 */
>> @@ -869,6 +928,9 @@ void __init fpu__init_system_xstate(void)
>> 	if (err)
>> 		goto out_disable;
>> 
>> +	/* Make sure init_task does not include the dynamic user states */
>> +	current->thread.fpu.state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
> 
> xfeatures_mask_user_dynamic just got set to 0 a couple of lines above...

Well, it will have some value once the pieces are in place to support the
dynamic user states. PATCH13 has this change there:
 
+	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
+		u64 feature_mask = BIT_ULL(i);
+
+		if (!(xfeatures_mask_user() & feature_mask))
+			continue;
+
+		if (xfeature_disable_supported(i))
+			xfeatures_mask_user_dynamic |= feature_mask;
+	}

>> +/*
>> + * Allocate an xstate buffer with the size calculated based on 'mask'.
>> + *
>> + * The allocation mechanism does not shrink or reclaim the buffer.
>> + */
>> +int alloc_xstate_buffer(struct fpu *fpu, u64 mask)
>> +{
>> +	union fpregs_state *state_ptr;
>> +	unsigned int oldsz, newsz;
>> +	u64 state_mask;
>> +
>> +	state_mask = fpu->state_mask | mask;
>> +
>> +	oldsz = get_xstate_size(fpu->state_mask);
>> +	newsz = get_xstate_size(state_mask);
>> +
>> +	if (oldsz >= newsz)
>> +		return 0;
>> +
>> +	if (newsz > fpu_kernel_xstate_max_size) {
>> +		pr_warn_once("x86/fpu: xstate buffer too large (%u > %u bytes)\n",
>> +			     newsz, fpu_kernel_xstate_max_size);
>> +		XSTATE_WARN_ON(1);
>> +		return 0;
> 
> return 0?!? On an error?!?

Okay, the first question is whether this is an error. Well, with such too-much
size though, the buffer can still store the states. So, give a warning at
least. Perhaps, a similar case is when the calculated size is unmatched with
the CPUID-provided [3]. We give a warning, not an error there, maybe assuming
the calculated is larger.

But if it should be considered an error, maybe return -EINVAL.

>> +	}
>> +
>> +	/* We need 64B aligned pointer, but vmalloc() returns a page-aligned address. */
> 
> So this comment is useless, basically...

Okay, removed.

>> +	state_ptr = vmalloc(newsz);
>> +	if (!state_ptr) {
>> +		trace_x86_fpu_xstate_alloc_failed(fpu);
> 
> WTH is that tracepoint here for?

While it returns an error, this function can be on the path of NMI handling.
Then, likely only with the “unexpected #NM exception” message. So, logging a
tracepoint can provide evidence of the allocation failure in that case.

The comments on v1 [1][2] led to this change.

>> +		return -ENOMEM;
>> +	}
>> +
>> +	memset(state_ptr, 0, newsz);
> 
> So vzalloc() above?

Yes, I think it is better to use vzalloc() here.
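
I.e., roughly:

	state_ptr = vzalloc(newsz);	/* allocate and zero in one step */
	if (!state_ptr)
		return -ENOMEM;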

> I must be missing something here but where's the logic that decides
> between the static and dynamic buffer? Later patches?
> 
> I have to admit I've yet to see how the "switching" between static and
> dynamic state happens…

PATCH9 introduces a wrapper that determines which to take. It simply returns
state_ptr when not a null pointer. So, the logic is to use the dynamic buffer
when available.
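
The wrapper's logic, as a self-contained sketch (stand-in types, not the real
layout, and the helper name is made up):

union fpregs_state { unsigned char xsave[4096]; };

struct fpu {
	union fpregs_state *state_ptr;	/* dynamic buffer, NULL until first use */
	union fpregs_state state;	/* embedded static buffer */
};

/* Prefer the dynamically allocated buffer whenever it exists. */
static union fpregs_state *xstate_buffer_of(struct fpu *fpu)
{
	return fpu->state_ptr ? fpu->state_ptr : &fpu->state;
}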

Thanks,
Chang

[1] https://lore.kernel.org/lkml/69721125-4e1c-ca9c-ff59-8e1331933e6c@intel.com/#t
[2] https://lore.kernel.org/lkml/20201014104148.GD2628@hirez.programming.kicks-ass.net/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n657

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes
  2021-01-27  1:23     ` Bae, Chang Seok
@ 2021-01-27  9:38       ` Borislav Petkov
  2021-02-03  2:54         ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-01-27  9:38 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Thomas Gleixner, mingo, x86, Brown, Len, Hansen,
	Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel, kvm

On Wed, Jan 27, 2021 at 01:23:35AM +0000, Bae, Chang Seok wrote:
> How about ‘embedded’?
>     “The xstate buffer is currently embedded into struct fpu with static size."

Better.

> Okay. I will prepare a separate cleanup patch that can be applied at the end
> of the series. I will post the change in this thread first.

No, this is not how this works. Imagine you pile up a patch at the end
for each review feedback you've gotten. No, this will be an insane churn
and an unreviewable mess.

What you do is you rework your patches like everyone else.

Also, thinking about this more, I'm wondering if all those
xstate-related attributes shouldn't be part of struct fpu instead of
being scattered around like that.

That thing - struct fpu * - gets passed in everywhere anyway so all that
min_size, max_size, ->xstate_ptr and whatever, looks like it wants to be
part of struct fpu. Then maybe you won't need the accessors...

> One of my drafts had an internal helper that was called in there. There is no
> reason for the move before the get_xstate_buffer_attr() helper is applied, but
> with it, I think it is better to move this out of the header file.

See above.

> 
> >> @@ -627,13 +627,18 @@ static void check_xstate_against_struct(int nr)
> >>  */
> > 
> > <-- There's a comment over this function that might need adjustment.
> 
> Do you mean an empty line? (Just want to clarify.)

No, I mean this comment:

 * Dynamic XSAVE features allocate their own buffers and are not
 * covered by these checks. Only the size of the buffer for task->fpu
 * is checked here.

That probably needs adjusting as you do set min and max size here now
for the dynamic buffer.

> Agreed. I will prepare a patch, and will at least post the diff here.

You can send it separately from this patchset, ontop of current
tip/master, so that I can take it now.

> How about:
>     “Ensure the size fits in the statically-allocated buffer:"

Yep.

> No excuse, just pointing out the upstream code has “we” there [1].

Yeah, I know. :-\

But considering how many parties develop the kernel now, "we" becomes
really ambiguous.

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
  2021-01-27  1:23     ` Bae, Chang Seok
@ 2021-01-27 10:41       ` Borislav Petkov
  2021-02-03  4:10         ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-01-27 10:41 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Wed, Jan 27, 2021 at 01:23:57AM +0000, Bae, Chang Seok wrote:
> The xstate buffer may expand on the fly, so the size has to be correctly
> recalculated when needed. CPUID provides the essential information for the
> calculation. Instead of reading CPUID repeatedly, store the info -- the
> offset and size are already stored here. The 64B alignment info looked to be
> missing, so it is added here.

/me goes and digs into the SDM.

Do you mean this:

"Bit 01 is set if, when the compacted format of an XSAVE area is used,
this extended state component located on the next 64-byte boundary
following the preceding state component (otherwise, it is located
immediately following the preceding state component)."

So judging by your variable naming, you wanna record here whether the
buffer aligns on 64 bytes.

Yes, no?

How about a comment over that variable so that people reading the code,
know what it records and do not have to open the SDM each time.
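
For the record, the bit comes straight from CPUID(0xD, i).ECX[1]. In
userspace terms it would be something like this (sketch; xfeature_is_aligned
is a made-up name):

#include <cpuid.h>
#include <stdbool.h>

/* ECX bit 1: the component aligns to 64 bytes in the compacted format. */
static bool xfeature_is_aligned(unsigned int nr)
{
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	if (!__get_cpuid_count(0xd, nr, &eax, &ebx, &ecx, &edx))
		return false;
	return ecx & 2;
}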

> How about:
>     “No pre-computed size fits the given mask. So, calculate it by summing up
>      each state size."

Yap, better.

> Maybe:
>     "When the buffer is more than this size, the current mechanism is
>      potentially marginal to support the allocations."

Where do you get those formulations?!

Are you simply trying to say that for buffers larger than 64K, the
kernel needs "a more sophisticated allocation scheme"?

I'd suggest you try simple formulations first.

And why does it need a more sophisticated allocation scheme? Is 64K
magical?

Also, I'm assuming here - since you're using vmalloc - that XSAVE* can
handle virtually contiguous memory. SDM says it saves to "mem" and
doesn't specify so it sounds like it does but let's have a confirmation
here pls.

> 
> >> +#define XSTATE_BUFFER_MAX_BYTES		(64 * 1024)
> > 
> > What's that thing for when we have fpu_kernel_xstate_max_size too?
> 
> The threshold size is what the current mechanism can comfortably allocate
> (maybe at most). A warning is emitted when the buffer size goes beyond the
> threshold. Then, we may need to consider a better allocation mechanism.

As above, why?

> Although a warning is given, vmalloc() may manage to allocate this size. So,
> it was not considered a hard hit yet. vmalloc() failure will return an error
> later.

And that warning is destined for whom, exactly?

When can that state become more than 64K?

What is that artificial limit for?

A whole lot of questions...

> Okay, the first question is whether this is an error. Well, with such too-much
> size though, the buffer can still store the states. So, give a warning at
> least. Perhaps, a similar case is when the calculated size is unmatched with
> the CPUID-provided [3]. We give a warning, not an error there, maybe assuming
> the calculated is larger.
> 
> But if it should be considered an error, maybe return -EINVAL.

I have no clue what that means...

> 
> >> +	}
> >> +
> >> +	/* We need 64B aligned pointer, but vmalloc() returns a page-aligned address. */
> > 
> > So this comment is useless, basically...
> 
> Okay, removed.
> 
> >> +	state_ptr = vmalloc(newsz);
> >> +	if (!state_ptr) {
> >> +		trace_x86_fpu_xstate_alloc_failed(fpu);
> > 
> > WTH is that tracepoint here for?
> 
> While it returns an error, this function can be on the path of NMI handling.

How?

You're allocating an xstate buffer in NMI context?!

> Then, likely only with the “unexpected #NM exception” message. So, logging a
> tracepoint can provide evidence of the allocation failure in that case.

Who's going to see that tracepoint, people who are tracing the system
but not normal users.

> PATCH9 introduces a wrapper that determines which to take. It simply returns
> state_ptr when not a null pointer. So, the logic is to use the dynamic buffer
> when available.

Why not allocate the xstate buffer by default instead of being embedded
in struct fpu?

You're already determining its max_size and you can use that to do the
allocation. Two buffers is calling for trouble.


> [1] https://lore.kernel.org/lkml/69721125-4e1c-ca9c-ff59-8e1331933e6c@intel.com/#t

Ok, I read that subthread.

The reasoning *why* we're using vmalloc() needs to be explained in a
comment over alloc_xstate_buffer() otherwise we will forget and that is
important.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes
  2021-01-27  9:38       ` Borislav Petkov
@ 2021-02-03  2:54         ` Bae, Chang Seok
  0 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-02-03  2:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, mingo, x86, Brown, Len, Hansen,
	Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel, kvm

On Jan 27, 2021, at 01:38, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Jan 27, 2021 at 01:23:35AM +0000, Bae, Chang Seok wrote:
>> Okay. I will prepare a separate cleanup patch that can be applied at the end
>> of the series. I will post the change in this thread first.
> 
> No, this is not how this works. Imagine you pile up a patch at the end
> for each review feedback you've gotten. No, this will be an insane churn
> and an unreviewable mess.
> 
> What you do is you rework your patches like everyone else.

Yeah, it makes sense. I will post v4.

> Also, thinking about this more, I'm wondering if all those
> xstate-related attributes shouldn't be part of struct fpu instead of
> being scattered around like that.
> 
> That thing - struct fpu * - gets passed in everywhere anyway so all that
> min_size, max_size, ->xstate_ptr and whatever, looks like it wants to be
> part of struct fpu. Then maybe you won't need the accessors...

Well, min_size and max_size are not task-specific. So, it would be wasteful to
include them in struct fpu.

I will follow your suggestion to add new helpers to access the size values,
instead of exporting them.

>>>> @@ -627,13 +627,18 @@ static void check_xstate_against_struct(int nr)
>>>> */
>>> 
>>> <-- There's a comment over this function that might need adjustment.
>> 
>> Do you mean an empty line? (Just want to clarify.)
> 
> No, I mean this comment:
> 
> * Dynamic XSAVE features allocate their own buffers and are not
> * covered by these checks. Only the size of the buffer for task->fpu
> * is checked here.
> 
> That probably needs adjusting as you do set min and max size here now
> for the dynamic buffer.

Oh, I see. Thank you.

>> Agreed. I will prepare a patch, and will at least post the diff here.
> 
> You can send it separately from this patchset, ontop of current
> tip/master, so that I can take it now.

Posted [1]. After all, the proposal is to remove the helper.

Thanks,
Chang

[1] https://lore.kernel.org/lkml/20210203024052.15789-1-chang.seok.bae@intel.com/

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
  2021-01-26 20:17   ` Borislav Petkov
  2021-01-27  1:23     ` Bae, Chang Seok
@ 2021-02-03  4:10     ` Bae, Chang Seok
  1 sibling, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-02-03  4:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel

On Jan 26, 2021, at 12:17, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:57:03AM -0800, Chang S. Bae wrote:
>> 
>> +int alloc_xstate_buffer(struct fpu *fpu, u64 mask)

<snip>

>> +	if (newsz > fpu_kernel_xstate_max_size) {
>> +		pr_warn_once("x86/fpu: xstate buffer too large (%u > %u bytes)\n",
>> +			     newsz, fpu_kernel_xstate_max_size);
>> +		XSTATE_WARN_ON(1);
>> +		return 0;
> 
> return 0?!? On an error?!?

After more discussion, I now think it is too much to check like this. This
function (merely) allocates the requested size. So, I am going to remove it.

Thanks,
Chang


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
  2021-01-27 10:41       ` Borislav Petkov
@ 2021-02-03  4:10         ` Bae, Chang Seok
  2021-02-04 13:10           ` Borislav Petkov
  0 siblings, 1 reply; 64+ messages in thread
From: Bae, Chang Seok @ 2021-02-03  4:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Jan 27, 2021, at 02:41, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Jan 27, 2021 at 01:23:57AM +0000, Bae, Chang Seok wrote:
>> The xstate buffer may expand on the fly. The size has to be correctly
>> calculated if needed. CPUID provides essential information for the
>> calculation. Instead of reading CPUID repeatedly, store them -- the offset and
>> size are already stored here. The 64B alignment looks to be missing, so added
>> here.
> 
> /me goes and digs into the SDM.
> 
> Do you mean this:
> 
> "Bit 01 is set if, when the compacted format of an XSAVE area is used,
> this extended state component located on the next 64-byte boundary
> following the preceding state component (otherwise, it is located
> immediately following the preceding state component)."
> 
> So judging by your variable naming, you wanna record here whether the
> buffer aligns on 64 bytes.
> 
> Yes, no?

Yes, you’re right.

> How about a comment over that variable so that people reading the code,
> know what it records and do not have to open the SDM each time.

Okay, how about:
“
This alignment bit is set if the state is saved on a 64B-aligned address in
the compacted format buffer.
"

>> Maybe:
>>    "When the buffer is more than this size, the current mechanism is
>>     potentially marginal to support the allocations."
> 
> Where do you get those formulations?!
> 
> Are you simply trying to say that for buffers larger than 64K, the
> kernel needs "a more sophisticated allocation scheme"?
> 
> I'd suggest you try simple formulations first.
> 
> And why does it need a more sophisticated allocation scheme? Is 64K
> magical?
> 
> Also, I'm assuming here - since you're using vmalloc - that XSAVE* can
> handle virtually contiguous memory. SDM says it saves to "mem" and
> doesn't specify so it sounds like it does but let's have a confirmation
> here pls.

Yes, correct.

>>>> +#define XSTATE_BUFFER_MAX_BYTES		(64 * 1024)
>>> 
>>> What's that thing for when we have fpu_kernel_xstate_max_size too?
>> 
>> The threshold size is what the current mechanism can comfortably allocate
>> (maybe at most). The warning is left when the buffer size goes beyond the 
>> threshold. Then, we may need to consider a better allocation mechanism.
> 
> As above, why?
> 
>> Although a warning is given, vmalloc() may manage to allocate this size. So,
>> it was not considered a hard hit yet. vmalloc() failure will return an error
>> later.
> 
> And that warning is destined for whom, exactly?
> 
> When can that state become more than 64K?
> 
> What is that artificial limit for?
> 
> A whole lot of questions…

Okay, let me try to explain.

The threshold here could be more than that. But the intention is a heads-up to
(re-)consider (a) a new allocation mechanism and (b) shrinking the memory
allocation.

Also, the AMX state size is limited to (a bit less than) 64KB and it was
discussed that vmalloc() will be okay with AMX [2].

DaveH, correct me if I'm wrong.

>>>> +	state_ptr = vmalloc(newsz);
>>>> +	if (!state_ptr) {
>>>> +		trace_x86_fpu_xstate_alloc_failed(fpu);
>>> 
>>> WTH is that tracepoint here for?
>> 
>> While it returns an error, this function can be on the path of NMI handling.
> 
> How?
> 
> You're allocating an xstate buffer in NMI context?!

Oh, sorry. The typo could make it confusing here -- s/NMI/#NM/.

>> Then, likely only with the “unexpected #NM exception” message. So, logging a
>> tracepoint can provide evidence of the allocation failure in that case.
> 
> Who's going to see that tracepoint, people who are tracing the system
> but not normal users.

Maybe it is possible to backtrack this allocation failure out of #NM handling.
But the tracepoint can provide a clear context, although limited to those
using it.

>> PATCH9 introduces a wrapper that determines which to take. It simply returns
>> state_ptr when not a null pointer. So, the logic is to use the dynamic buffer
>> when available.
> 
> Why not allocate the xstate buffer by default instead of being embedded
> in struct fpu?

Indeed, this is the most preferred way on one hand. But there was a change to
the current allocation approach by Ingo about 6 years ago [3].

So, I’m wondering what his current thoughts on this suggestion are.

> You're already determining its max_size and you can use that to do the
> allocation. Two buffers is calling for trouble.

But if so, every task will consume 8KB (or up to 64KB) with AMX. The memory
wasted on tasks not using the state at all would be bad.

>> [1] https://lore.kernel.org/lkml/69721125-4e1c-ca9c-ff59-8e1331933e6c@intel.com/#t
> 
> Ok, I read that subthread.
> 
> The reasoning *why* we're using vmalloc() needs to be explained in a
> comment over alloc_xstate_buffer() otherwise we will forget and that is
> important.

Maybe:
“
If a task with a vmalloc()-allocated buffer tends to terminate quickly,
vfree()-induced IPIs may be a concern. Implementing a cache may help with
that. But a task with large state is likely to live longer. So, simply use
vmalloc().
"
Let me know if this is not enough.

Thanks,
Chang

[2] https://lore.kernel.org/lkml/CALCETrW8u5rUsZvoo5t4YtC+4boBVcK__-srtA1+-YX06QYD1w@mail.gmail.com/
[3] https://lore.kernel.org/lkml/1430848300-27877-56-git-send-email-mingo@kernel.org/


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
  2021-02-03  4:10         ` Bae, Chang Seok
@ 2021-02-04 13:10           ` Borislav Petkov
  0 siblings, 0 replies; 64+ messages in thread
From: Borislav Petkov @ 2021-02-04 13:10 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86-ml, Brown,
	Len, Hansen, Dave, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Wed, Feb 03, 2021 at 04:10:24AM +0000, Bae, Chang Seok wrote:
> Okay, how about:
> “
> This alignment bit is set if the state is saved on a 64B-aligned address in
> the compacted format buffer.
> "

I'd prefer:

/*
 * True if the buffer of the corresponding XFEATURE is located on the next 64
 * byte boundary. Otherwise, it follows the preceding component immediately.
 */
static bool xstate_aligns[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = false };

> The threshold here could be more than that. But the intention is a heads-up to
> (re-)consider (a) a new allocation mechanism and (b) shrinking the memory
> allocation.
> 
> Also, the AMX state size is limited to (a bit less than) 64KB and it was
> discussed that vmalloc() will be okay with AMX [2].

So if nothing is going to grow over 64K, why are we even talking about this?

> Maybe it is possible to backtrack this allocation failure out of #NM handling.
> But the tracepoint can provide a clear context, although limited to those
> using it.

Yes, add it when it is really needed. Not slapping it proactively and
hoping for any potential usage.

> Indeed, this is the most preferred way on one hand. But there was a change to
> the current allocation approach by Ingo about 6 years ago [3].

Yah, there's that. :-\

I guess it needs to stay embedded. Oh well.

I guess you can diminish the confusion by doing this:

struct fpu {

	...

	union fpregs_state		*state;

	union fpregs_state		__default_state;
};

and tasks will have

	state = &__default_state;

set up by default in fpu__copy() etc.

AMX tasks will simply change the pointer to the vmalloc'ed xstate
buffer. This way at least the pointer will be a single one and the task
alloc code will simply reroute it instead of having two things to pay
attention to.
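
IOW, something like this (completely untested; both helper names are made up):

/* Every task starts with the pointer aimed at the embedded buffer... */
static void fpu_set_default_state(struct fpu *fpu)
{
	fpu->state = &fpu->__default_state;
}

/* ...and only an AMX-using task reroutes it to a vmalloc'ed buffer. */
static int fpu_install_dynamic_state(struct fpu *fpu, unsigned int size)
{
	union fpregs_state *buf = vzalloc(size);	/* linux/vmalloc.h */

	if (!buf)
		return -ENOMEM;
	fpu->state = buf;
	return 0;
}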

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data
  2020-12-23 15:57 ` [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data Chang S. Bae
@ 2021-02-08 12:33   ` Borislav Petkov
  2021-02-08 18:53     ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-02-08 12:33 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel

On Wed, Dec 23, 2020 at 07:57:04AM -0800, Chang S. Bae wrote:
> init_fpstate is used to record the initial xstate value for convenience

convenience?

> and covers all the states. But it is wasteful to cover large states all
> with trivial initial data.
> 
> Limit init_fpstate by clarifying its size and coverage, which are all but
> dynamic user states. The dynamic states are assumed to be large but having
> initial data with zeros.
> 
> No functional change until the kernel supports dynamic user states.

What does that mean?

This patch either makes no functional change or it does...

> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Changes from v2:
> * Updated the changelog for clarification.
> * Updated the code comments.
> ---
>  arch/x86/include/asm/fpu/internal.h | 18 +++++++++++++++---
>  arch/x86/include/asm/fpu/xstate.h   |  1 +
>  arch/x86/kernel/fpu/core.c          |  4 ++--
>  arch/x86/kernel/fpu/xstate.c        |  4 ++--
>  4 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 37ea5e37f21c..bbdd304719c6 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -80,6 +80,18 @@ static __always_inline __pure bool use_fxsr(void)
>  
>  extern union fpregs_state init_fpstate;
>  
> +static inline u64 get_init_fpstate_mask(void)
> +{
> +	/* init_fpstate covers states in fpu->state. */
> +	return (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
> +}

If you're going to introduce such a helper, then use it everywhere in the code:

$ git grep "xfeatures_mask_all & ~xfeatures_mask_user_dynamic"
arch/x86/kernel/fpu/core.c:239: dst_fpu->state_mask = xfeatures_mask_all & ~xfeatures_mask_user_dynamic;
arch/x86/kernel/fpu/xstate.c:148:       else if (mask == (xfeatures_mask_all & ~xfeatures_mask_user_dynamic))
arch/x86/kernel/fpu/xstate.c:932:       current->thread.fpu.state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);

and if you do that, do that in a separate pre-patch which does only this
conversion.

> +static inline unsigned int get_init_fpstate_size(void)
> +{
> +	/* fpu->state size is aligned with the init_fpstate size. */
> +	return fpu_kernel_xstate_min_size;
> +}
> +
>  extern void fpstate_init(struct fpu *fpu);
>  #ifdef CONFIG_MATH_EMULATION
>  extern void fpstate_init_soft(struct swregs_state *soft);
> @@ -269,12 +281,12 @@ static inline void copy_fxregs_to_kernel(struct fpu *fpu)
>  		     : "memory")
>  
>  /*
> - * This function is called only during boot time when x86 caps are not set
> - * up and alternative can not be used yet.
> + * Use this function to dump the initial state, only during boot time when x86
> + * caps not set up and alternative not available yet.
>   */

What's the point of this change? Also, "dump"?!

>  static inline void copy_xregs_to_kernel_booting(struct xregs_state *xstate)
>  {
> -	u64 mask = xfeatures_mask_all;
> +	u64 mask = get_init_fpstate_mask();
>  	u32 lmask = mask;
>  	u32 hmask = mask >> 32;
>  	int err;
> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
> index 379e8f8b8440..62f6583f34fa 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -103,6 +103,7 @@ extern void __init update_regset_xstate_info(unsigned int size,
>  					     u64 xstate_mask);
>  
>  void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
> +unsigned int get_xstate_size(u64 mask);
>  int alloc_xstate_buffer(struct fpu *fpu, u64 mask);
>  void free_xstate_buffer(struct fpu *fpu);
>  
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 6dafed34be4f..aad1a7102096 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -206,10 +206,10 @@ void fpstate_init(struct fpu *fpu)
>  		return;
>  	}
>  
> -	memset(state, 0, fpu_kernel_xstate_min_size);
> +	memset(state, 0, fpu ? get_xstate_size(fpu->state_mask) : get_init_fpstate_size());
>  
>  	if (static_cpu_has(X86_FEATURE_XSAVES))
> -		fpstate_init_xstate(&state->xsave, xfeatures_mask_all);
> +		fpstate_init_xstate(&state->xsave, fpu ? fpu->state_mask : get_init_fpstate_mask());

<---- newline here.

>  	if (static_cpu_has(X86_FEATURE_FXSR))
>  		fpstate_init_fxstate(&state->fxsave);
>  	else

...

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 09/21] x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer access
  2020-12-23 15:57 ` [PATCH v3 09/21] x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer access Chang S. Bae
@ 2021-02-08 12:33   ` Borislav Petkov
  2021-02-09 15:50     ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-02-08 12:33 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel

On Wed, Dec 23, 2020 at 07:57:05AM -0800, Chang S. Bae wrote:
> The struct fpu includes two (possible) xstate buffers -- fpu->state and
> fpu->state_ptr. Instead of open code for accessing one of them, provide a
> wrapper that covers both cases.

Right, if you do the thing I suggested - have a single ->xstate pointer
- then that below is not needed.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
  2020-12-23 15:57 ` [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate Chang S. Bae
  2021-01-07  8:41   ` Liu, Jing2
@ 2021-02-08 12:33   ` Borislav Petkov
  2021-02-09 15:48     ` Bae, Chang Seok
  1 sibling, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-02-08 12:33 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel, kvm

On Wed, Dec 23, 2020 at 07:57:06AM -0800, Chang S. Bae wrote:
> copy_xregs_to_kernel() used to save all user states in a kernel buffer.
> When the dynamic user state is enabled, it becomes conditional which state
> to be saved.
> 
> fpu->state_mask can indicate which state components are reserved to be
> saved in XSAVE buffer. Use it as XSAVE's instruction mask to select states.
> 
> KVM used to save all xstate via copy_xregs_to_kernel(). Update KVM to set a
> valid fpu->state_mask, which will be necessary to correctly handle dynamic
> state buffers.

All this commit message should say is something along the lines of
"extend copy_xregs_to_kernel() to receive a mask argument of which
states to save, in preparation of dynamic states handling."

> No functional change until the kernel supports dynamic user states.

Same comment as before.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data
  2021-02-08 12:33   ` Borislav Petkov
@ 2021-02-08 18:53     ` Bae, Chang Seok
  2021-02-09 12:49       ` Borislav Petkov
  0 siblings, 1 reply; 64+ messages in thread
From: Bae, Chang Seok @ 2021-02-08 18:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel

On Feb 8, 2021, at 04:33, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:57:04AM -0800, Chang S. Bae wrote:
>> init_fpstate is used to record the initial xstate value for convenience
> 
> convenience?

Yes, this is vague. I think the usage is when (re-)initializing the register
states, e.g. from fpu__clear().

Maybe drop ‘for convenience’ from this sentence, since the buffer’s usage is
not very relevant in this changelog.

>> and covers all the states. But it is wasteful to cover large states all
>> with trivial initial data.
>> 
>> Limit init_fpstate by clarifying its size and coverage, which are all but
>> dynamic user states. The dynamic states are assumed to be large but having
>> initial data with zeros.
>> 
>> No functional change until the kernel supports dynamic user states.
> 
> What does that mean?
> 
> This patch either makes no functional change or it does...

It does make a functional change, but it is conditional on AMX enabling.

It includes all the initial states when the AMX state is not enabled. But it
will exclude the AMX state (8KB of zeros) with the change.

>> extern union fpregs_state init_fpstate;
>> 
>> +static inline u64 get_init_fpstate_mask(void)
>> +{
>> +	/* init_fpstate covers states in fpu->state. */
>> +	return (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
>> +}
> 
> If you're going to introduce such a helper, then use it everywhere in the code:
> 
> $ git grep "xfeatures_mask_all & ~xfeatures_mask_user_dynamic"
> arch/x86/kernel/fpu/core.c:239: dst_fpu->state_mask = xfeatures_mask_all & ~xfeatures_mask_user_dynamic;
> arch/x86/kernel/fpu/xstate.c:148:       else if (mask == (xfeatures_mask_all & ~xfeatures_mask_user_dynamic))
> arch/x86/kernel/fpu/xstate.c:932:       current->thread.fpu.state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
> 
> and if you do that, do that in a separate pre-patch which does only this
> conversion.

I think they are in a different context.

The helper indicates the mask for the ‘init_fpstate’ buffer. The rest is the
initial mask value for the per-task xstate buffer.

Since you suggested introducing get_xstate_buffer_attr(), how about replacing
what you found with something like:

get_xstate_buffer_attr(XSTATE_INIT_MASK)

>> +
>> extern void fpstate_init(struct fpu *fpu);
>> #ifdef CONFIG_MATH_EMULATION
>> extern void fpstate_init_soft(struct swregs_state *soft);
>> @@ -269,12 +281,12 @@ static inline void copy_fxregs_to_kernel(struct fpu *fpu)
>> 		     : "memory")
>> 
>> /*
>> - * This function is called only during boot time when x86 caps are not set
>> - * up and alternative can not be used yet.
>> + * Use this function to dump the initial state, only during boot time when x86
>> + * caps not set up and alternative not available yet.
>>  */
> 
> What's the point of this change? Also, "dump"?!

Yeah, right now, I don’t see that the change is really necessary here. Sorry.

>> -	memset(state, 0, fpu_kernel_xstate_min_size);
>> +	memset(state, 0, fpu ? get_xstate_size(fpu->state_mask) : get_init_fpstate_size());
>> 
>> 	if (static_cpu_has(X86_FEATURE_XSAVES))
>> -		fpstate_init_xstate(&state->xsave, xfeatures_mask_all);
>> +		fpstate_init_xstate(&state->xsave, fpu ? fpu->state_mask : get_init_fpstate_mask());
> 
> <---- newline here.

Okay. Will do that.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data
  2021-02-08 18:53     ` Bae, Chang Seok
@ 2021-02-09 12:49       ` Borislav Petkov
  2021-02-09 15:38         ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-02-09 12:49 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel

On Mon, Feb 08, 2021 at 06:53:23PM +0000, Bae, Chang Seok wrote:
> Maybe drop ‘for convenience’ from this sentence, since the buffer’s usage is
> not very relevant in this changelog.

Yes, "init_fpstate" is kinda clear what it is, from the name.

> It does make a functional change, but it is conditional on AMX enabling.
>
> It includes all the initial states when the AMX state is not enabled. But it
> will exclude the AMX state (8KB of zeros) with the change.

Those sentences "no functional change" are supposed to mean that
the patch doesn't change anything and is only an equivalent code
transformation.

Yours does. So drop it from this one and from all the other patches as
it is causing more confusion than it is trying to dispel.

> I think they are in a different context.
> 
> The helper indicates the mask for the ‘init_fpstate’ buffer. The rest is the
> initial mask value for the per-task xstate buffer.

Wait, what?

Are you trying to tell me that that helper will return different masks
depending on xfeatures_mask_user_dynamic, which changes in its lifetime?

Then drop that helper altogether - that is more confusion and the xstate
code is already confusing enough.

> Since you suggested introducing get_xstate_buffer_attr(), how about replacing
> what you found with something like:
> 
> get_xstate_buffer_attr(XSTATE_INIT_MASK)

I'd prefer no helper at all but only comments above the usage site.

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data
  2021-02-09 12:49       ` Borislav Petkov
@ 2021-02-09 15:38         ` Bae, Chang Seok
  0 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-02-09 15:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel

On Feb 9, 2021, at 04:49, Borislav Petkov <bp@suse.de> wrote:
> On Mon, Feb 08, 2021 at 06:53:23PM +0000, Bae, Chang Seok wrote:
> Yours does. So drop it from this one and from all the other patches as
> it is causing more confusion than it is trying to dispel.

Okay.

>> I think they are in a different context.
>> 
>> The helper indicates the mask for the ‘init_fpstate’ buffer. The rest is the
>> initial mask value for the per-task xstate buffer.
> 
> Wait, what?
> 
> Are you trying to tell me that that helper will return different masks
> depending on xfeatures_mask_user_dynamic, which changes in its lifetime?

At least in this series, no. But I thought it could become possible in the future.

> Then drop that helper altogether - that is more confusion and the xstate
> code is already confusing enough.

Okay.

>> Since you suggested to introduce get_xstate_buffer_attr(), how about replacing
>> what you found with something like:
>> 
>> get_xstate_buffer_attr(XSTATE_INIT_MASK)
> 
> I'd prefer no helper at all but only comments above the usage site.

Yes, I will do that.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate
  2021-02-08 12:33   ` Borislav Petkov
@ 2021-02-09 15:48     ` Bae, Chang Seok
  0 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-02-09 15:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel, kvm

On Feb 8, 2021, at 04:33, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:57:06AM -0800, Chang S. Bae wrote:
>> copy_xregs_to_kernel() used to save all user states in a kernel buffer.
>> When the dynamic user state is enabled, it becomes conditional which state
>> to be saved.
>> 
>> fpu->state_mask can indicate which state components are reserved to be
>> saved in XSAVE buffer. Use it as XSAVE's instruction mask to select states.
>> 
>> KVM used to save all xstate via copy_xregs_to_kernel(). Update KVM to set a
>> valid fpu->state_mask, which will be necessary to correctly handle dynamic
>> state buffers.
> 
> All this commit message should say is something along the lines of
> "extend copy_xregs_to_kernel() to receive a mask argument of which
> states to save, in preparation of dynamic states handling."

Yes, I will change like that. Thanks.

>> No functional change until the kernel supports dynamic user states.
> 
> Same comment as before.

This needs to be removed as per your comment [1].

Chang

[1] https://lore.kernel.org/lkml/20210209124906.GC15909@zn.tnic/

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 09/21] x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer access
  2021-02-08 12:33   ` Borislav Petkov
@ 2021-02-09 15:50     ` Bae, Chang Seok
  0 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-02-09 15:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel

On Feb 8, 2021, at 04:33, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:57:05AM -0800, Chang S. Bae wrote:
>> The struct fpu includes two (possible) xstate buffers -- fpu->state and
>> fpu->state_ptr. Instead of open code for accessing one of them, provide a
>> wrapper that covers both cases.
> 
> Right, if you do the thing I suggested - have a single ->xstate pointer
> - then that below is not needed.

Yes. I dropped this patch.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 11/21] x86/fpu/xstate: Update xstate buffer address finder to support dynamic xstate
  2020-12-23 15:57 ` [PATCH v3 11/21] x86/fpu/xstate: Update xstate buffer address finder " Chang S. Bae
@ 2021-02-19 15:00   ` Borislav Petkov
  2021-02-19 19:19     ` Bae, Chang Seok
  0 siblings, 1 reply; 64+ messages in thread
From: Borislav Petkov @ 2021-02-19 15:00 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: luto, tglx, mingo, x86, len.brown, dave.hansen, jing2.liu,
	ravi.v.shankar, linux-kernel

On Wed, Dec 23, 2020 at 07:57:07AM -0800, Chang S. Bae wrote:
> __raw_xsave_addr() returns the requested component's pointer in an xstate
> buffer, by simply looking up the offset table. The offset used to be fixed,
> but, with dynamic user states, it becomes variable.
> 
> get_xstate_size() has a routine to find an offset at runtime. Refactor to
> use it for the address finder.
> 
> No functional change until the kernel enables dynamic user states.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/x86/kernel/fpu/xstate.c | 82 +++++++++++++++++++++++-------------
>  1 file changed, 52 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 8dfbc7d1702a..6b863b2ca405 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -133,15 +133,50 @@ static bool xfeature_is_supervisor(int xfeature_nr)
>  	return ecx & 1;
>  }
>  
> +/*
> + * Available once those arrays for the offset, size, and alignment info are set up,
> + * by setup_xstate_features().
> + */

That's kinda clear, right? Apparently, we do cache FPU attributes in
xstate.c so what is that comment actually trying to tell us? Or do you
want to add some sort of an assertion to this function in case it gets
called before setup_xstate_features()?

I think you should simply add kernel-doc style comment explaining what
the inputs are and what the function gives, which would be a lot more
useful.

> +static unsigned int __get_xstate_comp_offset(u64 mask, int feature_nr)
> +{
> +	u64 xmask = BIT_ULL(feature_nr + 1) - 1;
> +	unsigned int next_offset, offset = 0;
> +	int i;
> +
> +	if ((mask & xmask) == (xfeatures_mask_all & xmask))
> +		return xstate_comp_offsets[feature_nr];
> +
> +	/*
> +	 * Calculate the size by summing up each state together, since no known
> +	 * offset found with the xstate buffer format out of the given mask.
> +	 */
> +
> +	next_offset = FXSAVE_SIZE + XSAVE_HDR_SIZE;
> +
> +	for (i = FIRST_EXTENDED_XFEATURE; i <= feature_nr; i++) {
> +		if (!(mask & BIT_ULL(i)))
> +			continue;
> +
> +		offset = xstate_aligns[i] ? ALIGN(next_offset, 64) : next_offset;
> +		next_offset += xstate_sizes[i];
> +	}
> +
> +	return offset;
> +}
> +
> +static unsigned int get_xstate_comp_offset(struct fpu *fpu, int feature_nr)
> +{
> +	return __get_xstate_comp_offset(fpu->state_mask, feature_nr);
> +}

Just get rid of the __ variant and have a single function with the
following signature:

	static unsigned int get_xstate_comp_offset(u64 mask, int feature_nr)
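
i.e., simply fold the __ variant into its caller (untested; this just reuses
the loop from the patch with the merged signature):

static unsigned int get_xstate_comp_offset(u64 mask, int feature_nr)
{
	u64 xmask = BIT_ULL(feature_nr + 1) - 1;
	unsigned int next_offset, offset = 0;
	int i;

	/* Fast path: the precomputed offsets apply to this mask. */
	if ((mask & xmask) == (xfeatures_mask_all & xmask))
		return xstate_comp_offsets[feature_nr];

	next_offset = FXSAVE_SIZE + XSAVE_HDR_SIZE;

	for (i = FIRST_EXTENDED_XFEATURE; i <= feature_nr; i++) {
		if (!(mask & BIT_ULL(i)))
			continue;

		offset = xstate_aligns[i] ? ALIGN(next_offset, 64) : next_offset;
		next_offset += xstate_sizes[i];
	}

	return offset;
}

with callers passing fpu->state_mask directly.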


Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 11/21] x86/fpu/xstate: Update xstate buffer address finder to support dynamic xstate
  2021-02-19 15:00   ` Borislav Petkov
@ 2021-02-19 19:19     ` Bae, Chang Seok
  0 siblings, 0 replies; 64+ messages in thread
From: Bae, Chang Seok @ 2021-02-19 19:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, tglx, mingo, x86, Brown, Len, Hansen, Dave, Liu,
	Jing2, Shankar, Ravi V, linux-kernel

On Feb 19, 2021, at 07:00, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:57:07AM -0800, Chang S. Bae wrote:
>> 
>> 
>> +/*
>> + * Available once those arrays for the offset, size, and alignment info are set up,
>> + * by setup_xstate_features().
>> + */
> 
> That's kinda clear, right? Apparently, we do cache FPU attributes in
> xstate.c so what is that comment actually trying to tell us? Or do you
> want to add some sort of an assertion to this function in case it gets
> called before setup_xstate_features()?

Yes, it looks apparent without saying that. I don’t think an assertion is needed.

> I think you should simply add kernel-doc style comment explaining what
> the inputs are and what the function gives, which would be a lot more
> useful.

Maybe something like this:

/**
 * get_xstate_comp_offset() - Find the feature's offset in the compacted format
 * @mask:		This bitmap tells which components are reserved in the format.
 * @feature_nr:	Feature number
 *
 * Returns:		The offset value
 */

>> +static unsigned int get_xstate_comp_offset(struct fpu *fpu, int feature_nr)
>> +{
>> +	return __get_xstate_comp_offset(fpu->state_mask, feature_nr);
>> +}
> 
> Just get rid of the __ variant and have a single function with the
> following signature:
> 
> 	static unsigned int get_xstate_comp_offset(u64 mask, int feature_nr)

Yeah, I should have done it like this.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2021-02-19 19:20 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2020-12-23 15:56 ` [PATCH v3 01/21] x86/fpu/xstate: Modify initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-01-15 12:40   ` Borislav Petkov
2020-12-23 15:56 ` [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-01-15 12:50   ` Borislav Petkov
2021-01-19 18:50     ` Bae, Chang Seok
2021-01-20 20:53       ` Borislav Petkov
2021-01-20 21:12         ` Bae, Chang Seok
2020-12-23 15:56 ` [PATCH v3 03/21] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-01-15 13:06   ` Borislav Petkov
2020-12-23 15:57 ` [PATCH v3 04/21] x86/fpu/xstate: Modify context switch helpers " Chang S. Bae
2021-01-15 13:18   ` Borislav Petkov
2021-01-19 18:49     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-01-15 13:39   ` Borislav Petkov
2021-01-15 19:47     ` Bae, Chang Seok
2021-01-19 15:57       ` Borislav Petkov
2021-01-19 18:57         ` Bae, Chang Seok
2021-01-22 10:56           ` Borislav Petkov
2021-01-27  1:23             ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes Chang S. Bae
2021-01-22 11:44   ` Borislav Petkov
2021-01-27  1:23     ` Bae, Chang Seok
2021-01-27  9:38       ` Borislav Petkov
2021-02-03  2:54         ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers Chang S. Bae
2021-01-26 20:17   ` Borislav Petkov
2021-01-27  1:23     ` Bae, Chang Seok
2021-01-27 10:41       ` Borislav Petkov
2021-02-03  4:10         ` Bae, Chang Seok
2021-02-04 13:10           ` Borislav Petkov
2021-02-03  4:10     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data Chang S. Bae
2021-02-08 12:33   ` Borislav Petkov
2021-02-08 18:53     ` Bae, Chang Seok
2021-02-09 12:49       ` Borislav Petkov
2021-02-09 15:38         ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 09/21] x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer access Chang S. Bae
2021-02-08 12:33   ` Borislav Petkov
2021-02-09 15:50     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate Chang S. Bae
2021-01-07  8:41   ` Liu, Jing2
2021-01-07 18:40     ` Bae, Chang Seok
2021-01-12  2:52       ` Liu, Jing2
2021-01-15  4:59         ` Bae, Chang Seok
2021-01-15  5:45           ` Liu, Jing2
2021-02-08 12:33   ` Borislav Petkov
2021-02-09 15:48     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 11/21] x86/fpu/xstate: Update xstate buffer address finder " Chang S. Bae
2021-02-19 15:00   ` Borislav Petkov
2021-02-19 19:19     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 12/21] x86/fpu/xstate: Update xstate context copy function to support dynamic buffer Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 13/21] x86/fpu/xstate: Expand dynamic context switch buffer on first use Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 14/21] x86/fpu/xstate: Support ptracer-induced xstate buffer expansion Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 15/21] x86/fpu/xstate: Extend the table to map xstate components with features Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 16/21] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 17/21] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 18/21] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 19/21] selftest/x86/amx: Include test cases for the AMX state management Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 20/21] x86/fpu/xstate: Support dynamic user state in the signal handling path Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 21/21] x86/fpu/xstate: Introduce boot-parameters to control some state component support Chang S. Bae
2020-12-23 18:37   ` Randy Dunlap
2021-01-14 21:31     ` Bae, Chang Seok
2021-01-14 21:31 ` [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Bae, Chang Seok
