All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4)
@ 2021-10-21 22:55 Chang S. Bae
  2021-10-21 22:55 ` [PATCH 01/23] signal: Add an optional check for altstack size Chang S. Bae
                   ` (22 more replies)
  0 siblings, 23 replies; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

This is the last part of the effort to support AMX. This series follows the
KVM part:
    https://lore.kernel.org/lkml/20211017151447.829495362@linutronix.de/

With AMX the FPU register state buffer which is part of
task_struct::thread::fpu is not going to be extended unconditionally for
all tasks on an AMX enabled system as that would waste minimum 8K per task.

AMX provides a mechanism to trap on first use. That trap will be utilized
to allocate a larger register state buffer when the task (process) has
permissions to use it. The default buffer task_struct will only carry
states up to AVX512.

The cost of XFD switching only matters for an AMX-enabled system. With the
cleanup of the KVM FPU handling, host-side XFD/AMX is completely
independent of guest-side XFD/AMX.

The per-task feature and size information helps to support dynamic features
organically compared to the old versions.

Each task has a unique sigframe length with dynamic features. sigaltstack()
has a new size checker to support a per-task sigframe size.

This version also fixes the syscall implementation and the XFD state
switching on hot paths.

Here is a summary of patches:

  - At first, add arch-specific sigaltstack size check

  - Establish a set of new syscalls to control dynamic XSTATE components

  - Update infrastructure to prepare dynamic features

  - Support XFD state and switching, and add buffer reallocation helpers

  - Finally, enable AMX with XFD #NM handling

This series is based on:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu-3-kvm

and a revision of the version in the tglx tree -- some bug fixes, polish,
and addressing Dave Hansen's feedback (noted on each patch):

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu

This comes with a large selftest suite which is going to be posted
separately. A preview along with the full series is available here:

    git://github.com/intel/amx-linux.git amx

Thanks,
Chang

Chang S. Bae (15):
  x86/fpu/xstate: Provide xstate_calculate_size()
  x86/arch_prctl: Add controls for dynamic XSTATE components
  x86/fpu/signal: Prepare for variable sigframe length
  x86/fpu: Reset permission and fpstate on exec()
  x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit
  x86/msr-index: Add MSRs for XFD
  x86/fpu: Add XFD state to fpstate
  x86/fpu: Update XFD state where required
  x86/fpu/xstate: Add XFD #NM handler
  x86/fpu/xstate: Add fpstate_realloc()/free()
  x86/fpu/xstate: Prepare XSAVE feature table for gaps in state
    component numbers
  x86/fpu/amx: Define AMX state components and have it used for
    boot-time checks
  x86/fpu: Calculate the default sizes independently
  x86/fpu: Add XFD handling for dynamic states
  x86/fpu/amx: Enable the AMX feature in 64-bit mode

Thomas Gleixner (8):
  signal: Add an optional check for altstack size
  x86/signal: Implement sigaltstack size validation
  x86/fpu: Add members to struct fpu to cache permission information
  x86/fpu: Add fpu_state_config::legacy_features
  x86/fpu: Add basic helpers for dynamically enabled features
  x86/signal: Use fpu::__state_user_size for sigalt stack validation
  x86/fpu: Prepare fpu_clone() for dynamically enabled features
  x86/fpu: Add sanity checks for XFD

 .../admin-guide/kernel-parameters.txt         |   9 +
 arch/Kconfig                                  |   3 +
 arch/x86/Kconfig                              |  17 +
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/fpu/api.h                |  11 +
 arch/x86/include/asm/fpu/sched.h              |   2 +-
 arch/x86/include/asm/fpu/types.h              |  88 +++
 arch/x86/include/asm/fpu/xstate.h             |  21 +-
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/include/asm/proto.h                  |   2 +-
 arch/x86/include/uapi/asm/prctl.h             |   4 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
 arch/x86/kernel/fpu/context.h                 |   2 +
 arch/x86/kernel/fpu/core.c                    |  93 ++-
 arch/x86/kernel/fpu/init.c                    |   9 +-
 arch/x86/kernel/fpu/internal.h                |   3 -
 arch/x86/kernel/fpu/signal.c                  |  70 +--
 arch/x86/kernel/fpu/xstate.c                  | 566 ++++++++++++++++--
 arch/x86/kernel/fpu/xstate.h                  |  61 +-
 arch/x86/kernel/process.c                     |  21 +-
 arch/x86/kernel/signal.c                      |  64 +-
 arch/x86/kernel/traps.c                       |  38 ++
 include/linux/signal.h                        |   6 +
 kernel/signal.c                               |  35 +-
 24 files changed, 1008 insertions(+), 123 deletions(-)


base-commit: 672f086122b67b299aa89c6b5f647c20ef84157f
--
2.17.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 01/23] signal: Add an optional check for altstack size
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-22  0:06   ` Bae, Chang Seok
                     ` (2 more replies)
  2021-10-21 22:55 ` [PATCH 02/23] x86/signal: Implement sigaltstack size validation Chang S. Bae
                   ` (21 subsequent siblings)
  22 siblings, 3 replies; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae,
	Oleg Nesterov, Eric W . Biederman

From: Thomas Gleixner <tglx@linutronix.de>

The upcoming support for dynamically enabled FPU features on x86 requires
an architecture specific sanity check and serialization of the store to
task::sas_ss_size. The check is required to ensure that:

   - Enabling of a dynamic feature, which changes the sigframe size fits
     into an enabled sigaltstack

   - Installing a too small sigaltstack after a dynamic feature has been
     added is not possible.

It needs serialization to prevent race conditions of all sorts in the
feature enable code as that has to walk the thread list of the process.

Add the infrastructure in form of a config option and provide empty stubs
for architectures which do not need that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Oleg Nesterov <ole@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/Kconfig           |  3 +++
 include/linux/signal.h |  6 ++++++
 kernel/signal.c        | 35 +++++++++++++++++++++++++++++------
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c7102643..af5cf3009b4f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1288,6 +1288,9 @@ config ARCH_HAS_ELFCORE_COMPAT
 config ARCH_HAS_PARANOID_L1D_FLUSH
 	bool
 
+config DYNAMIC_SIGFRAME
+	bool
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 3f96a6374e4f..7d34105e20c6 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -464,6 +464,12 @@ int __save_altstack(stack_t __user *, unsigned long);
 	unsafe_put_user(t->sas_ss_size, &__uss->ss_size, label); \
 } while (0);
 
+#ifdef CONFIG_DYNAMIC_SIGFRAME
+bool sigaltstack_size_valid(size_t ss_size);
+#else
+static inline bool sigaltstack_size_valid(size_t size) { return true; }
+#endif /* !CONFIG_DYNAMIC_SIGFRAME */
+
 #ifdef CONFIG_PROC_FS
 struct seq_file;
 extern void render_sigset_t(struct seq_file *, const char *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 952741f6d0f9..9278f5291ed6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4151,11 +4151,29 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 	return 0;
 }
 
+#ifdef CONFIG_DYNAMIC_SIGFRAME
+static inline void sigaltstack_lock(void)
+	__acquires(&current->sighand->siglock)
+{
+	spin_lock_irq(&current->sighand->siglock);
+}
+
+static inline void sigaltstack_unlock(void)
+	__releases(&current->sighand->siglock)
+{
+	spin_unlock_irq(&current->sighand->siglock);
+}
+#else
+static inline void sigaltstack_lock(void) { }
+static inline void sigaltstack_unlock(void) { }
+#endif
+
 static int
 do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
 		size_t min_ss_size)
 {
 	struct task_struct *t = current;
+	int ret = 0;
 
 	if (oss) {
 		memset(oss, 0, sizeof(stack_t));
@@ -4179,19 +4197,24 @@ do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
 				ss_mode != 0))
 			return -EINVAL;
 
+		sigaltstack_lock();
 		if (ss_mode == SS_DISABLE) {
 			ss_size = 0;
 			ss_sp = NULL;
 		} else {
 			if (unlikely(ss_size < min_ss_size))
-				return -ENOMEM;
+				ret = -ENOMEM;
+			if (!sigaltstack_size_valid(ss_size))
+				ret = -ENOMEM;
 		}
-
-		t->sas_ss_sp = (unsigned long) ss_sp;
-		t->sas_ss_size = ss_size;
-		t->sas_ss_flags = ss_flags;
+		if (!ret) {
+			t->sas_ss_sp = (unsigned long) ss_sp;
+			t->sas_ss_size = ss_size;
+			t->sas_ss_flags = ss_flags;
+		}
+		sigaltstack_unlock();
 	}
-	return 0;
+	return ret;
 }
 
 SYSCALL_DEFINE2(sigaltstack,const stack_t __user *,uss, stack_t __user *,uoss)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 02/23] x86/signal: Implement sigaltstack size validation
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
  2021-10-21 22:55 ` [PATCH 01/23] signal: Add an optional check for altstack size Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2021-10-21 22:55 ` [PATCH 03/23] x86/fpu/xstate: Provide xstate_calculate_size() Chang S. Bae
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

From: Thomas Gleixner <tglx@linutronix.de>

For historical reasons MINSIGSTKSZ is a constant which became already too
small with AVX512 support.

Add a mechanism to enforce strict checking of the sigaltstack size against
the real size of the FPU frame.

The strict check can be enabled via a config option and can also be
controlled via the kernel command line option 'strict_sas_size' independent
of the config switch.

Enabling it might break existing applications which allocate a too small
sigaltstack but 'work' because they never get a signal delivered. Though it
can be handy to filter out binaries which are not yet aware of
AT_MINSIGSTKSZ.

Also the upcoming support for dynamically enabled FPU features requires a
strict sanity check to ensure that:

   - Enabling of a dynamic feature, which changes the sigframe size fits
     into an enabled sigaltstack

   - Installing a too small sigaltstack after a dynamic feature has been
     added is not possible.

Implement the base check which is controlled by config and command line
options.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  9 +++++
 arch/x86/Kconfig                              | 17 +++++++++
 arch/x86/kernel/signal.c                      | 35 +++++++++++++++++++
 3 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 91ba391f9b32..efb9512d0b14 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5497,6 +5497,15 @@
 	stifb=		[HW]
 			Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
 
+        strict_sas_size=
+			[X86]
+			Format: <bool>
+			Enable or disable strict sigaltstack size checks
+			against the required signal frame size which
+			depends on the supported FPU features. This can
+			be used to filter out binaries which have
+			not yet been made aware of AT_MINSIGSTKSZ.
+
 	sunrpc.min_resvport=
 	sunrpc.max_resvport=
 			[NFS,SUNRPC]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8055da49f1c0..9b77b4af016b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -125,6 +125,7 @@ config X86
 	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
 	select CLOCKSOURCE_WATCHDOG
 	select DCACHE_WORD_ACCESS
+	select DYNAMIC_SIGFRAME
 	select EDAC_ATOMIC_SCRUB
 	select EDAC_SUPPORT
 	select GENERIC_CLOCKEVENTS_BROADCAST	if X86_64 || (X86_32 && X86_LOCAL_APIC)
@@ -2389,6 +2390,22 @@ config MODIFY_LDT_SYSCALL
 
 	  Saying 'N' here may make sense for embedded or server kernels.
 
+config STRICT_SIGALTSTACK_SIZE
+	bool "Enforce strict size checking for sigaltstack"
+	depends on DYNAMIC_SIGFRAME
+	help
+	  For historical reasons MINSIGSTKSZ is a constant which became
+	  already too small with AVX512 support. Add a mechanism to
+	  enforce strict checking of the sigaltstack size against the
+	  real size of the FPU frame. This option enables the check
+	  by default. It can also be controlled via the kernel command
+	  line option 'strict_sas_size' independent of this config
+	  switch. Enabling it might break existing applications which
+	  allocate a too small sigaltstack but 'work' because they
+	  never get a signal delivered.
+
+	  Say 'N' unless you want to really enforce this check.
+
 source "kernel/livepatch/Kconfig"
 
 endmenu
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 58bd07071d14..0111a6ae6e60 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -15,6 +15,7 @@
 #include <linux/mm.h>
 #include <linux/smp.h>
 #include <linux/kernel.h>
+#include <linux/kstrtox.h>
 #include <linux/errno.h>
 #include <linux/wait.h>
 #include <linux/tracehook.h>
@@ -40,6 +41,7 @@
 #include <linux/compat.h>
 #include <asm/proto.h>
 #include <asm/ia32_unistd.h>
+#include <asm/fpu/xstate.h>
 #endif /* CONFIG_X86_64 */
 
 #include <asm/syscall.h>
@@ -907,6 +909,39 @@ void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
 	force_sig(SIGSEGV);
 }
 
+#ifdef CONFIG_DYNAMIC_SIGFRAME
+#ifdef CONFIG_STRICT_SIGALTSTACK_SIZE
+static bool strict_sigaltstack_size __ro_after_init = true;
+#else
+static bool strict_sigaltstack_size __ro_after_init = false;
+#endif
+
+static int __init strict_sas_size(char *arg)
+{
+	return kstrtobool(arg, &strict_sigaltstack_size);
+}
+__setup("strict_sas_size", strict_sas_size);
+
+/*
+ * MINSIGSTKSZ is 2048 and can't be changed despite the fact that AVX512
+ * exceeds that size already. As such programs might never use the
+ * sigaltstack they just continued to work. While always checking against
+ * the real size would be correct, this might be considered a regression.
+ *
+ * Therefore avoid the sanity check, unless enforced by kernel config or
+ * command line option.
+ */
+bool sigaltstack_size_valid(size_t ss_size)
+{
+	lockdep_assert_held(&current->sighand->siglock);
+
+	if (strict_sigaltstack_size)
+		return ss_size > get_sigframe_size();
+
+	return true;
+}
+#endif /* CONFIG_DYNAMIC_SIGFRAME */
+
 #ifdef CONFIG_X86_X32_ABI
 COMPAT_SYSCALL_DEFINE0(x32_rt_sigreturn)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 03/23] x86/fpu/xstate: Provide xstate_calculate_size()
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
  2021-10-21 22:55 ` [PATCH 01/23] signal: Add an optional check for altstack size Chang S. Bae
  2021-10-21 22:55 ` [PATCH 02/23] x86/signal: Implement sigaltstack size validation Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 04/23] x86/fpu: Add members to struct fpu to cache permission information Chang S. Bae
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

Split out the size calculation from the paranoia check, so it can be used
for recalculating buffer sizes when dynamically enabled features becomes
supported.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
[ tglx: Adopted to changed base code ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/kernel/fpu/xstate.c | 46 ++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index cbba3812a160..310c4201e056 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -549,6 +549,33 @@ static bool __init check_xstate_against_struct(int nr)
 	return true;
 }
 
+static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
+{
+	unsigned int size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+	int i;
+
+	for_each_extended_xfeature(i, xfeatures) {
+		/* Align from the end of the previous feature */
+		if (xfeature_is_aligned(i))
+			size = ALIGN(size, 64);
+		/*
+		 * In compacted format the enabled features are packed,
+		 * i.e. disabled features do not occupy space.
+		 *
+		 * In non-compacted format the offsets are fixed and
+		 * disabled states still occupy space in the memory buffer.
+		 */
+		if (!compacted)
+			size = xfeature_uncompacted_offset(i);
+		/*
+		 * Add the feature size even for non-compacted format
+		 * to make the end result correct
+		 */
+		size += xfeature_size(i);
+	}
+	return size;
+}
+
 /*
  * This essentially double-checks what the cpu told us about
  * how large the XSAVE buffer needs to be.  We are recalculating
@@ -575,25 +602,8 @@ static bool __init paranoid_xstate_size_valid(unsigned int kernel_size)
 			XSTATE_WARN_ON(1);
 			return false;
 		}
-
-		/* Align from the end of the previous feature */
-		if (xfeature_is_aligned(i))
-			size = ALIGN(size, 64);
-		/*
-		 * In compacted format the enabled features are packed,
-		 * i.e. disabled features do not occupy space.
-		 *
-		 * In non-compacted format the offsets are fixed and
-		 * disabled states still occupy space in the memory buffer.
-		 */
-		if (!compacted)
-			size = xfeature_uncompacted_offset(i);
-		/*
-		 * Add the feature size even for non-compacted format
-		 * to make the end result correct
-		 */
-		size += xfeature_size(i);
 	}
+	size = xstate_calculate_size(fpu_kernel_cfg.max_features, compacted);
 	XSTATE_WARN_ON(size != kernel_size);
 	return size == kernel_size;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 04/23] x86/fpu: Add members to struct fpu to cache permission information
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (2 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 03/23] x86/fpu/xstate: Provide xstate_calculate_size() Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2021-10-21 22:55 ` [PATCH 05/23] x86/fpu: Add fpu_state_config::legacy_features Chang S. Bae
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

From: Thomas Gleixner <tglx@linutronix.de>

Dynamically enabled features can be requested by any thread of a running
process at any time. The request does neither enable the feature nor
allocate larger buffers. It just stores the permission to use the feature
by adding the features to the permission bitmap and by calculating the
required sizes for kernel and user space.

The reallocation of the kernel buffer happens when the feature is used for
the first time which is caught by an exception. The permission bitmap is
then checked and if the feature is permitted, then it becomes fully
enabled. If not, the task dies similar to a task which uses an undefined
instruction.

The size information is precomputed to allow proper sigaltstack size checks
once the feature is permitted, but not yet in use because otherwise this
would open race windows where too small stacks could be installed causing
a later fail on signal delivery.

Initialize them to the default feature set and sizes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/include/asm/fpu/types.h | 46 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/fpu/core.c       |  5 ++++
 2 files changed, 51 insertions(+)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 81a01de1fec2..d8b59e45dc25 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -352,6 +352,45 @@ struct fpstate {
 	/* @regs is dynamically sized! Don't add anything after @regs! */
 } __attribute__ ((aligned (64)));
 
+struct fpu_state_perm {
+	/*
+	 * @__state_perm:
+	 *
+	 * This bitmap indicates the permission for state components, which
+	 * are available to a thread group. The permission prctl() sets the
+	 * enabled state bits in thread_group_leader()->thread.fpu.
+	 *
+	 * All run time operations use the per thread information in the
+	 * currently active fpu.fpstate which contains the xfeature masks
+	 * and sizes for kernel and user space.
+	 *
+	 * This master permission field is only to be used when
+	 * task.fpu.fpstate based checks fail to validate whether the task
+	 * is allowed to expand it's xfeatures set which requires to
+	 * allocate a larger sized fpstate buffer.
+	 *
+	 * Do not access this field directly.  Use the provided helper
+	 * function. Unlocked access is possible for quick checks.
+	 */
+	u64				__state_perm;
+
+	/*
+	 * @__state_size:
+	 *
+	 * The size required for @__state_perm. Only valid to access
+	 * with sighand locked.
+	 */
+	unsigned int			__state_size;
+
+	/*
+	 * @__user_state_size:
+	 *
+	 * The size required for @__state_perm user part. Only valid to
+	 * access with sighand locked.
+	 */
+	unsigned int			__user_state_size;
+};
+
 /*
  * Highest level per task FPU state data structure that
  * contains the FPU register state plus various FPU
@@ -395,6 +434,13 @@ struct fpu {
 	 */
 	struct fpstate			*__task_fpstate;
 
+	/*
+	 * @perm:
+	 *
+	 * Permission related information
+	 */
+	struct fpu_state_perm		perm;
+
 	/*
 	 * @__fpstate:
 	 *
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 4b09f0f70082..eb911c843386 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -415,6 +415,11 @@ void fpstate_reset(struct fpu *fpu)
 	/* Set the fpstate pointer to the default fpstate */
 	fpu->fpstate = &fpu->__fpstate;
 	__fpstate_reset(fpu->fpstate);
+
+	/* Initialize the permission related info in fpu */
+	fpu->perm.__state_perm		= fpu_kernel_cfg.default_features;
+	fpu->perm.__state_size		= fpu_kernel_cfg.default_size;
+	fpu->perm.__user_state_size	= fpu_user_cfg.default_size;
 }
 
 /* Clone current's FPU state on fork */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 05/23] x86/fpu: Add fpu_state_config::legacy_features
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (3 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 04/23] x86/fpu: Add members to struct fpu to cache permission information Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2021-10-21 22:55 ` [PATCH 06/23] x86/arch_prctl: Add controls for dynamic XSTATE components Chang S. Bae
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

From: Thomas Gleixner <tglx@linutronix.de>

The upcoming prctl() which is required to request the permission for a
dynamically enabled feature will also provide an option to retrieve the
supported features. If the CPU does not support xsave the supported
features would be 0 even when the CPU supports FP and SSE.

Provide separate storage for the legacy feature set to avoid that and fill
in the bits in the legacy init function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/include/asm/fpu/types.h | 7 +++++++
 arch/x86/kernel/fpu/init.c       | 9 ++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index d8b59e45dc25..d27ad4842fa1 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -500,6 +500,13 @@ struct fpu_state_config {
 	 * be requested by user space before usage.
 	 */
 	u64 default_features;
+	/*
+	 * @legacy_features:
+	 *
+	 * Features which can be reported back to user space
+	 * even without XSAVE support, i.e. legacy features FP + SSE
+	 */
+	u64 legacy_features;
 };
 
 /* FPU state configuration information */
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 7074154131e6..621f4b6cac4a 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -193,12 +193,15 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	 * Note that the size configuration might be overwritten later
 	 * during fpu__init_system_xstate().
 	 */
-	if (!cpu_feature_enabled(X86_FEATURE_FPU))
+	if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
 		size = sizeof(struct swregs_state);
-	else if (cpu_feature_enabled(X86_FEATURE_FXSR))
+	} else if (cpu_feature_enabled(X86_FEATURE_FXSR)) {
 		size = sizeof(struct fxregs_state);
-	else
+		fpu_user_cfg.legacy_features = XFEATURE_MASK_FPSSE;
+	} else {
 		size = sizeof(struct fregs_state);
+		fpu_user_cfg.legacy_features = XFEATURE_MASK_FP;
+	}
 
 	fpu_kernel_cfg.max_size = size;
 	fpu_kernel_cfg.default_size = size;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 06/23] x86/arch_prctl: Add controls for dynamic XSTATE components
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (4 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 05/23] x86/fpu: Add fpu_state_config::legacy_features Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-24 21:17   ` Borislav Petkov
  2021-10-26 16:16   ` [tip: x86/fpu] x86/arch_prctl: Add controls for dynamic XSTATE components tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 07/23] x86/fpu: Add basic helpers for dynamically enabled features Chang S. Bae
                   ` (16 subsequent siblings)
  22 siblings, 2 replies; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

Dynamically enabled XSTATE features are by default disabled for all
processes. A process has to request permission to use such a feature.

To support this implement a architecture specific prctl() with the options:

   - ARCH_GET_XCOMP_SUPP

     Copies the supported feature bitmap into the user space provided
     u64 storage. The pointer is handed in via arg2

   - ARCH_GET_XCOMP_PERM

     Copies the process wide permitted feature bitmap into the user space
     provided u64 storage. The pointer is handed in via arg2

   - ARCH_REQ_XCOMP_PERM

     Request permission for a feature set. A feature set can be mapped to a
     facility, e.g. AMX, and can require one or more XSTATE components to
     be enabled.

     The feature argument is the number of the highest XSTATE component
     which is required for a facility to work.

     The request argument is not a user supplied bitmap because that makes
     filtering harder (think seccomp) and even impossible because to
     support 32bit tasks the argument would have to be a pointer.

The permission mechanism works this way:

   Task asks for permission for a facility and kernel checks whether that's
   supported. If supported it does:

     1) Check whether permission has already been granted

     2) Compute the size of the required kernel and user space buffer
        (sigframe) size.

     3) Validate that no task has a sigaltstack installed
        which is smaller than the resulting sigframe size

     4) Add the requested feature bit(s) to the permission bitmap of
        current->group_leader->fpu and store the sizes in the group
        leaders fpu struct as well.

If that is successful then the feature is still not enabled for any of the
tasks. The first usage of a related instruction will result in a #NM
trap. The trap handler validates the permission bit of the tasks group
leader and if permitted it installs a larger kernel buffer and transfers
the permission and size info to the new fpstate container which makes all
the FPU functions which require per task information aware of the extended
feature set.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
[ tglx: Adopted to new base code, added missing serialization,
        massaged namings, comments and changelog ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Remove the dynamic feature example in the changelog for clarity. The
  bit-17 is not dynamic feature. (Dave Hansen)
---
 arch/x86/include/asm/fpu/api.h    |   4 +
 arch/x86/include/asm/proto.h      |   2 +-
 arch/x86/include/uapi/asm/prctl.h |   4 +
 arch/x86/kernel/fpu/xstate.c      | 156 ++++++++++++++++++++++++++++++
 arch/x86/kernel/fpu/xstate.h      |   6 ++
 arch/x86/kernel/process.c         |   9 +-
 6 files changed, 178 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 286a66ff0bd1..89762c28ad5a 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -151,4 +151,8 @@ static inline bool fpstate_is_confidential(struct fpu_guest *gfpu)
 	return gfpu->fpstate->is_confidential;
 }
 
+/* prctl */
+struct task_struct;
+extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2);
+
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 8c5d1910a848..feed36d44d04 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -40,6 +40,6 @@ void x86_report_nx(void);
 extern int reboot_force;
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long cpuid_enabled);
+			  unsigned long arg2);
 
 #endif /* _ASM_X86_PROTO_H */
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 5a6aac9fa41f..754a07856817 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -10,6 +10,10 @@
 #define ARCH_GET_CPUID		0x1011
 #define ARCH_SET_CPUID		0x1012
 
+#define ARCH_GET_XCOMP_SUPP	0x1021
+#define ARCH_GET_XCOMP_PERM	0x1022
+#define ARCH_REQ_XCOMP_PERM	0x1023
+
 #define ARCH_MAP_VDSO_X32	0x2001
 #define ARCH_MAP_VDSO_32	0x2002
 #define ARCH_MAP_VDSO_64	0x2003
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 310c4201e056..c79ca3d430c4 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -8,6 +8,7 @@
 #include <linux/compat.h>
 #include <linux/cpu.h>
 #include <linux/mman.h>
+#include <linux/nospec.h>
 #include <linux/pkeys.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
@@ -18,6 +19,8 @@
 #include <asm/fpu/xcr.h>
 
 #include <asm/tlbflush.h>
+#include <asm/prctl.h>
+#include <asm/elf.h>
 
 #include "internal.h"
 #include "legacy.h"
@@ -1298,6 +1301,159 @@ void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature)
 EXPORT_SYMBOL_GPL(fpstate_clear_xstate_component);
 #endif
 
+#ifdef CONFIG_X86_64
+static int validate_sigaltstack(unsigned int usize)
+{
+	struct task_struct *thread, *leader = current->group_leader;
+	unsigned long framesize = get_sigframe_size();
+
+	lockdep_assert_held(&current->sighand->siglock);
+
+	/* get_sigframe_size() is based on fpu_user_cfg.max_size */
+	framesize -= fpu_user_cfg.max_size;
+	framesize += usize;
+	for_each_thread(leader, thread) {
+		if (thread->sas_ss_size && thread->sas_ss_size < framesize)
+			return -ENOSPC;
+	}
+	return 0;
+}
+
+static int __xstate_request_perm(u64 permitted, u64 requested)
+{
+	/*
+	 * This deliberately does not exclude !XSAVES as we still might
+	 * decide to optionally context switch XCR0 or talk the silicon
+	 * vendors into extending XFD for the pre AMX states.
+	 */
+	bool compacted = cpu_feature_enabled(X86_FEATURE_XSAVES);
+	struct fpu *fpu = &current->group_leader->thread.fpu;
+	unsigned int ksize, usize;
+	u64 mask;
+	int ret;
+
+	/* Check whether fully enabled */
+	if ((permitted & requested) == requested)
+		return 0;
+
+	/* Calculate the resulting kernel state size */
+	mask = permitted | requested;
+	ksize = xstate_calculate_size(mask, compacted);
+
+	/* Calculate the resulting user state size */
+	mask &= XFEATURE_MASK_USER_SUPPORTED;
+	usize = xstate_calculate_size(mask, false);
+
+	ret = validate_sigaltstack(usize);
+	if (ret)
+		return ret;
+
+	/* Pairs with the READ_ONCE() in xstate_get_group_perm() */
+	WRITE_ONCE(fpu->perm.__state_perm, requested);
+	/* Protected by sighand lock */
+	fpu->perm.__state_size = ksize;
+	fpu->perm.__user_state_size = usize;
+	return ret;
+}
+
+/*
+ * Permissions array to map facilities with more than one component
+ */
+static const u64 xstate_prctl_req[XFEATURE_MAX] = {
+	/* [XFEATURE_XTILE_DATA] = XFEATURE_MASK_XTILE, */
+};
+
+static int xstate_request_perm(unsigned long idx)
+{
+	u64 permitted, requested;
+	int ret;
+
+	if (idx >= XFEATURE_MAX)
+		return -EINVAL;
+
+	/*
+	 * Look up the facility mask which can require more than
+	 * one xstate component.
+	 */
+	idx = array_index_nospec(idx, ARRAY_SIZE(xstate_prctl_req));
+	requested = xstate_prctl_req[idx];
+	if (!requested)
+		return -ENOTSUPP;
+
+	if ((fpu_user_cfg.max_features & requested) != requested)
+		return -ENOTSUPP;
+
+	/* Lockless quick check */
+	permitted = xstate_get_host_group_perm();
+	if ((permitted & requested) == requested)
+		return 0;
+
+	/* Protect against concurrent modifications */
+	spin_lock_irq(&current->sighand->siglock);
+	permitted = xstate_get_host_group_perm();
+	ret = __xstate_request_perm(permitted, requested);
+	spin_unlock_irq(&current->sighand->siglock);
+	return ret;
+}
+#else /* CONFIG_X86_64 */
+static inline int xstate_request_perm(u64 permitted, u64 requested)
+{
+	return -EPERM;
+}
+#endif  /* !CONFIG_X86_64 */
+
+/**
+ * fpu_xstate_prctl - xstate permission operations
+ * @tsk:	Redundant pointer to current
+ * @option:	A subfunction of arch_prctl()
+ * @arg2:	option argument
+ * Return:	0 if successful; otherwise, an error code
+ *
+ * Option arguments:
+ *
+ * ARCH_GET_XCOMP_SUPP: Pointer to user space u64 to store the info
+ * ARCH_GET_XCOMP_PERM: Pointer to user space u64 to store the info
+ * ARCH_REQ_XCOMP_PERM: Facility number requested
+ *
+ * For facilities which require more than one XSTATE component, the request
+ * must be the highest state component number related to that facility,
+ * e.g. for AMX which requires XFEATURE_XTILE_CFG(17) and
+ * XFEATURE_XTILE_DATA(18) this would be XFEATURE_XTILE_DATA(18).
+ */
+long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2)
+{
+	u64 __user *uptr = (u64 __user *)arg2;
+	u64 permitted, supported;
+	unsigned long idx = arg2;
+
+	if (tsk != current)
+		return -EPERM;
+
+	switch (option) {
+	case ARCH_GET_XCOMP_SUPP:
+		supported = fpu_user_cfg.max_features |	fpu_user_cfg.legacy_features;
+		return put_user(supported, uptr);
+
+	case ARCH_GET_XCOMP_PERM:
+		/*
+		 * Lockless snapshot as it can also change right after the
+		 * dropping the lock.
+		 */
+		permitted = xstate_get_host_group_perm();
+		permitted &= XFEATURE_MASK_USER_SUPPORTED;
+		return put_user(permitted, uptr);
+
+	case ARCH_REQ_XCOMP_PERM:
+		if (!IS_ENABLED(CONFIG_X86_64))
+			return -ENOTSUPP;
+
+		return xstate_request_perm(idx);
+
+	default:
+		return -EINVAL;
+	}
+}
+
 #ifdef CONFIG_PROC_PID_ARCH_STATUS
 /*
  * Report the amount of time elapsed in millisecond since last AVX512
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index a1aa0bad2c9c..4ce1dc030f38 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -15,6 +15,12 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
 		xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
 }
 
+static inline u64 xstate_get_host_group_perm(void)
+{
+	/* Pairs with WRITE_ONCE() in xstate_request_perm() */
+	return READ_ONCE(current->group_leader->thread.fpu.perm.__state_perm);
+}
+
 enum xstate_copy_mode {
 	XSTATE_COPY_FP,
 	XSTATE_COPY_FX,
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c74c7e889e9d..97fea1649a5e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -30,6 +30,7 @@
 #include <asm/apic.h>
 #include <linux/uaccess.h>
 #include <asm/mwait.h>
+#include <asm/fpu/api.h>
 #include <asm/fpu/sched.h>
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
@@ -1003,13 +1004,17 @@ unsigned long get_wchan(struct task_struct *p)
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long cpuid_enabled)
+			  unsigned long arg2)
 {
 	switch (option) {
 	case ARCH_GET_CPUID:
 		return get_cpuid_mode();
 	case ARCH_SET_CPUID:
-		return set_cpuid_mode(task, cpuid_enabled);
+		return set_cpuid_mode(task, arg2);
+	case ARCH_GET_XCOMP_SUPP:
+	case ARCH_GET_XCOMP_PERM:
+	case ARCH_REQ_XCOMP_PERM:
+		return fpu_xstate_prctl(task, option, arg2);
 	}
 
 	return -EINVAL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 07/23] x86/fpu: Add basic helpers for dynamically enabled features
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (5 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 06/23] x86/arch_prctl: Add controls for dynamic XSTATE components Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2021-10-21 22:55 ` [PATCH 08/23] x86/signal: Use fpu::__state_user_size for sigalt stack validation Chang S. Bae
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

From: Thomas Gleixner <tglx@linutronix.de>

To allow building up the infrastructure required to support dynamically
enabled FPU features add:

 - XFEATURES_MASK_DYNAMIC

   This constant will hold xfeatures which can be dynamically enabled.

 - fpu_state_size_dynamic()

   A static branch for 64-bit and a simple 'return false' for 32-bit.

   This helper allows to add dynamic feature specific changes to common
   code which is shared between 32-bit and 64-bit without #ifdeffery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Add the missing #ifdeffery. (Dave Hansen)
---
 arch/x86/include/asm/fpu/xstate.h | 14 ++++++++++++++
 arch/x86/kernel/fpu/core.c        |  4 ++++
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 43ae89d4bcd2..023000e79348 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -43,6 +43,9 @@
 #define XFEATURE_MASK_USER_RESTORE	\
 	(XFEATURE_MASK_USER_SUPPORTED & ~XFEATURE_MASK_PKRU)
 
+/* Features which are dynamically enabled for a process on request */
+#define XFEATURE_MASK_USER_DYNAMIC	0ULL
+
 /* All currently supported supervisor features */
 #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
 
@@ -96,4 +99,15 @@ int xfeature_size(int xfeature_nr);
 void xsaves(struct xregs_state *xsave, u64 mask);
 void xrstors(struct xregs_state *xsave, u64 mask);
 
+#ifdef CONFIG_X86_64
+DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
+#endif
+
+static __always_inline __pure bool fpu_state_size_dynamic(void)
+{
+	if (IS_ENABLED(CONFIG_X86_64))
+		return static_branch_unlikely(&__fpu_state_size_dynamic);
+	return false;
+}
+
 #endif
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index eb911c843386..ff7d239fe961 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -25,6 +25,10 @@
 #define CREATE_TRACE_POINTS
 #include <asm/trace/fpu.h>
 
+#ifdef CONFIG_X86_64
+DEFINE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
+#endif
+
 /* The FPU state configuration data for kernel and user space */
 struct fpu_state_config	fpu_kernel_cfg __ro_after_init;
 struct fpu_state_config fpu_user_cfg __ro_after_init;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 08/23] x86/signal: Use fpu::__state_user_size for sigalt stack validation
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (6 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 07/23] x86/fpu: Add basic helpers for dynamically enabled features Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2021-10-21 22:55 ` [PATCH 09/23] x86/fpu/signal: Prepare for variable sigframe length Chang S. Bae
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

From: Thomas Gleixner <tglx@linutronix.de>

Use the current->group_leader->fpu to check for pending permissions to use
extended features and validate against the resulting user space size which
is stored in the group leaders fpu struct as well.

This prevents a task to install a too small sized sigaltstack after
permissions to use dynamically enabled features has been granted, but the
task has not (yet) used a related instruction.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/kernel/signal.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 0111a6ae6e60..ec71e06ae364 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -32,6 +32,7 @@
 #include <asm/processor.h>
 #include <asm/ucontext.h>
 #include <asm/fpu/signal.h>
+#include <asm/fpu/xstate.h>
 #include <asm/vdso.h>
 #include <asm/mce.h>
 #include <asm/sighandling.h>
@@ -720,12 +721,15 @@ SYSCALL_DEFINE0(rt_sigreturn)
 
 /* max_frame_size tells userspace the worst case signal stack size. */
 static unsigned long __ro_after_init max_frame_size;
+static unsigned int __ro_after_init fpu_default_state_size;
 
 void __init init_sigframe_size(void)
 {
+	fpu_default_state_size = fpu__get_fpstate_size();
+
 	max_frame_size = MAX_FRAME_SIGINFO_UCTXT_SIZE + MAX_FRAME_PADDING;
 
-	max_frame_size += fpu__get_fpstate_size() + MAX_XSAVE_PADDING;
+	max_frame_size += fpu_default_state_size + MAX_XSAVE_PADDING;
 
 	/* Userspace expects an aligned size. */
 	max_frame_size = round_up(max_frame_size, FRAME_ALIGNMENT);
@@ -928,15 +932,38 @@ __setup("strict_sas_size", strict_sas_size);
  * sigaltstack they just continued to work. While always checking against
  * the real size would be correct, this might be considered a regression.
  *
- * Therefore avoid the sanity check, unless enforced by kernel config or
- * command line option.
+ * Therefore avoid the sanity check, unless enforced by kernel
+ * configuration or command line option.
+ *
+ * When dynamic FPU features are supported, the check is also enforced when
+ * the task has permissions to use dynamic features. Tasks which have no
+ * permission are checked against the size of the non-dynamic feature set
+ * if strict checking is enabled. This avoids forcing all tasks on the
+ * system to allocate large sigaltstacks even if they are never going
+ * to use a dynamic feature. As this is serialized via sighand::siglock
+ * any permission request for a dynamic feature either happened already
+ * or will see the newly install sigaltstack size in the permission checks.
  */
 bool sigaltstack_size_valid(size_t ss_size)
 {
+	unsigned long fsize = max_frame_size - fpu_default_state_size;
+	u64 mask;
+
 	lockdep_assert_held(&current->sighand->siglock);
 
+	if (!fpu_state_size_dynamic() && !strict_sigaltstack_size)
+		return true;
+
+	fsize += current->group_leader->thread.fpu.perm.__user_state_size;
+	if (likely(ss_size > fsize))
+		return true;
+
 	if (strict_sigaltstack_size)
-		return ss_size > get_sigframe_size();
+		return ss_size > fsize;
+
+	mask = current->group_leader->thread.fpu.perm.__state_perm;
+	if (mask & XFEATURE_MASK_USER_DYNAMIC)
+		return ss_size > fsize;
 
 	return true;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 09/23] x86/fpu/signal: Prepare for variable sigframe length
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (7 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 08/23] x86/signal: Use fpu::__state_user_size for sigalt stack validation Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 10/23] x86/fpu: Prepare fpu_clone() for dynamically enabled features Chang S. Bae
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

The software reserved portion of the fxsave frame in the signal frame is
copied from structures which have been set up at boot time. With
dynamically enabled features the content of these structures is not longer
correct because the xfeatures and size can be different per task.

Calculate the software reserved portion at runtime and fill in the
xfeatures and size values from the tasks active fpstate.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/kernel/fpu/internal.h |  3 --
 arch/x86/kernel/fpu/signal.c   | 62 ++++++++++++++--------------------
 arch/x86/kernel/fpu/xstate.c   |  1 -
 3 files changed, 26 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/fpu/internal.h b/arch/x86/kernel/fpu/internal.h
index e1d8a352f12d..dbdb31f55fc7 100644
--- a/arch/x86/kernel/fpu/internal.h
+++ b/arch/x86/kernel/fpu/internal.h
@@ -21,9 +21,6 @@ static __always_inline __pure bool use_fxsr(void)
 # define WARN_ON_FPU(x) ({ (void)(x); 0; })
 #endif
 
-/* Init functions */
-extern void fpu__init_prepare_fx_sw_frame(void);
-
 /* Used in init.c */
 extern void fpstate_init_user(struct fpstate *fpstate);
 extern void fpstate_reset(struct fpu *fpu);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 8554e540a871..d128eb581ffc 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -20,9 +20,6 @@
 #include "legacy.h"
 #include "xstate.h"
 
-static struct _fpx_sw_bytes fx_sw_reserved __ro_after_init;
-static struct _fpx_sw_bytes fx_sw_reserved_ia32 __ro_after_init;
-
 /*
  * Check for the presence of extended state information in the
  * user fpstate pointer in the sigcontext.
@@ -98,23 +95,42 @@ static inline bool save_fsave_header(struct task_struct *tsk, void __user *buf)
 	return true;
 }
 
+/*
+ * Prepare the SW reserved portion of the fxsave memory layout, indicating
+ * the presence of the extended state information in the memory layout
+ * pointed to by the fpstate pointer in the sigcontext.
+ * This is saved when ever the FP and extended state context is
+ * saved on the user stack during the signal handler delivery to the user.
+ */
+static inline void save_sw_bytes(struct _fpx_sw_bytes *sw_bytes, bool ia32_frame,
+				 struct fpstate *fpstate)
+{
+	sw_bytes->magic1 = FP_XSTATE_MAGIC1;
+	sw_bytes->extended_size = fpstate->user_size + FP_XSTATE_MAGIC2_SIZE;
+	sw_bytes->xfeatures = fpstate->user_xfeatures;
+	sw_bytes->xstate_size = fpstate->user_size;
+
+	if (ia32_frame)
+		sw_bytes->extended_size += sizeof(struct fregs_state);
+}
+
 static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
-				      unsigned int usize)
+				      struct fpstate *fpstate)
 {
 	struct xregs_state __user *x = buf;
-	struct _fpx_sw_bytes *sw_bytes;
+	struct _fpx_sw_bytes sw_bytes;
 	u32 xfeatures;
 	int err;
 
 	/* Setup the bytes not touched by the [f]xsave and reserved for SW. */
-	sw_bytes = ia32_frame ? &fx_sw_reserved_ia32 : &fx_sw_reserved;
-	err = __copy_to_user(&x->i387.sw_reserved, sw_bytes, sizeof(*sw_bytes));
+	save_sw_bytes(&sw_bytes, ia32_frame, fpstate);
+	err = __copy_to_user(&x->i387.sw_reserved, &sw_bytes, sizeof(sw_bytes));
 
 	if (!use_xsave())
 		return !err;
 
 	err |= __put_user(FP_XSTATE_MAGIC2,
-			  (__u32 __user *)(buf + usize));
+			  (__u32 __user *)(buf + fpstate->user_size));
 
 	/*
 	 * Read the xfeatures which we copied (directly from the cpu or
@@ -173,7 +189,7 @@ bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct task_struct *tsk = current;
 	struct fpstate *fpstate = tsk->thread.fpu.fpstate;
-	int ia32_fxstate = (buf != buf_fx);
+	bool ia32_fxstate = (buf != buf_fx);
 	int ret;
 
 	ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
@@ -226,8 +242,7 @@ bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 	if ((ia32_fxstate || !use_fxsr()) && !save_fsave_header(tsk, buf))
 		return false;
 
-	if (use_fxsr() &&
-	    !save_xstate_epilog(buf_fx, ia32_fxstate, fpstate->user_size))
+	if (use_fxsr() && !save_xstate_epilog(buf_fx, ia32_fxstate, fpstate))
 		return false;
 
 	return true;
@@ -523,28 +538,3 @@ unsigned long __init fpu__get_fpstate_size(void)
 	return ret;
 }
 
-/*
- * Prepare the SW reserved portion of the fxsave memory layout, indicating
- * the presence of the extended state information in the memory layout
- * pointed by the fpstate pointer in the sigcontext.
- * This will be saved when ever the FP and extended state context is
- * saved on the user stack during the signal handler delivery to the user.
- */
-void __init fpu__init_prepare_fx_sw_frame(void)
-{
-	int size = fpu_user_cfg.default_size + FP_XSTATE_MAGIC2_SIZE;
-
-	fx_sw_reserved.magic1 = FP_XSTATE_MAGIC1;
-	fx_sw_reserved.extended_size = size;
-	fx_sw_reserved.xfeatures = fpu_user_cfg.default_features;
-	fx_sw_reserved.xstate_size = fpu_user_cfg.default_size;
-
-	if (IS_ENABLED(CONFIG_IA32_EMULATION) ||
-	    IS_ENABLED(CONFIG_X86_32)) {
-		int fsave_header_size = sizeof(struct fregs_state);
-
-		fx_sw_reserved_ia32 = fx_sw_reserved;
-		fx_sw_reserved_ia32.extended_size = size + fsave_header_size;
-	}
-}
-
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c79ca3d430c4..c178aa91abb3 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -830,7 +830,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	update_regset_xstate_info(fpu_user_cfg.max_size,
 				  fpu_user_cfg.max_features);
 
-	fpu__init_prepare_fx_sw_frame();
 	setup_init_fpu_buf();
 	setup_xstate_comp_offsets();
 	setup_supervisor_only_offsets();
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 10/23] x86/fpu: Prepare fpu_clone() for dynamically enabled features
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (8 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 09/23] x86/fpu/signal: Prepare for variable sigframe length Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2021-10-21 22:55 ` [PATCH 11/23] x86/fpu: Reset permission and fpstate on exec() Chang S. Bae
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

From: Thomas Gleixner <tglx@linutronix.de>

The default portion of the parent's FPU state is saved in a child task.
With dynamic features enabled, the non-default portion is not saved in a
child's fpstate because these register states are defined to be
caller-saved. The new task's fpstate is therefore the default buffer.

Fork inherits the permission of the parent.

Also, do not use memcpy() when TIF_NEED_FPU_LOAD is set because it is
invalid when the parent has dynamic features.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Add the changelog.
* Update comments. (Dave Hansen)
---
 arch/x86/include/asm/fpu/sched.h |  2 +-
 arch/x86/kernel/fpu/core.c       | 35 +++++++++++++++++++++++---------
 arch/x86/kernel/process.c        |  2 +-
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index cdb78d590c86..99a8820e8cc4 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -11,7 +11,7 @@
 
 extern void save_fpregs_to_fpstate(struct fpu *fpu);
 extern void fpu__drop(struct fpu *fpu);
-extern int  fpu_clone(struct task_struct *dst);
+extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags);
 extern void fpu_flush_thread(void);
 
 /*
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index ff7d239fe961..177f27687261 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -426,8 +426,20 @@ void fpstate_reset(struct fpu *fpu)
 	fpu->perm.__user_state_size	= fpu_user_cfg.default_size;
 }
 
+static inline void fpu_inherit_perms(struct fpu *dst_fpu)
+{
+	if (fpu_state_size_dynamic()) {
+		struct fpu *src_fpu = &current->group_leader->thread.fpu;
+
+		spin_lock_irq(&current->sighand->siglock);
+		/* Fork also inherits the permissions of the parent */
+		dst_fpu->perm = src_fpu->perm;
+		spin_unlock_irq(&current->sighand->siglock);
+	}
+}
+
 /* Clone current's FPU state on fork */
-int fpu_clone(struct task_struct *dst)
+int fpu_clone(struct task_struct *dst, unsigned long clone_flags)
 {
 	struct fpu *src_fpu = &current->thread.fpu;
 	struct fpu *dst_fpu = &dst->thread.fpu;
@@ -458,17 +470,20 @@ int fpu_clone(struct task_struct *dst)
 	}
 
 	/*
-	 * If the FPU registers are not owned by current just memcpy() the
-	 * state.  Otherwise save the FPU registers directly into the
-	 * child's FPU context, without any memory-to-memory copying.
+	 * Save the default portion of the current FPU state into the
+	 * clone. Assume all dynamic features to be defined as caller-
+	 * saved, which enables skipping both the expansion of fpstate
+	 * and the copying of any dynamic state.
+	 *
+	 * Do not use memcpy() when TIF_NEED_FPU_LOAD is set because
+	 * copying is not valid when current uses non-default states.
 	 */
 	fpregs_lock();
-	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
-		memcpy(&dst_fpu->fpstate->regs, &src_fpu->fpstate->regs,
-		       dst_fpu->fpstate->size);
-	} else {
-		save_fpregs_to_fpstate(dst_fpu);
-	}
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		fpregs_restore_userregs();
+	save_fpregs_to_fpstate(dst_fpu);
+	if (!(clone_flags & CLONE_THREAD))
+		fpu_inherit_perms(dst_fpu);
 	fpregs_unlock();
 
 	trace_x86_fpu_copy_src(src_fpu);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 97fea1649a5e..99025e32f105 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	frame->flags = X86_EFLAGS_FIXED;
 #endif
 
-	fpu_clone(p);
+	fpu_clone(p, clone_flags);
 
 	/* Kernel thread ? */
 	if (unlikely(p->flags & PF_KTHREAD)) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 11/23] x86/fpu: Reset permission and fpstate on exec()
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (9 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 10/23] x86/fpu: Prepare fpu_clone() for dynamically enabled features Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 12/23] x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit Chang S. Bae
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

On exec(), extended register states saved in the buffer is cleared. With
dynamic features, each task carries variables besides the register states.
The struct fpu has permission information and struct fpstate contains
buffer size and feature masks. They are all dynamically updated with
dynamic features.

Reset the current task's entire FPU data before an exec() so that the new
task starts with default permission and fpstate.

Rename the register state reset function because the old naming confuses as
it does not reset struct fpstate.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Add as a new patch.
---
 arch/x86/kernel/fpu/core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 177f27687261..073a8fc281d8 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -547,7 +547,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
 /*
  * Reset current->fpu memory state to the init values.
  */
-static void fpu_reset_fpstate(void)
+static void fpu_reset_fpregs(void)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
@@ -582,7 +582,7 @@ void fpu__clear_user_states(struct fpu *fpu)
 
 	fpregs_lock();
 	if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
-		fpu_reset_fpstate();
+		fpu_reset_fpregs();
 		fpregs_unlock();
 		return;
 	}
@@ -612,7 +612,8 @@ void fpu__clear_user_states(struct fpu *fpu)
 
 void fpu_flush_thread(void)
 {
-	fpu_reset_fpstate();
+	fpstate_reset(&current->thread.fpu);
+	fpu_reset_fpregs();
 }
 /*
  * Load FPU context before returning to userspace.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 12/23] x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (10 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 11/23] x86/fpu: Reset permission and fpstate on exec() Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 13/23] x86/msr-index: Add MSRs for XFD Chang S. Bae
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

Intel's eXtended Feature Disable (XFD) feature is an extension of the XSAVE
architecture. XFD allows the kernel to enable a feature state in XCR0 and
to receive a #NM trap when a task uses instructions accessing that state.

This is going to be used to postpone the allocation of a larger XSTATE
buffer for a task to the point where it is actually using a related
instruction after the permission to use that facility has been granted.

XFD is not used by the kernel, but only applied to userspace. This is a
matter of policy as the kernel knows how a fpstate is reallocated and the
XFD state.

The compacted XSAVE format is adjustable for dynamic features. Make XFD
depend on XSAVES.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Call out the policy to use XFD for userspace. (Dave Hansen)
* Make XFD depend on XSAVES. (Dave Hansen)
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..ab7b3a2de85d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -277,6 +277,7 @@
 #define X86_FEATURE_XSAVEC		(10*32+ 1) /* XSAVEC instruction */
 #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
+#define X86_FEATURE_XFD			(10*32+ 4) /* "" eXtended Feature Disabling */
 
 /*
  * Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index defda61f372d..d9ead9c20408 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -75,6 +75,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SGX_LC,			X86_FEATURE_SGX	      },
 	{ X86_FEATURE_SGX1,			X86_FEATURE_SGX       },
 	{ X86_FEATURE_SGX2,			X86_FEATURE_SGX1      },
+	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{}
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 13/23] x86/msr-index: Add MSRs for XFD
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (11 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 12/23] x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 14/23] x86/fpu: Add XFD state to fpstate Chang S. Bae
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

XFD introduces two MSRs:

    - IA32_XFD to enable/disable a feature controlled by XFD

    - IA32_XFD_ERR to expose the #NM trap handler which feature
      was tried to be used for the first time.

Both use the same xstate-component bitmap format, used by XCR0.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/include/asm/msr-index.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a7c413432b33..01e2650b9585 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -625,6 +625,8 @@
 
 #define MSR_IA32_BNDCFGS_RSVD		0x00000ffc
 
+#define MSR_IA32_XFD			0x000001c4
+#define MSR_IA32_XFD_ERR		0x000001c5
 #define MSR_IA32_XSS			0x00000da0
 
 #define MSR_IA32_APICBASE		0x0000001b
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 14/23] x86/fpu: Add XFD state to fpstate
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (12 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 13/23] x86/msr-index: Add MSRs for XFD Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 15/23] x86/fpu: Add sanity checks for XFD Chang S. Bae
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

Add storage for XFD register state to struct fpstate. This will be used to
store the XFD MSR state. This will be used for switching the XFD MSR when
FPU content is restored.

Add a per-CPU variable to cache the current MSR value so the MSR has only
to be written when the values are different.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Add a missing #ifdeffery.
---
 arch/x86/include/asm/fpu/types.h | 3 +++
 arch/x86/kernel/fpu/core.c       | 2 ++
 arch/x86/kernel/fpu/xstate.h     | 4 ++++
 3 files changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index d27ad4842fa1..c956db14b011 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -322,6 +322,9 @@ struct fpstate {
 	/* @user_xfeatures:	xfeatures valid in UABI buffers */
 	u64			user_xfeatures;
 
+	/* @xfd:		xfeatures disabled to trap userspace use. */
+	u64			xfd;
+
 	/* @is_valloc:		Indicator for dynamically allocated state */
 	unsigned int		is_valloc	: 1;
 
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 073a8fc281d8..e92c9d79441c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -27,6 +27,7 @@
 
 #ifdef CONFIG_X86_64
 DEFINE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
+DEFINE_PER_CPU(u64, xfd_state);
 #endif
 
 /* The FPU state configuration data for kernel and user space */
@@ -412,6 +413,7 @@ static void __fpstate_reset(struct fpstate *fpstate)
 	fpstate->user_size	= fpu_user_cfg.default_size;
 	fpstate->xfeatures	= fpu_kernel_cfg.default_features;
 	fpstate->user_xfeatures	= fpu_user_cfg.default_features;
+	fpstate->xfd		= init_fpstate.xfd;
 }
 
 void fpstate_reset(struct fpu *fpu)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 4ce1dc030f38..32a4dee4de3b 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -5,6 +5,10 @@
 #include <asm/cpufeature.h>
 #include <asm/fpu/xstate.h>
 
+#ifdef CONFIG_X86_64
+DECLARE_PER_CPU(u64, xfd_state);
+#endif
+
 static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
 {
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 15/23] x86/fpu: Add sanity checks for XFD
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (13 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 14/23] x86/fpu: Add XFD state to fpstate Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-25  8:11   ` Borislav Petkov
                     ` (2 more replies)
  2021-10-21 22:55 ` [PATCH 16/23] x86/fpu: Update XFD state where required Chang S. Bae
                   ` (7 subsequent siblings)
  22 siblings, 3 replies; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

From: Thomas Gleixner <tglx@linutronix.de>

Add debug functionality to ensure that the XFD MSR is up to date for XSAVE*
and XRSTOR* operations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Rename the xstate ops validation helper. (Dave Hansen)
* Update code comments. (Dave Hansen)
---
 arch/x86/kernel/fpu/core.c   |  9 +++---
 arch/x86/kernel/fpu/signal.c |  6 ++--
 arch/x86/kernel/fpu/xstate.c | 55 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/fpu/xstate.h | 34 +++++++++++++++++++---
 4 files changed, 92 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index e92c9d79441c..f4b02ed47034 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -166,7 +166,7 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
 		 */
 		mask = fpu_kernel_cfg.max_features & mask;
 
-		os_xrstor(&fpstate->regs.xsave, mask);
+		os_xrstor(fpstate, mask);
 	} else {
 		if (use_fxsr())
 			fxrstor(&fpstate->regs.fxsave);
@@ -537,7 +537,7 @@ void fpu__drop(struct fpu *fpu)
 static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
 {
 	if (use_xsave())
-		os_xrstor(&init_fpstate.regs.xsave, features_mask);
+		os_xrstor(&init_fpstate, features_mask);
 	else if (use_fxsr())
 		fxrstor(&init_fpstate.regs.fxsave);
 	else
@@ -594,9 +594,8 @@ void fpu__clear_user_states(struct fpu *fpu)
 	 * corresponding registers.
 	 */
 	if (xfeatures_mask_supervisor() &&
-	    !fpregs_state_valid(fpu, smp_processor_id())) {
-		os_xrstor(&fpu->fpstate->regs.xsave, xfeatures_mask_supervisor());
-	}
+	    !fpregs_state_valid(fpu, smp_processor_id()))
+		os_xrstor_supervisor(fpu->fpstate);
 
 	/* Reset user states in registers. */
 	restore_fpregs_from_init_fpstate(XFEATURE_MASK_USER_RESTORE);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index d128eb581ffc..a937980fd02b 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -261,7 +261,7 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
 			ret = fxrstor_from_user_sigframe(buf);
 
 		if (!ret && unlikely(init_bv))
-			os_xrstor(&init_fpstate.regs.xsave, init_bv);
+			os_xrstor(&init_fpstate, init_bv);
 		return ret;
 	} else if (use_fxsr()) {
 		return fxrstor_from_user_sigframe(buf);
@@ -322,7 +322,7 @@ static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
 	 * been restored from a user buffer directly.
 	 */
 	if (test_thread_flag(TIF_NEED_FPU_LOAD) && xfeatures_mask_supervisor())
-		os_xrstor(&fpu->fpstate->regs.xsave, xfeatures_mask_supervisor());
+		os_xrstor_supervisor(fpu->fpstate);
 
 	fpregs_mark_activate();
 	fpregs_unlock();
@@ -432,7 +432,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 		u64 mask = user_xfeatures | xfeatures_mask_supervisor();
 
 		fpregs->xsave.header.xfeatures &= mask;
-		success = !os_xrstor_safe(&fpregs->xsave,
+		success = !os_xrstor_safe(fpu->fpstate,
 					  fpu_kernel_cfg.max_features);
 	} else {
 		success = !fxrstor_safe(&fpregs->fxsave);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c178aa91abb3..c2cbe14aaa00 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1301,6 +1301,61 @@ EXPORT_SYMBOL_GPL(fpstate_clear_xstate_component);
 #endif
 
 #ifdef CONFIG_X86_64
+
+#ifdef CONFIG_X86_DEBUG_FPU
+/*
+ * Ensure that a subsequent XSAVE* or XRSTOR* instruction with RFBM=@mask
+ * can safely operate on the @fpstate buffer.
+ */
+static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
+{
+	u64 xfd = __this_cpu_read(xfd_state);
+
+	if (fpstate->xfd == xfd)
+		return true;
+
+	/* For current's fpstate the XFD state must be correct. */
+	if (fpstate->xfd == current->thread.fpu.fpstate->xfd)
+		return false;
+
+	/*
+	 * XRSTOR(S) from init_fpstate are always correct as it will just
+	 * bring all components into init state and not read from the
+	 * buffer. XSAVE(S) raises #PF after init.
+	 */
+	if (fpstate == &init_fpstate)
+		return rstor;
+
+	/*
+	 * XSAVE(S): clone(), fpu_swap_kvm_fpu()
+	 * XRSTORS(S): fpu_swap_kvm_fpu()
+	 */
+
+	/*
+	 * No XSAVE/XRSTOR instructions (except XSAVE itself) touch
+	 * the buffer area for XFD-disabled state components.
+	 */
+	mask &= ~xfd;
+
+	/*
+	 * Remove features which are valid in fpstate. They
+	 * have space allocated in fpstate.
+	 */
+	mask &= ~fpstate->xfeatures;
+
+	/*
+	 * Any remaining state components in 'mask' might be written
+	 * by XSAVE/XRSTOR. Fail validation it found.
+	 */
+	return !mask;
+}
+
+void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor)
+{
+	WARN_ON_ONCE(!xstate_op_valid(fpstate, mask, rstor));
+}
+#endif /* CONFIG_X86_DEBUG_FPU */
+
 static int validate_sigaltstack(unsigned int usize)
 {
 	struct task_struct *thread, *leader = current->group_leader;
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 32a4dee4de3b..29024244965b 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -130,6 +130,12 @@ static inline u64 xfeatures_mask_independent(void)
 		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
 		     : "memory")
 
+#if defined(CONFIG_X86_64) && defined(CONFIG_X86_DEBUG_FPU)
+extern void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor);
+#else
+static inline void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor) { }
+#endif
+
 /*
  * Save processor xstate to xsave area.
  *
@@ -144,6 +150,7 @@ static inline void os_xsave(struct fpstate *fpstate)
 	int err;
 
 	WARN_ON_FPU(!alternatives_patched);
+	xfd_validate_state(fpstate, mask, false);
 
 	XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
 
@@ -156,12 +163,23 @@ static inline void os_xsave(struct fpstate *fpstate)
  *
  * Uses XRSTORS when XSAVES is used, XRSTOR otherwise.
  */
-static inline void os_xrstor(struct xregs_state *xstate, u64 mask)
+static inline void os_xrstor(struct fpstate *fpstate, u64 mask)
+{
+	u32 lmask = mask;
+	u32 hmask = mask >> 32;
+
+	xfd_validate_state(fpstate, mask, true);
+	XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
+}
+
+/* Restore of supervisor state. Does not require XFD */
+static inline void os_xrstor_supervisor(struct fpstate *fpstate)
 {
+	u64 mask = xfeatures_mask_supervisor();
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 
-	XSTATE_XRESTORE(xstate, lmask, hmask);
+	XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
 }
 
 /*
@@ -184,11 +202,14 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
 	 * internally, e.g. PKRU. That's user space ABI and also required
 	 * to allow the signal handler to modify PKRU.
 	 */
-	u64 mask = current->thread.fpu.fpstate->user_xfeatures;
+	struct fpstate *fpstate = current->thread.fpu.fpstate;
+	u64 mask = fpstate->user_xfeatures;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
 
+	xfd_validate_state(fpstate, mask, false);
+
 	stac();
 	XSTATE_OP(XSAVE, buf, lmask, hmask, err);
 	clac();
@@ -206,6 +227,8 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
 	u32 hmask = mask >> 32;
 	int err;
 
+	xfd_validate_state(current->thread.fpu.fpstate, mask, true);
+
 	stac();
 	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
 	clac();
@@ -217,12 +240,15 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
  * Restore xstate from kernel space xsave area, return an error code instead of
  * an exception.
  */
-static inline int os_xrstor_safe(struct xregs_state *xstate, u64 mask)
+static inline int os_xrstor_safe(struct fpstate *fpstate, u64 mask)
 {
+	struct xregs_state *xstate = &fpstate->regs.xsave;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
 
+	/* Must enforce XFD update here */
+
 	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
 		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
 	else
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 16/23] x86/fpu: Update XFD state where required
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (14 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 15/23] x86/fpu: Add sanity checks for XFD Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 17/23] x86/fpu/xstate: Add XFD #NM handler Chang S. Bae
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

The IA32_XFD_MSR allows to arm #NM traps for XSTATE components which are
enabled in XCR0. The register has to be restored before the tasks XSTATE is
restored. The life time rules are the same as for FPU state.

XFD is updated on return to userspace only when the FPU state of the task
is not up to date in the registers. It's updated before the XRSTORS so
that eventually enabled dynamic features are restored as well and not
brought into init state.

Also in signal handling for restoring FPU state from user space the
correctness of the XFD state has to be ensured.

Add it to CPU initialization and resume as well.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/kernel/fpu/context.h |  2 ++
 arch/x86/kernel/fpu/core.c    | 25 +++++++++++++++++++++++++
 arch/x86/kernel/fpu/signal.c  |  2 ++
 arch/x86/kernel/fpu/xstate.c  | 12 ++++++++++++
 arch/x86/kernel/fpu/xstate.h  | 19 ++++++++++++++++++-
 5 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index a06ebf315d83..958accf2ccf0 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -69,6 +69,8 @@ static inline void fpregs_restore_userregs(void)
 		 * correct because it was either set in switch_to() or in
 		 * flush_thread(). So it is excluded because it might be
 		 * not up to date in current->thread.fpu.xsave state.
+		 *
+		 * XFD state is handled in restore_fpregs_from_fpstate().
 		 */
 		restore_fpregs_from_fpstate(fpu->fpstate, XFEATURE_MASK_FPSTATE);
 
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index f4b02ed47034..f66beea2d1f8 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -155,6 +155,23 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
 	}
 
 	if (use_xsave()) {
+		/*
+		 * Dynamically enabled features are enabled in XCR0, but
+		 * usage requires also that the corresponding bits in XFD
+		 * are cleared.  If the bits are set then using a related
+		 * instruction will raise #NM. This allows to do the
+		 * allocation of the larger FPU buffer lazy from #NM or if
+		 * the task has no permission to kill it which would happen
+		 * via #UD if the feature is disabled in XCR0.
+		 *
+		 * XFD state is following the same life time rules as
+		 * XSTATE and to restore state correctly XFD has to be
+		 * updated before XRSTORS otherwise the component would
+		 * stay in or go into init state even if the bits are set
+		 * in fpstate::regs::xsave::xfeatures.
+		 */
+		xfd_update_state(fpstate);
+
 		/*
 		 * Restoring state always needs to modify all features
 		 * which are in @mask even if the current task cannot use
@@ -244,7 +261,15 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest,
 
 	if (!cur_fps->is_confidential) {
 		restore_mask &= XFEATURE_MASK_FPSTATE;
+		/* Includes XFD update */
 		restore_fpregs_from_fpstate(cur_fps, restore_mask);
+	} else {
+		/*
+		 * XSTATE is restored by firmware from encrypted
+		 * memory. Make sure XFD state is correct while
+		 * running with guest fpstate
+		 */
+		xfd_update_state(cur_fps);
 	}
 
 	fpregs_mark_activate();
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a937980fd02b..f6461937c536 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -282,6 +282,8 @@ static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
 
 retry:
 	fpregs_lock();
+	/* Ensure that XFD is up to date */
+	xfd_update_state(fpu->fpstate);
 	pagefault_disable();
 	ret = __restore_fpregs_from_user(buf, fpu->fpstate->user_xfeatures,
 					 xrestore, fx_only);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c2cbe14aaa00..04f1f7aea93c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -136,6 +136,15 @@ void fpu__init_cpu_xstate(void)
 
 	cr4_set_bits(X86_CR4_OSXSAVE);
 
+	/*
+	 * Must happen after CR4 setup and before xsetbv() to allow KVM
+	 * lazy passthrough.  Write independent of the dynamic state static
+	 * key as that does not work on the boot CPU. This also ensures
+	 * that any stale state is wiped out from XFD.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_XFD))
+		wrmsrl(MSR_IA32_XFD, init_fpstate.xfd);
+
 	/*
 	 * XCR_XFEATURE_ENABLED_MASK (aka. XCR0) sets user features
 	 * managed by XSAVE{C, OPT, S} and XRSTOR{S}.  Only XSAVE user
@@ -875,6 +884,9 @@ void fpu__resume_cpu(void)
 		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor()  |
 				     xfeatures_mask_independent());
 	}
+
+	if (fpu_state_size_dynamic())
+		wrmsrl(MSR_IA32_XFD, current->thread.fpu.fpstate->xfd);
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 29024244965b..e18210dff88c 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -136,6 +136,22 @@ extern void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor);
 static inline void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor) { }
 #endif
 
+#ifdef CONFIG_X86_64
+static inline void xfd_update_state(struct fpstate *fpstate)
+{
+	if (fpu_state_size_dynamic()) {
+		u64 xfd = fpstate->xfd;
+
+		if (__this_cpu_read(xfd_state) != xfd) {
+			wrmsrl(MSR_IA32_XFD, xfd);
+			__this_cpu_write(xfd_state, xfd);
+		}
+	}
+}
+#else
+static inline void xfd_update_state(struct fpstate *fpstate) { }
+#endif
+
 /*
  * Save processor xstate to xsave area.
  *
@@ -247,7 +263,8 @@ static inline int os_xrstor_safe(struct fpstate *fpstate, u64 mask)
 	u32 hmask = mask >> 32;
 	int err;
 
-	/* Must enforce XFD update here */
+	/* Ensure that XFD is up to date */
+	xfd_update_state(fpstate);
 
 	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
 		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 17/23] x86/fpu/xstate: Add XFD #NM handler
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (15 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 16/23] x86/fpu: Update XFD state where required Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 18/23] x86/fpu/xstate: Add fpstate_realloc()/free() Chang S. Bae
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

If the XFD MSR has feature bits set then #NM will be raised when user space
attempts to use an instruction related to one of these features.

When the task has no permissions to use that feature, raise SIGILL, which
is the same behavior as #UD.

If the task has permissions, calculate the new buffer size for the extended
feature set and allocate a larger fpstate. In the unlikely case that
vzalloc() fails, SIGSEGV is raised.

The allocation function will be added in the next step. Provide a stub
which fails for now.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
[ tglx: Updated serialization ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/include/asm/fpu/xstate.h |  2 ++
 arch/x86/kernel/fpu/xstate.c      | 47 +++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c           | 38 +++++++++++++++++++++++++
 3 files changed, 87 insertions(+)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 023000e79348..d23714132dbb 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -99,6 +99,8 @@ int xfeature_size(int xfeature_nr);
 void xsaves(struct xregs_state *xsave, u64 mask);
 void xrstors(struct xregs_state *xsave, u64 mask);
 
+int xfd_enable_feature(u64 xfd_err);
+
 #ifdef CONFIG_X86_64
 DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
 #endif
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 04f1f7aea93c..1b2fad6e4964 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1461,6 +1461,53 @@ static int xstate_request_perm(unsigned long idx)
 	spin_unlock_irq(&current->sighand->siglock);
 	return ret;
 }
+
+/* Place holder for now */
+static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
+			   unsigned int usize)
+{
+	return -ENOMEM;
+}
+
+int xfd_enable_feature(u64 xfd_err)
+{
+	u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC;
+	unsigned int ksize, usize;
+	struct fpu *fpu;
+
+	if (!xfd_event) {
+		pr_err_once("XFD: Invalid xfd error: %016llx\n", xfd_err);
+		return 0;
+	}
+
+	/* Protect against concurrent modifications */
+	spin_lock_irq(&current->sighand->siglock);
+
+	/* If not permitted let it die */
+	if ((xstate_get_host_group_perm() & xfd_event) != xfd_event) {
+		spin_unlock_irq(&current->sighand->siglock);
+		return -EPERM;
+	}
+
+	fpu = &current->group_leader->thread.fpu;
+	ksize = fpu->perm.__state_size;
+	usize = fpu->perm.__user_state_size;
+	/*
+	 * The feature is permitted. State size is sufficient.  Dropping
+	 * the lock is safe here even if more features are added from
+	 * another task, the retrieved buffer sizes are valid for the
+	 * currently requested feature(s).
+	 */
+	spin_unlock_irq(&current->sighand->siglock);
+
+	/*
+	 * Try to allocate a new fpstate. If that fails there is no way
+	 * out.
+	 */
+	if (fpstate_realloc(xfd_event, ksize, usize))
+		return -EFAULT;
+	return 0;
+}
 #else /* CONFIG_X86_64 */
 static inline int xstate_request_perm(u64 permitted, u64 requested)
 {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bae7582c58f5..6ca1454a65d4 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1108,10 +1108,48 @@ DEFINE_IDTENTRY(exc_spurious_interrupt_bug)
 	 */
 }
 
+static bool handle_xfd_event(struct pt_regs *regs)
+{
+	u64 xfd_err;
+	int err;
+
+	if (!IS_ENABLED(CONFIG_X86_64) || !cpu_feature_enabled(X86_FEATURE_XFD))
+		return false;
+
+	rdmsrl(MSR_IA32_XFD_ERR, xfd_err);
+	if (!xfd_err)
+		return false;
+
+	wrmsrl(MSR_IA32_XFD_ERR, 0);
+
+	/* Die if that happens in kernel space */
+	if (WARN_ON(!user_mode(regs)))
+		return false;
+
+	local_irq_enable();
+
+	err = xfd_enable_feature(xfd_err);
+
+	switch (err) {
+	case -EPERM:
+		force_sig_fault(SIGILL, ILL_ILLOPC, error_get_trap_addr(regs));
+		break;
+	case -EFAULT:
+		force_sig(SIGSEGV);
+		break;
+	}
+
+	local_irq_disable();
+	return true;
+}
+
 DEFINE_IDTENTRY(exc_device_not_available)
 {
 	unsigned long cr0 = read_cr0();
 
+	if (handle_xfd_event(regs))
+		return;
+
 #ifdef CONFIG_MATH_EMULATION
 	if (!boot_cpu_has(X86_FEATURE_FPU) && (cr0 & X86_CR0_EM)) {
 		struct math_emu_info info = { };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 18/23] x86/fpu/xstate: Add fpstate_realloc()/free()
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (16 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 17/23] x86/fpu/xstate: Add XFD #NM handler Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 19/23] x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers Chang S. Bae
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

The fpstate embedded in struct fpu is the default state for storing the FPU
registers. It's sized so that the default supported features can be stored.
For dynamically enabled features the register buffer is too small.

The #NM handler detects first use of a feature which is disabled in the
XFD MSR. After handling permission checks it recalculates the size for
kernel space and user space state and invokes fpstate_realloc() which
tries to reallocate fpstate and install it.

Provide the allocator function which checks whether the current buffer size
is sufficient and if not allocates one. If allocation is successful the new
fpstate is initialized with the new features and sizes and the now enabled
features is removed from the task's XFD mask.

realloc_fpstate() uses vzalloc(). If use of this mechanism grows to
re-allocate buffers larger than 64KB, a more sophisticated allocation
scheme that includes purpose-built reclaim capability might be justified.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/include/asm/fpu/api.h |  7 +++
 arch/x86/kernel/fpu/xstate.c   | 97 +++++++++++++++++++++++++++++++---
 arch/x86/kernel/process.c      | 10 ++++
 3 files changed, 106 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 89762c28ad5a..c17d02decd65 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -130,6 +130,13 @@ static inline void fpstate_init_soft(struct swregs_state *soft) {}
 /* State tracking */
 DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
+/* Process cleanup */
+#ifdef CONFIG_X86_64
+extern void fpstate_free(struct fpu *fpu);
+#else
+static inline void fpstate_free(struct fpu *fpu) { }
+#endif
+
 /* fpstate-related functions which are exported to KVM */
 extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature);
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1b2fad6e4964..3f65140b4e6f 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -12,6 +12,7 @@
 #include <linux/pkeys.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
+#include <linux/vmalloc.h>
 
 #include <asm/fpu/api.h>
 #include <asm/fpu/regset.h>
@@ -22,6 +23,7 @@
 #include <asm/prctl.h>
 #include <asm/elf.h>
 
+#include "context.h"
 #include "internal.h"
 #include "legacy.h"
 #include "xstate.h"
@@ -1368,6 +1370,91 @@ void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor)
 }
 #endif /* CONFIG_X86_DEBUG_FPU */
 
+void fpstate_free(struct fpu *fpu)
+{
+	if (fpu->fpstate || fpu->fpstate != &fpu->__fpstate)
+		vfree(fpu->fpstate);
+}
+
+/**
+ * fpu_install_fpstate - Update the active fpstate in the FPU
+ *
+ * @fpu:	A struct fpu * pointer
+ * @newfps:	A struct fpstate * pointer
+ *
+ * Returns:	A null pointer if the last active fpstate is the embedded
+ *		one or the new fpstate is already installed;
+ *		otherwise, a pointer to the old fpstate which has to
+ *		be freed by the caller.
+ */
+static struct fpstate *fpu_install_fpstate(struct fpu *fpu,
+					   struct fpstate *newfps)
+{
+	struct fpstate *oldfps = fpu->fpstate;
+
+	if (fpu->fpstate == newfps)
+		return NULL;
+
+	fpu->fpstate = newfps;
+	return oldfps != &fpu->__fpstate ? oldfps : NULL;
+}
+
+/**
+ * fpstate_realloc - Reallocate struct fpstate for the requested new features
+ *
+ * @xfeatures:	A bitmap of xstate features which extend the enabled features
+ *		of that task
+ * @ksize:	The required size for the kernel buffer
+ * @usize:	The required size for user space buffers
+ *
+ * Note vs. vmalloc(): If the task with a vzalloc()-allocated buffer
+ * terminates quickly, vfree()-induced IPIs may be a concern, but tasks
+ * with large states are likely to live longer.
+ *
+ * Returns: 0 on success, -ENOMEM on allocation error.
+ */
+static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
+			   unsigned int usize)
+{
+	struct fpu *fpu = &current->thread.fpu;
+	struct fpstate *curfps, *newfps = NULL;
+	unsigned int fpsize;
+
+	curfps = fpu->fpstate;
+	fpsize = ksize + ALIGN(offsetof(struct fpstate, regs), 64);
+
+	newfps = vzalloc(fpsize);
+	if (!newfps)
+		return -ENOMEM;
+	newfps->size = ksize;
+	newfps->user_size = usize;
+	newfps->is_valloc = true;
+
+	fpregs_lock();
+	/*
+	 * Ensure that the current state is in the registers before
+	 * swapping fpstate as that might invalidate it due to layout
+	 * changes.
+	 */
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		fpregs_restore_userregs();
+
+	newfps->xfeatures = curfps->xfeatures | xfeatures;
+	newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
+	newfps->xfd = curfps->xfd & ~xfeatures;
+
+	curfps = fpu_install_fpstate(fpu, newfps);
+
+	/* Do the final updates within the locked region */
+	xstate_init_xcomp_bv(&newfps->regs.xsave, newfps->xfeatures);
+	xfd_update_state(newfps);
+
+	fpregs_unlock();
+
+	vfree(curfps);
+	return 0;
+}
+
 static int validate_sigaltstack(unsigned int usize)
 {
 	struct task_struct *thread, *leader = current->group_leader;
@@ -1390,7 +1477,8 @@ static int __xstate_request_perm(u64 permitted, u64 requested)
 	/*
 	 * This deliberately does not exclude !XSAVES as we still might
 	 * decide to optionally context switch XCR0 or talk the silicon
-	 * vendors into extending XFD for the pre AMX states.
+	 * vendors into extending XFD for the pre AMX states, especially
+	 * AVX512.
 	 */
 	bool compacted = cpu_feature_enabled(X86_FEATURE_XSAVES);
 	struct fpu *fpu = &current->group_leader->thread.fpu;
@@ -1462,13 +1550,6 @@ static int xstate_request_perm(unsigned long idx)
 	return ret;
 }
 
-/* Place holder for now */
-static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
-			   unsigned int usize)
-{
-	return -ENOMEM;
-}
-
 int xfd_enable_feature(u64 xfd_err)
 {
 	u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 99025e32f105..f3f251787b99 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -32,6 +32,7 @@
 #include <asm/mwait.h>
 #include <asm/fpu/api.h>
 #include <asm/fpu/sched.h>
+#include <asm/fpu/xstate.h>
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
 #include <asm/tlbflush.h>
@@ -90,9 +91,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 #endif
 	/* Drop the copied pointer to current's fpstate */
 	dst->thread.fpu.fpstate = NULL;
+
 	return 0;
 }
 
+#ifdef CONFIG_X86_64
+void arch_release_task_struct(struct task_struct *tsk)
+{
+	if (fpu_state_size_dynamic())
+		fpstate_free(&tsk->thread.fpu);
+}
+#endif
+
 /*
  * Free thread data structures etc..
  */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 19/23] x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (17 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 18/23] x86/fpu/xstate: Add fpstate_realloc()/free() Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 20/23] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

The kernel checks at boot time which features are available by walking a
XSAVE feature table which contains the CPUID feature bit numbers which need
to be checked whether a feature is available on a CPU or not. So far the
feature numbers have been linear, but AMX will create a gap which the
current code cannot handle.

Make the table entries explicitly indexed and adjust the loop code
accordingly to prepare for that.

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
---
 arch/x86/kernel/fpu/xstate.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 3f65140b4e6f..a2e17aca6318 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -53,18 +53,18 @@ static const char *xfeature_names[] =
 	"unknown xstate feature"	,
 };
 
-static short xsave_cpuid_features[] __initdata = {
-	X86_FEATURE_FPU,
-	X86_FEATURE_XMM,
-	X86_FEATURE_AVX,
-	X86_FEATURE_MPX,
-	X86_FEATURE_MPX,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_INTEL_PT,
-	X86_FEATURE_PKU,
-	X86_FEATURE_ENQCMD,
+static unsigned short xsave_cpuid_features[] __initdata = {
+	[XFEATURE_FP]				= X86_FEATURE_FPU,
+	[XFEATURE_SSE]				= X86_FEATURE_XMM,
+	[XFEATURE_YMM]				= X86_FEATURE_AVX,
+	[XFEATURE_BNDREGS]			= X86_FEATURE_MPX,
+	[XFEATURE_BNDCSR]			= X86_FEATURE_MPX,
+	[XFEATURE_OPMASK]			= X86_FEATURE_AVX512F,
+	[XFEATURE_ZMM_Hi256]			= X86_FEATURE_AVX512F,
+	[XFEATURE_Hi16_ZMM]			= X86_FEATURE_AVX512F,
+	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
+	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
+	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
 };
 
 static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -809,7 +809,10 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	 * Clear XSAVE features that are disabled in the normal CPUID.
 	 */
 	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
-		if (!boot_cpu_has(xsave_cpuid_features[i]))
+		unsigned short cid = xsave_cpuid_features[i];
+
+		/* Careful: X86_FEATURE_FPU is 0! */
+		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
 			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 20/23] x86/fpu/amx: Define AMX state components and have it used for boot-time checks
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (18 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 19/23] x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 21/23] x86/fpu: Calculate the default sizes independently Chang S. Bae
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

The XSTATE initialization uses check_xstate_against_struct() to sanity
check the size of XSTATE-enabled features. AMX is a XSAVE-enabled feature,
and its size is not hard-coded but discoverable at run-time via CPUID.

The AMX state is composed of state components 17 and 18, which are all user
state components. The first component is the XTILECFG state of a 64-byte
tile-related control register. The state component 18, called XTILEDATA,
contains the actual tile data, and the state size varies on
implementations. The architectural maximum, as defined in the CPUID(0x1d,
1): EAX[15:0], is a byte less than 64KB. The first implementation supports
8KB.

Check the XTILEDATA state size dynamically. The feature introduces the new
tile register, TMM. Define one register struct only and read the number of
registers from CPUID. Cross-check the overall size with CPUID again.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Make AMX_TILE depend on XFD.
* Update the changelog.
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/fpu/types.h   | 32 ++++++++++++
 arch/x86/include/asm/fpu/xstate.h  |  2 +
 arch/x86/kernel/fpu/xstate.c       | 80 +++++++++++++++++++++++++++++-
 4 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ab7b3a2de85d..d5b5f2ab87a0 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -299,6 +299,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_AMX_TILE		(18*32+24) /* AMX tile Support */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index c956db14b011..5dd2455ac230 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -120,6 +120,9 @@ enum xfeature {
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
+	XFEATURE_RSRVD_COMP_16,
+	XFEATURE_XTILE_CFG,
+	XFEATURE_XTILE_DATA,
 
 	XFEATURE_MAX,
 };
@@ -136,12 +139,21 @@ enum xfeature {
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
+#define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
+#define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
 
 #define XFEATURE_MASK_FPSSE		(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512		(XFEATURE_MASK_OPMASK \
 					 | XFEATURE_MASK_ZMM_Hi256 \
 					 | XFEATURE_MASK_Hi16_ZMM)
 
+#ifdef CONFIG_X86_64
+# define XFEATURE_MASK_XTILE		(XFEATURE_MASK_XTILE_DATA \
+					 | XFEATURE_MASK_XTILE_CFG)
+#else
+# define XFEATURE_MASK_XTILE		(0)
+#endif
+
 #define FIRST_EXTENDED_XFEATURE	XFEATURE_YMM
 
 struct reg_128_bit {
@@ -153,6 +165,9 @@ struct reg_256_bit {
 struct reg_512_bit {
 	u8	regbytes[512/8];
 };
+struct reg_1024_byte {
+	u8	regbytes[1024];
+};
 
 /*
  * State component 2:
@@ -255,6 +270,23 @@ struct arch_lbr_state {
 	u64 ler_to;
 	u64 ler_info;
 	struct lbr_entry		entries[];
+};
+
+/*
+ * State component 17: 64-byte tile configuration register.
+ */
+struct xtile_cfg {
+	u64				tcfg[8];
+} __packed;
+
+/*
+ * State component 18: 1KB tile data register.
+ * Each register represents 16 64-byte rows of the matrix
+ * data. But the number of registers depends on the actual
+ * implementation.
+ */
+struct xtile_data {
+	struct reg_1024_byte		tmm;
 } __packed;
 
 /*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index d23714132dbb..0c850def4d13 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -14,6 +14,8 @@
 
 #define XSTATE_CPUID		0x0000000d
 
+#define TILE_CPUID		0x0000001d
+
 #define FXSAVE_SIZE	512
 
 #define XSAVE_HDR_SIZE	    64
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index a2e17aca6318..df64b311119e 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -51,6 +51,14 @@ static const char *xfeature_names[] =
 	"Protection Keys User registers",
 	"PASID state",
 	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"AMX Tile config"		,
+	"AMX Tile data"			,
+	"unknown xstate feature"	,
 };
 
 static unsigned short xsave_cpuid_features[] __initdata = {
@@ -65,6 +73,8 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
+	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
 
 static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -240,6 +250,8 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
+	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
 
 /*
@@ -523,6 +535,67 @@ static void __init __xstate_dump_leaves(void)
 	}								\
 } while (0)
 
+/**
+ * check_xtile_data_against_struct - Check tile data state size.
+ *
+ * Calculate the state size by multiplying the single tile size which is
+ * recorded in a C struct, and the number of tiles that the CPU informs.
+ * Compare the provided size with the calculation.
+ *
+ * @size:	The tile data state size
+ *
+ * Returns:	0 on success, -EINVAL on mismatch.
+ */
+static int __init check_xtile_data_against_struct(int size)
+{
+	u32 max_palid, palid, state_size;
+	u32 eax, ebx, ecx, edx;
+	u16 max_tile;
+
+	/*
+	 * Check the maximum palette id:
+	 *   eax: the highest numbered palette subleaf.
+	 */
+	cpuid_count(TILE_CPUID, 0, &max_palid, &ebx, &ecx, &edx);
+
+	/*
+	 * Cross-check each tile size and find the maximum number of
+	 * supported tiles.
+	 */
+	for (palid = 1, max_tile = 0; palid <= max_palid; palid++) {
+		u16 tile_size, max;
+
+		/*
+		 * Check the tile size info:
+		 *   eax[31:16]:  bytes per title
+		 *   ebx[31:16]:  the max names (or max number of tiles)
+		 */
+		cpuid_count(TILE_CPUID, palid, &eax, &ebx, &edx, &edx);
+		tile_size = eax >> 16;
+		max = ebx >> 16;
+
+		if (tile_size != sizeof(struct xtile_data)) {
+			pr_err("%s: struct is %zu bytes, cpu xtile %d bytes\n",
+			       __stringify(XFEATURE_XTILE_DATA),
+			       sizeof(struct xtile_data), tile_size);
+			__xstate_dump_leaves();
+			return -EINVAL;
+		}
+
+		if (max > max_tile)
+			max_tile = max;
+	}
+
+	state_size = sizeof(struct xtile_data) * max_tile;
+	if (size != state_size) {
+		pr_err("%s: calculated size is %u bytes, cpu state %d bytes\n",
+		       __stringify(XFEATURE_XTILE_DATA), state_size, size);
+		__xstate_dump_leaves();
+		return -EINVAL;
+	}
+	return 0;
+}
+
 /*
  * We have a C struct for each 'xstate'.  We need to ensure
  * that our software representation matches what the CPU
@@ -546,6 +619,11 @@ static bool __init check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
+	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
+
+	/* The tile data size varies between implementations. */
+	if (nr == XFEATURE_XTILE_DATA)
+		check_xtile_data_against_struct(sz);
 
 	/*
 	 * Make *SURE* to add any feature numbers in below if
@@ -555,7 +633,7 @@ static bool __init check_xstate_against_struct(int nr)
 	if ((nr < XFEATURE_YMM) ||
 	    (nr >= XFEATURE_MAX) ||
 	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_LBR))) {
+	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
 		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
 		XSTATE_WARN_ON(1);
 		return false;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 21/23] x86/fpu: Calculate the default sizes independently
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (19 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 20/23] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 22/23] x86/fpu: Add XFD handling for dynamic states Chang S. Bae
  2021-10-21 22:55 ` [PATCH 23/23] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

When dynamically enabled states are supported the maximum and default sizes
for the kernel buffers and user space interfaces are not longer identical.

Put the necessary calculations in place which only take the default enabled
features into account.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Fix to work when XSAVES is not available.
---
 arch/x86/kernel/fpu/xstate.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index df64b311119e..dcc1dd2d8f2d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -781,35 +781,40 @@ static bool __init is_supported_xstate_size(unsigned int test_xstate_size)
 static int __init init_xstate_size(void)
 {
 	/* Recompute the context size for enabled features: */
-	unsigned int user_size, kernel_size;
+	unsigned int user_size, kernel_size, kernel_default_size;
+	bool compacted = cpu_feature_enabled(X86_FEATURE_XSAVES);
 
 	/* Uncompacted user space size */
 	user_size = get_xsave_size_user();
 
 	/*
 	 * XSAVES kernel size includes supervisor states and
-	 * uses compacted format.
+	 * uses compacted format when available.
 	 *
 	 * XSAVE does not support supervisor states so
 	 * kernel and user size is identical.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
+	if (compacted)
 		kernel_size = get_xsaves_size_no_independent();
 	else
 		kernel_size = user_size;
 
-	/* Ensure we have the space to store all enabled features. */
-	if (!is_supported_xstate_size(kernel_size))
+	kernel_default_size =
+		xstate_calculate_size(fpu_kernel_cfg.default_features, compacted);
+
+	/* Ensure we have the space to store all default enabled features. */
+	if (!is_supported_xstate_size(kernel_default_size))
 		return -EINVAL;
 
 	if (!paranoid_xstate_size_valid(kernel_size))
 		return -EINVAL;
 
-	/* Keep it the same for now */
 	fpu_kernel_cfg.max_size = kernel_size;
-	fpu_kernel_cfg.default_size = kernel_size;
 	fpu_user_cfg.max_size = user_size;
-	fpu_user_cfg.default_size = user_size;
+
+	fpu_kernel_cfg.default_size = kernel_default_size;
+	fpu_user_cfg.default_size =
+		xstate_calculate_size(fpu_user_cfg.default_features, false);
 
 	return 0;
 }
@@ -894,15 +899,21 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
 	}
 
+	if (!cpu_feature_enabled(X86_FEATURE_XFD))
+		fpu_kernel_cfg.max_features &= ~XFEATURE_MASK_USER_DYNAMIC;
+
 	fpu_kernel_cfg.max_features &= XFEATURE_MASK_USER_SUPPORTED |
 			      XFEATURE_MASK_SUPERVISOR_SUPPORTED;
 
 	fpu_user_cfg.max_features = fpu_kernel_cfg.max_features;
 	fpu_user_cfg.max_features &= XFEATURE_MASK_USER_SUPPORTED;
 
-	/* Identical for now */
+	/* Clean out dynamic features from default */
 	fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
+	fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
+
 	fpu_user_cfg.default_features = fpu_user_cfg.max_features;
+	fpu_user_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
 
 	/* Store it for paranoia check at the end */
 	xfeatures = fpu_kernel_cfg.max_features;
@@ -913,6 +924,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	if (err)
 		goto out_disable;
 
+	/* Reset the state for the current task */
 	fpstate_reset(&current->thread.fpu);
 
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 22/23] x86/fpu: Add XFD handling for dynamic states
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (20 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 21/23] x86/fpu: Calculate the default sizes independently Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-21 22:55 ` [PATCH 23/23] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

To handle the dynamic sizing of buffers on first use the XFD MSR has to be
armed. Store the delta between the maximum available and the default
feature bits in init_fpstate where it can be retrieved for task creation.

If the delta is non zero then dynamic features are enabled. This needs also
to enable the static key which guards the XFD updates. This is delayed to
an initcall because the FPU setup runs before jump labels are initialized.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/kernel/fpu/xstate.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index dcc1dd2d8f2d..81b40f119ce7 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -835,6 +835,12 @@ static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
 	fpu_user_cfg.max_size = legacy_size;
 	fpu_user_cfg.default_size = legacy_size;
 
+	/*
+	 * Prevent enabling the static branch which enables writes to the
+	 * XFD MSR.
+	 */
+	init_fpstate.xfd = 0;
+
 	fpstate_reset(&current->thread.fpu);
 }
 
@@ -918,6 +924,14 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	/* Store it for paranoia check at the end */
 	xfeatures = fpu_kernel_cfg.max_features;
 
+	/*
+	 * Initialize the default XFD state in initfp_state and enable the
+	 * dynamic sizing mechanism if dynamic states are available.  The
+	 * static key cannot be enabled here because this runs before
+	 * jump_label_init(). This is delayed to an initcall.
+	 */
+	init_fpstate.xfd = fpu_user_cfg.max_features & XFEATURE_MASK_USER_DYNAMIC;
+
 	/* Enable xstate instructions to be able to continue with initialization: */
 	fpu__init_cpu_xstate();
 	err = init_xstate_size();
@@ -1463,9 +1477,21 @@ void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor)
 }
 #endif /* CONFIG_X86_DEBUG_FPU */
 
+static int __init xfd_update_static_branch(void)
+{
+	/*
+	 * If init_fpstate.xfd has bits set then dynamic features are
+	 * available and the dynamic sizing must be enabled.
+	 */
+	if (init_fpstate.xfd)
+		static_branch_enable(&__fpu_state_size_dynamic);
+	return 0;
+}
+arch_initcall(xfd_update_static_branch)
+
 void fpstate_free(struct fpu *fpu)
 {
-	if (fpu->fpstate || fpu->fpstate != &fpu->__fpstate)
+	if (fpu->fpstate && fpu->fpstate != &fpu->__fpstate)
 		vfree(fpu->fpstate);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 23/23] x86/fpu/amx: Enable the AMX feature in 64-bit mode
  2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
                   ` (21 preceding siblings ...)
  2021-10-21 22:55 ` [PATCH 22/23] x86/fpu: Add XFD handling for dynamic states Chang S. Bae
@ 2021-10-21 22:55 ` Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  22 siblings, 1 reply; 60+ messages in thread
From: Chang S. Bae @ 2021-10-21 22:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the
TILE_DATA component to the dynamic states and update the permission check
table accordingly.

This is only effective on 64 bit kernels as for 32bit kernels
XFEATURE_MASK_TILE is defined as 0.

TILE_DATA is caller-saved state and the only dynamic state. Add build time
sanity check to ensure the assumption that every dynamic feature is caller-
saved.

Make AMX state depend on XFD as it is dynamic feature.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the tglx tree:
* Add compile time sanity check. (Dave Hansen)
* Make AMX depend on XFD.
---
 arch/x86/include/asm/fpu/xstate.h | 5 +++--
 arch/x86/kernel/cpu/cpuid-deps.c  | 1 +
 arch/x86/kernel/fpu/core.c        | 6 ++++++
 arch/x86/kernel/fpu/xstate.c      | 5 +++--
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 0c850def4d13..ffb08dc298bb 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -35,7 +35,8 @@
 				      XFEATURE_MASK_Hi16_ZMM	 | \
 				      XFEATURE_MASK_PKRU | \
 				      XFEATURE_MASK_BNDREGS | \
-				      XFEATURE_MASK_BNDCSR)
+				      XFEATURE_MASK_BNDCSR | \
+				      XFEATURE_MASK_XTILE)
 
 /*
  * Features which are restored when returning to user space.
@@ -46,7 +47,7 @@
 	(XFEATURE_MASK_USER_SUPPORTED & ~XFEATURE_MASK_PKRU)
 
 /* Features which are dynamically enabled for a process on request */
-#define XFEATURE_MASK_USER_DYNAMIC	0ULL
+#define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* All currently supported supervisor features */
 #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index d9ead9c20408..cb2fdd130aae 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -76,6 +76,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SGX1,			X86_FEATURE_SGX       },
 	{ X86_FEATURE_SGX2,			X86_FEATURE_SGX1      },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
+	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
 	{}
 };
 
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index f66beea2d1f8..323ae0c2a137 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -496,6 +496,12 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags)
 		return 0;
 	}
 
+	/*
+	 * If a new feature is added, ensure all dynamic features are
+	 * caller-saved from here!
+	 */
+	BUILD_BUG_ON(XFEATURE_MASK_USER_DYNAMIC != XFEATURE_MASK_XTILE_DATA);
+
 	/*
 	 * Save the default portion of the current FPU state into the
 	 * clone. Assume all dynamic features to be defined as caller-
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 81b40f119ce7..3195dc116fa8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -404,7 +404,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_PKRU |			\
 	 XFEATURE_MASK_BNDREGS |		\
 	 XFEATURE_MASK_BNDCSR |			\
-	 XFEATURE_MASK_PASID)
+	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_XTILE)
 
 /*
  * setup the xstate image representing the init state
@@ -1633,7 +1634,7 @@ static int __xstate_request_perm(u64 permitted, u64 requested)
  * Permissions array to map facilities with more than one component
  */
 static const u64 xstate_prctl_req[XFEATURE_MAX] = {
-	/* [XFEATURE_XTILE_DATA] = XFEATURE_MASK_XTILE, */
+	[XFEATURE_XTILE_DATA] = XFEATURE_MASK_XTILE_DATA,
 };
 
 static int xstate_request_perm(unsigned long idx)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/23] signal: Add an optional check for altstack size
  2021-10-21 22:55 ` [PATCH 01/23] signal: Add an optional check for altstack size Chang S. Bae
@ 2021-10-22  0:06   ` Bae, Chang Seok
  2021-10-22 15:20   ` Eric W. Biederman
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 60+ messages in thread
From: Bae, Chang Seok @ 2021-10-22  0:06 UTC (permalink / raw)
  To: oleg
  Cc: LKML, x86, Thomas Gleixner, Dave Hansen, Arjan van de Ven,
	Shankar, Ravi V, Eric W . Biederman

On Oct 21, 2021, at 15:55, Bae, Chang Seok <chang.seok.bae@intel.com> wrote:
> 
> Cc: Oleg Nesterov <ole@redhat.com>

Hi Oleg,

Apologies on my typo -- s/ole/oleg/

The original message can be found:
    https://lore.kernel.org/lkml/20211021225527.10184-2-chang.seok.bae@intel.com/

Thanks,
Chang

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/23] signal: Add an optional check for altstack size
  2021-10-21 22:55 ` [PATCH 01/23] signal: Add an optional check for altstack size Chang S. Bae
  2021-10-22  0:06   ` Bae, Chang Seok
@ 2021-10-22 15:20   ` Eric W. Biederman
  2021-10-22 20:58     ` Bae, Chang Seok
  2021-10-22 22:51     ` Dave Hansen
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2 siblings, 2 replies; 60+ messages in thread
From: Eric W. Biederman @ 2021-10-22 15:20 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, x86, tglx, dave.hansen, arjan, ravi.v.shankar,
	Oleg Nesterov, Al Viro

"Chang S. Bae" <chang.seok.bae@intel.com> writes:

> From: Thomas Gleixner <tglx@linutronix.de>
>
> The upcoming support for dynamically enabled FPU features on x86 requires
> an architecture specific sanity check and serialization of the store to
> task::sas_ss_size. The check is required to ensure that:
>
>    - Enabling of a dynamic feature, which changes the sigframe size fits
>      into an enabled sigaltstack
>
>    - Installing a too small sigaltstack after a dynamic feature has been
>      added is not possible.
>
> It needs serialization to prevent race conditions of all sorts in the
> feature enable code as that has to walk the thread list of the process.
>
> Add the infrastructure in form of a config option and provide empty stubs
> for architectures which do not need that.

Last I looked Al Viro was doing a lot of work on sigframes, adding him
to the cc.


That said description in the patch is very sorely lacking.

First the reason for the new locking is not really explained, it talks
about serialization but it does not talk about what is protected.
Especially given that the signal delivery code already has to check if
the signal frame on the stack when pushing a new signal I don't
understand what the code is trying to prevent.

Second the reason that 2K is not enough mentioned.  The current value of
MINSIGSTKSZ on x86 is 2K.

Third the issues with modifying the userspace ABI are not discussed.
Frankly that is a pretty big consideration.  MINSIGSTKSZ is exported to
userspace and userspace fundamentally needs to allocate the alternate
signal frame.

Forth the sigframe size on x86 is already dynamic and is already
computed by get_sigframe_size.

So can we please please please have a better description of what
is going on and the trade offs that are being made.

Thank you,
Eric




> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: Oleg Nesterov <ole@redhat.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  arch/Kconfig           |  3 +++
>  include/linux/signal.h |  6 ++++++
>  kernel/signal.c        | 35 +++++++++++++++++++++++++++++------
>  3 files changed, 38 insertions(+), 6 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 8df1c7102643..af5cf3009b4f 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1288,6 +1288,9 @@ config ARCH_HAS_ELFCORE_COMPAT
>  config ARCH_HAS_PARANOID_L1D_FLUSH
>  	bool
>  
> +config DYNAMIC_SIGFRAME
> +	bool
> +
>  source "kernel/gcov/Kconfig"
>  
>  source "scripts/gcc-plugins/Kconfig"
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index 3f96a6374e4f..7d34105e20c6 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -464,6 +464,12 @@ int __save_altstack(stack_t __user *, unsigned long);
>  	unsafe_put_user(t->sas_ss_size, &__uss->ss_size, label); \
>  } while (0);
>  
> +#ifdef CONFIG_DYNAMIC_SIGFRAME
> +bool sigaltstack_size_valid(size_t ss_size);
> +#else
> +static inline bool sigaltstack_size_valid(size_t size) { return true; }
> +#endif /* !CONFIG_DYNAMIC_SIGFRAME */
> +
>  #ifdef CONFIG_PROC_FS
>  struct seq_file;
>  extern void render_sigset_t(struct seq_file *, const char *, sigset_t *);
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 952741f6d0f9..9278f5291ed6 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -4151,11 +4151,29 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_DYNAMIC_SIGFRAME
> +static inline void sigaltstack_lock(void)
> +	__acquires(&current->sighand->siglock)
> +{
> +	spin_lock_irq(&current->sighand->siglock);
> +}
> +
> +static inline void sigaltstack_unlock(void)
> +	__releases(&current->sighand->siglock)
> +{
> +	spin_unlock_irq(&current->sighand->siglock);
> +}
> +#else
> +static inline void sigaltstack_lock(void) { }
> +static inline void sigaltstack_unlock(void) { }
> +#endif
> +
>  static int
>  do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
>  		size_t min_ss_size)
>  {
>  	struct task_struct *t = current;
> +	int ret = 0;
>  
>  	if (oss) {
>  		memset(oss, 0, sizeof(stack_t));
> @@ -4179,19 +4197,24 @@ do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
>  				ss_mode != 0))
>  			return -EINVAL;
>  
> +		sigaltstack_lock();
>  		if (ss_mode == SS_DISABLE) {
>  			ss_size = 0;
>  			ss_sp = NULL;
>  		} else {
>  			if (unlikely(ss_size < min_ss_size))
> -				return -ENOMEM;
> +				ret = -ENOMEM;
> +			if (!sigaltstack_size_valid(ss_size))
> +				ret = -ENOMEM;
>  		}
> -
> -		t->sas_ss_sp = (unsigned long) ss_sp;
> -		t->sas_ss_size = ss_size;
> -		t->sas_ss_flags = ss_flags;
> +		if (!ret) {
> +			t->sas_ss_sp = (unsigned long) ss_sp;
> +			t->sas_ss_size = ss_size;
> +			t->sas_ss_flags = ss_flags;
> +		}
> +		sigaltstack_unlock();
>  	}
> -	return 0;
> +	return ret;
>  }
>  
>  SYSCALL_DEFINE2(sigaltstack,const stack_t __user *,uss, stack_t __user *,uoss)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/23] signal: Add an optional check for altstack size
  2021-10-22 15:20   ` Eric W. Biederman
@ 2021-10-22 20:58     ` Bae, Chang Seok
  2021-10-22 22:51     ` Dave Hansen
  1 sibling, 0 replies; 60+ messages in thread
From: Bae, Chang Seok @ 2021-10-22 20:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: LKML, x86, Thomas Gleixner, Dave Hansen, Arjan van de Ven,
	Shankar, Ravi V, Oleg Nesterov, Al Viro

Hi Eric,

On Oct 22, 2021, at 08:20, Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
> "Chang S. Bae" <chang.seok.bae@intel.com> writes:
> 
>> From: Thomas Gleixner <tglx@linutronix.de>
>> 
>> The upcoming support for dynamically enabled FPU features on x86 requires
>> an architecture specific sanity check and serialization of the store to
>> task::sas_ss_size. The check is required to ensure that:
>> 
>>   - Enabling of a dynamic feature, which changes the sigframe size fits
>>     into an enabled sigaltstack
>> 
>>   - Installing a too small sigaltstack after a dynamic feature has been
>>     added is not possible.
>> 
>> It needs serialization to prevent race conditions of all sorts in the
>> feature enable code as that has to walk the thread list of the process.
>> 
>> Add the infrastructure in form of a config option and provide empty stubs
>> for architectures which do not need that.
> 
> Last I looked Al Viro was doing a lot of work on sigframes, adding him
> to the cc.
> 
> 
> That said description in the patch is very sorely lacking.

Will update the changelog though, let me clarify what you pointed out here 
at first.

> First the reason for the new locking is not really explained, it talks
> about serialization but it does not talk about what is protected.
> Especially given that the signal delivery code already has to check if
> the signal frame on the stack when pushing a new signal I don't
> understand what the code is trying to prevent.

Later in this series, a new x86-specific prctl() is introduced so that an 
application may ask permission to use dynamic states. It means the sigframe 
size is also dynamic. Besides that, in the implemented mechanism [1]:

    "Task asks for permission for a facility and kernel checks whether that's
     supported. If supported it does:
    ...
    3) Validate that no task has a sigaltstack installed
       which is smaller than the resulting sigframe size
    ...
    "

The new sigframe size out of new permission is validated via each thread’s 
altstack size. Accessing each task::sas_ss_size can be racy with
sigaltstack(). So, it is protected by sighand lock. (3) looks like this in a
nutshell:

	spin_lock_irq(&current->sighand->siglock);
	...
	int validate_sigaltstack(unsigned int usize) {
		struct task_struct *thread, *leader = current->group_leader;
		unsigned long framesize = get_sigframe_size();
		...
		for_each_thread(leader, thread) {
			if (thread->sas_ss_size && thread->sas_ss_size < framesize)
			return -ENOSPC;
		}
		...
	}
	...
	spin_unlock_irq(&current->sighand->siglock);

> Second the reason that 2K is not enough mentioned.  The current value of
> MINSIGSTKSZ on x86 is 2K.

The MINSIGSTKSZ constant is 2K but this is already too small in x86 with 
AVX512. Increasing this might break binaries that were already compiled with 
the old value. They used to work because their sigalstack was never used.

> Third the issues with modifying the userspace ABI are not discussed.
> Frankly that is a pretty big consideration.  MINSIGSTKSZ is exported to
> userspace and userspace fundamentally needs to allocate the alternate
> signal frame.

Now, there is an effort to redefine MINSIGSTKSZ as a dynamic value.
The latest glibc v2.34 has this [2]:

    "Add _SC_MINSIGSTKSZ and _SC_SIGSTKSZ. When _DYNAMIC_STACK_SIZE_SOURCE
     or _GNU_SOURCE are defined, MINSIGSTKSZ and SIGSTKSZ are no longer 
     constant on Linux. MINSIGSTKSZ is redefined to sysconf(_SC_MINSIGSTKSZ)
     and SIGSTKSZ is redefined to sysconf (_SC_SIGSTKSZ). This supports
     dynamic sized register sets for modern architectural features like
     Arm SVE."

> Forth the sigframe size on x86 is already dynamic and is already
> computed by get_sigframe_size.

Also, the x86 kernel supports exporting it via the AT_MINSIGSTKSZ aux vector
[4] since v5.14. I had developed the code with H.J. who authored the glibc
code [3].

> So can we please please please have a better description of what
> is going on and the trade offs that are being made.

Okay, but I think the dynamic MINSIGSTKSZ is not what this patch does, no?
Maybe the task::sas_ss_size part needs more clarity though.

Thanks,
Chang

[1] https://lore.kernel.org/lkml/20211021225527.10184-7-chang.seok.bae@intel.com/
[2] https://sourceware.org/pipermail/libc-alpha/2021-August/129718.html
[3] https://sourceware.org/git/?p=glibc.git;a=commit;h=6c57d320484988e87e446e2e60ce42816bf51d53
[4] https://lore.kernel.org/lkml/20210518200320.17239-1-chang.seok.bae@intel.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/23] signal: Add an optional check for altstack size
  2021-10-22 15:20   ` Eric W. Biederman
  2021-10-22 20:58     ` Bae, Chang Seok
@ 2021-10-22 22:51     ` Dave Hansen
  1 sibling, 0 replies; 60+ messages in thread
From: Dave Hansen @ 2021-10-22 22:51 UTC (permalink / raw)
  To: Eric W. Biederman, Chang S. Bae
  Cc: linux-kernel, x86, tglx, dave.hansen, arjan, ravi.v.shankar,
	Oleg Nesterov, Al Viro

Hi Eric,

> First the reason for the new locking is not really explained, it talks
> about serialization but it does not talk about what is protected.
> Especially given that the signal delivery code already has to check if
> the signal frame on the stack when pushing a new signal I don't
> understand what the code is trying to prevent.

Yeah, the basic idea is to ensure that there are no races between
do_sigaltstack() and enabling the new "dynamic features".

> Third the issues with modifying the userspace ABI are not discussed. 
> Frankly that is a pretty big consideration.  MINSIGSTKSZ is exported
> to userspace and userspace fundamentally needs to allocate the
> alternate signal frame.
Agreed.  This is what we settled on to both respect old programs that
have MINSIGSTKSZ=2k compiled in *and* support new ones that need bigger
stacks.

> Forth the sigframe size on x86 is already dynamic and is already
> computed by get_sigframe_size.

Right.  This is much more about making the altstack size checks dynamic
than the sigframe size.

> So can we please please please have a better description of what
> is going on and the trade offs that are being made.

I've got a suggested replacement changelog below.  Please let me know if
it clarifies things, or leaves anything out.

How would this be a better subject?

	signal: Add optional dynamic altstack size checks

And this for a changelog?

--

New x86 FPU features will be very large, requiring ~10k of stack in
signal handlers.  These new features require a new approach called
"dynamic features".

The kernel currently tries to ensure that altstacks are reasonably
sized.  Right now, on x86, sys_sigaltstack() requires a size of >=2k.
However, that 2k is a constant.  Simply raising that 2k requirement to
>10k for the new features would break existing apps which have a
compiled-in size of 2k.

Instead of universally enforcing a larger stack, prohibit a process from
using dynamic features without properly-sized altstacks.  This must be
enforced in two places:

 * A dynamic feature can not be enabled without an large-enough altstack
   for each process thread.
 * Once a dynamic feature is enabled, any request to install a too-small
   altstack will be rejected

The dynamic feature enabling code must examine each thread in a process
to ensure that the altstacks are large enough.  Add a new lock
(sigaltstack_lock()) to ensure that threads can not race and change
their altstack after being examined.

Add the infrastructure in form of a config option and provide empty
stubs for architectures which do not need dynamic altstack size checks.

This implementation will be fleshed out for x86 in:

	x86/arch_prctl: Add controls for dynamic XSTATE components

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 06/23] x86/arch_prctl: Add controls for dynamic XSTATE components
  2021-10-21 22:55 ` [PATCH 06/23] x86/arch_prctl: Add controls for dynamic XSTATE components Chang S. Bae
@ 2021-10-24 21:17   ` Borislav Petkov
  2021-10-26  9:11     ` [PATCH] Documentation/x86: Add documentation for using dynamic XSTATE features Chang S. Bae
  2021-10-26 16:16   ` [tip: x86/fpu] x86/arch_prctl: Add controls for dynamic XSTATE components tip-bot2 for Chang S. Bae
  1 sibling, 1 reply; 60+ messages in thread
From: Borislav Petkov @ 2021-10-24 21:17 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, dave.hansen, arjan, ravi.v.shankar

On Thu, Oct 21, 2021 at 03:55:10PM -0700, Chang S. Bae wrote:
> +/**
> + * fpu_xstate_prctl - xstate permission operations
> + * @tsk:	Redundant pointer to current
> + * @option:	A subfunction of arch_prctl()
> + * @arg2:	option argument
> + * Return:	0 if successful; otherwise, an error code
> + *
> + * Option arguments:
> + *
> + * ARCH_GET_XCOMP_SUPP: Pointer to user space u64 to store the info
> + * ARCH_GET_XCOMP_PERM: Pointer to user space u64 to store the info
> + * ARCH_REQ_XCOMP_PERM: Facility number requested
> + *
> + * For facilities which require more than one XSTATE component, the request
> + * must be the highest state component number related to that facility,
> + * e.g. for AMX which requires XFEATURE_XTILE_CFG(17) and
> + * XFEATURE_XTILE_DATA(18) this would be XFEATURE_XTILE_DATA(18).
> + */

Can I pls get this new set of prctl()s documented properly ontop of this
patchset?

I guess Documentation/x86/xstate.rst or so.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/23] x86/fpu: Add sanity checks for XFD
  2021-10-21 22:55 ` [PATCH 15/23] x86/fpu: Add sanity checks for XFD Chang S. Bae
@ 2021-10-25  8:11   ` Borislav Petkov
  2021-10-25 20:15     ` Thomas Gleixner
  2021-10-25  8:33   ` Mika Penttilä
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2 siblings, 1 reply; 60+ messages in thread
From: Borislav Petkov @ 2021-10-25  8:11 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, dave.hansen, arjan, ravi.v.shankar

On Thu, Oct 21, 2021 at 03:55:19PM -0700, Chang S. Bae wrote:
> @@ -217,12 +240,15 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
>   * Restore xstate from kernel space xsave area, return an error code instead of
>   * an exception.
>   */
> -static inline int os_xrstor_safe(struct xregs_state *xstate, u64 mask)
> +static inline int os_xrstor_safe(struct fpstate *fpstate, u64 mask)
>  {
> +	struct xregs_state *xstate = &fpstate->regs.xsave;
>  	u32 lmask = mask;
>  	u32 hmask = mask >> 32;
>  	int err;
>  
> +	/* Must enforce XFD update here */
> +

<--- something's missing here?

>  	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
>  		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
>  	else
> -- 
> 2.17.1
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/23] x86/fpu: Add sanity checks for XFD
  2021-10-21 22:55 ` [PATCH 15/23] x86/fpu: Add sanity checks for XFD Chang S. Bae
  2021-10-25  8:11   ` Borislav Petkov
@ 2021-10-25  8:33   ` Mika Penttilä
  2021-10-25 18:13     ` Thomas Gleixner
  2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
  2 siblings, 1 reply; 60+ messages in thread
From: Mika Penttilä @ 2021-10-25  8:33 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar



On 22.10.2021 1.55, Chang S. Bae wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Add debug functionality to ensure that the XFD MSR is up to date for XSAVE*
> and XRSTOR* operations.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> Changes from the tglx tree:
> * Rename the xstate ops validation helper. (Dave Hansen)
> * Update code comments. (Dave Hansen)
> ---
>  arch/x86/kernel/fpu/core.c   |  9 +++---
>  arch/x86/kernel/fpu/signal.c |  6 ++--
>  arch/x86/kernel/fpu/xstate.c | 55 ++++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/fpu/xstate.h | 34 +++++++++++++++++++---
>  4 files changed, 92 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index e92c9d79441c..f4b02ed47034 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -166,7 +166,7 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
>  		 */
>  		mask = fpu_kernel_cfg.max_features & mask;
>  
> -		os_xrstor(&fpstate->regs.xsave, mask);
> +		os_xrstor(fpstate, mask);
>  	} else {
>  		if (use_fxsr())
>  			fxrstor(&fpstate->regs.fxsave);
> @@ -537,7 +537,7 @@ void fpu__drop(struct fpu *fpu)
>  static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
>  {
>  	if (use_xsave())
> -		os_xrstor(&init_fpstate.regs.xsave, features_mask);
> +		os_xrstor(&init_fpstate, features_mask);
>  	else if (use_fxsr())
>  		fxrstor(&init_fpstate.regs.fxsave);
>  	else
> @@ -594,9 +594,8 @@ void fpu__clear_user_states(struct fpu *fpu)
>  	 * corresponding registers.
>  	 */
>  	if (xfeatures_mask_supervisor() &&
> -	    !fpregs_state_valid(fpu, smp_processor_id())) {
> -		os_xrstor(&fpu->fpstate->regs.xsave, xfeatures_mask_supervisor());
> -	}
> +	    !fpregs_state_valid(fpu, smp_processor_id()))
> +		os_xrstor_supervisor(fpu->fpstate);
>  
>  	/* Reset user states in registers. */
>  	restore_fpregs_from_init_fpstate(XFEATURE_MASK_USER_RESTORE);
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index d128eb581ffc..a937980fd02b 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -261,7 +261,7 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
>  			ret = fxrstor_from_user_sigframe(buf);
>  
>  		if (!ret && unlikely(init_bv))
> -			os_xrstor(&init_fpstate.regs.xsave, init_bv);
> +			os_xrstor(&init_fpstate, init_bv);
>  		return ret;
>  	} else if (use_fxsr()) {
>  		return fxrstor_from_user_sigframe(buf);
> @@ -322,7 +322,7 @@ static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
>  	 * been restored from a user buffer directly.
>  	 */
>  	if (test_thread_flag(TIF_NEED_FPU_LOAD) && xfeatures_mask_supervisor())
> -		os_xrstor(&fpu->fpstate->regs.xsave, xfeatures_mask_supervisor());
> +		os_xrstor_supervisor(fpu->fpstate);
>  
>  	fpregs_mark_activate();
>  	fpregs_unlock();
> @@ -432,7 +432,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
>  		u64 mask = user_xfeatures | xfeatures_mask_supervisor();
>  
>  		fpregs->xsave.header.xfeatures &= mask;
> -		success = !os_xrstor_safe(&fpregs->xsave,
> +		success = !os_xrstor_safe(fpu->fpstate,
>  					  fpu_kernel_cfg.max_features);
>  	} else {
>  		success = !fxrstor_safe(&fpregs->fxsave);
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index c178aa91abb3..c2cbe14aaa00 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1301,6 +1301,61 @@ EXPORT_SYMBOL_GPL(fpstate_clear_xstate_component);
>  #endif
>  
>  #ifdef CONFIG_X86_64
> +
> +#ifdef CONFIG_X86_DEBUG_FPU
> +/*
> + * Ensure that a subsequent XSAVE* or XRSTOR* instruction with RFBM=@mask
> + * can safely operate on the @fpstate buffer.
> + */
> +static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
> +{
> +	u64 xfd = __this_cpu_read(xfd_state);
> +
> +	if (fpstate->xfd == xfd)
> +		return true;
> +
> +	/* For current's fpstate the XFD state must be correct. */
> +	if (fpstate->xfd == current->thread.fpu.fpstate->xfd)
> +		return false;
> +
Should this return true or is the comment confusing?


> +	/*
> +	 * XRSTOR(S) from init_fpstate are always correct as it will just
> +	 * bring all components into init state and not read from the
> +	 * buffer. XSAVE(S) raises #PF after init.
> +	 */
> +	if (fpstate == &init_fpstate)
> +		return rstor;
> +
> +	/*
> +	 * XSAVE(S): clone(), fpu_swap_kvm_fpu()
> +	 * XRSTORS(S): fpu_swap_kvm_fpu()
> +	 */
> +
> +	/*
> +	 * No XSAVE/XRSTOR instructions (except XSAVE itself) touch
> +	 * the buffer area for XFD-disabled state components.
> +	 */
> +	mask &= ~xfd;
> +
> +	/*
> +	 * Remove features which are valid in fpstate. They
> +	 * have space allocated in fpstate.
> +	 */
> +	mask &= ~fpstate->xfeatures;
> +
> +	/*
> +	 * Any remaining state components in 'mask' might be written
> +	 * by XSAVE/XRSTOR. Fail validation it found.
> +	 */
> +	return !mask;
> +}
> +
> +void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor)
> +{
> +	WARN_ON_ONCE(!xstate_op_valid(fpstate, mask, rstor));
> +}
> +#endif /* CONFIG_X86_DEBUG_FPU */
> +
>  static int validate_sigaltstack(unsigned int usize)
>  {
>  	struct task_struct *thread, *leader = current->group_leader;
> diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
> index 32a4dee4de3b..29024244965b 100644
> --- a/arch/x86/kernel/fpu/xstate.h
> +++ b/arch/x86/kernel/fpu/xstate.h
> @@ -130,6 +130,12 @@ static inline u64 xfeatures_mask_independent(void)
>  		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
>  		     : "memory")
>  
> +#if defined(CONFIG_X86_64) && defined(CONFIG_X86_DEBUG_FPU)
> +extern void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor);
> +#else
> +static inline void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor) { }
> +#endif
> +
>  /*
>   * Save processor xstate to xsave area.
>   *
> @@ -144,6 +150,7 @@ static inline void os_xsave(struct fpstate *fpstate)
>  	int err;
>  
>  	WARN_ON_FPU(!alternatives_patched);
> +	xfd_validate_state(fpstate, mask, false);
>  
>  	XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
>  
> @@ -156,12 +163,23 @@ static inline void os_xsave(struct fpstate *fpstate)
>   *
>   * Uses XRSTORS when XSAVES is used, XRSTOR otherwise.
>   */
> -static inline void os_xrstor(struct xregs_state *xstate, u64 mask)
> +static inline void os_xrstor(struct fpstate *fpstate, u64 mask)
> +{
> +	u32 lmask = mask;
> +	u32 hmask = mask >> 32;
> +
> +	xfd_validate_state(fpstate, mask, true);
> +	XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
> +}
> +
> +/* Restore of supervisor state. Does not require XFD */
> +static inline void os_xrstor_supervisor(struct fpstate *fpstate)
>  {
> +	u64 mask = xfeatures_mask_supervisor();
>  	u32 lmask = mask;
>  	u32 hmask = mask >> 32;
>  
> -	XSTATE_XRESTORE(xstate, lmask, hmask);
> +	XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
>  }
>  
>  /*
> @@ -184,11 +202,14 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
>  	 * internally, e.g. PKRU. That's user space ABI and also required
>  	 * to allow the signal handler to modify PKRU.
>  	 */
> -	u64 mask = current->thread.fpu.fpstate->user_xfeatures;
> +	struct fpstate *fpstate = current->thread.fpu.fpstate;
> +	u64 mask = fpstate->user_xfeatures;
>  	u32 lmask = mask;
>  	u32 hmask = mask >> 32;
>  	int err;
>  
> +	xfd_validate_state(fpstate, mask, false);
> +
>  	stac();
>  	XSTATE_OP(XSAVE, buf, lmask, hmask, err);
>  	clac();
> @@ -206,6 +227,8 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
>  	u32 hmask = mask >> 32;
>  	int err;
>  
> +	xfd_validate_state(current->thread.fpu.fpstate, mask, true);
> +
>  	stac();
>  	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
>  	clac();
> @@ -217,12 +240,15 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
>   * Restore xstate from kernel space xsave area, return an error code instead of
>   * an exception.
>   */
> -static inline int os_xrstor_safe(struct xregs_state *xstate, u64 mask)
> +static inline int os_xrstor_safe(struct fpstate *fpstate, u64 mask)
>  {
> +	struct xregs_state *xstate = &fpstate->regs.xsave;
>  	u32 lmask = mask;
>  	u32 hmask = mask >> 32;
>  	int err;
>  
> +	/* Must enforce XFD update here */
> +
>  	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
>  		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
>  	else


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/23] x86/fpu: Add sanity checks for XFD
  2021-10-25  8:33   ` Mika Penttilä
@ 2021-10-25 18:13     ` Thomas Gleixner
  2021-10-25 19:57       ` Dave Hansen
  0 siblings, 1 reply; 60+ messages in thread
From: Thomas Gleixner @ 2021-10-25 18:13 UTC (permalink / raw)
  To: Mika Penttilä, Chang S. Bae, linux-kernel
  Cc: x86, dave.hansen, arjan, ravi.v.shankar

On Mon, Oct 25 2021 at 11:33, Mika Penttilä wrote:
> On 22.10.2021 1.55, Chang S. Bae wrote:
>> +#ifdef CONFIG_X86_DEBUG_FPU
>> +/*
>> + * Ensure that a subsequent XSAVE* or XRSTOR* instruction with RFBM=@mask
>> + * can safely operate on the @fpstate buffer.
>> + */
>> +static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
>> +{
>> +	u64 xfd = __this_cpu_read(xfd_state);
>> +
>> +	if (fpstate->xfd == xfd)
>> +		return true;
>> +
>> +	/* For current's fpstate the XFD state must be correct. */
>> +	if (fpstate->xfd == current->thread.fpu.fpstate->xfd)
>> +		return false;
>> +
> Should this return true or is the comment confusing?

Comment might be confusing. The logic here is:

If fpstate->xfd equal xfd then it's valid.

So the next check is whether fpstate is the same as current's
fpstate. If that's the case then the result is invalid because for
current's fpstate the first condition should be true. But if it is not
true then the state is not valid.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/23] x86/fpu: Add sanity checks for XFD
  2021-10-25 18:13     ` Thomas Gleixner
@ 2021-10-25 19:57       ` Dave Hansen
  0 siblings, 0 replies; 60+ messages in thread
From: Dave Hansen @ 2021-10-25 19:57 UTC (permalink / raw)
  To: Thomas Gleixner, Mika Penttilä, Chang S. Bae, linux-kernel
  Cc: x86, dave.hansen, arjan, ravi.v.shankar

On 10/25/21 11:13 AM, Thomas Gleixner wrote:
> On Mon, Oct 25 2021 at 11:33, Mika Penttilä wrote:
>> On 22.10.2021 1.55, Chang S. Bae wrote:
>>> +#ifdef CONFIG_X86_DEBUG_FPU
>>> +/*
>>> + * Ensure that a subsequent XSAVE* or XRSTOR* instruction with RFBM=@mask
>>> + * can safely operate on the @fpstate buffer.
>>> + */
>>> +static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
>>> +{
>>> +	u64 xfd = __this_cpu_read(xfd_state);
>>> +
>>> +	if (fpstate->xfd == xfd)
>>> +		return true;
>>> +
>>> +	/* For current's fpstate the XFD state must be correct. */
>>> +	if (fpstate->xfd == current->thread.fpu.fpstate->xfd)
>>> +		return false;
>>> +
>> Should this return true or is the comment confusing?
> Comment might be confusing. The logic here is:
> 
> If fpstate->xfd equal xfd then it's valid.
> 
> So the next check is whether fpstate is the same as current's
> fpstate. If that's the case then the result is invalid because for
> current's fpstate the first condition should be true. But if it is not
> true then the state is not valid.

Maybe I'm just a dummy today, but I spent way too much time looking at
that line of code.  Would a slightly longer comment help.  Perhaps:

	/*
	 * The XFD MSR doesn't match the passed-in fpstate.
 	 * Make sure it at least matches current's fpstate.
	 * If not, XFD was set up wrong and is invalid.
	 */

Below is an even more exhaustive look at it.  Feel free to ignore.

--

In most cases, we are doing a normal old context switch.  XFD is already
set up by this point and the @fpstate is pointed to by
current->thread.fpu.  fpstate->xfd==xfd, life is good and we skip the
rest of the checks.

If they don't match, we have some more checks to do.  Here's a
bit-by-bit walk through it.  Just like the hardware MSR at this point,
assume that only single bit can be set in any of the xfd's.

 MSR | fpstate | cur->fpstate | valid
-------------------------------------
   0 |	   0   |      	  x   |    1  // MSR matches @fpstate
   0 |	   1   |          0   |    1  // MSR matches cur->fpstate
   0 |	   1   |          1   |    0  <- *** MSR matches nothing!
   1 |	   0   |          0   |    0  <- *** MSR matches nothing!
   1 |	   0   |          1   |    1  // MSR matches cur->fpstate
   1 |	   1   |          x   |    1  // MSR matches @fpstate

 (BTW, "valid" here means either return true from earlier, or
  falling through to the later xstate_op_valid() checks)

If a bit is *CLEAR* in the MSR, then it better be *CLEAR* in either
fpstate->xfd or current->...fpstate->xfd.

If a bit is *SET*   in the MSR, then it better be *SET*   in either
fpstate->xfd or current->...fpstate->xfd.

It's bad if the MSR doesn't match *either* fpstate->xfd or
current->...fpstate->xfd.  Because, if it does not match, how did we get
here?  What random fpstate was the MSR set up to work with?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/23] x86/fpu: Add sanity checks for XFD
  2021-10-25  8:11   ` Borislav Petkov
@ 2021-10-25 20:15     ` Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Gleixner @ 2021-10-25 20:15 UTC (permalink / raw)
  To: Borislav Petkov, Chang S. Bae
  Cc: linux-kernel, x86, dave.hansen, arjan, ravi.v.shankar

On Mon, Oct 25 2021 at 10:11, Borislav Petkov wrote:

> On Thu, Oct 21, 2021 at 03:55:19PM -0700, Chang S. Bae wrote:
>> @@ -217,12 +240,15 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
>>   * Restore xstate from kernel space xsave area, return an error code instead of
>>   * an exception.
>>   */
>> -static inline int os_xrstor_safe(struct xregs_state *xstate, u64 mask)
>> +static inline int os_xrstor_safe(struct fpstate *fpstate, u64 mask)
>>  {
>> +	struct xregs_state *xstate = &fpstate->regs.xsave;
>>  	u32 lmask = mask;
>>  	u32 hmask = mask >> 32;
>>  	int err;
>>  
>> +	/* Must enforce XFD update here */
>> +
>
> <--- something's missing here?

That comment is replaced in the next patch which adds the actual XFD
update. I added it as a reminder. Could probably go away.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH] Documentation/x86: Add documentation for using dynamic XSTATE features
  2021-10-24 21:17   ` Borislav Petkov
@ 2021-10-26  9:11     ` Chang S. Bae
  2021-10-26 16:16       ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
  2021-10-28 13:10       ` tip-bot2 for Chang S. Bae
  0 siblings, 2 replies; 60+ messages in thread
From: Chang S. Bae @ 2021-10-26  9:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, dave.hansen, arjan, ravi.v.shankar, chang.seok.bae

Explain how dynamic XSTATE features can be enabled via the
architecture-specific prctl() along with dynamic sigframe size and
first use trap handling.

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 Documentation/x86/index.rst  |  1 +
 Documentation/x86/xstate.rst | 66 ++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)
 create mode 100644 Documentation/x86/xstate.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 383048396336..f498f1d36cd3 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -37,3 +37,4 @@ x86-specific Documentation
    sgx
    features
    elf_auxvec
+   xstate
diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst
new file mode 100644
index 000000000000..d02e9f10919e
--- /dev/null
+++ b/Documentation/x86/xstate.rst
@@ -0,0 +1,66 @@
+Using XSTATE features in user space applications
+================================================
+
+The x86 architecture supports floating-point extensions which are
+enumerated via CPUID. Applications consult CPUID and use XGETBV to
+evaluate which features have been enabled by the kernel XCR0.
+
+Up to AVX-512 and PKRU states, these features are automatically enabled by
+the kernel if available. Features like AMX TILE_DATA (XSTATE component 18)
+are enabled by XCR0 as well, but the first use of related instruction is
+trapped by the kernel because by default the required large XSTATE buffers
+are not allocated automatically.
+
+Using dynamically enabled XSTATE features in user space applications
+-------------------------------------------------------------------
+
+The kernel provides an arch_prctl(2) based mechanism for applications to
+request the usage of such features. The arch_prctl(2) options related to
+this are:
+
+-ARCH_GET_XCOMP_SUPP
+
+ arch_prctl(ARCH_GET_XCOMP_SUPP, &features);
+
+ ARCH_GET_XCOMP_SUPP stores the supported features in userspace storage of
+ type uint64_t. The second argument is a pointer to that storage.
+
+-ARCH_GET_XCOMP_PERM
+
+ arch_prctl(ARCH_GET_XCOMP_PERM, &features);
+
+ ARCH_GET_XCOMP_PERM stores the features for which the userspace process
+ has permission in userspace storage of type uint64_t. The second argument
+ is a pointer to that storage.
+
+-ARCH_REQ_XCOMP_PERM
+
+ arch_prctl(ARCH_REQ_XCOMP_PERM, feature_nr);
+
+ ARCH_REQ_XCOMP_PERM allows to request permission for a dynamically enabled
+ feature or a feature set. A feature set can be mapped to a facility, e.g.
+ AMX, and can require one or more XSTATE components to be enabled.
+
+ The feature argument is the number of the highest XSTATE component which
+ is required for a facility to work.
+
+When requesting permission for a feature, the kernel checks the
+availability. The kernel ensures that sigaltstacks in the process's tasks
+are large enough to accommodate the resulting large signal frame. It
+enforces this both during ARCH_REQ_XCOMP_SUPP and during any subsequent
+sigaltstack(2) calls. If an installed sigaltstack is smaller than the
+resulting sigframe size, ARCH_REQ_XCOMP_SUPP results in -ENOSUPP. Also,
+sigaltstack(2) results in -ENOMEM if the requested altstack is too small
+for the permitted features.
+
+Permission, when granted, is valid per process. Permissions are inherited
+on fork(2) and cleared on exec(3).
+
+The first use of an instruction related to a dynamically enabled feature is
+trapped by the kernel. The trap handler checks whether the process has
+permission to use the feature. If the process has no permission then the
+kernel sends SIGILL to the application. If the process has permission then
+the handler allocates a larger xstate buffer for the task so the large
+state can be context switched. In the unlikely cases that the allocation
+fails, the kernel sends SIGSEGV.
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] Documentation/x86: Add documentation for using dynamic XSTATE features
  2021-10-26  9:11     ` [PATCH] Documentation/x86: Add documentation for using dynamic XSTATE features Chang S. Bae
@ 2021-10-26 16:16       ` tip-bot2 for Chang S. Bae
  2021-10-28 13:10       ` tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     93175ec299f8418b415da8aabd9cc97506d49ab7
Gitweb:        https://git.kernel.org/tip/93175ec299f8418b415da8aabd9cc97506d49ab7
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Tue, 26 Oct 2021 02:11:57 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 11:31:56 +02:00

Documentation/x86: Add documentation for using dynamic XSTATE features

Explain how dynamic XSTATE features can be enabled via the
architecture-specific prctl() along with dynamic sigframe size and
first use trap handling.

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211026091157.16711-1-chang.seok.bae@intel.com
---
 Documentation/x86/index.rst  |  1 +-
 Documentation/x86/xstate.rst | 65 +++++++++++++++++++++++++++++++++++-
 2 files changed, 66 insertions(+)
 create mode 100644 Documentation/x86/xstate.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 3830483..f498f1d 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -37,3 +37,4 @@ x86-specific Documentation
    sgx
    features
    elf_auxvec
+   xstate
diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst
new file mode 100644
index 0000000..f6be368
--- /dev/null
+++ b/Documentation/x86/xstate.rst
@@ -0,0 +1,65 @@
+Using XSTATE features in user space applications
+================================================
+
+The x86 architecture supports floating-point extensions which are
+enumerated via CPUID. Applications consult CPUID and use XGETBV to
+evaluate which features have been enabled by the kernel XCR0.
+
+Up to AVX-512 and PKRU states, these features are automatically enabled by
+the kernel if available. Features like AMX TILE_DATA (XSTATE component 18)
+are enabled by XCR0 as well, but the first use of related instruction is
+trapped by the kernel because by default the required large XSTATE buffers
+are not allocated automatically.
+
+Using dynamically enabled XSTATE features in user space applications
+-------------------------------------------------------------------
+
+The kernel provides an arch_prctl(2) based mechanism for applications to
+request the usage of such features. The arch_prctl(2) options related to
+this are:
+
+-ARCH_GET_XCOMP_SUPP
+
+ arch_prctl(ARCH_GET_XCOMP_SUPP, &features);
+
+ ARCH_GET_XCOMP_SUPP stores the supported features in userspace storage of
+ type uint64_t. The second argument is a pointer to that storage.
+
+-ARCH_GET_XCOMP_PERM
+
+ arch_prctl(ARCH_GET_XCOMP_PERM, &features);
+
+ ARCH_GET_XCOMP_PERM stores the features for which the userspace process
+ has permission in userspace storage of type uint64_t. The second argument
+ is a pointer to that storage.
+
+-ARCH_REQ_XCOMP_PERM
+
+ arch_prctl(ARCH_REQ_XCOMP_PERM, feature_nr);
+
+ ARCH_REQ_XCOMP_PERM allows to request permission for a dynamically enabled
+ feature or a feature set. A feature set can be mapped to a facility, e.g.
+ AMX, and can require one or more XSTATE components to be enabled.
+
+ The feature argument is the number of the highest XSTATE component which
+ is required for a facility to work.
+
+When requesting permission for a feature, the kernel checks the
+availability. The kernel ensures that sigaltstacks in the process's tasks
+are large enough to accommodate the resulting large signal frame. It
+enforces this both during ARCH_REQ_XCOMP_SUPP and during any subsequent
+sigaltstack(2) calls. If an installed sigaltstack is smaller than the
+resulting sigframe size, ARCH_REQ_XCOMP_SUPP results in -ENOSUPP. Also,
+sigaltstack(2) results in -ENOMEM if the requested altstack is too small
+for the permitted features.
+
+Permission, when granted, is valid per process. Permissions are inherited
+on fork(2) and cleared on exec(3).
+
+The first use of an instruction related to a dynamically enabled feature is
+trapped by the kernel. The trap handler checks whether the process has
+permission to use the feature. If the process has no permission then the
+kernel sends SIGILL to the application. If the process has permission then
+the handler allocates a larger xstate buffer for the task so the large
+state can be context switched. In the unlikely cases that the allocation
+fails, the kernel sends SIGSEGV.

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu/amx: Enable the AMX feature in 64-bit mode
  2021-10-21 22:55 ` [PATCH 23/23] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     2308ee57d93d896618dd65c996429c9d3e469fe0
Gitweb:        https://git.kernel.org/tip/2308ee57d93d896618dd65c996429c9d3e469fe0
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:27 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:53:03 +02:00

x86/fpu/amx: Enable the AMX feature in 64-bit mode

Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the
TILE_DATA component to the dynamic states and update the permission check
table accordingly.

This is only effective on 64 bit kernels as for 32bit kernels
XFEATURE_MASK_TILE is defined as 0.

TILE_DATA is caller-saved state and the only dynamic state. Add build time
sanity check to ensure the assumption that every dynamic feature is caller-
saved.

Make AMX state depend on XFD as it is dynamic feature.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-24-chang.seok.bae@intel.com
---
 arch/x86/include/asm/fpu/xstate.h | 5 +++--
 arch/x86/kernel/cpu/cpuid-deps.c  | 1 +
 arch/x86/kernel/fpu/core.c        | 6 ++++++
 arch/x86/kernel/fpu/xstate.c      | 5 +++--
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 10adf13..0f8b90a 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -35,7 +35,8 @@
 				      XFEATURE_MASK_Hi16_ZMM	 | \
 				      XFEATURE_MASK_PKRU | \
 				      XFEATURE_MASK_BNDREGS | \
-				      XFEATURE_MASK_BNDCSR)
+				      XFEATURE_MASK_BNDCSR | \
+				      XFEATURE_MASK_XTILE)
 
 /*
  * Features which are restored when returning to user space.
@@ -46,7 +47,7 @@
 	(XFEATURE_MASK_USER_SUPPORTED & ~XFEATURE_MASK_PKRU)
 
 /* Features which are dynamically enabled for a process on request */
-#define XFEATURE_MASK_USER_DYNAMIC	0ULL
+#define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* All currently supported supervisor features */
 #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index d9ead9c..cb2fdd1 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -76,6 +76,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SGX1,			X86_FEATURE_SGX       },
 	{ X86_FEATURE_SGX2,			X86_FEATURE_SGX1      },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
+	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
 	{}
 };
 
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 12ca174..290836d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -495,6 +495,12 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags)
 	}
 
 	/*
+	 * If a new feature is added, ensure all dynamic features are
+	 * caller-saved from here!
+	 */
+	BUILD_BUG_ON(XFEATURE_MASK_USER_DYNAMIC != XFEATURE_MASK_XTILE_DATA);
+
+	/*
 	 * Save the default portion of the current FPU state into the
 	 * clone. Assume all dynamic features to be defined as caller-
 	 * saved, which enables skipping both the expansion of fpstate
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 987a07b..d288294 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -404,7 +404,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_PKRU |			\
 	 XFEATURE_MASK_BNDREGS |		\
 	 XFEATURE_MASK_BNDCSR |			\
-	 XFEATURE_MASK_PASID)
+	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_XTILE)
 
 /*
  * setup the xstate image representing the init state
@@ -1636,7 +1637,7 @@ static int __xstate_request_perm(u64 permitted, u64 requested)
  * Permissions array to map facilities with more than one component
  */
 static const u64 xstate_prctl_req[XFEATURE_MAX] = {
-	/* [XFEATURE_XTILE_DATA] = XFEATURE_MASK_XTILE, */
+	[XFEATURE_XTILE_DATA] = XFEATURE_MASK_XTILE_DATA,
 };
 
 static int xstate_request_perm(unsigned long idx)

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Add XFD handling for dynamic states
  2021-10-21 22:55 ` [PATCH 22/23] x86/fpu: Add XFD handling for dynamic states Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     db3e7321b4b84b1cb39598ff79b90d1252481378
Gitweb:        https://git.kernel.org/tip/db3e7321b4b84b1cb39598ff79b90d1252481378
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:26 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:53:03 +02:00

x86/fpu: Add XFD handling for dynamic states

To handle the dynamic sizing of buffers on first use the XFD MSR has to be
armed. Store the delta between the maximum available and the default
feature bits in init_fpstate where it can be retrieved for task creation.

If the delta is non zero then dynamic features are enabled. This needs also
to enable the static key which guards the XFD updates. This is delayed to
an initcall because the FPU setup runs before jump labels are initialized.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-23-chang.seok.bae@intel.com
---
 arch/x86/kernel/fpu/xstate.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index b0f6e9a..987a07b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -835,6 +835,12 @@ static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
 	fpu_user_cfg.max_size = legacy_size;
 	fpu_user_cfg.default_size = legacy_size;
 
+	/*
+	 * Prevent enabling the static branch which enables writes to the
+	 * XFD MSR.
+	 */
+	init_fpstate.xfd = 0;
+
 	fpstate_reset(&current->thread.fpu);
 }
 
@@ -918,6 +924,14 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	/* Store it for paranoia check at the end */
 	xfeatures = fpu_kernel_cfg.max_features;
 
+	/*
+	 * Initialize the default XFD state in initfp_state and enable the
+	 * dynamic sizing mechanism if dynamic states are available.  The
+	 * static key cannot be enabled here because this runs before
+	 * jump_label_init(). This is delayed to an initcall.
+	 */
+	init_fpstate.xfd = fpu_user_cfg.max_features & XFEATURE_MASK_USER_DYNAMIC;
+
 	/* Enable xstate instructions to be able to continue with initialization: */
 	fpu__init_cpu_xstate();
 	err = init_xstate_size();
@@ -1466,9 +1480,21 @@ void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor)
 }
 #endif /* CONFIG_X86_DEBUG_FPU */
 
+static int __init xfd_update_static_branch(void)
+{
+	/*
+	 * If init_fpstate.xfd has bits set then dynamic features are
+	 * available and the dynamic sizing must be enabled.
+	 */
+	if (init_fpstate.xfd)
+		static_branch_enable(&__fpu_state_size_dynamic);
+	return 0;
+}
+arch_initcall(xfd_update_static_branch)
+
 void fpstate_free(struct fpu *fpu)
 {
-	if (fpu->fpstate || fpu->fpstate != &fpu->__fpstate)
+	if (fpu->fpstate && fpu->fpstate != &fpu->__fpstate)
 		vfree(fpu->fpstate);
 }
 

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Calculate the default sizes independently
  2021-10-21 22:55 ` [PATCH 21/23] x86/fpu: Calculate the default sizes independently Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     2ae996e0c1a38ca57a52438ab9deec6761dcba62
Gitweb:        https://git.kernel.org/tip/2ae996e0c1a38ca57a52438ab9deec6761dcba62
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:25 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:53:02 +02:00

x86/fpu: Calculate the default sizes independently

When dynamically enabled states are supported the maximum and default sizes
for the kernel buffers and user space interfaces are not longer identical.

Put the necessary calculations in place which only take the default enabled
features into account.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-22-chang.seok.bae@intel.com
---
 arch/x86/kernel/fpu/xstate.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 3372da8..b0f6e9a 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -781,35 +781,40 @@ static bool __init is_supported_xstate_size(unsigned int test_xstate_size)
 static int __init init_xstate_size(void)
 {
 	/* Recompute the context size for enabled features: */
-	unsigned int user_size, kernel_size;
+	unsigned int user_size, kernel_size, kernel_default_size;
+	bool compacted = cpu_feature_enabled(X86_FEATURE_XSAVES);
 
 	/* Uncompacted user space size */
 	user_size = get_xsave_size_user();
 
 	/*
 	 * XSAVES kernel size includes supervisor states and
-	 * uses compacted format.
+	 * uses compacted format when available.
 	 *
 	 * XSAVE does not support supervisor states so
 	 * kernel and user size is identical.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
+	if (compacted)
 		kernel_size = get_xsaves_size_no_independent();
 	else
 		kernel_size = user_size;
 
-	/* Ensure we have the space to store all enabled features. */
-	if (!is_supported_xstate_size(kernel_size))
+	kernel_default_size =
+		xstate_calculate_size(fpu_kernel_cfg.default_features, compacted);
+
+	/* Ensure we have the space to store all default enabled features. */
+	if (!is_supported_xstate_size(kernel_default_size))
 		return -EINVAL;
 
 	if (!paranoid_xstate_size_valid(kernel_size))
 		return -EINVAL;
 
-	/* Keep it the same for now */
 	fpu_kernel_cfg.max_size = kernel_size;
-	fpu_kernel_cfg.default_size = kernel_size;
 	fpu_user_cfg.max_size = user_size;
-	fpu_user_cfg.default_size = user_size;
+
+	fpu_kernel_cfg.default_size = kernel_default_size;
+	fpu_user_cfg.default_size =
+		xstate_calculate_size(fpu_user_cfg.default_features, false);
 
 	return 0;
 }
@@ -894,15 +899,21 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
 	}
 
+	if (!cpu_feature_enabled(X86_FEATURE_XFD))
+		fpu_kernel_cfg.max_features &= ~XFEATURE_MASK_USER_DYNAMIC;
+
 	fpu_kernel_cfg.max_features &= XFEATURE_MASK_USER_SUPPORTED |
 			      XFEATURE_MASK_SUPERVISOR_SUPPORTED;
 
 	fpu_user_cfg.max_features = fpu_kernel_cfg.max_features;
 	fpu_user_cfg.max_features &= XFEATURE_MASK_USER_SUPPORTED;
 
-	/* Identical for now */
+	/* Clean out dynamic features from default */
 	fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
+	fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
+
 	fpu_user_cfg.default_features = fpu_user_cfg.max_features;
+	fpu_user_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
 
 	/* Store it for paranoia check at the end */
 	xfeatures = fpu_kernel_cfg.max_features;
@@ -913,6 +924,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	if (err)
 		goto out_disable;
 
+	/* Reset the state for the current task */
 	fpstate_reset(&current->thread.fpu);
 
 	/*

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu/amx: Define AMX state components and have it used for boot-time checks
  2021-10-21 22:55 ` [PATCH 20/23] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     eec2113eabd92b7bfbaf1033fa82dc8eb4951203
Gitweb:        https://git.kernel.org/tip/eec2113eabd92b7bfbaf1033fa82dc8eb4951203
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:24 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:53:02 +02:00

x86/fpu/amx: Define AMX state components and have it used for boot-time checks

The XSTATE initialization uses check_xstate_against_struct() to sanity
check the size of XSTATE-enabled features. AMX is a XSAVE-enabled feature,
and its size is not hard-coded but discoverable at run-time via CPUID.

The AMX state is composed of state components 17 and 18, which are all user
state components. The first component is the XTILECFG state of a 64-byte
tile-related control register. The state component 18, called XTILEDATA,
contains the actual tile data, and the state size varies on
implementations. The architectural maximum, as defined in the CPUID(0x1d,
1): EAX[15:0], is a byte less than 64KB. The first implementation supports
8KB.

Check the XTILEDATA state size dynamically. The feature introduces the new
tile register, TMM. Define one register struct only and read the number of
registers from CPUID. Cross-check the overall size with CPUID again.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-21-chang.seok.bae@intel.com
---
 arch/x86/include/asm/cpufeatures.h |  1 +-
 arch/x86/include/asm/fpu/types.h   | 32 ++++++++++++-
 arch/x86/include/asm/fpu/xstate.h  |  2 +-
 arch/x86/kernel/fpu/xstate.c       | 80 ++++++++++++++++++++++++++++-
 4 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ab7b3a2..d5b5f2a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -299,6 +299,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_AMX_TILE		(18*32+24) /* AMX tile Support */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index b189763..3c06c82 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -120,6 +120,9 @@ enum xfeature {
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
+	XFEATURE_RSRVD_COMP_16,
+	XFEATURE_XTILE_CFG,
+	XFEATURE_XTILE_DATA,
 
 	XFEATURE_MAX,
 };
@@ -136,12 +139,21 @@ enum xfeature {
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
+#define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
+#define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
 
 #define XFEATURE_MASK_FPSSE		(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512		(XFEATURE_MASK_OPMASK \
 					 | XFEATURE_MASK_ZMM_Hi256 \
 					 | XFEATURE_MASK_Hi16_ZMM)
 
+#ifdef CONFIG_X86_64
+# define XFEATURE_MASK_XTILE		(XFEATURE_MASK_XTILE_DATA \
+					 | XFEATURE_MASK_XTILE_CFG)
+#else
+# define XFEATURE_MASK_XTILE		(0)
+#endif
+
 #define FIRST_EXTENDED_XFEATURE	XFEATURE_YMM
 
 struct reg_128_bit {
@@ -153,6 +165,9 @@ struct reg_256_bit {
 struct reg_512_bit {
 	u8	regbytes[512/8];
 };
+struct reg_1024_byte {
+	u8	regbytes[1024];
+};
 
 /*
  * State component 2:
@@ -255,6 +270,23 @@ struct arch_lbr_state {
 	u64 ler_to;
 	u64 ler_info;
 	struct lbr_entry		entries[];
+};
+
+/*
+ * State component 17: 64-byte tile configuration register.
+ */
+struct xtile_cfg {
+	u64				tcfg[8];
+} __packed;
+
+/*
+ * State component 18: 1KB tile data register.
+ * Each register represents 16 64-byte rows of the matrix
+ * data. But the number of registers depends on the actual
+ * implementation.
+ */
+struct xtile_data {
+	struct reg_1024_byte		tmm;
 } __packed;
 
 /*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index b7b145c..10adf13 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -14,6 +14,8 @@
 
 #define XSTATE_CPUID		0x0000000d
 
+#define TILE_CPUID		0x0000001d
+
 #define FXSAVE_SIZE	512
 
 #define XSAVE_HDR_SIZE	    64
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e3d1898..3372da8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -51,6 +51,14 @@ static const char *xfeature_names[] =
 	"Protection Keys User registers",
 	"PASID state",
 	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"AMX Tile config"		,
+	"AMX Tile data"			,
+	"unknown xstate feature"	,
 };
 
 static unsigned short xsave_cpuid_features[] __initdata = {
@@ -65,6 +73,8 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
+	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
 
 static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -240,6 +250,8 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
+	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
 
 /*
@@ -523,6 +535,67 @@ static void __init __xstate_dump_leaves(void)
 	}								\
 } while (0)
 
+/**
+ * check_xtile_data_against_struct - Check tile data state size.
+ *
+ * Calculate the state size by multiplying the single tile size which is
+ * recorded in a C struct, and the number of tiles that the CPU informs.
+ * Compare the provided size with the calculation.
+ *
+ * @size:	The tile data state size
+ *
+ * Returns:	0 on success, -EINVAL on mismatch.
+ */
+static int __init check_xtile_data_against_struct(int size)
+{
+	u32 max_palid, palid, state_size;
+	u32 eax, ebx, ecx, edx;
+	u16 max_tile;
+
+	/*
+	 * Check the maximum palette id:
+	 *   eax: the highest numbered palette subleaf.
+	 */
+	cpuid_count(TILE_CPUID, 0, &max_palid, &ebx, &ecx, &edx);
+
+	/*
+	 * Cross-check each tile size and find the maximum number of
+	 * supported tiles.
+	 */
+	for (palid = 1, max_tile = 0; palid <= max_palid; palid++) {
+		u16 tile_size, max;
+
+		/*
+		 * Check the tile size info:
+		 *   eax[31:16]:  bytes per title
+		 *   ebx[31:16]:  the max names (or max number of tiles)
+		 */
+		cpuid_count(TILE_CPUID, palid, &eax, &ebx, &edx, &edx);
+		tile_size = eax >> 16;
+		max = ebx >> 16;
+
+		if (tile_size != sizeof(struct xtile_data)) {
+			pr_err("%s: struct is %zu bytes, cpu xtile %d bytes\n",
+			       __stringify(XFEATURE_XTILE_DATA),
+			       sizeof(struct xtile_data), tile_size);
+			__xstate_dump_leaves();
+			return -EINVAL;
+		}
+
+		if (max > max_tile)
+			max_tile = max;
+	}
+
+	state_size = sizeof(struct xtile_data) * max_tile;
+	if (size != state_size) {
+		pr_err("%s: calculated size is %u bytes, cpu state %d bytes\n",
+		       __stringify(XFEATURE_XTILE_DATA), state_size, size);
+		__xstate_dump_leaves();
+		return -EINVAL;
+	}
+	return 0;
+}
+
 /*
  * We have a C struct for each 'xstate'.  We need to ensure
  * that our software representation matches what the CPU
@@ -546,6 +619,11 @@ static bool __init check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
+	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
+
+	/* The tile data size varies between implementations. */
+	if (nr == XFEATURE_XTILE_DATA)
+		check_xtile_data_against_struct(sz);
 
 	/*
 	 * Make *SURE* to add any feature numbers in below if
@@ -555,7 +633,7 @@ static bool __init check_xstate_against_struct(int nr)
 	if ((nr < XFEATURE_YMM) ||
 	    (nr >= XFEATURE_MAX) ||
 	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_LBR))) {
+	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
 		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
 		XSTATE_WARN_ON(1);
 		return false;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers
  2021-10-21 22:55 ` [PATCH 19/23] x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Len Brown, Borislav Petkov, x86,
	linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     70c3f1671b0cbc386b387f1de33b7837e276a195
Gitweb:        https://git.kernel.org/tip/70c3f1671b0cbc386b387f1de33b7837e276a195
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:23 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:53:02 +02:00

x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers

The kernel checks at boot time which features are available by walking a
XSAVE feature table which contains the CPUID feature bit numbers which need
to be checked whether a feature is available on a CPU or not. So far the
feature numbers have been linear, but AMX will create a gap which the
current code cannot handle.

Make the table entries explicitly indexed and adjust the loop code
accordingly to prepare for that.

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-20-chang.seok.bae@intel.com
---
 arch/x86/kernel/fpu/xstate.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index db0bfc2..e3d1898 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -53,18 +53,18 @@ static const char *xfeature_names[] =
 	"unknown xstate feature"	,
 };
 
-static short xsave_cpuid_features[] __initdata = {
-	X86_FEATURE_FPU,
-	X86_FEATURE_XMM,
-	X86_FEATURE_AVX,
-	X86_FEATURE_MPX,
-	X86_FEATURE_MPX,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_INTEL_PT,
-	X86_FEATURE_PKU,
-	X86_FEATURE_ENQCMD,
+static unsigned short xsave_cpuid_features[] __initdata = {
+	[XFEATURE_FP]				= X86_FEATURE_FPU,
+	[XFEATURE_SSE]				= X86_FEATURE_XMM,
+	[XFEATURE_YMM]				= X86_FEATURE_AVX,
+	[XFEATURE_BNDREGS]			= X86_FEATURE_MPX,
+	[XFEATURE_BNDCSR]			= X86_FEATURE_MPX,
+	[XFEATURE_OPMASK]			= X86_FEATURE_AVX512F,
+	[XFEATURE_ZMM_Hi256]			= X86_FEATURE_AVX512F,
+	[XFEATURE_Hi16_ZMM]			= X86_FEATURE_AVX512F,
+	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
+	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
+	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
 };
 
 static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -809,7 +809,10 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	 * Clear XSAVE features that are disabled in the normal CPUID.
 	 */
 	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
-		if (!boot_cpu_has(xsave_cpuid_features[i]))
+		unsigned short cid = xsave_cpuid_features[i];
+
+		/* Careful: X86_FEATURE_FPU is 0! */
+		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
 			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
 	}
 

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu/xstate: Add fpstate_realloc()/free()
  2021-10-21 22:55 ` [PATCH 18/23] x86/fpu/xstate: Add fpstate_realloc()/free() Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     500afbf645a040a39e1af0dba2fdf6ebf224bd47
Gitweb:        https://git.kernel.org/tip/500afbf645a040a39e1af0dba2fdf6ebf224bd47
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:22 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:53:02 +02:00

x86/fpu/xstate: Add fpstate_realloc()/free()

The fpstate embedded in struct fpu is the default state for storing the FPU
registers. It's sized so that the default supported features can be stored.
For dynamically enabled features the register buffer is too small.

The #NM handler detects first use of a feature which is disabled in the
XFD MSR. After handling permission checks it recalculates the size for
kernel space and user space state and invokes fpstate_realloc() which
tries to reallocate fpstate and install it.

Provide the allocator function which checks whether the current buffer size
is sufficient and if not allocates one. If allocation is successful the new
fpstate is initialized with the new features and sizes and the now enabled
features is removed from the task's XFD mask.

realloc_fpstate() uses vzalloc(). If use of this mechanism grows to
re-allocate buffers larger than 64KB, a more sophisticated allocation
scheme that includes purpose-built reclaim capability might be justified.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-19-chang.seok.bae@intel.com
---
 arch/x86/include/asm/fpu/api.h |  7 ++-
 arch/x86/kernel/fpu/xstate.c   | 97 ++++++++++++++++++++++++++++++---
 arch/x86/kernel/process.c      | 10 +++-
 3 files changed, 106 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 798ae92..b7267b9 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -130,6 +130,13 @@ static inline void fpstate_init_soft(struct swregs_state *soft) {}
 /* State tracking */
 DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
+/* Process cleanup */
+#ifdef CONFIG_X86_64
+extern void fpstate_free(struct fpu *fpu);
+#else
+static inline void fpstate_free(struct fpu *fpu) { }
+#endif
+
 /* fpstate-related functions which are exported to KVM */
 extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature);
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 3d38558..db0bfc2 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -12,6 +12,7 @@
 #include <linux/pkeys.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
+#include <linux/vmalloc.h>
 
 #include <asm/fpu/api.h>
 #include <asm/fpu/regset.h>
@@ -22,6 +23,7 @@
 #include <asm/prctl.h>
 #include <asm/elf.h>
 
+#include "context.h"
 #include "internal.h"
 #include "legacy.h"
 #include "xstate.h"
@@ -1371,6 +1373,91 @@ void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor)
 }
 #endif /* CONFIG_X86_DEBUG_FPU */
 
+void fpstate_free(struct fpu *fpu)
+{
+	if (fpu->fpstate || fpu->fpstate != &fpu->__fpstate)
+		vfree(fpu->fpstate);
+}
+
+/**
+ * fpu_install_fpstate - Update the active fpstate in the FPU
+ *
+ * @fpu:	A struct fpu * pointer
+ * @newfps:	A struct fpstate * pointer
+ *
+ * Returns:	A null pointer if the last active fpstate is the embedded
+ *		one or the new fpstate is already installed;
+ *		otherwise, a pointer to the old fpstate which has to
+ *		be freed by the caller.
+ */
+static struct fpstate *fpu_install_fpstate(struct fpu *fpu,
+					   struct fpstate *newfps)
+{
+	struct fpstate *oldfps = fpu->fpstate;
+
+	if (fpu->fpstate == newfps)
+		return NULL;
+
+	fpu->fpstate = newfps;
+	return oldfps != &fpu->__fpstate ? oldfps : NULL;
+}
+
+/**
+ * fpstate_realloc - Reallocate struct fpstate for the requested new features
+ *
+ * @xfeatures:	A bitmap of xstate features which extend the enabled features
+ *		of that task
+ * @ksize:	The required size for the kernel buffer
+ * @usize:	The required size for user space buffers
+ *
+ * Note vs. vmalloc(): If the task with a vzalloc()-allocated buffer
+ * terminates quickly, vfree()-induced IPIs may be a concern, but tasks
+ * with large states are likely to live longer.
+ *
+ * Returns: 0 on success, -ENOMEM on allocation error.
+ */
+static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
+			   unsigned int usize)
+{
+	struct fpu *fpu = &current->thread.fpu;
+	struct fpstate *curfps, *newfps = NULL;
+	unsigned int fpsize;
+
+	curfps = fpu->fpstate;
+	fpsize = ksize + ALIGN(offsetof(struct fpstate, regs), 64);
+
+	newfps = vzalloc(fpsize);
+	if (!newfps)
+		return -ENOMEM;
+	newfps->size = ksize;
+	newfps->user_size = usize;
+	newfps->is_valloc = true;
+
+	fpregs_lock();
+	/*
+	 * Ensure that the current state is in the registers before
+	 * swapping fpstate as that might invalidate it due to layout
+	 * changes.
+	 */
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		fpregs_restore_userregs();
+
+	newfps->xfeatures = curfps->xfeatures | xfeatures;
+	newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
+	newfps->xfd = curfps->xfd & ~xfeatures;
+
+	curfps = fpu_install_fpstate(fpu, newfps);
+
+	/* Do the final updates within the locked region */
+	xstate_init_xcomp_bv(&newfps->regs.xsave, newfps->xfeatures);
+	xfd_update_state(newfps);
+
+	fpregs_unlock();
+
+	vfree(curfps);
+	return 0;
+}
+
 static int validate_sigaltstack(unsigned int usize)
 {
 	struct task_struct *thread, *leader = current->group_leader;
@@ -1393,7 +1480,8 @@ static int __xstate_request_perm(u64 permitted, u64 requested)
 	/*
 	 * This deliberately does not exclude !XSAVES as we still might
 	 * decide to optionally context switch XCR0 or talk the silicon
-	 * vendors into extending XFD for the pre AMX states.
+	 * vendors into extending XFD for the pre AMX states, especially
+	 * AVX512.
 	 */
 	bool compacted = cpu_feature_enabled(X86_FEATURE_XSAVES);
 	struct fpu *fpu = &current->group_leader->thread.fpu;
@@ -1465,13 +1553,6 @@ static int xstate_request_perm(unsigned long idx)
 	return ret;
 }
 
-/* Place holder for now */
-static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
-			   unsigned int usize)
-{
-	return -ENOMEM;
-}
-
 int xfd_enable_feature(u64 xfd_err)
 {
 	u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 99025e3..f3f2517 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -32,6 +32,7 @@
 #include <asm/mwait.h>
 #include <asm/fpu/api.h>
 #include <asm/fpu/sched.h>
+#include <asm/fpu/xstate.h>
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
 #include <asm/tlbflush.h>
@@ -90,9 +91,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 #endif
 	/* Drop the copied pointer to current's fpstate */
 	dst->thread.fpu.fpstate = NULL;
+
 	return 0;
 }
 
+#ifdef CONFIG_X86_64
+void arch_release_task_struct(struct task_struct *tsk)
+{
+	if (fpu_state_size_dynamic())
+		fpstate_free(&tsk->thread.fpu);
+}
+#endif
+
 /*
  * Free thread data structures etc..
  */

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu/xstate: Add XFD #NM handler
  2021-10-21 22:55 ` [PATCH 17/23] x86/fpu/xstate: Add XFD #NM handler Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     783e87b404956f8958657aed8a6a72aa98d5b7e1
Gitweb:        https://git.kernel.org/tip/783e87b404956f8958657aed8a6a72aa98d5b7e1
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:21 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:53:02 +02:00

x86/fpu/xstate: Add XFD #NM handler

If the XFD MSR has feature bits set then #NM will be raised when user space
attempts to use an instruction related to one of these features.

When the task has no permissions to use that feature, raise SIGILL, which
is the same behavior as #UD.

If the task has permissions, calculate the new buffer size for the extended
feature set and allocate a larger fpstate. In the unlikely case that
vzalloc() fails, SIGSEGV is raised.

The allocation function will be added in the next step. Provide a stub
which fails for now.

  [ tglx: Updated serialization ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com
---
 arch/x86/include/asm/fpu/xstate.h |  2 +-
 arch/x86/kernel/fpu/xstate.c      | 47 ++++++++++++++++++++++++++++++-
 arch/x86/kernel/traps.c           | 38 ++++++++++++++++++++++++-
 3 files changed, 87 insertions(+)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index cf28546..b7b145c 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -99,6 +99,8 @@ int xfeature_size(int xfeature_nr);
 void xsaves(struct xregs_state *xsave, u64 mask);
 void xrstors(struct xregs_state *xsave, u64 mask);
 
+int xfd_enable_feature(u64 xfd_err);
+
 #ifdef CONFIG_X86_64
 DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
 #endif
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 77739b0..3d38558 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1464,6 +1464,53 @@ static int xstate_request_perm(unsigned long idx)
 	spin_unlock_irq(&current->sighand->siglock);
 	return ret;
 }
+
+/* Place holder for now */
+static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
+			   unsigned int usize)
+{
+	return -ENOMEM;
+}
+
+int xfd_enable_feature(u64 xfd_err)
+{
+	u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC;
+	unsigned int ksize, usize;
+	struct fpu *fpu;
+
+	if (!xfd_event) {
+		pr_err_once("XFD: Invalid xfd error: %016llx\n", xfd_err);
+		return 0;
+	}
+
+	/* Protect against concurrent modifications */
+	spin_lock_irq(&current->sighand->siglock);
+
+	/* If not permitted let it die */
+	if ((xstate_get_host_group_perm() & xfd_event) != xfd_event) {
+		spin_unlock_irq(&current->sighand->siglock);
+		return -EPERM;
+	}
+
+	fpu = &current->group_leader->thread.fpu;
+	ksize = fpu->perm.__state_size;
+	usize = fpu->perm.__user_state_size;
+	/*
+	 * The feature is permitted. State size is sufficient.  Dropping
+	 * the lock is safe here even if more features are added from
+	 * another task, the retrieved buffer sizes are valid for the
+	 * currently requested feature(s).
+	 */
+	spin_unlock_irq(&current->sighand->siglock);
+
+	/*
+	 * Try to allocate a new fpstate. If that fails there is no way
+	 * out.
+	 */
+	if (fpstate_realloc(xfd_event, ksize, usize))
+		return -EFAULT;
+	return 0;
+}
 #else /* CONFIG_X86_64 */
 static inline int xstate_request_perm(unsigned long idx)
 {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bae7582..6ca1454 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1108,10 +1108,48 @@ DEFINE_IDTENTRY(exc_spurious_interrupt_bug)
 	 */
 }
 
+static bool handle_xfd_event(struct pt_regs *regs)
+{
+	u64 xfd_err;
+	int err;
+
+	if (!IS_ENABLED(CONFIG_X86_64) || !cpu_feature_enabled(X86_FEATURE_XFD))
+		return false;
+
+	rdmsrl(MSR_IA32_XFD_ERR, xfd_err);
+	if (!xfd_err)
+		return false;
+
+	wrmsrl(MSR_IA32_XFD_ERR, 0);
+
+	/* Die if that happens in kernel space */
+	if (WARN_ON(!user_mode(regs)))
+		return false;
+
+	local_irq_enable();
+
+	err = xfd_enable_feature(xfd_err);
+
+	switch (err) {
+	case -EPERM:
+		force_sig_fault(SIGILL, ILL_ILLOPC, error_get_trap_addr(regs));
+		break;
+	case -EFAULT:
+		force_sig(SIGSEGV);
+		break;
+	}
+
+	local_irq_disable();
+	return true;
+}
+
 DEFINE_IDTENTRY(exc_device_not_available)
 {
 	unsigned long cr0 = read_cr0();
 
+	if (handle_xfd_event(regs))
+		return;
+
 #ifdef CONFIG_MATH_EMULATION
 	if (!boot_cpu_has(X86_FEATURE_FPU) && (cr0 & X86_CR0_EM)) {
 		struct math_emu_info info = { };

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Update XFD state where required
  2021-10-21 22:55 ` [PATCH 16/23] x86/fpu: Update XFD state where required Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     672365477ae8afca5a1cca98c1deb733235e4525
Gitweb:        https://git.kernel.org/tip/672365477ae8afca5a1cca98c1deb733235e4525
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:20 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:53:02 +02:00

x86/fpu: Update XFD state where required

The IA32_XFD_MSR allows to arm #NM traps for XSTATE components which are
enabled in XCR0. The register has to be restored before the tasks XSTATE is
restored. The life time rules are the same as for FPU state.

XFD is updated on return to userspace only when the FPU state of the task
is not up to date in the registers. It's updated before the XRSTORS so
that eventually enabled dynamic features are restored as well and not
brought into init state.

Also in signal handling for restoring FPU state from user space the
correctness of the XFD state has to be ensured.

Add it to CPU initialization and resume as well.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-17-chang.seok.bae@intel.com
---
 arch/x86/kernel/fpu/context.h |  2 ++
 arch/x86/kernel/fpu/core.c    | 28 +++++++++++++++++++++++++++-
 arch/x86/kernel/fpu/signal.c  |  2 ++
 arch/x86/kernel/fpu/xstate.c  | 12 ++++++++++++
 arch/x86/kernel/fpu/xstate.h  | 19 ++++++++++++++++++-
 5 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index a06ebf3..958accf 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -69,6 +69,8 @@ static inline void fpregs_restore_userregs(void)
 		 * correct because it was either set in switch_to() or in
 		 * flush_thread(). So it is excluded because it might be
 		 * not up to date in current->thread.fpu.xsave state.
+		 *
+		 * XFD state is handled in restore_fpregs_from_fpstate().
 		 */
 		restore_fpregs_from_fpstate(fpu->fpstate, XFEATURE_MASK_FPSTATE);
 
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index b5f5b08..12ca174 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -156,6 +156,23 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
 
 	if (use_xsave()) {
 		/*
+		 * Dynamically enabled features are enabled in XCR0, but
+		 * usage requires also that the corresponding bits in XFD
+		 * are cleared.  If the bits are set then using a related
+		 * instruction will raise #NM. This allows to do the
+		 * allocation of the larger FPU buffer lazy from #NM or if
+		 * the task has no permission to kill it which would happen
+		 * via #UD if the feature is disabled in XCR0.
+		 *
+		 * XFD state is following the same life time rules as
+		 * XSTATE and to restore state correctly XFD has to be
+		 * updated before XRSTORS otherwise the component would
+		 * stay in or go into init state even if the bits are set
+		 * in fpstate::regs::xsave::xfeatures.
+		 */
+		xfd_update_state(fpstate);
+
+		/*
 		 * Restoring state always needs to modify all features
 		 * which are in @mask even if the current task cannot use
 		 * extended features.
@@ -241,8 +258,17 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
 
 	cur_fps = fpu->fpstate;
 
-	if (!cur_fps->is_confidential)
+	if (!cur_fps->is_confidential) {
+		/* Includes XFD update */
 		restore_fpregs_from_fpstate(cur_fps, XFEATURE_MASK_FPSTATE);
+	} else {
+		/*
+		 * XSTATE is restored by firmware from encrypted
+		 * memory. Make sure XFD state is correct while
+		 * running with guest fpstate
+		 */
+		xfd_update_state(cur_fps);
+	}
 
 	fpregs_mark_activate();
 	fpregs_unlock();
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 16fdecd..cc977da 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -282,6 +282,8 @@ static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
 
 retry:
 	fpregs_lock();
+	/* Ensure that XFD is up to date */
+	xfd_update_state(fpu->fpstate);
 	pagefault_disable();
 	ret = __restore_fpregs_from_user(buf, fpu->fpstate->user_xfeatures,
 					 xrestore, fx_only);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 603edeb..77739b0 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -137,6 +137,15 @@ void fpu__init_cpu_xstate(void)
 	cr4_set_bits(X86_CR4_OSXSAVE);
 
 	/*
+	 * Must happen after CR4 setup and before xsetbv() to allow KVM
+	 * lazy passthrough.  Write independent of the dynamic state static
+	 * key as that does not work on the boot CPU. This also ensures
+	 * that any stale state is wiped out from XFD.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_XFD))
+		wrmsrl(MSR_IA32_XFD, init_fpstate.xfd);
+
+	/*
 	 * XCR_XFEATURE_ENABLED_MASK (aka. XCR0) sets user features
 	 * managed by XSAVE{C, OPT, S} and XRSTOR{S}.  Only XSAVE user
 	 * states can be set here.
@@ -875,6 +884,9 @@ void fpu__resume_cpu(void)
 		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor()  |
 				     xfeatures_mask_independent());
 	}
+
+	if (fpu_state_size_dynamic())
+		wrmsrl(MSR_IA32_XFD, current->thread.fpu.fpstate->xfd);
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 2902424..e18210d 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -136,6 +136,22 @@ extern void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor);
 static inline void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor) { }
 #endif
 
+#ifdef CONFIG_X86_64
+static inline void xfd_update_state(struct fpstate *fpstate)
+{
+	if (fpu_state_size_dynamic()) {
+		u64 xfd = fpstate->xfd;
+
+		if (__this_cpu_read(xfd_state) != xfd) {
+			wrmsrl(MSR_IA32_XFD, xfd);
+			__this_cpu_write(xfd_state, xfd);
+		}
+	}
+}
+#else
+static inline void xfd_update_state(struct fpstate *fpstate) { }
+#endif
+
 /*
  * Save processor xstate to xsave area.
  *
@@ -247,7 +263,8 @@ static inline int os_xrstor_safe(struct fpstate *fpstate, u64 mask)
 	u32 hmask = mask >> 32;
 	int err;
 
-	/* Must enforce XFD update here */
+	/* Ensure that XFD is up to date */
+	xfd_update_state(fpstate);
 
 	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
 		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Add XFD state to fpstate
  2021-10-21 22:55 ` [PATCH 14/23] x86/fpu: Add XFD state to fpstate Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     8bf26758ca9659866b844dd51037314b4c0fa6bd
Gitweb:        https://git.kernel.org/tip/8bf26758ca9659866b844dd51037314b4c0fa6bd
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:18 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/fpu: Add XFD state to fpstate

Add storage for XFD register state to struct fpstate. This will be used to
store the XFD MSR state. This will be used for switching the XFD MSR when
FPU content is restored.

Add a per-CPU variable to cache the current MSR value so the MSR has only
to be written when the values are different.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-15-chang.seok.bae@intel.com
---
 arch/x86/include/asm/fpu/types.h | 3 +++
 arch/x86/kernel/fpu/core.c       | 2 ++
 arch/x86/kernel/fpu/xstate.h     | 4 ++++
 3 files changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 595122f..b189763 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -322,6 +322,9 @@ struct fpstate {
 	/* @user_xfeatures:	xfeatures valid in UABI buffers */
 	u64			user_xfeatures;
 
+	/* @xfd:		xfeatures disabled to trap userspace use. */
+	u64			xfd;
+
 	/* @is_valloc:		Indicator for dynamically allocated state */
 	unsigned int		is_valloc	: 1;
 
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3349068..3b72cdd 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -27,6 +27,7 @@
 
 #ifdef CONFIG_X86_64
 DEFINE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
+DEFINE_PER_CPU(u64, xfd_state);
 #endif
 
 /* The FPU state configuration data for kernel and user space */
@@ -409,6 +410,7 @@ static void __fpstate_reset(struct fpstate *fpstate)
 	fpstate->user_size	= fpu_user_cfg.default_size;
 	fpstate->xfeatures	= fpu_kernel_cfg.default_features;
 	fpstate->user_xfeatures	= fpu_user_cfg.default_features;
+	fpstate->xfd		= init_fpstate.xfd;
 }
 
 void fpstate_reset(struct fpu *fpu)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 4ce1dc0..32a4dee 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -5,6 +5,10 @@
 #include <asm/cpufeature.h>
 #include <asm/fpu/xstate.h>
 
+#ifdef CONFIG_X86_64
+DECLARE_PER_CPU(u64, xfd_state);
+#endif
+
 static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
 {
 	/*

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Add sanity checks for XFD
  2021-10-21 22:55 ` [PATCH 15/23] x86/fpu: Add sanity checks for XFD Chang S. Bae
  2021-10-25  8:11   ` Borislav Petkov
  2021-10-25  8:33   ` Mika Penttilä
@ 2021-10-26 16:16   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     5529acf47ec31ece0815f69d43f5e6a1e485a0f3
Gitweb:        https://git.kernel.org/tip/5529acf47ec31ece0815f69d43f5e6a1e485a0f3
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 21 Oct 2021 15:55:19 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:52:35 +02:00

x86/fpu: Add sanity checks for XFD

Add debug functionality to ensure that the XFD MSR is up to date for XSAVE*
and XRSTOR* operations.

 [ tglx: Improve comment. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-16-chang.seok.bae@intel.com
---
 arch/x86/kernel/fpu/core.c   |  9 ++---
 arch/x86/kernel/fpu/signal.c |  6 ++--
 arch/x86/kernel/fpu/xstate.c | 58 +++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/fpu/xstate.h | 34 ++++++++++++++++++---
 4 files changed, 95 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3b72cdd..b5f5b08 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -166,7 +166,7 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
 		 */
 		mask = fpu_kernel_cfg.max_features & mask;
 
-		os_xrstor(&fpstate->regs.xsave, mask);
+		os_xrstor(fpstate, mask);
 	} else {
 		if (use_fxsr())
 			fxrstor(&fpstate->regs.fxsave);
@@ -534,7 +534,7 @@ void fpu__drop(struct fpu *fpu)
 static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
 {
 	if (use_xsave())
-		os_xrstor(&init_fpstate.regs.xsave, features_mask);
+		os_xrstor(&init_fpstate, features_mask);
 	else if (use_fxsr())
 		fxrstor(&init_fpstate.regs.fxsave);
 	else
@@ -591,9 +591,8 @@ void fpu__clear_user_states(struct fpu *fpu)
 	 * corresponding registers.
 	 */
 	if (xfeatures_mask_supervisor() &&
-	    !fpregs_state_valid(fpu, smp_processor_id())) {
-		os_xrstor(&fpu->fpstate->regs.xsave, xfeatures_mask_supervisor());
-	}
+	    !fpregs_state_valid(fpu, smp_processor_id()))
+		os_xrstor_supervisor(fpu->fpstate);
 
 	/* Reset user states in registers. */
 	restore_fpregs_from_init_fpstate(XFEATURE_MASK_USER_RESTORE);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 3b7f7d0..16fdecd 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -261,7 +261,7 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
 			ret = fxrstor_from_user_sigframe(buf);
 
 		if (!ret && unlikely(init_bv))
-			os_xrstor(&init_fpstate.regs.xsave, init_bv);
+			os_xrstor(&init_fpstate, init_bv);
 		return ret;
 	} else if (use_fxsr()) {
 		return fxrstor_from_user_sigframe(buf);
@@ -322,7 +322,7 @@ retry:
 	 * been restored from a user buffer directly.
 	 */
 	if (test_thread_flag(TIF_NEED_FPU_LOAD) && xfeatures_mask_supervisor())
-		os_xrstor(&fpu->fpstate->regs.xsave, xfeatures_mask_supervisor());
+		os_xrstor_supervisor(fpu->fpstate);
 
 	fpregs_mark_activate();
 	fpregs_unlock();
@@ -432,7 +432,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 		u64 mask = user_xfeatures | xfeatures_mask_supervisor();
 
 		fpregs->xsave.header.xfeatures &= mask;
-		success = !os_xrstor_safe(&fpregs->xsave,
+		success = !os_xrstor_safe(fpu->fpstate,
 					  fpu_kernel_cfg.max_features);
 	} else {
 		success = !fxrstor_safe(&fpregs->fxsave);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index bf42ee2..603edeb 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1301,6 +1301,64 @@ EXPORT_SYMBOL_GPL(fpstate_clear_xstate_component);
 #endif
 
 #ifdef CONFIG_X86_64
+
+#ifdef CONFIG_X86_DEBUG_FPU
+/*
+ * Ensure that a subsequent XSAVE* or XRSTOR* instruction with RFBM=@mask
+ * can safely operate on the @fpstate buffer.
+ */
+static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
+{
+	u64 xfd = __this_cpu_read(xfd_state);
+
+	if (fpstate->xfd == xfd)
+		return true;
+
+	 /*
+	  * The XFD MSR does not match fpstate->xfd. That's invalid when
+	  * the passed in fpstate is current's fpstate.
+	  */
+	if (fpstate->xfd == current->thread.fpu.fpstate->xfd)
+		return false;
+
+	/*
+	 * XRSTOR(S) from init_fpstate are always correct as it will just
+	 * bring all components into init state and not read from the
+	 * buffer. XSAVE(S) raises #PF after init.
+	 */
+	if (fpstate == &init_fpstate)
+		return rstor;
+
+	/*
+	 * XSAVE(S): clone(), fpu_swap_kvm_fpu()
+	 * XRSTORS(S): fpu_swap_kvm_fpu()
+	 */
+
+	/*
+	 * No XSAVE/XRSTOR instructions (except XSAVE itself) touch
+	 * the buffer area for XFD-disabled state components.
+	 */
+	mask &= ~xfd;
+
+	/*
+	 * Remove features which are valid in fpstate. They
+	 * have space allocated in fpstate.
+	 */
+	mask &= ~fpstate->xfeatures;
+
+	/*
+	 * Any remaining state components in 'mask' might be written
+	 * by XSAVE/XRSTOR. Fail validation it found.
+	 */
+	return !mask;
+}
+
+void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor)
+{
+	WARN_ON_ONCE(!xstate_op_valid(fpstate, mask, rstor));
+}
+#endif /* CONFIG_X86_DEBUG_FPU */
+
 static int validate_sigaltstack(unsigned int usize)
 {
 	struct task_struct *thread, *leader = current->group_leader;
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 32a4dee..2902424 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -130,6 +130,12 @@ static inline u64 xfeatures_mask_independent(void)
 		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
 		     : "memory")
 
+#if defined(CONFIG_X86_64) && defined(CONFIG_X86_DEBUG_FPU)
+extern void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor);
+#else
+static inline void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor) { }
+#endif
+
 /*
  * Save processor xstate to xsave area.
  *
@@ -144,6 +150,7 @@ static inline void os_xsave(struct fpstate *fpstate)
 	int err;
 
 	WARN_ON_FPU(!alternatives_patched);
+	xfd_validate_state(fpstate, mask, false);
 
 	XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
 
@@ -156,12 +163,23 @@ static inline void os_xsave(struct fpstate *fpstate)
  *
  * Uses XRSTORS when XSAVES is used, XRSTOR otherwise.
  */
-static inline void os_xrstor(struct xregs_state *xstate, u64 mask)
+static inline void os_xrstor(struct fpstate *fpstate, u64 mask)
+{
+	u32 lmask = mask;
+	u32 hmask = mask >> 32;
+
+	xfd_validate_state(fpstate, mask, true);
+	XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
+}
+
+/* Restore of supervisor state. Does not require XFD */
+static inline void os_xrstor_supervisor(struct fpstate *fpstate)
 {
+	u64 mask = xfeatures_mask_supervisor();
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 
-	XSTATE_XRESTORE(xstate, lmask, hmask);
+	XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
 }
 
 /*
@@ -184,11 +202,14 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
 	 * internally, e.g. PKRU. That's user space ABI and also required
 	 * to allow the signal handler to modify PKRU.
 	 */
-	u64 mask = current->thread.fpu.fpstate->user_xfeatures;
+	struct fpstate *fpstate = current->thread.fpu.fpstate;
+	u64 mask = fpstate->user_xfeatures;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
 
+	xfd_validate_state(fpstate, mask, false);
+
 	stac();
 	XSTATE_OP(XSAVE, buf, lmask, hmask, err);
 	clac();
@@ -206,6 +227,8 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64 
 	u32 hmask = mask >> 32;
 	int err;
 
+	xfd_validate_state(current->thread.fpu.fpstate, mask, true);
+
 	stac();
 	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
 	clac();
@@ -217,12 +240,15 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64 
  * Restore xstate from kernel space xsave area, return an error code instead of
  * an exception.
  */
-static inline int os_xrstor_safe(struct xregs_state *xstate, u64 mask)
+static inline int os_xrstor_safe(struct fpstate *fpstate, u64 mask)
 {
+	struct xregs_state *xstate = &fpstate->regs.xsave;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
 
+	/* Must enforce XFD update here */
+
 	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
 		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
 	else

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/msr-index: Add MSRs for XFD
  2021-10-21 22:55 ` [PATCH 13/23] x86/msr-index: Add MSRs for XFD Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     dae1bd58389615d401a84aedc38fa075ef8f7de6
Gitweb:        https://git.kernel.org/tip/dae1bd58389615d401a84aedc38fa075ef8f7de6
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:17 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/msr-index: Add MSRs for XFD

XFD introduces two MSRs:

    - IA32_XFD to enable/disable a feature controlled by XFD

    - IA32_XFD_ERR to expose to the #NM trap handler which feature
      was tried to be used for the first time.

Both use the same xstate-component bitmap format, used by XCR0.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-14-chang.seok.bae@intel.com
---
 arch/x86/include/asm/msr-index.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a7c4134..01e2650 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -625,6 +625,8 @@
 
 #define MSR_IA32_BNDCFGS_RSVD		0x00000ffc
 
+#define MSR_IA32_XFD			0x000001c4
+#define MSR_IA32_XFD_ERR		0x000001c5
 #define MSR_IA32_XSS			0x00000da0
 
 #define MSR_IA32_APICBASE		0x0000001b

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit
  2021-10-21 22:55 ` [PATCH 12/23] x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     c351101678ce54492b6e09810ec02efc0df036a9
Gitweb:        https://git.kernel.org/tip/c351101678ce54492b6e09810ec02efc0df036a9
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:16 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit

Intel's eXtended Feature Disable (XFD) feature is an extension of the XSAVE
architecture. XFD allows the kernel to enable a feature state in XCR0 and
to receive a #NM trap when a task uses instructions accessing that state.

This is going to be used to postpone the allocation of a larger XSTATE
buffer for a task to the point where it is actually using a related
instruction after the permission to use that facility has been granted.

XFD is not used by the kernel, but only applied to userspace. This is a
matter of policy as the kernel knows how a fpstate is reallocated and the
XFD state.

The compacted XSAVE format is adjustable for dynamic features. Make XFD
depend on XSAVES.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-13-chang.seok.bae@intel.com
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cf..ab7b3a2 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -277,6 +277,7 @@
 #define X86_FEATURE_XSAVEC		(10*32+ 1) /* XSAVEC instruction */
 #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
+#define X86_FEATURE_XFD			(10*32+ 4) /* "" eXtended Feature Disabling */
 
 /*
  * Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index defda61..d9ead9c 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -75,6 +75,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SGX_LC,			X86_FEATURE_SGX	      },
 	{ X86_FEATURE_SGX1,			X86_FEATURE_SGX       },
 	{ X86_FEATURE_SGX2,			X86_FEATURE_SGX1      },
+	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{}
 };
 

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Reset permission and fpstate on exec()
  2021-10-21 22:55 ` [PATCH 11/23] x86/fpu: Reset permission and fpstate on exec() Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     e61d6310a0f80cb986fd2076d432760b3619fb6d
Gitweb:        https://git.kernel.org/tip/e61d6310a0f80cb986fd2076d432760b3619fb6d
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:15 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/fpu: Reset permission and fpstate on exec()

On exec(), extended register states saved in the buffer is cleared. With
dynamic features, each task carries variables besides the register states.
The struct fpu has permission information and struct fpstate contains
buffer size and feature masks. They are all dynamically updated with
dynamic features.

Reset the current task's entire FPU data before an exec() so that the new
task starts with default permission and fpstate.

Rename the register state reset function because the old naming confuses as
it does not reset struct fpstate.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-12-chang.seok.bae@intel.com
---
 arch/x86/kernel/fpu/core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1ff6b83..3349068 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -544,7 +544,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
 /*
  * Reset current->fpu memory state to the init values.
  */
-static void fpu_reset_fpstate(void)
+static void fpu_reset_fpregs(void)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
@@ -579,7 +579,7 @@ void fpu__clear_user_states(struct fpu *fpu)
 
 	fpregs_lock();
 	if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
-		fpu_reset_fpstate();
+		fpu_reset_fpregs();
 		fpregs_unlock();
 		return;
 	}
@@ -609,7 +609,8 @@ void fpu__clear_user_states(struct fpu *fpu)
 
 void fpu_flush_thread(void)
 {
-	fpu_reset_fpstate();
+	fpstate_reset(&current->thread.fpu);
+	fpu_reset_fpregs();
 }
 /*
  * Load FPU context before returning to userspace.

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Prepare fpu_clone() for dynamically enabled features
  2021-10-21 22:55 ` [PATCH 10/23] x86/fpu: Prepare fpu_clone() for dynamically enabled features Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     9e798e9aa14c45fb94e47b30bf6347b369ce9df7
Gitweb:        https://git.kernel.org/tip/9e798e9aa14c45fb94e47b30bf6347b369ce9df7
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 21 Oct 2021 15:55:14 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/fpu: Prepare fpu_clone() for dynamically enabled features

The default portion of the parent's FPU state is saved in a child task.
With dynamic features enabled, the non-default portion is not saved in a
child's fpstate because these register states are defined to be
caller-saved. The new task's fpstate is therefore the default buffer.

Fork inherits the permission of the parent.

Also, do not use memcpy() when TIF_NEED_FPU_LOAD is set because it is
invalid when the parent has dynamic features.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-11-chang.seok.bae@intel.com
---
 arch/x86/include/asm/fpu/sched.h |  2 +-
 arch/x86/kernel/fpu/core.c       | 35 ++++++++++++++++++++++---------
 arch/x86/kernel/process.c        |  2 +-
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index cdb78d5..99a8820 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -11,7 +11,7 @@
 
 extern void save_fpregs_to_fpstate(struct fpu *fpu);
 extern void fpu__drop(struct fpu *fpu);
-extern int  fpu_clone(struct task_struct *dst);
+extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags);
 extern void fpu_flush_thread(void);
 
 /*
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 4018083..1ff6b83 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -423,8 +423,20 @@ void fpstate_reset(struct fpu *fpu)
 	fpu->perm.__user_state_size	= fpu_user_cfg.default_size;
 }
 
+static inline void fpu_inherit_perms(struct fpu *dst_fpu)
+{
+	if (fpu_state_size_dynamic()) {
+		struct fpu *src_fpu = &current->group_leader->thread.fpu;
+
+		spin_lock_irq(&current->sighand->siglock);
+		/* Fork also inherits the permissions of the parent */
+		dst_fpu->perm = src_fpu->perm;
+		spin_unlock_irq(&current->sighand->siglock);
+	}
+}
+
 /* Clone current's FPU state on fork */
-int fpu_clone(struct task_struct *dst)
+int fpu_clone(struct task_struct *dst, unsigned long clone_flags)
 {
 	struct fpu *src_fpu = &current->thread.fpu;
 	struct fpu *dst_fpu = &dst->thread.fpu;
@@ -455,17 +467,20 @@ int fpu_clone(struct task_struct *dst)
 	}
 
 	/*
-	 * If the FPU registers are not owned by current just memcpy() the
-	 * state.  Otherwise save the FPU registers directly into the
-	 * child's FPU context, without any memory-to-memory copying.
+	 * Save the default portion of the current FPU state into the
+	 * clone. Assume all dynamic features to be defined as caller-
+	 * saved, which enables skipping both the expansion of fpstate
+	 * and the copying of any dynamic state.
+	 *
+	 * Do not use memcpy() when TIF_NEED_FPU_LOAD is set because
+	 * copying is not valid when current uses non-default states.
 	 */
 	fpregs_lock();
-	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
-		memcpy(&dst_fpu->fpstate->regs, &src_fpu->fpstate->regs,
-		       dst_fpu->fpstate->size);
-	} else {
-		save_fpregs_to_fpstate(dst_fpu);
-	}
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		fpregs_restore_userregs();
+	save_fpregs_to_fpstate(dst_fpu);
+	if (!(clone_flags & CLONE_THREAD))
+		fpu_inherit_perms(dst_fpu);
 	fpregs_unlock();
 
 	trace_x86_fpu_copy_src(src_fpu);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 97fea16..99025e3 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	frame->flags = X86_EFLAGS_FIXED;
 #endif
 
-	fpu_clone(p);
+	fpu_clone(p, clone_flags);
 
 	/* Kernel thread ? */
 	if (unlikely(p->flags & PF_KTHREAD)) {

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu/signal: Prepare for variable sigframe length
  2021-10-21 22:55 ` [PATCH 09/23] x86/fpu/signal: Prepare for variable sigframe length Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     53599b4d54b9b8dda1d537a558946869d2acbddc
Gitweb:        https://git.kernel.org/tip/53599b4d54b9b8dda1d537a558946869d2acbddc
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:13 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/fpu/signal: Prepare for variable sigframe length

The software reserved portion of the fxsave frame in the signal frame
is copied from structures which have been set up at boot time. With
dynamically enabled features the content of these structures is no
longer correct because the xfeatures and size can be different per task.

Calculate the software reserved portion at runtime and fill in the
xfeatures and size values from the tasks active fpstate.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-10-chang.seok.bae@intel.com
---
 arch/x86/kernel/fpu/internal.h |  3 +--
 arch/x86/kernel/fpu/signal.c   | 62 +++++++++++++--------------------
 arch/x86/kernel/fpu/xstate.c   |  1 +-
 3 files changed, 26 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/fpu/internal.h b/arch/x86/kernel/fpu/internal.h
index e1d8a35..dbdb31f 100644
--- a/arch/x86/kernel/fpu/internal.h
+++ b/arch/x86/kernel/fpu/internal.h
@@ -21,9 +21,6 @@ static __always_inline __pure bool use_fxsr(void)
 # define WARN_ON_FPU(x) ({ (void)(x); 0; })
 #endif
 
-/* Init functions */
-extern void fpu__init_prepare_fx_sw_frame(void);
-
 /* Used in init.c */
 extern void fpstate_init_user(struct fpstate *fpstate);
 extern void fpstate_reset(struct fpu *fpu);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 3e42e6e..3b7f7d0 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -20,9 +20,6 @@
 #include "legacy.h"
 #include "xstate.h"
 
-static struct _fpx_sw_bytes fx_sw_reserved __ro_after_init;
-static struct _fpx_sw_bytes fx_sw_reserved_ia32 __ro_after_init;
-
 /*
  * Check for the presence of extended state information in the
  * user fpstate pointer in the sigcontext.
@@ -98,23 +95,42 @@ static inline bool save_fsave_header(struct task_struct *tsk, void __user *buf)
 	return true;
 }
 
+/*
+ * Prepare the SW reserved portion of the fxsave memory layout, indicating
+ * the presence of the extended state information in the memory layout
+ * pointed to by the fpstate pointer in the sigcontext.
+ * This is saved when ever the FP and extended state context is
+ * saved on the user stack during the signal handler delivery to the user.
+ */
+static inline void save_sw_bytes(struct _fpx_sw_bytes *sw_bytes, bool ia32_frame,
+				 struct fpstate *fpstate)
+{
+	sw_bytes->magic1 = FP_XSTATE_MAGIC1;
+	sw_bytes->extended_size = fpstate->user_size + FP_XSTATE_MAGIC2_SIZE;
+	sw_bytes->xfeatures = fpstate->user_xfeatures;
+	sw_bytes->xstate_size = fpstate->user_size;
+
+	if (ia32_frame)
+		sw_bytes->extended_size += sizeof(struct fregs_state);
+}
+
 static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
-				      unsigned int usize)
+				      struct fpstate *fpstate)
 {
 	struct xregs_state __user *x = buf;
-	struct _fpx_sw_bytes *sw_bytes;
+	struct _fpx_sw_bytes sw_bytes;
 	u32 xfeatures;
 	int err;
 
 	/* Setup the bytes not touched by the [f]xsave and reserved for SW. */
-	sw_bytes = ia32_frame ? &fx_sw_reserved_ia32 : &fx_sw_reserved;
-	err = __copy_to_user(&x->i387.sw_reserved, sw_bytes, sizeof(*sw_bytes));
+	save_sw_bytes(&sw_bytes, ia32_frame, fpstate);
+	err = __copy_to_user(&x->i387.sw_reserved, &sw_bytes, sizeof(sw_bytes));
 
 	if (!use_xsave())
 		return !err;
 
 	err |= __put_user(FP_XSTATE_MAGIC2,
-			  (__u32 __user *)(buf + usize));
+			  (__u32 __user *)(buf + fpstate->user_size));
 
 	/*
 	 * Read the xfeatures which we copied (directly from the cpu or
@@ -173,7 +189,7 @@ bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct task_struct *tsk = current;
 	struct fpstate *fpstate = tsk->thread.fpu.fpstate;
-	int ia32_fxstate = (buf != buf_fx);
+	bool ia32_fxstate = (buf != buf_fx);
 	int ret;
 
 	ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
@@ -226,8 +242,7 @@ retry:
 	if ((ia32_fxstate || !use_fxsr()) && !save_fsave_header(tsk, buf))
 		return false;
 
-	if (use_fxsr() &&
-	    !save_xstate_epilog(buf_fx, ia32_fxstate, fpstate->user_size))
+	if (use_fxsr() && !save_xstate_epilog(buf_fx, ia32_fxstate, fpstate))
 		return false;
 
 	return true;
@@ -523,28 +538,3 @@ unsigned long __init fpu__get_fpstate_size(void)
 	return ret;
 }
 
-/*
- * Prepare the SW reserved portion of the fxsave memory layout, indicating
- * the presence of the extended state information in the memory layout
- * pointed by the fpstate pointer in the sigcontext.
- * This will be saved when ever the FP and extended state context is
- * saved on the user stack during the signal handler delivery to the user.
- */
-void __init fpu__init_prepare_fx_sw_frame(void)
-{
-	int size = fpu_user_cfg.default_size + FP_XSTATE_MAGIC2_SIZE;
-
-	fx_sw_reserved.magic1 = FP_XSTATE_MAGIC1;
-	fx_sw_reserved.extended_size = size;
-	fx_sw_reserved.xfeatures = fpu_user_cfg.default_features;
-	fx_sw_reserved.xstate_size = fpu_user_cfg.default_size;
-
-	if (IS_ENABLED(CONFIG_IA32_EMULATION) ||
-	    IS_ENABLED(CONFIG_X86_32)) {
-		int fsave_header_size = sizeof(struct fregs_state);
-
-		fx_sw_reserved_ia32 = fx_sw_reserved;
-		fx_sw_reserved_ia32.extended_size = size + fsave_header_size;
-	}
-}
-
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c837cff..bf42ee2 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -830,7 +830,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	update_regset_xstate_info(fpu_user_cfg.max_size,
 				  fpu_user_cfg.max_features);
 
-	fpu__init_prepare_fx_sw_frame();
 	setup_init_fpu_buf();
 	setup_xstate_comp_offsets();
 	setup_supervisor_only_offsets();

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/signal: Use fpu::__state_user_size for sigalt stack validation
  2021-10-21 22:55 ` [PATCH 08/23] x86/signal: Use fpu::__state_user_size for sigalt stack validation Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     4b7ca609a33dd8696bcbd2f1ad949e26a591592f
Gitweb:        https://git.kernel.org/tip/4b7ca609a33dd8696bcbd2f1ad949e26a591592f
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 21 Oct 2021 15:55:12 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/signal: Use fpu::__state_user_size for sigalt stack validation

Use the current->group_leader->fpu to check for pending permissions to use
extended features and validate against the resulting user space size which
is stored in the group leaders fpu struct as well.

This prevents a task from installing a too small sized sigaltstack after
permissions to use dynamically enabled features have been granted, but
the task has not (yet) used a related instruction.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-9-chang.seok.bae@intel.com
---
 arch/x86/kernel/signal.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 0111a6a..ec71e06 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -32,6 +32,7 @@
 #include <asm/processor.h>
 #include <asm/ucontext.h>
 #include <asm/fpu/signal.h>
+#include <asm/fpu/xstate.h>
 #include <asm/vdso.h>
 #include <asm/mce.h>
 #include <asm/sighandling.h>
@@ -720,12 +721,15 @@ badframe:
 
 /* max_frame_size tells userspace the worst case signal stack size. */
 static unsigned long __ro_after_init max_frame_size;
+static unsigned int __ro_after_init fpu_default_state_size;
 
 void __init init_sigframe_size(void)
 {
+	fpu_default_state_size = fpu__get_fpstate_size();
+
 	max_frame_size = MAX_FRAME_SIGINFO_UCTXT_SIZE + MAX_FRAME_PADDING;
 
-	max_frame_size += fpu__get_fpstate_size() + MAX_XSAVE_PADDING;
+	max_frame_size += fpu_default_state_size + MAX_XSAVE_PADDING;
 
 	/* Userspace expects an aligned size. */
 	max_frame_size = round_up(max_frame_size, FRAME_ALIGNMENT);
@@ -928,15 +932,38 @@ __setup("strict_sas_size", strict_sas_size);
  * sigaltstack they just continued to work. While always checking against
  * the real size would be correct, this might be considered a regression.
  *
- * Therefore avoid the sanity check, unless enforced by kernel config or
- * command line option.
+ * Therefore avoid the sanity check, unless enforced by kernel
+ * configuration or command line option.
+ *
+ * When dynamic FPU features are supported, the check is also enforced when
+ * the task has permissions to use dynamic features. Tasks which have no
+ * permission are checked against the size of the non-dynamic feature set
+ * if strict checking is enabled. This avoids forcing all tasks on the
+ * system to allocate large sigaltstacks even if they are never going
+ * to use a dynamic feature. As this is serialized via sighand::siglock
+ * any permission request for a dynamic feature either happened already
+ * or will see the newly install sigaltstack size in the permission checks.
  */
 bool sigaltstack_size_valid(size_t ss_size)
 {
+	unsigned long fsize = max_frame_size - fpu_default_state_size;
+	u64 mask;
+
 	lockdep_assert_held(&current->sighand->siglock);
 
+	if (!fpu_state_size_dynamic() && !strict_sigaltstack_size)
+		return true;
+
+	fsize += current->group_leader->thread.fpu.perm.__user_state_size;
+	if (likely(ss_size > fsize))
+		return true;
+
 	if (strict_sigaltstack_size)
-		return ss_size > get_sigframe_size();
+		return ss_size > fsize;
+
+	mask = current->group_leader->thread.fpu.perm.__state_perm;
+	if (mask & XFEATURE_MASK_USER_DYNAMIC)
+		return ss_size > fsize;
 
 	return true;
 }

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Add basic helpers for dynamically enabled features
  2021-10-21 22:55 ` [PATCH 07/23] x86/fpu: Add basic helpers for dynamically enabled features Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     23686ef25d4ae81bc12fe3994d1905191fcf71f8
Gitweb:        https://git.kernel.org/tip/23686ef25d4ae81bc12fe3994d1905191fcf71f8
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 21 Oct 2021 15:55:11 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/fpu: Add basic helpers for dynamically enabled features

To allow building up the infrastructure required to support dynamically
enabled FPU features, add:

 - XFEATURES_MASK_DYNAMIC

   This constant will hold xfeatures which can be dynamically enabled.

 - fpu_state_size_dynamic()

   A static branch for 64-bit and a simple 'return false' for 32-bit.

   This helper allows to add dynamic-feature-specific changes to common
   code which is shared between 32-bit and 64-bit without #ifdeffery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-8-chang.seok.bae@intel.com
---
 arch/x86/include/asm/fpu/xstate.h | 21 +++++++++++++++++++++
 arch/x86/kernel/fpu/core.c        |  4 ++++
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 43ae89d..cf28546 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -43,6 +43,9 @@
 #define XFEATURE_MASK_USER_RESTORE	\
 	(XFEATURE_MASK_USER_SUPPORTED & ~XFEATURE_MASK_PKRU)
 
+/* Features which are dynamically enabled for a process on request */
+#define XFEATURE_MASK_USER_DYNAMIC	0ULL
+
 /* All currently supported supervisor features */
 #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
 
@@ -96,4 +99,22 @@ int xfeature_size(int xfeature_nr);
 void xsaves(struct xregs_state *xsave, u64 mask);
 void xrstors(struct xregs_state *xsave, u64 mask);
 
+#ifdef CONFIG_X86_64
+DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
+#endif
+
+#ifdef CONFIG_X86_64
+DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
+
+static __always_inline __pure bool fpu_state_size_dynamic(void)
+{
+	return static_branch_unlikely(&__fpu_state_size_dynamic);
+}
+#else
+static __always_inline __pure bool fpu_state_size_dynamic(void)
+{
+	return false;
+}
+#endif
+
 #endif
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index b05f6a3..4018083 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -25,6 +25,10 @@
 #define CREATE_TRACE_POINTS
 #include <asm/trace/fpu.h>
 
+#ifdef CONFIG_X86_64
+DEFINE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
+#endif
+
 /* The FPU state configuration data for kernel and user space */
 struct fpu_state_config	fpu_kernel_cfg __ro_after_init;
 struct fpu_state_config fpu_user_cfg __ro_after_init;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/arch_prctl: Add controls for dynamic XSTATE components
  2021-10-21 22:55 ` [PATCH 06/23] x86/arch_prctl: Add controls for dynamic XSTATE components Chang S. Bae
  2021-10-24 21:17   ` Borislav Petkov
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     db8268df0983adc2bb1fb48c9e5f7bfbb5f617f3
Gitweb:        https://git.kernel.org/tip/db8268df0983adc2bb1fb48c9e5f7bfbb5f617f3
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:10 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/arch_prctl: Add controls for dynamic XSTATE components

Dynamically enabled XSTATE features are by default disabled for all
processes. A process has to request permission to use such a feature.

To support this implement a architecture specific prctl() with the options:

   - ARCH_GET_XCOMP_SUPP

     Copies the supported feature bitmap into the user space provided
     u64 storage. The pointer is handed in via arg2

   - ARCH_GET_XCOMP_PERM

     Copies the process wide permitted feature bitmap into the user space
     provided u64 storage. The pointer is handed in via arg2

   - ARCH_REQ_XCOMP_PERM

     Request permission for a feature set. A feature set can be mapped to a
     facility, e.g. AMX, and can require one or more XSTATE components to
     be enabled.

     The feature argument is the number of the highest XSTATE component
     which is required for a facility to work.

     The request argument is not a user supplied bitmap because that makes
     filtering harder (think seccomp) and even impossible because to
     support 32bit tasks the argument would have to be a pointer.

The permission mechanism works this way:

   Task asks for permission for a facility and kernel checks whether that's
   supported. If supported it does:

     1) Check whether permission has already been granted

     2) Compute the size of the required kernel and user space buffer
        (sigframe) size.

     3) Validate that no task has a sigaltstack installed
        which is smaller than the resulting sigframe size

     4) Add the requested feature bit(s) to the permission bitmap of
        current->group_leader->fpu and store the sizes in the group
        leaders fpu struct as well.

If that is successful then the feature is still not enabled for any of the
tasks. The first usage of a related instruction will result in a #NM
trap. The trap handler validates the permission bit of the tasks group
leader and if permitted it installs a larger kernel buffer and transfers
the permission and size info to the new fpstate container which makes all
the FPU functions which require per task information aware of the extended
feature set.

  [ tglx: Adopted to new base code, added missing serialization,
          massaged namings, comments and changelog ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-7-chang.seok.bae@intel.com
---
 arch/x86/include/asm/fpu/api.h    |   4 +-
 arch/x86/include/asm/proto.h      |   2 +-
 arch/x86/include/uapi/asm/prctl.h |   4 +-
 arch/x86/kernel/fpu/xstate.c      | 156 +++++++++++++++++++++++++++++-
 arch/x86/kernel/fpu/xstate.h      |   6 +-
 arch/x86/kernel/process.c         |   9 +-
 6 files changed, 178 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index e9379d7..798ae92 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -151,4 +151,8 @@ static inline bool fpstate_is_confidential(struct fpu_guest *gfpu)
 	return gfpu->fpstate->is_confidential;
 }
 
+/* prctl */
+struct task_struct;
+extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2);
+
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 8c5d191..feed36d 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -40,6 +40,6 @@ void x86_report_nx(void);
 extern int reboot_force;
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long cpuid_enabled);
+			  unsigned long arg2);
 
 #endif /* _ASM_X86_PROTO_H */
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 5a6aac9..754a078 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -10,6 +10,10 @@
 #define ARCH_GET_CPUID		0x1011
 #define ARCH_SET_CPUID		0x1012
 
+#define ARCH_GET_XCOMP_SUPP	0x1021
+#define ARCH_GET_XCOMP_PERM	0x1022
+#define ARCH_REQ_XCOMP_PERM	0x1023
+
 #define ARCH_MAP_VDSO_X32	0x2001
 #define ARCH_MAP_VDSO_32	0x2002
 #define ARCH_MAP_VDSO_64	0x2003
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 310c420..c837cff 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -8,6 +8,7 @@
 #include <linux/compat.h>
 #include <linux/cpu.h>
 #include <linux/mman.h>
+#include <linux/nospec.h>
 #include <linux/pkeys.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
@@ -18,6 +19,8 @@
 #include <asm/fpu/xcr.h>
 
 #include <asm/tlbflush.h>
+#include <asm/prctl.h>
+#include <asm/elf.h>
 
 #include "internal.h"
 #include "legacy.h"
@@ -1298,6 +1301,159 @@ void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature)
 EXPORT_SYMBOL_GPL(fpstate_clear_xstate_component);
 #endif
 
+#ifdef CONFIG_X86_64
+static int validate_sigaltstack(unsigned int usize)
+{
+	struct task_struct *thread, *leader = current->group_leader;
+	unsigned long framesize = get_sigframe_size();
+
+	lockdep_assert_held(&current->sighand->siglock);
+
+	/* get_sigframe_size() is based on fpu_user_cfg.max_size */
+	framesize -= fpu_user_cfg.max_size;
+	framesize += usize;
+	for_each_thread(leader, thread) {
+		if (thread->sas_ss_size && thread->sas_ss_size < framesize)
+			return -ENOSPC;
+	}
+	return 0;
+}
+
+static int __xstate_request_perm(u64 permitted, u64 requested)
+{
+	/*
+	 * This deliberately does not exclude !XSAVES as we still might
+	 * decide to optionally context switch XCR0 or talk the silicon
+	 * vendors into extending XFD for the pre AMX states.
+	 */
+	bool compacted = cpu_feature_enabled(X86_FEATURE_XSAVES);
+	struct fpu *fpu = &current->group_leader->thread.fpu;
+	unsigned int ksize, usize;
+	u64 mask;
+	int ret;
+
+	/* Check whether fully enabled */
+	if ((permitted & requested) == requested)
+		return 0;
+
+	/* Calculate the resulting kernel state size */
+	mask = permitted | requested;
+	ksize = xstate_calculate_size(mask, compacted);
+
+	/* Calculate the resulting user state size */
+	mask &= XFEATURE_MASK_USER_SUPPORTED;
+	usize = xstate_calculate_size(mask, false);
+
+	ret = validate_sigaltstack(usize);
+	if (ret)
+		return ret;
+
+	/* Pairs with the READ_ONCE() in xstate_get_group_perm() */
+	WRITE_ONCE(fpu->perm.__state_perm, requested);
+	/* Protected by sighand lock */
+	fpu->perm.__state_size = ksize;
+	fpu->perm.__user_state_size = usize;
+	return ret;
+}
+
+/*
+ * Permissions array to map facilities with more than one component
+ */
+static const u64 xstate_prctl_req[XFEATURE_MAX] = {
+	/* [XFEATURE_XTILE_DATA] = XFEATURE_MASK_XTILE, */
+};
+
+static int xstate_request_perm(unsigned long idx)
+{
+	u64 permitted, requested;
+	int ret;
+
+	if (idx >= XFEATURE_MAX)
+		return -EINVAL;
+
+	/*
+	 * Look up the facility mask which can require more than
+	 * one xstate component.
+	 */
+	idx = array_index_nospec(idx, ARRAY_SIZE(xstate_prctl_req));
+	requested = xstate_prctl_req[idx];
+	if (!requested)
+		return -EOPNOTSUPP;
+
+	if ((fpu_user_cfg.max_features & requested) != requested)
+		return -EOPNOTSUPP;
+
+	/* Lockless quick check */
+	permitted = xstate_get_host_group_perm();
+	if ((permitted & requested) == requested)
+		return 0;
+
+	/* Protect against concurrent modifications */
+	spin_lock_irq(&current->sighand->siglock);
+	permitted = xstate_get_host_group_perm();
+	ret = __xstate_request_perm(permitted, requested);
+	spin_unlock_irq(&current->sighand->siglock);
+	return ret;
+}
+#else /* CONFIG_X86_64 */
+static inline int xstate_request_perm(unsigned long idx)
+{
+	return -EPERM;
+}
+#endif  /* !CONFIG_X86_64 */
+
+/**
+ * fpu_xstate_prctl - xstate permission operations
+ * @tsk:	Redundant pointer to current
+ * @option:	A subfunction of arch_prctl()
+ * @arg2:	option argument
+ * Return:	0 if successful; otherwise, an error code
+ *
+ * Option arguments:
+ *
+ * ARCH_GET_XCOMP_SUPP: Pointer to user space u64 to store the info
+ * ARCH_GET_XCOMP_PERM: Pointer to user space u64 to store the info
+ * ARCH_REQ_XCOMP_PERM: Facility number requested
+ *
+ * For facilities which require more than one XSTATE component, the request
+ * must be the highest state component number related to that facility,
+ * e.g. for AMX which requires XFEATURE_XTILE_CFG(17) and
+ * XFEATURE_XTILE_DATA(18) this would be XFEATURE_XTILE_DATA(18).
+ */
+long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2)
+{
+	u64 __user *uptr = (u64 __user *)arg2;
+	u64 permitted, supported;
+	unsigned long idx = arg2;
+
+	if (tsk != current)
+		return -EPERM;
+
+	switch (option) {
+	case ARCH_GET_XCOMP_SUPP:
+		supported = fpu_user_cfg.max_features |	fpu_user_cfg.legacy_features;
+		return put_user(supported, uptr);
+
+	case ARCH_GET_XCOMP_PERM:
+		/*
+		 * Lockless snapshot as it can also change right after the
+		 * dropping the lock.
+		 */
+		permitted = xstate_get_host_group_perm();
+		permitted &= XFEATURE_MASK_USER_SUPPORTED;
+		return put_user(permitted, uptr);
+
+	case ARCH_REQ_XCOMP_PERM:
+		if (!IS_ENABLED(CONFIG_X86_64))
+			return -EOPNOTSUPP;
+
+		return xstate_request_perm(idx);
+
+	default:
+		return -EINVAL;
+	}
+}
+
 #ifdef CONFIG_PROC_PID_ARCH_STATUS
 /*
  * Report the amount of time elapsed in millisecond since last AVX512
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index a1aa0ba..4ce1dc0 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -15,6 +15,12 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
 		xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
 }
 
+static inline u64 xstate_get_host_group_perm(void)
+{
+	/* Pairs with WRITE_ONCE() in xstate_request_perm() */
+	return READ_ONCE(current->group_leader->thread.fpu.perm.__state_perm);
+}
+
 enum xstate_copy_mode {
 	XSTATE_COPY_FP,
 	XSTATE_COPY_FX,
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c74c7e8..97fea16 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -30,6 +30,7 @@
 #include <asm/apic.h>
 #include <linux/uaccess.h>
 #include <asm/mwait.h>
+#include <asm/fpu/api.h>
 #include <asm/fpu/sched.h>
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
@@ -1003,13 +1004,17 @@ out:
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long cpuid_enabled)
+			  unsigned long arg2)
 {
 	switch (option) {
 	case ARCH_GET_CPUID:
 		return get_cpuid_mode();
 	case ARCH_SET_CPUID:
-		return set_cpuid_mode(task, cpuid_enabled);
+		return set_cpuid_mode(task, arg2);
+	case ARCH_GET_XCOMP_SUPP:
+	case ARCH_GET_XCOMP_PERM:
+	case ARCH_REQ_XCOMP_PERM:
+		return fpu_xstate_prctl(task, option, arg2);
 	}
 
 	return -EINVAL;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Add members to struct fpu to cache permission information
  2021-10-21 22:55 ` [PATCH 04/23] x86/fpu: Add members to struct fpu to cache permission information Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     6f6a7c09c4065a5b140194dfcfe4cf7104fec4d2
Gitweb:        https://git.kernel.org/tip/6f6a7c09c4065a5b140194dfcfe4cf7104fec4d2
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 21 Oct 2021 15:55:08 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/fpu: Add members to struct fpu to cache permission information

Dynamically enabled features can be requested by any thread of a running
process at any time. The request does neither enable the feature nor
allocate larger buffers. It just stores the permission to use the feature
by adding the features to the permission bitmap and by calculating the
required sizes for kernel and user space.

The reallocation of the kernel buffer happens when the feature is used
for the first time which is caught by an exception. The permission
bitmap is then checked and if the feature is permitted, then it becomes
fully enabled. If not, the task dies similarly to a task which uses an
undefined instruction.

The size information is precomputed to allow proper sigaltstack size checks
once the feature is permitted, but not yet in use because otherwise this
would open race windows where too small stacks could be installed causing
a later fail on signal delivery.

Initialize them to the default feature set and sizes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-5-chang.seok.bae@intel.com
---
 arch/x86/include/asm/fpu/types.h | 46 +++++++++++++++++++++++++++++++-
 arch/x86/kernel/fpu/core.c       |  5 +++-
 2 files changed, 51 insertions(+)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index c72cb22..c3ec562 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -352,6 +352,45 @@ struct fpstate {
 	/* @regs is dynamically sized! Don't add anything after @regs! */
 } __aligned(64);
 
+struct fpu_state_perm {
+	/*
+	 * @__state_perm:
+	 *
+	 * This bitmap indicates the permission for state components, which
+	 * are available to a thread group. The permission prctl() sets the
+	 * enabled state bits in thread_group_leader()->thread.fpu.
+	 *
+	 * All run time operations use the per thread information in the
+	 * currently active fpu.fpstate which contains the xfeature masks
+	 * and sizes for kernel and user space.
+	 *
+	 * This master permission field is only to be used when
+	 * task.fpu.fpstate based checks fail to validate whether the task
+	 * is allowed to expand it's xfeatures set which requires to
+	 * allocate a larger sized fpstate buffer.
+	 *
+	 * Do not access this field directly.  Use the provided helper
+	 * function. Unlocked access is possible for quick checks.
+	 */
+	u64				__state_perm;
+
+	/*
+	 * @__state_size:
+	 *
+	 * The size required for @__state_perm. Only valid to access
+	 * with sighand locked.
+	 */
+	unsigned int			__state_size;
+
+	/*
+	 * @__user_state_size:
+	 *
+	 * The size required for @__state_perm user part. Only valid to
+	 * access with sighand locked.
+	 */
+	unsigned int			__user_state_size;
+};
+
 /*
  * Highest level per task FPU state data structure that
  * contains the FPU register state plus various FPU
@@ -396,6 +435,13 @@ struct fpu {
 	struct fpstate			*__task_fpstate;
 
 	/*
+	 * @perm:
+	 *
+	 * Permission related information
+	 */
+	struct fpu_state_perm		perm;
+
+	/*
 	 * @__fpstate:
 	 *
 	 * Initial in-memory storage for FPU registers which are saved in
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 9c475e2..b05f6a3 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -412,6 +412,11 @@ void fpstate_reset(struct fpu *fpu)
 	/* Set the fpstate pointer to the default fpstate */
 	fpu->fpstate = &fpu->__fpstate;
 	__fpstate_reset(fpu->fpstate);
+
+	/* Initialize the permission related info in fpu */
+	fpu->perm.__state_perm		= fpu_kernel_cfg.default_features;
+	fpu->perm.__state_size		= fpu_kernel_cfg.default_size;
+	fpu->perm.__user_state_size	= fpu_user_cfg.default_size;
 }
 
 /* Clone current's FPU state on fork */

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu: Add fpu_state_config::legacy_features
  2021-10-21 22:55 ` [PATCH 05/23] x86/fpu: Add fpu_state_config::legacy_features Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     c33f0a81a2cf3920465309ce683534751bb86485
Gitweb:        https://git.kernel.org/tip/c33f0a81a2cf3920465309ce683534751bb86485
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 21 Oct 2021 15:55:09 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/fpu: Add fpu_state_config::legacy_features

The upcoming prctl() which is required to request the permission for a
dynamically enabled feature will also provide an option to retrieve the
supported features. If the CPU does not support XSAVE, the supported
features would be 0 even when the CPU supports FP and SSE.

Provide separate storage for the legacy feature set to avoid that and fill
in the bits in the legacy init function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-6-chang.seok.bae@intel.com
---
 arch/x86/include/asm/fpu/types.h |  7 +++++++
 arch/x86/kernel/fpu/init.c       |  9 ++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index c3ec562..595122f 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -503,6 +503,13 @@ struct fpu_state_config {
 	 * be requested by user space before usage.
 	 */
 	u64 default_features;
+	/*
+	 * @legacy_features:
+	 *
+	 * Features which can be reported back to user space
+	 * even without XSAVE support, i.e. legacy features FP + SSE
+	 */
+	u64 legacy_features;
 };
 
 /* FPU state configuration information */
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 7074154..621f4b6 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -193,12 +193,15 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	 * Note that the size configuration might be overwritten later
 	 * during fpu__init_system_xstate().
 	 */
-	if (!cpu_feature_enabled(X86_FEATURE_FPU))
+	if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
 		size = sizeof(struct swregs_state);
-	else if (cpu_feature_enabled(X86_FEATURE_FXSR))
+	} else if (cpu_feature_enabled(X86_FEATURE_FXSR)) {
 		size = sizeof(struct fxregs_state);
-	else
+		fpu_user_cfg.legacy_features = XFEATURE_MASK_FPSSE;
+	} else {
 		size = sizeof(struct fregs_state);
+		fpu_user_cfg.legacy_features = XFEATURE_MASK_FP;
+	}
 
 	fpu_kernel_cfg.max_size = size;
 	fpu_kernel_cfg.default_size = size;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/fpu/xstate: Provide xstate_calculate_size()
  2021-10-21 22:55 ` [PATCH 03/23] x86/fpu/xstate: Provide xstate_calculate_size() Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     84e4dccc8fce20b497388d756e12de5c9006eb48
Gitweb:        https://git.kernel.org/tip/84e4dccc8fce20b497388d756e12de5c9006eb48
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 21 Oct 2021 15:55:07 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/fpu/xstate: Provide xstate_calculate_size()

Split out the size calculation from the paranoia check so it can be used
for recalculating buffer sizes when dynamically enabled features are
supported.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
[ tglx: Adopted to changed base code ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-4-chang.seok.bae@intel.com
---
 arch/x86/kernel/fpu/xstate.c | 46 +++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index cbba381..310c420 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -549,6 +549,33 @@ static bool __init check_xstate_against_struct(int nr)
 	return true;
 }
 
+static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
+{
+	unsigned int size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+	int i;
+
+	for_each_extended_xfeature(i, xfeatures) {
+		/* Align from the end of the previous feature */
+		if (xfeature_is_aligned(i))
+			size = ALIGN(size, 64);
+		/*
+		 * In compacted format the enabled features are packed,
+		 * i.e. disabled features do not occupy space.
+		 *
+		 * In non-compacted format the offsets are fixed and
+		 * disabled states still occupy space in the memory buffer.
+		 */
+		if (!compacted)
+			size = xfeature_uncompacted_offset(i);
+		/*
+		 * Add the feature size even for non-compacted format
+		 * to make the end result correct
+		 */
+		size += xfeature_size(i);
+	}
+	return size;
+}
+
 /*
  * This essentially double-checks what the cpu told us about
  * how large the XSAVE buffer needs to be.  We are recalculating
@@ -575,25 +602,8 @@ static bool __init paranoid_xstate_size_valid(unsigned int kernel_size)
 			XSTATE_WARN_ON(1);
 			return false;
 		}
-
-		/* Align from the end of the previous feature */
-		if (xfeature_is_aligned(i))
-			size = ALIGN(size, 64);
-		/*
-		 * In compacted format the enabled features are packed,
-		 * i.e. disabled features do not occupy space.
-		 *
-		 * In non-compacted format the offsets are fixed and
-		 * disabled states still occupy space in the memory buffer.
-		 */
-		if (!compacted)
-			size = xfeature_uncompacted_offset(i);
-		/*
-		 * Add the feature size even for non-compacted format
-		 * to make the end result correct
-		 */
-		size += xfeature_size(i);
 	}
+	size = xstate_calculate_size(fpu_kernel_cfg.max_features, compacted);
 	XSTATE_WARN_ON(size != kernel_size);
 	return size == kernel_size;
 }

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] x86/signal: Implement sigaltstack size validation
  2021-10-21 22:55 ` [PATCH 02/23] x86/signal: Implement sigaltstack size validation Chang S. Bae
@ 2021-10-26 16:16   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     3aac3ebea08f2d342364f827c8979ab0e1dd591e
Gitweb:        https://git.kernel.org/tip/3aac3ebea08f2d342364f827c8979ab0e1dd591e
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 21 Oct 2021 15:55:06 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:18:09 +02:00

x86/signal: Implement sigaltstack size validation

For historical reasons MINSIGSTKSZ is a constant which became already too
small with AVX512 support.

Add a mechanism to enforce strict checking of the sigaltstack size against
the real size of the FPU frame.

The strict check can be enabled via a config option and can also be
controlled via the kernel command line option 'strict_sas_size' independent
of the config switch.

Enabling it might break existing applications which allocate a too small
sigaltstack but 'work' because they never get a signal delivered. Though it
can be handy to filter out binaries which are not yet aware of
AT_MINSIGSTKSZ.

Also the upcoming support for dynamically enabled FPU features requires a
strict sanity check to ensure that:

   - Enabling of a dynamic feature, which changes the sigframe size fits
     into an enabled sigaltstack

   - Installing a too small sigaltstack after a dynamic feature has been
     added is not possible.

Implement the base check which is controlled by config and command line
options.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-3-chang.seok.bae@intel.com
---
 Documentation/admin-guide/kernel-parameters.txt |  9 ++++-
 arch/x86/Kconfig                                | 17 ++++++++-
 arch/x86/kernel/signal.c                        | 35 ++++++++++++++++-
 3 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35f..eb9a73a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5497,6 +5497,15 @@
 	stifb=		[HW]
 			Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
 
+        strict_sas_size=
+			[X86]
+			Format: <bool>
+			Enable or disable strict sigaltstack size checks
+			against the required signal frame size which
+			depends on the supported FPU features. This can
+			be used to filter out binaries which have
+			not yet been made aware of AT_MINSIGSTKSZ.
+
 	sunrpc.min_resvport=
 	sunrpc.max_resvport=
 			[NFS,SUNRPC]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9830e7..8584b30 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -125,6 +125,7 @@ config X86
 	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
 	select CLOCKSOURCE_WATCHDOG
 	select DCACHE_WORD_ACCESS
+	select DYNAMIC_SIGFRAME
 	select EDAC_ATOMIC_SCRUB
 	select EDAC_SUPPORT
 	select GENERIC_CLOCKEVENTS_BROADCAST	if X86_64 || (X86_32 && X86_LOCAL_APIC)
@@ -2388,6 +2389,22 @@ config MODIFY_LDT_SYSCALL
 
 	  Saying 'N' here may make sense for embedded or server kernels.
 
+config STRICT_SIGALTSTACK_SIZE
+	bool "Enforce strict size checking for sigaltstack"
+	depends on DYNAMIC_SIGFRAME
+	help
+	  For historical reasons MINSIGSTKSZ is a constant which became
+	  already too small with AVX512 support. Add a mechanism to
+	  enforce strict checking of the sigaltstack size against the
+	  real size of the FPU frame. This option enables the check
+	  by default. It can also be controlled via the kernel command
+	  line option 'strict_sas_size' independent of this config
+	  switch. Enabling it might break existing applications which
+	  allocate a too small sigaltstack but 'work' because they
+	  never get a signal delivered.
+
+	  Say 'N' unless you want to really enforce this check.
+
 source "kernel/livepatch/Kconfig"
 
 endmenu
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 58bd070..0111a6a 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -15,6 +15,7 @@
 #include <linux/mm.h>
 #include <linux/smp.h>
 #include <linux/kernel.h>
+#include <linux/kstrtox.h>
 #include <linux/errno.h>
 #include <linux/wait.h>
 #include <linux/tracehook.h>
@@ -40,6 +41,7 @@
 #include <linux/compat.h>
 #include <asm/proto.h>
 #include <asm/ia32_unistd.h>
+#include <asm/fpu/xstate.h>
 #endif /* CONFIG_X86_64 */
 
 #include <asm/syscall.h>
@@ -907,6 +909,39 @@ void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
 	force_sig(SIGSEGV);
 }
 
+#ifdef CONFIG_DYNAMIC_SIGFRAME
+#ifdef CONFIG_STRICT_SIGALTSTACK_SIZE
+static bool strict_sigaltstack_size __ro_after_init = true;
+#else
+static bool strict_sigaltstack_size __ro_after_init = false;
+#endif
+
+static int __init strict_sas_size(char *arg)
+{
+	return kstrtobool(arg, &strict_sigaltstack_size);
+}
+__setup("strict_sas_size", strict_sas_size);
+
+/*
+ * MINSIGSTKSZ is 2048 and can't be changed despite the fact that AVX512
+ * exceeds that size already. As such programs might never use the
+ * sigaltstack they just continued to work. While always checking against
+ * the real size would be correct, this might be considered a regression.
+ *
+ * Therefore avoid the sanity check, unless enforced by kernel config or
+ * command line option.
+ */
+bool sigaltstack_size_valid(size_t ss_size)
+{
+	lockdep_assert_held(&current->sighand->siglock);
+
+	if (strict_sigaltstack_size)
+		return ss_size > get_sigframe_size();
+
+	return true;
+}
+#endif /* CONFIG_DYNAMIC_SIGFRAME */
+
 #ifdef CONFIG_X86_X32_ABI
 COMPAT_SYSCALL_DEFINE0(x32_rt_sigreturn)
 {

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] signal: Add an optional check for altstack size
  2021-10-21 22:55 ` [PATCH 01/23] signal: Add an optional check for altstack size Chang S. Bae
  2021-10-22  0:06   ` Bae, Chang Seok
  2021-10-22 15:20   ` Eric W. Biederman
@ 2021-10-26 16:16   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 60+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2021-10-26 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     1bdda24c4af64cd2d65dec5192ab624c5fee7ca0
Gitweb:        https://git.kernel.org/tip/1bdda24c4af64cd2d65dec5192ab624c5fee7ca0
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 21 Oct 2021 15:55:05 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 26 Oct 2021 10:15:12 +02:00

signal: Add an optional check for altstack size

New x86 FPU features will be very large, requiring ~10k of stack in
signal handlers.  These new features require a new approach called
"dynamic features".

The kernel currently tries to ensure that altstacks are reasonably
sized. Right now, on x86, sys_sigaltstack() requires a size of >=2k.
However, that 2k is a constant. Simply raising that 2k requirement
to >10k for the new features would break existing apps which have a
compiled-in size of 2k.

Instead of universally enforcing a larger stack, prohibit a process from
using dynamic features without properly-sized altstacks. This must be
enforced in two places:

 * A dynamic feature can not be enabled without an large-enough altstack
   for each process thread.
 * Once a dynamic feature is enabled, any request to install a too-small
   altstack will be rejected

The dynamic feature enabling code must examine each thread in a
process to ensure that the altstacks are large enough. Add a new lock
(sigaltstack_lock()) to ensure that threads can not race and change
their altstack after being examined.

Add the infrastructure in form of a config option and provide empty
stubs for architectures which do not need dynamic altstack size checks.

This implementation will be fleshed out for x86 in a future patch called

  x86/arch_prctl: Add controls for dynamic XSTATE components

  [dhansen: commit message. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-2-chang.seok.bae@intel.com
---
 arch/Kconfig           |  3 +++
 include/linux/signal.h |  6 ++++++
 kernel/signal.c        | 35 +++++++++++++++++++++++++++++------
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c71..af5cf30 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1288,6 +1288,9 @@ config ARCH_HAS_ELFCORE_COMPAT
 config ARCH_HAS_PARANOID_L1D_FLUSH
 	bool
 
+config DYNAMIC_SIGFRAME
+	bool
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 3f96a63..7d34105 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -464,6 +464,12 @@ int __save_altstack(stack_t __user *, unsigned long);
 	unsafe_put_user(t->sas_ss_size, &__uss->ss_size, label); \
 } while (0);
 
+#ifdef CONFIG_DYNAMIC_SIGFRAME
+bool sigaltstack_size_valid(size_t ss_size);
+#else
+static inline bool sigaltstack_size_valid(size_t size) { return true; }
+#endif /* !CONFIG_DYNAMIC_SIGFRAME */
+
 #ifdef CONFIG_PROC_FS
 struct seq_file;
 extern void render_sigset_t(struct seq_file *, const char *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 952741f..9278f52 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4151,11 +4151,29 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 	return 0;
 }
 
+#ifdef CONFIG_DYNAMIC_SIGFRAME
+static inline void sigaltstack_lock(void)
+	__acquires(&current->sighand->siglock)
+{
+	spin_lock_irq(&current->sighand->siglock);
+}
+
+static inline void sigaltstack_unlock(void)
+	__releases(&current->sighand->siglock)
+{
+	spin_unlock_irq(&current->sighand->siglock);
+}
+#else
+static inline void sigaltstack_lock(void) { }
+static inline void sigaltstack_unlock(void) { }
+#endif
+
 static int
 do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
 		size_t min_ss_size)
 {
 	struct task_struct *t = current;
+	int ret = 0;
 
 	if (oss) {
 		memset(oss, 0, sizeof(stack_t));
@@ -4179,19 +4197,24 @@ do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
 				ss_mode != 0))
 			return -EINVAL;
 
+		sigaltstack_lock();
 		if (ss_mode == SS_DISABLE) {
 			ss_size = 0;
 			ss_sp = NULL;
 		} else {
 			if (unlikely(ss_size < min_ss_size))
-				return -ENOMEM;
+				ret = -ENOMEM;
+			if (!sigaltstack_size_valid(ss_size))
+				ret = -ENOMEM;
 		}
-
-		t->sas_ss_sp = (unsigned long) ss_sp;
-		t->sas_ss_size = ss_size;
-		t->sas_ss_flags = ss_flags;
+		if (!ret) {
+			t->sas_ss_sp = (unsigned long) ss_sp;
+			t->sas_ss_size = ss_size;
+			t->sas_ss_flags = ss_flags;
+		}
+		sigaltstack_unlock();
 	}
-	return 0;
+	return ret;
 }
 
 SYSCALL_DEFINE2(sigaltstack,const stack_t __user *,uss, stack_t __user *,uoss)

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [tip: x86/fpu] Documentation/x86: Add documentation for using dynamic XSTATE features
  2021-10-26  9:11     ` [PATCH] Documentation/x86: Add documentation for using dynamic XSTATE features Chang S. Bae
  2021-10-26 16:16       ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
@ 2021-10-28 13:10       ` tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 60+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2021-10-28 13:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Chang S. Bae, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/fpu branch of tip:

Commit-ID:     d7a9590f608dbedd917eb0857a074accdf0d3919
Gitweb:        https://git.kernel.org/tip/d7a9590f608dbedd917eb0857a074accdf0d3919
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Tue, 26 Oct 2021 02:11:57 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Thu, 28 Oct 2021 14:54:58 +02:00

Documentation/x86: Add documentation for using dynamic XSTATE features

Explain how dynamic XSTATE features can be enabled via the
architecture-specific prctl() along with dynamic sigframe size and
first use trap handling.

Fix:

Documentation/x86/xstate.rst:15: WARNING: Title underline too short.

as reported by Stephen Rothwell <sfr@canb.auug.org.au>

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211026091157.16711-1-chang.seok.bae@intel.com
---
 Documentation/x86/index.rst  |  1 +-
 Documentation/x86/xstate.rst | 65 +++++++++++++++++++++++++++++++++++-
 2 files changed, 66 insertions(+)
 create mode 100644 Documentation/x86/xstate.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 3830483..f498f1d 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -37,3 +37,4 @@ x86-specific Documentation
    sgx
    features
    elf_auxvec
+   xstate
diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst
new file mode 100644
index 0000000..65de3f0
--- /dev/null
+++ b/Documentation/x86/xstate.rst
@@ -0,0 +1,65 @@
+Using XSTATE features in user space applications
+================================================
+
+The x86 architecture supports floating-point extensions which are
+enumerated via CPUID. Applications consult CPUID and use XGETBV to
+evaluate which features have been enabled by the kernel XCR0.
+
+Up to AVX-512 and PKRU states, these features are automatically enabled by
+the kernel if available. Features like AMX TILE_DATA (XSTATE component 18)
+are enabled by XCR0 as well, but the first use of related instruction is
+trapped by the kernel because by default the required large XSTATE buffers
+are not allocated automatically.
+
+Using dynamically enabled XSTATE features in user space applications
+--------------------------------------------------------------------
+
+The kernel provides an arch_prctl(2) based mechanism for applications to
+request the usage of such features. The arch_prctl(2) options related to
+this are:
+
+-ARCH_GET_XCOMP_SUPP
+
+ arch_prctl(ARCH_GET_XCOMP_SUPP, &features);
+
+ ARCH_GET_XCOMP_SUPP stores the supported features in userspace storage of
+ type uint64_t. The second argument is a pointer to that storage.
+
+-ARCH_GET_XCOMP_PERM
+
+ arch_prctl(ARCH_GET_XCOMP_PERM, &features);
+
+ ARCH_GET_XCOMP_PERM stores the features for which the userspace process
+ has permission in userspace storage of type uint64_t. The second argument
+ is a pointer to that storage.
+
+-ARCH_REQ_XCOMP_PERM
+
+ arch_prctl(ARCH_REQ_XCOMP_PERM, feature_nr);
+
+ ARCH_REQ_XCOMP_PERM allows to request permission for a dynamically enabled
+ feature or a feature set. A feature set can be mapped to a facility, e.g.
+ AMX, and can require one or more XSTATE components to be enabled.
+
+ The feature argument is the number of the highest XSTATE component which
+ is required for a facility to work.
+
+When requesting permission for a feature, the kernel checks the
+availability. The kernel ensures that sigaltstacks in the process's tasks
+are large enough to accommodate the resulting large signal frame. It
+enforces this both during ARCH_REQ_XCOMP_SUPP and during any subsequent
+sigaltstack(2) calls. If an installed sigaltstack is smaller than the
+resulting sigframe size, ARCH_REQ_XCOMP_SUPP results in -ENOSUPP. Also,
+sigaltstack(2) results in -ENOMEM if the requested altstack is too small
+for the permitted features.
+
+Permission, when granted, is valid per process. Permissions are inherited
+on fork(2) and cleared on exec(3).
+
+The first use of an instruction related to a dynamically enabled feature is
+trapped by the kernel. The trap handler checks whether the process has
+permission to use the feature. If the process has no permission then the
+kernel sends SIGILL to the application. If the process has permission then
+the handler allocates a larger xstate buffer for the task so the large
+state can be context switched. In the unlikely cases that the allocation
+fails, the kernel sends SIGSEGV.

^ permalink raw reply related	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2021-10-28 13:10 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-21 22:55 [PATCH 00/23] x86: Support Intel Advanced Matrix Extensions (part 4) Chang S. Bae
2021-10-21 22:55 ` [PATCH 01/23] signal: Add an optional check for altstack size Chang S. Bae
2021-10-22  0:06   ` Bae, Chang Seok
2021-10-22 15:20   ` Eric W. Biederman
2021-10-22 20:58     ` Bae, Chang Seok
2021-10-22 22:51     ` Dave Hansen
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2021-10-21 22:55 ` [PATCH 02/23] x86/signal: Implement sigaltstack size validation Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2021-10-21 22:55 ` [PATCH 03/23] x86/fpu/xstate: Provide xstate_calculate_size() Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 04/23] x86/fpu: Add members to struct fpu to cache permission information Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2021-10-21 22:55 ` [PATCH 05/23] x86/fpu: Add fpu_state_config::legacy_features Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2021-10-21 22:55 ` [PATCH 06/23] x86/arch_prctl: Add controls for dynamic XSTATE components Chang S. Bae
2021-10-24 21:17   ` Borislav Petkov
2021-10-26  9:11     ` [PATCH] Documentation/x86: Add documentation for using dynamic XSTATE features Chang S. Bae
2021-10-26 16:16       ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-28 13:10       ` tip-bot2 for Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] x86/arch_prctl: Add controls for dynamic XSTATE components tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 07/23] x86/fpu: Add basic helpers for dynamically enabled features Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2021-10-21 22:55 ` [PATCH 08/23] x86/signal: Use fpu::__state_user_size for sigalt stack validation Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2021-10-21 22:55 ` [PATCH 09/23] x86/fpu/signal: Prepare for variable sigframe length Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 10/23] x86/fpu: Prepare fpu_clone() for dynamically enabled features Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2021-10-21 22:55 ` [PATCH 11/23] x86/fpu: Reset permission and fpstate on exec() Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 12/23] x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 13/23] x86/msr-index: Add MSRs for XFD Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 14/23] x86/fpu: Add XFD state to fpstate Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 15/23] x86/fpu: Add sanity checks for XFD Chang S. Bae
2021-10-25  8:11   ` Borislav Petkov
2021-10-25 20:15     ` Thomas Gleixner
2021-10-25  8:33   ` Mika Penttilä
2021-10-25 18:13     ` Thomas Gleixner
2021-10-25 19:57       ` Dave Hansen
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2021-10-21 22:55 ` [PATCH 16/23] x86/fpu: Update XFD state where required Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 17/23] x86/fpu/xstate: Add XFD #NM handler Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 18/23] x86/fpu/xstate: Add fpstate_realloc()/free() Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 19/23] x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 20/23] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 21/23] x86/fpu: Calculate the default sizes independently Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 22/23] x86/fpu: Add XFD handling for dynamic states Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2021-10-21 22:55 ` [PATCH 23/23] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2021-10-26 16:16   ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.