* [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage
@ 2021-06-05 23:47 Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 01/14] selftests/x86: Test signal frame XSTATE header corruption handling Thomas Gleixner
                   ` (14 more replies)
  0 siblings, 15 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

syzbot reported a WARN_ON() for XRSTOR raising #GP:

  https://lore.kernel.org/r/0000000000004c453905c30f8334@google.com

with a syzkaller reproducer and a conclusive bisect result.  It took a
while to distill a simple C reproducer out of it, which allowed pinpointing
the root cause: The recent addition of supervisor XSTATEs broke the signal
restore path for the case where the signal handler wrecked the XSTATE on the
stack, because the restore path does not sanitize the XSTATE header, which
causes a subsequent XRSTOR to fail with #GP.

The following series addresses the problem and fixes related issues which
were found while inspecting the related changes.

V1 of this can be found here:

  https://lore.kernel.org/r/20210602095543.149814064@linutronix.de

Changes vs. V1:

  - Address review comments

  - Fix the reported fallout (mostly PKRU and PTRACE). The test cases pass
    now.

  - Address the broken init_fpstate fiddling which was found by deeper
    inspection of the PKRU/XSTATE related code.

Thanks,

	tglx
---

P.S: I picked up Dave's PKRU series from

  https://lore.kernel.org/r/20210603230803.31660AFE@viggo.jf.intel.com

and adapted it to this pile. The result is here:

    https://tglx.de/~tglx/patches-fpu-pkru.tar

There are some fixes at the end which I did not fold back yet, and the glibc
PKRU test case with all of that applied still fails with:

 ../sysdeps/unix/sysv/linux/tst-pkey.c:377: numeric comparison failure
    left: 1 (0x1); from: result->access_rights[i]
   right: 0 (0x0); from: 0
 ../sysdeps/unix/sysv/linux/tst-pkey.c:382: numeric comparison failure
    left: 1 (0x1); from: result2->access_rights[i]
   right: 0 (0x0); from: 0
 error: 2 test failures

Too tired to analyze that right now, but I wanted to share the work.
     
---
 arch/x86/include/asm/fpu/internal.h                   |   17 --
 arch/x86/include/asm/fpu/xstate.h                     |   10 -
 arch/x86/include/asm/pgtable.h                        |    6 
 arch/x86/include/asm/pkeys.h                          |   11 -
 arch/x86/kernel/cpu/bugs.c                            |    3 
 arch/x86/kernel/cpu/common.c                          |   27 +---
 arch/x86/kernel/fpu/core.c                            |  119 ++++++++++++------
 arch/x86/kernel/fpu/regset.c                          |   45 ++----
 arch/x86/kernel/fpu/signal.c                          |   30 +++-
 arch/x86/kernel/fpu/xstate.c                          |  100 +++++----------
 arch/x86/kernel/process.c                             |   14 +-
 arch/x86/mm/fault.c                                   |    2 
 arch/x86/mm/pkeys.c                                   |   31 ++--
 b/tools/testing/selftests/x86/corrupt_xstate_header.c |  114 +++++++++++++++++
 include/linux/pkeys.h                                 |    2 
 tools/testing/selftests/x86/Makefile                  |    3 
 16 files changed, 334 insertions(+), 200 deletions(-)

* [patch V2 01/14] selftests/x86: Test signal frame XSTATE header corruption handling
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 02/14] x86/fpu: Prevent state corruption in __fpu__restore_sig() Thomas Gleixner
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

From: Andy Lutomirski <luto@kernel.org>

This is very heavily based on some code from Thomas Gleixner.  On a system
without XSAVES, it triggers the WARN_ON():

    Bad FPU state detected at copy_kernel_to_fpregs+0x2f/0x40, reinitializing FPU registers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: main() -> main(void) - Borislav
---
 tools/testing/selftests/x86/Makefile                |    3 
 tools/testing/selftests/x86/corrupt_xstate_header.c |  114 ++++++++++++++++++++
 2 files changed, 116 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/corrupt_xstate_header.c

--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -17,7 +17,8 @@ TARGETS_C_BOTHBITS := single_step_syscal
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
-TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering
+TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
+			corrupt_xstate_header
 # Some selftests require 32bit support enabled also on 64bit systems
 TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
 
--- /dev/null
+++ b/tools/testing/selftests/x86/corrupt_xstate_header.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Corrupt the XSTATE header in a signal frame
+ *
+ * Based on analysis and a test case from Thomas Gleixner.
+ */
+
+#define _GNU_SOURCE
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <sched.h>
+#include <signal.h>
+#include <err.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <sys/wait.h>
+
+static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
+			   unsigned int *ecx, unsigned int *edx)
+{
+	asm volatile(
+		"cpuid;"
+		: "=a" (*eax),
+		  "=b" (*ebx),
+		  "=c" (*ecx),
+		  "=d" (*edx)
+		: "0" (*eax), "2" (*ecx));
+}
+
+static inline int xsave_enabled(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+
+	eax = 0x1;
+	ecx = 0x0;
+	__cpuid(&eax, &ebx, &ecx, &edx);
+
+	/* Is CR4.OSXSAVE enabled ? */
+	return ecx & (1U << 27);
+}
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
+{
+	struct sigaction sa;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+static void sigusr1(int sig, siginfo_t *info, void *uc_void)
+{
+	ucontext_t *uc = uc_void;
+	uint8_t *fpstate = (uint8_t *)uc->uc_mcontext.fpregs;
+	uint64_t *xfeatures = (uint64_t *)(fpstate + 512);
+
+	printf("\tWreckage XSTATE header\n");
+	/* Wreckage the first reserved byte in the header */
+	*(xfeatures + 2) = 0xfffffff;
+}
+
+static void sigsegv(int sig, siginfo_t *info, void *uc_void)
+{
+	printf("\tGot SIGSEGV\n");
+}
+
+int main(void)
+{
+	cpu_set_t set;
+
+	sethandler(SIGUSR1, sigusr1, 0);
+	sethandler(SIGSEGV, sigsegv, 0);
+
+	if (!xsave_enabled()) {
+		printf("[SKIP] CR4.OSXSAVE disabled.\n");
+		return 0;
+	}
+
+	CPU_ZERO(&set);
+	CPU_SET(0, &set);
+
+	/*
+	 * Enforce that the child runs on the same CPU
+	 * which in turn forces a schedule.
+	 */
+	sched_setaffinity(getpid(), sizeof(set), &set);
+
+	printf("[RUN]\tSend ourselves a signal\n");
+	raise(SIGUSR1);
+
+	printf("[OK]\tBack from the signal.  Now schedule.\n");
+	pid_t child = fork();
+	if (child < 0)
+		err(1, "fork");
+	if (child == 0)
+		return 0;
+	if (child)
+		waitpid(child, NULL, 0);
+	printf("[OK]\tBack in the main thread.\n");
+
+	/*
+	 * We could try to confirm that extended state is still preserved
+	 * when we schedule.  For now, the only indication of failure is
+	 * a warning in the kernel logs.
+	 */
+
+	return 0;
+}

* [patch V2 02/14] x86/fpu: Prevent state corruption in __fpu__restore_sig()
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 01/14] selftests/x86: Test signal frame XSTATE header corruption handling Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-07  8:49   ` Borislav Petkov
  2021-06-05 23:47 ` [patch V2 03/14] x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer Thomas Gleixner
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior,
	syzbot+2067e764dbcd10721e2e, stable

From: Thomas Gleixner <tglx@linutronix.de>

The non-compacted slowpath uses __copy_from_user() and copies the entire
user buffer into the kernel buffer, verbatim.  This means that the kernel
buffer may now contain entirely invalid state on which XRSTOR will #GP.
validate_user_xstate_header() can detect some of that corruption, but that
leaves the onus on callers to clear the buffer.

Prior to XSAVES support it was possible just to reinitialize the buffer,
completely, but with supervisor states that is no longer possible as the
buffer clearing code split got it backwards. Fixing that is possible, but
not corrupting the state in the first place is more robust.

Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate()
which validates the XSAVE header contents before copying the actual states
to the kernel. copy_user_to_xstate() was previously only called for
compacted-format kernel buffers, but it works for both compacted and
non-compacted forms.

Using it for the non-compacted form is slower because of multiple
__copy_from_user() operations, but that cost is less important than robust
code in an already slow path.
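
For reference, the header validation which copy_user_to_xstate() relies on
amounts to roughly the following (sketch, not the exact kernel code; the
real check is validate_user_xstate_header() in arch/x86/kernel/fpu/xstate.c):

    static int validate_hdr_sketch(const struct xstate_header *hdr)
    {
            /* No unknown or supervisor features may be set */
            if (hdr->xfeatures & ~xfeatures_mask_user())
                    return -EINVAL;
            /* User space must use the non-compacted format */
            if (hdr->xcomp_bv)
                    return -EINVAL;
            /* No reserved bits may be set */
            if (memchr_inv(hdr->reserved, 0, sizeof(hdr->reserved)))
                    return -EINVAL;
            return 0;
    }

The header is validated before any state is copied, so the kernel buffer can
no longer end up with a header which makes XRSTOR #GP.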

[ Changelog polished by Dave Hansen ]

Fixes: b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
V2: Removed the make validate_user_xstate_header() static hunks (Borislav)
---
 arch/x86/kernel/fpu/signal.c |    9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -405,14 +405,7 @@ static int __fpu__restore_sig(void __use
 	if (use_xsave() && !fx_only) {
 		u64 init_bv = xfeatures_mask_user() & ~user_xfeatures;
 
-		if (using_compacted_format()) {
-			ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
-		} else {
-			ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
-
-			if (!ret && state_size > offsetof(struct xregs_state, header))
-				ret = validate_user_xstate_header(&fpu->state.xsave.header);
-		}
+		ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
 		if (ret)
 			goto err_out;
 

* [patch V2 03/14] x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 01/14] selftests/x86: Test signal frame XSTATE header corruption handling Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 02/14] x86/fpu: Prevent state corruption in __fpu__restore_sig() Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 04/14] x86/pkru: Make the fpinit state update work Thomas Gleixner
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior, stable

From: Andy Lutomirski <luto@kernel.org>

Both Intel and AMD consider it to be architecturally valid for XRSTOR to
fail with #PF but nonetheless change the register state.  The actual
conditions under which this might occur are unclear [1], but it seems
plausible that this might be triggered if one sibling thread unmaps a page
and invalidates the shared TLB while another sibling thread is executing
XRSTOR on the page in question.

__fpu__restore_sig() can execute XRSTOR while the hardware registers are
preserved on behalf of a different victim task (using the
fpu_fpregs_owner_ctx mechanism), and, in theory, XRSTOR could fail but
modify the registers.  If this happens, then there is a window in which
__fpu__restore_sig() could schedule out and the victim task could schedule
back in without reloading its own FPU registers.  This would result in part
of the FPU state that __fpu__restore_sig() was attempting to load leaking
into the victim task's user-visible state.

Invalidate preserved FPU registers on XRSTOR failure to prevent this
situation from corrupting any state.

[1] Frequent readers of the errata lists might imagine "complex
    microarchitectural conditions"
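
For reference, __cpu_invalidate_fpregs_state() amounts to roughly the
following (sketch; see arch/x86/include/asm/fpu/internal.h for the real
definition):

    static inline void __cpu_invalidate_fpregs_state(void)
    {
            /* Clear the per-CPU fpregs owner pointer */
            __this_cpu_write(fpu_fpregs_owner_ctx, NULL);
    }

With the owner pointer cleared, fpregs_state_valid() fails for every task on
this CPU, so the victim task has to reload its own registers before returning
to user space instead of reusing whatever XRSTOR may have left behind.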

Fixes: 1d731e731c4c ("x86/fpu: Add a fastpath to __fpu__restore_sig()")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
V2: Amend changelog - Borislav
---
 arch/x86/kernel/fpu/signal.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -369,6 +369,27 @@ static int __fpu__restore_sig(void __use
 			fpregs_unlock();
 			return 0;
 		}
+
+		if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
+			/*
+			 * The FPU registers do not belong to current, and
+			 * we just did an FPU restore operation, restricted
+			 * to the user portion of the register file, and
+			 * failed.  In the event that the ucode was
+			 * unfriendly and modified the registers at all, we
+			 * need to make sure that we aren't corrupting an
+			 * innocent non-current task's registers.
+			 */
+			__cpu_invalidate_fpregs_state();
+		} else {
+			/*
+			 * As above, we may have just clobbered current's
+			 * user FPU state.  We will either successfully
+			 * load it or clear it below, so no action is
+			 * required here.
+			 */
+		}
+
 		fpregs_unlock();
 	} else {
 		/*

* [patch V2 04/14] x86/pkru: Make the fpinit state update work
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (2 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 03/14] x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-07 15:18   ` Borislav Petkov
  2021-06-05 23:47 ` [patch V2 05/14] x86/fpu: Limit xstate copy size in xstateregs_set() Thomas Gleixner
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior, stable

It's unclear how the existing PKRU update of init_fpstate ever made any
difference, because init_fpstate.xsave.header.xfeatures is always 0, so
get_xsave_addr() will always return a NULL pointer.

Fix this by:

 - Create a propagation function instead of several open-coded instances
   and invoke it only once after the boot CPU has been initialized and from
   the debugfs file write function.

 - Set the PKRU feature bit in init_fpstate.xsave.header.xfeatures to make
   get_xsave_addr() work, which allows storing the default value for real.

 - Having the feature bit set also gets rid of copy_init_pkru_to_fpregs()
   because copy_init_fpstate_to_fpregs() now already loads the default PKRU
   value.

Fixes: a5eff7259790 ("x86/pkeys: Add PKRU value to init_fpstate")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
V2: New patch
--- 
 arch/x86/include/asm/pkeys.h |    3 ++-
 arch/x86/kernel/cpu/bugs.c   |    3 +++
 arch/x86/kernel/cpu/common.c |    5 -----
 arch/x86/kernel/fpu/core.c   |    3 ---
 arch/x86/mm/pkeys.c          |   31 ++++++++++++++-----------------
 include/linux/pkeys.h        |    2 +-
 6 files changed, 20 insertions(+), 27 deletions(-)

--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -124,7 +124,8 @@ extern int arch_set_user_pkey_access(str
 		unsigned long init_val);
 extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val);
-extern void copy_init_pkru_to_fpregs(void);
+
+extern void pkru_propagate_default(void);
 
 static inline int vma_pkey(struct vm_area_struct *vma)
 {
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -16,6 +16,7 @@
 #include <linux/prctl.h>
 #include <linux/sched/smt.h>
 #include <linux/pgtable.h>
+#include <linux/pkeys.h>
 
 #include <asm/spec-ctrl.h>
 #include <asm/cmdline.h>
@@ -120,6 +121,8 @@ void __init check_bugs(void)
 
 	arch_smt_update();
 
+	pkru_propagate_default();
+
 #ifdef CONFIG_X86_32
 	/*
 	 * Check whether we are able to run this kernel safely on SMP.
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -465,8 +465,6 @@ static bool pku_disabled;
 
 static __always_inline void setup_pku(struct cpuinfo_x86 *c)
 {
-	struct pkru_state *pk;
-
 	/* check the boot processor, plus compile options for PKU: */
 	if (!cpu_feature_enabled(X86_FEATURE_PKU))
 		return;
@@ -477,9 +475,6 @@ static __always_inline void setup_pku(st
 		return;
 
 	cr4_set_bits(X86_CR4_PKE);
-	pk = get_xsave_addr(&init_fpstate.xsave, XFEATURE_PKRU);
-	if (pk)
-		pk->pkru = init_pkru_value;
 	/*
 	 * Setting X86_CR4_PKE will cause the X86_FEATURE_OSPKE
 	 * cpuid bit to be set.  We need to ensure that we
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -348,9 +348,6 @@ static inline void copy_init_fpstate_to_
 		copy_kernel_to_fxregs(&init_fpstate.fxsave);
 	else
 		copy_kernel_to_fregs(&init_fpstate.fsave);
-
-	if (boot_cpu_has(X86_FEATURE_OSPKE))
-		copy_init_pkru_to_fpregs();
 }
 
 /*
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -125,20 +125,20 @@ u32 init_pkru_value = PKRU_AD_KEY( 1) |
 		      PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) |
 		      PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15);
 
-/*
- * Called from the FPU code when creating a fresh set of FPU
- * registers.  This is called from a very specific context where
- * we know the FPU registers are safe for use and we can use PKRU
- * directly.
- */
-void copy_init_pkru_to_fpregs(void)
+void pkru_propagate_default(void)
 {
-	u32 init_pkru_value_snapshot = READ_ONCE(init_pkru_value);
+	struct pkru_state *pk;
+
+	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+		return;
 	/*
-	 * Override the PKRU state that came from 'init_fpstate'
-	 * with the baseline from the process.
+	 * Force XFEATURE_PKRU to be set in the header otherwise
+	 * get_xsave_addr() does not work and it needs to be set
+	 * to make XRSTOR(S) load it.
 	 */
-	write_pkru(init_pkru_value_snapshot);
+	init_fpstate.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
+	pk = get_xsave_addr(&init_fpstate.xsave, XFEATURE_PKRU);
+	pk->pkru = READ_ONCE(init_pkru_value);
 }
 
 static ssize_t init_pkru_read_file(struct file *file, char __user *user_buf,
@@ -154,10 +154,9 @@ static ssize_t init_pkru_read_file(struc
 static ssize_t init_pkru_write_file(struct file *file,
 		 const char __user *user_buf, size_t count, loff_t *ppos)
 {
-	struct pkru_state *pk;
+	u32 new_init_pkru;
 	char buf[32];
 	ssize_t len;
-	u32 new_init_pkru;
 
 	len = min(count, sizeof(buf) - 1);
 	if (copy_from_user(buf, user_buf, len))
@@ -177,10 +176,8 @@ static ssize_t init_pkru_write_file(stru
 		return -EINVAL;
 
 	WRITE_ONCE(init_pkru_value, new_init_pkru);
-	pk = get_xsave_addr(&init_fpstate.xsave, XFEATURE_PKRU);
-	if (!pk)
-		return -EINVAL;
-	pk->pkru = new_init_pkru;
+
+	pkru_propagate_default();
 	return count;
 }
 
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -44,7 +44,7 @@ static inline bool arch_pkeys_enabled(vo
 	return false;
 }
 
-static inline void copy_init_pkru_to_fpregs(void)
+static inline void pkru_propagate_default(void)
 {
 }
 

* [patch V2 05/14] x86/fpu: Limit xstate copy size in xstateregs_set()
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (3 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 04/14] x86/pkru: Make the fpinit state update work Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 06/14] x86/fpu: Sanitize xstateregs_set() Thomas Gleixner
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior, stable

If the count argument is larger than the xstate size, xstateregs_set() will
happily copy beyond the end of the xstate buffer.

Fixes: 91c3dba7dbc1 ("x86/fpu/xstate: Fix PTRACE frames for XSAVES")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/fpu/regset.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -117,7 +117,7 @@ int xstateregs_set(struct task_struct *t
 	/*
 	 * A whole standard-format XSAVE buffer is needed:
 	 */
-	if ((pos != 0) || (count < fpu_user_xstate_size))
+	if (pos != 0 || count != fpu_user_xstate_size)
 		return -EFAULT;
 
 	xsave = &fpu->state.xsave;

* [patch V2 06/14] x86/fpu: Sanitize xstateregs_set()
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (4 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 05/14] x86/fpu: Limit xstate copy size in xstateregs_set() Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-07 19:39   ` Borislav Petkov
  2021-06-05 23:47 ` [patch V2 07/14] x86/fpu: Add address range checks to copy_user_to_xstate() Thomas Gleixner
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

xstateregs_set() operates on a stopped task and tries to copy the provided
buffer into the task's fpu.state.xsave buffer.

Any error while copying, or invalid state detected after copying, results in
wiping the target task's FPU state completely, including supervisor states.

That's just wrong. The caller supplied invalid data or has a problem with
unmapped memory, so there is absolutely no justification to wreck the
target.

Fix this with the following modifications:

 1) If data has to be copied from userspace, allocate a buffer and copy from
    user first.

 2) Use copy_kernel_to_xstate() unconditionally so that header checking
    works correctly.

 3) Return on error without wrecking the target state.

This prevents corrupting supervisor states and lets the caller deal with
the problem it caused in the first place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Move the validate_user_xstate_header() static here - Borislav
    Use copy_from_user() - Dave, Yu
    Rename xbuf to tmpbuf - Borislav
---
 arch/x86/include/asm/fpu/xstate.h |    4 ---
 arch/x86/kernel/fpu/regset.c      |   41 +++++++++++++++-----------------------
 arch/x86/kernel/fpu/xstate.c      |   12 ++++++-----
 3 files changed, 24 insertions(+), 33 deletions(-)

--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -112,8 +112,4 @@ void copy_supervisor_to_kernel(struct xr
 void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask);
 void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask);
 
-
-/* Validate an xstate header supplied by userspace (ptrace or sigreturn) */
-int validate_user_xstate_header(const struct xstate_header *hdr);
-
 #endif
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -6,8 +6,12 @@
 #include <asm/fpu/signal.h>
 #include <asm/fpu/regset.h>
 #include <asm/fpu/xstate.h>
+
+#include <linux/vmalloc.h>
+
 #include <linux/sched/task_stack.h>
 
+
 /*
  * The xstateregs_active() routine is the same as the regset_fpregs_active() routine,
  * as the "regset->n" for the xstate regset will be updated based on the feature
@@ -108,7 +112,7 @@ int xstateregs_set(struct task_struct *t
 		  const void *kbuf, const void __user *ubuf)
 {
 	struct fpu *fpu = &target->thread.fpu;
-	struct xregs_state *xsave;
+	struct xregs_state *tmpbuf = NULL;
 	int ret;
 
 	if (!boot_cpu_has(X86_FEATURE_XSAVE))
@@ -120,32 +124,21 @@ int xstateregs_set(struct task_struct *t
 	if (pos != 0 || count != fpu_user_xstate_size)
 		return -EFAULT;
 
-	xsave = &fpu->state.xsave;
-
-	fpu__prepare_write(fpu);
-
-	if (using_compacted_format()) {
-		if (kbuf)
-			ret = copy_kernel_to_xstate(xsave, kbuf);
-		else
-			ret = copy_user_to_xstate(xsave, ubuf);
-	} else {
-		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
-		if (!ret)
-			ret = validate_user_xstate_header(&xsave->header);
+	if (!kbuf) {
+		tmpbuf = vmalloc(count);
+		if (!tmpbuf)
+			return -ENOMEM;
+		if (copy_from_user(tmpbuf, ubuf, count)) {
+			ret = -EFAULT;
+			goto out;
+		}
 	}
 
-	/*
-	 * mxcsr reserved bits must be masked to zero for security reasons.
-	 */
-	xsave->i387.mxcsr &= mxcsr_feature_mask;
-
-	/*
-	 * In case of failure, mark all states as init:
-	 */
-	if (ret)
-		fpstate_init(&fpu->state);
+	fpu__prepare_write(fpu);
+	ret = copy_kernel_to_xstate(&fpu->state.xsave, kbuf ?: tmpbuf);
 
+out:
+	vfree(tmpbuf);
 	return ret;
 }
 
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -515,7 +515,7 @@ int using_compacted_format(void)
 }
 
 /* Validate an xstate header supplied by userspace (ptrace or sigreturn) */
-int validate_user_xstate_header(const struct xstate_header *hdr)
+static int validate_user_xstate_header(const struct xstate_header *hdr)
 {
 	/* No unknown or supervisor features may be set */
 	if (hdr->xfeatures & ~xfeatures_mask_user())
@@ -1172,14 +1172,16 @@ int copy_kernel_to_xstate(struct xregs_s
 	 */
 	xsave->header.xfeatures |= hdr.xfeatures;
 
+	/* mxcsr reserved bits must be masked to zero for security reasons. */
+	xsave->i387.mxcsr &= mxcsr_feature_mask;
+
 	return 0;
 }
 
 /*
- * Convert from a ptrace or sigreturn standard-format user-space buffer to
- * kernel XSAVES format and copy to the target thread. This is called from
- * xstateregs_set(), as well as potentially from the sigreturn() and
- * rt_sigreturn() system calls.
+ * Convert from a sigreturn standard-format user-space buffer to kernel
+ * XSAVES format and copy to the target thread. This is called from the
+ * sigreturn() and rt_sigreturn() system calls.
  */
 int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
 {

* [patch V2 07/14] x86/fpu: Add address range checks to copy_user_to_xstate()
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (5 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 06/14] x86/fpu: Sanitize xstateregs_set() Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 08/14] x86/fpu: Move inlines where they belong Thomas Gleixner
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

From: Andy Lutomirski <luto@kernel.org>

copy_user_to_xstate() uses __copy_from_user(), which provides a negligible
speedup.  Fortunately, both call sites are at least almost correct.
__fpu__restore_sig() checks access_ok() with a length of
xstate_sigframe_size() and ptrace regset access uses fpu_user_xstate_size.
These should be valid upper bounds on the length, so, at worst, this would
cause spurious failures and not accesses to kernel memory.

Nonetheless, this is far more fragile than necessary and none of these
callers are in a hotpath. 

Use copy_from_user() instead.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/fpu/xstate.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1192,7 +1192,7 @@ int copy_user_to_xstate(struct xregs_sta
 	offset = offsetof(struct xregs_state, header);
 	size = sizeof(hdr);
 
-	if (__copy_from_user(&hdr, ubuf + offset, size))
+	if (copy_from_user(&hdr, ubuf + offset, size))
 		return -EFAULT;
 
 	if (validate_user_xstate_header(&hdr))
@@ -1207,7 +1207,7 @@ int copy_user_to_xstate(struct xregs_sta
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
 
-			if (__copy_from_user(dst, ubuf + offset, size))
+			if (copy_from_user(dst, ubuf + offset, size))
 				return -EFAULT;
 		}
 	}
@@ -1215,7 +1215,7 @@ int copy_user_to_xstate(struct xregs_sta
 	if (xfeatures_mxcsr_quirk(hdr.xfeatures)) {
 		offset = offsetof(struct fxregs_state, mxcsr);
 		size = MXCSR_AND_FLAGS_SIZE;
-		if (__copy_from_user(&xsave->i387.mxcsr, ubuf + offset, size))
+		if (copy_from_user(&xsave->i387.mxcsr, ubuf + offset, size))
 			return -EFAULT;
 	}
 

* [patch V2 08/14] x86/fpu: Move inlines where they belong
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (6 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 07/14] x86/fpu: Add address range checks to copy_user_to_xstate() Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 09/14] x86/cpu: Sanitize X86_FEATURE_OSPKE Thomas Gleixner
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

They are only used in fpstate_init() and there is no point in having them in
a header just to make reading the code harder.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/fpu/internal.h |   14 --------------
 arch/x86/kernel/fpu/core.c          |   15 +++++++++++++++
 2 files changed, 15 insertions(+), 14 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -87,20 +87,6 @@ extern void fpstate_init_soft(struct swr
 static inline void fpstate_init_soft(struct swregs_state *soft) {}
 #endif
 
-static inline void fpstate_init_xstate(struct xregs_state *xsave)
-{
-	/*
-	 * XRSTORS requires these bits set in xcomp_bv, or it will
-	 * trigger #GP:
-	 */
-	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_all;
-}
-
-static inline void fpstate_init_fxstate(struct fxregs_state *fx)
-{
-	fx->cwd = 0x37f;
-	fx->mxcsr = MXCSR_DEFAULT;
-}
 extern void fpstate_sanitize_xstate(struct fpu *fpu);
 
 #define user_insn(insn, output, input...)				\
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -181,6 +181,21 @@ void fpu__save(struct fpu *fpu)
 	fpregs_unlock();
 }
 
+static inline void fpstate_init_xstate(struct xregs_state *xsave)
+{
+	/*
+	 * XRSTORS requires these bits set in xcomp_bv, or it will
+	 * trigger #GP:
+	 */
+	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_all;
+}
+
+static inline void fpstate_init_fxstate(struct fxregs_state *fx)
+{
+	fx->cwd = 0x37f;
+	fx->mxcsr = MXCSR_DEFAULT;
+}
+
 /*
  * Legacy x87 fpstate state init:
  */

* [patch V2 09/14] x86/cpu: Sanitize X86_FEATURE_OSPKE
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (7 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 08/14] x86/fpu: Move inlines where they belong Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 10/14] x86/fpu: Rename fpu__clear_all() to fpu_flush_thread() Thomas Gleixner
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

X86_FEATURE_OSPKE is enabled first on the boot CPU and the feature flag is
set. Secondary CPUs have to enable CR4.PKE as well and set their per-CPU
feature flag. Setting the per-CPU flag is pointless because all call sites
check boot_cpu_data anyway.

Make it smarter and force the feature flag when PKU is enabled on the boot
CPU, which then allows using cpu_feature_enabled(X86_FEATURE_OSPKE) all
over the place. That either compiles the code out when PKEY support is
disabled in Kconfig or uses static_cpu_has() for the feature check, which
makes a significant difference in hot paths, e.g. context switch.
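
For reference, cpu_feature_enabled() expands to roughly the following
(sketch; the real definition lives in arch/x86/include/asm/cpufeature.h):

    #define cpu_feature_enabled(bit)                                    \
            (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ?  \
             0 : static_cpu_has(bit))

i.e. the compiler can drop the code entirely when the feature is disabled in
Kconfig, and otherwise the check becomes an alternatives-patched static
branch instead of a load from boot_cpu_data.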

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/pgtable.h |    4 ++--
 arch/x86/include/asm/pkeys.h   |    8 ++++----
 arch/x86/kernel/cpu/common.c   |   24 +++++++++++-------------
 arch/x86/kernel/fpu/xstate.c   |    2 +-
 arch/x86/mm/fault.c            |    2 +-
 5 files changed, 19 insertions(+), 21 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -129,7 +129,7 @@ static inline int pte_dirty(pte_t pte)
 
 static inline u32 read_pkru(void)
 {
-	if (boot_cpu_has(X86_FEATURE_OSPKE))
+	if (cpu_feature_enabled(X86_FEATURE_OSPKE))
 		return rdpkru();
 	return 0;
 }
@@ -138,7 +138,7 @@ static inline void write_pkru(u32 pkru)
 {
 	struct pkru_state *pk;
 
-	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
 		return;
 
 	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -9,14 +9,14 @@
  * will be necessary to ensure that the types that store key
  * numbers and masks have sufficient capacity.
  */
-#define arch_max_pkey() (boot_cpu_has(X86_FEATURE_OSPKE) ? 16 : 1)
+#define arch_max_pkey() (cpu_feature_enabled(X86_FEATURE_OSPKE) ? 16 : 1)
 
 extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val);
 
 static inline bool arch_pkeys_enabled(void)
 {
-	return boot_cpu_has(X86_FEATURE_OSPKE);
+	return cpu_feature_enabled(X86_FEATURE_OSPKE);
 }
 
 /*
@@ -26,7 +26,7 @@ static inline bool arch_pkeys_enabled(vo
 extern int __execute_only_pkey(struct mm_struct *mm);
 static inline int execute_only_pkey(struct mm_struct *mm)
 {
-	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
 		return ARCH_DEFAULT_PKEY;
 
 	return __execute_only_pkey(mm);
@@ -37,7 +37,7 @@ extern int __arch_override_mprotect_pkey
 static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
 		int prot, int pkey)
 {
-	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
 		return 0;
 
 	return __arch_override_mprotect_pkey(vma, prot, pkey);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -465,22 +465,20 @@ static bool pku_disabled;
 
 static __always_inline void setup_pku(struct cpuinfo_x86 *c)
 {
-	/* check the boot processor, plus compile options for PKU: */
-	if (!cpu_feature_enabled(X86_FEATURE_PKU))
-		return;
-	/* checks the actual processor's cpuid bits: */
-	if (!cpu_has(c, X86_FEATURE_PKU))
-		return;
-	if (pku_disabled)
+	if (c == &boot_cpu_data) {
+		if (pku_disabled || !cpu_feature_enabled(X86_FEATURE_PKU))
+			return;
+		/*
+		 * Setting CR4.PKE will cause the X86_FEATURE_OSPKE cpuid
+		 * bit to be set.  Enforce it.
+		 */
+		setup_force_cpu_cap(X86_FEATURE_OSPKE);
+
+	} else if (!cpu_feature_enabled(X86_FEATURE_OSPKE)) {
 		return;
+	}
 
 	cr4_set_bits(X86_CR4_PKE);
-	/*
-	 * Setting X86_CR4_PKE will cause the X86_FEATURE_OSPKE
-	 * cpuid bit to be set.  We need to ensure that we
-	 * update that bit in this CPU's "cpu_info".
-	 */
-	set_cpu_cap(c, X86_FEATURE_OSPKE);
 }
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1002,7 +1002,7 @@ int arch_set_user_pkey_access(struct tas
 	 * This check implies XSAVE support.  OSPKE only gets
 	 * set if we enable XSAVE and we enable PKU in XCR0.
 	 */
-	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
 		return -EINVAL;
 
 	/*
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -875,7 +875,7 @@ static inline bool bad_area_access_from_
 	/* This code is always called on the current mm */
 	bool foreign = false;
 
-	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
 		return false;
 	if (error_code & X86_PF_PK)
 		return true;

* [patch V2 10/14] x86/fpu: Rename fpu__clear_all() to fpu_flush_thread()
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (8 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 09/14] x86/cpu: Sanitize X86_FEATURE_OSPKE Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 11/14] x86/pkru: Provide pkru_get_init_value() Thomas Gleixner
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

Make it clear what the function is about.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/fpu/internal.h |    3 ++-
 arch/x86/kernel/fpu/core.c          |    4 ++--
 arch/x86/kernel/process.c           |    2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -33,9 +33,10 @@ extern int  fpu__restore_sig(void __user
 extern void fpu__drop(struct fpu *fpu);
 extern int  fpu__copy(struct task_struct *dst, struct task_struct *src);
 extern void fpu__clear_user_states(struct fpu *fpu);
-extern void fpu__clear_all(struct fpu *fpu);
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 
+extern void fpu_flush_thread(void);
+
 /*
  * Boot time FPU initialization functions:
  */
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -402,9 +402,9 @@ void fpu__clear_user_states(struct fpu *
 	fpu__clear(fpu, true);
 }
 
-void fpu__clear_all(struct fpu *fpu)
+void fpu_flush_thread(void)
 {
-	fpu__clear(fpu, false);
+	fpu__clear(&current->thread.fpu, false);
 }
 
 /*
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -206,7 +206,7 @@ void flush_thread(void)
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
-	fpu__clear_all(&tsk->thread.fpu);
+	fpu_flush_thread();
 }
 
 void disable_TSC(void)

* [patch V2 11/14] x86/pkru: Provide pkru_get_init_value()
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (9 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 10/14] x86/fpu: Rename fpu__clear_all() to fpu_flush_thread() Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 12/14] x86/fpu: Clean up the fpu__clear() variants Thomas Gleixner
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

When CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled, the following
code fails to compile:

     if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
     	u32 pkru = READ_ONCE(init_pkru_value);
	..
     }

because init_pkru_value is defined as '0' which makes READ_ONCE() upset.

Provide an accessor macro to avoid #ifdeffery all over the place.
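
A later patch in this series uses the accessor in the new pkru_flush_thread()
like this:

     if (cpu_feature_enabled(X86_FEATURE_OSPKE))
             __write_pkru(pkru_get_init_value());

which compiles in both configurations because the !PKEYS variant is a plain
constant 0 instead of a READ_ONCE() on a literal.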

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/pgtable.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1366,8 +1366,10 @@ static inline pmd_t pmd_swp_clear_uffd_w
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 extern u32 init_pkru_value;
+#define pkru_get_init_value()	READ_ONCE(init_pkru_value)
 #else
 #define init_pkru_value	0
+#define pkru_get_init_value()	0
 #endif
 
 static inline bool __pkru_allows_read(u32 pkru, u16 pkey)

* [patch V2 12/14] x86/fpu: Clean up the fpu__clear() variants
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (10 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 11/14] x86/pkru: Provide pkru_get_init_value() Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 13/14] x86/fpu: Rename xstate copy functions which are related to UABI Thomas Gleixner
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

From: Andy Lutomirski <luto@kernel.org>

fpu__clear() currently resets both register state and kernel XSAVE buffer
state.  It has two modes: one for all state (supervisor and user) and
another for user state only.  fpu__clear_all() uses the "all state"
(user_only=0) mode, while a number of signal paths use the user_only=1
mode.

Make fpu__clear() work only for user state (user_only=1) and remove the
"all state" (user_only=0) code.  Rename it to match so it can be used by
the signal paths.

Replace the "all state" (user_only=0) fpu__clear() functionality.  Use the
TIF_NEED_FPU_LOAD mechanism instead of making any actual hardware register
changes in this path.

Instead of invoking fpu__initialize(), just memcpy() init_fpstate into the
task's FPU state because it already has the correct format and, in the case
of PKRU, also contains the default PKRU value. Move the actual PKRU write out
into flush_thread() where it belongs and where it will end up anyway once
Dave disentangles PKRU and XSTATE.

[ Dave Hansen: Polished changelog ]
[ tglx: Fixed the PKRU fallout ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fixed PKRU fallout reported by Yu-cheng
---
 arch/x86/kernel/fpu/core.c |  101 +++++++++++++++++++++++++++++----------------
 arch/x86/kernel/process.c  |   12 +++++
 2 files changed, 78 insertions(+), 35 deletions(-)

--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -269,19 +269,6 @@ int fpu__copy(struct task_struct *dst, s
 }
 
 /*
- * Activate the current task's in-memory FPU context,
- * if it has not been used before:
- */
-static void fpu__initialize(struct fpu *fpu)
-{
-	WARN_ON_FPU(fpu != &current->thread.fpu);
-
-	set_thread_flag(TIF_NEED_FPU_LOAD);
-	fpstate_init(&fpu->state);
-	trace_x86_fpu_init_state(fpu);
-}
-
-/*
  * This function must be called before we read a task's fpstate.
  *
  * There's two cases where this gets called:
@@ -365,46 +352,90 @@ static inline void copy_init_fpstate_to_
 		copy_kernel_to_fregs(&init_fpstate.fsave);
 }
 
+static inline unsigned int init_fpstate_copy_size(void)
+{
+	if (!use_xsave())
+		return fpu_kernel_xstate_size;
+
+	/* XSAVE(S) without PKRU just needs the xstate part */
+	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
+		return sizeof(init_fpstate.xsave);
+
+	/*
+	 * With PKRU it needs the PKRU component storage as well.  Just
+	 * copy the full size for now as this will go away soon when PKRU
+	 * has been distangled from XSTATE.
+	 */
+	return min((unsigned int)sizeof(init_fpstate), fpu_kernel_xstate_size);
+}
+
 /*
- * Clear the FPU state back to init state.
- *
- * Called by sys_execve(), by the signal handler code and by various
- * error paths.
+ * Reset current->fpu memory state to the init values.
  */
-static void fpu__clear(struct fpu *fpu, bool user_only)
+static void fpu_reset_fpstate(void)
+{
+	struct fpu *fpu= &current->thread.fpu;
+
+	fpregs_lock();
+	fpu__drop(fpu);
+	/*
+	 * This does not change the actual hardware registers. It just
+	 * resets the memory image and sets TIF_NEED_FPU_LOAD so a
+	 * subsequent return to usermode will reload the registers from the
+	 * tasks memory image.
+	 *
+	 * Do not use fpstate_init() here. Just copy init_fpstate which has
+	 * the correct content already and if PKRU is enabled also contains
+	 * the default PKRU value.
+	 */
+	memcpy(&fpu->state, &init_fpstate, init_fpstate_copy_size());
+	set_thread_flag(TIF_NEED_FPU_LOAD);
+	fpregs_unlock();
+}
+
+/*
+ * Reset current's user FPU states to the init states.  current's
+ * supervisor states, if any, are not modified by this function.  The
+ * caller guarantees that the XSTATE header in memory is intact.
+ */
+void fpu__clear_user_states(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
 	if (!static_cpu_has(X86_FEATURE_FPU)) {
-		fpu__drop(fpu);
-		fpu__initialize(fpu);
+		fpu_reset_fpstate();
 		return;
 	}
 
 	fpregs_lock();
-
-	if (user_only) {
-		if (!fpregs_state_valid(fpu, smp_processor_id()) &&
-		    xfeatures_mask_supervisor())
-			copy_kernel_to_xregs(&fpu->state.xsave,
-					     xfeatures_mask_supervisor());
-		copy_init_fpstate_to_fpregs(xfeatures_mask_user());
-	} else {
-		copy_init_fpstate_to_fpregs(xfeatures_mask_all);
+	/*
+	 * Ensure that current's supervisor states are loaded into their
+	 * corresponding registers.
+	 */
+	if (xfeatures_mask_supervisor() &&
+	    !fpregs_state_valid(fpu, smp_processor_id())) {
+		copy_kernel_to_xregs(&fpu->state.xsave,
+				     xfeatures_mask_supervisor());
 	}
 
+	/* Reset user states in registers. */
+	copy_init_fpstate_to_fpregs(xfeatures_mask_user());
+
+	/*
+	 * Now all FPU registers have their desired values.  Inform the FPU
+	 * state machine that current's FPU registers are in the hardware
+	 * registers. The memory image does not need to be updated because
+	 * any operation relying on it has to save the registers first when
+	 * currents FPU is marked active.
+	 */
 	fpregs_mark_activate();
-	fpregs_unlock();
-}
 
-void fpu__clear_user_states(struct fpu *fpu)
-{
-	fpu__clear(fpu, true);
+	fpregs_unlock();
 }
 
 void fpu_flush_thread(void)
 {
-	fpu__clear(&current->thread.fpu, false);
+	fpu_reset_fpstate();
 }
 
 /*
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -199,6 +199,17 @@ int copy_thread(unsigned long clone_flag
 	return ret;
 }
 
+static void pkru_flush_thread(void)
+{
+	/*
+	 * If PKRU is enabled the default PKRU value has to be loaded into
+	 * the hardware right here (similar to context switch). The XSAVE
+	 * state is already up to date.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_OSPKE))
+		__write_pkru(pkru_get_init_value());
+}
+
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
@@ -207,6 +218,7 @@ void flush_thread(void)
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
 	fpu_flush_thread();
+	pkru_flush_thread();
 }
 
 void disable_TSC(void)

* [patch V2 13/14] x86/fpu: Rename xstate copy functions which are related to UABI
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (11 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 12/14] x86/fpu: Clean up the fpu__clear() variants Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-05 23:47 ` [patch V2 14/14] x86/fpu: Deduplicate copy_uabi_from_user/kernel_to_xstate() Thomas Gleixner
  2021-06-07 13:02 ` [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

Rename them to reflect that these functions deal with user space format
XSAVE buffers.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/fpu/xstate.h |    6 +++---
 arch/x86/kernel/fpu/regset.c      |    4 ++--
 arch/x86/kernel/fpu/signal.c      |    2 +-
 arch/x86/kernel/fpu/xstate.c      |    7 ++++---
 4 files changed, 10 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -105,9 +105,9 @@ const void *get_xsave_field_ptr(int xfea
 int using_compacted_format(void);
 int xfeature_size(int xfeature_nr);
 struct membuf;
-void copy_xstate_to_kernel(struct membuf to, struct xregs_state *xsave);
-int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf);
-int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf);
+void copy_uabi_xstate_to_kernel(struct membuf to, struct xregs_state *xsave);
+int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf);
+int copy_uabi_from_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf);
 void copy_supervisor_to_kernel(struct xregs_state *xsave);
 void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask);
 void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask);
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -89,7 +89,7 @@ int xstateregs_get(struct task_struct *t
 	fpu__prepare_read(fpu);
 
 	if (using_compacted_format()) {
-		copy_xstate_to_kernel(to, xsave);
+		copy_uabi_xstate_to_kernel(to, xsave);
 		return 0;
 	} else {
 		fpstate_sanitize_xstate(fpu);
@@ -135,7 +135,7 @@ int xstateregs_set(struct task_struct *t
 	}
 
 	fpu__prepare_write(fpu);
-	ret = copy_kernel_to_xstate(&fpu->state.xsave, kbuf ?: tmpbuf);
+	ret = copy_uabi_from_kernel_to_xstate(&fpu->state.xsave, kbuf ?: tmpbuf);
 
 out:
 	vfree(tmpbuf);
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -426,7 +426,7 @@ static int __fpu__restore_sig(void __use
 	if (use_xsave() && !fx_only) {
 		u64 init_bv = xfeatures_mask_user() & ~user_xfeatures;
 
-		ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
+		ret = copy_uabi_from_user_to_xstate(&fpu->state.xsave, buf_fx);
 		if (ret)
 			goto err_out;
 
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1072,7 +1072,7 @@ static void copy_part(struct membuf *to,
  * It supports partial copy but pos always starts from zero. This is called
  * from xstateregs_get() and there we check the CPU has XSAVES.
  */
-void copy_xstate_to_kernel(struct membuf to, struct xregs_state *xsave)
+void copy_uabi_xstate_to_kernel(struct membuf to, struct xregs_state *xsave)
 {
 	struct xstate_header header;
 	const unsigned off_mxcsr = offsetof(struct fxregs_state, mxcsr);
@@ -1128,7 +1128,7 @@ void copy_xstate_to_kernel(struct membuf
  * Convert from a ptrace standard-format kernel buffer to kernel XSAVES format
  * and copy to the target thread. This is called from xstateregs_set().
  */
-int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
+int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
 {
 	unsigned int offset, size;
 	int i;
@@ -1183,7 +1183,8 @@ int copy_kernel_to_xstate(struct xregs_s
  * XSAVES format and copy to the target thread. This is called from the
  * sigreturn() and rt_sigreturn() system calls.
  */
-int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
+int copy_uabi_from_user_to_xstate(struct xregs_state *xsave,
+				  const void __user *ubuf)
 {
 	unsigned int offset, size;
 	int i;

* [patch V2 14/14] x86/fpu: Deduplicate copy_uabi_from_user/kernel_to_xstate()
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (12 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 13/14] x86/fpu: Rename xstate copy functions which are related to UABI Thomas Gleixner
@ 2021-06-05 23:47 ` Thomas Gleixner
  2021-06-07 13:02 ` [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
  14 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-05 23:47 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

copy_uabi_from_user_to_xstate() and copy_uabi_from_kernel_to_xstate() are
almost identical except for the copy function.

Unify them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
---
V2: Adjusted to the rename.
---
 arch/x86/kernel/fpu/xstate.c |   83 +++++++++++++++----------------------------
 1 file changed, 29 insertions(+), 54 deletions(-)

--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1124,20 +1124,30 @@ void copy_uabi_xstate_to_kernel(struct m
 	fill_gap(&to, &last, size);
 }
 
-/*
- * Convert from a ptrace standard-format kernel buffer to kernel XSAVES format
- * and copy to the target thread. This is called from xstateregs_set().
- */
-int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
+static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
+			    const void *kbuf, const void __user *ubuf)
+{
+	if (kbuf) {
+		memcpy(dst, kbuf + offset, size);
+	} else {
+		if (copy_from_user(dst, ubuf + offset, size))
+			return -EFAULT;
+	}
+	return 0;
+}
+
+static int copy_uabi_to_xstate(struct xregs_state *xsave, const void *kbuf,
+			       const void __user *ubuf)
 {
 	unsigned int offset, size;
-	int i;
 	struct xstate_header hdr;
+	int i;
 
 	offset = offsetof(struct xregs_state, header);
 	size = sizeof(hdr);
 
-	memcpy(&hdr, kbuf + offset, size);
+	if (copy_from_buffer(&hdr, offset, size, kbuf, ubuf))
+		return -EFAULT;
 
 	if (validate_user_xstate_header(&hdr))
 		return -EINVAL;
@@ -1151,7 +1161,8 @@ int copy_uabi_from_kernel_to_xstate(stru
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
 
-			memcpy(dst, kbuf + offset, size);
+			if (copy_from_buffer(dst, offset, size, kbuf, ubuf))
+				return -EFAULT;
 		}
 	}
 
@@ -1179,6 +1190,15 @@ int copy_uabi_from_kernel_to_xstate(stru
 }
 
 /*
+ * Convert from a ptrace standard-format kernel buffer to kernel XSAVES format
+ * and copy to the target thread. This is called from xstateregs_set().
+ */
+int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
+{
+	return copy_uabi_to_xstate(xsave, kbuf, NULL);
+}
+
+/*
  * Convert from a sigreturn standard-format user-space buffer to kernel
  * XSAVES format and copy to the target thread. This is called from the
  * sigreturn() and rt_sigreturn() system calls.
@@ -1186,52 +1206,7 @@ int copy_uabi_from_kernel_to_xstate(stru
 int copy_uabi_from_user_to_xstate(struct xregs_state *xsave,
 				  const void __user *ubuf)
 {
-	unsigned int offset, size;
-	int i;
-	struct xstate_header hdr;
-
-	offset = offsetof(struct xregs_state, header);
-	size = sizeof(hdr);
-
-	if (copy_from_user(&hdr, ubuf + offset, size))
-		return -EFAULT;
-
-	if (validate_user_xstate_header(&hdr))
-		return -EINVAL;
-
-	for (i = 0; i < XFEATURE_MAX; i++) {
-		u64 mask = ((u64)1 << i);
-
-		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, i);
-
-			offset = xstate_offsets[i];
-			size = xstate_sizes[i];
-
-			if (copy_from_user(dst, ubuf + offset, size))
-				return -EFAULT;
-		}
-	}
-
-	if (xfeatures_mxcsr_quirk(hdr.xfeatures)) {
-		offset = offsetof(struct fxregs_state, mxcsr);
-		size = MXCSR_AND_FLAGS_SIZE;
-		if (copy_from_user(&xsave->i387.mxcsr, ubuf + offset, size))
-			return -EFAULT;
-	}
-
-	/*
-	 * The state that came in from userspace was user-state only.
-	 * Mask all the user states out of 'xfeatures':
-	 */
-	xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR_ALL;
-
-	/*
-	 * Add back in the features that came in from userspace:
-	 */
-	xsave->header.xfeatures |= hdr.xfeatures;
-
-	return 0;
+	return copy_uabi_to_xstate(xsave, NULL, ubuf);
 }
 
 /*



* Re: [patch V2 02/14] x86/fpu: Prevent state corruption in __fpu__restore_sig()
  2021-06-05 23:47 ` [patch V2 02/14] x86/fpu: Prevent state corruption in __fpu__restore_sig() Thomas Gleixner
@ 2021-06-07  8:49   ` Borislav Petkov
  0 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2021-06-07  8:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior,
	syzbot+2067e764dbcd10721e2e, stable

On Sun, Jun 06, 2021 at 01:47:44AM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The non-compacted slowpath uses __copy_from_user() and copies the entire
> user buffer into the kernel buffer, verbatim.  This means that the kernel
> buffer may now contain entirely invalid state on which XRSTOR will #GP.
> validate_user_xstate_header() can detect some of that corruption, but that
> leaves the onus on callers to clear the buffer.
> 
> Prior to XSAVES support it was possible just to reinitialize the buffer,
> completely, but with supervisor states that is not longer possible as the
> buffer clearing code split got it backwards. Fixing that is possible, but
> not corrupting the state in the first place is more robust.
> 
> Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate()
> which validates the XSAVE header contents before copying the actual states
> to the kernel. copy_user_to_xstate() was previously only called for
> compacted-format kernel buffers, but it works for both compacted and
> non-compacted forms.
> 
> Using it for the non-compacted form is slower because of multiple
> __copy_from_user() operations, but that cost is less important than robust
> code in an already slow path.
> 
> [ Changelog polished by Dave Hansen ]
> 
> Fixes: b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
> Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
> V2: Removed the make validate_user_xstate_header() static hunks (Borislav)
> ---
>  arch/x86/kernel/fpu/signal.c |    9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)

Very nice.

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage
  2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
                   ` (13 preceding siblings ...)
  2021-06-05 23:47 ` [patch V2 14/14] x86/fpu: Deduplicate copy_uabi_from_user/kernel_to_xstate() Thomas Gleixner
@ 2021-06-07 13:02 ` Thomas Gleixner
  2021-06-07 13:36   ` Dave Hansen
  14 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-07 13:02 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

On Sun, Jun 06 2021 at 01:47, Thomas Gleixner wrote:
> P.S: I picked up Dave's PKRU series from

and while looking deeper at that I found the following existing
inconsistency vs. PKRU:

On exec() the kernel sets PKRU to the default value which is as
restrictive as possible.

But on sigrestore, when the init optimization is used, PKRU is reset
to the hardware default (0), which is as permissive as possible:

  1) On XRSTOR from user when the signal handler has cleared the PKRU
     feature bit in xsave.header.xfeatures. That's true for the fast
     and the slow path.

  2) For the fx_only case which loads the init_fpstate (with my fixes
     applied that's no longer the case)

So that's inconsistent at best.

As the kernel defines the "init" behaviour of PKRU as non-permissive
on exec(), it should do so consistently in sigrestore as well.
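
Illustrative aside, not part of the series: the restrictive exec()
default is easy to observe from user space with a minimal probe,
assuming an OSPKE capable CPU, a pkeys enabled kernel and a compiler
providing the PKU intrinsics (build with gcc -mpku):

#include <stdio.h>
#include <immintrin.h>

int main(void)
{
	/*
	 * A freshly exec'ed process sees the kernel's restrictive default
	 * (0x55555554 with the stock init_pkru_value), not the permissive
	 * hardware init value 0.
	 */
	printf("PKRU after exec: 0x%x\n", _rdpkru_u32());
	return 0;
}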

Thoughts?

Thanks,

        tglx




* Re: [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage
  2021-06-07 13:02 ` [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
@ 2021-06-07 13:36   ` Dave Hansen
  2021-06-07 14:08     ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2021-06-07 13:36 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

On 6/7/21 6:02 AM, Thomas Gleixner wrote:
> On Sun, Jun 06 2021 at 01:47, Thomas Gleixner wrote:
>> P.S: I picked up Dave's PKRU series from
> and while looking deeper at that I found the following existing
> inconsistency vs. PKRU:
> 
> On exec() the kernel sets PKRU to the default value which is as
> restrictive as possible.
> 
> But on sigrestore, when the init optimization is used, PKRU is reset
> to the hardware default (0), which is as permissive as possible:
> 
>   1) On XRSTOR from user when the signal handler has cleared the PKRU
>      feature bit in xsave.header.xfeatures. That's true for the fast
>      and the slow path.
> 
>   2) For the fx_only case which loads the init_fpstate (with my fixes
>      applied that's no longer the case)
> 
> So that's inconsistent at best.
> 
> As the kernel defines the "init" behaviour of PKRU as non-permissive
> on exec(), it should do so consistently in sigrestore as well.

Agreed.  The kernel has essentially picked its own init value for PKRU,
separate from the hardware init value.  It should be applied consistently.

By the way, are you talking specifically about the _error_ paths where
the kernel is unable to XRSTOR the signal XSAVE buffer for some reason,
and tries to apply either init_fpu or the hardware init state instead?


* Re: [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage
  2021-06-07 13:36   ` Dave Hansen
@ 2021-06-07 14:08     ` Thomas Gleixner
  2021-06-07 16:38       ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-07 14:08 UTC (permalink / raw)
  To: Dave Hansen, LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

On Mon, Jun 07 2021 at 06:36, Dave Hansen wrote:
> On 6/7/21 6:02 AM, Thomas Gleixner wrote:
>> On exec() the kernel sets PKRU to the default value which is as
>> restrictive as possible.
>> 
>> But on sigrestore, when the init optimization is used, PKRU is reset
>> to the hardware default (0), which is as permissive as possible:
>> 
>>   1) On XRSTOR from user when the signal handler has cleared the PKRU
>>      feature bit in xsave.header.xfeatures. That's true for the fast
>>      and the slow path.
>> 
>>   2) For the fx_only case which loads the init_fpstate (with my fixes
>>      applied that's no longer the case)
>> 
>> So that's inconsistent at best.
>> 
>> As the kernel defines the "init" behaviour of PKRU as non-permissive
>> on exec(), it should do so consistently in sigrestore as well.
>
> Agreed.  The kernel has essentially picked its own init value for PKRU,
> separate from the hardware init value.  It should be applied consistently.
>
> By the way, are you talking specifically about the _error_ paths where
> the kernel is unable to XRSTOR the signal XSAVE buffer for some reason,
> and tries to apply either init_fpu or the hardware init state instead?

1) Successful XRSTOR from user if the PKRU feature bit in the
   sigframe xsave.header.xfeatures is cleared. Both fast and slow path.

2) Successful XRSTOR from user if the PKRU feature bit is cleared in
   fx_sw_user.xfeatures as that loads init_fpstate which is busted in
   mainline. Both fast and slow path. Patch 4/14 of this series fixes this.

3) fx_only both in the fast and slow path are broken without 4/14

4) The error handling path (failed restore) is actually doing the right
   thing.

The slow path and the fx_only/fx_sw_user.xfeatures cases are trivial
to fix.

The fast path not so much, because XRSTOR operates directly on the
sigframe and there is no information about the header.xfeatures
content. So to fix this, we need to get header.xfeatures from user
before the XRSTOR and then handle the case when the PKRU bit is clear.
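
Rough sketch of the kind of fixup needed after a successful fast path
XRSTOR (illustrative only, not the actual patch; the function name is
made up, pkru_get_init_value() is the helper which patch 11/14 adds):

/* Called after the fast path XRSTOR from the user sigframe succeeded */
static int pkru_fixup_after_sigrestore(struct xregs_state __user *buf)
{
	u64 xfeatures;

	if (__get_user(xfeatures, &buf->header.xfeatures))
		return -EFAULT;

	/*
	 * If the signal handler cleared the PKRU bit, XRSTOR put PKRU into
	 * the permissive hardware init state. Override that with the
	 * kernel's restrictive default.
	 */
	if (!(xfeatures & XFEATURE_MASK_PKRU))
		write_pkru(pkru_get_init_value());

	return 0;
}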

Thanks

        tglx

* Re: [patch V2 04/14] x86/pkru: Make the fpinit state update work
  2021-06-05 23:47 ` [patch V2 04/14] x86/pkru: Make the fpinit state update work Thomas Gleixner
@ 2021-06-07 15:18   ` Borislav Petkov
  0 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2021-06-07 15:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior, stable

On Sun, Jun 06, 2021 at 01:47:46AM +0200, Thomas Gleixner wrote:
> @@ -120,6 +121,8 @@ void __init check_bugs(void)
>  
>  	arch_smt_update();
>  
> +	pkru_propagate_default();

I guess this fits better at the end of identify_boot_cpu(), which is
pretty close to here, in the boot order.

Regardless, that function check_bugs() needs cleaning up as it has
collected a lot more stuff than just checking the nasty bugs.

> +void pkru_propagate_default(void)
>  {
> -	u32 init_pkru_value_snapshot = READ_ONCE(init_pkru_value);
> +	struct pkru_state *pk;
> +
> +	if (!boot_cpu_has(X86_FEATURE_OSPKE))

cpu_feature_enabled()

> +		return;
>  	/*
> -	 * Override the PKRU state that came from 'init_fpstate'
> -	 * with the baseline from the process.
> +	 * Force XFEATURE_PKRU to be set in the header otherwise
> +	 * get_xsave_addr() does not work and it needs to be set
> +	 * to make XRSTOR(S) load it.
>  	 */
> -	write_pkru(init_pkru_value_snapshot);
> +	init_fpstate.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
> +	pk = get_xsave_addr(&init_fpstate.xsave, XFEATURE_PKRU);
> +	pk->pkru = READ_ONCE(init_pkru_value);
>  }

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage
  2021-06-07 14:08     ` Thomas Gleixner
@ 2021-06-07 16:38       ` Dave Hansen
  2021-06-07 22:51         ` Thomas Gleixner
  2021-06-08 11:17         ` Thomas Gleixner
  0 siblings, 2 replies; 26+ messages in thread
From: Dave Hansen @ 2021-06-07 16:38 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

On 6/7/21 7:08 AM, Thomas Gleixner wrote:
>> By the way, are you talking specifically about the _error_ paths where
>> the kernel is unable to XRSTOR the signal XSAVE buffer for some reason,
>> and tries to apply either init_fpu or the hardware init state instead?
> 
> 1) Successful XRSTOR from user if the PKRU feature bit in the
>    sigframe xsave.header.xfeatures is cleared. Both fast and slow path.

It seems like the suggestion here is to inject 'init_pkru_value' in all
cases where the kernel would be injecting the hardware init value.  I
don't think we should go that far.

If a signal handler sets xsave.header.xfeatures[PKRU]=0, I can't imagine
any other intent than wanting the hardware init value.

The error cases don't have intent and the kernel is pretty free to
inject whatever sanity it deems fit to make forward progress.

> 2) Successful XRSTOR from user if the PKRU feature bit is cleared in
>    fx_sw_user.xfeatures as that loads init_fpstate which is busted in
>    mainline. Both fast and slow path. Patch 4/14 of this series fixes this.

This comes down to what happens when "xsave.header.xfeatures !=
fx_sw_user.xfeatures".  I honestly don't have the foggiest idea, and
seriously doubt anyone cares.  I'm fine with where 4/14 lands.

> 3) fx_only both in the fast and slow path are broken without 4/14

fx_only to me means that XSAVE doesn't work, either because the hardware
doesn't support it, or userspace mucked with the buffer enough so XSAVE
doesn't work.  That's pretty close to the error case, and I'm fine with
the kernel doing what it wants.

BTW, we currently zap X86_FEATURE_PKU if XSAVE is disabled.  That gets
rid of at least a few of the nasty fx_only cases.


* Re: [patch V2 06/14] x86/fpu: Sanitize xstateregs_set()
  2021-06-05 23:47 ` [patch V2 06/14] x86/fpu: Sanitize xstateregs_set() Thomas Gleixner
@ 2021-06-07 19:39   ` Borislav Petkov
  0 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2021-06-07 19:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

On Sun, Jun 06, 2021 at 01:47:48AM +0200, Thomas Gleixner wrote:
> xstateregs_set() operates on a stopped task and tries to copy the provided
> buffer into the tasks fpu.state.xsave buffer.
> 
> Any error while copying or invalid state detected after copying results in
> wiping the target tasks FPU state completely including supervisor states.

Again, "task's" in both above pls.

> That's just wrong. The caller supplied invalid data or has a problem with
> unmapped memory, so there is absolutely no justification to wreckage the

Yeah, as amluto already pointed out "wreck the target".

> target.
> 
> Fix this with the following modifications:
> 
>  1) If data has to be copied from userspace, allocate a buffer and copy from
>     user first.
> 
>  2) Use copy_kernel_to_xstate() unconditionally so that header checking
>     works correctly.
> 
>  3) Return on error without wreckaging the target state.

"wrecking"

> 
> This prevents corrupting supervisor states and lets the caller deal with
> the problem it caused in the first place.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Move the validate_user_xstate_header() static here - Borislav
>     Use copy_from_user() - Dave, Yu
>     Rename xbuf to tmpbuf - Borislav
> ---
>  arch/x86/include/asm/fpu/xstate.h |    4 ---
>  arch/x86/kernel/fpu/regset.c      |   41 +++++++++++++++-----------------------
>  arch/x86/kernel/fpu/xstate.c      |   12 ++++++-----
>  3 files changed, 24 insertions(+), 33 deletions(-)
> 
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -112,8 +112,4 @@ void copy_supervisor_to_kernel(struct xr
>  void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask);
>  void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask);
>  
> -
> -/* Validate an xstate header supplied by userspace (ptrace or sigreturn) */
> -int validate_user_xstate_header(const struct xstate_header *hdr);
> -
>  #endif
> --- a/arch/x86/kernel/fpu/regset.c
> +++ b/arch/x86/kernel/fpu/regset.c
> @@ -6,8 +6,12 @@
>  #include <asm/fpu/signal.h>
>  #include <asm/fpu/regset.h>
>  #include <asm/fpu/xstate.h>
> +
> +#include <linux/vmalloc.h>
> +
>  #include <linux/sched/task_stack.h>

I think the linux/ namespace headers come first and then the asm/ ones.

IOW, this:

diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index cc49078b74ce..cfa242d54a26 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -2,16 +2,15 @@
 /*
  * FPU register's regset abstraction, for ptrace, core dumps, etc.
  */
+
+#include <linux/sched/task_stack.h>
+#include <linux/vmalloc.h>
+
 #include <asm/fpu/internal.h>
 #include <asm/fpu/signal.h>
 #include <asm/fpu/regset.h>
 #include <asm/fpu/xstate.h>
 
-#include <linux/vmalloc.h>
-
-#include <linux/sched/task_stack.h>
-
-
 /*
  * The xstateregs_active() routine is the same as the regset_fpregs_active() routine,
  * as the "regset->n" for the xstate regset will be updated based on the feature


>   * The xstateregs_active() routine is the same as the regset_fpregs_active() routine,
>   * as the "regset->n" for the xstate regset will be updated based on the feature
> @@ -108,7 +112,7 @@ int xstateregs_set(struct task_struct *t
>  		  const void *kbuf, const void __user *ubuf)
>  {
>  	struct fpu *fpu = &target->thread.fpu;
> -	struct xregs_state *xsave;
> +	struct xregs_state *tmpbuf = NULL;
>  	int ret;
>  
>  	if (!boot_cpu_has(X86_FEATURE_XSAVE))

Can we sneak in a switch to cpu_feature_enabled() too, while at it?

> @@ -120,32 +124,21 @@ int xstateregs_set(struct task_struct *t
>  	if (pos != 0 || count != fpu_user_xstate_size)
>  		return -EFAULT;
>  
> -	xsave = &fpu->state.xsave;
> -
> -	fpu__prepare_write(fpu);
> -
> -	if (using_compacted_format()) {
> -		if (kbuf)
> -			ret = copy_kernel_to_xstate(xsave, kbuf);
> -		else
> -			ret = copy_user_to_xstate(xsave, ubuf);
> -	} else {
> -		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
> -		if (!ret)
> -			ret = validate_user_xstate_header(&xsave->header);
> +	if (!kbuf) {
> +		tmpbuf = vmalloc(count);
> +		if (!tmpbuf)
> +			return -ENOMEM;

<---- newline here to split the vmalloc error handling from the next
step.

>  /*
> - * Convert from a ptrace or sigreturn standard-format user-space buffer to
> - * kernel XSAVES format and copy to the target thread. This is called from
> - * xstateregs_set(), as well as potentially from the sigreturn() and
> - * rt_sigreturn() system calls.
> + * Convert from a sigreturn standard-format user-space buffer to kernel

to compacted format.

> + * XSAVES format and copy to the target thread. This is called from the
> + * sigreturn() and rt_sigreturn() system calls.
>   */
>  int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
>  {
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage
  2021-06-07 16:38       ` Dave Hansen
@ 2021-06-07 22:51         ` Thomas Gleixner
  2021-06-08 14:47           ` Dave Hansen
  2021-06-08 11:17         ` Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-07 22:51 UTC (permalink / raw)
  To: Dave Hansen, LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

On Mon, Jun 07 2021 at 09:38, Dave Hansen wrote:
> On 6/7/21 7:08 AM, Thomas Gleixner wrote:
>>> By the way, are you talking specifically about the _error_ paths where
>>> the kernel is unable to XRSTOR the signal XSAVE buffer for some reason,
>>> and tries to apply either init_fpu or the hardware init state instead?
>> 
>> 1) Successful XRSTOR from user if the PKRU feature bit in the
>>    sigframe xsave.header.xfeatures is cleared. Both fast and slow path.
>
> It seems like the suggestion here is to inject 'init_pkru_value' in all
> cases where the kernel would be injecting the hardware init value.  I
> don't think we should go that far.
>
> If a signal handler sets xsave.header.xfeatures[PKRU]=0, I can't imagine
> any other intent than wanting the hardware init value.
>
> The error cases don't have intent and the kernel is pretty free to
> inject whatever sanity it deems fit to make forward progress.
>
>> 2) Successful XRSTOR from user if the PKRU feature bit is cleared in
>>    fx_sw_user.xfeatures as that loads init_fpstate which is busted in
>>    mainline. Both fast and slow path. Patch 4/14 of this series fixes this.
>
> This comes down to what happens when "xsave.header.xfeatures !=
> fx_sw_user.xfeatures".  I honestly don't have the foggiest idea, and
> seriously doubt anyone cares.  I'm fine with where 4/14 lands.
>
>> 3) fx_only both in the fast and slow path are broken without 4/14
>
> fx_only to me means that XSAVE doesn't work, either because the hardware
> doesn't support it, or userspace mucked with the buffer enough so XSAVE
> doesn't work.  That's pretty close to the error case, and I'm fine with
> the kernel doing what it wants.
>
> BTW, we currently zap X86_FEATURE_PKU if XSAVE is disabled.  That gets
> rid of at least a few of the nasty fx_only cases.

So as we further discussed on IRC:

We have a chicken-and-egg problem because the hardware init value is 0
(maximally permissive) and the kernel's initial PKRU value is minimally
permissive.

Of course we cannot make this consistent ever because the hardware does
not provide a mechanism to set a default PKRU init value, e.g. via the
obviously useful MSR_PKRU_INIT.

So whatever we do, we end up being inconsistent because a user space
XRSTOR behaves differently from a sigreturn restore in many ways, some
of which are historical nonsense, e.g. the completely obscure
fpx_sw_bytes.xfeatures handling.

Unless we revert acd547b29880 ("x86/pkeys: Default to a restrictive init
PKRU") there is no real consistent solution. But that revert makes no
sense either because that would put the burden on user space to come up
with some sensible default when initializing an application (think
libraries) and still would not give any meaningful values for the XRSTOR
case with the PKRU bit disabled in xfeatures.

Now a really useful way out of this is to remove PKRU from XSAVES
altogether, i.e. disable it in XCR0 as XSAVES managed state.

That solves a couple of problems:

  1) The issue that PKRU has to be context switched, contrary to the
     rest of XSTATE, which conflicts with the approach of restoring
     XSTATE only on return to user space.

  2) The signal restore would get rid of all consistency issues in one
     go.

  3) init_fpstate would become __ro_after_init and would not be
     involved in any of that

That opens up the opportunity to:

  1) Make the PKRU value on the signal frame independent of XSAVE init
     issues

  2) To provide a per process or per task defined PKRU default value
     which can be used e.g. for the signal handling path, which makes a
     lot of sense on its own.
  
But it creates a few new problems:

  1) Where to put the PKRU value in the sigframe?

     For 64bit sigframes that's easy as there is padding space, for
     32bit sigframes that's a problem because there is no space.

  2) Backward compatibility

     As much as we wish to have a time machine there is a rule not to
     break existing user space.

Now fortunately there is a way out:

  1) User space cannot rely on PKRU being XSTATE managed unless PKRU is
     enabled in XCR0. XCR0 enablement is part of the UABI so any
     complaint about missing XCR0 support is futile

  2) As documented in pkey_alloc(2):

     "pkey_alloc() is always safe to call regardless of whether or not
      the operating system supports protection keys.  It can be used in
      lieu of any other mechanism for detecting pkey support and will
      simply fail with the error ENOSPC if the operating system has no
      pkey support.

      The kernel guarantees that the contents of the hardware rights
      register (PKRU) will be preserved only for allocated protection
      keys.  Any time a key is unallocated (either before the first call
      returning that key from pkey_alloc() or after it is freed via
      pkey_free()), the kernel may make arbitrary changes to the parts
      of the rights register affecting access to that key."

Stupidly the pkeys(7) man page says:

      "This signal behavior is unusual and is due to the fact that the
       x86 PKRU register (which stores protection key access rights) is
       managed with the same hardware mechanism (XSAVE) that manages
       floating-point registers.  The signal behavior is the same as
       that of floating-point registers."

But that's not a show stopper IMO because the signal behaviour of XSTATE
managed floating point registers is documented to only contain the
register state which is actually XSTATE managed, i.e. the corresponding
bit is set in XCR0, which user space can retrieve via XGETBV and which is
fortunately part of the *kinda* documented ABI.

So the proposed solution is to:

   A) Exclude PKRU from XSTATE managed state, i.e. do not set the PKRU
      bit in XCR0

   B) Exclude 32bit applications on 64bit kernels from using PKEYS by
      returning an error code from pkey_alloc(). That's fine because the
      man page requires them to handle the failure, which they need to do
      anyway because 32bit kernels do not support PKEYS and never will.

   C) Replace the current context switch mechanism, which is partially
      XSAVE based, with a software managed one (a rough sketch follows
      after this list).

   D) Store the PKRU value in one of the reserved slots of the 64bit
      signal frame which is possible because of #B so that a signal
      handler has the chance to override the interrupted task's PKRU
      setting.
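
Rough sketch of what C) could look like (illustrative only, no patch;
assumes a new pkru member in thread_struct which does not exist today):

/* Software managed PKRU switch instead of relying on XRSTOR */
static inline void pkru_switch(struct task_struct *prev,
			       struct task_struct *next)
{
	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
		return;

	/* Snapshot the outgoing task's PKRU and install the incoming one */
	prev->thread.pkru = rdpkru();
	wrpkru(next->thread.pkru);
}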

Thoughts?

Thanks,

        tglx

* Re: [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage
  2021-06-07 16:38       ` Dave Hansen
  2021-06-07 22:51         ` Thomas Gleixner
@ 2021-06-08 11:17         ` Thomas Gleixner
  2021-06-08 12:27           ` Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-08 11:17 UTC (permalink / raw)
  To: Dave Hansen, LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

On Mon, Jun 07 2021 at 09:38, Dave Hansen wrote:
> On 6/7/21 7:08 AM, Thomas Gleixner wrote:
>>> By the way, are you talking specifically about the _error_ paths where
>>> the kernel is unable to XRSTOR the signal XSAVE buffer for some reason,
>>> and tries to apply either init_fpu or the hardware init state instead?
>> 
>> 1) Successful XRSTOR from user if the PKRU feature bit in the
>>    sigframe xsave.header.xfeatures is cleared. Both fast and slow path.
>
> It seems like the suggestion here is to inject 'init_pkru_value' in all
> cases where the kernel would be injecting the hardware init value.  I
> don't think we should go that far.
>
> If a signal handler sets xsave.header.xfeatures[PKRU]=0, I can't imagine
> any other intent than wanting the hardware init value.

Fine. But PKRU=0 is broken today...

T1 in user space
     wrpkru(0)

T1 -> kernel
     schedule()
       XSAVE(S) -> T1->xsave.header.xfeatures[PKRU] == 0
       T1->flags |= TIF_NEED_FPU_LOAD;
       
       wrpkru();

     schedule()
       ...
       pk = get_xsave_addr(&T1->fpu->state.xsave, XFEATURE_PKRU);
       if (pk)
         wrpkru(pk->pkru);
       else
         wrpkru(DEFAULT_PKRU);

But because T1's PKRU was 0, the xfeatures bit is 0 and therefore the
value in the xsave storage is not valid, which makes get_xsave_addr()
return NULL so that switch_to() writes the default PKRU.

So that wrecks any copy_to/from_user() on the way back to user space
which hits memory protected by the default PKRU value.

Assuming that this does not fail (pure luck), T1 goes back to user
space and because TIF_NEED_FPU_LOAD is set it ends up in

switch_fpu_return()
    __fpregs_load_activate()
      if (!fpregs_state_valid()) {
         load_XSTATE_from_task();
      }

But because nothing touched the FPU between schedule out and schedule in
of T1 the fpregs_state is valid so switch_fpu_return() does nothing and
just clears TIF_NEED_FPU_LOAD. Back to user space with DEFAULT_PKRU
loaded. FAIL!

XSTATE sucks.

Thanks,

        tglx


* Re: [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage
  2021-06-08 11:17         ` Thomas Gleixner
@ 2021-06-08 12:27           ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2021-06-08 12:27 UTC (permalink / raw)
  To: Dave Hansen, LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

On Tue, Jun 08 2021 at 13:17, Thomas Gleixner wrote:
> On Mon, Jun 07 2021 at 09:38, Dave Hansen wrote:
>> On 6/7/21 7:08 AM, Thomas Gleixner wrote:
>>>> By the way, are you talking specifically about the _error_ paths where
>>>> the kernel is unable to XRSTOR the signal XSAVE buffer for some reason,
>>>> and tries to apply either init_fpu or the hardware init state instead?
>>> 
>>> 1) Successful XRSTOR from user if the PKRU feature bit in the
>>>    sigframe xsave.header.xfeatures is cleared. Both fast and slow path.
>>
>> It seems like the suggestion here is to inject 'init_pkru_value' in all
>> cases where the kernel would be injecting the hardware init value.  I
>> don't think we should go that far.
>>
>> If a signal handler sets xsave.header.xfeatures[PKRU]=0, I can't imagine
>> any other intent than wanting the hardware init value.
>
> Fine. But PKRU=0 is broken today...
>
> T1 in user space
>      wrpkru(0)

It's actually not wrpkru(0), but an XRSTOR with the
xsave.header.xfeatures[PKRU] bit cleared. But the rest is the same...
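
Sketch of that sequence from plain user space (illustrative only;
assumes an OSPKE capable CPU, a pkeys enabled kernel, gcc -mpku for the
rdpkru intrinsic, and that 4k covers the standard format XSAVE layout
on the machine at hand):

#include <stdio.h>
#include <stdint.h>
#include <immintrin.h>

#define PKRU_FEATURE_BIT	(1u << 9)	/* XFEATURE_PKRU */

static uint8_t xbuf[4096] __attribute__((aligned(64)));

static inline void xsave(void *buf, uint32_t lo, uint32_t hi)
{
	asm volatile("xsave (%0)" : : "r" (buf), "a" (lo), "d" (hi) : "memory");
}

static inline void xrstor(void *buf, uint32_t lo, uint32_t hi)
{
	asm volatile("xrstor (%0)" : : "r" (buf), "a" (lo), "d" (hi) : "memory");
}

int main(void)
{
	/* XSTATE_BV lives right after the 512 byte legacy area */
	uint64_t *xstate_bv = (uint64_t *)(xbuf + 512);

	xsave(xbuf, PKRU_FEATURE_BIT, 0);
	*xstate_bv &= ~(uint64_t)PKRU_FEATURE_BIT;	/* clear PKRU in XSTATE_BV */
	xrstor(xbuf, PKRU_FEATURE_BIT, 0);		/* PKRU goes to its init state */

	printf("PKRU now: 0x%x\n", _rdpkru_u32());
	return 0;
}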


* Re: [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage
  2021-06-07 22:51         ` Thomas Gleixner
@ 2021-06-08 14:47           ` Dave Hansen
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2021-06-08 14:47 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Andy Lutomirski, Dave Hansen, Fenghua Yu, Tony Luck,
	Yu-cheng Yu, Sebastian Andrzej Siewior

On 6/7/21 3:51 PM, Thomas Gleixner wrote:
...
> But it creates a few new problems:
> 
>   1) Where to put the PKRU value in the sigframe?
> 
>      For 64bit sigframes that's easy as there is padding space, for
>      32bit sigframes that's a problem because there is no space.
> 
>   2) Backward compatibility
> 
>      As much as we wish to have a time machine there is a rule not to
>      break existing user space.
> 
> Now fortunately there is a way out:
> 
>   1) User space cannot rely on PKRU being XSTATE managed unless PKRU is
>      enabled in XCR0. XCR0 enablement is part of the UABI so any
>      complaint about missing XCR0 support is futile

So...  One more gem from the manpages:

> It is recommended that
>        applications wanting to use protection keys should simply call
>        pkey_alloc(2) and test whether the call succeeds, instead of
>        attempting to detect support for the feature in any other way.

I kinda wrote that thinking that folks could avoid doing the
CPUID/XGETBV dance and just use the syscall instead.  *If* they do what
is suggested, they'll never notice the lack of PKRU in XCR0.

The pkey selftest, for instance, blindly assumes that pkeys is enabled
in XCR0.  It would probably end up scribbling somewhere on the stack.
Now the same person who wrote that also wrote the manpages, so those are
not exactly two separate data points.
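
For reference, the recommended detection really is tiny (user space
sketch, assuming glibc's pkey_alloc()/pkey_free() wrappers):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
	/* Probe for pkey support via the syscall, not via CPUID/XGETBV */
	int pkey = pkey_alloc(0, 0);

	if (pkey < 0) {
		perror("pkey_alloc");	/* no pkey support or no keys left */
		return 1;
	}

	printf("pkeys supported, got key %d\n", pkey);
	pkey_free(pkey);
	return 0;
}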

...
> So the proposed solution is to:
> 
>    A) Exclude PKRU from XSTATE managed state, i.e. do not set the PKRU
>       bit in XCR0
> 
>    B) Exclude 32bit applications on 64bit kernels from using PKEYS by
>       returning an error code from pkey_alloc(). That's fine because the
>       man page requires them to handle the failure, which they need to do
>       anyway because 32bit kernels do not support PKEYS and never will.
> 
>    C) Replace the current context switch mechanism, which is partially
>       XSAVE based, with a software managed one.
> 
>    D) Store the PKRU value in one of the reserved slots of the 64bit
>       signal frame which is possible because of #B so that a signal
>       handler has the chance to override the interrupted task's PKRU
>       setting.
> 
> Thoughts?

The thing that makes me most nervous is changing the signal stack ABI
for PKRU.  Careful apps (not the selftest) will probably have more
careful enumeration and might bug out due to the missing XCR0 bit.  Or,
they might at least check xfeatures (aka. XSTATE_BV) in the signal stack
XSAVE buffer.

On the bright side, rudely masking PKRU out of XCR0:

	xcr0 &= ~XFEATURE_MASK_PKRU;

still results in a kernel that boots.


Thread overview: 26+ messages
2021-06-05 23:47 [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
2021-06-05 23:47 ` [patch V2 01/14] selftests/x86: Test signal frame XSTATE header corruption handling Thomas Gleixner
2021-06-05 23:47 ` [patch V2 02/14] x86/fpu: Prevent state corruption in __fpu__restore_sig() Thomas Gleixner
2021-06-07  8:49   ` Borislav Petkov
2021-06-05 23:47 ` [patch V2 03/14] x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer Thomas Gleixner
2021-06-05 23:47 ` [patch V2 04/14] x86/pkru: Make the fpinit state update work Thomas Gleixner
2021-06-07 15:18   ` Borislav Petkov
2021-06-05 23:47 ` [patch V2 05/14] x86/fpu: Limit xstate copy size in xstateregs_set() Thomas Gleixner
2021-06-05 23:47 ` [patch V2 06/14] x86/fpu: Sanitize xstateregs_set() Thomas Gleixner
2021-06-07 19:39   ` Borislav Petkov
2021-06-05 23:47 ` [patch V2 07/14] x86/fpu: Add address range checks to copy_user_to_xstate() Thomas Gleixner
2021-06-05 23:47 ` [patch V2 08/14] x86/fpu: Move inlines where they belong Thomas Gleixner
2021-06-05 23:47 ` [patch V2 09/14] x86/cpu: Sanitize X86_FEATURE_OSPKE Thomas Gleixner
2021-06-05 23:47 ` [patch V2 10/14] x86/fpu: Rename fpu__clear_all() to fpu_flush_thread() Thomas Gleixner
2021-06-05 23:47 ` [patch V2 11/14] x86/pkru: Provide pkru_get_init_value() Thomas Gleixner
2021-06-05 23:47 ` [patch V2 12/14] x86/fpu: Clean up the fpu__clear() variants Thomas Gleixner
2021-06-05 23:47 ` [patch V2 13/14] x86/fpu: Rename xstate copy functions which are related to UABI Thomas Gleixner
2021-06-05 23:47 ` [patch V2 14/14] x86/fpu: Deduplicate copy_uabi_from_user/kernel_to_xstate() Thomas Gleixner
2021-06-07 13:02 ` [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage Thomas Gleixner
2021-06-07 13:36   ` Dave Hansen
2021-06-07 14:08     ` Thomas Gleixner
2021-06-07 16:38       ` Dave Hansen
2021-06-07 22:51         ` Thomas Gleixner
2021-06-08 14:47           ` Dave Hansen
2021-06-08 11:17         ` Thomas Gleixner
2021-06-08 12:27           ` Thomas Gleixner
