All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH fsgsbase v2 0/4] x86/fsgsbase: Some fixes
@ 2020-06-26 17:24 Andy Lutomirski
  2020-06-26 17:24 ` [PATCH fsgsbase v2 1/4] selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test Andy Lutomirski
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Andy Lutomirski @ 2020-06-26 17:24 UTC (permalink / raw)
  To: x86
  Cc: Sasha Levin, linux-kernel, Andrew Cooper, Juergen Gross, Andy Lutomirski

These apply to tip:x86/fsgsbase.

The first two are trivial selftest fixes.  The third one fixes an actual
regression.  The forth one fixes a crash on boot on Xen PV.

Andy Lutomirski (4):
  selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test
  selftests/x86/fsgsbase: Add a missing memory constraint
  x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
  x86/fsgsbase: Fix Xen PV support

 arch/x86/include/asm/fsgsbase.h               |   2 +
 arch/x86/kernel/process_64.c                  |  24 +-
 arch/x86/kernel/ptrace.c                      |  43 ++-
 tools/testing/selftests/x86/Makefile          |   2 +-
 tools/testing/selftests/x86/fsgsbase.c        |   6 +-
 .../testing/selftests/x86/fsgsbase_restore.c  | 245 ++++++++++++++++++
 6 files changed, 298 insertions(+), 24 deletions(-)
 create mode 100644 tools/testing/selftests/x86/fsgsbase_restore.c

-- 
2.25.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH fsgsbase v2 1/4] selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test
  2020-06-26 17:24 [PATCH fsgsbase v2 0/4] x86/fsgsbase: Some fixes Andy Lutomirski
@ 2020-06-26 17:24 ` Andy Lutomirski
  2020-07-01 13:33   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
  2020-06-26 17:24 ` [PATCH fsgsbase v2 2/4] selftests/x86/fsgsbase: Add a missing memory constraint Andy Lutomirski
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2020-06-26 17:24 UTC (permalink / raw)
  To: x86
  Cc: Sasha Levin, linux-kernel, Andrew Cooper, Juergen Gross, Andy Lutomirski

A comment was unclear.  Fix it.

Fixes: 5e7ec8578fa3 ("selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/fsgsbase.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index 9a4349813a30..f47495d2f070 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -498,7 +498,8 @@ static void test_ptrace_write_gsbase(void)
 			 * base would zero the selector.  On newer kernels,
 			 * this behavior has changed -- poking the base
 			 * changes only the base and, if FSGSBASE is not
-			 * available, this may not effect.
+			 * available, this may have no effect once the tracee
+			 * is resumed.
 			 */
 			if (gs == 0)
 				printf("\tNote: this is expected behavior on older kernels.\n");
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH fsgsbase v2 2/4] selftests/x86/fsgsbase: Add a missing memory constraint
  2020-06-26 17:24 [PATCH fsgsbase v2 0/4] x86/fsgsbase: Some fixes Andy Lutomirski
  2020-06-26 17:24 ` [PATCH fsgsbase v2 1/4] selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test Andy Lutomirski
@ 2020-06-26 17:24 ` Andy Lutomirski
  2020-07-01 13:33   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
  2020-06-26 17:24 ` [PATCH fsgsbase v2 3/4] x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase Andy Lutomirski
  2020-06-26 17:24   ` Andy Lutomirski
  3 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2020-06-26 17:24 UTC (permalink / raw)
  To: x86
  Cc: Sasha Levin, linux-kernel, Andrew Cooper, Juergen Gross, Andy Lutomirski

The manual call to set_thread_area() via int $0x80 was missing any
indication that the descriptor was a pointer, causing gcc to
occasionally generate wrong code.  Add the missing constraint.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/fsgsbase.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index f47495d2f070..998319553523 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -285,7 +285,8 @@ static unsigned short load_gs(void)
 		/* 32-bit set_thread_area */
 		long ret;
 		asm volatile ("int $0x80"
-			      : "=a" (ret) : "a" (243), "b" (low_desc)
+			      : "=a" (ret), "+m" (*low_desc)
+			      : "a" (243), "b" (low_desc)
 			      : "r8", "r9", "r10", "r11");
 		memcpy(&desc, low_desc, sizeof(desc));
 		munmap(low_desc, sizeof(desc));
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH fsgsbase v2 3/4] x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
  2020-06-26 17:24 [PATCH fsgsbase v2 0/4] x86/fsgsbase: Some fixes Andy Lutomirski
  2020-06-26 17:24 ` [PATCH fsgsbase v2 1/4] selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test Andy Lutomirski
  2020-06-26 17:24 ` [PATCH fsgsbase v2 2/4] selftests/x86/fsgsbase: Add a missing memory constraint Andy Lutomirski
@ 2020-06-26 17:24 ` Andy Lutomirski
  2020-07-01 13:33   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
  2020-06-26 17:24   ` Andy Lutomirski
  3 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2020-06-26 17:24 UTC (permalink / raw)
  To: x86
  Cc: Sasha Levin, linux-kernel, Andrew Cooper, Juergen Gross, Andy Lutomirski

Debuggers expect that doing PTRACE_GETREGS, then poking at a tracee
and maybe letting it run for a while, then doing PTRACE_SETREGS will
put the tracee back where it was.  In the specific case of a 32-bit
tracer and tracee, the PTRACE_GETREGS/SETREGS data structure doesn't
have fs_base or gs_base fields, so FSBASE and GSBASE fields are
never stored anywhere.  Everything used to still work because
nonzero FS or GS would result full reloads of the segment registers
when the tracee resumes, and the bases associated with FS==0 or
GS==0 are irrelevant to 32-bit code.

Adding FSGSBASE support broke this: when FSGSBASE is enabled, FSBASE
and GSBASE are now restored independently of FS and GS for all tasks
when context-switched in.  This means that, if a 32-bit tracer
restores a previous state using PTRACE_SETREGS but the tracee's
pre-restore and post-restore bases don't match, then the tracee is
resumed with the wrong base.

Fix it by explicitly loading the base when a 32-bit tracer pokes FS
or GS on a 64-bit kernel.

Also add a test case.

Fixes: 673903495c85 ("x86/process/64: Use FSBSBASE in switch_to() if available")
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/fsgsbase.h               |   2 +
 arch/x86/kernel/process_64.c                  |   4 +-
 arch/x86/kernel/ptrace.c                      |  43 ++-
 tools/testing/selftests/x86/Makefile          |   2 +-
 .../testing/selftests/x86/fsgsbase_restore.c  | 245 ++++++++++++++++++
 5 files changed, 280 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/x86/fsgsbase_restore.c

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index aefd53767a5d..d552646411a9 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -75,6 +75,8 @@ static inline void x86_fsbase_write_cpu(unsigned long fsbase)
 
 extern unsigned long x86_gsbase_read_cpu_inactive(void);
 extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
+extern unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+					    unsigned short selector);
 
 #endif /* CONFIG_X86_64 */
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index d618969cea66..cb8e37d3acaa 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -347,8 +347,8 @@ static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 	}
 }
 
-static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-					    unsigned short selector)
+unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+				     unsigned short selector)
 {
 	unsigned short idx = selector >> 3;
 	unsigned long base;
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 1c7646c8d0fe..3f006489087f 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -281,17 +281,9 @@ static int set_segment_reg(struct task_struct *task,
 		return -EIO;
 
 	/*
-	 * This function has some ABI oddities.
-	 *
-	 * A 32-bit ptracer probably expects that writing FS or GS will change
-	 * FSBASE or GSBASE respectively.  In the absence of FSGSBASE support,
-	 * this code indeed has that effect.  When FSGSBASE is added, this
-	 * will require a special case.
-	 *
-	 * For existing 64-bit ptracers, writing FS or GS *also* currently
-	 * changes the base if the selector is nonzero the next time the task
-	 * is run.  This behavior may not be needed, and trying to preserve it
-	 * when FSGSBASE is added would be complicated at best.
+	 * Writes to FS and GS will change the stored selector.  Whether
+	 * this changes the segment base as well depends on whether
+	 * FSGSBASE is enabled.
 	 */
 
 	switch (offset) {
@@ -867,14 +859,39 @@ long arch_ptrace(struct task_struct *child, long request,
 static int putreg32(struct task_struct *child, unsigned regno, u32 value)
 {
 	struct pt_regs *regs = task_pt_regs(child);
+	int ret;
 
 	switch (regno) {
 
 	SEG32(cs);
 	SEG32(ds);
 	SEG32(es);
-	SEG32(fs);
-	SEG32(gs);
+
+	/*
+	 * A 32-bit ptracer on a 64-bit kernel expects that writing
+	 * FS or GS will also update the base.  This is needed for
+	 * operations like PTRACE_SETREGS to fully restore a saved
+	 * CPU state.
+	 */
+
+	case offsetof(struct user32, regs.fs):
+		ret = set_segment_reg(child,
+				      offsetof(struct user_regs_struct, fs),
+				      value);
+		if (ret == 0)
+			child->thread.fsbase =
+				x86_fsgsbase_read_task(child, value);
+		return ret;
+
+	case offsetof(struct user32, regs.gs):
+		ret = set_segment_reg(child,
+				      offsetof(struct user_regs_struct, gs),
+				      value);
+		if (ret == 0)
+			child->thread.gsbase =
+				x86_fsgsbase_read_task(child, value);
+		return ret;
+
 	SEG32(ss);
 
 	R32(ebx, bx);
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 5f16821c7f63..3ff94575d02d 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -13,7 +13,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie)
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
 			check_initial_reg_state sigreturn iopl ioperm \
 			test_vdso test_vsyscall mov_ss_trap \
-			syscall_arg_fault
+			syscall_arg_fault fsgsbase_restore
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
diff --git a/tools/testing/selftests/x86/fsgsbase_restore.c b/tools/testing/selftests/x86/fsgsbase_restore.c
new file mode 100644
index 000000000000..6fffadc51579
--- /dev/null
+++ b/tools/testing/selftests/x86/fsgsbase_restore.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * fsgsbase_restore.c, test ptrace vs fsgsbase
+ * Copyright (c) 2020 Andy Lutomirski
+ *
+ * This test case simulates a tracer redirecting tracee execution to
+ * a function and then restoring tracee state using PTRACE_GETREGS and
+ * PTRACE_SETREGS.  This is similar to what gdb does when doing
+ * 'p func()'.  The catch is that this test has the called function
+ * modify a segment register.  This makes sure that ptrace correctly
+ * restores segment state when using PTRACE_SETREGS.
+ *
+ * This is not part of fsgsbase.c, because that test is 64-bit only.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <err.h>
+#include <sys/user.h>
+#include <asm/prctl.h>
+#include <sys/prctl.h>
+#include <asm/ldt.h>
+#include <sys/mman.h>
+#include <stddef.h>
+#include <sys/ptrace.h>
+#include <sys/wait.h>
+#include <stdint.h>
+
+#define EXPECTED_VALUE 0x1337f00d
+
+#ifdef __x86_64__
+# define SEG "%gs"
+#else
+# define SEG "%fs"
+#endif
+
+static unsigned int dereference_seg_base(void)
+{
+	int ret;
+	asm volatile ("mov %" SEG ":(0), %0" : "=rm" (ret));
+	return ret;
+}
+
+static void init_seg(void)
+{
+	unsigned int *target = mmap(
+		NULL, sizeof(unsigned int),
+		PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
+	if (target == MAP_FAILED)
+		err(1, "mmap");
+
+	*target = EXPECTED_VALUE;
+
+	printf("\tsegment base address = 0x%lx\n", (unsigned long)target);
+
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = (unsigned int)(uintptr_t)target,
+		.limit           = sizeof(unsigned int) - 1,
+		.seg_32bit       = 1,
+		.contents        = 0, /* Data, grow-up */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	if (syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)) == 0) {
+		printf("\tusing LDT slot 0\n");
+		asm volatile ("mov %0, %" SEG :: "rm" ((unsigned short)0x7));
+	} else {
+		/* No modify_ldt for us (configured out, perhaps) */
+
+		struct user_desc *low_desc = mmap(
+			NULL, sizeof(desc),
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
+		memcpy(low_desc, &desc, sizeof(desc));
+
+		low_desc->entry_number = -1;
+
+		/* 32-bit set_thread_area */
+		long ret;
+		asm volatile ("int $0x80"
+			      : "=a" (ret), "+m" (*low_desc)
+			      : "a" (243), "b" (low_desc)
+#ifdef __x86_64__
+			      : "r8", "r9", "r10", "r11"
+#endif
+			);
+		memcpy(&desc, low_desc, sizeof(desc));
+		munmap(low_desc, sizeof(desc));
+
+		if (ret != 0) {
+			printf("[NOTE]\tcould not create a segment -- can't test anything\n");
+			exit(0);
+		}
+		printf("\tusing GDT slot %d\n", desc.entry_number);
+
+		unsigned short sel = (unsigned short)((desc.entry_number << 3) | 0x3);
+		asm volatile ("mov %0, %" SEG :: "rm" (sel));
+	}
+}
+
+static void tracee_zap_segment(void)
+{
+	/*
+	 * The tracer will redirect execution here.  This is meant to
+	 * work like gdb's 'p func()' feature.  The tricky bit is that
+	 * we modify a segment register in order to make sure that ptrace
+	 * can correctly restore segment registers.
+	 */
+	printf("\tTracee: in tracee_zap_segment()\n");
+
+	/*
+	 * Write a nonzero selector with base zero to the segment register.
+	 * Using a null selector would defeat the test on AMD pre-Zen2
+	 * CPUs, as such CPUs don't clear the base when loading a null
+	 * selector.
+	 */
+	unsigned short sel;
+	asm volatile ("mov %%ss, %0\n\t"
+		      "mov %0, %" SEG
+		      : "=rm" (sel));
+
+	pid_t pid = getpid(), tid = syscall(SYS_gettid);
+
+	printf("\tTracee is going back to sleep\n");
+	syscall(SYS_tgkill, pid, tid, SIGSTOP);
+
+	/* Should not get here. */
+	while (true) {
+		printf("[FAIL]\tTracee hit unreachable code\n");
+		pause();
+	}
+}
+
+int main()
+{
+	printf("\tSetting up a segment\n");
+	init_seg();
+
+	unsigned int val = dereference_seg_base();
+	if (val != EXPECTED_VALUE) {
+		printf("[FAIL]\tseg[0] == %x; should be %x\n", val, EXPECTED_VALUE);
+		return 1;
+	}
+	printf("[OK]\tThe segment points to the right place.\n");
+
+	pid_t chld = fork();
+	if (chld < 0)
+		err(1, "fork");
+
+	if (chld == 0) {
+		prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0, 0);
+
+		if (ptrace(PTRACE_TRACEME, 0, 0, 0) != 0)
+			err(1, "PTRACE_TRACEME");
+
+		pid_t pid = getpid(), tid = syscall(SYS_gettid);
+
+		printf("\tTracee will take a nap until signaled\n");
+		syscall(SYS_tgkill, pid, tid, SIGSTOP);
+
+		printf("\tTracee was resumed.  Will re-check segment.\n");
+
+		val = dereference_seg_base();
+		if (val != EXPECTED_VALUE) {
+			printf("[FAIL]\tseg[0] == %x; should be %x\n", val, EXPECTED_VALUE);
+			exit(1);
+		}
+
+		printf("[OK]\tThe segment points to the right place.\n");
+		exit(0);
+	}
+
+	int status;
+
+	/* Wait for SIGSTOP. */
+	if (waitpid(chld, &status, 0) != chld || !WIFSTOPPED(status))
+		err(1, "waitpid");
+
+	struct user_regs_struct regs;
+
+	if (ptrace(PTRACE_GETREGS, chld, NULL, &regs) != 0)
+		err(1, "PTRACE_GETREGS");
+
+#ifdef __x86_64__
+	printf("\tChild GS=0x%lx, GSBASE=0x%lx\n", (unsigned long)regs.gs, (unsigned long)regs.gs_base);
+#else
+	printf("\tChild FS=0x%lx\n", (unsigned long)regs.xfs);
+#endif
+
+	struct user_regs_struct regs2 = regs;
+#ifdef __x86_64__
+	regs2.rip = (unsigned long)tracee_zap_segment;
+	regs2.rsp -= 128;	/* Don't clobber the redzone. */
+#else
+	regs2.eip = (unsigned long)tracee_zap_segment;
+#endif
+
+	printf("\tTracer: redirecting tracee to tracee_zap_segment()\n");
+	if (ptrace(PTRACE_SETREGS, chld, NULL, &regs2) != 0)
+		err(1, "PTRACE_GETREGS");
+	if (ptrace(PTRACE_CONT, chld, NULL, NULL) != 0)
+		err(1, "PTRACE_GETREGS");
+
+	/* Wait for SIGSTOP. */
+	if (waitpid(chld, &status, 0) != chld || !WIFSTOPPED(status))
+		err(1, "waitpid");
+
+	printf("\tTracer: restoring tracee state\n");
+	if (ptrace(PTRACE_SETREGS, chld, NULL, &regs) != 0)
+		err(1, "PTRACE_GETREGS");
+	if (ptrace(PTRACE_DETACH, chld, NULL, NULL) != 0)
+		err(1, "PTRACE_GETREGS");
+
+	/* Wait for SIGSTOP. */
+	if (waitpid(chld, &status, 0) != chld)
+		err(1, "waitpid");
+
+	if (WIFSIGNALED(status)) {
+		printf("[FAIL]\tTracee crashed\n");
+		return 1;
+	}
+
+	if (!WIFEXITED(status)) {
+		printf("[FAIL]\tTracee stopped for an unexpected reason: %d\n", status);
+		return 1;
+	}
+
+	int exitcode = WEXITSTATUS(status);
+	if (exitcode != 0) {
+		printf("[FAIL]\tTracee reported failure\n");
+		return 1;
+	}
+
+	printf("[OK]\tAll is well.\n");
+	return 0;
+}
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH fsgsbase v2 4/4] x86/fsgsbase: Fix Xen PV support
  2020-06-26 17:24 [PATCH fsgsbase v2 0/4] x86/fsgsbase: Some fixes Andy Lutomirski
@ 2020-06-26 17:24   ` Andy Lutomirski
  2020-06-26 17:24 ` [PATCH fsgsbase v2 2/4] selftests/x86/fsgsbase: Add a missing memory constraint Andy Lutomirski
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2020-06-26 17:24 UTC (permalink / raw)
  To: x86
  Cc: Sasha Levin, linux-kernel, Andrew Cooper, Juergen Gross,
	Andy Lutomirski, Boris Ostrovsky, Stefano Stabellini, xen-devel

On Xen PV, SWAPGS doesn't work.  Teach __rdfsbase_inactive() and
__wrgsbase_inactive() to use rdmsrl()/wrmsrl() on Xen PV.  The Xen
pvop code will understand this and issue the correct hypercalls.

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/process_64.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index cb8e37d3acaa..457d02aa10d8 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -163,9 +163,13 @@ static noinstr unsigned long __rdgsbase_inactive(void)
 
 	lockdep_assert_irqs_disabled();
 
-	native_swapgs();
-	gsbase = rdgsbase();
-	native_swapgs();
+	if (!static_cpu_has(X86_FEATURE_XENPV)) {
+		native_swapgs();
+		gsbase = rdgsbase();
+		native_swapgs();
+	} else {
+		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
 
 	return gsbase;
 }
@@ -182,9 +186,13 @@ static noinstr void __wrgsbase_inactive(unsigned long gsbase)
 {
 	lockdep_assert_irqs_disabled();
 
-	native_swapgs();
-	wrgsbase(gsbase);
-	native_swapgs();
+	if (!static_cpu_has(X86_FEATURE_XENPV)) {
+		native_swapgs();
+		wrgsbase(gsbase);
+		native_swapgs();
+	} else {
+		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
 }
 
 /*
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH fsgsbase v2 4/4] x86/fsgsbase: Fix Xen PV support
@ 2020-06-26 17:24   ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2020-06-26 17:24 UTC (permalink / raw)
  To: x86
  Cc: Sasha Levin, Juergen Gross, Stefano Stabellini, Andrew Cooper,
	linux-kernel, Andy Lutomirski, xen-devel, Boris Ostrovsky

On Xen PV, SWAPGS doesn't work.  Teach __rdfsbase_inactive() and
__wrgsbase_inactive() to use rdmsrl()/wrmsrl() on Xen PV.  The Xen
pvop code will understand this and issue the correct hypercalls.

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/process_64.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index cb8e37d3acaa..457d02aa10d8 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -163,9 +163,13 @@ static noinstr unsigned long __rdgsbase_inactive(void)
 
 	lockdep_assert_irqs_disabled();
 
-	native_swapgs();
-	gsbase = rdgsbase();
-	native_swapgs();
+	if (!static_cpu_has(X86_FEATURE_XENPV)) {
+		native_swapgs();
+		gsbase = rdgsbase();
+		native_swapgs();
+	} else {
+		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
 
 	return gsbase;
 }
@@ -182,9 +186,13 @@ static noinstr void __wrgsbase_inactive(unsigned long gsbase)
 {
 	lockdep_assert_irqs_disabled();
 
-	native_swapgs();
-	wrgsbase(gsbase);
-	native_swapgs();
+	if (!static_cpu_has(X86_FEATURE_XENPV)) {
+		native_swapgs();
+		wrgsbase(gsbase);
+		native_swapgs();
+	} else {
+		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
 }
 
 /*
-- 
2.25.4



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH fsgsbase v2 4/4] x86/fsgsbase: Fix Xen PV support
  2020-06-26 17:24   ` Andy Lutomirski
@ 2020-06-29  5:17     ` Jürgen Groß
  -1 siblings, 0 replies; 14+ messages in thread
From: Jürgen Groß @ 2020-06-29  5:17 UTC (permalink / raw)
  To: Andy Lutomirski, x86
  Cc: Sasha Levin, linux-kernel, Andrew Cooper, Boris Ostrovsky,
	Stefano Stabellini, xen-devel

On 26.06.20 19:24, Andy Lutomirski wrote:
> On Xen PV, SWAPGS doesn't work.  Teach __rdfsbase_inactive() and
> __wrgsbase_inactive() to use rdmsrl()/wrmsrl() on Xen PV.  The Xen
> pvop code will understand this and issue the correct hypercalls.
> 
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>   arch/x86/kernel/process_64.c | 20 ++++++++++++++------
>   1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index cb8e37d3acaa..457d02aa10d8 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -163,9 +163,13 @@ static noinstr unsigned long __rdgsbase_inactive(void)
>   
>   	lockdep_assert_irqs_disabled();
>   
> -	native_swapgs();
> -	gsbase = rdgsbase();
> -	native_swapgs();
> +	if (!static_cpu_has(X86_FEATURE_XENPV)) {
> +		native_swapgs();
> +		gsbase = rdgsbase();
> +		native_swapgs();
> +	} else {
> +		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
> +	}
>   
>   	return gsbase;
>   }
> @@ -182,9 +186,13 @@ static noinstr void __wrgsbase_inactive(unsigned long gsbase)
>   {
>   	lockdep_assert_irqs_disabled();
>   
> -	native_swapgs();
> -	wrgsbase(gsbase);
> -	native_swapgs();
> +	if (!static_cpu_has(X86_FEATURE_XENPV)) {
> +		native_swapgs();
> +		wrgsbase(gsbase);
> +		native_swapgs();
> +	} else {
> +		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> +	}
>   }
>   
>   /*
> 

Another possibility would be to just do (I'm fine either way):

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index acc49fa6a097..b78dd373adbf 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -318,6 +318,8 @@ static void __init xen_init_capabilities(void)
  		setup_clear_cpu_cap(X86_FEATURE_XSAVE);
  		setup_clear_cpu_cap(X86_FEATURE_OSXSAVE);
  	}
+
+	setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
  }

  static void xen_set_debugreg(int reg, unsigned long val)


Juergen

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH fsgsbase v2 4/4] x86/fsgsbase: Fix Xen PV support
@ 2020-06-29  5:17     ` Jürgen Groß
  0 siblings, 0 replies; 14+ messages in thread
From: Jürgen Groß @ 2020-06-29  5:17 UTC (permalink / raw)
  To: Andy Lutomirski, x86
  Cc: Sasha Levin, Stefano Stabellini, Andrew Cooper, linux-kernel,
	xen-devel, Boris Ostrovsky

On 26.06.20 19:24, Andy Lutomirski wrote:
> On Xen PV, SWAPGS doesn't work.  Teach __rdfsbase_inactive() and
> __wrgsbase_inactive() to use rdmsrl()/wrmsrl() on Xen PV.  The Xen
> pvop code will understand this and issue the correct hypercalls.
> 
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>   arch/x86/kernel/process_64.c | 20 ++++++++++++++------
>   1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index cb8e37d3acaa..457d02aa10d8 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -163,9 +163,13 @@ static noinstr unsigned long __rdgsbase_inactive(void)
>   
>   	lockdep_assert_irqs_disabled();
>   
> -	native_swapgs();
> -	gsbase = rdgsbase();
> -	native_swapgs();
> +	if (!static_cpu_has(X86_FEATURE_XENPV)) {
> +		native_swapgs();
> +		gsbase = rdgsbase();
> +		native_swapgs();
> +	} else {
> +		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
> +	}
>   
>   	return gsbase;
>   }
> @@ -182,9 +186,13 @@ static noinstr void __wrgsbase_inactive(unsigned long gsbase)
>   {
>   	lockdep_assert_irqs_disabled();
>   
> -	native_swapgs();
> -	wrgsbase(gsbase);
> -	native_swapgs();
> +	if (!static_cpu_has(X86_FEATURE_XENPV)) {
> +		native_swapgs();
> +		wrgsbase(gsbase);
> +		native_swapgs();
> +	} else {
> +		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> +	}
>   }
>   
>   /*
> 

Another possibility would be to just do (I'm fine either way):

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index acc49fa6a097..b78dd373adbf 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -318,6 +318,8 @@ static void __init xen_init_capabilities(void)
  		setup_clear_cpu_cap(X86_FEATURE_XSAVE);
  		setup_clear_cpu_cap(X86_FEATURE_OSXSAVE);
  	}
+
+	setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
  }

  static void xen_set_debugreg(int reg, unsigned long val)


Juergen


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH fsgsbase v2 4/4] x86/fsgsbase: Fix Xen PV support
  2020-06-29  5:17     ` Jürgen Groß
@ 2020-06-29 11:07       ` Andrew Cooper
  -1 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2020-06-29 11:07 UTC (permalink / raw)
  To: Jürgen Groß, Andy Lutomirski, x86
  Cc: Sasha Levin, linux-kernel, Boris Ostrovsky, Stefano Stabellini,
	xen-devel

On 29/06/2020 06:17, Jürgen Groß wrote:
> On 26.06.20 19:24, Andy Lutomirski wrote:
>> On Xen PV, SWAPGS doesn't work.  Teach __rdfsbase_inactive() and
>> __wrgsbase_inactive() to use rdmsrl()/wrmsrl() on Xen PV.  The Xen
>> pvop code will understand this and issue the correct hypercalls.
>>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> Cc: xen-devel@lists.xenproject.org
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>   arch/x86/kernel/process_64.c | 20 ++++++++++++++------
>>   1 file changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>> index cb8e37d3acaa..457d02aa10d8 100644
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -163,9 +163,13 @@ static noinstr unsigned long
>> __rdgsbase_inactive(void)
>>         lockdep_assert_irqs_disabled();
>>   -    native_swapgs();
>> -    gsbase = rdgsbase();
>> -    native_swapgs();
>> +    if (!static_cpu_has(X86_FEATURE_XENPV)) {
>> +        native_swapgs();
>> +        gsbase = rdgsbase();
>> +        native_swapgs();
>> +    } else {
>> +        rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
>> +    }
>>         return gsbase;
>>   }
>> @@ -182,9 +186,13 @@ static noinstr void __wrgsbase_inactive(unsigned
>> long gsbase)
>>   {
>>       lockdep_assert_irqs_disabled();
>>   -    native_swapgs();
>> -    wrgsbase(gsbase);
>> -    native_swapgs();
>> +    if (!static_cpu_has(X86_FEATURE_XENPV)) {
>> +        native_swapgs();
>> +        wrgsbase(gsbase);
>> +        native_swapgs();
>> +    } else {
>> +        wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
>> +    }
>>   }
>>     /*
>>
>
> Another possibility would be to just do (I'm fine either way):
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index acc49fa6a097..b78dd373adbf 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -318,6 +318,8 @@ static void __init xen_init_capabilities(void)
>          setup_clear_cpu_cap(X86_FEATURE_XSAVE);
>          setup_clear_cpu_cap(X86_FEATURE_OSXSAVE);
>      }
> +
> +    setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);

That will stop both userspace and Xen (side effect of the guest kernel's
CR4 choice) from using the instructions.

Even when the kernel is using the paravirt fastpath, its still Xen
actually taking the hit.  MSR_{FS,GS}_BASE/SHADOW are thousands of
cycles to access, whereas {RD,WR}{FS,GS}BASE are a handful.

~Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH fsgsbase v2 4/4] x86/fsgsbase: Fix Xen PV support
@ 2020-06-29 11:07       ` Andrew Cooper
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2020-06-29 11:07 UTC (permalink / raw)
  To: Jürgen Groß, Andy Lutomirski, x86
  Cc: Sasha Levin, xen-devel, Boris Ostrovsky, Stefano Stabellini,
	linux-kernel

On 29/06/2020 06:17, Jürgen Groß wrote:
> On 26.06.20 19:24, Andy Lutomirski wrote:
>> On Xen PV, SWAPGS doesn't work.  Teach __rdfsbase_inactive() and
>> __wrgsbase_inactive() to use rdmsrl()/wrmsrl() on Xen PV.  The Xen
>> pvop code will understand this and issue the correct hypercalls.
>>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> Cc: xen-devel@lists.xenproject.org
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>   arch/x86/kernel/process_64.c | 20 ++++++++++++++------
>>   1 file changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>> index cb8e37d3acaa..457d02aa10d8 100644
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -163,9 +163,13 @@ static noinstr unsigned long
>> __rdgsbase_inactive(void)
>>         lockdep_assert_irqs_disabled();
>>   -    native_swapgs();
>> -    gsbase = rdgsbase();
>> -    native_swapgs();
>> +    if (!static_cpu_has(X86_FEATURE_XENPV)) {
>> +        native_swapgs();
>> +        gsbase = rdgsbase();
>> +        native_swapgs();
>> +    } else {
>> +        rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
>> +    }
>>         return gsbase;
>>   }
>> @@ -182,9 +186,13 @@ static noinstr void __wrgsbase_inactive(unsigned
>> long gsbase)
>>   {
>>       lockdep_assert_irqs_disabled();
>>   -    native_swapgs();
>> -    wrgsbase(gsbase);
>> -    native_swapgs();
>> +    if (!static_cpu_has(X86_FEATURE_XENPV)) {
>> +        native_swapgs();
>> +        wrgsbase(gsbase);
>> +        native_swapgs();
>> +    } else {
>> +        wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
>> +    }
>>   }
>>     /*
>>
>
> Another possibility would be to just do (I'm fine either way):
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index acc49fa6a097..b78dd373adbf 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -318,6 +318,8 @@ static void __init xen_init_capabilities(void)
>          setup_clear_cpu_cap(X86_FEATURE_XSAVE);
>          setup_clear_cpu_cap(X86_FEATURE_OSXSAVE);
>      }
> +
> +    setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);

That will stop both userspace and Xen (side effect of the guest kernel's
CR4 choice) from using the instructions.

Even when the kernel is using the paravirt fastpath, its still Xen
actually taking the hit.  MSR_{FS,GS}_BASE/SHADOW are thousands of
cycles to access, whereas {RD,WR}{FS,GS}BASE are a handful.

~Andrew


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip: x86/fsgsbase] x86/fsgsbase: Fix Xen PV support
  2020-06-26 17:24   ` Andy Lutomirski
  (?)
  (?)
@ 2020-07-01 13:33   ` tip-bot2 for Andy Lutomirski
  -1 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-01 13:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Andy Lutomirski, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     d029bff60aa6c7eab281d52602b6a7a971615324
Gitweb:        https://git.kernel.org/tip/d029bff60aa6c7eab281d52602b6a7a971615324
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 26 Jun 2020 10:24:30 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 01 Jul 2020 15:27:20 +02:00

x86/fsgsbase: Fix Xen PV support

On Xen PV, SWAPGS doesn't work.  Teach __rdfsbase_inactive() and
__wrgsbase_inactive() to use rdmsrl()/wrmsrl() on Xen PV.  The Xen
pvop code will understand this and issue the correct hypercalls.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/f07c08f178fe9711915862b656722a207cd52c28.1593192140.git.luto@kernel.org

---
 arch/x86/kernel/process_64.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index cb8e37d..e14476f 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -163,9 +163,15 @@ static noinstr unsigned long __rdgsbase_inactive(void)
 
 	lockdep_assert_irqs_disabled();
 
-	native_swapgs();
-	gsbase = rdgsbase();
-	native_swapgs();
+	if (!static_cpu_has(X86_FEATURE_XENPV)) {
+		native_swapgs();
+		gsbase = rdgsbase();
+		native_swapgs();
+	} else {
+		instrumentation_begin();
+		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+		instrumentation_end();
+	}
 
 	return gsbase;
 }
@@ -182,9 +188,15 @@ static noinstr void __wrgsbase_inactive(unsigned long gsbase)
 {
 	lockdep_assert_irqs_disabled();
 
-	native_swapgs();
-	wrgsbase(gsbase);
-	native_swapgs();
+	if (!static_cpu_has(X86_FEATURE_XENPV)) {
+		native_swapgs();
+		wrgsbase(gsbase);
+		native_swapgs();
+	} else {
+		instrumentation_begin();
+		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+		instrumentation_end();
+	}
 }
 
 /*

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [tip: x86/fsgsbase] x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
  2020-06-26 17:24 ` [PATCH fsgsbase v2 3/4] x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase Andy Lutomirski
@ 2020-07-01 13:33   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-01 13:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Andy Lutomirski, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     40c45904f818c1f6555294ca27afc5fda4f09e68
Gitweb:        https://git.kernel.org/tip/40c45904f818c1f6555294ca27afc5fda4f09e68
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 26 Jun 2020 10:24:29 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 01 Jul 2020 15:27:20 +02:00

x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase

Debuggers expect that doing PTRACE_GETREGS, then poking at a tracee
and maybe letting it run for a while, then doing PTRACE_SETREGS will
put the tracee back where it was.  In the specific case of a 32-bit
tracer and tracee, the PTRACE_GETREGS/SETREGS data structure doesn't
have fs_base or gs_base fields, so FSBASE and GSBASE fields are
never stored anywhere.  Everything used to still work because
nonzero FS or GS would result full reloads of the segment registers
when the tracee resumes, and the bases associated with FS==0 or
GS==0 are irrelevant to 32-bit code.

Adding FSGSBASE support broke this: when FSGSBASE is enabled, FSBASE
and GSBASE are now restored independently of FS and GS for all tasks
when context-switched in.  This means that, if a 32-bit tracer
restores a previous state using PTRACE_SETREGS but the tracee's
pre-restore and post-restore bases don't match, then the tracee is
resumed with the wrong base.

Fix it by explicitly loading the base when a 32-bit tracer pokes FS
or GS on a 64-bit kernel.

Also add a test case.

Fixes: 673903495c85 ("x86/process/64: Use FSBSBASE in switch_to() if available")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/229cc6a50ecbb701abd50fe4ddaf0eda888898cd.1593192140.git.luto@kernel.org

---
 arch/x86/include/asm/fsgsbase.h                |   2 +-
 arch/x86/kernel/process_64.c                   |   4 +-
 arch/x86/kernel/ptrace.c                       |  43 ++-
 tools/testing/selftests/x86/Makefile           |   2 +-
 tools/testing/selftests/x86/fsgsbase_restore.c | 245 ++++++++++++++++-
 5 files changed, 280 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/x86/fsgsbase_restore.c

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index aefd537..d552646 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -75,6 +75,8 @@ static inline void x86_fsbase_write_cpu(unsigned long fsbase)
 
 extern unsigned long x86_gsbase_read_cpu_inactive(void);
 extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
+extern unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+					    unsigned short selector);
 
 #endif /* CONFIG_X86_64 */
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index d618969..cb8e37d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -347,8 +347,8 @@ static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 	}
 }
 
-static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-					    unsigned short selector)
+unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+				     unsigned short selector)
 {
 	unsigned short idx = selector >> 3;
 	unsigned long base;
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 1c7646c..3f00648 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -281,17 +281,9 @@ static int set_segment_reg(struct task_struct *task,
 		return -EIO;
 
 	/*
-	 * This function has some ABI oddities.
-	 *
-	 * A 32-bit ptracer probably expects that writing FS or GS will change
-	 * FSBASE or GSBASE respectively.  In the absence of FSGSBASE support,
-	 * this code indeed has that effect.  When FSGSBASE is added, this
-	 * will require a special case.
-	 *
-	 * For existing 64-bit ptracers, writing FS or GS *also* currently
-	 * changes the base if the selector is nonzero the next time the task
-	 * is run.  This behavior may not be needed, and trying to preserve it
-	 * when FSGSBASE is added would be complicated at best.
+	 * Writes to FS and GS will change the stored selector.  Whether
+	 * this changes the segment base as well depends on whether
+	 * FSGSBASE is enabled.
 	 */
 
 	switch (offset) {
@@ -867,14 +859,39 @@ long arch_ptrace(struct task_struct *child, long request,
 static int putreg32(struct task_struct *child, unsigned regno, u32 value)
 {
 	struct pt_regs *regs = task_pt_regs(child);
+	int ret;
 
 	switch (regno) {
 
 	SEG32(cs);
 	SEG32(ds);
 	SEG32(es);
-	SEG32(fs);
-	SEG32(gs);
+
+	/*
+	 * A 32-bit ptracer on a 64-bit kernel expects that writing
+	 * FS or GS will also update the base.  This is needed for
+	 * operations like PTRACE_SETREGS to fully restore a saved
+	 * CPU state.
+	 */
+
+	case offsetof(struct user32, regs.fs):
+		ret = set_segment_reg(child,
+				      offsetof(struct user_regs_struct, fs),
+				      value);
+		if (ret == 0)
+			child->thread.fsbase =
+				x86_fsgsbase_read_task(child, value);
+		return ret;
+
+	case offsetof(struct user32, regs.gs):
+		ret = set_segment_reg(child,
+				      offsetof(struct user_regs_struct, gs),
+				      value);
+		if (ret == 0)
+			child->thread.gsbase =
+				x86_fsgsbase_read_task(child, value);
+		return ret;
+
 	SEG32(ss);
 
 	R32(ebx, bx);
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 5f16821..3ff9457 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -13,7 +13,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie)
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
 			check_initial_reg_state sigreturn iopl ioperm \
 			test_vdso test_vsyscall mov_ss_trap \
-			syscall_arg_fault
+			syscall_arg_fault fsgsbase_restore
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
diff --git a/tools/testing/selftests/x86/fsgsbase_restore.c b/tools/testing/selftests/x86/fsgsbase_restore.c
new file mode 100644
index 0000000..6fffadc
--- /dev/null
+++ b/tools/testing/selftests/x86/fsgsbase_restore.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * fsgsbase_restore.c, test ptrace vs fsgsbase
+ * Copyright (c) 2020 Andy Lutomirski
+ *
+ * This test case simulates a tracer redirecting tracee execution to
+ * a function and then restoring tracee state using PTRACE_GETREGS and
+ * PTRACE_SETREGS.  This is similar to what gdb does when doing
+ * 'p func()'.  The catch is that this test has the called function
+ * modify a segment register.  This makes sure that ptrace correctly
+ * restores segment state when using PTRACE_SETREGS.
+ *
+ * This is not part of fsgsbase.c, because that test is 64-bit only.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <err.h>
+#include <sys/user.h>
+#include <asm/prctl.h>
+#include <sys/prctl.h>
+#include <asm/ldt.h>
+#include <sys/mman.h>
+#include <stddef.h>
+#include <sys/ptrace.h>
+#include <sys/wait.h>
+#include <stdint.h>
+
+#define EXPECTED_VALUE 0x1337f00d
+
+#ifdef __x86_64__
+# define SEG "%gs"
+#else
+# define SEG "%fs"
+#endif
+
+static unsigned int dereference_seg_base(void)
+{
+	int ret;
+	asm volatile ("mov %" SEG ":(0), %0" : "=rm" (ret));
+	return ret;
+}
+
+static void init_seg(void)
+{
+	unsigned int *target = mmap(
+		NULL, sizeof(unsigned int),
+		PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
+	if (target == MAP_FAILED)
+		err(1, "mmap");
+
+	*target = EXPECTED_VALUE;
+
+	printf("\tsegment base address = 0x%lx\n", (unsigned long)target);
+
+	struct user_desc desc = {
+		.entry_number    = 0,
+		.base_addr       = (unsigned int)(uintptr_t)target,
+		.limit           = sizeof(unsigned int) - 1,
+		.seg_32bit       = 1,
+		.contents        = 0, /* Data, grow-up */
+		.read_exec_only  = 0,
+		.limit_in_pages  = 0,
+		.seg_not_present = 0,
+		.useable         = 0
+	};
+	if (syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)) == 0) {
+		printf("\tusing LDT slot 0\n");
+		asm volatile ("mov %0, %" SEG :: "rm" ((unsigned short)0x7));
+	} else {
+		/* No modify_ldt for us (configured out, perhaps) */
+
+		struct user_desc *low_desc = mmap(
+			NULL, sizeof(desc),
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
+		memcpy(low_desc, &desc, sizeof(desc));
+
+		low_desc->entry_number = -1;
+
+		/* 32-bit set_thread_area */
+		long ret;
+		asm volatile ("int $0x80"
+			      : "=a" (ret), "+m" (*low_desc)
+			      : "a" (243), "b" (low_desc)
+#ifdef __x86_64__
+			      : "r8", "r9", "r10", "r11"
+#endif
+			);
+		memcpy(&desc, low_desc, sizeof(desc));
+		munmap(low_desc, sizeof(desc));
+
+		if (ret != 0) {
+			printf("[NOTE]\tcould not create a segment -- can't test anything\n");
+			exit(0);
+		}
+		printf("\tusing GDT slot %d\n", desc.entry_number);
+
+		unsigned short sel = (unsigned short)((desc.entry_number << 3) | 0x3);
+		asm volatile ("mov %0, %" SEG :: "rm" (sel));
+	}
+}
+
+static void tracee_zap_segment(void)
+{
+	/*
+	 * The tracer will redirect execution here.  This is meant to
+	 * work like gdb's 'p func()' feature.  The tricky bit is that
+	 * we modify a segment register in order to make sure that ptrace
+	 * can correctly restore segment registers.
+	 */
+	printf("\tTracee: in tracee_zap_segment()\n");
+
+	/*
+	 * Write a nonzero selector with base zero to the segment register.
+	 * Using a null selector would defeat the test on AMD pre-Zen2
+	 * CPUs, as such CPUs don't clear the base when loading a null
+	 * selector.
+	 */
+	unsigned short sel;
+	asm volatile ("mov %%ss, %0\n\t"
+		      "mov %0, %" SEG
+		      : "=rm" (sel));
+
+	pid_t pid = getpid(), tid = syscall(SYS_gettid);
+
+	printf("\tTracee is going back to sleep\n");
+	syscall(SYS_tgkill, pid, tid, SIGSTOP);
+
+	/* Should not get here. */
+	while (true) {
+		printf("[FAIL]\tTracee hit unreachable code\n");
+		pause();
+	}
+}
+
+int main()
+{
+	printf("\tSetting up a segment\n");
+	init_seg();
+
+	unsigned int val = dereference_seg_base();
+	if (val != EXPECTED_VALUE) {
+		printf("[FAIL]\tseg[0] == %x; should be %x\n", val, EXPECTED_VALUE);
+		return 1;
+	}
+	printf("[OK]\tThe segment points to the right place.\n");
+
+	pid_t chld = fork();
+	if (chld < 0)
+		err(1, "fork");
+
+	if (chld == 0) {
+		prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0, 0);
+
+		if (ptrace(PTRACE_TRACEME, 0, 0, 0) != 0)
+			err(1, "PTRACE_TRACEME");
+
+		pid_t pid = getpid(), tid = syscall(SYS_gettid);
+
+		printf("\tTracee will take a nap until signaled\n");
+		syscall(SYS_tgkill, pid, tid, SIGSTOP);
+
+		printf("\tTracee was resumed.  Will re-check segment.\n");
+
+		val = dereference_seg_base();
+		if (val != EXPECTED_VALUE) {
+			printf("[FAIL]\tseg[0] == %x; should be %x\n", val, EXPECTED_VALUE);
+			exit(1);
+		}
+
+		printf("[OK]\tThe segment points to the right place.\n");
+		exit(0);
+	}
+
+	int status;
+
+	/* Wait for SIGSTOP. */
+	if (waitpid(chld, &status, 0) != chld || !WIFSTOPPED(status))
+		err(1, "waitpid");
+
+	struct user_regs_struct regs;
+
+	if (ptrace(PTRACE_GETREGS, chld, NULL, &regs) != 0)
+		err(1, "PTRACE_GETREGS");
+
+#ifdef __x86_64__
+	printf("\tChild GS=0x%lx, GSBASE=0x%lx\n", (unsigned long)regs.gs, (unsigned long)regs.gs_base);
+#else
+	printf("\tChild FS=0x%lx\n", (unsigned long)regs.xfs);
+#endif
+
+	struct user_regs_struct regs2 = regs;
+#ifdef __x86_64__
+	regs2.rip = (unsigned long)tracee_zap_segment;
+	regs2.rsp -= 128;	/* Don't clobber the redzone. */
+#else
+	regs2.eip = (unsigned long)tracee_zap_segment;
+#endif
+
+	printf("\tTracer: redirecting tracee to tracee_zap_segment()\n");
+	if (ptrace(PTRACE_SETREGS, chld, NULL, &regs2) != 0)
+		err(1, "PTRACE_GETREGS");
+	if (ptrace(PTRACE_CONT, chld, NULL, NULL) != 0)
+		err(1, "PTRACE_GETREGS");
+
+	/* Wait for SIGSTOP. */
+	if (waitpid(chld, &status, 0) != chld || !WIFSTOPPED(status))
+		err(1, "waitpid");
+
+	printf("\tTracer: restoring tracee state\n");
+	if (ptrace(PTRACE_SETREGS, chld, NULL, &regs) != 0)
+		err(1, "PTRACE_GETREGS");
+	if (ptrace(PTRACE_DETACH, chld, NULL, NULL) != 0)
+		err(1, "PTRACE_GETREGS");
+
+	/* Wait for SIGSTOP. */
+	if (waitpid(chld, &status, 0) != chld)
+		err(1, "waitpid");
+
+	if (WIFSIGNALED(status)) {
+		printf("[FAIL]\tTracee crashed\n");
+		return 1;
+	}
+
+	if (!WIFEXITED(status)) {
+		printf("[FAIL]\tTracee stopped for an unexpected reason: %d\n", status);
+		return 1;
+	}
+
+	int exitcode = WEXITSTATUS(status);
+	if (exitcode != 0) {
+		printf("[FAIL]\tTracee reported failure\n");
+		return 1;
+	}
+
+	printf("[OK]\tAll is well.\n");
+	return 0;
+}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [tip: x86/fsgsbase] selftests/x86/fsgsbase: Add a missing memory constraint
  2020-06-26 17:24 ` [PATCH fsgsbase v2 2/4] selftests/x86/fsgsbase: Add a missing memory constraint Andy Lutomirski
@ 2020-07-01 13:33   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-01 13:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Andy Lutomirski, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     8e259031c67a5ea0666428edb64c89e8c6ebd18e
Gitweb:        https://git.kernel.org/tip/8e259031c67a5ea0666428edb64c89e8c6ebd18e
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 26 Jun 2020 10:24:28 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 01 Jul 2020 15:27:20 +02:00

selftests/x86/fsgsbase: Add a missing memory constraint

The manual call to set_thread_area() via int $0x80 was missing any
indication that the descriptor was a pointer, causing gcc to
occasionally generate wrong code.  Add the missing constraint.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/432968af67259ca92d68b774a731aff468eae610.1593192140.git.luto@kernel.org

---
 tools/testing/selftests/x86/fsgsbase.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index f47495d..9983195 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -285,7 +285,8 @@ static unsigned short load_gs(void)
 		/* 32-bit set_thread_area */
 		long ret;
 		asm volatile ("int $0x80"
-			      : "=a" (ret) : "a" (243), "b" (low_desc)
+			      : "=a" (ret), "+m" (*low_desc)
+			      : "a" (243), "b" (low_desc)
 			      : "r8", "r9", "r10", "r11");
 		memcpy(&desc, low_desc, sizeof(desc));
 		munmap(low_desc, sizeof(desc));

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [tip: x86/fsgsbase] selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test
  2020-06-26 17:24 ` [PATCH fsgsbase v2 1/4] selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test Andy Lutomirski
@ 2020-07-01 13:33   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-01 13:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Andy Lutomirski, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     979c2c4247cafd8a91628a7306b6871efbd12fdb
Gitweb:        https://git.kernel.org/tip/979c2c4247cafd8a91628a7306b6871efbd12fdb
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 26 Jun 2020 10:24:27 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 01 Jul 2020 15:27:20 +02:00

selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test

A comment was unclear.  Fix it.

Fixes: 5e7ec8578fa3 ("selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/901034a91a40169ec84f1f699ea86704dff762e4.1593192140.git.luto@kernel.org

---
 tools/testing/selftests/x86/fsgsbase.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index 9a43498..f47495d 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -498,7 +498,8 @@ static void test_ptrace_write_gsbase(void)
 			 * base would zero the selector.  On newer kernels,
 			 * this behavior has changed -- poking the base
 			 * changes only the base and, if FSGSBASE is not
-			 * available, this may not effect.
+			 * available, this may have no effect once the tracee
+			 * is resumed.
 			 */
 			if (gs == 0)
 				printf("\tNote: this is expected behavior on older kernels.\n");

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-07-01 13:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-26 17:24 [PATCH fsgsbase v2 0/4] x86/fsgsbase: Some fixes Andy Lutomirski
2020-06-26 17:24 ` [PATCH fsgsbase v2 1/4] selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test Andy Lutomirski
2020-07-01 13:33   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
2020-06-26 17:24 ` [PATCH fsgsbase v2 2/4] selftests/x86/fsgsbase: Add a missing memory constraint Andy Lutomirski
2020-07-01 13:33   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
2020-06-26 17:24 ` [PATCH fsgsbase v2 3/4] x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase Andy Lutomirski
2020-07-01 13:33   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
2020-06-26 17:24 ` [PATCH fsgsbase v2 4/4] x86/fsgsbase: Fix Xen PV support Andy Lutomirski
2020-06-26 17:24   ` Andy Lutomirski
2020-06-29  5:17   ` Jürgen Groß
2020-06-29  5:17     ` Jürgen Groß
2020-06-29 11:07     ` Andrew Cooper
2020-06-29 11:07       ` Andrew Cooper
2020-07-01 13:33   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.