linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 00/17] Enable FSGSBASE instructions
@ 2019-10-04 18:15 Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 01/17] x86/ptrace: Prevent ptrace from clearing the FS/GS selector Chang S. Bae
                   ` (17 more replies)
  0 siblings, 18 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:15 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

Benefits:
Currently a user process that wishes to read or write the FS/GS base must
make a system call. But recent X86 processors have added new instructions
for use in 64-bit mode that allow direct access to the FS and GS segment
base addresses.  The operating system controls whether applications can
use these instructions with a %cr4 control bit.

In addition to benefits to applications, performance improvements to the
OS context switch code are possible by making use of these instructions. A
third party reported out promising performance numbers out of their
initial benchmarking of the previous version of this patch series [9].

Enablement check:
The kernel provides information about the enabled state of FSGSBASE to
applications using the ELF_AUX vector. If the HWCAP2_FSGSBASE bit is set in
the AUX vector, the kernel has FSGSBASE instructions enabled and
applications can use them.

Kernel changes:
Major changes made in the kernel are in context switch, paranoid path, and
ptrace. In a context switch, a task's FS/GS base will be secured regardless
of its selector. In the paranoid path, GS base is unconditionally
overwritten to the kernel GS base on entry and the original GS base is
restored on exit. Ptrace includes divergence of FS/GS index and base
values.

Security:
For mitigating the Spectre v1 SWAPGS issue, LFENCE instructions were added
on most kernel entries. Those patches are dependent on previous behaviors
that users couldn't load a kernel address into the GS base. These patches
change that assumption since the user can load any address into GS base.
The changes to the kernel entry path in this patch series take account of
the SWAPGS issue.

Updates from v8 [10]:
* Internalized the interrupt check in the helper functions (Andy L.)
* Simplified GS base helper functions (Tony L.)
* Changed the patch order to put the paranoid path changes before the
  context switch changes (Tony L.)
* Fixed typos (Randy D.) and massaged a few sentences in the documentation
* Massaged the FSGSBASE enablement message

Previous versions: [1-7]

[1] version 1: https://lkml.kernel.org/r/1521481767-22113-1-git-send-email-chang.seok.bae@intel.com/
[2] version 2: https://lkml.kernel.org/r/1527789525-8857-1-git-send-email-chang.seok.bae@intel.com/
[3] version 3: https://lkml.kernel.org/r/20181023184234.14025-1-chang.seok.bae@intel.com/
[4] version 4: https://lkml.kernel.org/r/20190116224849.8617-1-chang.seok.bae@intel.com/
[5] version 5: https://lkml.kernel.org/r/20190201205319.15995-1-chang.seok.bae@intel.com/
[6] version 6: https://lkml.kernel.org/r/1552680405-5265-1-git-send-email-chang.seok.bae@intel.com/
[7] version 7: https://lkml.kernel.org/r/1557309753-24073-1-git-send-email-chang.seok.bae@intel.com/
[8] previously merged point (right before reverted):
    https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86-cpu-for-linus&id=697096b14444f458fb81212d1c
82d7846e932455
[9] initial benchmark: https://www.phoronix.com/scan.php?page=article&item=linux-wip-fsgsbase&num=1
[10] version 8: https://lore.kernel.org/lkml/1568318818-4091-1-git-send-email-chang.seok.bae@intel.com/

Andi Kleen (2):
  x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2

Andy Lutomirski (4):
  x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  x86/entry/64: Clean up paranoid exit
  x86/fsgsbase/64: Use FSGSBASE in switch_to() if available
  x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken
    bit

Chang S. Bae (9):
  x86/ptrace: Prevent ptrace from clearing the FS/GS selector
  selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base
    write
  x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
  x86/entry/64: Introduce the FIND_PERCPU_BASE macro
  x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
  x86/entry/64: Document GSBASE handling in the paranoid path
  x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
  x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace
  selftests/x86/fsgsbase: Test ptracer-induced GS base write with
    FSGSBASE

Thomas Gleixner (1):
  Documentation/x86/64: Add documentation for GS/FS addressing mode

Tony Luck (1):
  x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation

 Documentation/admin-guide/kernel-parameters.txt |   2 +
 Documentation/x86/entry_64.rst                  |   9 ++
 Documentation/x86/x86_64/fsgs.rst               | 199 ++++++++++++++++++++++++
 Documentation/x86/x86_64/index.rst              |   1 +
 arch/x86/entry/calling.h                        |  40 +++++
 arch/x86/entry/entry_64.S                       | 134 ++++++++++++----
 arch/x86/include/asm/fsgsbase.h                 |  45 ++++--
 arch/x86/include/asm/inst.h                     |  15 ++
 arch/x86/include/uapi/asm/hwcap2.h              |   3 +
 arch/x86/kernel/cpu/bugs.c                      |   6 +-
 arch/x86/kernel/cpu/common.c                    |  22 +++
 arch/x86/kernel/process_64.c                    | 107 +++++++++++--
 arch/x86/kernel/ptrace.c                        |  14 +-
 tools/testing/selftests/x86/fsgsbase.c          |  24 ++-
 14 files changed, 549 insertions(+), 72 deletions(-)
 create mode 100644 Documentation/x86/x86_64/fsgs.rst

-- 
2.7.4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v9 01/17] x86/ptrace: Prevent ptrace from clearing the FS/GS selector
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
@ 2019-10-04 18:15 ` Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 02/17] selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write Chang S. Bae
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:15 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

When a ptracer writes a ptracee's FS/GS base with a different value, the
selector is also cleared. This behavior is not correct as the selector
should be preserved.

Update only the base value and leave the selector intact. To simplify the
code further remove the conditional checking for the same value as this
code is not performance-critical.

The only recognizable downside of this change is when the selector is
already nonzero on write. The base will be reloaded according to the
selector. But the case is highly unexpected in real usages.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8: none

Changes from v7:
* Fixed to call correct helper functions
* Massaged changelog by Thomas
* Used '[FS|GS] base' consistently, instead of '[FS|GS]BASE'
---
 arch/x86/kernel/ptrace.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 3c5bbe8..df222e2 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -370,22 +370,12 @@ static int putreg(struct task_struct *child,
 	case offsetof(struct user_regs_struct,fs_base):
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		/*
-		 * When changing the FS base, use do_arch_prctl_64()
-		 * to set the index to zero and to set the base
-		 * as requested.
-		 */
-		if (child->thread.fsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_FS, value);
+		x86_fsbase_write_task(child, value);
 		return 0;
 	case offsetof(struct user_regs_struct,gs_base):
-		/*
-		 * Exactly the same here as the %fs handling above.
-		 */
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		if (child->thread.gsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_GS, value);
+		x86_gsbase_write_task(child, value);
 		return 0;
 #endif
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 02/17] selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 01/17] x86/ptrace: Prevent ptrace from clearing the FS/GS selector Chang S. Bae
@ 2019-10-04 18:15 ` Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 03/17] x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:15 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

The test validates that the selector is not changed when a ptracer writes
the ptracee's GS base.

Originally-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8: none

Changes from v7:
* Trimmed down the changes as most codes from v7 were already merged
* Included Andy's additional comments and messages when testing old
  kernels
* Used 'GS base' consistently, instead of 'GSBASE'
---
 tools/testing/selftests/x86/fsgsbase.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index 15a329d..950a48b 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -465,7 +465,7 @@ static void test_ptrace_write_gsbase(void)
 	wait(&status);
 
 	if (WSTOPSIG(status) == SIGTRAP) {
-		unsigned long gs, base;
+		unsigned long gs;
 		unsigned long gs_offset = USER_REGS_OFFSET(gs);
 		unsigned long base_offset = USER_REGS_OFFSET(gs_base);
 
@@ -481,7 +481,6 @@ static void test_ptrace_write_gsbase(void)
 			err(1, "PTRACE_POKEUSER");
 
 		gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
-		base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL);
 
 		/*
 		 * In a non-FSGSBASE system, the nonzero selector will load
@@ -489,11 +488,21 @@ static void test_ptrace_write_gsbase(void)
 		 * selector value is changed or not by the GSBASE write in
 		 * a ptracer.
 		 */
-		if (gs == 0 && base == 0xFF) {
-			printf("[OK]\tGS was reset as expected\n");
-		} else {
+		if (gs != *shared_scratch) {
 			nerrs++;
-			printf("[FAIL]\tGS=0x%lx, GSBASE=0x%lx (should be 0, 0xFF)\n", gs, base);
+			printf("[FAIL]\tGS changed to %lx\n", gs);
+
+			/*
+			 * On older kernels, poking a nonzero value into the
+			 * base would zero the selector.  On newer kernels,
+			 * this behavior has changed -- poking the base
+			 * changes only the base and, if FSGSBASE is not
+			 * available, this may not effect.
+			 */
+			if (gs == 0)
+				printf("\tNote: this is expected behavior on older kernels.\n");
+		} else {
+			printf("[OK]\tGS remained 0x%hx\n", *shared_scratch);
 		}
 	}
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 03/17] x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 01/17] x86/ptrace: Prevent ptrace from clearing the FS/GS selector Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 02/17] selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write Chang S. Bae
@ 2019-10-04 18:15 ` Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 04/17] x86/entry/64: Clean up paranoid exit Chang S. Bae
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:15 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

From: Andy Lutomirski <luto@kernel.org>

This is temporary.  It will allow the next few patches to be tested
incrementally.

Setting unsafe_fsgsbase is a root hole.  Don't do it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8: none
Changes from v7: none
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 arch/x86/kernel/cpu/common.c                    | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c7ac2f3..eb9a491 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2899,6 +2899,9 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
+	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
+			replaced with a nofsgsbase flag.
+
 	no_console_suspend
 			[HW] Never suspend the console
 			Disable suspending of consoles during suspend and
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9ae7d1b..9f57fb0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -438,6 +438,22 @@ static void __init setup_cr_pinning(void)
 }
 
 /*
+ * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
+ * updated. This allows us to get the kernel ready incrementally.
+ *
+ * Once all the pieces are in place, these will go away and be replaced with
+ * a nofsgsbase chicken flag.
+ */
+static bool unsafe_fsgsbase;
+
+static __init int setup_unsafe_fsgsbase(char *arg)
+{
+	unsafe_fsgsbase = true;
+	return 1;
+}
+__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+
+/*
  * Protection Keys are not available in 32-bit mode.
  */
 static bool pku_disabled;
@@ -1455,6 +1471,14 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_smap(c);
 	setup_umip(c);
 
+	/* Enable FSGSBASE instructions if available. */
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
+		if (unsafe_fsgsbase)
+			cr4_set_bits(X86_CR4_FSGSBASE);
+		else
+			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
+	}
+
 	/*
 	 * The vendor-specific functions might have changed features.
 	 * Now we do "generic changes."
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 04/17] x86/entry/64: Clean up paranoid exit
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (2 preceding siblings ...)
  2019-10-04 18:15 ` [PATCH v9 03/17] x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
@ 2019-10-04 18:15 ` Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 05/17] x86/entry/64: Switch CR3 before SWAPGS in paranoid entry Chang S. Bae
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:15 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae,
	Vegard Nossum

From: Andy Lutomirski <luto@kernel.org>

All that paranoid exit needs to do is to disable IRQs, handle IRQ tracing,
then restore CR3, and restore GS base. Simply do those actions in that
order. Cleaning up the spaghetti code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
---

Changes from v8: none

Changes from v7:
* Included as a new patch. Took the cleanup part from the Andy Lutomirski's
  original patch [*] and edited its changelog a little bit.

[*] https://lkml.kernel.org/r/59725ceb08977359489fbed979716949ad45f616.1562035429.git.luto@kernel.org
---
 arch/x86/entry/entry_64.S | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b7c3ea4..dd0d62a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1265,20 +1265,25 @@ END(paranoid_entry)
 ENTRY(paranoid_exit)
 	UNWIND_HINT_REGS
 	DISABLE_INTERRUPTS(CLBR_ANY)
+
+	/*
+	 * The order of operations is important. IRQ tracing requires
+	 * kernel GS base and CR3. RESTORE_CR3 requires kernel GS base.
+	 *
+	 * NB to anyone to try to optimize this code: this code does
+	 * not execute at all for exceptions from user mode. Those
+	 * exceptions go through error_exit instead.
+	 */
 	TRACE_IRQS_OFF_DEBUG
-	testl	%ebx, %ebx			/* swapgs needed? */
-	jnz	.Lparanoid_exit_no_swapgs
-	TRACE_IRQS_IRETQ
-	/* Always restore stashed CR3 value (see paranoid_entry) */
-	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
+	RESTORE_CR3	scratch_reg=%rax save_reg=%r14
+
+	/* If EBX is 0, SWAPGS is required */
+	testl	%ebx, %ebx
+	jnz	restore_regs_and_return_to_kernel
+
+	/* We are returning to a context with user GS base */
 	SWAPGS_UNSAFE_STACK
-	jmp	.Lparanoid_exit_restore
-.Lparanoid_exit_no_swapgs:
-	TRACE_IRQS_IRETQ_DEBUG
-	/* Always restore stashed CR3 value (see paranoid_entry) */
-	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
-.Lparanoid_exit_restore:
-	jmp restore_regs_and_return_to_kernel
+	jmp	restore_regs_and_return_to_kernel
 END(paranoid_exit)
 
 /*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 05/17] x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (3 preceding siblings ...)
  2019-10-04 18:15 ` [PATCH v9 04/17] x86/entry/64: Clean up paranoid exit Chang S. Bae
@ 2019-10-04 18:15 ` Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 06/17] x86/entry/64: Introduce the FIND_PERCPU_BASE macro Chang S. Bae
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:15 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae,
	Vegard Nossum

When FSGSBASE is enabled, the GS base handling in paranoid entry will need
to retrieve the kernel GS base which requires that the kernel page table is
active.

As the CR3 switch to the kernel page tables (PTI is active) does not depend
on kernel GS base, move the CR3 switch in front of the GS base handling.

Comment the EBX content while at it.

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
---

Changes from v8: none

Changes from v7:
* Rebased onto the LFENCE-based SWAPGS mitigation code
* Dropped the READ_MSR_GSBASE macro by Thomas
* Rewrote changelog and comments by Thomas
* Use 'GS base' consistently, instead of 'GSBASE'
---
 arch/x86/entry/entry_64.S | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index dd0d62a..edb4160 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1219,15 +1219,7 @@ ENTRY(paranoid_entry)
 	cld
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
-	movl	$1, %ebx
-	movl	$MSR_GS_BASE, %ecx
-	rdmsr
-	testl	%edx, %edx
-	js	1f				/* negative -> in kernel */
-	SWAPGS
-	xorl	%ebx, %ebx
 
-1:
 	/*
 	 * Always stash CR3 in %r14.  This value will be restored,
 	 * verbatim, at exit.  Needed if paranoid_entry interrupted
@@ -1237,16 +1229,31 @@ ENTRY(paranoid_entry)
 	 * This is also why CS (stashed in the "iret frame" by the
 	 * hardware at entry) can not be used: this may be a return
 	 * to kernel code, but with a user CR3 value.
+	 *
+	 * Switching CR3 does not depend on kernel GS base so it can
+	 * be done before switching to the kernel GS base. This is
+	 * required for FSGSBASE because the kernel GS base has to
+	 * be retrieved from a kernel internal table.
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+	/* EBX = 1 -> kernel GSBASE active, no restore required */
+	movl	$1, %ebx
 	/*
-	 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
-	 * unconditional CR3 write, even in the PTI case.  So do an lfence
-	 * to prevent GS speculation, regardless of whether PTI is enabled.
+	 * The kernel-enforced convention is a negative GS base indicates
+	 * a kernel value. No SWAPGS needed on entry and exit.
 	 */
-	FENCE_SWAPGS_KERNEL_ENTRY
+	movl	$MSR_GS_BASE, %ecx
+	rdmsr
+	testl	%edx, %edx
+	jns	.Lparanoid_entry_swapgs
+	ret
 
+.Lparanoid_entry_swapgs:
+	SWAPGS
+	FENCE_SWAPGS_KERNEL_ENTRY
+	/* EBX = 0 -> SWAPGS required on exit */
+	xorl	%ebx, %ebx
 	ret
 END(paranoid_entry)
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 06/17] x86/entry/64: Introduce the FIND_PERCPU_BASE macro
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (4 preceding siblings ...)
  2019-10-04 18:15 ` [PATCH v9 05/17] x86/entry/64: Switch CR3 before SWAPGS in paranoid entry Chang S. Bae
@ 2019-10-04 18:15 ` Chang S. Bae
  2019-10-04 18:15 ` [PATCH v9 07/17] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit Chang S. Bae
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:15 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae,
	Vegard Nossum

GS base is used to find per-CPU data in the kernel. But when GS base is
unknown, the per-CPU base can be found from the per_cpu_offset table with a
CPU NR.  The CPU NR is extracted from the limit field of the CPUNODE entry
in GDT, or by the RDPID instruction. This is a prerequisite for using
FSGSBASE in the low level entry code.

Also, add the GAS-compatible RDPID macro as binutils 2.21 does not support
it. Support is added in version 2.27.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
---

Changes from v8: none

Changes from v7:
* No code change
* Massaged changelog by Thomas
* Used 'GS base' consistently, instead of 'GSBASE'
---
 arch/x86/entry/calling.h    | 34 ++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/inst.h | 15 +++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 515c0ce..c222302 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -6,6 +6,7 @@
 #include <asm/percpu.h>
 #include <asm/asm-offsets.h>
 #include <asm/processor-flags.h>
+#include <asm/inst.h>
 
 /*
 
@@ -347,6 +348,39 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
+#ifdef CONFIG_SMP
+
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ */
+.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
+	movq	$__CPUNODE_SEG, \reg
+	lsl	\reg, \reg
+.endm
+
+/*
+ * Fetch the per-CPU GS base value for this processor and put it in @reg.
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+.macro GET_PERCPU_BASE reg:req
+	ALTERNATIVE \
+		"LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
+		"RDPID	\reg", \
+		X86_FEATURE_RDPID
+	andq	$VDSO_CPUNODE_MASK, \reg
+	movq	__per_cpu_offset(, \reg, 8), \reg
+.endm
+
+#else
+
+.macro GET_PERCPU_BASE reg:req
+	movq	pcpu_unit_offsets(%rip), \reg
+.endm
+
+#endif /* CONFIG_SMP */
+
 /*
  * This does 'call enter_from_user_mode' unless we can avoid it based on
  * kernel config or using the static jump infrastructure.
diff --git a/arch/x86/include/asm/inst.h b/arch/x86/include/asm/inst.h
index f5a796d..d063841 100644
--- a/arch/x86/include/asm/inst.h
+++ b/arch/x86/include/asm/inst.h
@@ -306,6 +306,21 @@
 	.endif
 	MODRM 0xc0 movq_r64_xmm_opd1 movq_r64_xmm_opd2
 	.endm
+
+.macro RDPID opd
+	REG_TYPE rdpid_opd_type \opd
+	.if rdpid_opd_type == REG_TYPE_R64
+	R64_NUM rdpid_opd \opd
+	.else
+	R32_NUM rdpid_opd \opd
+	.endif
+	.byte 0xf3
+	.if rdpid_opd > 7
+	PFX_REX rdpid_opd 0
+	.endif
+	.byte 0x0f, 0xc7
+	MODRM 0xc0 rdpid_opd 0x7
+.endm
 #endif
 
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 07/17] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (5 preceding siblings ...)
  2019-10-04 18:15 ` [PATCH v9 06/17] x86/entry/64: Introduce the FIND_PERCPU_BASE macro Chang S. Bae
@ 2019-10-04 18:15 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 08/17] x86/entry/64: Document GSBASE handling in the paranoid path Chang S. Bae
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:15 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae,
	Tom Lendacky, Vegard Nossum

Without FSGSBASE, user space cannot change GS base other than through a
PRCTL. The kernel enforces that the user space GS base value is positive
as negative values are used for detecting the kernel space GS base value
in the paranoid entry code.

If FSGSBASE is enabled, user space can set arbitrary GS base values without
kernel intervention, including negative ones, which breaks the paranoid
entry assumptions.

To avoid this, paranoid entry needs to unconditionally save the current
GS base value independent of the interrupted context, retrieve and write
the kernel GS base and unconditionally restore the saved value on exit.
The restore happens either in paranoid exit or in the special exit path of
the NMI low level code.

All other entry code paths which use unconditional SWAPGS are not affected
as they do not depend on the actual content.

The new logic for paranoid entry, when FSGSBASE is enabled, removes SWAPGS
and replaces with unconditional WRGSBASE. Hence no fences are needed.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
---

Changes from v8: none

Changes from v7:
* Rebased paranoid exit changes on the precedent cleanup patch
* Massaged changelog and comment by Thomas
* Added comments related to the SWAPGS mitigation
* Used 'GS base' consistently, instead of 'GSBASE'
---
 arch/x86/entry/calling.h  |  6 ++++
 arch/x86/entry/entry_64.S | 78 +++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index c222302..673d086 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -340,6 +340,12 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
+.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+	rdgsbase \save_reg
+	GET_PERCPU_BASE \scratch_reg
+	wrgsbase \scratch_reg
+.endm
+
 #endif /* CONFIG_X86_64 */
 
 .macro STACKLEAK_ERASE
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index edb4160..d554754 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -38,6 +38,7 @@
 #include <asm/export.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include <asm/fsgsbase.h>
 #include <linux/err.h>
 
 #include "calling.h"
@@ -1210,9 +1211,14 @@ idtentry machine_check		do_mce			has_error_code=0	paranoid=1
 #endif
 
 /*
- * Save all registers in pt_regs, and switch gs if needed.
- * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ * Save all registers in pt_regs. Return GS base related information
+ * in EBX depending on the availability of the FSGSBASE instructions:
+ *
+ * FSGSBASE	R/EBX
+ *     N        0 -> SWAPGS on exit
+ *              1 -> no SWAPGS on exit
+ *
+ *     Y        GS base value at entry, must be restored in paranoid_exit
  */
 ENTRY(paranoid_entry)
 	UNWIND_HINT_FUNC
@@ -1237,7 +1243,29 @@ ENTRY(paranoid_entry)
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
-	/* EBX = 1 -> kernel GSBASE active, no restore required */
+	/*
+	 * Handling GS base depends on the availability of FSGSBASE.
+	 *
+	 * Without FSGSBASE the kernel enforces that negative GS base
+	 * values indicate kernel GS base. With FSGSBASE no assumptions
+	 * can be made about the GS base value when entering from user
+	 * space.
+	*/
+	ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE
+
+	/*
+	 * Read the current GS base and store it in %rbx unconditionally,
+	 * retrieve and set the current CPUs kernel GS base. The stored value
+	 * has to be restored in paranoid_exit unconditionally.
+	 *
+	 * This unconditional write of GS base ensures no subsequent load
+	 * based on a mispredicted GS base.
+	 */
+	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
+	ret
+
+.Lparanoid_entry_checkgs:
+	/* EBX = 1 -> kernel GS base active, no restore required */
 	movl	$1, %ebx
 	/*
 	 * The kernel-enforced convention is a negative GS base indicates
@@ -1264,10 +1292,17 @@ END(paranoid_entry)
  *
  * We may be returning to very strange contexts (e.g. very early
  * in syscall entry), so checking for preemption here would
- * be complicated.  Fortunately, we there's no good reason
- * to try to handle preemption here.
+ * be complicated.  Fortunately, there's no good reason to try
+ * to handle preemption here.
+ *
+ * R/EBX contains the GS base related information depending on the
+ * availability of the FSGSBASE instructions:
+ *
+ * FSGSBASE	R/EBX
+ *     N        0 -> SWAPGS on exit
+ *              1 -> no SWAPGS on exit
  *
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ *     Y        User space GS base, must be restored unconditionally
  */
 ENTRY(paranoid_exit)
 	UNWIND_HINT_REGS
@@ -1284,7 +1319,15 @@ ENTRY(paranoid_exit)
 	TRACE_IRQS_OFF_DEBUG
 	RESTORE_CR3	scratch_reg=%rax save_reg=%r14
 
-	/* If EBX is 0, SWAPGS is required */
+	/* Handle the three GS base cases */
+	ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE
+
+	/* With FSGSBASE enabled, unconditionally resotre GS base */
+	wrgsbase	%rbx
+	jmp	restore_regs_and_return_to_kernel
+
+.Lparanoid_exit_checkgs:
+	/* On non-FSGSBASE systems, conditionally do SWAPGS */
 	testl	%ebx, %ebx
 	jnz	restore_regs_and_return_to_kernel
 
@@ -1698,10 +1741,27 @@ end_repeat_nmi:
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
 
-	testl	%ebx, %ebx			/* swapgs needed? */
+	/*
+	 * The above invocation of paranoid_entry stored the GS base
+	 * related information in R/EBX depending on the availability
+	 * of FSGSBASE.
+	 *
+	 * If FSGSBASE is enabled, restore the saved GS base value
+	 * unconditionally, otherwise take the conditional SWAPGS path.
+	 */
+	ALTERNATIVE "jmp nmi_no_fsgsbase", "", X86_FEATURE_FSGSBASE
+
+	wrgsbase	%rbx
+	jmp	nmi_restore
+
+nmi_no_fsgsbase:
+	/* EBX == 0 -> invoke SWAPGS */
+	testl	%ebx, %ebx
 	jnz	nmi_restore
+
 nmi_swapgs:
 	SWAPGS_UNSAFE_STACK
+
 nmi_restore:
 	POP_REGS
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 08/17] x86/entry/64: Document GSBASE handling in the paranoid path
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (6 preceding siblings ...)
  2019-10-04 18:15 ` [PATCH v9 07/17] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 09/17] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions Chang S. Bae
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

On FSGSBASE systems, the way to handle GS base in the paranoid path is
different from the existing SWAPGS-based entry/exit path handling. Document
the reason and what has to be done for FSGSBASE enabled systems.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8: none

Changes from v7:
* Massaged doc and changelog by Thomas
* Used 'GS base' consistently, instead of 'GSBASE'
---
 Documentation/x86/entry_64.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst
index a48b3f6..0499a40 100644
--- a/Documentation/x86/entry_64.rst
+++ b/Documentation/x86/entry_64.rst
@@ -108,3 +108,12 @@ We try to only use IST entries and the paranoid entry code for vectors
 that absolutely need the more expensive check for the GS base - and we
 generate all 'normal' entry points with the regular (faster) paranoid=0
 variant.
+
+On FSGSBASE systems, however, user space can set GS without kernel
+interaction. It means the value of GS base itself does not imply anything,
+whether a kernel value or a user space value. So, there is no longer a safe
+way to check whether the exception is entering from user mode or kernel
+mode in the paranoid entry code path. So the GS base value needs to be read
+out, saved and the kernel GS base value written. On exit, the saved GS base
+value needs to be restored unconditionally. The non-paranoid entry/exit
+code still uses SWAPGS unconditionally as the state is known.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 09/17] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (7 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 08/17] x86/entry/64: Document GSBASE handling in the paranoid path Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 10/17] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions Chang S. Bae
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

From: Andi Kleen <ak@linux.intel.com>

[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
  make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8: none

Changes from v7:
* No code change
* Trimmed the changelog by Thomas
---
 arch/x86/include/asm/fsgsbase.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index bca4c74..fdd1177 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -19,6 +19,36 @@ extern unsigned long x86_gsbase_read_task(struct task_struct *task);
 extern void x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
 extern void x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);
 
+/* Must be protected by X86_FEATURE_FSGSBASE check. */
+
+static __always_inline unsigned long rdfsbase(void)
+{
+	unsigned long fsbase;
+
+	asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+	return fsbase;
+}
+
+static __always_inline unsigned long rdgsbase(void)
+{
+	unsigned long gsbase;
+
+	asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+	return gsbase;
+}
+
+static __always_inline void wrfsbase(unsigned long fsbase)
+{
+	asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
+static __always_inline void wrgsbase(unsigned long gsbase)
+{
+	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 10/17] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (8 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 09/17] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 11/17] x86/fsgsbase/64: Use FSGSBASE in switch_to() if available Chang S. Bae
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae,
	Andrew Cooper

Add CPU feature conditional FS/GS base access to the relevant helper
functions. That allows accelerating certain FS/GS base operations in
subsequent changes.

Note, that while possible, the user space entry/exit GS base operations are
not going to use the new FSGSBASE instructions. The reason is that it would
require additional storage for the user space value which adds more
complexity to the low level code and experiments have shown marginal
benefit. This may be revisited later but for now the SWAPGS based handling
in the entry code is preserved except for the paranoid entry/exit code.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---

Changes from v8:
* Internalized the interrupt condition check in the helper functions (Andy L.)
* Simplified the GS base read/write helper functions (Tony)
* Massaged the changelog to reflect the helper changes

Changes from v7:
* Added interrupt-related warning messages by Thomas
* Massaged changelog by Thomas
* Used '[FS|GS] base' consistently, instead of '[FS|GS]BASE'
---
 arch/x86/include/asm/fsgsbase.h | 27 +++++++++----------
 arch/x86/kernel/process_64.c    | 58 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index fdd1177..aefd537 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -49,35 +49,32 @@ static __always_inline void wrgsbase(unsigned long gsbase)
 	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
 }
 
+#include <asm/cpufeature.h>
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
 {
 	unsigned long fsbase;
 
-	rdmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		fsbase = rdfsbase();
+	else
+		rdmsrl(MSR_FS_BASE, fsbase);
 
 	return fsbase;
 }
 
-static inline unsigned long x86_gsbase_read_cpu_inactive(void)
-{
-	unsigned long gsbase;
-
-	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
-
-	return gsbase;
-}
-
 static inline void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-	wrmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		wrfsbase(fsbase);
+	else
+		wrmsrl(MSR_FS_BASE, fsbase);
 }
 
-static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
-{
-	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
-}
+extern unsigned long x86_gsbase_read_cpu_inactive(void);
+extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
 #endif /* CONFIG_X86_64 */
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index af64519..295aa0c 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -329,6 +329,64 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
 	return base;
 }
 
+unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+	unsigned long gsbase;
+
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		bool need_restore = false;
+		unsigned long flags;
+
+		/*
+		 * We read the inactive GS base value by swapping
+		 * to make it the active one. But we cannot allow
+		 * an interrupt while we switch to and from.
+		 */
+		if (!irqs_disabled()) {
+			local_irq_save(flags);
+			need_restore = true;
+		}
+
+		native_swapgs();
+		gsbase = rdgsbase();
+		native_swapgs();
+
+		if (need_restore)
+			local_irq_restore(flags);
+	} else {
+		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+
+	return gsbase;
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		bool need_restore = false;
+		unsigned long flags;
+
+		/*
+		 * We write the inactive GS base value by swapping
+		 * to make it the active one. But we cannot allow
+		 * an interrupt while we switch to and from.
+		 */
+		if (!irqs_disabled()) {
+			local_irq_save(flags);
+			need_restore = true;
+		}
+
+		native_swapgs();
+		wrgsbase(gsbase);
+		native_swapgs();
+
+		if (need_restore)
+			local_irq_restore(flags);
+	} else {
+		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+}
+
 unsigned long x86_fsbase_read_task(struct task_struct *task)
 {
 	unsigned long fsbase;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 11/17] x86/fsgsbase/64: Use FSGSBASE in switch_to() if available
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (9 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 10/17] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 12/17] x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace Chang S. Bae
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

From: Andy Lutomirski <luto@kernel.org>

With the new FSGSBASE instructions, FS/GS base can be efficiently read
and written in __switch_to(). Use that capability to preserve the full
state.

This will enable user code to do whatever it wants with the new
instructions without any kernel-induced gotchas.  (There can still be
architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GS base
if WRGSBASE was used, but users are expected to read the CPU manual
before doing things like that.)

This is a considerable speedup. It seems to save about 100 cycles per
context switch compared to the baseline 4.6-rc1 behavior on a Skylake
laptop.

[ chang: 5~10% performance improvements were seen by a context switch
  benchmark that ran threads with different FS/GS base values (to the
  baseline 4.16). ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8:
* Rebased on the precedent helper changes; removed the interrupt
  condition check and IRQ disablement from save_fsgs() and
  x86_fsgsbase_load().

Changes from v7:
* Used appropriate GS base read/write functions depending on interrupt
  conditions. This fixes the bug in v7.
* Massaged changelog by Thomas
* Used '[FS|GS] base' consistently, instead of '[FS|GS]BASE'
---
 arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 295aa0c..56c0e5b 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -200,8 +200,18 @@ static __always_inline void save_fsgs(struct task_struct *task)
 {
 	savesegment(fs, task->thread.fsindex);
 	savesegment(gs, task->thread.gsindex);
-	save_base_legacy(task, task->thread.fsindex, FS);
-	save_base_legacy(task, task->thread.gsindex, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/*
+		 * If FSGSBASE is enabled, we can't make any useful guesses
+		 * about the base, and user code expects us to save the current
+		 * value.  Fortunately, reading the base directly is efficient.
+		 */
+		task->thread.fsbase = rdfsbase();
+		task->thread.gsbase = x86_gsbase_read_cpu_inactive();
+	} else {
+		save_base_legacy(task, task->thread.fsindex, FS);
+		save_base_legacy(task, task->thread.gsindex, GS);
+	}
 }
 
 #if IS_ENABLED(CONFIG_KVM)
@@ -280,10 +290,22 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 					      struct thread_struct *next)
 {
-	load_seg_legacy(prev->fsindex, prev->fsbase,
-			next->fsindex, next->fsbase, FS);
-	load_seg_legacy(prev->gsindex, prev->gsbase,
-			next->gsindex, next->gsbase, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/* Update the FS and GS selectors if they could have changed. */
+		if (unlikely(prev->fsindex || next->fsindex))
+			loadseg(FS, next->fsindex);
+		if (unlikely(prev->gsindex || next->gsindex))
+			loadseg(GS, next->gsindex);
+
+		/* Update the bases. */
+		wrfsbase(next->fsbase);
+		x86_gsbase_write_cpu_inactive(next->gsbase);
+	} else {
+		load_seg_legacy(prev->fsindex, prev->fsbase,
+				next->fsindex, next->fsbase, FS);
+		load_seg_legacy(prev->gsindex, prev->gsbase,
+				next->gsindex, next->gsbase, GS);
+	}
 }
 
 static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 12/17] x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (10 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 11/17] x86/fsgsbase/64: Use FSGSBASE in switch_to() if available Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 13/17] x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation Chang S. Bae
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

When FSGSBASE is enabled, copying threads and reading FS/GS base using
ptrace must read the actual values.

When copying a thread, use fsgs_save() and copy the saved values. For
ptrace, the bases must be read from memory regardless of the selector
if FSGSBASE is enabled.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8: none

Changes from v7:
* No code change
* Massaged changelog by Andy Lutomirski
---
 arch/x86/kernel/process_64.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 56c0e5b..b67f656 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -415,7 +415,8 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		fsbase = x86_fsbase_read_cpu();
-	else if (task->thread.fsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.fsindex == 0))
 		fsbase = task->thread.fsbase;
 	else
 		fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -429,7 +430,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		gsbase = x86_gsbase_read_cpu_inactive();
-	else if (task->thread.gsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.gsindex == 0))
 		gsbase = task->thread.gsbase;
 	else
 		gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
@@ -469,10 +471,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap_ptr = NULL;
 
-	savesegment(gs, p->thread.gsindex);
-	p->thread.gsbase = p->thread.gsindex ? 0 : me->thread.gsbase;
-	savesegment(fs, p->thread.fsindex);
-	p->thread.fsbase = p->thread.fsindex ? 0 : me->thread.fsbase;
+	save_fsgs(me);
+	p->thread.fsindex = me->thread.fsindex;
+	p->thread.fsbase = me->thread.fsbase;
+	p->thread.gsindex = me->thread.gsindex;
+	p->thread.gsbase = me->thread.gsbase;
 	savesegment(es, p->thread.es);
 	savesegment(ds, p->thread.ds);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 13/17] x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (11 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 12/17] x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 14/17] selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE Chang S. Bae
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

From: Tony Luck <tony.luck@intel.com>

Before enabling FSGSBASE the kernel could safely assume that the content
of GS base was a user address. Thus any speculative access as the result
of a mispredicted branch controlling the execution of SWAPGS would be to
a user address. So systems with speculation-proof SMAP did not need to
add additional LFENCE instructions to mitigate.

With FSGSBASE enabled a hostile user can set GS base to a kernel address.
So they can make the kernel speculatively access data they wish to leak
via a side channel. This means that SMAP provides no protection.

Add FSGSBASE as an additional condition to enable the fence-based SWAPGS
mitigation.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8: none

Changes from v7:
* Included as a new patch.
---
 arch/x86/kernel/cpu/bugs.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 91c2561..e06356f 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -321,14 +321,12 @@ static void __init spectre_v1_select_mitigation(void)
 		 * If FSGSBASE is enabled, the user can put a kernel address in
 		 * GS, in which case SMAP provides no protection.
 		 *
-		 * [ NOTE: Don't check for X86_FEATURE_FSGSBASE until the
-		 *	   FSGSBASE enablement patches have been merged. ]
-		 *
 		 * If FSGSBASE is disabled, the user can only put a user space
 		 * address in GS.  That makes an attack harder, but still
 		 * possible if there's no SMAP protection.
 		 */
-		if (!smap_works_speculatively()) {
+		if (boot_cpu_has(X86_FEATURE_FSGSBASE) ||
+		    !smap_works_speculatively()) {
 			/*
 			 * Mitigation can be provided from SWAPGS itself or
 			 * PTI as the CR3 write in the Meltdown mitigation
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 14/17] selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (12 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 13/17] x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 15/17] x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken bit Chang S. Bae
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

This validates that GS selector and base are independently preserved in
ptrace commands.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8: none

Changes from v7:
* Fixed the test message
---
 tools/testing/selftests/x86/fsgsbase.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index 950a48b..9a43498 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -465,7 +465,7 @@ static void test_ptrace_write_gsbase(void)
 	wait(&status);
 
 	if (WSTOPSIG(status) == SIGTRAP) {
-		unsigned long gs;
+		unsigned long gs, base;
 		unsigned long gs_offset = USER_REGS_OFFSET(gs);
 		unsigned long base_offset = USER_REGS_OFFSET(gs_base);
 
@@ -481,6 +481,7 @@ static void test_ptrace_write_gsbase(void)
 			err(1, "PTRACE_POKEUSER");
 
 		gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
+		base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL);
 
 		/*
 		 * In a non-FSGSBASE system, the nonzero selector will load
@@ -501,8 +502,14 @@ static void test_ptrace_write_gsbase(void)
 			 */
 			if (gs == 0)
 				printf("\tNote: this is expected behavior on older kernels.\n");
+		} else if (have_fsgsbase && (base != 0xFF)) {
+			nerrs++;
+			printf("[FAIL]\tGSBASE changed to %lx\n", base);
 		} else {
-			printf("[OK]\tGS remained 0x%hx\n", *shared_scratch);
+			printf("[OK]\tGS remained 0x%hx", *shared_scratch);
+			if (have_fsgsbase)
+				printf(" and GSBASE changed to 0xFF");
+			printf("\n");
 		}
 	}
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 15/17] x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken bit
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (13 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 14/17] selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 16/17] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

From: Andy Lutomirski <luto@kernel.org>

Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
FSGSBASE by default, and add nofsgsbase to disable it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8:
* Massaged the print message for FSGSBASE enablement by Thomas. This
  change was missed in v7.

Changes from v7:
* No code change
* Massaged title by Thomas
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +--
 arch/x86/kernel/cpu/common.c                    | 32 +++++++++++--------------
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index eb9a491..a14289a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2899,8 +2899,7 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
-	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
-			replaced with a nofsgsbase flag.
+	nofsgsbase	[X86] Disables FSGSBASE instructions.
 
 	no_console_suspend
 			[HW] Never suspend the console
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9f57fb0..9b59377bb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -437,21 +437,21 @@ static void __init setup_cr_pinning(void)
 	static_key_enable(&cr_pinning.key);
 }
 
-/*
- * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
- * updated. This allows us to get the kernel ready incrementally.
- *
- * Once all the pieces are in place, these will go away and be replaced with
- * a nofsgsbase chicken flag.
- */
-static bool unsafe_fsgsbase;
-
-static __init int setup_unsafe_fsgsbase(char *arg)
+static __init int x86_nofsgsbase_setup(char *arg)
 {
-	unsafe_fsgsbase = true;
+	/* Require an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	/* Do not emit a message if the feature is not present. */
+	if (!boot_cpu_has(X86_FEATURE_FSGSBASE))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
+	pr_info("FSGSBASE disabled via kernel command line\n");
 	return 1;
 }
-__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+__setup("nofsgsbase", x86_nofsgsbase_setup);
 
 /*
  * Protection Keys are not available in 32-bit mode.
@@ -1472,12 +1472,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-		if (unsafe_fsgsbase)
-			cr4_set_bits(X86_CR4_FSGSBASE);
-		else
-			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
-	}
+	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+		cr4_set_bits(X86_CR4_FSGSBASE);
 
 	/*
 	 * The vendor-specific functions might have changed features.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 16/17] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (14 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 15/17] x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken bit Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 18:16 ` [PATCH v9 17/17] Documentation/x86/64: Add documentation for GS/FS addressing mode Chang S. Bae
  2019-11-15 18:29 ` [PATCH v9 00/17] Enable FSGSBASE instructions Thomas Gleixner
  17 siblings, 0 replies; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae

From: Andi Kleen <ak@linux.intel.com>

The kernel needs to explicitly enable FSGSBASE. So, the application needs
to know if it can safely use these instructions. Just looking at the CPUID
bit is not enough because it may be running in a kernel that does not
enable the instructions.

One way for the application would be to just try and catch the SIGILL.
But that is difficult to do in libraries which may not want to overwrite
the signal handlers of the main application.

Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF
aux vector. AT_HWCAP2 is already used by PPC for similar purposes.

The application can access it open coded or by using the getauxval()
function in newer versions of glibc.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

Changes from v8: none

Changes from v7:
* No code change
* Massaged changelog by Thomas
---
 arch/x86/include/uapi/asm/hwcap2.h | 3 +++
 arch/x86/kernel/cpu/common.c       | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/hwcap2.h b/arch/x86/include/uapi/asm/hwcap2.h
index 8b2effe..5fdfcb4 100644
--- a/arch/x86/include/uapi/asm/hwcap2.h
+++ b/arch/x86/include/uapi/asm/hwcap2.h
@@ -5,4 +5,7 @@
 /* MONITOR/MWAIT enabled in Ring 3 */
 #define HWCAP2_RING3MWAIT		(1 << 0)
 
+/* Kernel allows FSGSBASE instructions available in Ring 3 */
+#define HWCAP2_FSGSBASE			BIT(1)
+
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9b59377bb..90d7e95 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1472,8 +1472,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
 		cr4_set_bits(X86_CR4_FSGSBASE);
+		elf_hwcap2 |= HWCAP2_FSGSBASE;
+	}
 
 	/*
 	 * The vendor-specific functions might have changed features.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 17/17] Documentation/x86/64: Add documentation for GS/FS addressing mode
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (15 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 16/17] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
@ 2019-10-04 18:16 ` Chang S. Bae
  2019-10-04 22:54   ` Randy Dunlap
  2019-11-15 18:29 ` [PATCH v9 00/17] Enable FSGSBASE instructions Thomas Gleixner
  17 siblings, 1 reply; 46+ messages in thread
From: Chang S. Bae @ 2019-10-04 18:16 UTC (permalink / raw)
  To: linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, chang.seok.bae,
	Randy Dunlap, Jonathan Corbet

From: Thomas Gleixner <tglx@linutronix.de>

Explain how the GS/FS based addressing can be utilized in user space
applications along with the differences between the generic prctl() based
GS/FS base control and the FSGSBASE version available on newer CPUs.

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
---

Changes from v8:
* Fixed typos (Randy Dunlap)
* Massaged a few sentences that were previously edited by Thomas.

Changes from v7:
* Rewritten the documentation and changelog by Thomas
* Included compiler version info additionally
---
 Documentation/x86/x86_64/fsgs.rst  | 199 +++++++++++++++++++++++++++++++++++++
 Documentation/x86/x86_64/index.rst |   1 +
 2 files changed, 200 insertions(+)
 create mode 100644 Documentation/x86/x86_64/fsgs.rst

diff --git a/Documentation/x86/x86_64/fsgs.rst b/Documentation/x86/x86_64/fsgs.rst
new file mode 100644
index 0000000..50960e0
--- /dev/null
+++ b/Documentation/x86/x86_64/fsgs.rst
@@ -0,0 +1,199 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Using FS and GS segments in user space applications
+===================================================
+
+The x86 architecture supports segmentation. Instructions which access
+memory can use segment register based addressing mode. The following
+notation is used to address a byte within a segment:
+
+  Segment-register:Byte-address
+
+The segment base address is added to the Byte-address to compute the
+resulting virtual address which is accessed. This allows to access multiple
+instances of data with the identical Byte-address, i.e. the same code. The
+selection of a particular instance is purely based on the base-address in
+the segment register.
+
+In 32-bit mode the CPU provides 6 segments, which also support segment
+limits. The limits can be used to enforce address space protections.
+
+In 64-bit mode the CS/SS/DS/ES segments are ignored and the base address is
+always 0 to provide a full 64bit address space. The FS and GS segments are
+still functional in 64-bit mode.
+
+Common FS and GS usage
+------------------------------
+
+The FS segment is commonly used to address Thread Local Storage (TLS). FS
+is usually managed by runtime code or a threading library. Variables
+declared with the '__thread' storage class specifier are instantiated per
+thread and the compiler emits the FS: address prefix for accesses to these
+variables. Each thread has its own FS base address so common code can be
+used without complex address offset calculations to access the per thread
+instances. Applications should not use FS for other purposes when they use
+runtimes or threading libraries which manage the per thread FS.
+
+The GS segment has no common use and can be used freely by
+applications. GCC and Clang support GS based addressing via address space
+identifiers.
+
+Reading and writing the FS/GS base address
+------------------------------------------
+
+There exist two mechanisms to read and write the FS/GS base address:
+
+ - the arch_prctl() system call
+
+ - the FSGSBASE instruction family
+
+Accessing FS/GS base with arch_prctl()
+--------------------------------------
+
+ The arch_prctl(2) based mechanism is available on all 64-bit CPUs and all
+ kernel versions.
+
+ Reading the base:
+
+   arch_prctl(ARCH_GET_FS, &fsbase);
+   arch_prctl(ARCH_GET_GS, &gsbase);
+
+ Writing the base:
+
+   arch_prctl(ARCH_SET_FS, fsbase);
+   arch_prctl(ARCH_SET_GS, gsbase);
+
+ The ARCH_SET_GS prctl may be disabled depending on kernel configuration
+ and security settings.
+
+Accessing FS/GS base with the FSGSBASE instructions
+---------------------------------------------------
+
+ With the Ivy Bridge CPU generation Intel introduced a new set of
+ instructions to access the FS and GS base registers directly from user
+ space. These instructions are also supported on AMD Family 17H CPUs. The
+ following instructions are available:
+
+  =============== ===========================
+  RDFSBASE %reg   Read the FS base register
+  RDGSBASE %reg   Read the GS base register
+  WRFSBASE %reg   Write the FS base register
+  WRGSBASE %reg   Write the GS base register
+  =============== ===========================
+
+ The instructions avoid the overhead of the arch_prctl() syscall and allow
+ more flexible usage of the FS/GS addressing modes in user space
+ applications. This does not prevent conflicts between threading libraries
+ and runtimes which utilize FS and applications which want to use it for
+ their own purpose.
+
+FSGSBASE instructions enablement
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If
+ available /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs.
+
+ The availability of the instructions does not enable them
+ automatically. The kernel has to enable them explicitly in CR4. The
+ reason for this is that older kernels make assumptions about the values in
+ the GS register and enforce them when GS base is set via
+ arch_prctl(). Allowing user space to write arbitrary values to GS base
+ would violate these assumptions and cause malfunction.
+
+ On kernels which do not enable FSGSBASE the execution of the FSGSBASE
+ instructions will fault with a #UD exception.
+
+ The kernel provides reliable information about the enabled state in the
+ ELF AUX vector. If the HWCAP2_FSGSBASE bit is set in the AUX vector, the
+ kernel has FSGSBASE instructions enabled and applications can use them.
+ The following code example shows how this detection works::
+
+   #include <sys/auxv.h>
+   #include <elf.h>
+
+   /* Will be eventually in asm/hwcap.h */
+   #ifndef HWCAP2_FSGSBASE
+   #define HWCAP2_FSGSBASE        (1 << 1)
+   #endif
+
+   ....
+
+   unsigned val = getauxval(AT_HWCAP2);
+
+   if (val & HWCAP2_FSGSBASE)
+        printf("FSGSBASE enabled\n");
+
+FSGSBASE instructions compiler support
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE
+instructions. Clang 5 supports them as well.
+
+  =================== ===========================
+  _readfsbase_u64()   Read the FS base register
+  _readfsbase_u64()   Read the GS base register
+  _writefsbase_u64()  Write the FS base register
+  _writegsbase_u64()  Write the GS base register
+  =================== ===========================
+
+To utilize these instrinsics <immintrin.h> must be included in the source
+code and the compiler option -mfsgsbase has to be added.
+
+Compiler support for FS/GS based addressing
+-------------------------------------------
+
+GCC version 6 and newer provide support for FS/GS based addressing via
+Named Address Spaces. GCC implements the following address space
+identifiers for x86:
+
+  ========= ====================================
+  __seg_fs  Variable is addressed relative to FS
+  __seg_gs  Variable is addressed relative to GS
+  ========= ====================================
+
+The preprocessor symbols __SEG_FS and __SEG_GS are defined when these
+address spaces are supported. Code which implements fallback modes should
+check whether these symbols are defined. Usage example::
+
+  #ifdef __SEG_GS
+
+  long data0 = 0;
+  long data1 = 1;
+
+  long __seg_gs *ptr;
+
+  /* Check whether FSGSBASE is enabled by the kernel (HWCAP2_FSGSBASE) */
+  ....
+
+  /* Set GS base to point to data0 */
+  _writegsbase_u64(&data0);
+
+  /* Access offset 0 of GS */
+  ptr = 0;
+  printf("data0 = %ld\n", *ptr);
+
+  /* Set GS base to point to data1 */
+  _writegsbase_u64(&data1);
+  /* ptr still addresses offset 0! */
+  printf("data1 = %ld\n", *ptr);
+
+
+Clang does not provide the GCC address space identifiers, but it provides
+address spaces via an attribute based mechanism in Clang 2.6 and newer
+versions:
+
+ ==================================== =====================================
+  __attribute__((address_space(256))  Variable is addressed relative to GS
+  __attribute__((address_space(257))  Variable is addressed relative to FS
+ ==================================== =====================================
+
+FS/GS based addressing with inline assembly
+-------------------------------------------
+
+In case the compiler does not support address spaces, inline assembly can
+be used for FS/GS based addressing mode::
+
+	mov %fs:offset, %reg
+	mov %gs:offset, %reg
+
+	mov %reg, %fs:offset
+	mov %reg, %gs:offset
diff --git a/Documentation/x86/x86_64/index.rst b/Documentation/x86/x86_64/index.rst
index d6eaaa5..a56070f 100644
--- a/Documentation/x86/x86_64/index.rst
+++ b/Documentation/x86/x86_64/index.rst
@@ -14,3 +14,4 @@ x86_64 Support
    fake-numa-for-cpusets
    cpu-hotplug-spec
    machinecheck
+   fsgs
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 17/17] Documentation/x86/64: Add documentation for GS/FS addressing mode
  2019-10-04 18:16 ` [PATCH v9 17/17] Documentation/x86/64: Add documentation for GS/FS addressing mode Chang S. Bae
@ 2019-10-04 22:54   ` Randy Dunlap
  0 siblings, 0 replies; 46+ messages in thread
From: Randy Dunlap @ 2019-10-04 22:54 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel, tglx, bp, luto
  Cc: hpa, dave.hansen, tony.luck, ak, ravi.v.shankar, Jonathan Corbet

On 10/4/19 11:16 AM, Chang S. Bae wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> ---

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>

Thanks.

> 
> Changes from v8:
> * Fixed typos (Randy Dunlap)
> * Massaged a few sentences that were previously edited by Thomas.
> 
> Changes from v7:
> * Rewritten the documentation and changelog by Thomas
> * Included compiler version info additionally
> ---
>  Documentation/x86/x86_64/fsgs.rst  | 199 +++++++++++++++++++++++++++++++++++++
>  Documentation/x86/x86_64/index.rst |   1 +
>  2 files changed, 200 insertions(+)
>  create mode 100644 Documentation/x86/x86_64/fsgs.rst


-- 
~Randy

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
                   ` (16 preceding siblings ...)
  2019-10-04 18:16 ` [PATCH v9 17/17] Documentation/x86/64: Add documentation for GS/FS addressing mode Chang S. Bae
@ 2019-11-15 18:29 ` Thomas Gleixner
  2019-11-15 19:12   ` Andi Kleen
  17 siblings, 1 reply; 46+ messages in thread
From: Thomas Gleixner @ 2019-11-15 18:29 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, bp, luto, hpa, dave.hansen, tony.luck, ak, ravi.v.shankar

On Fri, 4 Oct 2019, Chang S. Bae wrote:
> 
> Updates from v8 [10]:
> * Internalized the interrupt check in the helper functions (Andy L.)
> * Simplified GS base helper functions (Tony L.)
> * Changed the patch order to put the paranoid path changes before the
>   context switch changes (Tony L.)
> * Fixed typos (Randy D.) and massaged a few sentences in the documentation
> * Massaged the FSGSBASE enablement message

That still lacks what Andy requested quite some time ago in the V8 thread:

     https://lore.kernel.org/lkml/034aaf3a-a93d-ec03-0bbd-068e1905b774@kernel.org/

  "I also think that, before this series can have my ack, it needs an 
   actual gdb maintainer to chime in, publicly, and state that they have 
   thought about and tested the ABI changes and that gdb still works on 
   patched kernels with and without FSGSBASE enabled.  I realize that there 
   were all kinds of discussions, but they were all quite theoretical, and 
   I think that the actual patches need to be considered by people who 
   understand the concerns.  Specific test cases would be nice, too."

What's the state of this?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2019-11-15 18:29 ` [PATCH v9 00/17] Enable FSGSBASE instructions Thomas Gleixner
@ 2019-11-15 19:12   ` Andi Kleen
  2019-11-29 14:56     ` Metzger, Markus T
  0 siblings, 1 reply; 46+ messages in thread
From: Andi Kleen @ 2019-11-15 19:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Chang S. Bae, linux-kernel, bp, luto, hpa, dave.hansen,
	tony.luck, ravi.v.shankar, markus.t.metzger

On Fri, Nov 15, 2019 at 07:29:17PM +0100, Thomas Gleixner wrote:
> On Fri, 4 Oct 2019, Chang S. Bae wrote:
> > 
> > Updates from v8 [10]:
> > * Internalized the interrupt check in the helper functions (Andy L.)
> > * Simplified GS base helper functions (Tony L.)
> > * Changed the patch order to put the paranoid path changes before the
> >   context switch changes (Tony L.)
> > * Fixed typos (Randy D.) and massaged a few sentences in the documentation
> > * Massaged the FSGSBASE enablement message
> 
> That still lacks what Andy requested quite some time ago in the V8 thread:
> 
>      https://lore.kernel.org/lkml/034aaf3a-a93d-ec03-0bbd-068e1905b774@kernel.org/
> 
>   "I also think that, before this series can have my ack, it needs an 
>    actual gdb maintainer to chime in, publicly, and state that they have 
>    thought about and tested the ABI changes and that gdb still works on 
>    patched kernels with and without FSGSBASE enabled.  I realize that there 
>    were all kinds of discussions, but they were all quite theoretical, and 
>    I think that the actual patches need to be considered by people who 
>    understand the concerns.  Specific test cases would be nice, too."
> 
> What's the state of this?

Adding Markus.

-Andi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v9 00/17] Enable FSGSBASE instructions
  2019-11-15 19:12   ` Andi Kleen
@ 2019-11-29 14:56     ` Metzger, Markus T
  2019-11-29 16:51       ` Andy Lutomirski
  0 siblings, 1 reply; 46+ messages in thread
From: Metzger, Markus T @ 2019-11-29 14:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bae, Chang Seok, linux-kernel, bp, luto, hpa, Hansen, Dave, Luck,
	Tony, Shankar, Ravi V, Pedro Alves, Simon Marchi, Andi Kleen

> On Fri, Nov 15, 2019 at 07:29:17PM +0100, Thomas Gleixner wrote:
> > On Fri, 4 Oct 2019, Chang S. Bae wrote:
> > >
> > > Updates from v8 [10]:
> > > * Internalized the interrupt check in the helper functions (Andy L.)
> > > * Simplified GS base helper functions (Tony L.)
> > > * Changed the patch order to put the paranoid path changes before the
> > >   context switch changes (Tony L.)
> > > * Fixed typos (Randy D.) and massaged a few sentences in the documentation
> > > * Massaged the FSGSBASE enablement message
> >
> > That still lacks what Andy requested quite some time ago in the V8 thread:
> >
> >      https://lore.kernel.org/lkml/034aaf3a-a93d-ec03-0bbd-
> 068e1905b774@kernel.org/
> >
> >   "I also think that, before this series can have my ack, it needs an
> >    actual gdb maintainer to chime in, publicly, and state that they have
> >    thought about and tested the ABI changes and that gdb still works on
> >    patched kernels with and without FSGSBASE enabled.  I realize that there
> >    were all kinds of discussions, but they were all quite theoretical, and
> >    I think that the actual patches need to be considered by people who
> >    understand the concerns.  Specific test cases would be nice, too."
> >
> > What's the state of this?

On branch users/mmetzger/fsgs in sourceware.org/git/binutils-gdb.git,
there's a GDB test covering the behavior discussed theoretically back then.

It covers modifying the selector as well as the base from GDB and using
the modified values for inferior calls as well as for resuming the inferior.

Current kernels allow changing the selector and provide the resulting
base back to the ptracer.  They also allow changing the base as long as
the selector is zero.  That's the behavior we wanted to preserve IIRC.

The patch series on branch fsgs_tip_5.4-rc1_100319 at
github.com/changbae/Linux-kernel.git breaks tests that modify the
selector and expect that to change the base.

That kernel allows changing the base via ptrace but ignores changes
to the selector.

Regards,
Markus.
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2019-11-29 14:56     ` Metzger, Markus T
@ 2019-11-29 16:51       ` Andy Lutomirski
  2019-12-02  8:23         ` Metzger, Markus T
  0 siblings, 1 reply; 46+ messages in thread
From: Andy Lutomirski @ 2019-11-29 16:51 UTC (permalink / raw)
  To: Metzger, Markus T
  Cc: Thomas Gleixner, Bae, Chang Seok, linux-kernel, bp, luto, hpa,
	Hansen, Dave, Luck, Tony, Shankar, Ravi V, Pedro Alves,
	Simon Marchi, Andi Kleen

On Fri, Nov 29, 2019 at 6:56 AM Metzger, Markus T
<markus.t.metzger@intel.com> wrote:
>
> > On Fri, Nov 15, 2019 at 07:29:17PM +0100, Thomas Gleixner wrote:
> > > On Fri, 4 Oct 2019, Chang S. Bae wrote:
> > > >
> > > > Updates from v8 [10]:
> > > > * Internalized the interrupt check in the helper functions (Andy L.)
> > > > * Simplified GS base helper functions (Tony L.)
> > > > * Changed the patch order to put the paranoid path changes before the
> > > >   context switch changes (Tony L.)
> > > > * Fixed typos (Randy D.) and massaged a few sentences in the documentation
> > > > * Massaged the FSGSBASE enablement message
> > >
> > > That still lacks what Andy requested quite some time ago in the V8 thread:
> > >
> > >      https://lore.kernel.org/lkml/034aaf3a-a93d-ec03-0bbd-
> > 068e1905b774@kernel.org/
> > >
> > >   "I also think that, before this series can have my ack, it needs an
> > >    actual gdb maintainer to chime in, publicly, and state that they have
> > >    thought about and tested the ABI changes and that gdb still works on
> > >    patched kernels with and without FSGSBASE enabled.  I realize that there
> > >    were all kinds of discussions, but they were all quite theoretical, and
> > >    I think that the actual patches need to be considered by people who
> > >    understand the concerns.  Specific test cases would be nice, too."
> > >
> > > What's the state of this?
>
> On branch users/mmetzger/fsgs in sourceware.org/git/binutils-gdb.git,
> there's a GDB test covering the behavior discussed theoretically back then.
>
> It covers modifying the selector as well as the base from GDB and using
> the modified values for inferior calls as well as for resuming the inferior.
>
> Current kernels allow changing the selector and provide the resulting
> base back to the ptracer.  They also allow changing the base as long as
> the selector is zero.  That's the behavior we wanted to preserve IIRC.

The general kernel rule is that we don't break working applications.
Other than that, we're allowed to change the ABI if existing working
applications don't break.  I can't tell whether you wrote a test that
detects a behavior change or whether you wrote a test that tests
behavior that gdb or other programs actually rely on.

Certainly, with a 32-bit *gdb*, writing a nonzero value to FS or GS
using ptrace should change the base accordingly.  I think the current
patches get this wrong.

With a 64-bit gdb and a 32-bit inferior, in an ideal world, everything
would work just like full 64-bit, since that's how the hardware works.
But we don't necessary live in an ideal world.

With a 64-bit gdb and a 64-bit inferior, the inferior can set FS to
some nonzero value and then set FSBASE to an arbitrary 64-bit number,
and FS will retain its value.  ptrace needs to give gdb some way to
read, save, and restore this state.

I think the ideal behavior is that 64-bit ptrace callers should
control FS and FSBASE independently.  The question is: will that break
things?  If it will, then we'll need to make sure that there is an API
by which a debugger can independently control FS and FSBASE, and we'll
also need to make sure that whatever existing API debuggers use to
change FS and expect FSBASE to magically change as well continue to
have that effect.

>
> The patch series on branch fsgs_tip_5.4-rc1_100319 at
> github.com/changbae/Linux-kernel.git breaks tests that modify the
> selector and expect that to change the base.
>
> That kernel allows changing the base via ptrace but ignores changes
> to the selector.
>

I don't really understand your test, but I'm pretty sure I found a
couple bugs in the test:

  88 int
  89 switch_fs_read (unsigned int fs)
  90 {
  91   __asm__ volatile ("mov %0, %%fs" :: "rm"(fs) : "memory");
  92
  93   return read_fs ();
  94 }

This has fundamentally inconsistent behavior on Intel vs AMD CPUs.
Intel CPUs will clear FSBASE when you write 0 to FS.  Older AMD CPUs
do *not* clear FSBASE when you write 0 to FS.  Very very new AMD CPUs
behave more like Intel CPUs, I believe.

  40     struct user_desc ud;
  41     int errcode;
  42
  43     memset (&ud, 0, sizeof (ud));
  44     ud.entry_number = entry;
  45     ud.base_addr = (unsigned long) base;
  46     ud.limit = (unsigned int) size;
  47
  48     /* Some 64-bit systems declare ud.base_addr 'unsigned int' instead of
  49        'unsigned long'.
  50
  51        Combined with address space layout randomization, this might
  52        truncate our base address and result in a crash when we try to read
  53        segment-relative.
  54
  55        Checking the field size would exclude too many systems so we settle
  56        for checking whether we actually truncated the address.  */
  57
  58     if (ud.base_addr != (unsigned long) base)
  59       return 0u;

The base of a segment in a descriptor table is 32 bits, full stop.
This is a hardware limitation and has nothing to do with the kernel.
base_addr is correctly unsigned int in the kernel headers.  If you
actually find a system where base_addr is unsigned long and unsigned
long is 64 bits, then your test will malfunction.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v9 00/17] Enable FSGSBASE instructions
  2019-11-29 16:51       ` Andy Lutomirski
@ 2019-12-02  8:23         ` Metzger, Markus T
  2019-12-04 20:20           ` Andy Lutomirski
  0 siblings, 1 reply; 46+ messages in thread
From: Metzger, Markus T @ 2019-12-02  8:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Bae, Chang Seok, linux-kernel, bp, hpa, Hansen,
	Dave, Luck, Tony, Shankar, Ravi V, Pedro Alves, Simon Marchi,
	Andi Kleen

> On Fri, Nov 29, 2019 at 6:56 AM Metzger, Markus T
> <markus.t.metzger@intel.com> wrote:
> >
> > > On Fri, Nov 15, 2019 at 07:29:17PM +0100, Thomas Gleixner wrote:
> > > > On Fri, 4 Oct 2019, Chang S. Bae wrote:
> > > > >
> > > > > Updates from v8 [10]:
> > > > > * Internalized the interrupt check in the helper functions (Andy L.)
> > > > > * Simplified GS base helper functions (Tony L.)
> > > > > * Changed the patch order to put the paranoid path changes before the
> > > > >   context switch changes (Tony L.)
> > > > > * Fixed typos (Randy D.) and massaged a few sentences in the
> documentation
> > > > > * Massaged the FSGSBASE enablement message
> > > >
> > > > That still lacks what Andy requested quite some time ago in the V8 thread:
> > > >
> > > >      https://lore.kernel.org/lkml/034aaf3a-a93d-ec03-0bbd-
> > > 068e1905b774@kernel.org/
> > > >
> > > >   "I also think that, before this series can have my ack, it needs an
> > > >    actual gdb maintainer to chime in, publicly, and state that they have
> > > >    thought about and tested the ABI changes and that gdb still works on
> > > >    patched kernels with and without FSGSBASE enabled.  I realize that there
> > > >    were all kinds of discussions, but they were all quite theoretical, and
> > > >    I think that the actual patches need to be considered by people who
> > > >    understand the concerns.  Specific test cases would be nice, too."
> > > >
> > > > What's the state of this?
> >
> > On branch users/mmetzger/fsgs in sourceware.org/git/binutils-gdb.git,
> > there's a GDB test covering the behavior discussed theoretically back then.
> >
> > It covers modifying the selector as well as the base from GDB and using
> > the modified values for inferior calls as well as for resuming the inferior.
> >
> > Current kernels allow changing the selector and provide the resulting
> > base back to the ptracer.  They also allow changing the base as long as
> > the selector is zero.  That's the behavior we wanted to preserve IIRC.
> 
> The general kernel rule is that we don't break working applications.
> Other than that, we're allowed to change the ABI if existing working
> applications don't break.  I can't tell whether you wrote a test that
> detects a behavior change or whether you wrote a test that tests
> behavior that gdb or other programs actually rely on.

Well, that's a tough question.  The test covers GDB's behavior on today's
systems.  GDB itself does not actually rely on that behavior.  That is, GDB
itself wouldn't break.  You couldn't do all that you could do with it before,
though.

It would be GDB's users that are affected.  How do you tell if anyone is
actually relying on it?


> Certainly, with a 32-bit *gdb*, writing a nonzero value to FS or GS
> using ptrace should change the base accordingly.  I think the current
> patches get this wrong.
> 
> With a 64-bit gdb and a 32-bit inferior, in an ideal world, everything
> would work just like full 64-bit, since that's how the hardware works.

Not sure what you mean.  The h/w runs in compatibility mode and the
inferior cannot set the base directly, can it?


> But we don't necessary live in an ideal world.
> 
> With a 64-bit gdb and a 64-bit inferior, the inferior can set FS to
> some nonzero value and then set FSBASE to an arbitrary 64-bit number,
> and FS will retain its value.  ptrace needs to give gdb some way to
> read, save, and restore this state.

With Chang's patch series, that actually works.  You can set FS and then
set FSBASE without setting FS to zero previously.  The tests do not cover
that since on current system that leads to the inferior crashing in read_fs().


> I think the ideal behavior is that 64-bit ptrace callers should
> control FS and FSBASE independently.  The question is: will that break
> things?  If it will, then we'll need to make sure that there is an API
> by which a debugger can independently control FS and FSBASE, and we'll
> also need to make sure that whatever existing API debuggers use to
> change FS and expect FSBASE to magically change as well continue to
> have that effect.

We had discussed this some time ago and proposed the following behavior: "
https://lore.kernel.org/lkml/1521481767-22113-15-git-send-email-chang.seok.bae@intel.com/

	In a summary, ptracer's update on FS/GS selector and base
	yields such results on tracee's base:
	- When FS/GS selector only changed (to nonzero), fetch base
	from GDT/LDT (legacy behavior)
	- When FS/GS base (regardless of selector) changed, tracee
	will have the base
"

The ptracer would need to read registers back after changing the selector
to get the updated base.

The only time when both change at the same time, then, is when registers
are restored after returning from an inferior call.  And then, it's the base
we want to take priority since we previously ensured that the base is always
up-to-date.


> > The patch series on branch fsgs_tip_5.4-rc1_100319 at
> > github.com/changbae/Linux-kernel.git breaks tests that modify the
> > selector and expect that to change the base.
> >
> > That kernel allows changing the base via ptrace but ignores changes
> > to the selector.
> >
> 
> I don't really understand your test, but I'm pretty sure I found a
> couple bugs in the test:

Thanks for your review.


>   88 int
>   89 switch_fs_read (unsigned int fs)
>   90 {
>   91   __asm__ volatile ("mov %0, %%fs" :: "rm"(fs) : "memory");
>   92
>   93   return read_fs ();
>   94 }
> 
> This has fundamentally inconsistent behavior on Intel vs AMD CPUs.
> Intel CPUs will clear FSBASE when you write 0 to FS.  Older AMD CPUs
> do *not* clear FSBASE when you write 0 to FS.  Very very new AMD CPUs
> behave more like Intel CPUs, I believe.

Thanks for pointing this out but I don't think that this is actually an issue for
this test.  This function is only ever used with fs==0xa7 to switch to the LDT
entry that the test program has setup before.

The test sets FS/GS to zero via ptrace from GDB.


>   40     struct user_desc ud;
>   41     int errcode;
>   42
>   43     memset (&ud, 0, sizeof (ud));
>   44     ud.entry_number = entry;
>   45     ud.base_addr = (unsigned long) base;
>   46     ud.limit = (unsigned int) size;
>   47
>   48     /* Some 64-bit systems declare ud.base_addr 'unsigned int' instead of
>   49        'unsigned long'.
>   50
>   51        Combined with address space layout randomization, this might
>   52        truncate our base address and result in a crash when we try to read
>   53        segment-relative.
>   54
>   55        Checking the field size would exclude too many systems so we settle
>   56        for checking whether we actually truncated the address.  */
>   57
>   58     if (ud.base_addr != (unsigned long) base)
>   59       return 0u;
> 
> The base of a segment in a descriptor table is 32 bits, full stop.
> This is a hardware limitation and has nothing to do with the kernel.
> base_addr is correctly unsigned int in the kernel headers.  If you
> actually find a system where base_addr is unsigned long and unsigned
> long is 64 bits, then your test will malfunction.

The modify_ldt(2) man page says: "
       The user_desc structure is defined in <asm/ldt.h> as:

           struct user_desc {
               unsigned int  entry_number;
               unsigned long base_addr;
               unsigned int  limit;
               unsigned int  seg_32bit:1;
               unsigned int  contents:2;
               unsigned int  read_exec_only:1;
               unsigned int  limit_in_pages:1;
               unsigned int  seg_not_present:1;
               unsigned int  useable:1;
           };
"

The declaration in asm/ldt.h actually defines base_addr as unsigned int.

So my comment about 'some 64-bit systems' is wrong and should actually
say 'all systems'.  Will fix.

That by itself is not an issue as long as the main executable is not loaded at
a high address.  I only ran into problems with that on some ubuntu system
in our test pool.

Markus.


Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2019-12-02  8:23         ` Metzger, Markus T
@ 2019-12-04 20:20           ` Andy Lutomirski
  2019-12-10  8:27             ` Metzger, Markus T
  2020-02-24 18:02             ` Bae, Chang Seok
  0 siblings, 2 replies; 46+ messages in thread
From: Andy Lutomirski @ 2019-12-04 20:20 UTC (permalink / raw)
  To: Metzger, Markus T
  Cc: Andy Lutomirski, Thomas Gleixner, Bae, Chang Seok, linux-kernel,
	bp, hpa, Hansen, Dave, Luck, Tony, Shankar, Ravi V, Pedro Alves,
	Simon Marchi, Andi Kleen

On Mon, Dec 2, 2019 at 12:23 AM Metzger, Markus T
<markus.t.metzger@intel.com> wrote:
>
> > On Fri, Nov 29, 2019 at 6:56 AM Metzger, Markus T
> > <markus.t.metzger@intel.com> wrote:
> > >
> > > > On Fri, Nov 15, 2019 at 07:29:17PM +0100, Thomas Gleixner wrote:
> > > > > On Fri, 4 Oct 2019, Chang S. Bae wrote:
> > > > > >
> > > > > > Updates from v8 [10]:
> > > > > > * Internalized the interrupt check in the helper functions (Andy L.)
> > > > > > * Simplified GS base helper functions (Tony L.)
> > > > > > * Changed the patch order to put the paranoid path changes before the
> > > > > >   context switch changes (Tony L.)
> > > > > > * Fixed typos (Randy D.) and massaged a few sentences in the
> > documentation
> > > > > > * Massaged the FSGSBASE enablement message
> > > > >
> > > > > That still lacks what Andy requested quite some time ago in the V8 thread:
> > > > >
> > > > >      https://lore.kernel.org/lkml/034aaf3a-a93d-ec03-0bbd-
> > > > 068e1905b774@kernel.org/
> > > > >
> > > > >   "I also think that, before this series can have my ack, it needs an
> > > > >    actual gdb maintainer to chime in, publicly, and state that they have
> > > > >    thought about and tested the ABI changes and that gdb still works on
> > > > >    patched kernels with and without FSGSBASE enabled.  I realize that there
> > > > >    were all kinds of discussions, but they were all quite theoretical, and
> > > > >    I think that the actual patches need to be considered by people who
> > > > >    understand the concerns.  Specific test cases would be nice, too."
> > > > >
> > > > > What's the state of this?
> > >
> > > On branch users/mmetzger/fsgs in sourceware.org/git/binutils-gdb.git,
> > > there's a GDB test covering the behavior discussed theoretically back then.
> > >
> > > It covers modifying the selector as well as the base from GDB and using
> > > the modified values for inferior calls as well as for resuming the inferior.
> > >
> > > Current kernels allow changing the selector and provide the resulting
> > > base back to the ptracer.  They also allow changing the base as long as
> > > the selector is zero.  That's the behavior we wanted to preserve IIRC.
> >
> > The general kernel rule is that we don't break working applications.
> > Other than that, we're allowed to change the ABI if existing working
> > applications don't break.  I can't tell whether you wrote a test that
> > detects a behavior change or whether you wrote a test that tests
> > behavior that gdb or other programs actually rely on.
>
> Well, that's a tough question.  The test covers GDB's behavior on today's
> systems.  GDB itself does not actually rely on that behavior.  That is, GDB
> itself wouldn't break.  You couldn't do all that you could do with it before,
> though.

GDB does rely on at least some behavior.  If I tell gdb to call a
function on my behalf, doesn't it save the old state, call the
function, and then restore the state?  Surely it expects the restore
operation to actually restore the state.

>
> It would be GDB's users that are affected.  How do you tell if anyone is
> actually relying on it?

No clue.  But at least if this type of use is mostly interactive, then
users should be that badly affected.

It also helps that very, very few 64-bit applications use nonzero
segments at all.  They used to because of a kernel optimization to
automatically load a segment if an FS or GSBASE less than 4GB was
requested, but that's been gone for a while.  Calling
set_thread_area() at all in a 64-bit program requires considerable
gymnastics, and distributions can and do disable modify_ldt() outright
without significant ill effects.

So we're mostly talking about compatibility with 32-bit programs and
exotic users like Wine and DOSEMU.

>
>
> > Certainly, with a 32-bit *gdb*, writing a nonzero value to FS or GS
> > using ptrace should change the base accordingly.  I think the current
> > patches get this wrong.
> >
> > With a 64-bit gdb and a 32-bit inferior, in an ideal world, everything
> > would work just like full 64-bit, since that's how the hardware works.
>
> Not sure what you mean.  The h/w runs in compatibility mode and the
> inferior cannot set the base directly, can it?

I think there's a general impedance mismatch between gdb and the
kernel/hw here.  On Linux on a 64-bit machine, there's isn't really a
strong concept of a "32-bit process" versus a "64-bit process".  All
tasks have 64-bit values in RAX, all tasks have R8-R15, all tasks have
a GDT and an LDT, etc.  "32-bit tasks" are merely tasks that happen to
be running with a compatibility selector loaded into CS at the time.
Tasks can and do switch freely between compatibility and long mode
using LJMP or LRET.  As far as I can tell, however, gdb doesn't really
understand this and thinks that 32-bit tasks are their own special
thing.

This causes me real problems: gdb explodes horribly if I connect gdb
to QEMU's gdbserver (qemu -s) and try to debug during boot when the
inferior switches between 32-bit and long mode.

As far as FSGSBASE goes, a "32-bit task" absolutely can set
independent values in FS and FSBASE, although it's awkward to do so:
the task would have to do a far transfer to long mode, then WRFSBASE,
then far transfer back to compat mode.  But this entire sequence of
events could occur without entering the kernel at all, and the ptrace
API should be able to represent the result.  I think that, ideally, a
64-bit debugger would understand the essential 64-bitness of even
compat tasks and work sensibly.  I don't really expect gdb to be able
to do this any time soon, though.

>
>
> > But we don't necessary live in an ideal world.
> >
> > With a 64-bit gdb and a 64-bit inferior, the inferior can set FS to
> > some nonzero value and then set FSBASE to an arbitrary 64-bit number,
> > and FS will retain its value.  ptrace needs to give gdb some way to
> > read, save, and restore this state.
>
> With Chang's patch series, that actually works.  You can set FS and then
> set FSBASE without setting FS to zero previously.  The tests do not cover
> that since on current system that leads to the inferior crashing in read_fs().
>
>
> > I think the ideal behavior is that 64-bit ptrace callers should
> > control FS and FSBASE independently.  The question is: will that break
> > things?  If it will, then we'll need to make sure that there is an API
> > by which a debugger can independently control FS and FSBASE, and we'll
> > also need to make sure that whatever existing API debuggers use to
> > change FS and expect FSBASE to magically change as well continue to
> > have that effect.
>
> We had discussed this some time ago and proposed the following behavior: "
> https://lore.kernel.org/lkml/1521481767-22113-15-git-send-email-chang.seok.bae@intel.com/
>
>         In a summary, ptracer's update on FS/GS selector and base
>         yields such results on tracee's base:
>         - When FS/GS selector only changed (to nonzero), fetch base
>         from GDT/LDT (legacy behavior)
>         - When FS/GS base (regardless of selector) changed, tracee
>         will have the base
> "

Indeed.  But I never understood how this behavior could be implemented
with the current ABI.  As I understand it, gdb only ever sets the
inferior register state by using a single ptrace() call to load the
entire state, which means that the kernel does not know whether just
FS is being written or whether FS and FSBASE are being written.

What actual ptrace() call does gdb use when a 64-bit gdb debugs a
64-bit inferior?  How about a 32-bit inferior?

>
> The ptracer would need to read registers back after changing the selector
> to get the updated base.

What would the actual API be?

I think it could make sense to add a whole new ptrace() command to
tell the tracee to, in effect, MOV a specified value to a segment
register.  This call would have the actual correct semantics in which
it would return an error code if the specified value is invalid and
would return 0 on success.  And then a second ptrace() call could be
issued to read out FSBASE or GSBASE if needed.  Would this be useful?
What gdb commands would invoke it?

>
> The only time when both change at the same time, then, is when registers
> are restored after returning from an inferior call.  And then, it's the base
> we want to take priority since we previously ensured that the base is always
> up-to-date.

Right.  But how does the kernel tell the difference?

> >
> > The base of a segment in a descriptor table is 32 bits, full stop.
> > This is a hardware limitation and has nothing to do with the kernel.
> > base_addr is correctly unsigned int in the kernel headers.  If you
> > actually find a system where base_addr is unsigned long and unsigned
> > long is 64 bits, then your test will malfunction.
>
> The modify_ldt(2) man page says: "
>        The user_desc structure is defined in <asm/ldt.h> as:
>
>            struct user_desc {
>                unsigned int  entry_number;
>                unsigned long base_addr;
>                unsigned int  limit;
>                unsigned int  seg_32bit:1;
>                unsigned int  contents:2;
>                unsigned int  read_exec_only:1;
>                unsigned int  limit_in_pages:1;
>                unsigned int  seg_not_present:1;
>                unsigned int  useable:1;
>            };
> "
>
> The declaration in asm/ldt.h actually defines base_addr as unsigned int.
>
> So my comment about 'some 64-bit systems' is wrong and should actually
> say 'all systems'.  Will fix.

>
> That by itself is not an issue as long as the main executable is not loaded at
> a high address.  I only ran into problems with that on some ubuntu system
> in our test pool.

In my test cases, I use mmap() with MAP_32BIT to avoid this issue.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v9 00/17] Enable FSGSBASE instructions
  2019-12-04 20:20           ` Andy Lutomirski
@ 2019-12-10  8:27             ` Metzger, Markus T
  2020-02-24 18:02             ` Bae, Chang Seok
  1 sibling, 0 replies; 46+ messages in thread
From: Metzger, Markus T @ 2019-12-10  8:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Bae, Chang Seok, linux-kernel, bp, hpa, Hansen,
	Dave, Luck, Tony, Shankar, Ravi V, Pedro Alves, Simon Marchi,
	Andi Kleen

> > > The general kernel rule is that we don't break working applications.
> > > Other than that, we're allowed to change the ABI if existing working
> > > applications don't break.  I can't tell whether you wrote a test that
> > > detects a behavior change or whether you wrote a test that tests
> > > behavior that gdb or other programs actually rely on.
> >
> > Well, that's a tough question.  The test covers GDB's behavior on today's
> > systems.  GDB itself does not actually rely on that behavior.  That is, GDB
> > itself wouldn't break.  You couldn't do all that you could do with it before,
> > though.
> 
> GDB does rely on at least some behavior.  If I tell gdb to call a
> function on my behalf, doesn't it save the old state, call the
> function, and then restore the state?  Surely it expects the restore
> operation to actually restore the state.

It does.  If we managed to break that, inferior calls in GDB would be
broken.  Users who don't use inferior calls wouldn't know or care,
though.  That's the point I was trying to make previously.


> It also helps that very, very few 64-bit applications use nonzero
> segments at all.  They used to because of a kernel optimization to
> automatically load a segment if an FS or GSBASE less than 4GB was
> requested, but that's been gone for a while.  Calling
> set_thread_area() at all in a 64-bit program requires considerable
> gymnastics, and distributions can and do disable modify_ldt() outright
> without significant ill effects.
> 
> So we're mostly talking about compatibility with 32-bit programs and
> exotic users like Wine and DOSEMU.

I agree that this should mostly affect 32-bit programs.


> > > Certainly, with a 32-bit *gdb*, writing a nonzero value to FS or GS
> > > using ptrace should change the base accordingly.  I think the current
> > > patches get this wrong.
> > >
> > > With a 64-bit gdb and a 32-bit inferior, in an ideal world, everything
> > > would work just like full 64-bit, since that's how the hardware works.
> >
> > Not sure what you mean.  The h/w runs in compatibility mode and the
> > inferior cannot set the base directly, can it?
> 
> I think there's a general impedance mismatch between gdb and the
> kernel/hw here.  On Linux on a 64-bit machine, there's isn't really a
> strong concept of a "32-bit process" versus a "64-bit process".  All
> tasks have 64-bit values in RAX, all tasks have R8-R15, all tasks have
> a GDT and an LDT, etc.  "32-bit tasks" are merely tasks that happen to
> be running with a compatibility selector loaded into CS at the time.
> Tasks can and do switch freely between compatibility and long mode
> using LJMP or LRET.  As far as I can tell, however, gdb doesn't really
> understand this and thinks that 32-bit tasks are their own special
> thing.
> 
> This causes me real problems: gdb explodes horribly if I connect gdb
> to QEMU's gdbserver (qemu -s) and try to debug during boot when the
> inferior switches between 32-bit and long mode.
> 
> As far as FSGSBASE goes, a "32-bit task" absolutely can set
> independent values in FS and FSBASE, although it's awkward to do so:
> the task would have to do a far transfer to long mode, then WRFSBASE,
> then far transfer back to compat mode.  But this entire sequence of
> events could occur without entering the kernel at all, and the ptrace
> API should be able to represent the result.  I think that, ideally, a
> 64-bit debugger would understand the essential 64-bitness of even
> compat tasks and work sensibly.  I don't really expect gdb to be able
> to do this any time soon, though.

I guess the primary use-case would be an application that was originally
written for 32-bit and is being maintained since then.  GDB is probably
64-bit in that case.


> > We had discussed this some time ago and proposed the following behavior: "
> > https://lore.kernel.org/lkml/1521481767-22113-15-git-send-email-
> chang.seok.bae@intel.com/
> >
> >         In a summary, ptracer's update on FS/GS selector and base
> >         yields such results on tracee's base:
> >         - When FS/GS selector only changed (to nonzero), fetch base
> >         from GDT/LDT (legacy behavior)
> >         - When FS/GS base (regardless of selector) changed, tracee
> >         will have the base
> > "
> 
> Indeed.  But I never understood how this behavior could be implemented
> with the current ABI.  As I understand it, gdb only ever sets the
> inferior register state by using a single ptrace() call to load the
> entire state, which means that the kernel does not know whether just
> FS is being written or whether FS and FSBASE are being written.

GDB writes the register state as soon as the user changes one of them.


> What actual ptrace() call does gdb use when a 64-bit gdb debugs a
> 64-bit inferior?  How about a 32-bit inferior?

GDB uses GETREGS both for 64-bit and 32-bit inferiors.  If GETREGS is
not available, it errors out on 64-bit and falls back to PEEKUSER on 32-bit.


> > The ptracer would need to read registers back after changing the selector
> > to get the updated base.
> 
> What would the actual API be?

GETREGS and PEEKUSER.


> I think it could make sense to add a whole new ptrace() command to
> tell the tracee to, in effect, MOV a specified value to a segment
> register.  This call would have the actual correct semantics in which
> it would return an error code if the specified value is invalid and
> would return 0 on success.  And then a second ptrace() call could be
> issued to read out FSBASE or GSBASE if needed.  Would this be useful?
> What gdb commands would invoke it?

Could SETREGS handle it based on the above proposal?


> > The only time when both change at the same time, then, is when registers
> > are restored after returning from an inferior call.  And then, it's the base
> > we want to take priority since we previously ensured that the base is always
> > up-to-date.
> 
> Right.  But how does the kernel tell the difference?

The other times only one changes.  Could the kernel compare the old and new
values for selector and base and detect if one or both change at the same time?

Regards,
Markus.
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2019-12-04 20:20           ` Andy Lutomirski
  2019-12-10  8:27             ` Metzger, Markus T
@ 2020-02-24 18:02             ` Bae, Chang Seok
  2020-04-13 20:03               ` Sasha Levin
  1 sibling, 1 reply; 46+ messages in thread
From: Bae, Chang Seok @ 2020-02-24 18:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Metzger, Markus T, Andi Kleen, hpa, Thomas Gleixner, bp, Hansen,
	Dave, Luck, Tony, Pedro Alves, Simon Marchi, Shankar, Ravi V,
	linux-kernel


> On Dec 4, 2019, at 12:20, Andy Lutomirski <luto@kernel.org> wrote:
> 
> I think it could make sense to add a whole new ptrace() command to
> tell the tracee to, in effect, MOV a specified value to a segment
> register.  This call would have the actual correct semantics in which
> it would return an error code if the specified value is invalid and
> would return 0 on success.  And then a second ptrace() call could be
> issued to read out FSBASE or GSBASE if needed.  Would this be useful?
> What gdb commands would invoke it?

We consider new commands to access GDT/LDT that hpa posted before [1] may be
helpful. If the kernel provides the interfaces to ptracer, we expect GDB for
both 32-/64-bits can make such changes for inferior calls:
(1) When FS/GS selector only updated,
	GDB used to write the selector value via SETREGS. Now it can read the
	base value from the new APIs and write the base also. This change does
	not harm today's kernel, and it retains the legacy behavior on
	FSGSBASE-enabled kernels in the future.
(2) When FS/GS base only updated,
(3) When both FS/GS selector and base updated,
	GDB has no change from what it used to do. The new FSGSBASE-enabled
	kernel improves the behavior by keeping the base regardless of a
	selector.

The proposed change in GDB would do an additional GETREGS for every SETREGS
to obtain the old value. Other ptrace-users may need a similar patch if
sensitive to the outcome from writing FS/GS selector, but last time when we
surveyed for other tools [2, 3], we didn't find the issue. We also didn't
find actual users who rely on legacy behavior in practice.

We'd like to hear a clear opinion of whether the GDB changes along with the
new ptrace APIs are necessary and sufficient as preparing the FSGSBASE
support in the kernel.

[1] https://lore.kernel.org/patchwork/cover/954471/
[2] https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
[3] https://lists.openvz.org/pipermail/criu/2018-March/040654.html

Thanks,
Chang

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-02-24 18:02             ` Bae, Chang Seok
@ 2020-04-13 20:03               ` Sasha Levin
  2020-04-14  0:32                 ` Andi Kleen
  2020-04-14 15:47                 ` Bae, Chang Seok
  0 siblings, 2 replies; 46+ messages in thread
From: Sasha Levin @ 2020-04-13 20:03 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Metzger, Markus T, Andi Kleen, hpa,
	Thomas Gleixner, bp, Hansen, Dave, Luck, Tony, Pedro Alves,
	Simon Marchi, Shankar, Ravi V, linux-kernel

On Mon, Feb 24, 2020 at 06:02:17PM +0000, Bae, Chang Seok wrote:
>
>> On Dec 4, 2019, at 12:20, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> I think it could make sense to add a whole new ptrace() command to
>> tell the tracee to, in effect, MOV a specified value to a segment
>> register.  This call would have the actual correct semantics in which
>> it would return an error code if the specified value is invalid and
>> would return 0 on success.  And then a second ptrace() call could be
>> issued to read out FSBASE or GSBASE if needed.  Would this be useful?
>> What gdb commands would invoke it?
>
>We consider new commands to access GDT/LDT that hpa posted before [1] may be
>helpful. If the kernel provides the interfaces to ptracer, we expect GDB for
>both 32-/64-bits can make such changes for inferior calls:
>(1) When FS/GS selector only updated,
>	GDB used to write the selector value via SETREGS. Now it can read the
>	base value from the new APIs and write the base also. This change does
>	not harm today's kernel, and it retains the legacy behavior on
>	FSGSBASE-enabled kernels in the future.
>(2) When FS/GS base only updated,
>(3) When both FS/GS selector and base updated,
>	GDB has no change from what it used to do. The new FSGSBASE-enabled
>	kernel improves the behavior by keeping the base regardless of a
>	selector.
>
>The proposed change in GDB would do an additional GETREGS for every SETREGS
>to obtain the old value. Other ptrace-users may need a similar patch if
>sensitive to the outcome from writing FS/GS selector, but last time when we
>surveyed for other tools [2, 3], we didn't find the issue. We also didn't
>find actual users who rely on legacy behavior in practice.
>
>We'd like to hear a clear opinion of whether the GDB changes along with the
>new ptrace APIs are necessary and sufficient as preparing the FSGSBASE
>support in the kernel.

Hi folks,

Let me try to revive this work as I think that it's blocked due to
misunderstanding of the current situation.

What I gather from the Intel folks is that the GDB folks are okay with
the change as is and don't expect to be doing any changes on their end.

The intel folks are interested in resolving this, but haven't heard back
on their proposed plan (above).

Thomas/Andy want to make sure that we are doing the right thing and are
not breaking anything:

   1. The ptrace modifications are correct (we do the right thing around
   updating FS/GS).
   2. The ptrace changes don't break existing userspace. I think that
   the Intel folks confirmed it above.


Is my attempt at understanding the current situation correct?

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-13 20:03               ` Sasha Levin
@ 2020-04-14  0:32                 ` Andi Kleen
  2020-04-17 13:30                   ` Sasha Levin
  2020-04-14 15:47                 ` Bae, Chang Seok
  1 sibling, 1 reply; 46+ messages in thread
From: Andi Kleen @ 2020-04-14  0:32 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Bae, Chang Seok, Andy Lutomirski, Metzger, Markus T, hpa,
	Thomas Gleixner, bp, Hansen, Dave, Luck, Tony, Pedro Alves,
	Simon Marchi, Shankar, Ravi V, linux-kernel

> Is my attempt at understanding the current situation correct?

Yes.

Nothing breaks, and it's a nice improvement for context switch
performance, in NMI/PMU performance, and also gives user space two free
registers to play around with.

-Andi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-13 20:03               ` Sasha Levin
  2020-04-14  0:32                 ` Andi Kleen
@ 2020-04-14 15:47                 ` Bae, Chang Seok
  1 sibling, 0 replies; 46+ messages in thread
From: Bae, Chang Seok @ 2020-04-14 15:47 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andy Lutomirski, Metzger, Markus T, Andi Kleen, hpa,
	Thomas Gleixner, bp, Hansen, Dave, Luck, Tony, Pedro Alves,
	Simon Marchi, Shankar, Ravi V, linux-kernel


> On Apr 13, 2020, at 13:03, Sasha Levin <sashal@kernel.org> wrote:
> 
> What I gather from the Intel folks is that the GDB folks are okay with
> the change as is and don't expect to be doing any changes on their end.

As far as I know, we never get any comments from GDB maintainers for this in
public.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-14  0:32                 ` Andi Kleen
@ 2020-04-17 13:30                   ` Sasha Levin
  2020-04-17 15:52                     ` Andy Lutomirski
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2020-04-17 13:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Bae, Chang Seok, Andy Lutomirski, Metzger, Markus T, hpa,
	Thomas Gleixner, bp, Hansen, Dave, Luck, Tony, Pedro Alves,
	Simon Marchi, Shankar, Ravi V, linux-kernel

On Mon, Apr 13, 2020 at 05:32:05PM -0700, Andi Kleen wrote:
>> Is my attempt at understanding the current situation correct?
>
>Yes.
>
>Nothing breaks, and it's a nice improvement for context switch
>performance, in NMI/PMU performance, and also gives user space two free
>registers to play around with.

Thomas, Andy,

Could you list your outstanding objections to this patchset? I know it
might be rehashing stuff you've already mentioned in this thread but I
think that there's a disconnect between folks and it'll help with
restarting everything.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-17 13:30                   ` Sasha Levin
@ 2020-04-17 15:52                     ` Andy Lutomirski
  2020-04-20 14:13                       ` Andi Kleen
  0 siblings, 1 reply; 46+ messages in thread
From: Andy Lutomirski @ 2020-04-17 15:52 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andi Kleen, Bae, Chang Seok, Andy Lutomirski, Metzger, Markus T,
	hpa, Thomas Gleixner, bp, Hansen, Dave, Luck, Tony, Pedro Alves,
	Simon Marchi, Shankar, Ravi V, linux-kernel

tip-On Fri, Apr 17, 2020 at 6:30 AM Sasha Levin <sashal@kernel.org> wrote:
>
> On Mon, Apr 13, 2020 at 05:32:05PM -0700, Andi Kleen wrote:
> >> Is my attempt at understanding the current situation correct?
> >
> >Yes.
> >
> >Nothing breaks, and it's a nice improvement for context switch
> >performance, in NMI/PMU performance, and also gives user space two free
> >registers to play around with.
>
> Thomas, Andy,
>
> Could you list your outstanding objections to this patchset? I know it
> might be rehashing stuff you've already mentioned in this thread but I
> think that there's a disconnect between folks and it'll help with
> restarting everything.
>

My outstanding objections are:

1. The previous submission was broken.  This should obviously be fixed.

2. The issues documented here need to be addressed:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=56f2ab41b652251f336a0f471b1033afeaedd161

3. Adding FSGSBASE fundamentally changes the user ABI, and the changes
will be observable.  This means that something could break, especially
in the case where ptrace is in use and the tracee uses the new
instructions.  The old behavior cannot sanely be preserved with
FSGSBASE enabled.  This isn't a showstopper, but whoever resubmits
this thing needs to document what changes and what use cases might
break.  I'm hopeful that the only thing that will break is actual
human beings using a tool like gdb to manually poke at the registers.
This is fine -- the behavior of the registers is different and human
beings debugging need to be aware of this.  But the existing automated
stuff that gdb, lldb, etc do needs to continue working.  This
especially includes using gdb to force the tracee to call a function,
e.g. 'p function()'.

4. The exising ptrace API does not provide a sane way to ask what the
base value associated with a selector would be.  This means that,
under the natural way to make FSGSBASE and ptrace work together (e.g.
as implemented in the previous submission), the tracer has no good way
to emulate 'MOV [whatever], %gs' in the tracee.

Now maybe no one cares about #4.  I certainly have the impression that
the *gdb developers* don't care.  But gdb isn't exactly a good example
of a piece of software that tries to work correctly when dealing with
unusual software.  Maybe other things like rr will care more.  It
might be nice to avoid a situation where a piece of careful software
(like rr?) can support kernel 5.y, but breaks in 5.y+1 because of
FSGSBASE, and only starts working again in 5.y+3 because we added the
ptrace API that's needed.

So maybe the first version should have a PTRACE_LOAD_SEGMENT that
sticks a selector in FS or GS and changes the base accordingly, even
if no current userspace has spoken up and said they need it.  And a
selftest.

--Andy

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-17 15:52                     ` Andy Lutomirski
@ 2020-04-20 14:13                       ` Andi Kleen
  2020-04-20 17:14                         ` Thomas Gleixner
  0 siblings, 1 reply; 46+ messages in thread
From: Andi Kleen @ 2020-04-20 14:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sasha Levin, Bae, Chang Seok, Metzger, Markus T, hpa,
	Thomas Gleixner, bp, Hansen, Dave, Luck, Tony, Pedro Alves,
	Simon Marchi, Shankar, Ravi V, linux-kernel

> Now maybe no one cares about #4.  

Yes noone cares. Selectors are largely obsolete.

> the *gdb developers* don't care.  But gdb isn't exactly a good example
> of a piece of software that tries to work correctly when dealing with
> unusual software.  Maybe other things like rr will care more.  It

rr is used to replay modern software, and modern software
doesn't care about selectors, thus rr doesn't care either.

Please stop the FUD.

Thanks,
-Andi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-20 14:13                       ` Andi Kleen
@ 2020-04-20 17:14                         ` Thomas Gleixner
  2020-04-21 16:06                           ` Sasha Levin
  0 siblings, 1 reply; 46+ messages in thread
From: Thomas Gleixner @ 2020-04-20 17:14 UTC (permalink / raw)
  To: Andi Kleen, Andy Lutomirski
  Cc: Sasha Levin, Bae, Chang Seok, Metzger, Markus T, hpa, bp, Hansen,
	Dave, Luck, Tony, Pedro Alves, Simon Marchi, Shankar, Ravi V,
	linux-kernel

Andi Kleen <ak@linux.intel.com> writes:
>> the *gdb developers* don't care.  But gdb isn't exactly a good example
>> of a piece of software that tries to work correctly when dealing with
>> unusual software.  Maybe other things like rr will care more.  It
>
> rr is used to replay modern software, and modern software
> doesn't care about selectors, thus rr doesn't care either.
>
> Please stop the FUD.

There is absolutely no FUD. Being careful about not breaking existing
user space is a legitimate request.

It's up to those who change the ABI to prove that it does not matter and
not up to the maintainers to figure it out.

This sits in limbo for months now just because Intel doesn't get it's
homework done.

Stop making false accusations and provide factual information instead.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-20 17:14                         ` Thomas Gleixner
@ 2020-04-21 16:06                           ` Sasha Levin
  2020-04-21 16:49                             ` Andy Lutomirski
                                               ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Sasha Levin @ 2020-04-21 16:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Andy Lutomirski, Bae, Chang Seok, Metzger, Markus T,
	hpa, bp, Hansen, Dave, Luck, Tony, Pedro Alves, Simon Marchi,
	Shankar, Ravi V, linux-kernel

On Mon, Apr 20, 2020 at 07:14:46PM +0200, Thomas Gleixner wrote:
>Andi Kleen <ak@linux.intel.com> writes:
>>> the *gdb developers* don't care.  But gdb isn't exactly a good example
>>> of a piece of software that tries to work correctly when dealing with
>>> unusual software.  Maybe other things like rr will care more.  It
>>
>> rr is used to replay modern software, and modern software
>> doesn't care about selectors, thus rr doesn't care either.
>>
>> Please stop the FUD.
>
>There is absolutely no FUD. Being careful about not breaking existing
>user space is a legitimate request.
>
>It's up to those who change the ABI to prove that it does not matter and
>not up to the maintainers to figure it out.

I think that this is a difficult ask; "prove that god doesn't exist".

Andi's point is that there is no known user it breaks, and the Intel
folks did some digging into potential users who might be affected by
this, including 'rr' brought up by Andy, and concluded that there won't
be breakage as a result of this patchset:

	https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html

Sure, if you poke at it you could see a behavior change, but is there
an actual user that will be affected by it? I suspect not.

>This sits in limbo for months now just because Intel doesn't get it's
>homework done.
>
>Stop making false accusations and provide factual information instead.

If there's no known user that will be broken here, can we consider
merging this to be disabled by default and let distros try it out? This
will let us find these users while providing an easy way to work around
the problem.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-21 16:06                           ` Sasha Levin
@ 2020-04-21 16:49                             ` Andy Lutomirski
  2020-04-21 20:02                               ` Andi Kleen
  2020-04-21 17:15                             ` Bae, Chang Seok
  2020-04-21 19:56                             ` Andi Kleen
  2 siblings, 1 reply; 46+ messages in thread
From: Andy Lutomirski @ 2020-04-21 16:49 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Thomas Gleixner, Andi Kleen, Andy Lutomirski, Bae, Chang Seok,
	Metzger, Markus T, hpa, bp, Hansen, Dave, Luck, Tony,
	Pedro Alves, Simon Marchi, Shankar, Ravi V, linux-kernel



> On Apr 21, 2020, at 9:06 AM, Sasha Levin <sashal@kernel.org> wrote:
> 
> On Mon, Apr 20, 2020 at 07:14:46PM +0200, Thomas Gleixner wrote:
>> Andi Kleen <ak@linux.intel.com> writes:
>>>> the *gdb developers* don't care.  But gdb isn't exactly a good example
>>>> of a piece of software that tries to work correctly when dealing with
>>>> unusual software.  Maybe other things like rr will care more.  It
>>> 
>>> rr is used to replay modern software, and modern software
>>> doesn't care about selectors, thus rr doesn't care either.
>>> 
>>> Please stop the FUD.
>> 
>> There is absolutely no FUD. Being careful about not breaking existing
>> user space is a legitimate request.
>> 
>> It's up to those who change the ABI to prove that it does not matter and
>> not up to the maintainers to figure it out.
> 
> I think that this is a difficult ask; "prove that god doesn't exist".
> 
> Andi's point is that there is no known user it breaks, and the Intel
> folks did some digging into potential users who might be affected by
> this, including 'rr' brought up by Andy, and concluded that there won't
> be breakage as a result of this patchset:
> 
>    https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
> 
> Sure, if you poke at it you could see a behavior change, but is there
> an actual user that will be affected by it? I suspect not.
> 
>> This sits in limbo for months now just because Intel doesn't get it's
>> homework done.
>> 
>> Stop making false accusations and provide factual information instead.
> 
> If there's no known user that will be broken here, can we consider
> merging this to be disabled by default and let distros try it out? This
> will let us find these users while providing an easy way to work around
> the problem.

No.  Once it’s merged, people will write user code using the ABI, and that means we need to get the ABI right.

The very early versions had severely problematic ABIs. The new ones are probably okay except for, maybe, ptrace.  If we had merged the old ones, then we might have gotten stuck with the old, problematic ABI.

> 
> -- 
> Thanks,
> Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-21 16:06                           ` Sasha Levin
  2020-04-21 16:49                             ` Andy Lutomirski
@ 2020-04-21 17:15                             ` Bae, Chang Seok
  2020-04-21 19:56                             ` Andi Kleen
  2 siblings, 0 replies; 46+ messages in thread
From: Bae, Chang Seok @ 2020-04-21 17:15 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Thomas Gleixner, Andi Kleen, Andy Lutomirski, Metzger, Markus T,
	hpa, bp, Hansen, Dave, Luck, Tony, Pedro Alves, Simon Marchi,
	Shankar, Ravi V, linux-kernel


> On Apr 21, 2020, at 09:06, Sasha Levin <sashal@kernel.org> wrote:
> 
> Andi's point is that there is no known user it breaks, and the Intel
> folks did some digging into potential users who might be affected by
> this, including 'rr' brought up by Andy, and concluded that there won't
> be breakage as a result of this patchset:

FWIW, we surveyed tools like rr and CRIU before. Their comments are [*,**]:

   "Anyway I think rr will be fine with the new behavior. Our modifications
    to fs/gs/fs_base/gs_base are always either a) setting values that the
    kernel set during recording to make them happen during replay or b)
    emulating PTRACE_SET_REGS that a tracee ptracer tried to set on another
    tracee. Either way I think the effects are going to be the same as what
    would happen if the program were run without rr.”

   "Internally in criu we fetch the regset via ptrace and keep them in
    images as they were at moment of dump (if ldt is being used we don't
    support such tasks) so I think the changes should not break criu."

What we took away was that those tools reactively follow the underlying
kernel's behavior; so, they should be fine with the FSGSBASE-brought new
behaviors.

[*] https://mail.mozilla.org/pipermail/rr-dev/2018-March/000615.html
[**] https://lists.openvz.org/pipermail/criu/2018-March/040654.html

Thanks,
Chang

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-21 16:06                           ` Sasha Levin
  2020-04-21 16:49                             ` Andy Lutomirski
  2020-04-21 17:15                             ` Bae, Chang Seok
@ 2020-04-21 19:56                             ` Andi Kleen
  2020-04-21 20:21                               ` Andy Lutomirski
  2 siblings, 1 reply; 46+ messages in thread
From: Andi Kleen @ 2020-04-21 19:56 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Thomas Gleixner, Andy Lutomirski, Bae, Chang Seok, Metzger,
	Markus T, hpa, bp, Hansen, Dave, Luck, Tony, Pedro Alves,
	Simon Marchi, Shankar, Ravi V, linux-kernel

> Andi's point is that there is no known user it breaks, and the Intel
> folks did some digging into potential users who might be affected by
> this, including 'rr' brought up by Andy, and concluded that there won't
> be breakage as a result of this patchset:
> 
> 	https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
> 
> Sure, if you poke at it you could see a behavior change, but is there
> an actual user that will be affected by it? I suspect not.

Actually we don't know of any behavior changes caused by the kernel
with selectors.

The application can change itself of course, but only if it uses the 
new instructions, which no current application does.

[This was different in the original patch kit long ago which could
change behavior on context switch for programs with out of sync selectors,
but this has been long fixed]

A debugger can also change behavior, but we're not aware of any case
that it would break.

For rr or criu we're also not aware of any case that could break.

I honestly don't know what else could be done in due diligence.

Also just to reiterate merging this would immediately shave off
hundreds of cycles in most context switches.

-Andi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-21 16:49                             ` Andy Lutomirski
@ 2020-04-21 20:02                               ` Andi Kleen
  0 siblings, 0 replies; 46+ messages in thread
From: Andi Kleen @ 2020-04-21 20:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sasha Levin, Thomas Gleixner, Andy Lutomirski, Bae, Chang Seok,
	Metzger, Markus T, hpa, bp, Hansen, Dave, Luck, Tony,
	Pedro Alves, Simon Marchi, Shankar, Ravi V, linux-kernel

> The very early versions had severely problematic ABIs. The new ones are probably okay except for, maybe, ptrace.  If we had merged the old ones, then we might have gotten stuck with the old, problematic ABI.

This is beyond vague. Is there a problem with the ABI or not?

If yes please point it out in an actionable concrete way that it can
be addressed.

If not there shouldn't be any reason to further block it.

Thanks

-Andi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-21 19:56                             ` Andi Kleen
@ 2020-04-21 20:21                               ` Andy Lutomirski
  2020-04-21 20:51                                 ` Sasha Levin
  0 siblings, 1 reply; 46+ messages in thread
From: Andy Lutomirski @ 2020-04-21 20:21 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Sasha Levin, Thomas Gleixner, Andy Lutomirski, Bae, Chang Seok,
	Metzger, Markus T, hpa, bp, Hansen, Dave, Luck, Tony,
	Pedro Alves, Simon Marchi, Shankar, Ravi V, linux-kernel



> On Apr 21, 2020, at 12:56 PM, Andi Kleen <ak@linux.intel.com> wrote:
> 
> 
>> 
>> Andi's point is that there is no known user it breaks, and the Intel
>> folks did some digging into potential users who might be affected by
>> this, including 'rr' brought up by Andy, and concluded that there won't
>> be breakage as a result of this patchset:
>> 
>>    https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
>> 
>> Sure, if you poke at it you could see a behavior change, but is there
>> an actual user that will be affected by it? I suspect not.
> 
> Actually we don't know of any behavior changes caused by the kernel
> with selectors.
> 
> The application can change itself of course, but only if it uses the 
> new instructions, which no current application does.

If you use ptrace to change the gs selector, the behavior is different on a patched kernel.

Again, I’m not saying that the change is problematic. But I will say that the fact that anyone involved in this series keeps ignoring this fact makes me quite uncomfortable with the patch set.

> 
> [This was different in the original patch kit long ago which could
> change behavior on context switch for programs with out of sync selectors,
> but this has been long fixed]

That’s the issue I was referring to.

> 
> A debugger can also change behavior, but we're not aware of any case
> that it would break.

How hard did you look?

> 
> For rr or criu we're also not aware of any case that could break.
> 
> I honestly don't know what else could be done in due diligence.
> 
> Also just to reiterate merging this would immediately shave off
> hundreds of cycles in most context switches.
> 
> -Andi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-21 20:21                               ` Andy Lutomirski
@ 2020-04-21 20:51                                 ` Sasha Levin
  2020-04-22 23:00                                   ` Andy Lutomirski
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2020-04-21 20:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andi Kleen, Thomas Gleixner, Andy Lutomirski, Bae, Chang Seok,
	Metzger, Markus T, hpa, bp, Hansen, Dave, Luck, Tony,
	Pedro Alves, Simon Marchi, Shankar, Ravi V, linux-kernel

On Tue, Apr 21, 2020 at 01:21:39PM -0700, Andy Lutomirski wrote:
>
>
>> On Apr 21, 2020, at 12:56 PM, Andi Kleen <ak@linux.intel.com> wrote:
>>
>> 
>>>
>>> Andi's point is that there is no known user it breaks, and the Intel
>>> folks did some digging into potential users who might be affected by
>>> this, including 'rr' brought up by Andy, and concluded that there won't
>>> be breakage as a result of this patchset:
>>>
>>>    https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
>>>
>>> Sure, if you poke at it you could see a behavior change, but is there
>>> an actual user that will be affected by it? I suspect not.
>>
>> Actually we don't know of any behavior changes caused by the kernel
>> with selectors.
>>
>> The application can change itself of course, but only if it uses the
>> new instructions, which no current application does.
>
>If you use ptrace to change the gs selector, the behavior is different on a patched kernel.
>
>Again, I’m not saying that the change is problematic. But I will say that the fact that anyone involved in this series keeps ignoring this fact makes me quite uncomfortable with the patch set.

That's what I referred to with "poke at it". While the behavior may be
different, I fail to find anyone who cares.

>>
>> [This was different in the original patch kit long ago which could
>> change behavior on context switch for programs with out of sync selectors,
>> but this has been long fixed]
>
>That’s the issue I was referring to.
>
>>
>> A debugger can also change behavior, but we're not aware of any case
>> that it would break.
>
>How hard did you look?

Come on, how does one respond to this?

Is there a real use case affected by this? If so, point it out and I'll
be happy to go test it. This was already done (per your previous
request) for gdb and rr.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-21 20:51                                 ` Sasha Levin
@ 2020-04-22 23:00                                   ` Andy Lutomirski
  2020-04-23  4:08                                     ` Sasha Levin
  0 siblings, 1 reply; 46+ messages in thread
From: Andy Lutomirski @ 2020-04-22 23:00 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andi Kleen, Thomas Gleixner, Andy Lutomirski, Bae, Chang Seok,
	Metzger, Markus T, hpa, bp, Hansen, Dave, Luck, Tony,
	Pedro Alves, Simon Marchi, Shankar, Ravi V, linux-kernel

On Tue, Apr 21, 2020 at 1:51 PM Sasha Levin <sashal@kernel.org> wrote:
>
> On Tue, Apr 21, 2020 at 01:21:39PM -0700, Andy Lutomirski wrote:
> >
> >
> >> On Apr 21, 2020, at 12:56 PM, Andi Kleen <ak@linux.intel.com> wrote:
> >>
> >> 
> >>>
> >>> Andi's point is that there is no known user it breaks, and the Intel
> >>> folks did some digging into potential users who might be affected by
> >>> this, including 'rr' brought up by Andy, and concluded that there won't
> >>> be breakage as a result of this patchset:
> >>>
> >>>    https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
> >>>
> >>> Sure, if you poke at it you could see a behavior change, but is there
> >>> an actual user that will be affected by it? I suspect not.
> >>
> >> Actually we don't know of any behavior changes caused by the kernel
> >> with selectors.
> >>
> >> The application can change itself of course, but only if it uses the
> >> new instructions, which no current application does.
> >
> >If you use ptrace to change the gs selector, the behavior is different on a patched kernel.
> >
> >Again, I’m not saying that the change is problematic. But I will say that the fact that anyone involved in this series keeps ignoring this fact makes me quite uncomfortable with the patch set.
>
> That's what I referred to with "poke at it". While the behavior may be
> different, I fail to find anyone who cares.
>
> >>
> >> [This was different in the original patch kit long ago which could
> >> change behavior on context switch for programs with out of sync selectors,
> >> but this has been long fixed]
> >
> >That’s the issue I was referring to.
> >
> >>
> >> A debugger can also change behavior, but we're not aware of any case
> >> that it would break.
> >
> >How hard did you look?
>
> Come on, how does one respond to this?
>
> Is there a real use case affected by this? If so, point it out and I'll
> be happy to go test it. This was already done (per your previous
> request) for gdb and rr.
>

gdb and rr are certainly a good start.  If patches show up, I'll take a look.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-22 23:00                                   ` Andy Lutomirski
@ 2020-04-23  4:08                                     ` Sasha Levin
  2020-04-25 22:39                                       ` Thomas Gleixner
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2020-04-23  4:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andi Kleen, Thomas Gleixner, Andy Lutomirski, Bae, Chang Seok,
	Metzger, Markus T, hpa, bp, Hansen, Dave, Luck, Tony,
	Pedro Alves, Simon Marchi, Shankar, Ravi V, linux-kernel

On Wed, Apr 22, 2020 at 04:00:16PM -0700, Andy Lutomirski wrote:
>On Tue, Apr 21, 2020 at 1:51 PM Sasha Levin <sashal@kernel.org> wrote:
>>
>> On Tue, Apr 21, 2020 at 01:21:39PM -0700, Andy Lutomirski wrote:
>> >
>> >
>> >> On Apr 21, 2020, at 12:56 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> >>
>> >> 
>> >>>
>> >>> Andi's point is that there is no known user it breaks, and the Intel
>> >>> folks did some digging into potential users who might be affected by
>> >>> this, including 'rr' brought up by Andy, and concluded that there won't
>> >>> be breakage as a result of this patchset:
>> >>>
>> >>>    https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
>> >>>
>> >>> Sure, if you poke at it you could see a behavior change, but is there
>> >>> an actual user that will be affected by it? I suspect not.
>> >>
>> >> Actually we don't know of any behavior changes caused by the kernel
>> >> with selectors.
>> >>
>> >> The application can change itself of course, but only if it uses the
>> >> new instructions, which no current application does.
>> >
>> >If you use ptrace to change the gs selector, the behavior is different on a patched kernel.
>> >
>> >Again, I’m not saying that the change is problematic. But I will say that the fact that anyone involved in this series keeps ignoring this fact makes me quite uncomfortable with the patch set.
>>
>> That's what I referred to with "poke at it". While the behavior may be
>> different, I fail to find anyone who cares.
>>
>> >>
>> >> [This was different in the original patch kit long ago which could
>> >> change behavior on context switch for programs with out of sync selectors,
>> >> but this has been long fixed]
>> >
>> >That’s the issue I was referring to.
>> >
>> >>
>> >> A debugger can also change behavior, but we're not aware of any case
>> >> that it would break.
>> >
>> >How hard did you look?
>>
>> Come on, how does one respond to this?
>>
>> Is there a real use case affected by this? If so, point it out and I'll
>> be happy to go test it. This was already done (per your previous
>> request) for gdb and rr.
>>
>
>gdb and rr are certainly a good start.  If patches show up, I'll take a look.

I'm sorry, but what patches are we talking about?

I just went to gdb to check again that I'm not crazy, and the scenario
you were worried about seems to work just fine:

134			asm volatile ("mov %%gs:(%%rcx), %%rax" : : "c" (offset) : "rax");
(gdb) p printme()
Hi!
$1 = void
(gdb)

Again, please point me to a specific user we break.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-23  4:08                                     ` Sasha Levin
@ 2020-04-25 22:39                                       ` Thomas Gleixner
  2020-04-26  2:52                                         ` Sasha Levin
  0 siblings, 1 reply; 46+ messages in thread
From: Thomas Gleixner @ 2020-04-25 22:39 UTC (permalink / raw)
  To: Sasha Levin, Andy Lutomirski
  Cc: Andi Kleen, Andy Lutomirski, Bae, Chang Seok, Metzger, Markus T,
	hpa, bp, Hansen, Dave, Luck, Tony, Pedro Alves, Simon Marchi,
	Shankar, Ravi V, linux-kernel

Sasha Levin <sashal@kernel.org> writes:
> On Wed, Apr 22, 2020 at 04:00:16PM -0700, Andy Lutomirski wrote:
>>
>>gdb and rr are certainly a good start.  If patches show up, I'll take a look.
>
> I'm sorry, but what patches are we talking about?

About patches which:

  - Are rebased to current upstream

  - Addressed the outstanding review comments

  - Have proper documentation in the changelog of the user space visible
    ABI changes why it does not break any existing usage and having the
    relevant people who maintain tools which utilize the affected
    interfaces Cc'ed on submission.

  - Made sure that the cleanups I did when merging them initially have
    been picked up. I'm not going to waste another couple of days on
    this mess just to revert it because it hadn't seen any serious
    testing in development.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-25 22:39                                       ` Thomas Gleixner
@ 2020-04-26  2:52                                         ` Sasha Levin
  2020-04-26 10:04                                           ` Thomas Gleixner
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2020-04-26  2:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Andi Kleen, Andy Lutomirski, Bae, Chang Seok,
	Metzger, Markus T, hpa, bp, Hansen, Dave, Luck, Tony,
	Pedro Alves, Simon Marchi, Shankar, Ravi V, linux-kernel

On Sun, Apr 26, 2020 at 12:39:27AM +0200, Thomas Gleixner wrote:
>Sasha Levin <sashal@kernel.org> writes:
>> On Wed, Apr 22, 2020 at 04:00:16PM -0700, Andy Lutomirski wrote:
>>>
>>>gdb and rr are certainly a good start.  If patches show up, I'll take a look.
>>
>> I'm sorry, but what patches are we talking about?
>
>About patches which:
>
>  - Are rebased to current upstream

v10 of this series was sent a few days ago and is rebased on top of
v5.7-rc1:
https://lore.kernel.org/lkml/20200423232207.5797-1-sashal@kernel.org/ .

>  - Addressed the outstanding review comments

I saw a review that Andy has just sent on patch #1 from the new series,
I'll address that.

>  - Have proper documentation in the changelog of the user space visible
>    ABI changes why it does not break any existing usage and having the
>    relevant people who maintain tools which utilize the affected
>    interfaces Cc'ed on submission.

The cover letter has references to mail correspondence with maintainers
of these tools that are affected by this change. Each of those exchanges
goes over what FSGSBASE does and answers any specific questions those
maintainers had.

If you want it out of the cover letter and into one of the patches I'd
be happy to do that. If you want me to go chase down another userspace
which we might be breaking just let me know which.

I didn't want to have them on the Cc line as they have already acked
this change from their end and I wanted to avoid additional noise. I'll
be happy to add them back to the next spin of this.

>  - Made sure that the cleanups I did when merging them initially have
>    been picked up. I'm not going to waste another couple of days on
>    this mess just to revert it because it hadn't seen any serious
>    testing in development.

Based on your revert
(https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/cpu&id=049331f277fef1c3f2527c2c9afa1d285e9a1247)
I believe that we have all the relevant patches in the series.

I'll also add here that several groups at Microsoft have been running
workloads that heavily exercise the functionality added by this patch.
I'd say that it has gotten a solid round of testing for the past few
months.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 00/17] Enable FSGSBASE instructions
  2020-04-26  2:52                                         ` Sasha Levin
@ 2020-04-26 10:04                                           ` Thomas Gleixner
  0 siblings, 0 replies; 46+ messages in thread
From: Thomas Gleixner @ 2020-04-26 10:04 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andy Lutomirski, Andi Kleen, Andy Lutomirski, Bae, Chang Seok,
	Metzger, Markus T, hpa, bp, Hansen, Dave, Luck, Tony,
	Pedro Alves, Simon Marchi, Shankar, Ravi V, linux-kernel

Sasha Levin <sashal@kernel.org> writes:
> On Sun, Apr 26, 2020 at 12:39:27AM +0200, Thomas Gleixner wrote:
>>  - Addressed the outstanding review comments
>
> I saw a review that Andy has just sent on patch #1 from the new series,
> I'll address that.

Please look at the last version from Intel as well whether there is anything
outstanding.

>>  - Have proper documentation in the changelog of the user space visible
>>    ABI changes why it does not break any existing usage and having the
>>    relevant people who maintain tools which utilize the affected
>>    interfaces Cc'ed on submission.
>
> The cover letter has references to mail correspondence with maintainers
> of these tools that are affected by this change. Each of those exchanges
> goes over what FSGSBASE does and answers any specific questions those
> maintainers had.
>
> If you want it out of the cover letter and into one of the patches I'd
> be happy to do that. If you want me to go chase down another userspace
> which we might be breaking just let me know which.

Yes, please add the information to the changelogs. That's where it
really belongs.

> I didn't want to have them on the Cc line as they have already acked
> this change from their end and I wanted to avoid additional noise. I'll
> be happy to add them back to the next spin of this.
>
>>  - Made sure that the cleanups I did when merging them initially have
>>    been picked up. I'm not going to waste another couple of days on
>>    this mess just to revert it because it hadn't seen any serious
>>    testing in development.
>
> Based on your revert
> (https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/cpu&id=049331f277fef1c3f2527c2c9afa1d285e9a1247)
> I believe that we have all the relevant patches in the series.

Ok.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2020-04-26 10:04 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-04 18:15 [PATCH v9 00/17] Enable FSGSBASE instructions Chang S. Bae
2019-10-04 18:15 ` [PATCH v9 01/17] x86/ptrace: Prevent ptrace from clearing the FS/GS selector Chang S. Bae
2019-10-04 18:15 ` [PATCH v9 02/17] selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write Chang S. Bae
2019-10-04 18:15 ` [PATCH v9 03/17] x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
2019-10-04 18:15 ` [PATCH v9 04/17] x86/entry/64: Clean up paranoid exit Chang S. Bae
2019-10-04 18:15 ` [PATCH v9 05/17] x86/entry/64: Switch CR3 before SWAPGS in paranoid entry Chang S. Bae
2019-10-04 18:15 ` [PATCH v9 06/17] x86/entry/64: Introduce the FIND_PERCPU_BASE macro Chang S. Bae
2019-10-04 18:15 ` [PATCH v9 07/17] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 08/17] x86/entry/64: Document GSBASE handling in the paranoid path Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 09/17] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 10/17] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 11/17] x86/fsgsbase/64: Use FSGSBASE in switch_to() if available Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 12/17] x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 13/17] x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 14/17] selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 15/17] x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken bit Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 16/17] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
2019-10-04 18:16 ` [PATCH v9 17/17] Documentation/x86/64: Add documentation for GS/FS addressing mode Chang S. Bae
2019-10-04 22:54   ` Randy Dunlap
2019-11-15 18:29 ` [PATCH v9 00/17] Enable FSGSBASE instructions Thomas Gleixner
2019-11-15 19:12   ` Andi Kleen
2019-11-29 14:56     ` Metzger, Markus T
2019-11-29 16:51       ` Andy Lutomirski
2019-12-02  8:23         ` Metzger, Markus T
2019-12-04 20:20           ` Andy Lutomirski
2019-12-10  8:27             ` Metzger, Markus T
2020-02-24 18:02             ` Bae, Chang Seok
2020-04-13 20:03               ` Sasha Levin
2020-04-14  0:32                 ` Andi Kleen
2020-04-17 13:30                   ` Sasha Levin
2020-04-17 15:52                     ` Andy Lutomirski
2020-04-20 14:13                       ` Andi Kleen
2020-04-20 17:14                         ` Thomas Gleixner
2020-04-21 16:06                           ` Sasha Levin
2020-04-21 16:49                             ` Andy Lutomirski
2020-04-21 20:02                               ` Andi Kleen
2020-04-21 17:15                             ` Bae, Chang Seok
2020-04-21 19:56                             ` Andi Kleen
2020-04-21 20:21                               ` Andy Lutomirski
2020-04-21 20:51                                 ` Sasha Levin
2020-04-22 23:00                                   ` Andy Lutomirski
2020-04-23  4:08                                     ` Sasha Levin
2020-04-25 22:39                                       ` Thomas Gleixner
2020-04-26  2:52                                         ` Sasha Levin
2020-04-26 10:04                                           ` Thomas Gleixner
2020-04-14 15:47                 ` Bae, Chang Seok

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).