linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE
@ 2018-09-18 23:08 Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 1/8] x86/arch_prctl/64: Make ptrace read FS/GS base accurately Chang S. Bae
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Chang S. Bae @ 2018-09-18 23:08 UTC
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

Changes from V5 [5]:
* Unify the namespace of the new FS/GS helpers
* Port patch #7 to the latest -tip (resolving a conflict with
e78e5a91456f: 'x86/vdso: Fix lsl operand order')
* Minor updates to comments and descriptions

Changes from V4 [4]:
* Reorder the patches so that the fix comes first, before the helper
functions are introduced
* Further clean up the vDSO CPU initialization code

Changes from V3 [3]:
* Unify CPU number initialization

Changes from V2 [2]:
* Bisect the CPU number initialization patch
* Drop patches for introducing i386 CPU_NUMBER and switching
write_rdtscp_aux() to use wrmsr_safe()

Changes from V1 [1]:
* Rename the x86-64 CPU_NUMBER segment from PER_CPU
* Add i386 CPU_NUMBER equivalent to x86-64 at GDT entry 23
* Add additional helper function to store CPU number
* Switch write_rdtscp_aux() to use wrmsr_safe()

[1] V1: https://lore.kernel.org/patchwork/cover/913139/
[2] V2: https://lore.kernel.org/patchwork/cover/913644/
[3] V3: https://lore.kernel.org/patchwork/cover/949775/
[4] V4: https://lore.kernel.org/patchwork/cover/951712/
[5] V5: https://lore.kernel.org/patchwork/cover/956526/

Andy Lutomirski (1):
  x86/arch_prctl/64: Make ptrace read FS/GS base accurately

Chang S. Bae (7):
  x86/fsgsbase/64: Introduce FS/GS base helper functions
  x86/fsgsbase/64: Make ptrace use correct FS/GS base helpers
  x86/fsgsbase/64: Use FS/GS base helpers in core dump
  x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to()
  x86/segments/64: Rename PER_CPU segment to CPU_NUMBER
  x86/vdso: Introduce helper functions for CPU and node number
  x86/vdso: Move out the CPU initialization

 arch/x86/entry/vdso/vgetcpu.c   |   9 +-
 arch/x86/entry/vdso/vma.c       |  38 +--------
 arch/x86/include/asm/elf.h      |   6 +-
 arch/x86/include/asm/fsgsbase.h |  47 +++++++++++
 arch/x86/include/asm/segment.h  |  46 +++++++++-
 arch/x86/include/asm/vgtod.h    |  26 ------
 arch/x86/kernel/cpu/common.c    |  24 ++++++
 arch/x86/kernel/process_64.c    | 183 +++++++++++++++++++++++++++++++---------
 arch/x86/kernel/ptrace.c        |  28 ++----
 9 files changed, 270 insertions(+), 137 deletions(-)
 create mode 100644 arch/x86/include/asm/fsgsbase.h

-- 
2.7.4



* [PATCH v6 1/8] x86/arch_prctl/64: Make ptrace read FS/GS base accurately
  2018-09-18 23:08 [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE Chang S. Bae
@ 2018-09-18 23:08 ` Chang S. Bae
  2018-10-08  9:54   ` [tip:x86/asm] x86/fsgsbase/64: Fix ptrace() to read the " tip-bot for Andy Lutomirski
                     ` (2 more replies)
  2018-09-18 23:08 ` [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions Chang S. Bae
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 27+ messages in thread
From: Chang S. Bae @ 2018-09-18 23:08 UTC
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

From: Andy Lutomirski <luto@kernel.org>

Ptrace can read the FS/GS base using the register access API
(PTRACE_PEEKUSER, etc.) or PTRACE_ARCH_PRCTL.  Make both of these
mechanisms return the actual FS/GS base.

This improves debuggability by giving the ptracer (GDB, etc.) the
correct information.
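
When the saved selector is nonzero, the "actual" base has to be looked
up through the selector. The following is a minimal user-space sketch
of that classification step, mirroring the selector checks in
task_seg_base() in the diff below. The names classify_selector(),
struct seg_lookup and enum seg_source are hypothetical and exist only
for illustration; the constant values are assumed to match x86-64
Linux.

```c
#include <stdint.h>

/* Assumed x86-64 values, mirroring arch/x86/include/asm/segment.h. */
#define SEGMENT_TI_MASK   0x4u	/* table indicator: 0 = GDT, 1 = LDT */
#define GDT_ENTRY_TLS_MIN 12u
#define GDT_ENTRY_TLS_MAX 14u

enum seg_source {
	SEG_BASE_ZERO,	/* selector cannot name a nonzero user base */
	SEG_BASE_TLS,	/* base is in thread.tls_array[slot] */
	SEG_BASE_LDT,	/* base is in the mm's LDT at index slot */
};

struct seg_lookup {
	enum seg_source source;
	unsigned int slot;
};

/* Hypothetical helper: decide where the base for a selector would live. */
static struct seg_lookup classify_selector(uint16_t selector)
{
	struct seg_lookup r = { SEG_BASE_ZERO, 0 };
	unsigned int idx = selector >> 3;	/* drop RPL (bits 0-1) and TI (bit 2) */

	if ((selector & SEGMENT_TI_MASK) == 0) {
		/* GDT: only the TLS slots can carry a nonzero user base. */
		if (idx >= GDT_ENTRY_TLS_MIN && idx <= GDT_ENTRY_TLS_MAX) {
			r.source = SEG_BASE_TLS;
			r.slot = idx - GDT_ENTRY_TLS_MIN;
		}
	} else {
		/* LDT: the kernel additionally bounds-checks idx. */
		r.source = SEG_BASE_LDT;
		r.slot = idx;
	}
	return r;
}
```

For example, selector 0x63 (GDT index 12, RPL 3) maps to TLS slot 0,
while the user data selector 0x2b maps to a zero base.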

Signed-off-by: Andy Lutomirski <luto@kernel.org>
[chang: Rebase and revise patch description]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/kernel/ptrace.c | 62 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index e2ee403..3acbf45 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -39,6 +39,7 @@
 #include <asm/hw_breakpoint.h>
 #include <asm/traps.h>
 #include <asm/syscall.h>
+#include <asm/mmu_context.h>
 
 #include "tls.h"
 
@@ -342,6 +343,49 @@ static int set_segment_reg(struct task_struct *task,
 	return 0;
 }
 
+static unsigned long task_seg_base(struct task_struct *task,
+				   unsigned short selector)
+{
+	unsigned short idx = selector >> 3;
+	unsigned long base;
+
+	if (likely((selector & SEGMENT_TI_MASK) == 0)) {
+		if (unlikely(idx >= GDT_ENTRIES))
+			return 0;
+
+		/*
+		 * There are no user segments in the GDT with nonzero bases
+		 * other than the TLS segments.
+		 */
+		if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
+			return 0;
+
+		idx -= GDT_ENTRY_TLS_MIN;
+		base = get_desc_base(&task->thread.tls_array[idx]);
+	} else {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+		struct ldt_struct *ldt;
+
+		/*
+		 * If performance here mattered, we could protect the LDT
+		 * with RCU.  This is a slow path, though, so we can just
+		 * take the mutex.
+		 */
+		mutex_lock(&task->mm->context.lock);
+		ldt = task->mm->context.ldt;
+		if (unlikely(idx >= ldt->nr_entries))
+			base = 0;
+		else
+			base = get_desc_base(ldt->entries + idx);
+		mutex_unlock(&task->mm->context.lock);
+#else
+		base = 0;
+#endif
+	}
+
+	return base;
+}
+
 #endif	/* CONFIG_X86_32 */
 
 static unsigned long get_flags(struct task_struct *task)
@@ -435,18 +479,16 @@ static unsigned long getreg(struct task_struct *task, unsigned long offset)
 
 #ifdef CONFIG_X86_64
 	case offsetof(struct user_regs_struct, fs_base): {
-		/*
-		 * XXX: This will not behave as expected if called on
-		 * current or if fsindex != 0.
-		 */
-		return task->thread.fsbase;
+		if (task->thread.fsindex == 0)
+			return task->thread.fsbase;
+		else
+			return task_seg_base(task, task->thread.fsindex);
 	}
 	case offsetof(struct user_regs_struct, gs_base): {
-		/*
-		 * XXX: This will not behave as expected if called on
-		 * current or if fsindex != 0.
-		 */
-		return task->thread.gsbase;
+		if (task->thread.gsindex == 0)
+			return task->thread.gsbase;
+		else
+			return task_seg_base(task, task->thread.gsindex);
 	}
 #endif
 	}
-- 
2.7.4



* [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions
  2018-09-18 23:08 [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 1/8] x86/arch_prctl/64: Make ptrace read FS/GS base accurately Chang S. Bae
@ 2018-09-18 23:08 ` Chang S. Bae
  2018-10-08  9:55   ` [tip:x86/asm] " tip-bot for Chang S. Bae
  2018-10-24 19:01   ` [regression in -rc1] Re: [PATCH v6 2/8] " Andy Lutomirski
  2018-09-18 23:08 ` [PATCH v6 3/8] x86/fsgsbase/64: Make ptrace use correct FS/GS base helpers Chang S. Bae
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 27+ messages in thread
From: Chang S. Bae @ 2018-09-18 23:08 UTC
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

The new helpers centralize FS/GS base access. Eventually,
when the FSGSBASE instructions are enabled, this access
will be faster.

The "inactive" GS base is the base that is saved at kernel
entry, i.e. the GS base of an inactive (user) task.

task_seg_base() is renamed to x86_fsgsbase_read_task() and
moved to kernel/process_64.c, next to the closely coupled
helper functions. Once the next patch makes ptrace use the
helpers, ptrace will no longer call it directly.
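
The read path that the helpers centralize can be modeled in user
space as follows. This is only a sketch of the decision order in
x86_fsbase_read_task(); struct fs_state, fsbase_read_model() and its
parameters are hypothetical stand-ins, and the MSR read and the
descriptor lookup are replaced by plain values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the thread_struct fields involved. */
struct fs_state {
	uint16_t fsindex;		/* saved FS selector */
	uint64_t fsbase;		/* saved base, used when fsindex == 0 */
	uint64_t descriptor_base;	/* base held by the selector's descriptor */
};

/*
 * Model of the decision order:
 *   1. current task: read the live value from the CPU
 *   2. selector 0:   use the base saved in the thread state
 *   3. otherwise:    use the base of the GDT/LDT descriptor
 */
static uint64_t fsbase_read_model(const struct fs_state *t, bool is_current,
				  uint64_t cpu_fsbase)
{
	if (is_current)
		return cpu_fsbase;	/* stands in for x86_fsbase_read_cpu() */
	if (t->fsindex == 0)
		return t->fsbase;
	return t->descriptor_base;	/* stands in for x86_fsgsbase_read_task() */
}
```

The GS variant follows the same order, except that the live value for
the current task is the inactive GS base.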

Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/include/asm/fsgsbase.h |  50 ++++++++++++++++
 arch/x86/kernel/process_64.c    | 124 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/ptrace.c        |  51 ++---------------
 3 files changed, 179 insertions(+), 46 deletions(-)
 create mode 100644 arch/x86/include/asm/fsgsbase.h

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
new file mode 100644
index 0000000..0404dab
--- /dev/null
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_FSGSBASE_H
+#define _ASM_FSGSBASE_H 1
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_X86_64
+
+#include <asm/msr-index.h>
+
+unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+				     unsigned short selector);
+
+/*
+ * Read/write a task's fsbase or gsbase. This returns the value that
+ * the FS/GS base would have (if the task were to be resumed). These
+ * work on current or on a different non-running task.
+ */
+unsigned long x86_fsbase_read_task(struct task_struct *task);
+unsigned long x86_gsbase_read_task(struct task_struct *task);
+int x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
+int x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);
+
+/* Helper functions for reading/writing FS/GS base */
+
+static inline unsigned long x86_fsbase_read_cpu(void)
+{
+	unsigned long fsbase;
+
+	rdmsrl(MSR_FS_BASE, fsbase);
+	return fsbase;
+}
+
+void x86_fsbase_write_cpu(unsigned long fsbase);
+
+static inline unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+	unsigned long gsbase;
+
+	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	return gsbase;
+}
+
+void  x86_gsbase_write_cpu_inactive(unsigned long gsbase);
+
+#endif /* CONFIG_X86_64 */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_FSGSBASE_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1e6abfa..52af8c1 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -54,6 +54,7 @@
 #include <asm/vdso.h>
 #include <asm/intel_rdt_sched.h>
 #include <asm/unistd.h>
+#include <asm/fsgsbase.h>
 #ifdef CONFIG_IA32_EMULATION
 /* Not included via unistd.h */
 #include <asm/unistd_32_ia32.h>
@@ -284,6 +285,129 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 	}
 }
 
+unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+				     unsigned short selector)
+{
+	unsigned short idx = selector >> 3;
+	unsigned long base;
+
+	if (likely((selector & SEGMENT_TI_MASK) == 0)) {
+		if (unlikely(idx >= GDT_ENTRIES))
+			return 0;
+
+		/*
+		 * There are no user segments in the GDT with nonzero bases
+		 * other than the TLS segments.
+		 */
+		if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
+			return 0;
+
+		idx -= GDT_ENTRY_TLS_MIN;
+		base = get_desc_base(&task->thread.tls_array[idx]);
+	} else {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+		struct ldt_struct *ldt;
+
+		/*
+		 * If performance here mattered, we could protect the LDT
+		 * with RCU.  This is a slow path, though, so we can just
+		 * take the mutex.
+		 */
+		mutex_lock(&task->mm->context.lock);
+		ldt = task->mm->context.ldt;
+		if (unlikely(idx >= ldt->nr_entries))
+			base = 0;
+		else
+			base = get_desc_base(ldt->entries + idx);
+		mutex_unlock(&task->mm->context.lock);
+#else
+		base = 0;
+#endif
+	}
+
+	return base;
+}
+
+void x86_fsbase_write_cpu(unsigned long fsbase)
+{
+	/*
+	 * Set the selector to 0 as a notion, that the segment base is
+	 * overwritten, which will be checked for skipping the segment load
+	 * during context switch.
+	 */
+	loadseg(FS, 0);
+	wrmsrl(MSR_FS_BASE, fsbase);
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+	/* Set the selector to 0 for the same reason as %fs above. */
+	loadseg(GS, 0);
+	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+}
+
+unsigned long x86_fsbase_read_task(struct task_struct *task)
+{
+	unsigned long fsbase;
+
+	if (task == current)
+		fsbase = x86_fsbase_read_cpu();
+	else if (task->thread.fsindex == 0)
+		fsbase = task->thread.fsbase;
+	else
+		fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
+
+	return fsbase;
+}
+
+unsigned long x86_gsbase_read_task(struct task_struct *task)
+{
+	unsigned long gsbase;
+
+	if (task == current)
+		gsbase = x86_gsbase_read_cpu_inactive();
+	else if (task->thread.gsindex == 0)
+		gsbase = task->thread.gsbase;
+	else
+		gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
+
+	return gsbase;
+}
+
+int x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase)
+{
+	/*
+	 * Not strictly needed for %fs, but do it for symmetry
+	 * with %gs
+	 */
+	if (unlikely(fsbase >= TASK_SIZE_MAX))
+		return -EPERM;
+
+	preempt_disable();
+	task->thread.fsbase = fsbase;
+	if (task == current)
+		x86_fsbase_write_cpu(fsbase);
+	task->thread.fsindex = 0;
+	preempt_enable();
+
+	return 0;
+}
+
+int x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase)
+{
+	if (unlikely(gsbase >= TASK_SIZE_MAX))
+		return -EPERM;
+
+	preempt_disable();
+	task->thread.gsbase = gsbase;
+	if (task == current)
+		x86_gsbase_write_cpu_inactive(gsbase);
+	task->thread.gsindex = 0;
+	preempt_enable();
+
+	return 0;
+}
+
 int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 		unsigned long arg, struct task_struct *p, unsigned long tls)
 {
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 3acbf45..fbde2a7 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -39,7 +39,7 @@
 #include <asm/hw_breakpoint.h>
 #include <asm/traps.h>
 #include <asm/syscall.h>
-#include <asm/mmu_context.h>
+#include <asm/fsgsbase.h>
 
 #include "tls.h"
 
@@ -343,49 +343,6 @@ static int set_segment_reg(struct task_struct *task,
 	return 0;
 }
 
-static unsigned long task_seg_base(struct task_struct *task,
-				   unsigned short selector)
-{
-	unsigned short idx = selector >> 3;
-	unsigned long base;
-
-	if (likely((selector & SEGMENT_TI_MASK) == 0)) {
-		if (unlikely(idx >= GDT_ENTRIES))
-			return 0;
-
-		/*
-		 * There are no user segments in the GDT with nonzero bases
-		 * other than the TLS segments.
-		 */
-		if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
-			return 0;
-
-		idx -= GDT_ENTRY_TLS_MIN;
-		base = get_desc_base(&task->thread.tls_array[idx]);
-	} else {
-#ifdef CONFIG_MODIFY_LDT_SYSCALL
-		struct ldt_struct *ldt;
-
-		/*
-		 * If performance here mattered, we could protect the LDT
-		 * with RCU.  This is a slow path, though, so we can just
-		 * take the mutex.
-		 */
-		mutex_lock(&task->mm->context.lock);
-		ldt = task->mm->context.ldt;
-		if (unlikely(idx >= ldt->nr_entries))
-			base = 0;
-		else
-			base = get_desc_base(ldt->entries + idx);
-		mutex_unlock(&task->mm->context.lock);
-#else
-		base = 0;
-#endif
-	}
-
-	return base;
-}
-
 #endif	/* CONFIG_X86_32 */
 
 static unsigned long get_flags(struct task_struct *task)
@@ -482,13 +439,15 @@ static unsigned long getreg(struct task_struct *task, unsigned long offset)
 		if (task->thread.fsindex == 0)
 			return task->thread.fsbase;
 		else
-			return task_seg_base(task, task->thread.fsindex);
+			return x86_fsgsbase_read_task(task,
+						      task->thread.fsindex);
 	}
 	case offsetof(struct user_regs_struct, gs_base): {
 		if (task->thread.gsindex == 0)
 			return task->thread.gsbase;
 		else
-			return task_seg_base(task, task->thread.gsindex);
+			return x86_fsgsbase_read_task(task,
+						      task->thread.gsindex);
 	}
 #endif
 	}
-- 
2.7.4



* [PATCH v6 3/8] x86/fsgsbase/64: Make ptrace use correct FS/GS base helpers
  2018-09-18 23:08 [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 1/8] x86/arch_prctl/64: Make ptrace read FS/GS base accurately Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions Chang S. Bae
@ 2018-09-18 23:08 ` Chang S. Bae
  2018-10-08  9:56   ` [tip:x86/asm] x86/fsgsbase/64: Make ptrace use the new " tip-bot for Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 4/8] x86/fsgsbase/64: Use FS/GS base helpers in core dump Chang S. Bae
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Chang S. Bae @ 2018-09-18 23:08 UTC
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

Use the FS/GS base helper functions in the ptrace APIs
(PTRACE_ARCH_PRCTL, PTRACE_SETREGS, PTRACE_GETREGS, etc.).
This puts the FS/GS update mechanism in one place.
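
The update rule that both arch_prctl and ptrace now share can be
sketched in user space as below. This models only the bookkeeping in
x86_gsbase_write_task(): the bounds check, the saved base, and the
cleared selector. struct gs_state, gsbase_write_model() and
TASK_SIZE_MAX_MODEL are hypothetical; the limit assumes 4-level paging
with 4 KiB pages, and preemption handling and the MSR write are left
out.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: 4-level paging, 4 KiB pages, so the limit is 2^47 - 4096. */
#define TASK_SIZE_MAX_MODEL ((UINT64_C(1) << 47) - 4096)

/* Hypothetical stand-in for the thread_struct fields involved. */
struct gs_state {
	uint16_t gsindex;	/* saved GS selector */
	uint64_t gsbase;	/* saved GS base */
};

/* Model of the write path: validate, store the base, clear the selector. */
static int gsbase_write_model(struct gs_state *t, uint64_t gsbase)
{
	if (gsbase >= TASK_SIZE_MAX_MODEL)
		return -EPERM;	/* base must be a user address */

	t->gsbase = gsbase;
	t->gsindex = 0;		/* selector 0 marks "base was set directly" */
	return 0;
}
```

A rejected write leaves the saved state untouched, which matches the
early return in the kernel helper.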

Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/include/asm/fsgsbase.h |  3 ---
 arch/x86/kernel/process_64.c    | 49 +++++++++--------------------------------
 arch/x86/kernel/ptrace.c        | 27 +++++++----------------
 3 files changed, 18 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index 0404dab..949eefc 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -8,9 +8,6 @@
 
 #include <asm/msr-index.h>
 
-unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-				     unsigned short selector);
-
 /*
  * Read/write a task's fsbase or gsbase. This returns the value that
  * the FS/GS base would have (if the task were to be resumed). These
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 52af8c1..710f639 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -285,8 +285,8 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 	}
 }
 
-unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-				     unsigned short selector)
+static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+					    unsigned short selector)
 {
 	unsigned short idx = selector >> 3;
 	unsigned long base;
@@ -749,54 +749,25 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 {
 	int ret = 0;
-	int doit = task == current;
-	int cpu;
 
 	switch (option) {
-	case ARCH_SET_GS:
-		if (arg2 >= TASK_SIZE_MAX)
-			return -EPERM;
-		cpu = get_cpu();
-		task->thread.gsindex = 0;
-		task->thread.gsbase = arg2;
-		if (doit) {
-			load_gs_index(0);
-			ret = wrmsrl_safe(MSR_KERNEL_GS_BASE, arg2);
-		}
-		put_cpu();
+	case ARCH_SET_GS: {
+		ret = x86_gsbase_write_task(task, arg2);
 		break;
-	case ARCH_SET_FS:
-		/* Not strictly needed for fs, but do it for symmetry
-		   with gs */
-		if (arg2 >= TASK_SIZE_MAX)
-			return -EPERM;
-		cpu = get_cpu();
-		task->thread.fsindex = 0;
-		task->thread.fsbase = arg2;
-		if (doit) {
-			/* set the selector to 0 to not confuse __switch_to */
-			loadsegment(fs, 0);
-			ret = wrmsrl_safe(MSR_FS_BASE, arg2);
-		}
-		put_cpu();
+	}
+	case ARCH_SET_FS: {
+		ret = x86_fsbase_write_task(task, arg2);
 		break;
+	}
 	case ARCH_GET_FS: {
-		unsigned long base;
+		unsigned long base = x86_fsbase_read_task(task);
 
-		if (doit)
-			rdmsrl(MSR_FS_BASE, base);
-		else
-			base = task->thread.fsbase;
 		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
 	case ARCH_GET_GS: {
-		unsigned long base;
+		unsigned long base = x86_gsbase_read_task(task);
 
-		if (doit)
-			rdmsrl(MSR_KERNEL_GS_BASE, base);
-		else
-			base = task->thread.gsbase;
 		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index fbde2a7..d8f49c7 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -397,12 +397,11 @@ static int putreg(struct task_struct *child,
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
 		/*
-		 * When changing the segment base, use do_arch_prctl_64
-		 * to set either thread.fs or thread.fsindex and the
-		 * corresponding GDT slot.
+		 * When changing the FS base, use the same
+		 * mechanism as for do_arch_prctl_64().
 		 */
 		if (child->thread.fsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_FS, value);
+			return x86_fsbase_write_task(child, value);
 		return 0;
 	case offsetof(struct user_regs_struct,gs_base):
 		/*
@@ -411,7 +410,7 @@ static int putreg(struct task_struct *child,
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
 		if (child->thread.gsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_GS, value);
+			return x86_gsbase_write_task(child, value);
 		return 0;
 #endif
 	}
@@ -435,20 +434,10 @@ static unsigned long getreg(struct task_struct *task, unsigned long offset)
 		return get_flags(task);
 
 #ifdef CONFIG_X86_64
-	case offsetof(struct user_regs_struct, fs_base): {
-		if (task->thread.fsindex == 0)
-			return task->thread.fsbase;
-		else
-			return x86_fsgsbase_read_task(task,
-						      task->thread.fsindex);
-	}
-	case offsetof(struct user_regs_struct, gs_base): {
-		if (task->thread.gsindex == 0)
-			return task->thread.gsbase;
-		else
-			return x86_fsgsbase_read_task(task,
-						      task->thread.gsindex);
-	}
+	case offsetof(struct user_regs_struct, fs_base):
+		return x86_fsbase_read_task(task);
+	case offsetof(struct user_regs_struct, gs_base):
+		return x86_gsbase_read_task(task);
 #endif
 	}
 
-- 
2.7.4



* [PATCH v6 4/8] x86/fsgsbase/64: Use FS/GS base helpers in core dump
  2018-09-18 23:08 [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE Chang S. Bae
                   ` (2 preceding siblings ...)
  2018-09-18 23:08 ` [PATCH v6 3/8] x86/fsgsbase/64: Make ptrace use correct FS/GS base helpers Chang S. Bae
@ 2018-09-18 23:08 ` Chang S. Bae
  2018-10-08  9:56   ` [tip:x86/asm] x86/fsgsbase/64: Convert the ELF core dump code to the new FSGSBASE helpers tip-bot for Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 5/8] x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to() Chang S. Bae
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Chang S. Bae @ 2018-09-18 23:08 UTC
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

Replace the open-coded access, which would otherwise prevent
the core dump code from using the enhanced FSGSBASE mechanism.

Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/include/asm/elf.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 0d157d2..1527ec3 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -10,6 +10,7 @@
 #include <asm/ptrace.h>
 #include <asm/user.h>
 #include <asm/auxvec.h>
+#include <asm/fsgsbase.h>
 
 typedef unsigned long elf_greg_t;
 
@@ -205,7 +206,6 @@ void set_personality_ia32(bool);
 
 #define ELF_CORE_COPY_REGS(pr_reg, regs)			\
 do {								\
-	unsigned long base;					\
 	unsigned v;						\
 	(pr_reg)[0] = (regs)->r15;				\
 	(pr_reg)[1] = (regs)->r14;				\
@@ -228,8 +228,8 @@ do {								\
 	(pr_reg)[18] = (regs)->flags;				\
 	(pr_reg)[19] = (regs)->sp;				\
 	(pr_reg)[20] = (regs)->ss;				\
-	rdmsrl(MSR_FS_BASE, base); (pr_reg)[21] = base;		\
-	rdmsrl(MSR_KERNEL_GS_BASE, base); (pr_reg)[22] = base;	\
+	(pr_reg)[21] = x86_fsbase_read_cpu();			\
+	(pr_reg)[22] = x86_gsbase_read_cpu_inactive();		\
 	asm("movl %%ds,%0" : "=r" (v)); (pr_reg)[23] = v;	\
 	asm("movl %%es,%0" : "=r" (v)); (pr_reg)[24] = v;	\
 	asm("movl %%fs,%0" : "=r" (v)); (pr_reg)[25] = v;	\
-- 
2.7.4



* [PATCH v6 5/8] x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to()
  2018-09-18 23:08 [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE Chang S. Bae
                   ` (3 preceding siblings ...)
  2018-09-18 23:08 ` [PATCH v6 4/8] x86/fsgsbase/64: Use FS/GS base helpers in core dump Chang S. Bae
@ 2018-09-18 23:08 ` Chang S. Bae
  2018-10-08  9:57   ` [tip:x86/asm] x86/fsgsbase/64: Factor out FS/GS segment loading " tip-bot for Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 6/8] x86/segments/64: Rename PER_CPU segment to CPU_NUMBER Chang S. Bae
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Chang S. Bae @ 2018-09-18 23:08 UTC
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

Instead of open-coding the calls to load_seg_legacy(), add
x86_fsgsbase_load() to load the FS/GS segments.  The new helper
will be updated once FSGSBASE is enabled.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/kernel/process_64.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 710f639..31b4755 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -285,6 +285,15 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 	}
 }
 
+static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
+					      struct thread_struct *next)
+{
+	load_seg_legacy(prev->fsindex, prev->fsbase,
+			next->fsindex, next->fsbase, FS);
+	load_seg_legacy(prev->gsindex, prev->gsbase,
+			next->gsindex, next->gsbase, GS);
+}
+
 static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
 					    unsigned short selector)
 {
@@ -595,10 +604,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	if (unlikely(next->ds | prev->ds))
 		loadsegment(ds, next->ds);
 
-	load_seg_legacy(prev->fsindex, prev->fsbase,
-			next->fsindex, next->fsbase, FS);
-	load_seg_legacy(prev->gsindex, prev->gsbase,
-			next->gsindex, next->gsbase, GS);
+	x86_fsgsbase_load(prev, next);
 
 	switch_fpu_finish(next_fpu, cpu);
 
-- 
2.7.4



* [PATCH v6 6/8] x86/segments/64: Rename PER_CPU segment to CPU_NUMBER
  2018-09-18 23:08 [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE Chang S. Bae
                   ` (4 preceding siblings ...)
  2018-09-18 23:08 ` [PATCH v6 5/8] x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to() Chang S. Bae
@ 2018-09-18 23:08 ` Chang S. Bae
  2018-10-08  9:57   ` [tip:x86/asm] x86/segments/64: Rename the GDT PER_CPU entry " tip-bot for Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 7/8] x86/vdso: Introduce helper functions for CPU and node number Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 8/8] x86/vdso: Move out the CPU initialization Chang S. Bae
  7 siblings, 1 reply; 27+ messages in thread
From: Chang S. Bae @ 2018-09-18 23:08 UTC
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

64-bit does not use this entry for per-CPU data, but for the
CPU (and node) number. The rename clarifies the real usage
of this GDT entry.
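
As a small worked example, the selector for this entry is built from
the GDT index and the requested privilege level. The sketch below
reproduces the arithmetic of __CPU_NUMBER_SEG in user space; the
selector_*() helpers are hypothetical names used only here.

```c
#include <stdint.h>

#define GDT_ENTRY_CPU_NUMBER	15
/* index * 8 places the index in bits 3..15; + 3 sets RPL to user mode */
#define CPU_NUMBER_SEG		(GDT_ENTRY_CPU_NUMBER * 8 + 3)

/* Hypothetical helpers that split a selector back into its fields. */
static unsigned int selector_index(uint16_t sel)
{
	return sel >> 3;
}

static unsigned int selector_is_ldt(uint16_t sel)
{
	return (sel >> 2) & 1;	/* table indicator bit */
}

static unsigned int selector_rpl(uint16_t sel)
{
	return sel & 0x3;	/* requested privilege level */
}
```

With index 15 this gives 15 * 8 + 3 = 123 (0x7b): a GDT selector
with RPL 3, so user code can pass it to LSL.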

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/entry/vdso/vma.c      | 2 +-
 arch/x86/include/asm/segment.h | 5 ++---
 arch/x86/include/asm/vgtod.h   | 8 ++++----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 5b8b556..0b114aa 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -359,7 +359,7 @@ static void vgetcpu_cpu_init(void *arg)
 	d.p = 1;		/* Present */
 	d.d = 1;		/* 32-bit */
 
-	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PER_CPU, &d, DESCTYPE_S);
+	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_CPU_NUMBER, &d, DESCTYPE_S);
 }
 
 static int vgetcpu_online(unsigned int cpu)
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 0ffbe95..3cb2aa5 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -186,8 +186,7 @@
 #define GDT_ENTRY_TLS_MIN		12
 #define GDT_ENTRY_TLS_MAX		14
 
-/* Abused to load per CPU data from limit */
-#define GDT_ENTRY_PER_CPU		15
+#define GDT_ENTRY_CPU_NUMBER		15
 
 /*
  * Number of entries in the GDT table:
@@ -207,7 +206,7 @@
 #define __USER_DS			(GDT_ENTRY_DEFAULT_USER_DS*8 + 3)
 #define __USER32_DS			__USER_DS
 #define __USER_CS			(GDT_ENTRY_DEFAULT_USER_CS*8 + 3)
-#define __PER_CPU_SEG			(GDT_ENTRY_PER_CPU*8 + 3)
+#define __CPU_NUMBER_SEG		(GDT_ENTRY_CPU_NUMBER*8 + 3)
 
 #endif
 
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 5374854..4e81ea9 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -86,9 +86,9 @@ static inline unsigned int __getcpu(void)
 	unsigned int p;
 
 	/*
-	 * Load per CPU data from GDT.  LSL is faster than RDTSCP and
-	 * works on all CPUs.  This is volatile so that it orders
-	 * correctly wrt barrier() and to keep gcc from cleverly
+	 * Load CPU (and node) number from GDT.  LSL is faster than RDTSCP
+	 * and works on all CPUs.  This is volatile so that it orders
+	 * correctly with respect to barrier() and to keep GCC from cleverly
 	 * hoisting it out of the calling function.
 	 *
 	 * If RDPID is available, use it.
@@ -96,7 +96,7 @@ static inline unsigned int __getcpu(void)
 	alternative_io ("lsl %[seg],%[p]",
 			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
 			X86_FEATURE_RDPID,
-			[p] "=a" (p), [seg] "r" (__PER_CPU_SEG));
+			[p] "=a" (p), [seg] "r" (__CPU_NUMBER_SEG));
 
 	return p;
 }
-- 
2.7.4



* [PATCH v6 7/8] x86/vdso: Introduce helper functions for CPU and node number
  2018-09-18 23:08 [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE Chang S. Bae
                   ` (5 preceding siblings ...)
  2018-09-18 23:08 ` [PATCH v6 6/8] x86/segments/64: Rename PER_CPU segment to CPU_NUMBER Chang S. Bae
@ 2018-09-18 23:08 ` Chang S. Bae
  2018-10-08  9:58   ` [tip:x86/asm] " tip-bot for Chang S. Bae
  2018-09-18 23:08 ` [PATCH v6 8/8] x86/vdso: Move out the CPU initialization Chang S. Bae
  7 siblings, 1 reply; 27+ messages in thread
From: Chang S. Bae @ 2018-09-18 23:08 UTC
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

Clean up the CPU initialization in the vDSO with new helper
functions. The helpers take care of combining the CPU and
node numbers into one value and of reading each number back
from the combined value.
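
The packing can be sketched in user space as follows, assuming the
layout of the code being replaced: the CPU number in the low 12 bits
and the node number above it. The function names here are
hypothetical and are not the kernel helpers themselves; the two
limit*_of() functions model how the 20-bit value is split across the
descriptor limit fields.

```c
#include <stdint.h>

#define CPU_BITS	12
#define CPU_MASK	((1u << CPU_BITS) - 1)	/* 0xfff */

/* Combine CPU and node number into one value (node above the CPU bits). */
static uint32_t encode_cpu_node(unsigned int cpu, unsigned int node)
{
	return (node << CPU_BITS) | cpu;
}

/* Split the combined value again; either output pointer may be null. */
static void decode_cpu_node(uint32_t p, unsigned int *cpu, unsigned int *node)
{
	if (cpu)
		*cpu = p & CPU_MASK;
	if (node)
		*node = p >> CPU_BITS;
}

/* A segment limit is 20 bits wide: low 16 bits, then the upper 4 bits. */
static uint16_t limit0_of(uint32_t cpudata)
{
	return cpudata & 0xffff;
}

static uint8_t limit1_of(uint32_t cpudata)
{
	return (cpudata >> 16) & 0xf;
}
```

For CPU 5 on node 3 the combined value is 0x3005, which fits entirely
in the low 16 bits of the limit.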

Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/entry/vdso/vgetcpu.c  |  9 +--------
 arch/x86/entry/vdso/vma.c      | 19 +++++++------------
 arch/x86/include/asm/segment.h | 41 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/vgtod.h   | 26 --------------------------
 4 files changed, 49 insertions(+), 46 deletions(-)

diff --git a/arch/x86/entry/vdso/vgetcpu.c b/arch/x86/entry/vdso/vgetcpu.c
index 8ec3d1f..de78fc9 100644
--- a/arch/x86/entry/vdso/vgetcpu.c
+++ b/arch/x86/entry/vdso/vgetcpu.c
@@ -13,14 +13,7 @@
 notrace long
 __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
 {
-	unsigned int p;
-
-	p = __getcpu();
-
-	if (cpu)
-		*cpu = p & VGETCPU_CPU_MASK;
-	if (node)
-		*node = p >> 12;
+	vdso_read_cpu_node(cpu, node);
 	return 0;
 }
 
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 0b114aa..39b5584 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -339,20 +339,15 @@ static void vgetcpu_cpu_init(void *arg)
 {
 	int cpu = smp_processor_id();
 	struct desc_struct d = { };
-	unsigned long node = 0;
-#ifdef CONFIG_NUMA
-	node = cpu_to_node(cpu);
-#endif
+	unsigned long cpudata = vdso_encode_cpu_node(cpu, cpu_to_node(cpu));
+
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
-		write_rdtscp_aux((node << 12) | cpu);
+		write_rdtscp_aux(cpudata);
+
+	/* Store CPU and node number in limit */
+	d.limit0 = cpudata;
+	d.limit1 = cpudata >> 16;
 
-	/*
-	 * Store cpu number in limit so that it can be loaded
-	 * quickly in user space in vgetcpu. (12 bits for the CPU
-	 * and 8 bits for the node)
-	 */
-	d.limit0 = cpu | ((node & 0xf) << 12);
-	d.limit1 = node >> 4;
 	d.type = 5;		/* RO data, expand down, accessed */
 	d.dpl = 3;		/* Visible to user code */
 	d.s = 1;		/* Not a system segment */
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 3cb2aa5..d4079bd 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -224,6 +224,47 @@
 #define GDT_ENTRY_TLS_ENTRIES		3
 #define TLS_SIZE			(GDT_ENTRY_TLS_ENTRIES* 8)
 
+#ifdef CONFIG_X86_64
+
+/* Bit size and mask of CPU number stored in the per CPU data (and TSC_AUX) */
+#define VDSO_CPU_SIZE			12
+#define VDSO_CPU_MASK			0xfff
+
+#ifndef __ASSEMBLY__
+
+/* Helper functions to store/load CPU and node numbers */
+
+static inline unsigned long vdso_encode_cpu_node(int cpu, unsigned long node)
+{
+	return ((node << VDSO_CPU_SIZE) | cpu);
+}
+
+static inline void vdso_read_cpu_node(unsigned *cpu, unsigned *node)
+{
+	unsigned int p;
+
+	/*
+	 * Load CPU and node number from GDT.  LSL is faster than RDTSCP
+	 * and works on all CPUs.  This is volatile so that it orders
+	 * correctly with respect to barrier() and to keep GCC from cleverly
+	 * hoisting it out of the calling function.
+	 *
+	 * If RDPID is available, use it.
+	 */
+	alternative_io ("lsl %[seg],%[p]",
+			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
+			X86_FEATURE_RDPID,
+			[p] "=a" (p), [seg] "r" (__CPU_NUMBER_SEG));
+
+	if (cpu)
+		*cpu = (p & VDSO_CPU_MASK);
+	if (node)
+		*node = (p >> VDSO_CPU_SIZE);
+}
+
+#endif /* !__ASSEMBLY__ */
+#endif /* CONFIG_X86_64 */
+
 #ifdef __KERNEL__
 
 /*
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 4e81ea9..056a61c 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -77,30 +77,4 @@ static inline void gtod_write_end(struct vsyscall_gtod_data *s)
 	++s->seq;
 }
 
-#ifdef CONFIG_X86_64
-
-#define VGETCPU_CPU_MASK 0xfff
-
-static inline unsigned int __getcpu(void)
-{
-	unsigned int p;
-
-	/*
-	 * Load CPU (and node) number from GDT.  LSL is faster than RDTSCP
-	 * and works on all CPUs.  This is volatile so that it orders
-	 * correctly with respect to barrier() and to keep GCC from cleverly
-	 * hoisting it out of the calling function.
-	 *
-	 * If RDPID is available, use it.
-	 */
-	alternative_io ("lsl %[seg],%[p]",
-			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
-			X86_FEATURE_RDPID,
-			[p] "=a" (p), [seg] "r" (__CPU_NUMBER_SEG));
-
-	return p;
-}
-
-#endif /* CONFIG_X86_64 */
-
 #endif /* _ASM_X86_VGTOD_H */
-- 
2.7.4
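
For illustration, the packing done by the helpers in this patch can be
reproduced in a small user-space sketch. The names encode_cpu_node()
and decode_cpu_node() are hypothetical stand-ins for
vdso_encode_cpu_node() and vdso_read_cpu_node(); the combined value is
passed in directly instead of being loaded with LSL or RDPID, and the
12-bit split mirrors VDSO_CPU_SIZE and VDSO_CPU_MASK from the diff.

```c
#include <assert.h>

/* Mirror VDSO_CPU_SIZE and VDSO_CPU_MASK from the patch. */
#define CPU_BITS	12
#define CPU_MASK	0xfffu

/* Pack: node number in the upper bits, CPU number in the low 12 bits. */
static inline unsigned long encode_cpu_node(unsigned int cpu,
					    unsigned long node)
{
	return (node << CPU_BITS) | cpu;
}

/* Unpack; either output pointer may be NULL, as in __vdso_getcpu(). */
static inline void decode_cpu_node(unsigned int p, unsigned int *cpu,
				   unsigned int *node)
{
	if (cpu)
		*cpu = p & CPU_MASK;
	if (node)
		*node = p >> CPU_BITS;
}
```

With these definitions, CPU 5 on node 3 packs to 0x3005, and decoding
0x3005 yields cpu == 5 and node == 3 again.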



* [PATCH v6 8/8] x86/vdso: Move out the CPU initialization
  2018-09-18 23:08 [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE Chang S. Bae
                   ` (6 preceding siblings ...)
  2018-09-18 23:08 ` [PATCH v6 7/8] x86/vdso: Introduce helper functions for CPU and node number Chang S. Bae
@ 2018-09-18 23:08 ` Chang S. Bae
  2018-10-08  8:36   ` Ingo Molnar
  2018-10-08  9:58   ` [tip:x86/asm] x86/vdso: Initialize the CPU/node NR segment descriptor earlier tip-bot for Chang S. Bae
  7 siblings, 2 replies; 27+ messages in thread
From: Chang S. Bae @ 2018-09-18 23:08 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

The CPU and node number will be written, as early enough,
to the segment limit of per CPU data and TSC_AUX MSR entry.
The information has been retrieved by vgetcpu in user space
and will be also loaded from the paranoid entry, when
FSGSBASE enabled.

The new setup function is named after the getcpu(2) system
call, and will be called during each CPU initialization
(before setting up IST). It makes a facility useful to both
the kernel and userspace unconditionally available much
sooner.

The change brings a substantial code removal. The redundant
setting of the segment in entry/vdso/vma.c and hotplug
notifier are removed.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/vdso/vma.c    | 33 +--------------------------------
 arch/x86/kernel/cpu/common.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 39b5584..3f9d43f 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -332,35 +332,6 @@ static __init int vdso_setup(char *s)
 	return 0;
 }
 __setup("vdso=", vdso_setup);
-#endif
-
-#ifdef CONFIG_X86_64
-static void vgetcpu_cpu_init(void *arg)
-{
-	int cpu = smp_processor_id();
-	struct desc_struct d = { };
-	unsigned long cpudata = vdso_encode_cpu_node(cpu, cpu_to_node(cpu));
-
-	if (static_cpu_has(X86_FEATURE_RDTSCP))
-		write_rdtscp_aux(cpudata);
-
-	/* Store CPU and node number in limit */
-	d.limit0 = cpudata;
-	d.limit1 = cpudata >> 16;
-
-	d.type = 5;		/* RO data, expand down, accessed */
-	d.dpl = 3;		/* Visible to user code */
-	d.s = 1;		/* Not a system segment */
-	d.p = 1;		/* Present */
-	d.d = 1;		/* 32-bit */
-
-	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_CPU_NUMBER, &d, DESCTYPE_S);
-}
-
-static int vgetcpu_online(unsigned int cpu)
-{
-	return smp_call_function_single(cpu, vgetcpu_cpu_init, NULL, 1);
-}
 
 static int __init init_vdso(void)
 {
@@ -370,9 +341,7 @@ static int __init init_vdso(void)
 	init_vdso_image(&vdso_image_x32);
 #endif
 
-	/* notifier priority > KVM */
-	return cpuhp_setup_state(CPUHP_AP_X86_VDSO_VMA_ONLINE,
-				 "x86/vdso/vma:online", vgetcpu_online, NULL);
+	return 0;
 }
 subsys_initcall(init_vdso);
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1ac7e6e..359a422 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1660,6 +1660,29 @@ static void wait_for_master_cpu(int cpu)
 #endif
 }
 
+#ifdef CONFIG_X86_64
+static void setup_getcpu(int cpu)
+{
+	unsigned long cpudata = vdso_encode_cpu_node(cpu, early_cpu_to_node(cpu));
+	struct desc_struct d = { };
+
+	if (static_cpu_has(X86_FEATURE_RDTSCP))
+		write_rdtscp_aux(cpudata);
+
+	/* Store CPU and node number in limit. */
+	d.limit0 = cpudata;
+	d.limit1 = cpudata >> 16;
+
+	d.type = 5;		/* RO data, expand down, accessed */
+	d.dpl = 3;		/* Visible to user code */
+	d.s = 1;		/* Not a system segment */
+	d.p = 1;		/* Present */
+	d.d = 1;		/* 32-bit */
+
+	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_CPU_NUMBER, &d, DESCTYPE_S);
+}
+#endif
+
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
  * initialized (naturally) in the bootstrap process, such as the GDT
@@ -1697,6 +1720,7 @@ void cpu_init(void)
 	    early_cpu_to_node(cpu) != NUMA_NO_NODE)
 		set_numa_node(early_cpu_to_node(cpu));
 #endif
+	setup_getcpu(cpu);
 
 	me = current;
 
-- 
2.7.4
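
As a rough illustration of how setup_getcpu() stores the combined
CPU/node value in the segment limit, the sketch below splits a 20-bit
value across two fields and reassembles it. struct limit_fields is a
hypothetical stand-in for the limit0 and limit1 members of
struct desc_struct, and load_limit() only models what LSL would return
for a byte-granular descriptor (the descriptor in the patch is
zero-initialized, so the granularity bit stays clear).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two limit fields of struct desc_struct. */
struct limit_fields {
	uint16_t limit0;	/* limit bits 15..0 */
	uint8_t  limit1;	/* limit bits 19..16, only 4 bits used */
};

/* Split the combined CPU/node value the way the patch does. */
static inline struct limit_fields store_limit(unsigned long cpudata)
{
	struct limit_fields d;

	d.limit0 = (uint16_t)(cpudata & 0xffff);
	d.limit1 = (uint8_t)((cpudata >> 16) & 0xf);
	return d;
}

/* Model of the 20-bit limit that LSL hands back to the caller. */
static inline unsigned int load_limit(struct limit_fields d)
{
	return ((unsigned int)d.limit1 << 16) | d.limit0;
}
```

For example, node 0xab and CPU 0x123 combine to 0xab123, which is
stored as limit0 == 0xb123 and limit1 == 0xa and reads back as 0xab123.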



* Re: [PATCH v6 8/8] x86/vdso: Move out the CPU initialization
  2018-09-18 23:08 ` [PATCH v6 8/8] x86/vdso: Move out the CPU initialization Chang S. Bae
@ 2018-10-08  8:36   ` Ingo Molnar
  2018-10-08  9:58   ` [tip:x86/asm] x86/vdso: Initialize the CPU/node NR segment descriptor earlier tip-bot for Chang S. Bae
  1 sibling, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2018-10-08  8:36 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: Thomas Gleixner, Andy Lutomirski, H . Peter Anvin, Andi Kleen,
	Dave Hansen, Markus T Metzger, Ravi Shankar, LKML


* Chang S. Bae <chang.seok.bae@intel.com> wrote:

> The CPU and node number will be written, as early enough,
> to the segment limit of per CPU data and TSC_AUX MSR entry.
> The information has been retrieved by vgetcpu in user space
> and will be also loaded from the paranoid entry, when
> FSGSBASE enabled.
>
> The new setup function is named after the getcpu(2) system
> call, and will be called during each CPU initialization
> (before setting up IST). It makes a facility useful to both
> the kernel and userspace unconditionally available much
> sooner.
> 
> The change brings a substantial code removal. The redundant
> setting of the segment in entry/vdso/vma.c and hotplug
> notifier are removed.

The title and the changelog are totally unreadable and full of grammar
errors, which makes them actively misleading...

A good changelog should explain not what the patch does, but _why_ it
is done:

  x86/vdso: Initialize the CPU/node NR segment descriptor earlier

  Currently the CPU/node NR segment descriptor (GDT_ENTRY_CPU_NUMBER) is
  initialized relatively late during CPU init, from the vDSO code, which
  has a number of disadvantages, such as hotplug CPU notifiers and SMP
  cross-calls.

  Instead just initialize it much earlier, directly in cpu_init().

  This reduces complexity and increases robustness.

I've edited the changelog, but please keep this in mind for future submissions.

I also made a number of other cleanups to the code, will push them out
after some testing.

Thanks,

	Ingo


* [tip:x86/asm] x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately
  2018-09-18 23:08 ` [PATCH v6 1/8] x86/arch_prctl/64: Make ptrace read FS/GS base accurately Chang S. Bae
@ 2018-10-08  9:54   ` tip-bot for Andy Lutomirski
  2018-10-08  9:59   ` [tip:x86/asm] x86/segments: Introduce the 'CPUNODE' naming to better document the segment limit CPU/node NR trick tip-bot for Ingo Molnar
  2018-10-08  9:59   ` [tip:x86/asm] x86/fsgsbase/64: Clean up various details tip-bot for Ingo Molnar
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Andy Lutomirski @ 2018-10-08  9:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, riel, brgerst, dave.hansen, bp, tglx, hpa, linux-kernel,
	chang.seok.bae, luto, torvalds, dvlasenk, ravi.v.shankar, peterz,
	markus.t.metzger, luto

Commit-ID:  07e1d88adaaeab247b300926f78cc3f950dbeda3
Gitweb:     https://git.kernel.org/tip/07e1d88adaaeab247b300926f78cc3f950dbeda3
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 18 Sep 2018 16:08:52 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:41:08 +0200

x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately

On 64-bit kernels ptrace can read the FS/GS base using the register access
APIs (PTRACE_PEEKUSER, etc.) or PTRACE_ARCH_PRCTL.

Make both of these mechanisms return the actual FS/GS base.

This will improve debuggability by providing the correct information
to a ptracer such as GDB.

[ chang: Rebased and revised patch description. ]
[ mingo: Revised the changelog some more. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-2-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/ptrace.c | 62 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index e2ee403865eb..3acbf45cb7fb 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -39,6 +39,7 @@
 #include <asm/hw_breakpoint.h>
 #include <asm/traps.h>
 #include <asm/syscall.h>
+#include <asm/mmu_context.h>
 
 #include "tls.h"
 
@@ -342,6 +343,49 @@ static int set_segment_reg(struct task_struct *task,
 	return 0;
 }
 
+static unsigned long task_seg_base(struct task_struct *task,
+				   unsigned short selector)
+{
+	unsigned short idx = selector >> 3;
+	unsigned long base;
+
+	if (likely((selector & SEGMENT_TI_MASK) == 0)) {
+		if (unlikely(idx >= GDT_ENTRIES))
+			return 0;
+
+		/*
+		 * There are no user segments in the GDT with nonzero bases
+		 * other than the TLS segments.
+		 */
+		if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
+			return 0;
+
+		idx -= GDT_ENTRY_TLS_MIN;
+		base = get_desc_base(&task->thread.tls_array[idx]);
+	} else {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+		struct ldt_struct *ldt;
+
+		/*
+		 * If performance here mattered, we could protect the LDT
+		 * with RCU.  This is a slow path, though, so we can just
+		 * take the mutex.
+		 */
+		mutex_lock(&task->mm->context.lock);
+		ldt = task->mm->context.ldt;
+		if (unlikely(idx >= ldt->nr_entries))
+			base = 0;
+		else
+			base = get_desc_base(ldt->entries + idx);
+		mutex_unlock(&task->mm->context.lock);
+#else
+		base = 0;
+#endif
+	}
+
+	return base;
+}
+
 #endif	/* CONFIG_X86_32 */
 
 static unsigned long get_flags(struct task_struct *task)
@@ -435,18 +479,16 @@ static unsigned long getreg(struct task_struct *task, unsigned long offset)
 
 #ifdef CONFIG_X86_64
 	case offsetof(struct user_regs_struct, fs_base): {
-		/*
-		 * XXX: This will not behave as expected if called on
-		 * current or if fsindex != 0.
-		 */
-		return task->thread.fsbase;
+		if (task->thread.fsindex == 0)
+			return task->thread.fsbase;
+		else
+			return task_seg_base(task, task->thread.fsindex);
 	}
 	case offsetof(struct user_regs_struct, gs_base): {
-		/*
-		 * XXX: This will not behave as expected if called on
-		 * current or if fsindex != 0.
-		 */
-		return task->thread.gsbase;
+		if (task->thread.gsindex == 0)
+			return task->thread.gsbase;
+		else
+			return task_seg_base(task, task->thread.gsindex);
 	}
 #endif
 	}
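
The selector handling in task_seg_base() above can be sketched in user
space as follows. decode_selector() and tls_slot() are hypothetical
helpers, not kernel functions, and TLS_MIN/TLS_MAX are assumed to match
the x86-64 values of GDT_ENTRY_TLS_MIN and GDT_ENTRY_TLS_MAX (12 and
14). The GDT_ENTRIES bound check is left out for brevity.

```c
#include <assert.h>

#define SEL_RPL_MASK	0x3u	/* bits 1..0: requested privilege level */
#define SEL_TI_MASK	0x4u	/* bit 2: 0 = GDT, 1 = LDT */

/* Assumed x86-64 values of GDT_ENTRY_TLS_MIN and GDT_ENTRY_TLS_MAX. */
#define TLS_MIN		12
#define TLS_MAX		14

struct selector_info {
	unsigned int index;	/* descriptor table index (selector >> 3) */
	int in_ldt;		/* nonzero if the TI bit is set */
	unsigned int rpl;	/* requested privilege level */
};

static inline struct selector_info decode_selector(unsigned short sel)
{
	struct selector_info s;

	s.index = sel >> 3;
	s.in_ldt = (sel & SEL_TI_MASK) != 0;
	s.rpl = sel & SEL_RPL_MASK;
	return s;
}

/* Return the TLS slot (0..2) for a GDT TLS selector, or -1 otherwise. */
static inline int tls_slot(unsigned short sel)
{
	struct selector_info s = decode_selector(sel);

	if (s.in_ldt || s.index < TLS_MIN || s.index > TLS_MAX)
		return -1;
	return (int)(s.index - TLS_MIN);
}
```

Under these assumptions, selector 0x63 (index 12, RPL 3) maps to TLS
slot 0, while an LDT selector such as 0x0f is rejected by tls_slot()
and would take the LDT branch in the kernel code.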


* [tip:x86/asm] x86/fsgsbase/64: Introduce FS/GS base helper functions
  2018-09-18 23:08 ` [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions Chang S. Bae
@ 2018-10-08  9:55   ` tip-bot for Chang S. Bae
  2018-10-24 19:01   ` [regression in -rc1] Re: [PATCH v6 2/8] " Andy Lutomirski
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot for Chang S. Bae @ 2018-10-08  9:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, luto, peterz, chang.seok.bae, luto, hpa, dave.hansen,
	linux-kernel, mingo, bp, markus.t.metzger, torvalds, dvlasenk,
	riel, ravi.v.shankar, brgerst

Commit-ID:  b1378a561fd16afdd96ef0bc912b1bcd2b85a68e
Gitweb:     https://git.kernel.org/tip/b1378a561fd16afdd96ef0bc912b1bcd2b85a68e
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 18 Sep 2018 16:08:53 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:41:08 +0200

x86/fsgsbase/64: Introduce FS/GS base helper functions

Introduce FS/GS base access functionality via <asm/fsgsbase.h>,
not yet used by anything directly.

Factor out task_seg_base() from x86/ptrace.c and rename it to
x86_fsgsbase_read_task() to make it part of the new helpers.

This will allow us to enhance FSGSBASE support and eventually enable
the FSGSBASE instructions (RDFSBASE, RDGSBASE, WRFSBASE and WRGSBASE).

An "inactive" GS base is a base that was saved at kernel entry and
belongs to an inactive (non-running or stopped) user task.
(The typical ptrace model.)

Here are the new functions:

  x86_fsbase_read_task()
  x86_gsbase_read_task()
  x86_fsbase_write_task()
  x86_gsbase_write_task()
  x86_fsbase_read_cpu()
  x86_fsbase_write_cpu()
  x86_gsbase_read_cpu_inactive()
  x86_gsbase_write_cpu_inactive()

As an advantage of the unified namespace we can now see all FS/GSBASE
API use in the kernel via the following 'git grep' pattern:

  $ git grep x86_.*sbase

[ mingo: Wrote new changelog. ]

Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-3-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fsgsbase.h |  50 ++++++++++++++++
 arch/x86/kernel/process_64.c    | 124 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/ptrace.c        |  51 ++---------------
 3 files changed, 179 insertions(+), 46 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
new file mode 100644
index 000000000000..1ab465ee23fe
--- /dev/null
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_FSGSBASE_H
+#define _ASM_FSGSBASE_H 1
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_X86_64
+
+#include <asm/msr-index.h>
+
+unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+				     unsigned short selector);
+
+/*
+ * Read/write a task's fsbase or gsbase. This returns the value that
+ * the FS/GS base would have (if the task were to be resumed). These
+ * work on current or on a different non-running task.
+ */
+unsigned long x86_fsbase_read_task(struct task_struct *task);
+unsigned long x86_gsbase_read_task(struct task_struct *task);
+int x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
+int x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);
+
+/* Helper functions for reading/writing FS/GS base */
+
+static inline unsigned long x86_fsbase_read_cpu(void)
+{
+	unsigned long fsbase;
+
+	rdmsrl(MSR_FS_BASE, fsbase);
+	return fsbase;
+}
+
+void x86_fsbase_write_cpu(unsigned long fsbase);
+
+static inline unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+	unsigned long gsbase;
+
+	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	return gsbase;
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
+
+#endif /* CONFIG_X86_64 */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_FSGSBASE_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ea5ea850348d..2a53ff8d1baf 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -54,6 +54,7 @@
 #include <asm/vdso.h>
 #include <asm/intel_rdt_sched.h>
 #include <asm/unistd.h>
+#include <asm/fsgsbase.h>
 #ifdef CONFIG_IA32_EMULATION
 /* Not included via unistd.h */
 #include <asm/unistd_32_ia32.h>
@@ -286,6 +287,129 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 	}
 }
 
+unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+				     unsigned short selector)
+{
+	unsigned short idx = selector >> 3;
+	unsigned long base;
+
+	if (likely((selector & SEGMENT_TI_MASK) == 0)) {
+		if (unlikely(idx >= GDT_ENTRIES))
+			return 0;
+
+		/*
+		 * There are no user segments in the GDT with nonzero bases
+		 * other than the TLS segments.
+		 */
+		if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
+			return 0;
+
+		idx -= GDT_ENTRY_TLS_MIN;
+		base = get_desc_base(&task->thread.tls_array[idx]);
+	} else {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+		struct ldt_struct *ldt;
+
+		/*
+		 * If performance here mattered, we could protect the LDT
+		 * with RCU.  This is a slow path, though, so we can just
+		 * take the mutex.
+		 */
+		mutex_lock(&task->mm->context.lock);
+		ldt = task->mm->context.ldt;
+		if (unlikely(idx >= ldt->nr_entries))
+			base = 0;
+		else
+			base = get_desc_base(ldt->entries + idx);
+		mutex_unlock(&task->mm->context.lock);
+#else
+		base = 0;
+#endif
+	}
+
+	return base;
+}
+
+void x86_fsbase_write_cpu(unsigned long fsbase)
+{
+	/*
+	 * Set the selector to 0 to indicate that the segment base has been
+	 * overwritten.  The context switch code checks for this and skips
+	 * the segment load.
+	 */
+	loadseg(FS, 0);
+	wrmsrl(MSR_FS_BASE, fsbase);
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+	/* Set the selector to 0 for the same reason as %fs above. */
+	loadseg(GS, 0);
+	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+}
+
+unsigned long x86_fsbase_read_task(struct task_struct *task)
+{
+	unsigned long fsbase;
+
+	if (task == current)
+		fsbase = x86_fsbase_read_cpu();
+	else if (task->thread.fsindex == 0)
+		fsbase = task->thread.fsbase;
+	else
+		fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
+
+	return fsbase;
+}
+
+unsigned long x86_gsbase_read_task(struct task_struct *task)
+{
+	unsigned long gsbase;
+
+	if (task == current)
+		gsbase = x86_gsbase_read_cpu_inactive();
+	else if (task->thread.gsindex == 0)
+		gsbase = task->thread.gsbase;
+	else
+		gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
+
+	return gsbase;
+}
+
+int x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase)
+{
+	/*
+	 * Not strictly needed for %fs, but do it for symmetry
+	 * with %gs
+	 */
+	if (unlikely(fsbase >= TASK_SIZE_MAX))
+		return -EPERM;
+
+	preempt_disable();
+	task->thread.fsbase = fsbase;
+	if (task == current)
+		x86_fsbase_write_cpu(fsbase);
+	task->thread.fsindex = 0;
+	preempt_enable();
+
+	return 0;
+}
+
+int x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase)
+{
+	if (unlikely(gsbase >= TASK_SIZE_MAX))
+		return -EPERM;
+
+	preempt_disable();
+	task->thread.gsbase = gsbase;
+	if (task == current)
+		x86_gsbase_write_cpu_inactive(gsbase);
+	task->thread.gsindex = 0;
+	preempt_enable();
+
+	return 0;
+}
+
 int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 		unsigned long arg, struct task_struct *p, unsigned long tls)
 {
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 3acbf45cb7fb..fbde2a7ce377 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -39,7 +39,7 @@
 #include <asm/hw_breakpoint.h>
 #include <asm/traps.h>
 #include <asm/syscall.h>
-#include <asm/mmu_context.h>
+#include <asm/fsgsbase.h>
 
 #include "tls.h"
 
@@ -343,49 +343,6 @@ static int set_segment_reg(struct task_struct *task,
 	return 0;
 }
 
-static unsigned long task_seg_base(struct task_struct *task,
-				   unsigned short selector)
-{
-	unsigned short idx = selector >> 3;
-	unsigned long base;
-
-	if (likely((selector & SEGMENT_TI_MASK) == 0)) {
-		if (unlikely(idx >= GDT_ENTRIES))
-			return 0;
-
-		/*
-		 * There are no user segments in the GDT with nonzero bases
-		 * other than the TLS segments.
-		 */
-		if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
-			return 0;
-
-		idx -= GDT_ENTRY_TLS_MIN;
-		base = get_desc_base(&task->thread.tls_array[idx]);
-	} else {
-#ifdef CONFIG_MODIFY_LDT_SYSCALL
-		struct ldt_struct *ldt;
-
-		/*
-		 * If performance here mattered, we could protect the LDT
-		 * with RCU.  This is a slow path, though, so we can just
-		 * take the mutex.
-		 */
-		mutex_lock(&task->mm->context.lock);
-		ldt = task->mm->context.ldt;
-		if (unlikely(idx >= ldt->nr_entries))
-			base = 0;
-		else
-			base = get_desc_base(ldt->entries + idx);
-		mutex_unlock(&task->mm->context.lock);
-#else
-		base = 0;
-#endif
-	}
-
-	return base;
-}
-
 #endif	/* CONFIG_X86_32 */
 
 static unsigned long get_flags(struct task_struct *task)
@@ -482,13 +439,15 @@ static unsigned long getreg(struct task_struct *task, unsigned long offset)
 		if (task->thread.fsindex == 0)
 			return task->thread.fsbase;
 		else
-			return task_seg_base(task, task->thread.fsindex);
+			return x86_fsgsbase_read_task(task,
+						      task->thread.fsindex);
 	}
 	case offsetof(struct user_regs_struct, gs_base): {
 		if (task->thread.gsindex == 0)
 			return task->thread.gsbase;
 		else
-			return task_seg_base(task, task->thread.gsindex);
+			return x86_fsgsbase_read_task(task,
+						      task->thread.gsindex);
 	}
 #endif
 	}
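
The decision logic of x86_fsbase_read_task() and
x86_fsbase_write_task() can be modelled in user space as a minimal
sketch. struct mock_task and the mock_*() functions are hypothetical:
the MSR access is replaced by a plain field, the descriptor lookup by a
precomputed desc_base, and MOCK_TASK_SIZE_MAX is an assumed value for
4-level paging, not the kernel's definition.

```c
#include <assert.h>
#include <errno.h>

/* Assumed limit: 47-bit user address space minus one 4 KiB page. */
#define MOCK_TASK_SIZE_MAX	((1ULL << 47) - 4096ULL)

/* Hypothetical stand-in for the task_struct fields that matter here. */
struct mock_task {
	int is_current;			/* models task == current */
	unsigned short fsindex;		/* saved FS selector */
	unsigned long long fsbase;	/* saved base (thread.fsbase) */
	unsigned long long cpu_fsbase;	/* models MSR_FS_BASE */
	unsigned long long desc_base;	/* result of a descriptor lookup */
};

static inline unsigned long long
mock_fsbase_read_task(const struct mock_task *t)
{
	if (t->is_current)
		return t->cpu_fsbase;	/* live value from the CPU */
	if (t->fsindex == 0)
		return t->fsbase;	/* base was set directly */
	return t->desc_base;		/* base comes from GDT/LDT entry */
}

static inline int mock_fsbase_write_task(struct mock_task *t,
					 unsigned long long base)
{
	if (base >= MOCK_TASK_SIZE_MAX)
		return -EPERM;

	t->fsbase = base;
	if (t->is_current)
		t->cpu_fsbase = base;
	t->fsindex = 0;			/* selector no longer defines base */
	return 0;
}
```

In this model a stopped task with a nonzero selector reports the
descriptor base until its base is written, after which the selector is
cleared and the saved base is reported instead.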


* [tip:x86/asm] x86/fsgsbase/64: Make ptrace use the new FS/GS base helpers
  2018-09-18 23:08 ` [PATCH v6 3/8] x86/fsgsbase/64: Make ptrace use correct FS/GS base helpers Chang S. Bae
@ 2018-10-08  9:56   ` tip-bot for Chang S. Bae
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Chang S. Bae @ 2018-10-08  9:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, hpa, dvlasenk, brgerst, torvalds, luto, mingo, bp, riel,
	chang.seok.bae, ravi.v.shankar, dave.hansen, luto, linux-kernel,
	markus.t.metzger, peterz

Commit-ID:  e696c231bebf5f17fe0c5e465c01511320668054
Gitweb:     https://git.kernel.org/tip/e696c231bebf5f17fe0c5e465c01511320668054
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 18 Sep 2018 16:08:54 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:41:08 +0200

x86/fsgsbase/64: Make ptrace use the new FS/GS base helpers

Use the new FS/GS base helper functions in <asm/fsgsbase.h> in the platform
specific ptrace implementation of the following APIs:

  PTRACE_ARCH_PRCTL,
  PTRACE_SETREGS,
  PTRACE_GETREGS,
  etc.

This abstracts the fsgsbase code further and makes the FS/GS-update
mechanism easier to change.

[ mingo: Wrote new changelog. ]

Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-4-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fsgsbase.h |  3 ---
 arch/x86/kernel/process_64.c    | 49 +++++++++--------------------------------
 arch/x86/kernel/ptrace.c        | 27 +++++++----------------
 3 files changed, 18 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index 1ab465ee23fe..5e9cbcce318a 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -8,9 +8,6 @@
 
 #include <asm/msr-index.h>
 
-unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-				     unsigned short selector);
-
 /*
  * Read/write a task's fsbase or gsbase. This returns the value that
  * the FS/GS base would have (if the task were to be resumed). These
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2a53ff8d1baf..e5fb0c3dee4d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -287,8 +287,8 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 	}
 }
 
-unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-				     unsigned short selector)
+static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
+					    unsigned short selector)
 {
 	unsigned short idx = selector >> 3;
 	unsigned long base;
@@ -751,54 +751,25 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 {
 	int ret = 0;
-	int doit = task == current;
-	int cpu;
 
 	switch (option) {
-	case ARCH_SET_GS:
-		if (arg2 >= TASK_SIZE_MAX)
-			return -EPERM;
-		cpu = get_cpu();
-		task->thread.gsindex = 0;
-		task->thread.gsbase = arg2;
-		if (doit) {
-			load_gs_index(0);
-			ret = wrmsrl_safe(MSR_KERNEL_GS_BASE, arg2);
-		}
-		put_cpu();
+	case ARCH_SET_GS: {
+		ret = x86_gsbase_write_task(task, arg2);
 		break;
-	case ARCH_SET_FS:
-		/* Not strictly needed for fs, but do it for symmetry
-		   with gs */
-		if (arg2 >= TASK_SIZE_MAX)
-			return -EPERM;
-		cpu = get_cpu();
-		task->thread.fsindex = 0;
-		task->thread.fsbase = arg2;
-		if (doit) {
-			/* set the selector to 0 to not confuse __switch_to */
-			loadsegment(fs, 0);
-			ret = wrmsrl_safe(MSR_FS_BASE, arg2);
-		}
-		put_cpu();
+	}
+	case ARCH_SET_FS: {
+		ret = x86_fsbase_write_task(task, arg2);
 		break;
+	}
 	case ARCH_GET_FS: {
-		unsigned long base;
+		unsigned long base = x86_fsbase_read_task(task);
 
-		if (doit)
-			rdmsrl(MSR_FS_BASE, base);
-		else
-			base = task->thread.fsbase;
 		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
 	case ARCH_GET_GS: {
-		unsigned long base;
+		unsigned long base = x86_gsbase_read_task(task);
 
-		if (doit)
-			rdmsrl(MSR_KERNEL_GS_BASE, base);
-		else
-			base = task->thread.gsbase;
 		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index fbde2a7ce377..d8f49c7384a3 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -397,12 +397,11 @@ static int putreg(struct task_struct *child,
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
 		/*
-		 * When changing the segment base, use do_arch_prctl_64
-		 * to set either thread.fs or thread.fsindex and the
-		 * corresponding GDT slot.
+		 * When changing the FS base, use the same
+		 * mechanism as for do_arch_prctl_64().
 		 */
 		if (child->thread.fsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_FS, value);
+			return x86_fsbase_write_task(child, value);
 		return 0;
 	case offsetof(struct user_regs_struct,gs_base):
 		/*
@@ -411,7 +410,7 @@ static int putreg(struct task_struct *child,
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
 		if (child->thread.gsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_GS, value);
+			return x86_gsbase_write_task(child, value);
 		return 0;
 #endif
 	}
@@ -435,20 +434,10 @@ static unsigned long getreg(struct task_struct *task, unsigned long offset)
 		return get_flags(task);
 
 #ifdef CONFIG_X86_64
-	case offsetof(struct user_regs_struct, fs_base): {
-		if (task->thread.fsindex == 0)
-			return task->thread.fsbase;
-		else
-			return x86_fsgsbase_read_task(task,
-						      task->thread.fsindex);
-	}
-	case offsetof(struct user_regs_struct, gs_base): {
-		if (task->thread.gsindex == 0)
-			return task->thread.gsbase;
-		else
-			return x86_fsgsbase_read_task(task,
-						      task->thread.gsindex);
-	}
+	case offsetof(struct user_regs_struct, fs_base):
+		return x86_fsbase_read_task(task);
+	case offsetof(struct user_regs_struct, gs_base):
+		return x86_gsbase_read_task(task);
 #endif
 	}
 

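For context, the ARCH_GET_FS case converted above is the path user space
reaches through arch_prctl(2). A minimal sketch of such a caller, assuming
x86-64 Linux with the kernel UAPI headers installed; read_own_fs_base() is
a hypothetical helper for illustration and is not part of this patch:

```c
#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h>		/* ARCH_GET_FS */
#include <sys/syscall.h>	/* SYS_arch_prctl */
#include <unistd.h>		/* syscall() */
#endif

/*
 * Hypothetical helper: read the calling thread's FS base.
 * Returns 0 on success, -1 if the platform is unsupported or the
 * system call fails.
 */
static int read_own_fs_base(unsigned long *base)
{
#if defined(__x86_64__) && defined(__linux__)
	/* The kernel stores the base through the user pointer in arg2. */
	return syscall(SYS_arch_prctl, ARCH_GET_FS, base) == 0 ? 0 : -1;
#else
	(void)base;
	return -1;
#endif
}
```

On a typical glibc system the returned base is the thread's TLS pointer,
but that is a property of the C library, not something this patch
guarantees.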

* [tip:x86/asm] x86/fsgsbase/64: Convert the ELF core dump code to the new FSGSBASE helpers
  2018-09-18 23:08 ` [PATCH v6 4/8] x86/fsgsbase/64: Use FS/GS base helpers in core dump Chang S. Bae
@ 2018-10-08  9:56   ` tip-bot for Chang S. Bae
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Chang S. Bae @ 2018-10-08  9:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: riel, torvalds, luto, ravi.v.shankar, luto, brgerst, bp, ak, hpa,
	peterz, markus.t.metzger, dave.hansen, tglx, chang.seok.bae,
	dvlasenk, linux-kernel, mingo

Commit-ID:  824eea38d239fb2a6027e65e18a5daef23019b00
Gitweb:     https://git.kernel.org/tip/824eea38d239fb2a6027e65e18a5daef23019b00
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 18 Sep 2018 16:08:55 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:41:09 +0200

x86/fsgsbase/64: Convert the ELF core dump code to the new FSGSBASE helpers

Replace the open-coded rdmsrl() calls with their <asm/fsgsbase.h> API
counterparts.

No change in functionality intended.
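As an illustration of why slots 21 and 22 carry the bases: the pr_reg[]
indices used by ELF_CORE_COPY_REGS line up with the field order of the
user-space struct user_regs_struct. A small sketch, assuming x86-64 Linux
and a C library that provides <sys/user.h>; the two helper names are
hypothetical:

```c
#include <stddef.h>		/* offsetof */
#if defined(__x86_64__) && defined(__linux__)
#include <sys/user.h>		/* struct user_regs_struct */
#endif

/*
 * Index of fs_base when the register block is viewed as an array of
 * 8-byte slots, or -1 on platforms where the layout does not apply.
 */
static int fs_base_slot(void)
{
#if defined(__x86_64__) && defined(__linux__)
	return (int)(offsetof(struct user_regs_struct, fs_base) / 8);
#else
	return -1;
#endif
}

/* Same computation for gs_base. */
static int gs_base_slot(void)
{
#if defined(__x86_64__) && defined(__linux__)
	return (int)(offsetof(struct user_regs_struct, gs_base) / 8);
#else
	return -1;
#endif
}
```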

[ mingo: Wrote new changelog. ]

Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/1537312139-5580-5-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/elf.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 0d157d2a1e2a..1527ec351036 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -10,6 +10,7 @@
 #include <asm/ptrace.h>
 #include <asm/user.h>
 #include <asm/auxvec.h>
+#include <asm/fsgsbase.h>
 
 typedef unsigned long elf_greg_t;
 
@@ -205,7 +206,6 @@ void set_personality_ia32(bool);
 
 #define ELF_CORE_COPY_REGS(pr_reg, regs)			\
 do {								\
-	unsigned long base;					\
 	unsigned v;						\
 	(pr_reg)[0] = (regs)->r15;				\
 	(pr_reg)[1] = (regs)->r14;				\
@@ -228,8 +228,8 @@ do {								\
 	(pr_reg)[18] = (regs)->flags;				\
 	(pr_reg)[19] = (regs)->sp;				\
 	(pr_reg)[20] = (regs)->ss;				\
-	rdmsrl(MSR_FS_BASE, base); (pr_reg)[21] = base;		\
-	rdmsrl(MSR_KERNEL_GS_BASE, base); (pr_reg)[22] = base;	\
+	(pr_reg)[21] = x86_fsbase_read_cpu();			\
+	(pr_reg)[22] = x86_gsbase_read_cpu_inactive();		\
 	asm("movl %%ds,%0" : "=r" (v)); (pr_reg)[23] = v;	\
 	asm("movl %%es,%0" : "=r" (v)); (pr_reg)[24] = v;	\
 	asm("movl %%fs,%0" : "=r" (v)); (pr_reg)[25] = v;	\


* [tip:x86/asm] x86/fsgsbase/64: Factor out FS/GS segment loading from __switch_to()
  2018-09-18 23:08 ` [PATCH v6 5/8] x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to() Chang S. Bae
@ 2018-10-08  9:57   ` tip-bot for Chang S. Bae
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Chang S. Bae @ 2018-10-08  9:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: chang.seok.bae, riel, bp, luto, hpa, linux-kernel, brgerst, tglx,
	torvalds, luto, dave.hansen, ravi.v.shankar, ak, mingo,
	markus.t.metzger, dvlasenk, peterz

Commit-ID:  f4550b52e495e1b634d1f2c1004bcea5dc3321ea
Gitweb:     https://git.kernel.org/tip/f4550b52e495e1b634d1f2c1004bcea5dc3321ea
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 18 Sep 2018 16:08:56 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:41:09 +0200

x86/fsgsbase/64: Factor out FS/GS segment loading from __switch_to()

Instead of open coding the calls to load_seg_legacy(), introduce
x86_fsgsbase_load() to load FS/GS segments.

This makes it more explicit that this is part of FSGSBASE functionality,
and the new helper can be updated when FSGSBASE instructions are enabled.

[ mingo: Wrote new changelog. ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/1537312139-5580-6-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/process_64.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e5fb0c3dee4d..d6674a425714 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -287,6 +287,15 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 	}
 }
 
+static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
+					      struct thread_struct *next)
+{
+	load_seg_legacy(prev->fsindex, prev->fsbase,
+			next->fsindex, next->fsbase, FS);
+	load_seg_legacy(prev->gsindex, prev->gsbase,
+			next->gsindex, next->gsbase, GS);
+}
+
 static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
 					    unsigned short selector)
 {
@@ -597,10 +606,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	if (unlikely(next->ds | prev->ds))
 		loadsegment(ds, next->ds);
 
-	load_seg_legacy(prev->fsindex, prev->fsbase,
-			next->fsindex, next->fsbase, FS);
-	load_seg_legacy(prev->gsindex, prev->gsbase,
-			next->gsindex, next->gsbase, GS);
+	x86_fsgsbase_load(prev, next);
 
 	switch_fpu_finish(next_fpu, cpu);
 


* [tip:x86/asm] x86/segments/64: Rename the GDT PER_CPU entry to CPU_NUMBER
  2018-09-18 23:08 ` [PATCH v6 6/8] x86/segments/64: Rename PER_CPU segment to CPU_NUMBER Chang S. Bae
@ 2018-10-08  9:57   ` tip-bot for Chang S. Bae
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Chang S. Bae @ 2018-10-08  9:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: markus.t.metzger, hpa, ravi.v.shankar, dvlasenk, linux-kernel,
	riel, dave.hansen, torvalds, brgerst, luto, luto, mingo,
	chang.seok.bae, bp, peterz, tglx

Commit-ID:  c4755613a1339ea77dbb15de75c9f74217209265
Gitweb:     https://git.kernel.org/tip/c4755613a1339ea77dbb15de75c9f74217209265
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 18 Sep 2018 16:08:57 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:41:10 +0200

x86/segments/64: Rename the GDT PER_CPU entry to CPU_NUMBER

The old 'per CPU' naming was misleading: 64-bit kernels don't use this
GDT entry for per CPU data, but to store the CPU (and node) ID.
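For readers unfamiliar with the selector arithmetic behind
__CPU_NUMBER_SEG (GDT_ENTRY_CPU_NUMBER*8 + 3): a selector is the
descriptor index shifted left by three, plus a table-indicator bit and a
two-bit requested privilege level. A stand-alone sketch; make_selector()
is a hypothetical helper, not a kernel function:

```c
/*
 * Build a segment selector from a descriptor index, a table indicator
 * (0 = GDT, 1 = LDT) and a requested privilege level (0..3).
 */
static unsigned short make_selector(unsigned int index, unsigned int ti,
				    unsigned int rpl)
{
	return (unsigned short)((index << 3) | ((ti & 1u) << 2) | (rpl & 3u));
}
```

With index 15, the GDT and RPL 3, this gives 0x7b, the same value as
15*8 + 3.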

[ mingo: Wrote new changelog. ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/1537312139-5580-7-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vma.c      | 2 +-
 arch/x86/include/asm/segment.h | 5 ++---
 arch/x86/include/asm/vgtod.h   | 8 ++++----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 5b8b556dbb12..0b114aafcedc 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -359,7 +359,7 @@ static void vgetcpu_cpu_init(void *arg)
 	d.p = 1;		/* Present */
 	d.d = 1;		/* 32-bit */
 
-	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PER_CPU, &d, DESCTYPE_S);
+	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_CPU_NUMBER, &d, DESCTYPE_S);
 }
 
 static int vgetcpu_online(unsigned int cpu)
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index e293c122d0d5..e3e788ea52e5 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -186,8 +186,7 @@
 #define GDT_ENTRY_TLS_MIN		12
 #define GDT_ENTRY_TLS_MAX		14
 
-/* Abused to load per CPU data from limit */
-#define GDT_ENTRY_PER_CPU		15
+#define GDT_ENTRY_CPU_NUMBER		15
 
 /*
  * Number of entries in the GDT table:
@@ -207,7 +206,7 @@
 #define __USER_DS			(GDT_ENTRY_DEFAULT_USER_DS*8 + 3)
 #define __USER32_DS			__USER_DS
 #define __USER_CS			(GDT_ENTRY_DEFAULT_USER_CS*8 + 3)
-#define __PER_CPU_SEG			(GDT_ENTRY_PER_CPU*8 + 3)
+#define __CPU_NUMBER_SEG		(GDT_ENTRY_CPU_NUMBER*8 + 3)
 
 #endif
 
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 53748541c487..4e81ea920722 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -86,9 +86,9 @@ static inline unsigned int __getcpu(void)
 	unsigned int p;
 
 	/*
-	 * Load per CPU data from GDT.  LSL is faster than RDTSCP and
-	 * works on all CPUs.  This is volatile so that it orders
-	 * correctly wrt barrier() and to keep gcc from cleverly
+	 * Load CPU (and node) number from GDT.  LSL is faster than RDTSCP
+	 * and works on all CPUs.  This is volatile so that it orders
+	 * correctly with respect to barrier() and to keep GCC from cleverly
 	 * hoisting it out of the calling function.
 	 *
 	 * If RDPID is available, use it.
@@ -96,7 +96,7 @@ static inline unsigned int __getcpu(void)
 	alternative_io ("lsl %[seg],%[p]",
 			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
 			X86_FEATURE_RDPID,
-			[p] "=a" (p), [seg] "r" (__PER_CPU_SEG));
+			[p] "=a" (p), [seg] "r" (__CPU_NUMBER_SEG));
 
 	return p;
 }


* [tip:x86/asm] x86/vdso: Introduce helper functions for CPU and node number
  2018-09-18 23:08 ` [PATCH v6 7/8] x86/vdso: Introduce helper functions for CPU and node number Chang S. Bae
@ 2018-10-08  9:58   ` tip-bot for Chang S. Bae
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Chang S. Bae @ 2018-10-08  9:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, chang.seok.bae, hpa, dave.hansen, peterz,
	ravi.v.shankar, dvlasenk, markus.t.metzger, linux-kernel,
	brgerst, mingo, luto, riel, luto, tglx, bp

Commit-ID:  ffebbaedc8616cffe648202e364dce6a045d65a2
Gitweb:     https://git.kernel.org/tip/ffebbaedc8616cffe648202e364dce6a045d65a2
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 18 Sep 2018 16:08:58 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:41:10 +0200

x86/vdso: Introduce helper functions for CPU and node number

Clean up the CPU/node number related code a bit, to make it more apparent
how we are encoding/extracting the CPU and node fields from the
segment limit.
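The encoding itself is small enough to sketch in user space. The
following mirrors the shift and mask values used by the helpers in this
patch (12 bits for the CPU number, node number above that); the names
CPU_BITS, CPU_MASK, encode_cpu_node() and decode_cpu_node() are chosen
for illustration only:

```c
#define CPU_BITS	12	/* same width as VDSO_CPU_SIZE in the patch */
#define CPU_MASK	0xfffu	/* same mask as VDSO_CPU_MASK in the patch */

/* Pack the node number above the low CPU_BITS bits of the CPU number. */
static unsigned long encode_cpu_node(unsigned int cpu, unsigned long node)
{
	return (node << CPU_BITS) | cpu;
}

/* Split a packed value again; either output pointer may be NULL. */
static void decode_cpu_node(unsigned long p, unsigned int *cpu,
			    unsigned int *node)
{
	if (cpu)
		*cpu = (unsigned int)(p & CPU_MASK);
	if (node)
		*node = (unsigned int)(p >> CPU_BITS);
}
```

For example, CPU 5 on node 3 packs to 0x3005 and decodes back to the same
pair.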

No change in functionality intended.

[ mingo: Wrote new changelog. ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/1537312139-5580-8-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vgetcpu.c  |  9 +--------
 arch/x86/entry/vdso/vma.c      | 19 +++++++------------
 arch/x86/include/asm/segment.h | 41 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/vgtod.h   | 26 --------------------------
 4 files changed, 49 insertions(+), 46 deletions(-)

diff --git a/arch/x86/entry/vdso/vgetcpu.c b/arch/x86/entry/vdso/vgetcpu.c
index 8ec3d1f4ce9a..de78fc9cd963 100644
--- a/arch/x86/entry/vdso/vgetcpu.c
+++ b/arch/x86/entry/vdso/vgetcpu.c
@@ -13,14 +13,7 @@
 notrace long
 __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
 {
-	unsigned int p;
-
-	p = __getcpu();
-
-	if (cpu)
-		*cpu = p & VGETCPU_CPU_MASK;
-	if (node)
-		*node = p >> 12;
+	vdso_read_cpu_node(cpu, node);
 	return 0;
 }
 
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 0b114aafcedc..39b5584c5808 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -339,20 +339,15 @@ static void vgetcpu_cpu_init(void *arg)
 {
 	int cpu = smp_processor_id();
 	struct desc_struct d = { };
-	unsigned long node = 0;
-#ifdef CONFIG_NUMA
-	node = cpu_to_node(cpu);
-#endif
+	unsigned long cpudata = vdso_encode_cpu_node(cpu, cpu_to_node(cpu));
+
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
-		write_rdtscp_aux((node << 12) | cpu);
+		write_rdtscp_aux(cpudata);
+
+	/* Store CPU and node number in limit */
+	d.limit0 = cpudata;
+	d.limit1 = cpudata >> 16;
 
-	/*
-	 * Store cpu number in limit so that it can be loaded
-	 * quickly in user space in vgetcpu. (12 bits for the CPU
-	 * and 8 bits for the node)
-	 */
-	d.limit0 = cpu | ((node & 0xf) << 12);
-	d.limit1 = node >> 4;
 	d.type = 5;		/* RO data, expand down, accessed */
 	d.dpl = 3;		/* Visible to user code */
 	d.s = 1;		/* Not a system segment */
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index e3e788ea52e5..4d1f6cc62e13 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -224,6 +224,47 @@
 #define GDT_ENTRY_TLS_ENTRIES		3
 #define TLS_SIZE			(GDT_ENTRY_TLS_ENTRIES* 8)
 
+#ifdef CONFIG_X86_64
+
+/* Bit size and mask of CPU number stored in the per CPU data (and TSC_AUX) */
+#define VDSO_CPU_SIZE			12
+#define VDSO_CPU_MASK			0xfff
+
+#ifndef __ASSEMBLY__
+
+/* Helper functions to store/load CPU and node numbers */
+
+static inline unsigned long vdso_encode_cpu_node(int cpu, unsigned long node)
+{
+	return ((node << VDSO_CPU_SIZE) | cpu);
+}
+
+static inline void vdso_read_cpu_node(unsigned *cpu, unsigned *node)
+{
+	unsigned int p;
+
+	/*
+	 * Load CPU and node number from GDT.  LSL is faster than RDTSCP
+	 * and works on all CPUs.  This is volatile so that it orders
+	 * correctly with respect to barrier() and to keep GCC from cleverly
+	 * hoisting it out of the calling function.
+	 *
+	 * If RDPID is available, use it.
+	 */
+	alternative_io ("lsl %[seg],%[p]",
+			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
+			X86_FEATURE_RDPID,
+			[p] "=a" (p), [seg] "r" (__CPU_NUMBER_SEG));
+
+	if (cpu)
+		*cpu = (p & VDSO_CPU_MASK);
+	if (node)
+		*node = (p >> VDSO_CPU_SIZE);
+}
+
+#endif /* !__ASSEMBLY__ */
+#endif /* CONFIG_X86_64 */
+
 #ifdef __KERNEL__
 
 /*
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 4e81ea920722..056a61c8c5c7 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -77,30 +77,4 @@ static inline void gtod_write_end(struct vsyscall_gtod_data *s)
 	++s->seq;
 }
 
-#ifdef CONFIG_X86_64
-
-#define VGETCPU_CPU_MASK 0xfff
-
-static inline unsigned int __getcpu(void)
-{
-	unsigned int p;
-
-	/*
-	 * Load CPU (and node) number from GDT.  LSL is faster than RDTSCP
-	 * and works on all CPUs.  This is volatile so that it orders
-	 * correctly with respect to barrier() and to keep GCC from cleverly
-	 * hoisting it out of the calling function.
-	 *
-	 * If RDPID is available, use it.
-	 */
-	alternative_io ("lsl %[seg],%[p]",
-			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
-			X86_FEATURE_RDPID,
-			[p] "=a" (p), [seg] "r" (__CPU_NUMBER_SEG));
-
-	return p;
-}
-
-#endif /* CONFIG_X86_64 */
-
 #endif /* _ASM_X86_VGTOD_H */


* [tip:x86/asm] x86/vdso: Initialize the CPU/node NR segment descriptor earlier
  2018-09-18 23:08 ` [PATCH v6 8/8] x86/vdso: Move out the CPU initialization Chang S. Bae
  2018-10-08  8:36   ` Ingo Molnar
@ 2018-10-08  9:58   ` tip-bot for Chang S. Bae
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot for Chang S. Bae @ 2018-10-08  9:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, riel, ravi.v.shankar, markus.t.metzger, mingo,
	chang.seok.bae, hpa, peterz, dvlasenk, linux-kernel, dave.hansen,
	luto, tglx, brgerst, torvalds, bp

Commit-ID:  b2e2ba578e016a091eb31565849990fe68c7c599
Gitweb:     https://git.kernel.org/tip/b2e2ba578e016a091eb31565849990fe68c7c599
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 18 Sep 2018 16:08:59 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:41:10 +0200

x86/vdso: Initialize the CPU/node NR segment descriptor earlier

Currently the CPU/node NR segment descriptor (GDT_ENTRY_CPU_NUMBER) is
initialized relatively late during CPU init, from the vDSO code, which
has a number of disadvantages: it relies on hotplug CPU notifiers and
SMP cross-calls.

Instead just initialize it much earlier, directly in cpu_init().

This reduces complexity and increases robustness.
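To make the "d.limit0 = cpudata; d.limit1 = cpudata >> 16;" step in
setup_getcpu() easier to follow: a descriptor's 20-bit limit is stored in
two pieces, the low 16 bits and the high 4 bits. A user-space sketch of
that split, using a simplified stand-in structure (struct toy_desc is
hypothetical and much smaller than the kernel's struct desc_struct):

```c
#include <stdint.h>

/* Simplified stand-in holding only the two limit pieces. */
struct toy_desc {
	uint16_t limit0;	/* limit bits 0..15 */
	uint8_t  limit1;	/* limit bits 16..19, low nibble only */
};

/* Split a 20-bit packed CPU/node value across the two limit fields. */
static struct toy_desc pack_limit(uint32_t cpudata)
{
	struct toy_desc d;

	d.limit0 = (uint16_t)(cpudata & 0xffffu);
	d.limit1 = (uint8_t)((cpudata >> 16) & 0xfu);
	return d;
}

/* Recombine the pieces into the 20-bit limit that was stored. */
static uint32_t unpack_limit(struct toy_desc d)
{
	return (uint32_t)d.limit0 | ((uint32_t)d.limit1 << 16);
}
```

With 12 bits used for the CPU number, the remaining 8 bits of the limit
are available for the node number.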

[ mingo: Wrote new changelog. ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-9-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vma.c    | 33 +--------------------------------
 arch/x86/kernel/cpu/common.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 39b5584c5808..3f9d43f26f63 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -332,35 +332,6 @@ static __init int vdso_setup(char *s)
 	return 0;
 }
 __setup("vdso=", vdso_setup);
-#endif
-
-#ifdef CONFIG_X86_64
-static void vgetcpu_cpu_init(void *arg)
-{
-	int cpu = smp_processor_id();
-	struct desc_struct d = { };
-	unsigned long cpudata = vdso_encode_cpu_node(cpu, cpu_to_node(cpu));
-
-	if (static_cpu_has(X86_FEATURE_RDTSCP))
-		write_rdtscp_aux(cpudata);
-
-	/* Store CPU and node number in limit */
-	d.limit0 = cpudata;
-	d.limit1 = cpudata >> 16;
-
-	d.type = 5;		/* RO data, expand down, accessed */
-	d.dpl = 3;		/* Visible to user code */
-	d.s = 1;		/* Not a system segment */
-	d.p = 1;		/* Present */
-	d.d = 1;		/* 32-bit */
-
-	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_CPU_NUMBER, &d, DESCTYPE_S);
-}
-
-static int vgetcpu_online(unsigned int cpu)
-{
-	return smp_call_function_single(cpu, vgetcpu_cpu_init, NULL, 1);
-}
 
 static int __init init_vdso(void)
 {
@@ -370,9 +341,7 @@ static int __init init_vdso(void)
 	init_vdso_image(&vdso_image_x32);
 #endif
 
-	/* notifier priority > KVM */
-	return cpuhp_setup_state(CPUHP_AP_X86_VDSO_VMA_ONLINE,
-				 "x86/vdso/vma:online", vgetcpu_online, NULL);
+	return 0;
 }
 subsys_initcall(init_vdso);
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 44c4ef3d989b..a148d18a1ef0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1669,6 +1669,29 @@ static void wait_for_master_cpu(int cpu)
 #endif
 }
 
+#ifdef CONFIG_X86_64
+static void setup_getcpu(int cpu)
+{
+	unsigned long cpudata = vdso_encode_cpu_node(cpu, early_cpu_to_node(cpu));
+	struct desc_struct d = { };
+
+	if (static_cpu_has(X86_FEATURE_RDTSCP))
+		write_rdtscp_aux(cpudata);
+
+	/* Store CPU and node number in limit. */
+	d.limit0 = cpudata;
+	d.limit1 = cpudata >> 16;
+
+	d.type = 5;		/* RO data, expand down, accessed */
+	d.dpl = 3;		/* Visible to user code */
+	d.s = 1;		/* Not a system segment */
+	d.p = 1;		/* Present */
+	d.d = 1;		/* 32-bit */
+
+	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_CPU_NUMBER, &d, DESCTYPE_S);
+}
+#endif
+
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
  * initialized (naturally) in the bootstrap process, such as the GDT
@@ -1706,6 +1729,7 @@ void cpu_init(void)
 	    early_cpu_to_node(cpu) != NUMA_NO_NODE)
 		set_numa_node(early_cpu_to_node(cpu));
 #endif
+	setup_getcpu(cpu);
 
 	me = current;
 


* [tip:x86/asm] x86/segments: Introduce the 'CPUNODE' naming to better document the segment limit CPU/node NR trick
  2018-09-18 23:08 ` [PATCH v6 1/8] x86/arch_prctl/64: Make ptrace read FS/GS base accurately Chang S. Bae
  2018-10-08  9:54   ` [tip:x86/asm] x86/fsgsbase/64: Fix ptrace() to read the " tip-bot for Andy Lutomirski
@ 2018-10-08  9:59   ` tip-bot for Ingo Molnar
  2018-10-08  9:59   ` [tip:x86/asm] x86/fsgsbase/64: Clean up various details tip-bot for Ingo Molnar
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Ingo Molnar @ 2018-10-08  9:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, torvalds, dave.hansen, linux-kernel, dvlasenk, luto, mingo,
	chang.seok.bae, ravi.v.shankar, tglx, markus.t.metzger, riel,
	peterz, hpa, brgerst

Commit-ID:  22245bdf0ad805d6c29f82b6d5e977ee94bb2166
Gitweb:     https://git.kernel.org/tip/22245bdf0ad805d6c29f82b6d5e977ee94bb2166
Author:     Ingo Molnar <mingo@kernel.org>
AuthorDate: Mon, 8 Oct 2018 10:41:59 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:45:02 +0200

x86/segments: Introduce the 'CPUNODE' naming to better document the segment limit CPU/node NR trick

We have a special segment descriptor entry in the GDT, whose sole purpose is to
encode the CPU and node numbers in its limit (size) field. There are user-space
instructions that allow the reading of the limit field, which gives us a really
fast way to read the CPU and node IDs from the vDSO for example.
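The user-space side of this trick can be sketched as follows. This
assumes x86-64 Linux, a GCC-compatible compiler, and that the kernel
keeps the CPU/node descriptor at GDT entry 15 as in this series;
read_cpunode_lsl() and CPUNODE_SELECTOR are hypothetical names, and real
programs would normally call the vDSO getcpu entry point instead:

```c
/* GDT entry 15 with RPL 3, matching (GDT_ENTRY_CPUNODE*8 + 3). */
#define CPUNODE_SELECTOR	(15 * 8 + 3)

/*
 * Read CPU and node numbers from the segment limit with LSL.
 * Returns 0 on success, -1 if unsupported or the selector is not
 * readable. Either output pointer may be NULL.
 */
static int read_cpunode_lsl(unsigned int *cpu, unsigned int *node)
{
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
	unsigned int p = 0;
	unsigned char valid;

	/* LSL sets ZF when the limit was loaded; SETZ captures that. */
	__asm__ volatile("lsl %[seg], %[p]\n\t"
			 "setz %[valid]"
			 : [p] "+r" (p), [valid] "=q" (valid)
			 : [seg] "r" (CPUNODE_SELECTOR)
			 : "cc");
	if (!valid)
		return -1;

	if (cpu)
		*cpu = p & 0xfffu;	/* low 12 bits: CPU number */
	if (node)
		*node = p >> 12;	/* remaining bits: node number */
	return 0;
#else
	(void)cpu;
	(void)node;
	return -1;
#endif
}
```

The result is only a snapshot: the thread may be migrated to another CPU
immediately after the instruction completes.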

But the naming of the related functionality does not make this clear at all:

	VDSO_CPU_SIZE
	VDSO_CPU_MASK
	__CPU_NUMBER_SEG
	GDT_ENTRY_CPU_NUMBER
	vdso_encode_cpu_node
	vdso_read_cpu_node

There are a number of problems:

 - The 'VDSO_CPU_SIZE' name doesn't make it clear that this is a number
   of bits, nor does it make it clear which 'CPU' it refers to, i.e.
   that this is about a GDT entry whose limit encodes the CPU and node number.

 - Furthermore, the 'CPU_NUMBER' naming is actively misleading as well,
   because the segment limit encodes not just the CPU number but the
   node ID as well ...

So use a better nomenclature all around: name everything related to this trick
as 'CPUNODE', to make it clear that this is something special, and add
_BITS to make it clear that this is a number of bits, and propagate this to
every affected name:

	VDSO_CPU_SIZE         =>  VDSO_CPUNODE_BITS
	VDSO_CPU_MASK         =>  VDSO_CPUNODE_MASK
	__CPU_NUMBER_SEG      =>  __CPUNODE_SEG
	GDT_ENTRY_CPU_NUMBER  =>  GDT_ENTRY_CPUNODE
	vdso_encode_cpu_node  =>  vdso_encode_cpunode
	vdso_read_cpu_node    =>  vdso_read_cpunode

This, beyond being less confusing, also makes it easier to grep for all related
functionality:

  $ git grep -i cpunode arch/x86

Also, while at it, fix "return is not a function" style sloppiness in vdso_encode_cpunode().

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1537312139-5580-2-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vgetcpu.c  |  2 +-
 arch/x86/include/asm/segment.h | 22 +++++++++++-----------
 arch/x86/kernel/cpu/common.c   |  4 ++--
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/entry/vdso/vgetcpu.c b/arch/x86/entry/vdso/vgetcpu.c
index de78fc9cd963..edd214f5264d 100644
--- a/arch/x86/entry/vdso/vgetcpu.c
+++ b/arch/x86/entry/vdso/vgetcpu.c
@@ -13,7 +13,7 @@
 notrace long
 __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
 {
-	vdso_read_cpu_node(cpu, node);
+	vdso_read_cpunode(cpu, node);
 	return 0;
 }
 
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 4d1f6cc62e13..a314087add07 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -186,7 +186,7 @@
 #define GDT_ENTRY_TLS_MIN		12
 #define GDT_ENTRY_TLS_MAX		14
 
-#define GDT_ENTRY_CPU_NUMBER		15
+#define GDT_ENTRY_CPUNODE		15
 
 /*
  * Number of entries in the GDT table:
@@ -206,7 +206,7 @@
 #define __USER_DS			(GDT_ENTRY_DEFAULT_USER_DS*8 + 3)
 #define __USER32_DS			__USER_DS
 #define __USER_CS			(GDT_ENTRY_DEFAULT_USER_CS*8 + 3)
-#define __CPU_NUMBER_SEG		(GDT_ENTRY_CPU_NUMBER*8 + 3)
+#define __CPUNODE_SEG			(GDT_ENTRY_CPUNODE*8 + 3)
 
 #endif
 
@@ -227,24 +227,24 @@
 #ifdef CONFIG_X86_64
 
 /* Bit size and mask of CPU number stored in the per CPU data (and TSC_AUX) */
-#define VDSO_CPU_SIZE			12
-#define VDSO_CPU_MASK			0xfff
+#define VDSO_CPUNODE_BITS		12
+#define VDSO_CPUNODE_MASK		0xfff
 
 #ifndef __ASSEMBLY__
 
 /* Helper functions to store/load CPU and node numbers */
 
-static inline unsigned long vdso_encode_cpu_node(int cpu, unsigned long node)
+static inline unsigned long vdso_encode_cpunode(int cpu, unsigned long node)
 {
-	return ((node << VDSO_CPU_SIZE) | cpu);
+	return (node << VDSO_CPUNODE_BITS) | cpu;
 }
 
-static inline void vdso_read_cpu_node(unsigned *cpu, unsigned *node)
+static inline void vdso_read_cpunode(unsigned *cpu, unsigned *node)
 {
 	unsigned int p;
 
 	/*
-	 * Load CPU and node number from GDT.  LSL is faster than RDTSCP
+	 * Load CPU and node number from the GDT.  LSL is faster than RDTSCP
 	 * and works on all CPUs.  This is volatile so that it orders
 	 * correctly with respect to barrier() and to keep GCC from cleverly
 	 * hoisting it out of the calling function.
@@ -254,12 +254,12 @@ static inline void vdso_read_cpu_node(unsigned *cpu, unsigned *node)
 	alternative_io ("lsl %[seg],%[p]",
 			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
 			X86_FEATURE_RDPID,
-			[p] "=a" (p), [seg] "r" (__CPU_NUMBER_SEG));
+			[p] "=a" (p), [seg] "r" (__CPUNODE_SEG));
 
 	if (cpu)
-		*cpu = (p & VDSO_CPU_MASK);
+		*cpu = (p & VDSO_CPUNODE_MASK);
 	if (node)
-		*node = (p >> VDSO_CPU_SIZE);
+		*node = (p >> VDSO_CPUNODE_BITS);
 }
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a148d18a1ef0..7da587f4af52 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1672,7 +1672,7 @@ static void wait_for_master_cpu(int cpu)
 #ifdef CONFIG_X86_64
 static void setup_getcpu(int cpu)
 {
-	unsigned long cpudata = vdso_encode_cpu_node(cpu, early_cpu_to_node(cpu));
+	unsigned long cpudata = vdso_encode_cpunode(cpu, early_cpu_to_node(cpu));
 	struct desc_struct d = { };
 
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
@@ -1688,7 +1688,7 @@ static void setup_getcpu(int cpu)
 	d.p = 1;		/* Present */
 	d.d = 1;		/* 32-bit */
 
-	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_CPU_NUMBER, &d, DESCTYPE_S);
+	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_CPUNODE, &d, DESCTYPE_S);
 }
 #endif
 


* [tip:x86/asm] x86/fsgsbase/64: Clean up various details
  2018-09-18 23:08 ` [PATCH v6 1/8] x86/arch_prctl/64: Make ptrace read FS/GS base accurately Chang S. Bae
  2018-10-08  9:54   ` [tip:x86/asm] x86/fsgsbase/64: Fix ptrace() to read the " tip-bot for Andy Lutomirski
  2018-10-08  9:59   ` [tip:x86/asm] x86/segments: Introduce the 'CPUNODE' naming to better document the segment limit CPU/node NR trick tip-bot for Ingo Molnar
@ 2018-10-08  9:59   ` tip-bot for Ingo Molnar
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Ingo Molnar @ 2018-10-08  9:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, luto, tglx, riel, mingo, peterz, brgerst, dvlasenk,
	linux-kernel, chang.seok.bae, torvalds, hpa, bp, ravi.v.shankar,
	markus.t.metzger

Commit-ID:  ec3a94188df7d28b374868d9a2a0face910e62ab
Gitweb:     https://git.kernel.org/tip/ec3a94188df7d28b374868d9a2a0face910e62ab
Author:     Ingo Molnar <mingo@kernel.org>
AuthorDate: Mon, 8 Oct 2018 10:41:59 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 Oct 2018 10:45:04 +0200

x86/fsgsbase/64: Clean up various details

So:

 - use 'extern' consistently for APIs

 - fix weird header guard

 - clarify code comments

 - reorder APIs by type

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1537312139-5580-2-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vgetcpu.c   |  1 +
 arch/x86/include/asm/fsgsbase.h | 22 ++++++++++++----------
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/vdso/vgetcpu.c b/arch/x86/entry/vdso/vgetcpu.c
index edd214f5264d..f86ab0ae1777 100644
--- a/arch/x86/entry/vdso/vgetcpu.c
+++ b/arch/x86/entry/vdso/vgetcpu.c
@@ -14,6 +14,7 @@ notrace long
 __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
 {
 	vdso_read_cpunode(cpu, node);
+
 	return 0;
 }
 
diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index 5e9cbcce318a..eb377b6e9eed 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _ASM_FSGSBASE_H
-#define _ASM_FSGSBASE_H 1
+#define _ASM_FSGSBASE_H
 
 #ifndef __ASSEMBLY__
 
@@ -9,14 +9,15 @@
 #include <asm/msr-index.h>
 
 /*
- * Read/write a task's fsbase or gsbase. This returns the value that
+ * Read/write a task's FSBASE or GSBASE. This returns the value that
  * the FS/GS base would have (if the task were to be resumed). These
- * work on current or on a different non-running task.
+ * work on the current task or on a non-running (typically stopped
+ * ptrace child) task.
  */
-unsigned long x86_fsbase_read_task(struct task_struct *task);
-unsigned long x86_gsbase_read_task(struct task_struct *task);
-int x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
-int x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);
+extern unsigned long x86_fsbase_read_task(struct task_struct *task);
+extern unsigned long x86_gsbase_read_task(struct task_struct *task);
+extern int x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
+extern int x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);
 
 /* Helper functions for reading/writing FS/GS base */
 
@@ -25,20 +26,21 @@ static inline unsigned long x86_fsbase_read_cpu(void)
 	unsigned long fsbase;
 
 	rdmsrl(MSR_FS_BASE, fsbase);
+
 	return fsbase;
 }
 
-void x86_fsbase_write_cpu(unsigned long fsbase);
-
 static inline unsigned long x86_gsbase_read_cpu_inactive(void)
 {
 	unsigned long gsbase;
 
 	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+
 	return gsbase;
 }
 
-void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
+extern void x86_fsbase_write_cpu(unsigned long fsbase);
+extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
 #endif /* CONFIG_X86_64 */
 

* [regression in -rc1] Re: [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions
  2018-09-18 23:08 ` [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions Chang S. Bae
  2018-10-08  9:55   ` [tip:x86/asm] " tip-bot for Chang S. Bae
@ 2018-10-24 19:01   ` Andy Lutomirski
  2018-10-24 19:13     ` Bae, Chang Seok
  2018-10-25 22:37     ` Andy Lutomirski
  1 sibling, 2 replies; 27+ messages in thread
From: Andy Lutomirski @ 2018-10-24 19:01 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Ingo Molnar, Thomas Gleixner, Andrew Lutomirski, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Tue, Sep 18, 2018 at 4:09 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> With new helpers, FS/GS base access is centralized.
> Eventually, when FSGSBASE instruction enabled, it will
> be faster.

Sorry for not catching this during review, but:

> +void x86_fsbase_write_cpu(unsigned long fsbase)
> +{
> +       /*
> +        * Set the selector to 0 as a notion, that the segment base is
> +        * overwritten, which will be checked for skipping the segment load
> +        * during context switch.
> +        */
> +       loadseg(FS, 0);

^^^

what?

> +       wrmsrl(MSR_FS_BASE, fsbase);
> +}

I don't understand what the comment is trying to say, but the sole
caller so far of this function is x86_gsbase_write_task(), and the
code looks incorrect.

Ingo, I think we need to address this during this merge window,
probably by removing the comment and the loadseg() call (and the same
for gsbase...inactive).  But first, Chang, can you explain what
exactly your intent is here?

* RE: [regression in -rc1] Re: [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions
  2018-10-24 19:01   ` [regression in -rc1] Re: [PATCH v6 2/8] " Andy Lutomirski
@ 2018-10-24 19:13     ` Bae, Chang Seok
  2018-10-24 19:22       ` Andy Lutomirski
  2018-10-25 22:37     ` Andy Lutomirski
  1 sibling, 1 reply; 27+ messages in thread
From: Bae, Chang Seok @ 2018-10-24 19:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Metzger, Markus T, Shankar, Ravi V, LKML

On Tue, Sep 18, 2018 at 12:02 PM Andy Lutomirski <luto@kernel.org>
> On Tue, Sep 18, 2018 at 4:09 PM Chang S. Bae <chang.seok.bae@intel.com>
> wrote:
> >
> > With new helpers, FS/GS base access is centralized.
> > Eventually, when FSGSBASE instruction enabled, it will be faster.
> 
> Sorry for not catching this during review, but:
> 
> > +void x86_fsbase_write_cpu(unsigned long fsbase)
> > +{
> > +       /*
> > +        * Set the selector to 0 as a notion, that the segment base is
> > +        * overwritten, which will be checked for skipping the segment load
> > +        * during context switch.
> > +        */
> > +       loadseg(FS, 0);
> 
> ^^^
> 
> what?
> 
> > +       wrmsrl(MSR_FS_BASE, fsbase);
> > +}
> 
> I don't understand what the comment is trying to say, but the sole caller so far
> of this function is x86_gsbase_write_task(), and the code looks incorrect.
> 
> Ingo, I think we need to address this during this merge window, probably by
> removing the comment and the loadseg() call (and the same for
> gsbase...inactive).  But first, Chang, can you explain what exactly your intent is
> here?

It's coming from do_arch_prctl_64(). If you think it really makes confusion in 
x86_fsbase_write_cpu(), how about moving it to x86_fsbase_write_task()?

Chang

* Re: [regression in -rc1] Re: [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions
  2018-10-24 19:13     ` Bae, Chang Seok
@ 2018-10-24 19:22       ` Andy Lutomirski
  2018-10-24 19:29         ` Bae, Chang Seok
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2018-10-24 19:22 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andrew Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Wed, Oct 24, 2018 at 12:13 PM Bae, Chang Seok
<chang.seok.bae@intel.com> wrote:
>
> On Tue, Sep 18, 2018 at 12:02 PM Andy Lutomirski <luto@kernel.org>
> > On Tue, Sep 18, 2018 at 4:09 PM Chang S. Bae <chang.seok.bae@intel.com>
> > wrote:
> > >
> > > With new helpers, FS/GS base access is centralized.
> > > Eventually, when FSGSBASE instruction enabled, it will be faster.
> >
> > Sorry for not catching this during review, but:
> >
> > > +void x86_fsbase_write_cpu(unsigned long fsbase)
> > > +{
> > > +       /*
> > > +        * Set the selector to 0 as a notion, that the segment base is
> > > +        * overwritten, which will be checked for skipping the segment load
> > > +        * during context switch.
> > > +        */
> > > +       loadseg(FS, 0);
> >
> > ^^^
> >
> > what?
> >
> > > +       wrmsrl(MSR_FS_BASE, fsbase);
> > > +}
> >
> > I don't understand what the comment is trying to say, but the sole caller so far
> > of this function is x86_gsbase_write_task(), and the code looks incorrect.
> >
> > Ingo, I think we need to address this during this merge window, probably by
> > removing the comment and the loadseg() call (and the same for
> > gsbase...inactive).  But first, Chang, can you explain what exactly your intent is
> > here?
>
> It's coming from do_arch_prctl_64(). If you think it really makes confusion in
> x86_fsbase_write_cpu(), how about moving it to x86_fsbase_write_task()?

Why should ..write_task() magically change the index but only if it's
writing current?

I think you should move it all the way out to the caller
(do_arch_prctl_64()?) and we can see if it makes sense there.

>
> Chang

* RE: [regression in -rc1] Re: [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions
  2018-10-24 19:22       ` Andy Lutomirski
@ 2018-10-24 19:29         ` Bae, Chang Seok
  2018-10-24 19:43           ` Andy Lutomirski
  0 siblings, 1 reply; 27+ messages in thread
From: Bae, Chang Seok @ 2018-10-24 19:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Metzger, Markus T, Shankar, Ravi V, LKML

On Wed, Oct 24, 2018 at 12:22 PM Andy Lutomirski <luto@kernel.org>
> On Wed, Oct 24, 2018 at 12:13 PM Bae, Chang Seok
> <chang.seok.bae@intel.com> wrote:
> >
> > On Tue, Sep 18, 2018 at 12:02 PM Andy Lutomirski <luto@kernel.org>
> > > On Tue, Sep 18, 2018 at 4:09 PM Chang S. Bae
> > > <chang.seok.bae@intel.com>
> > > wrote:
> > > >
> > > > With new helpers, FS/GS base access is centralized.
> > > > Eventually, when FSGSBASE instruction enabled, it will be faster.
> > >
> > > Sorry for not catching this during review, but:
> > >
> > > > +void x86_fsbase_write_cpu(unsigned long fsbase)
> > > > +{
> > > > +       /*
> > > > +        * Set the selector to 0 as a notion, that the segment base is
> > > > +        * overwritten, which will be checked for skipping the segment load
> > > > +        * during context switch.
> > > > +        */
> > > > +       loadseg(FS, 0);
> > >
> > > ^^^
> > >
> > > what?
> > >
> > > > +       wrmsrl(MSR_FS_BASE, fsbase);
> > > > +}
> > >
> > > I don't understand what the comment is trying to say, but the sole
> > > caller so far of this function is x86_gsbase_write_task(), and the
> > > code looks incorrect.
> > >
> > > Ingo, I think we need to address this during this merge window,
> > > probably by removing the comment and the loadseg() call (and the
> > > same for gsbase...inactive).  But first, Chang, can you explain what
> > > exactly your intent is here?
> >
> > It's coming from do_arch_prctl_64(). If you think it really makes
> > confusion in x86_fsbase_write_cpu(), how about moving it to
> > x86_fsbase_write_task()?
> 
> Why should ..write_task() magically change the index but only if it's writing
> current?
> 
> I think you should move it all the way out to the caller
> (do_arch_prctl_64()?) and we can see if it makes sense there.
> 

Okay. x86_fsbase_write_task() doesn't make sense. 
Then, it should rollback that helper and call x86_fsbase_write_cpu() only
from ptrace. Same for gsbase. Sounds okay?

Chang


* Re: [regression in -rc1] Re: [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions
  2018-10-24 19:29         ` Bae, Chang Seok
@ 2018-10-24 19:43           ` Andy Lutomirski
  2018-10-24 22:50             ` Bae, Chang Seok
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2018-10-24 19:43 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andrew Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Wed, Oct 24, 2018 at 12:29 PM Bae, Chang Seok
<chang.seok.bae@intel.com> wrote:
>
> On Wed, Oct 24, 2018 at 12:22 PM Andy Lutomirski <luto@kernel.org>
> > On Wed, Oct 24, 2018 at 12:13 PM Bae, Chang Seok
> > <chang.seok.bae@intel.com> wrote:
> > >
> > > On Tue, Sep 18, 2018 at 12:02 PM Andy Lutomirski <luto@kernel.org>
> > > > On Tue, Sep 18, 2018 at 4:09 PM Chang S. Bae
> > > > <chang.seok.bae@intel.com>
> > > > wrote:
> > > > >
> > > > > With new helpers, FS/GS base access is centralized.
> > > > > Eventually, when FSGSBASE instruction enabled, it will be faster.
> > > >
> > > > Sorry for not catching this during review, but:
> > > >
> > > > > +void x86_fsbase_write_cpu(unsigned long fsbase)
> > > > > +{
> > > > > +       /*
> > > > > +        * Set the selector to 0 as a notion, that the segment base is
> > > > > +        * overwritten, which will be checked for skipping the segment load
> > > > > +        * during context switch.
> > > > > +        */
> > > > > +       loadseg(FS, 0);
> > > >
> > > > ^^^
> > > >
> > > > what?
> > > >
> > > > > +       wrmsrl(MSR_FS_BASE, fsbase);
> > > > > +}
> > > >
> > > > I don't understand what the comment is trying to say, but the sole
> > > > caller so far of this function is x86_gsbase_write_task(), and the
> > > > code looks incorrect.
> > > >
> > > > Ingo, I think we need to address this during this merge window,
> > > > probably by removing the comment and the loadseg() call (and the
> > > > same for gsbase...inactive).  But first, Chang, can you explain what
> > > > exactly your intent is here?
> > >
> > > It's coming from do_arch_prctl_64(). If you think it really makes
> > > confusion in x86_fsbase_write_cpu(), how about moving it to
> > > x86_fsbase_write_task()?
> >
> > Why should ..write_task() magically change the index but only if it's writing
> > current?
> >
> > I think you should move it all the way out to the caller
> > (do_arch_prctl_64()?) and we can see if it makes sense there.
> >
>
> Okay. x86_fsbase_write_task() doesn't make sense.
> Then, it should rollback that helper and call x86_fsbase_write_cpu() only
> from ptrace. Same for gsbase. Sounds okay?
>

I think x86_fsbase_write_task() makes plenty of sense, but I think
that callers need to be aware that the effect of writing a nonzero
fsbase *and* a nonzero fsindex is bizarre on non-FSGSBASE systems.  So
that code should go in the callers.  The oddities involved have little
to do with whether the caller is writing to current or to something
else.

Arguably the code should be entirely split out into the code that
writes current (arch_prctl()) and the code that writes a stopped task
(ptrace).  I don't think there are any code paths that genuinely can
write either.
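
To make the oddity concrete, here is a minimal userspace C model (not kernel code) of the rule that x86_fsbase_read_task() in process_64.c applies to a non-running task: the saved base only matters when the saved selector is 0, otherwise the base comes from the descriptor the selector names. The names `struct thread_model`, `desc_base[]`, `desc_base_lookup()` and `model_fsbase_on_resume()` are hypothetical and exist only for this illustration, and the model ignores the GDT/LDT distinction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two thread_struct fields discussed here. */
struct thread_model {
	unsigned short fsindex;	/* saved FS selector */
	unsigned long fsbase;	/* saved FS base, e.g. set via MSR_FS_BASE */
};

/* Hypothetical descriptor table: table index -> segment base. */
static const unsigned long desc_base[] = { 0x0, 0x1000, 0x2000 };

/* Look up the base for a selector; bits 3..15 hold the table index. */
static unsigned long desc_base_lookup(unsigned short sel)
{
	size_t idx = sel >> 3;

	if (idx >= sizeof(desc_base) / sizeof(desc_base[0]))
		return 0;
	return desc_base[idx];
}

/* Base the task would see when resumed on a non-FSGSBASE CPU (model only). */
static unsigned long model_fsbase_on_resume(const struct thread_model *t)
{
	if (t->fsindex == 0)
		return t->fsbase;		/* null selector: saved base is used */
	return desc_base_lookup(t->fsindex);	/* nonzero selector: descriptor wins */
}
```

In this model a task saved with a nonzero selector resumes with the descriptor's base no matter what was stored in fsbase, which is presumably why a caller that sets a base also wants the selector cleared.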

--Andy

* RE: [regression in -rc1] Re: [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions
  2018-10-24 19:43           ` Andy Lutomirski
@ 2018-10-24 22:50             ` Bae, Chang Seok
  0 siblings, 0 replies; 27+ messages in thread
From: Bae, Chang Seok @ 2018-10-24 22:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Metzger, Markus T, Shankar, Ravi V, LKML

On Wed, Oct 24, 2018 at 12:43 PM Andy Lutomirski <luto@kernel.org> wrote:
> I think x86_fsbase_write_task() makes plenty of sense, but I think
> that callers need to be aware that the effect of writing a nonzero
> fsbase *and* a nonzero fsindex is bizarre on non-FSGSBASE systems.  So
> that code should go in the callers.  The oddities involved have little
> to do with whether the caller is writing to current or to something
> else.
> 
> Arguably the code should be entirely split out into the code that
> writes current (arch_prctl()) and the code that writes a stopped task
> (ptrace).  I don't think there are any code paths that genuinely can
> write either.
> 

Can you check whether this patch is close to what you have in mind?

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index d6674a425714..5f986e15842e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -341,19 +341,11 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
 
 void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-	/*
-	 * Set the selector to 0 as a notion, that the segment base is
-	 * overwritten, which will be checked for skipping the segment load
-	 * during context switch.
-	 */
-	loadseg(FS, 0);
 	wrmsrl(MSR_FS_BASE, fsbase);
 }
 
 void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
 {
-	/* Set the selector to 0 for the same reason as %fs above. */
-	loadseg(GS, 0);
 	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
 }
 
@@ -361,9 +353,7 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)
 {
 	unsigned long fsbase;
 
-	if (task == current)
-		fsbase = x86_fsbase_read_cpu();
-	else if (task->thread.fsindex == 0)
+	if (task->thread.fsindex == 0)
 		fsbase = task->thread.fsbase;
 	else
 		fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -375,9 +365,7 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
 {
 	unsigned long gsbase;
 
-	if (task == current)
-		gsbase = x86_gsbase_read_cpu_inactive();
-	else if (task->thread.gsindex == 0)
+	if (task->thread.gsindex == 0)
 		gsbase = task->thread.gsbase;
 	else
 		gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
@@ -396,8 +384,6 @@ int x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase)
 
 	preempt_disable();
 	task->thread.fsbase = fsbase;
-	if (task == current)
-		x86_fsbase_write_cpu(fsbase);
 	task->thread.fsindex = 0;
 	preempt_enable();
 
@@ -411,8 +397,6 @@ int x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase)
 
 	preempt_disable();
 	task->thread.gsbase = gsbase;
-	if (task == current)
-		x86_gsbase_write_cpu_inactive(gsbase);
 	task->thread.gsindex = 0;
 	preempt_enable();
 
@@ -761,20 +745,42 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 	switch (option) {
 	case ARCH_SET_GS: {
 		ret = x86_gsbase_write_task(task, arg2);
+		if (task == current && ret == 0) {
+			preempt_disable();
+			loadseg(GS, 0);
+			x86_gsbase_write_cpu_inactive(arg2);
+			preempt_enable();
+		}
 		break;
 	}
 	case ARCH_SET_FS: {
 		ret = x86_fsbase_write_task(task, arg2);
+		if (task == current && ret == 0) {
+			preempt_disable();
+			loadseg(FS, 0);
+			x86_fsbase_write_cpu(arg2);
+			preempt_enable();
+		}
 		break;
 	}
 	case ARCH_GET_FS: {
-		unsigned long base = x86_fsbase_read_task(task);
+		unsigned long base;
+
+		if (task == current)
+			base = x86_fsbase_read_cpu();
+		else
+			base = x86_fsbase_read_task(task);
 
 		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
 	case ARCH_GET_GS: {
-		unsigned long base = x86_gsbase_read_task(task);
+		unsigned long base;
+
+		if (task == current)
+			base = x86_gsbase_read_cpu_inactive();
+		else
+			base = x86_gsbase_read_task(task);
 
 		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
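
The user-visible behaviour that the ARCH_SET_GS and ARCH_GET_GS cases above must preserve can be checked from userspace. The sketch below is only an illustration: it assumes x86-64 Linux with the UAPI header <asm/prctl.h> installed, and `gs_base_round_trip()` and `scratch` are hypothetical names. It uses GS rather than FS because glibc uses FS for thread-local storage on x86-64.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <assert.h>

#if defined(__linux__) && defined(__x86_64__)
#include <asm/prctl.h>		/* ARCH_SET_GS, ARCH_GET_GS */
#include <sys/syscall.h>	/* SYS_arch_prctl */
#include <unistd.h>		/* syscall() */

static unsigned long scratch;

/* Returns 1 if ARCH_GET_GS reads back what ARCH_SET_GS wrote, -1 on error. */
static int gs_base_round_trip(void)
{
	unsigned long old = 0, now = 0;
	unsigned long want = (unsigned long)&scratch;

	if (syscall(SYS_arch_prctl, ARCH_GET_GS, &old) != 0)
		return -1;
	if (syscall(SYS_arch_prctl, ARCH_SET_GS, want) != 0)
		return -1;
	if (syscall(SYS_arch_prctl, ARCH_GET_GS, &now) != 0)
		return -1;

	/* Put the previous base back so the caller is unaffected. */
	syscall(SYS_arch_prctl, ARCH_SET_GS, old);

	return now == want ? 1 : -1;
}
#else
/* Not x86-64 Linux: arch_prctl() is unavailable, so report "skipped". */
static int gs_base_round_trip(void)
{
	return 0;
}
#endif
```

A caller would treat a return value of 1 as a successful round trip and 0 as "not applicable on this platform".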

* Re: [regression in -rc1] Re: [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions
  2018-10-24 19:01   ` [regression in -rc1] Re: [PATCH v6 2/8] " Andy Lutomirski
  2018-10-24 19:13     ` Bae, Chang Seok
@ 2018-10-25 22:37     ` Andy Lutomirski
  1 sibling, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2018-10-25 22:37 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Bae, Chang Seok, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Wed, Oct 24, 2018 at 12:01 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Tue, Sep 18, 2018 at 4:09 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >
> > With new helpers, FS/GS base access is centralized.
> > Eventually, when FSGSBASE instruction enabled, it will
> > be faster.
>
> Sorry for not catching this during review, but:
>
> > +void x86_fsbase_write_cpu(unsigned long fsbase)
> > +{
> > +       /*
> > +        * Set the selector to 0 as a notion, that the segment base is
> > +        * overwritten, which will be checked for skipping the segment load
> > +        * during context switch.
> > +        */
> > +       loadseg(FS, 0);
>
> ^^^
>
> what?
>
> > +       wrmsrl(MSR_FS_BASE, fsbase);
> > +}
>
> I don't understand what the comment is trying to say, but the sole
> caller so far of this function is x86_gsbase_write_task(), and the
> code looks incorrect.
>
> Ingo, I think we need to address this during this merge window,
> probably by removing the comment and the loadseg() call (and the same
> for gsbase...inactive).  But first, Chang, can you explain what
> exactly your intent is here?

It might not be a problem for the current merge window, since the one
and only caller (I think) that hits this code is okay with it.  But it
might still be nice to have it cleaned up in Linus' tree.

end of thread, other threads:[~2018-10-25 22:37 UTC | newest]

Thread overview: 27+ messages
2018-09-18 23:08 [PATCH v6 0/8] x86: infrastructure to enable FSGSBASE Chang S. Bae
2018-09-18 23:08 ` [PATCH v6 1/8] x86/arch_prctl/64: Make ptrace read FS/GS base accurately Chang S. Bae
2018-10-08  9:54   ` [tip:x86/asm] x86/fsgsbase/64: Fix ptrace() to read the " tip-bot for Andy Lutomirski
2018-10-08  9:59   ` [tip:x86/asm] x86/segments: Introduce the 'CPUNODE' naming to better document the segment limit CPU/node NR trick tip-bot for Ingo Molnar
2018-10-08  9:59   ` [tip:x86/asm] x86/fsgsbase/64: Clean up various details tip-bot for Ingo Molnar
2018-09-18 23:08 ` [PATCH v6 2/8] x86/fsgsbase/64: Introduce FS/GS base helper functions Chang S. Bae
2018-10-08  9:55   ` [tip:x86/asm] " tip-bot for Chang S. Bae
2018-10-24 19:01   ` [regression in -rc1] Re: [PATCH v6 2/8] " Andy Lutomirski
2018-10-24 19:13     ` Bae, Chang Seok
2018-10-24 19:22       ` Andy Lutomirski
2018-10-24 19:29         ` Bae, Chang Seok
2018-10-24 19:43           ` Andy Lutomirski
2018-10-24 22:50             ` Bae, Chang Seok
2018-10-25 22:37     ` Andy Lutomirski
2018-09-18 23:08 ` [PATCH v6 3/8] x86/fsgsbase/64: Make ptrace use correct FS/GS base helpers Chang S. Bae
2018-10-08  9:56   ` [tip:x86/asm] x86/fsgsbase/64: Make ptrace use the new " tip-bot for Chang S. Bae
2018-09-18 23:08 ` [PATCH v6 4/8] x86/fsgsbase/64: Use FS/GS base helpers in core dump Chang S. Bae
2018-10-08  9:56   ` [tip:x86/asm] x86/fsgsbase/64: Convert the ELF core dump code to the new FSGSBASE helpers tip-bot for Chang S. Bae
2018-09-18 23:08 ` [PATCH v6 5/8] x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to() Chang S. Bae
2018-10-08  9:57   ` [tip:x86/asm] x86/fsgsbase/64: Factor out FS/GS segment loading " tip-bot for Chang S. Bae
2018-09-18 23:08 ` [PATCH v6 6/8] x86/segments/64: Rename PER_CPU segment to CPU_NUMBER Chang S. Bae
2018-10-08  9:57   ` [tip:x86/asm] x86/segments/64: Rename the GDT PER_CPU entry " tip-bot for Chang S. Bae
2018-09-18 23:08 ` [PATCH v6 7/8] x86/vdso: Introduce helper functions for CPU and node number Chang S. Bae
2018-10-08  9:58   ` [tip:x86/asm] " tip-bot for Chang S. Bae
2018-09-18 23:08 ` [PATCH v6 8/8] x86/vdso: Move out the CPU initialization Chang S. Bae
2018-10-08  8:36   ` Ingo Molnar
2018-10-08  9:58   ` [tip:x86/asm] x86/vdso: Initialize the CPU/node NR segment descriptor earlier tip-bot for Chang S. Bae
