linux-kernel.vger.kernel.org archive mirror
* [v3 00/12] x86: Enable FSGSBASE instructions
@ 2018-10-23 18:42 Chang S. Bae
  2018-10-23 18:42 ` [v3 01/12] taint: Introduce a new taint flag (insecure) Chang S. Bae
                   ` (11 more replies)
  0 siblings, 12 replies; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

FSGSBASE is a 64-bit x86 instruction-set extension that allows reading and
writing FS/GSBASE at any privilege level. Although it was introduced with
Ivy Bridge, the enabling effort has dragged on for quite a long time, for
various reasons [2,3,4]. After extended discussions [1], new ABIs are
finally introduced to manage FS/GSBASE separately from the selectors.

Benefits:
Some performance benefit is expected in the context switch by skipping the
MSR write for GSBASE. User-level programs (such as Java-based ones) may
benefit from avoiding system calls to modify FS/GSBASE.

Major changes in the kernel:
* In a context switch, a thread's FS/GSBASE is now preserved regardless of
its selector, based on the discussion [1].
* (Consequently) a ptracer should expect the FS/GS index and base values to
diverge. There was a contentious debate about backward-compatibility
concerns (mostly for GDB [7,8]). We finally concluded that the impact is
insignificant in real usage.
* On the paranoid entry, GSBASE is updated to point to the per-CPU base,
and the original GSBASE is restored at the exit.

Virtualization:
An FSGSBASE-enabled VM can run on a host with either hardware
virtualization or software emulation. KVM advertises FSGSBASE when the
physical CPU has it. The emulation is supported in QEMU/TCG [5]. In a pool
of mixed systems, the VMM may disable FSGSBASE for seamless VM migration
[6].

Updates from v2 [10]:
* Separated out the preparatory patches [11] (now merged into the tip tree)
* Split the paranoid_entry update patch to keep the series bisectable
* Fixed minor nits

Updates from v1 [9]:
* Updated the GSBASE update mechanism on the paranoid entry/exit.
* Excluded the ptracer backward-compatibility patches.
* Included FSGSBASE documentation and capability enumeration for user
space.
* Added the TAINT_INSECURE flag.

[1] Recent discussion on LKML:
https://marc.info/?t=150147053700001&r=1&w=2
[2] Andy Lutomirski's patch work:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/fsgsbase
[3] Patchset shown in 2016:
https://lore.kernel.org/patchwork/patch/660520
[4] Patchset shown in 2014:
https://lore.kernel.org/patchwork/patch/460288
[5] QEMU with FSGSBASE emulation:
https://github.com/qemu/qemu/blob/026aaf47c02b79036feb830206cfebb2a726510d/target/i386/translate.c#L8186
[6] 5-level EPT:
http://lkml.kernel.org/r/9ddf602b-6c8b-8c1e-ab46-07ed12366593@redhat.com
[7] RR/FSGSBASE:
https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
[8] CRIU/FSGSBASE:
https://lists.openvz.org/pipermail/criu/2018-March/040654.html
[9] Version 1:
https://lore.kernel.org/patchwork/cover/934843
[10] Version 2:
https://lore.kernel.org/patchwork/cover/912063
[11] x86: infra to enable FSGSBASE
https://lore.kernel.org/patchwork/cover/988180

Andi Kleen (3):
  x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions
  x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  x86/fsgsbase/64: Add documentation for FSGSBASE

Andy Lutomirski (4):
  x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is
    on
  selftests/x86/fsgsbase: Test WRGSBASE
  x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit

Chang S. Bae (5):
  taint: Introduce a new taint flag (insecure)
  x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions
    if available
  x86/fsgsbase/64: Introduce the new FIND_PERCPU_BASE macro
  x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry

 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/sysctl/kernel.txt               |   1 +
 Documentation/x86/fsgs.txt                    | 104 +++++++++++++
 arch/x86/entry/entry_64.S                     |  73 +++++++--
 arch/x86/include/asm/fsgsbase.h               | 140 +++++++++++++++++-
 arch/x86/include/asm/inst.h                   |  15 ++
 arch/x86/include/uapi/asm/hwcap2.h            |   3 +
 arch/x86/kernel/cpu/common.c                  |  22 +++
 arch/x86/kernel/process_64.c                  | 128 +++++++++++++---
 include/linux/kernel.h                        |   3 +-
 kernel/panic.c                                |   1 +
 tools/testing/selftests/x86/fsgsbase.c        | 110 +++++++++++++-
 12 files changed, 556 insertions(+), 46 deletions(-)
 create mode 100644 Documentation/x86/fsgs.txt

--
2.19.1



* [v3 01/12] taint: Introduce a new taint flag (insecure)
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-24 18:50   ` Andy Lutomirski
  2018-10-23 18:42 ` [v3 02/12] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

For testing (or root-only) purposes, the new flag serves to tag the kernel
taint accurately.

When support for a new feature is added, patches need to be applied and
tested incrementally, with temporary parameters. Currently, there is no
taint flag for this usage.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 Documentation/sysctl/kernel.txt | 1 +
 include/linux/kernel.h          | 3 ++-
 kernel/panic.c                  | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 37a679501ddc..d682a6551365 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -1017,6 +1017,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
  32768 (K): The kernel has been live patched.
  65536 (X): Auxiliary taint, defined and used by for distros.
 131072 (T): The kernel was built with the struct randomization plugin.
+262144 (Z): The kernel is running in a known insecure configuration.
 
 ==============================================================
 
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d6aac75b51ba..38b2657d62d5 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -598,7 +598,8 @@ extern enum system_states {
 #define TAINT_LIVEPATCH			15
 #define TAINT_AUX			16
 #define TAINT_RANDSTRUCT		17
-#define TAINT_FLAGS_COUNT		18
+#define TAINT_INSECURE			18
+#define TAINT_FLAGS_COUNT		19
 
 struct taint_flag {
 	char c_true;	/* character printed when tainted */
diff --git a/kernel/panic.c b/kernel/panic.c
index 8b2e002d52eb..8db2ed9ec290 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -327,6 +327,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
 	[ TAINT_LIVEPATCH ]		= { 'K', ' ', true },
 	[ TAINT_AUX ]			= { 'X', ' ', true },
 	[ TAINT_RANDSTRUCT ]		= { 'T', ' ', true },
+	[ TAINT_INSECURE ]		= { 'Z', ' ', false },
 };
 
 /**
-- 
2.19.1



* [v3 02/12] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
  2018-10-23 18:42 ` [v3 01/12] taint: Introduce a new taint flag (insecure) Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-24 18:51   ` Andy Lutomirski
  2018-10-23 18:42 ` [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions Chang S. Bae
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

From: Andy Lutomirski <luto@kernel.org>

This is temporary.  It will allow the next few patches to be tested
incrementally.

Setting unsafe_fsgsbase is a root hole.  Don't do it.

[ chang: Minor fix. Add the TAINT_INSECURE flag. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  3 +++
 arch/x86/kernel/cpu/common.c                  | 27 +++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ea8095521085..dfc2023b796b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2729,6 +2729,9 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
+	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
+			replaced with a nofsgsbase flag.
+
 	no_console_suspend
 			[HW] Never suspend the console
 			Disable suspending of consoles during suspend and
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 660d0b22e962..6c54e6d2fdfb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -365,6 +365,25 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
+/*
+ * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
+ * updated. This allows us to get the kernel ready incrementally. Setting
+ * unsafe_fsgsbase and TAINT_INSECURE flags will allow the series to be
+ * bisected if necessary.
+ *
+ * Once all the pieces are in place, these will go away and be replaced with
+ * a nofsgsbase chicken flag.
+ */
+static bool unsafe_fsgsbase;
+
+static __init int setup_unsafe_fsgsbase(char *arg)
+{
+	unsafe_fsgsbase = true;
+	add_taint(TAINT_INSECURE, LOCKDEP_STILL_OK);
+	return 1;
+}
+__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1352,6 +1371,14 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_smap(c);
 	setup_umip(c);
 
+	/* Enable FSGSBASE instructions if available. */
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
+		if (unsafe_fsgsbase)
+			cr4_set_bits(X86_CR4_FSGSBASE);
+		else
+			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
+	}
+
 	/*
 	 * The vendor-specific functions might have changed features.
 	 * Now we do "generic changes."
-- 
2.19.1



* [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
  2018-10-23 18:42 ` [v3 01/12] taint: Introduce a new taint flag (insecure) Chang S. Bae
  2018-10-23 18:42 ` [v3 02/12] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-24 18:53   ` Andy Lutomirski
  2018-10-23 18:42 ` [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

From: Andi Kleen <ak@linux.intel.com>

Add C intrinsics and assembler macros for the new FSGSBASE
instructions.

Very straightforward. They will be used in follow-on patches.

[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
  make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]

v2: Use __always_inline

[ chang: Revise the changelog. Place them in <asm/fsgsbase.h>. Replace
  the macros with GAS-compatible ones. ]

If GCC supports it, we can add -mfsgsbase to CFLAGS and use the builtins
here for extra performance.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/include/asm/fsgsbase.h | 72 +++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index eb377b6e9eed..b4d4509b786c 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -19,6 +19,44 @@ extern unsigned long x86_gsbase_read_task(struct task_struct *task);
 extern int x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
 extern int x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);
 
+/* Must be protected by X86_FEATURE_FSGSBASE check. */
+
+static __always_inline unsigned long rdfsbase(void)
+{
+	unsigned long fsbase;
+
+	asm volatile(".byte 0xf3, 0x48, 0x0f, 0xae, 0xc0 # rdfsbaseq %%rax"
+			: "=a" (fsbase)
+			:: "memory");
+
+	return fsbase;
+}
+
+static __always_inline unsigned long rdgsbase(void)
+{
+	unsigned long gsbase;
+
+	asm volatile(".byte 0xf3, 0x48, 0x0f, 0xae, 0xc8 # rdgsbaseq %%rax;"
+			: "=a" (gsbase)
+			:: "memory");
+
+	return gsbase;
+}
+
+static __always_inline void wrfsbase(unsigned long fsbase)
+{
+	asm volatile(".byte 0xf3, 0x48, 0x0f, 0xae, 0xd0 # wrfsbaseq %%rax"
+			:: "a" (fsbase)
+			: "memory");
+}
+
+static __always_inline void wrgsbase(unsigned long gsbase)
+{
+	asm volatile(".byte 0xf3, 0x48, 0x0f, 0xae, 0xd8 # wrgsbaseq %%rax;"
+			:: "a" (gsbase)
+			: "memory");
+}
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
@@ -44,6 +82,40 @@ extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
 #endif /* CONFIG_X86_64 */
 
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_X86_64
+
+#include <asm/inst.h>
+
+.macro RDGSBASE opd
+	REG_TYPE rdgsbase_opd_type \opd
+	.if rdgsbase_opd_type == REG_TYPE_R64
+	R64_NUM rdgsbase_opd \opd
+	.byte 0xf3
+	PFX_REX rdgsbase_opd 0 W = 1
+	.else
+	.error "RDGSBASE: only for 64-bit value"
+	.endif
+	.byte 0xf, 0xae
+	MODRM 0xc0 rdgsbase_opd 1
+.endm
+
+.macro WRGSBASE opd
+	REG_TYPE wrgsbase_opd_type \opd
+	.if wrgsbase_opd_type == REG_TYPE_R64
+	R64_NUM wrgsbase_opd \opd
+	.byte 0xf3
+	PFX_REX wrgsbase_opd 0 W = 1
+	.else
+	.error "WRGSBASE: only for 64-bit value"
+	.endif
+	.byte 0xf, 0xae
+	MODRM 0xd0 wrgsbase_opd 1
+.endm
+
+#endif /* CONFIG_X86_64 */
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_FSGSBASE_H */
-- 
2.19.1



* [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (2 preceding siblings ...)
  2018-10-23 18:42 ` [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-24 19:16   ` Andy Lutomirski
  2018-10-23 18:42 ` [v3 05/12] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

The helper functions can now use the faster FSGSBASE instructions to
access FSBASE and GSBASE when the feature is enabled.

Accessing the user GSBASE requires a pair of SWAPGS operations. This could
be avoided if the user GSBASE were saved at kernel entry, updated as it
changes, and restored back at kernel exit. However, that appears to spend
more cycles on the saves and restores; little or no benefit was measured
in experiments.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/include/asm/fsgsbase.h | 17 +++----
 arch/x86/kernel/process_64.c    | 82 +++++++++++++++++++++++++++------
 2 files changed, 75 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index b4d4509b786c..e500d771155f 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -57,26 +57,23 @@ static __always_inline void wrgsbase(unsigned long gsbase)
 			: "memory");
 }
 
+#include <asm/cpufeature.h>
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
 {
 	unsigned long fsbase;
 
-	rdmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		fsbase = rdfsbase();
+	else
+		rdmsrl(MSR_FS_BASE, fsbase);
 
 	return fsbase;
 }
 
-static inline unsigned long x86_gsbase_read_cpu_inactive(void)
-{
-	unsigned long gsbase;
-
-	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
-
-	return gsbase;
-}
-
+extern unsigned long x86_gsbase_read_cpu_inactive(void);
 extern void x86_fsbase_write_cpu(unsigned long fsbase);
 extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 31b4755369f0..fcf18046c3d6 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -159,6 +159,36 @@ enum which_selector {
 	GS
 };
 
+/*
+ * Interrupts are disabled here. Out of line to be protected from kprobes.
+ */
+static noinline __kprobes unsigned long rd_inactive_gsbase(void)
+{
+	unsigned long gsbase, flags;
+
+	local_irq_save(flags);
+	native_swapgs();
+	gsbase = rdgsbase();
+	native_swapgs();
+	local_irq_restore(flags);
+
+	return gsbase;
+}
+
+/*
+ * Interrupts are disabled here. Out of line to be protected from kprobes.
+ */
+static noinline __kprobes void wr_inactive_gsbase(unsigned long gsbase)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	native_swapgs();
+	wrgsbase(gsbase);
+	native_swapgs();
+	local_irq_restore(flags);
+}
+
 /*
  * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
  * not available.  The goal is to be reasonably fast on non-FSGSBASE systems.
@@ -337,22 +367,42 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
 	return base;
 }
 
+unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+	unsigned long gsbase;
+
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		gsbase = rd_inactive_gsbase();
+	else
+		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+
+	return gsbase;
+}
+
 void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-	/*
-	 * Set the selector to 0 as a notion, that the segment base is
-	 * overwritten, which will be checked for skipping the segment load
-	 * during context switch.
-	 */
-	loadseg(FS, 0);
-	wrmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		wrfsbase(fsbase);
+	} else {
+		/*
+		 * Set the selector to 0 as a notion, that the segment base is
+		 * overwritten, which will be checked for skipping the segment load
+		 * during context switch.
+		 */
+		loadseg(FS, 0);
+		wrmsrl(MSR_FS_BASE, fsbase);
+	}
 }
 
 void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
 {
-	/* Set the selector to 0 for the same reason as %fs above. */
-	loadseg(GS, 0);
-	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		wr_inactive_gsbase(gsbase);
+	} else {
+		/* Set the selector to 0 for the same reason as %fs above. */
+		loadseg(GS, 0);
+		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
 }
 
 unsigned long x86_fsbase_read_task(struct task_struct *task)
@@ -361,7 +411,8 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		fsbase = x86_fsbase_read_cpu();
-	else if (task->thread.fsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.fsindex == 0))
 		fsbase = task->thread.fsbase;
 	else
 		fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -375,7 +426,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		gsbase = x86_gsbase_read_cpu_inactive();
-	else if (task->thread.gsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.gsindex == 0))
 		gsbase = task->thread.gsbase;
 	else
 		gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
@@ -396,7 +448,8 @@ int x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase)
 	task->thread.fsbase = fsbase;
 	if (task == current)
 		x86_fsbase_write_cpu(fsbase);
-	task->thread.fsindex = 0;
+	if (!static_cpu_has(X86_FEATURE_FSGSBASE))
+		task->thread.fsindex = 0;
 	preempt_enable();
 
 	return 0;
@@ -411,7 +464,8 @@ int x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase)
 	task->thread.gsbase = gsbase;
 	if (task == current)
 		x86_gsbase_write_cpu_inactive(gsbase);
-	task->thread.gsindex = 0;
+	if (!static_cpu_has(X86_FEATURE_FSGSBASE))
+		task->thread.gsindex = 0;
 	preempt_enable();
 
 	return 0;
-- 
2.19.1



* [v3 05/12] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (3 preceding siblings ...)
  2018-10-23 18:42 ` [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-24 19:21   ` Andy Lutomirski
  2018-10-23 18:42 ` [v3 06/12] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions if available Chang S. Bae
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

From: Andy Lutomirski <luto@kernel.org>

With the new FSGSBASE instructions, we can efficiently read and write
the FSBASE and GSBASE in __switch_to().  Use that capability to preserve
the full state.

This will enable user code to do whatever it wants with the new
instructions without any kernel-induced gotchas.  (There can still be
architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE
if WRGSBASE was used, but users are expected to read the CPU manual
before doing things like that.)

This is a considerable speedup.  It seems to save about 100 cycles
per context switch compared to the baseline 4.6-rc1 behavior on my
Skylake laptop.

[ chang: A 5~10% performance improvement was seen in a context-switch
  benchmark that ran threads with different FS/GSBASE values. Minor
  edit of the changelog. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index fcf18046c3d6..1d975cadc256 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -238,8 +238,18 @@ static __always_inline void save_fsgs(struct task_struct *task)
 {
 	savesegment(fs, task->thread.fsindex);
 	savesegment(gs, task->thread.gsindex);
-	save_base_legacy(task, task->thread.fsindex, FS);
-	save_base_legacy(task, task->thread.gsindex, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/*
+		 * If FSGSBASE is enabled, we can't make any useful guesses
+		 * about the base, and user code expects us to save the current
+		 * value.  Fortunately, reading the base directly is efficient.
+		 */
+		task->thread.fsbase = rdfsbase();
+		task->thread.gsbase = rd_inactive_gsbase();
+	} else {
+		save_base_legacy(task, task->thread.fsindex, FS);
+		save_base_legacy(task, task->thread.gsindex, GS);
+	}
 }
 
 #if IS_ENABLED(CONFIG_KVM)
@@ -318,10 +328,22 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 					      struct thread_struct *next)
 {
-	load_seg_legacy(prev->fsindex, prev->fsbase,
-			next->fsindex, next->fsbase, FS);
-	load_seg_legacy(prev->gsindex, prev->gsbase,
-			next->gsindex, next->gsbase, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/* Update the FS and GS selectors if they could have changed. */
+		if (unlikely(prev->fsindex || next->fsindex))
+			loadseg(FS, next->fsindex);
+		if (unlikely(prev->gsindex || next->gsindex))
+			loadseg(GS, next->gsindex);
+
+		/* Update the bases. */
+		wrfsbase(next->fsbase);
+		wr_inactive_gsbase(next->gsbase);
+	} else {
+		load_seg_legacy(prev->fsindex, prev->fsbase,
+				next->fsindex, next->fsbase, FS);
+		load_seg_legacy(prev->gsindex, prev->gsbase,
+				next->gsindex, next->gsbase, GS);
+	}
 }
 
 static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-- 
2.19.1



* [v3 06/12] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions if available
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (4 preceding siblings ...)
  2018-10-23 18:42 ` [v3 05/12] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-23 18:42 ` [v3 07/12] x86/fsgsbase/64: Introduce the new FIND_PERCPU_BASE macro Chang S. Bae
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

Copy the real FS/GSBASE values instead of an approximation when FSGSBASE
is enabled.

Factoring this out into save_fsgs() would not result in the same behavior,
because save_base_legacy() does not copy FS/GSBASE when the index is zero.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/kernel/process_64.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1d975cadc256..71cc8abe208c 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -510,10 +510,16 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap_ptr = NULL;
 
-	savesegment(gs, p->thread.gsindex);
-	p->thread.gsbase = p->thread.gsindex ? 0 : me->thread.gsbase;
 	savesegment(fs, p->thread.fsindex);
-	p->thread.fsbase = p->thread.fsindex ? 0 : me->thread.fsbase;
+	savesegment(gs, p->thread.gsindex);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		p->thread.fsbase = rdfsbase();
+		p->thread.gsbase = rd_inactive_gsbase();
+	} else {
+		/* save_base_legacy() does not set base when index is zero. */
+		p->thread.fsbase = p->thread.fsindex ? 0 : me->thread.fsbase;
+		p->thread.gsbase = p->thread.gsindex ? 0 : me->thread.gsbase;
+	}
 	savesegment(es, p->thread.es);
 	savesegment(ds, p->thread.ds);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
-- 
2.19.1



* [v3 07/12] x86/fsgsbase/64: Introduce the new FIND_PERCPU_BASE macro
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (5 preceding siblings ...)
  2018-10-23 18:42 ` [v3 06/12] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions if available Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-26  0:25   ` Andy Lutomirski
  2018-10-23 18:42 ` [v3 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry Chang S. Bae
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

GSBASE is used to find per-CPU data in the kernel. But when GSBASE is
unknown, the per-CPU base can be found in the per_cpu_offset table using
the CPU number. The CPU number is extracted from the limit field of the
CPUNODE entry in the GDT, or via the RDPID instruction.

Also, add a GAS-compatible RDPID macro.

The new macro will be used in a following patch.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fsgsbase.h | 52 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/inst.h     | 15 ++++++++++
 2 files changed, 67 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index e500d771155f..0c2d7d8a8c01 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -111,6 +111,58 @@ extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 	MODRM 0xd0 wrgsbase_opd 1
 .endm
 
+#ifdef CONFIG_SMP
+
+/*
+ * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+.macro FIND_PERCPU_BASE_RDPID reg:req
+	/*
+	 * The CPU/node NR is initialized earlier, directly in cpu_init().
+	 * The CPU NR is extracted from it.
+	 */
+	RDPID	\reg
+	andq	$VDSO_CPUNODE_MASK, \reg
+
+	/*
+	 * The kernel GSBASE value is found from the __per_cpu_offset table
+	 * with the CPU NR.
+	 */
+	movq	__per_cpu_offset(, \reg, 8), \reg
+.endm
+
+/*
+ * Same as FIND_PERCPU_BASE_RDPID above, except that the CPU/node NR is loaded
+ * from the limit (size) field of a special segment descriptor entry in
+ * GDT.
+ */
+.macro FIND_PERCPU_BASE_SEG_LIMIT reg:req
+	/* Read CPU NR */
+	movq	$__CPUNODE_SEG, \reg
+	lsl	\reg, \reg
+	andq	$VDSO_CPUNODE_MASK, \reg
+
+	movq	__per_cpu_offset(, \reg, 8), \reg
+.endm
+
+.macro FIND_PERCPU_BASE reg:req
+	ALTERNATIVE \
+		"FIND_PERCPU_BASE_SEG_LIMIT \reg", \
+		"FIND_PERCPU_BASE_RDPID \reg", \
+		X86_FEATURE_RDPID
+.endm
+
+#else
+
+.macro FIND_PERCPU_BASE reg:req
+	/* Tracking the base offset value */
+	movq	pcpu_unit_offsets(%rip), \reg
+.endm
+
+#endif /* CONFIG_SMP */
+
 #endif /* CONFIG_X86_64 */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/inst.h b/arch/x86/include/asm/inst.h
index f5a796da07f8..d063841a17e3 100644
--- a/arch/x86/include/asm/inst.h
+++ b/arch/x86/include/asm/inst.h
@@ -306,6 +306,21 @@
 	.endif
 	MODRM 0xc0 movq_r64_xmm_opd1 movq_r64_xmm_opd2
 	.endm
+
+.macro RDPID opd
+	REG_TYPE rdpid_opd_type \opd
+	.if rdpid_opd_type == REG_TYPE_R64
+	R64_NUM rdpid_opd \opd
+	.else
+	R32_NUM rdpid_opd \opd
+	.endif
+	.byte 0xf3
+	.if rdpid_opd > 7
+	PFX_REX rdpid_opd 0
+	.endif
+	.byte 0x0f, 0xc7
+	MODRM 0xc0 rdpid_opd 0x7
+.endm
 #endif
 
 #endif
-- 
2.19.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [v3 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (6 preceding siblings ...)
  2018-10-23 18:42 ` [v3 07/12] x86/fsgsbase/64: Introduce the new FIND_PERCPU_BASE macro Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-23 18:42 ` [v3 09/12] selftests/x86/fsgsbase: Test WRGSBASE Chang S. Bae
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

The FSGSBASE instructions allow fast access to GSBASE. With FSGSBASE,
paranoid_entry can unconditionally copy the per-CPU base value to GSBASE,
and the original GSBASE value is restored at paranoid_exit.

So far, GSBASE modification has not been directly allowed from userspace,
so swapping GSBASE could be done conditionally, according to the
kernel-enforced convention that a negative GSBASE indicates a kernel value.
But when FSGSBASE is enabled, userspace can put an arbitrary value in
GSBASE; saving and restoring GSBASE unconditionally secures a correct
kernel GSBASE value.

Also, factor out the RDMSR-based GSBASE read into a new macro,
READ_MSR_GSBASE.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S       | 73 ++++++++++++++++++++++++++-------
 arch/x86/include/asm/fsgsbase.h |  9 ++++
 2 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 16427981f222..8c7a4949395b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -38,6 +38,7 @@
 #include <asm/export.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include <asm/fsgsbase.h>
 #include <linux/err.h>
 
 #include "calling.h"
@@ -928,10 +929,14 @@ ENTRY(\sym)
 	addq	$EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
 	.endif
 
-	/* these procedures expect "no swapgs" flag in ebx */
 	.if \paranoid
+	/*
+	 * With FSGSBASE, original GSBASE is stored in %rbx
+	 * Without FSGSBASE, expect "no swapgs" flag in %ebx
+	 */
 	jmp	paranoid_exit
 	.else
+	/* Expect "no swapgs" flag in %ebx */
 	jmp	error_exit
 	.endif
 
@@ -1144,26 +1149,57 @@ idtentry machine_check		do_mce			has_error_code=0	paranoid=1
 #endif
 
 /*
- * Save all registers in pt_regs, and switch gs if needed.
- * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ * Save all registers in pt_regs.
+ *
+ * When FSGSBASE is enabled, the current GSBASE is always copied to %rbx.
+ *
+ * Without FSGSBASE, SWAPGS is needed when entering from userspace.
+ * A positive GSBASE means it is a user value and a negative GSBASE
+ * means it is a kernel value.
+ *
+ * Return:
+ * 	With FSGSBASE, %rbx has current GSBASE.
+ * 	Without that,
+ *		%ebx=0: need SWAPGS on exit, %ebx=1: otherwise
  */
 ENTRY(paranoid_entry)
 	UNWIND_HINT_FUNC
 	cld
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
-	movl	$1, %ebx
-	movl	$MSR_GS_BASE, %ecx
-	rdmsr
-	testl	%edx, %edx
-	js	1f				/* negative -> in kernel */
-	SWAPGS
-	xorl	%ebx, %ebx
 
-1:
+	/*
+	 * The PTI CR3 switch does not depend on the kernel GSBASE, so it
+	 * can be done early. It must precede FIND_PERCPU_BASE, which
+	 * references per-CPU data in kernel space.
+	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+	/*
+	 * Read the current GSBASE with RDGSBASE. The kernel GSBASE is
+	 * then found in the __per_cpu_offset table, indexed by CPU NR.
+	 */
+	ALTERNATIVE "jmp .Lparanoid_entry_no_fsgsbase",	"",\
+		X86_FEATURE_FSGSBASE
+	RDGSBASE	%rbx
+	FIND_PERCPU_BASE	%rax
+	WRGSBASE	%rax
+	ret
+
+.Lparanoid_entry_no_fsgsbase:
+	movl	$1, %ebx
+	/*
+	 * FSGSBASE is not in use, so depend on the kernel-enforced
+	 * convention that a negative GSBASE indicates a kernel value.
+	 */
+	READ_MSR_GSBASE save_reg=%edx
+	testl	%edx, %edx	/* Negative -> in kernel */
+	jns	.Lparanoid_entry_swapgs
+	ret
+
+.Lparanoid_entry_swapgs:
+	SWAPGS
+	xorl	%ebx, %ebx
 	ret
 END(paranoid_entry)
 
@@ -1177,12 +1213,21 @@ END(paranoid_entry)
 * be complicated.  Fortunately, there's no good reason
  * to try to handle preemption here.
  *
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * On entry,
+ *	With FSGSBASE,
+ *		%rbx is original GSBASE that needs to be restored on the exit
+ *	Without that,
+ * 		%ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
  */
 ENTRY(paranoid_exit)
 	UNWIND_HINT_REGS
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF_DEBUG
+	ALTERNATIVE "jmp .Lparanoid_exit_no_fsgsbase",	"nop",\
+		X86_FEATURE_FSGSBASE
+	WRGSBASE	%rbx
+	jmp	.Lparanoid_exit_no_swapgs
+.Lparanoid_exit_no_fsgsbase:
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
@@ -1193,7 +1238,7 @@ ENTRY(paranoid_exit)
 	TRACE_IRQS_IRETQ_DEBUG
 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
 .Lparanoid_exit_restore:
-	jmp restore_regs_and_return_to_kernel
+	jmp	restore_regs_and_return_to_kernel
 END(paranoid_exit)
 
 /*
diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index 0c2d7d8a8c01..c5bbd40454b8 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -163,6 +163,15 @@ extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
 #endif /* CONFIG_SMP */
 
+.macro READ_MSR_GSBASE save_reg:req
+	movl	$MSR_GS_BASE, %ecx
+	/* Read MSR specified by %ecx into %edx:%eax */
+	rdmsr
+	.ifnc \save_reg, %edx
+	movl	%edx, \save_reg
+	.endif
+.endm
+
 #endif /* CONFIG_X86_64 */
 
 #endif /* __ASSEMBLY__ */
-- 
2.19.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [v3 09/12] selftests/x86/fsgsbase: Test WRGSBASE
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (7 preceding siblings ...)
  2018-10-23 18:42 ` [v3 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-23 18:42 ` [v3 10/12] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit Chang S. Bae
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

From: Andy Lutomirski <luto@kernel.org>

This validates that GS and GSBASE are independently preserved across
context switches.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 tools/testing/selftests/x86/fsgsbase.c | 110 ++++++++++++++++++++++++-
 1 file changed, 107 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index f249e042b3b5..fe7acfef53ba 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -23,6 +23,7 @@
 #include <pthread.h>
 #include <asm/ldt.h>
 #include <sys/mman.h>
+#include <setjmp.h>
 
 #ifndef __x86_64__
 # error This test is 64-bit only
@@ -71,6 +72,51 @@ static void sigsegv(int sig, siginfo_t *si, void *ctx_void)
 
 }
 
+static jmp_buf jmpbuf;
+
+static void sigill(int sig, siginfo_t *si, void *ctx_void)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static bool have_fsgsbase;
+
+static inline unsigned long rdgsbase(void)
+{
+	unsigned long gsbase;
+
+	asm volatile(".byte 0xf3,0x48,0x0f,0xae,0xc8 # rdgsbaseq %%rax"
+			: "=a" (gsbase)
+			:: "memory");
+
+	return gsbase;
+}
+
+static inline unsigned long rdfsbase(void)
+{
+	unsigned long fsbase;
+
+	asm volatile(".byte 0xf3,0x48,0x0f,0xae,0xc0 # rdfsbaseq %%rax"
+			: "=a" (fsbase)
+			:: "memory");
+
+	return fsbase;
+}
+
+static inline void wrgsbase(unsigned long gsbase)
+{
+	asm volatile(".byte 0xf3,0x48,0x0f,0xae,0xd8 # wrgsbaseq %%rax"
+			:: "a" (gsbase)
+			: "memory");
+}
+
+static inline void wrfsbase(unsigned long fsbase)
+{
+	asm volatile(".byte 0xf3,0x48,0x0f,0xae,0xd0 # wrfsbaseq %%rax"
+			:: "a" (fsbase)
+			: "memory");
+}
+
 enum which_base { FS, GS };
 
 static unsigned long read_base(enum which_base which)
@@ -199,14 +245,16 @@ static void do_remote_base()
 	       to_set, hard_zero ? " and clear gs" : "", sel);
 }
 
-void do_unexpected_base(void)
+static __thread int set_thread_area_entry_number = -1;
+
+static void do_unexpected_base(void)
 {
 	/*
 	 * The goal here is to try to arrange for GS == 0, GSBASE !=
	 * 0, and for the kernel to think that GSBASE == 0.
 	 *
 	 * To make the test as reliable as possible, this uses
-	 * explicit descriptorss.  (This is not the only way.  This
+	 * explicit descriptors.  (This is not the only way.  This
 	 * could use ARCH_SET_GS with a low, nonzero base, but the
 	 * relevant side effect of ARCH_SET_GS could change.)
 	 */
@@ -239,7 +287,7 @@ void do_unexpected_base(void)
 			MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
 		memcpy(low_desc, &desc, sizeof(desc));
 
-		low_desc->entry_number = -1;
+		low_desc->entry_number = set_thread_area_entry_number;
 
 		/* 32-bit set_thread_area */
 		long ret;
@@ -254,6 +302,8 @@ void do_unexpected_base(void)
 			return;
 		}
 		printf("\tother thread: using GDT slot %d\n", desc.entry_number);
+		set_thread_area_entry_number = desc.entry_number;
+
 		asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)((desc.entry_number << 3) | 0x3)));
 	}
 
@@ -265,6 +315,34 @@ void do_unexpected_base(void)
 	asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)0));
 }
 
+void test_wrbase(unsigned short index, unsigned long base)
+{
+	unsigned short newindex;
+	unsigned long newbase;
+
+	printf("[RUN]\tGS = 0x%hx, GSBASE = 0x%lx\n", index, base);
+
+	asm volatile ("mov %0, %%gs" : : "rm" (index));
+	wrgsbase(base);
+
+	remote_base = 0;
+	ftx = 1;
+	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+	while (ftx != 0)
+		syscall(SYS_futex, &ftx, FUTEX_WAIT, 1, NULL, NULL, 0);
+
+	asm volatile ("mov %%gs, %0" : "=rm" (newindex));
+	newbase = rdgsbase();
+
+	if (newindex == index && newbase == base) {
+		printf("[OK]\tIndex and base were preserved\n");
+	} else {
+		printf("[FAIL]\tAfter switch, GS = 0x%hx and GSBASE = 0x%lx\n",
+		       newindex, newbase);
+		nerrs++;
+	}
+}
+
 static void *threadproc(void *ctx)
 {
 	while (1) {
@@ -371,6 +449,17 @@ int main()
 {
 	pthread_t thread;
 
+	/* Probe FSGSBASE */
+	sethandler(SIGILL, sigill, 0);
+	if (sigsetjmp(jmpbuf, 1) == 0) {
+		rdfsbase();
+		have_fsgsbase = true;
+		printf("\tFSGSBASE instructions are enabled\n");
+	} else {
+		printf("\tFSGSBASE instructions are disabled\n");
+	}
+	clearhandler(SIGILL);
+
 	sethandler(SIGSEGV, sigsegv, 0);
 
 	check_gs_value(0);
@@ -417,6 +506,21 @@ int main()
 
 	test_unexpected_base();
 
+	if (have_fsgsbase) {
+		unsigned short ss;
+
+		asm volatile ("mov %%ss, %0" : "=rm" (ss));
+
+		test_wrbase(0, 0);
+		test_wrbase(0, 1);
+		test_wrbase(0, 0x200000000);
+		test_wrbase(0, 0xffffffffffffffff);
+		test_wrbase(ss, 0);
+		test_wrbase(ss, 1);
+		test_wrbase(ss, 0x200000000);
+		test_wrbase(ss, 0xffffffffffffffff);
+	}
+
 	ftx = 3;  /* Kill the thread. */
 	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
 
-- 
2.19.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [v3 10/12] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (8 preceding siblings ...)
  2018-10-23 18:42 ` [v3 09/12] selftests/x86/fsgsbase: Test WRGSBASE Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-23 18:42 ` [v3 11/12] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
  2018-10-23 18:42 ` [v3 12/12] x86/fsgsbase/64: Add documentation for FSGSBASE Chang S. Bae
  11 siblings, 0 replies; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

From: Andy Lutomirski <luto@kernel.org>

Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
FSGSBASE by default, and add nofsgsbase to disable it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  3 +-
 arch/x86/kernel/cpu/common.c                  | 35 ++++++++-----------
 2 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index dfc2023b796b..72ed1a5ed832 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2729,8 +2729,7 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
-	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
-			replaced with a  nofsgsbase flag.
+	nofsgsbase	[X86] Disables FSGSBASE instructions.
 
 	no_console_suspend
 			[HW] Never suspend the console
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6c54e6d2fdfb..f20edc754532 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -365,24 +365,21 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
-/*
- * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
- * updated. This allows us to get the kernel ready incrementally. Setting
- * unsafe_fsgsbase and TAINT_INSECURE flags will allow the series to be
- * bisected if necessary.
- *
- * Once all the pieces are in place, these will go away and be replaced with
- * a nofsgsbase chicken flag.
- */
-static bool unsafe_fsgsbase;
-
-static __init int setup_unsafe_fsgsbase(char *arg)
+static __init int x86_nofsgsbase_setup(char *arg)
 {
-	unsafe_fsgsbase = true;
-	add_taint(TAINT_INSECURE, LOCKDEP_STILL_OK);
+	/* Require an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	/* Do not emit a message if the feature is not present. */
+	if (!boot_cpu_has(X86_FEATURE_FSGSBASE))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
+	pr_info("nofsgsbase: FSGSBASE disabled\n");
 	return 1;
 }
-__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+__setup("nofsgsbase", x86_nofsgsbase_setup);
 
 /*
  * Protection Keys are not available in 32-bit mode.
@@ -1372,12 +1369,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-		if (unsafe_fsgsbase)
-			cr4_set_bits(X86_CR4_FSGSBASE);
-		else
-			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
-	}
+	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+		cr4_set_bits(X86_CR4_FSGSBASE);
 
 	/*
 	 * The vendor-specific functions might have changed features.
-- 
2.19.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [v3 11/12] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (9 preceding siblings ...)
  2018-10-23 18:42 ` [v3 10/12] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  2018-10-23 18:42 ` [v3 12/12] x86/fsgsbase/64: Add documentation for FSGSBASE Chang S. Bae
  11 siblings, 0 replies; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

From: Andi Kleen <ak@linux.intel.com>

The kernel needs to explicitly enable FSGSBASE. So, the application
needs to know if it can safely use these instructions. Just looking
at the CPUID bit is not enough because it may be running in a kernel
that does not enable the instructions.

One way would be for the application to just try the instructions and
catch the SIGILL. But that is difficult in libraries, which may not want
to overwrite the signal handlers of the main application.

So we need to provide a way for the application to discover the kernel
capability.

Use AT_HWCAP2 in the ELF aux vector, which PPC already uses for similar
purposes. A new Linux-defined bitmap is returned in AT_HWCAP2. Next to
MONITOR/MWAIT (bit 0), bit 1 is reserved for the FSGSBASE capability
check.

The application can then access it manually or using
the getauxval() function in newer glibc.

[ chang: Rebase and edit the patch note accordingly. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/include/uapi/asm/hwcap2.h | 3 +++
 arch/x86/kernel/cpu/common.c       | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/hwcap2.h b/arch/x86/include/uapi/asm/hwcap2.h
index 6ebaae90e207..c5ce54e749f6 100644
--- a/arch/x86/include/uapi/asm/hwcap2.h
+++ b/arch/x86/include/uapi/asm/hwcap2.h
@@ -5,4 +5,7 @@
 /* MONITOR/MWAIT enabled in Ring 3 */
 #define HWCAP2_RING3MWAIT		(1 << 0)
 
+/* Kernel enables FSGSBASE instructions in Ring 3 */
+#define HWCAP2_FSGSBASE			(1 << 1)
+
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f20edc754532..6964dd24082d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1369,8 +1369,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
 		cr4_set_bits(X86_CR4_FSGSBASE);
+		elf_hwcap2 |= HWCAP2_FSGSBASE;
+	}
 
 	/*
 	 * The vendor-specific functions might have changed features.
-- 
2.19.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [v3 12/12] x86/fsgsbase/64: Add documentation for FSGSBASE
  2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (10 preceding siblings ...)
  2018-10-23 18:42 ` [v3 11/12] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
@ 2018-10-23 18:42 ` Chang S. Bae
  11 siblings, 0 replies; 34+ messages in thread
From: Chang S. Bae @ 2018-10-23 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, H . Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Markus T Metzger, Ravi Shankar,
	Chang S . Bae, LKML

From: Andi Kleen <ak@linux.intel.com>

v2: Minor updates to documentation requested in review.
v3: Update for new gcc and various improvements.

[ chang: Fix some typos. Fix the example code. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 Documentation/x86/fsgs.txt | 104 +++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 Documentation/x86/fsgs.txt

diff --git a/Documentation/x86/fsgs.txt b/Documentation/x86/fsgs.txt
new file mode 100644
index 000000000000..7a973a5c1767
--- /dev/null
+++ b/Documentation/x86/fsgs.txt
@@ -0,0 +1,104 @@
+
+Using FS and GS prefixes on 64-bit x86 Linux
+
+The x86 architecture supports segment prefixes per instruction to add an
+offset to an address.  On 64-bit x86, these are mostly nops, except for FS
+and GS.
+
+This offers an efficient way to reference a global pointer.
+
+The compiler has to generate special code to use these base registers,
+or they can be accessed with inline assembler.
+
+	mov %gs:offset,%reg
+	mov %fs:offset,%reg
+
+In 64-bit code, FS is used to address thread-local storage (TLS), declared
+using __thread.  The compiler then automatically generates the correct
+prefixes and relocations to access these values.
+
+FS is normally managed by the runtime code or the threading library.
+Overwriting it can break a lot of things (including syscalls and gdb),
+but it can make sense to save/restore it for threading purposes.
+
+GS is freely available, but may need special (compiler or inline assembler)
+code to use.
+
+Traditionally, the 64-bit FS and GS bases could be set with the arch_prctl()
+system call:
+
+	arch_prctl(ARCH_SET_GS, value)
+	arch_prctl(ARCH_SET_FS, value)
+
+[There was also an older method using modify_ldt(), inherited from 32bit,
+but this is not discussed here.]
+
+However, using a syscall is problematic for user-space threading libraries
+that want to context switch in user space; their whole point is to avoid
+the overhead of a syscall.  It is also cleaner for compilers wanting to
+use the extra register to write it with an instruction, or read it
+directly to compute addresses and offsets.
+
+Newer Intel CPUs (Ivy Bridge and later) added new instructions to directly
+access these registers quickly from user context:
+
+	RDFSBASE %reg	read the FS base	(or _readfsbase_u64)
+	RDGSBASE %reg	read the GS base	(or _readgsbase_u64)
+
+	WRFSBASE %reg	write the FS base	(or _writefsbase_u64)
+	WRGSBASE %reg	write the GS base	(or _writegsbase_u64)
+
+To use the intrinsics, include <immintrin.h> and build with the -mfsgsbase
+option.
+
+The instructions are supported by the CPU when the "fsgsbase" string is
+shown in /proc/cpuinfo (or directly retrieved through the CPUID
+instruction: leaf 7, subleaf 0, EBX bit 0).
+
+The instructions are only available to 64-bit binaries.
+
+In addition, the kernel needs to explicitly enable these instructions, as
+it may otherwise not context switch their state correctly.  Newer Linux
+kernels enable this.  If the kernel has not enabled the instructions,
+they fault with a #UD exception.
+
+An FSGSBASE-enabled kernel can be detected by checking the AT_HWCAP2
+bitmask in the aux vector.  When the HWCAP2_FSGSBASE bit is set, the
+kernel supports FSGSBASE.
+
+	#include <sys/auxv.h>
+	#include <elf.h>
+
+	/* Will be eventually in asm/hwcap.h */
+	#define HWCAP2_FSGSBASE        (1 << 1)
+
+	unsigned long val = getauxval(AT_HWCAP2);
+	if (val & HWCAP2_FSGSBASE) {
+		asm("wrgsbase %0" :: "r" (ptr));
+	}
+
+No extra CPUID check is needed, as the kernel will not set this bit if the
+CPU does not support the instructions.
+
+gcc 6 and later have special support to directly access data relative
+to fs/gs using the __seg_fs and __seg_gs address-space pointer
+modifiers.
+
+#ifndef __SEG_GS
+#error "Need gcc 6 or later"
+#endif
+
+struct gsdata {
+	int a;
+	int b;
+} gsdata = { 1, 2 };
+
+int __seg_gs *valp = 0;		/* offset relative to GS */
+
+	/* Check if kernel supports FSGSBASE as above */
+
+	/* Set up new GS */
+	asm("wrgsbase %0" :: "r" (&gsdata));
+
+	/* Now the global pointer can be used normally */
+	printf("gsdata.a = %d\n", *valp);
+
+Andi Kleen
-- 
2.19.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 01/12] taint: Introduce a new taint flag (insecure)
  2018-10-23 18:42 ` [v3 01/12] taint: Introduce a new taint flag (insecure) Chang S. Bae
@ 2018-10-24 18:50   ` Andy Lutomirski
  0 siblings, 0 replies; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-24 18:50 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Ingo Molnar, Thomas Gleixner, Andrew Lutomirski, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> For testing (or root-only) purposes, the new flag will serve to tag the
> kernel taint accurately.
>
> When adding support for a new feature, patches need to be applied and
> tested incrementally with temporary parameters. Currently, there is no
> taint flag for this usage.

I don't object to this patch per se, but it seems unnecessary to me.
Especially since, once the whole series is applied, this code is again
unused.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 02/12] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  2018-10-23 18:42 ` [v3 02/12] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
@ 2018-10-24 18:51   ` Andy Lutomirski
  0 siblings, 0 replies; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-24 18:51 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Ingo Molnar, Thomas Gleixner, Andrew Lutomirski, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> From: Andy Lutomirski <luto@kernel.org>
>
> This is temporary.  It will allow the next few patches to be tested
> incrementally.
>
> Setting unsafe_fsgsbase is a root hole.  Don't do it.
>
> [ chang: Minor fix. Add the TAINT_INSECURE flag. ]

Reviewed-by: Andy Lutomirski <luto@kernel.org>

although reviewing code that I mostly wrote seems a bit odd...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions
  2018-10-23 18:42 ` [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions Chang S. Bae
@ 2018-10-24 18:53   ` Andy Lutomirski
  2018-10-24 19:21     ` Andi Kleen
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-24 18:53 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Ingo Molnar, Thomas Gleixner, Andrew Lutomirski, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> From: Andi Kleen <ak@linux.intel.com>
>
> Add C intrinsics and assembler macros for the new FSBASE and GSBASE
> instructions.
>
> Very straightforward. Used in follow-on patches.
>
> [ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
>   make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]
>
> v2: Use __always_inline
>
> [ chang: Revise the changelog. Place them in <asm/fsgsbase.h>. Replace
>   the macros with GAS-compatible ones. ]
>
> If GCC supports it, we can add -mfsgsbase to CFLAGS and use the builtins
> here for extra performance.

Reviewed-by: Andy Lutomirski <luto@kernel.org> # C parts only

With the caveat that I'm not convinced that the memory clobbers are
needed.  The __force_order trick in special_insns.h would probably be
more appropriate.

I don't feel qualified to review the asm part without some research.
Whereas hpa or Boris could probably review it with their eyes closed.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-23 18:42 ` [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
@ 2018-10-24 19:16   ` Andy Lutomirski
  2018-10-24 19:41     ` [Xen-devel] " Andrew Cooper
                       ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-24 19:16 UTC (permalink / raw)
  To: Bae, Chang Seok, Boris Ostrovsky, Juergen Gross, xen-devel
  Cc: Ingo Molnar, Thomas Gleixner, Andrew Lutomirski, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> The helper functions will switch on faster accesses to FSBASE and GSBASE
> when the FSGSBASE feature is enabled.
>
> Accessing the user GSBASE needs a couple of SWAPGS operations. This is
> avoidable if the user GSBASE is saved at kernel entry, updated as it
> changes, and restored back at kernel exit. However, that appears to cost
> more cycles for the saves and restores, and little or no benefit was
> measured in experiments.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> ---
>  arch/x86/include/asm/fsgsbase.h | 17 +++----
>  arch/x86/kernel/process_64.c    | 82 +++++++++++++++++++++++++++------
>  2 files changed, 75 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
> index b4d4509b786c..e500d771155f 100644
> --- a/arch/x86/include/asm/fsgsbase.h
> +++ b/arch/x86/include/asm/fsgsbase.h
> @@ -57,26 +57,23 @@ static __always_inline void wrgsbase(unsigned long gsbase)
>                         : "memory");
>  }
>
> +#include <asm/cpufeature.h>
> +
>  /* Helper functions for reading/writing FS/GS base */
>
>  static inline unsigned long x86_fsbase_read_cpu(void)
>  {
>         unsigned long fsbase;
>
> -       rdmsrl(MSR_FS_BASE, fsbase);
> +       if (static_cpu_has(X86_FEATURE_FSGSBASE))
> +               fsbase = rdfsbase();
> +       else
> +               rdmsrl(MSR_FS_BASE, fsbase);
>
>         return fsbase;
>  }
>
> -static inline unsigned long x86_gsbase_read_cpu_inactive(void)
> -{
> -       unsigned long gsbase;
> -
> -       rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
> -
> -       return gsbase;
> -}
> -
> +extern unsigned long x86_gsbase_read_cpu_inactive(void);
>  extern void x86_fsbase_write_cpu(unsigned long fsbase);
>  extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
>
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 31b4755369f0..fcf18046c3d6 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -159,6 +159,36 @@ enum which_selector {
>         GS
>  };
>
> +/*
> + * Interrupts are disabled here. Out of line to be protected from kprobes.
> + */
> +static noinline __kprobes unsigned long rd_inactive_gsbase(void)
> +{
> +       unsigned long gsbase, flags;
> +
> +       local_irq_save(flags);
> +       native_swapgs();
> +       gsbase = rdgsbase();
> +       native_swapgs();
> +       local_irq_restore(flags);
> +
> +       return gsbase;
> +}

Please fold this into its only caller and make *that* noinline.

Also, this function, and its "write" equivalent, will access the
*active* gsbase.  So it either needs to be fixed for Xen PV or some
clear comment and careful auditing needs to be added to ensure that
it's not used on Xen PV.  Or it needs to be renamed
native_x86_fsgsbase_... and add paravirt hooks, since Xen PV allows a
very efficient but different implementation, I think.  The latter is
probably the right solution.

(Hi Xen people -- how does CR4.FSGSBASE work on Xen?  Is it always
set?  Never set?  Set only if the guest tries to set it?)

>  void x86_fsbase_write_cpu(unsigned long fsbase)
>  {
> -       /*
> -        * Set the selector to 0 as a notion, that the segment base is
> -        * overwritten, which will be checked for skipping the segment load
> -        * during context switch.
> -        */
> -       loadseg(FS, 0);
> -       wrmsrl(MSR_FS_BASE, fsbase);
> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> +               wrfsbase(fsbase);
> +       } else {
> +               /*
> +                * Set the selector to 0 as a notion, that the segment base is
> +                * overwritten, which will be checked for skipping the segment load
> +                * during context switch.
> +                */
> +               loadseg(FS, 0);
> +               wrmsrl(MSR_FS_BASE, fsbase);
> +       }
>  }
>
>  void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
>  {
> -       /* Set the selector to 0 for the same reason as %fs above. */
> -       loadseg(GS, 0);
> -       wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> +               wr_inactive_gsbase(gsbase);
> +       } else {
> +               /* Set the selector to 0 for the same reason as %fs above. */
> +               loadseg(GS, 0);
> +               wrmsrl(MSR_KERNEL_GS_BASE, gsbase);

I still don't get what this code is trying to do.  See other email.  I
think it will straight up crash the kernel on some CPUs, since writing
0 to %gs will zero out the *active* base on some CPUs.

I think that, if you really want some fancy optimization for the
non-FSGSBASE case, you need to pull that out into the callers of these
helpers.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 05/12] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on
  2018-10-23 18:42 ` [v3 05/12] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
@ 2018-10-24 19:21   ` Andy Lutomirski
  2018-10-24 19:36     ` Bae, Chang Seok
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-24 19:21 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Ingo Molnar, Thomas Gleixner, Andrew Lutomirski, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> From: Andy Lutomirski <luto@kernel.org>
>
> With the new FSGSBASE instructions, we can efficiently read and write
> the FSBASE and GSBASE in __switch_to().  Use that capability to preserve
> the full state.
>
> This will enable user code to do whatever it wants with the new
> instructions without any kernel-induced gotchas.  (There can still be
> architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE
> if WRGSBASE was used, but users are expected to read the CPU manual
> before doing things like that.)
>
> This is a considerable speedup.  It seems to save about 100 cycles
> per context switch compared to the baseline 4.6-rc1 behavior on my
> Skylake laptop.
>
> [ chang: 5~10% performance improvements were seen by a context switch
>   benchmark that ran threads with different FS/GSBASE values. Minor
>   edit on the changelog. ]
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> ---
>  arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
>  1 file changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index fcf18046c3d6..1d975cadc256 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -238,8 +238,18 @@ static __always_inline void save_fsgs(struct task_struct *task)
>  {
>         savesegment(fs, task->thread.fsindex);
>         savesegment(gs, task->thread.gsindex);
> -       save_base_legacy(task, task->thread.fsindex, FS);
> -       save_base_legacy(task, task->thread.gsindex, GS);
> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> +               /*
> +                * If FSGSBASE is enabled, we can't make any useful guesses
> +                * about the base, and user code expects us to save the current
> +                * value.  Fortunately, reading the base directly is efficient.
> +                */
> +               task->thread.fsbase = rdfsbase();
> +               task->thread.gsbase = rd_inactive_gsbase();
> +       } else {
> +               save_base_legacy(task, task->thread.fsindex, FS);
> +               save_base_legacy(task, task->thread.gsindex, GS);
> +       }
>  }
>
>  #if IS_ENABLED(CONFIG_KVM)
> @@ -318,10 +328,22 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
>  static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
>                                               struct thread_struct *next)
>  {
> -       load_seg_legacy(prev->fsindex, prev->fsbase,
> -                       next->fsindex, next->fsbase, FS);
> -       load_seg_legacy(prev->gsindex, prev->gsbase,
> -                       next->gsindex, next->gsbase, GS);
> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> +               /* Update the FS and GS selectors if they could have changed. */
> +               if (unlikely(prev->fsindex || next->fsindex))
> +                       loadseg(FS, next->fsindex);
> +               if (unlikely(prev->gsindex || next->gsindex))
> +                       loadseg(GS, next->gsindex);
> +
> +               /* Update the bases. */
> +               wrfsbase(next->fsbase);
> +               wr_inactive_gsbase(next->gsbase);

Aha, I see what you're doing with the FSGSBASE-optimized version being
out of line.  But it's way too unclear from the code.  You should name
the helper wrgsbase_inactive or maybe __wrgsbase_inactive() to
emphasize that you're literally using the WRGSBASE instruction.  (Or
its Xen PV equivalent.  Hmm.)


* Re: [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions
  2018-10-24 18:53   ` Andy Lutomirski
@ 2018-10-24 19:21     ` Andi Kleen
  2018-10-25 23:14       ` Andy Lutomirski
  0 siblings, 1 reply; 34+ messages in thread
From: Andi Kleen @ 2018-10-24 19:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Bae, Chang Seok, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Dave Hansen, Metzger, Markus T, Ravi V. Shankar, LKML

On Wed, Oct 24, 2018 at 11:53:54AM -0700, Andy Lutomirski wrote:
> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >
> > From: Andi Kleen <ak@linux.intel.com>
> >
> > Add C intrinsics and assembler macros for the new FSBASE and GSBASE
> > instructions.
> >
> > Very straightforward. Used in follow-on patches.
> >
> > [ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
> >   make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]
> >
> > v2: Use __always_inline
> >
> > [ chang: Revise the changelog. Place them in <asm/fsgsbase.h>. Replace
> >   the macros with GAS-compatible ones. ]
> >
> > If GCC supports it, we can add -mfsgsbase to CFLAGS and use the builtins
> > here for extra performance.
> 
> Reviewed-by: Andy Lutomirski <luto@kernel.org> # C parts only
> 
> With the caveat that I'm not convinced that the memory clobbers are
> needed.  The __force_order trick in special_insns.h would probably be
> more appropriate.
> 
> I don't feel qualified to review the asm part without some research.
> Whereas hpa or Boris could probably review it with their eyes closed.

BTW the other option would be to update the min-binutils requirement 
to 2.21 (currently it is 2.20) and then write it directly without .byte. 
I believe 2.21 added support for these instructions.

(It's only a binutils requirement, don't need gcc support)

-Andi


* RE: [v3 05/12] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on
  2018-10-24 19:21   ` Andy Lutomirski
@ 2018-10-24 19:36     ` Bae, Chang Seok
  0 siblings, 0 replies; 34+ messages in thread
From: Bae, Chang Seok @ 2018-10-24 19:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Metzger, Markus T, Shankar, Ravi V, LKML

On Tue, Oct 24, 2018 at 12:21 PM Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com>
> wrote:
> >
> > From: Andy Lutomirski <luto@kernel.org>
> >
> > With the new FSGSBASE instructions, we can efficiently read and write
> > the FSBASE and GSBASE in __switch_to().  Use that capability to preserve
> > the full state.
> >
> > This will enable user code to do whatever it wants with the new
> > instructions without any kernel-induced gotchas.  (There can still be
> > architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE
> > if WRGSBASE was used, but users are expected to read the CPU manual
> > before doing things like that.)
> >
> > This is a considerable speedup.  It seems to save about 100 cycles
> > per context switch compared to the baseline 4.6-rc1 behavior on my
> > Skylake laptop.
> >
> > [ chang: 5~10% performance improvements were seen by a context switch
> >   benchmark that ran threads with different FS/GSBASE values. Minor
> >   edit on the changelog. ]
> >
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> > Reviewed-by: Andi Kleen <ak@linux.intel.com>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > ---
> >  arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
> >  1 file changed, 28 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> > index fcf18046c3d6..1d975cadc256 100644
> > --- a/arch/x86/kernel/process_64.c
> > +++ b/arch/x86/kernel/process_64.c
> > @@ -238,8 +238,18 @@ static __always_inline void save_fsgs(struct
> task_struct *task)
> >  {
> >         savesegment(fs, task->thread.fsindex);
> >         savesegment(gs, task->thread.gsindex);
> > -       save_base_legacy(task, task->thread.fsindex, FS);
> > -       save_base_legacy(task, task->thread.gsindex, GS);
> > +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> > +               /*
> > +                * If FSGSBASE is enabled, we can't make any useful guesses
> > +                * about the base, and user code expects us to save the current
> > +                * value.  Fortunately, reading the base directly is efficient.
> > +                */
> > +               task->thread.fsbase = rdfsbase();
> > +               task->thread.gsbase = rd_inactive_gsbase();
> > +       } else {
> > +               save_base_legacy(task, task->thread.fsindex, FS);
> > +               save_base_legacy(task, task->thread.gsindex, GS);
> > +       }
> >  }
> >
> >  #if IS_ENABLED(CONFIG_KVM)
> > @@ -318,10 +328,22 @@ static __always_inline void
> load_seg_legacy(unsigned short prev_index,
> >  static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
> >                                               struct thread_struct *next)
> >  {
> > -       load_seg_legacy(prev->fsindex, prev->fsbase,
> > -                       next->fsindex, next->fsbase, FS);
> > -       load_seg_legacy(prev->gsindex, prev->gsbase,
> > -                       next->gsindex, next->gsbase, GS);
> > +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> > +               /* Update the FS and GS selectors if they could have changed. */
> > +               if (unlikely(prev->fsindex || next->fsindex))
> > +                       loadseg(FS, next->fsindex);
> > +               if (unlikely(prev->gsindex || next->gsindex))
> > +                       loadseg(GS, next->gsindex);
> > +
> > +               /* Update the bases. */
> > +               wrfsbase(next->fsbase);
> > +               wr_inactive_gsbase(next->gsbase);
> 
> Aha, I see what you're doing with the FSGSBASE-optimized version being
> out of line.  But it's way too unclear from the code.  You should name
> the helper wrgsbase_inactive or maybe __wrgsbase_inactive() to
> emphasize that you're literally using the WRGSBASE instruction.  (Or
> it's Xen PV equivalent.  Hmm.)

Okay. Will rename the relevants.


* Re: [Xen-devel] [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-24 19:16   ` Andy Lutomirski
@ 2018-10-24 19:41     ` Andrew Cooper
  2018-10-25  6:09       ` Juergen Gross
  2018-10-25  7:32     ` Bae, Chang Seok
  2018-10-25 23:16     ` Andy Lutomirski
  2 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2018-10-24 19:41 UTC (permalink / raw)
  To: Andy Lutomirski, Bae, Chang Seok, Boris Ostrovsky, Juergen Gross,
	xen-devel
  Cc: Ravi V. Shankar, Andi Kleen, Dave Hansen, LKML, Metzger,
	Markus T, H. Peter Anvin, Thomas Gleixner, Ingo Molnar

On 24/10/18 20:16, Andy Lutomirski wrote:
> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>> The helper functions will switch on faster accesses to FSBASE and GSBASE
>> when the FSGSBASE feature is enabled.
>>
>> Accessing the user GSBASE needs a couple of SWAPGS operations. They are
>> avoidable if the user GSBASE is saved at kernel entry, updated as it
>> changes, and restored back at kernel exit. However, doing so spends more
>> cycles on the saves and restores; little or no benefit was measured in
>> experiments.
>>
>> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: H. Peter Anvin <hpa@zytor.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> ---
>>  arch/x86/include/asm/fsgsbase.h | 17 +++----
>>  arch/x86/kernel/process_64.c    | 82 +++++++++++++++++++++++++++------
>>  2 files changed, 75 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
>> index b4d4509b786c..e500d771155f 100644
>> --- a/arch/x86/include/asm/fsgsbase.h
>> +++ b/arch/x86/include/asm/fsgsbase.h
>> @@ -57,26 +57,23 @@ static __always_inline void wrgsbase(unsigned long gsbase)
>>                         : "memory");
>>  }
>>
>> +#include <asm/cpufeature.h>
>> +
>>  /* Helper functions for reading/writing FS/GS base */
>>
>>  static inline unsigned long x86_fsbase_read_cpu(void)
>>  {
>>         unsigned long fsbase;
>>
>> -       rdmsrl(MSR_FS_BASE, fsbase);
>> +       if (static_cpu_has(X86_FEATURE_FSGSBASE))
>> +               fsbase = rdfsbase();
>> +       else
>> +               rdmsrl(MSR_FS_BASE, fsbase);
>>
>>         return fsbase;
>>  }
>>
>> -static inline unsigned long x86_gsbase_read_cpu_inactive(void)
>> -{
>> -       unsigned long gsbase;
>> -
>> -       rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
>> -
>> -       return gsbase;
>> -}
>> -
>> +extern unsigned long x86_gsbase_read_cpu_inactive(void);
>>  extern void x86_fsbase_write_cpu(unsigned long fsbase);
>>  extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
>>
>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>> index 31b4755369f0..fcf18046c3d6 100644
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -159,6 +159,36 @@ enum which_selector {
>>         GS
>>  };
>>
>> +/*
>> + * Interrupts are disabled here. Out of line to be protected from kprobes.
>> + */
>> +static noinline __kprobes unsigned long rd_inactive_gsbase(void)
>> +{
>> +       unsigned long gsbase, flags;
>> +
>> +       local_irq_save(flags);
>> +       native_swapgs();
>> +       gsbase = rdgsbase();
>> +       native_swapgs();
>> +       local_irq_restore(flags);
>> +
>> +       return gsbase;
>> +}
> Please fold this into its only caller and make *that* noinline.
>
> Also, this function, and its "write" equivalent, will access the
> *active* gsbase.  So it either needs to be fixed for Xen PV or some
> clear comment and careful auditing needs to be added to ensure that
> it's not used on Xen PV.  Or it needs to be renamed
> native_x86_fsgsbase_... and add paravirt hooks, since Xen PV allows a
> very efficient but different implementation, I think.  The latter is
> probably the right solution.
>
> (Hi Xen people -- how does CR4.FSGSBASE work on Xen?  Is it always
> set?  Never set?  Set only if the guest tries to set it?)

FML.  Seriously - whoever put this code into the hypervisor in the past
did an atrocious job.  After some experimentation, you're going to be
sad and I'm declaring this borderline unusable.

Looks like Xen unconditionally enables CR4.FSGSBASE if it is available.
Therefore, PV guests can use the instructions, even if the bit is clear
in vCR4.

The CPUID bits are exposed to guests by default, and Xen will emulate
vCR4.FSGSBASE being set and cleared.

We don't however emulate swapgs (which is a cpl0 instruction).  The
guest gets handed a #GP[0] instead.

The Linux WRMSR PVop uses the set_segment_base() hypercall instead of
going through the full wrmsr emulation path.

There is no equivalent get hypercall, so the only way I can see of
getting the value is to actually read MSR_KERNEL_GS_BASE and take the
full rdmsr emulation path.

~Andrew


* Re: [Xen-devel] [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-24 19:41     ` [Xen-devel] " Andrew Cooper
@ 2018-10-25  6:09       ` Juergen Gross
  2018-10-25 23:08         ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Juergen Gross @ 2018-10-25  6:09 UTC (permalink / raw)
  To: Andrew Cooper, Andy Lutomirski, Bae, Chang Seok, Boris Ostrovsky,
	xen-devel
  Cc: Ravi V. Shankar, Andi Kleen, Dave Hansen, LKML, Metzger,
	Markus T, H. Peter Anvin, Thomas Gleixner, Ingo Molnar

On 24/10/2018 21:41, Andrew Cooper wrote:
> On 24/10/18 20:16, Andy Lutomirski wrote:
>> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>>> The helper functions will switch on faster accesses to FSBASE and GSBASE
>>> when the FSGSBASE feature is enabled.
>>>
>>> Accessing the user GSBASE needs a couple of SWAPGS operations. They are
>>> avoidable if the user GSBASE is saved at kernel entry, updated as it
>>> changes, and restored back at kernel exit. However, doing so spends more
>>> cycles on the saves and restores; little or no benefit was measured in
>>> experiments.
>>>
>>> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
>>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>>> Cc: Andy Lutomirski <luto@kernel.org>
>>> Cc: H. Peter Anvin <hpa@zytor.com>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>> ---
>>>  arch/x86/include/asm/fsgsbase.h | 17 +++----
>>>  arch/x86/kernel/process_64.c    | 82 +++++++++++++++++++++++++++------
>>>  2 files changed, 75 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
>>> index b4d4509b786c..e500d771155f 100644
>>> --- a/arch/x86/include/asm/fsgsbase.h
>>> +++ b/arch/x86/include/asm/fsgsbase.h
>>> @@ -57,26 +57,23 @@ static __always_inline void wrgsbase(unsigned long gsbase)
>>>                         : "memory");
>>>  }
>>>
>>> +#include <asm/cpufeature.h>
>>> +
>>>  /* Helper functions for reading/writing FS/GS base */
>>>
>>>  static inline unsigned long x86_fsbase_read_cpu(void)
>>>  {
>>>         unsigned long fsbase;
>>>
>>> -       rdmsrl(MSR_FS_BASE, fsbase);
>>> +       if (static_cpu_has(X86_FEATURE_FSGSBASE))
>>> +               fsbase = rdfsbase();
>>> +       else
>>> +               rdmsrl(MSR_FS_BASE, fsbase);
>>>
>>>         return fsbase;
>>>  }
>>>
>>> -static inline unsigned long x86_gsbase_read_cpu_inactive(void)
>>> -{
>>> -       unsigned long gsbase;
>>> -
>>> -       rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
>>> -
>>> -       return gsbase;
>>> -}
>>> -
>>> +extern unsigned long x86_gsbase_read_cpu_inactive(void);
>>>  extern void x86_fsbase_write_cpu(unsigned long fsbase);
>>>  extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
>>>
>>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>>> index 31b4755369f0..fcf18046c3d6 100644
>>> --- a/arch/x86/kernel/process_64.c
>>> +++ b/arch/x86/kernel/process_64.c
>>> @@ -159,6 +159,36 @@ enum which_selector {
>>>         GS
>>>  };
>>>
>>> +/*
>>> + * Interrupts are disabled here. Out of line to be protected from kprobes.
>>> + */
>>> +static noinline __kprobes unsigned long rd_inactive_gsbase(void)
>>> +{
>>> +       unsigned long gsbase, flags;
>>> +
>>> +       local_irq_save(flags);
>>> +       native_swapgs();
>>> +       gsbase = rdgsbase();
>>> +       native_swapgs();
>>> +       local_irq_restore(flags);
>>> +
>>> +       return gsbase;
>>> +}
>> Please fold this into its only caller and make *that* noinline.
>>
>> Also, this function, and its "write" equivalent, will access the
>> *active* gsbase.  So it either needs to be fixed for Xen PV or some
>> clear comment and careful auditing needs to be added to ensure that
>> it's not used on Xen PV.  Or it needs to be renamed
>> native_x86_fsgsbase_... and add paravirt hooks, since Xen PV allows a
>> very efficient but different implementation, I think.  The latter is
>> probably the right solution.
>>
>> (Hi Xen people -- how does CR4.FSGSBASE work on Xen?  Is it always
>> set?  Never set?  Set only if the guest tries to set it?)
> 
> FML.  Seriously - whoever put this code into the hypervisor in the past
> did an atrocious job.  After some experimentation, you're going to be
> sad and I'm declaring this borderline unusable.
> 
> Looks like Xen unconditionally enables CR4.FSGSBASE if it is available.
> Therefore, PV guests can use the instructions, even if the bit is clear
> in vCR4.
> 
> The CPUID bits are exposed to guests by default, and Xen will emulate
> vCR4.FSGSBASE being set and cleared.
> 
> We don't however emulate swapgs (which is a cpl0 instruction).  The
> guest gets handed a #GP[0] instead.
> 
> The Linux WRMSR PVop uses the set_segment_base() hypercall instead of
> going through the full wrmsr emulation path.
> 
> There is no equivalent get hypercall, so the only way I can see of
> getting the value is to actually read MSR_KERNEL_GS_BASE and take the
> full rdmsr emulation path.

Or shadow the value in a percpu variable.


Juergen


* Re: [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-24 19:16   ` Andy Lutomirski
  2018-10-24 19:41     ` [Xen-devel] " Andrew Cooper
@ 2018-10-25  7:32     ` Bae, Chang Seok
  2018-10-25 23:00       ` Andy Lutomirski
  2018-10-25 23:16     ` Andy Lutomirski
  2 siblings, 1 reply; 34+ messages in thread
From: Bae, Chang Seok @ 2018-10-25  7:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Boris Ostrovsky, Juergen Gross, xen-devel, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andi Kleen, Dave Hansen,
	Metzger, Markus T, Shankar, Ravi V, LKML


> On Oct 24, 2018, at 12:16, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>> void x86_fsbase_write_cpu(unsigned long fsbase)
>> {
>> -       /*
>> -        * Set the selector to 0 as a notion, that the segment base is
>> -        * overwritten, which will be checked for skipping the segment load
>> -        * during context switch.
>> -        */
>> -       loadseg(FS, 0);
>> -       wrmsrl(MSR_FS_BASE, fsbase);
>> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
>> +               wrfsbase(fsbase);
>> +       } else {
>> +               /*
>> +                * Set the selector to 0 as a notion, that the segment base is
>> +                * overwritten, which will be checked for skipping the segment load
>> +                * during context switch.
>> +                */
>> +               loadseg(FS, 0);
>> +               wrmsrl(MSR_FS_BASE, fsbase);
>> +       }
>> }
>> 
>> void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
>> {
>> -       /* Set the selector to 0 for the same reason as %fs above. */
>> -       loadseg(GS, 0);
>> -       wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
>> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
>> +               wr_inactive_gsbase(gsbase);
>> +       } else {
>> +               /* Set the selector to 0 for the same reason as %fs above. */
>> +               loadseg(GS, 0);
>> +               wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> 
> I still don't get what this code is trying to do.  See other email.  I
> think it will straight up crash the kernel on some CPUs, since writing
> > 0 to %gs will zero out the *active* base on some CPUs.
> 

On those CPUs, how did the old do_arch_prctl_64() work?
loadseg(GS, 0) eventually hits the native_load_gs_index entry, where the
actual mov …, %gs is wrapped by two SWAPGSes. So, it won’t cause the side
effect of overwriting the *active* base, I think.

> I think that, if you really want some fancy optimization for the
> non-FSGSBASE case, you need to pull that out into the callers of these
> helpers.



* Re: [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-25  7:32     ` Bae, Chang Seok
@ 2018-10-25 23:00       ` Andy Lutomirski
  2018-10-25 23:03         ` Bae, Chang Seok
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-25 23:00 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andrew Lutomirski, Boris Ostrovsky, Juergen Gross, xen-devel,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Metzger, Markus T, Ravi V. Shankar, LKML

On Thu, Oct 25, 2018 at 12:32 AM Bae, Chang Seok
<chang.seok.bae@intel.com> wrote:
>
>
> > On Oct 24, 2018, at 12:16, Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >> void x86_fsbase_write_cpu(unsigned long fsbase)
> >> {
> >> -       /*
> >> -        * Set the selector to 0 as a notion, that the segment base is
> >> -        * overwritten, which will be checked for skipping the segment load
> >> -        * during context switch.
> >> -        */
> >> -       loadseg(FS, 0);
> >> -       wrmsrl(MSR_FS_BASE, fsbase);
> >> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> >> +               wrfsbase(fsbase);
> >> +       } else {
> >> +               /*
> >> +                * Set the selector to 0 as a notion, that the segment base is
> >> +                * overwritten, which will be checked for skipping the segment load
> >> +                * during context switch.
> >> +                */
> >> +               loadseg(FS, 0);
> >> +               wrmsrl(MSR_FS_BASE, fsbase);
> >> +       }
> >> }
> >>
> >> void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
> >> {
> >> -       /* Set the selector to 0 for the same reason as %fs above. */
> >> -       loadseg(GS, 0);
> >> -       wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> >> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> >> +               wr_inactive_gsbase(gsbase);
> >> +       } else {
> >> +               /* Set the selector to 0 for the same reason as %fs above. */
> >> +               loadseg(GS, 0);
> >> +               wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> >
> > I still don't get what this code is trying to do.  See other email.  I
> > think it will straight up crash the kernel on some CPUs, since writing
> > 0 to %gs will zero out the *active* base on some CPUs.
> >
>
> On those CPUs, how did the old do_arch_prctl_64() work?
> loadseg(GS, 0) eventually hits the native_load_gs_index entry, where actual
> mov …, %gs is wrapped by two SWAPGSes. So, it won’t cause the side effect
> of overwriting the *active* base, I think.
>
> > I think that, if you really want some fancy optimization for the
> > non-FSGSBASE case, you need to pull that out into the callers of these
> > helpers.
>

I was thinking of loadsegment, not loadseg.  Sorry!


* Re: [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-25 23:00       ` Andy Lutomirski
@ 2018-10-25 23:03         ` Bae, Chang Seok
  0 siblings, 0 replies; 34+ messages in thread
From: Bae, Chang Seok @ 2018-10-25 23:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Boris Ostrovsky, Juergen Gross, xen-devel, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andi Kleen, Dave Hansen,
	Metzger, Markus T, Shankar, Ravi V, LKML



> On Oct 25, 2018, at 16:00, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Thu, Oct 25, 2018 at 12:32 AM Bae, Chang Seok
> <chang.seok.bae@intel.com> wrote:
>> 
>> 
>>> On Oct 24, 2018, at 12:16, Andy Lutomirski <luto@kernel.org> wrote:
>>> 
>>> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>>>> void x86_fsbase_write_cpu(unsigned long fsbase)
>>>> {
>>>> -       /*
>>>> -        * Set the selector to 0 as a notion, that the segment base is
>>>> -        * overwritten, which will be checked for skipping the segment load
>>>> -        * during context switch.
>>>> -        */
>>>> -       loadseg(FS, 0);
>>>> -       wrmsrl(MSR_FS_BASE, fsbase);
>>>> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
>>>> +               wrfsbase(fsbase);
>>>> +       } else {
>>>> +               /*
>>>> +                * Set the selector to 0 as a notion, that the segment base is
>>>> +                * overwritten, which will be checked for skipping the segment load
>>>> +                * during context switch.
>>>> +                */
>>>> +               loadseg(FS, 0);
>>>> +               wrmsrl(MSR_FS_BASE, fsbase);
>>>> +       }
>>>> }
>>>> 
>>>> void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
>>>> {
>>>> -       /* Set the selector to 0 for the same reason as %fs above. */
>>>> -       loadseg(GS, 0);
>>>> -       wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
>>>> +       if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
>>>> +               wr_inactive_gsbase(gsbase);
>>>> +       } else {
>>>> +               /* Set the selector to 0 for the same reason as %fs above. */
>>>> +               loadseg(GS, 0);
>>>> +               wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
>>> 
>>> I still don't get what this code is trying to do.  See other email.  I
>>> think it will straight up crash the kernel on some CPUs, since writing
>>> 0 to %gs will zero out the *active* base on some CPUs.
>>> 
>> 
>> On those CPUs, how did the old do_arch_prctl_64() work?
>> loadseg(GS, 0) eventually hits the native_load_gs_index entry, where actual
>> mov …, %gs is wrapped by two SWAPGSes. So, it won’t cause the side effect
>> of overwriting the *active* base, I think.
>> 
>>> I think that, if you really want some fancy optimization for the
>>> non-FSGSBASE case, you need to pull that out into the callers of these
>>> helpers.
>> 
> 
> I was thinking of loadsegment, not loadseg.  Sorry!

No problem!  Appreciate your reviews.

Chang



* Re: [Xen-devel] [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-25  6:09       ` Juergen Gross
@ 2018-10-25 23:08         ` Andrew Cooper
  2018-10-25 23:11           ` Andy Lutomirski
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2018-10-25 23:08 UTC (permalink / raw)
  To: Juergen Gross, Andy Lutomirski, Bae, Chang Seok, Boris Ostrovsky,
	xen-devel
  Cc: Ravi V. Shankar, Andi Kleen, Dave Hansen, LKML, Metzger,
	Markus T, H. Peter Anvin, Thomas Gleixner, Ingo Molnar

On 25/10/2018 07:09, Juergen Gross wrote:
> On 24/10/2018 21:41, Andrew Cooper wrote:
>> On 24/10/18 20:16, Andy Lutomirski wrote:
>>> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>>>> The helper functions will switch to faster accesses to FSBASE and GSBASE
>>>> when the FSGSBASE feature is enabled.
>>>>
>>>> Accessing the user GSBASE needs a couple of SWAPGS operations. They could
>>>> be avoided if the user GSBASE were saved at kernel entry, updated as it
>>>> changes, and restored at kernel exit. However, that appears to spend more
>>>> cycles on the saves and restores; little or no benefit was measured in
>>>> experiments.
>>>>
>>>> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
>>>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>>>> Cc: Andy Lutomirski <luto@kernel.org>
>>>> Cc: H. Peter Anvin <hpa@zytor.com>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Ingo Molnar <mingo@kernel.org>
>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>>> ---
>>>>  arch/x86/include/asm/fsgsbase.h | 17 +++----
>>>>  arch/x86/kernel/process_64.c    | 82 +++++++++++++++++++++++++++------
>>>>  2 files changed, 75 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
>>>> index b4d4509b786c..e500d771155f 100644
>>>> --- a/arch/x86/include/asm/fsgsbase.h
>>>> +++ b/arch/x86/include/asm/fsgsbase.h
>>>> @@ -57,26 +57,23 @@ static __always_inline void wrgsbase(unsigned long gsbase)
>>>>                         : "memory");
>>>>  }
>>>>
>>>> +#include <asm/cpufeature.h>
>>>> +
>>>>  /* Helper functions for reading/writing FS/GS base */
>>>>
>>>>  static inline unsigned long x86_fsbase_read_cpu(void)
>>>>  {
>>>>         unsigned long fsbase;
>>>>
>>>> -       rdmsrl(MSR_FS_BASE, fsbase);
>>>> +       if (static_cpu_has(X86_FEATURE_FSGSBASE))
>>>> +               fsbase = rdfsbase();
>>>> +       else
>>>> +               rdmsrl(MSR_FS_BASE, fsbase);
>>>>
>>>>         return fsbase;
>>>>  }
>>>>
>>>> -static inline unsigned long x86_gsbase_read_cpu_inactive(void)
>>>> -{
>>>> -       unsigned long gsbase;
>>>> -
>>>> -       rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
>>>> -
>>>> -       return gsbase;
>>>> -}
>>>> -
>>>> +extern unsigned long x86_gsbase_read_cpu_inactive(void);
>>>>  extern void x86_fsbase_write_cpu(unsigned long fsbase);
>>>>  extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
>>>>
>>>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>>>> index 31b4755369f0..fcf18046c3d6 100644
>>>> --- a/arch/x86/kernel/process_64.c
>>>> +++ b/arch/x86/kernel/process_64.c
>>>> @@ -159,6 +159,36 @@ enum which_selector {
>>>>         GS
>>>>  };
>>>>
>>>> +/*
>>>> + * Interrupts are disabled here. Out of line to be protected from kprobes.
>>>> + */
>>>> +static noinline __kprobes unsigned long rd_inactive_gsbase(void)
>>>> +{
>>>> +       unsigned long gsbase, flags;
>>>> +
>>>> +       local_irq_save(flags);
>>>> +       native_swapgs();
>>>> +       gsbase = rdgsbase();
>>>> +       native_swapgs();
>>>> +       local_irq_restore(flags);
>>>> +
>>>> +       return gsbase;
>>>> +}
>>> Please fold this into its only caller and make *that* noinline.
>>>
>>> Also, this function, and its "write" equivalent, will access the
>>> *active* gsbase.  So it either needs to be fixed for Xen PV or some
>>> clear comment and careful auditing needs to be added to ensure that
>>> it's not used on Xen PV.  Or it needs to be renamed
>>> native_x86_fsgsbase_... and add paravirt hooks, since Xen PV allows a
>>> very efficient but different implementation, I think.  The latter is
>>> probably the right solution.
>>>
>>> (Hi Xen people -- how does CR4.FSGSBASE work on Xen?  Is it always
>>> set?  Never set?  Set only if the guest tries to set it?)
>> FML.  Seriously - whoever put this code into the hypervisor in the past
>> did an atrocious job.  After some experimentation, you're going to be
>> sad and I'm declaring this borderline unusable.
>>
>> Looks like Xen unconditionally enabled CR4.FSGSBASE if it is available. 
>> Therefore, PV guests can use the instructions, even if the bit is clear
>> in vCR4.
>>
>> The CPUID bits are exposed to guests by default, and Xen will emulate
>> vCR4.FSGSBASE being set and cleared.
>>
>> We don't however emulate swapgs (which is a cpl0 instruction).  The
>> guest gets handed a #GP[0] instead.
>>
>> The Linux WRMSR PVop uses the set_segment_base() hypercall instead of
>> going through the full wrmsr emulation path.
>>
>> There is no equivalent get hypercall, so the only way I can see of
>> getting the value is to actually read MSR_KERNEL_GS_BASE and take the
>> full rdmsr emulation path.
> Or shadow the value in a percpu variable.

Hmm true, so long as no paths try to use native_rd{fs,gs}base() to
bypass the PVop.

~Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-25 23:08         ` Andrew Cooper
@ 2018-10-25 23:11           ` Andy Lutomirski
  2018-10-25 23:14             ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-25 23:11 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Andrew Lutomirski, Bae, Chang Seok,
	Boris Ostrovsky, xen-devel, Ravi V. Shankar, Andi Kleen,
	Dave Hansen, LKML, Metzger, Markus T, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar

On Thu, Oct 25, 2018 at 4:09 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 25/10/2018 07:09, Juergen Gross wrote:
> > On 24/10/2018 21:41, Andrew Cooper wrote:
> >> On 24/10/18 20:16, Andy Lutomirski wrote:
> >>> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >>>> The helper functions will switch to faster accesses to FSBASE and GSBASE
> >>>> when the FSGSBASE feature is enabled.
> >>>>
> >>>> Accessing the user GSBASE needs a couple of SWAPGS operations. They could
> >>>> be avoided if the user GSBASE were saved at kernel entry, updated as it
> >>>> changes, and restored at kernel exit. However, that appears to spend more
> >>>> cycles on the saves and restores; little or no benefit was measured in
> >>>> experiments.
> >>>>
> >>>> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> >>>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> >>>> Cc: Andy Lutomirski <luto@kernel.org>
> >>>> Cc: H. Peter Anvin <hpa@zytor.com>
> >>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>>> Cc: Ingo Molnar <mingo@kernel.org>
> >>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >>>> ---
> >>>>  arch/x86/include/asm/fsgsbase.h | 17 +++----
> >>>>  arch/x86/kernel/process_64.c    | 82 +++++++++++++++++++++++++++------
> >>>>  2 files changed, 75 insertions(+), 24 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
> >>>> index b4d4509b786c..e500d771155f 100644
> >>>> --- a/arch/x86/include/asm/fsgsbase.h
> >>>> +++ b/arch/x86/include/asm/fsgsbase.h
> >>>> @@ -57,26 +57,23 @@ static __always_inline void wrgsbase(unsigned long gsbase)
> >>>>                         : "memory");
> >>>>  }
> >>>>
> >>>> +#include <asm/cpufeature.h>
> >>>> +
> >>>>  /* Helper functions for reading/writing FS/GS base */
> >>>>
> >>>>  static inline unsigned long x86_fsbase_read_cpu(void)
> >>>>  {
> >>>>         unsigned long fsbase;
> >>>>
> >>>> -       rdmsrl(MSR_FS_BASE, fsbase);
> >>>> +       if (static_cpu_has(X86_FEATURE_FSGSBASE))
> >>>> +               fsbase = rdfsbase();
> >>>> +       else
> >>>> +               rdmsrl(MSR_FS_BASE, fsbase);
> >>>>
> >>>>         return fsbase;
> >>>>  }
> >>>>
> >>>> -static inline unsigned long x86_gsbase_read_cpu_inactive(void)
> >>>> -{
> >>>> -       unsigned long gsbase;
> >>>> -
> >>>> -       rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
> >>>> -
> >>>> -       return gsbase;
> >>>> -}
> >>>> -
> >>>> +extern unsigned long x86_gsbase_read_cpu_inactive(void);
> >>>>  extern void x86_fsbase_write_cpu(unsigned long fsbase);
> >>>>  extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
> >>>>
> >>>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> >>>> index 31b4755369f0..fcf18046c3d6 100644
> >>>> --- a/arch/x86/kernel/process_64.c
> >>>> +++ b/arch/x86/kernel/process_64.c
> >>>> @@ -159,6 +159,36 @@ enum which_selector {
> >>>>         GS
> >>>>  };
> >>>>
> >>>> +/*
> >>>> + * Interrupts are disabled here. Out of line to be protected from kprobes.
> >>>> + */
> >>>> +static noinline __kprobes unsigned long rd_inactive_gsbase(void)
> >>>> +{
> >>>> +       unsigned long gsbase, flags;
> >>>> +
> >>>> +       local_irq_save(flags);
> >>>> +       native_swapgs();
> >>>> +       gsbase = rdgsbase();
> >>>> +       native_swapgs();
> >>>> +       local_irq_restore(flags);
> >>>> +
> >>>> +       return gsbase;
> >>>> +}
> >>> Please fold this into its only caller and make *that* noinline.
> >>>
> >>> Also, this function, and its "write" equivalent, will access the
> >>> *active* gsbase.  So it either needs to be fixed for Xen PV or some
> >>> clear comment and careful auditing needs to be added to ensure that
> >>> it's not used on Xen PV.  Or it needs to be renamed
> >>> native_x86_fsgsbase_... and add paravirt hooks, since Xen PV allows a
> >>> very efficient but different implementation, I think.  The latter is
> >>> probably the right solution.
> >>>
> >>> (Hi Xen people -- how does CR4.FSGSBASE work on Xen?  Is it always
> >>> set?  Never set?  Set only if the guest tries to set it?)
> >> FML.  Seriously - whoever put this code into the hypervisor in the past
> >> did an atrocious job.  After some experimentation, you're going to be
> >> sad and I'm declaring this borderline unusable.
> >>
> >> Looks like Xen unconditionally enabled CR4.FSGSBASE if it is available.
> >> Therefore, PV guests can use the instructions, even if the bit is clear
> >> in vCR4.
> >>
> >> The CPUID bits are exposed to guests by default, and Xen will emulate
> >> vCR4.FSGSBASE being set and cleared.
> >>
> >> We don't however emulate swapgs (which is a cpl0 instruction).  The
> >> guest gets handed a #GP[0] instead.
> >>
> >> The Linux WRMSR PVop uses the set_segment_base() hypercall instead of
> >> going through the full wrmsr emulation path.
> >>
> >> There is no equivalent get hypercall, so the only way I can see of
> >> getting the value is to actually read MSR_KERNEL_GS_BASE and take the
> >> full rdmsr emulation path.
> > Or shadow the value in a percpu variable.
>
> Hmm true, so long as no paths try to use native_rd{fs,gs}base() to
> bypass the PVop.

But *user* code can change the base.  How is the kernel supposed to
context-switch the user gsbase?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions
  2018-10-24 19:21     ` Andi Kleen
@ 2018-10-25 23:14       ` Andy Lutomirski
  2018-10-25 23:31         ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-25 23:14 UTC (permalink / raw)
  To: Andi Kleen, Linus Torvalds
  Cc: Andrew Lutomirski, Bae, Chang Seok, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Wed, Oct 24, 2018 at 12:21 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Wed, Oct 24, 2018 at 11:53:54AM -0700, Andy Lutomirski wrote:
> > On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> > >
> > > From: Andi Kleen <ak@linux.intel.com>
> > >
> > > Add C intrinsics and assembler macros for the new FSBASE and GSBASE
> > > instructions.
> > >
> > > Very straightforward. Used in follow-on patches.
> > >
> > > [ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
> > >   make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]
> > >
> > > v2: Use __always_inline
> > >
> > > [ chang: Revise the changelog. Place them in <asm/fsgsbase.h>. Replace
> > >   the macros with GAS-compatible ones. ]
> > >
> > > If GCC supports it, we can add -mfsgsbase to CFLAGS and use the builtins
> > > here for extra performance.
> >
> > Reviewed-by: Andy Lutomirski <luto@kernel.org> # C parts only
> >
> > With the caveat that I'm not convinced that the memory clobbers are
> > needed.  The __force_order trick in special_insns.h would probably be
> > more appropriate.
> >
> > I don't feel qualified to review the asm part without some research.
> > Whereas hpa or Boris could probably review it with their eyes closed.
>
> BTW the other option would be to update the min-binutils requirement
> to 2.21 (currently it is 2.20) and then write it directly without .byte.
> I believe 2.21 added support for these instructions.
>
> (It's only a binutils requirement, don't need gcc support)
>

I'd personally be fine with this.  Linus? Thomas? Ingo?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-25 23:11           ` Andy Lutomirski
@ 2018-10-25 23:14             ` Andrew Cooper
  0 siblings, 0 replies; 34+ messages in thread
From: Andrew Cooper @ 2018-10-25 23:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Juergen Gross, Bae, Chang Seok, Boris Ostrovsky, xen-devel,
	Ravi V. Shankar, Andi Kleen, Dave Hansen, LKML, Metzger,
	Markus T, H. Peter Anvin, Thomas Gleixner, Ingo Molnar

On 26/10/2018 00:11, Andy Lutomirski wrote:
> On Thu, Oct 25, 2018 at 4:09 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 25/10/2018 07:09, Juergen Gross wrote:
>>> On 24/10/2018 21:41, Andrew Cooper wrote:
>>>> On 24/10/18 20:16, Andy Lutomirski wrote:
>>>>> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>>>>>> The helper functions will switch to faster accesses to FSBASE and GSBASE
>>>>>> when the FSGSBASE feature is enabled.
>>>>>>
>>>>>> Accessing the user GSBASE needs a couple of SWAPGS operations. They could
>>>>>> be avoided if the user GSBASE were saved at kernel entry, updated as it
>>>>>> changes, and restored at kernel exit. However, that appears to spend more
>>>>>> cycles on the saves and restores; little or no benefit was measured in
>>>>>> experiments.
>>>>>>
>>>>>> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
>>>>>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>>>>>> Cc: Andy Lutomirski <luto@kernel.org>
>>>>>> Cc: H. Peter Anvin <hpa@zytor.com>
>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>> Cc: Ingo Molnar <mingo@kernel.org>
>>>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>>>>> ---
>>>>>>  arch/x86/include/asm/fsgsbase.h | 17 +++----
>>>>>>  arch/x86/kernel/process_64.c    | 82 +++++++++++++++++++++++++++------
>>>>>>  2 files changed, 75 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
>>>>>> index b4d4509b786c..e500d771155f 100644
>>>>>> --- a/arch/x86/include/asm/fsgsbase.h
>>>>>> +++ b/arch/x86/include/asm/fsgsbase.h
>>>>>> @@ -57,26 +57,23 @@ static __always_inline void wrgsbase(unsigned long gsbase)
>>>>>>                         : "memory");
>>>>>>  }
>>>>>>
>>>>>> +#include <asm/cpufeature.h>
>>>>>> +
>>>>>>  /* Helper functions for reading/writing FS/GS base */
>>>>>>
>>>>>>  static inline unsigned long x86_fsbase_read_cpu(void)
>>>>>>  {
>>>>>>         unsigned long fsbase;
>>>>>>
>>>>>> -       rdmsrl(MSR_FS_BASE, fsbase);
>>>>>> +       if (static_cpu_has(X86_FEATURE_FSGSBASE))
>>>>>> +               fsbase = rdfsbase();
>>>>>> +       else
>>>>>> +               rdmsrl(MSR_FS_BASE, fsbase);
>>>>>>
>>>>>>         return fsbase;
>>>>>>  }
>>>>>>
>>>>>> -static inline unsigned long x86_gsbase_read_cpu_inactive(void)
>>>>>> -{
>>>>>> -       unsigned long gsbase;
>>>>>> -
>>>>>> -       rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
>>>>>> -
>>>>>> -       return gsbase;
>>>>>> -}
>>>>>> -
>>>>>> +extern unsigned long x86_gsbase_read_cpu_inactive(void);
>>>>>>  extern void x86_fsbase_write_cpu(unsigned long fsbase);
>>>>>>  extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
>>>>>>
>>>>>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>>>>>> index 31b4755369f0..fcf18046c3d6 100644
>>>>>> --- a/arch/x86/kernel/process_64.c
>>>>>> +++ b/arch/x86/kernel/process_64.c
>>>>>> @@ -159,6 +159,36 @@ enum which_selector {
>>>>>>         GS
>>>>>>  };
>>>>>>
>>>>>> +/*
>>>>>> + * Interrupts are disabled here. Out of line to be protected from kprobes.
>>>>>> + */
>>>>>> +static noinline __kprobes unsigned long rd_inactive_gsbase(void)
>>>>>> +{
>>>>>> +       unsigned long gsbase, flags;
>>>>>> +
>>>>>> +       local_irq_save(flags);
>>>>>> +       native_swapgs();
>>>>>> +       gsbase = rdgsbase();
>>>>>> +       native_swapgs();
>>>>>> +       local_irq_restore(flags);
>>>>>> +
>>>>>> +       return gsbase;
>>>>>> +}
>>>>> Please fold this into its only caller and make *that* noinline.
>>>>>
>>>>> Also, this function, and its "write" equivalent, will access the
>>>>> *active* gsbase.  So it either needs to be fixed for Xen PV or some
>>>>> clear comment and careful auditing needs to be added to ensure that
>>>>> it's not used on Xen PV.  Or it needs to be renamed
>>>>> native_x86_fsgsbase_... and add paravirt hooks, since Xen PV allows a
>>>>> very efficient but different implementation, I think.  The latter is
>>>>> probably the right solution.
>>>>>
>>>>> (Hi Xen people -- how does CR4.FSGSBASE work on Xen?  Is it always
>>>>> set?  Never set?  Set only if the guest tries to set it?)
>>>> FML.  Seriously - whoever put this code into the hypervisor in the past
>>>> did an atrocious job.  After some experimentation, you're going to be
>>>> sad and I'm declaring this borderline unusable.
>>>>
>>>> Looks like Xen unconditionally enabled CR4.FSGSBASE if it is available.
>>>> Therefore, PV guests can use the instructions, even if the bit is clear
>>>> in vCR4.
>>>>
>>>> The CPUID bits are exposed to guests by default, and Xen will emulate
>>>> vCR4.FSGSBASE being set and cleared.
>>>>
>>>> We don't however emulate swapgs (which is a cpl0 instruction).  The
>>>> guest gets handed a #GP[0] instead.
>>>>
>>>> The Linux WRMSR PVop uses the set_segment_base() hypercall instead of
>>>> going through the full wrmsr emulation path.
>>>>
>>>> There is no equivalent get hypercall, so the only way I can see of
>>>> getting the value is to actually read MSR_KERNEL_GS_BASE and take the
>>>> full rdmsr emulation path.
>>> Or shadow the value in a percpu variable.
>> Hmm true, so long as no paths try to use native_rd{fs,gs}base() to
>> bypass the PVop.
> But *user* code can change the base.  How is the kernel supposed to
> context-switch the user gsbase?

user code can change the user gs base.

Xen will switch user/kernel base as appropriate on context switch so the
kernel is entered on the kernel gs base.

But you are right - there is no way for Linux to peek at the current
user gs base without reading MSR_GS_SHADOW.  (The user gs base can be
set via a hypercall, but not obtained).

~Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2018-10-24 19:16   ` Andy Lutomirski
  2018-10-24 19:41     ` [Xen-devel] " Andrew Cooper
  2018-10-25  7:32     ` Bae, Chang Seok
@ 2018-10-25 23:16     ` Andy Lutomirski
  2 siblings, 0 replies; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-25 23:16 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Bae, Chang Seok, Boris Ostrovsky, Juergen Gross, xen-devel,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Metzger, Markus T, Ravi V. Shankar, LKML

On Wed, Oct 24, 2018 at 12:16 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:

> > +/*
> > + * Interrupts are disabled here. Out of line to be protected from kprobes.
> > + */
> > +static noinline __kprobes unsigned long rd_inactive_gsbase(void)
> > +{
> > +       unsigned long gsbase, flags;
> > +
> > +       local_irq_save(flags);
> > +       native_swapgs();
> > +       gsbase = rdgsbase();
> > +       native_swapgs();
> > +       local_irq_restore(flags);
> > +
> > +       return gsbase;
> > +}
>
> Please fold this into its only caller and make *that* noinline.
>

On further reading of the whole series, I retract this particular
comment.  But I do think that __rdgsbase_inactive() would be a better
name.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions
  2018-10-25 23:14       ` Andy Lutomirski
@ 2018-10-25 23:31         ` Linus Torvalds
  2018-10-26  0:09           ` Andy Lutomirski
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2018-10-25 23:31 UTC (permalink / raw)
  To: luto
  Cc: ak, chang.seok.bae, Ingo Molnar, tglx, Peter Anvin, dave.hansen,
	markus.t.metzger, ravi.v.shankar, Linux Kernel Mailing List

On Thu, Oct 25, 2018 at 4:14 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Oct 24, 2018 at 12:21 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > BTW the other option would be to update the min-binutils requirement
> > to 2.21 (currently it is 2.20) and then write it directly without .byte.
> > I believe 2.21 added support for these instructions.
> >
> > (It's only a binutils requirement, don't need gcc support)
>
> I'd personally be fine with this.  Linus? Thomas? Ingo?

I always vote for "require modern tools" as long as it doesn't cause problems.

binutils-2.21 is something like seven years old by now, but the real
issue would be what versions distros are actually shipping. I don't
want people to have to build their own binutils just to build a
kernel.

It's usually some ancient enterprise distro that is stuck on old
versions. Anybody have any idea?

                 Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions
  2018-10-25 23:31         ` Linus Torvalds
@ 2018-10-26  0:09           ` Andy Lutomirski
  0 siblings, 0 replies; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-26  0:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Lutomirski, Andi Kleen, Bae, Chang Seok, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Dave Hansen, Metzger, Markus T,
	Ravi V. Shankar, LKML

On Thu, Oct 25, 2018 at 4:31 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Oct 25, 2018 at 4:14 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Wed, Oct 24, 2018 at 12:21 PM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > > BTW the other option would be to update the min-binutils requirement
> > > to 2.21 (currently it is 2.20) and then write it directly without .byte.
> > > I believe 2.21 added support for these instructions.
> > >
> > > (It's only a binutils requirement, don't need gcc support)
> >
> > I'd personally be fine with this.  Linus? Thomas? Ingo?
>
> I always vote for "require modern tools" as long as it doesn't cause problems.
>
> binutils-2.21 is something like seven years old by now, but the real
> issue would be what versions distros are actually shipping. I don't
> want people to have to build their own binutils just to build a
> kernel.
>
> It's usually some ancient enterprise distro that is stuck on old
> versions. Anybody have any idea?
>

With some basic Googling:

CentOS 6 is binutils 2.23.  CentOS 5 is EOL.  RHEL 5 has "extended
life", which means that it's officially zombified and paying customers
can still download (unsupported) packages.

SLES 11 is binutils 2.19, which is already unsupported.  SLES 12 is 2.24.

So I would guess we're okay and we can bump the requirement to 2.21.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 07/12] x86/fsgsbase/64: Introduce the new FIND_PERCPU_BASE macro
  2018-10-23 18:42 ` [v3 07/12] x86/fsgsbase/64: Introduce the new FIND_PERCPU_BASE macro Chang S. Bae
@ 2018-10-26  0:25   ` Andy Lutomirski
  2018-10-26  0:59     ` Nadav Amit
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2018-10-26  0:25 UTC (permalink / raw)
  To: Bae, Chang Seok, Nadav Amit
  Cc: Ingo Molnar, Thomas Gleixner, Andrew Lutomirski, H. Peter Anvin,
	Andi Kleen, Dave Hansen, Metzger, Markus T, Ravi V. Shankar,
	LKML

On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> GSBASE is used to find per-CPU data in the kernel. But when it is unknown,
> the per-CPU base can be found from the per_cpu_offset table with a CPU NR.
> The CPU NR is extracted from the limit field of the CPUNODE entry in GDT,
> or by the RDPID instruction.
>
> Also, add the GAS-compatible RDPID macro.
>
> The new macro will be used on a following patch.
>
> Suggested-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/include/asm/fsgsbase.h | 52 +++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/inst.h     | 15 ++++++++++
>  2 files changed, 67 insertions(+)
>
> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
> index e500d771155f..0c2d7d8a8c01 100644
> --- a/arch/x86/include/asm/fsgsbase.h
> +++ b/arch/x86/include/asm/fsgsbase.h
> @@ -111,6 +111,58 @@ extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
>         MODRM 0xd0 wrgsbase_opd 1
>  .endm
>
> +#if CONFIG_SMP
> +
> +/*
> + * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
> + * We normally use %gs for accessing per-CPU data, but we are setting up
> + * %gs here and obviously can not use %gs itself to access per-CPU data.
> + */
> +.macro FIND_PERCPU_BASE_RDPID reg:req
> +       /*
> +        * The CPU/node NR is initialized earlier, directly in cpu_init().
> +        */
> +       RDPID   \reg

I would suggest that you instead add a macro LOAD_CPU_AND_NODE \reg
and have that macro contain the alternative.  It can switch between
RDPID and LSL.  This way you avoid duplicating the rest of it.

This should end up in the same header as __getcpu() -- it probably
makes sense to just move __getcpu() for this purpose.

Also, hpa and Nadav, shouldn't asm/inst.h end up in macros.S?

--Andy

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [v3 07/12] x86/fsgsbase/64: Introduce the new FIND_PERCPU_BASE macro
  2018-10-26  0:25   ` Andy Lutomirski
@ 2018-10-26  0:59     ` Nadav Amit
  0 siblings, 0 replies; 34+ messages in thread
From: Nadav Amit @ 2018-10-26  0:59 UTC (permalink / raw)
  To: Andy Lutomirski, Bae, Chang Seok
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Metzger, Markus T, Ravi V. Shankar, LKML

From: Andy Lutomirski
Sent: October 26, 2018 at 12:25:17 AM GMT
> To: Bae, Chang Seok <chang.seok.bae@intel.com>, Nadav Amit <namit@vmware.com>
> Cc: Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>, Andrew Lutomirski <luto@kernel.org>, H. Peter Anvin <hpa@zytor.com>, Andi Kleen <ak@linux.intel.com>, Dave Hansen <dave.hansen@linux.intel.com>, Metzger, Markus T <markus.t.metzger@intel.com>, Ravi V. Shankar <ravi.v.shankar@intel.com>, LKML <linux-kernel@vger.kernel.org>
> Subject: Re: [v3 07/12] x86/fsgsbase/64: Introduce the new FIND_PERCPU_BASE macro
> 
> 
> On Tue, Oct 23, 2018 at 11:43 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>> GSBASE is used to find per-CPU data in the kernel. But when it is unknown,
>> the per-CPU base can be found from the per_cpu_offset table with a CPU NR.
>> The CPU NR is extracted from the limit field of the CPUNODE entry in GDT,
>> or by the RDPID instruction.
>> 
>> Also, add the GAS-compatible RDPID macro.
>> 
>> The new macro will be used on a following patch.
>> 
>> Suggested-by: H. Peter Anvin <hpa@zytor.com>
>> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> ---
>> arch/x86/include/asm/fsgsbase.h | 52 +++++++++++++++++++++++++++++++++
>> arch/x86/include/asm/inst.h     | 15 ++++++++++
>> 2 files changed, 67 insertions(+)
>> 
>> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
>> index e500d771155f..0c2d7d8a8c01 100644
>> --- a/arch/x86/include/asm/fsgsbase.h
>> +++ b/arch/x86/include/asm/fsgsbase.h
>> @@ -111,6 +111,58 @@ extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
>>        MODRM 0xd0 wrgsbase_opd 1
>> .endm
>> 
>> +#if CONFIG_SMP
>> +
>> +/*
>> + * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
>> + * We normally use %gs for accessing per-CPU data, but we are setting up
>> + * %gs here and obviously can not use %gs itself to access per-CPU data.
>> + */
>> +.macro FIND_PERCPU_BASE_RDPID reg:req
>> +       /*
>> +        * The CPU/node NR is initialized earlier, directly in cpu_init().
>> +        */
>> +       RDPID   \reg
> 
> I would suggest that you instead add a macro LOAD_CPU_AND_NODE \reg
> and have that macro contain the alternative.  It can switch between
> RDPID and LSL.  This way you avoid duplicating the rest of it.
> 
> This should end up in the same header as __getcpu() -- it probably
> makes sense to just move __getcpu() for this purpose.
> 
> Also, hpa and Nadav, shouldn't asm/inst.h end up in macros.S?

If there are going to be C uses, yes.

Ingo was concerned that the .s file would become too big, so it should not
be overly abused. In addition, I want to send a patch that recompiles all
the .c files when macros.S changes. I’m worried it will start creating build
problems.


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-10-26  0:59 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-23 18:42 [v3 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
2018-10-23 18:42 ` [v3 01/12] taint: Introduce a new taint flag (insecure) Chang S. Bae
2018-10-24 18:50   ` Andy Lutomirski
2018-10-23 18:42 ` [v3 02/12] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
2018-10-24 18:51   ` Andy Lutomirski
2018-10-23 18:42 ` [v3 03/12] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions Chang S. Bae
2018-10-24 18:53   ` Andy Lutomirski
2018-10-24 19:21     ` Andi Kleen
2018-10-25 23:14       ` Andy Lutomirski
2018-10-25 23:31         ` Linus Torvalds
2018-10-26  0:09           ` Andy Lutomirski
2018-10-23 18:42 ` [v3 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
2018-10-24 19:16   ` Andy Lutomirski
2018-10-24 19:41     ` [Xen-devel] " Andrew Cooper
2018-10-25  6:09       ` Juergen Gross
2018-10-25 23:08         ` Andrew Cooper
2018-10-25 23:11           ` Andy Lutomirski
2018-10-25 23:14             ` Andrew Cooper
2018-10-25  7:32     ` Bae, Chang Seok
2018-10-25 23:00       ` Andy Lutomirski
2018-10-25 23:03         ` Bae, Chang Seok
2018-10-25 23:16     ` Andy Lutomirski
2018-10-23 18:42 ` [v3 05/12] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
2018-10-24 19:21   ` Andy Lutomirski
2018-10-24 19:36     ` Bae, Chang Seok
2018-10-23 18:42 ` [v3 06/12] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions if available Chang S. Bae
2018-10-23 18:42 ` [v3 07/12] x86/fsgsbase/64: Introduce the new FIND_PERCPU_BASE macro Chang S. Bae
2018-10-26  0:25   ` Andy Lutomirski
2018-10-26  0:59     ` Nadav Amit
2018-10-23 18:42 ` [v3 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry Chang S. Bae
2018-10-23 18:42 ` [v3 09/12] selftests/x86/fsgsbase: Test WRGSBASE Chang S. Bae
2018-10-23 18:42 ` [v3 10/12] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit Chang S. Bae
2018-10-23 18:42 ` [v3 11/12] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
2018-10-23 18:42 ` [v3 12/12] x86/fsgsbase/64: Add documentation for FSGSBASE Chang S. Bae

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).