* [PATCH v3 0/6] Enable LKGS instruction
@ 2022-10-13 20:01 Xin Li
  2022-10-13 20:01 ` [PATCH v3 1/6] x86/cpufeature: add cpu feature bit for LKGS Xin Li
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Xin Li @ 2022-10-13 20:01 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, hpa, peterz, brgerst, chang.seok.bae

The LKGS instruction is introduced with the Intel FRED (flexible return and
event delivery) specification: https://cdrdv2.intel.com/v1/dl/getContent/678938.

LKGS is independent of FRED, so we enable it as a standalone CPU feature.

LKGS behaves like the MOV to GS instruction, except that it loads the base
address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s
descriptor cache, which is exactly what the Linux kernel does to load the
user-level GS base.  Thus, with LKGS, there is no need to SWAPGS away from
the kernel GS base.
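
To make the difference concrete, here is a rough sketch, not the series'
code: the first helper uses the existing kernel primitives
(local_irq_save(), native_swapgs()), the second uses the LKGS_DI
byte-sequence macro added in patch 6; both function names are made up.

        /* Classic user GS-base load: swap to the user GSBASE, load the
         * selector (which also reloads the base from the descriptor),
         * swap back -- with interrupts off so nothing runs while the
         * wrong GSBASE is active.
         */
        static void load_user_gs_classic(unsigned int sel)
        {
                unsigned long flags;

                local_irq_save(flags);
                native_swapgs();        /* GSBASE <-> IA32_KERNEL_GS_BASE */
                asm volatile("movl %0, %%gs" : : "r" (sel));
                native_swapgs();        /* swap back to the kernel GS base */
                local_irq_restore(flags);
        }

        /* With LKGS the loaded base goes straight to IA32_KERNEL_GS_BASE,
         * so the swaps and the interrupt games disappear.
         */
        static void load_user_gs_lkgs(unsigned int selector)
        {
                u16 sel = selector;

                asm volatile(LKGS_DI : "+D" (sel));     /* lkgs %di */
        }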

Changes since V2:
* add "" not to show "lkgs" in /proc/cpuinfo.
* mark DI as input and output (+D) as in V1, since the exception handler
  modifies it.

Changes since V1:
* use EX_TYPE_ZERO_REG instead of fixup code in the obsolete .fixup
  section.
* add a comment stating that the LKGS_DI macro will be replaced with
  "lkgs %di" once binutils supports the LKGS instruction.

H. Peter Anvin (Intel) (6):
  x86/cpufeature: add cpu feature bit for LKGS
  x86/opcode: add LKGS instruction to x86-opcode-map
  x86/gsseg: make asm_load_gs_index() take an u16
  x86/gsseg: move local_irq_save/restore() into asm_load_gs_index()
  x86/gsseg: move load_gs_index() to its own header file
  x86/gsseg: use the LKGS instruction if available for load_gs_index()

 arch/x86/entry/entry_64.S                | 28 +++++++++---
 arch/x86/ia32/ia32_signal.c              |  1 +
 arch/x86/include/asm/cpufeatures.h       |  1 +
 arch/x86/include/asm/gsseg.h             | 56 ++++++++++++++++++++++++
 arch/x86/include/asm/mmu_context.h       |  1 +
 arch/x86/include/asm/special_insns.h     | 21 ---------
 arch/x86/kernel/paravirt.c               |  1 +
 arch/x86/kernel/tls.c                    |  1 +
 arch/x86/lib/x86-opcode-map.txt          |  1 +
 tools/arch/x86/include/asm/cpufeatures.h |  1 +
 tools/arch/x86/lib/x86-opcode-map.txt    |  1 +
 11 files changed, 86 insertions(+), 27 deletions(-)
 create mode 100644 arch/x86/include/asm/gsseg.h

-- 
2.34.1


* [PATCH v3 1/6] x86/cpufeature: add cpu feature bit for LKGS
  2022-10-13 20:01 [PATCH v3 0/6] Enable LKGS instruction Xin Li
@ 2022-10-13 20:01 ` Xin Li
  2022-10-13 20:01 ` [PATCH v3 2/6] x86/opcode: add LKGS instruction to x86-opcode-map Xin Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Xin Li @ 2022-10-13 20:01 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, hpa, peterz, brgerst, chang.seok.bae

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

Add the CPU feature bit for LKGS (Load "Kernel" GS).

The LKGS instruction is introduced with the Intel FRED (flexible return
and event delivery) specification:
https://cdrdv2.intel.com/v1/dl/getContent/678938.

LKGS behaves like the MOV to GS instruction, except that it loads
the base address into the IA32_KERNEL_GS_BASE MSR instead of the
GS segment’s descriptor cache, which is exactly what the Linux
kernel does to load a user-level GS base.  Thus, with LKGS, there
is no need to SWAPGS away from the kernel GS base.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Link: https://lkml.org/lkml/2022/10/11/1139
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 tools/arch/x86/include/asm/cpufeatures.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b71f4f2ecdd5..3dc1a48c2796 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LKGS		(12*32+18) /* "" Load "kernel" (userspace) gs */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index ef4775c6db01..9d45071d1730 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LKGS		(12*32+18) /* "" Load "kernel" (userspace) gs */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
-- 
2.34.1


* [PATCH v3 2/6] x86/opcode: add LKGS instruction to x86-opcode-map
  2022-10-13 20:01 [PATCH v3 0/6] Enable LKGS instruction Xin Li
  2022-10-13 20:01 ` [PATCH v3 1/6] x86/cpufeature: add cpu feature bit for LKGS Xin Li
@ 2022-10-13 20:01 ` Xin Li
  2022-10-13 20:01 ` [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16 Xin Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Xin Li @ 2022-10-13 20:01 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, hpa, peterz, brgerst, chang.seok.bae

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

Add the instruction opcode used by LKGS.
The opcode number is per the public FRED draft spec v3.0:
https://cdrdv2.intel.com/v1/dl/getContent/678938.
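
For reference, a rough decoding of the new table entry (my reading of
the opcode-map syntax, not text from the patch):

        6: LKGS Ew (F2)

        - "6":    ModRM.reg = 6 within Grp6 (the two-byte opcode 0f 00)
        - "Ew":   a 16-bit r/m operand
        - "(F2)": a mandatory F2 prefix

i.e. the encoding is f2 0f 00 /6, which matches the LKGS_DI byte
sequence 0xf2,0x0f,0x00,0xf7 (ModRM 0xf7 = mod 11, reg 110, rm 111,
i.e. %di) used later in the series.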

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/lib/x86-opcode-map.txt       | 1 +
 tools/arch/x86/lib/x86-opcode-map.txt | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index d12d1358f96d..5168ee0360b2 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -1047,6 +1047,7 @@ GrpTable: Grp6
 3: LTR Ew
 4: VERR Ew
 5: VERW Ew
+6: LKGS Ew (F2)
 EndTable
 
 GrpTable: Grp7
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index d12d1358f96d..5168ee0360b2 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -1047,6 +1047,7 @@ GrpTable: Grp6
 3: LTR Ew
 4: VERR Ew
 5: VERW Ew
+6: LKGS Ew (F2)
 EndTable
 
 GrpTable: Grp7
-- 
2.34.1


* [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16
  2022-10-13 20:01 [PATCH v3 0/6] Enable LKGS instruction Xin Li
  2022-10-13 20:01 ` [PATCH v3 1/6] x86/cpufeature: add cpu feature bit for LKGS Xin Li
  2022-10-13 20:01 ` [PATCH v3 2/6] x86/opcode: add LKGS instruction to x86-opcode-map Xin Li
@ 2022-10-13 20:01 ` Xin Li
  2022-10-14 12:28   ` David Laight
  2022-10-13 20:01 ` [PATCH v3 4/6] x86/gsseg: move local_irq_save/restore() into asm_load_gs_index() Xin Li
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Xin Li @ 2022-10-13 20:01 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, hpa, peterz, brgerst, chang.seok.bae

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

Let gcc know that only the low 16 bits of the load_gs_index() argument
actually matter, which might allow it to generate slightly better
code. However, do not propagate this into the prototypes of functions
that end up being paravirtualized, to avoid unnecessary changes.
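
As a hedged illustration of the intent (the caller below is
hypothetical, not part of the patch): with a u16 prototype, gcc may
pass an already-16-bit value straight through, since the consumer
only ever reads the low 16 bits:

        extern asmlinkage void asm_load_gs_index(u16 selector);

        void example_caller(u16 sel)            /* hypothetical */
        {
                asm_load_gs_index(sel);         /* no widening/masking needed */
        }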

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/entry/entry_64.S            | 2 +-
 arch/x86/include/asm/special_insns.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9953d966d124..e0c48998d2fb 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -779,7 +779,7 @@ _ASM_NOKPROBE(common_interrupt_return)
 
 /*
  * Reload gs selector with exception handling
- * edi:  new selector
+ *  di:  new selector
  *
  * Is in entry.text as it shouldn't be instrumented.
  */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 35f709f619fb..a71d0e8d4684 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -120,7 +120,7 @@ static inline void native_wbinvd(void)
 	asm volatile("wbinvd": : :"memory");
 }
 
-extern asmlinkage void asm_load_gs_index(unsigned int selector);
+extern asmlinkage void asm_load_gs_index(u16 selector);
 
 static inline void native_load_gs_index(unsigned int selector)
 {
-- 
2.34.1


* [PATCH v3 4/6] x86/gsseg: move local_irq_save/restore() into asm_load_gs_index()
  2022-10-13 20:01 [PATCH v3 0/6] Enable LKGS instruction Xin Li
                   ` (2 preceding siblings ...)
  2022-10-13 20:01 ` [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16 Xin Li
@ 2022-10-13 20:01 ` Xin Li
  2022-10-15  8:51   ` Thomas Gleixner
  2022-10-13 20:01 ` [PATCH v3 5/6] x86/gsseg: move load_gs_index() to its own header file Xin Li
  2022-10-13 20:01 ` [PATCH v3 6/6] x86/gsseg: use the LKGS instruction if available for load_gs_index() Xin Li
  5 siblings, 1 reply; 16+ messages in thread
From: Xin Li @ 2022-10-13 20:01 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, hpa, peterz, brgerst, chang.seok.bae

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

The need to disable/enable interrupts around asm_load_gs_index() is a
consequence of the implementation. Prepare for using the LKGS
instruction, which is locally atomic and therefore doesn't need
interrupts disabled.
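
In C, the reworked assembly below is roughly equivalent to the
following sketch (editor's paraphrase using existing kernel helpers;
the function name is made up):

        static void load_gs_index_sketch(u16 sel)
        {
                unsigned long flags = native_save_fl();         /* pushf; pop */

                if (flags & X86_EFLAGS_IF)
                        native_irq_disable();           /* cli only if IF was set */
                native_swapgs();
                asm volatile("movl %0, %%gs" : : "r" ((unsigned int)sel));
                native_swapgs();
                if (flags & X86_EFLAGS_IF)
                        native_irq_enable();            /* sti only if IF was set */
        }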

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/entry/entry_64.S            | 26 +++++++++++++++++++++-----
 arch/x86/include/asm/special_insns.h |  4 ----
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e0c48998d2fb..cc6ba6672418 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -778,19 +778,35 @@ SYM_CODE_END(common_interrupt_return)
 _ASM_NOKPROBE(common_interrupt_return)
 
 /*
- * Reload gs selector with exception handling
+ * Reload gs selector with exception handling. This is used only on
+ * native, so using swapgs, pushf, popf, cli, sti, ... directly is fine.
+ *
  *  di:  new selector
+ * rax:  scratch register
  *
  * Is in entry.text as it shouldn't be instrumented.
+ *
+ * Note: popf is slow, so use pushf to read IF and then execute cli/sti
+ * if necessary.
  */
 SYM_FUNC_START(asm_load_gs_index)
 	FRAME_BEGIN
+	pushf
+	pop	%rax
+	andl	$X86_EFLAGS_IF, %eax	/* Interrupts enabled? */
+	jz	1f
+	cli
+1:
 	swapgs
 .Lgs_change:
 	ANNOTATE_NOENDBR // error_entry
 	movl	%edi, %gs
 2:	ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
 	swapgs
+	testl	%eax, %eax
+	jz	3f
+	sti
+3:
 	FRAME_END
 	RET
 
@@ -799,12 +815,12 @@ SYM_FUNC_START(asm_load_gs_index)
 	swapgs					/* switch back to user gs */
 .macro ZAP_GS
 	/* This can't be a string because the preprocessor needs to see it. */
-	movl $__USER_DS, %eax
-	movl %eax, %gs
+	movl $__USER_DS, %edi
+	movl %edi, %gs
 .endm
 	ALTERNATIVE "", "ZAP_GS", X86_BUG_NULL_SEG
-	xorl	%eax, %eax
-	movl	%eax, %gs
+	xorl	%edi, %edi
+	movl	%edi, %gs
 	jmp	2b
 
 	_ASM_EXTABLE(.Lgs_change, .Lbad_gs)
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index a71d0e8d4684..6de00dec6564 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -124,11 +124,7 @@ extern asmlinkage void asm_load_gs_index(u16 selector);
 
 static inline void native_load_gs_index(unsigned int selector)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
 	asm_load_gs_index(selector);
-	local_irq_restore(flags);
 }
 
 static inline unsigned long __read_cr4(void)
-- 
2.34.1


* [PATCH v3 5/6] x86/gsseg: move load_gs_index() to its own header file
  2022-10-13 20:01 [PATCH v3 0/6] Enable LKGS instruction Xin Li
                   ` (3 preceding siblings ...)
  2022-10-13 20:01 ` [PATCH v3 4/6] x86/gsseg: move local_irq_save/restore() into asm_load_gs_index() Xin Li
@ 2022-10-13 20:01 ` Xin Li
  2022-10-13 20:01 ` [PATCH v3 6/6] x86/gsseg: use the LKGS instruction if available for load_gs_index() Xin Li
  5 siblings, 0 replies; 16+ messages in thread
From: Xin Li @ 2022-10-13 20:01 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, hpa, peterz, brgerst, chang.seok.bae

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

<asm/cpufeature.h> depends on <asm/special_insns.h>, so in order to be
able to use alternatives in native_load_gs_index(), factor it out of
<asm/special_insns.h> into a separate header file, <asm/gsseg.h>.
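
Roughly, the constraint is this (my simplified reading of the include
chain, not from the patch):

        <asm/cpufeature.h>      /* needed for alternatives/feature bits */
                |
                v  already includes
        <asm/special_insns.h>   /* where native_load_gs_index() lived */

so <asm/special_insns.h> cannot itself pull in the alternatives
machinery without creating an include cycle, while a new, leaf-level
<asm/gsseg.h> can.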

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/ia32/ia32_signal.c          |  1 +
 arch/x86/include/asm/gsseg.h         | 32 ++++++++++++++++++++++++++++
 arch/x86/include/asm/mmu_context.h   |  1 +
 arch/x86/include/asm/special_insns.h | 17 ---------------
 arch/x86/kernel/paravirt.c           |  1 +
 arch/x86/kernel/tls.c                |  1 +
 6 files changed, 36 insertions(+), 17 deletions(-)
 create mode 100644 arch/x86/include/asm/gsseg.h

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index c9c3859322fa..14c739303099 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -34,6 +34,7 @@
 #include <asm/sigframe.h>
 #include <asm/sighandling.h>
 #include <asm/smap.h>
+#include <asm/gsseg.h>
 
 static inline void reload_segments(struct sigcontext_32 *sc)
 {
diff --git a/arch/x86/include/asm/gsseg.h b/arch/x86/include/asm/gsseg.h
new file mode 100644
index 000000000000..5e3b56a17098
--- /dev/null
+++ b/arch/x86/include/asm/gsseg.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_X86_GSSEG_H
+#define _ASM_X86_GSSEG_H
+
+#include <linux/types.h>
+#include <asm/processor.h>
+
+#ifdef CONFIG_X86_64
+
+extern asmlinkage void asm_load_gs_index(u16 selector);
+
+static inline void native_load_gs_index(unsigned int selector)
+{
+	asm_load_gs_index(selector);
+}
+
+#endif /* CONFIG_X86_64 */
+
+#ifndef CONFIG_PARAVIRT_XXL
+
+static inline void load_gs_index(unsigned int selector)
+{
+#ifdef CONFIG_X86_64
+	native_load_gs_index(selector);
+#else
+	loadsegment(gs, selector);
+#endif
+}
+
+#endif /* CONFIG_PARAVIRT_XXL */
+
+#endif /* _ASM_X86_GSSEG_H */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b8d40ddeab00..e01aa74a6de7 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -12,6 +12,7 @@
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
 #include <asm/debugreg.h>
+#include <asm/gsseg.h>
 
 extern atomic64_t last_mm_ctx_id;
 
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 6de00dec6564..cfd9499b617c 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -120,13 +120,6 @@ static inline void native_wbinvd(void)
 	asm volatile("wbinvd": : :"memory");
 }
 
-extern asmlinkage void asm_load_gs_index(u16 selector);
-
-static inline void native_load_gs_index(unsigned int selector)
-{
-	asm_load_gs_index(selector);
-}
-
 static inline unsigned long __read_cr4(void)
 {
 	return native_read_cr4();
@@ -180,16 +173,6 @@ static inline void wbinvd(void)
 	native_wbinvd();
 }
 
-
-static inline void load_gs_index(unsigned int selector)
-{
-#ifdef CONFIG_X86_64
-	native_load_gs_index(selector);
-#else
-	loadsegment(gs, selector);
-#endif
-}
-
 #endif /* CONFIG_PARAVIRT_XXL */
 
 static inline void clflush(volatile void *__p)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 7ca2d46c08cc..00f6a92551d2 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -32,6 +32,7 @@
 #include <asm/special_insns.h>
 #include <asm/tlb.h>
 #include <asm/io_bitmap.h>
+#include <asm/gsseg.h>
 
 /*
  * nop stub, which must not clobber anything *including the stack* to
diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index 3c883e064242..3ffbab0081f4 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -12,6 +12,7 @@
 #include <asm/ldt.h>
 #include <asm/processor.h>
 #include <asm/proto.h>
+#include <asm/gsseg.h>
 
 #include "tls.h"
 
-- 
2.34.1


* [PATCH v3 6/6] x86/gsseg: use the LKGS instruction if available for load_gs_index()
  2022-10-13 20:01 [PATCH v3 0/6] Enable LKGS instruction Xin Li
                   ` (4 preceding siblings ...)
  2022-10-13 20:01 ` [PATCH v3 5/6] x86/gsseg: move load_gs_index() to its own header file Xin Li
@ 2022-10-13 20:01 ` Xin Li
  5 siblings, 0 replies; 16+ messages in thread
From: Xin Li @ 2022-10-13 20:01 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, hpa, peterz, brgerst, chang.seok.bae

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

The LKGS instruction atomically loads a segment descriptor into the
%gs descriptor registers, *except* that %gs.base is unchanged, and the
base is instead loaded into MSR_IA32_KERNEL_GS_BASE, which is exactly
what we want this function to do.
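
One hedged note on the fault handling below (my reading, not stated in
the patch): if LKGS faults on a bad selector, the EX_TYPE_ZERO_REG
fixup zeroes %edi and resumes at label 1, so the load is retried with
the NULL selector -- the same net effect as the .Lbad_gs recovery path
taken on the legacy asm_load_gs_index() route.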

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Link: https://lkml.org/lkml/2022/10/7/352
Link: https://lkml.org/lkml/2022/10/7/373
Link: https://lkml.org/lkml/2022/10/10/1286
---
 arch/x86/include/asm/gsseg.h | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/gsseg.h b/arch/x86/include/asm/gsseg.h
index 5e3b56a17098..7463ac65ef56 100644
--- a/arch/x86/include/asm/gsseg.h
+++ b/arch/x86/include/asm/gsseg.h
@@ -3,15 +3,39 @@
 #define _ASM_X86_GSSEG_H
 
 #include <linux/types.h>
+
+#include <asm/asm.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative.h>
 #include <asm/processor.h>
+#include <asm/nops.h>
 
 #ifdef CONFIG_X86_64
 
 extern asmlinkage void asm_load_gs_index(u16 selector);
 
+/* Replace with "lkgs %di" once binutils supports the LKGS instruction */
+#define LKGS_DI _ASM_BYTES(0xf2,0x0f,0x00,0xf7)
+
 static inline void native_load_gs_index(unsigned int selector)
 {
-	asm_load_gs_index(selector);
+	u16 sel = selector;
+
+	/*
+	 * Note: the fixup is used for the LKGS instruction, but
+	 * it needs to be attached to the primary instruction sequence
+	 * as it isn't something that gets patched.
+	 *
+	 * %rax is provided to the assembly routine as a scratch
+	 * register.
+	 */
+	asm_inline volatile("1:\n"
+			    ALTERNATIVE("call asm_load_gs_index\n",
+					_ASM_BYTES(0x3e) LKGS_DI,
+					X86_FEATURE_LKGS)
+			    _ASM_EXTABLE_TYPE_REG(1b, 1b, EX_TYPE_ZERO_REG, %k[sel])
+			    : ASM_OUTPUT2([sel] "+D" (sel), ASM_CALL_CONSTRAINT)
+			    : : _ASM_AX);
 }
 
 #endif /* CONFIG_X86_64 */
-- 
2.34.1


* RE: [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16
  2022-10-13 20:01 ` [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16 Xin Li
@ 2022-10-14 12:28   ` David Laight
  2022-10-15  0:13     ` Li, Xin3
  2022-10-15  2:41     ` H. Peter Anvin
  0 siblings, 2 replies; 16+ messages in thread
From: David Laight @ 2022-10-14 12:28 UTC (permalink / raw)
  To: 'Xin Li', linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, hpa, peterz, brgerst, chang.seok.bae

From: Xin Li
> Sent: 13 October 2022 21:02
> 
> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
> 
> Let gcc know that only the low 16 bits of load_gs_index() argument
> actually matter. It might allow it to create slightly better
> code. However, do not propagate this into the prototypes of functions
> that end up being paravirtualized, to avoid unnecessary changes.

Using u16 will almost always make the code worse.
At some point the value has to be masked and/or extended
to ensure an out of range value doesn't appear in
a register.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* RE: [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16
  2022-10-14 12:28   ` David Laight
@ 2022-10-15  0:13     ` Li, Xin3
  2022-10-15  2:41     ` H. Peter Anvin
  1 sibling, 0 replies; 16+ messages in thread
From: Li, Xin3 @ 2022-10-15  0:13 UTC (permalink / raw)
  To: David Laight, linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, hpa, peterz, brgerst, Bae, Chang Seok

> >
> > From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
> >
> > Let gcc know that only the low 16 bits of load_gs_index() argument
> > actually matter. It might allow it to create slightly better code.
> > However, do not propagate this into the prototypes of functions that
> > end up being paravirtualized, to avoid unnecessary changes.
> 
> Using u16 will almost always make the code worse.
> At some point the value has to be masked and/or extended to ensure an out of
> range value doesn't appear in a register.

Do you see any potential issue with this patch set?

> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1
> 1PT, UK Registration No: 1397386 (Wales)


* RE: [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16
  2022-10-14 12:28   ` David Laight
  2022-10-15  0:13     ` Li, Xin3
@ 2022-10-15  2:41     ` H. Peter Anvin
  2022-10-17  7:49       ` David Laight
  1 sibling, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2022-10-15  2:41 UTC (permalink / raw)
  To: David Laight, 'Xin Li', linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, peterz, brgerst, chang.seok.bae

On October 14, 2022 5:28:25 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
>From: Xin Li
>> Sent: 13 October 2022 21:02
>> 
>> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
>> 
>> Let gcc know that only the low 16 bits of load_gs_index() argument
>> actually matter. It might allow it to create slightly better
>> code. However, do not propagate this into the prototypes of functions
>> that end up being paravirtualized, to avoid unnecessary changes.
>
>Using u16 will almost always make the code worse.
>At some point the value has to be masked and/or extended
>to ensure an out of range value doesn't appear in
>a register.
>
>	David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
>Registration No: 1397386 (Wales)
>
>

Is that a general statement or are you actually invoking it in this case? This is about it being a narrowing input, *removing* such constraints.

* Re: [PATCH v3 4/6] x86/gsseg: move local_irq_save/restore() into asm_load_gs_index()
  2022-10-13 20:01 ` [PATCH v3 4/6] x86/gsseg: move local_irq_save/restore() into asm_load_gs_index() Xin Li
@ 2022-10-15  8:51   ` Thomas Gleixner
  2022-10-18 17:25     ` Li, Xin3
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2022-10-15  8:51 UTC (permalink / raw)
  To: Xin Li, linux-kernel, x86
  Cc: mingo, bp, dave.hansen, hpa, peterz, brgerst, chang.seok.bae

On Thu, Oct 13 2022 at 13:01, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
>
> The need to disable/enable interrupts around asm_load_gs_index() is a
> consequence of the implementation. Prepare for using the LKGS
> instruction, which is locally atomic and therefore doesn't need
> interrupts disabled.

*Shudder*. We want less ASM not more.

>  static inline void native_load_gs_index(unsigned int selector)
>  {
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
>  	asm_load_gs_index(selector);
> -	local_irq_restore(flags);
>  }

static inline void native_load_gs_index(unsigned int selector)
{
	unsigned long flags;

        if (cpu_feature_enabled(LKGS)) {
              native_lkgs(selector);
        } else {
	      local_irq_save(flags);
              asm_load_gs_index(selector);
	      local_irq_restore(flags);
       }
}

For paravirt enabled kernels we want during feature detection:

        if (cpu_feature_enabled(LKGS)))
        	pv_ops.cpu.load_gs_index = native_lkgs;

No?
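
(For concreteness, a native_lkgs() along these lines could be a thin
wrapper -- an editor's sketch combining this suggestion with the
LKGS_DI macro from patch 6; the fault fixup is omitted here:)

        static inline void native_lkgs(unsigned int selector)
        {
                u16 sel = selector;

                /* lkgs %di: the base goes to IA32_KERNEL_GS_BASE */
                asm volatile(LKGS_DI : "+D" (sel));
        }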

Thanks,

        tglx

* RE: [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16
  2022-10-15  2:41     ` H. Peter Anvin
@ 2022-10-17  7:49       ` David Laight
  2022-10-17 18:39         ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: David Laight @ 2022-10-17  7:49 UTC (permalink / raw)
  To: 'H. Peter Anvin', 'Xin Li', linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, peterz, brgerst, chang.seok.bae

From: H. Peter Anvin
> Sent: 15 October 2022 03:41
> 
> On October 14, 2022 5:28:25 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
> >From: Xin Li
> >> Sent: 13 October 2022 21:02
> >>
> >> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
> >>
> >> Let gcc know that only the low 16 bits of load_gs_index() argument
> >> actually matter. It might allow it to create slightly better
> >> code. However, do not propagate this into the prototypes of functions
> >> that end up being paravirtualized, to avoid unnecessary changes.
> >
> >Using u16 will almost always make the code worse.
> >At some point the value has to be masked and/or extended
> >to ensure an out of range value doesn't appear in
> >a register.
> >
> >	David
> 
> Is that a general statement or are you actually invoking it in this case?
> This is about it being a narrowing input, *removing* such constraints.

It is a general statement.
You suggested you might get better code.
In fact you'll probably get worse code.
It might not matter here, but ...

Most modern calling conventions use cpu registers to pass arguments
and results.
So the compiler is required to ensure that u16 values are in range
in either the caller or called code (or both).
Just because the domain of a value is small doesn't mean that
the best type isn't 'int' or 'unsigned int'.

Additionally (except on x86) any arithmetic on sub-32bit values
requires additional instructions to mask the result.

Even on x86-64 if you index an array with an 'int' the compiler
has to generate code to sign extend the value to 64 bits.
You get better code for 'signed long' or unsigned types.
This is probably true for all 64bit architectures.

Since most CPUs have both sign-extending and zero-extending
loads from memory, it can make sense to use u8 and u16 to
reduce the size of structures.
But for function arguments and function locals it almost
always makes the code worse.
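
A concrete example of the array-index point (editor's illustration,
not from the original mail; typical x86-64 codegen noted in comments):

        long get_int(const long *a, int i)
        {
                return a[i];    /* needs movslq %esi,%rsi before the load */
        }

        long get_ulong(const long *a, unsigned long i)
        {
                return a[i];    /* indexes directly: movq (%rdi,%rsi,8), %rax */
        }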

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

* RE: [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16
  2022-10-17  7:49       ` David Laight
@ 2022-10-17 18:39         ` H. Peter Anvin
  0 siblings, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2022-10-17 18:39 UTC (permalink / raw)
  To: David Laight, 'Xin Li', linux-kernel, x86
  Cc: tglx, mingo, bp, dave.hansen, peterz, brgerst, chang.seok.bae

On October 17, 2022 12:49:41 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
>From: H. Peter Anvin
>> Sent: 15 October 2022 03:41
>> 
>> On October 14, 2022 5:28:25 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
>> >From: Xin Li
>> >> Sent: 13 October 2022 21:02
>> >>
>> >> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
>> >>
>> >> Let gcc know that only the low 16 bits of load_gs_index() argument
>> >> actually matter. It might allow it to create slightly better
>> >> code. However, do not propagate this into the prototypes of functions
>> >> that end up being paravirtualized, to avoid unnecessary changes.
>> >
>> >Using u16 will almost always make the code worse.
>> >At some point the value has to be masked and/or extended
>> >to ensure an out of range value doesn't appear in
>> >a register.
>> >
>> >	David
>> 
>> Is that a general statement or are you actually invoking it in this case?
>> This is about it being a narrowing input, *removing* such constraints.
>
>It is a general statement.
>You suggested you might get better code.
>If fact you'll probably get worse code.
>It might not matter here, but ...
>
>Most modern calling conventions use cpu register to pass arguments
>and results.
>So the compiler is required to ensure that u16 values are in range
>in either the caller or called code (or both).
>Just because the domain of a value is small doesn't mean that
>the best type isn't 'int' or 'unsigned int'.
>
>Additionally (except on x86) any arithmetic on sub-32bit values
>requires additional instructions to mask the result.
>
>Even on x86-64 if you index an array with an 'int' the compiler
>has to generate code to sign extend the value to 64 bits.
>You get better code for 'signed long' or unsigned types.
>This is probably true for all 64bit architectures.
>
>Since (most) cpu have both sign extending an zero extending
>loads from memory, it can make sense to use u8 and u16 to
>reduce the size of structures.
>But for function arguments and function locals it almost
>always makes the code worse.
>
>	David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
>Registration No: 1397386 (Wales)
>

Ok. You are plain incorrect in this case for two reasons:

1. The x86-64 calling convention makes it up to the receiver (callee for arguments, caller for returns) to do such masking of values.

2. The consumer of the values here does not need any masking or extensions.

So this is simply telling the compiler what the programmer knows.
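
Concretely (editor's gloss, hedged): both consumers of the selector
ignore the upper register bits outright, so neither side ever masks:

        movl    %edi, %gs       /* mov to a segment register uses only bits 15:0 */
        lkgs    %di             /* takes a 16-bit operand to begin with */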

* RE: [PATCH v3 4/6] x86/gsseg: move local_irq_save/restore() into asm_load_gs_index()
  2022-10-15  8:51   ` Thomas Gleixner
@ 2022-10-18 17:25     ` Li, Xin3
  2022-10-18 18:13       ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Li, Xin3 @ 2022-10-18 17:25 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, x86
  Cc: mingo, bp, dave.hansen, hpa, peterz, brgerst, Bae, Chang Seok

> >  static inline void native_load_gs_index(unsigned int selector)  {
> > -	unsigned long flags;
> > -
> > -	local_irq_save(flags);
> >  	asm_load_gs_index(selector);
> > -	local_irq_restore(flags);
> >  }
> 
> static inline void native_load_gs_index(unsigned int selector) {
> 	unsigned long flags;
> 
>         if (cpu_feature_enabled(LKGS)) {
>               native_lkgs(selector);
>         } else {
> 	      local_irq_save(flags);
>               asm_load_gs_index(selector);
> 	      local_irq_restore(flags);
>        }
> }
> 
> For paravirt enabled kernels we want during feature detection:
> 
>         if (cpu_feature_enabled(LKGS)))
>         	pv_ops.cpu.load_gs_index = native_lkgs;

If we use static_cpu_has in native_load_gs_index
       if (static_cpu_has(X86_FEATURE_LKGS)) {
               native_lkgs(selector);
       }

We don't have to change pv_ops.cpu.load_gs_index, right?

Thanks!
Xin

> 
> No?
> 
> Thanks,
> 
>         tglx

* RE: [PATCH v3 4/6] x86/gsseg: move local_irq_save/restore() into asm_load_gs_index()
  2022-10-18 17:25     ` Li, Xin3
@ 2022-10-18 18:13       ` H. Peter Anvin
  2022-10-18 21:29         ` Li, Xin3
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2022-10-18 18:13 UTC (permalink / raw)
  To: Li, Xin3, Thomas Gleixner, linux-kernel, x86
  Cc: mingo, bp, dave.hansen, peterz, brgerst, Bae, Chang Seok

On October 18, 2022 10:25:31 AM PDT, "Li, Xin3" <xin3.li@intel.com> wrote:
>> >  static inline void native_load_gs_index(unsigned int selector)  {
>> > -	unsigned long flags;
>> > -
>> > -	local_irq_save(flags);
>> >  	asm_load_gs_index(selector);
>> > -	local_irq_restore(flags);
>> >  }
>> 
>> static inline void native_load_gs_index(unsigned int selector) {
>> 	unsigned long flags;
>> 
>>         if (cpu_feature_enabled(LKGS)) {
>>               native_lkgs(selector);
>>         } else {
>> 	      local_irq_save(flags);
>>               asm_load_gs_index(selector);
>> 	      local_irq_restore(flags);
>>        }
>> }
>> 
>> For paravirt enabled kernels we want during feature detection:
>> 
>>         if (cpu_feature_enabled(LKGS)))
>>         	pv_ops.cpu.load_gs_index = native_lkgs;
>
>If we use static_cpu_has in native_load_gs_index
>       if (static_cpu_has(X86_FEATURE_LKGS)) {
>               native_lkgs(selector);
>       }
>
>We don't have to change pv_ops.cpu.load_gs_index, right?
>
>Thanks!
>Xin
>
>> 
>> No?
>> 
>> Thanks,
>> 
>>         tglx
>

You don't *have* to, but keeping the check inside the function would mean a branch to a branch; patching pv_ops directly would be more efficient. It would strictly be an optimization.
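
Spelled out (editor's sketch of the two call paths; this assumes the
pv_ops indirection used on paravirt kernels):

        /* static_cpu_has() inside the pv op: two patched branches */
        load_gs_index() -> pv_ops.cpu.load_gs_index()
                        -> alternatives-patched jump -> native_lkgs()

        /* pv_ops.cpu.load_gs_index rewritten at feature detection: one step */
        load_gs_index() -> native_lkgs()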

* RE: [PATCH v3 4/6] x86/gsseg: move local_irq_save/restore() into asm_load_gs_index()
  2022-10-18 18:13       ` H. Peter Anvin
@ 2022-10-18 21:29         ` Li, Xin3
  0 siblings, 0 replies; 16+ messages in thread
From: Li, Xin3 @ 2022-10-18 21:29 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, linux-kernel, x86
  Cc: mingo, bp, dave.hansen, peterz, brgerst, Bae, Chang Seok

> On October 18, 2022 10:25:31 AM PDT, "Li, Xin3" <xin3.li@intel.com> wrote:
> >> >  static inline void native_load_gs_index(unsigned int selector)  {
> >> > -	unsigned long flags;
> >> > -
> >> > -	local_irq_save(flags);
> >> >  	asm_load_gs_index(selector);
> >> > -	local_irq_restore(flags);
> >> >  }
> >>
> >> static inline void native_load_gs_index(unsigned int selector) {
> >> 	unsigned long flags;
> >>
> >>         if (cpu_feature_enabled(LKGS)) {
> >>               native_lkgs(selector);
> >>         } else {
> >> 	      local_irq_save(flags);
> >>               asm_load_gs_index(selector);
> >> 	      local_irq_restore(flags);
> >>        }
> >> }
> >>
> >> For paravirt enabled kernels we want during feature detection:
> >>
> >>         if (cpu_feature_enabled(LKGS)))
> >>         	pv_ops.cpu.load_gs_index = native_lkgs;
> >
> >If we use static_cpu_has in native_load_gs_index
> >       if (static_cpu_has(X86_FEATURE_LKGS)) {
> >               native_lkgs(selector);
> >       }
> >
> >We don't have to change pv_ops.cpu.load_gs_index, right?
> >
> >Thanks!
> >Xin
> >
> >>
> >> No?
> >>
> >> Thanks,
> >>
> >>         tglx
> >
> 
> You don't *have* to, but it would mean a branch to a branch, so it would be
> more efficient. It would strictly be an optimization.

Right, the generated object file shows that static_cpu_has is less efficient than ALTERNATIVE.

Thread overview: 16+ messages
2022-10-13 20:01 [PATCH v3 0/6] Enable LKGS instruction Xin Li
2022-10-13 20:01 ` [PATCH v3 1/6] x86/cpufeature: add cpu feature bit for LKGS Xin Li
2022-10-13 20:01 ` [PATCH v3 2/6] x86/opcode: add LKGS instruction to x86-opcode-map Xin Li
2022-10-13 20:01 ` [PATCH v3 3/6] x86/gsseg: make asm_load_gs_index() take an u16 Xin Li
2022-10-14 12:28   ` David Laight
2022-10-15  0:13     ` Li, Xin3
2022-10-15  2:41     ` H. Peter Anvin
2022-10-17  7:49       ` David Laight
2022-10-17 18:39         ` H. Peter Anvin
2022-10-13 20:01 ` [PATCH v3 4/6] x86/gsseg: move local_irq_save/restore() into asm_load_gs_index() Xin Li
2022-10-15  8:51   ` Thomas Gleixner
2022-10-18 17:25     ` Li, Xin3
2022-10-18 18:13       ` H. Peter Anvin
2022-10-18 21:29         ` Li, Xin3
2022-10-13 20:01 ` [PATCH v3 5/6] x86/gsseg: move load_gs_index() to its own header file Xin Li
2022-10-13 20:01 ` [PATCH v3 6/6] x86/gsseg: use the LKGS instruction if available for load_gs_index() Xin Li
