* [PATCH V4 0/6] x86: Don't abuse tss.sp1
@ 2021-02-10 13:39 Lai Jiangshan
  2021-02-10 13:39 ` [PATCH V4 1/6] x86/entry/64: Move cpu_current_top_of_stack out of TSS Lai Jiangshan
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Lai Jiangshan @ 2021-02-10 13:39 UTC (permalink / raw)
  To: linux-kernel, Borislav Petkov
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	x86, H. Peter Anvin, Dave Hansen, Peter Zijlstra, Al Viro,
	Vincenzo Frascino, Joerg Roedel, Ricardo Neri, Reinette Chatre,
	Balbir Singh, Andrew Morton, Gabriel Krisman Bertazi, Kees Cook,
	Frederic Weisbecker, Jens Axboe, Arvind Sankar, Brian Gerst,
	Ard Biesheuvel, Andi Kleen, Mike Rapoport, Mike Hommey,
	Mark Gross, Fenghua Yu, Tony Luck, Anthony Steinhauser, Jay Lang,
	Chang S. Bae

From: Lai Jiangshan <laijs@linux.alibaba.com>

In x86_64, tss.sp1 is reused as cpu_current_top_of_stack.  We'd better
directly use percpu since CR3 and gs_base are correct when it is used.

In x86_32, tss.sp1 is resued as thread.sp0 in three places in entry
code.  We have the correct CR3 and %fs at two of the places.  The last
one is sysenter.  This patchset makes %fs available earlier so that
we can also use percpu in sysenter, and it adds a percpu variable
cpu_current_thread_sp0 to hold thread.sp0 instead of tss.sp1.
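
In code terms (lifted from patch 1 below), the x86_64 side replaces the
alias

	/* The RO copy can't be accessed with this_cpu_xyz(), so use the RW copy. */
	#define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1

with a plain percpu variable:

	DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);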

[V3]: https://lore.kernel.org/lkml/20210127163231.12709-1-jiangshanlai@gmail.com/
[V2]: https://lore.kernel.org/lkml/20210125173444.22696-1-jiangshanlai@gmail.com/
[V1]: https://lore.kernel.org/lkml/20210123084900.3118-1-jiangshanlai@gmail.com/

Changed from V3:
	Update subjects to the imperative mood, as Borislav requested. ^_^
	Update changelog as Borislav suggested.
	Change EXPORT_PER_CPU_SYMBOL to EXPORT_PER_CPU_SYMBOL_GPL.

Changed from V2:
	Add missing "%ss:" reported by Brian Gerst.

Changed from V1:
	Also fix sp1 for x86_32, as Andy requested.
	Update comments in the x86_64 patch as Andy suggested.

Lai Jiangshan (6):
  x86/entry/64: Move cpu_current_top_of_stack out of TSS
  x86/entry/32: Use percpu instead of offset-calculation to get
    thread.sp0 in SWITCH_TO_KERNEL_STACK
  x86/entry/32: Switch to the task stack without emptying the entry
    stack
  x86/entry/32: Restore %fs before switching stack
  x86/entry/32: Use percpu to get thread.sp0 in SYSENTER
  x86/entry/32: Introduce cpu_current_thread_sp0 to replace
    cpu_tss_rw.x86_tss.sp1

 arch/x86/entry/entry_32.S          | 38 +++++++++++++++++-------------
 arch/x86/include/asm/processor.h   | 12 ++--------
 arch/x86/include/asm/switch_to.h   |  8 +------
 arch/x86/include/asm/thread_info.h |  6 -----
 arch/x86/kernel/asm-offsets.c      |  1 -
 arch/x86/kernel/asm-offsets_32.c   | 10 --------
 arch/x86/kernel/cpu/common.c       | 12 +++++++++-
 arch/x86/kernel/process.c          |  7 ------
 arch/x86/mm/pti.c                  |  7 +++---
 9 files changed, 39 insertions(+), 62 deletions(-)

-- 
2.19.1.6.gb485710b



* [PATCH V4 1/6] x86/entry/64: Move cpu_current_top_of_stack out of TSS
  2021-02-10 13:39 [PATCH V4 0/6] x86: Don't abuse tss.sp1 Lai Jiangshan
@ 2021-02-10 13:39 ` Lai Jiangshan
  2021-02-10 13:39 ` [PATCH V4 2/6] x86/entry/32: Use percpu instead of offset-calculation to get thread.sp0 in SWITCH_TO_KERNEL_STACK Lai Jiangshan
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Lai Jiangshan @ 2021-02-10 13:39 UTC (permalink / raw)
  To: linux-kernel, Borislav Petkov
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	x86, H. Peter Anvin, Dave Hansen, Peter Zijlstra, Al Viro,
	Vincenzo Frascino, Joerg Roedel, Ricardo Neri, Reinette Chatre,
	Balbir Singh, Andrew Morton, Gabriel Krisman Bertazi, Kees Cook,
	Frederic Weisbecker, Jens Axboe, Arvind Sankar, Brian Gerst,
	Ard Biesheuvel, Andi Kleen, Mike Rapoport, Mike Hommey,
	Mark Gross, Fenghua Yu, Tony Luck, Anthony Steinhauser, Jay Lang,
	Chang S. Bae

From: Lai Jiangshan <laijs@linux.alibaba.com>

In x86_64, cpu_current_top_of_stack is an alias of cpu_tss_rw.x86_tss.sp1.

When the CPU is affected by Meltdown (X86_BUG_CPU_MELTDOWN), this field
becomes a coveted target even with kernel page-table isolation enabled,
since the CPU's TSS must also be mapped in the user CR3.  An attacker
can fetch the kernel stack top from it through said vulnerability and
base the next steps of an attack on the kernel stack.

Besides the possible leak of the kernel stack address, there is no need
for this value to live in the TSS at all.  Although it is heavily used
in the entry code, it is only accessed when CR3 is already the kernel
CR3 and gs_base is already the kernel gs_base, which means it can be a
normal percpu variable instead of an alias to a field in TSS.

The major reason it reuses a field in TSS is performance: the TSS is
normally hot in the cache and TLB, since entry_SYSCALL_64 uses sp2 as
scratch space to stash the user RSP value.

Make it a percpu variable placed near other hot percpu variables such
as current_task and __preempt_count, so that they share a cache line.
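
Reading the stack top then becomes a plain percpu load; e.g. the
current_top_of_stack() helper in <asm/processor.h> reduces to the
following (a sketch, not part of the diff below):

	static __always_inline unsigned long current_top_of_stack(void)
	{
		/* percpu access; needs the kernel gs_base, as noted above */
		return this_cpu_read_stable(cpu_current_top_of_stack);
	}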

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
tools/testing/selftests/seccomp/seccomp_benchmark doesn't show any
performance loss in the "getpid native" result.  In fact, the result
changes from 93ns before the patch to 92ns after it with !KPTI, and the
test is very stable.  The test doesn't offer a high degree of
precision, but it is enough to show that the change causes no
regression.

 arch/x86/include/asm/processor.h   | 10 ----------
 arch/x86/include/asm/switch_to.h   |  6 ------
 arch/x86/include/asm/thread_info.h |  6 ------
 arch/x86/kernel/cpu/common.c       |  3 +++
 arch/x86/kernel/process.c          |  7 +------
 arch/x86/mm/pti.c                  |  7 +++----
 6 files changed, 7 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a411466a6e74..e197de05d0aa 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -316,11 +316,6 @@ struct x86_hw_tss {
 struct x86_hw_tss {
 	u32			reserved1;
 	u64			sp0;
-
-	/*
-	 * We store cpu_current_top_of_stack in sp1 so it's always accessible.
-	 * Linux does not use ring 1, so sp1 is not otherwise needed.
-	 */
 	u64			sp1;
 
 	/*
@@ -430,12 +425,7 @@ struct irq_stack {
 
 DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
 
-#ifdef CONFIG_X86_32
 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
-#else
-/* The RO copy can't be accessed with this_cpu_xyz(), so use the RW copy. */
-#define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1
-#endif
 
 #ifdef CONFIG_X86_64
 struct fixed_percpu_data {
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 9f69cc497f4b..f0ba06bcba0b 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -71,12 +71,6 @@ static inline void update_task_stack(struct task_struct *task)
 	else
 		this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
 #else
-	/*
-	 * x86-64 updates x86_tss.sp1 via cpu_current_top_of_stack. That
-	 * doesn't work on x86-32 because sp1 and
-	 * cpu_current_top_of_stack have different values (because of
-	 * the non-zero stack-padding on 32bit).
-	 */
 	if (static_cpu_has(X86_FEATURE_XENPV))
 		load_sp0(task_top_of_stack(task));
 #endif
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 33b637442b9e..f72404991d01 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -199,12 +199,6 @@ static inline int arch_within_stack_frames(const void * const stack,
 #endif
 }
 
-#else /* !__ASSEMBLY__ */
-
-#ifdef CONFIG_X86_64
-# define cpu_current_top_of_stack (cpu_tss_rw + TSS_sp1)
-#endif
-
 #endif
 
 #ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9215b91bc044..9c531ec73f5c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1748,6 +1748,9 @@ DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
 EXPORT_PER_CPU_SYMBOL(__preempt_count);
 
+DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
+EXPORT_PER_CPU_SYMBOL_GPL(cpu_current_top_of_stack);
+
 /* May not be marked __init: used by software suspend */
 void syscall_init(void)
 {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 145a7ac0c19a..296de77da4b2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -63,14 +63,9 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
 		 */
 		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
 
-		/*
-		 * .sp1 is cpu_current_top_of_stack.  The init task never
-		 * runs user code, but cpu_current_top_of_stack should still
-		 * be well defined before the first context switch.
-		 */
+#ifdef CONFIG_X86_32
 		.sp1 = TOP_OF_INIT_STACK,
 
-#ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
 		.ss1 = __KERNEL_CS,
 #endif
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 1aab92930569..e101cd87d038 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -440,10 +440,9 @@ static void __init pti_clone_user_shared(void)
 
 	for_each_possible_cpu(cpu) {
 		/*
-		 * The SYSCALL64 entry code needs to be able to find the
-		 * thread stack and needs one word of scratch space in which
-		 * to spill a register.  All of this lives in the TSS, in
-		 * the sp1 and sp2 slots.
+		 * The SYSCALL64 entry code needs one word of scratch space
+		 * in which to spill a register.  It lives in the sp2 slot
+		 * of the CPU's TSS.
 		 *
 		 * This is done for all possible CPUs during boot to ensure
 		 * that it's propagated to all mms.
-- 
2.19.1.6.gb485710b



* [PATCH V4 2/6] x86/entry/32: Use percpu instead of offset-calculation to get thread.sp0 in SWITCH_TO_KERNEL_STACK
  2021-02-10 13:39 [PATCH V4 0/6] x86: Don't abuse tss.sp1 Lai Jiangshan
  2021-02-10 13:39 ` [PATCH V4 1/6] x86/entry/64: Move cpu_current_top_of_stack out of TSS Lai Jiangshan
@ 2021-02-10 13:39 ` Lai Jiangshan
  2021-02-10 13:39 ` [PATCH V4 3/6] x86/entry/32: Switch to the task stack without emptying the entry stack Lai Jiangshan
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Lai Jiangshan @ 2021-02-10 13:39 UTC (permalink / raw)
  To: linux-kernel, Borislav Petkov
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	x86, H. Peter Anvin, Dave Hansen, Peter Zijlstra, Al Viro,
	Vincenzo Frascino, Joerg Roedel, Ricardo Neri, Reinette Chatre,
	Balbir Singh, Andrew Morton, Gabriel Krisman Bertazi, Kees Cook,
	Frederic Weisbecker, Jens Axboe, Arvind Sankar, Brian Gerst,
	Ard Biesheuvel, Andi Kleen, Mike Rapoport, Mike Hommey,
	Mark Gross, Fenghua Yu, Tony Luck, Anthony Steinhauser, Jay Lang,
	Chang S. Bae

From: Lai Jiangshan <laijs@linux.alibaba.com>

TSS_entry2task_stack is used to refer to tss.sp1, which is a copy of
thread.sp0.

When TSS_entry2task_stack is used in SWITCH_TO_KERNEL_STACK, CR3 is
already the kernel CR3 and the kernel segments are loaded.

So use percpu directly to get tss.sp1 (thread.sp0) instead of the
complicated offset calculation via TSS_entry2task_stack.
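
For reference, on a 32-bit SMP build PER_CPU_VAR() expands to a
%fs-relative address, roughly (simplified from <asm/percpu.h>):

	#define PER_CPU_VAR(var)	%fs:var

so the new access is a single segment-relative load, valid here
precisely because the kernel segments are already loaded:

	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %edi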

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_32.S | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df8c017e6161..3b4d1a63d1f0 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -465,16 +465,11 @@
 	cmpl	$SIZEOF_entry_stack, %ecx
 	jae	.Lend_\@
 
-	/* Load stack pointer into %esi and %edi */
+	/* Load stack pointer into %esi */
 	movl	%esp, %esi
-	movl	%esi, %edi
-
-	/* Move %edi to the top of the entry stack */
-	andl	$(MASK_entry_stack), %edi
-	addl	$(SIZEOF_entry_stack), %edi
 
 	/* Load top of task-stack into %edi */
-	movl	TSS_entry2task_stack(%edi), %edi
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %edi
 
 	/* Special case - entry from kernel mode via entry stack */
 #ifdef CONFIG_VM86
-- 
2.19.1.6.gb485710b



* [PATCH V4 3/6] x86/entry/32: Switch to the task stack without emptying the entry stack
  2021-02-10 13:39 [PATCH V4 0/6] x86: Don't abuse tss.sp1 Lai Jiangshan
  2021-02-10 13:39 ` [PATCH V4 1/6] x86/entry/64: Move cpu_current_top_of_stack out of TSS Lai Jiangshan
  2021-02-10 13:39 ` [PATCH V4 2/6] x86/entry/32: Use percpu instead of offset-calculation to get thread.sp0 in SWITCH_TO_KERNEL_STACK Lai Jiangshan
@ 2021-02-10 13:39 ` Lai Jiangshan
  2021-02-10 13:39 ` [PATCH V4 4/6] x86/entry/32: Restore %fs before switching stack Lai Jiangshan
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Lai Jiangshan @ 2021-02-10 13:39 UTC (permalink / raw)
  To: linux-kernel, Borislav Petkov
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	x86, H. Peter Anvin, Dave Hansen, Peter Zijlstra, Al Viro,
	Vincenzo Frascino, Joerg Roedel, Ricardo Neri, Reinette Chatre,
	Balbir Singh, Andrew Morton, Gabriel Krisman Bertazi, Kees Cook,
	Frederic Weisbecker, Jens Axboe, Arvind Sankar, Brian Gerst,
	Ard Biesheuvel, Andi Kleen, Mike Rapoport, Mike Hommey,
	Mark Gross, Fenghua Yu, Tony Luck, Anthony Steinhauser, Jay Lang,
	Chang S. Bae

From: Lai Jiangshan <laijs@linux.alibaba.com>

Like the way x86_64 uses the entry stack when switching to the task
stack, entry_SYSENTER_32 can also save the entry stack pointer in a
register and then switch to the task stack.  This way it doesn't need
to empty the entry stack by popping its contents into registers, and it
gains more room on the entry stack for saving state or scratch
registers.

This prepares for the next patches, which need to save the user %fs on
the entry stack before restoring the kernel %fs and loading the task
stack for the stack switch.
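
As a sketch of the resulting flow (matching the diff below), the
entry-stack pointer is kept in %eax across the switch, and the leftover
words are then read through an explicit %ss: override, since the user
%ds may still be loaded at this point:

	movl	%esp, %eax		/* %eax = entry stack pointer */
	movl	(2*4+TSS_entry2task_stack)(%esp), %esp	/* task stack */
	...
	pushl	%ss:4(%eax)		/* pt_regs->flags */
	...
	pushl	%ss:(%eax)		/* pt_regs->orig_ax */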

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_32.S | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 3b4d1a63d1f0..3e693db0963d 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -905,19 +905,18 @@ SYM_FUNC_START(entry_SYSENTER_32)
 	pushl	%eax
 	BUG_IF_WRONG_CR3 no_user_check=1
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
-	popl	%eax
-	popfl
 
-	/* Stack empty again, switch to task stack */
-	movl	TSS_entry2task_stack(%esp), %esp
+	/* Switch to task stack */
+	movl	%esp, %eax
+	movl	(2*4+TSS_entry2task_stack)(%esp), %esp
 
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	$0			/* pt_regs->sp (placeholder) */
-	pushfl				/* pt_regs->flags (except IF = 0) */
+	pushl	%ss:4(%eax)		/* pt_regs->flags (except IF = 0) */
 	pushl	$__USER_CS		/* pt_regs->cs */
 	pushl	$0			/* pt_regs->ip = 0 (placeholder) */
-	pushl	%eax			/* pt_regs->orig_ax */
+	pushl	%ss:(%eax)		/* pt_regs->orig_ax */
 	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest, stack already switched */
 
 	/*
-- 
2.19.1.6.gb485710b



* [PATCH V4 4/6] x86/entry/32: Restore %fs before switching stack
  2021-02-10 13:39 [PATCH V4 0/6] x86: Don't abuse tss.sp1 Lai Jiangshan
                   ` (2 preceding siblings ...)
  2021-02-10 13:39 ` [PATCH V4 3/6] x86/entry/32: Switch to the task stack without emptying the entry stack Lai Jiangshan
@ 2021-02-10 13:39 ` Lai Jiangshan
  2021-02-10 13:39 ` [PATCH V4 5/6] x86/entry/32: Use percpu to get thread.sp0 in SYSENTER Lai Jiangshan
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Lai Jiangshan @ 2021-02-10 13:39 UTC (permalink / raw)
  To: linux-kernel, Borislav Petkov
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	x86, H. Peter Anvin, Dave Hansen, Peter Zijlstra, Al Viro,
	Vincenzo Frascino, Joerg Roedel, Ricardo Neri, Reinette Chatre,
	Balbir Singh, Andrew Morton, Gabriel Krisman Bertazi, Kees Cook,
	Frederic Weisbecker, Jens Axboe, Arvind Sankar, Brian Gerst,
	Ard Biesheuvel, Andi Kleen, Mike Rapoport, Mike Hommey,
	Mark Gross, Fenghua Yu, Tony Luck, Anthony Steinhauser, Jay Lang,
	Chang S. Bae

From: Lai Jiangshan <laijs@linux.alibaba.com>

Make entry_SYSENTER_32 save the user %fs on the entry stack and restore
the kernel %fs before loading the task stack for the stack switch, so
that the next patch can use percpu accesses before the stack is
switched.
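
With the extra push, %eax points at an entry stack holding three words
(layout derived from the pushes in the diff below):

	(%eax)		saved user %fs
	4(%eax)		saved user %eax		/* becomes pt_regs->orig_ax */
	8(%eax)		saved user flags	/* becomes pt_regs->flags */

which is why the task-stack pointer is now fetched from
(3*4+TSS_entry2task_stack)(%esp) instead of (2*4+TSS_entry2task_stack)(%esp).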

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_32.S | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 3e693db0963d..01f098c5b017 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -279,11 +279,13 @@
 .Lfinished_frame_\@:
 .endm
 
-.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0 skip_gs=0 unwind_espfix=0
+.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0 skip_gs=0 skip_fs=0 unwind_espfix=0
 	cld
 .if \skip_gs == 0
 	PUSH_GS
 .endif
+
+.if \skip_fs == 0
 	pushl	%fs
 
 	pushl	%eax
@@ -293,6 +295,7 @@
 	UNWIND_ESPFIX_STACK
 .endif
 	popl	%eax
+.endif
 
 	FIXUP_FRAME
 	pushl	%es
@@ -906,18 +909,27 @@ SYM_FUNC_START(entry_SYSENTER_32)
 	BUG_IF_WRONG_CR3 no_user_check=1
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
 
+	/* Restore kernel %fs, so that we can use PERCPU */
+	pushl	%fs
+	movl	$(__KERNEL_PERCPU), %eax
+	movl	%eax, %fs
+
 	/* Switch to task stack */
 	movl	%esp, %eax
-	movl	(2*4+TSS_entry2task_stack)(%esp), %esp
+	movl	(3*4+TSS_entry2task_stack)(%esp), %esp
 
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	$0			/* pt_regs->sp (placeholder) */
-	pushl	%ss:4(%eax)		/* pt_regs->flags (except IF = 0) */
+	pushl	%ss:8(%eax)		/* pt_regs->flags (except IF = 0) */
 	pushl	$__USER_CS		/* pt_regs->cs */
 	pushl	$0			/* pt_regs->ip = 0 (placeholder) */
-	pushl	%ss:(%eax)		/* pt_regs->orig_ax */
-	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest, stack already switched */
+	pushl	%ss:4(%eax)		/* pt_regs->orig_ax */
+	PUSH_GS				/* pt_regs->gs */
+	pushl	%ss:(%eax)		/* pt_regs->fs */
+	/* save rest, stack and %fs already switched */
+	SAVE_ALL pt_regs_ax=$-ENOSYS skip_gs=1 skip_fs=1
+	SET_KERNEL_GS %edx
 
 	/*
 	 * SYSENTER doesn't filter flags, so we need to clear NT, AC
-- 
2.19.1.6.gb485710b



* [PATCH V4 5/6] x86/entry/32: Use percpu to get thread.sp0 in SYSENTER
  2021-02-10 13:39 [PATCH V4 0/6] x86: Don't abuse tss.sp1 Lai Jiangshan
                   ` (3 preceding siblings ...)
  2021-02-10 13:39 ` [PATCH V4 4/6] x86/entry/32: Restore %fs before switching stack Lai Jiangshan
@ 2021-02-10 13:39 ` Lai Jiangshan
  2021-02-10 13:39 ` [PATCH V4 6/6] x86/entry/32: Introduce cpu_current_thread_sp0 to replace cpu_tss_rw.x86_tss.sp1 Lai Jiangshan
  2021-02-10 23:42 ` [PATCH V4 0/6] x86: Don't abuse tss.sp1 mark gross
  6 siblings, 0 replies; 9+ messages in thread
From: Lai Jiangshan @ 2021-02-10 13:39 UTC (permalink / raw)
  To: linux-kernel, Borislav Petkov
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	x86, H. Peter Anvin, Dave Hansen, Peter Zijlstra, Al Viro,
	Vincenzo Frascino, Joerg Roedel, Ricardo Neri, Reinette Chatre,
	Balbir Singh, Andrew Morton, Gabriel Krisman Bertazi, Kees Cook,
	Frederic Weisbecker, Jens Axboe, Arvind Sankar, Brian Gerst,
	Ard Biesheuvel, Andi Kleen, Mike Rapoport, Mike Hommey,
	Mark Gross, Fenghua Yu, Tony Luck, Anthony Steinhauser, Jay Lang,
	Chang S. Bae

From: Lai Jiangshan <laijs@linux.alibaba.com>

TSS_entry2task_stack is used to refer to tss.sp1, which is a copy of
thread.sp0.

When TSS_entry2task_stack is used in entry_SYSENTER_32, CR3 is already
the kernel CR3 and the kernel %fs is loaded.

So use percpu directly instead of the offset calculation via
TSS_entry2task_stack, and remove the now-unused TSS_entry2task_stack.
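
The stack switch in entry_SYSENTER_32 thus becomes a direct percpu load
(both lines as in the diff below):

	-	movl	(3*4+TSS_entry2task_stack)(%esp), %esp
	+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %esp

This is valid here only because the previous patch restores the kernel
%fs before this point.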

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_32.S        |  2 +-
 arch/x86/kernel/asm-offsets_32.c | 10 ----------
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 01f098c5b017..d5b5b43fd0c0 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -916,7 +916,7 @@ SYM_FUNC_START(entry_SYSENTER_32)
 
 	/* Switch to task stack */
 	movl	%esp, %eax
-	movl	(3*4+TSS_entry2task_stack)(%esp), %esp
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %esp
 
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 6e043f295a60..6d4143cfbf03 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -43,16 +43,6 @@ void foo(void)
 	OFFSET(saved_context_gdt_desc, saved_context, gdt_desc);
 	BLANK();
 
-	/*
-	 * Offset from the entry stack to task stack stored in TSS. Kernel entry
-	 * happens on the per-cpu entry-stack, and the asm code switches to the
-	 * task-stack pointer stored in x86_tss.sp1, which is a copy of
-	 * task->thread.sp0 where entry code can find it.
-	 */
-	DEFINE(TSS_entry2task_stack,
-	       offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
-	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
-
 #ifdef CONFIG_STACKPROTECTOR
 	BLANK();
 	OFFSET(stack_canary_offset, stack_canary, canary);
-- 
2.19.1.6.gb485710b



* [PATCH V4 6/6] x86/entry/32: Introduce cpu_current_thread_sp0 to replace cpu_tss_rw.x86_tss.sp1
  2021-02-10 13:39 [PATCH V4 0/6] x86: Don't abuse tss.sp1 Lai Jiangshan
                   ` (4 preceding siblings ...)
  2021-02-10 13:39 ` [PATCH V4 5/6] x86/entry/32: Use percpu to get thread.sp0 in SYSENTER Lai Jiangshan
@ 2021-02-10 13:39 ` Lai Jiangshan
  2021-02-10 23:42 ` [PATCH V4 0/6] x86: Don't abuse tss.sp1 mark gross
  6 siblings, 0 replies; 9+ messages in thread
From: Lai Jiangshan @ 2021-02-10 13:39 UTC (permalink / raw)
  To: linux-kernel, Borislav Petkov
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	x86, H. Peter Anvin, Dave Hansen, Peter Zijlstra, Al Viro,
	Vincenzo Frascino, Joerg Roedel, Ricardo Neri, Reinette Chatre,
	Balbir Singh, Andrew Morton, Gabriel Krisman Bertazi, Kees Cook,
	Frederic Weisbecker, Jens Axboe, Arvind Sankar, Brian Gerst,
	Ard Biesheuvel, Andi Kleen, Mike Rapoport, Mike Hommey,
	Mark Gross, Fenghua Yu, Tony Luck, Anthony Steinhauser, Jay Lang,
	Chang S. Bae

From: Lai Jiangshan <laijs@linux.alibaba.com>

tss.sp1 is not used by hardware and only serves as a copy of
thread.sp0.

A percpu variable is sufficient for that, so introduce
cpu_current_thread_sp0 for this purpose and remove the now-unneeded
TSS_sp1 asm offset.
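
In code terms (lifted from the diff below), the copy lives in its own
percpu variable and update_task_stack() keeps it current on every
context switch:

	DEFINE_PER_CPU(unsigned long, cpu_current_thread_sp0) = TOP_OF_INIT_STACK;

	/* in update_task_stack(): */
	this_cpu_write(cpu_current_thread_sp0, task->thread.sp0);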

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_32.S        | 6 +++---
 arch/x86/include/asm/processor.h | 2 ++
 arch/x86/include/asm/switch_to.h | 2 +-
 arch/x86/kernel/asm-offsets.c    | 1 -
 arch/x86/kernel/cpu/common.c     | 9 ++++++++-
 arch/x86/kernel/process.c        | 2 --
 6 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d5b5b43fd0c0..55dcf5c35141 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -472,7 +472,7 @@
 	movl	%esp, %esi
 
 	/* Load top of task-stack into %edi */
-	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %edi
+	movl	PER_CPU_VAR(cpu_current_thread_sp0), %edi
 
 	/* Special case - entry from kernel mode via entry stack */
 #ifdef CONFIG_VM86
@@ -658,7 +658,7 @@
 	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
 
 	/* Bytes on the task-stack to ecx */
-	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %ecx
+	movl	PER_CPU_VAR(cpu_current_thread_sp0), %ecx
 	subl	%esi, %ecx
 
 	/* Allocate stack-frame on entry-stack */
@@ -916,7 +916,7 @@ SYM_FUNC_START(entry_SYSENTER_32)
 
 	/* Switch to task stack */
 	movl	%esp, %eax
-	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %esp
+	movl	PER_CPU_VAR(cpu_current_thread_sp0), %esp
 
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index e197de05d0aa..a40bade32105 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -776,6 +776,8 @@ static inline void spin_lock_prefetch(const void *x)
 
 #define KSTK_ESP(task)		(task_pt_regs(task)->sp)
 
+DECLARE_PER_CPU(unsigned long, cpu_current_thread_sp0);
+
 #else
 #define INIT_THREAD { }
 
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index f0ba06bcba0b..eb0d3ae8a54d 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -69,7 +69,7 @@ static inline void update_task_stack(struct task_struct *task)
 	if (static_cpu_has(X86_FEATURE_XENPV))
 		load_sp0(task->thread.sp0);
 	else
-		this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
+		this_cpu_write(cpu_current_thread_sp0, task->thread.sp0);
 #else
 	if (static_cpu_has(X86_FEATURE_XENPV))
 		load_sp0(task_top_of_stack(task));
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 60b9f42ce3c1..3b63b6062792 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -98,6 +98,5 @@ static void __used common(void)
 
 	/* Offset for fields in tss_struct */
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
-	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 	OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
 }
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9c531ec73f5c..86485d55949e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1792,12 +1792,19 @@ EXPORT_PER_CPU_SYMBOL(__preempt_count);
 /*
  * On x86_32, vm86 modifies tss.sp0, so sp0 isn't a reliable way to find
  * the top of the kernel stack.  Use an extra percpu variable to track the
- * top of the kernel stack directly.
+ * top of the kernel stack directly and another percpu variable to track
+ * thread.sp0 for use in the entry code.  cpu_current_top_of_stack and
+ * cpu_current_thread_sp0 hold different values because of the non-zero
+ * stack padding on 32-bit.  See the comments at TOP_OF_KERNEL_STACK_PADDING
+ * and vm86.
  */
 DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) =
 	(unsigned long)&init_thread_union + THREAD_SIZE;
 EXPORT_PER_CPU_SYMBOL(cpu_current_top_of_stack);
 
+DEFINE_PER_CPU(unsigned long, cpu_current_thread_sp0) = TOP_OF_INIT_STACK;
+EXPORT_PER_CPU_SYMBOL_GPL(cpu_current_thread_sp0);
+
 #ifdef CONFIG_STACKPROTECTOR
 DEFINE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
 #endif
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 296de77da4b2..e6d4b5399a81 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -64,8 +64,6 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
 		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
 
 #ifdef CONFIG_X86_32
-		.sp1 = TOP_OF_INIT_STACK,
-
 		.ss0 = __KERNEL_DS,
 		.ss1 = __KERNEL_CS,
 #endif
-- 
2.19.1.6.gb485710b



* Re: [PATCH V4 0/6] x86: Don't abuse tss.sp1
  2021-02-10 13:39 [PATCH V4 0/6] x86: Don't abuse tss.sp1 Lai Jiangshan
                   ` (5 preceding siblings ...)
  2021-02-10 13:39 ` [PATCH V4 6/6] x86/entry/32: Introduce cpu_current_thread_sp0 to replace cpu_tss_rw.x86_tss.sp1 Lai Jiangshan
@ 2021-02-10 23:42 ` mark gross
  2021-02-11  1:51   ` Lai Jiangshan
  6 siblings, 1 reply; 9+ messages in thread
From: mark gross @ 2021-02-10 23:42 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Borislav Petkov, Lai Jiangshan, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, Dave Hansen,
	Peter Zijlstra, Al Viro, Vincenzo Frascino, Joerg Roedel,
	Ricardo Neri, Reinette Chatre, Balbir Singh, Andrew Morton,
	Gabriel Krisman Bertazi, Kees Cook, Frederic Weisbecker,
	Jens Axboe, Arvind Sankar, Brian Gerst, Ard Biesheuvel,
	Andi Kleen, Mike Rapoport, Mike Hommey, Mark Gross, Fenghua Yu,
	Tony Luck, Anthony Steinhauser, Jay Lang, Chang S. Bae

On Wed, Feb 10, 2021 at 09:39:11PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> In x86_64, tss.sp1 is reused as cpu_current_top_of_stack.  We'd better
> directly use percpu since CR3 and gs_base are correct when it is used.
Be more direct about whether not using percpu is incorrect in some way.
> 
> In x86_32, tss.sp1 is resued as thread.sp0 in three places in entry
s/resued/reused
> code.  We have the correct CR3 and %fs at two of the places.  The last
> one is sysenter.  This patchset makes %fs available earlier so that
> we can also use percpu in sysenter, and it adds a percpu variable
> cpu_current_thread_sp0 to hold thread.sp0 instead of tss.sp1.
> 
> [V3]: https://lore.kernel.org/lkml/20210127163231.12709-1-jiangshanlai@gmail.com/
> [V2]: https://lore.kernel.org/lkml/20210125173444.22696-1-jiangshanlai@gmail.com/
> [V1]: https://lore.kernel.org/lkml/20210123084900.3118-1-jiangshanlai@gmail.com/
> 
> Changed from V3:
> 	Update subjects to the imperative mood, as Borislav requested. ^_^
> 	Update changelog as Borislav suggested.
> 	Change EXPORT_PER_CPU_SYMBOL to EXPORT_PER_CPU_SYMBOL_GPL.
> 
> Changed from V2:
> 	Add missing "%ss:" reported by Brian Gerst.
> 
> Changed from V1:
> 	Also fix sp1 for x86_32, as Andy requested.
> 	Update comments in the x86_64 patch as Andy suggested.
> 
> Lai Jiangshan (6):
>   x86/entry/64: Move cpu_current_top_of_stack out of TSS
>   x86/entry/32: Use percpu instead of offset-calculation to get
>     thread.sp0 in SWITCH_TO_KERNEL_STACK
>   x86/entry/32: Switch to the task stack without emptying the entry
>     stack
>   x86/entry/32: Restore %fs before switching stack
>   x86/entry/32: Use percpu to get thread.sp0 in SYSENTER
>   x86/entry/32: Introduce cpu_current_thread_sp0 to replace
>     cpu_tss_rw.x86_tss.sp1
> 
>  arch/x86/entry/entry_32.S          | 38 +++++++++++++++++-------------
>  arch/x86/include/asm/processor.h   | 12 ++--------
>  arch/x86/include/asm/switch_to.h   |  8 +------
>  arch/x86/include/asm/thread_info.h |  6 -----
>  arch/x86/kernel/asm-offsets.c      |  1 -
>  arch/x86/kernel/asm-offsets_32.c   | 10 --------
>  arch/x86/kernel/cpu/common.c       | 12 +++++++++-
>  arch/x86/kernel/process.c          |  7 ------
>  arch/x86/mm/pti.c                  |  7 +++---
>  9 files changed, 39 insertions(+), 62 deletions(-)
> 
> -- 
> 2.19.1.6.gb485710b
> 


* Re: [PATCH V4 0/6] x86: Don't abuse tss.sp1
  2021-02-10 23:42 ` [PATCH V4 0/6] x86: Don't abuse tss.sp1 mark gross
@ 2021-02-11  1:51   ` Lai Jiangshan
  0 siblings, 0 replies; 9+ messages in thread
From: Lai Jiangshan @ 2021-02-11  1:51 UTC (permalink / raw)
  To: Mark Gross
  Cc: LKML, Borislav Petkov, Lai Jiangshan, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, X86 ML, H. Peter Anvin,
	Dave Hansen, Peter Zijlstra, Al Viro, Vincenzo Frascino,
	Joerg Roedel, Ricardo Neri, Reinette Chatre, Balbir Singh,
	Andrew Morton, Gabriel Krisman Bertazi, Kees Cook,
	Frederic Weisbecker, Jens Axboe, Arvind Sankar, Brian Gerst,
	Ard Biesheuvel, Andi Kleen, Mike Rapoport, Mike Hommey,
	Fenghua Yu, Tony Luck, Anthony Steinhauser, Jay Lang,
	Chang S. Bae

Hi Mark

Thank you for your reply.

On Thu, Feb 11, 2021 at 7:42 AM mark gross <mgross@linux.intel.com> wrote:
>
> On Wed, Feb 10, 2021 at 09:39:11PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <laijs@linux.alibaba.com>
> >
> > In x86_64, tss.sp1 is reused as cpu_current_top_of_stack.  We'd better
> > directly use percpu since CR3 and gs_base are correct when it is used.
> Be more direct about whether not using percpu is incorrect in some way.

Sure, in the future I will pull the most important reasons from the
changelogs up into the cover letter.

> >
> > In x86_32, tss.sp1 is resued as thread.sp0 in three places in entry
> s/resued/reused

Sorry, I got it wrong in every cover letter even though I noticed it
after V2 was sent.  I forgot to run a spellchecker on the cover letter.

> > code.  We have the correct CR3 and %fs at two of the places.  The last
> > one is sysenter.  This patchset makes %fs available earlier so that
> > we can also use percpu in sysenter, and it adds a percpu variable
> > cpu_current_thread_sp0 to hold thread.sp0 instead of tss.sp1.
> >
> > [V3]: https://lore.kernel.org/lkml/20210127163231.12709-1-jiangshanlai@gmail.com/
> > [V2]: https://lore.kernel.org/lkml/20210125173444.22696-1-jiangshanlai@gmail.com/
> > [V1]: https://lore.kernel.org/lkml/20210123084900.3118-1-jiangshanlai@gmail.com/
> >
> > Changed from V3:
> >       Update subjects to the imperative mood, as Borislav requested. ^_^
> >       Update changelog as Borislav suggested.
> >       Change EXPORT_PER_CPU_SYMBOL to EXPORT_PER_CPU_SYMBOL_GPL.
> >
> > Changed from V2:
> >       Add missing "%ss:" reported by Brian Gerst.
> >
> > Changed from V1:
> >       Also fix sp1 for x86_32, as Andy requested.
> >       Update comments in the x86_64 patch as Andy suggested.
> >
> > Lai Jiangshan (6):
> >   x86/entry/64: Move cpu_current_top_of_stack out of TSS
> >   x86/entry/32: Use percpu instead of offset-calculation to get
> >     thread.sp0 in SWITCH_TO_KERNEL_STACK
> >   x86/entry/32: Switch to the task stack without emptying the entry
> >     stack
> >   x86/entry/32: Restore %fs before switching stack
> >   x86/entry/32: Use percpu to get thread.sp0 in SYSENTER
> >   x86/entry/32: Introduce cpu_current_thread_sp0 to replace
> >     cpu_tss_rw.x86_tss.sp1
> >
> >  arch/x86/entry/entry_32.S          | 38 +++++++++++++++++-------------
> >  arch/x86/include/asm/processor.h   | 12 ++--------
> >  arch/x86/include/asm/switch_to.h   |  8 +------
> >  arch/x86/include/asm/thread_info.h |  6 -----
> >  arch/x86/kernel/asm-offsets.c      |  1 -
> >  arch/x86/kernel/asm-offsets_32.c   | 10 --------
> >  arch/x86/kernel/cpu/common.c       | 12 +++++++++-
> >  arch/x86/kernel/process.c          |  7 ------
> >  arch/x86/mm/pti.c                  |  7 +++---
> >  9 files changed, 39 insertions(+), 62 deletions(-)
> >
> > --
> > 2.19.1.6.gb485710b
> >

