* [PATCH 0/5] x86/entry: simplify stack switching when exception on userspace
@ 2020-05-27  7:31 Lai Jiangshan
  2020-05-27  7:31 ` [PATCH 1/5] x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack Lai Jiangshan
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-27  7:31 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
IDT entries") resulted in the kernel (error_entry), on any exception
from userspace, always pushing the pt_regs to the entry stack (sp0)
and then copying them to the kernel stack.

This is a hot path (for example, page fault), while interrupt_entry
switches directly to the kernel stack and pushes the pt_regs there.
We should do the same for error_entry. This is the job of patches 1
and 2.

Patches 3-5 simplify stack switching for .Lerror_bad_iret by doing
all the work in one function (fixup_bad_iret()).
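
To illustrate, here is a minimal userspace C model of the two flows
(all names below are illustrative stand-ins, not the real kernel
symbols; the 21 slots mirror the layout of an x86-64 pt_regs):

  /* old_flow(): what the kernel does today -- store the registers on
   * the entry stack, then copy them again to the kernel stack.
   * new_flow(): what this series does -- a single store, directly on
   * the kernel stack. */
  #include <stdio.h>
  #include <string.h>

  struct regs_model { unsigned long r[21]; };

  static struct regs_model entry_stack;  /* models the sp0/trampoline stack */
  static struct regs_model kernel_stack; /* models the per-task kernel stack */

  static void old_flow(const struct regs_model *hw)
  {
          memcpy(&entry_stack, hw, sizeof(entry_stack));             /* 1st copy */
          memcpy(&kernel_stack, &entry_stack, sizeof(kernel_stack)); /* 2nd copy */
  }

  static void new_flow(const struct regs_model *hw)
  {
          memcpy(&kernel_stack, hw, sizeof(kernel_stack));           /* only copy */
  }

  int main(void)
  {
          struct regs_model hw = { { 0 } };

          old_flow(&hw);  /* two copies per exception entry */
          new_flow(&hw);  /* one copy per exception entry */
          printf("saved one %zu-byte copy per entry\n", sizeof(hw));
          return 0;
  }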

The patch set is based on tip/master (c021d3d8fe45) (Mon May 25).

The diffstat is "66 insertions(+), 66 deletions(-)", but in fact it
mainly adds comments and deletes code.

Cc: Andy Lutomirski <luto@kernel.org>,
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Ingo Molnar <mingo@redhat.com>,
Cc: Borislav Petkov <bp@alien8.de>,
Cc: x86@kernel.org,
Cc: "H. Peter Anvin" <hpa@zytor.com>,
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>,
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
Cc: Jann Horn <jannh@google.com>,
Cc: Dave Hansen <dave.hansen@linux.intel.com>

Lai Jiangshan (5):
  x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack
  x86/entry: avoid calling into sync_regs() when entering from userspace
  x86/entry: directly switch to kernel stack when .Lerror_bad_iret
  x86/entry: remove unused sync_regs()
  x86/entry: don't copy to tmp in fixup_bad_iret

 arch/x86/entry/entry_64.S    | 89 ++++++++++++++++++++----------------
 arch/x86/include/asm/traps.h |  1 -
 arch/x86/kernel/traps.c      | 42 +++++++----------
 3 files changed, 66 insertions(+), 66 deletions(-)

-- 
2.20.1



* [PATCH 1/5] x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack
  2020-05-27  7:31 [PATCH 0/5] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
@ 2020-05-27  7:31 ` Lai Jiangshan
  2020-05-28 19:12   ` Thomas Gleixner
  2020-05-27  7:31 ` [PATCH 2/5] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-27  7:31 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin

Move a portion of the code into a macro so that it can also be used
by the next patch.

This only moves code around; no functional change.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 60 ++++++++++++++++++++++++++-------------
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d983a0d4bc73..5e983506f82e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -698,28 +698,22 @@ SYM_CODE_END(\asmsym)
 #include <asm/idtentry.h>
 
 /*
- * Interrupt entry helper function.
+ * IDT entry helper macro for entering from userspace.
  *
  * Entry runs with interrupts off. Stack layout at entry:
- * +----------------------------------------------------+
- * | regs->ss						|
- * | regs->rsp						|
- * | regs->eflags					|
- * | regs->cs						|
- * | regs->ip						|
- * +----------------------------------------------------+
- * | regs->orig_ax = ~(interrupt number)		|
- * +----------------------------------------------------+
- * | return address					|
- * +----------------------------------------------------+
+ * +--------------------+
+ * | regs->ss		|
+ * | regs->rsp		|
+ * | regs->eflags	|
+ * | regs->cs		|
+ * | regs->ip		|
+ * | regs->orig_ax	|
+ * | return address	|
+ * +--------------------+
+ * The macro does swapgs and switches to current kernel stack with the
+ * same stack layout copied.
  */
-SYM_CODE_START(interrupt_entry)
-	UNWIND_HINT_IRET_REGS offset=16
-	ASM_CLAC
-	cld
-
-	testb	$3, CS-ORIG_RAX+8(%rsp)
-	jz	1f
+.macro idtentry_swapgs_and_switch_to_kernel_stack
 	SWAPGS
 	FENCE_SWAPGS_USER_ENTRY
 	/*
@@ -751,6 +745,34 @@ SYM_CODE_START(interrupt_entry)
 	pushq	8(%rdi)			/* return address */
 
 	movq	(%rdi), %rdi
+.endm
+
+/*
+ * Interrupt entry helper function.
+ *
+ * Entry runs with interrupts off. Stack layout at entry:
+ * +----------------------------------------------------+
+ * | regs->ss						|
+ * | regs->rsp						|
+ * | regs->eflags					|
+ * | regs->cs						|
+ * | regs->ip						|
+ * +----------------------------------------------------+
+ * | regs->orig_ax = ~(interrupt number)		|
+ * +----------------------------------------------------+
+ * | return address					|
+ * +----------------------------------------------------+
+ */
+SYM_CODE_START(interrupt_entry)
+	UNWIND_HINT_IRET_REGS offset=16
+	ASM_CLAC
+	cld
+
+	testb	$3, CS-ORIG_RAX+8(%rsp)
+	jz	1f
+
+	idtentry_swapgs_and_switch_to_kernel_stack
+
 	jmp	2f
 1:
 	FENCE_SWAPGS_KERNEL_ENTRY
-- 
2.20.1



* [PATCH 2/5] x86/entry: avoid calling into sync_regs() when entering from userspace
  2020-05-27  7:31 [PATCH 0/5] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
  2020-05-27  7:31 ` [PATCH 1/5] x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack Lai Jiangshan
@ 2020-05-27  7:31 ` Lai Jiangshan
  2020-05-27  7:31 ` [PATCH 3/5] x86/entry: directly switch to kernel stack when .Lerror_bad_iret Lai Jiangshan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-27  7:31 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin

Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
IDT entries") changed the entry code so that, when any exception
happens in userspace, it saves the pt_regs on the sp0 stack and then
copies it to the thread stack via sync_regs(), switching to the
thread stack afterward.

This is a hot path and such overhead should be avoided. This patch
borrows the way interrupt_entry handles it.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
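For context, the sync_regs() copy being bypassed amounts to one
struct copy per entry; its body (removed in patch 4/5) is essentially
the following, simplified from arch/x86/kernel/traps.c:

  /* Copy the pt_regs saved on the entry stack to the top of the
   * thread stack and return the new location. */
  struct pt_regs *sync_regs(struct pt_regs *eregs)
  {
          struct pt_regs *regs =
                  (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;

          if (regs != eregs)
                  *regs = *eregs;
          return regs;
  }
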
 arch/x86/entry/entry_64.S | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 5e983506f82e..e8817ae31390 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1321,19 +1321,13 @@ SYM_CODE_END(paranoid_exit)
 SYM_CODE_START_LOCAL(error_entry)
 	UNWIND_HINT_FUNC
 	cld
-	PUSH_AND_CLEAR_REGS save_ret=1
-	ENCODE_FRAME_POINTER 8
-	testb	$3, CS+8(%rsp)
+	testb	$3, CS-ORIG_RAX+8(%rsp)
 	jz	.Lerror_kernelspace
 
-	/*
-	 * We entered from user mode or we're pretending to have entered
-	 * from user mode due to an IRET fault.
-	 */
-	SWAPGS
-	FENCE_SWAPGS_USER_ENTRY
-	/* We have user CR3.  Change to kernel CR3. */
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+	idtentry_swapgs_and_switch_to_kernel_stack
+	PUSH_AND_CLEAR_REGS save_ret=1
+	ENCODE_FRAME_POINTER 8
+	ret
 
 .Lerror_entry_from_usermode_after_swapgs:
 	/* Put us onto the real thread stack. */
@@ -1357,6 +1351,8 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * for these here too.
 	 */
 .Lerror_kernelspace:
+	PUSH_AND_CLEAR_REGS save_ret=1
+	ENCODE_FRAME_POINTER 8
 	leaq	native_irq_return_iret(%rip), %rcx
 	cmpq	%rcx, RIP+8(%rsp)
 	je	.Lerror_bad_iret
-- 
2.20.1



* [PATCH 3/5] x86/entry: directly switch to kernel stack when .Lerror_bad_iret
  2020-05-27  7:31 [PATCH 0/5] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
  2020-05-27  7:31 ` [PATCH 1/5] x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack Lai Jiangshan
  2020-05-27  7:31 ` [PATCH 2/5] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
@ 2020-05-27  7:31 ` Lai Jiangshan
  2020-05-27  7:31 ` [PATCH 4/5] x86/entry: remove unused sync_regs() Lai Jiangshan
  2020-05-27  7:31 ` [PATCH 5/5] x86/entry: don't copy to tmp in fixup_bad_iret Lai Jiangshan
  4 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-27  7:31 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

Directly copy the pt_regs to the kernel stack and directly switch to
the kernel stack at .Lerror_bad_iret.

entry_64.S currently does the following things back to back at
.Lerror_bad_iret:
	call fixup_bad_iret(): switch to the sp0 stack with pt_regs copied
	call sync_regs(): switch to the kernel stack with pt_regs copied

So we can do all the work together in fixup_bad_iret().

After this patch, fixup_bad_iret() is restored to its behavior before
commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack
for IDT entries").

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 13 ++-----------
 arch/x86/kernel/traps.c   |  9 ++++-----
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e8817ae31390..c5db048e5bed 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1329,16 +1329,6 @@ SYM_CODE_START_LOCAL(error_entry)
 	ENCODE_FRAME_POINTER 8
 	ret
 
-.Lerror_entry_from_usermode_after_swapgs:
-	/* Put us onto the real thread stack. */
-	popq	%r12				/* save return addr in %12 */
-	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
-	call	sync_regs
-	movq	%rax, %rsp			/* switch stack */
-	ENCODE_FRAME_POINTER
-	pushq	%r12
-	ret
-
 .Lerror_entry_done_lfence:
 	FENCE_SWAPGS_KERNEL_ENTRY
 .Lerror_entry_done:
@@ -1392,7 +1382,8 @@ SYM_CODE_START_LOCAL(error_entry)
 	mov	%rsp, %rdi
 	call	fixup_bad_iret
 	mov	%rax, %rsp
-	jmp	.Lerror_entry_from_usermode_after_swapgs
+	ENCODE_FRAME_POINTER 8
+	ret
 SYM_CODE_END(error_entry)
 
 SYM_CODE_START_LOCAL(error_exit)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e5d81cb94ba..3bef95934644 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -666,13 +666,12 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 	/*
 	 * This is called from entry_64.S early in handling a fault
 	 * caused by a bad iret to user mode.  To handle the fault
-	 * correctly, we want to move our stack frame to where it would
-	 * be had we entered directly on the entry stack (rather than
-	 * just below the IRET frame) and we want to pretend that the
-	 * exception came from the IRET target.
+	 * correctly, we want to move our stack frame to kernel stack
+	 * (rather than just below the IRET frame) and we want to
+	 * pretend that the exception came from the IRET target.
 	 */
 	struct bad_iret_stack tmp, *new_stack =
-		(struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+		(struct bad_iret_stack *)__this_cpu_read(cpu_current_top_of_stack) - 1;
 
 	/* Copy the IRET target to the temporary storage. */
 	memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
-- 
2.20.1



* [PATCH 4/5] x86/entry: remove unused sync_regs()
  2020-05-27  7:31 [PATCH 0/5] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
                   ` (2 preceding siblings ...)
  2020-05-27  7:31 ` [PATCH 3/5] x86/entry: directly switch to kernel stack when .Lerror_bad_iret Lai Jiangshan
@ 2020-05-27  7:31 ` Lai Jiangshan
  2020-05-27  7:31 ` [PATCH 5/5] x86/entry: don't copy to tmp in fixup_bad_iret Lai Jiangshan
  4 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-27  7:31 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin, Andy Lutomirski, Alexandre Chartre,
	Peter Zijlstra, Zhenzhong Duan, Eric W. Biederman, Jann Horn,
	Dave Hansen

No more users.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/traps.h |  1 -
 arch/x86/kernel/traps.c      | 13 -------------
 2 files changed, 14 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index f5a2e438a878..20b9db7a1d49 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -21,7 +21,6 @@ asmlinkage void xen_page_fault(void);
 dotraplinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
 
 #ifdef CONFIG_X86_64
-asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
 asmlinkage __visible notrace
 struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
 void __init trap_init(void);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3bef95934644..8291f84933ff 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -642,19 +642,6 @@ DEFINE_IDTENTRY_RAW(exc_int3)
 }
 
 #ifdef CONFIG_X86_64
-/*
- * Help handler running on a per-cpu (IST or entry trampoline) stack
- * to switch to the normal thread stack if the interrupted code was in
- * user mode. The actual stack switch is done in entry_64.S
- */
-asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
-{
-	struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
-	if (regs != eregs)
-		*regs = *eregs;
-	return regs;
-}
-
 struct bad_iret_stack {
 	void *error_entry_ret;
 	struct pt_regs regs;
-- 
2.20.1



* [PATCH 5/5] x86/entry: don't copy to tmp in fixup_bad_iret
  2020-05-27  7:31 [PATCH 0/5] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
                   ` (3 preceding siblings ...)
  2020-05-27  7:31 ` [PATCH 4/5] x86/entry: remove unused sync_regs() Lai Jiangshan
@ 2020-05-27  7:31 ` Lai Jiangshan
  4 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-27  7:31 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

It is now safe to memcpy() straight to the kernel stack in
fixup_bad_iret(), without the temporary storage, because the entry
stack (the source) and the kernel stack (the destination) can no
longer overlap.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
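A small userspace sketch of why the overlap check makes the direct
memcpy() safe (illustrative only; FRAME_SIZE is a stand-in for
sizeof(struct bad_iret_stack)):

  #include <assert.h>
  #include <stdint.h>
  #include <string.h>

  #define FRAME_SIZE 176	/* stand-in, not the real size */

  /* memcpy() is undefined for overlapping buffers. The BUG_ON() in the
   * patch makes the same demand as this assert: source and destination
   * must be at least a page apart in both directions (the unsigned
   * subtraction wraps for the "wrong" direction), so the copied frames
   * can never overlap. */
  static void copy_frame(void *dst, const void *src)
  {
          uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;

          assert(d - s >= 4096 && s - d >= 4096);
          memcpy(dst, src, FRAME_SIZE);
  }

  int main(void)
  {
          static char entry_stack[4096], kernel_stack[4096];

          copy_frame(kernel_stack, entry_stack);	/* disjoint: OK */
          return 0;
  }
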
 arch/x86/kernel/traps.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8291f84933ff..6fe72c771745 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -657,17 +657,23 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 	 * (rather than just below the IRET frame) and we want to
 	 * pretend that the exception came from the IRET target.
 	 */
-	struct bad_iret_stack tmp, *new_stack =
+	struct bad_iret_stack *new_stack =
 		(struct bad_iret_stack *)__this_cpu_read(cpu_current_top_of_stack) - 1;
 
-	/* Copy the IRET target to the temporary storage. */
-	memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
+	/*
+	 * Both the IRET frame and the saved pt_regs must be on the
+	 * entry stacks since iret to user is only issued on the
+	 * entry stacks. So they don't overlap with kernel stack and
+	 * memcpy() is safe to copy them.
+	 */
+	BUG_ON(((unsigned long)s - (unsigned long)new_stack) < PAGE_SIZE ||
+	       ((unsigned long)new_stack - (unsigned long)s) < PAGE_SIZE);
 
-	/* Copy the remainder of the stack from the current stack. */
-	memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
+	/* Copy the IRET target to the new stack. */
+	memcpy(&new_stack->regs.ip, (void *)s->regs.sp, 5*8);
 
-	/* Update the entry stack */
-	memcpy(new_stack, &tmp, sizeof(tmp));
+	/* Copy the remainder of the stack from the current stack. */
+	memcpy(new_stack, s, offsetof(struct bad_iret_stack, regs.ip));
 
 	BUG_ON(!user_mode(&new_stack->regs));
 	return new_stack;
-- 
2.20.1



* Re: [PATCH 1/5] x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack
  2020-05-27  7:31 ` [PATCH 1/5] x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack Lai Jiangshan
@ 2020-05-28 19:12   ` Thomas Gleixner
  2020-05-29  8:26     ` [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2020-05-28 19:12 UTC
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin

Lai,

Lai Jiangshan <laijs@linux.alibaba.com> writes:

> Move a portion of code to be a macro, and it will also be used in
> next patch.
>
> Just move around the code, no functionality changed.

interrupt_entry does not exist anymore. See:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/entry

Thanks,

        tglx


* [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace
  2020-05-28 19:12   ` Thomas Gleixner
@ 2020-05-29  8:26     ` Lai Jiangshan
  2020-05-29  8:26       ` [PATCH V2 1/4] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
                         ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-29  8:26 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
IDT entries") resulted in the kernel (error_entry), on any exception
from userspace, always pushing the pt_regs to the entry stack (sp0)
and then copying them to the kernel stack.

And the recent x86/entry work makes interrupts also use idtentry and
makes all the interrupt code save the pt_regs on the sp0 stack and
then copy it to the thread stack, like exceptions.

This is a hot path (page fault, IPI) and such overhead should be
avoided. The original interrupt_entry switched directly to the kernel
stack and pushed the pt_regs there. We should do the same for
error_entry. This is the job of patch 1.

Patches 2-4 simplify stack switching for .Lerror_bad_iret by doing
all the work in one function (fixup_bad_iret()).

The patch set is based on tip/x86/entry (28447ea41542) (May 20).

Changes from V1:
	based on tip/master -> based on tip/x86/entry

	patch 1 replaces patches 1 and 2 of V1; it borrows the
	original interrupt_entry's code into error_entry.

	patches 2-4 are V1's patches 3-5, unchanged (but rebased)

Cc: Andy Lutomirski <luto@kernel.org>,
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Ingo Molnar <mingo@redhat.com>,
Cc: Borislav Petkov <bp@alien8.de>,
Cc: x86@kernel.org,
Cc: "H. Peter Anvin" <hpa@zytor.com>,
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>,
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
Cc: Jann Horn <jannh@google.com>,
Cc: Dave Hansen <dave.hansen@linux.intel.com>

Lai Jiangshan (4):
  x86/entry: avoid calling into sync_regs() when entering from userspace
  x86/entry: directly switch to kernel stack when .Lerror_bad_iret
  x86/entry: remove unused sync_regs()
  x86/entry: don't copy to tmp in fixup_bad_iret

 arch/x86/entry/entry_64.S    | 52 +++++++++++++++++++++++-------------
 arch/x86/include/asm/traps.h |  1 -
 arch/x86/kernel/traps.c      | 42 ++++++++++++-----------------
 3 files changed, 51 insertions(+), 44 deletions(-)

-- 
2.20.1



* [PATCH V2 1/4] x86/entry: avoid calling into sync_regs() when entering from userspace
  2020-05-29  8:26     ` [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
@ 2020-05-29  8:26       ` Lai Jiangshan
  2020-05-29  8:26       ` [PATCH V2 2/4] x86/entry: directly switch to kernel stack when .Lerror_bad_iret Lai Jiangshan
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-29  8:26 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin

Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
IDT entries") changed the entry code so that, when any exception
happens in userspace, it saves the pt_regs on the sp0 stack and then
copies it to the thread stack via sync_regs(), switching to the
thread stack afterward.

And the recent x86/entry work makes interrupts also use idtentry and
makes all the interrupt code save the pt_regs on the sp0 stack and
then copy it to the thread stack, like exceptions.

This is a hot path (page fault, IPI) and such overhead should be
avoided. This patch borrows the way the original interrupt_entry
handled it: it switches to the thread stack directly, right away,
when coming from userspace.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
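The pushq sequence in the hunk below copies the IRET frame word by
word from the old stack onto the new one. Roughly the same thing in
C, as an illustrative sketch (not the kernel code):

  /* Copy the seven saved words (ss, rsp, eflags, cs, ip, orig_ax,
   * return address) from the old stack, highest address first, onto
   * the new stack, exactly as the successive pushq instructions do. */
  static unsigned long *copy_entry_frame(unsigned long *new_sp,
                                         const unsigned long *old_rdx)
  {
          int i;

          for (i = 7; i >= 1; i--)
                  *--new_sp = old_rdx[i];	/* models "pushq i*8(%rdx)" */
          return new_sp;
  }

  int main(void)
  {
          unsigned long old_stack[8] = { 0 };	/* [1..7] = saved words */
          unsigned long new_stack[8];

          copy_entry_frame(&new_stack[8], old_stack);
          return 0;
  }
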
 arch/x86/entry/entry_64.S | 43 +++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 265ff97b3961..b524ff3499d0 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -878,19 +878,42 @@ SYM_CODE_END(paranoid_exit)
 SYM_CODE_START_LOCAL(error_entry)
 	UNWIND_HINT_FUNC
 	cld
-	PUSH_AND_CLEAR_REGS save_ret=1
-	ENCODE_FRAME_POINTER 8
-	testb	$3, CS+8(%rsp)
+	testb	$3, CS-ORIG_RAX+8(%rsp)
 	jz	.Lerror_kernelspace
 
-	/*
-	 * We entered from user mode or we're pretending to have entered
-	 * from user mode due to an IRET fault.
-	 */
 	SWAPGS
 	FENCE_SWAPGS_USER_ENTRY
-	/* We have user CR3.  Change to kernel CR3. */
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+	/*
+	 * Switch to the thread stack. The IRET frame and orig_ax are
+	 * on the stack, as well as the return address. RDI..R12 are
+	 * not (yet) on the stack and space has not (yet) been
+	 * allocated for them.
+	 */
+	pushq	%rdx
+
+	/* Need to switch before accessing the thread stack. */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
+	movq	%rsp, %rdx
+	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+
+	 /*
+	  * We have RDX, return address, and orig_ax on the stack on
+	  * top of the IRET frame. That means offset=24
+	  */
+	UNWIND_HINT_IRET_REGS base=%rdx offset=24
+
+	pushq	7*8(%rdx)		/* regs->ss */
+	pushq	6*8(%rdx)		/* regs->rsp */
+	pushq	5*8(%rdx)		/* regs->eflags */
+	pushq	4*8(%rdx)		/* regs->cs */
+	pushq	3*8(%rdx)		/* regs->ip */
+	pushq	2*8(%rdx)		/* regs->orig_ax */
+	pushq	8(%rdx)			/* return address */
+	UNWIND_HINT_FUNC
+
+	PUSH_AND_CLEAR_REGS rdx=(%rdx), save_ret=1
+	ENCODE_FRAME_POINTER 8
+	ret
 
 .Lerror_entry_from_usermode_after_swapgs:
 	/* Put us onto the real thread stack. */
@@ -914,6 +937,8 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * for these here too.
 	 */
 .Lerror_kernelspace:
+	PUSH_AND_CLEAR_REGS save_ret=1
+	ENCODE_FRAME_POINTER 8
 	leaq	native_irq_return_iret(%rip), %rcx
 	cmpq	%rcx, RIP+8(%rsp)
 	je	.Lerror_bad_iret
-- 
2.20.1



* [PATCH V2 2/4] x86/entry: directly switch to kernel stack when .Lerror_bad_iret
  2020-05-29  8:26     ` [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
  2020-05-29  8:26       ` [PATCH V2 1/4] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
@ 2020-05-29  8:26       ` Lai Jiangshan
  2020-05-29  8:26       ` [PATCH V2 3/4] x86/entry: remove unused sync_regs() Lai Jiangshan
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-29  8:26 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

Directly copy the pt_regs to the kernel stack and directly switch to
the kernel stack at .Lerror_bad_iret.

entry_64.S currently does the following things back to back at
.Lerror_bad_iret:
  call fixup_bad_iret(): switch to the sp0 stack with pt_regs copied
  call sync_regs(): switch to the kernel stack with pt_regs copied

So we can do all the work together in fixup_bad_iret().

After this patch, fixup_bad_iret() is restored to its behavior before
commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack
for IDT entries").

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 13 ++-----------
 arch/x86/kernel/traps.c   |  9 ++++-----
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b524ff3499d0..6f8963072ed0 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -915,16 +915,6 @@ SYM_CODE_START_LOCAL(error_entry)
 	ENCODE_FRAME_POINTER 8
 	ret
 
-.Lerror_entry_from_usermode_after_swapgs:
-	/* Put us onto the real thread stack. */
-	popq	%r12				/* save return addr in %12 */
-	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
-	call	sync_regs
-	movq	%rax, %rsp			/* switch stack */
-	ENCODE_FRAME_POINTER
-	pushq	%r12
-	ret
-
 .Lerror_entry_done_lfence:
 	FENCE_SWAPGS_KERNEL_ENTRY
 .Lerror_entry_done:
@@ -978,7 +968,8 @@ SYM_CODE_START_LOCAL(error_entry)
 	mov	%rsp, %rdi
 	call	fixup_bad_iret
 	mov	%rax, %rsp
-	jmp	.Lerror_entry_from_usermode_after_swapgs
+	ENCODE_FRAME_POINTER 8
+	ret
 SYM_CODE_END(error_entry)
 
 SYM_CODE_START_LOCAL(error_return)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 50fb9cd5be97..053a15717bfd 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -669,13 +669,12 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 	/*
 	 * This is called from entry_64.S early in handling a fault
 	 * caused by a bad iret to user mode.  To handle the fault
-	 * correctly, we want to move our stack frame to where it would
-	 * be had we entered directly on the entry stack (rather than
-	 * just below the IRET frame) and we want to pretend that the
-	 * exception came from the IRET target.
+	 * correctly, we want to move our stack frame to kernel stack
+	 * (rather than just below the IRET frame) and we want to
+	 * pretend that the exception came from the IRET target.
 	 */
 	struct bad_iret_stack tmp, *new_stack =
-		(struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+		(struct bad_iret_stack *)__this_cpu_read(cpu_current_top_of_stack) - 1;
 
 	/* Copy the IRET target to the temporary storage. */
 	memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
-- 
2.20.1



* [PATCH V2 3/4] x86/entry: remove unused sync_regs()
  2020-05-29  8:26     ` [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
  2020-05-29  8:26       ` [PATCH V2 1/4] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
  2020-05-29  8:26       ` [PATCH V2 2/4] x86/entry: directly switch to kernel stack when .Lerror_bad_iret Lai Jiangshan
@ 2020-05-29  8:26       ` Lai Jiangshan
  2020-05-29  8:26       ` [PATCH V2 4/4] x86/entry: don't copy to tmp in fixup_bad_iret Lai Jiangshan
  2020-05-29 18:32       ` [PATCH V2 0/4] x86/entry: simply stack switching when exception on userspace Andy Lutomirski
  4 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-29  8:26 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin, Andy Lutomirski, Alexandre Chartre,
	Peter Zijlstra, Jann Horn, Dave Hansen

No more users.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/traps.h |  1 -
 arch/x86/kernel/traps.c      | 13 -------------
 2 files changed, 14 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 714b1a30e7b0..52cd29bde1d0 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -10,7 +10,6 @@
 #include <asm/siginfo.h>			/* TRAP_TRACE, ... */
 
 #ifdef CONFIG_X86_64
-asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
 asmlinkage __visible notrace
 struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
 void __init trap_init(void);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 053a15717bfd..9d16672865f8 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -645,19 +645,6 @@ DEFINE_IDTENTRY_RAW(exc_int3)
 }
 
 #ifdef CONFIG_X86_64
-/*
- * Help handler running on a per-cpu (IST or entry trampoline) stack
- * to switch to the normal thread stack if the interrupted code was in
- * user mode. The actual stack switch is done in entry_64.S
- */
-asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
-{
-	struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
-	if (regs != eregs)
-		*regs = *eregs;
-	return regs;
-}
-
 struct bad_iret_stack {
 	void *error_entry_ret;
 	struct pt_regs regs;
-- 
2.20.1



* [PATCH V2 4/4] x86/entry: don't copy to tmp in fixup_bad_iret
  2020-05-29  8:26     ` [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
                         ` (2 preceding siblings ...)
  2020-05-29  8:26       ` [PATCH V2 3/4] x86/entry: remove unused sync_regs() Lai Jiangshan
@ 2020-05-29  8:26       ` Lai Jiangshan
  2020-05-29 18:32       ` [PATCH V2 0/4] x86/entry: simply stack switching when exception on userspace Andy Lutomirski
  4 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-05-29  8:26 UTC
  To: linux-kernel
  Cc: Lai Jiangshan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Alexandre Chartre, Jann Horn, Dave Hansen

It is now safe to memcpy() straight to the kernel stack in
fixup_bad_iret(), without the temporary storage, because the entry
stack (the source) and the kernel stack (the destination) can no
longer overlap.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kernel/traps.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9d16672865f8..1a0253a80a4c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -660,17 +660,23 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 	 * (rather than just below the IRET frame) and we want to
 	 * pretend that the exception came from the IRET target.
 	 */
-	struct bad_iret_stack tmp, *new_stack =
+	struct bad_iret_stack *new_stack =
 		(struct bad_iret_stack *)__this_cpu_read(cpu_current_top_of_stack) - 1;
 
-	/* Copy the IRET target to the temporary storage. */
-	memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
+	/*
+	 * Both the IRET frame and the saved pt_regs must be on the
+	 * entry stacks since iret to user is only issued on the
+	 * entry stacks. So they don't overlap with kernel stack and
+	 * memcpy() is safe to copy them.
+	 */
+	BUG_ON(((unsigned long)s - (unsigned long)new_stack) < PAGE_SIZE ||
+	       ((unsigned long)new_stack - (unsigned long)s) < PAGE_SIZE);
 
-	/* Copy the remainder of the stack from the current stack. */
-	memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
+	/* Copy the IRET target to the new stack. */
+	memcpy(&new_stack->regs.ip, (void *)s->regs.sp, 5*8);
 
-	/* Update the entry stack */
-	memcpy(new_stack, &tmp, sizeof(tmp));
+	/* Copy the remainder of the stack from the current stack. */
+	memcpy(new_stack, s, offsetof(struct bad_iret_stack, regs.ip));
 
 	BUG_ON(!user_mode(&new_stack->regs));
 	return new_stack;
-- 
2.20.1



* Re: [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace
  2020-05-29  8:26     ` [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace Lai Jiangshan
                         ` (3 preceding siblings ...)
  2020-05-29  8:26       ` [PATCH V2 4/4] x86/entry: don't copy to tmp in fixup_bad_iret Lai Jiangshan
@ 2020-05-29 18:32       ` Andy Lutomirski
  2020-06-16  1:56         ` Lai Jiangshan
  4 siblings, 1 reply; 24+ messages in thread
From: Andy Lutomirski @ 2020-05-29 18:32 UTC
  To: Lai Jiangshan
  Cc: LKML, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

On Fri, May 29, 2020 at 1:26 AM Lai Jiangshan <laijs@linux.alibaba.com> wrote:
>
> Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack
> for IDT entries") resulted in the kernel (error_entry), on any
> exception from userspace, always pushing the pt_regs to the entry
> stack (sp0) and then copying them to the kernel stack.
>
> And the recent x86/entry work makes interrupts also use idtentry and
> makes all the interrupt code save the pt_regs on the sp0 stack and
> then copy it to the thread stack, like exceptions.
>
> This is a hot path (page fault, IPI) and such overhead should be
> avoided. The original interrupt_entry switched directly to the
> kernel stack and pushed the pt_regs there. We should do the same for
> error_entry. This is the job of patch 1.
>
> Patches 2-4 simplify stack switching for .Lerror_bad_iret by doing
> all the work in one function (fixup_bad_iret()).
>
> The patch set is based on tip/x86/entry (28447ea41542) (May 20).

There are definitely good cleanups in here, but I think it would be
nice to rebase them onto whatever lands once 5.8-rc1 settles.

--Andy


* Re: [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace
  2020-05-29 18:32       ` [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace Andy Lutomirski
@ 2020-06-16  1:56         ` Lai Jiangshan
  2020-06-18 13:52           ` Lai Jiangshan
  0 siblings, 1 reply; 24+ messages in thread
From: Lai Jiangshan @ 2020-06-16  1:56 UTC
  To: Andy Lutomirski
  Cc: Lai Jiangshan, LKML, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

On Sat, May 30, 2020 at 2:33 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Fri, May 29, 2020 at 1:26 AM Lai Jiangshan <laijs@linux.alibaba.com> wrote:
> >
> > Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack
> > for IDT entries") resulted in the kernel (error_entry), on any
> > exception from userspace, always pushing the pt_regs to the entry
> > stack (sp0) and then copying them to the kernel stack.
> >
> > And the recent x86/entry work makes interrupts also use idtentry
> > and makes all the interrupt code save the pt_regs on the sp0 stack
> > and then copy it to the thread stack, like exceptions.
> >
> > This is a hot path (page fault, IPI) and such overhead should be
> > avoided. The original interrupt_entry switched directly to the
> > kernel stack and pushed the pt_regs there. We should do the same
> > for error_entry. This is the job of patch 1.
> >
> > Patches 2-4 simplify stack switching for .Lerror_bad_iret by doing
> > all the work in one function (fixup_bad_iret()).
> >
> > The patch set is based on tip/x86/entry (28447ea41542) (May 20).
>
> There are definitely good cleanups in here, but I think it would be
> nice to rebase them onto whatever lands once 5.8-rc1 settles.
>

Hello, All

This patchset applies smoothly to the newest tip/x86/entry, which has
5.8-rc1 merged. That means I don't have to respin/resend it unless an
update is needed.

Could you please review it?

Thanks
Lai


* Re: [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace
  2020-06-16  1:56         ` Lai Jiangshan
@ 2020-06-18 13:52           ` Lai Jiangshan
  2020-06-18 14:10             ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Lai Jiangshan @ 2020-06-18 13:52 UTC
  To: Andy Lutomirski
  Cc: Lai Jiangshan, LKML, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

Hello and Ping

On Tue, Jun 16, 2020 at 9:56 AM Lai Jiangshan
<jiangshanlai+lkml@gmail.com> wrote:
>
> On Sat, May 30, 2020 at 2:33 AM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Fri, May 29, 2020 at 1:26 AM Lai Jiangshan <laijs@linux.alibaba.com> wrote:
> > >
> > > Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline
> > > stack for IDT entries") resulted in the kernel (error_entry), on
> > > any exception from userspace, always pushing the pt_regs to the
> > > entry stack (sp0) and then copying them to the kernel stack.

Ping Andy Lutomirski, who added the overhead two years ago.

> > >
> > > And the recent x86/entry work makes interrupts also use idtentry
> > > and makes all the interrupt code save the pt_regs on the sp0
> > > stack and then copy it to the thread stack, like exceptions.
> > >

Ping Thomas Gleixner, who added the overhead recently.

>
> Hello, All
>
> This patchset can be smoothly applicable to the newest tip/x86/entry
> which has 5.8-rc1 merged. Which means I don't have to respin/resend it
> until any update is needed.
>
> Could you have a review on it please.
>
> Thanks
> Lai


* Re: [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace
  2020-06-18 13:52           ` Lai Jiangshan
@ 2020-06-18 14:10             ` Thomas Gleixner
  2020-06-27 21:03               ` Andy Lutomirski
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2020-06-18 14:10 UTC
  To: Lai Jiangshan, Andy Lutomirski
  Cc: Lai Jiangshan, LKML, Ingo Molnar, Borislav Petkov, X86 ML,
	H. Peter Anvin, Peter Zijlstra, Alexandre Chartre,
	Eric W. Biederman, Jann Horn, Dave Hansen

Lai,

Lai Jiangshan <jiangshanlai+lkml@gmail.com> writes:

> Hello and Ping

you poked at this on Tuesday, i.e. 2 days ago. It's on the todo
list, but it is not the most urgent problem on the planet.

Thanks

        tglx


* Re: [PATCH V2 0/4] x86/entry: simplify stack switching when exception on userspace
  2020-06-18 14:10             ` Thomas Gleixner
@ 2020-06-27 21:03               ` Andy Lutomirski
       [not found]                 ` <20200817062355.2884-1-jiangshanlai@gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Andy Lutomirski @ 2020-06-27 21:03 UTC
  To: Thomas Gleixner
  Cc: Lai Jiangshan, Andy Lutomirski, Lai Jiangshan, LKML, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

On Thu, Jun 18, 2020 at 7:10 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Lai,
>
> Lai Jiangshan <jiangshanlai+lkml@gmail.com> writes:
>
> > Hello and Ping
>
> you have poked on that on Tuesday, i.e. 2 days ago. It's on the todo
> list, but not the must urgent problem on the planet.
>

Just as a heads up, I'd be surprised if I can get to this in time for
5.9.  I'm still dealing with the fallout from 5.8-rc1, and there's no
shortage of fallout...

-Andy


* Re: [PATCH V3 0/3] x86/entry: simplify stack switching when exception on userspace
       [not found]                 ` <20200817062355.2884-1-jiangshanlai@gmail.com>
@ 2020-08-17  5:31                   ` Lai Jiangshan
  2020-09-10 10:12                     ` Lai Jiangshan
  2020-08-17  6:23                   ` [PATCH V3 1/3] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Lai Jiangshan @ 2020-08-17  5:31 UTC
  To: Lai Jiangshan, LKML
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin, Peter Zijlstra, Alexandre Chartre,
	Eric W. Biederman, Jann Horn, Dave Hansen


Deeply sorry, I forgot to send the cover letter to LKML.

Here it is:

On 2020/8/17 14:23, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack
> for IDT entries") resulted in the kernel (error_entry), on any
> exception from userspace, always pushing the pt_regs to the entry
> stack (sp0) and then copying them to the kernel stack.
> 
> And the recent x86/entry work makes interrupts also use idtentry
> and makes all the interrupt code save the pt_regs on the sp0 stack
> and then copy it to the thread stack, like exceptions.
> 
> This is a hot path (page fault, IPI) and such overhead should be
> avoided. The original interrupt_entry switched directly to the
> kernel stack and pushed the pt_regs there. We should do the same
> for error_entry. This is the job of patch 1.
> 
> Patches 2-3 simplify stack switching for .Lerror_bad_iret by doing
> all the work in one function (fixup_bad_iret()).
> 
> The patch set is based on v5.9-rc1
> 
> Changes from V1:
> 	based on tip/master -> based on tip/x86/entry
> 
> 	patch 1 replaces patches 1 and 2 of V1; it borrows the
> 	original interrupt_entry's code into error_entry.
> 
> 	patches 2-4 are V1's patches 3-5, unchanged (but rebased)
> 
> Changes from V2:
> 	(re-)based on v5.9-rc1
> 	dropped patch 4 of the V2 patchset
> 
> Cc: Andy Lutomirski <luto@kernel.org>,
> Cc: Thomas Gleixner <tglx@linutronix.de>,
> Cc: Ingo Molnar <mingo@redhat.com>,
> Cc: Borislav Petkov <bp@alien8.de>,
> Cc: x86@kernel.org,
> Cc: "H. Peter Anvin" <hpa@zytor.com>,
> Cc: Peter Zijlstra <peterz@infradead.org>,
> Cc: Alexandre Chartre <alexandre.chartre@oracle.com>,
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
> Cc: Jann Horn <jannh@google.com>,
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> 
> 
> Lai Jiangshan (3):
>    x86/entry: avoid calling into sync_regs() when entering from userspace
>    x86/entry: directly switch to kernel stack when .Lerror_bad_iret
>    x86/entry: remove unused sync_regs()
> 
>   arch/x86/entry/entry_64.S    | 52 +++++++++++++++++++++++-------------
>   arch/x86/include/asm/traps.h |  1 -
>   arch/x86/kernel/traps.c      | 22 +++------------
>   3 files changed, 38 insertions(+), 37 deletions(-)
> 


* [PATCH V3 1/3] x86/entry: avoid calling into sync_regs() when entering from userspace
       [not found]                 ` <20200817062355.2884-1-jiangshanlai@gmail.com>
  2020-08-17  5:31                   ` [PATCH V3 0/3] " Lai Jiangshan
@ 2020-08-17  6:23                   ` Lai Jiangshan
  2020-09-11 21:24                     ` Jann Horn
  2020-08-17  6:23                   ` [PATCH V3 2/3] x86/entry: directly switch to kernel stack when .Lerror_bad_iret Lai Jiangshan
  2020-08-17  6:23                   ` [PATCH V3 3/3] x86/entry: remove unused sync_regs() Lai Jiangshan
  3 siblings, 1 reply; 24+ messages in thread
From: Lai Jiangshan @ 2020-08-17  6:23 UTC
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

From: Lai Jiangshan <laijs@linux.alibaba.com>

Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
IDT entries") changed the entry code so that, when any exception
happens in userspace, it saves the pt_regs on the sp0 stack and then
copies it to the thread stack via sync_regs(), switching to the
thread stack afterward.

And the recent x86/entry work makes interrupts also use idtentry and
makes all the interrupt code save the pt_regs on the sp0 stack and
then copy it to the thread stack, like exceptions.

This is a hot path (page fault, IPI) and such overhead should be
avoided. This patch borrows the way the original interrupt_entry
handled it: it switches to the thread stack directly, right away,
when coming from userspace.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 43 +++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 70dea9337816..1a7715430da3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -928,19 +928,42 @@ SYM_CODE_END(paranoid_exit)
 SYM_CODE_START_LOCAL(error_entry)
 	UNWIND_HINT_FUNC
 	cld
-	PUSH_AND_CLEAR_REGS save_ret=1
-	ENCODE_FRAME_POINTER 8
-	testb	$3, CS+8(%rsp)
+	testb	$3, CS-ORIG_RAX+8(%rsp)
 	jz	.Lerror_kernelspace
 
-	/*
-	 * We entered from user mode or we're pretending to have entered
-	 * from user mode due to an IRET fault.
-	 */
 	SWAPGS
 	FENCE_SWAPGS_USER_ENTRY
-	/* We have user CR3.  Change to kernel CR3. */
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+	/*
+	 * Switch to the thread stack. The IRET frame and orig_ax are
+	 * on the stack, as well as the return address. RDI..R12 are
+	 * not (yet) on the stack and space has not (yet) been
+	 * allocated for them.
+	 */
+	pushq	%rdx
+
+	/* Need to switch before accessing the thread stack. */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
+	movq	%rsp, %rdx
+	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+
+	 /*
+	  * We have RDX, return address, and orig_ax on the stack on
+	  * top of the IRET frame. That means offset=24
+	  */
+	UNWIND_HINT_IRET_REGS base=%rdx offset=24
+
+	pushq	7*8(%rdx)		/* regs->ss */
+	pushq	6*8(%rdx)		/* regs->rsp */
+	pushq	5*8(%rdx)		/* regs->eflags */
+	pushq	4*8(%rdx)		/* regs->cs */
+	pushq	3*8(%rdx)		/* regs->ip */
+	pushq	2*8(%rdx)		/* regs->orig_ax */
+	pushq	8(%rdx)			/* return address */
+	UNWIND_HINT_FUNC
+
+	PUSH_AND_CLEAR_REGS rdx=(%rdx), save_ret=1
+	ENCODE_FRAME_POINTER 8
+	ret
 
 .Lerror_entry_from_usermode_after_swapgs:
 	/* Put us onto the real thread stack. */
@@ -964,6 +987,8 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * for these here too.
 	 */
 .Lerror_kernelspace:
+	PUSH_AND_CLEAR_REGS save_ret=1
+	ENCODE_FRAME_POINTER 8
 	leaq	native_irq_return_iret(%rip), %rcx
 	cmpq	%rcx, RIP+8(%rsp)
 	je	.Lerror_bad_iret
-- 
2.19.1.6.gb485710b



* [PATCH V3 2/3] x86/entry: directly switch to kernel stack when .Lerror_bad_iret
       [not found]                 ` <20200817062355.2884-1-jiangshanlai@gmail.com>
  2020-08-17  5:31                   ` [PATCH V3 0/3] " Lai Jiangshan
  2020-08-17  6:23                   ` [PATCH V3 1/3] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
@ 2020-08-17  6:23                   ` Lai Jiangshan
  2020-08-17  6:23                   ` [PATCH V3 3/3] x86/entry: remove unused sync_regs() Lai Jiangshan
  3 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-08-17  6:23 UTC
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Jann Horn, Dave Hansen, linux-kernel

From: Lai Jiangshan <laijs@linux.alibaba.com>

Directly copy the pt_regs to the kernel stack and directly switch to
the kernel stack at .Lerror_bad_iret.

entry_64.S currently does the following things back to back at
.Lerror_bad_iret:
  call fixup_bad_iret(): switch to the sp0 stack with pt_regs copied
  call sync_regs(): switch to the kernel stack with pt_regs copied

So we can do all the work together in fixup_bad_iret().

After this patch, fixup_bad_iret() is restored to its behavior before
commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack
for IDT entries").

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/entry/entry_64.S | 13 ++-----------
 arch/x86/kernel/traps.c   |  9 ++++-----
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1a7715430da3..911cfa0da637 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -965,16 +965,6 @@ SYM_CODE_START_LOCAL(error_entry)
 	ENCODE_FRAME_POINTER 8
 	ret
 
-.Lerror_entry_from_usermode_after_swapgs:
-	/* Put us onto the real thread stack. */
-	popq	%r12				/* save return addr in %12 */
-	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
-	call	sync_regs
-	movq	%rax, %rsp			/* switch stack */
-	ENCODE_FRAME_POINTER
-	pushq	%r12
-	ret
-
 .Lerror_entry_done_lfence:
 	FENCE_SWAPGS_KERNEL_ENTRY
 .Lerror_entry_done:
@@ -1028,7 +1018,8 @@ SYM_CODE_START_LOCAL(error_entry)
 	mov	%rsp, %rdi
 	call	fixup_bad_iret
 	mov	%rax, %rsp
-	jmp	.Lerror_entry_from_usermode_after_swapgs
+	ENCODE_FRAME_POINTER 8
+	ret
 SYM_CODE_END(error_entry)
 
 SYM_CODE_START_LOCAL(error_return)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 1f66d2d1e998..852de6f1bf88 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -684,13 +684,12 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 	/*
 	 * This is called from entry_64.S early in handling a fault
 	 * caused by a bad iret to user mode.  To handle the fault
-	 * correctly, we want to move our stack frame to where it would
-	 * be had we entered directly on the entry stack (rather than
-	 * just below the IRET frame) and we want to pretend that the
-	 * exception came from the IRET target.
+	 * correctly, we want to move our stack frame to kernel stack
+	 * (rather than just below the IRET frame) and we want to
+	 * pretend that the exception came from the IRET target.
 	 */
 	struct bad_iret_stack tmp, *new_stack =
-		(struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+		(struct bad_iret_stack *)__this_cpu_read(cpu_current_top_of_stack) - 1;
 
 	/* Copy the IRET target to the temporary storage. */
 	__memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
-- 
2.19.1.6.gb485710b



* [PATCH V3 3/3] x86/entry: remove unused sync_regs()
       [not found]                 ` <20200817062355.2884-1-jiangshanlai@gmail.com>
                                     ` (2 preceding siblings ...)
  2020-08-17  6:23                   ` [PATCH V3 2/3] x86/entry: directly switch to kernel stack when .Lerror_bad_iret Lai Jiangshan
@ 2020-08-17  6:23                   ` Lai Jiangshan
  3 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-08-17  6:23 UTC
  Cc: Lai Jiangshan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin, Andy Lutomirski, Alexandre Chartre,
	Peter Zijlstra, Jann Horn, Dave Hansen, linux-kernel

From: Lai Jiangshan <laijs@linux.alibaba.com>

No more users.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/include/asm/traps.h |  1 -
 arch/x86/kernel/traps.c      | 13 -------------
 2 files changed, 14 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 714b1a30e7b0..52cd29bde1d0 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -10,7 +10,6 @@
 #include <asm/siginfo.h>			/* TRAP_TRACE, ... */
 
 #ifdef CONFIG_X86_64
-asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
 asmlinkage __visible notrace
 struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
 void __init trap_init(void);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 852de6f1bf88..ebea1c3e473b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -660,19 +660,6 @@ DEFINE_IDTENTRY_RAW(exc_int3)
 }
 
 #ifdef CONFIG_X86_64
-/*
- * Help handler running on a per-cpu (IST or entry trampoline) stack
- * to switch to the normal thread stack if the interrupted code was in
- * user mode. The actual stack switch is done in entry_64.S
- */
-asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
-{
-	struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
-	if (regs != eregs)
-		*regs = *eregs;
-	return regs;
-}
-
 struct bad_iret_stack {
 	void *error_entry_ret;
 	struct pt_regs regs;
-- 
2.19.1.6.gb485710b



* Re: [PATCH V3 0/3] x86/entry: simplify stack switching when exception on userspace
  2020-08-17  5:31                   ` [PATCH V3 0/3] " Lai Jiangshan
@ 2020-09-10 10:12                     ` Lai Jiangshan
  0 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-09-10 10:12 UTC
  To: Lai Jiangshan
  Cc: LKML, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Peter Zijlstra,
	Alexandre Chartre, Eric W. Biederman, Jann Horn, Dave Hansen

> There are definitely good cleanups in here, but I think it would be
> nice to rebase them onto whatever lands once 5.8-rc1 settles.

Hello, tglx, Andy:

Could you please review the patches?

They are based on 5.9-rc1 and were sent just after 5.9-rc1 settled,
in the hope of making their way into 5.10-rc1.

The first two patches can still be applied to today's tip/master
(Sep 10 2020). The third cannot; it conflicts with a commit added
just 3 days ago (#VC related). But the third only removes a short
snippet of code, so it is easy to review and even easy to apply
manually. I will rebase all of them once they, or the first two,
are reviewed.

I'm sorry that I have to ask you to review again, because the
patches change the code you've recently changed.

Thanks
Lai


* Re: [PATCH V3 1/3] x86/entry: avoid calling into sync_regs() when entering from userspace
  2020-08-17  6:23                   ` [PATCH V3 1/3] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
@ 2020-09-11 21:24                     ` Jann Horn
  2020-09-15  7:55                       ` Lai Jiangshan
  0 siblings, 1 reply; 24+ messages in thread
From: Jann Horn @ 2020-09-11 21:24 UTC
  To: Lai Jiangshan
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H. Peter Anvin,
	kernel list, Kees Cook

On Mon, Aug 17, 2020 at 8:23 AM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack
> for IDT entries") changed the entry code so that, when any exception
> happens in userspace, it saves the pt_regs on the sp0 stack and then
> copies it to the thread stack via sync_regs(), switching to the
> thread stack afterward.
>
> And the recent x86/entry work makes interrupts also use idtentry and
> makes all the interrupt code save the pt_regs on the sp0 stack and
> then copy it to the thread stack, like exceptions.
>
> This is a hot path (page fault, IPI) and such overhead should be
> avoided. This patch borrows the way the original interrupt_entry
> handled it: it switches to the thread stack directly, right away,
> when coming from userspace.
>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>

As far as I can see, on systems affected by Meltdown, this patch fixes
register state leakage between tasks because any data that is written
to the per-CPU trampoline stacks must be considered visible to all
userspace. I think that makes this a fix that should go into stable
kernels.

Therefore, please add:

Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
Cc: stable@vger.kernel.org


> ---
>  arch/x86/entry/entry_64.S | 43 +++++++++++++++++++++++++++++++--------
>  1 file changed, 34 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 70dea9337816..1a7715430da3 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -928,19 +928,42 @@ SYM_CODE_END(paranoid_exit)
>  SYM_CODE_START_LOCAL(error_entry)
>         UNWIND_HINT_FUNC
>         cld
> -       PUSH_AND_CLEAR_REGS save_ret=1
> -       ENCODE_FRAME_POINTER 8
> -       testb   $3, CS+8(%rsp)
> +       testb   $3, CS-ORIG_RAX+8(%rsp)
>         jz      .Lerror_kernelspace
>
> -       /*
> -        * We entered from user mode or we're pretending to have entered
> -        * from user mode due to an IRET fault.
> -        */

As far as I can tell, this comment is still correct, and it is
helpful. Why are you removing it?

>         SWAPGS
>         FENCE_SWAPGS_USER_ENTRY
> -       /* We have user CR3.  Change to kernel CR3. */
> -       SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
> +       /*
> +        * Switch to the thread stack. The IRET frame and orig_ax are
> +        * on the stack, as well as the return address. RDI..R12 are

Did you mean RDI..R15?

> +        * not (yet) on the stack and space has not (yet) been
> +        * allocated for them.
> +        */
> +       pushq   %rdx
> +
> +       /* Need to switch before accessing the thread stack. */
> +       SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
> +       movq    %rsp, %rdx
> +       movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp

Can we avoid spilling %rdx to the meltdown-readable entry stack here?
We could do something similar to what entry_SYSCALL_64 does, roughly
like this:


/*
 * While there is an iret frame, it won't be easy to find for a
 * few instructions, so let's pretend it doesn't exist.
 */
UNWIND_HINT_EMPTY

/*
 * Switch to kernel CR3 and stack. To avoid spilling secret
 * userspace register state to the trampoline stack, we use
 * RSP as scratch - we can reconstruct the old RSP afterwards
 * using TSS_sp0.
 */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp

pushq %rdx /* scratch, will be replaced with regs->ss later */
mov PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rdx
/*
 * We have return address and orig_ax on the stack on top of the
 * IRET frame; relative to %rdx (== TSS_sp0), the frame's RIP
 * slot is at -5*8, hence offset=-5*8.
 */
UNWIND_HINT_IRET_REGS base=%rdx offset=-5*8

pushq   -2*8(%rdx)               /* regs->rsp */
pushq   -3*8(%rdx)               /* regs->eflags */
pushq   -4*8(%rdx)               /* regs->cs */
pushq   -5*8(%rdx)               /* regs->ip */
pushq   -6*8(%rdx)               /* regs->orig_ax */
pushq   -7*8(%rdx)               /* return address */
UNWIND_HINT_FUNC

PUSH_AND_CLEAR_REGS rdx=7*8(%rsp), save_ret=1

/* copy regs->ss from trampoline stack */
movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rax
mov -1*8(%rax), %rax
movq %rax, 21*8(%rsp)	/* regs->ss; pt_regs starts at 8(%rsp) */

ENCODE_FRAME_POINTER 8

ret


Does something like that seem like a reasonable idea?

> +        /*
> +         * We have RDX, return address, and orig_ax on the stack on
> +         * top of the IRET frame. That means offset=24
> +         */
> +       UNWIND_HINT_IRET_REGS base=%rdx offset=24
> +
> +       pushq   7*8(%rdx)               /* regs->ss */
> +       pushq   6*8(%rdx)               /* regs->rsp */
> +       pushq   5*8(%rdx)               /* regs->eflags */
> +       pushq   4*8(%rdx)               /* regs->cs */
> +       pushq   3*8(%rdx)               /* regs->ip */
> +       pushq   2*8(%rdx)               /* regs->orig_ax */
> +       pushq   8(%rdx)                 /* return address */
> +       UNWIND_HINT_FUNC
> +
> +       PUSH_AND_CLEAR_REGS rdx=(%rdx), save_ret=1
> +       ENCODE_FRAME_POINTER 8
> +       ret

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V3 1/3] x86/entry: avoid calling into sync_regs() when entering from userspace
  2020-09-11 21:24                     ` Jann Horn
@ 2020-09-15  7:55                       ` Lai Jiangshan
  0 siblings, 0 replies; 24+ messages in thread
From: Lai Jiangshan @ 2020-09-15  7:55 UTC (permalink / raw)
  To: Jann Horn
  Cc: Lai Jiangshan, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, H. Peter Anvin,
	kernel list, Kees Cook

On Sat, Sep 12, 2020 at 5:24 AM Jann Horn <jannh@google.com> wrote:


Hello Jann

Thanks for your review.

> As far as I can see, on systems affected by Meltdown, this patch fixes
> register state leakage between tasks because any data that is written
> to the per-CPU trampoline stacks must be considered visible to all
> userspace. I think that makes this a fix that should go into stable
> kernels.

I know what you mean, since I did something similar in another
project. But I hesitated to claim that: not everyone is paranoid
enough to want all registers hidden. I consider these patches a
nice cleanup that makes future entry_64.S work better.

>
> Therefore, please add:
>
> Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
> Cc: stable@vger.kernel.org
>
>
> > ---
> >  arch/x86/entry/entry_64.S | 43 +++++++++++++++++++++++++++++++--------
> >  1 file changed, 34 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 70dea9337816..1a7715430da3 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -928,19 +928,42 @@ SYM_CODE_END(paranoid_exit)
> >  SYM_CODE_START_LOCAL(error_entry)
> >         UNWIND_HINT_FUNC
> >         cld
> > -       PUSH_AND_CLEAR_REGS save_ret=1
> > -       ENCODE_FRAME_POINTER 8
> > -       testb   $3, CS+8(%rsp)
> > +       testb   $3, CS-ORIG_RAX+8(%rsp)
> >         jz      .Lerror_kernelspace
> >
> > -       /*
> > -        * We entered from user mode or we're pretending to have entered
> > -        * from user mode due to an IRET fault.
> > -        */
>
> As far as I can tell, this comment is still correct, and it is
> helpful. Why are you removing it?

This comment actually describes .Lerror_entry_from_usermode_after_swapgs,
which is quite far from here.

"We entered from user mode" is already self-evident from the code above.

"we're pretending to have entered from user mode due to an IRET fault"
notes code reviewers that .Lerror_bad_iret can jump to
.Lerror_entry_from_usermode_after_swapgs.

.Lerror_entry_from_usermode_after_swapgs will be removed in the next
patch.

Since the comment is too far from its target, and the target will be
removed in the next patch, I removed the comment.

Maybe I should move the removal of the comment into the next patch?
But should I rewrite the comment before that?
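
For reviewers following along, a simplified sketch of the control
flow being described, as it stands before this series:

.Lerror_bad_iret:
	SWAPGS
	FENCE_SWAPGS_USER_ENTRY
	...
	call	fixup_bad_iret
	...
	/* pretend the exception came from user mode: */
	jmp	.Lerror_entry_from_usermode_after_swapgs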

>
> >         SWAPGS
> >         FENCE_SWAPGS_USER_ENTRY
> > -       /* We have user CR3.  Change to kernel CR3. */
> > -       SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
> > +       /*
> > +        * Switch to the thread stack. The IRET frame and orig_ax are
> > +        * on the stack, as well as the return address. RDI..R12 are
>
> Did you mean RDI..R15?

Good catch!

The whole code, along with the comments, was copied from the
original interrupt_entry. The "RDI..R12" in the comment has been
moved around several times and can be traced back to 7f2590a110b8
("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
(the commit you just referred to).

>
> > +        * not (yet) on the stack and space has not (yet) been
> > +        * allocated for them.
> > +        */
> > +       pushq   %rdx
> > +
> > +       /* Need to switch before accessing the thread stack. */
> > +       SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
> > +       movq    %rsp, %rdx
> > +       movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>
> Can we avoid spilling %rdx to the meltdown-readable entry stack here?
> We could do something similar to what entry_SYSCALL_64 does, roughly
> like this:

We must spill at least one register besides %rsp. I once used %rbp
as the scratch_reg in a not-yet-open-sourced project. I think
spilling %rbp is safer than spilling other registers, but I can't
find strong enough reasoning for it.

The feeble reason at hand is that %rbp is often used as the frame
pointer in userspace, which makes it safer to leak, given that
%rsp is already leaked.

See https://github.com/google/gvisor/pull/2256#discussion_r399005782
("implement KPTI for gvisor"), where I once recommended spilling
%rbp in gVisor. Even in that pull request, Google didn't show much
eagerness to hide the registers, along with the kernel, from
applications.

If you also consider spilling %rbp better than spilling %rdx, and
other x86 people want that too, I can do it in separate patches,
for two reasons (a sketch follows after them):

1) This patch uses "PUSH_AND_CLEAR_REGS rdx=(%rdx), save_ret=1",
   and using %rbp would require changing PUSH_AND_CLEAR_REGS,
   which is better done in a separate commit.
2) The code in entry_64.S spills other registers (%rdi, %rdx, etc.)
   to the entry stack. They would need to be changed to %rbp too.
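
To make that concrete, here is a minimal sketch of the %rbp variant.
Note that the rbp= argument to PUSH_AND_CLEAR_REGS is hypothetical:
adding it is exactly the change that point 1) says belongs in a
separate commit.

pushq	%rbp			/* only %rbp is spilled to the entry stack */

/* Need to switch before accessing the thread stack. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rbp
movq	%rsp, %rbp
movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp

pushq	7*8(%rbp)		/* regs->ss */
...				/* regs->rsp down to the return address, as in the patch */
PUSH_AND_CLEAR_REGS rbp=(%rbp), save_ret=1	/* hypothetical rbp= argument */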

>
>
> /*
>  * While there is an iret frame, it won't be easy to find for a
>  * few instructions, so let's pretend it doesn't exist.
>  */
> UNWIND_HINT_EMPTY
>
> /*
>  * Switch to kernel CR3 and stack. To avoid spilling secret
>  * userspace register state to the trampoline stack, we use
>  * RSP as scratch - we can reconstruct the old RSP afterwards
>  * using TSS_sp0.
>  */
> SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp

It is already in a fine state, in the proper place on the stack.

Abusing %rsp is the source of perdition. Linus, tglx, and many
others who have touched entry_64.S hate the SYSCALL instruction
for not automatically switching %rsp, and that is the source of
many hateful IST exceptions.

See https://lore.kernel.org/lkml/875z98jkof.fsf@nanos.tec.linutronix.de/

This is the only objection I would like to raise, since I also
hate abusing %rsp myself.

I expect you would get tonnes of replies if tglx ever noticed a
suggestion like this (see the annotated sequence below).
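
To make the objection concrete, here is your sequence again with the
hazard window annotated (the annotations are mine):

SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp	/* %rsp now holds a CR3 value, not a stack */
/*
 * Anything that hits us in this window (NMI, #MC) cannot trust
 * %rsp -- the same property of SYSCALL that forced the IST stacks
 * into existence.
 */
movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp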

> movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>
> pushq %rdx /* scratch, will be replaced with regs->ss later */
> mov PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rdx
> /*
>  * We have return address and orig_ax on the stack on top of the
>  * IRET frame; relative to %rdx (== TSS_sp0), the frame's RIP
>  * slot is at -5*8, hence offset=-5*8.
>  */
> UNWIND_HINT_IRET_REGS base=%rdx offset=-5*8
>
> pushq   -2*8(%rdx)               /* regs->rsp */
> pushq   -3*8(%rdx)               /* regs->eflags */
> pushq   -4*8(%rdx)               /* regs->cs */
> pushq   -5*8(%rdx)               /* regs->ip */
> pushq   -6*8(%rdx)               /* regs->orig_ax */
> pushq   -7*8(%rdx)               /* return address */
> UNWIND_HINT_FUNC
>
> PUSH_AND_CLEAR_REGS rdx=7*8(%rsp), save_ret=1
>
> /* copy regs->ss from trampoline stack */
> movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rax
> mov -1*8(%rax), %rax
> movq %rax, 21*8(%rsp)	/* regs->ss; pt_regs starts at 8(%rsp) */
>
> ENCODE_FRAME_POINTER 8
>
> ret


Thanks for your review and your recommendations.

Lai.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-09-15  7:56 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27  7:31 [PATCH 0/5] x86/entry: simply stack switching when exception on userspace Lai Jiangshan
2020-05-27  7:31 ` [PATCH 1/5] x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack Lai Jiangshan
2020-05-28 19:12   ` Thomas Gleixner
2020-05-29  8:26     ` [PATCH V2 0/4] x86/entry: simply stack switching when exception on userspace Lai Jiangshan
2020-05-29  8:26       ` [PATCH V2 1/4] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
2020-05-29  8:26       ` [PATCH V2 2/4] x86/entry: directly switch to kernel stack when .Lerror_bad_iret Lai Jiangshan
2020-05-29  8:26       ` [PATCH V2 3/4] x86/entry: remove unused sync_regs() Lai Jiangshan
2020-05-29  8:26       ` [PATCH V2 4/4] x86/entry: don't copy to tmp in fixup_bad_iret Lai Jiangshan
2020-05-29 18:32       ` [PATCH V2 0/4] x86/entry: simply stack switching when exception on userspace Andy Lutomirski
2020-06-16  1:56         ` Lai Jiangshan
2020-06-18 13:52           ` Lai Jiangshan
2020-06-18 14:10             ` Thomas Gleixner
2020-06-27 21:03               ` Andy Lutomirski
     [not found]                 ` <20200817062355.2884-1-jiangshanlai@gmail.com>
2020-08-17  5:31                   ` [PATCH V3 0/3] " Lai Jiangshan
2020-09-10 10:12                     ` Lai Jiangshan
2020-08-17  6:23                   ` [PATCH V3 1/3] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
2020-09-11 21:24                     ` Jann Horn
2020-09-15  7:55                       ` Lai Jiangshan
2020-08-17  6:23                   ` [PATCH V3 2/3] x86/entry: directly switch to kernel stack when .Lerror_bad_iret Lai Jiangshan
2020-08-17  6:23                   ` [PATCH V3 3/3] x86/entry: remove unused sync_regs() Lai Jiangshan
2020-05-27  7:31 ` [PATCH 2/5] x86/entry: avoid calling into sync_regs() when entering from userspace Lai Jiangshan
2020-05-27  7:31 ` [PATCH 3/5] x86/entry: directly switch to kernel stack when .Lerror_bad_iret Lai Jiangshan
2020-05-27  7:31 ` [PATCH 4/5] x86/entry: remove unused sync_regs() Lai Jiangshan
2020-05-27  7:31 ` [PATCH 5/5] x86/entry: don't copy to tmp in fixup_bad_iret Lai Jiangshan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).