All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/7] x86/entry: Clean up entry code
@ 2022-03-03  3:54 Lai Jiangshan
  2022-03-03  3:54 ` [PATCH V2 1/7] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03  3:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, Lai Jiangshan

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

This patchset moves the stack-switch code to the place where
error_entry() return, unravels error_entry() from XENpv and makes
entry_INT80_compat use idtentry macro.

This patchset is highly related to XENpv, because it does the extra
cleanup to convert SWAPGS to swapgs after major cleanup is done.

The patches are the version 2 of
https://lore.kernel.org/lkml/20211208110833.65366-1-jiangshanlai@gmail.com/
which are picked from the patchset
https://lore.kernel.org/lkml/20211126101209.8613-1-jiangshanlai@gmail.com/
which coverts ASM code to C code.  These patches are prepared for that
purpose.  But this patchset has it own value: it simplifies the stack
switch, avoids leaving the old stack inside a function call, and
separates XENpv code with native code without adding new code.

Changed from V1
	Squash cleanup patches converting SWAPGS to swapgs into one patch

	Use my official email address (Ant Group).  The work is backed
	by my company and I was incorrectly misunderstood that
	XXX@linux.alibaba.com is the only portal for opensource work
	in the corporate group.

PS:  I think the work is ready to be queued and I'm deeply sorry not to
resend it for more reviews for the last two months because I have been
obsessed with a boardgame and my brain power was drained.

Lai Jiangshan (7):
  x86/traps: Move pt_regs only in fixup_bad_iret()
  x86/entry: Switch the stack after error_entry() returns
  x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
  x86/entry: Move cld to the start of idtentry
  x86/entry: Don't call error_entry for XENPV
  x86/entry: Use idtentry macro for entry_INT80_compat
  x86/entry: Convert SWAPGS to swapgs and remove the definition of
    SWAPGS

 arch/x86/entry/entry_64.S        |  65 +++++++++++++------
 arch/x86/entry/entry_64_compat.S | 104 +------------------------------
 arch/x86/include/asm/idtentry.h  |  47 ++++++++++++++
 arch/x86/include/asm/irqflags.h  |   2 -
 arch/x86/include/asm/proto.h     |   4 --
 arch/x86/include/asm/traps.h     |   2 +-
 arch/x86/kernel/traps.c          |  17 ++---
 7 files changed, 102 insertions(+), 139 deletions(-)

-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 1/7] x86/traps: Move pt_regs only in fixup_bad_iret()
  2022-03-03  3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
@ 2022-03-03  3:54 ` Lai Jiangshan
  2022-03-03  3:54 ` [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03  3:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Peter Zijlstra, Fenghua Yu, Joerg Roedel, Chang S. Bae

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

fixup_bad_iret() and sync_regs() have similar arguments and do similar
work that copies full or partial pt_regs to a place and switches stack
after return.  They are quite the same, but fixup_bad_iret() not only
copies the pt_regs but also the return address of error_entry() while
sync_regs() copies the pt_regs only and the return address of
error_entry() was preserved and handled in ASM code.

This patch makes fixup_bad_iret() work like sync_regs() and the
handling of the return address of error_entry() is moved in ASM code.

It removes the need to use the struct bad_iret_stack, simplifies
fixup_bad_iret() and makes the ASM error_entry() call fixup_bad_iret()
as the same as calling sync_regs() which adds readability because
the calling patterns are exactly the same.

It is prepared for later patch to do the stack switch after the
error_entry() which simplifies the code further.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/entry/entry_64.S    |  5 ++++-
 arch/x86/include/asm/traps.h |  2 +-
 arch/x86/kernel/traps.c      | 17 ++++++-----------
 3 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 466df3e50276..24846284eda5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1039,9 +1039,12 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * Pretend that the exception came from user mode: set up pt_regs
 	 * as if we faulted immediately after IRET.
 	 */
-	mov	%rsp, %rdi
+	popq	%r12				/* save return addr in %12 */
+	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
 	call	fixup_bad_iret
 	mov	%rax, %rsp
+	ENCODE_FRAME_POINTER
+	pushq	%r12
 	jmp	.Lerror_entry_from_usermode_after_swapgs
 SYM_CODE_END(error_entry)
 
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 6221be7cafc3..1cdd7e8bcba7 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -13,7 +13,7 @@
 #ifdef CONFIG_X86_64
 asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
 asmlinkage __visible notrace
-struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
+struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
 void __init trap_init(void);
 asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
 #endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7ef00dee35be..2b1f049afb14 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -816,13 +816,8 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r
 }
 #endif
 
-struct bad_iret_stack {
-	void *error_entry_ret;
-	struct pt_regs regs;
-};
-
 asmlinkage __visible noinstr
-struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
+struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs)
 {
 	/*
 	 * This is called from entry_64.S early in handling a fault
@@ -832,19 +827,19 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 	 * just below the IRET frame) and we want to pretend that the
 	 * exception came from the IRET target.
 	 */
-	struct bad_iret_stack tmp, *new_stack =
-		(struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+	struct pt_regs tmp, *new_stack =
+		(struct pt_regs *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
 
 	/* Copy the IRET target to the temporary storage. */
-	__memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
+	__memcpy(&tmp.ip, (void *)bad_regs->sp, 5*8);
 
 	/* Copy the remainder of the stack from the current stack. */
-	__memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
+	__memcpy(&tmp, bad_regs, offsetof(struct pt_regs, ip));
 
 	/* Update the entry stack */
 	__memcpy(new_stack, &tmp, sizeof(tmp));
 
-	BUG_ON(!user_mode(&new_stack->regs));
+	BUG_ON(!user_mode(new_stack));
 	return new_stack;
 }
 #endif
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns
  2022-03-03  3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
  2022-03-03  3:54 ` [PATCH V2 1/7] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
@ 2022-03-03  3:54 ` Lai Jiangshan
  2022-03-03  8:00   ` Peter Zijlstra
  2022-03-03  3:54 ` [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03  3:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

error_entry() calls sync_regs() to settle/copy the pt_regs and switches
the stack directly after sync_regs().  But error_entry() itself is also
a function call, the switching has to handle the return address of it
together, which causes the work complicated and tangly.

Switching to the stack after error_entry() makes the code simpler and
intuitive.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/entry/entry_64.S | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 24846284eda5..a51cad2b7fff 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
 .macro idtentry_body cfunc has_error_code:req
 
 	call	error_entry
+	movq	%rax, %rsp			/* switch stack settled by sync_regs() */
+	ENCODE_FRAME_POINTER
 	UNWIND_HINT_REGS
 
 	movq	%rsp, %rdi			/* pt_regs pointer into 1st argument*/
@@ -980,14 +982,10 @@ SYM_CODE_START_LOCAL(error_entry)
 	/* We have user CR3.  Change to kernel CR3. */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 
+	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
 .Lerror_entry_from_usermode_after_swapgs:
 	/* Put us onto the real thread stack. */
-	popq	%r12				/* save return addr in %12 */
-	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
 	call	sync_regs
-	movq	%rax, %rsp			/* switch stack */
-	ENCODE_FRAME_POINTER
-	pushq	%r12
 	RET
 
 	/*
@@ -1019,6 +1017,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 */
 .Lerror_entry_done_lfence:
 	FENCE_SWAPGS_KERNEL_ENTRY
+	leaq	8(%rsp), %rax			/* return pt_regs pointer */
 	RET
 
 .Lbstep_iret:
@@ -1039,12 +1038,9 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * Pretend that the exception came from user mode: set up pt_regs
 	 * as if we faulted immediately after IRET.
 	 */
-	popq	%r12				/* save return addr in %12 */
-	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
+	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
 	call	fixup_bad_iret
-	mov	%rax, %rsp
-	ENCODE_FRAME_POINTER
-	pushq	%r12
+	mov	%rax, %rdi
 	jmp	.Lerror_entry_from_usermode_after_swapgs
 SYM_CODE_END(error_entry)
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
  2022-03-03  3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
  2022-03-03  3:54 ` [PATCH V2 1/7] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
  2022-03-03  3:54 ` [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
@ 2022-03-03  3:54 ` Lai Jiangshan
  2022-03-03  8:54   ` Peter Zijlstra
  2022-03-03  3:54 ` [PATCH V2 4/7] x86/entry: Move cld to the start of idtentry Lai Jiangshan
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03  3:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Moving PUSH_AND_CLEAR_REGS out of error_entry doesn't change any
functionality.  It will enlarge the size:

size arch/x86/entry/entry_64.o.before:
   text	   data	    bss	    dec	    hex	filename
  17916	    384	      0	  18300	   477c	arch/x86/entry/entry_64.o

size --format=SysV arch/x86/entry/entry_64.o.before:
.entry.text                      5528      0
.orc_unwind                      6456      0
.orc_unwind_ip                   4304      0

size arch/x86/entry/entry_64.o.after:
   text	   data	    bss	    dec	    hex	filename
  26868	    384	      0	  27252	   6a74	arch/x86/entry/entry_64.o

size --format=SysV arch/x86/entry/entry_64.o.after:
.entry.text                      8200      0
.orc_unwind                     10224      0
.orc_unwind_ip                   6816      0

But .entry.text in x86_64 is 2M aligned, enlarging it to 8.2k doesn't
enlarge the final text size.

The tables .orc_unwind[_ip] are enlarged due to it adds many pushes.

It is prepared for not calling error_entry() from XENPV in later patch
and for future converting the whole error_entry into C code.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/entry/entry_64.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a51cad2b7fff..3ca64bad4697 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -322,6 +322,9 @@ SYM_CODE_END(ret_from_fork)
  */
 .macro idtentry_body cfunc has_error_code:req
 
+	PUSH_AND_CLEAR_REGS
+	ENCODE_FRAME_POINTER
+
 	call	error_entry
 	movq	%rax, %rsp			/* switch stack settled by sync_regs() */
 	ENCODE_FRAME_POINTER
@@ -968,8 +971,6 @@ SYM_CODE_END(paranoid_exit)
 SYM_CODE_START_LOCAL(error_entry)
 	UNWIND_HINT_FUNC
 	cld
-	PUSH_AND_CLEAR_REGS save_ret=1
-	ENCODE_FRAME_POINTER 8
 	testb	$3, CS+8(%rsp)
 	jz	.Lerror_kernelspace
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 4/7] x86/entry: Move cld to the start of idtentry
  2022-03-03  3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
                   ` (2 preceding siblings ...)
  2022-03-03  3:54 ` [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
@ 2022-03-03  3:54 ` Lai Jiangshan
  2022-03-03  3:54 ` [PATCH V2 5/7] x86/entry: Don't call error_entry for XENPV Lai Jiangshan
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03  3:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Make it next to CLAC

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/entry/entry_64.S | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3ca64bad4697..630bf8164a09 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -356,6 +356,7 @@ SYM_CODE_END(ret_from_fork)
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 	ASM_CLAC
+	cld
 
 	.if \has_error_code == 0
 		pushq	$-1			/* ORIG_RAX: no syscall to restart */
@@ -423,6 +424,7 @@ SYM_CODE_END(\asmsym)
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS
 	ASM_CLAC
+	cld
 
 	pushq	$-1			/* ORIG_RAX: no syscall to restart */
 
@@ -478,6 +480,7 @@ SYM_CODE_END(\asmsym)
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS
 	ASM_CLAC
+	cld
 
 	/*
 	 * If the entry is from userspace, switch stacks and treat it as
@@ -539,6 +542,7 @@ SYM_CODE_END(\asmsym)
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS offset=8
 	ASM_CLAC
+	cld
 
 	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
 	call	paranoid_entry
@@ -852,7 +856,6 @@ SYM_CODE_END(xen_failsafe_callback)
  */
 SYM_CODE_START_LOCAL(paranoid_entry)
 	UNWIND_HINT_FUNC
-	cld
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
 
@@ -970,7 +973,6 @@ SYM_CODE_END(paranoid_exit)
  */
 SYM_CODE_START_LOCAL(error_entry)
 	UNWIND_HINT_FUNC
-	cld
 	testb	$3, CS+8(%rsp)
 	jz	.Lerror_kernelspace
 
@@ -1103,6 +1105,7 @@ SYM_CODE_START(asm_exc_nmi)
 	 */
 
 	ASM_CLAC
+	cld
 
 	/* Use %rdx as our temp variable throughout */
 	pushq	%rdx
@@ -1122,7 +1125,6 @@ SYM_CODE_START(asm_exc_nmi)
 	 */
 
 	swapgs
-	cld
 	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 5/7] x86/entry: Don't call error_entry for XENPV
  2022-03-03  3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
                   ` (3 preceding siblings ...)
  2022-03-03  3:54 ` [PATCH V2 4/7] x86/entry: Move cld to the start of idtentry Lai Jiangshan
@ 2022-03-03  3:54 ` Lai Jiangshan
  2022-03-03  3:54 ` [PATCH V2 6/7] x86/entry: Use idtentry macro for entry_INT80_compat Lai Jiangshan
  2022-03-03  3:54 ` [PATCH V2 7/7] x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS Lai Jiangshan
  6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03  3:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, xen-devel, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

When in XENPV, it is already in the task stack, and it can't fault
for native_iret() nor native_load_gs_index() since XENPV uses its own
pvops for iret and load_gs_index().  And it doesn't need to switch CR3.
So there is no reason to call error_entry() in XENPV.

Cc: xen-devel@lists.xenproject.org
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/entry/entry_64.S | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 630bf8164a09..adc9f7619d1b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -325,8 +325,17 @@ SYM_CODE_END(ret_from_fork)
 	PUSH_AND_CLEAR_REGS
 	ENCODE_FRAME_POINTER
 
-	call	error_entry
-	movq	%rax, %rsp			/* switch stack settled by sync_regs() */
+	/*
+	 * Call error_entry and switch stack settled by sync_regs().
+	 *
+	 * When in XENPV, it is already in the task stack, and it can't fault
+	 * for native_iret() nor native_load_gs_index() since XENPV uses its
+	 * own pvops for iret and load_gs_index().  And it doesn't need to
+	 * switch CR3.  So it can skip invoking error_entry().
+	 */
+	ALTERNATIVE "call error_entry; movq %rax, %rsp", \
+		"", X86_FEATURE_XENPV
+
 	ENCODE_FRAME_POINTER
 	UNWIND_HINT_REGS
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 6/7] x86/entry: Use idtentry macro for entry_INT80_compat
  2022-03-03  3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
                   ` (4 preceding siblings ...)
  2022-03-03  3:54 ` [PATCH V2 5/7] x86/entry: Don't call error_entry for XENPV Lai Jiangshan
@ 2022-03-03  3:54 ` Lai Jiangshan
  2022-03-03  3:54 ` [PATCH V2 7/7] x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS Lai Jiangshan
  6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03  3:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Joerg Roedel, Chang S. Bae, Jan Kiszka

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

entry_INT80_compat is identical to idtentry macro except a special
handling for %rax in the prolog.

Add the prolog to idtentry and use idtentry for entry_INT80_compat.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/entry/entry_64.S        |  18 ++++++
 arch/x86/entry/entry_64_compat.S | 102 -------------------------------
 arch/x86/include/asm/idtentry.h  |  47 ++++++++++++++
 arch/x86/include/asm/proto.h     |   4 --
 4 files changed, 65 insertions(+), 106 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index adc9f7619d1b..88b61f310289 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -371,6 +371,24 @@ SYM_CODE_START(\asmsym)
 		pushq	$-1			/* ORIG_RAX: no syscall to restart */
 	.endif
 
+	.if \vector == IA32_SYSCALL_VECTOR
+		/*
+		 * User tracing code (ptrace or signal handlers) might assume
+		 * that the saved RAX contains a 32-bit number when we're
+		 * invoking a 32-bit syscall.  Just in case the high bits are
+		 * nonzero, zero-extend the syscall number.  (This could almost
+		 * certainly be deleted with no ill effects.)
+		 */
+		movl	%eax, %eax
+
+		/*
+		 * do_int80_syscall_32() expects regs->orig_ax to be user ax,
+		 * and regs->ax to be $-ENOSYS.
+		 */
+		movq	%rax, (%rsp)
+		movq	$-ENOSYS, %rax
+	.endif
+
 	.if \vector == X86_TRAP_BP
 		/*
 		 * If coming from kernel space, create a 6-word gap to allow the
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 0051cf5c792d..a4fcea0cab14 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -311,105 +311,3 @@ sysret32_from_system_call:
 	swapgs
 	sysretl
 SYM_CODE_END(entry_SYSCALL_compat)
-
-/*
- * 32-bit legacy system call entry.
- *
- * 32-bit x86 Linux system calls traditionally used the INT $0x80
- * instruction.  INT $0x80 lands here.
- *
- * This entry point can be used by 32-bit and 64-bit programs to perform
- * 32-bit system calls.  Instances of INT $0x80 can be found inline in
- * various programs and libraries.  It is also used by the vDSO's
- * __kernel_vsyscall fallback for hardware that doesn't support a faster
- * entry method.  Restarted 32-bit system calls also fall back to INT
- * $0x80 regardless of what instruction was originally used to do the
- * system call.
- *
- * This is considered a slow path.  It is not used by most libc
- * implementations on modern hardware except during process startup.
- *
- * Arguments:
- * eax  system call number
- * ebx  arg1
- * ecx  arg2
- * edx  arg3
- * esi  arg4
- * edi  arg5
- * ebp  arg6
- */
-SYM_CODE_START(entry_INT80_compat)
-	UNWIND_HINT_EMPTY
-	/*
-	 * Interrupts are off on entry.
-	 */
-	ASM_CLAC			/* Do this early to minimize exposure */
-	SWAPGS
-
-	/*
-	 * User tracing code (ptrace or signal handlers) might assume that
-	 * the saved RAX contains a 32-bit number when we're invoking a 32-bit
-	 * syscall.  Just in case the high bits are nonzero, zero-extend
-	 * the syscall number.  (This could almost certainly be deleted
-	 * with no ill effects.)
-	 */
-	movl	%eax, %eax
-
-	/* switch to thread stack expects orig_ax and rdi to be pushed */
-	pushq	%rax			/* pt_regs->orig_ax */
-	pushq	%rdi			/* pt_regs->di */
-
-	/* Need to switch before accessing the thread stack. */
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
-
-	/* In the Xen PV case we already run on the thread stack. */
-	ALTERNATIVE "", "jmp .Lint80_keep_stack", X86_FEATURE_XENPV
-
-	movq	%rsp, %rdi
-	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
-
-	pushq	6*8(%rdi)		/* regs->ss */
-	pushq	5*8(%rdi)		/* regs->rsp */
-	pushq	4*8(%rdi)		/* regs->eflags */
-	pushq	3*8(%rdi)		/* regs->cs */
-	pushq	2*8(%rdi)		/* regs->ip */
-	pushq	1*8(%rdi)		/* regs->orig_ax */
-	pushq	(%rdi)			/* pt_regs->di */
-.Lint80_keep_stack:
-
-	pushq	%rsi			/* pt_regs->si */
-	xorl	%esi, %esi		/* nospec   si */
-	pushq	%rdx			/* pt_regs->dx */
-	xorl	%edx, %edx		/* nospec   dx */
-	pushq	%rcx			/* pt_regs->cx */
-	xorl	%ecx, %ecx		/* nospec   cx */
-	pushq	$-ENOSYS		/* pt_regs->ax */
-	pushq   %r8			/* pt_regs->r8 */
-	xorl	%r8d, %r8d		/* nospec   r8 */
-	pushq   %r9			/* pt_regs->r9 */
-	xorl	%r9d, %r9d		/* nospec   r9 */
-	pushq   %r10			/* pt_regs->r10*/
-	xorl	%r10d, %r10d		/* nospec   r10 */
-	pushq   %r11			/* pt_regs->r11 */
-	xorl	%r11d, %r11d		/* nospec   r11 */
-	pushq   %rbx                    /* pt_regs->rbx */
-	xorl	%ebx, %ebx		/* nospec   rbx */
-	pushq   %rbp                    /* pt_regs->rbp */
-	xorl	%ebp, %ebp		/* nospec   rbp */
-	pushq   %r12                    /* pt_regs->r12 */
-	xorl	%r12d, %r12d		/* nospec   r12 */
-	pushq   %r13                    /* pt_regs->r13 */
-	xorl	%r13d, %r13d		/* nospec   r13 */
-	pushq   %r14                    /* pt_regs->r14 */
-	xorl	%r14d, %r14d		/* nospec   r14 */
-	pushq   %r15                    /* pt_regs->r15 */
-	xorl	%r15d, %r15d		/* nospec   r15 */
-
-	UNWIND_HINT_REGS
-
-	cld
-
-	movq	%rsp, %rdi
-	call	do_int80_syscall_32
-	jmp	swapgs_restore_regs_and_return_to_usermode
-SYM_CODE_END(entry_INT80_compat)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..38cb2e0dc2c7 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -204,6 +204,20 @@ __visible noinstr void func(struct pt_regs *regs,			\
 									\
 static noinline void __##func(struct pt_regs *regs, u32 vector)
 
+/**
+ * DECLARE_IDTENTRY_IA32_EMULATION - Declare functions for int80
+ * @vector:	Vector number (ignored for C)
+ * @asm_func:	Function name of the entry point
+ * @cfunc:	The C handler called from the ASM entry point (ignored for C)
+ *
+ * Declares two functions:
+ * - The ASM entry point: asm_func
+ * - The XEN PV trap entry point: xen_##asm_func (maybe unused)
+ */
+#define DECLARE_IDTENTRY_IA32_EMULATION(vector, asm_func, cfunc)	\
+	asmlinkage void asm_func(void);					\
+	asmlinkage void xen_##asm_func(void)
+
 /**
  * DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
  * @vector:	Vector number (ignored for C)
@@ -430,6 +444,35 @@ __visible noinstr void func(struct pt_regs *regs,			\
 #define DECLARE_IDTENTRY_ERRORCODE(vector, func)			\
 	idtentry vector asm_##func func has_error_code=1
 
+/*
+ * 32-bit legacy system call entry.
+ *
+ * 32-bit x86 Linux system calls traditionally used the INT $0x80
+ * instruction.  INT $0x80 lands here.
+ *
+ * This entry point can be used by 32-bit and 64-bit programs to perform
+ * 32-bit system calls.  Instances of INT $0x80 can be found inline in
+ * various programs and libraries.  It is also used by the vDSO's
+ * __kernel_vsyscall fallback for hardware that doesn't support a faster
+ * entry method.  Restarted 32-bit system calls also fall back to INT
+ * $0x80 regardless of what instruction was originally used to do the
+ * system call.
+ *
+ * This is considered a slow path.  It is not used by most libc
+ * implementations on modern hardware except during process startup.
+ *
+ * Arguments:
+ * eax  system call number
+ * ebx  arg1
+ * ecx  arg2
+ * edx  arg3
+ * esi  arg4
+ * edi  arg5
+ * ebp  arg6
+ */
+#define DECLARE_IDTENTRY_IA32_EMULATION(vector, asm_func, cfunc)	\
+	idtentry vector asm_func cfunc has_error_code=0
+
 /* Special case for 32bit IRET 'trap'. Do not emit ASM code */
 #define DECLARE_IDTENTRY_SW(vector, func)
 
@@ -631,6 +674,10 @@ DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER,	common_interrupt);
 DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER,	spurious_interrupt);
 #endif
 
+#ifdef CONFIG_IA32_EMULATION
+DECLARE_IDTENTRY_IA32_EMULATION(IA32_SYSCALL_VECTOR,	entry_INT80_compat, do_int80_syscall_32);
+#endif
+
 /* System vector entry points */
 #ifdef CONFIG_X86_LOCAL_APIC
 DECLARE_IDTENTRY_SYSVEC(ERROR_APIC_VECTOR,		sysvec_error_interrupt);
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index feed36d44d04..c4d331fe65ff 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -28,10 +28,6 @@ void entry_SYSENTER_compat(void);
 void __end_entry_SYSENTER_compat(void);
 void entry_SYSCALL_compat(void);
 void entry_SYSCALL_compat_safe_stack(void);
-void entry_INT80_compat(void);
-#ifdef CONFIG_XEN_PV
-void xen_entry_INT80_compat(void);
-#endif
 #endif
 
 void x86_configure_nx(void);
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 7/7] x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS
  2022-03-03  3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
                   ` (5 preceding siblings ...)
  2022-03-03  3:54 ` [PATCH V2 6/7] x86/entry: Use idtentry macro for entry_INT80_compat Lai Jiangshan
@ 2022-03-03  3:54 ` Lai Jiangshan
  6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03  3:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Lai Jiangshan, xen-devel, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Juergen Gross, Peter Zijlstra (Intel),
	Kirill A. Shutemov

From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

XENPV doesn't use swapgs_restore_regs_and_return_to_usermode(),
error_entry() and entry_SYSENTER_compat(), so the PV-awared SWAPGS in
them can be changed to swapgs.  There is no user of the SWAPGS anymore
after this change.

The INTERRUPT_RETURN in swapgs_restore_regs_and_return_to_usermode()
is also converted.

Cc: xen-devel@lists.xenproject.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/entry/entry_64.S        | 10 +++++-----
 arch/x86/entry/entry_64_compat.S |  2 +-
 arch/x86/include/asm/irqflags.h  |  2 --
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 88b61f310289..d9c885400034 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -644,8 +644,8 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
 
 	/* Restore RDI. */
 	popq	%rdi
-	SWAPGS
-	INTERRUPT_RETURN
+	swapgs
+	jmp	native_iret
 
 
 SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
@@ -1007,7 +1007,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * We entered from user mode or we're pretending to have entered
 	 * from user mode due to an IRET fault.
 	 */
-	SWAPGS
+	swapgs
 	FENCE_SWAPGS_USER_ENTRY
 	/* We have user CR3.  Change to kernel CR3. */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
@@ -1039,7 +1039,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * gsbase and proceed.  We'll fix up the exception and land in
 	 * .Lgs_change's error handler with kernel gsbase.
 	 */
-	SWAPGS
+	swapgs
 
 	/*
 	 * Issue an LFENCE to prevent GS speculation, regardless of whether it is a
@@ -1060,7 +1060,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * We came from an IRET to user mode, so we have user
 	 * gsbase and CR3.  Switch to kernel gsbase and CR3:
 	 */
-	SWAPGS
+	swapgs
 	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index a4fcea0cab14..72e017c3941f 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,7 +49,7 @@
 SYM_CODE_START(entry_SYSENTER_compat)
 	UNWIND_HINT_EMPTY
 	/* Interrupts are off on entry. */
-	SWAPGS
+	swapgs
 
 	pushq	%rax
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 87761396e8cc..ac2e4cc47210 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -140,13 +140,11 @@ static __always_inline void arch_local_irq_restore(unsigned long flags)
 #else
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_XEN_PV
-#define SWAPGS	ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
 #define INTERRUPT_RETURN						\
 	ANNOTATE_RETPOLINE_SAFE;					\
 	ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);",		\
 		X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;")
 #else
-#define SWAPGS	swapgs
 #define INTERRUPT_RETURN	jmp native_iret
 #endif
 #endif
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns
  2022-03-03  3:54 ` [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
@ 2022-03-03  8:00   ` Peter Zijlstra
  2022-03-07 14:39     ` Lai Jiangshan
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2022-03-03  8:00 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, x86, Lai Jiangshan, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On Thu, Mar 03, 2022 at 11:54:29AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> error_entry() calls sync_regs() to settle/copy the pt_regs and switches
> the stack directly after sync_regs().  But error_entry() itself is also
> a function call, the switching has to handle the return address of it
> together, which causes the work complicated and tangly.
> 
> Switching to the stack after error_entry() makes the code simpler and
> intuitive.
> 
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/entry/entry_64.S | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 24846284eda5..a51cad2b7fff 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
>  .macro idtentry_body cfunc has_error_code:req
>  
>  	call	error_entry
> +	movq	%rax, %rsp			/* switch stack settled by sync_regs() */
> +	ENCODE_FRAME_POINTER
>  	UNWIND_HINT_REGS
>  
>  	movq	%rsp, %rdi			/* pt_regs pointer into 1st argument*/
> @@ -980,14 +982,10 @@ SYM_CODE_START_LOCAL(error_entry)
>  	/* We have user CR3.  Change to kernel CR3. */
>  	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
>  
> +	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
>  .Lerror_entry_from_usermode_after_swapgs:
>  	/* Put us onto the real thread stack. */
> -	popq	%r12				/* save return addr in %12 */
> -	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
>  	call	sync_regs
> -	movq	%rax, %rsp			/* switch stack */
> -	ENCODE_FRAME_POINTER
> -	pushq	%r12
>  	RET
>  
>  	/*
> @@ -1019,6 +1017,7 @@ SYM_CODE_START_LOCAL(error_entry)
>  	 */
>  .Lerror_entry_done_lfence:
>  	FENCE_SWAPGS_KERNEL_ENTRY
> +	leaq	8(%rsp), %rax			/* return pt_regs pointer */
>  	RET
>  
>  .Lbstep_iret:
> @@ -1039,12 +1038,9 @@ SYM_CODE_START_LOCAL(error_entry)
>  	 * Pretend that the exception came from user mode: set up pt_regs
>  	 * as if we faulted immediately after IRET.
>  	 */
> -	popq	%r12				/* save return addr in %12 */
> -	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
> +	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
>  	call	fixup_bad_iret
> -	mov	%rax, %rsp
> -	ENCODE_FRAME_POINTER
> -	pushq	%r12
> +	mov	%rax, %rdi
>  	jmp	.Lerror_entry_from_usermode_after_swapgs
>  SYM_CODE_END(error_entry)

I can't seem to follow; perhaps I need more wake-up-juice?

Suppose we have .Lerror_bad_iret, then the code flow is something like:


	old						new

.Lerror_bad_iret
				SWAWPGS business

	popq	%r12					leaq	8(%rsp), %rsi
	movq	%rsp, %rdi
	call	fixup_bad_iret				call	fixup_bad_iret
	mov	%rax, %rsp				mov	%rax, %rdi
	ENCODE_FRAME_POINTER
	pushq	%r12

		jmp	.Lerror_entry_from_usermode_after_swapgs

.Lerror_entry_from_usermode_after_swapgs
	pop	%r12
	movq	%rsp, %rdi
	call	sync_regs				call	sync_regs
	mov	%rax, %rsp
	ENCODE_FRAME_POINTER
	push	%r12
	RET						RET


The thing that appears to me is that syn_regs is called one stack switch
short. This isn't mentioned in the Changelog nor in the comments. Why is
this OK?





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
  2022-03-03  3:54 ` [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
@ 2022-03-03  8:54   ` Peter Zijlstra
  2022-03-07 14:53     ` Lai Jiangshan
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2022-03-03  8:54 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, x86, Lai Jiangshan, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On Thu, Mar 03, 2022 at 11:54:30AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> Moving PUSH_AND_CLEAR_REGS out of error_entry doesn't change any
> functionality.  It will enlarge the size:
> 
> size arch/x86/entry/entry_64.o.before:
>    text	   data	    bss	    dec	    hex	filename
>   17916	    384	      0	  18300	   477c	arch/x86/entry/entry_64.o
> 
> size --format=SysV arch/x86/entry/entry_64.o.before:
> .entry.text                      5528      0
> .orc_unwind                      6456      0
> .orc_unwind_ip                   4304      0
> 
> size arch/x86/entry/entry_64.o.after:
>    text	   data	    bss	    dec	    hex	filename
>   26868	    384	      0	  27252	   6a74	arch/x86/entry/entry_64.o
> 
> size --format=SysV arch/x86/entry/entry_64.o.after:
> .entry.text                      8200      0
> .orc_unwind                     10224      0
> .orc_unwind_ip                   6816      0
> 
> But .entry.text in x86_64 is 2M aligned, enlarging it to 8.2k doesn't
> enlarge the final text size.
> 
> The tables .orc_unwind[_ip] are enlarged due to it adds many pushes.
> 
> It is prepared for not calling error_entry() from XENPV in later patch
> and for future converting the whole error_entry into C code.

And also, IIUC, folding that int80 thing a few patches down, right?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns
  2022-03-03  8:00   ` Peter Zijlstra
@ 2022-03-07 14:39     ` Lai Jiangshan
  0 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-07 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, X86 ML, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

On Thu, Mar 3, 2022 at 4:00 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 03, 2022 at 11:54:29AM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> >
> > error_entry() calls sync_regs() to settle/copy the pt_regs and switches
> > the stack directly after sync_regs().  But error_entry() itself is also
> > a function call, the switching has to handle the return address of it
> > together, which causes the work complicated and tangly.
> >
> > Switching to the stack after error_entry() makes the code simpler and
> > intuitive.
> >
> > Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> > ---
> >  arch/x86/entry/entry_64.S | 16 ++++++----------
> >  1 file changed, 6 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 24846284eda5..a51cad2b7fff 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
> >  .macro idtentry_body cfunc has_error_code:req
> >
> >       call    error_entry
> > +     movq    %rax, %rsp                      /* switch stack settled by sync_regs() */
> > +     ENCODE_FRAME_POINTER
> >       UNWIND_HINT_REGS
> >
> >       movq    %rsp, %rdi                      /* pt_regs pointer into 1st argument*/
> > @@ -980,14 +982,10 @@ SYM_CODE_START_LOCAL(error_entry)
> >       /* We have user CR3.  Change to kernel CR3. */
> >       SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
> >
> > +     leaq    8(%rsp), %rdi                   /* arg0 = pt_regs pointer */
> >  .Lerror_entry_from_usermode_after_swapgs:
> >       /* Put us onto the real thread stack. */
> > -     popq    %r12                            /* save return addr in %12 */
> > -     movq    %rsp, %rdi                      /* arg0 = pt_regs pointer */
> >       call    sync_regs
> > -     movq    %rax, %rsp                      /* switch stack */
> > -     ENCODE_FRAME_POINTER
> > -     pushq   %r12
> >       RET
> >
> >       /*
> > @@ -1019,6 +1017,7 @@ SYM_CODE_START_LOCAL(error_entry)
> >        */
> >  .Lerror_entry_done_lfence:
> >       FENCE_SWAPGS_KERNEL_ENTRY
> > +     leaq    8(%rsp), %rax                   /* return pt_regs pointer */
> >       RET
> >
> >  .Lbstep_iret:
> > @@ -1039,12 +1038,9 @@ SYM_CODE_START_LOCAL(error_entry)
> >        * Pretend that the exception came from user mode: set up pt_regs
> >        * as if we faulted immediately after IRET.
> >        */
> > -     popq    %r12                            /* save return addr in %12 */
> > -     movq    %rsp, %rdi                      /* arg0 = pt_regs pointer */
> > +     leaq    8(%rsp), %rdi                   /* arg0 = pt_regs pointer */
> >       call    fixup_bad_iret
> > -     mov     %rax, %rsp
> > -     ENCODE_FRAME_POINTER
> > -     pushq   %r12
> > +     mov     %rax, %rdi
> >       jmp     .Lerror_entry_from_usermode_after_swapgs
> >  SYM_CODE_END(error_entry)
>
> I can't seem to follow; perhaps I need more wake-up-juice?
>
> Suppose we have .Lerror_bad_iret, then the code flow is something like:
>
>
>         old                                             new
>
> .Lerror_bad_iret
>                                 SWAWPGS business
>
>         popq    %r12                                    leaq    8(%rsp), %rsi
>         movq    %rsp, %rdi
>         call    fixup_bad_iret                          call    fixup_bad_iret
>         mov     %rax, %rsp                              mov     %rax, %rdi
>         ENCODE_FRAME_POINTER
>         pushq   %r12
>
>                 jmp     .Lerror_entry_from_usermode_after_swapgs
>
> .Lerror_entry_from_usermode_after_swapgs
>         pop     %r12
>         movq    %rsp, %rdi
>         call    sync_regs                               call    sync_regs
>         mov     %rax, %rsp
>         ENCODE_FRAME_POINTER
>         push    %r12
>         RET                                             RET
>
>
> The thing that appears to me is that syn_regs is called one stack switch
> short. This isn't mentioned in the Changelog nor in the comments. Why is
> this OK?
>
>

I has not been confident enough with the meaning of "short" or
"call short" in my understanding of English.

So I tried hard to find the semantic difference between the code
before and after the patch.

The logic are the same:
  1) feed fixup_bad_iret() with the pt_regs pushed by ASM code
  2) fixup_bad_iret() moves the partial pt_regs up
  3) feed sync_regs() with the pt_regs returned by fixup_bad_iret()
  4) sync_regs() copies the whole pt_regs to kernel stack if needed
  5) after error_entry(), it is in kernel stack with the pt_regs

Differences:
  Old code switches to copied pt_regs immediately twice in
  error_entry() while new code switches to the copied pt_regs only
  once after error_entry() returns.
  It is correct since sync_regs() doesn't need to be called close
  to the pt_regs it handles.  And this is the point of this
  patch which switches stack once only.

  Old code stashes the return-address of error_entry() in a scratch
  register and new code doesn't stash it.
  It relies on the fact that fixup_bad_iret() and sync_regs() don't
  corrupt the return-address of error_entry().  But even the old code
  also relies on the fact that fixup_bad_iret() and sync_regs() don't
  corrupt the return-address of themselves.  They are the same reliance.

The code had been tested with my additial tracing code in kernel and
the x86 selftest code which includes all kinds of cases for bad iret.
(tools/testing/selftests/x86/sigreturn.c)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
  2022-03-03  8:54   ` Peter Zijlstra
@ 2022-03-07 14:53     ` Lai Jiangshan
  0 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-07 14:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, X86 ML, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

On Thu, Mar 3, 2022 at 4:55 PM Peter Zijlstra <peterz@infradead.org> wrote:

>
> And also, IIUC, folding that int80 thing a few patches down, right?

I don't think folding that int80 thing into idtentry requires this patch.

I didn't notice int80 is a huge ASM code in a different file until I finished
converting the entry code to C code.  So folding int80 thing is a patch later.

It is possible to move this patch of folding int80 thing as the first patch,
but several patches need to be updated respectively.  But I'm afraid of
updating ASM code without good reason which requires several days of
debugging even if there is a slight mistake, for example, forgetting to
update the offset of the stack or register name when switching two lines
of code.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-03-07 14:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-03  3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
2022-03-03  3:54 ` [PATCH V2 1/7] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
2022-03-03  3:54 ` [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
2022-03-03  8:00   ` Peter Zijlstra
2022-03-07 14:39     ` Lai Jiangshan
2022-03-03  3:54 ` [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
2022-03-03  8:54   ` Peter Zijlstra
2022-03-07 14:53     ` Lai Jiangshan
2022-03-03  3:54 ` [PATCH V2 4/7] x86/entry: Move cld to the start of idtentry Lai Jiangshan
2022-03-03  3:54 ` [PATCH V2 5/7] x86/entry: Don't call error_entry for XENPV Lai Jiangshan
2022-03-03  3:54 ` [PATCH V2 6/7] x86/entry: Use idtentry macro for entry_INT80_compat Lai Jiangshan
2022-03-03  3:54 ` [PATCH V2 7/7] x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS Lai Jiangshan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.