* [PATCH V2 0/7] x86/entry: Clean up entry code
@ 2022-03-03 3:54 Lai Jiangshan
2022-03-03 3:54 ` [PATCH V2 1/7] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
` (6 more replies)
0 siblings, 7 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03 3:54 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, Lai Jiangshan
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
This patchset moves the stack-switch code to the place where
error_entry() returns, untangles error_entry() from XENpv, and makes
entry_INT80_compat use the idtentry macro.
This patchset is closely related to XENpv, because it also does the
extra cleanup of converting SWAPGS to swapgs once the major cleanup
is done.
The patches are version 2 of
https://lore.kernel.org/lkml/20211208110833.65366-1-jiangshanlai@gmail.com/
which were picked from the patchset
https://lore.kernel.org/lkml/20211126101209.8613-1-jiangshanlai@gmail.com/
which converts ASM code to C code. These patches prepare for that
conversion, but the patchset has value on its own: it simplifies the
stack switch, avoids leaving the old stack inside a function call, and
separates XENpv code from native code without adding new code.
Changes from V1:
Squashed the cleanup patches converting SWAPGS to swapgs into one patch.
Used my official email address (Ant Group). The work is backed
by my company; I had incorrectly assumed that
XXX@linux.alibaba.com was the only portal for opensource work
in the corporate group.
PS: I think the work is ready to be queued, and I'm deeply sorry for not
resending it for more reviews over the last two months; I have been
obsessed with a boardgame and my brain power was drained.
Lai Jiangshan (7):
x86/traps: Move pt_regs only in fixup_bad_iret()
x86/entry: Switch the stack after error_entry() returns
x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
x86/entry: Move cld to the start of idtentry
x86/entry: Don't call error_entry for XENPV
x86/entry: Use idtentry macro for entry_INT80_compat
x86/entry: Convert SWAPGS to swapgs and remove the definition of
SWAPGS
arch/x86/entry/entry_64.S | 65 +++++++++++++------
arch/x86/entry/entry_64_compat.S | 104 +------------------------------
arch/x86/include/asm/idtentry.h | 47 ++++++++++++++
arch/x86/include/asm/irqflags.h | 2 -
arch/x86/include/asm/proto.h | 4 --
arch/x86/include/asm/traps.h | 2 +-
arch/x86/kernel/traps.c | 17 ++---
7 files changed, 102 insertions(+), 139 deletions(-)
--
2.19.1.6.gb485710b
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH V2 1/7] x86/traps: Move pt_regs only in fixup_bad_iret()
2022-03-03 3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
@ 2022-03-03 3:54 ` Lai Jiangshan
2022-03-03 3:54 ` [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
` (5 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03 3:54 UTC (permalink / raw)
To: linux-kernel
Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Peter Zijlstra, Fenghua Yu, Joerg Roedel, Chang S. Bae
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
fixup_bad_iret() and sync_regs() take similar arguments and do similar
work: both copy a full or partial pt_regs to another place so that the
stack can be switched after return. They are almost the same, except
that fixup_bad_iret() copies not only the pt_regs but also the return
address of error_entry(), while sync_regs() copies only the pt_regs
and the return address of error_entry() is preserved and handled in
ASM code.
Make fixup_bad_iret() work like sync_regs() and move the handling of
error_entry()'s return address into ASM code. This removes the need
for struct bad_iret_stack, simplifies fixup_bad_iret(), and makes the
ASM error_entry() call fixup_bad_iret() the same way it calls
sync_regs(), which improves readability because the calling patterns
become exactly the same.
This prepares for a later patch that does the stack switch after
error_entry() returns, which simplifies the code further.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
arch/x86/entry/entry_64.S | 5 ++++-
arch/x86/include/asm/traps.h | 2 +-
arch/x86/kernel/traps.c | 17 ++++++-----------
3 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 466df3e50276..24846284eda5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1039,9 +1039,12 @@ SYM_CODE_START_LOCAL(error_entry)
* Pretend that the exception came from user mode: set up pt_regs
* as if we faulted immediately after IRET.
*/
- mov %rsp, %rdi
+ popq %r12 /* save return addr in %r12 */
+ movq %rsp, %rdi /* arg0 = pt_regs pointer */
call fixup_bad_iret
mov %rax, %rsp
+ ENCODE_FRAME_POINTER
+ pushq %r12
jmp .Lerror_entry_from_usermode_after_swapgs
SYM_CODE_END(error_entry)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 6221be7cafc3..1cdd7e8bcba7 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -13,7 +13,7 @@
#ifdef CONFIG_X86_64
asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
asmlinkage __visible notrace
-struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
+struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
void __init trap_init(void);
asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
#endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7ef00dee35be..2b1f049afb14 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -816,13 +816,8 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r
}
#endif
-struct bad_iret_stack {
- void *error_entry_ret;
- struct pt_regs regs;
-};
-
asmlinkage __visible noinstr
-struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
+struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs)
{
/*
* This is called from entry_64.S early in handling a fault
@@ -832,19 +827,19 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
* just below the IRET frame) and we want to pretend that the
* exception came from the IRET target.
*/
- struct bad_iret_stack tmp, *new_stack =
- (struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+ struct pt_regs tmp, *new_stack =
+ (struct pt_regs *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
/* Copy the IRET target to the temporary storage. */
- __memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
+ __memcpy(&tmp.ip, (void *)bad_regs->sp, 5*8);
/* Copy the remainder of the stack from the current stack. */
- __memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
+ __memcpy(&tmp, bad_regs, offsetof(struct pt_regs, ip));
/* Update the entry stack */
__memcpy(new_stack, &tmp, sizeof(tmp));
- BUG_ON(!user_mode(&new_stack->regs));
+ BUG_ON(!user_mode(new_stack));
return new_stack;
}
#endif
--
2.19.1.6.gb485710b
* [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns
2022-03-03 3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
2022-03-03 3:54 ` [PATCH V2 1/7] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
@ 2022-03-03 3:54 ` Lai Jiangshan
2022-03-03 8:00 ` Peter Zijlstra
2022-03-03 3:54 ` [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
` (4 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03 3:54 UTC (permalink / raw)
To: linux-kernel
Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
error_entry() calls sync_regs() to settle/copy the pt_regs and switches
the stack directly after sync_regs() returns. But error_entry() itself
is also a function call, so the switch has to handle its return address
as well, which makes the work complicated and tangled.
Switching stacks after error_entry() returns makes the code simpler and
more intuitive.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
arch/x86/entry/entry_64.S | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 24846284eda5..a51cad2b7fff 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
.macro idtentry_body cfunc has_error_code:req
call error_entry
+ movq %rax, %rsp /* switch stack settled by sync_regs() */
+ ENCODE_FRAME_POINTER
UNWIND_HINT_REGS
movq %rsp, %rdi /* pt_regs pointer into 1st argument*/
@@ -980,14 +982,10 @@ SYM_CODE_START_LOCAL(error_entry)
/* We have user CR3. Change to kernel CR3. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
.Lerror_entry_from_usermode_after_swapgs:
/* Put us onto the real thread stack. */
- popq %r12 /* save return addr in %r12 */
- movq %rsp, %rdi /* arg0 = pt_regs pointer */
call sync_regs
- movq %rax, %rsp /* switch stack */
- ENCODE_FRAME_POINTER
- pushq %r12
RET
/*
@@ -1019,6 +1017,7 @@ SYM_CODE_START_LOCAL(error_entry)
*/
.Lerror_entry_done_lfence:
FENCE_SWAPGS_KERNEL_ENTRY
+ leaq 8(%rsp), %rax /* return pt_regs pointer */
RET
.Lbstep_iret:
@@ -1039,12 +1038,9 @@ SYM_CODE_START_LOCAL(error_entry)
* Pretend that the exception came from user mode: set up pt_regs
* as if we faulted immediately after IRET.
*/
- popq %r12 /* save return addr in %r12 */
- movq %rsp, %rdi /* arg0 = pt_regs pointer */
+ leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
call fixup_bad_iret
- mov %rax, %rsp
- ENCODE_FRAME_POINTER
- pushq %r12
+ mov %rax, %rdi
jmp .Lerror_entry_from_usermode_after_swapgs
SYM_CODE_END(error_entry)
--
2.19.1.6.gb485710b
* [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
2022-03-03 3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
2022-03-03 3:54 ` [PATCH V2 1/7] x86/traps: Move pt_regs only in fixup_bad_iret() Lai Jiangshan
2022-03-03 3:54 ` [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
@ 2022-03-03 3:54 ` Lai Jiangshan
2022-03-03 8:54 ` Peter Zijlstra
2022-03-03 3:54 ` [PATCH V2 4/7] x86/entry: Move cld to the start of idtentry Lai Jiangshan
` (3 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03 3:54 UTC (permalink / raw)
To: linux-kernel
Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Moving PUSH_AND_CLEAR_REGS out of error_entry doesn't change any
functionality, but it does enlarge the object size:
size arch/x86/entry/entry_64.o.before:
text data bss dec hex filename
17916 384 0 18300 477c arch/x86/entry/entry_64.o
size --format=SysV arch/x86/entry/entry_64.o.before:
.entry.text 5528 0
.orc_unwind 6456 0
.orc_unwind_ip 4304 0
size arch/x86/entry/entry_64.o.after:
text data bss dec hex filename
26868 384 0 27252 6a74 arch/x86/entry/entry_64.o
size --format=SysV arch/x86/entry/entry_64.o.after:
.entry.text 8200 0
.orc_unwind 10224 0
.orc_unwind_ip 6816 0
But .entry.text on x86_64 is 2M aligned, so enlarging it to 8.2k
doesn't enlarge the final text size.
The .orc_unwind[_ip] tables grow because the change adds many push
instructions.
This prepares for not calling error_entry() from XENPV in a later
patch, and for converting the whole of error_entry into C code in the
future.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
arch/x86/entry/entry_64.S | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a51cad2b7fff..3ca64bad4697 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -322,6 +322,9 @@ SYM_CODE_END(ret_from_fork)
*/
.macro idtentry_body cfunc has_error_code:req
+ PUSH_AND_CLEAR_REGS
+ ENCODE_FRAME_POINTER
+
call error_entry
movq %rax, %rsp /* switch stack settled by sync_regs() */
ENCODE_FRAME_POINTER
@@ -968,8 +971,6 @@ SYM_CODE_END(paranoid_exit)
SYM_CODE_START_LOCAL(error_entry)
UNWIND_HINT_FUNC
cld
- PUSH_AND_CLEAR_REGS save_ret=1
- ENCODE_FRAME_POINTER 8
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace
--
2.19.1.6.gb485710b
* [PATCH V2 4/7] x86/entry: Move cld to the start of idtentry
2022-03-03 3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
` (2 preceding siblings ...)
2022-03-03 3:54 ` [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
@ 2022-03-03 3:54 ` Lai Jiangshan
2022-03-03 3:54 ` [PATCH V2 5/7] x86/entry: Don't call error_entry for XENPV Lai Jiangshan
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03 3:54 UTC (permalink / raw)
To: linux-kernel
Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Move the cld instruction next to ASM_CLAC at the start of each entry
point.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
arch/x86/entry/entry_64.S | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3ca64bad4697..630bf8164a09 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -356,6 +356,7 @@ SYM_CODE_END(ret_from_fork)
SYM_CODE_START(\asmsym)
UNWIND_HINT_IRET_REGS offset=\has_error_code*8
ASM_CLAC
+ cld
.if \has_error_code == 0
pushq $-1 /* ORIG_RAX: no syscall to restart */
@@ -423,6 +424,7 @@ SYM_CODE_END(\asmsym)
SYM_CODE_START(\asmsym)
UNWIND_HINT_IRET_REGS
ASM_CLAC
+ cld
pushq $-1 /* ORIG_RAX: no syscall to restart */
@@ -478,6 +480,7 @@ SYM_CODE_END(\asmsym)
SYM_CODE_START(\asmsym)
UNWIND_HINT_IRET_REGS
ASM_CLAC
+ cld
/*
* If the entry is from userspace, switch stacks and treat it as
@@ -539,6 +542,7 @@ SYM_CODE_END(\asmsym)
SYM_CODE_START(\asmsym)
UNWIND_HINT_IRET_REGS offset=8
ASM_CLAC
+ cld
/* paranoid_entry returns GS information for paranoid_exit in EBX. */
call paranoid_entry
@@ -852,7 +856,6 @@ SYM_CODE_END(xen_failsafe_callback)
*/
SYM_CODE_START_LOCAL(paranoid_entry)
UNWIND_HINT_FUNC
- cld
PUSH_AND_CLEAR_REGS save_ret=1
ENCODE_FRAME_POINTER 8
@@ -970,7 +973,6 @@ SYM_CODE_END(paranoid_exit)
*/
SYM_CODE_START_LOCAL(error_entry)
UNWIND_HINT_FUNC
- cld
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace
@@ -1103,6 +1105,7 @@ SYM_CODE_START(asm_exc_nmi)
*/
ASM_CLAC
+ cld
/* Use %rdx as our temp variable throughout */
pushq %rdx
@@ -1122,7 +1125,6 @@ SYM_CODE_START(asm_exc_nmi)
*/
swapgs
- cld
FENCE_SWAPGS_USER_ENTRY
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
movq %rsp, %rdx
--
2.19.1.6.gb485710b
* [PATCH V2 5/7] x86/entry: Don't call error_entry for XENPV
2022-03-03 3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
` (3 preceding siblings ...)
2022-03-03 3:54 ` [PATCH V2 4/7] x86/entry: Move cld to the start of idtentry Lai Jiangshan
@ 2022-03-03 3:54 ` Lai Jiangshan
2022-03-03 3:54 ` [PATCH V2 6/7] x86/entry: Use idtentry macro for entry_INT80_compat Lai Jiangshan
2022-03-03 3:54 ` [PATCH V2 7/7] x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS Lai Jiangshan
6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03 3:54 UTC (permalink / raw)
To: linux-kernel
Cc: x86, Lai Jiangshan, xen-devel, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
When in XENPV, the kernel is already on the task stack, and it can't
fault in native_iret() or native_load_gs_index() since XENPV uses its
own pvops for iret and load_gs_index(). It doesn't need to switch CR3
either. So there is no reason to call error_entry() on XENPV.
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
arch/x86/entry/entry_64.S | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 630bf8164a09..adc9f7619d1b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -325,8 +325,17 @@ SYM_CODE_END(ret_from_fork)
PUSH_AND_CLEAR_REGS
ENCODE_FRAME_POINTER
- call error_entry
- movq %rax, %rsp /* switch stack settled by sync_regs() */
+ /*
+ * Call error_entry() and switch to the stack settled by sync_regs().
+ *
+ * When in XENPV, the kernel is already on the task stack, and it
+ * can't fault in native_iret() or native_load_gs_index() since XENPV
+ * uses its own pvops for iret and load_gs_index(). It doesn't need
+ * to switch CR3 either, so it can skip invoking error_entry().
+ */
+ ALTERNATIVE "call error_entry; movq %rax, %rsp", \
+ "", X86_FEATURE_XENPV
+
ENCODE_FRAME_POINTER
UNWIND_HINT_REGS
--
2.19.1.6.gb485710b
* [PATCH V2 6/7] x86/entry: Use idtentry macro for entry_INT80_compat
2022-03-03 3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
` (4 preceding siblings ...)
2022-03-03 3:54 ` [PATCH V2 5/7] x86/entry: Don't call error_entry for XENPV Lai Jiangshan
@ 2022-03-03 3:54 ` Lai Jiangshan
2022-03-03 3:54 ` [PATCH V2 7/7] x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS Lai Jiangshan
6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03 3:54 UTC (permalink / raw)
To: linux-kernel
Cc: x86, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Joerg Roedel, Chang S. Bae, Jan Kiszka
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
entry_INT80_compat is identical to the idtentry macro except for some
special handling of %rax in the prologue.
Add that prologue handling to idtentry and use idtentry for
entry_INT80_compat.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
arch/x86/entry/entry_64.S | 18 ++++++
arch/x86/entry/entry_64_compat.S | 102 -------------------------------
arch/x86/include/asm/idtentry.h | 47 ++++++++++++++
arch/x86/include/asm/proto.h | 4 --
4 files changed, 65 insertions(+), 106 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index adc9f7619d1b..88b61f310289 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -371,6 +371,24 @@ SYM_CODE_START(\asmsym)
pushq $-1 /* ORIG_RAX: no syscall to restart */
.endif
+ .if \vector == IA32_SYSCALL_VECTOR
+ /*
+ * User tracing code (ptrace or signal handlers) might assume
+ * that the saved RAX contains a 32-bit number when we're
+ * invoking a 32-bit syscall. Just in case the high bits are
+ * nonzero, zero-extend the syscall number. (This could almost
+ * certainly be deleted with no ill effects.)
+ */
+ movl %eax, %eax
+
+ /*
+ * do_int80_syscall_32() expects regs->orig_ax to be user ax,
+ * and regs->ax to be $-ENOSYS.
+ */
+ movq %rax, (%rsp)
+ movq $-ENOSYS, %rax
+ .endif
+
.if \vector == X86_TRAP_BP
/*
* If coming from kernel space, create a 6-word gap to allow the
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 0051cf5c792d..a4fcea0cab14 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -311,105 +311,3 @@ sysret32_from_system_call:
swapgs
sysretl
SYM_CODE_END(entry_SYSCALL_compat)
-
-/*
- * 32-bit legacy system call entry.
- *
- * 32-bit x86 Linux system calls traditionally used the INT $0x80
- * instruction. INT $0x80 lands here.
- *
- * This entry point can be used by 32-bit and 64-bit programs to perform
- * 32-bit system calls. Instances of INT $0x80 can be found inline in
- * various programs and libraries. It is also used by the vDSO's
- * __kernel_vsyscall fallback for hardware that doesn't support a faster
- * entry method. Restarted 32-bit system calls also fall back to INT
- * $0x80 regardless of what instruction was originally used to do the
- * system call.
- *
- * This is considered a slow path. It is not used by most libc
- * implementations on modern hardware except during process startup.
- *
- * Arguments:
- * eax system call number
- * ebx arg1
- * ecx arg2
- * edx arg3
- * esi arg4
- * edi arg5
- * ebp arg6
- */
-SYM_CODE_START(entry_INT80_compat)
- UNWIND_HINT_EMPTY
- /*
- * Interrupts are off on entry.
- */
- ASM_CLAC /* Do this early to minimize exposure */
- SWAPGS
-
- /*
- * User tracing code (ptrace or signal handlers) might assume that
- * the saved RAX contains a 32-bit number when we're invoking a 32-bit
- * syscall. Just in case the high bits are nonzero, zero-extend
- * the syscall number. (This could almost certainly be deleted
- * with no ill effects.)
- */
- movl %eax, %eax
-
- /* switch to thread stack expects orig_ax and rdi to be pushed */
- pushq %rax /* pt_regs->orig_ax */
- pushq %rdi /* pt_regs->di */
-
- /* Need to switch before accessing the thread stack. */
- SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
-
- /* In the Xen PV case we already run on the thread stack. */
- ALTERNATIVE "", "jmp .Lint80_keep_stack", X86_FEATURE_XENPV
-
- movq %rsp, %rdi
- movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
-
- pushq 6*8(%rdi) /* regs->ss */
- pushq 5*8(%rdi) /* regs->rsp */
- pushq 4*8(%rdi) /* regs->eflags */
- pushq 3*8(%rdi) /* regs->cs */
- pushq 2*8(%rdi) /* regs->ip */
- pushq 1*8(%rdi) /* regs->orig_ax */
- pushq (%rdi) /* pt_regs->di */
-.Lint80_keep_stack:
-
- pushq %rsi /* pt_regs->si */
- xorl %esi, %esi /* nospec si */
- pushq %rdx /* pt_regs->dx */
- xorl %edx, %edx /* nospec dx */
- pushq %rcx /* pt_regs->cx */
- xorl %ecx, %ecx /* nospec cx */
- pushq $-ENOSYS /* pt_regs->ax */
- pushq %r8 /* pt_regs->r8 */
- xorl %r8d, %r8d /* nospec r8 */
- pushq %r9 /* pt_regs->r9 */
- xorl %r9d, %r9d /* nospec r9 */
- pushq %r10 /* pt_regs->r10*/
- xorl %r10d, %r10d /* nospec r10 */
- pushq %r11 /* pt_regs->r11 */
- xorl %r11d, %r11d /* nospec r11 */
- pushq %rbx /* pt_regs->rbx */
- xorl %ebx, %ebx /* nospec rbx */
- pushq %rbp /* pt_regs->rbp */
- xorl %ebp, %ebp /* nospec rbp */
- pushq %r12 /* pt_regs->r12 */
- xorl %r12d, %r12d /* nospec r12 */
- pushq %r13 /* pt_regs->r13 */
- xorl %r13d, %r13d /* nospec r13 */
- pushq %r14 /* pt_regs->r14 */
- xorl %r14d, %r14d /* nospec r14 */
- pushq %r15 /* pt_regs->r15 */
- xorl %r15d, %r15d /* nospec r15 */
-
- UNWIND_HINT_REGS
-
- cld
-
- movq %rsp, %rdi
- call do_int80_syscall_32
- jmp swapgs_restore_regs_and_return_to_usermode
-SYM_CODE_END(entry_INT80_compat)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..38cb2e0dc2c7 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -204,6 +204,20 @@ __visible noinstr void func(struct pt_regs *regs, \
\
static noinline void __##func(struct pt_regs *regs, u32 vector)
+/**
+ * DECLARE_IDTENTRY_IA32_EMULATION - Declare functions for int80
+ * @vector: Vector number (ignored for C)
+ * @asm_func: Function name of the entry point
+ * @cfunc: The C handler called from the ASM entry point (ignored for C)
+ *
+ * Declares two functions:
+ * - The ASM entry point: asm_func
+ * - The XEN PV trap entry point: xen_##asm_func (maybe unused)
+ */
+#define DECLARE_IDTENTRY_IA32_EMULATION(vector, asm_func, cfunc) \
+ asmlinkage void asm_func(void); \
+ asmlinkage void xen_##asm_func(void)
+
/**
* DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
* @vector: Vector number (ignored for C)
@@ -430,6 +444,35 @@ __visible noinstr void func(struct pt_regs *regs, \
#define DECLARE_IDTENTRY_ERRORCODE(vector, func) \
idtentry vector asm_##func func has_error_code=1
+/*
+ * 32-bit legacy system call entry.
+ *
+ * 32-bit x86 Linux system calls traditionally used the INT $0x80
+ * instruction. INT $0x80 lands here.
+ *
+ * This entry point can be used by 32-bit and 64-bit programs to perform
+ * 32-bit system calls. Instances of INT $0x80 can be found inline in
+ * various programs and libraries. It is also used by the vDSO's
+ * __kernel_vsyscall fallback for hardware that doesn't support a faster
+ * entry method. Restarted 32-bit system calls also fall back to INT
+ * $0x80 regardless of what instruction was originally used to do the
+ * system call.
+ *
+ * This is considered a slow path. It is not used by most libc
+ * implementations on modern hardware except during process startup.
+ *
+ * Arguments:
+ * eax system call number
+ * ebx arg1
+ * ecx arg2
+ * edx arg3
+ * esi arg4
+ * edi arg5
+ * ebp arg6
+ */
+#define DECLARE_IDTENTRY_IA32_EMULATION(vector, asm_func, cfunc) \
+ idtentry vector asm_func cfunc has_error_code=0
+
/* Special case for 32bit IRET 'trap'. Do not emit ASM code */
#define DECLARE_IDTENTRY_SW(vector, func)
@@ -631,6 +674,10 @@ DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER, common_interrupt);
DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER, spurious_interrupt);
#endif
+#ifdef CONFIG_IA32_EMULATION
+DECLARE_IDTENTRY_IA32_EMULATION(IA32_SYSCALL_VECTOR, entry_INT80_compat, do_int80_syscall_32);
+#endif
+
/* System vector entry points */
#ifdef CONFIG_X86_LOCAL_APIC
DECLARE_IDTENTRY_SYSVEC(ERROR_APIC_VECTOR, sysvec_error_interrupt);
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index feed36d44d04..c4d331fe65ff 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -28,10 +28,6 @@ void entry_SYSENTER_compat(void);
void __end_entry_SYSENTER_compat(void);
void entry_SYSCALL_compat(void);
void entry_SYSCALL_compat_safe_stack(void);
-void entry_INT80_compat(void);
-#ifdef CONFIG_XEN_PV
-void xen_entry_INT80_compat(void);
-#endif
#endif
void x86_configure_nx(void);
--
2.19.1.6.gb485710b
* [PATCH V2 7/7] x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS
2022-03-03 3:54 [PATCH V2 0/7] x86/entry: Clean up entry code Lai Jiangshan
` (5 preceding siblings ...)
2022-03-03 3:54 ` [PATCH V2 6/7] x86/entry: Use idtentry macro for entry_INT80_compat Lai Jiangshan
@ 2022-03-03 3:54 ` Lai Jiangshan
6 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-03 3:54 UTC (permalink / raw)
To: linux-kernel
Cc: x86, Lai Jiangshan, xen-devel, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Juergen Gross, Peter Zijlstra (Intel),
Kirill A. Shutemov
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
XENPV doesn't use swapgs_restore_regs_and_return_to_usermode(),
error_entry() or entry_SYSENTER_compat(), so the PV-aware SWAPGS in
them can be changed to a plain swapgs. There are no users of SWAPGS
left after this change.
The INTERRUPT_RETURN in swapgs_restore_regs_and_return_to_usermode()
is also converted.
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
arch/x86/entry/entry_64.S | 10 +++++-----
arch/x86/entry/entry_64_compat.S | 2 +-
arch/x86/include/asm/irqflags.h | 2 --
3 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 88b61f310289..d9c885400034 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -644,8 +644,8 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
/* Restore RDI. */
popq %rdi
- SWAPGS
- INTERRUPT_RETURN
+ swapgs
+ jmp native_iret
SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
@@ -1007,7 +1007,7 @@ SYM_CODE_START_LOCAL(error_entry)
* We entered from user mode or we're pretending to have entered
* from user mode due to an IRET fault.
*/
- SWAPGS
+ swapgs
FENCE_SWAPGS_USER_ENTRY
/* We have user CR3. Change to kernel CR3. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
@@ -1039,7 +1039,7 @@ SYM_CODE_START_LOCAL(error_entry)
* gsbase and proceed. We'll fix up the exception and land in
* .Lgs_change's error handler with kernel gsbase.
*/
- SWAPGS
+ swapgs
/*
* Issue an LFENCE to prevent GS speculation, regardless of whether it is a
@@ -1060,7 +1060,7 @@ SYM_CODE_START_LOCAL(error_entry)
* We came from an IRET to user mode, so we have user
* gsbase and CR3. Switch to kernel gsbase and CR3:
*/
- SWAPGS
+ swapgs
FENCE_SWAPGS_USER_ENTRY
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index a4fcea0cab14..72e017c3941f 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,7 +49,7 @@
SYM_CODE_START(entry_SYSENTER_compat)
UNWIND_HINT_EMPTY
/* Interrupts are off on entry. */
- SWAPGS
+ swapgs
pushq %rax
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 87761396e8cc..ac2e4cc47210 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -140,13 +140,11 @@ static __always_inline void arch_local_irq_restore(unsigned long flags)
#else
#ifdef CONFIG_X86_64
#ifdef CONFIG_XEN_PV
-#define SWAPGS ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
#define INTERRUPT_RETURN \
ANNOTATE_RETPOLINE_SAFE; \
ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);", \
X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;")
#else
-#define SWAPGS swapgs
#define INTERRUPT_RETURN jmp native_iret
#endif
#endif
--
2.19.1.6.gb485710b
* Re: [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns
2022-03-03 3:54 ` [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns Lai Jiangshan
@ 2022-03-03 8:00 ` Peter Zijlstra
2022-03-07 14:39 ` Lai Jiangshan
0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2022-03-03 8:00 UTC (permalink / raw)
To: Lai Jiangshan
Cc: linux-kernel, x86, Lai Jiangshan, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin
On Thu, Mar 03, 2022 at 11:54:29AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> error_entry() calls sync_regs() to settle/copy the pt_regs and switches
> the stack directly after sync_regs() returns. But error_entry() itself
> is also a function call, so the switch has to handle its return address
> as well, which makes the work complicated and tangled.
>
> Switching stacks after error_entry() returns makes the code simpler and
> more intuitive.
>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
> arch/x86/entry/entry_64.S | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 24846284eda5..a51cad2b7fff 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
> .macro idtentry_body cfunc has_error_code:req
>
> call error_entry
> + movq %rax, %rsp /* switch stack settled by sync_regs() */
> + ENCODE_FRAME_POINTER
> UNWIND_HINT_REGS
>
> movq %rsp, %rdi /* pt_regs pointer into 1st argument*/
> @@ -980,14 +982,10 @@ SYM_CODE_START_LOCAL(error_entry)
> /* We have user CR3. Change to kernel CR3. */
> SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
>
> + leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
> .Lerror_entry_from_usermode_after_swapgs:
> /* Put us onto the real thread stack. */
> - popq %r12 /* save return addr in %r12 */
> - movq %rsp, %rdi /* arg0 = pt_regs pointer */
> call sync_regs
> - movq %rax, %rsp /* switch stack */
> - ENCODE_FRAME_POINTER
> - pushq %r12
> RET
>
> /*
> @@ -1019,6 +1017,7 @@ SYM_CODE_START_LOCAL(error_entry)
> */
> .Lerror_entry_done_lfence:
> FENCE_SWAPGS_KERNEL_ENTRY
> + leaq 8(%rsp), %rax /* return pt_regs pointer */
> RET
>
> .Lbstep_iret:
> @@ -1039,12 +1038,9 @@ SYM_CODE_START_LOCAL(error_entry)
> * Pretend that the exception came from user mode: set up pt_regs
> * as if we faulted immediately after IRET.
> */
> - popq %r12 /* save return addr in %r12 */
> - movq %rsp, %rdi /* arg0 = pt_regs pointer */
> + leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
> call fixup_bad_iret
> - mov %rax, %rsp
> - ENCODE_FRAME_POINTER
> - pushq %r12
> + mov %rax, %rdi
> jmp .Lerror_entry_from_usermode_after_swapgs
> SYM_CODE_END(error_entry)
I can't seem to follow; perhaps I need more wake-up-juice?
Suppose we have .Lerror_bad_iret, then the code flow is something like:
	old					new

.Lerror_bad_iret
	SWAPGS business

	popq %r12				leaq 8(%rsp), %rdi
	movq %rsp, %rdi
	call fixup_bad_iret			call fixup_bad_iret
	mov %rax, %rsp				mov %rax, %rdi
	ENCODE_FRAME_POINTER
	pushq %r12

	jmp .Lerror_entry_from_usermode_after_swapgs

.Lerror_entry_from_usermode_after_swapgs
	pop %r12
	movq %rsp, %rdi
	call sync_regs				call sync_regs
	mov %rax, %rsp
	ENCODE_FRAME_POINTER
	push %r12
	RET					RET
The thing that appears to me is that sync_regs is called one stack switch
short. This isn't mentioned in the Changelog nor in the comments. Why is
this OK?
* Re: [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
2022-03-03 3:54 ` [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry Lai Jiangshan
@ 2022-03-03 8:54 ` Peter Zijlstra
2022-03-07 14:53 ` Lai Jiangshan
0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2022-03-03 8:54 UTC (permalink / raw)
To: Lai Jiangshan
Cc: linux-kernel, x86, Lai Jiangshan, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin
On Thu, Mar 03, 2022 at 11:54:30AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> Moving PUSH_AND_CLEAR_REGS out of error_entry doesn't change any
> functionality, but it enlarges the code size:
>
> size arch/x86/entry/entry_64.o.before:
> text data bss dec hex filename
> 17916 384 0 18300 477c arch/x86/entry/entry_64.o
>
> size --format=SysV arch/x86/entry/entry_64.o.before:
> .entry.text 5528 0
> .orc_unwind 6456 0
> .orc_unwind_ip 4304 0
>
> size arch/x86/entry/entry_64.o.after:
> text data bss dec hex filename
> 26868 384 0 27252 6a74 arch/x86/entry/entry_64.o
>
> size --format=SysV arch/x86/entry/entry_64.o.after:
> .entry.text 8200 0
> .orc_unwind 10224 0
> .orc_unwind_ip 6816 0
>
> But .entry.text on x86_64 is 2M-aligned, so growing it to 8.2k doesn't
> enlarge the final kernel text size.
>
> The .orc_unwind[_ip] tables grow because the change adds many push
> instructions.
>
> This prepares for not calling error_entry() from XENPV in a later patch
> and for eventually converting the whole of error_entry() into C code.
And also, IIUC, folding that int80 thing a few patches down, right?
* Re: [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns
2022-03-03 8:00 ` Peter Zijlstra
@ 2022-03-07 14:39 ` Lai Jiangshan
0 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-07 14:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, X86 ML, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
On Thu, Mar 3, 2022 at 4:00 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 03, 2022 at 11:54:29AM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> >
> > error_entry() calls sync_regs() to settle/copy the pt_regs and switches
> > the stack directly after sync_regs(). But error_entry() itself is also
> > a function call, so the stack switch also has to deal with its return
> > address, which makes the code complicated and tangled.
> >
> > Switching the stack after error_entry() returns makes the code simpler
> > and more intuitive.
> >
> > Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> > ---
> > arch/x86/entry/entry_64.S | 16 ++++++----------
> > 1 file changed, 6 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 24846284eda5..a51cad2b7fff 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
> > .macro idtentry_body cfunc has_error_code:req
> >
> > call error_entry
> > + movq %rax, %rsp /* switch stack settled by sync_regs() */
> > + ENCODE_FRAME_POINTER
> > UNWIND_HINT_REGS
> >
> > movq %rsp, %rdi /* pt_regs pointer into 1st argument*/
> > @@ -980,14 +982,10 @@ SYM_CODE_START_LOCAL(error_entry)
> > /* We have user CR3. Change to kernel CR3. */
> > SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
> >
> > + leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
> > .Lerror_entry_from_usermode_after_swapgs:
> > /* Put us onto the real thread stack. */
> > - popq %r12 /* save return addr in %12 */
> > - movq %rsp, %rdi /* arg0 = pt_regs pointer */
> > call sync_regs
> > - movq %rax, %rsp /* switch stack */
> > - ENCODE_FRAME_POINTER
> > - pushq %r12
> > RET
> >
> > /*
> > @@ -1019,6 +1017,7 @@ SYM_CODE_START_LOCAL(error_entry)
> > */
> > .Lerror_entry_done_lfence:
> > FENCE_SWAPGS_KERNEL_ENTRY
> > + leaq 8(%rsp), %rax /* return pt_regs pointer */
> > RET
> >
> > .Lbstep_iret:
> > @@ -1039,12 +1038,9 @@ SYM_CODE_START_LOCAL(error_entry)
> > * Pretend that the exception came from user mode: set up pt_regs
> > * as if we faulted immediately after IRET.
> > */
> > - popq %r12 /* save return addr in %12 */
> > - movq %rsp, %rdi /* arg0 = pt_regs pointer */
> > + leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
> > call fixup_bad_iret
> > - mov %rax, %rsp
> > - ENCODE_FRAME_POINTER
> > - pushq %r12
> > + mov %rax, %rdi
> > jmp .Lerror_entry_from_usermode_after_swapgs
> > SYM_CODE_END(error_entry)
>
> I can't seem to follow; perhaps I need more wake-up-juice?
>
> Suppose we have .Lerror_bad_iret, then the code flow is something like:
>
>
> 	old					new
>
> .Lerror_bad_iret
> 	SWAPGS business
>
> 	popq %r12				leaq 8(%rsp), %rdi
> 	movq %rsp, %rdi
> 	call fixup_bad_iret			call fixup_bad_iret
> 	mov %rax, %rsp				mov %rax, %rdi
> 	ENCODE_FRAME_POINTER
> 	pushq %r12
>
> 	jmp .Lerror_entry_from_usermode_after_swapgs
>
> .Lerror_entry_from_usermode_after_swapgs
> 	pop %r12
> 	movq %rsp, %rdi
> 	call sync_regs				call sync_regs
> 	mov %rax, %rsp
> 	ENCODE_FRAME_POINTER
> 	push %r12
> 	RET					RET
>
>
> The thing that appears to me is that sync_regs is called one stack switch
> short. This isn't mentioned in the Changelog nor in the comments. Why is
> this OK?
>
>
I have not been confident about the meaning of "short" or "called one
stack switch short" in my understanding of English, so I tried hard to
find a semantic difference between the code before and after the patch.
The logic is the same:
1) feed fixup_bad_iret() with the pt_regs pushed by ASM code
2) fixup_bad_iret() moves the partial pt_regs up
3) feed sync_regs() with the pt_regs returned by fixup_bad_iret()
4) sync_regs() copies the whole pt_regs to kernel stack if needed
5) after error_entry() returns, we are on the kernel stack with the pt_regs
Differences:
The old code switches to the copied pt_regs immediately, twice, inside
error_entry(), while the new code switches to the copied pt_regs only
once, after error_entry() returns.
This is correct because sync_regs() doesn't need to be called on or near
the stack that holds the pt_regs it handles. Switching the stack only
once is the point of this patch.
The old code stashes the return address of error_entry() in a scratch
register; the new code doesn't.
The new code relies on the fact that fixup_bad_iret() and sync_regs()
don't corrupt the return address of error_entry(). But the old code
already relies on those functions not corrupting their own return
addresses, so it is the same reliance.
The code has been tested with my additional in-kernel tracing code and
with the x86 selftests, which cover all kinds of bad-iret cases
(tools/testing/selftests/x86/sigreturn.c).
* Re: [PATCH V2 3/7] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
2022-03-03 8:54 ` Peter Zijlstra
@ 2022-03-07 14:53 ` Lai Jiangshan
0 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2022-03-07 14:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, X86 ML, Lai Jiangshan, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
On Thu, Mar 3, 2022 at 4:55 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> And also, IIUC, folding that int80 thing a few patches down, right?
I don't think folding the int80 thing into idtentry requires this patch.
I didn't notice that int80 was a large chunk of ASM code in a different
file until I had finished converting the entry code to C code, so folding
the int80 thing became a later patch.
It would be possible to move the int80-folding patch first, but several
patches would then need to be updated accordingly. I'm wary of updating
ASM code without a good reason: even a slight mistake, such as forgetting
to update a stack offset or a register name when swapping two lines of
code, can cost several days of debugging.