linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Rewrite ret_from_fork() in C
@ 2023-06-23 22:55 Brian Gerst
  2023-06-23 22:55 ` [PATCH v2 1/2] x86/32: Remove schedule_tail_wrapper() Brian Gerst
  2023-06-23 22:55 ` [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C Brian Gerst
  0 siblings, 2 replies; 16+ messages in thread
From: Brian Gerst @ 2023-06-23 22:55 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Peter Zijlstra, Sami Tolvanen, alyssa.milburn, keescook,
	jpoimboe, joao, tim.c.chen, Brian Gerst

When kCFI is enabled, special handling is needed for the indirect call
to the kernel thread function.  Handling this in pure assembly is not
simple, so moving it to C is more appropriate.  Instead of moving just
the indirect call to C as Peter Zijlstra has proposed, this patchset
rewrites the whole ret_from_fork() function in C (other than some
necessary asm glue remaning).

V2:
- Fixed wrong address of pt_regs being passed
- Simplified 64-bit asm stub to not use a tail-call

Brian Gerst (2):
  x86/32: Remove schedule_tail_wrapper()
  x86: Rewrite ret_from_fork() in C

 arch/x86/entry/entry_32.S        | 53 ++++++++------------------------
 arch/x86/entry/entry_64.S        | 33 +++++---------------
 arch/x86/include/asm/switch_to.h |  4 ++-
 arch/x86/kernel/process.c        | 22 ++++++++++++-
 4 files changed, 45 insertions(+), 67 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/2] x86/32: Remove schedule_tail_wrapper()
  2023-06-23 22:55 [PATCH v2 0/2] Rewrite ret_from_fork() in C Brian Gerst
@ 2023-06-23 22:55 ` Brian Gerst
  2023-07-10  8:13   ` [tip: x86/urgent] " tip-bot2 for Brian Gerst
  2023-06-23 22:55 ` [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C Brian Gerst
  1 sibling, 1 reply; 16+ messages in thread
From: Brian Gerst @ 2023-06-23 22:55 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Peter Zijlstra, Sami Tolvanen, alyssa.milburn, keescook,
	jpoimboe, joao, tim.c.chen, Brian Gerst

The unwinder expects a return address at the very top of the kernel
stack just below pt_regs and before any stack frame is created.  Instead
of calling a wrapper, set up a return address as if ret_from_fork()
was called from the syscall entry code.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/entry_32.S | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 91397f58ac30..e56123f03a79 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -719,26 +719,6 @@ SYM_CODE_START(__switch_to_asm)
 SYM_CODE_END(__switch_to_asm)
 .popsection
 
-/*
- * The unwinder expects the last frame on the stack to always be at the same
- * offset from the end of the page, which allows it to validate the stack.
- * Calling schedule_tail() directly would break that convention because its an
- * asmlinkage function so its argument has to be pushed on the stack.  This
- * wrapper creates a proper "end of stack" frame header before the call.
- */
-.pushsection .text, "ax"
-SYM_FUNC_START(schedule_tail_wrapper)
-	FRAME_BEGIN
-
-	pushl	%eax
-	call	schedule_tail
-	popl	%eax
-
-	FRAME_END
-	RET
-SYM_FUNC_END(schedule_tail_wrapper)
-.popsection
-
 /*
  * A newly forked process directly context switches into this address.
  *
@@ -748,16 +728,23 @@ SYM_FUNC_END(schedule_tail_wrapper)
  */
 .pushsection .text, "ax"
 SYM_CODE_START(ret_from_fork)
-	call	schedule_tail_wrapper
+	/* return address for the stack unwinder */
+	pushl	$.Lsyscall_32_done
+
+	FRAME_BEGIN
+	pushl	%eax
+	call	schedule_tail
+	addl	$4, %esp
+	FRAME_END
 
 	testl	%ebx, %ebx
 	jnz	1f		/* kernel threads are uncommon */
 
 2:
 	/* When we fork, we trace the syscall return in the child, too. */
-	movl    %esp, %eax
+	leal    4(%esp), %eax
 	call    syscall_exit_to_user_mode
-	jmp     .Lsyscall_32_done
+	RET
 
 	/* kernel thread */
 1:	movl	%edi, %eax
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-06-23 22:55 [PATCH v2 0/2] Rewrite ret_from_fork() in C Brian Gerst
  2023-06-23 22:55 ` [PATCH v2 1/2] x86/32: Remove schedule_tail_wrapper() Brian Gerst
@ 2023-06-23 22:55 ` Brian Gerst
  2023-07-10  8:13   ` [tip: x86/urgent] " tip-bot2 for Brian Gerst
  2023-07-19 15:21   ` [PATCH v2 2/2] " Petr Mladek
  1 sibling, 2 replies; 16+ messages in thread
From: Brian Gerst @ 2023-06-23 22:55 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
	Peter Zijlstra, Sami Tolvanen, alyssa.milburn, keescook,
	jpoimboe, joao, tim.c.chen, Brian Gerst

When kCFI is enabled, special handling is needed for the indirect call
to the kernel thread function.  Rewrite the ret_from_fork() function in
C so that the compiler can properly handle the indirect call.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/entry_32.S        | 30 ++++++++---------------------
 arch/x86/entry/entry_64.S        | 33 ++++++++------------------------
 arch/x86/include/asm/switch_to.h |  4 +++-
 arch/x86/kernel/process.c        | 22 ++++++++++++++++++++-
 4 files changed, 40 insertions(+), 49 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e56123f03a79..6e6af42e044a 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -727,36 +727,22 @@ SYM_CODE_END(__switch_to_asm)
  * edi: kernel thread arg
  */
 .pushsection .text, "ax"
-SYM_CODE_START(ret_from_fork)
+SYM_CODE_START(ret_from_fork_asm)
+	movl	%esp, %edx	/* regs */
+
 	/* return address for the stack unwinder */
 	pushl	$.Lsyscall_32_done
 
 	FRAME_BEGIN
-	pushl	%eax
-	call	schedule_tail
+	/* prev already in EAX */
+	movl	%ebx, %ecx	/* fn */
+	pushl	%edi		/* fn_arg */
+	call	ret_from_fork
 	addl	$4, %esp
 	FRAME_END
 
-	testl	%ebx, %ebx
-	jnz	1f		/* kernel threads are uncommon */
-
-2:
-	/* When we fork, we trace the syscall return in the child, too. */
-	leal    4(%esp), %eax
-	call    syscall_exit_to_user_mode
 	RET
-
-	/* kernel thread */
-1:	movl	%edi, %eax
-	CALL_NOSPEC ebx
-	/*
-	 * A kernel thread is allowed to return here after successfully
-	 * calling kernel_execve().  Exit to userspace to complete the execve()
-	 * syscall.
-	 */
-	movl	$0, PT_EAX(%esp)
-	jmp	2b
-SYM_CODE_END(ret_from_fork)
+SYM_CODE_END(ret_from_fork_asm)
 .popsection
 
 SYM_ENTRY(__begin_SYSENTER_singlestep_region, SYM_L_GLOBAL, SYM_A_NONE)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f31e286c2977..91f6818884fa 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -284,36 +284,19 @@ SYM_FUNC_END(__switch_to_asm)
  * r12: kernel thread arg
  */
 .pushsection .text, "ax"
-	__FUNC_ALIGN
-SYM_CODE_START_NOALIGN(ret_from_fork)
-	UNWIND_HINT_END_OF_STACK
+SYM_CODE_START(ret_from_fork_asm)
+	UNWIND_HINT_REGS
 	ANNOTATE_NOENDBR // copy_thread
 	CALL_DEPTH_ACCOUNT
-	movq	%rax, %rdi
-	call	schedule_tail			/* rdi: 'prev' task parameter */
 
-	testq	%rbx, %rbx			/* from kernel_thread? */
-	jnz	1f				/* kernel threads are uncommon */
+	movq	%rax, %rdi		/* prev */
+	movq	%rsp, %rsi		/* regs */
+	movq	%rbx, %rdx		/* fn */
+	movq	%r12, %rcx		/* fn_arg */
+	call	ret_from_fork
 
-2:
-	UNWIND_HINT_REGS
-	movq	%rsp, %rdi
-	call	syscall_exit_to_user_mode	/* returns with IRQs disabled */
 	jmp	swapgs_restore_regs_and_return_to_usermode
-
-1:
-	/* kernel thread */
-	UNWIND_HINT_END_OF_STACK
-	movq	%r12, %rdi
-	CALL_NOSPEC rbx
-	/*
-	 * A kernel thread is allowed to return here after successfully
-	 * calling kernel_execve().  Exit to userspace to complete the execve()
-	 * syscall.
-	 */
-	movq	$0, RAX(%rsp)
-	jmp	2b
-SYM_CODE_END(ret_from_fork)
+SYM_CODE_END(ret_from_fork_asm)
 .popsection
 
 .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 5c91305d09d2..f42dbf17f52b 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -12,7 +12,9 @@ struct task_struct *__switch_to_asm(struct task_struct *prev,
 __visible struct task_struct *__switch_to(struct task_struct *prev,
 					  struct task_struct *next);
 
-asmlinkage void ret_from_fork(void);
+asmlinkage void ret_from_fork_asm(void);
+__visible void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
+			     int (*fn)(void *), void *fn_arg);
 
 /*
  * This is the structure pointed to by thread.sp for an inactive task.  The
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index cc7a642f8c9d..001e6dad9a48 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -28,6 +28,7 @@
 #include <linux/static_call.h>
 #include <trace/events/power.h>
 #include <linux/hw_breakpoint.h>
+#include <linux/entry-common.h>
 #include <asm/cpu.h>
 #include <asm/apic.h>
 #include <linux/uaccess.h>
@@ -136,6 +137,25 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
 		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
+__visible noinstr void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
+				     int (*fn)(void *), void *fn_arg)
+{
+	schedule_tail(prev);
+
+	/* Is this a kernel thread? */
+	if (unlikely(fn)) {
+		fn(fn_arg);
+		/*
+		 * A kernel thread is allowed to return here after successfully
+		 * calling kernel_execve().  Exit to userspace to complete the
+		 * execve() syscall.
+		 */
+		regs->ax = 0;
+	}
+
+	syscall_exit_to_user_mode(regs);
+}
+
 int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 {
 	unsigned long clone_flags = args->flags;
@@ -152,7 +172,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	frame = &fork_frame->frame;
 
 	frame->bp = encode_frame_pointer(childregs);
-	frame->ret_addr = (unsigned long) ret_from_fork;
+	frame->ret_addr = (unsigned long) ret_from_fork_asm;
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap = NULL;
 	p->thread.iopl_warn = 0;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip: x86/urgent] x86: Rewrite ret_from_fork() in C
  2023-06-23 22:55 ` [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C Brian Gerst
@ 2023-07-10  8:13   ` tip-bot2 for Brian Gerst
  2023-07-19 15:21   ` [PATCH v2 2/2] " Petr Mladek
  1 sibling, 0 replies; 16+ messages in thread
From: tip-bot2 for Brian Gerst @ 2023-07-10  8:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Brian Gerst, Kees Cook, Sami Tolvanen, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     3aec4ecb3d1f313a8ab985df7cab07c4af81f478
Gitweb:        https://git.kernel.org/tip/3aec4ecb3d1f313a8ab985df7cab07c4af81f478
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Fri, 23 Jun 2023 18:55:29 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 10 Jul 2023 09:52:25 +02:00

x86: Rewrite ret_from_fork() in C

When kCFI is enabled, special handling is needed for the indirect call
to the kernel thread function.  Rewrite the ret_from_fork() function in
C so that the compiler can properly handle the indirect call.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230623225529.34590-3-brgerst@gmail.com
---
 arch/x86/entry/entry_32.S        | 30 +++++++---------------------
 arch/x86/entry/entry_64.S        | 33 +++++++------------------------
 arch/x86/include/asm/switch_to.h |  4 +++-
 arch/x86/kernel/process.c        | 22 ++++++++++++++++++++-
 4 files changed, 40 insertions(+), 49 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e56123f..6e6af42 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -727,36 +727,22 @@ SYM_CODE_END(__switch_to_asm)
  * edi: kernel thread arg
  */
 .pushsection .text, "ax"
-SYM_CODE_START(ret_from_fork)
+SYM_CODE_START(ret_from_fork_asm)
+	movl	%esp, %edx	/* regs */
+
 	/* return address for the stack unwinder */
 	pushl	$.Lsyscall_32_done
 
 	FRAME_BEGIN
-	pushl	%eax
-	call	schedule_tail
+	/* prev already in EAX */
+	movl	%ebx, %ecx	/* fn */
+	pushl	%edi		/* fn_arg */
+	call	ret_from_fork
 	addl	$4, %esp
 	FRAME_END
 
-	testl	%ebx, %ebx
-	jnz	1f		/* kernel threads are uncommon */
-
-2:
-	/* When we fork, we trace the syscall return in the child, too. */
-	leal    4(%esp), %eax
-	call    syscall_exit_to_user_mode
 	RET
-
-	/* kernel thread */
-1:	movl	%edi, %eax
-	CALL_NOSPEC ebx
-	/*
-	 * A kernel thread is allowed to return here after successfully
-	 * calling kernel_execve().  Exit to userspace to complete the execve()
-	 * syscall.
-	 */
-	movl	$0, PT_EAX(%esp)
-	jmp	2b
-SYM_CODE_END(ret_from_fork)
+SYM_CODE_END(ret_from_fork_asm)
 .popsection
 
 SYM_ENTRY(__begin_SYSENTER_singlestep_region, SYM_L_GLOBAL, SYM_A_NONE)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f31e286..91f6818 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -284,36 +284,19 @@ SYM_FUNC_END(__switch_to_asm)
  * r12: kernel thread arg
  */
 .pushsection .text, "ax"
-	__FUNC_ALIGN
-SYM_CODE_START_NOALIGN(ret_from_fork)
-	UNWIND_HINT_END_OF_STACK
+SYM_CODE_START(ret_from_fork_asm)
+	UNWIND_HINT_REGS
 	ANNOTATE_NOENDBR // copy_thread
 	CALL_DEPTH_ACCOUNT
-	movq	%rax, %rdi
-	call	schedule_tail			/* rdi: 'prev' task parameter */
 
-	testq	%rbx, %rbx			/* from kernel_thread? */
-	jnz	1f				/* kernel threads are uncommon */
+	movq	%rax, %rdi		/* prev */
+	movq	%rsp, %rsi		/* regs */
+	movq	%rbx, %rdx		/* fn */
+	movq	%r12, %rcx		/* fn_arg */
+	call	ret_from_fork
 
-2:
-	UNWIND_HINT_REGS
-	movq	%rsp, %rdi
-	call	syscall_exit_to_user_mode	/* returns with IRQs disabled */
 	jmp	swapgs_restore_regs_and_return_to_usermode
-
-1:
-	/* kernel thread */
-	UNWIND_HINT_END_OF_STACK
-	movq	%r12, %rdi
-	CALL_NOSPEC rbx
-	/*
-	 * A kernel thread is allowed to return here after successfully
-	 * calling kernel_execve().  Exit to userspace to complete the execve()
-	 * syscall.
-	 */
-	movq	$0, RAX(%rsp)
-	jmp	2b
-SYM_CODE_END(ret_from_fork)
+SYM_CODE_END(ret_from_fork_asm)
 .popsection
 
 .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 5c91305..f42dbf1 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -12,7 +12,9 @@ struct task_struct *__switch_to_asm(struct task_struct *prev,
 __visible struct task_struct *__switch_to(struct task_struct *prev,
 					  struct task_struct *next);
 
-asmlinkage void ret_from_fork(void);
+asmlinkage void ret_from_fork_asm(void);
+__visible void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
+			     int (*fn)(void *), void *fn_arg);
 
 /*
  * This is the structure pointed to by thread.sp for an inactive task.  The
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ff9b80a..72015db 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -28,6 +28,7 @@
 #include <linux/static_call.h>
 #include <trace/events/power.h>
 #include <linux/hw_breakpoint.h>
+#include <linux/entry-common.h>
 #include <asm/cpu.h>
 #include <asm/apic.h>
 #include <linux/uaccess.h>
@@ -134,6 +135,25 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
 		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
+__visible void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
+				     int (*fn)(void *), void *fn_arg)
+{
+	schedule_tail(prev);
+
+	/* Is this a kernel thread? */
+	if (unlikely(fn)) {
+		fn(fn_arg);
+		/*
+		 * A kernel thread is allowed to return here after successfully
+		 * calling kernel_execve().  Exit to userspace to complete the
+		 * execve() syscall.
+		 */
+		regs->ax = 0;
+	}
+
+	syscall_exit_to_user_mode(regs);
+}
+
 int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 {
 	unsigned long clone_flags = args->flags;
@@ -149,7 +169,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	frame = &fork_frame->frame;
 
 	frame->bp = encode_frame_pointer(childregs);
-	frame->ret_addr = (unsigned long) ret_from_fork;
+	frame->ret_addr = (unsigned long) ret_from_fork_asm;
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap = NULL;
 	p->thread.iopl_warn = 0;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip: x86/urgent] x86/32: Remove schedule_tail_wrapper()
  2023-06-23 22:55 ` [PATCH v2 1/2] x86/32: Remove schedule_tail_wrapper() Brian Gerst
@ 2023-07-10  8:13   ` tip-bot2 for Brian Gerst
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Brian Gerst @ 2023-07-10  8:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Brian Gerst, Peter Zijlstra (Intel),
	Kees Cook, Sami Tolvanen, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     81f755d561f365f544795fad92f05a085ea4f292
Gitweb:        https://git.kernel.org/tip/81f755d561f365f544795fad92f05a085ea4f292
Author:        Brian Gerst <brgerst@gmail.com>
AuthorDate:    Fri, 23 Jun 2023 18:55:28 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 10 Jul 2023 09:52:24 +02:00

x86/32: Remove schedule_tail_wrapper()

The unwinder expects a return address at the very top of the kernel
stack just below pt_regs and before any stack frame is created.  Instead
of calling a wrapper, set up a return address as if ret_from_fork()
was called from the syscall entry code.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230623225529.34590-2-brgerst@gmail.com
---
 arch/x86/entry/entry_32.S | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 91397f5..e56123f 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -720,26 +720,6 @@ SYM_CODE_END(__switch_to_asm)
 .popsection
 
 /*
- * The unwinder expects the last frame on the stack to always be at the same
- * offset from the end of the page, which allows it to validate the stack.
- * Calling schedule_tail() directly would break that convention because its an
- * asmlinkage function so its argument has to be pushed on the stack.  This
- * wrapper creates a proper "end of stack" frame header before the call.
- */
-.pushsection .text, "ax"
-SYM_FUNC_START(schedule_tail_wrapper)
-	FRAME_BEGIN
-
-	pushl	%eax
-	call	schedule_tail
-	popl	%eax
-
-	FRAME_END
-	RET
-SYM_FUNC_END(schedule_tail_wrapper)
-.popsection
-
-/*
  * A newly forked process directly context switches into this address.
  *
  * eax: prev task we switched from
@@ -748,16 +728,23 @@ SYM_FUNC_END(schedule_tail_wrapper)
  */
 .pushsection .text, "ax"
 SYM_CODE_START(ret_from_fork)
-	call	schedule_tail_wrapper
+	/* return address for the stack unwinder */
+	pushl	$.Lsyscall_32_done
+
+	FRAME_BEGIN
+	pushl	%eax
+	call	schedule_tail
+	addl	$4, %esp
+	FRAME_END
 
 	testl	%ebx, %ebx
 	jnz	1f		/* kernel threads are uncommon */
 
 2:
 	/* When we fork, we trace the syscall return in the child, too. */
-	movl    %esp, %eax
+	leal    4(%esp), %eax
 	call    syscall_exit_to_user_mode
-	jmp     .Lsyscall_32_done
+	RET
 
 	/* kernel thread */
 1:	movl	%edi, %eax

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-06-23 22:55 ` [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C Brian Gerst
  2023-07-10  8:13   ` [tip: x86/urgent] " tip-bot2 for Brian Gerst
@ 2023-07-19 15:21   ` Petr Mladek
  2023-07-19 20:02     ` Peter Zijlstra
  2023-07-19 20:33     ` [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C Joe Lawrence
  1 sibling, 2 replies; 16+ messages in thread
From: Petr Mladek @ 2023-07-19 15:21 UTC (permalink / raw)
  To: Brian Gerst, jpoimboe
  Cc: linux-kernel, x86, Thomas Gleixner, Borislav Petkov,
	H . Peter Anvin, Peter Zijlstra, Sami Tolvanen, alyssa.milburn,
	keescook, joao, tim.c.chen, live-patching

On Fri 2023-06-23 18:55:29, Brian Gerst wrote:
> When kCFI is enabled, special handling is needed for the indirect call
> to the kernel thread function.  Rewrite the ret_from_fork() function in
> C so that the compiler can properly handle the indirect call.

This patch broke livepatching. Kthreads never have a reliable stack.
It works when I revert it.

See also below.

> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -284,36 +284,19 @@ SYM_FUNC_END(__switch_to_asm)
>   * r12: kernel thread arg
>   */
>  .pushsection .text, "ax"
> -	__FUNC_ALIGN
> -SYM_CODE_START_NOALIGN(ret_from_fork)
> -	UNWIND_HINT_END_OF_STACK
> +SYM_CODE_START(ret_from_fork_asm)
> +	UNWIND_HINT_REGS
>  	ANNOTATE_NOENDBR // copy_thread
>  	CALL_DEPTH_ACCOUNT
> -	movq	%rax, %rdi
> -	call	schedule_tail			/* rdi: 'prev' task parameter */
>  
> -	testq	%rbx, %rbx			/* from kernel_thread? */
> -	jnz	1f				/* kernel threads are uncommon */
> +	movq	%rax, %rdi		/* prev */
> +	movq	%rsp, %rsi		/* regs */
> +	movq	%rbx, %rdx		/* fn */
> +	movq	%r12, %rcx		/* fn_arg */
> +	call	ret_from_fork
>  
> -2:
> -	UNWIND_HINT_REGS
> -	movq	%rsp, %rdi
> -	call	syscall_exit_to_user_mode	/* returns with IRQs disabled */
>  	jmp	swapgs_restore_regs_and_return_to_usermode
> -
> -1:
> -	/* kernel thread */
> -	UNWIND_HINT_END_OF_STACK

I think that it might be related to removal of this line.
The following intructions are going to call fn(fn_arg).
See below.

> -	movq	%r12, %rdi
> -	CALL_NOSPEC rbx
> -	/*
> -	 * A kernel thread is allowed to return here after successfully
> -	 * calling kernel_execve().  Exit to userspace to complete the execve()
> -	 * syscall.
> -	 */
> -	movq	$0, RAX(%rsp)
> -	jmp	2b
> -SYM_CODE_END(ret_from_fork)
> +SYM_CODE_END(ret_from_fork_asm)
>  .popsection
>  
>  .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> index 5c91305d09d2..f42dbf17f52b 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -12,7 +12,9 @@ struct task_struct *__switch_to_asm(struct task_struct *prev,
>  __visible struct task_struct *__switch_to(struct task_struct *prev,
>  					  struct task_struct *next);
>  
> -asmlinkage void ret_from_fork(void);
> +asmlinkage void ret_from_fork_asm(void);
> +__visible void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
> +			     int (*fn)(void *), void *fn_arg);
>  
>  /*
>   * This is the structure pointed to by thread.sp for an inactive task.  The
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index cc7a642f8c9d..001e6dad9a48 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -136,6 +137,25 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
>  		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
>  }
>  
> +__visible noinstr void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
> +				     int (*fn)(void *), void *fn_arg)
> +{
> +	schedule_tail(prev);
> +
> +	/* Is this a kernel thread? */
> +	if (unlikely(fn)) {
> +		fn(fn_arg);

This is the related code but it does not include the annotation
about the end of the stack.

Honestly, I am not familiar with the stack unwinder and how this is
supposed to work.

I hope that Josh or anyone else might know better.

> +		/*
> +		 * A kernel thread is allowed to return here after successfully
> +		 * calling kernel_execve().  Exit to userspace to complete the
> +		 * execve() syscall.
> +		 */
> +		regs->ax = 0;
> +	}
> +
> +	syscall_exit_to_user_mode(regs);
> +}
> +
>  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  {
>  	unsigned long clone_flags = args->flags;

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-07-19 15:21   ` [PATCH v2 2/2] " Petr Mladek
@ 2023-07-19 20:02     ` Peter Zijlstra
  2023-07-19 20:15       ` Peter Zijlstra
  2023-07-19 20:33     ` [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C Joe Lawrence
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2023-07-19 20:02 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Brian Gerst, jpoimboe, linux-kernel, x86, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, Sami Tolvanen, alyssa.milburn,
	keescook, joao, tim.c.chen, live-patching

On Wed, Jul 19, 2023 at 05:21:11PM +0200, Petr Mladek wrote:

> This patch broke livepatching. Kthreads never have a reliable stack.
> It works when I revert it.

> > +SYM_CODE_START(ret_from_fork_asm)
> > +	UNWIND_HINT_REGS

It works again when I change the above hint to UNWIND_HINT_END_OF_STACK,
so yeah. Doing this makes objtool unhappy with something else though,
so I'll go prod at things with something sharp...

Thanks!

> >  	ANNOTATE_NOENDBR // copy_thread
> >  	CALL_DEPTH_ACCOUNT
> >  
> > +	movq	%rax, %rdi		/* prev */
> > +	movq	%rsp, %rsi		/* regs */
> > +	movq	%rbx, %rdx		/* fn */
> > +	movq	%r12, %rcx		/* fn_arg */
> > +	call	ret_from_fork
> >  
> > +SYM_CODE_END(ret_from_fork_asm)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-07-19 20:02     ` Peter Zijlstra
@ 2023-07-19 20:15       ` Peter Zijlstra
  2023-07-19 20:50         ` Peter Zijlstra
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Peter Zijlstra @ 2023-07-19 20:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Brian Gerst, jpoimboe, linux-kernel, x86, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, Sami Tolvanen, alyssa.milburn,
	keescook, joao, tim.c.chen, live-patching

On Wed, Jul 19, 2023 at 10:02:22PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 19, 2023 at 05:21:11PM +0200, Petr Mladek wrote:
> 
> > This patch broke livepatching. Kthreads never have a reliable stack.
> > It works when I revert it.
> 
> > > +SYM_CODE_START(ret_from_fork_asm)
> > > +	UNWIND_HINT_REGS
> 
> It works again when I change the above hint to UNWIND_HINT_END_OF_STACK,
> so yeah. Doing this makes objtool unhappy with something else though,
> so I'll go prod at things with something sharp...


The below cures things; Josh, did I miss anything?

---
 arch/x86/entry/entry_64.S | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 91f6818884fa..cfe7882ea9ae 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -285,7 +285,14 @@ SYM_FUNC_END(__switch_to_asm)
  */
 .pushsection .text, "ax"
 SYM_CODE_START(ret_from_fork_asm)
-	UNWIND_HINT_REGS
+	/*
+	 * This is the start of the kernel stack; even through there's a regs
+	 * set at the top, there is no real exception frame and one cannot
+	 * unwind further. This is the end.
+	 *
+	 * This ensures stack unwinds of kernel threads hit a known good state.
+	 */
+	UNWIND_HINT_END_OF_STACK
 	ANNOTATE_NOENDBR // copy_thread
 	CALL_DEPTH_ACCOUNT
 
@@ -295,6 +302,11 @@ SYM_CODE_START(ret_from_fork_asm)
 	movq	%r12, %rcx		/* fn_arg */
 	call	ret_from_fork
 
+	/*
+	 * Set the stack state to what is expected for the target function
+	 * -- also it is not wrong.
+	 */
+	UNWIND_HINT_REGS
 	jmp	swapgs_restore_regs_and_return_to_usermode
 SYM_CODE_END(ret_from_fork_asm)
 .popsection

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-07-19 15:21   ` [PATCH v2 2/2] " Petr Mladek
  2023-07-19 20:02     ` Peter Zijlstra
@ 2023-07-19 20:33     ` Joe Lawrence
  2023-07-19 20:41       ` Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Joe Lawrence @ 2023-07-19 20:33 UTC (permalink / raw)
  To: Petr Mladek, Brian Gerst, jpoimboe
  Cc: linux-kernel, x86, Thomas Gleixner, Borislav Petkov,
	H . Peter Anvin, Peter Zijlstra, Sami Tolvanen, alyssa.milburn,
	keescook, joao, tim.c.chen, live-patching

On 7/19/23 11:21, Petr Mladek wrote:
> On Fri 2023-06-23 18:55:29, Brian Gerst wrote:
>> When kCFI is enabled, special handling is needed for the indirect call
>> to the kernel thread function.  Rewrite the ret_from_fork() function in
>> C so that the compiler can properly handle the indirect call.
> 
> This patch broke livepatching. Kthreads never have a reliable stack.
> It works when I revert it.
> 

Just curious -- did the selftests catch this anywhere?  I'm not 100%
clear on what trees / frequency they all run, so maybe Petr you found
this by code inspection or other means?

-- 
Joe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-07-19 20:33     ` [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C Joe Lawrence
@ 2023-07-19 20:41       ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2023-07-19 20:41 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: Petr Mladek, Brian Gerst, jpoimboe, linux-kernel, x86,
	Thomas Gleixner, Borislav Petkov, H . Peter Anvin, Sami Tolvanen,
	alyssa.milburn, keescook, joao, tim.c.chen, live-patching

On Wed, Jul 19, 2023 at 04:33:26PM -0400, Joe Lawrence wrote:
> On 7/19/23 11:21, Petr Mladek wrote:
> > On Fri 2023-06-23 18:55:29, Brian Gerst wrote:
> >> When kCFI is enabled, special handling is needed for the indirect call
> >> to the kernel thread function.  Rewrite the ret_from_fork() function in
> >> C so that the compiler can properly handle the indirect call.
> > 
> > This patch broke livepatching. Kthreads never have a reliable stack.
> > It works when I revert it.
> > 
> 
> Just curious -- did the selftests catch this anywhere?  I'm not 100%
> clear on what trees / frequency they all run, so maybe Petr you found
> this by code inspection or other means?

I suspect Petr ran the selftests himself, they're fairly easy to run
(once you figure out the magic incantation) and insta fail.

I'm not sure the robots consistently run this stuff -- I've had these
patches exposed to 0day for weeks...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-07-19 20:15       ` Peter Zijlstra
@ 2023-07-19 20:50         ` Peter Zijlstra
  2023-07-19 23:31           ` Josh Poimboeuf
  2023-07-20  8:18         ` Petr Mladek
  2023-07-21  9:20         ` [tip: x86/urgent] x86: Fix kthread unwind tip-bot2 for Peter Zijlstra
  2 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2023-07-19 20:50 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Brian Gerst, jpoimboe, linux-kernel, x86, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, Sami Tolvanen, alyssa.milburn,
	keescook, joao, tim.c.chen, live-patching

On Wed, Jul 19, 2023 at 10:15:38PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 19, 2023 at 10:02:22PM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 19, 2023 at 05:21:11PM +0200, Petr Mladek wrote:
> > 
> > > This patch broke livepatching. Kthreads never have a reliable stack.
> > > It works when I revert it.
> > 
> > > > +SYM_CODE_START(ret_from_fork_asm)
> > > > +	UNWIND_HINT_REGS
> > 
> > It works again when I change the above hint to UNWIND_HINT_END_OF_STACK,
> > so yeah. Doing this makes objtool unhappy with something else though,
> > so I'll go prod at things with something sharp...
> 
> 
> The below cures things; Josh, did I miss anything?
> 
> ---
>  arch/x86/entry/entry_64.S | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 91f6818884fa..cfe7882ea9ae 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -285,7 +285,14 @@ SYM_FUNC_END(__switch_to_asm)
>   */
>  .pushsection .text, "ax"
>  SYM_CODE_START(ret_from_fork_asm)
> -	UNWIND_HINT_REGS
> +	/*
> +	 * This is the start of the kernel stack; even through there's a regs
> +	 * set at the top, there is no real exception frame and one cannot
> +	 * unwind further. This is the end.
> +	 *
> +	 * This ensures stack unwinds of kernel threads hit a known good state.
> +	 */
> +	UNWIND_HINT_END_OF_STACK

So unwind_orc.c:unwind_next_frame() will terminate on this hint *or* on
user_mode(state->regs).

AFAICT way things are set up in copy_thread(), user_mode() will not be
true -- after all there is no usermode, the kthread would first have to
exec() something to create a usermode.

Yet I'm wondering if perhaps we should spoof the regs to make
user_mode() true and auto-terminate without this explicit hint.

Josh, do you remember the rationale for all this?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-07-19 20:50         ` Peter Zijlstra
@ 2023-07-19 23:31           ` Josh Poimboeuf
  2023-07-20  5:22             ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Josh Poimboeuf @ 2023-07-19 23:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Brian Gerst, linux-kernel, x86, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, Sami Tolvanen, alyssa.milburn,
	keescook, joao, tim.c.chen, live-patching

On Wed, Jul 19, 2023 at 10:50:50PM +0200, Peter Zijlstra wrote:
> > The below cures things; Josh, did I miss anything?
> > 
> > ---
> >  arch/x86/entry/entry_64.S | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 91f6818884fa..cfe7882ea9ae 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -285,7 +285,14 @@ SYM_FUNC_END(__switch_to_asm)
> >   */
> >  .pushsection .text, "ax"
> >  SYM_CODE_START(ret_from_fork_asm)
> > -	UNWIND_HINT_REGS
> > +	/*
> > +	 * This is the start of the kernel stack; even through there's a regs
> > +	 * set at the top, there is no real exception frame and one cannot
> > +	 * unwind further. This is the end.
> > +	 *
> > +	 * This ensures stack unwinds of kernel threads hit a known good state.
> > +	 */
> > +	UNWIND_HINT_END_OF_STACK

The comments may be a bit superfluous (to me at least) but the patch
looks fine.

> So unwind_orc.c:unwind_next_frame() will terminate on this hint *or* on
> user_mode(state->regs).
> 
> AFAICT way things are set up in copy_thread(), user_mode() will not be
> true -- after all there is no usermode, the kthread would first have to
> exec() something to create a usermode.
> 
> Yet I'm wondering if perhaps we should spoof the regs to make
> user_mode() true and auto-terminate without this explicit hint.

I'm not sure that would be worth the trouble / cleverness.  The hint is
straightforward IMO.

> Josh, do you remember the rationale for all this?

For what exactly :-)

-- 
Josh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-07-19 23:31           ` Josh Poimboeuf
@ 2023-07-20  5:22             ` Peter Zijlstra
  2023-07-20  9:28               ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2023-07-20  5:22 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, Brian Gerst, linux-kernel, x86, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, Sami Tolvanen, alyssa.milburn,
	keescook, joao, tim.c.chen, live-patching

On Wed, Jul 19, 2023 at 04:31:11PM -0700, Josh Poimboeuf wrote:
> On Wed, Jul 19, 2023 at 10:50:50PM +0200, Peter Zijlstra wrote:
> > > The below cures things; Josh, did I miss anything?
> > > 
> > > ---
> > >  arch/x86/entry/entry_64.S | 14 +++++++++++++-
> > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > index 91f6818884fa..cfe7882ea9ae 100644
> > > --- a/arch/x86/entry/entry_64.S
> > > +++ b/arch/x86/entry/entry_64.S
> > > @@ -285,7 +285,14 @@ SYM_FUNC_END(__switch_to_asm)
> > >   */
> > >  .pushsection .text, "ax"
> > >  SYM_CODE_START(ret_from_fork_asm)
> > > -	UNWIND_HINT_REGS
> > > +	/*
> > > +	 * This is the start of the kernel stack; even through there's a regs
> > > +	 * set at the top, there is no real exception frame and one cannot
> > > +	 * unwind further. This is the end.
> > > +	 *
> > > +	 * This ensures stack unwinds of kernel threads hit a known good state.
> > > +	 */
> > > +	UNWIND_HINT_END_OF_STACK
> 
> The comments may be a bit superfluous (to me at least) but the patch
> looks fine.

Right, well, it took me a minute to figure out how it was all supposed
to work, I figured I'd stick a comment on it.

The bit I missed is that if you reach the return-to-user part, you will
actually have user_mode() true on the regset.

> > So unwind_orc.c:unwind_next_frame() will terminate on this hint *or* on
> > user_mode(state->regs).
> > 
> > AFAICT way things are set up in copy_thread(), user_mode() will not be
> > true -- after all there is no usermode, the kthread would first have to
> > exec() something to create a usermode.
> > 
> > Yet I'm wondering if perhaps we should spoof the regs to make
> > user_mode() true and auto-terminate without this explicit hint.
> 
> I'm not sure that would be worth the trouble / cleverness.  The hint is
> straightforward IMO.

I tried, it doesn't work, clearly I missed something.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-07-19 20:15       ` Peter Zijlstra
  2023-07-19 20:50         ` Peter Zijlstra
@ 2023-07-20  8:18         ` Petr Mladek
  2023-07-21  9:20         ` [tip: x86/urgent] x86: Fix kthread unwind tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 16+ messages in thread
From: Petr Mladek @ 2023-07-20  8:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Brian Gerst, jpoimboe, linux-kernel, x86, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, Sami Tolvanen, alyssa.milburn,
	keescook, joao, tim.c.chen, live-patching

On Wed 2023-07-19 22:15:38, Peter Zijlstra wrote:
> On Wed, Jul 19, 2023 at 10:02:22PM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 19, 2023 at 05:21:11PM +0200, Petr Mladek wrote:
> > 
> > > This patch broke livepatching. Kthreads never have a reliable stack.
> > > It works when I revert it.
> > 
> > > > +SYM_CODE_START(ret_from_fork_asm)
> > > > +	UNWIND_HINT_REGS
> > 
> > It works again when I change the above hint to UNWIND_HINT_END_OF_STACK,
> > so yeah. Doing this makes objtool unhappy with something else though,
> > so I'll go prod at things with something sharp...
> 
> 
> The below cures things; Josh, did I miss anything?

I can confirm that it solved the problem. Feel free to use:

Tested-by: Petr Mladek <pmladek@suse.com>

Thanks a lot for the quick fix.

Best Regards,
Petr

> ---
>  arch/x86/entry/entry_64.S | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 91f6818884fa..cfe7882ea9ae 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -285,7 +285,14 @@ SYM_FUNC_END(__switch_to_asm)
>   */
>  .pushsection .text, "ax"
>  SYM_CODE_START(ret_from_fork_asm)
> -	UNWIND_HINT_REGS
> +	/*
> +	 * This is the start of the kernel stack; even through there's a regs
> +	 * set at the top, there is no real exception frame and one cannot
> +	 * unwind further. This is the end.
> +	 *
> +	 * This ensures stack unwinds of kernel threads hit a known good state.
> +	 */
> +	UNWIND_HINT_END_OF_STACK
>  	ANNOTATE_NOENDBR // copy_thread
>  	CALL_DEPTH_ACCOUNT
>  
> @@ -295,6 +302,11 @@ SYM_CODE_START(ret_from_fork_asm)
>  	movq	%r12, %rcx		/* fn_arg */
>  	call	ret_from_fork
>  
> +	/*
> +	 * Set the stack state to what is expected for the target function
> +	 * -- also it is not wrong.
> +	 */
> +	UNWIND_HINT_REGS
>  	jmp	swapgs_restore_regs_and_return_to_usermode
>  SYM_CODE_END(ret_from_fork_asm)
>  .popsection

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C
  2023-07-20  5:22             ` Peter Zijlstra
@ 2023-07-20  9:28               ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2023-07-20  9:28 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, Brian Gerst, linux-kernel, x86, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, Sami Tolvanen, alyssa.milburn,
	keescook, joao, tim.c.chen, live-patching

On Thu, Jul 20, 2023 at 07:22:08AM +0200, Peter Zijlstra wrote:

> > I'm not sure that would be worth the trouble / cleverness.  The hint is
> > straightforward IMO.
> 
> I tried, it doesn't work, clearly I missed something.

FWIW, I tried the below. That should make user_mode() true for the
kernel thread regset, and while the kernel did boot, it still fails the
livepatch self-test.

The difference seems to be that END_OF_STACK terminates it right there,
while REGS thinks its a valid frame and only terminates on user_mode()
when unwinding one more frame. The frame at REGS clearly isn't very
sane.


diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 72015dba72ab..45a400b16b80 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -232,6 +232,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		 * It does the same kernel frame setup to return to a kernel
 		 * function that a kernel thread does.
 		 */
+		childregs->cs = 3;
 		childregs->sp = 0;
 		childregs->ip = 0;
 		kthread_frame_init(frame, args->fn, args->fn_arg);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip: x86/urgent] x86: Fix kthread unwind
  2023-07-19 20:15       ` Peter Zijlstra
  2023-07-19 20:50         ` Peter Zijlstra
  2023-07-20  8:18         ` Petr Mladek
@ 2023-07-21  9:20         ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-07-21  9:20 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Petr Mladek, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     2e7e5bbb1c3c8d502edeb5c0670eac4995134b6f
Gitweb:        https://git.kernel.org/tip/2e7e5bbb1c3c8d502edeb5c0670eac4995134b6f
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 19 Jul 2023 22:15:38 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 20 Jul 2023 23:03:50 +02:00

x86: Fix kthread unwind

The rewrite of ret_from_form() misplaced an unwind hint which caused
all kthread stack unwinds to be marked unreliable, breaking
livepatching.

Restore the annotation and add a comment to explain the how and why of
things.

Fixes: 3aec4ecb3d1f ("x86: Rewrite ret_from_fork() in C")
Reported-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://lkml.kernel.org/r/20230719201538.GA3553016@hirez.programming.kicks-ass.net
---
 arch/x86/entry/entry_64.S | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 91f6818..43606de 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -285,7 +285,15 @@ SYM_FUNC_END(__switch_to_asm)
  */
 .pushsection .text, "ax"
 SYM_CODE_START(ret_from_fork_asm)
-	UNWIND_HINT_REGS
+	/*
+	 * This is the start of the kernel stack; even through there's a
+	 * register set at the top, the regset isn't necessarily coherent
+	 * (consider kthreads) and one cannot unwind further.
+	 *
+	 * This ensures stack unwinds of kernel threads terminate in a known
+	 * good state.
+	 */
+	UNWIND_HINT_END_OF_STACK
 	ANNOTATE_NOENDBR // copy_thread
 	CALL_DEPTH_ACCOUNT
 
@@ -295,6 +303,12 @@ SYM_CODE_START(ret_from_fork_asm)
 	movq	%r12, %rcx		/* fn_arg */
 	call	ret_from_fork
 
+	/*
+	 * Set the stack state to what is expected for the target function
+	 * -- at this point the register set should be a valid user set
+	 * and unwind should work normally.
+	 */
+	UNWIND_HINT_REGS
 	jmp	swapgs_restore_regs_and_return_to_usermode
 SYM_CODE_END(ret_from_fork_asm)
 .popsection

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-07-21  9:21 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-23 22:55 [PATCH v2 0/2] Rewrite ret_from_fork() in C Brian Gerst
2023-06-23 22:55 ` [PATCH v2 1/2] x86/32: Remove schedule_tail_wrapper() Brian Gerst
2023-07-10  8:13   ` [tip: x86/urgent] " tip-bot2 for Brian Gerst
2023-06-23 22:55 ` [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C Brian Gerst
2023-07-10  8:13   ` [tip: x86/urgent] " tip-bot2 for Brian Gerst
2023-07-19 15:21   ` [PATCH v2 2/2] " Petr Mladek
2023-07-19 20:02     ` Peter Zijlstra
2023-07-19 20:15       ` Peter Zijlstra
2023-07-19 20:50         ` Peter Zijlstra
2023-07-19 23:31           ` Josh Poimboeuf
2023-07-20  5:22             ` Peter Zijlstra
2023-07-20  9:28               ` Peter Zijlstra
2023-07-20  8:18         ` Petr Mladek
2023-07-21  9:20         ` [tip: x86/urgent] x86: Fix kthread unwind tip-bot2 for Peter Zijlstra
2023-07-19 20:33     ` [PATCH v2 2/2] x86: Rewrite ret_from_fork() in C Joe Lawrence
2023-07-19 20:41       ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).