linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] x86: Rewrite switch_to()
@ 2016-06-18 20:56 Brian Gerst
  2016-06-18 20:56 ` [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Thomas Gleixner

This patch set simplifies the switch_to() code, by moving the stack switch
code out of line into an asm stub before calling __switch_to().  This ends
up being more readable, and using the C calling convention instead of
clobbering all registers improves code generation.  It also allows newly
forked processes to construct a special stack frame to seamlessly flow
to ret_from_fork, instead of using a test and branch, or an unbalanced
call/ret.

Changes from v1:
- Added struct inactive_task_frame
- Added comments about kernel threads returning to userspace
- Cleaned up some incorrect uses of thread.sp

Brian Gerst (6):
      x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()
      x86-64, kgdb: clear GDB_PS on 64-bit
      x86: Add struct inactive_task_frame
      x86: Rewrite switch_to() code
      x86: Pass kernel thread parameters in fork_frame
      x86: Fix thread_saved_pc()

 arch/x86/entry/entry_32.S          |  68 +++++++++++++-----
 arch/x86/entry/entry_64.S          |  76 ++++++++++++++------
 arch/x86/include/asm/processor.h   |  13 +---
 arch/x86/include/asm/stacktrace.h  |   4 +-
 arch/x86/include/asm/switch_to.h   | 142 +++++++------------------------------
 arch/x86/include/asm/thread_info.h |   2 -
 arch/x86/kernel/asm-offsets.c      |   6 ++
 arch/x86/kernel/asm-offsets_32.c   |   5 ++
 arch/x86/kernel/asm-offsets_64.c   |   5 ++
 arch/x86/kernel/kgdb.c             |   8 +--
 arch/x86/kernel/process.c          |  13 +++-
 arch/x86/kernel/process_32.c       |  29 +++-----
 arch/x86/kernel/process_64.c       |  19 ++---
 arch/x86/kernel/smpboot.c          |   1 -
 14 files changed, 187 insertions(+), 204 deletions(-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()
  2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
  2016-06-18 20:56 ` [PATCH v2 2/6] x86-64, kgdb: clear GDB_PS on 64-bit Brian Gerst
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Thomas Gleixner

Match 64-bit and set gdb_regs[GDB_PC] to zero.  thread.ip is always the
same point in the scheduler (except for newly forked processes), and will
be removed in a future patch.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/kernel/kgdb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 04cde52..fe649a5 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -172,7 +172,6 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_ES]	= __KERNEL_DS;
 	gdb_regs[GDB_PS]	= 0;
 	gdb_regs[GDB_CS]	= __KERNEL_CS;
-	gdb_regs[GDB_PC]	= p->thread.ip;
 	gdb_regs[GDB_SS]	= __KERNEL_DS;
 	gdb_regs[GDB_FS]	= 0xFFFF;
 	gdb_regs[GDB_GS]	= 0xFFFF;
@@ -180,7 +179,6 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs32[GDB_PS]	= *(unsigned long *)(p->thread.sp + 8);
 	gdb_regs32[GDB_CS]	= __KERNEL_CS;
 	gdb_regs32[GDB_SS]	= __KERNEL_DS;
-	gdb_regs[GDB_PC]	= 0;
 	gdb_regs[GDB_R8]	= 0;
 	gdb_regs[GDB_R9]	= 0;
 	gdb_regs[GDB_R10]	= 0;
@@ -190,6 +188,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_R14]	= 0;
 	gdb_regs[GDB_R15]	= 0;
 #endif
+	gdb_regs[GDB_PC]	= 0;
 	gdb_regs[GDB_SP]	= p->thread.sp;
 }
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/6] x86-64, kgdb: clear GDB_PS on 64-bit
  2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
  2016-06-18 20:56 ` [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
  2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Thomas Gleixner

switch_to() no longer saves EFLAGS, so it's bogus to look for it on the
stack.  Set it to zero like 32-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/kernel/kgdb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index fe649a5..5e3f294 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -176,7 +176,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_FS]	= 0xFFFF;
 	gdb_regs[GDB_GS]	= 0xFFFF;
 #else
-	gdb_regs32[GDB_PS]	= *(unsigned long *)(p->thread.sp + 8);
+	gdb_regs32[GDB_PS]	= 0;
 	gdb_regs32[GDB_CS]	= __KERNEL_CS;
 	gdb_regs32[GDB_SS]	= __KERNEL_DS;
 	gdb_regs[GDB_R8]	= 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/6] x86: Add struct inactive_task_frame
  2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
  2016-06-18 20:56 ` [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
  2016-06-18 20:56 ` [PATCH v2 2/6] x86-64, kgdb: clear GDB_PS on 64-bit Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
  2016-06-19 21:18   ` Andy Lutomirski
  2016-06-20 15:39   ` Josh Poimboeuf
  2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Thomas Gleixner

Add struct inactive_task_frame, which defines the layout of the stack for
a sleeping process.  For now, the only defined field is the BP register
(frame pointer).

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/stacktrace.h | 4 ++--
 arch/x86/include/asm/switch_to.h  | 5 +++++
 arch/x86/kernel/kgdb.c            | 3 ++-
 arch/x86/kernel/process.c         | 3 ++-
 4 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 7c247e7..fb4a078 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -8,6 +8,7 @@
 
 #include <linux/uaccess.h>
 #include <linux/ptrace.h>
+#include <asm/switch_to.h>
 
 extern int kstack_depth_to_print;
 
@@ -70,8 +71,7 @@ stack_frame(struct task_struct *task, struct pt_regs *regs)
 		return bp;
 	}
 
-	/* bp is the last reg pushed by switch_to */
-	return *(unsigned long *)task->thread.sp;
+	return ((struct inactive_task_frame *)task->thread.sp)->bp;
 }
 #else
 static inline unsigned long
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 8f321a1..02de86e 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,11 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);
 
+/* data that is pointed to by thread.sp */
+struct inactive_task_frame {
+	unsigned long bp;
+};
+
 #ifdef CONFIG_X86_32
 
 #ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 5e3f294..8e36f24 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -50,6 +50,7 @@
 #include <asm/apicdef.h>
 #include <asm/apic.h>
 #include <asm/nmi.h>
+#include <asm/switch_to.h>
 
 struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
 {
@@ -166,7 +167,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_DX]	= 0;
 	gdb_regs[GDB_SI]	= 0;
 	gdb_regs[GDB_DI]	= 0;
-	gdb_regs[GDB_BP]	= *(unsigned long *)p->thread.sp;
+	gdb_regs[GDB_BP]	= ((struct inactive_task_frame *)p->thread.sp)->bp;
 #ifdef CONFIG_X86_32
 	gdb_regs[GDB_DS]	= __KERNEL_DS;
 	gdb_regs[GDB_ES]	= __KERNEL_DS;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 96becbb..00ebab0 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -31,6 +31,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/vm86.h>
+#include <asm/switch_to.h>
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -555,7 +556,7 @@ unsigned long get_wchan(struct task_struct *p)
 	if (sp < bottom || sp > top)
 		return 0;
 
-	fp = READ_ONCE_NOCHECK(*(unsigned long *)sp);
+	fp = READ_ONCE_NOCHECK(((struct inactive_task_frame *)sp)->bp);
 	do {
 		if (fp < bottom || fp > top)
 			return 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/6] x86: Rewrite switch_to() code
  2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
                   ` (2 preceding siblings ...)
  2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
  2016-06-19 21:22   ` Andy Lutomirski
  2016-06-20 15:44   ` Josh Poimboeuf
  2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Thomas Gleixner

Move the low-level context switch code to an out-of-line asm stub instead of
using complex inline asm.  This allows constructing a new stack frame for the
child process to make it seamlessly flow to ret_from_fork without an extra
test and branch in __switch_to().  It also improves code generation for
__schedule() by using the C calling convention instead of clobbering all
registers.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/entry_32.S          |  37 ++++++++++
 arch/x86/entry/entry_64.S          |  41 ++++++++++-
 arch/x86/include/asm/processor.h   |   3 -
 arch/x86/include/asm/switch_to.h   | 137 ++++++-------------------------------
 arch/x86/include/asm/thread_info.h |   2 -
 arch/x86/kernel/asm-offsets.c      |   6 ++
 arch/x86/kernel/asm-offsets_32.c   |   5 ++
 arch/x86/kernel/asm-offsets_64.c   |   5 ++
 arch/x86/kernel/process_32.c       |   9 ++-
 arch/x86/kernel/process_64.c       |   9 ++-
 arch/x86/kernel/smpboot.c          |   1 -
 11 files changed, 125 insertions(+), 130 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 983e5d3..e8302b5 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -204,6 +204,43 @@
 	POP_GS_EX
 .endm
 
+/*
+ * %eax: prev task
+ * %edx: next task
+ */
+ENTRY(__switch_to_asm)
+	/*
+	 * Save callee-saved registers
+	 * This must match the order in struct inactive_task_frame
+	 */
+	pushl	%ebp
+	pushl	%ebx
+	pushl	%edi
+	pushl	%esi
+
+	/* switch stack */
+	movl	%esp, TASK_threadsp(%eax)
+	movl	TASK_threadsp(%edx), %esp
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+	movl	TASK_stack_canary(%edx), %ebx
+	movl	%ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
+#endif
+
+	/* restore callee-saved registers */
+	popl	%esi
+	popl	%edi
+	popl	%ebx
+	popl	%ebp
+
+	jmp	__switch_to
+END(__switch_to_asm)
+
+/*
+ * A newly forked process directly context switches into this address.
+ *
+ * eax: prev task we switched from
+ */
 ENTRY(ret_from_fork)
 	pushl	%eax
 	call	schedule_tail
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9ee0da1..7574528 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -365,13 +365,48 @@ END(ptregs_\func)
 #include <asm/syscalls_64.h>
 
 /*
+ * %rdi: prev task
+ * %rsi: next task
+ */
+ENTRY(__switch_to_asm)
+	/*
+	 * Save callee-saved registers
+	 * This must match the order in inactive_task_frame
+	 */
+	pushq	%rbp
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+
+	/* switch stack */
+	movq	%rsp, TASK_threadsp(%rdi)
+	movq	TASK_threadsp(%rsi), %rsp
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+	movq	TASK_stack_canary(%rsi), %rbx
+	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+#endif
+
+	/* restore callee-saved registers */
+	popq	%r15
+	popq	%r14
+	popq	%r13
+	popq	%r12
+	popq	%rbx
+	popq	%rbp
+
+	jmp	__switch_to
+END(__switch_to_asm)
+
+/*
  * A newly forked process directly context switches into this address.
  *
- * rdi: prev task we switched from
+ * rax: prev task we switched from
  */
 ENTRY(ret_from_fork)
-	LOCK ; btr $TIF_FORK, TI_flags(%r8)
-
+	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
 	testb	$3, CS(%rsp)			/* from kernel_thread? */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 965c5d2..1e7d634 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -385,9 +385,6 @@ struct thread_struct {
 	unsigned short		fsindex;
 	unsigned short		gsindex;
 #endif
-#ifdef CONFIG_X86_32
-	unsigned long		ip;
-#endif
 #ifdef CONFIG_X86_64
 	unsigned long		fsbase;
 	unsigned long		gsbase;
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 02de86e..bf4e2ec 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -2,135 +2,40 @@
 #define _ASM_X86_SWITCH_TO_H
 
 struct task_struct; /* one of the stranger aspects of C forward declarations */
+
+struct task_struct *__switch_to_asm(struct task_struct *prev,
+				    struct task_struct *next);
+
 __visible struct task_struct *__switch_to(struct task_struct *prev,
-					   struct task_struct *next);
+					  struct task_struct *next);
 struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);
 
 /* data that is pointed to by thread.sp */
 struct inactive_task_frame {
+#ifdef CONFIG_X86_64
+	unsigned long r15;
+	unsigned long r14;
+	unsigned long r13;
+	unsigned long r12;
+#else
+	unsigned long si;
+	unsigned long di;
+#endif
+	unsigned long bx;
 	unsigned long bp;
+	unsigned long ret_addr;
 };
 
-#ifdef CONFIG_X86_32
-
-#ifdef CONFIG_CC_STACKPROTECTOR
-#define __switch_canary							\
-	"movl %P[task_canary](%[next]), %%ebx\n\t"			\
-	"movl %%ebx, "__percpu_arg([stack_canary])"\n\t"
-#define __switch_canary_oparam						\
-	, [stack_canary] "=m" (stack_canary.canary)
-#define __switch_canary_iparam						\
-	, [task_canary] "i" (offsetof(struct task_struct, stack_canary))
-#else	/* CC_STACKPROTECTOR */
-#define __switch_canary
-#define __switch_canary_oparam
-#define __switch_canary_iparam
-#endif	/* CC_STACKPROTECTOR */
+struct fork_frame {
+	struct inactive_task_frame frame;
+	struct pt_regs regs;
+};
 
-/*
- * Saving eflags is important. It switches not only IOPL between tasks,
- * it also protects other tasks from NT leaking through sysenter etc.
- */
 #define switch_to(prev, next, last)					\
 do {									\
-	/*								\
-	 * Context-switching clobbers all registers, so we clobber	\
-	 * them explicitly, via unused output variables.		\
-	 * (EAX and EBP is not listed because EBP is saved/restored	\
-	 * explicitly for wchan access and EAX is the return value of	\
-	 * __switch_to())						\
-	 */								\
-	unsigned long ebx, ecx, edx, esi, edi;				\
-									\
-	asm volatile("pushl %%ebp\n\t"		/* save    EBP   */	\
-		     "movl %%esp,%[prev_sp]\n\t"	/* save    ESP   */ \
-		     "movl %[next_sp],%%esp\n\t"	/* restore ESP   */ \
-		     "movl $1f,%[prev_ip]\n\t"	/* save    EIP   */	\
-		     "pushl %[next_ip]\n\t"	/* restore EIP   */	\
-		     __switch_canary					\
-		     "jmp __switch_to\n"	/* regparm call  */	\
-		     "1:\t"						\
-		     "popl %%ebp\n\t"		/* restore EBP   */	\
-									\
-		     /* output parameters */				\
-		     : [prev_sp] "=m" (prev->thread.sp),		\
-		       [prev_ip] "=m" (prev->thread.ip),		\
-		       "=a" (last),					\
-									\
-		       /* clobbered output registers: */		\
-		       "=b" (ebx), "=c" (ecx), "=d" (edx),		\
-		       "=S" (esi), "=D" (edi)				\
-		       							\
-		       __switch_canary_oparam				\
-									\
-		       /* input parameters: */				\
-		     : [next_sp]  "m" (next->thread.sp),		\
-		       [next_ip]  "m" (next->thread.ip),		\
-		       							\
-		       /* regparm parameters for __switch_to(): */	\
-		       [prev]     "a" (prev),				\
-		       [next]     "d" (next)				\
-									\
-		       __switch_canary_iparam				\
-									\
-		     : /* reloaded segment registers */			\
-			"memory");					\
+	((last) = __switch_to_asm((prev), (next)));			\
 } while (0)
 
-#else /* CONFIG_X86_32 */
-
-/* frame pointer must be last for get_wchan */
-#define SAVE_CONTEXT    "pushq %%rbp ; movq %%rsi,%%rbp\n\t"
-#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp\t"
-
-#define __EXTRA_CLOBBER  \
-	, "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
-	  "r12", "r13", "r14", "r15", "flags"
-
-#ifdef CONFIG_CC_STACKPROTECTOR
-#define __switch_canary							  \
-	"movq %P[task_canary](%%rsi),%%r8\n\t"				  \
-	"movq %%r8,"__percpu_arg([gs_canary])"\n\t"
-#define __switch_canary_oparam						  \
-	, [gs_canary] "=m" (irq_stack_union.stack_canary)
-#define __switch_canary_iparam						  \
-	, [task_canary] "i" (offsetof(struct task_struct, stack_canary))
-#else	/* CC_STACKPROTECTOR */
-#define __switch_canary
-#define __switch_canary_oparam
-#define __switch_canary_iparam
-#endif	/* CC_STACKPROTECTOR */
-
-/*
- * There is no need to save or restore flags, because flags are always
- * clean in kernel mode, with the possible exception of IOPL.  Kernel IOPL
- * has no effect.
- */
-#define switch_to(prev, next, last) \
-	asm volatile(SAVE_CONTEXT					  \
-	     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */	  \
-	     "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */	  \
-	     "call __switch_to\n\t"					  \
-	     "movq "__percpu_arg([current_task])",%%rsi\n\t"		  \
-	     __switch_canary						  \
-	     "movq %P[thread_info](%%rsi),%%r8\n\t"			  \
-	     "movq %%rax,%%rdi\n\t" 					  \
-	     "testl  %[_tif_fork],%P[ti_flags](%%r8)\n\t"		  \
-	     "jnz   ret_from_fork\n\t"					  \
-	     RESTORE_CONTEXT						  \
-	     : "=a" (last)					  	  \
-	       __switch_canary_oparam					  \
-	     : [next] "S" (next), [prev] "D" (prev),			  \
-	       [threadrsp] "i" (offsetof(struct task_struct, thread.sp)), \
-	       [ti_flags] "i" (offsetof(struct thread_info, flags)),	  \
-	       [_tif_fork] "i" (_TIF_FORK),			  	  \
-	       [thread_info] "i" (offsetof(struct task_struct, stack)),   \
-	       [current_task] "m" (current_task)			  \
-	       __switch_canary_iparam					  \
-	     : "memory", "cc" __EXTRA_CLOBBER)
-
-#endif /* CONFIG_X86_32 */
-
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 30c133a..20d56ec 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -99,7 +99,6 @@ struct thread_info {
 #define TIF_UPROBE		12	/* breakpointed or singlestepping */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
-#define TIF_FORK		18	/* ret_from_fork */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -123,7 +122,6 @@ struct thread_info {
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
-#define _TIF_FORK		(1 << TIF_FORK)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 674134e..ec41c79 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -29,6 +29,12 @@
 
 void common(void) {
 	BLANK();
+	OFFSET(TASK_threadsp, task_struct, thread.sp);
+#ifdef CONFIG_CC_STACKPROTECTOR
+	OFFSET(TASK_stack_canary, task_struct, stack_canary);
+#endif
+
+	BLANK();
 	OFFSET(TI_flags, thread_info, flags);
 	OFFSET(TI_status, thread_info, status);
 	OFFSET(TI_addr_limit, thread_info, addr_limit);
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index ecdc1d2..880aa09 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -57,6 +57,11 @@ void foo(void)
 	/* Size of SYSENTER_stack */
 	DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
 
+#ifdef CONFIG_CC_STACKPROTECTOR
+	BLANK();
+	OFFSET(stack_canary_offset, stack_canary, canary);
+#endif
+
 #if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_GUEST) || defined(CONFIG_LGUEST_MODULE)
 	BLANK();
 	OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index d875f97..210927e 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -56,6 +56,11 @@ int main(void)
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
 	BLANK();
 
+#ifdef CONFIG_CC_STACKPROTECTOR
+	DEFINE(stack_canary_offset, offsetof(union irq_stack_union, stack_canary));
+	BLANK();
+#endif
+
 	DEFINE(__NR_syscall_max, sizeof(syscalls_64) - 1);
 	DEFINE(NR_syscalls, sizeof(syscalls_64));
 
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 9f95091..81a82f5 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -133,17 +133,20 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	unsigned long arg, struct task_struct *p, unsigned long tls)
 {
 	struct pt_regs *childregs = task_pt_regs(p);
+	struct fork_frame *fork_frame = container_of(childregs, struct fork_frame, regs);
+	struct inactive_task_frame *frame = &fork_frame->frame;
 	struct task_struct *tsk;
 	int err;
 
-	p->thread.sp = (unsigned long) childregs;
+	frame->bp = 0;
+	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.sp0 = (unsigned long) (childregs+1);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		p->thread.ip = (unsigned long) ret_from_kernel_thread;
+		frame->ret_addr = (unsigned long) ret_from_kernel_thread;
 		task_user_gs(p) = __KERNEL_STACK_CANARY;
 		childregs->ds = __USER_DS;
 		childregs->es = __USER_DS;
@@ -161,7 +164,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (sp)
 		childregs->sp = sp;
 
-	p->thread.ip = (unsigned long) ret_from_fork;
+	frame->ret_addr = (unsigned long) ret_from_fork;
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 
 	p->thread.io_bitmap_ptr = NULL;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6e789ca..3ac9522 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -141,12 +141,17 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 {
 	int err;
 	struct pt_regs *childregs;
+	struct fork_frame *fork_frame;
+	struct inactive_task_frame *frame;
 	struct task_struct *me = current;
 
 	p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
 	childregs = task_pt_regs(p);
-	p->thread.sp = (unsigned long) childregs;
-	set_tsk_thread_flag(p, TIF_FORK);
+	fork_frame = container_of(childregs, struct fork_frame, regs);
+	frame = &fork_frame->frame;
+	frame->bp = 0;
+	frame->ret_addr = (unsigned long) ret_from_fork;
+	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap_ptr = NULL;
 
 	savesegment(gs, p->thread.gsindex);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2ed0ec1..021eb02 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -935,7 +935,6 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
 	per_cpu(cpu_current_top_of_stack, cpu) =
 		(unsigned long)task_stack_page(idle) + THREAD_SIZE;
 #else
-	clear_tsk_thread_flag(idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
 #endif
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
  2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
                   ` (3 preceding siblings ...)
  2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
  2016-06-19 21:28   ` Andy Lutomirski
  2016-06-20 13:51   ` Borislav Petkov
  2016-06-18 20:56 ` [PATCH v2 6/6] x86: Fix thread_saved_pc() Brian Gerst
  2016-06-19 22:05 ` [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
  6 siblings, 2 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Thomas Gleixner

Instead of setting up a fake pt_regs context, put the kernel thread
function pointer and arg into the unused callee-restored registers
of struct fork_frame.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/entry_32.S    | 31 +++++++++++++++----------------
 arch/x86/entry/entry_64.S    | 35 ++++++++++++++++-------------------
 arch/x86/kernel/process_32.c | 14 ++++----------
 arch/x86/kernel/process_64.c | 10 +++-------
 4 files changed, 38 insertions(+), 52 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e8302b5..e8a460d 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -240,35 +240,34 @@ END(__switch_to_asm)
  * A newly forked process directly context switches into this address.
  *
  * eax: prev task we switched from
+ * ebx: kernel thread func
+ * edi: kernel thread arg
  */
 ENTRY(ret_from_fork)
 	pushl	%eax
 	call	schedule_tail
 	popl	%eax
 
+	testl	%ebx, %ebx
+	jnz	1f
+
+2:
 	/* When we fork, we trace the syscall return in the child, too. */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
 	jmp     restore_all
-END(ret_from_fork)
-
-ENTRY(ret_from_kernel_thread)
-	pushl	%eax
-	call	schedule_tail
-	popl	%eax
-	movl	PT_EBP(%esp), %eax
-	call	*PT_EBX(%esp)
-	movl	$0, PT_EAX(%esp)
 
+	/* kernel thread */
+1:	movl	%edi, %eax
+	call	*%ebx
 	/*
-	 * Kernel threads return to userspace as if returning from a syscall.
-	 * We should check whether anything actually uses this path and, if so,
-	 * consider switching it over to ret_from_fork.
+	 * A kernel thread is allowed to return here after successfully
+	 * calling do_execve().  Exit to userspace to complete the execve()
+	 * syscall.
 	 */
-	movl    %esp, %eax
-	call    syscall_return_slowpath
-	jmp     restore_all
-ENDPROC(ret_from_kernel_thread)
+	movl	$0, PT_EAX(%esp)
+	jmp	2b
+END(ret_from_fork)
 
 /*
  * Return to user mode is not as complex as all this looks,
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 7574528..23e764c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -404,37 +404,34 @@ END(__switch_to_asm)
  * A newly forked process directly context switches into this address.
  *
  * rax: prev task we switched from
+ * rbx: kernel thread func
+ * r12: kernel thread arg
  */
 ENTRY(ret_from_fork)
 	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
-	testb	$3, CS(%rsp)			/* from kernel_thread? */
+	testq	%rbx, %rbx			/* from kernel_thread? */
 	jnz	1f
 
-	/*
-	 * We came from kernel_thread.  This code path is quite twisted, and
-	 * someone should clean it up.
-	 *
-	 * copy_thread_tls stashes the function pointer in RBX and the
-	 * parameter to be passed in RBP.  The called function is permitted
-	 * to call do_execve and thereby jump to user mode.
-	 */
-	movq	RBP(%rsp), %rdi
-	call	*RBX(%rsp)
-	movl	$0, RAX(%rsp)
-
-	/*
-	 * Fall through as though we're exiting a syscall.  This makes a
-	 * twisted sort of sense if we just called do_execve.
-	 */
-
-1:
+2:
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	SWAPGS
 	jmp	restore_regs_and_iret
+
+1:
+	/* kernel thread */
+	movq	%r12, %rdi
+	call	*%rbx
+	/*
+	 * A kernel thread is allowed to return here after successfully
+	 * calling do_execve().  Exit to userspace to complete the execve()
+	 * syscall.
+	 */
+	movq	$0, RAX(%rsp)
+	jmp	2b
 END(ret_from_fork)
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 81a82f5..acee6ca 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -139,6 +139,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	int err;
 
 	frame->bp = 0;
+	frame->ret_addr = (unsigned long) ret_from_fork;
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.sp0 = (unsigned long) (childregs+1);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
@@ -146,19 +147,12 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		frame->ret_addr = (unsigned long) ret_from_kernel_thread;
-		task_user_gs(p) = __KERNEL_STACK_CANARY;
-		childregs->ds = __USER_DS;
-		childregs->es = __USER_DS;
-		childregs->fs = __KERNEL_PERCPU;
-		childregs->bx = sp;	/* function */
-		childregs->bp = arg;
-		childregs->orig_ax = -1;
-		childregs->cs = __KERNEL_CS | get_kernel_rpl();
-		childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
+		frame->bx = sp;		/* function */
+		frame->di = arg;
 		p->thread.io_bitmap_ptr = NULL;
 		return 0;
 	}
+	frame->bx = 0;
 	*childregs = *current_pt_regs();
 	childregs->ax = 0;
 	if (sp)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3ac9522..479cae5f 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -165,15 +165,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		childregs->sp = (unsigned long)childregs;
-		childregs->ss = __KERNEL_DS;
-		childregs->bx = sp; /* function */
-		childregs->bp = arg;
-		childregs->orig_ax = -1;
-		childregs->cs = __KERNEL_CS | get_kernel_rpl();
-		childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
+		frame->bx = sp;		/* function */
+		frame->r12 = arg;
 		return 0;
 	}
+	frame->bx = 0;
 	*childregs = *current_pt_regs();
 
 	childregs->ax = 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 6/6] x86: Fix thread_saved_pc()
  2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
                   ` (4 preceding siblings ...)
  2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
  2016-06-20 16:01   ` Josh Poimboeuf
  2016-06-19 22:05 ` [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
  6 siblings, 1 reply; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Thomas Gleixner

thread_saved_pc() was using a completely bogus method to get the return
address.  Since switch_to() was previously inlined, there was no sane way
to know where on the stack the return address was stored.  Now with the
frame of a sleeping thread well defined, this can be implemented correctly.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/processor.h | 10 ++--------
 arch/x86/kernel/process.c        | 10 ++++++++++
 arch/x86/kernel/process_32.c     |  8 --------
 3 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1e7d634..413f4f1 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -716,8 +716,6 @@ static inline void spin_lock_prefetch(const void *x)
 	.io_bitmap_ptr		= NULL,					  \
 }
 
-extern unsigned long thread_saved_pc(struct task_struct *tsk);
-
 /*
  * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
  * This is necessary to guarantee that the entire "struct pt_regs"
@@ -767,17 +765,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
 	.sp0 = TOP_OF_INIT_STACK \
 }
 
-/*
- * Return saved PC of a blocked thread.
- * What is this good for? it will be always the scheduler or ret_from_fork.
- */
-#define thread_saved_pc(t)	READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
-
 #define task_pt_regs(tsk)	((struct pt_regs *)(tsk)->thread.sp0 - 1)
 extern unsigned long KSTK_ESP(struct task_struct *task);
 
 #endif /* CONFIG_X86_64 */
 
+extern unsigned long thread_saved_pc(struct task_struct *tsk);
+
 extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
 					       unsigned long new_sp);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 00ebab0..db458c4 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -513,6 +513,16 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
 }
 
 /*
+ * Return saved PC of a blocked thread.
+ */
+unsigned long thread_saved_pc(struct task_struct *tsk)
+{
+	struct inactive_task_frame *frame =
+		(struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
+	return READ_ONCE_NOCHECK(frame->ret_addr);
+}
+
+/*
  * Called from fs/proc with a reference on @p to find the function
  * which called into schedule(). This needs to be done carefully
  * because the task might wake up and we might look at a stack
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index acee6ca..bcf0a38 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -58,14 +58,6 @@
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 asmlinkage void ret_from_kernel_thread(void) __asm__("ret_from_kernel_thread");
 
-/*
- * Return saved PC of a blocked thread.
- */
-unsigned long thread_saved_pc(struct task_struct *tsk)
-{
-	return ((unsigned long *)tsk->thread.sp)[3];
-}
-
 void __show_regs(struct pt_regs *regs, int all)
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/6] x86: Add struct inactive_task_frame
  2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
@ 2016-06-19 21:18   ` Andy Lutomirski
  2016-06-20 15:39   ` Josh Poimboeuf
  1 sibling, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2016-06-19 21:18 UTC (permalink / raw)
  To: Brian Gerst
  Cc: X86 ML, linux-kernel, Ingo Molnar, H. Peter Anvin,
	Denys Vlasenko, Borislav Petkov, Thomas Gleixner

On Sat, Jun 18, 2016 at 1:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
> Add struct inactive_task_frame, which defines the layout of the stack for
> a sleeping process.  For now, the only defined field is the BP register
> (frame pointer).

I like this version.

Reviewed-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 4/6] x86: Rewrite switch_to() code
  2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
@ 2016-06-19 21:22   ` Andy Lutomirski
  2016-06-20 15:44   ` Josh Poimboeuf
  1 sibling, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2016-06-19 21:22 UTC (permalink / raw)
  To: Brian Gerst
  Cc: X86 ML, linux-kernel, Ingo Molnar, H. Peter Anvin,
	Denys Vlasenko, Borislav Petkov, Thomas Gleixner

On Sat, Jun 18, 2016 at 1:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
> Move the low-level context switch code to an out-of-line asm stub instead of
> using complex inline asm.  This allows constructing a new stack frame for the
> child process to make it seamlessly flow to ret_from_fork without an extra
> test and branch in __switch_to().  It also improves code generation for
> __schedule() by using the C calling convention instead of clobbering all
> registers.

This is great.  The conflict with my stack code is minimal enough that
I won't bother rebasing unless someone specifically wants me to.

Acked-by: Andy Lutomirski <luto@kernel.org>

--Andy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
  2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
@ 2016-06-19 21:28   ` Andy Lutomirski
  2016-06-19 22:01     ` Brian Gerst
  2016-06-20 13:51   ` Borislav Petkov
  1 sibling, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2016-06-19 21:28 UTC (permalink / raw)
  To: Brian Gerst
  Cc: X86 ML, linux-kernel, Ingo Molnar, H. Peter Anvin,
	Denys Vlasenko, Borislav Petkov, Thomas Gleixner

On Sat, Jun 18, 2016 at 1:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
> Instead of setting up a fake pt_regs context, put the kernel thread
> function pointer and arg into the unused callee-restored registers
> of struct fork_frame.

This seems generally okay.

>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

> @@ -146,19 +147,12 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,


>         if (unlikely(p->flags & PF_KTHREAD)) {
>                 /* kernel thread */
>                 memset(childregs, 0, sizeof(struct pt_regs));
> -               frame->ret_addr = (unsigned long) ret_from_kernel_thread;
> -               task_user_gs(p) = __KERNEL_STACK_CANARY;
> -               childregs->ds = __USER_DS;
> -               childregs->es = __USER_DS;
> -               childregs->fs = __KERNEL_PERCPU;

Is the idea that do_execve promises to initialize all these fields to
something sensible if the kernel thread in question tries to return to
user mode?

--Andy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
  2016-06-19 21:28   ` Andy Lutomirski
@ 2016-06-19 22:01     ` Brian Gerst
  0 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-19 22:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Ingo Molnar, H. Peter Anvin,
	Denys Vlasenko, Borislav Petkov, Thomas Gleixner

On Sun, Jun 19, 2016 at 5:28 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Sat, Jun 18, 2016 at 1:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
>> Instead of setting up a fake pt_regs context, put the kernel thread
>> function pointer and arg into the unused callee-restored registers
>> of struct fork_frame.
>
> This seems generally okay.
>
>>
>> Signed-off-by: Brian Gerst <brgerst@gmail.com>
>
>> @@ -146,19 +147,12 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
>
>
>>         if (unlikely(p->flags & PF_KTHREAD)) {
>>                 /* kernel thread */
>>                 memset(childregs, 0, sizeof(struct pt_regs));
>> -               frame->ret_addr = (unsigned long) ret_from_kernel_thread;
>> -               task_user_gs(p) = __KERNEL_STACK_CANARY;
>> -               childregs->ds = __USER_DS;
>> -               childregs->es = __USER_DS;
>> -               childregs->fs = __KERNEL_PERCPU;
>
> Is the idea that do_execve promises to initialize all these fields to
> something sensible if the kernel thread in question tries to return to
> user mode?
>
> --Andy

Yes, do_execve() should be setting the full pt_regs.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/6] x86: Rewrite switch_to()
  2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
                   ` (5 preceding siblings ...)
  2016-06-18 20:56 ` [PATCH v2 6/6] x86: Fix thread_saved_pc() Brian Gerst
@ 2016-06-19 22:05 ` Brian Gerst
  6 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-19 22:05 UTC (permalink / raw)
  To: the arch/x86 maintainers, Linux Kernel Mailing List, Josh Poimboeuf
  Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Thomas Gleixner

On Sat, Jun 18, 2016 at 4:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
> This patch set simplifies the switch_to() code, by moving the stack switch
> code out of line into an asm stub before calling __switch_to().  This ends
> up being more readable, and using the C calling convention instead of
> clobbering all registers improves code generation.  It also allows newly
> forked processes to construct a special stack frame to seamlessly flow
> to ret_from_fork, instead of using a test and branch, or an unbalanced
> call/ret.
>
> Changes from v1:
> - Added struct inactive_task_frame
> - Added comments about kernel threads returning to userspace
> - Cleaned up some incorrect uses of thread.sp

I forgot to also add:
- Rearranged inactive stack frame so that BP (frame pointer) is in the
natural position right below the return address.  This should take
care of unwinding issues Josh raised.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
  2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
  2016-06-19 21:28   ` Andy Lutomirski
@ 2016-06-20 13:51   ` Borislav Petkov
  2016-06-20 15:01     ` Brian Gerst
  1 sibling, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2016-06-20 13:51 UTC (permalink / raw)
  To: Brian Gerst
  Cc: x86, linux-kernel, Ingo Molnar, H. Peter Anvin, Denys Vlasenko,
	Andy Lutomirski, Thomas Gleixner

On Sat, Jun 18, 2016 at 04:56:17PM -0400, Brian Gerst wrote:
> Instead of setting up a fake pt_regs context, put the kernel thread
> function pointer and arg into the unused callee-restored registers
> of struct fork_frame.
> 
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

> ---
>  arch/x86/entry/entry_32.S    | 31 +++++++++++++++----------------
>  arch/x86/entry/entry_64.S    | 35 ++++++++++++++++-------------------
>  arch/x86/kernel/process_32.c | 14 ++++----------
>  arch/x86/kernel/process_64.c | 10 +++-------
>  4 files changed, 38 insertions(+), 52 deletions(-)

...

> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 7574528..23e764c 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -404,37 +404,34 @@ END(__switch_to_asm)
>   * A newly forked process directly context switches into this address.
>   *
>   * rax: prev task we switched from
> + * rbx: kernel thread func
> + * r12: kernel thread arg
>   */
>  ENTRY(ret_from_fork)
>  	movq	%rax, %rdi
>  	call	schedule_tail			/* rdi: 'prev' task parameter */
>  
> -	testb	$3, CS(%rsp)			/* from kernel_thread? */
> +	testq	%rbx, %rbx			/* from kernel_thread? */
>  	jnz	1f
>  
> -	/*
> -	 * We came from kernel_thread.  This code path is quite twisted, and
> -	 * someone should clean it up.
> -	 *
> -	 * copy_thread_tls stashes the function pointer in RBX and the
> -	 * parameter to be passed in RBP.  The called function is permitted
> -	 * to call do_execve and thereby jump to user mode.
> -	 */
> -	movq	RBP(%rsp), %rdi
> -	call	*RBX(%rsp)
> -	movl	$0, RAX(%rsp)
> -
> -	/*
> -	 * Fall through as though we're exiting a syscall.  This makes a
> -	 * twisted sort of sense if we just called do_execve.
> -	 */
> -
> -1:
> +2:
>  	movq	%rsp, %rdi
>  	call	syscall_return_slowpath	/* returns with IRQs disabled */
>  	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
>  	SWAPGS
>  	jmp	restore_regs_and_iret
> +
> +1:
> +	/* kernel thread */
> +	movq	%r12, %rdi
> +	call	*%rbx
> +	/*
> +	 * A kernel thread is allowed to return here after successfully
> +	 * calling do_execve().  Exit to userspace to complete the execve()
> +	 * syscall.
> +	 */
> +	movq	$0, RAX(%rsp)
> +	jmp	2b
>  END(ret_from_fork)

Wouldn't it be simpler to reverse the JNZ after the TESTB above and thus
have a single label and simpler code flow (pasting the whole thing here
because a diff is unreadable):

ENTRY(ret_from_fork)
        movq    %rax, %rdi
        call    schedule_tail                   /* rdi: 'prev' task parameter */

        testq   %rbx, %rbx                      /* from kernel_thread? */
        jz      1f

        /* kernel thread */
        movq    %r12, %rdi
        call    *%rbx
        /*
         * A kernel thread is allowed to return here after successfully
         * calling do_execve().  Exit to userspace to complete the execve()
         * syscall.
         */
        movq    $0, RAX(%rsp)

1:
        movq    %rsp, %rdi
        call    syscall_return_slowpath /* returns with IRQs disabled */
        TRACE_IRQS_ON                   /* user mode is traced as IRQS on */
        SWAPGS
        jmp     restore_regs_and_iret
END(ret_from_fork)

It boots fine in my guest this way too.

Ditto for 32-bit.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
  2016-06-20 13:51   ` Borislav Petkov
@ 2016-06-20 15:01     ` Brian Gerst
  2016-06-20 15:14       ` Borislav Petkov
  0 siblings, 1 reply; 22+ messages in thread
From: Brian Gerst @ 2016-06-20 15:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, Denys Vlasenko, Andy Lutomirski, Thomas Gleixner

On Mon, Jun 20, 2016 at 9:51 AM, Borislav Petkov <bp@suse.de> wrote:
> On Sat, Jun 18, 2016 at 04:56:17PM -0400, Brian Gerst wrote:
>> Instead of setting up a fake pt_regs context, put the kernel thread
>> function pointer and arg into the unused callee-restored registers
>> of struct fork_frame.
>>
>> Signed-off-by: Brian Gerst <brgerst@gmail.com>
>
>> ---
>>  arch/x86/entry/entry_32.S    | 31 +++++++++++++++----------------
>>  arch/x86/entry/entry_64.S    | 35 ++++++++++++++++-------------------
>>  arch/x86/kernel/process_32.c | 14 ++++----------
>>  arch/x86/kernel/process_64.c | 10 +++-------
>>  4 files changed, 38 insertions(+), 52 deletions(-)
>
> ...
>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 7574528..23e764c 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -404,37 +404,34 @@ END(__switch_to_asm)
>>   * A newly forked process directly context switches into this address.
>>   *
>>   * rax: prev task we switched from
>> + * rbx: kernel thread func
>> + * r12: kernel thread arg
>>   */
>>  ENTRY(ret_from_fork)
>>       movq    %rax, %rdi
>>       call    schedule_tail                   /* rdi: 'prev' task parameter */
>>
>> -     testb   $3, CS(%rsp)                    /* from kernel_thread? */
>> +     testq   %rbx, %rbx                      /* from kernel_thread? */
>>       jnz     1f
>>
>> -     /*
>> -      * We came from kernel_thread.  This code path is quite twisted, and
>> -      * someone should clean it up.
>> -      *
>> -      * copy_thread_tls stashes the function pointer in RBX and the
>> -      * parameter to be passed in RBP.  The called function is permitted
>> -      * to call do_execve and thereby jump to user mode.
>> -      */
>> -     movq    RBP(%rsp), %rdi
>> -     call    *RBX(%rsp)
>> -     movl    $0, RAX(%rsp)
>> -
>> -     /*
>> -      * Fall through as though we're exiting a syscall.  This makes a
>> -      * twisted sort of sense if we just called do_execve.
>> -      */
>> -
>> -1:
>> +2:
>>       movq    %rsp, %rdi
>>       call    syscall_return_slowpath /* returns with IRQs disabled */
>>       TRACE_IRQS_ON                   /* user mode is traced as IRQS on */
>>       SWAPGS
>>       jmp     restore_regs_and_iret
>> +
>> +1:
>> +     /* kernel thread */
>> +     movq    %r12, %rdi
>> +     call    *%rbx
>> +     /*
>> +      * A kernel thread is allowed to return here after successfully
>> +      * calling do_execve().  Exit to userspace to complete the execve()
>> +      * syscall.
>> +      */
>> +     movq    $0, RAX(%rsp)
>> +     jmp     2b
>>  END(ret_from_fork)
>
> Wouldn't it be simpler to reverse the JNZ after the TESTB above and thus
> have a single label and simpler code flow (pasting the whole thing here
> because a diff is unreadable):
>
> ENTRY(ret_from_fork)
>         movq    %rax, %rdi
>         call    schedule_tail                   /* rdi: 'prev' task parameter */
>
>         testq   %rbx, %rbx                      /* from kernel_thread? */
>         jz      1f
>
>         /* kernel thread */
>         movq    %r12, %rdi
>         call    *%rbx
>         /*
>          * A kernel thread is allowed to return here after successfully
>          * calling do_execve().  Exit to userspace to complete the execve()
>          * syscall.
>          */
>         movq    $0, RAX(%rsp)
>
> 1:
>         movq    %rsp, %rdi
>         call    syscall_return_slowpath /* returns with IRQs disabled */
>         TRACE_IRQS_ON                   /* user mode is traced as IRQS on */
>         SWAPGS
>         jmp     restore_regs_and_iret
> END(ret_from_fork)
>
> It boots fine in my guest this way too.

The idea was to put the uncommon case (kernel thread) out of line for
performance reasons.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
  2016-06-20 15:01     ` Brian Gerst
@ 2016-06-20 15:14       ` Borislav Petkov
  2016-06-22  4:24         ` Brian Gerst
  0 siblings, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2016-06-20 15:14 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Borislav Petkov, the arch/x86 maintainers,
	Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	Denys Vlasenko, Andy Lutomirski, Thomas Gleixner

On Mon, Jun 20, 2016 at 11:01:02AM -0400, Brian Gerst wrote:
> The idea was to put the uncommon case (kernel thread) out of line for
> performance reasons.

A comment saying so wouldn't hurt...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/6] x86: Add struct inactive_task_frame
  2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
  2016-06-19 21:18   ` Andy Lutomirski
@ 2016-06-20 15:39   ` Josh Poimboeuf
  1 sibling, 0 replies; 22+ messages in thread
From: Josh Poimboeuf @ 2016-06-20 15:39 UTC (permalink / raw)
  To: Brian Gerst
  Cc: x86, linux-kernel, Ingo Molnar, H. Peter Anvin, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner

On Sat, Jun 18, 2016 at 04:56:15PM -0400, Brian Gerst wrote:
> Add struct inactive_task_frame, which defines the layout of the stack for
> a sleeping process.  For now, the only defined field is the BP register
> (frame pointer).
> 
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 4/6] x86: Rewrite switch_to() code
  2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
  2016-06-19 21:22   ` Andy Lutomirski
@ 2016-06-20 15:44   ` Josh Poimboeuf
  1 sibling, 0 replies; 22+ messages in thread
From: Josh Poimboeuf @ 2016-06-20 15:44 UTC (permalink / raw)
  To: Brian Gerst
  Cc: x86, linux-kernel, Ingo Molnar, H. Peter Anvin, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner

On Sat, Jun 18, 2016 at 04:56:16PM -0400, Brian Gerst wrote:
> Move the low-level context switch code to an out-of-line asm stub instead of
> using complex inline asm.  This allows constructing a new stack frame for the
> child process to make it seamlessly flow to ret_from_fork without an extra
> test and branch in __switch_to().  It also improves code generation for
> __schedule() by using the C calling convention instead of clobbering all
> registers.
> 
> Signed-off-by: Brian Gerst <brgerst@gmail.com>

You can also remove the STACK_FRAME_NON_STANDARD annotation for
__schedule() in kernel/sched/core.c since it no longer produces an
objtool warning.

Otherwise it looks great:

Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/6] x86: Fix thread_saved_pc()
  2016-06-18 20:56 ` [PATCH v2 6/6] x86: Fix thread_saved_pc() Brian Gerst
@ 2016-06-20 16:01   ` Josh Poimboeuf
  2016-06-22  4:27     ` Brian Gerst
  0 siblings, 1 reply; 22+ messages in thread
From: Josh Poimboeuf @ 2016-06-20 16:01 UTC (permalink / raw)
  To: Brian Gerst
  Cc: x86, linux-kernel, Ingo Molnar, H. Peter Anvin, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner

On Sat, Jun 18, 2016 at 04:56:18PM -0400, Brian Gerst wrote:
> thread_saved_pc() was using a completely bogus method to get the return
> address.  Since switch_to() was previously inlined, there was no sane way
> to know where on the stack the return address was stored.  Now with the
> frame of a sleeping thread well defined, this can be implemented correctly.
> 
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
>  arch/x86/include/asm/processor.h | 10 ++--------
>  arch/x86/kernel/process.c        | 10 ++++++++++
>  arch/x86/kernel/process_32.c     |  8 --------
>  3 files changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 1e7d634..413f4f1 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -716,8 +716,6 @@ static inline void spin_lock_prefetch(const void *x)
>  	.io_bitmap_ptr		= NULL,					  \
>  }
>  
> -extern unsigned long thread_saved_pc(struct task_struct *tsk);
> -
>  /*
>   * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
>   * This is necessary to guarantee that the entire "struct pt_regs"
> @@ -767,17 +765,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
>  	.sp0 = TOP_OF_INIT_STACK \
>  }
>  
> -/*
> - * Return saved PC of a blocked thread.
> - * What is this good for? it will be always the scheduler or ret_from_fork.
> - */
> -#define thread_saved_pc(t)	READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
> -
>  #define task_pt_regs(tsk)	((struct pt_regs *)(tsk)->thread.sp0 - 1)
>  extern unsigned long KSTK_ESP(struct task_struct *task);
>  
>  #endif /* CONFIG_X86_64 */
>  
> +extern unsigned long thread_saved_pc(struct task_struct *tsk);
> +
>  extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
>  					       unsigned long new_sp);
>  
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 00ebab0..db458c4 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -513,6 +513,16 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
>  }
>  
>  /*
> + * Return saved PC of a blocked thread.
> + */
> +unsigned long thread_saved_pc(struct task_struct *tsk)
> +{
> +	struct inactive_task_frame *frame =
> +		(struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
> +	return READ_ONCE_NOCHECK(frame->ret_addr);
> +}
> +
> +/*

I would agree with the above (removed) comment:

  "What is this good for?  it will be always the scheduler or ret_from_fork."

And I'd guess the same is true for all the arches which have to
implement it.  Maybe this function (and its single call site in
sched_show_task()) should just be removed altogether?

-- 
Josh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
  2016-06-20 15:14       ` Borislav Petkov
@ 2016-06-22  4:24         ` Brian Gerst
  2016-07-09 12:01           ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Brian Gerst @ 2016-06-22  4:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Borislav Petkov, the arch/x86 maintainers,
	Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	Denys Vlasenko, Andy Lutomirski, Thomas Gleixner

On Mon, Jun 20, 2016 at 11:14 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jun 20, 2016 at 11:01:02AM -0400, Brian Gerst wrote:
>> The idea was to put the uncommon case (kernel thread) out of line for
>> performance reasons.
>
> A comment saying so wouldn't hurt...

This is a fairly common pattern.  Do we have to document every case of it?

--
Brian Gerst

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/6] x86: Fix thread_saved_pc()
  2016-06-20 16:01   ` Josh Poimboeuf
@ 2016-06-22  4:27     ` Brian Gerst
  2016-06-24 18:12       ` Josh Poimboeuf
  0 siblings, 1 reply; 22+ messages in thread
From: Brian Gerst @ 2016-06-22  4:27 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, Denys Vlasenko, Andy Lutomirski, Borislav Petkov,
	Thomas Gleixner

On Mon, Jun 20, 2016 at 12:01 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Sat, Jun 18, 2016 at 04:56:18PM -0400, Brian Gerst wrote:
>> thread_saved_pc() was using a completely bogus method to get the return
>> address.  Since switch_to() was previously inlined, there was no sane way
>> to know where on the stack the return address was stored.  Now with the
>> frame of a sleeping thread well defined, this can be implemented correctly.
>>
>> Signed-off-by: Brian Gerst <brgerst@gmail.com>
>> ---
>>  arch/x86/include/asm/processor.h | 10 ++--------
>>  arch/x86/kernel/process.c        | 10 ++++++++++
>>  arch/x86/kernel/process_32.c     |  8 --------
>>  3 files changed, 12 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 1e7d634..413f4f1 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -716,8 +716,6 @@ static inline void spin_lock_prefetch(const void *x)
>>       .io_bitmap_ptr          = NULL,                                   \
>>  }
>>
>> -extern unsigned long thread_saved_pc(struct task_struct *tsk);
>> -
>>  /*
>>   * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
>>   * This is necessary to guarantee that the entire "struct pt_regs"
>> @@ -767,17 +765,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
>>       .sp0 = TOP_OF_INIT_STACK \
>>  }
>>
>> -/*
>> - * Return saved PC of a blocked thread.
>> - * What is this good for? it will be always the scheduler or ret_from_fork.
>> - */
>> -#define thread_saved_pc(t)   READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
>> -
>>  #define task_pt_regs(tsk)    ((struct pt_regs *)(tsk)->thread.sp0 - 1)
>>  extern unsigned long KSTK_ESP(struct task_struct *task);
>>
>>  #endif /* CONFIG_X86_64 */
>>
>> +extern unsigned long thread_saved_pc(struct task_struct *tsk);
>> +
>>  extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
>>                                              unsigned long new_sp);
>>
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index 00ebab0..db458c4 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -513,6 +513,16 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
>>  }
>>
>>  /*
>> + * Return saved PC of a blocked thread.
>> + */
>> +unsigned long thread_saved_pc(struct task_struct *tsk)
>> +{
>> +     struct inactive_task_frame *frame =
>> +             (struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
>> +     return READ_ONCE_NOCHECK(frame->ret_addr);
>> +}
>> +
>> +/*
>
> I would agree with the above (removed) comment:
>
>   "What is this good for?  it will be always the scheduler or ret_from_fork."
>
> And I'd guess the same is true for all the arches which have to
> implement it.  Maybe this function (and its single call site in
> sched_show_task()) should just be removed altogether?

I didn't really want to stray down that path with this series.  This
just makes it functional again.  the usefulness is still open for
debate.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/6] x86: Fix thread_saved_pc()
  2016-06-22  4:27     ` Brian Gerst
@ 2016-06-24 18:12       ` Josh Poimboeuf
  0 siblings, 0 replies; 22+ messages in thread
From: Josh Poimboeuf @ 2016-06-24 18:12 UTC (permalink / raw)
  To: Brian Gerst
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, Denys Vlasenko, Andy Lutomirski, Borislav Petkov,
	Thomas Gleixner

On Wed, Jun 22, 2016 at 12:27:43AM -0400, Brian Gerst wrote:
> On Mon, Jun 20, 2016 at 12:01 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Sat, Jun 18, 2016 at 04:56:18PM -0400, Brian Gerst wrote:
> >> thread_saved_pc() was using a completely bogus method to get the return
> >> address.  Since switch_to() was previously inlined, there was no sane way
> >> to know where on the stack the return address was stored.  Now with the
> >> frame of a sleeping thread well defined, this can be implemented correctly.
> >>
> >> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> >> ---
> >>  arch/x86/include/asm/processor.h | 10 ++--------
> >>  arch/x86/kernel/process.c        | 10 ++++++++++
> >>  arch/x86/kernel/process_32.c     |  8 --------
> >>  3 files changed, 12 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> >> index 1e7d634..413f4f1 100644
> >> --- a/arch/x86/include/asm/processor.h
> >> +++ b/arch/x86/include/asm/processor.h
> >> @@ -716,8 +716,6 @@ static inline void spin_lock_prefetch(const void *x)
> >>       .io_bitmap_ptr          = NULL,                                   \
> >>  }
> >>
> >> -extern unsigned long thread_saved_pc(struct task_struct *tsk);
> >> -
> >>  /*
> >>   * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
> >>   * This is necessary to guarantee that the entire "struct pt_regs"
> >> @@ -767,17 +765,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
> >>       .sp0 = TOP_OF_INIT_STACK \
> >>  }
> >>
> >> -/*
> >> - * Return saved PC of a blocked thread.
> >> - * What is this good for? it will be always the scheduler or ret_from_fork.
> >> - */
> >> -#define thread_saved_pc(t)   READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
> >> -
> >>  #define task_pt_regs(tsk)    ((struct pt_regs *)(tsk)->thread.sp0 - 1)
> >>  extern unsigned long KSTK_ESP(struct task_struct *task);
> >>
> >>  #endif /* CONFIG_X86_64 */
> >>
> >> +extern unsigned long thread_saved_pc(struct task_struct *tsk);
> >> +
> >>  extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
> >>                                              unsigned long new_sp);
> >>
> >> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> >> index 00ebab0..db458c4 100644
> >> --- a/arch/x86/kernel/process.c
> >> +++ b/arch/x86/kernel/process.c
> >> @@ -513,6 +513,16 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
> >>  }
> >>
> >>  /*
> >> + * Return saved PC of a blocked thread.
> >> + */
> >> +unsigned long thread_saved_pc(struct task_struct *tsk)
> >> +{
> >> +     struct inactive_task_frame *frame =
> >> +             (struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
> >> +     return READ_ONCE_NOCHECK(frame->ret_addr);
> >> +}
> >> +
> >> +/*
> >
> > I would agree with the above (removed) comment:
> >
> >   "What is this good for?  it will be always the scheduler or ret_from_fork."
> >
> > And I'd guess the same is true for all the arches which have to
> > implement it.  Maybe this function (and its single call site in
> > sched_show_task()) should just be removed altogether?
> 
> I didn't really want to stray down that path with this series.  This
> just makes it functional again.  the usefulness is still open for
> debate.

Fair enough.

Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
  2016-06-22  4:24         ` Brian Gerst
@ 2016-07-09 12:01           ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2016-07-09 12:01 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Borislav Petkov, Borislav Petkov, the arch/x86 maintainers,
	Linux Kernel Mailing List, H. Peter Anvin, Denys Vlasenko,
	Andy Lutomirski, Thomas Gleixner


* Brian Gerst <brgerst@gmail.com> wrote:

> On Mon, Jun 20, 2016 at 11:14 AM, Borislav Petkov <bp@alien8.de> wrote:
> > On Mon, Jun 20, 2016 at 11:01:02AM -0400, Brian Gerst wrote:
> >> The idea was to put the uncommon case (kernel thread) out of line for
> >> performance reasons.
> >
> > A comment saying so wouldn't hurt...
> 
> This is a fairly common pattern.  Do we have to document every case of it?

Yeah, would be good to do that in general, so that people can re-evaluate whether 
'this is rare' is still true years down the line. For newly touched code it makes 
sense to add a minimal comment that explains what is rare about the branch and so.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-07-09 12:01 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
2016-06-18 20:56 ` [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
2016-06-18 20:56 ` [PATCH v2 2/6] x86-64, kgdb: clear GDB_PS on 64-bit Brian Gerst
2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
2016-06-19 21:18   ` Andy Lutomirski
2016-06-20 15:39   ` Josh Poimboeuf
2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
2016-06-19 21:22   ` Andy Lutomirski
2016-06-20 15:44   ` Josh Poimboeuf
2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
2016-06-19 21:28   ` Andy Lutomirski
2016-06-19 22:01     ` Brian Gerst
2016-06-20 13:51   ` Borislav Petkov
2016-06-20 15:01     ` Brian Gerst
2016-06-20 15:14       ` Borislav Petkov
2016-06-22  4:24         ` Brian Gerst
2016-07-09 12:01           ` Ingo Molnar
2016-06-18 20:56 ` [PATCH v2 6/6] x86: Fix thread_saved_pc() Brian Gerst
2016-06-20 16:01   ` Josh Poimboeuf
2016-06-22  4:27     ` Brian Gerst
2016-06-24 18:12       ` Josh Poimboeuf
2016-06-19 22:05 ` [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).