* [PATCH v3 0/7] x86: Rewrite switch_to()
@ 2016-08-13 16:38 Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 1/7] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Brian Gerst @ 2016-08-13 16:38 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

This patch set simplifies the switch_to() code, by moving the stack switch
code out of line into an asm stub before calling __switch_to().  This ends
up being more readable, and using the C calling convention instead of
clobbering all registers improves code generation.  It also allows newly
forked processes to construct a special stack frame to seamlessly flow
to ret_from_fork, instead of using a test and branch, or an unbalanced
call/ret.
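
For orientation, here is a condensed sketch of the interface the series
ends up with (the actual definitions are spread over patches 3-5):

  /* condensed from <asm/switch_to.h> after this series */
  struct inactive_task_frame {
          /* callee-saved registers, as saved by __switch_to_asm */
  #ifdef CONFIG_X86_64
          unsigned long r15, r14, r13, r12;
  #else
          unsigned long si, di;
  #endif
          unsigned long bx;
          unsigned long bp;
          unsigned long ret_addr;         /* ret_from_fork for a new task */
  };

  struct task_struct *__switch_to_asm(struct task_struct *prev,
                                      struct task_struct *next);

  #define switch_to(prev, next, last)                           \
  do {                                                          \
          ((last) = __switch_to_asm((prev), (next)));           \
  } while (0)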

Changes from v2:
- Updated comments around kernel threads being uncommon for fork, etc.
- Removed STACK_FRAME_NON_STANDARD annotation from __schedule() per Josh Poimboeuf
- A few minor cleanups added

Changes from v1:
- Added struct inactive_task_frame
- Added comments about kernel threads returning to userspace
- Cleaned up some incorrect uses of thread.sp
- Rearranged inactive stack frame so that BP (frame pointer) is in the natural position right below the return address.  This should take care of unwinding issues Josh raised.

Brian Gerst (7):
  x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()
  x86-64, kgdb: clear GDB_PS on 64-bit
  x86: Add struct inactive_task_frame
  x86: Rewrite switch_to() code
  x86: Pass kernel thread parameters in fork_frame
  x86: Fix thread_saved_pc()
  Revert "sched: Mark __schedule() stack frame as non-standard"

 arch/x86/entry/entry_32.S          |  68 +++++++++++++-----
 arch/x86/entry/entry_64.S          |  78 ++++++++++++++------
 arch/x86/include/asm/processor.h   |  13 +---
 arch/x86/include/asm/stacktrace.h  |   4 +-
 arch/x86/include/asm/switch_to.h   | 144 ++++++++-----------------------------
 arch/x86/include/asm/thread_info.h |   2 -
 arch/x86/kernel/asm-offsets.c      |   6 ++
 arch/x86/kernel/asm-offsets_32.c   |   5 ++
 arch/x86/kernel/asm-offsets_64.c   |   5 ++
 arch/x86/kernel/kgdb.c             |   8 +--
 arch/x86/kernel/process.c          |  14 +++-
 arch/x86/kernel/process_32.c       |  31 +++-----
 arch/x86/kernel/process_64.c       |  21 +++---
 arch/x86/kernel/smpboot.c          |   1 -
 kernel/sched/core.c                |   1 -
 15 files changed, 190 insertions(+), 211 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 1/7] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()
  2016-08-13 16:38 [PATCH v3 0/7] x86: Rewrite switch_to() Brian Gerst
@ 2016-08-13 16:38 ` Brian Gerst
  2016-08-24 13:07   ` [tip:x86/asm] sched/x86/32, " tip-bot for Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 2/7] x86-64, kgdb: clear GDB_PS on 64-bit Brian Gerst
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2016-08-13 16:38 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

Match 64-bit and set gdb_regs[GDB_PC] to zero.  thread.ip is always the
same point in the scheduler (except for newly forked processes), and will
be removed in a future patch.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/kernel/kgdb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 04cde52..fe649a5 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -172,7 +172,6 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_ES]	= __KERNEL_DS;
 	gdb_regs[GDB_PS]	= 0;
 	gdb_regs[GDB_CS]	= __KERNEL_CS;
-	gdb_regs[GDB_PC]	= p->thread.ip;
 	gdb_regs[GDB_SS]	= __KERNEL_DS;
 	gdb_regs[GDB_FS]	= 0xFFFF;
 	gdb_regs[GDB_GS]	= 0xFFFF;
@@ -180,7 +179,6 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs32[GDB_PS]	= *(unsigned long *)(p->thread.sp + 8);
 	gdb_regs32[GDB_CS]	= __KERNEL_CS;
 	gdb_regs32[GDB_SS]	= __KERNEL_DS;
-	gdb_regs[GDB_PC]	= 0;
 	gdb_regs[GDB_R8]	= 0;
 	gdb_regs[GDB_R9]	= 0;
 	gdb_regs[GDB_R10]	= 0;
@@ -190,6 +188,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_R14]	= 0;
 	gdb_regs[GDB_R15]	= 0;
 #endif
+	gdb_regs[GDB_PC]	= 0;
 	gdb_regs[GDB_SP]	= p->thread.sp;
 }
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 2/7] x86-64, kgdb: clear GDB_PS on 64-bit
  2016-08-13 16:38 [PATCH v3 0/7] x86: Rewrite switch_to() Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 1/7] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
@ 2016-08-13 16:38 ` Brian Gerst
  2016-08-24 13:07   ` [tip:x86/asm] sched/x86/64, kgdb: Clear " tip-bot for Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 3/7] x86: Add struct inactive_task_frame Brian Gerst
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2016-08-13 16:38 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

switch_to() no longer saves EFLAGS, so it's bogus to look for it on the
stack.  Set it to zero like 32-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/kernel/kgdb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index fe649a5..5e3f294 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -176,7 +176,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_FS]	= 0xFFFF;
 	gdb_regs[GDB_GS]	= 0xFFFF;
 #else
-	gdb_regs32[GDB_PS]	= *(unsigned long *)(p->thread.sp + 8);
+	gdb_regs32[GDB_PS]	= 0;
 	gdb_regs32[GDB_CS]	= __KERNEL_CS;
 	gdb_regs32[GDB_SS]	= __KERNEL_DS;
 	gdb_regs[GDB_R8]	= 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 3/7] x86: Add struct inactive_task_frame
  2016-08-13 16:38 [PATCH v3 0/7] x86: Rewrite switch_to() Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 1/7] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 2/7] x86-64, kgdb: clear GDB_PS on 64-bit Brian Gerst
@ 2016-08-13 16:38 ` Brian Gerst
  2016-08-24 13:08   ` [tip:x86/asm] sched/x86: Add 'struct inactive_task_frame' to better document the sleeping task stack frame tip-bot for Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 4/7] x86: Rewrite switch_to() code Brian Gerst
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2016-08-13 16:38 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

Add struct inactive_task_frame, which defines the layout of the stack for
a sleeping process.  For now, the only defined field is the BP register
(frame pointer).

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/stacktrace.h | 4 ++--
 arch/x86/include/asm/switch_to.h  | 5 +++++
 arch/x86/kernel/kgdb.c            | 3 ++-
 arch/x86/kernel/process.c         | 3 ++-
 4 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 0944218..7646fb2 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -8,6 +8,7 @@
 
 #include <linux/uaccess.h>
 #include <linux/ptrace.h>
+#include <asm/switch_to.h>
 
 extern int kstack_depth_to_print;
 
@@ -70,8 +71,7 @@ stack_frame(struct task_struct *task, struct pt_regs *regs)
 		return bp;
 	}
 
-	/* bp is the last reg pushed by switch_to */
-	return *(unsigned long *)task->thread.sp;
+	return ((struct inactive_task_frame *)task->thread.sp)->bp;
 }
 #else
 static inline unsigned long
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 8f321a1..02de86e 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,11 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);
 
+/* data that is pointed to by thread.sp */
+struct inactive_task_frame {
+	unsigned long bp;
+};
+
 #ifdef CONFIG_X86_32
 
 #ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 5e3f294..8e36f24 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -50,6 +50,7 @@
 #include <asm/apicdef.h>
 #include <asm/apic.h>
 #include <asm/nmi.h>
+#include <asm/switch_to.h>
 
 struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
 {
@@ -166,7 +167,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_DX]	= 0;
 	gdb_regs[GDB_SI]	= 0;
 	gdb_regs[GDB_DI]	= 0;
-	gdb_regs[GDB_BP]	= *(unsigned long *)p->thread.sp;
+	gdb_regs[GDB_BP]	= ((struct inactive_task_frame *)p->thread.sp)->bp;
 #ifdef CONFIG_X86_32
 	gdb_regs[GDB_DS]	= __KERNEL_DS;
 	gdb_regs[GDB_ES]	= __KERNEL_DS;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 62c0b0e..0115a4a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -32,6 +32,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/vm86.h>
+#include <asm/switch_to.h>
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -556,7 +557,7 @@ unsigned long get_wchan(struct task_struct *p)
 	if (sp < bottom || sp > top)
 		return 0;
 
-	fp = READ_ONCE_NOCHECK(*(unsigned long *)sp);
+	fp = READ_ONCE_NOCHECK(((struct inactive_task_frame *)sp)->bp);
 	do {
 		if (fp < bottom || fp > top)
 			return 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 4/7] x86: Rewrite switch_to() code
  2016-08-13 16:38 [PATCH v3 0/7] x86: Rewrite switch_to() Brian Gerst
                   ` (2 preceding siblings ...)
  2016-08-13 16:38 ` [PATCH v3 3/7] x86: Add struct inactive_task_frame Brian Gerst
@ 2016-08-13 16:38 ` Brian Gerst
  2016-08-24 13:08   ` [tip:x86/asm] sched/x86: Rewrite the " tip-bot for Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 5/7] x86: Pass kernel thread parameters in fork_frame Brian Gerst
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2016-08-13 16:38 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

Move the low-level context switch code to an out-of-line asm stub instead of
using complex inline asm.  This allows constructing a new stack frame for the
child process to make it seamlessly flow to ret_from_fork without an extra
test and branch in __switch_to().  It also improves code generation for
__schedule() by using the C calling convention instead of clobbering all
registers.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/entry_32.S          |  37 ++++++++++
 arch/x86/entry/entry_64.S          |  41 ++++++++++-
 arch/x86/include/asm/processor.h   |   3 -
 arch/x86/include/asm/switch_to.h   | 137 ++++++-------------------------------
 arch/x86/include/asm/thread_info.h |   2 -
 arch/x86/kernel/asm-offsets.c      |   6 ++
 arch/x86/kernel/asm-offsets_32.c   |   5 ++
 arch/x86/kernel/asm-offsets_64.c   |   5 ++
 arch/x86/kernel/process_32.c       |   9 ++-
 arch/x86/kernel/process_64.c       |   9 ++-
 arch/x86/kernel/smpboot.c          |   1 -
 11 files changed, 125 insertions(+), 130 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 0b56666..bf8f221 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -204,6 +204,43 @@
 	POP_GS_EX
 .endm
 
+/*
+ * %eax: prev task
+ * %edx: next task
+ */
+ENTRY(__switch_to_asm)
+	/*
+	 * Save callee-saved registers
+	 * This must match the order in struct inactive_task_frame
+	 */
+	pushl	%ebp
+	pushl	%ebx
+	pushl	%edi
+	pushl	%esi
+
+	/* switch stack */
+	movl	%esp, TASK_threadsp(%eax)
+	movl	TASK_threadsp(%edx), %esp
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+	movl	TASK_stack_canary(%edx), %ebx
+	movl	%ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
+#endif
+
+	/* restore callee-saved registers */
+	popl	%esi
+	popl	%edi
+	popl	%ebx
+	popl	%ebp
+
+	jmp	__switch_to
+END(__switch_to_asm)
+
+/*
+ * A newly forked process directly context switches into this address.
+ *
+ * eax: prev task we switched from
+ */
 ENTRY(ret_from_fork)
 	pushl	%eax
 	call	schedule_tail
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f6b40e5..c1af8ac 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -368,13 +368,48 @@ END(ptregs_\func)
 #include <asm/syscalls_64.h>
 
 /*
+ * %rdi: prev task
+ * %rsi: next task
+ */
+ENTRY(__switch_to_asm)
+	/*
+	 * Save callee-saved registers
+	 * This must match the order in inactive_task_frame
+	 */
+	pushq	%rbp
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+
+	/* switch stack */
+	movq	%rsp, TASK_threadsp(%rdi)
+	movq	TASK_threadsp(%rsi), %rsp
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+	movq	TASK_stack_canary(%rsi), %rbx
+	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+#endif
+
+	/* restore callee-saved registers */
+	popq	%r15
+	popq	%r14
+	popq	%r13
+	popq	%r12
+	popq	%rbx
+	popq	%rbp
+
+	jmp	__switch_to
+END(__switch_to_asm)
+
+/*
  * A newly forked process directly context switches into this address.
  *
- * rdi: prev task we switched from
+ * rax: prev task we switched from
  */
 ENTRY(ret_from_fork)
-	LOCK ; btr $TIF_FORK, TI_flags(%r8)
-
+	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
 	testb	$3, CS(%rsp)			/* from kernel_thread? */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 63def95..6fee863 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -389,9 +389,6 @@ struct thread_struct {
 	unsigned short		fsindex;
 	unsigned short		gsindex;
 #endif
-#ifdef CONFIG_X86_32
-	unsigned long		ip;
-#endif
 #ifdef CONFIG_X86_64
 	unsigned long		fsbase;
 	unsigned long		gsbase;
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 02de86e..bf4e2ec 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -2,135 +2,40 @@
 #define _ASM_X86_SWITCH_TO_H
 
 struct task_struct; /* one of the stranger aspects of C forward declarations */
+
+struct task_struct *__switch_to_asm(struct task_struct *prev,
+				    struct task_struct *next);
+
 __visible struct task_struct *__switch_to(struct task_struct *prev,
-					   struct task_struct *next);
+					  struct task_struct *next);
 struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);
 
 /* data that is pointed to by thread.sp */
 struct inactive_task_frame {
+#ifdef CONFIG_X86_64
+	unsigned long r15;
+	unsigned long r14;
+	unsigned long r13;
+	unsigned long r12;
+#else
+	unsigned long si;
+	unsigned long di;
+#endif
+	unsigned long bx;
 	unsigned long bp;
+	unsigned long ret_addr;
 };
 
-#ifdef CONFIG_X86_32
-
-#ifdef CONFIG_CC_STACKPROTECTOR
-#define __switch_canary							\
-	"movl %P[task_canary](%[next]), %%ebx\n\t"			\
-	"movl %%ebx, "__percpu_arg([stack_canary])"\n\t"
-#define __switch_canary_oparam						\
-	, [stack_canary] "=m" (stack_canary.canary)
-#define __switch_canary_iparam						\
-	, [task_canary] "i" (offsetof(struct task_struct, stack_canary))
-#else	/* CC_STACKPROTECTOR */
-#define __switch_canary
-#define __switch_canary_oparam
-#define __switch_canary_iparam
-#endif	/* CC_STACKPROTECTOR */
+struct fork_frame {
+	struct inactive_task_frame frame;
+	struct pt_regs regs;
+};
 
-/*
- * Saving eflags is important. It switches not only IOPL between tasks,
- * it also protects other tasks from NT leaking through sysenter etc.
- */
 #define switch_to(prev, next, last)					\
 do {									\
-	/*								\
-	 * Context-switching clobbers all registers, so we clobber	\
-	 * them explicitly, via unused output variables.		\
-	 * (EAX and EBP is not listed because EBP is saved/restored	\
-	 * explicitly for wchan access and EAX is the return value of	\
-	 * __switch_to())						\
-	 */								\
-	unsigned long ebx, ecx, edx, esi, edi;				\
-									\
-	asm volatile("pushl %%ebp\n\t"		/* save    EBP   */	\
-		     "movl %%esp,%[prev_sp]\n\t"	/* save    ESP   */ \
-		     "movl %[next_sp],%%esp\n\t"	/* restore ESP   */ \
-		     "movl $1f,%[prev_ip]\n\t"	/* save    EIP   */	\
-		     "pushl %[next_ip]\n\t"	/* restore EIP   */	\
-		     __switch_canary					\
-		     "jmp __switch_to\n"	/* regparm call  */	\
-		     "1:\t"						\
-		     "popl %%ebp\n\t"		/* restore EBP   */	\
-									\
-		     /* output parameters */				\
-		     : [prev_sp] "=m" (prev->thread.sp),		\
-		       [prev_ip] "=m" (prev->thread.ip),		\
-		       "=a" (last),					\
-									\
-		       /* clobbered output registers: */		\
-		       "=b" (ebx), "=c" (ecx), "=d" (edx),		\
-		       "=S" (esi), "=D" (edi)				\
-		       							\
-		       __switch_canary_oparam				\
-									\
-		       /* input parameters: */				\
-		     : [next_sp]  "m" (next->thread.sp),		\
-		       [next_ip]  "m" (next->thread.ip),		\
-		       							\
-		       /* regparm parameters for __switch_to(): */	\
-		       [prev]     "a" (prev),				\
-		       [next]     "d" (next)				\
-									\
-		       __switch_canary_iparam				\
-									\
-		     : /* reloaded segment registers */			\
-			"memory");					\
+	((last) = __switch_to_asm((prev), (next)));			\
 } while (0)
 
-#else /* CONFIG_X86_32 */
-
-/* frame pointer must be last for get_wchan */
-#define SAVE_CONTEXT    "pushq %%rbp ; movq %%rsi,%%rbp\n\t"
-#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp\t"
-
-#define __EXTRA_CLOBBER  \
-	, "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
-	  "r12", "r13", "r14", "r15", "flags"
-
-#ifdef CONFIG_CC_STACKPROTECTOR
-#define __switch_canary							  \
-	"movq %P[task_canary](%%rsi),%%r8\n\t"				  \
-	"movq %%r8,"__percpu_arg([gs_canary])"\n\t"
-#define __switch_canary_oparam						  \
-	, [gs_canary] "=m" (irq_stack_union.stack_canary)
-#define __switch_canary_iparam						  \
-	, [task_canary] "i" (offsetof(struct task_struct, stack_canary))
-#else	/* CC_STACKPROTECTOR */
-#define __switch_canary
-#define __switch_canary_oparam
-#define __switch_canary_iparam
-#endif	/* CC_STACKPROTECTOR */
-
-/*
- * There is no need to save or restore flags, because flags are always
- * clean in kernel mode, with the possible exception of IOPL.  Kernel IOPL
- * has no effect.
- */
-#define switch_to(prev, next, last) \
-	asm volatile(SAVE_CONTEXT					  \
-	     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */	  \
-	     "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */	  \
-	     "call __switch_to\n\t"					  \
-	     "movq "__percpu_arg([current_task])",%%rsi\n\t"		  \
-	     __switch_canary						  \
-	     "movq %P[thread_info](%%rsi),%%r8\n\t"			  \
-	     "movq %%rax,%%rdi\n\t" 					  \
-	     "testl  %[_tif_fork],%P[ti_flags](%%r8)\n\t"		  \
-	     "jnz   ret_from_fork\n\t"					  \
-	     RESTORE_CONTEXT						  \
-	     : "=a" (last)					  	  \
-	       __switch_canary_oparam					  \
-	     : [next] "S" (next), [prev] "D" (prev),			  \
-	       [threadrsp] "i" (offsetof(struct task_struct, thread.sp)), \
-	       [ti_flags] "i" (offsetof(struct thread_info, flags)),	  \
-	       [_tif_fork] "i" (_TIF_FORK),			  	  \
-	       [thread_info] "i" (offsetof(struct task_struct, stack)),   \
-	       [current_task] "m" (current_task)			  \
-	       __switch_canary_iparam					  \
-	     : "memory", "cc" __EXTRA_CLOBBER)
-
-#endif /* CONFIG_X86_32 */
-
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 8b7c8d8..494c4b5 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -95,7 +95,6 @@ struct thread_info {
 #define TIF_UPROBE		12	/* breakpointed or singlestepping */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
-#define TIF_FORK		18	/* ret_from_fork */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -119,7 +118,6 @@ struct thread_info {
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
-#define _TIF_FORK		(1 << TIF_FORK)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 2bd5c6f..db3a0af 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -29,6 +29,12 @@
 
 void common(void) {
 	BLANK();
+	OFFSET(TASK_threadsp, task_struct, thread.sp);
+#ifdef CONFIG_CC_STACKPROTECTOR
+	OFFSET(TASK_stack_canary, task_struct, stack_canary);
+#endif
+
+	BLANK();
 	OFFSET(TI_flags, thread_info, flags);
 	OFFSET(TI_status, thread_info, status);
 
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index ecdc1d2..880aa09 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -57,6 +57,11 @@ void foo(void)
 	/* Size of SYSENTER_stack */
 	DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
 
+#ifdef CONFIG_CC_STACKPROTECTOR
+	BLANK();
+	OFFSET(stack_canary_offset, stack_canary, canary);
+#endif
+
 #if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_GUEST) || defined(CONFIG_LGUEST_MODULE)
 	BLANK();
 	OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index d875f97..210927e 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -56,6 +56,11 @@ int main(void)
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
 	BLANK();
 
+#ifdef CONFIG_CC_STACKPROTECTOR
+	DEFINE(stack_canary_offset, offsetof(union irq_stack_union, stack_canary));
+	BLANK();
+#endif
+
 	DEFINE(__NR_syscall_max, sizeof(syscalls_64) - 1);
 	DEFINE(NR_syscalls, sizeof(syscalls_64));
 
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index d86be29..4bedbc0 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -133,17 +133,20 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	unsigned long arg, struct task_struct *p, unsigned long tls)
 {
 	struct pt_regs *childregs = task_pt_regs(p);
+	struct fork_frame *fork_frame = container_of(childregs, struct fork_frame, regs);
+	struct inactive_task_frame *frame = &fork_frame->frame;
 	struct task_struct *tsk;
 	int err;
 
-	p->thread.sp = (unsigned long) childregs;
+	frame->bp = 0;
+	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.sp0 = (unsigned long) (childregs+1);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		p->thread.ip = (unsigned long) ret_from_kernel_thread;
+		frame->ret_addr = (unsigned long) ret_from_kernel_thread;
 		task_user_gs(p) = __KERNEL_STACK_CANARY;
 		childregs->ds = __USER_DS;
 		childregs->es = __USER_DS;
@@ -161,7 +164,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (sp)
 		childregs->sp = sp;
 
-	p->thread.ip = (unsigned long) ret_from_fork;
+	frame->ret_addr = (unsigned long) ret_from_fork;
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 
 	p->thread.io_bitmap_ptr = NULL;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 63236d8..827eeed 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -141,12 +141,17 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 {
 	int err;
 	struct pt_regs *childregs;
+	struct fork_frame *fork_frame;
+	struct inactive_task_frame *frame;
 	struct task_struct *me = current;
 
 	p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
 	childregs = task_pt_regs(p);
-	p->thread.sp = (unsigned long) childregs;
-	set_tsk_thread_flag(p, TIF_FORK);
+	fork_frame = container_of(childregs, struct fork_frame, regs);
+	frame = &fork_frame->frame;
+	frame->bp = 0;
+	frame->ret_addr = (unsigned long) ret_from_fork;
+	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap_ptr = NULL;
 
 	savesegment(gs, p->thread.gsindex);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 067de61..a3fefdf 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -935,7 +935,6 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
 	per_cpu(cpu_current_top_of_stack, cpu) =
 		(unsigned long)task_stack_page(idle) + THREAD_SIZE;
 #else
-	clear_tsk_thread_flag(idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
 #endif
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] x86: Pass kernel thread parameters in fork_frame
  2016-08-13 16:38 [PATCH v3 0/7] x86: Rewrite switch_to() Brian Gerst
                   ` (3 preceding siblings ...)
  2016-08-13 16:38 ` [PATCH v3 4/7] x86: Rewrite switch_to() code Brian Gerst
@ 2016-08-13 16:38 ` Brian Gerst
  2016-08-24 13:09   ` [tip:x86/asm] sched/x86: Pass kernel thread parameters in 'struct fork_frame' tip-bot for Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 6/7] x86: Fix thread_saved_pc() Brian Gerst
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2016-08-13 16:38 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

Instead of setting up a fake pt_regs context, put the kernel thread
function pointer and arg into the unused callee-saved registers of
struct fork_frame.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/entry_32.S        | 31 +++++++++++++++----------------
 arch/x86/entry/entry_64.S        | 37 +++++++++++++++++--------------------
 arch/x86/include/asm/switch_to.h |  2 ++
 arch/x86/kernel/process_32.c     | 18 ++++--------------
 arch/x86/kernel/process_64.c     | 12 +++---------
 5 files changed, 41 insertions(+), 59 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index bf8f221..b75a8bc 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -240,35 +240,34 @@ END(__switch_to_asm)
  * A newly forked process directly context switches into this address.
  *
  * eax: prev task we switched from
+ * ebx: kernel thread func (NULL for user thread)
+ * edi: kernel thread arg
  */
 ENTRY(ret_from_fork)
 	pushl	%eax
 	call	schedule_tail
 	popl	%eax
 
+	testl	%ebx, %ebx
+	jnz	1f		/* kernel threads are uncommon */
+
+2:
 	/* When we fork, we trace the syscall return in the child, too. */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
 	jmp     restore_all
-END(ret_from_fork)
-
-ENTRY(ret_from_kernel_thread)
-	pushl	%eax
-	call	schedule_tail
-	popl	%eax
-	movl	PT_EBP(%esp), %eax
-	call	*PT_EBX(%esp)
-	movl	$0, PT_EAX(%esp)
 
+	/* kernel thread */
+1:	movl	%edi, %eax
+	call	*%ebx
 	/*
-	 * Kernel threads return to userspace as if returning from a syscall.
-	 * We should check whether anything actually uses this path and, if so,
-	 * consider switching it over to ret_from_fork.
+	 * A kernel thread is allowed to return here after successfully
+	 * calling do_execve().  Exit to userspace to complete the execve()
+	 * syscall.
 	 */
-	movl    %esp, %eax
-	call    syscall_return_slowpath
-	jmp     restore_all
-ENDPROC(ret_from_kernel_thread)
+	movl	$0, PT_EAX(%esp)
+	jmp	2b
+END(ret_from_fork)
 
 /*
  * Return to user mode is not as complex as all this looks,
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c1af8ac..c0373d6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -407,37 +407,34 @@ END(__switch_to_asm)
  * A newly forked process directly context switches into this address.
  *
  * rax: prev task we switched from
+ * rbx: kernel thread func (NULL for user thread)
+ * r12: kernel thread arg
  */
 ENTRY(ret_from_fork)
 	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
-	testb	$3, CS(%rsp)			/* from kernel_thread? */
-	jnz	1f
-
-	/*
-	 * We came from kernel_thread.  This code path is quite twisted, and
-	 * someone should clean it up.
-	 *
-	 * copy_thread_tls stashes the function pointer in RBX and the
-	 * parameter to be passed in RBP.  The called function is permitted
-	 * to call do_execve and thereby jump to user mode.
-	 */
-	movq	RBP(%rsp), %rdi
-	call	*RBX(%rsp)
-	movl	$0, RAX(%rsp)
-
-	/*
-	 * Fall through as though we're exiting a syscall.  This makes a
-	 * twisted sort of sense if we just called do_execve.
-	 */
+	testq	%rbx, %rbx			/* from kernel_thread? */
+	jnz	1f				/* kernel threads are uncommon */
 
-1:
+2:
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	SWAPGS
 	jmp	restore_regs_and_iret
+
+1:
+	/* kernel thread */
+	movq	%r12, %rdi
+	call	*%rbx
+	/*
+	 * A kernel thread is allowed to return here after successfully
+	 * calling do_execve().  Exit to userspace to complete the execve()
+	 * syscall.
+	 */
+	movq	$0, RAX(%rsp)
+	jmp	2b
 END(ret_from_fork)
 
 /*
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index bf4e2ec..33fb765 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -12,6 +12,8 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);
 
+asmlinkage void ret_from_fork(void);
+
 /* data that is pointed to by thread.sp */
 struct inactive_task_frame {
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4bedbc0..18714a1 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -55,9 +55,6 @@
 #include <asm/switch_to.h>
 #include <asm/vm86.h>
 
-asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
-asmlinkage void ret_from_kernel_thread(void) __asm__("ret_from_kernel_thread");
-
 /*
  * Return saved PC of a blocked thread.
  */
@@ -139,6 +136,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	int err;
 
 	frame->bp = 0;
+	frame->ret_addr = (unsigned long) ret_from_fork;
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.sp0 = (unsigned long) (childregs+1);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
@@ -146,25 +144,17 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		frame->ret_addr = (unsigned long) ret_from_kernel_thread;
-		task_user_gs(p) = __KERNEL_STACK_CANARY;
-		childregs->ds = __USER_DS;
-		childregs->es = __USER_DS;
-		childregs->fs = __KERNEL_PERCPU;
-		childregs->bx = sp;	/* function */
-		childregs->bp = arg;
-		childregs->orig_ax = -1;
-		childregs->cs = __KERNEL_CS | get_kernel_rpl();
-		childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
+		frame->bx = sp;		/* function */
+		frame->di = arg;
 		p->thread.io_bitmap_ptr = NULL;
 		return 0;
 	}
+	frame->bx = 0;
 	*childregs = *current_pt_regs();
 	childregs->ax = 0;
 	if (sp)
 		childregs->sp = sp;
 
-	frame->ret_addr = (unsigned long) ret_from_fork;
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 
 	p->thread.io_bitmap_ptr = NULL;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 827eeed..b812cd0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -50,8 +50,6 @@
 #include <asm/switch_to.h>
 #include <asm/xen/hypervisor.h>
 
-asmlinkage extern void ret_from_fork(void);
-
 __visible DEFINE_PER_CPU(unsigned long, rsp_scratch);
 
 /* Prints also some state that isn't saved in the pt_regs */
@@ -165,15 +163,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		childregs->sp = (unsigned long)childregs;
-		childregs->ss = __KERNEL_DS;
-		childregs->bx = sp; /* function */
-		childregs->bp = arg;
-		childregs->orig_ax = -1;
-		childregs->cs = __KERNEL_CS | get_kernel_rpl();
-		childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
+		frame->bx = sp;		/* function */
+		frame->r12 = arg;
 		return 0;
 	}
+	frame->bx = 0;
 	*childregs = *current_pt_regs();
 
 	childregs->ax = 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 6/7] x86: Fix thread_saved_pc()
  2016-08-13 16:38 [PATCH v3 0/7] x86: Rewrite switch_to() Brian Gerst
                   ` (4 preceding siblings ...)
  2016-08-13 16:38 ` [PATCH v3 5/7] x86: Pass kernel thread parameters in fork_frame Brian Gerst
@ 2016-08-13 16:38 ` Brian Gerst
  2016-08-24 13:09   ` [tip:x86/asm] sched/x86: " tip-bot for Brian Gerst
  2016-08-13 16:38 ` [PATCH v3 7/7] Revert "sched: Mark __schedule() stack frame as non-standard" Brian Gerst
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2016-08-13 16:38 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

thread_saved_pc() was using a completely bogus method to get the return
address.  Since switch_to() was previously inlined, there was no sane way
to know where on the stack the return address was stored.  Now with the
frame of a sleeping thread well defined, this can be implemented correctly.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/processor.h | 10 ++--------
 arch/x86/kernel/process.c        | 11 +++++++++++
 arch/x86/kernel/process_32.c     |  8 --------
 3 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 6fee863..b22fb5a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -721,8 +721,6 @@ static inline void spin_lock_prefetch(const void *x)
 	.addr_limit		= KERNEL_DS,				  \
 }
 
-extern unsigned long thread_saved_pc(struct task_struct *tsk);
-
 /*
  * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
  * This is necessary to guarantee that the entire "struct pt_regs"
@@ -773,17 +771,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
 	.addr_limit		= KERNEL_DS,			\
 }
 
-/*
- * Return saved PC of a blocked thread.
- * What is this good for? it will be always the scheduler or ret_from_fork.
- */
-#define thread_saved_pc(t)	READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
-
 #define task_pt_regs(tsk)	((struct pt_regs *)(tsk)->thread.sp0 - 1)
 extern unsigned long KSTK_ESP(struct task_struct *task);
 
 #endif /* CONFIG_X86_64 */
 
+extern unsigned long thread_saved_pc(struct task_struct *tsk);
+
 extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
 					       unsigned long new_sp);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0115a4a..c1fa790 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -514,6 +514,17 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
 }
 
 /*
+ * Return saved PC of a blocked thread.
+ * What is this good for? it will be always the scheduler or ret_from_fork.
+ */
+unsigned long thread_saved_pc(struct task_struct *tsk)
+{
+	struct inactive_task_frame *frame =
+		(struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
+	return READ_ONCE_NOCHECK(frame->ret_addr);
+}
+
+/*
  * Called from fs/proc with a reference on @p to find the function
  * which called into schedule(). This needs to be done carefully
  * because the task might wake up and we might look at a stack
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 18714a1..404efdf 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -55,14 +55,6 @@
 #include <asm/switch_to.h>
 #include <asm/vm86.h>
 
-/*
- * Return saved PC of a blocked thread.
- */
-unsigned long thread_saved_pc(struct task_struct *tsk)
-{
-	return ((unsigned long *)tsk->thread.sp)[3];
-}
-
 void __show_regs(struct pt_regs *regs, int all)
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 7/7] Revert "sched: Mark __schedule() stack frame as non-standard"
  2016-08-13 16:38 [PATCH v3 0/7] x86: Rewrite switch_to() Brian Gerst
                   ` (5 preceding siblings ...)
  2016-08-13 16:38 ` [PATCH v3 6/7] x86: Fix thread_saved_pc() Brian Gerst
@ 2016-08-13 16:38 ` Brian Gerst
  2016-08-24 13:09   ` [tip:x86/asm] sched: Remove __schedule() non-standard frame annotation tip-bot for Brian Gerst
  2016-08-13 17:16 ` [PATCH v3 0/7] x86: Rewrite switch_to() Linus Torvalds
  2016-08-17 21:20 ` Josh Poimboeuf
  8 siblings, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2016-08-13 16:38 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

Now that the x86 switch_to() uses the standard C calling convention,
STACK_FRAME_NON_STANDARD is no longer needed.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 kernel/sched/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3b6b23c..dbf73db 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3384,7 +3384,6 @@ static void __sched notrace __schedule(bool preempt)
 
 	balance_callback(rq);
 }
-STACK_FRAME_NON_STANDARD(__schedule); /* switch_to() */
 
 static inline void sched_submit_work(struct task_struct *tsk)
 {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-13 16:38 [PATCH v3 0/7] x86: Rewrite switch_to() Brian Gerst
                   ` (6 preceding siblings ...)
  2016-08-13 16:38 ` [PATCH v3 7/7] Revert "sched: Mark __schedule() stack frame as non-standard" Brian Gerst
@ 2016-08-13 17:16 ` Linus Torvalds
  2016-08-13 18:15   ` Brian Gerst
  2016-08-17 21:20 ` Josh Poimboeuf
  8 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2016-08-13 17:16 UTC (permalink / raw)
  To: Brian Gerst
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, Denys Vlasenko, Andy Lutomirski, Borislav Petkov,
	Thomas Gleixner, Josh Poimboeuf

On Sat, Aug 13, 2016 at 9:38 AM, Brian Gerst <brgerst@gmail.com> wrote:
> This patch set simplifies the switch_to() code, by moving the stack switch
> code out of line into an asm stub before calling __switch_to().  This ends
> up being more readable, and using the C calling convention instead of
> clobbering all registers improves code generation.  It also allows newly
> forked processes to construct a special stack frame to seamlessly flow
> to ret_from_fork, instead of using a test and branch, or an unbalanced
> call/ret.

Do you have performance numbers? Is it noticeable/measurable?

             Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-13 17:16 ` [PATCH v3 0/7] x86: Rewrite switch_to() Linus Torvalds
@ 2016-08-13 18:15   ` Brian Gerst
  2016-08-13 18:45     ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2016-08-13 18:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, Denys Vlasenko, Andy Lutomirski, Borislav Petkov,
	Thomas Gleixner, Josh Poimboeuf

On Sat, Aug 13, 2016 at 1:16 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Aug 13, 2016 at 9:38 AM, Brian Gerst <brgerst@gmail.com> wrote:
>> This patch set simplifies the switch_to() code, by moving the stack switch
>> code out of line into an asm stub before calling __switch_to().  This ends
>> up being more readable, and using the C calling convention instead of
>> clobbering all registers improves code generation.  It also allows newly
>> forked processes to construct a special stack frame to seamlessly flow
>> to ret_from_fork, instead of using a test and branch, or an unbalanced
>> call/ret.
>
> Do you have performance numbers? Is it noticeable/measurable?

How do I measure it?  The perf documentation isn't easy to understand.

It shouldn't be a significant change.  On a 64-bit defconfig build,
__schedule() shrinks by 103 bytes.  It's hard to analyse what exactly
changes, but it's likely that GCC can allocate registers better
without all the clobbers of the old inline asm version interfering.
The new stub adds just 39 bytes.
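
(For what it's worth, one way to compare the sizes, assuming both builds
produce a vmlinux with symbols, is something like:

  nm -S vmlinux | grep -w __schedule

and then diffing the size column between the two binaries.)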

--
Brian Gerst

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-13 18:15   ` Brian Gerst
@ 2016-08-13 18:45     ` Ingo Molnar
  2016-08-13 19:33       ` Andy Lutomirski
  2016-08-14 14:18       ` Brian Gerst
  0 siblings, 2 replies; 25+ messages in thread
From: Ingo Molnar @ 2016-08-13 18:45 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, H. Peter Anvin, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf


* Brian Gerst <brgerst@gmail.com> wrote:

> On Sat, Aug 13, 2016 at 1:16 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Sat, Aug 13, 2016 at 9:38 AM, Brian Gerst <brgerst@gmail.com> wrote:
> >> This patch set simplifies the switch_to() code, by moving the stack switch
> >> code out of line into an asm stub before calling __switch_to().  This ends
> >> up being more readable, and using the C calling convention instead of
> >> clobbering all registers improves code generation.  It also allows newly
> >> forked processes to construct a special stack frame to seamlessly flow
> >> to ret_from_fork, instead of using a test and branch, or an unbalanced
> >> call/ret.
> >
> > Do you have performance numbers? Is it noticeable/measurable?
> 
> How do I measure it?  The perf documentation isn't easy to understand.

Something like this:

  taskset 1 perf stat -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe

... will give a very good idea about the general impact of these changes on 
context switch overhead.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-13 18:45     ` Ingo Molnar
@ 2016-08-13 19:33       ` Andy Lutomirski
  2016-08-17  5:16         ` Herbert Xu
  2016-08-14 14:18       ` Brian Gerst
  1 sibling, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2016-08-13 19:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Brian Gerst, Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, H. Peter Anvin, Denys Vlasenko,
	Borislav Petkov, Thomas Gleixner, Josh Poimboeuf

On Sat, Aug 13, 2016 at 11:45 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
>> On Sat, Aug 13, 2016 at 1:16 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> > On Sat, Aug 13, 2016 at 9:38 AM, Brian Gerst <brgerst@gmail.com> wrote:
>> >> This patch set simplifies the switch_to() code, by moving the stack switch
>> >> code out of line into an asm stub before calling __switch_to().  This ends
>> >> up being more readable, and using the C calling convention instead of
>> >> clobbering all registers improves code generation.  It also allows newly
>> >> forked processes to construct a special stack frame to seamlessly flow
>> >> to ret_from_fork, instead of using a test and branch, or an unbalanced
>> >> call/ret.
>> >
>> > Do you have performance numbers? Is it noticeable/measurable?
>>
>> How do I measure it?  The perf documentation isn't easy to understand.
>
> Something like this:
>
>   taskset 1 perf stat -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe
>
> ... will give a very good idea about the general impact of these changes on
> context switch overhead.
>

I will be quite surprised if you can measure any effect at all.  I've
never seen context switches take fewer than ~2k cycles, and on my
laptop, they take 8k-9k cycles.  The scheduler is really, really slow.

(Why doesn't that perf command show cycles per context switch?)
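
(A very rough way to estimate it is to count context switches alongside
cycles and divide, e.g. something like:

  taskset 1 perf stat -e cycles,context-switches -- perf bench sched pipe

and then compute cycles / context-switches.)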

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-13 18:45     ` Ingo Molnar
  2016-08-13 19:33       ` Andy Lutomirski
@ 2016-08-14 14:18       ` Brian Gerst
  2016-08-15  5:10         ` Ingo Molnar
  1 sibling, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2016-08-14 14:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, H. Peter Anvin, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

On Sat, Aug 13, 2016 at 2:45 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
>> On Sat, Aug 13, 2016 at 1:16 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> > On Sat, Aug 13, 2016 at 9:38 AM, Brian Gerst <brgerst@gmail.com> wrote:
>> >> This patch set simplifies the switch_to() code, by moving the stack switch
>> >> code out of line into an asm stub before calling __switch_to().  This ends
>> >> up being more readable, and using the C calling convention instead of
>> >> clobbering all registers improves code generation.  It also allows newly
>> >> forked processes to construct a special stack frame to seamlessly flow
>> >> to ret_from_fork, instead of using a test and branch, or an unbalanced
>> >> call/ret.
>> >
>> > Do you have performance numbers? Is it noticeable/measurable?
>>
>> How do I measure it?  The perf documentation isn't easy to understand.
>
> Something like this:
>
>   taskset 1 perf stat -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe
>
> ... will give a very good idea about the general impact of these changes on
> context switch overhead.

Before:
 Performance counter stats for 'system wide' (10 runs):

    12,010,932,128      instructions              #    1.03  insn per cycle    ( +-  0.31% )
    11,691,797,513      cycles                                                  ( +-  0.76% )

       3.487329979 seconds time elapsed                                         ( +-  0.78% )

After:
 Performance counter stats for 'system wide' (10 runs):

    12,097,706,506      instructions              #    1.04  insn per cycle    ( +-  0.14% )
    11,612,167,742      cycles                                                  ( +-  0.81% )

       3.451278789 seconds time elapsed                                         ( +-  0.82% )

The numbers with or without this patch series are roughly the same.
There is noticeable variation in the numbers each time I run it, so
I'm not sure how good of a benchmark this is.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-14 14:18       ` Brian Gerst
@ 2016-08-15  5:10         ` Ingo Molnar
  2016-08-15 11:43           ` Brian Gerst
  2016-08-17 21:23           ` Andy Lutomirski
  0 siblings, 2 replies; 25+ messages in thread
From: Ingo Molnar @ 2016-08-15  5:10 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, H. Peter Anvin, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf


* Brian Gerst <brgerst@gmail.com> wrote:

> > Something like this:
> >
> >   taskset 1 perf stat -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe
> >
> > ... will give a very good idea about the general impact of these changes on
> > context switch overhead.
> 
> Before:
>  Performance counter stats for 'system wide' (10 runs):
> 
>     12,010,932,128      instructions              #    1.03  insn per
> cycle                                              ( +-  0.31% )
>     11,691,797,513      cycles
>                ( +-  0.76% )
> 
>        3.487329979 seconds time elapsed
>           ( +-  0.78% )
> 
> After:
>  Performance counter stats for 'system wide' (10 runs):
> 
>     12,097,706,506      instructions              #    1.04  insn per
> cycle                                              ( +-  0.14% )
>     11,612,167,742      cycles
>                ( +-  0.81% )
> 
>        3.451278789 seconds time elapsed
>           ( +-  0.82% )
> 
> The numbers with or without this patch series are roughly the same.
> There is noticeable variation in the numbers each time I run it, so
> I'm not sure how good of a benchmark this is.

Weird, I get an order of magnitude lower noise:

 triton:~/tip> taskset 1 perf stat -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe >/dev/null

 Performance counter stats for 'system wide' (10 runs):

    11,503,026,062      instructions              #    1.23  insn per cycle                                              ( +-  2.64% )
     9,377,410,613      cycles                                                        ( +-  2.05% )

       1.669425407 seconds time elapsed                                          ( +-  0.12% )

But note that I also had '--sync' for perf stat and did a >/dev/null at the end to 
make sure no terminal output and subsequent Xorg activities interfere. Also, full 
screen terminal.
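
So the full invocation was something like:

  taskset 1 perf stat --sync -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe >/dev/null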

Maybe try 'taskset 4' as well to put the workload on another CPU, if the first CPU 
is busier than the others?

(Any Hyperthreading on your test system?)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-15  5:10         ` Ingo Molnar
@ 2016-08-15 11:43           ` Brian Gerst
  2016-08-17 21:23           ` Andy Lutomirski
  1 sibling, 0 replies; 25+ messages in thread
From: Brian Gerst @ 2016-08-15 11:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, the arch/x86 maintainers,
	Linux Kernel Mailing List, H. Peter Anvin, Denys Vlasenko,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner,
	Josh Poimboeuf

On Mon, Aug 15, 2016 at 1:10 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
>> > Something like this:
>> >
>> >   taskset 1 perf stat -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe
>> >
>> > ... will give a very good idea about the general impact of these changes on
>> > context switch overhead.
>>
>> Before:
>>  Performance counter stats for 'system wide' (10 runs):
>>
>>     12,010,932,128      instructions              #    1.03  insn per
>> cycle                                              ( +-  0.31% )
>>     11,691,797,513      cycles
>>                ( +-  0.76% )
>>
>>        3.487329979 seconds time elapsed
>>           ( +-  0.78% )
>>
>> After:
>>  Performance counter stats for 'system wide' (10 runs):
>>
>>     12,097,706,506      instructions              #    1.04  insn per
>> cycle                                              ( +-  0.14% )
>>     11,612,167,742      cycles
>>                ( +-  0.81% )
>>
>>        3.451278789 seconds time elapsed
>>           ( +-  0.82% )
>>
>> The numbers with or without this patch series are roughly the same.
>> There is noticeable variation in the numbers each time I run it, so
>> I'm not sure how good of a benchmark this is.
>
> Weird, I get an order of magnitude lower noise:
>
>  triton:~/tip> taskset 1 perf stat -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe >/dev/null
>
>  Performance counter stats for 'system wide' (10 runs):
>
>     11,503,026,062      instructions              #    1.23  insn per cycle                                              ( +-  2.64% )
>      9,377,410,613      cycles                                                        ( +-  2.05% )
>
>        1.669425407 seconds time elapsed                                          ( +-  0.12% )
>
> But note that I also had '--sync' for perf stat and did a >/dev/null at the end to
> make sure no terminal output and subsequent Xorg activities interfere. Also, full
> screen terminal.
>
> Maybe try 'taskset 4' as well to put the workload on another CPU, if the first CPU
> is busier than the others?
>
> (Any Hyperthreading on your test system?)

It is an AMD Phenom(tm) II X6 1055T, no hyperthreading.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-13 19:33       ` Andy Lutomirski
@ 2016-08-17  5:16         ` Herbert Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Herbert Xu @ 2016-08-17  5:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: mingo, brgerst, torvalds, x86, linux-kernel, hpa, dvlasenk, bp,
	tglx, jpoimboe

Andy Lutomirski <luto@amacapital.net> wrote:
>
> I will be quite surprised if you can measure any effect at all.  I've
> never seen context switches take fewer than ~2k cycles, and on my
> laptop, they take 8k-9k cycles.  The scheduler is really, really slow.

Indeed, I think I still have a bugzilla entry somewhere regarding
the huge performance regression when we first switched over to the
current scheduler.

Over the years people have been trying to hack around it, e.g.,
network driver polling, low latency socket, anything to avoid
going into the scheduler.  We really should fix it properly some
day.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-13 16:38 [PATCH v3 0/7] x86: Rewrite switch_to() Brian Gerst
                   ` (7 preceding siblings ...)
  2016-08-13 17:16 ` [PATCH v3 0/7] x86: Rewrite switch_to() Linus Torvalds
@ 2016-08-17 21:20 ` Josh Poimboeuf
  8 siblings, 0 replies; 25+ messages in thread
From: Josh Poimboeuf @ 2016-08-17 21:20 UTC (permalink / raw)
  To: Brian Gerst
  Cc: x86, linux-kernel, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	Denys Vlasenko, Andy Lutomirski, Borislav Petkov,
	Thomas Gleixner

On Sat, Aug 13, 2016 at 12:38:15PM -0400, Brian Gerst wrote:
> This patch set simplifies the switch_to() code, by moving the stack switch
> code out of line into an asm stub before calling __switch_to().  This ends
> up being more readable, and using the C calling convention instead of
> clobbering all registers improves code generation.  It also allows newly
> forked processes to construct a special stack frame to seamlessly flow
> to ret_from_fork, instead of using a test and branch, or an unbalanced
> call/ret.
> 
> Changes from v2:
> - Updated comments around kernel threads being uncommon for fork, etc.
> - Removed STACK_FRAME_NON_STANDARD annotation from __schedule() per Josh Poimboeuf
> - A few minor cleanups added

There are a few minor conflicts with my x86 stack dump patch set, but
for the most part they should be orthogonal.

For the series:

  Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] x86: Rewrite switch_to()
  2016-08-15  5:10         ` Ingo Molnar
  2016-08-15 11:43           ` Brian Gerst
@ 2016-08-17 21:23           ` Andy Lutomirski
  1 sibling, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2016-08-17 21:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Denys Vlasenko, Josh Poimboeuf, Borislav Petkov,
	the arch/x86 maintainers, Linux Kernel Mailing List, Brian Gerst,
	H. Peter Anvin, Linus Torvalds

On Aug 15, 2016 8:10 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
>
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
> > > Something like this:
> > >
> > >   taskset 1 perf stat -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe
> > >
> > > ... will give a very good idea about the general impact of these changes on
> > > context switch overhead.
> >
> > Before:
> >  Performance counter stats for 'system wide' (10 runs):
> >
> >     12,010,932,128      instructions    #  1.03  insn per cycle   ( +-  0.31% )
> >     11,691,797,513      cycles                                    ( +-  0.76% )
> >
> >        3.487329979 seconds time elapsed                           ( +-  0.78% )
> >
> > After:
> >  Performance counter stats for 'system wide' (10 runs):
> >
> >     12,097,706,506      instructions    #  1.04  insn per cycle   ( +-  0.14% )
> >     11,612,167,742      cycles                                    ( +-  0.81% )
> >
> >        3.451278789 seconds time elapsed                           ( +-  0.82% )
> >
> > The numbers with or without this patch series are roughly the same.
> > There is noticeable variation in the numbers each time I run it, so
> > I'm not sure how good of a benchmark this is.
>
> Weird, I get an order of magnitude lower noise:
>
>  triton:~/tip> taskset 1 perf stat -a -e '{instructions,cycles}' --repeat 10 perf bench sched pipe >/dev/null
>
>  Performance counter stats for 'system wide' (10 runs):
>
>     11,503,026,062      instructions              #    1.23  insn per cycle                                              ( +-  2.64% )
>      9,377,410,613      cycles                                                        ( +-  2.05% )
>
>        1.669425407 seconds time elapsed                                          ( +-  0.12% )
>
> But note that I also had '--sync' for perf stat and did a >/dev/null at the end to
> make sure no terminal output and subsequent Xorg activities interfere. Also, full
> screen terminal.
>
> Maybe try 'taskset 4' as well to put the workload on another CPU, if the first CPU
> is busier than the others?
>
> (Any Hyperthreading on your test system?)
>

I've never investigated for real, but I suspect that cgroups are a big
part of it.  If you do a regular perf recording, I think you'll find
that nearly all of the time is in the scheduler.
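
As a rough illustration of what "perf bench sched pipe" exercises (two tasks
bouncing a byte across a pair of pipes, so every round trip costs roughly two
context switches), here is a minimal stand-alone sketch.  It is not the actual
perf source; LOOPS, now_ns() and the variable names are made up for the
example:

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

#define LOOPS 100000

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	int ping[2], pong[2];
	char c = 0;
	long long t0, t1;

	if (pipe(ping) || pipe(pong))
		return 1;

	if (fork() == 0) {
		/* child: echo every byte straight back */
		for (int i = 0; i < LOOPS; i++) {
			if (read(ping[0], &c, 1) != 1 ||
			    write(pong[1], &c, 1) != 1)
				break;
		}
		_exit(0);
	}

	for (int i = 0, warm = 0; i < 2; i++)
		(void)warm;	/* no warm-up needed for a rough number */

	t0 = now_ns();
	for (int i = 0; i < LOOPS; i++) {
		/* each write/read pair forces roughly two context switches */
		write(ping[1], &c, 1);
		read(pong[0], &c, 1);
	}
	t1 = now_ns();

	wait(NULL);
	printf("%.1f ns per round trip (~2 context switches)\n",
	       (double)(t1 - t0) / LOOPS);
	return 0;
}

Pinned to a single CPU (e.g. under taskset, as in the commands quoted above),
dividing the round-trip time by two gives a ballpark per-switch cost in the
same range as the 2k-9k cycle figures mentioned earlier.  For what it's worth,
the before/after cycle counts quoted above differ by only about 0.7%, which is
inside the quoted +- 0.8% run-to-run variation, so "roughly the same" is the
right reading.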

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip:x86/asm] sched/x86/32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()
  2016-08-13 16:38 ` [PATCH v3 1/7] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
@ 2016-08-24 13:07   ` tip-bot for Brian Gerst
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Brian Gerst @ 2016-08-24 13:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, jpoimboe, brgerst, hpa, mingo, bp, tglx, linux-kernel,
	luto, peterz, dvlasenk, jason.wessel

Commit-ID:  4e047aa7f267c3449b6d323510d35864829aca70
Gitweb:     http://git.kernel.org/tip/4e047aa7f267c3449b6d323510d35864829aca70
Author:     Brian Gerst <brgerst@gmail.com>
AuthorDate: Sat, 13 Aug 2016 12:38:16 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Aug 2016 12:27:40 +0200

sched/x86/32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()

Match 64-bit and set gdb_regs[GDB_PC] to zero.  thread.ip is always the
same point in the scheduler (except for newly forked processes), and will
be removed in a future patch.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/kgdb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 04cde52..fe649a5 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -172,7 +172,6 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_ES]	= __KERNEL_DS;
 	gdb_regs[GDB_PS]	= 0;
 	gdb_regs[GDB_CS]	= __KERNEL_CS;
-	gdb_regs[GDB_PC]	= p->thread.ip;
 	gdb_regs[GDB_SS]	= __KERNEL_DS;
 	gdb_regs[GDB_FS]	= 0xFFFF;
 	gdb_regs[GDB_GS]	= 0xFFFF;
@@ -180,7 +179,6 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs32[GDB_PS]	= *(unsigned long *)(p->thread.sp + 8);
 	gdb_regs32[GDB_CS]	= __KERNEL_CS;
 	gdb_regs32[GDB_SS]	= __KERNEL_DS;
-	gdb_regs[GDB_PC]	= 0;
 	gdb_regs[GDB_R8]	= 0;
 	gdb_regs[GDB_R9]	= 0;
 	gdb_regs[GDB_R10]	= 0;
@@ -190,6 +188,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_R14]	= 0;
 	gdb_regs[GDB_R15]	= 0;
 #endif
+	gdb_regs[GDB_PC]	= 0;
 	gdb_regs[GDB_SP]	= p->thread.sp;
 }
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] sched/x86/64, kgdb: Clear GDB_PS on 64-bit
  2016-08-13 16:38 ` [PATCH v3 2/7] x86-64, kgdb: clear GDB_PS on 64-bit Brian Gerst
@ 2016-08-24 13:07   ` tip-bot for Brian Gerst
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Brian Gerst @ 2016-08-24 13:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, dvlasenk, hpa, peterz, brgerst, luto, bp,
	jason.wessel, mingo, jpoimboe, tglx, torvalds

Commit-ID:  163630191ecb0dd9e4146d3c910045aba1cfeec1
Gitweb:     http://git.kernel.org/tip/163630191ecb0dd9e4146d3c910045aba1cfeec1
Author:     Brian Gerst <brgerst@gmail.com>
AuthorDate: Sat, 13 Aug 2016 12:38:17 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Aug 2016 12:27:40 +0200

sched/x86/64, kgdb: Clear GDB_PS on 64-bit

switch_to() no longer saves EFLAGS, so it's bogus to look for it on the
stack.  Set it to zero like 32-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/kgdb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index fe649a5..5e3f294 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -176,7 +176,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_FS]	= 0xFFFF;
 	gdb_regs[GDB_GS]	= 0xFFFF;
 #else
-	gdb_regs32[GDB_PS]	= *(unsigned long *)(p->thread.sp + 8);
+	gdb_regs32[GDB_PS]	= 0;
 	gdb_regs32[GDB_CS]	= __KERNEL_CS;
 	gdb_regs32[GDB_SS]	= __KERNEL_DS;
 	gdb_regs[GDB_R8]	= 0;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] sched/x86: Add 'struct inactive_task_frame' to better document the sleeping task stack frame
  2016-08-13 16:38 ` [PATCH v3 3/7] x86: Add struct inactive_task_frame Brian Gerst
@ 2016-08-24 13:08   ` tip-bot for Brian Gerst
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Brian Gerst @ 2016-08-24 13:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, brgerst, torvalds, mingo, luto, jpoimboe, dvlasenk,
	tglx, hpa, bp, peterz

Commit-ID:  7b32aeadbc95d4a41402c1c0da6aa3ab51af4c10
Gitweb:     http://git.kernel.org/tip/7b32aeadbc95d4a41402c1c0da6aa3ab51af4c10
Author:     Brian Gerst <brgerst@gmail.com>
AuthorDate: Sat, 13 Aug 2016 12:38:18 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Aug 2016 12:27:41 +0200

sched/x86: Add 'struct inactive_task_frame' to better document the sleeping task stack frame

Add 'struct inactive_task_frame', which defines the layout of the stack for
a sleeping process.  For now, the only defined field is the BP register
(frame pointer).

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/stacktrace.h | 4 ++--
 arch/x86/include/asm/switch_to.h  | 5 +++++
 arch/x86/kernel/kgdb.c            | 3 ++-
 arch/x86/kernel/process.c         | 3 ++-
 4 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 0944218..7646fb2 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -8,6 +8,7 @@
 
 #include <linux/uaccess.h>
 #include <linux/ptrace.h>
+#include <asm/switch_to.h>
 
 extern int kstack_depth_to_print;
 
@@ -70,8 +71,7 @@ stack_frame(struct task_struct *task, struct pt_regs *regs)
 		return bp;
 	}
 
-	/* bp is the last reg pushed by switch_to */
-	return *(unsigned long *)task->thread.sp;
+	return ((struct inactive_task_frame *)task->thread.sp)->bp;
 }
 #else
 static inline unsigned long
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 14e4b20..ec689c6 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -30,6 +30,11 @@ static inline void prepare_switch_to(struct task_struct *prev,
 #endif
 }
 
+/* data that is pointed to by thread.sp */
+struct inactive_task_frame {
+	unsigned long bp;
+};
+
 #ifdef CONFIG_X86_32
 
 #ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 5e3f294..8e36f24 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -50,6 +50,7 @@
 #include <asm/apicdef.h>
 #include <asm/apic.h>
 #include <asm/nmi.h>
+#include <asm/switch_to.h>
 
 struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
 {
@@ -166,7 +167,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	gdb_regs[GDB_DX]	= 0;
 	gdb_regs[GDB_SI]	= 0;
 	gdb_regs[GDB_DI]	= 0;
-	gdb_regs[GDB_BP]	= *(unsigned long *)p->thread.sp;
+	gdb_regs[GDB_BP]	= ((struct inactive_task_frame *)p->thread.sp)->bp;
 #ifdef CONFIG_X86_32
 	gdb_regs[GDB_DS]	= __KERNEL_DS;
 	gdb_regs[GDB_ES]	= __KERNEL_DS;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 62c0b0e..0115a4a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -32,6 +32,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/vm86.h>
+#include <asm/switch_to.h>
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -556,7 +557,7 @@ unsigned long get_wchan(struct task_struct *p)
 	if (sp < bottom || sp > top)
 		return 0;
 
-	fp = READ_ONCE_NOCHECK(*(unsigned long *)sp);
+	fp = READ_ONCE_NOCHECK(((struct inactive_task_frame *)sp)->bp);
 	do {
 		if (fp < bottom || fp > top)
 			return 0;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] sched/x86: Rewrite the switch_to() code
  2016-08-13 16:38 ` [PATCH v3 4/7] x86: Rewrite switch_to() code Brian Gerst
@ 2016-08-24 13:08   ` tip-bot for Brian Gerst
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Brian Gerst @ 2016-08-24 13:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, torvalds, hpa, peterz, brgerst, mingo, dvlasenk,
	bp, tglx, jpoimboe, luto

Commit-ID:  0100301bfdf56a2a370c7157b5ab0fbf9313e1cd
Gitweb:     http://git.kernel.org/tip/0100301bfdf56a2a370c7157b5ab0fbf9313e1cd
Author:     Brian Gerst <brgerst@gmail.com>
AuthorDate: Sat, 13 Aug 2016 12:38:19 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Aug 2016 12:31:41 +0200

sched/x86: Rewrite the switch_to() code

Move the low-level context switch code to an out-of-line asm stub instead of
using complex inline asm.  This allows constructing a new stack frame for the
child process to make it seamlessly flow to ret_from_fork without an extra
test and branch in __switch_to().  It also improves code generation for
__schedule() by using the C calling convention instead of clobbering all
registers.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_32.S          |  37 ++++++++++
 arch/x86/entry/entry_64.S          |  41 ++++++++++-
 arch/x86/include/asm/processor.h   |   3 -
 arch/x86/include/asm/switch_to.h   | 139 ++++++-------------------------------
 arch/x86/include/asm/thread_info.h |   2 -
 arch/x86/kernel/asm-offsets.c      |   6 ++
 arch/x86/kernel/asm-offsets_32.c   |   5 ++
 arch/x86/kernel/asm-offsets_64.c   |   5 ++
 arch/x86/kernel/process_32.c       |   9 ++-
 arch/x86/kernel/process_64.c       |   9 ++-
 arch/x86/kernel/smpboot.c          |   1 -
 11 files changed, 125 insertions(+), 132 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 0b56666..bf8f221 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -204,6 +204,43 @@
 	POP_GS_EX
 .endm
 
+/*
+ * %eax: prev task
+ * %edx: next task
+ */
+ENTRY(__switch_to_asm)
+	/*
+	 * Save callee-saved registers
+	 * This must match the order in struct inactive_task_frame
+	 */
+	pushl	%ebp
+	pushl	%ebx
+	pushl	%edi
+	pushl	%esi
+
+	/* switch stack */
+	movl	%esp, TASK_threadsp(%eax)
+	movl	TASK_threadsp(%edx), %esp
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+	movl	TASK_stack_canary(%edx), %ebx
+	movl	%ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
+#endif
+
+	/* restore callee-saved registers */
+	popl	%esi
+	popl	%edi
+	popl	%ebx
+	popl	%ebp
+
+	jmp	__switch_to
+END(__switch_to_asm)
+
+/*
+ * A newly forked process directly context switches into this address.
+ *
+ * eax: prev task we switched from
+ */
 ENTRY(ret_from_fork)
 	pushl	%eax
 	call	schedule_tail
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f6b40e5..c1af8ac 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -368,13 +368,48 @@ END(ptregs_\func)
 #include <asm/syscalls_64.h>
 
 /*
+ * %rdi: prev task
+ * %rsi: next task
+ */
+ENTRY(__switch_to_asm)
+	/*
+	 * Save callee-saved registers
+	 * This must match the order in inactive_task_frame
+	 */
+	pushq	%rbp
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+
+	/* switch stack */
+	movq	%rsp, TASK_threadsp(%rdi)
+	movq	TASK_threadsp(%rsi), %rsp
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+	movq	TASK_stack_canary(%rsi), %rbx
+	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+#endif
+
+	/* restore callee-saved registers */
+	popq	%r15
+	popq	%r14
+	popq	%r13
+	popq	%r12
+	popq	%rbx
+	popq	%rbp
+
+	jmp	__switch_to
+END(__switch_to_asm)
+
+/*
  * A newly forked process directly context switches into this address.
  *
- * rdi: prev task we switched from
+ * rax: prev task we switched from
  */
 ENTRY(ret_from_fork)
-	LOCK ; btr $TIF_FORK, TI_flags(%r8)
-
+	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
 	testb	$3, CS(%rsp)			/* from kernel_thread? */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 63def95..6fee863 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -389,9 +389,6 @@ struct thread_struct {
 	unsigned short		fsindex;
 	unsigned short		gsindex;
 #endif
-#ifdef CONFIG_X86_32
-	unsigned long		ip;
-#endif
 #ifdef CONFIG_X86_64
 	unsigned long		fsbase;
 	unsigned long		gsbase;
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index ec689c6..886d5ea 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -2,8 +2,12 @@
 #define _ASM_X86_SWITCH_TO_H
 
 struct task_struct; /* one of the stranger aspects of C forward declarations */
+
+struct task_struct *__switch_to_asm(struct task_struct *prev,
+				    struct task_struct *next);
+
 __visible struct task_struct *__switch_to(struct task_struct *prev,
-					   struct task_struct *next);
+					  struct task_struct *next);
 struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);
@@ -32,131 +36,30 @@ static inline void prepare_switch_to(struct task_struct *prev,
 
 /* data that is pointed to by thread.sp */
 struct inactive_task_frame {
+#ifdef CONFIG_X86_64
+	unsigned long r15;
+	unsigned long r14;
+	unsigned long r13;
+	unsigned long r12;
+#else
+	unsigned long si;
+	unsigned long di;
+#endif
+	unsigned long bx;
 	unsigned long bp;
+	unsigned long ret_addr;
 };
 
-#ifdef CONFIG_X86_32
-
-#ifdef CONFIG_CC_STACKPROTECTOR
-#define __switch_canary							\
-	"movl %P[task_canary](%[next]), %%ebx\n\t"			\
-	"movl %%ebx, "__percpu_arg([stack_canary])"\n\t"
-#define __switch_canary_oparam						\
-	, [stack_canary] "=m" (stack_canary.canary)
-#define __switch_canary_iparam						\
-	, [task_canary] "i" (offsetof(struct task_struct, stack_canary))
-#else	/* CC_STACKPROTECTOR */
-#define __switch_canary
-#define __switch_canary_oparam
-#define __switch_canary_iparam
-#endif	/* CC_STACKPROTECTOR */
+struct fork_frame {
+	struct inactive_task_frame frame;
+	struct pt_regs regs;
+};
 
-/*
- * Saving eflags is important. It switches not only IOPL between tasks,
- * it also protects other tasks from NT leaking through sysenter etc.
- */
 #define switch_to(prev, next, last)					\
 do {									\
-	/*								\
-	 * Context-switching clobbers all registers, so we clobber	\
-	 * them explicitly, via unused output variables.		\
-	 * (EAX and EBP is not listed because EBP is saved/restored	\
-	 * explicitly for wchan access and EAX is the return value of	\
-	 * __switch_to())						\
-	 */								\
-	unsigned long ebx, ecx, edx, esi, edi;				\
-									\
 	prepare_switch_to(prev, next);					\
 									\
-	asm volatile("pushl %%ebp\n\t"		/* save    EBP   */	\
-		     "movl %%esp,%[prev_sp]\n\t"	/* save    ESP   */ \
-		     "movl %[next_sp],%%esp\n\t"	/* restore ESP   */ \
-		     "movl $1f,%[prev_ip]\n\t"	/* save    EIP   */	\
-		     "pushl %[next_ip]\n\t"	/* restore EIP   */	\
-		     __switch_canary					\
-		     "jmp __switch_to\n"	/* regparm call  */	\
-		     "1:\t"						\
-		     "popl %%ebp\n\t"		/* restore EBP   */	\
-									\
-		     /* output parameters */				\
-		     : [prev_sp] "=m" (prev->thread.sp),		\
-		       [prev_ip] "=m" (prev->thread.ip),		\
-		       "=a" (last),					\
-									\
-		       /* clobbered output registers: */		\
-		       "=b" (ebx), "=c" (ecx), "=d" (edx),		\
-		       "=S" (esi), "=D" (edi)				\
-		       							\
-		       __switch_canary_oparam				\
-									\
-		       /* input parameters: */				\
-		     : [next_sp]  "m" (next->thread.sp),		\
-		       [next_ip]  "m" (next->thread.ip),		\
-		       							\
-		       /* regparm parameters for __switch_to(): */	\
-		       [prev]     "a" (prev),				\
-		       [next]     "d" (next)				\
-									\
-		       __switch_canary_iparam				\
-									\
-		     : /* reloaded segment registers */			\
-			"memory");					\
+	((last) = __switch_to_asm((prev), (next)));			\
 } while (0)
 
-#else /* CONFIG_X86_32 */
-
-/* frame pointer must be last for get_wchan */
-#define SAVE_CONTEXT    "pushq %%rbp ; movq %%rsi,%%rbp\n\t"
-#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp\t"
-
-#define __EXTRA_CLOBBER  \
-	, "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
-	  "r12", "r13", "r14", "r15", "flags"
-
-#ifdef CONFIG_CC_STACKPROTECTOR
-#define __switch_canary							  \
-	"movq %P[task_canary](%%rsi),%%r8\n\t"				  \
-	"movq %%r8,"__percpu_arg([gs_canary])"\n\t"
-#define __switch_canary_oparam						  \
-	, [gs_canary] "=m" (irq_stack_union.stack_canary)
-#define __switch_canary_iparam						  \
-	, [task_canary] "i" (offsetof(struct task_struct, stack_canary))
-#else	/* CC_STACKPROTECTOR */
-#define __switch_canary
-#define __switch_canary_oparam
-#define __switch_canary_iparam
-#endif	/* CC_STACKPROTECTOR */
-
-/*
- * There is no need to save or restore flags, because flags are always
- * clean in kernel mode, with the possible exception of IOPL.  Kernel IOPL
- * has no effect.
- */
-#define switch_to(prev, next, last)					  \
-	prepare_switch_to(prev, next);					  \
-									  \
-	asm volatile(SAVE_CONTEXT					  \
-	     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */	  \
-	     "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */	  \
-	     "call __switch_to\n\t"					  \
-	     "movq "__percpu_arg([current_task])",%%rsi\n\t"		  \
-	     __switch_canary						  \
-	     "movq %P[thread_info](%%rsi),%%r8\n\t"			  \
-	     "movq %%rax,%%rdi\n\t" 					  \
-	     "testl  %[_tif_fork],%P[ti_flags](%%r8)\n\t"		  \
-	     "jnz   ret_from_fork\n\t"					  \
-	     RESTORE_CONTEXT						  \
-	     : "=a" (last)					  	  \
-	       __switch_canary_oparam					  \
-	     : [next] "S" (next), [prev] "D" (prev),			  \
-	       [threadrsp] "i" (offsetof(struct task_struct, thread.sp)), \
-	       [ti_flags] "i" (offsetof(struct thread_info, flags)),	  \
-	       [_tif_fork] "i" (_TIF_FORK),			  	  \
-	       [thread_info] "i" (offsetof(struct task_struct, stack)),   \
-	       [current_task] "m" (current_task)			  \
-	       __switch_canary_iparam					  \
-	     : "memory", "cc" __EXTRA_CLOBBER)
-
-#endif /* CONFIG_X86_32 */
-
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 8b7c8d8..494c4b5 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -95,7 +95,6 @@ struct thread_info {
 #define TIF_UPROBE		12	/* breakpointed or singlestepping */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
-#define TIF_FORK		18	/* ret_from_fork */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -119,7 +118,6 @@ struct thread_info {
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
-#define _TIF_FORK		(1 << TIF_FORK)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 2bd5c6f..db3a0af 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -29,6 +29,12 @@
 
 void common(void) {
 	BLANK();
+	OFFSET(TASK_threadsp, task_struct, thread.sp);
+#ifdef CONFIG_CC_STACKPROTECTOR
+	OFFSET(TASK_stack_canary, task_struct, stack_canary);
+#endif
+
+	BLANK();
 	OFFSET(TI_flags, thread_info, flags);
 	OFFSET(TI_status, thread_info, status);
 
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index ecdc1d2..880aa09 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -57,6 +57,11 @@ void foo(void)
 	/* Size of SYSENTER_stack */
 	DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
 
+#ifdef CONFIG_CC_STACKPROTECTOR
+	BLANK();
+	OFFSET(stack_canary_offset, stack_canary, canary);
+#endif
+
 #if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_GUEST) || defined(CONFIG_LGUEST_MODULE)
 	BLANK();
 	OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index d875f97..210927e 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -56,6 +56,11 @@ int main(void)
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
 	BLANK();
 
+#ifdef CONFIG_CC_STACKPROTECTOR
+	DEFINE(stack_canary_offset, offsetof(union irq_stack_union, stack_canary));
+	BLANK();
+#endif
+
 	DEFINE(__NR_syscall_max, sizeof(syscalls_64) - 1);
 	DEFINE(NR_syscalls, sizeof(syscalls_64));
 
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index d86be29..4bedbc0 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -133,17 +133,20 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	unsigned long arg, struct task_struct *p, unsigned long tls)
 {
 	struct pt_regs *childregs = task_pt_regs(p);
+	struct fork_frame *fork_frame = container_of(childregs, struct fork_frame, regs);
+	struct inactive_task_frame *frame = &fork_frame->frame;
 	struct task_struct *tsk;
 	int err;
 
-	p->thread.sp = (unsigned long) childregs;
+	frame->bp = 0;
+	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.sp0 = (unsigned long) (childregs+1);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		p->thread.ip = (unsigned long) ret_from_kernel_thread;
+		frame->ret_addr = (unsigned long) ret_from_kernel_thread;
 		task_user_gs(p) = __KERNEL_STACK_CANARY;
 		childregs->ds = __USER_DS;
 		childregs->es = __USER_DS;
@@ -161,7 +164,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (sp)
 		childregs->sp = sp;
 
-	p->thread.ip = (unsigned long) ret_from_fork;
+	frame->ret_addr = (unsigned long) ret_from_fork;
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 
 	p->thread.io_bitmap_ptr = NULL;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 63236d8..827eeed 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -141,12 +141,17 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 {
 	int err;
 	struct pt_regs *childregs;
+	struct fork_frame *fork_frame;
+	struct inactive_task_frame *frame;
 	struct task_struct *me = current;
 
 	p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
 	childregs = task_pt_regs(p);
-	p->thread.sp = (unsigned long) childregs;
-	set_tsk_thread_flag(p, TIF_FORK);
+	fork_frame = container_of(childregs, struct fork_frame, regs);
+	frame = &fork_frame->frame;
+	frame->bp = 0;
+	frame->ret_addr = (unsigned long) ret_from_fork;
+	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap_ptr = NULL;
 
 	savesegment(gs, p->thread.gsindex);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c85d2c6..7e52f83 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -942,7 +942,6 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
 	per_cpu(cpu_current_top_of_stack, cpu) =
 		(unsigned long)task_stack_page(idle) + THREAD_SIZE;
 #else
-	clear_tsk_thread_flag(idle, TIF_FORK);
 	initial_gs = per_cpu_offset(cpu);
 #endif
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] sched/x86: Pass kernel thread parameters in 'struct fork_frame'
  2016-08-13 16:38 ` [PATCH v3 5/7] x86: Pass kernel thread parameters in fork_frame Brian Gerst
@ 2016-08-24 13:09   ` tip-bot for Brian Gerst
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Brian Gerst @ 2016-08-24 13:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, peterz, linux-kernel, bp, dvlasenk, luto, mingo, tglx,
	jpoimboe, torvalds, brgerst

Commit-ID:  616d24835eeafa8ef3466479db028abfdfc77531
Gitweb:     http://git.kernel.org/tip/616d24835eeafa8ef3466479db028abfdfc77531
Author:     Brian Gerst <brgerst@gmail.com>
AuthorDate: Sat, 13 Aug 2016 12:38:20 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Aug 2016 12:31:50 +0200

sched/x86: Pass kernel thread parameters in 'struct fork_frame'

Instead of setting up a fake pt_regs context, put the kernel thread
function pointer and arg into the unused callee-restored registers
of 'struct fork_frame'.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-6-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_32.S        | 31 +++++++++++++++----------------
 arch/x86/entry/entry_64.S        | 37 +++++++++++++++++--------------------
 arch/x86/include/asm/switch_to.h |  2 ++
 arch/x86/kernel/process_32.c     | 18 ++++--------------
 arch/x86/kernel/process_64.c     | 12 +++---------
 5 files changed, 41 insertions(+), 59 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index bf8f221..b75a8bc 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -240,35 +240,34 @@ END(__switch_to_asm)
  * A newly forked process directly context switches into this address.
  *
  * eax: prev task we switched from
+ * ebx: kernel thread func (NULL for user thread)
+ * edi: kernel thread arg
  */
 ENTRY(ret_from_fork)
 	pushl	%eax
 	call	schedule_tail
 	popl	%eax
 
+	testl	%ebx, %ebx
+	jnz	1f		/* kernel threads are uncommon */
+
+2:
 	/* When we fork, we trace the syscall return in the child, too. */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
 	jmp     restore_all
-END(ret_from_fork)
-
-ENTRY(ret_from_kernel_thread)
-	pushl	%eax
-	call	schedule_tail
-	popl	%eax
-	movl	PT_EBP(%esp), %eax
-	call	*PT_EBX(%esp)
-	movl	$0, PT_EAX(%esp)
 
+	/* kernel thread */
+1:	movl	%edi, %eax
+	call	*%ebx
 	/*
-	 * Kernel threads return to userspace as if returning from a syscall.
-	 * We should check whether anything actually uses this path and, if so,
-	 * consider switching it over to ret_from_fork.
+	 * A kernel thread is allowed to return here after successfully
+	 * calling do_execve().  Exit to userspace to complete the execve()
+	 * syscall.
 	 */
-	movl    %esp, %eax
-	call    syscall_return_slowpath
-	jmp     restore_all
-ENDPROC(ret_from_kernel_thread)
+	movl	$0, PT_EAX(%esp)
+	jmp	2b
+END(ret_from_fork)
 
 /*
  * Return to user mode is not as complex as all this looks,
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c1af8ac..c0373d6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -407,37 +407,34 @@ END(__switch_to_asm)
  * A newly forked process directly context switches into this address.
  *
  * rax: prev task we switched from
+ * rbx: kernel thread func (NULL for user thread)
+ * r12: kernel thread arg
  */
 ENTRY(ret_from_fork)
 	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
-	testb	$3, CS(%rsp)			/* from kernel_thread? */
-	jnz	1f
-
-	/*
-	 * We came from kernel_thread.  This code path is quite twisted, and
-	 * someone should clean it up.
-	 *
-	 * copy_thread_tls stashes the function pointer in RBX and the
-	 * parameter to be passed in RBP.  The called function is permitted
-	 * to call do_execve and thereby jump to user mode.
-	 */
-	movq	RBP(%rsp), %rdi
-	call	*RBX(%rsp)
-	movl	$0, RAX(%rsp)
-
-	/*
-	 * Fall through as though we're exiting a syscall.  This makes a
-	 * twisted sort of sense if we just called do_execve.
-	 */
+	testq	%rbx, %rbx			/* from kernel_thread? */
+	jnz	1f				/* kernel threads are uncommon */
 
-1:
+2:
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	SWAPGS
 	jmp	restore_regs_and_iret
+
+1:
+	/* kernel thread */
+	movq	%r12, %rdi
+	call	*%rbx
+	/*
+	 * A kernel thread is allowed to return here after successfully
+	 * calling do_execve().  Exit to userspace to complete the execve()
+	 * syscall.
+	 */
+	movq	$0, RAX(%rsp)
+	jmp	2b
 END(ret_from_fork)
 
 /*
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 886d5ea..5cb436a 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -34,6 +34,8 @@ static inline void prepare_switch_to(struct task_struct *prev,
 #endif
 }
 
+asmlinkage void ret_from_fork(void);
+
 /* data that is pointed to by thread.sp */
 struct inactive_task_frame {
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4bedbc0..18714a1 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -55,9 +55,6 @@
 #include <asm/switch_to.h>
 #include <asm/vm86.h>
 
-asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
-asmlinkage void ret_from_kernel_thread(void) __asm__("ret_from_kernel_thread");
-
 /*
  * Return saved PC of a blocked thread.
  */
@@ -139,6 +136,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	int err;
 
 	frame->bp = 0;
+	frame->ret_addr = (unsigned long) ret_from_fork;
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.sp0 = (unsigned long) (childregs+1);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
@@ -146,25 +144,17 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		frame->ret_addr = (unsigned long) ret_from_kernel_thread;
-		task_user_gs(p) = __KERNEL_STACK_CANARY;
-		childregs->ds = __USER_DS;
-		childregs->es = __USER_DS;
-		childregs->fs = __KERNEL_PERCPU;
-		childregs->bx = sp;	/* function */
-		childregs->bp = arg;
-		childregs->orig_ax = -1;
-		childregs->cs = __KERNEL_CS | get_kernel_rpl();
-		childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
+		frame->bx = sp;		/* function */
+		frame->di = arg;
 		p->thread.io_bitmap_ptr = NULL;
 		return 0;
 	}
+	frame->bx = 0;
 	*childregs = *current_pt_regs();
 	childregs->ax = 0;
 	if (sp)
 		childregs->sp = sp;
 
-	frame->ret_addr = (unsigned long) ret_from_fork;
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 
 	p->thread.io_bitmap_ptr = NULL;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 827eeed..b812cd0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -50,8 +50,6 @@
 #include <asm/switch_to.h>
 #include <asm/xen/hypervisor.h>
 
-asmlinkage extern void ret_from_fork(void);
-
 __visible DEFINE_PER_CPU(unsigned long, rsp_scratch);
 
 /* Prints also some state that isn't saved in the pt_regs */
@@ -165,15 +163,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		childregs->sp = (unsigned long)childregs;
-		childregs->ss = __KERNEL_DS;
-		childregs->bx = sp; /* function */
-		childregs->bp = arg;
-		childregs->orig_ax = -1;
-		childregs->cs = __KERNEL_CS | get_kernel_rpl();
-		childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
+		frame->bx = sp;		/* function */
+		frame->r12 = arg;
 		return 0;
 	}
+	frame->bx = 0;
 	*childregs = *current_pt_regs();
 
 	childregs->ax = 0;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] sched/x86: Fix thread_saved_pc()
  2016-08-13 16:38 ` [PATCH v3 6/7] x86: Fix thread_saved_pc() Brian Gerst
@ 2016-08-24 13:09   ` tip-bot for Brian Gerst
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Brian Gerst @ 2016-08-24 13:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, dvlasenk, hpa, brgerst, tglx, luto, jpoimboe, bp, mingo,
	torvalds, linux-kernel

Commit-ID:  ffcb043ba524d3fbd979a9dac2c9ce8ad352000d
Gitweb:     http://git.kernel.org/tip/ffcb043ba524d3fbd979a9dac2c9ce8ad352000d
Author:     Brian Gerst <brgerst@gmail.com>
AuthorDate: Sat, 13 Aug 2016 12:38:21 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Aug 2016 12:31:51 +0200

sched/x86: Fix thread_saved_pc()

thread_saved_pc() was using a completely bogus method to get the return
address.  Since switch_to() was previously inlined, there was no sane way
to know where on the stack the return address was stored.  Now with the
frame of a sleeping thread well defined, this can be implemented correctly.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-7-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/processor.h | 10 ++--------
 arch/x86/kernel/process.c        | 11 +++++++++++
 arch/x86/kernel/process_32.c     |  8 --------
 3 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 6fee863..b22fb5a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -721,8 +721,6 @@ static inline void spin_lock_prefetch(const void *x)
 	.addr_limit		= KERNEL_DS,				  \
 }
 
-extern unsigned long thread_saved_pc(struct task_struct *tsk);
-
 /*
  * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
  * This is necessary to guarantee that the entire "struct pt_regs"
@@ -773,17 +771,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
 	.addr_limit		= KERNEL_DS,			\
 }
 
-/*
- * Return saved PC of a blocked thread.
- * What is this good for? it will be always the scheduler or ret_from_fork.
- */
-#define thread_saved_pc(t)	READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
-
 #define task_pt_regs(tsk)	((struct pt_regs *)(tsk)->thread.sp0 - 1)
 extern unsigned long KSTK_ESP(struct task_struct *task);
 
 #endif /* CONFIG_X86_64 */
 
+extern unsigned long thread_saved_pc(struct task_struct *tsk);
+
 extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
 					       unsigned long new_sp);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0115a4a..c1fa790 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -514,6 +514,17 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
 }
 
 /*
+ * Return saved PC of a blocked thread.
+ * What is this good for? it will be always the scheduler or ret_from_fork.
+ */
+unsigned long thread_saved_pc(struct task_struct *tsk)
+{
+	struct inactive_task_frame *frame =
+		(struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
+	return READ_ONCE_NOCHECK(frame->ret_addr);
+}
+
+/*
  * Called from fs/proc with a reference on @p to find the function
  * which called into schedule(). This needs to be done carefully
  * because the task might wake up and we might look at a stack
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 18714a1..404efdf 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -55,14 +55,6 @@
 #include <asm/switch_to.h>
 #include <asm/vm86.h>
 
-/*
- * Return saved PC of a blocked thread.
- */
-unsigned long thread_saved_pc(struct task_struct *tsk)
-{
-	return ((unsigned long *)tsk->thread.sp)[3];
-}
-
 void __show_regs(struct pt_regs *regs, int all)
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:x86/asm] sched: Remove __schedule() non-standard frame annotation
  2016-08-13 16:38 ` [PATCH v3 7/7] Revert "sched: Mark __schedule() stack frame as non-standard" Brian Gerst
@ 2016-08-24 13:09   ` tip-bot for Brian Gerst
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Brian Gerst @ 2016-08-24 13:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, mingo, dvlasenk, bp, jpoimboe, brgerst, peterz, tglx,
	linux-kernel, luto, hpa

Commit-ID:  01175255fd8e3e993353a779f819ec8c0c59137e
Gitweb:     http://git.kernel.org/tip/01175255fd8e3e993353a779f819ec8c0c59137e
Author:     Brian Gerst <brgerst@gmail.com>
AuthorDate: Sat, 13 Aug 2016 12:38:22 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Aug 2016 12:31:51 +0200

sched: Remove __schedule() non-standard frame annotation

Now that the x86 switch_to() uses the standard C calling convention,
the STACK_FRAME_NON_STANDARD() annotation is no longer needed.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-8-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a906f2..3d91b63dd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3381,7 +3381,6 @@ static void __sched notrace __schedule(bool preempt)
 
 	balance_callback(rq);
 }
-STACK_FRAME_NON_STANDARD(__schedule); /* switch_to() */
 
 static inline void sched_submit_work(struct task_struct *tsk)
 {

^ permalink raw reply related	[flat|nested] 25+ messages in thread

