* [PATCH v2 0/6] x86: Rewrite switch_to()
@ 2016-06-18 20:56 Brian Gerst
2016-06-18 20:56 ` [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
` (6 more replies)
0 siblings, 7 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
Borislav Petkov, Thomas Gleixner
This patch set simplifies the switch_to() code, by moving the stack switch
code out of line into an asm stub before calling __switch_to(). This ends
up being more readable, and using the C calling convention instead of
clobbering all registers improves code generation. It also allows newly
forked processes to construct a special stack frame to seamlessly flow
to ret_from_fork, instead of using a test and branch, or an unbalanced
call/ret.
Changes from v1:
- Added struct inactive_task_frame
- Added comments about kernel threads returning to userspace
- Cleaned up some incorrect uses of thread.sp
Brian Gerst (6):
x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()
x86-64, kgdb: clear GDB_PS on 64-bit
x86: Add struct inactive_task_frame
x86: Rewrite switch_to() code
x86: Pass kernel thread parameters in fork_frame
x86: Fix thread_saved_pc()
arch/x86/entry/entry_32.S | 68 +++++++++++++-----
arch/x86/entry/entry_64.S | 76 ++++++++++++++------
arch/x86/include/asm/processor.h | 13 +---
arch/x86/include/asm/stacktrace.h | 4 +-
arch/x86/include/asm/switch_to.h | 142 +++++++------------------------------
arch/x86/include/asm/thread_info.h | 2 -
arch/x86/kernel/asm-offsets.c | 6 ++
arch/x86/kernel/asm-offsets_32.c | 5 ++
arch/x86/kernel/asm-offsets_64.c | 5 ++
arch/x86/kernel/kgdb.c | 8 +--
arch/x86/kernel/process.c | 13 +++-
arch/x86/kernel/process_32.c | 29 +++-----
arch/x86/kernel/process_64.c | 19 ++---
arch/x86/kernel/smpboot.c | 1 -
14 files changed, 187 insertions(+), 204 deletions(-)
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()
2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
2016-06-18 20:56 ` [PATCH v2 2/6] x86-64, kgdb: clear GDB_PS on 64-bit Brian Gerst
` (5 subsequent siblings)
6 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
Borislav Petkov, Thomas Gleixner
Match 64-bit and set gdb_regs[GDB_PC] to zero. thread.ip is always the
same point in the scheduler (except for newly forked processes), and will
be removed in a future patch.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
arch/x86/kernel/kgdb.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 04cde52..fe649a5 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -172,7 +172,6 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
gdb_regs[GDB_ES] = __KERNEL_DS;
gdb_regs[GDB_PS] = 0;
gdb_regs[GDB_CS] = __KERNEL_CS;
- gdb_regs[GDB_PC] = p->thread.ip;
gdb_regs[GDB_SS] = __KERNEL_DS;
gdb_regs[GDB_FS] = 0xFFFF;
gdb_regs[GDB_GS] = 0xFFFF;
@@ -180,7 +179,6 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
gdb_regs32[GDB_PS] = *(unsigned long *)(p->thread.sp + 8);
gdb_regs32[GDB_CS] = __KERNEL_CS;
gdb_regs32[GDB_SS] = __KERNEL_DS;
- gdb_regs[GDB_PC] = 0;
gdb_regs[GDB_R8] = 0;
gdb_regs[GDB_R9] = 0;
gdb_regs[GDB_R10] = 0;
@@ -190,6 +188,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
gdb_regs[GDB_R14] = 0;
gdb_regs[GDB_R15] = 0;
#endif
+ gdb_regs[GDB_PC] = 0;
gdb_regs[GDB_SP] = p->thread.sp;
}
--
2.5.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 2/6] x86-64, kgdb: clear GDB_PS on 64-bit
2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
2016-06-18 20:56 ` [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
` (4 subsequent siblings)
6 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
Borislav Petkov, Thomas Gleixner
switch_to() no longer saves EFLAGS, so it's bogus to look for it on the
stack. Set it to zero like 32-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
arch/x86/kernel/kgdb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index fe649a5..5e3f294 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -176,7 +176,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
gdb_regs[GDB_FS] = 0xFFFF;
gdb_regs[GDB_GS] = 0xFFFF;
#else
- gdb_regs32[GDB_PS] = *(unsigned long *)(p->thread.sp + 8);
+ gdb_regs32[GDB_PS] = 0;
gdb_regs32[GDB_CS] = __KERNEL_CS;
gdb_regs32[GDB_SS] = __KERNEL_DS;
gdb_regs[GDB_R8] = 0;
--
2.5.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 3/6] x86: Add struct inactive_task_frame
2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
2016-06-18 20:56 ` [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
2016-06-18 20:56 ` [PATCH v2 2/6] x86-64, kgdb: clear GDB_PS on 64-bit Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
2016-06-19 21:18 ` Andy Lutomirski
2016-06-20 15:39 ` Josh Poimboeuf
2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
` (3 subsequent siblings)
6 siblings, 2 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
Borislav Petkov, Thomas Gleixner
Add struct inactive_task_frame, which defines the layout of the stack for
a sleeping process. For now, the only defined field is the BP register
(frame pointer).
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
arch/x86/include/asm/stacktrace.h | 4 ++--
arch/x86/include/asm/switch_to.h | 5 +++++
arch/x86/kernel/kgdb.c | 3 ++-
arch/x86/kernel/process.c | 3 ++-
4 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 7c247e7..fb4a078 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -8,6 +8,7 @@
#include <linux/uaccess.h>
#include <linux/ptrace.h>
+#include <asm/switch_to.h>
extern int kstack_depth_to_print;
@@ -70,8 +71,7 @@ stack_frame(struct task_struct *task, struct pt_regs *regs)
return bp;
}
- /* bp is the last reg pushed by switch_to */
- return *(unsigned long *)task->thread.sp;
+ return ((struct inactive_task_frame *)task->thread.sp)->bp;
}
#else
static inline unsigned long
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 8f321a1..02de86e 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,11 @@ struct tss_struct;
void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
struct tss_struct *tss);
+/* data that is pointed to by thread.sp */
+struct inactive_task_frame {
+ unsigned long bp;
+};
+
#ifdef CONFIG_X86_32
#ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 5e3f294..8e36f24 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -50,6 +50,7 @@
#include <asm/apicdef.h>
#include <asm/apic.h>
#include <asm/nmi.h>
+#include <asm/switch_to.h>
struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
{
@@ -166,7 +167,7 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
gdb_regs[GDB_DX] = 0;
gdb_regs[GDB_SI] = 0;
gdb_regs[GDB_DI] = 0;
- gdb_regs[GDB_BP] = *(unsigned long *)p->thread.sp;
+ gdb_regs[GDB_BP] = ((struct inactive_task_frame *)p->thread.sp)->bp;
#ifdef CONFIG_X86_32
gdb_regs[GDB_DS] = __KERNEL_DS;
gdb_regs[GDB_ES] = __KERNEL_DS;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 96becbb..00ebab0 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -31,6 +31,7 @@
#include <asm/tlbflush.h>
#include <asm/mce.h>
#include <asm/vm86.h>
+#include <asm/switch_to.h>
/*
* per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -555,7 +556,7 @@ unsigned long get_wchan(struct task_struct *p)
if (sp < bottom || sp > top)
return 0;
- fp = READ_ONCE_NOCHECK(*(unsigned long *)sp);
+ fp = READ_ONCE_NOCHECK(((struct inactive_task_frame *)sp)->bp);
do {
if (fp < bottom || fp > top)
return 0;
--
2.5.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 4/6] x86: Rewrite switch_to() code
2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
` (2 preceding siblings ...)
2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
2016-06-19 21:22 ` Andy Lutomirski
2016-06-20 15:44 ` Josh Poimboeuf
2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
` (2 subsequent siblings)
6 siblings, 2 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
Borislav Petkov, Thomas Gleixner
Move the low-level context switch code to an out-of-line asm stub instead of
using complex inline asm. This allows constructing a new stack frame for the
child process to make it seamlessly flow to ret_from_fork without an extra
test and branch in __switch_to(). It also improves code generation for
__schedule() by using the C calling convention instead of clobbering all
registers.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
arch/x86/entry/entry_32.S | 37 ++++++++++
arch/x86/entry/entry_64.S | 41 ++++++++++-
arch/x86/include/asm/processor.h | 3 -
arch/x86/include/asm/switch_to.h | 137 ++++++-------------------------------
arch/x86/include/asm/thread_info.h | 2 -
arch/x86/kernel/asm-offsets.c | 6 ++
arch/x86/kernel/asm-offsets_32.c | 5 ++
arch/x86/kernel/asm-offsets_64.c | 5 ++
arch/x86/kernel/process_32.c | 9 ++-
arch/x86/kernel/process_64.c | 9 ++-
arch/x86/kernel/smpboot.c | 1 -
11 files changed, 125 insertions(+), 130 deletions(-)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 983e5d3..e8302b5 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -204,6 +204,43 @@
POP_GS_EX
.endm
+/*
+ * %eax: prev task
+ * %edx: next task
+ */
+ENTRY(__switch_to_asm)
+ /*
+ * Save callee-saved registers
+ * This must match the order in struct inactive_task_frame
+ */
+ pushl %ebp
+ pushl %ebx
+ pushl %edi
+ pushl %esi
+
+ /* switch stack */
+ movl %esp, TASK_threadsp(%eax)
+ movl TASK_threadsp(%edx), %esp
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+ movl TASK_stack_canary(%edx), %ebx
+ movl %ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
+#endif
+
+ /* restore callee-saved registers */
+ popl %esi
+ popl %edi
+ popl %ebx
+ popl %ebp
+
+ jmp __switch_to
+END(__switch_to_asm)
+
+/*
+ * A newly forked process directly context switches into this address.
+ *
+ * eax: prev task we switched from
+ */
ENTRY(ret_from_fork)
pushl %eax
call schedule_tail
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9ee0da1..7574528 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -365,13 +365,48 @@ END(ptregs_\func)
#include <asm/syscalls_64.h>
/*
+ * %rdi: prev task
+ * %rsi: next task
+ */
+ENTRY(__switch_to_asm)
+ /*
+ * Save callee-saved registers
+ * This must match the order in inactive_task_frame
+ */
+ pushq %rbp
+ pushq %rbx
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+
+ /* switch stack */
+ movq %rsp, TASK_threadsp(%rdi)
+ movq TASK_threadsp(%rsi), %rsp
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+ movq TASK_stack_canary(%rsi), %rbx
+ movq %rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+#endif
+
+ /* restore callee-saved registers */
+ popq %r15
+ popq %r14
+ popq %r13
+ popq %r12
+ popq %rbx
+ popq %rbp
+
+ jmp __switch_to
+END(__switch_to_asm)
+
+/*
* A newly forked process directly context switches into this address.
*
- * rdi: prev task we switched from
+ * rax: prev task we switched from
*/
ENTRY(ret_from_fork)
- LOCK ; btr $TIF_FORK, TI_flags(%r8)
-
+ movq %rax, %rdi
call schedule_tail /* rdi: 'prev' task parameter */
testb $3, CS(%rsp) /* from kernel_thread? */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 965c5d2..1e7d634 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -385,9 +385,6 @@ struct thread_struct {
unsigned short fsindex;
unsigned short gsindex;
#endif
-#ifdef CONFIG_X86_32
- unsigned long ip;
-#endif
#ifdef CONFIG_X86_64
unsigned long fsbase;
unsigned long gsbase;
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 02de86e..bf4e2ec 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -2,135 +2,40 @@
#define _ASM_X86_SWITCH_TO_H
struct task_struct; /* one of the stranger aspects of C forward declarations */
+
+struct task_struct *__switch_to_asm(struct task_struct *prev,
+ struct task_struct *next);
+
__visible struct task_struct *__switch_to(struct task_struct *prev,
- struct task_struct *next);
+ struct task_struct *next);
struct tss_struct;
void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
struct tss_struct *tss);
/* data that is pointed to by thread.sp */
struct inactive_task_frame {
+#ifdef CONFIG_X86_64
+ unsigned long r15;
+ unsigned long r14;
+ unsigned long r13;
+ unsigned long r12;
+#else
+ unsigned long si;
+ unsigned long di;
+#endif
+ unsigned long bx;
unsigned long bp;
+ unsigned long ret_addr;
};
-#ifdef CONFIG_X86_32
-
-#ifdef CONFIG_CC_STACKPROTECTOR
-#define __switch_canary \
- "movl %P[task_canary](%[next]), %%ebx\n\t" \
- "movl %%ebx, "__percpu_arg([stack_canary])"\n\t"
-#define __switch_canary_oparam \
- , [stack_canary] "=m" (stack_canary.canary)
-#define __switch_canary_iparam \
- , [task_canary] "i" (offsetof(struct task_struct, stack_canary))
-#else /* CC_STACKPROTECTOR */
-#define __switch_canary
-#define __switch_canary_oparam
-#define __switch_canary_iparam
-#endif /* CC_STACKPROTECTOR */
+struct fork_frame {
+ struct inactive_task_frame frame;
+ struct pt_regs regs;
+};
-/*
- * Saving eflags is important. It switches not only IOPL between tasks,
- * it also protects other tasks from NT leaking through sysenter etc.
- */
#define switch_to(prev, next, last) \
do { \
- /* \
- * Context-switching clobbers all registers, so we clobber \
- * them explicitly, via unused output variables. \
- * (EAX and EBP is not listed because EBP is saved/restored \
- * explicitly for wchan access and EAX is the return value of \
- * __switch_to()) \
- */ \
- unsigned long ebx, ecx, edx, esi, edi; \
- \
- asm volatile("pushl %%ebp\n\t" /* save EBP */ \
- "movl %%esp,%[prev_sp]\n\t" /* save ESP */ \
- "movl %[next_sp],%%esp\n\t" /* restore ESP */ \
- "movl $1f,%[prev_ip]\n\t" /* save EIP */ \
- "pushl %[next_ip]\n\t" /* restore EIP */ \
- __switch_canary \
- "jmp __switch_to\n" /* regparm call */ \
- "1:\t" \
- "popl %%ebp\n\t" /* restore EBP */ \
- \
- /* output parameters */ \
- : [prev_sp] "=m" (prev->thread.sp), \
- [prev_ip] "=m" (prev->thread.ip), \
- "=a" (last), \
- \
- /* clobbered output registers: */ \
- "=b" (ebx), "=c" (ecx), "=d" (edx), \
- "=S" (esi), "=D" (edi) \
- \
- __switch_canary_oparam \
- \
- /* input parameters: */ \
- : [next_sp] "m" (next->thread.sp), \
- [next_ip] "m" (next->thread.ip), \
- \
- /* regparm parameters for __switch_to(): */ \
- [prev] "a" (prev), \
- [next] "d" (next) \
- \
- __switch_canary_iparam \
- \
- : /* reloaded segment registers */ \
- "memory"); \
+ ((last) = __switch_to_asm((prev), (next))); \
} while (0)
-#else /* CONFIG_X86_32 */
-
-/* frame pointer must be last for get_wchan */
-#define SAVE_CONTEXT "pushq %%rbp ; movq %%rsi,%%rbp\n\t"
-#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp\t"
-
-#define __EXTRA_CLOBBER \
- , "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
- "r12", "r13", "r14", "r15", "flags"
-
-#ifdef CONFIG_CC_STACKPROTECTOR
-#define __switch_canary \
- "movq %P[task_canary](%%rsi),%%r8\n\t" \
- "movq %%r8,"__percpu_arg([gs_canary])"\n\t"
-#define __switch_canary_oparam \
- , [gs_canary] "=m" (irq_stack_union.stack_canary)
-#define __switch_canary_iparam \
- , [task_canary] "i" (offsetof(struct task_struct, stack_canary))
-#else /* CC_STACKPROTECTOR */
-#define __switch_canary
-#define __switch_canary_oparam
-#define __switch_canary_iparam
-#endif /* CC_STACKPROTECTOR */
-
-/*
- * There is no need to save or restore flags, because flags are always
- * clean in kernel mode, with the possible exception of IOPL. Kernel IOPL
- * has no effect.
- */
-#define switch_to(prev, next, last) \
- asm volatile(SAVE_CONTEXT \
- "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */ \
- "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */ \
- "call __switch_to\n\t" \
- "movq "__percpu_arg([current_task])",%%rsi\n\t" \
- __switch_canary \
- "movq %P[thread_info](%%rsi),%%r8\n\t" \
- "movq %%rax,%%rdi\n\t" \
- "testl %[_tif_fork],%P[ti_flags](%%r8)\n\t" \
- "jnz ret_from_fork\n\t" \
- RESTORE_CONTEXT \
- : "=a" (last) \
- __switch_canary_oparam \
- : [next] "S" (next), [prev] "D" (prev), \
- [threadrsp] "i" (offsetof(struct task_struct, thread.sp)), \
- [ti_flags] "i" (offsetof(struct thread_info, flags)), \
- [_tif_fork] "i" (_TIF_FORK), \
- [thread_info] "i" (offsetof(struct task_struct, stack)), \
- [current_task] "m" (current_task) \
- __switch_canary_iparam \
- : "memory", "cc" __EXTRA_CLOBBER)
-
-#endif /* CONFIG_X86_32 */
-
#endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 30c133a..20d56ec 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -99,7 +99,6 @@ struct thread_info {
#define TIF_UPROBE 12 /* breakpointed or singlestepping */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
-#define TIF_FORK 18 /* ret_from_fork */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -123,7 +122,6 @@ struct thread_info {
#define _TIF_UPROBE (1 << TIF_UPROBE)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
-#define _TIF_FORK (1 << TIF_FORK)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 674134e..ec41c79 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -29,6 +29,12 @@
void common(void) {
BLANK();
+ OFFSET(TASK_threadsp, task_struct, thread.sp);
+#ifdef CONFIG_CC_STACKPROTECTOR
+ OFFSET(TASK_stack_canary, task_struct, stack_canary);
+#endif
+
+ BLANK();
OFFSET(TI_flags, thread_info, flags);
OFFSET(TI_status, thread_info, status);
OFFSET(TI_addr_limit, thread_info, addr_limit);
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index ecdc1d2..880aa09 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -57,6 +57,11 @@ void foo(void)
/* Size of SYSENTER_stack */
DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
+#ifdef CONFIG_CC_STACKPROTECTOR
+ BLANK();
+ OFFSET(stack_canary_offset, stack_canary, canary);
+#endif
+
#if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_GUEST) || defined(CONFIG_LGUEST_MODULE)
BLANK();
OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index d875f97..210927e 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -56,6 +56,11 @@ int main(void)
OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
BLANK();
+#ifdef CONFIG_CC_STACKPROTECTOR
+ DEFINE(stack_canary_offset, offsetof(union irq_stack_union, stack_canary));
+ BLANK();
+#endif
+
DEFINE(__NR_syscall_max, sizeof(syscalls_64) - 1);
DEFINE(NR_syscalls, sizeof(syscalls_64));
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 9f95091..81a82f5 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -133,17 +133,20 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
unsigned long arg, struct task_struct *p, unsigned long tls)
{
struct pt_regs *childregs = task_pt_regs(p);
+ struct fork_frame *fork_frame = container_of(childregs, struct fork_frame, regs);
+ struct inactive_task_frame *frame = &fork_frame->frame;
struct task_struct *tsk;
int err;
- p->thread.sp = (unsigned long) childregs;
+ frame->bp = 0;
+ p->thread.sp = (unsigned long) fork_frame;
p->thread.sp0 = (unsigned long) (childregs+1);
memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
if (unlikely(p->flags & PF_KTHREAD)) {
/* kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
- p->thread.ip = (unsigned long) ret_from_kernel_thread;
+ frame->ret_addr = (unsigned long) ret_from_kernel_thread;
task_user_gs(p) = __KERNEL_STACK_CANARY;
childregs->ds = __USER_DS;
childregs->es = __USER_DS;
@@ -161,7 +164,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
if (sp)
childregs->sp = sp;
- p->thread.ip = (unsigned long) ret_from_fork;
+ frame->ret_addr = (unsigned long) ret_from_fork;
task_user_gs(p) = get_user_gs(current_pt_regs());
p->thread.io_bitmap_ptr = NULL;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6e789ca..3ac9522 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -141,12 +141,17 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
{
int err;
struct pt_regs *childregs;
+ struct fork_frame *fork_frame;
+ struct inactive_task_frame *frame;
struct task_struct *me = current;
p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
childregs = task_pt_regs(p);
- p->thread.sp = (unsigned long) childregs;
- set_tsk_thread_flag(p, TIF_FORK);
+ fork_frame = container_of(childregs, struct fork_frame, regs);
+ frame = &fork_frame->frame;
+ frame->bp = 0;
+ frame->ret_addr = (unsigned long) ret_from_fork;
+ p->thread.sp = (unsigned long) fork_frame;
p->thread.io_bitmap_ptr = NULL;
savesegment(gs, p->thread.gsindex);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2ed0ec1..021eb02 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -935,7 +935,6 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
per_cpu(cpu_current_top_of_stack, cpu) =
(unsigned long)task_stack_page(idle) + THREAD_SIZE;
#else
- clear_tsk_thread_flag(idle, TIF_FORK);
initial_gs = per_cpu_offset(cpu);
#endif
}
--
2.5.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
` (3 preceding siblings ...)
2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
2016-06-19 21:28 ` Andy Lutomirski
2016-06-20 13:51 ` Borislav Petkov
2016-06-18 20:56 ` [PATCH v2 6/6] x86: Fix thread_saved_pc() Brian Gerst
2016-06-19 22:05 ` [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
6 siblings, 2 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
Borislav Petkov, Thomas Gleixner
Instead of setting up a fake pt_regs context, put the kernel thread
function pointer and arg into the unused callee-restored registers
of struct fork_frame.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
arch/x86/entry/entry_32.S | 31 +++++++++++++++----------------
arch/x86/entry/entry_64.S | 35 ++++++++++++++++-------------------
arch/x86/kernel/process_32.c | 14 ++++----------
arch/x86/kernel/process_64.c | 10 +++-------
4 files changed, 38 insertions(+), 52 deletions(-)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e8302b5..e8a460d 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -240,35 +240,34 @@ END(__switch_to_asm)
* A newly forked process directly context switches into this address.
*
* eax: prev task we switched from
+ * ebx: kernel thread func
+ * edi: kernel thread arg
*/
ENTRY(ret_from_fork)
pushl %eax
call schedule_tail
popl %eax
+ testl %ebx, %ebx
+ jnz 1f
+
+2:
/* When we fork, we trace the syscall return in the child, too. */
movl %esp, %eax
call syscall_return_slowpath
jmp restore_all
-END(ret_from_fork)
-
-ENTRY(ret_from_kernel_thread)
- pushl %eax
- call schedule_tail
- popl %eax
- movl PT_EBP(%esp), %eax
- call *PT_EBX(%esp)
- movl $0, PT_EAX(%esp)
+ /* kernel thread */
+1: movl %edi, %eax
+ call *%ebx
/*
- * Kernel threads return to userspace as if returning from a syscall.
- * We should check whether anything actually uses this path and, if so,
- * consider switching it over to ret_from_fork.
+ * A kernel thread is allowed to return here after successfully
+ * calling do_execve(). Exit to userspace to complete the execve()
+ * syscall.
*/
- movl %esp, %eax
- call syscall_return_slowpath
- jmp restore_all
-ENDPROC(ret_from_kernel_thread)
+ movl $0, PT_EAX(%esp)
+ jmp 2b
+END(ret_from_fork)
/*
* Return to user mode is not as complex as all this looks,
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 7574528..23e764c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -404,37 +404,34 @@ END(__switch_to_asm)
* A newly forked process directly context switches into this address.
*
* rax: prev task we switched from
+ * rbx: kernel thread func
+ * r12: kernel thread arg
*/
ENTRY(ret_from_fork)
movq %rax, %rdi
call schedule_tail /* rdi: 'prev' task parameter */
- testb $3, CS(%rsp) /* from kernel_thread? */
+ testq %rbx, %rbx /* from kernel_thread? */
jnz 1f
- /*
- * We came from kernel_thread. This code path is quite twisted, and
- * someone should clean it up.
- *
- * copy_thread_tls stashes the function pointer in RBX and the
- * parameter to be passed in RBP. The called function is permitted
- * to call do_execve and thereby jump to user mode.
- */
- movq RBP(%rsp), %rdi
- call *RBX(%rsp)
- movl $0, RAX(%rsp)
-
- /*
- * Fall through as though we're exiting a syscall. This makes a
- * twisted sort of sense if we just called do_execve.
- */
-
-1:
+2:
movq %rsp, %rdi
call syscall_return_slowpath /* returns with IRQs disabled */
TRACE_IRQS_ON /* user mode is traced as IRQS on */
SWAPGS
jmp restore_regs_and_iret
+
+1:
+ /* kernel thread */
+ movq %r12, %rdi
+ call *%rbx
+ /*
+ * A kernel thread is allowed to return here after successfully
+ * calling do_execve(). Exit to userspace to complete the execve()
+ * syscall.
+ */
+ movq $0, RAX(%rsp)
+ jmp 2b
END(ret_from_fork)
/*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 81a82f5..acee6ca 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -139,6 +139,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
int err;
frame->bp = 0;
+ frame->ret_addr = (unsigned long) ret_from_fork;
p->thread.sp = (unsigned long) fork_frame;
p->thread.sp0 = (unsigned long) (childregs+1);
memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
@@ -146,19 +147,12 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
if (unlikely(p->flags & PF_KTHREAD)) {
/* kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
- frame->ret_addr = (unsigned long) ret_from_kernel_thread;
- task_user_gs(p) = __KERNEL_STACK_CANARY;
- childregs->ds = __USER_DS;
- childregs->es = __USER_DS;
- childregs->fs = __KERNEL_PERCPU;
- childregs->bx = sp; /* function */
- childregs->bp = arg;
- childregs->orig_ax = -1;
- childregs->cs = __KERNEL_CS | get_kernel_rpl();
- childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
+ frame->bx = sp; /* function */
+ frame->di = arg;
p->thread.io_bitmap_ptr = NULL;
return 0;
}
+ frame->bx = 0;
*childregs = *current_pt_regs();
childregs->ax = 0;
if (sp)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3ac9522..479cae5f 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -165,15 +165,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
if (unlikely(p->flags & PF_KTHREAD)) {
/* kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
- childregs->sp = (unsigned long)childregs;
- childregs->ss = __KERNEL_DS;
- childregs->bx = sp; /* function */
- childregs->bp = arg;
- childregs->orig_ax = -1;
- childregs->cs = __KERNEL_CS | get_kernel_rpl();
- childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
+ frame->bx = sp; /* function */
+ frame->r12 = arg;
return 0;
}
+ frame->bx = 0;
*childregs = *current_pt_regs();
childregs->ax = 0;
--
2.5.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 6/6] x86: Fix thread_saved_pc()
2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
` (4 preceding siblings ...)
2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
@ 2016-06-18 20:56 ` Brian Gerst
2016-06-20 16:01 ` Josh Poimboeuf
2016-06-19 22:05 ` [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
6 siblings, 1 reply; 22+ messages in thread
From: Brian Gerst @ 2016-06-18 20:56 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
Borislav Petkov, Thomas Gleixner
thread_saved_pc() was using a completely bogus method to get the return
address. Since switch_to() was previously inlined, there was no sane way
to know where on the stack the return address was stored. Now with the
frame of a sleeping thread well defined, this can be implemented correctly.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
arch/x86/include/asm/processor.h | 10 ++--------
arch/x86/kernel/process.c | 10 ++++++++++
arch/x86/kernel/process_32.c | 8 --------
3 files changed, 12 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1e7d634..413f4f1 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -716,8 +716,6 @@ static inline void spin_lock_prefetch(const void *x)
.io_bitmap_ptr = NULL, \
}
-extern unsigned long thread_saved_pc(struct task_struct *tsk);
-
/*
* TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
* This is necessary to guarantee that the entire "struct pt_regs"
@@ -767,17 +765,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
.sp0 = TOP_OF_INIT_STACK \
}
-/*
- * Return saved PC of a blocked thread.
- * What is this good for? it will be always the scheduler or ret_from_fork.
- */
-#define thread_saved_pc(t) READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
-
#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
extern unsigned long KSTK_ESP(struct task_struct *task);
#endif /* CONFIG_X86_64 */
+extern unsigned long thread_saved_pc(struct task_struct *tsk);
+
extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
unsigned long new_sp);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 00ebab0..db458c4 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -513,6 +513,16 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
}
/*
+ * Return saved PC of a blocked thread.
+ */
+unsigned long thread_saved_pc(struct task_struct *tsk)
+{
+ struct inactive_task_frame *frame =
+ (struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
+ return READ_ONCE_NOCHECK(frame->ret_addr);
+}
+
+/*
* Called from fs/proc with a reference on @p to find the function
* which called into schedule(). This needs to be done carefully
* because the task might wake up and we might look at a stack
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index acee6ca..bcf0a38 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -58,14 +58,6 @@
asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
asmlinkage void ret_from_kernel_thread(void) __asm__("ret_from_kernel_thread");
-/*
- * Return saved PC of a blocked thread.
- */
-unsigned long thread_saved_pc(struct task_struct *tsk)
-{
- return ((unsigned long *)tsk->thread.sp)[3];
-}
-
void __show_regs(struct pt_regs *regs, int all)
{
unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
--
2.5.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v2 3/6] x86: Add struct inactive_task_frame
2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
@ 2016-06-19 21:18 ` Andy Lutomirski
2016-06-20 15:39 ` Josh Poimboeuf
1 sibling, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2016-06-19 21:18 UTC (permalink / raw)
To: Brian Gerst
Cc: X86 ML, linux-kernel, Ingo Molnar, H. Peter Anvin,
Denys Vlasenko, Borislav Petkov, Thomas Gleixner
On Sat, Jun 18, 2016 at 1:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
> Add struct inactive_task_frame, which defines the layout of the stack for
> a sleeping process. For now, the only defined field is the BP register
> (frame pointer).
I like this version.
Reviewed-by: Andy Lutomirski <luto@kernel.org>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 4/6] x86: Rewrite switch_to() code
2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
@ 2016-06-19 21:22 ` Andy Lutomirski
2016-06-20 15:44 ` Josh Poimboeuf
1 sibling, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2016-06-19 21:22 UTC (permalink / raw)
To: Brian Gerst
Cc: X86 ML, linux-kernel, Ingo Molnar, H. Peter Anvin,
Denys Vlasenko, Borislav Petkov, Thomas Gleixner
On Sat, Jun 18, 2016 at 1:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
> Move the low-level context switch code to an out-of-line asm stub instead of
> using complex inline asm. This allows constructing a new stack frame for the
> child process to make it seamlessly flow to ret_from_fork without an extra
> test and branch in __switch_to(). It also improves code generation for
> __schedule() by using the C calling convention instead of clobbering all
> registers.
This is great. The conflict with my stack code is minimal enough that
I won't bother rebasing unless someone specifically wants me to.
Acked-by: Andy Lutomirski <luto@kernel.org>
--Andy
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
@ 2016-06-19 21:28 ` Andy Lutomirski
2016-06-19 22:01 ` Brian Gerst
2016-06-20 13:51 ` Borislav Petkov
1 sibling, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2016-06-19 21:28 UTC (permalink / raw)
To: Brian Gerst
Cc: X86 ML, linux-kernel, Ingo Molnar, H. Peter Anvin,
Denys Vlasenko, Borislav Petkov, Thomas Gleixner
On Sat, Jun 18, 2016 at 1:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
> Instead of setting up a fake pt_regs context, put the kernel thread
> function pointer and arg into the unused callee-restored registers
> of struct fork_frame.
This seems generally okay.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> @@ -146,19 +147,12 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
> if (unlikely(p->flags & PF_KTHREAD)) {
> /* kernel thread */
> memset(childregs, 0, sizeof(struct pt_regs));
> - frame->ret_addr = (unsigned long) ret_from_kernel_thread;
> - task_user_gs(p) = __KERNEL_STACK_CANARY;
> - childregs->ds = __USER_DS;
> - childregs->es = __USER_DS;
> - childregs->fs = __KERNEL_PERCPU;
Is the idea that do_execve promises to initialize all these fields to
something sensible if the kernel thread in question tries to return to
user mode?
--Andy
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
2016-06-19 21:28 ` Andy Lutomirski
@ 2016-06-19 22:01 ` Brian Gerst
0 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-19 22:01 UTC (permalink / raw)
To: Andy Lutomirski
Cc: X86 ML, linux-kernel, Ingo Molnar, H. Peter Anvin,
Denys Vlasenko, Borislav Petkov, Thomas Gleixner
On Sun, Jun 19, 2016 at 5:28 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Sat, Jun 18, 2016 at 1:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
>> Instead of setting up a fake pt_regs context, put the kernel thread
>> function pointer and arg into the unused callee-restored registers
>> of struct fork_frame.
>
> This seems generally okay.
>
>>
>> Signed-off-by: Brian Gerst <brgerst@gmail.com>
>
>> @@ -146,19 +147,12 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
>
>
>> if (unlikely(p->flags & PF_KTHREAD)) {
>> /* kernel thread */
>> memset(childregs, 0, sizeof(struct pt_regs));
>> - frame->ret_addr = (unsigned long) ret_from_kernel_thread;
>> - task_user_gs(p) = __KERNEL_STACK_CANARY;
>> - childregs->ds = __USER_DS;
>> - childregs->es = __USER_DS;
>> - childregs->fs = __KERNEL_PERCPU;
>
> Is the idea that do_execve promises to initialize all these fields to
> something sensible if the kernel thread in question tries to return to
> user mode?
>
> --Andy
Yes, do_execve() should be setting the full pt_regs.
--
Brian Gerst
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 0/6] x86: Rewrite switch_to()
2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
` (5 preceding siblings ...)
2016-06-18 20:56 ` [PATCH v2 6/6] x86: Fix thread_saved_pc() Brian Gerst
@ 2016-06-19 22:05 ` Brian Gerst
6 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2016-06-19 22:05 UTC (permalink / raw)
To: the arch/x86 maintainers, Linux Kernel Mailing List, Josh Poimboeuf
Cc: Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Andy Lutomirski,
Borislav Petkov, Thomas Gleixner
On Sat, Jun 18, 2016 at 4:56 PM, Brian Gerst <brgerst@gmail.com> wrote:
> This patch set simplifies the switch_to() code, by moving the stack switch
> code out of line into an asm stub before calling __switch_to(). This ends
> up being more readable, and using the C calling convention instead of
> clobbering all registers improves code generation. It also allows newly
> forked processes to construct a special stack frame to seamlessly flow
> to ret_from_fork, instead of using a test and branch, or an unbalanced
> call/ret.
>
> Changes from v1:
> - Added struct inactive_task_frame
> - Added comments about kernel threads returning to userspace
> - Cleaned up some incorrect uses of thread.sp
I forgot to also add:
- Rearranged inactive stack frame so that BP (frame pointer) is in the
natural position right below the return address. This should take
care of unwinding issues Josh raised.
--
Brian Gerst
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
2016-06-19 21:28 ` Andy Lutomirski
@ 2016-06-20 13:51 ` Borislav Petkov
2016-06-20 15:01 ` Brian Gerst
1 sibling, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2016-06-20 13:51 UTC (permalink / raw)
To: Brian Gerst
Cc: x86, linux-kernel, Ingo Molnar, H. Peter Anvin, Denys Vlasenko,
Andy Lutomirski, Thomas Gleixner
On Sat, Jun 18, 2016 at 04:56:17PM -0400, Brian Gerst wrote:
> Instead of setting up a fake pt_regs context, put the kernel thread
> function pointer and arg into the unused callee-restored registers
> of struct fork_frame.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
> arch/x86/entry/entry_32.S | 31 +++++++++++++++----------------
> arch/x86/entry/entry_64.S | 35 ++++++++++++++++-------------------
> arch/x86/kernel/process_32.c | 14 ++++----------
> arch/x86/kernel/process_64.c | 10 +++-------
> 4 files changed, 38 insertions(+), 52 deletions(-)
...
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 7574528..23e764c 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -404,37 +404,34 @@ END(__switch_to_asm)
> * A newly forked process directly context switches into this address.
> *
> * rax: prev task we switched from
> + * rbx: kernel thread func
> + * r12: kernel thread arg
> */
> ENTRY(ret_from_fork)
> movq %rax, %rdi
> call schedule_tail /* rdi: 'prev' task parameter */
>
> - testb $3, CS(%rsp) /* from kernel_thread? */
> + testq %rbx, %rbx /* from kernel_thread? */
> jnz 1f
>
> - /*
> - * We came from kernel_thread. This code path is quite twisted, and
> - * someone should clean it up.
> - *
> - * copy_thread_tls stashes the function pointer in RBX and the
> - * parameter to be passed in RBP. The called function is permitted
> - * to call do_execve and thereby jump to user mode.
> - */
> - movq RBP(%rsp), %rdi
> - call *RBX(%rsp)
> - movl $0, RAX(%rsp)
> -
> - /*
> - * Fall through as though we're exiting a syscall. This makes a
> - * twisted sort of sense if we just called do_execve.
> - */
> -
> -1:
> +2:
> movq %rsp, %rdi
> call syscall_return_slowpath /* returns with IRQs disabled */
> TRACE_IRQS_ON /* user mode is traced as IRQS on */
> SWAPGS
> jmp restore_regs_and_iret
> +
> +1:
> + /* kernel thread */
> + movq %r12, %rdi
> + call *%rbx
> + /*
> + * A kernel thread is allowed to return here after successfully
> + * calling do_execve(). Exit to userspace to complete the execve()
> + * syscall.
> + */
> + movq $0, RAX(%rsp)
> + jmp 2b
> END(ret_from_fork)
Wouldn't it be simpler to reverse the JNZ after the TESTB above and thus
have a single label and simpler code flow (pasting the whole thing here
because a diff is unreadable):
ENTRY(ret_from_fork)
movq %rax, %rdi
call schedule_tail /* rdi: 'prev' task parameter */
testq %rbx, %rbx /* from kernel_thread? */
jz 1f
/* kernel thread */
movq %r12, %rdi
call *%rbx
/*
* A kernel thread is allowed to return here after successfully
* calling do_execve(). Exit to userspace to complete the execve()
* syscall.
*/
movq $0, RAX(%rsp)
1:
movq %rsp, %rdi
call syscall_return_slowpath /* returns with IRQs disabled */
TRACE_IRQS_ON /* user mode is traced as IRQS on */
SWAPGS
jmp restore_regs_and_iret
END(ret_from_fork)
It boots fine in my guest this way too.
Ditto for 32-bit.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
2016-06-20 13:51 ` Borislav Petkov
@ 2016-06-20 15:01 ` Brian Gerst
2016-06-20 15:14 ` Borislav Petkov
0 siblings, 1 reply; 22+ messages in thread
From: Brian Gerst @ 2016-06-20 15:01 UTC (permalink / raw)
To: Borislav Petkov
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
H. Peter Anvin, Denys Vlasenko, Andy Lutomirski, Thomas Gleixner
On Mon, Jun 20, 2016 at 9:51 AM, Borislav Petkov <bp@suse.de> wrote:
> On Sat, Jun 18, 2016 at 04:56:17PM -0400, Brian Gerst wrote:
>> Instead of setting up a fake pt_regs context, put the kernel thread
>> function pointer and arg into the unused callee-restored registers
>> of struct fork_frame.
>>
>> Signed-off-by: Brian Gerst <brgerst@gmail.com>
>
>> ---
>> arch/x86/entry/entry_32.S | 31 +++++++++++++++----------------
>> arch/x86/entry/entry_64.S | 35 ++++++++++++++++-------------------
>> arch/x86/kernel/process_32.c | 14 ++++----------
>> arch/x86/kernel/process_64.c | 10 +++-------
>> 4 files changed, 38 insertions(+), 52 deletions(-)
>
> ...
>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 7574528..23e764c 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -404,37 +404,34 @@ END(__switch_to_asm)
>> * A newly forked process directly context switches into this address.
>> *
>> * rax: prev task we switched from
>> + * rbx: kernel thread func
>> + * r12: kernel thread arg
>> */
>> ENTRY(ret_from_fork)
>> movq %rax, %rdi
>> call schedule_tail /* rdi: 'prev' task parameter */
>>
>> - testb $3, CS(%rsp) /* from kernel_thread? */
>> + testq %rbx, %rbx /* from kernel_thread? */
>> jnz 1f
>>
>> - /*
>> - * We came from kernel_thread. This code path is quite twisted, and
>> - * someone should clean it up.
>> - *
>> - * copy_thread_tls stashes the function pointer in RBX and the
>> - * parameter to be passed in RBP. The called function is permitted
>> - * to call do_execve and thereby jump to user mode.
>> - */
>> - movq RBP(%rsp), %rdi
>> - call *RBX(%rsp)
>> - movl $0, RAX(%rsp)
>> -
>> - /*
>> - * Fall through as though we're exiting a syscall. This makes a
>> - * twisted sort of sense if we just called do_execve.
>> - */
>> -
>> -1:
>> +2:
>> movq %rsp, %rdi
>> call syscall_return_slowpath /* returns with IRQs disabled */
>> TRACE_IRQS_ON /* user mode is traced as IRQS on */
>> SWAPGS
>> jmp restore_regs_and_iret
>> +
>> +1:
>> + /* kernel thread */
>> + movq %r12, %rdi
>> + call *%rbx
>> + /*
>> + * A kernel thread is allowed to return here after successfully
>> + * calling do_execve(). Exit to userspace to complete the execve()
>> + * syscall.
>> + */
>> + movq $0, RAX(%rsp)
>> + jmp 2b
>> END(ret_from_fork)
>
> Wouldn't it be simpler to reverse the JNZ after the TESTB above and thus
> have a single label and simpler code flow (pasting the whole thing here
> because a diff is unreadable):
>
> ENTRY(ret_from_fork)
> movq %rax, %rdi
> call schedule_tail /* rdi: 'prev' task parameter */
>
> testq %rbx, %rbx /* from kernel_thread? */
> jz 1f
>
> /* kernel thread */
> movq %r12, %rdi
> call *%rbx
> /*
> * A kernel thread is allowed to return here after successfully
> * calling do_execve(). Exit to userspace to complete the execve()
> * syscall.
> */
> movq $0, RAX(%rsp)
>
> 1:
> movq %rsp, %rdi
> call syscall_return_slowpath /* returns with IRQs disabled */
> TRACE_IRQS_ON /* user mode is traced as IRQS on */
> SWAPGS
> jmp restore_regs_and_iret
> END(ret_from_fork)
>
> It boots fine in my guest this way too.
The idea was to put the uncommon case (kernel thread) out of line for
performance reasons.
--
Brian Gerst
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
2016-06-20 15:01 ` Brian Gerst
@ 2016-06-20 15:14 ` Borislav Petkov
2016-06-22 4:24 ` Brian Gerst
0 siblings, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2016-06-20 15:14 UTC (permalink / raw)
To: Brian Gerst
Cc: Borislav Petkov, the arch/x86 maintainers,
Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
Denys Vlasenko, Andy Lutomirski, Thomas Gleixner
On Mon, Jun 20, 2016 at 11:01:02AM -0400, Brian Gerst wrote:
> The idea was to put the uncommon case (kernel thread) out of line for
> performance reasons.
A comment saying so wouldn't hurt...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 3/6] x86: Add struct inactive_task_frame
2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
2016-06-19 21:18 ` Andy Lutomirski
@ 2016-06-20 15:39 ` Josh Poimboeuf
1 sibling, 0 replies; 22+ messages in thread
From: Josh Poimboeuf @ 2016-06-20 15:39 UTC (permalink / raw)
To: Brian Gerst
Cc: x86, linux-kernel, Ingo Molnar, H. Peter Anvin, Denys Vlasenko,
Andy Lutomirski, Borislav Petkov, Thomas Gleixner
On Sat, Jun 18, 2016 at 04:56:15PM -0400, Brian Gerst wrote:
> Add struct inactive_task_frame, which defines the layout of the stack for
> a sleeping process. For now, the only defined field is the BP register
> (frame pointer).
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
--
Josh
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 4/6] x86: Rewrite switch_to() code
2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
2016-06-19 21:22 ` Andy Lutomirski
@ 2016-06-20 15:44 ` Josh Poimboeuf
1 sibling, 0 replies; 22+ messages in thread
From: Josh Poimboeuf @ 2016-06-20 15:44 UTC (permalink / raw)
To: Brian Gerst
Cc: x86, linux-kernel, Ingo Molnar, H. Peter Anvin, Denys Vlasenko,
Andy Lutomirski, Borislav Petkov, Thomas Gleixner
On Sat, Jun 18, 2016 at 04:56:16PM -0400, Brian Gerst wrote:
> Move the low-level context switch code to an out-of-line asm stub instead of
> using complex inline asm. This allows constructing a new stack frame for the
> child process to make it seamlessly flow to ret_from_fork without an extra
> test and branch in __switch_to(). It also improves code generation for
> __schedule() by using the C calling convention instead of clobbering all
> registers.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
You can also remove the STACK_FRAME_NON_STANDARD annotation for
__schedule() in kernel/sched/core.c since it no longer produces an
objtool warning.
Otherwise it looks great:
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
--
Josh
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 6/6] x86: Fix thread_saved_pc()
2016-06-18 20:56 ` [PATCH v2 6/6] x86: Fix thread_saved_pc() Brian Gerst
@ 2016-06-20 16:01 ` Josh Poimboeuf
2016-06-22 4:27 ` Brian Gerst
0 siblings, 1 reply; 22+ messages in thread
From: Josh Poimboeuf @ 2016-06-20 16:01 UTC (permalink / raw)
To: Brian Gerst
Cc: x86, linux-kernel, Ingo Molnar, H. Peter Anvin, Denys Vlasenko,
Andy Lutomirski, Borislav Petkov, Thomas Gleixner
On Sat, Jun 18, 2016 at 04:56:18PM -0400, Brian Gerst wrote:
> thread_saved_pc() was using a completely bogus method to get the return
> address. Since switch_to() was previously inlined, there was no sane way
> to know where on the stack the return address was stored. Now with the
> frame of a sleeping thread well defined, this can be implemented correctly.
>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> ---
> arch/x86/include/asm/processor.h | 10 ++--------
> arch/x86/kernel/process.c | 10 ++++++++++
> arch/x86/kernel/process_32.c | 8 --------
> 3 files changed, 12 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 1e7d634..413f4f1 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -716,8 +716,6 @@ static inline void spin_lock_prefetch(const void *x)
> .io_bitmap_ptr = NULL, \
> }
>
> -extern unsigned long thread_saved_pc(struct task_struct *tsk);
> -
> /*
> * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
> * This is necessary to guarantee that the entire "struct pt_regs"
> @@ -767,17 +765,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
> .sp0 = TOP_OF_INIT_STACK \
> }
>
> -/*
> - * Return saved PC of a blocked thread.
> - * What is this good for? it will be always the scheduler or ret_from_fork.
> - */
> -#define thread_saved_pc(t) READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
> -
> #define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
> extern unsigned long KSTK_ESP(struct task_struct *task);
>
> #endif /* CONFIG_X86_64 */
>
> +extern unsigned long thread_saved_pc(struct task_struct *tsk);
> +
> extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
> unsigned long new_sp);
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 00ebab0..db458c4 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -513,6 +513,16 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
> }
>
> /*
> + * Return saved PC of a blocked thread.
> + */
> +unsigned long thread_saved_pc(struct task_struct *tsk)
> +{
> + struct inactive_task_frame *frame =
> + (struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
> + return READ_ONCE_NOCHECK(frame->ret_addr);
> +}
> +
> +/*
I would agree with the above (removed) comment:
"What is this good for? it will be always the scheduler or ret_from_fork."
And I'd guess the same is true for all the arches which have to
implement it. Maybe this function (and its single call site in
sched_show_task()) should just be removed altogether?
--
Josh
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
2016-06-20 15:14 ` Borislav Petkov
@ 2016-06-22 4:24 ` Brian Gerst
2016-07-09 12:01 ` Ingo Molnar
0 siblings, 1 reply; 22+ messages in thread
From: Brian Gerst @ 2016-06-22 4:24 UTC (permalink / raw)
To: Borislav Petkov
Cc: Borislav Petkov, the arch/x86 maintainers,
Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
Denys Vlasenko, Andy Lutomirski, Thomas Gleixner
On Mon, Jun 20, 2016 at 11:14 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jun 20, 2016 at 11:01:02AM -0400, Brian Gerst wrote:
>> The idea was to put the uncommon case (kernel thread) out of line for
>> performance reasons.
>
> A comment saying so wouldn't hurt...
This is a fairly common pattern. Do we have to document every case of it?
--
Brian Gerst
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 6/6] x86: Fix thread_saved_pc()
2016-06-20 16:01 ` Josh Poimboeuf
@ 2016-06-22 4:27 ` Brian Gerst
2016-06-24 18:12 ` Josh Poimboeuf
0 siblings, 1 reply; 22+ messages in thread
From: Brian Gerst @ 2016-06-22 4:27 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
H. Peter Anvin, Denys Vlasenko, Andy Lutomirski, Borislav Petkov,
Thomas Gleixner
On Mon, Jun 20, 2016 at 12:01 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Sat, Jun 18, 2016 at 04:56:18PM -0400, Brian Gerst wrote:
>> thread_saved_pc() was using a completely bogus method to get the return
>> address. Since switch_to() was previously inlined, there was no sane way
>> to know where on the stack the return address was stored. Now with the
>> frame of a sleeping thread well defined, this can be implemented correctly.
>>
>> Signed-off-by: Brian Gerst <brgerst@gmail.com>
>> ---
>> arch/x86/include/asm/processor.h | 10 ++--------
>> arch/x86/kernel/process.c | 10 ++++++++++
>> arch/x86/kernel/process_32.c | 8 --------
>> 3 files changed, 12 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 1e7d634..413f4f1 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -716,8 +716,6 @@ static inline void spin_lock_prefetch(const void *x)
>> .io_bitmap_ptr = NULL, \
>> }
>>
>> -extern unsigned long thread_saved_pc(struct task_struct *tsk);
>> -
>> /*
>> * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
>> * This is necessary to guarantee that the entire "struct pt_regs"
>> @@ -767,17 +765,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
>> .sp0 = TOP_OF_INIT_STACK \
>> }
>>
>> -/*
>> - * Return saved PC of a blocked thread.
>> - * What is this good for? it will be always the scheduler or ret_from_fork.
>> - */
>> -#define thread_saved_pc(t) READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
>> -
>> #define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
>> extern unsigned long KSTK_ESP(struct task_struct *task);
>>
>> #endif /* CONFIG_X86_64 */
>>
>> +extern unsigned long thread_saved_pc(struct task_struct *tsk);
>> +
>> extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
>> unsigned long new_sp);
>>
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index 00ebab0..db458c4 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -513,6 +513,16 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
>> }
>>
>> /*
>> + * Return saved PC of a blocked thread.
>> + */
>> +unsigned long thread_saved_pc(struct task_struct *tsk)
>> +{
>> + struct inactive_task_frame *frame =
>> + (struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
>> + return READ_ONCE_NOCHECK(frame->ret_addr);
>> +}
>> +
>> +/*
>
> I would agree with the above (removed) comment:
>
> "What is this good for? it will be always the scheduler or ret_from_fork."
>
> And I'd guess the same is true for all the arches which have to
> implement it. Maybe this function (and its single call site in
> sched_show_task()) should just be removed altogether?
I didn't really want to stray down that path with this series. This
just makes it functional again. the usefulness is still open for
debate.
--
Brian Gerst
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 6/6] x86: Fix thread_saved_pc()
2016-06-22 4:27 ` Brian Gerst
@ 2016-06-24 18:12 ` Josh Poimboeuf
0 siblings, 0 replies; 22+ messages in thread
From: Josh Poimboeuf @ 2016-06-24 18:12 UTC (permalink / raw)
To: Brian Gerst
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Ingo Molnar,
H. Peter Anvin, Denys Vlasenko, Andy Lutomirski, Borislav Petkov,
Thomas Gleixner
On Wed, Jun 22, 2016 at 12:27:43AM -0400, Brian Gerst wrote:
> On Mon, Jun 20, 2016 at 12:01 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Sat, Jun 18, 2016 at 04:56:18PM -0400, Brian Gerst wrote:
> >> thread_saved_pc() was using a completely bogus method to get the return
> >> address. Since switch_to() was previously inlined, there was no sane way
> >> to know where on the stack the return address was stored. Now with the
> >> frame of a sleeping thread well defined, this can be implemented correctly.
> >>
> >> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> >> ---
> >> arch/x86/include/asm/processor.h | 10 ++--------
> >> arch/x86/kernel/process.c | 10 ++++++++++
> >> arch/x86/kernel/process_32.c | 8 --------
> >> 3 files changed, 12 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> >> index 1e7d634..413f4f1 100644
> >> --- a/arch/x86/include/asm/processor.h
> >> +++ b/arch/x86/include/asm/processor.h
> >> @@ -716,8 +716,6 @@ static inline void spin_lock_prefetch(const void *x)
> >> .io_bitmap_ptr = NULL, \
> >> }
> >>
> >> -extern unsigned long thread_saved_pc(struct task_struct *tsk);
> >> -
> >> /*
> >> * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
> >> * This is necessary to guarantee that the entire "struct pt_regs"
> >> @@ -767,17 +765,13 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
> >> .sp0 = TOP_OF_INIT_STACK \
> >> }
> >>
> >> -/*
> >> - * Return saved PC of a blocked thread.
> >> - * What is this good for? it will be always the scheduler or ret_from_fork.
> >> - */
> >> -#define thread_saved_pc(t) READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))
> >> -
> >> #define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
> >> extern unsigned long KSTK_ESP(struct task_struct *task);
> >>
> >> #endif /* CONFIG_X86_64 */
> >>
> >> +extern unsigned long thread_saved_pc(struct task_struct *tsk);
> >> +
> >> extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
> >> unsigned long new_sp);
> >>
> >> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> >> index 00ebab0..db458c4 100644
> >> --- a/arch/x86/kernel/process.c
> >> +++ b/arch/x86/kernel/process.c
> >> @@ -513,6 +513,16 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
> >> }
> >>
> >> /*
> >> + * Return saved PC of a blocked thread.
> >> + */
> >> +unsigned long thread_saved_pc(struct task_struct *tsk)
> >> +{
> >> + struct inactive_task_frame *frame =
> >> + (struct inactive_task_frame *) READ_ONCE(tsk->thread.sp);
> >> + return READ_ONCE_NOCHECK(frame->ret_addr);
> >> +}
> >> +
> >> +/*
> >
> > I would agree with the above (removed) comment:
> >
> > "What is this good for? it will be always the scheduler or ret_from_fork."
> >
> > And I'd guess the same is true for all the arches which have to
> > implement it. Maybe this function (and its single call site in
> > sched_show_task()) should just be removed altogether?
>
> I didn't really want to stray down that path with this series. This
> just makes it functional again. the usefulness is still open for
> debate.
Fair enough.
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
--
Josh
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame
2016-06-22 4:24 ` Brian Gerst
@ 2016-07-09 12:01 ` Ingo Molnar
0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2016-07-09 12:01 UTC (permalink / raw)
To: Brian Gerst
Cc: Borislav Petkov, Borislav Petkov, the arch/x86 maintainers,
Linux Kernel Mailing List, H. Peter Anvin, Denys Vlasenko,
Andy Lutomirski, Thomas Gleixner
* Brian Gerst <brgerst@gmail.com> wrote:
> On Mon, Jun 20, 2016 at 11:14 AM, Borislav Petkov <bp@alien8.de> wrote:
> > On Mon, Jun 20, 2016 at 11:01:02AM -0400, Brian Gerst wrote:
> >> The idea was to put the uncommon case (kernel thread) out of line for
> >> performance reasons.
> >
> > A comment saying so wouldn't hurt...
>
> This is a fairly common pattern. Do we have to document every case of it?
Yeah, would be good to do that in general, so that people can re-evaluate whether
'this is rare' is still true years down the line. For newly touched code it makes
sense to add a minimal comment that explains what is rare about the branch and so.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2016-07-09 12:01 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-18 20:56 [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
2016-06-18 20:56 ` [PATCH v2 1/6] x86-32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Brian Gerst
2016-06-18 20:56 ` [PATCH v2 2/6] x86-64, kgdb: clear GDB_PS on 64-bit Brian Gerst
2016-06-18 20:56 ` [PATCH v2 3/6] x86: Add struct inactive_task_frame Brian Gerst
2016-06-19 21:18 ` Andy Lutomirski
2016-06-20 15:39 ` Josh Poimboeuf
2016-06-18 20:56 ` [PATCH v2 4/6] x86: Rewrite switch_to() code Brian Gerst
2016-06-19 21:22 ` Andy Lutomirski
2016-06-20 15:44 ` Josh Poimboeuf
2016-06-18 20:56 ` [PATCH v2 5/6] x86: Pass kernel thread parameters in fork_frame Brian Gerst
2016-06-19 21:28 ` Andy Lutomirski
2016-06-19 22:01 ` Brian Gerst
2016-06-20 13:51 ` Borislav Petkov
2016-06-20 15:01 ` Brian Gerst
2016-06-20 15:14 ` Borislav Petkov
2016-06-22 4:24 ` Brian Gerst
2016-07-09 12:01 ` Ingo Molnar
2016-06-18 20:56 ` [PATCH v2 6/6] x86: Fix thread_saved_pc() Brian Gerst
2016-06-20 16:01 ` Josh Poimboeuf
2016-06-22 4:27 ` Brian Gerst
2016-06-24 18:12 ` Josh Poimboeuf
2016-06-19 22:05 ` [PATCH v2 0/6] x86: Rewrite switch_to() Brian Gerst
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).