* [PATCH 0/4] x86: int3 fallout @ 2019-05-08 7:49 Peter Zijlstra 2019-05-08 7:49 ` [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path Peter Zijlstra ` (3 more replies) 0 siblings, 4 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-08 7:49 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu So the first 3 patches are cleanups and fixes and should probably just go in. The last patch, however, is the one Linus hates: it converts i386 to always have a complete pt_regs. Both Josh and I did a bunch of cleanups and the patch is now a net reduction in lines. I still think it is the right thing to do; it removes a whole lot of magic from the C code (and avoids future surprises), but if this cannot convince Linus, I'll not pursue this further. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path 2019-05-08 7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra @ 2019-05-08 7:49 ` Peter Zijlstra 2019-05-08 7:49 ` [PATCH 2/4] x86/kprobes: Fix frame pointer annotations Peter Zijlstra ` (2 subsequent siblings) 3 siblings, 0 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-08 7:49 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu The code flow around the return from interrupt preemption point seems needlessly complicated. There is only one site jumping to resume_kernel, and none (outside of resume_kernel) jumping to restore_all_kernel. Inline resume_kernel in restore_all_kernel and avoid the CONFIG_PREEMPT-dependent label. 
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/entry/entry_32.S | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -67,7 +67,6 @@ # define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF #else # define preempt_stop(clobbers) -# define resume_kernel restore_all_kernel #endif .macro TRACE_IRQS_IRET @@ -755,7 +754,7 @@ END(ret_from_fork) andl $SEGMENT_RPL_MASK, %eax #endif cmpl $USER_RPL, %eax - jb resume_kernel # not returning to v8086 or userspace + jb restore_all_kernel # not returning to v8086 or userspace ENTRY(resume_userspace) DISABLE_INTERRUPTS(CLBR_ANY) @@ -765,18 +764,6 @@ ENTRY(resume_userspace) jmp restore_all END(ret_from_exception) -#ifdef CONFIG_PREEMPT -ENTRY(resume_kernel) - DISABLE_INTERRUPTS(CLBR_ANY) - cmpl $0, PER_CPU_VAR(__preempt_count) - jnz restore_all_kernel - testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off (exception path) ? - jz restore_all_kernel - call preempt_schedule_irq - jmp restore_all_kernel -END(resume_kernel) -#endif - GLOBAL(__begin_SYSENTER_singlestep_region) /* * All code from here through __end_SYSENTER_singlestep_region is subject @@ -1027,6 +1014,15 @@ ENTRY(entry_INT80_32) INTERRUPT_RETURN restore_all_kernel: +#ifdef CONFIG_PREEMPT + DISABLE_INTERRUPTS(CLBR_ANY) + cmpl $0, PER_CPU_VAR(__preempt_count) + jnz .Lno_preempt + testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off (exception path) ? + jz .Lno_preempt + call preempt_schedule_irq +.Lno_preempt: +#endif TRACE_IRQS_IRET PARANOID_EXIT_TO_KERNEL_MODE BUG_IF_WRONG_CR3 ^ permalink raw reply [flat|nested] 36+ messages in thread
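For readers following the assembly above, the CONFIG_PREEMPT check that the patch inlines into restore_all_kernel amounts to the following decision logic. This is a sketch only: `should_preempt` and its parameters are illustrative stand-ins for `PER_CPU_VAR(__preempt_count)` and `PT_EFLAGS(%esp)`, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

#define X86_EFLAGS_IF 0x200UL	/* interrupt-enable flag in EFLAGS */

/*
 * Sketch of the inlined check: only call preempt_schedule_irq when the
 * preempt count is zero AND the interrupted context had interrupts
 * enabled (otherwise we are on an exception path and must not preempt).
 */
static bool should_preempt(unsigned int preempt_count, unsigned long eflags)
{
	if (preempt_count != 0)			/* jnz .Lno_preempt */
		return false;
	if (!(eflags & X86_EFLAGS_IF))		/* jz .Lno_preempt */
		return false;
	return true;				/* call preempt_schedule_irq */
}
```

Both tests must pass for the `call preempt_schedule_irq` to be reached, which is exactly the two conditional jumps in the new restore_all_kernel code.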
* [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-08 7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra 2019-05-08 7:49 ` [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path Peter Zijlstra @ 2019-05-08 7:49 ` Peter Zijlstra 2019-05-08 11:54 ` Josh Poimboeuf 2019-05-08 7:49 ` [PATCH 3/4] x86/ftrace: Add pt_regs frame annotations Peter Zijlstra 2019-05-08 7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra 3 siblings, 1 reply; 36+ messages in thread From: Peter Zijlstra @ 2019-05-08 7:49 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu The kprobe trampolines have a FRAME_POINTER annotation that makes no sense. It marks the frame in the middle of pt_regs, at the place of saving BP. Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER from the respective entry_*.S. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/kernel/kprobes/common.h | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-) --- a/arch/x86/kernel/kprobes/common.h +++ b/arch/x86/kernel/kprobes/common.h @@ -6,14 +6,15 @@ #include <asm/asm.h> +#ifdef CONFIG_X86_64 + #ifdef CONFIG_FRAME_POINTER -# define SAVE_RBP_STRING " push %" _ASM_BP "\n" \ - " mov %" _ASM_SP ", %" _ASM_BP "\n" +#define ENCODE_FRAME_POINTER \ + " leaq 1(%rsp), %rbp\n" #else -# define SAVE_RBP_STRING " push %" _ASM_BP "\n" +#define ENCODE_FRAME_POINTER #endif -#ifdef CONFIG_X86_64 #define SAVE_REGS_STRING \ /* Skip cs, ip, orig_ax. 
*/ \ " subq $24, %rsp\n" \ @@ -27,11 +28,13 @@ " pushq %r10\n" \ " pushq %r11\n" \ " pushq %rbx\n" \ - SAVE_RBP_STRING \ + " pushq %rbp\n" \ " pushq %r12\n" \ " pushq %r13\n" \ " pushq %r14\n" \ - " pushq %r15\n" + " pushq %r15\n" \ + ENCODE_FRAME_POINTER + #define RESTORE_REGS_STRING \ " popq %r15\n" \ " popq %r14\n" \ @@ -51,19 +54,30 @@ /* Skip orig_ax, ip, cs */ \ " addq $24, %rsp\n" #else + +#ifdef CONFIG_FRAME_POINTER +#define ENCODE_FRAME_POINTER \ + " movl %esp, %ebp\n" \ + " andl $0x7fffffff, %ebp\n" +#else +#define ENCODE_FRAME_POINTER +#endif + #define SAVE_REGS_STRING \ /* Skip cs, ip, orig_ax and gs. */ \ - " subl $16, %esp\n" \ + " subl $4*4, %esp\n" \ " pushl %fs\n" \ " pushl %es\n" \ " pushl %ds\n" \ " pushl %eax\n" \ - SAVE_RBP_STRING \ + " pushl %ebp\n" \ " pushl %edi\n" \ " pushl %esi\n" \ " pushl %edx\n" \ " pushl %ecx\n" \ - " pushl %ebx\n" + " pushl %ebx\n" \ + ENCODE_FRAME_POINTER + #define RESTORE_REGS_STRING \ " popl %ebx\n" \ " popl %ecx\n" \ ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-08 7:49 ` [PATCH 2/4] x86/kprobes: Fix frame pointer annotations Peter Zijlstra @ 2019-05-08 11:54 ` Josh Poimboeuf 2019-05-08 12:04 ` Peter Zijlstra 0 siblings, 1 reply; 36+ messages in thread From: Josh Poimboeuf @ 2019-05-08 11:54 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu On Wed, May 08, 2019 at 09:49:03AM +0200, Peter Zijlstra wrote: > The kprobe trampolines have a FRAME_POINTER annotation that makes no > sense. It marks the frame in the middle of pt_regs, at the place of > saving BP. > > Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER > from the respective entry_*.S. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > arch/x86/kernel/kprobes/common.h | 32 +++++++++++++++++++++++--------- > 1 file changed, 23 insertions(+), 9 deletions(-) > > --- a/arch/x86/kernel/kprobes/common.h > +++ b/arch/x86/kernel/kprobes/common.h > @@ -6,14 +6,15 @@ > > #include <asm/asm.h> > > +#ifdef CONFIG_X86_64 > + > #ifdef CONFIG_FRAME_POINTER > -# define SAVE_RBP_STRING " push %" _ASM_BP "\n" \ > - " mov %" _ASM_SP ", %" _ASM_BP "\n" > +#define ENCODE_FRAME_POINTER \ > + " leaq 1(%rsp), %rbp\n" > #else > -# define SAVE_RBP_STRING " push %" _ASM_BP "\n" > +#define ENCODE_FRAME_POINTER > #endif > +#ifdef CONFIG_FRAME_POINTER > +#define ENCODE_FRAME_POINTER \ > + " movl %esp, %ebp\n" \ > + " andl $0x7fffffff, %ebp\n" > +#else > +#define ENCODE_FRAME_POINTER > +#endif We should put these macros in a header file somewhere (including stringified versions). 
-- Josh
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-08 11:54 ` Josh Poimboeuf @ 2019-05-08 12:04 ` Peter Zijlstra 2019-05-08 12:40 ` Peter Zijlstra 2019-05-08 12:42 ` Josh Poimboeuf 0 siblings, 2 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-08 12:04 UTC (permalink / raw) To: Josh Poimboeuf Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu On Wed, May 08, 2019 at 06:54:16AM -0500, Josh Poimboeuf wrote: > On Wed, May 08, 2019 at 09:49:03AM +0200, Peter Zijlstra wrote: > > The kprobe trampolines have a FRAME_POINTER annotation that makes no > > sense. It marks the frame in the middle of pt_regs, at the place of > > saving BP. > > > > Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER > > from the respective entry_*.S. 
> > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > --- > > arch/x86/kernel/kprobes/common.h | 32 +++++++++++++++++++++++--------- > > 1 file changed, 23 insertions(+), 9 deletions(-) > > > > --- a/arch/x86/kernel/kprobes/common.h > > +++ b/arch/x86/kernel/kprobes/common.h > > @@ -6,14 +6,15 @@ > > > > #include <asm/asm.h> > > > > +#ifdef CONFIG_X86_64 > > + > > #ifdef CONFIG_FRAME_POINTER > > -# define SAVE_RBP_STRING " push %" _ASM_BP "\n" \ > > - " mov %" _ASM_SP ", %" _ASM_BP "\n" > > +#define ENCODE_FRAME_POINTER \ > > + " leaq 1(%rsp), %rbp\n" > > #else > > -# define SAVE_RBP_STRING " push %" _ASM_BP "\n" > > +#define ENCODE_FRAME_POINTER > > #endif > > > +#ifdef CONFIG_FRAME_POINTER > > +#define ENCODE_FRAME_POINTER \ > > + " movl %esp, %ebp\n" \ > > + " andl $0x7fffffff, %ebp\n" > > +#else > > +#define ENCODE_FRAME_POINTER > > +#endif > > We should put these macros in a header file somewhere (including > stringified versions). Probably a good idea. I'll frob them into asm/frame.h. Do the x86_64 variants also want some ORC annotation? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-08 12:04 ` Peter Zijlstra @ 2019-05-08 12:40 ` Peter Zijlstra 2019-05-08 12:42 ` Josh Poimboeuf 1 sibling, 0 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-08 12:40 UTC (permalink / raw) To: Josh Poimboeuf Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote: > On Wed, May 08, 2019 at 06:54:16AM -0500, Josh Poimboeuf wrote: > > We should put these macros in a header file somewhere (including > > stringified versions). > > Probably a good idea. I'll frob them into asm/frame.h. --- Subject: x86: Move ENCODE_FRAME_POINTER to asm/frame.h From: Peter Zijlstra <peterz@infradead.org> Date: Wed May 8 14:30:48 CEST 2019 In preparation for wider use, move the ENCODE_FRAME_POINTER macros to a common header and provide inline asm versions. These macros are used to encode a pt_regs frame for the unwinder; see unwind_frame.c:decode_frame_pointer(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/entry/calling.h | 15 -------------- arch/x86/entry/entry_32.S | 16 -------------- arch/x86/include/asm/frame.h | 46 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 46 insertions(+), 31 deletions(-) --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -172,21 +172,6 @@ For 32-bit we have the following convent .endif .endm -/* - * This is a sneaky trick to help the unwinder find pt_regs on the stack. The - * frame pointer is replaced with an encoded pointer to pt_regs. 
The encoding - * is just setting the LSB, which makes it an invalid stack address and is also - * a signal to the unwinder that it's a pt_regs pointer in disguise. - * - * NOTE: This macro must be used *after* PUSH_AND_CLEAR_REGS because it corrupts - * the original rbp. - */ -.macro ENCODE_FRAME_POINTER ptregs_offset=0 -#ifdef CONFIG_FRAME_POINTER - leaq 1+\ptregs_offset(%rsp), %rbp -#endif -.endm - #ifdef CONFIG_PAGE_TABLE_ISOLATION /* --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -246,22 +246,6 @@ .Lend_\@: .endm -/* - * This is a sneaky trick to help the unwinder find pt_regs on the stack. The - * frame pointer is replaced with an encoded pointer to pt_regs. The encoding - * is just clearing the MSB, which makes it an invalid stack address and is also - * a signal to the unwinder that it's a pt_regs pointer in disguise. - * - * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the - * original rbp. - */ -.macro ENCODE_FRAME_POINTER -#ifdef CONFIG_FRAME_POINTER - mov %esp, %ebp - andl $0x7fffffff, %ebp -#endif -.endm - .macro RESTORE_INT_REGS popl %ebx popl %ecx --- a/arch/x86/include/asm/frame.h +++ b/arch/x86/include/asm/frame.h @@ -22,6 +22,39 @@ pop %_ASM_BP .endm +#ifdef CONFIG_X86_64 +/* + * This is a sneaky trick to help the unwinder find pt_regs on the stack. The + * frame pointer is replaced with an encoded pointer to pt_regs. The encoding + * is just setting the LSB, which makes it an invalid stack address and is also + * a signal to the unwinder that it's a pt_regs pointer in disguise. + * + * NOTE: This macro must be used *after* PUSH_AND_CLEAR_REGS because it corrupts + * the original rbp. + */ +.macro ENCODE_FRAME_POINTER ptregs_offset=0 +#ifdef CONFIG_FRAME_POINTER + leaq 1+\ptregs_offset(%rsp), %rbp +#endif +.endm +#else /* !CONFIG_X86_64 */ +/* + * This is a sneaky trick to help the unwinder find pt_regs on the stack. The + * frame pointer is replaced with an encoded pointer to pt_regs. 
The encoding + * is just clearing the MSB, which makes it an invalid stack address and is also + * a signal to the unwinder that it's a pt_regs pointer in disguise. + * + * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the + * original ebp. + */ +.macro ENCODE_FRAME_POINTER +#ifdef CONFIG_FRAME_POINTER + mov %esp, %ebp + andl $0x7fffffff, %ebp +#endif +.endm +#endif /* CONFIG_X86_64 */ + #else /* !__ASSEMBLY__ */ #define FRAME_BEGIN \ @@ -30,6 +63,19 @@ #define FRAME_END "pop %" _ASM_BP "\n" +#ifdef CONFIG_FRAME_POINTER +#ifdef CONFIG_X86_64 +#define ENCODE_FRAME_POINTER \ + "lea 1(%rsp), %rbp\n\t" +#else /* !CONFIG_X86_64 */ +#define ENCODE_FRAME_POINTER \ + "movl %esp, %ebp\n\t" \ + "andl $0x7fffffff, %ebp\n\t" +#endif /* CONFIG_X86_64 */ +#else /* CONFIG_FRAME_POINTER */ +#define ENCODE_FRAME_POINTER +#endif /* CONFIG_FRAME_POINTER */ + #endif /* __ASSEMBLY__ */ #define FRAME_OFFSET __ASM_SEL(4, 8) ^ permalink raw reply [flat|nested] 36+ messages in thread
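The comments in the patch describe the trick abstractly; the following standalone C sketch (hypothetical helper names, modeled on the unwinder's decode_frame_pointer()) shows the encode/decode round trip for both variants. It assumes the usual layouts stated in the comments: on x86_64 a stack address is even, so setting the LSB makes it invalid; on i386 with the common 3G/1G split, kernel stack addresses have the MSB set, so clearing it makes them invalid.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit: set the LSB; an odd value cannot be a real frame pointer. */
static uint64_t encode_fp_64(uint64_t rsp) { return rsp + 1; }
static uint64_t decode_fp_64(uint64_t bp)
{
	/* Returns the pt_regs address, or 0 for an ordinary frame. */
	return (bp & 1) ? (bp & ~1ULL) : 0;
}

/* 32-bit: clear the MSB; the result is below the kernel address range. */
static uint32_t encode_fp_32(uint32_t esp) { return esp & 0x7fffffffu; }
static uint32_t decode_fp_32(uint32_t bp)
{
	/* MSB still set means an ordinary frame pointer, not pt_regs. */
	return (bp & 0x80000000u) ? 0 : (bp | 0x80000000u);
}
```

The round trip recovers the original pointer, and an unencoded (ordinary) frame pointer decodes to "not a pt_regs frame" — the signal the unwinder relies on.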
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-08 12:04 ` Peter Zijlstra 2019-05-08 12:40 ` Peter Zijlstra @ 2019-05-08 12:42 ` Josh Poimboeuf 2019-05-08 15:39 ` Peter Zijlstra 1 sibling, 1 reply; 36+ messages in thread From: Josh Poimboeuf @ 2019-05-08 12:42 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote: > On Wed, May 08, 2019 at 06:54:16AM -0500, Josh Poimboeuf wrote: > > On Wed, May 08, 2019 at 09:49:03AM +0200, Peter Zijlstra wrote: > > > The kprobe trampolines have a FRAME_POINTER annotation that makes no > > > sense. It marks the frame in the middle of pt_regs, at the place of > > > saving BP. > > > > > > Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER > > > from the respective entry_*.S. 
> > > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > --- > > > arch/x86/kernel/kprobes/common.h | 32 +++++++++++++++++++++++--------- > > > 1 file changed, 23 insertions(+), 9 deletions(-) > > > > > > --- a/arch/x86/kernel/kprobes/common.h > > > +++ b/arch/x86/kernel/kprobes/common.h > > > @@ -6,14 +6,15 @@ > > > > > > #include <asm/asm.h> > > > > > > +#ifdef CONFIG_X86_64 > > > + > > > #ifdef CONFIG_FRAME_POINTER > > > -# define SAVE_RBP_STRING " push %" _ASM_BP "\n" \ > > > - " mov %" _ASM_SP ", %" _ASM_BP "\n" > > > +#define ENCODE_FRAME_POINTER \ > > > + " leaq 1(%rsp), %rbp\n" > > > #else > > > -# define SAVE_RBP_STRING " push %" _ASM_BP "\n" > > > +#define ENCODE_FRAME_POINTER > > > #endif > > > > > +#ifdef CONFIG_FRAME_POINTER > > > +#define ENCODE_FRAME_POINTER \ > > > + " movl %esp, %ebp\n" \ > > > + " andl $0x7fffffff, %ebp\n" > > > +#else > > > +#define ENCODE_FRAME_POINTER > > > +#endif > > > > We should put these macros in a header file somewhere (including > > stringified versions). > > Probably a good idea. I'll frob them into asm/frame.h. > > Do the x86_64 variants also want some ORC annotation? Maybe so. Though it looks like regs->ip isn't saved. The saved registers might need to be tweaked. I'll need to look into it. -- Josh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-08 12:42 ` Josh Poimboeuf @ 2019-05-08 15:39 ` Peter Zijlstra 2019-05-08 18:48 ` Josh Poimboeuf 0 siblings, 1 reply; 36+ messages in thread From: Peter Zijlstra @ 2019-05-08 15:39 UTC (permalink / raw) To: Josh Poimboeuf Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote: > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote: > > Do the x86_64 variants also want some ORC annotation? > > Maybe so. Though it looks like regs->ip isn't saved. The saved > registers might need to be tweaked. I'll need to look into it. What all these sites do (and maybe we should look at unifying them somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka pt_regs). So regs->ip will be the return address (which is fixed up to be the CALL address in the handler). ^ permalink raw reply [flat|nested] 36+ messages in thread
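The fixup mentioned here is plain pointer arithmetic: a near CALL with a rel32 displacement is 5 bytes, so the address of the CALL instruction is the return address minus 5. A trivial sketch (hypothetical helper name):

```c
#include <assert.h>

#define CALL_INSN_SIZE 5	/* e8 + rel32: near call with 32-bit offset */

/* Rewind a RET-IP on the stack to the CALL instruction that pushed it. */
static unsigned long call_addr_from_ret_ip(unsigned long ret_ip)
{
	return ret_ip - CALL_INSN_SIZE;
}
```

This is the same constant the i386 trampoline later in the thread subtracts from the pushed RET-IP to make regs->ip point at the CALL.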
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-08 15:39 ` Peter Zijlstra @ 2019-05-08 18:48 ` Josh Poimboeuf 2019-05-09 1:20 ` Masami Hiramatsu 0 siblings, 1 reply; 36+ messages in thread From: Josh Poimboeuf @ 2019-05-08 18:48 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote: > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote: > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote: > > > > Do the x86_64 variants also want some ORC annotation? > > > > Maybe so. Though it looks like regs->ip isn't saved. The saved > > registers might need to be tweaked. I'll need to look into it. > > What all these sites do (and maybe we should look at unifying them > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka > pt_regs). > > So regs->ip will be the return address (which is fixed up to be the CALL > address in the handler). But from what I can tell, trampoline_handler() hard-codes regs->ip to point to kretprobe_trampoline(), and the original return address is placed in regs->sp. Masami, is there a reason why regs->ip doesn't have the original return address and regs->sp doesn't have the original SP? I think that would help the unwinder understand things. -- Josh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-08 18:48 ` Josh Poimboeuf @ 2019-05-09 1:20 ` Masami Hiramatsu 2019-05-09 8:14 ` Peter Zijlstra 0 siblings, 1 reply; 36+ messages in thread From: Masami Hiramatsu @ 2019-05-09 1:20 UTC (permalink / raw) To: Josh Poimboeuf Cc: Peter Zijlstra, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu Hi Josh, On Wed, 8 May 2019 13:48:48 -0500 Josh Poimboeuf <jpoimboe@redhat.com> wrote: > On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote: > > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote: > > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote: > > > > > > Do the x86_64 variants also want some ORC annotation? > > > > > > Maybe so. Though it looks like regs->ip isn't saved. The saved > > > registers might need to be tweaked. I'll need to look into it. > > > > What all these sites do (and maybe we should look at unifying them > > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka > > pt_regs). > > > > So regs->ip will be the return address (which is fixed up to be the CALL > > address in the handler). > > But from what I can tell, trampoline_handler() hard-codes regs->ip to > point to kretprobe_trampoline(), and the original return address is > placed in regs->sp. > > Masami, is there a reason why regs->ip doesn't have the original return > address and regs->sp doesn't have the original SP? I think that would > help the unwinder understand things. Yes, for regs->ip, there is a historical reason. 
Since previously, we had an int3 at trampoline, so the user (kretprobe) handler expects that regs->ip is trampoline address and ri->ret_addr is original return address. It is better to check the other archs, but I think it is possible to change the regs->ip to original return address, since no one cares such "fixed address". :) For the regs->sp, there are 2 reasons. For x86-64, it's just for over-optimizing (reduce stack usage). I think we can make a gap for putting return address, something like "kretprobe_trampoline:\n" #ifdef CONFIG_X86_64 " pushq %rsp\n" /* Make a gap for return address */ " pushq 0(%rsp)\n" /* Copy original stack pointer */ " pushfq\n" SAVE_REGS_STRING " movq %rsp, %rdi\n" " call trampoline_handler\n" /* Push the true return address to the bottom */ " movq %rax, 20*8(%rsp)\n" RESTORE_REGS_STRING " popfq\n" " addq $8, %rsp\n" /* Skip original stack pointer */ For i386 (x86-32), there is no other way to keep &regs->sp as the original stack pointer. It has to be changed with this series, maybe as same as x86-64. Thank you, -- Masami Hiramatsu <mhiramat@kernel.org>
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 1:20 ` Masami Hiramatsu @ 2019-05-09 8:14 ` Peter Zijlstra 2019-05-09 9:27 ` Peter Zijlstra ` (4 more replies) 0 siblings, 5 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-09 8:14 UTC (permalink / raw) To: Masami Hiramatsu Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, May 09, 2019 at 10:20:30AM +0900, Masami Hiramatsu wrote: > Hi Josh, > > On Wed, 8 May 2019 13:48:48 -0500 > Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote: > > > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote: > > > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote: > > > > > > > > Do the x86_64 variants also want some ORC annotation? > > > > > > > > Maybe so. Though it looks like regs->ip isn't saved. The saved > > > > registers might need to be tweaked. I'll need to look into it. > > > > > > What all these sites do (and maybe we should look at unifying them > > > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka > > > pt_regs). > > > > > > So regs->ip will be the return address (which is fixed up to be the CALL > > > address in the handler). > > > > But from what I can tell, trampoline_handler() hard-codes regs->ip to > > point to kretprobe_trampoline(), and the original return address is > > placed in regs->sp. > > > > Masami, is there a reason why regs->ip doesn't have the original return > > address and regs->sp doesn't have the original SP? I think that would > > help the unwinder understand things. 
> Yes, for regs->ip, there is a historical reason. Since previously, we had > an int3 at trampoline, so the user (kretprobe) handler expects that > regs->ip is trampoline address and ri->ret_addr is original return address. > It is better to check the other archs, but I think it is possible to > change the regs->ip to original return address, since no one cares such > "fixed address". :) > > For the regs->sp, there are 2 reasons. > > For x86-64, it's just for over-optimizing (reduce stack usage). > I think we can make a gap for putting return address, something like > > "kretprobe_trampoline:\n" > #ifdef CONFIG_X86_64 > " pushq %rsp\n" /* Make a gap for return address */ > " pushq 0(%rsp)\n" /* Copy original stack pointer */ > " pushfq\n" > SAVE_REGS_STRING > " movq %rsp, %rdi\n" > " call trampoline_handler\n" > /* Push the true return address to the bottom */ > " movq %rax, 20*8(%rsp)\n" > RESTORE_REGS_STRING > " popfq\n" > " addq $8, %rsp\n" /* Skip original stack pointer */ > > For i386 (x86-32), there is no other way to keep &regs->sp as > the original stack pointer. It has to be changed with this series, > maybe as same as x86-64. Right; I already fixed that in my patch changing i386's pt_regs. But what I'd love to do is something like the below patch, and make all the trampolines (very much including ftrace) use that. Such that we then only have 1 copy of this magic (well, 2 because x86_64 also needs an implementation of this of course). 
Changing ftrace over to this would be a little more work but it can easily chain things a little to get its original context back: ENTRY(ftrace_regs_caller) GLOBAL(ftrace_regs_func) push ftrace_stub push ftrace_regs_handler jmp call_to_exception_trampoline END(ftrace_regs_caller) typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_ops *, struct pt_regs *); struct ftrace_regs_stack { ftrace_func_t func; unsigned long parent_ip; }; void ftrace_regs_handler(struct pt_regs *regs) { struct ftrace_regs_stack *st = (void *)regs->sp; ftrace_func_t func = st->func; regs->sp += sizeof(long); /* pop func */ func(regs->ip, st->parent_ip, function_trace_op, regs); } Hmm? I didn't look into the function_graph thing, but I imagine it can be added without too much pain. --- --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit) call do_exit 1: jmp 1b END(rewind_stack_do_exit) + +/* + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we + * just did was in fact scribbled with an INT3. + * + * Use this trampoline like: + * + * PUSH $func + * JMP call_to_exception_trampoline + * + * $func will see regs->ip point at the CALL instruction and must therefore + * modify regs->ip in order to make progress (just like a normal INT3 scribbled + * CALL). + * + * NOTE: we do not restore any of the segment registers. 
+ */ +ENTRY(call_to_exception_trampoline) + /* + * On entry the stack looks like: + * + * 2*4(%esp) <previous context> + * 1*4(%esp) RET-IP + * 0*4(%esp) func + * + * transform this into: + * + * 19*4(%esp) <previous context> + * 18*4(%esp) gap / RET-IP + * 17*4(%esp) gap / func + * 16*4(%esp) ss + * 15*4(%esp) sp / <previous context> + * 14*4(%esp) flags + * 13*4(%esp) cs + * 12*4(%esp) ip / RET-IP + * 11*4(%esp) orig_eax + * 10*4(%esp) gs + * 9*4(%esp) fs + * 8*4(%esp) es + * 7*4(%esp) ds + * 6*4(%esp) eax + * 5*4(%esp) ebp + * 4*4(%esp) edi + * 3*4(%esp) esi + * 2*4(%esp) edx + * 1*4(%esp) ecx + * 0*4(%esp) ebx + */ + pushl %ss + pushl %esp # points at ss + addl $3*4, (%esp) # point it at <previous context> + pushfl + pushl %cs + pushl 5*4(%esp) # RET-IP + subl $5, (%esp) # point at CALL instruction + pushl $-1 + pushl %gs + pushl %fs + pushl %es + pushl %ds + pushl %eax + pushl %ebp + pushl %edi + pushl %esi + pushl %edx + pushl %ecx + pushl %ebx + + ENCODE_FRAME_POINTER + + movl %esp, %eax # 1st argument: pt_regs + + movl 17*4(%esp), %ebx # func + CALL_NOSPEC %ebx + + movl PT_OLDESP(%esp), %eax + + movl PT_EIP(%esp), %ecx + movl %ecx, -1*4(%eax) + + movl PT_EFLAGS(%esp), %ecx + movl %ecx, -2*4(%eax) + + movl PT_EAX(%esp), %ecx + movl %ecx, -3*4(%eax) + + popl %ebx + popl %ecx + popl %edx + popl %esi + popl %edi + popl %ebp + + lea -3*4(%eax), %esp + popl %eax + popfl + ret +END(call_to_exception_trampoline)
*/ - " movq %rax, 19*8(%rsp)\n" - RESTORE_REGS_STRING - " popfq\n" -#else - " pushl %esp\n" - " pushfl\n" - SAVE_REGS_STRING - " movl %esp, %eax\n" - " call trampoline_handler\n" - /* Replace saved sp with true return address. */ - " movl %eax, 15*4(%esp)\n" - RESTORE_REGS_STRING - " popfl\n" -#endif - " ret\n" + "push trampoline_handler\n" + "jmp call_to_exception_trampoline\n" ".size kretprobe_trampoline, .-kretprobe_trampoline\n" ); NOKPROBE_SYMBOL(kretprobe_trampoline); @@ -791,12 +770,7 @@ static __used void *trampoline_handler(s INIT_HLIST_HEAD(&empty_rp); kretprobe_hash_lock(current, &head, &flags); - /* fixup registers */ - regs->cs = __KERNEL_CS; -#ifdef CONFIG_X86_32 - regs->cs |= get_kernel_rpl(); - regs->gs = 0; -#endif + /* We use pt_regs->sp for return address holder. */ frame_pointer = ®s->sp; regs->ip = trampoline_address; ^ permalink raw reply [flat|nested] 36+ messages in thread
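The 19-slot stack layout in the trampoline's comment can be mirrored as a C struct for reference. This is an illustration only: slots 0-16 correspond to the i386 pt_regs layout, and the two top slots are the trampoline's scratch space (the pushed func pointer and the original RET-IP); the struct name and field names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mirror of the trampoline's stack frame, lowest address first.
 * Slot numbers match the N*4(%esp) offsets in the assembly comment.
 */
struct i386_regs_frame {
	unsigned long bx, cx, dx, si, di, bp, ax;	/* slots 0..6  */
	unsigned long ds, es, fs, gs;			/* slots 7..10 */
	unsigned long orig_ax;				/* slot 11     */
	unsigned long ip, cs, flags, sp, ss;		/* slots 12..16 */
	unsigned long func, ret_ip;			/* slots 17..18: gap */
};
```

Laying it out this way makes the magic offsets in the assembly (e.g. `movl 17*4(%esp), %ebx # func`) checkable against the field positions.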
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 8:14 ` Peter Zijlstra @ 2019-05-09 9:27 ` Peter Zijlstra 2019-05-09 14:00 ` Josh Poimboeuf ` (3 subsequent siblings) 4 siblings, 0 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-09 9:27 UTC (permalink / raw) To: Masami Hiramatsu Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote: > struct ftrace_regs_stack { > ftrace_func_t func; > unsigned long parent_ip; > }; > > void ftrace_regs_handler(struct pt_regs *regs) > { > struct ftrace_regs_stack *st = (void *)regs->sp; > ftrace_func_t func = st->func; > > regs->sp += sizeof(long); /* pop func */ > > func(regs->ip, st->parent_ip, function_trace_op, regs); > } Alternatively we can add things like: static inline unsigned long int3_emulate_pop(struct pt_regs *regs) { unsigned long val = *(unsigned long *)regs->sp; regs->sp += sizeof(unsigned long); return val; } And do: ftrace_func_t func = (void *)int3_emulate_pop(regs); ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 8:14 ` Peter Zijlstra 2019-05-09 9:27 ` Peter Zijlstra @ 2019-05-09 14:00 ` Josh Poimboeuf 2019-05-09 14:01 ` Masami Hiramatsu ` (2 subsequent siblings) 4 siblings, 0 replies; 36+ messages in thread From: Josh Poimboeuf @ 2019-05-09 14:00 UTC (permalink / raw) To: Peter Zijlstra Cc: Masami Hiramatsu, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote: > But what I'd love to do is something like the belwo patch, and make all > the trampolines (very much including ftrace) use that. Such that we then > only have 1 copy of this magic (well, 2 because x86_64 also needs an > implementation of this of course). > > Changing ftrace over to this would be a little more work but it can > easily chain things a little to get its original context back: > > ENTRY(ftrace_regs_caller) > GLOBAL(ftrace_regs_func) > push ftrace_stub > push ftrace_regs_handler > jmp call_to_exception_trampoline > END(ftrace_regs_caller) > > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *); > > struct ftrace_regs_stack { > ftrace_func_t func; > unsigned long parent_ip; > }; > > void ftrace_regs_handler(struct pr_regs *regs) > { > struct ftrace_regs_stack *st = (void *)regs->sp; > ftrace_func_t func = st->func; > > regs->sp += sizeof(long); /* pop func */ > > func(regs->ip, st->parent_ip, function_trace_op, regs); > } > > Hmm? I didn't look into the function_graph thing, but I imagine it can > be added without too much pain. 
I like this patch a lot, assuming it can be made to work for the different users. -- Josh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 8:14 ` Peter Zijlstra 2019-05-09 9:27 ` Peter Zijlstra 2019-05-09 14:00 ` Josh Poimboeuf @ 2019-05-09 14:01 ` Masami Hiramatsu 2019-05-09 17:14 ` Peter Zijlstra 2019-05-09 16:20 ` Andy Lutomirski 2019-05-09 17:37 ` Steven Rostedt 4 siblings, 1 reply; 36+ messages in thread From: Masami Hiramatsu @ 2019-05-09 14:01 UTC (permalink / raw) To: Peter Zijlstra Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, 9 May 2019 10:14:31 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, May 09, 2019 at 10:20:30AM +0900, Masami Hiramatsu wrote: > > Hi Josh, > > > > On Wed, 8 May 2019 13:48:48 -0500 > > Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > > On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote: > > > > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote: > > > > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote: > > > > > > > > > > Do the x86_64 variants also want some ORC annotation? > > > > > > > > > > Maybe so. Though it looks like regs->ip isn't saved. The saved > > > > > registers might need to be tweaked. I'll need to look into it. > > > > > > > > What all these sites do (and maybe we should look at unifying them > > > > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka > > > > pt_regs). > > > > > > > > So regs->ip will be the return address (which is fixed up to be the CALL > > > > address in the handler). 
> > > But from what I can tell, trampoline_handler() hard-codes regs->ip to > > > point to kretprobe_trampoline(), and the original return address is > > > placed in regs->sp. > > > > > > Masami, is there a reason why regs->ip doesn't have the original return > > > address and regs->sp doesn't have the original SP? I think that would > > > help the unwinder understand things. > > > > Yes, for regs->ip, there is a historical reason. Since previously, we had > > an int3 at trampoline, so the user (kretprobe) handler expects that > > regs->ip is trampoline address and ri->ret_addr is original return address. > > It is better to check the other archs, but I think it is possible to > > change the regs->ip to original return address, since no one cares such > > "fixed address". :) > > > > For the regs->sp, there are 2 reasons. > > > > For x86-64, it's just for over-optimizing (reduce stack usage). > > I think we can make a gap for putting return address, something like > > > > "kretprobe_trampoline:\n" > > #ifdef CONFIG_X86_64 > > " pushq %rsp\n" /* Make a gap for return address */ > > " pushq 0(%rsp)\n" /* Copy original stack pointer */ > > " pushfq\n" > > SAVE_REGS_STRING > > " movq %rsp, %rdi\n" > > " call trampoline_handler\n" > > /* Push the true return address to the bottom */ > > " movq %rax, 20*8(%rsp)\n" > > RESTORE_REGS_STRING > > " popfq\n" > > " addq $8, %rsp\n" /* Skip original stack pointer */ > > > > For i386 (x86-32), there is no other way to keep &regs->sp as > > the original stack pointer. It has to be changed with this series, > > maybe the same as x86-64. > Right; I already fixed that in my patch changing i386's pt_regs. I see it, and it is good to me. :) > But what I'd love to do is something like the below patch, and make all > the trampolines (very much including ftrace) use that. Such that we then > only have 1 copy of this magic (well, 2 because x86_64 also needs an > implementation of this of course). 
OK, but I will make kretprobe integrated with func-graph tracer, since it is inefficient that we have 2 different hidden return stack... Anyway, > Changing ftrace over to this would be a little more work but it can > easily chain things a little to get its original context back: > > ENTRY(ftrace_regs_caller) > GLOBAL(ftrace_regs_func) > push ftrace_stub > push ftrace_regs_handler > jmp call_to_exception_trampoline > END(ftrace_regs_caller) > > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *); > > struct ftrace_regs_stack { > ftrace_func_t func; > unsigned long parent_ip; > }; > > void ftrace_regs_handler(struct pr_regs *regs) > { > struct ftrace_regs_stack *st = (void *)regs->sp; > ftrace_func_t func = st->func; > > regs->sp += sizeof(long); /* pop func */ Sorry, why pop here? > > func(regs->ip, st->parent_ip, function_trace_op, regs); > } > > Hmm? I didn't look into the function_graph thing, but I imagine it can > be added without too much pain. Yes, that should be good for function_graph trampoline too. We use very similar technic. > > --- > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit) > call do_exit > 1: jmp 1b > END(rewind_stack_do_exit) > + > +/* > + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we > + * just did was in fact scribbled with an INT3. > + * > + * Use this trampoline like: > + * > + * PUSH $func > + * JMP call_to_exception_trampoline > + * > + * $func will see regs->ip point at the CALL instruction and must therefore > + * modify regs->ip in order to make progress (just like a normal INT3 scribbled > + * CALL). > + * > + * NOTE: we do not restore any of the segment registers. 
> + */ > +ENTRY(call_to_exception_trampoline) > + /* > + * On entry the stack looks like: > + * > + * 2*4(%esp) <previous context> > + * 1*4(%esp) RET-IP > + * 0*4(%esp) func > + * > + * transform this into: > + * > + * 19*4(%esp) <previous context> > + * 18*4(%esp) gap / RET-IP > + * 17*4(%esp) gap / func > + * 16*4(%esp) ss > + * 15*415*4(%esp) sp / <previous context> isn't this "&<previous context>" ? > + * 14*4(%esp) flags > + * 13*4(%esp) cs > + * 12*4(%esp) ip / RET-IP > + * 11*4(%esp) orig_eax > + * 10*4(%esp) gs > + * 9*4(%esp) fs > + * 8*4(%esp) es > + * 7*4(%esp) ds > + * 6*4(%esp) eax > + * 5*4(%esp) ebp > + * 4*4(%esp) edi > + * 3*4(%esp) esi > + * 2*4(%esp) edx > + * 1*4(%esp) ecx > + * 0*4(%esp) ebx > + */ > + pushl %ss > + pushl %esp # points at ss > + addl $3*4, (%esp) # point it at <previous context> > + pushfl > + pushl %cs > + pushl 5*4(%esp) # RET-IP > + subl 5, (%esp) # point at CALL instruction > + pushl $-1 > + pushl %gs > + pushl %fs > + pushl %es > + pushl %ds > + pushl %eax > + pushl %ebp > + pushl %edi > + pushl %esi > + pushl %edx > + pushl %ecx > + pushl %ebx > + > + ENCODE_FRAME_POINTER > + > + movl %esp, %eax # 1st argument: pt_regs > + > + movl 17*4(%esp), %ebx # func > + CALL_NOSPEC %ebx > + > + movl PT_OLDESP(%esp), %eax Is PT_OLDESP(%esp) "<previous context>" or "&<previous contex>"? > + > + movl PT_EIP(%esp), %ecx > + movl %ecx, -1*4(%eax) Ah, OK, so $func must set the true return address to regs->ip instead of returning it. > + > + movl PT_EFLAGS(%esp), %ecx > + movl %ecx, -2*4(%eax) > + > + movl PT_EAX(%esp), %ecx > + movl %ecx, -3*4(%eax) So, at this point, the stack becomes 18*4(%esp) RET-IP 17*4(%esp) eflags 16*4(%esp) eax Correct? 
> + > + popl %ebx > + popl %ecx > + popl %edx > + popl %esi > + popl %edi > + popl %ebp > + > + lea -3*4(%eax), %esp > + popl %eax > + popfl > + ret > +END(call_to_exception_trampoline) > --- a/arch/x86/kernel/kprobes/core.c > +++ b/arch/x86/kernel/kprobes/core.c > @@ -731,29 +731,8 @@ asm( > ".global kretprobe_trampoline\n" > ".type kretprobe_trampoline, @function\n" > "kretprobe_trampoline:\n" > - /* We don't bother saving the ss register */ > -#ifdef CONFIG_X86_64 > - " pushq %rsp\n" > - " pushfq\n" > - SAVE_REGS_STRING > - " movq %rsp, %rdi\n" > - " call trampoline_handler\n" > - /* Replace saved sp with true return address. */ > - " movq %rax, 19*8(%rsp)\n" > - RESTORE_REGS_STRING > - " popfq\n" > -#else > - " pushl %esp\n" > - " pushfl\n" > - SAVE_REGS_STRING > - " movl %esp, %eax\n" > - " call trampoline_handler\n" > - /* Replace saved sp with true return address. */ > - " movl %eax, 15*4(%esp)\n" > - RESTORE_REGS_STRING > - " popfl\n" > -#endif > - " ret\n" Here, we need a gap for storing ret-ip, because kretprobe_trampoline is the address which is returned from the target function. We have no "ret-ip" here at this point. So something like + "push $0\n" /* This is a gap, will be filled with real return address*/ > + "push trampoline_handler\n" > + "jmp call_to_exception_trampoline\n" > ".size kretprobe_trampoline, .-kretprobe_trampoline\n" > ); > NOKPROBE_SYMBOL(kretprobe_trampoline); > @@ -791,12 +770,7 @@ static __used void *trampoline_handler(s > > INIT_HLIST_HEAD(&empty_rp); > kretprobe_hash_lock(current, &head, &flags); > - /* fixup registers */ > - regs->cs = __KERNEL_CS; > -#ifdef CONFIG_X86_32 > - regs->cs |= get_kernel_rpl(); > - regs->gs = 0; > -#endif > + > /* We use pt_regs->sp for return address holder. */ > frame_pointer = ®s->sp; > regs->ip = trampoline_address; Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 14:01 ` Masami Hiramatsu @ 2019-05-09 17:14 ` Peter Zijlstra 2019-05-10 4:58 ` Masami Hiramatsu 0 siblings, 1 reply; 36+ messages in thread From: Peter Zijlstra @ 2019-05-09 17:14 UTC (permalink / raw) To: Masami Hiramatsu Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, May 09, 2019 at 11:01:06PM +0900, Masami Hiramatsu wrote: > On Thu, 9 May 2019 10:14:31 +0200 > Peter Zijlstra <peterz@infradead.org> wrote: > > But what I'd love to do is something like the belwo patch, and make all > > the trampolines (very much including ftrace) use that. Such that we then > > only have 1 copy of this magic (well, 2 because x86_64 also needs an > > implementation of this of course). > > OK, but I will make kretprobe integrated with func-graph tracer, > since it is inefficient that we have 2 different hidden return stack... 
> > Anyway, > > > Changing ftrace over to this would be a little more work but it can > > easily chain things a little to get its original context back: > > > > ENTRY(ftrace_regs_caller) > > GLOBAL(ftrace_regs_func) > > push ftrace_stub > > push ftrace_regs_handler > > jmp call_to_exception_trampoline > > END(ftrace_regs_caller) > > > > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *); > > > > struct ftrace_regs_stack { > > ftrace_func_t func; > > unsigned long parent_ip; > > }; > > > > void ftrace_regs_handler(struct pr_regs *regs) > > { > > struct ftrace_regs_stack *st = (void *)regs->sp; > > ftrace_func_t func = st->func; > > > > regs->sp += sizeof(long); /* pop func */ > > Sorry, why pop here? Otherwise it stays on the return stack and bad things happen. Note how the below trampoline thing uses regs->sp. > > func(regs->ip, st->parent_ip, function_trace_op, regs); > > } > > > > Hmm? I didn't look into the function_graph thing, but I imagine it can > > be added without too much pain. > > Yes, that should be good for function_graph trampoline too. > We use very similar technic. Ideally also the optimized kprobe trampoline, but I've not managed to fully comprehend that one. > > > > --- > > --- a/arch/x86/entry/entry_32.S > > +++ b/arch/x86/entry/entry_32.S > > @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit) > > call do_exit > > 1: jmp 1b > > END(rewind_stack_do_exit) > > + > > +/* > > + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we > > + * just did was in fact scribbled with an INT3. > > + * > > + * Use this trampoline like: > > + * > > + * PUSH $func > > + * JMP call_to_exception_trampoline > > + * > > + * $func will see regs->ip point at the CALL instruction and must therefore > > + * modify regs->ip in order to make progress (just like a normal INT3 scribbled > > + * CALL). > > + * > > + * NOTE: we do not restore any of the segment registers. 
> > + */ > > +ENTRY(call_to_exception_trampoline) > > + /* > > + * On entry the stack looks like: > > + * > > + * 2*4(%esp) <previous context> > > + * 1*4(%esp) RET-IP > > + * 0*4(%esp) func > > + * > > + * transform this into: > > + * > > + * 19*4(%esp) <previous context> > > + * 18*4(%esp) gap / RET-IP > > + * 17*4(%esp) gap / func > > + * 16*4(%esp) ss > > + * 15*415*4(%esp) sp / <previous context> > > isn't this "&<previous context>" ? Yes. > > + * 14*4(%esp) flags > > + * 13*4(%esp) cs > > + * 12*4(%esp) ip / RET-IP > > + * 11*4(%esp) orig_eax > > + * 10*4(%esp) gs > > + * 9*4(%esp) fs > > + * 8*4(%esp) es > > + * 7*4(%esp) ds > > + * 6*4(%esp) eax > > + * 5*4(%esp) ebp > > + * 4*4(%esp) edi > > + * 3*4(%esp) esi > > + * 2*4(%esp) edx > > + * 1*4(%esp) ecx > > + * 0*4(%esp) ebx > > + */ > > + pushl %ss > > + pushl %esp # points at ss > > + addl $3*4, (%esp) # point it at <previous context> > > + pushfl > > + pushl %cs > > + pushl 5*4(%esp) # RET-IP > > + subl 5, (%esp) # point at CALL instruction > > + pushl $-1 > > + pushl %gs > > + pushl %fs > > + pushl %es > > + pushl %ds > > + pushl %eax > > + pushl %ebp > > + pushl %edi > > + pushl %esi > > + pushl %edx > > + pushl %ecx > > + pushl %ebx > > + > > + ENCODE_FRAME_POINTER > > + > > + movl %esp, %eax # 1st argument: pt_regs > > + > > + movl 17*4(%esp), %ebx # func > > + CALL_NOSPEC %ebx > > + > > + movl PT_OLDESP(%esp), %eax > > Is PT_OLDESP(%esp) "<previous context>" or "&<previous contex>"? The latter. > > + > > + movl PT_EIP(%esp), %ecx > > + movl %ecx, -1*4(%eax) > > Ah, OK, so $func must set the true return address to regs->ip > instead of returning it. Just so. > > + > > + movl PT_EFLAGS(%esp), %ecx > > + movl %ecx, -2*4(%eax) > > + > > + movl PT_EAX(%esp), %ecx > > + movl %ecx, -3*4(%eax) > > So, at this point, the stack becomes > 3*4(%esp) ®s->sp 2*4(%esp) RET-IP 1*4(%esp) eflags 0*4(%esp) eax > Correct? Yes, relative to regs->sp, which is why we need to pop 'func', otherwise it stays on the stack. 
> > + > > + popl %ebx > > + popl %ecx > > + popl %edx > > + popl %esi > > + popl %edi > > + popl %ebp > > + > > + lea -3*4(%eax), %esp > > + popl %eax > > + popfl > > + ret > > +END(call_to_exception_trampoline) > > --- a/arch/x86/kernel/kprobes/core.c > > +++ b/arch/x86/kernel/kprobes/core.c > > @@ -731,29 +731,8 @@ asm( > > ".global kretprobe_trampoline\n" > > ".type kretprobe_trampoline, @function\n" > > "kretprobe_trampoline:\n" > > - /* We don't bother saving the ss register */ > > -#ifdef CONFIG_X86_64 > > - " pushq %rsp\n" > > - " pushfq\n" > > - SAVE_REGS_STRING > > - " movq %rsp, %rdi\n" > > - " call trampoline_handler\n" > > - /* Replace saved sp with true return address. */ > > - " movq %rax, 19*8(%rsp)\n" > > - RESTORE_REGS_STRING > > - " popfq\n" > > -#else > > - " pushl %esp\n" > > - " pushfl\n" > > - SAVE_REGS_STRING > > - " movl %esp, %eax\n" > > - " call trampoline_handler\n" > > - /* Replace saved sp with true return address. */ > > - " movl %eax, 15*4(%esp)\n" > > - RESTORE_REGS_STRING > > - " popfl\n" > > -#endif > > - " ret\n" > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is > the address which is returned from the target function. We have no > "ret-ip" here at this point. So something like > > + "push $0\n" /* This is a gap, will be filled with real return address*/ The trampoline already provides a gap, trampoline_handler() will need to use int3_emulate_push() if it wants to inject something on the return stack. > > + "push trampoline_handler\n" > > + "jmp call_to_exception_trampoline\n" > > ".size kretprobe_trampoline, .-kretprobe_trampoline\n" > > ); > > NOKPROBE_SYMBOL(kretprobe_trampoline); ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 17:14 ` Peter Zijlstra @ 2019-05-10 4:58 ` Masami Hiramatsu 2019-05-10 12:31 ` Peter Zijlstra 2019-05-10 12:40 ` Peter Zijlstra 0 siblings, 2 replies; 36+ messages in thread From: Masami Hiramatsu @ 2019-05-10 4:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, 9 May 2019 19:14:16 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, May 09, 2019 at 11:01:06PM +0900, Masami Hiramatsu wrote: > > On Thu, 9 May 2019 10:14:31 +0200 > > Peter Zijlstra <peterz@infradead.org> wrote: > > > > But what I'd love to do is something like the belwo patch, and make all > > > the trampolines (very much including ftrace) use that. Such that we then > > > only have 1 copy of this magic (well, 2 because x86_64 also needs an > > > implementation of this of course). > > > > OK, but I will make kretprobe integrated with func-graph tracer, > > since it is inefficient that we have 2 different hidden return stack... 
> > > > Anyway, > > > > > Changing ftrace over to this would be a little more work but it can > > > easily chain things a little to get its original context back: > > > > > > ENTRY(ftrace_regs_caller) > > > GLOBAL(ftrace_regs_func) > > > push ftrace_stub > > > push ftrace_regs_handler > > > jmp call_to_exception_trampoline > > > END(ftrace_regs_caller) > > > > > > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *); > > > > > > struct ftrace_regs_stack { > > > ftrace_func_t func; > > > unsigned long parent_ip; > > > }; > > > > > > void ftrace_regs_handler(struct pr_regs *regs) > > > { > > > struct ftrace_regs_stack *st = (void *)regs->sp; > > > ftrace_func_t func = st->func; > > > > > > regs->sp += sizeof(long); /* pop func */ > > > > Sorry, why pop here? > > Otherwise it stays on the return stack and bad things happen. Note how > the below trampoline thing uses regs->sp. > > > > func(regs->ip, st->parent_ip, function_trace_op, regs); > > > } > > > > > > Hmm? I didn't look into the function_graph thing, but I imagine it can > > > be added without too much pain. > > > > Yes, that should be good for function_graph trampoline too. > > We use very similar technic. > > Ideally also the optimized kprobe trampoline, but I've not managed to > fully comprehend that one. As you pointed in other reply, save/restore can be a macro, but each trampoline code is slightly different. Optprobe template has below parts (jumped from probed address) [store regs] [setup function arguments (pt_regs and probed address)] [handler call] [restore regs] [execute copied instruction] [jump back to probed address] Note that there is a limitation that if it is optiomized probe, user handler can not change regs->ip. 
(we can not use "ret" after executed a copied instruction, which must run on same stack) > > > > > > > --- > > > --- a/arch/x86/entry/entry_32.S > > > +++ b/arch/x86/entry/entry_32.S > > > @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit) > > > call do_exit > > > 1: jmp 1b > > > END(rewind_stack_do_exit) > > > + > > > +/* > > > + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we > > > + * just did was in fact scribbled with an INT3. > > > + * > > > + * Use this trampoline like: > > > + * > > > + * PUSH $func > > > + * JMP call_to_exception_trampoline > > > + * > > > + * $func will see regs->ip point at the CALL instruction and must therefore > > > + * modify regs->ip in order to make progress (just like a normal INT3 scribbled > > > + * CALL). > > > + * > > > + * NOTE: we do not restore any of the segment registers. > > > + */ > > > +ENTRY(call_to_exception_trampoline) > > > + /* > > > + * On entry the stack looks like: > > > + * > > > + * 2*4(%esp) <previous context> > > > + * 1*4(%esp) RET-IP > > > + * 0*4(%esp) func > > > + * > > > + * transform this into: > > > + * > > > + * 19*4(%esp) <previous context> > > > + * 18*4(%esp) gap / RET-IP > > > + * 17*4(%esp) gap / func > > > + * 16*4(%esp) ss > > > + * 15*415*4(%esp) sp / <previous context> > > > > isn't this "&<previous context>" ? > > Yes. 
> > > > + * 14*4(%esp) flags > > > + * 13*4(%esp) cs > > > + * 12*4(%esp) ip / RET-IP > > > + * 11*4(%esp) orig_eax > > > + * 10*4(%esp) gs > > > + * 9*4(%esp) fs > > > + * 8*4(%esp) es > > > + * 7*4(%esp) ds > > > + * 6*4(%esp) eax > > > + * 5*4(%esp) ebp > > > + * 4*4(%esp) edi > > > + * 3*4(%esp) esi > > > + * 2*4(%esp) edx > > > + * 1*4(%esp) ecx > > > + * 0*4(%esp) ebx > > > + */ > > > + pushl %ss > > > + pushl %esp # points at ss > > > + addl $3*4, (%esp) # point it at <previous context> > > > + pushfl > > > + pushl %cs > > > + pushl 5*4(%esp) # RET-IP > > > + subl 5, (%esp) # point at CALL instruction > > > + pushl $-1 > > > + pushl %gs > > > + pushl %fs > > > + pushl %es > > > + pushl %ds > > > + pushl %eax > > > + pushl %ebp > > > + pushl %edi > > > + pushl %esi > > > + pushl %edx > > > + pushl %ecx > > > + pushl %ebx > > > + > > > + ENCODE_FRAME_POINTER > > > + > > > + movl %esp, %eax # 1st argument: pt_regs > > > + > > > + movl 17*4(%esp), %ebx # func > > > + CALL_NOSPEC %ebx > > > + > > > + movl PT_OLDESP(%esp), %eax > > > > Is PT_OLDESP(%esp) "<previous context>" or "&<previous contex>"? > > The latter. > > > > + > > > + movl PT_EIP(%esp), %ecx > > > + movl %ecx, -1*4(%eax) > > > > Ah, OK, so $func must set the true return address to regs->ip > > instead of returning it. > > Just so. > > > > + > > > + movl PT_EFLAGS(%esp), %ecx > > > + movl %ecx, -2*4(%eax) > > > + > > > + movl PT_EAX(%esp), %ecx > > > + movl %ecx, -3*4(%eax) > > > > So, at this point, the stack becomes > > > 3*4(%esp) ®s->sp > 2*4(%esp) RET-IP > 1*4(%esp) eflags > 0*4(%esp) eax > > > Correct? > > Yes, relative to regs->sp, which is why we need to pop 'func', otherwise > it stays on the stack. 
> > > > + > > > + popl %ebx > > > + popl %ecx > > > + popl %edx > > > + popl %esi > > > + popl %edi > > > + popl %ebp > > > + > > > + lea -3*4(%eax), %esp > > > + popl %eax > > > + popfl > > > + ret > > > +END(call_to_exception_trampoline) > > > --- a/arch/x86/kernel/kprobes/core.c > > > +++ b/arch/x86/kernel/kprobes/core.c > > > @@ -731,29 +731,8 @@ asm( > > > ".global kretprobe_trampoline\n" > > > ".type kretprobe_trampoline, @function\n" > > > "kretprobe_trampoline:\n" > > > - /* We don't bother saving the ss register */ > > > -#ifdef CONFIG_X86_64 > > > - " pushq %rsp\n" > > > - " pushfq\n" > > > - SAVE_REGS_STRING > > > - " movq %rsp, %rdi\n" > > > - " call trampoline_handler\n" > > > - /* Replace saved sp with true return address. */ > > > - " movq %rax, 19*8(%rsp)\n" > > > - RESTORE_REGS_STRING > > > - " popfq\n" > > > -#else > > > - " pushl %esp\n" > > > - " pushfl\n" > > > - SAVE_REGS_STRING > > > - " movl %esp, %eax\n" > > > - " call trampoline_handler\n" > > > - /* Replace saved sp with true return address. */ > > > - " movl %eax, 15*4(%esp)\n" > > > - RESTORE_REGS_STRING > > > - " popfl\n" > > > -#endif > > > - " ret\n" > > > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is > > the address which is returned from the target function. We have no > > "ret-ip" here at this point. So something like > > > > + "push $0\n" /* This is a gap, will be filled with real return address*/ > > The trampoline already provides a gap, trampoline_handler() will need to > use int3_emulate_push() if it wants to inject something on the return > stack. I guess you mean the int3 case. This trampoline is used as a return destination. When the target function is called, kretprobe interrupts the first instruction, and replace the return address with this trampoline. When a "ret" instruction is done, it returns to this trampoline. Thus the stack frame start with previous context here. 
As you described above, > > > + * On entry the stack looks like: > > > + * > > > + * 2*4(%esp) <previous context> > > > + * 1*4(%esp) RET-IP > > > + * 0*4(%esp) func From this trampoline call, the stack looks like: * 1*4(%esp) <previous context> * 0*4(%esp) func So we need one more push. > > > > + "push trampoline_handler\n" > > > + "jmp call_to_exception_trampoline\n" > > > ".size kretprobe_trampoline, .-kretprobe_trampoline\n" > > > ); > > > NOKPROBE_SYMBOL(kretprobe_trampoline); Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-10 4:58 ` Masami Hiramatsu @ 2019-05-10 12:31 ` Peter Zijlstra 2019-05-11 0:52 ` Masami Hiramatsu 2019-05-10 12:40 ` Peter Zijlstra 1 sibling, 1 reply; 36+ messages in thread From: Peter Zijlstra @ 2019-05-10 12:31 UTC (permalink / raw) To: Masami Hiramatsu Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote: > On Thu, 9 May 2019 19:14:16 +0200 > Peter Zijlstra <peterz@infradead.org> wrote: > > Ideally also the optimized kprobe trampoline, but I've not managed to > > fully comprehend that one. > > As you pointed in other reply, save/restore can be a macro, but > each trampoline code is slightly different. Optprobe template has > below parts > > (jumped from probed address) > [store regs] > [setup function arguments (pt_regs and probed address)] > [handler call] > [restore regs] > [execute copied instruction] instruction_s_ ? The JMP to this trampoline is likely 5 bytes and could have clobbered multiple instructions, we'd then have to place them all here, and > [jump back to probed address] jump to after whatever instructions were clobbered by the JMP. > Note that there is a limitation that if it is optiomized probe, user > handler can not change regs->ip. (we can not use "ret" after executed > a copied instruction, which must run on same stack) Changing regs->ip in this case is going to be massively dodgy indeed :-) But so would changing much else; changing stack layout would also be somewhat tricky. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-10 12:31 ` Peter Zijlstra @ 2019-05-11 0:52 ` Masami Hiramatsu 0 siblings, 0 replies; 36+ messages in thread From: Masami Hiramatsu @ 2019-05-11 0:52 UTC (permalink / raw) To: Peter Zijlstra Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Fri, 10 May 2019 14:31:31 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote: > > On Thu, 9 May 2019 19:14:16 +0200 > > Peter Zijlstra <peterz@infradead.org> wrote: > > > > Ideally also the optimized kprobe trampoline, but I've not managed to > > > fully comprehend that one. > > > > As you pointed in other reply, save/restore can be a macro, but > > each trampoline code is slightly different. Optprobe template has > > below parts > > > > (jumped from probed address) > > [store regs] > > [setup function arguments (pt_regs and probed address)] > > [handler call] > > [restore regs] > > [execute copied instruction] > > instruction_s_ ? Yes. > > The JMP to this trampoline is likely 5 bytes and could have clobbered > multiple instructions, we'd then have to place them all here, and > > > [jump back to probed address] > > jump to after whatever instructions were clobbered by the JMP. Right! > > Note that there is a limitation that if it is optiomized probe, user > > handler can not change regs->ip. (we can not use "ret" after executed > > a copied instruction, which must run on same stack) > > Changing regs->ip in this case is going to be massively dodgy indeed :-) > But so would changing much else; changing stack layout would also be > somewhat tricky. 
Yes, so the stack must be the same after [restore regs]. Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-10 4:58 ` Masami Hiramatsu 2019-05-10 12:31 ` Peter Zijlstra @ 2019-05-10 12:40 ` Peter Zijlstra 2019-05-11 0:56 ` Masami Hiramatsu 1 sibling, 1 reply; 36+ messages in thread From: Peter Zijlstra @ 2019-05-10 12:40 UTC (permalink / raw) To: Masami Hiramatsu Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote: > On Thu, 9 May 2019 19:14:16 +0200 > Peter Zijlstra <peterz@infradead.org> wrote: > > > > --- a/arch/x86/kernel/kprobes/core.c > > > > +++ b/arch/x86/kernel/kprobes/core.c > > > > @@ -731,29 +731,8 @@ asm( > > > > ".global kretprobe_trampoline\n" > > > > ".type kretprobe_trampoline, @function\n" > > > > "kretprobe_trampoline:\n" > > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is > > > the address which is returned from the target function. We have no > > > "ret-ip" here at this point. So something like > > > > > > + "push $0\n" /* This is a gap, will be filled with real return address*/ > > > > The trampoline already provides a gap, trampoline_handler() will need to > > use int3_emulate_push() if it wants to inject something on the return > > stack. > > I guess you mean the int3 case. This trampoline is used as a return destination. > When the target function is called, kretprobe interrupts the first instruction, > and replace the return address with this trampoline. When a "ret" instruction > is done, it returns to this trampoline. Thus the stack frame start with > previous context here. 
As you described above, I would prefer to change that to inject an extra return address, instead of replacing it. With the new exception stuff we can actually do that. So on entry we then go from: <previous context> RET-IP to <previous context> RET-IP return-trampoline So when the function returns, it falls into the trampoline instead. > > > > + * On entry the stack looks like: > > > > + * > > > > + * 2*4(%esp) <previous context> > > > > + * 1*4(%esp) RET-IP > > > > + * 0*4(%esp) func > > From this trampoline call, the stack looks like: > > * 1*4(%esp) <previous context> > * 0*4(%esp) func > > So we need one more push. And then the stack looks just right at this point. > > > > + "push trampoline_handler\n" > > > > + "jmp call_to_exception_trampoline\n" > > > > ".size kretprobe_trampoline, .-kretprobe_trampoline\n" > > > > ); > > > > NOKPROBE_SYMBOL(kretprobe_trampoline); ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-10 12:40 ` Peter Zijlstra @ 2019-05-11 0:56 ` Masami Hiramatsu 2019-05-13 8:15 ` Peter Zijlstra 0 siblings, 1 reply; 36+ messages in thread From: Masami Hiramatsu @ 2019-05-11 0:56 UTC (permalink / raw) To: Peter Zijlstra Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Fri, 10 May 2019 14:40:54 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote: > > On Thu, 9 May 2019 19:14:16 +0200 > > Peter Zijlstra <peterz@infradead.org> wrote: > > > > > --- a/arch/x86/kernel/kprobes/core.c > > > > > +++ b/arch/x86/kernel/kprobes/core.c > > > > > @@ -731,29 +731,8 @@ asm( > > > > > ".global kretprobe_trampoline\n" > > > > > ".type kretprobe_trampoline, @function\n" > > > > > "kretprobe_trampoline:\n" > > > > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is > > > > the address which is returned from the target function. We have no > > > > "ret-ip" here at this point. So something like > > > > > > > > + "push $0\n" /* This is a gap, will be filled with real return address*/ > > > > > > The trampoline already provides a gap, trampoline_handler() will need to > > > use int3_emulate_push() if it wants to inject something on the return > > > stack. > > > > I guess you mean the int3 case. This trampoline is used as a return destination. > > > When the target function is called, kretprobe interrupts the first instruction, > > and replace the return address with this trampoline. When a "ret" instruction > > is done, it returns to this trampoline. 
Thus the stack frame start with > > previous context here. As you described above, > > I would prefer to change that to inject an extra return address, instead > of replacing it. With the new exception stuff we can actually do that. > > So on entry we then go from: > > <previous context> > RET-IP > > to > > <previous context> > RET-IP > return-trampoline > > So when the function returns, it falls into the trampoline instead. Is that really possible? On x86-64, most parameters are passed by registers, but x86-32 (and x86-64 in rare case) some parameters can be passed by stack. If we change the stack layout in the function prologue, the code in function body can not access those parameters on stack. Thank you, > > > > > > + * On entry the stack looks like: > > > > > + * > > > > > + * 2*4(%esp) <previous context> > > > > > + * 1*4(%esp) RET-IP > > > > > + * 0*4(%esp) func > > > > From this trampoline call, the stack looks like: > > > > * 1*4(%esp) <previous context> > > * 0*4(%esp) func > > > > So we need one more push. > > And then the stack looks just right at this point. > > > > > > + "push trampoline_handler\n" > > > > > + "jmp call_to_exception_trampoline\n" > > > > > ".size kretprobe_trampoline, .-kretprobe_trampoline\n" > > > > > ); > > > > > NOKPROBE_SYMBOL(kretprobe_trampoline); -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-11 0:56 ` Masami Hiramatsu @ 2019-05-13 8:15 ` Peter Zijlstra 0 siblings, 0 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-13 8:15 UTC (permalink / raw) To: Masami Hiramatsu Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Sat, May 11, 2019 at 09:56:55AM +0900, Masami Hiramatsu wrote: > On Fri, 10 May 2019 14:40:54 +0200 > Peter Zijlstra <peterz@infradead.org> wrote: > > > On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote: > > > On Thu, 9 May 2019 19:14:16 +0200 > > > Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > --- a/arch/x86/kernel/kprobes/core.c > > > > > > +++ b/arch/x86/kernel/kprobes/core.c > > > > > > @@ -731,29 +731,8 @@ asm( > > > > > > ".global kretprobe_trampoline\n" > > > > > > ".type kretprobe_trampoline, @function\n" > > > > > > "kretprobe_trampoline:\n" > > > > > > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is > > > > > the address which is returned from the target function. We have no > > > > > "ret-ip" here at this point. So something like > > > > > > > > > > + "push $0\n" /* This is a gap, will be filled with real return address*/ > > > > > > > > The trampoline already provides a gap, trampoline_handler() will need to > > > > use int3_emulate_push() if it wants to inject something on the return > > > > stack. > > > > > > I guess you mean the int3 case. This trampoline is used as a return destination. > > > > > When the target function is called, kretprobe interrupts the first instruction, > > > and replace the return address with this trampoline. 
When a "ret" instruction > > > is done, it returns to this trampoline. Thus the stack frame start with > > > previous context here. As you described above, > > > > I would prefer to change that to inject an extra return address, instead > > of replacing it. With the new exception stuff we can actually do that. > > > > So on entry we then go from: > > > > <previous context> > > RET-IP > > > > to > > > > <previous context> > > RET-IP > > return-trampoline > > > > So when the function returns, it falls into the trampoline instead. > > Is that really possible? On x86-64, most parameters are passed by registers, > but x86-32 (and x86-64 in rare case) some parameters can be passed by stack. > If we change the stack layout in the function prologue, the code in > function body can not access those parameters on stack. Ooh, I see what you mean... yes that might be trouble indeed. Damn.. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 8:14 ` Peter Zijlstra ` (2 preceding siblings ...) 2019-05-09 14:01 ` Masami Hiramatsu @ 2019-05-09 16:20 ` Andy Lutomirski 2019-05-09 17:18 ` Peter Zijlstra 2019-05-09 17:43 ` Steven Rostedt 2019-05-09 17:37 ` Steven Rostedt 4 siblings, 2 replies; 36+ messages in thread From: Andy Lutomirski @ 2019-05-09 16:20 UTC (permalink / raw) To: Peter Zijlstra Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest > On May 9, 2019, at 1:14 AM, Peter Zijlstra <peterz@infradead.org> wrote: > >> On Thu, May 09, 2019 at 10:20:30AM +0900, Masami Hiramatsu wrote: >> Hi Josh, >> >> On Wed, 8 May 2019 13:48:48 -0500 >> Josh Poimboeuf <jpoimboe@redhat.com> wrote: >> >>>> On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote: >>>>> On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote: >>>>> On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote: >>>> >>>>>> Do the x86_64 variants also want some ORC annotation? >>>>> >>>>> Maybe so. Though it looks like regs->ip isn't saved. The saved >>>>> registers might need to be tweaked. I'll need to look into it. >>>> >>>> What all these sites do (and maybe we should look at unifying them >>>> somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka >>>> pt_regs). >>>> >>>> So regs->ip will be the return address (which is fixed up to be the CALL >>>> address in the handler). >>> >>> But from what I can tell, trampoline_handler() hard-codes regs->ip to >>> point to kretprobe_trampoline(), and the original return address is >>> placed in regs->sp. 
>>> >>> Masami, is there a reason why regs->ip doesn't have the original return >>> address and regs->sp doesn't have the original SP? I think that would >>> help the unwinder understand things. >> >> Yes, for regs->ip, there is a histrical reason. Since previously, we had >> an int3 at trampoline, so the user (kretprobe) handler expects that >> regs->ip is trampoline address and ri->ret_addr is original return address. >> It is better to check the other archs, but I think it is possible to >> change the regs->ip to original return address, since no one cares such >> "fixed address". :) >> >> For the regs->sp, there are 2 reasons. >> >> For x86-64, it's just for over-optimizing (reduce stack usage). >> I think we can make a gap for putting return address, something like >> >> "kretprobe_trampoline:\n" >> #ifdef CONFIG_X86_64 >> " pushq %rsp\n" /* Make a gap for return address */ >> " pushq 0(%rsp)\n" /* Copy original stack pointer */ >> " pushfq\n" >> SAVE_REGS_STRING >> " movq %rsp, %rdi\n" >> " call trampoline_handler\n" >> /* Push the true return address to the bottom */ >> " movq %rax, 20*8(%rsp)\n" >> RESTORE_REGS_STRING >> " popfq\n" >> " addq $8, %rsp\n" /* Skip original stack pointer */ >> >> For i386 (x86-32), there is no other way to keep ®s->sp as >> the original stack pointer. It has to be changed with this series, >> maybe as same as x86-64. > > Right; I already fixed that in my patch changing i386's pt_regs. > > But what I'd love to do is something like the belwo patch, and make all > the trampolines (very much including ftrace) use that. Such that we then > only have 1 copy of this magic (well, 2 because x86_64 also needs an > implementation of this of course). 
> > Changing ftrace over to this would be a little more work but it can > easily chain things a little to get its original context back: > > ENTRY(ftrace_regs_caller) > GLOBAL(ftrace_regs_func) > push ftrace_stub > push ftrace_regs_handler > jmp call_to_exception_trampoline > END(ftrace_regs_caller) > > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *); > > struct ftrace_regs_stack { > ftrace_func_t func; > unsigned long parent_ip; > }; > > void ftrace_regs_handler(struct pr_regs *regs) > { > struct ftrace_regs_stack *st = (void *)regs->sp; > ftrace_func_t func = st->func; > > regs->sp += sizeof(long); /* pop func */ > > func(regs->ip, st->parent_ip, function_trace_op, regs); > } > > Hmm? I didn't look into the function_graph thing, but I imagine it can > be added without too much pain. > > --- > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit) > call do_exit > 1: jmp 1b > END(rewind_stack_do_exit) > + > +/* > + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we > + * just did was in fact scribbled with an INT3. > + * > + * Use this trampoline like: > + * > + * PUSH $func > + * JMP call_to_exception_trampoline > + * > + * $func will see regs->ip point at the CALL instruction and must therefore > + * modify regs->ip in order to make progress (just like a normal INT3 scribbled > + * CALL). > + * > + * NOTE: we do not restore any of the segment registers. 
> + */ > +ENTRY(call_to_exception_trampoline) > + /* > + * On entry the stack looks like: > + * > + * 2*4(%esp) <previous context> > + * 1*4(%esp) RET-IP > + * 0*4(%esp) func > + * > + * transform this into: > + * > + * 19*4(%esp) <previous context> > + * 18*4(%esp) gap / RET-IP > + * 17*4(%esp) gap / func > + * 16*4(%esp) ss > + * 15*4(%esp) sp / <previous context> > + * 14*4(%esp) flags > + * 13*4(%esp) cs > + * 12*4(%esp) ip / RET-IP > + * 11*4(%esp) orig_eax > + * 10*4(%esp) gs > + * 9*4(%esp) fs > + * 8*4(%esp) es > + * 7*4(%esp) ds > + * 6*4(%esp) eax > + * 5*4(%esp) ebp > + * 4*4(%esp) edi > + * 3*4(%esp) esi > + * 2*4(%esp) edx > + * 1*4(%esp) ecx > + * 0*4(%esp) ebx > + */ > + pushl %ss > + pushl %esp # points at ss > + addl $3*4, (%esp) # point it at <previous context> > + pushfl > + pushl %cs > + pushl 5*4(%esp) # RET-IP > + subl 5, (%esp) # point at CALL instruction > + pushl $-1 > + pushl %gs > + pushl %fs > + pushl %es > + pushl %ds > + pushl %eax > + pushl %ebp > + pushl %edi > + pushl %esi > + pushl %edx > + pushl %ecx > + pushl %ebx > + > + ENCODE_FRAME_POINTER > + > + movl %esp, %eax # 1st argument: pt_regs > + > + movl 17*4(%esp), %ebx # func > + CALL_NOSPEC %ebx > + > + movl PT_OLDESP(%esp), %eax > + > + movl PT_EIP(%esp), %ecx > + movl %ecx, -1*4(%eax) > + > + movl PT_EFLAGS(%esp), %ecx > + movl %ecx, -2*4(%eax) > + > + movl PT_EAX(%esp), %ecx > + movl %ecx, -3*4(%eax) > + > + popl %ebx > + popl %ecx > + popl %edx > + popl %esi > + popl %edi > + popl %ebp > + > + lea -3*4(%eax), %esp > + popl %eax > + popfl > + ret > +END(call_to_exception_trampoline) > --- a/arch/x86/kernel/kprobes/core.c > +++ b/arch/x86/kernel/kprobes/core.c > @@ -731,29 +731,8 @@ asm( > ".global kretprobe_trampoline\n" > ".type kretprobe_trampoline, @function\n" > "kretprobe_trampoline:\n" > - /* We don't bother saving the ss register */ > -#ifdef CONFIG_X86_64 > - " pushq %rsp\n" > - " pushfq\n" > - SAVE_REGS_STRING > - " movq %rsp, %rdi\n" > - " call trampoline_handler\n" 
> - /* Replace saved sp with true return address. */ > - " movq %rax, 19*8(%rsp)\n" > - RESTORE_REGS_STRING > - " popfq\n" > -#else > - " pushl %esp\n" > - " pushfl\n" > - SAVE_REGS_STRING > - " movl %esp, %eax\n" > - " call trampoline_handler\n" > - /* Replace saved sp with true return address. */ > - " movl %eax, 15*4(%esp)\n" > - RESTORE_REGS_STRING > - " popfl\n" > -#endif > - " ret\n" > + "push trampoline_handler\n" > + "jmp call_to_exception_trampoline\n" > ".size kretprobe_trampoline, .-kretprobe_trampoline\n" > ); Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever. This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 16:20 ` Andy Lutomirski @ 2019-05-09 17:18 ` Peter Zijlstra 2019-05-09 17:43 ` Steven Rostedt 1 sibling, 0 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-09 17:18 UTC (permalink / raw) To: Andy Lutomirski Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote: > > +ENTRY(call_to_exception_trampoline) > > + /* > > + * On entry the stack looks like: > > + * > > + * 2*4(%esp) <previous context> > > + * 1*4(%esp) RET-IP > > + * 0*4(%esp) func > > + * > > + * transform this into: > > + * > > + * 19*4(%esp) <previous context> > > + * 18*4(%esp) gap / RET-IP > > + * 17*4(%esp) gap / func > > + * 16*4(%esp) ss > > + * 15*4(%esp) sp / <previous context> > > + * 14*4(%esp) flags > > + * 13*4(%esp) cs > > + * 12*4(%esp) ip / RET-IP > > + * 11*4(%esp) orig_eax > > + * 10*4(%esp) gs > > + * 9*4(%esp) fs > > + * 8*4(%esp) es > > + * 7*4(%esp) ds > > + * 6*4(%esp) eax > > + * 5*4(%esp) ebp > > + * 4*4(%esp) edi > > + * 3*4(%esp) esi > > + * 2*4(%esp) edx > > + * 1*4(%esp) ecx > > + * 0*4(%esp) ebx > > + */ > > + pushl %ss > > + pushl %esp # points at ss > > + addl $3*4, (%esp) # point it at <previous context> > > + pushfl > > + pushl %cs > > + pushl 5*4(%esp) # RET-IP > > + subl 5, (%esp) # point at CALL instruction > > + pushl $-1 > > + pushl %gs > > + pushl %fs > > + pushl %es > > + pushl %ds > > + pushl %eax > > + pushl %ebp > > + pushl %edi > > + pushl %esi > > + pushl %edx > > + pushl %ecx > > + pushl %ebx > > + > > + ENCODE_FRAME_POINTER > > + > > + movl %esp, %eax # 1st 
argument: pt_regs > > + > > + movl 17*4(%esp), %ebx # func > > + CALL_NOSPEC %ebx > > + > > + movl PT_OLDESP(%esp), %eax > > + > > + movl PT_EIP(%esp), %ecx > > + movl %ecx, -1*4(%eax) > > + > > + movl PT_EFLAGS(%esp), %ecx > > + movl %ecx, -2*4(%eax) > > + > > + movl PT_EAX(%esp), %ecx > > + movl %ecx, -3*4(%eax) > > + > > + popl %ebx > > + popl %ecx > > + popl %edx > > + popl %esi > > + popl %edi > > + popl %ebp > > + > > + lea -3*4(%eax), %esp > > + popl %eax > > + popfl > > + ret > > +END(call_to_exception_trampoline) > Potentially minor nit: you’re doing popfl, but you’re not doing > TRACE_IRQ_whatever. This makes me think that you should either add > the tracing (ugh!) or you should maybe just skip the popfl. Yeah, so we really should not change flags I suppose. If this lives I'll remove the popfl. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 16:20 ` Andy Lutomirski 2019-05-09 17:18 ` Peter Zijlstra @ 2019-05-09 17:43 ` Steven Rostedt 2019-05-10 3:21 ` Masami Hiramatsu 1 sibling, 1 reply; 36+ messages in thread From: Steven Rostedt @ 2019-05-09 17:43 UTC (permalink / raw) To: Andy Lutomirski Cc: Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote: > > +END(call_to_exception_trampoline) > > --- a/arch/x86/kernel/kprobes/core.c > > +++ b/arch/x86/kernel/kprobes/core.c > > @@ -731,29 +731,8 @@ asm( > > ".global kretprobe_trampoline\n" > > ".type kretprobe_trampoline, @function\n" > > "kretprobe_trampoline:\n" > > - /* We don't bother saving the ss register */ > > -#ifdef CONFIG_X86_64 > > - " pushq %rsp\n" > > - " pushfq\n" > > - SAVE_REGS_STRING > > - " movq %rsp, %rdi\n" > > - " call trampoline_handler\n" > > - /* Replace saved sp with true return address. */ > > - " movq %rax, 19*8(%rsp)\n" > > - RESTORE_REGS_STRING > > - " popfq\n" > > -#else > > - " pushl %esp\n" > > - " pushfl\n" > > - SAVE_REGS_STRING > > - " movl %esp, %eax\n" > > - " call trampoline_handler\n" > > - /* Replace saved sp with true return address. */ > > - " movl %eax, 15*4(%esp)\n" > > - RESTORE_REGS_STRING > > - " popfl\n" > > -#endif > > - " ret\n" > > + "push trampoline_handler\n" > > + "jmp call_to_exception_trampoline\n" > > ".size kretprobe_trampoline, .-kretprobe_trampoline\n" > > ); > > > Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever. 
This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl. Note, kprobes (and ftrace for that matter) save flags not for interrupts, but because they must not modify the sign, zero and carry flags. -- Steve ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 17:43 ` Steven Rostedt @ 2019-05-10 3:21 ` Masami Hiramatsu 2019-05-10 12:14 ` Peter Zijlstra 2019-05-10 12:17 ` Peter Zijlstra 0 siblings, 2 replies; 36+ messages in thread From: Masami Hiramatsu @ 2019-05-10 3:21 UTC (permalink / raw) To: Steven Rostedt Cc: Andy Lutomirski, Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, 9 May 2019 13:43:16 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote: > > > +END(call_to_exception_trampoline) > > > --- a/arch/x86/kernel/kprobes/core.c > > > +++ b/arch/x86/kernel/kprobes/core.c > > > @@ -731,29 +731,8 @@ asm( > > > ".global kretprobe_trampoline\n" > > > ".type kretprobe_trampoline, @function\n" > > > "kretprobe_trampoline:\n" > > > - /* We don't bother saving the ss register */ > > > -#ifdef CONFIG_X86_64 > > > - " pushq %rsp\n" > > > - " pushfq\n" > > > - SAVE_REGS_STRING > > > - " movq %rsp, %rdi\n" > > > - " call trampoline_handler\n" > > > - /* Replace saved sp with true return address. */ > > > - " movq %rax, 19*8(%rsp)\n" > > > - RESTORE_REGS_STRING > > > - " popfq\n" > > > -#else > > > - " pushl %esp\n" > > > - " pushfl\n" > > > - SAVE_REGS_STRING > > > - " movl %esp, %eax\n" > > > - " call trampoline_handler\n" > > > - /* Replace saved sp with true return address. 
*/ > > > - " movl %eax, 15*4(%esp)\n" > > > - RESTORE_REGS_STRING > > > - " popfl\n" > > > -#endif > > > - " ret\n" > > > + "push trampoline_handler\n" > > > + "jmp call_to_exception_trampoline\n" > > > ".size kretprobe_trampoline, .-kretprobe_trampoline\n" > > > ); > > > > > > Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever. This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl. > > > Note, kprobes (and ftrace for that matter) save flags not for > interrupts, but because they must not modify the sign, zero and carry flags. Yes, optprobe also has to save and restore the flags. The above trampoline is for kretprobe, which is placed at the function return, so we don't have to care about flags. Thanks, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-10 3:21 ` Masami Hiramatsu @ 2019-05-10 12:14 ` Peter Zijlstra 2019-05-10 12:17 ` Peter Zijlstra 1 sibling, 0 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-10 12:14 UTC (permalink / raw) To: Masami Hiramatsu Cc: Steven Rostedt, Andy Lutomirski, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Fri, May 10, 2019 at 12:21:03PM +0900, Masami Hiramatsu wrote: > On Thu, 9 May 2019 13:43:16 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote: > > > > +END(call_to_exception_trampoline) > > > > --- a/arch/x86/kernel/kprobes/core.c > > > > +++ b/arch/x86/kernel/kprobes/core.c > > > > @@ -731,29 +731,8 @@ asm( > > > > ".global kretprobe_trampoline\n" > > > > ".type kretprobe_trampoline, @function\n" > > > > "kretprobe_trampoline:\n" > > > > - /* We don't bother saving the ss register */ > > > > -#ifdef CONFIG_X86_64 > > > > - " pushq %rsp\n" > > > > - " pushfq\n" > > > > - SAVE_REGS_STRING > > > > - " movq %rsp, %rdi\n" > > > > - " call trampoline_handler\n" > > > > - /* Replace saved sp with true return address. */ > > > > - " movq %rax, 19*8(%rsp)\n" > > > > - RESTORE_REGS_STRING > > > > - " popfq\n" > > > > -#else > > > > - " pushl %esp\n" > > > > - " pushfl\n" > > > > - SAVE_REGS_STRING > > > > - " movl %esp, %eax\n" > > > > - " call trampoline_handler\n" > > > > - /* Replace saved sp with true return address. 
*/ > > > > - " movl %eax, 15*4(%esp)\n" > > > > - RESTORE_REGS_STRING > > > > - " popfl\n" > > > > -#endif > > > > - " ret\n" > > > > + "push trampoline_handler\n" > > > > + "jmp call_to_exception_trampoline\n" > > > > ".size kretprobe_trampoline, .-kretprobe_trampoline\n" > > > > ); > > > > > > > > > Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever. This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl. > > > > > > Note, kprobes (and ftrace for that matter) are not saving flags for > > interrupts, but because it must not modify the sign, zero and carry flags. Uh, C ABI considers those clobbered over function calls, surely. Relying on those flags over a CALL would be _insane_. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-10 3:21 ` Masami Hiramatsu 2019-05-10 12:14 ` Peter Zijlstra @ 2019-05-10 12:17 ` Peter Zijlstra 2019-05-10 14:54 ` Steven Rostedt 1 sibling, 1 reply; 36+ messages in thread From: Peter Zijlstra @ 2019-05-10 12:17 UTC (permalink / raw) To: Masami Hiramatsu Cc: Steven Rostedt, Andy Lutomirski, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Fri, May 10, 2019 at 12:21:03PM +0900, Masami Hiramatsu wrote: > Yes, optprobe also has to save and restore the flags. > Above trampline is for kretprobe, which is placed at the function return, so > we don't have to care about flags. Sure, optprobe is actually special here, because it branches out at 'random' places and does indeed need to preserve flags. But both ftrace and retprobes are at C function call boundaries. Preserving flags doesn't make sense. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-10 12:17 ` Peter Zijlstra @ 2019-05-10 14:54 ` Steven Rostedt 0 siblings, 0 replies; 36+ messages in thread From: Steven Rostedt @ 2019-05-10 14:54 UTC (permalink / raw) To: Peter Zijlstra Cc: Masami Hiramatsu, Andy Lutomirski, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Fri, 10 May 2019 14:17:20 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > But both ftrace and retprobes are at C function call boundaries. > Preserving flags doesn't make sense. I agree, but I did it just because of my OCD and being complete in "emulating an int3" for ftrace_regs_caller ;-) Yeah, we can remove the popfl from the ftrace trampoline. -- Steve ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 8:14 ` Peter Zijlstra ` (3 preceding siblings ...) 2019-05-09 16:20 ` Andy Lutomirski @ 2019-05-09 17:37 ` Steven Rostedt 2019-05-09 18:26 ` Peter Zijlstra 4 siblings, 1 reply; 36+ messages in thread From: Steven Rostedt @ 2019-05-09 17:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote: > > Right; I already fixed that in my patch changing i386's pt_regs. > > But what I'd love to do is something like the belwo patch, and make all > the trampolines (very much including ftrace) use that. Such that we then > only have 1 copy of this magic (well, 2 because x86_64 also needs an > implementation of this of course). > > Changing ftrace over to this would be a little more work but it can > easily chain things a little to get its original context back: > > ENTRY(ftrace_regs_caller) > GLOBAL(ftrace_regs_func) > push ftrace_stub > push ftrace_regs_handler Note, ftrace_stub is dynamically modified to remove any indirect calls. 
> jmp call_to_exception_trampoline > END(ftrace_regs_caller) > > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *); > > struct ftrace_regs_stack { > ftrace_func_t func; > unsigned long parent_ip; > }; > > void ftrace_regs_handler(struct pr_regs *regs) > { > struct ftrace_regs_stack *st = (void *)regs->sp; > ftrace_func_t func = st->func; > > regs->sp += sizeof(long); /* pop func */ > > func(regs->ip, st->parent_ip, function_trace_op, regs); I try very hard to limit all indirect function calls from the function tracing path, as they do add noticeable overhead. -- Steve > } > > Hmm? I didn't look into the function_graph thing, but I imagine it can > be added without too much pain. > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 17:37 ` Steven Rostedt @ 2019-05-09 18:26 ` Peter Zijlstra 2019-05-09 18:36 ` Steven Rostedt 0 siblings, 1 reply; 36+ messages in thread From: Peter Zijlstra @ 2019-05-09 18:26 UTC (permalink / raw) To: Steven Rostedt Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, May 09, 2019 at 01:37:42PM -0400, Steven Rostedt wrote: > On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote: > > > > Right; I already fixed that in my patch changing i386's pt_regs. > > > > But what I'd love to do is something like the below patch, and make all > > the trampolines (very much including ftrace) use that. Such that we then > > only have 1 copy of this magic (well, 2 because x86_64 also needs an > > implementation of this of course). > > > > Changing ftrace over to this would be a little more work but it can > > easily chain things a little to get its original context back: > > > > ENTRY(ftrace_regs_caller) > > GLOBAL(ftrace_regs_func) > > push ftrace_stub > > push ftrace_regs_handler > > Note, ftrace_stub is dynamically modified to remove any indirect calls. Yeah, I realized that a few hours after I sent this out; as you might have seen by the IRC chatter on this. Still, maybe we can wrap the thing in a .macro and reuse things that way. Because I really hate there are at least 3 (x2 for x86_64) copies of this around. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations 2019-05-09 18:26 ` Peter Zijlstra @ 2019-05-09 18:36 ` Steven Rostedt 0 siblings, 0 replies; 36+ messages in thread From: Steven Rostedt @ 2019-05-09 18:36 UTC (permalink / raw) To: Peter Zijlstra Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest On Thu, 9 May 2019 20:26:22 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > Still, maybe we can wrap the thing in a .macro and reuse things that > way. Because I really hate there are at least 3 (x2 for x86_64) copies > of this around. I'm good with something like this. Having a single place that does the pt_regs saving would be great. Then I could separate ftrace_regs_caller from ftrace_caller, and simplify them both, as I would only need to make sure frame pointers still work for ftrace_caller, and ftrace_regs_caller would depend on the macro to make sure it works. -- Steve ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 3/4] x86/ftrace: Add pt_regs frame annotations 2019-05-08 7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra 2019-05-08 7:49 ` [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path Peter Zijlstra 2019-05-08 7:49 ` [PATCH 2/4] x86/kprobes: Fix frame pointer annotations Peter Zijlstra @ 2019-05-08 7:49 ` Peter Zijlstra 2019-05-08 7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra 3 siblings, 0 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-08 7:49 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu When CONFIG_FRAME_POINTER, we should mark pt_regs frames. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/kernel/ftrace_32.S | 6 ++++++ arch/x86/kernel/ftrace_64.S | 3 +++ 2 files changed, 9 insertions(+) --- a/arch/x86/kernel/ftrace_32.S +++ b/arch/x86/kernel/ftrace_32.S @@ -138,6 +138,12 @@ ENTRY(ftrace_regs_caller) #else movl 0x4(%ebp), %edx /* Load parent ip (2nd parameter) */ #endif + +#ifdef CONFIG_FRAME_POINTER + movl %esp, %ebp + andl $0x7fffffff, %ebp +#endif + movl function_trace_op, %ecx /* Save ftrace_pos in 3rd parameter */ pushl %esp /* Save pt_regs as 4th parameter */ --- a/arch/x86/kernel/ftrace_64.S +++ b/arch/x86/kernel/ftrace_64.S @@ -222,6 +222,9 @@ GLOBAL(ftrace_regs_caller_op_ptr) leaq MCOUNT_REG_SIZE+8*2(%rsp), %rcx movq %rcx, RSP(%rsp) +#ifdef CONFIG_FRAME_POINTER + leaq 1(%rsp), %rbp +#endif /* regs go into 4th parameter */ leaq (%rsp), %rcx ^ permalink raw reply [flat|nested] 36+ messages in thread
* [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs 2019-05-08 7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra ` (2 preceding siblings ...) 2019-05-08 7:49 ` [PATCH 3/4] x86/ftrace: Add pt_regs frame annotations Peter Zijlstra @ 2019-05-08 7:49 ` Peter Zijlstra 2019-05-08 11:57 ` Josh Poimboeuf 2019-05-08 20:58 ` Linus Torvalds 3 siblings, 2 replies; 36+ messages in thread From: Peter Zijlstra @ 2019-05-08 7:49 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu Currently pt_regs on x86_32 has an oddity in that kernel regs (!user_mode(regs)) are short two entries (esp/ss). This means that any code trying to use them (typically: regs->sp) needs to jump through some unfortunate hoops. Change the entry code to fix this up and create a full pt_regs frame. 
This then simplifies: - ftrace - kprobes - stack unwinder - ptrace - kdump - kgdb Hated-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/entry/entry_32.S | 105 ++++++++++++++++++++++++++++++++++---- arch/x86/include/asm/kexec.h | 17 ------ arch/x86/include/asm/ptrace.h | 17 ------ arch/x86/include/asm/stacktrace.h | 2 arch/x86/kernel/crash.c | 8 -- arch/x86/kernel/ftrace_32.S | 81 ++++++++++++++++------------- arch/x86/kernel/kgdb.c | 8 -- arch/x86/kernel/kprobes/common.h | 4 - arch/x86/kernel/kprobes/core.c | 29 ++++------ arch/x86/kernel/kprobes/opt.c | 20 ++++--- arch/x86/kernel/process_32.c | 16 +---- arch/x86/kernel/ptrace.c | 29 ---------- arch/x86/kernel/time.c | 3 - arch/x86/kernel/unwind_frame.c | 32 +---------- arch/x86/kernel/unwind_orc.c | 2 15 files changed, 181 insertions(+), 192 deletions(-) --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -202,9 +202,102 @@ .Lend_\@: .endm +#define CS_FROM_ENTRY_STACK (1 << 31) +#define CS_FROM_USER_CR3 (1 << 30) +#define CS_FROM_KERNEL (1 << 29) + +.macro FIXUP_FRAME + /* + * The high bits of the CS dword (__csh) are used for CS_FROM_*. + * Clear them in case hardware didn't do this for us. + */ + andl $0x0000ffff, 3*4(%esp) + +#ifdef CONFIG_VM86 + testl $X86_EFLAGS_VM, 4*4(%esp) + jnz .Lfrom_usermode_no_fixup_\@ +#endif + testl $SEGMENT_RPL_MASK, 3*4(%esp) + jnz .Lfrom_usermode_no_fixup_\@ + + orl $CS_FROM_KERNEL, 3*4(%esp) + + /* + * When we're here from kernel mode; the (exception) stack looks like: + * + * 5*4(%esp) - <previous context> + * 4*4(%esp) - flags + * 3*4(%esp) - cs + * 2*4(%esp) - ip + * 1*4(%esp) - orig_eax + * 0*4(%esp) - gs / function + * + * Lets build a 5 entry IRET frame after that, such that struct pt_regs + * is complete and in particular regs->sp is correct. 
This gives us + * the original 5 enties as gap: + * + * 12*4(%esp) - <previous context> + * 11*4(%esp) - gap / flags + * 10*4(%esp) - gap / cs + * 9*4(%esp) - gap / ip + * 8*4(%esp) - gap / orig_eax + * 7*4(%esp) - gap / gs / function + * 6*4(%esp) - ss + * 5*4(%esp) - sp + * 4*4(%esp) - flags + * 3*4(%esp) - cs + * 2*4(%esp) - ip + * 1*4(%esp) - orig_eax + * 0*4(%esp) - gs / function + */ + + pushl %ss # ss + pushl %esp # sp (points at ss) + addl $6*4, (%esp) # point sp back at the previous context + pushl 6*4(%esp) # flags + pushl 6*4(%esp) # cs + pushl 6*4(%esp) # ip + pushl 6*4(%esp) # orig_eax + pushl 6*4(%esp) # gs / function +.Lfrom_usermode_no_fixup_\@: +.endm + +.macro IRET_FRAME + testl $CS_FROM_KERNEL, 1*4(%esp) + jz .Lfinished_frame_\@ + + /* + * Reconstruct the 3 entry IRET frame right after the (modified) + * regs->sp without lowering %esp in between, such that an NMI in the + * middle doesn't scribble our stack. + */ + pushl %eax + pushl %ecx + movl 5*4(%esp), %eax # (modified) regs->sp + + movl 4*4(%esp), %ecx # flags + movl %ecx, -4(%eax) + + movl 3*4(%esp), %ecx # cs + andl $0x0000ffff, %ecx + movl %ecx, -8(%eax) + + movl 2*4(%esp), %ecx # ip + movl %ecx, -12(%eax) + + movl 1*4(%esp), %ecx # eax + movl %ecx, -16(%eax) + + popl %ecx + lea -16(%eax), %esp + popl %eax +.Lfinished_frame_\@: +.endm + .macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0 cld PUSH_GS + FIXUP_FRAME pushl %fs pushl %es pushl %ds @@ -374,9 +467,6 @@ * switch to it before we do any copying. */ -#define CS_FROM_ENTRY_STACK (1 << 31) -#define CS_FROM_USER_CR3 (1 << 30) - .macro SWITCH_TO_KERNEL_STACK ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_XENPV @@ -390,13 +480,6 @@ * that register for the time this macro runs */ - /* - * The high bits of the CS dword (__csh) are used for - * CS_FROM_ENTRY_STACK and CS_FROM_USER_CR3. Clear them in case - * hardware didn't do this for us. - */ - andl $(0x0000ffff), PT_CS(%esp) - /* Are we on the entry stack? Bail out if not! 
*/ movl PER_CPU_VAR(cpu_entry_area), %ecx addl $CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx @@ -1006,6 +1089,7 @@ ENTRY(entry_INT80_32) /* Restore user state */ RESTORE_REGS pop=4 # skip orig_eax/error_code .Lirq_return: + IRET_FRAME /* * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization * when returning from IPI handler and when returning from @@ -1356,6 +1440,7 @@ END(page_fault) common_exception: /* the function address is in %gs's slot on the stack */ + FIXUP_FRAME pushl %fs pushl %es pushl %ds --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -71,22 +71,6 @@ struct kimage; #define KEXEC_BACKUP_SRC_END (640 * 1024UL - 1) /* 640K */ /* - * CPU does not save ss and sp on stack if execution is already - * running in kernel mode at the time of NMI occurrence. This code - * fixes it. - */ -static inline void crash_fixup_ss_esp(struct pt_regs *newregs, - struct pt_regs *oldregs) -{ -#ifdef CONFIG_X86_32 - newregs->sp = (unsigned long)&(oldregs->sp); - asm volatile("xorl %%eax, %%eax\n\t" - "movw %%ss, %%ax\n\t" - :"=a"(newregs->ss)); -#endif -} - -/* * This function is responsible for capturing register states if coming * via panic otherwise just fix up the ss and sp if coming via kernel * mode exception. 
@@ -96,7 +80,6 @@ static inline void crash_setup_regs(stru { if (oldregs) { memcpy(newregs, oldregs, sizeof(*newregs)); - crash_fixup_ss_esp(newregs, oldregs); } else { #ifdef CONFIG_X86_32 asm volatile("movl %%ebx,%0" : "=m"(newregs->bx)); --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -166,14 +166,10 @@ static inline bool user_64bit_mode(struc #define compat_user_stack_pointer() current_pt_regs()->sp #endif -#ifdef CONFIG_X86_32 -extern unsigned long kernel_stack_pointer(struct pt_regs *regs); -#else static inline unsigned long kernel_stack_pointer(struct pt_regs *regs) { return regs->sp; } -#endif #define GET_IP(regs) ((regs)->ip) #define GET_FP(regs) ((regs)->bp) @@ -201,14 +197,6 @@ static inline unsigned long regs_get_reg if (unlikely(offset > MAX_REG_OFFSET)) return 0; #ifdef CONFIG_X86_32 - /* - * Traps from the kernel do not save sp and ss. - * Use the helper function to retrieve sp. - */ - if (offset == offsetof(struct pt_regs, sp) && - regs->cs == __KERNEL_CS) - return kernel_stack_pointer(regs); - /* The selector fields are 16-bit. 
*/ if (offset == offsetof(struct pt_regs, cs) || offset == offsetof(struct pt_regs, ss) || @@ -234,8 +222,7 @@ static inline unsigned long regs_get_reg static inline int regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr) { - return ((addr & ~(THREAD_SIZE - 1)) == - (kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1))); + return ((addr & ~(THREAD_SIZE - 1)) == (regs->sp & ~(THREAD_SIZE - 1))); } /** @@ -249,7 +236,7 @@ static inline int regs_within_kernel_sta */ static inline unsigned long *regs_get_kernel_stack_nth_addr(struct pt_regs *regs, unsigned int n) { - unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs); + unsigned long *addr = (unsigned long *)regs->sp; addr += n; if (regs_within_kernel_stack(regs, (unsigned long)addr)) --- a/arch/x86/include/asm/stacktrace.h +++ b/arch/x86/include/asm/stacktrace.h @@ -78,7 +78,7 @@ static inline unsigned long * get_stack_pointer(struct task_struct *task, struct pt_regs *regs) { if (regs) - return (unsigned long *)kernel_stack_pointer(regs); + return (unsigned long *)regs->sp; if (task == current) return __builtin_frame_address(0); --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -72,14 +72,6 @@ static inline void cpu_crash_vmclear_loa static void kdump_nmi_callback(int cpu, struct pt_regs *regs) { -#ifdef CONFIG_X86_32 - struct pt_regs fixed_regs; - - if (!user_mode(regs)) { - crash_fixup_ss_esp(&fixed_regs, regs); - regs = &fixed_regs; - } -#endif crash_save_cpu(regs, cpu); /* --- a/arch/x86/kernel/ftrace_32.S +++ b/arch/x86/kernel/ftrace_32.S @@ -9,6 +9,7 @@ #include <asm/export.h> #include <asm/ftrace.h> #include <asm/nospec-branch.h> +#include <asm/asm-offsets.h> #ifdef CC_USING_FENTRY # define function_hook __fentry__ #endif @@ -104,26 +105,38 @@ END(ftrace_caller) ENTRY(ftrace_regs_caller) /* - * i386 does not save SS and ESP when coming from kernel. - * Instead, to get sp, &regs->sp is used (see ptrace.h). 
- * Unfortunately, that means eflags must be at the same location - * as the current return ip is. We move the return ip into the - * regs->ip location, and move flags into the return ip location. + * We're here from an mcount/fentry CALL, and the stack frame looks like: + * + * <previous context> + * RET-IP + * + * The purpose of this function is to call out in an emulated INT3 + * environment with a stack frame like: + * + * <previous context> + * gap / RET-IP + * gap + * gap + * gap + * pt_regs + * + * We do _NOT_ restore: ss, flags, cs, gs, fs, es, ds */ - pushl $__KERNEL_CS - pushl 4(%esp) /* Save the return ip */ - pushl $0 /* Load 0 into orig_ax */ + subl $3*4, %esp # RET-IP + 3 gaps + pushl %ss # ss + pushl %esp # points at ss + addl $5*4, (%esp) # make it point at <previous context> + pushfl # flags + pushl $__KERNEL_CS # cs + pushl 7*4(%esp) # ip <- RET-IP + pushl $0 # orig_eax + pushl %gs pushl %fs pushl %es pushl %ds - pushl %eax - - /* Get flags and place them into the return ip slot */ - pushf - popl %eax - movl %eax, 8*4(%esp) + pushl %eax pushl %ebp pushl %edi pushl %esi @@ -131,12 +144,13 @@ ENTRY(ftrace_regs_caller) pushl %ecx pushl %ebx - movl 12*4(%esp), %eax /* Load ip (1st parameter) */ - subl $MCOUNT_INSN_SIZE, %eax /* Adjust ip */ + movl PT_EIP(%esp), %eax # 1st argument: IP + subl $MCOUNT_INSN_SIZE, %eax + #ifdef CC_USING_FENTRY - movl 15*4(%esp), %edx /* Load parent ip (2nd parameter) */ + movl 21*4(%esp), %edx # 2nd argument: parent ip #else - movl 0x4(%ebp), %edx /* Load parent ip (2nd parameter) */ + movl 1*4(%ebp), %edx # 2nd argument: parent ip #endif #ifdef CONFIG_FRAME_POINTER @@ -144,21 +158,22 @@ ENTRY(ftrace_regs_caller) andl $0x7fffffff, %ebp #endif - movl function_trace_op, %ecx /* Save ftrace_pos in 3rd parameter */ - pushl %esp /* Save pt_regs as 4th parameter */ + movl function_trace_op, %ecx # 3rd argument: ftrace_pos + pushl %esp # 4th argument: pt_regs GLOBAL(ftrace_regs_call) call ftrace_stub - addl $4, %esp /* Skip 
pt_regs */ + addl $4, %esp # skip 4th argument - /* restore flags */ - push 14*4(%esp) - popf - - /* Move return ip back to its original location */ - movl 12*4(%esp), %eax - movl %eax, 14*4(%esp) + /* place IP below the new SP */ + movl PT_OLDESP(%esp), %eax + movl PT_EIP(%esp), %ecx + movl %ecx, -4(%eax) + + /* place EAX below that */ + movl PT_EAX(%esp), %ecx + movl %ecx, -8(%eax) popl %ebx popl %ecx @@ -166,16 +181,12 @@ GLOBAL(ftrace_regs_call) popl %esi popl %edi popl %ebp - popl %eax - popl %ds - popl %es - popl %fs - popl %gs - /* use lea to not affect flags */ - lea 3*4(%esp), %esp /* Skip orig_ax, ip and cs */ + lea -8(%eax), %esp + popl %eax jmp .Lftrace_ret + #else /* ! CONFIG_DYNAMIC_FTRACE */ ENTRY(function_hook) --- a/arch/x86/kernel/kgdb.c +++ b/arch/x86/kernel/kgdb.c @@ -127,14 +127,6 @@ char *dbg_get_reg(int regno, void *mem, #ifdef CONFIG_X86_32 switch (regno) { - case GDB_SS: - if (!user_mode(regs)) - *(unsigned long *)mem = __KERNEL_DS; - break; - case GDB_SP: - if (!user_mode(regs)) - *(unsigned long *)mem = kernel_stack_pointer(regs); - break; case GDB_GS: case GDB_FS: *(unsigned long *)mem = 0xFFFF; --- a/arch/x86/kernel/kprobes/common.h +++ b/arch/x86/kernel/kprobes/common.h @@ -86,8 +86,8 @@ " popl %edi\n" \ " popl %ebp\n" \ " popl %eax\n" \ - /* Skip ds, es, fs, gs, orig_ax, and ip. Note: don't pop cs here*/\ - " addl $24, %esp\n" + /* Skip ds, es, fs, gs, orig_ax, ip, and cs. 
*/\ + " addl $7*4, %esp\n" #endif /* Ensure if the instruction can be boostable */ --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -69,7 +69,7 @@ DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL; DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); -#define stack_addr(regs) ((unsigned long *)kernel_stack_pointer(regs)) +#define stack_addr(regs) ((unsigned long *)regs->sp) #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\ (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \ @@ -731,29 +731,27 @@ asm( ".global kretprobe_trampoline\n" ".type kretprobe_trampoline, @function\n" "kretprobe_trampoline:\n" -#ifdef CONFIG_X86_64 /* We don't bother saving the ss register */ +#ifdef CONFIG_X86_64 " pushq %rsp\n" " pushfq\n" SAVE_REGS_STRING " movq %rsp, %rdi\n" " call trampoline_handler\n" /* Replace saved sp with true return address. */ - " movq %rax, 152(%rsp)\n" + " movq %rax, 19*8(%rsp)\n" RESTORE_REGS_STRING " popfq\n" #else - " pushf\n" + " pushl %esp\n" + " pushfl\n" SAVE_REGS_STRING " movl %esp, %eax\n" " call trampoline_handler\n" - /* Move flags to cs */ - " movl 56(%esp), %edx\n" - " movl %edx, 52(%esp)\n" - /* Replace saved flags with true return address. */ - " movl %eax, 56(%esp)\n" + /* Replace saved sp with true return address. */ + " movl %eax, 15*4(%esp)\n" RESTORE_REGS_STRING - " popf\n" + " popfl\n" #endif " ret\n" ".size kretprobe_trampoline, .-kretprobe_trampoline\n" @@ -794,16 +792,13 @@ static __used void *trampoline_handler(s INIT_HLIST_HEAD(&empty_rp); kretprobe_hash_lock(current, &head, &flags); /* fixup registers */ -#ifdef CONFIG_X86_64 regs->cs = __KERNEL_CS; - /* On x86-64, we use pt_regs->sp for return address holder. */ - frame_pointer = &regs->sp; -#else - regs->cs = __KERNEL_CS | get_kernel_rpl(); +#ifdef CONFIG_X86_32 + regs->cs |= get_kernel_rpl(); regs->gs = 0; - /* On x86-32, we use pt_regs->flags for return address holder. 
*/ - frame_pointer = &regs->flags; #endif + /* We use pt_regs->sp for return address holder. */ + frame_pointer = &regs->sp; regs->ip = trampoline_address; regs->orig_ax = ~0UL; --- a/arch/x86/kernel/kprobes/opt.c +++ b/arch/x86/kernel/kprobes/opt.c @@ -115,14 +115,15 @@ asm ( "optprobe_template_call:\n" ASM_NOP5 /* Move flags to rsp */ - " movq 144(%rsp), %rdx\n" - " movq %rdx, 152(%rsp)\n" + " movq 18*8(%rsp), %rdx\n" + " movq %rdx, 19*8(%rsp)\n" RESTORE_REGS_STRING /* Skip flags entry */ " addq $8, %rsp\n" " popfq\n" #else /* CONFIG_X86_32 */ - " pushf\n" + " pushl %esp\n" + " pushfl\n" SAVE_REGS_STRING " movl %esp, %edx\n" ".global optprobe_template_val\n" @@ -131,9 +132,13 @@ asm ( ".global optprobe_template_call\n" "optprobe_template_call:\n" ASM_NOP5 + /* Move flags into esp */ + " movl 14*4(%esp), %edx\n" + " movl %edx, 15*4(%esp)\n" RESTORE_REGS_STRING - " addl $4, %esp\n" /* skip cs */ - " popf\n" + /* Skip flags entry */ + " addl $4, %esp\n" + " popfl\n" #endif ".global optprobe_template_end\n" "optprobe_template_end:\n" @@ -165,10 +170,9 @@ optimized_callback(struct optimized_kpro } else { struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); /* Save skipped registers */ -#ifdef CONFIG_X86_64 regs->cs = __KERNEL_CS; -#else - regs->cs = __KERNEL_CS | get_kernel_rpl(); +#ifdef CONFIG_X86_32 + regs->cs |= get_kernel_rpl(); regs->gs = 0; #endif regs->ip = (unsigned long)op->kp.addr + INT3_SIZE; --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -62,27 +62,21 @@ void __show_regs(struct pt_regs *regs, e { unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L; unsigned long d0, d1, d2, d3, d6, d7; - unsigned long sp; - unsigned short ss, gs; + unsigned short gs; - if (user_mode(regs)) { - sp = regs->sp; - ss = regs->ss; + if (user_mode(regs)) gs = get_user_gs(regs); - } else { - sp = kernel_stack_pointer(regs); - savesegment(ss, ss); + else savesegment(gs, gs); - } show_ip(regs, KERN_DEFAULT); printk(KERN_DEFAULT "EAX: %08lx EBX: %08lx ECX: %08lx 
EDX: %08lx\n", regs->ax, regs->bx, regs->cx, regs->dx); printk(KERN_DEFAULT "ESI: %08lx EDI: %08lx EBP: %08lx ESP: %08lx\n", - regs->si, regs->di, regs->bp, sp); + regs->si, regs->di, regs->bp, regs->sp); printk(KERN_DEFAULT "DS: %04x ES: %04x FS: %04x GS: %04x SS: %04x EFLAGS: %08lx\n", - (u16)regs->ds, (u16)regs->es, (u16)regs->fs, gs, ss, regs->flags); + (u16)regs->ds, (u16)regs->es, (u16)regs->fs, gs, regs->ss, regs->flags); if (mode != SHOW_REGS_ALL) return; --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -153,35 +153,6 @@ static inline bool invalid_selector(u16 #define FLAG_MASK FLAG_MASK_32 -/* - * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode - * when it traps. The previous stack will be directly underneath the saved - * registers, and 'sp/ss' won't even have been saved. Thus the '&regs->sp'. - * - * Now, if the stack is empty, '&regs->sp' is out of range. In this - * case we try to take the previous stack. To always return a non-null - * stack pointer we fall back to regs as stack if no previous stack - * exists. - * - * This is valid only for kernel mode traps. 
- */ -unsigned long kernel_stack_pointer(struct pt_regs *regs) -{ - unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1); - unsigned long sp = (unsigned long)&regs->sp; - u32 *prev_esp; - - if (context == (sp & ~(THREAD_SIZE - 1))) - return sp; - - prev_esp = (u32 *)(context); - if (*prev_esp) - return (unsigned long)*prev_esp; - - return (unsigned long)regs; -} -EXPORT_SYMBOL_GPL(kernel_stack_pointer); - static unsigned long *pt_regs_access(struct pt_regs *regs, unsigned long regno) { BUILD_BUG_ON(offsetof(struct pt_regs, bx) != 0); --- a/arch/x86/kernel/time.c +++ b/arch/x86/kernel/time.c @@ -37,8 +37,7 @@ unsigned long profile_pc(struct pt_regs #ifdef CONFIG_FRAME_POINTER return *(unsigned long *)(regs->bp + sizeof(long)); #else - unsigned long *sp = - (unsigned long *)kernel_stack_pointer(regs); + unsigned long *sp = (unsigned long *)regs->sp; /* * Return address is either directly at stack pointer * or above a saved flags. Eflags has bits 22-31 zero, --- a/arch/x86/kernel/unwind_frame.c +++ b/arch/x86/kernel/unwind_frame.c @@ -69,15 +69,6 @@ static void unwind_dump(struct unwind_st } } -static size_t regs_size(struct pt_regs *regs) -{ - /* x86_32 regs from kernel mode are two words shorter: */ - if (IS_ENABLED(CONFIG_X86_32) && !user_mode(regs)) - return sizeof(*regs) - 2*sizeof(long); - - return sizeof(*regs); -} - static bool in_entry_code(unsigned long ip) { char *addr = (char *)ip; @@ -197,12 +188,6 @@ static struct pt_regs *decode_frame_poin } #endif -#ifdef CONFIG_X86_32 -#define KERNEL_REGS_SIZE (sizeof(struct pt_regs) - 2*sizeof(long)) -#else -#define KERNEL_REGS_SIZE (sizeof(struct pt_regs)) -#endif - static bool update_stack_state(struct unwind_state *state, unsigned long *next_bp) { @@ -213,7 +198,7 @@ static bool update_stack_state(struct un size_t len; if (state->regs) - prev_frame_end = (void *)state->regs + regs_size(state->regs); + prev_frame_end = (void *)state->regs + sizeof(*state->regs); else prev_frame_end = (void *)state->bp + 
FRAME_HEADER_SIZE; @@ -221,7 +206,7 @@ static bool update_stack_state(struct un regs = decode_frame_pointer(next_bp); if (regs) { frame = (unsigned long *)regs; - len = KERNEL_REGS_SIZE; + len = sizeof(*regs); state->got_irq = true; } else { frame = next_bp; @@ -245,14 +230,6 @@ static bool update_stack_state(struct un frame < prev_frame_end) return false; - /* - * On 32-bit with user mode regs, make sure the last two regs are safe - * to access: - */ - if (IS_ENABLED(CONFIG_X86_32) && regs && user_mode(regs) && - !on_stack(info, frame, len + 2*sizeof(long))) - return false; - /* Move state to the next frame: */ if (regs) { state->regs = regs; @@ -411,10 +388,9 @@ void __unwind_start(struct unwind_state * Pretend that the frame is complete and that BP points to it, but save * the real BP so that we can use it when looking for the next frame. */ - if (regs && regs->ip == 0 && - (unsigned long *)kernel_stack_pointer(regs) >= first_frame) { + if (regs && regs->ip == 0 && (unsigned long *)regs->sp >= first_frame) { state->next_bp = bp; - bp = ((unsigned long *)kernel_stack_pointer(regs)) - 1; + bp = ((unsigned long *)regs->sp) - 1; } /* Initialize stack info and make sure the frame data is accessible: */ --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -579,7 +579,7 @@ void __unwind_start(struct unwind_state goto done; state->ip = regs->ip; - state->sp = kernel_stack_pointer(regs); + state->sp = regs->sp; state->bp = regs->bp; state->regs = regs; state->full_regs = true; ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs 2019-05-08 7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra @ 2019-05-08 11:57 ` Josh Poimboeuf 2019-05-08 12:46 ` Ingo Molnar 2019-05-08 20:58 ` Linus Torvalds 1 sibling, 1 reply; 36+ messages in thread From: Josh Poimboeuf @ 2019-05-08 11:57 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu On Wed, May 08, 2019 at 09:49:05AM +0200, Peter Zijlstra wrote: > Currently pt_regs on x86_32 has an oddity in that kernel regs > (!user_mode(regs)) are short two entries (esp/ss). This means that any > code trying to use them (typically: regs->sp) needs to jump through > some unfortunate hoops. > > Change the entry code to fix this up and create a full pt_regs frame. 
> > This then simplifies: > > - ftrace > - kprobes > - stack unwinder > - ptrace > - kdump > - kgdb > > Hated-by: Linus Torvalds <torvalds@linux-foundation.org> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > arch/x86/entry/entry_32.S | 105 ++++++++++++++++++++++++++++++++++---- > arch/x86/include/asm/kexec.h | 17 ------ > arch/x86/include/asm/ptrace.h | 17 ------ > arch/x86/include/asm/stacktrace.h | 2 > arch/x86/kernel/crash.c | 8 -- > arch/x86/kernel/ftrace_32.S | 81 ++++++++++++++++------------- > arch/x86/kernel/kgdb.c | 8 -- > arch/x86/kernel/kprobes/common.h | 4 - > arch/x86/kernel/kprobes/core.c | 29 ++++------ > arch/x86/kernel/kprobes/opt.c | 20 ++++--- > arch/x86/kernel/process_32.c | 16 +---- > arch/x86/kernel/ptrace.c | 29 ---------- > arch/x86/kernel/time.c | 3 - > arch/x86/kernel/unwind_frame.c | 32 +---------- > arch/x86/kernel/unwind_orc.c | 2 > 15 files changed, 181 insertions(+), 192 deletions(-) Very nice diffstat. This moves all the pain to the 32-bit entry code where it belongs. -- Josh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs 2019-05-08 11:57 ` Josh Poimboeuf @ 2019-05-08 12:46 ` Ingo Molnar 0 siblings, 0 replies; 36+ messages in thread From: Ingo Molnar @ 2019-05-08 12:46 UTC (permalink / raw) To: Josh Poimboeuf Cc: Peter Zijlstra, linux-kernel, Linus Torvalds, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > On Wed, May 08, 2019 at 09:49:05AM +0200, Peter Zijlstra wrote: > > Currently pt_regs on x86_32 has an oddity in that kernel regs > > (!user_mode(regs)) are short two entries (esp/ss). This means that any > > code trying to use them (typically: regs->sp) needs to jump through > > some unfortunate hoops. > > > > Change the entry code to fix this up and create a full pt_regs frame. 
> > > > This then simplifies: > > > > - ftrace > > - kprobes > > - stack unwinder > > - ptrace > > - kdump > > - kgdb > > > > Hated-by: Linus Torvalds <torvalds@linux-foundation.org> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > --- > > arch/x86/entry/entry_32.S | 105 ++++++++++++++++++++++++++++++++++---- > > arch/x86/include/asm/kexec.h | 17 ------ > > arch/x86/include/asm/ptrace.h | 17 ------ > > arch/x86/include/asm/stacktrace.h | 2 > > arch/x86/kernel/crash.c | 8 -- > > arch/x86/kernel/ftrace_32.S | 81 ++++++++++++++++------------- > > arch/x86/kernel/kgdb.c | 8 -- > > arch/x86/kernel/kprobes/common.h | 4 - > > arch/x86/kernel/kprobes/core.c | 29 ++++------ > > arch/x86/kernel/kprobes/opt.c | 20 ++++--- > > arch/x86/kernel/process_32.c | 16 +---- > > arch/x86/kernel/ptrace.c | 29 ---------- > > arch/x86/kernel/time.c | 3 - > > arch/x86/kernel/unwind_frame.c | 32 +---------- > > arch/x86/kernel/unwind_orc.c | 2 > > 15 files changed, 181 insertions(+), 192 deletions(-) > > Very nice diffstat. This moves all the pain to the 32-bit entry code > where it belongs. Yes, that's very convincing and looks like a nice simplification of 32-bit x86 entry concepts overall. Assuming it's correct, the question is whether Linus still hates it? :-) Thanks, Ingo ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs 2019-05-08 7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra 2019-05-08 11:57 ` Josh Poimboeuf @ 2019-05-08 20:58 ` Linus Torvalds 1 sibling, 0 replies; 36+ messages in thread From: Linus Torvalds @ 2019-05-08 20:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel, open list:KERNEL SELFTEST FRAMEWORK, Masami Hiramatsu On Wed, May 8, 2019 at 1:12 AM Peter Zijlstra <peterz@infradead.org> wrote: > > Hated-by: Linus Torvalds <torvalds@linux-foundation.org> I can live with this cleaned-up version. I'm still not loving it, but at least it now tries very hard to simplify other code. You can remove the hated-by. Linus ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2019-05-13 8:15 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-05-08 7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra 2019-05-08 7:49 ` [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path Peter Zijlstra 2019-05-08 7:49 ` [PATCH 2/4] x86/kprobes: Fix frame pointer annotations Peter Zijlstra 2019-05-08 11:54 ` Josh Poimboeuf 2019-05-08 12:04 ` Peter Zijlstra 2019-05-08 12:40 ` Peter Zijlstra 2019-05-08 12:42 ` Josh Poimboeuf 2019-05-08 15:39 ` Peter Zijlstra 2019-05-08 18:48 ` Josh Poimboeuf 2019-05-09 1:20 ` Masami Hiramatsu 2019-05-09 8:14 ` Peter Zijlstra 2019-05-09 9:27 ` Peter Zijlstra 2019-05-09 14:00 ` Josh Poimboeuf 2019-05-09 14:01 ` Masami Hiramatsu 2019-05-09 17:14 ` Peter Zijlstra 2019-05-10 4:58 ` Masami Hiramatsu 2019-05-10 12:31 ` Peter Zijlstra 2019-05-11 0:52 ` Masami Hiramatsu 2019-05-10 12:40 ` Peter Zijlstra 2019-05-11 0:56 ` Masami Hiramatsu 2019-05-13 8:15 ` Peter Zijlstra 2019-05-09 16:20 ` Andy Lutomirski 2019-05-09 17:18 ` Peter Zijlstra 2019-05-09 17:43 ` Steven Rostedt 2019-05-10 3:21 ` Masami Hiramatsu 2019-05-10 12:14 ` Peter Zijlstra 2019-05-10 12:17 ` Peter Zijlstra 2019-05-10 14:54 ` Steven Rostedt 2019-05-09 17:37 ` Steven Rostedt 2019-05-09 18:26 ` Peter Zijlstra 2019-05-09 18:36 ` Steven Rostedt 2019-05-08 7:49 ` [PATCH 3/4] x86/ftrace: Add pt_regs frame annotations Peter Zijlstra 2019-05-08 7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra 2019-05-08 11:57 ` Josh Poimboeuf 2019-05-08 12:46 ` Ingo Molnar 2019-05-08 20:58 ` Linus Torvalds