linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] x86: int3 fallout
@ 2019-05-08  7:49 Peter Zijlstra
  2019-05-08  7:49 ` [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-08  7:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest, Masami Hiramatsu

So the first 3 patches are cleanups and fixes and should probably just go in.

The last patch, however, is the one Linus hates: it converts i386 to always have
a complete pt_regs. Both Josh and I did a bunch of cleanups, and the patch is
now a net reduction in lines.

I still think it is the right thing to do; it removes a whole lot of magic from
the C code (and avoids future surprises), but if this cannot convince Linus,
I'll not pursue this further.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path
  2019-05-08  7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra
@ 2019-05-08  7:49 ` Peter Zijlstra
  2019-05-08  7:49 ` [PATCH 2/4] x86/kprobes: Fix frame pointer annotations Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-08  7:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest, Masami Hiramatsu

The code flow around the return-from-interrupt preemption point seems
needlessly complicated.

There is only one site jumping to resume_kernel, and none (outside of
resume_kernel) jumping to restore_all_kernel. Inline resume_kernel
into restore_all_kernel and avoid the CONFIG_PREEMPT-dependent label.
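
For reference, the condition the now-inlined code evaluates before calling
preempt_schedule_irq can be sketched in user-space C as follows; the helper
name and parameters are assumptions for illustration, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Interrupt-enable flag in the saved EFLAGS (bit 9). */
#define X86_EFLAGS_IF (1u << 9)

/*
 * Mirror of the asm checks: preemption only happens when the per-CPU
 * preempt count is zero AND the interrupted context had interrupts on
 * (otherwise we arrived via an exception path and must not reschedule).
 */
static bool should_preempt(unsigned int preempt_count,
			   unsigned long saved_eflags)
{
	if (preempt_count != 0)
		return false;
	if (!(saved_eflags & X86_EFLAGS_IF))
		return false;
	return true;	/* the asm would call preempt_schedule_irq() here */
}
```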

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_32.S |   24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -67,7 +67,6 @@
 # define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
 # define preempt_stop(clobbers)
-# define resume_kernel		restore_all_kernel
 #endif
 
 .macro TRACE_IRQS_IRET
@@ -755,7 +754,7 @@ END(ret_from_fork)
 	andl	$SEGMENT_RPL_MASK, %eax
 #endif
 	cmpl	$USER_RPL, %eax
-	jb	resume_kernel			# not returning to v8086 or userspace
+	jb	restore_all_kernel		# not returning to v8086 or userspace
 
 ENTRY(resume_userspace)
 	DISABLE_INTERRUPTS(CLBR_ANY)
@@ -765,18 +764,6 @@ ENTRY(resume_userspace)
 	jmp	restore_all
 END(ret_from_exception)
 
-#ifdef CONFIG_PREEMPT
-ENTRY(resume_kernel)
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	cmpl	$0, PER_CPU_VAR(__preempt_count)
-	jnz	restore_all_kernel
-	testl	$X86_EFLAGS_IF, PT_EFLAGS(%esp)	# interrupts off (exception path) ?
-	jz	restore_all_kernel
-	call	preempt_schedule_irq
-	jmp	restore_all_kernel
-END(resume_kernel)
-#endif
-
 GLOBAL(__begin_SYSENTER_singlestep_region)
 /*
  * All code from here through __end_SYSENTER_singlestep_region is subject
@@ -1027,6 +1014,15 @@ ENTRY(entry_INT80_32)
 	INTERRUPT_RETURN
 
 restore_all_kernel:
+#ifdef CONFIG_PREEMPT
+	DISABLE_INTERRUPTS(CLBR_ANY)
+	cmpl	$0, PER_CPU_VAR(__preempt_count)
+	jnz	.Lno_preempt
+	testl	$X86_EFLAGS_IF, PT_EFLAGS(%esp)	# interrupts off (exception path) ?
+	jz	.Lno_preempt
+	call	preempt_schedule_irq
+.Lno_preempt:
+#endif
 	TRACE_IRQS_IRET
 	PARANOID_EXIT_TO_KERNEL_MODE
 	BUG_IF_WRONG_CR3




* [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-08  7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra
  2019-05-08  7:49 ` [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path Peter Zijlstra
@ 2019-05-08  7:49 ` Peter Zijlstra
  2019-05-08 11:54   ` Josh Poimboeuf
  2019-05-08  7:49 ` [PATCH 3/4] x86/ftrace: Add pt_regs frame annotations Peter Zijlstra
  2019-05-08  7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra
  3 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-08  7:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest, Masami Hiramatsu

The kprobe trampolines have a FRAME_POINTER annotation that makes no
sense. It marks the frame in the middle of pt_regs, at the place of
saving BP.

Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER
from the respective entry_*.S.
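
The encoding adopted here can be illustrated with a small user-space sketch
(helper names are invented for illustration). On x86_64 the patch sets
rbp = rsp + 1, so an unwinder can recognize a pt_regs frame by the normally
impossible set low bit; on x86_32 it clears the top bit, which is always set
for 32-bit kernel addresses:

```c
#include <assert.h>
#include <stdint.h>

/* x86_64: leaq 1(%rsp), %rbp -- tag the frame via the low bit. */
static uint64_t encode_fp_64(uint64_t sp)  { return sp + 1; }
static int      is_regs_fp_64(uint64_t bp) { return bp & 1; }
static uint64_t decode_fp_64(uint64_t bp)  { return bp & ~1ull; }

/* x86_32: andl $0x7fffffff, %ebp -- tag the frame via the cleared top bit. */
static uint32_t encode_fp_32(uint32_t sp)  { return sp & 0x7fffffff; }
static int      is_regs_fp_32(uint32_t bp) { return !(bp & 0x80000000u); }
static uint32_t decode_fp_32(uint32_t bp)  { return bp | 0x80000000u; }
```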

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/kprobes/common.h |   32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

--- a/arch/x86/kernel/kprobes/common.h
+++ b/arch/x86/kernel/kprobes/common.h
@@ -6,14 +6,15 @@
 
 #include <asm/asm.h>
 
+#ifdef CONFIG_X86_64
+
 #ifdef CONFIG_FRAME_POINTER
-# define SAVE_RBP_STRING "	push %" _ASM_BP "\n" \
-			 "	mov  %" _ASM_SP ", %" _ASM_BP "\n"
+#define ENCODE_FRAME_POINTER			\
+	"	leaq 1(%rsp), %rbp\n"
 #else
-# define SAVE_RBP_STRING "	push %" _ASM_BP "\n"
+#define ENCODE_FRAME_POINTER
 #endif
 
-#ifdef CONFIG_X86_64
 #define SAVE_REGS_STRING			\
 	/* Skip cs, ip, orig_ax. */		\
 	"	subq $24, %rsp\n"		\
@@ -27,11 +28,13 @@
 	"	pushq %r10\n"			\
 	"	pushq %r11\n"			\
 	"	pushq %rbx\n"			\
-	SAVE_RBP_STRING				\
+	"	pushq %rbp\n"			\
 	"	pushq %r12\n"			\
 	"	pushq %r13\n"			\
 	"	pushq %r14\n"			\
-	"	pushq %r15\n"
+	"	pushq %r15\n"			\
+	ENCODE_FRAME_POINTER
+
 #define RESTORE_REGS_STRING			\
 	"	popq %r15\n"			\
 	"	popq %r14\n"			\
@@ -51,19 +54,30 @@
 	/* Skip orig_ax, ip, cs */		\
 	"	addq $24, %rsp\n"
 #else
+
+#ifdef CONFIG_FRAME_POINTER
+#define ENCODE_FRAME_POINTER			\
+	"	movl %esp, %ebp\n"		\
+	"	andl $0x7fffffff, %ebp\n"
+#else
+#define ENCODE_FRAME_POINTER
+#endif
+
 #define SAVE_REGS_STRING			\
 	/* Skip cs, ip, orig_ax and gs. */	\
-	"	subl $16, %esp\n"		\
+	"	subl $4*4, %esp\n"		\
 	"	pushl %fs\n"			\
 	"	pushl %es\n"			\
 	"	pushl %ds\n"			\
 	"	pushl %eax\n"			\
-	SAVE_RBP_STRING				\
+	"	pushl %ebp\n"			\
 	"	pushl %edi\n"			\
 	"	pushl %esi\n"			\
 	"	pushl %edx\n"			\
 	"	pushl %ecx\n"			\
-	"	pushl %ebx\n"
+	"	pushl %ebx\n"			\
+	ENCODE_FRAME_POINTER
+
 #define RESTORE_REGS_STRING			\
 	"	popl %ebx\n"			\
 	"	popl %ecx\n"			\




* [PATCH 3/4] x86/ftrace: Add pt_regs frame annotations
  2019-05-08  7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra
  2019-05-08  7:49 ` [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path Peter Zijlstra
  2019-05-08  7:49 ` [PATCH 2/4] x86/kprobes: Fix frame pointer annotations Peter Zijlstra
@ 2019-05-08  7:49 ` Peter Zijlstra
  2019-05-08  7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra
  3 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-08  7:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest, Masami Hiramatsu

When CONFIG_FRAME_POINTER is enabled, we should mark pt_regs frames.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/ftrace_32.S |    6 ++++++
 arch/x86/kernel/ftrace_64.S |    3 +++
 2 files changed, 9 insertions(+)

--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -138,6 +138,12 @@ ENTRY(ftrace_regs_caller)
 #else
 	movl	0x4(%ebp), %edx			/* Load parent ip (2nd parameter) */
 #endif
+
+#ifdef CONFIG_FRAME_POINTER
+	movl	%esp, %ebp
+	andl	$0x7fffffff, %ebp
+#endif
+
 	movl	function_trace_op, %ecx		/* Save ftrace_pos in 3rd parameter */
 	pushl	%esp				/* Save pt_regs as 4th parameter */
 
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -222,6 +222,9 @@ GLOBAL(ftrace_regs_caller_op_ptr)
 	leaq MCOUNT_REG_SIZE+8*2(%rsp), %rcx
 	movq %rcx, RSP(%rsp)
 
+#ifdef CONFIG_FRAME_POINTER
+	leaq 1(%rsp), %rbp
+#endif
 	/* regs go into 4th parameter */
 	leaq (%rsp), %rcx
 




* [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs
  2019-05-08  7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra
                   ` (2 preceding siblings ...)
  2019-05-08  7:49 ` [PATCH 3/4] x86/ftrace: Add pt_regs frame annotations Peter Zijlstra
@ 2019-05-08  7:49 ` Peter Zijlstra
  2019-05-08 11:57   ` Josh Poimboeuf
  2019-05-08 20:58   ` Linus Torvalds
  3 siblings, 2 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-08  7:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest, Masami Hiramatsu

Currently pt_regs on x86_32 has an oddity in that kernel regs
(!user_mode(regs)) are short two entries (esp/ss). This means that any
code trying to use them (typically: regs->sp) needs to jump through
some unfortunate hoops.

Change the entry code to fix this up and create a full pt_regs frame.

This then simplifies:

  - ftrace
  - kprobes
  - stack unwinder
  - ptrace
  - kdump
  - kgdb
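
The hoops in question can be sketched in user-space C with a mock struct (a
simplification; the real pt_regs has many more fields): before this patch,
kernel-mode frames stopped just before sp/ss, so readers had to treat the
address of the missing sp slot as the previous stack pointer; afterwards
regs->sp is simply valid:

```c
#include <assert.h>

struct mock_pt_regs {
	unsigned long ip, cs, flags, sp, ss;	/* tail of the real layout */
};

/* Old x86_32 trick: the interrupted stack began where sp would have been. */
static unsigned long old_kernel_stack_pointer(struct mock_pt_regs *regs)
{
	return (unsigned long)&regs->sp;
}

/* With a consistent pt_regs, no special case is needed. */
static unsigned long new_kernel_stack_pointer(struct mock_pt_regs *regs)
{
	return regs->sp;
}
```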

Hated-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_32.S         |  105 ++++++++++++++++++++++++++++++++++----
 arch/x86/include/asm/kexec.h      |   17 ------
 arch/x86/include/asm/ptrace.h     |   17 ------
 arch/x86/include/asm/stacktrace.h |    2 
 arch/x86/kernel/crash.c           |    8 --
 arch/x86/kernel/ftrace_32.S       |   81 ++++++++++++++++-------------
 arch/x86/kernel/kgdb.c            |    8 --
 arch/x86/kernel/kprobes/common.h  |    4 -
 arch/x86/kernel/kprobes/core.c    |   29 ++++------
 arch/x86/kernel/kprobes/opt.c     |   20 ++++---
 arch/x86/kernel/process_32.c      |   16 +----
 arch/x86/kernel/ptrace.c          |   29 ----------
 arch/x86/kernel/time.c            |    3 -
 arch/x86/kernel/unwind_frame.c    |   32 +----------
 arch/x86/kernel/unwind_orc.c      |    2 
 15 files changed, 181 insertions(+), 192 deletions(-)

--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -202,9 +202,102 @@
 .Lend_\@:
 .endm
 
+#define CS_FROM_ENTRY_STACK	(1 << 31)
+#define CS_FROM_USER_CR3	(1 << 30)
+#define CS_FROM_KERNEL		(1 << 29)
+
+.macro FIXUP_FRAME
+	/*
+	 * The high bits of the CS dword (__csh) are used for CS_FROM_*.
+	 * Clear them in case hardware didn't do this for us.
+	 */
+	andl	$0x0000ffff, 3*4(%esp)
+
+#ifdef CONFIG_VM86
+	testl	$X86_EFLAGS_VM, 4*4(%esp)
+	jnz	.Lfrom_usermode_no_fixup_\@
+#endif
+	testl	$SEGMENT_RPL_MASK, 3*4(%esp)
+	jnz	.Lfrom_usermode_no_fixup_\@
+
+	orl	$CS_FROM_KERNEL, 3*4(%esp)
+
+	/*
 +	 * When we're here from kernel mode, the (exception) stack looks like:
+	 *
+	 *  5*4(%esp) - <previous context>
+	 *  4*4(%esp) - flags
+	 *  3*4(%esp) - cs
+	 *  2*4(%esp) - ip
+	 *  1*4(%esp) - orig_eax
+	 *  0*4(%esp) - gs / function
+	 *
 +	 * Let's build a 5-entry IRET frame after that, such that struct pt_regs
+	 * is complete and in particular regs->sp is correct. This gives us
 +	 * the original 5 entries as gap:
+	 *
+	 * 12*4(%esp) - <previous context>
+	 * 11*4(%esp) - gap / flags
+	 * 10*4(%esp) - gap / cs
+	 *  9*4(%esp) - gap / ip
+	 *  8*4(%esp) - gap / orig_eax
+	 *  7*4(%esp) - gap / gs / function
+	 *  6*4(%esp) - ss
+	 *  5*4(%esp) - sp
+	 *  4*4(%esp) - flags
+	 *  3*4(%esp) - cs
+	 *  2*4(%esp) - ip
+	 *  1*4(%esp) - orig_eax
+	 *  0*4(%esp) - gs / function
+	 */
+
+	pushl	%ss		# ss
+	pushl	%esp		# sp (points at ss)
+	addl	$6*4, (%esp)	# point sp back at the previous context
+	pushl	6*4(%esp)	# flags
+	pushl	6*4(%esp)	# cs
+	pushl	6*4(%esp)	# ip
+	pushl	6*4(%esp)	# orig_eax
+	pushl	6*4(%esp)	# gs / function
+.Lfrom_usermode_no_fixup_\@:
+.endm
+
+.macro IRET_FRAME
+	testl $CS_FROM_KERNEL, 1*4(%esp)
+	jz .Lfinished_frame_\@
+
+	/*
+	 * Reconstruct the 3 entry IRET frame right after the (modified)
+	 * regs->sp without lowering %esp in between, such that an NMI in the
+	 * middle doesn't scribble our stack.
+	 */
+	pushl	%eax
+	pushl	%ecx
+	movl	5*4(%esp), %eax		# (modified) regs->sp
+
+	movl	4*4(%esp), %ecx		# flags
+	movl	%ecx, -4(%eax)
+
+	movl	3*4(%esp), %ecx		# cs
+	andl	$0x0000ffff, %ecx
+	movl	%ecx, -8(%eax)
+
+	movl	2*4(%esp), %ecx		# ip
+	movl	%ecx, -12(%eax)
+
+	movl	1*4(%esp), %ecx		# eax
+	movl	%ecx, -16(%eax)
+
+	popl	%ecx
+	lea	-16(%eax), %esp
+	popl	%eax
+.Lfinished_frame_\@:
+.endm
+
 .macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
 	cld
 	PUSH_GS
+	FIXUP_FRAME
 	pushl	%fs
 	pushl	%es
 	pushl	%ds
@@ -374,9 +467,6 @@
  * switch to it before we do any copying.
  */
 
-#define CS_FROM_ENTRY_STACK	(1 << 31)
-#define CS_FROM_USER_CR3	(1 << 30)
-
 .macro SWITCH_TO_KERNEL_STACK
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
@@ -390,13 +480,6 @@
 	 * that register for the time this macro runs
 	 */
 
-	/*
-	 * The high bits of the CS dword (__csh) are used for
-	 * CS_FROM_ENTRY_STACK and CS_FROM_USER_CR3. Clear them in case
-	 * hardware didn't do this for us.
-	 */
-	andl	$(0x0000ffff), PT_CS(%esp)
-
 	/* Are we on the entry stack? Bail out if not! */
 	movl	PER_CPU_VAR(cpu_entry_area), %ecx
 	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
@@ -1006,6 +1089,7 @@ ENTRY(entry_INT80_32)
 	/* Restore user state */
 	RESTORE_REGS pop=4			# skip orig_eax/error_code
 .Lirq_return:
+	IRET_FRAME
 	/*
 	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
 	 * when returning from IPI handler and when returning from
@@ -1356,6 +1440,7 @@ END(page_fault)
 
 common_exception:
 	/* the function address is in %gs's slot on the stack */
+	FIXUP_FRAME
 	pushl	%fs
 	pushl	%es
 	pushl	%ds
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -71,22 +71,6 @@ struct kimage;
 #define KEXEC_BACKUP_SRC_END	(640 * 1024UL - 1)	/* 640K */
 
 /*
- * CPU does not save ss and sp on stack if execution is already
- * running in kernel mode at the time of NMI occurrence. This code
- * fixes it.
- */
-static inline void crash_fixup_ss_esp(struct pt_regs *newregs,
-				      struct pt_regs *oldregs)
-{
-#ifdef CONFIG_X86_32
-	newregs->sp = (unsigned long)&(oldregs->sp);
-	asm volatile("xorl %%eax, %%eax\n\t"
-		     "movw %%ss, %%ax\n\t"
-		     :"=a"(newregs->ss));
-#endif
-}
-
-/*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
  * mode exception.
@@ -96,7 +80,6 @@ static inline void crash_setup_regs(stru
 {
 	if (oldregs) {
 		memcpy(newregs, oldregs, sizeof(*newregs));
-		crash_fixup_ss_esp(newregs, oldregs);
 	} else {
 #ifdef CONFIG_X86_32
 		asm volatile("movl %%ebx,%0" : "=m"(newregs->bx));
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -166,14 +166,10 @@ static inline bool user_64bit_mode(struc
 #define compat_user_stack_pointer()	current_pt_regs()->sp
 #endif
 
-#ifdef CONFIG_X86_32
-extern unsigned long kernel_stack_pointer(struct pt_regs *regs);
-#else
 static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
 {
 	return regs->sp;
 }
-#endif
 
 #define GET_IP(regs) ((regs)->ip)
 #define GET_FP(regs) ((regs)->bp)
@@ -201,14 +197,6 @@ static inline unsigned long regs_get_reg
 	if (unlikely(offset > MAX_REG_OFFSET))
 		return 0;
 #ifdef CONFIG_X86_32
-	/*
-	 * Traps from the kernel do not save sp and ss.
-	 * Use the helper function to retrieve sp.
-	 */
-	if (offset == offsetof(struct pt_regs, sp) &&
-	    regs->cs == __KERNEL_CS)
-		return kernel_stack_pointer(regs);
-
 	/* The selector fields are 16-bit. */
 	if (offset == offsetof(struct pt_regs, cs) ||
 	    offset == offsetof(struct pt_regs, ss) ||
@@ -234,8 +222,7 @@ static inline unsigned long regs_get_reg
 static inline int regs_within_kernel_stack(struct pt_regs *regs,
 					   unsigned long addr)
 {
-	return ((addr & ~(THREAD_SIZE - 1))  ==
-		(kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1)));
+	return ((addr & ~(THREAD_SIZE - 1)) == (regs->sp & ~(THREAD_SIZE - 1)));
 }
 
 /**
@@ -249,7 +236,7 @@ static inline int regs_within_kernel_sta
  */
 static inline unsigned long *regs_get_kernel_stack_nth_addr(struct pt_regs *regs, unsigned int n)
 {
-	unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);
+	unsigned long *addr = (unsigned long *)regs->sp;
 
 	addr += n;
 	if (regs_within_kernel_stack(regs, (unsigned long)addr))
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -78,7 +78,7 @@ static inline unsigned long *
 get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
 {
 	if (regs)
-		return (unsigned long *)kernel_stack_pointer(regs);
+		return (unsigned long *)regs->sp;
 
 	if (task == current)
 		return __builtin_frame_address(0);
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -72,14 +72,6 @@ static inline void cpu_crash_vmclear_loa
 
 static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 {
-#ifdef CONFIG_X86_32
-	struct pt_regs fixed_regs;
-
-	if (!user_mode(regs)) {
-		crash_fixup_ss_esp(&fixed_regs, regs);
-		regs = &fixed_regs;
-	}
-#endif
 	crash_save_cpu(regs, cpu);
 
 	/*
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -9,6 +9,7 @@
 #include <asm/export.h>
 #include <asm/ftrace.h>
 #include <asm/nospec-branch.h>
+#include <asm/asm-offsets.h>
 
 #ifdef CC_USING_FENTRY
 # define function_hook	__fentry__
@@ -104,26 +105,38 @@ END(ftrace_caller)
 
 ENTRY(ftrace_regs_caller)
 	/*
-	 * i386 does not save SS and ESP when coming from kernel.
-	 * Instead, to get sp, &regs->sp is used (see ptrace.h).
-	 * Unfortunately, that means eflags must be at the same location
-	 * as the current return ip is. We move the return ip into the
-	 * regs->ip location, and move flags into the return ip location.
+	 * We're here from an mcount/fentry CALL, and the stack frame looks like:
+	 *
+	 *  <previous context>
+	 *  RET-IP
+	 *
+	 * The purpose of this function is to call out in an emulated INT3
+	 * environment with a stack frame like:
+	 *
+	 *  <previous context>
+	 *  gap / RET-IP
+	 *  gap
+	 *  gap
+	 *  gap
+	 *  pt_regs
+	 *
+	 * We do _NOT_ restore: ss, flags, cs, gs, fs, es, ds
 	 */
-	pushl	$__KERNEL_CS
-	pushl	4(%esp)				/* Save the return ip */
-	pushl	$0				/* Load 0 into orig_ax */
+	subl	$3*4, %esp	# RET-IP + 3 gaps
+	pushl	%ss		# ss
+	pushl	%esp		# points at ss
+	addl	$5*4, (%esp)	#   make it point at <previous context>
+	pushfl			# flags
+	pushl	$__KERNEL_CS	# cs
+	pushl	7*4(%esp)	# ip <- RET-IP
+	pushl	$0		# orig_eax
+
 	pushl	%gs
 	pushl	%fs
 	pushl	%es
 	pushl	%ds
-	pushl	%eax
-
-	/* Get flags and place them into the return ip slot */
-	pushf
-	popl	%eax
-	movl	%eax, 8*4(%esp)
 
+	pushl	%eax
 	pushl	%ebp
 	pushl	%edi
 	pushl	%esi
@@ -131,12 +144,13 @@ ENTRY(ftrace_regs_caller)
 	pushl	%ecx
 	pushl	%ebx
 
-	movl	12*4(%esp), %eax		/* Load ip (1st parameter) */
-	subl	$MCOUNT_INSN_SIZE, %eax		/* Adjust ip */
+	movl	PT_EIP(%esp), %eax	# 1st argument: IP
+	subl	$MCOUNT_INSN_SIZE, %eax
+
 #ifdef CC_USING_FENTRY
-	movl	15*4(%esp), %edx		/* Load parent ip (2nd parameter) */
+	movl	21*4(%esp), %edx	# 2nd argument: parent ip
 #else
-	movl	0x4(%ebp), %edx			/* Load parent ip (2nd parameter) */
+	movl	1*4(%ebp), %edx		# 2nd argument: parent ip
 #endif
 
 #ifdef CONFIG_FRAME_POINTER
@@ -144,21 +158,22 @@ ENTRY(ftrace_regs_caller)
 	andl	$0x7fffffff, %ebp
 #endif
 
-	movl	function_trace_op, %ecx		/* Save ftrace_pos in 3rd parameter */
-	pushl	%esp				/* Save pt_regs as 4th parameter */
+	movl	function_trace_op, %ecx	# 3rd argument: ftrace_pos
+	pushl	%esp			# 4th argument: pt_regs
 
 GLOBAL(ftrace_regs_call)
 	call	ftrace_stub
 
-	addl	$4, %esp			/* Skip pt_regs */
+	addl	$4, %esp		# skip 4th argument
 
-	/* restore flags */
-	push	14*4(%esp)
-	popf
-
-	/* Move return ip back to its original location */
-	movl	12*4(%esp), %eax
-	movl	%eax, 14*4(%esp)
+	/* place IP below the new SP */
+	movl	PT_OLDESP(%esp), %eax
+	movl	PT_EIP(%esp), %ecx
+	movl	%ecx, -4(%eax)
+
+	/* place EAX below that */
+	movl	PT_EAX(%esp), %ecx
+	movl	%ecx, -8(%eax)
 
 	popl	%ebx
 	popl	%ecx
@@ -166,16 +181,12 @@ GLOBAL(ftrace_regs_call)
 	popl	%esi
 	popl	%edi
 	popl	%ebp
-	popl	%eax
-	popl	%ds
-	popl	%es
-	popl	%fs
-	popl	%gs
 
-	/* use lea to not affect flags */
-	lea	3*4(%esp), %esp			/* Skip orig_ax, ip and cs */
+	lea	-8(%eax), %esp
+	popl	%eax
 
 	jmp	.Lftrace_ret
+
 #else /* ! CONFIG_DYNAMIC_FTRACE */
 
 ENTRY(function_hook)
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -127,14 +127,6 @@ char *dbg_get_reg(int regno, void *mem,
 
 #ifdef CONFIG_X86_32
 	switch (regno) {
-	case GDB_SS:
-		if (!user_mode(regs))
-			*(unsigned long *)mem = __KERNEL_DS;
-		break;
-	case GDB_SP:
-		if (!user_mode(regs))
-			*(unsigned long *)mem = kernel_stack_pointer(regs);
-		break;
 	case GDB_GS:
 	case GDB_FS:
 		*(unsigned long *)mem = 0xFFFF;
--- a/arch/x86/kernel/kprobes/common.h
+++ b/arch/x86/kernel/kprobes/common.h
@@ -86,8 +86,8 @@
 	"	popl %edi\n"			\
 	"	popl %ebp\n"			\
 	"	popl %eax\n"			\
-	/* Skip ds, es, fs, gs, orig_ax, and ip. Note: don't pop cs here*/\
-	"	addl $24, %esp\n"
+	/* Skip ds, es, fs, gs, orig_ax, ip, and cs. */\
+	"	addl $7*4, %esp\n"
 #endif
 
 /* Ensure if the instruction can be boostable */
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -69,7 +69,7 @@
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
 
-#define stack_addr(regs) ((unsigned long *)kernel_stack_pointer(regs))
+#define stack_addr(regs) ((unsigned long *)regs->sp)
 
 #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
 	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
@@ -731,29 +731,27 @@ asm(
 	".global kretprobe_trampoline\n"
 	".type kretprobe_trampoline, @function\n"
 	"kretprobe_trampoline:\n"
-#ifdef CONFIG_X86_64
 	/* We don't bother saving the ss register */
+#ifdef CONFIG_X86_64
 	"	pushq %rsp\n"
 	"	pushfq\n"
 	SAVE_REGS_STRING
 	"	movq %rsp, %rdi\n"
 	"	call trampoline_handler\n"
 	/* Replace saved sp with true return address. */
-	"	movq %rax, 152(%rsp)\n"
+	"	movq %rax, 19*8(%rsp)\n"
 	RESTORE_REGS_STRING
 	"	popfq\n"
 #else
-	"	pushf\n"
+	"	pushl %esp\n"
+	"	pushfl\n"
 	SAVE_REGS_STRING
 	"	movl %esp, %eax\n"
 	"	call trampoline_handler\n"
-	/* Move flags to cs */
-	"	movl 56(%esp), %edx\n"
-	"	movl %edx, 52(%esp)\n"
-	/* Replace saved flags with true return address. */
-	"	movl %eax, 56(%esp)\n"
+	/* Replace saved sp with true return address. */
+	"	movl %eax, 15*4(%esp)\n"
 	RESTORE_REGS_STRING
-	"	popf\n"
+	"	popfl\n"
 #endif
 	"	ret\n"
 	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
@@ -794,16 +792,13 @@ static __used void *trampoline_handler(s
 	INIT_HLIST_HEAD(&empty_rp);
 	kretprobe_hash_lock(current, &head, &flags);
 	/* fixup registers */
-#ifdef CONFIG_X86_64
 	regs->cs = __KERNEL_CS;
-	/* On x86-64, we use pt_regs->sp for return address holder. */
-	frame_pointer = &regs->sp;
-#else
-	regs->cs = __KERNEL_CS | get_kernel_rpl();
+#ifdef CONFIG_X86_32
+	regs->cs |= get_kernel_rpl();
 	regs->gs = 0;
-	/* On x86-32, we use pt_regs->flags for return address holder. */
-	frame_pointer = &regs->flags;
 #endif
+	/* We use pt_regs->sp for return address holder. */
+	frame_pointer = &regs->sp;
 	regs->ip = trampoline_address;
 	regs->orig_ax = ~0UL;
 
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -115,14 +115,15 @@ asm (
 			"optprobe_template_call:\n"
 			ASM_NOP5
 			/* Move flags to rsp */
-			"	movq 144(%rsp), %rdx\n"
-			"	movq %rdx, 152(%rsp)\n"
+			"	movq 18*8(%rsp), %rdx\n"
+			"	movq %rdx, 19*8(%rsp)\n"
 			RESTORE_REGS_STRING
 			/* Skip flags entry */
 			"	addq $8, %rsp\n"
 			"	popfq\n"
 #else /* CONFIG_X86_32 */
-			"	pushf\n"
+			"	pushl %esp\n"
+			"	pushfl\n"
 			SAVE_REGS_STRING
 			"	movl %esp, %edx\n"
 			".global optprobe_template_val\n"
@@ -131,9 +132,13 @@ asm (
 			".global optprobe_template_call\n"
 			"optprobe_template_call:\n"
 			ASM_NOP5
+			/* Move flags into esp */
+			"	movl 14*4(%esp), %edx\n"
+			"	movl %edx, 15*4(%esp)\n"
 			RESTORE_REGS_STRING
-			"	addl $4, %esp\n"	/* skip cs */
-			"	popf\n"
+			/* Skip flags entry */
+			"	addl $4, %esp\n"
+			"	popfl\n"
 #endif
 			".global optprobe_template_end\n"
 			"optprobe_template_end:\n"
@@ -165,10 +170,9 @@ optimized_callback(struct optimized_kpro
 	} else {
 		struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
 		/* Save skipped registers */
-#ifdef CONFIG_X86_64
 		regs->cs = __KERNEL_CS;
-#else
-		regs->cs = __KERNEL_CS | get_kernel_rpl();
+#ifdef CONFIG_X86_32
+		regs->cs |= get_kernel_rpl();
 		regs->gs = 0;
 #endif
 		regs->ip = (unsigned long)op->kp.addr + INT3_SIZE;
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -62,27 +62,21 @@ void __show_regs(struct pt_regs *regs, e
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
 	unsigned long d0, d1, d2, d3, d6, d7;
-	unsigned long sp;
-	unsigned short ss, gs;
+	unsigned short gs;
 
-	if (user_mode(regs)) {
-		sp = regs->sp;
-		ss = regs->ss;
+	if (user_mode(regs))
 		gs = get_user_gs(regs);
-	} else {
-		sp = kernel_stack_pointer(regs);
-		savesegment(ss, ss);
+	else
 		savesegment(gs, gs);
-	}
 
 	show_ip(regs, KERN_DEFAULT);
 
 	printk(KERN_DEFAULT "EAX: %08lx EBX: %08lx ECX: %08lx EDX: %08lx\n",
 		regs->ax, regs->bx, regs->cx, regs->dx);
 	printk(KERN_DEFAULT "ESI: %08lx EDI: %08lx EBP: %08lx ESP: %08lx\n",
-		regs->si, regs->di, regs->bp, sp);
+		regs->si, regs->di, regs->bp, regs->sp);
 	printk(KERN_DEFAULT "DS: %04x ES: %04x FS: %04x GS: %04x SS: %04x EFLAGS: %08lx\n",
-	       (u16)regs->ds, (u16)regs->es, (u16)regs->fs, gs, ss, regs->flags);
+	       (u16)regs->ds, (u16)regs->es, (u16)regs->fs, gs, regs->ss, regs->flags);
 
 	if (mode != SHOW_REGS_ALL)
 		return;
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -153,35 +153,6 @@ static inline bool invalid_selector(u16
 
 #define FLAG_MASK		FLAG_MASK_32
 
-/*
- * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode
- * when it traps.  The previous stack will be directly underneath the saved
- * registers, and 'sp/ss' won't even have been saved. Thus the '&regs->sp'.
- *
- * Now, if the stack is empty, '&regs->sp' is out of range. In this
- * case we try to take the previous stack. To always return a non-null
- * stack pointer we fall back to regs as stack if no previous stack
- * exists.
- *
- * This is valid only for kernel mode traps.
- */
-unsigned long kernel_stack_pointer(struct pt_regs *regs)
-{
-	unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
-	unsigned long sp = (unsigned long)&regs->sp;
-	u32 *prev_esp;
-
-	if (context == (sp & ~(THREAD_SIZE - 1)))
-		return sp;
-
-	prev_esp = (u32 *)(context);
-	if (*prev_esp)
-		return (unsigned long)*prev_esp;
-
-	return (unsigned long)regs;
-}
-EXPORT_SYMBOL_GPL(kernel_stack_pointer);
-
 static unsigned long *pt_regs_access(struct pt_regs *regs, unsigned long regno)
 {
 	BUILD_BUG_ON(offsetof(struct pt_regs, bx) != 0);
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -37,8 +37,7 @@ unsigned long profile_pc(struct pt_regs
 #ifdef CONFIG_FRAME_POINTER
 		return *(unsigned long *)(regs->bp + sizeof(long));
 #else
-		unsigned long *sp =
-			(unsigned long *)kernel_stack_pointer(regs);
+		unsigned long *sp = (unsigned long *)regs->sp;
 		/*
 		 * Return address is either directly at stack pointer
 		 * or above a saved flags. Eflags has bits 22-31 zero,
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -69,15 +69,6 @@ static void unwind_dump(struct unwind_st
 	}
 }
 
-static size_t regs_size(struct pt_regs *regs)
-{
-	/* x86_32 regs from kernel mode are two words shorter: */
-	if (IS_ENABLED(CONFIG_X86_32) && !user_mode(regs))
-		return sizeof(*regs) - 2*sizeof(long);
-
-	return sizeof(*regs);
-}
-
 static bool in_entry_code(unsigned long ip)
 {
 	char *addr = (char *)ip;
@@ -197,12 +188,6 @@ static struct pt_regs *decode_frame_poin
 }
 #endif
 
-#ifdef CONFIG_X86_32
-#define KERNEL_REGS_SIZE (sizeof(struct pt_regs) - 2*sizeof(long))
-#else
-#define KERNEL_REGS_SIZE (sizeof(struct pt_regs))
-#endif
-
 static bool update_stack_state(struct unwind_state *state,
 			       unsigned long *next_bp)
 {
@@ -213,7 +198,7 @@ static bool update_stack_state(struct un
 	size_t len;
 
 	if (state->regs)
-		prev_frame_end = (void *)state->regs + regs_size(state->regs);
+		prev_frame_end = (void *)state->regs + sizeof(*state->regs);
 	else
 		prev_frame_end = (void *)state->bp + FRAME_HEADER_SIZE;
 
@@ -221,7 +206,7 @@ static bool update_stack_state(struct un
 	regs = decode_frame_pointer(next_bp);
 	if (regs) {
 		frame = (unsigned long *)regs;
-		len = KERNEL_REGS_SIZE;
+		len = sizeof(*regs);
 		state->got_irq = true;
 	} else {
 		frame = next_bp;
@@ -245,14 +230,6 @@ static bool update_stack_state(struct un
 	    frame < prev_frame_end)
 		return false;
 
-	/*
-	 * On 32-bit with user mode regs, make sure the last two regs are safe
-	 * to access:
-	 */
-	if (IS_ENABLED(CONFIG_X86_32) && regs && user_mode(regs) &&
-	    !on_stack(info, frame, len + 2*sizeof(long)))
-		return false;
-
 	/* Move state to the next frame: */
 	if (regs) {
 		state->regs = regs;
@@ -411,10 +388,9 @@ void __unwind_start(struct unwind_state
 	 * Pretend that the frame is complete and that BP points to it, but save
 	 * the real BP so that we can use it when looking for the next frame.
 	 */
-	if (regs && regs->ip == 0 &&
-	    (unsigned long *)kernel_stack_pointer(regs) >= first_frame) {
+	if (regs && regs->ip == 0 && (unsigned long *)regs->sp >= first_frame) {
 		state->next_bp = bp;
-		bp = ((unsigned long *)kernel_stack_pointer(regs)) - 1;
+		bp = ((unsigned long *)regs->sp) - 1;
 	}
 
 	/* Initialize stack info and make sure the frame data is accessible: */
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -579,7 +579,7 @@ void __unwind_start(struct unwind_state
 			goto done;
 
 		state->ip = regs->ip;
-		state->sp = kernel_stack_pointer(regs);
+		state->sp = regs->sp;
 		state->bp = regs->bp;
 		state->regs = regs;
 		state->full_regs = true;




* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-08  7:49 ` [PATCH 2/4] x86/kprobes: Fix frame pointer annotations Peter Zijlstra
@ 2019-05-08 11:54   ` Josh Poimboeuf
  2019-05-08 12:04     ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-05-08 11:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu

On Wed, May 08, 2019 at 09:49:03AM +0200, Peter Zijlstra wrote:
> The kprobe trampolines have a FRAME_POINTER annotation that makes no
> sense. It marks the frame in the middle of pt_regs, at the place of
> saving BP.
> 
> Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER
> from the respective entry_*.S.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/kprobes/common.h |   32 +++++++++++++++++++++++---------
>  1 file changed, 23 insertions(+), 9 deletions(-)
> 
> --- a/arch/x86/kernel/kprobes/common.h
> +++ b/arch/x86/kernel/kprobes/common.h
> @@ -6,14 +6,15 @@
>  
>  #include <asm/asm.h>
>  
> +#ifdef CONFIG_X86_64
> +
>  #ifdef CONFIG_FRAME_POINTER
> -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n" \
> -			 "	mov  %" _ASM_SP ", %" _ASM_BP "\n"
> +#define ENCODE_FRAME_POINTER			\
> +	"	leaq 1(%rsp), %rbp\n"
>  #else
> -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n"
> +#define ENCODE_FRAME_POINTER
>  #endif

> +#ifdef CONFIG_FRAME_POINTER
> +#define ENCODE_FRAME_POINTER			\
> +	"	movl %esp, %ebp\n"		\
> +	"	andl $0x7fffffff, %ebp\n"
> +#else
> +#define ENCODE_FRAME_POINTER
> +#endif

We should put these macros in a header file somewhere (including
stringified versions).

-- 
Josh


* Re: [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs
  2019-05-08  7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra
@ 2019-05-08 11:57   ` Josh Poimboeuf
  2019-05-08 12:46     ` Ingo Molnar
  2019-05-08 20:58   ` Linus Torvalds
  1 sibling, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-05-08 11:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu

On Wed, May 08, 2019 at 09:49:05AM +0200, Peter Zijlstra wrote:
> Currently pt_regs on x86_32 has an oddity in that kernel regs
> (!user_mode(regs)) are short two entries (esp/ss). This means that any
> code trying to use them (typically: regs->sp) needs to jump through
> some unfortunate hoops.
> 
> Change the entry code to fix this up and create a full pt_regs frame.
> 
> This then simplifies:
> 
>   - ftrace
>   - kprobes
>   - stack unwinder
>   - ptrace
>   - kdump
>   - kgdb
> 
> Hated-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/entry/entry_32.S         |  105 ++++++++++++++++++++++++++++++++++----
>  arch/x86/include/asm/kexec.h      |   17 ------
>  arch/x86/include/asm/ptrace.h     |   17 ------
>  arch/x86/include/asm/stacktrace.h |    2 
>  arch/x86/kernel/crash.c           |    8 --
>  arch/x86/kernel/ftrace_32.S       |   81 ++++++++++++++++-------------
>  arch/x86/kernel/kgdb.c            |    8 --
>  arch/x86/kernel/kprobes/common.h  |    4 -
>  arch/x86/kernel/kprobes/core.c    |   29 ++++------
>  arch/x86/kernel/kprobes/opt.c     |   20 ++++---
>  arch/x86/kernel/process_32.c      |   16 +----
>  arch/x86/kernel/ptrace.c          |   29 ----------
>  arch/x86/kernel/time.c            |    3 -
>  arch/x86/kernel/unwind_frame.c    |   32 +----------
>  arch/x86/kernel/unwind_orc.c      |    2 
>  15 files changed, 181 insertions(+), 192 deletions(-)

Very nice diffstat.  This moves all the pain to the 32-bit entry code
where it belongs.

-- 
Josh


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-08 11:54   ` Josh Poimboeuf
@ 2019-05-08 12:04     ` Peter Zijlstra
  2019-05-08 12:40       ` Peter Zijlstra
  2019-05-08 12:42       ` Josh Poimboeuf
  0 siblings, 2 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-08 12:04 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu

On Wed, May 08, 2019 at 06:54:16AM -0500, Josh Poimboeuf wrote:
> On Wed, May 08, 2019 at 09:49:03AM +0200, Peter Zijlstra wrote:
> > The kprobe trampolines have a FRAME_POINTER annotation that makes no
> > sense. It marks the frame in the middle of pt_regs, at the place of
> > saving BP.
> > 
> > Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER
> > from the respective entry_*.S.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/kernel/kprobes/common.h |   32 +++++++++++++++++++++++---------
> >  1 file changed, 23 insertions(+), 9 deletions(-)
> > 
> > --- a/arch/x86/kernel/kprobes/common.h
> > +++ b/arch/x86/kernel/kprobes/common.h
> > @@ -6,14 +6,15 @@
> >  
> >  #include <asm/asm.h>
> >  
> > +#ifdef CONFIG_X86_64
> > +
> >  #ifdef CONFIG_FRAME_POINTER
> > -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n" \
> > -			 "	mov  %" _ASM_SP ", %" _ASM_BP "\n"
> > +#define ENCODE_FRAME_POINTER			\
> > +	"	leaq 1(%rsp), %rbp\n"
> >  #else
> > -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n"
> > +#define ENCODE_FRAME_POINTER
> >  #endif
> 
> > +#ifdef CONFIG_FRAME_POINTER
> > +#define ENCODE_FRAME_POINTER			\
> > +	"	movl %esp, %ebp\n"		\
> > +	"	andl $0x7fffffff, %ebp\n"
> > +#else
> > +#define ENCODE_FRAME_POINTER
> > +#endif
> 
> We should put these macros in a header file somewhere (including
> stringified versions).

Probably a good idea. I'll frob them into asm/frame.h.

Do the x86_64 variants also want some ORC annotation?


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-08 12:04     ` Peter Zijlstra
@ 2019-05-08 12:40       ` Peter Zijlstra
  2019-05-08 12:42       ` Josh Poimboeuf
  1 sibling, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-08 12:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu

On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 06:54:16AM -0500, Josh Poimboeuf wrote:

> > We should put these macros in a header file somewhere (including
> > stringified versions).
> 
> Probably a good idea. I'll frob them into asm/frame.h.

---
Subject: x86: Move ENCODE_FRAME_POINTER to asm/frame.h
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed May 8 14:30:48 CEST 2019

In preparation for wider use, move the ENCODE_FRAME_POINTER macros to
a common header and provide inline asm versions.

These macros are used to encode a pt_regs frame for the unwinder; see
unwind_frame.c:decode_frame_pointer().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/calling.h     |   15 --------------
 arch/x86/entry/entry_32.S    |   16 --------------
 arch/x86/include/asm/frame.h |   46 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+), 31 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -172,21 +172,6 @@ For 32-bit we have the following convent
 	.endif
 .endm
 
-/*
- * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
- * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
- * is just setting the LSB, which makes it an invalid stack address and is also
- * a signal to the unwinder that it's a pt_regs pointer in disguise.
- *
- * NOTE: This macro must be used *after* PUSH_AND_CLEAR_REGS because it corrupts
- * the original rbp.
- */
-.macro ENCODE_FRAME_POINTER ptregs_offset=0
-#ifdef CONFIG_FRAME_POINTER
-	leaq 1+\ptregs_offset(%rsp), %rbp
-#endif
-.endm
-
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 
 /*
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -246,22 +246,6 @@
 .Lend_\@:
 .endm
 
-/*
- * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
- * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
- * is just clearing the MSB, which makes it an invalid stack address and is also
- * a signal to the unwinder that it's a pt_regs pointer in disguise.
- *
- * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the
- * original rbp.
- */
-.macro ENCODE_FRAME_POINTER
-#ifdef CONFIG_FRAME_POINTER
-	mov %esp, %ebp
-	andl $0x7fffffff, %ebp
-#endif
-.endm
-
 .macro RESTORE_INT_REGS
 	popl	%ebx
 	popl	%ecx
--- a/arch/x86/include/asm/frame.h
+++ b/arch/x86/include/asm/frame.h
@@ -22,6 +22,39 @@
 	pop %_ASM_BP
 .endm
 
+#ifdef CONFIG_X86_64
+/*
+ * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
+ * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
+ * is just setting the LSB, which makes it an invalid stack address and is also
+ * a signal to the unwinder that it's a pt_regs pointer in disguise.
+ *
+ * NOTE: This macro must be used *after* PUSH_AND_CLEAR_REGS because it corrupts
+ * the original rbp.
+ */
+.macro ENCODE_FRAME_POINTER ptregs_offset=0
+#ifdef CONFIG_FRAME_POINTER
+	leaq 1+\ptregs_offset(%rsp), %rbp
+#endif
+.endm
+#else /* !CONFIG_X86_64 */
+/*
+ * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
+ * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
+ * is just clearing the MSB, which makes it an invalid stack address and is also
+ * a signal to the unwinder that it's a pt_regs pointer in disguise.
+ *
+ * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the
+ * original ebp.
+ */
+.macro ENCODE_FRAME_POINTER
+#ifdef CONFIG_FRAME_POINTER
+	mov %esp, %ebp
+	andl $0x7fffffff, %ebp
+#endif
+.endm
+#endif /* CONFIG_X86_64 */
+
 #else /* !__ASSEMBLY__ */
 
 #define FRAME_BEGIN				\
@@ -30,6 +63,19 @@
 
 #define FRAME_END "pop %" _ASM_BP "\n"
 
+#ifdef CONFIG_FRAME_POINTER
+#ifdef CONFIG_X86_64
+#define ENCODE_FRAME_POINTER			\
+	"lea 1(%rsp), %rbp\n\t"
+#else /* !CONFIG_X86_64 */
+#define ENCODE_FRAME_POINTER			\
+	"movl %esp, %ebp\n\t"			\
+	"andl $0x7fffffff, %ebp\n\t"
+#endif /* CONFIG_X86_64 */
+#else /* CONFIG_FRAME_POINTER */
+#define ENCODE_FRAME_POINTER
+#endif /* CONFIG_FRAME_POINTER */
+
 #endif /* __ASSEMBLY__ */
 
 #define FRAME_OFFSET __ASM_SEL(4, 8)


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-08 12:04     ` Peter Zijlstra
  2019-05-08 12:40       ` Peter Zijlstra
@ 2019-05-08 12:42       ` Josh Poimboeuf
  2019-05-08 15:39         ` Peter Zijlstra
  1 sibling, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-05-08 12:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu

On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 06:54:16AM -0500, Josh Poimboeuf wrote:
> > On Wed, May 08, 2019 at 09:49:03AM +0200, Peter Zijlstra wrote:
> > > The kprobe trampolines have a FRAME_POINTER annotation that makes no
> > > sense. It marks the frame in the middle of pt_regs, at the place of
> > > saving BP.
> > > 
> > > Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER
> > > from the respective entry_*.S.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  arch/x86/kernel/kprobes/common.h |   32 +++++++++++++++++++++++---------
> > >  1 file changed, 23 insertions(+), 9 deletions(-)
> > > 
> > > --- a/arch/x86/kernel/kprobes/common.h
> > > +++ b/arch/x86/kernel/kprobes/common.h
> > > @@ -6,14 +6,15 @@
> > >  
> > >  #include <asm/asm.h>
> > >  
> > > +#ifdef CONFIG_X86_64
> > > +
> > >  #ifdef CONFIG_FRAME_POINTER
> > > -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n" \
> > > -			 "	mov  %" _ASM_SP ", %" _ASM_BP "\n"
> > > +#define ENCODE_FRAME_POINTER			\
> > > +	"	leaq 1(%rsp), %rbp\n"
> > >  #else
> > > -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n"
> > > +#define ENCODE_FRAME_POINTER
> > >  #endif
> > 
> > > +#ifdef CONFIG_FRAME_POINTER
> > > +#define ENCODE_FRAME_POINTER			\
> > > +	"	movl %esp, %ebp\n"		\
> > > +	"	andl $0x7fffffff, %ebp\n"
> > > +#else
> > > +#define ENCODE_FRAME_POINTER
> > > +#endif
> > 
> > We should put these macros in a header file somewhere (including
> > stringified versions).
> 
> Probably a good idea. I'll frob them into asm/frame.h.
> 
> Do the x86_64 variants also want some ORC annotation?

Maybe so.  Though it looks like regs->ip isn't saved.  The saved
registers might need to be tweaked.  I'll need to look into it.

-- 
Josh


* Re: [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs
  2019-05-08 11:57   ` Josh Poimboeuf
@ 2019-05-08 12:46     ` Ingo Molnar
  0 siblings, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2019-05-08 12:46 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, linux-kernel, Linus Torvalds, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Wed, May 08, 2019 at 09:49:05AM +0200, Peter Zijlstra wrote:
> > Currently pt_regs on x86_32 has an oddity in that kernel regs
> > (!user_mode(regs)) are short two entries (esp/ss). This means that any
> > code trying to use them (typically: regs->sp) needs to jump through
> > some unfortunate hoops.
> > 
> > Change the entry code to fix this up and create a full pt_regs frame.
> > 
> > This then simplifies:
> > 
> >   - ftrace
> >   - kprobes
> >   - stack unwinder
> >   - ptrace
> >   - kdump
> >   - kgdb
> > 
> > Hated-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/entry/entry_32.S         |  105 ++++++++++++++++++++++++++++++++++----
> >  arch/x86/include/asm/kexec.h      |   17 ------
> >  arch/x86/include/asm/ptrace.h     |   17 ------
> >  arch/x86/include/asm/stacktrace.h |    2 
> >  arch/x86/kernel/crash.c           |    8 --
> >  arch/x86/kernel/ftrace_32.S       |   81 ++++++++++++++++-------------
> >  arch/x86/kernel/kgdb.c            |    8 --
> >  arch/x86/kernel/kprobes/common.h  |    4 -
> >  arch/x86/kernel/kprobes/core.c    |   29 ++++------
> >  arch/x86/kernel/kprobes/opt.c     |   20 ++++---
> >  arch/x86/kernel/process_32.c      |   16 +----
> >  arch/x86/kernel/ptrace.c          |   29 ----------
> >  arch/x86/kernel/time.c            |    3 -
> >  arch/x86/kernel/unwind_frame.c    |   32 +----------
> >  arch/x86/kernel/unwind_orc.c      |    2 
> >  15 files changed, 181 insertions(+), 192 deletions(-)
> 
> Very nice diffstat.  This moves all the pain to the 32-bit entry code
> where it belongs.

Yes, that's very convincing and looks like a nice simplification of 
32-bit x86 entry concepts overall.

Assuming it's correct, the question is whether Linus still hates it? :-)

Thanks,

	Ingo


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-08 12:42       ` Josh Poimboeuf
@ 2019-05-08 15:39         ` Peter Zijlstra
  2019-05-08 18:48           ` Josh Poimboeuf
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-08 15:39 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu

On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:

> > Do the x86_64 variants also want some ORC annotation?
> 
> Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> registers might need to be tweaked.  I'll need to look into it.

What all these sites do (and maybe we should look at unifying them
somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
pt_regs).

So regs->ip will be the return address (which is fixed up to be the CALL
address in the handler).


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-08 15:39         ` Peter Zijlstra
@ 2019-05-08 18:48           ` Josh Poimboeuf
  2019-05-09  1:20             ` Masami Hiramatsu
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-05-08 18:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu

On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> 
> > > Do the x86_64 variants also want some ORC annotation?
> > 
> > Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> > registers might need to be tweaked.  I'll need to look into it.
> 
> What all these sites do (and maybe we should look at unifying them
> somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
> pt_regs).
> 
> So regs->ip will be the return address (which is fixed up to be the CALL
> address in the handler).

But from what I can tell, trampoline_handler() hard-codes regs->ip to
point to kretprobe_trampoline(), and the original return address is
placed in regs->sp.

Masami, is there a reason why regs->ip doesn't have the original return
address and regs->sp doesn't have the original SP?  I think that would
help the unwinder understand things.

-- 
Josh


* Re: [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs
  2019-05-08  7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra
  2019-05-08 11:57   ` Josh Poimboeuf
@ 2019-05-08 20:58   ` Linus Torvalds
  1 sibling, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2019-05-08 20:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux List Kernel Mailing, Ingo Molnar, Andrew Morton,
	Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	open list:KERNEL SELFTEST FRAMEWORK, Masami Hiramatsu

On Wed, May 8, 2019 at 1:12 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Hated-by: Linus Torvalds <torvalds@linux-foundation.org>

I can live with this cleaned-up version. I'm still not loving it, but
at least it now tries very hard to simplify other code. You can remote
the hated-by.

                  Linus


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-08 18:48           ` Josh Poimboeuf
@ 2019-05-09  1:20             ` Masami Hiramatsu
  2019-05-09  8:14               ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Masami Hiramatsu @ 2019-05-09  1:20 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest, Masami Hiramatsu

Hi Josh,

On Wed, 8 May 2019 13:48:48 -0500
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
> > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> > 
> > > > Do the x86_64 variants also want some ORC annotation?
> > > 
> > > Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> > > registers might need to be tweaked.  I'll need to look into it.
> > 
> > What all these sites do (and maybe we should look at unifying them
> > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
> > pt_regs).
> > 
> > So regs->ip will be the return address (which is fixed up to be the CALL
> > address in the handler).
> 
> But from what I can tell, trampoline_handler() hard-codes regs->ip to
> point to kretprobe_trampoline(), and the original return address is
> placed in regs->sp.
> 
> Masami, is there a reason why regs->ip doesn't have the original return
> address and regs->sp doesn't have the original SP?  I think that would
> help the unwinder understand things.

Yes, for regs->ip there is a historical reason: previously we had an
int3 at the trampoline, so the user (kretprobe) handler expects
regs->ip to be the trampoline address and ri->ret_addr to be the
original return address. We should check the other archs, but I think
it is possible to change regs->ip to the original return address,
since no one cares about such a "fixed address". :)

For regs->sp, there are two reasons.

For x86-64, it is just over-optimization (to reduce stack usage).
I think we can make a gap for the return address, something like

	"kretprobe_trampoline:\n"
#ifdef CONFIG_X86_64
	"	pushq %rsp\n"	/* Make a gap for return address */
	"	pushq 0(%rsp)\n"	/* Copy original stack pointer */
	"	pushfq\n"
	SAVE_REGS_STRING
	"	movq %rsp, %rdi\n"
	"	call trampoline_handler\n"
	/* Push the true return address to the bottom */
	"	movq %rax, 20*8(%rsp)\n"
	RESTORE_REGS_STRING
	"	popfq\n"
	"	addq $8, %rsp\n"	/* Skip original stack pointer */

For i386 (x86-32), there is no other way to keep &regs->sp as the
original stack pointer. It has to be changed with this series, maybe
the same as on x86-64.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09  1:20             ` Masami Hiramatsu
@ 2019-05-09  8:14               ` Peter Zijlstra
  2019-05-09  9:27                 ` Peter Zijlstra
                                   ` (4 more replies)
  0 siblings, 5 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-09  8:14 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Thu, May 09, 2019 at 10:20:30AM +0900, Masami Hiramatsu wrote:
> Hi Josh,
> 
> On Wed, 8 May 2019 13:48:48 -0500
> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
> > > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> > > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> > > 
> > > > > Do the x86_64 variants also want some ORC annotation?
> > > > 
> > > > Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> > > > registers might need to be tweaked.  I'll need to look into it.
> > > 
> > > What all these sites do (and maybe we should look at unifying them
> > > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
> > > pt_regs).
> > > 
> > > So regs->ip will be the return address (which is fixed up to be the CALL
> > > address in the handler).
> > 
> > But from what I can tell, trampoline_handler() hard-codes regs->ip to
> > point to kretprobe_trampoline(), and the original return address is
> > placed in regs->sp.
> > 
> > Masami, is there a reason why regs->ip doesn't have the original return
> > address and regs->sp doesn't have the original SP?  I think that would
> > help the unwinder understand things.
> 
> Yes, for regs->ip there is a historical reason: previously we had an
> int3 at the trampoline, so the user (kretprobe) handler expects
> regs->ip to be the trampoline address and ri->ret_addr to be the
> original return address. We should check the other archs, but I think
> it is possible to change regs->ip to the original return address,
> since no one cares about such a "fixed address". :)
> 
> For regs->sp, there are two reasons.
> 
> For x86-64, it is just over-optimization (to reduce stack usage).
> I think we can make a gap for the return address, something like
> 
> 	"kretprobe_trampoline:\n"
> #ifdef CONFIG_X86_64
> 	"	pushq %rsp\n"	/* Make a gap for return address */
> 	"	pushq 0(%rsp)\n"	/* Copy original stack pointer */
> 	"	pushfq\n"
> 	SAVE_REGS_STRING
> 	"	movq %rsp, %rdi\n"
> 	"	call trampoline_handler\n"
> 	/* Push the true return address to the bottom */
> 	"	movq %rax, 20*8(%rsp)\n"
> 	RESTORE_REGS_STRING
> 	"	popfq\n"
> 	"	addq $8, %rsp\n"	/* Skip original stack pointer */
> 
> For i386 (x86-32), there is no other way to keep &regs->sp as the
> original stack pointer. It has to be changed with this series, maybe
> the same as on x86-64.

Right; I already fixed that in my patch changing i386's pt_regs.

But what I'd love to do is something like the below patch, and make all
the trampolines (very much including ftrace) use that, such that we then
only have one copy of this magic (well, two, because x86_64 also needs
an implementation of this, of course).

Changing ftrace over to this would be a little more work but it can
easily chain things a little to get its original context back:

ENTRY(ftrace_regs_caller)
GLOBAL(ftrace_regs_func)
	push ftrace_stub
	push ftrace_regs_handler
	jmp call_to_exception_trampoline
END(ftrace_regs_caller)

typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_ops *, struct pt_regs *);

struct ftrace_regs_stack {
	ftrace_func_t func;
	unsigned long parent_ip;
};

void ftrace_regs_handler(struct pt_regs *regs)
{
	struct ftrace_regs_stack *st = (void *)regs->sp;
	ftrace_func_t func = st->func;

	regs->sp += sizeof(long); /* pop func */

	func(regs->ip, st->parent_ip, function_trace_op, regs);
}

Hmm? I didn't look into the function_graph thing, but I imagine it can
be added without too much pain.

---
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
 	call	do_exit
 1:	jmp 1b
 END(rewind_stack_do_exit)
+
+/*
+ * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
+ * just did was in fact scribbled with an INT3.
+ *
+ * Use this trampoline like:
+ *
+ *   PUSH $func
+ *   JMP call_to_exception_trampoline
+ *
+ * $func will see regs->ip point at the CALL instruction and must therefore
+ * modify regs->ip in order to make progress (just like a normal INT3 scribbled
+ * CALL).
+ *
+ * NOTE: we do not restore any of the segment registers.
+ */
+ENTRY(call_to_exception_trampoline)
+	/*
+	 * On entry the stack looks like:
+	 *
+	 *   2*4(%esp) <previous context>
+	 *   1*4(%esp) RET-IP
+	 *   0*4(%esp) func
+	 *
+	 * transform this into:
+	 *
+	 *  19*4(%esp) <previous context>
+	 *  18*4(%esp) gap / RET-IP
+	 *  17*4(%esp) gap / func
+	 *  16*4(%esp) ss
+	 *  15*4(%esp) sp / <previous context>
+	 *  14*4(%esp) flags
+	 *  13*4(%esp) cs
+	 *  12*4(%esp) ip / RET-IP
+	 *  11*4(%esp) orig_eax
+	 *  10*4(%esp) gs
+	 *   9*4(%esp) fs
+	 *   8*4(%esp) es
+	 *   7*4(%esp) ds
+	 *   6*4(%esp) eax
+	 *   5*4(%esp) ebp
+	 *   4*4(%esp) edi
+	 *   3*4(%esp) esi
+	 *   2*4(%esp) edx
+	 *   1*4(%esp) ecx
+	 *   0*4(%esp) ebx
+	 */
+	pushl	%ss
+	pushl	%esp		# points at ss
+	addl	$3*4, (%esp)	#   point it at <previous context>
+	pushfl
+	pushl	%cs
+	pushl	5*4(%esp)	# RET-IP
+	subl	$5, (%esp)	#   point at CALL instruction
+	pushl	$-1
+	pushl	%gs
+	pushl	%fs
+	pushl	%es
+	pushl	%ds
+	pushl	%eax
+	pushl	%ebp
+	pushl	%edi
+	pushl	%esi
+	pushl	%edx
+	pushl	%ecx
+	pushl	%ebx
+
+	ENCODE_FRAME_POINTER
+
+	movl	%esp, %eax	# 1st argument: pt_regs
+
+	movl	17*4(%esp), %ebx	# func
+	CALL_NOSPEC %ebx
+
+	movl	PT_OLDESP(%esp), %eax
+
+	movl	PT_EIP(%esp), %ecx
+	movl	%ecx, -1*4(%eax)
+
+	movl	PT_EFLAGS(%esp), %ecx
+	movl	%ecx, -2*4(%eax)
+
+	movl	PT_EAX(%esp), %ecx
+	movl	%ecx, -3*4(%eax)
+
+	popl	%ebx
+	popl	%ecx
+	popl	%edx
+	popl	%esi
+	popl	%edi
+	popl	%ebp
+
+	lea	-3*4(%eax), %esp
+	popl	%eax
+	popfl
+	ret
+END(call_to_exception_trampoline)
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -731,29 +731,8 @@ asm(
 	".global kretprobe_trampoline\n"
 	".type kretprobe_trampoline, @function\n"
 	"kretprobe_trampoline:\n"
-	/* We don't bother saving the ss register */
-#ifdef CONFIG_X86_64
-	"	pushq %rsp\n"
-	"	pushfq\n"
-	SAVE_REGS_STRING
-	"	movq %rsp, %rdi\n"
-	"	call trampoline_handler\n"
-	/* Replace saved sp with true return address. */
-	"	movq %rax, 19*8(%rsp)\n"
-	RESTORE_REGS_STRING
-	"	popfq\n"
-#else
-	"	pushl %esp\n"
-	"	pushfl\n"
-	SAVE_REGS_STRING
-	"	movl %esp, %eax\n"
-	"	call trampoline_handler\n"
-	/* Replace saved sp with true return address. */
-	"	movl %eax, 15*4(%esp)\n"
-	RESTORE_REGS_STRING
-	"	popfl\n"
-#endif
-	"	ret\n"
+	"push trampoline_handler\n"
+	"jmp call_to_exception_trampoline\n"
 	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
 );
 NOKPROBE_SYMBOL(kretprobe_trampoline);
@@ -791,12 +770,7 @@ static __used void *trampoline_handler(s
 
 	INIT_HLIST_HEAD(&empty_rp);
 	kretprobe_hash_lock(current, &head, &flags);
-	/* fixup registers */
-	regs->cs = __KERNEL_CS;
-#ifdef CONFIG_X86_32
-	regs->cs |= get_kernel_rpl();
-	regs->gs = 0;
-#endif
+
 	/* We use pt_regs->sp for return address holder. */
 	frame_pointer = &regs->sp;
 	regs->ip = trampoline_address;

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09  8:14               ` Peter Zijlstra
@ 2019-05-09  9:27                 ` Peter Zijlstra
  2019-05-09 14:00                 ` Josh Poimboeuf
                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-09  9:27 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote:
> struct ftrace_regs_stack {
> 	ftrace_func_t func;
> 	unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pt_regs *regs)
> {
> 	struct ftrace_regs_stack *st = (void *)regs->sp;
> 	ftrace_func_t func = st->func;
> 
> 	regs->sp += sizeof(long); /* pop func */
> 
> 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> }

Alternatively we can add things like:

static inline unsigned long int3_emulate_pop(struct pt_regs *regs)
{
	unsigned long val = *(unsigned long *)regs->sp;
	regs->sp += sizeof(unsigned long);
	return val;
}

And do:

	ftrace_func_t func = (void *)int3_emulate_pop(regs);


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09  8:14               ` Peter Zijlstra
  2019-05-09  9:27                 ` Peter Zijlstra
@ 2019-05-09 14:00                 ` Josh Poimboeuf
  2019-05-09 14:01                 ` Masami Hiramatsu
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Josh Poimboeuf @ 2019-05-09 14:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote:
> But what I'd love to do is something like the below patch, and make all
> the trampolines (very much including ftrace) use that. Such that we then
> only have 1 copy of this magic (well, 2 because x86_64 also needs an
> implementation of this of course).
> 
> Changing ftrace over to this would be a little more work but it can
> easily chain things a little to get its original context back:
> 
> ENTRY(ftrace_regs_caller)
> GLOBAL(ftrace_regs_func)
> 	push $ftrace_stub
> 	push $ftrace_regs_handler
> 	jmp call_to_exception_trampoline
> END(ftrace_regs_caller)
> 
> typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> 
> struct ftrace_regs_stack {
> 	ftrace_func_t func;
> 	unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pt_regs *regs)
> {
> 	struct ftrace_regs_stack *st = (void *)regs->sp;
> 	ftrace_func_t func = st->func;
> 
> 	regs->sp += sizeof(long); /* pop func */
> 
> 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> }
> 
> Hmm? I didn't look into the function_graph thing, but I imagine it can
> be added without too much pain.

I like this patch a lot, assuming it can be made to work for the
different users.

-- 
Josh


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09  8:14               ` Peter Zijlstra
  2019-05-09  9:27                 ` Peter Zijlstra
  2019-05-09 14:00                 ` Josh Poimboeuf
@ 2019-05-09 14:01                 ` Masami Hiramatsu
  2019-05-09 17:14                   ` Peter Zijlstra
  2019-05-09 16:20                 ` Andy Lutomirski
  2019-05-09 17:37                 ` Steven Rostedt
  4 siblings, 1 reply; 36+ messages in thread
From: Masami Hiramatsu @ 2019-05-09 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Thu, 9 May 2019 10:14:31 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, May 09, 2019 at 10:20:30AM +0900, Masami Hiramatsu wrote:
> > Hi Josh,
> > 
> > On Wed, 8 May 2019 13:48:48 -0500
> > Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > 
> > > On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
> > > > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> > > > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > > > Do the x86_64 variants also want some ORC annotation?
> > > > > 
> > > > > Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> > > > > registers might need to be tweaked.  I'll need to look into it.
> > > > 
> > > > What all these sites do (and maybe we should look at unifying them
> > > > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
> > > > pt_regs).
> > > > 
> > > > So regs->ip will be the return address (which is fixed up to be the CALL
> > > > address in the handler).
> > > 
> > > But from what I can tell, trampoline_handler() hard-codes regs->ip to
> > > point to kretprobe_trampoline(), and the original return address is
> > > placed in regs->sp.
> > > 
> > > Masami, is there a reason why regs->ip doesn't have the original return
> > > address and regs->sp doesn't have the original SP?  I think that would
> > > help the unwinder understand things.
> > 
> > Yes, for regs->ip, there is a historical reason. Since previously, we had
> > an int3 at trampoline, so the user (kretprobe) handler expects that
> > regs->ip is trampoline address and ri->ret_addr is original return address.
> > It is better to check the other archs, but I think it is possible to
> > change the regs->ip to original return address, since no one cares such
> > "fixed address". :)
> > 
> > For the regs->sp, there are 2 reasons.
> > 
> > For x86-64, it's just for over-optimizing (reduce stack usage).
> > I think we can make a gap for putting return address, something like
> > 
> > 	"kretprobe_trampoline:\n"
> > #ifdef CONFIG_X86_64
> > 	"	pushq %rsp\n"	/* Make a gap for return address */
> > 	"	pushq 0(%rsp)\n"	/* Copy original stack pointer */
> > 	"	pushfq\n"
> > 	SAVE_REGS_STRING
> > 	"	movq %rsp, %rdi\n"
> > 	"	call trampoline_handler\n"
> > 	/* Push the true return address to the bottom */
> > 	"	movq %rax, 20*8(%rsp)\n"
> > 	RESTORE_REGS_STRING
> > 	"	popfq\n"
> > 	"	addq $8, %rsp\n"	/* Skip original stack pointer */
> > 
> > For i386 (x86-32), there is no other way to keep &regs->sp as
> > the original stack pointer. It has to be changed with this series,
> > maybe as same as x86-64.
> 
> Right; I already fixed that in my patch changing i386's pt_regs.

I see it, and it is good to me. :)

> But what I'd love to do is something like the below patch, and make all
> the trampolines (very much including ftrace) use that. Such that we then
> only have 1 copy of this magic (well, 2 because x86_64 also needs an
> implementation of this of course).

OK, but I will make kretprobe integrated with func-graph tracer,
since it is inefficient that we have 2 different hidden return stacks...

Anyway,

> Changing ftrace over to this would be a little more work but it can
> easily chain things a little to get its original context back:
> 
> ENTRY(ftrace_regs_caller)
> GLOBAL(ftrace_regs_func)
> 	push $ftrace_stub
> 	push $ftrace_regs_handler
> 	jmp call_to_exception_trampoline
> END(ftrace_regs_caller)
> 
> typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> 
> struct ftrace_regs_stack {
> 	ftrace_func_t func;
> 	unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pt_regs *regs)
> {
> 	struct ftrace_regs_stack *st = (void *)regs->sp;
> 	ftrace_func_t func = st->func;
> 
> 	regs->sp += sizeof(long); /* pop func */

Sorry, why pop here? 

> 
> 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> }
> 
> Hmm? I didn't look into the function_graph thing, but I imagine it can
> be added without too much pain.

Yes, that should be good for function_graph trampoline too.
We use a very similar technique.

> 
> ---
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
>  	call	do_exit
>  1:	jmp 1b
>  END(rewind_stack_do_exit)
> +
> +/*
> + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
> + * just did was in fact scribbled with an INT3.
> + *
> + * Use this trampoline like:
> + *
> + *   PUSH $func
> + *   JMP call_to_exception_trampoline
> + *
> + * $func will see regs->ip point at the CALL instruction and must therefore
> + * modify regs->ip in order to make progress (just like a normal INT3 scribbled
> + * CALL).
> + *
> + * NOTE: we do not restore any of the segment registers.
> + */
> +ENTRY(call_to_exception_trampoline)
> +	/*
> +	 * On entry the stack looks like:
> +	 *
> +	 *   2*4(%esp) <previous context>
> +	 *   1*4(%esp) RET-IP
> +	 *   0*4(%esp) func
> +	 *
> +	 * transform this into:
> +	 *
> +	 *  19*4(%esp) <previous context>
> +	 *  18*4(%esp) gap / RET-IP
> +	 *  17*4(%esp) gap / func
> +	 *  16*4(%esp) ss
> +	 *  15*4(%esp) sp / <previous context>

isn't this "&<previous context>" ?

> +	 *  14*4(%esp) flags
> +	 *  13*4(%esp) cs
> +	 *  12*4(%esp) ip / RET-IP
> +	 *  11*4(%esp) orig_eax
> +	 *  10*4(%esp) gs
> +	 *   9*4(%esp) fs
> +	 *   8*4(%esp) es
> +	 *   7*4(%esp) ds
> +	 *   6*4(%esp) eax
> +	 *   5*4(%esp) ebp
> +	 *   4*4(%esp) edi
> +	 *   3*4(%esp) esi
> +	 *   2*4(%esp) edx
> +	 *   1*4(%esp) ecx
> +	 *   0*4(%esp) ebx
> +	 */
> +	pushl	%ss
> +	pushl	%esp		# points at ss
> +	addl	$3*4, (%esp)	#   point it at <previous context>
> +	pushfl
> +	pushl	%cs
> +	pushl	5*4(%esp)	# RET-IP
> +	subl	$5, (%esp)	#   point at CALL instruction
> +	pushl	$-1
> +	pushl	%gs
> +	pushl	%fs
> +	pushl	%es
> +	pushl	%ds
> +	pushl	%eax
> +	pushl	%ebp
> +	pushl	%edi
> +	pushl	%esi
> +	pushl	%edx
> +	pushl	%ecx
> +	pushl	%ebx
> +
> +	ENCODE_FRAME_POINTER
> +
> +	movl	%esp, %eax	# 1st argument: pt_regs
> +
> +	movl	17*4(%esp), %ebx	# func
> +	CALL_NOSPEC %ebx
> +
> +	movl	PT_OLDESP(%esp), %eax

Is PT_OLDESP(%esp) "<previous context>" or "&<previous contex>"?

> +
> +	movl	PT_EIP(%esp), %ecx
> +	movl	%ecx, -1*4(%eax)

Ah, OK, so $func must set the true return address to regs->ip
instead of returning it.

> +
> +	movl	PT_EFLAGS(%esp), %ecx
> +	movl	%ecx, -2*4(%eax)
> +
> +	movl	PT_EAX(%esp), %ecx
> +	movl	%ecx, -3*4(%eax)

So, at this point, the stack becomes

 18*4(%esp) RET-IP
 17*4(%esp) eflags
 16*4(%esp) eax

Correct?

> +
> +	popl	%ebx
> +	popl	%ecx
> +	popl	%edx
> +	popl	%esi
> +	popl	%edi
> +	popl	%ebp
> +
> +	lea	-3*4(%eax), %esp
> +	popl	%eax
> +	popfl
> +	ret
> +END(call_to_exception_trampoline)
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -731,29 +731,8 @@ asm(
>  	".global kretprobe_trampoline\n"
>  	".type kretprobe_trampoline, @function\n"
>  	"kretprobe_trampoline:\n"
> -	/* We don't bother saving the ss register */
> -#ifdef CONFIG_X86_64
> -	"	pushq %rsp\n"
> -	"	pushfq\n"
> -	SAVE_REGS_STRING
> -	"	movq %rsp, %rdi\n"
> -	"	call trampoline_handler\n"
> -	/* Replace saved sp with true return address. */
> -	"	movq %rax, 19*8(%rsp)\n"
> -	RESTORE_REGS_STRING
> -	"	popfq\n"
> -#else
> -	"	pushl %esp\n"
> -	"	pushfl\n"
> -	SAVE_REGS_STRING
> -	"	movl %esp, %eax\n"
> -	"	call trampoline_handler\n"
> -	/* Replace saved sp with true return address. */
> -	"	movl %eax, 15*4(%esp)\n"
> -	RESTORE_REGS_STRING
> -	"	popfl\n"
> -#endif
> -	"	ret\n"

Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
the address which is returned from the target function. We have no 
"ret-ip" here at this point. So something like

+	"push $0\n"	/* This is a gap, will be filled with real return address*/

> +	"push $trampoline_handler\n"
> +	"jmp call_to_exception_trampoline\n"
>  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
>  );
>  NOKPROBE_SYMBOL(kretprobe_trampoline);
> @@ -791,12 +770,7 @@ static __used void *trampoline_handler(s
>  
>  	INIT_HLIST_HEAD(&empty_rp);
>  	kretprobe_hash_lock(current, &head, &flags);
> -	/* fixup registers */
> -	regs->cs = __KERNEL_CS;
> -#ifdef CONFIG_X86_32
> -	regs->cs |= get_kernel_rpl();
> -	regs->gs = 0;
> -#endif
> +
>  	/* We use pt_regs->sp for return address holder. */
>  	frame_pointer = &regs->sp;
>  	regs->ip = trampoline_address;

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09  8:14               ` Peter Zijlstra
                                   ` (2 preceding siblings ...)
  2019-05-09 14:01                 ` Masami Hiramatsu
@ 2019-05-09 16:20                 ` Andy Lutomirski
  2019-05-09 17:18                   ` Peter Zijlstra
  2019-05-09 17:43                   ` Steven Rostedt
  2019-05-09 17:37                 ` Steven Rostedt
  4 siblings, 2 replies; 36+ messages in thread
From: Andy Lutomirski @ 2019-05-09 16:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds,
	Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange,
	Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest



> On May 9, 2019, at 1:14 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Thu, May 09, 2019 at 10:20:30AM +0900, Masami Hiramatsu wrote:
>> Hi Josh,
>> 
>> On Wed, 8 May 2019 13:48:48 -0500
>> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> 
>>>> On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
>>>>> On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
>>>>> On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
>>>> 
>>>>>> Do the x86_64 variants also want some ORC annotation?
>>>>> 
>>>>> Maybe so.  Though it looks like regs->ip isn't saved.  The saved
>>>>> registers might need to be tweaked.  I'll need to look into it.
>>>> 
>>>> What all these sites do (and maybe we should look at unifying them
>>>> somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
>>>> pt_regs).
>>>> 
>>>> So regs->ip will be the return address (which is fixed up to be the CALL
>>>> address in the handler).
>>> 
>>> But from what I can tell, trampoline_handler() hard-codes regs->ip to
>>> point to kretprobe_trampoline(), and the original return address is
>>> placed in regs->sp.
>>> 
>>> Masami, is there a reason why regs->ip doesn't have the original return
>>> address and regs->sp doesn't have the original SP?  I think that would
>>> help the unwinder understand things.
>> 
>> Yes, for regs->ip, there is a historical reason. Since previously, we had
>> an int3 at trampoline, so the user (kretprobe) handler expects that
>> regs->ip is trampoline address and ri->ret_addr is original return address.
>> It is better to check the other archs, but I think it is possible to
>> change the regs->ip to original return address, since no one cares such
>> "fixed address". :)
>> 
>> For the regs->sp, there are 2 reasons.
>> 
>> For x86-64, it's just for over-optimizing (reduce stack usage).
>> I think we can make a gap for putting return address, something like
>> 
>>    "kretprobe_trampoline:\n"
>> #ifdef CONFIG_X86_64
>>    "    pushq %rsp\n"    /* Make a gap for return address */
>>    "    pushq 0(%rsp)\n"    /* Copy original stack pointer */
>>    "    pushfq\n"
>>    SAVE_REGS_STRING
>>    "    movq %rsp, %rdi\n"
>>    "    call trampoline_handler\n"
>>    /* Push the true return address to the bottom */
>>    "    movq %rax, 20*8(%rsp)\n"
>>    RESTORE_REGS_STRING
>>    "    popfq\n"
>>    "    addq $8, %rsp\n"    /* Skip original stack pointer */
>> 
>> For i386 (x86-32), there is no other way to keep &regs->sp as
>> the original stack pointer. It has to be changed with this series,
>> maybe as same as x86-64.
> 
> Right; I already fixed that in my patch changing i386's pt_regs.
> 
> But what I'd love to do is something like the below patch, and make all
> the trampolines (very much including ftrace) use that. Such that we then
> only have 1 copy of this magic (well, 2 because x86_64 also needs an
> implementation of this of course).
> 
> Changing ftrace over to this would be a little more work but it can
> easily chain things a little to get its original context back:
> 
> ENTRY(ftrace_regs_caller)
> GLOBAL(ftrace_regs_func)
>    push $ftrace_stub
>    push $ftrace_regs_handler
>    jmp call_to_exception_trampoline
> END(ftrace_regs_caller)
> 
> typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> 
> struct ftrace_regs_stack {
>    ftrace_func_t func;
>    unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pt_regs *regs)
> {
>    struct ftrace_regs_stack *st = (void *)regs->sp;
>    ftrace_func_t func = st->func;
> 
>    regs->sp += sizeof(long); /* pop func */
> 
>    func(regs->ip, st->parent_ip, function_trace_op, regs);
> }
> 
> Hmm? I didn't look into the function_graph thing, but I imagine it can
> be added without too much pain.
> 
> ---
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
>    call    do_exit
> 1:    jmp 1b
> END(rewind_stack_do_exit)
> +
> +/*
> + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
> + * just did was in fact scribbled with an INT3.
> + *
> + * Use this trampoline like:
> + *
> + *   PUSH $func
> + *   JMP call_to_exception_trampoline
> + *
> + * $func will see regs->ip point at the CALL instruction and must therefore
> + * modify regs->ip in order to make progress (just like a normal INT3 scribbled
> + * CALL).
> + *
> + * NOTE: we do not restore any of the segment registers.
> + */
> +ENTRY(call_to_exception_trampoline)
> +    /*
> +     * On entry the stack looks like:
> +     *
> +     *   2*4(%esp) <previous context>
> +     *   1*4(%esp) RET-IP
> +     *   0*4(%esp) func
> +     *
> +     * transform this into:
> +     *
> +     *  19*4(%esp) <previous context>
> +     *  18*4(%esp) gap / RET-IP
> +     *  17*4(%esp) gap / func
> +     *  16*4(%esp) ss
> +     *  15*4(%esp) sp / <previous context>
> +     *  14*4(%esp) flags
> +     *  13*4(%esp) cs
> +     *  12*4(%esp) ip / RET-IP
> +     *  11*4(%esp) orig_eax
> +     *  10*4(%esp) gs
> +     *   9*4(%esp) fs
> +     *   8*4(%esp) es
> +     *   7*4(%esp) ds
> +     *   6*4(%esp) eax
> +     *   5*4(%esp) ebp
> +     *   4*4(%esp) edi
> +     *   3*4(%esp) esi
> +     *   2*4(%esp) edx
> +     *   1*4(%esp) ecx
> +     *   0*4(%esp) ebx
> +     */
> +    pushl    %ss
> +    pushl    %esp        # points at ss
> +    addl    $3*4, (%esp)    #   point it at <previous context>
> +    pushfl
> +    pushl    %cs
> +    pushl    5*4(%esp)    # RET-IP
> +    subl    $5, (%esp)    #   point at CALL instruction
> +    pushl    $-1
> +    pushl    %gs
> +    pushl    %fs
> +    pushl    %es
> +    pushl    %ds
> +    pushl    %eax
> +    pushl    %ebp
> +    pushl    %edi
> +    pushl    %esi
> +    pushl    %edx
> +    pushl    %ecx
> +    pushl    %ebx
> +
> +    ENCODE_FRAME_POINTER
> +
> +    movl    %esp, %eax    # 1st argument: pt_regs
> +
> +    movl    17*4(%esp), %ebx    # func
> +    CALL_NOSPEC %ebx
> +
> +    movl    PT_OLDESP(%esp), %eax
> +
> +    movl    PT_EIP(%esp), %ecx
> +    movl    %ecx, -1*4(%eax)
> +
> +    movl    PT_EFLAGS(%esp), %ecx
> +    movl    %ecx, -2*4(%eax)
> +
> +    movl    PT_EAX(%esp), %ecx
> +    movl    %ecx, -3*4(%eax)
> +
> +    popl    %ebx
> +    popl    %ecx
> +    popl    %edx
> +    popl    %esi
> +    popl    %edi
> +    popl    %ebp
> +
> +    lea    -3*4(%eax), %esp
> +    popl    %eax
> +    popfl
> +    ret
> +END(call_to_exception_trampoline)
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -731,29 +731,8 @@ asm(
>    ".global kretprobe_trampoline\n"
>    ".type kretprobe_trampoline, @function\n"
>    "kretprobe_trampoline:\n"
> -    /* We don't bother saving the ss register */
> -#ifdef CONFIG_X86_64
> -    "    pushq %rsp\n"
> -    "    pushfq\n"
> -    SAVE_REGS_STRING
> -    "    movq %rsp, %rdi\n"
> -    "    call trampoline_handler\n"
> -    /* Replace saved sp with true return address. */
> -    "    movq %rax, 19*8(%rsp)\n"
> -    RESTORE_REGS_STRING
> -    "    popfq\n"
> -#else
> -    "    pushl %esp\n"
> -    "    pushfl\n"
> -    SAVE_REGS_STRING
> -    "    movl %esp, %eax\n"
> -    "    call trampoline_handler\n"
> -    /* Replace saved sp with true return address. */
> -    "    movl %eax, 15*4(%esp)\n"
> -    RESTORE_REGS_STRING
> -    "    popfl\n"
> -#endif
> -    "    ret\n"
> +    "push $trampoline_handler\n"
> +    "jmp call_to_exception_trampoline\n"
>    ".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> );


Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever.  This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl.


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09 14:01                 ` Masami Hiramatsu
@ 2019-05-09 17:14                   ` Peter Zijlstra
  2019-05-10  4:58                     ` Masami Hiramatsu
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-09 17:14 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Thu, May 09, 2019 at 11:01:06PM +0900, Masami Hiramatsu wrote:
> On Thu, 9 May 2019 10:14:31 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > But what I'd love to do is something like the below patch, and make all
> > the trampolines (very much including ftrace) use that. Such that we then
> > only have 1 copy of this magic (well, 2 because x86_64 also needs an
> > implementation of this of course).
> 
> OK, but I will make kretprobe integrated with func-graph tracer,
> since it is inefficient that we have 2 different hidden return stack...
> 
> Anyway,
> 
> > Changing ftrace over to this would be a little more work but it can
> > easily chain things a little to get its original context back:
> > 
> > ENTRY(ftrace_regs_caller)
> > GLOBAL(ftrace_regs_func)
> > 	push $ftrace_stub
> > 	push $ftrace_regs_handler
> > 	jmp call_to_exception_trampoline
> > END(ftrace_regs_caller)
> > 
> > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> > 
> > struct ftrace_regs_stack {
> > 	ftrace_func_t func;
> > 	unsigned long parent_ip;
> > };
> > 
> > void ftrace_regs_handler(struct pt_regs *regs)
> > {
> > 	struct ftrace_regs_stack *st = (void *)regs->sp;
> > 	ftrace_func_t func = st->func;
> > 
> > 	regs->sp += sizeof(long); /* pop func */
> 
> Sorry, why pop here? 

Otherwise it stays on the return stack and bad things happen. Note how
the below trampoline thing uses regs->sp.

> > 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> > }
> > 
> > Hmm? I didn't look into the function_graph thing, but I imagine it can
> > be added without too much pain.
> 
> Yes, that should be good for function_graph trampoline too.
> We use very similar technic.

Ideally also the optimized kprobe trampoline, but I've not managed to
fully comprehend that one.

> > 
> > ---
> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
> >  	call	do_exit
> >  1:	jmp 1b
> >  END(rewind_stack_do_exit)
> > +
> > +/*
> > + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
> > + * just did was in fact scribbled with an INT3.
> > + *
> > + * Use this trampoline like:
> > + *
> > + *   PUSH $func
> > + *   JMP call_to_exception_trampoline
> > + *
> > + * $func will see regs->ip point at the CALL instruction and must therefore
> > + * modify regs->ip in order to make progress (just like a normal INT3 scribbled
> > + * CALL).
> > + *
> > + * NOTE: we do not restore any of the segment registers.
> > + */
> > +ENTRY(call_to_exception_trampoline)
> > +	/*
> > +	 * On entry the stack looks like:
> > +	 *
> > +	 *   2*4(%esp) <previous context>
> > +	 *   1*4(%esp) RET-IP
> > +	 *   0*4(%esp) func
> > +	 *
> > +	 * transform this into:
> > +	 *
> > +	 *  19*4(%esp) <previous context>
> > +	 *  18*4(%esp) gap / RET-IP
> > +	 *  17*4(%esp) gap / func
> > +	 *  16*4(%esp) ss
> > +	 *  15*4(%esp) sp / <previous context>
> 
> isn't this "&<previous context>" ?

Yes.

> > +	 *  14*4(%esp) flags
> > +	 *  13*4(%esp) cs
> > +	 *  12*4(%esp) ip / RET-IP
> > +	 *  11*4(%esp) orig_eax
> > +	 *  10*4(%esp) gs
> > +	 *   9*4(%esp) fs
> > +	 *   8*4(%esp) es
> > +	 *   7*4(%esp) ds
> > +	 *   6*4(%esp) eax
> > +	 *   5*4(%esp) ebp
> > +	 *   4*4(%esp) edi
> > +	 *   3*4(%esp) esi
> > +	 *   2*4(%esp) edx
> > +	 *   1*4(%esp) ecx
> > +	 *   0*4(%esp) ebx
> > +	 */
> > +	pushl	%ss
> > +	pushl	%esp		# points at ss
> > +	addl	$3*4, (%esp)	#   point it at <previous context>
> > +	pushfl
> > +	pushl	%cs
> > +	pushl	5*4(%esp)	# RET-IP
> > +	subl	$5, (%esp)	#   point at CALL instruction
> > +	pushl	$-1
> > +	pushl	%gs
> > +	pushl	%fs
> > +	pushl	%es
> > +	pushl	%ds
> > +	pushl	%eax
> > +	pushl	%ebp
> > +	pushl	%edi
> > +	pushl	%esi
> > +	pushl	%edx
> > +	pushl	%ecx
> > +	pushl	%ebx
> > +
> > +	ENCODE_FRAME_POINTER
> > +
> > +	movl	%esp, %eax	# 1st argument: pt_regs
> > +
> > +	movl	17*4(%esp), %ebx	# func
> > +	CALL_NOSPEC %ebx
> > +
> > +	movl	PT_OLDESP(%esp), %eax
> 
> Is PT_OLDESP(%esp) "<previous context>" or "&<previous contex>"?

The latter.

> > +
> > +	movl	PT_EIP(%esp), %ecx
> > +	movl	%ecx, -1*4(%eax)
> 
> Ah, OK, so $func must set the true return address to regs->ip
> instead of returning it.

Just so.

> > +
> > +	movl	PT_EFLAGS(%esp), %ecx
> > +	movl	%ecx, -2*4(%eax)
> > +
> > +	movl	PT_EAX(%esp), %ecx
> > +	movl	%ecx, -3*4(%eax)
> 
> So, at this point, the stack becomes
> 
  3*4(%esp) &regs->sp
  2*4(%esp) RET-IP
  1*4(%esp) eflags
  0*4(%esp) eax

> Correct?

Yes, relative to regs->sp, which is why we need to pop 'func', otherwise
it stays on the stack.

> > +
> > +	popl	%ebx
> > +	popl	%ecx
> > +	popl	%edx
> > +	popl	%esi
> > +	popl	%edi
> > +	popl	%ebp
> > +
> > +	lea	-3*4(%eax), %esp
> > +	popl	%eax
> > +	popfl
> > +	ret
> > +END(call_to_exception_trampoline)
> > --- a/arch/x86/kernel/kprobes/core.c
> > +++ b/arch/x86/kernel/kprobes/core.c
> > @@ -731,29 +731,8 @@ asm(
> >  	".global kretprobe_trampoline\n"
> >  	".type kretprobe_trampoline, @function\n"
> >  	"kretprobe_trampoline:\n"
> > -	/* We don't bother saving the ss register */
> > -#ifdef CONFIG_X86_64
> > -	"	pushq %rsp\n"
> > -	"	pushfq\n"
> > -	SAVE_REGS_STRING
> > -	"	movq %rsp, %rdi\n"
> > -	"	call trampoline_handler\n"
> > -	/* Replace saved sp with true return address. */
> > -	"	movq %rax, 19*8(%rsp)\n"
> > -	RESTORE_REGS_STRING
> > -	"	popfq\n"
> > -#else
> > -	"	pushl %esp\n"
> > -	"	pushfl\n"
> > -	SAVE_REGS_STRING
> > -	"	movl %esp, %eax\n"
> > -	"	call trampoline_handler\n"
> > -	/* Replace saved sp with true return address. */
> > -	"	movl %eax, 15*4(%esp)\n"
> > -	RESTORE_REGS_STRING
> > -	"	popfl\n"
> > -#endif
> > -	"	ret\n"
> 
> Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
> the address which is returned from the target function. We have no 
> "ret-ip" here at this point. So something like
> 
> +	"push $0\n"	/* This is a gap, will be filled with real return address*/

The trampoline already provides a gap, trampoline_handler() will need to
use int3_emulate_push() if it wants to inject something on the return
stack.

> > +	"push $trampoline_handler\n"
> > +	"jmp call_to_exception_trampoline\n"
> >  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> >  );
> >  NOKPROBE_SYMBOL(kretprobe_trampoline);


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09 16:20                 ` Andy Lutomirski
@ 2019-05-09 17:18                   ` Peter Zijlstra
  2019-05-09 17:43                   ` Steven Rostedt
  1 sibling, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-09 17:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds,
	Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange,
	Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest

On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote:
> > +ENTRY(call_to_exception_trampoline)
> > +    /*
> > +     * On entry the stack looks like:
> > +     *
> > +     *   2*4(%esp) <previous context>
> > +     *   1*4(%esp) RET-IP
> > +     *   0*4(%esp) func
> > +     *
> > +     * transform this into:
> > +     *
> > +     *  19*4(%esp) <previous context>
> > +     *  18*4(%esp) gap / RET-IP
> > +     *  17*4(%esp) gap / func
> > +     *  16*4(%esp) ss
> > +     *  15*4(%esp) sp / <previous context>
> > +     *  14*4(%esp) flags
> > +     *  13*4(%esp) cs
> > +     *  12*4(%esp) ip / RET-IP
> > +     *  11*4(%esp) orig_eax
> > +     *  10*4(%esp) gs
> > +     *   9*4(%esp) fs
> > +     *   8*4(%esp) es
> > +     *   7*4(%esp) ds
> > +     *   6*4(%esp) eax
> > +     *   5*4(%esp) ebp
> > +     *   4*4(%esp) edi
> > +     *   3*4(%esp) esi
> > +     *   2*4(%esp) edx
> > +     *   1*4(%esp) ecx
> > +     *   0*4(%esp) ebx
> > +     */
> > +    pushl    %ss
> > +    pushl    %esp        # points at ss
> > +    addl    $3*4, (%esp)    #   point it at <previous context>
> > +    pushfl
> > +    pushl    %cs
> > +    pushl    5*4(%esp)    # RET-IP
> > +    subl    $5, (%esp)    #   point at CALL instruction
> > +    pushl    $-1
> > +    pushl    %gs
> > +    pushl    %fs
> > +    pushl    %es
> > +    pushl    %ds
> > +    pushl    %eax
> > +    pushl    %ebp
> > +    pushl    %edi
> > +    pushl    %esi
> > +    pushl    %edx
> > +    pushl    %ecx
> > +    pushl    %ebx
> > +
> > +    ENCODE_FRAME_POINTER
> > +
> > +    movl    %esp, %eax    # 1st argument: pt_regs
> > +
> > +    movl    17*4(%esp), %ebx    # func
> > +    CALL_NOSPEC %ebx
> > +
> > +    movl    PT_OLDESP(%esp), %eax
> > +
> > +    movl    PT_EIP(%esp), %ecx
> > +    movl    %ecx, -1*4(%eax)
> > +
> > +    movl    PT_EFLAGS(%esp), %ecx
> > +    movl    %ecx, -2*4(%eax)
> > +
> > +    movl    PT_EAX(%esp), %ecx
> > +    movl    %ecx, -3*4(%eax)
> > +
> > +    popl    %ebx
> > +    popl    %ecx
> > +    popl    %edx
> > +    popl    %esi
> > +    popl    %edi
> > +    popl    %ebp
> > +
> > +    lea    -3*4(%eax), %esp
> > +    popl    %eax
> > +    popfl
> > +    ret
> > +END(call_to_exception_trampoline)

> Potentially minor nit: you’re doing popfl, but you’re not doing
> TRACE_IRQ_whatever.  This makes me think that you should either add
> the tracing (ugh!) or you should maybe just skip the popfl.

Yeah, so we really should not change flags I suppose. If this lives I'll
remove the popfl.


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09  8:14               ` Peter Zijlstra
                                   ` (3 preceding siblings ...)
  2019-05-09 16:20                 ` Andy Lutomirski
@ 2019-05-09 17:37                 ` Steven Rostedt
  2019-05-09 18:26                   ` Peter Zijlstra
  4 siblings, 1 reply; 36+ messages in thread
From: Steven Rostedt @ 2019-05-09 17:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds,
	Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange,
	Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest

On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote:
> 
> Right; I already fixed that in my patch changing i386's pt_regs.
> 
> But what I'd love to do is something like the below patch, and make all
> the trampolines (very much including ftrace) use that. Such that we then
> only have 1 copy of this magic (well, 2 because x86_64 also needs an
> implementation of this of course).
> 
> Changing ftrace over to this would be a little more work but it can
> easily chain things a little to get its original context back:
> 
> ENTRY(ftrace_regs_caller)
> GLOBAL(ftrace_regs_func)
> 	push ftrace_stub
> 	push ftrace_regs_handler

Note, ftrace_stub is dynamically modified to remove any indirect calls.

> 	jmp call_to_exception_trampoline
> END(ftrace_regs_caller)
> 
> typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> 
> struct ftrace_regs_stack {
> 	ftrace_func_t func;
> 	unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pr_regs *regs)
> {
> 	struct ftrace_regs_stack *st = (void *)regs->sp;
> 	ftrace_func_t func = st->func;
> 
> 	regs->sp += sizeof(long); /* pop func */
> 
> 	func(regs->ip, st->parent_ip, function_trace_op, regs);

I try very hard to limit all indirect function calls from the function tracing
path, as they do add noticeable overhead.

-- Steve

> }
> 
> Hmm? I didn't look into the function_graph thing, but I imagine it can
> be added without too much pain.
> 


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09 16:20                 ` Andy Lutomirski
  2019-05-09 17:18                   ` Peter Zijlstra
@ 2019-05-09 17:43                   ` Steven Rostedt
  2019-05-10  3:21                     ` Masami Hiramatsu
  1 sibling, 1 reply; 36+ messages in thread
From: Steven Rostedt @ 2019-05-09 17:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf, linux-kernel,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski,
	Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin,
	x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest

On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote:
> > +END(call_to_exception_trampoline)
> > --- a/arch/x86/kernel/kprobes/core.c
> > +++ b/arch/x86/kernel/kprobes/core.c
> > @@ -731,29 +731,8 @@ asm(
> >    ".global kretprobe_trampoline\n"
> >    ".type kretprobe_trampoline, @function\n"
> >    "kretprobe_trampoline:\n"
> > -    /* We don't bother saving the ss register */
> > -#ifdef CONFIG_X86_64
> > -    "    pushq %rsp\n"
> > -    "    pushfq\n"
> > -    SAVE_REGS_STRING
> > -    "    movq %rsp, %rdi\n"
> > -    "    call trampoline_handler\n"
> > -    /* Replace saved sp with true return address. */
> > -    "    movq %rax, 19*8(%rsp)\n"
> > -    RESTORE_REGS_STRING
> > -    "    popfq\n"
> > -#else
> > -    "    pushl %esp\n"
> > -    "    pushfl\n"
> > -    SAVE_REGS_STRING
> > -    "    movl %esp, %eax\n"
> > -    "    call trampoline_handler\n"
> > -    /* Replace saved sp with true return address. */
> > -    "    movl %eax, 15*4(%esp)\n"
> > -    RESTORE_REGS_STRING
> > -    "    popfl\n"
> > -#endif
> > -    "    ret\n"
> > +    "push trampoline_handler\n"
> > +    "jmp call_to_exception_trampoline\n"
> >    ".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > );
> 
> 
> Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever.  This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl.


Note, kprobes (and ftrace for that matter) are not saving flags because of
interrupts, but because they must not modify the sign, zero and carry flags.

-- Steve



* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09 17:37                 ` Steven Rostedt
@ 2019-05-09 18:26                   ` Peter Zijlstra
  2019-05-09 18:36                     ` Steven Rostedt
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-09 18:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds,
	Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange,
	Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest

On Thu, May 09, 2019 at 01:37:42PM -0400, Steven Rostedt wrote:
> On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote:
> > 
> > Right; I already fixed that in my patch changing i386's pt_regs.
> > 
> > But what I'd love to do is something like the below patch, and make all
> > the trampolines (very much including ftrace) use that. Such that we then
> > only have 1 copy of this magic (well, 2 because x86_64 also needs an
> > implementation of this of course).
> > 
> > Changing ftrace over to this would be a little more work but it can
> > easily chain things a little to get its original context back:
> > 
> > ENTRY(ftrace_regs_caller)
> > GLOBAL(ftrace_regs_func)
> > 	push ftrace_stub
> > 	push ftrace_regs_handler
> 
> Note, ftrace_stub is dynamically modified to remove any indirect calls.

Yeah, I realized that a few hours after I sent this out; as you might
have seen by the IRC chatter on this.

Still, maybe we can wrap the thing in a .macro and reuse things that
way. Because I really hate there are at least 3 (x2 for x86_64) copies
of this around.


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09 18:26                   ` Peter Zijlstra
@ 2019-05-09 18:36                     ` Steven Rostedt
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Rostedt @ 2019-05-09 18:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu, Josh Poimboeuf, linux-kernel, Linus Torvalds,
	Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange,
	Thomas Gleixner, Borislav Petkov, H. Peter Anvin, x86,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest

On Thu, 9 May 2019 20:26:22 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Still, maybe we can wrap the thing in a .macro and reuse things that
> way. Because I really hate there are at least 3 (x2 for x86_64) copies
> of this around.

I'm good with something like this. Having a single place that does the
pt_regs saving would be great. Then I could separate ftrace_regs_caller
from ftrace_caller, and simplify them both, as I would only need to
make sure frame pointers still work for ftrace_caller, and
ftrace_regs_caller would depend on the macro to make sure it works.

-- Steve


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09 17:43                   ` Steven Rostedt
@ 2019-05-10  3:21                     ` Masami Hiramatsu
  2019-05-10 12:14                       ` Peter Zijlstra
  2019-05-10 12:17                       ` Peter Zijlstra
  0 siblings, 2 replies; 36+ messages in thread
From: Masami Hiramatsu @ 2019-05-10  3:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Peter Zijlstra, Masami Hiramatsu,
	Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Thu, 9 May 2019 13:43:16 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote:
> > > +END(call_to_exception_trampoline)
> > > --- a/arch/x86/kernel/kprobes/core.c
> > > +++ b/arch/x86/kernel/kprobes/core.c
> > > @@ -731,29 +731,8 @@ asm(
> > >    ".global kretprobe_trampoline\n"
> > >    ".type kretprobe_trampoline, @function\n"
> > >    "kretprobe_trampoline:\n"
> > > -    /* We don't bother saving the ss register */
> > > -#ifdef CONFIG_X86_64
> > > -    "    pushq %rsp\n"
> > > -    "    pushfq\n"
> > > -    SAVE_REGS_STRING
> > > -    "    movq %rsp, %rdi\n"
> > > -    "    call trampoline_handler\n"
> > > -    /* Replace saved sp with true return address. */
> > > -    "    movq %rax, 19*8(%rsp)\n"
> > > -    RESTORE_REGS_STRING
> > > -    "    popfq\n"
> > > -#else
> > > -    "    pushl %esp\n"
> > > -    "    pushfl\n"
> > > -    SAVE_REGS_STRING
> > > -    "    movl %esp, %eax\n"
> > > -    "    call trampoline_handler\n"
> > > -    /* Replace saved sp with true return address. */
> > > -    "    movl %eax, 15*4(%esp)\n"
> > > -    RESTORE_REGS_STRING
> > > -    "    popfl\n"
> > > -#endif
> > > -    "    ret\n"
> > > +    "push trampoline_handler\n"
> > > +    "jmp call_to_exception_trampoline\n"
> > >    ".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > > );
> > 
> > 
> > Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever.  This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl.
> 
> 
> > Note, kprobes (and ftrace for that matter) are not saving flags because of
> > interrupts, but because they must not modify the sign, zero and carry flags.

Yes, optprobe also has to save and restore the flags.
The above trampoline is for kretprobe, which is placed at the function return,
so we don't have to care about flags.

Thanks,
-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-09 17:14                   ` Peter Zijlstra
@ 2019-05-10  4:58                     ` Masami Hiramatsu
  2019-05-10 12:31                       ` Peter Zijlstra
  2019-05-10 12:40                       ` Peter Zijlstra
  0 siblings, 2 replies; 36+ messages in thread
From: Masami Hiramatsu @ 2019-05-10  4:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Thu, 9 May 2019 19:14:16 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, May 09, 2019 at 11:01:06PM +0900, Masami Hiramatsu wrote:
> > On Thu, 9 May 2019 10:14:31 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > But what I'd love to do is something like the below patch, and make all
> > > the trampolines (very much including ftrace) use that. Such that we then
> > > only have 1 copy of this magic (well, 2 because x86_64 also needs an
> > > implementation of this of course).
> > 
> > OK, but I will make kretprobe integrated with func-graph tracer,
> > since it is inefficient that we have 2 different hidden return stacks...
> > 
> > Anyway,
> > 
> > > Changing ftrace over to this would be a little more work but it can
> > > easily chain things a little to get its original context back:
> > > 
> > > ENTRY(ftrace_regs_caller)
> > > GLOBAL(ftrace_regs_func)
> > > 	push ftrace_stub
> > > 	push ftrace_regs_handler
> > > 	jmp call_to_exception_trampoline
> > > END(ftrace_regs_caller)
> > > 
> > > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> > > 
> > > struct ftrace_regs_stack {
> > > 	ftrace_func_t func;
> > > 	unsigned long parent_ip;
> > > };
> > > 
> > > void ftrace_regs_handler(struct pr_regs *regs)
> > > {
> > > 	struct ftrace_regs_stack *st = (void *)regs->sp;
> > > 	ftrace_func_t func = st->func;
> > > 
> > > 	regs->sp += sizeof(long); /* pop func */
> > 
> > Sorry, why pop here? 
> 
> Otherwise it stays on the return stack and bad things happen. Note how
> the below trampoline thing uses regs->sp.
> 
> > > 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> > > }
> > > 
> > > Hmm? I didn't look into the function_graph thing, but I imagine it can
> > > be added without too much pain.
> > 
> > Yes, that should be good for function_graph trampoline too.
> > We use a very similar technique.
> 
> Ideally also the optimized kprobe trampoline, but I've not managed to
> fully comprehend that one.

As you pointed in other reply, save/restore can be a macro, but
each trampoline code is slightly different. Optprobe template has
below parts

(jumped from probed address)
[store regs]
[setup function arguments (pt_regs and probed address)]
[handler call]
[restore regs]
[execute copied instruction]
[jump back to probed address]

Note that there is a limitation: if it is an optimized probe, the user
handler can not change regs->ip. (We can not use "ret" after executing
a copied instruction, which must run on the same stack.)

> 
> > > 
> > > ---
> > > --- a/arch/x86/entry/entry_32.S
> > > +++ b/arch/x86/entry/entry_32.S
> > > @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
> > >  	call	do_exit
> > >  1:	jmp 1b
> > >  END(rewind_stack_do_exit)
> > > +
> > > +/*
> > > + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
> > > + * just did was in fact scribbled with an INT3.
> > > + *
> > > + * Use this trampoline like:
> > > + *
> > > + *   PUSH $func
> > > + *   JMP call_to_exception_trampoline
> > > + *
> > > + * $func will see regs->ip point at the CALL instruction and must therefore
> > > + * modify regs->ip in order to make progress (just like a normal INT3 scribbled
> > > + * CALL).
> > > + *
> > > + * NOTE: we do not restore any of the segment registers.
> > > + */
> > > +ENTRY(call_to_exception_trampoline)
> > > +	/*
> > > +	 * On entry the stack looks like:
> > > +	 *
> > > +	 *   2*4(%esp) <previous context>
> > > +	 *   1*4(%esp) RET-IP
> > > +	 *   0*4(%esp) func
> > > +	 *
> > > +	 * transform this into:
> > > +	 *
> > > +	 *  19*4(%esp) <previous context>
> > > +	 *  18*4(%esp) gap / RET-IP
> > > +	 *  17*4(%esp) gap / func
> > > +	 *  16*4(%esp) ss
> > > +	 *  15*4(%esp) sp / <previous context>
> > 
> > isn't this "&<previous context>" ?
> 
> Yes.
> 
> > > +	 *  14*4(%esp) flags
> > > +	 *  13*4(%esp) cs
> > > +	 *  12*4(%esp) ip / RET-IP
> > > +	 *  11*4(%esp) orig_eax
> > > +	 *  10*4(%esp) gs
> > > +	 *   9*4(%esp) fs
> > > +	 *   8*4(%esp) es
> > > +	 *   7*4(%esp) ds
> > > +	 *   6*4(%esp) eax
> > > +	 *   5*4(%esp) ebp
> > > +	 *   4*4(%esp) edi
> > > +	 *   3*4(%esp) esi
> > > +	 *   2*4(%esp) edx
> > > +	 *   1*4(%esp) ecx
> > > +	 *   0*4(%esp) ebx
> > > +	 */
> > > +	pushl	%ss
> > > +	pushl	%esp		# points at ss
> > > +	addl	$3*4, (%esp)	#   point it at <previous context>
> > > +	pushfl
> > > +	pushl	%cs
> > > +	pushl	5*4(%esp)	# RET-IP
> > > +	subl	$5, (%esp)	#   point at CALL instruction
> > > +	pushl	$-1
> > > +	pushl	%gs
> > > +	pushl	%fs
> > > +	pushl	%es
> > > +	pushl	%ds
> > > +	pushl	%eax
> > > +	pushl	%ebp
> > > +	pushl	%edi
> > > +	pushl	%esi
> > > +	pushl	%edx
> > > +	pushl	%ecx
> > > +	pushl	%ebx
> > > +
> > > +	ENCODE_FRAME_POINTER
> > > +
> > > +	movl	%esp, %eax	# 1st argument: pt_regs
> > > +
> > > +	movl	17*4(%esp), %ebx	# func
> > > +	CALL_NOSPEC %ebx
> > > +
> > > +	movl	PT_OLDESP(%esp), %eax
> > 
> > Is PT_OLDESP(%esp) "<previous context>" or "&<previous contex>"?
> 
> The latter.
> 
> > > +
> > > +	movl	PT_EIP(%esp), %ecx
> > > +	movl	%ecx, -1*4(%eax)
> > 
> > Ah, OK, so $func must set the true return address to regs->ip
> > instead of returning it.
> 
> Just so.
> 
> > > +
> > > +	movl	PT_EFLAGS(%esp), %ecx
> > > +	movl	%ecx, -2*4(%eax)
> > > +
> > > +	movl	PT_EAX(%esp), %ecx
> > > +	movl	%ecx, -3*4(%eax)
> > 
> > So, at this point, the stack becomes
> > 
>   3*4(%esp) &regs->sp
>   2*4(%esp) RET-IP
>   1*4(%esp) eflags
>   0*4(%esp) eax
> 
> > Correct?
> 
> Yes, relative to regs->sp, which is why we need to pop 'func', otherwise
> it stays on the stack.
> 
> > > +
> > > +	popl	%ebx
> > > +	popl	%ecx
> > > +	popl	%edx
> > > +	popl	%esi
> > > +	popl	%edi
> > > +	popl	%ebp
> > > +
> > > +	lea	-3*4(%eax), %esp
> > > +	popl	%eax
> > > +	popfl
> > > +	ret
> > > +END(call_to_exception_trampoline)
> > > --- a/arch/x86/kernel/kprobes/core.c
> > > +++ b/arch/x86/kernel/kprobes/core.c
> > > @@ -731,29 +731,8 @@ asm(
> > >  	".global kretprobe_trampoline\n"
> > >  	".type kretprobe_trampoline, @function\n"
> > >  	"kretprobe_trampoline:\n"
> > > -	/* We don't bother saving the ss register */
> > > -#ifdef CONFIG_X86_64
> > > -	"	pushq %rsp\n"
> > > -	"	pushfq\n"
> > > -	SAVE_REGS_STRING
> > > -	"	movq %rsp, %rdi\n"
> > > -	"	call trampoline_handler\n"
> > > -	/* Replace saved sp with true return address. */
> > > -	"	movq %rax, 19*8(%rsp)\n"
> > > -	RESTORE_REGS_STRING
> > > -	"	popfq\n"
> > > -#else
> > > -	"	pushl %esp\n"
> > > -	"	pushfl\n"
> > > -	SAVE_REGS_STRING
> > > -	"	movl %esp, %eax\n"
> > > -	"	call trampoline_handler\n"
> > > -	/* Replace saved sp with true return address. */
> > > -	"	movl %eax, 15*4(%esp)\n"
> > > -	RESTORE_REGS_STRING
> > > -	"	popfl\n"
> > > -#endif
> > > -	"	ret\n"
> > 
> > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
> > the address which is returned from the target function. We have no 
> > "ret-ip" here at this point. So something like
> > 
> > +	"push $0\n"	/* This is a gap, will be filled with real return address*/
> 
> The trampoline already provides a gap, trampoline_handler() will need to
> use int3_emulate_push() if it wants to inject something on the return
> stack.

I guess you mean the int3 case. This trampoline is used as a return destination.
When the target function is called, kretprobe interrupts its first instruction
and replaces the return address with this trampoline. When a "ret" instruction
executes, control returns to this trampoline. Thus the stack frame starts with
the previous context here. As you described above,

> > > +	 * On entry the stack looks like:
> > > +	 *
> > > +	 *   2*4(%esp) <previous context>
> > > +	 *   1*4(%esp) RET-IP
> > > +	 *   0*4(%esp) func

From this trampoline call, the stack looks like:

	 *   1*4(%esp) <previous context>
	 *   0*4(%esp) func

So we need one more push.

> 
> > > +	"push trampoline_handler\n"
> > > +	"jmp call_to_exception_trampoline\n"
> > >  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > >  );
> > >  NOKPROBE_SYMBOL(kretprobe_trampoline);

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-10  3:21                     ` Masami Hiramatsu
@ 2019-05-10 12:14                       ` Peter Zijlstra
  2019-05-10 12:17                       ` Peter Zijlstra
  1 sibling, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-10 12:14 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Andy Lutomirski, Josh Poimboeuf, linux-kernel,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski,
	Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin,
	x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest

On Fri, May 10, 2019 at 12:21:03PM +0900, Masami Hiramatsu wrote:
> On Thu, 9 May 2019 13:43:16 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote:
> > > > +END(call_to_exception_trampoline)
> > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > @@ -731,29 +731,8 @@ asm(
> > > >    ".global kretprobe_trampoline\n"
> > > >    ".type kretprobe_trampoline, @function\n"
> > > >    "kretprobe_trampoline:\n"
> > > > -    /* We don't bother saving the ss register */
> > > > -#ifdef CONFIG_X86_64
> > > > -    "    pushq %rsp\n"
> > > > -    "    pushfq\n"
> > > > -    SAVE_REGS_STRING
> > > > -    "    movq %rsp, %rdi\n"
> > > > -    "    call trampoline_handler\n"
> > > > -    /* Replace saved sp with true return address. */
> > > > -    "    movq %rax, 19*8(%rsp)\n"
> > > > -    RESTORE_REGS_STRING
> > > > -    "    popfq\n"
> > > > -#else
> > > > -    "    pushl %esp\n"
> > > > -    "    pushfl\n"
> > > > -    SAVE_REGS_STRING
> > > > -    "    movl %esp, %eax\n"
> > > > -    "    call trampoline_handler\n"
> > > > -    /* Replace saved sp with true return address. */
> > > > -    "    movl %eax, 15*4(%esp)\n"
> > > > -    RESTORE_REGS_STRING
> > > > -    "    popfl\n"
> > > > -#endif
> > > > -    "    ret\n"
> > > > +    "push trampoline_handler\n"
> > > > +    "jmp call_to_exception_trampoline\n"
> > > >    ".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > > > );
> > > 
> > > 
> > > Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever.  This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl.
> > 
> > 
> > Note, kprobes (and ftrace for that matter) are not saving flags because of
> > interrupts, but because they must not modify the sign, zero and carry flags.

Uh, the C ABI considers those clobbered over function calls, surely. Relying
on those flags over a CALL would be _insane_.


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-10  3:21                     ` Masami Hiramatsu
  2019-05-10 12:14                       ` Peter Zijlstra
@ 2019-05-10 12:17                       ` Peter Zijlstra
  2019-05-10 14:54                         ` Steven Rostedt
  1 sibling, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-10 12:17 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Andy Lutomirski, Josh Poimboeuf, linux-kernel,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski,
	Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin,
	x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest

On Fri, May 10, 2019 at 12:21:03PM +0900, Masami Hiramatsu wrote:
> Yes, optprobe also has to save and restore the flags.
> The above trampoline is for kretprobe, which is placed at the function return,
> so we don't have to care about flags.

Sure, optprobe is actually special here, because it branches out at
'random' places and does indeed need to preserve flags.

But both ftrace and retprobes are at C function call boundaries.
Preserving flags doesn't make sense.


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-10  4:58                     ` Masami Hiramatsu
@ 2019-05-10 12:31                       ` Peter Zijlstra
  2019-05-11  0:52                         ` Masami Hiramatsu
  2019-05-10 12:40                       ` Peter Zijlstra
  1 sibling, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-10 12:31 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> On Thu, 9 May 2019 19:14:16 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > Ideally also the optimized kprobe trampoline, but I've not managed to
> > fully comprehend that one.
> 
> As you pointed in other reply, save/restore can be a macro, but
> each trampoline code is slightly different. Optprobe template has
> below parts
> 
> (jumped from probed address)
> [store regs]
> [setup function arguments (pt_regs and probed address)]
> [handler call]
> [restore regs]
> [execute copied instruction]

 instruction_s_ ?

The JMP to this trampoline is likely 5 bytes and could have clobbered
multiple instructions; we'd then have to place them all here, and

> [jump back to probed address]

jump to after whatever instructions were clobbered by the JMP.

> > Note that there is a limitation: if it is an optimized probe, the user
> > handler can not change regs->ip. (We can not use "ret" after executing
> > a copied instruction, which must run on the same stack.)

Changing regs->ip in this case is going to be massively dodgy indeed :-)
But so would changing much else; changing stack layout would also be
somewhat tricky.


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-10  4:58                     ` Masami Hiramatsu
  2019-05-10 12:31                       ` Peter Zijlstra
@ 2019-05-10 12:40                       ` Peter Zijlstra
  2019-05-11  0:56                         ` Masami Hiramatsu
  1 sibling, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-10 12:40 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> On Thu, 9 May 2019 19:14:16 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > @@ -731,29 +731,8 @@ asm(
> > > >  	".global kretprobe_trampoline\n"
> > > >  	".type kretprobe_trampoline, @function\n"
> > > >  	"kretprobe_trampoline:\n"

> > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
> > > the address which is returned from the target function. We have no 
> > > "ret-ip" here at this point. So something like
> > > 
> > > +	"push $0\n"	/* This is a gap, will be filled with real return address*/
> > 
> > The trampoline already provides a gap, trampoline_handler() will need to
> > use int3_emulate_push() if it wants to inject something on the return
> > stack.
> 
> I guess you mean the int3 case. This trampoline is used as a return destination.

> When the target function is called, kretprobe interrupts its first instruction
> and replaces the return address with this trampoline. When a "ret" instruction
> executes, control returns to this trampoline. Thus the stack frame starts with
> the previous context here. As you described above,

I would prefer to change that to inject an extra return address, instead
of replacing it. With the new exception stuff we can actually do that.

So on entry we then go from:

	<previous context>
	RET-IP

to

	<previous context>
	RET-IP
	return-trampoline

So when the function returns, it falls into the trampoline instead.

> > > > +	 * On entry the stack looks like:
> > > > +	 *
> > > > +	 *   2*4(%esp) <previous context>
> > > > +	 *   1*4(%esp) RET-IP
> > > > +	 *   0*4(%esp) func
> 
> From this trampoline call, the stack looks like:
> 
> 	 *   1*4(%esp) <previous context>
> 	 *   0*4(%esp) func
> 
> So we need one more push.

And then the stack looks just right at this point.

> > > > +	"push trampoline_handler\n"
> > > > +	"jmp call_to_exception_trampoline\n"
> > > >  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > > >  );
> > > >  NOKPROBE_SYMBOL(kretprobe_trampoline);


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-10 12:17                       ` Peter Zijlstra
@ 2019-05-10 14:54                         ` Steven Rostedt
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Rostedt @ 2019-05-10 14:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu, Andy Lutomirski, Josh Poimboeuf, linux-kernel,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Andy Lutomirski,
	Nicolai Stange, Thomas Gleixner, Borislav Petkov, H. Peter Anvin,
	x86, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	linux-kselftest

On Fri, 10 May 2019 14:17:20 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> But both ftrace and retprobes are at C function call boundaries.
> Preserving flags doesn't make sense.

I agree, but I did it just because of my OCD and being complete in
"emulating an int3" for ftrace_regs_caller ;-)

Yeah, we can remove the popfl from the ftrace trampoline.

-- Steve


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-10 12:31                       ` Peter Zijlstra
@ 2019-05-11  0:52                         ` Masami Hiramatsu
  0 siblings, 0 replies; 36+ messages in thread
From: Masami Hiramatsu @ 2019-05-11  0:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Fri, 10 May 2019 14:31:31 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> > On Thu, 9 May 2019 19:14:16 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > Ideally also the optimized kprobe trampoline, but I've not managed to
> > > fully comprehend that one.
> > 
> > As you pointed in other reply, save/restore can be a macro, but
> > each trampoline code is slightly different. Optprobe template has
> > below parts
> > 
> > (jumped from probed address)
> > [store regs]
> > [setup function arguments (pt_regs and probed address)]
> > [handler call]
> > [restore regs]
> > [execute copied instruction]
> 
>  instruction_s_ ?

Yes.

> 
> The JMP to this trampoline is likely 5 bytes and could have clobbered
> multiple instructions, we'd then have to place them all here, and
> 
> > [jump back to probed address]
> 
> jump to after whatever instructions were clobbered by the JMP.

Right!

> > Note that there is a limitation: if it is an optimized probe, the user
> > handler can not change regs->ip. (We can not use "ret" after executing
> > the copied instructions, which must run on the same stack.)
> 
> Changing regs->ip in this case is going to be massively dodgy indeed :-)
> But so would changing much else; changing stack layout would also be
> somewhat tricky.

Yes, so the stack must be the same after [restore regs].

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-10 12:40                       ` Peter Zijlstra
@ 2019-05-11  0:56                         ` Masami Hiramatsu
  2019-05-13  8:15                           ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Masami Hiramatsu @ 2019-05-11  0:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Fri, 10 May 2019 14:40:54 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> > On Thu, 9 May 2019 19:14:16 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > > @@ -731,29 +731,8 @@ asm(
> > > > >  	".global kretprobe_trampoline\n"
> > > > >  	".type kretprobe_trampoline, @function\n"
> > > > >  	"kretprobe_trampoline:\n"
> 
> > > > Here, we need a gap for storing the ret-ip, because kretprobe_trampoline
> > > > is the address to which the target function returns. We have no "ret-ip"
> > > > here at this point. So something like
> > > > 
> > > > +	"push $0\n"	/* This is a gap, will be filled with real return address*/
> > > 
> > > The trampoline already provides a gap, trampoline_handler() will need to
> > > use int3_emulate_push() if it wants to inject something on the return
> > > stack.
> > 
> > I guess you mean the int3 case. This trampoline is used as a return destination.
> 
> > When the target function is called, kretprobe interrupts the first instruction
> > and replaces the return address with this trampoline. When a "ret" instruction
> > is executed, it returns to this trampoline. Thus the stack frame starts with
> > the previous context here. As you described above,
> 
> I would prefer to change that to inject an extra return address, instead
> of replacing it. With the new exception stuff we can actually do that.
> 
> So on entry we then go from:
> 
> 	<previous context>
> 	RET-IP
> 
> to
> 
> 	<previous context>
> 	RET-IP
> 	return-trampoline
> 
> So when the function returns, it falls into the trampoline instead.

Is that really possible? On x86-64, most parameters are passed in registers,
but on x86-32 (and in rare cases on x86-64) some parameters are passed on the
stack. If we change the stack layout in the function prologue, the code in the
function body can not access those parameters on the stack.

Thank you,

> 
> > > > > +	 * On entry the stack looks like:
> > > > > +	 *
> > > > > +	 *   2*4(%esp) <previous context>
> > > > > +	 *   1*4(%esp) RET-IP
> > > > > +	 *   0*4(%esp) func
> > 
> > From this trampoline call, the stack looks like:
> > 
> > 	 *   1*4(%esp) <previous context>
> > 	 *   0*4(%esp) func
> > 
> > So we need one more push.
> 
> And then the stack looks just right at this point.
> 
> > > > > +	"push trampoline_handler\n"
> > > > > +	"jmp call_to_exception_trampoline\n"
> > > > >  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > > > >  );
> > > > >  NOKPROBE_SYMBOL(kretprobe_trampoline);


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations
  2019-05-11  0:56                         ` Masami Hiramatsu
@ 2019-05-13  8:15                           ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-05-13  8:15 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Josh Poimboeuf, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, linux-kselftest

On Sat, May 11, 2019 at 09:56:55AM +0900, Masami Hiramatsu wrote:
> On Fri, 10 May 2019 14:40:54 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> > > On Thu, 9 May 2019 19:14:16 +0200
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > > > @@ -731,29 +731,8 @@ asm(
> > > > > >  	".global kretprobe_trampoline\n"
> > > > > >  	".type kretprobe_trampoline, @function\n"
> > > > > >  	"kretprobe_trampoline:\n"
> > 
> > > > > Here, we need a gap for storing the ret-ip, because kretprobe_trampoline
> > > > > is the address to which the target function returns. We have no "ret-ip"
> > > > > here at this point. So something like
> > > > > 
> > > > > +	"push $0\n"	/* This is a gap, will be filled with real return address*/
> > > > 
> > > > The trampoline already provides a gap, trampoline_handler() will need to
> > > > use int3_emulate_push() if it wants to inject something on the return
> > > > stack.
> > > 
> > > I guess you mean the int3 case. This trampoline is used as a return destination.
> > 
> > > When the target function is called, kretprobe interrupts the first instruction
> > > and replaces the return address with this trampoline. When a "ret" instruction
> > > is executed, it returns to this trampoline. Thus the stack frame starts with
> > > the previous context here. As you described above,
> > 
> > I would prefer to change that to inject an extra return address, instead
> > of replacing it. With the new exception stuff we can actually do that.
> > 
> > So on entry we then go from:
> > 
> > 	<previous context>
> > 	RET-IP
> > 
> > to
> > 
> > 	<previous context>
> > 	RET-IP
> > 	return-trampoline
> > 
> > So when the function returns, it falls into the trampoline instead.
> 
> Is that really possible? On x86-64, most parameters are passed in registers,
> but on x86-32 (and in rare cases on x86-64) some parameters are passed on the
> stack. If we change the stack layout in the function prologue, the code in the
> function body can not access those parameters on the stack.

Ooh, I see what you mean... yes that might be trouble indeed. Damn..


end of thread, other threads:[~2019-05-13  8:15 UTC | newest]

Thread overview: 36+ messages
2019-05-08  7:49 [PATCH 0/4] x86: int3 fallout Peter Zijlstra
2019-05-08  7:49 ` [PATCH 1/4] x86/entry/32: Clean up return from interrupt preemption path Peter Zijlstra
2019-05-08  7:49 ` [PATCH 2/4] x86/kprobes: Fix frame pointer annotations Peter Zijlstra
2019-05-08 11:54   ` Josh Poimboeuf
2019-05-08 12:04     ` Peter Zijlstra
2019-05-08 12:40       ` Peter Zijlstra
2019-05-08 12:42       ` Josh Poimboeuf
2019-05-08 15:39         ` Peter Zijlstra
2019-05-08 18:48           ` Josh Poimboeuf
2019-05-09  1:20             ` Masami Hiramatsu
2019-05-09  8:14               ` Peter Zijlstra
2019-05-09  9:27                 ` Peter Zijlstra
2019-05-09 14:00                 ` Josh Poimboeuf
2019-05-09 14:01                 ` Masami Hiramatsu
2019-05-09 17:14                   ` Peter Zijlstra
2019-05-10  4:58                     ` Masami Hiramatsu
2019-05-10 12:31                       ` Peter Zijlstra
2019-05-11  0:52                         ` Masami Hiramatsu
2019-05-10 12:40                       ` Peter Zijlstra
2019-05-11  0:56                         ` Masami Hiramatsu
2019-05-13  8:15                           ` Peter Zijlstra
2019-05-09 16:20                 ` Andy Lutomirski
2019-05-09 17:18                   ` Peter Zijlstra
2019-05-09 17:43                   ` Steven Rostedt
2019-05-10  3:21                     ` Masami Hiramatsu
2019-05-10 12:14                       ` Peter Zijlstra
2019-05-10 12:17                       ` Peter Zijlstra
2019-05-10 14:54                         ` Steven Rostedt
2019-05-09 17:37                 ` Steven Rostedt
2019-05-09 18:26                   ` Peter Zijlstra
2019-05-09 18:36                     ` Steven Rostedt
2019-05-08  7:49 ` [PATCH 3/4] x86/ftrace: Add pt_regs frame annotations Peter Zijlstra
2019-05-08  7:49 ` [RFC][PATCH 4/4] x86_32: Provide consistent pt_regs Peter Zijlstra
2019-05-08 11:57   ` Josh Poimboeuf
2019-05-08 12:46     ` Ingo Molnar
2019-05-08 20:58   ` Linus Torvalds
