* [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code
@ 2016-08-04 22:21 Josh Poimboeuf
  2016-08-04 22:21 ` [PATCH v2 01/44] x86/dumpstack: remove show_trace() Josh Poimboeuf
                   ` (43 more replies)
  0 siblings, 44 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:21 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

There are a lot of changes since last time.  See the v2 changelog for
more details.

A git branch is available at:
  
  https://github.com/jpoimboe/linux unwind-v2

Based on tip/master.

v2 changelog:
- split up several of the patches and reorder them with lower-risk
  patches first
- add a lot more comments
- remove the 64-byte gap at the end of the irq stack
- fix some existing ftrace function graph unwinding issues
- fix an existing bug in kernel_stack_pointer()
- clarify the origins of the stack_info "next stack" pointers
- do visit_mask checking in get_stack_info() instead of in_*_stack()
- add some new unwinder warnings
- remove uses of test_and_set_bit()
- don't print regs->ip twice
- remove unwind_state.sp
- have unwind_get_return_address() validate the return address
- change /proc/pid/stack to use %pB
- several minor cleanups and fixes

----

The x86 stack dump code is a bit of a mess.  dump_trace() uses
callbacks, and each user of it seems to have slightly different
requirements, so there are several slightly different callbacks floating
around.

Also there are some upcoming features which will require more changes to
the stack dump code: reliable stack detection for live patching,
hardened user copy, and the DWARF unwinder.  Each of those features
would at least need more callbacks and/or callback interfaces, resulting
in a much bigger mess than what we have today.

Before doing all that, we should try to clean things up and replace
dump_trace() with something cleaner and more flexible.

The new unwinder is a simple state machine which was heavily inspired by
a suggestion from Andy Lutomirski:

  https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com

It's also similar to the libunwind API:

  http://www.nongnu.org/libunwind/man/libunwind(3).html

Some of its advantages:

- simplicity: no more callback sprawl and less code duplication.

- flexibility: allows the caller to stop and inspect the stack state at
  each step in the unwinding process.

- modularity: the unwinder code, console stack dump code, and stack
  metadata analysis code are all better separated so that changing one
  of them shouldn't have much of an impact on any of the others.
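
To illustrate the state-machine style described above, here is a minimal
user-space sketch.  It is not the code from this series: the toy_* names,
the struct layouts, and the linked-list "stack" are all assumptions made
for illustration.  The point is only that the caller owns the loop and can
stop or inspect the state at each step, with no callbacks involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame: a saved frame pointer plus a return address. */
struct toy_frame {
	const struct toy_frame *next;	/* caller's frame, NULL at the end */
	unsigned long ret_addr;		/* return address saved in this frame */
};

/* Hypothetical unwinder state, owned by the caller between steps. */
struct toy_unwind_state {
	const struct toy_frame *frame;
};

static void toy_unwind_start(struct toy_unwind_state *state,
			     const struct toy_frame *first)
{
	state->frame = first;
}

static bool toy_unwind_done(const struct toy_unwind_state *state)
{
	return state->frame == NULL;
}

static unsigned long
toy_unwind_get_return_address(const struct toy_unwind_state *state)
{
	return toy_unwind_done(state) ? 0 : state->frame->ret_addr;
}

static void toy_unwind_next_frame(struct toy_unwind_state *state)
{
	if (!toy_unwind_done(state))
		state->frame = state->frame->next;
}

/* Example consumer: record up to 'max' return addresses in 'entries'. */
size_t toy_save_trace(const struct toy_frame *first,
		      unsigned long *entries, size_t max)
{
	struct toy_unwind_state state;
	size_t n = 0;

	assert(entries != NULL || max == 0);

	for (toy_unwind_start(&state, first);
	     !toy_unwind_done(&state) && n < max;
	     toy_unwind_next_frame(&state))
		entries[n++] = toy_unwind_get_return_address(&state);

	return n;
}
```

A different consumer (say, one that prints each address, or one that stops
at the first suspicious frame) would reuse the same four helpers and only
change the loop body.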


Josh Poimboeuf (44):
  x86/dumpstack: remove show_trace()
  x86/asm/head: remove unused init_rsp variable
  x86/asm/head: rename 'stack_start' -> 'initial_stack'
  x86/asm/head: use a common function for starting CPUs
  x86/dumpstack: make printk_stack_address() more generally useful
  x86/dumpstack: add IRQ_USABLE_STACK_SIZE define
  x86/dumpstack: remove extra brackets around "<EOE>"
  x86/dumpstack: fix irq stack bounds calculation in
    show_stack_log_lvl()
  x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  x86/dumpstack: add get_stack_pointer() and get_frame_pointer()
  x86/dumpstack: remove unnecessary stack pointer arguments
  x86: move _stext marker to before head code
  x86/asm/head: remove useless zeroed word
  x86/asm/head:  put real return address on idle task stack
  perf/x86: check perf_callchain_store() error
  oprofile/x86: add regs->ip to oprofile trace
  proc: fix return address printk conversion specifer in
    /proc/<pid>/stack
  ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config
  ftrace: only allocate the ret_stack 'fp' field when needed
  ftrace: add return address pointer to ftrace_ret_stack
  ftrace: add ftrace_graph_ret_addr() stack unwinding helpers
  x86/dumpstack/ftrace: convert dump_trace() callbacks to use
    ftrace_graph_ret_addr()
  ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
  x86/dumpstack/ftrace: mark function graph handler function as
    unreliable
  x86/dumpstack/ftrace: don't print unreliable addresses in
    print_context_stack_bp()
  x86/dumpstack: allow preemption in show_stack_log_lvl() and
    dump_trace()
  x86/dumpstack: simplify in_exception_stack()
  x86/dumpstack: add get_stack_info() interface
  x86/dumpstack: add recursion checking for all stacks
  x86/unwind: add new unwind interface and implementations
  perf/x86: convert perf_callchain_kernel() to use the new unwinder
  x86/stacktrace: convert save_stack_trace_*() to use the new unwinder
  oprofile/x86: convert x86_backtrace() to use the new unwinder
  x86/dumpstack: convert show_trace_log_lvl() to use the new unwinder
  x86/dumpstack: remove dump_trace() and related callbacks
  x86/entry/unwind: encode pt_regs pointer in frame pointer
  x86/unwind: detect syscall entry regs
  x86/dumpstack: print stack identifier on its own line
  x86/dumpstack: print any pt_regs found on the stack
  x86: remove 64-byte gap at end of irq stack
  x86/asm/head: standardize the end of the stack for idle tasks
  x86/unwind: warn on kernel stack corruption
  x86/unwind: warn on bad stack return address
  x86/unwind: warn if stack grows up

 Documentation/trace/ftrace-design.txt |  11 ++
 arch/arm/kernel/ftrace.c              |   2 +-
 arch/arm64/kernel/entry-ftrace.S      |   2 +-
 arch/arm64/kernel/ftrace.c            |   2 +-
 arch/blackfin/kernel/ftrace-entry.S   |   4 +-
 arch/blackfin/kernel/ftrace.c         |   2 +-
 arch/microblaze/kernel/ftrace.c       |   2 +-
 arch/mips/kernel/ftrace.c             |   4 +-
 arch/parisc/kernel/ftrace.c           |   2 +-
 arch/powerpc/kernel/ftrace.c          |   3 +-
 arch/s390/kernel/ftrace.c             |   3 +-
 arch/sh/kernel/ftrace.c               |   2 +-
 arch/sparc/Kconfig                    |   1 -
 arch/sparc/include/asm/ftrace.h       |   4 +
 arch/sparc/kernel/ftrace.c            |   2 +-
 arch/tile/kernel/ftrace.c             |   2 +-
 arch/x86/Kconfig                      |   1 -
 arch/x86/entry/calling.h              |  21 +++
 arch/x86/entry/entry_64.S             |  10 +-
 arch/x86/events/core.c                |  36 ++--
 arch/x86/include/asm/ftrace.h         |   3 +
 arch/x86/include/asm/kdebug.h         |   2 -
 arch/x86/include/asm/page_64_types.h  |  16 +-
 arch/x86/include/asm/realmode.h       |   2 +-
 arch/x86/include/asm/smp.h            |   3 -
 arch/x86/include/asm/stacktrace.h     | 114 ++++++------
 arch/x86/include/asm/unwind.h         | 104 +++++++++++
 arch/x86/kernel/Makefile              |   6 +
 arch/x86/kernel/acpi/sleep.c          |   2 +-
 arch/x86/kernel/cpu/common.c          |   2 +-
 arch/x86/kernel/dumpstack.c           | 272 ++++++++++++++---------------
 arch/x86/kernel/dumpstack_32.c        | 138 ++++++++-------
 arch/x86/kernel/dumpstack_64.c        | 319 ++++++++++------------------------
 arch/x86/kernel/ftrace.c              |   2 +-
 arch/x86/kernel/head_32.S             |   8 +-
 arch/x86/kernel/head_64.S             |  33 ++--
 arch/x86/kernel/ptrace.c              |   4 +-
 arch/x86/kernel/setup_percpu.c        |   2 +-
 arch/x86/kernel/smpboot.c             |   2 +-
 arch/x86/kernel/stacktrace.c          |  74 ++++----
 arch/x86/kernel/unwind_frame.c        | 222 +++++++++++++++++++++++
 arch/x86/kernel/unwind_guess.c        |  40 +++++
 arch/x86/kernel/vmlinux.lds.S         |   2 +-
 arch/x86/oprofile/backtrace.c         |  49 +++---
 fs/proc/base.c                        |   2 +-
 include/linux/ftrace.h                |  17 +-
 kernel/trace/Kconfig                  |   5 -
 kernel/trace/trace_functions_graph.c  |  67 ++++++-
 48 files changed, 977 insertions(+), 651 deletions(-)
 create mode 100644 arch/x86/include/asm/unwind.h
 create mode 100644 arch/x86/kernel/unwind_frame.c
 create mode 100644 arch/x86/kernel/unwind_guess.c

-- 
2.7.4


* [PATCH v2 01/44] x86/dumpstack: remove show_trace()
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
@ 2016-08-04 22:21 ` Josh Poimboeuf
  2016-08-04 22:21 ` [PATCH v2 02/44] x86/asm/head: remove unused init_rsp variable Josh Poimboeuf
                   ` (42 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:21 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

There is a bewildering array of options for dumping the stack.
Simplify things a little by removing show_trace(), which is unused.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/kdebug.h | 2 --
 arch/x86/kernel/dumpstack.c   | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h
index 1ef9d58..d318811 100644
--- a/arch/x86/include/asm/kdebug.h
+++ b/arch/x86/include/asm/kdebug.h
@@ -24,8 +24,6 @@ enum die_val {
 extern void printk_address(unsigned long address);
 extern void die(const char *, struct pt_regs *,long);
 extern int __must_check __die(const char *, struct pt_regs *, long);
-extern void show_trace(struct task_struct *t, struct pt_regs *regs,
-		       unsigned long *sp, unsigned long bp);
 extern void show_stack_regs(struct pt_regs *regs);
 extern void __show_regs(struct pt_regs *regs, int all);
 extern unsigned long oops_begin(void);
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 92e8f0a..5f49c04 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -182,12 +182,6 @@ show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	dump_trace(task, regs, stack, bp, &print_trace_ops, log_lvl);
 }
 
-void show_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp)
-{
-	show_trace_log_lvl(task, regs, stack, bp, "");
-}
-
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
 	unsigned long bp = 0;
-- 
2.7.4


* [PATCH v2 02/44] x86/asm/head: remove unused init_rsp variable
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
  2016-08-04 22:21 ` [PATCH v2 01/44] x86/dumpstack: remove show_trace() Josh Poimboeuf
@ 2016-08-04 22:21 ` Josh Poimboeuf
  2016-08-04 22:21 ` [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:21 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

The init_rsp variable is unused.  Remove it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/realmode.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 9c6b890..b2b407b 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -44,7 +44,6 @@ struct trampoline_header {
 extern struct real_mode_header *real_mode_header;
 extern unsigned char real_mode_blob_end[];
 
-extern unsigned long init_rsp;
 extern unsigned long initial_code;
 extern unsigned long initial_gs;
 
-- 
2.7.4


* [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack'
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
  2016-08-04 22:21 ` [PATCH v2 01/44] x86/dumpstack: remove show_trace() Josh Poimboeuf
  2016-08-04 22:21 ` [PATCH v2 02/44] x86/asm/head: remove unused init_rsp variable Josh Poimboeuf
@ 2016-08-04 22:21 ` Josh Poimboeuf
  2016-08-05 15:28   ` Nilay Vaish
  2016-08-04 22:22 ` [PATCH v2 04/44] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
                   ` (40 subsequent siblings)
  43 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:21 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

The 'stack_start' variable is similar in usage to 'initial_code' and
'initial_gs': they're all stored in head_64.S and they're all updated by
SMP and ACPI suspend before starting a CPU.

Rename it to 'initial_stack' to be consistent with the others.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/realmode.h | 1 +
 arch/x86/include/asm/smp.h      | 3 ---
 arch/x86/kernel/acpi/sleep.c    | 2 +-
 arch/x86/kernel/head_32.S       | 8 ++++----
 arch/x86/kernel/head_64.S       | 9 ++++-----
 arch/x86/kernel/smpboot.c       | 2 +-
 6 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index b2b407b..677a671 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -46,6 +46,7 @@ extern unsigned char real_mode_blob_end[];
 
 extern unsigned long initial_code;
 extern unsigned long initial_gs;
+extern unsigned long initial_stack;
 
 extern unsigned char real_mode_blob[];
 extern unsigned char real_mode_relocs[];
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index ebd0c16..19980b3 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -39,9 +39,6 @@ DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_bios_cpu_apicid);
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(int, x86_cpu_to_logical_apicid);
 #endif
 
-/* Static state in head.S used to set up a CPU */
-extern unsigned long stack_start; /* Initial stack pointer address */
-
 struct task_struct;
 
 struct smp_ops {
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index adb3eaf..4858733 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -99,7 +99,7 @@ int x86_acpi_suspend_lowlevel(void)
 	saved_magic = 0x12345678;
 #else /* CONFIG_64BIT */
 #ifdef CONFIG_SMP
-	stack_start = (unsigned long)temp_stack + sizeof(temp_stack);
+	initial_stack = (unsigned long)temp_stack + sizeof(temp_stack);
 	early_gdt_descr.address =
 			(unsigned long)get_cpu_gdt_table(smp_processor_id());
 	initial_gs = per_cpu_offset(smp_processor_id());
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 6f8902b..5f40126 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -94,7 +94,7 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE)
  */
 __HEAD
 ENTRY(startup_32)
-	movl pa(stack_start),%ecx
+	movl pa(initial_stack),%ecx
 	
 	/* test KEEP_SEGMENTS flag to see if the bootloader is asking
 		us to not reload segments */
@@ -286,7 +286,7 @@ num_subarch_entries = (. - subarch_entries) / 4
  * start_secondary().
  */
 ENTRY(start_cpu0)
-	movl stack_start, %ecx
+	movl initial_stack, %ecx
 	movl %ecx, %esp
 	jmp  *(initial_code)
 ENDPROC(start_cpu0)
@@ -307,7 +307,7 @@ ENTRY(startup_32_smp)
 	movl %eax,%es
 	movl %eax,%fs
 	movl %eax,%gs
-	movl pa(stack_start),%ecx
+	movl pa(initial_stack),%ecx
 	movl %eax,%ss
 	leal -__PAGE_OFFSET(%ecx),%esp
 
@@ -703,7 +703,7 @@ ENTRY(initial_page_table)
 
 .data
 .balign 4
-ENTRY(stack_start)
+ENTRY(initial_stack)
 	.long init_thread_union+THREAD_SIZE
 
 __INITRODATA
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 9f8efc9..aa10a53 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -226,7 +226,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip), %rsp
+	movq initial_stack(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -310,7 +310,7 @@ ENDPROC(secondary_startup_64)
  * start_secondary().
  */
 ENTRY(start_cpu0)
-	movq stack_start(%rip),%rsp
+	movq initial_stack(%rip),%rsp
 	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
 	pushq	$__KERNEL_CS	# set correct cs
@@ -319,15 +319,14 @@ ENTRY(start_cpu0)
 ENDPROC(start_cpu0)
 #endif
 
-	/* SMP bootup changes these two */
+	/* Both SMP bootup and ACPI suspend change these variables */
 	__REFDATA
 	.balign	8
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
-
-	GLOBAL(stack_start)
+	GLOBAL(initial_stack)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2a6e84a..2fbd8d7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -962,7 +962,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	initial_code = (unsigned long)start_secondary;
-	stack_start  = idle->thread.sp;
+	initial_stack  = idle->thread.sp;
 
 	/*
 	 * Enable the espfix hack for this CPU
-- 
2.7.4


* [PATCH v2 04/44] x86/asm/head: use a common function for starting CPUs
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (2 preceding siblings ...)
  2016-08-04 22:21 ` [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-05 15:41   ` Nilay Vaish
  2016-08-04 22:22 ` [PATCH v2 05/44] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
                   ` (39 subsequent siblings)
  43 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

There are two different pieces of code for starting a CPU: start_cpu0()
and the end of secondary_startup_64().  They're identical except for the
stack setup.  Combine the common parts into a shared start_cpu()
function.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index aa10a53..8822c20 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -264,13 +264,15 @@ ENTRY(secondary_startup_64)
 	movl	$MSR_GS_BASE,%ecx
 	movl	initial_gs(%rip),%eax
 	movl	initial_gs+4(%rip),%edx
-	wrmsr	
+	wrmsr
 
 	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
 	movq	%rsi, %rdi
-	
-	/* Finally jump to run C code and to be on real kernel address
+
+ENTRY(start_cpu)
+	/*
+	 * Jump to run C code and to be on a real kernel address.
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
@@ -307,15 +309,11 @@ ENDPROC(secondary_startup_64)
 /*
  * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
  * up already except stack. We just set up stack here. Then call
- * start_secondary().
+ * start_secondary() via start_cpu().
  */
 ENTRY(start_cpu0)
-	movq initial_stack(%rip),%rsp
-	movq	initial_code(%rip),%rax
-	pushq	$0		# fake return address to stop unwinder
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
+	movq	initial_stack(%rip), %rsp
+	jmp	start_cpu
 ENDPROC(start_cpu0)
 #endif
 
-- 
2.7.4


* [PATCH v2 05/44] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (3 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 04/44] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 06/44] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
                   ` (38 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Change printk_stack_address() to be useful when called by an unwinder
outside the context of dump_trace().

Specifically:

- printk_stack_address()'s 'data' argument is always used as the log
  level string.  Make that explicit.

- Call touch_nmi_watchdog().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 5f49c04..6b3376d 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -26,10 +26,11 @@ int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
 static int die_counter;
 
 static void printk_stack_address(unsigned long address, int reliable,
-		void *data)
+				 char *log_lvl)
 {
+	touch_nmi_watchdog();
 	printk("%s [<%p>] %s%pB\n",
-		(char *)data, (void *)address, reliable ? "" : "? ",
+		log_lvl, (void *)address, reliable ? "" : "? ",
 		(void *)address);
 }
 
@@ -163,7 +164,6 @@ static int print_trace_stack(void *data, char *name)
  */
 static int print_trace_address(void *data, unsigned long addr, int reliable)
 {
-	touch_nmi_watchdog();
 	printk_stack_address(addr, reliable, data);
 	return 0;
 }
-- 
2.7.4


* [PATCH v2 06/44] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (4 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 05/44] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 07/44] x86/dumpstack: remove extra brackets around "<EOE>" Josh Poimboeuf
                   ` (37 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

For reasons unknown, the x86_64 irq stack starts at an offset 64 bytes
from the end of the page.  At least make that explicit.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/page_64_types.h | 19 +++++++++++--------
 arch/x86/kernel/cpu/common.c         |  2 +-
 arch/x86/kernel/dumpstack_64.c       |  5 +----
 arch/x86/kernel/setup_percpu.c       |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 9215e05..6256baf 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -12,17 +12,20 @@
 #endif
 
 #define THREAD_SIZE_ORDER	(2 + KASAN_STACK_ORDER)
-#define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
-#define CURRENT_MASK (~(THREAD_SIZE - 1))
+#define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
+#define CURRENT_MASK		(~(THREAD_SIZE - 1))
 
-#define EXCEPTION_STACK_ORDER (0 + KASAN_STACK_ORDER)
-#define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
+#define EXCEPTION_STACK_ORDER	(0 + KASAN_STACK_ORDER)
+#define EXCEPTION_STKSZ		(PAGE_SIZE << EXCEPTION_STACK_ORDER)
 
-#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1)
-#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER)
+#define DEBUG_STACK_ORDER	(EXCEPTION_STACK_ORDER + 1)
+#define DEBUG_STKSZ		(PAGE_SIZE << DEBUG_STACK_ORDER)
 
-#define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
-#define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
+#define IRQ_STACK_ORDER		(2 + KASAN_STACK_ORDER)
+#define IRQ_STACK_SIZE		(PAGE_SIZE << IRQ_STACK_ORDER)
+
+/* FIXME: why? */
+#define IRQ_USABLE_STACK_SIZE	(IRQ_STACK_SIZE - 64)
 
 #define DOUBLEFAULT_STACK 1
 #define NMI_STACK 2
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 809eda0..8f3f7a4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1281,7 +1281,7 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 EXPORT_PER_CPU_SYMBOL(current_task);
 
 DEFINE_PER_CPU(char *, irq_stack_ptr) =
-	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE - 64;
+	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_USABLE_STACK_SIZE;
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 9ee4520..43023ae 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -103,9 +103,6 @@ in_irq_stack(unsigned long *stack, unsigned long *irq_stack,
 	return (stack >= irq_stack && stack < irq_stack_end);
 }
 
-static const unsigned long irq_stack_size =
-	(IRQ_STACK_SIZE - 64) / sizeof(unsigned long);
-
 enum stack_type {
 	STACK_IS_UNKNOWN,
 	STACK_IS_NORMAL,
@@ -133,7 +130,7 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
 		return STACK_IS_NORMAL;
 
 	*stack_end = irq_stack;
-	irq_stack = irq_stack - irq_stack_size;
+	irq_stack -= (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
 	if (in_irq_stack(stack, irq_stack, *stack_end))
 		return STACK_IS_IRQ;
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 7a40e06..be6b571 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -246,7 +246,7 @@ void __init setup_per_cpu_areas(void)
 #ifdef CONFIG_X86_64
 		per_cpu(irq_stack_ptr, cpu) =
 			per_cpu(irq_stack_union.irq_stack, cpu) +
-			IRQ_STACK_SIZE - 64;
+			IRQ_USABLE_STACK_SIZE;
 #endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
-- 
2.7.4


* [PATCH v2 07/44] x86/dumpstack: remove extra brackets around "<EOE>"
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (5 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 06/44] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 08/44] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl() Josh Poimboeuf
                   ` (36 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

When starting the dump of an exception stack, the output shows "<<EOE>>"
instead of "<EOE>".  print_trace_stack() already adds brackets, so there
is no need to add them again.
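
The doubled brackets can be reproduced with a small user-space sketch.
toy_format_stack_name() is a hypothetical stand-in for the bracket-adding
behavior of print_trace_stack(); the exact format string used by the
kernel is an assumption here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for print_trace_stack(): wraps the stack name
 * in angle brackets and writes the result into 'buf'.
 */
int toy_format_stack_name(char *buf, size_t len, const char *name)
{
	return snprintf(buf, len, "<%s>", name);
}
```

Passing "<EOE>" to such a helper yields "<<EOE>>", while passing "EOE"
yields the intended "<EOE>".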

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 43023ae..7ea6ed0 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -199,7 +199,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 
 			bp = ops->walk_stack(task, stack, bp, ops,
 					     data, stack_end, &graph);
-			ops->stack(data, "<EOE>");
+			ops->stack(data, "EOE");
 			/*
 			 * We link to the next stack via the
 			 * second-to-last pointer (index -2 to end) in the
-- 
2.7.4


* [PATCH v2 08/44] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl()
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (6 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 07/44] x86/dumpstack: remove extra brackets around "<EOE>" Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 09/44] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
                   ` (35 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

The percpu irq_stack_ptr variable has a 64-byte gap from the end of the
allocated irq stack area, so subtracting IRQ_STACK_SIZE from it actually
results in a value 64 bytes before the beginning of the stack.  Use
IRQ_USABLE_STACK_SIZE instead.
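
The arithmetic can be checked with a small user-space sketch.  The TOY_*
constants and toy_* helpers are illustrative names, not kernel symbols,
and the 16384-byte size assumes 4 KiB pages with KASAN disabled.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values: 4 KiB pages, IRQ_STACK_ORDER == 2, KASAN off. */
#define TOY_IRQ_STACK_SIZE		16384UL
#define TOY_IRQ_STACK_GAP		64UL
#define TOY_IRQ_USABLE_STACK_SIZE	(TOY_IRQ_STACK_SIZE - TOY_IRQ_STACK_GAP)

/* Models irq_stack_ptr: it sits 64 bytes short of the end of the area. */
uintptr_t toy_irq_stack_ptr(uintptr_t base)
{
	return base + TOY_IRQ_USABLE_STACK_SIZE;
}

/* Old calculation: subtracts the full size, so it undershoots the base. */
uintptr_t toy_irq_stack_begin_old(uintptr_t base)
{
	return toy_irq_stack_ptr(base) - TOY_IRQ_STACK_SIZE;
}

/* Fixed calculation: subtracts only the usable size, landing on the base. */
uintptr_t toy_irq_stack_begin_fixed(uintptr_t base)
{
	return toy_irq_stack_ptr(base) - TOY_IRQ_USABLE_STACK_SIZE;
}
```

For any base address, the old result is base - 64 and the fixed result is
base itself.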

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 7ea6ed0..0fdd371 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -253,8 +253,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	preempt_disable();
 	cpu = smp_processor_id();
 
-	irq_stack_end	= (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
-	irq_stack	= (unsigned long *)(per_cpu(irq_stack_ptr, cpu) - IRQ_STACK_SIZE);
+	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
 	/*
 	 * Debugging aid: "show_stack(NULL, NULL);" prints the
-- 
2.7.4


* [PATCH v2 09/44] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (7 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 08/44] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl() Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 10/44] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
                   ` (34 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On x86_32, when an interrupt happens from kernel space, SS and SP aren't
pushed and the existing stack is used.  So pt_regs is effectively two
words shorter, and the previous stack pointer is normally the memory
after the shortened pt_regs, aka '&regs->sp'.

But in the rare case where the interrupt hits right after the stack
pointer has been changed to point to an empty stack, like for example
when call_on_stack() is used, the address immediately after the
shortened pt_regs is no longer on the stack.  In that case, instead of
'&regs->sp', the previous stack pointer should be retrieved from the
beginning of the current stack page.

kernel_stack_pointer() wants to do that, but it forgets to dereference
the pointer.  So instead of returning a pointer to the previous stack,
it returns a pointer to the beginning of the current stack.

Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
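Note (not part of the commit message): a minimal user-space sketch of the
missing dereference, assuming a toy stack page whose first word holds the
previous stack pointer.  The names stack_page, prev_sp_buggy() and
prev_sp_fixed() are hypothetical, and the 0 fallback stands in for the
real function's "return (unsigned long)regs".

```c
#include <stdint.h>

/* Toy model: word 0 of the current stack page stores the previous sp. */
static uintptr_t stack_page[16];

/* Old behaviour: tests and returns the slot's own address, never NULL. */
static uintptr_t prev_sp_buggy(uintptr_t *prev_esp)
{
	if (prev_esp)
		return (uintptr_t)prev_esp;
	return 0;
}

/* Fixed behaviour: tests and returns the value stored in the slot. */
static uintptr_t prev_sp_fixed(uintptr_t *prev_esp)
{
	if (*prev_esp)
		return *prev_esp;
	return 0;
}
```

With a nonzero value stored in stack_page[0], the buggy variant hands back
the address of stack_page itself, while the fixed variant hands back the
stored value.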
 arch/x86/kernel/ptrace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index f79576a..a1606ea 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -173,8 +173,8 @@ unsigned long kernel_stack_pointer(struct pt_regs *regs)
 		return sp;
 
 	prev_esp = (u32 *)(context);
-	if (prev_esp)
-		return (unsigned long)prev_esp;
+	if (*prev_esp)
+		return (unsigned long)*prev_esp;
 
 	return (unsigned long)regs;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 10/44] x86/dumpstack: add get_stack_pointer() and get_frame_pointer()
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (8 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 09/44] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 11/44] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
                   ` (33 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

The various functions involved in dumping the stack all do similar
things with regard to getting the stack pointer and the frame pointer
based on the regs and task arguments.  Create helper functions to
do that instead.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h | 39 ++++++++++++++++++++++-----------------
 arch/x86/kernel/dumpstack.c       |  5 ++---
 arch/x86/kernel/dumpstack_32.c    | 25 ++++---------------------
 arch/x86/kernel/dumpstack_64.c    | 30 ++++--------------------------
 4 files changed, 32 insertions(+), 67 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 0944218..6f65995 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -49,37 +49,42 @@ void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
 
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
-#define get_bp(bp) asm("movl %%ebp, %0" : "=r" (bp) :)
 #else
 #define STACKSLOTS_PER_LINE 4
-#define get_bp(bp) asm("movq %%rbp, %0" : "=r" (bp) :)
 #endif
 
 #ifdef CONFIG_FRAME_POINTER
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
-	unsigned long bp;
-
 	if (regs)
-		return regs->bp;
+		return (unsigned long *)regs->bp;
 
-	if (task == current) {
-		/* Grab bp right from our regs */
-		get_bp(bp);
-		return bp;
-	}
+	if (!task || task == current)
+		return __builtin_frame_address(0);
 
 	/* bp is the last reg pushed by switch_to */
-	return *(unsigned long *)task->thread.sp;
+	return (unsigned long *)*(unsigned long *)task->thread.sp;
 }
 #else
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
 	return 0;
 }
-#endif
+#endif /* CONFIG_FRAME_POINTER */
+
+static inline unsigned long *
+get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
+{
+	if (regs)
+		return (unsigned long *)kernel_stack_pointer(regs);
+
+	if (!task || task == current)
+		return __builtin_frame_address(0);
+
+	return (unsigned long *)task->thread.sp;
+}
 
 extern void
 show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
@@ -106,7 +111,7 @@ static inline unsigned long caller_frame_pointer(void)
 {
 	struct stack_frame *frame;
 
-	get_bp(frame);
+	frame = __builtin_frame_address(0);
 
 #ifdef CONFIG_FRAME_POINTER
 	frame = frame->next_frame;
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 6b3376d..68f42bb 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -185,15 +185,14 @@ show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
 	unsigned long bp = 0;
-	unsigned long stack;
 
 	/*
 	 * Stack frames below this one aren't interesting.  Don't show them
 	 * if we're printing for %current.
 	 */
 	if (!sp && (!task || task == current)) {
-		sp = &stack;
-		bp = stack_frame(current, NULL);
+		sp = get_stack_pointer(current, NULL);
+		bp = (unsigned long)get_frame_pointer(current, NULL);
 	}
 
 	show_stack_log_lvl(task, NULL, sp, bp, "");
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 0967571..358fe1c 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -46,19 +46,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	int graph = 0;
 	u32 *prev_esp;
 
-	if (!task)
-		task = current;
-
-	if (!stack) {
-		unsigned long dummy;
-
-		stack = &dummy;
-		if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
 	for (;;) {
 		void *end_stack;
@@ -95,14 +85,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *stack;
 	int i;
 
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 0fdd371..3c5dbc0 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -151,25 +151,14 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 {
 	const unsigned cpu = get_cpu();
 	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
-	unsigned long dummy;
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
 
-	if (!task)
-		task = current;
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
-	if (!stack) {
-		if (regs)
-			stack = (unsigned long *)regs->sp;
-		else if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-		else
-			stack = &dummy;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
 	/*
 	 * Print function call entries in all stacks, starting at the
 	 * current stack address. If the stacks consist of nested
@@ -256,18 +245,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
 	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
-	/*
-	 * Debugging aid: "show_stack(NULL, NULL);" prints the
-	 * back trace for this cpu:
-	 */
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 11/44] x86/dumpstack: remove unnecessary stack pointer arguments
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (9 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 10/44] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 12/44] x86: move _stext marker to before head code Josh Poimboeuf
                   ` (32 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

When calling show_stack_log_lvl() or dump_trace() with a regs argument,
providing a stack pointer or frame pointer is redundant.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c    | 2 +-
 arch/x86/kernel/dumpstack_32.c | 2 +-
 arch/x86/kernel/dumpstack_64.c | 5 +----
 arch/x86/oprofile/backtrace.c  | 4 +---
 4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 68f42bb..692eecae 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -200,7 +200,7 @@ void show_stack(struct task_struct *task, unsigned long *sp)
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(current, regs, (unsigned long *)regs->sp, regs->bp, "");
+	show_stack_log_lvl(current, regs, NULL, 0, "");
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 358fe1c..c533b8b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -122,7 +122,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		pr_emerg("Stack:\n");
-		show_stack_log_lvl(NULL, regs, &regs->sp, 0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_EMERG);
 
 		pr_emerg("Code:");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 3c5dbc0..491f2fd 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -283,9 +283,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_regs(struct pt_regs *regs)
 {
 	int i;
-	unsigned long sp;
 
-	sp = regs->sp;
 	show_regs_print_info(KERN_DEFAULT);
 	__show_regs(regs, 1);
 
@@ -300,8 +298,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_DEFAULT "Stack:\n");
-		show_stack_log_lvl(NULL, regs, (unsigned long *)sp,
-				   0, KERN_DEFAULT);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_DEFAULT);
 
 		printk(KERN_DEFAULT "Code: ");
 
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index cb31a44..c594768 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,10 +113,8 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
-		unsigned long stack = kernel_stack_pointer(regs);
 		if (depth)
-			dump_trace(NULL, regs, (unsigned long *)stack, 0,
-				   &backtrace_ops, &depth);
+			dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
 		return;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 12/44] x86: move _stext marker to before head code
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (10 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 11/44] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 13/44] x86/asm/head: remove useless zeroed word Josh Poimboeuf
                   ` (31 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

When core_kernel_text() is used to determine whether an address on a
task's stack trace is a kernel text address, it incorrectly returns
false for early text addresses for the head code between the _text and
_stext markers.

Head code is text code too, so mark it as such.  This seems to match the
intent of other users of the _stext symbol, and it also seems consistent
with what other architectures are already doing.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
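Note (not part of the commit message): a rough user-space sketch of why the
marker position matters, assuming core_kernel_text() boils down to a range
check against _stext and _etext.  The struct, the function name and all
addresses below are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker values; head code is assumed to sit at 0x1000..0x10ff. */
struct text_layout {
	uintptr_t stext;	/* value of the _stext marker */
	uintptr_t etext;	/* value of the _etext marker */
};

/* Before this patch: _stext is placed after the head code. */
static const struct text_layout layout_before = { .stext = 0x1100, .etext = 0x2000 };

/* After this patch: _stext coincides with _text. */
static const struct text_layout layout_after  = { .stext = 0x1000, .etext = 0x2000 };

/* Simplified range check in the spirit of core_kernel_text(). */
static bool is_kernel_text(const struct text_layout *l, uintptr_t addr)
{
	return addr >= l->stext && addr < l->etext;
}
```

An address inside the head code, such as 0x1040 in this model, is rejected
with layout_before and accepted with layout_after.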
 arch/x86/kernel/vmlinux.lds.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 9297a00..1d9b636 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -91,10 +91,10 @@ SECTIONS
 	/* Text and read-only data */
 	.text :  AT(ADDR(.text) - LOAD_OFFSET) {
 		_text = .;
+		_stext = .;
 		/* bootstrapping code */
 		HEAD_TEXT
 		. = ALIGN(8);
-		_stext = .;
 		TEXT_TEXT
 		SCHED_TEXT
 		LOCK_TEXT
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 13/44] x86/asm/head: remove useless zeroed word
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (11 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 12/44] x86: move _stext marker to before head code Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-05 16:13   ` Brian Gerst
  2016-08-04 22:22 ` [PATCH v2 14/44] x86/asm/head: put real return address on idle task stack Josh Poimboeuf
                   ` (30 subsequent siblings)
  43 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

This zeroed word has no apparent purpose, so remove it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 8822c20..ac6e27e 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -326,7 +326,6 @@ ENDPROC(start_cpu0)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 	GLOBAL(initial_stack)
 	.quad  init_thread_union+THREAD_SIZE-8
-	.word  0
 	__FINITDATA
 
 bad_address:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 14/44] x86/asm/head: put real return address on idle task stack
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (12 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 13/44] x86/asm/head: remove useless zeroed word Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 15/44] perf/x86: check perf_callchain_store() error Josh Poimboeuf
                   ` (29 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

The frame at the end of each idle task stack has a zero instead of a
real return address.  This is inconsistent with real task stacks, which
have a real return address at that spot.  This inconsistency can be
confusing for stack unwinders.

Make it a real address by using the side effect of a call instruction to
push the instruction pointer on the stack.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index ac6e27e..c910c27 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -296,8 +296,9 @@ ENTRY(start_cpu)
 	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 	 *		address given in m16:64.
 	 */
-	movq	initial_code(%rip),%rax
-	pushq	$0		# fake return address to stop unwinder
+	call	1f		# put return address on stack for unwinder
+1:	xorq	%rbp, %rbp	# clear frame pointer
+	movq	initial_code(%rip), %rax
 	pushq	$__KERNEL_CS	# set correct cs
 	pushq	%rax		# target address in negative space
 	lretq
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 15/44] perf/x86: check perf_callchain_store() error
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (13 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 14/44] x86/asm/head: put real return address on idle task stack Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 16/44] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
                   ` (28 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Add a check to perf_callchain_kernel() so that it returns early if the
callchain entry array is already full.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
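Note (not part of the commit message): a simplified user-space model of the
early return.  It assumes the store helper returns 0 on success and nonzero
once the entry array is full; model_entry, model_store() and
model_callchain_kernel() are hypothetical stand-ins, not the perf API.

```c
#include <stdint.h>

#define MODEL_MAX_STACK 4	/* made-up capacity for the example */

struct model_entry {
	unsigned int nr;
	uint64_t ip[MODEL_MAX_STACK];
};

/* Returns 0 on success, -1 when the array is full. */
static int model_store(struct model_entry *e, uint64_t ip)
{
	if (e->nr >= MODEL_MAX_STACK)
		return -1;
	e->ip[e->nr++] = ip;
	return 0;
}

/* Returns the number of frames walked after storing regs_ip. */
static unsigned int model_callchain_kernel(struct model_entry *e,
					   uint64_t regs_ip,
					   const uint64_t *frames,
					   unsigned int nframes)
{
	unsigned int walked = 0;

	if (model_store(e, regs_ip))
		return 0;	/* array already full: skip the stack walk */

	while (walked < nframes && !model_store(e, frames[walked]))
		walked++;

	return walked;
}
```

If the entry is already full on entry, no stack walk is attempted at all,
which is the behaviour the added check is meant to provide.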
 arch/x86/events/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d0efb5c..c1319ac 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2277,7 +2277,8 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 		return;
 	}
 
-	perf_callchain_store(entry, regs->ip);
+	if (perf_callchain_store(entry, regs->ip))
+		return;
 
 	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 16/44] oprofile/x86: add regs->ip to oprofile trace
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (14 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 15/44] perf/x86: check perf_callchain_store() error Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 17/44] proc: fix return address printk conversion specifer in /proc/<pid>/stack Josh Poimboeuf
                   ` (27 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Robert Richter

dump_trace() doesn't add the interrupted instruction's address to the
trace, so add it manually.

Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
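Note (not part of the commit message): a small user-space model of the depth
accounting introduced below.  The recorded[] array, model_add_trace() and
model_backtrace() are hypothetical; the loop only approximates what the
stack walk does with the remaining depth.

```c
#include <stddef.h>

#define MODEL_MAX 8

static unsigned long recorded[MODEL_MAX];
static size_t nrecorded;

/* Stand-in for the trace sink: append one address to the sample. */
static void model_add_trace(unsigned long ip)
{
	if (nrecorded < MODEL_MAX)
		recorded[nrecorded++] = ip;
}

static void model_backtrace(unsigned long regs_ip,
			    const unsigned long *ret_addrs, size_t n,
			    unsigned int depth)
{
	size_t i;

	if (!depth)
		return;

	/* The interrupted instruction counts as the first trace entry. */
	model_add_trace(regs_ip);
	if (!--depth)
		return;

	/* The remaining depth is spent on return addresses from the stack. */
	for (i = 0; i < n && depth; i++, depth--)
		model_add_trace(ret_addrs[i]);
}
```

With depth 1 only regs_ip is recorded; with depth 3 the model records
regs_ip followed by the first two return addresses.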
 arch/x86/oprofile/backtrace.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index c594768..d950f9e 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,8 +113,14 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
-		if (depth)
-			dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
+		if (!depth)
+			return;
+
+		oprofile_add_trace(regs->ip);
+		if (!--depth)
+			return;
+
+		dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
 		return;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 17/44] proc: fix return address printk conversion specifer in /proc/<pid>/stack
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (15 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 16/44] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 18/44] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config Josh Poimboeuf
                   ` (26 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

When printing call return addresses found on a stack, /proc/<pid>/stack
can sometimes give a confusing result.  If the call instruction was the
last instruction in the function (which can happen when calling a
noreturn function), '%pS' will incorrectly display the name of the
function which happens to be next in the object code, rather than the
name of the actual calling function.

Use '%pB' instead, which was created for this exact purpose.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
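Note (not part of the commit message): a toy user-space symbol lookup that
illustrates the difference, assuming '%pB' resolves the symbol for the
address just before the return address.  The symbol table, its addresses
and the model_*() helpers are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct model_sym {
	uintptr_t start;
	const char *name;
};

/* Sorted by start address; caller_func ends with a call to a noreturn function. */
static const struct model_sym model_syms[] = {
	{ 0x1000, "caller_func" },
	{ 0x1040, "next_func" },
};

/* Return the name of the last symbol starting at or below addr. */
static const char *model_lookup(uintptr_t addr)
{
	const char *name = "?";
	size_t i;

	for (i = 0; i < sizeof(model_syms) / sizeof(model_syms[0]); i++)
		if (addr >= model_syms[i].start)
			name = model_syms[i].name;

	return name;
}

/* '%pS'-style: look up the return address itself. */
static const char *model_pS(uintptr_t ret_addr)
{
	return model_lookup(ret_addr);
}

/* '%pB'-style: look up ret_addr - 1, which still lies inside the call. */
static const char *model_pB(uintptr_t ret_addr)
{
	return model_lookup(ret_addr - 1);
}
```

For a return address of 0x1040, which is the first byte after caller_func
in this model, model_pS() names next_func while model_pB() names
caller_func.  For an address in the middle of a function both agree.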
 fs/proc/base.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 31370da..cc4d81c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -483,7 +483,7 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
 		save_stack_trace_tsk(task, &trace);
 
 		for (i = 0; i < trace.nr_entries; i++) {
-			seq_printf(m, "[<%pK>] %pS\n",
+			seq_printf(m, "[<%pK>] %pB\n",
 				   (void *)entries[i], (void *)entries[i]);
 		}
 		unlock_trace(task);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 18/44] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (16 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 17/44] proc: fix return address printk conversion specifer in /proc/<pid>/stack Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 19/44] ftrace: only allocate the ret_stack 'fp' field when needed Josh Poimboeuf
                   ` (25 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Make HAVE_FUNCTION_GRAPH_FP_TEST a normal define, independent from
kconfig.  This removes some config file pollution and simplifies the
checking for the fp test.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/arm64/kernel/entry-ftrace.S     | 2 +-
 arch/blackfin/kernel/ftrace-entry.S  | 4 ++--
 arch/sparc/Kconfig                   | 1 -
 arch/sparc/include/asm/ftrace.h      | 4 ++++
 arch/x86/Kconfig                     | 1 -
 arch/x86/include/asm/ftrace.h        | 1 +
 kernel/trace/Kconfig                 | 5 -----
 kernel/trace/trace_functions_graph.c | 2 +-
 8 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 0f03a8f..aef02d2 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -219,7 +219,7 @@ ENDPROC(ftrace_graph_caller)
  *
  * Run ftrace_return_to_handler() before going back to parent.
  * @fp is checked against the value passed by ftrace_graph_caller()
- * only when CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST is enabled.
+ * only when HAVE_FUNCTION_GRAPH_FP_TEST is enabled.
  */
 ENTRY(return_to_handler)
 	save_return_regs
diff --git a/arch/blackfin/kernel/ftrace-entry.S b/arch/blackfin/kernel/ftrace-entry.S
index 28d0595..3b8bdcb 100644
--- a/arch/blackfin/kernel/ftrace-entry.S
+++ b/arch/blackfin/kernel/ftrace-entry.S
@@ -169,7 +169,7 @@ ENTRY(_ftrace_graph_caller)
 	r0 = sp;	/* unsigned long *parent */
 	r1 = [sp];	/* unsigned long self_addr */
 # endif
-# ifdef CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST
+# ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	r2 = fp;	/* unsigned long frame_pointer */
 # endif
 	r0 += 16;	/* skip the 4 local regs on stack */
@@ -190,7 +190,7 @@ ENTRY(_return_to_handler)
 	[--sp] = r1;
 
 	/* get original return address */
-# ifdef CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST
+# ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	r0 = fp;	/* Blackfin is sane, so omit this */
 # endif
 	call _ftrace_return_to_handler;
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 546293d..b0a6ab2 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -55,7 +55,6 @@ config SPARC64
 	def_bool 64BIT
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_GRAPH_TRACER
-	select HAVE_FUNCTION_GRAPH_FP_TEST
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select HAVE_RCU_TABLE_FREE if SMP
diff --git a/arch/sparc/include/asm/ftrace.h b/arch/sparc/include/asm/ftrace.h
index 3192a8e..62755a3 100644
--- a/arch/sparc/include/asm/ftrace.h
+++ b/arch/sparc/include/asm/ftrace.h
@@ -9,6 +9,10 @@
 void _mcount(void);
 #endif
 
+#endif /* CONFIG_MCOUNT */
+
+#if defined(CONFIG_SPARC64) && !defined(CC_USE_FENTRY)
+#define HAVE_FUNCTION_GRAPH_FP_TEST
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2fa5585..69267bd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -108,7 +108,6 @@ config X86
 	select HAVE_EXIT_THREAD
 	select HAVE_FENTRY			if X86_64
 	select HAVE_FTRACE_MCOUNT_RECORD
-	select HAVE_FUNCTION_GRAPH_FP_TEST
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACER
 	select HAVE_GENERIC_DMA_COHERENT	if X86_32
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index a4820d4..37f67cb 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -6,6 +6,7 @@
 # define MCOUNT_ADDR		((unsigned long)(__fentry__))
 #else
 # define MCOUNT_ADDR		((unsigned long)(mcount))
+# define HAVE_FUNCTION_GRAPH_FP_TEST
 #endif
 #define MCOUNT_INSN_SIZE	5 /* sizeof mcount call */
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index f4b86e8..ba33267 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -24,11 +24,6 @@ config HAVE_FUNCTION_GRAPH_TRACER
 	help
 	  See Documentation/trace/ftrace-design.txt
 
-config HAVE_FUNCTION_GRAPH_FP_TEST
-	bool
-	help
-	  See Documentation/trace/ftrace-design.txt
-
 config HAVE_DYNAMIC_FTRACE
 	bool
 	help
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 7363ccf..fc173cd 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -204,7 +204,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 		return;
 	}
 
-#if defined(CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST) && !defined(CC_USING_FENTRY)
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	/*
 	 * The arch may choose to record the frame pointer used
 	 * and check it here to make sure that it is what we expect it
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 19/44] ftrace: only allocate the ret_stack 'fp' field when needed
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (17 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 18/44] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 20/44] ftrace: add return address pointer to ftrace_ret_stack Josh Poimboeuf
                   ` (24 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

This saves some memory when HAVE_FUNCTION_GRAPH_FP_TEST isn't defined.
On x86_64 with newer versions of gcc which have -mfentry, it saves 400
bytes per task.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
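Note (not part of the commit message): a back-of-the-envelope check of the
400-byte figure, assuming 50 ret_stack entries per task and 8-byte fields
on a 64-bit build.  The struct names and MODEL_RETFUNC_DEPTH are
hypothetical, and the 'ret' member is assumed from the surrounding code.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_RETFUNC_DEPTH 50	/* assumed number of entries per task */

/* 64-bit model of one entry with the 'fp' field present. */
struct model_ret_stack_fp {
	uint64_t ret;
	uint64_t func;
	uint64_t calltime;
	uint64_t subtime;
	uint64_t fp;
};

/* The same entry with the 'fp' field compiled out. */
struct model_ret_stack_nofp {
	uint64_t ret;
	uint64_t func;
	uint64_t calltime;
	uint64_t subtime;
};

/* Bytes saved per task when 'fp' is dropped from every entry. */
static size_t model_bytes_saved_per_task(void)
{
	return MODEL_RETFUNC_DEPTH *
	       (sizeof(struct model_ret_stack_fp) -
		sizeof(struct model_ret_stack_nofp));
}
```

Under these assumptions each entry shrinks by 8 bytes, so 50 entries give
the 400 bytes quoted above.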
 include/linux/ftrace.h               | 2 ++
 kernel/trace/trace_functions_graph.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7d565af..4ad9ccc 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -795,7 +795,9 @@ struct ftrace_ret_stack {
 	unsigned long func;
 	unsigned long long calltime;
 	unsigned long long subtime;
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	unsigned long fp;
+#endif
 };
 
 /*
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index fc173cd..0e03ed0 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -171,7 +171,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 	current->ret_stack[index].func = func;
 	current->ret_stack[index].calltime = calltime;
 	current->ret_stack[index].subtime = 0;
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	current->ret_stack[index].fp = frame_pointer;
+#endif
 	*depth = current->curr_ret_stack;
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 20/44] ftrace: add return address pointer to ftrace_ret_stack
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (18 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 19/44] ftrace: only allocate the ret_stack 'fp' field when needed Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 21/44] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers Josh Poimboeuf
                   ` (23 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack.

Note that an array of 50 ftrace_ret_stack structs is allocated for every
task.  So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 Documentation/trace/ftrace-design.txt | 11 +++++++++++
 arch/arm/kernel/ftrace.c              |  2 +-
 arch/arm64/kernel/ftrace.c            |  2 +-
 arch/blackfin/kernel/ftrace.c         |  2 +-
 arch/microblaze/kernel/ftrace.c       |  2 +-
 arch/mips/kernel/ftrace.c             |  4 ++--
 arch/parisc/kernel/ftrace.c           |  2 +-
 arch/powerpc/kernel/ftrace.c          |  3 ++-
 arch/s390/kernel/ftrace.c             |  3 ++-
 arch/sh/kernel/ftrace.c               |  2 +-
 arch/sparc/kernel/ftrace.c            |  2 +-
 arch/tile/kernel/ftrace.c             |  2 +-
 arch/x86/kernel/ftrace.c              |  2 +-
 include/linux/ftrace.h                |  5 ++++-
 kernel/trace/trace_functions_graph.c  |  5 ++++-
 15 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
index dd5f916..a273dd0 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.txt
@@ -203,6 +203,17 @@ along to ftrace_push_return_trace() instead of a stub value of 0.
 
 Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer.
 
+HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+--------------------------------
+
+An arch may pass in a pointer to the return address on the stack.  This
+prevents potential stack unwinding issues where the unwinder gets out of
+sync with ret_stack and the wrong addresses are reported by
+ftrace_graph_ret_addr().
+
+Adding support for it is easy: just define the macro in asm/ftrace.h and
+pass the return address pointer as the 'retp' argument to
+ftrace_push_return_trace().
 
 HAVE_FTRACE_NMI_ENTER
 ---------------------
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 709ee1d..3f17594 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -218,7 +218,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 	}
 
 	err = ftrace_push_return_trace(old, self_addr, &trace.depth,
-				       frame_pointer);
+				       frame_pointer, NULL);
 	if (err == -EBUSY) {
 		*parent = old;
 		return;
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index ebecf9a..40ad08a 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -138,7 +138,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 		return;
 
 	err = ftrace_push_return_trace(old, self_addr, &trace.depth,
-				       frame_pointer);
+				       frame_pointer, NULL);
 	if (err == -EBUSY)
 		return;
 	else
diff --git a/arch/blackfin/kernel/ftrace.c b/arch/blackfin/kernel/ftrace.c
index 095de0f..8dad758 100644
--- a/arch/blackfin/kernel/ftrace.c
+++ b/arch/blackfin/kernel/ftrace.c
@@ -107,7 +107,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 		return;
 
 	if (ftrace_push_return_trace(*parent, self_addr, &trace.depth,
-	                             frame_pointer) == -EBUSY)
+				     frame_pointer, NULL) == -EBUSY)
 		return;
 
 	trace.func = self_addr;
diff --git a/arch/microblaze/kernel/ftrace.c b/arch/microblaze/kernel/ftrace.c
index fc7b48a..d57563c 100644
--- a/arch/microblaze/kernel/ftrace.c
+++ b/arch/microblaze/kernel/ftrace.c
@@ -63,7 +63,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
 		return;
 	}
 
-	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0);
+	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0, NULL);
 	if (err == -EBUSY) {
 		*parent = old;
 		return;
diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
index 937c54b..30a3b75 100644
--- a/arch/mips/kernel/ftrace.c
+++ b/arch/mips/kernel/ftrace.c
@@ -382,8 +382,8 @@ void prepare_ftrace_return(unsigned long *parent_ra_addr, unsigned long self_ra,
 	if (unlikely(faulted))
 		goto out;
 
-	if (ftrace_push_return_trace(old_parent_ra, self_ra, &trace.depth, fp)
-	    == -EBUSY) {
+	if (ftrace_push_return_trace(old_parent_ra, self_ra, &trace.depth, fp,
+				     NULL) == -EBUSY) {
 		*parent_ra_addr = old_parent_ra;
 		return;
 	}
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index a828a0a..5a5506a 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -48,7 +48,7 @@ static void __hot prepare_ftrace_return(unsigned long *parent,
 		return;
 
         if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-			0 ) == -EBUSY)
+				     0, NULL) == -EBUSY)
                 return;
 
 	/* activate parisc_return_to_handler() as return point */
diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index 1123a4d..d49b757 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -592,7 +592,8 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip)
 	if (!ftrace_graph_entry(&trace))
 		goto out;
 
-	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0) == -EBUSY)
+	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0,
+				     NULL) == -EBUSY)
 		goto out;
 
 	parent = return_hooker;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index 0f7bfeb..60a8a4e 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -209,7 +209,8 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip)
 	/* Only trace if the calling function expects to. */
 	if (!ftrace_graph_entry(&trace))
 		goto out;
-	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0) == -EBUSY)
+	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0,
+				     NULL) == -EBUSY)
 		goto out;
 	parent = (unsigned long) return_to_handler;
 out:
diff --git a/arch/sh/kernel/ftrace.c b/arch/sh/kernel/ftrace.c
index 38993e0..95eccd4 100644
--- a/arch/sh/kernel/ftrace.c
+++ b/arch/sh/kernel/ftrace.c
@@ -382,7 +382,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
 		return;
 	}
 
-	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0);
+	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0, NULL);
 	if (err == -EBUSY) {
 		__raw_writel(old, parent);
 		return;
diff --git a/arch/sparc/kernel/ftrace.c b/arch/sparc/kernel/ftrace.c
index 0a2d2dd..6bcff69 100644
--- a/arch/sparc/kernel/ftrace.c
+++ b/arch/sparc/kernel/ftrace.c
@@ -131,7 +131,7 @@ unsigned long prepare_ftrace_return(unsigned long parent,
 		return parent + 8UL;
 
 	if (ftrace_push_return_trace(parent, self_addr, &trace.depth,
-				     frame_pointer) == -EBUSY)
+				     frame_pointer, NULL) == -EBUSY)
 		return parent + 8UL;
 
 	trace.func = self_addr;
diff --git a/arch/tile/kernel/ftrace.c b/arch/tile/kernel/ftrace.c
index 4a57208..b827a41 100644
--- a/arch/tile/kernel/ftrace.c
+++ b/arch/tile/kernel/ftrace.c
@@ -184,7 +184,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 	*parent = return_hooker;
 
 	err = ftrace_push_return_trace(old, self_addr, &trace.depth,
-				       frame_pointer);
+				       frame_pointer, NULL);
 	if (err == -EBUSY) {
 		*parent = old;
 		return;
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index d036cfb..ae3b1fb 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -1029,7 +1029,7 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 	}
 
 	if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-		    frame_pointer) == -EBUSY) {
+				     frame_pointer, NULL) == -EBUSY) {
 		*parent = old;
 		return;
 	}
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 4ad9ccc..483e02a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -798,6 +798,9 @@ struct ftrace_ret_stack {
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	unsigned long fp;
 #endif
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+	unsigned long *retp;
+#endif
 };
 
 /*
@@ -809,7 +812,7 @@ extern void return_to_handler(void);
 
 extern int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
-			 unsigned long frame_pointer);
+			 unsigned long frame_pointer, unsigned long *retp);
 
 /*
  * Sometimes we don't want to trace a function with the function
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 0e03ed0..f7212ec 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -119,7 +119,7 @@ print_graph_duration(struct trace_array *tr, unsigned long long duration,
 /* Add a function return address to the trace stack on thread info.*/
 int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
-			 unsigned long frame_pointer)
+			 unsigned long frame_pointer, unsigned long *retp)
 {
 	unsigned long long calltime;
 	int index;
@@ -174,6 +174,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	current->ret_stack[index].fp = frame_pointer;
 #endif
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+	current->ret_stack[index].retp = retp;
+#endif
 	*depth = current->curr_ret_stack;
 
 	return 0;
-- 
2.7.4


* [PATCH v2 21/44] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (19 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 20/44] ftrace: add return address pointer to ftrace_ret_stack Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 22/44] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr() Josh Poimboeuf
                   ` (22 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

When function graph tracing is enabled for a function, ftrace modifies
the stack by replacing the original return address with the address of a
hook function (return_to_handler).

Stack unwinders need a way to get the original return address.  Add an
arch-independent helper function for that named ftrace_graph_ret_addr().

This adds two variations of the function: one depends on
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, and the other relies on an index state
variable.

The former is recommended because, in some cases, the latter can cause
problems when the unwinder skips stack frames.  It can get out of sync
with the ret_stack index and wrong addresses can be reported for the
stack trace.

Once all arches have been ported to use
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, we can get rid of the distinction.
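
The difference between the two variations can be sketched in userspace.
This is a simplified model, not the kernel code: struct task_model,
HANDLER_ADDR and the function names are made up for illustration, and
the FTRACE_NOTRACE_DEPTH adjustment is left out.

```c
#include <assert.h>

/* Stand-in for (unsigned long)return_to_handler; the value is arbitrary. */
#define HANDLER_ADDR 0xdeadUL

struct ret_entry {
	unsigned long ret;	/* original return address */
	unsigned long *retp;	/* stack slot that held it */
};

struct task_model {
	struct ret_entry ret_stack[4];
	int curr_ret_stack;	/* index of the newest entry, -1 when empty */
};

/* Pointer variant: match the stack slot itself, no unwinder state needed. */
static unsigned long ret_addr_by_ptr(const struct task_model *t,
				     unsigned long ret, unsigned long *retp)
{
	int i;

	if (ret != HANDLER_ADDR)
		return ret;

	for (i = 0; i <= t->curr_ret_stack; i++)
		if (t->ret_stack[i].retp == retp)
			return t->ret_stack[i].ret;

	return ret;
}

/* Index variant: assumes the Nth hooked slot seen is the Nth newest entry. */
static unsigned long ret_addr_by_idx(const struct task_model *t, int *idx,
				     unsigned long ret)
{
	int task_idx = t->curr_ret_stack;

	if (ret != HANDLER_ADDR || task_idx < *idx)
		return ret;

	task_idx -= *idx;
	(*idx)++;

	return t->ret_stack[task_idx].ret;
}

/* An unwinder that skips the newest frame and starts at stack[1]. */
static void skipped_frame_demo(void)
{
	unsigned long stack[3] = { HANDLER_ADDR, HANDLER_ADDR, HANDLER_ADDR };
	struct task_model t = {
		.ret_stack = {
			{ 0x100, &stack[2] },	/* oldest call */
			{ 0x200, &stack[1] },
			{ 0x300, &stack[0] },	/* newest call */
		},
		.curr_ret_stack = 2,
	};
	int idx = 0;

	/* The pointer lookup finds the entry that belongs to stack[1]. */
	assert(ret_addr_by_ptr(&t, stack[1], &stack[1]) == 0x200);

	/* The index lookup returns the newest entry instead: out of sync. */
	assert(ret_addr_by_idx(&t, &idx, stack[1]) == 0x300);
}
```

In this model the index variant is only correct when the unwinder
visits every hooked slot starting from the newest one, which is the
assumption that breaks when frames are skipped.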

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 include/linux/ftrace.h               | 10 +++++++
 kernel/trace/trace_functions_graph.c | 58 ++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 483e02a..6f93ac4 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -814,6 +814,9 @@ extern int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 			 unsigned long frame_pointer, unsigned long *retp);
 
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp);
+
 /*
  * Sometimes we don't want to trace a function with the function
  * graph tracer but we want them to keep traced by the usual function
@@ -875,6 +878,13 @@ static inline int task_curr_ret_stack(struct task_struct *tsk)
 	return -1;
 }
 
+static inline unsigned long
+ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
+		      unsigned long *retp)
+{
+	return ret;
+}
+
 static inline void pause_graph_tracing(void) { }
 static inline void unpause_graph_tracing(void) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index f7212ec..0cbe38a 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -284,6 +284,64 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	return ret;
 }
 
+/**
+ * ftrace_graph_ret_addr - convert a potentially modified stack return address
+ *			   to its original value
+ *
+ * This function can be called by stack unwinding code to convert a found stack
+ * return address ('ret') to its original value, in case the function graph
+ * tracer has modified it to be 'return_to_handler'.  If the address hasn't
+ * been modified, the unchanged value of 'ret' is returned.
+ *
+ * 'idx' is a state variable which should be initialized by the caller to zero
+ * before the first call.
+ *
+ * 'retp' is a pointer to the return address on the stack.  It's ignored if
+ * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
+ */
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp)
+{
+	int index = task->curr_ret_stack;
+	int i;
+
+	if (ret != (unsigned long)return_to_handler)
+		return ret;
+
+	if (index < -1)
+		index += FTRACE_NOTRACE_DEPTH;
+
+	if (index < 0)
+		return ret;
+
+	for (i = 0; i <= index; i++)
+		if (task->ret_stack[i].retp == retp)
+			return task->ret_stack[i].ret;
+
+	return ret;
+}
+#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp)
+{
+	int task_idx;
+
+	if (ret != (unsigned long)return_to_handler)
+		return ret;
+
+	task_idx = task->curr_ret_stack;
+
+	if (!task->ret_stack || task_idx < *idx)
+		return ret;
+
+	task_idx -= *idx;
+	(*idx)++;
+
+	return task->ret_stack[task_idx].ret;
+}
+#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+
 int __trace_graph_entry(struct trace_array *tr,
 				struct ftrace_graph_ent *trace,
 				unsigned long flags,
-- 
2.7.4


* [PATCH v2 22/44] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr()
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (20 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 21/44] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 23/44] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Josh Poimboeuf
                   ` (21 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Convert print_context_stack() and print_context_stack_bp() to use the
arch-independent ftrace_graph_ret_addr() helper.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 65 +++++++++++++++------------------------------
 1 file changed, 22 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 692eecae..b374d85 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -39,38 +39,6 @@ void printk_address(unsigned long address)
 	pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
 }
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static void
-print_ftrace_graph_addr(unsigned long addr, void *data,
-			const struct stacktrace_ops *ops,
-			struct task_struct *task, int *graph)
-{
-	unsigned long ret_addr;
-	int index;
-
-	if (addr != (unsigned long)return_to_handler)
-		return;
-
-	index = task->curr_ret_stack;
-
-	if (!task->ret_stack || index < *graph)
-		return;
-
-	index -= *graph;
-	ret_addr = task->ret_stack[index].ret;
-
-	ops->address(data, ret_addr, 1);
-
-	(*graph)++;
-}
-#else
-static inline void
-print_ftrace_graph_addr(unsigned long addr, void *data,
-			const struct stacktrace_ops *ops,
-			struct task_struct *task, int *graph)
-{ }
-#endif
-
 /*
  * x86-64 can have up to three kernel stacks:
  * process stack
@@ -108,18 +76,24 @@ print_context_stack(struct task_struct *task,
 		stack = (unsigned long *)task_stack_page(task);
 
 	while (valid_stack_ptr(task, stack, sizeof(*stack), end)) {
-		unsigned long addr;
+		unsigned long addr = *stack;
 
-		addr = *stack;
 		if (__kernel_text_address(addr)) {
+			unsigned long real_addr;
+			int reliable = 0;
+
 			if ((unsigned long) stack == bp + sizeof(long)) {
-				ops->address(data, addr, 1);
+				reliable = 1;
 				frame = frame->next_frame;
 				bp = (unsigned long) frame;
-			} else {
-				ops->address(data, addr, 0);
 			}
-			print_ftrace_graph_addr(addr, data, ops, task, graph);
+
+			ops->address(data, addr, reliable);
+
+			real_addr = ftrace_graph_ret_addr(task, graph, addr,
+							  stack);
+			if (real_addr != addr)
+				ops->address(data, real_addr, 1);
 		}
 		stack++;
 	}
@@ -134,19 +108,24 @@ print_context_stack_bp(struct task_struct *task,
 		       unsigned long *end, int *graph)
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
-	unsigned long *ret_addr = &frame->return_address;
+	unsigned long *retp = &frame->return_address;
 
-	while (valid_stack_ptr(task, ret_addr, sizeof(*ret_addr), end)) {
-		unsigned long addr = *ret_addr;
+	while (valid_stack_ptr(task, retp, sizeof(*retp), end)) {
+		unsigned long addr = *retp;
+		unsigned long real_addr;
 
 		if (!__kernel_text_address(addr))
 			break;
 
 		if (ops->address(data, addr, 1))
 			break;
+
+		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
+		if (real_addr != addr)
+			ops->address(data, real_addr, 1);
+
 		frame = frame->next_frame;
-		ret_addr = &frame->return_address;
-		print_ftrace_graph_addr(addr, data, ops, task, graph);
+		retp = &frame->return_address;
 	}
 
 	return (unsigned long)frame;
-- 
2.7.4


* [PATCH v2 23/44] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (21 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 22/44] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr() Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 24/44] x86/dumpstack/ftrace: mark function graph handler function as unreliable Josh Poimboeuf
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

This allows the use of the more reliable version of
ftrace_graph_ret_addr() so we no longer have to worry about the unwinder
getting out of sync with the function graph ret_stack index, which can
happen if the unwinder skips any frames before calling
ftrace_graph_ret_addr().

This fixes the following issue (and several others like it), where
enabling function graph tracing changes the reported stack:

  Before:
  $ cat /proc/self/stack
  [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
  [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110
  [<ffffffff813127c4>] proc_single_show+0x54/0x80
  [<ffffffff812be088>] seq_read+0x108/0x3e0
  [<ffffffff812923d7>] __vfs_read+0x37/0x140
  [<ffffffff812929d9>] vfs_read+0x99/0x140
  [<ffffffff81293f28>] SyS_read+0x58/0xc0
  [<ffffffff818af97c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
  [<ffffffffffffffff>] 0xffffffffffffffff

  After:
  $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
  $ cat /proc/self/stack
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff810394cc>] print_context_stack+0xfc/0x100
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff8103891b>] dump_trace+0x12b/0x350
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff813127c4>] proc_single_show+0x54/0x80
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff812be088>] seq_read+0x108/0x3e0
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff812923d7>] __vfs_read+0x37/0x140
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff812929d9>] vfs_read+0x99/0x140
  [<ffffffffffffffff>] 0xffffffffffffffff

Enabling function graph tracing caused the stack trace to change: it was
offset by 2 frames because the unwinder started with an earlier stack
frame and got out of sync with the ret_stack index.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/ftrace.h | 2 ++
 arch/x86/kernel/ftrace.c      | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 37f67cb..eccd0ac 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -14,6 +14,8 @@
 #define ARCH_SUPPORTS_FTRACE_OPS 1
 #endif
 
+#define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+
 #ifndef __ASSEMBLY__
 extern void mcount(void);
 extern atomic_t modifying_ftrace_code;
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ae3b1fb..8639bb2 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -1029,7 +1029,7 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 	}
 
 	if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-				     frame_pointer, NULL) == -EBUSY) {
+				     frame_pointer, parent) == -EBUSY) {
 		*parent = old;
 		return;
 	}
-- 
2.7.4


* [PATCH v2 24/44] x86/dumpstack/ftrace: mark function graph handler function as unreliable
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (22 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 23/44] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 25/44] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp() Josh Poimboeuf
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

When function graph tracing is enabled for a function, its return
address on the stack is replaced with the address of an ftrace handler
(return_to_handler).

Currently 'return_to_handler' can be reported as reliable.  That's not
ideal, and can actually be misleading.  When saving or dumping the
stack, you normally only care about what led up to that point (the call
path), rather than what will happen in the future (the return path).

That's especially true in the non-oops stack trace case, which isn't
used for debugging.  For example, in a perf profiling operation,
reporting return_to_handler() in the trace would just be confusing.

And in the oops case, where debugging is important, "unreliable" is also
more appropriate there because it serves as a hint that graph tracing
was involved, instead of trying to imply that return_to_handler() was
the real caller.
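
The resulting reporting order for a single stack slot can be modeled in
userspace as below.  This is only a sketch: struct trace_out,
emit_address() and report_slot() are hypothetical stand-ins for the
stacktrace_ops callback and the walker, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REPORTS 8

struct report {
	unsigned long addr;
	int reliable;
};

struct trace_out {
	struct report entries[MAX_REPORTS];
	size_t count;
};

/* Stand-in for ops->address(): record what would be reported. */
static void emit_address(struct trace_out *out, unsigned long addr,
			 int reliable)
{
	if (out->count < MAX_REPORTS) {
		out->entries[out->count].addr = addr;
		out->entries[out->count].reliable = reliable;
		out->count++;
	}
}

/*
 * Report one stack slot.  'addr' is the value found on the stack and
 * 'real_addr' is the result of the ret_stack lookup for that slot.
 */
static void report_slot(struct trace_out *out, unsigned long addr,
			unsigned long real_addr, int reliable)
{
	if (real_addr != addr)
		emit_address(out, addr, 0);	/* handler: a hint only */

	emit_address(out, real_addr, reliable);
}

static void report_order_selfcheck(void)
{
	struct trace_out out = { .count = 0 };

	report_slot(&out, 0xaaaUL, 0xbbbUL, 1);	/* hooked slot */
	assert(out.count == 2);
	assert(out.entries[0].addr == 0xaaaUL && !out.entries[0].reliable);
	assert(out.entries[1].addr == 0xbbbUL && out.entries[1].reliable);
}
```

A slot that was not hooked (real_addr == addr) produces a single entry
that keeps whatever reliability the walker assigned to it.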

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index b374d85..33f2899 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -88,12 +88,21 @@ print_context_stack(struct task_struct *task,
 				bp = (unsigned long) frame;
 			}
 
-			ops->address(data, addr, reliable);
-
+			/*
+			 * When function graph tracing is enabled for a
+			 * function, its return address on the stack is
+			 * replaced with the address of an ftrace handler
+			 * (return_to_handler).  In that case, before printing
+			 * the "real" address, we want to print the handler
+			 * address as an "unreliable" hint that function graph
+			 * tracing was involved.
+			 */
 			real_addr = ftrace_graph_ret_addr(task, graph, addr,
 							  stack);
 			if (real_addr != addr)
-				ops->address(data, real_addr, 1);
+				ops->address(data, addr, 0);
+
+			ops->address(data, real_addr, reliable);
 		}
 		stack++;
 	}
@@ -117,12 +126,11 @@ print_context_stack_bp(struct task_struct *task,
 		if (!__kernel_text_address(addr))
 			break;
 
-		if (ops->address(data, addr, 1))
-			break;
-
 		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
-		if (real_addr != addr)
-			ops->address(data, real_addr, 1);
+		if (real_addr != addr && ops->address(data, addr, 0))
+			break;
+		if (ops->address(data, real_addr, 1))
+			break;
 
 		frame = frame->next_frame;
 		retp = &frame->return_address;
-- 
2.7.4


* [PATCH v2 25/44] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp()
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (23 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 24/44] x86/dumpstack/ftrace: mark function graph handler function as unreliable Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 26/44] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

When function graph tracing is enabled, print_context_stack_bp() can
report return_to_handler() as an unreliable address, which is confusing
and misleading: return_to_handler() is really only useful as a hint for
debugging, whereas print_context_stack_bp() users only care about the
actual 'reliable' call path.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 33f2899..c6c6c39 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -127,8 +127,6 @@ print_context_stack_bp(struct task_struct *task,
 			break;
 
 		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
-		if (real_addr != addr && ops->address(data, addr, 0))
-			break;
 		if (ops->address(data, real_addr, 1))
 			break;
 
-- 
2.7.4


* [PATCH v2 26/44] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (24 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 25/44] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp() Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 27/44] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

show_stack_log_lvl() and dump_trace() are already preemption safe:

- If they're running in interrupt context, preemption is already
  disabled and the percpu irq stack pointers can be trusted.

- If they're running with preemption enabled, they must be running on
  the task stack anyway, so it doesn't matter if they're comparing the
  stack pointer against the percpu irq stack pointer from this CPU or
  another one: either way it won't match.
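
The second point can be illustrated with a small userspace model.  The
addresses, sizes and names below are invented for the example and do
not correspond to real kernel values or symbols:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS	4
#define MODEL_IRQ_STACK	0x4000UL	/* made-up irq stack size */

/* Made-up end address of each CPU's irq stack. */
static const uintptr_t model_irq_stack_end[MODEL_NR_CPUS] = {
	0x10004000UL, 0x10014000UL, 0x10024000UL, 0x10034000UL,
};

/* Is 'sp' inside the irq stack belonging to 'cpu'? */
static bool on_irq_stack(uintptr_t sp, int cpu)
{
	uintptr_t end = model_irq_stack_end[cpu];

	return sp >= end - MODEL_IRQ_STACK && sp < end;
}

static void preempt_safety_selfcheck(void)
{
	uintptr_t task_sp = 0x20001000UL;	/* somewhere on a task stack */
	int cpu;

	/* Migrating between CPUs mid-check cannot change the answer. */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		assert(!on_irq_stack(task_sp, cpu));
}
```

A pointer that does lie on an irq stack matches only its own CPU's
range, which is the interrupt-context case where preemption is already
disabled and the CPU cannot change.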

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_32.c | 14 ++++++--------
 arch/x86/kernel/dumpstack_64.c | 26 +++++++++-----------------
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index c533b8b..b07d5c9 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -24,16 +24,16 @@ static void *is_irq_stack(void *p, void *irq)
 }
 
 
-static void *is_hardirq_stack(unsigned long *stack, int cpu)
+static void *is_hardirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(hardirq_stack, cpu);
+	void *irq = this_cpu_read(hardirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
 
-static void *is_softirq_stack(unsigned long *stack, int cpu)
+static void *is_softirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(softirq_stack, cpu);
+	void *irq = this_cpu_read(softirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
@@ -42,7 +42,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
 	int graph = 0;
 	u32 *prev_esp;
 
@@ -53,9 +52,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	for (;;) {
 		void *end_stack;
 
-		end_stack = is_hardirq_stack(stack, cpu);
+		end_stack = is_hardirq_stack(stack);
 		if (!end_stack)
-			end_stack = is_softirq_stack(stack, cpu);
+			end_stack = is_softirq_stack(stack);
 
 		bp = ops->walk_stack(task, stack, bp, ops, data,
 				     end_stack, &graph);
@@ -74,7 +73,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			break;
 		touch_nmi_watchdog();
 	}
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 491f2fd..f1b843a 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -31,8 +31,8 @@ static char x86_stack_ids[][8] = {
 #endif
 };
 
-static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
-					 unsigned *usedp, char **idp)
+static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
+					 char **idp)
 {
 	unsigned k;
 
@@ -41,7 +41,7 @@ static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
 	 * 'stack' is in one of them:
 	 */
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		unsigned long end = per_cpu(orig_ist, cpu).ist[k];
+		unsigned long end = raw_cpu_ptr(&orig_ist)->ist[k];
 		/*
 		 * Is 'stack' above this exception frame's end?
 		 * If yes then skip to the next frame.
@@ -111,7 +111,7 @@ enum stack_type {
 };
 
 static enum stack_type
-analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
+analyze_stack(struct task_struct *task, unsigned long *stack,
 	      unsigned long **stack_end, unsigned long *irq_stack,
 	      unsigned *used, char **id)
 {
@@ -121,8 +121,7 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
 	if ((unsigned long)task_stack_page(task) == addr)
 		return STACK_IS_NORMAL;
 
-	*stack_end = in_exception_stack(cpu, (unsigned long)stack,
-					used, id);
+	*stack_end = in_exception_stack((unsigned long)stack, used, id);
 	if (*stack_end)
 		return STACK_IS_EXCEPTION;
 
@@ -149,8 +148,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
-	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
+	unsigned long *irq_stack = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
@@ -169,8 +167,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		enum stack_type stype;
 		char *id;
 
-		stype = analyze_stack(cpu, task, stack, &stack_end,
-				      irq_stack, &used, &id);
+		stype = analyze_stack(task, stack, &stack_end, irq_stack, &used,
+				      &id);
 
 		/* Default finish unless specified to continue */
 		done = 1;
@@ -225,7 +223,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * This handles the process stack:
 	 */
 	bp = ops->walk_stack(task, stack, bp, ops, data, NULL, &graph);
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
@@ -236,13 +233,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *irq_stack_end;
 	unsigned long *irq_stack;
 	unsigned long *stack;
-	int cpu;
 	int i;
 
-	preempt_disable();
-	cpu = smp_processor_id();
-
-	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
 	sp = sp ? : get_stack_pointer(task, regs);
@@ -274,7 +267,6 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		stack++;
 		touch_nmi_watchdog();
 	}
-	preempt_enable();
 
 	pr_cont("\n");
 	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
-- 
2.7.4


* [PATCH v2 27/44] x86/dumpstack: simplify in_exception_stack()
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (25 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 26/44] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 28/44] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

in_exception_stack() does some bad, bad things just so the unwinder can
print different values for different areas of the debug exception stack:
it keeps extra "#DB[?]" name slots and patches a digit into them at
runtime.

There's no need to report exactly where on the debug stack the pointer
is.  Just print "#DB" and be done with it.

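As a rough userspace sketch of the simplified lookup (not the kernel
code itself): one size per stack, one range check per stack, one name
per stack.  The constant values, the ends[] array standing in for
orig_ist, and the name find_exception_stack() are assumptions made for
illustration; the "used" mask handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel constants; values are made up. */
enum { DOUBLEFAULT_STACK = 1, NMI_STACK, DEBUG_STACK, MCE_STACK };
#define N_EXCEPTION_STACKS	4
#define EXCEPTION_STKSZ		4096UL
#define DEBUG_STKSZ		(2 * EXCEPTION_STKSZ)

static const char *const stack_names[N_EXCEPTION_STACKS] = {
	[DOUBLEFAULT_STACK - 1]	= "#DF",
	[NMI_STACK - 1]		= "NMI",
	[DEBUG_STACK - 1]	= "#DB",
	[MCE_STACK - 1]		= "#MC",
};

static const unsigned long stack_sizes[N_EXCEPTION_STACKS] = {
	[DOUBLEFAULT_STACK - 1]	= EXCEPTION_STKSZ,
	[NMI_STACK - 1]		= EXCEPTION_STKSZ,
	[DEBUG_STACK - 1]	= DEBUG_STKSZ,
	[MCE_STACK - 1]		= EXCEPTION_STKSZ,
};

/*
 * Return the name of the stack containing addr, or NULL if none does.
 * ends[k] is the exclusive top address of stack k (illustrative input,
 * standing in for the per-cpu IST table).
 */
static const char *find_exception_stack(unsigned long addr,
					const unsigned long *ends)
{
	assert(ends != NULL);

	for (unsigned k = 0; k < N_EXCEPTION_STACKS; k++) {
		unsigned long end   = ends[k];
		unsigned long begin = end - stack_sizes[k];

		if (addr >= begin && addr < end)
			return stack_names[k];
	}

	return NULL;
}
```

Because the debug stack's larger size lives in the size table, an
address anywhere in it resolves to the single "#DB" name.
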
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c | 89 ++++++++++++------------------------------
 1 file changed, 26 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f1b843a..69f6ba2 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -16,83 +16,46 @@
 
 #include <asm/stacktrace.h>
 
+static char *exception_stack_names[N_EXCEPTION_STACKS] = {
+		[ DOUBLEFAULT_STACK-1	]	= "#DF",
+		[ NMI_STACK-1		]	= "NMI",
+		[ DEBUG_STACK-1		]	= "#DB",
+		[ MCE_STACK-1		]	= "#MC",
+};
 
-#define N_EXCEPTION_STACKS_END \
-		(N_EXCEPTION_STACKS + DEBUG_STKSZ/EXCEPTION_STKSZ - 2)
-
-static char x86_stack_ids[][8] = {
-		[ DEBUG_STACK-1			]	= "#DB",
-		[ NMI_STACK-1			]	= "NMI",
-		[ DOUBLEFAULT_STACK-1		]	= "#DF",
-		[ MCE_STACK-1			]	= "#MC",
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
-		[ N_EXCEPTION_STACKS ...
-		  N_EXCEPTION_STACKS_END	]	= "#DB[?]"
-#endif
+static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
+	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
+	[DEBUG_STACK - 1]			= DEBUG_STKSZ
 };
 
 static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
 					 char **idp)
 {
+	unsigned long begin, end;
 	unsigned k;
 
-	/*
-	 * Iterate over all exception stacks, and figure out whether
-	 * 'stack' is in one of them:
-	 */
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		unsigned long end = raw_cpu_ptr(&orig_ist)->ist[k];
-		/*
-		 * Is 'stack' above this exception frame's end?
-		 * If yes then skip to the next frame.
-		 */
-		if (stack >= end)
+		end   = raw_cpu_ptr(&orig_ist)->ist[k];
+		begin = end - exception_stack_sizes[k];
+
+		if (stack < begin || stack >= end)
 			continue;
+
 		/*
-		 * Is 'stack' above this exception frame's start address?
-		 * If yes then we found the right frame.
-		 */
-		if (stack >= end - EXCEPTION_STKSZ) {
-			/*
-			 * Make sure we only iterate through an exception
-			 * stack once. If it comes up for the second time
-			 * then there's something wrong going on - just
-			 * break out and return NULL:
-			 */
-			if (*usedp & (1U << k))
-				break;
-			*usedp |= 1U << k;
-			*idp = x86_stack_ids[k];
-			return (unsigned long *)end;
-		}
-		/*
-		 * If this is a debug stack, and if it has a larger size than
-		 * the usual exception stacks, then 'stack' might still
-		 * be within the lower portion of the debug stack:
+		 * Make sure we only iterate through an exception stack once.
+		 * If it comes up for the second time then there's something
+		 * wrong going on - just break and return NULL:
 		 */
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
-		if (k == DEBUG_STACK - 1 && stack >= end - DEBUG_STKSZ) {
-			unsigned j = N_EXCEPTION_STACKS - 1;
+		if (*usedp & (1U << k))
+			break;
+		*usedp |= 1U << k;
 
-			/*
-			 * Black magic. A large debug stack is composed of
-			 * multiple exception stack entries, which we
-			 * iterate through now. Dont look:
-			 */
-			do {
-				++j;
-				end -= EXCEPTION_STKSZ;
-				x86_stack_ids[j][4] = '1' +
-						(j - N_EXCEPTION_STACKS);
-			} while (stack < end - EXCEPTION_STKSZ);
-			if (*usedp & (1U << j))
-				break;
-			*usedp |= 1U << j;
-			*idp = x86_stack_ids[j];
-			return (unsigned long *)end;
-		}
-#endif
+		*idp = exception_stack_names[k];
+		return (unsigned long *)end;
 	}
+
 	return NULL;
 }
 
-- 
2.7.4


* [PATCH v2 28/44] x86/dumpstack: add get_stack_info() interface
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (26 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 27/44] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 29/44] x86/dumpstack: add recursion checking for all stacks Josh Poimboeuf
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

valid_stack_ptr() is buggy: it assumes that all stacks are of size
THREAD_SIZE, which is not true for exception stacks.  So the
walk_stack() callbacks will need to know the location of the beginning
of the stack as well as the end.

Another issue is that the properties of a stack (type, size, next stack
pointer, description string) are scattered throughout the stack dump
code.

Encapsulate all that information in a single place with a new stack_info
struct and a get_stack_info() interface.
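
For illustration only, a userspace sketch of the idea: once begin and
end travel together in one struct, a bounds check no longer has to
assume a stack size.  The struct mirrors the one added below;
describe_stack() is a hypothetical helper, and this on_stack() is a
portable variant that compares a remaining length instead of doing
void pointer arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum stack_type { STACK_TYPE_UNKNOWN, STACK_TYPE_TASK };

struct stack_info {
	enum stack_type type;
	unsigned long *begin, *end, *next_sp;
};

/* Hypothetical helper: describe a buffer of nwords words as a task stack. */
static void describe_stack(struct stack_info *info, unsigned long *buf,
			   size_t nwords)
{
	assert(info != NULL);

	info->type	= STACK_TYPE_TASK;
	info->begin	= buf;
	info->end	= buf + nwords;
	info->next_sp	= NULL;
}

/* True if the object [addr, addr + len) lies entirely within the stack. */
static bool on_stack(const struct stack_info *info, const void *addr,
		     size_t len)
{
	const char *p = addr;
	const char *begin, *end;

	assert(info != NULL);
	begin = (const char *)info->begin;
	end   = (const char *)info->end;

	return info->type != STACK_TYPE_UNKNOWN &&
	       p >= begin && p < end &&
	       len <= (size_t)(end - p);
}
```

A walker can then ask on_stack(&info, sp, sizeof(*sp)) before every
dereference, whatever the size of the stack it is on.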

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/events/core.c            |   2 +-
 arch/x86/include/asm/stacktrace.h |  41 +++++++++-
 arch/x86/kernel/dumpstack.c       |  40 ++++-----
 arch/x86/kernel/dumpstack_32.c    | 106 ++++++++++++++++++------
 arch/x86/kernel/dumpstack_64.c    | 165 ++++++++++++++++++++------------------
 arch/x86/kernel/stacktrace.c      |   2 +-
 arch/x86/oprofile/backtrace.c     |   2 +-
 7 files changed, 231 insertions(+), 127 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c1319ac..477dc38 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2251,7 +2251,7 @@ void arch_perf_update_userpage(struct perf_event *event,
  * callchain support
  */
 
-static int backtrace_stack(void *data, char *name)
+static int backtrace_stack(void *data, const char *name)
 {
 	return 0;
 }
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 6f65995..be9273c 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -9,6 +9,39 @@
 #include <linux/uaccess.h>
 #include <linux/ptrace.h>
 
+enum stack_type {
+	STACK_TYPE_UNKNOWN,
+	STACK_TYPE_TASK,
+	STACK_TYPE_IRQ,
+	STACK_TYPE_SOFTIRQ,
+	STACK_TYPE_EXCEPTION,
+	STACK_TYPE_EXCEPTION_LAST = STACK_TYPE_EXCEPTION + N_EXCEPTION_STACKS-1,
+};
+
+struct stack_info {
+	enum stack_type type;
+	unsigned long *begin, *end, *next_sp;
+};
+
+bool in_task_stack(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info);
+
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info, unsigned long *visit_mask);
+
+void stack_type_str(enum stack_type type, const char **begin,
+		    const char **end);
+
+static inline bool on_stack(struct stack_info *info, void *addr, size_t len)
+{
+	void *begin = info->begin;
+	void *end   = info->end;
+
+	return (info->type != STACK_TYPE_UNKNOWN &&
+		addr >= begin && addr < end &&
+		addr + len > begin && addr + len <= end);
+}
+
 extern int kstack_depth_to_print;
 
 struct thread_info;
@@ -19,27 +52,27 @@ typedef unsigned long (*walk_stack_t)(struct task_struct *task,
 				      unsigned long bp,
 				      const struct stacktrace_ops *ops,
 				      void *data,
-				      unsigned long *end,
+				      struct stack_info *info,
 				      int *graph);
 
 extern unsigned long
 print_context_stack(struct task_struct *task,
 		    unsigned long *stack, unsigned long bp,
 		    const struct stacktrace_ops *ops, void *data,
-		    unsigned long *end, int *graph);
+		    struct stack_info *info, int *graph);
 
 extern unsigned long
 print_context_stack_bp(struct task_struct *task,
 		       unsigned long *stack, unsigned long bp,
 		       const struct stacktrace_ops *ops, void *data,
-		       unsigned long *end, int *graph);
+		       struct stack_info *info, int *graph);
 
 /* Generic stack tracer with callbacks */
 
 struct stacktrace_ops {
 	int (*address)(void *data, unsigned long address, int reliable);
 	/* On negative return stop dumping */
-	int (*stack)(void *data, char *name);
+	int (*stack)(void *data, const char *name);
 	walk_stack_t	walk_stack;
 };
 
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index c6c6c39..aa208e5 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -25,6 +25,23 @@ unsigned int code_bytes = 64;
 int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
 static int die_counter;
 
+bool in_task_stack(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info)
+{
+	unsigned long *begin = task_stack_page(task);
+	unsigned long *end   = task_stack_page(task) + THREAD_SIZE;
+
+	if (stack < begin || stack >= end)
+		return false;
+
+	info->type	= STACK_TYPE_TASK;
+	info->begin	= begin;
+	info->end	= end;
+	info->next_sp	= NULL;
+
+	return true;
+}
+
 static void printk_stack_address(unsigned long address, int reliable,
 				 char *log_lvl)
 {
@@ -46,24 +63,11 @@ void printk_address(unsigned long address)
  * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
  */
 
-static inline int valid_stack_ptr(struct task_struct *task,
-			void *p, unsigned int size, void *end)
-{
-	void *t = task_stack_page(task);
-	if (end) {
-		if (p < end && p >= (end-THREAD_SIZE))
-			return 1;
-		else
-			return 0;
-	}
-	return p >= t && p < t + THREAD_SIZE - size;
-}
-
 unsigned long
 print_context_stack(struct task_struct *task,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data,
-		unsigned long *end, int *graph)
+		struct stack_info *info, int *graph)
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
 
@@ -75,7 +79,7 @@ print_context_stack(struct task_struct *task,
 	    PAGE_SIZE)
 		stack = (unsigned long *)task_stack_page(task);
 
-	while (valid_stack_ptr(task, stack, sizeof(*stack), end)) {
+	while (on_stack(info, stack, sizeof(*stack))) {
 		unsigned long addr = *stack;
 
 		if (__kernel_text_address(addr)) {
@@ -114,12 +118,12 @@ unsigned long
 print_context_stack_bp(struct task_struct *task,
 		       unsigned long *stack, unsigned long bp,
 		       const struct stacktrace_ops *ops, void *data,
-		       unsigned long *end, int *graph)
+		       struct stack_info *info, int *graph)
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
 	unsigned long *retp = &frame->return_address;
 
-	while (valid_stack_ptr(task, retp, sizeof(*retp), end)) {
+	while (on_stack(info, stack, sizeof(*stack) * 2)) {
 		unsigned long addr = *retp;
 		unsigned long real_addr;
 
@@ -138,7 +142,7 @@ print_context_stack_bp(struct task_struct *task,
 }
 EXPORT_SYMBOL_GPL(print_context_stack_bp);
 
-static int print_trace_stack(void *data, char *name)
+static int print_trace_stack(void *data, const char *name)
 {
 	printk("%s <%s> ", (char *)data, name);
 	return 0;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index b07d5c9..51a113b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -16,61 +16,117 @@
 
 #include <asm/stacktrace.h>
 
-static void *is_irq_stack(void *p, void *irq)
+void stack_type_str(enum stack_type type, const char **begin, const char **end)
 {
-	if (p < irq || p >= (irq + THREAD_SIZE))
-		return NULL;
-	return irq + THREAD_SIZE;
+	switch (type) {
+	case STACK_TYPE_IRQ:
+	case STACK_TYPE_SOFTIRQ:
+		*begin = "IRQ";
+		*end   = "EOI";
+		break;
+	default:
+		*begin = NULL;
+		*end   = NULL;
+	}
 }
 
+static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
+{
+	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack);
+	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
+
+	if (stack < begin || stack >= end)
+		return false;
+
+	info->type	= STACK_TYPE_IRQ;
+	info->begin	= begin;
+	info->end	= end;
+
+	/*
+	 * See irq_32.c -- the next stack pointer is stored at the beginning of
+	 * the stack.
+	 */
+	info->next_sp	= (unsigned long *)*begin;
+
+	return true;
+}
 
-static void *is_hardirq_stack(unsigned long *stack)
+static bool in_softirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	void *irq = this_cpu_read(hardirq_stack);
+	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack);
+	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
+
+	if (stack < begin || stack >= end)
+		return false;
+
+	info->type	= STACK_TYPE_SOFTIRQ;
+	info->begin	= begin;
+	info->end	= end;
+
+	/*
+	 * See irq_32.c -- the next stack pointer is stored at the beginning of
+	 * the stack.
+	 */
+	info->next_sp	= (unsigned long *)*begin;
 
-	return is_irq_stack(stack, irq);
+	return true;
 }
 
-static void *is_softirq_stack(unsigned long *stack);
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info, unsigned long *visit_mask)
 {
-	void *irq = this_cpu_read(softirq_stack);
+	if (!stack)
+		goto unknown;
 
-	return is_irq_stack(stack, irq);
+	task = task ? : current;
+
+	if (in_task_stack(stack, task, info))
+		return 0;
+
+	if (task != current)
+		goto unknown;
+
+	if (in_hardirq_stack(stack, info))
+		return 0;
+
+	if (in_softirq_stack(stack, info))
+		return 0;
+
+unknown:
+	info->type = STACK_TYPE_UNKNOWN;
+	return -EINVAL;
 }
 
 void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
+	unsigned long visit_mask = 0;
 	int graph = 0;
-	u32 *prev_esp;
 
 	task = task ? : current;
 	stack = stack ? : get_stack_pointer(task, regs);
 	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
 	for (;;) {
-		void *end_stack;
+		const char *begin_str, *end_str;
+		struct stack_info info;
 
-		end_stack = is_hardirq_stack(stack);
-		if (!end_stack)
-			end_stack = is_softirq_stack(stack);
+		if (get_stack_info(stack, task, &info, &visit_mask))
+			break;
 
-		bp = ops->walk_stack(task, stack, bp, ops, data,
-				     end_stack, &graph);
+		stack_type_str(info.type, &begin_str, &end_str);
 
-		/* Stop if not on irq stack */
-		if (!end_stack)
+		if (begin_str && ops->stack(data, begin_str) < 0)
 			break;
 
-		/* The previous esp is saved on the bottom of the stack */
-		prev_esp = (u32 *)(end_stack - THREAD_SIZE);
-		stack = (unsigned long *)*prev_esp;
-		if (!stack)
-			break;
+		bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
 
-		if (ops->stack(data, "IRQ") < 0)
+		if (end_str && ops->stack(data, end_str) < 0)
 			break;
+
+		stack = info.next_sp;
+
 		touch_nmi_watchdog();
 	}
 }
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 69f6ba2..2e8c750 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -28,17 +28,38 @@ static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	[DEBUG_STACK - 1]			= DEBUG_STKSZ
 };
 
-static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
-					 char **idp)
+void stack_type_str(enum stack_type type, const char **begin, const char **end)
 {
-	unsigned long begin, end;
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+
+	switch (type) {
+	case STACK_TYPE_IRQ:
+		*begin = "IRQ";
+		*end   = "EOI";
+		break;
+	case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
+		*begin = exception_stack_names[type - STACK_TYPE_EXCEPTION];
+		*end   = "EOE";
+		break;
+	default:
+		*begin = NULL;
+		*end   = NULL;
+	}
+}
+
+static bool in_exception_stack(unsigned long *stack, struct stack_info *info,
+			       unsigned long *visit_mask)
+{
+	unsigned long *begin, *end;
+	struct pt_regs *regs;
 	unsigned k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
 
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		end   = raw_cpu_ptr(&orig_ist)->ist[k];
-		begin = end - exception_stack_sizes[k];
+		end   = (unsigned long *)raw_cpu_ptr(&orig_ist)->ist[k];
+		begin = end - (exception_stack_sizes[k] / sizeof(long));
+		regs  = (struct pt_regs *)end - 1;
 
 		if (stack < begin || stack >= end)
 			continue;
@@ -48,56 +69,67 @@ static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
 		 * If it comes up for the second time then there's something
 		 * wrong going on - just break and return NULL:
 		 */
-		if (*usedp & (1U << k))
+		if (*visit_mask & (1U << k))
 			break;
-		*usedp |= 1U << k;
+		*visit_mask |= 1U << k;
 
-		*idp = exception_stack_names[k];
-		return (unsigned long *)end;
+		info->type	= STACK_TYPE_EXCEPTION + k;
+		info->begin	= begin;
+		info->end	= end;
+		info->next_sp	= (unsigned long *)regs->sp;
+
+		return true;
 	}
 
-	return NULL;
+	return false;
 }
 
-static inline int
-in_irq_stack(unsigned long *stack, unsigned long *irq_stack,
-	     unsigned long *irq_stack_end)
+static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
-	return (stack >= irq_stack && stack < irq_stack_end);
-}
+	unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
+	unsigned long *begin = end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
-enum stack_type {
-	STACK_IS_UNKNOWN,
-	STACK_IS_NORMAL,
-	STACK_IS_EXCEPTION,
-	STACK_IS_IRQ,
-};
+	if (stack < begin || stack >= end)
+		return false;
+
+	info->type	= STACK_TYPE_IRQ;
+	info->begin	= begin;
+	info->end	= end;
+
+	/*
+	 * The next stack pointer is the first thing pushed by the entry code
+	 * after switching to the irq stack.
+	 */
+	info->next_sp = (unsigned long *)*(end - 1);
+
+	return true;
+}
 
-static enum stack_type
-analyze_stack(struct task_struct *task, unsigned long *stack,
-	      unsigned long **stack_end, unsigned long *irq_stack,
-	      unsigned *used, char **id)
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info, unsigned long *visit_mask)
 {
-	unsigned long addr;
+	if (!stack)
+		goto unknown;
 
-	addr = ((unsigned long)stack & (~(THREAD_SIZE - 1)));
-	if ((unsigned long)task_stack_page(task) == addr)
-		return STACK_IS_NORMAL;
+	task = task ? : current;
+
+	if (in_task_stack(stack, task, info))
+		return 0;
 
-	*stack_end = in_exception_stack((unsigned long)stack, used, id);
-	if (*stack_end)
-		return STACK_IS_EXCEPTION;
+	if (task != current)
+		goto unknown;
 
-	if (!irq_stack)
-		return STACK_IS_NORMAL;
+	if (in_exception_stack(stack, info, visit_mask))
+		return 0;
 
-	*stack_end = irq_stack;
-	irq_stack -= (IRQ_USABLE_STACK_SIZE / sizeof(long));
+	if (in_irq_stack(stack, info))
+		return 0;
 
-	if (in_irq_stack(stack, irq_stack, *stack_end))
-		return STACK_IS_IRQ;
+	return 0;
 
-	return STACK_IS_UNKNOWN;
+unknown:
+	info->type = STACK_TYPE_UNKNOWN;
+	return -EINVAL;
 }
 
 /*
@@ -111,8 +143,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	unsigned long *irq_stack = (unsigned long *)this_cpu_read(irq_stack_ptr);
-	unsigned used = 0;
+	unsigned long visit_mask = 0;
+	struct stack_info info;
 	int graph = 0;
 	int done = 0;
 
@@ -126,57 +158,37 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * exceptions
 	 */
 	while (!done) {
-		unsigned long *stack_end;
-		enum stack_type stype;
-		char *id;
+		const char *begin_str, *end_str;
 
-		stype = analyze_stack(task, stack, &stack_end, irq_stack, &used,
-				      &id);
+		get_stack_info(stack, task, &info, &visit_mask);
 
 		/* Default finish unless specified to continue */
 		done = 1;
 
-		switch (stype) {
+		switch (info.type) {
 
 		/* Break out early if we are on the thread stack */
-		case STACK_IS_NORMAL:
+		case STACK_TYPE_TASK:
 			break;
 
-		case STACK_IS_EXCEPTION:
+		case STACK_TYPE_IRQ:
+		case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
+
+			stack_type_str(info.type, &begin_str, &end_str);
 
-			if (ops->stack(data, id) < 0)
+			if (ops->stack(data, begin_str) < 0)
 				break;
 
 			bp = ops->walk_stack(task, stack, bp, ops,
-					     data, stack_end, &graph);
-			ops->stack(data, "EOE");
-			/*
-			 * We link to the next stack via the
-			 * second-to-last pointer (index -2 to end) in the
-			 * exception stack:
-			 */
-			stack = (unsigned long *) stack_end[-2];
-			done = 0;
-			break;
+					     data, &info, &graph);
 
-		case STACK_IS_IRQ:
+			ops->stack(data, end_str);
 
-			if (ops->stack(data, "IRQ") < 0)
-				break;
-			bp = ops->walk_stack(task, stack, bp,
-				     ops, data, stack_end, &graph);
-			/*
-			 * We link to the next stack (which would be
-			 * the process stack normally) the last
-			 * pointer (index -1 to end) in the IRQ stack:
-			 */
-			stack = (unsigned long *) (stack_end[-1]);
-			irq_stack = NULL;
-			ops->stack(data, "EOI");
+			stack = info.next_sp;
 			done = 0;
 			break;
 
-		case STACK_IS_UNKNOWN:
+		default:
 			ops->stack(data, "UNK");
 			break;
 		}
@@ -185,7 +197,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	/*
 	 * This handles the process stack:
 	 */
-	bp = ops->walk_stack(task, stack, bp, ops, data, NULL, &graph);
+	bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
 }
 EXPORT_SYMBOL(dump_trace);
 
@@ -193,8 +205,7 @@ void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		   unsigned long *sp, unsigned long bp, char *log_lvl)
 {
-	unsigned long *irq_stack_end;
-	unsigned long *irq_stack;
+	unsigned long *irq_stack, *irq_stack_end;
 	unsigned long *stack;
 	int i;
 
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 4738f5e..785aef1 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -9,7 +9,7 @@
 #include <linux/uaccess.h>
 #include <asm/stacktrace.h>
 
-static int save_stack_stack(void *data, char *name)
+static int save_stack_stack(void *data, const char *name)
 {
 	return 0;
 }
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index d950f9e..7539148 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -17,7 +17,7 @@
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
 
-static int backtrace_stack(void *data, char *name)
+static int backtrace_stack(void *data, const char *name)
 {
 	/* Yes, we want all stacks */
 	return 0;
-- 
2.7.4


* [PATCH v2 29/44] x86/dumpstack: add recursion checking for all stacks
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (27 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 28/44] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
                   ` (14 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

in_exception_stack() has some recursion checking which makes sure the
stack trace code never traverses a given exception stack more than once.
Otherwise corruption could cause a stack to point to itself (directly or
indirectly), resulting in an infinite loop.

Extend the recursion checking to all stacks.
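
A small userspace sketch of the visit-mask idea, under the assumption
that each stack type maps to one bit.  The enum values, the next[]
link table, and the names mark_visited() and walk_stacks() are
hypothetical and only model how a corrupted "next stack" link gets
cut off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stack types; one bit per type in the visit mask. */
enum stack_type {
	STACK_TYPE_UNKNOWN,
	STACK_TYPE_TASK,
	STACK_TYPE_IRQ,
	STACK_TYPE_EXCEPTION,
	STACK_TYPE_COUNT,
};

/*
 * Return true the first time a type is seen, false on a repeat.
 * A NULL mask disables the check.
 */
static bool mark_visited(unsigned long *visit_mask, enum stack_type type)
{
	unsigned long bit = 1UL << type;

	if (!visit_mask)
		return true;
	if (*visit_mask & bit)
		return false;

	*visit_mask |= bit;
	return true;
}

/*
 * Follow next[] links from 'start' and count the stacks visited.
 * next[t] is the type of the stack that stack t links to.
 */
static int walk_stacks(const enum stack_type *next, enum stack_type start)
{
	unsigned long visit_mask = 0;
	enum stack_type cur = start;
	int visited = 0;

	assert(next != NULL);

	while (cur != STACK_TYPE_UNKNOWN && mark_visited(&visit_mask, cur)) {
		visited++;
		cur = next[cur];
	}

	return visited;
}
```

With a sane chain the walk ends at the task stack; with a link that
points back to an already visited stack it stops instead of looping.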

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_32.c | 22 +++++++++++++++++++---
 arch/x86/kernel/dumpstack_64.c | 34 +++++++++++++++++++---------------
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 51a113b..37d9c30 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -81,16 +81,32 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
 	task = task ? : current;
 
 	if (in_task_stack(stack, task, info))
-		return 0;
+		goto recursion_check;
 
 	if (task != current)
 		goto unknown;
 
 	if (in_hardirq_stack(stack, info))
-		return 0;
+		goto recursion_check;
 
 	if (in_softirq_stack(stack, info))
-		return 0;
+		goto recursion_check;
+
+	goto unknown;
+
+recursion_check:
+	/*
+	 * Make sure we don't iterate through any given stack more than once.
+	 * If it comes up a second time then there's something wrong going on:
+	 * just break out and report an unknown stack type.
+	 */
+	if (visit_mask) {
+		if (*visit_mask & (1UL << info->type))
+			goto unknown;
+		*visit_mask |= 1UL << info->type;
+	}
+
+	return 0;
 
 unknown:
 	info->type = STACK_TYPE_UNKNOWN;
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 2e8c750..2292292 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -47,8 +47,7 @@ void stack_type_str(enum stack_type type, const char **begin, const char **end)
 	}
 }
 
-static bool in_exception_stack(unsigned long *stack, struct stack_info *info,
-			       unsigned long *visit_mask)
+static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
 	unsigned long *begin, *end;
 	struct pt_regs *regs;
@@ -64,15 +63,6 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info,
 		if (stack < begin || stack >= end)
 			continue;
 
-		/*
-		 * Make sure we only iterate through an exception stack once.
-		 * If it comes up for the second time then there's something
-		 * wrong going on - just break and return NULL:
-		 */
-		if (*visit_mask & (1U << k))
-			break;
-		*visit_mask |= 1U << k;
-
 		info->type	= STACK_TYPE_EXCEPTION + k;
 		info->begin	= begin;
 		info->end	= end;
@@ -114,16 +104,30 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
 	task = task ? : current;
 
 	if (in_task_stack(stack, task, info))
-		return 0;
+		goto recursion_check;
 
 	if (task != current)
 		goto unknown;
 
-	if (in_exception_stack(stack, info, visit_mask))
-		return 0;
+	if (in_exception_stack(stack, info))
+		goto recursion_check;
 
 	if (in_irq_stack(stack, info))
-		return 0;
+		goto recursion_check;
+
+	goto unknown;
+
+recursion_check:
+	/*
+	 * Make sure we don't iterate through any given stack more than once.
+	 * If it comes up a second time then there's something wrong going on:
+	 * just break out and report an unknown stack type.
+	 */
+	if (visit_mask) {
+		if (*visit_mask & (1UL << info->type))
+			goto unknown;
+		*visit_mask |= 1UL << info->type;
+	}
 
 	return 0;
 
-- 
2.7.4


* [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (28 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 29/44] x86/dumpstack: add recursion checking for all stacks Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-09 23:17   ` Nilay Vaish
  2016-08-04 22:22 ` [PATCH v2 31/44] perf/x86: convert perf_callchain_kernel() to use the new unwinder Josh Poimboeuf
                   ` (13 subsequent siblings)
  43 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

The x86 stack dump code is a bit of a mess.  dump_trace() uses
callbacks, and each user of it seems to have slightly different
requirements, so there are several slightly different callbacks floating
around.

Also there are some upcoming features which will require more changes to
the stack dump code: reliable stack detection for live patching,
hardened user copy, and the DWARF unwinder.  Each of those features
would at least need more callbacks and/or callback interfaces, resulting
in a much bigger mess than what we have today.

Before doing all that, we should try to clean things up and replace
dump_trace() with something cleaner and more flexible.

The new unwinder is a simple state machine which was heavily inspired by
a suggestion from Andy Lutomirski:

  https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com

It's also very similar to the libunwind API:

  http://www.nongnu.org/libunwind/man/libunwind(3).html

Some of its advantages:

- Simplicity: no more callback sprawl and less code duplication.

- Flexibility: it allows the caller to stop and inspect the stack state
  at each step in the unwinding process.

- Modularity: the unwinder code, console stack dump code, and stack
  metadata analysis code are all better separated so that changing one
  of them shouldn't have much of an impact on any of the others.
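
To make the caller-driven style concrete, here is a toy userspace
model of the start / done / next_frame loop.  The toy_* names and the
linked toy_frame records are assumptions for illustration; they only
imitate the shape of the interface and are not the kernel
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy frame record: link to the caller's frame plus a return address. */
struct toy_frame {
	const struct toy_frame *next;
	unsigned long return_address;
};

struct toy_unwind_state {
	const struct toy_frame *frame;	/* NULL once unwinding is done */
};

static void toy_unwind_start(struct toy_unwind_state *state,
			     const struct toy_frame *first)
{
	assert(state != NULL);
	state->frame = first;
}

static bool toy_unwind_done(const struct toy_unwind_state *state)
{
	return state->frame == NULL;
}

static unsigned long
toy_unwind_get_return_address(const struct toy_unwind_state *state)
{
	return toy_unwind_done(state) ? 0 : state->frame->return_address;
}

static bool toy_unwind_next_frame(struct toy_unwind_state *state)
{
	if (toy_unwind_done(state))
		return false;

	state->frame = state->frame->next;
	return !toy_unwind_done(state);
}

/* The caller owns the loop: it can stop, filter or inspect at each step. */
static size_t toy_save_trace(const struct toy_frame *first,
			     unsigned long *out, size_t max)
{
	struct toy_unwind_state state;
	size_t n = 0;

	for (toy_unwind_start(&state, first);
	     !toy_unwind_done(&state) && n < max;
	     toy_unwind_next_frame(&state))
		out[n++] = toy_unwind_get_return_address(&state);

	return n;
}
```

Note that the "stop after max entries" policy lives in the caller's
loop condition rather than in a callback return value.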

Two implementations are added which conform to the new unwind interface:

- The frame pointer unwinder which is used for CONFIG_FRAME_POINTER=y.

- The "guess" unwinder which is used for CONFIG_FRAME_POINTER=n.  This
  isn't an "unwinder" per se.  All it does is scan the stack for kernel
  text addresses.  But with no frame pointers, guesses are better than
  nothing in most cases.
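
What "scan the stack for kernel text addresses" amounts to can be
sketched in userspace as follows.  The single [text_start, text_end)
range is a stand-in assumption for the kernel's text address check,
and toy_guess_scan() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a kernel text check: one contiguous text range. */
static bool toy_is_text(unsigned long addr, unsigned long text_start,
			unsigned long text_end)
{
	return addr >= text_start && addr < text_end;
}

/*
 * Scan the words in [begin, end) and record every value that looks like
 * a text address.  Returns the number of guesses stored in out[].
 */
static size_t toy_guess_scan(const unsigned long *begin,
			     const unsigned long *end,
			     unsigned long text_start, unsigned long text_end,
			     unsigned long *out, size_t max)
{
	size_t n = 0;

	assert(begin != NULL && end != NULL && out != NULL);

	for (const unsigned long *sp = begin; sp < end && n < max; sp++) {
		if (toy_is_text(*sp, text_start, text_end))
			out[n++] = *sp;
	}

	return n;
}
```

Stale return addresses left over from earlier calls would match too,
which is why the results are only guesses.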

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/unwind.h  | 93 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/Makefile       |  6 +++
 arch/x86/kernel/unwind_frame.c | 84 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/unwind_guess.c | 40 ++++++++++++++++++
 4 files changed, 223 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind.h
 create mode 100644 arch/x86/kernel/unwind_frame.c
 create mode 100644 arch/x86/kernel/unwind_guess.c

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
new file mode 100644
index 0000000..c4fdd58
--- /dev/null
+++ b/arch/x86/include/asm/unwind.h
@@ -0,0 +1,93 @@
+#ifndef _ASM_X86_UNWIND_H
+#define _ASM_X86_UNWIND_H
+
+#include <linux/sched.h>
+#include <linux/ftrace.h>
+#include <asm/ptrace.h>
+#include <asm/stacktrace.h>
+
+struct unwind_state {
+	struct stack_info stack_info;
+	unsigned long stack_mask;
+	struct task_struct *task;
+	int graph_idx;
+#ifdef CONFIG_FRAME_POINTER
+	unsigned long *bp;
+#else
+	unsigned long *sp;
+#endif
+};
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *sp);
+
+bool unwind_next_frame(struct unwind_state *state);
+
+
+#ifdef CONFIG_FRAME_POINTER
+
+static inline
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+{
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return NULL;
+
+	return state->bp + 1;
+}
+
+static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
+{
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return NULL;
+
+	return state->bp;
+}
+
+unsigned long unwind_get_return_address(struct unwind_state *state);
+
+#else /* !CONFIG_FRAME_POINTER */
+
+static inline
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+{
+	return NULL;
+}
+
+static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
+{
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return NULL;
+
+	return state->sp;
+}
+
+static inline
+unsigned long unwind_get_return_address(struct unwind_state *state)
+{
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return 0;
+
+	return ftrace_graph_ret_addr(state->task, &state->graph_idx,
+				     *state->sp, state->sp);
+}
+
+#endif /* CONFIG_FRAME_POINTER */
+
+static inline bool unwind_done(struct unwind_state *state)
+{
+	return (state->stack_info.type == STACK_TYPE_UNKNOWN);
+}
+
+static inline
+void unwind_start(struct unwind_state *state, struct task_struct *task,
+		  struct pt_regs *regs, unsigned long *sp)
+{
+	if (!task)
+		task = current;
+
+	sp = sp ? : get_stack_pointer(task, regs);
+
+	__unwind_start(state, task, regs, sp);
+}
+
+#endif /* _ASM_X86_UNWIND_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0503f5b..45257cf 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -125,6 +125,12 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 
+ifdef CONFIG_FRAME_POINTER
+obj-y					+= unwind_frame.o
+else
+obj-y					+= unwind_guess.o
+endif
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
new file mode 100644
index 0000000..f28f1b5
--- /dev/null
+++ b/arch/x86/kernel/unwind_frame.c
@@ -0,0 +1,84 @@
+#include <linux/sched.h>
+#include <asm/ptrace.h>
+#include <asm/bitops.h>
+#include <asm/stacktrace.h>
+#include <asm/unwind.h>
+
+#define FRAME_HEADER_SIZE (sizeof(long) * 2)
+
+unsigned long unwind_get_return_address(struct unwind_state *state)
+{
+	unsigned long *addr_p = unwind_get_return_address_ptr(state);
+	unsigned long addr;
+
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return 0;
+
+	addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
+				     addr_p);
+
+	return __kernel_text_address(addr) ? addr : 0;
+}
+EXPORT_SYMBOL_GPL(unwind_get_return_address);
+
+static bool update_stack_state(struct unwind_state *state, void *addr,
+			       size_t len)
+{
+	struct stack_info *info = &state->stack_info;
+
+	if (on_stack(info, addr, len))
+		return true;
+
+	if (get_stack_info(info->next_sp, state->task, info,
+			   &state->stack_mask))
+		goto unknown;
+
+	if (!on_stack(info, addr, len))
+		goto unknown;
+
+	return true;
+
+unknown:
+	info->type = STACK_TYPE_UNKNOWN;
+	return false;
+}
+
+bool unwind_next_frame(struct unwind_state *state)
+{
+	unsigned long *next_bp;
+
+	if (unwind_done(state))
+		return false;
+
+	next_bp = (unsigned long *)*state->bp;
+
+	/*
+	 * Make sure the next frame is on a valid stack and can be accessed
+	 * safely.
+	 */
+	if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
+		return false;
+
+	/* move to the next frame */
+	state->bp = next_bp;
+	return true;
+}
+EXPORT_SYMBOL_GPL(unwind_next_frame);
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *sp)
+{
+	memset(state, 0, sizeof(*state));
+
+	state->task = task;
+	state->bp = get_frame_pointer(task, regs);
+
+	get_stack_info(state->bp, state->task, &state->stack_info,
+		       &state->stack_mask);
+	update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
+
+	/* unwind to the first frame after the specified stack pointer */
+	while (state->bp < sp && !unwind_done(state))
+		unwind_next_frame(state);
+}
+EXPORT_SYMBOL_GPL(__unwind_start);
diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
new file mode 100644
index 0000000..e03df5a
--- /dev/null
+++ b/arch/x86/kernel/unwind_guess.c
@@ -0,0 +1,40 @@
+#include <linux/sched.h>
+#include <linux/ftrace.h>
+#include <asm/ptrace.h>
+#include <asm/bitops.h>
+#include <asm/stacktrace.h>
+#include <asm/unwind.h>
+
+bool unwind_next_frame(struct unwind_state *state)
+{
+	struct stack_info *info = &state->stack_info;
+
+	if (info->type == STACK_TYPE_UNKNOWN)
+		return false;
+
+	do {
+		for (state->sp++; state->sp < info->end; state->sp++)
+			if (__kernel_text_address(*state->sp))
+				return true;
+
+		state->sp = info->next_sp;
+
+	} while (!get_stack_info(state->sp, state->task, info,
+				 &state->stack_mask));
+
+	return false;
+}
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *sp)
+{
+	memset(state, 0, sizeof(*state));
+
+	state->task = task;
+	state->sp   = sp;
+
+	get_stack_info(sp, state->task, &state->stack_info, &state->stack_mask);
+
+	if (!__kernel_text_address(*sp))
+		unwind_next_frame(state);
+}
-- 
2.7.4

* [PATCH v2 31/44] perf/x86: convert perf_callchain_kernel() to use the new unwinder
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (29 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 32/44] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Convert perf_callchain_kernel() to use the new unwinder.  dump_trace()
has been deprecated.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/events/core.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 477dc38..113f90e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -37,6 +37,7 @@
 #include <asm/timer.h>
 #include <asm/desc.h>
 #include <asm/ldt.h>
+#include <asm/unwind.h>
 
 #include "perf_event.h"
 
@@ -2247,31 +2248,12 @@ void arch_perf_update_userpage(struct perf_event *event,
 	cyc2ns_read_end(data);
 }
 
-/*
- * callchain support
- */
-
-static int backtrace_stack(void *data, const char *name)
-{
-	return 0;
-}
-
-static int backtrace_address(void *data, unsigned long addr, int reliable)
-{
-	struct perf_callchain_entry_ctx *entry = data;
-
-	return perf_callchain_store(entry, addr);
-}
-
-static const struct stacktrace_ops backtrace_ops = {
-	.stack			= backtrace_stack,
-	.address		= backtrace_address,
-	.walk_stack		= print_context_stack_bp,
-};
-
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
+	struct unwind_state state;
+	unsigned long addr;
+
 	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
@@ -2280,7 +2262,12 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	if (perf_callchain_store(entry, regs->ip))
 		return;
 
-	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
+	for (unwind_start(&state, NULL, regs, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr || perf_callchain_store(entry, addr))
+			return;
+	}
 }
 
 static inline int
-- 
2.7.4

* [PATCH v2 32/44] x86/stacktrace: convert save_stack_trace_*() to use the new unwinder
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (30 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 31/44] perf/x86: convert perf_callchain_kernel() to use the new unwinder Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 33/44] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Convert save_stack_trace_*() to use the new unwinder.  dump_trace() has
been deprecated.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/stacktrace.c | 74 +++++++++++++++++---------------------------
 1 file changed, 29 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 785aef1..a168e7e 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -8,80 +8,64 @@
 #include <linux/export.h>
 #include <linux/uaccess.h>
 #include <asm/stacktrace.h>
+#include <asm/unwind.h>
 
-static int save_stack_stack(void *data, const char *name)
+static int save_stack_address(struct stack_trace *trace, unsigned long addr,
+			      bool nosched)
 {
-	return 0;
-}
-
-static int
-__save_stack_address(void *data, unsigned long addr, bool reliable, bool nosched)
-{
-	struct stack_trace *trace = data;
-#ifdef CONFIG_FRAME_POINTER
-	if (!reliable)
-		return 0;
-#endif
 	if (nosched && in_sched_functions(addr))
 		return 0;
+
 	if (trace->skip > 0) {
 		trace->skip--;
 		return 0;
 	}
-	if (trace->nr_entries < trace->max_entries) {
-		trace->entries[trace->nr_entries++] = addr;
-		return 0;
-	} else {
-		return -1; /* no more room, stop walking the stack */
-	}
-}
 
-static int save_stack_address(void *data, unsigned long addr, int reliable)
-{
-	return __save_stack_address(data, addr, reliable, false);
+	if (trace->nr_entries >= trace->max_entries)
+		return -1;
+
+	trace->entries[trace->nr_entries++] = addr;
+	return 0;
 }
 
-static int
-save_stack_address_nosched(void *data, unsigned long addr, int reliable)
+static void __save_stack_trace(struct stack_trace *trace,
+			       struct task_struct *task, struct pt_regs *regs,
+			       bool nosched)
 {
-	return __save_stack_address(data, addr, reliable, true);
-}
+	struct unwind_state state;
+	unsigned long addr;
 
-static const struct stacktrace_ops save_stack_ops = {
-	.stack		= save_stack_stack,
-	.address	= save_stack_address,
-	.walk_stack	= print_context_stack,
-};
+	if (regs)
+		save_stack_address(trace, regs->ip, nosched);
 
-static const struct stacktrace_ops save_stack_ops_nosched = {
-	.stack		= save_stack_stack,
-	.address	= save_stack_address_nosched,
-	.walk_stack	= print_context_stack,
-};
+	for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr || save_stack_address(trace, addr, nosched))
+			break;
+	}
+
+	if (trace->nr_entries < trace->max_entries)
+		trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
 
 /*
  * Save stack-backtrace addresses into a stack_trace buffer.
  */
 void save_stack_trace(struct stack_trace *trace)
 {
-	dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
-	if (trace->nr_entries < trace->max_entries)
-		trace->entries[trace->nr_entries++] = ULONG_MAX;
+	__save_stack_trace(trace, NULL, NULL, false);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace);
 
 void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
 {
-	dump_trace(current, regs, NULL, 0, &save_stack_ops, trace);
-	if (trace->nr_entries < trace->max_entries)
-		trace->entries[trace->nr_entries++] = ULONG_MAX;
+	__save_stack_trace(trace, NULL, regs, false);
 }
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
-	dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_nosched, trace);
-	if (trace->nr_entries < trace->max_entries)
-		trace->entries[trace->nr_entries++] = ULONG_MAX;
+	__save_stack_trace(trace, tsk, NULL, true);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
-- 
2.7.4

* [PATCH v2 33/44] oprofile/x86: convert x86_backtrace() to use the new unwinder
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (31 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 32/44] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 34/44] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Robert Richter

Convert oprofile's x86_backtrace() to use the new unwinder.
dump_trace() has been deprecated.

Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/oprofile/backtrace.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 7539148..f28ac1a 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -16,27 +16,7 @@
 
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
-
-static int backtrace_stack(void *data, const char *name)
-{
-	/* Yes, we want all stacks */
-	return 0;
-}
-
-static int backtrace_address(void *data, unsigned long addr, int reliable)
-{
-	unsigned int *depth = data;
-
-	if ((*depth)--)
-		oprofile_add_trace(addr);
-	return 0;
-}
-
-static struct stacktrace_ops backtrace_ops = {
-	.stack		= backtrace_stack,
-	.address	= backtrace_address,
-	.walk_stack	= print_context_stack,
-};
+#include <asm/unwind.h>
 
 #ifdef CONFIG_COMPAT
 static struct stack_frame_ia32 *
@@ -113,14 +93,29 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
+		struct unwind_state state;
+		unsigned long addr;
+
 		if (!depth)
 			return;
 
 		oprofile_add_trace(regs->ip);
+
 		if (!--depth)
 			return;
 
-		dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
+		for (unwind_start(&state, NULL, regs, NULL);
+		     !unwind_done(&state); unwind_next_frame(&state)) {
+			addr = unwind_get_return_address(&state);
+			if (!addr)
+				break;
+
+			oprofile_add_trace(addr);
+
+			if (!--depth)
+				break;
+		}
+
 		return;
 	}
 
-- 
2.7.4

* [PATCH v2 34/44] x86/dumpstack: convert show_trace_log_lvl() to use the new unwinder
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (32 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 33/44] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 35/44] x86/dumpstack: remove dump_trace() and related callbacks Josh Poimboeuf
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Convert show_trace_log_lvl() to use the new unwinder.  dump_trace() has
been deprecated.

show_trace_log_lvl() is special compared to other users of the unwinder.
It's the only place where both reliable *and* unreliable addresses are
needed.  With frame pointers enabled, most stack walking code doesn't
want to know about unreliable addresses.  But in this case, when we're
dumping the stack to the console because something presumably went
wrong, the unreliable addresses are useful:

- They show stale data on the stack which can provide useful clues.

- If something goes wrong with the unwinder, or if frame pointers are
  corrupt or missing, all the stack addresses still get shown.

So in order to show all addresses on the stack, and at the same time
figure out which addresses are reliable, we have to do the scanning and
the unwinding in parallel.

The scanning is done with the help of get_stack_info() to traverse the
stacks.  The unwinding is done separately by the new unwinder.

In theory we could simplify show_trace_log_lvl() by instead pushing some
of this logic into the unwind code.  But then we would need some kind of
"fake" frame logic in the unwinder which would add a lot of complexity
and wouldn't be worth it in order to support only one user.

Another benefit of this approach is that once we have a DWARF unwinder,
we should be able to just plug it in with minimal impact to this code.

Another change here is that callers of show_trace_log_lvl() don't need
to provide the 'bp' argument.  The unwinder already finds the relevant
frame pointer by unwinding until it reaches the first frame after the
provided stack pointer.
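
The "scan and unwind in parallel" idea described above can be sketched
in userspace as follows.  This is only an illustration under stated
assumptions, not the patch's code: the scan_* names, the array-based
mock stack, the index-based frame pointers and the address range used by
scan_is_text() (a stand-in for __kernel_text_address()) are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCAN_STACK_WORDS 12

struct scan_entry {
	unsigned long addr;
	bool reliable;		/* false corresponds to the '?' prefix */
};

/* Hypothetical stand-in for __kernel_text_address(). */
static bool scan_is_text(unsigned long addr)
{
	return addr >= 0x1000 && addr < 0x2000;
}

/*
 * Walk every word from sp to the end of the mock stack.  Each text
 * address is recorded; it is marked reliable only when it sits in the
 * return address slot (bp + 1) of the frame the unwinder is currently on.
 * A frame header is two words: saved frame pointer (an array index, 0
 * terminates the chain) followed by the return address.
 */
static size_t scan_stack(const unsigned long *stack, size_t sp, size_t bp,
			 struct scan_entry *out, size_t max)
{
	bool unwinding = (bp < SCAN_STACK_WORDS - 1);
	size_t n = 0;

	assert(out != NULL);

	for (; sp < SCAN_STACK_WORDS && n < max; sp++) {
		unsigned long addr = stack[sp];
		bool reliable;
		size_t next_bp;

		if (!scan_is_text(addr))
			continue;

		reliable = unwinding && sp == bp + 1;
		out[n].addr = addr;
		out[n].reliable = reliable;
		n++;

		if (!reliable)
			continue;

		/*
		 * Advance the unwinder.  If the chain ends or looks bogus,
		 * stop unwinding; later addresses are still recorded, just
		 * as unreliable.
		 */
		next_bp = (size_t)stack[bp];
		if (next_bp <= bp || next_bp >= SCAN_STACK_WORDS - 1)
			unwinding = false;
		else
			bp = next_bp;
	}

	return n;
}

/* Mock stack with three real frames and two stale text addresses. */
size_t scan_demo(struct scan_entry *out, size_t max)
{
	unsigned long stack[SCAN_STACK_WORDS] = { 0 };

	stack[1] = 5;	stack[2] = 0x1010;	/* innermost frame */
	stack[3] = 0x1abc;			/* stale text address */
	stack[4] = 0x7;				/* plain data */
	stack[5] = 9;	stack[6] = 0x1020;
	stack[8] = 0x1def;			/* stale text address */
	stack[9] = 0;	stack[10] = 0x1030;	/* outermost frame */

	return scan_stack(stack, 0, 1, out, max);
}
```

In this model a corrupt frame pointer only turns off the reliable
marking; the scan still reports every text address it finds, which is
the failsafe property the commit message describes.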

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h |  10 ++-
 arch/x86/kernel/dumpstack.c       | 125 ++++++++++++++++++++++++++++++--------
 arch/x86/kernel/dumpstack_32.c    |   6 +-
 arch/x86/kernel/dumpstack_64.c    |  10 +--
 4 files changed, 111 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index be9273c..0a5acde 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -119,13 +119,11 @@ get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
 	return (unsigned long *)task->thread.sp;
 }
 
-extern void
-show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *stack, unsigned long bp, char *log_lvl);
+void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *stack, char *log_lvl);
 
-extern void
-show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl);
+void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *sp, char *log_lvl);
 
 extern unsigned int code_bytes;
 
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index aa208e5..6a754b9 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -17,7 +17,7 @@
 #include <linux/sysfs.h>
 
 #include <asm/stacktrace.h>
-
+#include <asm/unwind.h>
 
 int panic_on_unrecovered_nmi;
 int panic_on_io_nmi;
@@ -142,33 +142,106 @@ print_context_stack_bp(struct task_struct *task,
 }
 EXPORT_SYMBOL_GPL(print_context_stack_bp);
 
-static int print_trace_stack(void *data, const char *name)
+void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *stack, char *log_lvl)
 {
-	printk("%s <%s> ", (char *)data, name);
-	return 0;
-}
+	struct unwind_state state;
+	struct stack_info stack_info = {0};
+	unsigned long visit_mask = 0;
+	int graph_idx = 0;
 
-/*
- * Print one address/symbol entries per line.
- */
-static int print_trace_address(void *data, unsigned long addr, int reliable)
-{
-	printk_stack_address(addr, reliable, data);
-	return 0;
-}
+	printk("%sCall Trace:\n", log_lvl);
 
-static const struct stacktrace_ops print_trace_ops = {
-	.stack			= print_trace_stack,
-	.address		= print_trace_address,
-	.walk_stack		= print_context_stack,
-};
+	stack = stack ? : get_stack_pointer(task, regs);
+	if (!task)
+		task = current;
 
-void
-show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp, char *log_lvl)
-{
-	printk("%sCall Trace:\n", log_lvl);
-	dump_trace(task, regs, stack, bp, &print_trace_ops, log_lvl);
+	unwind_start(&state, task, regs, stack);
+
+	/*
+	 * Iterate through the stacks, starting with the current stack pointer.
+	 * Each stack has a pointer to the next one.
+	 *
+	 * x86-64 can have several stacks:
+	 * - task stack
+	 * - interrupt stack
+	 * - HW exception stacks (double fault, nmi, debug, mce)
+	 *
+	 * x86-32 can have up to three stacks:
+	 * - task stack
+	 * - softirq stack
+	 * - hardirq stack
+	 */
+	for (; stack; stack = stack_info.next_sp) {
+		const char *str_begin, *str_end;
+
+		/*
+		 * If we overflowed the task stack into a guard page, jump back
+		 * to the bottom of the usable stack.
+		 */
+		if (task_stack_page(task) - (void *)stack < PAGE_SIZE)
+			stack = task_stack_page(task);
+
+		if (get_stack_info(stack, task, &stack_info, &visit_mask))
+			break;
+
+		stack_type_str(stack_info.type, &str_begin, &str_end);
+		if (str_begin)
+			printk("%s <%s> ", log_lvl, str_begin);
+
+		/*
+		 * Scan the stack, printing any text addresses we find.  At the
+		 * same time, follow proper stack frames with the unwinder.
+		 *
+		 * Addresses found during the scan which are not reported by
+		 * the unwinder are considered to be additional clues which are
+		 * sometimes useful for debugging and are prefixed with '?'.
+		 * This also serves as a failsafe option in case the unwinder
+		 * goes off in the weeds.
+		 */
+		for (; stack < stack_info.end; stack++) {
+			unsigned long real_addr;
+			int reliable = 0;
+			unsigned long addr = *stack;
+			unsigned long *ret_addr_p =
+				unwind_get_return_address_ptr(&state);
+
+			if (!__kernel_text_address(addr))
+				continue;
+
+			if (stack == ret_addr_p)
+				reliable = 1;
+
+			/*
+			 * When function graph tracing is enabled for a
+			 * function, its return address on the stack is
+			 * replaced with the address of an ftrace handler
+			 * (return_to_handler).  In that case, before printing
+			 * the "real" address, we want to print the handler
+			 * address as an "unreliable" hint that function graph
+			 * tracing was involved.
+			 */
+			real_addr = ftrace_graph_ret_addr(task, &graph_idx,
+							  addr, stack);
+			if (real_addr != addr)
+				printk_stack_address(addr, 0, log_lvl);
+			printk_stack_address(real_addr, reliable, log_lvl);
+
+			if (!reliable)
+				continue;
+
+			/*
+			 * Get the next frame from the unwinder.  No need to
+			 * check for an error: if anything goes wrong with the
+			 * unwinder, the rest of the addresses will just be
+			 * printed as unreliable.
+			 */
+			unwind_next_frame(&state);
+		}
+
+		if (str_end)
+			printk("%s <%s> ", log_lvl, str_end);
+	}
 }
 
 void show_stack(struct task_struct *task, unsigned long *sp)
@@ -184,12 +257,12 @@ void show_stack(struct task_struct *task, unsigned long *sp)
 		bp = (unsigned long)get_frame_pointer(current, NULL);
 	}
 
-	show_stack_log_lvl(task, NULL, sp, bp, "");
+	show_stack_log_lvl(task, NULL, sp, "");
 }
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(current, regs, NULL, 0, "");
+	show_stack_log_lvl(NULL, regs, NULL, "");
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 37d9c30..922b722 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -150,7 +150,7 @@ EXPORT_SYMBOL(dump_trace);
 
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl)
+		   unsigned long *sp, char *log_lvl)
 {
 	unsigned long *stack;
 	int i;
@@ -170,7 +170,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		touch_nmi_watchdog();
 	}
 	pr_cont("\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 
@@ -192,7 +192,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		pr_emerg("Stack:\n");
-		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, NULL, KERN_EMERG);
 
 		pr_emerg("Code:");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 2292292..55ee1f3 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -15,6 +15,7 @@
 #include <linux/nmi.h>
 
 #include <asm/stacktrace.h>
+#include <asm/unwind.h>
 
 static char *exception_stack_names[N_EXCEPTION_STACKS] = {
 		[ DOUBLEFAULT_STACK-1	]	= "#DF",
@@ -205,9 +206,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 }
 EXPORT_SYMBOL(dump_trace);
 
-void
-show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl)
+void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *sp, char *log_lvl)
 {
 	unsigned long *irq_stack, *irq_stack_end;
 	unsigned long *stack;
@@ -247,7 +247,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	}
 
 	pr_cont("\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 void show_regs(struct pt_regs *regs)
@@ -268,7 +268,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_DEFAULT "Stack:\n");
-		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_DEFAULT);
+		show_stack_log_lvl(NULL, regs, NULL, KERN_DEFAULT);
 
 		printk(KERN_DEFAULT "Code: ");
 
-- 
2.7.4

* [PATCH v2 35/44] x86/dumpstack: remove dump_trace() and related callbacks
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (33 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 34/44] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 36/44] x86/entry/unwind: encode pt_regs pointer in frame pointer Josh Poimboeuf
                   ` (8 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

All previous users of dump_trace() have been converted to use the new
unwind interfaces, so we can remove it and the related
print_context_stack() and print_context_stack_bp() callback functions.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h | 36 ----------------
 arch/x86/kernel/dumpstack.c       | 86 ---------------------------------------
 arch/x86/kernel/dumpstack_32.c    | 35 ----------------
 arch/x86/kernel/dumpstack_64.c    | 69 -------------------------------
 4 files changed, 226 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 0a5acde..43a12d6 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -44,42 +44,6 @@ static inline bool on_stack(struct stack_info *info, void *addr, size_t len)
 
 extern int kstack_depth_to_print;
 
-struct thread_info;
-struct stacktrace_ops;
-
-typedef unsigned long (*walk_stack_t)(struct task_struct *task,
-				      unsigned long *stack,
-				      unsigned long bp,
-				      const struct stacktrace_ops *ops,
-				      void *data,
-				      struct stack_info *info,
-				      int *graph);
-
-extern unsigned long
-print_context_stack(struct task_struct *task,
-		    unsigned long *stack, unsigned long bp,
-		    const struct stacktrace_ops *ops, void *data,
-		    struct stack_info *info, int *graph);
-
-extern unsigned long
-print_context_stack_bp(struct task_struct *task,
-		       unsigned long *stack, unsigned long bp,
-		       const struct stacktrace_ops *ops, void *data,
-		       struct stack_info *info, int *graph);
-
-/* Generic stack tracer with callbacks */
-
-struct stacktrace_ops {
-	int (*address)(void *data, unsigned long address, int reliable);
-	/* On negative return stop dumping */
-	int (*stack)(void *data, const char *name);
-	walk_stack_t	walk_stack;
-};
-
-void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data);
-
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
 #else
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 6a754b9..bdec037 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -56,92 +56,6 @@ void printk_address(unsigned long address)
 	pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
 }
 
-/*
- * x86-64 can have up to three kernel stacks:
- * process stack
- * interrupt stack
- * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
- */
-
-unsigned long
-print_context_stack(struct task_struct *task,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data,
-		struct stack_info *info, int *graph)
-{
-	struct stack_frame *frame = (struct stack_frame *)bp;
-
-	/*
-	 * If we overflowed the stack into a guard page, jump back to the
-	 * bottom of the usable stack.
-	 */
-	if ((unsigned long)task_stack_page(task) - (unsigned long)stack <
-	    PAGE_SIZE)
-		stack = (unsigned long *)task_stack_page(task);
-
-	while (on_stack(info, stack, sizeof(*stack))) {
-		unsigned long addr = *stack;
-
-		if (__kernel_text_address(addr)) {
-			unsigned long real_addr;
-			int reliable = 0;
-
-			if ((unsigned long) stack == bp + sizeof(long)) {
-				reliable = 1;
-				frame = frame->next_frame;
-				bp = (unsigned long) frame;
-			}
-
-			/*
-			 * When function graph tracing is enabled for a
-			 * function, its return address on the stack is
-			 * replaced with the address of an ftrace handler
-			 * (return_to_handler).  In that case, before printing
-			 * the "real" address, we want to print the handler
-			 * address as an "unreliable" hint that function graph
-			 * tracing was involved.
-			 */
-			real_addr = ftrace_graph_ret_addr(task, graph, addr,
-							  stack);
-			if (real_addr != addr)
-				ops->address(data, addr, 0);
-
-			ops->address(data, real_addr, reliable);
-		}
-		stack++;
-	}
-	return bp;
-}
-EXPORT_SYMBOL_GPL(print_context_stack);
-
-unsigned long
-print_context_stack_bp(struct task_struct *task,
-		       unsigned long *stack, unsigned long bp,
-		       const struct stacktrace_ops *ops, void *data,
-		       struct stack_info *info, int *graph)
-{
-	struct stack_frame *frame = (struct stack_frame *)bp;
-	unsigned long *retp = &frame->return_address;
-
-	while (on_stack(info, stack, sizeof(*stack) * 2)) {
-		unsigned long addr = *retp;
-		unsigned long real_addr;
-
-		if (!__kernel_text_address(addr))
-			break;
-
-		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
-		if (ops->address(data, real_addr, 1))
-			break;
-
-		frame = frame->next_frame;
-		retp = &frame->return_address;
-	}
-
-	return (unsigned long)frame;
-}
-EXPORT_SYMBOL_GPL(print_context_stack_bp);
-
 void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *stack, char *log_lvl)
 {
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 922b722..53d939e 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -113,41 +113,6 @@ unknown:
 	return -EINVAL;
 }
 
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data)
-{
-	unsigned long visit_mask = 0;
-	int graph = 0;
-
-	task = task ? : current;
-	stack = stack ? : get_stack_pointer(task, regs);
-	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
-
-	for (;;) {
-		const char *begin_str, *end_str;
-		struct stack_info info;
-
-		if (get_stack_info(stack, task, &info, &visit_mask))
-			break;
-
-		stack_type_str(info.type, &begin_str, &end_str);
-
-		if (begin_str && ops->stack(data, begin_str) < 0)
-			break;
-
-		bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
-
-		if (end_str && ops->stack(data, end_str) < 0)
-			break;
-
-		stack = info.next_sp;
-
-		touch_nmi_watchdog();
-	}
-}
-EXPORT_SYMBOL(dump_trace);
-
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		   unsigned long *sp, char *log_lvl)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 55ee1f3..9f7a9f9 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -137,75 +137,6 @@ unknown:
 	return -EINVAL;
 }
 
-/*
- * x86-64 can have up to three kernel stacks:
- * process stack
- * interrupt stack
- * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
- */
-
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data)
-{
-	unsigned long visit_mask = 0;
-	struct stack_info info;
-	int graph = 0;
-	int done = 0;
-
-	task = task ? : current;
-	stack = stack ? : get_stack_pointer(task, regs);
-	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
-
-	/*
-	 * Print function call entries in all stacks, starting at the
-	 * current stack address. If the stacks consist of nested
-	 * exceptions
-	 */
-	while (!done) {
-		const char *begin_str, *end_str;
-
-		get_stack_info(stack, task, &info, &visit_mask);
-
-		/* Default finish unless specified to continue */
-		done = 1;
-
-		switch (info.type) {
-
-		/* Break out early if we are on the thread stack */
-		case STACK_TYPE_TASK:
-			break;
-
-		case STACK_TYPE_IRQ:
-		case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
-
-			stack_type_str(info.type, &begin_str, &end_str);
-
-			if (ops->stack(data, begin_str) < 0)
-				break;
-
-			bp = ops->walk_stack(task, stack, bp, ops,
-					     data, &info, &graph);
-
-			ops->stack(data, end_str);
-
-			stack = info.next_sp;
-			done = 0;
-			break;
-
-		default:
-			ops->stack(data, "UNK");
-			break;
-		}
-	}
-
-	/*
-	 * This handles the process stack:
-	 */
-	bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
-}
-EXPORT_SYMBOL(dump_trace);
-
 void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *sp, char *log_lvl)
 {
-- 
2.7.4

* [PATCH v2 36/44] x86/entry/unwind: encode pt_regs pointer in frame pointer
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (34 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 35/44] x86/dumpstack: remove dump_trace() and related callbacks Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-08 23:06   ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 37/44] x86/unwind: detect syscall entry regs Josh Poimboeuf
                   ` (7 subsequent siblings)
  43 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

With frame pointers, when a task is interrupted, its stack is no longer
completely reliable because the function could have been interrupted
before it had a chance to save the previous frame pointer on the stack.
So the caller of the interrupted function could get skipped by a stack
trace.

This is problematic for live patching, which needs to know whether a
stack trace of a sleeping task can be relied upon.  There's currently no
way to detect if a sleeping task was interrupted by a page fault
exception or preemption before it went to sleep.

Another issue is that when dumping the stack of an interrupted task, the
unwinder has no way of knowing where the saved pt_regs registers are, so
it can't print them.

This solves those issues by encoding the pt_regs pointer in the frame
pointer on entry from an interrupt or an exception.  The frame pointer
unwinder is also updated to decode it.
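
The encoding relies on x86-64 kernel addresses always having bit 63 set,
so a frame pointer with that bit clear cannot be a real kernel frame
pointer.  A minimal user-space sketch of the round trip follows.  The
function names and the FP_MSB macro are illustrative only, and the stack
bounds checks that the real decode_frame_pointer() performs are left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 63: set in every x86-64 kernel address (upper canonical half). */
#define FP_MSB (UINT64_C(1) << 63)

/* Model of ENCODE_FRAME_POINTER: clear the MSB of the pt_regs address. */
static uint64_t encode_frame_pointer(uint64_t regs_addr)
{
	return regs_addr & ~FP_MSB;
}

/*
 * Model of the first half of decode_frame_pointer(): returns true and
 * stores the pt_regs address if bp is an encoded pointer, false if bp
 * has the MSB set and so looks like an ordinary kernel frame pointer.
 */
static bool decode_frame_pointer(uint64_t bp, uint64_t *regs_addr)
{
	if (bp & FP_MSB)
		return false;

	*regs_addr = bp | FP_MSB;
	return true;
}
```

Because any value with a clear MSB decodes to some address in this
sketch, the patch additionally requires the decoded pointer to lie on a
known stack before trusting it.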

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/entry/calling.h       | 21 +++++++++++
 arch/x86/entry/entry_64.S      | 10 ++++--
 arch/x86/include/asm/unwind.h  | 11 ++++++
 arch/x86/kernel/unwind_frame.c | 80 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 119 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 9a9e588..ff5a5a3 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -201,6 +201,27 @@ For 32-bit we have the following conventions - kernel is built with
 	.byte 0xf1
 	.endm
 
+	/*
+	 * This is a sneaky trick to help the unwinder find pt_regs on the
+	 * stack.  The frame pointer is replaced with an encoded pointer to
+	 * pt_regs.  The encoding is just a clearing of the highest-order bit,
+	 * which makes it an invalid address and is also a signal to the
+	 * unwinder that it's a pt_regs pointer in disguise.
+	 *
+	 * NOTE: This must be called *after* SAVE_EXTRA_REGS because it
+	 * corrupts rbp.
+	 */
+.macro ENCODE_FRAME_POINTER ptregs_offset=0
+#ifdef CONFIG_FRAME_POINTER
+	.if \ptregs_offset
+		leaq \ptregs_offset(%rsp), %rbp
+	.else
+		mov %rsp, %rbp
+	.endif
+	btr $63, %rbp
+#endif
+.endm
+
 #endif /* CONFIG_X86_64 */
 
 /*
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 8956eae..0ee48c6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -430,6 +430,7 @@ END(irq_entries_start)
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
+	ENCODE_FRAME_POINTER
 
 	testb	$3, CS(%rsp)
 	jz	1f
@@ -892,6 +893,7 @@ ENTRY(xen_failsafe_callback)
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
+	ENCODE_FRAME_POINTER
 	jmp	error_exit
 END(xen_failsafe_callback)
 
@@ -935,6 +937,7 @@ ENTRY(paranoid_entry)
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
+	ENCODE_FRAME_POINTER 8
 	movl	$1, %ebx
 	movl	$MSR_GS_BASE, %ecx
 	rdmsr
@@ -982,6 +985,7 @@ ENTRY(error_entry)
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
+	ENCODE_FRAME_POINTER 8
 	xorl	%ebx, %ebx
 	testb	$3, CS+8(%rsp)
 	jz	.Lerror_kernelspace
@@ -1164,6 +1168,7 @@ ENTRY(nmi)
 	pushq	%r13		/* pt_regs->r13 */
 	pushq	%r14		/* pt_regs->r14 */
 	pushq	%r15		/* pt_regs->r15 */
+	ENCODE_FRAME_POINTER
 
 	/*
 	 * At this point we no longer need to worry about stack damage
@@ -1177,11 +1182,10 @@ ENTRY(nmi)
 
 	/*
 	 * Return back to user mode.  We must *not* do the normal exit
-	 * work, because we don't want to enable interrupts.  Fortunately,
-	 * do_nmi doesn't modify pt_regs.
+	 * work, because we don't want to enable interrupts.
 	 */
 	SWAPGS
-	jmp	restore_c_regs_and_iret
+	jmp	restore_regs_and_iret
 
 .Lnmi_from_kernel:
 	/*
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index c4fdd58..5ba7f3c 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -13,6 +13,7 @@ struct unwind_state {
 	int graph_idx;
 #ifdef CONFIG_FRAME_POINTER
 	unsigned long *bp;
+	struct pt_regs *regs;
 #else
 	unsigned long *sp;
 #endif
@@ -43,6 +44,11 @@ static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
 	return state->bp;
 }
 
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+{
+	return state->regs;
+}
+
 unsigned long unwind_get_return_address(struct unwind_state *state);
 
 #else /* !CONFIG_FRAME_POINTER */
@@ -61,6 +67,11 @@ static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
 	return state->sp;
 }
 
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+{
+	return NULL;
+}
+
 static inline
 unsigned long unwind_get_return_address(struct unwind_state *state)
 {
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index f28f1b5..d24c192 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -21,6 +21,52 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
+#ifdef CONFIG_X86_64
+/*
+ * This determines if the frame pointer actually contains an encoded pointer to
+ * pt_regs on the stack.  See ENCODE_FRAME_POINTER.
+ */
+static struct pt_regs *decode_frame_pointer(struct unwind_state *state,
+					    unsigned long bp)
+{
+	struct pt_regs *regs;
+	unsigned long *task_begin = task_stack_page(state->task);
+	unsigned long *task_end   = task_stack_page(state->task) + THREAD_SIZE;
+
+	/* if the MSB is set, it's not an encoded pointer */
+	if (bp & (1UL << (BITS_PER_LONG - 1)))
+		return NULL;
+
+	/* decode it by setting the MSB */
+	bp |= 1UL << (BITS_PER_LONG - 1);
+	regs = (struct pt_regs *)bp;
+
+	/* make sure the regs are on the current unwind_state stack */
+	if (on_stack(&state->stack_info, regs, sizeof(*regs)))
+		return regs;
+
+	/*
+	 * The regs might have been placed on the task stack before entry code
+	 * switched to the irq stack.
+	 */
+	if (state->stack_info.type == STACK_TYPE_IRQ &&
+	    state->stack_info.next_sp >= task_begin &&
+	    state->stack_info.next_sp < task_end &&
+	    (unsigned long *)regs >= task_begin &&
+	    (unsigned long *)regs < task_end &&
+	    (unsigned long *)(regs + 1) <= task_end)
+		return regs;
+
+	return NULL;
+}
+#else
+static struct pt_regs *decode_frame_pointer(struct unwind_state *state,
+					    unsigned long bp)
+{
+	return NULL;
+}
+#endif
+
 static bool update_stack_state(struct unwind_state *state, void *addr,
 			       size_t len)
 {
@@ -45,14 +91,47 @@ unknown:
 
 bool unwind_next_frame(struct unwind_state *state)
 {
+	struct pt_regs *regs;
 	unsigned long *next_bp;
 
+	state->regs = NULL;
+
 	if (unwind_done(state))
 		return false;
 
 	next_bp = (unsigned long *)*state->bp;
 
 	/*
+	 * Check if the next frame pointer is really an encoded pt_regs
+	 * pointer.
+	 */
+	regs = decode_frame_pointer(state, (unsigned long)next_bp);
+	if (regs) {
+		/*
+		 * We may need to switch to the next stack to access the regs.
+		 * This can happen when switching from the IRQ stack: the
+		 * encoded regs pointer is on the IRQ stack but the regs
+		 * themselves are on the task stack.
+		 */
+		if (!update_stack_state(state, regs, sizeof(*regs)))
+			return false;
+
+		/*
+		 * The regs are now safe to access and are made available to
+		 * the user even if we've reached the end.
+		 */
+		state->regs = regs;
+
+		if (user_mode(regs)) {
+			/* reached the end */
+			state->stack_info.type = STACK_TYPE_UNKNOWN;
+			return false;
+		}
+
+		next_bp = (unsigned long *)regs->bp;
+	}
+
+	/*
 	 * Make sure the next frame is on a valid stack and can be accessed
 	 * safely.
 	 */
@@ -72,6 +151,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 
 	state->task = task;
 	state->bp = get_frame_pointer(task, regs);
+	state->regs = NULL;
 
 	get_stack_info(state->bp, state->task, &state->stack_info,
 		       &state->stack_mask);
-- 
2.7.4

* [PATCH v2 37/44] x86/unwind: detect syscall entry regs
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (35 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 36/44] x86/entry/unwind: encode pt_regs pointer in frame pointer Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 38/44] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

The entry code doesn't encode pt_regs for syscalls.  But they're always
at the same location, so we can add a manual check for them.
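
As a rough illustration of the "same location" check, the sketch below
models the top of a task stack with plain integers.  The sizes are
assumptions for x86-64 (21 saved registers in pt_regs, and a frame
header made of a saved frame pointer plus a return address), and the
model_* names are hypothetical stand-ins for task_pt_regs() and
is_last_task_frame():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for x86-64; the kernel derives these from its headers. */
#define MODEL_PT_REGS_SIZE      (21 * sizeof(uint64_t)) /* 21 saved registers */
#define MODEL_FRAME_HEADER_SIZE (2 * sizeof(uint64_t))  /* saved bp + return address */

/* Hypothetical stand-in for task_pt_regs(): regs sit at the top of the stack. */
static uint64_t model_task_pt_regs(uint64_t stack_end)
{
	return stack_end - MODEL_PT_REGS_SIZE;
}

/* Mirrors is_last_task_frame(): the last frame header directly precedes pt_regs. */
static bool model_is_last_task_frame(uint64_t bp, uint64_t stack_end)
{
	return bp == model_task_pt_regs(stack_end) - MODEL_FRAME_HEADER_SIZE;
}
```

The unwinder only needs one comparison per frame, which is why no
encoding is required on the syscall path.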

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index d24c192..f60a089 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -21,6 +21,14 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
+static bool is_last_task_frame(struct unwind_state *state)
+{
+	unsigned long bp = (unsigned long)state->bp;
+	unsigned long regs = (unsigned long)task_pt_regs(state->task);
+
+	return bp == regs - FRAME_HEADER_SIZE;
+}
+
 #ifdef CONFIG_X86_64
 /*
  * This determines if the frame pointer actually contains an encoded pointer to
@@ -99,6 +107,20 @@ bool unwind_next_frame(struct unwind_state *state)
 	if (unwind_done(state))
 		return false;
 
+	/*
+	 * The entry code doesn't encode pt_regs on syscalls, so check for them
+	 * here.  The last frame pointer and associated syscall pt_regs (for
+	 * user tasks) are always at a standard location at the end of the task
+	 * stack.  If we've reached the end, go ahead and exit early to avoid
+	 * trying to decode an invalid frame pointer.
+	 */
+	if (is_last_task_frame(state)) {
+		if (!(state->task->flags & PF_KTHREAD))
+			state->regs = task_pt_regs(state->task);
+		state->stack_info.type = STACK_TYPE_UNKNOWN;
+		return false;
+	}
+
 	next_bp = (unsigned long *)*state->bp;
 
 	/*
-- 
2.7.4

* [PATCH v2 38/44] x86/dumpstack: print stack identifier on its own line
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (36 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 37/44] x86/unwind: detect syscall entry regs Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 39/44] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a
newline so that any stack address printed after it will appear on the
same line.  That causes the first stack address to be vertically
misaligned with the rest, making it visually cluttered and slightly
confusing:

  Call Trace:
   <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3
   [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
   [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
   ...
   <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
   [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
   [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0

It will look worse once we start printing pt_regs registers found in the
middle of the stack:

  <IRQ> RIP: 0010:[<ffffffff8189c6c3>]  [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60
  RSP: 0018:ffff88007876f720  EFLAGS: 00000206
  RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007
  ...

Improve readability by adding a newline to the stack name:

  Call Trace:
   <IRQ>
   [<ffffffff814431c3>] dump_stack+0x86/0xc3
   [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
   [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
   ...
   <EOI>
   [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
   [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
   [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0

Now that "continued" lines are no longer needed, we can also remove the
hack of using the empty string (aka KERN_CONT) and replace it with
KERN_DEFAULT.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index bdec037..3be646c 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -101,7 +101,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 
 		stack_type_str(stack_info.type, &str_begin, &str_end);
 		if (str_begin)
-			printk("%s <%s> ", log_lvl, str_begin);
+			printk("%s <%s>\n", log_lvl, str_begin);
 
 		/*
 		 * Scan the stack, printing any text addresses we find.  At the
@@ -154,7 +154,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		}
 
 		if (str_end)
-			printk("%s <%s> ", log_lvl, str_end);
+			printk("%s <%s>\n", log_lvl, str_end);
 	}
 }
 
@@ -171,12 +171,12 @@ void show_stack(struct task_struct *task, unsigned long *sp)
 		bp = (unsigned long)get_frame_pointer(current, NULL);
 	}
 
-	show_stack_log_lvl(task, NULL, sp, "");
+	show_stack_log_lvl(task, NULL, sp, KERN_DEFAULT);
 }
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(NULL, regs, NULL, "");
+	show_stack_log_lvl(NULL, regs, NULL, KERN_DEFAULT);
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-- 
2.7.4

* [PATCH v2 39/44] x86/dumpstack: print any pt_regs found on the stack
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (37 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 38/44] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 40/44] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Now that we can find pt_regs registers on the stack, print them.  Here's
an example of what it looks like:

Call Trace:
 <IRQ>
 [<ffffffff8144b793>] dump_stack+0x86/0xc3
 [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0
 [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60
 [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50
 [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0
RIP: 0010:[<ffffffff818aef43>]  [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60
RSP: 0018:ffff880079c4f760  EFLAGS: 00000202
RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007
RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000
RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540
R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000
 <EOI>
 [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250
 [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250
 [<ffffffff818a7b61>] __schedule+0x3e1/0xb20
 ...
 [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0
 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
 [<ffffffff818b1dd8>] async_page_fault+0x28/0x30
RIP: 0010:[<ffffffff8145b062>]  [<ffffffff8145b062>] __clear_user+0x42/0x70
RSP: 0018:ffff880079c4fd38  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928
R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400
 [<ffffffff8145b043>] ? __clear_user+0x23/0x70
 [<ffffffff8145b0fb>] clear_user+0x2b/0x40
 [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750
 [<ffffffff8129a591>] search_binary_handler+0xa1/0x200
 [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0
 [<ffffffff8129b5f3>] ? do_execveat_common.isra.36+0x623/0x9f0
 [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50
 [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0
 [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:[<00007fd2e2f2e537>]  [<00007fd2e2f2e537>] 0x7fd2e2f2e537
RSP: 002b:00007ffc449c5fc8  EFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537
RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029
RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20
R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40
R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 3be646c..f60a723 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -123,6 +123,13 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			if (!__kernel_text_address(addr))
 				continue;
 
+			/*
+			 * Don't print regs->ip again if it was already printed
+			 * by __show_regs() below.
+			 */
+			if (stack == &regs->ip)
+				continue;
+
 			if (stack == ret_addr_p)
 				reliable = 1;
 
@@ -151,6 +158,14 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			 * printed as unreliable.
 			 */
 			unwind_next_frame(&state);
+
+			/*
+			 * If the previous frame had pt_regs associated with it
+			 * due to an interrupt or syscall, print them.
+			 */
+			regs = unwind_get_entry_regs(&state);
+			if (regs)
+				__show_regs(regs, 0);
 		}
 
 		if (str_end)
-- 
2.7.4

* [PATCH v2 40/44] x86: remove 64-byte gap at end of irq stack
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (38 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 39/44] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 41/44] x86/asm/head: standardize the end of the stack for idle tasks Josh Poimboeuf
                   ` (3 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

There has been a 64-byte gap at the end of the irq stack for at least 12
years.  It predates git history, and I can't find any good reason for
it.  Remove it.  What's the worst that could happen?
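
To make the effect concrete, here is a small user-space model of the irq
stack bounds with and without the gap.  It assumes 4 KiB pages and an
IRQ_STACK_ORDER of 2 (KASAN disabled); the model_* helpers are
hypothetical and only mirror the arithmetic in in_irq_stack():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages, IRQ_STACK_ORDER of 2 (KASAN disabled). */
#define MODEL_IRQ_STACK_SIZE ((size_t)4096 << 2)

/* Initial irq stack pointer for a stack whose storage starts at base. */
static uintptr_t model_irq_stack_ptr(uintptr_t base, size_t gap)
{
	return base + MODEL_IRQ_STACK_SIZE - gap;
}

/* Mirrors the in_irq_stack() range test: begin <= addr < end. */
static bool model_in_irq_stack(uintptr_t addr, uintptr_t base, size_t gap)
{
	uintptr_t end = model_irq_stack_ptr(base, gap);
	uintptr_t begin = end - (MODEL_IRQ_STACK_SIZE - gap);

	return addr >= begin && addr < end;
}
```

With a gap of 64, the last 64 bytes of the storage are never treated as
part of the stack; with a gap of 0, the whole allocation is usable.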

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/page_64_types.h | 3 ---
 arch/x86/kernel/cpu/common.c         | 2 +-
 arch/x86/kernel/dumpstack_64.c       | 4 ++--
 arch/x86/kernel/setup_percpu.c       | 2 +-
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6256baf..3c0be3b 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -24,9 +24,6 @@
 #define IRQ_STACK_ORDER		(2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE		(PAGE_SIZE << IRQ_STACK_ORDER)
 
-/* FIXME: why? */
-#define IRQ_USABLE_STACK_SIZE	(IRQ_STACK_SIZE - 64)
-
 #define DOUBLEFAULT_STACK 1
 #define NMI_STACK 2
 #define DEBUG_STACK 3
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f3f7a4..6ef55e8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1281,7 +1281,7 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 EXPORT_PER_CPU_SYMBOL(current_task);
 
 DEFINE_PER_CPU(char *, irq_stack_ptr) =
-	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_USABLE_STACK_SIZE;
+	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE;
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 9f7a9f9..001a75d 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -78,7 +78,7 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
 	unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
-	unsigned long *begin = end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
+	unsigned long *begin = end - (IRQ_STACK_SIZE / sizeof(long));
 
 	if (stack < begin || stack >= end)
 		return false;
@@ -145,7 +145,7 @@ void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	int i;
 
 	irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
-	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
+	irq_stack     = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
 	sp = sp ? : get_stack_pointer(task, regs);
 
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index be6b571..d182799 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -246,7 +246,7 @@ void __init setup_per_cpu_areas(void)
 #ifdef CONFIG_X86_64
 		per_cpu(irq_stack_ptr, cpu) =
 			per_cpu(irq_stack_union.irq_stack, cpu) +
-			IRQ_USABLE_STACK_SIZE;
+			IRQ_STACK_SIZE;
 #endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
-- 
2.7.4

* [PATCH v2 41/44] x86/asm/head: standardize the end of the stack for idle tasks
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (39 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 40/44] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 42/44] x86/unwind: warn on kernel stack corruption Josh Poimboeuf
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Thanks to all the recent x86 entry code refactoring, most tasks' kernel
stacks start at the same offset right below their saved pt_regs,
regardless of which syscall was used to enter the kernel.  That creates
a nice convention which makes it straightforward to identify the end of
the stack.  Stack walking code can use it to verify that the stack is
sane.

However, CPU idle "swapper" tasks don't follow that convention.  Fix
that by starting their stack at a sizeof(pt_regs) offset from the end of
the stack page.
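
The change in the starting address can be sketched with plain integer
arithmetic.  The values below are assumptions for x86-64 (a 16 KiB
THREAD_SIZE and a 168-byte pt_regs), and the model_* names are
hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_THREAD_SIZE   ((size_t)4096 << 2)     /* assumed 16 KiB */
#define MODEL_SIZEOF_PTREGS (21 * sizeof(uint64_t)) /* assumed 168 bytes */

/* Old initial_stack: 8 bytes below the end of init_thread_union. */
static uint64_t model_initial_stack_old(uint64_t thread_union)
{
	return thread_union + MODEL_THREAD_SIZE - 8;
}

/* New initial_stack: right below where a task's pt_regs would live. */
static uint64_t model_initial_stack_new(uint64_t thread_union)
{
	return thread_union + MODEL_THREAD_SIZE - MODEL_SIZEOF_PTREGS;
}
```

With the new value, the idle task's first frame ends up at the same
offset from the end of the stack as it does for every other task.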

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c910c27..e33081d 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -326,7 +326,7 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 	GLOBAL(initial_stack)
-	.quad  init_thread_union+THREAD_SIZE-8
+	.quad  init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
 	__FINITDATA
 
 bad_address:
-- 
2.7.4

* [PATCH v2 42/44] x86/unwind: warn on kernel stack corruption
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (40 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 41/44] x86/asm/head: standardize the end of the stack for idle tasks Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 43/44] x86/unwind: warn on bad stack return address Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 44/44] x86/unwind: warn if stack grows up Josh Poimboeuf
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Detect situations in the unwinder where the frame pointer refers to a
bad address, and print an appropriate warning.
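
"Bad address" here boils down to a bounds check: the frame header or
pt_regs the pointer refers to must lie entirely within a known stack.
In the patch that check is done by update_stack_state(), which also
handles moving from one stack to the next.  The sketch below models only
the range test, with a simplified, hypothetical stand-in for struct
stack_info:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct stack_info: one [begin, end) range. */
struct model_stack_info {
	uintptr_t begin;
	uintptr_t end;
};

/* True if the object [addr, addr + len) lies entirely inside the stack. */
static bool model_on_stack(const struct model_stack_info *info,
			   uintptr_t addr, size_t len)
{
	/* Comparing len against the remaining space avoids overflow. */
	return addr >= info->begin && addr < info->end &&
	       len <= info->end - addr;
}
```

A frame pointer that fails this test on every known stack, before the
end of the stack has been reached, is what triggers the new warnings.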

Use printk_deferred_once() because the unwinder can be called by lockdep,
via save_stack_trace(), with the console lock held.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index f60a089..d90c2c9 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -135,8 +135,13 @@ bool unwind_next_frame(struct unwind_state *state)
 		 * encoded regs pointer is on the IRQ stack but the regs
 		 * themselves are on the task stack.
 		 */
-		if (!update_stack_state(state, regs, sizeof(*regs)))
+		if (!update_stack_state(state, regs, sizeof(*regs))) {
+			printk_deferred_once(KERN_WARNING "WARNING: kernel stack frame pointer at %p in %s:%d decodes to bad regs pointer %p\n",
+				state->bp, state->task->comm, state->task->pid,
+				regs);
+
 			return false;
+		}
 
 		/*
 		 * The regs are now safe to access and are made available to
@@ -157,8 +162,23 @@ bool unwind_next_frame(struct unwind_state *state)
 	 * Make sure the next frame is on a valid stack and can be accessed
 	 * safely.
 	 */
-	if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
+	if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE)) {
+		/*
+		 * The next frame isn't on a valid stack, and we haven't
+		 * reached the end, which means something went wrong: either a
+		 * bad next stack pointer or a bad frame pointer.
+		 */
+		if (state->regs)
+			printk_deferred_once(KERN_WARNING "WARNING: kernel stack regs->bp at %p in %s:%d points to bad address %p\n",
+				state->bp, state->task->comm,
+				state->task->pid, regs);
+		else
+			printk_deferred_once(KERN_WARNING "WARNING: kernel stack frame pointer at %p in %s:%d points to bad address %p\n",
+				state->bp, state->task->comm,
+				state->task->pid, next_bp);
+
 		return false;
+	}
 
 	/* move to the next frame */
 	state->bp = next_bp;
-- 
2.7.4

* [PATCH v2 43/44] x86/unwind: warn on bad stack return address
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (41 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 42/44] x86/unwind: warn on kernel stack corruption Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  2016-08-04 22:22 ` [PATCH v2 44/44] x86/unwind: warn if stack grows up Josh Poimboeuf
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

If __kernel_text_address() doesn't recognize a return address on the
stack, it probably means that it's some generated code which
__kernel_text_address() doesn't know about yet.

Otherwise there's probably some stack corruption.

Either way, warn about it.

Use printk_deferred_once() because the unwinder can be called with the
console lock by lockdep via save_stack_trace().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index d90c2c9..f943413 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -17,7 +17,13 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 	addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
 				     addr_p);
 
-	return __kernel_text_address(addr) ? addr : 0;
+	if (!__kernel_text_address(addr)) {
+		printk_deferred_once(KERN_WARNING "WARNING: unrecognized kernel stack return address %p in %s:%d\n",
+			(void *)addr, state->task->comm, state->task->pid);
+		return 0;
+	}
+
+	return addr;
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
-- 
2.7.4


* [PATCH v2 44/44] x86/unwind: warn if stack grows up
  2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (42 preceding siblings ...)
  2016-08-04 22:22 ` [PATCH v2 43/44] x86/unwind: warn on bad stack return address Josh Poimboeuf
@ 2016-08-04 22:22 ` Josh Poimboeuf
  43 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-04 22:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

Add a sanity check to ensure the stack only grows down, and print a
warning if the check fails.

Use printk_deferred_once() because the unwinder can be called with the
console lock by lockdep via save_stack_trace().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index f943413..a29c342 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -107,6 +107,7 @@ bool unwind_next_frame(struct unwind_state *state)
 {
 	struct pt_regs *regs;
 	unsigned long *next_bp;
+	enum stack_type prev_type = state->stack_info.type;
 
 	state->regs = NULL;
 
@@ -186,6 +187,15 @@ bool unwind_next_frame(struct unwind_state *state)
 		return false;
 	}
 
+	/* make sure the stack only unwinds up */
+	if (state->stack_info.type == prev_type && next_bp <= state->bp) {
+		printk_deferred_once(KERN_WARNING "WARNING: kernel stack frame pointer at %p in %s:%d points the wrong way (%p)\n",
+				     state->bp, state->task->comm,
+				     state->task->pid, next_bp);
+		state->stack_info.type = STACK_TYPE_UNKNOWN;
+		return false;
+	}
+
 	/* move to the next frame */
 	state->bp = next_bp;
 	return true;
-- 
2.7.4
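
As a rough illustration of the check added in this patch, here is a
minimal user-space sketch of the same condition.  The names
(sketch_stack_type, sketch_frame_moves_up) are hypothetical and exist
only for this example; the real code works on struct unwind_state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's enum stack_type. */
enum sketch_stack_type { SKETCH_STACK_TASK, SKETCH_STACK_IRQ };

/*
 * Return true if moving from 'bp' to 'next_bp' is plausible.  On the same
 * stack the next frame must sit at a strictly higher address, because the
 * stack grows down and unwinding walks back up.  If the stack type changed,
 * the two addresses are on different stacks and are not compared.
 */
static bool sketch_frame_moves_up(enum sketch_stack_type prev_type,
				  enum sketch_stack_type cur_type,
				  uintptr_t bp, uintptr_t next_bp)
{
	if (cur_type != prev_type)
		return true;
	return next_bp > bp;
}

/* A few cases spelled out; the addresses are made up. */
static void sketch_direction_examples(void)
{
	/* same stack, next frame higher: fine */
	assert(sketch_frame_moves_up(SKETCH_STACK_TASK, SKETCH_STACK_TASK,
				     0x1000, 0x1040));
	/* same stack, frame pointer points at itself: rejected */
	assert(!sketch_frame_moves_up(SKETCH_STACK_TASK, SKETCH_STACK_TASK,
				      0x1000, 0x1000));
}
```

Rejecting next_bp == bp as well as next_bp < bp also stops a frame
pointer that points at itself from looping forever.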


* Re: [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack'
  2016-08-04 22:21 ` [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
@ 2016-08-05 15:28   ` Nilay Vaish
  2016-08-05 16:01     ` Josh Poimboeuf
  0 siblings, 1 reply; 66+ messages in thread
From: Nilay Vaish @ 2016-08-05 15:28 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On 4 August 2016 at 17:21, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> The 'stack_start' variable is similar in usage to 'initial_code' and
> 'initial_gs': they're all stored in head_64.S and they're all updated by
> SMP and ACPI suspend before starting a CPU.
>
> Rename it to 'initial_stack' to be consistent with the others.
>

Maybe change the following line as well:

./arch/x86/kernel/head_64.S:69:     * Setup stack for verify_cpu().
"-8" because stack_start is defined


--
Nilay


* Re: [PATCH v2 04/44] x86/asm/head: use a common function for starting CPUs
  2016-08-04 22:22 ` [PATCH v2 04/44] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
@ 2016-08-05 15:41   ` Nilay Vaish
  2016-08-05 16:17     ` Josh Poimboeuf
  0 siblings, 1 reply; 66+ messages in thread
From: Nilay Vaish @ 2016-08-05 15:41 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On 4 August 2016 at 17:22, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> There are two different pieces of code for starting a CPU: start_cpu0()
> and the end of secondary_startup_64().  They're identical except for the
> stack setup.  Combine the common parts into a shared start_cpu()
> function.
>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/kernel/head_64.S | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index aa10a53..8822c20 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -264,13 +264,15 @@ ENTRY(secondary_startup_64)
>         movl    $MSR_GS_BASE,%ecx
>         movl    initial_gs(%rip),%eax
>         movl    initial_gs+4(%rip),%edx
> -       wrmsr
> +       wrmsr
>
>         /* rsi is pointer to real mode structure with interesting info.
>            pass it to C */
>         movq    %rsi, %rdi
> -
> -       /* Finally jump to run C code and to be on real kernel address
> +
> +ENTRY(start_cpu)
> +       /*
> +        * Jump to run C code and to be on a real kernel address.
>          * Since we are running on identity-mapped space we have to jump
>          * to the full 64bit address, this is only possible as indirect
>          * jump.  In addition we need to ensure %cs is set so we make this
> @@ -307,15 +309,11 @@ ENDPROC(secondary_startup_64)
>  /*
>   * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
>   * up already except stack. We just set up stack here. Then call
> - * start_secondary().
> + * start_secondary() via start_cpu().
>   */
>  ENTRY(start_cpu0)
> -       movq initial_stack(%rip),%rsp
> -       movq    initial_code(%rip),%rax
> -       pushq   $0              # fake return address to stop unwinder
> -       pushq   $__KERNEL_CS    # set correct cs
> -       pushq   %rax            # target address in negative space
> -       lretq
> +       movq    initial_stack(%rip), %rsp
> +       jmp     start_cpu
>  ENDPROC(start_cpu0)
>  #endif
>

I have a small suggestion here.  To me, jumping from start_cpu0 into the
middle of secondary_startup_64 just seems strange.  Maybe we can
define a separate ENTRY and ENDPROC pair for start_cpu and jump there
from start_cpu0 and also from secondary_startup_64.

--
Nilay


* Re: [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack'
  2016-08-05 15:28   ` Nilay Vaish
@ 2016-08-05 16:01     ` Josh Poimboeuf
  2016-08-06  5:25       ` Borislav Petkov
  0 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-05 16:01 UTC (permalink / raw)
  To: Nilay Vaish
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On Fri, Aug 05, 2016 at 10:28:39AM -0500, Nilay Vaish wrote:
> On 4 August 2016 at 17:21, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > The 'stack_start' variable is similar in usage to 'initial_code' and
> > 'initial_gs': they're all stored in head_64.S and they're all updated by
> > SMP and ACPI suspend before starting a CPU.
> >
> > Rename it to 'initial_stack' to be consistent with the others.
> >
> 
> Maybe change the following line as well:
> 
> ./arch/x86/kernel/head_64.S:69:     * Setup stack for verify_cpu().
> "-8" because stack_start is defined

Ah, yeah, I missed that one.

And also, now that I see that comment, and the line below it:

  leaq    (__end_init_task - 8)(%rip), %rsp

The 8 should be changed to SIZEOF_PTREGS in a later patch
("x86/asm/head: standardize the end of the stack for idle tasks").

-- 
Josh


* Re: [PATCH v2 13/44] x86/asm/head: remove useless zeroed word
  2016-08-04 22:22 ` [PATCH v2 13/44] x86/asm/head: remove useless zeroed word Josh Poimboeuf
@ 2016-08-05 16:13   ` Brian Gerst
  2016-08-05 16:23     ` Josh Poimboeuf
  0 siblings, 1 reply; 66+ messages in thread
From: Brian Gerst @ 2016-08-05 16:13 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park

On Thu, Aug 4, 2016 at 6:22 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> This zeroed word has no apparent purpose, so remove it.
>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/kernel/head_64.S | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 8822c20..ac6e27e 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -326,7 +326,6 @@ ENDPROC(start_cpu0)
>         .quad   INIT_PER_CPU_VAR(irq_stack_union)
>         GLOBAL(initial_stack)
>         .quad  init_thread_union+THREAD_SIZE-8
> -       .word  0
>         __FINITDATA
>
>  bad_address:


FYI the word used to be the SS segment selector for the LSS
instruction, which isn't needed in 64-bit mode.

--
Brian Gerst


* Re: [PATCH v2 04/44] x86/asm/head: use a common function for starting CPUs
  2016-08-05 15:41   ` Nilay Vaish
@ 2016-08-05 16:17     ` Josh Poimboeuf
  0 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-05 16:17 UTC (permalink / raw)
  To: Nilay Vaish
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On Fri, Aug 05, 2016 at 10:41:15AM -0500, Nilay Vaish wrote:
> On 4 August 2016 at 17:22, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > There are two different pieces of code for starting a CPU: start_cpu0()
> > and the end of secondary_startup_64().  They're identical except for the
> > stack setup.  Combine the common parts into a shared start_cpu()
> > function.
> >
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > ---
> >  arch/x86/kernel/head_64.S | 18 ++++++++----------
> >  1 file changed, 8 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index aa10a53..8822c20 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -264,13 +264,15 @@ ENTRY(secondary_startup_64)
> >         movl    $MSR_GS_BASE,%ecx
> >         movl    initial_gs(%rip),%eax
> >         movl    initial_gs+4(%rip),%edx
> > -       wrmsr
> > +       wrmsr
> >
> >         /* rsi is pointer to real mode structure with interesting info.
> >            pass it to C */
> >         movq    %rsi, %rdi
> > -
> > -       /* Finally jump to run C code and to be on real kernel address
> > +
> > +ENTRY(start_cpu)
> > +       /*
> > +        * Jump to run C code and to be on a real kernel address.
> >          * Since we are running on identity-mapped space we have to jump
> >          * to the full 64bit address, this is only possible as indirect
> >          * jump.  In addition we need to ensure %cs is set so we make this
> > @@ -307,15 +309,11 @@ ENDPROC(secondary_startup_64)
> >  /*
> >   * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
> >   * up already except stack. We just set up stack here. Then call
> > - * start_secondary().
> > + * start_secondary() via start_cpu().
> >   */
> >  ENTRY(start_cpu0)
> > -       movq initial_stack(%rip),%rsp
> > -       movq    initial_code(%rip),%rax
> > -       pushq   $0              # fake return address to stop unwinder
> > -       pushq   $__KERNEL_CS    # set correct cs
> > -       pushq   %rax            # target address in negative space
> > -       lretq
> > +       movq    initial_stack(%rip), %rsp
> > +       jmp     start_cpu
> >  ENDPROC(start_cpu0)
> >  #endif
> >
> 
> I have a small suggestion here.  To me, jumping from start_cpu0 into the
> middle of secondary_startup_64 just seems strange.  Maybe we can
> define a separate ENTRY and ENDPROC pair for start_cpu and jump there
> from start_cpu0 and also from secondary_startup_64.

Yeah, that might be better.  But then again, it would also be strange to
add a jump at the end of secondary_startup_64, when it could instead
just fall through.

Maybe I should do as you suggest, but instead of the jump, add a comment
that it falls through to start_cpu()?

-- 
Josh


* Re: [PATCH v2 13/44] x86/asm/head: remove useless zeroed word
  2016-08-05 16:13   ` Brian Gerst
@ 2016-08-05 16:23     ` Josh Poimboeuf
  0 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-05 16:23 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park

On Fri, Aug 05, 2016 at 12:13:04PM -0400, Brian Gerst wrote:
> On Thu, Aug 4, 2016 at 6:22 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > This zeroed word has no apparent purpose, so remove it.
> >
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > ---
> >  arch/x86/kernel/head_64.S | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index 8822c20..ac6e27e 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -326,7 +326,6 @@ ENDPROC(start_cpu0)
> >         .quad   INIT_PER_CPU_VAR(irq_stack_union)
> >         GLOBAL(initial_stack)
> >         .quad  init_thread_union+THREAD_SIZE-8
> > -       .word  0
> >         __FINITDATA
> >
> >  bad_address:
> 
> 
> FYI the word used to be the SS segment selector for the LSS
> instruction, which isn't needed in 64-bit mode.

Thanks.  I'll add that info to the patch header.

-- 
Josh


* Re: [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack'
  2016-08-05 16:01     ` Josh Poimboeuf
@ 2016-08-06  5:25       ` Borislav Petkov
  2016-08-06 13:13         ` Josh Poimboeuf
  2016-08-06 13:15         ` Brian Gerst
  0 siblings, 2 replies; 66+ messages in thread
From: Borislav Petkov @ 2016-08-06  5:25 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Nilay Vaish, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On Fri, Aug 05, 2016 at 11:01:57AM -0500, Josh Poimboeuf wrote:
> The 8 should be changed to SIZEOF_PTREGS in a later patch
> ("x86/asm/head: standardize the end of the stack for idle tasks").

But SIZEOF_PTREGS is 21*8. I don't understand.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack'
  2016-08-06  5:25       ` Borislav Petkov
@ 2016-08-06 13:13         ` Josh Poimboeuf
  2016-08-06 13:15         ` Brian Gerst
  1 sibling, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-06 13:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Nilay Vaish, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On Sat, Aug 06, 2016 at 07:25:21AM +0200, Borislav Petkov wrote:
> On Fri, Aug 05, 2016 at 11:01:57AM -0500, Josh Poimboeuf wrote:
> > The 8 should be changed to SIZEOF_PTREGS in a later patch
> > ("x86/asm/head: standardize the end of the stack for idle tasks").
> 
> But SIZEOF_PTREGS is 21*8. I don't understand.

I was referring to this patch:

  [PATCH v2 41/44] x86/asm/head: standardize the end of the stack for idle tasks
  https://lkml.kernel.org/r/98f297ffbc2a23131f08c5c77c4db974e0de2ad3.1470345772.git.jpoimboe@redhat.com

It changes the stack end offset from 8 to SIZEOF_PTREGS, so idle tasks
will have the same end of stack address that other tasks do.  I was
thinking we should make a similar change here, for consistency:

	/*
	 * Setup stack for verify_cpu(). "-8" because stack_start is defined
	 * this way, see below. Our best guess is a NULL ptr for stack
	 * termination heuristics and we don't want to break anything which
	 * might depend on it (kgdb, ...).
	 */
	leaq	(__end_init_task - 8)(%rip), %rsp

-- 
Josh


* Re: [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack'
  2016-08-06  5:25       ` Borislav Petkov
  2016-08-06 13:13         ` Josh Poimboeuf
@ 2016-08-06 13:15         ` Brian Gerst
  2016-08-06 13:38           ` Josh Poimboeuf
  1 sibling, 1 reply; 66+ messages in thread
From: Brian Gerst @ 2016-08-06 13:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Josh Poimboeuf, Nilay Vaish, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, Linux Kernel list, Andy Lutomirski,
	Linus Torvalds, Steven Rostedt, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On Sat, Aug 6, 2016 at 1:25 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Fri, Aug 05, 2016 at 11:01:57AM -0500, Josh Poimboeuf wrote:
>> The 8 should be changed to SIZEOF_PTREGS in a later patch
>> ("x86/asm/head: standardize the end of the stack for idle tasks").
>
> But SIZEOF_PTREGS is 21*8. I don't understand.

This patch is only for the boot cpu's idle thread.  All other kernel
threads, including idle threads for the secondary cpus, already have
the pt_regs area reserved.  My best guess for the current 8 byte
padding is to make sure thread_info is calculated properly (by masking
off the low bits from RSP).
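
To make the masking guess concrete, here is a minimal user-space sketch.
It assumes thread_info sits at the base of a THREAD_SIZE-aligned stack
and is located by masking RSP.  SKETCH_THREAD_SIZE, sketch_stack_base()
and sketch_initial_sp() are hypothetical names for this example only, and
the 16 KiB size is just an illustrative value.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative value only; the real THREAD_SIZE depends on the config. */
#define SKETCH_THREAD_SIZE 16384UL

/*
 * Hypothetical helper: find the base of the stack that 'sp' points into by
 * masking off the low bits, the way a stack-embedded thread_info is found.
 */
static uintptr_t sketch_stack_base(uintptr_t sp)
{
	return sp & ~(uintptr_t)(SKETCH_THREAD_SIZE - 1);
}

/*
 * Hypothetical helper: initial stack pointer for a stack starting at 'base',
 * with 'pad' bytes reserved at the top.
 */
static uintptr_t sketch_initial_sp(uintptr_t base, uintptr_t pad)
{
	/* the masking trick only works for THREAD_SIZE-aligned stacks */
	assert((base & (SKETCH_THREAD_SIZE - 1)) == 0);
	return base + SKETCH_THREAD_SIZE - pad;
}
```

With no padding, the initial RSP equals the end of the stack, so the
mask lands on the base of the following THREAD_SIZE region.  Reserving
8 bytes keeps the result inside the right stack.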

Also, this fix should be applied to 32-bit, but make sure to account
for TOP_OF_KERNEL_STACK_PADDING.

--
Brian Gerst


* Re: [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack'
  2016-08-06 13:15         ` Brian Gerst
@ 2016-08-06 13:38           ` Josh Poimboeuf
  0 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-06 13:38 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Borislav Petkov, Nilay Vaish, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, Linux Kernel list, Andy Lutomirski,
	Linus Torvalds, Steven Rostedt, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On Sat, Aug 06, 2016 at 09:15:30AM -0400, Brian Gerst wrote:
> On Sat, Aug 6, 2016 at 1:25 AM, Borislav Petkov <bp@alien8.de> wrote:
> > On Fri, Aug 05, 2016 at 11:01:57AM -0500, Josh Poimboeuf wrote:
> >> The 8 should be changed to SIZEOF_PTREGS in a later patch
> >> ("x86/asm/head: standardize the end of the stack for idle tasks").
> >
> > But SIZEOF_PTREGS is 21*8. I don't understand.
> 
> This patch is only for the boot cpu's idle thread.  All other kernel
> threads, including idle threads for the secondary cpus, already have
> the pt_regs area reserved.

Ah, you're right, it does only affect the boot CPU's idle thread.

(To be clear, I think you're talking about patch 41/44, and not the
temporary stack for the verify_cpu() call which I referred to above.)

> My best guess for the current 8 byte
> padding is to make sure thread_info is calculated properly (by masking
> off the low bits from RSP).
>
> Also, this fix should be applied to 32-bit, but make sure to account
> for TOP_OF_KERNEL_STACK_PADDING.

Ok.

-- 
Josh


* Re: [PATCH v2 36/44] x86/entry/unwind: encode pt_regs pointer in frame pointer
  2016-08-04 22:22 ` [PATCH v2 36/44] x86/entry/unwind: encode pt_regs pointer in frame pointer Josh Poimboeuf
@ 2016-08-08 23:06   ` Josh Poimboeuf
  0 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-08 23:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On Thu, Aug 04, 2016 at 05:22:32PM -0500, Josh Poimboeuf wrote:
> With frame pointers, when a task is interrupted, its stack is no longer
> completely reliable because the function could have been interrupted
> before it had a chance to save the previous frame pointer on the stack.
> So the caller of the interrupted function could get skipped by a stack
> trace.
> 
> This is problematic for live patching, which needs to know whether a
> stack trace of a sleeping task can be relied upon.  There's currently no
> way to detect if a sleeping task was interrupted by a page fault
> exception or preemption before it went to sleep.
> 
> Another issue is that when dumping the stack of an interrupted task, the
> unwinder has no way of knowing where the saved pt_regs registers are, so
> it can't print them.
> 
> This solves those issues by encoding the pt_regs pointer in the frame
> pointer on entry from an interrupt or an exception.  The frame pointer
> unwinder is also updated to decode it.
> 
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>

When doing some testing on x86_32, I realized there's a flaw here in the
unwinder's pt_regs detection: it breaks when an interrupt hits while
we're already in the entry code.  For example, a page fault can be
interrupted by an irq after the page fault entry code has encoded the
frame pointer, but before it has had a chance to call into the C handler.

In that case, regs->bp itself is encoded, and the current "pt_regs
aren't real frames" design falls apart because there can actually be
more than one set of pt_regs per "frame".  So the unwinder gets confused
and stops walking the stack too early.

So really the unwinder needs to consider each pt_regs as a frame in and
of itself.  Which of course Andy already suggested before, but I
stupidly shot it down.
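
A minimal user-space sketch of the nesting problem follows.  It assumes,
purely for illustration, that a pt_regs pointer is encoded by setting the
low bit of the frame pointer; the patch itself defines the real encoding.
struct sketch_regs and the sketch_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct pt_regs. */
struct sketch_regs {
	uintptr_t bp;
	uintptr_t ip;
};

/* Assumed encoding: tag the (aligned) regs pointer by setting bit 0. */
static uintptr_t sketch_encode_regs(const struct sketch_regs *regs)
{
	assert(((uintptr_t)regs & 1) == 0);
	return (uintptr_t)regs | 1;
}

/* Return the regs pointer if 'bp' is tagged, or NULL for a normal frame. */
static struct sketch_regs *sketch_decode_regs(uintptr_t bp)
{
	if (!(bp & 1))
		return NULL;
	return (struct sketch_regs *)(bp & ~(uintptr_t)1);
}

/*
 * Count how many sets of regs are stacked directly on top of each other,
 * i.e. how many times regs->bp is itself an encoded regs pointer.
 */
static int sketch_count_stacked_regs(uintptr_t bp)
{
	int n = 0;
	struct sketch_regs *regs;

	while ((regs = sketch_decode_regs(bp)) != NULL) {
		n++;
		bp = regs->bp;
	}
	return n;
}
```

An unwinder that expects at most one set of regs per frame stops after
the first decode.  Treating each set of regs as its own frame amounts to
looping like sketch_count_stacked_regs() does.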

Working on that for v3.

-- 
Josh


* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-04 22:22 ` [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
@ 2016-08-09 23:17   ` Nilay Vaish
  2016-08-09 23:27     ` Josh Poimboeuf
  0 siblings, 1 reply; 66+ messages in thread
From: Nilay Vaish @ 2016-08-09 23:17 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On 4 August 2016 at 17:22, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> new file mode 100644
> index 0000000..f28f1b5
> --- /dev/null
> +++ b/arch/x86/kernel/unwind_frame.c
> @@ -0,0 +1,84 @@
> +#include <linux/sched.h>
> +#include <asm/ptrace.h>
> +#include <asm/bitops.h>
> +#include <asm/stacktrace.h>
> +#include <asm/unwind.h>
> +
> +#define FRAME_HEADER_SIZE (sizeof(long) * 2)
> +
> +unsigned long unwind_get_return_address(struct unwind_state *state)
> +{
> +       unsigned long *addr_p = unwind_get_return_address_ptr(state);
> +       unsigned long addr;
> +
> +       if (state->stack_info.type == STACK_TYPE_UNKNOWN)
> +               return 0;
> +
> +       addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
> +                                    addr_p);
> +
> +       return __kernel_text_address(addr) ? addr : 0;
> +}
> +EXPORT_SYMBOL_GPL(unwind_get_return_address);
> +
> +static bool update_stack_state(struct unwind_state *state, void *addr,
> +                              size_t len)
> +{
> +       struct stack_info *info = &state->stack_info;
> +
> +       if (on_stack(info, addr, len))
> +               return true;
> +
> +       if (get_stack_info(info->next_sp, state->task, info,
> +                          &state->stack_mask))
> +               goto unknown;
> +
> +       if (!on_stack(info, addr, len))
> +               goto unknown;
> +
> +       return true;
> +
> +unknown:
> +       info->type = STACK_TYPE_UNKNOWN;
> +       return false;
> +}
> +
> +bool unwind_next_frame(struct unwind_state *state)
> +{
> +       unsigned long *next_bp;
> +
> +       if (unwind_done(state))
> +               return false;
> +
> +       next_bp = (unsigned long *)*state->bp;
> +
> +       /*
> +        * Make sure the next frame is on a valid stack and can be accessed
> +        * safely.
> +        */
> +       if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
> +               return false;
> +
> +       /* move to the next frame */
> +       state->bp = next_bp;
> +       return true;
> +}
> +EXPORT_SYMBOL_GPL(unwind_next_frame);
> +
> +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> +                   struct pt_regs *regs, unsigned long *sp)
> +{
> +       memset(state, 0, sizeof(*state));
> +
> +       state->task = task;
> +       state->bp = get_frame_pointer(task, regs);
> +
> +       get_stack_info(state->bp, state->task, &state->stack_info,
> +                      &state->stack_mask);
> +       update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
> +
> +       /* unwind to the first frame after the specified stack pointer */
> +       while (state->bp < sp && !unwind_done(state))
> +               unwind_next_frame(state);

Do we unwind all the frames here?  It seems strange to me that in a
function named __unwind_start(), we unwind all the frames.

> +}
> +EXPORT_SYMBOL_GPL(__unwind_start);
> diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
> new file mode 100644
> index 0000000..e03df5a
> --- /dev/null
> +++ b/arch/x86/kernel/unwind_guess.c
> @@ -0,0 +1,40 @@
> +#include <linux/sched.h>
> +#include <linux/ftrace.h>
> +#include <asm/ptrace.h>
> +#include <asm/bitops.h>
> +#include <asm/stacktrace.h>
> +#include <asm/unwind.h>
> +
> +bool unwind_next_frame(struct unwind_state *state)
> +{
> +       struct stack_info *info = &state->stack_info;
> +
> +       if (info->type == STACK_TYPE_UNKNOWN)
> +               return false;
> +
> +       do {
> +               for (state->sp++; state->sp < info->end; state->sp++)
> +                       if (__kernel_text_address(*state->sp))
> +                               return true;
> +
> +               state->sp = info->next_sp;
> +
> +       } while (!get_stack_info(state->sp, state->task, info,
> +                                &state->stack_mask));
> +
> +       return false;
> +}
> +
> +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> +                   struct pt_regs *regs, unsigned long *sp)
> +{
> +       memset(state, 0, sizeof(*state));
> +
> +       state->task = task;
> +       state->sp   = sp;
> +
> +       get_stack_info(sp, state->task, &state->stack_info, &state->stack_mask);
> +
> +       if (!__kernel_text_address(*sp))
> +               unwind_next_frame(state);
> +}
> --
> 2.7.4
>

Why do you need to export symbols in unwind_frame.c but not in
unwind_guess.c?  As per the Makefile, we compile only one of those two
files.  Shouldn't EXPORT_SYMBOL_GPL(__unwind_start) appear in both
files?

--
Nilay


* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-09 23:17   ` Nilay Vaish
@ 2016-08-09 23:27     ` Josh Poimboeuf
  2016-08-10  7:25       ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-09 23:27 UTC (permalink / raw)
  To: Nilay Vaish
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On Tue, Aug 09, 2016 at 06:17:41PM -0500, Nilay Vaish wrote:
> On 4 August 2016 at 17:22, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> > new file mode 100644
> > index 0000000..f28f1b5
> > --- /dev/null
> > +++ b/arch/x86/kernel/unwind_frame.c
> > @@ -0,0 +1,84 @@
> > +#include <linux/sched.h>
> > +#include <asm/ptrace.h>
> > +#include <asm/bitops.h>
> > +#include <asm/stacktrace.h>
> > +#include <asm/unwind.h>
> > +
> > +#define FRAME_HEADER_SIZE (sizeof(long) * 2)
> > +
> > +unsigned long unwind_get_return_address(struct unwind_state *state)
> > +{
> > +       unsigned long *addr_p = unwind_get_return_address_ptr(state);
> > +       unsigned long addr;
> > +
> > +       if (state->stack_info.type == STACK_TYPE_UNKNOWN)
> > +               return 0;
> > +
> > +       addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
> > +                                    addr_p);
> > +
> > +       return __kernel_text_address(addr) ? addr : 0;
> > +}
> > +EXPORT_SYMBOL_GPL(unwind_get_return_address);
> > +
> > +static bool update_stack_state(struct unwind_state *state, void *addr,
> > +                              size_t len)
> > +{
> > +       struct stack_info *info = &state->stack_info;
> > +
> > +       if (on_stack(info, addr, len))
> > +               return true;
> > +
> > +       if (get_stack_info(info->next_sp, state->task, info,
> > +                          &state->stack_mask))
> > +               goto unknown;
> > +
> > +       if (!on_stack(info, addr, len))
> > +               goto unknown;
> > +
> > +       return true;
> > +
> > +unknown:
> > +       info->type = STACK_TYPE_UNKNOWN;
> > +       return false;
> > +}
> > +
> > +bool unwind_next_frame(struct unwind_state *state)
> > +{
> > +       unsigned long *next_bp;
> > +
> > +       if (unwind_done(state))
> > +               return false;
> > +
> > +       next_bp = (unsigned long *)*state->bp;
> > +
> > +       /*
> > +        * Make sure the next frame is on a valid stack and can be accessed
> > +        * safely.
> > +        */
> > +       if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
> > +               return false;
> > +
> > +       /* move to the next frame */
> > +       state->bp = next_bp;
> > +       return true;
> > +}
> > +EXPORT_SYMBOL_GPL(unwind_next_frame);
> > +
> > +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > +                   struct pt_regs *regs, unsigned long *sp)
> > +{
> > +       memset(state, 0, sizeof(*state));
> > +
> > +       state->task = task;
> > +       state->bp = get_frame_pointer(task, regs);
> > +
> > +       get_stack_info(state->bp, state->task, &state->stack_info,
> > +                      &state->stack_mask);
> > +       update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
> > +
> > +       /* unwind to the first frame after the specified stack pointer */
> > +       while (state->bp < sp && !unwind_done(state))
> > +               unwind_next_frame(state);
> 
> Do we unwind all the frames here?  It seems strange to me that in a
> function named __unwind_start(), we unwind all the frames.

It just skips any stack frames before the specified "sp" pointer.
Several callers use this, for example, to start at regs->sp instead of
the current stack frame.  I'll try to make the comment clearer.

> > +}
> > +EXPORT_SYMBOL_GPL(__unwind_start);
> > diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
> > new file mode 100644
> > index 0000000..e03df5a
> > --- /dev/null
> > +++ b/arch/x86/kernel/unwind_guess.c
> > @@ -0,0 +1,40 @@
> > +#include <linux/sched.h>
> > +#include <linux/ftrace.h>
> > +#include <asm/ptrace.h>
> > +#include <asm/bitops.h>
> > +#include <asm/stacktrace.h>
> > +#include <asm/unwind.h>
> > +
> > +bool unwind_next_frame(struct unwind_state *state)
> > +{
> > +       struct stack_info *info = &state->stack_info;
> > +
> > +       if (info->type == STACK_TYPE_UNKNOWN)
> > +               return false;
> > +
> > +       do {
> > +               for (state->sp++; state->sp < info->end; state->sp++)
> > +                       if (__kernel_text_address(*state->sp))
> > +                               return true;
> > +
> > +               state->sp = info->next_sp;
> > +
> > +       } while (!get_stack_info(state->sp, state->task, info,
> > +                                &state->stack_mask));
> > +
> > +       return false;
> > +}
> > +
> > +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > +                   struct pt_regs *regs, unsigned long *sp)
> > +{
> > +       memset(state, 0, sizeof(*state));
> > +
> > +       state->task = task;
> > +       state->sp   = sp;
> > +
> > +       get_stack_info(sp, state->task, &state->stack_info, &state->stack_mask);
> > +
> > +       if (!__kernel_text_address(*sp))
> > +               unwind_next_frame(state);
> > +}
> > --
> > 2.7.4
> >
> 
> Why do you need to export symbols in unwind_frame.c but not in
> unwind_guess.c?  As per the Makefile, only one of those two files is
> compiled.  Shouldn't EXPORT_SYMBOL_GPL(__unwind_start) appear in both
> files?

Yeah, good catch.

-- 
Josh

* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-09 23:27     ` Josh Poimboeuf
@ 2016-08-10  7:25       ` Andy Lutomirski
  2016-08-10 14:16         ` Josh Poimboeuf
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-10  7:25 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Frederic Weisbecker, linux-kernel, Thomas Gleixner, Kees Cook,
	x86, H . Peter Anvin, Nilay Vaish, Steven Rostedt, Ingo Molnar,
	Brian Gerst, Byungchul Park, Peter Zijlstra, Linus Torvalds

On Aug 10, 2016 2:27 AM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
>
> On Tue, Aug 09, 2016 at 06:17:41PM -0500, Nilay Vaish wrote:
> > On 4 August 2016 at 17:22, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> > > new file mode 100644
> > > index 0000000..f28f1b5
> > > --- /dev/null
> > > +++ b/arch/x86/kernel/unwind_frame.c
> > > @@ -0,0 +1,84 @@
> > > +#include <linux/sched.h>
> > > +#include <asm/ptrace.h>
> > > +#include <asm/bitops.h>
> > > +#include <asm/stacktrace.h>
> > > +#include <asm/unwind.h>
> > > +
> > > +#define FRAME_HEADER_SIZE (sizeof(long) * 2)
> > > +
> > > +unsigned long unwind_get_return_address(struct unwind_state *state)
> > > +{
> > > +       unsigned long *addr_p = unwind_get_return_address_ptr(state);
> > > +       unsigned long addr;
> > > +
> > > +       if (state->stack_info.type == STACK_TYPE_UNKNOWN)
> > > +               return 0;
> > > +
> > > +       addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
> > > +                                    addr_p);
> > > +
> > > +       return __kernel_text_address(addr) ? addr : 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(unwind_get_return_address);
> > > +
> > > +static bool update_stack_state(struct unwind_state *state, void *addr,
> > > +                              size_t len)
> > > +{
> > > +       struct stack_info *info = &state->stack_info;
> > > +
> > > +       if (on_stack(info, addr, len))
> > > +               return true;
> > > +
> > > +       if (get_stack_info(info->next_sp, state->task, info,
> > > +                          &state->stack_mask))
> > > +               goto unknown;
> > > +
> > > +       if (!on_stack(info, addr, len))
> > > +               goto unknown;
> > > +
> > > +       return true;
> > > +
> > > +unknown:
> > > +       info->type = STACK_TYPE_UNKNOWN;
> > > +       return false;
> > > +}
> > > +
> > > +bool unwind_next_frame(struct unwind_state *state)
> > > +{
> > > +       unsigned long *next_bp;
> > > +
> > > +       if (unwind_done(state))
> > > +               return false;
> > > +
> > > +       next_bp = (unsigned long *)*state->bp;
> > > +
> > > +       /*
> > > +        * Make sure the next frame is on a valid stack and can be accessed
> > > +        * safely.
> > > +        */
> > > +       if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
> > > +               return false;
> > > +
> > > +       /* move to the next frame */
> > > +       state->bp = next_bp;
> > > +       return true;
> > > +}
> > > +EXPORT_SYMBOL_GPL(unwind_next_frame);
> > > +
> > > +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > > +                   struct pt_regs *regs, unsigned long *sp)
> > > +{
> > > +       memset(state, 0, sizeof(*state));
> > > +
> > > +       state->task = task;
> > > +       state->bp = get_frame_pointer(task, regs);
> > > +
> > > +       get_stack_info(state->bp, state->task, &state->stack_info,
> > > +                      &state->stack_mask);
> > > +       update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
> > > +
> > > +       /* unwind to the first frame after the specified stack pointer */
> > > +       while (state->bp < sp && !unwind_done(state))
> > > +               unwind_next_frame(state);
> >
> > Do we unwind all the frames here?  It seems strange to me that in a
> > function named __unwind_start(), we unwind all the frames.
>
> It just skips any stack frames before the specified "sp" pointer.
> Several callers use this, for example, to start at regs->sp instead of
> the current stack frame.  I'll try to make the comment clearer.
>

Are you checking the right condition?  Shouldn't this check that sp is
in bounds for the current stack if a stack switch happened?

I admit I don't fully understand the use case.  If someone wants to
start a trace in the middle, shouldn't they just pass regs in?

* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-10  7:25       ` Andy Lutomirski
@ 2016-08-10 14:16         ` Josh Poimboeuf
  2016-08-11  7:18           ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-10 14:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, linux-kernel, Thomas Gleixner, Kees Cook,
	x86, H . Peter Anvin, Nilay Vaish, Steven Rostedt, Ingo Molnar,
	Brian Gerst, Byungchul Park, Peter Zijlstra, Linus Torvalds

On Wed, Aug 10, 2016 at 12:25:11AM -0700, Andy Lutomirski wrote:
> On Aug 10, 2016 2:27 AM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
> >
> > On Tue, Aug 09, 2016 at 06:17:41PM -0500, Nilay Vaish wrote:
> > > On 4 August 2016 at 17:22, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> > > > new file mode 100644
> > > > index 0000000..f28f1b5
> > > > --- /dev/null
> > > > +++ b/arch/x86/kernel/unwind_frame.c
> > > > @@ -0,0 +1,84 @@
> > > > +#include <linux/sched.h>
> > > > +#include <asm/ptrace.h>
> > > > +#include <asm/bitops.h>
> > > > +#include <asm/stacktrace.h>
> > > > +#include <asm/unwind.h>
> > > > +
> > > > +#define FRAME_HEADER_SIZE (sizeof(long) * 2)
> > > > +
> > > > +unsigned long unwind_get_return_address(struct unwind_state *state)
> > > > +{
> > > > +       unsigned long *addr_p = unwind_get_return_address_ptr(state);
> > > > +       unsigned long addr;
> > > > +
> > > > +       if (state->stack_info.type == STACK_TYPE_UNKNOWN)
> > > > +               return 0;
> > > > +
> > > > +       addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
> > > > +                                    addr_p);
> > > > +
> > > > +       return __kernel_text_address(addr) ? addr : 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(unwind_get_return_address);
> > > > +
> > > > +static bool update_stack_state(struct unwind_state *state, void *addr,
> > > > +                              size_t len)
> > > > +{
> > > > +       struct stack_info *info = &state->stack_info;
> > > > +
> > > > +       if (on_stack(info, addr, len))
> > > > +               return true;
> > > > +
> > > > +       if (get_stack_info(info->next_sp, state->task, info,
> > > > +                          &state->stack_mask))
> > > > +               goto unknown;
> > > > +
> > > > +       if (!on_stack(info, addr, len))
> > > > +               goto unknown;
> > > > +
> > > > +       return true;
> > > > +
> > > > +unknown:
> > > > +       info->type = STACK_TYPE_UNKNOWN;
> > > > +       return false;
> > > > +}
> > > > +
> > > > +bool unwind_next_frame(struct unwind_state *state)
> > > > +{
> > > > +       unsigned long *next_bp;
> > > > +
> > > > +       if (unwind_done(state))
> > > > +               return false;
> > > > +
> > > > +       next_bp = (unsigned long *)*state->bp;
> > > > +
> > > > +       /*
> > > > +        * Make sure the next frame is on a valid stack and can be accessed
> > > > +        * safely.
> > > > +        */
> > > > +       if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
> > > > +               return false;
> > > > +
> > > > +       /* move to the next frame */
> > > > +       state->bp = next_bp;
> > > > +       return true;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(unwind_next_frame);
> > > > +
> > > > +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > > > +                   struct pt_regs *regs, unsigned long *sp)
> > > > +{
> > > > +       memset(state, 0, sizeof(*state));
> > > > +
> > > > +       state->task = task;
> > > > +       state->bp = get_frame_pointer(task, regs);
> > > > +
> > > > +       get_stack_info(state->bp, state->task, &state->stack_info,
> > > > +                      &state->stack_mask);
> > > > +       update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
> > > > +
> > > > +       /* unwind to the first frame after the specified stack pointer */
> > > > +       while (state->bp < sp && !unwind_done(state))
> > > > +               unwind_next_frame(state);
> > >
> > > Do we unwind all the frames here?  It seems strange to me that in a
> > > function named __unwind_start(), we unwind all the frames.
> >
> > It just skips any stack frames before the specified "sp" pointer.
> > Several callers use this, for example, to start at regs->sp instead of
> > the current stack frame.  I'll try to make the comment clearer.
> >
> 
> Are you checking the right condition?  Shouldn't this check that sp is
> in bounds for the current stack if a stack switch happened?

You're right.

> I admit I don't fully understand the use case.  If someone wants to
> start a trace in the middle, shouldn't they just pass regs in?

The regs aren't always available.  Some callers just want to skip the
first few frames so the stack dump code itself isn't traced.

-- 
Josh

* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-10 14:16         ` Josh Poimboeuf
@ 2016-08-11  7:18           ` Andy Lutomirski
  2016-08-11 14:28             ` Josh Poimboeuf
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-11  7:18 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, H . Peter Anvin,
	x86, Nilay Vaish, Ingo Molnar, Steven Rostedt, Brian Gerst,
	linux-kernel, Linus Torvalds, Peter Zijlstra, Byungchul Park

On Aug 10, 2016 5:16 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
>
> On Wed, Aug 10, 2016 at 12:25:11AM -0700, Andy Lutomirski wrote:
> > On Aug 10, 2016 2:27 AM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
> > >
> > > On Tue, Aug 09, 2016 at 06:17:41PM -0500, Nilay Vaish wrote:
> > > > On 4 August 2016 at 17:22, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > > > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> > > > > new file mode 100644
> > > > > index 0000000..f28f1b5
> > > > > --- /dev/null
> > > > > +++ b/arch/x86/kernel/unwind_frame.c
> > > > > @@ -0,0 +1,84 @@
> > > > > +#include <linux/sched.h>
> > > > > +#include <asm/ptrace.h>
> > > > > +#include <asm/bitops.h>
> > > > > +#include <asm/stacktrace.h>
> > > > > +#include <asm/unwind.h>
> > > > > +
> > > > > +#define FRAME_HEADER_SIZE (sizeof(long) * 2)
> > > > > +
> > > > > +unsigned long unwind_get_return_address(struct unwind_state *state)
> > > > > +{
> > > > > +       unsigned long *addr_p = unwind_get_return_address_ptr(state);
> > > > > +       unsigned long addr;
> > > > > +
> > > > > +       if (state->stack_info.type == STACK_TYPE_UNKNOWN)
> > > > > +               return 0;
> > > > > +
> > > > > +       addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
> > > > > +                                    addr_p);
> > > > > +
> > > > > +       return __kernel_text_address(addr) ? addr : 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(unwind_get_return_address);
> > > > > +
> > > > > +static bool update_stack_state(struct unwind_state *state, void *addr,
> > > > > +                              size_t len)
> > > > > +{
> > > > > +       struct stack_info *info = &state->stack_info;
> > > > > +
> > > > > +       if (on_stack(info, addr, len))
> > > > > +               return true;
> > > > > +
> > > > > +       if (get_stack_info(info->next_sp, state->task, info,
> > > > > +                          &state->stack_mask))
> > > > > +               goto unknown;
> > > > > +
> > > > > +       if (!on_stack(info, addr, len))
> > > > > +               goto unknown;
> > > > > +
> > > > > +       return true;
> > > > > +
> > > > > +unknown:
> > > > > +       info->type = STACK_TYPE_UNKNOWN;
> > > > > +       return false;
> > > > > +}
> > > > > +
> > > > > +bool unwind_next_frame(struct unwind_state *state)
> > > > > +{
> > > > > +       unsigned long *next_bp;
> > > > > +
> > > > > +       if (unwind_done(state))
> > > > > +               return false;
> > > > > +
> > > > > +       next_bp = (unsigned long *)*state->bp;
> > > > > +
> > > > > +       /*
> > > > > +        * Make sure the next frame is on a valid stack and can be accessed
> > > > > +        * safely.
> > > > > +        */
> > > > > +       if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
> > > > > +               return false;
> > > > > +
> > > > > +       /* move to the next frame */
> > > > > +       state->bp = next_bp;
> > > > > +       return true;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(unwind_next_frame);
> > > > > +
> > > > > +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > > > > +                   struct pt_regs *regs, unsigned long *sp)
> > > > > +{
> > > > > +       memset(state, 0, sizeof(*state));
> > > > > +
> > > > > +       state->task = task;
> > > > > +       state->bp = get_frame_pointer(task, regs);
> > > > > +
> > > > > +       get_stack_info(state->bp, state->task, &state->stack_info,
> > > > > +                      &state->stack_mask);
> > > > > +       update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
> > > > > +
> > > > > +       /* unwind to the first frame after the specified stack pointer */
> > > > > +       while (state->bp < sp && !unwind_done(state))
> > > > > +               unwind_next_frame(state);
> > > >
> > > > Do we unwind all the frames here?  It seems strange to me that in a
> > > > function named __unwind_start(), we unwind all the frames.
> > >
> > > It just skips any stack frames before the specified "sp" pointer.
> > > Several callers use this, for example, to start at regs->sp instead of
> > > the current stack frame.  I'll try to make the comment clearer.
> > >
> >
> > Are you checking the right condition?  Shouldn't this check that sp is
> > in bounds for the current stack if a stack switch happened?
>
> You're right.
>
> > I admit I don't fully understand the use case.  If someone wants to
> > start a trace in the middle, shouldn't they just pass regs in?
>
> The regs aren't always available.  Some callers just want to skip the
> first few frames so the stack dump code itself isn't traced.

I suspect that all users are okay with your algorithm simply because
they don't switch stacks.  Maybe the thing to do is to stop advancing
when sp is passed or if the stack switches at all.

Could you point me at a user that passes anything other than regs->sp
here?  On brief inspection, I haven't found any at all.

--Andy

* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-11  7:18           ` Andy Lutomirski
@ 2016-08-11 14:28             ` Josh Poimboeuf
  2016-08-11 14:58               ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-11 14:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, H . Peter Anvin,
	x86, Nilay Vaish, Ingo Molnar, Steven Rostedt, Brian Gerst,
	linux-kernel, Linus Torvalds, Peter Zijlstra, Byungchul Park

On Thu, Aug 11, 2016 at 12:18:54AM -0700, Andy Lutomirski wrote:
> On Aug 10, 2016 5:16 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
> >
> > On Wed, Aug 10, 2016 at 12:25:11AM -0700, Andy Lutomirski wrote:
> > > On Aug 10, 2016 2:27 AM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
> > > >
> > > > On Tue, Aug 09, 2016 at 06:17:41PM -0500, Nilay Vaish wrote:
> > > > > On 4 August 2016 at 17:22, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > > > > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> > > > > > new file mode 100644
> > > > > > index 0000000..f28f1b5
> > > > > > --- /dev/null
> > > > > > +++ b/arch/x86/kernel/unwind_frame.c
> > > > > > @@ -0,0 +1,84 @@
> > > > > > +#include <linux/sched.h>
> > > > > > +#include <asm/ptrace.h>
> > > > > > +#include <asm/bitops.h>
> > > > > > +#include <asm/stacktrace.h>
> > > > > > +#include <asm/unwind.h>
> > > > > > +
> > > > > > +#define FRAME_HEADER_SIZE (sizeof(long) * 2)
> > > > > > +
> > > > > > +unsigned long unwind_get_return_address(struct unwind_state *state)
> > > > > > +{
> > > > > > +       unsigned long *addr_p = unwind_get_return_address_ptr(state);
> > > > > > +       unsigned long addr;
> > > > > > +
> > > > > > +       if (state->stack_info.type == STACK_TYPE_UNKNOWN)
> > > > > > +               return 0;
> > > > > > +
> > > > > > +       addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
> > > > > > +                                    addr_p);
> > > > > > +
> > > > > > +       return __kernel_text_address(addr) ? addr : 0;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(unwind_get_return_address);
> > > > > > +
> > > > > > +static bool update_stack_state(struct unwind_state *state, void *addr,
> > > > > > +                              size_t len)
> > > > > > +{
> > > > > > +       struct stack_info *info = &state->stack_info;
> > > > > > +
> > > > > > +       if (on_stack(info, addr, len))
> > > > > > +               return true;
> > > > > > +
> > > > > > +       if (get_stack_info(info->next_sp, state->task, info,
> > > > > > +                          &state->stack_mask))
> > > > > > +               goto unknown;
> > > > > > +
> > > > > > +       if (!on_stack(info, addr, len))
> > > > > > +               goto unknown;
> > > > > > +
> > > > > > +       return true;
> > > > > > +
> > > > > > +unknown:
> > > > > > +       info->type = STACK_TYPE_UNKNOWN;
> > > > > > +       return false;
> > > > > > +}
> > > > > > +
> > > > > > +bool unwind_next_frame(struct unwind_state *state)
> > > > > > +{
> > > > > > +       unsigned long *next_bp;
> > > > > > +
> > > > > > +       if (unwind_done(state))
> > > > > > +               return false;
> > > > > > +
> > > > > > +       next_bp = (unsigned long *)*state->bp;
> > > > > > +
> > > > > > +       /*
> > > > > > +        * Make sure the next frame is on a valid stack and can be accessed
> > > > > > +        * safely.
> > > > > > +        */
> > > > > > +       if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
> > > > > > +               return false;
> > > > > > +
> > > > > > +       /* move to the next frame */
> > > > > > +       state->bp = next_bp;
> > > > > > +       return true;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(unwind_next_frame);
> > > > > > +
> > > > > > +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > > > > > +                   struct pt_regs *regs, unsigned long *sp)
> > > > > > +{
> > > > > > +       memset(state, 0, sizeof(*state));
> > > > > > +
> > > > > > +       state->task = task;
> > > > > > +       state->bp = get_frame_pointer(task, regs);
> > > > > > +
> > > > > > +       get_stack_info(state->bp, state->task, &state->stack_info,
> > > > > > +                      &state->stack_mask);
> > > > > > +       update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
> > > > > > +
> > > > > > +       /* unwind to the first frame after the specified stack pointer */
> > > > > > +       while (state->bp < sp && !unwind_done(state))
> > > > > > +               unwind_next_frame(state);
> > > > >
> > > > > Do we unwind all the frames here?  It seems strange to me that in a
> > > > > function named __unwind_start(), we unwind all the frames.
> > > >
> > > > It just skips any stack frames before the specified "sp" pointer.
> > > > Several callers use this, for example, to start at regs->sp instead of
> > > > the current stack frame.  I'll try to make the comment clearer.
> > > >
> > >
> > > Are you checking the right condition?  Shouldn't this check that sp is
> > > in bounds for the current stack if a stack switch happened?
> >
> > You're right.
> >
> > > I admit I don't fully understand the use case.  If someone wants to
> > > start a trace in the middle, shouldn't they just pass regs in?
> >
> > The regs aren't always available.  Some callers just want to skip the
> > first few frames so the stack dump code itself isn't traced.
> 
> I suspect that all users are okay with your algorithm simply because
> they don't switch stacks.  Maybe the thing to do is to stop advancing
> when sp is passed or if the stack switches at all.

There are actually some cases which could be broken by the sloppy "while
state->bp < sp" check.  When starting with sp from 'regs->sp', sp often
points to a different stack than the current one.  It seemed to work in
my testing, but I guess I got lucky, with irq percpu stack addresses
being smaller than the thread stack addresses.
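
A minimal sketch of that failure mode, not kernel code: the helper names
(range_contains, naive_should_skip, stack_aware_should_skip) are
hypothetical, the stack bounds are passed in as plain integers, and the
"stack-aware" variant is only one assumed way to add the bounds check
discussed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [addr, addr + len) fits inside [begin, end). */
static bool range_contains(uintptr_t begin, uintptr_t end,
			   uintptr_t addr, size_t len)
{
	return addr >= begin && addr <= end && len <= end - addr;
}

/* The condition from the patch: keep skipping while bp is below sp. */
static bool naive_should_skip(uintptr_t bp, uintptr_t sp)
{
	return bp < sp;
}

/*
 * Assumed fix: a bp that is not on the stack containing sp is always
 * skipped; only once both are on the same stack is bp compared with sp.
 */
static bool stack_aware_should_skip(uintptr_t begin, uintptr_t end,
				    uintptr_t bp, uintptr_t sp)
{
	if (!range_contains(begin, end, bp, 2 * sizeof(uintptr_t)))
		return true;
	return bp < sp;
}
```

With a thread stack at [0x4000, 0x8000) and bp on an irq stack at 0x9800,
naive_should_skip(0x9800, 0x7000) is false, so the skip loop would stop on
the irq stack before ever reaching sp.  The stack-aware variant keeps
skipping in that case, and gives the same answer when the irq stack
happens to sit below the thread stack.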

> Could you point me at a user that passes anything other than regs->sp
> here?  On brief inspection, I haven't found any at all.

For example, see show_stack().

-- 
Josh

* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-11 14:28             ` Josh Poimboeuf
@ 2016-08-11 14:58               ` Andy Lutomirski
  2016-08-11 16:09                 ` Josh Poimboeuf
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-11 14:58 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, H . Peter Anvin,
	x86, Nilay Vaish, Ingo Molnar, Steven Rostedt, Brian Gerst,
	linux-kernel, Linus Torvalds, Peter Zijlstra, Byungchul Park

On Thu, Aug 11, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Aug 11, 2016 at 12:18:54AM -0700, Andy Lutomirski wrote:
>> > > I admit I don't fully understand the use case.  If someone wants to
>> > > start a trace in the middle, shouldn't they just pass regs in?
>> >
>> > The regs aren't always available.  Some callers just want to skip the
>> > first few frames so the stack dump code itself isn't traced.
>>
>> I suspect that all users are okay with your algorithm simply because
>> they don't switch stacks.  Maybe the thing to do is to stop advancing
>> when sp is passed or if the stack switches at all.
>
> There are actually some cases which could be broken by the sloppy "while
> state->bp < sp" check.  When starting with sp from 'regs->sp', sp often
> points to a different stack than the current one.  It seemed to work in
> my testing, but I guess I got lucky, with irq percpu stack addresses
> being smaller than the thread stack addresses.
>
>> Could you point me at a user that passes anything other than regs->sp
>> here?  On brief inspection, I haven't found any at all.
>
> For example, see show_stack().

Is that a non-trivial case?  show_stack() is generating sp *and* bp.
Why not just pass both of them all the way in to show_trace_log_lvl
(which the existing code already does) and then pass it into
unwind_start?

Alternatively, since that risks causing a bit of loss if you implement
DWARF, you could add an unwind_start_here() function that captures the
state in the calling function and pass the result all the way through.
Or you could write a silly asm helper that literally fills in a struct
pt_regs for the current context (although that could get a bit awkward
for 32-bit, since pt_regs doesn't really contain sp there).

If there are genuinely zero non-trivial cases, then this should fully
solve the problem, no?

Aside: the current code looks a bit silly to me:

    unsigned long stack;
    ...
        sp = &stack;
        bp = stack_frame(current, NULL);

Why not just force a frame pointer and read out sp and bp using asm?
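
A rough user-space sketch of the idea, with one substitution stated
plainly: instead of inline asm it uses the GCC/Clang builtin
__builtin_frame_address(0) for bp and the address of a local as an
approximation of sp.  The struct and function names are hypothetical and
nothing here is taken from the kernel.

```c
#include <stdint.h>

/* Hypothetical snapshot type; not a kernel structure. */
struct here {
	uintptr_t bp;	/* frame address of the capturing function */
	uintptr_t sp;	/* approximate stack pointer: address of a local */
};

/* noinline (GCC/Clang attribute) so the function keeps its own frame. */
__attribute__((noinline)) static struct here capture_here(void)
{
	struct here h;
	volatile char marker = 0;	/* a local that lives on the current stack */

	h.bp = (uintptr_t)__builtin_frame_address(0);
	h.sp = (uintptr_t)&marker;
	return h;
}
```

Both values describe capture_here() itself, so a real helper would have
to be inlined into the caller, or unwind one frame, before the result is
useful as a starting point.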

--Andy

* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-11 14:58               ` Andy Lutomirski
@ 2016-08-11 16:09                 ` Josh Poimboeuf
  2016-08-11 18:58                   ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-11 16:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, H . Peter Anvin,
	x86, Nilay Vaish, Ingo Molnar, Steven Rostedt, Brian Gerst,
	linux-kernel, Linus Torvalds, Peter Zijlstra, Byungchul Park

On Thu, Aug 11, 2016 at 07:58:34AM -0700, Andy Lutomirski wrote:
> On Thu, Aug 11, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Aug 11, 2016 at 12:18:54AM -0700, Andy Lutomirski wrote:
> >> > > I admit I don't fully understand the use case.  If someone wants to
> >> > > start a trace in the middle, shouldn't they just pass regs in?
> >> >
> >> > The regs aren't always available.  Some callers just want to skip the
> >> > first few frames so the stack dump code itself isn't traced.
> >>
> >> I suspect that all users are okay with your algorithm simply because
> >> they don't switch stacks.  Maybe the thing to do is to stop advancing
> >> when sp is passed or if the stack switches at all.
> >
> > There are actually some cases which could be broken by the sloppy "while
> > state->bp < sp" check.  When starting with sp from 'regs->sp', sp often
> > points to a different stack than the current one.  It seemed to work in
> > my testing, but I guess I got lucky, with irq percpu stack addresses
> > being smaller than the thread stack addresses.
> >
> >> Could you point me at a user that passes anything other than regs->sp
> >> here?  On brief inspection, I haven't found any at all.
> >
> > For example, see show_stack().
> 
> Is that a non-trivial case?  show_stack() is generating sp *and* bp.
> Why not just pass both of them all the way in to show_trace_log_lvl
> (which the existing code already does) and then pass it into
> unwind_start?
> 
> Alternatively, since that risks causing a bit of loss if you implement
> DWARF, you could add an unwind_start_here() function that captures the
> state in the calling function and pass the result all the way through.
> Or you could write a silly asm helper that literally fills in a struct
> pt_regs for the current context (although that could get a bit awkward
> for 32-bit, since pt_regs doesn't really contain sp there).
> 
> If there are genuinely zero non-trivial cases, then this should fully
> solve the problem, no?

Hm, what do you mean by non-trivial cases?

As far as I can tell, they're all trivial, and we don't need the caller
to provide bp.  Just start with the current frame's bp and unwind until
bp and sp are on the same stack and bp > sp.
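
That rule can be sketched in user space roughly as follows.  This is an
illustration under assumptions, not the patch: the two arrays stand in
for an irq stack and a task stack, each frame is reduced to a single
saved-bp word, and all names (on_task_stack, skip_to_sp,
demo_skip_index) are made up.

```c
#include <stddef.h>
#include <stdint.h>

#define IRQ_WORDS  8
#define TASK_WORDS 16

/* Simulated stacks; a frame is just the word holding the caller's bp. */
static uintptr_t irq_stack[IRQ_WORDS];
static uintptr_t task_stack[TASK_WORDS];

/* Hypothetical helper: is p inside the simulated task stack? */
static int on_task_stack(const uintptr_t *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= (uintptr_t)&task_stack[0] &&
	       a <  (uintptr_t)&task_stack[TASK_WORDS];
}

/*
 * Follow saved-bp links until bp is on the same stack as sp and above
 * it.  Returns NULL if the chain ends first.
 */
static const uintptr_t *skip_to_sp(const uintptr_t *bp, const uintptr_t *sp)
{
	while (bp) {
		if (on_task_stack(bp) && (uintptr_t)bp > (uintptr_t)sp)
			return bp;
		bp = (const uintptr_t *)*bp;	/* saved bp of the caller */
	}
	return NULL;
}

/*
 * Build the chain irq[2] -> task[4] -> task[9] -> task[13] -> end and
 * skip to sp = &task_stack[6].  Returns the task_stack index reached,
 * or -1 if no frame qualifies.
 */
static ptrdiff_t demo_skip_index(void)
{
	const uintptr_t *bp;

	irq_stack[2]   = (uintptr_t)&task_stack[4];
	task_stack[4]  = (uintptr_t)&task_stack[9];
	task_stack[9]  = (uintptr_t)&task_stack[13];
	task_stack[13] = 0;

	bp = skip_to_sp(&irq_stack[2], &task_stack[6]);
	return bp ? bp - task_stack : -1;
}
```

The walk starts on the irq stack, ignores that frame no matter where the
array happens to be placed in memory, passes task_stack[4] because it is
below sp, and stops at task_stack[9], the first frame above sp on the
same stack.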

> Aside: the current code looks a bit silly to me:
> 
>     unsigned long stack;
>     ...
>         sp = &stack;
>         bp = stack_frame(current, NULL);
> 
> Why not just force a frame pointer and read out sp and bp using asm?

Due to the above-discussed algorithm, we no longer need to get bp in
show_stack() (though I forgot to remove the get_frame_pointer() call in
my patches).

And I changed the 'sp = &stack' to a call to get_stack_pointer().

-- 
Josh

* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-11 16:09                 ` Josh Poimboeuf
@ 2016-08-11 18:58                   ` Andy Lutomirski
  2016-08-11 19:15                     ` Josh Poimboeuf
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2016-08-11 18:58 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, x86,
	H . Peter Anvin, Nilay Vaish, Steven Rostedt, Ingo Molnar,
	linux-kernel, Brian Gerst, Byungchul Park, Peter Zijlstra,
	Linus Torvalds

On Aug 11, 2016 7:09 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
>
> On Thu, Aug 11, 2016 at 07:58:34AM -0700, Andy Lutomirski wrote:
> > On Thu, Aug 11, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > On Thu, Aug 11, 2016 at 12:18:54AM -0700, Andy Lutomirski wrote:
> > >> > > I admit I don't fully understand the use case.  If someone wants to
> > >> > > start a trace in the middle, shouldn't they just pass regs in?
> > >> >
> > >> > The regs aren't always available.  Some callers just want to skip the
> > >> > first few frames so the stack dump code itself isn't traced.
> > >>
> > >> I suspect that all users are okay with your algorithm simply because
> > >> they don't switch stacks.  Maybe the thing to do is to stop advancing
> > >> when sp is passed or if the stack switches at all.
> > >
> > > There are actually some cases which could be broken by the sloppy "while
> > > state->bp < sp" check.  When starting with sp from 'regs->sp', sp often
> > > points to a different stack than the current one.  It seemed to work in
> > > my testing, but I guess I got lucky, with irq percpu stack addresses
> > > being smaller than the thread stack addresses.
> > >
> > >> Could you point me at a user that passes anything other than regs->sp
> > >> here?  On brief inspection, I haven't found any at all.
> > >
> > > For example, see show_stack().
> >
> > Is that a non-trivial case?  show_stack() is generating sp *and* bp.
> > Why not just pass both of them all the way in to show_trace_log_lvl
> > (which the existing code already does) and then pass it into
> > unwind_start?
> >
> > Alternatively, since that risks causing a bit of loss if you implement
> > DWARF, you could add an unwind_start_here() function that captures the
> > state in the calling function and pass the result all the way through.
> > Or you could write a silly asm helper that literally fills in a struct
> > pt_regs for the current context (although that could get a bit awkward
> > for 32-bit, since pt_regs doesn't really contain sp there).
> >
> > If there are genuinely zero non-trivial cases, then this should fully
> > solve the problem, no?
>
> Hm, what do you mean by non-trivial cases?
>
> As far as I can tell, they're all trivial, and we don't need the caller
> to provide bp.  Just start with the current frame's bp and unwind until
> bp and sp are on the same stack and bp > sp.

Given that your unwind-till-we-get-there algorithm isn't quite
correct, wouldn't it be easier to just pass all the needed information
through?  Especially since that information is there (in at least one
case) on current kernels, so the diff would be smaller.  It seems
overcomplicated to me to regenerate the state that was already in the
registers by unwinding to it rather than just passing it in to the
unwinder directly.

--Andy


* Re: [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations
  2016-08-11 18:58                   ` Andy Lutomirski
@ 2016-08-11 19:15                     ` Josh Poimboeuf
  0 siblings, 0 replies; 66+ messages in thread
From: Josh Poimboeuf @ 2016-08-11 19:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, x86,
	H . Peter Anvin, Nilay Vaish, Steven Rostedt, Ingo Molnar,
	linux-kernel, Brian Gerst, Byungchul Park, Peter Zijlstra,
	Linus Torvalds

On Thu, Aug 11, 2016 at 11:58:13AM -0700, Andy Lutomirski wrote:
> On Aug 11, 2016 7:09 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
> >
> > On Thu, Aug 11, 2016 at 07:58:34AM -0700, Andy Lutomirski wrote:
> > > On Thu, Aug 11, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > > On Thu, Aug 11, 2016 at 12:18:54AM -0700, Andy Lutomirski wrote:
> > > >> > > I admit I don't fully understand the use case.  If someone wants to
> > > >> > > start a trace in the middle, shouldn't they just pass regs in?
> > > >> >
> > > >> > The regs aren't always available.  Some callers just want to skip the
> > > >> > first few frames so the stack dump code itself isn't traced.
> > > >>
> > > >> I suspect that all users are okay with your algorithm simply because
> > > >> they don't switch stacks.  Maybe the thing to do is to stop advancing
> > > >> when sp is passed or if the stack switches at all.
> > > >
> > > > There are actually some cases which could be broken by the sloppy "while
> > > > state->bp < sp" check.  When starting with sp from 'regs->sp', sp often
> > > > points to a different stack than the current one.  It seemed to work in
> > > > my testing, but I guess I got lucky, with irq percpu stack addresses
> > > > being smaller than the thread stack addresses.
> > > >
> > > >> Could you point me at a user that passes anything other than regs->sp
> > > >> here?  On brief inspection, I haven't found any at all.
> > > >
> > > > For example, see show_stack().
> > >
> > > Is that a non-trivial case?  show_stack() is generating sp *and* bp.
> > > Why not just pass both of them all the way in to show_trace_log_lvl
> > > (which the existing code already does) and then pass it into
> > > unwind_start?
> > >
> > > Alternatively, since that risks causing a bit of loss if you implement
> > > DWARF, you could add an unwind_start_here() function that captures the
> > > state in the calling function and pass the result all the way through.
> > > Or you could write a silly asm helper that literally fills in a struct
> > > pt_regs for the current context (although that could get a bit awkward
> > > for 32-bit, since pt_regs doesn't really contain sp there).
> > >
> > > If there are genuinely zero non-trivial cases, then this should fully
> > > solve the problem, no?
> >
> > Hm, what do you mean by non-trivial cases?
> >
> > As far as I can tell, they're all trivial, and we don't need the caller
> > to provide bp.  Just start with the current frame's bp and unwind until
> > bp and sp are on the same stack and bp > sp.
> 
> Given that your unwind-till-we-get-there algorithm isn't quite
> correct

But it *will* be correct in v3, with a minor change.  Sneak preview:

	/*
	 * The caller can optionally provide a stack pointer directly
	 * (first_sp) or indirectly (regs->sp), which indicates which stack
	 * frame to start unwinding at.  Skip ahead until we reach that frame.
	 */
	while (!unwind_done(state) &&
	       (!on_stack(&state->stack_info, first_sp, sizeof(*first_sp)) ||
		state->bp < first_sp))
		unwind_next_frame(state);
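
To make the difference from the old "while state->bp < sp" check concrete, here is a minimal userspace sketch, not the kernel code.  The struct layouts, the addresses, and the helpers skip_naive(), skip_checked() and demo() are all hypothetical; only the shape of the loop condition is taken from the preview above.  It models two irq-stack frames followed by two thread-stack frames, with first_sp pointing into the thread stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's stack_info: bounds of one stack. */
struct stack_info {
	uintptr_t begin;	/* lowest address, inclusive */
	uintptr_t end;		/* highest address, exclusive */
};

/* One simulated frame: its frame pointer and the stack it lives on. */
struct frame {
	uintptr_t bp;
	const struct stack_info *stack;
};

struct unwind_state {
	const struct frame *frames;	/* innermost frame first */
	size_t nr_frames;
	size_t idx;			/* index of the current frame */
};

static bool on_stack(const struct stack_info *info, uintptr_t addr, size_t len)
{
	return addr >= info->begin && addr + len <= info->end;
}

static bool unwind_done(const struct unwind_state *state)
{
	return state->idx >= state->nr_frames;
}

static void unwind_next_frame(struct unwind_state *state)
{
	if (!unwind_done(state))
		state->idx++;
}

/* Old check: compares addresses even when they are on different stacks. */
static void skip_naive(struct unwind_state *state, uintptr_t first_sp)
{
	while (!unwind_done(state) && state->frames[state->idx].bp < first_sp)
		unwind_next_frame(state);
}

/* New check: only compare bp with first_sp once both are on the same stack. */
static void skip_checked(struct unwind_state *state, uintptr_t first_sp)
{
	while (!unwind_done(state) &&
	       (!on_stack(state->frames[state->idx].stack, first_sp,
			  sizeof(long)) ||
		state->frames[state->idx].bp < first_sp))
		unwind_next_frame(state);
}

/* Returns the index of the frame at which unwinding would start. */
static size_t demo(bool irq_stack_above_thread, bool use_checked)
{
	const struct stack_info thread = { 0x1000, 0x2000 };
	const struct stack_info irq = irq_stack_above_thread ?
		(struct stack_info){ 0x9000, 0xa000 } :
		(struct stack_info){ 0x0100, 0x0200 };
	const struct frame frames[] = {
		{ irq.begin + 0x80, &irq },	/* stack dump code itself */
		{ irq.begin + 0xc0, &irq },
		{ 0x1880, &thread },		/* frame that first_sp refers to */
		{ 0x1f00, &thread },
	};
	struct unwind_state state = { frames, 4, 0 };
	uintptr_t first_sp = 0x1800;	/* plays the role of regs->sp */

	assert(on_stack(&thread, first_sp, sizeof(long)));

	if (use_checked)
		skip_checked(&state, first_sp);
	else
		skip_naive(&state, first_sp);
	return state.idx;
}
```

With the irq stack placed below the thread stack, both variants start at frame 2.  With the irq stack placed above it, the naive variant stops at frame 0 and would trace the dump code itself, while the checked variant still starts at frame 2.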

> wouldn't it be easier to just pass all the needed information
> through?  Especially since that information is there (in at least one
> case) on current kernels, so the diff would be smaller.

Why focus on diff size?  Isn't it the resulting code that's most
important?  It's not like removing the bp argument causes a big diff
anyway.

> It seems overcomplicated to me to regenerate the state that was
> already in the registers by unwinding to it rather than just passing
> it in to the unwinder directly.

Passing in the frame pointer when it isn't needed makes the interface
more complex.  It creates more opportunity for the caller to mess things
up and more edge cases to consider in the unwinder.

And as you mentioned, it's not compatible with DWARF.  Adding
unwind_start_here() would add even more interface complexity.

-- 
Josh


Thread overview: 66+ messages
2016-08-04 22:21 [PATCH v2 00/44] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
2016-08-04 22:21 ` [PATCH v2 01/44] x86/dumpstack: remove show_trace() Josh Poimboeuf
2016-08-04 22:21 ` [PATCH v2 02/44] x86/asm/head: remove unused init_rsp variable Josh Poimboeuf
2016-08-04 22:21 ` [PATCH v2 03/44] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
2016-08-05 15:28   ` Nilay Vaish
2016-08-05 16:01     ` Josh Poimboeuf
2016-08-06  5:25       ` Borislav Petkov
2016-08-06 13:13         ` Josh Poimboeuf
2016-08-06 13:15         ` Brian Gerst
2016-08-06 13:38           ` Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 04/44] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
2016-08-05 15:41   ` Nilay Vaish
2016-08-05 16:17     ` Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 05/44] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 06/44] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 07/44] x86/dumpstack: remove extra brackets around "<EOE>" Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 08/44] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl() Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 09/44] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 10/44] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 11/44] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 12/44] x86: move _stext marker to before head code Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 13/44] x86/asm/head: remove useless zeroed word Josh Poimboeuf
2016-08-05 16:13   ` Brian Gerst
2016-08-05 16:23     ` Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 14/44] x86/asm/head: put real return address on idle task stack Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 15/44] perf/x86: check perf_callchain_store() error Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 16/44] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 17/44] proc: fix return address printk conversion specifer in /proc/<pid>/stack Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 18/44] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 19/44] ftrace: only allocate the ret_stack 'fp' field when needed Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 20/44] ftrace: add return address pointer to ftrace_ret_stack Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 21/44] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 22/44] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr() Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 23/44] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 24/44] x86/dumpstack/ftrace: mark function graph handler function as unreliable Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 25/44] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp() Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 26/44] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 27/44] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 28/44] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 29/44] x86/dumpstack: add recursion checking for all stacks Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 30/44] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
2016-08-09 23:17   ` Nilay Vaish
2016-08-09 23:27     ` Josh Poimboeuf
2016-08-10  7:25       ` Andy Lutomirski
2016-08-10 14:16         ` Josh Poimboeuf
2016-08-11  7:18           ` Andy Lutomirski
2016-08-11 14:28             ` Josh Poimboeuf
2016-08-11 14:58               ` Andy Lutomirski
2016-08-11 16:09                 ` Josh Poimboeuf
2016-08-11 18:58                   ` Andy Lutomirski
2016-08-11 19:15                     ` Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 31/44] perf/x86: convert perf_callchain_kernel() to use the new unwinder Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 32/44] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 33/44] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 34/44] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 35/44] x86/dumpstack: remove dump_trace() and related callbacks Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 36/44] x86/entry/unwind: encode pt_regs pointer in frame pointer Josh Poimboeuf
2016-08-08 23:06   ` Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 37/44] x86/unwind: detect syscall entry regs Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 38/44] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 39/44] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 40/44] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 41/44] x86/asm/head: standardize the end of the stack for idle tasks Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 42/44] x86/unwind: warn on kernel stack corruption Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 43/44] x86/unwind: warn on bad stack return address Josh Poimboeuf
2016-08-04 22:22 ` [PATCH v2 44/44] x86/unwind: warn if stack grows up Josh Poimboeuf
