* [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code
@ 2016-08-12 14:28 Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 01/51] x86/dumpstack: remove show_trace() Josh Poimboeuf
                   ` (50 more replies)
  0 siblings, 51 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The patch set continues to grow.  The most significant changes since
last time are a partial rewrite of the unwinder to give each pt_regs its
own frame, frame pointer encoding for 32-bit, and conversion of the new
CONFIG_HARDENED_USERCOPY arch_within_stack_frames() function to use the
unwinder.  See below for the full list of changes.

A git branch is available at:
 
  https://github.com/jpoimboe/linux unwind-v3

Based on tip/master.

v3:
- partial unwinder rewrite: each pt_regs gets its own frame
- add frame pointer encoding support for 32-bit
- several 32-bit fixes and cleanups for issues found by the new warnings
- convert CONFIG_HARDENED_USERCOPY arch_within_stack_frames()
- fix bug in unwinder when skipping stack frames (and add a comment)
- warn on stack recursion
- split the CPU startup code out into a common start_cpu() function
- export symbols in unwind_guess.c

v2:
- split up several of the patches and reorder them with lower-risk
  patches first
- add a lot more comments
- remove the 64-byte gap at the end of the irq stack
- fix some existing ftrace function graph unwinding issues
- fix an existing bug in kernel_stack_pointer()
- clarify the origins of the stack_info "next stack" pointers
- do visit_mask checking in get_stack_info() instead of in_*_stack()
- add some new unwinder warnings
- remove uses of test_and_set_bit()
- don't print regs->ip twice
- remove unwind_state.sp
- have unwind_get_return_address() validate the return address
- change /proc/pid/stack to use %pB
- several minor cleanups and fixes

----

The x86 stack dump code is a bit of a mess.  dump_trace() uses
callbacks, and each of its users seems to have slightly different
requirements, so several subtly different callbacks have accumulated
over time.
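
For reference, the current callback interface looks roughly like this
(paraphrased from arch/x86/include/asm/stacktrace.h as of this series'
base; the per-user implementations of these callbacks are what vary):

  struct stacktrace_ops {
          int (*address)(void *data, unsigned long addr, int reliable);
          int (*stack)(void *data, char *name);   /* called when switching stacks */
          walk_stack_t walk_stack;                /* frame-pointer or raw stack walk */
  };

  void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
                  unsigned long *stack, unsigned long bp,
                  const struct stacktrace_ops *ops, void *data);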

Also there are some upcoming features which will require more changes to
the stack dump code: reliable stack detection for live patching,
hardened user copy, and the DWARF unwinder.  Each of those features
would at least need more callbacks and/or callback interfaces, resulting
in a much bigger mess than what we have today.

Before doing all that, we should try to clean things up and replace
dump_trace() with something cleaner and more flexible.

The new unwinder is a simple state machine which was heavily inspired by
a suggestion from Andy Lutomirski:

  https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com

It's also similar to the libunwind API:

  http://www.nongnu.org/libunwind/man/libunwind(3).html

Some of its advantages:

- simplicity: no more callback sprawl and less code duplication.

- flexibility: allows the caller to stop and inspect the stack state at
  each step in the unwinding process.

- modularity: the unwinder code, console stack dump code, and stack
  metadata analysis code are all better separated so that changing one
  of them shouldn't have much of an impact on any of the others.
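
To make the comparison concrete, a caller of the new interface ends up
looking roughly like this (a sketch based on the unwind_* API introduced
later in this series; record_entry() is a placeholder for whatever the
caller does with each address):

  struct unwind_state state;
  unsigned long addr;

  unwind_start(&state, task, regs, first_frame);

  while (!unwind_done(&state)) {
          addr = unwind_get_return_address(&state);
          if (!addr)
                  break;
          record_entry(addr);             /* e.g. print or record it */
          unwind_next_frame(&state);
  }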


Josh Poimboeuf (51):
  x86/dumpstack: remove show_trace()
  x86/asm/head: remove unused init_rsp variable extern
  x86/asm/head: rename 'stack_start' -> 'initial_stack'
  x86/asm/head: use a common function for starting CPUs
  x86/dumpstack: make printk_stack_address() more generally useful
  x86/dumpstack: add IRQ_USABLE_STACK_SIZE define
  x86/dumpstack: remove extra brackets around "<EOE>"
  x86/dumpstack: fix irq stack bounds calculation in
    show_stack_log_lvl()
  x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  x86/dumpstack: add get_stack_pointer() and get_frame_pointer()
  x86/dumpstack: remove unnecessary stack pointer arguments
  x86: move _stext marker to before head code
  x86/asm/head: remove useless zeroed word
  x86/asm/head: put real return address on idle task stack
  x86/asm/head: standardize the end of the stack for idle tasks
  x86/32: put real return address on stack in entry code
  x86/smp: fix initial idle stack location on 32-bit
  x86/entry/head/32: use local labels
  x86/entry/32: rename 'error_code' to 'common_exception'
  perf/x86: check perf_callchain_store() error
  oprofile/x86: add regs->ip to oprofile trace
  proc: fix return address printk conversion specifier in
    /proc/<pid>/stack
  ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config
  ftrace: only allocate the ret_stack 'fp' field when needed
  ftrace: add return address pointer to ftrace_ret_stack
  ftrace: add ftrace_graph_ret_addr() stack unwinding helpers
  x86/dumpstack/ftrace: convert dump_trace() callbacks to use
    ftrace_graph_ret_addr()
  ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
  x86/dumpstack/ftrace: mark function graph handler function as
    unreliable
  x86/dumpstack/ftrace: don't print unreliable addresses in
    print_context_stack_bp()
  x86/dumpstack: allow preemption in show_stack_log_lvl() and
    dump_trace()
  x86/dumpstack: simplify in_exception_stack()
  x86/dumpstack: add get_stack_info() interface
  x86/dumpstack: add recursion checking for all stacks
  x86/unwind: add new unwind interface and implementations
  perf/x86: convert perf_callchain_kernel() to use the new unwinder
  x86/stacktrace: convert save_stack_trace_*() to use the new unwinder
  oprofile/x86: convert x86_backtrace() to use the new unwinder
  x86/dumpstack: convert show_trace_log_lvl() to use the new unwinder
  x86/dumpstack: remove dump_trace() and related callbacks
  x86/entry/unwind: create stack frames for saved interrupt registers
  x86/unwind: create stack frames for saved syscall registers
  x86/dumpstack: print stack identifier on its own line
  x86/dumpstack: print any pt_regs found on the stack
  x86: remove 64-byte gap at end of irq stack
  x86/unwind: warn on kernel stack corruption
  x86/unwind: warn on bad stack return address
  x86/unwind: warn if stack grows up
  x86/dumpstack: warn on stack recursion
  x86/mm: move arch_within_stack_frames() to usercopy.c
  x86/mm: convert arch_within_stack_frames() to use the new unwinder

 Documentation/trace/ftrace-design.txt |  11 ++
 arch/arm/kernel/ftrace.c              |   2 +-
 arch/arm64/kernel/entry-ftrace.S      |   2 +-
 arch/arm64/kernel/ftrace.c            |   2 +-
 arch/blackfin/kernel/ftrace-entry.S   |   4 +-
 arch/blackfin/kernel/ftrace.c         |   2 +-
 arch/microblaze/kernel/ftrace.c       |   2 +-
 arch/mips/kernel/ftrace.c             |   4 +-
 arch/parisc/kernel/ftrace.c           |   2 +-
 arch/powerpc/kernel/ftrace.c          |   3 +-
 arch/s390/kernel/ftrace.c             |   3 +-
 arch/sh/kernel/ftrace.c               |   2 +-
 arch/sparc/Kconfig                    |   1 -
 arch/sparc/include/asm/ftrace.h       |   4 +
 arch/sparc/kernel/ftrace.c            |   2 +-
 arch/tile/kernel/ftrace.c             |   2 +-
 arch/x86/Kconfig                      |   1 -
 arch/x86/entry/calling.h              |  21 +++
 arch/x86/entry/entry_32.S             | 140 +++++++++------
 arch/x86/entry/entry_64.S             |  10 +-
 arch/x86/events/core.c                |  36 ++--
 arch/x86/include/asm/ftrace.h         |   3 +
 arch/x86/include/asm/kdebug.h         |   2 -
 arch/x86/include/asm/page_64_types.h  |  16 +-
 arch/x86/include/asm/realmode.h       |   2 +-
 arch/x86/include/asm/smp.h            |   3 -
 arch/x86/include/asm/stacktrace.h     | 114 ++++++------
 arch/x86/include/asm/thread_info.h    |  46 +----
 arch/x86/include/asm/unwind.h         | 107 ++++++++++++
 arch/x86/kernel/Makefile              |   6 +
 arch/x86/kernel/acpi/sleep.c          |   2 +-
 arch/x86/kernel/cpu/common.c          |   2 +-
 arch/x86/kernel/dumpstack.c           | 272 +++++++++++++----------------
 arch/x86/kernel/dumpstack_32.c        | 141 ++++++++-------
 arch/x86/kernel/dumpstack_64.c        | 320 +++++++++++-----------------------
 arch/x86/kernel/ftrace.c              |   2 +-
 arch/x86/kernel/head_32.S             |  54 +++---
 arch/x86/kernel/head_64.S             |  50 +++---
 arch/x86/kernel/ptrace.c              |   4 +-
 arch/x86/kernel/setup_percpu.c        |   2 +-
 arch/x86/kernel/smpboot.c             |   6 +-
 arch/x86/kernel/stacktrace.c          |  74 +++-----
 arch/x86/kernel/unwind_frame.c        | 234 +++++++++++++++++++++++++
 arch/x86/kernel/unwind_guess.c        |  42 +++++
 arch/x86/kernel/vmlinux.lds.S         |   2 +-
 arch/x86/lib/usercopy.c               |  56 ++++++
 arch/x86/oprofile/backtrace.c         |  49 +++---
 fs/proc/base.c                        |   2 +-
 include/linux/ftrace.h                |  17 +-
 kernel/trace/Kconfig                  |   5 -
 kernel/trace/trace_functions_graph.c  |  67 ++++++-
 51 files changed, 1185 insertions(+), 773 deletions(-)
 create mode 100644 arch/x86/include/asm/unwind.h
 create mode 100644 arch/x86/kernel/unwind_frame.c
 create mode 100644 arch/x86/kernel/unwind_guess.c

-- 
2.7.4


* [PATCH v3 01/51] x86/dumpstack: remove show_trace()
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 02/51] x86/asm/head: remove unused init_rsp variable extern Josh Poimboeuf
                   ` (49 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

There is a bewildering array of options for dumping the stack.
Simplify things a little by removing show_trace(), which is unused.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/kdebug.h | 2 --
 arch/x86/kernel/dumpstack.c   | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h
index 1ef9d58..d318811 100644
--- a/arch/x86/include/asm/kdebug.h
+++ b/arch/x86/include/asm/kdebug.h
@@ -24,8 +24,6 @@ enum die_val {
 extern void printk_address(unsigned long address);
 extern void die(const char *, struct pt_regs *,long);
 extern int __must_check __die(const char *, struct pt_regs *, long);
-extern void show_trace(struct task_struct *t, struct pt_regs *regs,
-		       unsigned long *sp, unsigned long bp);
 extern void show_stack_regs(struct pt_regs *regs);
 extern void __show_regs(struct pt_regs *regs, int all);
 extern unsigned long oops_begin(void);
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 92e8f0a..5f49c04 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -182,12 +182,6 @@ show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	dump_trace(task, regs, stack, bp, &print_trace_ops, log_lvl);
 }
 
-void show_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp)
-{
-	show_trace_log_lvl(task, regs, stack, bp, "");
-}
-
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
 	unsigned long bp = 0;
-- 
2.7.4


* [PATCH v3 02/51] x86/asm/head: remove unused init_rsp variable extern
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 01/51] x86/dumpstack: remove show_trace() Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 03/51] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
                   ` (48 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

There is no init_rsp variable.  Remove its extern.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/realmode.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index b2988c0..3327ffb 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -44,7 +44,6 @@ struct trampoline_header {
 extern struct real_mode_header *real_mode_header;
 extern unsigned char real_mode_blob_end[];
 
-extern unsigned long init_rsp;
 extern unsigned long initial_code;
 extern unsigned long initial_gs;
 
-- 
2.7.4


* [PATCH v3 03/51] x86/asm/head: rename 'stack_start' -> 'initial_stack'
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 01/51] x86/dumpstack: remove show_trace() Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 02/51] x86/asm/head: remove unused init_rsp variable extern Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 04/51] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
                   ` (47 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The 'stack_start' variable is similar in usage to 'initial_code' and
'initial_gs': they're all stored in head_64.S and they're all updated by
SMP and ACPI suspend before starting a CPU.

Rename it to 'initial_stack' to be consistent with the others.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/realmode.h |  1 +
 arch/x86/include/asm/smp.h      |  3 ---
 arch/x86/kernel/acpi/sleep.c    |  2 +-
 arch/x86/kernel/head_32.S       |  8 ++++----
 arch/x86/kernel/head_64.S       | 11 +++++------
 arch/x86/kernel/smpboot.c       |  2 +-
 6 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 3327ffb..230e190 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -46,6 +46,7 @@ extern unsigned char real_mode_blob_end[];
 
 extern unsigned long initial_code;
 extern unsigned long initial_gs;
+extern unsigned long initial_stack;
 
 extern unsigned char real_mode_blob[];
 extern unsigned char real_mode_relocs[];
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index ebd0c16..19980b3 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -39,9 +39,6 @@ DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_bios_cpu_apicid);
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(int, x86_cpu_to_logical_apicid);
 #endif
 
-/* Static state in head.S used to set up a CPU */
-extern unsigned long stack_start; /* Initial stack pointer address */
-
 struct task_struct;
 
 struct smp_ops {
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index adb3eaf..4858733 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -99,7 +99,7 @@ int x86_acpi_suspend_lowlevel(void)
 	saved_magic = 0x12345678;
 #else /* CONFIG_64BIT */
 #ifdef CONFIG_SMP
-	stack_start = (unsigned long)temp_stack + sizeof(temp_stack);
+	initial_stack = (unsigned long)temp_stack + sizeof(temp_stack);
 	early_gdt_descr.address =
 			(unsigned long)get_cpu_gdt_table(smp_processor_id());
 	initial_gs = per_cpu_offset(smp_processor_id());
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 6f8902b..5f40126 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -94,7 +94,7 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE)
  */
 __HEAD
 ENTRY(startup_32)
-	movl pa(stack_start),%ecx
+	movl pa(initial_stack),%ecx
 	
 	/* test KEEP_SEGMENTS flag to see if the bootloader is asking
 		us to not reload segments */
@@ -286,7 +286,7 @@ num_subarch_entries = (. - subarch_entries) / 4
  * start_secondary().
  */
 ENTRY(start_cpu0)
-	movl stack_start, %ecx
+	movl initial_stack, %ecx
 	movl %ecx, %esp
 	jmp  *(initial_code)
 ENDPROC(start_cpu0)
@@ -307,7 +307,7 @@ ENTRY(startup_32_smp)
 	movl %eax,%es
 	movl %eax,%fs
 	movl %eax,%gs
-	movl pa(stack_start),%ecx
+	movl pa(initial_stack),%ecx
 	movl %eax,%ss
 	leal -__PAGE_OFFSET(%ecx),%esp
 
@@ -703,7 +703,7 @@ ENTRY(initial_page_table)
 
 .data
 .balign 4
-ENTRY(stack_start)
+ENTRY(initial_stack)
 	.long init_thread_union+THREAD_SIZE
 
 __INITRODATA
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 9f8efc9..e048142 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -66,7 +66,7 @@ startup_64:
 	 */
 
 	/*
-	 * Setup stack for verify_cpu(). "-8" because stack_start is defined
+	 * Setup stack for verify_cpu(). "-8" because initial_stack is defined
 	 * this way, see below. Our best guess is a NULL ptr for stack
 	 * termination heuristics and we don't want to break anything which
 	 * might depend on it (kgdb, ...).
@@ -226,7 +226,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip), %rsp
+	movq initial_stack(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -310,7 +310,7 @@ ENDPROC(secondary_startup_64)
  * start_secondary().
  */
 ENTRY(start_cpu0)
-	movq stack_start(%rip),%rsp
+	movq initial_stack(%rip),%rsp
 	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
 	pushq	$__KERNEL_CS	# set correct cs
@@ -319,15 +319,14 @@ ENTRY(start_cpu0)
 ENDPROC(start_cpu0)
 #endif
 
-	/* SMP bootup changes these two */
+	/* Both SMP bootup and ACPI suspend change these variables */
 	__REFDATA
 	.balign	8
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
-
-	GLOBAL(stack_start)
+	GLOBAL(initial_stack)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 067de61..d9d3d67 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -962,7 +962,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	initial_code = (unsigned long)start_secondary;
-	stack_start  = idle->thread.sp;
+	initial_stack  = idle->thread.sp;
 
 	/*
 	 * Enable the espfix hack for this CPU
-- 
2.7.4


* [PATCH v3 04/51] x86/asm/head: use a common function for starting CPUs
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (2 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 03/51] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 22:08   ` Nilay Vaish
  2016-08-12 14:28 ` [PATCH v3 05/51] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
                   ` (46 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

There are two different pieces of code for starting a CPU: start_cpu0()
and the end of secondary_startup_64().  They're identical except for the
stack setup.  Combine the common parts into a shared start_cpu()
function.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index e048142..a212310 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -264,13 +264,17 @@ ENTRY(secondary_startup_64)
 	movl	$MSR_GS_BASE,%ecx
 	movl	initial_gs(%rip),%eax
 	movl	initial_gs+4(%rip),%edx
-	wrmsr	
+	wrmsr
 
 	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
 	movq	%rsi, %rdi
-	
-	/* Finally jump to run C code and to be on real kernel address
+	jmp	start_cpu
+ENDPROC(secondary_startup_64)
+
+ENTRY(start_cpu)
+	/*
+	 * Jump to run C code and to be on a real kernel address.
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
@@ -299,7 +303,7 @@ ENTRY(secondary_startup_64)
 	pushq	$__KERNEL_CS	# set correct cs
 	pushq	%rax		# target address in negative space
 	lretq
-ENDPROC(secondary_startup_64)
+ENDPROC(start_cpu)
 
 #include "verify_cpu.S"
 
@@ -307,15 +311,11 @@ ENDPROC(secondary_startup_64)
 /*
  * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
  * up already except stack. We just set up stack here. Then call
- * start_secondary().
+ * start_secondary() via start_cpu().
  */
 ENTRY(start_cpu0)
-	movq initial_stack(%rip),%rsp
-	movq	initial_code(%rip),%rax
-	pushq	$0		# fake return address to stop unwinder
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
+	movq	initial_stack(%rip), %rsp
+	jmp	start_cpu
 ENDPROC(start_cpu0)
 #endif
 
-- 
2.7.4


* [PATCH v3 05/51] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (3 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 04/51] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 06/51] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
                   ` (45 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Change printk_stack_address() to be useful when called by an unwinder
outside the context of dump_trace().

Specifically:

- printk_stack_address()'s 'data' argument is always used as the log
  level string.  Make that explicit.

- Call touch_nmi_watchdog().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 5f49c04..6b3376d 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -26,10 +26,11 @@ int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
 static int die_counter;
 
 static void printk_stack_address(unsigned long address, int reliable,
-		void *data)
+				 char *log_lvl)
 {
+	touch_nmi_watchdog();
 	printk("%s [<%p>] %s%pB\n",
-		(char *)data, (void *)address, reliable ? "" : "? ",
+		log_lvl, (void *)address, reliable ? "" : "? ",
 		(void *)address);
 }
 
@@ -163,7 +164,6 @@ static int print_trace_stack(void *data, char *name)
  */
 static int print_trace_address(void *data, unsigned long addr, int reliable)
 {
-	touch_nmi_watchdog();
 	printk_stack_address(addr, reliable, data);
 	return 0;
 }
-- 
2.7.4


* [PATCH v3 06/51] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (4 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 05/51] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 07/51] x86/dumpstack: remove extra brackets around "<EOE>" Josh Poimboeuf
                   ` (44 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

For reasons unknown, the x86_64 irq stack starts at an offset of 64
bytes from the end of its allocated area.  At least make that explicit.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/page_64_types.h | 19 +++++++++++--------
 arch/x86/kernel/cpu/common.c         |  2 +-
 arch/x86/kernel/dumpstack_64.c       |  5 +----
 arch/x86/kernel/setup_percpu.c       |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 9215e05..6256baf 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -12,17 +12,20 @@
 #endif
 
 #define THREAD_SIZE_ORDER	(2 + KASAN_STACK_ORDER)
-#define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
-#define CURRENT_MASK (~(THREAD_SIZE - 1))
+#define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
+#define CURRENT_MASK		(~(THREAD_SIZE - 1))
 
-#define EXCEPTION_STACK_ORDER (0 + KASAN_STACK_ORDER)
-#define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
+#define EXCEPTION_STACK_ORDER	(0 + KASAN_STACK_ORDER)
+#define EXCEPTION_STKSZ		(PAGE_SIZE << EXCEPTION_STACK_ORDER)
 
-#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1)
-#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER)
+#define DEBUG_STACK_ORDER	(EXCEPTION_STACK_ORDER + 1)
+#define DEBUG_STKSZ		(PAGE_SIZE << DEBUG_STACK_ORDER)
 
-#define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
-#define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
+#define IRQ_STACK_ORDER		(2 + KASAN_STACK_ORDER)
+#define IRQ_STACK_SIZE		(PAGE_SIZE << IRQ_STACK_ORDER)
+
+/* FIXME: why? */
+#define IRQ_USABLE_STACK_SIZE	(IRQ_STACK_SIZE - 64)
 
 #define DOUBLEFAULT_STACK 1
 #define NMI_STACK 2
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d3b91be..55684b1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1286,7 +1286,7 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 EXPORT_PER_CPU_SYMBOL(current_task);
 
 DEFINE_PER_CPU(char *, irq_stack_ptr) =
-	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE - 64;
+	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_USABLE_STACK_SIZE;
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 9ee4520..43023ae 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -103,9 +103,6 @@ in_irq_stack(unsigned long *stack, unsigned long *irq_stack,
 	return (stack >= irq_stack && stack < irq_stack_end);
 }
 
-static const unsigned long irq_stack_size =
-	(IRQ_STACK_SIZE - 64) / sizeof(unsigned long);
-
 enum stack_type {
 	STACK_IS_UNKNOWN,
 	STACK_IS_NORMAL,
@@ -133,7 +130,7 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
 		return STACK_IS_NORMAL;
 
 	*stack_end = irq_stack;
-	irq_stack = irq_stack - irq_stack_size;
+	irq_stack -= (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
 	if (in_irq_stack(stack, irq_stack, *stack_end))
 		return STACK_IS_IRQ;
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 1d5c794..a2a0eae 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -246,7 +246,7 @@ void __init setup_per_cpu_areas(void)
 #ifdef CONFIG_X86_64
 		per_cpu(irq_stack_ptr, cpu) =
 			per_cpu(irq_stack_union.irq_stack, cpu) +
-			IRQ_STACK_SIZE - 64;
+			IRQ_USABLE_STACK_SIZE;
 #endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
-- 
2.7.4


* [PATCH v3 07/51] x86/dumpstack: remove extra brackets around "<EOE>"
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (5 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 06/51] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 08/51] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl() Josh Poimboeuf
                   ` (43 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When dumping an exception stack, "<<EOE>>" is shown instead of "<EOE>".
print_trace_stack() already adds the brackets, so there's no need to add
them again.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 43023ae..7ea6ed0 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -199,7 +199,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 
 			bp = ops->walk_stack(task, stack, bp, ops,
 					     data, stack_end, &graph);
-			ops->stack(data, "<EOE>");
+			ops->stack(data, "EOE");
 			/*
 			 * We link to the next stack via the
 			 * second-to-last pointer (index -2 to end) in the
-- 
2.7.4


* [PATCH v3 08/51] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl()
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (6 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 07/51] x86/dumpstack: remove extra brackets around "<EOE>" Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
                   ` (42 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The percpu irq_stack_ptr variable has a 64-byte gap from the end of the
allocated irq stack area, so subtracting IRQ_STACK_SIZE from it actually
results in a value 64 bytes before the beginning of the stack.  Use
IRQ_USABLE_STACK_SIZE instead.
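
To make the arithmetic concrete (assuming 4k pages and no KASAN, so
IRQ_STACK_SIZE is 16384 bytes):

  irq_stack_ptr = irq_stack_base + IRQ_STACK_SIZE - 64
                = irq_stack_base + 16320

  irq_stack_ptr - IRQ_STACK_SIZE        = irq_stack_base - 64   (64 bytes below the allocation)
  irq_stack_ptr - IRQ_USABLE_STACK_SIZE = irq_stack_base        (the actual bottom of the stack)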

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 7ea6ed0..0fdd371 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -253,8 +253,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	preempt_disable();
 	cpu = smp_processor_id();
 
-	irq_stack_end	= (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
-	irq_stack	= (unsigned long *)(per_cpu(irq_stack_ptr, cpu) - IRQ_STACK_SIZE);
+	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
 	/*
 	 * Debugging aid: "show_stack(NULL, NULL);" prints the
-- 
2.7.4


* [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (7 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 08/51] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl() Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-14  7:26   ` Andy Lutomirski
  2016-08-12 14:28 ` [PATCH v3 10/51] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
                   ` (41 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On x86_32, when an interrupt happens from kernel space, SS and SP aren't
pushed and the existing stack is used.  So pt_regs is effectively two
words shorter, and the previous stack pointer is normally the memory
after the shortened pt_regs, aka '&regs->sp'.

But in the rare case where the interrupt hits right after the stack
pointer has been changed to point to an empty stack, like for example
when call_on_stack() is used, the address immediately after the
shortened pt_regs is no longer on the stack.  In that case, instead of
'&regs->sp', the previous stack pointer should be retrieved from the
beginning of the current stack page.

kernel_stack_pointer() wants to do that, but it forgets to dereference
the pointer.  So instead of returning a pointer to the previous stack,
it returns a pointer to the beginning of the current stack.
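
For context, the fixed function ends up looking roughly like this
(paraphrased from arch/x86/kernel/ptrace.c; only the dereferences in the
hunk below actually change):

  unsigned long kernel_stack_pointer(struct pt_regs *regs)
  {
          unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
          unsigned long sp = (unsigned long)&regs->sp;
          u32 *prev_esp;

          /* common case: the word after the shortened pt_regs is on this stack */
          if (context == (sp & ~(THREAD_SIZE - 1)))
                  return sp;

          /* otherwise, the previous stack pointer was saved at the start of the stack page */
          prev_esp = (u32 *)context;
          if (*prev_esp)
                  return (unsigned long)*prev_esp;

          return (unsigned long)regs;
  }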

Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/ptrace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 2537cfb..5b88a1b 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -173,8 +173,8 @@ unsigned long kernel_stack_pointer(struct pt_regs *regs)
 		return sp;
 
 	prev_esp = (u32 *)(context);
-	if (prev_esp)
-		return (unsigned long)prev_esp;
+	if (*prev_esp)
+		return (unsigned long)*prev_esp;
 
 	return (unsigned long)regs;
 }
-- 
2.7.4


* [PATCH v3 10/51] x86/dumpstack: add get_stack_pointer() and get_frame_pointer()
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (8 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 11/51] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
                   ` (40 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The various functions involved in dumping the stack all do similar
things with regard to getting the stack pointer and the frame pointer
based on the regs and task arguments.  Create helper functions to
do that instead.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h | 39 ++++++++++++++++++++++-----------------
 arch/x86/kernel/dumpstack.c       |  5 ++---
 arch/x86/kernel/dumpstack_32.c    | 25 ++++---------------------
 arch/x86/kernel/dumpstack_64.c    | 30 ++++--------------------------
 4 files changed, 32 insertions(+), 67 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 0944218..6f65995 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -49,37 +49,42 @@ void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
 
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
-#define get_bp(bp) asm("movl %%ebp, %0" : "=r" (bp) :)
 #else
 #define STACKSLOTS_PER_LINE 4
-#define get_bp(bp) asm("movq %%rbp, %0" : "=r" (bp) :)
 #endif
 
 #ifdef CONFIG_FRAME_POINTER
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
-	unsigned long bp;
-
 	if (regs)
-		return regs->bp;
+		return (unsigned long *)regs->bp;
 
-	if (task == current) {
-		/* Grab bp right from our regs */
-		get_bp(bp);
-		return bp;
-	}
+	if (!task || task == current)
+		return __builtin_frame_address(0);
 
 	/* bp is the last reg pushed by switch_to */
-	return *(unsigned long *)task->thread.sp;
+	return (unsigned long *)*(unsigned long *)task->thread.sp;
 }
 #else
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
 	return 0;
 }
-#endif
+#endif /* CONFIG_FRAME_POINTER */
+
+static inline unsigned long *
+get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
+{
+	if (regs)
+		return (unsigned long *)kernel_stack_pointer(regs);
+
+	if (!task || task == current)
+		return __builtin_frame_address(0);
+
+	return (unsigned long *)task->thread.sp;
+}
 
 extern void
 show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
@@ -106,7 +111,7 @@ static inline unsigned long caller_frame_pointer(void)
 {
 	struct stack_frame *frame;
 
-	get_bp(frame);
+	frame = __builtin_frame_address(0);
 
 #ifdef CONFIG_FRAME_POINTER
 	frame = frame->next_frame;
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 6b3376d..68f42bb 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -185,15 +185,14 @@ show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
 	unsigned long bp = 0;
-	unsigned long stack;
 
 	/*
 	 * Stack frames below this one aren't interesting.  Don't show them
 	 * if we're printing for %current.
 	 */
 	if (!sp && (!task || task == current)) {
-		sp = &stack;
-		bp = stack_frame(current, NULL);
+		sp = get_stack_pointer(current, NULL);
+		bp = (unsigned long)get_frame_pointer(current, NULL);
 	}
 
 	show_stack_log_lvl(task, NULL, sp, bp, "");
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 0967571..358fe1c 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -46,19 +46,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	int graph = 0;
 	u32 *prev_esp;
 
-	if (!task)
-		task = current;
-
-	if (!stack) {
-		unsigned long dummy;
-
-		stack = &dummy;
-		if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
 	for (;;) {
 		void *end_stack;
@@ -95,14 +85,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *stack;
 	int i;
 
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 0fdd371..3c5dbc0 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -151,25 +151,14 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 {
 	const unsigned cpu = get_cpu();
 	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
-	unsigned long dummy;
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
 
-	if (!task)
-		task = current;
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
-	if (!stack) {
-		if (regs)
-			stack = (unsigned long *)regs->sp;
-		else if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-		else
-			stack = &dummy;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
 	/*
 	 * Print function call entries in all stacks, starting at the
 	 * current stack address. If the stacks consist of nested
@@ -256,18 +245,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
 	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
-	/*
-	 * Debugging aid: "show_stack(NULL, NULL);" prints the
-	 * back trace for this cpu:
-	 */
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
-- 
2.7.4


* [PATCH v3 11/51] x86/dumpstack: remove unnecessary stack pointer arguments
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (9 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 10/51] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 12/51] x86: move _stext marker to before head code Josh Poimboeuf
                   ` (39 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When calling show_stack_log_lvl() or dump_trace() with a regs argument,
providing a stack pointer or frame pointer is redundant.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c    | 2 +-
 arch/x86/kernel/dumpstack_32.c | 2 +-
 arch/x86/kernel/dumpstack_64.c | 5 +----
 arch/x86/oprofile/backtrace.c  | 4 +---
 4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 68f42bb..692eecae 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -200,7 +200,7 @@ void show_stack(struct task_struct *task, unsigned long *sp)
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(current, regs, (unsigned long *)regs->sp, regs->bp, "");
+	show_stack_log_lvl(current, regs, NULL, 0, "");
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 358fe1c..c533b8b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -122,7 +122,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		pr_emerg("Stack:\n");
-		show_stack_log_lvl(NULL, regs, &regs->sp, 0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_EMERG);
 
 		pr_emerg("Code:");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 3c5dbc0..491f2fd 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -283,9 +283,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_regs(struct pt_regs *regs)
 {
 	int i;
-	unsigned long sp;
 
-	sp = regs->sp;
 	show_regs_print_info(KERN_DEFAULT);
 	__show_regs(regs, 1);
 
@@ -300,8 +298,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_DEFAULT "Stack:\n");
-		show_stack_log_lvl(NULL, regs, (unsigned long *)sp,
-				   0, KERN_DEFAULT);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_DEFAULT);
 
 		printk(KERN_DEFAULT "Code: ");
 
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index cb31a44..c594768 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,10 +113,8 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
-		unsigned long stack = kernel_stack_pointer(regs);
 		if (depth)
-			dump_trace(NULL, regs, (unsigned long *)stack, 0,
-				   &backtrace_ops, &depth);
+			dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
 		return;
 	}
 
-- 
2.7.4


* [PATCH v3 12/51] x86: move _stext marker to before head code
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (10 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 11/51] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 13/51] x86/asm/head: remove useless zeroed word Josh Poimboeuf
                   ` (38 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When core_kernel_text() is used to determine whether an address on a
task's stack trace is a kernel text address, it incorrectly returns
false for early text addresses in the head code, between the _text and
_stext markers.

Head code is text code too, so mark it as such.  This seems to match the
intent of other users of the _stext symbol, and it also seems consistent
with what other architectures are already doing.
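
For reference, the check in question looks roughly like this (paraphrased
from kernel/extable.c; details vary by kernel version):

  int core_kernel_text(unsigned long addr)
  {
          /* head code between _text and _stext currently fails this test */
          if (addr >= (unsigned long)_stext &&
              addr < (unsigned long)_etext)
                  return 1;

          if (system_state == SYSTEM_BOOTING &&
              init_kernel_text(addr))
                  return 1;

          return 0;
  }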

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/vmlinux.lds.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 9297a00..1d9b636 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -91,10 +91,10 @@ SECTIONS
 	/* Text and read-only data */
 	.text :  AT(ADDR(.text) - LOAD_OFFSET) {
 		_text = .;
+		_stext = .;
 		/* bootstrapping code */
 		HEAD_TEXT
 		. = ALIGN(8);
-		_stext = .;
 		TEXT_TEXT
 		SCHED_TEXT
 		LOCK_TEXT
-- 
2.7.4


* [PATCH v3 13/51] x86/asm/head: remove useless zeroed word
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (11 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 12/51] x86: move _stext marker to before head code Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack Josh Poimboeuf
                   ` (37 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

This zeroed word has no apparent purpose, so remove it.

Brian Gerst says:

  "FYI the word used to be the SS segment selector for the LSS
  instruction, which isn't needed in 64-bit mode."

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index a212310..3621ad2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -328,7 +328,6 @@ ENDPROC(start_cpu0)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 	GLOBAL(initial_stack)
 	.quad  init_thread_union+THREAD_SIZE-8
-	.word  0
 	__FINITDATA
 
 bad_address:
-- 
2.7.4


* [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (12 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 13/51] x86/asm/head: remove useless zeroed word Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-14  7:29   ` Andy Lutomirski
  2016-08-17 20:30   ` Nilay Vaish
  2016-08-12 14:28 ` [PATCH v3 15/51] x86/asm/head: standardize the end of the stack for idle tasks Josh Poimboeuf
                   ` (36 subsequent siblings)
  50 siblings, 2 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The frame at the end of each idle task stack has a zeroed return
address.  This is inconsistent with real task stacks, which have a real
return address at that spot.  This inconsistency can be confusing for
stack unwinders.

Make it a real address by using the side effect of a call instruction to
push the instruction pointer on the stack.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 3621ad2..c90f481 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -298,8 +298,9 @@ ENTRY(start_cpu)
 	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 	 *		address given in m16:64.
 	 */
-	movq	initial_code(%rip),%rax
-	pushq	$0		# fake return address to stop unwinder
+	call	1f		# put return address on stack for unwinder
+1:	xorq	%rbp, %rbp	# clear frame pointer
+	movq	initial_code(%rip), %rax
 	pushq	$__KERNEL_CS	# set correct cs
 	pushq	%rax		# target address in negative space
 	lretq
-- 
2.7.4


* [PATCH v3 15/51] x86/asm/head: standardize the end of the stack for idle tasks
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (13 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-14  7:30   ` Andy Lutomirski
  2016-08-12 14:28 ` [PATCH v3 16/51] x86/32: put real return address on stack in entry code Josh Poimboeuf
                   ` (35 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Thanks to all the recent x86 entry code refactoring, most tasks' kernel
stacks start at the same offset right above their saved pt_regs,
regardless of which syscall was used to enter the kernel.  That creates
a nice convention which makes it straightforward to identify the end of
the stack, which can be useful for stack walking code which needs to
verify the stack is sane.

However, the boot CPU's idle "swapper" task doesn't follow that
convention.  Fix that by starting its stack at a sizeof(pt_regs) offset
from the end of the stack page.
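
As an illustration of the convention (a hypothetical helper, not part of
this patch): with pt_regs always sitting at the top of the stack, stack
walking code can derive where a sane unwind should terminate:

  /* hypothetical sketch: the expected end of any task's kernel stack */
  static inline unsigned long *expected_end_of_stack(struct task_struct *task)
  {
          return (unsigned long *)task_pt_regs(task);
  }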

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_32.S |  9 ++++++++-
 arch/x86/kernel/head_64.S | 15 +++++++--------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 5f40126..f2298e9 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -62,6 +62,8 @@
 #define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
 #endif
 
+#define SIZEOF_PTREGS 17*4
+
 /*
  * Number of possible pages in the lowmem region.
  *
@@ -704,7 +706,12 @@ ENTRY(initial_page_table)
 .data
 .balign 4
 ENTRY(initial_stack)
-	.long init_thread_union+THREAD_SIZE
+	/*
+	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
+	 * unwinder reliably detect the end of the stack.
+	 */
+	.long init_thread_union + THREAD_SIZE - SIZEOF_PTREGS - \
+	      TOP_OF_KERNEL_STACK_PADDING;
 
 __INITRODATA
 int_msg:
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c90f481..ec332e9 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -65,13 +65,8 @@ startup_64:
 	 * tables and then reload them.
 	 */
 
-	/*
-	 * Setup stack for verify_cpu(). "-8" because initial_stack is defined
-	 * this way, see below. Our best guess is a NULL ptr for stack
-	 * termination heuristics and we don't want to break anything which
-	 * might depend on it (kgdb, ...).
-	 */
-	leaq	(__end_init_task - 8)(%rip), %rsp
+	/* Set up the stack for verify_cpu(), similar to initial_stack below */
+	leaq	(__end_init_task - SIZEOF_PTREGS)(%rip), %rsp
 
 	/* Sanitize CPU configuration */
 	call verify_cpu
@@ -328,7 +323,11 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 	GLOBAL(initial_stack)
-	.quad  init_thread_union+THREAD_SIZE-8
+	/*
+	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
+	 * unwinder reliably detect the end of the stack.
+	 */
+	.quad  init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
 	__FINITDATA
 
 bad_address:
-- 
2.7.4


* [PATCH v3 16/51] x86/32: put real return address on stack in entry code
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (14 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 15/51] x86/asm/head: standardize the end of the stack for idle tasks Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-14  7:31   ` Andy Lutomirski
  2016-08-12 14:28 ` [PATCH v3 17/51] x86/smp: fix initial idle stack location on 32-bit Josh Poimboeuf
                   ` (34 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

This standardizes the stacks of 32-bit idle tasks so they're consistent
with the stacks of other tasks.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_32.S | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index f2298e9..6fc4f1d 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -290,7 +290,7 @@ num_subarch_entries = (. - subarch_entries) / 4
 ENTRY(start_cpu0)
 	movl initial_stack, %ecx
 	movl %ecx, %esp
-	jmp  *(initial_code)
+	call *(initial_code)
 ENDPROC(start_cpu0)
 #endif
 
@@ -471,8 +471,7 @@ is486:
 	xorl %eax,%eax			# Clear LDT
 	lldt %ax
 
-	pushl $0		# fake return address for unwinder
-	jmp *(initial_code)
+	call *(initial_code)
 
 #include "verify_cpu.S"
 
-- 
2.7.4


* [PATCH v3 17/51] x86/smp: fix initial idle stack location on 32-bit
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (15 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 16/51] x86/32: put real return address on stack in entry code Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 18/51] x86/entry/head/32: use local labels Josh Poimboeuf
                   ` (33 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On 32-bit, the initial idle stack calculation doesn't take into account
the TOP_OF_KERNEL_STACK_PADDING, making the stack end address
inconsistent with other tasks on 32-bit.
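
A rough sketch of the difference (illustrative only; it relies on the
helper's documented behaviour rather than quoting its exact definition):

  /* Old open-coded calculation -- no padding taken into account: */
  idle->thread.sp = (unsigned long)(((struct pt_regs *)
                        (THREAD_SIZE + task_stack_page(idle))) - 1);

  /*
   * Sketch of what task_pt_regs(idle) effectively computes on 32-bit,
   * which additionally accounts for TOP_OF_KERNEL_STACK_PADDING:
   */
  idle->thread.sp = (unsigned long)(((struct pt_regs *)
                        (task_stack_page(idle) + THREAD_SIZE
                         - TOP_OF_KERNEL_STACK_PADDING)) - 1);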

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/smpboot.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d9d3d67..1158a72 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -957,9 +957,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	int cpu0_nmi_registered = 0;
 	unsigned long timeout;
 
-	idle->thread.sp = (unsigned long) (((struct pt_regs *)
-			  (THREAD_SIZE +  task_stack_page(idle))) - 1);
-
+	idle->thread.sp = (unsigned long)task_pt_regs(idle);
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	initial_code = (unsigned long)start_secondary;
 	initial_stack  = idle->thread.sp;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 18/51] x86/entry/head/32: use local labels
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (16 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 17/51] x86/smp: fix initial idle stack location on 32-bit Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 19/51] x86/entry/32: rename 'error_code' to 'common_exception' Josh Poimboeuf
                   ` (32 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Add the local label prefix to all non-function named labels in head_32.S
and entry_32.S.  In addition to decluttering the symbol table, this makes
stack traces more sensible.  For example, the last reported function in
the idle task stack trace is now startup_32_smp() instead of is486().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/entry/entry_32.S | 57 ++++++++++++++++++++++++-----------------------
 arch/x86/kernel/head_32.S | 32 +++++++++++++-------------
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 0b56666..df4e045 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -64,7 +64,7 @@
 # define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
 # define preempt_stop(clobbers)
-# define resume_kernel		restore_all
+# define resume_kernel		.Lrestore_all
 #endif
 
 .macro TRACE_IRQS_IRET
@@ -212,7 +212,7 @@ ENTRY(ret_from_fork)
 	/* When we fork, we trace the syscall return in the child, too. */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
-	jmp     restore_all
+	jmp     .Lrestore_all
 END(ret_from_fork)
 
 ENTRY(ret_from_kernel_thread)
@@ -230,7 +230,7 @@ ENTRY(ret_from_kernel_thread)
 	 */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
-	jmp     restore_all
+	jmp     .Lrestore_all
 ENDPROC(ret_from_kernel_thread)
 
 /*
@@ -264,19 +264,19 @@ ENTRY(resume_userspace)
 	TRACE_IRQS_OFF
 	movl	%esp, %eax
 	call	prepare_exit_to_usermode
-	jmp	restore_all
+	jmp	.Lrestore_all
 END(ret_from_exception)
 
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
-need_resched:
+.Lneed_resched:
 	cmpl	$0, PER_CPU_VAR(__preempt_count)
-	jnz	restore_all
+	jnz	.Lrestore_all
 	testl	$X86_EFLAGS_IF, PT_EFLAGS(%esp)	# interrupts off (exception path) ?
-	jz	restore_all
+	jz	.Lrestore_all
 	call	preempt_schedule_irq
-	jmp	need_resched
+	jmp	.Lneed_resched
 END(resume_kernel)
 #endif
 
@@ -297,7 +297,7 @@ GLOBAL(__begin_SYSENTER_singlestep_region)
  */
 ENTRY(xen_sysenter_target)
 	addl	$5*4, %esp			/* remove xen-provided frame */
-	jmp	sysenter_past_esp
+	jmp	.Lsysenter_past_esp
 #endif
 
 /*
@@ -334,7 +334,7 @@ ENTRY(xen_sysenter_target)
  */
 ENTRY(entry_SYSENTER_32)
 	movl	TSS_sysenter_sp0(%esp), %esp
-sysenter_past_esp:
+.Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
 	pushfl				/* pt_regs->flags (except IF = 0) */
@@ -465,11 +465,11 @@ ENTRY(entry_INT80_32)
 	call	do_int80_syscall_32
 .Lsyscall_32_done:
 
-restore_all:
+.Lrestore_all:
 	TRACE_IRQS_IRET
-restore_all_notrace:
+.Lrestore_all_notrace:
 #ifdef CONFIG_X86_ESPFIX32
-	ALTERNATIVE	"jmp restore_nocheck", "", X86_BUG_ESPFIX
+	ALTERNATIVE	"jmp .Lrestore_nocheck", "", X86_BUG_ESPFIX
 
 	movl	PT_EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
 	/*
@@ -481,22 +481,23 @@ restore_all_notrace:
 	movb	PT_CS(%esp), %al
 	andl	$(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
 	cmpl	$((SEGMENT_LDT << 8) | USER_RPL), %eax
-	je ldt_ss				# returning to user-space with LDT SS
+	je .Lldt_ss				# returning to user-space with LDT SS
 #endif
-restore_nocheck:
+.Lrestore_nocheck:
 	RESTORE_REGS 4				# skip orig_eax/error_code
-irq_return:
+.Lirq_return:
 	INTERRUPT_RETURN
+
 .section .fixup, "ax"
 ENTRY(iret_exc	)
 	pushl	$0				# no error code
 	pushl	$do_iret_error
 	jmp	error_code
 .previous
-	_ASM_EXTABLE(irq_return, iret_exc)
+	_ASM_EXTABLE(.Lirq_return, iret_exc)
 
 #ifdef CONFIG_X86_ESPFIX32
-ldt_ss:
+.Lldt_ss:
 /*
  * Setup and switch to ESPFIX stack
  *
@@ -525,7 +526,7 @@ ldt_ss:
 	 */
 	DISABLE_INTERRUPTS(CLBR_EAX)
 	lss	(%esp), %esp			/* switch to espfix segment */
-	jmp	restore_nocheck
+	jmp	.Lrestore_nocheck
 #endif
 ENDPROC(entry_INT80_32)
 
@@ -845,7 +846,7 @@ ftrace_call:
 	popl	%edx
 	popl	%ecx
 	popl	%eax
-ftrace_ret:
+.Lftrace_ret:
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 .globl ftrace_graph_call
 ftrace_graph_call:
@@ -915,7 +916,7 @@ GLOBAL(ftrace_regs_call)
 	popl	%gs
 	addl	$8, %esp			/* Skip orig_ax and ip */
 	popf					/* Pop flags at end (no addl to corrupt flags) */
-	jmp	ftrace_ret
+	jmp	.Lftrace_ret
 
 	popf
 	jmp	ftrace_stub
@@ -926,7 +927,7 @@ ENTRY(mcount)
 	jb	ftrace_stub			/* Paging not enabled yet? */
 
 	cmpl	$ftrace_stub, ftrace_trace_function
-	jnz	trace
+	jnz	.Ltrace
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	cmpl	$ftrace_stub, ftrace_graph_return
 	jnz	ftrace_graph_caller
@@ -939,7 +940,7 @@ ftrace_stub:
 	ret
 
 	/* taken from glibc */
-trace:
+.Ltrace:
 	pushl	%eax
 	pushl	%ecx
 	pushl	%edx
@@ -1078,7 +1079,7 @@ ENTRY(nmi)
 	movl	%ss, %eax
 	cmpw	$__ESPFIX_SS, %ax
 	popl	%eax
-	je	nmi_espfix_stack
+	je	.Lnmi_espfix_stack
 #endif
 
 	pushl	%eax				# pt_regs->orig_ax
@@ -1094,7 +1095,7 @@ ENTRY(nmi)
 
 	/* Not on SYSENTER stack. */
 	call	do_nmi
-	jmp	restore_all_notrace
+	jmp	.Lrestore_all_notrace
 
 .Lnmi_from_sysenter_stack:
 	/*
@@ -1105,10 +1106,10 @@ ENTRY(nmi)
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	call	do_nmi
 	movl	%ebp, %esp
-	jmp	restore_all_notrace
+	jmp	.Lrestore_all_notrace
 
 #ifdef CONFIG_X86_ESPFIX32
-nmi_espfix_stack:
+.Lnmi_espfix_stack:
 	/*
 	 * create the pointer to lss back
 	 */
@@ -1126,7 +1127,7 @@ nmi_espfix_stack:
 	call	do_nmi
 	RESTORE_REGS
 	lss	12+4(%esp), %esp		# back to espfix stack
-	jmp	irq_return
+	jmp	.Lirq_return
 #endif
 END(nmi)
 
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 6fc4f1d..53202d7 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -249,19 +249,19 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
 #ifdef CONFIG_PARAVIRT
 	/* This is can only trip for a broken bootloader... */
 	cmpw $0x207, pa(boot_params + BP_version)
-	jb default_entry
+	jb .Ldefault_entry
 
 	/* Paravirt-compatible boot parameters.  Look to see what architecture
 		we're booting under. */
 	movl pa(boot_params + BP_hardware_subarch), %eax
 	cmpl $num_subarch_entries, %eax
-	jae bad_subarch
+	jae .Lbad_subarch
 
 	movl pa(subarch_entries)(,%eax,4), %eax
 	subl $__PAGE_OFFSET, %eax
 	jmp *%eax
 
-bad_subarch:
+.Lbad_subarch:
 WEAK(lguest_entry)
 WEAK(xen_entry)
 	/* Unknown implementation; there's really
@@ -271,14 +271,14 @@ WEAK(xen_entry)
 	__INITDATA
 
 subarch_entries:
-	.long default_entry		/* normal x86/PC */
+	.long .Ldefault_entry		/* normal x86/PC */
 	.long lguest_entry		/* lguest hypervisor */
 	.long xen_entry			/* Xen hypervisor */
-	.long default_entry		/* Moorestown MID */
+	.long .Ldefault_entry		/* Moorestown MID */
 num_subarch_entries = (. - subarch_entries) / 4
 .previous
 #else
-	jmp default_entry
+	jmp .Ldefault_entry
 #endif /* CONFIG_PARAVIRT */
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -318,7 +318,7 @@ ENTRY(startup_32_smp)
 	call load_ucode_ap
 #endif
 
-default_entry:
+.Ldefault_entry:
 #define CR0_STATE	(X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
 			 X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
 			 X86_CR0_PG)
@@ -348,7 +348,7 @@ default_entry:
 	pushfl
 	popl %eax			# get EFLAGS
 	testl $X86_EFLAGS_ID,%eax	# did EFLAGS.ID remained set?
-	jz enable_paging		# hw disallowed setting of ID bit
+	jz .Lenable_paging		# hw disallowed setting of ID bit
 					# which means no CPUID and no CR4
 
 	xorl %eax,%eax
@@ -358,13 +358,13 @@ default_entry:
 	movl $1,%eax
 	cpuid
 	andl $~1,%edx			# Ignore CPUID.FPU
-	jz enable_paging		# No flags or only CPUID.FPU = no CR4
+	jz .Lenable_paging		# No flags or only CPUID.FPU = no CR4
 
 	movl pa(mmu_cr4_features),%eax
 	movl %eax,%cr4
 
 	testb $X86_CR4_PAE, %al		# check if PAE is enabled
-	jz enable_paging
+	jz .Lenable_paging
 
 	/* Check if extended functions are implemented */
 	movl $0x80000000, %eax
@@ -372,7 +372,7 @@ default_entry:
 	/* Value must be in the range 0x80000001 to 0x8000ffff */
 	subl $0x80000001, %eax
 	cmpl $(0x8000ffff-0x80000001), %eax
-	ja enable_paging
+	ja .Lenable_paging
 
 	/* Clear bogus XD_DISABLE bits */
 	call verify_cpu
@@ -381,7 +381,7 @@ default_entry:
 	cpuid
 	/* Execute Disable bit supported? */
 	btl $(X86_FEATURE_NX & 31), %edx
-	jnc enable_paging
+	jnc .Lenable_paging
 
 	/* Setup EFER (Extended Feature Enable Register) */
 	movl $MSR_EFER, %ecx
@@ -391,7 +391,7 @@ default_entry:
 	/* Make changes effective */
 	wrmsr
 
-enable_paging:
+.Lenable_paging:
 
 /*
  * Enable paging
@@ -420,7 +420,7 @@ enable_paging:
  */
 	movb $4,X86			# at least 486
 	cmpl $-1,X86_CPUID
-	je is486
+	je .Lis486
 
 	/* get vendor info */
 	xorl %eax,%eax			# call CPUID with 0 -> return vendor ID
@@ -431,7 +431,7 @@ enable_paging:
 	movl %ecx,X86_VENDOR_ID+8	# last 4 chars
 
 	orl %eax,%eax			# do we have processor info as well?
-	je is486
+	je .Lis486
 
 	movl $1,%eax		# Use the CPUID instruction to get CPU type
 	cpuid
@@ -445,7 +445,7 @@ enable_paging:
 	movb %cl,X86_MASK
 	movl %edx,X86_CAPABILITY
 
-is486:
+.Lis486:
 	movl $0x50022,%ecx	# set AM, WP, NE and MP
 	movl %cr0,%eax
 	andl $0x80000011,%eax	# Save PG,PE,ET
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 19/51] x86/entry/32: rename 'error_code' to 'common_exception'
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (17 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 18/51] x86/entry/head/32: use local labels Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-14  7:40   ` Andy Lutomirski
  2016-08-12 14:28 ` [PATCH v3 20/51] perf/x86: check perf_callchain_store() error Josh Poimboeuf
                   ` (31 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The 'error_code' label is awkwardly named, especially when it shows up
in a stack trace.  Move it into its own function and rename it to
'common_exception', analogous to the existing 'common_interrupt'.

This also makes related stack traces more sensible.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/entry/entry_32.S | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df4e045..4396278 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -492,7 +492,7 @@ ENTRY(entry_INT80_32)
 ENTRY(iret_exc	)
 	pushl	$0				# no error code
 	pushl	$do_iret_error
-	jmp	error_code
+	jmp	common_exception
 .previous
 	_ASM_EXTABLE(.Lirq_return, iret_exc)
 
@@ -623,7 +623,7 @@ ENTRY(coprocessor_error)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_coprocessor_error
-	jmp	error_code
+	jmp	common_exception
 END(coprocessor_error)
 
 ENTRY(simd_coprocessor_error)
@@ -637,14 +637,14 @@ ENTRY(simd_coprocessor_error)
 #else
 	pushl	$do_simd_coprocessor_error
 #endif
-	jmp	error_code
+	jmp	common_exception
 END(simd_coprocessor_error)
 
 ENTRY(device_not_available)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
 	pushl	$do_device_not_available
-	jmp	error_code
+	jmp	common_exception
 END(device_not_available)
 
 #ifdef CONFIG_PARAVIRT
@@ -658,59 +658,59 @@ ENTRY(overflow)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_overflow
-	jmp	error_code
+	jmp	common_exception
 END(overflow)
 
 ENTRY(bounds)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_bounds
-	jmp	error_code
+	jmp	common_exception
 END(bounds)
 
 ENTRY(invalid_op)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_invalid_op
-	jmp	error_code
+	jmp	common_exception
 END(invalid_op)
 
 ENTRY(coprocessor_segment_overrun)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_coprocessor_segment_overrun
-	jmp	error_code
+	jmp	common_exception
 END(coprocessor_segment_overrun)
 
 ENTRY(invalid_TSS)
 	ASM_CLAC
 	pushl	$do_invalid_TSS
-	jmp	error_code
+	jmp	common_exception
 END(invalid_TSS)
 
 ENTRY(segment_not_present)
 	ASM_CLAC
 	pushl	$do_segment_not_present
-	jmp	error_code
+	jmp	common_exception
 END(segment_not_present)
 
 ENTRY(stack_segment)
 	ASM_CLAC
 	pushl	$do_stack_segment
-	jmp	error_code
+	jmp	common_exception
 END(stack_segment)
 
 ENTRY(alignment_check)
 	ASM_CLAC
 	pushl	$do_alignment_check
-	jmp	error_code
+	jmp	common_exception
 END(alignment_check)
 
 ENTRY(divide_error)
 	ASM_CLAC
 	pushl	$0				# no error code
 	pushl	$do_divide_error
-	jmp	error_code
+	jmp	common_exception
 END(divide_error)
 
 #ifdef CONFIG_X86_MCE
@@ -718,7 +718,7 @@ ENTRY(machine_check)
 	ASM_CLAC
 	pushl	$0
 	pushl	machine_check_vector
-	jmp	error_code
+	jmp	common_exception
 END(machine_check)
 #endif
 
@@ -726,7 +726,7 @@ ENTRY(spurious_interrupt_bug)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_spurious_interrupt_bug
-	jmp	error_code
+	jmp	common_exception
 END(spurious_interrupt_bug)
 
 #ifdef CONFIG_XEN
@@ -990,7 +990,7 @@ return_to_handler:
 ENTRY(trace_page_fault)
 	ASM_CLAC
 	pushl	$trace_do_page_fault
-	jmp	error_code
+	jmp	common_exception
 END(trace_page_fault)
 #endif
 
@@ -998,7 +998,10 @@ ENTRY(page_fault)
 	ASM_CLAC
 	pushl	$do_page_fault
 	ALIGN
-error_code:
+	jmp common_exception
+END(page_fault)
+
+common_exception:
 	/* the function address is in %gs's slot on the stack */
 	pushl	%fs
 	pushl	%es
@@ -1027,7 +1030,7 @@ error_code:
 	movl	%esp, %eax			# pt_regs pointer
 	call	*%edi
 	jmp	ret_from_exception
-END(page_fault)
+END(common_exception)
 
 ENTRY(debug)
 	/*
@@ -1144,14 +1147,14 @@ END(int3)
 
 ENTRY(general_protection)
 	pushl	$do_general_protection
-	jmp	error_code
+	jmp	common_exception
 END(general_protection)
 
 #ifdef CONFIG_KVM_GUEST
 ENTRY(async_page_fault)
 	ASM_CLAC
 	pushl	$do_async_page_fault
-	jmp	error_code
+	jmp	common_exception
 END(async_page_fault)
 #endif
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 20/51] perf/x86: check perf_callchain_store() error
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (18 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 19/51] x86/entry/32: rename 'error_code' to 'common_exception' Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 21/51] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
                   ` (30 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Add a check to perf_callchain_kernel() so that it returns early if the
callchain entry array is already full.
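
For context, a minimal sketch of the store helper's assumed contract
(simplified; field names are illustrative assumptions, not the actual
perf implementation): it returns 0 when the address fits and non-zero
once the entry array is full, so the early return described above is
safe:

  /* Simplified sketch -- not the real perf code; names are assumptions. */
  static inline int perf_callchain_store(struct perf_callchain_entry_ctx *ctx,
                                         u64 ip)
  {
          if (ctx->nr >= ctx->max_stack)
                  return -1;      /* entry array is full: caller should stop */

          ctx->entry->ip[ctx->entry->nr++] = ip;
          ctx->nr++;
          return 0;
  }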

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/events/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18a1acf..dcaa887 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2297,7 +2297,8 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 		return;
 	}
 
-	perf_callchain_store(entry, regs->ip);
+	if (perf_callchain_store(entry, regs->ip))
+		return;
 
 	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 21/51] oprofile/x86: add regs->ip to oprofile trace
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (19 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 20/51] perf/x86: check perf_callchain_store() error Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 22/51] proc: fix return address printk conversion specifier in /proc/<pid>/stack Josh Poimboeuf
                   ` (29 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish, Robert Richter

dump_trace() doesn't add the interrupted instruction's address to the
trace, so add it manually.

Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/oprofile/backtrace.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index c594768..d950f9e 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,8 +113,14 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
-		if (depth)
-			dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
+		if (!depth)
+			return;
+
+		oprofile_add_trace(regs->ip);
+		if (!--depth)
+			return;
+
+		dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
 		return;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 22/51] proc: fix return address printk conversion specifier in /proc/<pid>/stack
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (20 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 21/51] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 23/51] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config Josh Poimboeuf
                   ` (28 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When printing call return addresses found on a stack, /proc/<pid>/stack
can sometimes give a confusing result.  If the call instruction was the
last instruction in the function (which can happen when calling a
noreturn function), '%pS' will incorrectly display the name of the
function which happens to be next in the object code, rather than the
name of the actual calling function.

Use '%pB' instead, which was created for this exact purpose.
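
A small hypothetical example of the failure mode (function names are made
up for illustration; not from the kernel source):

  extern void die_now(void) __attribute__((noreturn));

  void caller(void)
  {
          /*
           * If this call ends up as the last instruction in caller(), the
           * saved return address points at whatever the compiler places
           * next in the text section.
           */
          die_now();
  }

  void innocent_bystander(void)
  {
          /*
           * '%pS' on that return address resolves to innocent_bystander(),
           * while '%pB' subtracts one before the symbol lookup and
           * correctly reports caller().
           */
  }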

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 fs/proc/base.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 54e2702..e9ff186 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -483,7 +483,7 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
 		save_stack_trace_tsk(task, &trace);
 
 		for (i = 0; i < trace.nr_entries; i++) {
-			seq_printf(m, "[<%pK>] %pS\n",
+			seq_printf(m, "[<%pK>] %pB\n",
 				   (void *)entries[i], (void *)entries[i]);
 		}
 		unlock_trace(task);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 23/51] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (21 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 22/51] proc: fix return address printk conversion specifier in /proc/<pid>/stack Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 24/51] ftrace: only allocate the ret_stack 'fp' field when needed Josh Poimboeuf
                   ` (27 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Make HAVE_FUNCTION_GRAPH_FP_TEST a normal define, independent of
kconfig.  This removes some config file pollution and simplifies the
checks for the fp test.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/arm64/kernel/entry-ftrace.S     | 2 +-
 arch/blackfin/kernel/ftrace-entry.S  | 4 ++--
 arch/sparc/Kconfig                   | 1 -
 arch/sparc/include/asm/ftrace.h      | 4 ++++
 arch/x86/Kconfig                     | 1 -
 arch/x86/include/asm/ftrace.h        | 1 +
 kernel/trace/Kconfig                 | 5 -----
 kernel/trace/trace_functions_graph.c | 2 +-
 8 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 0f03a8f..aef02d2 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -219,7 +219,7 @@ ENDPROC(ftrace_graph_caller)
  *
  * Run ftrace_return_to_handler() before going back to parent.
  * @fp is checked against the value passed by ftrace_graph_caller()
- * only when CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST is enabled.
+ * only when HAVE_FUNCTION_GRAPH_FP_TEST is enabled.
  */
 ENTRY(return_to_handler)
 	save_return_regs
diff --git a/arch/blackfin/kernel/ftrace-entry.S b/arch/blackfin/kernel/ftrace-entry.S
index 28d0595..3b8bdcb 100644
--- a/arch/blackfin/kernel/ftrace-entry.S
+++ b/arch/blackfin/kernel/ftrace-entry.S
@@ -169,7 +169,7 @@ ENTRY(_ftrace_graph_caller)
 	r0 = sp;	/* unsigned long *parent */
 	r1 = [sp];	/* unsigned long self_addr */
 # endif
-# ifdef CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST
+# ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	r2 = fp;	/* unsigned long frame_pointer */
 # endif
 	r0 += 16;	/* skip the 4 local regs on stack */
@@ -190,7 +190,7 @@ ENTRY(_return_to_handler)
 	[--sp] = r1;
 
 	/* get original return address */
-# ifdef CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST
+# ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	r0 = fp;	/* Blackfin is sane, so omit this */
 # endif
 	call _ftrace_return_to_handler;
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 59b0960..f5d60f1 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -56,7 +56,6 @@ config SPARC64
 	def_bool 64BIT
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_GRAPH_TRACER
-	select HAVE_FUNCTION_GRAPH_FP_TEST
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select HAVE_RCU_TABLE_FREE if SMP
diff --git a/arch/sparc/include/asm/ftrace.h b/arch/sparc/include/asm/ftrace.h
index 3192a8e..62755a3 100644
--- a/arch/sparc/include/asm/ftrace.h
+++ b/arch/sparc/include/asm/ftrace.h
@@ -9,6 +9,10 @@
 void _mcount(void);
 #endif
 
+#endif /* CONFIG_MCOUNT */
+
+#if defined(CONFIG_SPARC64) && !defined(CC_USE_FENTRY)
+#define HAVE_FUNCTION_GRAPH_FP_TEST
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c580d8c..acf85ae 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -110,7 +110,6 @@ config X86
 	select HAVE_EXIT_THREAD
 	select HAVE_FENTRY			if X86_64
 	select HAVE_FTRACE_MCOUNT_RECORD
-	select HAVE_FUNCTION_GRAPH_FP_TEST
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACER
 	select HAVE_GCC_PLUGINS
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index a4820d4..37f67cb 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -6,6 +6,7 @@
 # define MCOUNT_ADDR		((unsigned long)(__fentry__))
 #else
 # define MCOUNT_ADDR		((unsigned long)(mcount))
+# define HAVE_FUNCTION_GRAPH_FP_TEST
 #endif
 #define MCOUNT_INSN_SIZE	5 /* sizeof mcount call */
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index f4b86e8..ba33267 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -24,11 +24,6 @@ config HAVE_FUNCTION_GRAPH_TRACER
 	help
 	  See Documentation/trace/ftrace-design.txt
 
-config HAVE_FUNCTION_GRAPH_FP_TEST
-	bool
-	help
-	  See Documentation/trace/ftrace-design.txt
-
 config HAVE_DYNAMIC_FTRACE
 	bool
 	help
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 7363ccf..fc173cd 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -204,7 +204,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 		return;
 	}
 
-#if defined(CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST) && !defined(CC_USING_FENTRY)
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	/*
 	 * The arch may choose to record the frame pointer used
 	 * and check it here to make sure that it is what we expect it
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 24/51] ftrace: only allocate the ret_stack 'fp' field when needed
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (22 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 23/51] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 25/51] ftrace: add return address pointer to ftrace_ret_stack Josh Poimboeuf
                   ` (26 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

This saves some memory when HAVE_FUNCTION_GRAPH_FP_TEST isn't defined.
On x86_64 with newer versions of gcc which have -mfentry, the fp test
isn't needed, so this saves 400 bytes per task: the 50-entry ret_stack
array times the 8-byte 'fp' field that no longer gets allocated.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 include/linux/ftrace.h               | 2 ++
 kernel/trace/trace_functions_graph.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7d565af..4ad9ccc 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -795,7 +795,9 @@ struct ftrace_ret_stack {
 	unsigned long func;
 	unsigned long long calltime;
 	unsigned long long subtime;
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	unsigned long fp;
+#endif
 };
 
 /*
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index fc173cd..0e03ed0 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -171,7 +171,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 	current->ret_stack[index].func = func;
 	current->ret_stack[index].calltime = calltime;
 	current->ret_stack[index].subtime = 0;
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	current->ret_stack[index].fp = frame_pointer;
+#endif
 	*depth = current->curr_ret_stack;
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 25/51] ftrace: add return address pointer to ftrace_ret_stack
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (23 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 24/51] ftrace: only allocate the ret_stack 'fp' field when needed Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 26/51] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers Josh Poimboeuf
                   ` (25 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack.  Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.

Note that an array of 50 ftrace_ret_stack structs is allocated for every
task.  So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 Documentation/trace/ftrace-design.txt | 11 +++++++++++
 arch/arm/kernel/ftrace.c              |  2 +-
 arch/arm64/kernel/ftrace.c            |  2 +-
 arch/blackfin/kernel/ftrace.c         |  2 +-
 arch/microblaze/kernel/ftrace.c       |  2 +-
 arch/mips/kernel/ftrace.c             |  4 ++--
 arch/parisc/kernel/ftrace.c           |  2 +-
 arch/powerpc/kernel/ftrace.c          |  3 ++-
 arch/s390/kernel/ftrace.c             |  3 ++-
 arch/sh/kernel/ftrace.c               |  2 +-
 arch/sparc/kernel/ftrace.c            |  2 +-
 arch/tile/kernel/ftrace.c             |  2 +-
 arch/x86/kernel/ftrace.c              |  2 +-
 include/linux/ftrace.h                |  5 ++++-
 kernel/trace/trace_functions_graph.c  |  5 ++++-
 15 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
index dd5f916..a273dd0 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.txt
@@ -203,6 +203,17 @@ along to ftrace_push_return_trace() instead of a stub value of 0.
 
 Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer.
 
+HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+--------------------------------
+
+An arch may pass in a pointer to the return address on the stack.  This
+prevents potential stack unwinding issues where the unwinder gets out of
+sync with ret_stack and the wrong addresses are reported by
+ftrace_graph_ret_addr().
+
+Adding support for it is easy: just define the macro in asm/ftrace.h and
+pass the return address pointer as the 'retp' argument to
+ftrace_push_return_trace().
 
 HAVE_FTRACE_NMI_ENTER
 ---------------------
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 709ee1d..3f17594 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -218,7 +218,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 	}
 
 	err = ftrace_push_return_trace(old, self_addr, &trace.depth,
-				       frame_pointer);
+				       frame_pointer, NULL);
 	if (err == -EBUSY) {
 		*parent = old;
 		return;
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index ebecf9a..40ad08a 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -138,7 +138,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 		return;
 
 	err = ftrace_push_return_trace(old, self_addr, &trace.depth,
-				       frame_pointer);
+				       frame_pointer, NULL);
 	if (err == -EBUSY)
 		return;
 	else
diff --git a/arch/blackfin/kernel/ftrace.c b/arch/blackfin/kernel/ftrace.c
index 095de0f..8dad758 100644
--- a/arch/blackfin/kernel/ftrace.c
+++ b/arch/blackfin/kernel/ftrace.c
@@ -107,7 +107,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 		return;
 
 	if (ftrace_push_return_trace(*parent, self_addr, &trace.depth,
-	                             frame_pointer) == -EBUSY)
+				     frame_pointer, NULL) == -EBUSY)
 		return;
 
 	trace.func = self_addr;
diff --git a/arch/microblaze/kernel/ftrace.c b/arch/microblaze/kernel/ftrace.c
index fc7b48a..d57563c 100644
--- a/arch/microblaze/kernel/ftrace.c
+++ b/arch/microblaze/kernel/ftrace.c
@@ -63,7 +63,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
 		return;
 	}
 
-	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0);
+	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0, NULL);
 	if (err == -EBUSY) {
 		*parent = old;
 		return;
diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
index 937c54b..30a3b75 100644
--- a/arch/mips/kernel/ftrace.c
+++ b/arch/mips/kernel/ftrace.c
@@ -382,8 +382,8 @@ void prepare_ftrace_return(unsigned long *parent_ra_addr, unsigned long self_ra,
 	if (unlikely(faulted))
 		goto out;
 
-	if (ftrace_push_return_trace(old_parent_ra, self_ra, &trace.depth, fp)
-	    == -EBUSY) {
+	if (ftrace_push_return_trace(old_parent_ra, self_ra, &trace.depth, fp,
+				     NULL) == -EBUSY) {
 		*parent_ra_addr = old_parent_ra;
 		return;
 	}
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index a828a0a..5a5506a 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -48,7 +48,7 @@ static void __hot prepare_ftrace_return(unsigned long *parent,
 		return;
 
         if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-			0 ) == -EBUSY)
+				     0, NULL) == -EBUSY)
                 return;
 
 	/* activate parisc_return_to_handler() as return point */
diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index cc52d97..a95639b 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -593,7 +593,8 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip)
 	if (!ftrace_graph_entry(&trace))
 		goto out;
 
-	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0) == -EBUSY)
+	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0,
+				     NULL) == -EBUSY)
 		goto out;
 
 	parent = return_hooker;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index 0f7bfeb..60a8a4e 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -209,7 +209,8 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip)
 	/* Only trace if the calling function expects to. */
 	if (!ftrace_graph_entry(&trace))
 		goto out;
-	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0) == -EBUSY)
+	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0,
+				     NULL) == -EBUSY)
 		goto out;
 	parent = (unsigned long) return_to_handler;
 out:
diff --git a/arch/sh/kernel/ftrace.c b/arch/sh/kernel/ftrace.c
index 38993e0..95eccd4 100644
--- a/arch/sh/kernel/ftrace.c
+++ b/arch/sh/kernel/ftrace.c
@@ -382,7 +382,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
 		return;
 	}
 
-	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0);
+	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0, NULL);
 	if (err == -EBUSY) {
 		__raw_writel(old, parent);
 		return;
diff --git a/arch/sparc/kernel/ftrace.c b/arch/sparc/kernel/ftrace.c
index 0a2d2dd..6bcff69 100644
--- a/arch/sparc/kernel/ftrace.c
+++ b/arch/sparc/kernel/ftrace.c
@@ -131,7 +131,7 @@ unsigned long prepare_ftrace_return(unsigned long parent,
 		return parent + 8UL;
 
 	if (ftrace_push_return_trace(parent, self_addr, &trace.depth,
-				     frame_pointer) == -EBUSY)
+				     frame_pointer, NULL) == -EBUSY)
 		return parent + 8UL;
 
 	trace.func = self_addr;
diff --git a/arch/tile/kernel/ftrace.c b/arch/tile/kernel/ftrace.c
index 4a57208..b827a41 100644
--- a/arch/tile/kernel/ftrace.c
+++ b/arch/tile/kernel/ftrace.c
@@ -184,7 +184,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 	*parent = return_hooker;
 
 	err = ftrace_push_return_trace(old, self_addr, &trace.depth,
-				       frame_pointer);
+				       frame_pointer, NULL);
 	if (err == -EBUSY) {
 		*parent = old;
 		return;
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index d036cfb..ae3b1fb 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -1029,7 +1029,7 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 	}
 
 	if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-		    frame_pointer) == -EBUSY) {
+				     frame_pointer, NULL) == -EBUSY) {
 		*parent = old;
 		return;
 	}
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 4ad9ccc..483e02a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -798,6 +798,9 @@ struct ftrace_ret_stack {
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	unsigned long fp;
 #endif
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+	unsigned long *retp;
+#endif
 };
 
 /*
@@ -809,7 +812,7 @@ extern void return_to_handler(void);
 
 extern int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
-			 unsigned long frame_pointer);
+			 unsigned long frame_pointer, unsigned long *retp);
 
 /*
  * Sometimes we don't want to trace a function with the function
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 0e03ed0..f7212ec 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -119,7 +119,7 @@ print_graph_duration(struct trace_array *tr, unsigned long long duration,
 /* Add a function return address to the trace stack on thread info.*/
 int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
-			 unsigned long frame_pointer)
+			 unsigned long frame_pointer, unsigned long *retp)
 {
 	unsigned long long calltime;
 	int index;
@@ -174,6 +174,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	current->ret_stack[index].fp = frame_pointer;
 #endif
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+	current->ret_stack[index].retp = retp;
+#endif
 	*depth = current->curr_ret_stack;
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 26/51] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (24 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 25/51] ftrace: add return address pointer to ftrace_ret_stack Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 27/51] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr() Josh Poimboeuf
                   ` (24 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When function graph tracing is enabled for a function, ftrace modifies
the stack by replacing the original return address with the address of a
hook function (return_to_handler).

Stack unwinders need a way to get the original return address.  Add an
arch-independent helper function for that named ftrace_graph_ret_addr().

This adds two variations of the function: one depends on
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, and the other relies on an index state
variable.

The former is recommended because, in some cases, the latter can cause
problems when the unwinder skips stack frames.  It can get out of sync
with the ret_stack index and wrong addresses can be reported for the
stack trace.

Once all arches have been ported to use
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, we can get rid of the distinction.
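
A minimal sketch of how an unwinder is expected to use the helper
(illustrative only; the real callers are converted in the following
patches):

  /* 'retp' points at a saved return address on the task's kernel stack. */
  static void report_one_frame(struct task_struct *task, int *graph_idx,
                               unsigned long *retp)
  {
          unsigned long addr = *retp;

          /*
           * If this frame was hooked by the function graph tracer, replace
           * return_to_handler with the original return address.
           */
          addr = ftrace_graph_ret_addr(task, graph_idx, addr, retp);

          printk(KERN_DEFAULT "%pB\n", (void *)addr);
  }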

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 include/linux/ftrace.h               | 10 +++++++
 kernel/trace/trace_functions_graph.c | 58 ++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 483e02a..6f93ac4 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -814,6 +814,9 @@ extern int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 			 unsigned long frame_pointer, unsigned long *retp);
 
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp);
+
 /*
  * Sometimes we don't want to trace a function with the function
  * graph tracer but we want them to keep traced by the usual function
@@ -875,6 +878,13 @@ static inline int task_curr_ret_stack(struct task_struct *tsk)
 	return -1;
 }
 
+static inline unsigned long
+ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
+		      unsigned long *retp)
+{
+	return ret;
+}
+
 static inline void pause_graph_tracing(void) { }
 static inline void unpause_graph_tracing(void) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index f7212ec..0cbe38a 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -284,6 +284,64 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	return ret;
 }
 
+/**
+ * ftrace_graph_ret_addr - convert a potentially modified stack return address
+ *			   to its original value
+ *
+ * This function can be called by stack unwinding code to convert a found stack
+ * return address ('ret') to its original value, in case the function graph
+ * tracer has modified it to be 'return_to_handler'.  If the address hasn't
+ * been modified, the unchanged value of 'ret' is returned.
+ *
+ * 'idx' is a state variable which should be initialized by the caller to zero
+ * before the first call.
+ *
+ * 'retp' is a pointer to the return address on the stack.  It's ignored if
+ * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
+ */
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp)
+{
+	int index = task->curr_ret_stack;
+	int i;
+
+	if (ret != (unsigned long)return_to_handler)
+		return ret;
+
+	if (index < -1)
+		index += FTRACE_NOTRACE_DEPTH;
+
+	if (index < 0)
+		return ret;
+
+	for (i = 0; i <= index; i++)
+		if (task->ret_stack[i].retp == retp)
+			return task->ret_stack[i].ret;
+
+	return ret;
+}
+#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp)
+{
+	int task_idx;
+
+	if (ret != (unsigned long)return_to_handler)
+		return ret;
+
+	task_idx = task->curr_ret_stack;
+
+	if (!task->ret_stack || task_idx < *idx)
+		return ret;
+
+	task_idx -= *idx;
+	(*idx)++;
+
+	return task->ret_stack[task_idx].ret;
+}
+#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+
 int __trace_graph_entry(struct trace_array *tr,
 				struct ftrace_graph_ent *trace,
 				unsigned long flags,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 27/51] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr()
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (25 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 26/51] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 28/51] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Josh Poimboeuf
                   ` (23 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert print_context_stack() and print_context_stack_bp() to use the
arch-independent ftrace_graph_ret_addr() helper.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 65 +++++++++++++++------------------------------
 1 file changed, 22 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 692eecae..b374d85 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -39,38 +39,6 @@ void printk_address(unsigned long address)
 	pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
 }
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static void
-print_ftrace_graph_addr(unsigned long addr, void *data,
-			const struct stacktrace_ops *ops,
-			struct task_struct *task, int *graph)
-{
-	unsigned long ret_addr;
-	int index;
-
-	if (addr != (unsigned long)return_to_handler)
-		return;
-
-	index = task->curr_ret_stack;
-
-	if (!task->ret_stack || index < *graph)
-		return;
-
-	index -= *graph;
-	ret_addr = task->ret_stack[index].ret;
-
-	ops->address(data, ret_addr, 1);
-
-	(*graph)++;
-}
-#else
-static inline void
-print_ftrace_graph_addr(unsigned long addr, void *data,
-			const struct stacktrace_ops *ops,
-			struct task_struct *task, int *graph)
-{ }
-#endif
-
 /*
  * x86-64 can have up to three kernel stacks:
  * process stack
@@ -108,18 +76,24 @@ print_context_stack(struct task_struct *task,
 		stack = (unsigned long *)task_stack_page(task);
 
 	while (valid_stack_ptr(task, stack, sizeof(*stack), end)) {
-		unsigned long addr;
+		unsigned long addr = *stack;
 
-		addr = *stack;
 		if (__kernel_text_address(addr)) {
+			unsigned long real_addr;
+			int reliable = 0;
+
 			if ((unsigned long) stack == bp + sizeof(long)) {
-				ops->address(data, addr, 1);
+				reliable = 1;
 				frame = frame->next_frame;
 				bp = (unsigned long) frame;
-			} else {
-				ops->address(data, addr, 0);
 			}
-			print_ftrace_graph_addr(addr, data, ops, task, graph);
+
+			ops->address(data, addr, reliable);
+
+			real_addr = ftrace_graph_ret_addr(task, graph, addr,
+							  stack);
+			if (real_addr != addr)
+				ops->address(data, real_addr, 1);
 		}
 		stack++;
 	}
@@ -134,19 +108,24 @@ print_context_stack_bp(struct task_struct *task,
 		       unsigned long *end, int *graph)
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
-	unsigned long *ret_addr = &frame->return_address;
+	unsigned long *retp = &frame->return_address;
 
-	while (valid_stack_ptr(task, ret_addr, sizeof(*ret_addr), end)) {
-		unsigned long addr = *ret_addr;
+	while (valid_stack_ptr(task, retp, sizeof(*retp), end)) {
+		unsigned long addr = *retp;
+		unsigned long real_addr;
 
 		if (!__kernel_text_address(addr))
 			break;
 
 		if (ops->address(data, addr, 1))
 			break;
+
+		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
+		if (real_addr != addr)
+			ops->address(data, real_addr, 1);
+
 		frame = frame->next_frame;
-		ret_addr = &frame->return_address;
-		print_ftrace_graph_addr(addr, data, ops, task, graph);
+		retp = &frame->return_address;
 	}
 
 	return (unsigned long)frame;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 28/51] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (26 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 27/51] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr() Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 29/51] x86/dumpstack/ftrace: mark function graph handler function as unreliable Josh Poimboeuf
                   ` (22 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

This allows the use of the more reliable version of
ftrace_graph_ret_addr() so we no longer have to worry about the unwinder
getting out of sync with the function graph ret_stack index, which can
happen if the unwinder skips any frames before calling
ftrace_graph_ret_addr().

This fixes this issue (and several others like it):

  Before:
  $ cat /proc/self/stack
  [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
  [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110
  [<ffffffff813127c4>] proc_single_show+0x54/0x80
  [<ffffffff812be088>] seq_read+0x108/0x3e0
  [<ffffffff812923d7>] __vfs_read+0x37/0x140
  [<ffffffff812929d9>] vfs_read+0x99/0x140
  [<ffffffff81293f28>] SyS_read+0x58/0xc0
  [<ffffffff818af97c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
  [<ffffffffffffffff>] 0xffffffffffffffff

  After:
  $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
  $ cat /proc/self/stack
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff810394cc>] print_context_stack+0xfc/0x100
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff8103891b>] dump_trace+0x12b/0x350
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff813127c4>] proc_single_show+0x54/0x80
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff812be088>] seq_read+0x108/0x3e0
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff812923d7>] __vfs_read+0x37/0x140
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff812929d9>] vfs_read+0x99/0x140
  [<ffffffffffffffff>] 0xffffffffffffffff

In the 'After' case, enabling function graph tracing caused the reported
stack trace to change: it is offset by two frames because the unwinder
started with an earlier stack frame and got out of sync with the
ret_stack index.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/ftrace.h | 2 ++
 arch/x86/kernel/ftrace.c      | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 37f67cb..eccd0ac 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -14,6 +14,8 @@
 #define ARCH_SUPPORTS_FTRACE_OPS 1
 #endif
 
+#define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+
 #ifndef __ASSEMBLY__
 extern void mcount(void);
 extern atomic_t modifying_ftrace_code;
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ae3b1fb..8639bb2 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -1029,7 +1029,7 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 	}
 
 	if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-				     frame_pointer, NULL) == -EBUSY) {
+				     frame_pointer, parent) == -EBUSY) {
 		*parent = old;
 		return;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 29/51] x86/dumpstack/ftrace: mark function graph handler function as unreliable
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (27 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 28/51] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 30/51] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp() Josh Poimboeuf
                   ` (21 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When function graph tracing is enabled for a function, its return
address on the stack is replaced with the address of an ftrace handler
(return_to_handler).

Currently 'return_to_handler' can be reported as reliable.  That's not
ideal, and can actually be misleading.  When saving or dumping the
stack, you normally only care about what led up to that point (the call
path), rather than what will happen in the future (the return path).

That's especially true in the non-oops stack trace case, which isn't
used for debugging.  For example, in a perf profiling operation,
reporting return_to_handler() in the trace would just be confusing.

And in the oops case, where debugging is important, "unreliable" is
more appropriate because it serves as a hint that graph tracing was
involved, rather than implying that return_to_handler() was the real
caller.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index b374d85..33f2899 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -88,12 +88,21 @@ print_context_stack(struct task_struct *task,
 				bp = (unsigned long) frame;
 			}
 
-			ops->address(data, addr, reliable);
-
+			/*
+			 * When function graph tracing is enabled for a
+			 * function, its return address on the stack is
+			 * replaced with the address of an ftrace handler
+			 * (return_to_handler).  In that case, before printing
+			 * the "real" address, we want to print the handler
+			 * address as an "unreliable" hint that function graph
+			 * tracing was involved.
+			 */
 			real_addr = ftrace_graph_ret_addr(task, graph, addr,
 							  stack);
 			if (real_addr != addr)
-				ops->address(data, real_addr, 1);
+				ops->address(data, addr, 0);
+
+			ops->address(data, real_addr, reliable);
 		}
 		stack++;
 	}
@@ -117,12 +126,11 @@ print_context_stack_bp(struct task_struct *task,
 		if (!__kernel_text_address(addr))
 			break;
 
-		if (ops->address(data, addr, 1))
-			break;
-
 		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
-		if (real_addr != addr)
-			ops->address(data, real_addr, 1);
+		if (real_addr != addr && ops->address(data, addr, 0))
+			break;
+		if (ops->address(data, real_addr, 1))
+			break;
 
 		frame = frame->next_frame;
 		retp = &frame->return_address;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 30/51] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp()
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (28 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 29/51] x86/dumpstack/ftrace: mark function graph handler function as unreliable Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 31/51] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
                   ` (20 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When function graph tracing is enabled, print_context_stack_bp() can
report return_to_handler() as an unreliable address, which is confusing
and misleading: return_to_handler() is really only useful as a hint for
debugging, whereas print_context_stack_bp() users only care about the
actual 'reliable' call path.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 33f2899..c6c6c39 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -127,8 +127,6 @@ print_context_stack_bp(struct task_struct *task,
 			break;
 
 		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
-		if (real_addr != addr && ops->address(data, addr, 0))
-			break;
 		if (ops->address(data, real_addr, 1))
 			break;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 31/51] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (29 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 30/51] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp() Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-14  7:45   ` Andy Lutomirski
  2016-08-12 14:28 ` [PATCH v3 32/51] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
                   ` (19 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

show_stack_log_lvl() and dump_trace() are already preemption safe:

- If they're running in interrupt context, preemption is already
  disabled and the percpu irq stack pointers can be trusted.

- If they're running with preemption enabled, they must be running on
  the task stack anyway, so it doesn't matter if they're comparing the
  stack pointer against the percpu irq stack pointer from this CPU or
  another one: either way it won't match.
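
To illustrate the second point on 64-bit, the bounds check boils down to
something like the sketch below (hypothetical helper, not from this
patch; irq_stack_ptr and IRQ_USABLE_STACK_SIZE are the same symbols used
in the diff).  A pointer into the task stack falls outside every CPU's
irq stack, so it doesn't matter whose bounds we happen to read:

	static bool on_this_cpu_irq_stack(unsigned long *sp)
	{
		unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
		unsigned long *begin = end - (IRQ_USABLE_STACK_SIZE / sizeof(long));

		return sp >= begin && sp < end;
	}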

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_32.c | 14 ++++++--------
 arch/x86/kernel/dumpstack_64.c | 26 +++++++++-----------------
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index c533b8b..b07d5c9 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -24,16 +24,16 @@ static void *is_irq_stack(void *p, void *irq)
 }
 
 
-static void *is_hardirq_stack(unsigned long *stack, int cpu)
+static void *is_hardirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(hardirq_stack, cpu);
+	void *irq = this_cpu_read(hardirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
 
-static void *is_softirq_stack(unsigned long *stack, int cpu)
+static void *is_softirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(softirq_stack, cpu);
+	void *irq = this_cpu_read(softirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
@@ -42,7 +42,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
 	int graph = 0;
 	u32 *prev_esp;
 
@@ -53,9 +52,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	for (;;) {
 		void *end_stack;
 
-		end_stack = is_hardirq_stack(stack, cpu);
+		end_stack = is_hardirq_stack(stack);
 		if (!end_stack)
-			end_stack = is_softirq_stack(stack, cpu);
+			end_stack = is_softirq_stack(stack);
 
 		bp = ops->walk_stack(task, stack, bp, ops, data,
 				     end_stack, &graph);
@@ -74,7 +73,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			break;
 		touch_nmi_watchdog();
 	}
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 491f2fd..f1b843a 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -31,8 +31,8 @@ static char x86_stack_ids[][8] = {
 #endif
 };
 
-static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
-					 unsigned *usedp, char **idp)
+static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
+					 char **idp)
 {
 	unsigned k;
 
@@ -41,7 +41,7 @@ static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
 	 * 'stack' is in one of them:
 	 */
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		unsigned long end = per_cpu(orig_ist, cpu).ist[k];
+		unsigned long end = raw_cpu_ptr(&orig_ist)->ist[k];
 		/*
 		 * Is 'stack' above this exception frame's end?
 		 * If yes then skip to the next frame.
@@ -111,7 +111,7 @@ enum stack_type {
 };
 
 static enum stack_type
-analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
+analyze_stack(struct task_struct *task, unsigned long *stack,
 	      unsigned long **stack_end, unsigned long *irq_stack,
 	      unsigned *used, char **id)
 {
@@ -121,8 +121,7 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
 	if ((unsigned long)task_stack_page(task) == addr)
 		return STACK_IS_NORMAL;
 
-	*stack_end = in_exception_stack(cpu, (unsigned long)stack,
-					used, id);
+	*stack_end = in_exception_stack((unsigned long)stack, used, id);
 	if (*stack_end)
 		return STACK_IS_EXCEPTION;
 
@@ -149,8 +148,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
-	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
+	unsigned long *irq_stack = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
@@ -169,8 +167,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		enum stack_type stype;
 		char *id;
 
-		stype = analyze_stack(cpu, task, stack, &stack_end,
-				      irq_stack, &used, &id);
+		stype = analyze_stack(task, stack, &stack_end, irq_stack, &used,
+				      &id);
 
 		/* Default finish unless specified to continue */
 		done = 1;
@@ -225,7 +223,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * This handles the process stack:
 	 */
 	bp = ops->walk_stack(task, stack, bp, ops, data, NULL, &graph);
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
@@ -236,13 +233,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *irq_stack_end;
 	unsigned long *irq_stack;
 	unsigned long *stack;
-	int cpu;
 	int i;
 
-	preempt_disable();
-	cpu = smp_processor_id();
-
-	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
 	sp = sp ? : get_stack_pointer(task, regs);
@@ -274,7 +267,6 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		stack++;
 		touch_nmi_watchdog();
 	}
-	preempt_enable();
 
 	pr_cont("\n");
 	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 32/51] x86/dumpstack: simplify in_exception_stack()
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (30 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 31/51] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-14  7:48   ` Andy Lutomirski
  2016-08-12 14:28 ` [PATCH v3 33/51] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
                   ` (18 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

in_exception_stack() does some bad, bad things just so the unwinder can
print different values for different areas of the debug exception stack.

There's no need to clarify where exactly on the stack it is.  Just print
"#DB" and be done with it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c | 89 ++++++++++++------------------------------
 1 file changed, 26 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f1b843a..69f6ba2 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -16,83 +16,46 @@
 
 #include <asm/stacktrace.h>
 
+static char *exception_stack_names[N_EXCEPTION_STACKS] = {
+		[ DOUBLEFAULT_STACK-1	]	= "#DF",
+		[ NMI_STACK-1		]	= "NMI",
+		[ DEBUG_STACK-1		]	= "#DB",
+		[ MCE_STACK-1		]	= "#MC",
+};
 
-#define N_EXCEPTION_STACKS_END \
-		(N_EXCEPTION_STACKS + DEBUG_STKSZ/EXCEPTION_STKSZ - 2)
-
-static char x86_stack_ids[][8] = {
-		[ DEBUG_STACK-1			]	= "#DB",
-		[ NMI_STACK-1			]	= "NMI",
-		[ DOUBLEFAULT_STACK-1		]	= "#DF",
-		[ MCE_STACK-1			]	= "#MC",
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
-		[ N_EXCEPTION_STACKS ...
-		  N_EXCEPTION_STACKS_END	]	= "#DB[?]"
-#endif
+static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
+	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
+	[DEBUG_STACK - 1]			= DEBUG_STKSZ
 };
 
 static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
 					 char **idp)
 {
+	unsigned long begin, end;
 	unsigned k;
 
-	/*
-	 * Iterate over all exception stacks, and figure out whether
-	 * 'stack' is in one of them:
-	 */
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		unsigned long end = raw_cpu_ptr(&orig_ist)->ist[k];
-		/*
-		 * Is 'stack' above this exception frame's end?
-		 * If yes then skip to the next frame.
-		 */
-		if (stack >= end)
+		end   = raw_cpu_ptr(&orig_ist)->ist[k];
+		begin = end - exception_stack_sizes[k];
+
+		if (stack < begin || stack >= end)
 			continue;
+
 		/*
-		 * Is 'stack' above this exception frame's start address?
-		 * If yes then we found the right frame.
-		 */
-		if (stack >= end - EXCEPTION_STKSZ) {
-			/*
-			 * Make sure we only iterate through an exception
-			 * stack once. If it comes up for the second time
-			 * then there's something wrong going on - just
-			 * break out and return NULL:
-			 */
-			if (*usedp & (1U << k))
-				break;
-			*usedp |= 1U << k;
-			*idp = x86_stack_ids[k];
-			return (unsigned long *)end;
-		}
-		/*
-		 * If this is a debug stack, and if it has a larger size than
-		 * the usual exception stacks, then 'stack' might still
-		 * be within the lower portion of the debug stack:
+		 * Make sure we only iterate through an exception stack once.
+		 * If it comes up for the second time then there's something
+		 * wrong going on - just break and return NULL:
 		 */
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
-		if (k == DEBUG_STACK - 1 && stack >= end - DEBUG_STKSZ) {
-			unsigned j = N_EXCEPTION_STACKS - 1;
+		if (*usedp & (1U << k))
+			break;
+		*usedp |= 1U << k;
 
-			/*
-			 * Black magic. A large debug stack is composed of
-			 * multiple exception stack entries, which we
-			 * iterate through now. Dont look:
-			 */
-			do {
-				++j;
-				end -= EXCEPTION_STKSZ;
-				x86_stack_ids[j][4] = '1' +
-						(j - N_EXCEPTION_STACKS);
-			} while (stack < end - EXCEPTION_STKSZ);
-			if (*usedp & (1U << j))
-				break;
-			*usedp |= 1U << j;
-			*idp = x86_stack_ids[j];
-			return (unsigned long *)end;
-		}
-#endif
+		*idp = exception_stack_names[k];
+		return (unsigned long *)end;
 	}
+
 	return NULL;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 33/51] x86/dumpstack: add get_stack_info() interface
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (31 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 32/51] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 34/51] x86/dumpstack: add recursion checking for all stacks Josh Poimboeuf
                   ` (17 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

valid_stack_ptr() is buggy: it assumes that all stacks are of size
THREAD_SIZE, which is not true for exception stacks.  So the
walk_stack() callbacks will need to know the location of the beginning
of the stack as well as the end.

Another issue is that in general the various features of a stack (type,
size, next stack pointer, description string) are scattered around in
various places throughout the stack dump code.

Encapsulate all that information in a single place with a new stack_info
struct and a get_stack_info() interface.
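
With this in place, a stack walker can hop from one stack to the next
with something like the following (hypothetical caller, not part of this
patch):

	struct stack_info info;
	unsigned long visit_mask = 0;

	while (!get_stack_info(stack, task, &info, &visit_mask)) {
		/* walk the words in [info.begin, info.end) ... */

		/* then follow the link to the next stack, if any */
		stack = info.next_sp;
		if (!stack)
			break;
	}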

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/events/core.c            |   2 +-
 arch/x86/include/asm/stacktrace.h |  41 +++++++++-
 arch/x86/kernel/dumpstack.c       |  40 ++++-----
 arch/x86/kernel/dumpstack_32.c    | 106 ++++++++++++++++++------
 arch/x86/kernel/dumpstack_64.c    | 165 ++++++++++++++++++++------------------
 arch/x86/kernel/stacktrace.c      |   2 +-
 arch/x86/oprofile/backtrace.c     |   2 +-
 7 files changed, 231 insertions(+), 127 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index dcaa887..dd3a1dc 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2271,7 +2271,7 @@ void arch_perf_update_userpage(struct perf_event *event,
  * callchain support
  */
 
-static int backtrace_stack(void *data, char *name)
+static int backtrace_stack(void *data, const char *name)
 {
 	return 0;
 }
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 6f65995..be9273c 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -9,6 +9,39 @@
 #include <linux/uaccess.h>
 #include <linux/ptrace.h>
 
+enum stack_type {
+	STACK_TYPE_UNKNOWN,
+	STACK_TYPE_TASK,
+	STACK_TYPE_IRQ,
+	STACK_TYPE_SOFTIRQ,
+	STACK_TYPE_EXCEPTION,
+	STACK_TYPE_EXCEPTION_LAST = STACK_TYPE_EXCEPTION + N_EXCEPTION_STACKS-1,
+};
+
+struct stack_info {
+	enum stack_type type;
+	unsigned long *begin, *end, *next_sp;
+};
+
+bool in_task_stack(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info);
+
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info, unsigned long *visit_mask);
+
+void stack_type_str(enum stack_type type, const char **begin,
+		    const char **end);
+
+static inline bool on_stack(struct stack_info *info, void *addr, size_t len)
+{
+	void *begin = info->begin;
+	void *end   = info->end;
+
+	return (info->type != STACK_TYPE_UNKNOWN &&
+		addr >= begin && addr < end &&
+		addr + len > begin && addr + len <= end);
+}
+
 extern int kstack_depth_to_print;
 
 struct thread_info;
@@ -19,27 +52,27 @@ typedef unsigned long (*walk_stack_t)(struct task_struct *task,
 				      unsigned long bp,
 				      const struct stacktrace_ops *ops,
 				      void *data,
-				      unsigned long *end,
+				      struct stack_info *info,
 				      int *graph);
 
 extern unsigned long
 print_context_stack(struct task_struct *task,
 		    unsigned long *stack, unsigned long bp,
 		    const struct stacktrace_ops *ops, void *data,
-		    unsigned long *end, int *graph);
+		    struct stack_info *info, int *graph);
 
 extern unsigned long
 print_context_stack_bp(struct task_struct *task,
 		       unsigned long *stack, unsigned long bp,
 		       const struct stacktrace_ops *ops, void *data,
-		       unsigned long *end, int *graph);
+		       struct stack_info *info, int *graph);
 
 /* Generic stack tracer with callbacks */
 
 struct stacktrace_ops {
 	int (*address)(void *data, unsigned long address, int reliable);
 	/* On negative return stop dumping */
-	int (*stack)(void *data, char *name);
+	int (*stack)(void *data, const char *name);
 	walk_stack_t	walk_stack;
 };
 
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index c6c6c39..aa208e5 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -25,6 +25,23 @@ unsigned int code_bytes = 64;
 int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
 static int die_counter;
 
+bool in_task_stack(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info)
+{
+	unsigned long *begin = task_stack_page(task);
+	unsigned long *end   = task_stack_page(task) + THREAD_SIZE;
+
+	if (stack < begin || stack >= end)
+		return false;
+
+	info->type	= STACK_TYPE_TASK;
+	info->begin	= begin;
+	info->end	= end;
+	info->next_sp	= NULL;
+
+	return true;
+}
+
 static void printk_stack_address(unsigned long address, int reliable,
 				 char *log_lvl)
 {
@@ -46,24 +63,11 @@ void printk_address(unsigned long address)
  * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
  */
 
-static inline int valid_stack_ptr(struct task_struct *task,
-			void *p, unsigned int size, void *end)
-{
-	void *t = task_stack_page(task);
-	if (end) {
-		if (p < end && p >= (end-THREAD_SIZE))
-			return 1;
-		else
-			return 0;
-	}
-	return p >= t && p < t + THREAD_SIZE - size;
-}
-
 unsigned long
 print_context_stack(struct task_struct *task,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data,
-		unsigned long *end, int *graph)
+		struct stack_info *info, int *graph)
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
 
@@ -75,7 +79,7 @@ print_context_stack(struct task_struct *task,
 	    PAGE_SIZE)
 		stack = (unsigned long *)task_stack_page(task);
 
-	while (valid_stack_ptr(task, stack, sizeof(*stack), end)) {
+	while (on_stack(info, stack, sizeof(*stack))) {
 		unsigned long addr = *stack;
 
 		if (__kernel_text_address(addr)) {
@@ -114,12 +118,12 @@ unsigned long
 print_context_stack_bp(struct task_struct *task,
 		       unsigned long *stack, unsigned long bp,
 		       const struct stacktrace_ops *ops, void *data,
-		       unsigned long *end, int *graph)
+		       struct stack_info *info, int *graph)
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
 	unsigned long *retp = &frame->return_address;
 
-	while (valid_stack_ptr(task, retp, sizeof(*retp), end)) {
+	while (on_stack(info, stack, sizeof(*stack) * 2)) {
 		unsigned long addr = *retp;
 		unsigned long real_addr;
 
@@ -138,7 +142,7 @@ print_context_stack_bp(struct task_struct *task,
 }
 EXPORT_SYMBOL_GPL(print_context_stack_bp);
 
-static int print_trace_stack(void *data, char *name)
+static int print_trace_stack(void *data, const char *name)
 {
 	printk("%s <%s> ", (char *)data, name);
 	return 0;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index b07d5c9..51a113b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -16,61 +16,117 @@
 
 #include <asm/stacktrace.h>
 
-static void *is_irq_stack(void *p, void *irq)
+void stack_type_str(enum stack_type type, const char **begin, const char **end)
 {
-	if (p < irq || p >= (irq + THREAD_SIZE))
-		return NULL;
-	return irq + THREAD_SIZE;
+	switch (type) {
+	case STACK_TYPE_IRQ:
+	case STACK_TYPE_SOFTIRQ:
+		*begin = "IRQ";
+		*end   = "EOI";
+		break;
+	default:
+		*begin = NULL;
+		*end   = NULL;
+	}
 }
 
+static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
+{
+	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack);
+	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
+
+	if (stack < begin || stack >= end)
+		return false;
+
+	info->type	= STACK_TYPE_IRQ;
+	info->begin	= begin;
+	info->end	= end;
+
+	/*
+	 * See irq_32.c -- the next stack pointer is stored at the beginning of
+	 * the stack.
+	 */
+	info->next_sp	= (unsigned long *)*begin;
+
+	return true;
+}
 
-static void *is_hardirq_stack(unsigned long *stack)
+static bool in_softirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	void *irq = this_cpu_read(hardirq_stack);
+	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack);
+	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
+
+	if (stack < begin || stack >= end)
+		return false;
+
+	info->type	= STACK_TYPE_SOFTIRQ;
+	info->begin	= begin;
+	info->end	= end;
+
+	/*
+	 * See irq_32.c -- the next stack pointer is stored at the beginning of
+	 * the stack.
+	 */
+	info->next_sp	= (unsigned long *)*begin;
 
-	return is_irq_stack(stack, irq);
+	return true;
 }
 
-static void *is_softirq_stack(unsigned long *stack);
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info, unsigned long *visit_mask)
 {
-	void *irq = this_cpu_read(softirq_stack);
+	if (!stack)
+		goto unknown;
 
-	return is_irq_stack(stack, irq);
+	task = task ? : current;
+
+	if (in_task_stack(stack, task, info))
+		return 0;
+
+	if (task != current)
+		goto unknown;
+
+	if (in_hardirq_stack(stack, info))
+		return 0;
+
+	if (in_softirq_stack(stack, info))
+		return 0;
+
+unknown:
+	info->type = STACK_TYPE_UNKNOWN;
+	return -EINVAL;
 }
 
 void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
+	unsigned long visit_mask = 0;
 	int graph = 0;
-	u32 *prev_esp;
 
 	task = task ? : current;
 	stack = stack ? : get_stack_pointer(task, regs);
 	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
 	for (;;) {
-		void *end_stack;
+		const char *begin_str, *end_str;
+		struct stack_info info;
 
-		end_stack = is_hardirq_stack(stack);
-		if (!end_stack)
-			end_stack = is_softirq_stack(stack);
+		if (get_stack_info(stack, task, &info, &visit_mask))
+			break;
 
-		bp = ops->walk_stack(task, stack, bp, ops, data,
-				     end_stack, &graph);
+		stack_type_str(info.type, &begin_str, &end_str);
 
-		/* Stop if not on irq stack */
-		if (!end_stack)
+		if (begin_str && ops->stack(data, begin_str) < 0)
 			break;
 
-		/* The previous esp is saved on the bottom of the stack */
-		prev_esp = (u32 *)(end_stack - THREAD_SIZE);
-		stack = (unsigned long *)*prev_esp;
-		if (!stack)
-			break;
+		bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
 
-		if (ops->stack(data, "IRQ") < 0)
+		if (end_str && ops->stack(data, end_str) < 0)
 			break;
+
+		stack = info.next_sp;
+
 		touch_nmi_watchdog();
 	}
 }
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 69f6ba2..2e8c750 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -28,17 +28,38 @@ static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	[DEBUG_STACK - 1]			= DEBUG_STKSZ
 };
 
-static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
-					 char **idp)
+void stack_type_str(enum stack_type type, const char **begin, const char **end)
 {
-	unsigned long begin, end;
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+
+	switch (type) {
+	case STACK_TYPE_IRQ:
+		*begin = "IRQ";
+		*end   = "EOI";
+		break;
+	case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
+		*begin = exception_stack_names[type - STACK_TYPE_EXCEPTION];
+		*end   = "EOE";
+		break;
+	default:
+		*begin = NULL;
+		*end   = NULL;
+	}
+}
+
+static bool in_exception_stack(unsigned long *stack, struct stack_info *info,
+			       unsigned long *visit_mask)
+{
+	unsigned long *begin, *end;
+	struct pt_regs *regs;
 	unsigned k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
 
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		end   = raw_cpu_ptr(&orig_ist)->ist[k];
-		begin = end - exception_stack_sizes[k];
+		end   = (unsigned long *)raw_cpu_ptr(&orig_ist)->ist[k];
+		begin = end - (exception_stack_sizes[k] / sizeof(long));
+		regs  = (struct pt_regs *)end - 1;
 
 		if (stack < begin || stack >= end)
 			continue;
@@ -48,56 +69,67 @@ static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
 		 * If it comes up for the second time then there's something
 		 * wrong going on - just break and return NULL:
 		 */
-		if (*usedp & (1U << k))
+		if (*visit_mask & (1U << k))
 			break;
-		*usedp |= 1U << k;
+		*visit_mask |= 1U << k;
 
-		*idp = exception_stack_names[k];
-		return (unsigned long *)end;
+		info->type	= STACK_TYPE_EXCEPTION + k;
+		info->begin	= begin;
+		info->end	= end;
+		info->next_sp	= (unsigned long *)regs->sp;
+
+		return true;
 	}
 
-	return NULL;
+	return false;
 }
 
-static inline int
-in_irq_stack(unsigned long *stack, unsigned long *irq_stack,
-	     unsigned long *irq_stack_end)
+static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
-	return (stack >= irq_stack && stack < irq_stack_end);
-}
+	unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
+	unsigned long *begin = end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
-enum stack_type {
-	STACK_IS_UNKNOWN,
-	STACK_IS_NORMAL,
-	STACK_IS_EXCEPTION,
-	STACK_IS_IRQ,
-};
+	if (stack < begin || stack >= end)
+		return false;
+
+	info->type	= STACK_TYPE_IRQ;
+	info->begin	= begin;
+	info->end	= end;
+
+	/*
+	 * The next stack pointer is the first thing pushed by the entry code
+	 * after switching to the irq stack.
+	 */
+	info->next_sp = (unsigned long *)*(end - 1);
+
+	return true;
+}
 
-static enum stack_type
-analyze_stack(struct task_struct *task, unsigned long *stack,
-	      unsigned long **stack_end, unsigned long *irq_stack,
-	      unsigned *used, char **id)
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info, unsigned long *visit_mask)
 {
-	unsigned long addr;
+	if (!stack)
+		goto unknown;
 
-	addr = ((unsigned long)stack & (~(THREAD_SIZE - 1)));
-	if ((unsigned long)task_stack_page(task) == addr)
-		return STACK_IS_NORMAL;
+	task = task ? : current;
+
+	if (in_task_stack(stack, task, info))
+		return 0;
 
-	*stack_end = in_exception_stack((unsigned long)stack, used, id);
-	if (*stack_end)
-		return STACK_IS_EXCEPTION;
+	if (task != current)
+		goto unknown;
 
-	if (!irq_stack)
-		return STACK_IS_NORMAL;
+	if (in_exception_stack(stack, info, visit_mask))
+		return 0;
 
-	*stack_end = irq_stack;
-	irq_stack -= (IRQ_USABLE_STACK_SIZE / sizeof(long));
+	if (in_irq_stack(stack, info))
+		return 0;
 
-	if (in_irq_stack(stack, irq_stack, *stack_end))
-		return STACK_IS_IRQ;
+	return 0;
 
-	return STACK_IS_UNKNOWN;
+unknown:
+	info->type = STACK_TYPE_UNKNOWN;
+	return -EINVAL;
 }
 
 /*
@@ -111,8 +143,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	unsigned long *irq_stack = (unsigned long *)this_cpu_read(irq_stack_ptr);
-	unsigned used = 0;
+	unsigned long visit_mask = 0;
+	struct stack_info info;
 	int graph = 0;
 	int done = 0;
 
@@ -126,57 +158,37 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * exceptions
 	 */
 	while (!done) {
-		unsigned long *stack_end;
-		enum stack_type stype;
-		char *id;
+		const char *begin_str, *end_str;
 
-		stype = analyze_stack(task, stack, &stack_end, irq_stack, &used,
-				      &id);
+		get_stack_info(stack, task, &info, &visit_mask);
 
 		/* Default finish unless specified to continue */
 		done = 1;
 
-		switch (stype) {
+		switch (info.type) {
 
 		/* Break out early if we are on the thread stack */
-		case STACK_IS_NORMAL:
+		case STACK_TYPE_TASK:
 			break;
 
-		case STACK_IS_EXCEPTION:
+		case STACK_TYPE_IRQ:
+		case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
+
+			stack_type_str(info.type, &begin_str, &end_str);
 
-			if (ops->stack(data, id) < 0)
+			if (ops->stack(data, begin_str) < 0)
 				break;
 
 			bp = ops->walk_stack(task, stack, bp, ops,
-					     data, stack_end, &graph);
-			ops->stack(data, "EOE");
-			/*
-			 * We link to the next stack via the
-			 * second-to-last pointer (index -2 to end) in the
-			 * exception stack:
-			 */
-			stack = (unsigned long *) stack_end[-2];
-			done = 0;
-			break;
+					     data, &info, &graph);
 
-		case STACK_IS_IRQ:
+			ops->stack(data, end_str);
 
-			if (ops->stack(data, "IRQ") < 0)
-				break;
-			bp = ops->walk_stack(task, stack, bp,
-				     ops, data, stack_end, &graph);
-			/*
-			 * We link to the next stack (which would be
-			 * the process stack normally) the last
-			 * pointer (index -1 to end) in the IRQ stack:
-			 */
-			stack = (unsigned long *) (stack_end[-1]);
-			irq_stack = NULL;
-			ops->stack(data, "EOI");
+			stack = info.next_sp;
 			done = 0;
 			break;
 
-		case STACK_IS_UNKNOWN:
+		default:
 			ops->stack(data, "UNK");
 			break;
 		}
@@ -185,7 +197,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	/*
 	 * This handles the process stack:
 	 */
-	bp = ops->walk_stack(task, stack, bp, ops, data, NULL, &graph);
+	bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
 }
 EXPORT_SYMBOL(dump_trace);
 
@@ -193,8 +205,7 @@ void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		   unsigned long *sp, unsigned long bp, char *log_lvl)
 {
-	unsigned long *irq_stack_end;
-	unsigned long *irq_stack;
+	unsigned long *irq_stack, *irq_stack_end;
 	unsigned long *stack;
 	int i;
 
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 4738f5e..785aef1 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -9,7 +9,7 @@
 #include <linux/uaccess.h>
 #include <asm/stacktrace.h>
 
-static int save_stack_stack(void *data, char *name)
+static int save_stack_stack(void *data, const char *name)
 {
 	return 0;
 }
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index d950f9e..7539148 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -17,7 +17,7 @@
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
 
-static int backtrace_stack(void *data, char *name)
+static int backtrace_stack(void *data, const char *name)
 {
 	/* Yes, we want all stacks */
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 34/51] x86/dumpstack: add recursion checking for all stacks
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (32 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 33/51] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 35/51] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
                   ` (16 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

in_exception_stack() has some recursion checking which makes sure the
stack trace code never traverses a given exception stack more than once.
Otherwise corruption could cause a stack to point to itself (directly or
indirectly), resulting in an infinite loop.

Extend the recursion checking to all stacks.
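
As a made-up example of the failure mode this guards against: if a
corrupted next_sp on the irq stack pointed back into the irq stack
itself, the walk now terminates instead of looping forever:

	unsigned long visit_mask = 0;
	struct stack_info info;

	/* first visit: sets the STACK_TYPE_IRQ bit in visit_mask */
	get_stack_info(sp_on_irq_stack, current, &info, &visit_mask);

	/*
	 * second visit of the same stack type: the bit is already set,
	 * so this returns -EINVAL with info.type = STACK_TYPE_UNKNOWN
	 */
	get_stack_info(corrupted_next_sp, current, &info, &visit_mask);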

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_32.c | 22 +++++++++++++++++++---
 arch/x86/kernel/dumpstack_64.c | 34 +++++++++++++++++++---------------
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 51a113b..37d9c30 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -81,16 +81,32 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
 	task = task ? : current;
 
 	if (in_task_stack(stack, task, info))
-		return 0;
+		goto recursion_check;
 
 	if (task != current)
 		goto unknown;
 
 	if (in_hardirq_stack(stack, info))
-		return 0;
+		goto recursion_check;
 
 	if (in_softirq_stack(stack, info))
-		return 0;
+		goto recursion_check;
+
+	goto unknown;
+
+recursion_check:
+	/*
+	 * Make sure we don't iterate through any given stack more than once.
+	 * If it comes up a second time then there's something wrong going on:
+	 * just break out and report an unknown stack type.
+	 */
+	if (visit_mask) {
+		if (*visit_mask & (1UL << info->type))
+			goto unknown;
+		*visit_mask |= 1UL << info->type;
+	}
+
+	return 0;
 
 unknown:
 	info->type = STACK_TYPE_UNKNOWN;
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 2e8c750..2292292 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -47,8 +47,7 @@ void stack_type_str(enum stack_type type, const char **begin, const char **end)
 	}
 }
 
-static bool in_exception_stack(unsigned long *stack, struct stack_info *info,
-			       unsigned long *visit_mask)
+static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
 	unsigned long *begin, *end;
 	struct pt_regs *regs;
@@ -64,15 +63,6 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info,
 		if (stack < begin || stack >= end)
 			continue;
 
-		/*
-		 * Make sure we only iterate through an exception stack once.
-		 * If it comes up for the second time then there's something
-		 * wrong going on - just break and return NULL:
-		 */
-		if (*visit_mask & (1U << k))
-			break;
-		*visit_mask |= 1U << k;
-
 		info->type	= STACK_TYPE_EXCEPTION + k;
 		info->begin	= begin;
 		info->end	= end;
@@ -114,16 +104,30 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
 	task = task ? : current;
 
 	if (in_task_stack(stack, task, info))
-		return 0;
+		goto recursion_check;
 
 	if (task != current)
 		goto unknown;
 
-	if (in_exception_stack(stack, info, visit_mask))
-		return 0;
+	if (in_exception_stack(stack, info))
+		goto recursion_check;
 
 	if (in_irq_stack(stack, info))
-		return 0;
+		goto recursion_check;
+
+	goto unknown;
+
+recursion_check:
+	/*
+	 * Make sure we don't iterate through any given stack more than once.
+	 * If it comes up a second time then there's something wrong going on:
+	 * just break out and report an unknown stack type.
+	 */
+	if (visit_mask) {
+		if (*visit_mask & (1UL << info->type))
+			goto unknown;
+		*visit_mask |= 1UL << info->type;
+	}
 
 	return 0;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 35/51] x86/unwind: add new unwind interface and implementations
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (33 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 34/51] x86/dumpstack: add recursion checking for all stacks Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-15 21:43   ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 36/51] perf/x86: convert perf_callchain_kernel() to use the new unwinder Josh Poimboeuf
                   ` (15 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The x86 stack dump code is a bit of a mess.  dump_trace() uses
callbacks, and each user of it seems to have slightly different
requirements, so there are several slightly different callbacks floating
around.

Also there are some upcoming features which will require more changes to
the stack dump code: reliable stack detection for live patching,
hardened user copy, and the DWARF unwinder.  Each of those features
would at least need more callbacks and/or callback interfaces, resulting
in a much bigger mess than what we have today.

Before doing all that, we should try to clean things up and replace
dump_trace() with something cleaner and more flexible.

The new unwinder is a simple state machine which was heavily inspired by
a suggestion from Andy Lutomirski:

  https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com

It's also very similar to the libunwind API:

  http://www.nongnu.org/libunwind/man/libunwind(3).html

Some of its advantages:

- Simplicity: no more callback sprawl and less code duplication.

- Flexibility: it allows the caller to stop and inspect the stack state
  at each step in the unwinding process.

- Modularity: the unwinder code, console stack dump code, and stack
  metadata analysis code are all better separated so that changing one
  of them shouldn't have much of an impact on any of the others.

Two implementations are added which conform to the new unwind interface:

- The frame pointer unwinder which is used for CONFIG_FRAME_POINTER=y.

- The "guess" unwinder which is used for CONFIG_FRAME_POINTER=n.  This
  isn't an "unwinder" per se.  All it does is scan the stack for kernel
  text addresses.  But with no frame pointers, guesses are better than
  nothing in most cases.
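
A typical caller then ends up looking something like this (the later
patches in this series convert the real callers along these lines;
'task' and 'regs' are whatever the caller already has):

	struct unwind_state state;
	unsigned long addr;

	for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
	     unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
		if (!addr)
			break;
		/* consume addr ... */
	}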

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/unwind.h  | 93 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/Makefile       |  6 +++
 arch/x86/kernel/unwind_frame.c | 93 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/unwind_guess.c | 42 +++++++++++++++++++
 4 files changed, 234 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind.h
 create mode 100644 arch/x86/kernel/unwind_frame.c
 create mode 100644 arch/x86/kernel/unwind_guess.c

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
new file mode 100644
index 0000000..6dcb44b
--- /dev/null
+++ b/arch/x86/include/asm/unwind.h
@@ -0,0 +1,93 @@
+#ifndef _ASM_X86_UNWIND_H
+#define _ASM_X86_UNWIND_H
+
+#include <linux/sched.h>
+#include <linux/ftrace.h>
+#include <asm/ptrace.h>
+#include <asm/stacktrace.h>
+
+struct unwind_state {
+	struct stack_info stack_info;
+	unsigned long stack_mask;
+	struct task_struct *task;
+	int graph_idx;
+#ifdef CONFIG_FRAME_POINTER
+	unsigned long *bp;
+#else
+	unsigned long *sp;
+#endif
+};
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *sp);
+
+bool unwind_next_frame(struct unwind_state *state);
+
+
+#ifdef CONFIG_FRAME_POINTER
+
+static inline
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+{
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return NULL;
+
+	return state->bp + 1;
+}
+
+static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
+{
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return NULL;
+
+	return state->bp;
+}
+
+unsigned long unwind_get_return_address(struct unwind_state *state);
+
+#else /* !CONFIG_FRAME_POINTER */
+
+static inline
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+{
+	return NULL;
+}
+
+static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
+{
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return NULL;
+
+	return state->sp;
+}
+
+static inline
+unsigned long unwind_get_return_address(struct unwind_state *state)
+{
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return 0;
+
+	return ftrace_graph_ret_addr(state->task, &state->graph_idx,
+				     *state->sp, state->sp);
+}
+
+#endif /* CONFIG_FRAME_POINTER */
+
+static inline bool unwind_done(struct unwind_state *state)
+{
+	return (state->stack_info.type == STACK_TYPE_UNKNOWN);
+}
+
+static inline
+void unwind_start(struct unwind_state *state, struct task_struct *task,
+		  struct pt_regs *regs, unsigned long *first_sp)
+{
+	if (!task)
+		task = current;
+
+	first_sp = first_sp ? : get_stack_pointer(task, regs);
+
+	__unwind_start(state, task, regs, first_sp);
+}
+
+#endif /* _ASM_X86_UNWIND_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0503f5b..45257cf 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -125,6 +125,12 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 
+ifdef CONFIG_FRAME_POINTER
+obj-y					+= unwind_frame.o
+else
+obj-y					+= unwind_guess.o
+endif
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
new file mode 100644
index 0000000..00ad526
--- /dev/null
+++ b/arch/x86/kernel/unwind_frame.c
@@ -0,0 +1,93 @@
+#include <linux/sched.h>
+#include <asm/ptrace.h>
+#include <asm/bitops.h>
+#include <asm/stacktrace.h>
+#include <asm/unwind.h>
+
+#define FRAME_HEADER_SIZE (sizeof(long) * 2)
+
+unsigned long unwind_get_return_address(struct unwind_state *state)
+{
+	unsigned long addr;
+	unsigned long *addr_p = unwind_get_return_address_ptr(state);
+
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return 0;
+
+	addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
+				     addr_p);
+
+	return __kernel_text_address(addr) ? addr : 0;
+}
+EXPORT_SYMBOL_GPL(unwind_get_return_address);
+
+static bool update_stack_state(struct unwind_state *state, void *addr,
+			       size_t len)
+{
+	struct stack_info *info = &state->stack_info;
+
+	if (on_stack(info, addr, len))
+		return true;
+
+	if (get_stack_info(info->next_sp, state->task, info,
+			   &state->stack_mask))
+		goto unknown;
+
+	if (!on_stack(info, addr, len))
+		goto unknown;
+
+	return true;
+
+unknown:
+	info->type = STACK_TYPE_UNKNOWN;
+	return false;
+}
+
+bool unwind_next_frame(struct unwind_state *state)
+{
+	unsigned long *next_bp;
+
+	if (unwind_done(state))
+		return false;
+
+	next_bp = (unsigned long *)*state->bp;
+
+	/* make sure the next frame's data is accessible */
+	if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
+		return false;
+
+	/* move to the next frame */
+	state->bp = next_bp;
+	return true;
+}
+EXPORT_SYMBOL_GPL(unwind_next_frame);
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *first_sp)
+{
+	memset(state, 0, sizeof(*state));
+	state->task = task;
+
+	/* don't even attempt to start from user-mode regs */
+	if (regs && user_mode(regs))
+		return;
+
+	/* set up the first stack frame */
+	state->bp = get_frame_pointer(task, regs);
+
+	/* initialize stack info and make sure the frame data is accessible */
+	get_stack_info(state->bp, state->task, &state->stack_info,
+		       &state->stack_mask);
+	update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
+
+	/*
+	 * The caller can optionally provide a stack pointer directly
+	 * (sp) or indirectly (regs->sp), which indicates which stack
+	 * frame to start unwinding at.  Skip ahead until we reach it.
+	 */
+	while (!unwind_done(state) &&
+	       (!on_stack(&state->stack_info, first_sp, sizeof(*first_sp)) ||
+		state->bp < first_sp))
+		unwind_next_frame(state);
+}
+EXPORT_SYMBOL_GPL(__unwind_start);
diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
new file mode 100644
index 0000000..c9f3e8b
--- /dev/null
+++ b/arch/x86/kernel/unwind_guess.c
@@ -0,0 +1,42 @@
+#include <linux/sched.h>
+#include <linux/ftrace.h>
+#include <asm/ptrace.h>
+#include <asm/bitops.h>
+#include <asm/stacktrace.h>
+#include <asm/unwind.h>
+
+bool unwind_next_frame(struct unwind_state *state)
+{
+	struct stack_info *info = &state->stack_info;
+
+	if (info->type == STACK_TYPE_UNKNOWN)
+		return false;
+
+	do {
+		for (state->sp++; state->sp < info->end; state->sp++)
+			if (__kernel_text_address(*state->sp))
+				return true;
+
+		state->sp = info->next_sp;
+
+	} while (!get_stack_info(state->sp, state->task, info,
+				 &state->stack_mask));
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(unwind_next_frame);
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *sp)
+{
+	memset(state, 0, sizeof(*state));
+
+	state->task = task;
+	state->sp   = sp;
+
+	get_stack_info(sp, state->task, &state->stack_info, &state->stack_mask);
+
+	if (!__kernel_text_address(*sp))
+		unwind_next_frame(state);
+}
+EXPORT_SYMBOL_GPL(__unwind_start);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 36/51] perf/x86: convert perf_callchain_kernel() to use the new unwinder
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (34 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 35/51] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 37/51] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
                   ` (14 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert perf_callchain_kernel() to use the new unwinder.  dump_trace()
has been deprecated.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/events/core.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index dd3a1dc..b409e7c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -37,6 +37,7 @@
 #include <asm/timer.h>
 #include <asm/desc.h>
 #include <asm/ldt.h>
+#include <asm/unwind.h>
 
 #include "perf_event.h"
 
@@ -2267,31 +2268,12 @@ void arch_perf_update_userpage(struct perf_event *event,
 	cyc2ns_read_end(data);
 }
 
-/*
- * callchain support
- */
-
-static int backtrace_stack(void *data, const char *name)
-{
-	return 0;
-}
-
-static int backtrace_address(void *data, unsigned long addr, int reliable)
-{
-	struct perf_callchain_entry_ctx *entry = data;
-
-	return perf_callchain_store(entry, addr);
-}
-
-static const struct stacktrace_ops backtrace_ops = {
-	.stack			= backtrace_stack,
-	.address		= backtrace_address,
-	.walk_stack		= print_context_stack_bp,
-};
-
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
+	struct unwind_state state;
+	unsigned long addr;
+
 	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
@@ -2300,7 +2282,12 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	if (perf_callchain_store(entry, regs->ip))
 		return;
 
-	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
+	for (unwind_start(&state, NULL, regs, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr || perf_callchain_store(entry, addr))
+			return;
+	}
 }
 
 static inline int
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 37/51] x86/stacktrace: convert save_stack_trace_*() to use the new unwinder
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (35 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 36/51] perf/x86: convert perf_callchain_kernel() to use the new unwinder Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 38/51] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
                   ` (13 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert save_stack_trace_*() to use the new unwinder.  dump_trace() has
been deprecated.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/stacktrace.c | 74 +++++++++++++++++---------------------------
 1 file changed, 29 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 785aef1..a168e7e 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -8,80 +8,64 @@
 #include <linux/export.h>
 #include <linux/uaccess.h>
 #include <asm/stacktrace.h>
+#include <asm/unwind.h>
 
-static int save_stack_stack(void *data, const char *name)
+static int save_stack_address(struct stack_trace *trace, unsigned long addr,
+			      bool nosched)
 {
-	return 0;
-}
-
-static int
-__save_stack_address(void *data, unsigned long addr, bool reliable, bool nosched)
-{
-	struct stack_trace *trace = data;
-#ifdef CONFIG_FRAME_POINTER
-	if (!reliable)
-		return 0;
-#endif
 	if (nosched && in_sched_functions(addr))
 		return 0;
+
 	if (trace->skip > 0) {
 		trace->skip--;
 		return 0;
 	}
-	if (trace->nr_entries < trace->max_entries) {
-		trace->entries[trace->nr_entries++] = addr;
-		return 0;
-	} else {
-		return -1; /* no more room, stop walking the stack */
-	}
-}
 
-static int save_stack_address(void *data, unsigned long addr, int reliable)
-{
-	return __save_stack_address(data, addr, reliable, false);
+	if (trace->nr_entries >= trace->max_entries)
+		return -1;
+
+	trace->entries[trace->nr_entries++] = addr;
+	return 0;
 }
 
-static int
-save_stack_address_nosched(void *data, unsigned long addr, int reliable)
+static void __save_stack_trace(struct stack_trace *trace,
+			       struct task_struct *task, struct pt_regs *regs,
+			       bool nosched)
 {
-	return __save_stack_address(data, addr, reliable, true);
-}
+	struct unwind_state state;
+	unsigned long addr;
 
-static const struct stacktrace_ops save_stack_ops = {
-	.stack		= save_stack_stack,
-	.address	= save_stack_address,
-	.walk_stack	= print_context_stack,
-};
+	if (regs)
+		save_stack_address(trace, regs->ip, nosched);
 
-static const struct stacktrace_ops save_stack_ops_nosched = {
-	.stack		= save_stack_stack,
-	.address	= save_stack_address_nosched,
-	.walk_stack	= print_context_stack,
-};
+	for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr || save_stack_address(trace, addr, nosched))
+			break;
+	}
+
+	if (trace->nr_entries < trace->max_entries)
+		trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
 
 /*
  * Save stack-backtrace addresses into a stack_trace buffer.
  */
 void save_stack_trace(struct stack_trace *trace)
 {
-	dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
-	if (trace->nr_entries < trace->max_entries)
-		trace->entries[trace->nr_entries++] = ULONG_MAX;
+	__save_stack_trace(trace, NULL, NULL, false);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace);
 
 void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
 {
-	dump_trace(current, regs, NULL, 0, &save_stack_ops, trace);
-	if (trace->nr_entries < trace->max_entries)
-		trace->entries[trace->nr_entries++] = ULONG_MAX;
+	__save_stack_trace(trace, NULL, regs, false);
 }
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
-	dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_nosched, trace);
-	if (trace->nr_entries < trace->max_entries)
-		trace->entries[trace->nr_entries++] = ULONG_MAX;
+	__save_stack_trace(trace, tsk, NULL, true);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 38/51] oprofile/x86: convert x86_backtrace() to use the new unwinder
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (36 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 37/51] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:28 ` [PATCH v3 39/51] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
                   ` (12 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish, Robert Richter

Convert oprofile's x86_backtrace() to use the new unwinder.
dump_trace() has been deprecated.

Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/oprofile/backtrace.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 7539148..f28ac1a 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -16,27 +16,7 @@
 
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
-
-static int backtrace_stack(void *data, const char *name)
-{
-	/* Yes, we want all stacks */
-	return 0;
-}
-
-static int backtrace_address(void *data, unsigned long addr, int reliable)
-{
-	unsigned int *depth = data;
-
-	if ((*depth)--)
-		oprofile_add_trace(addr);
-	return 0;
-}
-
-static struct stacktrace_ops backtrace_ops = {
-	.stack		= backtrace_stack,
-	.address	= backtrace_address,
-	.walk_stack	= print_context_stack,
-};
+#include <asm/unwind.h>
 
 #ifdef CONFIG_COMPAT
 static struct stack_frame_ia32 *
@@ -113,14 +93,29 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
+		struct unwind_state state;
+		unsigned long addr;
+
 		if (!depth)
 			return;
 
 		oprofile_add_trace(regs->ip);
+
 		if (!--depth)
 			return;
 
-		dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
+		for (unwind_start(&state, NULL, regs, NULL);
+		     !unwind_done(&state); unwind_next_frame(&state)) {
+			addr = unwind_get_return_address(&state);
+			if (!addr)
+				break;
+
+			oprofile_add_trace(addr);
+
+			if (!--depth)
+				break;
+		}
+
 		return;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 39/51] x86/dumpstack: convert show_trace_log_lvl() to use the new unwinder
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (37 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 38/51] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-14  8:13   ` Andy Lutomirski
  2016-08-12 14:28 ` [PATCH v3 40/51] x86/dumpstack: remove dump_trace() and related callbacks Josh Poimboeuf
                   ` (11 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert show_trace_log_lvl() to use the new unwinder.  dump_trace() has
been deprecated.

show_trace_log_lvl() is special compared to other users of the unwinder.
It's the only place where both reliable *and* unreliable addresses are
needed.  With frame pointers enabled, most callers of the unwinder don't
want to know about unreliable addresses.  But in this case, when we're
dumping the stack to the console because something presumably went
wrong, the unreliable addresses are useful:

- They show stale data on the stack which can provide useful clues.

- If something goes wrong with the unwinder, or if frame pointers are
  corrupt or missing, all the stack addresses still get shown.

So in order to show all addresses on the stack, and at the same time
figure out which addresses are reliable, we have to do the scanning and
the unwinding in parallel.

The scanning is done with the help of get_stack_info() to traverse the
stacks.  The unwinding is done separately by the new unwinder.

In theory we could simplify show_trace_log_lvl() by instead pushing some
of this logic into the unwind code.  But then we would need some kind of
"fake" frame logic in the unwinder which would add a lot of complexity
and wouldn't be worth it in order to support only one user.

Another benefit of this approach is that once we have a DWARF unwinder,
we should be able to just plug it in with minimal impact to this code.

Another change here is that callers of show_trace_log_lvl() don't need
to provide the 'bp' argument.  The unwinder already finds the relevant
frame pointer by unwinding until it reaches the first frame after the
provided stack pointer.
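
A rough sketch of the core loop (simplified pseudo-C; the real code below
also handles stack-type banners, guard page overflow and ftrace graph
return addresses):

  for (each stack, walked via get_stack_info()) {
  	for (; stack < stack_info.end; stack++) {
  		unsigned long addr = *stack;

  		if (!__kernel_text_address(addr))
  			continue;

  		/* reliable only if the unwinder agrees it's a return address */
  		reliable = (stack == unwind_get_return_address_ptr(&state));
  		printk_stack_address(addr, reliable, log_lvl);

  		if (reliable)
  			unwind_next_frame(&state);
  	}
  }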

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h |  10 ++-
 arch/x86/kernel/dumpstack.c       | 130 +++++++++++++++++++++++++++++---------
 arch/x86/kernel/dumpstack_32.c    |   6 +-
 arch/x86/kernel/dumpstack_64.c    |  10 +--
 4 files changed, 111 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index be9273c..0a5acde 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -119,13 +119,11 @@ get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
 	return (unsigned long *)task->thread.sp;
 }
 
-extern void
-show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *stack, unsigned long bp, char *log_lvl);
+void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *stack, char *log_lvl);
 
-extern void
-show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl);
+void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *sp, char *log_lvl);
 
 extern unsigned int code_bytes;
 
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index aa208e5..d627fe2 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -17,7 +17,7 @@
 #include <linux/sysfs.h>
 
 #include <asm/stacktrace.h>
-
+#include <asm/unwind.h>
 
 int panic_on_unrecovered_nmi;
 int panic_on_io_nmi;
@@ -142,54 +142,122 @@ print_context_stack_bp(struct task_struct *task,
 }
 EXPORT_SYMBOL_GPL(print_context_stack_bp);
 
-static int print_trace_stack(void *data, const char *name)
+void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *stack, char *log_lvl)
 {
-	printk("%s <%s> ", (char *)data, name);
-	return 0;
-}
+	struct unwind_state state;
+	struct stack_info stack_info = {0};
+	unsigned long visit_mask = 0;
+	int graph_idx = 0;
 
-/*
- * Print one address/symbol entries per line.
- */
-static int print_trace_address(void *data, unsigned long addr, int reliable)
-{
-	printk_stack_address(addr, reliable, data);
-	return 0;
-}
+	printk("%sCall Trace:\n", log_lvl);
 
-static const struct stacktrace_ops print_trace_ops = {
-	.stack			= print_trace_stack,
-	.address		= print_trace_address,
-	.walk_stack		= print_context_stack,
-};
+	stack = stack ? : get_stack_pointer(task, regs);
+	if (!task)
+		task = current;
 
-void
-show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp, char *log_lvl)
-{
-	printk("%sCall Trace:\n", log_lvl);
-	dump_trace(task, regs, stack, bp, &print_trace_ops, log_lvl);
+	unwind_start(&state, task, regs, stack);
+
+	/*
+	 * Iterate through the stacks, starting with the current stack pointer.
+	 * Each stack has a pointer to the next one.
+	 *
+	 * x86-64 can have several stacks:
+	 * - task stack
+	 * - interrupt stack
+	 * - HW exception stacks (double fault, nmi, debug, mce)
+	 *
+	 * x86-32 can have up to three stacks:
+	 * - task stack
+	 * - softirq stack
+	 * - hardirq stack
+	 */
+	for (; stack; stack = stack_info.next_sp) {
+		const char *str_begin, *str_end;
+
+		/*
+		 * If we overflowed the task stack into a guard page, jump back
+		 * to the bottom of the usable stack.
+		 */
+		if (task_stack_page(task) - (void *)stack < PAGE_SIZE)
+			stack = task_stack_page(task);
+
+		if (get_stack_info(stack, task, &stack_info, &visit_mask))
+			break;
+
+		stack_type_str(stack_info.type, &str_begin, &str_end);
+		if (str_begin)
+			printk("%s <%s> ", log_lvl, str_begin);
+
+		/*
+		 * Scan the stack, printing any text addresses we find.  At the
+		 * same time, follow proper stack frames with the unwinder.
+		 *
+		 * Addresses found during the scan which are not reported by
+		 * the unwinder are considered to be additional clues which are
+		 * sometimes useful for debugging and are prefixed with '?'.
+		 * This also serves as a failsafe option in case the unwinder
+		 * goes off in the weeds.
+		 */
+		for (; stack < stack_info.end; stack++) {
+			unsigned long real_addr;
+			int reliable = 0;
+			unsigned long addr = *stack;
+			unsigned long *ret_addr_p =
+				unwind_get_return_address_ptr(&state);
+
+			if (!__kernel_text_address(addr))
+				continue;
+
+			if (stack == ret_addr_p)
+				reliable = 1;
+
+			/*
+			 * When function graph tracing is enabled for a
+			 * function, its return address on the stack is
+			 * replaced with the address of an ftrace handler
+			 * (return_to_handler).  In that case, before printing
+			 * the "real" address, we want to print the handler
+			 * address as an "unreliable" hint that function graph
+			 * tracing was involved.
+			 */
+			real_addr = ftrace_graph_ret_addr(task, &graph_idx,
+							  addr, stack);
+			if (real_addr != addr)
+				printk_stack_address(addr, 0, log_lvl);
+			printk_stack_address(real_addr, reliable, log_lvl);
+
+			if (!reliable)
+				continue;
+
+			/*
+			 * Get the next frame from the unwinder.  No need to
+			 * check for an error: if anything goes wrong, the rest
+			 * of the addresses will just be printed as unreliable.
+			 */
+			unwind_next_frame(&state);
+		}
+
+		if (str_end)
+			printk("%s <%s> ", log_lvl, str_end);
+	}
 }
 
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
-	unsigned long bp = 0;
-
 	/*
 	 * Stack frames below this one aren't interesting.  Don't show them
 	 * if we're printing for %current.
 	 */
-	if (!sp && (!task || task == current)) {
+	if (!sp && (!task || task == current))
 		sp = get_stack_pointer(current, NULL);
-		bp = (unsigned long)get_frame_pointer(current, NULL);
-	}
 
-	show_stack_log_lvl(task, NULL, sp, bp, "");
+	show_stack_log_lvl(task, NULL, sp, "");
 }
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(current, regs, NULL, 0, "");
+	show_stack_log_lvl(NULL, regs, NULL, "");
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 37d9c30..922b722 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -150,7 +150,7 @@ EXPORT_SYMBOL(dump_trace);
 
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl)
+		   unsigned long *sp, char *log_lvl)
 {
 	unsigned long *stack;
 	int i;
@@ -170,7 +170,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		touch_nmi_watchdog();
 	}
 	pr_cont("\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 
@@ -192,7 +192,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		pr_emerg("Stack:\n");
-		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, NULL, KERN_EMERG);
 
 		pr_emerg("Code:");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 2292292..55ee1f3 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -15,6 +15,7 @@
 #include <linux/nmi.h>
 
 #include <asm/stacktrace.h>
+#include <asm/unwind.h>
 
 static char *exception_stack_names[N_EXCEPTION_STACKS] = {
 		[ DOUBLEFAULT_STACK-1	]	= "#DF",
@@ -205,9 +206,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 }
 EXPORT_SYMBOL(dump_trace);
 
-void
-show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl)
+void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *sp, char *log_lvl)
 {
 	unsigned long *irq_stack, *irq_stack_end;
 	unsigned long *stack;
@@ -247,7 +247,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	}
 
 	pr_cont("\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 void show_regs(struct pt_regs *regs)
@@ -268,7 +268,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_DEFAULT "Stack:\n");
-		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_DEFAULT);
+		show_stack_log_lvl(NULL, regs, NULL, KERN_DEFAULT);
 
 		printk(KERN_DEFAULT "Code: ");
 
-- 
2.7.4

* [PATCH v3 40/51] x86/dumpstack: remove dump_trace() and related callbacks
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (38 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 39/51] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
@ 2016-08-12 14:28 ` Josh Poimboeuf
  2016-08-12 14:29 ` [PATCH v3 41/51] x86/entry/unwind: create stack frames for saved interrupt registers Josh Poimboeuf
                   ` (10 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

All previous users of dump_trace() have been converted to use the new
unwind interfaces, so we can remove it and the related
print_context_stack() and print_context_stack_bp() callback functions.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h | 36 ----------------
 arch/x86/kernel/dumpstack.c       | 86 ---------------------------------------
 arch/x86/kernel/dumpstack_32.c    | 35 ----------------
 arch/x86/kernel/dumpstack_64.c    | 69 -------------------------------
 4 files changed, 226 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 0a5acde..43a12d6 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -44,42 +44,6 @@ static inline bool on_stack(struct stack_info *info, void *addr, size_t len)
 
 extern int kstack_depth_to_print;
 
-struct thread_info;
-struct stacktrace_ops;
-
-typedef unsigned long (*walk_stack_t)(struct task_struct *task,
-				      unsigned long *stack,
-				      unsigned long bp,
-				      const struct stacktrace_ops *ops,
-				      void *data,
-				      struct stack_info *info,
-				      int *graph);
-
-extern unsigned long
-print_context_stack(struct task_struct *task,
-		    unsigned long *stack, unsigned long bp,
-		    const struct stacktrace_ops *ops, void *data,
-		    struct stack_info *info, int *graph);
-
-extern unsigned long
-print_context_stack_bp(struct task_struct *task,
-		       unsigned long *stack, unsigned long bp,
-		       const struct stacktrace_ops *ops, void *data,
-		       struct stack_info *info, int *graph);
-
-/* Generic stack tracer with callbacks */
-
-struct stacktrace_ops {
-	int (*address)(void *data, unsigned long address, int reliable);
-	/* On negative return stop dumping */
-	int (*stack)(void *data, const char *name);
-	walk_stack_t	walk_stack;
-};
-
-void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data);
-
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
 #else
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index d627fe2..dcb718b 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -56,92 +56,6 @@ void printk_address(unsigned long address)
 	pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
 }
 
-/*
- * x86-64 can have up to three kernel stacks:
- * process stack
- * interrupt stack
- * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
- */
-
-unsigned long
-print_context_stack(struct task_struct *task,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data,
-		struct stack_info *info, int *graph)
-{
-	struct stack_frame *frame = (struct stack_frame *)bp;
-
-	/*
-	 * If we overflowed the stack into a guard page, jump back to the
-	 * bottom of the usable stack.
-	 */
-	if ((unsigned long)task_stack_page(task) - (unsigned long)stack <
-	    PAGE_SIZE)
-		stack = (unsigned long *)task_stack_page(task);
-
-	while (on_stack(info, stack, sizeof(*stack))) {
-		unsigned long addr = *stack;
-
-		if (__kernel_text_address(addr)) {
-			unsigned long real_addr;
-			int reliable = 0;
-
-			if ((unsigned long) stack == bp + sizeof(long)) {
-				reliable = 1;
-				frame = frame->next_frame;
-				bp = (unsigned long) frame;
-			}
-
-			/*
-			 * When function graph tracing is enabled for a
-			 * function, its return address on the stack is
-			 * replaced with the address of an ftrace handler
-			 * (return_to_handler).  In that case, before printing
-			 * the "real" address, we want to print the handler
-			 * address as an "unreliable" hint that function graph
-			 * tracing was involved.
-			 */
-			real_addr = ftrace_graph_ret_addr(task, graph, addr,
-							  stack);
-			if (real_addr != addr)
-				ops->address(data, addr, 0);
-
-			ops->address(data, real_addr, reliable);
-		}
-		stack++;
-	}
-	return bp;
-}
-EXPORT_SYMBOL_GPL(print_context_stack);
-
-unsigned long
-print_context_stack_bp(struct task_struct *task,
-		       unsigned long *stack, unsigned long bp,
-		       const struct stacktrace_ops *ops, void *data,
-		       struct stack_info *info, int *graph)
-{
-	struct stack_frame *frame = (struct stack_frame *)bp;
-	unsigned long *retp = &frame->return_address;
-
-	while (on_stack(info, stack, sizeof(*stack) * 2)) {
-		unsigned long addr = *retp;
-		unsigned long real_addr;
-
-		if (!__kernel_text_address(addr))
-			break;
-
-		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
-		if (ops->address(data, real_addr, 1))
-			break;
-
-		frame = frame->next_frame;
-		retp = &frame->return_address;
-	}
-
-	return (unsigned long)frame;
-}
-EXPORT_SYMBOL_GPL(print_context_stack_bp);
-
 void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *stack, char *log_lvl)
 {
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 922b722..53d939e 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -113,41 +113,6 @@ unknown:
 	return -EINVAL;
 }
 
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data)
-{
-	unsigned long visit_mask = 0;
-	int graph = 0;
-
-	task = task ? : current;
-	stack = stack ? : get_stack_pointer(task, regs);
-	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
-
-	for (;;) {
-		const char *begin_str, *end_str;
-		struct stack_info info;
-
-		if (get_stack_info(stack, task, &info, &visit_mask))
-			break;
-
-		stack_type_str(info.type, &begin_str, &end_str);
-
-		if (begin_str && ops->stack(data, begin_str) < 0)
-			break;
-
-		bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
-
-		if (end_str && ops->stack(data, end_str) < 0)
-			break;
-
-		stack = info.next_sp;
-
-		touch_nmi_watchdog();
-	}
-}
-EXPORT_SYMBOL(dump_trace);
-
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		   unsigned long *sp, char *log_lvl)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 55ee1f3..9f7a9f9 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -137,75 +137,6 @@ unknown:
 	return -EINVAL;
 }
 
-/*
- * x86-64 can have up to three kernel stacks:
- * process stack
- * interrupt stack
- * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
- */
-
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data)
-{
-	unsigned long visit_mask = 0;
-	struct stack_info info;
-	int graph = 0;
-	int done = 0;
-
-	task = task ? : current;
-	stack = stack ? : get_stack_pointer(task, regs);
-	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
-
-	/*
-	 * Print function call entries in all stacks, starting at the
-	 * current stack address. If the stacks consist of nested
-	 * exceptions
-	 */
-	while (!done) {
-		const char *begin_str, *end_str;
-
-		get_stack_info(stack, task, &info, &visit_mask);
-
-		/* Default finish unless specified to continue */
-		done = 1;
-
-		switch (info.type) {
-
-		/* Break out early if we are on the thread stack */
-		case STACK_TYPE_TASK:
-			break;
-
-		case STACK_TYPE_IRQ:
-		case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
-
-			stack_type_str(info.type, &begin_str, &end_str);
-
-			if (ops->stack(data, begin_str) < 0)
-				break;
-
-			bp = ops->walk_stack(task, stack, bp, ops,
-					     data, &info, &graph);
-
-			ops->stack(data, end_str);
-
-			stack = info.next_sp;
-			done = 0;
-			break;
-
-		default:
-			ops->stack(data, "UNK");
-			break;
-		}
-	}
-
-	/*
-	 * This handles the process stack:
-	 */
-	bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
-}
-EXPORT_SYMBOL(dump_trace);
-
 void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *sp, char *log_lvl)
 {
-- 
2.7.4

* [PATCH v3 41/51] x86/entry/unwind: create stack frames for saved interrupt registers
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (39 preceding siblings ...)
  2016-08-12 14:28 ` [PATCH v3 40/51] x86/dumpstack: remove dump_trace() and related callbacks Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-14  8:10   ` Andy Lutomirski
  2016-08-12 14:29 ` [PATCH v3 42/51] x86/unwind: create stack frames for saved syscall registers Josh Poimboeuf
                   ` (9 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

With frame pointers, when a task is interrupted, its stack is no longer
completely reliable because the function could have been interrupted
before it had a chance to save the previous frame pointer on the stack.
So the caller of the interrupted function could get skipped by a stack
trace.

This is problematic for live patching, which needs to know whether a
stack trace of a sleeping task can be relied upon.  There's currently no
way to detect if a sleeping task was interrupted by a page fault
exception or preemption before it went to sleep.

Another issue is that when dumping the stack of an interrupted task, the
unwinder has no way of knowing where the saved pt_regs registers are, so
it can't print them.

This solves those issues by encoding the pt_regs pointer in the frame
pointer on entry from an interrupt or an exception.

This patch also updates the unwinder to be able to decode it, because
otherwise the unwinder would be broken by this change.

Note that this causes a change in the behavior of the unwinder: each
instance of a pt_regs on the stack is now considered a "frame".  So
callers of unwind_get_return_address() will now get an occasional
'regs->ip' address that would have previously been skipped over.
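
To make the encoding concrete, here's a stand-alone sketch (plain userspace
C, not part of the patch, assumes a 64-bit build) of how the round trip
works; the pt_regs address is made up for illustration:

  #include <stdio.h>

  #define MSB	(1UL << (sizeof(long) * 8 - 1))

  /* clear the top bit: invalid as an address, recognizable as "regs" */
  static unsigned long encode(unsigned long regs) { return regs & ~MSB; }

  /* top bit still set means a real frame pointer; otherwise restore it */
  static unsigned long decode(unsigned long bp)
  {
  	return (bp & MSB) ? 0 : (bp | MSB);
  }

  int main(void)
  {
  	unsigned long regs = 0xffff880079c4f760UL;	/* hypothetical */

  	printf("encoded: %#lx\n", encode(regs));
  	printf("decoded: %#lx\n", decode(encode(regs)));
  	return 0;
  }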

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/entry/calling.h       | 21 +++++++++++
 arch/x86/entry/entry_32.S      | 40 ++++++++++++++++++---
 arch/x86/entry/entry_64.S      | 10 ++++--
 arch/x86/include/asm/unwind.h  | 18 ++++++++--
 arch/x86/kernel/unwind_frame.c | 82 +++++++++++++++++++++++++++++++++++++-----
 5 files changed, 153 insertions(+), 18 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 9a9e588..ab799a3 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -201,6 +201,27 @@ For 32-bit we have the following conventions - kernel is built with
 	.byte 0xf1
 	.endm
 
+	/*
+	 * This is a sneaky trick to help the unwinder find pt_regs on the
+	 * stack.  The frame pointer is replaced with an encoded pointer to
+	 * pt_regs.  The encoding is just a clearing of the highest-order bit,
+	 * which makes it an invalid address and is also a signal to the
+	 * unwinder that it's a pt_regs pointer in disguise.
+	 *
+	 * NOTE: This macro must be used *after* SAVE_EXTRA_REGS because it
+	 * corrupts the original rbp.
+	 */
+.macro ENCODE_FRAME_POINTER ptregs_offset=0
+#ifdef CONFIG_FRAME_POINTER
+	.if \ptregs_offset
+		leaq \ptregs_offset(%rsp), %rbp
+	.else
+		mov %rsp, %rbp
+	.endif
+	btr $63, %rbp
+#endif
+.endm
+
 #endif /* CONFIG_X86_64 */
 
 /*
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 4396278..4006fa3 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -174,6 +174,23 @@
 	SET_KERNEL_GS %edx
 .endm
 
+/*
+ * This is a sneaky trick to help the unwinder find pt_regs on the
+ * stack.  The frame pointer is replaced with an encoded pointer to
+ * pt_regs.  The encoding is just a clearing of the highest-order bit,
+ * which makes it an invalid address and is also a signal to the
+ * unwinder that it's a pt_regs pointer in disguise.
+ *
+ * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the
+ * original rbp.
+ */
+.macro ENCODE_FRAME_POINTER
+#ifdef CONFIG_FRAME_POINTER
+	mov %esp, %ebp
+	btr $31, %ebp
+#endif
+.endm
+
 .macro RESTORE_INT_REGS
 	popl	%ebx
 	popl	%ecx
@@ -205,10 +222,16 @@
 .endm
 
 ENTRY(ret_from_fork)
+	call	1f
+1:	push	$0
+	mov	%esp, %ebp
+
 	pushl	%eax
 	call	schedule_tail
 	popl	%eax
 
+	addl	$8, %esp
+
 	/* When we fork, we trace the syscall return in the child, too. */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
@@ -588,6 +611,7 @@ common_interrupt:
 	ASM_CLAC
 	addl	$-0x80, (%esp)			/* Adjust vector into the [-256, -1] range */
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	movl	%esp, %eax
 	call	do_IRQ
@@ -599,6 +623,7 @@ ENTRY(name)				\
 	ASM_CLAC;			\
 	pushl	$~(nr);			\
 	SAVE_ALL;			\
+	ENCODE_FRAME_POINTER;		\
 	TRACE_IRQS_OFF			\
 	movl	%esp, %eax;		\
 	call	fn;			\
@@ -733,6 +758,7 @@ END(spurious_interrupt_bug)
 ENTRY(xen_hypervisor_callback)
 	pushl	$-1				/* orig_ax = -1 => not a system call */
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 
 	/*
@@ -787,6 +813,7 @@ ENTRY(xen_failsafe_callback)
 	jmp	iret_exc
 5:	pushl	$-1				/* orig_ax = -1 => not a system call */
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	jmp	ret_from_exception
 
 .section .fixup, "ax"
@@ -1013,6 +1040,7 @@ common_exception:
 	pushl	%edx
 	pushl	%ecx
 	pushl	%ebx
+	ENCODE_FRAME_POINTER
 	cld
 	movl	$(__KERNEL_PERCPU), %ecx
 	movl	%ecx, %fs
@@ -1045,6 +1073,7 @@ ENTRY(debug)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# error code 0
 	movl	%esp, %eax			# pt_regs pointer
 
@@ -1060,11 +1089,11 @@ ENTRY(debug)
 
 .Ldebug_from_sysenter_stack:
 	/* We're on the SYSENTER stack.  Switch off. */
-	movl	%esp, %ebp
+	movl	%esp, %ebx
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	TRACE_IRQS_OFF
 	call	do_debug
-	movl	%ebp, %esp
+	movl	%ebx, %esp
 	jmp	ret_from_exception
 END(debug)
 
@@ -1087,6 +1116,7 @@ ENTRY(nmi)
 
 	pushl	%eax				# pt_regs->orig_ax
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
 
@@ -1105,10 +1135,10 @@ ENTRY(nmi)
 	 * We're on the SYSENTER stack.  Switch off.  No one (not even debug)
 	 * is using the thread stack right now, so it's safe for us to use it.
 	 */
-	movl	%esp, %ebp
+	movl	%esp, %ebx
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	call	do_nmi
-	movl	%ebp, %esp
+	movl	%ebx, %esp
 	jmp	.Lrestore_all_notrace
 
 #ifdef CONFIG_X86_ESPFIX32
@@ -1125,6 +1155,7 @@ ENTRY(nmi)
 	.endr
 	pushl	%eax
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	FIXUP_ESPFIX_STACK			# %eax == %esp
 	xorl	%edx, %edx			# zero error code
 	call	do_nmi
@@ -1138,6 +1169,7 @@ ENTRY(int3)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f6b40e5..6200318 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -434,6 +434,7 @@ END(irq_entries_start)
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
+	ENCODE_FRAME_POINTER
 
 	testb	$3, CS(%rsp)
 	jz	1f
@@ -907,6 +908,7 @@ ENTRY(xen_failsafe_callback)
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
+	ENCODE_FRAME_POINTER
 	jmp	error_exit
 END(xen_failsafe_callback)
 
@@ -950,6 +952,7 @@ ENTRY(paranoid_entry)
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
+	ENCODE_FRAME_POINTER 8
 	movl	$1, %ebx
 	movl	$MSR_GS_BASE, %ecx
 	rdmsr
@@ -997,6 +1000,7 @@ ENTRY(error_entry)
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
+	ENCODE_FRAME_POINTER 8
 	xorl	%ebx, %ebx
 	testb	$3, CS+8(%rsp)
 	jz	.Lerror_kernelspace
@@ -1179,6 +1183,7 @@ ENTRY(nmi)
 	pushq	%r13		/* pt_regs->r13 */
 	pushq	%r14		/* pt_regs->r14 */
 	pushq	%r15		/* pt_regs->r15 */
+	ENCODE_FRAME_POINTER
 
 	/*
 	 * At this point we no longer need to worry about stack damage
@@ -1192,11 +1197,10 @@ ENTRY(nmi)
 
 	/*
 	 * Return back to user mode.  We must *not* do the normal exit
-	 * work, because we don't want to enable interrupts.  Fortunately,
-	 * do_nmi doesn't modify pt_regs.
+	 * work, because we don't want to enable interrupts.
 	 */
 	SWAPGS
-	jmp	restore_c_regs_and_iret
+	jmp	restore_regs_and_iret
 
 .Lnmi_from_kernel:
 	/*
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 6dcb44b..295dbd1 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -13,6 +13,7 @@ struct unwind_state {
 	int graph_idx;
 #ifdef CONFIG_FRAME_POINTER
 	unsigned long *bp;
+	struct pt_regs *regs;
 #else
 	unsigned long *sp;
 #endif
@@ -32,7 +33,7 @@ unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
 	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
 		return NULL;
 
-	return state->bp + 1;
+	return state->regs ? &state->regs->ip : state->bp + 1;
 }
 
 static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
@@ -40,7 +41,15 @@ static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
 	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
 		return NULL;
 
-	return state->bp;
+	return state->regs ? (unsigned long *)state->regs : state->bp;
+}
+
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+{
+	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
+		return NULL;
+
+	return state->regs;
 }
 
 unsigned long unwind_get_return_address(struct unwind_state *state);
@@ -61,6 +70,11 @@ static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
 	return state->sp;
 }
 
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+{
+	return NULL;
+}
+
 static inline
 unsigned long unwind_get_return_address(struct unwind_state *state)
 {
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 00ad526..e02acec 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -14,6 +14,9 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 	if (state->stack_info.type == STACK_TYPE_UNKNOWN)
 		return 0;
 
+	if (state->regs && user_mode(state->regs))
+		return 0;
+
 	addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
 				     addr_p);
 
@@ -21,6 +24,24 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
+/*
+ * This determines if the frame pointer actually contains an encoded pointer to
+ * pt_regs on the stack.  See ENCODE_FRAME_POINTER.
+ */
+static struct pt_regs *decode_frame_pointer(unsigned long *bp)
+{
+	unsigned long regs = (unsigned long)bp;
+
+	/* if the MSB is set, it's not an encoded pointer */
+	if (regs & (1UL << (BITS_PER_LONG - 1)))
+		return NULL;
+
+	/* decode it by setting the MSB */
+	regs |= 1UL << (BITS_PER_LONG - 1);
+
+	return (struct pt_regs *)regs;
+}
+
 static bool update_stack_state(struct unwind_state *state, void *addr,
 			       size_t len)
 {
@@ -45,26 +66,59 @@ unknown:
 
 bool unwind_next_frame(struct unwind_state *state)
 {
-	unsigned long *next_bp;
+	struct pt_regs *regs;
+	unsigned long *next_bp, *next_sp;
+	size_t next_len;
 
 	if (unwind_done(state))
 		return false;
 
-	next_bp = (unsigned long *)*state->bp;
+	/* have we reached the end? */
+	if (state->regs && user_mode(state->regs))
+		goto the_end;
+
+	/* get the next frame pointer */
+	if (state->regs)
+		next_bp = (unsigned long *)state->regs->bp;
+	else
+		next_bp = (unsigned long *)*state->bp;
+
+	/* is the next frame pointer an encoded pointer to pt_regs? */
+	regs = decode_frame_pointer(next_bp);
+	if (regs) {
+		next_sp = (unsigned long *)regs;
+		next_len = sizeof(*regs);
+	} else {
+		next_sp = next_bp;
+		next_len = FRAME_HEADER_SIZE;
+	}
 
 	/* make sure the next frame's data is accessible */
-	if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
+	if (!update_stack_state(state, next_sp, next_len))
 		return false;
-
 	/* move to the next frame */
-	state->bp = next_bp;
+	if (regs) {
+		state->regs = regs;
+		state->bp = NULL;
+	} else {
+		state->bp = next_bp;
+		state->regs = NULL;
+	}
+
 	return true;
+
+the_end:
+	state->stack_info.type = STACK_TYPE_UNKNOWN;
+	return false;
 }
 EXPORT_SYMBOL_GPL(unwind_next_frame);
 
 void __unwind_start(struct unwind_state *state, struct task_struct *task,
 		    struct pt_regs *regs, unsigned long *first_sp)
 {
+	unsigned long *bp, *sp;
+	size_t len;
+
 	memset(state, 0, sizeof(*state));
 	state->task = task;
 
@@ -73,12 +127,22 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 		return;
 
 	/* set up the first stack frame */
-	state->bp = get_frame_pointer(task, regs);
+	bp = get_frame_pointer(task, regs);
+	regs = decode_frame_pointer(bp);
+	if (regs) {
+		state->regs = regs;
+		sp = (unsigned long *)regs;
+		len = sizeof(*regs);
+	}
+	else {
+		state->bp = bp;
+		sp = bp;
+		len = FRAME_HEADER_SIZE;
+	}
 
 	/* initialize stack info and make sure the frame data is accessible */
-	get_stack_info(state->bp, state->task, &state->stack_info,
-		       &state->stack_mask);
-	update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
+	get_stack_info(sp, state->task, &state->stack_info, &state->stack_mask);
+	update_stack_state(state, sp, len);
 
 	/*
 	 * The caller can optionally provide a stack pointer directly
-- 
2.7.4

* [PATCH v3 42/51] x86/unwind: create stack frames for saved syscall registers
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (40 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 41/51] x86/entry/unwind: create stack frames for saved interrupt registers Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-14  8:23   ` Andy Lutomirski
  2016-08-12 14:29 ` [PATCH v3 43/51] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
                   ` (8 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The entry code doesn't encode pt_regs for syscalls.  But they're always
at the same location, so we can add a manual check for them.
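
Roughly, this is the layout being relied on (illustrative sketch; exact
top-of-stack padding differs between configurations):

    high addresses
    +--------------------------+  <- top of task stack
    |      syscall pt_regs     |
    +--------------------------+  <- task_pt_regs(task)
    | frame header (bp / ret)  |  <- last frame:
    +--------------------------+     bp == task_pt_regs(task) - FRAME_HEADER_SIZE
    |            ...           |
    low addresses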

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index e02acec..d2480a3 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -24,6 +24,14 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
+static bool is_last_task_frame(struct unwind_state *state)
+{
+	unsigned long bp = (unsigned long)state->bp;
+	unsigned long regs = (unsigned long)task_pt_regs(state->task);
+
+	return bp == regs - FRAME_HEADER_SIZE;
+}
+
 /*
  * This determines if the frame pointer actually contains an encoded pointer to
  * pt_regs on the stack.  See ENCODE_FRAME_POINTER.
@@ -77,6 +85,33 @@ bool unwind_next_frame(struct unwind_state *state)
 	if (state->regs && user_mode(state->regs))
 		goto the_end;
 
+	if (is_last_task_frame(state)) {
+		regs = task_pt_regs(state->task);
+
+		/*
+		 * kthreads (other than the boot CPU's idle thread) have some
+		 * partial regs at the end of their stack which were placed
+		 * there by copy_thread_tls().  But the regs don't have any
+		 * useful information, so we can skip them.
+		 *
+		 * This user_mode() check is slightly broader than a PF_KTHREAD
+		 * check because it also catches the awkward situation where a
+		 * newly forked kthread transitions into a user task by calling
+		 * do_execve(), which eventually clears PF_KTHREAD.
+		 */
+		if (!user_mode(regs))
+			goto the_end;
+
+		/*
+		 * We're almost at the end, but not quite: there's still the
+		 * syscall regs frame.  Entry code doesn't encode the regs
+		 * pointer for syscalls, so we have to set it manually.
+		 */
+		state->regs = regs;
+		state->bp = NULL;
+		return true;
+	}
+
 	/* get the next frame pointer */
 	if (state->regs)
 		next_bp = (unsigned long *)state->regs->bp;
-- 
2.7.4

* [PATCH v3 43/51] x86/dumpstack: print stack identifier on its own line
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (41 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 42/51] x86/unwind: create stack frames for saved syscall registers Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-12 14:29 ` [PATCH v3 44/51] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
                   ` (7 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a
newline so that any stack address printed after it will appear on the
same line.  That causes the first stack address to be vertically
misaligned with the rest, making it visually cluttered and slightly
confusing:

  Call Trace:
   <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3
   [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
   [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
   ...
   <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
   [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
   [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0

It will look worse once we start printing pt_regs registers found in the
middle of the stack:

  <IRQ> RIP: 0010:[<ffffffff8189c6c3>]  [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60
  RSP: 0018:ffff88007876f720  EFLAGS: 00000206
  RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007
  ...

Improve readability by adding a newline to the stack name:

  Call Trace:
   <IRQ>
   [<ffffffff814431c3>] dump_stack+0x86/0xc3
   [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
   [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
   ...
   <EOI>
   [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
   [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
   [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0

Now that "continued" lines are no longer needed, we can also remove the
hack of using the empty string (aka KERN_CONT) and replace it with
KERN_DEFAULT.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index dcb718b..92a2f82 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -101,7 +101,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 
 		stack_type_str(stack_info.type, &str_begin, &str_end);
 		if (str_begin)
-			printk("%s <%s> ", log_lvl, str_begin);
+			printk("%s <%s>\n", log_lvl, str_begin);
 
 		/*
 		 * Scan the stack, printing any text addresses we find.  At the
@@ -153,7 +153,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		}
 
 		if (str_end)
-			printk("%s <%s> ", log_lvl, str_end);
+			printk("%s <%s>\n", log_lvl, str_end);
 	}
 }
 
@@ -166,12 +166,12 @@ void show_stack(struct task_struct *task, unsigned long *sp)
 	if (!sp && (!task || task == current))
 		sp = get_stack_pointer(current, NULL);
 
-	show_stack_log_lvl(task, NULL, sp, "");
+	show_stack_log_lvl(task, NULL, sp, KERN_DEFAULT);
 }
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(NULL, regs, NULL, "");
+	show_stack_log_lvl(NULL, regs, NULL, KERN_DEFAULT);
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-- 
2.7.4

* [PATCH v3 44/51] x86/dumpstack: print any pt_regs found on the stack
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (42 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 43/51] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-14  8:16   ` Andy Lutomirski
  2016-08-12 14:29 ` [PATCH v3 45/51] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
                   ` (6 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Now that we can find pt_regs registers on the stack, print them.  Here's
an example of what it looks like:

Call Trace:
 <IRQ>
 [<ffffffff8144b793>] dump_stack+0x86/0xc3
 [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0
 [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60
 [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50
 [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0
RIP: 0010:[<ffffffff818aef43>]  [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60
RSP: 0018:ffff880079c4f760  EFLAGS: 00000202
RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007
RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000
RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540
R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000
 <EOI>
 [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250
 [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250
 [<ffffffff818a7b61>] __schedule+0x3e1/0xb20
 ...
 [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0
 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
 [<ffffffff818b1dd8>] async_page_fault+0x28/0x30
RIP: 0010:[<ffffffff8145b062>]  [<ffffffff8145b062>] __clear_user+0x42/0x70
RSP: 0018:ffff880079c4fd38  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928
R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400
 [<ffffffff8145b043>] ? __clear_user+0x23/0x70
 [<ffffffff8145b0fb>] clear_user+0x2b/0x40
 [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750
 [<ffffffff8129a591>] search_binary_handler+0xa1/0x200
 [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0
 [<ffffffff8129b5f3>] ? do_execveat_common.isra.36+0x623/0x9f0
 [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50
 [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0
 [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:[<00007fd2e2f2e537>]  [<00007fd2e2f2e537>] 0x7fd2e2f2e537
RSP: 002b:00007ffc449c5fc8  EFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537
RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029
RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20
R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40
R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 92a2f82..cae70c1 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -86,7 +86,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	 * - softirq stack
 	 * - hardirq stack
 	 */
-	for (; stack; stack = stack_info.next_sp) {
+	for (regs = NULL; stack; stack = stack_info.next_sp) {
 		const char *str_begin, *str_end;
 
 		/*
@@ -123,6 +123,15 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			if (!__kernel_text_address(addr))
 				continue;
 
+			/*
+			 * Don't print regs->ip again if it was already printed
+			 * by __show_regs() below.
+			 */
+			if (regs && stack == &regs->ip) {
+				unwind_next_frame(&state);
+				continue;
+			}
+
 			if (stack == ret_addr_p)
 				reliable = 1;
 
@@ -150,6 +159,11 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			 * of the addresses will just be printed as unreliable.
 			 */
 			unwind_next_frame(&state);
+
+			/* if the frame has entry regs, print them */
+			regs = unwind_get_entry_regs(&state);
+			if (regs)
+				__show_regs(regs, 0);
 		}
 
 		if (str_end)
-- 
2.7.4

* [PATCH v3 45/51] x86: remove 64-byte gap at end of irq stack
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (43 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 44/51] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-14  7:52   ` Andy Lutomirski
  2016-08-12 14:29 ` [PATCH v3 46/51] x86/unwind: warn on kernel stack corruption Josh Poimboeuf
                   ` (5 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

There has been a 64-byte gap at the end of the irq stack for at least 12
years.  It predates git history, and I can't find any good reason for
it.  Remove it.  What's the worst that could happen?

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/page_64_types.h | 3 ---
 arch/x86/kernel/cpu/common.c         | 2 +-
 arch/x86/kernel/dumpstack_64.c       | 4 ++--
 arch/x86/kernel/setup_percpu.c       | 2 +-
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6256baf..3c0be3b 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -24,9 +24,6 @@
 #define IRQ_STACK_ORDER		(2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE		(PAGE_SIZE << IRQ_STACK_ORDER)
 
-/* FIXME: why? */
-#define IRQ_USABLE_STACK_SIZE	(IRQ_STACK_SIZE - 64)
-
 #define DOUBLEFAULT_STACK 1
 #define NMI_STACK 2
 #define DEBUG_STACK 3
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 55684b1..ce7a4c1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1286,7 +1286,7 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 EXPORT_PER_CPU_SYMBOL(current_task);
 
 DEFINE_PER_CPU(char *, irq_stack_ptr) =
-	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_USABLE_STACK_SIZE;
+	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE;
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 9f7a9f9..001a75d 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -78,7 +78,7 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
 	unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
-	unsigned long *begin = end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
+	unsigned long *begin = end - (IRQ_STACK_SIZE / sizeof(long));
 
 	if (stack < begin || stack >= end)
 		return false;
@@ -145,7 +145,7 @@ void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	int i;
 
 	irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
-	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
+	irq_stack     = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
 	sp = sp ? : get_stack_pointer(task, regs);
 
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index a2a0eae..2bbd27f 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -246,7 +246,7 @@ void __init setup_per_cpu_areas(void)
 #ifdef CONFIG_X86_64
 		per_cpu(irq_stack_ptr, cpu) =
 			per_cpu(irq_stack_union.irq_stack, cpu) +
-			IRQ_USABLE_STACK_SIZE;
+			IRQ_STACK_SIZE;
 #endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
-- 
2.7.4

* [PATCH v3 46/51] x86/unwind: warn on kernel stack corruption
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (44 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 45/51] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-12 14:29 ` [PATCH v3 47/51] x86/unwind: warn on bad stack return address Josh Poimboeuf
                   ` (4 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Detect situations in the unwinder where the frame pointer refers to a
bad address, and print an appropriate warning.

Use printk_deferred_once() because lockdep can call the unwinder via
save_stack_trace() while the console lock is held.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index d2480a3..cde900e 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -130,7 +130,8 @@ bool unwind_next_frame(struct unwind_state *state)
 
 	/* make sure the next frame's data is accessible */
 	if (!update_stack_state(state, next_sp, next_len))
-		return false;
+		goto bad_address;
+
 	/* move to the next frame */
 	if (regs) {
 		state->regs = regs;
@@ -142,6 +143,17 @@ bool unwind_next_frame(struct unwind_state *state)
 
 	return true;
 
+bad_address:
+	if (state->regs)
+		printk_deferred_once(KERN_WARNING
+			"WARNING: kernel stack regs at %p in %s:%d has bad 'bp' value %p\n",
+			state->regs, state->task->comm,
+			state->task->pid, next_bp);
+	else
+		printk_deferred_once(KERN_WARNING
+			"WARNING: kernel stack frame pointer at %p in %s:%d has bad value %p\n",
+			state->bp, state->task->comm,
+			state->task->pid, next_bp);
 the_end:
 	state->stack_info.type = STACK_TYPE_UNKNOWN;
 	return false;
-- 
2.7.4

* [PATCH v3 47/51] x86/unwind: warn on bad stack return address
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (45 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 46/51] x86/unwind: warn on kernel stack corruption Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-12 14:29 ` [PATCH v3 48/51] x86/unwind: warn if stack grows up Josh Poimboeuf
                   ` (3 subsequent siblings)
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

If __kernel_text_address() doesn't recognize a return address on the
stack, it probably means that it's some generated code which
__kernel_text_address() doesn't know about yet.

Otherwise there's probably some stack corruption.

Either way, warn about it.

Use printk_deferred_once() because lockdep can call the unwinder via
save_stack_trace() while the console lock is held.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index cde900e..5496462 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -20,7 +20,15 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 	addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
 				     addr_p);
 
-	return __kernel_text_address(addr) ? addr : 0;
+	if (!__kernel_text_address(addr)) {
+		printk_deferred_once(KERN_WARNING
+			"WARNING: unrecognized kernel stack return address %p at %p in %s:%d\n",
+			(void *)addr, addr_p, state->task->comm,
+			state->task->pid);
+		return 0;
+	}
+
+	return addr;
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
-- 
2.7.4

* [PATCH v3 48/51] x86/unwind: warn if stack grows up
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (46 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 47/51] x86/unwind: warn on bad stack return address Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-14  7:56   ` Andy Lutomirski
  2016-08-12 14:29 ` [PATCH v3 49/51] x86/dumpstack: warn on stack recursion Josh Poimboeuf
                   ` (2 subsequent siblings)
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Add a sanity check to ensure the stack only grows down, and print a
warning if the check fails.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 5496462..f21b7ef 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -32,6 +32,15 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
+static size_t regs_size(struct pt_regs *regs)
+{
+	/* x86_32 regs from kernel mode are two words shorter */
+	if (IS_ENABLED(CONFIG_X86_32) && !user_mode(regs))
+		return sizeof(*regs) - (2*sizeof(long));
+
+	return sizeof(*regs);
+}
+
 static bool is_last_task_frame(struct unwind_state *state)
 {
 	unsigned long bp = (unsigned long)state->bp;
@@ -85,6 +94,7 @@ bool unwind_next_frame(struct unwind_state *state)
 	struct pt_regs *regs;
 	unsigned long *next_bp, *next_sp;
 	size_t next_len;
+	enum stack_type prev_type = state->stack_info.type;
 
 	if (unwind_done(state))
 		return false;
@@ -140,6 +150,18 @@ bool unwind_next_frame(struct unwind_state *state)
 	if (!update_stack_state(state, next_sp, next_len))
 		goto bad_address;
 
+	/* make sure it only unwinds up and doesn't overlap the last frame */
+	if (state->stack_info.type == prev_type) {
+		if (state->regs &&
+		    (void *)next_sp < (void *)state->regs +
+				      regs_size(state->regs))
+			goto bad_address;
+
+		if (state->bp &&
+		    (void *)next_sp < (void *)state->bp + FRAME_HEADER_SIZE)
+			goto bad_address;
+	}
+
 	/* move to the next frame */
 	if (regs) {
 		state->regs = regs;
@@ -156,12 +178,12 @@ bad_address:
 		printk_deferred_once(KERN_WARNING
 			"WARNING: kernel stack regs at %p in %s:%d has bad 'bp' value %p\n",
 			state->regs, state->task->comm,
-			state->task->pid, next_bp);
+			state->task->pid, next_sp);
 	else
 		printk_deferred_once(KERN_WARNING
 			"WARNING: kernel stack frame pointer at %p in %s:%d has bad value %p\n",
 			state->bp, state->task->comm,
-			state->task->pid, next_bp);
+			state->task->pid, next_sp);
 the_end:
 	state->stack_info.type = STACK_TYPE_UNKNOWN;
 	return false;
-- 
2.7.4

* [PATCH v3 49/51] x86/dumpstack: warn on stack recursion
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (47 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 48/51] x86/unwind: warn if stack grows up Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-12 14:29 ` [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c Josh Poimboeuf
  2016-08-12 14:29 ` [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
  50 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Print a warning if stack recursion is detected.

Use printk_deferred_once() because lockdep can call the unwinder via
save_stack_trace() while the console lock is held.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_32.c | 5 ++++-
 arch/x86/kernel/dumpstack_64.c | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 53d939e..cbbef0e 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -101,8 +101,11 @@ recursion_check:
 	 * just break out and report an unknown stack type.
 	 */
 	if (visit_mask) {
-		if (*visit_mask & (1UL << info->type))
+		if (*visit_mask & (1UL << info->type)) {
+			printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n",
+					     info->type);
 			goto unknown;
+		}
 		*visit_mask |= 1UL << info->type;
 	}
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 001a75d..36581f9 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -125,8 +125,11 @@ recursion_check:
 	 * just break out and report an unknown stack type.
 	 */
 	if (visit_mask) {
-		if (*visit_mask & (1UL << info->type))
+		if (*visit_mask & (1UL << info->type)) {
+			printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n",
+					     info->type);
 			goto unknown;
+		}
 		*visit_mask |= 1UL << info->type;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (48 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 49/51] x86/dumpstack: warn on stack recursion Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-12 17:36   ` Kees Cook
  2016-08-12 14:29 ` [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
  50 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When I tried to port arch_within_stack_frames() to use the new unwinder,
I got a nightmare include file "header soup" scenario when unwind.h was
included from thread_info.h.  And anyway, I think thread_info.h isn't
really an appropriate place for this function.  So move it to usercopy.c
instead.

Since it relies on its parent's stack pointer, and the function is no
longer inlined, the arguments to the __builtin_frame_address() calls
have been incremented.
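
For illustration (a sketch of the level shift only, not part of the diff
below): because the function now contributes a stack frame of its own,
each __builtin_frame_address() level moves up by one:

	/* inlined (old):                        out-of-line (new):                  */
	oldframe = __builtin_frame_address(1);  /* becomes __builtin_frame_address(2) */
	frame    = __builtin_frame_address(2);  /* becomes __builtin_frame_address(3) */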

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/thread_info.h | 46 ++++++++------------------------------
 arch/x86/lib/usercopy.c            | 43 +++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 8b7c8d8e..fd849e6 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -176,49 +176,21 @@ static inline unsigned long current_stack_pointer(void)
 	return sp;
 }
 
-/*
- * Walks up the stack frames to make sure that the specified object is
- * entirely contained by a single stack frame.
- *
- * Returns:
- *		 1 if within a frame
- *		-1 if placed across a frame boundary (or outside stack)
- *		 0 unable to determine (no frame pointers, etc)
- */
+#ifdef CONFIG_HARDENED_USERCOPY
+#ifdef CONFIG_FRAME_POINTER
+int arch_within_stack_frames(const void * const stack,
+			     const void * const stackend,
+			     const void *obj, unsigned long len);
+#else
 static inline int arch_within_stack_frames(const void * const stack,
 					   const void * const stackend,
 					   const void *obj, unsigned long len)
 {
-#if defined(CONFIG_FRAME_POINTER)
-	const void *frame = NULL;
-	const void *oldframe;
-
-	oldframe = __builtin_frame_address(1);
-	if (oldframe)
-		frame = __builtin_frame_address(2);
-	/*
-	 * low ----------------------------------------------> high
-	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
-	 *                     ^----------------^
-	 *               allow copies only within here
-	 */
-	while (stack <= frame && frame < stackend) {
-		/*
-		 * If obj + len extends past the last frame, this
-		 * check won't pass and the next frame will be 0,
-		 * causing us to bail out and correctly report
-		 * the copy as invalid.
-		 */
-		if (obj + len <= frame)
-			return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
-		oldframe = frame;
-		frame = *(const void * const *)frame;
-	}
-	return -1;
-#else
 	return 0;
-#endif
 }
+#endif /* CONFIG_FRAME_POINTER */
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 
 #else /* !__ASSEMBLY__ */
 
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index b490878..96ce151 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -9,6 +9,7 @@
 
 #include <asm/word-at-a-time.h>
 #include <linux/sched.h>
+#include <asm/unwind.h>
 
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
@@ -34,3 +35,45 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(copy_from_user_nmi);
+
+#ifdef CONFIG_HARDENED_USERCOPY
+/*
+ * Walks up the stack frames to make sure that the specified object is
+ * entirely contained by a single stack frame.
+ *
+ * Returns:
+ *		 1 if within a frame
+ *		-1 if placed across a frame boundary (or outside stack)
+ *		 0 unable to determine (no frame pointers, etc)
+ */
+int arch_within_stack_frames(const void * const stack,
+			     const void * const stackend,
+			     const void *obj, unsigned long len)
+{
+	const void *frame = NULL;
+	const void *oldframe;
+
+	oldframe = __builtin_frame_address(2);
+	if (oldframe)
+		frame = __builtin_frame_address(3);
+	/*
+	 * low ----------------------------------------------> high
+	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
+	 *                     ^----------------^
+	 *               allow copies only within here
+	 */
+	while (stack <= frame && frame < stackend) {
+		/*
+		 * If obj + len extends past the last frame, this
+		 * check won't pass and the next frame will be 0,
+		 * causing us to bail out and correctly report
+		 * the copy as invalid.
+		 */
+		if (obj + len <= frame)
+			return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
+		oldframe = frame;
+		frame = *(const void * const *)frame;
+	}
+	return -1;
+}
+#endif /* CONFIG_HARDENED_USERCOPY */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (49 preceding siblings ...)
  2016-08-12 14:29 ` [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c Josh Poimboeuf
@ 2016-08-12 14:29 ` Josh Poimboeuf
  2016-08-12 15:17   ` Josh Poimboeuf
  2016-08-12 20:41   ` Josh Poimboeuf
  50 siblings, 2 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 14:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert arch_within_stack_frames() to use the new unwinder.

Boot tested with CONFIG_HARDENED_USERCOPY.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/lib/usercopy.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index 96ce151..9d0913c 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -50,12 +50,21 @@ int arch_within_stack_frames(const void * const stack,
 			     const void * const stackend,
 			     const void *obj, unsigned long len)
 {
-	const void *frame = NULL;
-	const void *oldframe;
+	struct unwind_state state;
+	const void *frame, *oldframe;
+
+	unwind_start(&state, current, NULL, NULL);
+
+	if (!unwind_next_frame(&state))
+		return 0;
+
+	oldframe = unwind_get_stack_ptr(&state);
+
+	if (!unwind_next_frame(&state))
+		return 0;
+
+	frame = unwind_get_stack_ptr(&state);
 
-	oldframe = __builtin_frame_address(2);
-	if (oldframe)
-		frame = __builtin_frame_address(3);
 	/*
 	 * low ----------------------------------------------> high
 	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
@@ -71,8 +80,12 @@ int arch_within_stack_frames(const void * const stack,
 		 */
 		if (obj + len <= frame)
 			return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
+
+		if (!unwind_next_frame(&state))
+			return 0;
+
 		oldframe = frame;
-		frame = *(const void * const *)frame;
+		frame = unwind_get_stack_ptr(&state);
 	}
 	return -1;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-12 14:29 ` [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
@ 2016-08-12 15:17   ` Josh Poimboeuf
  2016-08-12 17:38     ` Kees Cook
  2016-08-12 20:41   ` Josh Poimboeuf
  1 sibling, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 15:17 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 09:29:10AM -0500, Josh Poimboeuf wrote:
> Convert arch_within_stack_frames() to use the new unwinder.
> 
> Boot tested with CONFIG_HARDENED_USERCOPY.
> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/lib/usercopy.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
> index 96ce151..9d0913c 100644
> --- a/arch/x86/lib/usercopy.c
> +++ b/arch/x86/lib/usercopy.c
> @@ -50,12 +50,21 @@ int arch_within_stack_frames(const void * const stack,
>  			     const void * const stackend,
>  			     const void *obj, unsigned long len)
>  {
> -	const void *frame = NULL;
> -	const void *oldframe;
> +	struct unwind_state state;
> +	const void *frame, *oldframe;
> +
> +	unwind_start(&state, current, NULL, NULL);
> +
> +	if (!unwind_next_frame(&state))
> +		return 0;
> +
> +	oldframe = unwind_get_stack_ptr(&state);

Actually, I think this isn't quite right.  Now that the function isn't
inlined, this needs to unwind another frame to be equivalent to current
behavior.
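
I.e., roughly something like this on top (untested, just to show the
extra step; the helpers are the ones added earlier in the series):

	unwind_start(&state, current, NULL, NULL);

	/* extra step: skip arch_within_stack_frames()'s own frame */
	if (!unwind_next_frame(&state))
		return 0;

	if (!unwind_next_frame(&state))
		return 0;
	oldframe = unwind_get_stack_ptr(&state);

	if (!unwind_next_frame(&state))
		return 0;
	frame = unwind_get_stack_ptr(&state);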

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c
  2016-08-12 14:29 ` [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c Josh Poimboeuf
@ 2016-08-12 17:36   ` Kees Cook
  2016-08-12 19:12     ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Kees Cook @ 2016-08-12 17:36 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> When I tried to port arch_within_stack_frames() to use the new unwinder,
> I got a nightmare include file "header soup" scenario when unwind.h was
> included from thread_info.h.  And anyway, I think thread_info.h isn't
> really an appropriate place for this function.  So move it to usercopy.c
> instead.
>
> Since it relies on its parent's stack pointer, and the function is no
> longer inlined, the arguments to the __builtin_frame_address() calls
> have been incremented.

Cool, looks good (minor change noted below). This patch might be a
good place to drop this from mm/Makefile too:

# Since __builtin_frame_address does work as used, disable the warning.
CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)

Since frame-address warnings have been disabled globally now since
commit 124a3d88fa20 ("Disable "frame-address" warning").
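
I.e. (sketch only), something like:

--- a/mm/Makefile
+++ b/mm/Makefile
-# Since __builtin_frame_address does work as used, disable the warning.
-CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)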

> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/include/asm/thread_info.h | 46 ++++++++------------------------------
>  arch/x86/lib/usercopy.c            | 43 +++++++++++++++++++++++++++++++++++
>  2 files changed, 52 insertions(+), 37 deletions(-)
>
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index 8b7c8d8e..fd849e6 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -176,49 +176,21 @@ static inline unsigned long current_stack_pointer(void)
>         return sp;
>  }
>
> -/*
> - * Walks up the stack frames to make sure that the specified object is
> - * entirely contained by a single stack frame.
> - *
> - * Returns:
> - *              1 if within a frame
> - *             -1 if placed across a frame boundary (or outside stack)
> - *              0 unable to determine (no frame pointers, etc)
> - */
> +#ifdef CONFIG_HARDENED_USERCOPY

This ifdef shouldn't be needed: the arch_within_stack_frames wasn't
designed to depend on it.

> +#ifdef CONFIG_FRAME_POINTER
> +int arch_within_stack_frames(const void * const stack,
> +                            const void * const stackend,
> +                            const void *obj, unsigned long len);
> +#else
>  static inline int arch_within_stack_frames(const void * const stack,
>                                            const void * const stackend,
>                                            const void *obj, unsigned long len)
>  {
> -#if defined(CONFIG_FRAME_POINTER)
> -       const void *frame = NULL;
> -       const void *oldframe;
> -
> -       oldframe = __builtin_frame_address(1);
> -       if (oldframe)
> -               frame = __builtin_frame_address(2);
> -       /*
> -        * low ----------------------------------------------> high
> -        * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> -        *                     ^----------------^
> -        *               allow copies only within here
> -        */
> -       while (stack <= frame && frame < stackend) {
> -               /*
> -                * If obj + len extends past the last frame, this
> -                * check won't pass and the next frame will be 0,
> -                * causing us to bail out and correctly report
> -                * the copy as invalid.
> -                */
> -               if (obj + len <= frame)
> -                       return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
> -               oldframe = frame;
> -               frame = *(const void * const *)frame;
> -       }
> -       return -1;
> -#else
>         return 0;
> -#endif
>  }
> +#endif /* CONFIG_FRAME_POINTER */
> +#endif /* CONFIG_HARDENED_USERCOPY */
> +
>
>  #else /* !__ASSEMBLY__ */
>
> diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
> index b490878..96ce151 100644
> --- a/arch/x86/lib/usercopy.c
> +++ b/arch/x86/lib/usercopy.c
> @@ -9,6 +9,7 @@
>
>  #include <asm/word-at-a-time.h>
>  #include <linux/sched.h>
> +#include <asm/unwind.h>
>
>  /*
>   * We rely on the nested NMI work to allow atomic faults from the NMI path; the
> @@ -34,3 +35,45 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
>         return ret;
>  }
>  EXPORT_SYMBOL_GPL(copy_from_user_nmi);
> +
> +#ifdef CONFIG_HARDENED_USERCOPY

Same thing: no need to check CONFIG_HARDENED_USERCOPY here: it should
be checking CONFIG_FRAME_POINTER instead.
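
I.e., sketch only:

-#ifdef CONFIG_HARDENED_USERCOPY
+#ifdef CONFIG_FRAME_POINTER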

> +/*
> + * Walks up the stack frames to make sure that the specified object is
> + * entirely contained by a single stack frame.
> + *
> + * Returns:
> + *              1 if within a frame
> + *             -1 if placed across a frame boundary (or outside stack)
> + *              0 unable to determine (no frame pointers, etc)
> + */
> +int arch_within_stack_frames(const void * const stack,
> +                            const void * const stackend,
> +                            const void *obj, unsigned long len)
> +{
> +       const void *frame = NULL;
> +       const void *oldframe;
> +
> +       oldframe = __builtin_frame_address(2);
> +       if (oldframe)
> +               frame = __builtin_frame_address(3);
> +       /*
> +        * low ----------------------------------------------> high
> +        * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> +        *                     ^----------------^
> +        *               allow copies only within here
> +        */
> +       while (stack <= frame && frame < stackend) {
> +               /*
> +                * If obj + len extends past the last frame, this
> +                * check won't pass and the next frame will be 0,
> +                * causing us to bail out and correctly report
> +                * the copy as invalid.
> +                */
> +               if (obj + len <= frame)
> +                       return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
> +               oldframe = frame;
> +               frame = *(const void * const *)frame;
> +       }
> +       return -1;
> +}
> +#endif /* CONFIG_HARDENED_USERCOPY */
> --
> 2.7.4
>

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-12 15:17   ` Josh Poimboeuf
@ 2016-08-12 17:38     ` Kees Cook
  2016-08-12 19:15       ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Kees Cook @ 2016-08-12 17:38 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 8:17 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Fri, Aug 12, 2016 at 09:29:10AM -0500, Josh Poimboeuf wrote:
>> Convert arch_within_stack_frames() to use the new unwinder.
>>
>> Boot tested with CONFIG_HARDENED_USERCOPY.
>>
>> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>> ---
>>  arch/x86/lib/usercopy.c | 25 +++++++++++++++++++------
>>  1 file changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
>> index 96ce151..9d0913c 100644
>> --- a/arch/x86/lib/usercopy.c
>> +++ b/arch/x86/lib/usercopy.c
>> @@ -50,12 +50,21 @@ int arch_within_stack_frames(const void * const stack,
>>                            const void * const stackend,
>>                            const void *obj, unsigned long len)
>>  {
>> -     const void *frame = NULL;
>> -     const void *oldframe;
>> +     struct unwind_state state;
>> +     const void *frame, *oldframe;
>> +
>> +     unwind_start(&state, current, NULL, NULL);
>> +
>> +     if (!unwind_next_frame(&state))
>> +             return 0;
>> +
>> +     oldframe = unwind_get_stack_ptr(&state);
>
> Actually, I think this isn't quite right.  Now that the function isn't
> inlined, this needs to unwind another frame to be equivalent to current
> behavior.

Yeah, that seems right. And IIUC, as long as this is wrapped in the
CONFIG_FRAME_POINTER check, this won't use the guessing unwinder,
right? (Which is how it should be.)

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c
  2016-08-12 17:36   ` Kees Cook
@ 2016-08-12 19:12     ` Josh Poimboeuf
  2016-08-12 20:06       ` Kees Cook
  0 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 19:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 10:36:21AM -0700, Kees Cook wrote:
> On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > When I tried to port arch_within_stack_frames() to use the new unwinder,
> > I got a nightmare include file "header soup" scenario when unwind.h was
> > included from thread_info.h.  And anyway, I think thread_info.h isn't
> > really an appropriate place for this function.  So move it to usercopy.c
> > instead.
> >
> > Since it relies on its parent's stack pointer, and the function is no
> > longer inlined, the arguments to the __builtin_frame_address() calls
> > have been incremented.
> 
> Cool, looks good (minor change noted below). This patch might be a
> good place to drop this from mm/Makefile too:
> 
> # Since __builtin_frame_address does work as used, disable the warning.
> CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)
> 
> Since frame-address warnings have been disabled globally now since
> commit 124a3d88fa20 ("Disable "frame-address" warning").

Ok, I'll do that with the next patch (51/51) which removes the
__builtin_frame_address() calls.

> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > ---
> >  arch/x86/include/asm/thread_info.h | 46 ++++++++------------------------------
> >  arch/x86/lib/usercopy.c            | 43 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 52 insertions(+), 37 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> > index 8b7c8d8e..fd849e6 100644
> > --- a/arch/x86/include/asm/thread_info.h
> > +++ b/arch/x86/include/asm/thread_info.h
> > @@ -176,49 +176,21 @@ static inline unsigned long current_stack_pointer(void)
> >         return sp;
> >  }
> >
> > -/*
> > - * Walks up the stack frames to make sure that the specified object is
> > - * entirely contained by a single stack frame.
> > - *
> > - * Returns:
> > - *              1 if within a frame
> > - *             -1 if placed across a frame boundary (or outside stack)
> > - *              0 unable to determine (no frame pointers, etc)
> > - */
> > +#ifdef CONFIG_HARDENED_USERCOPY
> 
> This ifdef shouldn't be needed: the arch_within_stack_frames wasn't
> designed to depend on it.
> 
> > +#ifdef CONFIG_FRAME_POINTER
> > +int arch_within_stack_frames(const void * const stack,
> > +                            const void * const stackend,
> > +                            const void *obj, unsigned long len);
> > +#else
> >  static inline int arch_within_stack_frames(const void * const stack,
> >                                            const void * const stackend,
> >                                            const void *obj, unsigned long len)
> >  {
> > -#if defined(CONFIG_FRAME_POINTER)
> > -       const void *frame = NULL;
> > -       const void *oldframe;
> > -
> > -       oldframe = __builtin_frame_address(1);
> > -       if (oldframe)
> > -               frame = __builtin_frame_address(2);
> > -       /*
> > -        * low ----------------------------------------------> high
> > -        * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> > -        *                     ^----------------^
> > -        *               allow copies only within here
> > -        */
> > -       while (stack <= frame && frame < stackend) {
> > -               /*
> > -                * If obj + len extends past the last frame, this
> > -                * check won't pass and the next frame will be 0,
> > -                * causing us to bail out and correctly report
> > -                * the copy as invalid.
> > -                */
> > -               if (obj + len <= frame)
> > -                       return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
> > -               oldframe = frame;
> > -               frame = *(const void * const *)frame;
> > -       }
> > -       return -1;
> > -#else
> >         return 0;
> > -#endif
> >  }
> > +#endif /* CONFIG_FRAME_POINTER */
> > +#endif /* CONFIG_HARDENED_USERCOPY */
> > +
> >
> >  #else /* !__ASSEMBLY__ */
> >
> > diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
> > index b490878..96ce151 100644
> > --- a/arch/x86/lib/usercopy.c
> > +++ b/arch/x86/lib/usercopy.c
> > @@ -9,6 +9,7 @@
> >
> >  #include <asm/word-at-a-time.h>
> >  #include <linux/sched.h>
> > +#include <asm/unwind.h>
> >
> >  /*
> >   * We rely on the nested NMI work to allow atomic faults from the NMI path; the
> > @@ -34,3 +35,45 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
> >         return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(copy_from_user_nmi);
> > +
> > +#ifdef CONFIG_HARDENED_USERCOPY
> 
> Same thing: no need to check CONFIG_HARDENED_USERCOPY here: it should
> be checking CONFIG_FRAME_POINTER instead.

Now that this function is no longer inlined and is instead compiled in
its own .c file, I was thinking that the tinyconfig folks would
appreciate not growing the text size if there's no reason to do so.
Keeping this #ifdef won't break anything, right?

Also I moved the CONFIG_FRAME_POINTER check to the header file so it
doesn't pollute the .c code.

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-12 17:38     ` Kees Cook
@ 2016-08-12 19:15       ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 19:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 10:38:31AM -0700, Kees Cook wrote:
> On Fri, Aug 12, 2016 at 8:17 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Fri, Aug 12, 2016 at 09:29:10AM -0500, Josh Poimboeuf wrote:
> >> Convert arch_within_stack_frames() to use the new unwinder.
> >>
> >> Boot tested with CONFIG_HARDENED_USERCOPY.
> >>
> >> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> >> ---
> >>  arch/x86/lib/usercopy.c | 25 +++++++++++++++++++------
> >>  1 file changed, 19 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
> >> index 96ce151..9d0913c 100644
> >> --- a/arch/x86/lib/usercopy.c
> >> +++ b/arch/x86/lib/usercopy.c
> >> @@ -50,12 +50,21 @@ int arch_within_stack_frames(const void * const stack,
> >>                            const void * const stackend,
> >>                            const void *obj, unsigned long len)
> >>  {
> >> -     const void *frame = NULL;
> >> -     const void *oldframe;
> >> +     struct unwind_state state;
> >> +     const void *frame, *oldframe;
> >> +
> >> +     unwind_start(&state, current, NULL, NULL);
> >> +
> >> +     if (!unwind_next_frame(&state))
> >> +             return 0;
> >> +
> >> +     oldframe = unwind_get_stack_ptr(&state);
> >
> > Actually, I think this isn't quite right.  Now that the function isn't
> > inlined, this needs to unwind another frame to be equivalent to current
> > behavior.
> 
> Yeah, that seems right. And IIUC, as long as this is wrapped in the
> CONFIG_FRAME_POINTER check, this won't use the guessing unwinder,
> right? (Which is how it should be.)

Right, only the frame pointer unwinder will be used here, thanks to the
CONFIG_FRAME_POINTER guard in thread_info.h.

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c
  2016-08-12 19:12     ` Josh Poimboeuf
@ 2016-08-12 20:06       ` Kees Cook
  2016-08-12 20:36         ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Kees Cook @ 2016-08-12 20:06 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 12:12 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Fri, Aug 12, 2016 at 10:36:21AM -0700, Kees Cook wrote:
>> On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > When I tried to port arch_within_stack_frames() to use the new unwinder,
>> > I got a nightmare include file "header soup" scenario when unwind.h was
>> > included from thread_info.h.  And anyway, I think thread_info.h isn't
>> > really an appropriate place for this function.  So move it to usercopy.c
>> > instead.
>> >
>> > Since it relies on its parent's stack pointer, and the function is no
>> > longer inlined, the arguments to the __builtin_frame_address() calls
>> > have been incremented.
>>
>> Cool, looks good (minor change noted below). This patch might be a
>> good place to drop this from mm/Makefile too:
>>
>> # Since __builtin_frame_address does work as used, disable the warning.
>> CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)
>>
>> Since frame-address warnings have been disabled globally now since
>> commit 124a3d88fa20 ("Disable "frame-address" warning").
>
> Ok, I'll do that with the next patch (51/51) which removes the
> __builtin_frame_address() calls.
>
>> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>> > ---
>> >  arch/x86/include/asm/thread_info.h | 46 ++++++++------------------------------
>> >  arch/x86/lib/usercopy.c            | 43 +++++++++++++++++++++++++++++++++++
>> >  2 files changed, 52 insertions(+), 37 deletions(-)
>> >
>> > diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
>> > index 8b7c8d8e..fd849e6 100644
>> > --- a/arch/x86/include/asm/thread_info.h
>> > +++ b/arch/x86/include/asm/thread_info.h
>> > @@ -176,49 +176,21 @@ static inline unsigned long current_stack_pointer(void)
>> >         return sp;
>> >  }
>> >
>> > -/*
>> > - * Walks up the stack frames to make sure that the specified object is
>> > - * entirely contained by a single stack frame.
>> > - *
>> > - * Returns:
>> > - *              1 if within a frame
>> > - *             -1 if placed across a frame boundary (or outside stack)
>> > - *              0 unable to determine (no frame pointers, etc)
>> > - */
>> > +#ifdef CONFIG_HARDENED_USERCOPY
>>
>> This ifdef shouldn't be needed: the arch_within_stack_frames wasn't
>> designed to depend on it.
>>
>> > +#ifdef CONFIG_FRAME_POINTER
>> > +int arch_within_stack_frames(const void * const stack,
>> > +                            const void * const stackend,
>> > +                            const void *obj, unsigned long len);
>> > +#else
>> >  static inline int arch_within_stack_frames(const void * const stack,
>> >                                            const void * const stackend,
>> >                                            const void *obj, unsigned long len)
>> >  {
>> > -#if defined(CONFIG_FRAME_POINTER)
>> > -       const void *frame = NULL;
>> > -       const void *oldframe;
>> > -
>> > -       oldframe = __builtin_frame_address(1);
>> > -       if (oldframe)
>> > -               frame = __builtin_frame_address(2);
>> > -       /*
>> > -        * low ----------------------------------------------> high
>> > -        * [saved bp][saved ip][args][local vars][saved bp][saved ip]
>> > -        *                     ^----------------^
>> > -        *               allow copies only within here
>> > -        */
>> > -       while (stack <= frame && frame < stackend) {
>> > -               /*
>> > -                * If obj + len extends past the last frame, this
>> > -                * check won't pass and the next frame will be 0,
>> > -                * causing us to bail out and correctly report
>> > -                * the copy as invalid.
>> > -                */
>> > -               if (obj + len <= frame)
>> > -                       return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
>> > -               oldframe = frame;
>> > -               frame = *(const void * const *)frame;
>> > -       }
>> > -       return -1;
>> > -#else
>> >         return 0;
>> > -#endif
>> >  }
>> > +#endif /* CONFIG_FRAME_POINTER */
>> > +#endif /* CONFIG_HARDENED_USERCOPY */
>> > +
>> >
>> >  #else /* !__ASSEMBLY__ */
>> >
>> > diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
>> > index b490878..96ce151 100644
>> > --- a/arch/x86/lib/usercopy.c
>> > +++ b/arch/x86/lib/usercopy.c
>> > @@ -9,6 +9,7 @@
>> >
>> >  #include <asm/word-at-a-time.h>
>> >  #include <linux/sched.h>
>> > +#include <asm/unwind.h>
>> >
>> >  /*
>> >   * We rely on the nested NMI work to allow atomic faults from the NMI path; the
>> > @@ -34,3 +35,45 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
>> >         return ret;
>> >  }
>> >  EXPORT_SYMBOL_GPL(copy_from_user_nmi);
>> > +
>> > +#ifdef CONFIG_HARDENED_USERCOPY
>>
>> Same thing: no need to check CONFIG_HARDENED_USERCOPY here: it should
>> be checking CONFIG_FRAME_POINTER instead.
>
> Now that this function is no longer inlined and is instead compiled in
> its own .c file, I was thinking that the tinyconfig folks would
> appreciate not growing the text size if there's no reason to do so.
> Keeping this #ifdef won't break anything, right?

Hrm, well, I guess not, but it means if anyone else wants to use it
they have to remove the ifdef. I guess I don't object that much. :P

> Also I moved the CONFIG_FRAME_POINTER check to the header file so it
> doesn't pollute the .c code.

Right, but if FRAME_POINTER=n and HARDENED_USERCOPY=y you'll get a
build error about it being both in the .h and the .c file, if I'm
reading that correctly.
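
Roughly (sketch, not compile-tested), the conflicting pair would be:

	/* thread_info.h, with FRAME_POINTER=n: */
	static inline int arch_within_stack_frames(...) { return 0; }

	/* usercopy.c, with HARDENED_USERCOPY=y (no FRAME_POINTER guard): */
	int arch_within_stack_frames(...) { ... }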

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c
  2016-08-12 20:06       ` Kees Cook
@ 2016-08-12 20:36         ` Josh Poimboeuf
  2016-08-12 20:44           ` Kees Cook
  0 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 20:36 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 01:06:41PM -0700, Kees Cook wrote:
> On Fri, Aug 12, 2016 at 12:12 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Fri, Aug 12, 2016 at 10:36:21AM -0700, Kees Cook wrote:
> >> On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > When I tried to port arch_within_stack_frames() to use the new unwinder,
> >> > I got a nightmare include file "header soup" scenario when unwind.h was
> >> > included from thread_info.h.  And anyway, I think thread_info.h isn't
> >> > really an appropriate place for this function.  So move it to usercopy.c
> >> > instead.
> >> >
> >> > Since it relies on its parent's stack pointer, and the function is no
> >> > longer inlined, the arguments to the __builtin_frame_address() calls
> >> > have been incremented.
> >>
> >> Cool, looks good (minor change noted below). This patch might be a
> >> good place to drop this from mm/Makefile too:
> >>
> >> # Since __builtin_frame_address does work as used, disable the warning.
> >> CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)
> >>
> >> Since frame-address warnings have been disabled globally now since
> >> commit 124a3d88fa20 ("Disable "frame-address" warning").
> >
> > Ok, I'll do that with the next patch (51/51) which removes the
> > __builtin_frame_address() calls.
> >
> >> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> >> > ---
> >> >  arch/x86/include/asm/thread_info.h | 46 ++++++++------------------------------
> >> >  arch/x86/lib/usercopy.c            | 43 +++++++++++++++++++++++++++++++++++
> >> >  2 files changed, 52 insertions(+), 37 deletions(-)
> >> >
> >> > diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> >> > index 8b7c8d8e..fd849e6 100644
> >> > --- a/arch/x86/include/asm/thread_info.h
> >> > +++ b/arch/x86/include/asm/thread_info.h
> >> > @@ -176,49 +176,21 @@ static inline unsigned long current_stack_pointer(void)
> >> >         return sp;
> >> >  }
> >> >
> >> > -/*
> >> > - * Walks up the stack frames to make sure that the specified object is
> >> > - * entirely contained by a single stack frame.
> >> > - *
> >> > - * Returns:
> >> > - *              1 if within a frame
> >> > - *             -1 if placed across a frame boundary (or outside stack)
> >> > - *              0 unable to determine (no frame pointers, etc)
> >> > - */
> >> > +#ifdef CONFIG_HARDENED_USERCOPY
> >>
> >> This ifdef shouldn't be needed: the arch_within_stack_frames wasn't
> >> designed to depend on it.
> >>
> >> > +#ifdef CONFIG_FRAME_POINTER
> >> > +int arch_within_stack_frames(const void * const stack,
> >> > +                            const void * const stackend,
> >> > +                            const void *obj, unsigned long len);
> >> > +#else
> >> >  static inline int arch_within_stack_frames(const void * const stack,
> >> >                                            const void * const stackend,
> >> >                                            const void *obj, unsigned long len)
> >> >  {
> >> > -#if defined(CONFIG_FRAME_POINTER)
> >> > -       const void *frame = NULL;
> >> > -       const void *oldframe;
> >> > -
> >> > -       oldframe = __builtin_frame_address(1);
> >> > -       if (oldframe)
> >> > -               frame = __builtin_frame_address(2);
> >> > -       /*
> >> > -        * low ----------------------------------------------> high
> >> > -        * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> >> > -        *                     ^----------------^
> >> > -        *               allow copies only within here
> >> > -        */
> >> > -       while (stack <= frame && frame < stackend) {
> >> > -               /*
> >> > -                * If obj + len extends past the last frame, this
> >> > -                * check won't pass and the next frame will be 0,
> >> > -                * causing us to bail out and correctly report
> >> > -                * the copy as invalid.
> >> > -                */
> >> > -               if (obj + len <= frame)
> >> > -                       return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
> >> > -               oldframe = frame;
> >> > -               frame = *(const void * const *)frame;
> >> > -       }
> >> > -       return -1;
> >> > -#else
> >> >         return 0;
> >> > -#endif
> >> >  }
> >> > +#endif /* CONFIG_FRAME_POINTER */
> >> > +#endif /* CONFIG_HARDENED_USERCOPY */
> >> > +
> >> >
> >> >  #else /* !__ASSEMBLY__ */
> >> >
> >> > diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
> >> > index b490878..96ce151 100644
> >> > --- a/arch/x86/lib/usercopy.c
> >> > +++ b/arch/x86/lib/usercopy.c
> >> > @@ -9,6 +9,7 @@
> >> >
> >> >  #include <asm/word-at-a-time.h>
> >> >  #include <linux/sched.h>
> >> > +#include <asm/unwind.h>
> >> >
> >> >  /*
> >> >   * We rely on the nested NMI work to allow atomic faults from the NMI path; the
> >> > @@ -34,3 +35,45 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
> >> >         return ret;
> >> >  }
> >> >  EXPORT_SYMBOL_GPL(copy_from_user_nmi);
> >> > +
> >> > +#ifdef CONFIG_HARDENED_USERCOPY
> >>
> >> Same thing: no need to check CONFIG_HARDENED_USERCOPY here: it should
> >> be checking CONFIG_FRAME_POINTER instead.
> >
> > Now that this function is no longer inlined and is instead compiled in
> > its own .c file, I was thinking that the tinyconfig folks would
> > appreciate not growing the text size if there's no reason to do so.
> > Keeping this #ifdef won't break anything, right?
> 
> Hrm, well, I guess not, but it means if anyone else wants to use it
> they have to remove the ifdef. I guess I don't object that much. :P

Ah.  Do you expect other uses for it?

> > Also I moved the CONFIG_FRAME_POINTER check to the header file so it
> > doesn't pollute the .c code.
> 
> Right, but if FRAME_POINTER=n and HARDENED_USERCOPY=y you'll get a
> build error about it being both in the .h and the .c file, if I'm
> reading that correctly.

Oh, right.

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-12 14:29 ` [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
  2016-08-12 15:17   ` Josh Poimboeuf
@ 2016-08-12 20:41   ` Josh Poimboeuf
  2016-08-12 20:47     ` Kees Cook
  1 sibling, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-12 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 09:29:10AM -0500, Josh Poimboeuf wrote:
> Convert arch_within_stack_frames() to use the new unwinder.
> 
> Boot tested with CONFIG_HARDENED_USERCOPY.
> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/lib/usercopy.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
> index 96ce151..9d0913c 100644
> --- a/arch/x86/lib/usercopy.c
> +++ b/arch/x86/lib/usercopy.c
> @@ -50,12 +50,21 @@ int arch_within_stack_frames(const void * const stack,
>  			     const void * const stackend,
>  			     const void *obj, unsigned long len)
>  {
> -	const void *frame = NULL;
> -	const void *oldframe;
> +	struct unwind_state state;
> +	const void *frame, *oldframe;
> +
> +	unwind_start(&state, current, NULL, NULL);
> +
> +	if (!unwind_next_frame(&state))
> +		return 0;
> +
> +	oldframe = unwind_get_stack_ptr(&state);
> +
> +	if (!unwind_next_frame(&state))
> +		return 0;
> +
> +	frame = unwind_get_stack_ptr(&state);
>  
> -	oldframe = __builtin_frame_address(2);
> -	if (oldframe)
> -		frame = __builtin_frame_address(3);
>  	/*
>  	 * low ----------------------------------------------> high
>  	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
> @@ -71,8 +80,12 @@ int arch_within_stack_frames(const void * const stack,
>  		 */
>  		if (obj + len <= frame)
>  			return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
> +
> +		if (!unwind_next_frame(&state))
> +			return 0;

I think there's another issue here.  This return needs to be tweaked.
IIUC, if it reliably reaches the end of the stack without finding the
object, it should return -1, but if there's something wrong with the
frame pointers which prevents the unwinder from reaching the end of the
stack, it should return 0.
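
Something like this, maybe (illustrative only -- "unwind_done()" below is
just a stand-in for however we end up telling a clean end-of-stack apart
from an unwinder error):

		if (!unwind_next_frame(&state))
			/*
			 * Stand-in logic: -1 if we walked cleanly off the
			 * end of the stack without finding the object, 0 if
			 * the unwinder bailed on a bad frame pointer.
			 */
			return unwind_done(&state) ? -1 : 0;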

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c
  2016-08-12 20:36         ` Josh Poimboeuf
@ 2016-08-12 20:44           ` Kees Cook
  0 siblings, 0 replies; 99+ messages in thread
From: Kees Cook @ 2016-08-12 20:44 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 1:36 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Fri, Aug 12, 2016 at 01:06:41PM -0700, Kees Cook wrote:
>> On Fri, Aug 12, 2016 at 12:12 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > On Fri, Aug 12, 2016 at 10:36:21AM -0700, Kees Cook wrote:
>> >> On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> >> > When I tried to port arch_within_stack_frames() to use the new unwinder,
>> >> > I got a nightmare include file "header soup" scenario when unwind.h was
>> >> > included from thread_info.h.  And anyway, I think thread_info.h isn't
>> >> > really an appropriate place for this function.  So move it to usercopy.c
>> >> > instead.
>> >> >
>> >> > Since it relies on its parent's stack pointer, and the function is no
>> >> > longer inlined, the arguments to the __builtin_frame_address() calls
>> >> > have been incremented.
>> >>
>> >> Cool, looks good (minor change noted below). This patch might be a
>> >> good place to drop this from mm/Makefile too:
>> >>
>> >> # Since __builtin_frame_address does work as used, disable the warning.
>> >> CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)
>> >>
>> >> Since frame-address warnings have been disabled globally now since
>> >> commit 124a3d88fa20 ("Disable "frame-address" warning").
>> >
>> > Ok, I'll do that with the next patch (51/51) which removes the
>> > __builtin_frame_address() calls.
>> >
>> >> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>> >> > ---
>> >> >  arch/x86/include/asm/thread_info.h | 46 ++++++++------------------------------
>> >> >  arch/x86/lib/usercopy.c            | 43 +++++++++++++++++++++++++++++++++++
>> >> >  2 files changed, 52 insertions(+), 37 deletions(-)
>> >> >
>> >> > diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
>> >> > index 8b7c8d8e..fd849e6 100644
>> >> > --- a/arch/x86/include/asm/thread_info.h
>> >> > +++ b/arch/x86/include/asm/thread_info.h
>> >> > @@ -176,49 +176,21 @@ static inline unsigned long current_stack_pointer(void)
>> >> >         return sp;
>> >> >  }
>> >> >
>> >> > -/*
>> >> > - * Walks up the stack frames to make sure that the specified object is
>> >> > - * entirely contained by a single stack frame.
>> >> > - *
>> >> > - * Returns:
>> >> > - *              1 if within a frame
>> >> > - *             -1 if placed across a frame boundary (or outside stack)
>> >> > - *              0 unable to determine (no frame pointers, etc)
>> >> > - */
>> >> > +#ifdef CONFIG_HARDENED_USERCOPY
>> >>
>> >> This ifdef shouldn't be needed: the arch_within_stack_frames wasn't
>> >> designed to depend on it.
>> >>
>> >> > +#ifdef CONFIG_FRAME_POINTER
>> >> > +int arch_within_stack_frames(const void * const stack,
>> >> > +                            const void * const stackend,
>> >> > +                            const void *obj, unsigned long len);
>> >> > +#else
>> >> >  static inline int arch_within_stack_frames(const void * const stack,
>> >> >                                            const void * const stackend,
>> >> >                                            const void *obj, unsigned long len)
>> >> >  {
>> >> > -#if defined(CONFIG_FRAME_POINTER)
>> >> > -       const void *frame = NULL;
>> >> > -       const void *oldframe;
>> >> > -
>> >> > -       oldframe = __builtin_frame_address(1);
>> >> > -       if (oldframe)
>> >> > -               frame = __builtin_frame_address(2);
>> >> > -       /*
>> >> > -        * low ----------------------------------------------> high
>> >> > -        * [saved bp][saved ip][args][local vars][saved bp][saved ip]
>> >> > -        *                     ^----------------^
>> >> > -        *               allow copies only within here
>> >> > -        */
>> >> > -       while (stack <= frame && frame < stackend) {
>> >> > -               /*
>> >> > -                * If obj + len extends past the last frame, this
>> >> > -                * check won't pass and the next frame will be 0,
>> >> > -                * causing us to bail out and correctly report
>> >> > -                * the copy as invalid.
>> >> > -                */
>> >> > -               if (obj + len <= frame)
>> >> > -                       return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
>> >> > -               oldframe = frame;
>> >> > -               frame = *(const void * const *)frame;
>> >> > -       }
>> >> > -       return -1;
>> >> > -#else
>> >> >         return 0;
>> >> > -#endif
>> >> >  }
>> >> > +#endif /* CONFIG_FRAME_POINTER */
>> >> > +#endif /* CONFIG_HARDENED_USERCOPY */
>> >> > +
>> >> >
>> >> >  #else /* !__ASSEMBLY__ */
>> >> >
>> >> > diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
>> >> > index b490878..96ce151 100644
>> >> > --- a/arch/x86/lib/usercopy.c
>> >> > +++ b/arch/x86/lib/usercopy.c
>> >> > @@ -9,6 +9,7 @@
>> >> >
>> >> >  #include <asm/word-at-a-time.h>
>> >> >  #include <linux/sched.h>
>> >> > +#include <asm/unwind.h>
>> >> >
>> >> >  /*
>> >> >   * We rely on the nested NMI work to allow atomic faults from the NMI path; the
>> >> > @@ -34,3 +35,45 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
>> >> >         return ret;
>> >> >  }
>> >> >  EXPORT_SYMBOL_GPL(copy_from_user_nmi);
>> >> > +
>> >> > +#ifdef CONFIG_HARDENED_USERCOPY
>> >>
>> >> Same thing: no need to check CONFIG_HARDENED_USERCOPY here: it should
>> >> be checking CONFIG_FRAME_POINTER instead.
>> >
>> > Now that this function is no longer inlined and is instead compiled in
>> > its own .c file, I was thinking that the tinyconfig folks would
>> > appreciate not growing the text size if there's no reason to do so.
>> > Keeping this #ifdef won't break anything, right?
>>
>> Hrm, well, I guess not, but it means if anyone else wants to use it
>> they have to remove the ifdef. I guess I don't object that much. :P
>
> Ah.  Do you expect other uses for it?

None that I'm aware of. :)

-Kees

>
>> > Also I moved the CONFIG_FRAME_POINTER check to the header file so it
>> > doesn't pollute the .c code.
>>
>> Right, but if FRAME_POINTER=n and HARDENED_USERCOPY=y you'll get a
>> build error about it being both in the .h and the .c file, if I'm
>> reading that correctly.
>
> Oh, right.
>
> --
> Josh



-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-12 20:41   ` Josh Poimboeuf
@ 2016-08-12 20:47     ` Kees Cook
  0 siblings, 0 replies; 99+ messages in thread
From: Kees Cook @ 2016-08-12 20:47 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 1:41 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Fri, Aug 12, 2016 at 09:29:10AM -0500, Josh Poimboeuf wrote:
>> Convert arch_within_stack_frames() to use the new unwinder.
>>
>> Boot tested with CONFIG_HARDENED_USERCOPY.
>>
>> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>> ---
>>  arch/x86/lib/usercopy.c | 25 +++++++++++++++++++------
>>  1 file changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
>> index 96ce151..9d0913c 100644
>> --- a/arch/x86/lib/usercopy.c
>> +++ b/arch/x86/lib/usercopy.c
>> @@ -50,12 +50,21 @@ int arch_within_stack_frames(const void * const stack,
>>                            const void * const stackend,
>>                            const void *obj, unsigned long len)
>>  {
>> -     const void *frame = NULL;
>> -     const void *oldframe;
>> +     struct unwind_state state;
>> +     const void *frame, *oldframe;
>> +
>> +     unwind_start(&state, current, NULL, NULL);
>> +
>> +     if (!unwind_next_frame(&state))
>> +             return 0;
>> +
>> +     oldframe = unwind_get_stack_ptr(&state);
>> +
>> +     if (!unwind_next_frame(&state))
>> +             return 0;
>> +
>> +     frame = unwind_get_stack_ptr(&state);
>>
>> -     oldframe = __builtin_frame_address(2);
>> -     if (oldframe)
>> -             frame = __builtin_frame_address(3);
>>       /*
>>        * low ----------------------------------------------> high
>>        * [saved bp][saved ip][args][local vars][saved bp][saved ip]
>> @@ -71,8 +80,12 @@ int arch_within_stack_frames(const void * const stack,
>>                */
>>               if (obj + len <= frame)
>>                       return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
>> +
>> +             if (!unwind_next_frame(&state))
>> +                     return 0;
>
> I think there's another issue here.  This return needs to be tweaked.
> IIUC, if it reliably reaches the end of the stack without finding the
> object, it should return -1, but if there's something wrong with the
> frame pointers which prevents the unwinder from reaching the end of the
> stack, it should return 0.

Ah, yes, good catch. The callers of this function should have already
determined if the address is outside the stack itself, so this should
only be called when we expect the contents to be somewhere in the
stack. If the unwinder can't find it, then that should be an error,
yes.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/51] x86/asm/head: use a common function for starting CPUs
  2016-08-12 14:28 ` [PATCH v3 04/51] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
@ 2016-08-12 22:08   ` Nilay Vaish
  0 siblings, 0 replies; 99+ messages in thread
From: Nilay Vaish @ 2016-08-12 22:08 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On 12 August 2016 at 09:28, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> There are two different pieces of code for starting a CPU: start_cpu0()
> and the end of secondary_startup_64().  They're identical except for the
> stack setup.  Combine the common parts into a shared start_cpu()
> function.
>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/kernel/head_64.S | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index e048142..a212310 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -264,13 +264,17 @@ ENTRY(secondary_startup_64)
>         movl    $MSR_GS_BASE,%ecx
>         movl    initial_gs(%rip),%eax
>         movl    initial_gs+4(%rip),%edx
> -       wrmsr
> +       wrmsr
>
>         /* rsi is pointer to real mode structure with interesting info.
>            pass it to C */
>         movq    %rsi, %rdi
> -
> -       /* Finally jump to run C code and to be on real kernel address
> +       jmp     start_cpu
> +ENDPROC(secondary_startup_64)
> +
> +ENTRY(start_cpu)
> +       /*
> +        * Jump to run C code and to be on a real kernel address.
>          * Since we are running on identity-mapped space we have to jump
>          * to the full 64bit address, this is only possible as indirect
>          * jump.  In addition we need to ensure %cs is set so we make this
> @@ -299,7 +303,7 @@ ENTRY(secondary_startup_64)
>         pushq   $__KERNEL_CS    # set correct cs
>         pushq   %rax            # target address in negative space
>         lretq
> -ENDPROC(secondary_startup_64)
> +ENDPROC(start_cpu)
>
>  #include "verify_cpu.S"
>
> @@ -307,15 +311,11 @@ ENDPROC(secondary_startup_64)
>  /*
>   * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
>   * up already except stack. We just set up stack here. Then call
> - * start_secondary().
> + * start_secondary() via start_cpu().
>   */
>  ENTRY(start_cpu0)
> -       movq initial_stack(%rip),%rsp
> -       movq    initial_code(%rip),%rax
> -       pushq   $0              # fake return address to stop unwinder
> -       pushq   $__KERNEL_CS    # set correct cs
> -       pushq   %rax            # target address in negative space
> -       lretq
> +       movq    initial_stack(%rip), %rsp
> +       jmp     start_cpu
>  ENDPROC(start_cpu0)
>  #endif
>

Josh,  I think this patch looks good now, better than the previous
version.  Thanks for making the change.

--
Nilay

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  2016-08-12 14:28 ` [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
@ 2016-08-14  7:26   ` Andy Lutomirski
  2016-08-14 12:55     ` Brian Gerst
  2016-08-15 15:05     ` Josh Poimboeuf
  0 siblings, 2 replies; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  7:26 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On x86_32, when an interrupt happens from kernel space, SS and SP aren't
> pushed and the existing stack is used.  So pt_regs is effectively two
> words shorter, and the previous stack pointer is normally the memory
> after the shortened pt_regs, aka '&regs->sp'.
>
> But in the rare case where the interrupt hits right after the stack
> pointer has been changed to point to an empty stack, like for example
> when call_on_stack() is used, the address immediately after the
> shortened pt_regs is no longer on the stack.  In that case, instead of
> '&regs->sp', the previous stack pointer should be retrieved from the
> beginning of the current stack page.
>
> kernel_stack_pointer() wants to do that, but it forgets to dereference
> the pointer.  So instead of returning a pointer to the previous stack,
> it returns a pointer to the beginning of the current stack.
>
> Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>

This seems like a valid fix, but I'm not sure I agree with the intent
of the code.  &regs->sp really is the previous stack pointer in the
sense that the stack pointer was &regs->sp when the entry happened.
From an unwinder's perspective, how is:

movl [whatever], %esp
<-- interrupt

any different from:

movl [whatever], %esp
pushl [something]
<-- interrupt

Also, does x86_32 do this type of stack switching at all?  AFAICS
32-bit kernels don't use IRQ stacks in the first place.  Do they?  Am
I just missing the code that does it?

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack
  2016-08-12 14:28 ` [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack Josh Poimboeuf
@ 2016-08-14  7:29   ` Andy Lutomirski
  2016-08-17 20:30   ` Nilay Vaish
  1 sibling, 0 replies; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  7:29 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> The frame at the end of each idle task stack has a zeroed return
> address.  This is inconsistent with real task stacks, which have a real
> return address at that spot.  This inconsistency can be confusing for
> stack unwinders.
>
> Make it a real address by using the side effect of a call instruction to
> push the instruction pointer on the stack.
>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/kernel/head_64.S | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 3621ad2..c90f481 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -298,8 +298,9 @@ ENTRY(start_cpu)
>          *      REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
>          *              address given in m16:64.
>          */
> -       movq    initial_code(%rip),%rax
> -       pushq   $0              # fake return address to stop unwinder
> +       call    1f              # put return address on stack for unwinder
> +1:     xorq    %rbp, %rbp      # clear frame pointer
> +       movq    initial_code(%rip), %rax
>         pushq   $__KERNEL_CS    # set correct cs
>         pushq   %rax            # target address in negative space
>         lretq
> --
> 2.7.4
>

Seems reasonable.

Reviewed-by: Andy Lutomirski <luto@kernel.org>

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 15/51] x86/asm/head: standardize the end of the stack for idle tasks
  2016-08-12 14:28 ` [PATCH v3 15/51] x86/asm/head: standardize the end of the stack for idle tasks Josh Poimboeuf
@ 2016-08-14  7:30   ` Andy Lutomirski
  0 siblings, 0 replies; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  7:30 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Thanks to all the recent x86 entry code refactoring, most tasks' kernel
> stacks start at the same offset right above their saved pt_regs,
> regardless of which syscall was used to enter the kernel.  That creates
> a nice convention which makes it straightforward to identify the end of
> the stack, which can be useful for stack walking code which needs to
> verify the stack is sane.
>
> However, the boot CPU's idle "swapper" task doesn't follow that
> convention.  Fix that by starting its stack at a sizeof(pt_regs) offset
> from the end of the stack page.
>

I think this is an improvement.  If you want to be fancy, from memory
it might be nice to poke -1 into the orig_ax slot, but this doesn't
matter much.
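
(Purely to illustrate the orig_ax idea above -- a hypothetical sketch,
not code from this series, where 'idle' stands for whichever task_struct
is being set up:

	/* make the idle task's fabricated pt_regs look like a
	 * non-syscall entry, same convention as the real entry code */
	struct pt_regs *regs = task_pt_regs(idle);
	regs->orig_ax = -1;

-1 being the usual "not in a syscall" marker.)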

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/51] x86/32: put real return address on stack in entry code
  2016-08-12 14:28 ` [PATCH v3 16/51] x86/32: put real return address on stack in entry code Josh Poimboeuf
@ 2016-08-14  7:31   ` Andy Lutomirski
  2016-08-15 15:09     ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  7:31 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> This standardizes the stacks of idle tasks to be consistent with other
> tasks on 32-bit.

It might be nice to stick a ud2 or 1: hlt; jmp 1b or similar
afterwards to make it clear that initial_code can't return.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 19/51] x86/entry/32: rename 'error_code' to 'common_exception'
  2016-08-12 14:28 ` [PATCH v3 19/51] x86/entry/32: rename 'error_code' to 'common_exception' Josh Poimboeuf
@ 2016-08-14  7:40   ` Andy Lutomirski
  2016-08-15 15:30     ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  7:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> The 'error_code' label is awkwardly named, especially when it shows up
> in a stack trace.  Move it to its own local function and rename it to
> 'common_exception', analogous to the existing 'common_interrupt'.
>
> This also makes related stack traces more sensible.

This is okay with me.  You could also call it "error_entry" for
consistency with x86_64.

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 31/51] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-08-12 14:28 ` [PATCH v3 31/51] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
@ 2016-08-14  7:45   ` Andy Lutomirski
  2016-08-15 15:32     ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  7:45 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> show_stack_log_lvl() and dump_trace() are already preemption safe:
>
> - If they're running in interrupt context, preemption is already
>   disabled and the percpu irq stack pointers can be trusted.

I agree with the patch, but I have a minor nitpick about the
description.  The "irq stack" is an actual thing, but the relevant
stacks here aren't just the irq stack: they're the irq stack, the IST
stacks (on 64-bit), and the NMI stack (on 32-bit).  The same logic
applies, though.

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 32/51] x86/dumpstack: simplify in_exception_stack()
  2016-08-12 14:28 ` [PATCH v3 32/51] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
@ 2016-08-14  7:48   ` Andy Lutomirski
  2016-08-15 15:34     ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  7:48 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> in_exception_stack() does some bad, bad things just so the unwinder can
> print different values for different areas of the debug exception stack.
>
> There's no need to clarify where exactly on the stack it is.  Just print
> "#DB" and be done with it.

I'm okay with the printing part, but you're also using this to prevent
infinite looping.  Will this cause the unwind to fail if we go debug
-> page fault -> debug or similar?  (Or whatever actually uses the
deeper debug stacks?  I figured this out once and then forgot exactly
what's going on.  I really need to dust off my patches that stop using
IST for #DB.)

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 45/51] x86: remove 64-byte gap at end of irq stack
  2016-08-12 14:29 ` [PATCH v3 45/51] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
@ 2016-08-14  7:52   ` Andy Lutomirski
  2016-08-14 12:50     ` Brian Gerst
  2016-08-15 15:42     ` Josh Poimboeuf
  0 siblings, 2 replies; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  7:52 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> There has been a 64-byte gap at the end of the irq stack for at least 12
> years.  It predates git history, and I can't find any good reason for
> it.  Remove it.  What's the worst that could happen?

I can't think of any reason this would matter.

For that matter, do you have any idea why irq_stack_union is a union
or why we insist on sticking it at %gs:0?  Sure, the *canary* needs to
live at a fixed offset (because GCC is daft, sigh), but I don't see
what that has to do with the rest of the IRQ stack.

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 48/51] x86/unwind: warn if stack grows up
  2016-08-12 14:29 ` [PATCH v3 48/51] x86/unwind: warn if stack grows up Josh Poimboeuf
@ 2016-08-14  7:56   ` Andy Lutomirski
  2016-08-15 16:25     ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  7:56 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Add a sanity check to ensure the stack only grows down, and print a
> warning if the check fails.
>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/kernel/unwind_frame.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> index 5496462..f21b7ef 100644
> --- a/arch/x86/kernel/unwind_frame.c
> +++ b/arch/x86/kernel/unwind_frame.c
> @@ -32,6 +32,15 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
>  }
>  EXPORT_SYMBOL_GPL(unwind_get_return_address);
>
> +static size_t regs_size(struct pt_regs *regs)
> +{
> +       /* x86_32 regs from kernel mode are two words shorter */
> +       if (IS_ENABLED(CONFIG_X86_32) && !user_mode(regs))
> +               return sizeof(*regs) - (2*sizeof(long));
> +
> +       return sizeof(*regs);
> +}
> +
>  static bool is_last_task_frame(struct unwind_state *state)
>  {
>         unsigned long bp = (unsigned long)state->bp;
> @@ -85,6 +94,7 @@ bool unwind_next_frame(struct unwind_state *state)
>         struct pt_regs *regs;
>         unsigned long *next_bp, *next_sp;
>         size_t next_len;
> +       enum stack_type prev_type = state->stack_info.type;
>
>         if (unwind_done(state))
>                 return false;
> @@ -140,6 +150,18 @@ bool unwind_next_frame(struct unwind_state *state)
>         if (!update_stack_state(state, next_sp, next_len))
>                 goto bad_address;
>
> +       /* make sure it only unwinds up and doesn't overlap the last frame */
> +       if (state->stack_info.type == prev_type) {
> +               if (state->regs &&
> +                   (void *)next_sp < (void *)state->regs +
> +                                     regs_size(state->regs))
> +                       goto bad_address;
> +
> +               if (state->bp &&
> +                   (void *)next_sp < (void *)state->bp + FRAME_HEADER_SIZE)
> +                       goto bad_address;
> +       }
> +

Maybe this is obvious in context, but does something prevent this
error from firing if the stack switched?  That is:

pushq $rbp
movq $rsp, $rbp
...
movq [irq stack], $rsp
<- rsp and rbp have no particular relationship right now.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 41/51] x86/entry/unwind: create stack frames for saved interrupt registers
  2016-08-12 14:29 ` [PATCH v3 41/51] x86/entry/unwind: create stack frames for saved interrupt registers Josh Poimboeuf
@ 2016-08-14  8:10   ` Andy Lutomirski
  2016-08-15 16:33     ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  8:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> With frame pointers, when a task is interrupted, its stack is no longer
> completely reliable because the function could have been interrupted
> before it had a chance to save the previous frame pointer on the stack.
> So the caller of the interrupted function could get skipped by a stack
> trace.
>
> This is problematic for live patching, which needs to know whether a
> stack trace of a sleeping task can be relied upon.  There's currently no
> way to detect if a sleeping task was interrupted by a page fault
> exception or preemption before it went to sleep.
>
> Another issue is that when dumping the stack of an interrupted task, the
> unwinder has no way of knowing where the saved pt_regs registers are, so
> it can't print them.
>
> This solves those issues by encoding the pt_regs pointer in the frame
> pointer on entry from an interrupt or an exception.
>
> This patch also updates the unwinder to be able to decode it, because
> otherwise the unwinder would be broken by this change.
>
> Note that this causes a change in the behavior of the unwinder: each
> instance of a pt_regs on the stack is now considered a "frame".  So
> callers of unwind_get_return_address() will now get an occasional
> 'regs->ip' address that would have previously been skipped over.

Acked-by: Andy Lutomirski <luto@kernel.org>

with minor optional nitpicks below.

>
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/entry/calling.h       | 21 +++++++++++
>  arch/x86/entry/entry_32.S      | 40 ++++++++++++++++++---
>  arch/x86/entry/entry_64.S      | 10 ++++--
>  arch/x86/include/asm/unwind.h  | 18 ++++++++--
>  arch/x86/kernel/unwind_frame.c | 82 +++++++++++++++++++++++++++++++++++++-----
>  5 files changed, 153 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 9a9e588..ab799a3 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -201,6 +201,27 @@ For 32-bit we have the following conventions - kernel is built with
>         .byte 0xf1
>         .endm
>
> +       /*
> +        * This is a sneaky trick to help the unwinder find pt_regs on the
> +        * stack.  The frame pointer is replaced with an encoded pointer to
> +        * pt_regs.  The encoding is just a clearing of the highest-order bit,
> +        * which makes it an invalid address and is also a signal to the
> +        * unwinder that it's a pt_regs pointer in disguise.
> +        *
> +        * NOTE: This macro must be used *after* SAVE_EXTRA_REGS because it
> +        * corrupts the original rbp.
> +        */
> +.macro ENCODE_FRAME_POINTER ptregs_offset=0
> +#ifdef CONFIG_FRAME_POINTER
> +       .if \ptregs_offset
> +               leaq \ptregs_offset(%rsp), %rbp
> +       .else
> +               mov %rsp, %rbp
> +       .endif
> +       btr $63, %rbp
> +#endif
> +.endm
> +
>  #endif /* CONFIG_X86_64 */
>
>  /*
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 4396278..4006fa3 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -174,6 +174,23 @@
>         SET_KERNEL_GS %edx
>  .endm
>
> +/*
> + * This is a sneaky trick to help the unwinder find pt_regs on the
> + * stack.  The frame pointer is replaced with an encoded pointer to
> + * pt_regs.  The encoding is just a clearing of the highest-order bit,
> + * which makes it an invalid address and is also a signal to the
> + * unwinder that it's a pt_regs pointer in disguise.
> + *
> + * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the
> + * original rbp.
> + */
> +.macro ENCODE_FRAME_POINTER
> +#ifdef CONFIG_FRAME_POINTER
> +       mov %esp, %ebp
> +       btr $31, %ebp
> +#endif
> +.endm
> +
>  .macro RESTORE_INT_REGS
>         popl    %ebx
>         popl    %ecx
> @@ -205,10 +222,16 @@
>  .endm
>
>  ENTRY(ret_from_fork)
> +       call    1f

pushl $ret_from_fork is the same length and slightly less strange.
OTOH it forces a relocation, and this function doesn't return, so
there shouldn't be any performance issue, so this may save a byte or
two in the compressed image.

> +1:     push    $0

This could maybe use a comment.
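
(As an aside, the unwinder side of the encoding is presumably just the
inverse operation.  A rough sketch, not the exact code from the patch,
with made-up helper names:

	/* a real kernel stack address always has the top bit set, so a
	 * cleared top bit marks an encoded pt_regs pointer */
	static bool is_encoded_ptregs(unsigned long word)
	{
		return !(word & (1UL << (BITS_PER_LONG - 1)));
	}

	static struct pt_regs *decode_ptregs(unsigned long word)
	{
		/* set the top bit again to recover the pt_regs address */
		return (struct pt_regs *)(word | (1UL << (BITS_PER_LONG - 1)));
	}

The real code in unwind_frame.c is bigger because it also has to handle
the normal frame-pointer case.)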

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 39/51] x86/dumpstack: convert show_trace_log_lvl() to use the new unwinder
  2016-08-12 14:28 ` [PATCH v3 39/51] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
@ 2016-08-14  8:13   ` Andy Lutomirski
  2016-08-15 16:44     ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  8:13 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Convert show_trace_log_lvl() to use the new unwinder.  dump_trace() has
> been deprecated.

>
> Another change here is that callers of show_trace_log_lvl() don't need
> to provide the 'bp' argument.  The unwinder already finds the relevant
> frame pointer by unwinding until it reaches the first frame after the
> provided stack pointer.

I still think that the best long-term solution is to change the sp and
bp arguments to an optional state argument and to add a helper to
capture the current state for future unwinding, but this is okay too.
(If nothing else, this may improve DWARF's ability to recover function
arguments and such that are available when the trace is requested but
that are gone by the time the unwinder runs.  But mainly because it
seems simpler and more direct to me and therefore seems like it will
be less likely to get confused and skip too many frames.)

But I'm okay with this for now.

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 44/51] x86/dumpstack: print any pt_regs found on the stack
  2016-08-12 14:29 ` [PATCH v3 44/51] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
@ 2016-08-14  8:16   ` Andy Lutomirski
  0 siblings, 0 replies; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  8:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Now that we can find pt_regs registers on the stack, print them.  Here's
> an example of what it looks like:
>
> Call Trace:
>  <IRQ>
>  [<ffffffff8144b793>] dump_stack+0x86/0xc3
>  [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0
>  [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60
>  [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50
>  [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0
> RIP: 0010:[<ffffffff818aef43>]  [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60
> RSP: 0018:ffff880079c4f760  EFLAGS: 00000202
> RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007
> RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000
> RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540
> R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000
>  <EOI>
>  [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250
>  [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250
>  [<ffffffff818a7b61>] __schedule+0x3e1/0xb20
>  ...
>  [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0
>  [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
>  [<ffffffff818b1dd8>] async_page_fault+0x28/0x30
> RIP: 0010:[<ffffffff8145b062>]  [<ffffffff8145b062>] __clear_user+0x42/0x70
> RSP: 0018:ffff880079c4fd38  EFLAGS: 00010202
> RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
> RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928
> R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
> R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400
>  [<ffffffff8145b043>] ? __clear_user+0x23/0x70
>  [<ffffffff8145b0fb>] clear_user+0x2b/0x40
>  [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750
>  [<ffffffff8129a591>] search_binary_handler+0xa1/0x200
>  [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0
>  [<ffffffff8129b5f3>] ? do_execveat_common.isra.36+0x623/0x9f0
>  [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50
>  [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0
>  [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:[<00007fd2e2f2e537>]  [<00007fd2e2f2e537>] 0x7fd2e2f2e537
> RSP: 002b:00007ffc449c5fc8  EFLAGS: 00000246
> RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537
> RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029
> RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20
> R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40
> R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01

I really like this, and I think it'll be quite useful for future debugging.

Some day I want to teach this thing to print gsbase as well, but it's
not quite obvious to me how to do that except maybe by using DWARF and
a lot of special case code.

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 42/51] x86/unwind: create stack frames for saved syscall registers
  2016-08-12 14:29 ` [PATCH v3 42/51] x86/unwind: create stack frames for saved syscall registers Josh Poimboeuf
@ 2016-08-14  8:23   ` Andy Lutomirski
  2016-08-15 16:52     ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14  8:23 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> The entry code doesn't encode pt_regs for syscalls.  But they're always
> at the same location, so we can add a manual check for them.

At first I thought these would be useless (they're the *user* state
and aren't directly relevant to the kernel), but then I realized that
they could be extremely valuable: they contain the syscall args.  Do
they display in OOPS dumps?  It might be nice to add orig_ax to the
regs display too so we get the syscall nr as well.

--Andy

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 45/51] x86: remove 64-byte gap at end of irq stack
  2016-08-14  7:52   ` Andy Lutomirski
@ 2016-08-14 12:50     ` Brian Gerst
  2016-08-15 17:00       ` Josh Poimboeuf
  2016-08-15 15:42     ` Josh Poimboeuf
  1 sibling, 1 reply; 99+ messages in thread
From: Brian Gerst @ 2016-08-14 12:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, linux-kernel, Linus Torvalds, Steven Rostedt, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Sun, Aug 14, 2016 at 3:52 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> There has been a 64-byte gap at the end of the irq stack for at least 12
>> years.  It predates git history, and I can't find any good reason for
>> it.  Remove it.  What's the worst that could happen?
>
> I can't think of any reason this would matter.
>
> For that matter, do you have any idea why irq_stack_union is a union
> or why we insist on sticking it at %gs:0?  Sure, the *canary* needs to
> live at a fixed offset (because GCC is daft, sigh), but I don't see
> what that has to do with the rest of the IRQ stack.
>
> --Andy

Because the IRQ stack requires page alignment so it was convenient to
put it at the start of the per-cpu area.  I don't think at the time I
wrote this there was specific support for page-aligned objects in
per-cpu memory.  Since stacks grow down, it was tolerable to reserve a
few bytes at the bottom for the canary.
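
(For anyone following along, the layout being described is roughly:

	union irq_stack_union {
		char irq_stack[IRQ_STACK_SIZE];
		/*
		 * GCC's stack protector hardcodes the canary at %gs:40,
		 * so reserve the bottom of the stack for it.
		 */
		struct {
			char gs_base[40];
			unsigned long stack_canary;
		};
	};

quoted from memory, so treat it as a sketch rather than the exact
declaration.)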

What would be great is if we could leverage the new GCC plugin tools
to reimplement stack protector in a manner that is more compatible
with the kernel environment.  It would make the stack canary a true
per-cpu variable instead of the hard-coded TLS-based location it is
now.  That would make 64-bit be able to use normal delta per-cpu
offsets instead of zero-based, and would allow 32-bit to always do
lazy GS.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  2016-08-14  7:26   ` Andy Lutomirski
@ 2016-08-14 12:55     ` Brian Gerst
  2016-08-14 13:42       ` Andy Lutomirski
  2016-08-15 15:05     ` Josh Poimboeuf
  1 sibling, 1 reply; 99+ messages in thread
From: Brian Gerst @ 2016-08-14 12:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, linux-kernel, Linus Torvalds, Steven Rostedt, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Sun, Aug 14, 2016 at 3:26 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> On x86_32, when an interrupt happens from kernel space, SS and SP aren't
>> pushed and the existing stack is used.  So pt_regs is effectively two
>> words shorter, and the previous stack pointer is normally the memory
>> after the shortened pt_regs, aka '&regs->sp'.
>>
>> But in the rare case where the interrupt hits right after the stack
>> pointer has been changed to point to an empty stack, like for example
>> when call_on_stack() is used, the address immediately after the
>> shortened pt_regs is no longer on the stack.  In that case, instead of
>> '&regs->sp', the previous stack pointer should be retrieved from the
>> beginning of the current stack page.
>>
>> kernel_stack_pointer() wants to do that, but it forgets to dereference
>> the pointer.  So instead of returning a pointer to the previous stack,
>> it returns a pointer to the beginning of the current stack.
>>
>> Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
>> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>
> This seems like a valid fix, but I'm not sure I agree with the intent
> of the code.  &regs->sp really is the previous stack pointer in the
> sense that the stack pointer was &regs->sp when the entry happened.
> From an unwinder's perspective, how is:
>
> movl [whatever], $esp
> <-- interrupt
>
> any different from:
>
> movl [whatever], $esp
> pushl [something]
> <-- interrupt
>
> Also, does x86_32 do this type of stack switching at all?  AFAICS
> 32-bit kernels don't use IRQ stacks in the first place.  Do they?  Am
> I just missing the code that does it?

32-bit uses a software-based stack switch to run on the IRQ stack.
See execute_on_irq_stack() in irq_32.c.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  2016-08-14 12:55     ` Brian Gerst
@ 2016-08-14 13:42       ` Andy Lutomirski
  0 siblings, 0 replies; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-14 13:42 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, linux-kernel, Linus Torvalds, Steven Rostedt, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Sun, Aug 14, 2016 at 5:55 AM, Brian Gerst <brgerst@gmail.com> wrote:
> On Sun, Aug 14, 2016 at 3:26 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>> On x86_32, when an interrupt happens from kernel space, SS and SP aren't
>>> pushed and the existing stack is used.  So pt_regs is effectively two
>>> words shorter, and the previous stack pointer is normally the memory
>>> after the shortened pt_regs, aka '&regs->sp'.
>>>
>>> But in the rare case where the interrupt hits right after the stack
>>> pointer has been changed to point to an empty stack, like for example
>>> when call_on_stack() is used, the address immediately after the
>>> shortened pt_regs is no longer on the stack.  In that case, instead of
>>> '&regs->sp', the previous stack pointer should be retrieved from the
>>> beginning of the current stack page.
>>>
>>> kernel_stack_pointer() wants to do that, but it forgets to dereference
>>> the pointer.  So instead of returning a pointer to the previous stack,
>>> it returns a pointer to the beginning of the current stack.
>>>
>>> Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
>>> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>>
>> This seems like a valid fix, but I'm not sure I agree with the intent
>> of the code.  &regs->sp really is the previous stack pointer in the
>> sense that the stack pointer was &regs->sp when the entry happened.
>> From an unwinder's perspective, how is:
>>
>> movl [whatever], $esp
>> <-- interrupt
>>
>> any different from:
>>
>> movl [whatever], $esp
>> pushl [something]
>> <-- interrupt
>>
>> Also, does x86_32 do this type of stack switching at all?  AFAICS
>> 32-bit kernels don't use IRQ stacks in the first place.  Do they?  Am
>> I just missing the code that does it?
>
> 32-bit uses a software-based stack switch to run on the IRQ stack.
> See execute_on_irq_stack() in irq_32.c.
>

Indeed, thanks.

I'm still not convinced that kernel_stack_pointer() needs to handle this.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  2016-08-14  7:26   ` Andy Lutomirski
  2016-08-14 12:55     ` Brian Gerst
@ 2016-08-15 15:05     ` Josh Poimboeuf
  2016-08-15 17:22       ` Josh Poimboeuf
  1 sibling, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 15:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 12:26:29AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On x86_32, when an interrupt happens from kernel space, SS and SP aren't
> > pushed and the existing stack is used.  So pt_regs is effectively two
> > words shorter, and the previous stack pointer is normally the memory
> > after the shortened pt_regs, aka '&regs->sp'.
> >
> > But in the rare case where the interrupt hits right after the stack
> > pointer has been changed to point to an empty stack, like for example
> > when call_on_stack() is used, the address immediately after the
> > shortened pt_regs is no longer on the stack.  In that case, instead of
> > '&regs->sp', the previous stack pointer should be retrieved from the
> > beginning of the current stack page.
> >
> > kernel_stack_pointer() wants to do that, but it forgets to dereference
> > the pointer.  So instead of returning a pointer to the previous stack,
> > it returns a pointer to the beginning of the current stack.
> >
> > Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> 
> This seems like a valid fix, but I'm not sure I agree with the intent
> of the code.  &regs->sp really is the previous stack pointer in the
> sense that the stack pointer was &regs->sp when the entry happened.
> From an unwinder's perspective, how is:
> 
> movl [whatever], $esp
> <-- interrupt
> 
> any different from:
> 
> movl [whatever], $esp
> pushl [something]
> <-- interrupt

In the first case, the stack is empty, so reading the value pointed to
by %esp would result in accessing outside the bounds of the stack.
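
For context, the function is trying to do roughly this (paraphrased from
memory with the fix applied, so treat it as a sketch rather than the
exact ptrace.c code):

	unsigned long kernel_stack_pointer(struct pt_regs *regs)
	{
		unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
		unsigned long sp = (unsigned long)&regs->sp;
		u32 *prev_esp;

		/* normal case: the word after the shortened pt_regs is
		 * still within the current stack page */
		if (context == (sp & ~(THREAD_SIZE - 1)))
			return sp;

		/* otherwise use the previous stack pointer saved at the
		 * beginning of the current stack page */
		prev_esp = (u32 *)context;
		if (*prev_esp)
			return (unsigned long)*prev_esp;

		return (unsigned long)regs;
	}

The bug is that the current code returns 'prev_esp' itself instead of
'*prev_esp'.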

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/51] x86/32: put real return address on stack in entry code
  2016-08-14  7:31   ` Andy Lutomirski
@ 2016-08-15 15:09     ` Josh Poimboeuf
  2016-08-15 18:04       ` H. Peter Anvin
  0 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 15:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 12:31:47AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > This standardizes the stacks of idle tasks to be consistent with other
> > tasks on 32-bit.
> 
> It might be nice to stick a ud2 or 1: hlt; jmp 1b or similar
> afterwards to make it clear that initial_code can't return.

Yeah, I'll do something like that.

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 19/51] x86/entry/32: rename 'error_code' to 'common_exception'
  2016-08-14  7:40   ` Andy Lutomirski
@ 2016-08-15 15:30     ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 15:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 12:40:03AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > The 'error_code' label is awkwardly named, especially when it shows up
> > in a stack trace.  Move it to its own local function and rename it to
> > 'common_exception', analogous to the existing 'common_interrupt'.
> >
> > This also makes related stack traces more sensible.
> 
> This is okay with me.  You could also call it "error_entry" for
> consistency with x86_64.

On x86_64, error_entry is just a helper function which doesn't call the
C handler, and so it doesn't usually show up in the stack trace.  So its
scope is quite different from error_code/common_exception.

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 31/51] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-08-14  7:45   ` Andy Lutomirski
@ 2016-08-15 15:32     ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 15:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 12:45:35AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > show_stack_log_lvl() and dump_trace() are already preemption safe:
> >
> > - If they're running in interrupt context, preemption is already
> >   disabled and the percpu irq stack pointers can be trusted.
> 
> I agree with the patch, but I have a minor nitpick about the
> description.  The "irq stack" is an actual thing, but the relevant
> stacks here aren't just the irq stack: they're the irq stack, the IST
> stacks (on 64-bit), and the NMI stack (on 32-bit).  The same logic
> applies, though.

Yeah.  Maybe I'll just remove the "irq" qualifier and instead call them
"percpu stack pointers".

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 32/51] x86/dumpstack: simplify in_exception_stack()
  2016-08-14  7:48   ` Andy Lutomirski
@ 2016-08-15 15:34     ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 15:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 12:48:15AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > in_exception_stack() does some bad, bad things just so the unwinder can
> > print different values for different areas of the debug exception stack.
> >
> > There's no need to clarify where exactly on the stack it is.  Just print
> > "#DB" and be done with it.
> 
> I'm okay with the printing part, but you're also using this to prevent
> infinite looping.  Will this cause the unwind to fail if we go debug
> -> page fault -> debug or similar?

Yes, but that behavior already existed.  This patch doesn't change that;
it just makes it clearer what's going on.

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 45/51] x86: remove 64-byte gap at end of irq stack
  2016-08-14  7:52   ` Andy Lutomirski
  2016-08-14 12:50     ` Brian Gerst
@ 2016-08-15 15:42     ` Josh Poimboeuf
  1 sibling, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 15:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 12:52:40AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > There has been a 64-byte gap at the end of the irq stack for at least 12
> > years.  It predates git history, and I can't find any good reason for
> > it.  Remove it.  What's the worst that could happen?
> 
> I can't think of any reason this would matter.
> 
> For that matter, do you have any idea why irq_stack_union is a union
> or why we insist on sticking it at %gs:0?  Sure, the *canary* needs to
> live at a fixed offset (because GCC is daft, sigh), but I don't see
> what that has to do with the rest of the IRQ stack.

Good question.  I have no idea...

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 48/51] x86/unwind: warn if stack grows up
  2016-08-14  7:56   ` Andy Lutomirski
@ 2016-08-15 16:25     ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 16:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 12:56:40AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > Add a sanity check to ensure the stack only grows down, and print a
> > warning if the check fails.
> >
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > ---
> >  arch/x86/kernel/unwind_frame.c | 26 ++++++++++++++++++++++++--
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> > index 5496462..f21b7ef 100644
> > --- a/arch/x86/kernel/unwind_frame.c
> > +++ b/arch/x86/kernel/unwind_frame.c
> > @@ -32,6 +32,15 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
> >  }
> >  EXPORT_SYMBOL_GPL(unwind_get_return_address);
> >
> > +static size_t regs_size(struct pt_regs *regs)
> > +{
> > +       /* x86_32 regs from kernel mode are two words shorter */
> > +       if (IS_ENABLED(CONFIG_X86_32) && !user_mode(regs))
> > +               return sizeof(*regs) - (2*sizeof(long));
> > +
> > +       return sizeof(*regs);
> > +}
> > +
> >  static bool is_last_task_frame(struct unwind_state *state)
> >  {
> >         unsigned long bp = (unsigned long)state->bp;
> > @@ -85,6 +94,7 @@ bool unwind_next_frame(struct unwind_state *state)
> >         struct pt_regs *regs;
> >         unsigned long *next_bp, *next_sp;
> >         size_t next_len;
> > +       enum stack_type prev_type = state->stack_info.type;
> >
> >         if (unwind_done(state))
> >                 return false;
> > @@ -140,6 +150,18 @@ bool unwind_next_frame(struct unwind_state *state)
> >         if (!update_stack_state(state, next_sp, next_len))
> >                 goto bad_address;
> >
> > +       /* make sure it only unwinds up and doesn't overlap the last frame */
> > +       if (state->stack_info.type == prev_type) {
> > +               if (state->regs &&
> > +                   (void *)next_sp < (void *)state->regs +
> > +                                     regs_size(state->regs))
> > +                       goto bad_address;
> > +
> > +               if (state->bp &&
> > +                   (void *)next_sp < (void *)state->bp + FRAME_HEADER_SIZE)
> > +                       goto bad_address;
> > +       }
> > +
> 
> Maybe this is obvious in context, but does something prevent this
> error from firing if the stack switched?  That is:
> 
> pushq $rbp
> movq $rsp, $rbp
> ...
> movq [irq stack], $rsp
> <- rsp and rbp have no particular relationship right now.

Short answer:

No, because the above warning only happens between two "frame" pointers
(where frame pointer might be a regs pointer) when the two frames are on
the same stack.  This warning has nothing to do with the stack pointer,
despite the "next_sp" name.  I should probably rename "next_sp" to
"next_frame" or something.

Long answer:

The unwinder is frame-pointer based, so in most cases it completely
ignores the value of the stack pointer.  The only exceptions are:

a) in __unwind_start() where it can use the value of regs->sp to determine how
   many frames to skip; and

b) when reading the next stack pointer to switch to the next stack.

If for example the regs where taken from an interrupt right after the
stack had been switched, then no frame would contain the stack pointer,
and __unwind_start() would skip all the frames, and the unwind would be
reported as empty.

Such edge cases are exceedingly rare, and are acceptable IMO, because
frame pointers and interrupts are inherently not 100% compatible.  And
these edge cases already exist in today's code.

(And I should reiterate that even when the unwinder breaks down like
that, the oops dump code should still keep going and show all the
addresses anyway.)
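
To illustrate (a), the skip logic boils down to something like this
(a simplified sketch of __unwind_start(), not the verbatim code):

	/* step frames until we reach the caller-provided first_frame */
	while (!unwind_done(state) && state->bp < first_frame)
		unwind_next_frame(state);

So if no frame on the stack contains first_frame (e.g. regs->sp was
sampled right after a stack switch), the loop runs off the end of the
stack and the unwind comes back empty, as described above.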

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 41/51] x86/entry/unwind: create stack frames for saved interrupt registers
  2016-08-14  8:10   ` Andy Lutomirski
@ 2016-08-15 16:33     ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 16:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 01:10:42AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > With frame pointers, when a task is interrupted, its stack is no longer
> > completely reliable because the function could have been interrupted
> > before it had a chance to save the previous frame pointer on the stack.
> > So the caller of the interrupted function could get skipped by a stack
> > trace.
> >
> > This is problematic for live patching, which needs to know whether a
> > stack trace of a sleeping task can be relied upon.  There's currently no
> > way to detect if a sleeping task was interrupted by a page fault
> > exception or preemption before it went to sleep.
> >
> > Another issue is that when dumping the stack of an interrupted task, the
> > unwinder has no way of knowing where the saved pt_regs registers are, so
> > it can't print them.
> >
> > This solves those issues by encoding the pt_regs pointer in the frame
> > pointer on entry from an interrupt or an exception.
> >
> > This patch also updates the unwinder to be able to decode it, because
> > otherwise the unwinder would be broken by this change.
> >
> > Note that this causes a change in the behavior of the unwinder: each
> > instance of a pt_regs on the stack is now considered a "frame".  So
> > callers of unwind_get_return_address() will now get an occasional
> > 'regs->ip' address that would have previously been skipped over.
> 
> Acked-by: Andy Lutomirski <luto@kernel.org>
> 
> with minor optional nitpicks below.
> 
> >
> > Suggested-by: Andy Lutomirski <luto@amacapital.net>
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > ---
> >  arch/x86/entry/calling.h       | 21 +++++++++++
> >  arch/x86/entry/entry_32.S      | 40 ++++++++++++++++++---
> >  arch/x86/entry/entry_64.S      | 10 ++++--
> >  arch/x86/include/asm/unwind.h  | 18 ++++++++--
> >  arch/x86/kernel/unwind_frame.c | 82 +++++++++++++++++++++++++++++++++++++-----
> >  5 files changed, 153 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> > index 9a9e588..ab799a3 100644
> > --- a/arch/x86/entry/calling.h
> > +++ b/arch/x86/entry/calling.h
> > @@ -201,6 +201,27 @@ For 32-bit we have the following conventions - kernel is built with
> >         .byte 0xf1
> >         .endm
> >
> > +       /*
> > +        * This is a sneaky trick to help the unwinder find pt_regs on the
> > +        * stack.  The frame pointer is replaced with an encoded pointer to
> > +        * pt_regs.  The encoding is just a clearing of the highest-order bit,
> > +        * which makes it an invalid address and is also a signal to the
> > +        * unwinder that it's a pt_regs pointer in disguise.
> > +        *
> > +        * NOTE: This macro must be used *after* SAVE_EXTRA_REGS because it
> > +        * corrupts the original rbp.
> > +        */
> > +.macro ENCODE_FRAME_POINTER ptregs_offset=0
> > +#ifdef CONFIG_FRAME_POINTER
> > +       .if \ptregs_offset
> > +               leaq \ptregs_offset(%rsp), %rbp
> > +       .else
> > +               mov %rsp, %rbp
> > +       .endif
> > +       btr $63, %rbp
> > +#endif
> > +.endm
> > +
> >  #endif /* CONFIG_X86_64 */
> >
> >  /*
> > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> > index 4396278..4006fa3 100644
> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -174,6 +174,23 @@
> >         SET_KERNEL_GS %edx
> >  .endm
> >
> > +/*
> > + * This is a sneaky trick to help the unwinder find pt_regs on the
> > + * stack.  The frame pointer is replaced with an encoded pointer to
> > + * pt_regs.  The encoding is just a clearing of the highest-order bit,
> > + * which makes it an invalid address and is also a signal to the
> > + * unwinder that it's a pt_regs pointer in disguise.
> > + *
> > + * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the
> > + * original rbp.
> > + */
> > +.macro ENCODE_FRAME_POINTER
> > +#ifdef CONFIG_FRAME_POINTER
> > +       mov %esp, %ebp
> > +       btr $31, %ebp
> > +#endif
> > +.endm
> > +
> >  .macro RESTORE_INT_REGS
> >         popl    %ebx
> >         popl    %ecx
> > @@ -205,10 +222,16 @@
> >  .endm
> >
> >  ENTRY(ret_from_fork)
> > +       call    1f
> 
> pushl $ret_from_fork is the same length and slightly less strange.
> OTOH it forces a relocation, and this function doesn't return, so
> there shouldn't be any performance issue, so this may save a byte or
> two in the compressed image.
> 
> > +1:     push    $0
> 
> This could maybe use a comment.

Oops.  This ret_from_fork bit was meant for a separate patch.

I think the problem with "pushl $ret_from_fork" is that
ret_from_fork+0x0 is not a valid call return address.
printk_stack_address() will show it as the end of the previous function
in the file.

Anyway, this definitely needs a comment and should be split out to a
separate patch.

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 39/51] x86/dumpstack: convert show_trace_log_lvl() to use the new unwinder
  2016-08-14  8:13   ` Andy Lutomirski
@ 2016-08-15 16:44     ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 16:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 01:13:54AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > Convert show_trace_log_lvl() to use the new unwinder.  dump_trace() has
> > been deprecated.
> 
> >
> > Another change here is that callers of show_trace_log_lvl() don't need
> > to provide the 'bp' argument.  The unwinder already finds the relevant
> > frame pointer by unwinding until it reaches the first frame after the
> > provided stack pointer.
> 
> I still think that the best long-term solution is to change the sp and
> bp arguments to an optional state argument and to add a helper to
> capture the current state for future unwinding, but this is okay too.
> (If nothing else, this may improve DWARF's ability to recover function
> arguments and such that are available when the trace is requested but
> that are gone by the time the unwinder runs.  But mainly because it
> seems simpler and more direct to me and therefore seems like it will
> be less likely to get confused and skip too many frames.)
> 
> But I'm okay with this for now.

Long term, maybe something like an unwind_capture_state() helper might
be best (though I'm not yet convinced either way).  However it wouldn't
quite work with today's code because show_stack(), which is implemented
for all architectures, takes 'sp' as an argument.
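
(For the sake of discussion, such a helper could be as thin as this --
a purely hypothetical sketch, nothing like it exists in the series:

	/* snapshot the current unwind position for a later stack dump */
	static inline void unwind_capture_state(struct unwind_state *state)
	{
		unwind_start(state, current, NULL, NULL);
	}

and callers would then pass the captured state around instead of sp/bp
pairs.)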

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 42/51] x86/unwind: create stack frames for saved syscall registers
  2016-08-14  8:23   ` Andy Lutomirski
@ 2016-08-15 16:52     ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 16:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Sun, Aug 14, 2016 at 01:23:11AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > The entry code doesn't encode pt_regs for syscalls.  But they're always
> > at the same location, so we can add a manual check for them.
> 
> At first I thought these would be useless (they're the *user* state
> and aren't directly relevant to the kernel), but then I realized that
> they could be extremely valuable: they contain the syscall args.

Yeah, that's what I was thinking.  I'll add that justification to the
patch header.

> Do they display in OOPS dumps?

Yes.

> It might be nice to add orig_ax to the regs display too so we get the
> syscall nr as well.

Hm, good idea.  I'm surprised orig_ax isn't already being shown since it
could be useful for exceptions and irqs too.
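
(For anyone following along, the "manual check" works because the saved
user registers for a syscall always sit at the same spot at the top of
the task's kernel stack.  A rough sketch of the idea:

	/* the syscall pt_regs live at a fixed offset from the top of
	 * the task's kernel stack */
	struct pt_regs *regs = task_pt_regs(task);

	/* regs->orig_ax holds the syscall number; regs->di/si/dx/r10/r8/r9
	 * hold its arguments on 64-bit */

so the unwinder can simply compare a candidate frame address against
that known location.)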

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 45/51] x86: remove 64-byte gap at end of irq stack
  2016-08-14 12:50     ` Brian Gerst
@ 2016-08-15 17:00       ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 17:00 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, linux-kernel, Linus Torvalds, Steven Rostedt, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Sun, Aug 14, 2016 at 08:50:57AM -0400, Brian Gerst wrote:
> On Sun, Aug 14, 2016 at 3:52 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> There has been a 64-byte gap at the end of the irq stack for at least 12
> >> years.  It predates git history, and I can't find any good reason for
> >> it.  Remove it.  What's the worst that could happen?
> >
> > I can't think of any reason this would matter.
> >
> > For that matter, do you have any idea why irq_stack_union is a union
> > or why we insist on sticking it at %gs:0?  Sure, the *canary* needs to
> > live at a fixed offset (because GCC is daft, sigh), but I don't see
> > what that has to do with the rest of the IRQ stack.
> >
> > --Andy
> 
> Because the IRQ stack requires page alignment so it was convenient to
> put it at the start of the per-cpu area.  I don't think at the time I
> wrote this there was specific support for page-aligned objects in
> per-cpu memory.  Since stacks grow down, it was tolerable to reserve a
> few bytes at the bottom for the canary.

Hm.  Sounds like another good opportunity for a cleanup (though it's
well outside the scope of this patch set).

> What would be great is if we could leverage the new GCC plugin tools
> to reimplement stack protector in a manner that is more compatible
> with the kernel environment.  It would make the stack canary a true
> per-cpu variable instead of the hard-coded TLS-based location it is
> now.  That would make 64-bit be able to use normal delta per-cpu
> offsets instead of zero-based, and would allow 32-bit to always do
> lazy GS.
> 
> --
> Brian Gerst

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  2016-08-15 15:05     ` Josh Poimboeuf
@ 2016-08-15 17:22       ` Josh Poimboeuf
  2016-08-15 20:04         ` Andy Lutomirski
  0 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 17:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Mon, Aug 15, 2016 at 10:05:58AM -0500, Josh Poimboeuf wrote:
> On Sun, Aug 14, 2016 at 12:26:29AM -0700, Andy Lutomirski wrote:
> > On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > On x86_32, when an interrupt happens from kernel space, SS and SP aren't
> > > pushed and the existing stack is used.  So pt_regs is effectively two
> > > words shorter, and the previous stack pointer is normally the memory
> > > after the shortened pt_regs, aka '&regs->sp'.
> > >
> > > But in the rare case where the interrupt hits right after the stack
> > > pointer has been changed to point to an empty stack, like for example
> > > when call_on_stack() is used, the address immediately after the
> > > shortened pt_regs is no longer on the stack.  In that case, instead of
> > > '&regs->sp', the previous stack pointer should be retrieved from the
> > > beginning of the current stack page.
> > >
> > > kernel_stack_pointer() wants to do that, but it forgets to dereference
> > > the pointer.  So instead of returning a pointer to the previous stack,
> > > it returns a pointer to the beginning of the current stack.
> > >
> > > Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
> > > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > 
> > This seems like a valid fix, but I'm not sure I agree with the intent
> > of the code.  &regs->sp really is the previous stack pointer in the
> > sense that the stack pointer was &regs->sp when the entry happened.
> > From an unwinder's perspective, how is:
> > 
> > movl [whatever], %esp
> > <-- interrupt
> > 
> > any different from:
> > 
> > movl [whatever], %esp
> > pushl [something]
> > <-- interrupt
> 
> In the first case, the stack is empty, so reading the value pointed to
> by %esp would result in accessing outside the bounds of the stack.

...but maybe your point is that following the previous stack pointer is
outside the scope of kernel_stack_pointer() and should instead be done
by its caller.  Especially considering the fact that the x86_64 version
of this function doesn't have this "feature".  In which case I think I
would agree.

However I think fixing that is outside the scope of this
already-way-too-big patch set.
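
For reference, with this fix applied the x86_32 logic looks roughly
like the following (a simplified sketch, not the exact kernel source;
helper and variable names are approximate):

  unsigned long kernel_stack_pointer(struct pt_regs *regs)
  {
	unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
	unsigned long sp = (unsigned long)&regs->sp;
	u32 *prev_esp;

	/* Normal case: &regs->sp still points into the current stack. */
	if (context == (sp & ~(THREAD_SIZE - 1)))
		return sp;

	/*
	 * Rare case: the interrupt hit while the stack was empty, so
	 * the previous stack pointer has to be read from the saved
	 * value at the start of the stack page.  The bug was returning
	 * the pointer itself instead of dereferencing it.
	 */
	prev_esp = (u32 *)context;
	if (*prev_esp)
		return (unsigned long)*prev_esp;

	return (unsigned long)regs;
  }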

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/51] x86/32: put real return address on stack in entry code
  2016-08-15 15:09     ` Josh Poimboeuf
@ 2016-08-15 18:04       ` H. Peter Anvin
  2016-08-15 18:25         ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: H. Peter Anvin @ 2016-08-15 18:04 UTC (permalink / raw)
  To: Josh Poimboeuf, Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, X86 ML, linux-kernel,
	Linus Torvalds, Steven Rostedt, Brian Gerst, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On 08/15/16 08:09, Josh Poimboeuf wrote:
> On Sun, Aug 14, 2016 at 12:31:47AM -0700, Andy Lutomirski wrote:
>> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>> This standardizes the stacks of idle tasks to be consistent with other
>>> tasks on 32-bit.
>>
>> It might be nice to stick a ud2 or 1: hlt; jmp 1b or similar
>> afterwards to make it clear that initial_code can't return.
> 
> Yeah, I'll do something like that.
> 

"Standardizing the stack" how?  A zero on the stack terminates the stack
trace.

	-hpa

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/51] x86/32: put real return address on stack in entry code
  2016-08-15 18:04       ` H. Peter Anvin
@ 2016-08-15 18:25         ` Josh Poimboeuf
  2016-08-15 19:22           ` H. Peter Anvin
  0 siblings, 1 reply; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 18:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Mon, Aug 15, 2016 at 11:04:42AM -0700, H. Peter Anvin wrote:
> On 08/15/16 08:09, Josh Poimboeuf wrote:
> > On Sun, Aug 14, 2016 at 12:31:47AM -0700, Andy Lutomirski wrote:
> >> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>> This standardizes the stacks of idle tasks to be consistent with other
> >>> tasks on 32-bit.
> >>
> >> It might be nice to stick a ud2 or 1: hlt; jmp 1b or similar
> >> afterwards to make it clear that initial_code can't return.
> > 
> > Yeah, I'll do something like that.
> > 
> 
> "Standardizing the stack" how?  A zero on the stack terminates the stack
> trace.

Instead of zero, user tasks have a real return address at that spot.
This makes idle tasks consistent with that, so we have a well defined
"end of stack".  Also it makes the stack trace more useful since it
shows what entry code was involved in calling into C.

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/51] x86/32: put real return address on stack in entry code
  2016-08-15 18:25         ` Josh Poimboeuf
@ 2016-08-15 19:22           ` H. Peter Anvin
  2016-08-15 20:06             ` Josh Poimboeuf
  0 siblings, 1 reply; 99+ messages in thread
From: H. Peter Anvin @ 2016-08-15 19:22 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On 08/15/16 11:25, Josh Poimboeuf wrote:
> On Mon, Aug 15, 2016 at 11:04:42AM -0700, H. Peter Anvin wrote:
>> On 08/15/16 08:09, Josh Poimboeuf wrote:
>>> On Sun, Aug 14, 2016 at 12:31:47AM -0700, Andy Lutomirski wrote:
>>>> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>>>> This standardizes the stacks of idle tasks to be consistent with other
>>>>> tasks on 32-bit.
>>>>
>>>> It might be nice to stick a ud2 or 1: hlt; jmp 1b or similar
>>>> afterwards to make it clear that initial_code can't return.
>>>
>>> Yeah, I'll do something like that.
>>>
>>
>> "Standardizing the stack" how?  A zero on the stack terminates the stack
>> trace.
> 
> Instead of zero, user tasks have a real return address at that spot.
> This makes idle tasks consistent with that, so we have a well defined
> "end of stack".  Also it makes the stack trace more useful since it
> shows what entry code was involved in calling into C.
> 

So how is the stack terminated, and do things like kdb and kgdb need
modifications?  Or is there now a stack termination above the struct
pt_regs?

	-hpa

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  2016-08-15 17:22       ` Josh Poimboeuf
@ 2016-08-15 20:04         ` Andy Lutomirski
  0 siblings, 0 replies; 99+ messages in thread
From: Andy Lutomirski @ 2016-08-15 20:04 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Mon, Aug 15, 2016 at 10:22 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Mon, Aug 15, 2016 at 10:05:58AM -0500, Josh Poimboeuf wrote:
>> On Sun, Aug 14, 2016 at 12:26:29AM -0700, Andy Lutomirski wrote:
>> > On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > > On x86_32, when an interrupt happens from kernel space, SS and SP aren't
>> > > pushed and the existing stack is used.  So pt_regs is effectively two
>> > > words shorter, and the previous stack pointer is normally the memory
>> > > after the shortened pt_regs, aka '&regs->sp'.
>> > >
>> > > But in the rare case where the interrupt hits right after the stack
>> > > pointer has been changed to point to an empty stack, like for example
>> > > when call_on_stack() is used, the address immediately after the
>> > > shortened pt_regs is no longer on the stack.  In that case, instead of
>> > > '&regs->sp', the previous stack pointer should be retrieved from the
>> > > beginning of the current stack page.
>> > >
>> > > kernel_stack_pointer() wants to do that, but it forgets to dereference
>> > > the pointer.  So instead of returning a pointer to the previous stack,
>> > > it returns a pointer to the beginning of the current stack.
>> > >
>> > > Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
>> > > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>> >
>> > This seems like a valid fix, but I'm not sure I agree with the intent
>> > of the code.  &regs->sp really is the previous stack pointer in the
>> > sense that the stack pointer was &regs->sp when the entry happened.
>> > From an unwinder's perspective, how is:
>> >
>> > movl [whatever], %esp
>> > <-- interrupt
>> >
>> > any different from:
>> >
>> > movl [whatever], %esp
>> > pushl [something]
>> > <-- interrupt
>>
>> In the first case, the stack is empty, so reading the value pointed to
>> by %esp would result in accessing outside the bounds of the stack.
>
> ...but maybe your point is that following the previous stack pointer is
> outside the scope of kernel_stack_pointer() and should instead be done
> by its caller.  Especially considering the fact that the x86_64 version
> of this function doesn't have this "feature".  In which case I think I
> would agree.

Yes, especially since your code seems to know how to find the previous
stack already.

>
> However I think fixing that is outside the scope of this
> already-way-too-big patch set.

Agreed.

>
> --
> Josh



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/51] x86/32: put real return address on stack in entry code
  2016-08-15 19:22           ` H. Peter Anvin
@ 2016-08-15 20:06             ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 20:06 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, X86 ML,
	linux-kernel, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Mon, Aug 15, 2016 at 12:22:33PM -0700, H. Peter Anvin wrote:
> On 08/15/16 11:25, Josh Poimboeuf wrote:
> > On Mon, Aug 15, 2016 at 11:04:42AM -0700, H. Peter Anvin wrote:
> >> On 08/15/16 08:09, Josh Poimboeuf wrote:
> >>> On Sun, Aug 14, 2016 at 12:31:47AM -0700, Andy Lutomirski wrote:
> >>>> On Fri, Aug 12, 2016 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>>>> This standardizes the stacks of idle tasks to be consistent with other
> >>>>> tasks on 32-bit.
> >>>>
> >>>> It might be nice to stick a ud2 or 1: hlt; jmp 1b or similar
> >>>> afterwards to make it clear that initial_code can't return.
> >>>
> >>> Yeah, I'll do something like that.
> >>>
> >>
> >> "Standardizing the stack" how?  A zero on the stack terminates the stack
> >> trace.
> > 
> > Instead of zero, user tasks have a real return address at that spot.
> > This makes idle tasks consistent with that, so we have a well defined
> > "end of stack".  Also it makes the stack trace more useful since it
> > shows what entry code was involved in calling into C.
> > 
> 
> So how is the stack terminated, and do things like kdb and kgdb need
> modifications?  Or is there now a stack termination above the struct
> pt_regs?

Even in today's code, there's no real "terminator".   The unwinder just
stops when it leaves the stack bounds.  See print_context_stack() in
mainline.
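
The relevant loop is roughly the following (simplified; helper names
are approximate):

	while (valid_stack_ptr(task, stack, sizeof(*stack), end)) {
		unsigned long addr = *stack++;

		/* only report values that point into kernel text */
		if (__kernel_text_address(addr))
			ops->address(data, addr, 0);
	}

The termination condition is purely the bounds check, not a sentinel
value on the stack.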

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 35/51] x86/unwind: add new unwind interface and implementations
  2016-08-12 14:28 ` [PATCH v3 35/51] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
@ 2016-08-15 21:43   ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-15 21:43 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 12, 2016 at 09:28:54AM -0500, Josh Poimboeuf wrote:
> +	/*
> +	 * The caller can optionally provide a stack pointer directly
> +	 * (sp) or indirectly (regs->sp), which indicates which stack
> +	 * frame to start unwinding at.  Skip ahead until we reach it.
> +	 */
> +	while (!unwind_done(state) &&
> +	       (!on_stack(&state->stack_info, first_sp, sizeof(*first_sp) ||
> +		state->bp < first_sp)))
> +		unwind_next_frame(state);

Ack, the parentheses got messed up with a last minute formatting change
and gcc didn't catch it.  This should actually be:

	while (!unwind_done(state) &&
	       (!on_stack(&state->stack_info, first_sp, sizeof(*first_sp)) ||
		state->bp < first_sp))
		unwind_next_frame(state);

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack
  2016-08-12 14:28 ` [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack Josh Poimboeuf
  2016-08-14  7:29   ` Andy Lutomirski
@ 2016-08-17 20:30   ` Nilay Vaish
  2016-08-17 21:10     ` Josh Poimboeuf
  1 sibling, 1 reply; 99+ messages in thread
From: Nilay Vaish @ 2016-08-17 20:30 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On 12 August 2016 at 09:28, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> The frame at the end of each idle task stack has a zeroed return
> address.  This is inconsistent with real task stacks, which have a real
> return address at that spot.  This inconsistency can be confusing for
> stack unwinders.
>
> Make it a real address by using the side effect of a call instruction to
> push the instruction pointer on the stack.
>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/kernel/head_64.S | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 3621ad2..c90f481 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -298,8 +298,9 @@ ENTRY(start_cpu)
>          *      REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
>          *              address given in m16:64.
>          */
> -       movq    initial_code(%rip),%rax
> -       pushq   $0              # fake return address to stop unwinder
> +       call    1f              # put return address on stack for unwinder
> +1:     xorq    %rbp, %rbp      # clear frame pointer
> +       movq    initial_code(%rip), %rax
>         pushq   $__KERNEL_CS    # set correct cs
>         pushq   %rax            # target address in negative space
>         lretq


Josh,  I have a couple of questions.

It seems to me that this patch and patch 16/51 are both aiming at the
same thing, but for two different architectures: the 32-bit and 64-bit
versions of x86.  But you have taken slightly different approaches in
the two patches (for 64-bit, we first jump and then make a function
call; for 32-bit, we call the function directly).  Is there any
particular reason for this?  Maybe I missed something.

Second, this is for the whole patch series.  If I wanted to test this
series, how should I go about doing so?

Thanks
Nilay

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack
  2016-08-17 20:30   ` Nilay Vaish
@ 2016-08-17 21:10     ` Josh Poimboeuf
  0 siblings, 0 replies; 99+ messages in thread
From: Josh Poimboeuf @ 2016-08-17 21:10 UTC (permalink / raw)
  To: Nilay Vaish
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	Linux Kernel list, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park

On Wed, Aug 17, 2016 at 03:30:55PM -0500, Nilay Vaish wrote:
> On 12 August 2016 at 09:28, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > The frame at the end of each idle task stack has a zeroed return
> > address.  This is inconsistent with real task stacks, which have a real
> > return address at that spot.  This inconsistency can be confusing for
> > stack unwinders.
> >
> > Make it a real address by using the side effect of a call instruction to
> > push the instruction pointer on the stack.
> >
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > ---
> >  arch/x86/kernel/head_64.S | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index 3621ad2..c90f481 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -298,8 +298,9 @@ ENTRY(start_cpu)
> >          *      REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> >          *              address given in m16:64.
> >          */
> > -       movq    initial_code(%rip),%rax
> > -       pushq   $0              # fake return address to stop unwinder
> > +       call    1f              # put return address on stack for unwinder
> > +1:     xorq    %rbp, %rbp      # clear frame pointer
> > +       movq    initial_code(%rip), %rax
> >         pushq   $__KERNEL_CS    # set correct cs
> >         pushq   %rax            # target address in negative space
> >         lretq
> 
> 
> Josh,  I have a couple of questions.
> 
> It seems to me that this patch and the patch 16/51 are both aiming at
> the same thing, but are for two different architectures: 32-bit and
> 64-bit versions of x86.  But you have taken slightly different
> approaches in the two patches (for 64-bit, we first jump and then make
> a function call, for 32-bit we directly call the function).  Is there
> any particular reason for this?  May be I missed out on something.

Yes, the 64-bit code is different: it has to use a far return (lretq) in
order to jump to the 64-bit address stored in 'initial_code'.

So instead of calling the initial code directly, I had to use a more
obtuse approach there ("call 1f"), which just places a start_cpu()
address at the right spot on the stack before the lretq (which
"returns" to the initial code).

> Second, this is for the whole patch series.  If I wanted to test this
> series, how should I go about doing so?

That's a loaded question. :-) I'd recommend doing lots of stack dumps
and traces in various situations and making sure everything looks sane,
on both 32-bit and 64-bit, and with both CONFIG_FRAME_POINTER=n and
CONFIG_FRAME_POINTER=y.  Some easy ones are:

  cat /proc/*/stack
  echo 1 > /proc/sys/kernel/sysrq; echo l > /proc/sysrq-trigger
  echo 1 > /proc/sys/kernel/sysrq; echo t > /proc/sysrq-trigger
  perf record -g <cmd>; perf report

Also, if you have CONFIG_LOCKDEP enabled, it will silently save a bunch
of stacks in the background, though you won't ever see any evidence of
that unless something goes wrong (you can check if the unwinder spit out
any warnings with 'dmesg |grep WARNING').

For my own testing, I also hacked up my kernel to dump the stack in
various scenarios: interrupts, NMIs, page faults, preemption, perf
callchain record, etc.

BTW, I'll be posting v4 soon, probably Thursday morning US Central time.

-- 
Josh

^ permalink raw reply	[flat|nested] 99+ messages in thread

end of thread, other threads:[~2016-08-17 21:10 UTC | newest]

Thread overview: 99+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-12 14:28 [PATCH v3 00/51] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 01/51] x86/dumpstack: remove show_trace() Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 02/51] x86/asm/head: remove unused init_rsp variable extern Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 03/51] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 04/51] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
2016-08-12 22:08   ` Nilay Vaish
2016-08-12 14:28 ` [PATCH v3 05/51] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 06/51] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 07/51] x86/dumpstack: remove extra brackets around "<EOE>" Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 08/51] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl() Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 09/51] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
2016-08-14  7:26   ` Andy Lutomirski
2016-08-14 12:55     ` Brian Gerst
2016-08-14 13:42       ` Andy Lutomirski
2016-08-15 15:05     ` Josh Poimboeuf
2016-08-15 17:22       ` Josh Poimboeuf
2016-08-15 20:04         ` Andy Lutomirski
2016-08-12 14:28 ` [PATCH v3 10/51] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 11/51] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 12/51] x86: move _stext marker to before head code Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 13/51] x86/asm/head: remove useless zeroed word Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack Josh Poimboeuf
2016-08-14  7:29   ` Andy Lutomirski
2016-08-17 20:30   ` Nilay Vaish
2016-08-17 21:10     ` Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 15/51] x86/asm/head: standardize the end of the stack for idle tasks Josh Poimboeuf
2016-08-14  7:30   ` Andy Lutomirski
2016-08-12 14:28 ` [PATCH v3 16/51] x86/32: put real return address on stack in entry code Josh Poimboeuf
2016-08-14  7:31   ` Andy Lutomirski
2016-08-15 15:09     ` Josh Poimboeuf
2016-08-15 18:04       ` H. Peter Anvin
2016-08-15 18:25         ` Josh Poimboeuf
2016-08-15 19:22           ` H. Peter Anvin
2016-08-15 20:06             ` Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 17/51] x86/smp: fix initial idle stack location on 32-bit Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 18/51] x86/entry/head/32: use local labels Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 19/51] x86/entry/32: rename 'error_code' to 'common_exception' Josh Poimboeuf
2016-08-14  7:40   ` Andy Lutomirski
2016-08-15 15:30     ` Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 20/51] perf/x86: check perf_callchain_store() error Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 21/51] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 22/51] proc: fix return address printk conversion specifer in /proc/<pid>/stack Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 23/51] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 24/51] ftrace: only allocate the ret_stack 'fp' field when needed Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 25/51] ftrace: add return address pointer to ftrace_ret_stack Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 26/51] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 27/51] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr() Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 28/51] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 29/51] x86/dumpstack/ftrace: mark function graph handler function as unreliable Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 30/51] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp() Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 31/51] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
2016-08-14  7:45   ` Andy Lutomirski
2016-08-15 15:32     ` Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 32/51] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
2016-08-14  7:48   ` Andy Lutomirski
2016-08-15 15:34     ` Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 33/51] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 34/51] x86/dumpstack: add recursion checking for all stacks Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 35/51] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
2016-08-15 21:43   ` Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 36/51] perf/x86: convert perf_callchain_kernel() to use the new unwinder Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 37/51] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 38/51] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 39/51] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
2016-08-14  8:13   ` Andy Lutomirski
2016-08-15 16:44     ` Josh Poimboeuf
2016-08-12 14:28 ` [PATCH v3 40/51] x86/dumpstack: remove dump_trace() and related callbacks Josh Poimboeuf
2016-08-12 14:29 ` [PATCH v3 41/51] x86/entry/unwind: create stack frames for saved interrupt registers Josh Poimboeuf
2016-08-14  8:10   ` Andy Lutomirski
2016-08-15 16:33     ` Josh Poimboeuf
2016-08-12 14:29 ` [PATCH v3 42/51] x86/unwind: create stack frames for saved syscall registers Josh Poimboeuf
2016-08-14  8:23   ` Andy Lutomirski
2016-08-15 16:52     ` Josh Poimboeuf
2016-08-12 14:29 ` [PATCH v3 43/51] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
2016-08-12 14:29 ` [PATCH v3 44/51] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
2016-08-14  8:16   ` Andy Lutomirski
2016-08-12 14:29 ` [PATCH v3 45/51] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
2016-08-14  7:52   ` Andy Lutomirski
2016-08-14 12:50     ` Brian Gerst
2016-08-15 17:00       ` Josh Poimboeuf
2016-08-15 15:42     ` Josh Poimboeuf
2016-08-12 14:29 ` [PATCH v3 46/51] x86/unwind: warn on kernel stack corruption Josh Poimboeuf
2016-08-12 14:29 ` [PATCH v3 47/51] x86/unwind: warn on bad stack return address Josh Poimboeuf
2016-08-12 14:29 ` [PATCH v3 48/51] x86/unwind: warn if stack grows up Josh Poimboeuf
2016-08-14  7:56   ` Andy Lutomirski
2016-08-15 16:25     ` Josh Poimboeuf
2016-08-12 14:29 ` [PATCH v3 49/51] x86/dumpstack: warn on stack recursion Josh Poimboeuf
2016-08-12 14:29 ` [PATCH v3 50/51] x86/mm: move arch_within_stack_frames() to usercopy.c Josh Poimboeuf
2016-08-12 17:36   ` Kees Cook
2016-08-12 19:12     ` Josh Poimboeuf
2016-08-12 20:06       ` Kees Cook
2016-08-12 20:36         ` Josh Poimboeuf
2016-08-12 20:44           ` Kees Cook
2016-08-12 14:29 ` [PATCH v3 51/51] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
2016-08-12 15:17   ` Josh Poimboeuf
2016-08-12 17:38     ` Kees Cook
2016-08-12 19:15       ` Josh Poimboeuf
2016-08-12 20:41   ` Josh Poimboeuf
2016-08-12 20:47     ` Kees Cook
