* [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code
@ 2016-08-18 13:05 Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 01/57] x86/dumpstack: remove show_trace() Josh Poimboeuf
                   ` (57 more replies)
  0 siblings, 58 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Mostly minor changes this time.  See below for the full list of changes.

A git branch is available at:
 
  https://github.com/jpoimboe/linux unwind-v4

Based on tip/master.

v4: 
- complete rewrite of arch_within_stack_frames() for hardened usercopy
- handle empty stacks better:
  - change in_*_stack() functions to consider the end to be part of the
    stack
  - add loop in update_stack_state() to handle empty stacks more
    gracefully
- prevent false positive warnings when unwinding interrupts in entry code
- fix misplaced parentheses bug in __unwind_start()
- move 32-bit ret_from_fork change to a separate commit: "fix the end of
  the stack for newly forked tasks"
- add infinite loop after call to initial_code
- print orig_ax in __show_regs()
- fix duplicate RIP address display in __show_regs()
- rename "next_sp" to "next_frame" in unwind_next_frame() and "first_sp" to 
  "first_frame" in unwind_start() to improve readability
- improve a few patch header descriptions

v3:
- partial unwinder rewrite: each pt_regs gets its own frame
- add frame pointer encoding support for 32-bit
- several 32-bit fixes and cleanups for issues found by the new warnings
- convert CONFIG_HARDENED_USERCOPY arch_within_stack_frames()
- fix bug in unwinder when skipping stack frames (and add a comment)
- warn on stack recursion
- put start_cpu() in its own function
- export symbols in unwind_guess.c

v2:
- split up several of the patches and reorder them with lower-risk
  patches first
- add a lot more comments
- remove the 64-byte gap at the end of the irq stack
- fix some existing ftrace function graph unwinding issues
- fix an existing bug in kernel_stack_pointer()
- clarify the origins of the stack_info "next stack" pointers
- do visit_mask checking in get_stack_info() instead of in_*_stack()
- add some new unwinder warnings
- remove uses of test_and_set_bit()
- don't print regs->ip twice
- remove unwind_state.sp
- have unwind_get_return_address() validate the return address
- change /proc/pid/stack to use %pB
- several minor cleanups and fixes

----

The x86 stack dump code is a bit of a mess.  dump_trace() uses
callbacks, and each user of it seems to have slightly different
requirements, so there are several slightly different callbacks floating
around.
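
For reference, the callback interface being replaced looks roughly like
this (paraphrased from arch/x86/include/asm/stacktrace.h as it stands;
a sketch, not the exact header):

  struct stacktrace_ops {
          /* called for each address found on the stack */
          int (*address)(void *data, unsigned long addr, int reliable);
          /* called when switching between stacks */
          int (*stack)(void *data, char *name);
          /* walks the frames of a single stack */
          walk_stack_t walk_stack;
  };

  void dump_trace(struct task_struct *task, struct pt_regs *regs,
                  unsigned long *stack, unsigned long bp,
                  const struct stacktrace_ops *ops, void *data);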

Also there are some upcoming features which will require more changes to
the stack dump code: reliable stack detection for live patching,
hardened user copy, and the DWARF unwinder.  Each of those features
would at least need more callbacks and/or callback interfaces, resulting
in a much bigger mess than what we have today.

Before doing all that, we should try to clean things up and replace
dump_trace() with something cleaner and more flexible.

The new unwinder is a simple state machine which was heavily inspired by
a suggestion from Andy Lutomirski:

  https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com

It's also similar to the libunwind API:

  http://www.nongnu.org/libunwind/man/libunwind(3).html

Some of its advantages:

- simplicity: no more callback sprawl and less code duplication.

- flexibility: allows the caller to stop and inspect the stack state at
  each step in the unwinding process.

- modularity: the unwinder code, console stack dump code, and stack
  metadata analysis code are all better separated so that changing one
  of them shouldn't have much of an impact on any of the others.
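
As a rough illustration of the resulting shape (based on the unwind
interface added later in this series; a sketch, not the final code), a
caller drives the state machine directly:

  #include <asm/unwind.h>

  /* Sketch: print each return address on a task's stack. */
  static void show_addresses(struct task_struct *task, struct pt_regs *regs)
  {
          struct unwind_state state;
          unsigned long addr;

          unwind_start(&state, task, regs, NULL);
          for (; !unwind_done(&state); unwind_next_frame(&state)) {
                  addr = unwind_get_return_address(&state);
                  if (!addr)
                          break;
                  /* the caller can stop, filter, or print at each step */
                  printk(KERN_DEFAULT "%pB\n", (void *)addr);
          }
  }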

----

Josh Poimboeuf (57):
  x86/dumpstack: remove show_trace()
  x86/asm/head: remove unused init_rsp variable extern
  x86/asm/head: rename 'stack_start' -> 'initial_stack'
  x86/asm/head: use a common function for starting CPUs
  x86/dumpstack: make printk_stack_address() more generally useful
  x86/dumpstack: add IRQ_USABLE_STACK_SIZE define
  x86/dumpstack: remove extra brackets around "<EOE>"
  x86/dumpstack: fix irq stack bounds calculation in
    show_stack_log_lvl()
  x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  x86/dumpstack: add get_stack_pointer() and get_frame_pointer()
  x86/dumpstack: remove unnecessary stack pointer arguments
  x86: move _stext marker to before head code
  x86/head: remove useless zeroed word
  x86/head: put real return address on idle task stack
  x86/head: fix the end of the stack for idle tasks
  x86/entry/32: fix the end of the stack for newly forked tasks
  x86/head/32: fix the end of the stack for idle tasks
  x86/smp: fix initial idle stack location on 32-bit
  x86/entry/head/32: use local labels
  x86/entry/32: rename 'error_code' to 'common_exception'
  perf/x86: check perf_callchain_store() error
  oprofile/x86: add regs->ip to oprofile trace
  proc: fix return address printk conversion specifier in
    /proc/<pid>/stack
  ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config
  ftrace: only allocate the ret_stack 'fp' field when needed
  ftrace: add return address pointer to ftrace_ret_stack
  ftrace: add ftrace_graph_ret_addr() stack unwinding helpers
  x86/dumpstack/ftrace: convert dump_trace() callbacks to use
    ftrace_graph_ret_addr()
  ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
  x86/dumpstack/ftrace: mark function graph handler function as
    unreliable
  x86/dumpstack/ftrace: don't print unreliable addresses in
    print_context_stack_bp()
  x86/dumpstack: allow preemption in show_stack_log_lvl() and
    dump_trace()
  x86/dumpstack: simplify in_exception_stack()
  x86/dumpstack: add get_stack_info() interface
  x86/dumpstack: add recursion checking for all stacks
  x86/unwind: add new unwind interface and implementations
  perf/x86: convert perf_callchain_kernel() to use the new unwinder
  x86/stacktrace: convert save_stack_trace_*() to use the new unwinder
  oprofile/x86: convert x86_backtrace() to use the new unwinder
  x86/dumpstack: convert show_trace_log_lvl() to use the new unwinder
  x86/dumpstack: remove dump_trace() and related callbacks
  x86/entry/unwind: create stack frames for saved interrupt registers
  x86/unwind: create stack frames for saved syscall registers
  x86/dumpstack: print stack identifier on its own line
  x86/dumpstack: print any pt_regs found on the stack
  x86/dumpstack: fix duplicate RIP address display in __show_regs()
  x86/dumpstack: print orig_ax in __show_regs()
  x86: remove 64-byte gap at end of irq stack
  x86/unwind: warn on kernel stack corruption
  x86/unwind: warn on bad stack return address
  x86/unwind: warn if stack grows up
  x86/dumpstack: warn on stack recursion
  x86/mm: move arch_within_stack_frames() to usercopy.c
  x86/mm: convert arch_within_stack_frames() to use the new unwinder
  x86/mm: simplify starting frame logic for hardened usercopy
  x86/mm: remove unused arch_within_stack_frames() arguments
  mm: re-enable gcc frame address warning

 Documentation/trace/ftrace-design.txt |  11 ++
 arch/Kconfig                          |   4 +-
 arch/arm/kernel/ftrace.c              |   2 +-
 arch/arm64/kernel/entry-ftrace.S      |   2 +-
 arch/arm64/kernel/ftrace.c            |   2 +-
 arch/blackfin/kernel/ftrace-entry.S   |   4 +-
 arch/blackfin/kernel/ftrace.c         |   2 +-
 arch/microblaze/kernel/ftrace.c       |   2 +-
 arch/mips/kernel/ftrace.c             |   4 +-
 arch/parisc/kernel/ftrace.c           |   2 +-
 arch/powerpc/kernel/ftrace.c          |   3 +-
 arch/s390/kernel/ftrace.c             |   3 +-
 arch/sh/kernel/ftrace.c               |   2 +-
 arch/sparc/Kconfig                    |   1 -
 arch/sparc/include/asm/ftrace.h       |   4 +
 arch/sparc/kernel/ftrace.c            |   2 +-
 arch/tile/kernel/ftrace.c             |   2 +-
 arch/x86/Kconfig                      |   1 -
 arch/x86/entry/calling.h              |  21 +++
 arch/x86/entry/entry_32.S             | 158 +++++++++++------
 arch/x86/entry/entry_64.S             |  10 +-
 arch/x86/events/core.c                |  36 ++--
 arch/x86/include/asm/ftrace.h         |   3 +
 arch/x86/include/asm/kdebug.h         |   2 -
 arch/x86/include/asm/page_64_types.h  |  16 +-
 arch/x86/include/asm/realmode.h       |   2 +-
 arch/x86/include/asm/smp.h            |   3 -
 arch/x86/include/asm/stacktrace.h     | 116 ++++++------
 arch/x86/include/asm/thread_info.h    |  48 +----
 arch/x86/include/asm/unwind.h         | 104 +++++++++++
 arch/x86/kernel/Makefile              |   6 +
 arch/x86/kernel/acpi/sleep.c          |   2 +-
 arch/x86/kernel/cpu/common.c          |   2 +-
 arch/x86/kernel/dumpstack.c           | 272 +++++++++++++----------------
 arch/x86/kernel/dumpstack_32.c        | 141 ++++++++-------
 arch/x86/kernel/dumpstack_64.c        | 320 +++++++++++-----------------------
 arch/x86/kernel/ftrace.c              |   2 +-
 arch/x86/kernel/head_32.S             |  57 +++---
 arch/x86/kernel/head_64.S             |  50 +++---
 arch/x86/kernel/process_64.c          |  11 +-
 arch/x86/kernel/ptrace.c              |   4 +-
 arch/x86/kernel/setup_percpu.c        |   2 +-
 arch/x86/kernel/smpboot.c             |   6 +-
 arch/x86/kernel/stacktrace.c          |  74 +++-----
 arch/x86/kernel/unwind_frame.c        | 245 ++++++++++++++++++++++++++
 arch/x86/kernel/unwind_guess.c        |  43 +++++
 arch/x86/kernel/vmlinux.lds.S         |   2 +-
 arch/x86/lib/usercopy.c               |  49 ++++++
 arch/x86/oprofile/backtrace.c         |  49 +++---
 fs/proc/base.c                        |   2 +-
 include/linux/ftrace.h                |  17 +-
 include/linux/thread_info.h           |   3 +-
 kernel/trace/Kconfig                  |   5 -
 kernel/trace/trace_functions_graph.c  |  67 ++++++-
 mm/Makefile                           |   3 -
 mm/usercopy.c                         |  14 +-
 56 files changed, 1225 insertions(+), 795 deletions(-)
 create mode 100644 arch/x86/include/asm/unwind.h
 create mode 100644 arch/x86/kernel/unwind_frame.c
 create mode 100644 arch/x86/kernel/unwind_guess.c

-- 
2.7.4

* [PATCH v4 01/57] x86/dumpstack: remove show_trace()
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 02/57] x86/asm/head: remove unused init_rsp variable extern Josh Poimboeuf
                   ` (56 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

There is a bewildering array of options for dumping the stack.
Simplify things a little by removing show_trace(), which is unused.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/kdebug.h | 2 --
 arch/x86/kernel/dumpstack.c   | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h
index 1ef9d58..d318811 100644
--- a/arch/x86/include/asm/kdebug.h
+++ b/arch/x86/include/asm/kdebug.h
@@ -24,8 +24,6 @@ enum die_val {
 extern void printk_address(unsigned long address);
 extern void die(const char *, struct pt_regs *,long);
 extern int __must_check __die(const char *, struct pt_regs *, long);
-extern void show_trace(struct task_struct *t, struct pt_regs *regs,
-		       unsigned long *sp, unsigned long bp);
 extern void show_stack_regs(struct pt_regs *regs);
 extern void __show_regs(struct pt_regs *regs, int all);
 extern unsigned long oops_begin(void);
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 92e8f0a..5f49c04 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -182,12 +182,6 @@ show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	dump_trace(task, regs, stack, bp, &print_trace_ops, log_lvl);
 }
 
-void show_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp)
-{
-	show_trace_log_lvl(task, regs, stack, bp, "");
-}
-
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
 	unsigned long bp = 0;
-- 
2.7.4

* [PATCH v4 02/57] x86/asm/head: remove unused init_rsp variable extern
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 01/57] x86/dumpstack: remove show_trace() Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 16:22   ` Sebastian Andrzej Siewior
  2016-08-18 13:05 ` [PATCH v4 03/57] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
                   ` (55 subsequent siblings)
  57 siblings, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

There is no init_rsp variable.  Remove its extern declaration.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/realmode.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index b2988c0..3327ffb 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -44,7 +44,6 @@ struct trampoline_header {
 extern struct real_mode_header *real_mode_header;
 extern unsigned char real_mode_blob_end[];
 
-extern unsigned long init_rsp;
 extern unsigned long initial_code;
 extern unsigned long initial_gs;
 
-- 
2.7.4

* [PATCH v4 03/57] x86/asm/head: rename 'stack_start' -> 'initial_stack'
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 01/57] x86/dumpstack: remove show_trace() Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 02/57] x86/asm/head: remove unused init_rsp variable extern Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 04/57] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
                   ` (54 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The 'stack_start' variable is similar in usage to 'initial_code' and
'initial_gs': they're all stored in head_64.S and they're all updated by
SMP and ACPI suspend before starting a CPU.

Rename it to 'initial_stack' to be consistent with the others.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/realmode.h |  1 +
 arch/x86/include/asm/smp.h      |  3 ---
 arch/x86/kernel/acpi/sleep.c    |  2 +-
 arch/x86/kernel/head_32.S       |  8 ++++----
 arch/x86/kernel/head_64.S       | 11 +++++------
 arch/x86/kernel/smpboot.c       |  2 +-
 6 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 3327ffb..230e190 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -46,6 +46,7 @@ extern unsigned char real_mode_blob_end[];
 
 extern unsigned long initial_code;
 extern unsigned long initial_gs;
+extern unsigned long initial_stack;
 
 extern unsigned char real_mode_blob[];
 extern unsigned char real_mode_relocs[];
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index ebd0c16..19980b3 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -39,9 +39,6 @@ DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_bios_cpu_apicid);
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(int, x86_cpu_to_logical_apicid);
 #endif
 
-/* Static state in head.S used to set up a CPU */
-extern unsigned long stack_start; /* Initial stack pointer address */
-
 struct task_struct;
 
 struct smp_ops {
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index adb3eaf..4858733 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -99,7 +99,7 @@ int x86_acpi_suspend_lowlevel(void)
 	saved_magic = 0x12345678;
 #else /* CONFIG_64BIT */
 #ifdef CONFIG_SMP
-	stack_start = (unsigned long)temp_stack + sizeof(temp_stack);
+	initial_stack = (unsigned long)temp_stack + sizeof(temp_stack);
 	early_gdt_descr.address =
 			(unsigned long)get_cpu_gdt_table(smp_processor_id());
 	initial_gs = per_cpu_offset(smp_processor_id());
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 6f8902b..5f40126 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -94,7 +94,7 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE)
  */
 __HEAD
 ENTRY(startup_32)
-	movl pa(stack_start),%ecx
+	movl pa(initial_stack),%ecx
 	
 	/* test KEEP_SEGMENTS flag to see if the bootloader is asking
 		us to not reload segments */
@@ -286,7 +286,7 @@ num_subarch_entries = (. - subarch_entries) / 4
  * start_secondary().
  */
 ENTRY(start_cpu0)
-	movl stack_start, %ecx
+	movl initial_stack, %ecx
 	movl %ecx, %esp
 	jmp  *(initial_code)
 ENDPROC(start_cpu0)
@@ -307,7 +307,7 @@ ENTRY(startup_32_smp)
 	movl %eax,%es
 	movl %eax,%fs
 	movl %eax,%gs
-	movl pa(stack_start),%ecx
+	movl pa(initial_stack),%ecx
 	movl %eax,%ss
 	leal -__PAGE_OFFSET(%ecx),%esp
 
@@ -703,7 +703,7 @@ ENTRY(initial_page_table)
 
 .data
 .balign 4
-ENTRY(stack_start)
+ENTRY(initial_stack)
 	.long init_thread_union+THREAD_SIZE
 
 __INITRODATA
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 9f8efc9..e048142 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -66,7 +66,7 @@ startup_64:
 	 */
 
 	/*
-	 * Setup stack for verify_cpu(). "-8" because stack_start is defined
+	 * Setup stack for verify_cpu(). "-8" because initial_stack is defined
 	 * this way, see below. Our best guess is a NULL ptr for stack
 	 * termination heuristics and we don't want to break anything which
 	 * might depend on it (kgdb, ...).
@@ -226,7 +226,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip), %rsp
+	movq initial_stack(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -310,7 +310,7 @@ ENDPROC(secondary_startup_64)
  * start_secondary().
  */
 ENTRY(start_cpu0)
-	movq stack_start(%rip),%rsp
+	movq initial_stack(%rip),%rsp
 	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
 	pushq	$__KERNEL_CS	# set correct cs
@@ -319,15 +319,14 @@ ENTRY(start_cpu0)
 ENDPROC(start_cpu0)
 #endif
 
-	/* SMP bootup changes these two */
+	/* Both SMP bootup and ACPI suspend change these variables */
 	__REFDATA
 	.balign	8
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
-
-	GLOBAL(stack_start)
+	GLOBAL(initial_stack)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 26b473d..3878725 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -969,7 +969,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	initial_code = (unsigned long)start_secondary;
-	stack_start  = idle->thread.sp;
+	initial_stack  = idle->thread.sp;
 
 	/*
 	 * Enable the espfix hack for this CPU
-- 
2.7.4

* [PATCH v4 04/57] x86/asm/head: use a common function for starting CPUs
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (2 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 03/57] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 05/57] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
                   ` (53 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

There are two different pieces of code for starting a CPU: start_cpu0()
and the end of secondary_startup_64().  They're identical except for the
stack setup.  Combine the common parts into a shared start_cpu()
function.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index e048142..a212310 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -264,13 +264,17 @@ ENTRY(secondary_startup_64)
 	movl	$MSR_GS_BASE,%ecx
 	movl	initial_gs(%rip),%eax
 	movl	initial_gs+4(%rip),%edx
-	wrmsr	
+	wrmsr
 
 	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
 	movq	%rsi, %rdi
-	
-	/* Finally jump to run C code and to be on real kernel address
+	jmp	start_cpu
+ENDPROC(secondary_startup_64)
+
+ENTRY(start_cpu)
+	/*
+	 * Jump to run C code and to be on a real kernel address.
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
@@ -299,7 +303,7 @@ ENTRY(secondary_startup_64)
 	pushq	$__KERNEL_CS	# set correct cs
 	pushq	%rax		# target address in negative space
 	lretq
-ENDPROC(secondary_startup_64)
+ENDPROC(start_cpu)
 
 #include "verify_cpu.S"
 
@@ -307,15 +311,11 @@ ENDPROC(secondary_startup_64)
 /*
  * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
  * up already except stack. We just set up stack here. Then call
- * start_secondary().
+ * start_secondary() via start_cpu().
  */
 ENTRY(start_cpu0)
-	movq initial_stack(%rip),%rsp
-	movq	initial_code(%rip),%rax
-	pushq	$0		# fake return address to stop unwinder
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
+	movq	initial_stack(%rip), %rsp
+	jmp	start_cpu
 ENDPROC(start_cpu0)
 #endif
 
-- 
2.7.4

* [PATCH v4 05/57] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (3 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 04/57] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 06/57] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
                   ` (52 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Change printk_stack_address() to be useful when called by an unwinder
outside the context of dump_trace().

Specifically:

- printk_stack_address()'s 'data' argument is always used as the log
  level string.  Make that explicit.

- Call touch_nmi_watchdog().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 5f49c04..6b3376d 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -26,10 +26,11 @@ int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
 static int die_counter;
 
 static void printk_stack_address(unsigned long address, int reliable,
-		void *data)
+				 char *log_lvl)
 {
+	touch_nmi_watchdog();
 	printk("%s [<%p>] %s%pB\n",
-		(char *)data, (void *)address, reliable ? "" : "? ",
+		log_lvl, (void *)address, reliable ? "" : "? ",
 		(void *)address);
 }
 
@@ -163,7 +164,6 @@ static int print_trace_stack(void *data, char *name)
  */
 static int print_trace_address(void *data, unsigned long addr, int reliable)
 {
-	touch_nmi_watchdog();
 	printk_stack_address(addr, reliable, data);
 	return 0;
 }
-- 
2.7.4

* [PATCH v4 06/57] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (4 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 05/57] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 07/57] x86/dumpstack: remove extra brackets around "<EOE>" Josh Poimboeuf
                   ` (51 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

For reasons unknown, the x86_64 irq stack starts 64 bytes below the end
of its allocated area.  At least make that gap explicit with a define.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/page_64_types.h | 19 +++++++++++--------
 arch/x86/kernel/cpu/common.c         |  2 +-
 arch/x86/kernel/dumpstack_64.c       |  5 +----
 arch/x86/kernel/setup_percpu.c       |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 9215e05..6256baf 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -12,17 +12,20 @@
 #endif
 
 #define THREAD_SIZE_ORDER	(2 + KASAN_STACK_ORDER)
-#define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
-#define CURRENT_MASK (~(THREAD_SIZE - 1))
+#define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
+#define CURRENT_MASK		(~(THREAD_SIZE - 1))
 
-#define EXCEPTION_STACK_ORDER (0 + KASAN_STACK_ORDER)
-#define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
+#define EXCEPTION_STACK_ORDER	(0 + KASAN_STACK_ORDER)
+#define EXCEPTION_STKSZ		(PAGE_SIZE << EXCEPTION_STACK_ORDER)
 
-#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1)
-#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER)
+#define DEBUG_STACK_ORDER	(EXCEPTION_STACK_ORDER + 1)
+#define DEBUG_STKSZ		(PAGE_SIZE << DEBUG_STACK_ORDER)
 
-#define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
-#define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
+#define IRQ_STACK_ORDER		(2 + KASAN_STACK_ORDER)
+#define IRQ_STACK_SIZE		(PAGE_SIZE << IRQ_STACK_ORDER)
+
+/* FIXME: why? */
+#define IRQ_USABLE_STACK_SIZE	(IRQ_STACK_SIZE - 64)
 
 #define DOUBLEFAULT_STACK 1
 #define NMI_STACK 2
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d3b91be..55684b1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1286,7 +1286,7 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 EXPORT_PER_CPU_SYMBOL(current_task);
 
 DEFINE_PER_CPU(char *, irq_stack_ptr) =
-	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE - 64;
+	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_USABLE_STACK_SIZE;
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 9ee4520..43023ae 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -103,9 +103,6 @@ in_irq_stack(unsigned long *stack, unsigned long *irq_stack,
 	return (stack >= irq_stack && stack < irq_stack_end);
 }
 
-static const unsigned long irq_stack_size =
-	(IRQ_STACK_SIZE - 64) / sizeof(unsigned long);
-
 enum stack_type {
 	STACK_IS_UNKNOWN,
 	STACK_IS_NORMAL,
@@ -133,7 +130,7 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
 		return STACK_IS_NORMAL;
 
 	*stack_end = irq_stack;
-	irq_stack = irq_stack - irq_stack_size;
+	irq_stack -= (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
 	if (in_irq_stack(stack, irq_stack, *stack_end))
 		return STACK_IS_IRQ;
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 1d5c794..a2a0eae 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -246,7 +246,7 @@ void __init setup_per_cpu_areas(void)
 #ifdef CONFIG_X86_64
 		per_cpu(irq_stack_ptr, cpu) =
 			per_cpu(irq_stack_union.irq_stack, cpu) +
-			IRQ_STACK_SIZE - 64;
+			IRQ_USABLE_STACK_SIZE;
 #endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
-- 
2.7.4

* [PATCH v4 07/57] x86/dumpstack: remove extra brackets around "<EOE>"
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (5 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 06/57] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 08/57] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl() Josh Poimboeuf
                   ` (50 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When the dump of an exception stack starts, "<<EOE>>" is shown instead
of "<EOE>".  print_trace_stack() already adds the brackets, so there's
no need to add them again.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 43023ae..7ea6ed0 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -199,7 +199,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 
 			bp = ops->walk_stack(task, stack, bp, ops,
 					     data, stack_end, &graph);
-			ops->stack(data, "<EOE>");
+			ops->stack(data, "EOE");
 			/*
 			 * We link to the next stack via the
 			 * second-to-last pointer (index -2 to end) in the
-- 
2.7.4

* [PATCH v4 08/57] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl()
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (6 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 07/57] x86/dumpstack: remove extra brackets around "<EOE>" Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 09/57] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
                   ` (49 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The percpu irq_stack_ptr variable has a 64-byte gap from the end of the
allocated irq stack area, so subtracting IRQ_STACK_SIZE from it actually
results in a value 64 bytes before the beginning of the stack.  Use
IRQ_USABLE_STACK_SIZE instead.
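
Concretely, the arithmetic being fixed (a sketch using the defines from
the earlier IRQ_USABLE_STACK_SIZE patch):

  irq_stack_ptr                         = base + IRQ_STACK_SIZE - 64
  irq_stack_ptr - IRQ_STACK_SIZE        = base - 64  /* before the stack! */
  irq_stack_ptr - IRQ_USABLE_STACK_SIZE = base       /* actual bottom */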

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 7ea6ed0..0fdd371 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -253,8 +253,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	preempt_disable();
 	cpu = smp_processor_id();
 
-	irq_stack_end	= (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
-	irq_stack	= (unsigned long *)(per_cpu(irq_stack_ptr, cpu) - IRQ_STACK_SIZE);
+	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
 	/*
 	 * Debugging aid: "show_stack(NULL, NULL);" prints the
-- 
2.7.4

* [PATCH v4 09/57] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (7 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 08/57] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl() Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 10/57] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
                   ` (48 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On x86_32, when an interrupt happens from kernel space, SS and SP aren't
pushed and the existing stack is used.  So pt_regs is effectively two
words shorter, and the previous stack pointer is normally the memory
after the shortened pt_regs, aka '&regs->sp'.

But in the rare case where the interrupt hits right after the stack
pointer has been changed to point to an empty stack, like for example
when call_on_stack() is used, the address immediately after the
shortened pt_regs is no longer on the stack.  In that case, instead of
'&regs->sp', the previous stack pointer should be retrieved from the
beginning of the current stack page.

kernel_stack_pointer() wants to do that, but it forgets to dereference
the pointer.  So instead of returning a pointer to the previous stack,
it returns a pointer to the beginning of the current stack.

Note that it's probably outside of kernel_stack_pointer()'s scope to be
switching stacks at all.  The x86_64 version of this function doesn't do
it, and it would be better for the caller to do it if necessary.  But
that's a patch for another day.  This just fixes the original intent.
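
For context, the function with the fix applied looks roughly like this
(condensed from arch/x86/kernel/ptrace.c; a sketch):

  unsigned long kernel_stack_pointer(struct pt_regs *regs)
  {
          unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
          unsigned long sp = (unsigned long)&regs->sp;
          u32 *prev_esp;

          /* normal case: the previous stack follows the short pt_regs */
          if (context == (sp & ~(THREAD_SIZE - 1)))
                  return sp;

          /*
           * The interrupt hit an empty stack, so the previous stack
           * pointer was saved at the beginning of the current stack page.
           */
          prev_esp = (u32 *)context;
          if (*prev_esp)
                  return (unsigned long)*prev_esp;

          return (unsigned long)regs;
  }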

Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/ptrace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 2537cfb..5b88a1b 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -173,8 +173,8 @@ unsigned long kernel_stack_pointer(struct pt_regs *regs)
 		return sp;
 
 	prev_esp = (u32 *)(context);
-	if (prev_esp)
-		return (unsigned long)prev_esp;
+	if (*prev_esp)
+		return (unsigned long)*prev_esp;
 
 	return (unsigned long)regs;
 }
-- 
2.7.4

* [PATCH v4 10/57] x86/dumpstack: add get_stack_pointer() and get_frame_pointer()
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (8 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 09/57] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 11/57] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
                   ` (47 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The various functions involved in dumping the stack all do similar
things with regard to getting the stack pointer and the frame pointer
based on the regs and task arguments.  Create helper functions to
do that instead.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h | 41 ++++++++++++++++++++++-----------------
 arch/x86/kernel/dumpstack.c       |  5 ++---
 arch/x86/kernel/dumpstack_32.c    | 25 ++++--------------------
 arch/x86/kernel/dumpstack_64.c    | 30 ++++------------------------
 4 files changed, 33 insertions(+), 68 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 0944218..38a6bf8 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -49,37 +49,42 @@ void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
 
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
-#define get_bp(bp) asm("movl %%ebp, %0" : "=r" (bp) :)
 #else
 #define STACKSLOTS_PER_LINE 4
-#define get_bp(bp) asm("movq %%rbp, %0" : "=r" (bp) :)
 #endif
 
 #ifdef CONFIG_FRAME_POINTER
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
-	unsigned long bp;
-
 	if (regs)
-		return regs->bp;
+		return (unsigned long *)regs->bp;
 
-	if (task == current) {
-		/* Grab bp right from our regs */
-		get_bp(bp);
-		return bp;
-	}
+	if (!task || task == current)
+		return __builtin_frame_address(0);
 
 	/* bp is the last reg pushed by switch_to */
-	return *(unsigned long *)task->thread.sp;
+	return (unsigned long *)*(unsigned long *)task->thread.sp;
 }
 #else
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
-	return 0;
+	return NULL;
+}
+#endif /* CONFIG_FRAME_POINTER */
+
+static inline unsigned long *
+get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
+{
+	if (regs)
+		return (unsigned long *)kernel_stack_pointer(regs);
+
+	if (!task || task == current)
+		return __builtin_frame_address(0);
+
+	return (unsigned long *)task->thread.sp;
 }
-#endif
 
 extern void
 show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
@@ -106,7 +111,7 @@ static inline unsigned long caller_frame_pointer(void)
 {
 	struct stack_frame *frame;
 
-	get_bp(frame);
+	frame = __builtin_frame_address(0);
 
 #ifdef CONFIG_FRAME_POINTER
 	frame = frame->next_frame;
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 6b3376d..68f42bb 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -185,15 +185,14 @@ show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
 	unsigned long bp = 0;
-	unsigned long stack;
 
 	/*
 	 * Stack frames below this one aren't interesting.  Don't show them
 	 * if we're printing for %current.
 	 */
 	if (!sp && (!task || task == current)) {
-		sp = &stack;
-		bp = stack_frame(current, NULL);
+		sp = get_stack_pointer(current, NULL);
+		bp = (unsigned long)get_frame_pointer(current, NULL);
 	}
 
 	show_stack_log_lvl(task, NULL, sp, bp, "");
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 0967571..358fe1c 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -46,19 +46,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	int graph = 0;
 	u32 *prev_esp;
 
-	if (!task)
-		task = current;
-
-	if (!stack) {
-		unsigned long dummy;
-
-		stack = &dummy;
-		if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
 	for (;;) {
 		void *end_stack;
@@ -95,14 +85,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *stack;
 	int i;
 
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 0fdd371..3c5dbc0 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -151,25 +151,14 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 {
 	const unsigned cpu = get_cpu();
 	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
-	unsigned long dummy;
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
 
-	if (!task)
-		task = current;
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
-	if (!stack) {
-		if (regs)
-			stack = (unsigned long *)regs->sp;
-		else if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-		else
-			stack = &dummy;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
 	/*
 	 * Print function call entries in all stacks, starting at the
 	 * current stack address. If the stacks consist of nested
@@ -256,18 +245,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
 	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
-	/*
-	 * Debugging aid: "show_stack(NULL, NULL);" prints the
-	 * back trace for this cpu:
-	 */
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
-- 
2.7.4

* [PATCH v4 11/57] x86/dumpstack: remove unnecessary stack pointer arguments
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (9 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 10/57] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 12/57] x86: move _stext marker to before head code Josh Poimboeuf
                   ` (46 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When calling show_stack_log_lvl() or dump_trace() with a regs argument,
providing a stack pointer or frame pointer is redundant.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c    | 2 +-
 arch/x86/kernel/dumpstack_32.c | 2 +-
 arch/x86/kernel/dumpstack_64.c | 5 +----
 arch/x86/oprofile/backtrace.c  | 4 +---
 4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 68f42bb..692eecae 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -200,7 +200,7 @@ void show_stack(struct task_struct *task, unsigned long *sp)
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(current, regs, (unsigned long *)regs->sp, regs->bp, "");
+	show_stack_log_lvl(current, regs, NULL, 0, "");
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 358fe1c..c533b8b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -122,7 +122,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		pr_emerg("Stack:\n");
-		show_stack_log_lvl(NULL, regs, &regs->sp, 0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_EMERG);
 
 		pr_emerg("Code:");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 3c5dbc0..491f2fd 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -283,9 +283,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_regs(struct pt_regs *regs)
 {
 	int i;
-	unsigned long sp;
 
-	sp = regs->sp;
 	show_regs_print_info(KERN_DEFAULT);
 	__show_regs(regs, 1);
 
@@ -300,8 +298,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_DEFAULT "Stack:\n");
-		show_stack_log_lvl(NULL, regs, (unsigned long *)sp,
-				   0, KERN_DEFAULT);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_DEFAULT);
 
 		printk(KERN_DEFAULT "Code: ");
 
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index cb31a44..c594768 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,10 +113,8 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
-		unsigned long stack = kernel_stack_pointer(regs);
 		if (depth)
-			dump_trace(NULL, regs, (unsigned long *)stack, 0,
-				   &backtrace_ops, &depth);
+			dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
 		return;
 	}
 
-- 
2.7.4

* [PATCH v4 12/57] x86: move _stext marker to before head code
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (10 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 11/57] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 13/57] x86/head: remove useless zeroed word Josh Poimboeuf
                   ` (45 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When core_kernel_text() is used to determine whether an address on a
task's stack trace is a kernel text address, it incorrectly returns
false for early text addresses for the head code between the _text and
_stext markers.

Head code is text code too, so mark it as such.  This seems to match the
intent of other users of the _stext symbol, and it also seems consistent
with what other architectures are already doing.
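
For context, core_kernel_text() bounds its check with _stext rather
than _text, roughly (paraphrased from kernel/extable.c; simplified):

  /* Simplified: head code between _text and _stext fails this check. */
  int core_kernel_text(unsigned long addr)
  {
          if (addr >= (unsigned long)_stext &&
              addr <  (unsigned long)_etext)
                  return 1;

          if (system_state == SYSTEM_BOOTING &&
              init_kernel_text(addr))
                  return 1;

          return 0;
  }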

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/vmlinux.lds.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 9297a00..1d9b636 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -91,10 +91,10 @@ SECTIONS
 	/* Text and read-only data */
 	.text :  AT(ADDR(.text) - LOAD_OFFSET) {
 		_text = .;
+		_stext = .;
 		/* bootstrapping code */
 		HEAD_TEXT
 		. = ALIGN(8);
-		_stext = .;
 		TEXT_TEXT
 		SCHED_TEXT
 		LOCK_TEXT
-- 
2.7.4

* [PATCH v4 13/57] x86/head: remove useless zeroed word
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (11 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 12/57] x86: move _stext marker to before head code Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 14/57] x86/head: put real return address on idle task stack Josh Poimboeuf
                   ` (44 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

This zeroed word has no apparent purpose, so remove it.

Brian Gerst says:

  "FYI the word used to be the SS segment selector for the LSS
  instruction, which isn't needed in 64-bit mode."

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index a212310..3621ad2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -328,7 +328,6 @@ ENDPROC(start_cpu0)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 	GLOBAL(initial_stack)
 	.quad  init_thread_union+THREAD_SIZE-8
-	.word  0
 	__FINITDATA
 
 bad_address:
-- 
2.7.4

* [PATCH v4 14/57] x86/head: put real return address on idle task stack
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (12 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 13/57] x86/head: remove useless zeroed word Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 15/57] x86/head: fix the end of the stack for idle tasks Josh Poimboeuf
                   ` (43 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The frame at the end of each idle task stack has a zeroed return
address.  This is inconsistent with real task stacks, which have a real
return address at that spot.  This inconsistency can be confusing for
stack unwinders.  It also hides useful information about what asm code
was involved in calling into C.

Make it a real address by using the side effect of a call instruction to
push the instruction pointer on the stack.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_64.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 3621ad2..c90f481 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -298,8 +298,9 @@ ENTRY(start_cpu)
 	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 	 *		address given in m16:64.
 	 */
-	movq	initial_code(%rip),%rax
-	pushq	$0		# fake return address to stop unwinder
+	call	1f		# put return address on stack for unwinder
+1:	xorq	%rbp, %rbp	# clear frame pointer
+	movq	initial_code(%rip), %rax
 	pushq	$__KERNEL_CS	# set correct cs
 	pushq	%rax		# target address in negative space
 	lretq
-- 
2.7.4

* [PATCH v4 15/57] x86/head: fix the end of the stack for idle tasks
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (13 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 14/57] x86/head: put real return address on idle task stack Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 16/57] x86/entry/32: fix the end of the stack for newly forked tasks Josh Poimboeuf
                   ` (42 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Thanks to all the recent x86 entry code refactoring, most tasks' kernel
stacks start at the same offset right above their saved pt_regs,
regardless of which syscall was used to enter the kernel.  That creates
a nice convention which makes it straightforward to identify the end of
the stack, which can be useful for stack walking code which needs to
verify the stack is sane.

However, the boot CPU's idle "swapper" task doesn't follow that
convention.  Fix that by starting its stack at a sizeof(pt_regs) offset
from the end of the stack page.
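
With that convention in place, an unwinder can validate the last frame
with a fixed-offset check along these lines (a sketch of the idea, not
the exact code from this series):

  /* Sketch: the last frame header sits right below the task's pt_regs. */
  static bool is_last_task_frame(struct task_struct *task, unsigned long *bp)
  {
          unsigned long regs = (unsigned long)task_pt_regs(task);

          /* frame header = saved frame pointer + return address */
          return (unsigned long)bp == regs - 2 * sizeof(long);
  }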

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_32.S |  9 ++++++++-
 arch/x86/kernel/head_64.S | 15 +++++++--------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 5f40126..f2298e9 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -62,6 +62,8 @@
 #define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
 #endif
 
+#define SIZEOF_PTREGS 17*4
+
 /*
  * Number of possible pages in the lowmem region.
  *
@@ -704,7 +706,12 @@ ENTRY(initial_page_table)
 .data
 .balign 4
 ENTRY(initial_stack)
-	.long init_thread_union+THREAD_SIZE
+	/*
+	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
+	 * unwinder reliably detect the end of the stack.
+	 */
+	.long init_thread_union + THREAD_SIZE - SIZEOF_PTREGS - \
+	      TOP_OF_KERNEL_STACK_PADDING;
 
 __INITRODATA
 int_msg:
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c90f481..ec332e9 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -65,13 +65,8 @@ startup_64:
 	 * tables and then reload them.
 	 */
 
-	/*
-	 * Setup stack for verify_cpu(). "-8" because initial_stack is defined
-	 * this way, see below. Our best guess is a NULL ptr for stack
-	 * termination heuristics and we don't want to break anything which
-	 * might depend on it (kgdb, ...).
-	 */
-	leaq	(__end_init_task - 8)(%rip), %rsp
+	/* Set up the stack for verify_cpu(), similar to initial_stack below */
+	leaq	(__end_init_task - SIZEOF_PTREGS)(%rip), %rsp
 
 	/* Sanitize CPU configuration */
 	call verify_cpu
@@ -328,7 +323,11 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 	GLOBAL(initial_stack)
-	.quad  init_thread_union+THREAD_SIZE-8
+	/*
+	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
+	 * unwinder reliably detect the end of the stack.
+	 */
+	.quad  init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
 	__FINITDATA
 
 bad_address:
-- 
2.7.4

* [PATCH v4 16/57] x86/entry/32: fix the end of the stack for newly forked tasks
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (14 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 15/57] x86/head: fix the end of the stack for idle tasks Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 17/57] x86/head/32: fix the end of the stack for idle tasks Josh Poimboeuf
                   ` (41 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The unwinder expects the last frame on the stack to always be at the
same offset from the end of the page, which allows it to validate the
stack.  Calling schedule_tail() directly breaks that convention because
it's an asmlinkage function, so its argument has to be pushed on the
stack.  Add a wrapper which creates a proper "end of stack" frame header
before the call.
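
For reference, a sketch of the relevant declaration: schedule_tail() is
asmlinkage, which on x86-32 means regparm(0), i.e. all arguments are
passed on the stack instead of in registers:

  asmlinkage void schedule_tail(struct task_struct *prev);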

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/entry/entry_32.S | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 0b56666..52b77ac 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -44,6 +44,7 @@
 #include <asm/alternative-asm.h>
 #include <asm/asm.h>
 #include <asm/smap.h>
+#include <asm/frame.h>
 
 	.section .entry.text, "ax"
 
@@ -204,11 +205,27 @@
 	POP_GS_EX
 .endm
 
-ENTRY(ret_from_fork)
+/*
+ * The unwinder expects the last frame on the stack to always be at the same
+ * offset from the end of the page, which allows it to validate the stack.
+ * Calling schedule_tail() directly would break that convention because it's
+ * an asmlinkage function, so its argument has to be pushed on the stack.
+ * This wrapper creates a proper "end of stack" frame header before the call.
+ */
+ENTRY(schedule_tail_wrapper)
+	FRAME_BEGIN
+
 	pushl	%eax
 	call	schedule_tail
 	popl	%eax
 
+	FRAME_END
+	ret
+ENDPROC(schedule_tail_wrapper)
+
+ENTRY(ret_from_fork)
+	call	schedule_tail_wrapper
+
 	/* When we fork, we trace the syscall return in the child, too. */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
@@ -216,9 +233,8 @@ ENTRY(ret_from_fork)
 END(ret_from_fork)
 
 ENTRY(ret_from_kernel_thread)
-	pushl	%eax
-	call	schedule_tail
-	popl	%eax
+	call	schedule_tail_wrapper
+
 	movl	PT_EBP(%esp), %eax
 	call	*PT_EBX(%esp)
 	movl	$0, PT_EAX(%esp)
-- 
2.7.4

* [PATCH v4 17/57] x86/head/32: fix the end of the stack for idle tasks
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (15 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 16/57] x86/entry/32: fix the end of the stack for newly forked tasks Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 18/57] x86/smp: fix initial idle stack location on 32-bit Josh Poimboeuf
                   ` (40 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The frame at the end of each idle task stack is inconsistent with real
task stacks, which have a stack frame header and a real return address
before the pt_regs area.  This inconsistency can be confusing for stack
unwinders.  It also hides useful information about what asm code was
involved in calling into C.

Fix that by changing the jumps to initial_code into calls.  Also add
infinite loops after the calls to make it clear that the calls don't
return, and to hang if they do.
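
A condensed sketch of the before/after stack effect (illustrative; the
real changes are in the diff below):

  jmp	*(initial_code)		# leaves no return address on the stack

  call	*(initial_code)		# pushes a return address, so the C code
  1:	jmp 1b			# has a caller frame; hang if it returns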

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/head_32.S | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index f2298e9..b9ceb44 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -290,7 +290,8 @@ num_subarch_entries = (. - subarch_entries) / 4
 ENTRY(start_cpu0)
 	movl initial_stack, %ecx
 	movl %ecx, %esp
-	jmp  *(initial_code)
+	call *(initial_code)
+1:	jmp 1b
 ENDPROC(start_cpu0)
 #endif
 
@@ -471,8 +472,9 @@ is486:
 	xorl %eax,%eax			# Clear LDT
 	lldt %ax
 
-	pushl $0		# fake return address for unwinder
-	jmp *(initial_code)
+	call *(initial_code)
+1:	jmp 1b
+ENDPROC(startup_32_smp)
 
 #include "verify_cpu.S"
 
-- 
2.7.4

* [PATCH v4 18/57] x86/smp: fix initial idle stack location on 32-bit
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (16 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 17/57] x86/head/32: fix the end of the stack for idle tasks Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:05 ` [PATCH v4 19/57] x86/entry/head/32: use local labels Josh Poimboeuf
                   ` (39 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On 32-bit, the initial idle stack calculation doesn't take into account
the TOP_OF_KERNEL_STACK_PADDING, making the stack end address
inconsistent with other tasks on 32-bit.
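
task_pt_regs() already accounts for the padding; roughly, its x86
definition boils down to this sketch:

  #define task_pt_regs(task)						\
  	((struct pt_regs *)(task_stack_page(task) + THREAD_SIZE -	\
  			    TOP_OF_KERNEL_STACK_PADDING) - 1)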

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/smpboot.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3878725..3d919b8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -964,9 +964,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	int cpu0_nmi_registered = 0;
 	unsigned long timeout;
 
-	idle->thread.sp = (unsigned long) (((struct pt_regs *)
-			  (THREAD_SIZE +  task_stack_page(idle))) - 1);
-
+	idle->thread.sp = (unsigned long)task_pt_regs(idle);
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	initial_code = (unsigned long)start_secondary;
 	initial_stack  = idle->thread.sp;
-- 
2.7.4

* [PATCH v4 19/57] x86/entry/head/32: use local labels
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (17 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 18/57] x86/smp: fix initial idle stack location on 32-bit Josh Poimboeuf
@ 2016-08-18 13:05 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 20/57] x86/entry/32: rename 'error_code' to 'common_exception' Josh Poimboeuf
                   ` (38 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Add the local label prefix to all non-function named labels in head_32.S
and entry_32.S.  In addition to decluttering the symbol table, this
also makes stack traces more sensible.  For example, the last reported
function in the idle task stack trace is now startup_32_smp() instead of
is486().
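
For reference, the GNU assembler treats the ".L" prefix as local, so
the effect is (illustrative):

  global_label:		# emitted to the symbol table, shows up in traces
  .Llocal_label:	# file-local, never emitted, invisible to kallsyms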

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/entry/entry_32.S | 57 ++++++++++++++++++++++++-----------------------
 arch/x86/kernel/head_32.S | 32 +++++++++++++-------------
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 52b77ac..aed9b11 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -65,7 +65,7 @@
 # define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
 # define preempt_stop(clobbers)
-# define resume_kernel		restore_all
+# define resume_kernel		.Lrestore_all
 #endif
 
 .macro TRACE_IRQS_IRET
@@ -229,7 +229,7 @@ ENTRY(ret_from_fork)
 	/* When we fork, we trace the syscall return in the child, too. */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
-	jmp     restore_all
+	jmp     .Lrestore_all
 END(ret_from_fork)
 
 ENTRY(ret_from_kernel_thread)
@@ -246,7 +246,7 @@ ENTRY(ret_from_kernel_thread)
 	 */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
-	jmp     restore_all
+	jmp     .Lrestore_all
 ENDPROC(ret_from_kernel_thread)
 
 /*
@@ -280,19 +280,19 @@ ENTRY(resume_userspace)
 	TRACE_IRQS_OFF
 	movl	%esp, %eax
 	call	prepare_exit_to_usermode
-	jmp	restore_all
+	jmp	.Lrestore_all
 END(ret_from_exception)
 
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
-need_resched:
+.Lneed_resched:
 	cmpl	$0, PER_CPU_VAR(__preempt_count)
-	jnz	restore_all
+	jnz	.Lrestore_all
 	testl	$X86_EFLAGS_IF, PT_EFLAGS(%esp)	# interrupts off (exception path) ?
-	jz	restore_all
+	jz	.Lrestore_all
 	call	preempt_schedule_irq
-	jmp	need_resched
+	jmp	.Lneed_resched
 END(resume_kernel)
 #endif
 
@@ -313,7 +313,7 @@ GLOBAL(__begin_SYSENTER_singlestep_region)
  */
 ENTRY(xen_sysenter_target)
 	addl	$5*4, %esp			/* remove xen-provided frame */
-	jmp	sysenter_past_esp
+	jmp	.Lsysenter_past_esp
 #endif
 
 /*
@@ -350,7 +350,7 @@ ENTRY(xen_sysenter_target)
  */
 ENTRY(entry_SYSENTER_32)
 	movl	TSS_sysenter_sp0(%esp), %esp
-sysenter_past_esp:
+.Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
 	pushfl				/* pt_regs->flags (except IF = 0) */
@@ -481,11 +481,11 @@ ENTRY(entry_INT80_32)
 	call	do_int80_syscall_32
 .Lsyscall_32_done:
 
-restore_all:
+.Lrestore_all:
 	TRACE_IRQS_IRET
-restore_all_notrace:
+.Lrestore_all_notrace:
 #ifdef CONFIG_X86_ESPFIX32
-	ALTERNATIVE	"jmp restore_nocheck", "", X86_BUG_ESPFIX
+	ALTERNATIVE	"jmp .Lrestore_nocheck", "", X86_BUG_ESPFIX
 
 	movl	PT_EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
 	/*
@@ -497,22 +497,23 @@ restore_all_notrace:
 	movb	PT_CS(%esp), %al
 	andl	$(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
 	cmpl	$((SEGMENT_LDT << 8) | USER_RPL), %eax
-	je ldt_ss				# returning to user-space with LDT SS
+	je .Lldt_ss				# returning to user-space with LDT SS
 #endif
-restore_nocheck:
+.Lrestore_nocheck:
 	RESTORE_REGS 4				# skip orig_eax/error_code
-irq_return:
+.Lirq_return:
 	INTERRUPT_RETURN
+
 .section .fixup, "ax"
 ENTRY(iret_exc	)
 	pushl	$0				# no error code
 	pushl	$do_iret_error
 	jmp	error_code
 .previous
-	_ASM_EXTABLE(irq_return, iret_exc)
+	_ASM_EXTABLE(.Lirq_return, iret_exc)
 
 #ifdef CONFIG_X86_ESPFIX32
-ldt_ss:
+.Lldt_ss:
 /*
  * Setup and switch to ESPFIX stack
  *
@@ -541,7 +542,7 @@ ldt_ss:
 	 */
 	DISABLE_INTERRUPTS(CLBR_EAX)
 	lss	(%esp), %esp			/* switch to espfix segment */
-	jmp	restore_nocheck
+	jmp	.Lrestore_nocheck
 #endif
 ENDPROC(entry_INT80_32)
 
@@ -861,7 +862,7 @@ ftrace_call:
 	popl	%edx
 	popl	%ecx
 	popl	%eax
-ftrace_ret:
+.Lftrace_ret:
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 .globl ftrace_graph_call
 ftrace_graph_call:
@@ -931,7 +932,7 @@ GLOBAL(ftrace_regs_call)
 	popl	%gs
 	addl	$8, %esp			/* Skip orig_ax and ip */
 	popf					/* Pop flags at end (no addl to corrupt flags) */
-	jmp	ftrace_ret
+	jmp	.Lftrace_ret
 
 	popf
 	jmp	ftrace_stub
@@ -942,7 +943,7 @@ ENTRY(mcount)
 	jb	ftrace_stub			/* Paging not enabled yet? */
 
 	cmpl	$ftrace_stub, ftrace_trace_function
-	jnz	trace
+	jnz	.Ltrace
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	cmpl	$ftrace_stub, ftrace_graph_return
 	jnz	ftrace_graph_caller
@@ -955,7 +956,7 @@ ftrace_stub:
 	ret
 
 	/* taken from glibc */
-trace:
+.Ltrace:
 	pushl	%eax
 	pushl	%ecx
 	pushl	%edx
@@ -1094,7 +1095,7 @@ ENTRY(nmi)
 	movl	%ss, %eax
 	cmpw	$__ESPFIX_SS, %ax
 	popl	%eax
-	je	nmi_espfix_stack
+	je	.Lnmi_espfix_stack
 #endif
 
 	pushl	%eax				# pt_regs->orig_ax
@@ -1110,7 +1111,7 @@ ENTRY(nmi)
 
 	/* Not on SYSENTER stack. */
 	call	do_nmi
-	jmp	restore_all_notrace
+	jmp	.Lrestore_all_notrace
 
 .Lnmi_from_sysenter_stack:
 	/*
@@ -1121,10 +1122,10 @@ ENTRY(nmi)
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	call	do_nmi
 	movl	%ebp, %esp
-	jmp	restore_all_notrace
+	jmp	.Lrestore_all_notrace
 
 #ifdef CONFIG_X86_ESPFIX32
-nmi_espfix_stack:
+.Lnmi_espfix_stack:
 	/*
 	 * create the pointer to lss back
 	 */
@@ -1142,7 +1143,7 @@ nmi_espfix_stack:
 	call	do_nmi
 	RESTORE_REGS
 	lss	12+4(%esp), %esp		# back to espfix stack
-	jmp	irq_return
+	jmp	.Lirq_return
 #endif
 END(nmi)
 
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index b9ceb44..d87bec4 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -249,19 +249,19 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
 #ifdef CONFIG_PARAVIRT
 	/* This is can only trip for a broken bootloader... */
 	cmpw $0x207, pa(boot_params + BP_version)
-	jb default_entry
+	jb .Ldefault_entry
 
 	/* Paravirt-compatible boot parameters.  Look to see what architecture
 		we're booting under. */
 	movl pa(boot_params + BP_hardware_subarch), %eax
 	cmpl $num_subarch_entries, %eax
-	jae bad_subarch
+	jae .Lbad_subarch
 
 	movl pa(subarch_entries)(,%eax,4), %eax
 	subl $__PAGE_OFFSET, %eax
 	jmp *%eax
 
-bad_subarch:
+.Lbad_subarch:
 WEAK(lguest_entry)
 WEAK(xen_entry)
 	/* Unknown implementation; there's really
@@ -271,14 +271,14 @@ WEAK(xen_entry)
 	__INITDATA
 
 subarch_entries:
-	.long default_entry		/* normal x86/PC */
+	.long .Ldefault_entry		/* normal x86/PC */
 	.long lguest_entry		/* lguest hypervisor */
 	.long xen_entry			/* Xen hypervisor */
-	.long default_entry		/* Moorestown MID */
+	.long .Ldefault_entry		/* Moorestown MID */
 num_subarch_entries = (. - subarch_entries) / 4
 .previous
 #else
-	jmp default_entry
+	jmp .Ldefault_entry
 #endif /* CONFIG_PARAVIRT */
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -319,7 +319,7 @@ ENTRY(startup_32_smp)
 	call load_ucode_ap
 #endif
 
-default_entry:
+.Ldefault_entry:
 #define CR0_STATE	(X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
 			 X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
 			 X86_CR0_PG)
@@ -349,7 +349,7 @@ default_entry:
 	pushfl
 	popl %eax			# get EFLAGS
 	testl $X86_EFLAGS_ID,%eax	# did EFLAGS.ID remained set?
-	jz enable_paging		# hw disallowed setting of ID bit
+	jz .Lenable_paging		# hw disallowed setting of ID bit
 					# which means no CPUID and no CR4
 
 	xorl %eax,%eax
@@ -359,13 +359,13 @@ default_entry:
 	movl $1,%eax
 	cpuid
 	andl $~1,%edx			# Ignore CPUID.FPU
-	jz enable_paging		# No flags or only CPUID.FPU = no CR4
+	jz .Lenable_paging		# No flags or only CPUID.FPU = no CR4
 
 	movl pa(mmu_cr4_features),%eax
 	movl %eax,%cr4
 
 	testb $X86_CR4_PAE, %al		# check if PAE is enabled
-	jz enable_paging
+	jz .Lenable_paging
 
 	/* Check if extended functions are implemented */
 	movl $0x80000000, %eax
@@ -373,7 +373,7 @@ default_entry:
 	/* Value must be in the range 0x80000001 to 0x8000ffff */
 	subl $0x80000001, %eax
 	cmpl $(0x8000ffff-0x80000001), %eax
-	ja enable_paging
+	ja .Lenable_paging
 
 	/* Clear bogus XD_DISABLE bits */
 	call verify_cpu
@@ -382,7 +382,7 @@ default_entry:
 	cpuid
 	/* Execute Disable bit supported? */
 	btl $(X86_FEATURE_NX & 31), %edx
-	jnc enable_paging
+	jnc .Lenable_paging
 
 	/* Setup EFER (Extended Feature Enable Register) */
 	movl $MSR_EFER, %ecx
@@ -392,7 +392,7 @@ default_entry:
 	/* Make changes effective */
 	wrmsr
 
-enable_paging:
+.Lenable_paging:
 
 /*
  * Enable paging
@@ -421,7 +421,7 @@ enable_paging:
  */
 	movb $4,X86			# at least 486
 	cmpl $-1,X86_CPUID
-	je is486
+	je .Lis486
 
 	/* get vendor info */
 	xorl %eax,%eax			# call CPUID with 0 -> return vendor ID
@@ -432,7 +432,7 @@ enable_paging:
 	movl %ecx,X86_VENDOR_ID+8	# last 4 chars
 
 	orl %eax,%eax			# do we have processor info as well?
-	je is486
+	je .Lis486
 
 	movl $1,%eax		# Use the CPUID instruction to get CPU type
 	cpuid
@@ -446,7 +446,7 @@ enable_paging:
 	movb %cl,X86_MASK
 	movl %edx,X86_CAPABILITY
 
-is486:
+.Lis486:
 	movl $0x50022,%ecx	# set AM, WP, NE and MP
 	movl %cr0,%eax
 	andl $0x80000011,%eax	# Save PG,PE,ET
-- 
2.7.4

* [PATCH v4 20/57] x86/entry/32: rename 'error_code' to 'common_exception'
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (18 preceding siblings ...)
  2016-08-18 13:05 ` [PATCH v4 19/57] x86/entry/head/32: use local labels Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 21/57] perf/x86: check perf_callchain_store() error Josh Poimboeuf
                   ` (37 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The 'error_code' label is awkwardly named, especially when it shows up
in a stack trace.  Move it to its own local function and rename it to
'common_exception', analogous to the existing 'common_interrupt'.

This also makes related stack traces more sensible.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/entry/entry_32.S | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index aed9b11..7bf9ec8 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -508,7 +508,7 @@ ENTRY(entry_INT80_32)
 ENTRY(iret_exc	)
 	pushl	$0				# no error code
 	pushl	$do_iret_error
-	jmp	error_code
+	jmp	common_exception
 .previous
 	_ASM_EXTABLE(.Lirq_return, iret_exc)
 
@@ -639,7 +639,7 @@ ENTRY(coprocessor_error)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_coprocessor_error
-	jmp	error_code
+	jmp	common_exception
 END(coprocessor_error)
 
 ENTRY(simd_coprocessor_error)
@@ -653,14 +653,14 @@ ENTRY(simd_coprocessor_error)
 #else
 	pushl	$do_simd_coprocessor_error
 #endif
-	jmp	error_code
+	jmp	common_exception
 END(simd_coprocessor_error)
 
 ENTRY(device_not_available)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
 	pushl	$do_device_not_available
-	jmp	error_code
+	jmp	common_exception
 END(device_not_available)
 
 #ifdef CONFIG_PARAVIRT
@@ -674,59 +674,59 @@ ENTRY(overflow)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_overflow
-	jmp	error_code
+	jmp	common_exception
 END(overflow)
 
 ENTRY(bounds)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_bounds
-	jmp	error_code
+	jmp	common_exception
 END(bounds)
 
 ENTRY(invalid_op)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_invalid_op
-	jmp	error_code
+	jmp	common_exception
 END(invalid_op)
 
 ENTRY(coprocessor_segment_overrun)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_coprocessor_segment_overrun
-	jmp	error_code
+	jmp	common_exception
 END(coprocessor_segment_overrun)
 
 ENTRY(invalid_TSS)
 	ASM_CLAC
 	pushl	$do_invalid_TSS
-	jmp	error_code
+	jmp	common_exception
 END(invalid_TSS)
 
 ENTRY(segment_not_present)
 	ASM_CLAC
 	pushl	$do_segment_not_present
-	jmp	error_code
+	jmp	common_exception
 END(segment_not_present)
 
 ENTRY(stack_segment)
 	ASM_CLAC
 	pushl	$do_stack_segment
-	jmp	error_code
+	jmp	common_exception
 END(stack_segment)
 
 ENTRY(alignment_check)
 	ASM_CLAC
 	pushl	$do_alignment_check
-	jmp	error_code
+	jmp	common_exception
 END(alignment_check)
 
 ENTRY(divide_error)
 	ASM_CLAC
 	pushl	$0				# no error code
 	pushl	$do_divide_error
-	jmp	error_code
+	jmp	common_exception
 END(divide_error)
 
 #ifdef CONFIG_X86_MCE
@@ -734,7 +734,7 @@ ENTRY(machine_check)
 	ASM_CLAC
 	pushl	$0
 	pushl	machine_check_vector
-	jmp	error_code
+	jmp	common_exception
 END(machine_check)
 #endif
 
@@ -742,7 +742,7 @@ ENTRY(spurious_interrupt_bug)
 	ASM_CLAC
 	pushl	$0
 	pushl	$do_spurious_interrupt_bug
-	jmp	error_code
+	jmp	common_exception
 END(spurious_interrupt_bug)
 
 #ifdef CONFIG_XEN
@@ -1006,7 +1006,7 @@ return_to_handler:
 ENTRY(trace_page_fault)
 	ASM_CLAC
 	pushl	$trace_do_page_fault
-	jmp	error_code
+	jmp	common_exception
 END(trace_page_fault)
 #endif
 
@@ -1014,7 +1014,10 @@ ENTRY(page_fault)
 	ASM_CLAC
 	pushl	$do_page_fault
 	ALIGN
-error_code:
+	jmp common_exception
+END(page_fault)
+
+common_exception:
 	/* the function address is in %gs's slot on the stack */
 	pushl	%fs
 	pushl	%es
@@ -1043,7 +1046,7 @@ error_code:
 	movl	%esp, %eax			# pt_regs pointer
 	call	*%edi
 	jmp	ret_from_exception
-END(page_fault)
+END(common_exception)
 
 ENTRY(debug)
 	/*
@@ -1160,14 +1163,14 @@ END(int3)
 
 ENTRY(general_protection)
 	pushl	$do_general_protection
-	jmp	error_code
+	jmp	common_exception
 END(general_protection)
 
 #ifdef CONFIG_KVM_GUEST
 ENTRY(async_page_fault)
 	ASM_CLAC
 	pushl	$do_async_page_fault
-	jmp	error_code
+	jmp	common_exception
 END(async_page_fault)
 #endif
 
-- 
2.7.4

* [PATCH v4 21/57] perf/x86: check perf_callchain_store() error
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (19 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 20/57] x86/entry/32: rename 'error_code' to 'common_exception' Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 22/57] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
                   ` (36 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Add a check to perf_callchain_kernel() so that it returns early if the
callchain entry array is already full.
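
This relies on the return convention of perf_callchain_store(), which,
as assumed here, reports failure once the entry array is full:

  /* returns 0 on success, nonzero once the entry limit is reached */
  if (perf_callchain_store(entry, regs->ip))
  	return;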

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/events/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18a1acf..dcaa887 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2297,7 +2297,8 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 		return;
 	}
 
-	perf_callchain_store(entry, regs->ip);
+	if (perf_callchain_store(entry, regs->ip))
+		return;
 
 	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
 }
-- 
2.7.4

* [PATCH v4 22/57] oprofile/x86: add regs->ip to oprofile trace
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (20 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 21/57] perf/x86: check perf_callchain_store() error Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 23/57] proc: fix return address printk conversion specifier in /proc/<pid>/stack Josh Poimboeuf
                   ` (35 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish, Robert Richter

dump_trace() doesn't add the interrupted instruction's address to the
trace, so add it manually.

Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/oprofile/backtrace.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index c594768..d950f9e 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,8 +113,14 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
-		if (depth)
-			dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
+		if (!depth)
+			return;
+
+		oprofile_add_trace(regs->ip);
+		if (!--depth)
+			return;
+
+		dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
 		return;
 	}
 
-- 
2.7.4

* [PATCH v4 23/57] proc: fix return address printk conversion specifier in /proc/<pid>/stack
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (21 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 22/57] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 24/57] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config Josh Poimboeuf
                   ` (34 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When printing call return addresses found on a stack, /proc/<pid>/stack
can sometimes give a confusing result.  If the call instruction was the
last instruction in the function (which can happen when calling a
noreturn function), '%pS' will incorrectly display the name of the
function which happens to be next in the object code, rather than the
name of the actual calling function.

Use '%pB' instead, which was created for this exact purpose.
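
A hedged example of the failure mode, with hypothetical functions:

  extern void do_fatal_thing(void) __attribute__((noreturn));

  void caller(void)
  {
  	do_fatal_thing();	/* the call is caller()'s last instruction */
  }

  void next_func(void) { }

The saved return address points at the first byte of next_func(), so
'%pS' resolves it to "next_func+0x0".  '%pB' subtracts one from the
address before the symbol lookup and prints "caller+0x..." instead.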

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 fs/proc/base.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 54e2702..e9ff186 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -483,7 +483,7 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
 		save_stack_trace_tsk(task, &trace);
 
 		for (i = 0; i < trace.nr_entries; i++) {
-			seq_printf(m, "[<%pK>] %pS\n",
+			seq_printf(m, "[<%pK>] %pB\n",
 				   (void *)entries[i], (void *)entries[i]);
 		}
 		unlock_trace(task);
-- 
2.7.4

* [PATCH v4 24/57] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (22 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 23/57] proc: fix return address printk conversion specifier in /proc/<pid>/stack Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 25/57] ftrace: only allocate the ret_stack 'fp' field when needed Josh Poimboeuf
                   ` (33 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Make HAVE_FUNCTION_GRAPH_FP_TEST a normal define, independent of
kconfig.  This removes some config file pollution and simplifies the
checks for the fp test.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/arm64/kernel/entry-ftrace.S     | 2 +-
 arch/blackfin/kernel/ftrace-entry.S  | 4 ++--
 arch/sparc/Kconfig                   | 1 -
 arch/sparc/include/asm/ftrace.h      | 4 ++++
 arch/x86/Kconfig                     | 1 -
 arch/x86/include/asm/ftrace.h        | 1 +
 kernel/trace/Kconfig                 | 5 -----
 kernel/trace/trace_functions_graph.c | 2 +-
 8 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 0f03a8f..aef02d2 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -219,7 +219,7 @@ ENDPROC(ftrace_graph_caller)
  *
  * Run ftrace_return_to_handler() before going back to parent.
  * @fp is checked against the value passed by ftrace_graph_caller()
- * only when CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST is enabled.
+ * only when HAVE_FUNCTION_GRAPH_FP_TEST is enabled.
  */
 ENTRY(return_to_handler)
 	save_return_regs
diff --git a/arch/blackfin/kernel/ftrace-entry.S b/arch/blackfin/kernel/ftrace-entry.S
index 28d0595..3b8bdcb 100644
--- a/arch/blackfin/kernel/ftrace-entry.S
+++ b/arch/blackfin/kernel/ftrace-entry.S
@@ -169,7 +169,7 @@ ENTRY(_ftrace_graph_caller)
 	r0 = sp;	/* unsigned long *parent */
 	r1 = [sp];	/* unsigned long self_addr */
 # endif
-# ifdef CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST
+# ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	r2 = fp;	/* unsigned long frame_pointer */
 # endif
 	r0 += 16;	/* skip the 4 local regs on stack */
@@ -190,7 +190,7 @@ ENTRY(_return_to_handler)
 	[--sp] = r1;
 
 	/* get original return address */
-# ifdef CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST
+# ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	r0 = fp;	/* Blackfin is sane, so omit this */
 # endif
 	call _ftrace_return_to_handler;
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 59b0960..f5d60f1 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -56,7 +56,6 @@ config SPARC64
 	def_bool 64BIT
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_GRAPH_TRACER
-	select HAVE_FUNCTION_GRAPH_FP_TEST
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select HAVE_RCU_TABLE_FREE if SMP
diff --git a/arch/sparc/include/asm/ftrace.h b/arch/sparc/include/asm/ftrace.h
index 3192a8e..62755a3 100644
--- a/arch/sparc/include/asm/ftrace.h
+++ b/arch/sparc/include/asm/ftrace.h
@@ -9,6 +9,10 @@
 void _mcount(void);
 #endif
 
+#endif /* CONFIG_MCOUNT */
+
+#if defined(CONFIG_SPARC64) && !defined(CC_USE_FENTRY)
+#define HAVE_FUNCTION_GRAPH_FP_TEST
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c580d8c..acf85ae 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -110,7 +110,6 @@ config X86
 	select HAVE_EXIT_THREAD
 	select HAVE_FENTRY			if X86_64
 	select HAVE_FTRACE_MCOUNT_RECORD
-	select HAVE_FUNCTION_GRAPH_FP_TEST
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACER
 	select HAVE_GCC_PLUGINS
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index a4820d4..37f67cb 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -6,6 +6,7 @@
 # define MCOUNT_ADDR		((unsigned long)(__fentry__))
 #else
 # define MCOUNT_ADDR		((unsigned long)(mcount))
+# define HAVE_FUNCTION_GRAPH_FP_TEST
 #endif
 #define MCOUNT_INSN_SIZE	5 /* sizeof mcount call */
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index f4b86e8..ba33267 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -24,11 +24,6 @@ config HAVE_FUNCTION_GRAPH_TRACER
 	help
 	  See Documentation/trace/ftrace-design.txt
 
-config HAVE_FUNCTION_GRAPH_FP_TEST
-	bool
-	help
-	  See Documentation/trace/ftrace-design.txt
-
 config HAVE_DYNAMIC_FTRACE
 	bool
 	help
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 7363ccf..fc173cd 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -204,7 +204,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 		return;
 	}
 
-#if defined(CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST) && !defined(CC_USING_FENTRY)
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	/*
 	 * The arch may choose to record the frame pointer used
 	 * and check it here to make sure that it is what we expect it
-- 
2.7.4

* [PATCH v4 25/57] ftrace: only allocate the ret_stack 'fp' field when needed
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (23 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 24/57] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 26/57] ftrace: add return address pointer to ftrace_ret_stack Josh Poimboeuf
                   ` (32 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

This saves some memory when HAVE_FUNCTION_GRAPH_FP_TEST isn't defined.
On x86_64 with newer versions of gcc which have -mfentry, it saves 400
bytes per task.
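
The arithmetic: the per-task ret_stack array has FTRACE_RETFUNC_DEPTH
(50) entries, so dropping one unsigned long field saves 50 * 8 = 400
bytes per task on 64-bit (50 * 4 = 200 bytes on 32-bit).  With -mfentry
there is no frame pointer to test at the hook site, so
HAVE_FUNCTION_GRAPH_FP_TEST is left undefined and the 'fp' field would
otherwise be dead weight.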

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 include/linux/ftrace.h               | 2 ++
 kernel/trace/trace_functions_graph.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7d565af..4ad9ccc 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -795,7 +795,9 @@ struct ftrace_ret_stack {
 	unsigned long func;
 	unsigned long long calltime;
 	unsigned long long subtime;
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	unsigned long fp;
+#endif
 };
 
 /*
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index fc173cd..0e03ed0 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -171,7 +171,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 	current->ret_stack[index].func = func;
 	current->ret_stack[index].calltime = calltime;
 	current->ret_stack[index].subtime = 0;
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	current->ret_stack[index].fp = frame_pointer;
+#endif
 	*depth = current->curr_ret_stack;
 
 	return 0;
-- 
2.7.4

* [PATCH v4 26/57] ftrace: add return address pointer to ftrace_ret_stack
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (24 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 25/57] ftrace: only allocate the ret_stack 'fp' field when needed Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 27/57] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers Josh Poimboeuf
                   ` (31 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack.  Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.

Note that an array of 50 ftrace_ret_stack structs is allocated for every
task.  So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).
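
A sketch of what 'retp' points at, using the existing x86 frame layout
(struct stack_frame from asm/stacktrace.h):

  struct stack_frame {
  	struct stack_frame *next_frame;
  	unsigned long return_address;
  };

  unsigned long *retp = &frame->return_address;	/* the slot ftrace rewrites */

An unwinder can then match the right ret_stack entry by comparing this
address, with no iterator state needed.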

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 Documentation/trace/ftrace-design.txt | 11 +++++++++++
 arch/arm/kernel/ftrace.c              |  2 +-
 arch/arm64/kernel/ftrace.c            |  2 +-
 arch/blackfin/kernel/ftrace.c         |  2 +-
 arch/microblaze/kernel/ftrace.c       |  2 +-
 arch/mips/kernel/ftrace.c             |  4 ++--
 arch/parisc/kernel/ftrace.c           |  2 +-
 arch/powerpc/kernel/ftrace.c          |  3 ++-
 arch/s390/kernel/ftrace.c             |  3 ++-
 arch/sh/kernel/ftrace.c               |  2 +-
 arch/sparc/kernel/ftrace.c            |  2 +-
 arch/tile/kernel/ftrace.c             |  2 +-
 arch/x86/kernel/ftrace.c              |  2 +-
 include/linux/ftrace.h                |  5 ++++-
 kernel/trace/trace_functions_graph.c  |  5 ++++-
 15 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
index dd5f916..a273dd0 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.txt
@@ -203,6 +203,17 @@ along to ftrace_push_return_trace() instead of a stub value of 0.
 
 Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer.
 
+HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+--------------------------------
+
+An arch may pass in a pointer to the return address on the stack.  This
+prevents potential stack unwinding issues where the unwinder gets out of
+sync with ret_stack and the wrong addresses are reported by
+ftrace_graph_ret_addr().
+
+Adding support for it is easy: just define the macro in asm/ftrace.h and
+pass the return address pointer as the 'retp' argument to
+ftrace_push_return_trace().
 
 HAVE_FTRACE_NMI_ENTER
 ---------------------
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 709ee1d..3f17594 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -218,7 +218,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 	}
 
 	err = ftrace_push_return_trace(old, self_addr, &trace.depth,
-				       frame_pointer);
+				       frame_pointer, NULL);
 	if (err == -EBUSY) {
 		*parent = old;
 		return;
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index ebecf9a..40ad08a 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -138,7 +138,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 		return;
 
 	err = ftrace_push_return_trace(old, self_addr, &trace.depth,
-				       frame_pointer);
+				       frame_pointer, NULL);
 	if (err == -EBUSY)
 		return;
 	else
diff --git a/arch/blackfin/kernel/ftrace.c b/arch/blackfin/kernel/ftrace.c
index 095de0f..8dad758 100644
--- a/arch/blackfin/kernel/ftrace.c
+++ b/arch/blackfin/kernel/ftrace.c
@@ -107,7 +107,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 		return;
 
 	if (ftrace_push_return_trace(*parent, self_addr, &trace.depth,
-	                             frame_pointer) == -EBUSY)
+				     frame_pointer, NULL) == -EBUSY)
 		return;
 
 	trace.func = self_addr;
diff --git a/arch/microblaze/kernel/ftrace.c b/arch/microblaze/kernel/ftrace.c
index fc7b48a..d57563c 100644
--- a/arch/microblaze/kernel/ftrace.c
+++ b/arch/microblaze/kernel/ftrace.c
@@ -63,7 +63,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
 		return;
 	}
 
-	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0);
+	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0, NULL);
 	if (err == -EBUSY) {
 		*parent = old;
 		return;
diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
index 937c54b..30a3b75 100644
--- a/arch/mips/kernel/ftrace.c
+++ b/arch/mips/kernel/ftrace.c
@@ -382,8 +382,8 @@ void prepare_ftrace_return(unsigned long *parent_ra_addr, unsigned long self_ra,
 	if (unlikely(faulted))
 		goto out;
 
-	if (ftrace_push_return_trace(old_parent_ra, self_ra, &trace.depth, fp)
-	    == -EBUSY) {
+	if (ftrace_push_return_trace(old_parent_ra, self_ra, &trace.depth, fp,
+				     NULL) == -EBUSY) {
 		*parent_ra_addr = old_parent_ra;
 		return;
 	}
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index a828a0a..5a5506a 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -48,7 +48,7 @@ static void __hot prepare_ftrace_return(unsigned long *parent,
 		return;
 
         if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-			0 ) == -EBUSY)
+				     0, NULL) == -EBUSY)
                 return;
 
 	/* activate parisc_return_to_handler() as return point */
diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index cc52d97..a95639b 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -593,7 +593,8 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip)
 	if (!ftrace_graph_entry(&trace))
 		goto out;
 
-	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0) == -EBUSY)
+	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0,
+				     NULL) == -EBUSY)
 		goto out;
 
 	parent = return_hooker;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index 0f7bfeb..60a8a4e 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -209,7 +209,8 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip)
 	/* Only trace if the calling function expects to. */
 	if (!ftrace_graph_entry(&trace))
 		goto out;
-	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0) == -EBUSY)
+	if (ftrace_push_return_trace(parent, ip, &trace.depth, 0,
+				     NULL) == -EBUSY)
 		goto out;
 	parent = (unsigned long) return_to_handler;
 out:
diff --git a/arch/sh/kernel/ftrace.c b/arch/sh/kernel/ftrace.c
index 38993e0..95eccd4 100644
--- a/arch/sh/kernel/ftrace.c
+++ b/arch/sh/kernel/ftrace.c
@@ -382,7 +382,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
 		return;
 	}
 
-	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0);
+	err = ftrace_push_return_trace(old, self_addr, &trace.depth, 0, NULL);
 	if (err == -EBUSY) {
 		__raw_writel(old, parent);
 		return;
diff --git a/arch/sparc/kernel/ftrace.c b/arch/sparc/kernel/ftrace.c
index 0a2d2dd..6bcff69 100644
--- a/arch/sparc/kernel/ftrace.c
+++ b/arch/sparc/kernel/ftrace.c
@@ -131,7 +131,7 @@ unsigned long prepare_ftrace_return(unsigned long parent,
 		return parent + 8UL;
 
 	if (ftrace_push_return_trace(parent, self_addr, &trace.depth,
-				     frame_pointer) == -EBUSY)
+				     frame_pointer, NULL) == -EBUSY)
 		return parent + 8UL;
 
 	trace.func = self_addr;
diff --git a/arch/tile/kernel/ftrace.c b/arch/tile/kernel/ftrace.c
index 4a57208..b827a41 100644
--- a/arch/tile/kernel/ftrace.c
+++ b/arch/tile/kernel/ftrace.c
@@ -184,7 +184,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 	*parent = return_hooker;
 
 	err = ftrace_push_return_trace(old, self_addr, &trace.depth,
-				       frame_pointer);
+				       frame_pointer, NULL);
 	if (err == -EBUSY) {
 		*parent = old;
 		return;
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index d036cfb..ae3b1fb 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -1029,7 +1029,7 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 	}
 
 	if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-		    frame_pointer) == -EBUSY) {
+				     frame_pointer, NULL) == -EBUSY) {
 		*parent = old;
 		return;
 	}
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 4ad9ccc..483e02a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -798,6 +798,9 @@ struct ftrace_ret_stack {
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	unsigned long fp;
 #endif
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+	unsigned long *retp;
+#endif
 };
 
 /*
@@ -809,7 +812,7 @@ extern void return_to_handler(void);
 
 extern int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
-			 unsigned long frame_pointer);
+			 unsigned long frame_pointer, unsigned long *retp);
 
 /*
  * Sometimes we don't want to trace a function with the function
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 0e03ed0..f7212ec 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -119,7 +119,7 @@ print_graph_duration(struct trace_array *tr, unsigned long long duration,
 /* Add a function return address to the trace stack on thread info.*/
 int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
-			 unsigned long frame_pointer)
+			 unsigned long frame_pointer, unsigned long *retp)
 {
 	unsigned long long calltime;
 	int index;
@@ -174,6 +174,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	current->ret_stack[index].fp = frame_pointer;
 #endif
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+	current->ret_stack[index].retp = retp;
+#endif
 	*depth = current->curr_ret_stack;
 
 	return 0;
-- 
2.7.4

* [PATCH v4 27/57] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (25 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 26/57] ftrace: add return address pointer to ftrace_ret_stack Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 28/57] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr() Josh Poimboeuf
                   ` (30 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When function graph tracing is enabled for a function, ftrace modifies
the stack by replacing the original return address with the address of a
hook function (return_to_handler).

Stack unwinders need a way to get the original return address.  Add an
arch-independent helper function for that purpose, named
ftrace_graph_ret_addr().

This adds two variations of the function: one depends on
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, and the other relies on an index state
variable.

The former is recommended because, in some cases, the latter can cause
problems when the unwinder skips stack frames.  It can get out of sync
with the ret_stack index and wrong addresses can be reported for the
stack trace.

Once all arches have been ported to use
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, we can get rid of the distinction.
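
A hedged sketch of how an unwinder consumes the helper (the variable
names are illustrative; the next patch does the real conversion):

  int graph_idx = 0;	/* state used only by the index-based fallback */
  unsigned long addr = *retp;

  addr = ftrace_graph_ret_addr(task, &graph_idx, addr, retp);
  /* 'addr' is now the original return address, if it was hooked */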

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 include/linux/ftrace.h               | 10 +++++++
 kernel/trace/trace_functions_graph.c | 58 ++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 483e02a..6f93ac4 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -814,6 +814,9 @@ extern int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 			 unsigned long frame_pointer, unsigned long *retp);
 
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp);
+
 /*
  * Sometimes we don't want to trace a function with the function
  * graph tracer but we want them to keep traced by the usual function
@@ -875,6 +878,13 @@ static inline int task_curr_ret_stack(struct task_struct *tsk)
 	return -1;
 }
 
+static inline unsigned long
+ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
+		      unsigned long *retp)
+{
+	return ret;
+}
+
 static inline void pause_graph_tracing(void) { }
 static inline void unpause_graph_tracing(void) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index f7212ec..0cbe38a 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -284,6 +284,64 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	return ret;
 }
 
+/**
+ * ftrace_graph_ret_addr - convert a potentially modified stack return address
+ *			   to its original value
+ *
+ * This function can be called by stack unwinding code to convert a found stack
+ * return address ('ret') to its original value, in case the function graph
+ * tracer has modified it to be 'return_to_handler'.  If the address hasn't
+ * been modified, the unchanged value of 'ret' is returned.
+ *
+ * 'idx' is a state variable which should be initialized by the caller to zero
+ * before the first call.
+ *
+ * 'retp' is a pointer to the return address on the stack.  It's ignored if
+ * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
+ */
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp)
+{
+	int index = task->curr_ret_stack;
+	int i;
+
+	if (ret != (unsigned long)return_to_handler)
+		return ret;
+
+	if (index < -1)
+		index += FTRACE_NOTRACE_DEPTH;
+
+	if (index < 0)
+		return ret;
+
+	for (i = 0; i <= index; i++)
+		if (task->ret_stack[i].retp == retp)
+			return task->ret_stack[i].ret;
+
+	return ret;
+}
+#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+				    unsigned long ret, unsigned long *retp)
+{
+	int task_idx;
+
+	if (ret != (unsigned long)return_to_handler)
+		return ret;
+
+	task_idx = task->curr_ret_stack;
+
+	if (!task->ret_stack || task_idx < *idx)
+		return ret;
+
+	task_idx -= *idx;
+	(*idx)++;
+
+	return task->ret_stack[task_idx].ret;
+}
+#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+
 int __trace_graph_entry(struct trace_array *tr,
 				struct ftrace_graph_ent *trace,
 				unsigned long flags,
-- 
2.7.4

* [PATCH v4 28/57] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr()
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (26 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 27/57] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 29/57] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Josh Poimboeuf
                   ` (29 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert print_context_stack() and print_context_stack_bp() to use the
arch-independent ftrace_graph_ret_addr() helper.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 65 +++++++++++++++------------------------------
 1 file changed, 22 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 692eecae..b374d85 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -39,38 +39,6 @@ void printk_address(unsigned long address)
 	pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
 }
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static void
-print_ftrace_graph_addr(unsigned long addr, void *data,
-			const struct stacktrace_ops *ops,
-			struct task_struct *task, int *graph)
-{
-	unsigned long ret_addr;
-	int index;
-
-	if (addr != (unsigned long)return_to_handler)
-		return;
-
-	index = task->curr_ret_stack;
-
-	if (!task->ret_stack || index < *graph)
-		return;
-
-	index -= *graph;
-	ret_addr = task->ret_stack[index].ret;
-
-	ops->address(data, ret_addr, 1);
-
-	(*graph)++;
-}
-#else
-static inline void
-print_ftrace_graph_addr(unsigned long addr, void *data,
-			const struct stacktrace_ops *ops,
-			struct task_struct *task, int *graph)
-{ }
-#endif
-
 /*
  * x86-64 can have up to three kernel stacks:
  * process stack
@@ -108,18 +76,24 @@ print_context_stack(struct task_struct *task,
 		stack = (unsigned long *)task_stack_page(task);
 
 	while (valid_stack_ptr(task, stack, sizeof(*stack), end)) {
-		unsigned long addr;
+		unsigned long addr = *stack;
 
-		addr = *stack;
 		if (__kernel_text_address(addr)) {
+			unsigned long real_addr;
+			int reliable = 0;
+
 			if ((unsigned long) stack == bp + sizeof(long)) {
-				ops->address(data, addr, 1);
+				reliable = 1;
 				frame = frame->next_frame;
 				bp = (unsigned long) frame;
-			} else {
-				ops->address(data, addr, 0);
 			}
-			print_ftrace_graph_addr(addr, data, ops, task, graph);
+
+			ops->address(data, addr, reliable);
+
+			real_addr = ftrace_graph_ret_addr(task, graph, addr,
+							  stack);
+			if (real_addr != addr)
+				ops->address(data, real_addr, 1);
 		}
 		stack++;
 	}
@@ -134,19 +108,24 @@ print_context_stack_bp(struct task_struct *task,
 		       unsigned long *end, int *graph)
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
-	unsigned long *ret_addr = &frame->return_address;
+	unsigned long *retp = &frame->return_address;
 
-	while (valid_stack_ptr(task, ret_addr, sizeof(*ret_addr), end)) {
-		unsigned long addr = *ret_addr;
+	while (valid_stack_ptr(task, retp, sizeof(*retp), end)) {
+		unsigned long addr = *retp;
+		unsigned long real_addr;
 
 		if (!__kernel_text_address(addr))
 			break;
 
 		if (ops->address(data, addr, 1))
 			break;
+
+		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
+		if (real_addr != addr)
+			ops->address(data, real_addr, 1);
+
 		frame = frame->next_frame;
-		ret_addr = &frame->return_address;
-		print_ftrace_graph_addr(addr, data, ops, task, graph);
+		retp = &frame->return_address;
 	}
 
 	return (unsigned long)frame;
-- 
2.7.4

* [PATCH v4 29/57] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (27 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 28/57] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr() Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 30/57] x86/dumpstack/ftrace: mark function graph handler function as unreliable Josh Poimboeuf
                   ` (28 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

This allows the use of the more reliable version of
ftrace_graph_ret_addr() so we no longer have to worry about the unwinder
getting out of sync with the function graph ret_stack index, which can
happen if the unwinder skips any frames before calling
ftrace_graph_ret_addr().

This fixes the following issue (and several others like it):

  Before:
  $ cat /proc/self/stack
  [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
  [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110
  [<ffffffff813127c4>] proc_single_show+0x54/0x80
  [<ffffffff812be088>] seq_read+0x108/0x3e0
  [<ffffffff812923d7>] __vfs_read+0x37/0x140
  [<ffffffff812929d9>] vfs_read+0x99/0x140
  [<ffffffff81293f28>] SyS_read+0x58/0xc0
  [<ffffffff818af97c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
  [<ffffffffffffffff>] 0xffffffffffffffff

  After:
  $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
  $ cat /proc/self/stack
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff810394cc>] print_context_stack+0xfc/0x100
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff8103891b>] dump_trace+0x12b/0x350
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff813127c4>] proc_single_show+0x54/0x80
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff812be088>] seq_read+0x108/0x3e0
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff812923d7>] __vfs_read+0x37/0x140
  [<ffffffff818b2428>] ? return_to_handler+0x0/0x27
  [<ffffffff812929d9>] vfs_read+0x99/0x140
  [<ffffffffffffffff>] 0xffffffffffffffff

Enabling function graph tracing caused the stack trace to change: it was
offset by 2 frames because the unwinder started with an earlier stack
frame and got out of sync with the ret_stack index.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/ftrace.h | 2 ++
 arch/x86/kernel/ftrace.c      | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 37f67cb..eccd0ac 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -14,6 +14,8 @@
 #define ARCH_SUPPORTS_FTRACE_OPS 1
 #endif
 
+#define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+
 #ifndef __ASSEMBLY__
 extern void mcount(void);
 extern atomic_t modifying_ftrace_code;
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ae3b1fb..8639bb2 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -1029,7 +1029,7 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 	}
 
 	if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-				     frame_pointer, NULL) == -EBUSY) {
+				     frame_pointer, parent) == -EBUSY) {
 		*parent = old;
 		return;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 30/57] x86/dumpstack/ftrace: mark function graph handler function as unreliable
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (28 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 29/57] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 31/57] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp() Josh Poimboeuf
                   ` (27 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When function graph tracing is enabled for a function, its return
address on the stack is replaced with the address of an ftrace handler
(return_to_handler).

Currently 'return_to_handler' can be reported as reliable.  That's not
ideal, and can actually be misleading.  When saving or dumping the
stack, you normally only care about what led up to that point (the call
path), rather than what will happen in the future (the return path).

That's especially true in the non-oops stack trace case, which isn't
used for debugging.  For example, in a perf profiling operation,
reporting return_to_handler() in the trace would just be confusing.

And in the oops case, where debugging is important, "unreliable" is also
more appropriate because it serves as a hint that graph tracing was
involved, rather than implying that return_to_handler() was the real
caller.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index b374d85..33f2899 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -88,12 +88,21 @@ print_context_stack(struct task_struct *task,
 				bp = (unsigned long) frame;
 			}
 
-			ops->address(data, addr, reliable);
-
+			/*
+			 * When function graph tracing is enabled for a
+			 * function, its return address on the stack is
+			 * replaced with the address of an ftrace handler
+			 * (return_to_handler).  In that case, before printing
+			 * the "real" address, we want to print the handler
+			 * address as an "unreliable" hint that function graph
+			 * tracing was involved.
+			 */
 			real_addr = ftrace_graph_ret_addr(task, graph, addr,
 							  stack);
 			if (real_addr != addr)
-				ops->address(data, real_addr, 1);
+				ops->address(data, addr, 0);
+
+			ops->address(data, real_addr, reliable);
 		}
 		stack++;
 	}
@@ -117,12 +126,11 @@ print_context_stack_bp(struct task_struct *task,
 		if (!__kernel_text_address(addr))
 			break;
 
-		if (ops->address(data, addr, 1))
-			break;
-
 		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
-		if (real_addr != addr)
-			ops->address(data, real_addr, 1);
+		if (real_addr != addr && ops->address(data, addr, 0))
+			break;
+		if (ops->address(data, real_addr, 1))
+			break;
 
 		frame = frame->next_frame;
 		retp = &frame->return_address;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 31/57] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp()
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (29 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 30/57] x86/dumpstack/ftrace: mark function graph handler function as unreliable Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 32/57] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
                   ` (26 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When function graph tracing is enabled, print_context_stack_bp() can
report return_to_handler() as an unreliable address, which is confusing
and misleading: return_to_handler() is really only useful as a hint for
debugging, whereas print_context_stack_bp() users only care about the
actual 'reliable' call path.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 33f2899..c6c6c39 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -127,8 +127,6 @@ print_context_stack_bp(struct task_struct *task,
 			break;
 
 		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
-		if (real_addr != addr && ops->address(data, addr, 0))
-			break;
 		if (ops->address(data, real_addr, 1))
 			break;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 32/57] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (30 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 31/57] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp() Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 33/57] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
                   ` (25 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

show_stack_log_lvl() and dump_trace() are already preemption safe:

- If they're running in irq or exception context, preemption is already
  disabled and the percpu stack pointers can be trusted.

- If they're running with preemption enabled, they must be running on
  the task stack anyway, so it doesn't matter if they're comparing the
  stack pointer against a percpu stack pointer from this CPU or another
  one: either way it won't match.
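
To see why the comparison is harmless either way, here is a toy
userspace model (all addresses invented; the real bounds come from the
percpu irq stack pointers):

  #include <stdbool.h>
  #include <stdio.h>

  struct range { unsigned long begin, end; };

  static bool in_irq_stack(unsigned long sp, const struct range *irq)
  {
  	return sp >= irq->begin && sp < irq->end;
  }

  int main(void)
  {
  	struct range cpu0_irq = { 0x1000, 0x2000 };	/* this CPU    */
  	struct range cpu1_irq = { 0x3000, 0x4000 };	/* another CPU */
  	unsigned long task_sp = 0x9000;			/* task stack  */

  	/*
  	 * If we're preemptible we must be on the task stack, so it
  	 * doesn't matter whose irq stack bounds we happen to read:
  	 * the comparison fails either way.
  	 */
  	printf("%d %d\n", in_irq_stack(task_sp, &cpu0_irq),
  			  in_irq_stack(task_sp, &cpu1_irq));
  	return 0;
  }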

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_32.c | 14 ++++++--------
 arch/x86/kernel/dumpstack_64.c | 26 +++++++++-----------------
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index c533b8b..b07d5c9 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -24,16 +24,16 @@ static void *is_irq_stack(void *p, void *irq)
 }
 
 
-static void *is_hardirq_stack(unsigned long *stack, int cpu)
+static void *is_hardirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(hardirq_stack, cpu);
+	void *irq = this_cpu_read(hardirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
 
-static void *is_softirq_stack(unsigned long *stack, int cpu)
+static void *is_softirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(softirq_stack, cpu);
+	void *irq = this_cpu_read(softirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
@@ -42,7 +42,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
 	int graph = 0;
 	u32 *prev_esp;
 
@@ -53,9 +52,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	for (;;) {
 		void *end_stack;
 
-		end_stack = is_hardirq_stack(stack, cpu);
+		end_stack = is_hardirq_stack(stack);
 		if (!end_stack)
-			end_stack = is_softirq_stack(stack, cpu);
+			end_stack = is_softirq_stack(stack);
 
 		bp = ops->walk_stack(task, stack, bp, ops, data,
 				     end_stack, &graph);
@@ -74,7 +73,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			break;
 		touch_nmi_watchdog();
 	}
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 491f2fd..f1b843a 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -31,8 +31,8 @@ static char x86_stack_ids[][8] = {
 #endif
 };
 
-static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
-					 unsigned *usedp, char **idp)
+static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
+					 char **idp)
 {
 	unsigned k;
 
@@ -41,7 +41,7 @@ static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
 	 * 'stack' is in one of them:
 	 */
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		unsigned long end = per_cpu(orig_ist, cpu).ist[k];
+		unsigned long end = raw_cpu_ptr(&orig_ist)->ist[k];
 		/*
 		 * Is 'stack' above this exception frame's end?
 		 * If yes then skip to the next frame.
@@ -111,7 +111,7 @@ enum stack_type {
 };
 
 static enum stack_type
-analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
+analyze_stack(struct task_struct *task, unsigned long *stack,
 	      unsigned long **stack_end, unsigned long *irq_stack,
 	      unsigned *used, char **id)
 {
@@ -121,8 +121,7 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
 	if ((unsigned long)task_stack_page(task) == addr)
 		return STACK_IS_NORMAL;
 
-	*stack_end = in_exception_stack(cpu, (unsigned long)stack,
-					used, id);
+	*stack_end = in_exception_stack((unsigned long)stack, used, id);
 	if (*stack_end)
 		return STACK_IS_EXCEPTION;
 
@@ -149,8 +148,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
-	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
+	unsigned long *irq_stack = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
@@ -169,8 +167,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		enum stack_type stype;
 		char *id;
 
-		stype = analyze_stack(cpu, task, stack, &stack_end,
-				      irq_stack, &used, &id);
+		stype = analyze_stack(task, stack, &stack_end, irq_stack, &used,
+				      &id);
 
 		/* Default finish unless specified to continue */
 		done = 1;
@@ -225,7 +223,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * This handles the process stack:
 	 */
 	bp = ops->walk_stack(task, stack, bp, ops, data, NULL, &graph);
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
@@ -236,13 +233,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *irq_stack_end;
 	unsigned long *irq_stack;
 	unsigned long *stack;
-	int cpu;
 	int i;
 
-	preempt_disable();
-	cpu = smp_processor_id();
-
-	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
 	sp = sp ? : get_stack_pointer(task, regs);
@@ -274,7 +267,6 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		stack++;
 		touch_nmi_watchdog();
 	}
-	preempt_enable();
 
 	pr_cont("\n");
 	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 33/57] x86/dumpstack: simplify in_exception_stack()
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (31 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 32/57] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 34/57] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
                   ` (24 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

in_exception_stack() does some bad, bad things just so the unwinder can
print different values for different areas of the debug exception stack.

There's no need to clarify where exactly on the stack it is.  Just print
"#DB" and be done with it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c | 89 ++++++++++++------------------------------
 1 file changed, 26 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f1b843a..69f6ba2 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -16,83 +16,46 @@
 
 #include <asm/stacktrace.h>
 
+static char *exception_stack_names[N_EXCEPTION_STACKS] = {
+		[ DOUBLEFAULT_STACK-1	]	= "#DF",
+		[ NMI_STACK-1		]	= "NMI",
+		[ DEBUG_STACK-1		]	= "#DB",
+		[ MCE_STACK-1		]	= "#MC",
+};
 
-#define N_EXCEPTION_STACKS_END \
-		(N_EXCEPTION_STACKS + DEBUG_STKSZ/EXCEPTION_STKSZ - 2)
-
-static char x86_stack_ids[][8] = {
-		[ DEBUG_STACK-1			]	= "#DB",
-		[ NMI_STACK-1			]	= "NMI",
-		[ DOUBLEFAULT_STACK-1		]	= "#DF",
-		[ MCE_STACK-1			]	= "#MC",
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
-		[ N_EXCEPTION_STACKS ...
-		  N_EXCEPTION_STACKS_END	]	= "#DB[?]"
-#endif
+static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
+	[0 ... N_EXCEPTION_STACKS - 1]		= EXCEPTION_STKSZ,
+	[DEBUG_STACK - 1]			= DEBUG_STKSZ
 };
 
 static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
 					 char **idp)
 {
+	unsigned long begin, end;
 	unsigned k;
 
-	/*
-	 * Iterate over all exception stacks, and figure out whether
-	 * 'stack' is in one of them:
-	 */
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		unsigned long end = raw_cpu_ptr(&orig_ist)->ist[k];
-		/*
-		 * Is 'stack' above this exception frame's end?
-		 * If yes then skip to the next frame.
-		 */
-		if (stack >= end)
+		end   = raw_cpu_ptr(&orig_ist)->ist[k];
+		begin = end - exception_stack_sizes[k];
+
+		if (stack < begin || stack >= end)
 			continue;
+
 		/*
-		 * Is 'stack' above this exception frame's start address?
-		 * If yes then we found the right frame.
-		 */
-		if (stack >= end - EXCEPTION_STKSZ) {
-			/*
-			 * Make sure we only iterate through an exception
-			 * stack once. If it comes up for the second time
-			 * then there's something wrong going on - just
-			 * break out and return NULL:
-			 */
-			if (*usedp & (1U << k))
-				break;
-			*usedp |= 1U << k;
-			*idp = x86_stack_ids[k];
-			return (unsigned long *)end;
-		}
-		/*
-		 * If this is a debug stack, and if it has a larger size than
-		 * the usual exception stacks, then 'stack' might still
-		 * be within the lower portion of the debug stack:
+		 * Make sure we only iterate through an exception stack once.
+		 * If it comes up for the second time then there's something
+		 * wrong going on - just break and return NULL:
 		 */
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
-		if (k == DEBUG_STACK - 1 && stack >= end - DEBUG_STKSZ) {
-			unsigned j = N_EXCEPTION_STACKS - 1;
+		if (*usedp & (1U << k))
+			break;
+		*usedp |= 1U << k;
 
-			/*
-			 * Black magic. A large debug stack is composed of
-			 * multiple exception stack entries, which we
-			 * iterate through now. Dont look:
-			 */
-			do {
-				++j;
-				end -= EXCEPTION_STKSZ;
-				x86_stack_ids[j][4] = '1' +
-						(j - N_EXCEPTION_STACKS);
-			} while (stack < end - EXCEPTION_STKSZ);
-			if (*usedp & (1U << j))
-				break;
-			*usedp |= 1U << j;
-			*idp = x86_stack_ids[j];
-			return (unsigned long *)end;
-		}
-#endif
+		*idp = exception_stack_names[k];
+		return (unsigned long *)end;
 	}
+
 	return NULL;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 34/57] x86/dumpstack: add get_stack_info() interface
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (32 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 33/57] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 35/57] x86/dumpstack: add recursion checking for all stacks Josh Poimboeuf
                   ` (23 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

valid_stack_ptr() is buggy: it assumes that all stacks are of size
THREAD_SIZE, which is not true for exception stacks.  So the
walk_stack() callbacks will need to know the location of the beginning
of the stack as well as the end.

Another issue is that the various features of a stack (type, size, next
stack pointer, description string) are scattered throughout the stack
dump code.

Encapsulate all that information in a single place with a new stack_info
struct and a get_stack_info() interface.
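
For reference, here is the shape of the new interface as a
self-contained userspace sketch (the enum is trimmed and char pointers
replace the kernel's GNU C void pointer arithmetic; the bounds logic
mirrors the on_stack() helper added below):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdio.h>

  enum stack_type { STACK_TYPE_UNKNOWN, STACK_TYPE_TASK };

  struct stack_info {
  	enum stack_type type;
  	unsigned long *begin, *end, *next_sp;
  };

  /*
   * An access of 'len' bytes at 'addr' must lie entirely within the
   * stack bounds; the addr+len checks also guard against wraparound.
   */
  static bool on_stack(struct stack_info *info, void *addr, size_t len)
  {
  	char *begin = (char *)info->begin;
  	char *end   = (char *)info->end;
  	char *p     = addr;

  	return (info->type != STACK_TYPE_UNKNOWN &&
  		p >= begin && p < end &&
  		p + len > begin && p + len <= end);
  }

  int main(void)
  {
  	unsigned long stack[16];
  	struct stack_info info = {
  		.type  = STACK_TYPE_TASK,
  		.begin = stack,
  		.end   = stack + 16,
  	};

  	printf("%d\n", on_stack(&info, &stack[14], 2 * sizeof(long)));	/* 1 */
  	printf("%d\n", on_stack(&info, &stack[15], 2 * sizeof(long)));	/* 0 */
  	return 0;
  }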

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/events/core.c            |   2 +-
 arch/x86/include/asm/stacktrace.h |  41 +++++++++-
 arch/x86/kernel/dumpstack.c       |  40 +++++----
 arch/x86/kernel/dumpstack_32.c    | 106 ++++++++++++++++++------
 arch/x86/kernel/dumpstack_64.c    | 167 ++++++++++++++++++++------------------
 arch/x86/kernel/stacktrace.c      |   2 +-
 arch/x86/oprofile/backtrace.c     |   2 +-
 7 files changed, 232 insertions(+), 128 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index dcaa887..dd3a1dc 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2271,7 +2271,7 @@ void arch_perf_update_userpage(struct perf_event *event,
  * callchain support
  */
 
-static int backtrace_stack(void *data, char *name)
+static int backtrace_stack(void *data, const char *name)
 {
 	return 0;
 }
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 38a6bf8..e564b1d 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -9,6 +9,39 @@
 #include <linux/uaccess.h>
 #include <linux/ptrace.h>
 
+enum stack_type {
+	STACK_TYPE_UNKNOWN,
+	STACK_TYPE_TASK,
+	STACK_TYPE_IRQ,
+	STACK_TYPE_SOFTIRQ,
+	STACK_TYPE_EXCEPTION,
+	STACK_TYPE_EXCEPTION_LAST = STACK_TYPE_EXCEPTION + N_EXCEPTION_STACKS-1,
+};
+
+struct stack_info {
+	enum stack_type type;
+	unsigned long *begin, *end, *next_sp;
+};
+
+bool in_task_stack(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info);
+
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info, unsigned long *visit_mask);
+
+void stack_type_str(enum stack_type type, const char **begin,
+		    const char **end);
+
+static inline bool on_stack(struct stack_info *info, void *addr, size_t len)
+{
+	void *begin = info->begin;
+	void *end   = info->end;
+
+	return (info->type != STACK_TYPE_UNKNOWN &&
+		addr >= begin && addr < end &&
+		addr + len > begin && addr + len <= end);
+}
+
 extern int kstack_depth_to_print;
 
 struct thread_info;
@@ -19,27 +52,27 @@ typedef unsigned long (*walk_stack_t)(struct task_struct *task,
 				      unsigned long bp,
 				      const struct stacktrace_ops *ops,
 				      void *data,
-				      unsigned long *end,
+				      struct stack_info *info,
 				      int *graph);
 
 extern unsigned long
 print_context_stack(struct task_struct *task,
 		    unsigned long *stack, unsigned long bp,
 		    const struct stacktrace_ops *ops, void *data,
-		    unsigned long *end, int *graph);
+		    struct stack_info *info, int *graph);
 
 extern unsigned long
 print_context_stack_bp(struct task_struct *task,
 		       unsigned long *stack, unsigned long bp,
 		       const struct stacktrace_ops *ops, void *data,
-		       unsigned long *end, int *graph);
+		       struct stack_info *info, int *graph);
 
 /* Generic stack tracer with callbacks */
 
 struct stacktrace_ops {
 	int (*address)(void *data, unsigned long address, int reliable);
 	/* On negative return stop dumping */
-	int (*stack)(void *data, char *name);
+	int (*stack)(void *data, const char *name);
 	walk_stack_t	walk_stack;
 };
 
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index c6c6c39..960eef6 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -25,6 +25,23 @@ unsigned int code_bytes = 64;
 int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
 static int die_counter;
 
+bool in_task_stack(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info)
+{
+	unsigned long *begin = task_stack_page(task);
+	unsigned long *end   = task_stack_page(task) + THREAD_SIZE;
+
+	if (stack < begin || stack > end)
+		return false;
+
+	info->type	= STACK_TYPE_TASK;
+	info->begin	= begin;
+	info->end	= end;
+	info->next_sp	= NULL;
+
+	return true;
+}
+
 static void printk_stack_address(unsigned long address, int reliable,
 				 char *log_lvl)
 {
@@ -46,24 +63,11 @@ void printk_address(unsigned long address)
  * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
  */
 
-static inline int valid_stack_ptr(struct task_struct *task,
-			void *p, unsigned int size, void *end)
-{
-	void *t = task_stack_page(task);
-	if (end) {
-		if (p < end && p >= (end-THREAD_SIZE))
-			return 1;
-		else
-			return 0;
-	}
-	return p >= t && p < t + THREAD_SIZE - size;
-}
-
 unsigned long
 print_context_stack(struct task_struct *task,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data,
-		unsigned long *end, int *graph)
+		struct stack_info *info, int *graph)
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
 
@@ -75,7 +79,7 @@ print_context_stack(struct task_struct *task,
 	    PAGE_SIZE)
 		stack = (unsigned long *)task_stack_page(task);
 
-	while (valid_stack_ptr(task, stack, sizeof(*stack), end)) {
+	while (on_stack(info, stack, sizeof(*stack))) {
 		unsigned long addr = *stack;
 
 		if (__kernel_text_address(addr)) {
@@ -114,12 +118,12 @@ unsigned long
 print_context_stack_bp(struct task_struct *task,
 		       unsigned long *stack, unsigned long bp,
 		       const struct stacktrace_ops *ops, void *data,
-		       unsigned long *end, int *graph)
+		       struct stack_info *info, int *graph)
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
 	unsigned long *retp = &frame->return_address;
 
-	while (valid_stack_ptr(task, retp, sizeof(*retp), end)) {
+	while (on_stack(info, retp, sizeof(*retp))) {
 		unsigned long addr = *retp;
 		unsigned long real_addr;
 
@@ -138,7 +142,7 @@ print_context_stack_bp(struct task_struct *task,
 }
 EXPORT_SYMBOL_GPL(print_context_stack_bp);
 
-static int print_trace_stack(void *data, char *name)
+static int print_trace_stack(void *data, const char *name)
 {
 	printk("%s <%s> ", (char *)data, name);
 	return 0;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index b07d5c9..ca49102 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -16,61 +16,117 @@
 
 #include <asm/stacktrace.h>
 
-static void *is_irq_stack(void *p, void *irq)
+void stack_type_str(enum stack_type type, const char **begin, const char **end)
 {
-	if (p < irq || p >= (irq + THREAD_SIZE))
-		return NULL;
-	return irq + THREAD_SIZE;
+	switch (type) {
+	case STACK_TYPE_IRQ:
+	case STACK_TYPE_SOFTIRQ:
+		*begin = "IRQ";
+		*end   = "EOI";
+		break;
+	default:
+		*begin = NULL;
+		*end   = NULL;
+	}
 }
 
+static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
+{
+	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack);
+	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
+
+	if (stack < begin || stack > end)
+		return false;
+
+	info->type	= STACK_TYPE_IRQ;
+	info->begin	= begin;
+	info->end	= end;
+
+	/*
+	 * See irq_32.c -- the next stack pointer is stored at the beginning of
+	 * the stack.
+	 */
+	info->next_sp	= (unsigned long *)*begin;
+
+	return true;
+}
 
-static void *is_hardirq_stack(unsigned long *stack)
+static bool in_softirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	void *irq = this_cpu_read(hardirq_stack);
+	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack);
+	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
+
+	if (stack < begin || stack > end)
+		return false;
+
+	info->type	= STACK_TYPE_SOFTIRQ;
+	info->begin	= begin;
+	info->end	= end;
+
+	/*
+	 * See irq_32.c -- the next stack pointer is stored at the beginning of
+	 * the stack.
+	 */
+	info->next_sp	= (unsigned long *)*begin;
 
-	return is_irq_stack(stack, irq);
+	return true;
 }
 
-static void *is_softirq_stack(unsigned long *stack)
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info, unsigned long *visit_mask)
 {
-	void *irq = this_cpu_read(softirq_stack);
+	if (!stack)
+		goto unknown;
 
-	return is_irq_stack(stack, irq);
+	task = task ? : current;
+
+	if (in_task_stack(stack, task, info))
+		return 0;
+
+	if (task != current)
+		goto unknown;
+
+	if (in_hardirq_stack(stack, info))
+		return 0;
+
+	if (in_softirq_stack(stack, info))
+		return 0;
+
+unknown:
+	info->type = STACK_TYPE_UNKNOWN;
+	return -EINVAL;
 }
 
 void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
+	unsigned long visit_mask = 0;
 	int graph = 0;
-	u32 *prev_esp;
 
 	task = task ? : current;
 	stack = stack ? : get_stack_pointer(task, regs);
 	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
 	for (;;) {
-		void *end_stack;
+		const char *begin_str, *end_str;
+		struct stack_info info;
 
-		end_stack = is_hardirq_stack(stack);
-		if (!end_stack)
-			end_stack = is_softirq_stack(stack);
+		if (get_stack_info(stack, task, &info, &visit_mask))
+			break;
 
-		bp = ops->walk_stack(task, stack, bp, ops, data,
-				     end_stack, &graph);
+		stack_type_str(info.type, &begin_str, &end_str);
 
-		/* Stop if not on irq stack */
-		if (!end_stack)
+		if (begin_str && ops->stack(data, begin_str) < 0)
 			break;
 
-		/* The previous esp is saved on the bottom of the stack */
-		prev_esp = (u32 *)(end_stack - THREAD_SIZE);
-		stack = (unsigned long *)*prev_esp;
-		if (!stack)
-			break;
+		bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
 
-		if (ops->stack(data, "IRQ") < 0)
+		if (end_str && ops->stack(data, end_str) < 0)
 			break;
+
+		stack = info.next_sp;
+
 		touch_nmi_watchdog();
 	}
 }
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 69f6ba2..65a77bf 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -28,19 +28,40 @@ static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	[DEBUG_STACK - 1]			= DEBUG_STKSZ
 };
 
-static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
-					 char **idp)
+void stack_type_str(enum stack_type type, const char **begin, const char **end)
 {
-	unsigned long begin, end;
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
+
+	switch (type) {
+	case STACK_TYPE_IRQ:
+		*begin = "IRQ";
+		*end   = "EOI";
+		break;
+	case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
+		*begin = exception_stack_names[type - STACK_TYPE_EXCEPTION];
+		*end   = "EOE";
+		break;
+	default:
+		*begin = NULL;
+		*end   = NULL;
+	}
+}
+
+static bool in_exception_stack(unsigned long *stack, struct stack_info *info,
+			       unsigned long *visit_mask)
+{
+	unsigned long *begin, *end;
+	struct pt_regs *regs;
 	unsigned k;
 
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
 
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		end   = raw_cpu_ptr(&orig_ist)->ist[k];
-		begin = end - exception_stack_sizes[k];
+		end   = (unsigned long *)raw_cpu_ptr(&orig_ist)->ist[k];
+		begin = end - (exception_stack_sizes[k] / sizeof(long));
+		regs  = (struct pt_regs *)end - 1;
 
-		if (stack < begin || stack >= end)
+		if (stack < begin || stack > end)
 			continue;
 
 		/*
@@ -48,56 +69,67 @@ static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
 		 * If it comes up for the second time then there's something
 		 * wrong going on - just break and return NULL:
 		 */
-		if (*usedp & (1U << k))
+		if (*visit_mask & (1U << k))
 			break;
-		*usedp |= 1U << k;
+		*visit_mask |= 1U << k;
 
-		*idp = exception_stack_names[k];
-		return (unsigned long *)end;
+		info->type	= STACK_TYPE_EXCEPTION + k;
+		info->begin	= begin;
+		info->end	= end;
+		info->next_sp	= (unsigned long *)regs->sp;
+
+		return true;
 	}
 
-	return NULL;
+	return false;
 }
 
-static inline int
-in_irq_stack(unsigned long *stack, unsigned long *irq_stack,
-	     unsigned long *irq_stack_end)
+static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
-	return (stack >= irq_stack && stack < irq_stack_end);
-}
+	unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
+	unsigned long *begin = end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
 
-enum stack_type {
-	STACK_IS_UNKNOWN,
-	STACK_IS_NORMAL,
-	STACK_IS_EXCEPTION,
-	STACK_IS_IRQ,
-};
+	if (stack < begin || stack > end)
+		return false;
+
+	info->type	= STACK_TYPE_IRQ;
+	info->begin	= begin;
+	info->end	= end;
+
+	/*
+	 * The next stack pointer is the first thing pushed by the entry code
+	 * after switching to the irq stack.
+	 */
+	info->next_sp = (unsigned long *)*(end - 1);
+
+	return true;
+}
 
-static enum stack_type
-analyze_stack(struct task_struct *task, unsigned long *stack,
-	      unsigned long **stack_end, unsigned long *irq_stack,
-	      unsigned *used, char **id)
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+		   struct stack_info *info, unsigned long *visit_mask)
 {
-	unsigned long addr;
+	if (!stack)
+		goto unknown;
 
-	addr = ((unsigned long)stack & (~(THREAD_SIZE - 1)));
-	if ((unsigned long)task_stack_page(task) == addr)
-		return STACK_IS_NORMAL;
+	task = task ? : current;
+
+	if (in_task_stack(stack, task, info))
+		return 0;
 
-	*stack_end = in_exception_stack((unsigned long)stack, used, id);
-	if (*stack_end)
-		return STACK_IS_EXCEPTION;
+	if (task != current)
+		goto unknown;
 
-	if (!irq_stack)
-		return STACK_IS_NORMAL;
+	if (in_exception_stack(stack, info, visit_mask))
+		return 0;
 
-	*stack_end = irq_stack;
-	irq_stack -= (IRQ_USABLE_STACK_SIZE / sizeof(long));
+	if (in_irq_stack(stack, info))
+		return 0;
 
-	if (in_irq_stack(stack, irq_stack, *stack_end))
-		return STACK_IS_IRQ;
 
-	return STACK_IS_UNKNOWN;
+unknown:
+	info->type = STACK_TYPE_UNKNOWN;
+	return -EINVAL;
 }
 
 /*
@@ -111,8 +143,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	unsigned long *irq_stack = (unsigned long *)this_cpu_read(irq_stack_ptr);
-	unsigned used = 0;
+	unsigned long visit_mask = 0;
+	struct stack_info info;
 	int graph = 0;
 	int done = 0;
 
@@ -126,57 +158,37 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * exceptions
 	 */
 	while (!done) {
-		unsigned long *stack_end;
-		enum stack_type stype;
-		char *id;
+		const char *begin_str, *end_str;
 
-		stype = analyze_stack(task, stack, &stack_end, irq_stack, &used,
-				      &id);
+		get_stack_info(stack, task, &info, &visit_mask);
 
 		/* Default finish unless specified to continue */
 		done = 1;
 
-		switch (stype) {
+		switch (info.type) {
 
 		/* Break out early if we are on the thread stack */
-		case STACK_IS_NORMAL:
+		case STACK_TYPE_TASK:
 			break;
 
-		case STACK_IS_EXCEPTION:
+		case STACK_TYPE_IRQ:
+		case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
+
+			stack_type_str(info.type, &begin_str, &end_str);
 
-			if (ops->stack(data, id) < 0)
+			if (ops->stack(data, begin_str) < 0)
 				break;
 
 			bp = ops->walk_stack(task, stack, bp, ops,
-					     data, stack_end, &graph);
-			ops->stack(data, "EOE");
-			/*
-			 * We link to the next stack via the
-			 * second-to-last pointer (index -2 to end) in the
-			 * exception stack:
-			 */
-			stack = (unsigned long *) stack_end[-2];
-			done = 0;
-			break;
+					     data, &info, &graph);
 
-		case STACK_IS_IRQ:
+			ops->stack(data, end_str);
 
-			if (ops->stack(data, "IRQ") < 0)
-				break;
-			bp = ops->walk_stack(task, stack, bp,
-				     ops, data, stack_end, &graph);
-			/*
-			 * We link to the next stack (which would be
-			 * the process stack normally) the last
-			 * pointer (index -1 to end) in the IRQ stack:
-			 */
-			stack = (unsigned long *) (stack_end[-1]);
-			irq_stack = NULL;
-			ops->stack(data, "EOI");
+			stack = info.next_sp;
 			done = 0;
 			break;
 
-		case STACK_IS_UNKNOWN:
+		default:
 			ops->stack(data, "UNK");
 			break;
 		}
@@ -185,7 +197,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	/*
 	 * This handles the process stack:
 	 */
-	bp = ops->walk_stack(task, stack, bp, ops, data, NULL, &graph);
+	bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
 }
 EXPORT_SYMBOL(dump_trace);
 
@@ -193,8 +205,7 @@ void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		   unsigned long *sp, unsigned long bp, char *log_lvl)
 {
-	unsigned long *irq_stack_end;
-	unsigned long *irq_stack;
+	unsigned long *irq_stack, *irq_stack_end;
 	unsigned long *stack;
 	int i;
 
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 4738f5e..785aef1 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -9,7 +9,7 @@
 #include <linux/uaccess.h>
 #include <asm/stacktrace.h>
 
-static int save_stack_stack(void *data, char *name)
+static int save_stack_stack(void *data, const char *name)
 {
 	return 0;
 }
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index d950f9e..7539148 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -17,7 +17,7 @@
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
 
-static int backtrace_stack(void *data, char *name)
+static int backtrace_stack(void *data, const char *name)
 {
 	/* Yes, we want all stacks */
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 35/57] x86/dumpstack: add recursion checking for all stacks
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (33 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 34/57] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 36/57] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
                   ` (22 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

in_exception_stack() has some recursion checking which makes sure the
stack trace code never traverses a given exception stack more than once.
Otherwise corruption could cause a stack to point to itself (directly or
indirectly), resulting in an infinite loop.

Extend the recursion checking to all stacks.
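
The mechanism generalizes to one bit per stack type.  A minimal
standalone sketch of the idea (the kernel variant lives in
get_stack_info() and also tolerates a NULL mask):

  #include <stdbool.h>
  #include <stdio.h>

  /*
   * One bit per stack type: a corrupt "next stack" pointer that leads
   * back to an already-visited stack is caught instead of looping
   * forever.
   */
  static bool visit(unsigned long *visit_mask, unsigned int stack_type)
  {
  	if (*visit_mask & (1UL << stack_type))
  		return false;			/* second visit: bail out */
  	*visit_mask |= 1UL << stack_type;
  	return true;
  }

  int main(void)
  {
  	unsigned long mask = 0;

  	printf("%d\n", visit(&mask, 1));	/* task stack: ok   */
  	printf("%d\n", visit(&mask, 2));	/* irq stack:  ok   */
  	printf("%d\n", visit(&mask, 1));	/* task again: stop */
  	return 0;
  }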

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_32.c | 22 +++++++++++++++++++---
 arch/x86/kernel/dumpstack_64.c | 34 +++++++++++++++++++---------------
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index ca49102..c233f93 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -81,16 +81,32 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
 	task = task ? : current;
 
 	if (in_task_stack(stack, task, info))
-		return 0;
+		goto recursion_check;
 
 	if (task != current)
 		goto unknown;
 
 	if (in_hardirq_stack(stack, info))
-		return 0;
+		goto recursion_check;
 
 	if (in_softirq_stack(stack, info))
-		return 0;
+		goto recursion_check;
+
+	goto unknown;
+
+recursion_check:
+	/*
+	 * Make sure we don't iterate through any given stack more than once.
+	 * If it comes up a second time then there's something wrong going on:
+	 * just break out and report an unknown stack type.
+	 */
+	if (visit_mask) {
+		if (*visit_mask & (1UL << info->type))
+			goto unknown;
+		*visit_mask |= 1UL << info->type;
+	}
+
+	return 0;
 
 unknown:
 	info->type = STACK_TYPE_UNKNOWN;
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 65a77bf..7a4029d 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -47,8 +47,7 @@ void stack_type_str(enum stack_type type, const char **begin, const char **end)
 	}
 }
 
-static bool in_exception_stack(unsigned long *stack, struct stack_info *info,
-			       unsigned long *visit_mask)
+static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
 	unsigned long *begin, *end;
 	struct pt_regs *regs;
@@ -64,15 +63,6 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info,
 		if (stack < begin || stack > end)
 			continue;
 
-		/*
-		 * Make sure we only iterate through an exception stack once.
-		 * If it comes up for the second time then there's something
-		 * wrong going on - just break and return NULL:
-		 */
-		if (*visit_mask & (1U << k))
-			break;
-		*visit_mask |= 1U << k;
-
 		info->type	= STACK_TYPE_EXCEPTION + k;
 		info->begin	= begin;
 		info->end	= end;
@@ -114,16 +104,30 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
 	task = task ? : current;
 
 	if (in_task_stack(stack, task, info))
-		return 0;
+		goto recursion_check;
 
 	if (task != current)
 		goto unknown;
 
-	if (in_exception_stack(stack, info, visit_mask))
-		return 0;
+	if (in_exception_stack(stack, info))
+		goto recursion_check;
 
 	if (in_irq_stack(stack, info))
-		return 0;
+		goto recursion_check;
+
+	goto unknown;
+
+recursion_check:
+	/*
+	 * Make sure we don't iterate through any given stack more than once.
+	 * If it comes up a second time then there's something wrong going on:
+	 * just break out and report an unknown stack type.
+	 */
+	if (visit_mask) {
+		if (*visit_mask & (1UL << info->type))
+			goto unknown;
+		*visit_mask |= 1UL << info->type;
+	}
+
+	return 0;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 36/57] x86/unwind: add new unwind interface and implementations
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (34 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 35/57] x86/dumpstack: add recursion checking for all stacks Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 37/57] perf/x86: convert perf_callchain_kernel() to use the new unwinder Josh Poimboeuf
                   ` (21 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The x86 stack dump code is a bit of a mess.  dump_trace() uses
callbacks, and each user of it seems to have slightly different
requirements, so there are several slightly different callbacks floating
around.

Also there are some upcoming features which will require more changes to
the stack dump code: reliable stack detection for live patching,
hardened user copy, and the DWARF unwinder.  Each of those features
would at least need more callbacks and/or callback interfaces, resulting
in a much bigger mess than what we have today.

Before doing all that, we should try to clean things up and replace
dump_trace() with something cleaner and more flexible.

The new unwinder is a simple state machine which was heavily inspired by
a suggestion from Andy Lutomirski:

  https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com

It's also very similar to the libunwind API:

  http://www.nongnu.org/libunwind/man/libunwind(3).html

Some of its advantages:

- Simplicity: no more callback sprawl and less code duplication.

- Flexibility: it allows the caller to stop and inspect the stack state
  at each step in the unwinding process.

- Modularity: the unwinder code, console stack dump code, and stack
  metadata analysis code are all better separated so that changing one
  of them shouldn't have much of an impact on any of the others.

Two implementations are added which conform to the new unwind interface:

- The frame pointer unwinder which is used for CONFIG_FRAME_POINTER=y.

- The "guess" unwinder which is used for CONFIG_FRAME_POINTER=n.  This
  isn't an "unwinder" per se.  All it does is scan the stack for kernel
  text addresses.  But with no frame pointers, guesses are better than
  nothing in most cases.
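
The resulting calling convention, as the conversions later in the
series show, is an open-coded loop instead of a callback registration
(schematic; how errors are handled is up to the caller):

  struct unwind_state state;
  unsigned long addr;

  for (unwind_start(&state, task, regs, first_frame);
       !unwind_done(&state);
       unwind_next_frame(&state)) {
  	addr = unwind_get_return_address(&state);
  	if (!addr)
  		break;
  	/* consume addr, e.g. perf_callchain_store(entry, addr) */
  }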

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/unwind.h  | 90 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/Makefile       |  6 +++
 arch/x86/kernel/unwind_frame.c | 96 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/unwind_guess.c | 43 +++++++++++++++++++
 4 files changed, 235 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind.h
 create mode 100644 arch/x86/kernel/unwind_frame.c
 create mode 100644 arch/x86/kernel/unwind_guess.c

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
new file mode 100644
index 0000000..8306536
--- /dev/null
+++ b/arch/x86/include/asm/unwind.h
@@ -0,0 +1,90 @@
+#ifndef _ASM_X86_UNWIND_H
+#define _ASM_X86_UNWIND_H
+
+#include <linux/sched.h>
+#include <linux/ftrace.h>
+#include <asm/ptrace.h>
+#include <asm/stacktrace.h>
+
+struct unwind_state {
+	struct stack_info stack_info;
+	unsigned long stack_mask;
+	struct task_struct *task;
+	int graph_idx;
+#ifdef CONFIG_FRAME_POINTER
+	unsigned long *bp;
+#else
+	unsigned long *sp;
+#endif
+};
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *first_frame);
+
+bool unwind_next_frame(struct unwind_state *state);
+
+static inline bool unwind_done(struct unwind_state *state)
+{
+	return state->stack_info.type == STACK_TYPE_UNKNOWN;
+}
+
+static inline
+void unwind_start(struct unwind_state *state, struct task_struct *task,
+		  struct pt_regs *regs, unsigned long *first_frame)
+{
+	task = task ? : current;
+	first_frame = first_frame ? : get_stack_pointer(task, regs);
+
+	__unwind_start(state, task, regs, first_frame);
+}
+
+#ifdef CONFIG_FRAME_POINTER
+
+static inline
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return NULL;
+
+	return state->bp + 1;
+}
+
+static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return NULL;
+
+	return state->bp;
+}
+
+unsigned long unwind_get_return_address(struct unwind_state *state);
+
+#else /* !CONFIG_FRAME_POINTER */
+
+static inline
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+{
+	return NULL;
+}
+
+static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return NULL;
+
+	return state->sp;
+}
+
+static inline
+unsigned long unwind_get_return_address(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return 0;
+
+	return ftrace_graph_ret_addr(state->task, &state->graph_idx,
+				     *state->sp, state->sp);
+}
+
+#endif /* CONFIG_FRAME_POINTER */
+
+#endif /* _ASM_X86_UNWIND_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0503f5b..45257cf 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -125,6 +125,12 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 
+ifdef CONFIG_FRAME_POINTER
+obj-y					+= unwind_frame.o
+else
+obj-y					+= unwind_guess.o
+endif
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
new file mode 100644
index 0000000..416908f
--- /dev/null
+++ b/arch/x86/kernel/unwind_frame.c
@@ -0,0 +1,96 @@
+#include <linux/sched.h>
+#include <asm/ptrace.h>
+#include <asm/bitops.h>
+#include <asm/stacktrace.h>
+#include <asm/unwind.h>
+
+#define FRAME_HEADER_SIZE (sizeof(long) * 2)
+
+unsigned long unwind_get_return_address(struct unwind_state *state)
+{
+	unsigned long addr;
+	unsigned long *addr_p = unwind_get_return_address_ptr(state);
+
+	if (unwind_done(state))
+		return 0;
+
+	addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
+				     addr_p);
+
+	return __kernel_text_address(addr) ? addr : 0;
+}
+EXPORT_SYMBOL_GPL(unwind_get_return_address);
+
+static bool update_stack_state(struct unwind_state *state, void *addr,
+			       size_t len)
+{
+	struct stack_info *info = &state->stack_info;
+
+	/*
+	 * If addr isn't on the current stack, switch to the next one.
+	 *
+	 * We may have to traverse multiple stacks to deal with the possibility
+	 * that 'info->next_sp' could point to an empty stack and 'addr' could
+	 * be on a subsequent stack.
+	 */
+	while (!on_stack(info, addr, len)) {
+		if (get_stack_info(info->next_sp, state->task, info,
+				   &state->stack_mask)) {
+			info->type = STACK_TYPE_UNKNOWN;
+			return false;
+		}
+	}
+
+	return true;
+}
+
+bool unwind_next_frame(struct unwind_state *state)
+{
+	unsigned long *next_bp;
+
+	if (unwind_done(state))
+		return false;
+
+	next_bp = (unsigned long *)*state->bp;
+
+	/* make sure the next frame's data is accessible */
+	if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
+		return false;
+
+	/* move to the next frame */
+	state->bp = next_bp;
+	return true;
+}
+EXPORT_SYMBOL_GPL(unwind_next_frame);
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *first_frame)
+{
+	memset(state, 0, sizeof(*state));
+	state->task = task;
+
+	/* don't even attempt to start from user mode regs */
+	if (regs && user_mode(regs)) {
+		state->stack_info.type = STACK_TYPE_UNKNOWN;
+		return;
+	}
+
+	/* set up the starting stack frame */
+	state->bp = get_frame_pointer(task, regs);
+
+	/* initialize stack info and make sure the frame data is accessible */
+	get_stack_info(state->bp, state->task, &state->stack_info,
+		       &state->stack_mask);
+	update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
+
+	/*
+	 * The caller can provide the address of the first frame directly
+	 * (first_frame) or indirectly (regs->sp) to indicate which stack frame
+	 * to start unwinding at.  Skip ahead until we reach it.
+	 */
+	while (!unwind_done(state) &&
+	       (!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
+		unwind_get_stack_ptr(state) < first_frame))
+		unwind_next_frame(state);
+}
+EXPORT_SYMBOL_GPL(__unwind_start);
diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
new file mode 100644
index 0000000..b5a834c
--- /dev/null
+++ b/arch/x86/kernel/unwind_guess.c
@@ -0,0 +1,43 @@
+#include <linux/sched.h>
+#include <linux/ftrace.h>
+#include <asm/ptrace.h>
+#include <asm/bitops.h>
+#include <asm/stacktrace.h>
+#include <asm/unwind.h>
+
+bool unwind_next_frame(struct unwind_state *state)
+{
+	struct stack_info *info = &state->stack_info;
+
+	if (unwind_done(state))
+		return false;
+
+	do {
+		for (state->sp++; state->sp < info->end; state->sp++)
+			if (__kernel_text_address(*state->sp))
+				return true;
+
+		state->sp = info->next_sp;
+
+	} while (!get_stack_info(state->sp, state->task, info,
+				 &state->stack_mask));
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(unwind_next_frame);
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *first_frame)
+{
+	memset(state, 0, sizeof(*state));
+
+	state->task = task;
+	state->sp   = first_frame;
+
+	get_stack_info(first_frame, state->task, &state->stack_info,
+		       &state->stack_mask);
+
+	if (!__kernel_text_address(*first_frame))
+		unwind_next_frame(state);
+}
+EXPORT_SYMBOL_GPL(__unwind_start);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 37/57] perf/x86: convert perf_callchain_kernel() to use the new unwinder
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (35 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 36/57] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 38/57] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
                   ` (20 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert perf_callchain_kernel() to use the new unwinder.  dump_trace()
has been deprecated.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/events/core.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index dd3a1dc..b409e7c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -37,6 +37,7 @@
 #include <asm/timer.h>
 #include <asm/desc.h>
 #include <asm/ldt.h>
+#include <asm/unwind.h>
 
 #include "perf_event.h"
 
@@ -2267,31 +2268,12 @@ void arch_perf_update_userpage(struct perf_event *event,
 	cyc2ns_read_end(data);
 }
 
-/*
- * callchain support
- */
-
-static int backtrace_stack(void *data, const char *name)
-{
-	return 0;
-}
-
-static int backtrace_address(void *data, unsigned long addr, int reliable)
-{
-	struct perf_callchain_entry_ctx *entry = data;
-
-	return perf_callchain_store(entry, addr);
-}
-
-static const struct stacktrace_ops backtrace_ops = {
-	.stack			= backtrace_stack,
-	.address		= backtrace_address,
-	.walk_stack		= print_context_stack_bp,
-};
-
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
+	struct unwind_state state;
+	unsigned long addr;
+
 	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
@@ -2300,7 +2282,12 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	if (perf_callchain_store(entry, regs->ip))
 		return;
 
-	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
+	for (unwind_start(&state, NULL, regs, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr || perf_callchain_store(entry, addr))
+			return;
+	}
 }
 
 static inline int
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 38/57] x86/stacktrace: convert save_stack_trace_*() to use the new unwinder
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (36 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 37/57] perf/x86: convert perf_callchain_kernel() to use the new unwinder Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 39/57] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
                   ` (19 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert save_stack_trace_*() to use the new unwinder.  dump_trace() has
been deprecated.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/stacktrace.c | 74 +++++++++++++++++---------------------------
 1 file changed, 29 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 785aef1..a168e7e 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -8,80 +8,64 @@
 #include <linux/export.h>
 #include <linux/uaccess.h>
 #include <asm/stacktrace.h>
+#include <asm/unwind.h>
 
-static int save_stack_stack(void *data, const char *name)
+static int save_stack_address(struct stack_trace *trace, unsigned long addr,
+			      bool nosched)
 {
-	return 0;
-}
-
-static int
-__save_stack_address(void *data, unsigned long addr, bool reliable, bool nosched)
-{
-	struct stack_trace *trace = data;
-#ifdef CONFIG_FRAME_POINTER
-	if (!reliable)
-		return 0;
-#endif
 	if (nosched && in_sched_functions(addr))
 		return 0;
+
 	if (trace->skip > 0) {
 		trace->skip--;
 		return 0;
 	}
-	if (trace->nr_entries < trace->max_entries) {
-		trace->entries[trace->nr_entries++] = addr;
-		return 0;
-	} else {
-		return -1; /* no more room, stop walking the stack */
-	}
-}
 
-static int save_stack_address(void *data, unsigned long addr, int reliable)
-{
-	return __save_stack_address(data, addr, reliable, false);
+	if (trace->nr_entries >= trace->max_entries)
+		return -1;
+
+	trace->entries[trace->nr_entries++] = addr;
+	return 0;
 }
 
-static int
-save_stack_address_nosched(void *data, unsigned long addr, int reliable)
+static void __save_stack_trace(struct stack_trace *trace,
+			       struct task_struct *task, struct pt_regs *regs,
+			       bool nosched)
 {
-	return __save_stack_address(data, addr, reliable, true);
-}
+	struct unwind_state state;
+	unsigned long addr;
 
-static const struct stacktrace_ops save_stack_ops = {
-	.stack		= save_stack_stack,
-	.address	= save_stack_address,
-	.walk_stack	= print_context_stack,
-};
+	if (regs)
+		save_stack_address(trace, regs->ip, nosched);
 
-static const struct stacktrace_ops save_stack_ops_nosched = {
-	.stack		= save_stack_stack,
-	.address	= save_stack_address_nosched,
-	.walk_stack	= print_context_stack,
-};
+	for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr || save_stack_address(trace, addr, nosched))
+			break;
+	}
+
+	if (trace->nr_entries < trace->max_entries)
+		trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
 
 /*
  * Save stack-backtrace addresses into a stack_trace buffer.
  */
 void save_stack_trace(struct stack_trace *trace)
 {
-	dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
-	if (trace->nr_entries < trace->max_entries)
-		trace->entries[trace->nr_entries++] = ULONG_MAX;
+	__save_stack_trace(trace, NULL, NULL, false);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace);
 
 void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
 {
-	dump_trace(current, regs, NULL, 0, &save_stack_ops, trace);
-	if (trace->nr_entries < trace->max_entries)
-		trace->entries[trace->nr_entries++] = ULONG_MAX;
+	__save_stack_trace(trace, NULL, regs, false);
 }
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
-	dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_nosched, trace);
-	if (trace->nr_entries < trace->max_entries)
-		trace->entries[trace->nr_entries++] = ULONG_MAX;
+	__save_stack_trace(trace, tsk, NULL, true);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 39/57] oprofile/x86: convert x86_backtrace() to use the new unwinder
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (37 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 38/57] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 40/57] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
                   ` (18 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish, Robert Richter

Convert oprofile's x86_backtrace() to use the new unwinder.
dump_trace() has been deprecated.

Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/oprofile/backtrace.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 7539148..f28ac1a 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -16,27 +16,7 @@
 
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
-
-static int backtrace_stack(void *data, const char *name)
-{
-	/* Yes, we want all stacks */
-	return 0;
-}
-
-static int backtrace_address(void *data, unsigned long addr, int reliable)
-{
-	unsigned int *depth = data;
-
-	if ((*depth)--)
-		oprofile_add_trace(addr);
-	return 0;
-}
-
-static struct stacktrace_ops backtrace_ops = {
-	.stack		= backtrace_stack,
-	.address	= backtrace_address,
-	.walk_stack	= print_context_stack,
-};
+#include <asm/unwind.h>
 
 #ifdef CONFIG_COMPAT
 static struct stack_frame_ia32 *
@@ -113,14 +93,29 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
+		struct unwind_state state;
+		unsigned long addr;
+
 		if (!depth)
 			return;
 
 		oprofile_add_trace(regs->ip);
+
 		if (!--depth)
 			return;
 
-		dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
+		for (unwind_start(&state, NULL, regs, NULL);
+		     !unwind_done(&state); unwind_next_frame(&state)) {
+			addr = unwind_get_return_address(&state);
+			if (!addr)
+				break;
+
+			oprofile_add_trace(addr);
+
+			if (!--depth)
+				break;
+		}
+
 		return;
 	}
 
-- 
2.7.4

* [PATCH v4 40/57] x86/dumpstack: convert show_trace_log_lvl() to use the new unwinder
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (38 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 39/57] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 41/57] x86/dumpstack: remove dump_trace() and related callbacks Josh Poimboeuf
                   ` (17 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert show_trace_log_lvl() to use the new unwinder.  dump_trace() has
been deprecated.

show_trace_log_lvl() is special compared to other users of the unwinder.
It's the only place where both reliable *and* unreliable addresses are
needed.  With frame pointers enabled, most callers of the unwinder don't
want to know about unreliable addresses.  But in this case, when we're
dumping the stack to the console because something presumably went
wrong, the unreliable addresses are useful:

- They show stale data on the stack, which can provide useful clues.

- If something goes wrong with the unwinder, or if frame pointers are
  corrupt or missing, all the stack addresses still get shown.

So in order to show all addresses on the stack, and at the same time
figure out which addresses are reliable, we have to do the scanning and
the unwinding in parallel.

The scanning is done with the help of get_stack_info() to traverse the
stacks.  The unwinding is done separately by the new unwinder.
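
In outline, the scan/unwind loop looks like this (a simplified sketch;
the real code in the hunk below also handles guard pages, ftrace return
addresses, and the stack-name banners):

  for (; stack < stack_info.end; stack++) {
          unsigned long addr = *stack;

          if (!__kernel_text_address(addr))
                  continue;

          /* the unwinder confirms which scanned addresses are real */
          reliable = (stack == unwind_get_return_address_ptr(&state));
          printk_stack_address(addr, reliable, log_lvl);

          /* step the unwinder forward in lockstep with the scan */
          if (reliable)
                  unwind_next_frame(&state);
  }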

In theory we could simplify show_trace_log_lvl() by instead pushing some
of this logic into the unwind code.  But then the unwinder would need
some kind of "fake" frame logic, which would add a lot of complexity and
wouldn't be worth it just to support a single user.

Another benefit of this approach is that once we have a DWARF unwinder,
we should be able to just plug it in with minimal impact on this code.

Another change here is that callers of show_trace_log_lvl() don't need
to provide the 'bp' argument.  The unwinder already finds the relevant
frame pointer by unwinding until it reaches the first frame after the
provided stack pointer.
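
At the call sites this just means dropping the argument (as the hunks
below show):

  /* before */
  show_trace_log_lvl(task, regs, sp, bp, log_lvl);

  /* after: no bp -- the unwinder skips frames below 'sp' on its own */
  show_trace_log_lvl(task, regs, sp, log_lvl);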

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h |  10 ++-
 arch/x86/kernel/dumpstack.c       | 130 +++++++++++++++++++++++++++++---------
 arch/x86/kernel/dumpstack_32.c    |   6 +-
 arch/x86/kernel/dumpstack_64.c    |  10 +--
 4 files changed, 111 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index e564b1d..4f9630d 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -119,13 +119,11 @@ get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
 	return (unsigned long *)task->thread.sp;
 }
 
-extern void
-show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *stack, unsigned long bp, char *log_lvl);
+void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *stack, char *log_lvl);
 
-extern void
-show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl);
+void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *sp, char *log_lvl);
 
 extern unsigned int code_bytes;
 
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 960eef6..b27df04 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -17,7 +17,7 @@
 #include <linux/sysfs.h>
 
 #include <asm/stacktrace.h>
-
+#include <asm/unwind.h>
 
 int panic_on_unrecovered_nmi;
 int panic_on_io_nmi;
@@ -142,54 +142,122 @@ print_context_stack_bp(struct task_struct *task,
 }
 EXPORT_SYMBOL_GPL(print_context_stack_bp);
 
-static int print_trace_stack(void *data, const char *name)
+void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *stack, char *log_lvl)
 {
-	printk("%s <%s> ", (char *)data, name);
-	return 0;
-}
+	struct unwind_state state;
+	struct stack_info stack_info = {0};
+	unsigned long visit_mask = 0;
+	int graph_idx = 0;
 
-/*
- * Print one address/symbol entries per line.
- */
-static int print_trace_address(void *data, unsigned long addr, int reliable)
-{
-	printk_stack_address(addr, reliable, data);
-	return 0;
-}
+	printk("%sCall Trace:\n", log_lvl);
 
-static const struct stacktrace_ops print_trace_ops = {
-	.stack			= print_trace_stack,
-	.address		= print_trace_address,
-	.walk_stack		= print_context_stack,
-};
+	stack = stack ? : get_stack_pointer(task, regs);
+	if (!task)
+		task = current;
 
-void
-show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp, char *log_lvl)
-{
-	printk("%sCall Trace:\n", log_lvl);
-	dump_trace(task, regs, stack, bp, &print_trace_ops, log_lvl);
+	unwind_start(&state, task, regs, stack);
+
+	/*
+	 * Iterate through the stacks, starting with the current stack pointer.
+	 * Each stack has a pointer to the next one.
+	 *
+	 * x86-64 can have several stacks:
+	 * - task stack
+	 * - interrupt stack
+	 * - HW exception stacks (double fault, nmi, debug, mce)
+	 *
+	 * x86-32 can have up to three stacks:
+	 * - task stack
+	 * - softirq stack
+	 * - hardirq stack
+	 */
+	for (; stack; stack = stack_info.next_sp) {
+		const char *str_begin, *str_end;
+
+		/*
+		 * If we overflowed the task stack into a guard page, jump back
+		 * to the bottom of the usable stack.
+		 */
+		if (task_stack_page(task) - (void *)stack < PAGE_SIZE)
+			stack = task_stack_page(task);
+
+		if (get_stack_info(stack, task, &stack_info, &visit_mask))
+			break;
+
+		stack_type_str(stack_info.type, &str_begin, &str_end);
+		if (str_begin)
+			printk("%s <%s> ", log_lvl, str_begin);
+
+		/*
+		 * Scan the stack, printing any text addresses we find.  At the
+		 * same time, follow proper stack frames with the unwinder.
+		 *
+		 * Addresses found during the scan which are not reported by
+		 * the unwinder are considered to be additional clues which are
+		 * sometimes useful for debugging and are prefixed with '?'.
+		 * This also serves as a failsafe option in case the unwinder
+		 * goes off in the weeds.
+		 */
+		for (; stack < stack_info.end; stack++) {
+			unsigned long real_addr;
+			int reliable = 0;
+			unsigned long addr = *stack;
+			unsigned long *ret_addr_p =
+				unwind_get_return_address_ptr(&state);
+
+			if (!__kernel_text_address(addr))
+				continue;
+
+			if (stack == ret_addr_p)
+				reliable = 1;
+
+			/*
+			 * When function graph tracing is enabled for a
+			 * function, its return address on the stack is
+			 * replaced with the address of an ftrace handler
+			 * (return_to_handler).  In that case, before printing
+			 * the "real" address, we want to print the handler
+			 * address as an "unreliable" hint that function graph
+			 * tracing was involved.
+			 */
+			real_addr = ftrace_graph_ret_addr(task, &graph_idx,
+							  addr, stack);
+			if (real_addr != addr)
+				printk_stack_address(addr, 0, log_lvl);
+			printk_stack_address(real_addr, reliable, log_lvl);
+
+			if (!reliable)
+				continue;
+
+			/*
+			 * Get the next frame from the unwinder.  No need to
+			 * check for an error: if anything goes wrong, the rest
+			 * of the addresses will just be printed as unreliable.
+			 */
+			unwind_next_frame(&state);
+		}
+
+		if (str_end)
+			printk("%s <%s> ", log_lvl, str_end);
+	}
 }
 
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
-	unsigned long bp = 0;
-
 	/*
 	 * Stack frames below this one aren't interesting.  Don't show them
 	 * if we're printing for %current.
 	 */
-	if (!sp && (!task || task == current)) {
+	if (!sp && (!task || task == current))
 		sp = get_stack_pointer(current, NULL);
-		bp = (unsigned long)get_frame_pointer(current, NULL);
-	}
 
-	show_stack_log_lvl(task, NULL, sp, bp, "");
+	show_stack_log_lvl(task, NULL, sp, "");
 }
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(current, regs, NULL, 0, "");
+	show_stack_log_lvl(NULL, regs, NULL, "");
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index c233f93..94c5ed5 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -150,7 +150,7 @@ EXPORT_SYMBOL(dump_trace);
 
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl)
+		   unsigned long *sp, char *log_lvl)
 {
 	unsigned long *stack;
 	int i;
@@ -170,7 +170,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		touch_nmi_watchdog();
 	}
 	pr_cont("\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 
@@ -192,7 +192,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		pr_emerg("Stack:\n");
-		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, NULL, KERN_EMERG);
 
 		pr_emerg("Code:");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 7a4029d..6690322 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -15,6 +15,7 @@
 #include <linux/nmi.h>
 
 #include <asm/stacktrace.h>
+#include <asm/unwind.h>
 
 static char *exception_stack_names[N_EXCEPTION_STACKS] = {
 		[ DOUBLEFAULT_STACK-1	]	= "#DF",
@@ -205,9 +206,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 }
 EXPORT_SYMBOL(dump_trace);
 
-void
-show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl)
+void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
+			unsigned long *sp, char *log_lvl)
 {
 	unsigned long *irq_stack, *irq_stack_end;
 	unsigned long *stack;
@@ -247,7 +247,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	}
 
 	pr_cont("\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 void show_regs(struct pt_regs *regs)
@@ -268,7 +268,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_DEFAULT "Stack:\n");
-		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_DEFAULT);
+		show_stack_log_lvl(NULL, regs, NULL, KERN_DEFAULT);
 
 		printk(KERN_DEFAULT "Code: ");
 
-- 
2.7.4

* [PATCH v4 41/57] x86/dumpstack: remove dump_trace() and related callbacks
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (39 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 40/57] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 42/57] x86/entry/unwind: create stack frames for saved interrupt registers Josh Poimboeuf
                   ` (16 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

All previous users of dump_trace() have been converted to use the new
unwind interfaces, so we can remove it and the related
print_context_stack() and print_context_stack_bp() callback functions.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h | 36 ----------------
 arch/x86/kernel/dumpstack.c       | 86 ---------------------------------------
 arch/x86/kernel/dumpstack_32.c    | 35 ----------------
 arch/x86/kernel/dumpstack_64.c    | 69 -------------------------------
 4 files changed, 226 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 4f9630d..4692a5b 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -44,42 +44,6 @@ static inline bool on_stack(struct stack_info *info, void *addr, size_t len)
 
 extern int kstack_depth_to_print;
 
-struct thread_info;
-struct stacktrace_ops;
-
-typedef unsigned long (*walk_stack_t)(struct task_struct *task,
-				      unsigned long *stack,
-				      unsigned long bp,
-				      const struct stacktrace_ops *ops,
-				      void *data,
-				      struct stack_info *info,
-				      int *graph);
-
-extern unsigned long
-print_context_stack(struct task_struct *task,
-		    unsigned long *stack, unsigned long bp,
-		    const struct stacktrace_ops *ops, void *data,
-		    struct stack_info *info, int *graph);
-
-extern unsigned long
-print_context_stack_bp(struct task_struct *task,
-		       unsigned long *stack, unsigned long bp,
-		       const struct stacktrace_ops *ops, void *data,
-		       struct stack_info *info, int *graph);
-
-/* Generic stack tracer with callbacks */
-
-struct stacktrace_ops {
-	int (*address)(void *data, unsigned long address, int reliable);
-	/* On negative return stop dumping */
-	int (*stack)(void *data, const char *name);
-	walk_stack_t	walk_stack;
-};
-
-void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data);
-
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
 #else
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index b27df04..7e66837 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -56,92 +56,6 @@ void printk_address(unsigned long address)
 	pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
 }
 
-/*
- * x86-64 can have up to three kernel stacks:
- * process stack
- * interrupt stack
- * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
- */
-
-unsigned long
-print_context_stack(struct task_struct *task,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data,
-		struct stack_info *info, int *graph)
-{
-	struct stack_frame *frame = (struct stack_frame *)bp;
-
-	/*
-	 * If we overflowed the stack into a guard page, jump back to the
-	 * bottom of the usable stack.
-	 */
-	if ((unsigned long)task_stack_page(task) - (unsigned long)stack <
-	    PAGE_SIZE)
-		stack = (unsigned long *)task_stack_page(task);
-
-	while (on_stack(info, stack, sizeof(*stack))) {
-		unsigned long addr = *stack;
-
-		if (__kernel_text_address(addr)) {
-			unsigned long real_addr;
-			int reliable = 0;
-
-			if ((unsigned long) stack == bp + sizeof(long)) {
-				reliable = 1;
-				frame = frame->next_frame;
-				bp = (unsigned long) frame;
-			}
-
-			/*
-			 * When function graph tracing is enabled for a
-			 * function, its return address on the stack is
-			 * replaced with the address of an ftrace handler
-			 * (return_to_handler).  In that case, before printing
-			 * the "real" address, we want to print the handler
-			 * address as an "unreliable" hint that function graph
-			 * tracing was involved.
-			 */
-			real_addr = ftrace_graph_ret_addr(task, graph, addr,
-							  stack);
-			if (real_addr != addr)
-				ops->address(data, addr, 0);
-
-			ops->address(data, real_addr, reliable);
-		}
-		stack++;
-	}
-	return bp;
-}
-EXPORT_SYMBOL_GPL(print_context_stack);
-
-unsigned long
-print_context_stack_bp(struct task_struct *task,
-		       unsigned long *stack, unsigned long bp,
-		       const struct stacktrace_ops *ops, void *data,
-		       struct stack_info *info, int *graph)
-{
-	struct stack_frame *frame = (struct stack_frame *)bp;
-	unsigned long *retp = &frame->return_address;
-
-	while (on_stack(info, stack, sizeof(*stack) * 2)) {
-		unsigned long addr = *retp;
-		unsigned long real_addr;
-
-		if (!__kernel_text_address(addr))
-			break;
-
-		real_addr = ftrace_graph_ret_addr(task, graph, addr, retp);
-		if (ops->address(data, real_addr, 1))
-			break;
-
-		frame = frame->next_frame;
-		retp = &frame->return_address;
-	}
-
-	return (unsigned long)frame;
-}
-EXPORT_SYMBOL_GPL(print_context_stack_bp);
-
 void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *stack, char *log_lvl)
 {
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 94c5ed5..fa7a85c 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -113,41 +113,6 @@ unknown:
 	return -EINVAL;
 }
 
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data)
-{
-	unsigned long visit_mask = 0;
-	int graph = 0;
-
-	task = task ? : current;
-	stack = stack ? : get_stack_pointer(task, regs);
-	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
-
-	for (;;) {
-		const char *begin_str, *end_str;
-		struct stack_info info;
-
-		if (get_stack_info(stack, task, &info, &visit_mask))
-			break;
-
-		stack_type_str(info.type, &begin_str, &end_str);
-
-		if (begin_str && ops->stack(data, begin_str) < 0)
-			break;
-
-		bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
-
-		if (end_str && ops->stack(data, end_str) < 0)
-			break;
-
-		stack = info.next_sp;
-
-		touch_nmi_watchdog();
-	}
-}
-EXPORT_SYMBOL(dump_trace);
-
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		   unsigned long *sp, char *log_lvl)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 6690322..8be240f 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -137,75 +137,6 @@ unknown:
 	return -EINVAL;
 }
 
-/*
- * x86-64 can have up to three kernel stacks:
- * process stack
- * interrupt stack
- * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
- */
-
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
-		const struct stacktrace_ops *ops, void *data)
-{
-	unsigned long visit_mask = 0;
-	struct stack_info info;
-	int graph = 0;
-	int done = 0;
-
-	task = task ? : current;
-	stack = stack ? : get_stack_pointer(task, regs);
-	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
-
-	/*
-	 * Print function call entries in all stacks, starting at the
-	 * current stack address. If the stacks consist of nested
-	 * exceptions
-	 */
-	while (!done) {
-		const char *begin_str, *end_str;
-
-		get_stack_info(stack, task, &info, &visit_mask);
-
-		/* Default finish unless specified to continue */
-		done = 1;
-
-		switch (info.type) {
-
-		/* Break out early if we are on the thread stack */
-		case STACK_TYPE_TASK:
-			break;
-
-		case STACK_TYPE_IRQ:
-		case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
-
-			stack_type_str(info.type, &begin_str, &end_str);
-
-			if (ops->stack(data, begin_str) < 0)
-				break;
-
-			bp = ops->walk_stack(task, stack, bp, ops,
-					     data, &info, &graph);
-
-			ops->stack(data, end_str);
-
-			stack = info.next_sp;
-			done = 0;
-			break;
-
-		default:
-			ops->stack(data, "UNK");
-			break;
-		}
-	}
-
-	/*
-	 * This handles the process stack:
-	 */
-	bp = ops->walk_stack(task, stack, bp, ops, data, &info, &graph);
-}
-EXPORT_SYMBOL(dump_trace);
-
 void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *sp, char *log_lvl)
 {
-- 
2.7.4

* [PATCH v4 42/57] x86/entry/unwind: create stack frames for saved interrupt registers
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (40 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 41/57] x86/dumpstack: remove dump_trace() and related callbacks Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 43/57] x86/unwind: create stack frames for saved syscall registers Josh Poimboeuf
                   ` (15 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

With frame pointers, when a task is interrupted, its stack is no longer
completely reliable because the function could have been interrupted
before it had a chance to save the previous frame pointer on the stack.
So the caller of the interrupted function could get skipped by a stack
trace.

This is problematic for live patching, which needs to know whether a
stack trace of a sleeping task can be relied upon.  There's currently no
way to detect if a sleeping task was interrupted by a page fault
exception or preemption before it went to sleep.

Another issue is that when dumping the stack of an interrupted task, the
unwinder has no way of knowing where the saved pt_regs registers are, so
it can't print them.

This solves those issues by encoding the pt_regs pointer in the frame
pointer on entry from an interrupt or an exception.
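
Conceptually, the encoding and decoding are just a flip of the
most-significant bit (a sketch of what the ENCODE_FRAME_POINTER macro
and the unwinder's decode helper in this patch do):

  /* entry code: point the frame pointer at pt_regs, then clear the
   * MSB so it can't be mistaken for a valid kernel address */
  bp = (unsigned long)regs & ~(1UL << (BITS_PER_LONG - 1));

  /* unwinder: a clear MSB signals an encoded pt_regs pointer */
  if (!(bp & (1UL << (BITS_PER_LONG - 1))))
          regs = (struct pt_regs *)(bp | (1UL << (BITS_PER_LONG - 1)));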

This patch also updates the unwinder to be able to decode it, because
otherwise the unwinder would be broken by this change.

Note that this causes a change in the behavior of the unwinder: each
instance of a pt_regs on the stack is now considered a "frame".  So
callers of unwind_get_return_address() will now get an occasional
'regs->ip' address that would have previously been skipped over.
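
Callers that care can recognize such frames with the
unwind_get_entry_regs() helper added here, for example (an illustrative
snippet, not part of this patch):

  struct pt_regs *entry_regs = unwind_get_entry_regs(&state);

  /* an entry frame reports regs->ip instead of a call return address */
  if (entry_regs)
          pr_info("interrupted at %pS\n", (void *)entry_regs->ip);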

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/entry/calling.h       | 21 +++++++++++
 arch/x86/entry/entry_32.S      | 34 +++++++++++++++---
 arch/x86/entry/entry_64.S      | 10 ++++--
 arch/x86/include/asm/unwind.h  | 18 ++++++++--
 arch/x86/kernel/unwind_frame.c | 80 +++++++++++++++++++++++++++++++++++++-----
 5 files changed, 146 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 9a9e588..ab799a3 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -201,6 +201,27 @@ For 32-bit we have the following conventions - kernel is built with
 	.byte 0xf1
 	.endm
 
+	/*
+	 * This is a sneaky trick to help the unwinder find pt_regs on the
+	 * stack.  The frame pointer is replaced with an encoded pointer to
+	 * pt_regs.  The encoding is just a clearing of the highest-order bit,
+	 * which makes it an invalid address and is also a signal to the
+	 * unwinder that it's a pt_regs pointer in disguise.
+	 *
+	 * NOTE: This macro must be used *after* SAVE_EXTRA_REGS because it
+	 * corrupts the original rbp.
+	 */
+.macro ENCODE_FRAME_POINTER ptregs_offset=0
+#ifdef CONFIG_FRAME_POINTER
+	.if \ptregs_offset
+		leaq \ptregs_offset(%rsp), %rbp
+	.else
+		mov %rsp, %rbp
+	.endif
+	btr $63, %rbp
+#endif
+.endm
+
 #endif /* CONFIG_X86_64 */
 
 /*
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7bf9ec8..b82f763 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -175,6 +175,23 @@
 	SET_KERNEL_GS %edx
 .endm
 
+/*
+ * This is a sneaky trick to help the unwinder find pt_regs on the
+ * stack.  The frame pointer is replaced with an encoded pointer to
+ * pt_regs.  The encoding is just a clearing of the highest-order bit,
+ * which makes it an invalid address and is also a signal to the
+ * unwinder that it's a pt_regs pointer in disguise.
+ *
+ * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the
+ * original rbp.
+ */
+.macro ENCODE_FRAME_POINTER
+#ifdef CONFIG_FRAME_POINTER
+	mov %esp, %ebp
+	btr $31, %ebp
+#endif
+.endm
+
 .macro RESTORE_INT_REGS
 	popl	%ebx
 	popl	%ecx
@@ -604,6 +621,7 @@ common_interrupt:
 	ASM_CLAC
 	addl	$-0x80, (%esp)			/* Adjust vector into the [-256, -1] range */
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	movl	%esp, %eax
 	call	do_IRQ
@@ -615,6 +633,7 @@ ENTRY(name)				\
 	ASM_CLAC;			\
 	pushl	$~(nr);			\
 	SAVE_ALL;			\
+	ENCODE_FRAME_POINTER;		\
 	TRACE_IRQS_OFF			\
 	movl	%esp, %eax;		\
 	call	fn;			\
@@ -749,6 +768,7 @@ END(spurious_interrupt_bug)
 ENTRY(xen_hypervisor_callback)
 	pushl	$-1				/* orig_ax = -1 => not a system call */
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 
 	/*
@@ -803,6 +823,7 @@ ENTRY(xen_failsafe_callback)
 	jmp	iret_exc
 5:	pushl	$-1				/* orig_ax = -1 => not a system call */
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	jmp	ret_from_exception
 
 .section .fixup, "ax"
@@ -1029,6 +1050,7 @@ common_exception:
 	pushl	%edx
 	pushl	%ecx
 	pushl	%ebx
+	ENCODE_FRAME_POINTER
 	cld
 	movl	$(__KERNEL_PERCPU), %ecx
 	movl	%ecx, %fs
@@ -1061,6 +1083,7 @@ ENTRY(debug)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# error code 0
 	movl	%esp, %eax			# pt_regs pointer
 
@@ -1076,11 +1099,11 @@ ENTRY(debug)
 
 .Ldebug_from_sysenter_stack:
 	/* We're on the SYSENTER stack.  Switch off. */
-	movl	%esp, %ebp
+	movl	%esp, %ebx
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	TRACE_IRQS_OFF
 	call	do_debug
-	movl	%ebp, %esp
+	movl	%ebx, %esp
 	jmp	ret_from_exception
 END(debug)
 
@@ -1103,6 +1126,7 @@ ENTRY(nmi)
 
 	pushl	%eax				# pt_regs->orig_ax
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
 
@@ -1121,10 +1145,10 @@ ENTRY(nmi)
 	 * We're on the SYSENTER stack.  Switch off.  No one (not even debug)
 	 * is using the thread stack right now, so it's safe for us to use it.
 	 */
-	movl	%esp, %ebp
+	movl	%esp, %ebx
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	call	do_nmi
-	movl	%ebp, %esp
+	movl	%ebx, %esp
 	jmp	.Lrestore_all_notrace
 
 #ifdef CONFIG_X86_ESPFIX32
@@ -1141,6 +1165,7 @@ ENTRY(nmi)
 	.endr
 	pushl	%eax
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	FIXUP_ESPFIX_STACK			# %eax == %esp
 	xorl	%edx, %edx			# zero error code
 	call	do_nmi
@@ -1154,6 +1179,7 @@ ENTRY(int3)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
 	SAVE_ALL
+	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f6b40e5..6200318 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -434,6 +434,7 @@ END(irq_entries_start)
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
+	ENCODE_FRAME_POINTER
 
 	testb	$3, CS(%rsp)
 	jz	1f
@@ -907,6 +908,7 @@ ENTRY(xen_failsafe_callback)
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
+	ENCODE_FRAME_POINTER
 	jmp	error_exit
 END(xen_failsafe_callback)
 
@@ -950,6 +952,7 @@ ENTRY(paranoid_entry)
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
+	ENCODE_FRAME_POINTER 8
 	movl	$1, %ebx
 	movl	$MSR_GS_BASE, %ecx
 	rdmsr
@@ -997,6 +1000,7 @@ ENTRY(error_entry)
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
+	ENCODE_FRAME_POINTER 8
 	xorl	%ebx, %ebx
 	testb	$3, CS+8(%rsp)
 	jz	.Lerror_kernelspace
@@ -1179,6 +1183,7 @@ ENTRY(nmi)
 	pushq	%r13		/* pt_regs->r13 */
 	pushq	%r14		/* pt_regs->r14 */
 	pushq	%r15		/* pt_regs->r15 */
+	ENCODE_FRAME_POINTER
 
 	/*
 	 * At this point we no longer need to worry about stack damage
@@ -1192,11 +1197,10 @@ ENTRY(nmi)
 
 	/*
 	 * Return back to user mode.  We must *not* do the normal exit
-	 * work, because we don't want to enable interrupts.  Fortunately,
-	 * do_nmi doesn't modify pt_regs.
+	 * work, because we don't want to enable interrupts.
 	 */
 	SWAPGS
-	jmp	restore_c_regs_and_iret
+	jmp	restore_regs_and_iret
 
 .Lnmi_from_kernel:
 	/*
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 8306536..b79a8a6 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -13,6 +13,7 @@ struct unwind_state {
 	int graph_idx;
 #ifdef CONFIG_FRAME_POINTER
 	unsigned long *bp;
+	struct pt_regs *regs;
 #else
 	unsigned long *sp;
 #endif
@@ -46,7 +47,7 @@ unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
 	if (unwind_done(state))
 		return NULL;
 
-	return state->bp + 1;
+	return state->regs ? &state->regs->ip : state->bp + 1;
 }
 
 static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
@@ -54,7 +55,15 @@ static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
 	if (unwind_done(state))
 		return NULL;
 
-	return state->bp;
+	return state->regs ? (unsigned long *)state->regs : state->bp;
+}
+
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return NULL;
+
+	return state->regs;
 }
 
 unsigned long unwind_get_return_address(struct unwind_state *state);
@@ -75,6 +84,11 @@ static inline unsigned long *unwind_get_stack_ptr(struct unwind_state *state)
 	return state->sp;
 }
 
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+{
+	return NULL;
+}
+
 static inline
 unsigned long unwind_get_return_address(struct unwind_state *state)
 {
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 416908f..5cee693 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -14,6 +14,9 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 	if (unwind_done(state))
 		return 0;
 
+	if (state->regs && user_mode(state->regs))
+		return 0;
+
 	addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
 				     addr_p);
 
@@ -21,6 +24,24 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
+/*
+ * This determines if the frame pointer actually contains an encoded pointer to
+ * pt_regs on the stack.  See ENCODE_FRAME_POINTER.
+ */
+static struct pt_regs *decode_frame_pointer(unsigned long *bp)
+{
+	unsigned long regs = (unsigned long)bp;
+
+	/* if the MSB is set, it's not an encoded pointer */
+	if (regs & (1UL << (BITS_PER_LONG - 1)))
+		return NULL;
+
+	/* decode it by setting the MSB */
+	regs |= 1UL << (BITS_PER_LONG - 1);
+
+	return (struct pt_regs *)regs;
+}
+
 static bool update_stack_state(struct unwind_state *state, void *addr,
 			       size_t len)
 {
@@ -46,26 +67,59 @@ static bool update_stack_state(struct unwind_state *state, void *addr,
 
 bool unwind_next_frame(struct unwind_state *state)
 {
-	unsigned long *next_bp;
+	struct pt_regs *regs;
+	unsigned long *next_bp, *next_frame;
+	size_t next_len;
 
 	if (unwind_done(state))
 		return false;
 
-	next_bp = (unsigned long *)*state->bp;
+	/* have we reached the end? */
+	if (state->regs && user_mode(state->regs))
+		goto the_end;
+
+	/* get the next frame pointer */
+	if (state->regs)
+		next_bp = (unsigned long *)state->regs->bp;
+	else
+		next_bp = (unsigned long *)*state->bp;
+
+	/* is the next frame pointer an encoded pointer to pt_regs? */
+	regs = decode_frame_pointer(next_bp);
+	if (regs) {
+		next_frame = (unsigned long *)regs;
+		next_len = sizeof(*regs);
+	} else {
+		next_frame = next_bp;
+		next_len = FRAME_HEADER_SIZE;
+	}
 
 	/* make sure the next frame's data is accessible */
-	if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
+	if (!update_stack_state(state, next_frame, next_len))
 		return false;
-
 	/* move to the next frame */
-	state->bp = next_bp;
+	if (regs) {
+		state->regs = regs;
+		state->bp = NULL;
+	} else {
+		state->bp = next_bp;
+		state->regs = NULL;
+	}
+
 	return true;
+
+the_end:
+	state->stack_info.type = STACK_TYPE_UNKNOWN;
+	return false;
 }
 EXPORT_SYMBOL_GPL(unwind_next_frame);
 
 void __unwind_start(struct unwind_state *state, struct task_struct *task,
 		    struct pt_regs *regs, unsigned long *first_frame)
 {
+	unsigned long *bp, *frame;
+	size_t len;
+
 	memset(state, 0, sizeof(*state));
 	state->task = task;
 
@@ -76,12 +130,22 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 	}
 
 	/* set up the starting stack frame */
-	state->bp = get_frame_pointer(task, regs);
+	bp = get_frame_pointer(task, regs);
+	regs = decode_frame_pointer(bp);
+	if (regs) {
+		state->regs = regs;
+		frame = (unsigned long *)regs;
+		len = sizeof(*regs);
+	} else {
+		state->bp = bp;
+		frame = bp;
+		len = FRAME_HEADER_SIZE;
+	}
 
 	/* initialize stack info and make sure the frame data is accessible */
-	get_stack_info(state->bp, state->task, &state->stack_info,
+	get_stack_info(frame, state->task, &state->stack_info,
 		       &state->stack_mask);
-	update_stack_state(state, state->bp, FRAME_HEADER_SIZE);
+	update_stack_state(state, frame, len);
 
 	/*
 	 * The caller can provide the address of the first frame directly
-- 
2.7.4

* [PATCH v4 43/57] x86/unwind: create stack frames for saved syscall registers
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (41 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 42/57] x86/entry/unwind: create stack frames for saved interrupt registers Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 44/57] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
                   ` (14 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The entry code doesn't encode pt_regs for syscalls.  But the syscall
regs always live in the same place (at the top of the task stack), so
we can add a manual check for them.

A later patch prints them as part of the oops stack dump.  They could be
useful, for example, to determine the arguments to a system call.
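
For example, with the regs in hand, the x86-64 syscall number and
arguments can be read straight out of them (a hypothetical debugging
snippet, not something this series adds):

  struct pt_regs *regs = unwind_get_entry_regs(&state);

  /* x86-64 syscall ABI: number in orig_ax, args in di/si/dx/r10/r8/r9 */
  if (regs && user_mode(regs))
          pr_info("syscall %lu(%lx, %lx, %lx, ...)\n",
                  regs->orig_ax, regs->di, regs->si, regs->dx);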

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 5cee693..2e729bf 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -24,6 +24,14 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
+static bool is_last_task_frame(struct unwind_state *state)
+{
+	unsigned long bp = (unsigned long)state->bp;
+	unsigned long regs = (unsigned long)task_pt_regs(state->task);
+
+	return bp == regs - FRAME_HEADER_SIZE;
+}
+
 /*
  * This determines if the frame pointer actually contains an encoded pointer to
  * pt_regs on the stack.  See ENCODE_FRAME_POINTER.
@@ -78,6 +86,33 @@ bool unwind_next_frame(struct unwind_state *state)
 	if (state->regs && user_mode(state->regs))
 		goto the_end;
 
+	if (is_last_task_frame(state)) {
+		regs = task_pt_regs(state->task);
+
+		/*
+		 * kthreads (other than the boot CPU's idle thread) have some
+		 * partial regs at the end of their stack which were placed
+		 * there by copy_thread_tls().  But the regs don't have any
+		 * useful information, so we can skip them.
+		 *
+		 * This user_mode() check is slightly broader than a PF_KTHREAD
+		 * check because it also catches the awkward situation where a
+		 * newly forked kthread transitions into a user task by calling
+		 * do_execve(), which eventually clears PF_KTHREAD.
+		 */
+		if (!user_mode(regs))
+			goto the_end;
+
+		/*
+		 * We're almost at the end, but not quite: there's still the
+		 * syscall regs frame.  Entry code doesn't encode the regs
+		 * pointer for syscalls, so we have to set it manually.
+		 */
+		state->regs = regs;
+		state->bp = NULL;
+		return true;
+	}
+
 	/* get the next frame pointer */
 	if (state->regs)
 		next_bp = (unsigned long *)state->regs->bp;
-- 
2.7.4

* [PATCH v4 44/57] x86/dumpstack: print stack identifier on its own line
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (42 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 43/57] x86/unwind: create stack frames for saved syscall registers Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 45/57] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
                   ` (13 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a
newline so that any stack address printed after it will appear on the
same line.  That causes the first stack address to be vertically
misaligned with the rest, making it visually cluttered and slightly
confusing:

  Call Trace:
   <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3
   [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
   [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
   ...
   <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
   [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
   [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0

It will look worse once we start printing pt_regs registers found in the
middle of the stack:

  <IRQ> RIP: 0010:[<ffffffff8189c6c3>]  [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60
  RSP: 0018:ffff88007876f720  EFLAGS: 00000206
  RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007
  ...

Improve readability by adding a newline to the stack name:

  Call Trace:
   <IRQ>
   [<ffffffff814431c3>] dump_stack+0x86/0xc3
   [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
   [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
   ...
   <EOI>
   [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
   [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
   [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0

Now that "continued" lines are no longer needed, we can also remove the
hack of using the empty string (aka KERN_CONT) and replace it with
KERN_DEFAULT.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 7e66837..1c25a6d 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -101,7 +101,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 
 		stack_type_str(stack_info.type, &str_begin, &str_end);
 		if (str_begin)
-			printk("%s <%s> ", log_lvl, str_begin);
+			printk("%s <%s>\n", log_lvl, str_begin);
 
 		/*
 		 * Scan the stack, printing any text addresses we find.  At the
@@ -153,7 +153,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		}
 
 		if (str_end)
-			printk("%s <%s> ", log_lvl, str_end);
+			printk("%s <%s>\n", log_lvl, str_end);
 	}
 }
 
@@ -166,12 +166,12 @@ void show_stack(struct task_struct *task, unsigned long *sp)
 	if (!sp && (!task || task == current))
 		sp = get_stack_pointer(current, NULL);
 
-	show_stack_log_lvl(task, NULL, sp, "");
+	show_stack_log_lvl(task, NULL, sp, KERN_DEFAULT);
 }
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(NULL, regs, NULL, "");
+	show_stack_log_lvl(NULL, regs, NULL, KERN_DEFAULT);
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-- 
2.7.4

* [PATCH v4 45/57] x86/dumpstack: print any pt_regs found on the stack
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (43 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 44/57] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 46/57] x86/dumpstack: fix duplicate RIP address display in __show_regs() Josh Poimboeuf
                   ` (12 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Now that we can find pt_regs registers on the stack, print them.  Here's
an example of what it looks like:

Call Trace:
 <IRQ>
 [<ffffffff8144b793>] dump_stack+0x86/0xc3
 [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0
 [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60
 [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50
 [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0
RIP: 0010:[<ffffffff818aef43>]  [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60
RSP: 0018:ffff880079c4f760  EFLAGS: 00000202
RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007
RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000
RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540
R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000
 <EOI>
 [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250
 [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250
 [<ffffffff818a7b61>] __schedule+0x3e1/0xb20
 ...
 [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0
 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
 [<ffffffff818b1dd8>] async_page_fault+0x28/0x30
RIP: 0010:[<ffffffff8145b062>]  [<ffffffff8145b062>] __clear_user+0x42/0x70
RSP: 0018:ffff880079c4fd38  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928
R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400
 [<ffffffff8145b043>] ? __clear_user+0x23/0x70
 [<ffffffff8145b0fb>] clear_user+0x2b/0x40
 [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750
 [<ffffffff8129a591>] search_binary_handler+0xa1/0x200
 [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0
 [<ffffffff8129b5f3>] ? do_execveat_common.isra.36+0x623/0x9f0
 [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50
 [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0
 [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:[<00007fd2e2f2e537>]  [<00007fd2e2f2e537>] 0x7fd2e2f2e537
RSP: 002b:00007ffc449c5fc8  EFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537
RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029
RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20
R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40
R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 1c25a6d..9af8bb1 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -86,7 +86,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	 * - softirq stack
 	 * - hardirq stack
 	 */
-	for (; stack; stack = stack_info.next_sp) {
+	for (regs = NULL; stack; stack = stack_info.next_sp) {
 		const char *str_begin, *str_end;
 
 		/*
@@ -123,6 +123,15 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			if (!__kernel_text_address(addr))
 				continue;
 
+			/*
+			 * Don't print regs->ip again if it was already printed
+			 * by __show_regs() below.
+			 */
+			if (regs && stack == &regs->ip) {
+				unwind_next_frame(&state);
+				continue;
+			}
+
 			if (stack == ret_addr_p)
 				reliable = 1;
 
@@ -150,6 +159,11 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			 * of the addresses will just be printed as unreliable.
 			 */
 			unwind_next_frame(&state);
+
+			/* if the frame has entry regs, print them */
+			regs = unwind_get_entry_regs(&state);
+			if (regs)
+				__show_regs(regs, 0);
 		}
 
 		if (str_end)
-- 
2.7.4

* [PATCH v4 46/57] x86/dumpstack: fix duplicate RIP address display in __show_regs()
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (44 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 45/57] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 47/57] x86/dumpstack: print orig_ax " Josh Poimboeuf
                   ` (11 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The RIP address is shown twice in __show_regs().  Before:

  RIP: 0010:[<ffffffff81070446>]  [<ffffffff81070446>] native_write_msr+0x6/0x30

After:

  RIP: 0010:[<ffffffff81070446>] native_write_msr+0x6/0x30

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/process_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 63236d8..6c43d29 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -62,8 +62,8 @@ void __show_regs(struct pt_regs *regs, int all)
 	unsigned int fsindex, gsindex;
 	unsigned int ds, cs, es;
 
-	printk(KERN_DEFAULT "RIP: %04lx:[<%016lx>] ", regs->cs & 0xffff, regs->ip);
-	printk_address(regs->ip);
+	printk(KERN_DEFAULT "RIP: %04lx:[<%016lx>] %pS\n", regs->cs & 0xffff,
+			regs->ip, (void *)regs->ip);
 	printk(KERN_DEFAULT "RSP: %04lx:%016lx  EFLAGS: %08lx\n", regs->ss,
 			regs->sp, regs->flags);
 	printk(KERN_DEFAULT "RAX: %016lx RBX: %016lx RCX: %016lx\n",
-- 
2.7.4

* [PATCH v4 47/57] x86/dumpstack: print orig_ax in __show_regs()
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (45 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 46/57] x86/dumpstack: fix duplicate RIP address display in __show_regs() Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 48/57] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
                   ` (10 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The value of regs->orig_ax contains potentially useful debugging data:
for syscalls it contains the syscall number, and for interrupts it
contains the (negated) vector number.  To reduce noise, print it only
when it has a useful value (i.e., something other than -1).

Here's what it looks like for a write syscall:

  RIP: 0033:[<00007f53ad7b1940>] 0x7f53ad7b1940
  RSP: 002b:00007fff8de66558 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f53ad7b1940
  RDX: 0000000000000002 RSI: 00007f53ae0ca000 RDI: 0000000000000001
  ...

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/process_64.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6c43d29..cde6fd0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -64,8 +64,13 @@ void __show_regs(struct pt_regs *regs, int all)
 
 	printk(KERN_DEFAULT "RIP: %04lx:[<%016lx>] %pS\n", regs->cs & 0xffff,
 			regs->ip, (void *)regs->ip);
-	printk(KERN_DEFAULT "RSP: %04lx:%016lx  EFLAGS: %08lx\n", regs->ss,
+	printk(KERN_DEFAULT "RSP: %04lx:%016lx EFLAGS: %08lx", regs->ss,
 			regs->sp, regs->flags);
+	if (regs->orig_ax != -1)
+		pr_cont(" ORIG_RAX: %016lx\n", regs->orig_ax);
+	else
+		pr_cont("\n");
+
 	printk(KERN_DEFAULT "RAX: %016lx RBX: %016lx RCX: %016lx\n",
 	       regs->ax, regs->bx, regs->cx);
 	printk(KERN_DEFAULT "RDX: %016lx RSI: %016lx RDI: %016lx\n",
-- 
2.7.4

* [PATCH v4 48/57] x86: remove 64-byte gap at end of irq stack
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (46 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 47/57] x86/dumpstack: print orig_ax " Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 49/57] x86/unwind: warn on kernel stack corruption Josh Poimboeuf
                   ` (9 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

There has been a 64-byte gap at the end of the irq stack for at least 12
years.  It predates git history, and I can't find any good reason for
it.  Remove it.  What's the worst that could happen?

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/page_64_types.h | 3 ---
 arch/x86/kernel/cpu/common.c         | 2 +-
 arch/x86/kernel/dumpstack_64.c       | 4 ++--
 arch/x86/kernel/setup_percpu.c       | 2 +-
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6256baf..3c0be3b 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -24,9 +24,6 @@
 #define IRQ_STACK_ORDER		(2 + KASAN_STACK_ORDER)
 #define IRQ_STACK_SIZE		(PAGE_SIZE << IRQ_STACK_ORDER)
 
-/* FIXME: why? */
-#define IRQ_USABLE_STACK_SIZE	(IRQ_STACK_SIZE - 64)
-
 #define DOUBLEFAULT_STACK 1
 #define NMI_STACK 2
 #define DEBUG_STACK 3
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 55684b1..ce7a4c1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1286,7 +1286,7 @@ DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
 EXPORT_PER_CPU_SYMBOL(current_task);
 
 DEFINE_PER_CPU(char *, irq_stack_ptr) =
-	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_USABLE_STACK_SIZE;
+	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE;
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 8be240f..33f3142 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -78,7 +78,7 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
 	unsigned long *end   = (unsigned long *)this_cpu_read(irq_stack_ptr);
-	unsigned long *begin = end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
+	unsigned long *begin = end - (IRQ_STACK_SIZE / sizeof(long));
 
 	if (stack < begin || stack > end)
 		return false;
@@ -145,7 +145,7 @@ void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	int i;
 
 	irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
-	irq_stack     = irq_stack_end - (IRQ_USABLE_STACK_SIZE / sizeof(long));
+	irq_stack     = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
 	sp = sp ? : get_stack_pointer(task, regs);
 
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index a2a0eae..2bbd27f 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -246,7 +246,7 @@ void __init setup_per_cpu_areas(void)
 #ifdef CONFIG_X86_64
 		per_cpu(irq_stack_ptr, cpu) =
 			per_cpu(irq_stack_union.irq_stack, cpu) +
-			IRQ_USABLE_STACK_SIZE;
+			IRQ_STACK_SIZE;
 #endif
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
-- 
2.7.4

* [PATCH v4 49/57] x86/unwind: warn on kernel stack corruption
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (47 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 48/57] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 50/57] x86/unwind: warn on bad stack return address Josh Poimboeuf
                   ` (8 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Detect situations in the unwinder where the frame pointer refers to a
bad address, and print an appropriate warning.
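
The warning looks like this (a made-up example; the task name, pid and
addresses are illustrative):

  WARNING: kernel stack frame pointer at ffff880079c4f760 in bash:1542 has bad value 0000000000000001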

Use printk_deferred_once() because the unwinder can be called with the
console lock held, e.g. by lockdep via save_stack_trace().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 2e729bf..65c5e416 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -130,8 +130,17 @@ bool unwind_next_frame(struct unwind_state *state)
 	}
 
 	/* make sure the next frame's data is accessible */
-	if (!update_stack_state(state, next_frame, next_len))
-		return false;
+	if (!update_stack_state(state, next_frame, next_len)) {
+		/*
+		 * Don't warn on bad regs->bp.  An interrupt in entry code
+		 * might cause a false positive warning.
+		 */
+		if (state->regs)
+			goto the_end;
+
+		goto bad_address;
+	}
+
 	/* move to the next frame */
 	if (regs) {
 		state->regs = regs;
@@ -143,6 +152,17 @@ bool unwind_next_frame(struct unwind_state *state)
 
 	return true;
 
+bad_address:
+	if (state->regs)
+		printk_deferred_once(KERN_WARNING
+			"WARNING: kernel stack regs at %p in %s:%d has bad 'bp' value %p\n",
+			state->regs, state->task->comm,
+			state->task->pid, next_bp);
+	else
+		printk_deferred_once(KERN_WARNING
+			"WARNING: kernel stack frame pointer at %p in %s:%d has bad value %p\n",
+			state->bp, state->task->comm,
+			state->task->pid, next_bp);
 the_end:
 	state->stack_info.type = STACK_TYPE_UNKNOWN;
 	return false;
-- 
2.7.4


* [PATCH v4 50/57] x86/unwind: warn on bad stack return address
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (48 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 49/57] x86/unwind: warn on kernel stack corruption Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 51/57] x86/unwind: warn if stack grows up Josh Poimboeuf
                   ` (7 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

If __kernel_text_address() doesn't recognize a return address on the
stack, it probably means that it's some generated code which
__kernel_text_address() doesn't know about yet.

Otherwise there's probably some stack corruption.

Either way, warn about it.

Use printk_deferred_once() because the unwinder can be called with the
console lock held (by lockdep via save_stack_trace()).
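
For context, a typical consumer of this interface looks roughly like the
following sketch (based on the unwind API added earlier in the series;
error handling trimmed):

	struct unwind_state state;
	unsigned long addr;

	for (unwind_start(&state, current, NULL, NULL); !unwind_done(&state);
	     unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
		if (!addr)	/* unrecognized address: the new warning fires */
			break;
		printk(KERN_INFO " %pB\n", (void *)addr);
	}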

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 65c5e416..4dbdf4f 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -20,7 +20,15 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 	addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
 				     addr_p);
 
-	return __kernel_text_address(addr) ? addr : 0;
+	if (!__kernel_text_address(addr)) {
+		printk_deferred_once(KERN_WARNING
+			"WARNING: unrecognized kernel stack return address %p at %p in %s:%d\n",
+			(void *)addr, addr_p, state->task->comm,
+			state->task->pid);
+		return 0;
+	}
+
+	return addr;
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
-- 
2.7.4


* [PATCH v4 51/57] x86/unwind: warn if stack grows up
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (49 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 50/57] x86/unwind: warn on bad stack return address Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 52/57] x86/dumpstack: warn on stack recursion Josh Poimboeuf
                   ` (6 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Add a sanity check to ensure the stack only grows down, and print a
warning if the check fails.
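
In isolation, the invariant being checked looks like this (a sketch; the
function name is illustrative, and FRAME_HEADER_SIZE is the two-word
frame header of saved bp plus return address):

	/*
	 * When the next frame is on the same stack, it must begin at a
	 * higher address than the data of the frame just unwound.
	 */
	static bool frame_moves_up(void *prev_bp, void *next_frame)
	{
		return next_frame >= prev_bp + FRAME_HEADER_SIZE;
	}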

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/unwind_frame.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 4dbdf4f..140dac4 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -32,6 +32,15 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
+static size_t regs_size(struct pt_regs *regs)
+{
+	/* x86_32 regs from kernel mode are two words shorter */
+	if (IS_ENABLED(CONFIG_X86_32) && !user_mode(regs))
+		return sizeof(*regs) - (2*sizeof(long));
+
+	return sizeof(*regs);
+}
+
 static bool is_last_task_frame(struct unwind_state *state)
 {
 	unsigned long bp = (unsigned long)state->bp;
@@ -86,6 +95,7 @@ bool unwind_next_frame(struct unwind_state *state)
 	struct pt_regs *regs;
 	unsigned long *next_bp, *next_frame;
 	size_t next_len;
+	enum stack_type prev_type = state->stack_info.type;
 
 	if (unwind_done(state))
 		return false;
@@ -149,6 +159,18 @@ bool unwind_next_frame(struct unwind_state *state)
 		goto bad_address;
 	}
 
+	/* make sure it only unwinds up and doesn't overlap the last frame */
+	if (state->stack_info.type == prev_type) {
+		if (state->regs &&
+		    (void *)next_frame < (void *)state->regs +
+					 regs_size(state->regs))
+			goto bad_address;
+
+		if (state->bp &&
+		    (void *)next_frame < (void *)state->bp + FRAME_HEADER_SIZE)
+			goto bad_address;
+	}
+
 	/* move to the next frame */
 	if (regs) {
 		state->regs = regs;
@@ -165,12 +187,12 @@ bad_address:
 		printk_deferred_once(KERN_WARNING
 			"WARNING: kernel stack regs at %p in %s:%d has bad 'bp' value %p\n",
 			state->regs, state->task->comm,
-			state->task->pid, next_bp);
+			state->task->pid, next_frame);
 	else
 		printk_deferred_once(KERN_WARNING
 			"WARNING: kernel stack frame pointer at %p in %s:%d has bad value %p\n",
 			state->bp, state->task->comm,
-			state->task->pid, next_bp);
+			state->task->pid, next_frame);
 the_end:
 	state->stack_info.type = STACK_TYPE_UNKNOWN;
 	return false;
-- 
2.7.4


* [PATCH v4 52/57] x86/dumpstack: warn on stack recursion
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (50 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 51/57] x86/unwind: warn if stack grows up Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 53/57] x86/mm: move arch_within_stack_frames() to usercopy.c Josh Poimboeuf
                   ` (5 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Print a warning if stack recursion is detected.

Use printk_deferred_once() because the unwinder can be called with the
console lock held (by lockdep via save_stack_trace()).

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_32.c | 5 ++++-
 arch/x86/kernel/dumpstack_64.c | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index fa7a85c..d78bd4b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -101,8 +101,11 @@ recursion_check:
 	 * just break out and report an unknown stack type.
 	 */
 	if (visit_mask) {
-		if (*visit_mask & (1UL << info->type))
+		if (*visit_mask & (1UL << info->type)) {
+			printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n",
+					     info->type);
 			goto unknown;
+		}
 		*visit_mask |= 1UL << info->type;
 	}
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 33f3142..fa2ba2f 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -125,8 +125,11 @@ recursion_check:
 	 * just break out and report an unknown stack type.
 	 */
 	if (visit_mask) {
-		if (*visit_mask & (1UL << info->type))
+		if (*visit_mask & (1UL << info->type)) {
+			printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n",
+					     info->type);
 			goto unknown;
+		}
 		*visit_mask |= 1UL << info->type;
 	}
 
-- 
2.7.4


* [PATCH v4 53/57] x86/mm: move arch_within_stack_frames() to usercopy.c
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (51 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 52/57] x86/dumpstack: warn on stack recursion Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
                   ` (4 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When I tried to port arch_within_stack_frames() to use the new unwinder,
I ran into a nightmarish include file "header soup" scenario when
unwind.h was included from thread_info.h.  And anyway, I think
thread_info.h isn't really an appropriate place for this function.  So
move it to usercopy.c instead.

Since the function is no longer inlined and it walks the stack relative
to its caller's frame, the arguments to the __builtin_frame_address()
calls have been incremented to account for the extra call level.
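
As a reminder of the builtin's semantics (the helpers below are
illustrative, not part of the patch): __builtin_frame_address(0) returns
the current function's own frame, 1 its caller's, and so on, so each
extra level of calls shifts every argument up by one:

	/*
	 * Each level of __builtin_frame_address() is one call deeper
	 * (assuming no inlining, hence the noinline).
	 */
	static noinline void *own_frame(void)
	{
		return __builtin_frame_address(0);	/* own_frame()'s frame */
	}

	static noinline void *callers_frame(void)
	{
		return __builtin_frame_address(1);	/* the caller's frame */
	}

Non-zero levels are also exactly what gcc's -Wframe-address complains
about, which is why that warning only gets re-enabled at the end of the
series.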

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/Kconfig                       |  4 ++--
 arch/x86/include/asm/thread_info.h | 46 ++++++++------------------------------
 arch/x86/lib/usercopy.c            | 43 +++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 39 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index e9c9334..1513043 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -467,8 +467,8 @@ config HAVE_ARCH_WITHIN_STACK_FRAMES
 	  An architecture should select this if it can walk the kernel stack
 	  frames to determine if an object is part of either the arguments
 	  or local variables (i.e. that it excludes saved return addresses,
-	  and similar) by implementing an inline arch_within_stack_frames(),
-	  which is used by CONFIG_HARDENED_USERCOPY.
+	  and similar) by implementing arch_within_stack_frames(), which is
+	  used by CONFIG_HARDENED_USERCOPY.
 
 config HAVE_CONTEXT_TRACKING
 	bool
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 8b7c8d8e..fd849e6 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -176,49 +176,21 @@ static inline unsigned long current_stack_pointer(void)
 	return sp;
 }
 
-/*
- * Walks up the stack frames to make sure that the specified object is
- * entirely contained by a single stack frame.
- *
- * Returns:
- *		 1 if within a frame
- *		-1 if placed across a frame boundary (or outside stack)
- *		 0 unable to determine (no frame pointers, etc)
- */
+#ifdef CONFIG_HARDENED_USERCOPY
+#ifdef CONFIG_FRAME_POINTER
+int arch_within_stack_frames(const void * const stack,
+			     const void * const stackend,
+			     const void *obj, unsigned long len);
+#else
 static inline int arch_within_stack_frames(const void * const stack,
 					   const void * const stackend,
 					   const void *obj, unsigned long len)
 {
-#if defined(CONFIG_FRAME_POINTER)
-	const void *frame = NULL;
-	const void *oldframe;
-
-	oldframe = __builtin_frame_address(1);
-	if (oldframe)
-		frame = __builtin_frame_address(2);
-	/*
-	 * low ----------------------------------------------> high
-	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
-	 *                     ^----------------^
-	 *               allow copies only within here
-	 */
-	while (stack <= frame && frame < stackend) {
-		/*
-		 * If obj + len extends past the last frame, this
-		 * check won't pass and the next frame will be 0,
-		 * causing us to bail out and correctly report
-		 * the copy as invalid.
-		 */
-		if (obj + len <= frame)
-			return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
-		oldframe = frame;
-		frame = *(const void * const *)frame;
-	}
-	return -1;
-#else
 	return 0;
-#endif
 }
+#endif /* CONFIG_FRAME_POINTER */
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 
 #else /* !__ASSEMBLY__ */
 
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index b490878..2492fa7 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -9,6 +9,7 @@
 
 #include <asm/word-at-a-time.h>
 #include <linux/sched.h>
+#include <asm/unwind.h>
 
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
@@ -34,3 +35,45 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(copy_from_user_nmi);
+
+#if defined(CONFIG_HARDENED_USERCOPY) && defined(CONFIG_FRAME_POINTER)
+/*
+ * Walks up the stack frames to make sure that the specified object is
+ * entirely contained by a single stack frame.
+ *
+ * Returns:
+ *		 1 if within a frame
+ *		-1 if placed across a frame boundary (or outside stack)
+ *		 0 unable to determine (no frame pointers, etc)
+ */
+int arch_within_stack_frames(const void * const stack,
+			     const void * const stackend,
+			     const void *obj, unsigned long len)
+{
+	const void *frame = NULL;
+	const void *oldframe;
+
+	oldframe = __builtin_frame_address(2);
+	if (oldframe)
+		frame = __builtin_frame_address(3);
+	/*
+	 * low ----------------------------------------------> high
+	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
+	 *                     ^----------------^
+	 *               allow copies only within here
+	 */
+	while (stack <= frame && frame < stackend) {
+		/*
+		 * If obj + len extends past the last frame, this
+		 * check won't pass and the next frame will be 0,
+		 * causing us to bail out and correctly report
+		 * the copy as invalid.
+		 */
+		if (obj + len <= frame)
+			return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
+		oldframe = frame;
+		frame = *(const void * const *)frame;
+	}
+	return -1;
+}
+#endif /* CONFIG_HARDENED_USERCOPY && CONFIG_FRAME_POINTER */
-- 
2.7.4


* [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (52 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 53/57] x86/mm: move arch_within_stack_frames() to usercopy.c Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-19 18:27   ` Kees Cook
  2016-08-22 22:11   ` Linus Torvalds
  2016-08-18 13:06 ` [PATCH v4 55/57] x86/mm: simplify starting frame logic for hardened usercopy Josh Poimboeuf
                   ` (3 subsequent siblings)
  57 siblings, 2 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Convert arch_within_stack_frames() to use the new unwinder.

This also changes some existing behavior:

- Skip checking of pt_regs frames.
- Warn if it can't reach the grandparent's stack frame.
- Warn if it doesn't unwind to the end of the stack.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/lib/usercopy.c | 44 ++++++++++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index 2492fa7..8fe0a9c 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -50,30 +50,42 @@ int arch_within_stack_frames(const void * const stack,
 			     const void * const stackend,
 			     const void *obj, unsigned long len)
 {
-	const void *frame = NULL;
-	const void *oldframe;
+	struct unwind_state state;
+	const void *frame, *frame_end;
+
+	/*
+	 * Start at the end of our grandparent's frame (beginning of
+	 * great-grandparent's frame).
+	 */
+	unwind_start(&state, current, NULL, NULL);
+	if (WARN_ON_ONCE(!unwind_next_frame(&state) ||
+			 !unwind_next_frame(&state)))
+		return 0;
+	frame = unwind_get_stack_ptr(&state);
 
-	oldframe = __builtin_frame_address(2);
-	if (oldframe)
-		frame = __builtin_frame_address(3);
 	/*
 	 * low ----------------------------------------------> high
 	 * [saved bp][saved ip][args][local vars][saved bp][saved ip]
 	 *                     ^----------------^
 	 *               allow copies only within here
 	 */
-	while (stack <= frame && frame < stackend) {
-		/*
-		 * If obj + len extends past the last frame, this
-		 * check won't pass and the next frame will be 0,
-		 * causing us to bail out and correctly report
-		 * the copy as invalid.
-		 */
-		if (obj + len <= frame)
-			return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
-		oldframe = frame;
-		frame = *(const void * const *)frame;
+	frame += 2*sizeof(long);
+
+	while (unwind_next_frame(&state)) {
+		frame_end = unwind_get_stack_ptr(&state);
+
+		/* skip checking of pt_regs frames */
+		if (!unwind_get_entry_regs(&state) &&
+		    obj >= frame && obj + len <= frame_end)
+			return 1;
+
+		frame = frame_end + 2*sizeof(long);
 	}
+
+	/* make sure the unwinder reached the end of the task stack */
+	if (WARN_ON_ONCE(frame != (void *)task_pt_regs(current)))
+		return 0;
+
 	return -1;
 }
 #endif /* CONFIG_HARDENED_USERCOPY && CONFIG_FRAME_POINTER */
-- 
2.7.4


* [PATCH v4 55/57] x86/mm: simplify starting frame logic for hardened usercopy
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (53 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 56/57] x86/mm: removed unused arch_within_stack_frames() arguments Josh Poimboeuf
                   ` (2 subsequent siblings)
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Currently, arch_within_stack_frames() has to manually find the stack
frame of its great-grandparent (i.e., its caller's caller's caller).
This is somewhat fragile because it relies on the current call path and
inlining decisions.

Get the starting frame address closer to the source, in
check_stack_object(), and pass it to arch_within_stack_frames().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/thread_info.h |  2 ++
 arch/x86/lib/usercopy.c            | 12 ++++--------
 include/linux/thread_info.h        |  1 +
 mm/usercopy.c                      | 15 ++++++++++-----
 4 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index fd849e6..0f27e04 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -180,10 +180,12 @@ static inline unsigned long current_stack_pointer(void)
 #ifdef CONFIG_FRAME_POINTER
 int arch_within_stack_frames(const void * const stack,
 			     const void * const stackend,
+			     void *first_frame,
 			     const void *obj, unsigned long len);
 #else
 static inline int arch_within_stack_frames(const void * const stack,
 					   const void * const stackend,
+					   void *first_frame,
+					   const void *obj, unsigned long len)
 {
 	return 0;
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index 8fe0a9c..fe0b233 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -48,20 +48,16 @@ EXPORT_SYMBOL_GPL(copy_from_user_nmi);
  */
 int arch_within_stack_frames(const void * const stack,
 			     const void * const stackend,
+			     void *first_frame,
 			     const void *obj, unsigned long len)
 {
 	struct unwind_state state;
 	const void *frame, *frame_end;
 
-	/*
-	 * Start at the end of our grandparent's frame (beginning of
-	 * great-grandparent's frame).
-	 */
-	unwind_start(&state, current, NULL, NULL);
-	if (WARN_ON_ONCE(!unwind_next_frame(&state) ||
-			 !unwind_next_frame(&state)))
-		return 0;
+	unwind_start(&state, current, NULL, first_frame);
 	frame = unwind_get_stack_ptr(&state);
+	if (WARN_ON_ONCE(frame != first_frame))
+		return 0;
 
 	/*
 	 * low ----------------------------------------------> high
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index cbd8990..aa58813 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -108,6 +108,7 @@ static inline int test_ti_thread_flag(struct thread_info *ti, int flag)
 #ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
 static inline int arch_within_stack_frames(const void * const stack,
 					   const void * const stackend,
+					   void *first_frame,
 					   const void *obj, unsigned long len)
 {
 	return 0;
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 8ebae91..359249c 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -26,8 +26,8 @@ enum {
 };
 
 /*
- * Checks if a given pointer and length is contained by the current
- * stack frame (if possible).
+ * Checks if a given pointer and length is contained by a frame in
+ * the current stack (if possible).
  *
  * Returns:
  *	NOT_STACK: not at all on the stack
@@ -35,7 +35,8 @@ enum {
  *	GOOD_STACK: fully on the stack (when can't do frame-checking)
  *	BAD_STACK: error condition (invalid stack position or bad stack frame)
  */
-static noinline int check_stack_object(const void *obj, unsigned long len)
+static __always_inline int check_stack_object(const void *obj,
+					      unsigned long len)
 {
 	const void * const stack = task_stack_page(current);
 	const void * const stackend = stack + THREAD_SIZE;
@@ -53,8 +54,12 @@ static noinline int check_stack_object(const void *obj, unsigned long len)
 	if (obj < stack || stackend < obj + len)
 		return BAD_STACK;
 
-	/* Check if object is safely within a valid frame. */
-	ret = arch_within_stack_frames(stack, stackend, obj, len);
+	/*
+	 * Starting with the caller's stack frame, check if the object
+	 * is safely within a valid frame.
+	 */
+	ret = arch_within_stack_frames(stack, stackend,
+				       __builtin_frame_address(0), obj, len);
 	if (ret)
 		return ret;
 
-- 
2.7.4


* [PATCH v4 56/57] x86/mm: removed unused arch_within_stack_frames() arguments
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (54 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 55/57] x86/mm: simplify starting frame logic for hardened usercopy Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:06 ` [PATCH v4 57/57] mm: re-enable gcc frame address warning Josh Poimboeuf
  2016-08-18 13:25 ` [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Frederic Weisbecker
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Remove two unused arch_within_stack_frames() arguments: 'stack' and
'stackend'.  The x86 unwinder already knows how to find the stack
bounds, and that's generally the case for other arch unwinders as well.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/thread_info.h | 8 ++------
 arch/x86/lib/usercopy.c            | 4 +---
 include/linux/thread_info.h        | 4 +---
 mm/usercopy.c                      | 3 +--
 4 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 0f27e04..530789b 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -178,14 +178,10 @@ static inline unsigned long current_stack_pointer(void)
 
 #ifdef CONFIG_HARDENED_USERCOPY
 #ifdef CONFIG_FRAME_POINTER
-int arch_within_stack_frames(const void * const stack,
-			     const void * const stackend,
-			     void *first_frame,
+int arch_within_stack_frames(void *first_frame,
 			     const void *obj, unsigned long len);
 #else
-static inline int arch_within_stack_frames(const void * const stack,
-					   const void * const stackend,
-					   void *first_frame,
+static inline int arch_within_stack_frames(void *first_frame,
 					   const void *obj, unsigned long len)
 {
 	return 0;
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index fe0b233..3045da7 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -46,9 +46,7 @@ EXPORT_SYMBOL_GPL(copy_from_user_nmi);
  *		-1 if placed across a frame boundary (or outside stack)
  *		 0 unable to determine (no frame pointers, etc)
  */
-int arch_within_stack_frames(const void * const stack,
-			     const void * const stackend,
-			     void *first_frame,
+int arch_within_stack_frames(void *first_frame,
 			     const void *obj, unsigned long len)
 {
 	struct unwind_state state;
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index aa58813..74d99e2 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -106,9 +106,7 @@ static inline int test_ti_thread_flag(struct thread_info *ti, int flag)
 #define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
 
 #ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
-static inline int arch_within_stack_frames(const void * const stack,
-					   const void * const stackend,
-					   void *first_frame,
+static inline int arch_within_stack_frames(void *first_frame,
 					   const void *obj, unsigned long len)
 {
 	return 0;
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 359249c..7b2585a 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -58,8 +58,7 @@ static __always_inline int check_stack_object(const void *obj,
 	 * Starting with the caller's stack frame, check if the object
 	 * is safely within a valid frame.
 	 */
-	ret = arch_within_stack_frames(stack, stackend,
-				       __builtin_frame_address(0), obj, len);
+	ret = arch_within_stack_frames(__builtin_frame_address(0), obj, len);
 	if (ret)
 		return ret;
 
-- 
2.7.4


* [PATCH v4 57/57] mm: re-enable gcc frame address warning
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (55 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 56/57] x86/mm: removed unused arch_within_stack_frames() arguments Josh Poimboeuf
@ 2016-08-18 13:06 ` Josh Poimboeuf
  2016-08-18 13:25 ` [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Frederic Weisbecker
  57 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 13:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Now that arch_within_stack_frames()'s uses of __builtin_frame_address()
with non-zero arguments have been removed, this warning can be
re-enabled for mm code.
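
For reference, the warning fires on any non-zero argument, along these
lines (a sketch; the exact message text varies by gcc version):

	void *p = __builtin_frame_address(2);
	/*
	 * gcc -Wframe-address: calling '__builtin_frame_address' with
	 * a nonzero argument is unsafe
	 */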

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 mm/Makefile | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/Makefile b/mm/Makefile
index 2ca1faf..295bd7a 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -21,9 +21,6 @@ KCOV_INSTRUMENT_memcontrol.o := n
 KCOV_INSTRUMENT_mmzone.o := n
 KCOV_INSTRUMENT_vmstat.o := n
 
-# Since __builtin_frame_address does work as used, disable the warning.
-CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)
-
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-- 
2.7.4


* Re: [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code
  2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
                   ` (56 preceding siblings ...)
  2016-08-18 13:06 ` [PATCH v4 57/57] mm: re-enable gcc frame address warning Josh Poimboeuf
@ 2016-08-18 13:25 ` Frederic Weisbecker
  2016-08-18 13:39   ` Ingo Molnar
  57 siblings, 1 reply; 107+ messages in thread
From: Frederic Weisbecker @ 2016-08-18 13:25 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Byungchul Park, Nilay Vaish

On Thu, Aug 18, 2016 at 08:05:40AM -0500, Josh Poimboeuf wrote:
> 
> The x86 stack dump code is a bit of a mess.  dump_trace() uses
> callbacks, and each user of it seems to have slightly different
> requirements, so there are several slightly different callbacks floating
> around.
> 
> Also there are some upcoming features which will require more changes to
> the stack dump code: reliable stack detection for live patching,
> hardened user copy, and the DWARF unwinder.  Each of those features
> would at least need more callbacks and/or callback interfaces, resulting
> in a much bigger mess than what we have today.
> 
> Before doing all that, we should try to clean things up and replace
> dump_trace() with something cleaner and more flexible.
> 
> The new unwinder is a simple state machine which was heavily inspired by
> a suggestion from Andy Lutomirski:
> 
>   https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com
> 
> It's also similar to the libunwind API:
> 
>   http://www.nongnu.org/libunwind/man/libunwind(3).html
> 
> Some if its advantages:
> 
> - simplicity: no more callback sprawl and less code duplication.
> 
> - flexibility: allows the caller to stop and inspect the stack state at
>   each step in the unwinding process.
> 
> - modularity: the unwinder code, console stack dump code, and stack
>   metadata analysis code are all better separated so that changing one
>   of them shouldn't have much of an impact on any of the others.
> 
> ----
> 
> Josh Poimboeuf (57):

I am personally unable to review a 57-patch series.

Any chance you could split it into self-contained steps? In general doing so
increases the chances of review, accelerates merging, and improves
maintainability...

Thanks.


* Re: [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code
  2016-08-18 13:25 ` [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Frederic Weisbecker
@ 2016-08-18 13:39   ` Ingo Molnar
  2016-08-18 14:31     ` Josh Poimboeuf
  2016-08-18 14:34     ` Steven Rostedt
  0 siblings, 2 replies; 107+ messages in thread
From: Ingo Molnar @ 2016-08-18 13:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Poimboeuf, Thomas Gleixner, H . Peter Anvin, x86,
	linux-kernel, Andy Lutomirski, Linus Torvalds, Steven Rostedt,
	Brian Gerst, Kees Cook, Peter Zijlstra, Byungchul Park,
	Nilay Vaish


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> > Josh Poimboeuf (57):
> 
> I am personally unable to review a 57-patch series.
> 
> Any chance you could split it into self-contained steps? In general doing so
> increases the chances of review, accelerates merging, and improves
> maintainability...

Yes, please!

Series of no more than 4-6 patches, ordered in a logical fashion from lowest
risk / simplest towards highest risk / most complex.

Thanks,

	Ingo


* Re: [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code
  2016-08-18 13:39   ` Ingo Molnar
@ 2016-08-18 14:31     ` Josh Poimboeuf
  2016-08-18 14:41       ` Steven Rostedt
  2016-08-18 16:36       ` Ingo Molnar
  2016-08-18 14:34     ` Steven Rostedt
  1 sibling, 2 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-18 14:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, H . Peter Anvin, x86,
	linux-kernel, Andy Lutomirski, Linus Torvalds, Steven Rostedt,
	Brian Gerst, Kees Cook, Peter Zijlstra, Byungchul Park,
	Nilay Vaish

On Thu, Aug 18, 2016 at 03:39:35PM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > > Josh Poimboeuf (57):
> > 
> > I am personally unable to review a 57-patch series.
> > 
> > Any chance you could split it into self-contained steps? In general doing so
> > increases the chances of review, accelerates merging, and improves
> > maintainability...
> 
> Yes, please!
> 
> Series of no more than 4-6 patches, ordered in a logical fashion from lowest
> risk / simplest towards highest risk / most complex.

You're right, that would be better.  My apologies for spamming.  It
started with "only" 19 patches in v1 and then quickly got out of hand.

I may split it up something like this:

cleanups:
  x86/dumpstack: remove show_trace()
  x86/asm/head: remove unused init_rsp variable extern
  x86/asm/head: rename 'stack_start' -> 'initial_stack'
  x86/dumpstack: add IRQ_USABLE_STACK_SIZE define
  x86/dumpstack: remove extra brackets around "<EOE>"
  x86/head: remove useless zeroed word
  x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access
  proc: fix return address printk conversion specifer in /proc/<pid>/stack
  x86: remove 64-byte gap at end of irq stack

function graph fixes:
  ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config
  ftrace: only allocate the ret_stack 'fp' field when needed
  ftrace: add return address pointer to ftrace_ret_stack
  ftrace: add ftrace_graph_ret_addr() stack unwinding helpers
  x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr()
  ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
  x86/dumpstack/ftrace: mark function graph handler function as unreliable
  x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp()

get_stack_info():
  x86/dumpstack: simplify in_exception_stack()
  x86/dumpstack: add get_stack_info() interface
  x86/dumpstack: add recursion checking for all stacks

unwinder prep:
  perf/x86: check perf_callchain_store() error
  oprofile/x86: add regs->ip to oprofile trace
  x86/dumpstack: remove unnecessary stack pointer arguments
  x86/dumpstack: add get_stack_pointer() and get_frame_pointer()
  x86/dumpstack: make printk_stack_address() more generally useful
  x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl()
  x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()

unwinder:
  x86/unwind: add new unwind interface and implementations
  perf/x86: convert perf_callchain_kernel() to use the new unwinder
  x86/stacktrace: convert save_stack_trace_*() to use the new unwinder
  oprofile/x86: convert x86_backtrace() to use the new unwinder
  x86/dumpstack: convert show_trace_log_lvl() to use the new unwinder
  x86/dumpstack: remove dump_trace() and related callbacks

hardened usercopy:
  x86/mm: move arch_within_stack_frames() to usercopy.c
  x86/mm: convert arch_within_stack_frames() to use the new unwinder
  x86/mm: simplify starting frame logic for hardened usercopy
  x86/mm: removed unused arch_within_stack_frames() arguments
  mm: re-enable gcc frame address warning

standardize the end of the stack:
  x86/entry/head/32: use local labels
  x86/asm/head: use a common function for starting CPUs
  x86: move _stext marker to before head code
  x86/head: put real return address on idle task stack
  x86/head: fix the end of the stack for idle tasks
  x86/entry/32: fix the end of the stack for newly forked tasks
  x86/head/32: fix the end of the stack for idle tasks
  x86/smp: fix initial idle stack location on 32-bit
  x86/entry/32: rename 'error_code' to 'common_exception'

pt_regs frames:
  x86/entry/unwind: create stack frames for saved interrupt registers
  x86/unwind: create stack frames for saved syscall registers
  x86/dumpstack: print stack identifier on its own line
  x86/dumpstack: print any pt_regs found on the stack
  x86/dumpstack: fix duplicate RIP address display in __show_regs()
  x86/dumpstack: print orig_ax in __show_regs()

unwinder warnings:
  x86/unwind: warn on kernel stack corruption
  x86/unwind: warn on bad stack return address
  x86/unwind: warn if stack grows up
  x86/dumpstack: warn on stack recursion

-- 
Josh


* Re: [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code
  2016-08-18 13:39   ` Ingo Molnar
  2016-08-18 14:31     ` Josh Poimboeuf
@ 2016-08-18 14:34     ` Steven Rostedt
  1 sibling, 0 replies; 107+ messages in thread
From: Steven Rostedt @ 2016-08-18 14:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Josh Poimboeuf, Thomas Gleixner,
	H . Peter Anvin, x86, linux-kernel, Andy Lutomirski,
	Linus Torvalds, Brian Gerst, Kees Cook, Peter Zijlstra,
	Byungchul Park, Nilay Vaish

On Thu, 18 Aug 2016 15:39:35 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > > Josh Poimboeuf (57):  
> > 
> > I am personally unable to review a 57-patch series.
> > 
> > Any chance you could split it into self-contained steps? In general doing so
> > increases the chances of review, accelerates merging, and improves
> > maintainability...
> 
> Yes, please!
> 
> Series of no more than 4-6 patches, ordered in a logical fashion from lowest
> risk / simplest towards highest risk / most complex.
> 

 +1

-- Steve


* Re: [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code
  2016-08-18 14:31     ` Josh Poimboeuf
@ 2016-08-18 14:41       ` Steven Rostedt
  2016-08-18 16:36       ` Ingo Molnar
  1 sibling, 0 replies; 107+ messages in thread
From: Steven Rostedt @ 2016-08-18 14:41 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Frederic Weisbecker, Thomas Gleixner,
	H . Peter Anvin, x86, linux-kernel, Andy Lutomirski,
	Linus Torvalds, Brian Gerst, Kees Cook, Peter Zijlstra,
	Byungchul Park, Nilay Vaish

On Thu, 18 Aug 2016 09:31:36 -0500
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Thu, Aug 18, 2016 at 03:39:35PM +0200, Ingo Molnar wrote:
> > 
> > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >   
> > > > Josh Poimboeuf (57):  
> > > 
> > > I am personally unable to review a 57-patch series.
> > > 
> > > Any chance you could split it into self-contained steps? In general doing so
> > > increases the chances of review, accelerates merging, and improves
> > > maintainability...
> > 
> > Yes, please!
> > 
> > Series of no more than 4-6 patches, ordered in a logical fashion from lowest
> > risk / simplest towards highest risk / most complex.
> 
> You're right, that would be better.  My apologies for spamming.  It
> started with "only" 19 patches in v1 and then quickly got out of hand.
> 
> I may split it up something like this:

This looks fine. My wife came down to complain to me that my tablet was
"popping" too much (it pops for every email I get).

But don't send these all at once. Send one series out and focus on
that, then when that is all set, then work on the next series.

-- Steve


* Re: [PATCH v4 02/57] x86/asm/head: remove unused init_rsp variable extern
  2016-08-18 13:05 ` [PATCH v4 02/57] x86/asm/head: remove unused init_rsp variable extern Josh Poimboeuf
@ 2016-08-18 16:22   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 107+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-08-18 16:22 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On 2016-08-18 08:05:42 [-0500], Josh Poimboeuf wrote:
> There is no init_rsp variable.  Remove its extern.

You could add that it was removed in 9cf4f298e29a ("x86: use stack_start
in x86_64") (merged in v2.6.27-rc1).

> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>

Sebastian


* Re: [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code
  2016-08-18 14:31     ` Josh Poimboeuf
  2016-08-18 14:41       ` Steven Rostedt
@ 2016-08-18 16:36       ` Ingo Molnar
  1 sibling, 0 replies; 107+ messages in thread
From: Ingo Molnar @ 2016-08-18 16:36 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Frederic Weisbecker, Thomas Gleixner, H . Peter Anvin, x86,
	linux-kernel, Andy Lutomirski, Linus Torvalds, Steven Rostedt,
	Brian Gerst, Kees Cook, Peter Zijlstra, Byungchul Park,
	Nilay Vaish


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> You're right, that would be better.  My apologies for spamming.  It
> started with "only" 19 patches in v1 and then quickly got out of hand.

np!

> I may split it up something like this:
> 
> cleanups:
> function graph fixes:
> get_stack_info():
> unwinder prep:
> unwinder:
> hardened usercopy:
> standardize the end of the stack:
> pt_regs frames:
> unwinder warnings:

Looks good to me, but please always keep only one of these series 'in flight' - 
i.e. wait until it's been reviewed & applied before proceeding to the next series.

That helps keep us all sane and relaxed!

Thanks,

	Ingo


* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-18 13:06 ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
@ 2016-08-19 18:27   ` Kees Cook
  2016-08-19 21:55     ` Josh Poimboeuf
  2016-08-22 22:11   ` Linus Torvalds
  1 sibling, 1 reply; 107+ messages in thread
From: Kees Cook @ 2016-08-19 18:27 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Convert arch_within_stack_frames() to use the new unwinder.
>
> This also changes some existing behavior:
>
> - Skip checking of pt_regs frames.
> - Warn if it can't reach the grandparent's stack frame.
> - Warn if it doesn't unwind to the end of the stack.
>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>

All the stuff touching usercopy looks good to me. One question,
though, in looking through the unwinder. It seems like it's much more
complex than just the frame-hopping that the old
arch_within_stack_frames() did, but I'm curious to hear what you think
about its performance. We'll be calling this with every usercopy that
touches the stack, so I'd like to be able to estimate the performance
impact of this replacement...

-Kees

> ---
>  arch/x86/lib/usercopy.c | 44 ++++++++++++++++++++++++++++----------------
>  1 file changed, 28 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
> index 2492fa7..8fe0a9c 100644
> --- a/arch/x86/lib/usercopy.c
> +++ b/arch/x86/lib/usercopy.c
> @@ -50,30 +50,42 @@ int arch_within_stack_frames(const void * const stack,
>                              const void * const stackend,
>                              const void *obj, unsigned long len)
>  {
> -       const void *frame = NULL;
> -       const void *oldframe;
> +       struct unwind_state state;
> +       const void *frame, *frame_end;
> +
> +       /*
> +        * Start at the end of our grandparent's frame (beginning of
> +        * great-grandparent's frame).
> +        */
> +       unwind_start(&state, current, NULL, NULL);
> +       if (WARN_ON_ONCE(!unwind_next_frame(&state) ||
> +                        !unwind_next_frame(&state)))
> +               return 0;
> +       frame = unwind_get_stack_ptr(&state);
>
> -       oldframe = __builtin_frame_address(2);
> -       if (oldframe)
> -               frame = __builtin_frame_address(3);
>         /*
>          * low ----------------------------------------------> high
>          * [saved bp][saved ip][args][local vars][saved bp][saved ip]
>          *                     ^----------------^
>          *               allow copies only within here
>          */
> -       while (stack <= frame && frame < stackend) {
> -               /*
> -                * If obj + len extends past the last frame, this
> -                * check won't pass and the next frame will be 0,
> -                * causing us to bail out and correctly report
> -                * the copy as invalid.
> -                */
> -               if (obj + len <= frame)
> -                       return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
> -               oldframe = frame;
> -               frame = *(const void * const *)frame;
> +       frame += 2*sizeof(long);
> +
> +       while (unwind_next_frame(&state)) {
> +               frame_end = unwind_get_stack_ptr(&state);
> +
> +               /* skip checking of pt_regs frames */
> +               if (!unwind_get_entry_regs(&state) &&
> +                   obj >= frame && obj + len <= frame_end)
> +                       return 1;
> +
> +               frame = frame_end + 2*sizeof(long);
>         }
> +
> +       /* make sure the unwinder reached the end of the task stack */
> +       if (WARN_ON_ONCE(frame != (void *)task_pt_regs(current)))
> +               return 0;
> +
>         return -1;
>  }
>  #endif /* CONFIG_HARDENED_USERCOPY && CONFIG_FRAME_POINTER */
> --
> 2.7.4
>



-- 
Kees Cook
Nexus Security


* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-19 18:27   ` Kees Cook
@ 2016-08-19 21:55     ` Josh Poimboeuf
  2016-08-22 20:27       ` Josh Poimboeuf
  0 siblings, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-19 21:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 19, 2016 at 11:27:18AM -0700, Kees Cook wrote:
> On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > Convert arch_within_stack_frames() to use the new unwinder.
> >
> > This also changes some existing behavior:
> >
> > - Skip checking of pt_regs frames.
> > - Warn if it can't reach the grandparent's stack frame.
> > - Warn if it doesn't unwind to the end of the stack.
> >
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> 
> All the stuff touching usercopy looks good to me. One question,
> though, in looking through the unwinder. It seems like it's much more
> complex than just the frame-hopping that the old
> arch_within_stack_frames() did, but I'm curious to hear what you think
> about its performance. We'll be calling this with every usercopy that
> touches the stack, so I'd like to be able to estimate the performance
> impact of this replacement...

Yeah, good point.  I'll take some measurements from before and after and
get back to you.

-- 
Josh


* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-19 21:55     ` Josh Poimboeuf
@ 2016-08-22 20:27       ` Josh Poimboeuf
  2016-08-22 23:33         ` Josh Poimboeuf
  0 siblings, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-22 20:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 19, 2016 at 04:55:22PM -0500, Josh Poimboeuf wrote:
> On Fri, Aug 19, 2016 at 11:27:18AM -0700, Kees Cook wrote:
> > On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > Convert arch_within_stack_frames() to use the new unwinder.
> > >
> > > This also changes some existing behavior:
> > >
> > > - Skip checking of pt_regs frames.
> > > - Warn if it can't reach the grandparent's stack frame.
> > > - Warn if it doesn't unwind to the end of the stack.
> > >
> > > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > 
> > All the stuff touching usercopy looks good to me. One question,
> > though, in looking through the unwinder. It seems like it's much more
> > complex than just the frame-hopping that the old
> > arch_within_stack_frames() did, but I'm curious to hear what you think
> > about its performance. We'll be calling this with every usercopy that
> > touches the stack, so I'd like to be able to estimate the performance
> > impact of this replacement...
> 
> Yeah, good point.  I'll take some measurements from before and after and
> get back to you.

I took some before/after measurements by enclosing the affected
functions with ktime calls to get the total time spent in each function,
and did a "find /usr >/dev/null" to trigger a bunch of user copies.
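
Something like the following sketch captures the instrumentation (names
are illustrative; the accumulator and the reporting are omitted):

	ktime_t start = ktime_get();

	check_object_size(ptr, n, to_user);

	total_ns += ktime_to_ns(ktime_sub(ktime_get(), start));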

	copy_to/from_user	check_object_size	arch_within_stack_frames
before: 13ms			6.8ms			0.61ms
after: 	17ms			11ms			4.6ms

The unwinder port made arch_within_stack_frames() *much* (8x) slower
than its current simple implementation, and added about 30% (4ms) to the
total copy_to/from_user() run time.

Note that hardened usercopy itself is already quite slow: it made user
copies about 52% slower.  With the unwinder port, that worsened to ~65%.

"find /usr" took about 170ms of kernel time and 2.3s total.  So the
unwinder port added about 2% on the kernel side and 0.2% total for this
particular test case.  Though I'm sure there are more I/O-intensive
workloads out there which would be more adversely affected.

I haven't yet looked to see where the bottlenecks are and if there could
be any obvious performance improvements.

BTW, ignoring the performance issues, using the unwinder here would have
some benefits:

- It protects pt_regs frames from being changed.  For example, during a
  page fault operation, the saved regs->ip on the stack is protected.

- Unlike the existing code, it could potentially work with
  __copy_from_user_inatomic() and copy_from_user_nmi(), which can copy
  to/from an irq/exception stack.  (I think check_stack_object() would
  need to be rewritten a bit so that it doesn't always assume the task
  stack.)

- It complains loudly if there's stack corruption or something else goes
  wrong with walking the stack instead of just silently failing.

- The same code could also work with DWARF if we ever add a DWARF
  unwinder (with a possible tweak to the unwinder API to get the stack
  frame header size).

-- 
Josh


* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-18 13:06 ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
  2016-08-19 18:27   ` Kees Cook
@ 2016-08-22 22:11   ` Linus Torvalds
  2016-08-23  1:27     ` Kees Cook
                       ` (2 more replies)
  1 sibling, 3 replies; 107+ messages in thread
From: Linus Torvalds @ 2016-08-22 22:11 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Convert arch_within_stack_frames() to use the new unwinder.

Please don't do this.

There's no real reason to unwind the stack frame. If it's not on the
current stack page, it shouldn't be a valid source anyway, so
unwinding things just seems entirely pointless.

Quite frankly, I think the whole "look at the stack frames" logic
should be removed from this. It's classic crap that external patches
do. How many call-sites does it actually check, and how many of them
aren't already checked by the existing static checks for constant
addresses within existing objects?

It's entirely possible that there is simply no point whatsoever to
this all, and it mostly triggers on things like the fs/stat.c code
that does

        struct stat tmp;
    ...
        return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;

where the new usercopy.c code is pure masturbatory crap.

One of the reasons I had for merging that code was that I was hoping
that it would improve by being in the kernel. And by "improve" I mean
"get rid of crap" rather than make it more expensive and even more
self-congratulatory stupidity.

Right now, I suspect 99% of all the stack checks in usercopy.c are
solidly in the "mindbogglingly stupid crap" camp.

             Linus


* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-22 20:27       ` Josh Poimboeuf
@ 2016-08-22 23:33         ` Josh Poimboeuf
  2016-08-23  0:59           ` Kees Cook
  0 siblings, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-22 23:33 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 22, 2016 at 03:27:19PM -0500, Josh Poimboeuf wrote:
> On Fri, Aug 19, 2016 at 04:55:22PM -0500, Josh Poimboeuf wrote:
> > On Fri, Aug 19, 2016 at 11:27:18AM -0700, Kees Cook wrote:
> > > On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > > Convert arch_within_stack_frames() to use the new unwinder.
> > > >
> > > > This also changes some existing behavior:
> > > >
> > > > - Skip checking of pt_regs frames.
> > > > - Warn if it can't reach the grandparent's stack frame.
> > > > - Warn if it doesn't unwind to the end of the stack.
> > > >
> > > > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > > 
> > > All the stuff touching usercopy looks good to me. One question,
> > > though, in looking through the unwinder. It seems like it's much more
> > > complex than just the frame-hopping that the old
> > > arch_within_stack_frames() did, but I'm curious to hear what you think
> > > about its performance. We'll be calling this with every usercopy that
> > > touches the stack, so I'd like to be able to estimate the performance
> > > impact of this replacement...
> > 
> > Yeah, good point.  I'll take some measurements from before and after and
> > get back to you.
> 
> I took some before/after measurements by enclosing the affected
> functions with ktime calls to get the total time spent in each function,
> and did a "find /usr >/dev/null" to trigger a bunch of user copies.
> 
> 	copy_to/from_user	check_object_size	arch_within_stack_frames
> before: 13ms			6.8ms			0.61ms
> after: 	17ms			11ms			4.6ms
> 
> The unwinder port made arch_within_stack_frames() *much* (8x) slower
> than its current simple implementation, and added about 30% (4ms) to the
> total copy_to/from_user() run time.
> 
> Note that hardened usercopy itself is already quite slow: it made user
> copies about 52% slower.  With the unwinder port, that worsened to ~65%.

FWIW, I think I messed up my math summary here.  Hardened usercopy was
roughly 110% slower than normal usercopy (i.e., it took more than twice
as long) with 52% of the usercopy time being consumed by
check_object_size().

With the unwinder, that worsened to 180% slower -- with 65% of the
usercopy time being consumed by check_object_size().
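
For reference, the instrumentation was roughly the sketch below (minimal
and hypothetical, since I didn't post the actual measurement patch; the
wrapper name and accounting variable are made up):

#include <linux/ktime.h>
#include <linux/thread_info.h>

/* total time spent in the checked function across the test run */
static u64 check_total_ns;

static noinline void timed_check_object_size(const void *ptr,
                                             unsigned long n, bool to_user)
{
        ktime_t t0 = ktime_get();

        check_object_size(ptr, n, to_user);
        check_total_ns += ktime_to_ns(ktime_sub(ktime_get(), t0));
}

Each of the three functions in the table got an equivalent wrapper, and
check_total_ns was read out after the "find /usr" run.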

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-22 23:33         ` Josh Poimboeuf
@ 2016-08-23  0:59           ` Kees Cook
  2016-08-23  4:21             ` Josh Poimboeuf
  0 siblings, 1 reply; 107+ messages in thread
From: Kees Cook @ 2016-08-23  0:59 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 22, 2016 at 4:33 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Mon, Aug 22, 2016 at 03:27:19PM -0500, Josh Poimboeuf wrote:
>> On Fri, Aug 19, 2016 at 04:55:22PM -0500, Josh Poimboeuf wrote:
>> > On Fri, Aug 19, 2016 at 11:27:18AM -0700, Kees Cook wrote:
>> > > On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > > > Convert arch_within_stack_frames() to use the new unwinder.
>> > > >
>> > > > This also changes some existing behavior:
>> > > >
>> > > > - Skip checking of pt_regs frames.
>> > > > - Warn if it can't reach the grandparent's stack frame.
>> > > > - Warn if it doesn't unwind to the end of the stack.
>> > > >
>> > > > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>> > >
>> > > All the stuff touching usercopy looks good to me. One question,
>> > > though, in looking through the unwinder. It seems like it's much more
>> > > complex than just the frame-hopping that the old
>> > > arch_within_stack_frames() did, but I'm curious to hear what you think
>> > > about its performance. We'll be calling this with every usercopy that
>> > > touches the stack, so I'd like to be able to estimate the performance
>> > > impact of this replacement...
>> >
>> > Yeah, good point.  I'll take some measurements from before and after and
>> > get back to you.
>>
>> I took some before/after measurements by enclosing the affected
>> functions with ktime calls to get the total time spent in each function,
>> and did a "find /usr >/dev/null" to trigger a bunch of user copies.
>>
>>          copy_to/from_user  check_object_size  arch_within_stack_frames
>> before:  13ms               6.8ms              0.61ms
>> after:   17ms               11ms               4.6ms
>>
>> The unwinder port made arch_within_stack_frames() *much* (8x) slower
>> than its current simple implementation, and added about 30% (4ms) to the
>> total copy_to/from_user() run time.
>>
>> Note that hardened usercopy itself is already quite slow: it made user
>> copies about 52% slower.  With the unwinder port, that worsened to ~65%.
>
> FWIW, I think I messed up my math summary here.  Hardened usercopy was
> roughly 110% slower than normal usercopy (i.e., it took more than twice
> as long) with 52% of the usercopy time being consumed by
> check_object_size().

And this is comparing usercopy to hardened usercopy, which isn't
expected to be super fast; it just needs to be a cheap expense in
comparison to the rest of the work being done for a given syscall.

> With the unwinder, that worsened to 180% slower -- with 65% of the
> usercopy time being consumed by check_object_size().

That's quite a bit more than just a simple frame walk. You mentioned a
few benefits to using the unwinder, but I'm trying to make sure the
cases it covers can actually happen during a usercopy?

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-22 22:11   ` Linus Torvalds
@ 2016-08-23  1:27     ` Kees Cook
  2016-08-23 16:21       ` Josh Poimboeuf
  2016-08-23 18:47       ` Linus Torvalds
  2016-08-23 16:06     ` Josh Poimboeuf
  2016-08-23 20:31     ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Andy Lutomirski
  2 siblings, 2 replies; 107+ messages in thread
From: Kees Cook @ 2016-08-23  1:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 22, 2016 at 3:11 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> Convert arch_within_stack_frames() to use the new unwinder.
>
> Please don't do this.
>
> There's no real reason to unwind the stack frame. If it's not on the
> current stack page, it shouldn't be a valid source anyway, so
> unwinding things just seems entirely pointless.
>
> Quite frankly, I think the whole "look at the stack frames" logic
> should be removed from this. It's classic crap that external patches
> do. How many call-sites does it actually check, and how many of them
> aren't already checked by the existing static checks for constant
> addresses within existing objects?
>
> > It's entirely possible that there is simply no point whatsoever to
> this all, and it mostly triggers on things like the fs/stat.c code
> that does
>
>         struct stat tmp;
>     ...
>         return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;
>
> where the new usercopy.c code is pure masturbatory crap.

I need to re-check the copy_*_user changes, but on several
architectures, the bounds checking is only triggered for non
built-in-const sizes, so these kinds of pointless checks shouldn't
happen. This should be done universally to avoid the needless
overhead. (And is why I'm hoping to consolidate the copy_*_user logic,
which Al appears to also be looking at recently.)
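
Concretely, that gating is the pattern already visible in the powerpc
code quoted further down in this thread; a sketch of the same thing
applied generically (the wrapper name here is illustrative):

static inline unsigned long
checked_copy_from_user(void *to, const void __user *from, unsigned long n)
{
        /* skip the runtime check for compile-time-constant sizes,
         * which the static checks already cover */
        if (!__builtin_constant_p(n))
                check_object_size(to, n, false);

        return _copy_from_user(to, from, n);
}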

> One of the reasons I had for merging that code was that I was hoping
> that it would improve by being in the kernel. And by "improve" I mean
> "get rid of crap" rather than make it more expensive and even more
> self-congratulatory stupidity.
>
> Right now, I suspect 99% of all the stack checks in usercopy.c are
> solidly in the "mindbogglingly stupid crap" camp.

The stack bounds checking makes sense to block writes to the saved
frame and instruction pointers, though in practice the stack canary
should resist that kind of attack. The improvement I'd like to see
would be for the canary to be excluded from the frame size calculation
(though I can't imagine how) so that canaries couldn't be exposed
during reads.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-23  0:59           ` Kees Cook
@ 2016-08-23  4:21             ` Josh Poimboeuf
  0 siblings, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-23  4:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, LKML,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 22, 2016 at 05:59:18PM -0700, Kees Cook wrote:
> On Mon, Aug 22, 2016 at 4:33 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Mon, Aug 22, 2016 at 03:27:19PM -0500, Josh Poimboeuf wrote:
> >> On Fri, Aug 19, 2016 at 04:55:22PM -0500, Josh Poimboeuf wrote:
> >> > On Fri, Aug 19, 2016 at 11:27:18AM -0700, Kees Cook wrote:
> >> > > On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > > > Convert arch_within_stack_frames() to use the new unwinder.
> >> > > >
> >> > > > This also changes some existing behavior:
> >> > > >
> >> > > > - Skip checking of pt_regs frames.
> >> > > > - Warn if it can't reach the grandparent's stack frame.
> >> > > > - Warn if it doesn't unwind to the end of the stack.
> >> > > >
> >> > > > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> >> > >
> >> > > All the stuff touching usercopy looks good to me. One question,
> >> > > though, in looking through the unwinder. It seems like it's much more
> >> > > complex than just the frame-hopping that the old
> >> > > arch_within_stack_frames() did, but I'm curious to hear what you think
> >> > > about its performance. We'll be calling this with every usercopy that
> >> > > touches the stack, so I'd like to be able to estimate the performance
> >> > > impact of this replacement...
> >> >
> >> > Yeah, good point.  I'll take some measurements from before and after and
> >> > get back to you.
> >>
> >> I took some before/after measurements by enclosing the affected
> >> functions with ktime calls to get the total time spent in each function,
> >> and did a "find /usr >/dev/null" to trigger a bunch of user copies.
> >>
> >>          copy_to/from_user  check_object_size  arch_within_stack_frames
> >> before:  13ms               6.8ms              0.61ms
> >> after:   17ms               11ms               4.6ms
> >>
> >> The unwinder port made arch_within_stack_frames() *much* (8x) slower
> >> than its current simple implementation, and added about 30% (4ms) to the
> >> total copy_to/from_user() run time.
> >>
> >> Note that hardened usercopy itself is already quite slow: it made user
> >> copies about 52% slower.  With the unwinder port, that worsened to ~65%.
> >
> > FWIW, I think I messed up my math summary here.  Hardened usercopy was
> > roughly 110% slower than normal usercopy (i.e., it took more than twice
> > as long) with 52% of the usercopy time being consumed by
> > check_object_size().
> 
> And this is comparing usercopy to hardened usercopy, which isn't
> expected to be super fast; it just needs to be a cheap expense in
> comparison to the rest of the work being done for a given syscall.
> 
> > With the unwinder, that worsened to 180% slower -- with 65% of the
> > usercopy time being consumed by check_object_size().
> 
> That's quite a bit more than just a simple frame walk. You mentioned a
> few benefits to using the unwinder, but I'm trying to make sure the
> cases it covers can actually happen during a usercopy?

Yeah, I really don't know.  And given Linus's objections, I think I'll
drop it, and the other usercopy patches.

Though I think "move arch_within_stack_frames() to usercopy.c" is still
a nice cleanup if you want to pick that one up.

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-22 22:11   ` Linus Torvalds
  2016-08-23  1:27     ` Kees Cook
@ 2016-08-23 16:06     ` Josh Poimboeuf
  2016-08-23 19:28       ` [PATCH 1/2] mm/usercopy: get rid of "provably correct" warnings Josh Poimboeuf
  2016-08-23 19:28       ` [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc Josh Poimboeuf
  2016-08-23 20:31     ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Andy Lutomirski
  2 siblings, 2 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-23 16:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 22, 2016 at 03:11:32PM -0700, Linus Torvalds wrote:
> On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > Convert arch_within_stack_frames() to use the new unwinder.
> 
> Please don't do this.
> 
> There's no real reason to unwind the stack frame. If it's not on the
> current stack page, it shouldn't be a valid source anyway, so
> > unwinding things just seems entirely pointless.
> 
> Quite frankly, I think the whole "look at the stack frames" logic
> should be removed from this. It's classic crap that external patches
> do. How many call-sites does it actually check, and how many of them
> aren't already checked by the existing static checks for constant
> addresses within existing objects?

I noticed the __compiletime_object_size() check is completely disabled
for gcc >= 4.6, thanks to:

  2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")

AFAICT, that change went too far: it disabled both the compile-time
*and* the runtime checks, so copy_from_user_overflow() is never called.

Working on a couple of patches to try to fix that.
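
For context, the runtime path that would come back, paraphrased from the
x86 copy_from_user() touched by the patches below (sketch only; the
function name is made up):

static inline unsigned long
copy_from_user_sketch(void *to, const void __user *from, unsigned long n)
{
        int sz = __compiletime_object_size(to);    /* -1 when size unknown */

        if (likely(sz < 0 || sz >= n)) {           /* unknown, or copy fits */
                check_object_size(to, n, false);
                n = _copy_from_user(to, from, n);
        } else {
                __copy_from_user_overflow(sz, n);  /* the runtime warning */
        }
        return n;
}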

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-23  1:27     ` Kees Cook
@ 2016-08-23 16:21       ` Josh Poimboeuf
  2016-08-23 18:47       ` Linus Torvalds
  1 sibling, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-23 16:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 22, 2016 at 06:27:28PM -0700, Kees Cook wrote:
> On Mon, Aug 22, 2016 at 3:11 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> Convert arch_within_stack_frames() to use the new unwinder.
> >
> > Please don't do this.
> >
> > There's no real reason to unwind the stack frame. If it's not on the
> > current stack page, it shouldn't be a valid source anyway, so
> > > unwinding things just seems entirely pointless.
> >
> > Quite frankly, I think the whole "look at the stack frames" logic
> > should be removed from this. It's classic crap that external patches
> > do. How many call-sites does it actually check, and how many of them
> > aren't already checked by the existing static checks for constant
> > addresses within existing objects?
> >
> > > It's entirely possible that there is simply no point whatsoever to
> > this all, and it mostly triggers on things like the fs/stat.c code
> > that does
> >
> >         struct stat tmp;
> >     ...
> >         return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;
> >
> > where the new usercopy.c code is pure masturbatory crap.
> 
> I need to re-check the copy_*_user changes, but on several
> architectures, the bounds checking is only triggered for non
> built-in-const sizes, so these kinds of pointless checks shouldn't
> happen. This should be done universally to avoid the needless
> overhead. (And is why I'm hoping to consolidate the copy_*_user logic,
> which Al appears to also be looking at recently.)

I noticed you added this check for powerpc:

	if (!__builtin_constant_p(n))
		check_object_size(to, n, false);

But I don't see a similar check on x86 or any of the other arches I
looked at.  Was that an oversight or is there a specific reason for
doing it on some arches and not others?

> > One of the reasons I had for merging that code was that I was hoping
> > that it would improve by being in the kernel. And by "improve" I mean
> > "get rid of crap" rather than make it more expensive and even more
> > self-congratulatory stupidity.
> >
> > Right now, I suspect 99% of all the stack checks in usercopy.c are
> > solidly in the "mindbogglingly stupid crap" camp.
> 
> The stack bounds checking makes sense to block writes to the saved
> frame and instruction pointers, though in practice the stack canary
> should resist that kind of attack. The improvement I'd like to see
> would be for the canary to be excluded from the frame size calculation
> (though I can't imagine how) so that canaries couldn't be exposed
> during reads.

Yeah, protecting the stack canary would be nice, but it would be hard
without DWARF.  The only way I can think of doing it would be with a gcc
plugin or an objtool extension which creates some kind of fast-access
table of per-function canary stack offsets for
arch_within_stack_frames() to consult.
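
Something like this, say (purely hypothetical, nothing like it exists
today; the record layout is made up):

struct canary_loc {
        unsigned long   func;           /* function entry address */
        int             offset;         /* canary offset within the frame */
};

Then arch_within_stack_frames() could binary-search such a table by
return address and exclude the canary slot when computing object bounds.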

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-23  1:27     ` Kees Cook
  2016-08-23 16:21       ` Josh Poimboeuf
@ 2016-08-23 18:47       ` Linus Torvalds
  1 sibling, 0 replies; 107+ messages in thread
From: Linus Torvalds @ 2016-08-23 18:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 22, 2016 at 9:27 PM, Kees Cook <keescook@chromium.org> wrote:
>
> I need to re-check the copy_*_user changes, but on several
> architectures, the bounds checking is only triggered for non
> built-in-const sizes, so these kinds of pointless checks shouldn't
> happen.

They definitely happen at least on x86.

"stat()" is one common user of fixed-sized structures being copied.
There are tons of others, but 'stat()' is the one I've seen in my
profiles before as being noticeable. It's been critical enough that I
have occasionally tried to play with making it avoid the "copy to
temporary struct, then copy_to_user() the whole struct" and just do it
field-by-field. But it gets nasty with the padding fields etc, so it's
never been done.
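
For the record, the field-by-field version would look something like
this sketch (illustrative; it ignores the dev_t encoding and compat
variants that the real fs/stat.c has to handle, and every padding hole
in the user-visible struct would still need explicit clearing, which is
exactly what makes it nasty):

static long cp_stat_fields(struct kstat *stat, struct stat __user *ubuf)
{
        if (put_user(stat->ino, &ubuf->st_ino) ||
            put_user(stat->mode, &ubuf->st_mode) ||
            put_user(stat->nlink, &ubuf->st_nlink) ||
            put_user(stat->size, &ubuf->st_size))
                return -EFAULT;
        /* ...one put_user() per remaining field... */
        return 0;
}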

Not doing the access size checks for constant-sized copies (at least
when they are "sufficiently small" constants) would probably be the
right thing to do, and then depend on gcc just getting the static case
right warning-wise, which apparently isn't getting done right now
either, but oh well.

            Linus

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH 1/2] mm/usercopy: get rid of "provably correct" warnings
  2016-08-23 16:06     ` Josh Poimboeuf
@ 2016-08-23 19:28       ` Josh Poimboeuf
  2016-08-24  2:36         ` Kees Cook
  2016-08-23 19:28       ` [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc Josh Poimboeuf
  1 sibling, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-23 19:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

With CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y, if I force enable the
__compiletime_object_size() macro with a recent compiler by removing the
"GCC_VERSION < 40600" check, I get a bunch of false positive warnings.
For example:

  In function ‘copy_to_user.part.8’,
      inlined from ‘copy_to_user’,
      inlined from ‘proc_put_long’ at /home/jpoimboe/git/linux/kernel/sysctl.c:2096:6:
  /home/jpoimboe/git/linux/arch/x86/include/asm/uaccess.h:723:46: warning: call to ‘__copy_to_user_overflow’ declared with attribute warning: copy_to_user() buffer size is not provably correct

The problem is that gcc can't always definitively tell whether
copy_from_user_overflow() will be called.  And when in doubt, it prints
the warning anyway.  So in practice, these warnings mostly just create a
lot of noise.  There might be a bug lurking in there somewhere, but the
signal to noise ratio is really low, and not worth the pain IMO.

So just remove the "provably correct" warnings altogether.  This also
lays the groundwork for re-enabling the copy_from_user_overflow()
runtime warnings for newer compilers.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/parisc/include/asm/uaccess.h |  8 +-------
 arch/s390/include/asm/uaccess.h   |  6 +-----
 arch/tile/include/asm/uaccess.h   |  3 +--
 arch/x86/include/asm/uaccess.h    | 35 -----------------------------------
 4 files changed, 3 insertions(+), 49 deletions(-)

diff --git a/arch/parisc/include/asm/uaccess.h b/arch/parisc/include/asm/uaccess.h
index 0f59fd9..b34c022 100644
--- a/arch/parisc/include/asm/uaccess.h
+++ b/arch/parisc/include/asm/uaccess.h
@@ -208,13 +208,7 @@ unsigned long copy_in_user(void __user *dst, const void __user *src, unsigned lo
 #define __copy_to_user_inatomic __copy_to_user
 #define __copy_from_user_inatomic __copy_from_user
 
-extern void copy_from_user_overflow(void)
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-        __compiletime_error("copy_from_user() buffer size is not provably correct")
-#else
-        __compiletime_warning("copy_from_user() buffer size is not provably correct")
-#endif
-;
+extern void copy_from_user_overflow(void);
 
 static inline unsigned long __must_check copy_from_user(void *to,
                                           const void __user *from,
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index 9b49cf1..6d36860 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -332,11 +332,7 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 	return __copy_to_user(to, from, n);
 }
 
-void copy_from_user_overflow(void)
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-__compiletime_warning("copy_from_user() buffer size is not provably correct")
-#endif
-;
+void copy_from_user_overflow(void);
 
 /**
  * copy_from_user: - Copy a block of data from user space.
diff --git a/arch/tile/include/asm/uaccess.h b/arch/tile/include/asm/uaccess.h
index 0a9c4265..e0e313f 100644
--- a/arch/tile/include/asm/uaccess.h
+++ b/arch/tile/include/asm/uaccess.h
@@ -422,8 +422,7 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
  * option is not really compatible with -Werror, which is more useful in
  * general.
  */
-extern void copy_from_user_overflow(void)
-	__compiletime_warning("copy_from_user() size is not provably correct");
+extern void copy_from_user_overflow(void);
 
 static inline unsigned long __must_check copy_from_user(void *to,
 					  const void __user *from,
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a0ae610..89c12cb 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -710,20 +710,6 @@ copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
 
 #undef copy_user_diag
 
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-
-extern void
-__compiletime_warning("copy_from_user() buffer size is not provably correct")
-__copy_from_user_overflow(void) __asm__("copy_from_user_overflow");
-#define __copy_from_user_overflow(size, count) __copy_from_user_overflow()
-
-extern void
-__compiletime_warning("copy_to_user() buffer size is not provably correct")
-__copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
-#define __copy_to_user_overflow(size, count) __copy_to_user_overflow()
-
-#else
-
 static inline void
 __copy_from_user_overflow(int size, unsigned long count)
 {
@@ -732,8 +718,6 @@ __copy_from_user_overflow(int size, unsigned long count)
 
 #define __copy_to_user_overflow __copy_from_user_overflow
 
-#endif
-
 static inline unsigned long __must_check
 copy_from_user(void *to, const void __user *from, unsigned long n)
 {
@@ -743,24 +727,6 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
 
 	kasan_check_write(to, n);
 
-	/*
-	 * While we would like to have the compiler do the checking for us
-	 * even in the non-constant size case, any false positives there are
-	 * a problem (especially when DEBUG_STRICT_USER_COPY_CHECKS, but even
-	 * without - the [hopefully] dangerous looking nature of the warning
-	 * would make people go look at the respecitive call sites over and
-	 * over again just to find that there's no problem).
-	 *
-	 * And there are cases where it's just not realistic for the compiler
-	 * to prove the count to be in range. For example when multiple call
-	 * sites of a helper function - perhaps in different source files -
-	 * all doing proper range checking, yet the helper function not doing
-	 * so again.
-	 *
-	 * Therefore limit the compile time checking to the constant size
-	 * case, and do only runtime checking for non-constant sizes.
-	 */
-
 	if (likely(sz < 0 || sz >= n)) {
 		check_object_size(to, n, false);
 		n = _copy_from_user(to, from, n);
@@ -781,7 +747,6 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 
 	might_fault();
 
-	/* See the comment in copy_from_user() above. */
 	if (likely(sz < 0 || sz >= n)) {
 		check_object_size(from, n, true);
 		n = _copy_to_user(to, from, n);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-23 16:06     ` Josh Poimboeuf
  2016-08-23 19:28       ` [PATCH 1/2] mm/usercopy: get rid of "provably correct" warnings Josh Poimboeuf
@ 2016-08-23 19:28       ` Josh Poimboeuf
  2016-08-24  2:37         ` Kees Cook
  1 sibling, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-23 19:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

This is a revert of:

  2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")

The goal of that commit was to silence the "provably correct" gcc
warnings.  But it went too far: it also disabled the runtime warnings.

Now that the pretty much useless gcc warnings have been properly
disposed of with the previous patch, re-enable this checking on modern
versions of gcc so we can get the runtime warnings again.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 include/linux/compiler-gcc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index e294939..e7f7a68 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -158,7 +158,7 @@
 #define __compiler_offsetof(a, b)					\
 	__builtin_offsetof(a, b)
 
-#if GCC_VERSION >= 40100 && GCC_VERSION < 40600
+#if GCC_VERSION >= 40100
 # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
 #endif
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-22 22:11   ` Linus Torvalds
  2016-08-23  1:27     ` Kees Cook
  2016-08-23 16:06     ` Josh Poimboeuf
@ 2016-08-23 20:31     ` Andy Lutomirski
  2016-08-23 21:06       ` Linus Torvalds
  2016-08-23 21:08       ` Josh Poimboeuf
  2 siblings, 2 replies; 107+ messages in thread
From: Andy Lutomirski @ 2016-08-23 20:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, Josh Poimboeuf,
	H . Peter Anvin, Nilay Vaish, the arch/x86 maintainers,
	Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Brian Gerst, Byungchul Park, Peter Zijlstra

On Aug 23, 2016 12:11 AM, "Linus Torvalds"
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > Convert arch_within_stack_frames() to use the new unwinder.
>
> Please don't do this.
>
> There's no real reason to unwind the stack frame. If it's not on the
> current stack page, it shouldn't be a valid source anyway, so
> unwinding things just seems entirely pointless.
>
> Quite frankly, I think the whole "look at the stack frames" logic
> should be removed from this. It's classic crap that external patches
> do. How many call-sites does it actually check, and how many of them
> aren't already checked by the existing static checks for constant
> addresses within existing objects?
>

I'm a bit confused by what you're objecting to.  If I write:

char buf[123];

func(buf, size);

And func eventually does some usercopy to buf, the idea is to check
that size is in bounds.  Now admittedly this kind of code should be
quite rare in the kernel, and it should be even rarer for the buffer
to be more than a frame or two up the stack.
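
Fleshed out, that pattern is something like the following (hypothetical
code, names made up for illustration):

static long fill_and_copy(void __user *uptr, char *buf, size_t size)
{
        memset(buf, 0, size);
        /* hardened usercopy has to verify here that 'size' fits within
         * the caller's frame, a frame or two up from copy_to_user() */
        return copy_to_user(uptr, buf, size) ? -EFAULT : 0;
}

static long example(void __user *uptr, size_t size)
{
        char buf[123];

        if (size > sizeof(buf))
                return -EINVAL;
        return fill_and_copy(uptr, buf, size);
}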

So the fact that this seems to have any significant effect on
performance suggests to me that it's being run unnecessarily or that
somehow we're walking all the way to the top of the stack in cases
where we shouldn't have done so.

Josh, can you see an example call site in a profile of your test to
find out what this code is doing?

All that being said, Linus, assuming that Josh's new unwinder can be
made reasonably performant, I don't understand your objection to this
patch in particular.  Josh isn't changing the way that usercopy
hardening works -- he's just changing the stack walking
implementation.  It seems that you're objecting to this code in
general, but that predates Josh's patch, no?

--Andy

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-23 20:31     ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Andy Lutomirski
@ 2016-08-23 21:06       ` Linus Torvalds
  2016-08-23 21:08       ` Josh Poimboeuf
  1 sibling, 0 replies; 107+ messages in thread
From: Linus Torvalds @ 2016-08-23 21:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, Josh Poimboeuf,
	H . Peter Anvin, Nilay Vaish, the arch/x86 maintainers,
	Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Brian Gerst, Byungchul Park, Peter Zijlstra

On Tue, Aug 23, 2016 at 4:31 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> I'm a bit confused by what you're objecting to.  If I write:
>
> char buf[123];
>
> func(buf, size);
>
> And func eventually does some usercopy to buf, the idea is to check
> that size is in bounds.

That's the *IDEA*.

That's not what the code actually does.

The code will follow arbitrary stack frames, which seems silly since
it's expensive. At least the old code only checked within one page
(looking at the "stackend" thing), and aborted whenever the trivial
frame pointer chasing didn't pan out. The new code may be a nice
abstraction, but it also seems not to do that, and just follows the
frames in general.

Should we have nested stacks and copy_to_user()? No. But why have
generic frame following code when we don't want the generic case to
ever trigger? If the code is slower - and Josh said it was quite
noticeably slower - then what's the advantage?

But my *real* objection is that I suspect that in 99% of all cases we
shouldn't do any of this, and the user access hardening should be made
smart enough that we don't need to worry about it. Right now the
hardening is not that smart. It tries to handle the case you mention,
but it does so by *also* handling the case _I_ mentioned, which is the
"trivially statically correct at build time", where the code is

    struct xyz tmp;

    .. fill in tmp ..

     copy_to_user(ptr, &tmp, sizeof(tmp));

where wasting cycles to see if it's on the stack is just stupid.

And quite frankly, I suspect that *most* situations where you copy
from or to the stack are very obvious constant sizes like the above.

Can you find a _single_ case of a non-constant buffer on the stack?
It's rare. If it's a variably-sized area, 99% of all time it's a
dynamic allocation, not a stack variable.

So I actually suspect that we could just say "let's make it entirely
invalid to copy variably-sized things to/from the stack". Get rid of
this "follow frames" code _entirely_, and just make the rule be that a
variable copy_to/from_user had better not be on the stack. And the
static constant sizes are clearly not about overflows, so if those are
wrong, it's because somebody uses the wrong type entirely, and gcc
should be catching them statically (or we should catch them with other
tools).
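
As a sketch (hypothetical, not a posted patch), that rule could be as
simple as:

static inline void check_stack_copy(const void *ptr, unsigned long n)
{
        /* no frame walking: a variably-sized usercopy simply must not
         * have its kernel buffer on the stack */
        if (!__builtin_constant_p(n) && object_is_on_stack((void *)ptr))
                WARN_ONCE(1, "variable-sized usercopy to/from the stack\n");
}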

Because the fs/stat.c copies really have been some of the hottest
user-copy code examples we have under certain loads. Do we really want
to have stupid code that makes them slower for no possibly valid
reason?

At some point somebody has to just say "That's just TOO STUPID TO LIVE!".

Checking those fs/stat.c copies dynamically is one such case.

              Linus

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-23 20:31     ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Andy Lutomirski
  2016-08-23 21:06       ` Linus Torvalds
@ 2016-08-23 21:08       ` Josh Poimboeuf
  2016-08-24  1:37         ` Kees Cook
  1 sibling, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-23 21:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Frederic Weisbecker, Thomas Gleixner, Kees Cook,
	H . Peter Anvin, Nilay Vaish, the arch/x86 maintainers,
	Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Brian Gerst, Byungchul Park, Peter Zijlstra

On Tue, Aug 23, 2016 at 01:31:20PM -0700, Andy Lutomirski wrote:
> On Aug 23, 2016 12:11 AM, "Linus Torvalds"
> So the fact that this seems to have any significant effect on
> performance suggests to me that it's being run unnecessarily

Yeah, I think check_object_size() is being run unnecessarily in a lot of
cases.  Calling it only when size is non-const would probably speed
things up a lot.

> or that somehow we're walking all the way to the top of the stack in
> cases where we shouldn't have done so.

I know that's not happening because this code would print a warning.

> Josh, can you see an example call site in a profile of your test to
> find out what this code is doing?

I can try to figure it out tomorrow.  But really it doesn't surprise me
much that this patch makes arch_within_stack_frames() an order of
magnitude slower.  The original code was very simple, whereas
__unwind_start() and unwind_next_frame() have a lot more code.

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder
  2016-08-23 21:08       ` Josh Poimboeuf
@ 2016-08-24  1:37         ` Kees Cook
  0 siblings, 0 replies; 107+ messages in thread
From: Kees Cook @ 2016-08-24  1:37 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, Linus Torvalds, Frederic Weisbecker,
	Thomas Gleixner, H . Peter Anvin, Nilay Vaish,
	the arch/x86 maintainers, Steven Rostedt, Ingo Molnar,
	Linux Kernel Mailing List, Brian Gerst, Byungchul Park,
	Peter Zijlstra

On Tue, Aug 23, 2016 at 5:08 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Tue, Aug 23, 2016 at 01:31:20PM -0700, Andy Lutomirski wrote:
>> On Aug 23, 2016 12:11 AM, "Linus Torvalds"
>> So the fact that this seems to have any significant effect on
>> performance suggests to me that it's being run unnecessarily
>
> Yeah, I think check_object_size() is being run unnecessarily in a lot of
> cases.  Calling it only when size is non-const would probably speed
> things up a lot.

Yup, this is at the top of my list to fix. The non-const gating is only
done on a handful of architectures, and it needs to be done everywhere.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 1/2] mm/usercopy: get rid of "provably correct" warnings
  2016-08-23 19:28       ` [PATCH 1/2] mm/usercopy: get rid of "provably correct" warnings Josh Poimboeuf
@ 2016-08-24  2:36         ` Kees Cook
  0 siblings, 0 replies; 107+ messages in thread
From: Kees Cook @ 2016-08-24  2:36 UTC (permalink / raw)
  To: Josh Poimboeuf, Jeff Law
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Tue, Aug 23, 2016 at 3:28 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> With CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y, if I force enable the
> __compiletime_object_size() macro with a recent compiler by removing the
> "GCC_VERSION < 40600" check, I get a bunch of false positive warnings.
> For example:
>
>   In function ‘copy_to_user.part.8’,
>       inlined from ‘copy_to_user’,
>       inlined from ‘proc_put_long’ at /home/jpoimboe/git/linux/kernel/sysctl.c:2096:6:
>   /home/jpoimboe/git/linux/arch/x86/include/asm/uaccess.h:723:46: warning: call to ‘__copy_to_user_overflow’ declared with attribute warning: copy_to_user() buffer size is not provably correct
>
> The problem is that gcc can't always definitively tell whether
> copy_from_user_overflow() will be called.  And when in doubt, it prints
> the warning anyway.  So in practice, these warnings mostly just create a
> lot of noise.  There might be a bug lurking in there somewhere, but the
> signal to noise ratio is really low, and not worth the pain IMO.
>
> So just remove the "provably correct" warnings altogether.  This also
> lays the groundwork for re-enabling the copy_from_user_overflow()
> runtime warnings for newer compilers.

Hrrrm, I'd much rather split configs or something. This "provably
correct" warning is something gcc should be ABLE to do, but the
ability regressed and hasn't yet been fixed:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46639
originally: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48880

We should get that back at some point, and I'd like to have the
compile-time checks enabled again then without having to reintroduce
the code.

Jeff, any news on this front? It'd be really nice to get this back in.
One of your comments in 2014 on the bug makes it sound like it might be
easy to fix?

-Kees

>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/parisc/include/asm/uaccess.h |  8 +-------
>  arch/s390/include/asm/uaccess.h   |  6 +-----
>  arch/tile/include/asm/uaccess.h   |  3 +--
>  arch/x86/include/asm/uaccess.h    | 35 -----------------------------------
>  4 files changed, 3 insertions(+), 49 deletions(-)
>
> diff --git a/arch/parisc/include/asm/uaccess.h b/arch/parisc/include/asm/uaccess.h
> index 0f59fd9..b34c022 100644
> --- a/arch/parisc/include/asm/uaccess.h
> +++ b/arch/parisc/include/asm/uaccess.h
> @@ -208,13 +208,7 @@ unsigned long copy_in_user(void __user *dst, const void __user *src, unsigned lo
>  #define __copy_to_user_inatomic __copy_to_user
>  #define __copy_from_user_inatomic __copy_from_user
>
> -extern void copy_from_user_overflow(void)
> -#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
> -        __compiletime_error("copy_from_user() buffer size is not provably correct")
> -#else
> -        __compiletime_warning("copy_from_user() buffer size is not provably correct")
> -#endif
> -;
> +extern void copy_from_user_overflow(void);
>
>  static inline unsigned long __must_check copy_from_user(void *to,
>                                            const void __user *from,
> diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
> index 9b49cf1..6d36860 100644
> --- a/arch/s390/include/asm/uaccess.h
> +++ b/arch/s390/include/asm/uaccess.h
> @@ -332,11 +332,7 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
>         return __copy_to_user(to, from, n);
>  }
>
> -void copy_from_user_overflow(void)
> -#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
> -__compiletime_warning("copy_from_user() buffer size is not provably correct")
> -#endif
> -;
> +void copy_from_user_overflow(void);
>
>  /**
>   * copy_from_user: - Copy a block of data from user space.
> diff --git a/arch/tile/include/asm/uaccess.h b/arch/tile/include/asm/uaccess.h
> index 0a9c4265..e0e313f 100644
> --- a/arch/tile/include/asm/uaccess.h
> +++ b/arch/tile/include/asm/uaccess.h
> @@ -422,8 +422,7 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
>   * option is not really compatible with -Werror, which is more useful in
>   * general.
>   */
> -extern void copy_from_user_overflow(void)
> -       __compiletime_warning("copy_from_user() size is not provably correct");
> +extern void copy_from_user_overflow(void);
>
>  static inline unsigned long __must_check copy_from_user(void *to,
>                                           const void __user *from,
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index a0ae610..89c12cb 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -710,20 +710,6 @@ copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
>
>  #undef copy_user_diag
>
> -#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
> -
> -extern void
> -__compiletime_warning("copy_from_user() buffer size is not provably correct")
> -__copy_from_user_overflow(void) __asm__("copy_from_user_overflow");
> -#define __copy_from_user_overflow(size, count) __copy_from_user_overflow()
> -
> -extern void
> -__compiletime_warning("copy_to_user() buffer size is not provably correct")
> -__copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
> -#define __copy_to_user_overflow(size, count) __copy_to_user_overflow()
> -
> -#else
> -
>  static inline void
>  __copy_from_user_overflow(int size, unsigned long count)
>  {
> @@ -732,8 +718,6 @@ __copy_from_user_overflow(int size, unsigned long count)
>
>  #define __copy_to_user_overflow __copy_from_user_overflow
>
> -#endif
> -
>  static inline unsigned long __must_check
>  copy_from_user(void *to, const void __user *from, unsigned long n)
>  {
> @@ -743,24 +727,6 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
>
>         kasan_check_write(to, n);
>
> -       /*
> -        * While we would like to have the compiler do the checking for us
> -        * even in the non-constant size case, any false positives there are
> -        * a problem (especially when DEBUG_STRICT_USER_COPY_CHECKS, but even
> -        * without - the [hopefully] dangerous looking nature of the warning
> -        * would make people go look at the respecitive call sites over and
> -        * over again just to find that there's no problem).
> -        *
> -        * And there are cases where it's just not realistic for the compiler
> -        * to prove the count to be in range. For example when multiple call
> -        * sites of a helper function - perhaps in different source files -
> -        * all doing proper range checking, yet the helper function not doing
> -        * so again.
> -        *
> -        * Therefore limit the compile time checking to the constant size
> -        * case, and do only runtime checking for non-constant sizes.
> -        */
> -
>         if (likely(sz < 0 || sz >= n)) {
>                 check_object_size(to, n, false);
>                 n = _copy_from_user(to, from, n);
> @@ -781,7 +747,6 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
>
>         might_fault();
>
> -       /* See the comment in copy_from_user() above. */
>         if (likely(sz < 0 || sz >= n)) {
>                 check_object_size(from, n, true);
>                 n = _copy_to_user(to, from, n);
> --
> 2.7.4
>



-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-23 19:28       ` [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc Josh Poimboeuf
@ 2016-08-24  2:37         ` Kees Cook
  2016-08-25 20:47           ` Josh Poimboeuf
  0 siblings, 1 reply; 107+ messages in thread
From: Kees Cook @ 2016-08-24  2:37 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Tue, Aug 23, 2016 at 3:28 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> This is a revert of:
>
>   2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")
>
> The goal of that commit was to silence the "provably correct" gcc
> warnings.  But it went too far: it also disabled the runtime warnings.
>
> Now that the pretty much useless gcc warnings have been properly
> disposed of with the previous patch, re-enable this checking on modern
> versions of gcc so we can get the runtime warnings again.

As far as I know, this will still be broken since it's
__builtin_object_size() that is buggy. Maybe I'm misunderstanding
which piece is busted, though?

-Kees

>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  include/linux/compiler-gcc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index e294939..e7f7a68 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -158,7 +158,7 @@
>  #define __compiler_offsetof(a, b)                                      \
>         __builtin_offsetof(a, b)
>
> -#if GCC_VERSION >= 40100 && GCC_VERSION < 40600
> +#if GCC_VERSION >= 40100
>  # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
>  #endif
>
> --
> 2.7.4
>



-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-24  2:37         ` Kees Cook
@ 2016-08-25 20:47           ` Josh Poimboeuf
  2016-08-26  2:14             ` Kees Cook
  0 siblings, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-25 20:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Tue, Aug 23, 2016 at 10:37:43PM -0400, Kees Cook wrote:
> On Tue, Aug 23, 2016 at 3:28 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > This is a revert of:
> >
> >   2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")
> >
> > The goal of that commit was to silence the "provably correct" gcc
> > warnings.  But it went too far: it also disabled the runtime warnings.
> >
> > Now that the pretty much useless gcc warnings have been properly
> > disposed of with the previous patch, re-enable this checking on modern
> > versions of gcc so we can get the runtime warnings again.
> 
> As far as I know, this will still be broken since it's
> __builtin_object_size() that is buggy. Maybe I'm misunderstanding
> which piece is busted, though?

What specifically is buggy with __builtin_object_size()?  Looking at the
generated code for a few of the "provably correct" warning sites, the
values generated by __builtin_object_size() are correct.

I think the problem is really related to the compile-time warning
function attribute used by __copy_to_user_overflow().  The warning is
printed when gcc *can* determine the object size but it *can't*
determine the copy size.  The warning just means that, even though the
object has a const size, gcc isn't able to prove that the overflow won't
happen.

As an example, here's one of the warnings:

  In file included from /home/jpoimboe/git/linux/include/linux/uaccess.h:5:0,
                   from /home/jpoimboe/git/linux/arch/x86/include/asm/stacktrace.h:9,
                   from /home/jpoimboe/git/linux/arch/x86/include/asm/perf_event.h:246,
                   from /home/jpoimboe/git/linux/include/linux/perf_event.h:24,
                   from /home/jpoimboe/git/linux/kernel/sys.c:16:
  In function ‘copy_to_user.part.10’,
      inlined from ‘copy_to_user’,
      inlined from ‘override_release.part.11’ at /home/jpoimboe/git/linux/kernel/sys.c:1136:9:
  /home/jpoimboe/git/linux/arch/x86/include/asm/uaccess.h:723:46: warning: call to ‘__copy_to_user_overflow’ declared with attribute warning: copy_to_user() buffer size is not provably correct
   #define __copy_to_user_overflow(size, count) __copy_to_user_overflow()
                                                ^~~~~~~~~~~~~~~~~~~~~~~~~
  /home/jpoimboe/git/linux/arch/x86/include/asm/uaccess.h:791:3: note: in expansion of macro ‘__copy_to_user_overflow’
     __copy_to_user_overflow(sz, n);
     ^~~~~~~~~~~~~~~~~~~~~~~

This is from override_release()'s use of copy_to_user().  The object
code shows that __builtin_object_size() correctly reports 65 bytes for
the 'buf' object size.  But the copy size ('copy + 1') isn't known at
compile-time.  Thus the (bogus) warning.

Maybe I'm missing something but I don't even see a gcc bug.  To me it
looks like a mismatch in expectations between the code and the compiler.
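
A standalone demo of the mechanism (hypothetical names; do_copy() is a
stand-in for the real copy routine):

extern void overflow_warn(void)
        __attribute__((warning("buffer size is not provably correct")));
extern unsigned long do_copy(void *dst, const void *src, unsigned long n);

static inline unsigned long
demo_copy(void *dst, const void *src, unsigned long n)
{
        long sz = __builtin_object_size(dst, 0);   /* -1 if unknown */

        if (sz < 0 || sz >= n)
                return do_copy(dst, src, n);
        overflow_warn();   /* gcc warns iff it can't prove this call dead */
        return n;
}

With a known-size dst and a non-constant n, gcc can't eliminate the
overflow_warn() call, so the warning fires even when every caller
range-checks n -- which is the mismatch in expectations.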

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-25 20:47           ` Josh Poimboeuf
@ 2016-08-26  2:14             ` Kees Cook
  2016-08-26  3:27               ` Josh Poimboeuf
  0 siblings, 1 reply; 107+ messages in thread
From: Kees Cook @ 2016-08-26  2:14 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Thu, Aug 25, 2016 at 4:47 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Tue, Aug 23, 2016 at 10:37:43PM -0400, Kees Cook wrote:
>> On Tue, Aug 23, 2016 at 3:28 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > This is a revert of:
>> >
>> >   2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")
>> >
>> > The goal of that commit was to silence the "provably correct" gcc
>> > warnings.  But it went too far: it also disabled the runtime warnings.
>> >
>> > Now that the pretty much useless gcc warnings have been properly
>> > disposed of with the previous patch, re-enable this checking on modern
>> > versions of gcc so we can get the runtime warnings again.
>>
>> As far as I know, this will still be broken since it's
>> __builtin_object_size() that is buggy. Maybe I'm misunderstanding
>> which piece is busted, though?
>
> What specifically is buggy with __builtin_object_size()?  Looking at the
> generated code for a few of the "provably correct" warning sites, the
> values generated by __builtin_object_size() are correct.
>
> I think the problem is really related to the compile-time warning
> function attribute used by __copy_to_user_overflow().  The warning is
> printed when gcc *can* determine the object size but it *can't*
> determine the copy size.  The warning just means that, even though the
> object has a const size, gcc isn't able to prove that the overflow won't
> happen.
>
> As an example, here's one of the warnings:
>
>   In file included from /home/jpoimboe/git/linux/include/linux/uaccess.h:5:0,
>                    from /home/jpoimboe/git/linux/arch/x86/include/asm/stacktrace.h:9,
>                    from /home/jpoimboe/git/linux/arch/x86/include/asm/perf_event.h:246,
>                    from /home/jpoimboe/git/linux/include/linux/perf_event.h:24,
>                    from /home/jpoimboe/git/linux/kernel/sys.c:16:
>   In function ‘copy_to_user.part.10’,
>       inlined from ‘copy_to_user’,
>       inlined from ‘override_release.part.11’ at /home/jpoimboe/git/linux/kernel/sys.c:1136:9:
>   /home/jpoimboe/git/linux/arch/x86/include/asm/uaccess.h:723:46: warning: call to ‘__copy_to_user_overflow’ declared with attribute warning: copy_to_user() buffer size is not provably correct
>    #define __copy_to_user_overflow(size, count) __copy_to_user_overflow()
>                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~
>   /home/jpoimboe/git/linux/arch/x86/include/asm/uaccess.h:791:3: note: in expansion of macro ‘__copy_to_user_overflow’
>      __copy_to_user_overflow(sz, n);
>      ^~~~~~~~~~~~~~~~~~~~~~~
>
> This is from override_release()'s use of copy_to_user().  The object
> code shows that __builtin_object_size() correctly reports 65 bytes for
> the 'buf' object size.  But the copy size ('copy + 1') isn't known at
> compile-time.  Thus the (bogus) warning.
>
> Maybe I'm missing something but I don't even see a gcc bug.  To me it
> looks like a mismatch in expectations between the code and the compiler.

Ah, yes, I had a total brain failure. This is what I get for trying to
do email between sessions at a conference. :)

Okay, right. __builtin_object_size() is totally fine, I absolutely
misspoke: it's the resolution of const value ranges. I wouldn't expect
gcc to warn here, though, since "copy + 1" isn't a const value...

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-26  2:14             ` Kees Cook
@ 2016-08-26  3:27               ` Josh Poimboeuf
  2016-08-26 13:42                 ` Kees Cook
  0 siblings, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-26  3:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Thu, Aug 25, 2016 at 10:14:36PM -0400, Kees Cook wrote:
> On Thu, Aug 25, 2016 at 4:47 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Tue, Aug 23, 2016 at 10:37:43PM -0400, Kees Cook wrote:
> >> On Tue, Aug 23, 2016 at 3:28 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > This is a revert of:
> >> >
> >> >   2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")
> >> >
> >> > The goal of that commit was to silence the "provably correct" gcc
> >> > warnings.  But it went too far: it also disabled the runtime warnings.
> >> >
> >> > Now that the pretty much useless gcc warnings have been properly
> >> > disposed of with the previous patch, re-enable this checking on modern
> >> > versions of gcc so we can get the runtime warnings again.
> >>
> >> As far as I know, this will still be broken since it's
> >> __builtin_object_size() that is buggy. Maybe I'm misunderstanding
> >> which piece is busted, though?
> >
> > What specifically is buggy with __builtin_object_size()?  Looking at the
> > generated code for a few of the "provably correct" warning sites, the
> > values generated by __builtin_object_size() are correct.
> >
> > I think the problem is really related to the compile-time warning
> > function attribute used by __copy_to_user_overflow().  The warning is
> > printed when gcc *can* determine the object size but it *can't*
> > determine the copy size.  The warning just means that, even though the
> > object has a const size, gcc isn't able to prove that the overflow won't
> > happen.
> >
> > As an example, here's one of the warnings:
> >
> >   In file included from /home/jpoimboe/git/linux/include/linux/uaccess.h:5:0,
> >                    from /home/jpoimboe/git/linux/arch/x86/include/asm/stacktrace.h:9,
> >                    from /home/jpoimboe/git/linux/arch/x86/include/asm/perf_event.h:246,
> >                    from /home/jpoimboe/git/linux/include/linux/perf_event.h:24,
> >                    from /home/jpoimboe/git/linux/kernel/sys.c:16:
> >   In function ‘copy_to_user.part.10’,
> >       inlined from ‘copy_to_user’,
> >       inlined from ‘override_release.part.11’ at /home/jpoimboe/git/linux/kernel/sys.c:1136:9:
> >   /home/jpoimboe/git/linux/arch/x86/include/asm/uaccess.h:723:46: warning: call to ‘__copy_to_user_overflow’ declared with attribute warning: copy_to_user() buffer size is not provably correct
> >    #define __copy_to_user_overflow(size, count) __copy_to_user_overflow()
> >                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~
> >   /home/jpoimboe/git/linux/arch/x86/include/asm/uaccess.h:791:3: note: in expansion of macro ‘__copy_to_user_overflow’
> >      __copy_to_user_overflow(sz, n);
> >      ^~~~~~~~~~~~~~~~~~~~~~~
> >
> > This is from override_release()'s use of copy_to_user().  The object
> > code shows that __builtin_object_size() correctly reports 65 bytes for
> > the 'buf' object size.  But the copy size ('copy + 1') isn't known at
> > compile-time.  Thus the (bogus) warning.
> >
> > Maybe I'm missing something but I don't even see a gcc bug.  To me it
> > looks like a mismatch in expectations between the code and the compiler.
> 
> Ah, yes, I had a total brain failure. This is what I get for trying
> to do email between sessions at a conference. :)
> 
> Okay, right. __builtin_object_size() is totally fine, I absolutely
> misspoke: it's the resolution of const value ranges. I wouldn't expect
> gcc to warn here, though, since "copy + 1" isn't a const value...

Look at the code again :-)

__copy_to_user_overflow(), which does the "provably correct" warning, is
"called" when the copy size is non-const (and the object size is const).
So "copy + 1" being non-const is consistent with the warning.

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-26  3:27               ` Josh Poimboeuf
@ 2016-08-26 13:42                 ` Kees Cook
  2016-08-26 13:55                   ` Josh Poimboeuf
  0 siblings, 1 reply; 107+ messages in thread
From: Kees Cook @ 2016-08-26 13:42 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Thu, Aug 25, 2016 at 11:27 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Aug 25, 2016 at 10:14:36PM -0400, Kees Cook wrote:
>> Okay, right. __builtin_object_size() is totally fine, I absolutely
>> misspoke: it's the resolution of const value ranges. I wouldn't expect
>> gcc to warn here, though, since "copy + 1" isn't a const value...
>
> Look at the code again :-)
>
> __copy_to_user_overflow(), which does the "provably correct" warning, is
> "called" when the copy size is non-const (and the object size is const).
> So "copy + 1" being non-const is consistent with the warning.

Right, yes. Man, this is hard to read. All the names are the same. ;)

So this will trigger when the object size is known but the copy length
is non-const?

When I played with re-enabling this in the past, I didn't hit very
many false positives. I sent a bunch of patches a few months back for
legitimate problems that this warning pointed out, so I'm a bit
cautious to just entirely drop it.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-26 13:42                 ` Kees Cook
@ 2016-08-26 13:55                   ` Josh Poimboeuf
  2016-08-26 20:56                     ` Josh Poimboeuf
  0 siblings, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-26 13:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 26, 2016 at 09:42:42AM -0400, Kees Cook wrote:
> On Thu, Aug 25, 2016 at 11:27 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Aug 25, 2016 at 10:14:36PM -0400, Kees Cook wrote:
> >> Okay, right. __builtin_object_size() is totally fine, I absolutely
> >> misspoke: it's the resolution of const value ranges. I wouldn't expect
> >> gcc to warn here, though, since "copy + 1" isn't a const value...
> >
> > Look at the code again :-)
> >
> > __copy_to_user_overflow(), which does the "provably correct" warning, is
> > "called" when the copy size is non-const (and the object size is const).
> > So "copy + 1" being non-const is consistent with the warning.
> 
> Right, yes. Man, this is hard to read. All the names are the same. ;)

Yeah, agreed.  The code is way too cryptic.

> So this will trigger when the object size is known but the copy length
> is non-const?

Right.

> When I played with re-enabling this in the past, I didn't hit very
> many false positives. I sent a bunch of patches a few months back for
> legitimate problems that this warning pointed out, so I'm a bit
> cautious to just entirely drop it.

Ah, I didn't realize that.  We should definitely keep
DEBUG_STRICT_USER_COPY_CHECKS then.  Though it would be *really* nice to
find a way to associate some kind of whitelist with it to separate the
wheat from all the chaff.

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-26 13:55                   ` Josh Poimboeuf
@ 2016-08-26 20:56                     ` Josh Poimboeuf
  2016-08-26 21:00                       ` Josh Poimboeuf
  2016-08-27  0:37                       ` Linus Torvalds
  0 siblings, 2 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-26 20:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 26, 2016 at 08:55:33AM -0500, Josh Poimboeuf wrote:
> On Fri, Aug 26, 2016 at 09:42:42AM -0400, Kees Cook wrote:
> > On Thu, Aug 25, 2016 at 11:27 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > On Thu, Aug 25, 2016 at 10:14:36PM -0400, Kees Cook wrote:
> > >> Okay, right. __builtin_object_size() is totally fine, I absolutely
> > >> misspoke: it's the resolution of const value ranges. I wouldn't expect
> > >> gcc to warn here, though, since "copy + 1" isn't a const value...
> > >
> > > Look at the code again :-)
> > >
> > > __copy_to_user_overflow(), which does the "provably correct" warning, is
> > > "called" when the copy size is non-const (and the object size is const).
> > > So "copy + 1" being non-const is consistent with the warning.
> > 
> > Right, yes. Man, this is hard to read. All the names are the same. ;)
> 
> Yeah, agreed.  The code is way too cryptic.
> 
> > So this will trigger when the object size is known but the copy length
> > is non-const?
> 
> Right.
> 
> > When I played with re-enabling this in the past, I didn't hit very
> > many false positives. I sent a bunch of patches a few months back for
> > legitimate problems that this warning pointed out, so I'm a bit
> > cautious to just entirely drop it.
> 
> Ah, I didn't realize that.  We should definitely keep
> DEBUG_STRICT_USER_COPY_CHECKS then.  Though it would be *really* nice to
> find a way to associate some kind of whitelist with it to separate the
> wheat from all the chaff.

Ok, so I could drop patch 1/2 and then resubmit 2/2 with an updated
patch header.

There's one problem with that though.  It's going to annoy a lot of
people who do allyesconfig/allmodconfig builds because
DEBUG_STRICT_USER_COPY_CHECKS adds several fake warnings.

Anybody know if there's a way to disable an option for
allyesconfig/allmodconfig?

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-26 20:56                     ` Josh Poimboeuf
@ 2016-08-26 21:00                       ` Josh Poimboeuf
  2016-08-27  0:37                       ` Linus Torvalds
  1 sibling, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-26 21:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 26, 2016 at 03:56:27PM -0500, Josh Poimboeuf wrote:
> On Fri, Aug 26, 2016 at 08:55:33AM -0500, Josh Poimboeuf wrote:
> > On Fri, Aug 26, 2016 at 09:42:42AM -0400, Kees Cook wrote:
> > > On Thu, Aug 25, 2016 at 11:27 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > > On Thu, Aug 25, 2016 at 10:14:36PM -0400, Kees Cook wrote:
> > > >> Okay, right. __builtin_object_size() is totally fine, I absolutely
> > > >> misspoke: it's the resolution of const value ranges. I wouldn't expect
> > > >> gcc to warn here, though, since "copy + 1" isn't a const value...
> > > >
> > > > Look at the code again :-)
> > > >
> > > > __copy_to_user_overflow(), which does the "provably correct" warning, is
> > > > "called" when the copy size is non-const (and the object size is const).
> > > > So "copy + 1" being non-const is consistent with the warning.
> > > 
> > > Right, yes. Man, this is hard to read. All the names are the same. ;)
> > 
> > Yeah, agreed.  The code is way too cryptic.
> > 
> > > So this will trigger when the object size is known but the copy length
> > > is non-const?
> > 
> > Right.
> > 
> > > When I played with re-enabling this in the past, I didn't hit very
> > > many false positives. I sent a bunch of patches a few months back for
> > > legitimate problems that this warning pointed out, so I'm a bit
> > > cautious to just entirely drop it.
> > 
> > Ah, I didn't realize that.  We should definitely keep
> > DEBUG_STRICT_USER_COPY_CHECKS then.  Though it would be *really* nice to
> > find a way to associate some kind of whitelist with it to separate the
> > wheat from all the chaff.
> 
> Ok, so I could drop patch 1/2 and then resubmit 2/2 with an updated
> patch header.
> 
> There's one problem with that though.  It's going to annoy a lot of
> people who do allyesconfig/allmodconfig builds because
> DEBUG_STRICT_USER_COPY_CHECKS adds several fake warnings.
> 
> Anybody know if there's a way to disable an option for
> allyesconfig/allmodconfig?

Hm, I guess that wouldn't be good enough anyway because the build bot
randconfig builds wouldn't be happy with the warnings either.  Not sure
how to keep the feature around without littering the landscape with
false positives...

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-26 20:56                     ` Josh Poimboeuf
  2016-08-26 21:00                       ` Josh Poimboeuf
@ 2016-08-27  0:37                       ` Linus Torvalds
  2016-08-29 14:48                         ` Josh Poimboeuf
  1 sibling, 1 reply; 107+ messages in thread
From: Linus Torvalds @ 2016-08-27  0:37 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

[-- Attachment #1: Type: text/plain, Size: 2408 bytes --]

On Fri, Aug 26, 2016 at 1:56 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> There's one problem with that though.  It's going to annoy a lot of
> people who do allyesconfig/allmodconfig builds because
> DEBUG_STRICT_USER_COPY_CHECKS adds several fake warnings.

How bad is it?

In particular, we've definitely had issues with the "warning"
attribute before. Because as you pointed out somewhere elsewhere, the
warrning can happen before the call is actually optimized away by a
later compiler phase.

In particular, we _have_ occasionally fixed this by turning it into a
link-time error instead (that's in fact the really traditional model).
That makes the error happen much later, and the error message isn't
nearly as nice (you get something like "undefined reference to unknown
symbol '__copy_to_user_failed' in function xyz" without line numbers
etc nice things). But it cuts down on the false positives that come
from warnings triggering before the final code has actually been
generated.

So one option *might* be to make the copy_to_user checks do an
explicitly constant and static check like


     if (__builtin_constant_p(n) && sz >= 0 && n > sz)
          __copy_to_user_failed();

with the "__copy_to_user_failed()" function declared but never
defined. That way, at link time, if something still references it, you
get a link error and you'll know it's bad.

So something like the attached patch *might* work. As mentioned, it
makes the error messages much less legible if they happen, and it
delays them to link time, so it's not perfect. But it certainly has
the potential of avoiding bogus warnings.

It *seemed* to work in my quick allmodconfig build test, but that may
be because I screwed something up. So take that with a large pinch of
salt.

What do people think? The static built-time errors - if they happen -
really should be pretty exceptional and unusual. So maybe it's ok that
they then would be somewhat cryptic, and you'd have to maybe hunt
(possibly through several layers of inline functions) where the actual
offending user copy then ends up being..

So I'm not happy with this patch, but I also think that the false
positives make the *current* code simply unworkable with current gcc
versions.

Of course, somebody might be able to come up with a better approach
that still gets the nice error messages and avoids the false
positives.

                          Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 4891 bytes --]

 arch/x86/include/asm/uaccess.h | 99 ++++++++++++------------------------------
 include/linux/compiler-gcc.h   |  2 +-
 2 files changed, 28 insertions(+), 73 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a0ae610b9280..f1fe3740b46b 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -697,105 +697,60 @@ unsigned long __must_check _copy_from_user(void *to, const void __user *from,
 unsigned long __must_check _copy_to_user(void __user *to, const void *from,
 					 unsigned n);
 
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-# define copy_user_diag __compiletime_error
-#else
-# define copy_user_diag __compiletime_warning
-#endif
-
-extern void copy_user_diag("copy_from_user() buffer size is too small")
-copy_from_user_overflow(void);
-extern void copy_user_diag("copy_to_user() buffer size is too small")
-copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
-
-#undef copy_user_diag
-
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-
-extern void
-__compiletime_warning("copy_from_user() buffer size is not provably correct")
-__copy_from_user_overflow(void) __asm__("copy_from_user_overflow");
-#define __copy_from_user_overflow(size, count) __copy_from_user_overflow()
-
-extern void
-__compiletime_warning("copy_to_user() buffer size is not provably correct")
-__copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
-#define __copy_to_user_overflow(size, count) __copy_to_user_overflow()
-
-#else
-
 static inline void
-__copy_from_user_overflow(int size, unsigned long count)
+copy_user_overflow(int size, unsigned long count)
 {
 	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
 }
 
-#define __copy_to_user_overflow __copy_from_user_overflow
-
-#endif
+extern unsigned long copy_user_bad(void);
+extern unsigned long copy_user_bad(void);
 
 static inline unsigned long __must_check
 copy_from_user(void *to, const void __user *from, unsigned long n)
 {
 	int sz = __compiletime_object_size(to);
 
+	if (sz >= 0 && n > sz) {
+		if (__builtin_constant_p(n))
+			return copy_user_bad();
+#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
+		copy_user_overflow(sz, n);
+		memset(to, 0, n);
+		return n;
+#endif
+	}
+
 	might_fault();
 
 	kasan_check_write(to, n);
 
-	/*
-	 * While we would like to have the compiler do the checking for us
-	 * even in the non-constant size case, any false positives there are
-	 * a problem (especially when DEBUG_STRICT_USER_COPY_CHECKS, but even
-	 * without - the [hopefully] dangerous looking nature of the warning
-	 * would make people go look at the respecitive call sites over and
-	 * over again just to find that there's no problem).
-	 *
-	 * And there are cases where it's just not realistic for the compiler
-	 * to prove the count to be in range. For example when multiple call
-	 * sites of a helper function - perhaps in different source files -
-	 * all doing proper range checking, yet the helper function not doing
-	 * so again.
-	 *
-	 * Therefore limit the compile time checking to the constant size
-	 * case, and do only runtime checking for non-constant sizes.
-	 */
-
-	if (likely(sz < 0 || sz >= n)) {
-		check_object_size(to, n, false);
-		n = _copy_from_user(to, from, n);
-	} else if (__builtin_constant_p(n))
-		copy_from_user_overflow();
-	else
-		__copy_from_user_overflow(sz, n);
-
-	return n;
+	check_object_size(to, n, false);
+	return copy_from_user(to, from, n);
 }
 
 static inline unsigned long __must_check
 copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-	int sz = __compiletime_object_size(from);
+	int sz = __compiletime_object_size(to);
+
+	if (sz >= 0 && n > sz) {
+		if (__builtin_constant_p(n))
+			return copy_user_bad();
+#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
+		copy_user_overflow(sz, n);
+		return n;
+#endif
+	}
 
 	kasan_check_read(from, n);
 
 	might_fault();
 
-	/* See the comment in copy_from_user() above. */
-	if (likely(sz < 0 || sz >= n)) {
-		check_object_size(from, n, true);
-		n = _copy_to_user(to, from, n);
-	} else if (__builtin_constant_p(n))
-		copy_to_user_overflow();
-	else
-		__copy_to_user_overflow(sz, n);
-
-	return n;
+	check_object_size(from, n, true);
+	return _copy_to_user(to, from, n);
 }
 
-#undef __copy_from_user_overflow
-#undef __copy_to_user_overflow
-
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
  * nested NMI paths are careful to preserve CR2.
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index e2949397c19b..e7f7a689ef09 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -158,7 +158,7 @@
 #define __compiler_offsetof(a, b)					\
 	__builtin_offsetof(a, b)
 
-#if GCC_VERSION >= 40100 && GCC_VERSION < 40600
+#if GCC_VERSION >= 40100
 # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
 #endif
 

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-27  0:37                       ` Linus Torvalds
@ 2016-08-29 14:48                         ` Josh Poimboeuf
  2016-08-29 15:36                           ` Linus Torvalds
  2016-08-30 18:33                           ` [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc Kees Cook
  0 siblings, 2 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-29 14:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Fri, Aug 26, 2016 at 05:37:20PM -0700, Linus Torvalds wrote:
> On Fri, Aug 26, 2016 at 1:56 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > There's one problem with that though.  It's going to annoy a lot of
> > people who do allyesconfig/allmodconfig builds because
> > DEBUG_STRICT_USER_COPY_CHECKS adds several fake warnings.
> 
> How bad is it?
> 
> In particular, we've definitely had issues with the "warning"
> attribute before. Because as you pointed out somewhere elsewhere, the
> warrning can happen before the call is actually optimized away by a
> later compiler phase.

So I *think* your patch fixes the wrong problem.  That's probably at
least somewhat my fault because I misunderstood the issue before and may
have described it wrong at some point.

AFAICT, gcc isn't doing anything wrong, and the false positives are
"intentional".

There are in fact two static warnings (which are being silenced for new
versions of gcc):

1) "copy_from_user() buffer size is too small"

   This happens when object size and copy size are both const, and copy
   size > object size.  I didn't see any false positives for this one.
   So the function warning attribute seems to be working fine here.
   Your patch "fixed" this warning, but it didn't need fixing.

   Note this scenario is always a bug and so I think it should be
   changed to *always* be an error, regardless of
   DEBUG_STRICT_USER_COPY_CHECKS.

2) "copy_from_user() buffer size is not provably correct"

   This is the (cryptic) false positive warning which happens when I
   enable __compiletime_object_size() for new compilers (and
   DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size is
   const, but copy size is *not*.  In this case there's no way to
   compare the two at build time, so it gives the warning.  (Note the
   warning is a byproduct of the fact that gcc has no way of knowing
   whether the overflow function will be called, so the call isn't dead
   code and the warning attribute is activated.)

   So this warning seems to only indicate "this is an unusual pattern,
   maybe you should check it out" rather than "this is a bug".  It seems
   to be working "as designed": it has nothing to do with gcc compiler
   phases AFAICT.

   (Which begs the question: why didn't these warnings appear with older
   versions of gcc?  I have no idea...)

   I get 102(!) of these warnings with allyesconfig and the
   __compiletime_object_size() gcc check removed.  I don't know if there
   are any real bugs hiding in there, but from looking at a small
   sample, I didn't see any.

So warning 2 seems to be intentional for some reason.  I suggested
removing it, while keeping the corresponding runtime check.  But
according to Kees it sometimes finds real bugs.

(Kees, can you confirm that at least some of the recent bugs you found
were from warning 2?)

Anyway I don't currently see any doable option other than just removing
warning 2 (yet still keeping the corresponding copy_user_overflow()
runtime check).
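
To make the two warning cases concrete, a hypothetical example ('buf',
'uptr' and 'len' are made-up names; return values ignored for brevity):

	static void example(void __user *uptr, size_t len)
	{
		char buf[65];		/* object size is const: 65 */

		/* Warning 1: both sizes const and 128 > 65 -- always a bug. */
		copy_from_user(buf, uptr, 128);

		/*
		 * Warning 2: copy size is non-const, so gcc can't compare it
		 * with 65 at build time.  It warns even if the caller
		 * properly range-checked 'len'.
		 */
		copy_from_user(buf, uptr, len);
	}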

-- 
Josh

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-29 14:48                         ` Josh Poimboeuf
@ 2016-08-29 15:36                           ` Linus Torvalds
  2016-08-29 17:08                             ` [PATCH v2] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS Josh Poimboeuf
  2016-08-30 18:33                           ` [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc Kees Cook
  1 sibling, 1 reply; 107+ messages in thread
From: Linus Torvalds @ 2016-08-29 15:36 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 29, 2016 at 7:48 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> So I *think* your patch fixes the wrong problem.  That's probably at
> least somewhat my fault because I misunderstood the issue before and may
> have described it wrong at some point.
>
> AFAICT, gcc isn't doing anything wrong, and the false positives are
> "intentional".
>
> There are in fact two static warnings (which are being silenced for new
> versions of gcc):

[ snip snip details ]

Ok.

Color me convinced, I never even looked at the two different cases, I
thought it was just one issue.

Let's just remove the spurious false positive warning then, in order
to re-instate the *actual* warning that right now is disabled entirely
due to the unrelated false positives.

Thanks for looking into this. Would you happen to also have a patch
that can be applied? Hint hint..

               Linus

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v2] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-29 15:36                           ` Linus Torvalds
@ 2016-08-29 17:08                             ` Josh Poimboeuf
  2016-08-29 17:59                               ` Josh Poimboeuf
  2016-08-30 13:04                               ` [PATCH v3] " Josh Poimboeuf
  0 siblings, 2 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-29 17:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 29, 2016 at 08:36:46AM -0700, Linus Torvalds wrote:
> On Mon, Aug 29, 2016 at 7:48 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > So I *think* your patch fixes the wrong problem.  That's probably at
> > least somewhat my fault because I misunderstood the issue before and may
> > have described it wrong at some point.
> >
> > AFAICT, gcc isn't doing anything wrong, and the false positives are
> > "intentional".
> >
> > There are in fact two static warnings (which are being silenced for new
> > versions of gcc):
> 
> [ snip snip details ]
> 
> Ok.
> 
> Color me convinced, I never even looked at the two different cases, I
> thought it was just one issue.
> 
> Let's just remove the spurious false positive warning then, in order
> to re-instate the *actual* warning that right now is disabled entirely
> due to the unrelated false positives.
> 
> Thanks for looking into this. Would you happen to also have a patch
> that can be applied? Hint hint..

How about something like this?  I can split it up if needed...

---

From: Josh Poimboeuf <jpoimboe@redhat.com>
Subject: [PATCH v2] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS

There are three usercopy warnings which are currently being silenced for
gcc 4.6 and newer:

1) "copy_from_user() buffer size is too small" compile warning/error

   This is a static warning which happens when object size and copy size
   are both const, and copy size > object size.  I didn't see any false
   positives for this one.  So the function warning attribute seems to
   be working fine here.

   Note this scenario is always a bug and so I think it should be
   changed to *always* be an error, regardless of
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.

2) "copy_from_user() buffer size is not provably correct" compile warning

   This is another static warning which happens when I enable
   __compiletime_object_size() for new compilers (and
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
   is const, but copy size is *not*.  In this case there's no way to
   compare the two at build time, so it gives the warning.  (Note the
   warning is a byproduct of the fact that gcc has no way of knowing
   whether the overflow function will be called, so the call isn't dead
   code and the warning attribute is activated.)

   So this warning seems to only indicate "this is an unusual pattern,
   maybe you should check it out" rather than "this is a bug".

   I get 102(!) of these warnings with allyesconfig and the
   __compiletime_object_size() gcc check removed.  I don't know if there
   are any real bugs hiding in there, but from looking at a small
   sample, I didn't see any.  According to Kees, it does sometimes find
   real bugs.  But the false positive rate seems high.

3) "Buffer overflow detected" runtime warning

   This is a runtime warning where object size is const, and copy size >
   object size.

All three warnings (both static and runtime) were completely disabled
for gcc 4.6 with the following commit:

  2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")

That commit mistakenly assumed that the false positives were caused by a
gcc bug in __compiletime_object_size().  But in fact,
__compiletime_object_size() seems to be working fine.  The false
positives were instead triggered by #2 above.  (Though I don't have an
explanation for why the warnings supposedly only started showing up in
gcc 4.6.)

So remove warning #2 to get rid of all the false positives, and re-enable
warnings #1 and #3 by reverting the above commit.

Furthermore, since #1 is a real bug which is detected at compile time,
upgrade it to always be an error.

Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
needed.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/parisc/Kconfig                         |  1 -
 arch/parisc/configs/c8000_defconfig         |  1 -
 arch/parisc/configs/generic-64bit_defconfig |  1 -
 arch/parisc/include/asm/uaccess.h           | 22 ++++-----
 arch/s390/Kconfig                           |  1 -
 arch/s390/configs/default_defconfig         |  1 -
 arch/s390/configs/gcov_defconfig            |  1 -
 arch/s390/configs/performance_defconfig     |  1 -
 arch/s390/defconfig                         |  1 -
 arch/s390/include/asm/uaccess.h             | 19 +++++---
 arch/tile/Kconfig                           |  1 -
 arch/tile/include/asm/uaccess.h             | 19 ++++----
 arch/x86/Kconfig                            |  1 -
 arch/x86/include/asm/uaccess.h              | 69 ++++-------------------------
 include/asm-generic/uaccess.h               |  1 +
 include/linux/compiler-gcc.h                |  2 +-
 lib/Kconfig.debug                           | 18 --------
 lib/Makefile                                |  1 -
 lib/usercopy.c                              |  9 ----
 19 files changed, 45 insertions(+), 125 deletions(-)
 delete mode 100644 lib/usercopy.c

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index cd87781..af12c2d 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -1,6 +1,5 @@
 config PARISC
 	def_bool y
-	select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select HAVE_IDE
 	select HAVE_OPROFILE
diff --git a/arch/parisc/configs/c8000_defconfig b/arch/parisc/configs/c8000_defconfig
index 1a8f6f95..f6a4c01 100644
--- a/arch/parisc/configs/c8000_defconfig
+++ b/arch/parisc/configs/c8000_defconfig
@@ -245,7 +245,6 @@ CONFIG_DEBUG_RT_MUTEXES=y
 CONFIG_PROVE_RCU_DELAY=y
 CONFIG_DEBUG_BLOCK_EXT_DEVT=y
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_KEYS=y
 # CONFIG_CRYPTO_HW is not set
 CONFIG_FONTS=y
diff --git a/arch/parisc/configs/generic-64bit_defconfig b/arch/parisc/configs/generic-64bit_defconfig
index 7e07926..c564e6e 100644
--- a/arch/parisc/configs/generic-64bit_defconfig
+++ b/arch/parisc/configs/generic-64bit_defconfig
@@ -291,7 +291,6 @@ CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
 CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
 # CONFIG_SCHED_DEBUG is not set
 CONFIG_TIMER_STATS=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_CRYPTO_MANAGER=y
 CONFIG_CRYPTO_ECB=m
 CONFIG_CRYPTO_PCBC=m
diff --git a/arch/parisc/include/asm/uaccess.h b/arch/parisc/include/asm/uaccess.h
index 0f59fd9..736c0c1 100644
--- a/arch/parisc/include/asm/uaccess.h
+++ b/arch/parisc/include/asm/uaccess.h
@@ -208,13 +208,13 @@ unsigned long copy_in_user(void __user *dst, const void __user *src, unsigned lo
 #define __copy_to_user_inatomic __copy_to_user
 #define __copy_from_user_inatomic __copy_from_user
 
-extern void copy_from_user_overflow(void)
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-        __compiletime_error("copy_from_user() buffer size is not provably correct")
-#else
-        __compiletime_warning("copy_from_user() buffer size is not provably correct")
-#endif
-;
+extern void __compiletime_error("usercopy buffer size is too small")
+__bad_copy_user(void);
+
+static inline void copy_user_overflow(int size, unsigned long count)
+{
+	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
+}
 
 static inline unsigned long __must_check copy_from_user(void *to,
                                           const void __user *from,
@@ -223,10 +223,12 @@ static inline unsigned long __must_check copy_from_user(void *to,
         int sz = __compiletime_object_size(to);
         int ret = -EFAULT;
 
-        if (likely(sz == -1 || !__builtin_constant_p(n) || sz >= n))
+        if (likely(sz == -1 || sz >= n))
                 ret = __copy_from_user(to, from, n);
-        else
-                copy_from_user_overflow();
+        else (!__builtin_constant_p(n))
+		copy_user_overflow(sz, n);
+	else
+                __bad_copy_user();
 
         return ret;
 }
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index e751fe2..c109f07 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -68,7 +68,6 @@ config DEBUG_RODATA
 config S390
 	def_bool y
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
-	select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/s390/configs/default_defconfig b/arch/s390/configs/default_defconfig
index 26e0c7f..412b1bd 100644
--- a/arch/s390/configs/default_defconfig
+++ b/arch/s390/configs/default_defconfig
@@ -602,7 +602,6 @@ CONFIG_FAIL_FUTEX=y
 CONFIG_FAULT_INJECTION_DEBUG_FS=y
 CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_IRQSOFF_TRACER=y
 CONFIG_PREEMPT_TRACER=y
 CONFIG_SCHED_TRACER=y
diff --git a/arch/s390/configs/gcov_defconfig b/arch/s390/configs/gcov_defconfig
index 24879da..bec279e 100644
--- a/arch/s390/configs/gcov_defconfig
+++ b/arch/s390/configs/gcov_defconfig
@@ -552,7 +552,6 @@ CONFIG_NOTIFIER_ERROR_INJECTION=m
 CONFIG_CPU_NOTIFIER_ERROR_INJECT=m
 CONFIG_PM_NOTIFIER_ERROR_INJECT=m
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_BLK_DEV_IO_TRACE=y
 # CONFIG_KPROBE_EVENT is not set
 CONFIG_TRACE_ENUM_MAP_FILE=y
diff --git a/arch/s390/configs/performance_defconfig b/arch/s390/configs/performance_defconfig
index a5c1e5f..1751446 100644
--- a/arch/s390/configs/performance_defconfig
+++ b/arch/s390/configs/performance_defconfig
@@ -549,7 +549,6 @@ CONFIG_TIMER_STATS=y
 CONFIG_RCU_TORTURE_TEST=m
 CONFIG_RCU_CPU_STALL_TIMEOUT=60
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_SCHED_TRACER=y
 CONFIG_FTRACE_SYSCALLS=y
 CONFIG_STACK_TRACER=y
diff --git a/arch/s390/defconfig b/arch/s390/defconfig
index 73610f2..2d40ef0 100644
--- a/arch/s390/defconfig
+++ b/arch/s390/defconfig
@@ -172,7 +172,6 @@ CONFIG_DEBUG_NOTIFIERS=y
 CONFIG_RCU_CPU_STALL_TIMEOUT=60
 CONFIG_RCU_TRACE=y
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_SCHED_TRACER=y
 CONFIG_FTRACE_SYSCALLS=y
 CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index 9b49cf1..95aefdb 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -311,6 +311,14 @@ int __get_user_bad(void) __attribute__((noreturn));
 #define __put_user_unaligned __put_user
 #define __get_user_unaligned __get_user
 
+extern void __compiletime_error("usercopy buffer size is too small")
+__bad_copy_user(void);
+
+static inline void copy_user_overflow(int size, unsigned long count)
+{
+	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
+}
+
 /**
  * copy_to_user: - Copy a block of data into user space.
  * @to:   Destination address, in user space.
@@ -332,12 +340,6 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 	return __copy_to_user(to, from, n);
 }
 
-void copy_from_user_overflow(void)
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-__compiletime_warning("copy_from_user() buffer size is not provably correct")
-#endif
-;
-
 /**
  * copy_from_user: - Copy a block of data from user space.
  * @to:   Destination address, in kernel space.
@@ -362,7 +364,10 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
 
 	might_fault();
 	if (unlikely(sz != -1 && sz < n)) {
-		copy_from_user_overflow();
+		if (!__builtin_constant_p(n))
+			copy_user_overflow(sz, n);
+		else
+			__bad_copy_user();
 		return n;
 	}
 	return __copy_from_user(to, from, n);
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 4820a02..78da75b 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -4,7 +4,6 @@
 config TILE
 	def_bool y
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
-	select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_WANT_FRAME_POINTERS
diff --git a/arch/tile/include/asm/uaccess.h b/arch/tile/include/asm/uaccess.h
index 0a9c4265..c664300 100644
--- a/arch/tile/include/asm/uaccess.h
+++ b/arch/tile/include/asm/uaccess.h
@@ -416,14 +416,13 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
 	return n;
 }
 
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-/*
- * There are still unprovable places in the generic code as of 2.6.34, so this
- * option is not really compatible with -Werror, which is more useful in
- * general.
- */
-extern void copy_from_user_overflow(void)
-	__compiletime_warning("copy_from_user() size is not provably correct");
+extern void __compiletime_error("usercopy buffer size is too small")
+__bad_copy_user(void);
+
+static inline void copy_user_overflow(int size, unsigned long count)
+{
+	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
+}
 
 static inline unsigned long __must_check copy_from_user(void *to,
 					  const void __user *from,
@@ -433,8 +432,10 @@ static inline unsigned long __must_check copy_from_user(void *to,
 
 	if (likely(sz == -1 || sz >= n))
 		n = _copy_from_user(to, from, n);
+	else if (!__builtin_constant_p(n))
+		copy_user_overflow();
 	else
-		copy_from_user_overflow();
+		__bad_copy_user();
 
 	return n;
 }
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c580d8c..2a1f0ce 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -24,7 +24,6 @@ config X86
 	select ARCH_DISCARD_MEMBLOCK
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
-	select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a0ae610..c3f2911 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -697,43 +697,14 @@ unsigned long __must_check _copy_from_user(void *to, const void __user *from,
 unsigned long __must_check _copy_to_user(void __user *to, const void *from,
 					 unsigned n);
 
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-# define copy_user_diag __compiletime_error
-#else
-# define copy_user_diag __compiletime_warning
-#endif
-
-extern void copy_user_diag("copy_from_user() buffer size is too small")
-copy_from_user_overflow(void);
-extern void copy_user_diag("copy_to_user() buffer size is too small")
-copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
-
-#undef copy_user_diag
-
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-
-extern void
-__compiletime_warning("copy_from_user() buffer size is not provably correct")
-__copy_from_user_overflow(void) __asm__("copy_from_user_overflow");
-#define __copy_from_user_overflow(size, count) __copy_from_user_overflow()
-
-extern void
-__compiletime_warning("copy_to_user() buffer size is not provably correct")
-__copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
-#define __copy_to_user_overflow(size, count) __copy_to_user_overflow()
-
-#else
+extern void __compiletime_error("usercopy buffer size is too small")
+__bad_copy_user(void);
 
-static inline void
-__copy_from_user_overflow(int size, unsigned long count)
+static inline void copy_user_overflow(int size, unsigned long count)
 {
 	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
 }
 
-#define __copy_to_user_overflow __copy_from_user_overflow
-
-#endif
-
 static inline unsigned long __must_check
 copy_from_user(void *to, const void __user *from, unsigned long n)
 {
@@ -743,31 +714,13 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
 
 	kasan_check_write(to, n);
 
-	/*
-	 * While we would like to have the compiler do the checking for us
-	 * even in the non-constant size case, any false positives there are
-	 * a problem (especially when DEBUG_STRICT_USER_COPY_CHECKS, but even
-	 * without - the [hopefully] dangerous looking nature of the warning
-	 * would make people go look at the respecitive call sites over and
-	 * over again just to find that there's no problem).
-	 *
-	 * And there are cases where it's just not realistic for the compiler
-	 * to prove the count to be in range. For example when multiple call
-	 * sites of a helper function - perhaps in different source files -
-	 * all doing proper range checking, yet the helper function not doing
-	 * so again.
-	 *
-	 * Therefore limit the compile time checking to the constant size
-	 * case, and do only runtime checking for non-constant sizes.
-	 */
-
 	if (likely(sz < 0 || sz >= n)) {
 		check_object_size(to, n, false);
 		n = _copy_from_user(to, from, n);
-	} else if (__builtin_constant_p(n))
-		copy_from_user_overflow();
+	} else if (!__builtin_constant_p(n))
+		copy_user_overflow(sz, n);
 	else
-		__copy_from_user_overflow(sz, n);
+		__bad_copy_user();
 
 	return n;
 }
@@ -781,21 +734,17 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 
 	might_fault();
 
-	/* See the comment in copy_from_user() above. */
 	if (likely(sz < 0 || sz >= n)) {
 		check_object_size(from, n, true);
 		n = _copy_to_user(to, from, n);
-	} else if (__builtin_constant_p(n))
-		copy_to_user_overflow();
+	} else if (!__builtin_constant_p(n))
+		copy_user_overflow(sz, n);
 	else
-		__copy_to_user_overflow(sz, n);
+		__bad_copy_user();
 
 	return n;
 }
 
-#undef __copy_from_user_overflow
-#undef __copy_to_user_overflow
-
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
  * nested NMI paths are careful to preserve CR2.
diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
index 1bfa602..5dea1fb 100644
--- a/include/asm-generic/uaccess.h
+++ b/include/asm-generic/uaccess.h
@@ -72,6 +72,7 @@ struct exception_table_entry
 /* Returns 0 if exception not found and fixup otherwise.  */
 extern unsigned long search_exception_table(unsigned long);
 
+
 /*
  * architectures with an MMU should override these two
  */
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 8dbc892..573c5a1 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -158,7 +158,7 @@
 #define __compiler_offsetof(a, b)					\
 	__builtin_offsetof(a, b)
 
-#if GCC_VERSION >= 40100 && GCC_VERSION < 40600
+#if GCC_VERSION >= 40100
 # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
 #endif
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2307d7c..2e2cca5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1686,24 +1686,6 @@ config LATENCYTOP
 	  Enable this option if you want to use the LatencyTOP tool
 	  to find out which userspace is blocking on what kernel operations.
 
-config ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
-	bool
-
-config DEBUG_STRICT_USER_COPY_CHECKS
-	bool "Strict user copy size checks"
-	depends on ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
-	depends on DEBUG_KERNEL && !TRACE_BRANCH_PROFILING
-	help
-	  Enabling this option turns a certain set of sanity checks for user
-	  copy operations into compile time failures.
-
-	  The copy_from_user() etc checks are there to help test if there
-	  are sufficient security checks on the length argument of
-	  the copy operation, by having gcc prove that the argument is
-	  within bounds.
-
-	  If unsure, say N.
-
 source kernel/trace/Kconfig
 
 menu "Runtime Testing"
diff --git a/lib/Makefile b/lib/Makefile
index cfa68eb..5dc77a8 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -24,7 +24,6 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o
 
-obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
 lib-$(CONFIG_HAS_DMA) += dma-noop.o
diff --git a/lib/usercopy.c b/lib/usercopy.c
deleted file mode 100644
index 4f5b1dd..0000000
--- a/lib/usercopy.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#include <linux/export.h>
-#include <linux/bug.h>
-#include <linux/uaccess.h>
-
-void copy_from_user_overflow(void)
-{
-	WARN(1, "Buffer overflow detected!\n");
-}
-EXPORT_SYMBOL(copy_from_user_overflow);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [PATCH v2] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-29 17:08                             ` [PATCH v2] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS Josh Poimboeuf
@ 2016-08-29 17:59                               ` Josh Poimboeuf
  2016-08-30 13:04                               ` [PATCH v3] " Josh Poimboeuf
  1 sibling, 0 replies; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-29 17:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 29, 2016 at 12:08:13PM -0500, Josh Poimboeuf wrote:
> On Mon, Aug 29, 2016 at 08:36:46AM -0700, Linus Torvalds wrote:
> > On Mon, Aug 29, 2016 at 7:48 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >
> > > So I *think* your patch fixes the wrong problem.  That's probably at
> > > least somewhat my fault because I misunderstood the issue before and may
> > > have described it wrong at some point.
> > >
> > > AFAICT, gcc isn't doing anything wrong, and the false positives are
> > > "intentional".
> > >
> > > There are in fact two static warnings (which are being silenced for new
> > > versions of gcc):
> > 
> > [ snip snip details ]
> > 
> > Ok.
> > 
> > Color me convinced, I never even looked at the two different cases, I
> > thought it was just one issue.
> > 
> > Let's just remove the spurious false positive warning then, in order
> > to re-instate the *actual* warning that right now is disabled entirely
> > due to the unrelated false positives.
> > 
> > Thanks for looking into this. Would you happen to also have a patch
> > that can be applied? Hint hint..
> 
> How about something like this?  I can split it up if needed...
> 
> ---
> 
> From: Josh Poimboeuf <jpoimboe@redhat.com>
> Subject: [PATCH v2] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS

...and that failed to build on tile and parisc.  Here's the fix:

----

diff --git a/arch/parisc/include/asm/uaccess.h b/arch/parisc/include/asm/uaccess.h
index 736c0c1..e915048 100644
--- a/arch/parisc/include/asm/uaccess.h
+++ b/arch/parisc/include/asm/uaccess.h
@@ -225,7 +225,7 @@ static inline unsigned long __must_check copy_from_user(void *to,
 
         if (likely(sz == -1 || sz >= n))
                 ret = __copy_from_user(to, from, n);
-        else (!__builtin_constant_p(n))
+        else if (!__builtin_constant_p(n))
 		copy_user_overflow(sz, n);
 	else
                 __bad_copy_user();
diff --git a/arch/tile/include/asm/uaccess.h b/arch/tile/include/asm/uaccess.h
index c664300..4416f09 100644
--- a/arch/tile/include/asm/uaccess.h
+++ b/arch/tile/include/asm/uaccess.h
@@ -433,7 +433,7 @@ static inline unsigned long __must_check copy_from_user(void *to,
 	if (likely(sz == -1 || sz >= n))
 		n = _copy_from_user(to, from, n);
 	else if (!__builtin_constant_p(n))
-		copy_user_overflow();
+		copy_user_overflow(sz, n);
 	else
 		__bad_copy_user();
 

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-29 17:08                             ` [PATCH v2] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS Josh Poimboeuf
  2016-08-29 17:59                               ` Josh Poimboeuf
@ 2016-08-30 13:04                               ` Josh Poimboeuf
  2016-08-30 17:02                                 ` Linus Torvalds
  1 sibling, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-30 13:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

There are three usercopy warnings which are currently being silenced for
gcc 4.6 and newer:

1) "copy_from_user() buffer size is too small" compile warning/error

   This is a static warning which happens when object size and copy size
   are both const, and copy size > object size.  I didn't see any false
   positives for this one.  So the function warning attribute seems to
   be working fine here.

   Note this scenario is always a bug and so I think it should be
   changed to *always* be an error, regardless of
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.

2) "copy_from_user() buffer size is not provably correct" compile warning

   This is another static warning which happens when I enable
   __compiletime_object_size() for new compilers (and
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
   is const, but copy size is *not*.  In this case there's no way to
   compare the two at build time, so it gives the warning.  (Note the
   warning is a byproduct of the fact that gcc has no way of knowing
   whether the overflow function will be called, so the call isn't dead
   code and the warning attribute is activated.)

   So this warning seems to only indicate "this is an unusual pattern,
   maybe you should check it out" rather than "this is a bug".

   I get 102(!) of these warnings with allyesconfig and the
   __compiletime_object_size() gcc check removed.  I don't know if there
   are any real bugs hiding in there, but from looking at a small
   sample, I didn't see any.  According to Kees, it does sometimes find
   real bugs.  But the false positive rate seems high.

3) "Buffer overflow detected" runtime warning

   This is a runtime warning where object size is const, and copy size >
   object size.

All three warnings (both static and runtime) were completely disabled
for gcc 4.6 with the following commit:

  2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")

That commit mistakenly assumed that the false positives were caused by a
gcc bug in __compiletime_object_size().  But in fact,
__compiletime_object_size() seems to be working fine.  The false
positives were instead triggered by #2 above.  (Though I don't have an
explanation for why the warnings supposedly only started showing up in
gcc 4.6.)

So remove warning #2 to get rid of all the false positives, and re-enable
warnings #1 and #3 by reverting the above commit.

Furthermore, since #1 is a real bug which is detected at compile time,
upgrade it to always be an error.

Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
needed.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
v3: fixed compile errors on parisc and tile

 arch/parisc/Kconfig                         |  1 -
 arch/parisc/configs/c8000_defconfig         |  1 -
 arch/parisc/configs/generic-64bit_defconfig |  1 -
 arch/parisc/include/asm/uaccess.h           | 22 ++++-----
 arch/s390/Kconfig                           |  1 -
 arch/s390/configs/default_defconfig         |  1 -
 arch/s390/configs/gcov_defconfig            |  1 -
 arch/s390/configs/performance_defconfig     |  1 -
 arch/s390/defconfig                         |  1 -
 arch/s390/include/asm/uaccess.h             | 19 +++++---
 arch/tile/Kconfig                           |  1 -
 arch/tile/include/asm/uaccess.h             | 22 +++++----
 arch/x86/Kconfig                            |  1 -
 arch/x86/include/asm/uaccess.h              | 69 ++++-------------------------
 include/asm-generic/uaccess.h               |  1 +
 include/linux/compiler-gcc.h                |  2 +-
 lib/Kconfig.debug                           | 18 --------
 lib/Makefile                                |  1 -
 lib/usercopy.c                              |  9 ----
 19 files changed, 45 insertions(+), 128 deletions(-)
 delete mode 100644 lib/usercopy.c

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index cd87781..af12c2d 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -1,6 +1,5 @@
 config PARISC
 	def_bool y
-	select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select HAVE_IDE
 	select HAVE_OPROFILE
diff --git a/arch/parisc/configs/c8000_defconfig b/arch/parisc/configs/c8000_defconfig
index 1a8f6f95..f6a4c01 100644
--- a/arch/parisc/configs/c8000_defconfig
+++ b/arch/parisc/configs/c8000_defconfig
@@ -245,7 +245,6 @@ CONFIG_DEBUG_RT_MUTEXES=y
 CONFIG_PROVE_RCU_DELAY=y
 CONFIG_DEBUG_BLOCK_EXT_DEVT=y
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_KEYS=y
 # CONFIG_CRYPTO_HW is not set
 CONFIG_FONTS=y
diff --git a/arch/parisc/configs/generic-64bit_defconfig b/arch/parisc/configs/generic-64bit_defconfig
index 7e07926..c564e6e 100644
--- a/arch/parisc/configs/generic-64bit_defconfig
+++ b/arch/parisc/configs/generic-64bit_defconfig
@@ -291,7 +291,6 @@ CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
 CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
 # CONFIG_SCHED_DEBUG is not set
 CONFIG_TIMER_STATS=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_CRYPTO_MANAGER=y
 CONFIG_CRYPTO_ECB=m
 CONFIG_CRYPTO_PCBC=m
diff --git a/arch/parisc/include/asm/uaccess.h b/arch/parisc/include/asm/uaccess.h
index 0f59fd9..e915048 100644
--- a/arch/parisc/include/asm/uaccess.h
+++ b/arch/parisc/include/asm/uaccess.h
@@ -208,13 +208,13 @@ unsigned long copy_in_user(void __user *dst, const void __user *src, unsigned lo
 #define __copy_to_user_inatomic __copy_to_user
 #define __copy_from_user_inatomic __copy_from_user
 
-extern void copy_from_user_overflow(void)
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-        __compiletime_error("copy_from_user() buffer size is not provably correct")
-#else
-        __compiletime_warning("copy_from_user() buffer size is not provably correct")
-#endif
-;
+extern void __compiletime_error("usercopy buffer size is too small")
+__bad_copy_user(void);
+
+static inline void copy_user_overflow(int size, unsigned long count)
+{
+	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
+}
 
 static inline unsigned long __must_check copy_from_user(void *to,
                                           const void __user *from,
@@ -223,10 +223,12 @@ static inline unsigned long __must_check copy_from_user(void *to,
         int sz = __compiletime_object_size(to);
         int ret = -EFAULT;
 
-        if (likely(sz == -1 || !__builtin_constant_p(n) || sz >= n))
+        if (likely(sz == -1 || sz >= n))
                 ret = __copy_from_user(to, from, n);
-        else
-                copy_from_user_overflow();
+        else if (!__builtin_constant_p(n))
+		copy_user_overflow(sz, n);
+	else
+                __bad_copy_user();
 
         return ret;
 }
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index e751fe2..c109f07 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -68,7 +68,6 @@ config DEBUG_RODATA
 config S390
 	def_bool y
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
-	select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/s390/configs/default_defconfig b/arch/s390/configs/default_defconfig
index 26e0c7f..412b1bd 100644
--- a/arch/s390/configs/default_defconfig
+++ b/arch/s390/configs/default_defconfig
@@ -602,7 +602,6 @@ CONFIG_FAIL_FUTEX=y
 CONFIG_FAULT_INJECTION_DEBUG_FS=y
 CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_IRQSOFF_TRACER=y
 CONFIG_PREEMPT_TRACER=y
 CONFIG_SCHED_TRACER=y
diff --git a/arch/s390/configs/gcov_defconfig b/arch/s390/configs/gcov_defconfig
index 24879da..bec279e 100644
--- a/arch/s390/configs/gcov_defconfig
+++ b/arch/s390/configs/gcov_defconfig
@@ -552,7 +552,6 @@ CONFIG_NOTIFIER_ERROR_INJECTION=m
 CONFIG_CPU_NOTIFIER_ERROR_INJECT=m
 CONFIG_PM_NOTIFIER_ERROR_INJECT=m
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_BLK_DEV_IO_TRACE=y
 # CONFIG_KPROBE_EVENT is not set
 CONFIG_TRACE_ENUM_MAP_FILE=y
diff --git a/arch/s390/configs/performance_defconfig b/arch/s390/configs/performance_defconfig
index a5c1e5f..1751446 100644
--- a/arch/s390/configs/performance_defconfig
+++ b/arch/s390/configs/performance_defconfig
@@ -549,7 +549,6 @@ CONFIG_TIMER_STATS=y
 CONFIG_RCU_TORTURE_TEST=m
 CONFIG_RCU_CPU_STALL_TIMEOUT=60
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_SCHED_TRACER=y
 CONFIG_FTRACE_SYSCALLS=y
 CONFIG_STACK_TRACER=y
diff --git a/arch/s390/defconfig b/arch/s390/defconfig
index 73610f2..2d40ef0 100644
--- a/arch/s390/defconfig
+++ b/arch/s390/defconfig
@@ -172,7 +172,6 @@ CONFIG_DEBUG_NOTIFIERS=y
 CONFIG_RCU_CPU_STALL_TIMEOUT=60
 CONFIG_RCU_TRACE=y
 CONFIG_LATENCYTOP=y
-CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
 CONFIG_SCHED_TRACER=y
 CONFIG_FTRACE_SYSCALLS=y
 CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index 9b49cf1..95aefdb 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -311,6 +311,14 @@ int __get_user_bad(void) __attribute__((noreturn));
 #define __put_user_unaligned __put_user
 #define __get_user_unaligned __get_user
 
+extern void __compiletime_error("usercopy buffer size is too small")
+__bad_copy_user(void);
+
+static inline void copy_user_overflow(int size, unsigned long count)
+{
+	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
+}
+
 /**
  * copy_to_user: - Copy a block of data into user space.
  * @to:   Destination address, in user space.
@@ -332,12 +340,6 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 	return __copy_to_user(to, from, n);
 }
 
-void copy_from_user_overflow(void)
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-__compiletime_warning("copy_from_user() buffer size is not provably correct")
-#endif
-;
-
 /**
  * copy_from_user: - Copy a block of data from user space.
  * @to:   Destination address, in kernel space.
@@ -362,7 +364,10 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
 
 	might_fault();
 	if (unlikely(sz != -1 && sz < n)) {
-		copy_from_user_overflow();
+		if (!__builtin_constant_p(n))
+			copy_user_overflow(sz, n);
+		else
+			__bad_copy_user();
 		return n;
 	}
 	return __copy_from_user(to, from, n);
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 4820a02..78da75b 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -4,7 +4,6 @@
 config TILE
 	def_bool y
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
-	select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_WANT_FRAME_POINTERS
diff --git a/arch/tile/include/asm/uaccess.h b/arch/tile/include/asm/uaccess.h
index 0a9c4265..a77369e 100644
--- a/arch/tile/include/asm/uaccess.h
+++ b/arch/tile/include/asm/uaccess.h
@@ -416,14 +416,13 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
 	return n;
 }
 
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-/*
- * There are still unprovable places in the generic code as of 2.6.34, so this
- * option is not really compatible with -Werror, which is more useful in
- * general.
- */
-extern void copy_from_user_overflow(void)
-	__compiletime_warning("copy_from_user() size is not provably correct");
+extern void __compiletime_error("usercopy buffer size is too small")
+__bad_copy_user(void);
+
+static inline void copy_user_overflow(int size, unsigned long count)
+{
+	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
+}
 
 static inline unsigned long __must_check copy_from_user(void *to,
 					  const void __user *from,
@@ -433,14 +432,13 @@ static inline unsigned long __must_check copy_from_user(void *to,
 
 	if (likely(sz == -1 || sz >= n))
 		n = _copy_from_user(to, from, n);
+	else if (!__builtin_constant_p(n))
+		copy_user_overflow(sz, n);
 	else
-		copy_from_user_overflow();
+		__bad_copy_user();
 
 	return n;
 }
-#else
-#define copy_from_user _copy_from_user
-#endif
 
 #ifdef __tilegx__
 /**
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c580d8c..2a1f0ce 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -24,7 +24,6 @@ config X86
 	select ARCH_DISCARD_MEMBLOCK
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
-	select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a0ae610..c3f2911 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -697,43 +697,14 @@ unsigned long __must_check _copy_from_user(void *to, const void __user *from,
 unsigned long __must_check _copy_to_user(void __user *to, const void *from,
 					 unsigned n);
 
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-# define copy_user_diag __compiletime_error
-#else
-# define copy_user_diag __compiletime_warning
-#endif
-
-extern void copy_user_diag("copy_from_user() buffer size is too small")
-copy_from_user_overflow(void);
-extern void copy_user_diag("copy_to_user() buffer size is too small")
-copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
-
-#undef copy_user_diag
-
-#ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
-
-extern void
-__compiletime_warning("copy_from_user() buffer size is not provably correct")
-__copy_from_user_overflow(void) __asm__("copy_from_user_overflow");
-#define __copy_from_user_overflow(size, count) __copy_from_user_overflow()
-
-extern void
-__compiletime_warning("copy_to_user() buffer size is not provably correct")
-__copy_to_user_overflow(void) __asm__("copy_from_user_overflow");
-#define __copy_to_user_overflow(size, count) __copy_to_user_overflow()
-
-#else
+extern void __compiletime_error("usercopy buffer size is too small")
+__bad_copy_user(void);
 
-static inline void
-__copy_from_user_overflow(int size, unsigned long count)
+static inline void copy_user_overflow(int size, unsigned long count)
 {
 	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
 }
 
-#define __copy_to_user_overflow __copy_from_user_overflow
-
-#endif
-
 static inline unsigned long __must_check
 copy_from_user(void *to, const void __user *from, unsigned long n)
 {
@@ -743,31 +714,13 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
 
 	kasan_check_write(to, n);
 
-	/*
-	 * While we would like to have the compiler do the checking for us
-	 * even in the non-constant size case, any false positives there are
-	 * a problem (especially when DEBUG_STRICT_USER_COPY_CHECKS, but even
-	 * without - the [hopefully] dangerous looking nature of the warning
-	 * would make people go look at the respecitive call sites over and
-	 * over again just to find that there's no problem).
-	 *
-	 * And there are cases where it's just not realistic for the compiler
-	 * to prove the count to be in range. For example when multiple call
-	 * sites of a helper function - perhaps in different source files -
-	 * all doing proper range checking, yet the helper function not doing
-	 * so again.
-	 *
-	 * Therefore limit the compile time checking to the constant size
-	 * case, and do only runtime checking for non-constant sizes.
-	 */
-
 	if (likely(sz < 0 || sz >= n)) {
 		check_object_size(to, n, false);
 		n = _copy_from_user(to, from, n);
-	} else if (__builtin_constant_p(n))
-		copy_from_user_overflow();
+	} else if (!__builtin_constant_p(n))
+		copy_user_overflow(sz, n);
 	else
-		__copy_from_user_overflow(sz, n);
+		__bad_copy_user();
 
 	return n;
 }
@@ -781,21 +734,17 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 
 	might_fault();
 
-	/* See the comment in copy_from_user() above. */
 	if (likely(sz < 0 || sz >= n)) {
 		check_object_size(from, n, true);
 		n = _copy_to_user(to, from, n);
-	} else if (__builtin_constant_p(n))
-		copy_to_user_overflow();
+	} else if (!__builtin_constant_p(n))
+		copy_user_overflow(sz, n);
 	else
-		__copy_to_user_overflow(sz, n);
+		__bad_copy_user();
 
 	return n;
 }
 
-#undef __copy_from_user_overflow
-#undef __copy_to_user_overflow
-
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
  * nested NMI paths are careful to preserve CR2.
diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
index 1bfa602..5dea1fb 100644
--- a/include/asm-generic/uaccess.h
+++ b/include/asm-generic/uaccess.h
@@ -72,6 +72,7 @@ struct exception_table_entry
 /* Returns 0 if exception not found and fixup otherwise.  */
 extern unsigned long search_exception_table(unsigned long);
 
+
 /*
  * architectures with an MMU should override these two
  */
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 8dbc892..573c5a1 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -158,7 +158,7 @@
 #define __compiler_offsetof(a, b)					\
 	__builtin_offsetof(a, b)
 
-#if GCC_VERSION >= 40100 && GCC_VERSION < 40600
+#if GCC_VERSION >= 40100
 # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
 #endif
 
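This one-line change in compiler-gcc.h is the crux of the cleanup: __compiletime_object_size() had been defined only for gcc 4.1 through 4.5, silently turning off all of the above checks on modern compilers. For reference, a small illustration of what the underlying builtin reports (assuming gcc with optimization enabled; at -O0 it may simply give up and return -1):

#include <stdio.h>

struct pkt {
	char hdr[16];
	char body[48];
};

int main(void)
{
	struct pkt p;

	printf("%zu\n", __builtin_object_size(&p, 0));     /* 64: whole object */
	printf("%zu\n", __builtin_object_size(p.body, 0)); /* 48: to end of p */
	/* For heap or otherwise unknown pointers it returns (size_t)-1. */
	return 0;
}
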
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2307d7c..2e2cca5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1686,24 +1686,6 @@ config LATENCYTOP
 	  Enable this option if you want to use the LatencyTOP tool
 	  to find out which userspace is blocking on what kernel operations.
 
-config ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
-	bool
-
-config DEBUG_STRICT_USER_COPY_CHECKS
-	bool "Strict user copy size checks"
-	depends on ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
-	depends on DEBUG_KERNEL && !TRACE_BRANCH_PROFILING
-	help
-	  Enabling this option turns a certain set of sanity checks for user
-	  copy operations into compile time failures.
-
-	  The copy_from_user() etc checks are there to help test if there
-	  are sufficient security checks on the length argument of
-	  the copy operation, by having gcc prove that the argument is
-	  within bounds.
-
-	  If unsure, say N.
-
 source kernel/trace/Kconfig
 
 menu "Runtime Testing"
diff --git a/lib/Makefile b/lib/Makefile
index cfa68eb..5dc77a8 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -24,7 +24,6 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o
 
-obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
 lib-$(CONFIG_HAS_DMA) += dma-noop.o
diff --git a/lib/usercopy.c b/lib/usercopy.c
deleted file mode 100644
index 4f5b1dd..0000000
--- a/lib/usercopy.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#include <linux/export.h>
-#include <linux/bug.h>
-#include <linux/uaccess.h>
-
-void copy_from_user_overflow(void)
-{
-	WARN(1, "Buffer overflow detected!\n");
-}
-EXPORT_SYMBOL(copy_from_user_overflow);
-- 
2.7.4

* Re: [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-30 13:04                               ` [PATCH v3] " Josh Poimboeuf
@ 2016-08-30 17:02                                 ` Linus Torvalds
  2016-08-30 18:12                                   ` Al Viro
  2016-08-30 18:15                                   ` Kees Cook
  0 siblings, 2 replies; 107+ messages in thread
From: Linus Torvalds @ 2016-08-30 17:02 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Tue, Aug 30, 2016 at 6:04 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> There are three usercopy warnings which are currently being silenced for
> gcc 4.6 and newer:

[.. snip snip ..]

Ok, I'm not entirely happy with the timing, but I think the problem
counts as a regression since it effectively made all the checks go
away in practice for most people, so I'm going to apply this patch.

I know Al Viro is working on some uaccess cleanups and trying to make
a lot of this be generic, so there's hopefully cleanups coming in the
not too distant future (I say "hopefully", because I worry that
looking at the mess will make Al dig his eyes out), but this seems to
be a clear improvement.

I still do wish we'd move the x86 __builtin_constant_p(n) check
around, so that x86 wouldn't do the run-time check_object_size() for
the trivially statically correct case, but I guess that's a separate
issue from this patch anyway.

If somebody has objections to this patch, holler quickly, because it's
about to get applied. 3.. 2.. 1..

                         Linus
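
A sketch of the reordering Linus is asking for (not something applied in this thread; it reuses the helpers from the x86 diff above): when gcc has already proven that a constant-size copy fits the destination object, the runtime check_object_size() walk buys nothing and can be skipped.

static inline unsigned long __must_check
copy_from_user(void *to, const void __user *from, unsigned long n)
{
	int sz = __compiletime_object_size(to);

	might_fault();
	kasan_check_write(to, n);

	if (likely(sz < 0 || sz >= n)) {
		/* constant n + known sz here means a statically proven fit */
		if (!(__builtin_constant_p(n) && sz >= 0))
			check_object_size(to, n, false);
		n = _copy_from_user(to, from, n);
	} else if (!__builtin_constant_p(n))
		copy_user_overflow(sz, n);
	else
		__bad_copy_user();

	return n;
}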

* Re: [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-30 17:02                                 ` Linus Torvalds
@ 2016-08-30 18:12                                   ` Al Viro
  2016-08-30 18:13                                     ` Linus Torvalds
  2016-08-30 18:15                                   ` Kees Cook
  1 sibling, 1 reply; 107+ messages in thread
From: Al Viro @ 2016-08-30 18:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Kees Cook, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andy Lutomirski, Steven Rostedt,
	Brian Gerst, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Tue, Aug 30, 2016 at 10:02:38AM -0700, Linus Torvalds wrote:
> On Tue, Aug 30, 2016 at 6:04 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > There are three usercopy warnings which are currently being silenced for
> > gcc 4.6 and newer:
> 
> [.. snip snip ..]
> 
> Ok, I'm not entirely happy with the timing, but I think the problem
> counts as a regression since it effectively made all the checks go
> away in practice for most people, so I'm going to apply this patch.
> 
> I know Al Viro is working on some uaccess cleanups and trying to make
> a lot of this be generic, so there's hopefully cleanups coming in the
> not too distant future (I say "hopefully", because I worry that
> looking at the mess will make Al dig his eyes out), but this seems to
> be a clear improvement.
> 
> I still do wish we'd move the x86 __builtin_constant_p(n) check
> around, so that x86 wouldn't do the run-time check_object_size() for
> the trivially statically correct case, but I guess that's a separate
> issue from this patch anyway.
> 
> If somebody has objections to this patch, holler quickly, because it's
> about to get applied. 3.. 2.. 1..

The only thing in my pile it conflicts with is this:

commit 0983ee6305f551faf29b11e59486679f600f1cd9
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Aug 20 19:03:37 2016 -0400

    parisc: fix copy_from_user()
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/arch/parisc/include/asm/uaccess.h b/arch/parisc/include/asm/uaccess.h
index 0f59fd9..54cfea9 100644
--- a/arch/parisc/include/asm/uaccess.h
+++ b/arch/parisc/include/asm/uaccess.h
@@ -221,13 +221,14 @@ static inline unsigned long __must_check copy_from_user(void *to,
                                           unsigned long n)
 {
         int sz = __compiletime_object_size(to);
-        int ret = -EFAULT;
+        unsigned long ret = n;
 
         if (likely(sz == -1 || !__builtin_constant_p(n) || sz >= n))
                 ret = __copy_from_user(to, from, n);
         else
                 copy_from_user_overflow();
-
+	if (unlikely(ret))
+		memset(to + (n - ret), 0, ret);
         return ret;
 }
 

* Re: [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-30 18:12                                   ` Al Viro
@ 2016-08-30 18:13                                     ` Linus Torvalds
  0 siblings, 0 replies; 107+ messages in thread
From: Linus Torvalds @ 2016-08-30 18:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Josh Poimboeuf, Kees Cook, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andy Lutomirski, Steven Rostedt,
	Brian Gerst, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Tue, Aug 30, 2016 at 11:12 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> The only thing in my pile it conflicts with is this:

Ok, no worries then.

             Linus

* Re: [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-30 17:02                                 ` Linus Torvalds
  2016-08-30 18:12                                   ` Al Viro
@ 2016-08-30 18:15                                   ` Kees Cook
  2016-08-30 19:09                                     ` Josh Poimboeuf
  2016-08-30 20:13                                     ` Al Viro
  1 sibling, 2 replies; 107+ messages in thread
From: Kees Cook @ 2016-08-30 18:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish, Al Viro,
	Mark Rutland

On Tue, Aug 30, 2016 at 1:02 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Aug 30, 2016 at 6:04 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> There are three usercopy warnings which are currently being silenced for
>> gcc 4.6 and newer:
>
> [.. snip snip ..]
>
> Ok, I'm not entirely happy with the timing, but I think the problem
> counts as a regression since it effectively made all the checks go
> away in practice for most people, so I'm going to apply this patch.

Yeah, for pragmatism, I'm a fan of this patch since it restores the
const checks. What gets lost here are the gcc dead-code optimization
situations where gcc can figure out the value range for a non-const
size, but that's currently broken anyway, so there's no point in
keeping it. We can add it back when gcc fixes their regression.

> I know Al Viro is working on some uaccess cleanups and trying to make
> a lot of this be generic, so there's hopefully cleanups coming in the
> not too distant future (I say "hopefully", because I worry that
> looking at the mess will make Al dig his eyes out), but this seems to
> be a clear improvement.

Yeah. Mark Rutland is also looking at this too.

> I still do wish we'd move the x86 __builtin_constant_p(n) check
> around, so that x86 wouldn't do the run-time check_object_size() for
> the trivially statically correct case, but I guess that's a separate
> issue from this patch anyway.

Yeah, I'm going to wait a bit for the dust to settle here, but it's
worth documenting the situation as I'd like to see it.

First, some current API usage which we'll need to maintain at least
for now: __copy_*_user() is just copy_*_user() without the access_ok()
checks. Unfortunately, some arch implement different copying methods
depending on if the entry is via copy...() or __copy..() (e.g. see
x86's use of _copy...() -- single underscore??) There doesn't seem to
be a good reason for this, and I think it would make sense to extract
the actual per-arch implementation that performs the real copy into
something like arm64's __arch_copy_*_user(), which only does the copy
itself and nothing else.

Once that's in place, we can do sanity-checking in __copy_*_user(),
leaving the access_ok() only in copy_*_user(). The logic should be
something like:

if const destination object size is known:
    if copy size is too large:
        if copy size is const:
            abort build
        else:
            runtime BUG
    else:
        perform copy
else:
    perform runtime object size sanity checks
    perform copy

For example, totally untested, put together based on Josh's updates,
and the arm64 code, and some variable name clarity changes:

static inline __must_check unsigned long __copy_from_user(void *to,
                                   const void __user *from, unsigned long n)
{
    int dest_size = __compiletime_object_size(to);

    might_fault();
    /* KASan seems to want to pre-check arguments, so run it first. */
    kasan_check_write(to, n);

    if (likely(dest_size != -1)) {
        /* Destination object size is known at compile time. */
        if (n > dest_size) {
            /* Copy size is too large for destination object. */
            if (__builtin_constant_p(n)) {
                /* Copy size is known at compile time: abort the build. */
                copy_user_compile_time_overflow(dest_size, n);
            } else {
                /* Copy size only known at runtime, abort copy with BUG. */
                __bad_user_copy();
            }
        } else {
            /* Copy size within size of destination object, perform copy. */
            n = __arch_copy_from_user(to, from, n);
        }
    } else {
        /* Destination object size needs runtime checking. */
        check_runtime_object_size(to, from, n);
        /* If we got here, runtime checks passed, perform copy. */
        n = __arch_copy_from_user(to, from, n);
    }
    return n;
}

static inline __must_check unsigned long copy_from_user(void *to,
                                   const void __user * from, unsigned long n)
{
    if (access_ok(VERIFY_READ, from, n)) {
            n = __copy_from_user(to, from, n);
    } else
             memset(to, 0, n); /* This is needed to avoid memory content leaks. */
    return n;
}

Some notes, here: the __bad_user_copy() should be a BUG, not a WARN
since we've landed on a provably bad situation.

check_object_size() should probably be renamed
"check_runtime_obj_size" or something to clarify its purpose, since
it's intended to be called only when we have to go off and examine
runtime object metadata to figure out how to correctly perform bounds
checking.

> If somebody has objections to this patch, holler quickly, because it's
> about to get applied. 3.. 2.. 1..

Go for it! :)

-Kees

-- 
Kees Cook
Nexus Security
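
One naming hazard worth flagging: in the patch as merged, __bad_copy_user() is the compile-time error hook, while __bad_user_copy() in the sketch above is a runtime trap. Per "should be a BUG, not a WARN", that runtime variant would presumably look like this (assumed definition, not code from the thread):

static void __bad_user_copy(void)
{
	BUG();	/* provably bad usercopy detected at runtime; don't limp on */
}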

* Re: [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc
  2016-08-29 14:48                         ` Josh Poimboeuf
  2016-08-29 15:36                           ` Linus Torvalds
@ 2016-08-30 18:33                           ` Kees Cook
  1 sibling, 0 replies; 107+ messages in thread
From: Kees Cook @ 2016-08-30 18:33 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	x86, LKML, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Mon, Aug 29, 2016 at 10:48 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Fri, Aug 26, 2016 at 05:37:20PM -0700, Linus Torvalds wrote:
>> On Fri, Aug 26, 2016 at 1:56 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> >
>> > There's one problem with that though.  It's going to annoy a lot of
>> > people who do allyesconfig/allmodconfig builds because
>> > DEBUG_STRICT_USER_COPY_CHECKS adds several fake warnings.
>>
>> How bad is it?
>>
>> In particular, we've definitely had issues with the "warning"
>> attribute before. Because as you pointed out somewhere else, the
>> warning can happen before the call is actually optimized away by a
>> later compiler phase.
>
> So I *think* your patch fixes the wrong problem.  That's probably at
> least somewhat my fault because I misunderstood the issue before and may
> have described it wrong at some point.
>
> AFAICT, gcc isn't doing anything wrong, and the false positives are
> "intentional".
>
> There are in fact two static warnings (which are being silenced for new
> versions of gcc):
>
> 1) "copy_from_user() buffer size is too small"
>
>    This happens when object size and copy size are both const, and copy
>    size > object size.  I didn't see any false positives for this one.
>    So the function warning attribute seems to be working fine here.
>    Your patch "fixed" this warning, but it didn't need fixing.
>
>    Note this scenario is always a bug and so I think it should be
>    changed to *always* be an error, regardless of
>    DEBUG_STRICT_USER_COPY_CHECKS.
>
> 2) "copy_from_user() buffer size is not provably correct"
>
>    This is the (cryptic) false positive warning which happens when I
>    enable __compiletime_object_size() for new compilers (and
>    DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size is
>    const, but copy size is *not*.  In this case there's no way to
>    compare the two at build time, so it gives the warning.  (Note the
>    warning is a byproduct of the fact that gcc has no way of knowing
>    whether the overflow function will be called, so the call isn't dead
>    code and the warning attribute is activated.)
>
>    So this warning seems to only indicate "this is an unusual pattern,
>    maybe you should check it out" rather than "this is a bug".  It seems
>    to be working "as designed": it has nothing to do with gcc compiler
>    phases AFAICT.
>
>    (Which begs the question: why didn't these warnings appear with older
>    versions of gcc?  I have no idea...)
>
>    I get 102(!) of these warnings with allyesconfig and the
>    __compiletime_object_size() gcc check removed.  I don't know if there
>    are any real bugs hiding in there, but from looking at a small
>    sample, I didn't see any.
>
> So warning 2 seems to be intentional for some reason.  I suggested
> removing it, while keeping the corresponding runtime check.  But
> according to Kees it sometimes finds real bugs.
>
> (Kees, can you confirm that at least some of the recent bugs you found
> were from warning 2?)

Yup, that's correct. In a perfect world, gcc would perform dead-code
analysis and constrain the values of the copy size, removing the "not
provably correct" option when it wasn't possible to hit it. But that
ability regressed, and we started getting more warnings.

> Anyway I don't currently see any doable option other than just removing
> warning 2 (yet still keeping the corresponding copy_user_overflow()
> runtime check).

Yeah. I think consolidating all the usercopy logic into asm-generic is
probably the first task, then we can reexamine adding the warning back
once the gcc bug is fixed.

-Kees

-- 
Kees Cook
Nexus Security
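
To make the two diagnostics concrete, hypothetical call sites that would trip each one (illustrative only, reusing the names from the patches above):

void example(const void __user *ubuf, unsigned long len)
{
	char buf[16];

	/*
	 * 1) "buffer size is too small": object size and copy size are
	 *    both constant and 32 > 16.  Always a bug; after this series
	 *    it unconditionally fails the build via __bad_copy_user().
	 */
	copy_from_user(buf, ubuf, 32);

	/*
	 * 2) "size is not provably correct": constant object, runtime
	 *    size.  The old compile-time warning is dropped; if len > 16
	 *    at runtime, copy_user_overflow() WARNs and nothing is copied.
	 */
	copy_from_user(buf, ubuf, len);
}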

* Re: [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-30 18:15                                   ` Kees Cook
@ 2016-08-30 19:09                                     ` Josh Poimboeuf
  2016-08-30 19:20                                       ` Kees Cook
  2016-08-30 20:13                                     ` Al Viro
  1 sibling, 1 reply; 107+ messages in thread
From: Josh Poimboeuf @ 2016-08-30 19:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish, Al Viro,
	Mark Rutland

On Tue, Aug 30, 2016 at 02:15:58PM -0400, Kees Cook wrote:
> static inline __must_check unsigned long __copy_from_user(void *to,
>                                    const void __user *from, unsigned long n)
> {
>     int dest_size = __compiletime_object_size(to);
> 
>     might_fault();
>     /* KASan seems to want to pre-check arguments, so run it first. */
>     kasan_check_write(to, n);
> 
>     if (likely(dest_size != -1)) {
>         /* Destination object size is known at compile time. */
>         if (n > dest_size) {
>             /* Copy size is too large for destination object. */
>             if (__builtin_constant_p(n)) {
>                 /* Copy size is known at compile time: abort the build. */
>                 copy_user_compile_time_overflow(dest_size, n);
>             } else {
>                 /* Copy size only known at runtime, abort copy with BUG. */
>                 __bad_user_copy();
>             }
>         } else {
>             /* Copy size within size of destination object, perform copy. */
>             n = __arch_copy_from_user(to, from, n);
>         }
>     } else {
>         /* Destination object size needs runtime checking. */
>         check_runtime_object_size(to, from, n);
>         /* If we got here, runtime checks passed, perform copy. */
>         n = __arch_copy_from_user(to, from, n);
>     }
>     return n;
> }
> 
> static inline __must_check unsigned long copy_from_user(void *to,
>                                    const void __user * from, unsigned long n)
> {
>     if (access_ok(VERIFY_READ, from, n)) {
>             n = __copy_from_user(to, from, n);
>     } else
>              memset(to, 0, n); /* This is needed to avoid memory
> content leaks. */
>     return n;
> }
> 
> Some notes, here: the __bad_user_copy() should be a BUG, not a WARN
> since we've landed on a provably bad situation.

Looks good to me.  One nit: I think the "likely" check for "dest_size !=
-1" isn't needed.  dest_size is known at compile-time, so gcc should be
able to optimize it accordingly.

> check_object_size() should probably be renamed
> "check_runtime_obj_size" or something to clarify its purpose, since
> it's intended to be called only when we have to go off and examine
> runtime object metadata to figure out how to correctly perform bounds
> checking.

Personally I find having "size" in the name to be misleading, since the
function actually looks at much more than just size.  Especially
considering the fact that we already have the other static and runtime
checks which do only check the size.

I also don't really care for "runtime", since most functions are indeed
called at runtime.  If anything I'd prefer the reverse, where any
built-in compile-time "functions" are specially named or annotated.

My vote would be something like check_usercopy_object().

-- 
Josh

* Re: [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-30 19:09                                     ` Josh Poimboeuf
@ 2016-08-30 19:20                                       ` Kees Cook
  0 siblings, 0 replies; 107+ messages in thread
From: Kees Cook @ 2016-08-30 19:20 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish, Al Viro,
	Mark Rutland

On Tue, Aug 30, 2016 at 3:09 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Tue, Aug 30, 2016 at 02:15:58PM -0400, Kees Cook wrote:
>> static inline __must_check unsigned long __copy_from_user(void *to,
>>                                    const void __user *from, unsigned long n)
>> {
>>     int dest_size = __compiletime_object_size(to);
>>
>>     might_fault();
>>     /* KASan seems to want to pre-check arguments, so run it first. */
>>     kasan_check_write(to, n);
>>
>>     if (likely(dest_size != -1)) {
>>         /* Destination object size is known at compile time. */
>>         if (n > dest_size) {
>>             /* Copy size is too large for destination object. */
>>             if (__builtin_constant_p(n)) {
>>                 /* Copy size is known at compile time: abort the build. */
>>                 copy_user_compile_time_overflow(dest_size, n);
>>             } else {
>>                 /* Copy size only known at runtime, abort copy with BUG. */
>>                 __bad_user_copy();
>>             }
>>         } else {
>>             /* Copy size within size of destination object, perform copy. */
>>             n = __arch_copy_from_user(to, from, n);
>>         }
>>     } else {
>>         /* Destination object size needs runtime checking. */
>>         check_runtime_object_size(to, from, n);
>>         /* If we got here, runtime checks passed, perform copy. */
>>         n = __arch_copy_from_user(to, from, n);
>>     }
>>     return n;
>> }
>>
>> static inline __must_check unsigned long copy_from_user(void *to,
>>                                    const void __user * from, unsigned long n)
>> {
>>     if (access_ok(VERIFY_READ, from, n)) {
>>             n = __copy_from_user(to, from, n);
>>     } else
>>              memset(to, 0, n); /* This is needed to avoid memory content leaks. */
>>     return n;
>> }
>>
>> Some notes, here: the __bad_user_copy() should be a BUG, not a WARN
>> since we've landed on a provably bad situation.
>
> Looks good to me.  One nit: I think the "likely" check for "dest_size !=
> -1" isn't needed.  dest_size is known at compile-time, so gcc should be
> able to optimize it accordingly.

Yeah, good point.

>> check_object_size() should probably be renamed
>> "check_runtime_obj_size" or something to clarify its purpose, since
>> it's intended to be called only when we have to go off and examine
>> runtime object metadata to figure out how to correctly perform bounds
>> checking.
>
> Personally I find having "size" in the name to be misleading, since the
> function actually looks at much more than just size.  Especially
> considering the fact that we already have the other static and runtime
> checks which do only check the size.
>
> I also don't really care for "runtime", since most functions are indeed
> called at runtime.  If anything I'd prefer the reverse, where any
> built-in compile-time "functions" are specially named or annotated.
>
> My vote would be something like check_usercopy_object().

Sounds good to me. :)

-Kees

-- 
Kees Cook
Nexus Security

* Re: [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-30 18:15                                   ` Kees Cook
  2016-08-30 19:09                                     ` Josh Poimboeuf
@ 2016-08-30 20:13                                     ` Al Viro
  2016-08-30 22:20                                       ` Kees Cook
  2016-08-31  9:43                                       ` Mark Rutland
  1 sibling, 2 replies; 107+ messages in thread
From: Al Viro @ 2016-08-30 20:13 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Josh Poimboeuf, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andy Lutomirski, Steven Rostedt,
	Brian Gerst, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish, Mark Rutland

On Tue, Aug 30, 2016 at 02:15:58PM -0400, Kees Cook wrote:

> First, some current API usage which we'll need to maintain at least
> for now: __copy_*_user() is just copy_*_user() without the access_ok()
> checks. Unfortunately, some arch implement different copying methods
> depending on if the entry is via copy...() or __copy..() (e.g. see
> x86's use of _copy...() -- single underscore??) There doesn't seem to
> be a good reason for this, and I think it would make sense to extract
> the actual per-arch implementation that performs the real copy into
> something like arm64's __arch_copy_*_user(), which only does the copy
> itself and nothing else.

No.  __arch_copy_from_user() is a bloody bad idea; the real primitive
is what's currently called __copy_from_user_inatomic(), and I'm planning
to rename it to raw_copy_from_user().  Note that _this_ should not
zero anything on fault; "inatomic" part is a misnomer.  I'm not sure
if __copy_from_user() will survive long-term, actually; copy_from_user()
should (size checks aside) be equivalent to
	size_t res = size;
	might_fault();
	if (likely(access_ok(...)))
		res = __copy_from_user_inatomic(...);
	if (unlikely(res))
		memset(to + (size - res), 0, res);
	return res;

Linus asked to take that to lib/* - at least the memset() part.

	* get_user()/put_user()/clear_user()/copy_{from,to,in}_user() should
check access_ok() (if non-degenerate on the architecture in question).
	* failing get_user(x, p)/__get_user(x, p) should zero x
	* short copy (for any reason, including access_ok() failure) in
copy_from_user() should return the amount of bytes *not* copied and zero them.
In no circumstances should it return -E...
	* __copy_from_user_inatomic(to, from, size) should return exactly
size - amount of bytes stored.  It does *not* need to copy as much as possible
in case of fault.  It should not zero anything; as a matter of fact, zeroing
does not belong in the assembler part at all.
	* iov_iter_copy_from_user_atomic(), copy_page_from_iter()
and copy_page_to_iter() will not modify anything past the amount they
return.  In particular, they will not zero anything at all.  Right now it's
arch-dependent.
	* iov_iter_fault_in_readable() will merge with
iov_iter_fault_in_multipages_readable(), with the semantics of the latter.
As a matter of fact, the same ought to happen to fault_in_pages_readable()
and fault_in_multipages_readable().
	* ->write_end() instances on a short copy into an uptodate page should
not zero anything whatsoever; when the page is not uptodate, they should only
zero an area if readpage should've done the same (e.g. if it's something like
ramfs, or if we know that we'd allocated new on-disk blocks and hadn't
copied them over, etc.).  Returning 0 and leaving a page !uptodate is always
OK on a short copy; we might do something more intelligent, but that's
up to the specific ->write_end() instance.
	* includes of asm/uaccess.h are going away.  That's obviously not
something we can afford as a prereq for fixes to be backported, but for
the next window we definitely want a one-time tree-wide switch to
linux/uaccess.h.  For *.c (and local .h) it's trivial, for general-purpose
headers it'll take some massage.  Once we have linux/uaccess.h use, we
can move duplicates over there.

	The above obviously doesn't go into exception/longjmp/asm-goto/etc.
pile of joy; that needs more experiments and frankly, I want to finish
separating the -stable fodder first.
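
For concreteness, the short-copy rule above is exactly the contract a typical call site relies on (a hedged sketch, not code from this thread):

struct config {
	u32 flags;
	u64 mask;
};

static long read_config(const void __user *ubuf, struct config *cfg)
{
	/*
	 * copy_from_user() returns the number of bytes NOT copied --
	 * never an -E value -- and has zeroed the uncopied tail of *cfg.
	 */
	if (copy_from_user(cfg, ubuf, sizeof(*cfg)))
		return -EFAULT;
	return 0;
}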

* Re: [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-30 20:13                                     ` Al Viro
@ 2016-08-30 22:20                                       ` Kees Cook
  2016-08-31  9:43                                       ` Mark Rutland
  1 sibling, 0 replies; 107+ messages in thread
From: Kees Cook @ 2016-08-30 22:20 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Josh Poimboeuf, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andy Lutomirski, Steven Rostedt,
	Brian Gerst, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish, Mark Rutland

On Tue, Aug 30, 2016 at 4:13 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Aug 30, 2016 at 02:15:58PM -0400, Kees Cook wrote:
>
>> First, some current API usage which we'll need to maintain at least
>> for now: __copy_*_user() is just copy_*_user() without the access_ok()
>> checks. Unfortunately, some arch implement different copying methods
>> depending on if the entry is via copy...() or __copy..() (e.g. see
>> x86's use of _copy...() -- single underscore??) There doesn't seem to
>> be a good reason for this, and I think it would make sense to extract
>> the actual per-arch implementation that performs the real copy into
>> something like arm64's __arch_copy_*_user(), which only does the copy
>> itself and nothing else.
>
> No.  __arch_copy_from_user() is a bloody bad idea; the real primitive
> is what's currently called __copy_from_user_inatomic(), and I'm planning
> to rename it to raw_copy_from_user().  Note that _this_ should not

I don't think the name is important, just as long as it's clear. We
both seem to agree: the arch-specific stuff should be separate from
the common API that has the sanity checking, etc, which it sounds like
you're already doing.

-Kees

> zero anything on fault; "inatomic" part is a misnomer.  I'm not sure
> if __copy_from_user() will survive long-term, actually; copy_from_user()
> should (size checks aside) be equivalent to
>         size_t res = size;
>         might_fault();
>         if (likely(access_ok(...)))
>                 res = __copy_from_user_inatomic(...);
>         if (unlikely(res))
>                 memset(to + (size - res), 0, res);
>         return res;
>
> Linus asked to take that to lib/* - at least the memset() part.
>
>         * get_user()/put_user()/clear_user()/copy_{from,to,in}_user() should
> check access_ok() (if non-degenerate on the architecture in question).
>         * failing get_user(x, p)/__get_user(x, p) should zero x
>         * short copy (for any reason, including access_ok() failure) in
> copy_from_user() should return the amount of bytes *not* copied and zero them.
> In no circumstances should it return -E...
>         * __copy_from_user_inatomic(to, from, size) should return exactly
> size - amount of bytes stored.  It does *not* need to copy as much as possible
> in case of fault.  It should not zero anything; as a matter of fact, zeroing
> does not belong in the assembler part at all.
>         * iov_iter_copy_from_user_atomic(), copy_page_from_iter()
> and copy_page_to_iter() will not modify anything past the amount they
> return.  In particular, they will not zero anything at all.  Right now it's
> arch-dependent.
>         * iov_iter_fault_in_readable() will merge with
> iov_iter_fault_in_multipages_readable(), with the semantics of the latter.
> As a matter of fact, the same ought to happen to fault_in_pages_readable()
> and fault_in_multipages_readable().
>         * ->write_end() instances on a short copy into an uptodate page should
> not zero anything whatsoever; when the page is not uptodate, they should only
> zero an area if readpage should've done the same (e.g. if it's something like
> ramfs, or if we know that we'd allocated new on-disk blocks and hadn't
> copied them over, etc.).  Returning 0 and leaving a page !uptodate is always
> OK on a short copy; we might do something more intelligent, but that's
> up to the specific ->write_end() instance.

Agreed on all these; and getting that documented in the final
uaccess.h seems like a very good idea too.

>         * includes of asm/uaccess.h are going away.  That's obviously not
> something we can afford as a prereq for fixes to be backported, but for
> the next window we definitely want a one-time tree-wide switch to
> linux/uaccess.h.  For *.c (and local .h) it's trivial, for general-purpose
> headers it'll take some massage.  Once we have linux/uaccess.h use, we
> can move duplicates over there.

How do you envision architectures gluing their copy implementation to
raw_copy_from_user()?

>         The above obviously doesn't go into exception/longjmp/asm-goto/etc.
> pile of joy; that needs more experiments and frankly, I want to finish
> separating the -stable fodder first.

Yup, cool.

-Kees

-- 
Kees Cook
Nexus Security

* Re: [PATCH v3] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  2016-08-30 20:13                                     ` Al Viro
  2016-08-30 22:20                                       ` Kees Cook
@ 2016-08-31  9:43                                       ` Mark Rutland
  1 sibling, 0 replies; 107+ messages in thread
From: Mark Rutland @ 2016-08-31  9:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Kees Cook, Linus Torvalds, Josh Poimboeuf, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andy Lutomirski, Steven Rostedt,
	Brian Gerst, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Tue, Aug 30, 2016 at 09:13:32PM +0100, Al Viro wrote:
> On Tue, Aug 30, 2016 at 02:15:58PM -0400, Kees Cook wrote:
> 
> > First, some current API usage which we'll need to maintain at least
> > for now: __copy_*_user() is just copy_*_user() without the access_ok()
> > checks. Unfortunately, some arch implement different copying methods
> > depending on if the entry is via copy...() or __copy..() (e.g. see
> > x86's use of _copy...() -- single underscore??) There doesn't seem to
> > be a good reason for this, and I think it would make sense to extract
> > the actual per-arch implementation that performs the real copy into
> > something like arm64's __arch_copy_*_user(), which only does the copy
> > itself and nothing else.
> 
> No.  __arch_copy_from_user() is a bloody bad idea; the real primitive
> is what's currently called __copy_from_user_inatomic(), and I'm planning
> to rename it to raw_copy_from_user(). 

Great!

FWIW, my plan with the arch_* forms was to follow the convention set by
the spinlock code and have raw_* forms build atop these, where common
debug and/or hardening checks would live.

From my PoV, anything to make this more consistent cross-architecture is
good, especially if we can pull the duplicated logic into common code.

Thanks,
Mark.
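
The spinlock-style layering would come out roughly as below (assumed names; a sketch of the convention, not an agreed interface):

/* Innermost, per-arch: the bare copy loop, no checks of any kind. */
unsigned long arch_copy_from_user(void *to, const void __user *from,
				  unsigned long n);

/* Common layer: the one home for debug and hardening checks. */
static inline unsigned long
raw_copy_from_user(void *to, const void __user *from, unsigned long n)
{
	kasan_check_write(to, n);
	check_object_size(to, n, false);	/* hardened usercopy hook */
	return arch_copy_from_user(to, from, n);
}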

end of thread

Thread overview: 107+ messages
2016-08-18 13:05 [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 01/57] x86/dumpstack: remove show_trace() Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 02/57] x86/asm/head: remove unused init_rsp variable extern Josh Poimboeuf
2016-08-18 16:22   ` Sebastian Andrzej Siewior
2016-08-18 13:05 ` [PATCH v4 03/57] x86/asm/head: rename 'stack_start' -> 'initial_stack' Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 04/57] x86/asm/head: use a common function for starting CPUs Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 05/57] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 06/57] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 07/57] x86/dumpstack: remove extra brackets around "<EOE>" Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 08/57] x86/dumpstack: fix irq stack bounds calculation in show_stack_log_lvl() Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 09/57] x86/dumpstack: fix x86_32 kernel_stack_pointer() previous stack access Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 10/57] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 11/57] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 12/57] x86: move _stext marker to before head code Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 13/57] x86/head: remove useless zeroed word Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 14/57] x86/head: put real return address on idle task stack Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 15/57] x86/head: fix the end of the stack for idle tasks Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 16/57] x86/entry/32: fix the end of the stack for newly forked tasks Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 17/57] x86/head/32: fix the end of the stack for idle tasks Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 18/57] x86/smp: fix initial idle stack location on 32-bit Josh Poimboeuf
2016-08-18 13:05 ` [PATCH v4 19/57] x86/entry/head/32: use local labels Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 20/57] x86/entry/32: rename 'error_code' to 'common_exception' Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 21/57] perf/x86: check perf_callchain_store() error Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 22/57] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 23/57] proc: fix return address printk conversion specifer in /proc/<pid>/stack Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 24/57] ftrace: remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 25/57] ftrace: only allocate the ret_stack 'fp' field when needed Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 26/57] ftrace: add return address pointer to ftrace_ret_stack Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 27/57] ftrace: add ftrace_graph_ret_addr() stack unwinding helpers Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 28/57] x86/dumpstack/ftrace: convert dump_trace() callbacks to use ftrace_graph_ret_addr() Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 29/57] ftrace/x86: implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 30/57] x86/dumpstack/ftrace: mark function graph handler function as unreliable Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 31/57] x86/dumpstack/ftrace: don't print unreliable addresses in print_context_stack_bp() Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 32/57] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 33/57] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 34/57] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 35/57] x86/dumpstack: add recursion checking for all stacks Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 36/57] x86/unwind: add new unwind interface and implementations Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 37/57] perf/x86: convert perf_callchain_kernel() to use the new unwinder Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 38/57] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 39/57] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 40/57] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 41/57] x86/dumpstack: remove dump_trace() and related callbacks Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 42/57] x86/entry/unwind: create stack frames for saved interrupt registers Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 43/57] x86/unwind: create stack frames for saved syscall registers Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 44/57] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 45/57] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 46/57] x86/dumpstack: fix duplicate RIP address display in __show_regs() Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 47/57] x86/dumpstack: print orig_ax " Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 48/57] x86: remove 64-byte gap at end of irq stack Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 49/57] x86/unwind: warn on kernel stack corruption Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 50/57] x86/unwind: warn on bad stack return address Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 51/57] x86/unwind: warn if stack grows up Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 52/57] x86/dumpstack: warn on stack recursion Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 53/57] x86/mm: move arch_within_stack_frames() to usercopy.c Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Josh Poimboeuf
2016-08-19 18:27   ` Kees Cook
2016-08-19 21:55     ` Josh Poimboeuf
2016-08-22 20:27       ` Josh Poimboeuf
2016-08-22 23:33         ` Josh Poimboeuf
2016-08-23  0:59           ` Kees Cook
2016-08-23  4:21             ` Josh Poimboeuf
2016-08-22 22:11   ` Linus Torvalds
2016-08-23  1:27     ` Kees Cook
2016-08-23 16:21       ` Josh Poimboeuf
2016-08-23 18:47       ` Linus Torvalds
2016-08-23 16:06     ` Josh Poimboeuf
2016-08-23 19:28       ` [PATCH 1/2] mm/usercopy: get rid of "provably correct" warnings Josh Poimboeuf
2016-08-24  2:36         ` Kees Cook
2016-08-23 19:28       ` [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc Josh Poimboeuf
2016-08-24  2:37         ` Kees Cook
2016-08-25 20:47           ` Josh Poimboeuf
2016-08-26  2:14             ` Kees Cook
2016-08-26  3:27               ` Josh Poimboeuf
2016-08-26 13:42                 ` Kees Cook
2016-08-26 13:55                   ` Josh Poimboeuf
2016-08-26 20:56                     ` Josh Poimboeuf
2016-08-26 21:00                       ` Josh Poimboeuf
2016-08-27  0:37                       ` Linus Torvalds
2016-08-29 14:48                         ` Josh Poimboeuf
2016-08-29 15:36                           ` Linus Torvalds
2016-08-29 17:08                             ` [PATCH v2] mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS Josh Poimboeuf
2016-08-29 17:59                               ` Josh Poimboeuf
2016-08-30 13:04                               ` [PATCH v3] " Josh Poimboeuf
2016-08-30 17:02                                 ` Linus Torvalds
2016-08-30 18:12                                   ` Al Viro
2016-08-30 18:13                                     ` Linus Torvalds
2016-08-30 18:15                                   ` Kees Cook
2016-08-30 19:09                                     ` Josh Poimboeuf
2016-08-30 19:20                                       ` Kees Cook
2016-08-30 20:13                                     ` Al Viro
2016-08-30 22:20                                       ` Kees Cook
2016-08-31  9:43                                       ` Mark Rutland
2016-08-30 18:33                           ` [PATCH 2/2] mm/usercopy: enable usercopy size checking for modern versions of gcc Kees Cook
2016-08-23 20:31     ` [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames() to use the new unwinder Andy Lutomirski
2016-08-23 21:06       ` Linus Torvalds
2016-08-23 21:08       ` Josh Poimboeuf
2016-08-24  1:37         ` Kees Cook
2016-08-18 13:06 ` [PATCH v4 55/57] x86/mm: simplify starting frame logic for hardened usercopy Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 56/57] x86/mm: removed unused arch_within_stack_frames() arguments Josh Poimboeuf
2016-08-18 13:06 ` [PATCH v4 57/57] mm: re-enable gcc frame address warning Josh Poimboeuf
2016-08-18 13:25 ` [PATCH v4 00/57] x86/dumpstack: rewrite x86 stack dump code Frederic Weisbecker
2016-08-18 13:39   ` Ingo Molnar
2016-08-18 14:31     ` Josh Poimboeuf
2016-08-18 14:41       ` Steven Rostedt
2016-08-18 16:36       ` Ingo Molnar
2016-08-18 14:34     ` Steven Rostedt
