All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] x86/dumpstack: more stack dump improvements
@ 2016-08-24 16:50 Josh Poimboeuf
  2016-08-24 16:50 ` [PATCH 1/6] perf/x86: check perf_callchain_store() error Josh Poimboeuf
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 16:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Some more preparatory patches for the new x86 unwinder (though these
generally stand alone as improvements in their own right).

Josh Poimboeuf (6):
  perf/x86: check perf_callchain_store() error
  oprofile/x86: add regs->ip to oprofile trace
  x86/dumpstack: make printk_stack_address() more generally useful
  x86/dumpstack: add get_stack_pointer() and get_frame_pointer()
  x86/dumpstack: remove unnecessary stack pointer arguments
  x86/dumpstack: allow preemption in show_stack_log_lvl() and
    dump_trace()

 arch/x86/events/core.c            |  3 +-
 arch/x86/include/asm/stacktrace.h | 41 ++++++++++++++------------
 arch/x86/kernel/dumpstack.c       | 13 ++++-----
 arch/x86/kernel/dumpstack_32.c    | 41 +++++++-------------------
 arch/x86/kernel/dumpstack_64.c    | 61 +++++++++------------------------------
 arch/x86/oprofile/backtrace.c     | 12 +++++---
 6 files changed, 64 insertions(+), 107 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/6] perf/x86: check perf_callchain_store() error
  2016-08-24 16:50 [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
@ 2016-08-24 16:50 ` Josh Poimboeuf
  2016-09-08  9:48   ` [tip:x86/asm] perf/x86: Check " tip-bot for Josh Poimboeuf
  2016-08-24 16:50 ` [PATCH 2/6] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 16:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Add a check to perf_callchain_kernel() so that it returns early if the
callchain entry array is already full.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/events/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18a1acf..dcaa887 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2297,7 +2297,8 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 		return;
 	}
 
-	perf_callchain_store(entry, regs->ip);
+	if (perf_callchain_store(entry, regs->ip))
+		return;
 
 	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/6] oprofile/x86: add regs->ip to oprofile trace
  2016-08-24 16:50 [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
  2016-08-24 16:50 ` [PATCH 1/6] perf/x86: check perf_callchain_store() error Josh Poimboeuf
@ 2016-08-24 16:50 ` Josh Poimboeuf
  2016-08-30 11:30   ` Robert Richter
  2016-09-08  9:48   ` [tip:x86/asm] oprofile/x86: Add " tip-bot for Josh Poimboeuf
  2016-08-24 16:50 ` [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 16:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish, Robert Richter

dump_trace() doesn't add the interrupted instruction's address to the
trace, so add it manually.  This makes the profile more useful, and also
makes it more consistent with what perf profiling does.

Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/oprofile/backtrace.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index cb31a44..2ef6c8b 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -114,9 +114,16 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 
 	if (!user_mode(regs)) {
 		unsigned long stack = kernel_stack_pointer(regs);
-		if (depth)
-			dump_trace(NULL, regs, (unsigned long *)stack, 0,
-				   &backtrace_ops, &depth);
+
+		if (!depth)
+			return;
+
+		oprofile_add_trace(regs->ip);
+		if (!--depth)
+			return;
+
+		dump_trace(NULL, regs, (unsigned long *)stack, 0,
+			   &backtrace_ops, &depth);
 		return;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 16:50 [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
  2016-08-24 16:50 ` [PATCH 1/6] perf/x86: check perf_callchain_store() error Josh Poimboeuf
  2016-08-24 16:50 ` [PATCH 2/6] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
@ 2016-08-24 16:50 ` Josh Poimboeuf
  2016-08-24 17:28   ` Joe Perches
                     ` (2 more replies)
  2016-08-24 16:50 ` [PATCH 4/6] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 16:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

Change printk_stack_address() to be useful when called by an unwinder
outside the context of dump_trace().

Specifically:

- printk_stack_address()'s 'data' argument is always used as the log
  level string.  Make that explicit.

- Call touch_nmi_watchdog().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 01072e9..f0ddf85 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -26,10 +26,11 @@ int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
 static int die_counter;
 
 static void printk_stack_address(unsigned long address, int reliable,
-		void *data)
+				 char *log_lvl)
 {
+	touch_nmi_watchdog();
 	printk("%s [<%p>] %s%pB\n",
-		(char *)data, (void *)address, reliable ? "" : "? ",
+		log_lvl, (void *)address, reliable ? "" : "? ",
 		(void *)address);
 }
 
@@ -148,7 +149,6 @@ static int print_trace_stack(void *data, char *name)
  */
 static int print_trace_address(void *data, unsigned long addr, int reliable)
 {
-	touch_nmi_watchdog();
 	printk_stack_address(addr, reliable, data);
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 4/6] x86/dumpstack: add get_stack_pointer() and get_frame_pointer()
  2016-08-24 16:50 [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
                   ` (2 preceding siblings ...)
  2016-08-24 16:50 ` [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
@ 2016-08-24 16:50 ` Josh Poimboeuf
  2016-09-08  9:49   ` [tip:x86/asm] x86/dumpstack: Add " tip-bot for Josh Poimboeuf
  2016-08-24 16:50 ` [PATCH 5/6] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 16:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

The various functions involved in dumping the stack all do similar
things with regard to getting the stack pointer and the frame pointer
based on the regs and task arguments.  Create helper functions to
do that instead.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/stacktrace.h | 41 ++++++++++++++++++++++-----------------
 arch/x86/kernel/dumpstack.c       |  5 ++---
 arch/x86/kernel/dumpstack_32.c    | 25 ++++--------------------
 arch/x86/kernel/dumpstack_64.c    | 30 ++++------------------------
 4 files changed, 33 insertions(+), 68 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 7646fb2..3552f5e 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -50,36 +50,41 @@ void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
 
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
-#define get_bp(bp) asm("movl %%ebp, %0" : "=r" (bp) :)
 #else
 #define STACKSLOTS_PER_LINE 4
-#define get_bp(bp) asm("movq %%rbp, %0" : "=r" (bp) :)
 #endif
 
 #ifdef CONFIG_FRAME_POINTER
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
-	unsigned long bp;
-
 	if (regs)
-		return regs->bp;
+		return (unsigned long *)regs->bp;
 
-	if (task == current) {
-		/* Grab bp right from our regs */
-		get_bp(bp);
-		return bp;
-	}
+	if (!task || task == current)
+		return __builtin_frame_address(0);
 
-	return ((struct inactive_task_frame *)task->thread.sp)->bp;
+	return (unsigned long *)((struct inactive_task_frame *)task->thread.sp)->bp;
 }
 #else
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
-	return 0;
+	return NULL;
+}
+#endif /* CONFIG_FRAME_POINTER */
+
+static inline unsigned long *
+get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
+{
+	if (regs)
+		return (unsigned long *)kernel_stack_pointer(regs);
+
+	if (!task || task == current)
+		return __builtin_frame_address(0);
+
+	return (unsigned long *)task->thread.sp;
 }
-#endif
 
 extern void
 show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
@@ -106,7 +111,7 @@ static inline unsigned long caller_frame_pointer(void)
 {
 	struct stack_frame *frame;
 
-	get_bp(frame);
+	frame = __builtin_frame_address(0);
 
 #ifdef CONFIG_FRAME_POINTER
 	frame = frame->next_frame;
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index f0ddf85..6d6f468 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -170,15 +170,14 @@ show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
 	unsigned long bp = 0;
-	unsigned long stack;
 
 	/*
 	 * Stack frames below this one aren't interesting.  Don't show them
 	 * if we're printing for %current.
 	 */
 	if (!sp && (!task || task == current)) {
-		sp = &stack;
-		bp = stack_frame(current, NULL);
+		sp = get_stack_pointer(current, NULL);
+		bp = (unsigned long)get_frame_pointer(current, NULL);
 	}
 
 	show_stack_log_lvl(task, NULL, sp, bp, "");
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 0967571..358fe1c 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -46,19 +46,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	int graph = 0;
 	u32 *prev_esp;
 
-	if (!task)
-		task = current;
-
-	if (!stack) {
-		unsigned long dummy;
-
-		stack = &dummy;
-		if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
 	for (;;) {
 		void *end_stack;
@@ -95,14 +85,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *stack;
 	int i;
 
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 066eb5c7..7f3b806 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -151,25 +151,14 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 {
 	const unsigned cpu = get_cpu();
 	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
-	unsigned long dummy;
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
 
-	if (!task)
-		task = current;
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
-	if (!stack) {
-		if (regs)
-			stack = (unsigned long *)regs->sp;
-		else if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-		else
-			stack = &dummy;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
 	/*
 	 * Print function call entries in all stacks, starting at the
 	 * current stack address. If the stacks consist of nested
@@ -256,18 +245,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
 	irq_stack     = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
-	/*
-	 * Debugging aid: "show_stack(NULL, NULL);" prints the
-	 * back trace for this cpu:
-	 */
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 5/6] x86/dumpstack: remove unnecessary stack pointer arguments
  2016-08-24 16:50 [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
                   ` (3 preceding siblings ...)
  2016-08-24 16:50 ` [PATCH 4/6] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
@ 2016-08-24 16:50 ` Josh Poimboeuf
  2016-09-08  9:50   ` [tip:x86/asm] x86/dumpstack: Remove " tip-bot for Josh Poimboeuf
  2016-08-24 16:50 ` [PATCH 6/6] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
  2016-09-01 13:13 ` [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
  6 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 16:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

When calling show_stack_log_lvl() or dump_trace() with a regs argument,
providing a stack pointer or frame pointer is redundant.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>d
---
 arch/x86/kernel/dumpstack.c    | 2 +-
 arch/x86/kernel/dumpstack_32.c | 2 +-
 arch/x86/kernel/dumpstack_64.c | 5 +----
 arch/x86/oprofile/backtrace.c  | 5 +----
 4 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 6d6f468..c6c6c39 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -185,7 +185,7 @@ void show_stack(struct task_struct *task, unsigned long *sp)
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(current, regs, (unsigned long *)regs->sp, regs->bp, "");
+	show_stack_log_lvl(current, regs, NULL, 0, "");
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 358fe1c..c533b8b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -122,7 +122,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		pr_emerg("Stack:\n");
-		show_stack_log_lvl(NULL, regs, &regs->sp, 0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_EMERG);
 
 		pr_emerg("Code:");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 7f3b806..b243352 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -283,9 +283,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_regs(struct pt_regs *regs)
 {
 	int i;
-	unsigned long sp;
 
-	sp = regs->sp;
 	show_regs_print_info(KERN_DEFAULT);
 	__show_regs(regs, 1);
 
@@ -300,8 +298,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_DEFAULT "Stack:\n");
-		show_stack_log_lvl(NULL, regs, (unsigned long *)sp,
-				   0, KERN_DEFAULT);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_DEFAULT);
 
 		printk(KERN_DEFAULT "Code: ");
 
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 2ef6c8b..d950f9e 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,8 +113,6 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
-		unsigned long stack = kernel_stack_pointer(regs);
-
 		if (!depth)
 			return;
 
@@ -122,8 +120,7 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 		if (!--depth)
 			return;
 
-		dump_trace(NULL, regs, (unsigned long *)stack, 0,
-			   &backtrace_ops, &depth);
+		dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
 		return;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 6/6] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-08-24 16:50 [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
                   ` (4 preceding siblings ...)
  2016-08-24 16:50 ` [PATCH 5/6] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
@ 2016-08-24 16:50 ` Josh Poimboeuf
  2016-09-08  7:04   ` Ingo Molnar
  2016-09-01 13:13 ` [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
  6 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 16:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

show_stack_log_lvl() and dump_trace() are already preemption safe:

- If they're running in irq or exception context, preemption is already
  disabled and the percpu stack pointers can be trusted.

- If they're running with preemption enabled, they must be running on
  the task stack anyway, so it doesn't matter if they're comparing the
  stack pointer against a percpu stack pointer from this CPU or another
  one: either way it won't match.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/dumpstack_32.c | 14 ++++++--------
 arch/x86/kernel/dumpstack_64.c | 26 +++++++++-----------------
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index c533b8b..b07d5c9 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -24,16 +24,16 @@ static void *is_irq_stack(void *p, void *irq)
 }
 
 
-static void *is_hardirq_stack(unsigned long *stack, int cpu)
+static void *is_hardirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(hardirq_stack, cpu);
+	void *irq = this_cpu_read(hardirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
 
-static void *is_softirq_stack(unsigned long *stack, int cpu)
+static void *is_softirq_stack(unsigned long *stack);
 {
-	void *irq = per_cpu(softirq_stack, cpu);
+	void *irq = this_cpu_read(softirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
@@ -42,7 +42,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
 	int graph = 0;
 	u32 *prev_esp;
 
@@ -53,9 +52,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	for (;;) {
 		void *end_stack;
 
-		end_stack = is_hardirq_stack(stack, cpu);
+		end_stack = is_hardirq_stack(stack);
 		if (!end_stack)
-			end_stack = is_softirq_stack(stack, cpu);
+			end_stack = is_softirq_stack(stack);
 
 		bp = ops->walk_stack(task, stack, bp, ops, data,
 				     end_stack, &graph);
@@ -74,7 +73,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			break;
 		touch_nmi_watchdog();
 	}
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index b243352..07373be 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -31,8 +31,8 @@ static char x86_stack_ids[][8] = {
 #endif
 };
 
-static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
-					 unsigned *usedp, char **idp)
+static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
+					 char **idp)
 {
 	unsigned k;
 
@@ -41,7 +41,7 @@ static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
 	 * 'stack' is in one of them:
 	 */
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		unsigned long end = per_cpu(orig_ist, cpu).ist[k];
+		unsigned long end = raw_cpu_ptr(&orig_ist)->ist[k];
 		/*
 		 * Is 'stack' above this exception frame's end?
 		 * If yes then skip to the next frame.
@@ -111,7 +111,7 @@ enum stack_type {
 };
 
 static enum stack_type
-analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
+analyze_stack(struct task_struct *task, unsigned long *stack,
 	      unsigned long **stack_end, unsigned long *irq_stack,
 	      unsigned *used, char **id)
 {
@@ -121,8 +121,7 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
 	if ((unsigned long)task_stack_page(task) == addr)
 		return STACK_IS_NORMAL;
 
-	*stack_end = in_exception_stack(cpu, (unsigned long)stack,
-					used, id);
+	*stack_end = in_exception_stack((unsigned long)stack, used, id);
 	if (*stack_end)
 		return STACK_IS_EXCEPTION;
 
@@ -149,8 +148,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
-	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
+	unsigned long *irq_stack = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
@@ -169,8 +167,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		enum stack_type stype;
 		char *id;
 
-		stype = analyze_stack(cpu, task, stack, &stack_end,
-				      irq_stack, &used, &id);
+		stype = analyze_stack(task, stack, &stack_end, irq_stack, &used,
+				      &id);
 
 		/* Default finish unless specified to continue */
 		done = 1;
@@ -225,7 +223,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * This handles the process stack:
 	 */
 	bp = ops->walk_stack(task, stack, bp, ops, data, NULL, &graph);
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
@@ -236,13 +233,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *irq_stack_end;
 	unsigned long *irq_stack;
 	unsigned long *stack;
-	int cpu;
 	int i;
 
-	preempt_disable();
-	cpu = smp_processor_id();
-
-	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	irq_stack     = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
 	sp = sp ? : get_stack_pointer(task, regs);
@@ -274,7 +267,6 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		stack++;
 		touch_nmi_watchdog();
 	}
-	preempt_enable();
 
 	pr_cont("\n");
 	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 16:50 ` [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
@ 2016-08-24 17:28   ` Joe Perches
  2016-08-24 18:43     ` Josh Poimboeuf
  2016-08-24 18:03   ` Linus Torvalds
  2016-09-08  9:49   ` [tip:x86/asm] x86/dumpstack: Make " tip-bot for Josh Poimboeuf
  2 siblings, 1 reply; 42+ messages in thread
From: Joe Perches @ 2016-08-24 17:28 UTC (permalink / raw)
  To: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Wed, 2016-08-24 at 11:50 -0500, Josh Poimboeuf wrote:
> Change printk_stack_address() to be useful when called by an unwinder
> outside the context of dump_trace().
> 
> Specifically:
> 
> - printk_stack_address()'s 'data' argument is always used as the log
>   level string.  Make that explicit.

If this is true, and I'm not sure it is as I believe
there are static strings emitted like EOE and IRQ,
shouldn't this bubble up through the calling tree?

> - Call touch_nmi_watchdog().
[]
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
[]
> @@ -26,10 +26,11 @@ int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
>  static int die_counter;
>  
>  static void printk_stack_address(unsigned long address, int reliable,
> -		void *data)
> +				 char *log_lvl)
>  {
> +	touch_nmi_watchdog();
>  	printk("%s [<%p>] %s%pB\n",
> -		(char *)data, (void *)address, reliable ? "" : "? ",
> +		log_lvl, (void *)address, reliable ? "" : "? ",
>  		(void *)address);
>  }
>  
> @@ -148,7 +149,6 @@ static int print_trace_stack(void *data, char *name)
>   */
>  static int print_trace_address(void *data, unsigned long addr, int reliable)
>  {
> -	touch_nmi_watchdog();
>  	printk_stack_address(addr, reliable, data);
>  	return 0;
>  }

like for data here?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 16:50 ` [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
  2016-08-24 17:28   ` Joe Perches
@ 2016-08-24 18:03   ` Linus Torvalds
  2016-08-24 18:22     ` Peter Zijlstra
  2016-09-08  9:49   ` [tip:x86/asm] x86/dumpstack: Make " tip-bot for Josh Poimboeuf
  2 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2016-08-24 18:03 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Wed, Aug 24, 2016 at 12:50 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Change printk_stack_address() to be useful when called by an unwinder
> outside the context of dump_trace().
...
>         printk("%s [<%p>] %s%pB\n",
> -               (char *)data, (void *)address, reliable ? "" : "? ",
> +               log_lvl, (void *)address, reliable ? "" : "? ",
>                 (void *)address);

Side note: we should just remove the "[<hex>]" thing entirely, I
think. It's not just redundant, but when people care about kernel
address randomization it ends up spilling the beans on the offset in
warnings (in oopses you can generally find the data in the stack
backtraces, which we probably should remove too as useless).

So it would be just

        printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);

instead.

For the non-kallsyms case we _could_ also just make the '%pB' format
add the [<>] markers back in case somebody still uses the user-space
kallsyms script that looks up hex numbers.

Right now the hex numbers are not only useless in stack dumps (since
you can't look up symbols using them anyway thanks to randomization),
they are noise that makes the stack traces harder to read. So let's
just remove them?

I realize that is entirely orthogonal to this particular patch, but it
happened to touch this file.

No objections to this series other than that comment.

              Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 18:03   ` Linus Torvalds
@ 2016-08-24 18:22     ` Peter Zijlstra
  2016-08-24 18:37       ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2016-08-24 18:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Wed, Aug 24, 2016 at 02:03:58PM -0400, Linus Torvalds wrote:

> 
> For the non-kallsyms case we _could_ also just make the '%pB' format
> add the [<>] markers back in case somebody still uses the user-space
> kallsyms script that looks up hex numbers.
> 
> Right now the hex numbers are not only useless in stack dumps (since
> you can't look up symbols using them anyway thanks to randomization),
> they are noise that makes the stack traces harder to read. So let's
> just remove them?

I actively disable KASLR on my dev box and feed these hex numbers into
addr2line -ie vmlinux to find where in the function we are.

Having the option to make %pB generate them works for me.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 18:22     ` Peter Zijlstra
@ 2016-08-24 18:37       ` Linus Torvalds
  2016-08-24 19:37         ` Josh Poimboeuf
  2016-08-31 16:53         ` Josh Poimboeuf
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2016-08-24 18:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Wed, Aug 24, 2016 at 2:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> I actively disable KASLR on my dev box and feed these hex numbers into
> addr2line -ie vmlinux to find where in the function we are.
>
> Having the option to make %pB generate them works for me.

Yeah, considering that this is the only place this is used, changing
%pB sounds quite reasonable.

We could perhaps make %pB show the hex numbers and address (so pB
would expand to "[<hex>] symbolname".if

 (a) not randomizing (so the hex numbers _may_ be useful)

 (b) kptr_restrict is 0 (so the hex numbers are "safe" in the dmesg)

and fall back to just the symbolic name if either of those aren't true?

And obviously, if KALLSYMS isn't enabled, you always show hex
numbers.. That's already the case (but we might want to add the "[<>}'
markers around the hex numbers just to make the user space automation
we do have work).

            Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 17:28   ` Joe Perches
@ 2016-08-24 18:43     ` Josh Poimboeuf
  2016-08-24 19:07       ` Joe Perches
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 18:43 UTC (permalink / raw)
  To: Joe Perches
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Wed, Aug 24, 2016 at 10:28:38AM -0700, Joe Perches wrote:
> On Wed, 2016-08-24 at 11:50 -0500, Josh Poimboeuf wrote:
> > Change printk_stack_address() to be useful when called by an unwinder
> > outside the context of dump_trace().
> > 
> > Specifically:
> > 
> > - printk_stack_address()'s 'data' argument is always used as the log
> >   level string.  Make that explicit.
> 
> If this is true, and I'm not sure it is as I believe
> there are static strings emitted like EOE and IRQ,
> shouldn't this bubble up through the calling tree?
>
> > - Call touch_nmi_watchdog().
> []
> > diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> []
> > @@ -26,10 +26,11 @@ int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
> >  static int die_counter;
> >  
> >  static void printk_stack_address(unsigned long address, int reliable,
> > -		void *data)
> > +				 char *log_lvl)
> >  {
> > +	touch_nmi_watchdog();
> >  	printk("%s [<%p>] %s%pB\n",
> > -		(char *)data, (void *)address, reliable ? "" : "? ",
> > +		log_lvl, (void *)address, reliable ? "" : "? ",
> >  		(void *)address);
> >  }
> >  
> > @@ -148,7 +149,6 @@ static int print_trace_stack(void *data, char *name)
> >   */
> >  static int print_trace_address(void *data, unsigned long addr, int reliable)
> >  {
> > -	touch_nmi_watchdog();
> >  	printk_stack_address(addr, reliable, data);
> >  	return 0;
> >  }
> 
> like for data here?

This function needs to keep its 'void *data' argument because it's a
callback for stacktrace_ops, so it has to conform to the callback
interface.  'data' is used for passing a pointer to an opaque data
structure to the callback.

Also this is the only caller of printk_stack_address(), so there's
nowhere else to bubble it up to.

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 18:43     ` Josh Poimboeuf
@ 2016-08-24 19:07       ` Joe Perches
  2016-08-24 19:24         ` Josh Poimboeuf
  0 siblings, 1 reply; 42+ messages in thread
From: Joe Perches @ 2016-08-24 19:07 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Wed, 2016-08-24 at 13:43 -0500, Josh Poimboeuf wrote:
> On Wed, Aug 24, 2016 at 10:28:38AM -0700, Joe Perches wrote:
> > On Wed, 2016-08-24 at 11:50 -0500, Josh Poimboeuf wrote:
> > > Change printk_stack_address() to be useful when called by an unwinder
> > > outside the context of dump_trace().
> > > 
> > > Specifically:
> > > 
> > > - printk_stack_address()'s 'data' argument is always used as the log
> > >   level string.  Make that explicit.
[]
> > If this is true, and I'm not sure it is as I believe
> > there are static strings emitted like EOE and IRQ,
> > shouldn't this bubble up through the calling tree?

> []
> This function needs to keep its 'void *data' argument because it's a
> callback for stacktrace_ops, so it has to conform to the callback
> interface.  'data' is used for passing a pointer to an opaque data
> structure to the callback.
> 
> Also this is the only caller of printk_stack_address(), so there's
> nowhere else to bubble it up to.

And that shows that print_stack_address(data is not always a log level.
ie: walk_stack uses it to print a string not a log level.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 19:07       ` Joe Perches
@ 2016-08-24 19:24         ` Josh Poimboeuf
  2016-08-24 19:57           ` Joe Perches
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 19:24 UTC (permalink / raw)
  To: Joe Perches
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Wed, Aug 24, 2016 at 12:07:06PM -0700, Joe Perches wrote:
> On Wed, 2016-08-24 at 13:43 -0500, Josh Poimboeuf wrote:
> > On Wed, Aug 24, 2016 at 10:28:38AM -0700, Joe Perches wrote:
> > > On Wed, 2016-08-24 at 11:50 -0500, Josh Poimboeuf wrote:
> > > > Change printk_stack_address() to be useful when called by an unwinder
> > > > outside the context of dump_trace().
> > > > 
> > > > Specifically:
> > > > 
> > > > - printk_stack_address()'s 'data' argument is always used as the log
> > > >   level string.  Make that explicit.
> []
> > > If this is true, and I'm not sure it is as I believe
> > > there are static strings emitted like EOE and IRQ,
> > > shouldn't this bubble up through the calling tree?
> 
> > []
> > This function needs to keep its 'void *data' argument because it's a
> > callback for stacktrace_ops, so it has to conform to the callback
> > interface.  'data' is used for passing a pointer to an opaque data
> > structure to the callback.
> > 
> > Also this is the only caller of printk_stack_address(), so there's
> > nowhere else to bubble it up to.
> 
> And that shows that print_stack_address(data is not always a log level.
> ie: walk_stack uses it to print a string not a log level.

Hm, can you be more specific?  As far as I can tell, here's the only
possible call path to print_trace_address() and printk_stack_address():

show_trace_log_lvl()
  dump_trace()				// ops is print_trace_op
    print_context_stack()		// ops->walk_stack
      print_trace_address()		// ops->address
          printk_stack_address()

So 'data' is a sneaky way to pass 'log_lvl' from show_trace_log_lvl() to
print_trace_address(), without dump_trace() and print_context_stack()
knowing what it is, because they're used in other places where 'data'
means something else.

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 18:37       ` Linus Torvalds
@ 2016-08-24 19:37         ` Josh Poimboeuf
  2016-08-25 17:49           ` Josh Poimboeuf
  2016-08-31 16:53         ` Josh Poimboeuf
  1 sibling, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-24 19:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Wed, Aug 24, 2016 at 02:37:21PM -0400, Linus Torvalds wrote:
> On Wed, Aug 24, 2016 at 2:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > I actively disable KASLR on my dev box and feed these hex numbers into
> > addr2line -ie vmlinux to find where in the function we are.
> >
> > Having the option to make %pB generate them works for me.
> 
> Yeah, considering that this is the only place this is used, changing
> %pB sounds quite reasonable.

There's now another use of '%pB' in proc_pid_stack() in the tip tree: I
changed it to '%pB' from '%pS'.  But I think the modified '%pB' would
work there as well.

> We could perhaps make %pB show the hex numbers and address (so pB
> would expand to "[<hex>] symbolname".if
> 
>  (a) not randomizing (so the hex numbers _may_ be useful)
> 
>  (b) kptr_restrict is 0 (so the hex numbers are "safe" in the dmesg)
> 
> and fall back to just the symbolic name if either of those aren't true?

Do we really need to check for both?  '%pK' only checks kptr_restrict.
I'd think we should be consistent with that.  And maybe there are some
scenarios where the actual text addresses provide useful debug
information if KASLR is enabled and kptr_restrict is zero.

> And obviously, if KALLSYMS isn't enabled, you always show hex
> numbers.. That's already the case (but we might want to add the "[<>}'
> markers around the hex numbers just to make the user space automation
> we do have work).

Even if kptr_restrict is set?

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 19:24         ` Josh Poimboeuf
@ 2016-08-24 19:57           ` Joe Perches
  0 siblings, 0 replies; 42+ messages in thread
From: Joe Perches @ 2016-08-24 19:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Wed, 2016-08-24 at 14:24 -0500, Josh Poimboeuf wrote:
> On Wed, Aug 24, 2016 at 12:07:06PM -0700, Joe Perches wrote:
> > On Wed, 2016-08-24 at 13:43 -0500, Josh Poimboeuf wrote:
> > > On Wed, Aug 24, 2016 at 10:28:38AM -0700, Joe Perches wrote:
> > > > On Wed, 2016-08-24 at 11:50 -0500, Josh Poimboeuf wrote:
> > > > > Change printk_stack_address() to be useful when called by an unwinder
> > > > > outside the context of dump_trace().
> > > > > 
> > > > > Specifically:
> > > > > 
> > > > > - printk_stack_address()'s 'data' argument is always used as the log
> > > > >   level string.  Make that explicit.
> > []
> > > > If this is true, and I'm not sure it is as I believe
> > > > there are static strings emitted like EOE and IRQ,
> > > > shouldn't this bubble up through the calling tree?
> > > []
> > > This function needs to keep its 'void *data' argument because it's a
> > > callback for stacktrace_ops, so it has to conform to the callback
> > > interface.  'data' is used for passing a pointer to an opaque data
> > > structure to the callback.
> > > 
> > > Also this is the only caller of printk_stack_address(), so there's
> > > nowhere else to bubble it up to.
> > And that shows that print_stack_address(data is not always a log level.
> > ie: walk_stack uses it to print a string not a log level.
> Hm, can you be more specific?  As far as I can tell, here's the only
> possible call path to print_trace_address() and printk_stack_address():
> 
> show_trace_log_lvl()
>   dump_trace()				// ops is print_trace_op
>     print_context_stack()		// ops->walk_stack
>       print_trace_address()		// ops->address
>           printk_stack_address()
> 
> So 'data' is a sneaky way to pass 'log_lvl' from show_trace_log_lvl() to
> print_trace_address(), without dump_trace() and print_context_stack()
> knowing what it is, because they're used in other places where 'data'
> means something else.

hmm, perhaps I got lost in a twisting maze of little callbacks.
I'll drop some stuff and see where I am next time.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 19:37         ` Josh Poimboeuf
@ 2016-08-25 17:49           ` Josh Poimboeuf
  2016-08-25 20:41             ` Kees Cook
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-25 17:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Wed, Aug 24, 2016 at 02:37:07PM -0500, Josh Poimboeuf wrote:
> On Wed, Aug 24, 2016 at 02:37:21PM -0400, Linus Torvalds wrote:
> > On Wed, Aug 24, 2016 at 2:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > I actively disable KASLR on my dev box and feed these hex numbers into
> > > addr2line -ie vmlinux to find where in the function we are.
> > >
> > > Having the option to make %pB generate them works for me.
> > 
> > Yeah, considering that this is the only place this is used, changing
> > %pB sounds quite reasonable.
> 
> There's now another use of '%pB' in proc_pid_stack() in the tip tree: I
> changed it to '%pB' from '%pS'.  But I think the modified '%pB' would
> work there as well.
> 
> > We could perhaps make %pB show the hex numbers and address (so pB
> > would expand to "[<hex>] symbolname".if
> > 
> >  (a) not randomizing (so the hex numbers _may_ be useful)
> > 
> >  (b) kptr_restrict is 0 (so the hex numbers are "safe" in the dmesg)
> > 
> > and fall back to just the symbolic name if either of those aren't true?
> 
> Do we really need to check for both?  '%pK' only checks kptr_restrict.
> I'd think we should be consistent with that.  And maybe there are some
> scenarios where the actual text addresses provide useful debug
> information if KASLR is enabled and kptr_restrict is zero.

So I was looking at implementing this, and I noticed that '%pK' prints
"pK-error" if it's called from interrupt context when kptr_restrict==1.
Because checking CAP_SYSLOG would be meaningless in that case.

I don't really understand the point of the "pK-error" thing.  Any reason
why we can't print zero, i.e., just degrade the kptr_restrict from 1 to
2 in an interrupt?

That would make the '%pK' code simpler and usable from interrupt
context.  Also it would make its behavior consistent with the proposed
'%pB' changes, and the kptr_restrict code could be shared between '%pK'
and '%pB'.

Kess (or others), any objections if I make that change?

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-25 17:49           ` Josh Poimboeuf
@ 2016-08-25 20:41             ` Kees Cook
  2016-08-25 21:07               ` Josh Poimboeuf
  0 siblings, 1 reply; 42+ messages in thread
From: Kees Cook @ 2016-08-25 20:41 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andy Lutomirski, Steven Rostedt,
	Brian Gerst, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Thu, Aug 25, 2016 at 1:49 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, Aug 24, 2016 at 02:37:07PM -0500, Josh Poimboeuf wrote:
>> On Wed, Aug 24, 2016 at 02:37:21PM -0400, Linus Torvalds wrote:
>> > On Wed, Aug 24, 2016 at 2:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > >
>> > > I actively disable KASLR on my dev box and feed these hex numbers into
>> > > addr2line -ie vmlinux to find where in the function we are.
>> > >
>> > > Having the option to make %pB generate them works for me.
>> >
>> > Yeah, considering that this is the only place this is used, changing
>> > %pB sounds quite reasonable.
>>
>> There's now another use of '%pB' in proc_pid_stack() in the tip tree: I
>> changed it to '%pB' from '%pS'.  But I think the modified '%pB' would
>> work there as well.
>>
>> > We could perhaps make %pB show the hex numbers and address (so pB
>> > would expand to "[<hex>] symbolname".if
>> >
>> >  (a) not randomizing (so the hex numbers _may_ be useful)
>> >
>> >  (b) kptr_restrict is 0 (so the hex numbers are "safe" in the dmesg)
>> >
>> > and fall back to just the symbolic name if either of those aren't true?
>>
>> Do we really need to check for both?  '%pK' only checks kptr_restrict.
>> I'd think we should be consistent with that.  And maybe there are some
>> scenarios where the actual text addresses provide useful debug
>> information if KASLR is enabled and kptr_restrict is zero.
>
> So I was looking at implementing this, and I noticed that '%pK' prints
> "pK-error" if it's called from interrupt context when kptr_restrict==1.
> Because checking CAP_SYSLOG would be meaningless in that case.
>
> I don't really understand the point of the "pK-error" thing.  Any reason
> why we can't print zero, i.e., just degrade the kptr_restrict from 1 to
> 2 in an interrupt?
>
> That would make the '%pK' code simpler and usable from interrupt
> context.  Also it would make its behavior consistent with the proposed
> '%pB' changes, and the kptr_restrict code could be shared between '%pK'
> and '%pB'.
>
> Kess (or others), any objections if I make that change?

I don't mind this becoming "0" on error. I suspect the rationale was
to make it a discoverable condition and to avoid confusion.

As far as expanding the usage, I'm still in favor, though there is
work planned to make kptr_restrict go away in favor of having
blacklisted destination buffers, etc. I'm hoping to have this as part
of the continuing usercopy hardening work.

Regardless, aren't these values being written to dmesg buffer?
Traditionally we've not bothered censoring values that go there, as
"dmesg_restrict" exists to protect those contents.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-25 20:41             ` Kees Cook
@ 2016-08-25 21:07               ` Josh Poimboeuf
       [not found]                 ` <CA+55aFy3sgA4=ZPhiDWiRMvWj9QPhUd0JBdUr1hm_6G0aSC6uw@mail.gmail.com>
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-25 21:07 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andy Lutomirski, Steven Rostedt,
	Brian Gerst, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Thu, Aug 25, 2016 at 04:41:29PM -0400, Kees Cook wrote:
> On Thu, Aug 25, 2016 at 1:49 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, Aug 24, 2016 at 02:37:07PM -0500, Josh Poimboeuf wrote:
> >> On Wed, Aug 24, 2016 at 02:37:21PM -0400, Linus Torvalds wrote:
> >> > On Wed, Aug 24, 2016 at 2:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > >
> >> > > I actively disable KASLR on my dev box and feed these hex numbers into
> >> > > addr2line -ie vmlinux to find where in the function we are.
> >> > >
> >> > > Having the option to make %pB generate them works for me.
> >> >
> >> > Yeah, considering that this is the only place this is used, changing
> >> > %pB sounds quite reasonable.
> >>
> >> There's now another use of '%pB' in proc_pid_stack() in the tip tree: I
> >> changed it to '%pB' from '%pS'.  But I think the modified '%pB' would
> >> work there as well.
> >>
> >> > We could perhaps make %pB show the hex numbers and address (so pB
> >> > would expand to "[<hex>] symbolname".if
> >> >
> >> >  (a) not randomizing (so the hex numbers _may_ be useful)
> >> >
> >> >  (b) kptr_restrict is 0 (so the hex numbers are "safe" in the dmesg)
> >> >
> >> > and fall back to just the symbolic name if either of those aren't true?
> >>
> >> Do we really need to check for both?  '%pK' only checks kptr_restrict.
> >> I'd think we should be consistent with that.  And maybe there are some
> >> scenarios where the actual text addresses provide useful debug
> >> information if KASLR is enabled and kptr_restrict is zero.
> >
> > So I was looking at implementing this, and I noticed that '%pK' prints
> > "pK-error" if it's called from interrupt context when kptr_restrict==1.
> > Because checking CAP_SYSLOG would be meaningless in that case.
> >
> > I don't really understand the point of the "pK-error" thing.  Any reason
> > why we can't print zero, i.e., just degrade the kptr_restrict from 1 to
> > 2 in an interrupt?
> >
> > That would make the '%pK' code simpler and usable from interrupt
> > context.  Also it would make its behavior consistent with the proposed
> > '%pB' changes, and the kptr_restrict code could be shared between '%pK'
> > and '%pB'.
> >
> > Kess (or others), any objections if I make that change?

Ahem, Kees, sorry :-)

> I don't mind this becoming "0" on error. I suspect the rationale was
> to make it a discoverable condition and to avoid confusion.
> 
> As far as expanding the usage, I'm still in favor, though there is
> work planned to make kptr_restrict go away in favor of having
> blacklisted destination buffers, etc. I'm hoping to have this as part
> of the continuing usercopy hardening work.
> 
> Regardless, aren't these values being written to dmesg buffer?
> Traditionally we've not bothered censoring values that go there, as
> "dmesg_restrict" exists to protect those contents.

Ah, the plot thickens.  I didn't know about 'dmesg_restrict'.  So I
guess we don't have to restrict the stack dump addresses after all,
since the entire dmesg buffer is protected by syslog()?

If so, I'm thinking that expanding '%pB' wouldn't be worthwhile after
all, because its two users would have two different requirements for
printing the address: /proc/<pid>/stack needs to use kptr_restrict but
the unwinder doesn't.

In which case I think the current code is fine.

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
       [not found]                 ` <CA+55aFy3sgA4=ZPhiDWiRMvWj9QPhUd0JBdUr1hm_6G0aSC6uw@mail.gmail.com>
@ 2016-08-26  2:19                   ` Kees Cook
  2016-08-26  3:19                   ` Josh Poimboeuf
  1 sibling, 0 replies; 42+ messages in thread
From: Kees Cook @ 2016-08-26  2:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Frederic Weisbecker, Thomas Gleixner,
	H . Peter Anvin, Nilay Vaish, the arch/x86 maintainers,
	Steven Rostedt, Andy Lutomirski, Ingo Molnar,
	Linux Kernel Mailing List, Brian Gerst, Byungchul Park,
	Peter Zijlstra

On Thu, Aug 25, 2016 at 5:23 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Aug 25, 2016 2:08 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
>>
>> Ah, the plot thickens.  I didn't know about 'dmesg_restrict'.  So I
>> guess we don't have to restrict the stack dump addresses after all,
>> since the entire dmesg buffer is protected by syslog()?
>
> No.
>
> Guys, the whole dmesg_restrict thing is a joke. You can't restrict access to
> system messages in general. It's just a stupid idea.

I'm not advocating that it's a globally useful protection, I was just
trying to point out that so much stuff is already exposed in the
system log that it's likely not a great use of time to think about
censoring things there right now. Obviously if it both improves
debuggability _and_ removes raw addresses from the log, I'm all for
it. :)

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
       [not found]                 ` <CA+55aFy3sgA4=ZPhiDWiRMvWj9QPhUd0JBdUr1hm_6G0aSC6uw@mail.gmail.com>
  2016-08-26  2:19                   ` Kees Cook
@ 2016-08-26  3:19                   ` Josh Poimboeuf
  2016-08-26  4:40                     ` Linus Torvalds
  1 sibling, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-26  3:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, H . Peter Anvin,
	Nilay Vaish, the arch/x86 maintainers, Steven Rostedt,
	Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List,
	Brian Gerst, Byungchul Park, Peter Zijlstra

On Thu, Aug 25, 2016 at 02:23:35PM -0700, Linus Torvalds wrote:
> On Aug 25, 2016 2:08 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
> >
> > Ah, the plot thickens.  I didn't know about 'dmesg_restrict'.  So I
> > guess we don't have to restrict the stack dump addresses after all,
> > since the entire dmesg buffer is protected by syslog()?
> 
> No.
> 
> Guys, the whole dmesg_restrict thing is a joke. You can't restrict access
> to system messages in general. It's just a stupid idea.
> 
> So stop thinking like some theoretical security people - we have more than
> enough of those. Start thinking like *practical* people that actually care
> about the user experience, and interfaces, and then within those confined,
> think about security.
> 
> Because remember: if you don't get that security comes *second*, you're not
> a security person, you're just a joke. People actually *using* the system
> come first.

So I'm not all that familiar with this pointer restriction stuff.  And I
don't claim to be a security person.  I'm just trying to understand what
the real issues are so we can fix them.

And anyway I thought this whole discussion was trying to resolve a
security concern which you brought up: not leaking kernel addresses to
unprivileged user space.

> So stop with the idiotic security flags that same people cannot actually
> use, either because they restrict use too much, or because they are so
> expensive that they aren't practical by default.
> 
> The things is, we should strive to do something *useful*. And
> dmesg_restrict is not that.

For an oops, there are other opportunities for address leakage besides
the stack trace function addresses.  There's the raw stack dump which
dumps the first 12 stack entries.  And there's the register dump.  I'm
pretty sure we don't want to get rid of those.

I suppose we could come up with some innovative way to filter or
sanitize kernel addresses from the stack dump and the registers.  But
that probably hurts usability for kernel developers.

Another issue is that there are a lot of duplicate symbol names in the
kernel.  So the symbol name alone might not be enough to disambiguate
the function address.

Not to mention the fact that today there are a gazillion uses of
printk() with '%p' in the kernel.

So yes, dmesg_restrict sounds useful to me.  It's a way to prevent users
from seeing kernel addresses without affecting my ability to debug
issues.  For a locked down system, why would non-root users need to
access dmesg anyway?

> In contrast, striving to just get the symbol names - but not the hex
> addresses - is actually both more useful and more secure than what we have
> now.

How exactly does removing data from the stack dump make it more useful?

> And dammit, even addr2line should just be able to understand the "symbol+
> offset" format, so even that is a really bad reason to show the hex number.

Maybe addr2line *should* be able to understand "symbol+offset", but it
doesn't, so it breaks my workflow.  Is there another lightweight tool
out there (i.e., not gdb) which does it?

I also wonder if other tools might have similar issues.  Though I can't
think of anything off the top of my head.

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-26  3:19                   ` Josh Poimboeuf
@ 2016-08-26  4:40                     ` Linus Torvalds
  2016-08-26  5:56                       ` Josh Poimboeuf
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2016-08-26  4:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, H . Peter Anvin,
	Nilay Vaish, the arch/x86 maintainers, Steven Rostedt,
	Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List,
	Brian Gerst, Byungchul Park, Peter Zijlstra

On Thu, Aug 25, 2016 at 8:19 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> For an oops, there are other opportunities for address leakage besides
> the stack trace function addresses.  There's the raw stack dump which
> dumps the first 12 stack entries.  And there's the register dump.  I'm
> pretty sure we don't want to get rid of those.

We actually probably *do* want to get rid of the stack dump. It's
likely not really useful any more, and more legacy noise.

The register contents we definitely don't want to remove, obviously.
But apart from EIP itself (and LR etc on other architectures), those
actually rather seldom contain code addresses, so that kind of data is
rather harder to misuse.

That said, if you can trigger an oops, you do quite likely have a
security problem already.

So oopses aren't necessarily the first thing to worry about. I'd worry
more about things like WARN_ON_ONCE() that are much more likely things
that might be triggerable (ie we do occasionally have things like
warning for legacy system calls that shouldn't be used any more)

> I suppose we could come up with some innovative way to filter or
> sanitize kernel addresses from the stack dump and the registers.  But
> that probably hurts usability for kernel developers.

Yeah, but see above: an oops really does tend to often be a security
issue on its own, especially if it can be triggered arbitrarily by an
attacker.

> Another issue is that there are a lot of duplicate symbol names in the
> kernel.  So the symbol name alone might not be enough to disambiguate
> the function address.

That is true. It's seldom a big issue, though. Nobody actually uses
the address for anything _anyway_, so people end up disambiguating
those things based on context regardless.

Again, something like addr2line could be an exception, but (a) that
thing is useless in most situations due to randomization and (b) if
you don't randomize then the whole discussion is moot.

Plus add2line could just show all options, and then you end up doing
human disambiguation anyway.

> Not to mention the fact that today there are a gazillion uses of
> printk() with '%p' in the kernel.

Yes. And some of them have been stupiud security issues on their own,
and have nothing to do with symbol names. See for example commit
31b0b385f69d ("nf_conntrack: avoid kernel pointer value leak in slab
name")

> So yes, dmesg_restrict sounds useful to me.  It's a way to prevent users
> from seeing kernel addresses without affecting my ability to debug
> issues.  For a locked down system, why would non-root users need to
> access dmesg anyway?

That's the point. It is only useful for locked-down systems.

But that also means that IT IS NOT USEFUL AS A SECURITY ARGUMENT -
since it's simply not relevant to most systems out there.

Most systems aren't locked down.

> How exactly does removing data from the stack dump make it more useful?

I actually spend time cleaning up commit messages in logs, because
useless data that isn't actually information (random hex numbers) is
actively detrimental.

It makes commit logs less legible.

It also makes it harder to parse dumps.

It's not useful. That makes it actively bad.

I probably look at more oops reports than most people. I have not
found the hex numbers useful for the last five years, because they are
just randomized crap.

The stack content thing just makes code scroll off the screen etc, for example.

So in order for data to be useful, it has to be more than "data". It
has to be "information". More random useless hex noise is not good.

                   Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-26  4:40                     ` Linus Torvalds
@ 2016-08-26  5:56                       ` Josh Poimboeuf
       [not found]                         ` <CA+55aFy8FPRiEr-4p++TGj+keTNsg781q1E1FQZ7z4+nAs0ZaQ@mail.gmail.com>
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-26  5:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, H . Peter Anvin,
	Nilay Vaish, the arch/x86 maintainers, Steven Rostedt,
	Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List,
	Brian Gerst, Byungchul Park, Peter Zijlstra

On Thu, Aug 25, 2016 at 09:40:12PM -0700, Linus Torvalds wrote:
> On Thu, Aug 25, 2016 at 8:19 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > So yes, dmesg_restrict sounds useful to me.  It's a way to prevent users
> > from seeing kernel addresses without affecting my ability to debug
> > issues.  For a locked down system, why would non-root users need to
> > access dmesg anyway?
> 
> That's the point. It is only useful for locked-down systems.
> 
> But that also means that IT IS NOT USEFUL AS A SECURITY ARGUMENT -
> since it's simply not relevant to most systems out there.
> 
> Most systems aren't locked down.

Ok, so maybe removing kernel text addresses from the stack dump wouldn't
be the end of the world.

But I still don't quite understand your statement that dmesg_restrict is
only useful for locked down systems.

To prevent kernel address disclosure, it seems we already rely on the
user setting kptr_restrict today, otherwise I can do cat
/proc/self/stack and the game is already lost, right?

So what's the difference between expecting the user to set kptr_restrict
vs dmesg_restrict?

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
       [not found]                         ` <CA+55aFy8FPRiEr-4p++TGj+keTNsg781q1E1FQZ7z4+nAs0ZaQ@mail.gmail.com>
@ 2016-08-26 13:30                           ` Josh Poimboeuf
  0 siblings, 0 replies; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-26 13:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frederic Weisbecker, Thomas Gleixner, Kees Cook, H . Peter Anvin,
	the arch/x86 maintainers, Nilay Vaish, Ingo Molnar,
	Andy Lutomirski, Steven Rostedt, Linux Kernel Mailing List,
	Brian Gerst, Peter Zijlstra, Byungchul Park

On Thu, Aug 25, 2016 at 11:12:40PM -0700, Linus Torvalds wrote:
> On Aug 25, 2016 10:57 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
> >
> > But I still don't quite understand your statement that dmesg_restrict is
> > only useful for locked down systems.
> >
> > To prevent kernel address disclosure, it seems we already rely on the
> > user setting kptr_restrict today, otherwise I can do cat
> > /proc/self/stack and the game is already lost, right?
> 
> The point is: kptr_restrict actually makes sense, and is widely useful. It
> doesn't really end up hurting normal things. It's a pretty targeted thing,
> and generally doesn't actually hurt. You can still do basic health
> monitoring without having to get elevated privileges, for example.
> 
> Even system maintainers don't want to be root all the time. In fact, I
> suspect that the better a system maintainer you are, the less you want to
> be root - but you'll still want to see logs etc.
> 
> So note the difference between kptr_restrict and dmesg_restrict.
> 
> One is useful in a pretty wide environment, the other simply is not.
> 
> > So what's the difference between expecting the user to set kptr_restrict
> > vs dmesg_restrict?
> 
> Do you see the difference now?
> 
> kptr_restrict simply doesn't hurt as much as dmesg_restrict, so you can
> enable it fairly widely by default.
> 
> That makes it the *much* better security option. Because security options
> that you can't enable aren't actually useful.

Yeah, at least for human-administered systems, that does make sense.
Grumpy sysadmins don't want to type "sudo dmesg" or "sudo journalctl"
because a) they don't like change; and b) using sudo adds risk.

And a security option which is never used is indeed useless.  So *maybe*
that's a good enough argument for expecting the user to only enable
kptr_restrict instead of both.

But with cloud, devops, mobile, embedded, IoT, [insert buzzword], it
seems most systems are actually managed by software nowadays.  Then the
above arguments don't seem to apply, and dmesg_restrict could still be
quite widely useful, no?

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/6] oprofile/x86: add regs->ip to oprofile trace
  2016-08-24 16:50 ` [PATCH 2/6] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
@ 2016-08-30 11:30   ` Robert Richter
  2016-09-08  9:48   ` [tip:x86/asm] oprofile/x86: Add " tip-bot for Josh Poimboeuf
  1 sibling, 0 replies; 42+ messages in thread
From: Robert Richter @ 2016-08-30 11:30 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish, oprofile-list

On 24.08.16 11:50:15, Josh Poimboeuf wrote:
> dump_trace() doesn't add the interrupted instruction's address to the
> trace, so add it manually.  This makes the profile more useful, and also
> makes it more consistent with what perf profiling does.
> 
> Cc: Robert Richter <rric@kernel.org>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/oprofile/backtrace.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
> index cb31a44..2ef6c8b 100644
> --- a/arch/x86/oprofile/backtrace.c
> +++ b/arch/x86/oprofile/backtrace.c
> @@ -114,9 +114,16 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
>  
>  	if (!user_mode(regs)) {
>  		unsigned long stack = kernel_stack_pointer(regs);
> -		if (depth)
> -			dump_trace(NULL, regs, (unsigned long *)stack, 0,
> -				   &backtrace_ops, &depth);
> +
> +		if (!depth)
> +			return;
> +
> +		oprofile_add_trace(regs->ip);
> +		if (!--depth)
> +			return;
> +
> +		dump_trace(NULL, regs, (unsigned long *)stack, 0,
> +			   &backtrace_ops, &depth);

Acked-by: Robert Richter <rric@kernel.org>

Adding oprofile-list in case someone has objections since this may
break userland tools.

Ingo, please apply the patch to tip once this series is ok to you.

Thanks,

-Robert

>  		return;
>  	}
>  
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-24 18:37       ` Linus Torvalds
  2016-08-24 19:37         ` Josh Poimboeuf
@ 2016-08-31 16:53         ` Josh Poimboeuf
  2016-08-31 17:15           ` Linus Torvalds
  1 sibling, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-08-31 16:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Wed, Aug 24, 2016 at 02:37:21PM -0400, Linus Torvalds wrote:
> On Wed, Aug 24, 2016 at 2:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > I actively disable KASLR on my dev box and feed these hex numbers into
> > addr2line -ie vmlinux to find where in the function we are.
> >
> > Having the option to make %pB generate them works for me.
> 
> Yeah, considering that this is the only place this is used, changing
> %pB sounds quite reasonable.
> 
> We could perhaps make %pB show the hex numbers and address (so pB
> would expand to "[<hex>] symbolname".if
> 
>  (a) not randomizing (so the hex numbers _may_ be useful)
> 
>  (b) kptr_restrict is 0 (so the hex numbers are "safe" in the dmesg)
> 
> and fall back to just the symbolic name if either of those aren't true?
> 
> And obviously, if KALLSYMS isn't enabled, you always show hex
> numbers.. That's already the case (but we might want to add the "[<>}'
> markers around the hex numbers just to make the user space automation
> we do have work).

Here's a simple patch to skip the kernel text addresses in the stack
dump, based on top of the up-thread patch set.

It doesn't do the '%pB' thing because now in tip there's a non-printk
user of it in /proc/<pid>/stack, for which there are different rules
(print zeros on kptr_restrict vs skipping the number altogether, for
example).  And anyway that could maybe be added later.

It also doesn't check for randomization or kptr_restrict, which keeps it
simple and maximizes its "usefulness" ;-)

Thoughts?

---

From: Josh Poimboeuf <jpoimboe@redhat.com>
Subject: [PATCH] x86/dumpstack: don't print kernel text addresses in stack dump

Printing kernel text addresses in stack dumps is of questionable value,
especially now that address randomization is becoming common.

It can be a security issue because it leaks kernel addresses.

It also affects the usefulness of the stack dump.  Linus says:

  "I actually spend time cleaning up commit messages in logs, because
  useless data that isn't actually information (random hex numbers) is
  actively detrimental.

  It makes commit logs less legible.

  It also makes it harder to parse dumps.

  It's not useful. That makes it actively bad.

  I probably look at more oops reports than most people. I have not
  found the hex numbers useful for the last five years, because they are
  just randomized crap.

  The stack content thing just makes code scroll off the screen etc, for
  example."

The only real downside to removing these addresses is that they can be
used to disambiguate duplicate symbol names.  However such cases are
rare, and the context of the stack dump should be enough to be able to
figure it out.

Some kernel developers like to use addr2line to convert the addresses to
a file name and line (which only works without randomization).  They can
instead use gdb for that purpose:

  $ echo "list *driver_probe_device+0x223" |gdb vmlinux |grep "is in"
  (gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).

(But note that when there are duplicate symbol names, that will only
show the first symbol it finds.)

Here's an example of what a stack dump looks like after this change:

  kernel BUG at /home/jpoimboe/git/linux/include/linux/scatterlist.h:140!
  invalid opcode: 0000 [#1] PREEMPT SMP
  Modules linked in: ...
  CPU: 0 PID: 338 Comm: systemd-udevd Not tainted 4.8.0-rc3+ #7
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
  task: ffff880078978000 task.stack: ffffc900008f0000
  RIP: sg_init_one+0x87/0xa0
  RSP: 0018:ffffc900008f3a68  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffc900008f3a98 RCX: 0000000000000028
  RDX: 0000000000000041 RSI: 0000000000000000 RDI: 00004100008f3a98
  RBP: ffffc900008f3a80 R08: 00000002a014a073 R09: ffffc900008f3aa0
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008
  R13: ffffc900008f3aa0 R14: 0000000000000000 R15: 0000000000000001
  FS:  00007fa18fc20880(0000) GS:ffff88007d200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fa18fc29000 CR3: 00000000799f8000 CR4: 00000000001406f0
  Stack:
   ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0
   ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321
   0000000000000002 0000000000000000 0000000000000000 0000000000000000
  Call Trace:
   __send_control_msg+0x80/0xf0 [virtio_console]
   virtcons_probe+0x256/0x400 [virtio_console]
   virtio_dev_probe+0x144/0x1e0 [virtio]
   driver_probe_device+0x223/0x430
   __driver_attach+0xe3/0xf0
   ? driver_probe_device+0x430/0x430
   bus_for_each_dev+0x73/0xc0
   driver_attach+0x1e/0x20
   bus_add_driver+0x173/0x270
   ? virtio_cons_early_init+0x1d/0x1d [virtio_console]
   driver_register+0x60/0xe0
   ? virtio_cons_early_init+0x1d/0x1d [virtio_console]
   register_virtio_driver+0x20/0x30 [virtio]
   init+0x9f/0xfe3 [virtio_console]
   do_one_initcall+0x50/0x180
   ? do_init_module+0x27/0x1f9
   ? rcu_read_lock_sched_held+0x45/0x80
   ? kmem_cache_alloc_trace+0x28a/0x2f0
   do_init_module+0x5f/0x1f9
   load_module+0x270b/0x2c20
   ? __symbol_put+0x90/0x90
   ? show_coresize+0x30/0x30
   ? vfs_read+0x121/0x140
   SYSC_finit_module+0xdf/0x110
   SyS_finit_module+0xe/0x10
   entry_SYSCALL_64_fastpath+0x1f/0xbd
  Code: ff 83 e2 03 48 c1 e8 0c 81 e3 ff 0f 00 00 45 89 65 14 48 c1 e0 06 41 89 5d 10 48 01 c8 48 09 d0 5b 49 89 45 08 41 5c 41 5d 5d c3 <0f> 0b 0f 0b 0f 0b 48 8b 0d fc c6 9a 00 eb a6 66 2e 0f 1f 84 00
  RIP: sg_init_one+0x87/0xa0 RSP: ffffc900008f3a68

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/kdebug.h |  1 -
 arch/x86/kernel/dumpstack.c   | 18 ++++--------------
 arch/x86/kernel/process_32.c  |  7 +++----
 arch/x86/kernel/process_64.c  |  3 +--
 arch/x86/mm/fault.c           |  3 +--
 arch/x86/platform/uv/uv_nmi.c |  4 ++--
 6 files changed, 11 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h
index d318811..29a594a 100644
--- a/arch/x86/include/asm/kdebug.h
+++ b/arch/x86/include/asm/kdebug.h
@@ -21,7 +21,6 @@ enum die_val {
 	DIE_NMIUNKNOWN,
 };
 
-extern void printk_address(unsigned long address);
 extern void die(const char *, struct pt_regs *,long);
 extern int __must_check __die(const char *, struct pt_regs *, long);
 extern void show_stack_regs(struct pt_regs *regs);
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index c6c6c39..a56c04c 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -29,14 +29,7 @@ static void printk_stack_address(unsigned long address, int reliable,
 				 char *log_lvl)
 {
 	touch_nmi_watchdog();
-	printk("%s [<%p>] %s%pB\n",
-		log_lvl, (void *)address, reliable ? "" : "? ",
-		(void *)address);
-}
-
-void printk_address(unsigned long address)
-{
-	pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
+	printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
 /*
@@ -277,14 +270,11 @@ int __die(const char *str, struct pt_regs *regs, long err)
 		sp = kernel_stack_pointer(regs);
 		savesegment(ss, ss);
 	}
-	printk(KERN_EMERG "EIP: [<%08lx>] ", regs->ip);
-	print_symbol("%s", regs->ip);
-	printk(" SS:ESP %04x:%08lx\n", ss, sp);
+	printk(KERN_EMERG "EIP: %pS SS:ESP: %04x:%08lx\n",
+	       (void *)regs->ip, ss, sp);
 #else
 	/* Executive summary in case the oops scrolled away */
-	printk(KERN_ALERT "RIP ");
-	printk_address(regs->ip);
-	printk(" RSP <%016lx>\n", regs->sp);
+	printk(KERN_ALERT "RIP: %pS RSP: %016lx\n", (void *)regs->ip, regs->sp);
 #endif
 	return 0;
 }
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 404efdf..9f828a9 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -72,10 +72,9 @@ void __show_regs(struct pt_regs *regs, int all)
 		savesegment(gs, gs);
 	}
 
-	printk(KERN_DEFAULT "EIP: %04x:[<%08lx>] EFLAGS: %08lx CPU: %d\n",
-			(u16)regs->cs, regs->ip, regs->flags,
-			smp_processor_id());
-	print_symbol("EIP is at %s\n", regs->ip);
+	printk(KERN_DEFAULT "EIP: %pS\n", (void *)regs->ip);
+	printk(KERN_DEFAULT "EFLAGS: %08lx CPU: %d\n", regs->flags,
+		smp_processor_id());
 
 	printk(KERN_DEFAULT "EAX: %08lx EBX: %08lx ECX: %08lx EDX: %08lx\n",
 		regs->ax, regs->bx, regs->cx, regs->dx);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b812cd0..d250088 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -60,8 +60,7 @@ void __show_regs(struct pt_regs *regs, int all)
 	unsigned int fsindex, gsindex;
 	unsigned int ds, cs, es;
 
-	printk(KERN_DEFAULT "RIP: %04lx:[<%016lx>] ", regs->cs & 0xffff, regs->ip);
-	printk_address(regs->ip);
+	printk(KERN_DEFAULT "RIP: %pS\n", (void *)regs->ip);
 	printk(KERN_DEFAULT "RSP: %04lx:%016lx  EFLAGS: %08lx\n", regs->ss,
 			regs->sp, regs->flags);
 	printk(KERN_DEFAULT "RAX: %016lx RBX: %016lx RCX: %016lx\n",
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index dc80230..0e7b66c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -679,8 +679,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 		printk(KERN_CONT "paging request");
 
 	printk(KERN_CONT " at %p\n", (void *) address);
-	printk(KERN_ALERT "IP:");
-	printk_address(regs->ip);
+	printk(KERN_ALERT "IP: %pS\n", (void *)regs->ip);
 
 	dump_pagetable(address);
 }
diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index cd5173a..8410e7d 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -387,8 +387,8 @@ static void uv_nmi_dump_cpu_ip_hdr(void)
 /* Dump Instruction Pointer info */
 static void uv_nmi_dump_cpu_ip(int cpu, struct pt_regs *regs)
 {
-	pr_info("UV: %4d %6d %-32.32s ", cpu, current->pid, current->comm);
-	printk_address(regs->ip);
+	pr_info("UV: %4d %6d %-32.32s %pS",
+		cpu, current->pid, current->comm, (void *)regs->ip);
 }
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-31 16:53         ` Josh Poimboeuf
@ 2016-08-31 17:15           ` Linus Torvalds
  2016-09-01 13:09             ` Josh Poimboeuf
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2016-08-31 17:15 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Wed, Aug 31, 2016 at 9:53 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> Here's an example of what a stack dump looks like after this change:

Looks good, but it also shows:

>   RSP: 0018:ffffc900008f3a68  EFLAGS: 00010246

Ok, we know the stack pointer now...

>   RBP: ffffc900008f3a80 R08: 00000002a014a073 R09: ffffc900008f3aa0

.. and it looks like we have a frame pointer too, which is pretty
close to the stack pointer. So let's look at the stack dump:

>   Stack:
>    ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0
>    ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321

Yeah, counting down, the frame pointer points to the next frame, and
right after that you see the return address: 0xffffffffa0196610.

So let's look at the call trace:

>   Call Trace:
>    __send_control_msg+0x80/0xf0 [virtio_console]

Ok, so now we can match up that value to a symbol.

And that means that now we can match up *all* symbols in that module.
Even if the module isn't readable on that machine, you can just look
it up by checking the distro modules on another machine.

And if it had been core kernel code, you'd haev had the whole core
kernel ofdfsets.

So I think the patch is good, and I think the oops looks great, but I
think we should also just remove the stack dump. Sure, the register
state *can* contain these things too, but almost never do (and didn't,
in this example).

The stack dump actually goes back to forever, and it used to be useful
back in 1992 or so. But it used to be useful mainly because stacks
were simpler and we didn't have very good call traces anyway. I
definitely remember having used them - I just do not remember having
used them in the last ten+ years.

Of course, it's still true that if you can trigger an oops, you've
likely already lost the security game, but since the stack dump is so
useless, let's aim to just remove it and make games like the above
harder.

I'm also sure that we probably have a lot of other addresses in dmesg
that we should make sure aren't leaked. I did a quick grep and found

 Base memory trampoline at [ffff8f5e00097000] 97000 size 24576
 percpu: Embedded 35 pages/cpu @ffff8f6236c00000 s103640 r8192 d31528 u262144
 Freeing SMP alternatives memory: 32K (ffffffffaaec1000 - ffffffffaaec9000)

and a few more, and didn't check if those might give load addresses
away, but it would be good to check.

                 Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-08-31 17:15           ` Linus Torvalds
@ 2016-09-01 13:09             ` Josh Poimboeuf
  2016-09-01 16:30               ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Poimboeuf @ 2016-09-01 13:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Wed, Aug 31, 2016 at 10:15:19AM -0700, Linus Torvalds wrote:
> So I think the patch is good, and I think the oops looks great, but I
> think we should also just remove the stack dump. Sure, the register
> state *can* contain these things too, but almost never do (and didn't,
> in this example).
> 
> The stack dump actually goes back to forever, and it used to be useful
> back in 1992 or so. But it used to be useful mainly because stacks
> were simpler and we didn't have very good call traces anyway. I
> definitely remember having used them - I just do not remember having
> used them in the last ten+ years.
> 
> Of course, it's still true that if you can trigger an oops, you've
> likely already lost the security game, but since the stack dump is so
> useless, let's aim to just remove it and make games like the above
> harder.

Yeah.  I'll do another patch to get rid of the raw stack dump (though
maybe I'll wait until the other patches get merged first so I don't have
patches flying around everywhere).

> I'm also sure that we probably have a lot of other addresses in dmesg
> that we should make sure aren't leaked. I did a quick grep and found
> 
>  Base memory trampoline at [ffff8f5e00097000] 97000 size 24576
>  percpu: Embedded 35 pages/cpu @ffff8f6236c00000 s103640 r8192 d31528 u262144
>  Freeing SMP alternatives memory: 32K (ffffffffaaec1000 - ffffffffaaec9000)
> 
> and a few more, and didn't check if those might give load addresses
> away, but it would be good to check.

On my system, a grep found these:

  $ dmesg |grep "ffff[8-e]\|ffffffff[8-e]"
  [    0.000000] found SMP MP-table at [mem 0x000f6b40-0x000f6b4f] mapped at [ffffa0b7000f6b40]
  [    0.000000] Base memory trampoline at [ffffa0b700099000] 99000 size 24576
  [    0.000000] percpu: Embedded 485 pages/cpu @ffffa0b77d200000 s1946904 r8192 d31464 u2097152
  [    0.475975] Freeing SMP alternatives memory: 32K (ffffffff9e309000 - ffffffff9e311000)
  [    2.656380] Freeing initrd memory: 10588K (ffffa0b736b42000 - ffffa0b737599000)
  [    4.444099] Freeing unused kernel memory: 3592K (ffffffff9df87000 - ffffffff9e309000)
  [    4.447080] Freeing unused kernel memory: 1352K (ffffa0b7288ae000 - ffffa0b728a00000)
  [    4.449517] Freeing unused kernel memory: 632K (ffffa0b728d62000 - ffffa0b728e00000)

The text starts at 0xffffa0b728000000 and 0xffffffff9d000000.  I think
only the "Freeing" messages would give away the vmlinux mappings.

I'm wonder if it might be useful to encode the addresses somehow; they
could conceivably be used to debug use-after-free issues.  Or we could
just remove them.

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/6] x86/dumpstack: more stack dump improvements
  2016-08-24 16:50 [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
                   ` (5 preceding siblings ...)
  2016-08-24 16:50 ` [PATCH 6/6] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
@ 2016-09-01 13:13 ` Josh Poimboeuf
  6 siblings, 0 replies; 42+ messages in thread
From: Josh Poimboeuf @ 2016-09-01 13:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, linux-kernel, Andy Lutomirski, Linus Torvalds,
	Steven Rostedt, Brian Gerst, Kees Cook, Peter Zijlstra,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish,
	Thomas Gleixner, H . Peter Anvin

On Wed, Aug 24, 2016 at 11:50:13AM -0500, Josh Poimboeuf wrote:
> Some more preparatory patches for the new x86 unwinder (though these
> generally stand alone as improvements in their own right).

Ingo, would you be willing to take this series?  There have been no
objections other than Linus's orthogonal comment.

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful
  2016-09-01 13:09             ` Josh Poimboeuf
@ 2016-09-01 16:30               ` Linus Torvalds
  0 siblings, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2016-09-01 16:30 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Steven Rostedt, Brian Gerst, Kees Cook,
	Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Thu, Sep 1, 2016 at 6:09 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> I'm wonder if it might be useful to encode the addresses somehow; they
> could conceivably be used to debug use-after-free issues.  Or we could
> just remove them.

I suspect we should just remove them. I'm sure they are useful in
theory, but I suspect they were more useful back when the whole "free
init memory" was originally done.

These days, if we have a use-after-free, I suspect the init-mem
situation is the easiest situation by far. Compared to all the dynamic
allocations which are much more likely to show it anyway. So having
debug output for that case is likely not all that productive.

              Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 6/6] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-08-24 16:50 ` [PATCH 6/6] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
@ 2016-09-08  7:04   ` Ingo Molnar
  2016-09-08 21:47     ` Josh Poimboeuf
  0 siblings, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2016-09-08  7:04 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

>  arch/x86/kernel/dumpstack_32.c | 14 ++++++--------

> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c

> +static void *is_softirq_stack(unsigned long *stack);
>  {

FYI, this bit clearly wasn't build tested on 32-bit even once - I'm skipping this 
patch.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [tip:x86/asm] perf/x86: Check perf_callchain_store() error
  2016-08-24 16:50 ` [PATCH 1/6] perf/x86: check perf_callchain_store() error Josh Poimboeuf
@ 2016-09-08  9:48   ` tip-bot for Josh Poimboeuf
  0 siblings, 0 replies; 42+ messages in thread
From: tip-bot for Josh Poimboeuf @ 2016-09-08  9:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: keescook, peterz, tglx, luto, rostedt, brgerst, hpa,
	linux-kernel, nilayvaish, jpoimboe, mingo, torvalds,
	byungchul.park, fweisbec

Commit-ID:  019e579d395733d14097c2d29c8c43226dad1617
Gitweb:     http://git.kernel.org/tip/019e579d395733d14097c2d29c8c43226dad1617
Author:     Josh Poimboeuf <jpoimboe@redhat.com>
AuthorDate: Wed, 24 Aug 2016 11:50:14 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 8 Sep 2016 08:58:40 +0200

perf/x86: Check perf_callchain_store() error

Add a check to perf_callchain_kernel() so that it returns early if the
callchain entry array is already full.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dce6d60bab08be2600efd90021d9b85620646161.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d0efb5c..c1319ac 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2277,7 +2277,8 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 		return;
 	}
 
-	perf_callchain_store(entry, regs->ip);
+	if (perf_callchain_store(entry, regs->ip))
+		return;
 
 	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
 }

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [tip:x86/asm] oprofile/x86: Add regs->ip to oprofile trace
  2016-08-24 16:50 ` [PATCH 2/6] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
  2016-08-30 11:30   ` Robert Richter
@ 2016-09-08  9:48   ` tip-bot for Josh Poimboeuf
  1 sibling, 0 replies; 42+ messages in thread
From: tip-bot for Josh Poimboeuf @ 2016-09-08  9:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, mingo, nilayvaish, linux-kernel, byungchul.park,
	peterz, brgerst, rostedt, rric, keescook, fweisbec, hpa, tglx,
	jpoimboe, luto

Commit-ID:  3e344a0db900757caaf0beeb749de4c7b59bfd60
Gitweb:     http://git.kernel.org/tip/3e344a0db900757caaf0beeb749de4c7b59bfd60
Author:     Josh Poimboeuf <jpoimboe@redhat.com>
AuthorDate: Wed, 24 Aug 2016 11:50:15 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 8 Sep 2016 08:58:40 +0200

oprofile/x86: Add regs->ip to oprofile trace

dump_trace() doesn't add the interrupted instruction's address to the
trace, so add it manually.  This makes the profile more useful, and also
makes it more consistent with what perf profiling does.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Robert Richter <rric@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6c745a83dbd69fc6857ef9b2f8be0f011d775936.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/oprofile/backtrace.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index cb31a44..2ef6c8b 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -114,9 +114,16 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 
 	if (!user_mode(regs)) {
 		unsigned long stack = kernel_stack_pointer(regs);
-		if (depth)
-			dump_trace(NULL, regs, (unsigned long *)stack, 0,
-				   &backtrace_ops, &depth);
+
+		if (!depth)
+			return;
+
+		oprofile_add_trace(regs->ip);
+		if (!--depth)
+			return;
+
+		dump_trace(NULL, regs, (unsigned long *)stack, 0,
+			   &backtrace_ops, &depth);
 		return;
 	}
 

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [tip:x86/asm] x86/dumpstack: Make printk_stack_address() more generally useful
  2016-08-24 16:50 ` [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
  2016-08-24 17:28   ` Joe Perches
  2016-08-24 18:03   ` Linus Torvalds
@ 2016-09-08  9:49   ` tip-bot for Josh Poimboeuf
  2 siblings, 0 replies; 42+ messages in thread
From: tip-bot for Josh Poimboeuf @ 2016-09-08  9:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: keescook, rostedt, jpoimboe, peterz, tglx, hpa, brgerst, mingo,
	torvalds, luto, nilayvaish, fweisbec, byungchul.park,
	linux-kernel

Commit-ID:  d438f5fda30ec087512355e405e9c8955d8bd337
Gitweb:     http://git.kernel.org/tip/d438f5fda30ec087512355e405e9c8955d8bd337
Author:     Josh Poimboeuf <jpoimboe@redhat.com>
AuthorDate: Wed, 24 Aug 2016 11:50:16 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 8 Sep 2016 08:58:40 +0200

x86/dumpstack: Make printk_stack_address() more generally useful

Change printk_stack_address() to be useful when called by an unwinder
outside the context of dump_trace().

Specifically:

- printk_stack_address()'s 'data' argument is always used as the log
  level string.  Make that explicit.

- Call touch_nmi_watchdog().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9fbe0db05bacf66d337c162edbf61450d0cff1e2.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/dumpstack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 01072e9..f0ddf85 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -26,10 +26,11 @@ int kstack_depth_to_print = 3 * STACKSLOTS_PER_LINE;
 static int die_counter;
 
 static void printk_stack_address(unsigned long address, int reliable,
-		void *data)
+				 char *log_lvl)
 {
+	touch_nmi_watchdog();
 	printk("%s [<%p>] %s%pB\n",
-		(char *)data, (void *)address, reliable ? "" : "? ",
+		log_lvl, (void *)address, reliable ? "" : "? ",
 		(void *)address);
 }
 
@@ -148,7 +149,6 @@ static int print_trace_stack(void *data, char *name)
  */
 static int print_trace_address(void *data, unsigned long addr, int reliable)
 {
-	touch_nmi_watchdog();
 	printk_stack_address(addr, reliable, data);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [tip:x86/asm] x86/dumpstack: Add get_stack_pointer() and get_frame_pointer()
  2016-08-24 16:50 ` [PATCH 4/6] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
@ 2016-09-08  9:49   ` tip-bot for Josh Poimboeuf
  0 siblings, 0 replies; 42+ messages in thread
From: tip-bot for Josh Poimboeuf @ 2016-09-08  9:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: brgerst, keescook, luto, peterz, tglx, nilayvaish, luto, hpa,
	mingo, linux-kernel, rostedt, byungchul.park, fweisbec, torvalds,
	jpoimboe

Commit-ID:  4b8afafbe743be1a81c96ddcd75b19c534d5e262
Gitweb:     http://git.kernel.org/tip/4b8afafbe743be1a81c96ddcd75b19c534d5e262
Author:     Josh Poimboeuf <jpoimboe@redhat.com>
AuthorDate: Wed, 24 Aug 2016 11:50:17 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 8 Sep 2016 08:58:40 +0200

x86/dumpstack: Add get_stack_pointer() and get_frame_pointer()

The various functions involved in dumping the stack all do similar
things with regard to getting the stack pointer and the frame pointer
based on the regs and task arguments.  Create helper functions to
do that instead.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f448914885a35f333fe04da1b97a6c2cc1f80974.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/stacktrace.h | 41 ++++++++++++++++++++++-----------------
 arch/x86/kernel/dumpstack.c       |  5 ++---
 arch/x86/kernel/dumpstack_32.c    | 25 ++++--------------------
 arch/x86/kernel/dumpstack_64.c    | 30 ++++------------------------
 4 files changed, 33 insertions(+), 68 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 7646fb2..3552f5e 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -50,36 +50,41 @@ void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
 
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
-#define get_bp(bp) asm("movl %%ebp, %0" : "=r" (bp) :)
 #else
 #define STACKSLOTS_PER_LINE 4
-#define get_bp(bp) asm("movq %%rbp, %0" : "=r" (bp) :)
 #endif
 
 #ifdef CONFIG_FRAME_POINTER
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
-	unsigned long bp;
-
 	if (regs)
-		return regs->bp;
+		return (unsigned long *)regs->bp;
 
-	if (task == current) {
-		/* Grab bp right from our regs */
-		get_bp(bp);
-		return bp;
-	}
+	if (!task || task == current)
+		return __builtin_frame_address(0);
 
-	return ((struct inactive_task_frame *)task->thread.sp)->bp;
+	return (unsigned long *)((struct inactive_task_frame *)task->thread.sp)->bp;
 }
 #else
-static inline unsigned long
-stack_frame(struct task_struct *task, struct pt_regs *regs)
+static inline unsigned long *
+get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
-	return 0;
+	return NULL;
+}
+#endif /* CONFIG_FRAME_POINTER */
+
+static inline unsigned long *
+get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
+{
+	if (regs)
+		return (unsigned long *)kernel_stack_pointer(regs);
+
+	if (!task || task == current)
+		return __builtin_frame_address(0);
+
+	return (unsigned long *)task->thread.sp;
 }
-#endif
 
 extern void
 show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
@@ -106,7 +111,7 @@ static inline unsigned long caller_frame_pointer(void)
 {
 	struct stack_frame *frame;
 
-	get_bp(frame);
+	frame = __builtin_frame_address(0);
 
 #ifdef CONFIG_FRAME_POINTER
 	frame = frame->next_frame;
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index f0ddf85..6d6f468 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -170,15 +170,14 @@ show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
 	unsigned long bp = 0;
-	unsigned long stack;
 
 	/*
 	 * Stack frames below this one aren't interesting.  Don't show them
 	 * if we're printing for %current.
 	 */
 	if (!sp && (!task || task == current)) {
-		sp = &stack;
-		bp = stack_frame(current, NULL);
+		sp = get_stack_pointer(current, NULL);
+		bp = (unsigned long)get_frame_pointer(current, NULL);
 	}
 
 	show_stack_log_lvl(task, NULL, sp, bp, "");
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 0967571..358fe1c 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -46,19 +46,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	int graph = 0;
 	u32 *prev_esp;
 
-	if (!task)
-		task = current;
-
-	if (!stack) {
-		unsigned long dummy;
-
-		stack = &dummy;
-		if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
 	for (;;) {
 		void *end_stack;
@@ -95,14 +85,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *stack;
 	int i;
 
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 066eb5c7..7f3b806 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -151,25 +151,14 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 {
 	const unsigned cpu = get_cpu();
 	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
-	unsigned long dummy;
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
 
-	if (!task)
-		task = current;
+	task = task ? : current;
+	stack = stack ? : get_stack_pointer(task, regs);
+	bp = bp ? : (unsigned long)get_frame_pointer(task, regs);
 
-	if (!stack) {
-		if (regs)
-			stack = (unsigned long *)regs->sp;
-		else if (task != current)
-			stack = (unsigned long *)task->thread.sp;
-		else
-			stack = &dummy;
-	}
-
-	if (!bp)
-		bp = stack_frame(task, regs);
 	/*
 	 * Print function call entries in all stacks, starting at the
 	 * current stack address. If the stacks consist of nested
@@ -256,18 +245,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
 	irq_stack     = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
-	/*
-	 * Debugging aid: "show_stack(NULL, NULL);" prints the
-	 * back trace for this cpu:
-	 */
-	if (sp == NULL) {
-		if (regs)
-			sp = (unsigned long *)regs->sp;
-		else if (task)
-			sp = (unsigned long *)task->thread.sp;
-		else
-			sp = (unsigned long *)&sp;
-	}
+	sp = sp ? : get_stack_pointer(task, regs);
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [tip:x86/asm] x86/dumpstack: Remove unnecessary stack pointer arguments
  2016-08-24 16:50 ` [PATCH 5/6] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
@ 2016-09-08  9:50   ` tip-bot for Josh Poimboeuf
  0 siblings, 0 replies; 42+ messages in thread
From: tip-bot for Josh Poimboeuf @ 2016-09-08  9:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, nilayvaish, tglx, keescook, byungchul.park, fweisbec, luto,
	torvalds, mingo, jpoimboe, linux-kernel, rostedt, peterz,
	brgerst, luto

Commit-ID:  5a8ff54c260ecfed3de9b8d1272eb87826935df8
Gitweb:     http://git.kernel.org/tip/5a8ff54c260ecfed3de9b8d1272eb87826935df8
Author:     Josh Poimboeuf <jpoimboe@redhat.com>
AuthorDate: Wed, 24 Aug 2016 11:50:18 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 8 Sep 2016 08:58:40 +0200

x86/dumpstack: Remove unnecessary stack pointer arguments

When calling show_stack_log_lvl() or dump_trace() with a regs argument,
providing a stack pointer or frame pointer is redundant.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>d
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1694e2e955e3b9a73a3c3d5ba2634344014dd550.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/dumpstack.c    | 2 +-
 arch/x86/kernel/dumpstack_32.c | 2 +-
 arch/x86/kernel/dumpstack_64.c | 5 +----
 arch/x86/oprofile/backtrace.c  | 5 +----
 4 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 6d6f468..c6c6c39 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -185,7 +185,7 @@ void show_stack(struct task_struct *task, unsigned long *sp)
 
 void show_stack_regs(struct pt_regs *regs)
 {
-	show_stack_log_lvl(current, regs, (unsigned long *)regs->sp, regs->bp, "");
+	show_stack_log_lvl(current, regs, NULL, 0, "");
 }
 
 static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 358fe1c..c533b8b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -122,7 +122,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		pr_emerg("Stack:\n");
-		show_stack_log_lvl(NULL, regs, &regs->sp, 0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_EMERG);
 
 		pr_emerg("Code:");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 7f3b806..b243352 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -283,9 +283,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 void show_regs(struct pt_regs *regs)
 {
 	int i;
-	unsigned long sp;
 
-	sp = regs->sp;
 	show_regs_print_info(KERN_DEFAULT);
 	__show_regs(regs, 1);
 
@@ -300,8 +298,7 @@ void show_regs(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_DEFAULT "Stack:\n");
-		show_stack_log_lvl(NULL, regs, (unsigned long *)sp,
-				   0, KERN_DEFAULT);
+		show_stack_log_lvl(NULL, regs, NULL, 0, KERN_DEFAULT);
 
 		printk(KERN_DEFAULT "Code: ");
 
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 2ef6c8b..d950f9e 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,8 +113,6 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
 
 	if (!user_mode(regs)) {
-		unsigned long stack = kernel_stack_pointer(regs);
-
 		if (!depth)
 			return;
 
@@ -122,8 +120,7 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 		if (!--depth)
 			return;
 
-		dump_trace(NULL, regs, (unsigned long *)stack, 0,
-			   &backtrace_ops, &depth);
+		dump_trace(NULL, regs, NULL, 0, &backtrace_ops, &depth);
 		return;
 	}
 

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 6/6] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-09-08  7:04   ` Ingo Molnar
@ 2016-09-08 21:47     ` Josh Poimboeuf
  2016-09-08 21:49       ` [PATCH v2] " Josh Poimboeuf
  2016-09-09  6:11       ` [PATCH 6/6] x86/dumpstack: allow " Ingo Molnar
  0 siblings, 2 replies; 42+ messages in thread
From: Josh Poimboeuf @ 2016-09-08 21:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish

On Thu, Sep 08, 2016 at 09:04:01AM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> >  arch/x86/kernel/dumpstack_32.c | 14 ++++++--------
> 
> > --- a/arch/x86/kernel/dumpstack_32.c
> > +++ b/arch/x86/kernel/dumpstack_32.c
> 
> > +static void *is_softirq_stack(unsigned long *stack);
> >  {
> 
> FYI, this bit clearly wasn't build tested on 32-bit even once - I'm skipping this 
> patch.

Argh, sorry about that.  I did a lot of testing on 32-bit with this
patch before.  I guess I goofed somehow when I split up the patches.

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-09-08 21:47     ` Josh Poimboeuf
@ 2016-09-08 21:49       ` Josh Poimboeuf
  2016-09-13 18:29         ` Ingo Molnar
  2016-09-14 17:39         ` [tip:x86/asm] x86/dumpstack: Allow " tip-bot for Josh Poimboeuf
  2016-09-09  6:11       ` [PATCH 6/6] x86/dumpstack: allow " Ingo Molnar
  1 sibling, 2 replies; 42+ messages in thread
From: Josh Poimboeuf @ 2016-09-08 21:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

show_stack_log_lvl() and dump_trace() are already preemption safe:

- If they're running in irq or exception context, preemption is already
  disabled and the percpu stack pointers can be trusted.

- If they're running with preemption enabled, they must be running on
  the task stack anyway, so it doesn't matter if they're comparing the
  stack pointer against a percpu stack pointer from this CPU or another
  one: either way it won't match.
---
 arch/x86/kernel/dumpstack_32.c | 14 ++++++--------
 arch/x86/kernel/dumpstack_64.c | 26 +++++++++-----------------
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index c533b8b..da5cd62 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -24,16 +24,16 @@ static void *is_irq_stack(void *p, void *irq)
 }
 
 
-static void *is_hardirq_stack(unsigned long *stack, int cpu)
+static void *is_hardirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(hardirq_stack, cpu);
+	void *irq = this_cpu_read(hardirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
 
-static void *is_softirq_stack(unsigned long *stack, int cpu)
+static void *is_softirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(softirq_stack, cpu);
+	void *irq = this_cpu_read(softirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
@@ -42,7 +42,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
 	int graph = 0;
 	u32 *prev_esp;
 
@@ -53,9 +52,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	for (;;) {
 		void *end_stack;
 
-		end_stack = is_hardirq_stack(stack, cpu);
+		end_stack = is_hardirq_stack(stack);
 		if (!end_stack)
-			end_stack = is_softirq_stack(stack, cpu);
+			end_stack = is_softirq_stack(stack);
 
 		bp = ops->walk_stack(task, stack, bp, ops, data,
 				     end_stack, &graph);
@@ -74,7 +73,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			break;
 		touch_nmi_watchdog();
 	}
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index b243352..07373be 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -31,8 +31,8 @@ static char x86_stack_ids[][8] = {
 #endif
 };
 
-static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
-					 unsigned *usedp, char **idp)
+static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
+					 char **idp)
 {
 	unsigned k;
 
@@ -41,7 +41,7 @@ static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
 	 * 'stack' is in one of them:
 	 */
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		unsigned long end = per_cpu(orig_ist, cpu).ist[k];
+		unsigned long end = raw_cpu_ptr(&orig_ist)->ist[k];
 		/*
 		 * Is 'stack' above this exception frame's end?
 		 * If yes then skip to the next frame.
@@ -111,7 +111,7 @@ enum stack_type {
 };
 
 static enum stack_type
-analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
+analyze_stack(struct task_struct *task, unsigned long *stack,
 	      unsigned long **stack_end, unsigned long *irq_stack,
 	      unsigned *used, char **id)
 {
@@ -121,8 +121,7 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
 	if ((unsigned long)task_stack_page(task) == addr)
 		return STACK_IS_NORMAL;
 
-	*stack_end = in_exception_stack(cpu, (unsigned long)stack,
-					used, id);
+	*stack_end = in_exception_stack((unsigned long)stack, used, id);
 	if (*stack_end)
 		return STACK_IS_EXCEPTION;
 
@@ -149,8 +148,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
-	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
+	unsigned long *irq_stack = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
@@ -169,8 +167,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		enum stack_type stype;
 		char *id;
 
-		stype = analyze_stack(cpu, task, stack, &stack_end,
-				      irq_stack, &used, &id);
+		stype = analyze_stack(task, stack, &stack_end, irq_stack, &used,
+				      &id);
 
 		/* Default finish unless specified to continue */
 		done = 1;
@@ -225,7 +223,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * This handles the process stack:
 	 */
 	bp = ops->walk_stack(task, stack, bp, ops, data, NULL, &graph);
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
@@ -236,13 +233,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *irq_stack_end;
 	unsigned long *irq_stack;
 	unsigned long *stack;
-	int cpu;
 	int i;
 
-	preempt_disable();
-	cpu = smp_processor_id();
-
-	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	irq_stack     = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
 	sp = sp ? : get_stack_pointer(task, regs);
@@ -274,7 +267,6 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		stack++;
 		touch_nmi_watchdog();
 	}
-	preempt_enable();
 
 	pr_cont("\n");
 	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 6/6] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-09-08 21:47     ` Josh Poimboeuf
  2016-09-08 21:49       ` [PATCH v2] " Josh Poimboeuf
@ 2016-09-09  6:11       ` Ingo Molnar
  1 sibling, 0 replies; 42+ messages in thread
From: Ingo Molnar @ 2016-09-09  6:11 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, H . Peter Anvin, x86, linux-kernel,
	Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Kees Cook, Peter Zijlstra, Frederic Weisbecker, Byungchul Park,
	Nilay Vaish


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Thu, Sep 08, 2016 at 09:04:01AM +0200, Ingo Molnar wrote:
> > 
> > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > 
> > >  arch/x86/kernel/dumpstack_32.c | 14 ++++++--------
> > 
> > > --- a/arch/x86/kernel/dumpstack_32.c
> > > +++ b/arch/x86/kernel/dumpstack_32.c
> > 
> > > +static void *is_softirq_stack(unsigned long *stack);
> > >  {
> > 
> > FYI, this bit clearly wasn't build tested on 32-bit even once - I'm skipping this 
> > patch.
> 
> Argh, sorry about that.  I did a lot of testing on 32-bit with this
> patch before.  I guess I goofed somehow when I split up the patches.

No problem!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-09-08 21:49       ` [PATCH v2] " Josh Poimboeuf
@ 2016-09-13 18:29         ` Ingo Molnar
  2016-09-13 19:38           ` Josh Poimboeuf
  2016-09-14 17:39         ` [tip:x86/asm] x86/dumpstack: Allow " tip-bot for Josh Poimboeuf
  1 sibling, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2016-09-13 18:29 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Kees Cook, Thomas Gleixner, H . Peter Anvin, x86,
	linux-kernel, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> show_stack_log_lvl() and dump_trace() are already preemption safe:
> 
> - If they're running in irq or exception context, preemption is already
>   disabled and the percpu stack pointers can be trusted.
> 
> - If they're running with preemption enabled, they must be running on
>   the task stack anyway, so it doesn't matter if they're comparing the
>   stack pointer against a percpu stack pointer from this CPU or another
>   one: either way it won't match.

Yeah, so I'm having second thoughts about this patch. My worry here is: what if we 
get preempted in this sequence?

If the kernel is borked real bad then we could get technically correct but really, 
really weird looking stack traces if for example the task stack is getting 
corrupted or something like that.

Dunno. How long does the worst-case processing here take on a typical x86 system, 
does it really matter to scheduling latency?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace()
  2016-09-13 18:29         ` Ingo Molnar
@ 2016-09-13 19:38           ` Josh Poimboeuf
  0 siblings, 0 replies; 42+ messages in thread
From: Josh Poimboeuf @ 2016-09-13 19:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Kees Cook, Thomas Gleixner, H . Peter Anvin, x86,
	linux-kernel, Andy Lutomirski, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish

On Tue, Sep 13, 2016 at 08:29:57PM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > show_stack_log_lvl() and dump_trace() are already preemption safe:
> > 
> > - If they're running in irq or exception context, preemption is already
> >   disabled and the percpu stack pointers can be trusted.
> > 
> > - If they're running with preemption enabled, they must be running on
> >   the task stack anyway, so it doesn't matter if they're comparing the
> >   stack pointer against a percpu stack pointer from this CPU or another
> >   one: either way it won't match.
> 
> Yeah, so I'm having second thoughts about this patch. My worry here is: what if we 
> get preempted in this sequence?
> 
> If the kernel is borked real bad then we could get technically correct but really, 
> really weird looking stack traces if for example the task stack is getting 
> corrupted or something like that.

If it's in the oops or BUG path, there can't be preemption anyway
because oops_begin() disables interrupts.

It does look like the WARN path could get preempted.  Not to mention all
the other callers of show_regs(), dump_stack(), show_stack_log_lvl(),
etc.  In those cases, if the stack dump got preempted in the middle, and
then another task dumped its stack, the two dumps could be interspersed
a bit which would indeed be a little confusing.

But that would be quite rare.  And anyway, we already have the same
issue today when two CPUs are dumping the stack at the same time.  So I
don't think it's much of an issue.

> Dunno. How long does the worst-case processing here take on a typical x86 system, 
> does it really matter to scheduling latency?

I haven't heard any complaints about latency.  The goal was just to try
to simplify the code a bit.

-- 
Josh

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [tip:x86/asm] x86/dumpstack: Allow preemption in show_stack_log_lvl() and dump_trace()
  2016-09-08 21:49       ` [PATCH v2] " Josh Poimboeuf
  2016-09-13 18:29         ` Ingo Molnar
@ 2016-09-14 17:39         ` tip-bot for Josh Poimboeuf
  1 sibling, 0 replies; 42+ messages in thread
From: tip-bot for Josh Poimboeuf @ 2016-09-14 17:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, dvlasenk, luto, linux-kernel, torvalds, brgerst, hpa,
	rostedt, tglx, fweisbec, nilayvaish, peterz, bp, byungchul.park,
	mingo, keescook, jpoimboe

Commit-ID:  cfeeed279dc2fa83a00fbe4856ebd231d56201ab
Gitweb:     http://git.kernel.org/tip/cfeeed279dc2fa83a00fbe4856ebd231d56201ab
Author:     Josh Poimboeuf <jpoimboe@redhat.com>
AuthorDate: Thu, 8 Sep 2016 16:49:20 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 14 Sep 2016 17:23:30 +0200

x86/dumpstack: Allow preemption in show_stack_log_lvl() and dump_trace()

show_stack_log_lvl() and dump_trace() are already preemption safe:

- If they're running in irq or exception context, preemption is already
  disabled and the percpu stack pointers can be trusted.

- If they're running with preemption enabled, they must be running on
  the task stack anyway, so it doesn't matter if they're comparing the
  stack pointer against a percpu stack pointer from this CPU or another
  one: either way it won't match.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0ca0b1044eca97d4f0ec7c1619cf80b3b65560d.1473371307.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/dumpstack_32.c | 14 ++++++--------
 arch/x86/kernel/dumpstack_64.c | 26 +++++++++-----------------
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index c533b8b..da5cd62 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -24,16 +24,16 @@ static void *is_irq_stack(void *p, void *irq)
 }
 
 
-static void *is_hardirq_stack(unsigned long *stack, int cpu)
+static void *is_hardirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(hardirq_stack, cpu);
+	void *irq = this_cpu_read(hardirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
 
-static void *is_softirq_stack(unsigned long *stack, int cpu)
+static void *is_softirq_stack(unsigned long *stack)
 {
-	void *irq = per_cpu(softirq_stack, cpu);
+	void *irq = this_cpu_read(softirq_stack);
 
 	return is_irq_stack(stack, irq);
 }
@@ -42,7 +42,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
 	int graph = 0;
 	u32 *prev_esp;
 
@@ -53,9 +52,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	for (;;) {
 		void *end_stack;
 
-		end_stack = is_hardirq_stack(stack, cpu);
+		end_stack = is_hardirq_stack(stack);
 		if (!end_stack)
-			end_stack = is_softirq_stack(stack, cpu);
+			end_stack = is_softirq_stack(stack);
 
 		bp = ops->walk_stack(task, stack, bp, ops, data,
 				     end_stack, &graph);
@@ -74,7 +73,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			break;
 		touch_nmi_watchdog();
 	}
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index b243352..07373be 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -31,8 +31,8 @@ static char x86_stack_ids[][8] = {
 #endif
 };
 
-static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
-					 unsigned *usedp, char **idp)
+static unsigned long *in_exception_stack(unsigned long stack, unsigned *usedp,
+					 char **idp)
 {
 	unsigned k;
 
@@ -41,7 +41,7 @@ static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
 	 * 'stack' is in one of them:
 	 */
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
-		unsigned long end = per_cpu(orig_ist, cpu).ist[k];
+		unsigned long end = raw_cpu_ptr(&orig_ist)->ist[k];
 		/*
 		 * Is 'stack' above this exception frame's end?
 		 * If yes then skip to the next frame.
@@ -111,7 +111,7 @@ enum stack_type {
 };
 
 static enum stack_type
-analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
+analyze_stack(struct task_struct *task, unsigned long *stack,
 	      unsigned long **stack_end, unsigned long *irq_stack,
 	      unsigned *used, char **id)
 {
@@ -121,8 +121,7 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
 	if ((unsigned long)task_stack_page(task) == addr)
 		return STACK_IS_NORMAL;
 
-	*stack_end = in_exception_stack(cpu, (unsigned long)stack,
-					used, id);
+	*stack_end = in_exception_stack((unsigned long)stack, used, id);
 	if (*stack_end)
 		return STACK_IS_EXCEPTION;
 
@@ -149,8 +148,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		unsigned long *stack, unsigned long bp,
 		const struct stacktrace_ops *ops, void *data)
 {
-	const unsigned cpu = get_cpu();
-	unsigned long *irq_stack = (unsigned long *)per_cpu(irq_stack_ptr, cpu);
+	unsigned long *irq_stack = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	unsigned used = 0;
 	int graph = 0;
 	int done = 0;
@@ -169,8 +167,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		enum stack_type stype;
 		char *id;
 
-		stype = analyze_stack(cpu, task, stack, &stack_end,
-				      irq_stack, &used, &id);
+		stype = analyze_stack(task, stack, &stack_end, irq_stack, &used,
+				      &id);
 
 		/* Default finish unless specified to continue */
 		done = 1;
@@ -225,7 +223,6 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * This handles the process stack:
 	 */
 	bp = ops->walk_stack(task, stack, bp, ops, data, NULL, &graph);
-	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
 
@@ -236,13 +233,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *irq_stack_end;
 	unsigned long *irq_stack;
 	unsigned long *stack;
-	int cpu;
 	int i;
 
-	preempt_disable();
-	cpu = smp_processor_id();
-
-	irq_stack_end = (unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	irq_stack_end = (unsigned long *)this_cpu_read(irq_stack_ptr);
 	irq_stack     = irq_stack_end - (IRQ_STACK_SIZE / sizeof(long));
 
 	sp = sp ? : get_stack_pointer(task, regs);
@@ -274,7 +267,6 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		stack++;
 		touch_nmi_watchdog();
 	}
-	preempt_enable();
 
 	pr_cont("\n");
 	show_trace_log_lvl(task, regs, sp, bp, log_lvl);

^ permalink raw reply related	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2016-09-14 17:42 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-24 16:50 [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf
2016-08-24 16:50 ` [PATCH 1/6] perf/x86: check perf_callchain_store() error Josh Poimboeuf
2016-09-08  9:48   ` [tip:x86/asm] perf/x86: Check " tip-bot for Josh Poimboeuf
2016-08-24 16:50 ` [PATCH 2/6] oprofile/x86: add regs->ip to oprofile trace Josh Poimboeuf
2016-08-30 11:30   ` Robert Richter
2016-09-08  9:48   ` [tip:x86/asm] oprofile/x86: Add " tip-bot for Josh Poimboeuf
2016-08-24 16:50 ` [PATCH 3/6] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
2016-08-24 17:28   ` Joe Perches
2016-08-24 18:43     ` Josh Poimboeuf
2016-08-24 19:07       ` Joe Perches
2016-08-24 19:24         ` Josh Poimboeuf
2016-08-24 19:57           ` Joe Perches
2016-08-24 18:03   ` Linus Torvalds
2016-08-24 18:22     ` Peter Zijlstra
2016-08-24 18:37       ` Linus Torvalds
2016-08-24 19:37         ` Josh Poimboeuf
2016-08-25 17:49           ` Josh Poimboeuf
2016-08-25 20:41             ` Kees Cook
2016-08-25 21:07               ` Josh Poimboeuf
     [not found]                 ` <CA+55aFy3sgA4=ZPhiDWiRMvWj9QPhUd0JBdUr1hm_6G0aSC6uw@mail.gmail.com>
2016-08-26  2:19                   ` Kees Cook
2016-08-26  3:19                   ` Josh Poimboeuf
2016-08-26  4:40                     ` Linus Torvalds
2016-08-26  5:56                       ` Josh Poimboeuf
     [not found]                         ` <CA+55aFy8FPRiEr-4p++TGj+keTNsg781q1E1FQZ7z4+nAs0ZaQ@mail.gmail.com>
2016-08-26 13:30                           ` Josh Poimboeuf
2016-08-31 16:53         ` Josh Poimboeuf
2016-08-31 17:15           ` Linus Torvalds
2016-09-01 13:09             ` Josh Poimboeuf
2016-09-01 16:30               ` Linus Torvalds
2016-09-08  9:49   ` [tip:x86/asm] x86/dumpstack: Make " tip-bot for Josh Poimboeuf
2016-08-24 16:50 ` [PATCH 4/6] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
2016-09-08  9:49   ` [tip:x86/asm] x86/dumpstack: Add " tip-bot for Josh Poimboeuf
2016-08-24 16:50 ` [PATCH 5/6] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
2016-09-08  9:50   ` [tip:x86/asm] x86/dumpstack: Remove " tip-bot for Josh Poimboeuf
2016-08-24 16:50 ` [PATCH 6/6] x86/dumpstack: allow preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
2016-09-08  7:04   ` Ingo Molnar
2016-09-08 21:47     ` Josh Poimboeuf
2016-09-08 21:49       ` [PATCH v2] " Josh Poimboeuf
2016-09-13 18:29         ` Ingo Molnar
2016-09-13 19:38           ` Josh Poimboeuf
2016-09-14 17:39         ` [tip:x86/asm] x86/dumpstack: Allow " tip-bot for Josh Poimboeuf
2016-09-09  6:11       ` [PATCH 6/6] x86/dumpstack: allow " Ingo Molnar
2016-09-01 13:13 ` [PATCH 0/6] x86/dumpstack: more stack dump improvements Josh Poimboeuf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.