linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/7] arch: More wchan fixes
@ 2021-10-22 15:09 Peter Zijlstra
  2021-10-22 15:09 ` [PATCH 1/7] x86: Fix __get_wchan() for !STACKTRACE Peter Zijlstra
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 15:09 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	mark.rutland, zhengqi.arch, linux, catalin.marinas, will, mpe,
	paul.walmsley, palmer, hca, gor, borntraeger, linux-arch, ardb

Hi,

Respin of the rest of the get_wchan() rework [1]. Given the many problems with
STACKTRACE this series focuses on the newer ARCH_STACKWALK since that's a
smaller set of architectures to review with a better specified interface.

The idea is to piece-wise convert the rest of the architectures to
ARCH_STACKWALK, but that can be done at leisure.

PowerPC and ARM64 are a bit funny in that their __switch_to() is in C and ends
up on the stack. For now simply mark it __sched to make it work. Mark wanted to
maybe do something different, but I think this ought to get us started.
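
For reference, the ARCH_STACKWALK interface everything here builds on is
deliberately tiny. Roughly (a simplified sketch of the declarations in
include/linux/stacktrace.h, not the verbatim header):

	/* Return true to continue the walk, false to stop it. */
	typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr);

	/* An ARCH_STACKWALK architecture implements only this walker: */
	void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
			     struct task_struct *task, struct pt_regs *regs);

Everything else (entry filtering, storage, locking) lives in generic code.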




* [PATCH 1/7] x86: Fix __get_wchan() for !STACKTRACE
  2021-10-22 15:09 [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
@ 2021-10-22 15:09 ` Peter Zijlstra
  2021-10-22 16:25   ` Kees Cook
  2021-10-26 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2021-10-22 15:09 ` [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust Peter Zijlstra
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 15:09 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	mark.rutland, zhengqi.arch, linux, catalin.marinas, will, mpe,
	paul.walmsley, palmer, hca, gor, borntraeger, linux-arch, ardb,
	Stephen Rothwell

Use asm/unwind.h to implement wchan, since we cannot always rely on
STACKTRACE=y.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: bc9bbb81730e ("x86: Fix get_wchan() to support the ORC unwinder")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
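For context: stack_trace_save_tsk() is only declared and built when
STACKTRACE=y; simplified, include/linux/stacktrace.h has:

	#ifdef CONFIG_STACKTRACE
	unsigned int stack_trace_save_tsk(struct task_struct *task,
					  unsigned long *store, unsigned int size,
					  unsigned int skipnr);
	#endif

so the previous __get_wchan() broke !STACKTRACE builds, whereas the
asm/unwind.h iterator used below is always available on x86.
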
 arch/x86/kernel/process.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,6 +43,7 @@
 #include <asm/io_bitmap.h>
 #include <asm/proto.h>
 #include <asm/frame.h>
+#include <asm/unwind.h>
 
 #include "process.h"
 
@@ -949,10 +950,20 @@ unsigned long arch_randomize_brk(struct
  */
 unsigned long __get_wchan(struct task_struct *p)
 {
-	unsigned long entry = 0;
+	struct unwind_state state;
+	unsigned long addr = 0;
 
-	stack_trace_save_tsk(p, &entry, 1, 0);
-	return entry;
+	for (unwind_start(&state, p, NULL, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr)
+			break;
+		if (in_sched_functions(addr))
+			continue;
+		break;
+	}
+
+	return addr;
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,




* [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-22 15:09 [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
  2021-10-22 15:09 ` [PATCH 1/7] x86: Fix __get_wchan() for !STACKTRACE Peter Zijlstra
@ 2021-10-22 15:09 ` Peter Zijlstra
  2021-10-22 16:25   ` Kees Cook
  2021-10-25 16:27   ` Peter Zijlstra
  2021-10-22 15:09 ` [PATCH 3/7] ARM: implement ARCH_STACKWALK Peter Zijlstra
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 15:09 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	mark.rutland, zhengqi.arch, linux, catalin.marinas, will, mpe,
	paul.walmsley, palmer, hca, gor, borntraeger, linux-arch, ardb

Recent patches to get_wchan() made it more robust by only doing the
unwind when the task was blocked and serialized against wakeups.

Extract this functionality as a simpler companion to task_call_func()
named task_try_func() that really only cares about blocked tasks. Then
employ this new function to implement the same robustness for
ARCH_STACKWALK based stack_trace_save_tsk().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
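A usage sketch (hypothetical caller; the real users are the wchan and
stacktrace sites below): @func only runs if the task is blocked and
wakeups are held off for the duration, otherwise -EBUSY comes back:

	static int read_blocked_state(struct task_struct *p, void *arg)
	{
		/* Runs with p->pi_lock held and p guaranteed blocked. */
		*(unsigned int *)arg = READ_ONCE(p->__state);
		return 0;
	}

	...
		unsigned int state;
		if (task_try_func(p, read_blocked_state, &state) == -EBUSY)
			return;	/* task was runnable/waking; nothing stable to read */
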
 include/linux/wait.h |    1 
 kernel/sched/core.c  |   62 ++++++++++++++++++++++++++++++++++++++++++++-------
 kernel/stacktrace.c  |   13 ++++++----
 3 files changed, 63 insertions(+), 13 deletions(-)

--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -1162,5 +1162,6 @@ int autoremove_wake_function(struct wait
 
 typedef int (*task_call_f)(struct task_struct *p, void *arg);
 extern int task_call_func(struct task_struct *p, task_call_f func, void *arg);
+extern int task_try_func(struct task_struct *p, task_call_f func, void *arg);
 
 #endif /* _LINUX_WAIT_H */
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1966,21 +1966,21 @@ bool sched_task_on_rq(struct task_struct
 	return task_on_rq_queued(p);
 }
 
+static int try_get_wchan(struct task_struct *p, void *arg)
+{
+	unsigned long *wchan = arg;
+	*wchan = __get_wchan(p);
+	return 0;
+}
+
 unsigned long get_wchan(struct task_struct *p)
 {
 	unsigned long ip = 0;
-	unsigned int state;
 
 	if (!p || p == current)
 		return 0;
 
-	/* Only get wchan if task is blocked and we can keep it that way. */
-	raw_spin_lock_irq(&p->pi_lock);
-	state = READ_ONCE(p->__state);
-	smp_rmb(); /* see try_to_wake_up() */
-	if (state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq)
-		ip = __get_wchan(p);
-	raw_spin_unlock_irq(&p->pi_lock);
+	task_try_func(p, try_get_wchan, &ip);
 
 	return ip;
 }
@@ -4184,6 +4184,52 @@ int task_call_func(struct task_struct *p
 	return ret;
 }
 
+/*
+ * task_try_func - Invoke a function on task in blocked state
+ * @p: Process for which the function is to be invoked
+ * @func: Function to invoke
+ * @arg: Argument to function
+ *
+ * Fix the task in a blocked state, when possible, and if so invoke @func on it.
+ *
+ * Returns:
+ *  -EBUSY or whatever @func returns
+ */
+int task_try_func(struct task_struct *p, task_call_f func, void *arg)
+{
+	unsigned long flags;
+	unsigned int state;
+	int ret = -EBUSY;
+
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+
+	state = READ_ONCE(p->__state);
+
+	/*
+	 * Ensure we load p->on_rq after p->__state, otherwise it would be
+	 * possible to, falsely, observe p->on_rq == 0.
+	 *
+	 * See try_to_wake_up() for a longer comment.
+	 */
+	smp_rmb();
+
+	/*
+	 * Since p->pi_lock blocks try_to_wake_up(), we don't need rq->lock when
+	 * the task is blocked. Make sure to check @state since ttwu() can drop
+	 * locks at the end, see ttwu_queue_wakelist().
+	 */
+	if (state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq) {
+		/*
+		 * The task is blocked and we're holding off wakeups. For any
+		 * of the other task states, see task_call_func().
+		 */
+		ret = func(p, arg);
+	}
+
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	return ret;
+}
+
 /**
  * wake_up_process - Wake up a specific process
  * @p: The process to be woken up.
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -123,6 +123,13 @@ unsigned int stack_trace_save(unsigned l
 }
 EXPORT_SYMBOL_GPL(stack_trace_save);
 
+static int try_arch_stack_walk_tsk(struct task_struct *tsk, void *arg)
+{
+	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
+	arch_stack_walk(consume_entry, arg, tsk, NULL);
+	return 0;
+}
+
 /**
  * stack_trace_save_tsk - Save a task stack trace into a storage array
  * @task:	The task to examine
@@ -135,7 +142,6 @@ EXPORT_SYMBOL_GPL(stack_trace_save);
 unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
 				  unsigned int size, unsigned int skipnr)
 {
-	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
 	struct stacktrace_cookie c = {
 		.store	= store,
 		.size	= size,
@@ -143,11 +149,8 @@ unsigned int stack_trace_save_tsk(struct
 		.skip	= skipnr + (current == tsk),
 	};
 
-	if (!try_get_task_stack(tsk))
-		return 0;
+	task_try_func(tsk, try_arch_stack_walk_tsk, &c);
 
-	arch_stack_walk(consume_entry, &c, tsk, NULL);
-	put_task_stack(tsk);
 	return c.len;
 }
 




* [PATCH 3/7] ARM: implement ARCH_STACKWALK
  2021-10-22 15:09 [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
  2021-10-22 15:09 ` [PATCH 1/7] x86: Fix __get_wchan() for !STACKTRACE Peter Zijlstra
  2021-10-22 15:09 ` [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust Peter Zijlstra
@ 2021-10-22 15:09 ` Peter Zijlstra
  2021-10-22 16:18   ` Kees Cook
  2021-10-22 15:09 ` [PATCH 4/7] arch: Make ARCH_STACKWALK independent of STACKTRACE Peter Zijlstra
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 15:09 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	mark.rutland, zhengqi.arch, linux, catalin.marinas, will, mpe,
	paul.walmsley, palmer, hca, gor, borntraeger, linux-arch, ardb

From: Ard Biesheuvel <ardb@kernel.org>

Implement the flavor of CONFIG_STACKTRACE that relies mostly on generic
code, and only needs a small arch_stack_walk() helper that performs the
actual frame unwinding.

Note that this removes the SMP check that used to live in
__save_stack_trace(), but this is no longer needed now that the generic
version of save_stack_trace_tsk() takes care not to walk the call stack
of tasks that are live on other CPUs.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
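The "mostly generic" part includes the nosched filtering that the
removed save_trace() did by hand; the common consumer in
kernel/stacktrace.c handles it for every architecture, roughly (a
simplified sketch, not the verbatim source):

	static bool stack_trace_consume_entry_nosched(void *cookie,
						      unsigned long addr)
	{
		if (in_sched_functions(addr))
			return true;	/* skip this entry, keep walking */
		return stack_trace_consume_entry(cookie, addr);
	}
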
 arch/arm/Kconfig             |    1 
 arch/arm/kernel/stacktrace.c |  114 +++++++------------------------------------
 2 files changed, 20 insertions(+), 95 deletions(-)

--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -24,6 +24,7 @@ config ARM
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG if CPU_V7 || CPU_V7M || CPU_V6K
+	select ARCH_STACKWALK
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_KEEP_MEMBLOCK
 	select ARCH_MIGHT_HAVE_PC_PARPORT
--- a/arch/arm/kernel/stacktrace.c
+++ b/arch/arm/kernel/stacktrace.c
@@ -2,6 +2,7 @@
 #include <linux/export.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/task_stack.h>
 #include <linux/stacktrace.h>
 
 #include <asm/sections.h>
@@ -87,106 +88,29 @@ void notrace walk_stackframe(struct stac
 EXPORT_SYMBOL(walk_stackframe);
 
 #ifdef CONFIG_STACKTRACE
-struct stack_trace_data {
-	struct stack_trace *trace;
-	unsigned int no_sched_functions;
-	unsigned int skip;
-};
-
-static int save_trace(struct stackframe *frame, void *d)
-{
-	struct stack_trace_data *data = d;
-	struct stack_trace *trace = data->trace;
-	struct pt_regs *regs;
-	unsigned long addr = frame->pc;
-
-	if (data->no_sched_functions && in_sched_functions(addr))
-		return 0;
-	if (data->skip) {
-		data->skip--;
-		return 0;
-	}
-
-	trace->entries[trace->nr_entries++] = addr;
-
-	if (trace->nr_entries >= trace->max_entries)
-		return 1;
-
-	if (!in_entry_text(frame->pc))
-		return 0;
-
-	regs = (struct pt_regs *)frame->sp;
-	if ((unsigned long)&regs[1] > ALIGN(frame->sp, THREAD_SIZE))
-		return 0;
-
-	trace->entries[trace->nr_entries++] = regs->ARM_pc;
-
-	return trace->nr_entries >= trace->max_entries;
-}
-
-/* This must be noinline to so that our skip calculation works correctly */
-static noinline void __save_stack_trace(struct task_struct *tsk,
-	struct stack_trace *trace, unsigned int nosched)
+noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
+				      void *cookie, struct task_struct *task,
+				      struct pt_regs *regs)
 {
-	struct stack_trace_data data;
 	struct stackframe frame;
 
-	data.trace = trace;
-	data.skip = trace->skip;
-	data.no_sched_functions = nosched;
-
-	if (tsk != current) {
-#ifdef CONFIG_SMP
-		/*
-		 * What guarantees do we have here that 'tsk' is not
-		 * running on another CPU?  For now, ignore it as we
-		 * can't guarantee we won't explode.
-		 */
-		return;
-#else
-		frame.fp = thread_saved_fp(tsk);
-		frame.sp = thread_saved_sp(tsk);
-		frame.lr = 0;		/* recovered from the stack */
-		frame.pc = thread_saved_pc(tsk);
-#endif
+	if (regs) {
+		frame.fp = IS_ENABLED(CONFIG_THUMB2_KERNEL) ? regs->ARM_r7
+							    : regs->ARM_fp;
+		frame.sp = regs->ARM_sp;
+		frame.lr = regs->ARM_lr;
+		frame.pc = regs->ARM_pc;
 	} else {
-		/* We don't want this function nor the caller */
-		data.skip += 2;
-		frame.fp = (unsigned long)__builtin_frame_address(0);
-		frame.sp = current_stack_pointer;
-		frame.lr = (unsigned long)__builtin_return_address(0);
-		frame.pc = (unsigned long)__save_stack_trace;
+		frame.fp = thread_saved_fp(task);
+		frame.sp = thread_saved_sp(task);
+		frame.lr = 0;                   /* recovered from the stack */
+		frame.pc = thread_saved_pc(task);
 	}
 
-	walk_stackframe(&frame, save_trace, &data);
-}
-
-void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
-{
-	struct stack_trace_data data;
-	struct stackframe frame;
-
-	data.trace = trace;
-	data.skip = trace->skip;
-	data.no_sched_functions = 0;
-
-	frame.fp = regs->ARM_fp;
-	frame.sp = regs->ARM_sp;
-	frame.lr = regs->ARM_lr;
-	frame.pc = regs->ARM_pc;
-
-	walk_stackframe(&frame, save_trace, &data);
-}
-
-void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
-{
-	__save_stack_trace(tsk, trace, 1);
-}
-EXPORT_SYMBOL(save_stack_trace_tsk);
-
-void save_stack_trace(struct stack_trace *trace)
-{
-	__save_stack_trace(current, trace, 0);
+	for (;;) {
+		if (unwind_frame(&frame) < 0 ||
+		    !consume_entry(cookie, frame.pc))
+			break;
+	}
 }
-EXPORT_SYMBOL_GPL(save_stack_trace);
 #endif




* [PATCH 4/7] arch: Make ARCH_STACKWALK independent of STACKTRACE
  2021-10-22 15:09 [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-10-22 15:09 ` [PATCH 3/7] ARM: implement ARCH_STACKWALK Peter Zijlstra
@ 2021-10-22 15:09 ` Peter Zijlstra
  2021-10-22 16:18   ` Kees Cook
  2021-10-22 17:06   ` Mark Rutland
  2021-10-22 15:09 ` [PATCH 5/7] powerpc, arm64: Mark __switch_to() as __sched Peter Zijlstra
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 15:09 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	mark.rutland, zhengqi.arch, linux, catalin.marinas, will, mpe,
	paul.walmsley, palmer, hca, gor, borntraeger, linux-arch, ardb

Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
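A hypothetical illustration of what this enables: with STACKTRACE=n on
an ARCH_STACKWALK architecture, core code can still walk a task's stack
directly (count_entry is made up for the example):

	static bool count_entry(void *cookie, unsigned long addr)
	{
		(*(unsigned int *)cookie)++;	/* just count the frames */
		return true;
	}

	...
		unsigned int depth = 0;
		arch_stack_walk(count_entry, &depth, task, NULL);

Patch 6 relies on exactly this to build a generic __get_wchan().
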
 arch/arm/kernel/stacktrace.c   |    2 --
 arch/arm64/kernel/stacktrace.c |    4 ----
 arch/powerpc/kernel/Makefile   |    3 +--
 arch/riscv/kernel/stacktrace.c |    4 ----
 arch/s390/kernel/Makefile      |    3 +--
 arch/x86/kernel/Makefile       |    2 +-
 include/linux/stacktrace.h     |   33 +++++++++++++++++----------------
 7 files changed, 20 insertions(+), 31 deletions(-)

--- a/arch/arm/kernel/stacktrace.c
+++ b/arch/arm/kernel/stacktrace.c
@@ -87,7 +87,6 @@ void notrace walk_stackframe(struct stac
 }
 EXPORT_SYMBOL(walk_stackframe);
 
-#ifdef CONFIG_STACKTRACE
 noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
 				      void *cookie, struct task_struct *task,
 				      struct pt_regs *regs)
@@ -113,4 +112,3 @@ noinline notrace void arch_stack_walk(st
 			break;
 	}
 }
-#endif
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -216,8 +216,6 @@ void show_stack(struct task_struct *tsk,
 	barrier();
 }
 
-#ifdef CONFIG_STACKTRACE
-
 noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
 			      void *cookie, struct task_struct *task,
 			      struct pt_regs *regs)
@@ -236,5 +234,3 @@ noinline notrace void arch_stack_walk(st
 
 	walk_stackframe(task, &frame, consume_entry, cookie);
 }
-
-#endif
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -47,7 +47,7 @@ obj-y				:= cputable.o syscalls.o \
 				   udbg.o misc.o io.o misc_$(BITS).o \
 				   of_platform.o prom_parse.o firmware.o \
 				   hw_breakpoint_constraints.o interrupt.o \
-				   kdebugfs.o
+				   kdebugfs.o stacktrace.o
 obj-y				+= ptrace/
 obj-$(CONFIG_PPC64)		+= setup_64.o \
 				   paca.o nvram_64.o note.o
@@ -116,7 +116,6 @@ obj-$(CONFIG_OPTPROBES)		+= optprobes.o
 obj-$(CONFIG_KPROBES_ON_FTRACE)	+= kprobes-ftrace.o
 obj-$(CONFIG_UPROBES)		+= uprobes.o
 obj-$(CONFIG_PPC_UDBG_16550)	+= legacy_serial.o udbg_16550.o
-obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-$(CONFIG_SWIOTLB)		+= dma-swiotlb.o
 obj-$(CONFIG_ARCH_HAS_DMA_SET_MASK) += dma-mask.o
 
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -139,12 +139,8 @@ unsigned long __get_wchan(struct task_st
 	return pc;
 }
 
-#ifdef CONFIG_STACKTRACE
-
 noinline void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 		     struct task_struct *task, struct pt_regs *regs)
 {
 	walk_stackframe(task, regs, consume_entry, cookie);
 }
-
-#endif /* CONFIG_STACKTRACE */
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -40,7 +40,7 @@ obj-y	+= sysinfo.o lgr.o os_info.o machi
 obj-y	+= runtime_instr.o cache.o fpu.o dumpstack.o guarded_storage.o sthyi.o
 obj-y	+= entry.o reipl.o relocate_kernel.o kdebugfs.o alternative.o
 obj-y	+= nospec-branch.o ipl_vmparm.o machine_kexec_reloc.o unwind_bc.o
-obj-y	+= smp.o text_amode31.o
+obj-y	+= smp.o text_amode31.o stacktrace.o
 
 extra-y				+= head64.o vmlinux.lds
 
@@ -55,7 +55,6 @@ compat-obj-$(CONFIG_AUDIT)	+= compat_aud
 obj-$(CONFIG_COMPAT)		+= compat_linux.o compat_signal.o
 obj-$(CONFIG_COMPAT)		+= $(compat-obj-y)
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
-obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-$(CONFIG_KPROBES)		+= kprobes.o
 obj-$(CONFIG_KPROBES)		+= kprobes_insn_page.o
 obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -84,7 +84,7 @@ obj-$(CONFIG_IA32_EMULATION)	+= tls.o
 obj-y				+= step.o
 obj-$(CONFIG_INTEL_TXT)		+= tboot.o
 obj-$(CONFIG_ISA_DMA_API)	+= i8237.o
-obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
+obj-y				+= stacktrace.o
 obj-y				+= cpu/
 obj-y				+= acpi/
 obj-y				+= reboot.o
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -8,21 +8,6 @@
 struct task_struct;
 struct pt_regs;
 
-#ifdef CONFIG_STACKTRACE
-void stack_trace_print(const unsigned long *trace, unsigned int nr_entries,
-		       int spaces);
-int stack_trace_snprint(char *buf, size_t size, const unsigned long *entries,
-			unsigned int nr_entries, int spaces);
-unsigned int stack_trace_save(unsigned long *store, unsigned int size,
-			      unsigned int skipnr);
-unsigned int stack_trace_save_tsk(struct task_struct *task,
-				  unsigned long *store, unsigned int size,
-				  unsigned int skipnr);
-unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
-				   unsigned int size, unsigned int skipnr);
-unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
-
-/* Internal interfaces. Do not use in generic code */
 #ifdef CONFIG_ARCH_STACKWALK
 
 /**
@@ -75,8 +60,24 @@ int arch_stack_walk_reliable(stack_trace
 
 void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
 			  const struct pt_regs *regs);
+#endif /* CONFIG_ARCH_STACKWALK */
 
-#else /* CONFIG_ARCH_STACKWALK */
+#ifdef CONFIG_STACKTRACE
+void stack_trace_print(const unsigned long *trace, unsigned int nr_entries,
+		       int spaces);
+int stack_trace_snprint(char *buf, size_t size, const unsigned long *entries,
+			unsigned int nr_entries, int spaces);
+unsigned int stack_trace_save(unsigned long *store, unsigned int size,
+			      unsigned int skipnr);
+unsigned int stack_trace_save_tsk(struct task_struct *task,
+				  unsigned long *store, unsigned int size,
+				  unsigned int skipnr);
+unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
+				   unsigned int size, unsigned int skipnr);
+unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
+
+#ifndef CONFIG_ARCH_STACKWALK
+/* Internal interfaces. Do not use in generic code */
 struct stack_trace {
 	unsigned int nr_entries, max_entries;
 	unsigned long *entries;




* [PATCH 5/7] powerpc, arm64: Mark __switch_to() as __sched
  2021-10-22 15:09 [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
                   ` (3 preceding siblings ...)
  2021-10-22 15:09 ` [PATCH 4/7] arch: Make ARCH_STACKWALK independent of STACKTRACE Peter Zijlstra
@ 2021-10-22 15:09 ` Peter Zijlstra
  2021-10-22 16:15   ` Kees Cook
  2021-10-22 17:40   ` Mark Rutland
  2021-10-22 15:09 ` [PATCH 6/7] arch: __get_wchan() || ARCH_STACKWALK Peter Zijlstra
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 15:09 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	mark.rutland, zhengqi.arch, linux, catalin.marinas, will, mpe,
	paul.walmsley, palmer, hca, gor, borntraeger, linux-arch, ardb

Unlike most of the other architectures, PowerPC and ARM64 have
__switch_to() as a C function which remains on the stack. Their
respective __get_wchan() implementations skip one stack frame
unconditionally, without testing in_sched_functions().

Mark them __sched such that we can forgo that special case.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
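The special case exists because __get_wchan() filters frames with
in_sched_functions(), which is essentially a text-range check; roughly
(simplified from kernel/sched/core.c):

	int in_sched_functions(unsigned long addr)
	{
		return in_lock_functions(addr) ||
			(addr >= (unsigned long)__sched_text_start &&
			 addr < (unsigned long)__sched_text_end);
	}

__sched places a function in .sched.text, i.e. inside that range, so a
__sched __switch_to() gets filtered like schedule() itself.
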
 arch/arm64/kernel/process.c   |    4 ++--
 arch/powerpc/kernel/process.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -490,8 +490,8 @@ void update_sctlr_el1(u64 sctlr)
 /*
  * Thread switching.
  */
-__notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
-				struct task_struct *next)
+__notrace_funcgraph __sched
+struct task_struct *__switch_to(struct task_struct *prev, struct task_struct *next)
 {
 	struct task_struct *last;
 
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1201,8 +1201,8 @@ static inline void restore_sprs(struct t
 
 }
 
-struct task_struct *__switch_to(struct task_struct *prev,
-	struct task_struct *new)
+__sched struct task_struct *__switch_to(struct task_struct *prev,
+					struct task_struct *new)
 {
 	struct thread_struct *new_thread, *old_thread;
 	struct task_struct *last;




* [PATCH 6/7] arch: __get_wchan() || ARCH_STACKWALK
  2021-10-22 15:09 [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
                   ` (4 preceding siblings ...)
  2021-10-22 15:09 ` [PATCH 5/7] powerpc, arm64: Mark __switch_to() as __sched Peter Zijlstra
@ 2021-10-22 15:09 ` Peter Zijlstra
  2021-10-22 16:13   ` Kees Cook
  2021-10-22 17:52   ` Mark Rutland
  2021-10-22 15:09 ` [PATCH 7/7] selftests: proc: Make sure wchan works when it exists Peter Zijlstra
  2021-10-22 15:27 ` [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
  7 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 15:09 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	mark.rutland, zhengqi.arch, linux, catalin.marinas, will, mpe,
	paul.walmsley, palmer, hca, gor, borntraeger, linux-arch, ardb

Use ARCH_STACKWALK to implement a generic __get_wchan().

STACKTRACE should be possible, but the various implementations of
stack_trace_save_tsk() are not consistent enough for this to work.
ARCH_STACKWALK is a smaller set of architectures with a better defined
interface.

Since get_wchan() pins the task in a blocked state, it is not
necessary to take a reference on the task stack; the task isn't going
anywhere.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
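To make the "task isn't going anywhere" argument concrete, the
resulting call chain on an ARCH_STACKWALK architecture looks like this
(sketch):

	get_wchan(p)
	  task_try_func(p, try_get_wchan, &ip)	/* pins p blocked, pi_lock held */
	    try_get_wchan(p, &ip)
	      __get_wchan(p)
	        arch_stack_walk(consume_wchan, &wchan, p, NULL)

With wakeups held off, p cannot run or exit underneath us, so its stack
cannot be freed; hence all the try_get_task_stack()/put_task_stack()
pairs below can go.
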
 arch/arm/include/asm/processor.h     |    2 -
 arch/arm/kernel/process.c            |   22 --------------------
 arch/arm64/include/asm/processor.h   |    2 -
 arch/arm64/kernel/process.c          |   26 ------------------------
 arch/powerpc/include/asm/processor.h |    2 -
 arch/powerpc/kernel/process.c        |   37 -----------------------------------
 arch/riscv/include/asm/processor.h   |    3 --
 arch/riscv/kernel/stacktrace.c       |   21 -------------------
 arch/s390/include/asm/processor.h    |    1 
 arch/s390/kernel/process.c           |   29 ---------------------------
 arch/x86/include/asm/processor.h     |    2 -
 arch/x86/kernel/process.c            |   25 -----------------------
 kernel/sched/core.c                  |   24 ++++++++++++++++++++++
 13 files changed, 24 insertions(+), 172 deletions(-)

--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -84,8 +84,6 @@ struct task_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-unsigned long __get_wchan(struct task_struct *p);
-
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
 
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -276,28 +276,6 @@ int copy_thread(unsigned long clone_flag
 	return 0;
 }
 
-unsigned long __get_wchan(struct task_struct *p)
-{
-	struct stackframe frame;
-	unsigned long stack_page;
-	int count = 0;
-
-	frame.fp = thread_saved_fp(p);
-	frame.sp = thread_saved_sp(p);
-	frame.lr = 0;			/* recovered from the stack */
-	frame.pc = thread_saved_pc(p);
-	stack_page = (unsigned long)task_stack_page(p);
-	do {
-		if (frame.sp < stack_page ||
-		    frame.sp >= stack_page + THREAD_SIZE ||
-		    unwind_frame(&frame) < 0)
-			return 0;
-		if (!in_sched_functions(frame.pc))
-			return frame.pc;
-	} while (count ++ < 16);
-	return 0;
-}
-
 #ifdef CONFIG_MMU
 #ifdef CONFIG_KUSER_HELPERS
 /*
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -257,8 +257,6 @@ struct task_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-unsigned long __get_wchan(struct task_struct *p);
-
 void update_sctlr_el1(u64 sctlr);
 
 /* Thread switching */
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -528,32 +528,6 @@ struct task_struct *__switch_to(struct t
 	return last;
 }
 
-unsigned long __get_wchan(struct task_struct *p)
-{
-	struct stackframe frame;
-	unsigned long stack_page, ret = 0;
-	int count = 0;
-
-	stack_page = (unsigned long)try_get_task_stack(p);
-	if (!stack_page)
-		return 0;
-
-	start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
-
-	do {
-		if (unwind_frame(p, &frame))
-			goto out;
-		if (!in_sched_functions(frame.pc)) {
-			ret = frame.pc;
-			goto out;
-		}
-	} while (count++ < 16);
-
-out:
-	put_task_stack(p);
-	return ret;
-}
-
 unsigned long arch_align_stack(unsigned long sp)
 {
 	if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -300,8 +300,6 @@ struct thread_struct {
 
 #define task_pt_regs(tsk)	((tsk)->thread.regs)
 
-unsigned long __get_wchan(struct task_struct *p);
-
 #define KSTK_EIP(tsk)  ((tsk)->thread.regs? (tsk)->thread.regs->nip: 0)
 #define KSTK_ESP(tsk)  ((tsk)->thread.regs? (tsk)->thread.regs->gpr[1]: 0)
 
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2111,43 +2111,6 @@ int validate_sp(unsigned long sp, struct
 
 EXPORT_SYMBOL(validate_sp);
 
-static unsigned long ___get_wchan(struct task_struct *p)
-{
-	unsigned long ip, sp;
-	int count = 0;
-
-	sp = p->thread.ksp;
-	if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD))
-		return 0;
-
-	do {
-		sp = *(unsigned long *)sp;
-		if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD) ||
-		    task_is_running(p))
-			return 0;
-		if (count > 0) {
-			ip = ((unsigned long *)sp)[STACK_FRAME_LR_SAVE];
-			if (!in_sched_functions(ip))
-				return ip;
-		}
-	} while (count++ < 16);
-	return 0;
-}
-
-unsigned long __get_wchan(struct task_struct *p)
-{
-	unsigned long ret;
-
-	if (!try_get_task_stack(p))
-		return 0;
-
-	ret = ___get_wchan(p);
-
-	put_task_stack(p);
-
-	return ret;
-}
-
 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
 void __no_sanitize_address show_stack(struct task_struct *tsk,
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -66,9 +66,6 @@ static inline void release_thread(struct
 {
 }
 
-extern unsigned long __get_wchan(struct task_struct *p);
-
-
 static inline void wait_for_interrupt(void)
 {
 	__asm__ __volatile__ ("wfi");
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -118,27 +118,6 @@ void show_stack(struct task_struct *task
 	dump_backtrace(NULL, task, loglvl);
 }
 
-static bool save_wchan(void *arg, unsigned long pc)
-{
-	if (!in_sched_functions(pc)) {
-		unsigned long *p = arg;
-		*p = pc;
-		return false;
-	}
-	return true;
-}
-
-unsigned long __get_wchan(struct task_struct *task)
-{
-	unsigned long pc = 0;
-
-	if (!try_get_task_stack(task))
-		return 0;
-	walk_stackframe(task, NULL, save_wchan, &pc);
-	put_task_stack(task);
-	return pc;
-}
-
 noinline void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 		     struct task_struct *task, struct pt_regs *regs)
 {
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -192,7 +192,6 @@ static inline void release_thread(struct
 void guarded_storage_release(struct task_struct *tsk);
 void gs_load_bc_cb(struct pt_regs *regs);
 
-unsigned long __get_wchan(struct task_struct *p);
 #define task_pt_regs(tsk) ((struct pt_regs *) \
         (task_stack_page(tsk) + THREAD_SIZE) - 1)
 #define KSTK_EIP(tsk)	(task_pt_regs(tsk)->psw.addr)
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -181,35 +181,6 @@ void execve_tail(void)
 	asm volatile("sfpc %0" : : "d" (0));
 }
 
-unsigned long __get_wchan(struct task_struct *p)
-{
-	struct unwind_state state;
-	unsigned long ip = 0;
-
-	if (!task_stack_page(p))
-		return 0;
-
-	if (!try_get_task_stack(p))
-		return 0;
-
-	unwind_for_each_frame(&state, p, NULL, 0) {
-		if (state.stack_info.type != STACK_TYPE_TASK) {
-			ip = 0;
-			break;
-		}
-
-		ip = unwind_get_return_address(&state);
-		if (!ip)
-			break;
-
-		if (!in_sched_functions(ip))
-			break;
-	}
-
-	put_task_stack(p);
-	return ip;
-}
-
 unsigned long arch_align_stack(unsigned long sp)
 {
 	if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -588,8 +588,6 @@ static inline void load_sp0(unsigned lon
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-unsigned long __get_wchan(struct task_struct *p);
-
 /*
  * Generic CPUID function
  * clear %ecx since some cpus (Cyrix MII) do not set or clear %ecx
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,7 +43,6 @@
 #include <asm/io_bitmap.h>
 #include <asm/proto.h>
 #include <asm/frame.h>
-#include <asm/unwind.h>
 
 #include "process.h"
 
@@ -942,30 +941,6 @@ unsigned long arch_randomize_brk(struct
 	return randomize_page(mm->brk, 0x02000000);
 }
 
-/*
- * Called from fs/proc with a reference on @p to find the function
- * which called into schedule(). This needs to be done carefully
- * because the task might wake up and we might look at a stack
- * changing under us.
- */
-unsigned long __get_wchan(struct task_struct *p)
-{
-	struct unwind_state state;
-	unsigned long addr = 0;
-
-	for (unwind_start(&state, p, NULL, NULL); !unwind_done(&state);
-	     unwind_next_frame(&state)) {
-		addr = unwind_get_return_address(&state);
-		if (!addr)
-			break;
-		if (in_sched_functions(addr))
-			continue;
-		break;
-	}
-
-	return addr;
-}
-
 long do_arch_prctl_common(struct task_struct *task, int option,
 			  unsigned long cpuid_enabled)
 {
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1966,6 +1966,30 @@ bool sched_task_on_rq(struct task_struct
 	return task_on_rq_queued(p);
 }
 
+#ifdef CONFIG_ARCH_STACKWALK
+
+static bool consume_wchan(void *cookie, unsigned long addr)
+{
+	unsigned long *wchan = cookie;
+
+	if (in_sched_functions(addr))
+		return true;
+
+	*wchan = addr;
+	return false;
+}
+
+static unsigned long __get_wchan(struct task_struct *p)
+{
+	unsigned long wchan = 0;
+
+	arch_stack_walk(consume_wchan, &wchan, p, NULL);
+
+	return wchan;
+}
+
+#endif /* CONFIG_ARCH_STACKWALK */
+
 static int try_get_wchan(struct task_struct *p, void *arg)
 {
 	unsigned long *wchan = arg;




* [PATCH 7/7] selftests: proc: Make sure wchan works when it exists
  2021-10-22 15:09 [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
                   ` (5 preceding siblings ...)
  2021-10-22 15:09 ` [PATCH 6/7] arch: __get_wchan() || ARCH_STACKWALK Peter Zijlstra
@ 2021-10-22 15:09 ` Peter Zijlstra
  2021-10-22 15:27 ` [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
  7 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 15:09 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	mark.rutland, zhengqi.arch, linux, catalin.marinas, will, mpe,
	paul.walmsley, palmer, hca, gor, borntraeger, linux-arch, ardb

From: Kees Cook <keescook@chromium.org>

This makes sure that wchan contains a sensible symbol when a process is
blocked. Specifically this calls sleep() (which ends up in the
nanosleep() syscall), and expects the
architecture to have called schedule() from a function that has "sleep"
somewhere in its name. For example, on the architectures I tested
(x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":

$ tools/testing/selftests/proc/proc-pid-wchan
ok: found 'sleep' in wchan 'hrtimer_nanosleep'

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211008235504.2957528-1-keescook@chromium.org
---
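Besides running the binary directly as shown above, the test can be
driven through the standard kselftest harness, e.g.:

	$ make -C tools/testing/selftests TARGETS=proc run_tests
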
 tools/testing/selftests/proc/Makefile         |    1 
 tools/testing/selftests/proc/proc-pid-wchan.c |   69 ++++++++++++++++++++++++++
 2 files changed, 70 insertions(+)
 create mode 100644 tools/testing/selftests/proc/proc-pid-wchan.c

--- a/tools/testing/selftests/proc/Makefile
+++ b/tools/testing/selftests/proc/Makefile
@@ -8,6 +8,7 @@ TEST_GEN_PROGS += fd-002-posix-eq
 TEST_GEN_PROGS += fd-003-kthread
 TEST_GEN_PROGS += proc-loadavg-001
 TEST_GEN_PROGS += proc-pid-vm
+TEST_GEN_PROGS += proc-pid-wchan
 TEST_GEN_PROGS += proc-self-map-files-001
 TEST_GEN_PROGS += proc-self-map-files-002
 TEST_GEN_PROGS += proc-self-syscall
--- /dev/null
+++ b/tools/testing/selftests/proc/proc-pid-wchan.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Make sure that wchan returns a reasonable symbol when blocked.
+ */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/wait.h>
+
+#define perror_exit(str) do { perror(str); _exit(1); } while (0)
+
+int main(void)
+{
+	char buf[64];
+	pid_t child;
+	int sync[2], fd;
+
+	if (pipe(sync) < 0)
+		perror_exit("pipe");
+
+	child = fork();
+	if (child < 0)
+		perror_exit("fork");
+	if (child == 0) {
+		/* Child */
+		if (close(sync[0]) < 0)
+			perror_exit("child close sync[0]");
+		if (close(sync[1]) < 0)
+			perror_exit("child close sync[1]");
+		sleep(10);
+		_exit(0);
+	}
+	/* Parent */
+	if (close(sync[1]) < 0)
+		perror_exit("parent close sync[1]");
+	if (read(sync[0], buf, 1) != 0)
+		perror_exit("parent read sync[0]");
+
+	snprintf(buf, sizeof(buf), "/proc/%d/wchan", child);
+	fd = open(buf, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT)
+			return 4;
+		perror_exit(buf);
+	}
+
+	memset(buf, 0, sizeof(buf));
+	if (read(fd, buf, sizeof(buf) - 1) < 1)
+		perror_exit(buf);
+	if (strstr(buf, "sleep") == NULL) {
+		fprintf(stderr, "FAIL: did not find 'sleep' in wchan '%s'\n", buf);
+		return 1;
+	}
+	printf("ok: found 'sleep' in wchan '%s'\n", buf);
+
+	if (kill(child, SIGKILL) < 0)
+		perror_exit("kill");
+	if (waitpid(child, NULL, 0) != child) {
+		fprintf(stderr, "waitpid: got the wrong child!?\n");
+		return 1;
+	}
+
+	return 0;
+}




* Re: [PATCH 0/7] arch: More wchan fixes
  2021-10-22 15:09 [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
                   ` (6 preceding siblings ...)
  2021-10-22 15:09 ` [PATCH 7/7] selftests: proc: Make sure wchan works when it exists Peter Zijlstra
@ 2021-10-22 15:27 ` Peter Zijlstra
  7 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 15:27 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb, Josh Poimboeuf


*sigh*,...

On Fri, Oct 22, 2021 at 05:09:33PM +0200, Peter Zijlstra wrote:
> Hi,
> 
> Respin of the rest of the get_wchan() rework [1]. Given the many problems with
> STACKTRACE this series focuses on the newer ARCH_STACKWALK since that's a
> smaller set of architectures to review with a better specified interface.
> 
> The idea is to piece-wise convert the rest of the architectures to
> ARCH_STACKWALK, but that can be done at leisure.
> 
> PowerPC and ARM64 are a bit funny in that their __switch_to() is in C and ends
> up on the stack. For now simply mark it __sched to make it work. Mark wanted to
> maybe do something different, but I think this ought to get us started.


[1] https://lkml.kernel.org/r/20211008111527.438276127@infradead.org

Patches are also available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/wchan


* Re: [PATCH 6/7] arch: __get_wchan() || ARCH_STACKWALK
  2021-10-22 15:09 ` [PATCH 6/7] arch: __get_wchan() || ARCH_STACKWALK Peter Zijlstra
@ 2021-10-22 16:13   ` Kees Cook
  2021-10-22 17:52   ` Mark Rutland
  1 sibling, 0 replies; 28+ messages in thread
From: Kees Cook @ 2021-10-22 16:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 05:09:39PM +0200, Peter Zijlstra wrote:
> Use ARCH_STACKWALK to implement a generic __get_wchan().
> 
> STACKTRACE should be possible, but the various implementations of
> stack_trace_save_tsk() are not consistent enough for this to work.
> ARCH_STACKWALK is a smaller set of architectures with a better defined
> interface.
> 
> Since get_wchan() pins the task in a blocked state, it is not
> necessary to take a reference on the task stack; the task isn't going
> anywhere.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Nice, this looks good.

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook


* Re: [PATCH 5/7] powerpc, arm64: Mark __switch_to() as __sched
  2021-10-22 15:09 ` [PATCH 5/7] powerpc, arm64: Mark __switch_to() as __sched Peter Zijlstra
@ 2021-10-22 16:15   ` Kees Cook
  2021-10-22 17:40   ` Mark Rutland
  1 sibling, 0 replies; 28+ messages in thread
From: Kees Cook @ 2021-10-22 16:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 05:09:38PM +0200, Peter Zijlstra wrote:
> Unlike most of the other architectures, PowerPC and ARM64 have
> __switch_to() as a C function which remains on the stack. Their
> respective __get_wchan() implementations skip one stack frame
> unconditionally, without testing in_sched_functions().
> 
> Mark them __sched such that we can forgo that special case.

I wonder if this change will improve any benchmarks? (i.e. this will
move __switch_to into the scheduler section, maybe improving icache?)

Reviewed-by: Kees Cook <keescook@chromium.org>

> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm64/kernel/process.c   |    4 ++--
>  arch/powerpc/kernel/process.c |    4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -490,8 +490,8 @@ void update_sctlr_el1(u64 sctlr)
>  /*
>   * Thread switching.
>   */
> -__notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> -				struct task_struct *next)
> +__notrace_funcgraph __sched
> +struct task_struct *__switch_to(struct task_struct *prev, struct task_struct *next)
>  {
>  	struct task_struct *last;
>  
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1201,8 +1201,8 @@ static inline void restore_sprs(struct t
>  
>  }
>  
> -struct task_struct *__switch_to(struct task_struct *prev,
> -	struct task_struct *new)
> +__sched struct task_struct *__switch_to(struct task_struct *prev,
> +					struct task_struct *new)
>  {
>  	struct thread_struct *new_thread, *old_thread;
>  	struct task_struct *last;
> 
> 

-- 
Kees Cook


* Re: [PATCH 4/7] arch: Make ARCH_STACKWALK independent of STACKTRACE
  2021-10-22 15:09 ` [PATCH 4/7] arch: Make ARCH_STACKWALK independent of STACKTRACE Peter Zijlstra
@ 2021-10-22 16:18   ` Kees Cook
  2021-10-22 16:36     ` Peter Zijlstra
  2021-10-22 17:06   ` Mark Rutland
  1 sibling, 1 reply; 28+ messages in thread
From: Kees Cook @ 2021-10-22 16:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 05:09:37PM +0200, Peter Zijlstra wrote:
> Make arch_stack_walk() available for ARCH_STACKWALK architectures
> without it being entangled in STACKTRACE.

Which CONFIG/arch combos did you build test with this? It looks good,
but I always expect things like this to end up landing in corner cases.
:)

Reviewed-by: Kees Cook <keescook@chromium.org>

> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm/kernel/stacktrace.c   |    2 --
>  arch/arm64/kernel/stacktrace.c |    4 ----
>  arch/powerpc/kernel/Makefile   |    3 +--
>  arch/riscv/kernel/stacktrace.c |    4 ----
>  arch/s390/kernel/Makefile      |    3 +--
>  arch/x86/kernel/Makefile       |    2 +-
>  include/linux/stacktrace.h     |   33 +++++++++++++++++----------------
>  7 files changed, 20 insertions(+), 31 deletions(-)
> 
> --- a/arch/arm/kernel/stacktrace.c
> +++ b/arch/arm/kernel/stacktrace.c
> @@ -87,7 +87,6 @@ void notrace walk_stackframe(struct stac
>  }
>  EXPORT_SYMBOL(walk_stackframe);
>  
> -#ifdef CONFIG_STACKTRACE
>  noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
>  				      void *cookie, struct task_struct *task,
>  				      struct pt_regs *regs)
> @@ -113,4 +112,3 @@ noinline notrace void arch_stack_walk(st
>  			break;
>  	}
>  }
> -#endif
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -216,8 +216,6 @@ void show_stack(struct task_struct *tsk,
>  	barrier();
>  }
>  
> -#ifdef CONFIG_STACKTRACE
> -
>  noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
>  			      void *cookie, struct task_struct *task,
>  			      struct pt_regs *regs)
> @@ -236,5 +234,3 @@ noinline notrace void arch_stack_walk(st
>  
>  	walk_stackframe(task, &frame, consume_entry, cookie);
>  }
> -
> -#endif
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -47,7 +47,7 @@ obj-y				:= cputable.o syscalls.o \
>  				   udbg.o misc.o io.o misc_$(BITS).o \
>  				   of_platform.o prom_parse.o firmware.o \
>  				   hw_breakpoint_constraints.o interrupt.o \
> -				   kdebugfs.o
> +				   kdebugfs.o stacktrace.o
>  obj-y				+= ptrace/
>  obj-$(CONFIG_PPC64)		+= setup_64.o \
>  				   paca.o nvram_64.o note.o
> @@ -116,7 +116,6 @@ obj-$(CONFIG_OPTPROBES)		+= optprobes.o
>  obj-$(CONFIG_KPROBES_ON_FTRACE)	+= kprobes-ftrace.o
>  obj-$(CONFIG_UPROBES)		+= uprobes.o
>  obj-$(CONFIG_PPC_UDBG_16550)	+= legacy_serial.o udbg_16550.o
> -obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
>  obj-$(CONFIG_SWIOTLB)		+= dma-swiotlb.o
>  obj-$(CONFIG_ARCH_HAS_DMA_SET_MASK) += dma-mask.o
>  
> --- a/arch/riscv/kernel/stacktrace.c
> +++ b/arch/riscv/kernel/stacktrace.c
> @@ -139,12 +139,8 @@ unsigned long __get_wchan(struct task_st
>  	return pc;
>  }
>  
> -#ifdef CONFIG_STACKTRACE
> -
>  noinline void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
>  		     struct task_struct *task, struct pt_regs *regs)
>  {
>  	walk_stackframe(task, regs, consume_entry, cookie);
>  }
> -
> -#endif /* CONFIG_STACKTRACE */
> --- a/arch/s390/kernel/Makefile
> +++ b/arch/s390/kernel/Makefile
> @@ -40,7 +40,7 @@ obj-y	+= sysinfo.o lgr.o os_info.o machi
>  obj-y	+= runtime_instr.o cache.o fpu.o dumpstack.o guarded_storage.o sthyi.o
>  obj-y	+= entry.o reipl.o relocate_kernel.o kdebugfs.o alternative.o
>  obj-y	+= nospec-branch.o ipl_vmparm.o machine_kexec_reloc.o unwind_bc.o
> -obj-y	+= smp.o text_amode31.o
> +obj-y	+= smp.o text_amode31.o stacktrace.o
>  
>  extra-y				+= head64.o vmlinux.lds
>  
> @@ -55,7 +55,6 @@ compat-obj-$(CONFIG_AUDIT)	+= compat_aud
>  obj-$(CONFIG_COMPAT)		+= compat_linux.o compat_signal.o
>  obj-$(CONFIG_COMPAT)		+= $(compat-obj-y)
>  obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
> -obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
>  obj-$(CONFIG_KPROBES)		+= kprobes.o
>  obj-$(CONFIG_KPROBES)		+= kprobes_insn_page.o
>  obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -84,7 +84,7 @@ obj-$(CONFIG_IA32_EMULATION)	+= tls.o
>  obj-y				+= step.o
>  obj-$(CONFIG_INTEL_TXT)		+= tboot.o
>  obj-$(CONFIG_ISA_DMA_API)	+= i8237.o
> -obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
> +obj-y				+= stacktrace.o
>  obj-y				+= cpu/
>  obj-y				+= acpi/
>  obj-y				+= reboot.o
> --- a/include/linux/stacktrace.h
> +++ b/include/linux/stacktrace.h
> @@ -8,21 +8,6 @@
>  struct task_struct;
>  struct pt_regs;
>  
> -#ifdef CONFIG_STACKTRACE
> -void stack_trace_print(const unsigned long *trace, unsigned int nr_entries,
> -		       int spaces);
> -int stack_trace_snprint(char *buf, size_t size, const unsigned long *entries,
> -			unsigned int nr_entries, int spaces);
> -unsigned int stack_trace_save(unsigned long *store, unsigned int size,
> -			      unsigned int skipnr);
> -unsigned int stack_trace_save_tsk(struct task_struct *task,
> -				  unsigned long *store, unsigned int size,
> -				  unsigned int skipnr);
> -unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
> -				   unsigned int size, unsigned int skipnr);
> -unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
> -
> -/* Internal interfaces. Do not use in generic code */
>  #ifdef CONFIG_ARCH_STACKWALK
>  
>  /**
> @@ -75,8 +60,24 @@ int arch_stack_walk_reliable(stack_trace
>  
>  void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
>  			  const struct pt_regs *regs);
> +#endif /* CONFIG_ARCH_STACKWALK */
>  
> -#else /* CONFIG_ARCH_STACKWALK */
> +#ifdef CONFIG_STACKTRACE
> +void stack_trace_print(const unsigned long *trace, unsigned int nr_entries,
> +		       int spaces);
> +int stack_trace_snprint(char *buf, size_t size, const unsigned long *entries,
> +			unsigned int nr_entries, int spaces);
> +unsigned int stack_trace_save(unsigned long *store, unsigned int size,
> +			      unsigned int skipnr);
> +unsigned int stack_trace_save_tsk(struct task_struct *task,
> +				  unsigned long *store, unsigned int size,
> +				  unsigned int skipnr);
> +unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
> +				   unsigned int size, unsigned int skipnr);
> +unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
> +
> +#ifndef CONFIG_ARCH_STACKWALK
> +/* Internal interfaces. Do not use in generic code */
>  struct stack_trace {
>  	unsigned int nr_entries, max_entries;
>  	unsigned long *entries;
> 
> 

-- 
Kees Cook


* Re: [PATCH 3/7] ARM: implement ARCH_STACKWALK
  2021-10-22 15:09 ` [PATCH 3/7] ARM: implement ARCH_STACKWALK Peter Zijlstra
@ 2021-10-22 16:18   ` Kees Cook
  0 siblings, 0 replies; 28+ messages in thread
From: Kees Cook @ 2021-10-22 16:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 05:09:36PM +0200, Peter Zijlstra wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Implement the flavor of CONFIG_STACKTRACE that relies mostly on generic
> code, and only needs a small arch_stack_walk() helper that performs the
> actual frame unwinding.
> 
> Note that this removes the SMP check that used to live in
> __save_stack_trace(), but this is no longer needed now that the generic
> version of save_stack_trace_tsk() takes care not to walk the call stack
> of tasks that are live on other CPUs.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook


* Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-22 15:09 ` [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust Peter Zijlstra
@ 2021-10-22 16:25   ` Kees Cook
  2021-10-22 16:45     ` Peter Zijlstra
  2021-10-22 16:54     ` Mark Rutland
  2021-10-25 16:27   ` Peter Zijlstra
  1 sibling, 2 replies; 28+ messages in thread
From: Kees Cook @ 2021-10-22 16:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 05:09:35PM +0200, Peter Zijlstra wrote:
> Recent patches to get_wchan() made it more robust by only doing the
> unwind when the task was blocked and serialized against wakeups.
> 
> Extract this functionality as a simpler companion to task_call_func()
> named task_try_func() that really only cares about blocked tasks. Then
> employ this new function to implement the same robustness for
> ARCH_STACKWALK based stack_trace_save_tsk().
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/wait.h |    1 
>  kernel/sched/core.c  |   62 ++++++++++++++++++++++++++++++++++++++++++++-------
>  kernel/stacktrace.c  |   13 ++++++----
>  3 files changed, 63 insertions(+), 13 deletions(-)
> 
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -1162,5 +1162,6 @@ int autoremove_wake_function(struct wait
>  
>  typedef int (*task_call_f)(struct task_struct *p, void *arg);
>  extern int task_call_func(struct task_struct *p, task_call_f func, void *arg);
> +extern int task_try_func(struct task_struct *p, task_call_f func, void *arg);
>  
>  #endif /* _LINUX_WAIT_H */
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1966,21 +1966,21 @@ bool sched_task_on_rq(struct task_struct
>  	return task_on_rq_queued(p);
>  }
>  
> +static int try_get_wchan(struct task_struct *p, void *arg)
> +{
> +	unsigned long *wchan = arg;
> +	*wchan = __get_wchan(p);
> +	return 0;
> +}
> +
>  unsigned long get_wchan(struct task_struct *p)
>  {
>  	unsigned long ip = 0;
> -	unsigned int state;
>  
>  	if (!p || p == current)
>  		return 0;
>  
> -	/* Only get wchan if task is blocked and we can keep it that way. */
> -	raw_spin_lock_irq(&p->pi_lock);
> -	state = READ_ONCE(p->__state);
> -	smp_rmb(); /* see try_to_wake_up() */
> -	if (state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq)
> -		ip = __get_wchan(p);
> -	raw_spin_unlock_irq(&p->pi_lock);
> +	task_try_func(p, try_get_wchan, &ip);
>  
>  	return ip;
>  }
> @@ -4184,6 +4184,52 @@ int task_call_func(struct task_struct *p
>  	return ret;
>  }
>  
> +/*
> + * task_try_func - Invoke a function on task in blocked state
> + * @p: Process for which the function is to be invoked
> + * @func: Function to invoke
> + * @arg: Argument to function
> + *
> + * Fix the task in a blocked state, when possible, and if so invoke @func on it.
> + *
> + * Returns:
> + *  -EBUSY or whatever @func returns
> + */
> +int task_try_func(struct task_struct *p, task_call_f func, void *arg)
> +{
> +	unsigned long flags;
> +	unsigned int state;
> +	int ret = -EBUSY;
> +
> +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +
> +	state = READ_ONCE(p->__state);
> +
> +	/*
> +	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> +	 * possible to, falsely, observe p->on_rq == 0.
> +	 *
> +	 * See try_to_wake_up() for a longer comment.
> +	 */
> +	smp_rmb();
> +
> +	/*
> +	 * Since p->pi_lock blocks try_to_wake_up(), we don't need rq->lock when
> +	 * the task is blocked. Make sure to check @state since ttwu() can drop
> +	 * locks at the end, see ttwu_queue_wakelist().
> +	 */
> +	if (state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq) {
> +		/*
> +		 * The task is blocked and we're holding off wakeups. For any
> +		 * of the other task states, see task_call_func().
> +		 */
> +		ret = func(p, arg);
> +	}
> +
> +	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +	return ret;
> +}
> +
>  /**
>   * wake_up_process - Wake up a specific process
>   * @p: The process to be woken up.
> --- a/kernel/stacktrace.c
> +++ b/kernel/stacktrace.c
> @@ -123,6 +123,13 @@ unsigned int stack_trace_save(unsigned l
>  }
>  EXPORT_SYMBOL_GPL(stack_trace_save);
>  
> +static int try_arch_stack_walk_tsk(struct task_struct *tsk, void *arg)
> +{
> +	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
> +	arch_stack_walk(consume_entry, arg, tsk, NULL);
> +	return 0;
> +}
> +
>  /**
>   * stack_trace_save_tsk - Save a task stack trace into a storage array
>   * @task:	The task to examine
> @@ -135,7 +142,6 @@ EXPORT_SYMBOL_GPL(stack_trace_save);
>  unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
>  				  unsigned int size, unsigned int skipnr)
>  {
> -	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
>  	struct stacktrace_cookie c = {
>  		.store	= store,
>  		.size	= size,
> @@ -143,11 +149,8 @@ unsigned int stack_trace_save_tsk(struct
>  		.skip	= skipnr + (current == tsk),
>  	};
>  
> -	if (!try_get_task_stack(tsk))
> -		return 0;
> +	task_try_func(tsk, try_arch_stack_walk_tsk, &c);

Pardon my thin understanding of the scheduler, but I assume this change
doesn't mean stack_trace_save_tsk() stops working for "current", right?
In trying to answer this for myself, I couldn't convince myself what value
current->__state has here. Is it one of TASK_(UN)INTERRUPTIBLE?

Assuming this does actually remain callable for current:

Reviewed-by: Kees Cook <keescook@chromium.org>


>  
> -	arch_stack_walk(consume_entry, &c, tsk, NULL);
> -	put_task_stack(tsk);
>  	return c.len;
>  }
>  
> 
> 

-- 
Kees Cook


* Re: [PATCH 1/7] x86: Fix __get_wchan() for !STACKTRACE
  2021-10-22 15:09 ` [PATCH 1/7] x86: Fix __get_wchan() for !STACKTRACE Peter Zijlstra
@ 2021-10-22 16:25   ` Kees Cook
  2021-10-26 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Kees Cook @ 2021-10-22 16:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb,
	Stephen Rothwell

On Fri, Oct 22, 2021 at 05:09:34PM +0200, Peter Zijlstra wrote:
> Use asm/unwind.h to implement wchan, since we cannot always rely on
> STACKTRACE=y.
> 
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Fixes: bc9bbb81730e ("x86: Fix get_wchan() to support the ORC unwinder")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/7] arch: Make ARCH_STACKWALK independent of STACKTRACE
  2021-10-22 16:18   ` Kees Cook
@ 2021-10-22 16:36     ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 16:36 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 09:18:05AM -0700, Kees Cook wrote:
> On Fri, Oct 22, 2021 at 05:09:37PM +0200, Peter Zijlstra wrote:
> > Make arch_stack_walk() available for ARCH_STACKWALK architectures
> > without it being entangled in STACKTRACE.
> 
> Which CONFIG/arch combos did you build-test this with? It looks good,
> but I always expect things like this to end up landing in corner cases.
> :)

I had this one through 0day last night. That's no guarantee, but at
least it's had a handful of builds done on it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-22 16:25   ` Kees Cook
@ 2021-10-22 16:45     ` Peter Zijlstra
  2021-10-22 16:57       ` Mark Rutland
  2021-10-22 16:54     ` Mark Rutland
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 16:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 09:25:02AM -0700, Kees Cook wrote:
> On Fri, Oct 22, 2021 at 05:09:35PM +0200, Peter Zijlstra wrote:
> >  /**
> >   * stack_trace_save_tsk - Save a task stack trace into a storage array
> >   * @task:	The task to examine
> > @@ -135,7 +142,6 @@ EXPORT_SYMBOL_GPL(stack_trace_save);
> >  unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
> >  				  unsigned int size, unsigned int skipnr)
> >  {
> > -	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
> >  	struct stacktrace_cookie c = {
> >  		.store	= store,
> >  		.size	= size,
> > @@ -143,11 +149,8 @@ unsigned int stack_trace_save_tsk(struct
> >  		.skip	= skipnr + (current == tsk),
> >  	};
> >  
> > -	if (!try_get_task_stack(tsk))
> > -		return 0;
> > +	task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> 
> Pardon my thin understanding of the scheduler, but I assume this change
> doesn't mean stack_trace_save_tsk() stops working for "current", right?
> In trying to answer this for myself, I couldn't convince myself what value
> current->__state would have here. Is it one of TASK_(UN)INTERRUPTIBLE?

current really shouldn't be using stack_trace_save_tsk(), and no, you're
quite right: it will not work for current; irrespective of ->__state,
current will always be ->on_rq.

I started auditing stack_trace_save_tsk() users a few days ago, but
didn't look for this particular issue. I suppose I'll have to start over
with that.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-22 16:25   ` Kees Cook
  2021-10-22 16:45     ` Peter Zijlstra
@ 2021-10-22 16:54     ` Mark Rutland
  2021-10-22 17:01       ` Peter Zijlstra
  1 sibling, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2021-10-22 16:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, x86, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 09:25:02AM -0700, Kees Cook wrote:
> On Fri, Oct 22, 2021 at 05:09:35PM +0200, Peter Zijlstra wrote:
> > Recent patches to get_wchan() made it more robust by only doing the
> > unwind when the task was blocked and serialized against wakeups.
> > 
> > Extract this functionality as a simpler companion to task_call_func()
> > named task_try_func() that really only cares about blocked tasks. Then
> > employ this new function to implement the same robustness for
> > ARCH_STACKWALK based stack_trace_save_tsk().
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  include/linux/wait.h |    1 
> >  kernel/sched/core.c  |   62 ++++++++++++++++++++++++++++++++++++++++++++-------
> >  kernel/stacktrace.c  |   13 ++++++----
> >  3 files changed, 63 insertions(+), 13 deletions(-)
> > 
> > --- a/include/linux/wait.h
> > +++ b/include/linux/wait.h
> > @@ -1162,5 +1162,6 @@ int autoremove_wake_function(struct wait
> >  
> >  typedef int (*task_call_f)(struct task_struct *p, void *arg);
> >  extern int task_call_func(struct task_struct *p, task_call_f func, void *arg);
> > +extern int task_try_func(struct task_struct *p, task_call_f func, void *arg);
> >  
> >  #endif /* _LINUX_WAIT_H */
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1966,21 +1966,21 @@ bool sched_task_on_rq(struct task_struct
> >  	return task_on_rq_queued(p);
> >  }
> >  
> > +static int try_get_wchan(struct task_struct *p, void *arg)
> > +{
> > +	unsigned long *wchan = arg;
> > +	*wchan = __get_wchan(p);
> > +	return 0;
> > +}
> > +
> >  unsigned long get_wchan(struct task_struct *p)
> >  {
> >  	unsigned long ip = 0;
> > -	unsigned int state;
> >  
> >  	if (!p || p == current)
> >  		return 0;
> >  
> > -	/* Only get wchan if task is blocked and we can keep it that way. */
> > -	raw_spin_lock_irq(&p->pi_lock);
> > -	state = READ_ONCE(p->__state);
> > -	smp_rmb(); /* see try_to_wake_up() */
> > -	if (state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq)
> > -		ip = __get_wchan(p);
> > -	raw_spin_unlock_irq(&p->pi_lock);
> > +	task_try_func(p, try_get_wchan, &ip);
> >  
> >  	return ip;
> >  }
> > @@ -4184,6 +4184,52 @@ int task_call_func(struct task_struct *p
> >  	return ret;
> >  }
> >  
> > +/*
> > + * task_try_func - Invoke a function on a task in a blocked state
> > + * @p: Process for which the function is to be invoked
> > + * @func: Function to invoke
> > + * @arg: Argument to function
> > + *
> > + * Fix the task in a blocked state when possible and, if so, invoke @func on it.
> > + *
> > + * Returns:
> > + *  -EBUSY or whatever @func returns
> > + */
> > +int task_try_func(struct task_struct *p, task_call_f func, void *arg)
> > +{
> > +	unsigned long flags;
> > +	unsigned int state;
> > +	int ret = -EBUSY;
> > +
> > +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> > +
> > +	state = READ_ONCE(p->__state);
> > +
> > +	/*
> > +	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> > +	 * possible to, falsely, observe p->on_rq == 0.
> > +	 *
> > +	 * See try_to_wake_up() for a longer comment.
> > +	 */
> > +	smp_rmb();
> > +
> > +	/*
> > +	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> > +	 * the task is blocked. Make sure to check @state since ttwu() can drop
> > +	 * locks at the end, see ttwu_queue_wakelist().
> > +	 */
> > +	if (state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq) {
> > +		/*
> > +		 * The task is blocked and we're holding off wakeups. For any
> > +		 * of the other task states, see task_call_func().
> > +		 */
> > +		ret = func(p, arg);
> > +	}
> > +
> > +	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> > +	return ret;
> > +}
> > +
> >  /**
> >   * wake_up_process - Wake up a specific process
> >   * @p: The process to be woken up.
> > --- a/kernel/stacktrace.c
> > +++ b/kernel/stacktrace.c
> > @@ -123,6 +123,13 @@ unsigned int stack_trace_save(unsigned l
> >  }
> >  EXPORT_SYMBOL_GPL(stack_trace_save);
> >  
> > +static int try_arch_stack_walk_tsk(struct task_struct *tsk, void *arg)
> > +{
> > +	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
> > +	arch_stack_walk(consume_entry, arg, tsk, NULL);
> > +	return 0;
> > +}
> > +
> >  /**
> >   * stack_trace_save_tsk - Save a task stack trace into a storage array
> >   * @task:	The task to examine
> > @@ -135,7 +142,6 @@ EXPORT_SYMBOL_GPL(stack_trace_save);
> >  unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
> >  				  unsigned int size, unsigned int skipnr)
> >  {
> > -	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
> >  	struct stacktrace_cookie c = {
> >  		.store	= store,
> >  		.size	= size,
> > @@ -143,11 +149,8 @@ unsigned int stack_trace_save_tsk(struct
> >  		.skip	= skipnr + (current == tsk),
> >  	};
> >  
> > -	if (!try_get_task_stack(tsk))
> > -		return 0;
> > +	task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> 
> Pardon my thin understanding of the scheduler, but I assume this change
> doesn't mean stack_trace_save_tsk() stops working for "current", right?
> In trying to answer this for myself, I couldn't convince myself what value
> current->__state would have here. Is it one of TASK_(UN)INTERRUPTIBLE?

Regardless of that, current->on_rq will be non-zero, so you're right that this
causes stack_trace_save_tsk() to not work for current, e.g.

| # cat /proc/self/stack 
| # wc  /proc/self/stack 
|         0         0         0 /proc/self/stack

TBH, I think that (taking a step back from this issue in particular)
stack_trace_save_tsk() *shouldn't* work for current, and callers *should* be
forced to explicitly handle current separately from blocked tasks.

So we could fix this in the stacktrace code with:

| diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
| index a1cdbf8c3ef8..327af9ff2c55 100644
| --- a/kernel/stacktrace.c
| +++ b/kernel/stacktrace.c
| @@ -149,7 +149,10 @@ unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
|                 .skip   = skipnr + (current == tsk),
|         };
|  
| -       task_try_func(tsk, try_arch_stack_walk_tsk, &c);
| +       if (tsk == current)
| +               try_arch_stack_walk_tsk(tsk, &c);
| +       else
| +               task_try_func(tsk, try_arch_stack_walk_tsk, &c);
|  
|         return c.len;
|  }

... and we could rename task_try_func() to blocked_task_try_func(), and
later push the distinction into higher-level callers.
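
To make that concrete, a higher-level caller could eventually pick the
right variant itself. A rough sketch, where save_task_stack() and
stack_trace_save_blocked_tsk() are made-up names for illustration (only
stack_trace_save() exists today):

	/*
	 * Hypothetical: the caller decides up front whether it is
	 * walking itself or a blocked task.
	 */
	static unsigned int save_task_stack(struct task_struct *tsk,
					    unsigned long *store,
					    unsigned int size)
	{
		if (tsk == current)
			return stack_trace_save(store, size, 0);

		return stack_trace_save_blocked_tsk(tsk, store, size, 0);
	}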

Alternatively, we could do:

| diff --git a/kernel/sched/core.c b/kernel/sched/core.c
| index a8be6e135c57..cef9e35ecf2f 100644
| --- a/kernel/sched/core.c
| +++ b/kernel/sched/core.c
| @@ -4203,6 +4203,11 @@ int task_try_func(struct task_struct *p, task_call_f func, void *arg)
|  
|         raw_spin_lock_irqsave(&p->pi_lock, flags);
|  
| +       if (p == current) {
| +               ret = func(p, arg);
| +               goto out;
| +       }
| +
|         state = READ_ONCE(p->__state);
|  
|         /*
| @@ -4226,6 +4231,7 @@ int task_try_func(struct task_struct *p, task_call_f func, void *arg)
|                 ret = func(p, arg);
|         }
|  
| +out:
|         raw_spin_unlock_irqrestore(&p->pi_lock, flags);
|         return ret;
|  }

... which perhaps is aligned with smp_call_function_single() and
generic_exec_single().

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-22 16:45     ` Peter Zijlstra
@ 2021-10-22 16:57       ` Mark Rutland
  0 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2021-10-22 16:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, x86, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 06:45:14PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 22, 2021 at 09:25:02AM -0700, Kees Cook wrote:
> > On Fri, Oct 22, 2021 at 05:09:35PM +0200, Peter Zijlstra wrote:
> > >  /**
> > >   * stack_trace_save_tsk - Save a task stack trace into a storage array
> > >   * @task:	The task to examine
> > > @@ -135,7 +142,6 @@ EXPORT_SYMBOL_GPL(stack_trace_save);
> > >  unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
> > >  				  unsigned int size, unsigned int skipnr)
> > >  {
> > > -	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
> > >  	struct stacktrace_cookie c = {
> > >  		.store	= store,
> > >  		.size	= size,
> > > @@ -143,11 +149,8 @@ unsigned int stack_trace_save_tsk(struct
> > >  		.skip	= skipnr + (current == tsk),
> > >  	};
> > >  
> > > -	if (!try_get_task_stack(tsk))
> > > -		return 0;
> > > +	task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> > 
> > Pardon my thin understanding of the scheduler, but I assume this change
> > doesn't mean stack_trace_save_tsk() stops working for "current", right?
> > In trying to answer this for myself, I couldn't convince myself what value
> > current->__state would have here. Is it one of TASK_(UN)INTERRUPTIBLE?
> 
> current really shouldn't be using stack_trace_save_tsk(), and no, you're
> quite right: it will not work for current; irrespective of ->__state,
> current will always be ->on_rq.

Heh, we raced to say the same thing. :)

> I started auditing stack_trace_save_tsk() users a few days ago, but
> didn't look for this particular issue. I suppose I'll have to start over
> with that.

FWIW, this shape of thing was one of the reasons I wanted to split
arch_stack_walk() into separate functions:

* arch_stack_walk_current()
* arch_stack_walk_current_regs()
* arch_stack_walk_blocked_task()

... with similar applying here, since otherwise people won't consider
the distinction between current / !current at the caller level, leading
to junk like this.
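
To make that split concrete, it could look something like the below;
these prototypes are a sketch of the idea, not an existing API:

	/* Hypothetical prototypes for the proposed split: */
	void arch_stack_walk_current(stack_trace_consume_fn consume_entry,
				     void *cookie);
	void arch_stack_walk_current_regs(stack_trace_consume_fn consume_entry,
					  void *cookie, struct pt_regs *regs);
	void arch_stack_walk_blocked_task(stack_trace_consume_fn consume_entry,
					  void *cookie,
					  struct task_struct *task);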

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-22 16:54     ` Mark Rutland
@ 2021-10-22 17:01       ` Peter Zijlstra
  2021-10-25 20:38         ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-22 17:01 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Kees Cook, x86, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 05:54:31PM +0100, Mark Rutland wrote:

> > Pardon my thin understanding of the scheduler, but I assume this change
> > doesn't mean stack_trace_save_tsk() stops working for "current", right?
> > In trying to answer this for myself, I couldn't convince myself what value
> > current->__state would have here. Is it one of TASK_(UN)INTERRUPTIBLE?
> 
> Regardless of that, current->on_rq will be non-zero, so you're right that this
> causes stack_trace_save_tsk() to not work for current, e.g.
> 
> | # cat /proc/self/stack 
> | # wc  /proc/self/stack 
> |         0         0         0 /proc/self/stack
> 
> TBH, I think that (taking a step back from this issue in particular)
> stack_trace_save_tsk() *shouldn't* work for current, and callers *should* be
> forced to explicitly handle current separately from blocked tasks.

That..

> 
> So we could fix this in the stacktrace code with:
> 
> | diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
> | index a1cdbf8c3ef8..327af9ff2c55 100644
> | --- a/kernel/stacktrace.c
> | +++ b/kernel/stacktrace.c
> | @@ -149,7 +149,10 @@ unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
> |                 .skip   = skipnr + (current == tsk),
> |         };
> |  
> | -       task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> | +       if (tsk == current)
> | +               try_arch_stack_walk_tsk(tsk, &c);
> | +       else
> | +               task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> |  
> |         return c.len;
> |  }
> 
> ... and we could rename task_try_func() to blocked_task_try_func(), and
> later push the distinction into higher-level callers.

I think I favour this fix if we have to. But that's for next week :-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/7] arch: Make ARCH_STACKWALK independent of STACKTRACE
  2021-10-22 15:09 ` [PATCH 4/7] arch: Make ARCH_STACKWALK independent of STACKTRACE Peter Zijlstra
  2021-10-22 16:18   ` Kees Cook
@ 2021-10-22 17:06   ` Mark Rutland
  1 sibling, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2021-10-22 17:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, x86, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb,
	Madhavan T. Venkataraman, broonie

On Fri, Oct 22, 2021 at 05:09:37PM +0200, Peter Zijlstra wrote:
> Make arch_stack_walk() available for ARCH_STACKWALK architectures
> without it being entangled in STACKTRACE.

Nice!

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

This looks good to me.

I gave this a spin on arm64, which builds and boots fine with or without
CONFIG_STACKTRACE, and this doesn't seem to regress /proc/*/{wchan,
stack}, so:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>

Madhavan (Cc'd) has patches to clean up arm64's various unwinders to use
arch_stack_walk(), and it would be good if we could build atop this
rather than having to unconditionally enable STACKTRACE and expose
/proc/*/stack (which should really just be for debugging the unwinder,
and not something distros should ever have enabled).

Once this settles, would it be possible to place this (and the __sched
change to __switch_to) on a stable branch somewhere?

Thanks,
Mark.

> ---
>  arch/arm/kernel/stacktrace.c   |    2 --
>  arch/arm64/kernel/stacktrace.c |    4 ----
>  arch/powerpc/kernel/Makefile   |    3 +--
>  arch/riscv/kernel/stacktrace.c |    4 ----
>  arch/s390/kernel/Makefile      |    3 +--
>  arch/x86/kernel/Makefile       |    2 +-
>  include/linux/stacktrace.h     |   33 +++++++++++++++++----------------
>  7 files changed, 20 insertions(+), 31 deletions(-)
> 
> --- a/arch/arm/kernel/stacktrace.c
> +++ b/arch/arm/kernel/stacktrace.c
> @@ -87,7 +87,6 @@ void notrace walk_stackframe(struct stac
>  }
>  EXPORT_SYMBOL(walk_stackframe);
>  
> -#ifdef CONFIG_STACKTRACE
>  noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
>  				      void *cookie, struct task_struct *task,
>  				      struct pt_regs *regs)
> @@ -113,4 +112,3 @@ noinline notrace void arch_stack_walk(st
>  			break;
>  	}
>  }
> -#endif
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -216,8 +216,6 @@ void show_stack(struct task_struct *tsk,
>  	barrier();
>  }
>  
> -#ifdef CONFIG_STACKTRACE
> -
>  noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
>  			      void *cookie, struct task_struct *task,
>  			      struct pt_regs *regs)
> @@ -236,5 +234,3 @@ noinline notrace void arch_stack_walk(st
>  
>  	walk_stackframe(task, &frame, consume_entry, cookie);
>  }
> -
> -#endif
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -47,7 +47,7 @@ obj-y				:= cputable.o syscalls.o \
>  				   udbg.o misc.o io.o misc_$(BITS).o \
>  				   of_platform.o prom_parse.o firmware.o \
>  				   hw_breakpoint_constraints.o interrupt.o \
> -				   kdebugfs.o
> +				   kdebugfs.o stacktrace.o
>  obj-y				+= ptrace/
>  obj-$(CONFIG_PPC64)		+= setup_64.o \
>  				   paca.o nvram_64.o note.o
> @@ -116,7 +116,6 @@ obj-$(CONFIG_OPTPROBES)		+= optprobes.o
>  obj-$(CONFIG_KPROBES_ON_FTRACE)	+= kprobes-ftrace.o
>  obj-$(CONFIG_UPROBES)		+= uprobes.o
>  obj-$(CONFIG_PPC_UDBG_16550)	+= legacy_serial.o udbg_16550.o
> -obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
>  obj-$(CONFIG_SWIOTLB)		+= dma-swiotlb.o
>  obj-$(CONFIG_ARCH_HAS_DMA_SET_MASK) += dma-mask.o
>  
> --- a/arch/riscv/kernel/stacktrace.c
> +++ b/arch/riscv/kernel/stacktrace.c
> @@ -139,12 +139,8 @@ unsigned long __get_wchan(struct task_st
>  	return pc;
>  }
>  
> -#ifdef CONFIG_STACKTRACE
> -
>  noinline void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
>  		     struct task_struct *task, struct pt_regs *regs)
>  {
>  	walk_stackframe(task, regs, consume_entry, cookie);
>  }
> -
> -#endif /* CONFIG_STACKTRACE */
> --- a/arch/s390/kernel/Makefile
> +++ b/arch/s390/kernel/Makefile
> @@ -40,7 +40,7 @@ obj-y	+= sysinfo.o lgr.o os_info.o machi
>  obj-y	+= runtime_instr.o cache.o fpu.o dumpstack.o guarded_storage.o sthyi.o
>  obj-y	+= entry.o reipl.o relocate_kernel.o kdebugfs.o alternative.o
>  obj-y	+= nospec-branch.o ipl_vmparm.o machine_kexec_reloc.o unwind_bc.o
> -obj-y	+= smp.o text_amode31.o
> +obj-y	+= smp.o text_amode31.o stacktrace.o
>  
>  extra-y				+= head64.o vmlinux.lds
>  
> @@ -55,7 +55,6 @@ compat-obj-$(CONFIG_AUDIT)	+= compat_aud
>  obj-$(CONFIG_COMPAT)		+= compat_linux.o compat_signal.o
>  obj-$(CONFIG_COMPAT)		+= $(compat-obj-y)
>  obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
> -obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
>  obj-$(CONFIG_KPROBES)		+= kprobes.o
>  obj-$(CONFIG_KPROBES)		+= kprobes_insn_page.o
>  obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -84,7 +84,7 @@ obj-$(CONFIG_IA32_EMULATION)	+= tls.o
>  obj-y				+= step.o
>  obj-$(CONFIG_INTEL_TXT)		+= tboot.o
>  obj-$(CONFIG_ISA_DMA_API)	+= i8237.o
> -obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
> +obj-y				+= stacktrace.o
>  obj-y				+= cpu/
>  obj-y				+= acpi/
>  obj-y				+= reboot.o
> --- a/include/linux/stacktrace.h
> +++ b/include/linux/stacktrace.h
> @@ -8,21 +8,6 @@
>  struct task_struct;
>  struct pt_regs;
>  
> -#ifdef CONFIG_STACKTRACE
> -void stack_trace_print(const unsigned long *trace, unsigned int nr_entries,
> -		       int spaces);
> -int stack_trace_snprint(char *buf, size_t size, const unsigned long *entries,
> -			unsigned int nr_entries, int spaces);
> -unsigned int stack_trace_save(unsigned long *store, unsigned int size,
> -			      unsigned int skipnr);
> -unsigned int stack_trace_save_tsk(struct task_struct *task,
> -				  unsigned long *store, unsigned int size,
> -				  unsigned int skipnr);
> -unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
> -				   unsigned int size, unsigned int skipnr);
> -unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
> -
> -/* Internal interfaces. Do not use in generic code */
>  #ifdef CONFIG_ARCH_STACKWALK
>  
>  /**
> @@ -75,8 +60,24 @@ int arch_stack_walk_reliable(stack_trace
>  
>  void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
>  			  const struct pt_regs *regs);
> +#endif /* CONFIG_ARCH_STACKWALK */
>  
> -#else /* CONFIG_ARCH_STACKWALK */
> +#ifdef CONFIG_STACKTRACE
> +void stack_trace_print(const unsigned long *trace, unsigned int nr_entries,
> +		       int spaces);
> +int stack_trace_snprint(char *buf, size_t size, const unsigned long *entries,
> +			unsigned int nr_entries, int spaces);
> +unsigned int stack_trace_save(unsigned long *store, unsigned int size,
> +			      unsigned int skipnr);
> +unsigned int stack_trace_save_tsk(struct task_struct *task,
> +				  unsigned long *store, unsigned int size,
> +				  unsigned int skipnr);
> +unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
> +				   unsigned int size, unsigned int skipnr);
> +unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
> +
> +#ifndef CONFIG_ARCH_STACKWALK
> +/* Internal interfaces. Do not use in generic code */
>  struct stack_trace {
>  	unsigned int nr_entries, max_entries;
>  	unsigned long *entries;
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/7] powerpc, arm64: Mark __switch_to() as __sched
  2021-10-22 15:09 ` [PATCH 5/7] powerpc, arm64: Mark __switch_to() as __sched Peter Zijlstra
  2021-10-22 16:15   ` Kees Cook
@ 2021-10-22 17:40   ` Mark Rutland
  1 sibling, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2021-10-22 17:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, x86, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 05:09:38PM +0200, Peter Zijlstra wrote:
> Unlike most of the other architectures, PowerPC and ARM64 have
> __switch_to() as a C function which remains on the stack. 

For clarity, could we say:

  Unlike most of the other architectures, PowerPC and ARM64 have
  __switch_to() as a C function which is visible when unwinding from
  their assembly switch function.

... since both arm64 and powerpc are branch-and-link architectures, and
this isn't stacked; it's in the GPR context saved by the switch
assembly.

> Their
> respective __get_wchan() skips one stack frame unconditionally,
> without testing in_sched_functions().

and similarly s/stack frame/caller/ here.

> 
> Mark them __sched such that we can forgo that special case.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm64/kernel/process.c   |    4 ++--
>  arch/powerpc/kernel/process.c |    4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -490,8 +490,8 @@ void update_sctlr_el1(u64 sctlr)
>  /*
>   * Thread switching.
>   */
> -__notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> -				struct task_struct *next)
> +__notrace_funcgraph __sched
> +struct task_struct *__switch_to(struct task_struct *prev, struct task_struct *next)

As this only matters for the call to our cpu_switch_to() assembly, this
looks sufficient to me. This only changes the placement of the function
and doesn't affect the existing tracing restrictions, so I don't think
this should have any adverse effect.
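
(For context, quoting roughly from memory: __sched merely places the
function in the .sched.text section, which is exactly what
in_sched_functions() keys off, so placement really is all that can
matter here:)

	/* include/linux/sched/debug.h, roughly: */
	#define __sched		__section(".sched.text")

	/* kernel/sched/core.c, roughly: */
	int in_sched_functions(unsigned long addr)
	{
		return in_lock_functions(addr) ||
			(addr >= (unsigned long)__sched_text_start &&
			 addr < (unsigned long)__sched_text_end);
	}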

For testing, this doesn't adversely affect the existing unwinder (which
should obviously be true since we skip the entry anyway, but hey..).

Regardless of the commit message wording:

Reviewed-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]

Thanks,
Mark.

>  {
>  	struct task_struct *last;
>  
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1201,8 +1201,8 @@ static inline void restore_sprs(struct t
>  
>  }
>  
> -struct task_struct *__switch_to(struct task_struct *prev,
> -	struct task_struct *new)
> +__sched struct task_struct *__switch_to(struct task_struct *prev,
> +					struct task_struct *new)
>  {
>  	struct thread_struct *new_thread, *old_thread;
>  	struct task_struct *last;
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] arch: __get_wchan() || ARCH_STACKWALK
  2021-10-22 15:09 ` [PATCH 6/7] arch: __get_wchan() || ARCH_STACKWALK Peter Zijlstra
  2021-10-22 16:13   ` Kees Cook
@ 2021-10-22 17:52   ` Mark Rutland
  1 sibling, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2021-10-22 17:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: keescook, x86, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb,
	Madhavan T. Venkataraman, broonie

On Fri, Oct 22, 2021 at 05:09:39PM +0200, Peter Zijlstra wrote:
> Use ARCH_STACKWALK to implement a generic __get_wchan().
> 
> STACKTRACE should be possible, but the various implementations of
> stack_trace_save_tsk() are not consistent enough for this to work.
> ARCH_STACKWALK is a smaller set of architectures with a better defined
> interface.
> 
> Since get_wchan() pins the task in a blocked state, it is not
> necessary to take a reference on the task stack, the task isn't going
> anywhere.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm/include/asm/processor.h     |    2 -
>  arch/arm/kernel/process.c            |   22 --------------------
>  arch/arm64/include/asm/processor.h   |    2 -
>  arch/arm64/kernel/process.c          |   26 ------------------------
>  arch/powerpc/include/asm/processor.h |    2 -
>  arch/powerpc/kernel/process.c        |   37 -----------------------------------
>  arch/riscv/include/asm/processor.h   |    3 --
>  arch/riscv/kernel/stacktrace.c       |   21 -------------------
>  arch/s390/include/asm/processor.h    |    1 
>  arch/s390/kernel/process.c           |   29 ---------------------------
>  arch/x86/include/asm/processor.h     |    2 -
>  arch/x86/kernel/process.c            |   25 -----------------------
>  kernel/sched/core.c                  |   24 ++++++++++++++++++++++
>  13 files changed, 24 insertions(+), 172 deletions(-)

Nice!

> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -528,32 +528,6 @@ struct task_struct *__switch_to(struct t
>  	return last;
>  }
>  
> -unsigned long __get_wchan(struct task_struct *p)
> -{
> -	struct stackframe frame;
> -	unsigned long stack_page, ret = 0;
> -	int count = 0;
> -
> -	stack_page = (unsigned long)try_get_task_stack(p);
> -	if (!stack_page)
> -		return 0;
> -
> -	start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
> -
> -	do {
> -		if (unwind_frame(p, &frame))
> -			goto out;
> -		if (!in_sched_functions(frame.pc)) {
> -			ret = frame.pc;
> -			goto out;
> -		}
> -	} while (count++ < 16);
> -
> -out:
> -	put_task_stack(p);
> -	return ret;
> -}

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1966,6 +1966,30 @@ bool sched_task_on_rq(struct task_struct
>  	return task_on_rq_queued(p);
>  }
>  
> +#ifdef CONFIG_ARCH_STACKWALK
> +
> +static bool consume_wchan(void *cookie, unsigned long addr)
> +{
> +	unsigned long *wchan = cookie;
> +
> +	if (in_sched_functions(addr))
> +		return true;
> +
> +	*wchan = addr;
> +	return false;
> +}
> +
> +static unsigned long __get_wchan(struct task_struct *p)
> +{
> +	unsigned long wchan = 0;
> +
> +	arch_stack_walk(consume_wchan, &wchan, p, NULL);
> +
> +	return wchan;
> +}

It's amazing how much simpler things become with the right
infrastructure!

I gave this a spin on arm64, and /proc/*/wchan looks as expected. The
code looks "obviously correct" to me given the changes in prior patches,
so FWIW:

Reviewed-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-22 15:09 ` [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust Peter Zijlstra
  2021-10-22 16:25   ` Kees Cook
@ 2021-10-25 16:27   ` Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-25 16:27 UTC (permalink / raw)
  To: keescook, x86
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, akpm, mark.rutland,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb, Josh Poimboeuf

On Fri, Oct 22, 2021 at 05:09:35PM +0200, Peter Zijlstra wrote:
> --- a/kernel/stacktrace.c
> +++ b/kernel/stacktrace.c
> @@ -123,6 +123,13 @@ unsigned int stack_trace_save(unsigned l
>  }
>  EXPORT_SYMBOL_GPL(stack_trace_save);
>  
> +static int try_arch_stack_walk_tsk(struct task_struct *tsk, void *arg)
> +{
> +	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
> +	arch_stack_walk(consume_entry, arg, tsk, NULL);
> +	return 0;
> +}
> +
>  /**
>   * stack_trace_save_tsk - Save a task stack trace into a storage array
>   * @task:	The task to examine
> @@ -135,7 +142,6 @@ EXPORT_SYMBOL_GPL(stack_trace_save);
>  unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
>  				  unsigned int size, unsigned int skipnr)
>  {
> -	stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
>  	struct stacktrace_cookie c = {
>  		.store	= store,
>  		.size	= size,
> @@ -143,11 +149,8 @@ unsigned int stack_trace_save_tsk(struct
>  		.skip	= skipnr + (current == tsk),
>  	};
>  
> -	if (!try_get_task_stack(tsk))
> -		return 0;

So I took that out because task_try_func() pins the task, except now
I see that _reliable() has a comment about zombies, which I suppose is
equally applicable here and to wchan.

An alternative to failing try_get_task_stack() is checking PF_EXITING in
try_arch_stack_walk_tsk(), which seems more consistent behaviour since
it doesn't rely on CONFIG_THREAD_INFO_IN_TASK.
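
Something like the below, perhaps (untested sketch; the error value is
arbitrary):

	static int try_arch_stack_walk_tsk(struct task_struct *tsk, void *arg)
	{
		stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;

		/*
		 * An exiting task may no longer have a stack to walk;
		 * refuse it here rather than relying on the stack
		 * refcounting that needs CONFIG_THREAD_INFO_IN_TASK.
		 */
		if (tsk->flags & PF_EXITING)
			return -EINVAL;

		arch_stack_walk(consume_entry, arg, tsk, NULL);
		return 0;
	}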

> +	task_try_func(tsk, try_arch_stack_walk_tsk, &c);
>  
> -	arch_stack_walk(consume_entry, &c, tsk, NULL);
> -	put_task_stack(tsk);
>  	return c.len;
>  }
>  
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-22 17:01       ` Peter Zijlstra
@ 2021-10-25 20:38         ` Peter Zijlstra
  2021-10-25 20:52           ` Kees Cook
  2021-10-26  9:33           ` Mark Rutland
  0 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2021-10-25 20:38 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Kees Cook, x86, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Fri, Oct 22, 2021 at 07:01:35PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 22, 2021 at 05:54:31PM +0100, Mark Rutland wrote:
> 
> > > Pardon my thin understanding of the scheduler, but I assume this change
> > > doesn't mean stack_trace_save_tsk() stops working for "current", right?
> > > In trying to answer this for myself, I couldn't convince myself what value
> > > current->__state would have here. Is it one of TASK_(UN)INTERRUPTIBLE?
> > 
> > Regardless of that, current->on_rq will be non-zero, so you're right that this
> > causes stack_trace_save_tsk() to not work for current, e.g.
> > 
> > | # cat /proc/self/stack 
> > | # wc  /proc/self/stack 
> > |         0         0         0 /proc/self/stack
> > 
> > TBH, I think that (taking a step back from this issue in particular)
> > stack_trace_save_tsk() *shouldn't* work for current, and callers *should* be
> > forced to explicitly handle current separately from blocked tasks.
> 
> That..

So I think I'd prefer the following approach to that (and I'm not
currently volunteering for it):

 - convert all archs to ARCH_STACKWALK; this gets the semantics out of
   arch code and into the single kernel/stacktrace.c file.

 - bike-shed a new/improved stack_trace_save*() API and implement it
   *once* in generic code based on arch_stack_walk().

 - convert users; delete old etc..

For now, current users of stack_trace_save_tsk() very much expect
tsk==current to work.

> > So we could fix this in the stacktrace code with:
> > 
> > | diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
> > | index a1cdbf8c3ef8..327af9ff2c55 100644
> > | --- a/kernel/stacktrace.c
> > | +++ b/kernel/stacktrace.c
> > | @@ -149,7 +149,10 @@ unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
> > |                 .skip   = skipnr + (current == tsk),
> > |         };
> > |  
> > | -       task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> > | +       if (tsk == current)
> > | +               try_arch_stack_walk_tsk(tsk, &c);
> > | +       else
> > | +               task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> > |  
> > |         return c.len;
> > |  }
> > 
> > ... and we could rename task_try_func() to blocked_task_try_func(), and
> > later push the distinction into higher-level callers.
> 
> I think I favour this fix if we have to. But that's for next week :-)

I ended up with the below delta to this patch.

--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -101,7 +101,7 @@ static bool stack_trace_consume_entry_no
 }
 
 /**
- * stack_trace_save - Save a stack trace into a storage array
+ * stack_trace_save - Save a stack trace (of current) into a storage array
  * @store:	Pointer to storage array
  * @size:	Size of the storage array
  * @skipnr:	Number of entries to skip at the start of the stack trace
@@ -132,7 +132,7 @@ static int try_arch_stack_walk_tsk(struc
 
 /**
  * stack_trace_save_tsk - Save a task stack trace into a storage array
- * @task:	The task to examine
+ * @task:	The task to examine (current allowed)
  * @store:	Pointer to storage array
  * @size:	Size of the storage array
  * @skipnr:	Number of entries to skip at the start of the stack trace
@@ -149,13 +149,25 @@ unsigned int stack_trace_save_tsk(struct
 		.skip	= skipnr + (current == tsk),
 	};
 
-	task_try_func(tsk, try_arch_stack_walk_tsk, &c);
+	/*
+	 * If the task doesn't have a stack (e.g., a zombie), the stack is
+	 * empty.
+	 */
+	if (!try_get_task_stack(tsk))
+		return 0;
+
+	if (tsk == current)
+		try_arch_stack_walk_tsk(tsk, &c);
+	else
+		task_try_func(tsk, try_arch_stack_walk_tsk, &c);
+
+	put_task_stack(tsk);
 
 	return c.len;
 }
 
 /**
- * stack_trace_save_regs - Save a stack trace based on pt_regs into a storage array
+ * stack_trace_save_regs - Save a stack trace (of current) based on pt_regs into a storage array
  * @regs:	Pointer to pt_regs to examine
  * @store:	Pointer to storage array
  * @size:	Size of the storage array

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-25 20:38         ` Peter Zijlstra
@ 2021-10-25 20:52           ` Kees Cook
  2021-10-26  9:33           ` Mark Rutland
  1 sibling, 0 replies; 28+ messages in thread
From: Kees Cook @ 2021-10-25 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, x86, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Mon, Oct 25, 2021 at 10:38:33PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 22, 2021 at 07:01:35PM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 22, 2021 at 05:54:31PM +0100, Mark Rutland wrote:
> > 
> > > > Pardon my thin understanding of the scheduler, but I assume this change
> > > > doesn't mean stack_trace_save_tsk() stops working for "current", right?
> > > > In trying to answer this for myself, I couldn't convince myself what value
> > > > current->__state would have here. Is it one of TASK_(UN)INTERRUPTIBLE?
> > > 
> > > Regardless of that, current->on_rq will be non-zero, so you're right that this
> > > causes stack_trace_save_tsk() to not work for current, e.g.
> > > 
> > > | # cat /proc/self/stack 
> > > | # wc  /proc/self/stack 
> > > |         0         0         0 /proc/self/stack
> > > 
> > > TBH, I think that (taking a step back from this issue in particular)
> > > stack_trace_save_tsk() *shouldn't* work for current, and callers *should* be
> > > forced to explicitly handle current separately from blocked tasks.
> > 
> > That..
> 
> So I think I'd prefer the following approach to that (and I'm not
> currently volunteering for it):
> 
>  - convert all archs to ARCH_STACKWALK; this gets the semantics out of
>    arch code and into the single kernel/stacktrace.c file.
> 
>  - bike-shed a new/improved stack_trace_save*() API and implement it
>    *once* in generic code based on arch_stack_walk().
> 
>  - convert users; delete old etc..
> 
> For now, current users of stack_trace_save_tsk() very much expect
> tsk==current to work.
> 
> > > So we could fix this in the stacktrace code with:
> > > 
> > > | diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
> > > | index a1cdbf8c3ef8..327af9ff2c55 100644
> > > | --- a/kernel/stacktrace.c
> > > | +++ b/kernel/stacktrace.c
> > > | @@ -149,7 +149,10 @@ unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
> > > |                 .skip   = skipnr + (current == tsk),
> > > |         };
> > > |  
> > > | -       task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> > > | +       if (tsk == current)
> > > | +               try_arch_stack_walk_tsk(tsk, &c);
> > > | +       else
> > > | +               task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> > > |  
> > > |         return c.len;
> > > |  }
> > > 
> > > ... and we could rename task_try_func() to blocked_task_try_func(), and
> > > later push the distinction into higher-level callers.
> > 
> > I think I favour this fix if we have to. But that's for next week :-)
> 
> I ended up with the below delta to this patch.
> 
> --- a/kernel/stacktrace.c
> +++ b/kernel/stacktrace.c
> @@ -101,7 +101,7 @@ static bool stack_trace_consume_entry_no
>  }
>  
>  /**
> - * stack_trace_save - Save a stack trace into a storage array
> + * stack_trace_save - Save a stack trace (of current) into a storage array
>   * @store:	Pointer to storage array
>   * @size:	Size of the storage array
>   * @skipnr:	Number of entries to skip at the start of the stack trace
> @@ -132,7 +132,7 @@ static int try_arch_stack_walk_tsk(struc
>  
>  /**
>   * stack_trace_save_tsk - Save a task stack trace into a storage array
> - * @task:	The task to examine
> + * @task:	The task to examine (current allowed)
>   * @store:	Pointer to storage array
>   * @size:	Size of the storage array
>   * @skipnr:	Number of entries to skip at the start of the stack trace
> @@ -149,13 +149,25 @@ unsigned int stack_trace_save_tsk(struct
>  		.skip	= skipnr + (current == tsk),
>  	};
>  
> -	task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> +	/*
> +	 * If the task doesn't have a stack (e.g., a zombie), the stack is
> +	 * empty.
> +	 */
> +	if (!try_get_task_stack(tsk))
> +		return 0;
> +
> +	if (tsk == current)
> +		try_arch_stack_walk_tsk(tsk, &c);
> +	else
> +		task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> +
> +	put_task_stack(tsk);
>  
>  	return c.len;
>  }
>  
>  /**
> - * stack_trace_save_regs - Save a stack trace based on pt_regs into a storage array
> + * stack_trace_save_regs - Save a stack trace (of current) based on pt_regs into a storage array
>   * @regs:	Pointer to pt_regs to examine
>   * @store:	Pointer to storage array
>   * @size:	Size of the storage array

Looks good to me, though I did like Mark's idea to rename "task_try_func"
to "task_blocked_try_func" or something like that to make the "why can
this fail?" be more self-documenting. *shrug*

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust
  2021-10-25 20:38         ` Peter Zijlstra
  2021-10-25 20:52           ` Kees Cook
@ 2021-10-26  9:33           ` Mark Rutland
  1 sibling, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2021-10-26  9:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, x86, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, akpm,
	zhengqi.arch, linux, catalin.marinas, will, mpe, paul.walmsley,
	palmer, hca, gor, borntraeger, linux-arch, ardb

On Mon, Oct 25, 2021 at 10:38:33PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 22, 2021 at 07:01:35PM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 22, 2021 at 05:54:31PM +0100, Mark Rutland wrote:
> > 
> > > > Pardon my thin understanding of the scheduler, but I assume this change
> > > > doesn't mean stack_trace_save_tsk() stops working for "current", right?
> > > > In trying to answer this for myself, I couldn't convince myself what value
> > > > current->__state would have here. Is it one of TASK_(UN)INTERRUPTIBLE?
> > > 
> > > Regardless of that, current->on_rq will be non-zero, so you're right that this
> > > causes stack_trace_save_tsk() to not work for current, e.g.
> > > 
> > > | # cat /proc/self/stack 
> > > | # wc  /proc/self/stack 
> > > |         0         0         0 /proc/self/stack
> > > 
> > > TBH, I think that (taking a step back from this issue in particular)
> > > stack_trace_save_tsk() *shouldn't* work for current, and callers *should* be
> > > forced to explicitly handle current separately from blocked tasks.
> > 
> > That..
> 
> So I think I'd prefer the following approach to that (and I'm not
> currently volunteering for it):
> 
>  - convert all archs to ARCH_STACKWALK; this gets the semantics out of
>    arch code and into the single kernel/stacktrace.c file.
> 
>  - bike-shed a new/improved stack_trace_save*() API and implement it
>    *once* in generic code based on arch_stack_walk().
> 
>  - convert users; delete old etc..
> 
> For now, current users of stack_trace_save_tsk() very much expect
> tsk==current to work.

While I still think it's best to have distinct arch_stack_walk_*()
functions, I'm perfectly happy to say we need to convert arches to
ARCH_STACKWALK first, and punt bikeshedding until that's done.

> > > So we could fix this in the stacktrace code with:
> > > 
> > > | diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
> > > | index a1cdbf8c3ef8..327af9ff2c55 100644
> > > | --- a/kernel/stacktrace.c
> > > | +++ b/kernel/stacktrace.c
> > > | @@ -149,7 +149,10 @@ unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
> > > |                 .skip   = skipnr + (current == tsk),
> > > |         };
> > > |  
> > > | -       task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> > > | +       if (tsk == current)
> > > | +               try_arch_stack_walk_tsk(tsk, &c);
> > > | +       else
> > > | +               task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> > > |  
> > > |         return c.len;
> > > |  }
> > > 
> > > ... and we could rename task_try_func() to blocked_task_try_func(), and
> > > later push the distinction into higher-level callers.
> > 
> > I think I favour this fix if we have to. But that's for next week :-)
> 
> I ended up with the below delta to this patch.

With the above in mind, the below looks good to me!

Thanks,
Mark.

> --- a/kernel/stacktrace.c
> +++ b/kernel/stacktrace.c
> @@ -101,7 +101,7 @@ static bool stack_trace_consume_entry_no
>  }
>  
>  /**
> - * stack_trace_save - Save a stack trace into a storage array
> + * stack_trace_save - Save a stack trace (of current) into a storage array
>   * @store:	Pointer to storage array
>   * @size:	Size of the storage array
>   * @skipnr:	Number of entries to skip at the start of the stack trace
> @@ -132,7 +132,7 @@ static int try_arch_stack_walk_tsk(struc
>  
>  /**
>   * stack_trace_save_tsk - Save a task stack trace into a storage array
> - * @task:	The task to examine
> + * @task:	The task to examine (current allowed)
>   * @store:	Pointer to storage array
>   * @size:	Size of the storage array
>   * @skipnr:	Number of entries to skip at the start of the stack trace
> @@ -149,13 +149,25 @@ unsigned int stack_trace_save_tsk(struct
>  		.skip	= skipnr + (current == tsk),
>  	};
>  
> -	task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> +	/*
> +	 * If the task doesn't have a stack (e.g., a zombie), the stack is
> +	 * empty.
> +	 */
> +	if (!try_get_task_stack(tsk))
> +		return 0;
> +
> +	if (tsk == current)
> +		try_arch_stack_walk_tsk(tsk, &c);
> +	else
> +		task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> +
> +	put_task_stack(tsk);
>  
>  	return c.len;
>  }
>  
>  /**
> - * stack_trace_save_regs - Save a stack trace based on pt_regs into a storage array
> + * stack_trace_save_regs - Save a stack trace (of current) based on pt_regs into a storage array
>   * @regs:	Pointer to pt_regs to examine
>   * @store:	Pointer to storage array
>   * @size:	Size of the storage array

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [tip: sched/core] x86: Fix __get_wchan() for !STACKTRACE
  2021-10-22 15:09 ` [PATCH 1/7] x86: Fix __get_wchan() for !STACKTRACE Peter Zijlstra
  2021-10-22 16:25   ` Kees Cook
@ 2021-10-26 19:16   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-26 19:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Stephen Rothwell, Peter Zijlstra (Intel), Kees Cook, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     5d1ceb3969b6b2e47e2df6d17790a7c5a20fcbb4
Gitweb:        https://git.kernel.org/tip/5d1ceb3969b6b2e47e2df6d17790a7c5a20fcbb4
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 22 Oct 2021 16:53:02 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 26 Oct 2021 21:10:12 +02:00

x86: Fix __get_wchan() for !STACKTRACE

Use asm/unwind.h to implement wchan, since we cannot always rely on
STACKTRACE=y.

Fixes: bc9bbb81730e ("x86: Fix get_wchan() to support the ORC unwinder")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20211022152104.137058575@infradead.org
---
 arch/x86/kernel/process.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index a69885a..8fb6bd4 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,6 +43,7 @@
 #include <asm/io_bitmap.h>
 #include <asm/proto.h>
 #include <asm/frame.h>
+#include <asm/unwind.h>
 
 #include "process.h"
 
@@ -944,10 +945,20 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
  */
 unsigned long __get_wchan(struct task_struct *p)
 {
-	unsigned long entry = 0;
+	struct unwind_state state;
+	unsigned long addr = 0;
 
-	stack_trace_save_tsk(p, &entry, 1, 0);
-	return entry;
+	for (unwind_start(&state, p, NULL, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr)
+			break;
+		if (in_sched_functions(addr))
+			continue;
+		break;
+	}
+
+	return addr;
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,

^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-10-26 19:17 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-22 15:09 [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
2021-10-22 15:09 ` [PATCH 1/7] x86: Fix __get_wchan() for !STACKTRACE Peter Zijlstra
2021-10-22 16:25   ` Kees Cook
2021-10-26 19:16   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-10-22 15:09 ` [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust Peter Zijlstra
2021-10-22 16:25   ` Kees Cook
2021-10-22 16:45     ` Peter Zijlstra
2021-10-22 16:57       ` Mark Rutland
2021-10-22 16:54     ` Mark Rutland
2021-10-22 17:01       ` Peter Zijlstra
2021-10-25 20:38         ` Peter Zijlstra
2021-10-25 20:52           ` Kees Cook
2021-10-26  9:33           ` Mark Rutland
2021-10-25 16:27   ` Peter Zijlstra
2021-10-22 15:09 ` [PATCH 3/7] ARM: implement ARCH_STACKWALK Peter Zijlstra
2021-10-22 16:18   ` Kees Cook
2021-10-22 15:09 ` [PATCH 4/7] arch: Make ARCH_STACKWALK independent of STACKTRACE Peter Zijlstra
2021-10-22 16:18   ` Kees Cook
2021-10-22 16:36     ` Peter Zijlstra
2021-10-22 17:06   ` Mark Rutland
2021-10-22 15:09 ` [PATCH 5/7] powerpc, arm64: Mark __switch_to() as __sched Peter Zijlstra
2021-10-22 16:15   ` Kees Cook
2021-10-22 17:40   ` Mark Rutland
2021-10-22 15:09 ` [PATCH 6/7] arch: __get_wchan() || ARCH_STACKWALK Peter Zijlstra
2021-10-22 16:13   ` Kees Cook
2021-10-22 17:52   ` Mark Rutland
2021-10-22 15:09 ` [PATCH 7/7] selftests: proc: Make sure wchan works when it exists Peter Zijlstra
2021-10-22 15:27 ` [PATCH 0/7] arch: More wchan fixes Peter Zijlstra
