* [RFC PATCH 1/4] powerpc/64: Save stack pointer when we hard disable interrupts
From: Michael Ellerman @ 2018-05-02 13:07 UTC
  To: linuxppc-dev; +Cc: npiggin

A CPU that gets stuck with interrupts hard disabled can be difficult to
debug, as on some platforms we have no way to interrupt the CPU to
find out what it's doing.

A stop-gap is to have the CPU save its stack pointer (r1) in its paca
when it hard disables interrupts. That way if we can't interrupt it,
we can at least trace the stack based on where it last disabled
interrupts.

In some cases that will be total junk, but the stack trace code should
handle that. In the simple case of a CPU that disables interrupts and
then gets stuck in a loop, the stack trace should be informative.

We could clear the saved stack pointer when we enable interrupts, but
that loses information which could be useful if we have nothing else
to go on.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/include/asm/hw_irq.h    | 6 +++++-
 arch/powerpc/include/asm/paca.h      | 2 +-
 arch/powerpc/kernel/exceptions-64s.S | 1 +
 arch/powerpc/xmon/xmon.c             | 2 ++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 855e17d158b1..35cb37be61fe 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -237,8 +237,12 @@ static inline bool arch_irqs_disabled(void)
 	__hard_irq_disable();						\
 	flags = irq_soft_mask_set_return(IRQS_ALL_DISABLED);		\
 	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;			\
-	if (!arch_irqs_disabled_flags(flags))				\
+	if (!arch_irqs_disabled_flags(flags)) {				\
+		asm ("stdx %%r1, 0, %1 ;"				\
+		     : "=m" (local_paca->saved_r1)			\
+		     : "b" (&local_paca->saved_r1));			\
 		trace_hardirqs_off();					\
+	}								\
 } while(0)
 
 static inline bool lazy_irq_pending(void)
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 3f109a3e3edb..e7814d948c7a 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -161,7 +161,7 @@ struct paca_struct {
 	struct task_struct *__current;	/* Pointer to current */
 	u64 kstack;			/* Saved Kernel stack addr */
 	u64 stab_rr;			/* stab/slb round-robin counter */
-	u64 saved_r1;			/* r1 save for RTAS calls or PM */
+	u64 saved_r1;			/* r1 save for RTAS calls or PM or EE=0 */
 	u64 saved_msr;			/* MSR saved here by enter_rtas */
 	u16 trap_save;			/* Used when bad stack is encountered */
 	u8 irq_soft_mask;		/* mask for irq soft masking */
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ae6a849db60b..bb26fe9e90ce 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1499,6 +1499,7 @@ masked_##_H##interrupt:					\
 	xori	r10,r10,MSR_EE; /* clear MSR_EE */	\
 	mtspr	SPRN_##_H##SRR1,r10;			\
 2:	mtcrf	0x80,r9;				\
+	std	r1,PACAR1(r13);				\
 	ld	r9,PACA_EXGEN+EX_R9(r13);		\
 	ld	r10,PACA_EXGEN+EX_R10(r13);		\
 	ld	r11,PACA_EXGEN+EX_R11(r13);		\
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a0842f1ff72c..94cc8ba36c14 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1161,6 +1161,8 @@ static int cpu_cmd(void)
 	/* try to switch to cpu specified */
 	if (!cpumask_test_cpu(cpu, &cpus_in_xmon)) {
 		printf("cpu 0x%x isn't in xmon\n", cpu);
+		printf("backtrace of paca[0x%x].saved_r1 (possibly stale):\n", cpu);
+		xmon_show_stack(paca_ptrs[cpu]->saved_r1, 0, 0);
 		return 0;
 	}
 	xmon_taken = 0;
-- 
2.14.1
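
The new store in the hw_irq.h hunk boils down to the standalone sketch
below (the wrapper function is illustrative only, not part of the
patch). The "b" constraint forces the address into a base register
other than r0, since in powerpc X-form addressing a base of r0 means
the literal value 0 rather than the register:

  #include <asm/paca.h>

  /* Illustrative only: store the current stack pointer (GPR1) into
   * the paca, exactly as the hard_irq_disable() hunk above does. */
  static inline void save_r1_to_paca(void)
  {
  	asm ("stdx %%r1, 0, %1"			/* *(%1) = r1 */
  	     : "=m" (local_paca->saved_r1)	/* memory is written */
  	     : "b" (&local_paca->saved_r1));	/* base reg, never r0 */
  }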


* [RFC PATCH 2/4] powerpc/nmi: Add an API for sending "safe" NMIs
From: Michael Ellerman @ 2018-05-02 13:07 UTC
  To: linuxppc-dev; +Cc: npiggin

Currently the options we have for sending NMIs are not necessarily
safe; that is, they can potentially interrupt a CPU in a
non-recoverable region of code, meaning the kernel must then panic().

But we'd like to use smp_send_nmi_ipi() to do cross-CPU calls in
situations where we don't want to risk a panic(), because unlike
smp_call_function() it doesn't require interrupts to be enabled.

So add an API for the caller to indicate that it wants to use the NMI
infrastructure, but doesn't want to do anything "unsafe".

Currently that is implemented by not actually calling cause_nmi_ipi(),
instead falling back to an IPI. In future we can pass the safe
parameter down to cause_nmi_ipi() and the individual backends can
potentially take it into account before deciding what to do.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---

I dislike "safe" but I couldn't think of a better word, suggestions
welcome.
---
 arch/powerpc/include/asm/smp.h |  1 +
 arch/powerpc/kernel/smp.c      | 20 +++++++++++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index cfecfee1194b..29ffaabdf75b 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -58,6 +58,7 @@ struct smp_ops_t {
 
 extern void smp_flush_nmi_ipi(u64 delay_us);
 extern int smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us);
+extern int smp_send_safe_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us);
 extern void smp_send_debugger_break(void);
 extern void start_secondary_resume(void);
 extern void smp_generic_give_timebase(void);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e16ec7b3b427..6be19381ee70 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -419,9 +419,9 @@ int smp_handle_nmi_ipi(struct pt_regs *regs)
 	return ret;
 }
 
-static void do_smp_send_nmi_ipi(int cpu)
+static void do_smp_send_nmi_ipi(int cpu, bool safe)
 {
-	if (smp_ops->cause_nmi_ipi && smp_ops->cause_nmi_ipi(cpu))
+	if (!safe && smp_ops->cause_nmi_ipi && smp_ops->cause_nmi_ipi(cpu))
 		return;
 
 	if (cpu >= 0) {
@@ -461,7 +461,7 @@ void smp_flush_nmi_ipi(u64 delay_us)
  * - delay_us > 0 is the delay before giving up waiting for targets to
  *   enter the handler, == 0 specifies indefinite delay.
  */
-int smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us)
+int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool safe)
 {
 	unsigned long flags;
 	int me = raw_smp_processor_id();
@@ -494,7 +494,7 @@ int smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us)
 	nmi_ipi_busy_count++;
 	nmi_ipi_unlock();
 
-	do_smp_send_nmi_ipi(cpu);
+	do_smp_send_nmi_ipi(cpu, safe);
 
 	while (!cpumask_empty(&nmi_ipi_pending_mask)) {
 		udelay(1);
@@ -516,6 +516,16 @@ int smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us)
 
 	return ret;
 }
+
+int smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us)
+{
+	return __smp_send_nmi_ipi(cpu, fn, delay_us, false);
+}
+
+int smp_send_safe_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us)
+{
+	return __smp_send_nmi_ipi(cpu, fn, delay_us, true);
+}
 #endif /* CONFIG_NMI_IPI */
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
@@ -559,7 +569,7 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 			 * entire NMI dance and waiting for
 			 * cpus to clear pending mask, etc.
 			 */
-			do_smp_send_nmi_ipi(cpu);
+			do_smp_send_nmi_ipi(cpu, false);
 		}
 	}
 }
-- 
2.14.1
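
To make the intended use concrete, here's a sketch of a caller; the
handler and function names are made up for illustration, but patch 3
in this series calls the API the same way, with a 5 second timeout:

  #include <linux/nmi.h>
  #include <asm/smp.h>

  /* Runs on the target CPU when the IPI lands; keep it short, as for
   * any NMI-style handler. */
  static void backtrace_handler(struct pt_regs *regs)
  {
  	nmi_cpu_backtrace(regs);
  }

  /* Hypothetical caller: ask CPU 2 for a backtrace, waiting up to 5
   * seconds. The "safe" variant never raises a true NMI, only an
   * ordinary IPI, so it won't interrupt the target in a
   * non-recoverable section. */
  static void dump_cpu2(void)
  {
  	smp_send_safe_nmi_ipi(2, backtrace_handler, 5 * USEC_PER_SEC);
  }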


* [RFC PATCH 3/4] powerpc/64s: Wire up arch_trigger_cpumask_backtrace()
From: Michael Ellerman @ 2018-05-02 13:07 UTC
  To: linuxppc-dev; +Cc: npiggin

This allows e.g. the RCU stall detector, or the soft/hardlockup
detectors, to trigger a backtrace on all CPUs.

We implement this by sending a "safe" NMI, which will actually only
send an IPI. Unfortunately the generic code prints "NMI", so that's a
little confusing, but we can probably live with it.

If one of the CPUs doesn't respond to the IPI, we then print some info
from its paca and do a backtrace based on its saved_r1.

Example output:

  INFO: rcu_sched detected stalls on CPUs/tasks:
  	2-...0: (0 ticks this GP) idle=1be/1/4611686018427387904 softirq=1055/1055 fqs=25735
  	(detected by 4, t=58847 jiffies, g=58, c=57, q=1258)
  Sending NMI from CPU 4 to CPUs 2:
  CPU 2 didn't respond to backtrace IPI, inspecting paca.
  irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 3623 (bash)
  Back trace of paca->saved_r1 (0xc0000000e1c83ba0) (possibly stale):
  Call Trace:
  [c0000000e1c83ba0] [0000000000000014] 0x14 (unreliable)
  [c0000000e1c83bc0] [c000000000765798] lkdtm_do_action+0x48/0x80
  [c0000000e1c83bf0] [c000000000765a40] direct_entry+0x110/0x1b0
  [c0000000e1c83c90] [c00000000058e650] full_proxy_write+0x90/0xe0
  [c0000000e1c83ce0] [c0000000003aae3c] __vfs_write+0x6c/0x1f0
  [c0000000e1c83d80] [c0000000003ab214] vfs_write+0xd4/0x240
  [c0000000e1c83dd0] [c0000000003ab5cc] ksys_write+0x6c/0x110
  [c0000000e1c83e30] [c00000000000b860] system_call+0x58/0x6c

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/include/asm/nmi.h   |  4 ++++
 arch/powerpc/kernel/stacktrace.c | 51 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index 9c80939b4d14..e97f58689ca7 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -4,6 +4,10 @@
 
 #ifdef CONFIG_PPC_WATCHDOG
 extern void arch_touch_nmi_watchdog(void);
+extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask,
+					   bool exclude_self);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
+
 #else
 static inline void arch_touch_nmi_watchdog(void) {}
 #endif
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index d534ed901538..cf4652d5df80 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -11,12 +11,15 @@
  */
 
 #include <linux/export.h>
+#include <linux/nmi.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
 #include <linux/stacktrace.h>
 #include <asm/ptrace.h>
 #include <asm/processor.h>
 
+#include <asm/paca.h>
+
 /*
  * Save stack-backtrace addresses into a stack_trace buffer.
  */
@@ -76,3 +79,51 @@ save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
 	save_context_stack(trace, regs->gpr[1], current, 0);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_regs);
+
+#ifdef CONFIG_PPC_BOOK3S_64
+static void handle_backtrace_ipi(struct pt_regs *regs)
+{
+	nmi_cpu_backtrace(regs);
+}
+
+static void raise_backtrace_ipi(cpumask_t *mask)
+{
+	unsigned int cpu;
+
+	for_each_cpu(cpu, mask) {
+		if (cpu == smp_processor_id())
+			handle_backtrace_ipi(NULL);
+		else
+			smp_send_safe_nmi_ipi(cpu, handle_backtrace_ipi, 5 * USEC_PER_SEC);
+	}
+
+	for_each_cpu(cpu, mask) {
+		struct paca_struct *p = paca_ptrs[cpu];
+
+		cpumask_clear_cpu(cpu, mask);
+
+		pr_warn("CPU %d didn't respond to backtrace IPI, inspecting paca.\n", cpu);
+		if (!virt_addr_valid(p)) {
+			pr_warn("paca pointer appears corrupt? (%px)\n", p);
+			continue;
+		}
+
+		pr_warn("irq_soft_mask: 0x%02x in_mce: %d in_nmi: %d",
+			p->irq_soft_mask, p->in_mce, p->in_nmi);
+
+		if (virt_addr_valid(p->__current))
+			pr_cont(" current: %d (%s)\n", p->__current->pid,
+				p->__current->comm);
+		else
+			pr_cont(" current pointer corrupt? (%px)\n", p->__current);
+
+		pr_warn("Back trace of paca->saved_r1 (0x%016llx) (possibly stale):\n", p->saved_r1);
+		show_stack(p->__current, (unsigned long *)p->saved_r1);
+	}
+}
+
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+{
+	nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi);
+}
+#endif /* CONFIG_PPC_BOOK3S_64 */
-- 
2.14.1
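
For context on how this hook is reached: when an arch defines
arch_trigger_cpumask_backtrace, the generic helpers in <linux/nmi.h>
expand to roughly the following (paraphrased, so treat it as an
approximation rather than the exact generic code):

  /* Approximate shape of the generic wrappers that the RCU stall and
   * lockup detectors call; they funnel into the arch hook wired up by
   * this patch. */
  static inline bool trigger_all_cpu_backtrace(void)
  {
  	arch_trigger_cpumask_backtrace(cpu_online_mask, false);
  	return true;
  }

  static inline bool trigger_single_cpu_backtrace(int cpu)
  {
  	arch_trigger_cpumask_backtrace(cpumask_of(cpu), false);
  	return true;
  }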


* [RFC PATCH 4/4] powerpc/stacktrace: Update copyright
From: Michael Ellerman @ 2018-05-02 13:07 UTC
  To: linuxppc-dev; +Cc: npiggin

This now has new code in it written by Nick and me, and switches to an
SPDX tag.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/kernel/stacktrace.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index cf4652d5df80..8f032747b96d 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -1,13 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
 /*
- * Stack trace utility
+ * Stack trace utility functions etc.
  *
  * Copyright 2008 Christoph Hellwig, IBM Corp.
- *
- *
- *      This program is free software; you can redistribute it and/or
- *      modify it under the terms of the GNU General Public License
- *      as published by the Free Software Foundation; either version
- *      2 of the License, or (at your option) any later version.
+ * Copyright 2018 Nick Piggin, Michael Ellerman, IBM Corp.
  */
 
 #include <linux/export.h>
-- 
2.14.1


* Re: [RFC PATCH 1/4] powerpc/64: Save stack pointer when we hard disable interrupts
From: Nicholas Piggin @ 2018-05-05  6:26 UTC
  To: Michael Ellerman; +Cc: linuxppc-dev

On Wed,  2 May 2018 23:07:26 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> A CPU that gets stuck with interrupts hard disabled can be difficult to
> debug, as on some platforms we have no way to interrupt the CPU to
> find out what it's doing.
> 
> A stop-gap is to have the CPU save its stack pointer (r1) in its paca
> when it hard disables interrupts. That way if we can't interrupt it,
> we can at least trace the stack based on where it last disabled
> interrupts.
> 
> In some cases that will be total junk, but the stack trace code should
> handle that. In the simple case of a CPU that disables interrupts and
> then gets stuck in a loop, the stack trace should be informative.
> 
> We could clear the saved stack pointer when we enable interrupts, but
> that loses information which could be useful if we have nothing else
> to go on.
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  arch/powerpc/include/asm/hw_irq.h    | 6 +++++-
>  arch/powerpc/include/asm/paca.h      | 2 +-
>  arch/powerpc/kernel/exceptions-64s.S | 1 +
>  arch/powerpc/xmon/xmon.c             | 2 ++
>  4 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 855e17d158b1..35cb37be61fe 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -237,8 +237,12 @@ static inline bool arch_irqs_disabled(void)
>  	__hard_irq_disable();						\
>  	flags = irq_soft_mask_set_return(IRQS_ALL_DISABLED);		\
>  	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;			\
> -	if (!arch_irqs_disabled_flags(flags))				\
> +	if (!arch_irqs_disabled_flags(flags)) {				\
> +		asm ("stdx %%r1, 0, %1 ;"				\
> +		     : "=m" (local_paca->saved_r1)			\
> +		     : "b" (&local_paca->saved_r1));			\
>  		trace_hardirqs_off();					\
> +	}	

This is pretty neat; it would be good to have something that's less
destructive than the NMI IPI.

Thanks,
Nick


* Re: [RFC, 1/4] powerpc/64: Save stack pointer when we hard disable interrupts
From: Michael Ellerman @ 2018-06-04 14:10 UTC
  To: Michael Ellerman, linuxppc-dev; +Cc: npiggin

On Wed, 2018-05-02 at 13:07:26 UTC, Michael Ellerman wrote:
> A CPU that gets stuck with interrupts hard disabled can be difficult to
> debug, as on some platforms we have no way to interrupt the CPU to
> find out what it's doing.
> 
> A stop-gap is to have the CPU save its stack pointer (r1) in its paca
> when it hard disables interrupts. That way if we can't interrupt it,
> we can at least trace the stack based on where it last disabled
> interrupts.
> 
> In some cases that will be total junk, but the stack trace code should
> handle that. In the simple case of a CPU that disables interrupts and
> then gets stuck in a loop, the stack trace should be informative.
> 
> We could clear the saved stack pointer when we enable interrupts, but
> that loses information which could be useful if we have nothing else
> to go on.
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Series applied to powerpc next.

https://git.kernel.org/powerpc/c/7b08729cb272b4cd5c657cd5ac0ddd

cheers


* Re: [RFC PATCH 3/4] powerpc/64s: Wire up arch_trigger_cpumask_backtrace()
From: Christophe LEROY @ 2018-06-13  7:32 UTC
  To: Michael Ellerman, linuxppc-dev; +Cc: npiggin

Hi Michael,

It looks like this commit generates the following error:

stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

See http://kisskb.ellerman.id.au/kisskb/buildresult/13395345/ for details

Seems like that function only exists when CONFIG_NMI_IPI is defined.
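
One way to plug the hole would be to compile the backtrace-IPI path
only when the symbol it needs exists, e.g. (a sketch, not a tested or
committed fix):

  /* In arch/powerpc/kernel/stacktrace.c, tighten the guard: */
  #if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_NMI_IPI)
  /* ... handle_backtrace_ipi(), raise_backtrace_ipi() and
   * arch_trigger_cpumask_backtrace() from the patch go here ... */
  #endif /* CONFIG_PPC_BOOK3S_64 && CONFIG_NMI_IPI */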

Christophe

On 02/05/2018 at 15:07, Michael Ellerman wrote:
> This allows e.g. the RCU stall detector, or the soft/hardlockup
> detectors, to trigger a backtrace on all CPUs.
> 
> We implement this by sending a "safe" NMI, which will actually only
> send an IPI. Unfortunately the generic code prints "NMI", so that's a
> little confusing, but we can probably live with it.
> 
> If one of the CPUs doesn't respond to the IPI, we then print some info
> from its paca and do a backtrace based on its saved_r1.
> 
> [...]

