linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Shared NMI backtracing support for ARM/x86
@ 2015-07-15 20:39 Russell King - ARM Linux
  2015-07-15 20:39 ` [PATCH 1/3] nmi: create generic NMI backtrace implementation Russell King
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Russell King - ARM Linux @ 2015-07-15 20:39 UTC (permalink / raw)
  To: linux-arm-kernel, x86
  Cc: Daniel Thompson, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Thomas Gleixner

Back in September, I stumbled across a single CPU IRQs-off lockup of an
ARM SMP system, and decided to hack something together based on a much
older hacky implementation used with StrongARM CPUs from early 2000s.

This resulted in a copy of the x86 NMI backtrace code into ARM as it was
back then, and feedback indicated that it wasn't a good time to push
such an effort forward, as printk() in NMI context is dodgy.

Over time, the x86 code has had this problem addressed, and last week
I updated the patch which I've been carrying in my tree to move the
shared code out of arch/x86 into lib/ rather than duplicating it, and
switch the ARM implementation to use it.

Discussing this with Thomas Gliexner, he agreed to give it a test over
last weekend, and he has reported to me this evening "no explosion so
far".  Since then, I've made a change to add the NOKPROBE_SYMBOL() to
the generic handler as per the x86 original code.

I'm aware that there are other competing implementations out there -
Daniel has one based on my patch from September time, but I don't think
that goes far enough with code sharing.  I'm also partially aware of
an implementation from Petr too.

 arch/arm/include/asm/irq.h    |   5 ++
 arch/arm/kernel/smp.c         |  18 +++++
 arch/x86/kernel/apic/hw_nmi.c | 133 ++--------------------------------
 include/linux/nmi.h           |   6 ++
 lib/Makefile                  |   2 +-
 lib/nmi_backtrace.c           | 162 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 196 insertions(+), 130 deletions(-)

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] nmi: create generic NMI backtrace implementation
  2015-07-15 20:39 [PATCH 0/3] Shared NMI backtracing support for ARM/x86 Russell King - ARM Linux
@ 2015-07-15 20:39 ` Russell King
  2015-07-16  9:11   ` Daniel Thompson
  2015-07-16 11:07   ` Thomas Gleixner
  2015-07-15 20:39 ` [PATCH 2/3] nmi: x86: convert to generic nmi handler Russell King
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Russell King @ 2015-07-15 20:39 UTC (permalink / raw)
  To: linux-arm-kernel, x86; +Cc: linux-kernel, Daniel Thompson

x86s NMI backtrace implementation (for arch_trigger_all_cpu_backtrace())
is fairly generic in nature - the only architecture specific bits are
the act of raising the NMI to other CPUs, and reporting the status of
the NMI handler.

These are fairly simple to factor out, and produce a generic
implementation which can be shared between ARM and x86.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 include/linux/nmi.h |   6 ++
 lib/Makefile        |   2 +-
 lib/nmi_backtrace.c | 162 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 169 insertions(+), 1 deletion(-)
 create mode 100644 lib/nmi_backtrace.c

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index f94da0e65dea..5791e3229068 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -49,6 +49,12 @@ static inline bool trigger_allbutself_cpu_backtrace(void)
 	arch_trigger_all_cpu_backtrace(false);
 	return true;
 }
+
+/* generic implementation */
+void nmi_trigger_all_cpu_backtrace(bool include_self,
+				   void (*raise)(cpumask_t *mask));
+bool nmi_cpu_backtrace(struct pt_regs *regs);
+
 #else
 static inline bool trigger_all_cpu_backtrace(void)
 {
diff --git a/lib/Makefile b/lib/Makefile
index 6897b527581a..392169c5bc4e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o md5.o irq_regs.o argv_split.o \
 	 proportions.o flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o seq_buf.o
+	 earlycpio.o seq_buf.o nmi_backtrace.o
 
 obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
new file mode 100644
index 000000000000..88d3d32e5923
--- /dev/null
+++ b/lib/nmi_backtrace.c
@@ -0,0 +1,162 @@
+/*
+ *  NMI backtrace support
+ *
+ * Gratuitously copied from arch/x86/kernel/apic/hw_nmi.c by Russell King,
+ * with the following header:
+ *
+ *  HW NMI watchdog support
+ *
+ *  started by Don Zickus, Copyright (C) 2010 Red Hat, Inc.
+ *
+ *  Arch specific calls to support NMI watchdog
+ *
+ *  Bits copied from original nmi.c file
+ */
+#include <linux/cpumask.h>
+#include <linux/delay.h>
+#include <linux/kprobes.h>
+#include <linux/nmi.h>
+#include <linux/seq_buf.h>
+
+#ifdef arch_trigger_all_cpu_backtrace
+/* For reliability, we're prepared to waste bits here. */
+static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
+static cpumask_t printtrace_mask;
+
+#define NMI_BUF_SIZE		4096
+
+struct nmi_seq_buf {
+	unsigned char		buffer[NMI_BUF_SIZE];
+	struct seq_buf		seq;
+};
+
+/* Safe printing in NMI context */
+static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
+
+/* "in progress" flag of arch_trigger_all_cpu_backtrace */
+static unsigned long backtrace_flag;
+
+static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+	const char *buf = s->buffer + start;
+
+	printk("%.*s", (end - start) + 1, buf);
+}
+
+void nmi_trigger_all_cpu_backtrace(bool include_self,
+				   void (*raise)(cpumask_t *mask))
+{
+	struct nmi_seq_buf *s;
+	int i, cpu, this_cpu = get_cpu();
+
+	if (test_and_set_bit(0, &backtrace_flag)) {
+		/*
+		 * If there is already a trigger_all_cpu_backtrace() in progress
+		 * (backtrace_flag == 1), don't output double cpu dump infos.
+		 */
+		put_cpu();
+		return;
+	}
+
+	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
+	if (!include_self)
+		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
+
+	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
+
+	/*
+	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
+	 * CPUs will write to.
+	 */
+	for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
+		s = &per_cpu(nmi_print_seq, cpu);
+		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
+	}
+
+	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
+		pr_info("Sending NMI to %s CPUs:\n",
+			(include_self ? "all" : "other"));
+		raise(to_cpumask(backtrace_mask));
+	}
+
+	/* Wait for up to 10 seconds for all CPUs to do the backtrace */
+	for (i = 0; i < 10 * 1000; i++) {
+		if (cpumask_empty(to_cpumask(backtrace_mask)))
+			break;
+		mdelay(1);
+		touch_softlockup_watchdog();
+	}
+
+	/*
+	 * Now that all the NMIs have triggered, we can dump out their
+	 * back traces safely to the console.
+	 */
+	for_each_cpu(cpu, &printtrace_mask) {
+		int len, last_i = 0;
+
+		s = &per_cpu(nmi_print_seq, cpu);
+		len = seq_buf_used(&s->seq);
+		if (!len)
+			continue;
+
+		/* Print line by line. */
+		for (i = 0; i < len; i++) {
+			if (s->buffer[i] == '\n') {
+				print_seq_line(s, last_i, i);
+				last_i = i + 1;
+			}
+		}
+		/* Check if there was a partial line. */
+		if (last_i < len) {
+			print_seq_line(s, last_i, len - 1);
+			pr_cont("\n");
+		}
+	}
+
+	clear_bit(0, &backtrace_flag);
+	smp_mb__after_atomic();
+	put_cpu();
+}
+
+/*
+ * It is not safe to call printk() directly from NMI handlers.
+ * It may be fine if the NMI detected a lock up and we have no choice
+ * but to do so, but doing a NMI on all other CPUs to get a back trace
+ * can be done with a sysrq-l. We don't want that to lock up, which
+ * can happen if the NMI interrupts a printk in progress.
+ *
+ * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
+ * the content into a per cpu seq_buf buffer. Then when the NMIs are
+ * all done, we can safely dump the contents of the seq_buf to a printk()
+ * from a non NMI context.
+ */
+static int nmi_vprintk(const char *fmt, va_list args)
+{
+	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+	unsigned int len = seq_buf_used(&s->seq);
+
+	seq_buf_vprintf(&s->seq, fmt, args);
+	return seq_buf_used(&s->seq) - len;
+}
+
+bool nmi_cpu_backtrace(struct pt_regs *regs)
+{
+	int cpu = smp_processor_id();
+
+	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+		printk_func_t printk_func_save = this_cpu_read(printk_func);
+
+		/* Replace printk to write into the NMI seq */
+		this_cpu_write(printk_func, nmi_vprintk);
+		pr_warn("NMI backtrace for cpu %d\n", cpu);
+		show_regs(regs);
+		this_cpu_write(printk_func, printk_func_save);
+
+		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
+		return true;
+	}
+
+	return false;
+}
+NOKPROBE_SYMBOL(nmi_cpu_backtrace);
+#endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] nmi: x86: convert to generic nmi handler
  2015-07-15 20:39 [PATCH 0/3] Shared NMI backtracing support for ARM/x86 Russell King - ARM Linux
  2015-07-15 20:39 ` [PATCH 1/3] nmi: create generic NMI backtrace implementation Russell King
@ 2015-07-15 20:39 ` Russell King
  2015-07-16 11:07   ` Thomas Gleixner
  2015-07-15 20:39 ` [PATCH 3/3] ARM: add basic support for on-demand backtrace of other CPUs Russell King
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Russell King @ 2015-07-15 20:39 UTC (permalink / raw)
  To: linux-arm-kernel, x86
  Cc: linux-kernel, Daniel Thompson, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

Convert x86 to use the generic nmi handler code which can be shared
between architectures.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/x86/kernel/apic/hw_nmi.c | 133 ++----------------------------------------
 1 file changed, 4 insertions(+), 129 deletions(-)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 6873ab925d00..045e424fb368 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -28,146 +28,21 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
 #endif
 
 #ifdef arch_trigger_all_cpu_backtrace
-/* For reliability, we're prepared to waste bits here. */
-static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
-static cpumask_t printtrace_mask;
-
-#define NMI_BUF_SIZE		4096
-
-struct nmi_seq_buf {
-	unsigned char		buffer[NMI_BUF_SIZE];
-	struct seq_buf		seq;
-};
-
-/* Safe printing in NMI context */
-static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
-
-/* "in progress" flag of arch_trigger_all_cpu_backtrace */
-static unsigned long backtrace_flag;
-
-static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
+static void nmi_raise_cpu_backtrace(cpumask_t *mask)
 {
-	const char *buf = s->buffer + start;
-
-	printk("%.*s", (end - start) + 1, buf);
+	apic->send_IPI_mask(mask, NMI_VECTOR);
 }
 
 void arch_trigger_all_cpu_backtrace(bool include_self)
 {
-	struct nmi_seq_buf *s;
-	int len;
-	int cpu;
-	int i;
-	int this_cpu = get_cpu();
-
-	if (test_and_set_bit(0, &backtrace_flag)) {
-		/*
-		 * If there is already a trigger_all_cpu_backtrace() in progress
-		 * (backtrace_flag == 1), don't output double cpu dump infos.
-		 */
-		put_cpu();
-		return;
-	}
-
-	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
-	if (!include_self)
-		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
-
-	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
-	/*
-	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
-	 * CPUs will write to.
-	 */
-	for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
-		s = &per_cpu(nmi_print_seq, cpu);
-		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
-	}
-
-	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
-		pr_info("sending NMI to %s CPUs:\n",
-			(include_self ? "all" : "other"));
-		apic->send_IPI_mask(to_cpumask(backtrace_mask), NMI_VECTOR);
-	}
-
-	/* Wait for up to 10 seconds for all CPUs to do the backtrace */
-	for (i = 0; i < 10 * 1000; i++) {
-		if (cpumask_empty(to_cpumask(backtrace_mask)))
-			break;
-		mdelay(1);
-		touch_softlockup_watchdog();
-	}
-
-	/*
-	 * Now that all the NMIs have triggered, we can dump out their
-	 * back traces safely to the console.
-	 */
-	for_each_cpu(cpu, &printtrace_mask) {
-		int last_i = 0;
-
-		s = &per_cpu(nmi_print_seq, cpu);
-		len = seq_buf_used(&s->seq);
-		if (!len)
-			continue;
-
-		/* Print line by line. */
-		for (i = 0; i < len; i++) {
-			if (s->buffer[i] == '\n') {
-				print_seq_line(s, last_i, i);
-				last_i = i + 1;
-			}
-		}
-		/* Check if there was a partial line. */
-		if (last_i < len) {
-			print_seq_line(s, last_i, len - 1);
-			pr_cont("\n");
-		}
-	}
-
-	clear_bit(0, &backtrace_flag);
-	smp_mb__after_atomic();
-	put_cpu();
-}
-
-/*
- * It is not safe to call printk() directly from NMI handlers.
- * It may be fine if the NMI detected a lock up and we have no choice
- * but to do so, but doing a NMI on all other CPUs to get a back trace
- * can be done with a sysrq-l. We don't want that to lock up, which
- * can happen if the NMI interrupts a printk in progress.
- *
- * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
- * the content into a per cpu seq_buf buffer. Then when the NMIs are
- * all done, we can safely dump the contents of the seq_buf to a printk()
- * from a non NMI context.
- */
-static int nmi_vprintk(const char *fmt, va_list args)
-{
-	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-	unsigned int len = seq_buf_used(&s->seq);
-
-	seq_buf_vprintf(&s->seq, fmt, args);
-	return seq_buf_used(&s->seq) - len;
+	nmi_trigger_all_cpu_backtrace(include_self, nmi_raise_cpu_backtrace);
 }
 
 static int
 arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
 {
-	int cpu;
-
-	cpu = smp_processor_id();
-
-	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
-		printk_func_t printk_func_save = this_cpu_read(printk_func);
-
-		/* Replace printk to write into the NMI seq */
-		this_cpu_write(printk_func, nmi_vprintk);
-		printk(KERN_WARNING "NMI backtrace for cpu %d\n", cpu);
-		show_regs(regs);
-		this_cpu_write(printk_func, printk_func_save);
-
-		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
+	if (nmi_cpu_backtrace(regs))
 		return NMI_HANDLED;
-	}
 
 	return NMI_DONE;
 }
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] ARM: add basic support for on-demand backtrace of other CPUs
  2015-07-15 20:39 [PATCH 0/3] Shared NMI backtracing support for ARM/x86 Russell King - ARM Linux
  2015-07-15 20:39 ` [PATCH 1/3] nmi: create generic NMI backtrace implementation Russell King
  2015-07-15 20:39 ` [PATCH 2/3] nmi: x86: convert to generic nmi handler Russell King
@ 2015-07-15 20:39 ` Russell King
  2015-07-16  9:13   ` Daniel Thompson
  2015-07-16  9:55 ` [PATCH 0/3] Shared NMI backtracing support for ARM/x86 Daniel Thompson
  2015-07-21  9:34 ` Thomas Gleixner
  4 siblings, 1 reply; 15+ messages in thread
From: Russell King @ 2015-07-15 20:39 UTC (permalink / raw)
  To: linux-arm-kernel, x86; +Cc: linux-kernel, Daniel Thompson

As we now have generic infrastructure to support backtracing of other
CPUs in the system on lockups, we can start to implement this for ARM.
Initially, we add an IPI based implementation, as the GIC code needs
modification to support the generation of FIQ IPIs, and not all ARM
platforms have the ability to raise a FIQ in the non-secure world.

This provides us with a "best efforts" implementation in the absence
of FIQs.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/include/asm/irq.h |  5 +++++
 arch/arm/kernel/smp.c      | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index 53c15dec7af6..be1d07d59ee9 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -35,6 +35,11 @@ extern void (*handle_arch_irq)(struct pt_regs *);
 extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
 #endif
 
+#ifdef CONFIG_SMP
+extern void arch_trigger_all_cpu_backtrace(bool);
+#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x)
+#endif
+
 #endif
 
 #endif
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 90dfbedfbfb8..3a20c386fd33 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -21,6 +21,7 @@
 #include <linux/cpu.h>
 #include <linux/seq_file.h>
 #include <linux/irq.h>
+#include <linux/nmi.h>
 #include <linux/percpu.h>
 #include <linux/clockchips.h>
 #include <linux/completion.h>
@@ -72,6 +73,7 @@ enum ipi_msg_type {
 	IPI_CPU_STOP,
 	IPI_IRQ_WORK,
 	IPI_COMPLETION,
+	IPI_CPU_BACKTRACE = 15,
 };
 
 static DECLARE_COMPLETION(cpu_running);
@@ -630,6 +632,12 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		irq_exit();
 		break;
 
+	case IPI_CPU_BACKTRACE:
+		irq_enter();
+		nmi_cpu_backtrace(regs);
+		irq_exit();
+		break;
+
 	default:
 		pr_crit("CPU%u: Unknown IPI message 0x%x\n",
 		        cpu, ipinr);
@@ -724,3 +732,13 @@ static int __init register_cpufreq_notifier(void)
 core_initcall(register_cpufreq_notifier);
 
 #endif
+
+static void raise_nmi(cpumask_t *mask)
+{
+	smp_cross_call(mask, IPI_CPU_BACKTRACE);
+}
+
+void arch_trigger_all_cpu_backtrace(bool include_self)
+{
+	nmi_trigger_all_cpu_backtrace(include_self, raise_nmi);
+}
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] nmi: create generic NMI backtrace implementation
  2015-07-15 20:39 ` [PATCH 1/3] nmi: create generic NMI backtrace implementation Russell King
@ 2015-07-16  9:11   ` Daniel Thompson
  2015-07-16  9:37     ` Russell King - ARM Linux
  2015-07-16 11:07   ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Thompson @ 2015-07-16  9:11 UTC (permalink / raw)
  To: Russell King, linux-arm-kernel, x86; +Cc: linux-kernel

On 15/07/15 21:39, Russell King wrote:
> +void nmi_trigger_all_cpu_backtrace(bool include_self,
> +				   void (*raise)(cpumask_t *mask))
> +{
> +	struct nmi_seq_buf *s;
> +	int i, cpu, this_cpu = get_cpu();
> +
> +	if (test_and_set_bit(0, &backtrace_flag)) {
> +		/*
> +		 * If there is already a trigger_all_cpu_backtrace() in progress
> +		 * (backtrace_flag == 1), don't output double cpu dump infos.
> +		 */
> +		put_cpu();
> +		return;
> +	}
> +
> +	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
> +	if (!include_self)
> +		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
> +
> +	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
> +
> +	/*
> +	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
> +	 * CPUs will write to.
> +	 */
> +	for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
> +		s = &per_cpu(nmi_print_seq, cpu);
> +		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
> +	}
> +
> +	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
> +		pr_info("Sending NMI to %s CPUs:\n",
> +			(include_self ? "all" : "other"));
> +		raise(to_cpumask(backtrace_mask));

On ARM, this code could be running with IRQs locked and with raise() 
implemented using IRQs. In such as case the IPI will not be raised until 
the function exists (and perhaps never). Thanks to the timeout we will 
exit but we end up needlessly failing to print a backtrace for the 
calling CPU.

The solution I used for this was to special case the current CPU and 
call nmi_cpu_backtrace() directly. Originally I made this logic arm only 
but I can't really see any reason for this to be arch specific so the 
logic to do that should probably be included here.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] ARM: add basic support for on-demand backtrace of other CPUs
  2015-07-15 20:39 ` [PATCH 3/3] ARM: add basic support for on-demand backtrace of other CPUs Russell King
@ 2015-07-16  9:13   ` Daniel Thompson
  2015-07-16  9:39     ` Russell King - ARM Linux
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Thompson @ 2015-07-16  9:13 UTC (permalink / raw)
  To: Russell King, linux-arm-kernel, x86; +Cc: linux-kernel

On 15/07/15 21:39, Russell King wrote:
> As we now have generic infrastructure to support backtracing of other
> CPUs in the system on lockups, we can start to implement this for ARM.
> Initially, we add an IPI based implementation, as the GIC code needs
> modification to support the generation of FIQ IPIs, and not all ARM
> platforms have the ability to raise a FIQ in the non-secure world.
>
> This provides us with a "best efforts" implementation in the absence
> of FIQs.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 90dfbedfbfb8..3a20c386fd33 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -21,6 +21,7 @@
>   #include <linux/cpu.h>
>   #include <linux/seq_file.h>
>   #include <linux/irq.h>
> +#include <linux/nmi.h>
>   #include <linux/percpu.h>
>   #include <linux/clockchips.h>
>   #include <linux/completion.h>
> @@ -72,6 +73,7 @@ enum ipi_msg_type {
>   	IPI_CPU_STOP,
>   	IPI_IRQ_WORK,
>   	IPI_COMPLETION,
> +	IPI_CPU_BACKTRACE = 15,

Even with the potential for (eventually) being signalled by FIQ, is this 
IPI really so special it needs to be placed outside the scope of NR_IPI 
and the accounting and tracing support it brings with it?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] nmi: create generic NMI backtrace implementation
  2015-07-16  9:11   ` Daniel Thompson
@ 2015-07-16  9:37     ` Russell King - ARM Linux
  2015-07-16  9:51       ` Daniel Thompson
  0 siblings, 1 reply; 15+ messages in thread
From: Russell King - ARM Linux @ 2015-07-16  9:37 UTC (permalink / raw)
  To: Daniel Thompson; +Cc: linux-arm-kernel, x86, linux-kernel

On Thu, Jul 16, 2015 at 10:11:24AM +0100, Daniel Thompson wrote:
> On 15/07/15 21:39, Russell King wrote:
> >+void nmi_trigger_all_cpu_backtrace(bool include_self,
> >+				   void (*raise)(cpumask_t *mask))
> >+{
> >+	struct nmi_seq_buf *s;
> >+	int i, cpu, this_cpu = get_cpu();
> >+
> >+	if (test_and_set_bit(0, &backtrace_flag)) {
> >+		/*
> >+		 * If there is already a trigger_all_cpu_backtrace() in progress
> >+		 * (backtrace_flag == 1), don't output double cpu dump infos.
> >+		 */
> >+		put_cpu();
> >+		return;
> >+	}
> >+
> >+	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
> >+	if (!include_self)
> >+		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
> >+
> >+	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
> >+
> >+	/*
> >+	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
> >+	 * CPUs will write to.
> >+	 */
> >+	for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
> >+		s = &per_cpu(nmi_print_seq, cpu);
> >+		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
> >+	}
> >+
> >+	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
> >+		pr_info("Sending NMI to %s CPUs:\n",
> >+			(include_self ? "all" : "other"));
> >+		raise(to_cpumask(backtrace_mask));
> 
> On ARM, this code could be running with IRQs locked and with raise()
> implemented using IRQs. In such as case the IPI will not be raised until the
> function exists (and perhaps never). Thanks to the timeout we will exit but
> we end up needlessly failing to print a backtrace for the calling CPU.
> 
> The solution I used for this was to special case the current CPU and call
> nmi_cpu_backtrace() directly. Originally I made this logic arm only but I
> can't really see any reason for this to be arch specific so the logic to do
> that should probably be included here.

That can be implemented in the arch raise() method if needed - most
architectures shouldn't need it as if they are properly raising a NMI
which is, by definition, deliverable with normal IRQs disabled.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] ARM: add basic support for on-demand backtrace of other CPUs
  2015-07-16  9:13   ` Daniel Thompson
@ 2015-07-16  9:39     ` Russell King - ARM Linux
  0 siblings, 0 replies; 15+ messages in thread
From: Russell King - ARM Linux @ 2015-07-16  9:39 UTC (permalink / raw)
  To: Daniel Thompson; +Cc: linux-arm-kernel, x86, linux-kernel

On Thu, Jul 16, 2015 at 10:13:26AM +0100, Daniel Thompson wrote:
> On 15/07/15 21:39, Russell King wrote:
> >As we now have generic infrastructure to support backtracing of other
> >CPUs in the system on lockups, we can start to implement this for ARM.
> >Initially, we add an IPI based implementation, as the GIC code needs
> >modification to support the generation of FIQ IPIs, and not all ARM
> >platforms have the ability to raise a FIQ in the non-secure world.
> >
> >This provides us with a "best efforts" implementation in the absence
> >of FIQs.
> >
> >Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> >---
> >diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> >index 90dfbedfbfb8..3a20c386fd33 100644
> >--- a/arch/arm/kernel/smp.c
> >+++ b/arch/arm/kernel/smp.c
> >@@ -21,6 +21,7 @@
> >  #include <linux/cpu.h>
> >  #include <linux/seq_file.h>
> >  #include <linux/irq.h>
> >+#include <linux/nmi.h>
> >  #include <linux/percpu.h>
> >  #include <linux/clockchips.h>
> >  #include <linux/completion.h>
> >@@ -72,6 +73,7 @@ enum ipi_msg_type {
> >  	IPI_CPU_STOP,
> >  	IPI_IRQ_WORK,
> >  	IPI_COMPLETION,
> >+	IPI_CPU_BACKTRACE = 15,
> 
> Even with the potential for (eventually) being signalled by FIQ, is this IPI
> really so special it needs to be placed outside the scope of NR_IPI and the
> accounting and tracing support it brings with it?

That's exactly why it's placed outside that range.  We don't want the
accounting and tracing stuff at all in that path - that's more code which
needs to be run which could be the cause of the lockup.  More fragility.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] nmi: create generic NMI backtrace implementation
  2015-07-16  9:37     ` Russell King - ARM Linux
@ 2015-07-16  9:51       ` Daniel Thompson
  2015-07-25 14:42         ` Russell King - ARM Linux
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Thompson @ 2015-07-16  9:51 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: linux-arm-kernel, x86, linux-kernel

On 16/07/15 10:37, Russell King - ARM Linux wrote:
> On Thu, Jul 16, 2015 at 10:11:24AM +0100, Daniel Thompson wrote:
>> On 15/07/15 21:39, Russell King wrote:
>>> +void nmi_trigger_all_cpu_backtrace(bool include_self,
>>> +				   void (*raise)(cpumask_t *mask))
>>> +{
>>> +	struct nmi_seq_buf *s;
>>> +	int i, cpu, this_cpu = get_cpu();
>>> +
>>> +	if (test_and_set_bit(0, &backtrace_flag)) {
>>> +		/*
>>> +		 * If there is already a trigger_all_cpu_backtrace() in progress
>>> +		 * (backtrace_flag == 1), don't output double cpu dump infos.
>>> +		 */
>>> +		put_cpu();
>>> +		return;
>>> +	}
>>> +
>>> +	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
>>> +	if (!include_self)
>>> +		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
>>> +
>>> +	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
>>> +
>>> +	/*
>>> +	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
>>> +	 * CPUs will write to.
>>> +	 */
>>> +	for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
>>> +		s = &per_cpu(nmi_print_seq, cpu);
>>> +		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
>>> +	}
>>> +
>>> +	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
>>> +		pr_info("Sending NMI to %s CPUs:\n",
>>> +			(include_self ? "all" : "other"));
>>> +		raise(to_cpumask(backtrace_mask));
>>
>> On ARM, this code could be running with IRQs locked and with raise()
>> implemented using IRQs. In such as case the IPI will not be raised until the
>> function exists (and perhaps never). Thanks to the timeout we will exit but
>> we end up needlessly failing to print a backtrace for the calling CPU.
>>
>> The solution I used for this was to special case the current CPU and call
>> nmi_cpu_backtrace() directly. Originally I made this logic arm only but I
>> can't really see any reason for this to be arch specific so the logic to do
>> that should probably be included here.
>
> That can be implemented in the arch raise() method if needed - most
> architectures shouldn't need it as if they are properly raising a NMI
> which is, by definition, deliverable with normal IRQs disabled.

Agreed. The bug certainly could be fixed in the ARM raise() function.

However I'm still curious whether there is any architecture that 
benefits from forcing the current CPU into an NMI handler? Why doesn't 
the don't-run-unnecessary-code argument apply here as well?


Daniel.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Shared NMI backtracing support for ARM/x86
  2015-07-15 20:39 [PATCH 0/3] Shared NMI backtracing support for ARM/x86 Russell King - ARM Linux
                   ` (2 preceding siblings ...)
  2015-07-15 20:39 ` [PATCH 3/3] ARM: add basic support for on-demand backtrace of other CPUs Russell King
@ 2015-07-16  9:55 ` Daniel Thompson
  2015-07-21  9:34 ` Thomas Gleixner
  4 siblings, 0 replies; 15+ messages in thread
From: Daniel Thompson @ 2015-07-16  9:55 UTC (permalink / raw)
  To: Russell King - ARM Linux, linux-arm-kernel, x86
  Cc: H. Peter Anvin, Ingo Molnar, linux-kernel, Thomas Gleixner

On 15/07/15 21:39, Russell King - ARM Linux wrote:
> Back in September, I stumbled across a single CPU IRQs-off lockup of an
> ARM SMP system, and decided to hack something together based on a much
> older hacky implementation used with StrongARM CPUs from early 2000s.
>
> This resulted in a copy of the x86 NMI backtrace code into ARM as it was
> back then, and feedback indicated that it wasn't a good time to push
> such an effort forward, as printk() in NMI context is dodgy.
>
> Over time, the x86 code has had this problem addressed, and last week
> I updated the patch which I've been carrying in my tree to move the
> shared code out of arch/x86 into lib/ rather than duplicating it, and
> switch the ARM implementation to use it.
>
> Discussing this with Thomas Gliexner, he agreed to give it a test over
> last weekend, and he has reported to me this evening "no explosion so
> far".  Since then, I've made a change to add the NOKPROBE_SYMBOL() to
> the generic handler as per the x86 original code.
>
> I'm aware that there are other competing implementations out there -
> Daniel has one based on my patch from September time, but I don't think
> that goes far enough with code sharing.  I'm also partially aware of
> an implementation from Petr too.

No worries. I agree this approach is much cleaner with regards to code 
sharing.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] nmi: create generic NMI backtrace implementation
  2015-07-15 20:39 ` [PATCH 1/3] nmi: create generic NMI backtrace implementation Russell King
  2015-07-16  9:11   ` Daniel Thompson
@ 2015-07-16 11:07   ` Thomas Gleixner
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2015-07-16 11:07 UTC (permalink / raw)
  To: Russell King; +Cc: linux-arm-kernel, x86, linux-kernel, Daniel Thompson

On Wed, 15 Jul 2015, Russell King wrote:

> x86s NMI backtrace implementation (for arch_trigger_all_cpu_backtrace())
> is fairly generic in nature - the only architecture specific bits are
> the act of raising the NMI to other CPUs, and reporting the status of
> the NMI handler.
> 
> These are fairly simple to factor out, and produce a generic
> implementation which can be shared between ARM and x86.
> 
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] nmi: x86: convert to generic nmi handler
  2015-07-15 20:39 ` [PATCH 2/3] nmi: x86: convert to generic nmi handler Russell King
@ 2015-07-16 11:07   ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2015-07-16 11:07 UTC (permalink / raw)
  To: Russell King
  Cc: linux-arm-kernel, x86, linux-kernel, Daniel Thompson,
	Ingo Molnar, H. Peter Anvin

On Wed, 15 Jul 2015, Russell King wrote:

> Convert x86 to use the generic nmi handler code which can be shared
> between architectures.
> 
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Shared NMI backtracing support for ARM/x86
  2015-07-15 20:39 [PATCH 0/3] Shared NMI backtracing support for ARM/x86 Russell King - ARM Linux
                   ` (3 preceding siblings ...)
  2015-07-16  9:55 ` [PATCH 0/3] Shared NMI backtracing support for ARM/x86 Daniel Thompson
@ 2015-07-21  9:34 ` Thomas Gleixner
  4 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2015-07-21  9:34 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: LAK, x86, Daniel Thompson, H. Peter Anvin, Ingo Molnar, LKML,
	Peter Zijlstra

On Wed, 15 Jul 2015, Russell King - ARM Linux wrote:
> Back in September, I stumbled across a single CPU IRQs-off lockup of an
> ARM SMP system, and decided to hack something together based on a much
> older hacky implementation used with StrongARM CPUs from early 2000s.
> 
> This resulted in a copy of the x86 NMI backtrace code into ARM as it was
> back then, and feedback indicated that it wasn't a good time to push
> such an effort forward, as printk() in NMI context is dodgy.
> 
> Over time, the x86 code has had this problem addressed, and last week
> I updated the patch which I've been carrying in my tree to move the
> shared code out of arch/x86 into lib/ rather than duplicating it, and
> switch the ARM implementation to use it.
> 
> Discussing this with Thomas Gliexner, he agreed to give it a test over
> last weekend, and he has reported to me this evening "no explosion so
> far".  Since then, I've made a change to add the NOKPROBE_SYMBOL() to
> the generic handler as per the x86 original code.
> 
> I'm aware that there are other competing implementations out there -
> Daniel has one based on my patch from September time, but I don't think
> that goes far enough with code sharing.  I'm also partially aware of
> an implementation from Petr too.

I think we should just move ahead and apply this lot. Any improvements
can be done on top of this.

Russell, please take it through your tree.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] nmi: create generic NMI backtrace implementation
  2015-07-16  9:51       ` Daniel Thompson
@ 2015-07-25 14:42         ` Russell King - ARM Linux
  2015-07-28  8:29           ` Daniel Thompson
  0 siblings, 1 reply; 15+ messages in thread
From: Russell King - ARM Linux @ 2015-07-25 14:42 UTC (permalink / raw)
  To: Daniel Thompson; +Cc: linux-arm-kernel, x86, linux-kernel

On Thu, Jul 16, 2015 at 10:51:25AM +0100, Daniel Thompson wrote:
> On 16/07/15 10:37, Russell King - ARM Linux wrote:
> >That can be implemented in the arch raise() method if needed - most
> >architectures shouldn't need it as if they are properly raising a NMI
> >which is, by definition, deliverable with normal IRQs disabled.
> 
> Agreed. The bug certainly could be fixed in the ARM raise() function.
> 
> However I'm still curious whether there is any architecture that benefits
> from forcing the current CPU into an NMI handler? Why doesn't the
> don't-run-unnecessary-code argument apply here as well?

The benefit is that we get a consistent way of invoking the backtrace,
since causing the NMI exception gives us a 'struct pt_regs' to work
with, which we wouldn't otherwise have if we tried to call it "inline".

The NMI backtrace includes dumping the register state of the NMI-
receiving CPUs, which needs a 'struct pt_regs' and generating a that in
arch-independent code wouldn't be nice.

In any case, if this area needs changing in the generic code, it should
be done as a separate change so that it can be properly assessed and
validated on x86.

In the mean time, I will action Thomas' request to put it into my tree
so that we can get some reasonable linux-next time with it, and hopefully
have some progress towards FIQ-based backtracing for ARM.

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] nmi: create generic NMI backtrace implementation
  2015-07-25 14:42         ` Russell King - ARM Linux
@ 2015-07-28  8:29           ` Daniel Thompson
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Thompson @ 2015-07-28  8:29 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: linux-arm-kernel, x86, linux-kernel

On 25/07/15 15:42, Russell King - ARM Linux wrote:
> On Thu, Jul 16, 2015 at 10:51:25AM +0100, Daniel Thompson wrote:
>> On 16/07/15 10:37, Russell King - ARM Linux wrote:
>>> That can be implemented in the arch raise() method if needed - most
>>> architectures shouldn't need it as if they are properly raising a NMI
>>> which is, by definition, deliverable with normal IRQs disabled.
>>
>> Agreed. The bug certainly could be fixed in the ARM raise() function.
>>
>> However I'm still curious whether there is any architecture that benefits
>> from forcing the current CPU into an NMI handler? Why doesn't the
>> don't-run-unnecessary-code argument apply here as well?
>
> The benefit is that we get a consistent way of invoking the backtrace,
> since causing the NMI exception gives us a 'struct pt_regs' to work
> with, which we wouldn't otherwise have if we tried to call it "inline".
>
> The NMI backtrace includes dumping the register state of the NMI-
> receiving CPUs, which needs a 'struct pt_regs' and generating a that in
> arch-independent code wouldn't be nice.

Previously I have relied on dump_stack() for this. That should work 
everywhere although guess the arch code might display the stack display 
differently.


> In any case, if this area needs changing in the generic code, it should
> be done as a separate change so that it can be properly assessed and
> validated on x86.

Do you want me to supply a patch to fix the IRQ issue in the arm 
specific code for now?

If we don't fix that then the behaviour of SysRq-L on ARM will change 
and the output will no longer include the CPU that executed SysRq-L.


> In the mean time, I will action Thomas' request to put it into my tree
> so that we can get some reasonable linux-next time with it, and hopefully
> have some progress towards FIQ-based backtracing for ARM.

Great!


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-07-28  8:29 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-15 20:39 [PATCH 0/3] Shared NMI backtracing support for ARM/x86 Russell King - ARM Linux
2015-07-15 20:39 ` [PATCH 1/3] nmi: create generic NMI backtrace implementation Russell King
2015-07-16  9:11   ` Daniel Thompson
2015-07-16  9:37     ` Russell King - ARM Linux
2015-07-16  9:51       ` Daniel Thompson
2015-07-25 14:42         ` Russell King - ARM Linux
2015-07-28  8:29           ` Daniel Thompson
2015-07-16 11:07   ` Thomas Gleixner
2015-07-15 20:39 ` [PATCH 2/3] nmi: x86: convert to generic nmi handler Russell King
2015-07-16 11:07   ` Thomas Gleixner
2015-07-15 20:39 ` [PATCH 3/3] ARM: add basic support for on-demand backtrace of other CPUs Russell King
2015-07-16  9:13   ` Daniel Thompson
2015-07-16  9:39     ` Russell King - ARM Linux
2015-07-16  9:55 ` [PATCH 0/3] Shared NMI backtracing support for ARM/x86 Daniel Thompson
2015-07-21  9:34 ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).