All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] [RFC] new nmi_watchdog using perf events
@ 2010-01-27 20:03 Don Zickus
  2010-01-27 20:03 ` [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c Don Zickus
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Don Zickus @ 2010-01-27 20:03 UTC (permalink / raw)
  To: mingo, peterz, gorcunov; +Cc: aris, linux-kernel, Don Zickus

This patch series tries to implement a new nmi_watchdog using the perf
events subsystem.  I am posting this series early for feedback on the
approach.  It isn't feature compatible with the old nmi_watchdog yet, nor
does it have all the old workarounds either.

The basic design is to create an in-kernel performance counter that goes off
every few seconds and checks for cpu lockups.  It is fairly straight
forward.  Some of the quirks are making sure the cpu lockup detection works
correctly.

It has been lightly tested for now.  Once people are ok with the approach,
I'll expand testing to more machines in our lab.

I tried taking a generic approach so all arches could use it if they want
and then implement some per arch specific hooks.  I believe this is what
Ingo was suggesting.  The initial work is based off of kernel/softlockup.c.

Any feedback would be great.

Cheers,
Don

--
Sorry if this spams people again.  My mailer was broke and some mail wasn't
sent correctly.

Don Zickus (3):
  [RFC][x86] move notify_die from nmi.c to traps.c
  [RFC] nmi_watchdog: new implementation using perf events
  [RFC] nmi_watchdog: config option to enable new nmi_watchdog

 arch/x86/kernel/apic/Makefile |    7 ++-
 arch/x86/kernel/apic/hw_nmi.c |  114 ++++++++++++++++++++++++
 arch/x86/kernel/apic/nmi.c    |    7 --
 arch/x86/kernel/traps.c       |    7 ++
 include/linux/nmi.h           |    4 +
 kernel/Makefile               |    1 +
 kernel/nmi_watchdog.c         |  196 +++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug             |   13 +++
 8 files changed, 341 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/kernel/apic/hw_nmi.c
 create mode 100644 kernel/nmi_watchdog.c


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c
  2010-01-27 20:03 [PATCH 0/3] [RFC] new nmi_watchdog using perf events Don Zickus
@ 2010-01-27 20:03 ` Don Zickus
  2010-01-28 15:10   ` Cyrill Gorcunov
  2010-01-27 20:03 ` [PATCH 2/3] [RFC] nmi_watchdog: new implementation using perf events Don Zickus
  2010-01-27 20:03 ` [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog Don Zickus
  2 siblings, 1 reply; 15+ messages in thread
From: Don Zickus @ 2010-01-27 20:03 UTC (permalink / raw)
  To: mingo, peterz, gorcunov; +Cc: aris, linux-kernel, Don Zickus

In order to handle a new nmi_watchdog approach, I need to move the notify_die()
routine out of nmi_watchdog_tick() and into default_do_nmi().  This lets me easily
swap out the old nmi_watchdog with the new one with just a config change.

The change probably makes sense from a high level perspective because the
nmi_watchdog shouldn't be handling notify_die routines anyway.  However, this
move does change the semantics a little bit.  Instead of checking on every nmi
interrupt if the cpus are stuck, only check them on the nmi_watchdog interrupts.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/apic/nmi.c |    7 -------
 arch/x86/kernel/traps.c    |    5 +++++
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/nmi.c b/arch/x86/kernel/apic/nmi.c
index 0159a69..5d47682 100644
--- a/arch/x86/kernel/apic/nmi.c
+++ b/arch/x86/kernel/apic/nmi.c
@@ -400,13 +400,6 @@ nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
 	int cpu = smp_processor_id();
 	int rc = 0;
 
-	/* check for other users first */
-	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
-			== NOTIFY_STOP) {
-		rc = 1;
-		touched = 1;
-	}
-
 	sum = get_timer_irqs(cpu);
 
 	if (__get_cpu_var(nmi_touch)) {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3339917..3b98dd3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -400,6 +400,11 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 		if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 2, SIGINT)
 								== NOTIFY_STOP)
 			return;
+
+	        if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
+	        			                == NOTIFY_STOP)
+	                return;
+
 #ifdef CONFIG_X86_LOCAL_APIC
 		/*
 		 * Ok, so this is none of the documented NMI sources,
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] [RFC] nmi_watchdog: new implementation using perf events
  2010-01-27 20:03 [PATCH 0/3] [RFC] new nmi_watchdog using perf events Don Zickus
  2010-01-27 20:03 ` [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c Don Zickus
@ 2010-01-27 20:03 ` Don Zickus
  2010-01-27 20:03 ` [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog Don Zickus
  2 siblings, 0 replies; 15+ messages in thread
From: Don Zickus @ 2010-01-27 20:03 UTC (permalink / raw)
  To: mingo, peterz, gorcunov; +Cc: aris, linux-kernel, Don Zickus

This is a new generic nmi_watchdog implementation using the perf events
infrastructure as suggested by Ingo.

The implementation is simple, just create an in-kernel perf event and register
an overflow handler to check for cpu lockups.  I created a generic implementation
that lives in kernel/ and the hardware specific part that for now lives in arch/x86.

I have done light testing to make sure the framework works correctly and it does.
Correctly detecting hardlocks is still a work in progress, though it should work,
I am having trouble proving it through testing.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/apic/hw_nmi.c |  114 ++++++++++++++++++++++++
 kernel/nmi_watchdog.c         |  196 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 310 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/apic/hw_nmi.c
 create mode 100644 kernel/nmi_watchdog.c

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
new file mode 100644
index 0000000..8c0e6a4
--- /dev/null
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -0,0 +1,114 @@
+/*
+ *  HW NMI watchdog support
+ *
+ *  started by Don Zickus, Copyright (C) 2010 Red Hat, Inc.
+ *
+ *  Arch specific calls to support NMI watchdog
+ *
+ *  Bits copied from original nmi.c file
+ *
+ */
+
+#include <asm/apic.h>
+#include <linux/smp.h>
+#include <linux/cpumask.h>
+#include <linux/sched.h>
+#include <linux/percpu.h>
+#include <linux/cpumask.h>
+#include <linux/kernel_stat.h>
+#include <asm/mce.h>
+
+#include <linux/nmi.h>
+#include <linux/module.h>
+
+/* For reliability, we're prepared to waste bits here. */
+static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
+
+static DEFINE_PER_CPU(unsigned, last_irq_sum);
+
+/*
+ * Take the local apic timer and PIT/HPET into account. We don't
+ * know which one is active, when we have highres/dyntick on
+ */
+static inline unsigned int get_timer_irqs(int cpu)
+{
+        return per_cpu(irq_stat, cpu).apic_timer_irqs +
+                per_cpu(irq_stat, cpu).irq0_irqs;
+}
+
+static inline int mce_in_progress(void)
+{
+#if defined(CONFIG_X86_MCE)
+        return atomic_read(&mce_entry) > 0;
+#endif
+        return 0;
+}
+
+int hw_nmi_is_cpu_stuck(struct pt_regs *regs)
+{
+	unsigned int sum;
+	int cpu = smp_processor_id();
+
+	/* FIXME: cheap hack for this check, probably should get its own
+	 * die_notifier handler
+	 */
+	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+		static DEFINE_SPINLOCK(lock);	/* Serialise the printks */
+
+		spin_lock(&lock);
+		printk(KERN_WARNING "NMI backtrace for cpu %d\n", cpu);
+		show_regs(regs);
+		dump_stack();
+		spin_unlock(&lock);
+		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
+	}
+
+	/* if we are doing an mce, just assume the cpu is not stuck */
+        /* Could check oops_in_progress here too, but it's safer not to */
+        if (mce_in_progress())
+                return 0;
+
+	/* We determine if the cpu is stuck by checking whether any
+	 * interrupts have happened since we last checked.  Of course
+	 * an nmi storm could create false positives, but the higher
+	 * level logic should account for that
+	 */
+	sum = get_timer_irqs(cpu);
+	if (__get_cpu_var(last_irq_sum) == sum) {
+		return 1;
+	} else {
+		__get_cpu_var(last_irq_sum) = sum;
+		return 0;
+	}
+}
+
+void arch_trigger_all_cpu_backtrace(void)
+{
+	int i;
+
+	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
+
+	printk(KERN_INFO "sending NMI to all CPUs:\n");
+	apic->send_IPI_all(NMI_VECTOR);
+
+	/* Wait for up to 10 seconds for all CPUs to do the backtrace */
+	for (i = 0; i < 10 * 1000; i++) {
+		if (cpumask_empty(to_cpumask(backtrace_mask)))
+			break;
+		mdelay(1);
+	}
+}
+
+/* STUB calls to mimic old nmi_watchdog behaviour */
+unsigned int nmi_watchdog = NMI_NONE;
+EXPORT_SYMBOL(nmi_watchdog);
+atomic_t nmi_active = ATOMIC_INIT(0);           /* oprofile uses this */
+EXPORT_SYMBOL(nmi_active);
+int nmi_watchdog_enabled;
+int unknown_nmi_panic;
+void cpu_nmi_set_wd_enabled(void) { return; }
+void acpi_nmi_enable(void) { return; }
+void acpi_nmi_disable(void) { return; }
+void stop_apic_nmi_watchdog(void *unused) { return; }
+void setup_apic_nmi_watchdog(void *unused) { return; }
+int __init check_nmi_watchdog(void) { return 0; }
diff --git a/kernel/nmi_watchdog.c b/kernel/nmi_watchdog.c
new file mode 100644
index 0000000..9e5697e
--- /dev/null
+++ b/kernel/nmi_watchdog.c
@@ -0,0 +1,196 @@
+/*
+ * Detect Hard Lockups using the NMI
+ *
+ * started by Don Zickus, Copyright (C) 2010 Red Hat, Inc.
+ *
+ * this code detects hard lockups: incidents in where on a CPU
+ * the kernel does not respond to anything except NMI.
+ *
+ * Note: Most of this code is borrowed heavily from softlockup.c,
+ * so thanks to Ingo for the initial implementation.
+ * Some chunks also taken from arch/x86/kernel/apic/nmi.c, thanks
+ * to those contributors as well.
+ */
+
+#include <linux/mm.h>
+#include <linux/cpu.h>
+#include <linux/nmi.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/freezer.h>
+#include <linux/lockdep.h>
+#include <linux/notifier.h>
+#include <linux/module.h>
+#include <linux/sysctl.h>
+
+#include <asm/irq_regs.h>
+#include <linux/perf_event.h>
+
+static DEFINE_PER_CPU(struct perf_event *, nmi_watchdog_ev);
+static DEFINE_PER_CPU(int, nmi_watchdog_touch);
+static DEFINE_PER_CPU(long, alert_counter);
+
+/* arbituary number that updates every ~5 seconds */
+int __read_mostly hardlockup_thresh = 10000000;
+
+void touch_nmi_watchdog(void)
+{
+	__raw_get_cpu_var(nmi_watchdog_touch) = 1;
+	touch_softlockup_watchdog();
+}
+EXPORT_SYMBOL(touch_nmi_watchdog);
+
+void touch_all_nmi_watchdog(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		per_cpu(nmi_watchdog_touch, cpu) = 1;
+	touch_softlockup_watchdog();
+}
+
+#ifdef CONFIG_SYSCTL
+/*
+ * proc handler for /proc/sys/kernel/nmi_watchdog
+ */
+int proc_nmi_enabled(struct ctl_table *table, int write,
+		     void __user *buffer, size_t *length, loff_t *ppos)
+{
+	int cpu;
+
+	if (per_cpu(nmi_watchdog_ev, smp_processor_id()) == NULL)
+		nmi_watchdog_enabled = 0;
+	else
+		nmi_watchdog_enabled = 1;
+
+	touch_all_nmi_watchdog();
+	proc_dointvec(table, write, buffer, length, ppos);
+	if (nmi_watchdog_enabled)
+		for_each_online_cpu(cpu)
+			perf_event_enable(per_cpu(nmi_watchdog_ev, cpu));
+	else
+		for_each_online_cpu(cpu)
+			perf_event_disable(per_cpu(nmi_watchdog_ev, cpu));
+	return 0;
+}
+
+#endif /* CONFIG_SYSCTL */
+
+struct perf_event_attr wd_attr = {
+	.type = PERF_TYPE_HARDWARE,
+	.config = PERF_COUNT_HW_CPU_CYCLES,
+	.size = sizeof(struct perf_event_attr),
+	.pinned = 1,
+	.disabled = 1,
+};
+
+static int panic_on_timeout;
+
+void wd_overflow(struct perf_event *event, int nmi,
+		 struct perf_sample_data *data,
+		 struct pt_regs *regs)
+{
+	int cpu = smp_processor_id();
+	int touched = 0;
+
+	if (__get_cpu_var(nmi_watchdog_touch)) {
+		per_cpu(nmi_watchdog_touch, cpu) = 0;
+		touched = 1;
+	}
+
+	/* check to see if the cpu is doing anything */
+	if (!touched && hw_nmi_is_cpu_stuck(regs)) {
+		/*
+		 * Ayiee, looks like this CPU is stuck ...
+		 * wait a few IRQs (5 seconds) before doing the oops ...
+		 */
+		per_cpu(alert_counter,cpu) += 1;
+		printk(KERN_EMERG "DONNMI: %d - ac %ld\n", cpu, per_cpu(alert_counter,cpu));
+		if (per_cpu(alert_counter,cpu) == (HZ)) {
+			/*
+			 * die_nmi will return ONLY if NOTIFY_STOP happens..
+			 */
+			die_nmi("BUG: DONNMI Watchdog detected LOCKUP",
+				regs, panic_on_timeout);
+		}
+	} else {
+		per_cpu(alert_counter,cpu) = 0;
+	}
+
+	return;
+}
+
+/*
+ * Create/destroy watchdog threads as CPUs come and go:
+ */
+static int __cpuinit
+cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+	int hotcpu = (unsigned long)hcpu;
+	struct perf_event *event;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		per_cpu(nmi_watchdog_touch, hotcpu) = 0;
+		break;
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		/* below line should be moved up but compiler doesn't like that, probably the union */
+		wd_attr.sample_period = hardlockup_thresh;
+		/* originally wanted the below chunk to be in CPU_UP_PREPARE, but caps is unpriv for non-CPU0 */
+		event = perf_event_create_kernel_counter(&wd_attr, hotcpu, -1, wd_overflow);
+		if (IS_ERR(event)) {
+			printk(KERN_ERR "nmi watchdog failed to create perf event on %i: %p\n", hotcpu, event);
+			return NOTIFY_BAD;
+		}
+		per_cpu(nmi_watchdog_ev, hotcpu) = event;
+		perf_event_enable(per_cpu(nmi_watchdog_ev, hotcpu));
+		break;
+#ifdef CONFIG_HOTPLUG_CPU
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+		perf_event_disable(per_cpu(nmi_watchdog_ev, hotcpu));
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		event = per_cpu(nmi_watchdog_ev, hotcpu);
+		per_cpu(nmi_watchdog_ev, hotcpu) = NULL;
+		perf_event_release_kernel(event);
+		break;
+#endif /* CONFIG_HOTPLUG_CPU */
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata cpu_nfb = {
+	.notifier_call = cpu_callback
+};
+
+static int __initdata nonmi_watchdog;
+
+static int __init nonmi_watchdog_setup(char *str)
+{
+	nonmi_watchdog = 1;
+	return 1;
+}
+__setup("nonmi_watchdog", nonmi_watchdog_setup);
+
+static int __init spawn_nmi_watchdog_task(void)
+{
+	void *cpu = (void *)(long)smp_processor_id();
+	int err;
+
+	if (nonmi_watchdog)
+		return 0;
+
+	err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
+	if (err == NOTIFY_BAD) {
+		BUG();
+		return 1;
+	}
+	cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
+	register_cpu_notifier(&cpu_nfb);
+
+	return 0;
+}
+early_initcall(spawn_nmi_watchdog_task);
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog
  2010-01-27 20:03 [PATCH 0/3] [RFC] new nmi_watchdog using perf events Don Zickus
  2010-01-27 20:03 ` [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c Don Zickus
  2010-01-27 20:03 ` [PATCH 2/3] [RFC] nmi_watchdog: new implementation using perf events Don Zickus
@ 2010-01-27 20:03 ` Don Zickus
  2010-01-28 14:54   ` Peter Zijlstra
  2 siblings, 1 reply; 15+ messages in thread
From: Don Zickus @ 2010-01-27 20:03 UTC (permalink / raw)
  To: mingo, peterz, gorcunov; +Cc: aris, linux-kernel, Don Zickus

These are the bits that enable the new nmi_watchdog and safely isolate the
old nmi_watchdog.  Only one or the other can run, not both at the same
time.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/apic/Makefile |    7 ++++++-
 arch/x86/kernel/traps.c       |    2 ++
 include/linux/nmi.h           |    4 ++++
 kernel/Makefile               |    1 +
 lib/Kconfig.debug             |   13 +++++++++++++
 5 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apic/Makefile b/arch/x86/kernel/apic/Makefile
index 565c1bf..1a4512e 100644
--- a/arch/x86/kernel/apic/Makefile
+++ b/arch/x86/kernel/apic/Makefile
@@ -2,7 +2,12 @@
 # Makefile for local APIC drivers and for the IO-APIC code
 #
 
-obj-$(CONFIG_X86_LOCAL_APIC)	+= apic.o apic_noop.o probe_$(BITS).o ipi.o nmi.o
+obj-$(CONFIG_X86_LOCAL_APIC)	+= apic.o apic_noop.o probe_$(BITS).o ipi.o
+ifneq ($(CONFIG_NMI_WATCHDOG),y)
+obj-$(CONFIG_X86_LOCAL_APIC)	+= nmi.o
+endif
+obj-$(CONFIG_NMI_WATCHDOG)	+= hw_nmi.o
+
 obj-$(CONFIG_X86_IO_APIC)	+= io_apic.o
 obj-$(CONFIG_SMP)		+= ipi.o
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3b98dd3..eee1dc8 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -406,6 +406,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	                return;
 
 #ifdef CONFIG_X86_LOCAL_APIC
+#ifndef CONFIG_NMI_WATCHDOG
 		/*
 		 * Ok, so this is none of the documented NMI sources,
 		 * so it must be the NMI watchdog.
@@ -413,6 +414,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 		if (nmi_watchdog_tick(regs, reason))
 			return;
 		if (!do_nmi_callback(regs, cpu))
+#endif /* !CONFIG_NMI_WATCHDOG */
 			unknown_nmi_error(reason, regs);
 #else
 		unknown_nmi_error(reason, regs);
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index b752e80..a42ff0b 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -47,4 +47,8 @@ static inline bool trigger_all_cpu_backtrace(void)
 }
 #endif
 
+#ifdef CONFIG_NMI_WATCHDOG
+int hw_nmi_is_cpu_stuck(struct pt_regs *);
+#endif
+
 #endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 864ff75..8a5abe5 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -76,6 +76,7 @@ obj-$(CONFIG_AUDIT_TREE) += audit_tree.o
 obj-$(CONFIG_KPROBES) += kprobes.o
 obj-$(CONFIG_KGDB) += kgdb.o
 obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o
+obj-$(CONFIG_NMI_WATCHDOG) += nmi_watchdog.o
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
 obj-$(CONFIG_SECCOMP) += seccomp.o
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 25c3ed5..04a43a2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -170,6 +170,19 @@ config DETECT_SOFTLOCKUP
 	   can be detected via the NMI-watchdog, on platforms that
 	   support it.)
 
+config NMI_WATCHDOG
+	bool "Detect Hard Lockups with an NMI Watchdog"
+	depends on DEBUG_KERNEL && PERF_EVENTS
+	default y
+	help
+	  Say Y here to enable the kernel to use the NMI as a watchdog
+	  to detect hard lockups.  This is useful when a cpu hangs for no
+	  reason but can still respond to NMIs.  A backtrace is displayed
+	  for reviewing and reporting.
+
+	  The overhead should be minimal, just an extra NMI every few 
+	  seconds.
+
 config BOOTPARAM_SOFTLOCKUP_PANIC
 	bool "Panic (Reboot) On Soft Lockups"
 	depends on DETECT_SOFTLOCKUP
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog
  2010-01-27 20:03 ` [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog Don Zickus
@ 2010-01-28 14:54   ` Peter Zijlstra
  2010-01-28 15:44     ` Don Zickus
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-01-28 14:54 UTC (permalink / raw)
  To: Don Zickus; +Cc: mingo, gorcunov, aris, linux-kernel

On Wed, 2010-01-27 at 15:03 -0500, Don Zickus wrote:
> These are the bits that enable the new nmi_watchdog and safely isolate the
> old nmi_watchdog.  Only one or the other can run, not both at the same
> time.

perf disables the lapic watchdog when it wants the pmu, so there
shouldn't be a problem having both built in.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c
  2010-01-27 20:03 ` [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c Don Zickus
@ 2010-01-28 15:10   ` Cyrill Gorcunov
  2010-01-28 15:46     ` Don Zickus
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2010-01-28 15:10 UTC (permalink / raw)
  To: Don Zickus; +Cc: mingo, peterz, aris, linux-kernel

On Wed, Jan 27, 2010 at 03:03:40PM -0500, Don Zickus wrote:
...
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 3339917..3b98dd3 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -400,6 +400,11 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
>  		if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 2, SIGINT)
>  								== NOTIFY_STOP)
>  			return;
> +
> +	        if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
> +	        			                == NOTIFY_STOP)
> +	                return;
> +
>  #ifdef CONFIG_X86_LOCAL_APIC
>  		/*
>  		 * Ok, so this is none of the documented NMI sources,
> --

Hi Don, I suppose this notify_die should be in CONFIG_X86_LOCAL_APIC
section?
 
	-- Cyrill

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog
  2010-01-28 14:54   ` Peter Zijlstra
@ 2010-01-28 15:44     ` Don Zickus
  2010-01-29  8:12       ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Don Zickus @ 2010-01-28 15:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, gorcunov, aris, linux-kernel

On Thu, Jan 28, 2010 at 03:54:54PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-01-27 at 15:03 -0500, Don Zickus wrote:
> > These are the bits that enable the new nmi_watchdog and safely isolate the
> > old nmi_watchdog.  Only one or the other can run, not both at the same
> > time.
> 
> perf disables the lapic watchdog when it wants the pmu, so there
> shouldn't be a problem having both built in.

Yes it does disable but does not prevent nmi_watchdog_tick from running
nor the /proc interface from being loaded.  So perhaps my description
isn't very good.  The idea with the new watchdog was to re-use some of the
bits of the old one, but having them both compiled in seemed to stomp on
each other.  That is what I was trying to prevent.

I can certainly change the behaviour, just makes the code a little more
messy I think.

Cheers,
Don

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c
  2010-01-28 15:10   ` Cyrill Gorcunov
@ 2010-01-28 15:46     ` Don Zickus
  2010-02-02 17:59       ` Cyrill Gorcunov
  0 siblings, 1 reply; 15+ messages in thread
From: Don Zickus @ 2010-01-28 15:46 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: mingo, peterz, aris, linux-kernel

On Thu, Jan 28, 2010 at 06:10:25PM +0300, Cyrill Gorcunov wrote:
> On Wed, Jan 27, 2010 at 03:03:40PM -0500, Don Zickus wrote:
> ...
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 3339917..3b98dd3 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -400,6 +400,11 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> >  		if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 2, SIGINT)
> >  								== NOTIFY_STOP)
> >  			return;
> > +
> > +	        if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
> > +	        			                == NOTIFY_STOP)
> > +	                return;
> > +
> >  #ifdef CONFIG_X86_LOCAL_APIC
> >  		/*
> >  		 * Ok, so this is none of the documented NMI sources,
> > --
> 
> Hi Don, I suppose this notify_die should be in CONFIG_X86_LOCAL_APIC
> section?

To maintain old behaviour I suppose, yes.  Personally I don't think
notify_die has anything to do with CONFIG_X86_LOCAL_APIC so I put it in
above the #define.

Cheers,
Don

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog
  2010-01-28 15:44     ` Don Zickus
@ 2010-01-29  8:12       ` Ingo Molnar
  2010-02-01 18:52         ` Don Zickus
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2010-01-29  8:12 UTC (permalink / raw)
  To: Don Zickus; +Cc: Peter Zijlstra, gorcunov, aris, linux-kernel


* Don Zickus <dzickus@redhat.com> wrote:

> On Thu, Jan 28, 2010 at 03:54:54PM +0100, Peter Zijlstra wrote:
> > On Wed, 2010-01-27 at 15:03 -0500, Don Zickus wrote:
> > > These are the bits that enable the new nmi_watchdog and safely isolate the
> > > old nmi_watchdog.  Only one or the other can run, not both at the same
> > > time.
> > 
> > perf disables the lapic watchdog when it wants the pmu, so there
> > shouldn't be a problem having both built in.
> 
> Yes it does disable but does not prevent nmi_watchdog_tick from running nor 
> the /proc interface from being loaded.  So perhaps my description isn't very 
> good.  The idea with the new watchdog was to re-use some of the bits of the 
> old one, but having them both compiled in seemed to stomp on each other.  
> That is what I was trying to prevent.
> 
> I can certainly change the behaviour, just makes the code a little more 
> messy I think.

I think that's a good idea - and i think we want to be bold and just have the 
new code run seemlessly. (and fix bugs, if any.)

In fact we want to be even bolder: how about enabling the NMI watchdog by 
default again?

The problem with the old one was its fragility - but now if we have a PMU 
driver active and perf events enabled we might as well use your brand new NMI 
watchdog code as a testing facility as well: if there's _any_ problem with 
NMIs then regular 'perf' use would trigger it too - except that not all people 
run perf while an always-enabled NMI watchdog would.

And it would detect hard hangs too.

What do you think?

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog
  2010-01-29  8:12       ` Ingo Molnar
@ 2010-02-01 18:52         ` Don Zickus
  2010-02-02  7:29           ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Don Zickus @ 2010-02-01 18:52 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, gorcunov, aris, linux-kernel

On Fri, Jan 29, 2010 at 09:12:27AM +0100, Ingo Molnar wrote:
> 
> * Don Zickus <dzickus@redhat.com> wrote:
> 
> > On Thu, Jan 28, 2010 at 03:54:54PM +0100, Peter Zijlstra wrote:
> > > On Wed, 2010-01-27 at 15:03 -0500, Don Zickus wrote:
> > > > These are the bits that enable the new nmi_watchdog and safely isolate the
> > > > old nmi_watchdog.  Only one or the other can run, not both at the same
> > > > time.
> > > 
> > > perf disables the lapic watchdog when it wants the pmu, so there
> > > shouldn't be a problem having both built in.
> > 
> > Yes it does disable but does not prevent nmi_watchdog_tick from running nor 
> > the /proc interface from being loaded.  So perhaps my description isn't very 
> > good.  The idea with the new watchdog was to re-use some of the bits of the 
> > old one, but having them both compiled in seemed to stomp on each other.  
> > That is what I was trying to prevent.
> > 
> > I can certainly change the behaviour, just makes the code a little more 
> > messy I think.
> 
> I think that's a good idea - and i think we want to be bold and just have the 
> new code run seemlessly. (and fix bugs, if any.)

Ok.  I guess I am confused what you are suggesting here, to do as Peter
suggested and run both at the same time?

> 
> In fact we want to be even bolder: how about enabling the NMI watchdog by 
> default again?

I won't complain. :-)

> 
> The problem with the old one was its fragility - but now if we have a PMU 
> driver active and perf events enabled we might as well use your brand new NMI 
> watchdog code as a testing facility as well: if there's _any_ problem with 
> NMIs then regular 'perf' use would trigger it too - except that not all people 
> run perf while an always-enabled NMI watchdog would.

Agreed.

> 
> And it would detect hard hangs too.
> 
> What do you think?

I will need to give you an updated patch that properly sets the frequency
of the NMI and I probably should still implement a code path that uses the
software perf counters in the cases where the hardware perf counters are
not available.

It seems like you are ok with my approach.  If that is so, I can test on
more machines to iron out some more bugs.  Or did you want to take my
patches as is and have me throw fixes on top?

Cheers,
Don

> 
> 	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog
  2010-02-01 18:52         ` Don Zickus
@ 2010-02-02  7:29           ` Ingo Molnar
  2010-02-02 16:42             ` Don Zickus
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2010-02-02  7:29 UTC (permalink / raw)
  To: Don Zickus; +Cc: Peter Zijlstra, gorcunov, aris, linux-kernel


* Don Zickus <dzickus@redhat.com> wrote:

> On Fri, Jan 29, 2010 at 09:12:27AM +0100, Ingo Molnar wrote:
> > 
> > * Don Zickus <dzickus@redhat.com> wrote:
> > 
> > > On Thu, Jan 28, 2010 at 03:54:54PM +0100, Peter Zijlstra wrote:
> > > > On Wed, 2010-01-27 at 15:03 -0500, Don Zickus wrote:
> > > > >
> > > > > These are the bits that enable the new nmi_watchdog and safely 
> > > > > isolate the old nmi_watchdog.  Only one or the other can run, not 
> > > > > both at the same time.
> > > > 
> > > > perf disables the lapic watchdog when it wants the pmu, so there 
> > > > shouldn't be a problem having both built in.
> > > 
> > > Yes it does disable but does not prevent nmi_watchdog_tick from running 
> > > nor the /proc interface from being loaded.  So perhaps my description 
> > > isn't very good.  The idea with the new watchdog was to re-use some of 
> > > the bits of the old one, but having them both compiled in seemed to 
> > > stomp on each other.  That is what I was trying to prevent.
> > > 
> > > I can certainly change the behaviour, just makes the code a little more 
> > > messy I think.
> > 
> > I think that's a good idea - and i think we want to be bold and just have 
> > the new code run seemlessly. (and fix bugs, if any.)
> 
> Ok.  I guess I am confused what you are suggesting here, to do as Peter 
> suggested and run both at the same time?

I dont think we want to run old and new code at once, the old NMI watchdog 
code is really a hardcoded minimal PMU driver generating a cycles based NMI 
tick once per second.

> > What do you think?
> 
> I will need to give you an updated patch that properly sets the frequency 
> of the NMI and I probably should still implement a code path that uses the 
> software perf counters in the cases where the hardware perf counters are 
> not available.
> 
> It seems like you are ok with my approach.  If that is so, I can test on 
> more machines to iron out some more bugs.  Or did you want to take my 
> patches as is and have me throw fixes on top?

Well, all known bugs/showstoppers should be fixed - but otherwise if you 
think it works fine we can certainly apply it and then iterate it from that 
point on to increase coverage and add features.

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog
  2010-02-02  7:29           ` Ingo Molnar
@ 2010-02-02 16:42             ` Don Zickus
  0 siblings, 0 replies; 15+ messages in thread
From: Don Zickus @ 2010-02-02 16:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, gorcunov, aris, linux-kernel

On Tue, Feb 02, 2010 at 08:29:02AM +0100, Ingo Molnar wrote:
> > 
> > Ok.  I guess I am confused what you are suggesting here, to do as Peter 
> > suggested and run both at the same time?
> 
> I dont think we want to run old and new code at once, the old NMI watchdog 
> code is really a hardcoded minimal PMU driver generating a cycles based NMI 
> tick once per second.

Ok.  Agreed.

> 
> > > What do you think?
> > 
> > I will need to give you an updated patch that properly sets the frequency 
> > of the NMI and I probably should still implement a code path that uses the 
> > software perf counters in the cases where the hardware perf counters are 
> > not available.
> > 
> > It seems like you are ok with my approach.  If that is so, I can test on 
> > more machines to iron out some more bugs.  Or did you want to take my 
> > patches as is and have me throw fixes on top?
> 
> Well, all known bugs/showstoppers should be fixed - but otherwise if you 
> think it works fine we can certainly apply it and then iterate it from that 
> point on to increase coverage and add features.

I have a small cleanup to my second patch that makes it functional.  I'll
run some tests on some more machines and post version two tomorrow.  We
can probably build from there.

Cheers,
Don

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c
  2010-01-28 15:46     ` Don Zickus
@ 2010-02-02 17:59       ` Cyrill Gorcunov
  2010-02-02 18:27         ` Don Zickus
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2010-02-02 17:59 UTC (permalink / raw)
  To: Don Zickus; +Cc: mingo, peterz, aris, linux-kernel

On Thu, Jan 28, 2010 at 10:46:46AM -0500, Don Zickus wrote:
> On Thu, Jan 28, 2010 at 06:10:25PM +0300, Cyrill Gorcunov wrote:
> > On Wed, Jan 27, 2010 at 03:03:40PM -0500, Don Zickus wrote:
> > ...
> > > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > > index 3339917..3b98dd3 100644
> > > --- a/arch/x86/kernel/traps.c
> > > +++ b/arch/x86/kernel/traps.c
> > > @@ -400,6 +400,11 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> > >  		if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 2, SIGINT)
> > >  								== NOTIFY_STOP)
> > >  			return;
> > > +
> > > +	        if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
> > > +	        			                == NOTIFY_STOP)
> > > +	                return;
> > > +
> > >  #ifdef CONFIG_X86_LOCAL_APIC
> > >  		/*
> > >  		 * Ok, so this is none of the documented NMI sources,
> > > --
> > 
> > Hi Don, I suppose this notify_die should be in CONFIG_X86_LOCAL_APIC
> > section?
> 
> To maintain old behaviour I suppose, yes.  Personally I don't think
> notify_die has anything to do with CONFIG_X86_LOCAL_APIC so I put it in
> above the #define.
> 

I think it is. It becomes that if some (possible buggy in future) code
notify default_do_nmi via NOTIFY_STOP we may loose unknown_nmi_error
for non-apic configs. And I reckon that even DIE_NMI_IPI is a bit "weird"
by not being under apic here, but this one should stay there in a
sake of kgdb I guess.

All-in-one: I would better not change old behaviour.

Though, it is possbile that I just miss something obvious :)

> Cheers,
> Don
> 
	-- Cyrill

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c
  2010-02-02 17:59       ` Cyrill Gorcunov
@ 2010-02-02 18:27         ` Don Zickus
  2010-02-02 18:44           ` Cyrill Gorcunov
  0 siblings, 1 reply; 15+ messages in thread
From: Don Zickus @ 2010-02-02 18:27 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: mingo, peterz, aris, linux-kernel

> > > > +	        if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
> > > > +	        			                == NOTIFY_STOP)
> > > > +	                return;
> > > > +
> > > >  #ifdef CONFIG_X86_LOCAL_APIC
> > > >  		/*
> > > >  		 * Ok, so this is none of the documented NMI sources,
> > > > --
> > > 
> > > Hi Don, I suppose this notify_die should be in CONFIG_X86_LOCAL_APIC
> > > section?
> > 
> > To maintain old behaviour I suppose, yes.  Personally I don't think
> > notify_die has anything to do with CONFIG_X86_LOCAL_APIC so I put it in
> > above the #define.
> > 
> 
> I think it is. It becomes that if some (possible buggy in future) code
> notify default_do_nmi via NOTIFY_STOP we may loose unknown_nmi_error

How is that different if the code is under the #define?

> for non-apic configs. And I reckon that even DIE_NMI_IPI is a bit "weird"
> by not being under apic here, but this one should stay there in a
> sake of kgdb I guess.

So you are saying the only way to get NMIs on x86 is through the local
apic?  That seems odd.

I really don't care either way, I was just trying to cleanup the code to
make it easier to understand.  Putting it under the #define didn't seem to
make sense (though that doesn't mean there is a valid reason).

Cheers,
Don

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c
  2010-02-02 18:27         ` Don Zickus
@ 2010-02-02 18:44           ` Cyrill Gorcunov
  0 siblings, 0 replies; 15+ messages in thread
From: Cyrill Gorcunov @ 2010-02-02 18:44 UTC (permalink / raw)
  To: Don Zickus; +Cc: mingo, peterz, aris, linux-kernel

On Tue, Feb 02, 2010 at 01:27:46PM -0500, Don Zickus wrote:
> > > > > +	        if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
> > > > > +	        			                == NOTIFY_STOP)
> > > > > +	                return;
> > > > > +
> > > > >  #ifdef CONFIG_X86_LOCAL_APIC
> > > > >  		/*
> > > > >  		 * Ok, so this is none of the documented NMI sources,
> > > > > --
> > > > 
> > > > Hi Don, I suppose this notify_die should be in CONFIG_X86_LOCAL_APIC
> > > > section?
> > > 
> > > To maintain old behaviour I suppose, yes.  Personally I don't think
> > > notify_die has anything to do with CONFIG_X86_LOCAL_APIC so I put it in
> > > above the #define.
> > > 
> > 
> > I think it is. It becomes that if some (possible buggy in future) code
> > notify default_do_nmi via NOTIFY_STOP we may loose unknown_nmi_error
> 
> How is that different if the code is under the #define?
> 

It matters if code is compiled without CONFIG_X86_LOCAL_APIC.

> > for non-apic configs. And I reckon that even DIE_NMI_IPI is a bit "weird"
> > by not being under apic here, but this one should stay there in a
> > sake of kgdb I guess.
> 
> So you are saying the only way to get NMIs on x86 is through the local
> apic?  That seems odd.

No, I didn't say that. I said that semantic of DIE_NMI_IPI is somehow
weird for me. There is no IPI if no apic present (at least in official
way). But in a sake of kgdb which checks for this notification we should
save it here.

> 
> I really don't care either way, I was just trying to cleanup the code to
> make it easier to understand.  Putting it under the #define didn't seem to
> make sense (though that doesn't mean there is a valid reason).
> 

Don, please don't get me wrong. At moment I don't see problems with putting
DIE_NMI in a place you had put it at. But something worrying me, can't explain
what exactly, perhaps that is how paranoia happens :)

>
> Cheers,
> Don
> 
	-- Cyrill

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-02-02 18:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-27 20:03 [PATCH 0/3] [RFC] new nmi_watchdog using perf events Don Zickus
2010-01-27 20:03 ` [PATCH 1/3] [RFC][x86] move notify_die from nmi.c to traps.c Don Zickus
2010-01-28 15:10   ` Cyrill Gorcunov
2010-01-28 15:46     ` Don Zickus
2010-02-02 17:59       ` Cyrill Gorcunov
2010-02-02 18:27         ` Don Zickus
2010-02-02 18:44           ` Cyrill Gorcunov
2010-01-27 20:03 ` [PATCH 2/3] [RFC] nmi_watchdog: new implementation using perf events Don Zickus
2010-01-27 20:03 ` [PATCH 3/3] [RFC] nmi_watchdog: config option to enable new nmi_watchdog Don Zickus
2010-01-28 14:54   ` Peter Zijlstra
2010-01-28 15:44     ` Don Zickus
2010-01-29  8:12       ` Ingo Molnar
2010-02-01 18:52         ` Don Zickus
2010-02-02  7:29           ` Ingo Molnar
2010-02-02 16:42             ` Don Zickus

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.