* [RFC][PATCH 0/6] x86, nmi: new NMI handling routines
@ 2011-08-19 20:37 Don Zickus
  2011-08-19 20:37 ` [RFC][PATCH 1/6] x86, nmi: split out nmi from traps.c Don Zickus
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-19 20:37 UTC (permalink / raw)
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, jason.wessel, Don Zickus

I had the pleasure of hosting Robert Richter this week at Red Hat.  One of
the issues he wanted to discuss with me was having the NMI handling
routines execute all the NMI handlers for each NMI, mainly for his AMD IBS
work.  But he also brought up another good point: because of the way NMIs
work, it is possible to lose them if multiple NMIs happen at the same time.

As a result, we sat around and discussed how we could go about executing
all the NMI handlers for each NMI to ensure that we would not lose any
events.

We decided the best way to do this would be to break the NMI handlers away
from the notifier routines and create our own infrastructure.  This allows
us to execute all the handlers without hacking up the notifier code, and to
easily track the number of events processed at a higher level to deal with
the new problem of extra NMIs.

I spent some time hacking and came up with this series.  I tested it on my
Core 2 Quad machine, trying to enable all the NMI handlers I could, mainly
perf and kgdb (and oprofile too when perf was disabled).  Everything seems
to work correctly.  If people are ok with this approach, I'll try to test
it on more machines.

More details about the patch are in the individual changelogs.
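
For a feel of the end result, here is a rough sketch of a driver-side user
of the new API (the mydev_* helpers are made up for illustration; the
types, flags and return values come from patch 2):

static int mydev_nmi_handler(unsigned int type, struct pt_regs *regs)
{
	if (!mydev_check_and_ack_nmi())	/* hypothetical device check */
		return NMI_DONE;	/* not ours, keep walking the list */

	return NMI_HANDLED;		/* we consumed one event */
}

	/* register on one of the lists: NMI_LOCAL, NMI_UNKNOWN, NMI_EXTERNAL */
	register_nmi_handler(NMI_LOCAL, mydev_nmi_handler, 0, "mydev");
	...
	unregister_nmi_handler(NMI_LOCAL, "mydev");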

Don Zickus (6):
  x86, nmi: split out nmi from traps.c
  x86, nmi: create new NMI handler routines
  x86, nmi: wire up NMI handlers to new routines
  x86, nmi: add in logic to handle multiple events and unknown NMIs
  x86, nmi: track NMI usage stats
  x86, nmi: print out NMI stats in /proc/interrupts

 arch/x86/include/asm/nmi.h              |   39 ++--
 arch/x86/include/asm/reboot.h           |    2 +-
 arch/x86/kernel/Makefile                |    2 +-
 arch/x86/kernel/apic/hw_nmi.c           |   27 +--
 arch/x86/kernel/apic/x2apic_uv_x.c      |   20 +--
 arch/x86/kernel/cpu/mcheck/mce-inject.c |   20 +-
 arch/x86/kernel/cpu/mcheck/mce.c        |    3 -
 arch/x86/kernel/cpu/perf_event.c        |   60 +----
 arch/x86/kernel/crash.c                 |    5 +-
 arch/x86/kernel/irq.c                   |    2 +
 arch/x86/kernel/kgdb.c                  |   60 ++++--
 arch/x86/kernel/nmi.c                   |  412 +++++++++++++++++++++++++++++++
 arch/x86/kernel/reboot.c                |   23 +--
 arch/x86/kernel/traps.c                 |  155 ------------
 arch/x86/oprofile/nmi_int.c             |   40 +--
 arch/x86/oprofile/nmi_timer_int.c       |   28 +--
 drivers/acpi/apei/ghes.c                |   22 +-
 drivers/char/ipmi/ipmi_watchdog.c       |   32 +--
 drivers/watchdog/hpwdt.c                |   23 +--
 19 files changed, 552 insertions(+), 423 deletions(-)
 create mode 100644 arch/x86/kernel/nmi.c

-- 
1.7.6



* [RFC][PATCH 1/6] x86, nmi: split out nmi from traps.c
  2011-08-19 20:37 [RFC][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
@ 2011-08-19 20:37 ` Don Zickus
  2011-08-19 20:37 ` [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines Don Zickus
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-19 20:37 UTC (permalink / raw)
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, jason.wessel, Don Zickus

The NMI code is changing a lot and gaining more functionality.  Split it
out from the traps.c file so it doesn't continue to clutter that file.

This makes it easier to find and extend all the future NMI-related work.

No real functional changes here.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/Makefile |    2 +-
 arch/x86/kernel/nmi.c    |  178 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c  |  155 ----------------------------------------
 3 files changed, 179 insertions(+), 156 deletions(-)
 create mode 100644 arch/x86/kernel/nmi.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 82f2912..8baca3c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -19,7 +19,7 @@ endif
 
 obj-y			:= process_$(BITS).o signal.o entry_$(BITS).o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
-obj-y			+= time.o ioport.o ldt.o dumpstack.o
+obj-y			+= time.o ioport.o ldt.o dumpstack.o nmi.o
 obj-y			+= setup.o x86_init.o i8259.o irqinit.o jump_label.o
 obj-$(CONFIG_IRQ_WORK)  += irq_work.o
 obj-y			+= probe_roms.o
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
new file mode 100644
index 0000000..68d758a
--- /dev/null
+++ b/arch/x86/kernel/nmi.c
@@ -0,0 +1,178 @@
+/*
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ *  Copyright (C) 2000, 2001, 2002 Andi Kleen, SuSE Labs
+ *
+ *  Pentium III FXSR, SSE support
+ *	Gareth Hughes <gareth@valinux.com>, May 2000
+ */
+
+/*
+ * Handle hardware traps and faults.
+ */
+#include <linux/spinlock.h>
+#include <linux/kprobes.h>
+#include <linux/kdebug.h>
+#include <linux/nmi.h>
+
+#if defined(CONFIG_EDAC)
+#include <linux/edac.h>
+#endif
+
+#include <linux/atomic.h>
+#include <asm/traps.h>
+#include <asm/mach_traps.h>
+
+static int ignore_nmis;
+
+int unknown_nmi_panic;
+/*
+ * Prevent NMI reason port (0x61) being accessed simultaneously, can
+ * only be used in NMI handler.
+ */
+static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
+
+static int __init setup_unknown_nmi_panic(char *str)
+{
+	unknown_nmi_panic = 1;
+	return 1;
+}
+__setup("unknown_nmi_panic", setup_unknown_nmi_panic);
+
+static notrace __kprobes void
+pci_serr_error(unsigned char reason, struct pt_regs *regs)
+{
+	pr_emerg("NMI: PCI system error (SERR) for reason %02x on CPU %d.\n",
+		 reason, smp_processor_id());
+
+	/*
+	 * On some machines, PCI SERR line is used to report memory
+	 * errors. EDAC makes use of it.
+	 */
+#if defined(CONFIG_EDAC)
+	if (edac_handler_set()) {
+		edac_atomic_assert_error();
+		return;
+	}
+#endif
+
+	if (panic_on_unrecovered_nmi)
+		panic("NMI: Not continuing");
+
+	pr_emerg("Dazed and confused, but trying to continue\n");
+
+	/* Clear and disable the PCI SERR error line. */
+	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_SERR;
+	outb(reason, NMI_REASON_PORT);
+}
+
+static notrace __kprobes void
+io_check_error(unsigned char reason, struct pt_regs *regs)
+{
+	unsigned long i;
+
+	pr_emerg(
+	"NMI: IOCK error (debug interrupt?) for reason %02x on CPU %d.\n",
+		 reason, smp_processor_id());
+	show_registers(regs);
+
+	if (panic_on_io_nmi)
+		panic("NMI IOCK error: Not continuing");
+
+	/* Re-enable the IOCK line, wait for a few seconds */
+	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
+	outb(reason, NMI_REASON_PORT);
+
+	i = 20000;
+	while (--i) {
+		touch_nmi_watchdog();
+		udelay(100);
+	}
+
+	reason &= ~NMI_REASON_CLEAR_IOCHK;
+	outb(reason, NMI_REASON_PORT);
+}
+
+static notrace __kprobes void
+unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
+{
+	if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
+			NOTIFY_STOP)
+		return;
+#ifdef CONFIG_MCA
+	/*
+	 * Might actually be able to figure out what the guilty party
+	 * is:
+	 */
+	if (MCA_bus) {
+		mca_handle_nmi();
+		return;
+	}
+#endif
+	pr_emerg("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
+		 reason, smp_processor_id());
+
+	pr_emerg("Do you have a strange power saving mode enabled?\n");
+	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
+		panic("NMI: Not continuing");
+
+	pr_emerg("Dazed and confused, but trying to continue\n");
+}
+
+static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
+{
+	unsigned char reason = 0;
+
+	/*
+	 * CPU-specific NMI must be processed before non-CPU-specific
+	 * NMI, otherwise we may lose it, because the CPU-specific
+	 * NMI can not be detected/processed on other CPUs.
+	 */
+	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
+		return;
+
+	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
+	raw_spin_lock(&nmi_reason_lock);
+	reason = get_nmi_reason();
+
+	if (reason & NMI_REASON_MASK) {
+		if (reason & NMI_REASON_SERR)
+			pci_serr_error(reason, regs);
+		else if (reason & NMI_REASON_IOCHK)
+			io_check_error(reason, regs);
+#ifdef CONFIG_X86_32
+		/*
+		 * Reassert NMI in case it became active
+		 * meanwhile as it's edge-triggered:
+		 */
+		reassert_nmi();
+#endif
+		raw_spin_unlock(&nmi_reason_lock);
+		return;
+	}
+	raw_spin_unlock(&nmi_reason_lock);
+
+	unknown_nmi_error(reason, regs);
+}
+
+dotraplinkage notrace __kprobes void
+do_nmi(struct pt_regs *regs, long error_code)
+{
+	nmi_enter();
+
+	inc_irq_stat(__nmi_count);
+
+	if (!ignore_nmis)
+		default_do_nmi(regs);
+
+	nmi_exit();
+}
+
+void stop_nmi(void)
+{
+	ignore_nmis++;
+}
+
+void restart_nmi(void)
+{
+	ignore_nmis--;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 6913369..a8e3eb8 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -81,15 +81,6 @@ gate_desc idt_table[NR_VECTORS] __page_aligned_data = { { { { 0, 0 } } }, };
 DECLARE_BITMAP(used_vectors, NR_VECTORS);
 EXPORT_SYMBOL_GPL(used_vectors);
 
-static int ignore_nmis;
-
-int unknown_nmi_panic;
-/*
- * Prevent NMI reason port (0x61) being accessed simultaneously, can
- * only be used in NMI handler.
- */
-static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
-
 static inline void conditional_sti(struct pt_regs *regs)
 {
 	if (regs->flags & X86_EFLAGS_IF)
@@ -307,152 +298,6 @@ gp_in_kernel:
 	die("general protection fault", regs, error_code);
 }
 
-static int __init setup_unknown_nmi_panic(char *str)
-{
-	unknown_nmi_panic = 1;
-	return 1;
-}
-__setup("unknown_nmi_panic", setup_unknown_nmi_panic);
-
-static notrace __kprobes void
-pci_serr_error(unsigned char reason, struct pt_regs *regs)
-{
-	pr_emerg("NMI: PCI system error (SERR) for reason %02x on CPU %d.\n",
-		 reason, smp_processor_id());
-
-	/*
-	 * On some machines, PCI SERR line is used to report memory
-	 * errors. EDAC makes use of it.
-	 */
-#if defined(CONFIG_EDAC)
-	if (edac_handler_set()) {
-		edac_atomic_assert_error();
-		return;
-	}
-#endif
-
-	if (panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
-
-	pr_emerg("Dazed and confused, but trying to continue\n");
-
-	/* Clear and disable the PCI SERR error line. */
-	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_SERR;
-	outb(reason, NMI_REASON_PORT);
-}
-
-static notrace __kprobes void
-io_check_error(unsigned char reason, struct pt_regs *regs)
-{
-	unsigned long i;
-
-	pr_emerg(
-	"NMI: IOCK error (debug interrupt?) for reason %02x on CPU %d.\n",
-		 reason, smp_processor_id());
-	show_registers(regs);
-
-	if (panic_on_io_nmi)
-		panic("NMI IOCK error: Not continuing");
-
-	/* Re-enable the IOCK line, wait for a few seconds */
-	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
-	outb(reason, NMI_REASON_PORT);
-
-	i = 20000;
-	while (--i) {
-		touch_nmi_watchdog();
-		udelay(100);
-	}
-
-	reason &= ~NMI_REASON_CLEAR_IOCHK;
-	outb(reason, NMI_REASON_PORT);
-}
-
-static notrace __kprobes void
-unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
-{
-	if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
-			NOTIFY_STOP)
-		return;
-#ifdef CONFIG_MCA
-	/*
-	 * Might actually be able to figure out what the guilty party
-	 * is:
-	 */
-	if (MCA_bus) {
-		mca_handle_nmi();
-		return;
-	}
-#endif
-	pr_emerg("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
-		 reason, smp_processor_id());
-
-	pr_emerg("Do you have a strange power saving mode enabled?\n");
-	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
-
-	pr_emerg("Dazed and confused, but trying to continue\n");
-}
-
-static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
-{
-	unsigned char reason = 0;
-
-	/*
-	 * CPU-specific NMI must be processed before non-CPU-specific
-	 * NMI, otherwise we may lose it, because the CPU-specific
-	 * NMI can not be detected/processed on other CPUs.
-	 */
-	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
-		return;
-
-	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
-	raw_spin_lock(&nmi_reason_lock);
-	reason = get_nmi_reason();
-
-	if (reason & NMI_REASON_MASK) {
-		if (reason & NMI_REASON_SERR)
-			pci_serr_error(reason, regs);
-		else if (reason & NMI_REASON_IOCHK)
-			io_check_error(reason, regs);
-#ifdef CONFIG_X86_32
-		/*
-		 * Reassert NMI in case it became active
-		 * meanwhile as it's edge-triggered:
-		 */
-		reassert_nmi();
-#endif
-		raw_spin_unlock(&nmi_reason_lock);
-		return;
-	}
-	raw_spin_unlock(&nmi_reason_lock);
-
-	unknown_nmi_error(reason, regs);
-}
-
-dotraplinkage notrace __kprobes void
-do_nmi(struct pt_regs *regs, long error_code)
-{
-	nmi_enter();
-
-	inc_irq_stat(__nmi_count);
-
-	if (!ignore_nmis)
-		default_do_nmi(regs);
-
-	nmi_exit();
-}
-
-void stop_nmi(void)
-{
-	ignore_nmis++;
-}
-
-void restart_nmi(void)
-{
-	ignore_nmis--;
-}
-
 /* May run on IST stack. */
 dotraplinkage void __kprobes do_int3(struct pt_regs *regs, long error_code)
 {
-- 
1.7.6



* [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-19 20:37 [RFC][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
  2011-08-19 20:37 ` [RFC][PATCH 1/6] x86, nmi: split out nmi from traps.c Don Zickus
@ 2011-08-19 20:37 ` Don Zickus
  2011-08-22 14:13   ` Peter Zijlstra
                     ` (2 more replies)
  2011-08-19 20:37 ` [RFC][PATCH 3/6] x86, nmi: wire up NMI handlers to new routines Don Zickus
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-19 20:37 UTC (permalink / raw)
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, jason.wessel, Don Zickus

The NMI handlers used to rely on the notifier infrastructure.  This worked
great until we wanted to support handling multiple events better.

One of the key ideas of the new NMI handling is to process _all_ the handlers
for each NMI.  The reason behind this switch is that NMIs are edge triggered.
If enough NMIs are triggered, some could be lost, because the cpu can only
latch at most one NMI (besides the one currently being processed).

In order to deal with this we have decided to process all the NMI handlers
for each NMI.  This allows the handlers to determine if they received an
event or not (the ones that cannot determine this will be left to fend
for themselves on the unknown NMI list).

As a result of this change it is now possible to have an extra NMI that
was destined to be received for an already processed event.  Because the
event was processed in the previous NMI, this NMI gets dropped and becomes
an 'unknown' NMI.  This of course will cause printks that scare people.

However, we prefer to have extra NMIs as opposed to losing NMIs, and as such
have developed a basic mechanism to catch most of them.  That will be a
later patch.

To accomplish this idea, I unhooked the NMI handlers from the notifier
routines and created a new mechanism loosely based on doIRQ.  The reason
for this is that the notifier routines have a few shortcomings.  First, we
couldn't guarantee all future NMI handlers used NOTIFY_OK instead of
NOTIFY_STOP.  Second, we couldn't keep track of the number of events being
handled in each routine (most only handle one, but perf can handle more
than one).  Third, I wanted to eventually display which NMI handlers are
registered in the system in /proc/interrupts to help see who is generating
NMIs.

The patch below just implements the new infrastructure but doesn't wire it
up yet (that is the next patch).  Its design is based on the doIRQ structs
and the atomic notifier routines.  So the RCU usage in the patch isn't
entirely untested (the notifier routines have soaked it), but it should be
double checked in case I copied the code wrong.
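
To illustrate the handler contract, here is a sketch of a handler that can
consume more than one event per NMI, the way perf does (the my_pmu_*
helpers are hypothetical):

static int my_pmu_nmi_handler(unsigned int type, struct pt_regs *regs)
{
	int handled = 0;

	/* drain every event this source has pending right now */
	while (my_pmu_event_pending())
		handled += my_pmu_process_one(regs);

	/* 0 is NMI_DONE; > 1 hints that a back-to-back NMI may follow */
	return handled;
}

nmi_handle() sums these return values, which is what lets a later patch in
this series guess whether a trailing unknown NMI can be swallowed.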

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h |   19 ++++++
 arch/x86/kernel/nmi.c      |  139 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 158 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 4886a68..6d04b28 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -42,6 +42,25 @@ void arch_trigger_all_cpu_backtrace(void);
 #define NMI_LOCAL_NORMAL_PRIOR	(NMI_LOCAL_BIT | NMI_NORMAL_PRIOR)
 #define NMI_LOCAL_LOW_PRIOR	(NMI_LOCAL_BIT | NMI_LOW_PRIOR)
 
+#define NMI_FLAG_FIRST	1
+
+enum {
+	NMI_LOCAL=0,
+	NMI_UNKNOWN,
+	NMI_EXTERNAL,
+	NMI_MAX
+};
+
+#define NMI_DONE	0
+#define NMI_HANDLED	1
+
+typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);
+
+int register_nmi_handler(unsigned int, nmi_handler_t, unsigned long,
+			 const char *);
+
+void unregister_nmi_handler(unsigned int, const char *);
+
 void stop_nmi(void);
 void restart_nmi(void);
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 68d758a..dfc46a8 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -13,6 +13,9 @@
 #include <linux/kprobes.h>
 #include <linux/kdebug.h>
 #include <linux/nmi.h>
+#include <linux/delay.h>
+#include <linux/hardirq.h>
+#include <linux/slab.h>
 
 #if defined(CONFIG_EDAC)
 #include <linux/edac.h>
@@ -21,6 +24,27 @@
 #include <linux/atomic.h>
 #include <asm/traps.h>
 #include <asm/mach_traps.h>
+#include <asm/nmi.h>
+
+struct nmiaction {
+	struct nmiaction __rcu *next;
+	nmi_handler_t handler;
+	unsigned int flags;
+	const char *name;
+};
+
+struct nmi_desc {
+	spinlock_t lock;
+	struct nmiaction __rcu *head;
+};
+
+static struct nmi_desc nmi_desc[NMI_MAX] = 
+{
+	{	.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[0].lock), },
+	{	.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[1].lock), },
+	{	.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[2].lock), },
+
+};
 
 static int ignore_nmis;
 
@@ -38,6 +62,121 @@ static int __init setup_unknown_nmi_panic(char *str)
 }
 __setup("unknown_nmi_panic", setup_unknown_nmi_panic);
 
+#define nmi_to_desc(type) (&nmi_desc[type])
+
+static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
+{
+	struct nmi_desc *desc = nmi_to_desc(type);
+	struct nmiaction *next_a, *a, **ap = &desc->head;
+	int handled=0;
+
+	rcu_read_lock();
+	a = rcu_dereference_raw(*ap);
+
+	/*
+	 * NMIs are edge-triggered, which means if you have enough
+	 * of them concurrently, you can lose some because only one
+	 * can be latched at any given time.  Walk the whole list
+	 * to handle those situations.
+	 */
+	while (a) {
+		next_a = rcu_dereference_raw(a->next);
+
+		handled += a->handler(type, regs);
+
+		a = next_a;
+	}
+	rcu_read_unlock();
+
+	/* return total number of NMI events handled */
+	return handled;
+}
+
+static int __setup_nmi(unsigned int type, struct nmiaction *action)
+{
+	struct nmi_desc *desc = nmi_to_desc(type);
+	struct nmiaction **a = &(desc->head);
+	unsigned long flags;
+
+	spin_lock_irqsave(&desc->lock, flags);
+
+	/*
+	 * some handlers need to be executed first otherwise a fake
+	 * event confuses some handlers (kdump uses this flag)
+	 */
+	if (!(action->flags & NMI_FLAG_FIRST))
+		while ((*a) != NULL)
+			a = &((*a)->next);
+	
+	action->next = *a;
+	rcu_assign_pointer(*a, action);
+
+	spin_unlock_irqrestore(&desc->lock, flags);
+	return 0;
+}
+
+static struct nmiaction *__free_nmi(unsigned int type, const char *name)
+{
+	struct nmi_desc *desc = nmi_to_desc(type);
+	struct nmiaction *n = NULL, **np = &(desc->head);
+	unsigned long flags;
+
+	spin_lock_irqsave(&desc->lock, flags);
+
+	while ((*np) != NULL) {
+		/*
+		 * the name passed in to describe the nmi handler
+		 * is used as the lookup key
+		 */
+		if (!strcmp((*np)->name, name)) {
+			n = *np;
+			WARN(in_nmi(),
+				"Trying to free NMI (%s) from NMI context!\n", n->name);
+			/* unlink n; *np now points at n's successor */
+			rcu_assign_pointer(*np, n->next);
+			break;
+		}
+		np = &((*np)->next);
+	}
+
+	spin_unlock_irqrestore(&desc->lock, flags);
+	synchronize_rcu();
+	/* return the unlinked action (NULL if not found) for the caller to free */
+	return n;
+}
+
+int register_nmi_handler(unsigned int type, nmi_handler_t handler,
+			unsigned long nmiflags, const char *devname)
+{
+	struct nmiaction *action;
+	int retval;
+
+	if (!handler)
+		return -EINVAL;
+
+	action = kzalloc(sizeof(struct nmiaction), GFP_KERNEL);
+	if (!action)
+		return -ENOMEM;
+
+	action->handler = handler;
+	action->flags = nmiflags;
+	action->name = devname;
+
+	retval = __setup_nmi(type, action);
+
+	if (retval)
+		kfree(action);
+	
+	return retval;
+}
+EXPORT_SYMBOL(register_nmi_handler);
+
+void unregister_nmi_handler(unsigned int type, const char *name)
+{
+	kfree(__free_nmi(type, name));
+}
+
+EXPORT_SYMBOL_GPL(unregister_nmi_handler);
+
 static notrace __kprobes void
 pci_serr_error(unsigned char reason, struct pt_regs *regs)
 {
-- 
1.7.6



* [RFC][PATCH 3/6] x86, nmi: wire up NMI handlers to new routines
  2011-08-19 20:37 [RFC][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
  2011-08-19 20:37 ` [RFC][PATCH 1/6] x86, nmi: split out nmi from traps.c Don Zickus
  2011-08-19 20:37 ` [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines Don Zickus
@ 2011-08-19 20:37 ` Don Zickus
  2011-08-19 20:37 ` [RFC][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs Don Zickus
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-19 20:37 UTC (permalink / raw)
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, jason.wessel, Don Zickus

Just convert all the files that have an NMI handler to the new routines.
Most of it is a straightforward conversion.  A couple of places needed some
tweaking, like kgdb, which separates the debug notifier from the NMI
handler, and mce, which removes a call to notify_die (as I couldn't figure
out why it was there).

The things that get converted are the registration/unregistration routines;
the NMI handler itself has its args changed, and the code that checked
which list it was called for is removed (most handlers are on one NMI list,
except for kgdb, which has both an NMI routine and an NMI Unknown routine).
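
Schematically, every conversion in this patch has the same shape (the foo_*
names are placeholders, not code from any one file):

-static int foo_notify(struct notifier_block *self,
-		      unsigned long cmd, void *data)
-{
-	struct die_args *args = data;
-
-	if (cmd != DIE_NMI)
-		return NOTIFY_DONE;
-	/* ... use args->regs ... */
-	return NOTIFY_STOP;
-}
-	register_die_notifier(&foo_nb);
+static int foo_handler(unsigned int cmd, struct pt_regs *regs)
+{
+	/* ... use regs directly; the list already filtered the event ... */
+	return NMI_HANDLED;
+}
+	register_nmi_handler(NMI_LOCAL, foo_handler, 0, "foo");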

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h              |   20 ----------
 arch/x86/include/asm/reboot.h           |    2 +-
 arch/x86/kernel/apic/hw_nmi.c           |   27 +++-----------
 arch/x86/kernel/apic/x2apic_uv_x.c      |   20 ++--------
 arch/x86/kernel/cpu/mcheck/mce-inject.c |   20 ++++-------
 arch/x86/kernel/cpu/mcheck/mce.c        |    3 --
 arch/x86/kernel/cpu/perf_event.c        |   60 +++----------------------------
 arch/x86/kernel/crash.c                 |    5 +--
 arch/x86/kernel/kgdb.c                  |   60 +++++++++++++++++++++++--------
 arch/x86/kernel/nmi.c                   |   11 ++++--
 arch/x86/kernel/reboot.c                |   23 ++++--------
 arch/x86/oprofile/nmi_int.c             |   40 ++++++--------------
 arch/x86/oprofile/nmi_timer_int.c       |   28 +++-----------
 drivers/acpi/apei/ghes.c                |   22 ++++-------
 drivers/char/ipmi/ipmi_watchdog.c       |   32 +++++-----------
 drivers/watchdog/hpwdt.c                |   23 +++---------
 16 files changed, 125 insertions(+), 271 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 6d04b28..fc74547 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -22,26 +22,6 @@ void arch_trigger_all_cpu_backtrace(void);
 #define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
 #endif
 
-/*
- * Define some priorities for the nmi notifier call chain.
- *
- * Create a local nmi bit that has a higher priority than
- * external nmis, because the local ones are more frequent.
- *
- * Also setup some default high/normal/low settings for
- * subsystems to registers with.  Using 4 bits to separate
- * the priorities.  This can go a lot higher if needed be.
- */
-
-#define NMI_LOCAL_SHIFT		16	/* randomly picked */
-#define NMI_LOCAL_BIT		(1ULL << NMI_LOCAL_SHIFT)
-#define NMI_HIGH_PRIOR		(1ULL << 8)
-#define NMI_NORMAL_PRIOR	(1ULL << 4)
-#define NMI_LOW_PRIOR		(1ULL << 0)
-#define NMI_LOCAL_HIGH_PRIOR	(NMI_LOCAL_BIT | NMI_HIGH_PRIOR)
-#define NMI_LOCAL_NORMAL_PRIOR	(NMI_LOCAL_BIT | NMI_NORMAL_PRIOR)
-#define NMI_LOCAL_LOW_PRIOR	(NMI_LOCAL_BIT | NMI_LOW_PRIOR)
-
 #define NMI_FLAG_FIRST	1
 
 enum {
diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 3250e3d..92f29706 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -23,7 +23,7 @@ void machine_real_restart(unsigned int type);
 #define MRR_BIOS	0
 #define MRR_APM		1
 
-typedef void (*nmi_shootdown_cb)(int, struct die_args*);
+typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
 
 #endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index d5e57db0..31cb9ae 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -60,22 +60,10 @@ void arch_trigger_all_cpu_backtrace(void)
 }
 
 static int __kprobes
-arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
-			 unsigned long cmd, void *__args)
+arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
 {
-	struct die_args *args = __args;
-	struct pt_regs *regs;
 	int cpu;
 
-	switch (cmd) {
-	case DIE_NMI:
-		break;
-
-	default:
-		return NOTIFY_DONE;
-	}
-
-	regs = args->regs;
 	cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
@@ -86,21 +74,16 @@ arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
 		show_regs(regs);
 		arch_spin_unlock(&lock);
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
-		return NOTIFY_STOP;
+		return NMI_HANDLED;
 	}
 
-	return NOTIFY_DONE;
+	return NMI_DONE;
 }
 
-static __read_mostly struct notifier_block backtrace_notifier = {
-	.notifier_call          = arch_trigger_all_cpu_backtrace_handler,
-	.next                   = NULL,
-	.priority               = NMI_LOCAL_LOW_PRIOR,
-};
-
 static int __init register_trigger_all_cpu_backtrace(void)
 {
-	register_die_notifier(&backtrace_notifier);
+	register_nmi_handler(NMI_LOCAL, arch_trigger_all_cpu_backtrace_handler,
+				0, "arch_bt");
 	return 0;
 }
 early_initcall(register_trigger_all_cpu_backtrace);
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index adc66c3..88b0dbb 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -673,18 +673,11 @@ void __cpuinit uv_cpu_init(void)
 /*
  * When NMI is received, print a stack trace.
  */
-int uv_handle_nmi(struct notifier_block *self, unsigned long reason, void *data)
+int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
 {
 	unsigned long real_uv_nmi;
 	int bid;
 
-	if (reason != DIE_NMIUNKNOWN)
-		return NOTIFY_OK;
-
-	if (in_crash_kexec)
-		/* do nothing if entering the crash kernel */
-		return NOTIFY_OK;
-
 	/*
 	 * Each blade has an MMR that indicates when an NMI has been sent
 	 * to cpus on the blade. If an NMI is detected, atomically
@@ -705,7 +698,7 @@ int uv_handle_nmi(struct notifier_block *self, unsigned long reason, void *data)
 	}
 
 	if (likely(__get_cpu_var(cpu_last_nmi_count) == uv_blade_info[bid].nmi_count))
-		return NOTIFY_DONE;
+		return NMI_DONE;
 
 	__get_cpu_var(cpu_last_nmi_count) = uv_blade_info[bid].nmi_count;
 
@@ -718,17 +711,12 @@ int uv_handle_nmi(struct notifier_block *self, unsigned long reason, void *data)
 	dump_stack();
 	spin_unlock(&uv_nmi_lock);
 
-	return NOTIFY_STOP;
+	return NMI_HANDLED;
 }
 
-static struct notifier_block uv_dump_stack_nmi_nb = {
-	.notifier_call	= uv_handle_nmi,
-	.priority = NMI_LOCAL_LOW_PRIOR - 1,
-};
-
 void uv_register_nmi_notifier(void)
 {
-	if (register_die_notifier(&uv_dump_stack_nmi_nb))
+	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
 		printk(KERN_WARNING "UV NMI handler failed to register\n");
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c
index 0ed633c..6199232 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
@@ -78,27 +78,20 @@ static void raise_exception(struct mce *m, struct pt_regs *pregs)
 
 static cpumask_var_t mce_inject_cpumask;
 
-static int mce_raise_notify(struct notifier_block *self,
-			    unsigned long val, void *data)
+static int mce_raise_notify(unsigned int cmd, struct pt_regs *regs)
 {
-	struct die_args *args = (struct die_args *)data;
 	int cpu = smp_processor_id();
 	struct mce *m = &__get_cpu_var(injectm);
-	if (val != DIE_NMI || !cpumask_test_cpu(cpu, mce_inject_cpumask))
-		return NOTIFY_DONE;
+	if (!cpumask_test_cpu(cpu, mce_inject_cpumask))
+		return NMI_DONE;
 	cpumask_clear_cpu(cpu, mce_inject_cpumask);
 	if (m->inject_flags & MCJ_EXCEPTION)
-		raise_exception(m, args->regs);
+		raise_exception(m, regs);
 	else if (m->status)
 		raise_poll(m);
-	return NOTIFY_STOP;
+	return NMI_HANDLED;
 }
 
-static struct notifier_block mce_raise_nb = {
-	.notifier_call = mce_raise_notify,
-	.priority = NMI_LOCAL_NORMAL_PRIOR,
-};
-
 /* Inject mce on current CPU */
 static int raise_local(void)
 {
@@ -216,7 +209,8 @@ static int inject_init(void)
 		return -ENOMEM;
 	printk(KERN_INFO "Machine check injector initialized\n");
 	mce_chrdev_ops.write = mce_write;
-	register_die_notifier(&mce_raise_nb);
+	register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0,
+				"mce_notify");
 	return 0;
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 08363b0..3fc65b6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -908,9 +908,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
 	percpu_inc(mce_exception_count);
 
-	if (notify_die(DIE_NMI, "machine check", regs, error_code,
-			   18, SIGKILL) == NOTIFY_STOP)
-		goto out;
 	if (!banks)
 		goto out;
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4ee3abf..767371f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1375,68 +1375,18 @@ struct pmu_nmi_state {
 static DEFINE_PER_CPU(struct pmu_nmi_state, pmu_nmi);
 
 static int __kprobes
-perf_event_nmi_handler(struct notifier_block *self,
-			 unsigned long cmd, void *__args)
+perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 {
-	struct die_args *args = __args;
-	unsigned int this_nmi;
 	int handled;
 
 	if (!atomic_read(&active_events))
-		return NOTIFY_DONE;
+		return NMI_DONE;
 
-	switch (cmd) {
-	case DIE_NMI:
-		break;
-	case DIE_NMIUNKNOWN:
-		this_nmi = percpu_read(irq_stat.__nmi_count);
-		if (this_nmi != __this_cpu_read(pmu_nmi.marked))
-			/* let the kernel handle the unknown nmi */
-			return NOTIFY_DONE;
-		/*
-		 * This one is a PMU back-to-back nmi. Two events
-		 * trigger 'simultaneously' raising two back-to-back
-		 * NMIs. If the first NMI handles both, the latter
-		 * will be empty and daze the CPU. So, we drop it to
-		 * avoid false-positive 'unknown nmi' messages.
-		 */
-		return NOTIFY_STOP;
-	default:
-		return NOTIFY_DONE;
-	}
-
-	handled = x86_pmu.handle_irq(args->regs);
-	if (!handled)
-		return NOTIFY_DONE;
-
-	this_nmi = percpu_read(irq_stat.__nmi_count);
-	if ((handled > 1) ||
-		/* the next nmi could be a back-to-back nmi */
-	    ((__this_cpu_read(pmu_nmi.marked) == this_nmi) &&
-	     (__this_cpu_read(pmu_nmi.handled) > 1))) {
-		/*
-		 * We could have two subsequent back-to-back nmis: The
-		 * first handles more than one counter, the 2nd
-		 * handles only one counter and the 3rd handles no
-		 * counter.
-		 *
-		 * This is the 2nd nmi because the previous was
-		 * handling more than one counter. We will mark the
-		 * next (3rd) and then drop it if unhandled.
-		 */
-		__this_cpu_write(pmu_nmi.marked, this_nmi + 1);
-		__this_cpu_write(pmu_nmi.handled, handled);
-	}
+	handled = x86_pmu.handle_irq(regs);
 
-	return NOTIFY_STOP;
+	return handled;
 }
 
-static __read_mostly struct notifier_block perf_event_nmi_notifier = {
-	.notifier_call		= perf_event_nmi_handler,
-	.next			= NULL,
-	.priority		= NMI_LOCAL_LOW_PRIOR,
-};
-
 static struct event_constraint unconstrained;
 static struct event_constraint emptyconstraint;
 
@@ -1557,7 +1507,7 @@ static int __init init_hw_perf_events(void)
 		((1LL << x86_pmu.num_counters_fixed)-1) << X86_PMC_IDX_FIXED;
 
 	perf_events_lapic_init();
-	register_die_notifier(&perf_event_nmi_notifier);
+	register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI");
 
 	unconstrained = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 764c7c2..13ad899 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -32,15 +32,12 @@ int in_crash_kexec;
 
 #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
 
-static void kdump_nmi_callback(int cpu, struct die_args *args)
+static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 {
-	struct pt_regs *regs;
 #ifdef CONFIG_X86_32
 	struct pt_regs fixed_regs;
 #endif
 
-	regs = args->regs;
-
 #ifdef CONFIG_X86_32
 	if (!user_mode_vm(regs)) {
 		crash_fixup_ss_esp(&fixed_regs, regs);
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 00354d4..faba577 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -511,28 +511,37 @@ single_step_cont(struct pt_regs *regs, struct die_args *args)
 
 static int was_in_debug_nmi[NR_CPUS];
 
-static int __kgdb_notify(struct die_args *args, unsigned long cmd)
+static int kgdb_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 {
-	struct pt_regs *regs = args->regs;
-
 	switch (cmd) {
-	case DIE_NMI:
+	case NMI_LOCAL:
 		if (atomic_read(&kgdb_active) != -1) {
 			/* KGDB CPU roundup */
 			kgdb_nmicallback(raw_smp_processor_id(), regs);
 			was_in_debug_nmi[raw_smp_processor_id()] = 1;
 			touch_nmi_watchdog();
-			return NOTIFY_STOP;
+			return NMI_HANDLED;
 		}
-		return NOTIFY_DONE;
+		break;
 
-	case DIE_NMIUNKNOWN:
+	case NMI_UNKNOWN:
 		if (was_in_debug_nmi[raw_smp_processor_id()]) {
 			was_in_debug_nmi[raw_smp_processor_id()] = 0;
-			return NOTIFY_STOP;
+			return NMI_HANDLED;
 		}
-		return NOTIFY_DONE;
+		break;
+	default:
+		/* do nothing */
+		break;
+	}
+	return NMI_DONE;
+}
+
+static int __kgdb_notify(struct die_args *args, unsigned long cmd)
+{
+	struct pt_regs *regs = args->regs;
 
+	switch (cmd) {
 	case DIE_DEBUG:
 		if (atomic_read(&kgdb_cpu_doing_single_step) != -1) {
 			if (user_mode(regs))
@@ -590,11 +599,6 @@ kgdb_notify(struct notifier_block *self, unsigned long cmd, void *ptr)
 
 static struct notifier_block kgdb_notifier = {
 	.notifier_call	= kgdb_notify,
-
-	/*
-	 * Lowest-prio notifier priority, we want to be notified last:
-	 */
-	.priority	= NMI_LOCAL_LOW_PRIOR,
 };
 
 /**
@@ -605,7 +609,31 @@ static struct notifier_block kgdb_notifier = {
  */
 int kgdb_arch_init(void)
 {
-	return register_die_notifier(&kgdb_notifier);
+	int retval;
+
+	retval = register_die_notifier(&kgdb_notifier);
+	if (retval)
+		goto out;
+
+	retval = register_nmi_handler(NMI_LOCAL, kgdb_nmi_handler,
+					0, "kgdb");
+	if (retval)
+		goto out1;
+
+	retval = register_nmi_handler(NMI_UNKNOWN, kgdb_nmi_handler,
+					0, "kgdb");
+
+	if (retval)
+		goto out2;
+
+	return retval;
+
+out2:
+	unregister_nmi_handler(NMI_LOCAL, "kgdb");
+out1:
+	unregister_die_notifier(&kgdb_notifier);
+out:
+	return retval;
 }
 
 static void kgdb_hw_overflow_handler(struct perf_event *event,
@@ -673,6 +701,8 @@ void kgdb_arch_exit(void)
 			breakinfo[i].pev = NULL;
 		}
 	}
+	unregister_nmi_handler(NMI_UNKNOWN, "kgdb");
+	unregister_nmi_handler(NMI_LOCAL, "kgdb");
 	unregister_die_notifier(&kgdb_notifier);
 }
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index dfc46a8..5d16fd4 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -1,6 +1,7 @@
 /*
  *  Copyright (C) 1991, 1992  Linus Torvalds
  *  Copyright (C) 2000, 2001, 2002 Andi Kleen, SuSE Labs
+ *  Copyright (C) 2011	Don Zickus Red Hat, Inc.
  *
  *  Pentium III FXSR, SSE support
  *	Gareth Hughes <gareth@valinux.com>, May 2000
@@ -234,8 +235,10 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 static notrace __kprobes void
 unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 {
-	if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
-			NOTIFY_STOP)
+	int handled;
+
+	handled = nmi_handle(NMI_UNKNOWN, regs);
+	if (handled) 
 		return;
 #ifdef CONFIG_MCA
 	/*
@@ -260,13 +263,15 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 {
 	unsigned char reason = 0;
+	int handled;
 
 	/*
 	 * CPU-specific NMI must be processed before non-CPU-specific
 	 * NMI, otherwise we may lose it, because the CPU-specific
 	 * NMI can not be detected/processed on other CPUs.
 	 */
-	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
+	handled = nmi_handle(NMI_LOCAL, regs);
+	if (handled)
 		return;
 
 	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 9242436..adab340 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -464,7 +464,7 @@ static inline void kb_wait(void)
 	}
 }
 
-static void vmxoff_nmi(int cpu, struct die_args *args)
+static void vmxoff_nmi(int cpu, struct pt_regs *regs)
 {
 	cpu_emergency_vmxoff();
 }
@@ -736,14 +736,10 @@ static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
 
-static int crash_nmi_callback(struct notifier_block *self,
-			unsigned long val, void *data)
+static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 {
 	int cpu;
 
-	if (val != DIE_NMI)
-		return NOTIFY_OK;
-
 	cpu = raw_smp_processor_id();
 
 	/* Don't do anything if this handler is invoked on crashing cpu.
@@ -751,10 +747,10 @@ static int crash_nmi_callback(struct notifier_block *self,
 	 * an NMI if system was initially booted with nmi_watchdog parameter.
 	 */
 	if (cpu == crashing_cpu)
-		return NOTIFY_STOP;
+		return NMI_HANDLED;
 	local_irq_disable();
 
-	shootdown_callback(cpu, (struct die_args *)data);
+	shootdown_callback(cpu, regs);
 
 	atomic_dec(&waiting_for_crash_ipi);
 	/* Assume hlt works */
@@ -762,7 +758,7 @@ static int crash_nmi_callback(struct notifier_block *self,
 	for (;;)
 		cpu_relax();
 
-	return 1;
+	return NMI_HANDLED;
 }
 
 static void smp_send_nmi_allbutself(void)
@@ -770,12 +766,6 @@ static void smp_send_nmi_allbutself(void)
 	apic->send_IPI_allbutself(NMI_VECTOR);
 }
 
-static struct notifier_block crash_nmi_nb = {
-	.notifier_call = crash_nmi_callback,
-	/* we want to be the first one called */
-	.priority = NMI_LOCAL_HIGH_PRIOR+1,
-};
-
 /* Halt all other CPUs, calling the specified function on each of them
  *
  * This function can be used to halt all other CPUs on crash
@@ -794,7 +784,8 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
 	/* Would it be better to replace the trap vector here? */
-	if (register_die_notifier(&crash_nmi_nb))
+	if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback, 
+				 NMI_FLAG_FIRST, "crash"))
 		return;		/* return what? */
 	/* Ensure the new callback function is set before sending
 	 * out the NMI
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 68894fd..adf8fb3 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -61,26 +61,15 @@ u64 op_x86_get_ctrl(struct op_x86_model_spec const *model,
 }
 
 
-static int profile_exceptions_notify(struct notifier_block *self,
-				     unsigned long val, void *data)
+static int profile_exceptions_notify(unsigned int val, struct pt_regs *regs)
 {
-	struct die_args *args = (struct die_args *)data;
-	int ret = NOTIFY_DONE;
-
-	switch (val) {
-	case DIE_NMI:
-		if (ctr_running)
-			model->check_ctrs(args->regs, &__get_cpu_var(cpu_msrs));
-		else if (!nmi_enabled)
-			break;
-		else
-			model->stop(&__get_cpu_var(cpu_msrs));
-		ret = NOTIFY_STOP;
-		break;
-	default:
-		break;
-	}
-	return ret;
+	if (ctr_running)
+		model->check_ctrs(regs, &__get_cpu_var(cpu_msrs));
+	else if (!nmi_enabled)
+		return NMI_DONE;
+	else
+		model->stop(&__get_cpu_var(cpu_msrs));
+	return NMI_HANDLED;
 }
 
 static void nmi_cpu_save_registers(struct op_msrs *msrs)
@@ -363,12 +352,6 @@ static void nmi_cpu_setup(void *dummy)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 }
 
-static struct notifier_block profile_exceptions_nb = {
-	.notifier_call = profile_exceptions_notify,
-	.next = NULL,
-	.priority = NMI_LOCAL_LOW_PRIOR,
-};
-
 static void nmi_cpu_restore_registers(struct op_msrs *msrs)
 {
 	struct op_msr *counters = msrs->counters;
@@ -508,7 +491,8 @@ static int nmi_setup(void)
 	ctr_running = 0;
 	/* make variables visible to the nmi handler: */
 	smp_mb();
-	err = register_die_notifier(&profile_exceptions_nb);
+	err = register_nmi_handler(NMI_LOCAL, profile_exceptions_notify,
+					0, "oprofile");
 	if (err)
 		goto fail;
 
@@ -538,7 +522,7 @@ static void nmi_shutdown(void)
 	put_online_cpus();
 	/* make variables visible to the nmi handler: */
 	smp_mb();
-	unregister_die_notifier(&profile_exceptions_nb);
+	unregister_nmi_handler(NMI_LOCAL, "oprofile");
 	msrs = &get_cpu_var(cpu_msrs);
 	model->shutdown(msrs);
 	free_msrs();
diff --git a/arch/x86/oprofile/nmi_timer_int.c b/arch/x86/oprofile/nmi_timer_int.c
index 720bf5a..7f8052c 100644
--- a/arch/x86/oprofile/nmi_timer_int.c
+++ b/arch/x86/oprofile/nmi_timer_int.c
@@ -18,32 +18,16 @@
 #include <asm/apic.h>
 #include <asm/ptrace.h>
 
-static int profile_timer_exceptions_notify(struct notifier_block *self,
-					   unsigned long val, void *data)
+static int profile_timer_exceptions_notify(unsigned int val, struct pt_regs *regs)
 {
-	struct die_args *args = (struct die_args *)data;
-	int ret = NOTIFY_DONE;
-
-	switch (val) {
-	case DIE_NMI:
-		oprofile_add_sample(args->regs, 0);
-		ret = NOTIFY_STOP;
-		break;
-	default:
-		break;
-	}
-	return ret;
+	oprofile_add_sample(regs, 0);
+	return NMI_HANDLED;
 }
 
-static struct notifier_block profile_timer_exceptions_nb = {
-	.notifier_call = profile_timer_exceptions_notify,
-	.next = NULL,
-	.priority = NMI_LOW_PRIOR,
-};
-
 static int timer_start(void)
 {
-	if (register_die_notifier(&profile_timer_exceptions_nb))
+	if (register_nmi_handler(NMI_LOCAL, profile_timer_exceptions_notify,
+					0, "oprofile-timer"))
 		return 1;
 	return 0;
 }
@@ -51,7 +35,7 @@ static int timer_start(void)
 
 static void timer_stop(void)
 {
-	unregister_die_notifier(&profile_timer_exceptions_nb);
+	unregister_nmi_handler(NMI_LOCAL, "oprofile-timer");
 	synchronize_sched();  /* Allow already-started NMIs to complete. */
 }
 
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0784f99..b8e08cb 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -50,6 +50,7 @@
 #include <acpi/hed.h>
 #include <asm/mce.h>
 #include <asm/tlbflush.h>
+#include <asm/nmi.h>
 
 #include "apei-internal.h"
 
@@ -749,15 +750,11 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 	}
 }
 
-static int ghes_notify_nmi(struct notifier_block *this,
-				  unsigned long cmd, void *data)
+static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
 	struct ghes *ghes, *ghes_global = NULL;
 	int sev, sev_global = -1;
-	int ret = NOTIFY_DONE;
-
-	if (cmd != DIE_NMI)
-		return ret;
+	int ret = NMI_DONE;
 
 	raw_spin_lock(&ghes_nmi_lock);
 	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
@@ -770,10 +767,10 @@ static int ghes_notify_nmi(struct notifier_block *this,
 			sev_global = sev;
 			ghes_global = ghes;
 		}
-		ret = NOTIFY_STOP;
+		ret = NMI_HANDLED;
 	}
 
-	if (ret == NOTIFY_DONE)
+	if (ret == NMI_DONE)
 		goto out;
 
 	if (sev_global >= GHES_SEV_PANIC) {
@@ -825,10 +822,6 @@ static struct notifier_block ghes_notifier_sci = {
 	.notifier_call = ghes_notify_sci,
 };
 
-static struct notifier_block ghes_notifier_nmi = {
-	.notifier_call = ghes_notify_nmi,
-};
-
 static unsigned long ghes_esource_prealloc_size(
 	const struct acpi_hest_generic *generic)
 {
@@ -918,7 +911,8 @@ static int __devinit ghes_probe(struct platform_device *ghes_dev)
 		ghes_estatus_pool_expand(len);
 		mutex_lock(&ghes_list_mutex);
 		if (list_empty(&ghes_nmi))
-			register_die_notifier(&ghes_notifier_nmi);
+			register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0,
+						"ghes");
 		list_add_rcu(&ghes->list, &ghes_nmi);
 		mutex_unlock(&ghes_list_mutex);
 		break;
@@ -964,7 +958,7 @@ static int __devexit ghes_remove(struct platform_device *ghes_dev)
 		mutex_lock(&ghes_list_mutex);
 		list_del_rcu(&ghes->list);
 		if (list_empty(&ghes_nmi))
-			unregister_die_notifier(&ghes_notifier_nmi);
+			unregister_nmi_handler(NMI_LOCAL, "ghes");
 		mutex_unlock(&ghes_list_mutex);
 		/*
 		 * To synchronize with NMI handler, ghes can only be
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index 3302586..3dcab56 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -1077,17 +1077,8 @@ static void ipmi_unregister_watchdog(int ipmi_intf)
 
 #ifdef HAVE_DIE_NMI
 static int
-ipmi_nmi(struct notifier_block *self, unsigned long val, void *data)
+ipmi_nmi(unsigned int val, struct pt_regs *regs)
 {
-	struct die_args *args = data;
-
-	if (val != DIE_NMIUNKNOWN)
-		return NOTIFY_OK;
-
-	/* Hack, if it's a memory or I/O error, ignore it. */
-	if (args->err & 0xc0)
-		return NOTIFY_OK;
-
 	/*
 	 * If we get here, it's an NMI that's not a memory or I/O
 	 * error.  We can't truly tell if it's from IPMI or not
@@ -1097,15 +1088,15 @@ ipmi_nmi(struct notifier_block *self, unsigned long val, void *data)
 
 	if (testing_nmi) {
 		testing_nmi = 2;
-		return NOTIFY_STOP;
+		return NMI_HANDLED;
 	}
 
 	/* If we are not expecting a timeout, ignore it. */
 	if (ipmi_watchdog_state == WDOG_TIMEOUT_NONE)
-		return NOTIFY_OK;
+		return NMI_DONE;
 
 	if (preaction_val != WDOG_PRETIMEOUT_NMI)
-		return NOTIFY_OK;
+		return NMI_DONE;
 
 	/*
 	 * If no one else handled the NMI, we assume it was the IPMI
@@ -1120,12 +1111,8 @@ ipmi_nmi(struct notifier_block *self, unsigned long val, void *data)
 			panic(PFX "pre-timeout");
 	}
 
-	return NOTIFY_STOP;
+	return NMI_HANDLED;
 }
-
-static struct notifier_block ipmi_nmi_handler = {
-	.notifier_call = ipmi_nmi
-};
 #endif
 
 static int wdog_reboot_handler(struct notifier_block *this,
@@ -1290,7 +1277,8 @@ static void check_parms(void)
 		}
 	}
 	if (do_nmi && !nmi_handler_registered) {
-		rv = register_die_notifier(&ipmi_nmi_handler);
+		rv = register_nmi_handler(NMI_UNKNOWN, ipmi_nmi, 0,
+						"ipmi");
 		if (rv) {
 			printk(KERN_WARNING PFX
 			       "Can't register nmi handler\n");
@@ -1298,7 +1286,7 @@ static void check_parms(void)
 		} else
 			nmi_handler_registered = 1;
 	} else if (!do_nmi && nmi_handler_registered) {
-		unregister_die_notifier(&ipmi_nmi_handler);
+		unregister_nmi_handler(NMI_UNKNOWN, "ipmi");
 		nmi_handler_registered = 0;
 	}
 #endif
@@ -1336,7 +1324,7 @@ static int __init ipmi_wdog_init(void)
 	if (rv) {
 #ifdef HAVE_DIE_NMI
 		if (nmi_handler_registered)
-			unregister_die_notifier(&ipmi_nmi_handler);
+			unregister_nmi_handler(NMI_UNKNOWN, "ipmi");
 #endif
 		atomic_notifier_chain_unregister(&panic_notifier_list,
 						 &wdog_panic_notifier);
@@ -1357,7 +1345,7 @@ static void __exit ipmi_wdog_exit(void)
 
 #ifdef HAVE_DIE_NMI
 	if (nmi_handler_registered)
-		unregister_die_notifier(&ipmi_nmi_handler);
+		unregister_nmi_handler(NMI_UNKNOWN, "ipmi");
 #endif
 
 	atomic_notifier_chain_unregister(&panic_notifier_list,
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 410fba4..e0f6202 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -477,15 +477,12 @@ static int hpwdt_time_left(void)
 /*
  *	NMI Handler
  */
-static int hpwdt_pretimeout(struct notifier_block *nb, unsigned long ulReason,
-				void *data)
+static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
 {
 	unsigned long rom_pl;
 	static int die_nmi_called;
 
-	if (ulReason != DIE_NMIUNKNOWN)
-		goto out;
-
 	if (!hpwdt_nmi_decoding)
 		goto out;
 
@@ -507,7 +504,7 @@ static int hpwdt_pretimeout(struct notifier_block *nb, unsigned long ulReason,
 		"Management Log for details.\n");
 
 out:
-	return NOTIFY_OK;
+	return NMI_DONE;
 }
 #endif /* CONFIG_HPWDT_NMI_DECODING */
 
@@ -647,13 +644,6 @@ static struct miscdevice hpwdt_miscdev = {
 	.fops = &hpwdt_fops,
 };
 
-#ifdef CONFIG_HPWDT_NMI_DECODING
-static struct notifier_block die_notifier = {
-	.notifier_call = hpwdt_pretimeout,
-	.priority = 0,
-};
-#endif /* CONFIG_HPWDT_NMI_DECODING */
-
 /*
  *	Init & Exit
  */
@@ -739,10 +729,9 @@ static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
 	 * die notify list to handle a critical NMI. The default is to
 	 * be last so other users of the NMI signal can function.
 	 */
-	if (priority)
-		die_notifier.priority = 0x7FFFFFFF;
-
-	retval = register_die_notifier(&die_notifier);
+	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout,
+					(priority) ? NMI_FLAG_FIRST : 0,
+					"hpwdt");
 	if (retval != 0) {
 		dev_warn(&dev->dev,
 			"Unable to register a die notifier (err=%d).\n",
@@ -762,7 +751,7 @@ static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
 
 static void hpwdt_exit_nmi_decoding(void)
 {
-	unregister_die_notifier(&die_notifier);
+	unregister_nmi_handler(NMI_UNKNOWN, "hpwdt");
 	if (cru_rom_addr)
 		iounmap(cru_rom_addr);
 }
-- 
1.7.6



* [RFC][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs
  2011-08-19 20:37 [RFC][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
                   ` (2 preceding siblings ...)
  2011-08-19 20:37 ` [RFC][PATCH 3/6] x86, nmi: wire up NMI handlers to new routines Don Zickus
@ 2011-08-19 20:37 ` Don Zickus
  2011-08-22 14:22   ` Peter Zijlstra
  2011-08-19 20:37 ` [RFC][PATCH 5/6] x86, nmi: track NMI usage stats Don Zickus
  2011-08-19 20:37 ` [RFC][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts Don Zickus
  5 siblings, 1 reply; 26+ messages in thread
From: Don Zickus @ 2011-08-19 20:37 UTC (permalink / raw)
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, jason.wessel, Don Zickus

Previous patches allow the NMI subsystem to process multiple NMI events
in one NMI.  As previously discussed, this can cause issues when an event
triggered another NMI but is processed in the current NMI.  This causes the
next NMI to go unprocessed and become an 'unknown' NMI.

Having this print an 'unknown' NMI message to the console would be
inaccurate and would scare users.  As a result I have copied the 'skip
unknown' NMI logic developed by Robert Richter (simplified a little,
because we can track _all_ NMIs better instead of just the perf ones) into
the main NMI handling routine.

It is fairly simple: when processing an NMI, if the nmi_handle routine
returns more than one event handled, set a flag for future use.  This flag
just says there might be a possible unknown NMI coming.  If an unknown NMI
does come in, it is skipped (swallowed).  Otherwise the flag is just
cleared on the next NMI that has events processed.

The algorithm isn't 100% accurate, but for most loads we have seen in perf
it captures a large majority of unknown NMIs.  Under high workloads there
is still the chance that unknown NMIs can trigger, because you can time it
just right such that you are generating NMIs as fast as you can process
them and go four or five NMIs before seeing the unknown NMI.

Without involving the concept of time when tracking these 'possible' NMIs,
we may never be 100% reliable.  The idea with time is that true
back-to-back NMIs immediately follow each other; anything more than a
microsecond or so on modern machines between when the first NMI finishes
and when the second one starts probably indicates a completely new event.
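
As a worked example of the heuristic (as wired up by the end of this
series), consider the sequence on one cpu:

	NMI#1: nmi_handle() returns 2	/* perf handled two events; one of
					   them may have latched an NMI that
					   is now redundant */
		-> swallow_nmi = true
	NMI#2: nmi_handle() returns 0	/* would normally be 'unknown' */
		-> swallowed, flag cleared
	NMI#3: nmi_handle() returns 1	/* the normal case again */
		-> swallow_nmi = false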

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/nmi.c |   26 +++++++++++++++++++++++++-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5d16fd4..7d9287e 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -260,6 +260,8 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 	pr_emerg("Dazed and confused, but trying to continue\n");
 }
 
+DEFINE_PER_CPU(bool, swallow_nmi);
+
 static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 {
 	unsigned char reason = 0;
@@ -271,8 +273,28 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	 * NMI can not be detected/processed on other CPUs.
 	 */
 	handled = nmi_handle(NMI_LOCAL, regs);
-	if (handled)
+	if (handled) {
+		/*
+		 * When handling multiple NMI events, we are not
+		 * sure if the second NMI was dropped (because of
+		 * too many NMIs), piggy-backed on the same NMI
+		 * (perf) or is queued right behind this NMI.
+		 * In the last case, we may accidentally get an
+		 * unknown NMI because the event is already handled.
+		 * Flag for this condition and swallow it later.
+		 *
+		 * FIXME: This detection has holes in it mainly
+		 * because we can't tell _when_ the next NMI comes
+		 * in.  A multi-handled NMI event followed by an 
+		 * unknown NMI a second later, clearly should not
+		 * be swallowed.
+		 */
+		if (handled > 1)
+			__this_cpu_write(swallow_nmi, true);
+		else
+			__this_cpu_write(swallow_nmi, false);
 		return;
+	}
 
 	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
 	raw_spin_lock(&nmi_reason_lock);
@@ -296,6 +318,8 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	raw_spin_unlock(&nmi_reason_lock);
 
 	unknown_nmi_error(reason, regs);
+
+	__this_cpu_write(swallow_nmi, false);
 }
 
 dotraplinkage notrace __kprobes void
-- 
1.7.6



* [RFC][PATCH 5/6] x86, nmi: track NMI usage stats
  2011-08-19 20:37 [RFC][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
                   ` (3 preceding siblings ...)
  2011-08-19 20:37 ` [RFC][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs Don Zickus
@ 2011-08-19 20:37 ` Don Zickus
  2011-08-19 20:37 ` [RFC][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts Don Zickus
  5 siblings, 0 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-19 20:37 UTC (permalink / raw)
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, jason.wessel, Don Zickus

Now that the NMI handlers are broken into lists, increment the appropriate
stats for each list.  This allows us to see what is going on when they get
printed out in the next patch.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/nmi.c |   23 +++++++++++++++++++++--
 1 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 7d9287e..260a275 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -47,6 +47,15 @@ static struct nmi_desc nmi_desc[NMI_MAX] =
 
 };
 
+struct nmi_stats {
+	unsigned int normal;
+	unsigned int unknown;
+	unsigned int external;
+	unsigned int swallow;
+};
+
+static DEFINE_PER_CPU(struct nmi_stats, nmi_stats);
+
 static int ignore_nmis;
 
 int unknown_nmi_panic;
@@ -238,8 +247,13 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 	int handled;
 
 	handled = nmi_handle(NMI_UNKNOWN, regs);
-	if (handled) 
+	if (handled) {
+		__this_cpu_add(nmi_stats.unknown, handled);
 		return;
+	}
+
+	__this_cpu_add(nmi_stats.unknown, 1);
+
 #ifdef CONFIG_MCA
 	/*
 	 * Might actually be able to figure out what the guilty party
@@ -273,6 +287,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	 * NMI can not be detected/processed on other CPUs.
 	 */
 	handled = nmi_handle(NMI_LOCAL, regs);
+	__this_cpu_add(nmi_stats.normal, handled);
 	if (handled) {
 		/*
 		 * When handling multiple NMI events, we are not
@@ -312,12 +327,16 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 		 */
 		reassert_nmi();
 #endif
+		__this_cpu_add(nmi_stats.external, 1);
 		raw_spin_unlock(&nmi_reason_lock);
 		return;
 	}
 	raw_spin_unlock(&nmi_reason_lock);
 
-	unknown_nmi_error(reason, regs);
+	if (!__this_cpu_read(swallow_nmi))
+		unknown_nmi_error(reason, regs);
+	else
+		__this_cpu_add(nmi_stats.swallow, 1);
 
 	__this_cpu_write(swallow_nmi, false);
 }
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts
  2011-08-19 20:37 [RFC][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
                   ` (4 preceding siblings ...)
  2011-08-19 20:37 ` [RFC][PATCH 5/6] x86, nmi: track NMI usage stats Don Zickus
@ 2011-08-19 20:37 ` Don Zickus
  2011-08-22 14:27   ` Peter Zijlstra
  5 siblings, 1 reply; 26+ messages in thread
From: Don Zickus @ 2011-08-19 20:37 UTC (permalink / raw)
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, jason.wessel, Don Zickus

This is a cheap hack to add the stats to the middle of /proc/interrupts.
It is more of a conversation starter than anything, as I am not sure of
the right labels and placement for this stuff.

The benefit of these stats is a better breakdown of which list the NMIs
get handled in: normal, unknown, or external.  They also list the number
of unknown NMIs swallowed, to help check for false positives.  Another
benefit is the ability to actually see which NMI handlers are currently
registered in the system.

The output of 'cat /proc/interrupts' will look like this:

<snip>
 58:        275          0        864          0   PCI-MSI-edge      eth0
NMI:       4161       4155        158       4194   Non-maskable interrupts
SWA:          0          0          0          0   Unknown NMIs swallowed
  0:       4161       4155        158       4194   NMI  PMI, arch_bt
UNK:          0          0          0          0   NMI
EXT:          0          0          0          0   NMI
LOC:      12653      13304      13974      12926   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          6          6          5          6   Performance monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RES:       1839       1897       1821       1854   Rescheduling interrupts
CAL:        524       2714        392        331   Function call interrupts
TLB:        217        146        593        576   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          1          1          1          1   Machine check polls
ERR:          0
MIS:          0

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h |    2 +
 arch/x86/kernel/irq.c      |    2 +
 arch/x86/kernel/nmi.c      |   47 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index fc74547..a4f1945 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -24,6 +24,8 @@ void arch_trigger_all_cpu_backtrace(void);
 
 #define NMI_FLAG_FIRST	1
 
+void arch_show_nmi(struct seq_file *p, int prec);
+
 enum {
 	NMI_LOCAL=0,
 	NMI_UNKNOWN,
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 6c0802e..44d1cac 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -16,6 +16,7 @@
 #include <asm/idle.h>
 #include <asm/mce.h>
 #include <asm/hw_irq.h>
+#include <asm/nmi.h>
 
 atomic_t irq_err_count;
 
@@ -55,6 +56,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 	for_each_online_cpu(j)
 		seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
 	seq_printf(p, "  Non-maskable interrupts\n");
+	arch_show_nmi(p, prec);
 #ifdef CONFIG_X86_LOCAL_APIC
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 260a275..63fe8f3 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -363,3 +363,50 @@ void restart_nmi(void)
 {
 	ignore_nmis--;
 }
+
+void arch_show_nmi(struct seq_file *p, int prec)
+{
+	int j;
+	struct nmiaction *action;
+
+	seq_printf(p, "%*s: ", prec, "SWA");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", per_cpu(nmi_stats.swallow, j));
+	seq_printf(p, "  Unknown NMIs swallowed\n");
+
+	seq_printf(p, "%*s: ", prec, "  0");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", per_cpu(nmi_stats.normal, j));
+	seq_printf(p, "  NMI");
+	action = (nmi_to_desc(NMI_LOCAL))->head;
+	if (action) {
+		seq_printf(p, "\t%s", action->name);
+		while ((action = action->next) != NULL)
+			seq_printf(p, ", %s", action->name);
+	}
+	seq_putc(p, '\n');
+
+	seq_printf(p, "%*s: ", prec, "UNK");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", per_cpu(nmi_stats.unknown, j));
+	seq_printf(p, "  NMI");
+	action = (nmi_to_desc(NMI_UNKNOWN))->head;
+	if (action) {
+		seq_printf(p, "\t%s", action->name);
+		while ((action = action->next) != NULL)
+			seq_printf(p, ", %s", action->name);
+	}
+	seq_putc(p, '\n');
+
+	seq_printf(p, "%*s: ", prec, "EXT");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", per_cpu(nmi_stats.external, j));
+	seq_printf(p, "  NMI");
+	action = (nmi_to_desc(NMI_EXTERNAL))->head;
+	if (action) {
+		seq_printf(p, "\t%s", action->name);
+		while ((action = action->next) != NULL)
+			seq_printf(p, ", %s", action->name);
+	}
+	seq_putc(p, '\n');
+}
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-19 20:37 ` [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines Don Zickus
@ 2011-08-22 14:13   ` Peter Zijlstra
  2011-08-22 15:21     ` Don Zickus
  2011-08-22 14:16   ` Peter Zijlstra
  2011-08-24 17:04   ` Paul E. McKenney
  2 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2011-08-22 14:13 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Fri, 2011-08-19 at 16:37 -0400, Don Zickus wrote:
> +int register_nmi_handler(unsigned int type, nmi_handler_t handler,
> +                       unsigned long nmiflags, const char *devname)
> +{
> +       struct nmiaction *action;
> +       int retval;
> +
> +       if (!handler)
> +               return -EINVAL;
> +
> +       action = kzalloc(sizeof(struct nmiaction), GFP_KERNEL);
> +       if (!action)
> +               return -ENOMEM;
> +
> +       action->handler = handler;
> +       action->flags = nmiflags;
> +       action->name = devname; 

So you assume the string passed as devname remains stable..


> +
> +       retval = __setup_nmi(type, action);
> +
> +       if (retval)
> +               kfree(action);
> +       
> +       return retval;
> +}
> +EXPORT_SYMBOL(register_nmi_handler);

Also, I'd use _GPL, the current register_die_notifier() is also.
Preferably I'd not export NMI stuff at all.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-19 20:37 ` [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines Don Zickus
  2011-08-22 14:13   ` Peter Zijlstra
@ 2011-08-22 14:16   ` Peter Zijlstra
  2011-08-22 15:23     ` Don Zickus
  2011-08-23 14:14     ` Don Zickus
  2011-08-24 17:04   ` Paul E. McKenney
  2 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2011-08-22 14:16 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Fri, 2011-08-19 at 16:37 -0400, Don Zickus wrote:
> +static struct nmiaction *__free_nmi(unsigned int type, const char *name)
> +{
> +       struct nmi_desc *desc = nmi_to_desc(type);
> +       struct nmiaction *n, **np = &(desc->head);
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&desc->lock, flags);
> +
...
> +
> +       spin_unlock_irqrestore(&desc->lock, flags);
> +       synchronize_rcu();
> +       return *np;
> +}

> +void unregister_nmi_handler(unsigned int type, const char *name)
> +{
> +       kfree(__free_nmi(type, name));
> +}

This code is weird.. why not have the kfree() in __free_nmi(), also why
use sync_rcu() and not use kfree_rcu()?

> +EXPORT_SYMBOL_GPL(unregister_nmi_handler); 

*g*, so binary modules can register an NMI handler, but can't unregister
it..?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-08-19 20:37 ` [RFC][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs Don Zickus
@ 2011-08-22 14:22   ` Peter Zijlstra
  2011-08-22 15:25     ` Don Zickus
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2011-08-22 14:22 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Fri, 2011-08-19 at 16:37 -0400, Don Zickus wrote:
> @@ -260,6 +260,8 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
>         pr_emerg("Dazed and confused, but trying to continue\n");
>  }
>  
> +DEFINE_PER_CPU(bool, swallow_nmi);
> +
>  static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
>  {
>         unsigned char reason = 0;
> @@ -271,8 +273,28 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
>          * NMI can not be detected/processed on other CPUs.
>          */
>         handled = nmi_handle(NMI_LOCAL, regs);
> -       if (handled)
> +       if (handled) {
> +               /*
> +                * When handling multiple NMI events, we are not
> +                * sure if the second NMI was dropped (because of
> +                * too many NMIs), piggy-backed on the same NMI
> +                * (perf) or is queued right behind this NMI.
> +                * In the last case, we may accidentally get an
> +                * unknown NMI because the event is already handled.
> +                * Flag for this condition and swallow it later.
> +                *
> +                * FIXME: This detection has holes in it mainly
> +                * because we can't tell _when_ the next NMI comes
> +                * in.  A multi-handled NMI event followed by an 
> +                * unknown NMI a second later, clearly should not
> +                * be swallowed.
> +                */
> +               if (handled > 1)
> +                       __this_cpu_write(swallow_nmi, true);
> +               else
> +                       __this_cpu_write(swallow_nmi, false);
>                 return;
> +       }
>  
>         /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
>         raw_spin_lock(&nmi_reason_lock);
> @@ -296,6 +318,8 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
>         raw_spin_unlock(&nmi_reason_lock);
>  
>         unknown_nmi_error(reason, regs);
> +
> +       __this_cpu_write(swallow_nmi, false);
>  } 

All writes, no reads... the actual dropping of NMIs got lost and now
lives in patch 5 which purports to be about statistics only.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts
  2011-08-19 20:37 ` [RFC][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts Don Zickus
@ 2011-08-22 14:27   ` Peter Zijlstra
  2011-08-22 15:28     ` Don Zickus
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2011-08-22 14:27 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Fri, 2011-08-19 at 16:37 -0400, Don Zickus wrote:
> NMI:       4161       4155        158       4194   Non-maskable interrupts
> SWA:          0          0          0          0   Unknown NMIs swallowed
>   0:       4161       4155        158       4194   NMI  PMI, arch_bt
> UNK:          0          0          0          0   NMI
> EXT:          0          0          0          0   NMI 

Yeah, not too pretty.. I would have expected something like:

NMI:
 PMI:
 uv:
 arch_bt:
 ...
 swallowed:
 unknown:

But then, I'm sure some people would complain about that, as I'm pretty sure
it'll break stuff parsing /proc/interrupts.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-22 14:13   ` Peter Zijlstra
@ 2011-08-22 15:21     ` Don Zickus
  2011-08-22 15:26       ` Peter Zijlstra
  2011-08-22 15:31       ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-22 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Mon, Aug 22, 2011 at 04:13:49PM +0200, Peter Zijlstra wrote:
> On Fri, 2011-08-19 at 16:37 -0400, Don Zickus wrote:
> > +int register_nmi_handler(unsigned int type, nmi_handler_t handler,
> > +                       unsigned long nmiflags, const char *devname)
> > +{
> > +       struct nmiaction *action;
> > +       int retval;
> > +
> > +       if (!handler)
> > +               return -EINVAL;
> > +
> > +       action = kzalloc(sizeof(struct nmiaction), GFP_KERNEL);
> > +       if (!action)
> > +               return -ENOMEM;
> > +
> > +       action->handler = handler;
> > +       action->flags = nmiflags;
> > +       action->name = devname; 
> 
> So you assume the string passed as devname remains stable..

yeah, I just kinda copied the irq stuff, thinking if it works there, it
should work here.  It does seem odd, but I can change that to strncpy to
be safer.

> 
> 
> > +
> > +       retval = __setup_nmi(type, action);
> > +
> > +       if (retval)
> > +               kfree(action);
> > +       
> > +       return retval;
> > +}
> > +EXPORT_SYMBOL(register_nmi_handler);
> 
> Also, I'd use _GPL, the current register_die_notifier() is also.
> Preferably I'd not export NMI stuff at all.

Sure.  Well there are a few drivers that want to register.  Unless you
want to make a rule that all nmi handlers have to be built-in?

Cheers,
Don

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-22 14:16   ` Peter Zijlstra
@ 2011-08-22 15:23     ` Don Zickus
  2011-08-23 14:14     ` Don Zickus
  1 sibling, 0 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-22 15:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Mon, Aug 22, 2011 at 04:16:20PM +0200, Peter Zijlstra wrote:
> On Fri, 2011-08-19 at 16:37 -0400, Don Zickus wrote:
> > +static struct nmiaction *__free_nmi(unsigned int type, const char *name)
> > +{
> > +       struct nmi_desc *desc = nmi_to_desc(type);
> > +       struct nmiaction *n, **np = &(desc->head);
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&desc->lock, flags);
> > +
> ...
> > +
> > +       spin_unlock_irqrestore(&desc->lock, flags);
> > +       synchronize_rcu();
> > +       return *np;
> > +}
> 
> > +void unregister_nmi_handler(unsigned int type, const char *name)
> > +{
> > +       kfree(__free_nmi(type, name));
> > +}
> 
> This code is weird.. why not have the kfree() in __free_nmi(), also why
> use sync_rcu() and not use kfree_rcu()?

It's a mixture of copying the notifier code and not fully understanding
the rcu options out there.  I'll look into kfree_rcu and try to clean this
up.

> 
> > +EXPORT_SYMBOL_GPL(unregister_nmi_handler); 
> 
> *g*, so binary modules can register an NMI handler, but can't unregister
> it..?

heh.  I suck.  Sorry about that.

Cheers,
Don


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-08-22 14:22   ` Peter Zijlstra
@ 2011-08-22 15:25     ` Don Zickus
  0 siblings, 0 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-22 15:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Mon, Aug 22, 2011 at 04:22:15PM +0200, Peter Zijlstra wrote:
> On Fri, 2011-08-19 at 16:37 -0400, Don Zickus wrote:
> > @@ -260,6 +260,8 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
> >         pr_emerg("Dazed and confused, but trying to continue\n");
> >  }
> >  
> > +DEFINE_PER_CPU(bool, swallow_nmi);
> > +
> >  static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> >  {
> >         unsigned char reason = 0;
> > @@ -271,8 +273,28 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> >          * NMI can not be detected/processed on other CPUs.
> >          */
> >         handled = nmi_handle(NMI_LOCAL, regs);
> > -       if (handled)
> > +       if (handled) {
> > +               /*
> > +                * When handling multiple NMI events, we are not
> > +                * sure if the second NMI was dropped (because of
> > +                * too many NMIs), piggy-backed on the same NMI
> > +                * (perf) or is queued right behind this NMI.
> > +                * In the last case, we may accidentally get an
> > +                * unknown NMI because the event is already handled.
> > +                * Flag for this condition and swallow it later.
> > +                *
> > +                * FIXME: This detection has holes in it mainly
> > +                * because we can't tell _when_ the next NMI comes
> > +                * in.  A multi-handled NMI event followed by an 
> > +                * unknown NMI a second later, clearly should not
> > +                * be swallowed.
> > +                */
> > +               if (handled > 1)
> > +                       __this_cpu_write(swallow_nmi, true);
> > +               else
> > +                       __this_cpu_write(swallow_nmi, false);
> >                 return;
> > +       }
> >  
> >         /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> >         raw_spin_lock(&nmi_reason_lock);
> > @@ -296,6 +318,8 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> >         raw_spin_unlock(&nmi_reason_lock);
> >  
> >         unknown_nmi_error(reason, regs);
> > +
> > +       __this_cpu_write(swallow_nmi, false);
> >  } 
> 
> All writes, no reads... the actual dropping of NMIs got lost and now
> lives in patch 5 which purports to be about statistics only.

Oops.  I screwed up when breaking up the changes into multiple patches.
I'll fix that.  Thanks for catching that.

Cheers,
Don


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-22 15:21     ` Don Zickus
@ 2011-08-22 15:26       ` Peter Zijlstra
  2011-08-22 15:41         ` Don Zickus
  2011-08-22 15:31       ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2011-08-22 15:26 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Mon, 2011-08-22 at 11:21 -0400, Don Zickus wrote:
> 
> Sure.  Well there are a few drivers that want to register.  Unless you
> want to make a rule that all nmi handlers have to be built-in? 

Well, I'm all for that, but I guess I'm going to lose that particular
fight ;-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts
  2011-08-22 14:27   ` Peter Zijlstra
@ 2011-08-22 15:28     ` Don Zickus
  0 siblings, 0 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-22 15:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Mon, Aug 22, 2011 at 04:27:42PM +0200, Peter Zijlstra wrote:
> On Fri, 2011-08-19 at 16:37 -0400, Don Zickus wrote:
> > NMI:       4161       4155        158       4194   Non-maskable interrupts
> > SWA:          0          0          0          0   Unknown NMIs swallowed
> >   0:       4161       4155        158       4194   NMI  PMI, arch_bt
> > UNK:          0          0          0          0   NMI
> > EXT:          0          0          0          0   NMI 
> 
> Yeah, not too pretty.. I would have expected something like:
> 
> NMI:
>  PMI:
>  uv:
>  arch_bt:
>  ...
>  swallowed:
>  unknown:

Yeah, I kinda wanted to do that originally, but wasn't sure what the rules
were for the output structure of that file.  I even toyed with the idea of
creating a /proc/nmi file to do that too, but thought that it might confuse
people.

> 
> But then, I'm sure some people would complain to that as I'm pretty sure
> it'll break stuff parsing /proc/interrupts

That's why I just posted my half-assed patch:

- to see if the info is useful
- to gather ideas on the best way to present it

Cheers,
Don


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-22 15:21     ` Don Zickus
  2011-08-22 15:26       ` Peter Zijlstra
@ 2011-08-22 15:31       ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2011-08-22 15:31 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Mon, 2011-08-22 at 11:21 -0400, Don Zickus wrote:
> It does seem odd, but I can change that to strncpy to
> be safer. 

I think kstrdup() exists for this purpose, don't forget to free it when
you're done with it though ;-)
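
A minimal sketch of that variant, applied to the registration code
quoted earlier (error handling abbreviated; the duplicated name then
has to be freed on unregister as well):

	action->name = kstrdup(devname, GFP_KERNEL);
	if (!action->name) {
		kfree(action);
		return -ENOMEM;
	}

	/* ... and in unregister_nmi_handler(), once unlinked: */
	kfree(action->name);
	kfree(action);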

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-22 15:26       ` Peter Zijlstra
@ 2011-08-22 15:41         ` Don Zickus
  0 siblings, 0 replies; 26+ messages in thread
From: Don Zickus @ 2011-08-22 15:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Mon, Aug 22, 2011 at 05:26:52PM +0200, Peter Zijlstra wrote:
> On Mon, 2011-08-22 at 11:21 -0400, Don Zickus wrote:
> > 
> > Sure.  Well there are a few drivers that want to register.  Unless you
> > want to make a rule that all nmi handlers have to be built-in? 
> 
> Well, I'm all for that, but I guess I'm going to lose that particular
> fight ;-)

:-)  I don't care either way, but from a technical point of view, you
would have trouble getting oprofile as a built-in, I would think.

Cheers,
Don


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-22 14:16   ` Peter Zijlstra
  2011-08-22 15:23     ` Don Zickus
@ 2011-08-23 14:14     ` Don Zickus
  2011-08-23 14:17       ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Don Zickus @ 2011-08-23 14:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Mon, Aug 22, 2011 at 04:16:20PM +0200, Peter Zijlstra wrote:
> On Fri, 2011-08-19 at 16:37 -0400, Don Zickus wrote:
> > +static struct nmiaction *__free_nmi(unsigned int type, const char *name)
> > +{
> > +       struct nmi_desc *desc = nmi_to_desc(type);
> > +       struct nmiaction *n, **np = &(desc->head);
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&desc->lock, flags);
> > +
> ...
> > +
> > +       spin_unlock_irqrestore(&desc->lock, flags);
> > +       synchronize_rcu();
> > +       return *np;
> > +}
> 
> > +void unregister_nmi_handler(unsigned int type, const char *name)
> > +{
> > +       kfree(__free_nmi(type, name));
> > +}
> 
> This code is weird.. why not have the kfree() in __free_nmi(), also why
> use sync_rcu() and not use kfree_rcu()?

I was looking at trying to use kfree_rcu and noticed I would have to add
another element to the nmiaction struct and another function as a callback
to kfree the memory from the device name.  The overhead didn't seem worth
it.  For some reason it just seems simpler to keep it the way it is and
just have

	struct nmiaction *a;

	a = __free_nmi(type, name);
	if (a) {
		kfree(a->name);
		kfree(a);
	}

(side note:  I was keeping the kfree()s in here for symmetry with the
registration handler).

Maybe I don't understand kfree_rcu, but what advantage do I have by
placing rcu_head into nmiaction that never gets used except on unregister
(which rarely happens to begin with)?
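
For reference, a rough sketch of the kfree_rcu() variant under
discussion, with the extra rcu_head Don mentions; note it would still
not free a separately kstrdup()'d name, which is the objection above:

	struct nmiaction {
		struct nmiaction __rcu *next;
		struct rcu_head rcu;	/* only used at unregister time */
		nmi_handler_t handler;
		unsigned int flags;
		const char *name;
	};

	/* unregister side: unlink n under desc->lock, then instead of
	 * synchronize_rcu() + kfree(): */
	kfree_rcu(n, rcu);	/* frees n after a grace period elapses */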

Cheers,
Don

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-23 14:14     ` Don Zickus
@ 2011-08-23 14:17       ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2011-08-23 14:17 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, jason.wessel

On Tue, 2011-08-23 at 10:14 -0400, Don Zickus wrote:

> Maybe I don't understand kfree_rcu, but what advantage do I have by
> placing rcu_head into nmiaction that never gets used except on unregister
> (which rarely happens to begin with)?

you don't have to wait for sync_rcu(), but you're right, this is an
absolute slow path and we don't care.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-19 20:37 ` [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines Don Zickus
  2011-08-22 14:13   ` Peter Zijlstra
  2011-08-22 14:16   ` Peter Zijlstra
@ 2011-08-24 17:04   ` Paul E. McKenney
  2011-08-24 17:44     ` Don Zickus
  2 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2011-08-24 17:04 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang,
	LKML, jason.wessel

On Fri, Aug 19, 2011 at 04:37:42PM -0400, Don Zickus wrote:
> The NMI handlers used to rely on the notifier infrastructure.  This worked
> great until we wanted to support handling multiple events better.
> 
> One of the key ideas to the nmi handling is to process _all_ the handlers for
> each NMI.  The reason behind this switch is because NMIs are edge triggered.
> If enough NMIs are triggered, then they could be lost because the cpu can
> only latch at most one NMI (besides the one currently being processed).
> 
> In order to deal with this we have decided to process all the NMI handlers
> for each NMI.  This allows the handlers to determine if they received an
> event or not (the ones that can not determine this will be left to fend
> for themselves on the unknown NMI list).
> 
> As a result of this change it is now possible to have an extra NMI that
> was destined to be received for an already processed event.  Because the
> event was processed in the previous NMI, this NMI gets dropped and becomes
> an 'unknown' NMI.  This of course will cause printks that scare people.
> 
> However, we prefer to have extra NMIs as opposed to losing NMIs and as such
> have developed a basic mechanism to catch most of them.  That will be
> a later patch.
> 
> To accomplish this idea, I unhooked the nmi handlers from the notifier
> routines and created a new mechanism loosely based on doIRQ.  The reason
> for this is the notifier routines have a couple of shortcomings.  One, we
> couldn't guarantee all future NMI handlers used NOTIFY_OK instead of
> NOTIFY_STOP.  Second, we couldn't keep track of the number of events being
> handled in each routine (most only handle one, perf can handle more than one).
> Third, I wanted to eventually display which nmi handlers are registered in
> the system in /proc/interrupts to help see who is generating NMIs.
> 
> The patch below just implements the new infrastructure but doesn't wire it up
> yet (that is the next patch).  Its design is based on doIRQ structs and the
> atomic notifier routines.  So the rcu stuff in the patch isn't entirely untested
> (as the notifier routines have soaked it) but it should be double checked in
> case I copied the code wrong.

One comment below.

							Thanx, Paul

> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
>  arch/x86/include/asm/nmi.h |   19 ++++++
>  arch/x86/kernel/nmi.c      |  139 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 158 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
> index 4886a68..6d04b28 100644
> --- a/arch/x86/include/asm/nmi.h
> +++ b/arch/x86/include/asm/nmi.h
> @@ -42,6 +42,25 @@ void arch_trigger_all_cpu_backtrace(void);
>  #define NMI_LOCAL_NORMAL_PRIOR	(NMI_LOCAL_BIT | NMI_NORMAL_PRIOR)
>  #define NMI_LOCAL_LOW_PRIOR	(NMI_LOCAL_BIT | NMI_LOW_PRIOR)
> 
> +#define NMI_FLAG_FIRST	1
> +
> +enum {
> +	NMI_LOCAL=0,
> +	NMI_UNKNOWN,
> +	NMI_EXTERNAL,
> +	NMI_MAX
> +};
> +
> +#define NMI_DONE	0
> +#define NMI_HANDLED	1
> +
> +typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);
> +
> +int register_nmi_handler(unsigned int, nmi_handler_t, unsigned long,
> +			 const char *);
> +
> +void unregister_nmi_handler(unsigned int, const char *);
> +
>  void stop_nmi(void);
>  void restart_nmi(void);
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 68d758a..dfc46a8 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -13,6 +13,9 @@
>  #include <linux/kprobes.h>
>  #include <linux/kdebug.h>
>  #include <linux/nmi.h>
> +#include <linux/delay.h>
> +#include <linux/hardirq.h>
> +#include <linux/slab.h>
> 
>  #if defined(CONFIG_EDAC)
>  #include <linux/edac.h>
> @@ -21,6 +24,27 @@
>  #include <linux/atomic.h>
>  #include <asm/traps.h>
>  #include <asm/mach_traps.h>
> +#include <asm/nmi.h>
> +
> +struct nmiaction {
> +	struct nmiaction __rcu *next;
> +	nmi_handler_t handler;
> +	unsigned int flags;
> +	const char *name;
> +};
> +
> +struct nmi_desc {
> +	spinlock_t lock;
> +	struct nmiaction __rcu *head;
> +};
> +
> +static struct nmi_desc nmi_desc[NMI_MAX] = 
> +{
> +	{	.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[0].lock), },
> +	{	.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[1].lock), },
> +	{	.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[2].lock), },
> +
> +};
> 
>  static int ignore_nmis;
> 
> @@ -38,6 +62,121 @@ static int __init setup_unknown_nmi_panic(char *str)
>  }
>  __setup("unknown_nmi_panic", setup_unknown_nmi_panic);
> 
> +#define nmi_to_desc(type) (&nmi_desc[type])
> +
> +static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
> +{
> +	struct nmi_desc *desc = nmi_to_desc(type);
> +	struct nmiaction *next_a, *a, **ap = &desc->head;
> +	int handled=0;
> +
> +	rcu_read_lock();
> +	a = rcu_dereference_raw(*ap);

The reason for rcu_dereference_raw() is to prevent lockdep from choking
due to being called from an NMI handler, correct?  If so, please add a
comment to this effect on this and similar uses.

> +
> +	/*
> +	 * NMIs are edge-triggered, which means if you have enough
> +	 * of them concurrently, you can lose some because only one
> +	 * can be latched at any given time.  Walk the whole list
> +	 * to handle those situations.
> +	 */
> +	while (a) {
> +		next_a = rcu_dereference_raw(a->next);
> +
> +		handled += a->handler(type, regs);
> +
> +		a = next_a;
> +	}
> +	rcu_read_unlock();
> +
> +	/* return total number of NMI events handled */
> +	return handled;
> +}
> +
> +static int __setup_nmi(unsigned int type, struct nmiaction *action)
> +{
> +	struct nmi_desc *desc = nmi_to_desc(type);
> +	struct nmiaction **a = &(desc->head);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&desc->lock, flags);
> +
> +	/*
> +	 * some handlers need to be executed first otherwise a fake
> +	 * event confuses some handlers (kdump uses this flag)
> +	 */
> +	if (!(action->flags & NMI_FLAG_FIRST))
> +		while ((*a) != NULL)
> +			a = &((*a)->next);
> +	
> +	action->next = *a;
> +	rcu_assign_pointer(*a, action);
> +
> +	spin_unlock_irqrestore(&desc->lock, flags);
> +	return 0;
> +}
> +
> +static struct nmiaction *__free_nmi(unsigned int type, const char *name)
> +{
> +	struct nmi_desc *desc = nmi_to_desc(type);
> +	struct nmiaction *n, **np = &(desc->head);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&desc->lock, flags);
> +
> +	while ((*np) != NULL) {
> +		n = *np;
> +
> +		/*
> +		 * the name passed in to describe the nmi handler
> +		 * is used as the lookup key
> +		 */
> +		if (!strcmp(n->name, name)) {
> +			WARN(in_nmi(),
> +				"Trying to free NMI (%s) from NMI context!\n", n->name);
> +			rcu_assign_pointer(*np, n->next);
> +			break;
> +		}
> +		np = &(n->next);
> +	}
> +
> +	spin_unlock_irqrestore(&desc->lock, flags);
> +	synchronize_rcu();
> +	return *np;
> +}
> +
> +int register_nmi_handler(unsigned int type, nmi_handler_t handler,
> +			unsigned long nmiflags, const char *devname)
> +{
> +	struct nmiaction *action;
> +	int retval;
> +
> +	if (!handler)
> +		return -EINVAL;
> +
> +	action = kzalloc(sizeof(struct nmiaction), GFP_KERNEL);
> +	if (!action)
> +		return -ENOMEM;
> +
> +	action->handler = handler;
> +	action->flags = nmiflags;
> +	action->name = devname;
> +
> +	retval = __setup_nmi(type, action);
> +
> +	if (retval)
> +		kfree(action);
> +	
> +	return retval;
> +}
> +EXPORT_SYMBOL(register_nmi_handler);
> +
> +void unregister_nmi_handler(unsigned int type, const char *name)
> +{
> +	kfree(__free_nmi(type, name));
> +}
> +
> +EXPORT_SYMBOL_GPL(unregister_nmi_handler);
> +
>  static notrace __kprobes void
>  pci_serr_error(unsigned char reason, struct pt_regs *regs)
>  {
> -- 
> 1.7.6
> 
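
As a usage sketch of the API reviewed above (the driver name, handler
body, and mydev_*() helpers are hypothetical; the handler sits on the
NMI_LOCAL list and reports whether it claimed an event):

	static int mydev_nmi_handler(unsigned int type, struct pt_regs *regs)
	{
		if (!mydev_caused_this_nmi())
			return NMI_DONE;	/* not ours, keep walking the list */

		mydev_ack_event();
		return NMI_HANDLED;
	}

	/* at probe time (return value check omitted): */
	register_nmi_handler(NMI_LOCAL, mydev_nmi_handler, 0, "mydev");

	/* at remove time, looked up by the same name: */
	unregister_nmi_handler(NMI_LOCAL, "mydev");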

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-24 17:04   ` Paul E. McKenney
@ 2011-08-24 17:44     ` Don Zickus
  2011-08-24 17:51       ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Don Zickus @ 2011-08-24 17:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang,
	LKML, jason.wessel

On Wed, Aug 24, 2011 at 10:04:11AM -0700, Paul E. McKenney wrote:
> On Fri, Aug 19, 2011 at 04:37:42PM -0400, Don Zickus wrote:
> > The NMI handlers used to rely on the notifier infrastructure.  This worked
> > great until we wanted to support handling multiple events better.
> > 
> > One of the key ideas to the nmi handling is to process _all_ the handlers for
> > each NMI.  The reason behind this switch is because NMIs are edge triggered.
> > If enough NMIs are triggered, then they could be lost because the cpu can
> > only latch at most one NMI (besides the one currently being processed).
> > 
> > In order to deal with this we have decided to process all the NMI handlers
> > for each NMI.  This allows the handlers to determine if they received an
> > event or not (the ones that can not determine this will be left to fend
> > for themselves on the unknown NMI list).
> > 
> > As a result of this change it is now possible to have an extra NMI that
> > was destined to be received for an already processed event.  Because the
> > event was processed in the previous NMI, this NMI gets dropped and becomes
> > an 'unknown' NMI.  This of course will cause printks that scare people.
> > 
> > However, we prefer to have extra NMIs as opposed to losing NMIs and as such
> > have developed a basic mechanism to catch most of them.  That will be
> > a later patch.
> > 
> > To accomplish this idea, I unhooked the nmi handlers from the notifier
> > routines and created a new mechanism loosely based on doIRQ.  The reason
> > for this is the notifier routines have a couple of shortcomings.  One we
> > could't guarantee all future NMI handlers used NOTIFY_OK instead of
> > NOTIFY_STOP.  Second, we couldn't keep track of the number of events being
> > handled in each routine (most only handle one, perf can handle more than one).
> > Third, I wanted to eventually display which nmi handlers are registered in
> > the system in /proc/interrupts to help see who is generating NMIs.
> > 
> > The patch below just implements the new infrastructure but doesn't wire it up
> > yet (that is the next patch).  Its design is based on doIRQ structs and the
> > atomic notifier routines.  So the rcu stuff in the patch isn't entirely untested
> > (as the notifier routines have soaked it) but it should be double checked in
> > case I copied the code wrong.
> 
> One comment below.
> 
> 							Thanx, Paul
> 
> > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > ---
> >  arch/x86/include/asm/nmi.h |   19 ++++++
> >  arch/x86/kernel/nmi.c      |  139 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 158 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
> > index 4886a68..6d04b28 100644
> > --- a/arch/x86/include/asm/nmi.h
> > +++ b/arch/x86/include/asm/nmi.h
> > @@ -42,6 +42,25 @@ void arch_trigger_all_cpu_backtrace(void);
> >  #define NMI_LOCAL_NORMAL_PRIOR	(NMI_LOCAL_BIT | NMI_NORMAL_PRIOR)
> >  #define NMI_LOCAL_LOW_PRIOR	(NMI_LOCAL_BIT | NMI_LOW_PRIOR)
> > 
> > +#define NMI_FLAG_FIRST	1
> > +
> > +enum {
> > +	NMI_LOCAL=0,
> > +	NMI_UNKNOWN,
> > +	NMI_EXTERNAL,
> > +	NMI_MAX
> > +};
> > +
> > +#define NMI_DONE	0
> > +#define NMI_HANDLED	1
> > +
> > +typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);
> > +
> > +int register_nmi_handler(unsigned int, nmi_handler_t, unsigned long,
> > +			 const char *);
> > +
> > +void unregister_nmi_handler(unsigned int, const char *);
> > +
> >  void stop_nmi(void);
> >  void restart_nmi(void);
> > 
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 68d758a..dfc46a8 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -13,6 +13,9 @@
> >  #include <linux/kprobes.h>
> >  #include <linux/kdebug.h>
> >  #include <linux/nmi.h>
> > +#include <linux/delay.h>
> > +#include <linux/hardirq.h>
> > +#include <linux/slab.h>
> > 
> >  #if defined(CONFIG_EDAC)
> >  #include <linux/edac.h>
> > @@ -21,6 +24,27 @@
> >  #include <linux/atomic.h>
> >  #include <asm/traps.h>
> >  #include <asm/mach_traps.h>
> > +#include <asm/nmi.h>
> > +
> > +struct nmiaction {
> > +	struct nmiaction __rcu *next;
> > +	nmi_handler_t handler;
> > +	unsigned int flags;
> > +	const char *name;
> > +};
> > +
> > +struct nmi_desc {
> > +	spinlock_t lock;
> > +	struct nmiaction __rcu *head;
> > +};
> > +
> > +static struct nmi_desc nmi_desc[NMI_MAX] = 
> > +{
> > +	{	.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[0].lock), },
> > +	{	.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[1].lock), },
> > +	{	.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[2].lock), },
> > +
> > +};
> > 
> >  static int ignore_nmis;
> > 
> > @@ -38,6 +62,121 @@ static int __init setup_unknown_nmi_panic(char *str)
> >  }
> >  __setup("unknown_nmi_panic", setup_unknown_nmi_panic);
> > 
> > +#define nmi_to_desc(type) (&nmi_desc[type])
> > +
> > +static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
> > +{
> > +	struct nmi_desc *desc = nmi_to_desc(type);
> > +	struct nmiaction *next_a, *a, **ap = &desc->head;
> > +	int handled=0;
> > +
> > +	rcu_read_lock();
> > +	a = rcu_dereference_raw(*ap);
> 
> The reason for rcu_dereference_raw() is to prevent lockdep from choking
> due to being called from an NMI handler, correct?  If so, please add a
> comment to this effect on this and similar uses.

That sounds right.  But honestly, I just copied what notifier_call_chain
had.  Regardless, I will make sure to document that in my next version.
Thanks!

Cheers,
Don

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-24 17:44     ` Don Zickus
@ 2011-08-24 17:51       ` Peter Zijlstra
  2011-08-24 18:16         ` Don Zickus
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2011-08-24 17:51 UTC (permalink / raw)
  To: Don Zickus
  Cc: Paul E. McKenney, x86, Andi Kleen, Robert Richter, ying.huang,
	LKML, jason.wessel

On Wed, 2011-08-24 at 13:44 -0400, Don Zickus wrote:
> > > +   rcu_read_lock();
> > > +   a = rcu_dereference_raw(*ap);
> > 
> > The reason for rcu_dereference_raw() is to prevent lockdep from choking
> > due to being called from an NMI handler, correct?  If so, please add a
> > comment to this effect on this and similar uses.
> 
> That sounds right.  But honestly, I just copied what notifier_call_chain
> had.  Regardless, I will make sure to document that in my next version.
> Thanks! 

Not quite right, nmi_enter() does lockdep_disable() and makes
lock_is_held() always return true.

I think this (and the other sites) could do with rcu_dereference_check(,
lockdep_is_held(&desc->lock)); not that it would be anything but
documentation, since the actual test isn't working from NMI context, but I
do think it's worth it for that alone.
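
Concretely, the annotated form being suggested would look something
like this (documentation only from NMI context, as noted):

	a = rcu_dereference_check(desc->head,
				  lockdep_is_held(&desc->lock));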



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-24 17:51       ` Peter Zijlstra
@ 2011-08-24 18:16         ` Don Zickus
  2011-08-24 18:19           ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Don Zickus @ 2011-08-24 18:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, x86, Andi Kleen, Robert Richter, ying.huang,
	LKML, jason.wessel

On Wed, Aug 24, 2011 at 07:51:00PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-08-24 at 13:44 -0400, Don Zickus wrote:
> > > > +   rcu_read_lock();
> > > > +   a = rcu_dereference_raw(*ap);
> > > 
> > > The reason for rcu_dereference_raw() is to prevent lockdep from choking
> > > due to being called from an NMI handler, correct?  If so, please add a
> > > comment to this effect on this and similar uses.
> > 
> > That sounds right.  But honestly, I just copied what notifier_call_chain
> > had.  Regardless, I will make sure to document that in my next version.
> > Thanks! 
> 
> Not quite right, nmi_enter() does lockdep_disable() and makes
> lock_is_held() always return true.
> 
> I think this (and the other sites) could do with rcu_dereference_check(,
> lockdep_is_held(&desc->lock)); not that it would be anything but
> documentation, since the actual test isn't working from NMI context, but I
> do think it's worth it for that alone.

So you want me to remove the _raw part of the dereference?  I can test
that with lockdep enabled to verify things don't go splat.

Cheers,
Don

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-24 18:16         ` Don Zickus
@ 2011-08-24 18:19           ` Peter Zijlstra
  2011-08-24 19:16             ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2011-08-24 18:19 UTC (permalink / raw)
  To: Don Zickus
  Cc: Paul E. McKenney, x86, Andi Kleen, Robert Richter, ying.huang,
	LKML, jason.wessel

On Wed, 2011-08-24 at 14:16 -0400, Don Zickus wrote:
> On Wed, Aug 24, 2011 at 07:51:00PM +0200, Peter Zijlstra wrote:
> > On Wed, 2011-08-24 at 13:44 -0400, Don Zickus wrote:
> > > > > +   rcu_read_lock();
> > > > > +   a = rcu_dereference_raw(*ap);
> > > > 
> > > > The reason for rcu_dereference_raw() is to prevent lockdep from choking
> > > > due to being called from an NMI handler, correct?  If so, please add a
> > > > comment to this effect on this and similar uses.
> > > 
> > > That sounds right.  But honestly, I just copied what notifier_call_chain
> > > had.  Regardless, I will make sure to document that in my next version.
> > > Thanks! 
> > 
> > Not quite right, nmi_enter() does lockdep_disable() and makes
> > lock_is_held() always return true.
> > 
> > I think this (and the other sites) could do with rcu_dereference_check(,
> > lockdep_is_held(&desc->lock)); not that it would be anything but
> > documentation, since the actual test isn't working from NMI context, but I
> > do think it's worth it for that alone.
> 
> So you want me to remove the _raw part of the dereference?  I can test
> that with lockdep enabled to verify things don't go splat.

Ah, right, it's never used from the desc->lock context and we always hold
rcu_read_lock(), so a simple rcu_dereference() should indeed suffice. 
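
With that conclusion, a sketch of the shape the walk in nmi_handle()
would take with plain rcu_dereference():

	rcu_read_lock();
	a = rcu_dereference(desc->head);
	while (a) {
		/* unregister waits out a grace period, so 'a' and
		 * a->next stay valid inside this read-side section */
		handled += a->handler(type, regs);
		a = rcu_dereference(a->next);
	}
	rcu_read_unlock();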

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines
  2011-08-24 18:19           ` Peter Zijlstra
@ 2011-08-24 19:16             ` Paul E. McKenney
  0 siblings, 0 replies; 26+ messages in thread
From: Paul E. McKenney @ 2011-08-24 19:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Don Zickus, x86, Andi Kleen, Robert Richter, ying.huang, LKML,
	jason.wessel

On Wed, Aug 24, 2011 at 08:19:54PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-08-24 at 14:16 -0400, Don Zickus wrote:
> > On Wed, Aug 24, 2011 at 07:51:00PM +0200, Peter Zijlstra wrote:
> > > On Wed, 2011-08-24 at 13:44 -0400, Don Zickus wrote:
> > > > > > +   rcu_read_lock();
> > > > > > +   a = rcu_dereference_raw(*ap);
> > > > > 
> > > > > The reason for rcu_dereference_raw() is to prevent lockdep from choking
> > > > > due to being called from an NMI handler, correct?  If so, please add a
> > > > > comment to this effect on this and similar uses.
> > > > 
> > > > That sounds right.  But honestly, I just copied what notifier_call_chain
> > > > had.  Regardless, I will make sure to document that in my next version.
> > > > Thanks! 
> > > 
> > > Not quite right, nmi_enter() does lockdep_disable() and makes
> > > lock_is_held() return always true.
> > > 
> > > I think this (and the other sites) could do with rcu_dereference_check(,
> > > lockdep_is_held(&desc->lock)); not that it wouldn't be anything but
> > > documentation since the actual test isn't working from NMI context but I
> > > do think its worth it for that alone.
> > 
> > So you want me to remove the _raw part of the dereference?  I can test
> > that with lockdep enabled to verify things don't go splat.
> 
> Ah, right, it's never used from the desc->lock context and we always hold
> rcu_read_lock(), so a simple rcu_dereference() should indeed suffice. 

Even better!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread

Thread overview: 26+ messages
2011-08-19 20:37 [RFC][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
2011-08-19 20:37 ` [RFC][PATCH 1/6] x86, nmi: split out nmi from traps.c Don Zickus
2011-08-19 20:37 ` [RFC][PATCH 2/6] x86, nmi: create new NMI handler routines Don Zickus
2011-08-22 14:13   ` Peter Zijlstra
2011-08-22 15:21     ` Don Zickus
2011-08-22 15:26       ` Peter Zijlstra
2011-08-22 15:41         ` Don Zickus
2011-08-22 15:31       ` Peter Zijlstra
2011-08-22 14:16   ` Peter Zijlstra
2011-08-22 15:23     ` Don Zickus
2011-08-23 14:14     ` Don Zickus
2011-08-23 14:17       ` Peter Zijlstra
2011-08-24 17:04   ` Paul E. McKenney
2011-08-24 17:44     ` Don Zickus
2011-08-24 17:51       ` Peter Zijlstra
2011-08-24 18:16         ` Don Zickus
2011-08-24 18:19           ` Peter Zijlstra
2011-08-24 19:16             ` Paul E. McKenney
2011-08-19 20:37 ` [RFC][PATCH 3/6] x86, nmi: wire up NMI handlers to new routines Don Zickus
2011-08-19 20:37 ` [RFC][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs Don Zickus
2011-08-22 14:22   ` Peter Zijlstra
2011-08-22 15:25     ` Don Zickus
2011-08-19 20:37 ` [RFC][PATCH 5/6] x86, nmi: track NMI usage stats Don Zickus
2011-08-19 20:37 ` [RFC][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts Don Zickus
2011-08-22 14:27   ` Peter Zijlstra
2011-08-22 15:28     ` Don Zickus
