* [V6][PATCH 0/6] x86, nmi: new NMI handling routines
From: Don Zickus @ 2011-09-23 19:17 UTC
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, paulmck, avi, jeremy, Don Zickus

I had the pleasure of hosting Robert Richter at Red Hat in August.  One of
the issues he wanted to discuss was having the NMI handling routines execute
all the registered NMI handlers for each NMI, mainly for his AMD IBS work.
He also raised another good point: because of the way NMIs work, it is
possible to lose them if multiple NMIs happen at the same time.

As a result, we sat around and discussed how we could go about executing
all the NMI handlers for each NMI to ensure that we would not lose any events.

We decided the best way to do this would be to have the NMI handlers break
away from the notifier routines and create our own infrastructure.  This
would allow us to execute all the handlers without hacking up the notifier
code, and to easily track the number of events processed at a higher level
to deal with the new problem of extra NMIs.

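For illustration, here is roughly what the new interface looks like from a
handler's point of view (the names come from patch 2 of this series; the
device check is a hypothetical placeholder):

static int my_nmi_handler(unsigned int type, struct pt_regs *regs)
{
	/* hypothetical check: did my hardware raise this NMI? */
	if (!my_device_event_pending())
		return NMI_DONE;	/* not ours, let others look */

	/* ... service the event ... */
	return NMI_HANDLED;		/* report one handled event */
}

/* register on the local-CPU list; "mydev" is the lookup key */
register_nmi_handler(NMI_LOCAL, my_nmi_handler, 0, "mydev");
...
unregister_nmi_handler(NMI_LOCAL, "mydev");
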
I spent some time hacking and came up with this patch.  I tested it on my
core2quad machine, trying to enable all the NMI handlers I could, mainly
perf and kgdb (and oprofile too when perf was disabled).  Everything seems
to work correctly.  If people are ok with this approach, I'll try to test
it on more machines.

More details about the patch are in the individual changelogs.

I rebased this on top of tglx/perf/core [cba9bd22]; I hope that is the
right branch.

Don Zickus (6):
  x86, nmi: split out nmi from traps.c
  x86, nmi: create new NMI handler routines
  x86, nmi: wire up NMI handlers to new routines
  x86, nmi: add in logic to handle multiple events and unknown NMIs
  x86, nmi: track NMI usage stats
  x86, nmi: print out NMI stats in /proc/interrupts

 arch/x86/include/asm/nmi.h              |   39 ++--
 arch/x86/include/asm/reboot.h           |    2 +-
 arch/x86/kernel/Makefile                |    2 +-
 arch/x86/kernel/apic/hw_nmi.c           |   27 +--
 arch/x86/kernel/apic/x2apic_uv_x.c      |   20 +-
 arch/x86/kernel/cpu/mcheck/mce-inject.c |   20 +-
 arch/x86/kernel/cpu/mcheck/mce.c        |    3 -
 arch/x86/kernel/cpu/perf_event.c        |   62 +----
 arch/x86/kernel/crash.c                 |    5 +-
 arch/x86/kernel/irq.c                   |    2 +
 arch/x86/kernel/kgdb.c                  |   60 +++-
 arch/x86/kernel/nmi.c                   |  469 +++++++++++++++++++++++++++++++
 arch/x86/kernel/process_32.c            |    2 +
 arch/x86/kernel/process_64.c            |    2 +
 arch/x86/kernel/reboot.c                |   23 +-
 arch/x86/kernel/traps.c                 |  155 ----------
 arch/x86/oprofile/nmi_int.c             |   40 +--
 arch/x86/oprofile/nmi_timer_int.c       |   28 +--
 drivers/acpi/apei/ghes.c                |   22 +-
 drivers/char/ipmi/ipmi_watchdog.c       |   32 +--
 drivers/watchdog/hpwdt.c                |   23 +--
 21 files changed, 612 insertions(+), 426 deletions(-)
 create mode 100644 arch/x86/kernel/nmi.c

-- 
1.7.6



* [V6][PATCH 1/6] x86, nmi: split out nmi from traps.c
From: Don Zickus @ 2011-09-23 19:17 UTC
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, paulmck, avi, jeremy, Don Zickus

The NMI code is changing a lot and adding more functionality.  Split it
out from traps.c so it doesn't continue to pollute that file.

This makes it easier to find and expand all the future NMI-related work.

No real functional changes here.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/Makefile |    2 +-
 arch/x86/kernel/nmi.c    |  178 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c  |  155 ----------------------------------------
 3 files changed, 179 insertions(+), 156 deletions(-)
 create mode 100644 arch/x86/kernel/nmi.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 82f2912..8baca3c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -19,7 +19,7 @@ endif
 
 obj-y			:= process_$(BITS).o signal.o entry_$(BITS).o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
-obj-y			+= time.o ioport.o ldt.o dumpstack.o
+obj-y			+= time.o ioport.o ldt.o dumpstack.o nmi.o
 obj-y			+= setup.o x86_init.o i8259.o irqinit.o jump_label.o
 obj-$(CONFIG_IRQ_WORK)  += irq_work.o
 obj-y			+= probe_roms.o
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
new file mode 100644
index 0000000..68d758a
--- /dev/null
+++ b/arch/x86/kernel/nmi.c
@@ -0,0 +1,178 @@
+/*
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ *  Copyright (C) 2000, 2001, 2002 Andi Kleen, SuSE Labs
+ *
+ *  Pentium III FXSR, SSE support
+ *	Gareth Hughes <gareth@valinux.com>, May 2000
+ */
+
+/*
+ * Handle hardware traps and faults.
+ */
+#include <linux/spinlock.h>
+#include <linux/kprobes.h>
+#include <linux/kdebug.h>
+#include <linux/nmi.h>
+
+#if defined(CONFIG_EDAC)
+#include <linux/edac.h>
+#endif
+
+#include <linux/atomic.h>
+#include <asm/traps.h>
+#include <asm/mach_traps.h>
+
+static int ignore_nmis;
+
+int unknown_nmi_panic;
+/*
+ * Prevent NMI reason port (0x61) being accessed simultaneously, can
+ * only be used in NMI handler.
+ */
+static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
+
+static int __init setup_unknown_nmi_panic(char *str)
+{
+	unknown_nmi_panic = 1;
+	return 1;
+}
+__setup("unknown_nmi_panic", setup_unknown_nmi_panic);
+
+static notrace __kprobes void
+pci_serr_error(unsigned char reason, struct pt_regs *regs)
+{
+	pr_emerg("NMI: PCI system error (SERR) for reason %02x on CPU %d.\n",
+		 reason, smp_processor_id());
+
+	/*
+	 * On some machines, PCI SERR line is used to report memory
+	 * errors. EDAC makes use of it.
+	 */
+#if defined(CONFIG_EDAC)
+	if (edac_handler_set()) {
+		edac_atomic_assert_error();
+		return;
+	}
+#endif
+
+	if (panic_on_unrecovered_nmi)
+		panic("NMI: Not continuing");
+
+	pr_emerg("Dazed and confused, but trying to continue\n");
+
+	/* Clear and disable the PCI SERR error line. */
+	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_SERR;
+	outb(reason, NMI_REASON_PORT);
+}
+
+static notrace __kprobes void
+io_check_error(unsigned char reason, struct pt_regs *regs)
+{
+	unsigned long i;
+
+	pr_emerg(
+	"NMI: IOCK error (debug interrupt?) for reason %02x on CPU %d.\n",
+		 reason, smp_processor_id());
+	show_registers(regs);
+
+	if (panic_on_io_nmi)
+		panic("NMI IOCK error: Not continuing");
+
+	/* Re-enable the IOCK line, wait for a few seconds */
+	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
+	outb(reason, NMI_REASON_PORT);
+
+	i = 20000;
+	while (--i) {
+		touch_nmi_watchdog();
+		udelay(100);
+	}
+
+	reason &= ~NMI_REASON_CLEAR_IOCHK;
+	outb(reason, NMI_REASON_PORT);
+}
+
+static notrace __kprobes void
+unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
+{
+	if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
+			NOTIFY_STOP)
+		return;
+#ifdef CONFIG_MCA
+	/*
+	 * Might actually be able to figure out what the guilty party
+	 * is:
+	 */
+	if (MCA_bus) {
+		mca_handle_nmi();
+		return;
+	}
+#endif
+	pr_emerg("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
+		 reason, smp_processor_id());
+
+	pr_emerg("Do you have a strange power saving mode enabled?\n");
+	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
+		panic("NMI: Not continuing");
+
+	pr_emerg("Dazed and confused, but trying to continue\n");
+}
+
+static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
+{
+	unsigned char reason = 0;
+
+	/*
+	 * CPU-specific NMI must be processed before non-CPU-specific
+	 * NMI, otherwise we may lose it, because the CPU-specific
+	 * NMI can not be detected/processed on other CPUs.
+	 */
+	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
+		return;
+
+	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
+	raw_spin_lock(&nmi_reason_lock);
+	reason = get_nmi_reason();
+
+	if (reason & NMI_REASON_MASK) {
+		if (reason & NMI_REASON_SERR)
+			pci_serr_error(reason, regs);
+		else if (reason & NMI_REASON_IOCHK)
+			io_check_error(reason, regs);
+#ifdef CONFIG_X86_32
+		/*
+		 * Reassert NMI in case it became active
+		 * meanwhile as it's edge-triggered:
+		 */
+		reassert_nmi();
+#endif
+		raw_spin_unlock(&nmi_reason_lock);
+		return;
+	}
+	raw_spin_unlock(&nmi_reason_lock);
+
+	unknown_nmi_error(reason, regs);
+}
+
+dotraplinkage notrace __kprobes void
+do_nmi(struct pt_regs *regs, long error_code)
+{
+	nmi_enter();
+
+	inc_irq_stat(__nmi_count);
+
+	if (!ignore_nmis)
+		default_do_nmi(regs);
+
+	nmi_exit();
+}
+
+void stop_nmi(void)
+{
+	ignore_nmis++;
+}
+
+void restart_nmi(void)
+{
+	ignore_nmis--;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 6913369..a8e3eb8 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -81,15 +81,6 @@ gate_desc idt_table[NR_VECTORS] __page_aligned_data = { { { { 0, 0 } } }, };
 DECLARE_BITMAP(used_vectors, NR_VECTORS);
 EXPORT_SYMBOL_GPL(used_vectors);
 
-static int ignore_nmis;
-
-int unknown_nmi_panic;
-/*
- * Prevent NMI reason port (0x61) being accessed simultaneously, can
- * only be used in NMI handler.
- */
-static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
-
 static inline void conditional_sti(struct pt_regs *regs)
 {
 	if (regs->flags & X86_EFLAGS_IF)
@@ -307,152 +298,6 @@ gp_in_kernel:
 	die("general protection fault", regs, error_code);
 }
 
-static int __init setup_unknown_nmi_panic(char *str)
-{
-	unknown_nmi_panic = 1;
-	return 1;
-}
-__setup("unknown_nmi_panic", setup_unknown_nmi_panic);
-
-static notrace __kprobes void
-pci_serr_error(unsigned char reason, struct pt_regs *regs)
-{
-	pr_emerg("NMI: PCI system error (SERR) for reason %02x on CPU %d.\n",
-		 reason, smp_processor_id());
-
-	/*
-	 * On some machines, PCI SERR line is used to report memory
-	 * errors. EDAC makes use of it.
-	 */
-#if defined(CONFIG_EDAC)
-	if (edac_handler_set()) {
-		edac_atomic_assert_error();
-		return;
-	}
-#endif
-
-	if (panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
-
-	pr_emerg("Dazed and confused, but trying to continue\n");
-
-	/* Clear and disable the PCI SERR error line. */
-	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_SERR;
-	outb(reason, NMI_REASON_PORT);
-}
-
-static notrace __kprobes void
-io_check_error(unsigned char reason, struct pt_regs *regs)
-{
-	unsigned long i;
-
-	pr_emerg(
-	"NMI: IOCK error (debug interrupt?) for reason %02x on CPU %d.\n",
-		 reason, smp_processor_id());
-	show_registers(regs);
-
-	if (panic_on_io_nmi)
-		panic("NMI IOCK error: Not continuing");
-
-	/* Re-enable the IOCK line, wait for a few seconds */
-	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
-	outb(reason, NMI_REASON_PORT);
-
-	i = 20000;
-	while (--i) {
-		touch_nmi_watchdog();
-		udelay(100);
-	}
-
-	reason &= ~NMI_REASON_CLEAR_IOCHK;
-	outb(reason, NMI_REASON_PORT);
-}
-
-static notrace __kprobes void
-unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
-{
-	if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
-			NOTIFY_STOP)
-		return;
-#ifdef CONFIG_MCA
-	/*
-	 * Might actually be able to figure out what the guilty party
-	 * is:
-	 */
-	if (MCA_bus) {
-		mca_handle_nmi();
-		return;
-	}
-#endif
-	pr_emerg("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
-		 reason, smp_processor_id());
-
-	pr_emerg("Do you have a strange power saving mode enabled?\n");
-	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
-
-	pr_emerg("Dazed and confused, but trying to continue\n");
-}
-
-static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
-{
-	unsigned char reason = 0;
-
-	/*
-	 * CPU-specific NMI must be processed before non-CPU-specific
-	 * NMI, otherwise we may lose it, because the CPU-specific
-	 * NMI can not be detected/processed on other CPUs.
-	 */
-	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
-		return;
-
-	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
-	raw_spin_lock(&nmi_reason_lock);
-	reason = get_nmi_reason();
-
-	if (reason & NMI_REASON_MASK) {
-		if (reason & NMI_REASON_SERR)
-			pci_serr_error(reason, regs);
-		else if (reason & NMI_REASON_IOCHK)
-			io_check_error(reason, regs);
-#ifdef CONFIG_X86_32
-		/*
-		 * Reassert NMI in case it became active
-		 * meanwhile as it's edge-triggered:
-		 */
-		reassert_nmi();
-#endif
-		raw_spin_unlock(&nmi_reason_lock);
-		return;
-	}
-	raw_spin_unlock(&nmi_reason_lock);
-
-	unknown_nmi_error(reason, regs);
-}
-
-dotraplinkage notrace __kprobes void
-do_nmi(struct pt_regs *regs, long error_code)
-{
-	nmi_enter();
-
-	inc_irq_stat(__nmi_count);
-
-	if (!ignore_nmis)
-		default_do_nmi(regs);
-
-	nmi_exit();
-}
-
-void stop_nmi(void)
-{
-	ignore_nmis++;
-}
-
-void restart_nmi(void)
-{
-	ignore_nmis--;
-}
-
 /* May run on IST stack. */
 dotraplinkage void __kprobes do_int3(struct pt_regs *regs, long error_code)
 {
-- 
1.7.6



* [V6][PATCH 2/6] x86, nmi: create new NMI handler routines
From: Don Zickus @ 2011-09-23 19:17 UTC
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, paulmck, avi, jeremy, Don Zickus

The NMI handlers used to rely on the notifier infrastructure.  This worked
great until we wanted to support handling multiple events better.

One of the key ideas of the new NMI handling is to process _all_ the
handlers for each NMI.  The reason behind this switch is that NMIs are
edge-triggered.  If enough NMIs are triggered, some could be lost because
the CPU can latch at most one NMI (besides the one currently being processed).

In order to deal with this, we have decided to process all the NMI handlers
for each NMI.  This allows the handlers to determine if they received an
event or not (the ones that cannot determine this will be left to fend
for themselves on the unknown NMI list).

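As an illustrative sketch (the device and its status register are
hypothetical placeholders, not from any real driver), a handler is expected
to positively identify its own events and report how many it consumed:

static int mydev_nmi_handler(unsigned int type, struct pt_regs *regs)
{
	int handled = 0;

	/* drain every event this device has pending in this one NMI */
	while (mydev_read_status() & MYDEV_EVENT_PENDING) {
		mydev_ack_event();
		handled++;
	}

	return handled;	/* 0 means "not mine", >0 counts events handled */
}
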
As a result of this change, it is now possible to receive an extra NMI for
an event that has already been processed.  Because the event was processed
in the previous NMI, this NMI gets dropped and becomes an 'unknown' NMI.
This of course will cause printks that scare people.

However, we prefer to have extra NMIs as opposed to losing NMIs, and as
such have developed a basic mechanism to catch most of them.  That will be
a later patch.

To accomplish this idea, I unhooked the NMI handlers from the notifier
routines and created a new mechanism loosely based on doIRQ.  The reason
for this is that the notifier routines have a few shortcomings.  First, we
couldn't guarantee all future NMI handlers used NOTIFY_OK instead of
NOTIFY_STOP.  Second, we couldn't keep track of the number of events being
handled in each routine (most only handle one, perf can handle more than one).
Third, I wanted to eventually display which NMI handlers are registered in
the system in /proc/interrupts to help see who is generating NMIs.

The patch below just implements the new infrastructure but doesn't wire it
up yet (that is the next patch).  Its design is based on doIRQ structs and
the atomic notifier routines.  So the RCU stuff in the patch isn't entirely
untested (the notifier routines have soaked it), but it should be
double-checked in case I copied the code wrong.

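For reviewers, the RCU usage in question, condensed from the patch below:

	/* reader, runs in NMI context */
	rcu_read_lock();
	list_for_each_entry_rcu(a, &desc->head, list)
		handled += a->handler(type, regs);
	rcu_read_unlock();

	/* updater, runs in process context */
	spin_lock_irqsave(&desc->lock, flags);
	list_del_rcu(&n->list);
	spin_unlock_irqrestore(&desc->lock, flags);
	synchronize_rcu();	/* wait for in-flight NMI handlers to finish */
	kfree(n);		/* now safe to free the action */
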
V2:
  - use kstrdup to copy/allocate device name
  - fix-up _GPL oops

V3:
  - fix leak in register_nmi_handler error path
  - removed _raw annotations from rcu_dereference

V4:
  - handle kstrndup failure

V5:
  - converted the list to list_head and used the list_XXX_rcu stuff
    to manipulate it (based on Ying's idea).
  - removed unused NMI_EXTERNAL for now (until we actually have a user)

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h |   18 +++++
 arch/x86/kernel/nmi.c      |  153 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 171 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 4886a68..480b69b 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -42,6 +42,24 @@ void arch_trigger_all_cpu_backtrace(void);
 #define NMI_LOCAL_NORMAL_PRIOR	(NMI_LOCAL_BIT | NMI_NORMAL_PRIOR)
 #define NMI_LOCAL_LOW_PRIOR	(NMI_LOCAL_BIT | NMI_LOW_PRIOR)
 
+#define NMI_FLAG_FIRST	1
+
+enum {
+	NMI_LOCAL=0,
+	NMI_UNKNOWN,
+	NMI_MAX
+};
+
+#define NMI_DONE	0
+#define NMI_HANDLED	1
+
+typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);
+
+int register_nmi_handler(unsigned int, nmi_handler_t, unsigned long,
+			 const char *);
+
+void unregister_nmi_handler(unsigned int, const char *);
+
 void stop_nmi(void);
 void restart_nmi(void);
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 68d758a..327748d 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -13,6 +13,9 @@
 #include <linux/kprobes.h>
 #include <linux/kdebug.h>
 #include <linux/nmi.h>
+#include <linux/delay.h>
+#include <linux/hardirq.h>
+#include <linux/slab.h>
 
 #if defined(CONFIG_EDAC)
 #include <linux/edac.h>
@@ -21,6 +24,33 @@
 #include <linux/atomic.h>
 #include <asm/traps.h>
 #include <asm/mach_traps.h>
+#include <asm/nmi.h>
+
+#define NMI_MAX_NAMELEN	16
+struct nmiaction {
+	struct list_head list;
+	nmi_handler_t handler;
+	unsigned int flags;
+	char *name;
+};
+
+struct nmi_desc {
+	spinlock_t lock;
+	struct list_head head;
+};
+
+static struct nmi_desc nmi_desc[NMI_MAX] = 
+{
+	{
+		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[0].lock),
+		.head = LIST_HEAD_INIT(nmi_desc[0].head),
+	},
+	{
+		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[1].lock),
+		.head = LIST_HEAD_INIT(nmi_desc[1].head),
+	},
+
+};
 
 static int ignore_nmis;
 
@@ -38,6 +68,129 @@ static int __init setup_unknown_nmi_panic(char *str)
 }
 __setup("unknown_nmi_panic", setup_unknown_nmi_panic);
 
+#define nmi_to_desc(type) (&nmi_desc[type])
+
+static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
+{
+	struct nmi_desc *desc = nmi_to_desc(type);
+	struct nmiaction *a;
+	int handled = 0;
+
+	rcu_read_lock();
+
+	/*
+	 * NMIs are edge-triggered, which means if you have enough
+	 * of them concurrently, you can lose some because only one
+	 * can be latched at any given time.  Walk the whole list
+	 * to handle those situations.
+	 */
+	list_for_each_entry_rcu(a, &desc->head, list) {
+
+		handled += a->handler(type, regs);
+
+	}
+
+	rcu_read_unlock();
+
+	/* return total number of NMI events handled */
+	return handled;
+}
+
+static int __setup_nmi(unsigned int type, struct nmiaction *action)
+{
+	struct nmi_desc *desc = nmi_to_desc(type);
+	unsigned long flags;
+
+	spin_lock_irqsave(&desc->lock, flags);
+
+	/*
+	 * some handlers need to be executed first, otherwise a fake
+	 * event confuses some handlers (kdump uses this flag)
+	 */
+	if (action->flags & NMI_FLAG_FIRST)
+		list_add_rcu(&action->list, &desc->head);
+	else
+		list_add_tail_rcu(&action->list, &desc->head);
+
+	spin_unlock_irqrestore(&desc->lock, flags);
+	return 0;
+}
+
+static struct nmiaction *__free_nmi(unsigned int type, const char *name)
+{
+	struct nmi_desc *desc = nmi_to_desc(type);
+	struct nmiaction *n;
+	unsigned long flags;
+
+	spin_lock_irqsave(&desc->lock, flags);
+
+	list_for_each_entry_rcu(n, &desc->head, list) {
+		/*
+		 * the name passed in to describe the nmi handler
+		 * is used as the lookup key
+		 */
+		if (!strcmp(n->name, name)) {
+			WARN(in_nmi(),
+				"Trying to free NMI (%s) from NMI context!\n", n->name);
+			list_del_rcu(&n->list);
+			break;
+		}
+	}
+
+	spin_unlock_irqrestore(&desc->lock, flags);
+	synchronize_rcu();
+	return (n);
+}
+
+int register_nmi_handler(unsigned int type, nmi_handler_t handler,
+			unsigned long nmiflags, const char *devname)
+{
+	struct nmiaction *action;
+	int retval = -ENOMEM;
+
+	if (!handler)
+		return -EINVAL;
+
+	action = kzalloc(sizeof(struct nmiaction), GFP_KERNEL);
+	if (!action)
+		goto fail_action;
+
+	action->handler = handler;
+	action->flags = nmiflags;
+	action->name = kstrndup(devname, NMI_MAX_NAMELEN, GFP_KERNEL);
+	if (!action->name)
+		goto fail_action_name;
+
+	retval = __setup_nmi(type, action);
+
+	if (retval)
+		goto fail_setup_nmi;
+
+	return retval;
+
+fail_setup_nmi:
+	kfree(action->name);
+fail_action_name:
+	kfree(action);
+fail_action:
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(register_nmi_handler);
+
+void unregister_nmi_handler(unsigned int type, const char *name)
+{
+	struct nmiaction *a;
+
+	a = __free_nmi(type, name);
+	if (a) {
+		kfree(a->name);
+		kfree(a);
+	}
+}
+
+EXPORT_SYMBOL_GPL(unregister_nmi_handler);
+
 static notrace __kprobes void
 pci_serr_error(unsigned char reason, struct pt_regs *regs)
 {
-- 
1.7.6



* [V6][PATCH 3/6] x86, nmi: wire up NMI handlers to new routines
From: Don Zickus @ 2011-09-23 19:17 UTC
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, paulmck, avi, jeremy, Don Zickus, Jason Wessel, Andi Kleen,
	Corey Minyard, Jack Steiner

Just convert all the files that have an NMI handler to the new routines.
Most of it is a straightforward conversion.  A couple of places needed some
tweaking, like kgdb, which separates the debug notifier from the NMI
handler, and mce, which removes a call to notify_die.

[Thanks to Ying for finding the history behind that mce call

https://lkml.org/lkml/2010/5/27/114

and to Boris for responding that he would like to remove that call because
of it

https://lkml.org/lkml/2011/9/21/163]

The things that get converted are the registration/unregistration routines;
the NMI handler itself has its args changed, along with removal of the code
that checked which event type it was being called for (most handlers are on
one NMI list, except for kgdb, which has both an NMI routine and an NMI
unknown routine).

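The conversion pattern, roughly (condensed from the hw_nmi.c hunk below;
"nb" and "name" are placeholders):

/* before: multiplexed through the die notifier chain */
static int handler(struct notifier_block *self, unsigned long cmd, void *data)
{
	struct die_args *args = data;

	if (cmd != DIE_NMI)
		return NOTIFY_DONE;
	/* ... use args->regs ... */
	return NOTIFY_STOP;
}
register_die_notifier(&nb);

/* after: called directly with the NMI type and pt_regs */
static int handler(unsigned int cmd, struct pt_regs *regs)
{
	/* ... use regs ... */
	return NMI_HANDLED;
}
register_nmi_handler(NMI_LOCAL, handler, 0, "name");
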
V2:
  - perf event nmi handler cleanup
  - update changelog

Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Jack Steiner <steiner@sgi.com>
Acked-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h              |   20 ----------
 arch/x86/include/asm/reboot.h           |    2 +-
 arch/x86/kernel/apic/hw_nmi.c           |   27 ++-----------
 arch/x86/kernel/apic/x2apic_uv_x.c      |   20 ++--------
 arch/x86/kernel/cpu/mcheck/mce-inject.c |   20 +++------
 arch/x86/kernel/cpu/mcheck/mce.c        |    3 -
 arch/x86/kernel/cpu/perf_event.c        |   62 ++-----------------------------
 arch/x86/kernel/crash.c                 |    5 +--
 arch/x86/kernel/kgdb.c                  |   60 ++++++++++++++++++++++-------
 arch/x86/kernel/nmi.c                   |   11 ++++-
 arch/x86/kernel/reboot.c                |   23 +++--------
 arch/x86/oprofile/nmi_int.c             |   40 ++++++--------------
 arch/x86/oprofile/nmi_timer_int.c       |   28 +++-----------
 drivers/acpi/apei/ghes.c                |   22 ++++-------
 drivers/char/ipmi/ipmi_watchdog.c       |   32 +++++-----------
 drivers/watchdog/hpwdt.c                |   23 +++--------
 16 files changed, 124 insertions(+), 274 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 480b69b..5361095 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -22,26 +22,6 @@ void arch_trigger_all_cpu_backtrace(void);
 #define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
 #endif
 
-/*
- * Define some priorities for the nmi notifier call chain.
- *
- * Create a local nmi bit that has a higher priority than
- * external nmis, because the local ones are more frequent.
- *
- * Also setup some default high/normal/low settings for
- * subsystems to registers with.  Using 4 bits to separate
- * the priorities.  This can go a lot higher if needed be.
- */
-
-#define NMI_LOCAL_SHIFT		16	/* randomly picked */
-#define NMI_LOCAL_BIT		(1ULL << NMI_LOCAL_SHIFT)
-#define NMI_HIGH_PRIOR		(1ULL << 8)
-#define NMI_NORMAL_PRIOR	(1ULL << 4)
-#define NMI_LOW_PRIOR		(1ULL << 0)
-#define NMI_LOCAL_HIGH_PRIOR	(NMI_LOCAL_BIT | NMI_HIGH_PRIOR)
-#define NMI_LOCAL_NORMAL_PRIOR	(NMI_LOCAL_BIT | NMI_NORMAL_PRIOR)
-#define NMI_LOCAL_LOW_PRIOR	(NMI_LOCAL_BIT | NMI_LOW_PRIOR)
-
 #define NMI_FLAG_FIRST	1
 
 enum {
diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 3250e3d..92f29706 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -23,7 +23,7 @@ void machine_real_restart(unsigned int type);
 #define MRR_BIOS	0
 #define MRR_APM		1
 
-typedef void (*nmi_shootdown_cb)(int, struct die_args*);
+typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
 
 #endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index d5e57db0..31cb9ae 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -60,22 +60,10 @@ void arch_trigger_all_cpu_backtrace(void)
 }
 
 static int __kprobes
-arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
-			 unsigned long cmd, void *__args)
+arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
 {
-	struct die_args *args = __args;
-	struct pt_regs *regs;
 	int cpu;
 
-	switch (cmd) {
-	case DIE_NMI:
-		break;
-
-	default:
-		return NOTIFY_DONE;
-	}
-
-	regs = args->regs;
 	cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
@@ -86,21 +74,16 @@ arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
 		show_regs(regs);
 		arch_spin_unlock(&lock);
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
-		return NOTIFY_STOP;
+		return NMI_HANDLED;
 	}
 
-	return NOTIFY_DONE;
+	return NMI_DONE;
 }
 
-static __read_mostly struct notifier_block backtrace_notifier = {
-	.notifier_call          = arch_trigger_all_cpu_backtrace_handler,
-	.next                   = NULL,
-	.priority               = NMI_LOCAL_LOW_PRIOR,
-};
-
 static int __init register_trigger_all_cpu_backtrace(void)
 {
-	register_die_notifier(&backtrace_notifier);
+	register_nmi_handler(NMI_LOCAL, arch_trigger_all_cpu_backtrace_handler,
+				0, "arch_bt");
 	return 0;
 }
 early_initcall(register_trigger_all_cpu_backtrace);
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index adc66c3..88b0dbb 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -673,18 +673,11 @@ void __cpuinit uv_cpu_init(void)
 /*
  * When NMI is received, print a stack trace.
  */
-int uv_handle_nmi(struct notifier_block *self, unsigned long reason, void *data)
+int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
 {
 	unsigned long real_uv_nmi;
 	int bid;
 
-	if (reason != DIE_NMIUNKNOWN)
-		return NOTIFY_OK;
-
-	if (in_crash_kexec)
-		/* do nothing if entering the crash kernel */
-		return NOTIFY_OK;
-
 	/*
 	 * Each blade has an MMR that indicates when an NMI has been sent
 	 * to cpus on the blade. If an NMI is detected, atomically
@@ -705,7 +698,7 @@ int uv_handle_nmi(struct notifier_block *self, unsigned long reason, void *data)
 	}
 
 	if (likely(__get_cpu_var(cpu_last_nmi_count) == uv_blade_info[bid].nmi_count))
-		return NOTIFY_DONE;
+		return NMI_DONE;
 
 	__get_cpu_var(cpu_last_nmi_count) = uv_blade_info[bid].nmi_count;
 
@@ -718,17 +711,12 @@ int uv_handle_nmi(struct notifier_block *self, unsigned long reason, void *data)
 	dump_stack();
 	spin_unlock(&uv_nmi_lock);
 
-	return NOTIFY_STOP;
+	return NMI_HANDLED;
 }
 
-static struct notifier_block uv_dump_stack_nmi_nb = {
-	.notifier_call	= uv_handle_nmi,
-	.priority = NMI_LOCAL_LOW_PRIOR - 1,
-};
-
 void uv_register_nmi_notifier(void)
 {
-	if (register_die_notifier(&uv_dump_stack_nmi_nb))
+	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
 		printk(KERN_WARNING "UV NMI handler failed to register\n");
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c
index 0ed633c..6199232 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
@@ -78,27 +78,20 @@ static void raise_exception(struct mce *m, struct pt_regs *pregs)
 
 static cpumask_var_t mce_inject_cpumask;
 
-static int mce_raise_notify(struct notifier_block *self,
-			    unsigned long val, void *data)
+static int mce_raise_notify(unsigned int cmd, struct pt_regs *regs)
 {
-	struct die_args *args = (struct die_args *)data;
 	int cpu = smp_processor_id();
 	struct mce *m = &__get_cpu_var(injectm);
-	if (val != DIE_NMI || !cpumask_test_cpu(cpu, mce_inject_cpumask))
-		return NOTIFY_DONE;
+	if (!cpumask_test_cpu(cpu, mce_inject_cpumask))
+		return NMI_DONE;
 	cpumask_clear_cpu(cpu, mce_inject_cpumask);
 	if (m->inject_flags & MCJ_EXCEPTION)
-		raise_exception(m, args->regs);
+		raise_exception(m, regs);
 	else if (m->status)
 		raise_poll(m);
-	return NOTIFY_STOP;
+	return NMI_HANDLED;
 }
 
-static struct notifier_block mce_raise_nb = {
-	.notifier_call = mce_raise_notify,
-	.priority = NMI_LOCAL_NORMAL_PRIOR,
-};
-
 /* Inject mce on current CPU */
 static int raise_local(void)
 {
@@ -216,7 +209,8 @@ static int inject_init(void)
 		return -ENOMEM;
 	printk(KERN_INFO "Machine check injector initialized\n");
 	mce_chrdev_ops.write = mce_write;
-	register_die_notifier(&mce_raise_nb);
+	register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0,
+				"mce_notify");
 	return 0;
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 08363b0..3fc65b6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -908,9 +908,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
 	percpu_inc(mce_exception_count);
 
-	if (notify_die(DIE_NMI, "machine check", regs, error_code,
-			   18, SIGKILL) == NOTIFY_STOP)
-		goto out;
 	if (!banks)
 		goto out;
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 594d425..90ffec5 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1377,68 +1377,14 @@ struct pmu_nmi_state {
 static DEFINE_PER_CPU(struct pmu_nmi_state, pmu_nmi);
 
 static int __kprobes
-perf_event_nmi_handler(struct notifier_block *self,
-			 unsigned long cmd, void *__args)
+perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 {
-	struct die_args *args = __args;
-	unsigned int this_nmi;
-	int handled;
-
 	if (!atomic_read(&active_events))
-		return NOTIFY_DONE;
-
-	switch (cmd) {
-	case DIE_NMI:
-		break;
-	case DIE_NMIUNKNOWN:
-		this_nmi = percpu_read(irq_stat.__nmi_count);
-		if (this_nmi != __this_cpu_read(pmu_nmi.marked))
-			/* let the kernel handle the unknown nmi */
-			return NOTIFY_DONE;
-		/*
-		 * This one is a PMU back-to-back nmi. Two events
-		 * trigger 'simultaneously' raising two back-to-back
-		 * NMIs. If the first NMI handles both, the latter
-		 * will be empty and daze the CPU. So, we drop it to
-		 * avoid false-positive 'unknown nmi' messages.
-		 */
-		return NOTIFY_STOP;
-	default:
-		return NOTIFY_DONE;
-	}
-
-	handled = x86_pmu.handle_irq(args->regs);
-	if (!handled)
-		return NOTIFY_DONE;
+		return NMI_DONE;
 
-	this_nmi = percpu_read(irq_stat.__nmi_count);
-	if ((handled > 1) ||
-		/* the next nmi could be a back-to-back nmi */
-	    ((__this_cpu_read(pmu_nmi.marked) == this_nmi) &&
-	     (__this_cpu_read(pmu_nmi.handled) > 1))) {
-		/*
-		 * We could have two subsequent back-to-back nmis: The
-		 * first handles more than one counter, the 2nd
-		 * handles only one counter and the 3rd handles no
-		 * counter.
-		 *
-		 * This is the 2nd nmi because the previous was
-		 * handling more than one counter. We will mark the
-		 * next (3rd) and then drop it if unhandled.
-		 */
-		__this_cpu_write(pmu_nmi.marked, this_nmi + 1);
-		__this_cpu_write(pmu_nmi.handled, handled);
-	}
-
-	return NOTIFY_STOP;
+	return x86_pmu.handle_irq(regs);
 }
 
-static __read_mostly struct notifier_block perf_event_nmi_notifier = {
-	.notifier_call		= perf_event_nmi_handler,
-	.next			= NULL,
-	.priority		= NMI_LOCAL_LOW_PRIOR,
-};
-
 static struct event_constraint unconstrained;
 static struct event_constraint emptyconstraint;
 
@@ -1565,7 +1511,7 @@ static int __init init_hw_perf_events(void)
 		((1LL << x86_pmu.num_counters_fixed)-1) << X86_PMC_IDX_FIXED;
 
 	perf_events_lapic_init();
-	register_die_notifier(&perf_event_nmi_notifier);
+	register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI");
 
 	unconstrained = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 764c7c2..13ad899 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -32,15 +32,12 @@ int in_crash_kexec;
 
 #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
 
-static void kdump_nmi_callback(int cpu, struct die_args *args)
+static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 {
-	struct pt_regs *regs;
 #ifdef CONFIG_X86_32
 	struct pt_regs fixed_regs;
 #endif
 
-	regs = args->regs;
-
 #ifdef CONFIG_X86_32
 	if (!user_mode_vm(regs)) {
 		crash_fixup_ss_esp(&fixed_regs, regs);
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 00354d4..faba577 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -511,28 +511,37 @@ single_step_cont(struct pt_regs *regs, struct die_args *args)
 
 static int was_in_debug_nmi[NR_CPUS];
 
-static int __kgdb_notify(struct die_args *args, unsigned long cmd)
+static int kgdb_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 {
-	struct pt_regs *regs = args->regs;
-
 	switch (cmd) {
-	case DIE_NMI:
+	case NMI_LOCAL:
 		if (atomic_read(&kgdb_active) != -1) {
 			/* KGDB CPU roundup */
 			kgdb_nmicallback(raw_smp_processor_id(), regs);
 			was_in_debug_nmi[raw_smp_processor_id()] = 1;
 			touch_nmi_watchdog();
-			return NOTIFY_STOP;
+			return NMI_HANDLED;
 		}
-		return NOTIFY_DONE;
+		break;
 
-	case DIE_NMIUNKNOWN:
+	case NMI_UNKNOWN:
 		if (was_in_debug_nmi[raw_smp_processor_id()]) {
 			was_in_debug_nmi[raw_smp_processor_id()] = 0;
-			return NOTIFY_STOP;
+			return NMI_HANDLED;
 		}
-		return NOTIFY_DONE;
+		break;
+	default:
+		/* do nothing */
+		break;
+	}
+	return NMI_DONE;
+}
+
+static int __kgdb_notify(struct die_args *args, unsigned long cmd)
+{
+	struct pt_regs *regs = args->regs;
 
+	switch (cmd) {
 	case DIE_DEBUG:
 		if (atomic_read(&kgdb_cpu_doing_single_step) != -1) {
 			if (user_mode(regs))
@@ -590,11 +599,6 @@ kgdb_notify(struct notifier_block *self, unsigned long cmd, void *ptr)
 
 static struct notifier_block kgdb_notifier = {
 	.notifier_call	= kgdb_notify,
-
-	/*
-	 * Lowest-prio notifier priority, we want to be notified last:
-	 */
-	.priority	= NMI_LOCAL_LOW_PRIOR,
 };
 
 /**
@@ -605,7 +609,31 @@ static struct notifier_block kgdb_notifier = {
  */
 int kgdb_arch_init(void)
 {
-	return register_die_notifier(&kgdb_notifier);
+	int retval;
+
+	retval = register_die_notifier(&kgdb_notifier);
+	if (retval)
+		goto out;
+
+	retval = register_nmi_handler(NMI_LOCAL, kgdb_nmi_handler,
+					0, "kgdb");
+	if (retval)
+		goto out1;
+
+	retval = register_nmi_handler(NMI_UNKNOWN, kgdb_nmi_handler,
+					0, "kgdb");
+
+	if (retval)
+		goto out2;
+
+	return retval;
+
+out2:
+	unregister_nmi_handler(NMI_LOCAL, "kgdb");
+out1:
+	unregister_die_notifier(&kgdb_notifier);
+out:
+	return retval;
 }
 
 static void kgdb_hw_overflow_handler(struct perf_event *event,
@@ -673,6 +701,8 @@ void kgdb_arch_exit(void)
 			breakinfo[i].pev = NULL;
 		}
 	}
+	unregister_nmi_handler(NMI_UNKNOWN, "kgdb");
+	unregister_nmi_handler(NMI_LOCAL, "kgdb");
 	unregister_die_notifier(&kgdb_notifier);
 }
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 327748d..8d9f8f2 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -1,6 +1,7 @@
 /*
  *  Copyright (C) 1991, 1992  Linus Torvalds
  *  Copyright (C) 2000, 2001, 2002 Andi Kleen, SuSE Labs
+ *  Copyright (C) 2011	Don Zickus Red Hat, Inc.
  *
  *  Pentium III FXSR, SSE support
  *	Gareth Hughes <gareth@valinux.com>, May 2000
@@ -248,8 +249,10 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 static notrace __kprobes void
 unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 {
-	if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
-			NOTIFY_STOP)
+	int handled;
+
+	handled = nmi_handle(NMI_UNKNOWN, regs);
+	if (handled)
 		return;
 #ifdef CONFIG_MCA
 	/*
@@ -274,13 +277,15 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 {
 	unsigned char reason = 0;
+	int handled;
 
 	/*
 	 * CPU-specific NMI must be processed before non-CPU-specific
 	 * NMI, otherwise we may lose it, because the CPU-specific
 	 * NMI can not be detected/processed on other CPUs.
 	 */
-	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
+	handled = nmi_handle(NMI_LOCAL, regs);
+	if (handled)
 		return;
 
 	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 9242436..adab340 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -464,7 +464,7 @@ static inline void kb_wait(void)
 	}
 }
 
-static void vmxoff_nmi(int cpu, struct die_args *args)
+static void vmxoff_nmi(int cpu, struct pt_regs *regs)
 {
 	cpu_emergency_vmxoff();
 }
@@ -736,14 +736,10 @@ static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
 
-static int crash_nmi_callback(struct notifier_block *self,
-			unsigned long val, void *data)
+static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 {
 	int cpu;
 
-	if (val != DIE_NMI)
-		return NOTIFY_OK;
-
 	cpu = raw_smp_processor_id();
 
 	/* Don't do anything if this handler is invoked on crashing cpu.
@@ -751,10 +747,10 @@ static int crash_nmi_callback(struct notifier_block *self,
 	 * an NMI if system was initially booted with nmi_watchdog parameter.
 	 */
 	if (cpu == crashing_cpu)
-		return NOTIFY_STOP;
+		return NMI_HANDLED;
 	local_irq_disable();
 
-	shootdown_callback(cpu, (struct die_args *)data);
+	shootdown_callback(cpu, regs);
 
 	atomic_dec(&waiting_for_crash_ipi);
 	/* Assume hlt works */
@@ -762,7 +758,7 @@ static int crash_nmi_callback(struct notifier_block *self,
 	for (;;)
 		cpu_relax();
 
-	return 1;
+	return NMI_HANDLED;
 }
 
 static void smp_send_nmi_allbutself(void)
@@ -770,12 +766,6 @@ static void smp_send_nmi_allbutself(void)
 	apic->send_IPI_allbutself(NMI_VECTOR);
 }
 
-static struct notifier_block crash_nmi_nb = {
-	.notifier_call = crash_nmi_callback,
-	/* we want to be the first one called */
-	.priority = NMI_LOCAL_HIGH_PRIOR+1,
-};
-
 /* Halt all other CPUs, calling the specified function on each of them
  *
  * This function can be used to halt all other CPUs on crash
@@ -794,7 +784,8 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
 	/* Would it be better to replace the trap vector here? */
-	if (register_die_notifier(&crash_nmi_nb))
+	if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback, 
+				 NMI_FLAG_FIRST, "crash"))
 		return;		/* return what? */
 	/* Ensure the new callback function is set before sending
 	 * out the NMI
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 68894fd..adf8fb3 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -61,26 +61,15 @@ u64 op_x86_get_ctrl(struct op_x86_model_spec const *model,
 }
 
 
-static int profile_exceptions_notify(struct notifier_block *self,
-				     unsigned long val, void *data)
+static int profile_exceptions_notify(unsigned int val, struct pt_regs *regs)
 {
-	struct die_args *args = (struct die_args *)data;
-	int ret = NOTIFY_DONE;
-
-	switch (val) {
-	case DIE_NMI:
-		if (ctr_running)
-			model->check_ctrs(args->regs, &__get_cpu_var(cpu_msrs));
-		else if (!nmi_enabled)
-			break;
-		else
-			model->stop(&__get_cpu_var(cpu_msrs));
-		ret = NOTIFY_STOP;
-		break;
-	default:
-		break;
-	}
-	return ret;
+	if (ctr_running)
+		model->check_ctrs(regs, &__get_cpu_var(cpu_msrs));
+	else if (!nmi_enabled)
+		return NMI_DONE;
+	else
+		model->stop(&__get_cpu_var(cpu_msrs));
+	return NMI_HANDLED;
 }
 
 static void nmi_cpu_save_registers(struct op_msrs *msrs)
@@ -363,12 +352,6 @@ static void nmi_cpu_setup(void *dummy)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 }
 
-static struct notifier_block profile_exceptions_nb = {
-	.notifier_call = profile_exceptions_notify,
-	.next = NULL,
-	.priority = NMI_LOCAL_LOW_PRIOR,
-};
-
 static void nmi_cpu_restore_registers(struct op_msrs *msrs)
 {
 	struct op_msr *counters = msrs->counters;
@@ -508,7 +491,8 @@ static int nmi_setup(void)
 	ctr_running = 0;
 	/* make variables visible to the nmi handler: */
 	smp_mb();
-	err = register_die_notifier(&profile_exceptions_nb);
+	err = register_nmi_handler(NMI_LOCAL, profile_exceptions_notify,
+					0, "oprofile");
 	if (err)
 		goto fail;
 
@@ -538,7 +522,7 @@ static void nmi_shutdown(void)
 	put_online_cpus();
 	/* make variables visible to the nmi handler: */
 	smp_mb();
-	unregister_die_notifier(&profile_exceptions_nb);
+	unregister_nmi_handler(NMI_LOCAL, "oprofile");
 	msrs = &get_cpu_var(cpu_msrs);
 	model->shutdown(msrs);
 	free_msrs();
diff --git a/arch/x86/oprofile/nmi_timer_int.c b/arch/x86/oprofile/nmi_timer_int.c
index 720bf5a..7f8052c 100644
--- a/arch/x86/oprofile/nmi_timer_int.c
+++ b/arch/x86/oprofile/nmi_timer_int.c
@@ -18,32 +18,16 @@
 #include <asm/apic.h>
 #include <asm/ptrace.h>
 
-static int profile_timer_exceptions_notify(struct notifier_block *self,
-					   unsigned long val, void *data)
+static int profile_timer_exceptions_notify(unsigned int val, struct pt_regs *regs)
 {
-	struct die_args *args = (struct die_args *)data;
-	int ret = NOTIFY_DONE;
-
-	switch (val) {
-	case DIE_NMI:
-		oprofile_add_sample(args->regs, 0);
-		ret = NOTIFY_STOP;
-		break;
-	default:
-		break;
-	}
-	return ret;
+	oprofile_add_sample(regs, 0);
+	return NMI_HANDLED;
 }
 
-static struct notifier_block profile_timer_exceptions_nb = {
-	.notifier_call = profile_timer_exceptions_notify,
-	.next = NULL,
-	.priority = NMI_LOW_PRIOR,
-};
-
 static int timer_start(void)
 {
-	if (register_die_notifier(&profile_timer_exceptions_nb))
+	if (register_nmi_handler(NMI_LOCAL, profile_timer_exceptions_notify,
+					0, "oprofile-timer"))
 		return 1;
 	return 0;
 }
@@ -51,7 +35,7 @@ static int timer_start(void)
 
 static void timer_stop(void)
 {
-	unregister_die_notifier(&profile_timer_exceptions_nb);
+	unregister_nmi_handler(NMI_LOCAL, "oprofile-timer");
 	synchronize_sched();  /* Allow already-started NMIs to complete. */
 }
 
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0784f99..b8e08cb 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -50,6 +50,7 @@
 #include <acpi/hed.h>
 #include <asm/mce.h>
 #include <asm/tlbflush.h>
+#include <asm/nmi.h>
 
 #include "apei-internal.h"
 
@@ -749,15 +750,11 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 	}
 }
 
-static int ghes_notify_nmi(struct notifier_block *this,
-				  unsigned long cmd, void *data)
+static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
 	struct ghes *ghes, *ghes_global = NULL;
 	int sev, sev_global = -1;
-	int ret = NOTIFY_DONE;
-
-	if (cmd != DIE_NMI)
-		return ret;
+	int ret = NMI_DONE;
 
 	raw_spin_lock(&ghes_nmi_lock);
 	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
@@ -770,10 +767,10 @@ static int ghes_notify_nmi(struct notifier_block *this,
 			sev_global = sev;
 			ghes_global = ghes;
 		}
-		ret = NOTIFY_STOP;
+		ret = NMI_HANDLED;
 	}
 
-	if (ret == NOTIFY_DONE)
+	if (ret == NMI_DONE)
 		goto out;
 
 	if (sev_global >= GHES_SEV_PANIC) {
@@ -825,10 +822,6 @@ static struct notifier_block ghes_notifier_sci = {
 	.notifier_call = ghes_notify_sci,
 };
 
-static struct notifier_block ghes_notifier_nmi = {
-	.notifier_call = ghes_notify_nmi,
-};
-
 static unsigned long ghes_esource_prealloc_size(
 	const struct acpi_hest_generic *generic)
 {
@@ -918,7 +911,8 @@ static int __devinit ghes_probe(struct platform_device *ghes_dev)
 		ghes_estatus_pool_expand(len);
 		mutex_lock(&ghes_list_mutex);
 		if (list_empty(&ghes_nmi))
-			register_die_notifier(&ghes_notifier_nmi);
+			register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0,
+						"ghes");
 		list_add_rcu(&ghes->list, &ghes_nmi);
 		mutex_unlock(&ghes_list_mutex);
 		break;
@@ -964,7 +958,7 @@ static int __devexit ghes_remove(struct platform_device *ghes_dev)
 		mutex_lock(&ghes_list_mutex);
 		list_del_rcu(&ghes->list);
 		if (list_empty(&ghes_nmi))
-			unregister_die_notifier(&ghes_notifier_nmi);
+			unregister_nmi_handler(NMI_LOCAL, "ghes");
 		mutex_unlock(&ghes_list_mutex);
 		/*
 		 * To synchronize with NMI handler, ghes can only be
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index 3302586..3dcab56 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -1077,17 +1077,8 @@ static void ipmi_unregister_watchdog(int ipmi_intf)
 
 #ifdef HAVE_DIE_NMI
 static int
-ipmi_nmi(struct notifier_block *self, unsigned long val, void *data)
+ipmi_nmi(unsigned int val, struct pt_regs *regs)
 {
-	struct die_args *args = data;
-
-	if (val != DIE_NMIUNKNOWN)
-		return NOTIFY_OK;
-
-	/* Hack, if it's a memory or I/O error, ignore it. */
-	if (args->err & 0xc0)
-		return NOTIFY_OK;
-
 	/*
 	 * If we get here, it's an NMI that's not a memory or I/O
 	 * error.  We can't truly tell if it's from IPMI or not
@@ -1097,15 +1088,15 @@ ipmi_nmi(struct notifier_block *self, unsigned long val, void *data)
 
 	if (testing_nmi) {
 		testing_nmi = 2;
-		return NOTIFY_STOP;
+		return NMI_HANDLED;
 	}
 
 	/* If we are not expecting a timeout, ignore it. */
 	if (ipmi_watchdog_state == WDOG_TIMEOUT_NONE)
-		return NOTIFY_OK;
+		return NMI_DONE;
 
 	if (preaction_val != WDOG_PRETIMEOUT_NMI)
-		return NOTIFY_OK;
+		return NMI_DONE;
 
 	/*
 	 * If no one else handled the NMI, we assume it was the IPMI
@@ -1120,12 +1111,8 @@ ipmi_nmi(struct notifier_block *self, unsigned long val, void *data)
 			panic(PFX "pre-timeout");
 	}
 
-	return NOTIFY_STOP;
+	return NMI_HANDLED;
 }
-
-static struct notifier_block ipmi_nmi_handler = {
-	.notifier_call = ipmi_nmi
-};
 #endif
 
 static int wdog_reboot_handler(struct notifier_block *this,
@@ -1290,7 +1277,8 @@ static void check_parms(void)
 		}
 	}
 	if (do_nmi && !nmi_handler_registered) {
-		rv = register_die_notifier(&ipmi_nmi_handler);
+		rv = register_nmi_handler(NMI_UNKNOWN, ipmi_nmi, 0,
+						"ipmi");
 		if (rv) {
 			printk(KERN_WARNING PFX
 			       "Can't register nmi handler\n");
@@ -1298,7 +1286,7 @@ static void check_parms(void)
 		} else
 			nmi_handler_registered = 1;
 	} else if (!do_nmi && nmi_handler_registered) {
-		unregister_die_notifier(&ipmi_nmi_handler);
+		unregister_nmi_handler(NMI_UNKNOWN, "ipmi");
 		nmi_handler_registered = 0;
 	}
 #endif
@@ -1336,7 +1324,7 @@ static int __init ipmi_wdog_init(void)
 	if (rv) {
 #ifdef HAVE_DIE_NMI
 		if (nmi_handler_registered)
-			unregister_die_notifier(&ipmi_nmi_handler);
+			unregister_nmi_handler(NMI_UNKNOWN, "ipmi");
 #endif
 		atomic_notifier_chain_unregister(&panic_notifier_list,
 						 &wdog_panic_notifier);
@@ -1357,7 +1345,7 @@ static void __exit ipmi_wdog_exit(void)
 
 #ifdef HAVE_DIE_NMI
 	if (nmi_handler_registered)
-		unregister_die_notifier(&ipmi_nmi_handler);
+		unregister_nmi_handler(NMI_UNKNOWN, "ipmi");
 #endif
 
 	atomic_notifier_chain_unregister(&panic_notifier_list,
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 410fba4..e0f6202 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -477,15 +477,11 @@ static int hpwdt_time_left(void)
 /*
  *	NMI Handler
  */
-static int hpwdt_pretimeout(struct notifier_block *nb, unsigned long ulReason,
-				void *data)
+static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
 {
 	unsigned long rom_pl;
 	static int die_nmi_called;
 
-	if (ulReason != DIE_NMIUNKNOWN)
-		goto out;
-
 	if (!hpwdt_nmi_decoding)
 		goto out;
 
@@ -507,7 +504,7 @@ static int hpwdt_pretimeout(struct notifier_block *nb, unsigned long ulReason,
 		"Management Log for details.\n");
 
 out:
-	return NOTIFY_OK;
+	return NMI_DONE;
 }
 #endif /* CONFIG_HPWDT_NMI_DECODING */
 
@@ -647,13 +644,6 @@ static struct miscdevice hpwdt_miscdev = {
 	.fops = &hpwdt_fops,
 };
 
-#ifdef CONFIG_HPWDT_NMI_DECODING
-static struct notifier_block die_notifier = {
-	.notifier_call = hpwdt_pretimeout,
-	.priority = 0,
-};
-#endif /* CONFIG_HPWDT_NMI_DECODING */
-
 /*
  *	Init & Exit
  */
@@ -739,10 +729,9 @@ static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
 	 * die notify list to handle a critical NMI. The default is to
 	 * be last so other users of the NMI signal can function.
 	 */
-	if (priority)
-		die_notifier.priority = 0x7FFFFFFF;
-
-	retval = register_die_notifier(&die_notifier);
+	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout,
+					(priority) ? NMI_FLAG_FIRST : 0,
+					"hpwdt");
 	if (retval != 0) {
 		dev_warn(&dev->dev,
 			"Unable to register a die notifier (err=%d).\n",
@@ -762,7 +751,7 @@ static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
 
 static void hpwdt_exit_nmi_decoding(void)
 {
-	unregister_die_notifier(&die_notifier);
+	unregister_nmi_handler(NMI_UNKNOWN, "hpwdt");
 	if (cru_rom_addr)
 		iounmap(cru_rom_addr);
 }
-- 
1.7.6



* [V6][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs
From: Don Zickus @ 2011-09-23 19:17 UTC
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, paulmck, avi, jeremy, Don Zickus

Previous patches allow the NMI subsystem to process multiple NMI events
in one NMI.  As previously discussed, this can cause issues when an event
triggered another NMI but is processed in the current NMI.  This causes the
next NMI to go unprocessed and become an 'unknown' NMI.

To handle this, we first have to flag whether or not the NMI handlers
handled more than one event.  If they did, then there exists a chance that
the next NMI has already been processed.  Once the NMI is flagged as a
candidate to be swallowed, we next look for a back-to-back NMI condition.

This is determined by looking at the %rip from pt_regs.  If it is the same
as in the previous NMI, it is assumed the CPU did not have a chance to jump
back into a non-NMI context and execute code, and instead handled another
NMI.

If both of those conditions are true then we will swallow any unknown NMI.

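Condensed from the default_do_nmi() changes below, the decision looks
roughly like this:

	b2b = (regs->ip == __this_cpu_read(last_nmi_rip));
	if (!b2b)
		__this_cpu_write(swallow_nmi, false);
	__this_cpu_write(last_nmi_rip, regs->ip);

	handled = nmi_handle(NMI_LOCAL, regs, b2b);
	if (handled) {
		/* >1 events handled: the latched NMI may be stale */
		if (handled > 1)
			__this_cpu_write(swallow_nmi, true);
		return;
	}

	/* ... known reason-port NMIs handled here ... */

	if (b2b && __this_cpu_read(swallow_nmi))
		;	/* swallow the suspected stale NMI */
	else
		unknown_nmi_error(reason, regs);
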
There still exists a chance that we accidentally swallow a real unknown NMI,
but for now things seem better.

An optimization has also been added to the NMI handling routine.  Because
x86 can latch at most one NMI while currently processing an NMI, we don't
have to worry about executing _all_ the handlers in a standalone NMI.  The
idea is that if multiple NMIs come in, the second NMI will represent them.
For those back-to-back NMI cases, we have the potential to drop NMIs.
Therefore only execute all the handlers in the second half of a detected
back-to-back NMI.

V2:
  - forgot to add the 'read' code for swallow_nmi (went into next patch)

V3:
  - redesigned the algorithm to utilize Avi's idea of detecting a back-to-back
    NMI with %rip.
V4:
  - clean up fixes, like adding 'static', rename save_rip to last_nmi_rip

V5:
  - wire up the idle path to reset the back-to-back NMI logic

v6:
  - provide an example scenario where a 'real' unknown NMI would get swallowed

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h   |    1 +
 arch/x86/kernel/nmi.c        |   94 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kernel/process_32.c |    2 +
 arch/x86/kernel/process_64.c |    2 +
 4 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 5361095..fd3f9f1 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -42,5 +42,6 @@ void unregister_nmi_handler(unsigned int, const char *);
 
 void stop_nmi(void);
 void restart_nmi(void);
+void local_touch_nmi(void);
 
 #endif /* _ASM_X86_NMI_H */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 8d9f8f2..673e7a5 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -71,7 +71,7 @@ __setup("unknown_nmi_panic", setup_unknown_nmi_panic);
 
 #define nmi_to_desc(type) (&nmi_desc[type])
 
-static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
+static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs, bool b2b)
 {
 	struct nmi_desc *desc = nmi_to_desc(type);
 	struct nmiaction *a;
@@ -89,6 +89,15 @@ static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
 
 		handled += a->handler(type, regs);
 
+		/*
+		 * Optimization: only loop once if this is not a
+		 * back-to-back NMI.  The idea is nothing is dropped
+		 * on the first NMI, only on the second of a back-to-back
+		 * NMI.  No need to waste cycles going through all the
+		 * handlers.
+		 */
+		if (!b2b && handled)
+			break;
 	}
 
 	rcu_read_unlock();
@@ -251,7 +260,13 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 {
 	int handled;
 
-	handled = nmi_handle(NMI_UNKNOWN, regs);
+	/*
+	 * Use 'false' as back-to-back NMIs are dealt with one level up.
+	 * Of course this makes having multiple 'unknown' handlers useless
+	 * as only the first one is ever run (unless it can actually determine
+	 * if it caused the NMI).
+	 */
+	handled = nmi_handle(NMI_UNKNOWN, regs, false);
+	if (handled)
 		return;
 #ifdef CONFIG_MCA
@@ -274,19 +289,49 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 	pr_emerg("Dazed and confused, but trying to continue\n");
 }
 
+static DEFINE_PER_CPU(bool, swallow_nmi);
+static DEFINE_PER_CPU(unsigned long, last_nmi_rip);
+
 static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 {
 	unsigned char reason = 0;
 	int handled;
+	bool b2b = false;
 
 	/*
 	 * CPU-specific NMI must be processed before non-CPU-specific
 	 * NMI, otherwise we may lose it, because the CPU-specific
 	 * NMI can not be detected/processed on other CPUs.
 	 */
-	handled = nmi_handle(NMI_LOCAL, regs);
-	if (handled)
+
+	/*
+	 * Back-to-back NMIs are interesting because they can either
+	 * be two NMI or more than two NMIs (any thing over two is dropped
+	 * due to NMI being edge-triggered).  If this is the second half
+	 * of the back-to-back NMI, assume we dropped things and process
+	 * more handlers.  Otherwise reset the 'swallow' NMI behaviour
+	 */
+	if (regs->ip == __this_cpu_read(last_nmi_rip))
+		b2b = true;
+	else
+		__this_cpu_write(swallow_nmi, false);
+
+	__this_cpu_write(last_nmi_rip, regs->ip);
+
+	handled = nmi_handle(NMI_LOCAL, regs, b2b);
+	if (handled) {
+		/*
+		 * There are cases when an NMI handler handles multiple
+		 * events in the current NMI.  One of these events may
+		 * be queued to fire as the next NMI.  Because that event
+		 * is already handled, the next NMI will result in an
+		 * unknown NMI.  Instead, let's flag this for a potential
+		 * NMI to swallow.
+		 */
+		if (handled > 1)
+			__this_cpu_write(swallow_nmi, true);
 		return;
+	}
 
 	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
 	raw_spin_lock(&nmi_reason_lock);
@@ -309,7 +354,40 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	}
 	raw_spin_unlock(&nmi_reason_lock);
 
-	unknown_nmi_error(reason, regs);
+	/*
+	 * Only one NMI can be latched at a time.  To handle
+	 * this we may process multiple nmi handlers at once to
+	 * cover the case where an NMI is dropped.  The downside
+	 * to this approach is we may process an NMI prematurely,
+	 * while its real NMI is sitting latched.  This will cause
+	 * an unknown NMI on the next run of the NMI processing.
+	 * 
+	 * We tried to flag that condition above, by setting the
+	 * swallow_nmi flag when we process more than one event.
+	 * This condition is also only present on the second half
+	 * of a back-to-back NMI, so we flag that condition too.
+	 *
+	 * If both are true, we assume we already processed this
+	 * NMI previously and we swallow it.  Otherwise we reset
+	 * the logic.
+	 *
+	 * There are scenarios where we may accidentally swallow
+	 * a 'real' unknown NMI.  For example, while processing
+	 * a perf NMI, another perf NMI comes in along with a
+	 * 'real' unknown NMI.  These two NMIs get combined into
+	 * one (as described above).  When the next NMI gets
+	 * processed, it will be flagged by perf as handled, but
+	 * no one will know that a 'real' unknown NMI was also
+	 * sent.  As a result it gets swallowed.  Or if the first
+	 * perf NMI returns two events handled, then the second
+	 * NMI will get eaten by the logic below, again losing a
+	 * 'real' unknown NMI.  But this is the best we can do
+	 * for now.
+	 */
+	if (b2b && __this_cpu_read(swallow_nmi))
+		;
+	else
+		unknown_nmi_error(reason, regs);
 }
 
 dotraplinkage notrace __kprobes void
@@ -334,3 +412,9 @@ void restart_nmi(void)
 {
 	ignore_nmis--;
 }
+
+/* reset the back-to-back NMI logic */
+void local_touch_nmi(void)
+{
+	__this_cpu_write(last_nmi_rip, 0);
+}
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7a3b651..46ff054 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -57,6 +57,7 @@
 #include <asm/idle.h>
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
+#include <asm/nmi.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -107,6 +108,7 @@ void cpu_idle(void)
 			if (cpu_is_offline(cpu))
 				play_dead();
 
+			local_touch_nmi();
 			local_irq_disable();
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index f693e44..3bd7e6e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -51,6 +51,7 @@
 #include <asm/idle.h>
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
+#include <asm/nmi.h>
 
 asmlinkage extern void ret_from_fork(void);
 
@@ -133,6 +134,7 @@ void cpu_idle(void)
 			 * from here on, until they go to idle.
 			 * Otherwise, idle callbacks can misfire.
 			 */
+			local_touch_nmi();
 			local_irq_disable();
 			enter_idle();
 			/* Don't trace irqs off for idle */
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [V6][PATCH 5/6] x86, nmi: track NMI usage stats
  2011-09-23 19:17 [V6][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
                   ` (3 preceding siblings ...)
  2011-09-23 19:17 ` [V6][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs Don Zickus
@ 2011-09-23 19:17 ` Don Zickus
  2011-09-23 19:17 ` [V6][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts Don Zickus
  5 siblings, 0 replies; 19+ messages in thread
From: Don Zickus @ 2011-09-23 19:17 UTC (permalink / raw)
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, paulmck, avi, jeremy, Don Zickus

Now that the NMI handlers are broken into lists, increment the appropriate
stats for each list.  This allows us to see what is going on when they
get printed out in the next patch.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/nmi.c |   20 ++++++++++++++++++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 673e7a5..f0450da 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -53,6 +53,15 @@ static struct nmi_desc nmi_desc[NMI_MAX] =
 
 };
 
+struct nmi_stats {
+	unsigned int normal;
+	unsigned int unknown;
+	unsigned int external;
+	unsigned int swallow;
+};
+
+static DEFINE_PER_CPU(struct nmi_stats, nmi_stats);
+
 static int ignore_nmis;
 
 int unknown_nmi_panic;
@@ -267,8 +276,13 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 	 * if it caused the NMI)
 	 */
 	handled = nmi_handle(NMI_UNKNOWN, regs, false);
-	if (handled) 
+	if (handled) {
+		__this_cpu_add(nmi_stats.unknown, handled);
 		return;
+	}
+
+	__this_cpu_add(nmi_stats.unknown, 1);
+
 #ifdef CONFIG_MCA
 	/*
 	 * Might actually be able to figure out what the guilty party
@@ -319,6 +333,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	__this_cpu_write(last_nmi_rip, regs->ip);
 
 	handled = nmi_handle(NMI_LOCAL, regs, b2b);
+	__this_cpu_add(nmi_stats.normal, handled);
 	if (handled) {
 		/*
 		 * There are cases when an NMI handler handles multiple
@@ -349,6 +364,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 		 */
 		reassert_nmi();
 #endif
+		__this_cpu_add(nmi_stats.external, 1);
 		raw_spin_unlock(&nmi_reason_lock);
 		return;
 	}
@@ -385,7 +401,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	 * for now.
 	 */
 	if (b2b && __this_cpu_read(swallow_nmi))
-		;
+		__this_cpu_add(nmi_stats.swallow, 1);
 	else
 		unknown_nmi_error(reason, regs);
 }
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [V6][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts
  2011-09-23 19:17 [V6][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
                   ` (4 preceding siblings ...)
  2011-09-23 19:17 ` [V6][PATCH 5/6] x86, nmi: track NMI usage stats Don Zickus
@ 2011-09-23 19:17 ` Don Zickus
  5 siblings, 0 replies; 19+ messages in thread
From: Don Zickus @ 2011-09-23 19:17 UTC (permalink / raw)
  To: x86, Andi Kleen, Robert Richter, Peter Zijlstra, ying.huang
  Cc: LKML, paulmck, avi, jeremy, Don Zickus

This is a cheap hack to add the stats to the middle of /proc/interrupts.
It is more of a conversation starter than anything, as I am not sure of
the right letters and place to put this stuff.

The benefit of these stats is a better breakdown of which list the NMIs
get handled in: either a normal handler, unknown, or external.  It also
lists the number of unknown NMIs swallowed, to help check for false
positives.  Another benefit is the ability to actually see which
NMI handlers are currently registered in the system.

The output of 'cat /proc/interrupts' will look like this:

<snip>
 58:        275          0        864          0   PCI-MSI-edge      eth0
NMI:       4161       4155        158       4194   Non-maskable interrupts
SWA:          0          0          0          0   Unknown NMIs swallowed
  0:       4161       4155        158       4194   NMI  PMI, arch_bt
UNK:          0          0          0          0   NMI
EXT:          0          0          0          0   NMI
LOC:      12653      13304      13974      12926   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          6          6          5          6   Performance monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RES:       1839       1897       1821       1854   Rescheduling interrupts
CAL:        524       2714        392        331   Function call interrupts
TLB:        217        146        593        576   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          1          1          1          1   Machine check polls
ERR:          0
MIS:          0

V2:
  - convert to list_for_each_rcu

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h |    2 ++
 arch/x86/kernel/irq.c      |    2 ++
 arch/x86/kernel/nmi.c      |   33 +++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index fd3f9f1..af5fa3b 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -24,6 +24,8 @@ void arch_trigger_all_cpu_backtrace(void);
 
 #define NMI_FLAG_FIRST	1
 
+void arch_show_nmi(struct seq_file *p, int prec);
+
 enum {
 	NMI_LOCAL=0,
 	NMI_UNKNOWN,
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 6c0802e..44d1cac 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -16,6 +16,7 @@
 #include <asm/idle.h>
 #include <asm/mce.h>
 #include <asm/hw_irq.h>
+#include <asm/nmi.h>
 
 atomic_t irq_err_count;
 
@@ -55,6 +56,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 	for_each_online_cpu(j)
 		seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
 	seq_printf(p, "  Non-maskable interrupts\n");
+	arch_show_nmi(p, prec);
 #ifdef CONFIG_X86_LOCAL_APIC
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index f0450da..9e8d2c2 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -434,3 +434,36 @@ void local_touch_nmi(void)
 {
 	__this_cpu_write(last_nmi_rip, 0);
 }
+
+void arch_show_nmi(struct seq_file *p, int prec)
+{
+	int j;
+	struct nmiaction *action;
+
+	seq_printf(p, "%*s: ", prec, "SWA");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", per_cpu(nmi_stats.swallow, j));
+	seq_printf(p, "  Unknown NMIs swallowed\n");
+
+	seq_printf(p, "%*s: ", prec, "  0");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", per_cpu(nmi_stats.normal, j));
+	seq_printf(p, "  NMI\t");
+	list_for_each_entry_rcu(action, &(nmi_to_desc(NMI_LOCAL))->head, list)
+		seq_printf(p, "%s, ", action->name);
+	seq_putc(p, '\n');
+
+	seq_printf(p, "%*s: ", prec, "UNK");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", per_cpu(nmi_stats.unknown, j));
+	seq_printf(p, "  NMI\t");
+	list_for_each_entry_rcu(action, &(nmi_to_desc(NMI_UNKNOWN))->head, list)
+		seq_printf(p, "%s, ", action->name);
+	seq_putc(p, '\n');
+
+	seq_printf(p, "%*s: ", prec, "EXT");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", per_cpu(nmi_stats.external, j));
+	seq_printf(p, "  NMI");
+	seq_putc(p, '\n');
+}
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-09-23 19:17 ` [V6][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs Don Zickus
@ 2011-09-26  8:59   ` Peter Zijlstra
  2011-09-26 12:55     ` Don Zickus
  2011-09-28 10:31   ` Robert Richter
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2011-09-26  8:59 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, paulmck, avi, jeremy

On Fri, 2011-09-23 at 15:17 -0400, Don Zickus wrote:
> +       /*
> +        * Use 'false' as back-to-back NMIs are dealt with one level up.
> +        * Of course this makes having multiple 'unknown' handlers useless
> +        * as only the first one is ever run (unless it can actually determine
> +        * if it caused the NMI)
> +        */
> +       handled = nmi_handle(NMI_UNKNOWN, regs, false); 

Shouldn't we then also add something like:

WARN_ON_ONCE(type == NMI_UNKNOWN && !list_empty(&desc->head));

to __setup_nmi()?
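
Roughly this placement, as a sketch (the surrounding __setup_nmi()
locking and NMI_FLAG_FIRST handling are elided; nmi_to_desc() and
desc->head are from this series):

	struct nmi_desc *desc = nmi_to_desc(type);

	/* a second 'unknown' handler would never be consulted */
	WARN_ON_ONCE(type == NMI_UNKNOWN && !list_empty(&desc->head));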



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-09-26  8:59   ` Peter Zijlstra
@ 2011-09-26 12:55     ` Don Zickus
  0 siblings, 0 replies; 19+ messages in thread
From: Don Zickus @ 2011-09-26 12:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Andi Kleen, Robert Richter, ying.huang, LKML, paulmck, avi, jeremy

On Mon, Sep 26, 2011 at 10:59:14AM +0200, Peter Zijlstra wrote:
> On Fri, 2011-09-23 at 15:17 -0400, Don Zickus wrote:
> > +       /*
> > +        * Use 'false' as back-to-back NMIs are dealt with one level up.
> > +        * Of course this makes having multiple 'unknown' handlers useless
> > +        * as only the first one is ever run (unless it can actually determine
> > +        * if it caused the NMI)
> > +        */
> > +       handled = nmi_handle(NMI_UNKNOWN, regs, false); 
> 
> Shouldn't we then also add something like:
> 
> WARN_ON_ONCE(type == NMI_UNKNOWN && !list_empty(&desc->head));

That would make sense.

Cheers,
Don

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-09-23 19:17 ` [V6][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs Don Zickus
  2011-09-26  8:59   ` Peter Zijlstra
@ 2011-09-28 10:31   ` Robert Richter
  2011-09-28 12:37     ` Don Zickus
  1 sibling, 1 reply; 19+ messages in thread
From: Robert Richter @ 2011-09-28 10:31 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Peter Zijlstra, ying.huang, LKML, paulmck, avi, jeremy

On 23.09.11 15:17:13, Don Zickus wrote:
> @@ -89,6 +89,15 @@ static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
>  
>  		handled += a->handler(type, regs);
>  
> +		/*
> + 		 * Optimization: only loop once if this is not a 
> + 		 * back-to-back NMI.  The idea is nothing is dropped
> + 		 * on the first NMI, only on the second of a back-to-back
> + 		 * NMI.  No need to waste cycles going through all the
> + 		 * handlers.
> + 		 */
> +		if (!b2b && handled)
> +			break;

I don't think we can leave this in. As said, there are cases where 2
NMIs trigger but the handler is called only once. Only the first would
be handled then, and the second gets lost because there is no 2nd NMI
call.

>  	}
>  
>  	rcu_read_unlock();
> @@ -251,7 +260,13 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
>  {
>  	int handled;
>  
> -	handled = nmi_handle(NMI_UNKNOWN, regs);
> +	/*
> +	 * Use 'false' as back-to-back NMIs are dealt with one level up.
> +	 * Of course this makes having multiple 'unknown' handlers useless
> +	 * as only the first one is ever run (unless it can actually determine
> +	 * if it caused the NMI)
> +	 */
> +	handled = nmi_handle(NMI_UNKNOWN, regs, false);
>  	if (handled) 
>  		return;
>  #ifdef CONFIG_MCA
> @@ -274,19 +289,49 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
>  	pr_emerg("Dazed and confused, but trying to continue\n");
>  }
>  
> +static DEFINE_PER_CPU(bool, swallow_nmi);
> +static DEFINE_PER_CPU(unsigned long, last_nmi_rip);
> +
>  static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
>  {
>  	unsigned char reason = 0;
>  	int handled;
> +	bool b2b = false;
>  
>  	/*
>  	 * CPU-specific NMI must be processed before non-CPU-specific
>  	 * NMI, otherwise we may lose it, because the CPU-specific
>  	 * NMI can not be detected/processed on other CPUs.
>  	 */
> -	handled = nmi_handle(NMI_LOCAL, regs);
> -	if (handled)
> +
> +	/*
> +	 * Back-to-back NMIs are interesting because they can either
> +	 * be two NMI or more than two NMIs (any thing over two is dropped
> +	 * due to NMI being edge-triggered).  If this is the second half
> +	 * of the back-to-back NMI, assume we dropped things and process
> +	 * more handlers.  Otherwise reset the 'swallow' NMI behaviour
> +	 */
> +	if (regs->ip == __this_cpu_read(last_nmi_rip))
> +		b2b = true;
> +	else
> +		__this_cpu_write(swallow_nmi, false);
> +
> +	__this_cpu_write(last_nmi_rip, regs->ip);

Just a minor thing, and if you make a new version of this patch: you
could move the write to the else branch.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-09-28 10:31   ` Robert Richter
@ 2011-09-28 12:37     ` Don Zickus
  2011-09-28 16:51       ` Jeremy Fitzhardinge
  2011-10-02 10:07       ` Avi Kivity
  0 siblings, 2 replies; 19+ messages in thread
From: Don Zickus @ 2011-09-28 12:37 UTC (permalink / raw)
  To: Robert Richter
  Cc: x86, Andi Kleen, Peter Zijlstra, ying.huang, LKML, paulmck, avi, jeremy

On Wed, Sep 28, 2011 at 12:31:40PM +0200, Robert Richter wrote:
> On 23.09.11 15:17:13, Don Zickus wrote:
> > @@ -89,6 +89,15 @@ static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
> >  
> >  		handled += a->handler(type, regs);
> >  
> > +		/*
> > + 		 * Optimization: only loop once if this is not a 
> > + 		 * back-to-back NMI.  The idea is nothing is dropped
> > + 		 * on the first NMI, only on the second of a back-to-back
> > + 		 * NMI.  No need to waste cycles going through all the
> > + 		 * handlers.
> > + 		 */
> > +		if (!b2b && handled)
> > +			break;
> 
> I don't think we can leave this in. As said, there are cases where 2
> NMIs trigger but the handler is called only once. Only the first would
> be handled then, and the second gets lost because there is no 2nd NMI
> call.

Right.  Avi, Jeremy what was your objection that needed this optimization
in the first place?

> 
> >  	}
> >  
> >  	rcu_read_unlock();
> > @@ -251,7 +260,13 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
> >  {
> >  	int handled;
> >  
> > -	handled = nmi_handle(NMI_UNKNOWN, regs);
> > +	/*
> > +	 * Use 'false' as back-to-back NMIs are dealt with one level up.
> > +	 * Of course this makes having multiple 'unknown' handlers useless
> > +	 * as only the first one is ever run (unless it can actually determine
> > +	 * if it caused the NMI)
> > +	 */
> > +	handled = nmi_handle(NMI_UNKNOWN, regs, false);
> >  	if (handled) 
> >  		return;
> >  #ifdef CONFIG_MCA
> > @@ -274,19 +289,49 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
> >  	pr_emerg("Dazed and confused, but trying to continue\n");
> >  }
> >  
> > +static DEFINE_PER_CPU(bool, swallow_nmi);
> > +static DEFINE_PER_CPU(unsigned long, last_nmi_rip);
> > +
> >  static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> >  {
> >  	unsigned char reason = 0;
> >  	int handled;
> > +	bool b2b = false;
> >  
> >  	/*
> >  	 * CPU-specific NMI must be processed before non-CPU-specific
> >  	 * NMI, otherwise we may lose it, because the CPU-specific
> >  	 * NMI can not be detected/processed on other CPUs.
> >  	 */
> > -	handled = nmi_handle(NMI_LOCAL, regs);
> > -	if (handled)
> > +
> > +	/*
> > +	 * Back-to-back NMIs are interesting because they can either
> > +	 * be two NMIs or more than two NMIs (anything over two is dropped
> > +	 * due to NMIs being edge-triggered).  If this is the second half
> > +	 * of a back-to-back NMI, assume we dropped things and process
> > +	 * more handlers.  Otherwise, reset the 'swallow' NMI behaviour.
> > +	 */
> > +	if (regs->ip == __this_cpu_read(last_nmi_rip))
> > +		b2b = true;
> > +	else
> > +		__this_cpu_write(swallow_nmi, false);
> > +
> > +	__this_cpu_write(last_nmi_rip, regs->ip);
> 
> Just a minor thing, and if you make a new version of this patch: you
> could move the write to the else branch.

Ah, true.  Thanks.

Cheers,
Don

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-09-28 12:37     ` Don Zickus
@ 2011-09-28 16:51       ` Jeremy Fitzhardinge
  2011-09-28 17:44         ` Don Zickus
  2011-10-02 10:07       ` Avi Kivity
  1 sibling, 1 reply; 19+ messages in thread
From: Jeremy Fitzhardinge @ 2011-09-28 16:51 UTC (permalink / raw)
  To: Don Zickus
  Cc: Robert Richter, x86, Andi Kleen, Peter Zijlstra, ying.huang,
	LKML, paulmck, avi

On 09/28/2011 05:37 AM, Don Zickus wrote:
> On Wed, Sep 28, 2011 at 12:31:40PM +0200, Robert Richter wrote:
>> On 23.09.11 15:17:13, Don Zickus wrote:
>>> @@ -89,6 +89,15 @@ static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
>>>  
>>>  		handled += a->handler(type, regs);
>>>  
>>> +		/*
>>> + 		 * Optimization: only loop once if this is not a 
>>> + 		 * back-to-back NMI.  The idea is nothing is dropped
>>> + 		 * on the first NMI, only on the second of a back-to-back
>>> + 		 * NMI.  No need to waste cycles going through all the
>>> + 		 * handlers.
>>> + 		 */
>>> +		if (!b2b && handled)
>>> +			break;
>> I don't think we can leave this in. As said, there are cases where 2
>> NMIs trigger but the handler is called only once. Only the first would
>> be handled then, and the second gets lost because there is no 2nd NMI
>> call.
> Right.  Avi, Jeremy what was your objection that needed this optimization
> in the first place?

My only interest in the NMI code is its use of spinlocks, which seem
inappropriate.

    J

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-09-28 16:51       ` Jeremy Fitzhardinge
@ 2011-09-28 17:44         ` Don Zickus
  2011-09-28 17:52           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 19+ messages in thread
From: Don Zickus @ 2011-09-28 17:44 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Robert Richter, x86, Andi Kleen, Peter Zijlstra, ying.huang,
	LKML, paulmck, avi

On Wed, Sep 28, 2011 at 09:51:33AM -0700, Jeremy Fitzhardinge wrote:
> On 09/28/2011 05:37 AM, Don Zickus wrote:
> > On Wed, Sep 28, 2011 at 12:31:40PM +0200, Robert Richter wrote:
> >> On 23.09.11 15:17:13, Don Zickus wrote:
> >>> @@ -89,6 +89,15 @@ static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
> >>>  
> >>>  		handled += a->handler(type, regs);
> >>>  
> >>> +		/*
> >>> + 		 * Optimization: only loop once if this is not a 
> >>> + 		 * back-to-back NMI.  The idea is nothing is dropped
> >>> + 		 * on the first NMI, only on the second of a back-to-back
> >>> + 		 * NMI.  No need to waste cycles going through all the
> >>> + 		 * handlers.
> >>> + 		 */
> >>> +		if (!b2b && handled)
> >>> +			break;
> >> I don't think we can leave this in. As said, there are cases where 2
> >> NMIs trigger but the handler is called only once. Only the first would
> >> be handled then, and the second gets lost because there is no 2nd NMI
> >> call.
> > Right.  Avi, Jeremy what was your objection that needed this optimization
> > in the first place?
> 
> My only interest in the NMI code is its use of spinlocks, which seem
> inappropriate.

Right.  But I thought this was going to clash with your ticketed spinlock
stuff?

Cheers,
Don

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-09-28 17:44         ` Don Zickus
@ 2011-09-28 17:52           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 19+ messages in thread
From: Jeremy Fitzhardinge @ 2011-09-28 17:52 UTC (permalink / raw)
  To: Don Zickus
  Cc: Robert Richter, x86, Andi Kleen, Peter Zijlstra, ying.huang,
	LKML, paulmck, avi

On 09/28/2011 10:44 AM, Don Zickus wrote:
> On Wed, Sep 28, 2011 at 09:51:33AM -0700, Jeremy Fitzhardinge wrote:
>> On 09/28/2011 05:37 AM, Don Zickus wrote:
>>> On Wed, Sep 28, 2011 at 12:31:40PM +0200, Robert Richter wrote:
>>>> On 23.09.11 15:17:13, Don Zickus wrote:
>>>>> @@ -89,6 +89,15 @@ static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
>>>>>  
>>>>>  		handled += a->handler(type, regs);
>>>>>  
>>>>> +		/*
>>>>> + 		 * Optimization: only loop once if this is not a 
>>>>> + 		 * back-to-back NMI.  The idea is nothing is dropped
>>>>> + 		 * on the first NMI, only on the second of a back-to-back
>>>>> + 		 * NMI.  No need to waste cycles going through all the
>>>>> + 		 * handlers.
>>>>> + 		 */
>>>>> +		if (!b2b && handled)
>>>>> +			break;
>>>> I don't think we can leave this in. As said, there are cases where 2
>>>> NMIs trigger but the handler is called only once. Only the first would
>>>> be handled then, and the second gets lost because there is no 2nd NMI
>>>> call.
>>> Right.  Avi, Jeremy what was your objection that needed this optimization
>>> in the first place?
>> My only interest in the NMI code is its use of spinlocks, which seem
>> inappropriate.
> Right.  But I thought this was going to clash with your ticketed spinlock
> stuff?

To be honest I haven't really been following this thread; it didn't
appear to me to relate to the spinlock issue any more.  At least in the
Xen case, I don't *think* we'll be sending NMIs to domains which are
using pv ticketlocks, so the issue is moot.  It's more pressing for Avi
because he's thinking of using NMIs to implement pv ticketlocks in kvm.

    J

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 3/6] x86, nmi: wire up NMI handlers to new routines
  2011-09-23 19:17 ` [V6][PATCH 3/6] x86, nmi: wire up NMI handlers to new routines Don Zickus
@ 2011-09-30 14:25   ` Robert Richter
  0 siblings, 0 replies; 19+ messages in thread
From: Robert Richter @ 2011-09-30 14:25 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Andi Kleen, Peter Zijlstra, ying.huang, LKML, paulmck, avi,
	jeremy, Jason Wessel, Andi Kleen, Corey Minyard, Jack Steiner

On 23.09.11 15:17:12, Don Zickus wrote:
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 594d425..90ffec5 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1377,68 +1377,14 @@ struct pmu_nmi_state {
>  static DEFINE_PER_CPU(struct pmu_nmi_state, pmu_nmi);

Don, please remove this too and also struct pmu_nmi_state:

 arch/x86/kernel/cpu/perf_event.c:1066: warning: ‘pmu_nmi’ defined but not used

Thanks,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-09-28 12:37     ` Don Zickus
  2011-09-28 16:51       ` Jeremy Fitzhardinge
@ 2011-10-02 10:07       ` Avi Kivity
  2011-10-02 13:48         ` Andi Kleen
  2011-10-03 13:13         ` Don Zickus
  1 sibling, 2 replies; 19+ messages in thread
From: Avi Kivity @ 2011-10-02 10:07 UTC (permalink / raw)
  To: Don Zickus
  Cc: Robert Richter, x86, Andi Kleen, Peter Zijlstra, ying.huang,
	LKML, paulmck, jeremy

On 09/28/2011 03:37 PM, Don Zickus wrote:
> On Wed, Sep 28, 2011 at 12:31:40PM +0200, Robert Richter wrote:
> >  On 23.09.11 15:17:13, Don Zickus wrote:
> >  >  @@ -89,6 +89,15 @@ static int notrace __kprobes nmi_handle(unsigned int type, struct pt_regs *regs)
> >  >
> >  >   		handled += a->handler(type, regs);
> >  >
> >  >  +		/*
> >  >  + 		 * Optimization: only loop once if this is not a
> >  >  + 		 * back-to-back NMI.  The idea is nothing is dropped
> >  >  + 		 * on the first NMI, only on the second of a back-to-back
> >  >  + 		 * NMI.  No need to waste cycles going through all the
> >  >  + 		 * handlers.
> >  >  + 		 */
> >  >  +		if (!b2b&&  handled)
> >  >  +			break;
> >
> >  I don't think we can leave this in. As said, there are cases where 2
> >  NMIs trigger but the handler is called only once. Only the first would
> >  be handled then, and the second gets lost because there is no 2nd NMI
> >  call.
>
> Right.  Avi, Jeremy what was your objection that needed this optimization
> in the first place?
>

First, iterating over all NMI sources is going to be slow, and a lot 
more so in a guest.  This reduces the performance of perf.

Second, I wanted to use NMIs as a way of waking up a vcpu sleeping with 
interrupts disabled (in the context of Jeremy's paravirt spinlock 
patches).  Looks like we'll have to use paravirtualization for that.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-10-02 10:07       ` Avi Kivity
@ 2011-10-02 13:48         ` Andi Kleen
  2011-10-02 14:04           ` Avi Kivity
  2011-10-03 13:13         ` Don Zickus
  1 sibling, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2011-10-02 13:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Don Zickus, Robert Richter, x86, Andi Kleen, Peter Zijlstra,
	ying.huang, LKML, paulmck, jeremy

> Second, I wanted to use NMIs as a way of waking up a vcpu sleeping with 
> interrupts disabled (in the context of Jeremy's paravirt spinlock 
> patches).  Looks like we'll have to use paravirtualization for that.

Just use a mode of perf that doesn't use NMIs.
Interrupt-off profiling is rarely needed, if at all, on modern kernels.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-10-02 13:48         ` Andi Kleen
@ 2011-10-02 14:04           ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2011-10-02 14:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Don Zickus, Robert Richter, x86, Peter Zijlstra, ying.huang,
	LKML, paulmck, jeremy

On 10/02/2011 03:48 PM, Andi Kleen wrote:
> >  Second, I wanted to use NMIs as a way of waking up a vcpu sleeping with
> >  interrupts disabled (in the context of Jeremy's paravirt spinlock
> >  patches).  Looks like we'll have to use paravirtualization for that.
>
> Just use a mode of perf that doesn't use NMIs.
> Interrupt off profiling is rarely needed if at all on modern kernels.

I tend to agree.   But that's not my call.

(note 'perf kvm' will need adjustment if we make that change - likely 
need to capture the interrupt during vmexit and dispatch it manually; 
not sure if it's workable for svm at all)

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [V6][PATCH 4/6] x86, nmi:  add in logic to handle multiple events and unknown NMIs
  2011-10-02 10:07       ` Avi Kivity
  2011-10-02 13:48         ` Andi Kleen
@ 2011-10-03 13:13         ` Don Zickus
  1 sibling, 0 replies; 19+ messages in thread
From: Don Zickus @ 2011-10-03 13:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Robert Richter, x86, Andi Kleen, Peter Zijlstra, ying.huang,
	LKML, paulmck, jeremy

On Sun, Oct 02, 2011 at 12:07:16PM +0200, Avi Kivity wrote:
> >Right.  Avi, Jeremy what was your objection that needed this optimization
> >in the first place?
> >
> 
> First, iterating over all NMI sources is going to be slow, and a lot
> more so in a guest.  This reduces the performance of perf.

I understand and agree.  The only positive I can offer is that most machines
will most likely only have two NMI handlers registered: perf and
arch_backtrace.  So any slowness should be minimal.  We might even be able
to code up arch_backtrace to register its NMI handler only when it is called,
to minimize the number of NMI sources even more.
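
Something along these lines, purely as a sketch (the
register_nmi_handler() signature is assumed from this series; the
handler body and the 100ms wait are made up for illustration):

	/* hypothetical on-demand registration of the backtrace handler */
	static int arch_bt_handler(unsigned int type, struct pt_regs *regs)
	{
		dump_stack();	/* dump the CPU that took this NMI */
		return 1;	/* claim the NMI as handled */
	}

	void arch_trigger_all_cpu_backtrace(void)
	{
		register_nmi_handler(NMI_LOCAL, arch_bt_handler, 0, "arch_bt");
		apic->send_IPI_all(NMI_VECTOR);	/* NMI every CPU */
		mdelay(100);			/* give the dumps time to finish */
		unregister_nmi_handler(NMI_LOCAL, "arch_bt");
	}

That would leave perf as the only handler on the NMI_LOCAL list outside
of an explicit backtrace request.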

Cheers,
Don

> 
> Second, I wanted to use NMIs as a way of waking up a vcpu sleeping
> with interrupts disabled (in the context of Jeremy's paravirt
> spinlock patches).  Looks like we'll have to use paravirtualization
> for that.


> 
> -- 
> error compiling committee.c: too many arguments to function
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-10-03 13:14 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-23 19:17 [V6][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
2011-09-23 19:17 ` [V6][PATCH 1/6] x86, nmi: split out nmi from traps.c Don Zickus
2011-09-23 19:17 ` [V6][PATCH 2/6] x86, nmi: create new NMI handler routines Don Zickus
2011-09-23 19:17 ` [V6][PATCH 3/6] x86, nmi: wire up NMI handlers to new routines Don Zickus
2011-09-30 14:25   ` Robert Richter
2011-09-23 19:17 ` [V6][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs Don Zickus
2011-09-26  8:59   ` Peter Zijlstra
2011-09-26 12:55     ` Don Zickus
2011-09-28 10:31   ` Robert Richter
2011-09-28 12:37     ` Don Zickus
2011-09-28 16:51       ` Jeremy Fitzhardinge
2011-09-28 17:44         ` Don Zickus
2011-09-28 17:52           ` Jeremy Fitzhardinge
2011-10-02 10:07       ` Avi Kivity
2011-10-02 13:48         ` Andi Kleen
2011-10-02 14:04           ` Avi Kivity
2011-10-03 13:13         ` Don Zickus
2011-09-23 19:17 ` [V6][PATCH 5/6] x86, nmi: track NMI usage stats Don Zickus
2011-09-23 19:17 ` [V6][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts Don Zickus
