linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups
@ 2014-05-07 15:34 Don Zickus
  2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
  To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus

I started this patch by fixing a performance problem with the GHES
NMI handler and then things evolved to more patches as I was poking
around in the code.

The main focus was moving the GHES NMI driver to its own NMI subtype
to avoid slowing down perf handling.  Then I decided to move the
default external NMI handler to its own routine.  Then finally, I
needed to see which NMI handlers were registered so I hacked up
/proc/interrupts to show me.

Tested mostly on HP boxes that have GHES enabled and having the iLO
send NMIs to panic the box (using hpwdt driver).  Ran perf on other
GHES enabled boxes to test performance results.

Update: I realized I didn't send this to lkml initially.  That might be
why no one has commented on it.

Don Zickus (5):
  x86, nmi:  Add new nmi type 'external'
  x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and
    'panic_on_io_nmi'
  x86, nmi: Remove 'reason' value from unknown nmi output
  x86, nmi: Move default external NMI handler to its own routine
  x86, nmi: Add better NMI stats to /proc/interrupts and show handlers

 Documentation/kernel-parameters.txt |    9 ++
 arch/x86/include/asm/nmi.h          |    5 +
 arch/x86/kernel/irq.c               |    3 +
 arch/x86/kernel/nmi.c               |  199 ++++++++++++++++++++++++++--------
 drivers/acpi/apei/ghes.c            |    4 +-
 drivers/watchdog/hpwdt.c            |   24 +++--
 6 files changed, 187 insertions(+), 57 deletions(-)



* [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
  2014-05-07 15:38   ` Ingo Molnar
  2014-05-07 15:34 ` [PATCH 2/5] x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and 'panic_on_io_nmi' Don Zickus
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
  To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus

I noticed when debugging a perf problem on a machine with GHES enabled
that perf seemed slow.  I then realized that the GHES NMI routine was
taking a global lock all the time to inspect the hardware.  This contended
with all the local perf counters, which did not need a lock.  So every cpu
was accidentally serializing against all the others when using perf.

This is because of the way the nmi handler works.  It executes all the
handlers registered to a particular subtype (to deal with nmi sharing).
As a result the GHES handler was executed on every PMI.

Fix this by creating a new nmi type called NMI_EXT, which is used by
handlers that need to probe external hardware and require a global lock
to do so.

Now the main NMI handler can check the internal NMI handlers first and
then the external ones if nothing is found.

This makes perf a little faster again on those machines with GHES enabled.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h |    1 +
 arch/x86/kernel/nmi.c      |   41 ++++++++++++++++++++++++++---------------
 drivers/acpi/apei/ghes.c   |    4 ++--
 drivers/watchdog/hpwdt.c   |   10 ++--------
 4 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 86f9301..d6d26fa 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -24,6 +24,7 @@ extern int unknown_nmi_panic;
 
 enum {
 	NMI_LOCAL=0,
+	NMI_EXT,
 	NMI_UNKNOWN,
 	NMI_SERR,
 	NMI_IO_CHECK,
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 6fcb49c..72fcff7 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -34,29 +34,32 @@
 #include <trace/events/nmi.h>
 
 struct nmi_desc {
-	spinlock_t lock;
-	struct list_head head;
+	spinlock_t		lock;
+	struct list_head	head;
 };
 
 static struct nmi_desc nmi_desc[NMI_MAX] = 
 {
-	{
-		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[0].lock),
-		.head = LIST_HEAD_INIT(nmi_desc[0].head),
+	[NMI_LOCAL] = {
+		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_LOCAL].lock),
+		.head = LIST_HEAD_INIT(nmi_desc[NMI_LOCAL].head),
 	},
-	{
-		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[1].lock),
-		.head = LIST_HEAD_INIT(nmi_desc[1].head),
+	[NMI_EXT] = {
+		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_EXT].lock),
+		.head = LIST_HEAD_INIT(nmi_desc[NMI_EXT].head),
 	},
-	{
-		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[2].lock),
-		.head = LIST_HEAD_INIT(nmi_desc[2].head),
+	[NMI_UNKNOWN] = {
+		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_UNKNOWN].lock),
+		.head = LIST_HEAD_INIT(nmi_desc[NMI_UNKNOWN].head),
 	},
-	{
-		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[3].lock),
-		.head = LIST_HEAD_INIT(nmi_desc[3].head),
+	[NMI_SERR] = {
+		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_SERR].lock),
+		.head = LIST_HEAD_INIT(nmi_desc[NMI_SERR].head),
+	},
+	[NMI_IO_CHECK] = {
+		.lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_IO_CHECK].lock),
+		.head = LIST_HEAD_INIT(nmi_desc[NMI_IO_CHECK].head),
 	},
-
 };
 
 struct nmi_stats {
@@ -317,8 +320,16 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 
 	__this_cpu_write(last_nmi_rip, regs->ip);
 
+	/* try local NMIs first */
 	handled = nmi_handle(NMI_LOCAL, regs, b2b);
 	__this_cpu_add(nmi_stats.normal, handled);
+
+	/* if unclaimed, try external NMIs next */
+	if (!handled) {
+		handled = nmi_handle(NMI_EXT, regs, b2b);
+		__this_cpu_add(nmi_stats.external, handled);
+	}
+
 	if (handled) {
 		/*
 		 * There are cases when a NMI handler handles multiple
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index dab7cb7..8b78bf5 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -976,7 +976,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		ghes_estatus_pool_expand(len);
 		mutex_lock(&ghes_list_mutex);
 		if (list_empty(&ghes_nmi))
-			register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0,
+			register_nmi_handler(NMI_EXT, ghes_notify_nmi, 0,
 						"ghes");
 		list_add_rcu(&ghes->list, &ghes_nmi);
 		mutex_unlock(&ghes_list_mutex);
@@ -1025,7 +1025,7 @@ static int ghes_remove(struct platform_device *ghes_dev)
 		mutex_lock(&ghes_list_mutex);
 		list_del_rcu(&ghes->list);
 		if (list_empty(&ghes_nmi))
-			unregister_nmi_handler(NMI_LOCAL, "ghes");
+			unregister_nmi_handler(NMI_EXT, "ghes");
 		mutex_unlock(&ghes_list_mutex);
 		/*
 		 * To synchronize with NMI handler, ghes can only be
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 2b75e8b..9617a4d 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -737,12 +737,9 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
 	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout, 0, "hpwdt");
 	if (retval)
 		goto error;
-	retval = register_nmi_handler(NMI_SERR, hpwdt_pretimeout, 0, "hpwdt");
+	retval = register_nmi_handler(NMI_EXT, hpwdt_pretimeout, 0, "hpwdt");
 	if (retval)
 		goto error1;
-	retval = register_nmi_handler(NMI_IO_CHECK, hpwdt_pretimeout, 0, "hpwdt");
-	if (retval)
-		goto error2;
 
 	dev_info(&dev->dev,
 			"HP Watchdog Timer Driver: NMI decoding initialized"
@@ -750,8 +747,6 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
 			(allow_kdump == 0) ? "OFF" : "ON");
 	return 0;
 
-error2:
-	unregister_nmi_handler(NMI_SERR, "hpwdt");
 error1:
 	unregister_nmi_handler(NMI_UNKNOWN, "hpwdt");
 error:
@@ -766,8 +761,7 @@ error:
 static void hpwdt_exit_nmi_decoding(void)
 {
 	unregister_nmi_handler(NMI_UNKNOWN, "hpwdt");
-	unregister_nmi_handler(NMI_SERR, "hpwdt");
-	unregister_nmi_handler(NMI_IO_CHECK, "hpwdt");
+	unregister_nmi_handler(NMI_EXT, "hpwdt");
 	if (cru_rom_addr)
 		iounmap(cru_rom_addr);
 }
-- 
1.7.1



* [PATCH 2/5] x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and 'panic_on_io_nmi'
  2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
  2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
  2014-05-07 15:34 ` [PATCH 3/5] x86, nmi: Remove 'reason' value from unknown nmi output Don Zickus
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
  To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus

These options are accessible through /proc/sys/kernel but not on the command
line.  The main use is for on-board controllers (iLO, DRAC, BMC) to be able
to issue an external NMI to bring down a hung box.

This just makes configuring a box a little easier.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 Documentation/kernel-parameters.txt |    9 +++++++++
 arch/x86/kernel/nmi.c               |   14 ++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7116fda..c9da1a7 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2313,6 +2313,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			timeout < 0: reboot immediately
 			Format: <timeout>
 
+	panic_on_unrecovered_nmi [X86]
+			Force a machine to panic if an unrecoverable NMI is
+			unclaimed.  This covers SERR or UNKNOWN NMI cases.
+
+	panic_on_io_nmi [X86]
+			Force a machine to panic if an IO NMI is unclaimed.
+			This covers external NMIs with no handlers associated
+			with them.
+
 	parkbd.port=	[HW] Parallel port number the keyboard adapter is
 			connected to, default is 0.
 			Format: <parport#>
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 72fcff7..f3f4c43 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -87,6 +87,20 @@ static int __init setup_unknown_nmi_panic(char *str)
 }
 __setup("unknown_nmi_panic", setup_unknown_nmi_panic);
 
+static int __init setup_panic_on_unrecovered_nmi(char *str)
+{
+	panic_on_unrecovered_nmi = 1;
+	return 1;
+}
+__setup("panic_on_unrecovered_nmi", setup_panic_on_unrecovered_nmi);
+
+static int __init setup_panic_on_io_nmi(char *str)
+{
+	panic_on_io_nmi = 1;
+	return 1;
+}
+__setup("panic_on_io_nmi", setup_panic_on_io_nmi);
+
 #define nmi_to_desc(type) (&nmi_desc[type])
 
 static u64 nmi_longest_ns = 1 * NSEC_PER_MSEC;
-- 
1.7.1



* [PATCH 3/5] x86, nmi: Remove 'reason' value from unknown nmi output
  2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
  2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
  2014-05-07 15:34 ` [PATCH 2/5] x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and 'panic_on_io_nmi' Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
  2014-05-07 15:34 ` [PATCH 4/5] x86, nmi: Move default external NMI handler to its own routine Don Zickus
  2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
  4 siblings, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
  To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus

The 'reason' value comes from the io port for the external NMI.

We already have handlers that deal with a non-zero 'reason'.  Providing
it in the output of an unknown NMI doesn't add much.

In addition, it should lower the number of questions I get when
customers keep attaching new unknown NMI panic reports because the
reason value changes (which is expected, because part of it is a
timer bit that toggles all the time).

However, the main reason for this change is the next patch, which
will move the external NMI check to another function.  Hence the
reason value will no longer be available.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/nmi.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index f3f4c43..be2d622 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -277,7 +277,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 }
 
 static __kprobes void
-unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
+unknown_nmi_error(struct pt_regs *regs)
 {
 	int handled;
 
@@ -295,8 +295,8 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 
 	__this_cpu_add(nmi_stats.unknown, 1);
 
-	pr_emerg("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
-		 reason, smp_processor_id());
+	pr_emerg("Uhhuh. NMI received for unknown reason on CPU %d.\n",
+		 smp_processor_id());
 
 	pr_emerg("Do you have a strange power saving mode enabled?\n");
 	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
@@ -413,7 +413,7 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 	if (b2b && __this_cpu_read(swallow_nmi))
 		__this_cpu_add(nmi_stats.swallow, 1);
 	else
-		unknown_nmi_error(reason, regs);
+		unknown_nmi_error(regs);
 }
 
 /*
-- 
1.7.1



* [PATCH 4/5] x86, nmi: Move default external NMI handler to its own routine
  2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
                   ` (2 preceding siblings ...)
  2014-05-07 15:34 ` [PATCH 3/5] x86, nmi: Remove 'reason' value from unknown nmi output Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
  2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
  4 siblings, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
  To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus, thomas.mingarelli

Now that we have set up an NMI subtype called NMI_EXT, there is really
no need to hard code the default external NMI handler in the main
nmi handler routine.

Move it to a proper function and register it on boot.  This change is
just code movement.

In addition, update hpwdt so that it unregisters the default handler
when it registers its own (and re-registers it on removal).  This allows
the driver to take control of that io port (which it ultimately wanted
to do originally), but in a cleaner way.

Tested by HP to make sure I didn't break anything. :-)

Cc: thomas.mingarelli@hp.com
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h |    3 ++
 arch/x86/kernel/nmi.c      |   79 ++++++++++++++++++++++++++++---------------
 drivers/watchdog/hpwdt.c   |   14 ++++++++
 3 files changed, 68 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index d6d26fa..47e8cff 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -61,4 +61,7 @@ void stop_nmi(void);
 void restart_nmi(void);
 void local_touch_nmi(void);
 
+int register_nmi_default_external_handler(void);
+void unregister_nmi_default_external_handler(void);
+
 #endif /* _ASM_X86_NMI_H */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index be2d622..7359e53 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -74,11 +74,6 @@ static DEFINE_PER_CPU(struct nmi_stats, nmi_stats);
 static int ignore_nmis;
 
 int unknown_nmi_panic;
-/*
- * Prevent NMI reason port (0x61) being accessed simultaneously, can
- * only be used in NMI handler.
- */
-static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
 
 static int __init setup_unknown_nmi_panic(char *str)
 {
@@ -310,7 +305,6 @@ static DEFINE_PER_CPU(unsigned long, last_nmi_rip);
 
 static __kprobes void default_do_nmi(struct pt_regs *regs)
 {
-	unsigned char reason = 0;
 	int handled;
 	bool b2b = false;
 
@@ -358,28 +352,6 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 		return;
 	}
 
-	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
-	raw_spin_lock(&nmi_reason_lock);
-	reason = x86_platform.get_nmi_reason();
-
-	if (reason & NMI_REASON_MASK) {
-		if (reason & NMI_REASON_SERR)
-			pci_serr_error(reason, regs);
-		else if (reason & NMI_REASON_IOCHK)
-			io_check_error(reason, regs);
-#ifdef CONFIG_X86_32
-		/*
-		 * Reassert NMI in case it became active
-		 * meanwhile as it's edge-triggered:
-		 */
-		reassert_nmi();
-#endif
-		__this_cpu_add(nmi_stats.external, 1);
-		raw_spin_unlock(&nmi_reason_lock);
-		return;
-	}
-	raw_spin_unlock(&nmi_reason_lock);
-
 	/*
 	 * Only one NMI can be latched at a time.  To handle
 	 * this we may process multiple nmi handlers at once to
@@ -417,6 +389,57 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 }
 
 /*
+ * Prevent NMI reason port (0x61) being accessed simultaneously, can
+ * only be used in NMI handler.
+ */
+static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
+
+static int __kprobes
+nmi_default_external_handler(unsigned int cmd, struct pt_regs *regs)
+{
+	unsigned char reason = 0;
+	int ret = NMI_DONE;
+
+	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
+	raw_spin_lock(&nmi_reason_lock);
+	reason = x86_platform.get_nmi_reason();
+
+	if (reason & NMI_REASON_MASK) {
+		if (reason & NMI_REASON_SERR)
+			pci_serr_error(reason, regs);
+		else if (reason & NMI_REASON_IOCHK)
+			io_check_error(reason, regs);
+#ifdef CONFIG_X86_32
+		/*
+		 * Reassert NMI in case it became active
+		 * meanwhile as it's edge-triggered:
+		 */
+		reassert_nmi();
+#endif
+		ret = NMI_HANDLED;
+	}
+	raw_spin_unlock(&nmi_reason_lock);
+
+	return ret;
+}
+
+int register_nmi_default_external_handler(void)
+{
+	register_nmi_handler(NMI_EXT, nmi_default_external_handler,
+				0, "plat");
+
+	return 0;
+}
+EXPORT_SYMBOL(register_nmi_default_external_handler);
+early_initcall(register_nmi_default_external_handler);
+
+void unregister_nmi_default_external_handler(void)
+{
+	unregister_nmi_handler(NMI_EXT, "plat");
+}
+EXPORT_SYMBOL(unregister_nmi_default_external_handler);
+
+/*
  * NMIs can hit breakpoints which will cause it to lose its
  * NMI context with the CPU when the breakpoint does an iret.
  */
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 9617a4d..4de85e0 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -38,6 +38,7 @@
 #include <asm/cacheflush.h>
 #endif /* CONFIG_HPWDT_NMI_DECODING */
 #include <asm/nmi.h>
+#include <asm/mach_traps.h>
 
 #define HPWDT_VERSION			"1.3.3"
 #define SECS_TO_TICKS(secs)		((secs) * 1000 / 128)
@@ -487,6 +488,15 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
 		goto out;
 
 	spin_lock_irqsave(&rom_lock, rom_pl);
+	if (ulReason == NMI_EXT) {
+		unsigned char reason = 0;
+		reason = x86_platform.get_nmi_reason();
+
+		if (!(reason & NMI_REASON_MASK)) {
+			spin_unlock_irqrestore(&rom_lock, rom_pl);
+			goto out;
+		}
+	}
 	if (!die_nmi_called && !is_icru && !is_uefi)
 		asminline_call(&cmn_regs, cru_rom_addr);
 	die_nmi_called = 1;
@@ -741,6 +751,9 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
 	if (retval)
 		goto error1;
 
+	/* hpwdt is the new external port 0x61 handler */
+	unregister_nmi_default_external_handler();
+
 	dev_info(&dev->dev,
 			"HP Watchdog Timer Driver: NMI decoding initialized"
 			", allow kernel dump: %s (default = 0/OFF)\n",
@@ -762,6 +775,7 @@ static void hpwdt_exit_nmi_decoding(void)
 {
 	unregister_nmi_handler(NMI_UNKNOWN, "hpwdt");
 	unregister_nmi_handler(NMI_EXT, "hpwdt");
+	register_nmi_default_external_handler();
 	if (cru_rom_addr)
 		iounmap(cru_rom_addr);
 }
-- 
1.7.1



* [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
  2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
                   ` (3 preceding siblings ...)
  2014-05-07 15:34 ` [PATCH 4/5] x86, nmi: Move default external NMI handler to its own routine Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
  2014-05-07 15:42   ` Ingo Molnar
  2014-05-07 19:50   ` Elliott, Robert (Server Storage)
  4 siblings, 2 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
  To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus

The main reason for this patch is because I have a hard time knowing
what NMI handlers are registered on the system when debugging NMI issues.

This info is provided in /proc/interrupts for interrupt handlers, so I
added support for NMI stuff too.  As a bonus it provides stat breakdowns
much like the interrupts.

The only ugly issue is how to label NMI subtypes using only 3 letters
and still make it obvious they are part of the NMI.  Adding a /proc/nmi
seemed overkill, so I chose to indent things by one space.  Sample
output is below:

[root@dhcp71-248 ~]# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:         29          0          0          0  IR-IO-APIC-edge      timer
<snip>
NMI:         20        774      10986       4227   Non-maskable interrupts
 LOC:         21        775      10987       4228  Local     PMI, arch_bt
 EXT:          0          0          0          0  External  plat
 UNK:          0          0          0          0  Unknown
 SWA:          0          0          0          0  Swallowed
LOC:      30374      24749      20795      15095   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:         20        774      10986       4227   Performance monitoring interrupts
<snip>

The extra 'local' NMI count represents the number of NMIs 'handled',
whereas the general NMI count represents how many actual NMIs were
processed.  IOW, two NMIs can come in at once and be handled during
one call.

I am open to better suggestions.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/nmi.h |    1 +
 arch/x86/kernel/irq.c      |    3 ++
 arch/x86/kernel/nmi.c      |   57 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 47e8cff..47706af 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -60,6 +60,7 @@ void unregister_nmi_handler(unsigned int, const char *);
 void stop_nmi(void);
 void restart_nmi(void);
 void local_touch_nmi(void);
+void nmi_show_interrupts(struct seq_file *, int);
 
 int register_nmi_default_external_handler(void);
 void unregister_nmi_default_external_handler(void);
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index d99f31d..520359c 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -17,6 +17,7 @@
 #include <asm/idle.h>
 #include <asm/mce.h>
 #include <asm/hw_irq.h>
+#include <asm/nmi.h>
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/irq_vectors.h>
@@ -59,6 +60,8 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 	for_each_online_cpu(j)
 		seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
 	seq_printf(p, "  Non-maskable interrupts\n");
+	nmi_show_interrupts(p, prec);
+
 #ifdef CONFIG_X86_LOCAL_APIC
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 7359e53..8741933 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -388,6 +388,63 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 		unknown_nmi_error(regs);
 }
 
+static void print_nmi_action_name(struct seq_file *p, int type)
+{
+	struct nmi_desc *desc;
+	struct nmiaction *a;
+
+	rcu_read_lock();
+
+	desc = nmi_to_desc(type);
+
+	if (!(list_empty(&desc->head))) {
+
+		a = list_entry_rcu(desc->head.next, struct nmiaction, list);
+		seq_printf(p, "  %s", a->name);
+
+		list_for_each_entry_continue_rcu(a, &desc->head, list)
+			seq_printf(p, ", %s", a->name);
+
+	}
+	seq_puts(p, "\n");
+
+	rcu_read_unlock();
+}
+
+void nmi_show_interrupts(struct seq_file *p, int prec)
+{
+	int j;
+	int indent = prec + 1;
+
+#define get_nmi_stats(j)	(&per_cpu(nmi_stats, j))
+
+	seq_printf(p, "%*s: ", indent, "LOC");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", get_nmi_stats(j)->normal);
+	seq_printf(p, " %-8s", "Local");
+
+	print_nmi_action_name(p, NMI_LOCAL);
+
+	seq_printf(p, "%*s: ", indent, "EXT");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", get_nmi_stats(j)->external);
+	seq_printf(p, " %-8s", "External");
+
+	print_nmi_action_name(p, NMI_EXT);
+
+	seq_printf(p, "%*s: ", indent, "UNK");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", get_nmi_stats(j)->unknown);
+	seq_printf(p, " %-8s", "Unknown");
+
+	print_nmi_action_name(p, NMI_UNKNOWN);
+
+	seq_printf(p, "%*s: ", indent, "SWA");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", get_nmi_stats(j)->swallow);
+	seq_printf(p, " %-8s\n", "Swallowed");
+}
+
 /*
  * Prevent NMI reason port (0x61) being accessed simultaneously, can
  * only be used in NMI handler.
-- 
1.7.1



* Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
@ 2014-05-07 15:38   ` Ingo Molnar
  2014-05-07 16:02     ` Don Zickus
  0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2014-05-07 15:38 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker, Steven Rostedt


* Don Zickus <dzickus@redhat.com> wrote:

> I noticed when debugging a perf problem on a machine with GHES enabled,
> perf seemed slow.  I then realized that the GHES NMI routine was taking
> a global lock all the time to inspect the hardware.  This contended
> with all the local perf counters, which did not need a lock.  So every cpu
> was accidentally serializing against all the others when using perf.
> 
> This is because of the way the nmi handler works.  It executes all the handlers
> registered to a particular subtype (to deal with nmi sharing).  As a result
> the GHES handler was executed on every PMI.
> 
> Fix this by creating a new nmi type called NMI_EXT, which is used by
> handlers that need to probe external hardware and require a global lock
> to do so.
> 
> Now the main NMI handler can check the internal NMI handlers first and
> then the external ones if nothing is found.
> 
> This makes perf a little faster again on those machines with GHES enabled.

So what happens if GHES asserts an NMI at the same time a PMI 
triggers?

If the perf PMI executes and indicates that it has handled something, 
we don't execute the GHES handler, right? Will the GHES re-trigger the 
NMI after we return?

Thanks,

	Ingo


* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
  2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
@ 2014-05-07 15:42   ` Ingo Molnar
  2014-05-07 16:04     ` Don Zickus
  2014-05-07 19:50   ` Elliott, Robert (Server Storage)
  1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2014-05-07 15:42 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker


* Don Zickus <dzickus@redhat.com> wrote:

> The main reason for this patch is because I have a hard time knowing
> what NMI handlers are registered on the system when debugging NMI issues.
> 
> This info is provided in /proc/interrupts for interrupt handlers, so I
> added support for NMI stuff too.  As a bonus it provides stat breakdowns
> much like the interrupts.
> 
> The only ugly issue is how to label NMI subtypes using only 3 letters
> and still make it obvious they are part of the NMI.  Adding a /proc/nmi
> seemed overkill, so I chose to indent things by one space.  Sample
> output is below:
> 
> [root@dhcp71-248 ~]# cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   0:         29          0          0          0  IR-IO-APIC-edge      timer
> <snip>
> NMI:         20        774      10986       4227   Non-maskable interrupts
>  LOC:         21        775      10987       4228  Local     PMI, arch_bt
>  EXT:          0          0          0          0  External  plat
>  UNK:          0          0          0          0  Unknown
>  SWA:          0          0          0          0  Swallowed
> LOC:      30374      24749      20795      15095   Local timer interrupts
> SPU:          0          0          0          0   Spurious interrupts
> PMI:         20        774      10986       4227   Performance monitoring interrupts
> <snip>

Looks pretty useful!

The lost vertical alignment of the counts is a problem though IMHO.

Thanks,

	Ingo


* Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-07 15:38   ` Ingo Molnar
@ 2014-05-07 16:02     ` Don Zickus
  2014-05-07 16:27       ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-07 16:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker, Steven Rostedt

On Wed, May 07, 2014 at 05:38:54PM +0200, Ingo Molnar wrote:
> 
> * Don Zickus <dzickus@redhat.com> wrote:
> 
> > I noticed when debugging a perf problem on a machine with GHES enabled,
> > perf seemed slow.  I then realized that the GHES NMI routine was taking
> > a global lock all the time to inspect the hardware.  This contended
> > with all the local perf counters, which did not need a lock.  So every cpu
> > was accidentally serializing against all the others when using perf.
> > 
> > This is because of the way the nmi handler works.  It executes all the handlers
> > registered to a particular subtype (to deal with nmi sharing).  As a result
> > the GHES handler was executed on every PMI.
> > 
> > Fix this by creating a new nmi type called NMI_EXT, which is used by
> > handlers that need to probe external hardware and require a global lock
> > to do so.
> > 
> > Now the main NMI handler can check the internal NMI handlers first and
> > then the external ones if nothing is found.
> > 
> > This makes perf a little faster again on those machines with GHES enabled.
> 
> So what happens if GHES asserts an NMI at the same time a PMI 
> triggers?
> 
> If the perf PMI executes and indicates that it has handled something, 
> we don't execute the GHES handler, right? Will the GHES re-trigger the 
> NMI after we return?

In my head, I had thought they would be queued up and things would work
out fine.  But I guess in theory, if a PMI comes in and then, before the
cpu can accept it, a GHES NMI comes in, it may get dropped.  That would
not be good.  Though the race window would be very small.

I don't have a good idea how to handle that.

On the flip side, we have the exact same problem today with the other
common external NMIs (SERR, IO).  If a PCI SERR comes in at the same time
as a PMI, then it gets dropped.  Worse, it doesn't get re-enabled and
blocks future SERRs (I just found this out two weeks ago because of a dirty
perf status register on boot).

Again, I don't have a solution that juggles PMI performance and
reliable delivery.  We could do away with the spinlocks and go back to
single cpu delivery (like it used to be), then devise a mechanism to
switch delivery to another cpu upon hotplug.

Thoughts?

Cheers,
Don


* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
  2014-05-07 15:42   ` Ingo Molnar
@ 2014-05-07 16:04     ` Don Zickus
  2014-05-07 16:30       ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-07 16:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker

On Wed, May 07, 2014 at 05:42:31PM +0200, Ingo Molnar wrote:
> 
> * Don Zickus <dzickus@redhat.com> wrote:
> 
> > The main reason for this patch is because I have a hard time knowing
> > what NMI handlers are registered on the system when debugging NMI issues.
> > 
> > This info is provided in /proc/interrupts for interrupt handlers, so I
> > added support for NMI stuff too.  As a bonus it provides stat breakdowns
> > much like the interrupts.
> > 
> > The only ugly issue is how to label NMI subtypes using only 3 letters
> > and still make it obvious it is part of the NMI.  Adding a /proc/nmi
> > seemed overkill, so I chose to indent things by one space.  Sample
> > output is below:
> > 
> > [root@dhcp71-248 ~]# cat /proc/interrupts
> >            CPU0       CPU1       CPU2       CPU3
> >   0:         29          0          0          0  IR-IO-APIC-edge      timer
> > <snip>
> > NMI:         20        774      10986       4227   Non-maskable interrupts
> >  LOC:         21        775      10987       4228  Local     PMI, arch_bt
> >  EXT:          0          0          0          0  External  plat
> >  UNK:          0          0          0          0  Unknown
> >  SWA:          0          0          0          0  Swallowed
> > LOC:      30374      24749      20795      15095   Local timer interrupts
> > SPU:          0          0          0          0   Spurious interrupts
> > PMI:         20        774      10986       4227   Performance monitoring interrupts
> > <snip>
> 
> Looks pretty useful!
> 
> The lost vertical alignment of the counts is a problem though IMHO.

Agreed!  It wasn't obvious to me at the time how to keep that alignment
while only changing the first column.  I can look at it again, if you are
ok with the small shift in the first column.  Otherwise alternate ideas
are welcome. :-)

Cheers,
Don

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-07 16:02     ` Don Zickus
@ 2014-05-07 16:27       ` Ingo Molnar
  2014-05-07 16:48         ` Don Zickus
  2014-05-08 16:33         ` Don Zickus
  0 siblings, 2 replies; 21+ messages in thread
From: Ingo Molnar @ 2014-05-07 16:27 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker, Steven Rostedt


* Don Zickus <dzickus@redhat.com> wrote:

> On Wed, May 07, 2014 at 05:38:54PM +0200, Ingo Molnar wrote:
> > 
> > * Don Zickus <dzickus@redhat.com> wrote:
> > 
> > > I noticed when debugging a perf problem on a machine with GHES enabled,
> > > perf seemed slow.  I then realized that the GHES NMI routine was taking
> > > a global lock all the time to inspect the hardware.  This contended
> > > with all the local perf counters which did not need a lock.  So each cpu
> > > accidentally was synchronizing with itself when using perf.
> > > 
> > > This is because of the way the nmi handler works.  It executes all the handlers
> > > registered to a particular subtype (to deal with nmi sharing).  As a result
> > > the GHES handler was executed on every PMI.
> > > 
> > > Fix this by creating a new nmi type called NMI_EXT, which is used by
> > > handlers that need to probe external hardware and require a global lock
> > > to do so.
> > > 
> > > Now the main NMI handler can check the internal NMI handlers first and
> > > then the external ones if nothing is found.
> > > 
> > > This makes perf a little faster again on those machines with GHES enabled.
> > 
> > So what happens if GHES asserts an NMI at the same time a PMI 
> > triggers?
> > 
> > If the perf PMI executes and indicates that it has handled something, 
> > we don't execute the GHES handler, right? Will the GHES re-trigger the 
> > NMI after we return?
> 
> In my head, I had thought they would be queued up and things work 
> out fine. [...]

x86 NMIs are generally edge triggered.

> [...]  But I guess in theory, if a PMI NMI comes in and, before the 
> cpu can accept it, a GHES NMI comes in, then one of them may get 
> dropped.  That would not be good.  Though the race window would be 
> very small.
> 
> I don't have a good idea how to handle that.

Well, are GHES NMIs reasserted if they are not handled? I don't know 
but there's a definite answer to that hardware behavior question.

> On the flip side, we have the same exact problem, today, with the 
> other common external NMIs (SERR, IO).  If a PCI SERR comes in at 
> the same time as a PMI, then it gets dropped.  Worse, it doesn't get 
> re-enabled and blocks future SERRs (just found this out two weeks 
> ago because of a dirty perf status register on boot).
> 
> Again, I don't have a solution to juggle between PMI performance and 
> reliable delivery.  We could do away with the spinlocks and go back 
> to single cpu delivery (like it used to be).  Then devise a 
> mechanism to switch delivery to another cpu upon hotplug.
> 
> Thoughts?

I'd say we should do a delayed timer that makes sure that all possible 
handlers are polled after an NMI is triggered, but never at a high 
rate.

Then simply return early the moment an NMI handler indicates that 
there was an event handled - and first call high-performance handlers 
like the perf handler.

The proper channel for hardware errors is the #MC entry anyway, so 
this is mostly about legacies and weird hardware.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
  2014-05-07 16:04     ` Don Zickus
@ 2014-05-07 16:30       ` Ingo Molnar
  0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2014-05-07 16:30 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker


* Don Zickus <dzickus@redhat.com> wrote:

> On Wed, May 07, 2014 at 05:42:31PM +0200, Ingo Molnar wrote:
> > 
> > * Don Zickus <dzickus@redhat.com> wrote:
> > 
> > > The main reason for this patch is because I have a hard time knowing
> > > what NMI handlers are registered on the system when debugging NMI issues.
> > > 
> > > This info is provided in /proc/interrupts for interrupt handlers, so I
> > > added support for NMI stuff too.  As a bonus it provides stat breakdowns
> > > much like the interrupts.
> > > 
> > > The only ugly issue is how to label NMI subtypes using only 3 letters
> > > and still make it obvious it is part of the NMI.  Adding a /proc/nmi
> > > seemed overkill, so I chose to indent things by one space.  Sample
> > > output is below:
> > > 
> > > [root@dhcp71-248 ~]# cat /proc/interrupts
> > >            CPU0       CPU1       CPU2       CPU3
> > >   0:         29          0          0          0  IR-IO-APIC-edge      timer
> > > <snip>
> > > NMI:         20        774      10986       4227   Non-maskable interrupts
> > >  LOC:         21        775      10987       4228  Local     PMI, arch_bt
> > >  EXT:          0          0          0          0  External  plat
> > >  UNK:          0          0          0          0  Unknown
> > >  SWA:          0          0          0          0  Swallowed
> > > LOC:      30374      24749      20795      15095   Local timer interrupts
> > > SPU:          0          0          0          0   Spurious interrupts
> > > PMI:         20        774      10986       4227   Performance monitoring interrupts
> > > <snip>
> > 
> > Looks pretty useful!
> > 
> > The lost vertical alignment of the counts is a problem though IMHO.
> 
> Agreed!  It wasn't obvious to me at the time how to keep that 
> alignment while only changing the first column.  I can look at it 
> again, if you are ok with the small shift in the first column.  
> Otherwise alternate ideas are welcome. :-)

So I'd just pick a unique 3-letter shortcut for the first column and 
not do any shifting. Do the 'these are NMI sub-cases' distinction in 
the description field. Example mockup:

 NMI:         20        774      10986       4227   Non-maskable interrupts
 NLC:         21        775      10987       4228   NMI: Local PMI, arch_bt
 NXT:          0          0          0          0   NMI: External plat
 NUN:          0          0          0          0   NMI: Unknown
 NSW:          0          0          0          0   NMI: Swallowed
 LOC:      30374      24749      20795      15095   Local timer interrupts

or so.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-07 16:27       ` Ingo Molnar
@ 2014-05-07 16:48         ` Don Zickus
  2014-05-08 16:33         ` Don Zickus
  1 sibling, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 16:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker, Steven Rostedt

On Wed, May 07, 2014 at 06:27:46PM +0200, Ingo Molnar wrote:
> 
> * Don Zickus <dzickus@redhat.com> wrote:
> 
> > On Wed, May 07, 2014 at 05:38:54PM +0200, Ingo Molnar wrote:
> > > 
> > > * Don Zickus <dzickus@redhat.com> wrote:
> > > 
> > > > I noticed when debugging a perf problem on a machine with GHES enabled,
> > > > perf seemed slow.  I then realized that the GHES NMI routine was taking
> > > > a global lock all the time to inspect the hardware.  This contended
> > > > with all the local perf counters which did not need a lock.  So each cpu
> > > > accidentally was synchronizing with itself when using perf.
> > > > 
> > > > This is because of the way the nmi handler works.  It executes all the handlers
> > > > registered to a particular subtype (to deal with nmi sharing).  As a result
> > > > the GHES handler was executed on every PMI.
> > > > 
> > > > Fix this by creating a new nmi type called NMI_EXT, which is used by
> > > > handlers that need to probe external hardware and require a global lock
> > > > to do so.
> > > > 
> > > > Now the main NMI handler can check the internal NMI handlers first and
> > > > then the external ones if nothing is found.
> > > > 
> > > > This makes perf a little faster again on those machines with GHES enabled.
> > > 
> > > So what happens if GHES asserts an NMI at the same time a PMI 
> > > triggers?
> > > 
> > > If the perf PMI executes and indicates that it has handled something, 
> > > we don't execute the GHES handler, right? Will the GHES re-trigger the 
> > > NMI after we return?
> > 
> > In my head, I had thought they would be queued up and things work 
> > out fine. [...]
> 
> x86 NMIs are generally edge triggered.

Right, I meant to say they would be latched.

> 
> > [...]  But I guess in theory, if a PMI NMI comes in and, before the 
> > cpu can accept it, a GHES NMI comes in, then one of them may get 
> > dropped.  That would not be good.  Though the race window would be 
> > very small.
> > 
> > I don't have a good idea how to handle that.
> 
> Well, are GHES NMIs reasserted if they are not handled? I don't know 
> but there's a definite answer to that hardware behavior question.

I can dig around and find out but I would think not.

> 
> > On the flip side, we have the same exact problem, today, with the 
> > other common external NMIs (SERR, IO).  If a PCI SERR comes in at 
> > the same time as a PMI, then it gets dropped.  Worse, it doesn't get 
> > re-enabled and blocks future SERRs (just found this out two weeks 
> > ago because of a dirty perf status register on boot).
> > 
> > Again, I don't have a solution to juggle between PMI performance and 
> > reliable delivery.  We could do away with the spinlocks and go back 
> > to single cpu delivery (like it used to be).  Then devise a 
> > mechanism to switch delivery to another cpu upon hotplug.
> > 
> > Thoughts?
> 
> I'd say we should do a delayed timer that makes sure that all possible 
> handlers are polled after an NMI is triggered, but never at a high 
> rate.
> 
> Then simply return early the moment an NMI handler indicates that 
> there was an event handled - and first call high-performance handlers 
> like the perf handler.

Ok, I can look into something like that.

> 
> The proper channel for hardware errors is the #MC entry anyway, so 
> this is mostly about legacies and weird hardware.

Well, it seems most vendors are going 'firmware first' with the NMI being
the notifying mechanism.  But that is mostly on servers.  I do deal with
vendors that like to generate their own NMI to panic the box (though those
come in as unknown NMIs).

Cheers,
Don

> 
> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
  2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
  2014-05-07 15:42   ` Ingo Molnar
@ 2014-05-07 19:50   ` Elliott, Robert (Server Storage)
  2014-05-08  1:28     ` Don Zickus
  1 sibling, 1 reply; 21+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-05-07 19:50 UTC (permalink / raw)
  To: Don Zickus, x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML

Don Zickus <dzickus@redhat.com> wrote:
> The main reason for this patch is because I have a hard time knowing
> what NMI handlers are registered on the system when debugging NMI issues.
> 
> This info is provided in /proc/interrupts for interrupt handlers, so I
> added support for NMI stuff too.  As a bonus it provides stat breakdowns
> much like the interrupts.

/proc/interrupts only shows online CPUs, while /proc/softirqs shows 
all possible CPUs.  Is there any value in this information for all 
possible CPUs? Perhaps a /proc/hardirqs could be created alongside.

> The only ugly issue is how to label NMI subtypes using only 3 letters
> and still make it obvious it is part of the NMI.  Adding a /proc/nmi
> seemed overkill, so I chose to indent things by one space.  

The list only shows the currently registered handlers, which may
differ from the ones that were registered when the NMIs whose counts 
are being displayed occurred. You might want to describe these new 
rows and mention that in Documentation/filesystems/proc.txt and 
the proc(5) manpage.

> Sample output is below:
> 
> [root@dhcp71-248 ~]# cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   0:         29          0          0          0  IR-IO-APIC-edge      timer
> <snip>
> NMI:         20        774      10986       4227   Non-maskable interrupts
>  LOC:         21        775      10987       4228  Local     PMI, arch_bt
>  EXT:          0          0          0          0  External  plat
>  UNK:          0          0          0          0  Unknown
>  SWA:          0          0          0          0  Swallowed

Adding the list of NMI handlers in /proc/interrupts is a bit 
inconsistent with the other interrupts, which don't describe their 
handlers. It would be helpful to distinguish between a handler 
list being present, being present but empty, or not being present.

Maybe use parentheses like this (using Ingo's suggested format):
 NMI:         20        774      10986       4227   Non-maskable interrupts
 NLC:         21        775      10987       4228   NMI: Local (PMI, arch_bt)
 NXT:          0          0          0          0   NMI: External (plat)
 NUN:          0          0          0          0   NMI: Unknown ()
 NSW:          0          0          0          0   NMI: Swallowed
 LOC:      30374      24749      20795      15095   Local timer interrupts

> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index d99f31d..520359c 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
...
> +void nmi_show_interrupts(struct seq_file *p, int prec)
> +{
> +	int j;
> +	int indent = prec + 1;
> +
> +#define get_nmi_stats(j)	(&per_cpu(nmi_stats, j))
> +
> +	seq_printf(p, "%*s: ", indent, "LOC");
> +	for_each_online_cpu(j)
> +		seq_printf(p, "%10u ", get_nmi_stats(j)->normal);
> +	seq_printf(p, " %-8s", "Local");
> +
> +	print_nmi_action_name(p, NMI_LOCAL);
> +
> +	seq_printf(p, "%*s: ", indent, "EXT");
> +	for_each_online_cpu(j)
> +		seq_printf(p, "%10u ", get_nmi_stats(j)->external);
> +	seq_printf(p, " %-8s", "External");
> +
> +	print_nmi_action_name(p, NMI_EXT);
> +
> +	seq_printf(p, "%*s: ", indent, "UNK");
> +	for_each_online_cpu(j)
> +		seq_printf(p, "%10u ", get_nmi_stats(j)->unknown);
> +	seq_printf(p, " %-8s", "Unknown");
> +
> +	print_nmi_action_name(p, NMI_UNKNOWN);
> +

The NMI handler types are in arch/x86/include/asm/nmi.h:
enum {
        NMI_LOCAL=0,
        NMI_UNKNOWN,
        NMI_SERR,
        NMI_IO_CHECK,
        NMI_MAX
};

The new code only prints the registered handlers for NMI_LOCAL, 
NMI_UNKNOWN, and the new NMI_EXT.  Consider adding counters 
for NMI_SERR and NMI_IO_CHECK and printing their handlers too.

drivers/watchdog/hpwdt.c is the only code currently in 
the kernel registering handlers for them.

---
Rob Elliott    HP Server Storage




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
  2014-05-07 19:50   ` Elliott, Robert (Server Storage)
@ 2014-05-08  1:28     ` Don Zickus
  2014-05-08  6:04       ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-08  1:28 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage); +Cc: x86, Peter Zijlstra, ak, gong.chen, LKML

On Wed, May 07, 2014 at 07:50:48PM +0000, Elliott, Robert (Server Storage) wrote:
> Don Zickus <dzickus@redhat.com> wrote:
> > The main reason for this patch is because I have a hard time knowing
> > what NMI handlers are registered on the system when debugging NMI issues.
> > 
> > This info is provided in /proc/interrupts for interrupt handlers, so I
> > added support for NMI stuff too.  As a bonus it provides stat breakdowns
> > much like the interrupts.
> 
> /proc/interrupts only shows online CPUs, while /proc/softirqs shows 
> all possible CPUs.  Is there any value in this information for all 
> possible CPUs? Perhaps a /proc/hardirqs could be created alongside.

Well if they are not online, they probably won't be generating NMIs, so I
am not sure there is much value there.

> 
> > The only ugly issue is how to label NMI subtypes using only 3 letters
> > and still make it obvious it is part of the NMI.  Adding a /proc/nmi
> > seemed overkill, so I chose to indent things by one space.  
> 
> The list only shows the currently registered handlers, which may
> differ from the ones that were registered when the NMIs whose counts 
> are being displayed occurred. You might want to describe these new 
> rows and mention that in Documentation/filesystems/proc.txt and 
> the proc(5) manpage.

Ok, but that is a /proc/interrupts problem not one specific to NMI, no?

> 
> > Sample output is below:
> > 
> > [root@dhcp71-248 ~]# cat /proc/interrupts
> >            CPU0       CPU1       CPU2       CPU3
> >   0:         29          0          0          0  IR-IO-APIC-edge      timer
> > <snip>
> > NMI:         20        774      10986       4227   Non-maskable interrupts
> >  LOC:         21        775      10987       4228  Local     PMI, arch_bt
> >  EXT:          0          0          0          0  External  plat
> >  UNK:          0          0          0          0  Unknown
> >  SWA:          0          0          0          0  Swallowed
> 
> Adding the list of NMI handlers in /proc/interrupts is a bit 
> inconsistent with the other interrupts, which don't describe their 
> handlers. It would be helpful to distinguish between a handler 
> list being present, being present but empty, or not being present.
> 
> Maybe use parentheses like this (using Ingo's suggested format):
>  NMI:         20        774      10986       4227   Non-maskable interrupts
>  NLC:         21        775      10987       4228   NMI: Local (PMI, arch_bt)
>  NXT:          0          0          0          0   NMI: External (plat)
>  NUN:          0          0          0          0   NMI: Unknown ()
>  NSW:          0          0          0          0   NMI: Swallowed
>  LOC:      30374      24749      20795      15095   Local timer interrupts
> 

Hmm, looking at /proc/interrupts I see

  1:     858014      29054      23191       9337   IO-APIC-edge      i8042
  8:          3         24         10          2   IO-APIC-edge      rtc0
  9:     387555       9219       8308       7944   IO-APIC-fasteoi   acpi
 12:    9251360     163811     158846     141916   IO-APIC-edge      i8042
 16:          0          0          0          0   IO-APIC-fasteoi   mmc0
 17:         14          5          7         10   IO-APIC-fasteoi 
 19:       6892        367         13         10   IO-APIC-fasteoi ehci_hcd:usb2, ips, firewire_ohci
 23:    1363281        753         94         94   IO-APIC-fasteoi ehci_hcd:usb1

Those may not be specific handlers, but they are registered irq names, no?
That basically matches what I was trying to accomplish with NMI. 

I guess I don't see how what I did is much different than what already
exists.


> > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> > index d99f31d..520359c 100644
> > --- a/arch/x86/kernel/irq.c
> > +++ b/arch/x86/kernel/irq.c
> ...
> > +void nmi_show_interrupts(struct seq_file *p, int prec)
> > +{
> > +	int j;
> > +	int indent = prec + 1;
> > +
> > +#define get_nmi_stats(j)	(&per_cpu(nmi_stats, j))
> > +
> > +	seq_printf(p, "%*s: ", indent, "LOC");
> > +	for_each_online_cpu(j)
> > +		seq_printf(p, "%10u ", get_nmi_stats(j)->normal);
> > +	seq_printf(p, " %-8s", "Local");
> > +
> > +	print_nmi_action_name(p, NMI_LOCAL);
> > +
> > +	seq_printf(p, "%*s: ", indent, "EXT");
> > +	for_each_online_cpu(j)
> > +		seq_printf(p, "%10u ", get_nmi_stats(j)->external);
> > +	seq_printf(p, " %-8s", "External");
> > +
> > +	print_nmi_action_name(p, NMI_EXT);
> > +
> > +	seq_printf(p, "%*s: ", indent, "UNK");
> > +	for_each_online_cpu(j)
> > +		seq_printf(p, "%10u ", get_nmi_stats(j)->unknown);
> > +	seq_printf(p, " %-8s", "Unknown");
> > +
> > +	print_nmi_action_name(p, NMI_UNKNOWN);
> > +
> 
> The NMI handler types are in arch/x86/include/asm/nmi.h:
> enum {
>         NMI_LOCAL=0,
>         NMI_UNKNOWN,
>         NMI_SERR,
>         NMI_IO_CHECK,
>         NMI_MAX
> };
> 
> The new code only prints the registered handlers for NMI_LOCAL, 
> NMI_UNKNOWN, and the new NMI_EXT.  Consider adding counters 
> for NMI_SERR and NMI_IO_CHECK and printing their handlers too.
> 
> drivers/watchdog/hpwdt.c is the only code currently in 
> the kernel registering handlers for them.

Yeah, I guess I was trying to remove NMI_SERR and NMI_IO_CHECK.  I forgot
if I accomplished that with this patch set or not.  Instead I had hpwdt do
the ioport read directly instead of having do_default_nmi do it.  I can
look at it again.

Cheers,
Don

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
  2014-05-08  1:28     ` Don Zickus
@ 2014-05-08  6:04       ` Ingo Molnar
  0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2014-05-08  6:04 UTC (permalink / raw)
  To: Don Zickus
  Cc: Elliott, Robert (Server Storage),
	x86, Peter Zijlstra, ak, gong.chen, LKML


* Don Zickus <dzickus@redhat.com> wrote:

> On Wed, May 07, 2014 at 07:50:48PM +0000, Elliott, Robert (Server Storage) wrote:
> > Don Zickus <dzickus@redhat.com> wrote:
> > > The main reason for this patch is because I have a hard time knowing
> > > what NMI handlers are registered on the system when debugging NMI issues.
> > > 
> > > This info is provided in /proc/interrupts for interrupt handlers, so I
> > > added support for NMI stuff too.  As a bonus it provides stat breakdowns
> > > much like the interrupts.
> > 
> > /proc/interrupts only shows online CPUs, while /proc/softirqs shows 
> > all possible CPUs.  Is there any value in this information for all 
> > possible CPUs? Perhaps a /proc/hardirqs could be created alongside.
> 
> Well if they are not online, they probably won't be generating NMIs, so I
> am not sure there is much value there.
> 
> > 
> > > The only ugly issue is how to label NMI subtypes using only 3 letters
> > > and still make it obvious it is part of the NMI.  Adding a /proc/nmi
> > > seemed overkill, so I chose to indent things by one space.  
> > 
> > The list only shows the currently registered handlers, which may
> > differ from the ones that were registered when the NMIs whose counts 
> > are being displayed occurred. You might want to describe these new 
> > rows and mention that in Documentation/filesystems/proc.txt and 
> > the proc(5) manpage.
> 
> Ok, but that is a /proc/interrupts problem not one specific to NMI, no?
> 
> > 
> > > Sample output is below:
> > > 
> > > [root@dhcp71-248 ~]# cat /proc/interrupts
> > >            CPU0       CPU1       CPU2       CPU3
> > >   0:         29          0          0          0  IR-IO-APIC-edge      timer
> > > <snip>
> > > NMI:         20        774      10986       4227   Non-maskable interrupts
> > >  LOC:         21        775      10987       4228  Local     PMI, arch_bt
> > >  EXT:          0          0          0          0  External  plat
> > >  UNK:          0          0          0          0  Unknown
> > >  SWA:          0          0          0          0  Swallowed
> > 
> > Adding the list of NMI handlers in /proc/interrupts is a bit 
> > inconsistent with the other interrupts, which don't describe their 
> > handlers. It would be helpful to distinguish between a handler 
> > list being present, being present but empty, or not being present.
> > 
> > Maybe use parentheses like this (using Ingo's suggested format):
> >  NMI:         20        774      10986       4227   Non-maskable interrupts
> >  NLC:         21        775      10987       4228   NMI: Local (PMI, arch_bt)
> >  NXT:          0          0          0          0   NMI: External (plat)
> >  NUN:          0          0          0          0   NMI: Unknown ()
> >  NSW:          0          0          0          0   NMI: Swallowed
> >  LOC:      30374      24749      20795      15095   Local timer interrupts
> > 
> 
> Hmm, looking at /proc/interrupts I see
> 
>   1:     858014      29054      23191       9337   IO-APIC-edge      i8042
>   8:          3         24         10          2   IO-APIC-edge      rtc0
>   9:     387555       9219       8308       7944   IO-APIC-fasteoi   acpi
>  12:    9251360     163811     158846     141916   IO-APIC-edge      i8042
>  16:          0          0          0          0   IO-APIC-fasteoi   mmc0
>  17:         14          5          7         10   IO-APIC-fasteoi 
>  19:       6892        367         13         10   IO-APIC-fasteoi ehci_hcd:usb2, ips, firewire_ohci
>  23:    1363281        753         94         94   IO-APIC-fasteoi ehci_hcd:usb1
> 
> Those may not be specific handlers, but they are registered irq 
> names, no? That basically matches what I was trying to accomplish 
> with NMI.
> 
> I guess I don't see how what I did is much different than what 
> already exists.

The parentheses make the output more readable, especially as with the 
NMI format it's not quite clear what is 'irq type' and what is 
'handler'.

Might make sense to add parentheses for regular irq handlers as well, 
for consistency and readability.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-07 16:27       ` Ingo Molnar
  2014-05-07 16:48         ` Don Zickus
@ 2014-05-08 16:33         ` Don Zickus
  2014-05-08 17:35           ` Ingo Molnar
  1 sibling, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-08 16:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker, Steven Rostedt, andi

On Wed, May 07, 2014 at 06:27:46PM +0200, Ingo Molnar wrote:
> > [...]  But I guess in theory, if a PMI NMI comes in and, before the 
> > cpu can accept it, a GHES NMI comes in, then one of them may get 
> > dropped.  That would not be good.  Though the race window would be 
> > very small.
> > 
> > I don't have a good idea how to handle that.
> 
> Well, are GHES NMIs reasserted if they are not handled? I don't know 
> but there's a definite answer to that hardware behavior question.

I can't find anything that explicitly says the NMI will be re-asserted, so
I will assume it does not.  Andi, do you know?  (I am not sure who maintains
GHES any more).

> 
> > On the flip side, we have the same exact problem, today, with the 
> > other common external NMIs (SERR, IO).  If a PCI SERR comes in at 
> > the same time as a PMI, then it gets dropped.  Worse, it doesn't get 
> > re-enabled and blocks future SERRs (just found this out two weeks 
> > ago because of a dirty perf status register on boot).
> > 
> > Again, I don't have a solution to juggle between PMI performance and 
> > reliable delivery.  We could do away with the spinlocks and go back 
> > to single cpu delivery (like it used to be).  Then devise a 
> > mechanism to switch delivery to another cpu upon hotplug.
> > 
> > Thoughts?
> 
> I'd say we should do a delayed timer that makes sure that all possible 
> handlers are polled after an NMI is triggered, but never at a high 
> rate.

Hmm, I was thinking about it and wanted to avoid a poll as I hear
complaints here and there about the nmi_watchdog constantly wasting power
cycles with its polling.

I was wondering if I could do a status read outside the spinlock, then if
a bit is set, just grab the spin_lock and re-read the status.  But then
looking at the GHES code, I am not sure if it is as easy to read the
status bit as it is for a PCI_SERR/IO_CHK NMI.

Andi thoughts here?  Should I poke Tony Luck?

Otherwise I can set up the polling if that doesn't work.

Cheers,
Don

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-08 16:33         ` Don Zickus
@ 2014-05-08 17:35           ` Ingo Molnar
  2014-05-08 17:52             ` Don Zickus
  0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2014-05-08 17:35 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker, Steven Rostedt, andi


* Don Zickus <dzickus@redhat.com> wrote:

> > > Again, I don't have a solution to juggle between PMI performance 
> > > and reliable delivery.  We could do away with the spinlocks and 
> > > go back to single cpu delivery (like it used to be).  Then 
> > > devise a mechanism to switch delivery to another cpu upon 
> > > hotplug.
> > > 
> > > Thoughts?
> > 
> > I'd say we should do a delayed timer that makes sure that all 
> > possible handlers are polled after an NMI is triggered, but never 
> > at a high rate.
> 
> Hmm, I was thinking about it and wanted to avoid a poll as I hear 
> complaints here and there about the nmi_watchdog constantly wasting 
> power cycles with its polling.

But the polling would only happen if there's NMI traffic, so that's 
fine. So as long as polling stops some time after the last PMI use, 
it's a good solution.

This would also address a lot of NMI handling related fragility.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-08 17:35           ` Ingo Molnar
@ 2014-05-08 17:52             ` Don Zickus
  2014-05-09  7:10               ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-08 17:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker, Steven Rostedt, andi

On Thu, May 08, 2014 at 07:35:01PM +0200, Ingo Molnar wrote:
> 
> * Don Zickus <dzickus@redhat.com> wrote:
> 
> > > > Again, I don't have a solution to juggle between PMI performance 
> > > > and reliable delivery.  We could do away with the spinlocks and 
> > > > go back to single cpu delivery (like it used to be).  Then 
> > > > devise a mechanism to switch delivery to another cpu upon 
> > > > hotplug.
> > > > 
> > > > Thoughts?
> > > 
> > > I'd say we should do a delayed timer that makes sure that all 
> > > possible handlers are polled after an NMI is triggered, but never 
> > > at a high rate.
> > 
> > Hmm, I was thinking about it and wanted to avoid a poll as I hear 
> > complaints here and there about the nmi_watchdog constantly wasting 
> > power cycles with its polling.
> 
> But the polling would only happen if there's NMI traffic, so that's 
> fine. So as long as polling stops some time after the last PMI use, 
> it's a good solution.

So you are thinking an NMI comes in, kicks off a delayed timer for say
10ms.  The timer fires, rechecks the NMI for missed events and then stops?
If another NMI happens before the timer fires, just kick the timer again?

Something like that?

Cheers,
Don
 
> 
> This would also address a lot of NMI handling related fragility.
> 
> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-08 17:52             ` Don Zickus
@ 2014-05-09  7:10               ` Ingo Molnar
  2014-05-09 13:36                 ` Don Zickus
  0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2014-05-09  7:10 UTC (permalink / raw)
  To: Don Zickus
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker, Steven Rostedt, andi


* Don Zickus <dzickus@redhat.com> wrote:

> On Thu, May 08, 2014 at 07:35:01PM +0200, Ingo Molnar wrote:
> > 
> > * Don Zickus <dzickus@redhat.com> wrote:
> > 
> > > > > Again, I don't have a solution to juggle between PMI performance 
> > > > > and reliable delivery.  We could do away with the spinlocks and 
> > > > > go back to single cpu delivery (like it used to be).  Then 
> > > > > devise a mechanism to switch delivery to another cpu upon 
> > > > > hotplug.
> > > > > 
> > > > > Thoughts?
> > > > 
> > > > I'd say we should do a delayed timer that makes sure that all 
> > > > possible handlers are polled after an NMI is triggered, but never 
> > > > at a high rate.
> > > 
> > > Hmm, I was thinking about it and wanted to avoid a poll as I hear 
> > > complaints here and there about the nmi_watchdog constantly wasting 
> > > power cycles with its polling.
> > 
> > But the polling would only happen if there's NMI traffic, so that's 
> > fine. So as long as polling stops some time after the last PMI use, 
> > it's a good solution.
> 
> So you are thinking an NMI comes in, kicks off a delayed timer for 
> say 10ms.  The timer fires, rechecks the NMI for missed events and 
> then stops? If another NMI happens before the timer fires, just kick 
> the timer again?
> 
> Something like that?

Yeah, exactly, using delayed IRQ work for that or so.

This would allow 'optimistic' processing of NMI events: the 
first handler that manages to do any work causes a return. No need to 
make a per-handler distinction, etc.

It would generally be pretty robust and would possibly be a natural 
workaround for 'stuck PMU' type of bugs as well.

[ As long as it does not result in spurious 'dazed and confused' 
  messages :-) ]

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
  2014-05-09  7:10               ` Ingo Molnar
@ 2014-05-09 13:36                 ` Don Zickus
  0 siblings, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-09 13:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
	Frédéric Weisbecker, Steven Rostedt, andi

On Fri, May 09, 2014 at 09:10:50AM +0200, Ingo Molnar wrote:
> 
> * Don Zickus <dzickus@redhat.com> wrote:
> 
> > On Thu, May 08, 2014 at 07:35:01PM +0200, Ingo Molnar wrote:
> > > 
> > > * Don Zickus <dzickus@redhat.com> wrote:
> > > 
> > > > > > Again, I don't have a solution to juggle between PMI performance 
> > > > > > and reliable delivery.  We could do away with the spinlocks and 
> > > > > > go back to single cpu delivery (like it used to be).  Then 
> > > > > > devise a mechanism to switch delivery to another cpu upon 
> > > > > > hotplug.
> > > > > > 
> > > > > > Thoughts?
> > > > > 
> > > > > I'd say we should do a delayed timer that makes sure that all 
> > > > > possible handlers are polled after an NMI is triggered, but never 
> > > > > at a high rate.
> > > > 
> > > > Hmm, I was thinking about it and wanted to avoid a poll as I hear 
> > > > complaints here and there about the nmi_watchdog constantly wasting 
> > > > power cycles with its polling.
> > > 
> > > But the polling would only happen if there's NMI traffic, so that's 
> > > fine. So as long as polling stops some time after the last PMI use, 
> > > it's a good solution.
> > 
> > So you are thinking an NMI comes in, kicks off a delayed timer for 
> > say 10ms.  The timer fires, rechecks the NMI for missed events and 
> > then stops? If another NMI happens before the timer fires, just kick 
> > the timer again?
> > 
> > Something like that?
> 
> Yeah, exactly, using delayed IRQ work for that or so.
> 
> This would allow 'optimistic' processing of NMI events: the 
> first handler that manages to do any work causes a return. No need to 
> make a per-handler distinction, etc.
> 
> It would generally be pretty robust and would possibly be a natural 
> workaround for 'stuck PMU' type of bugs as well.

Agreed.  I'll try to hack something up for that.

> 
> [ As long as it does not result in spurious 'dazed and confused' 
>   messages :-) ]

Hehe. 

Cheers,
Don

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-05-09 13:37 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
2014-05-07 15:38   ` Ingo Molnar
2014-05-07 16:02     ` Don Zickus
2014-05-07 16:27       ` Ingo Molnar
2014-05-07 16:48         ` Don Zickus
2014-05-08 16:33         ` Don Zickus
2014-05-08 17:35           ` Ingo Molnar
2014-05-08 17:52             ` Don Zickus
2014-05-09  7:10               ` Ingo Molnar
2014-05-09 13:36                 ` Don Zickus
2014-05-07 15:34 ` [PATCH 2/5] x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and 'panic_on_io_nmi' Don Zickus
2014-05-07 15:34 ` [PATCH 3/5] x86, nmi: Remove 'reason' value from unknown nmi output Don Zickus
2014-05-07 15:34 ` [PATCH 4/5] x86, nmi: Move default external NMI handler to its own routine Don Zickus
2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
2014-05-07 15:42   ` Ingo Molnar
2014-05-07 16:04     ` Don Zickus
2014-05-07 16:30       ` Ingo Molnar
2014-05-07 19:50   ` Elliott, Robert (Server Storage)
2014-05-08  1:28     ` Don Zickus
2014-05-08  6:04       ` Ingo Molnar
