* [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups
@ 2014-05-07 15:34 Don Zickus
2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
` (4 more replies)
0 siblings, 5 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus
I started this series by fixing a performance problem with the GHES
NMI handler, and it evolved into more patches as I poked around
in the code.
The main focus was moving the GHES NMI driver to its own NMI subtype
to avoid slowing down perf handling. Then I decided to move the
default external NMI handler to its own routine. Then finally, I
needed to see which NMI handlers were registered so I hacked up
/proc/interrupts to show me.
Tested mostly on HP boxes that have GHES enabled, having the iLO
send NMIs to panic the box (using the hpwdt driver). Ran perf on other
GHES-enabled boxes to verify the performance results.
Update: I realized I didn't send this to lkml initially, which might be
why no one has commented on it.
Don Zickus (5):
x86, nmi: Add new nmi type 'external'
x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and
'panic_on_io_nmi'
x86, nmi: Remove 'reason' value from unknown nmi output
x86, nmi: Move default external NMI handler to its own routine
x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
Documentation/kernel-parameters.txt | 9 ++
arch/x86/include/asm/nmi.h | 5 +
arch/x86/kernel/irq.c | 3 +
arch/x86/kernel/nmi.c | 199 ++++++++++++++++++++++++++--------
drivers/acpi/apei/ghes.c | 4 +-
drivers/watchdog/hpwdt.c | 24 +++--
6 files changed, 187 insertions(+), 57 deletions(-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
2014-05-07 15:38 ` Ingo Molnar
2014-05-07 15:34 ` [PATCH 2/5] x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and 'panic_on_io_nmi' Don Zickus
` (3 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus
I noticed, when debugging a perf problem on a machine with GHES enabled,
that perf seemed slow. I then realized that the GHES NMI routine was
taking a global lock all the time to inspect the hardware. This
contended with all the local perf counters, which did not need a lock,
so each cpu was accidentally serializing against every other cpu when
using perf.
This is because of the way the NMI handler works: it executes all the
handlers registered to a particular subtype (to deal with NMI sharing).
As a result the GHES handler was executed on every PMI.
Fix this by creating a new nmi type called NMI_EXT, which is used by
handlers that need to probe external hardware and require a global lock
to do so.
Now the main NMI handler can check the internal NMI handlers first and
then the external ones if nothing is found.
This makes perf a little faster again on those machines with GHES enabled.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/include/asm/nmi.h | 1 +
arch/x86/kernel/nmi.c | 41 ++++++++++++++++++++++++++---------------
drivers/acpi/apei/ghes.c | 4 ++--
drivers/watchdog/hpwdt.c | 10 ++--------
4 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 86f9301..d6d26fa 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -24,6 +24,7 @@ extern int unknown_nmi_panic;
enum {
NMI_LOCAL=0,
+ NMI_EXT,
NMI_UNKNOWN,
NMI_SERR,
NMI_IO_CHECK,
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 6fcb49c..72fcff7 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -34,29 +34,32 @@
#include <trace/events/nmi.h>
struct nmi_desc {
- spinlock_t lock;
- struct list_head head;
+ spinlock_t lock;
+ struct list_head head;
};
static struct nmi_desc nmi_desc[NMI_MAX] =
{
- {
- .lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[0].lock),
- .head = LIST_HEAD_INIT(nmi_desc[0].head),
+ [NMI_LOCAL] = {
+ .lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_LOCAL].lock),
+ .head = LIST_HEAD_INIT(nmi_desc[NMI_LOCAL].head),
},
- {
- .lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[1].lock),
- .head = LIST_HEAD_INIT(nmi_desc[1].head),
+ [NMI_EXT] = {
+ .lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_EXT].lock),
+ .head = LIST_HEAD_INIT(nmi_desc[NMI_EXT].head),
},
- {
- .lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[2].lock),
- .head = LIST_HEAD_INIT(nmi_desc[2].head),
+ [NMI_UNKNOWN] = {
+ .lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_UNKNOWN].lock),
+ .head = LIST_HEAD_INIT(nmi_desc[NMI_UNKNOWN].head),
},
- {
- .lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[3].lock),
- .head = LIST_HEAD_INIT(nmi_desc[3].head),
+ [NMI_SERR] = {
+ .lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_SERR].lock),
+ .head = LIST_HEAD_INIT(nmi_desc[NMI_SERR].head),
+ },
+ [NMI_IO_CHECK] = {
+ .lock = __SPIN_LOCK_UNLOCKED(&nmi_desc[NMI_IO_CHECK].lock),
+ .head = LIST_HEAD_INIT(nmi_desc[NMI_IO_CHECK].head),
},
-
};
struct nmi_stats {
@@ -317,8 +320,16 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
__this_cpu_write(last_nmi_rip, regs->ip);
+ /* try local NMIs first */
handled = nmi_handle(NMI_LOCAL, regs, b2b);
__this_cpu_add(nmi_stats.normal, handled);
+
+ /* if unclaimed, try external NMIs next */
+ if (!handled) {
+ handled = nmi_handle(NMI_EXT, regs, b2b);
+ __this_cpu_add(nmi_stats.external, handled);
+ }
+
if (handled) {
/*
* There are cases when a NMI handler handles multiple
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index dab7cb7..8b78bf5 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -976,7 +976,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
ghes_estatus_pool_expand(len);
mutex_lock(&ghes_list_mutex);
if (list_empty(&ghes_nmi))
- register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0,
+ register_nmi_handler(NMI_EXT, ghes_notify_nmi, 0,
"ghes");
list_add_rcu(&ghes->list, &ghes_nmi);
mutex_unlock(&ghes_list_mutex);
@@ -1025,7 +1025,7 @@ static int ghes_remove(struct platform_device *ghes_dev)
mutex_lock(&ghes_list_mutex);
list_del_rcu(&ghes->list);
if (list_empty(&ghes_nmi))
- unregister_nmi_handler(NMI_LOCAL, "ghes");
+ unregister_nmi_handler(NMI_EXT, "ghes");
mutex_unlock(&ghes_list_mutex);
/*
* To synchronize with NMI handler, ghes can only be
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 2b75e8b..9617a4d 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -737,12 +737,9 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout, 0, "hpwdt");
if (retval)
goto error;
- retval = register_nmi_handler(NMI_SERR, hpwdt_pretimeout, 0, "hpwdt");
+ retval = register_nmi_handler(NMI_EXT, hpwdt_pretimeout, 0, "hpwdt");
if (retval)
goto error1;
- retval = register_nmi_handler(NMI_IO_CHECK, hpwdt_pretimeout, 0, "hpwdt");
- if (retval)
- goto error2;
dev_info(&dev->dev,
"HP Watchdog Timer Driver: NMI decoding initialized"
@@ -750,8 +747,6 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
(allow_kdump == 0) ? "OFF" : "ON");
return 0;
-error2:
- unregister_nmi_handler(NMI_SERR, "hpwdt");
error1:
unregister_nmi_handler(NMI_UNKNOWN, "hpwdt");
error:
@@ -766,8 +761,7 @@ error:
static void hpwdt_exit_nmi_decoding(void)
{
unregister_nmi_handler(NMI_UNKNOWN, "hpwdt");
- unregister_nmi_handler(NMI_SERR, "hpwdt");
- unregister_nmi_handler(NMI_IO_CHECK, "hpwdt");
+ unregister_nmi_handler(NMI_EXT, "hpwdt");
if (cru_rom_addr)
iounmap(cru_rom_addr);
}
--
1.7.1
* [PATCH 2/5] x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and 'panic_on_io_nmi'
2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
2014-05-07 15:34 ` [PATCH 3/5] x86, nmi: Remove 'reason' value from unknown nmi output Don Zickus
` (2 subsequent siblings)
4 siblings, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus
These options are accessible through /proc/sys/kernel but not on the
command line. The main use is for on-board controllers (iLO, DRAC, BMC)
to be able to issue an external NMI to bring down a hung box.
This just makes configuring a box a little easier.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
Documentation/kernel-parameters.txt | 9 +++++++++
arch/x86/kernel/nmi.c | 14 ++++++++++++++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7116fda..c9da1a7 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2313,6 +2313,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
timeout < 0: reboot immediately
Format: <timeout>
+ panic_on_unrecovered_nmi [X86]
+ Force a machine to panic if an unrecoverable NMI is
+ unclaimed. This covers SERR or UNKNOWN NMI cases.
+
+ panic_on_io_nmi [X86]
+ Force a machine to panic if an IO NMI is unclaimed.
+ This covers external NMIs with no handlers associated
+ with them.
+
parkbd.port= [HW] Parallel port number the keyboard adapter is
connected to, default is 0.
Format: <parport#>
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 72fcff7..f3f4c43 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -87,6 +87,20 @@ static int __init setup_unknown_nmi_panic(char *str)
}
__setup("unknown_nmi_panic", setup_unknown_nmi_panic);
+static int __init setup_panic_on_unrecovered_nmi(char *str)
+{
+ panic_on_unrecovered_nmi = 1;
+ return 1;
+}
+__setup("panic_on_unrecovered_nmi", setup_panic_on_unrecovered_nmi);
+
+static int __init setup_panic_on_io_nmi(char *str)
+{
+ panic_on_io_nmi = 1;
+ return 1;
+}
+__setup("panic_on_io_nmi", setup_panic_on_io_nmi);
+
#define nmi_to_desc(type) (&nmi_desc[type])
static u64 nmi_longest_ns = 1 * NSEC_PER_MSEC;
--
1.7.1
* [PATCH 3/5] x86, nmi: Remove 'reason' value from unknown nmi output
2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
2014-05-07 15:34 ` [PATCH 2/5] x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and 'panic_on_io_nmi' Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
2014-05-07 15:34 ` [PATCH 4/5] x86, nmi: Move default external NMI handler to its own routine Don Zickus
2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
4 siblings, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus
The 'reason' value comes from the io port for the external NMI.
We already have handlers that deal with a non-zero 'reason', so
providing it in the unknown-NMI output doesn't add much.
In addition, it should lower the number of questions I get when
customers keep attaching new unknown-NMI panic reports just because
the reason value changed (which is expected, because part of it is a
timer bit that toggles all the time).
However, the main motivation is the next patch, which moves the
external NMI check to its own function, where the reason value will
no longer be available.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/kernel/nmi.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index f3f4c43..be2d622 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -277,7 +277,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
}
static __kprobes void
-unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
+unknown_nmi_error(struct pt_regs *regs)
{
int handled;
@@ -295,8 +295,8 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
__this_cpu_add(nmi_stats.unknown, 1);
- pr_emerg("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
- reason, smp_processor_id());
+ pr_emerg("Uhhuh. NMI received for unknown reason on CPU %d.\n",
+ smp_processor_id());
pr_emerg("Do you have a strange power saving mode enabled?\n");
if (unknown_nmi_panic || panic_on_unrecovered_nmi)
@@ -413,7 +413,7 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
if (b2b && __this_cpu_read(swallow_nmi))
__this_cpu_add(nmi_stats.swallow, 1);
else
- unknown_nmi_error(reason, regs);
+ unknown_nmi_error(regs);
}
/*
--
1.7.1
* [PATCH 4/5] x86, nmi: Move default external NMI handler to its own routine
2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
` (2 preceding siblings ...)
2014-05-07 15:34 ` [PATCH 3/5] x86, nmi: Remove 'reason' value from unknown nmi output Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
4 siblings, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus, thomas.mingarelli
Now that we have set up an NMI subtype called NMI_EXT, there is really
no need to hard-code the default external NMI handler in the main
NMI handler routine.
Move it to a proper function and register it on boot. This change is
just code movement.
In addition, update hpwdt so that it unregisters the default handler
when it registers its own (and re-registers the default on exit).
This allows the driver to take control of that io port (which it
ultimately wanted to do originally), but in a cleaner way.
Tested by HP to make sure I didn't break anything. :-)
Cc: thomas.mingarelli@hp.com
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/include/asm/nmi.h | 3 ++
arch/x86/kernel/nmi.c | 79 ++++++++++++++++++++++++++++---------------
drivers/watchdog/hpwdt.c | 14 ++++++++
3 files changed, 68 insertions(+), 28 deletions(-)
diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index d6d26fa..47e8cff 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -61,4 +61,7 @@ void stop_nmi(void);
void restart_nmi(void);
void local_touch_nmi(void);
+int register_nmi_default_external_handler(void);
+void unregister_nmi_default_external_handler(void);
+
#endif /* _ASM_X86_NMI_H */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index be2d622..7359e53 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -74,11 +74,6 @@ static DEFINE_PER_CPU(struct nmi_stats, nmi_stats);
static int ignore_nmis;
int unknown_nmi_panic;
-/*
- * Prevent NMI reason port (0x61) being accessed simultaneously, can
- * only be used in NMI handler.
- */
-static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
static int __init setup_unknown_nmi_panic(char *str)
{
@@ -310,7 +305,6 @@ static DEFINE_PER_CPU(unsigned long, last_nmi_rip);
static __kprobes void default_do_nmi(struct pt_regs *regs)
{
- unsigned char reason = 0;
int handled;
bool b2b = false;
@@ -358,28 +352,6 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
return;
}
- /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
- raw_spin_lock(&nmi_reason_lock);
- reason = x86_platform.get_nmi_reason();
-
- if (reason & NMI_REASON_MASK) {
- if (reason & NMI_REASON_SERR)
- pci_serr_error(reason, regs);
- else if (reason & NMI_REASON_IOCHK)
- io_check_error(reason, regs);
-#ifdef CONFIG_X86_32
- /*
- * Reassert NMI in case it became active
- * meanwhile as it's edge-triggered:
- */
- reassert_nmi();
-#endif
- __this_cpu_add(nmi_stats.external, 1);
- raw_spin_unlock(&nmi_reason_lock);
- return;
- }
- raw_spin_unlock(&nmi_reason_lock);
-
/*
* Only one NMI can be latched at a time. To handle
* this we may process multiple nmi handlers at once to
@@ -417,6 +389,57 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
}
/*
+ * Prevent NMI reason port (0x61) being accessed simultaneously, can
+ * only be used in NMI handler.
+ */
+static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
+
+static int __kprobes
+nmi_default_external_handler(unsigned int cmd, struct pt_regs *regs)
+{
+ unsigned char reason = 0;
+ int ret = NMI_DONE;
+
+ /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
+ raw_spin_lock(&nmi_reason_lock);
+ reason = x86_platform.get_nmi_reason();
+
+ if (reason & NMI_REASON_MASK) {
+ if (reason & NMI_REASON_SERR)
+ pci_serr_error(reason, regs);
+ else if (reason & NMI_REASON_IOCHK)
+ io_check_error(reason, regs);
+#ifdef CONFIG_X86_32
+ /*
+ * Reassert NMI in case it became active
+ * meanwhile as it's edge-triggered:
+ */
+ reassert_nmi();
+#endif
+ ret = NMI_HANDLED;
+ }
+ raw_spin_unlock(&nmi_reason_lock);
+
+ return ret;
+}
+
+int register_nmi_default_external_handler(void)
+{
+ register_nmi_handler(NMI_EXT, nmi_default_external_handler,
+ 0, "plat");
+
+ return 0;
+}
+EXPORT_SYMBOL(register_nmi_default_external_handler);
+early_initcall(register_nmi_default_external_handler);
+
+void unregister_nmi_default_external_handler(void)
+{
+ unregister_nmi_handler(NMI_EXT, "plat");
+}
+EXPORT_SYMBOL(unregister_nmi_default_external_handler);
+
+/*
* NMIs can hit breakpoints which will cause it to lose its
* NMI context with the CPU when the breakpoint does an iret.
*/
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 9617a4d..4de85e0 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -38,6 +38,7 @@
#include <asm/cacheflush.h>
#endif /* CONFIG_HPWDT_NMI_DECODING */
#include <asm/nmi.h>
+#include <asm/mach_traps.h>
#define HPWDT_VERSION "1.3.3"
#define SECS_TO_TICKS(secs) ((secs) * 1000 / 128)
@@ -487,6 +488,15 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
goto out;
spin_lock_irqsave(&rom_lock, rom_pl);
+ if (ulReason == NMI_EXT) {
+ unsigned char reason = 0;
+ reason = x86_platform.get_nmi_reason();
+
+ if (!(reason & NMI_REASON_MASK)) {
+ spin_unlock_irqrestore(&rom_lock, rom_pl);
+ goto out;
+ }
+ }
if (!die_nmi_called && !is_icru && !is_uefi)
asminline_call(&cmn_regs, cru_rom_addr);
die_nmi_called = 1;
@@ -741,6 +751,9 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
if (retval)
goto error1;
+ /* hpwdt is the new external port 0x61 handler */
+ unregister_nmi_default_external_handler();
+
dev_info(&dev->dev,
"HP Watchdog Timer Driver: NMI decoding initialized"
", allow kernel dump: %s (default = 0/OFF)\n",
@@ -762,6 +775,7 @@ static void hpwdt_exit_nmi_decoding(void)
{
unregister_nmi_handler(NMI_UNKNOWN, "hpwdt");
unregister_nmi_handler(NMI_EXT, "hpwdt");
+ register_nmi_default_external_handler();
if (cru_rom_addr)
iounmap(cru_rom_addr);
}
--
1.7.1
* [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
` (3 preceding siblings ...)
2014-05-07 15:34 ` [PATCH 4/5] x86, nmi: Move default external NMI handler to its own routine Don Zickus
@ 2014-05-07 15:34 ` Don Zickus
2014-05-07 15:42 ` Ingo Molnar
2014-05-07 19:50 ` Elliott, Robert (Server Storage)
4 siblings, 2 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 15:34 UTC (permalink / raw)
To: x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML, Don Zickus
The main reason for this patch is that I have a hard time knowing
which NMI handlers are registered on a system when debugging NMI issues.
This info is provided in /proc/interrupts for interrupt handlers, so I
added support for the NMI lines too. As a bonus it provides stat
breakdowns much like the regular interrupts.
The only ugly issue is how to label NMI subtypes using only 3 letters
and still make it obvious they are part of the NMI line. Adding a
/proc/nmi seemed overkill, so I chose to indent things by one space.
Sample output is below:
[root@dhcp71-248 ~]# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 29 0 0 0 IR-IO-APIC-edge timer
<snip>
NMI: 20 774 10986 4227 Non-maskable interrupts
LOC: 21 775 10987 4228 Local PMI, arch_bt
EXT: 0 0 0 0 External plat
UNK: 0 0 0 0 Unknown
SWA: 0 0 0 0 Swallowed
LOC: 30374 24749 20795 15095 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 20 774 10986 4227 Performance monitoring interrupts
<snip>
The extra 'local' NMI count represents the number of NMI events
'handled', whereas the general NMI count represents how many actual
NMIs were processed. IOW, when the two differ, two events came in at
once during a single call.
I am open to better suggestions.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/include/asm/nmi.h | 1 +
arch/x86/kernel/irq.c | 3 ++
arch/x86/kernel/nmi.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 61 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 47e8cff..47706af 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -60,6 +60,7 @@ void unregister_nmi_handler(unsigned int, const char *);
void stop_nmi(void);
void restart_nmi(void);
void local_touch_nmi(void);
+void nmi_show_interrupts(struct seq_file *, int);
int register_nmi_default_external_handler(void);
void unregister_nmi_default_external_handler(void);
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index d99f31d..520359c 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -17,6 +17,7 @@
#include <asm/idle.h>
#include <asm/mce.h>
#include <asm/hw_irq.h>
+#include <asm/nmi.h>
#define CREATE_TRACE_POINTS
#include <asm/trace/irq_vectors.h>
@@ -59,6 +60,8 @@ int arch_show_interrupts(struct seq_file *p, int prec)
for_each_online_cpu(j)
seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
seq_printf(p, " Non-maskable interrupts\n");
+ nmi_show_interrupts(p, prec);
+
#ifdef CONFIG_X86_LOCAL_APIC
seq_printf(p, "%*s: ", prec, "LOC");
for_each_online_cpu(j)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 7359e53..8741933 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -388,6 +388,63 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
unknown_nmi_error(regs);
}
+static void print_nmi_action_name(struct seq_file *p, int type)
+{
+ struct nmi_desc *desc;
+ struct nmiaction *a;
+
+ rcu_read_lock();
+
+ desc = nmi_to_desc(type);
+
+ if (!(list_empty(&desc->head))) {
+
+ a = list_entry_rcu(desc->head.next, struct nmiaction, list);
+ seq_printf(p, " %s", a->name);
+
+ list_for_each_entry_continue_rcu(a, &desc->head, list)
+ seq_printf(p, ", %s", a->name);
+
+ }
+ seq_puts(p, "\n");
+
+ rcu_read_unlock();
+}
+
+void nmi_show_interrupts(struct seq_file *p, int prec)
+{
+ int j;
+ int indent = prec + 1;
+
+#define get_nmi_stats(j) (&per_cpu(nmi_stats, j))
+
+ seq_printf(p, "%*s: ", indent, "LOC");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", get_nmi_stats(j)->normal);
+ seq_printf(p, " %-8s", "Local");
+
+ print_nmi_action_name(p, NMI_LOCAL);
+
+ seq_printf(p, "%*s: ", indent, "EXT");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", get_nmi_stats(j)->external);
+ seq_printf(p, " %-8s", "External");
+
+ print_nmi_action_name(p, NMI_EXT);
+
+ seq_printf(p, "%*s: ", indent, "UNK");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", get_nmi_stats(j)->unknown);
+ seq_printf(p, " %-8s", "Unknown");
+
+ print_nmi_action_name(p, NMI_UNKNOWN);
+
+ seq_printf(p, "%*s: ", indent, "SWA");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", get_nmi_stats(j)->swallow);
+ seq_printf(p, " %-8s", "Swallowed\n");
+}
+
/*
* Prevent NMI reason port (0x61) being accessed simultaneously, can
* only be used in NMI handler.
--
1.7.1
* Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
@ 2014-05-07 15:38 ` Ingo Molnar
2014-05-07 16:02 ` Don Zickus
0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2014-05-07 15:38 UTC (permalink / raw)
To: Don Zickus
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker, Steven Rostedt
* Don Zickus <dzickus@redhat.com> wrote:
> I noticed when debugging a perf problem on a machine with GHES enabled,
> perf seemed slow. I then realized that the GHES NMI routine was taking
> a global lock all the time to inspect the hardware. This contended
> with all the local perf counters which did not need a lock. So each cpu
> accidentally was synchronizing with itself when using perf.
>
> This is because the way the nmi handler works. It executes all the handlers
> registered to a particular subtype (to deal with nmi sharing). As a result
> the GHES handler was executed on every PMI.
>
> Fix this by creating a new nmi type called NMI_EXT, which is used by
> handlers that need to probe external hardware and require a global lock
> to do so.
>
> Now the main NMI handler can check the internal NMI handlers first and
> then the external ones if nothing is found.
>
> This makes perf a little faster again on those machines with GHES enabled.
So what happens if GHES asserts an NMI at the same time a PMI
triggers?
If the perf PMI executes and indicates that it has handled something,
we don't execute the GHES handler, right? Will the GHES re-trigger the
NMI after we return?
Thanks,
Ingo
* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
@ 2014-05-07 15:42 ` Ingo Molnar
2014-05-07 16:04 ` Don Zickus
2014-05-07 19:50 ` Elliott, Robert (Server Storage)
1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2014-05-07 15:42 UTC (permalink / raw)
To: Don Zickus
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker
* Don Zickus <dzickus@redhat.com> wrote:
> The main reason for this patch is because I have a hard time knowing
> what NMI handlers are registered on the system when debugging NMI issues.
>
> This info is provided in /proc/interrupts for interrupt handlers, so I
> added support for NMI stuff too. As a bonus it provides stat breakdowns
> much like the interrupts.
>
> The only ugly issue is how to label NMI subtypes using only 3 letters
> and still make it obvious it is part of the NMI. Adding a /proc/nmi
> seemed overkill, so I choose to indent things by one space. Sample
> output is below:
>
> [root@dhcp71-248 ~]# cat /proc/interrupts
> CPU0 CPU1 CPU2 CPU3
> 0: 29 0 0 0 IR-IO-APIC-edge timer
> <snip>
> NMI: 20 774 10986 4227 Non-maskable interrupts
> LOC: 21 775 10987 4228 Local PMI, arch_bt
> EXT: 0 0 0 0 External plat
> UNK: 0 0 0 0 Unknown
> SWA: 0 0 0 0 Swallowed
> LOC: 30374 24749 20795 15095 Local timer interrupts
> SPU: 0 0 0 0 Spurious interrupts
> PMI: 20 774 10986 4227 Performance monitoring interrupts
> <snip>
Looks pretty useful!
The lost vertical alignment of the counts is a problem though IMHO.
Thanks,
Ingo
* Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-07 15:38 ` Ingo Molnar
@ 2014-05-07 16:02 ` Don Zickus
2014-05-07 16:27 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-07 16:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker, Steven Rostedt
On Wed, May 07, 2014 at 05:38:54PM +0200, Ingo Molnar wrote:
>
> * Don Zickus <dzickus@redhat.com> wrote:
>
> > I noticed when debugging a perf problem on a machine with GHES enabled,
> > perf seemed slow. I then realized that the GHES NMI routine was taking
> > a global lock all the time to inspect the hardware. This contended
> > with all the local perf counters which did not need a lock. So each cpu
> > accidentally was synchronizing with itself when using perf.
> >
> > This is because the way the nmi handler works. It executes all the handlers
> > registered to a particular subtype (to deal with nmi sharing). As a result
> > the GHES handler was executed on every PMI.
> >
> > Fix this by creating a new nmi type called NMI_EXT, which is used by
> > handlers that need to probe external hardware and require a global lock
> > to do so.
> >
> > Now the main NMI handler can check the internal NMI handlers first and
> > then the external ones if nothing is found.
> >
> > This makes perf a little faster again on those machines with GHES enabled.
>
> So what happens if GHES asserts an NMI at the same time a PMI
> triggers?
>
> If the perf PMI executes and indicates that it has handled something,
> we don't execute the GHES handler, right? Will the GHES re-trigger the
> NMI after we return?
In my head, I had thought they would be queued up and things would work
out fine. But I guess in theory, if a PMI comes in and, before the cpu
can accept it, a GHES NMI comes in, then the GHES NMI may get dropped.
That would not be good, though the race window is very small.
I don't have a good idea how to handle that.
On the flip side, we have the same exact problem today with the other
common external NMIs (SERR, IO). If a PCI SERR comes in at the same time
as a PMI, it gets dropped. Worse, it doesn't get re-enabled and
blocks future SERRs (I just found this out two weeks ago because of a
dirty perf status register on boot).
Again, I don't have a solution that juggles PMI performance and
reliable delivery. We could do away with the spinlocks and go back to
single-cpu delivery (like it used to be), then devise a mechanism to
switch delivery to another cpu on hotplug.
Thoughts?
Cheers,
Don
* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
2014-05-07 15:42 ` Ingo Molnar
@ 2014-05-07 16:04 ` Don Zickus
2014-05-07 16:30 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-07 16:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker
On Wed, May 07, 2014 at 05:42:31PM +0200, Ingo Molnar wrote:
>
> * Don Zickus <dzickus@redhat.com> wrote:
>
> > The main reason for this patch is because I have a hard time knowing
> > what NMI handlers are registered on the system when debugging NMI issues.
> >
> > This info is provided in /proc/interrupts for interrupt handlers, so I
> > added support for NMI stuff too. As a bonus it provides stat breakdowns
> > much like the interrupts.
> >
> > The only ugly issue is how to label NMI subtypes using only 3 letters
> > and still make it obvious it is part of the NMI. Adding a /proc/nmi
> > seemed overkill, so I choose to indent things by one space. Sample
> > output is below:
> >
> > [root@dhcp71-248 ~]# cat /proc/interrupts
> > CPU0 CPU1 CPU2 CPU3
> > 0: 29 0 0 0 IR-IO-APIC-edge timer
> > <snip>
> > NMI: 20 774 10986 4227 Non-maskable interrupts
> > LOC: 21 775 10987 4228 Local PMI, arch_bt
> > EXT: 0 0 0 0 External plat
> > UNK: 0 0 0 0 Unknown
> > SWA: 0 0 0 0 Swallowed
> > LOC: 30374 24749 20795 15095 Local timer interrupts
> > SPU: 0 0 0 0 Spurious interrupts
> > PMI: 20 774 10986 4227 Performance monitoring interrupts
> > <snip>
>
> Looks pretty useful!
>
> The lost vertical alignment of the counts is a problem though IMHO.
Agreed! It wasn't obvious to me at the time how to keep that alignment
while changing only the first column. I can look at it again, if you are
ok with the small shift in the first column. Otherwise, alternate ideas
are welcome. :-)
Cheers,
Don
* Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-07 16:02 ` Don Zickus
@ 2014-05-07 16:27 ` Ingo Molnar
2014-05-07 16:48 ` Don Zickus
2014-05-08 16:33 ` Don Zickus
0 siblings, 2 replies; 21+ messages in thread
From: Ingo Molnar @ 2014-05-07 16:27 UTC (permalink / raw)
To: Don Zickus
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker, Steven Rostedt
* Don Zickus <dzickus@redhat.com> wrote:
> On Wed, May 07, 2014 at 05:38:54PM +0200, Ingo Molnar wrote:
> >
> > * Don Zickus <dzickus@redhat.com> wrote:
> >
> > > I noticed when debugging a perf problem on a machine with GHES enabled,
> > > perf seemed slow. I then realized that the GHES NMI routine was taking
> > > a global lock all the time to inspect the hardware. This contended
> > > with all the local perf counters which did not need a lock. So each cpu
> > > accidentally was synchronizing with itself when using perf.
> > >
> > > This is because the way the nmi handler works. It executes all the handlers
> > > registered to a particular subtype (to deal with nmi sharing). As a result
> > > the GHES handler was executed on every PMI.
> > >
> > > Fix this by creating a new nmi type called NMI_EXT, which is used by
> > > handlers that need to probe external hardware and require a global lock
> > > to do so.
> > >
> > > Now the main NMI handler can check the internal NMI handlers first and
> > > then the external ones if nothing is found.
> > >
> > > This makes perf a little faster again on those machines with GHES enabled.
> >
> > So what happens if GHES asserts an NMI at the same time a PMI
> > triggers?
> >
> > If the perf PMI executes and indicates that it has handled something,
> > we don't execute the GHES handler, right? Will the GHES re-trigger the
> > NMI after we return?
>
> In my head, I had thought they would be queued up and things work
> out fine. [...]
x86 NMIs are generally edge triggered.
> [...] But I guess in theory, if a PMI NMI comes in and before the
> cpu can accept it and GHES NMI comes in, then it would suffice to
> say it may get dropped. That would be not be good. Though the race
> would be very small.
>
> I don't have a good idea how to handle that.
Well, are GHES NMIs reasserted if they are not handled? I don't know
but there's a definite answer to that hardware behavior question.
> On the flip side, we have the same exact problem, today, with the
> other common external NMIs (SERR, IO). If a PCI SERR comes in at
> the same time as a PMI, then it gets dropped. Worse, it doesn't get
> re-enabled and blocks future SERRs (just found this out two weeks
> ago because of a dirty perf status register on boot).
>
> Again, I don't have a solution to juggle between PMI performance and
> reliable delivery. We could do away with the spinlocks and go back
> to single cpu delivery (like it used to be). Then devise a
> mechanism to switch delivery to another cpu upon hotplug.
>
> Thoughts?
I'd say we should do a delayed timer that makes sure that all possible
handlers are polled after an NMI is triggered, but never at a high
rate.
Then simply return early the moment an NMI handler indicates that
there was an event handled - and first call high-performance handlers
like the perf handler.
The proper channel for hardware errors is the #MC entry anyway, so
this is mostly about legacies and weird hardware.
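A minimal user-space sketch of the arm/re-arm logic in the suggestion above: each NMI kicks a low-rate delayed poll, the poll rechecks once more while NMIs keep arriving, and disarms itself when traffic stops. All names and the tick-based timing are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long poll_deadline;   /* 0 == poll disarmed */
static unsigned long last_nmi_seen;
static unsigned long nmi_count;

#define POLL_DELAY 10                 /* "ticks"; ~10ms in the discussion */

static void nmi_arrived(unsigned long now)
{
        nmi_count++;
        poll_deadline = now + POLL_DELAY;   /* kick (or re-kick) the poll */
}

/* Returns true if the delayed poll ran on this tick. */
static bool maybe_poll(unsigned long now)
{
        if (!poll_deadline || now < poll_deadline)
                return false;
        if (last_nmi_seen != nmi_count) {
                /* NMIs arrived since the last poll: recheck all handlers
                 * here and stay armed for one more round. */
                last_nmi_seen = nmi_count;
                poll_deadline = now + POLL_DELAY;
        } else {
                poll_deadline = 0;          /* idle: stop polling */
        }
        return true;
}
```

The point of the disarm branch is exactly Ingo's constraint: polling only happens while there is NMI traffic, so no power is wasted on an idle box.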
Thanks,
Ingo
* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
2014-05-07 16:04 ` Don Zickus
@ 2014-05-07 16:30 ` Ingo Molnar
0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2014-05-07 16:30 UTC (permalink / raw)
To: Don Zickus
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker
* Don Zickus <dzickus@redhat.com> wrote:
> On Wed, May 07, 2014 at 05:42:31PM +0200, Ingo Molnar wrote:
> >
> > * Don Zickus <dzickus@redhat.com> wrote:
> >
> > > The main reason for this patch is because I have a hard time knowing
> > > what NMI handlers are registered on the system when debugging NMI issues.
> > >
> > > This info is provided in /proc/interrupts for interrupt handlers, so I
> > > added support for NMI stuff too. As a bonus it provides stat breakdowns
> > > much like the interrupts.
> > >
> > > The only ugly issue is how to label NMI subtypes using only 3 letters
> > > and still make it obvious it is part of the NMI. Adding a /proc/nmi
> > > seemed overkill, so I choose to indent things by one space. Sample
> > > output is below:
> > >
> > > [root@dhcp71-248 ~]# cat /proc/interrupts
> > > CPU0 CPU1 CPU2 CPU3
> > > 0: 29 0 0 0 IR-IO-APIC-edge timer
> > > <snip>
> > > NMI: 20 774 10986 4227 Non-maskable interrupts
> > > LOC: 21 775 10987 4228 Local PMI, arch_bt
> > > EXT: 0 0 0 0 External plat
> > > UNK: 0 0 0 0 Unknown
> > > SWA: 0 0 0 0 Swallowed
> > > LOC: 30374 24749 20795 15095 Local timer interrupts
> > > SPU: 0 0 0 0 Spurious interrupts
> > > PMI: 20 774 10986 4227 Performance monitoring interrupts
> > > <snip>
> >
> > Looks pretty useful!
> >
> > The lost vertical alignment of the counts is a problem though IMHO.
>
> Agreed! It wasn't obvious to me at the time on how to keep that
> alignment while only changing the first column. I can look at it
> again, if you are ok with the small shift in the first column.
> Otherwise alternate ideas are welcomed. :-)
So I'd just pick a unique 3-letter shortcut for the first column and
not do any shifting. Do the 'these are NMI sub-cases' distinction in
the description field. Example mockup:
NMI: 20 774 10986 4227 Non-maskable interrupts
NLC: 21 775 10987 4228 NMI: Local PMI, arch_bt
NXT: 0 0 0 0 NMI: External plat
NUN: 0 0 0 0 NMI: Unknown
NSW: 0 0 0 0 NMI: Swallowed
LOC: 30374 24749 20795 15095 Local timer interrupts
or so.
Thanks,
Ingo
* Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-07 16:27 ` Ingo Molnar
@ 2014-05-07 16:48 ` Don Zickus
2014-05-08 16:33 ` Don Zickus
1 sibling, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-07 16:48 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker, Steven Rostedt
On Wed, May 07, 2014 at 06:27:46PM +0200, Ingo Molnar wrote:
>
> * Don Zickus <dzickus@redhat.com> wrote:
>
> > On Wed, May 07, 2014 at 05:38:54PM +0200, Ingo Molnar wrote:
> > >
> > > * Don Zickus <dzickus@redhat.com> wrote:
> > >
> > > > I noticed when debugging a perf problem on a machine with GHES enabled,
> > > > perf seemed slow. I then realized that the GHES NMI routine was taking
> > > > a global lock all the time to inspect the hardware. This contended
> > > > with all the local perf counters which did not need a lock. So each cpu
> > > > accidentally was synchronizing with itself when using perf.
> > > >
> > > > This is because the way the nmi handler works. It executes all the handlers
> > > > registered to a particular subtype (to deal with nmi sharing). As a result
> > > > the GHES handler was executed on every PMI.
> > > >
> > > > Fix this by creating a new nmi type called NMI_EXT, which is used by
> > > > handlers that need to probe external hardware and require a global lock
> > > > to do so.
> > > >
> > > > Now the main NMI handler can check the internal NMI handlers first and
> > > > then the external ones if nothing is found.
> > > >
> > > > This makes perf a little faster again on those machines with GHES enabled.
> > >
> > > So what happens if GHES asserts an NMI at the same time a PMI
> > > triggers?
> > >
> > > If the perf PMI executes and indicates that it has handled something,
> > > we don't execute the GHES handler, right? Will the GHES re-trigger the
> > > NMI after we return?
> >
> > In my head, I had thought they would be queued up and things work
> > out fine. [...]
>
> x86 NMIs are generally edge triggered.
Right, I meant to say they would be latched.
>
> > [...] But I guess in theory, if a PMI NMI comes in and before the
> > cpu can accept it and GHES NMI comes in, then it would suffice to
> > say it may get dropped. That would be not be good. Though the race
> > would be very small.
> >
> > I don't have a good idea how to handle that.
>
> Well, are GHES NMIs reasserted if they are not handled? I don't know
> but there's a definite answer to that hardware behavior question.
I can dig around and find out but I would think not.
>
> > On the flip side, we have the same exact problem, today, with the
> > other common external NMIs (SERR, IO). If a PCI SERR comes in at
> > the same time as a PMI, then it gets dropped. Worse, it doesn't get
> > re-enabled and blocks future SERRs (just found this out two weeks
> > ago because of a dirty perf status register on boot).
> >
> > Again, I don't have a solution to juggle between PMI performance and
> > reliable delivery. We could do away with the spinlocks and go back
> > to single cpu delivery (like it used to be). Then devise a
> > mechanism to switch delivery to another cpu upon hotplug.
> >
> > Thoughts?
>
> I'd say we should do a delayed timer that makes sure that all possible
> handlers are polled after an NMI is triggered, but never at a high
> rate.
>
> Then simply return early the moment an NMI handler indicates that
> there was an event handled - and first call high-performance handlers
> like the perf handler.
Ok, I can look into something like that.
>
> The proper channel for hardware errors is the #MC entry anyway, so
> this is mostly about legacies and weird hardware.
Well, it seems most vendors are going 'firmware first' with the NMI being
the notifying mechanism. But that is mostly on servers. I do deal with
vendors that like to generate their own NMI to panic the box (though those
come in as unknown NMIs).
Cheers,
Don
>
> Thanks,
>
> Ingo
* RE: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
2014-05-07 15:42 ` Ingo Molnar
@ 2014-05-07 19:50 ` Elliott, Robert (Server Storage)
2014-05-08 1:28 ` Don Zickus
1 sibling, 1 reply; 21+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-05-07 19:50 UTC (permalink / raw)
To: Don Zickus, x86; +Cc: Peter Zijlstra, ak, gong.chen, LKML
Don Zickus <dzickus@redhat.com> wrote:
> The main reason for this patch is because I have a hard time knowing
> what NMI handlers are registered on the system when debugging NMI issues.
>
> This info is provided in /proc/interrupts for interrupt handlers, so I
> added support for NMI stuff too. As a bonus it provides stat breakdowns
> much like the interrupts.
/proc/interrupts only shows online CPUs, while /proc/softirqs shows
all possible CPUs. Is there any value in this information for all
possible CPUs? Perhaps a /proc/hardirqs could be created alongside.
> The only ugly issue is how to label NMI subtypes using only 3 letters
> and still make it obvious it is part of the NMI. Adding a /proc/nmi
> seemed overkill, so I choose to indent things by one space.
The list only shows the currently registered handlers, which may
differ from the ones that were registered when the NMIs whose counts
are being displayed occurred. You might want to describe these new
rows and mention that in Documentation/filesystems/proc.txt and
the proc(5) manpage.
> Sample output is below:
>
> [root@dhcp71-248 ~]# cat /proc/interrupts
> CPU0 CPU1 CPU2 CPU3
> 0: 29 0 0 0 IR-IO-APIC-edge timer
> <snip>
> NMI: 20 774 10986 4227 Non-maskable interrupts
> LOC: 21 775 10987 4228 Local PMI, arch_bt
> EXT: 0 0 0 0 External plat
> UNK: 0 0 0 0 Unknown
> SWA: 0 0 0 0 Swallowed
Adding the list of NMI handlers in /proc/interrupts is a bit
inconsistent with the other interrupts, which don't describe their
handlers. It would be helpful to distinguish between a handler
list being present, being present but empty, or not being present.
Maybe use parentheses like this (using Ingo's suggested format):
NMI: 20 774 10986 4227 Non-maskable interrupts
NLC: 21 775 10987 4228 NMI: Local (PMI, arch_bt)
NXT: 0 0 0 0 NMI: External (plat)
NUN: 0 0 0 0 NMI: Unknown ()
NSW: 0 0 0 0 NMI: Swallowed
LOC: 30374 24749 20795 15095 Local timer interrupts
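A small user-space sketch of the labeling proposed here, distinguishing a handler list that is absent (no handler concept for the row) from one that is present but empty. The row names and helper are illustrative, not the patch's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * A NULL handler list means the row has no handler concept (e.g.
 * Swallowed); an empty string means the list exists but nothing is
 * currently registered, printed as "()".
 */
static void format_nmi_row(char *buf, size_t len, const char *tag,
                           const char *desc, const char *handlers)
{
        if (handlers)
                snprintf(buf, len, "%s: NMI: %s (%s)", tag, desc, handlers);
        else
                snprintf(buf, len, "%s: NMI: %s", tag, desc);
}
```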
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index d99f31d..520359c 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
...
> +void nmi_show_interrupts(struct seq_file *p, int prec)
> +{
> + int j;
> + int indent = prec + 1;
> +
> +#define get_nmi_stats(j) (&per_cpu(nmi_stats, j))
> +
> + seq_printf(p, "%*s: ", indent, "LOC");
> + for_each_online_cpu(j)
> + seq_printf(p, "%10u ", get_nmi_stats(j)->normal);
> + seq_printf(p, " %-8s", "Local");
> +
> + print_nmi_action_name(p, NMI_LOCAL);
> +
> + seq_printf(p, "%*s: ", indent, "EXT");
> + for_each_online_cpu(j)
> + seq_printf(p, "%10u ", get_nmi_stats(j)->external);
> + seq_printf(p, " %-8s", "External");
> +
> + print_nmi_action_name(p, NMI_EXT);
> +
> + seq_printf(p, "%*s: ", indent, "UNK");
> + for_each_online_cpu(j)
> + seq_printf(p, "%10u ", get_nmi_stats(j)->unknown);
> + seq_printf(p, " %-8s", "Unknown");
> +
> + print_nmi_action_name(p, NMI_UNKNOWN);
> +
The NMI handler types are in arch/x86/include/asm/nmi.h:
enum {
NMI_LOCAL=0,
NMI_UNKNOWN,
NMI_SERR,
NMI_IO_CHECK,
NMI_MAX
};
The new code only prints the registered handlers for NMI_LOCAL,
NMI_UNKNOWN, and the new NMI_EXT. Consider adding counters
for NMI_SERR and NMI_IO_CHECK and printing their handlers too.
drivers/watchdog/hpwdt.c is the only code currently in
the kernel registering handlers for them.
---
Rob Elliott HP Server Storage
* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
2014-05-07 19:50 ` Elliott, Robert (Server Storage)
@ 2014-05-08 1:28 ` Don Zickus
2014-05-08 6:04 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-08 1:28 UTC (permalink / raw)
To: Elliott, Robert (Server Storage); +Cc: x86, Peter Zijlstra, ak, gong.chen, LKML
On Wed, May 07, 2014 at 07:50:48PM +0000, Elliott, Robert (Server Storage) wrote:
> Don Zickus <dzickus@redhat.com> wrote:
> > The main reason for this patch is because I have a hard time knowing
> > what NMI handlers are registered on the system when debugging NMI issues.
> >
> > This info is provided in /proc/interrupts for interrupt handlers, so I
> > added support for NMI stuff too. As a bonus it provides stat breakdowns
> > much like the interrupts.
>
> /proc/interrupts only shows online CPUs, while /proc/softirqs shows
> all possible CPUs. Is there any value in this information for all
> possible CPUs? Perhaps a /proc/hardirqs could be created alongside.
Well if they are not online, they probably won't be generating NMIs, so I
am not sure there is much value there.
>
> > The only ugly issue is how to label NMI subtypes using only 3 letters
> > and still make it obvious it is part of the NMI. Adding a /proc/nmi
> > seemed overkill, so I choose to indent things by one space.
>
> The list only shows the currently registered handlers, which may
> differ from the ones that were registered when the NMIs whose counts
> are being displayed occurred. You might want to describe these new
> rows and mention that in Documentation/filesystems/proc.txt and
> the proc(5) manpage.
Ok, but that is a /proc/interrupts problem not one specific to NMI, no?
>
> > Sample output is below:
> >
> > [root@dhcp71-248 ~]# cat /proc/interrupts
> > CPU0 CPU1 CPU2 CPU3
> > 0: 29 0 0 0 IR-IO-APIC-edge timer
> > <snip>
> > NMI: 20 774 10986 4227 Non-maskable interrupts
> > LOC: 21 775 10987 4228 Local PMI, arch_bt
> > EXT: 0 0 0 0 External plat
> > UNK: 0 0 0 0 Unknown
> > SWA: 0 0 0 0 Swallowed
>
> Adding the list of NMI handlers in /proc/interrupts is a bit
> inconsistent with the other interrupts, which don't describe their
> handlers. It would be helpful to distinguish between a handler
> list being present, being present but empty, or not being present.
>
> Maybe use parenthesis like this (using Ingo's suggested format):
> NMI: 20 774 10986 4227 Non-maskable interrupts
> NLC: 21 775 10987 4228 NMI: Local (PMI, arch_bt)
> NXT: 0 0 0 0 NMI: External (plat)
> NUN: 0 0 0 0 NMI: Unknown ()
> NSW: 0 0 0 0 NMI: Swallowed
> LOC: 30374 24749 20795 15095 Local timer interrupts
>
Hmm, looking at /proc/interrupts I see
1: 858014 29054 23191 9337 IO-APIC-edge i8042
8: 3 24 10 2 IO-APIC-edge rtc0
9: 387555 9219 8308 7944 IO-APIC-fasteoi acpi
12: 9251360 163811 158846 141916 IO-APIC-edge i8042
16: 0 0 0 0 IO-APIC-fasteoi mmc0
17: 14 5 7 10 IO-APIC-fasteoi
19: 6892 367 13 10 IO-APIC-fasteoi ehci_hcd:usb2, ips, firewire_ohci
23: 1363281 753 94 94 IO-APIC-fasteoi ehci_hcd:usb1
Those may not be specific handlers, but they are registered irq names, no?
That basically matches what I was trying to accomplish with NMI.
I guess I don't see how what I did is much different than what already
exists.
> > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> > index d99f31d..520359c 100644
> > --- a/arch/x86/kernel/irq.c
> > +++ b/arch/x86/kernel/irq.c
> ...
> > +void nmi_show_interrupts(struct seq_file *p, int prec)
> > +{
> > + int j;
> > + int indent = prec + 1;
> > +
> > +#define get_nmi_stats(j) (&per_cpu(nmi_stats, j))
> > +
> > + seq_printf(p, "%*s: ", indent, "LOC");
> > + for_each_online_cpu(j)
> > + seq_printf(p, "%10u ", get_nmi_stats(j)->normal);
> > + seq_printf(p, " %-8s", "Local");
> > +
> > + print_nmi_action_name(p, NMI_LOCAL);
> > +
> > + seq_printf(p, "%*s: ", indent, "EXT");
> > + for_each_online_cpu(j)
> > + seq_printf(p, "%10u ", get_nmi_stats(j)->external);
> > + seq_printf(p, " %-8s", "External");
> > +
> > + print_nmi_action_name(p, NMI_EXT);
> > +
> > + seq_printf(p, "%*s: ", indent, "UNK");
> > + for_each_online_cpu(j)
> > + seq_printf(p, "%10u ", get_nmi_stats(j)->unknown);
> > + seq_printf(p, " %-8s", "Unknown");
> > +
> > + print_nmi_action_name(p, NMI_UNKNOWN);
> > +
>
> The NMI handler types are in arch/x86/include/asm/nmi.h:
> enum {
> NMI_LOCAL=0,
> NMI_UNKNOWN,
> NMI_SERR,
> NMI_IO_CHECK,
> NMI_MAX
> };
>
> The new code only prints the registered handlers for NMI_LOCAL,
> NMI_UNKNOWN, and the new NMI_EXT. Consider adding counters
> for NMI_SERR and NMI_IO_CHECK and printing their handlers too.
>
> drivers/watchdog/hpwdt.c is the only code currently in
> the kernel registering handlers for them.
Yeah, I guess I was trying to remove NMI_SERR and NMI_IO_CHECK. I forget
whether I accomplished that with this patch set or not. Instead I had hpwdt do
the ioport read directly instead of having do_default_nmi do it. I can
look at it again.
Cheers,
Don
* Re: [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers
2014-05-08 1:28 ` Don Zickus
@ 2014-05-08 6:04 ` Ingo Molnar
0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2014-05-08 6:04 UTC (permalink / raw)
To: Don Zickus
Cc: Elliott, Robert (Server Storage),
x86, Peter Zijlstra, ak, gong.chen, LKML
* Don Zickus <dzickus@redhat.com> wrote:
> On Wed, May 07, 2014 at 07:50:48PM +0000, Elliott, Robert (Server Storage) wrote:
> > Don Zickus <dzickus@redhat.com> wrote:
> > > The main reason for this patch is because I have a hard time knowing
> > > what NMI handlers are registered on the system when debugging NMI issues.
> > >
> > > This info is provided in /proc/interrupts for interrupt handlers, so I
> > > added support for NMI stuff too. As a bonus it provides stat breakdowns
> > > much like the interrupts.
> >
> > /proc/interrupts only shows online CPUs, while /proc/softirqs shows
> > all possible CPUs. Is there any value in this information for all
> > possible CPUs? Perhaps a /proc/hardirqs could be created alongside.
>
> Well if they are not online, they probably won't be generating NMIs, so I
> am not sure there is much value there.
>
> >
> > > The only ugly issue is how to label NMI subtypes using only 3 letters
> > > and still make it obvious it is part of the NMI. Adding a /proc/nmi
> > > seemed overkill, so I choose to indent things by one space.
> >
> > The list only shows the currently registered handlers, which may
> > differ from the ones that were registered when the NMIs whose counts
> > are being displayed occurred. You might want to describe these new
> > rows and mention that in Documentation/filesystems/proc.txt and
> > the proc(5) manpage.
>
> Ok, but that is a /proc/interrupts problem not one specific to NMI, no?
>
> >
> > > Sample output is below:
> > >
> > > [root@dhcp71-248 ~]# cat /proc/interrupts
> > > CPU0 CPU1 CPU2 CPU3
> > > 0: 29 0 0 0 IR-IO-APIC-edge timer
> > > <snip>
> > > NMI: 20 774 10986 4227 Non-maskable interrupts
> > > LOC: 21 775 10987 4228 Local PMI, arch_bt
> > > EXT: 0 0 0 0 External plat
> > > UNK: 0 0 0 0 Unknown
> > > SWA: 0 0 0 0 Swallowed
> >
> > Adding the list of NMI handlers in /proc/interrupts is a bit
> > inconsistent with the other interrupts, which don't describe their
> > handlers. It would be helpful to distinguish between a handler
> > list being present, being present but empty, or not being present.
> >
> > Maybe use parenthesis like this (using Ingo's suggested format):
> > NMI: 20 774 10986 4227 Non-maskable interrupts
> > NLC: 21 775 10987 4228 NMI: Local (PMI, arch_bt)
> > NXT: 0 0 0 0 NMI: External (plat)
> > NUN: 0 0 0 0 NMI: Unknown ()
> > NSW: 0 0 0 0 NMI: Swallowed
> > LOC: 30374 24749 20795 15095 Local timer interrupts
> >
>
> Hmm, looking at /proc/interrupts I see
>
> 1: 858014 29054 23191 9337 IO-APIC-edge i8042
> 8: 3 24 10 2 IO-APIC-edge rtc0
> 9: 387555 9219 8308 7944 IO-APIC-fasteoi acpi
> 12: 9251360 163811 158846 141916 IO-APIC-edge i8042
> 16: 0 0 0 0 IO-APIC-fasteoi mmc0
> 17: 14 5 7 10 IO-APIC-fasteoi
> 19: 6892 367 13 10 IO-APIC-fasteoi ehci_hcd:usb2, ips, firewire_ohci
> 23: 1363281 753 94 94 IO-APIC-fasteoi ehci_hcd:usb1
>
> Those may not be specific handlers, but they are registered irq
> names, no? That basically matches what I was trying to accomplish
> with NMI.
>
> I guess I don't see how what I did is much different than what
> already exists.
The parentheses make the output more readable, especially as with the
NMI format it's not quite clear what is 'irq type' and what is
'handler'.
Might make sense to add parentheses for regular irq handlers as well,
for consistency and readability.
Thanks,
Ingo
* Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-07 16:27 ` Ingo Molnar
2014-05-07 16:48 ` Don Zickus
@ 2014-05-08 16:33 ` Don Zickus
2014-05-08 17:35 ` Ingo Molnar
1 sibling, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-08 16:33 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker, Steven Rostedt, andi
On Wed, May 07, 2014 at 06:27:46PM +0200, Ingo Molnar wrote:
> > [...] But I guess in theory, if a PMI NMI comes in and before the
> > cpu can accept it and GHES NMI comes in, then it would suffice to
> > say it may get dropped. That would be not be good. Though the race
> > would be very small.
> >
> > I don't have a good idea how to handle that.
>
> Well, are GHES NMIs reasserted if they are not handled? I don't know
> but there's a definite answer to that hardware behavior question.
I can't find anything that explicitly says the NMI will be re-asserted, so
I will assume it does not. Andi, do you know? (I am not sure who maintains
GHES any more).
>
> > On the flip side, we have the same exact problem, today, with the
> > other common external NMIs (SERR, IO). If a PCI SERR comes in at
> > the same time as a PMI, then it gets dropped. Worse, it doesn't get
> > re-enabled and blocks future SERRs (just found this out two weeks
> > ago because of a dirty perf status register on boot).
> >
> > Again, I don't have a solution to juggle between PMI performance and
> > reliable delivery. We could do away with the spinlocks and go back
> > to single cpu delivery (like it used to be). Then devise a
> > mechanism to switch delivery to another cpu upon hotplug.
> >
> > Thoughts?
>
> I'd say we should do a delayed timer that makes sure that all possible
> handlers are polled after an NMI is triggered, but never at a high
> rate.
Hmm, I was thinking about it and wanted to avoid a poll as I hear
complaints here and there about the nmi_watchdog constantly wasting power
cycles with its polling.
I was wondering if I could do a status read outside the spinlock, then if
a bit is set, just grab the spin_lock and re-read the status. But then
looking at the GHES code, I am not sure if it is as easy to read the
status bit as it is for a PCI_SERR/IO_CHK NMI.
Andi, thoughts here? Should I poke Tony Luck?
Otherwise I can set up the polling if that doesn't work.
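A single-threaded sketch of the "peek before locking" idea described above: read the status without the global lock, and only take the lock (and re-read) when a bit is actually set. read_status() and the lock counter stand in for the real MMIO/ioport read and spinlock; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

static volatile unsigned int fake_status;  /* stands in for the hw register */
static unsigned int lock_taken;            /* counts spinlock acquisitions */

static unsigned int read_status(void)
{
        return fake_status;
}

static bool ghes_nmi_handler(void)
{
        bool handled = false;

        /* Cheap unlocked peek: the common (no-error) path takes no lock,
         * so perf PMIs on other cpus no longer serialize behind it. */
        if (!read_status())
                return false;

        lock_taken++;                      /* spin_lock(&ghes_lock); */
        /* Re-read under the lock: another cpu may have consumed the event */
        if (read_status()) {
                fake_status = 0;           /* "ack" the event */
                handled = true;
        }
        /* spin_unlock(&ghes_lock); */
        return handled;
}
```

Whether this is safe for GHES depends on the open question above: the real status may not be readable without the lock (or without firmware-first handshaking), which is why the polling fallback remains on the table.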
Cheers,
Don
* Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-08 16:33 ` Don Zickus
@ 2014-05-08 17:35 ` Ingo Molnar
2014-05-08 17:52 ` Don Zickus
0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2014-05-08 17:35 UTC (permalink / raw)
To: Don Zickus
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker, Steven Rostedt, andi
* Don Zickus <dzickus@redhat.com> wrote:
> > > Again, I don't have a solution to juggle between PMI performance
> > > and reliable delivery. We could do away with the spinlocks and
> > > go back to single cpu delivery (like it used to be). Then
> > > devise a mechanism to switch delivery to another cpu upon
> > > hotplug.
> > >
> > > Thoughts?
> >
> > I'd say we should do a delayed timer that makes sure that all
> > possible handlers are polled after an NMI is triggered, but never
> > at a high rate.
>
> Hmm, I was thinking about it and wanted to avoid a poll as I hear
> complaints here and there about the nmi_watchdog constantly wasting
> power cycles with its polling.
But the polling would only happen if there's NMI traffic, so that's
fine. So as long as polling stops some time after the last PMI use,
it's a good solution.
This would also address a lot of NMI handling related fragility.
Thanks,
Ingo
* Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-08 17:35 ` Ingo Molnar
@ 2014-05-08 17:52 ` Don Zickus
2014-05-09 7:10 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: Don Zickus @ 2014-05-08 17:52 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker, Steven Rostedt, andi
On Thu, May 08, 2014 at 07:35:01PM +0200, Ingo Molnar wrote:
>
> * Don Zickus <dzickus@redhat.com> wrote:
>
> > > > Again, I don't have a solution to juggle between PMI performance
> > > > and reliable delivery. We could do away with the spinlocks and
> > > > go back to single cpu delivery (like it used to be). Then
> > > > devise a mechanism to switch delivery to another cpu upon
> > > > hotplug.
> > > >
> > > > Thoughts?
> > >
> > > I'd say we should do a delayed timer that makes sure that all
> > > possible handlers are polled after an NMI is triggered, but never
> > > at a high rate.
> >
> > Hmm, I was thinking about it and wanted to avoid a poll as I hear
> > complaints here and there about the nmi_watchdog constantly wasting
> > power cycles with its polling.
>
> But the polling would only happen if there's NMI traffic, so that's
> fine. So as long as polling stops some time after the last PMI use,
> it's a good solution.
So you are thinking an NMI comes in, kicks off a delayed timer for say
10ms. The timer fires, rechecks the NMI for missed events and then stops?
If another NMI happens before the timer fires, just kick the timer again?
Something like that?
Cheers,
Don
>
> This would also address a lot of NMI handling related fragility.
>
> Thanks,
>
> Ingo
* Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-08 17:52 ` Don Zickus
@ 2014-05-09 7:10 ` Ingo Molnar
2014-05-09 13:36 ` Don Zickus
0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2014-05-09 7:10 UTC (permalink / raw)
To: Don Zickus
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker, Steven Rostedt, andi
* Don Zickus <dzickus@redhat.com> wrote:
> On Thu, May 08, 2014 at 07:35:01PM +0200, Ingo Molnar wrote:
> >
> > * Don Zickus <dzickus@redhat.com> wrote:
> >
> > > > > Again, I don't have a solution to juggle between PMI performance
> > > > > and reliable delivery. We could do away with the spinlocks and
> > > > > go back to single cpu delivery (like it used to be). Then
> > > > > devise a mechanism to switch delivery to another cpu upon
> > > > > hotplug.
> > > > >
> > > > > Thoughts?
> > > >
> > > > I'd say we should do a delayed timer that makes sure that all
> > > > possible handlers are polled after an NMI is triggered, but never
> > > > at a high rate.
> > >
> > > Hmm, I was thinking about it and wanted to avoid a poll as I hear
> > > complaints here and there about the nmi_watchdog constantly wasting
> > > power cycles with its polling.
> >
> > But the polling would only happen if there's NMI traffic, so that's
> > fine. So as long as polling stops some time after the last PMI use,
> > it's a good solution.
>
> So you are thinking an NMI comes in, kicks off a delayed timer for
> say 10ms. The timer fires, rechecks the NMI for missed events and
> then stops? If another NMI happens before the timer fires, just kick
> the timer again?
>
> Something like that?
Yeah, exactly, using delayed IRQ work for that or so.
This would allow 'optimistic' processing of NMI events: the
first handler that manages to do any work causes a return. No need to
make a per-handler distinction, etc.
It would generally be pretty robust and would possibly be a natural
workaround for 'stuck PMU' type of bugs as well.
[ As long as it does not result in spurious 'dazed and confused'
messages :-) ]
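The optimistic dispatch boils down to an early-return loop over the handler list, with the delayed poll (not shown) catching anything a coincident NMI would otherwise lose. A user-space sketch, with illustrative names:

```c
#include <assert.h>

typedef int (*nmi_handler_fn)(void);

/* Call high-performance handlers (ordered first in the array) and
 * return as soon as one reports work done; skip the rest. */
static int dispatch_nmi(nmi_handler_fn *handlers, int n)
{
        for (int i = 0; i < n; i++) {
                int handled = handlers[i]();
                if (handled)
                        return handled;   /* stop early */
        }
        return 0;                         /* nothing claimed it: unknown NMI */
}
```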
Thanks,
Ingo
* Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'
2014-05-09 7:10 ` Ingo Molnar
@ 2014-05-09 13:36 ` Don Zickus
0 siblings, 0 replies; 21+ messages in thread
From: Don Zickus @ 2014-05-09 13:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86, Peter Zijlstra, ak, gong.chen, LKML, Thomas Gleixner,
Frédéric Weisbecker, Steven Rostedt, andi
On Fri, May 09, 2014 at 09:10:50AM +0200, Ingo Molnar wrote:
>
> * Don Zickus <dzickus@redhat.com> wrote:
>
> > On Thu, May 08, 2014 at 07:35:01PM +0200, Ingo Molnar wrote:
> > >
> > > * Don Zickus <dzickus@redhat.com> wrote:
> > >
> > > > > > Again, I don't have a solution to juggle between PMI performance
> > > > > > and reliable delivery. We could do away with the spinlocks and
> > > > > > go back to single cpu delivery (like it used to be). Then
> > > > > > devise a mechanism to switch delivery to another cpu upon
> > > > > > hotplug.
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > I'd say we should do a delayed timer that makes sure that all
> > > > > possible handlers are polled after an NMI is triggered, but never
> > > > > at a high rate.
> > > >
> > > > Hmm, I was thinking about it and wanted to avoid a poll as I hear
> > > > complaints here and there about the nmi_watchdog constantly wasting
> > > > power cycles with its polling.
> > >
> > > But the polling would only happen if there's NMI traffic, so that's
> > > fine. So as long as polling stops some time after the last PMI use,
> > > it's a good solution.
> >
> > So you are thinking an NMI comes in, kicks off a delayed timer for
> > say 10ms. The timer fires, rechecks the NMI for missed events and
> > then stops? If another NMI happens before the timer fires, just kick
> > the timer again?
> >
> > Something like that?
>
> Yeah, exactly, using delayed IRQ work for that or so.
>
> This would allow us to 'optimistic' processing of NMI events: the
> first handler that manages to do any work causes a return. No need to
> make a per handler distinction, etc.
>
> It would generally be pretty robust and would possibly be a natural
> workaround for 'stuck PMU' type of bugs as well.
Agreed. I'll try to hack something up for that.
>
> [ As long as it does not result in spurious 'dazed and confused'
> messages :-) ]
Hehe.
Cheers,
Don
end of thread, other threads:[~2014-05-09 13:37 UTC | newest]
Thread overview: 21+ messages
2014-05-07 15:34 [PATCH 0/5 RESEND] x86, nmi: Various fixes and cleanups Don Zickus
2014-05-07 15:34 ` [PATCH 1/5] x86, nmi: Add new nmi type 'external' Don Zickus
2014-05-07 15:38 ` Ingo Molnar
2014-05-07 16:02 ` Don Zickus
2014-05-07 16:27 ` Ingo Molnar
2014-05-07 16:48 ` Don Zickus
2014-05-08 16:33 ` Don Zickus
2014-05-08 17:35 ` Ingo Molnar
2014-05-08 17:52 ` Don Zickus
2014-05-09 7:10 ` Ingo Molnar
2014-05-09 13:36 ` Don Zickus
2014-05-07 15:34 ` [PATCH 2/5] x86, nmi: Add boot line option 'panic_on_unrecovered_nmi' and 'panic_on_io_nmi' Don Zickus
2014-05-07 15:34 ` [PATCH 3/5] x86, nmi: Remove 'reason' value from unknown nmi output Don Zickus
2014-05-07 15:34 ` [PATCH 4/5] x86, nmi: Move default external NMI handler to its own routine Don Zickus
2014-05-07 15:34 ` [PATCH 5/5] x86, nmi: Add better NMI stats to /proc/interrupts and show handlers Don Zickus
2014-05-07 15:42 ` Ingo Molnar
2014-05-07 16:04 ` Don Zickus
2014-05-07 16:30 ` Ingo Molnar
2014-05-07 19:50 ` Elliott, Robert (Server Storage)
2014-05-08 1:28 ` Don Zickus
2014-05-08 6:04 ` Ingo Molnar