NMI watch dog notify patch

Message ID 42E940BE.3020908@mvista.com
State New, archived

Commit Message

George Anzinger July 28, 2005, 8:31 p.m. UTC
Andrew,
I have been doing some work on kgdb to pull a few of its "fingers" out of 
various places in the kernel.  This is the final location where we have 
a kgdb intercept not covered by a notify.

On a related issue, I feel very queasy with sending nmi interrupts and 
non-nmi events to the same notify code.  Would you be open to a patch to 
create a separate notify list for nmi events?


-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger <george@mvista.com>
Type: Enhancement 
Description:
	This patch adds a notify to the nmi watchdog to notify that
	the system is about to be taken down by the watchdog.  If the
	notify is handled with a NOTIFY_STOP return, the system is
	given a new lease on life.

	This gives debug code a chance to a) catch watchdog timeouts and
	b) possibly allow the system to continue, realizing that
	the timeout may be due to debugger activities such as single
	stepping, which is usually done with "other" cpus held.

Signed-off-by: George Anzinger <george@mvista.com>

 nmi.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)
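
For readers who want to see the control flow described above without
building a kernel, here is a minimal userspace sketch.  It is not kernel
code: every sim_* name, the single-slot "chain", and the constant values
are assumptions made for illustration.  Only the shape of the branch
(notify first, reset the counters on a stop return, otherwise die) follows
the patch.

```c
#include <stdio.h>

/* Illustrative stand-ins for the kernel's notifier return codes. */
enum { SIM_NOTIFY_DONE = 0, SIM_NOTIFY_STOP = 1 };

#define SIM_NMI_HZ 1                 /* assumed tick rate for the sketch */
#define SIM_LIMIT  (5 * SIM_NMI_HZ)  /* same 5*nmi_hz threshold as the patch */

typedef int (*sim_notifier_fn)(const char *reason);

static sim_notifier_fn sim_handler;    /* hypothetical one-entry notify chain */
static unsigned int sim_alert_counter; /* plays the role of alert_counter[cpu] */
static unsigned int sim_last_irq_sum;  /* plays the role of last_irq_sums[cpu] */
static int sim_died;                   /* set where die_nmi() would be called */
static int sim_debugger_active;        /* hypothetical debugger state flag */

/* Call the registered handler, if any; no handler means "not handled". */
static int sim_notify_die(const char *reason)
{
	return sim_handler ? sim_handler(reason) : SIM_NOTIFY_DONE;
}

/* A debugger-style handler: veto the lockup report while it holds the cpus. */
static int sim_debugger_handler(const char *reason)
{
	(void)reason;
	return sim_debugger_active ? SIM_NOTIFY_STOP : SIM_NOTIFY_DONE;
}

/* Mirrors the patched branch: an unchanged irq sum counts toward a lockup. */
static void sim_watchdog_tick(unsigned int sum)
{
	if (sum == sim_last_irq_sum) {
		sim_alert_counter++;
		if (sim_alert_counter == SIM_LIMIT) {
			if (sim_notify_die("nmi_ipi_watchdog") == SIM_NOTIFY_STOP) {
				/* Handler asked for a new lease on life. */
				sim_last_irq_sum = sum;
				sim_alert_counter = 0;
			} else {
				sim_died = 1;
				printf("NMI Watchdog detected LOCKUP\n");
			}
		}
	} else {
		sim_last_irq_sum = sum;
		sim_alert_counter = 0;
	}
}
```

To exercise it, point sim_handler at sim_debugger_handler, set
sim_debugger_active, and call sim_watchdog_tick() SIM_LIMIT times with an
unchanged sum: the counter is reset instead of the "system" dying.  Clear
the flag and repeat, and the lockup path is taken.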

Comments

Andrew Morton July 28, 2005, 11:13 p.m. UTC | #1
George Anzinger <george@mvista.com> wrote:
>
> 	This patch adds a notify to the nmi watchdog to notify that
>  	the system is about to be taken down by the watchdog.  If the
>  	notify is handled with a NOTIFY_STOP return, the system is
>  	given a new lease on life.

It looks sensible, but as there aren't actually any in-kernel uses for this
I'd have thought it would be better for it to live out-of-tree?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
George Anzinger July 28, 2005, 11:58 p.m. UTC | #2
Andrew Morton wrote:
> George Anzinger <george@mvista.com> wrote:
> 
>>	This patch adds a notify to the nmi watchdog to notify that
>> 	the system is about to be taken down by the watchdog.  If the
>> 	notify is handled with a NOTIFY_STOP return, the system is
>> 	given a new lease on life.
> 
> 
> It looks sensible, but as there aren't actually any in-kernel uses for this
> I'd have thought it would be better for it to live out-of-tree?

I should just bundle it with the kgdb patch then?
Andrew Morton July 29, 2005, 12:12 a.m. UTC | #3
George Anzinger <george@mvista.com> wrote:
>
> Andrew Morton wrote:
> > George Anzinger <george@mvista.com> wrote:
> > 
> >>	This patch adds a notify to the nmi watchdog to notify that
> >> 	the system is about to be taken down by the watchdog.  If the
> >> 	notify is handled with a NOTIFY_STOP return, the system is
> >> 	given a new lease on life.
> > 
> > 
> > It looks sensible, but as there aren't actually any in-kernel uses for this
> > I'd have thought it would be better for it to live out-of-tree?
> 
> I should just bundle it with the kgdb patch then?

I spose so, for now.  If kdb and/or nlkd could benefit from it then it
might simplify life to merge it into mainline.  Perhaps you could ping
Keith Owens <kaos@sgi.com> and clyde.griffin@novell.com?


Patch

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===================================================================
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -26,11 +26,13 @@ 
 #include <linux/nmi.h>
 #include <linux/sysdev.h>
 #include <linux/sysctl.h>
+#include <linux/notifier.h>
 
 #include <asm/smp.h>
 #include <asm/mtrr.h>
 #include <asm/mpspec.h>
 #include <asm/nmi.h>
+#include <asm/kdebug.h>
 
 #include "mach_traps.h"
 
@@ -494,8 +496,15 @@  void nmi_watchdog_tick (struct pt_regs *
 		 * wait a few IRQs (5 seconds) before doing the oops ...
 		 */
 		alert_counter[cpu]++;
-		if (alert_counter[cpu] == 5*nmi_hz)
-			die_nmi(regs, "NMI Watchdog detected LOCKUP");
+		if (alert_counter[cpu] == 5*nmi_hz) {
+			if (notify_die(DIE_NMIWATCHDOG, "nmi_ipi_watchdog", 
+				       regs, 0, 0, SIGINT) == NOTIFY_STOP) {
+				last_irq_sums[cpu] = sum;
+				alert_counter[cpu] = 0;
+			} else {
+				die_nmi(regs, "NMI Watchdog detected LOCKUP");
+			}
+		}
 	} else {
 		last_irq_sums[cpu] = sum;
 		alert_counter[cpu] = 0;
@@ -555,7 +564,7 @@  int proc_unknown_nmi_panic(ctl_table *ta
 			return -EBUSY;
 		} else {
 			set_nmi_callback(unknown_nmi_panic_callback);
-		}
+		} 
 	} else {
 		release_lapic_nmi();
 		unset_nmi_callback();