linux-kernel.vger.kernel.org archive mirror
* [PATCH RT] have x86_64 nmi watchdog also count irq 0
@ 2007-06-20 18:51 Steven Rostedt
  2007-06-21 12:20 ` Konstantin Baydarov
From: Steven Rostedt @ 2007-06-20 18:51 UTC
  To: Ingo Molnar; +Cc: Thomas Gleixner, LKML, RT

I was getting false reports about NMI lockups on CPU 3. For some reason
CPUs 0, 1 and 2 were using the APIC timer, and CPU 3 was using the irq 0
timer.

This patch makes the NMI watchdog count both the apic timer and the irq0
timer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux-2.6-rt-test/arch/x86_64/kernel/nmi.c
===================================================================
--- linux-2.6-rt-test.orig/arch/x86_64/kernel/nmi.c
+++ linux-2.6-rt-test/arch/x86_64/kernel/nmi.c
@@ -938,7 +938,7 @@ int notrace __kprobes nmi_watchdog_tick(
 		touched = 1;
 	}
 
-	sum = read_pda(apic_timer_irqs);
+	sum = read_pda(apic_timer_irqs) + kstat_cpu(cpu).irqs[0];
 
 	if (__get_cpu_var(nmi_touch)) {
 		__get_cpu_var(nmi_touch) = 0;
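To make the effect of the one-line change concrete, here is a small userspace model of the progress check that nmi_watchdog_tick() performs. It is only a sketch: struct sim_cpu, sim_watchdog_tick(), sim_irq0_only_cpu() and the SIM_ALERT_TICKS threshold are hypothetical names, and the model leaves out the "touched" and nmi_touch paths. A CPU whose timer ticks arrive only through irq 0 never advances its APIC counter, so a check on apic_timer_irqs alone flags it as locked up, while the patched sum keeps moving.

```c
/*
 * Userspace model of the NMI watchdog progress check (illustrative only;
 * all sim_* names and the threshold are hypothetical, not kernel symbols).
 */
#include <assert.h>
#include <stdbool.h>

#define SIM_ALERT_TICKS 5  /* hypothetical: NMIs without progress before reporting */

struct sim_cpu {
    unsigned long apic_timer_irqs;  /* models read_pda(apic_timer_irqs) */
    unsigned long irq0_count;       /* models kstat_cpu(cpu).irqs[0] */
    unsigned long last_irq_sum;     /* models last_irq_sums[cpu] */
    int alert_counter;              /* models the per-CPU alert counter */
};

/* Returns true if this NMI tick would report a lockup on the CPU. */
static bool sim_watchdog_tick(struct sim_cpu *c, bool count_irq0)
{
    unsigned long sum = c->apic_timer_irqs;

    if (count_irq0)
        sum += c->irq0_count;       /* the patched behaviour */

    if (sum == c->last_irq_sum) {
        /* No timer interrupt since the last NMI: the CPU looks stuck. */
        if (++c->alert_counter >= SIM_ALERT_TICKS)
            return true;
    } else {
        /* Progress seen: remember the sum and reset the alert counter. */
        c->last_irq_sum = sum;
        c->alert_counter = 0;
    }
    return false;
}

/*
 * Runs `periods` timer periods on a CPU that only sees irq 0 ticks
 * (like CPU 3 in the report); returns true if a lockup is reported.
 */
static bool sim_irq0_only_cpu(bool count_irq0, int periods)
{
    struct sim_cpu c = {0};

    for (int i = 0; i < periods; i++) {
        c.irq0_count++;             /* the tick arrives via irq 0 ... */
                                    /* ... apic_timer_irqs never moves */
        if (sim_watchdog_tick(&c, count_irq0))
            return true;
    }
    return false;
}
```

With count_irq0 false the model reports a (false) lockup after SIM_ALERT_TICKS periods; with it true the sum advances every period. A CPU that receives no timer interrupts at all is still reported either way, so the extra term does not hide genuine lockups on that CPU.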




* Re: [PATCH RT] have x86_64 nmi watchdog also count irq 0
  2007-06-20 18:51 [PATCH RT] have x86_64 nmi watchdog also count irq 0 Steven Rostedt
@ 2007-06-21 12:20 ` Konstantin Baydarov
From: Konstantin Baydarov @ 2007-06-21 12:20 UTC
  To: Steven Rostedt; +Cc: Ingo Molnar, Thomas Gleixner, LKML, RT

On Wed, 20 Jun 2007 14:51:03 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I was getting false reports about NMI lockups on CPU 3. For some
> reason CPUs 0, 1 and 2 were using the APIC timer, and CPU 3 was using
> the irq 0 timer.
> 
I think you are describing the case where nmi_watchdog is set to IO_APIC
(cmdline: nmi_watchdog=1). I've run into that problem too.
When nmi_watchdog is set to IO_APIC, the local APIC timers are disabled
and the kernel uses a broadcast timer: an external timer (PIT or PM
timer, I'm not sure which) is used instead of the per-CPU local APIC
timers. But even though the lapic timers are disabled, the lapic timer
handler smp_apic_timer_interrupt() is still triggered by the broadcast
timer IPIs, which use the same vector as local APIC timer interrupts.
The problem is that the IPIs are sent to all CPUs except the CPU that is
sending them, so local_apic_timer_interrupt() is not executed on the CPU
that initiated the IPIs; instead, the high-level timer code
(event_handler) is called explicitly. That is not a problem in general,
but for the watchdog it means the per-CPU variable apic_timer_irqs is
never incremented on that CPU, which is why we get the lockup report.
Steven, I think your boot CPU has physical id 3: the timer handler
(irq 0) runs and sends the broadcast IPIs only on the boot CPU, so you
get the lockup report on CPU 3 every time.
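The delivery pattern described above can be modelled in userspace to show why only the CPU that owns the external timer ends up with a frozen apic_timer_irqs. This is a simplified sketch under assumptions: the sim_* arrays and functions are hypothetical stand-ins for the per-CPU counters, smp_apic_timer_interrupt() and the irq 0 handler, and real IPI delivery, locking and the clock event code are not modelled.

```c
/*
 * Userspace model of broadcast timer delivery (illustrative only; the
 * sim_* names are hypothetical, not kernel symbols).
 */
#include <assert.h>

#define SIM_NR_CPUS 4

static unsigned long sim_apic_timer_irqs[SIM_NR_CPUS]; /* models apic_timer_irqs */
static unsigned long sim_irq0_count[SIM_NR_CPUS];      /* models kstat_cpu(cpu).irqs[0] */
static unsigned long sim_timer_events[SIM_NR_CPUS];    /* clock events processed */

/* Models smp_apic_timer_interrupt() on a CPU that received the IPI. */
static void sim_apic_timer_interrupt(int cpu)
{
    sim_apic_timer_irqs[cpu]++;  /* the counter the watchdog reads */
    sim_timer_events[cpu]++;     /* then the clock event handler runs */
}

/* Models irq 0 on the CPU that owns the external timer (the boot CPU). */
static void sim_irq0_broadcast(int self)
{
    assert(self >= 0 && self < SIM_NR_CPUS);
    sim_irq0_count[self]++;      /* generic IRQ accounting for irq 0 */
    sim_timer_events[self]++;    /* event_handler is called directly ... */
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        if (cpu != self)         /* ... and the IPI goes to all other CPUs */
            sim_apic_timer_interrupt(cpu);
}
```

After repeated calls to sim_irq0_broadcast(3), CPU 3 still processes every timer event, but its modelled apic_timer_irqs stays at zero while its irq 0 count grows. That is the situation in which adding kstat_cpu(cpu).irqs[0] to the watchdog sum removes the false report.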

I also suggest fixing the i386 kernel; it should use the per-CPU statistic:
Index: git-linux-2.6/arch/i386/kernel/nmi.c
===================================================================
--- git-linux-2.6.orig/arch/i386/kernel/nmi.c
+++ git-linux-2.6/arch/i386/kernel/nmi.c
@@ -351,7 +351,7 @@ __kprobes int nmi_watchdog_tick(struct p
         * Take the local apic timer and PIT/HPET into account. We don't
         * know which one is active, when we have highres/dyntick on
         */
-       sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_irqs(0);
+       sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0];

        /* if the none of the timers isn't firing, this cpu isn't doing much */
        if (!touched && last_irq_sums[cpu] == sum) {
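One way to see why the per-CPU statistic matters: in kernels of this era kstat_irqs(0) sums the irq 0 count over all CPUs, so irq 0 ticks on the boot CPU change every CPU's watchdog sum, which can hide a CPU that has genuinely stopped taking timer interrupts. The sketch below is a hypothetical userspace model (sim_irq0, sim_apic and the sim_sum_* helpers are illustrative names), not kernel code.

```c
/*
 * Userspace model contrasting a global irq 0 count with a per-CPU one
 * (illustrative only; all sim_* names are hypothetical).
 */
#include <assert.h>

#define SIM_NR_CPUS 4

static unsigned int sim_irq0[SIM_NR_CPUS];  /* models kstat_cpu(cpu).irqs[0] */
static unsigned int sim_apic[SIM_NR_CPUS];  /* models apic_timer_irqs */

/* Models kstat_irqs(0): the irq 0 count summed over every CPU. */
static unsigned int sim_kstat_irqs0(void)
{
    unsigned int sum = 0;

    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        sum += sim_irq0[cpu];
    return sum;
}

/* Watchdog sum before the i386 change: uses the global irq 0 count. */
static unsigned int sim_sum_global(int cpu)
{
    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    return sim_apic[cpu] + sim_kstat_irqs0();
}

/* Watchdog sum after the change: uses only this CPU's irq 0 count. */
static unsigned int sim_sum_percpu(int cpu)
{
    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    return sim_apic[cpu] + sim_irq0[cpu];
}
```

If CPU 1 is stuck while the boot CPU keeps taking irq 0, sim_sum_global(1) still changes between NMIs (so CPU 1 looks alive), whereas sim_sum_percpu(1) stays constant and the watchdog can flag it.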

